How can services recover from lost messages resulting from communication failures?
Problem
Network and server hardware failures can lead to lost messages, leaving a service consumer with no response to its request. Reissuing the request message can then cause unpredictable behavior in both the service and the service consumer logic.
Solution
Design service capabilities so that they can safely support repeated message exchanges.
Application
Idempotent capabilities guarantee that repeated invocations are safe and will have no negative effect. These types of capabilities are generally defined in terms of "set", "put" or "delete" actions that have a post-condition that does not depend on the original state of the service.
Capabilities that do not request changes to service state can also be safely retried; these are generally limited to read-only data fetches or queries.
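The post-condition argument can be illustrated with a minimal Python sketch. The class and method names (ValueStore, set_value, delete_value, increment_value) are hypothetical and exist only for this example; the service state is modeled as a plain in-memory dictionary.

```python
class ValueStore:
    """Hypothetical in-memory service state, used only for illustration."""

    def __init__(self):
        self.values = {}

    def get_value(self, key):
        # Read-only: no state change, so retries are always safe.
        return self.values.get(key)

    def set_value(self, key, value):
        # Post-condition (values[key] == value) ignores the prior state.
        self.values[key] = value

    def delete_value(self, key):
        # Post-condition (key absent) holds whether or not the key existed.
        self.values.pop(key, None)

    def increment_value(self, key):
        # Post-condition depends on the prior state: NOT idempotent.
        self.values[key] = self.values.get(key, 0) + 1


store = ValueStore()
for _ in range(3):  # simulate the same request being delivered three times
    store.set_value("a", 10)
    store.increment_value("b")

idempotent_result = store.values["a"]      # same as after a single delivery
non_idempotent_result = store.values["b"]  # grows with every redelivery
```

Repeating the "set" request leaves the stored value unchanged, whereas each redelivered "increment" changes the outcome.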
The design of an idempotent capability can include the use of a unique identifier with each request so that repeated requests that have already been processed will be discarded by the service rather than being processed again.
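One possible shape for such identifier-based handling is sketched below. This is an assumption-laden illustration, not a prescribed design: DeduplicatingService, deposit and the in-memory `processed` table are hypothetical names, and the choice to return the recorded result for a duplicate (rather than silently dropping it) is one option among several.

```python
import uuid


class DeduplicatingService:
    """Hypothetical service that remembers which request IDs it has processed."""

    def __init__(self):
        self.balance = 0
        self.processed = {}  # request_id -> result of the first execution

    def deposit(self, request_id, amount):
        if request_id in self.processed:
            # Duplicate request: skip the state change and
            # return the result recorded for the first execution.
            return self.processed[request_id]
        self.balance += amount
        self.processed[request_id] = self.balance
        return self.balance


service = DeduplicatingService()
request_id = str(uuid.uuid4())            # chosen by the consumer, once per logical request
first = service.deposit(request_id, 50)
second = service.deposit(request_id, 50)  # retry of the same logical request
```

Here the retry does not change the balance a second time, because the service recognizes the identifier it has already processed.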
Impacts
The use of a unique identifier to define an idempotent capability requires session state to be reliably recorded by the service and preserved across server hardware failures. This can harm the scalability of the service, and may be further complicated if redundant service implementations are operating at different sites that experience network failures.
Not all service capabilities can be idempotent. Unsafe capabilities include those that need to perform "increment", "reverse" or "escalate" transition functions where the post-execution condition is dependent upon the original state of the service.
The getValue() and setValue() capabilities can be safely retried because they are idempotent. The incrementValue() capability is not idempotent and must not be retried unless each request is associated with a unique identifier. Such an identifier would need to be incorporated into the service's session state and must be synchronized between service instances alongside changes to the value.
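The retry scenario described here can be sketched as follows. Everything in this block is hypothetical: CounterService mirrors the getValue(), setValue() and incrementValue() capabilities with Python-style names, LossyChannel simulates lost responses, and both the value and the seen identifiers are held in memory only. A real service would need to record and synchronize them together.

```python
import uuid


class CounterService:
    """Hypothetical counter mirroring getValue/setValue/incrementValue."""

    def __init__(self):
        self.value = 0
        self.seen_ids = set()  # state that would have to survive failures

    def get_value(self):
        return self.value

    def set_value(self, value):
        self.value = value

    def increment_value(self, request_id):
        # Apply the increment only the first time this ID is seen.
        if request_id not in self.seen_ids:
            self.seen_ids.add(request_id)
            self.value += 1
        return self.value


class LossyChannel:
    """Delivers every request but loses the first `drops` responses."""

    def __init__(self, service, drops):
        self.service = service
        self.drops = drops

    def increment(self, request_id):
        result = self.service.increment_value(request_id)
        if self.drops > 0:
            self.drops -= 1
            raise TimeoutError("response lost")
        return result


def increment_with_retry(channel, attempts=3):
    request_id = str(uuid.uuid4())  # generated once, reused on every retry
    for _ in range(attempts):
        try:
            return channel.increment(request_id)
        except TimeoutError:
            continue
    raise TimeoutError("no response after retries")


service = CounterService()
result = increment_with_retry(LossyChannel(service, drops=2))
```

Although the request reaches the service three times, the value is incremented only once, because every retry carries the same identifier.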
Related Patterns in This Catalog
Reliable Messaging (Little, Rischbeck, Simon),
Reusable Contract (Balasubramanian, Carlyle, Pautasso),
Uniform Contract (Raj Balasubramanian, Jim Webber, Thomas Erl, David Booth)