In a microservices-based online order processing system that uses saga choreography over synchronous API calls, a customer's order involves three services: Order Service, Inventory Service, and Payment Service. After the order is placed, the Inventory Service decrements stock and the Payment Service processes the payment. If the compensating transaction that refunds the payment still fails after multiple retries, how can the system automatically roll back the inventory update without manual intervention and without introducing a single point of failure, given that no queue is used and all communication is synchronous?
We are working on a microservices-based system for processing online orders. Our system has three main services: Order Service, Inventory Service, and Payment Service. When a customer places an order, these services work together to reserve items, update stock, and process the payment.
The challenge we’re facing is how to handle failures and rollbacks. For example, if the payment process fails after the inventory has been updated, we need to undo the stock update to maintain consistency. We’re trying to solve this problem using a saga choreography approach, where each service communicates with the others through synchronous API calls.
The main difficulty is ensuring that if a compensating action, like refunding a payment, fails, we can still roll back other actions (like updating inventory) without manual intervention. We can’t use message queues or other asynchronous mechanisms, which makes it harder to coordinate these compensations and prevent inconsistencies. We’re looking for a solution that can handle these situations automatically and reliably, without creating a single point of failure.
We tried implementing the saga choreography pattern with synchronous API calls between our microservices (Order Service, Inventory Service, and Payment Service). The idea was that each service would call the next one in line and handle compensations if something went wrong.
Here’s what we did:
1. Order Placement: The Order Service created an order and called the Inventory Service to update the stock.
2. Stock Update: The Inventory Service decremented the stock and then called the Payment Service to process the payment.
3. Payment Processing: The Payment Service processed the payment and confirmed the transaction.
We expected that if any step failed, the services would call each other to perform compensating actions. For example, if the payment failed, the Inventory Service would be called to revert the stock update.
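To make the flow above concrete, here is a minimal in-process sketch of the call chain and the inventory compensation. It is not our actual code: the class and method names (`place_order`, `reserve_and_pay`, `charge`) are hypothetical, and plain method calls stand in for the synchronous HTTP calls between services.

```python
class PaymentError(Exception):
    """Raised when the (hypothetical) Payment Service cannot charge."""


class PaymentService:
    def __init__(self, fail=False):
        self.fail = fail          # toggle to simulate a payment failure
        self.charged = {}         # order_id -> amount

    def charge(self, order_id, amount):
        if self.fail:
            raise PaymentError(f"payment declined for {order_id}")
        self.charged[order_id] = amount


class InventoryService:
    def __init__(self, payment, stock):
        self.payment = payment
        self.stock = dict(stock)  # sku -> units available

    def reserve_and_pay(self, order_id, sku, qty, amount):
        if self.stock.get(sku, 0) < qty:
            raise ValueError("insufficient stock")
        self.stock[sku] -= qty                     # step 2: decrement stock
        try:
            self.payment.charge(order_id, amount)  # step 3: synchronous call
        except PaymentError:
            self.stock[sku] += qty                 # compensation: revert stock
            raise


class OrderService:
    def __init__(self, inventory):
        self.inventory = inventory
        self.orders = {}          # order_id -> status

    def place_order(self, order_id, sku, qty, amount):
        self.orders[order_id] = "PENDING"          # step 1: create the order
        try:
            self.inventory.reserve_and_pay(order_id, sku, qty, amount)
        except (PaymentError, ValueError):
            self.orders[order_id] = "CANCELLED"
            return False
        self.orders[order_id] = "CONFIRMED"
        return True
```

In this sketch the stock is restored only because the failure surfaces inside the same synchronous call. It does not cover the case where a compensating call itself fails.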
What We Expected:
Automatic Compensation: We expected the system to automatically roll back previous actions if something went wrong, ensuring data consistency across all services.
Reliability: We aimed for a reliable process without manual intervention, even in cases of failure, and without introducing a single point of failure.
However, we ran into problems when the compensating transactions themselves failed, even after multiple retries. For example, if the refund call kept failing, nothing guaranteed that the inventory rollback would still run, so consistency could only be restored by hand. We need a way to ensure that compensations (such as rolling back inventory updates) are performed reliably.
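The failure mode can be illustrated with a small sketch. It assumes hypothetical compensation functions (`refund_payment`, `restore_stock`) and a bounded retry helper, and it is not a complete solution. It shows one possible shape: each compensation is attempted independently, so a refund that keeps failing does not block the inventory rollback, and whatever remains unresolved is reported.

```python
import time


def retry(action, attempts=3, delay=0.0):
    """Call action() up to `attempts` times; return True on the first success."""
    for _ in range(attempts):
        try:
            action()
            return True
        except Exception:
            time.sleep(delay)  # fixed pause between attempts (assumption)
    return False


def run_compensations(compensations, attempts=3):
    """Attempt every compensation independently.

    A failure in one compensation does not stop the others.
    Returns the names of the compensations that still failed.
    """
    unresolved = []
    for name, action in compensations:
        if not retry(action, attempts=attempts):
            unresolved.append(name)
    return unresolved


# Simulated state: 2 units of "widget" were decremented for the failed order.
stock = {"widget": 3}
refund_calls = []


def restore_stock():
    stock["widget"] += 2


def refund_payment():
    refund_calls.append(1)
    raise ConnectionError("payment service unavailable")


unresolved = run_compensations([
    ("refund_payment", refund_payment),
    ("restore_stock", restore_stock),
])
```

Here the stock is restored even though the refund never succeeds, but the refund is still left unresolved, which is exactly the gap we are asking about. In a real system the compensations would also need to be idempotent, because a call that times out may in fact have succeeded.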