How a Payment System Works 💸 🧐

 

Content and diagrams credit: 1levelup.dev

Today, we're exploring the ins and outs of payment system design. If you've ever wondered what happens behind the scenes when you make a purchase online, buckle up and let's explore! 🚀

How Does a Payment System Work? 🏦

Before we jump into the technical stuff, let's get a high-level view of how payment systems work globally. Picture this: a customer places an order on a merchant's website. To complete the transaction, they need to provide their payment information.

Step-by-Step Process 📝

  1. Order Placement: The customer places an order on the merchant's website.
  2. Payment Information: The customer is redirected to a payment form to enter their payment details. This form is typically provided by a Payment Gateway.
  3. Compliance and Security: The Payment Gateway ensures compliance with standards and regulations such as PCI DSS and GDPR. It also handles risk and fraud prevention.
  4. Validation: The Payment Gateway validates the payment credentials and forwards them to the merchant's acquiring bank.
  5. Transaction Transmission: The cardholder's information is sent to the acquiring bank, which processes the payment on behalf of the merchant.
  6. Approval: The acquiring bank captures the transaction info and routes it through the appropriate card networks to the issuing bank for approval.
  7. Response: The issuing bank checks the transaction details, account balance, and account status, then approves or declines the transaction.
  8. Completion: The transaction status is returned to the merchant, who informs the customer.
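To make step 7 a little more tangible, here is a toy sketch in Python of the kind of check an issuing bank might run. The Account type and the authorize function are made up for illustration; real issuers apply far more rules (fraud scoring, limits, currency handling) than this.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Account:
    balance: Decimal
    active: bool


def authorize(account: Account, amount: Decimal) -> str:
    """Return "APPROVED" or "DECLINED" for a requested charge."""
    if amount <= 0:
        return "DECLINED"   # invalid transaction details
    if not account.active:
        return "DECLINED"   # account status check
    if account.balance < amount:
        return "DECLINED"   # insufficient funds
    return "APPROVED"
```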

Payment Service Provider (PSP) 🌐

A PSP is a third-party company that helps businesses handle payments securely. They offer risk management, reconciliation tools, and sometimes even order management. In some cases the PSP also acts as the acquiring bank, but not always.

Non-functional Requirements 📋

A payment system needs to move money from account A to account B. Sounds simple, right? Well, the challenge lies in making the system reliable, especially in unexpected situations. A small slip could cause significant revenue loss. Here, we focus on the technical concepts that apply to almost every payment system.

Payment System Components 🧱

Let’s say we need to build a payment system for an online store. We should provide at least the following core features:



  1. Payment Event Generation: When a user clicks the “place order” button, a payment event is generated and sent to the payment service.
  2. External Payment Service: The payment service calls an external PSP to process the card payment.
  3. Payment Page: The user is directed to a payment page where they can enter their details.
  4. PSP Functions: The PSP sends card details to banks or card schemes.
  5. Wallet Update: After successful payment processing, the wallet (account balance of the merchant) is updated.
  6. Ledger Update: All financial transactions are logged for auditing and revenue calculation.
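The six pieces above can be sketched as a small in-memory model. Everything here (PaymentEvent, FakePSP, PaymentService) is a hypothetical name chosen for illustration, not part of any real PSP SDK. A production system would call the PSP over the network and keep the wallet and ledger in a database. Recording failed attempts in the ledger is also an assumption of this sketch.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass(frozen=True)
class PaymentEvent:
    """Generated when the user clicks "place order" (hypothetical shape)."""
    order_id: str
    merchant_id: str
    amount: Decimal


class FakePSP:
    """Stand-in for an external PSP; approves any positive amount."""

    def charge(self, event: PaymentEvent) -> bool:
        return event.amount > 0


@dataclass
class PaymentService:
    psp: FakePSP
    wallet: dict = field(default_factory=dict)   # merchant_id -> balance
    ledger: list = field(default_factory=list)   # append-only transaction log

    def handle(self, event: PaymentEvent) -> bool:
        approved = self.psp.charge(event)
        if approved:
            # Wallet update: credit the merchant's balance.
            balance = self.wallet.get(event.merchant_id, Decimal("0"))
            self.wallet[event.merchant_id] = balance + event.amount
        # Ledger update: record every attempt for auditing.
        status = "SUCCESS" if approved else "FAILED"
        self.ledger.append((event.order_id, event.merchant_id, event.amount, status))
        return approved
```

Amounts use Decimal rather than float so that sums of money do not pick up rounding errors.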

Why Asynchronous Payments? ❓

Benefits of Asynchronous Communication 📈

  • Scalability: Handles a large number of requests without blocking the main thread.
  • Performance: Processes transactions quickly without waiting for responses from external components.
  • Fault Tolerance: Manages errors robustly, retrying requests if components fail.
  • Loose Coupling: This makes it easier to modify or replace components without affecting the entire system.
  • Asynchronous Processing: Frees up resources for other tasks, reducing transaction time.

However, some scenarios, like physical store payments, require real-time authorization, so synchronous communication might be necessary.
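As a rough illustration of the asynchronous style, the sketch below uses Python's asyncio with an in-process queue: orders are accepted immediately and a few workers talk to the PSP in the background. The call_psp function is a placeholder that just sleeps. In a real deployment the queue would more likely be an external broker than an asyncio.Queue.

```python
import asyncio


async def call_psp(order_id: str) -> str:
    """Placeholder for a slow external PSP call (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"{order_id}:paid"


async def worker(queue: asyncio.Queue, results: list) -> None:
    while True:
        order_id = await queue.get()
        try:
            results.append(await call_psp(order_id))
        finally:
            queue.task_done()


async def process_orders(order_ids: list, workers: int = 3) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    results: list = []
    tasks = [asyncio.create_task(worker(queue, results)) for _ in range(workers)]
    for order_id in order_ids:
        queue.put_nowait(order_id)   # accepting an order does not wait for the PSP
    await queue.join()               # wait until every queued order is processed
    for task in tasks:
        task.cancel()                # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return results
```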

Dealing with Payment Failures 💥

Types of Failures

  • System Failures: Network or server failures.
  • Poison Pill Errors: When an inbound message cannot be processed.
  • Functional Bugs: No technical errors, but invalid results.

Guarantee Transaction Completion ✅

We can use a message queue like Apache Kafka to guarantee transaction completion. We create an order event in Kafka for every order that is placed or paid. The queue persists these messages so they are not lost even when things don't go as planned. The payment operation is not considered complete until its event is safely stored in the message queue.
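The "store the event first, then report success" rule can be sketched without a running Kafka cluster. The example below swaps in a plain append-only file as a stand-in for the durable queue. EventLog and complete_payment are hypothetical names, and this is not how a Kafka client is used; it only illustrates the ordering guarantee.

```python
import json
import os


class EventLog:
    """Append-only file standing in for a durable message queue."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, event: dict) -> None:
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")
            f.flush()
            os.fsync(f.fileno())   # force the write to disk before returning

    def read_all(self) -> list:
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


def complete_payment(log: EventLog, order_id: str, amount_cents: int) -> bool:
    """Report success only after the event has been stored."""
    event = {"type": "order_paid", "order_id": order_id, "amount_cents": amount_cents}
    try:
        log.append(event)
    except OSError:
        return False   # event not stored, so the payment is not considered complete
    return True
```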

Dealing with Transient Failures

Retry Strategies

A customer may try to make a payment, but the request fails due to an unstable network connection. In those cases, it makes sense to retry the operation because network problems are usually temporary.

Immediate Retry

The most basic retry implementation is to retry immediately after a failure. However, it's unlikely that the issue has been solved in such a short amount of time. We can retry at fixed intervals of time or, better yet, at incremental intervals of time to give the system a bit of a break to recover.
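Here is a minimal sketch of retrying at fixed or incremental intervals. The function name, its parameters, and the choice of ConnectionError as the "transient" error are assumptions made for this example. The sleep function is injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_interval(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    incremental: bool = True,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry on ConnectionError, waiting base_delay (fixed) or
    base_delay * attempt (incremental) between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise   # out of attempts: surface the failure
            sleep(base_delay * attempt if incremental else base_delay)
    raise ValueError("max_attempts must be at least 1")
```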

Exponential Backoff Retry

In this strategy, we double the waiting time after each failed attempt, so the n-th retry waits on the order of 2^n seconds (for example 1s, 2s, 4s, 8s). This gives the system progressively more time to recover between retries.
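A small sketch of exponential backoff, assuming a base delay of one second and a cap on the maximum wait (the cap is an addition of this example, commonly used so delays do not grow without bound). Function names and defaults are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> list:
    """Delays base * 2^n for n = 0..max_retries-1, capped at `cap` seconds."""
    return [min(cap, base * 2 ** n) for n in range(max_retries)]


def retry_with_backoff(
    operation: Callable[[], T],
    max_retries: int = 4,
    base: float = 1.0,
    cap: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try the operation, doubling the wait after each ConnectionError."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return operation()
        except ConnectionError:
            sleep(delay)
    return operation()   # final attempt; let any error propagate
```

With the defaults, backoff_delays(4) yields waits of 1, 2, 4, and 8 seconds.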

Timeout Pattern ⏱

Set timeouts at a balanced level: long enough to allow for slower responses, short enough to avoid waiting indefinitely. If a request times out, the caller treats it as a failure even though the payment may have gone through on the other side, so a blind retry can charge the customer twice. To avoid this, combine retry strategies with idempotency.
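The sketch below shows one way to put a timeout around a PSP call with asyncio.wait_for. The function name and the "UNKNOWN" status are assumptions of this example. The point is that a timeout tells us nothing about whether the charge happened, so the result is reported as unknown rather than as a definite failure.

```python
import asyncio
from typing import Awaitable, Callable


async def charge_with_timeout(
    psp_call: Callable[[], Awaitable[str]],
    timeout_seconds: float,
) -> str:
    """Return the PSP's answer, or "UNKNOWN" if it did not reply in time."""
    try:
        return await asyncio.wait_for(psp_call(), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # Only our local wait was cancelled; the remote side may still
        # have processed the charge, so any retry must be idempotent.
        return "UNKNOWN"
```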

Fallbacks

The Fallback pattern allows a service to continue execution even if requests to another service fail by filling in a fallback value. This compromise can help avoid losing customers.
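As a hedged example, imagine checkout depends on a currency-rate service. If that service is unreachable, we could fall back to the last rate we saw instead of failing the whole checkout. The service, the function names, and the rate value below are all invented for illustration.

```python
from decimal import Decimal
from typing import Callable, TypeVar

T = TypeVar("T")


def with_fallback(primary: Callable[[], T], fallback_value: T) -> T:
    """Return primary(); if the dependency is unreachable, return fallback_value."""
    try:
        return primary()
    except (ConnectionError, TimeoutError):
        return fallback_value


def fetch_live_rate() -> Decimal:
    """Hypothetical call to a currency-rate service that is currently down."""
    raise ConnectionError("rate service unavailable")


LAST_KNOWN_RATE = Decimal("1.08")   # made-up cached value, for illustration only
rate = with_fallback(fetch_live_rate, LAST_KNOWN_RATE)
```

Whether a stale value is acceptable is a business decision. A fallback suits display data better than the charge itself.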

Dealing with Persistent Failures 💪

If the error is due to incompatible information, it should be saved for later debugging by isolating problematic messages in a dead-letter queue. If the error is due to a service being down, transactions can be stored in a persistent queue until the service recovers.
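Both cases can be sketched with two in-memory queues. Messages that cannot be parsed or lack a required field go to a dead-letter list. Messages that fail only because a downstream service is down go to a retry queue. The consume function and the order_id field are hypothetical, and a real system would use durable queues rather than Python lists.

```python
import json
from collections import deque
from typing import Callable, Iterable


def consume(
    messages: Iterable[str],
    handler: Callable[[dict], None],
    dead_letters: list,
    retry_queue: deque,
) -> None:
    for raw in messages:
        try:
            payload = json.loads(raw)
            if not isinstance(payload, dict) or "order_id" not in payload:
                raise ValueError("missing order_id")
        except ValueError as exc:   # json.JSONDecodeError is a ValueError subclass
            # Poison pill: isolate it for later debugging and move on.
            dead_letters.append({"raw": raw, "error": str(exc)})
            continue
        try:
            handler(payload)
        except ConnectionError:
            # Downstream service is down: keep the message for a later retry.
            retry_queue.append(raw)
```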

Idempotency 🔄

An idempotent operation has no additional effect if it is called more than once with the same input parameters. To avoid double payments, the client generates an idempotency key and sends it with the request in an HTTP header. On the server, a unique key constraint on that key in the database can be used to ensure each payment is recorded exactly once.
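Here is a minimal sketch of the unique-constraint idea using SQLite from Python's standard library. The table layout and function names are assumptions for this example. A retried request carrying the same key hits the primary key constraint and is reported as a duplicate instead of creating a second payment.

```python
import sqlite3
import uuid


def new_idempotency_key() -> str:
    """Generated once by the client and reused on every retry of the same payment."""
    return str(uuid.uuid4())


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments ("
        " idempotency_key TEXT PRIMARY KEY,"
        " order_id TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL)"
    )


def record_payment(conn: sqlite3.Connection, idempotency_key: str,
                   order_id: str, amount_cents: int) -> bool:
    """Return True if this call recorded the payment, False if the key was seen before."""
    try:
        with conn:   # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO payments (idempotency_key, order_id, amount_cents)"
                " VALUES (?, ?, ?)",
                (idempotency_key, order_id, amount_cents),
            )
        return True
    except sqlite3.IntegrityError:
        return False   # duplicate key: the payment was already recorded
```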

Security 🔒

Enforce Encryption for Data-at-Rest and Data-in-Transit

  • Data-at-Rest: Encrypt data into a secure format that needs a key to be read.
  • Data-in-Transit: Encrypt data over a network using VPNs and TLS.
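For data-in-transit, a client-side TLS setup in Python might look like the sketch below. It uses the standard ssl module. Requiring TLS 1.2 or newer is an assumption of this example, so check the minimum version against your own compliance requirements.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Client context that verifies certificates and refuses old protocol versions."""
    context = ssl.create_default_context()            # certificate and hostname checks on
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # assumed floor for this sketch
    return context
```

The returned context can be passed to standard library clients, for example as the context argument of http.client.HTTPSConnection.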

Use access controls to restrict data access, update software regularly, back up data, and use long, complex passwords.

Data Integrity Monitoring 📈

Monitor data integrity by regularly checking for changes in vulnerable data and generating security alerts. This technique helps detect malware and other security threats.
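One simple way to check for changes is to keep a baseline of SHA-256 fingerprints and compare it against the current contents on a schedule. The sketch below does this for in-memory byte strings. The function names are made up, and a real monitor would read files or database rows and raise an alert rather than return a list.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 hex digest of the given content."""
    return hashlib.sha256(data).hexdigest()


def detect_changes(baseline: dict, current: dict) -> list:
    """Return sorted names whose content changed, appeared, or disappeared.

    baseline maps name -> stored hex digest; current maps name -> raw bytes.
    """
    changed = []
    for name in sorted(set(baseline) | set(current)):
        old = baseline.get(name)
        new = fingerprint(current[name]) if name in current else None
        if old != new:
            changed.append(name)
    return changed
```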

Conclusions 🎉

For a payment system, reliability and fault tolerance are key requirements. We discussed tools and strategies such as Kafka for message persistence, retry strategies, timeouts, fallbacks, dead-letter queues, and idempotent message handling to keep the system robust and predictable. Apply these strategies to build a reliable, fault-tolerant payment system that handles transactions smoothly and efficiently.

#Payments #FinTech #TechTips #DevOps #Scalability #Security #Integration #SolutionArchitect
