Tags: FinTech, Payments, PCI DSS, Distributed Systems, Event Sourcing

Payments Infrastructure at Scale: Patterns That Don't Lose Transactions

Processing payments at scale is one of the hardest distributed-systems problems in practice. A bug that loses a single transaction is a customer-service incident, a compliance event, and a trust crisis all at once. This post walks through the patterns that keep payment infrastructure reliable under real production load.

Codecanis Admin

10 min read
Real-time view of a high-volume payments fabric handling millions of monthly transactions.

Payment systems are where distributed-systems theory meets its harshest real-world test. A message queue that drops 0.001% of messages is fine for analytics; it's catastrophic for financial transactions. Every architectural decision has downstream consequences in compliance, reconciliation, and user trust.

This post covers the patterns we use in production with our FinTech clients, which collectively process high transaction volumes every month.

Idempotency: The Foundation of Everything

Before anything else: every payment mutation endpoint must be idempotent. Networks are unreliable. Clients retry. Load balancers retry. If your POST /charges endpoint isn't idempotent, you will double-charge customers. It's a matter of when, not if.

The canonical pattern is an idempotency key — a client-generated UUID passed as a header on every state-changing request. The server records the key and its response; on retry, it returns the cached response without re-executing.

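// Laravel example: idempotency middleware
use Closure;
use Illuminate\Http\Request;
use Illuminate\Support\Facades\Cache;
use Symfony\Component\HttpFoundation\Response;

public function handle(Request $request, Closure $next): Response
{
    $key = $request->header('Idempotency-Key');

    if (! $key) {
        // Safe reads pass through; mutations without a key are rejected (hard API contract)
        return $request->isMethodSafe()
            ? $next($request)
            : response()->json(['error' => 'Idempotency-Key header is required'], 400);
    }

    // Scope keys per user (assumes an authenticated route)
    $cacheKey = 'idem:' . hash('sha256', $request->user()->id . ':' . $key);

    // Without an atomic lock, two concurrent retries can both miss the cache and
    // both execute the charge. Needs a lock-capable cache driver such as Redis.
    $lock = Cache::lock($cacheKey . ':lock', 30);

    if (! $lock->get()) {
        return response()->json(['error' => 'A request with this key is in progress'], 409);
    }

    try {
        if ($cached = Cache::get($cacheKey)) {
            // Replay the stored body verbatim; decoding and re-encoding can alter it
            return response($cached['body'], $cached['status'], [
                'Content-Type'        => 'application/json',
                'X-Idempotent-Replay' => 'true',
            ]);
        }

        $response = $next($request);

        // Cache for 24 hours, long enough to cover all reasonable retry windows
        Cache::put($cacheKey, [
            'body'   => $response->getContent(),
            'status' => $response->getStatusCode(),
        ], now()->addHours(24));

        return $response;
    } finally {
        // Release so the next retry can read the cached response
        $lock->release();
    }
}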

Stripe's idempotency implementation is the industry reference, and Adyen accepts the same Idempotency-Key header on its APIs. Whatever PSP you're on, enforce idempotency keys as a hard API contract.

Dual-Write and the Problem of Distributed State

A common mistake is treating two separate writes as one atomic operation. For example, a team debits the internal ledger and calls the PSP inside the same "transaction." These are not atomic. The PSP call can succeed after your DB write fails, or the DB write can commit while the PSP call fails. You end up with money in motion but no record of it, or a record of money that never moved.

The safe alternative is the transactional outbox pattern. It replaces the dual write with a single local transaction plus an asynchronous worker:

  1. Write the intent (pending transaction) to your DB atomically with an outbox event in the same SQL transaction.
  2. A separate worker polls the outbox, calls the PSP, and updates the transaction record on success.
  3. The outbox entry is deleted only after the PSP confirms.

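This makes the PSP call eventually consistent with your local state, but ensures no money moves without a matching record. A worker can crash after the PSP call succeeds but before it records the result, so the same outbox entry may be processed twice. Passing an idempotency key on the PSP call, for example one derived from the outbox entry, turns that retry into a harmless replay.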
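The steps above can be sketched in plain PHP, including the case where a worker crashes after the PSP call. This is a minimal in-memory sketch, not our production code. Arrays stand in for the transactions and outbox tables, and FakePsp, PaymentStore, and processOutbox are hypothetical names. Using the outbox entry's ID as the PSP idempotency key is one reasonable choice, assumed here for illustration.

```php
<?php
// Hypothetical in-memory sketch of the transactional outbox pattern.
// Arrays stand in for DB tables; FakePsp stands in for a real PSP client.

class FakePsp
{
    /** @var array<string, string> idempotency key => charge ID */
    public array $charges = [];

    public function charge(string $idempotencyKey, int $amountCents): string
    {
        // Like a real PSP, a repeated key returns the original charge
        if (!isset($this->charges[$idempotencyKey])) {
            $this->charges[$idempotencyKey] = 'ch_' . count($this->charges);
        }
        return $this->charges[$idempotencyKey];
    }
}

class PaymentStore
{
    public array $transactions = [];
    public array $outbox = [];

    // Step 1: intent and outbox event written together
    // (in a real database, both inserts share one SQL transaction)
    public function createPending(string $txId, int $amountCents): void
    {
        $this->transactions[$txId] = ['amount' => $amountCents, 'status' => 'pending', 'psp_id' => null];
        $this->outbox['ob_' . $txId] = ['tx_id' => $txId];
    }
}

// Steps 2 and 3: call the PSP, update the record, then delete the outbox row
function processOutbox(PaymentStore $store, FakePsp $psp, bool $crashAfterPspCall = false): void
{
    foreach ($store->outbox as $outboxId => $event) {
        $tx = $store->transactions[$event['tx_id']];
        // The outbox row ID doubles as the PSP idempotency key (assumption)
        $chargeId = $psp->charge($outboxId, $tx['amount']);

        if ($crashAfterPspCall) {
            // Simulate the worker dying before it records the result
            throw new RuntimeException('worker crashed');
        }

        $store->transactions[$event['tx_id']]['status'] = 'succeeded';
        $store->transactions[$event['tx_id']]['psp_id'] = $chargeId;
        unset($store->outbox[$outboxId]); // deleted only after the PSP confirms
    }
}

$store = new PaymentStore();
$psp = new FakePsp();
$store->createPending('tx_1', 4999);

try {
    processOutbox($store, $psp, crashAfterPspCall: true);
} catch (RuntimeException $e) {
    // The outbox row survives the crash, so the next run retries it
}
processOutbox($store, $psp);
```

The first run charges the card and then dies before recording the result. The outbox row survives, and on the second run the PSP hands back the original charge instead of creating a new one.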

Event Sourcing for the Ledger

For the transaction ledger itself, event sourcing is the right model. Instead of storing the current balance, store every event that affected it — FundsDebited, FundsCredited, ChargebackReceived, RefundIssued — and derive the balance by replaying the event stream.

Benefits in a FinTech context:

  • Complete audit trail — required by PCI DSS and most banking regulators.
  • Temporal queries — "what was the balance at 14:32:07 on this date?" is trivial.
  • Reconciliation — replay and compare against PSP settlement reports.
  • Debugging — every state transition is recorded with timestamp and actor.
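Here is a minimal sketch of replaying events to get a balance, including the "as of" temporal query. The event names come from this post. The array shape, the amounts in minor units, and the sign given to each event type are assumptions for illustration.

```php
<?php
// Hypothetical sketch: derive an account balance by replaying ledger events.
// Events are assumed to be ordered by sequence number (and therefore by time);
// the sign applied to each event type is an illustrative assumption.

function balanceAt(array $events, ?DateTimeImmutable $asOf = null): int
{
    $balance = 0;
    foreach ($events as $event) {
        // Temporal query: stop at the first event recorded after $asOf
        if ($asOf !== null && $event['at'] > $asOf) {
            break;
        }
        // An unknown event type throws UnhandledMatchError instead of being ignored
        $balance += match ($event['type']) {
            'FundsCredited' => $event['amount'],
            'FundsDebited', 'RefundIssued', 'ChargebackReceived' => -$event['amount'],
        };
    }
    return $balance;
}

$events = [
    ['type' => 'FundsCredited', 'amount' => 10000, 'at' => new DateTimeImmutable('2024-03-01 09:00:00')],
    ['type' => 'FundsDebited',  'amount' => 2500,  'at' => new DateTimeImmutable('2024-03-01 14:32:07')],
    ['type' => 'RefundIssued',  'amount' => 1000,  'at' => new DateTimeImmutable('2024-03-02 10:00:00')],
];

$current = balanceAt($events);
$atCutoff = balanceAt($events, new DateTimeImmutable('2024-03-01 14:32:07'));
```

In production the full replay runs only when rebuilding a read model. Everyday balance reads come from the materialized view.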

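We use an append-only ledger_events table with a per-account sequence_number and a unique constraint on (account, sequence_number). Two concurrent writers therefore cannot both append event N for the same account. The loser gets a constraint violation, reloads, and retries. Materialized views or a dedicated read model serve balance queries without replaying the full history on every request.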
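The per-account unique constraint doubles as optimistic concurrency control. Below is a hypothetical in-memory sketch. A nested array stands in for ledger_events, and the isset() check plays the role the database's unique-constraint violation plays in production. LedgerEvents and ConcurrencyConflict are illustrative names.

```php
<?php
// Hypothetical sketch: optimistic concurrency on an append-only ledger.
// A nested array stands in for ledger_events; the isset() check mimics the
// database's unique constraint on (account, sequence_number).

class ConcurrencyConflict extends RuntimeException {}

class LedgerEvents
{
    /** @var array<string, array<int, array>> account => [sequence_number => event] */
    private array $rows = [];

    public function nextSequence(string $accountId): int
    {
        $sequences = array_keys($this->rows[$accountId] ?? []);
        return $sequences ? max($sequences) + 1 : 1;
    }

    public function append(string $accountId, int $sequence, string $type, int $amount): void
    {
        if (isset($this->rows[$accountId][$sequence])) {
            // In SQL, this is the unique-constraint violation
            throw new ConcurrencyConflict("sequence $sequence already used for $accountId");
        }
        $this->rows[$accountId][$sequence] = ['type' => $type, 'amount' => $amount];
    }
}

$ledger = new LedgerEvents();

// Two writers read the same next sequence number concurrently
$seqA = $ledger->nextSequence('acct_1');
$seqB = $ledger->nextSequence('acct_1');

$ledger->append('acct_1', $seqA, 'FundsCredited', 5000); // writer A wins

$conflict = false;
try {
    $ledger->append('acct_1', $seqB, 'FundsDebited', 2000); // writer B collides
} catch (ConcurrencyConflict $e) {
    $conflict = true;
    // Writer B reloads the stream and retries with the next free number
    $ledger->append('acct_1', $ledger->nextSequence('acct_1'), 'FundsDebited', 2000);
}
```

In a real handler, writer B would also re-check its business rules (e.g., sufficient funds) against the reloaded state before retrying.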

PCI DSS Scope Reduction

Handling raw card data puts your systems fully in PCI DSS scope. At Level 1 transaction volumes, that means an audit that takes months and costs tens of thousands of dollars annually. The engineering goal is to push as much of your system out of scope as possible.

The standard approach:

  • Use Stripe.js / Adyen Web Components / Braintree Drop-in — card data is entered directly into a hosted iframe, tokenised in the PSP's environment, and never touches your servers.
  • Store only the payment method token (e.g., pm_1NqXXX) — not the PAN, CVV, or expiry.
  • Your servers only ever receive and transmit tokens. Scope reduces to SAQ-A or SAQ-A-EP depending on your integration model.
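Hosted fields keep card data off your servers by design. A misconfigured form or a legacy integration can still leak a PAN into a request, though. One defensive option is a tripwire that rejects any payload field that looks like a card number: 13 to 19 digits that pass the Luhn checksum. The sketch below is a hypothetical helper of our own, not something the PSPs provide.

```php
<?php
// Hypothetical tripwire: detect request values that look like raw card numbers
// (13-19 digits passing the Luhn checksum). With hosted fields it should never fire.

function passesLuhn(string $digits): bool
{
    $sum = 0;
    $double = false;
    // Walk from the rightmost digit, doubling every second one
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9;
            }
        }
        $sum += $d;
        $double = !$double;
    }
    return $sum % 10 === 0;
}

function containsPanLikeValue(array $payload): bool
{
    $found = false;
    array_walk_recursive($payload, function ($value) use (&$found): void {
        if (!is_string($value) && !is_int($value)) {
            return;
        }
        // Strip the spaces and dashes often used to format card numbers
        $digits = preg_replace('/[\s-]/', '', (string) $value);
        if (preg_match('/^\d{13,19}$/', $digits) && passesLuhn($digits)) {
            $found = true;
        }
    });
    return $found;
}

$tokenOnly = ['payment_method' => 'pm_1NqXXX', 'amount' => 4999];
$leaked = ['card' => ['number' => '4242 4242 4242 4242']];
```

If the check fires, log that it happened (never the value itself) and reject the request.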

This is not just a compliance win — it shifts the security burden of protecting card data to an organisation whose entire business model depends on getting that right.

Circuit Breakers for PSP Calls

Payment providers have outages. Stripe has had them. Adyen has had them. Without a circuit breaker, a degraded PSP cascades through your stack: checkout requests time out, your connection pool is exhausted, and unrelated services go down with it.

We use a three-state circuit breaker (Closed → Open → Half-Open) with these thresholds:

  • Open after: 5 failures in a 10-second window.
  • Half-open after: 30 seconds in the Open state.
  • Close after: 3 consecutive successes in the Half-Open state.

When the circuit is Open, we stop calling the PSP. Asynchronous payments (e.g., bank transfers) are queued, and the user sees a "processing" state. Synchronous payments (e.g., a card payment the customer is waiting on) fail fast with a clear error.

In Laravel, we implement this with a Redis-backed state machine and wrap every outbound PSP call in it. A custom spatie/laravel-health check reports the circuit state on the health endpoint, so your on-call engineer sees it immediately.
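Here is a minimal sketch of the three-state breaker with the thresholds above. To keep it self-contained, state lives in a plain PHP object with an injectable clock. A Redis-backed version would store the same fields in Redis so every worker shares them. CircuitBreaker and CircuitOpenException are hypothetical names. The Half-Open state here admits probe calls without limiting how many run at once.

```php
<?php
// Hypothetical in-memory sketch of a Closed -> Open -> Half-Open circuit breaker.
// A production version would keep this state in Redis so all workers share it.

class CircuitOpenException extends RuntimeException {}

class CircuitBreaker
{
    private string $state = 'closed';
    private array $failureTimes = []; // timestamps of recent failures (seconds)
    private float $openedAt = 0.0;
    private int $halfOpenSuccesses = 0;

    public function __construct(
        private Closure $clock,              // returns the current time in seconds
        private int $failureThreshold = 5,   // open after 5 failures...
        private float $windowSeconds = 10.0, // ...within a 10-second window
        private float $openSeconds = 30.0,   // half-open after 30 s in Open
        private int $successesToClose = 3,   // close after 3 successes in Half-Open
    ) {}

    public function state(): string
    {
        if ($this->state === 'open' && ($this->clock)() - $this->openedAt >= $this->openSeconds) {
            $this->state = 'half_open';
            $this->halfOpenSuccesses = 0;
        }
        return $this->state;
    }

    public function call(callable $pspCall): mixed
    {
        if ($this->state() === 'open') {
            throw new CircuitOpenException('PSP circuit is open; failing fast');
        }
        try {
            $result = $pspCall();
        } catch (Throwable $e) {
            $this->recordFailure();
            throw $e;
        }
        $this->recordSuccess();
        return $result;
    }

    private function recordFailure(): void
    {
        $now = ($this->clock)();
        if ($this->state === 'half_open') {
            $this->trip($now); // a failed probe reopens the circuit immediately
            return;
        }
        // Sliding window: drop failures older than $windowSeconds
        $this->failureTimes = array_values(array_filter(
            $this->failureTimes,
            fn (float $t): bool => $now - $t < $this->windowSeconds
        ));
        $this->failureTimes[] = $now;
        if (count($this->failureTimes) >= $this->failureThreshold) {
            $this->trip($now);
        }
    }

    private function recordSuccess(): void
    {
        if ($this->state === 'half_open' && ++$this->halfOpenSuccesses >= $this->successesToClose) {
            $this->state = 'closed';
        }
    }

    private function trip(float $now): void
    {
        $this->state = 'open';
        $this->openedAt = $now;
        $this->failureTimes = [];
    }
}

$now = 0.0;
$breaker = new CircuitBreaker(function () use (&$now): float { return $now; });

// Five PSP timeouts inside the window trip the breaker
for ($i = 0; $i < 5; $i++) {
    try {
        $breaker->call(function () { throw new RuntimeException('PSP timeout'); });
    } catch (RuntimeException $e) {
        // caller queues the payment or fails fast here
    }
}
$afterFailures = $breaker->state();
$now = 30.0; // the 30-second cool-down has elapsed
$afterCooldown = $breaker->state();
for ($i = 0; $i < 3; $i++) {
    $breaker->call(fn () => 'ok'); // three successful probes
}
$afterProbes = $breaker->state();
```

The injectable clock is a design choice that lets you test the timing thresholds without sleeping.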

Reconciliation Pipelines

Reconciliation is how you verify that what your system recorded matches what actually moved through the PSP and what appeared in your bank account. Most teams treat it as an accounting problem. We treat it as an engineering problem.

Our reconciliation pipeline runs nightly (and on-demand for anomalies):

  1. Fetch the PSP settlement report via API (Stripe has /v1/reporting/report_runs; Adyen has settlement detail reports via SFTP).
  2. Join on our internal transaction IDs.
  3. Flag discrepancies into a reconciliation_exceptions table with type (missing_in_psp, missing_in_ledger, amount_mismatch, fx_discrepancy).
  4. Exceptions below an amount threshold are posted automatically to the finance team's Slack channel; those above it page on-call.
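At its core, steps 2 and 3 are a keyed diff. The hypothetical sketch below assumes both sides have already been normalised to rows with a tx_id and an amount in minor units of the same currency. fx_discrepancy is left out because it depends on how FX rates are recorded.

```php
<?php
// Hypothetical sketch: join ledger and PSP settlement rows on our transaction ID
// and classify discrepancies. Row shapes are assumptions; fx_discrepancy omitted.

function reconcile(array $ledger, array $psp): array
{
    // Index both sides by transaction ID
    $ledgerById = array_column($ledger, null, 'tx_id');
    $pspById = array_column($psp, null, 'tx_id');
    $exceptions = [];

    foreach ($ledgerById as $id => $row) {
        if (!isset($pspById[$id])) {
            $exceptions[] = ['tx_id' => $id, 'type' => 'missing_in_psp'];
        } elseif ($pspById[$id]['amount'] !== $row['amount']) {
            $exceptions[] = ['tx_id' => $id, 'type' => 'amount_mismatch'];
        }
    }
    foreach ($pspById as $id => $row) {
        if (!isset($ledgerById[$id])) {
            $exceptions[] = ['tx_id' => $id, 'type' => 'missing_in_ledger'];
        }
    }
    return $exceptions;
}

$ledger = [
    ['tx_id' => 'tx_1', 'amount' => 4999],
    ['tx_id' => 'tx_2', 'amount' => 1500],
    ['tx_id' => 'tx_3', 'amount' => 800],
];
$psp = [
    ['tx_id' => 'tx_1', 'amount' => 4999],
    ['tx_id' => 'tx_2', 'amount' => 1450],
    ['tx_id' => 'tx_4', 'amount' => 2000],
];
$exceptions = reconcile($ledger, $psp);
```

Each returned row would then be inserted into reconciliation_exceptions and routed by the threshold rule in step 4.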

Target: zero unresolved exceptions older than 24 hours. Any exception that lives longer than that is a potential revenue leak or a sign of a systemic bug.

Key Takeaways

  • Idempotency keys are non-negotiable — enforce them as a hard API contract.
  • Use the outbox pattern for PSP calls; never treat a database write and a PSP call as a single atomic operation.
  • Event-source your ledger — it gives you audit trail, temporal queries, and reconciliation for free.
  • Use hosted payment fields (Stripe.js, Adyen Components) to reduce PCI DSS scope to SAQ-A or SAQ-A-EP.
  • Wrap every PSP call in a circuit breaker with Redis-backed state.
  • Run automated reconciliation nightly; treat exceptions like production bugs.