Case Study · Logistics & Supply Chain

How we helped a Fortune 500 logistics enterprise hit 99.99% uptime and cut run cost by 40%

A 22-year-old shipment management monolith was throttling growth in 14 markets. In 11 months, FastCurve replatformed it onto an event-driven, cloud-native architecture — without a single customer-visible outage.

FastCurve Delivery Team · 9 min read

Production uptime: 99.99%
Infrastructure TCO: −40%
Release frequency: 6×
Cutover incidents

The client

The client is a global top-5 freight forwarder operating in 14 countries and processing more than 28 million shipment events per day across air, ocean, and inland transport. Its customer-facing visibility portal and EDI pipelines ran on a single .NET monolith backed by a 4 TB SQL Server cluster, an architecture that had grown organically over two decades.

The problem

Peak season traffic was triggering cascading failures. A single slow report query could degrade the entire booking surface. Releases required 6-hour maintenance windows and were limited to one per month. Most damaging, the on-call team was firefighting 30+ Sev-2 incidents per quarter — and the business was losing tenders to digital-native competitors.

Average page load on the customer portal: 4.8 seconds
Monthly maintenance downtime: 6–9 hours
Mean time to recover from a Sev-2 incident: 2h 40m
Engineering effort spent on keep-the-lights-on work: ~65%

Our approach

We did not propose a big-bang rewrite. Instead, FastCurve ran a four-phase Strangler-Fig modernization with the client's platform team embedded in our pods.

Phase 1 — Observability baseline: OpenTelemetry, distributed tracing, and SLOs defined per business capability before any code moved.
Phase 2 — Capability extraction: 9 bounded contexts identified using event storming with business stakeholders; each carved out behind an anti-corruption layer.
Phase 3 — Event backbone: Kafka-based event mesh with idempotent consumers, schema registry, and outbox pattern for transactional safety.
Phase 4 — Cutover by capability: Dual-write and shadow-read for 30 days per capability; traffic shifted via feature flags with one-click rollback.

What we built

The target architecture is a polyglot set of services on AWS EKS, fronted by a federated GraphQL gateway. Stateful workloads use Aurora PostgreSQL with logical replication for zero-downtime migrations. Asynchronous workflows run on Temporal, which replaced 2,300 lines of brittle in-house orchestration code.

A platform-as-a-product team — staffed jointly by FastCurve and the client — owns the golden paths: a paved-road service template, CI/CD with progressive delivery (Argo Rollouts), and a developer portal that cut time-to-first-commit for new engineers from 11 days to under 2.
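
The idea behind progressive delivery can be sketched as a small control loop. This is not Argo Rollouts configuration or its API, only a Python illustration of the logic such a tool automates; the step weights, the error-rate threshold, and the two callbacks are assumptions made for the example.

```python
from typing import Callable, Sequence


def progressive_rollout(
    set_weight: Callable[[int], None],
    error_rate: Callable[[], float],
    steps: Sequence[int] = (5, 25, 50, 100),
    max_error_rate: float = 0.001,
) -> bool:
    """Shift traffic to the new version step by step.

    After each step the observed error rate is checked; if it exceeds the
    threshold, traffic is sent back to the old version and the rollout aborts.
    """
    for weight in steps:
        set_weight(weight)  # percentage of traffic on the new version
        if error_rate() > max_error_rate:
            set_weight(0)  # automated rollback
            return False
    return True


if __name__ == "__main__":
    history = []
    ok = progressive_rollout(history.append, lambda: 0.0)
    print(ok, history)
```

A real rollout would also pause between steps so that enough traffic accumulates for the error rate to be meaningful; that wait is omitted here to keep the sketch short.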

"FastCurve treated our modernization like a product, not a project. The discipline around SLOs and reversibility is what let our board sign off on running the cutover during peak season."
— VP Engineering, Client

Outcomes after 12 months

99.99% measured uptime across all customer-facing surfaces — up from 99.4%
P95 portal load time reduced from 4.8s to 780ms
Release cadence moved from monthly to daily, with progressive rollouts and automated rollback
Infrastructure run cost reduced 40% via right-sizing, spot capacity, and shutting down two legacy data centers
Engineering capacity reallocated: keep-the-lights-on (KTLO) work dropped from 65% to 22% of capacity, freeing 14 FTEs for new revenue features
Net Promoter Score from large shipper accounts improved 19 points year-over-year
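
To put the uptime figures in perspective, the arithmetic below converts an availability percentage into a downtime budget. It assumes a 365-day year and that all downtime counts against the figure; the case study does not state how uptime was measured, so treat this as illustrative.

```python
def downtime_minutes(availability_pct: float, period_hours: float) -> float:
    """Minutes of downtime allowed in a period at a given availability."""
    return (1 - availability_pct / 100) * period_hours * 60


YEAR_HOURS = 365 * 24  # assumption: non-leap year, measured around the clock

before = downtime_minutes(99.4, YEAR_HOURS)   # about 3,153.6 minutes
after = downtime_minutes(99.99, YEAR_HOURS)   # about 52.6 minutes

if __name__ == "__main__":
    print(f"99.4%:  {before / 60:.1f} hours per year")
    print(f"99.99%: {after:.1f} minutes per year")
```

Under those assumptions, 99.4% allows about 52.6 hours of downtime per year, while 99.99% allows about 52.6 minutes.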

Why it worked

Three decisions made the difference: defining SLOs before writing code, refusing to start any cutover without a validated rollback, and embedding FastCurve engineers in the client's on-call rotation. Modernization is a contact sport: we don't throw it over the wall.
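
What a validated rollback means in practice can be sketched with a shadow-read router for a single capability, in the spirit of the Phase 4 cutover. The class and parameter names are hypothetical, and the promotion gate counts shadow reads instead of waiting out a 30-day window; it is a simplified illustration, not the tooling used on the engagement.

```python
from typing import Any, Callable


class CapabilityCutover:
    """Routes reads for one capability between the legacy and new systems."""

    def __init__(self, legacy_read: Callable[[str], Any], new_read: Callable[[str], Any]):
        self.legacy_read = legacy_read
        self.new_read = new_read
        self.serve_from_new = False  # the feature flag; False is the rollback state
        self.reads = 0
        self.mismatches = 0

    def read(self, key: str) -> Any:
        if self.serve_from_new:
            return self.new_read(key)
        answer = self.legacy_read(key)
        self.reads += 1
        try:
            # Shadow read: query the new system and compare, but never serve its answer.
            if self.new_read(key) != answer:
                self.mismatches += 1
        except Exception:
            self.mismatches += 1  # a failing shadow call counts against promotion
        return answer

    def promote(self, min_reads: int, max_mismatch_rate: float) -> bool:
        """Flip the flag only if enough shadow reads agreed with the legacy system."""
        if self.reads == 0 or self.reads < min_reads:
            return False
        if self.mismatches / self.reads > max_mismatch_rate:
            return False
        self.serve_from_new = True
        return True

    def rollback(self) -> None:
        """Single-step rollback: route reads back to the legacy system."""
        self.serve_from_new = False


if __name__ == "__main__":
    legacy = {"SHP-1": "DEPARTED"}
    new = {"SHP-1": "DEPARTED"}
    cutover = CapabilityCutover(legacy.__getitem__, new.__getitem__)
    cutover.read("SHP-1")
    print(cutover.promote(min_reads=1, max_mismatch_rate=0.0))
```

Because the legacy path stays wired in and exercised until promotion, rolling back is a flag flip, not a redeploy. That only holds while the legacy store is kept current, which is the role dual-write plays during the cutover window.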

Next step

Have a similar problem on your roadmap?

FastCurve partners with engineering and product leaders to ship enterprise-grade software faster, with measurable business outcomes.

Talk to our team