How we helped a Fortune 500 logistics enterprise hit 99.99% uptime and cut run cost by 40%
A 22-year-old shipment management monolith was throttling growth in 14 markets. In 11 months, FastCurve replatformed it onto an event-driven, cloud-native architecture — without a single customer-visible outage.
The client
A global top-5 freight forwarder operating across 14 countries, processing more than 28 million shipment events per day across air, ocean, and inland transport. Their customer-facing visibility portal and EDI pipelines ran on a single .NET monolith with a 4 TB SQL Server cluster — an architecture that had grown organically over two decades.
The problem
Peak season traffic was triggering cascading failures. A single slow report query could degrade the entire booking surface. Releases required 6-hour maintenance windows and were limited to one per month. Most damaging, the on-call team was firefighting 30+ Sev-2 incidents per quarter — and the business was losing tenders to digital-native competitors.
- Average page load on the customer portal: 4.8 seconds
- Monthly maintenance downtime: 6–9 hours
- Mean time to recover from a Sev-2 incident: 2h 40m
- Engineering effort spent on keep-the-lights-on work: ~65%
Our approach
We did not propose a big-bang rewrite. Instead, FastCurve ran a four-phase Strangler Fig modernization with the client's platform team embedded in our pods.
- Phase 1 — Observability baseline: OpenTelemetry, distributed tracing, and SLOs defined per business capability before any code moved.
- Phase 2 — Capability extraction: 9 bounded contexts identified using event storming with business stakeholders; each carved out behind an anti-corruption layer.
- Phase 3 — Event backbone: Kafka-based event mesh with idempotent consumers, schema registry, and outbox pattern for transactional safety.
- Phase 4 — Cutover by capability: Dual-write and shadow-read for 30 days per capability; traffic shifted via feature flags with one-click rollback.
What we built
The target architecture is a polyglot set of services on Amazon EKS, fronted by a federated GraphQL gateway. Stateful workloads use Aurora PostgreSQL with logical replication for zero-downtime migrations. Asynchronous workflows run on Temporal, which replaced 2,300 lines of brittle in-house orchestration code.
A platform-as-a-product team — staffed jointly by FastCurve and the client — owns the golden paths: a paved-road service template, CI/CD with progressive delivery (Argo Rollouts), and a developer portal that cut time-to-first-commit for new engineers from 11 days to under 2.
"FastCurve treated our modernization like a product, not a project. The discipline around SLOs and reversibility is what let our board sign off on running the cutover during peak season."
Outcomes after 12 months
- 99.99% measured uptime across all customer-facing surfaces — up from 99.4%
- P95 portal load time reduced from 4.8s to 780ms
- Release cadence moved from monthly to daily, with progressive rollouts and automated rollback
- Infrastructure run cost reduced 40% via right-sizing, spot capacity, and shutting down two legacy data centers
- Engineering team reallocated: keep-the-lights-on (KTLO) work dropped from 65% to 22% of capacity, freeing 14 FTEs for new revenue features
- Net Promoter Score from large shipper accounts improved 19 points year-over-year
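For readers who want to translate the uptime figures into time, the short calculation below converts an availability percentage into the downtime it allows per year. It is a back-of-the-envelope sketch: it assumes a 365-day measurement window and treats all downtime alike, whereas how the client actually measured uptime (for example, whether planned maintenance counts) is not specified here. The function name is our own.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def allowed_downtime_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Downtime permitted over the period at the given availability percentage."""
    return period_minutes * (1 - availability_pct / 100)


before = allowed_downtime_minutes(99.4)
after = allowed_downtime_minutes(99.99)

print(f"99.4%  allows about {before / 60:.1f} hours of downtime per year")
print(f"99.99% allows about {after:.1f} minutes of downtime per year")
```

Under these assumptions, moving from 99.4% to 99.99% shrinks the annual downtime allowance from roughly 52.6 hours to roughly 52.6 minutes, a 60-fold reduction.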
Why it worked
Three decisions made the difference: defining SLOs before writing code, refusing to start any cutover without a validated rollback, and embedding FastCurve engineers in the client's on-call rotation. Modernization is a contact sport; we don't throw it over the wall.
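The rollback rule can be pictured as a simple gate in front of each cutover. The sketch below is illustrative only: it combines a shadow read (serve from the legacy path, compare against the new one) with a feature flag whose reversal is the rollback. The class and function names, the mismatch threshold, and the minimum sample size are all hypothetical, not values from the engagement.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ShadowReader:
    legacy_read: Callable[[str], Any]
    new_read: Callable[[str], Any]
    serve_from_new: bool = False  # feature flag; setting it back to False is the rollback
    total: int = 0
    mismatches: int = 0

    def read(self, key: str) -> Any:
        # Query both systems, record any disagreement, serve whichever the flag selects.
        legacy = self.legacy_read(key)
        candidate = self.new_read(key)
        self.total += 1
        if legacy != candidate:
            self.mismatches += 1
        return candidate if self.serve_from_new else legacy

    def mismatch_rate(self) -> float:
        # With no samples yet, report the worst case so the gate stays closed.
        return self.mismatches / self.total if self.total else 1.0


def ready_for_cutover(reader, rollback_validated,
                      max_mismatch_rate=0.001, min_samples=1000):
    """Open the gate only with a rehearsed rollback and enough clean shadow reads."""
    return bool(
        rollback_validated
        and reader.total >= min_samples
        and reader.mismatch_rate() <= max_mismatch_rate
    )


# Toy data: the new service disagrees with the legacy system on one shipment.
legacy = {"SHP-1": "DEPARTED", "SHP-2": "ARRIVED"}
new = {"SHP-1": "DEPARTED", "SHP-2": "IN_TRANSIT"}

reader = ShadowReader(legacy.get, new.get)
results = [reader.read(key) for key in ("SHP-1", "SHP-2")]

print(results)
print(ready_for_cutover(reader, rollback_validated=True, min_samples=2))
```

While the flag is off, customers only ever see legacy answers, so a mismatch costs nothing but a counter increment. In this toy run the gate stays closed because half of the shadow reads disagree.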
Have a similar problem on your roadmap?
FastCurve partners with engineering and product leaders to ship enterprise-grade software faster, with measurable business outcomes.