The challenge
Infrastructure that grew, but was never measured
The client operates a SaaS platform that tracks links in email newsletters and absorbs high-volume traffic bursts whenever customers send campaigns. Over several years the infrastructure had grown organically into an oversized, expensive, and fragile setup:
- 13 servers - 11 cloud VMs and two dedicated bare-metal database machines
- 6 load balancers - half of them dead or empty, still billed every month
- Two dedicated bare-metal PostgreSQL servers with overlapping roles
- ~EUR 1,147 / month in spend, which kept rising while the business did not grow
The setup was never measured against real usage. Capacity was guessed, then never revisited.
Diagram of the original setup: Database (2 bare-metal, overlapping), Load balancers (6, three dead), Application VMs (11, near-idle), Other. Visually cluttered on purpose - much of this bought nothing.
The diagnosis
30 days of real metrics, not guesses
Rather than guessing, the engagement started with 30 days of real production metrics pulled from the client's monitoring platform. The data told a blunt story:
- The 11 application servers ran at 0.2%-2.6% average CPU
- Memory utilization sat below 15% across the fleet
- 3 of 6 load balancers had zero or disabled backends - pure waste
- The two dedicated bare-metal servers duplicated each other's function
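A utilization review like this can be turned into a repeatable check. The following Python sketch flags servers whose 30-day averages fall below a cut-off. The server names, the sample figures, and the 10% CPU / 20% memory thresholds are assumptions for illustration, not values from the engagement.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str            # hypothetical server name
    avg_cpu_pct: float   # 30-day average CPU utilization, in percent
    avg_mem_pct: float   # 30-day average memory utilization, in percent


def find_overprovisioned(servers, cpu_threshold=10.0, mem_threshold=20.0):
    """Return names of servers whose CPU and memory averages are both below the thresholds."""
    return [
        s.name
        for s in servers
        if s.avg_cpu_pct < cpu_threshold and s.avg_mem_pct < mem_threshold
    ]


# Illustrative sample only; the app figures echo the ranges reported above.
fleet = [
    ServerStats("app-01", 0.2, 9.0),
    ServerStats("app-02", 2.6, 14.0),
    ServerStats("db-01", 35.0, 60.0),
]
flagged = find_overprovisioned(fleet)
```

Requiring both metrics to be low keeps a memory-heavy but CPU-idle machine, such as a cache or database, off the list.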
Chart: actual usage against the target line. The gap between them is the waste.
Chart: EUR 1,147 / month, split. Most of the monthly bill bought nothing.
The solution
A lean, cloud-native, autoscaling architecture
The infrastructure was redesigned as a lean, cloud-native, autoscaling architecture with a proper high-availability database cluster.
PostgreSQL high-availability cluster
The two dedicated bare-metal servers were replaced with a managed PostgreSQL cluster: a primary node plus a physical streaming replica in a separate datacenter. Replication is continuous and its lag is monitored, failover completes within minutes, and automated backups go to object storage on a tiered 7d / 4w / 12m retention schedule.
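One plausible reading of the 7d / 4w / 12m retention is: keep the 7 most recent daily backups, the newest backup from each of the 4 most recent weeks, and the newest from each of the 12 most recent months. The Python sketch below implements that reading. The interpretation, the function name, and the use of ISO weeks are assumptions; the managed service's actual policy may differ.

```python
from datetime import date, timedelta


def select_backups_to_keep(backup_dates, daily=7, weekly=4, monthly=12):
    """Return the set of backup dates to retain under a tiered policy (assumed semantics)."""
    newest_first = sorted(set(backup_dates), reverse=True)

    # Daily tier: the most recent `daily` backups.
    keep = set(newest_first[:daily])

    def keep_newest_per_bucket(bucket_of, limit):
        # Walk from newest to oldest and keep the first (newest) backup
        # seen in each of the `limit` most recent buckets.
        seen = []
        for d in newest_first:
            key = bucket_of(d)
            if key in seen:
                continue
            if len(seen) == limit:
                break
            seen.append(key)
            keep.add(d)

    # Weekly tier: buckets are (ISO year, ISO week).
    keep_newest_per_bucket(lambda d: tuple(d.isocalendar())[:2], weekly)
    # Monthly tier: buckets are (year, month).
    keep_newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep


# Usage: one backup per day for 400 days, ending on a hypothetical date.
today = date(2024, 6, 30)
backups = [today - timedelta(days=i) for i in range(400)]
kept = select_backups_to_keep(backups)
```

Any backup not in the returned set would be a candidate for deletion. The tiers overlap, so the newest backup counts toward all three.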
Hot-table partitioning
The two highest-traffic tables, the click and rotator event logs, were re-architected with 12-way hash partitioning, improving query performance and keeping the dataset maintainable at scale.
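To make the 12-way hash partitioning concrete, this Python sketch generates the PostgreSQL statements that create the child partitions of a hash-partitioned parent table. The table name `click_events` and the partition naming scheme are hypothetical; the source does not name the tables' columns or the partition key.

```python
def hash_partition_ddl(parent_table, partitions=12):
    """Build one CREATE TABLE ... PARTITION OF statement per hash partition.

    Assumes the parent table was created with PARTITION BY HASH (<key>).
    """
    return [
        f"CREATE TABLE {parent_table}_p{r:02d} PARTITION OF {parent_table} "
        f"FOR VALUES WITH (MODULUS {partitions}, REMAINDER {r});"
        for r in range(partitions)
    ]


# Hypothetical table name, for illustration only.
statements = hash_partition_ddl("click_events")
```

PostgreSQL routes each row to the partition whose remainder matches the hash of the partition key modulo 12, which spreads writes evenly and lets maintenance run on one partition at a time.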
Zero-downtime migration
The move off bare metal used logical replication: the new cloud cluster synced live from the old database, and a brief cutover during a low-traffic window then promoted it to primary with no service interruption.
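A cutover like this usually hinges on one question: has the new cluster caught up with the old one? The Python sketch below shows one way to gate the switch on replication lag, computed from two PostgreSQL log sequence numbers (LSNs). It assumes the positions are read from the old primary, for example `pg_current_wal_lsn()` and the `confirmed_flush_lsn` column of `pg_replication_slots`. The function names and the tolerance parameter are illustrative, not the procedure used in the engagement.

```python
def parse_lsn(lsn):
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def replication_lag_bytes(source_lsn, confirmed_lsn):
    """Bytes of WAL the subscriber has not yet confirmed."""
    return parse_lsn(source_lsn) - parse_lsn(confirmed_lsn)


def ready_to_cut_over(source_lsn, confirmed_lsn, writes_paused, max_lag_bytes=0):
    """Allow cutover only when writes are paused and lag is within tolerance.

    max_lag_bytes is a hypothetical tolerance; a real cutover may need a
    small non-zero value because background activity can still emit WAL.
    """
    lag = replication_lag_bytes(source_lsn, confirmed_lsn)
    return writes_paused and lag <= max_lag_bytes
```

For example, `replication_lag_bytes("16/B374D848", "16/B374D800")` is 72 bytes, so `ready_to_cut_over` returns `False` with the default tolerance until the subscriber confirms the remaining WAL.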
Elastic application layer with autoscaling
The 11 application servers were consolidated to 3, placed behind cloud autoscaling. Normal traffic runs on a minimal footprint; a newsletter blast provisions extra nodes automatically, then releases them once the spike passes. The client pays for real demand, not idle worst-case capacity.
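The scaling behaviour described here can be approximated with a target-tracking rule: size the group so that average CPU lands near a target, clamped between a floor and a ceiling. The Python sketch below illustrates that common rule. The 60% target and the ceiling of 10 nodes are assumptions, and the floor of 3 is taken from the consolidated server count; the cloud provider's actual autoscaling policy is not described in the source.

```python
import math


def desired_nodes(current_nodes, avg_cpu_pct, target_cpu_pct=60.0,
                  min_nodes=3, max_nodes=10):
    """Target-tracking sketch: scale node count in proportion to load.

    target_cpu_pct and max_nodes are hypothetical values for illustration.
    """
    wanted = math.ceil(current_nodes * avg_cpu_pct / target_cpu_pct)
    # Never drop below the baseline or exceed the configured ceiling.
    return max(min_nodes, min(max_nodes, wanted))
```

With these assumed settings, 3 nodes averaging 20% CPU stay at 3, while 3 nodes averaging 90% CPU during a newsletter blast grow to 5 and shrink back once the average falls.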
Load balancer consolidation
Six load balancers, three of them dead, were consolidated to a single properly sized load balancer.
Diagram of the new architecture, including the autoscaling group: five boxes instead of twenty-plus. The bill now follows real traffic.
- Phase 1: Quick wins
- Phase 2: PostgreSQL migration (zero-downtime cutover)
- Phase 3: Cleanup & handover
- Phase 4: Backups
The results
The bill cut by two thirds, the platform more reliable
| Metric | Before | After |
|---|---|---|
| Monthly cost | EUR 1,147 | EUR 363 |
| Annual cost | EUR 13,764 | EUR 4,356 |
| Servers | 13 | 5 |
| Load balancers | 6 | 1 |
| Database | 2 bare-metal, overlapping | HA cluster (primary + replica) |
| Scaling | Manual, static | Automatic |
| Migration downtime | - | 0 minutes |
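The headline saving follows directly from the two monthly figures in the table. This short Python check reproduces the arithmetic using only those numbers; the variable names are illustrative.

```python
MONTHLY_BEFORE_EUR = 1147  # from the table above
MONTHLY_AFTER_EUR = 363    # from the table above

monthly_saving = MONTHLY_BEFORE_EUR - MONTHLY_AFTER_EUR
annual_saving = monthly_saving * 12
saving_pct = round(100 * monthly_saving / MONTHLY_BEFORE_EUR, 1)

print(f"Monthly saving: EUR {monthly_saving}")  # EUR 784
print(f"Annual saving:  EUR {annual_saving}")   # EUR 9408
print(f"Reduction:      {saving_pct}%")         # 68.4%
```

A reduction of 68.4% is slightly more than the two thirds quoted in the heading.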
Why it worked
The savings came from measuring before acting
The savings did not come from cutting corners; they came from measuring before acting. Most infrastructure waste is invisible because nobody looks. By starting with 30 days of real metrics, the engagement targeted exactly what was over-provisioned and left everything load-bearing untouched. The result is an architecture that costs less and is more resilient and more modern than what it replaced.