> The Journey
Growing from thousands to millions of monitors required fundamental architectural changes.
- Challenges Faced
- Database Scaling - Query performance degraded with dataset growth
- Worker Management - Scheduling millions of checks efficiently
- Data Storage - Billions of data points per day
- Cost Control - Infrastructure costs were scaling linearly with load
- Solutions Implemented
Sharding Strategy
We implemented horizontal sharding based on monitor ID ranges.
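Range-based sharding can be illustrated with a small lookup. This is a minimal sketch, not our production router: the shard names, the range boundaries, and the `shard_for` helper are all hypothetical and chosen only to show the idea of mapping a monitor ID to the shard that owns its range.

```python
from bisect import bisect_right

# Hypothetical layout: each entry is the first monitor ID a shard owns.
# The list must stay sorted; shard i owns IDs from SHARD_STARTS[i]
# up to (but not including) SHARD_STARTS[i + 1].
SHARD_STARTS = [0, 250_000, 500_000, 750_000]
SHARD_NAMES = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(monitor_id: int) -> str:
    """Return the name of the shard whose ID range contains monitor_id."""
    if monitor_id < SHARD_STARTS[0]:
        raise ValueError(f"monitor_id {monitor_id} is below the first range")
    # bisect_right returns the position of the first range start that is
    # greater than monitor_id; the owning shard is the one just before it.
    index = bisect_right(SHARD_STARTS, monitor_id) - 1
    return SHARD_NAMES[index]
```

In this sketch the last shard is open-ended, so new monitor IDs keep landing on it until another boundary is appended to the list.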
Time-Series Database
We migrated metrics storage from plain PostgreSQL to TimescaleDB, a time-series extension built on PostgreSQL.
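A large part of what a time-series store buys you is cheap aggregation over fixed time windows. The pure-Python sketch below shows that idea only; it is similar in spirit to TimescaleDB's `time_bucket` function but does not call the database. The `bucket_start` and `downsample` helpers, and the `(timestamp, value)` point shape, are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def bucket_start(ts: int, width: int) -> int:
    """Round a Unix timestamp down to the start of its bucket."""
    return ts - (ts % width)


def downsample(points, width):
    """Average (timestamp, value) points into buckets of `width` seconds.

    Returns a dict mapping each bucket's start time to the mean value
    of the points that fall inside it, ordered by bucket start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[bucket_start(ts, width)].append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}
```

For example, response times recorded at seconds 0, 30, and 60 with a 60-second width collapse into two buckets, one starting at 0 and one at 60.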
Distributed Scheduling
We built a custom scheduler using Kafka and distributed workers.
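The general shape of such a scheduler can be sketched without a broker. The code below is an assumption-laden illustration, not our actual implementation: it keeps checks in a min-heap ordered by due time and tags each due check with a partition number, where a real system would publish the job to a Kafka topic. `ScheduledCheck`, `partition_for`, `pop_due`, the modulo partitioning, and the partition count are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field

NUM_PARTITIONS = 4  # hypothetical partition count


@dataclass(order=True)
class ScheduledCheck:
    due_at: int                              # only field used for heap ordering
    monitor_id: int = field(compare=False)
    interval: int = field(compare=False)     # seconds between checks, must be > 0


def partition_for(monitor_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition so that one monitor always goes to the same worker group."""
    return monitor_id % num_partitions


def pop_due(heap, now):
    """Pop every check due at or before `now` and reschedule it.

    Returns a list of (partition, monitor_id) jobs in due-time order.
    Each popped check is pushed back with due_at = now + interval, so it
    cannot be emitted twice in the same call.
    """
    jobs = []
    while heap and heap[0].due_at <= now:
        check = heapq.heappop(heap)
        jobs.append((partition_for(check.monitor_id), check.monitor_id))
        heapq.heappush(
            heap,
            ScheduledCheck(now + check.interval, check.monitor_id, check.interval),
        )
    return jobs
```

Keying jobs by monitor ID keeps all checks for one monitor on the same partition, which preserves per-monitor ordering while letting workers scale out across partitions.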
- Performance Results
The current system handles:
- 1.2M active monitors
- 100K checks per second
- 10TB of metrics data per month
- Sub-second query performance
The architecture is now designed to scale to 10M monitors without major changes.
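The headline numbers can be cross-checked with some back-of-the-envelope arithmetic. The sketch below assumes a 30-day month, decimal terabytes, and one stored data point per check; none of these are stated above, so treat the derived figures as rough estimates.

```python
# Figures quoted in the results list.
CHECKS_PER_SECOND = 100_000
ACTIVE_MONITORS = 1_200_000
BYTES_PER_MONTH = 10 * 10**12   # 10 TB, assuming decimal terabytes

# Assumptions for the estimate.
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30

# 100K checks per second is about 8.64 billion checks per day.
checks_per_day = CHECKS_PER_SECOND * SECONDS_PER_DAY

# Average time between two checks of the same monitor, in seconds.
avg_interval_seconds = ACTIVE_MONITORS / CHECKS_PER_SECOND

# Average storage per check, assuming one data point per check.
bytes_per_check = BYTES_PER_MONTH / (checks_per_day * DAYS_PER_MONTH)

print(f"{checks_per_day:,} checks per day")
print(f"{avg_interval_seconds:.0f} s average interval per monitor")
print(f"{bytes_per_check:.1f} bytes stored per check")
```

Under these assumptions, the throughput works out to several billion checks per day, which lines up with the "billions of data points per day" storage challenge, and each monitor is checked about every 12 seconds on average.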