> Beyond HTTP 200
Most monitoring solutions stop at the HTTP status code. But what if the connection takes 10 seconds to establish?
- The 3-Way Handshake
Understanding the SYN, SYN-ACK, ACK process is crucial for identifying network-level issues before they become critical.
Client          Server
  |      SYN      |
  |-------------->|
  |    SYN-ACK    |
  |<--------------|
  |      ACK      |
  |-------------->|
- Why It Matters
A slow TCP handshake can indicate:
- Network congestion
- Server overload
- DNS resolution issues (if lookup time is included in the connect measurement)
- Firewall misconfigurations
- Measuring Connection Time
We track multiple metrics:
- DNS Lookup Time - How long to resolve the hostname
- TCP Connect Time - Time to establish the connection
- TLS Handshake Time - SSL/TLS negotiation duration
- Time to First Byte (TTFB) - How long the server takes to start responding
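One way to see how these metrics relate: each one is the gap between two consecutive timestamps in the life of a request. The sketch below is illustrative only. The `RequestTimestamps` shape and the `toPhaseDurations` name are assumptions, and TTFB is counted from the moment the connection is ready, matching the "server response" reading above (some tools count it from the start of the request instead).

```typescript
// Hypothetical timestamps (in milliseconds) captured during one request.
interface RequestTimestamps {
  start: number;        // request initiated
  dnsResolved: number;  // hostname resolved to an address
  tcpConnected: number; // three-way handshake finished
  tlsReady: number;     // TLS negotiation finished (equals tcpConnected for plain HTTP)
  firstByte: number;    // first byte of the response received
}

interface PhaseDurations {
  dnsLookup: number;
  tcpConnect: number;
  tlsHandshake: number;
  ttfb: number;
}

// Each phase is the difference between two adjacent timestamps.
function toPhaseDurations(t: RequestTimestamps): PhaseDurations {
  return {
    dnsLookup: t.dnsResolved - t.start,
    tcpConnect: t.tcpConnected - t.dnsResolved,
    tlsHandshake: t.tlsReady - t.tcpConnected,
    ttfb: t.firstByte - t.tlsReady,
  };
}

// Made-up numbers, purely for illustration.
const phases = toPhaseDurations({
  start: 0,
  dnsResolved: 12,
  tcpConnected: 40,
  tlsReady: 95,
  firstByte: 180,
});
console.log(phases);
```

Keeping the phases as separate fields, rather than one total, is what lets a slow handshake stand out even when the overall request time looks acceptable.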
- Implementation
Here's how we measure these metrics in our monitoring system:
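import { promises as dns } from "node:dns";
import { connect, Socket } from "node:net";
import { performance } from "node:perf_hooks";

interface ConnectionMetrics {
  dnsLookup: number;
  tcpConnect: number;
  tlsHandshake: number;
  ttfb: number;
  total: number;
}

async function measureConnection(url: string): Promise<ConnectionMetrics> {
  const metrics: Partial<ConnectionMetrics> = {};
  const { hostname, port, protocol } = new URL(url);
  // Fall back to the scheme's default port when the URL has none
  const targetPort = Number(port) || (protocol === "https:" ? 443 : 80);

  // DNS lookup timing: resolve the hostname, not the full URL
  const dnsStart = performance.now();
  const { address } = await dns.lookup(hostname);
  metrics.dnsLookup = performance.now() - dnsStart;

  // TCP connection timing: the promise settles once the handshake completes
  const tcpStart = performance.now();
  const socket = await new Promise<Socket>((resolve, reject) => {
    const s = connect({ host: address, port: targetPort }, () => resolve(s));
    s.once("error", reject);
  });
  metrics.tcpConnect = performance.now() - tcpStart;

  // ... rest of the implementation (which should also close `socket`)
  return metrics as ConnectionMetrics;
}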
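One way to capture the remaining phases (TLS handshake and TTFB) in Node.js is to listen for socket lifecycle events on a single request. The sketch below illustrates that approach and is not the code elided above. `timeRequest` and `PhaseTimings` are hypothetical names, and the arrival of the response headers is used as an approximation of the first byte.

```typescript
import * as http from "node:http";
import * as https from "node:https";
import { performance } from "node:perf_hooks";

interface PhaseTimings {
  dnsLookup: number;
  tcpConnect: number;
  tlsHandshake: number;
  ttfb: number;
  total: number;
}

// Hypothetical helper: times one GET request using socket lifecycle events.
function timeRequest(url: string): Promise<PhaseTimings> {
  return new Promise((resolve, reject) => {
    const target = new URL(url);
    const start = performance.now();
    // Marks default to the start time, so a phase whose event never fires
    // (no DNS lookup for an IP literal, no TLS for plain HTTP) reports 0.
    let dnsDone = start;
    let tcpDone = start;
    let tlsDone = start;

    const onResponse = (res: http.IncomingMessage): void => {
      const firstByte = performance.now(); // response headers received
      res.resume(); // drain the body so "end" fires
      res.once("end", () => {
        resolve({
          dnsLookup: dnsDone - start,
          tcpConnect: tcpDone - dnsDone,
          tlsHandshake: tlsDone - tcpDone,
          ttfb: firstByte - tlsDone,
          total: performance.now() - start,
        });
      });
    };

    // agent: false forces a fresh socket, so the connection events fire.
    const req =
      target.protocol === "https:"
        ? https.get(target, { agent: false }, onResponse)
        : http.get(target, { agent: false }, onResponse);

    req.once("socket", (socket) => {
      socket.once("lookup", () => { dnsDone = performance.now(); });
      socket.once("connect", () => { tcpDone = tlsDone = performance.now(); });
      socket.once("secureConnect", () => { tlsDone = performance.now(); });
    });
    req.once("error", reject);
  });
}
```

A call such as `timeRequest("https://example.com/")` would resolve with one duration per phase. Disabling the agent matters here: a reused keep-alive socket has already completed its handshake, so the connect events would never fire and the timings would read as zero.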
- Real-World Example
We once caught a critical issue where HTTP requests returned 200 OK, but the TCP handshake was taking 8+ seconds due to a misconfigured load balancer.
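A check that surfaces this kind of incident has to treat a successful status code and a slow connection as separate signals. A minimal sketch, assuming a hypothetical `classifyCheck` helper and an illustrative 1000 ms connect budget (neither comes from our production configuration):

```typescript
type CheckState = "up" | "degraded" | "down";

interface CheckResult {
  statusCode: number;
  tcpConnectMs: number;
}

// Illustrative budget only; tune per service and region.
const TCP_CONNECT_BUDGET_MS = 1000;

function classifyCheck(
  result: CheckResult,
  budgetMs: number = TCP_CONNECT_BUDGET_MS,
): CheckState {
  const statusOk = result.statusCode >= 200 && result.statusCode < 400;
  if (!statusOk) return "down";
  // A 2xx/3xx response that needed a slow handshake is still a problem.
  return result.tcpConnectMs > budgetMs ? "degraded" : "up";
}

const slowButOk = classifyCheck({ statusCode: 200, tcpConnectMs: 8200 });
const healthy = classifyCheck({ statusCode: 200, tcpConnectMs: 35 });
console.log(slowButOk, healthy);
```

With a status-only check, both results above would have been reported as healthy.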
- Best Practices
- Monitor all connection phases separately
- Set appropriate thresholds for each metric
- Alert on trends, not just absolute values
- Consider geographic distribution in your measurements
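The first three practices can be sketched together: keep a separate limit per phase, and compare recent samples against an earlier baseline so that a gradual slowdown raises an alert before any absolute limit is crossed. Everything below is an assumption for illustration, including the limit values, the window size, the `evaluatePhase` name, and the choice of a median comparison as the trend test.

```typescript
type Phase = "dnsLookup" | "tcpConnect" | "tlsHandshake" | "ttfb";

interface Alert {
  phase: Phase;
  reason: "threshold" | "trend";
}

// Illustrative absolute limits in milliseconds; not recommendations.
const LIMITS_MS: Record<Phase, number> = {
  dnsLookup: 200,
  tcpConnect: 500,
  tlsHandshake: 800,
  ttfb: 1500,
};

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// samples: durations for one phase, oldest first. The last `recentCount`
// samples are compared against the median of everything before them.
function evaluatePhase(
  phase: Phase,
  samples: number[],
  recentCount = 5,
  trendFactor = 2,
): Alert[] {
  const alerts: Alert[] = [];
  if (samples.length === 0) return alerts;

  // Absolute check on the most recent sample.
  const latest = samples[samples.length - 1];
  if (latest > LIMITS_MS[phase]) alerts.push({ phase, reason: "threshold" });

  // Trend check: has the recent median grown by `trendFactor` or more?
  const baseline = samples.slice(0, -recentCount);
  const recent = samples.slice(-recentCount);
  if (
    baseline.length >= recentCount &&
    median(recent) > trendFactor * median(baseline)
  ) {
    alerts.push({ phase, reason: "trend" });
  }
  return alerts;
}

// Connect time has more than doubled but is still far below the 500 ms limit.
const trendAlerts = evaluatePhase(
  "tcpConnect",
  [20, 22, 19, 21, 20, 45, 50, 48, 52, 47],
);
console.log(trendAlerts);
```

For geographic distribution, the same evaluation could be run once per probe location, so that a regional slowdown is not averaged away by healthy regions.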
> Conclusion
True uptime monitoring goes beyond checking status codes. Understanding and monitoring the underlying network behavior is essential for maintaining reliable services.