5 things your uptime monitor won't catch (but your users will)

May 2026 · Sam Reid

Most uptime monitors work like this: send an HTTP request to your URL, check the status code, record the response time. If you get a 200, the monitor logs "up" and goes quiet. If you get a 500, you get a wake-up call at 3 AM.

That model catches a specific class of failure: the server being completely unreachable or crashing hard enough to return a 5xx. It misses a whole other class of failure that, in my experience, is actually more common once your infrastructure is reasonably stable.
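To make the limitation concrete, here is a minimal sketch of that model using only Python's standard library. The `ping` and `classify` names are mine, not any particular monitoring product's, and treating every status below 400 as "up" is an assumption; real monitors let you configure it.

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def classify(status: Optional[int]) -> str:
    """Map a status code (or None for 'no response') to 'up' or 'down'."""
    if status is None:
        return "down"
    # Assumption: anything below 400 counts as healthy.
    return "up" if status < 400 else "down"


def ping(url: str, timeout: float = 10.0) -> dict:
    """Fetch a URL once and report status code, latency, and verdict."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the status code is still available.
        status = err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout: no status at all.
        status = None
    elapsed_ms = round((time.monotonic() - started) * 1000)
    return {"status": status, "elapsed_ms": elapsed_ms, "verdict": classify(status)}
```

Nothing in this function ever looks at the response body, let alone runs the JavaScript in it. Every failure below slips through that gap.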

Here are five failures I've personally hit that returned HTTP 200 the whole time.

1. The blank screen (JavaScript crash)

A client-rendered React app (or a Next.js page that renders its content on the client) is a thin HTML shell. The actual content is rendered by JavaScript. When the server receives a request, it sends back a small HTML file with a script tag, and the app does the rest.

When that JavaScript crashes — a null reference, a bad import, a missing environment variable that was fine on staging but undefined in production — the page renders a blank white screen. Sometimes there's an error boundary that shows a "something went wrong" message. Sometimes there's nothing at all.

The server returned 200. The HTML file was valid. Your ping monitor has no idea anything went wrong. Your users are staring at a white screen.

I hit this after a deploy where a feature flag environment variable wasn't set in production. The JavaScript tried to read it, threw, and crashed the entire React tree. Uptime monitor: fine. Users: confused. Duration: 22 minutes until someone noticed in Slack.
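A rough way to see why an HTTP check is blind here: extract the visible text from the HTML the server actually sends. The sketch below uses Python's built-in `html.parser`; the `SHELL` markup is a made-up example of a client-rendered page, not taken from any real app.

```python
from html.parser import HTMLParser

# Tags whose text is never shown in the page body.
SKIPPED_TAGS = {"script", "style", "title"}


class VisibleText(HTMLParser):
    """Collects text nodes, skipping script, style, and title contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a reader would see if no JavaScript ever ran."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical shell: this is byte-for-byte what the server sends whether
# the bundle later renders the app or crashes on its first line.
SHELL = """<!doctype html>
<html><head><title>Shop</title><script src="/static/app.js"></script></head>
<body><div id="root"></div></body></html>"""

print(repr(visible_text(SHELL)))
```

The shell contains no visible text at all, and it is identical in the healthy and the crashed case. The difference only exists after a browser has executed the script, which is exactly the step a ping monitor skips.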

2. The stale CDN serving a cached error

CDNs cache aggressively. That's their whole job. But sometimes they cache things they shouldn't — a 500 error that slipped through with the wrong cache headers, a maintenance page, an old version of your app after a broken deploy that got cached before you rolled it back.

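Once that bad response is in the CDN cache, every user hitting that edge node gets it. Your origin server is fine and returning 200 for new requests. But the CDN is serving 40,000 users a cached bad page, and if that page was generated with a 200 status (an error or maintenance page rendered with the wrong status code and cacheable headers), the edge serves it with a 200 too.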

If your uptime monitor hits an edge node that doesn't hold the bad copy (or hits your origin directly), it sees the good response. It never knows the CDN is on fire.
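One targeted check for this case is to fetch the same path twice, once through the CDN and once straight from the origin, and compare the two. The sketch below covers only the comparison step. The `Snapshot` type and `edge_discrepancies` function are illustrative names, and how you bypass the CDN to reach the origin depends entirely on your setup.

```python
import hashlib
from typing import List, NamedTuple


class Snapshot(NamedTuple):
    """What one fetch returned: status code and raw body bytes."""
    status: int
    body: bytes


def edge_discrepancies(edge: Snapshot, origin: Snapshot) -> List[str]:
    """List the ways the CDN's answer differs from the origin's."""
    problems = []
    if edge.status != origin.status:
        problems.append(
            f"status differs: edge={edge.status} origin={origin.status}"
        )
    edge_hash = hashlib.sha256(edge.body).hexdigest()
    origin_hash = hashlib.sha256(origin.body).hexdigest()
    if edge_hash != origin_hash:
        # Short prefixes are enough to tell the two versions apart in a log.
        problems.append(
            f"body differs: edge={edge_hash[:12]} origin={origin_hash[:12]}"
        )
    return problems
```

A byte-for-byte comparison like this only works for responses that are meant to be identical for every visitor. Pages with per-request content (timestamps, CSRF tokens, personalization) would need normalizing first, and you would have to run the check from more than one region to cover more than one edge node.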

3. The auth loop

Your login page redirects to the dashboard after auth. Your dashboard redirects to login if you're not authenticated. Somewhere in a deploy, a session cookie configuration changed, or a JWT validation started failing, or your OAuth callback URL got misconfigured.

Now every user who tries to log in gets bounced: login → dashboard → login → dashboard. Infinite loop. They can never get in.

Your ping monitor is checking the homepage, which doesn't require auth. It's fine. The part of your app that's broken, the login flow, isn't returning errors either: every hop in the loop is a well-formed 302 redirect, never a 5xx. The monitor sees a 200 on the homepage and reports everything as up.
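Detecting the loop means following the redirects yourself and remembering where you have been. Here is a small sketch with the HTTP call injected as a function, so the loop logic can be exercised without a network. The `follow` name, the `Fetcher` type, and the `app.example` URLs are all assumptions for illustration.

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# A fetcher takes a URL and returns (status_code, Location header or None).
# In a real check it must not auto-follow redirects, and it must carry the
# session cookies from the login step.
Fetcher = Callable[[str], Tuple[int, Optional[str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow(start_url: str, fetch: Fetcher, max_hops: int = 10) -> str:
    """Follow redirects by hand; return 'ok', 'loop', or 'too_many_hops'."""
    seen = set()
    url = start_url
    for _ in range(max_hops):
        if url in seen:
            return "loop"
        seen.add(url)
        status, location = fetch(url)
        if status not in REDIRECT_CODES or location is None:
            # The chain ended on a non-redirect response.
            return "ok"
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return "too_many_hops"


# Fake fetcher reproducing the loop described above.
BROKEN_ROUTES = {
    "https://app.example/login": (302, "/dashboard"),
    "https://app.example/dashboard": (302, "/login"),
}

print(follow("https://app.example/login", BROKEN_ROUTES.__getitem__))
```

Note that "ok" here only means the chain terminated. It says nothing about whether the page it landed on is the dashboard or an error message.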

4. The partial page load

Your page loads. The navbar appears. The header loads. Then nothing. The main content — products, posts, whatever the page is supposed to show — never comes. It might be a failed API call, a database query that times out, a third-party widget that silently errors. The page shell is there. The content isn't.

HTTP 200. HTML served. Monitor: happy. Users: staring at a spinner that never resolves.
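If the missing content comes from an API call, one narrow check is to hit that endpoint directly and confirm it returns usable data, not just a status code. This sketch assumes a JSON response shaped like `{"items": [...]}`; the `payload_problem` name and the `items` field are hypothetical, so adjust them to whatever your API actually returns.

```python
import json
from typing import Optional


def payload_problem(
    raw: str, list_key: str = "items", min_items: int = 1
) -> Optional[str]:
    """Describe what is wrong with the payload, or return None if it looks usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as err:
        return f"not valid JSON: {err.msg}"
    if not isinstance(data, dict):
        return "expected a JSON object at the top level"
    items = data.get(list_key)
    if not isinstance(items, list):
        return f"missing list field '{list_key}'"
    if len(items) < min_items:
        return f"only {len(items)} item(s), expected at least {min_items}"
    return None
```

This catches the empty or malformed API response, but not the third-party widget or a rendering bug on the client, so it narrows the gap without closing it.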

5. The broken layout after a CSS deploy

A CSS file was updated. The wrong selector got changed, or Tailwind's purge step removed a class whose name was being built dynamically, or a font CDN is down. The page loads, all the content is there, but it looks like it was designed by someone who hates users. Text is unstyled. Buttons are invisible. Images are missing.

HTTP 200. All content present. Technically functional. Visually a disaster.

What actually catches these

The one check that covers all five of these failure modes is to actually look at the page. Load it in a real browser, run the JavaScript, wait for the content, take a screenshot, and compare it against what the page looked like when everything was working.

That's what GrabDiff does. It runs a headless Chrome instance, loads your URL on a schedule, takes a full-page screenshot, and diffs it pixel-by-pixel against your baseline. If something looks different — blank screen, missing content, broken layout — you get an alert with the diff image attached so you can see exactly what changed.
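To give a feel for the diffing step, here is a toy pixel comparison in plain Python. It is not GrabDiff's implementation, just a sketch of the general idea: images are assumed to be already decoded into rows of (R, G, B) tuples, and the `tolerance` default is an arbitrary value I picked to absorb minor anti-aliasing noise.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (R, G, B) pixels


def changed_fraction(baseline: Image, current: Image, tolerance: int = 8) -> float:
    """Fraction of pixels whose largest per-channel difference exceeds the tolerance."""
    same_shape = len(baseline) == len(current) and all(
        len(a) == len(b) for a, b in zip(baseline, current)
    )
    if not same_shape:
        # A size change is treated as a full mismatch rather than an error.
        return 1.0
    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0
    changed = 0
    for row_a, row_b in zip(baseline, current):
        for (r1, g1, b1), (r2, g2, b2) in zip(row_a, row_b):
            if max(abs(r1 - r2), abs(g1 - g2), abs(b1 - b2)) > tolerance:
                changed += 1
    return changed / total


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
baseline = [[WHITE, WHITE], [WHITE, WHITE]]
current = [[WHITE, WHITE], [WHITE, BLACK]]
print(changed_fraction(baseline, current))
```

In practice you would alert when the fraction crosses some threshold and mask out regions that legitimately change on every load, such as timestamps or rotating banners.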

It doesn't replace a ping monitor. You still want HTTP checks for catching hard failures fast. But for the silent failures that HTTP 200 hides, you need something that looks at the actual page.