GrabDiff

How to know when your cron job fails silently

May 2026

Cron jobs are one of the most under-monitored parts of most production systems. They run quietly in the background, and unless you go looking, you often have no idea whether they ran, whether they succeeded, or whether they've been silently failing for the past three weeks.

I learned this the hard way when I discovered that our nightly invoice generation job had been failing for 19 days. No alert. No email. Nothing. It just... stopped working. A dependency update changed a function signature, the job threw an exception, and it exited with a non-zero status code that cron dutifully ignored.


The problem with cron's error handling

By default, cron mails a job's output to the crontab owner's local mailbox. Not email. Local mail. The kind nobody reads. On servers with no mail transfer agent installed, that output is typically just discarded. If you haven't explicitly configured cron to send failure output somewhere useful, and most people haven't, failed jobs are invisible.

You can add MAILTO=you@example.com to your crontab, which helps, but you're now dependent on your server's mail sending working correctly, which introduces its own failure modes.

More importantly, cron doesn't know what "success" means for your job. It can tell whether the process exited with 0 or non-zero. It can't tell whether a process that exited 0 actually did what you wanted.
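One way to narrow that gap is to have the job check its own result and exit non-zero when the output looks wrong. The sketch below is only an illustration, not the invoice job from this post: `check_invoice_run`, `exit_code_for`, and their parameters are hypothetical names, and the checks you need will depend on what your job produces.

```python
import sys


def check_invoice_run(expected_customers: int, invoices_written: int) -> list[str]:
    """Return a list of problems; an empty list means the run looks healthy.

    Hypothetical check: both arguments are assumptions made for illustration.
    """
    problems = []
    if invoices_written == 0 and expected_customers > 0:
        problems.append("no invoices were written")
    elif invoices_written < expected_customers:
        problems.append(
            f"only {invoices_written} of {expected_customers} invoices were written"
        )
    return problems


def exit_code_for(problems: list[str]) -> int:
    """Log each problem to stderr and map the result onto a process exit status."""
    for problem in problems:
        print(f"Sanity check failed: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

A job could finish with `sys.exit(exit_code_for(check_invoice_run(...)))`, so a run that crashed nowhere but produced nothing still exits non-zero instead of 0.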

What heartbeat monitoring is

A heartbeat monitor inverts the problem. Instead of monitoring whether your cron job fails, you monitor whether it stops checking in.

The flow is simple:

  1. You set up a heartbeat monitor with an expected interval — say, every 24 hours
  2. As the last step of your cron job, after the work has completed successfully, the job sends an HTTP GET to a unique URL
  3. If the monitor doesn't receive a ping within the expected window, it fires an alert

The key insight: you're monitoring for the absence of a signal, not the presence of an error. If your job doesn't run (cron died, the server rebooted, the job was accidentally removed), you get alerted. If your job crashes before reaching the ping line, you get alerted. If the ping URL just gets removed from your code, you get alerted.
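On the monitor's side, "absence of a signal" comes down to a time comparison. The following is a minimal sketch of that check, not a description of how any particular service implements it. The function name `is_overdue` and the grace period are assumptions added for the example; a grace period is a common way to avoid alerting on a job that merely ran a little late.

```python
from datetime import datetime, timedelta


def is_overdue(
    last_ping: datetime,
    now: datetime,
    interval: timedelta,
    grace: timedelta = timedelta(minutes=30),  # assumed default, for illustration
) -> bool:
    """Return True when no ping has arrived within interval + grace of the last one."""
    return now - last_ping > interval + grace


# Example: a nightly job that last pinged at 02:00 on 1 May.
last = datetime(2026, 5, 1, 2, 0)
daily = timedelta(hours=24)

on_time = is_overdue(last, datetime(2026, 5, 2, 2, 10), daily)  # inside the grace period
missed = is_overdue(last, datetime(2026, 5, 2, 3, 0), daily)    # an hour past the window
```

Nothing in this check depends on why the ping is missing, which is what lets one mechanism cover a dead cron daemon, a crashed job, and a deleted crontab entry.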

What this looks like in practice

Here's a simple bash cron job with a heartbeat ping added:

#!/bin/bash
set -e

# Do your actual work
python3 /opt/scripts/generate-invoices.py

# Only reached if the above succeeded (set -e exits on any error)
curl -fsS --retry 3 "https://grabdiff.com/ping/your-monitor-slug" > /dev/null

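The set -e at the top means the script exits as soon as a command fails. There are a few exceptions worth knowing: a command tested in an if condition doesn't trigger it, and neither does a failure in an earlier stage of a pipeline unless you also set -o pipefail. The ping at the bottom only runs if everything above it succeeded. If the script fails partway through, the ping doesn't fire, and you get alerted after the expected interval passes.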

For a Python job:

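import sys

import requests

PING_URL = "https://grabdiff.com/ping/your-monitor-slug"

def run():
    # Your job logic here (placeholders for your own functions)
    generate_invoices()
    send_reports()

if __name__ == "__main__":
    try:
        run()
    except Exception as e:
        print(f"Job failed: {e}", file=sys.stderr)
        sys.exit(1)

    # Ping on success only. The ping sits outside the try block above so that
    # a network error here is not reported as a failure of the job itself.
    try:
        requests.get(PING_URL, timeout=10).raise_for_status()
    except requests.RequestException as e:
        print(f"Heartbeat ping failed: {e}", file=sys.stderr)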

The variants: start and fail pings

Some heartbeat systems support three ping types:

  • /ping/slug — job completed successfully
  • /ping/slug/start — job started (lets the monitor track duration)
  • /ping/slug/fail — job explicitly failed (lets you report failure immediately rather than waiting for timeout)

Adding a start ping lets you detect jobs that are running but never finishing — hanging processes, deadlocks, queries that are taking unexpectedly long.

#!/bin/bash
# Signal start
curl -fsS "https://grabdiff.com/ping/your-slug/start" > /dev/null

# Run job
if python3 /opt/scripts/generate-invoices.py; then
    curl -fsS "https://grabdiff.com/ping/your-slug" > /dev/null
else
    curl -fsS "https://grabdiff.com/ping/your-slug/fail" > /dev/null
    exit 1
fi
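
The same start, success, and fail pattern can be sketched in Python. This is an illustrative wrapper under a few assumptions: `run_with_heartbeat` and `http_ping` are hypothetical helper names, the URL is the same placeholder used in the bash script above, and the standard library's urllib is used so the example has no third-party dependencies.

```python
import sys
import urllib.request
from typing import Callable

# Placeholder URL, following the pattern used in the bash example.
BASE_URL = "https://grabdiff.com/ping/your-slug"


def http_ping(url: str) -> None:
    """Send a GET request; log network errors instead of crashing the job."""
    try:
        with urllib.request.urlopen(url, timeout=10):
            pass
    except OSError as exc:  # URLError, HTTPError, and socket timeouts are OSErrors
        print(f"Heartbeat ping to {url} failed: {exc}", file=sys.stderr)


def run_with_heartbeat(
    job: Callable[[], None],
    ping: Callable[[str], None] = http_ping,
) -> bool:
    """Run job with start, success, or fail pings around it. Returns True on success."""
    ping(f"{BASE_URL}/start")
    try:
        job()
    except Exception as exc:
        print(f"Job failed: {exc}", file=sys.stderr)
        ping(f"{BASE_URL}/fail")
        return False
    ping(BASE_URL)
    return True
```

A script would end with something like `sys.exit(0 if run_with_heartbeat(run) else 1)`, where `run` is your job function. Passing the ping function as a parameter is a design choice that makes the wrapper easy to test without touching the network.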

GrabDiff supports all three ping types. Set up a heartbeat monitor, add the ping to your cron job, and you'll be alerted when a scheduled task misses its expected window, or right away if it sends a fail ping.