Docker Health Checks

1. Introduction

Question: What are Docker health checks, and how do you implement them?

What we're trying to achieve: Learn to implement container health monitoring to ensure applications are running correctly and enable automatic recovery in orchestration systems.

Goal/Aim: By the end of this tutorial, you'll master implementing health checks in Dockerfiles, understand health states, and use health checks with orchestration tools.

2. How to Solve (Explained Simply)

Think of health checks like a heartbeat monitor in a hospital:

Without Health Checks:

Patient appears to be lying in bed (container is running)
But you don't know if they're actually alive and well
Could be unconscious, in pain, or in critical condition
No one knows until someone manually checks

With Health Checks:

Automatic heartbeat monitor (periodic health check)
Alerts when something is wrong
Doctors notified immediately (orchestrator restarts container)
Patient gets help before it's too late

Why Health Checks Matter:

Auto-Recovery: Orchestrators can restart unhealthy containers
Load Balancing: Don't send traffic to unhealthy containers
Monitoring: Know when things go wrong
Zero-Downtime: Replace unhealthy containers automatically
Reliability: Catch issues before they affect users

3. Visual Representation

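💓 Health Check States

🚀 STARTING: Start period. Checks already run, but failures are not counted toward the retry limit.

✅ HEALTHY: Health checks passing. Ready for traffic.

❌ UNHEALTHY: The retry limit of consecutive failures was reached. Needs a restart.

🔄 RESTART: Not a Docker status. An orchestrator (for example Docker Swarm) replaces the unhealthy container; standalone Docker only reports the status.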

4. Requirements / What Needs to Be Gathered

Prerequisites:

Basic Docker knowledge
Understanding of your application's health endpoints
Familiarity with Dockerfile syntax
Basic HTTP/networking concepts

Conceptual Requirements:

What makes an application "healthy"?
How to expose health endpoints
Understanding of exit codes
Container orchestration basics

Tools Needed:

Docker
Health checking tool (curl, wget, or custom script)
Application with health endpoint

5. Key Topics to Consider & Plan of Action

Health Check Parameters:

--interval: Time between checks (default: 30s)
--timeout: Max time for check to run (default: 30s)
--start-period: Startup grace period; checks still run, but failures during it don't count toward retries (default: 0s)
--retries: Consecutive failures to mark unhealthy (default: 3)
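
To see how these four parameters interact, here is a minimal Python sketch of the status logic. It is a simplified model, not Docker's implementation: the function name `simulate_health` is made up for illustration, and it assumes each check finishes instantly and that the first check runs one interval after the container starts.

```python
def simulate_health(results, interval=30, start_period=0, retries=3):
    """Replay a list of check results (True = passed) and return a
    timeline of (seconds_since_start, status) tuples."""
    status = "starting"
    failing_streak = 0
    timeline = []
    for n, passed in enumerate(results, start=1):
        elapsed = n * interval  # assumed time at which check number n runs
        if passed:
            # Any passing check resets the streak and marks the container healthy.
            failing_streak = 0
            status = "healthy"
        else:
            # Failures are ignored only while the container is still
            # "starting" and the start period has not elapsed.
            in_grace = status == "starting" and elapsed < start_period
            if not in_grace:
                failing_streak += 1
                if failing_streak >= retries:
                    status = "unhealthy"
        timeline.append((elapsed, status))
    return timeline


if __name__ == "__main__":
    checks = [False, False, True, False, False, False]
    for elapsed, status in simulate_health(checks, interval=10, start_period=25):
        print(f"{elapsed:>3}s  {status}")
```

In this run the two early failures fall inside the start period and are ignored, the third check marks the container healthy, and only three consecutive failures after that flip it to unhealthy.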

Health Check Methods:

HTTP Endpoint: Most common (curl, wget)
TCP Socket: Check if port is open
Script: Custom health validation
Command: Execute app-specific command

Understanding Plan:

text

Step 1: Define what "healthy" means for your app
↓
Step 2: Implement health check in Dockerfile
↓
Step 3: Test health check behavior
↓
Step 4: Integrate with orchestration
↓
Step 5: Monitor and adjust thresholds

6. Code Implementation

Basic HTTP Health Check

dockerfile

# Simple web application
FROM nginx:alpine

COPY index.html /usr/share/nginx/html/

# Health check using curl
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost/ || exit 1

EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

bash

# Build and run
docker build -t healthy-nginx .
docker run -d --name test-health healthy-nginx

# Check health status
docker ps
# Look for "health: starting" or "healthy"

# Detailed health info
docker inspect test-health --format='{{.State.Health.Status}}'

# View health check logs
docker inspect test-health --format='{{json .State.Health}}' | jq

Node.js Application with Health Endpoint

dockerfile

# Dockerfile
FROM node:18-alpine

WORKDIR /app

COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

# Expose port
EXPOSE 3000

# Health check that calls our /health endpoint
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD node healthcheck.js

CMD ["node", "server.js"]

javascript

// server.js
const express = require("express");
const app = express();

let isHealthy = true;
let dbConnected = false;

// Simulate DB connection
setTimeout(() => {
  dbConnected = true;
  console.log("Database connected");
}, 5000);

// Health endpoint
app.get("/health", (req, res) => {
  if (!isHealthy || !dbConnected) {
    return res.status(503).json({
      status: "unhealthy",
      database: dbConnected ? "connected" : "disconnected",
    });
  }

  res.json({
    status: "healthy",
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  });
});

// Readiness endpoint (for Kubernetes)
app.get("/ready", (req, res) => {
  if (!dbConnected) {
    return res.status(503).json({ status: "not ready" });
  }
  res.json({ status: "ready" });
});

app.get("/", (req, res) => {
  res.send("Hello World!");
});

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
  console.log(`Server running on port ${PORT}`);
});

javascript

// healthcheck.js - Custom health check script
const http = require("http");

const options = {
  host: "localhost",
  port: 3000,
  path: "/health",
  timeout: 2000,
};

const healthCheck = http.request(options, (res) => {
  console.log(`Health check status: ${res.statusCode}`);

  if (res.statusCode === 200) {
    process.exit(0); // Healthy
  } else {
    process.exit(1); // Unhealthy
  }
});

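// The `timeout` option only emits a "timeout" event; the request must be
// destroyed explicitly, which then triggers the "error" handler below.
healthCheck.on("timeout", () => {
  healthCheck.destroy(new Error("timed out after 2000 ms"));
});

healthCheck.on("error", (err) => {
  console.error("Health check failed:", err.message);
  process.exit(1); // Unhealthy
});

healthCheck.end();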

Python Flask Application

dockerfile

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 5000

# Health check using wget (smaller than curl)
RUN apt-get update && apt-get install -y wget && rm -rf /var/lib/apt/lists/*

HEALTHCHECK --interval=30s --timeout=3s --start-period=30s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:5000/health || exit 1

CMD ["python", "app.py"]

python

# app.py
from flask import Flask, jsonify
import psutil
import time

app = Flask(__name__)
start_time = time.time()

@app.route('/health')
def health():
    """Health check endpoint"""
    try:
        # Check if we have enough memory (psutil reads /proc, so inside a
        # container this usually reflects the host, not the cgroup limit)
        memory = psutil.virtual_memory()
        if memory.percent > 90:
            return jsonify({
                'status': 'unhealthy',
                'reason': 'high memory usage',
                'memory_percent': memory.percent
            }), 503

        # Check if we have enough disk space
        disk = psutil.disk_usage('/')
        if disk.percent > 90:
            return jsonify({
                'status': 'unhealthy',
                'reason': 'low disk space',
                'disk_percent': disk.percent
            }), 503

        # All checks passed
        return jsonify({
            'status': 'healthy',
            'uptime_seconds': time.time() - start_time,
            'memory_percent': memory.percent,
            'disk_percent': disk.percent
        }), 200

    except Exception as e:
        return jsonify({
            'status': 'unhealthy',
            'error': str(e)
        }), 503

@app.route('/ready')
def ready():
    """Readiness check endpoint"""
    # Check if dependencies are ready
    # e.g., database connection, external APIs, etc.
    return jsonify({'status': 'ready'}), 200

@app.route('/')
def index():
    return 'Hello World!'

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
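
Since the Flask image already ships Python, the wget install could be avoided with a standard-library probe. The following is a sketch under that assumption; the function name `probe` and the idea of saving it as a separate script are illustrative, not part of the original setup.

```python
import http.client
import sys
import urllib.request


def probe(url: str, timeout: float = 3.0) -> int:
    """Return 0 if the URL answers with a 2xx status, otherwise 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (OSError, ValueError, http.client.HTTPException):
        # Covers refused connections, timeouts, malformed URLs, and HTTP
        # errors such as 503 (urllib raises HTTPError, an OSError subclass).
        return 1


# Usage: python <script> http://localhost:5000/health
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(probe(sys.argv[1]))
```

Saved under a hypothetical name such as healthcheck.py, it could replace the wget command in the HEALTHCHECK line, mirroring what healthcheck.js does for the Node.js example.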

Database Container Health Check

dockerfile

# PostgreSQL with health check
FROM postgres:15

# Health check using pg_isready
HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=5 \
  CMD pg_isready -U postgres || exit 1

# Or more thorough check with actual query
# HEALTHCHECK --interval=10s --timeout=5s --start-period=30s --retries=5 \
#   CMD psql -U postgres -c "SELECT 1" || exit 1

Redis Container Health Check

dockerfile

# Redis with health check
FROM redis:7-alpine

# Health check using redis-cli
HEALTHCHECK --interval=5s --timeout=3s --start-period=10s --retries=3 \
  CMD redis-cli ping || exit 1

Docker Compose with Health Checks

yaml

# docker-compose.yml
version: "3.8"

services:
  web:
    build: ./web
    ports:
      - "8080:80"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    depends_on:
      api:
        condition: service_healthy

  api:
    build: ./api
    ports:
      - "3000:3000"
    healthcheck:
      test: ["CMD", "node", "healthcheck.js"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    depends_on:
      database:
        condition: service_healthy
      redis:
        condition: service_healthy

  database:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    volumes:
      - db_data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 3

volumes:
  db_data:
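
The depends_on conditions above form a dependency graph, and Compose starts a service only once everything it depends on reports healthy. The Python sketch below illustrates only the resulting ordering; it is not how Compose is implemented, and `startup_order` is a made-up helper name.

```python
from graphlib import TopologicalSorter

# Service -> services it waits for (condition: service_healthy),
# transcribed from the docker-compose.yml above.
DEPENDS_ON = {
    "web": {"api"},
    "api": {"database", "redis"},
    "database": set(),
    "redis": set(),
}


def startup_order(depends_on):
    """Return one valid start order: every service after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(startup_order(DEPENDS_ON)))
```

database and redis come first (in either order), then api, then web. If any health check in the chain never passes, the services that depend on it are never started.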

Advanced Health Check with Shell Script

dockerfile

# Dockerfile
FROM node:18-alpine

WORKDIR /app

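# Install dependencies: curl for the HTTP probe, plus bc and procps, which
# healthcheck.sh needs (BusyBox ps does not support -p or %mem)
RUN apk add --no-cache curl jq bc procps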

COPY package*.json ./
RUN npm ci --omit=dev

COPY . .

# Copy health check script
COPY healthcheck.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/healthcheck.sh

EXPOSE 3000

# Use custom health check script
HEALTHCHECK --interval=30s --timeout=10s --start-period=40s --retries=3 \
  CMD /usr/local/bin/healthcheck.sh

CMD ["node", "server.js"]

bash

#!/bin/sh
# healthcheck.sh - Comprehensive health check

# Check if process is running
if ! pgrep -f "node server.js" > /dev/null; then
    echo "Process not running"
    exit 1
fi

# Check HTTP endpoint and response time with a single request
# (--max-time keeps curl inside the 10s HEALTHCHECK timeout)
RESULT=$(curl -s -o /dev/null --max-time 8 -w "%{http_code} %{time_total}" http://localhost:3000/health)
HTTP_CODE=${RESULT%% *}
RESPONSE_TIME=${RESULT##* }
if [ "$HTTP_CODE" != "200" ]; then
    echo "HTTP health check failed: $HTTP_CODE"
    exit 1
fi

if [ "$(echo "$RESPONSE_TIME > 5" | bc)" -eq 1 ]; then
    echo "Response time too slow: ${RESPONSE_TIME}s"
    exit 1
fi

# Check memory usage
MEMORY_PERCENT=$(ps -o %mem -p $(pgrep -f "node server.js") | tail -1 | tr -d ' ')
if [ "$(echo "$MEMORY_PERCENT > 80" | bc)" -eq 1 ]; then
    echo "Memory usage too high: ${MEMORY_PERCENT}%"
    exit 1
fi

echo "All health checks passed"
exit 0

TCP Socket Health Check

dockerfile

# Check if port is accepting connections
FROM myapp:latest

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD nc -z localhost 8080 || exit 1
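
If the image has no nc but does include Python (an assumption about myapp:latest), the same TCP check can be sketched with the standard library. `port_is_open` is an illustrative name.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat all as "not accepting".
        return False
```

A health check script would turn the result into an exit code, for example `sys.exit(0 if port_is_open("localhost", 8080) else 1)`. Like nc -z, this only proves the port accepts connections, not that the application behind it works.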

7. Things to Consider

Best Practices:

Implement Proper Health Endpoints

javascript

// ✅ Good - comprehensive check
app.get("/health", async (req, res) => {
  try {
    await db.ping(); // Check database
    await cache.ping(); // Check Redis
    // Check other dependencies
    res.json({ status: "healthy" });
  } catch (error) {
    res.status(503).json({ status: "unhealthy", error: error.message });
  }
});

Set Appropriate Timeouts

dockerfile

# ✅ Good timing
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 CMD ...

# ❌ Too aggressive
HEALTHCHECK --interval=5s --timeout=1s --start-period=0s --retries=1 CMD ...
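
A rough way to compare the two settings is to estimate how long a failure can go unnoticed. The Python sketch below is a back-of-the-envelope bound, not an exact figure: it assumes the app breaks right after a passing check and that every failing check runs for the full timeout. The function name is made up.

```python
def worst_case_detection_seconds(interval: float, timeout: float, retries: int) -> float:
    """Approximate upper bound on the time from breakage to 'unhealthy'.

    Each of the `retries` failing checks starts after waiting one interval
    and may take up to `timeout` seconds to fail.
    """
    return retries * (interval + timeout)


if __name__ == "__main__":
    print("good timing:   ", worst_case_detection_seconds(30, 10, 3), "s")
    print("too aggressive:", worst_case_detection_seconds(5, 1, 1), "s")
```

The first setting takes up to about two minutes to flag a broken container but tolerates brief hiccups; the second reacts within seconds, at the cost of marking the container unhealthy after a single slow response.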

Use Start Period for Slow Startups

dockerfile

# ✅ Good - adequate startup time
HEALTHCHECK --start-period=60s CMD ...

# For databases
HEALTHCHECK --start-period=120s CMD ...

Return Proper Exit Codes

bash

# ✅ Healthy
exit 0

# ❌ Unhealthy (exit code 2 is reserved by Docker; don't use it)
exit 1
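
To make the exit-code contract concrete, here is a small Python sketch that runs a check command and classifies one result. It illustrates the rule only and is not Docker's code: `run_check` is a hypothetical helper, and it treats a timeout or a missing command as a failed check.

```python
import subprocess


def run_check(command, timeout=30.0):
    """Run one health check command and return "pass" or "fail"."""
    try:
        completed = subprocess.run(command, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "fail"  # running longer than the timeout counts as a failure
    except OSError:
        return "fail"  # command missing or not executable
    # Only exit code 0 means the check passed.
    return "pass" if completed.returncode == 0 else "fail"
```

A single "fail" does not make the container unhealthy on its own; that takes --retries consecutive failures.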

Common Pitfalls:

❌ No start period (marks healthy apps as unhealthy during startup)
✅ Set an adequate --start-period for initialization

❌ Too-frequent checks (unnecessary overhead)
✅ Use reasonable intervals (30s is a good default)

❌ Timeout too short (spurious failures)
✅ Allow enough time for the health check to complete

❌ Checking only process existence (doesn't verify functionality)
✅ Check actual application functionality

Health vs Readiness:

Type	Purpose	When to Fail	Example
Health	Is app functioning?	App crash, deadlock	Process check, /health
Readiness	Can app serve traffic?	DB down, warming up	/ready endpoint

javascript

// Health - application itself
app.get("/health", (req, res) => {
  if (processIsHealthy()) {
    res.status(200).send("OK");
  } else {
    res.status(503).send("Unhealthy");
  }
});

// Readiness - dependencies ready
app.get("/ready", async (req, res) => {
  if (await allDependenciesReady()) {
    res.status(200).send("Ready");
  } else {
    res.status(503).send("Not Ready");
  }
});

8. Additional Helpful Sections

Monitoring Health Status

bash

# View health status
docker inspect myapp --format='{{.State.Health.Status}}'

# View full health history
docker inspect myapp --format='{{json .State.Health}}' | jq

# Watch health status in real-time
watch -n 1 'docker inspect myapp --format="{{.State.Health.Status}}"'

# Filter containers by health
docker ps --filter health=healthy
docker ps --filter health=unhealthy
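
The JSON printed by the second command can also be condensed programmatically. This Python sketch assumes the input is the `.State.Health` object with its Status, FailingStreak, and Log fields (each log entry carrying ExitCode and Output); `summarize_health` is an illustrative name.

```python
import json


def summarize_health(health_json: str) -> dict:
    """Condense the JSON of .State.Health into a few key fields."""
    health = json.loads(health_json)
    log = health.get("Log") or []
    last = log[-1] if log else None  # entries are appended, so the newest is last
    return {
        "status": health.get("Status"),
        "failing_streak": health.get("FailingStreak", 0),
        "last_exit_code": last["ExitCode"] if last else None,
        "last_output": last["Output"].strip() if last else "",
    }
```

Feeding it the output of `docker inspect myapp --format='{{json .State.Health}}'` gives a one-line view of the current status and the most recent check result.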

Health Check Events

bash

# Monitor health check events
docker events --filter type=container --filter event=health_status

# Sample output:
# 2024-01-15T10:30:00 container health_status: healthy (container_id)
# 2024-01-15T10:30:30 container health_status: unhealthy (container_id)

Integration with Orchestration

Docker Swarm:

yaml

version: "3.8"

services:
  web:
    image: myapp:latest
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
# Swarm won't route traffic to unhealthy replicas

Kubernetes (ignores the Dockerfile HEALTHCHECK; the equivalent is expressed as probes):

yaml

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: myapp:latest
      livenessProbe:
        httpGet:
          path: /health
          port: 8080
        initialDelaySeconds: 40
        periodSeconds: 30
        timeoutSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
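
The liveness probe above follows a field-by-field mapping from the HEALTHCHECK parameters, sketched here in Python. The mapping is approximate: initialDelaySeconds postpones the first probe, whereas --start-period only ignores early failures. `to_liveness_probe` is a hypothetical helper, not part of any tool.

```python
def to_liveness_probe(path, port, interval, timeout, start_period, retries):
    """Map HEALTHCHECK-style settings (in seconds) onto livenessProbe fields."""
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": start_period,  # roughly --start-period
        "periodSeconds": interval,            # --interval
        "timeoutSeconds": timeout,            # --timeout
        "failureThreshold": retries,          # --retries
    }


if __name__ == "__main__":
    print(to_liveness_probe("/health", 8080, interval=30, timeout=10,
                            start_period=40, retries=3))
```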

Debugging Failed Health Checks

bash

# Run health check manually
docker exec myapp curl -f http://localhost/health

# View health check logs
docker inspect myapp --format='{{range .State.Health.Log}}{{.Output}}{{end}}'

# Check last health check result (Log is ordered oldest first)
docker inspect myapp --format='{{json .State.Health.Log}}' | jq '.[-1].ExitCode'

# Test health check outside container
curl -v http://localhost:8080/health

Summary

Docker health checks monitor container health by periodically executing a command and marking the container as healthy or unhealthy based on its exit code. Implement health checks using HEALTHCHECK in Dockerfiles with four key parameters: --interval (check frequency), --timeout (max execution time), --start-period (startup grace period), and --retries (consecutive failures before unhealthy). Use exit code 0 for healthy and 1 for unhealthy states. Create dedicated /health endpoints that check critical dependencies like databases and external APIs. Health checks enable orchestrators to automatically restart failing containers, prevent traffic routing to unhealthy instances, and ensure high availability. Always set adequate start periods for slow-starting applications, use reasonable check intervals (30s is the default), and validate actual functionality rather than just process existence. Health checks are essential for production deployments and zero-downtime updates.