Docker Swarm Basics

1. Introduction

[!NOTE] Question What is Docker Swarm and how do you use it for container orchestration?

What we're trying to achieve: Learn to use Docker Swarm for orchestrating containers across multiple hosts, enabling high availability, scaling, and load balancing.

Goal/Aim: By the end of this tutorial, you'll understand Docker Swarm architecture, know how to create and manage a swarm cluster, deploy services, and implement rolling updates.

2. How to Solve (Explained Simply)

Think of Docker Swarm like managing a fleet of delivery trucks:

Without Swarm (Single Host):

One warehouse (server) with one truck (container)
If the truck breaks down, deliveries stop
Can't handle increased demand
One person managing everything manually

With Docker Swarm (Orchestration):

Multiple warehouses (nodes) with many trucks (containers)
If one truck breaks, others take over automatically
Add more trucks when demand increases
Central dispatch system manages everything
Automatic routing and load balancing

Why Use Docker Swarm?

High Availability: If a node fails, containers restart on healthy nodes
Load Balancing: Distributes traffic across containers automatically
Scaling: Add or remove containers with one command
Rolling Updates: Update services with zero downtime
Service Discovery: Containers find each other automatically
Simple Setup: Easier than Kubernetes for small to medium deployments

3. Visual Representation

Docker Swarm Architecture

Swarm Cluster Architecture

Manager Nodes (Control Plane)

Manager 1 (Leader)

Service 1

Service 2

Manager 2 (Reachable)

Service 1

Service 2

↓

Worker Nodes (Container Execution)

Worker 1

Worker 2

Worker 3

✅

High Availability

✅

Load Balancing

✅

Auto-Scaling

4. Requirements / What Needs to Be Gathered

Prerequisites:

Docker installed on multiple hosts (or use Docker Desktop)
Basic Docker knowledge
Understanding of networking basics
Terminal access to all nodes

Conceptual Requirements:

What is container orchestration?
Understanding of services vs tasks
Concepts of replication and scaling
Basic load balancing knowledge

Tools Needed:

Docker Engine (Swarm mode built-in)
Multiple hosts/VMs (or Docker Desktop for testing)
Network connectivity between nodes

5. Key Topics to Consider & Plan of Action

Swarm Concepts:

Node Types:
- Manager Nodes: Control plane, maintain cluster state
- Worker Nodes: Execute containers
Services:
- Replicated: Specified number of identical tasks
- Global: One task per node
Tasks:
- Individual container instances
- Scheduled by managers onto workers
Overlay Networks:
- Multi-host networking
- Secure container communication

Understanding Plan:

text

Step 1: Initialize Swarm
↓
Step 2: Add nodes to cluster
↓
Step 3: Deploy services
↓
Step 4: Scale and manage
↓
Step 5: Implement updates

6. Code Implementation

Initialize a Swarm

bash

# On the manager node
docker swarm init

# Output:
# Swarm initialized: current node (xyz) is now a manager.
# To add a worker to this swarm, run the following command:
#     docker swarm join --token SWMTKN-1-xxx... 192.168.1.100:2377

# Check swarm status
docker info | grep Swarm
# Swarm: active

# View nodes
docker node ls
# ID        HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
# abc123    manager1   Ready    Active         Leader

Add Worker Nodes

bash

# On worker nodes, use the token from swarm init
docker swarm join \
  --token SWMTKN-1-xxx... \
  192.168.1.100:2377

# On manager, verify nodes
docker node ls
# ID        HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS
# abc123    manager1   Ready    Active         Leader
# def456    worker1    Ready    Active
# ghi789    worker2    Ready    Active

Deploy a Simple Service

bash

# Create a service with 3 replicas
docker service create \
  --name web \
  --replicas 3 \
  --publish published=8080,target=80 \
  nginx:alpine

# List services
docker service ls
# ID        NAME   MODE        REPLICAS   IMAGE
# xyz123    web    replicated  3/3        nginx:alpine

# Inspect service
docker service ps web
# ID        NAME     IMAGE          NODE      DESIRED STATE   CURRENT STATE
# abc       web.1    nginx:alpine   manager1  Running         Running 1 min
# def       web.2    nginx:alpine   worker1   Running         Running 1 min
# ghi       web.3    nginx:alpine   worker2   Running         Running 1 min

Scaling Services

bash

# Scale up to 5 replicas
docker service scale web=5

# Scale down to 2 replicas
docker service scale web=2

# Scale multiple services
docker service scale web=3 api=5

# View scaling in action
watch docker service ps web

Service with Environment Variables

bash

# Create service with env vars
docker service create \
  --name api \
  --replicas 3 \
  --env NODE_ENV=production \
  --env PORT=3000 \
  --env DB_HOST=db.example.com \
  --publish 3000:3000 \
  myapp/api:latest

# Update environment variables
docker service update \
  --env-add LOG_LEVEL=debug \
  api

Service with Volumes

bash

# Create service with volume
docker service create \
  --name db \
  --replicas 1 \
  --mount type=volume,source=db-data,target=/var/lib/postgresql/data \
  --env POSTGRES_PASSWORD=secret \
  postgres:15

# With bind mount
docker service create \
  --name web \
  --replicas 3 \
  --mount type=bind,source=/host/path,target=/container/path \
  nginx:alpine

Overlay Network

bash

# Create overlay network
docker network create \
  --driver overlay \
  --attachable \
  my-network

# Create services on the network
docker service create \
  --name frontend \
  --network my-network \
  --replicas 3 \
  myapp/frontend:latest

docker service create \
  --name backend \
  --network my-network \
  --replicas 5 \
  myapp/backend:latest

# Services can communicate using service names
# frontend can reach backend at "http://backend:port"

Rolling Updates

bash

# Update service to new image
docker service update \
  --image myapp:v2 \
  web

# Update with controlled rollout
docker service update \
  --image myapp:v2 \
  --update-parallelism 1 \
  --update-delay 10s \
  web

# Rollback to previous version
docker service rollback web

# Update with constraints
docker service update \
  --constraint-add node.labels.environment==production \
  web

Docker Stack (Multi-Service Deployment)

yaml

# docker-compose.yml (stack file)
version: "3.8"

services:
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
    networks:
      - frontend

  api:
    image: myapp/api:latest
    deploy:
      replicas: 5
      placement:
        constraints:
          - node.role == worker
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 256M
    environment:
      - NODE_ENV=production
    networks:
      - frontend
      - backend

  database:
    image: postgres:15
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.db == true
    environment:
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay

volumes:
  db-data:

secrets:
  db_password:
    external: true

bash

# Create secret
echo "mysecretpassword" | docker secret create db_password -

# Deploy stack
docker stack deploy -c docker-compose.yml mystack

# List stacks
docker stack ls

# List services in stack
docker stack services mystack

# View stack details
docker stack ps mystack

# Remove stack
docker stack rm mystack

Service Placement Constraints

bash

# Run only on manager nodes
docker service create \
  --name admin \
  --constraint node.role==manager \
  admin-app:latest

# Run on nodes with specific label
docker service create \
  --name gpu-app \
  --constraint node.labels.gpu==true \
  ml-app:latest

# Add label to node
docker node update --label-add gpu=true worker1

# Run on specific hostname
docker service create \
  --name special-app \
  --constraint node.hostname==worker2 \
  myapp:latest

Global Services

bash

# Deploy one container per node
docker service create \
  --name monitoring \
  --mode global \
  --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock \
  monitoring-agent:latest

# Good for: logging, monitoring, security agents

Health Checks in Swarm

bash

# Create service with health check
docker service create \
  --name web \
  --replicas 3 \
  --health-cmd "curl -f http://localhost/health || exit 1" \
  --health-interval 30s \
  --health-timeout 10s \
  --health-retries 3 \
  myapp:latest

# Swarm will restart unhealthy tasks automatically

Service Logs

bash

# View service logs
docker service logs web

# Follow logs
docker service logs -f web

# Logs from specific task
docker service logs web.1

# Last 100 lines
docker service logs --tail 100 web

Draining and Removing Nodes

bash

# Drain node (move containers away)
docker node update --availability drain worker1

# View effect
docker node ls
# worker1 shows "Drain" availability

# Bring node back
docker node update --availability active worker1

# Remove node from swarm
# First, on worker node:
docker swarm leave

# Then on manager:
docker node rm worker1

7. Things to Consider

Best Practices:

Use Odd Number of Managers

bash

# ✅ Good - 1, 3, 5, or 7 managers
# Maintains quorum for high availability

# ❌ Avoid - 2, 4, 6 managers
# No benefit, higher split-brain risk

Separate Manager and Worker Roles

bash

# Prevent containers on managers
docker node update --availability drain manager1

Use Docker Secrets for Sensitive Data

bash

# ✅ Good
echo "password" | docker secret create db_pass -

# ❌ Avoid - environment variables for secrets
--env DB_PASSWORD=mysecret

Implement Health Checks

yaml

# In stack file
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost/health"]
  interval: 30s
  timeout: 10s
  retries: 3

Common Pitfalls:

❌ Not using overlay networks (containers can't communicate across nodes) ✅ Create overlay networks for multi-host communication

❌ No resource limits (one service can consume all resources) ✅ Set CPU and memory limits in deploy section

❌ Ignoring update strategies (all replicas update at once) ✅ Use update_config for controlled rollouts

❌ Publishing ports on global services (port conflicts) ✅ Use host mode or routing mesh carefully

Swarm vs Kubernetes:

Feature	Docker Swarm	Kubernetes
Setup	Very easy	Complex
Learning Curve	Gentle	Steep
Ecosystem	Smaller	Massive
Use Case	Small-medium	Medium-large
Auto-scaling	Basic	Advanced
Community	Good	Excellent

8. Additional Helpful Sections

Monitoring Swarm

bash

# Node information
docker node inspect manager1

# Service details
docker service inspect web

# Real-time events
docker events --filter type=service

# Stack services status
watch docker stack ps mystack

Backup and Restore

bash

# Backup swarm state (on manager)
sudo systemctl stop docker
sudo tar -czf swarm-backup.tar.gz /var/lib/docker/swarm
sudo systemctl start docker

# Restore (on new manager)
sudo systemctl stop docker
sudo rm -rf /var/lib/docker/swarm
sudo tar -xzf swarm-backup.tar.gz -C /
sudo systemctl start docker
docker swarm init --force-new-cluster

Troubleshooting

bash

# Service not starting
docker service ps web --no-trunc

# Check task logs
docker service logs web

# Inspect failed task
docker inspect <task-id>

# Network issues
docker network inspect my-network

# Node connectivity
docker node ls
docker node inspect worker1

Production Example

yaml

# production-stack.yml
version: "3.8"

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
      restart_policy:
        condition: any
        delay: 5s
        max_attempts: 3
      placement:
        constraints:
          - node.role == worker
    networks:
      - frontend
    configs:
      - source: nginx_config
        target: /etc/nginx/nginx.conf

  app:
    image: myapp:${VERSION:-latest}
    deploy:
      replicas: 10
      update_config:
        parallelism: 2
        delay: 10s
      resources:
        limits:
          cpus: "1"
          memory: 1G
        reservations:
          cpus: "0.5"
          memory: 512M
    environment:
      - NODE_ENV=production
    secrets:
      - db_password
      - api_key
    networks:
      - frontend
      - backend
    healthcheck:
      test: ["CMD", "node", "healthcheck.js"]
      interval: 30s
      timeout: 10s
      retries: 3

  db:
    image: postgres:15
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.storage == ssd
    environment:
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend

networks:
  frontend:
    driver: overlay
  backend:
    driver: overlay
    internal: true

volumes:
  db-data:

secrets:
  db_password:
    external: true
  api_key:
    external: true

configs:
  nginx_config:
    file: ./nginx.conf

bash

# Deploy to production
docker stack deploy -c production-stack.yml prod

Quick Commands Reference

bash

# Swarm Management
docker swarm init                    # Initialize swarm
docker swarm join                    # Join swarm
docker swarm leave                   # Leave swarm
docker node ls                       # List nodes
docker node inspect <node>           # Node details

# Service Management
docker service create                # Create service
docker service ls                    # List services
docker service ps <service>          # Service tasks
docker service logs <service>        # Service logs
docker service scale <service>=N     # Scale service
docker service update                # Update service
docker service rm <service>          # Remove service

# Stack Management
docker stack deploy -c <file> <name> # Deploy stack
docker stack ls                      # List stacks
docker stack services <stack>        # Stack services
docker stack ps <stack>              # Stack tasks
docker stack rm <stack>              # Remove stack

# Secrets
docker secret create <name> <file>   # Create secret
docker secret ls                     # List secrets
docker secret inspect <secret>       # Secret details
docker secret rm <secret>            # Remove secret

Summary

Docker Swarm is Docker's native container orchestration solution that enables deploying and managing containerized applications across multiple hosts. Initialize a swarm with

text

docker swarm init

on a manager node, add workers with

text

docker swarm join

, and deploy services using

text

docker service create

text

docker stack deploy

. Swarm provides built-in load balancing, service discovery, rolling updates, and high availability. Use overlay networks for multi-host communication, implement health checks for automatic recovery, and leverage Docker secrets for secure credential management. While simpler than Kubernetes, Swarm is powerful enough for small to medium-scale production deployments. Key features include automatic container distribution, scaling with a single command, zero-downtime updates with rollback capabilities, and seamless integration with Docker Compose files. Docker Swarm is ideal when you need orchestration but want to avoid Kubernetes' complexity.