
Docker Compose in Production: What You Need to Know

By Vishnu

:::note[TL;DR]

  • Add restart: unless-stopped to every service — without it, crashed containers stay dead
  • Health checks + depends_on: condition: service_healthy prevent startup race conditions between services
  • Never put secrets in .env files on the server — use Docker secrets, CI/CD injection, or a secrets manager
  • Set max-size and max-file on log drivers — default JSON logging fills disk silently
  • Use resource limits (cpu/memory) to prevent one container from taking down the entire host

:::

Docker Compose works great in development. In production, the same file will get you in trouble if you don’t change a few things. The defaults are built for convenience, not resilience.

This guide covers what to update before you point your domain at a Compose-based deployment.

What’s different in production

In development:

  • Containers restart manually
  • Secrets are in .env files
  • Logs go wherever
  • Containers share all resources freely
  • Health checks don’t matter

In production:

  • Containers must restart automatically
  • Secrets must not be in files on disk
  • Logs need to go to a log driver or external system
  • Resource limits prevent one container from killing the host
  • Health checks gate your load balancer and deployment logic

Use a separate production compose file

Don’t use one file for everything. Keep docker-compose.yml for shared config and docker-compose.prod.yml for production overrides:

docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
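
As a sketch of how the split works, the override file carries only the keys that differ in production; when both files are passed with -f, Compose deep-merges the later file over the earlier one (the app service shown is illustrative):

```yaml
# docker-compose.prod.yml (sketch): only production-specific keys.
# Everything else (image, ports, volumes) stays in docker-compose.yml.
services:
  app:
    restart: unless-stopped
    environment:
      NODE_ENV: production
    logging:
      options:
        max-size: "10m"
        max-file: "3"
```

Note that list-valued keys such as ports are appended rather than replaced during the merge, so it is easiest to define them in only one of the two files.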

Restart policies

Without this, containers that crash stay dead:

services:
  app:
    restart: unless-stopped    # restart on crash, not on manual stop

Options:

  • no — never restart (dev default)
  • always — always restart, even on manual stop
  • on-failure — only restart on non-zero exit code
  • unless-stopped — restart always except when manually stopped (production default)
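
To confirm which policy actually applied to a running container (the container name here is illustrative), you can query it:

```shell
# Print the effective restart policy of a running container
docker inspect --format '{{ .HostConfig.RestartPolicy.Name }}' my-app
# prints the policy name, e.g. unless-stopped
```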

Health checks

Health checks let Docker know when a container is actually ready, not just running:

services:
  app:
    image: my-app:latest
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s    # grace period on startup

  db:
    image: postgres:16
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

Use depends_on with condition to wait for a healthy dependency:

services:
  app:
    depends_on:
      db:
        condition: service_healthy

The scenario: You deploy your app and it starts in 3 seconds, but PostgreSQL takes 8 seconds to be ready for connections. Without health checks, your app crashes on startup trying to connect to a database that isn’t ready yet. With service_healthy, the app waits.
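
You can watch this from the host: Docker exposes each container's health state, and docker compose ps shows it per service (container name illustrative):

```shell
# Current health state of one container: starting, healthy, or unhealthy
docker inspect --format '{{ .State.Health.Status }}' my-app

# Health column across all services in the project
docker compose ps
```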

Resource limits

Without limits, one misbehaving container can take down the host:

services:
  app:
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          cpus: '0.25'
          memory: 256M
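
Once deployed, a quick way to confirm the caps took effect is docker stats, which shows live usage against the configured limit:

```shell
# One-shot snapshot; if the limit above was applied, the app container's
# MEM USAGE / LIMIT column should read .../512MiB
docker stats --no-stream
```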

Secrets management

:::warning
Never commit .env files containing production secrets to your repo — even private repos. If the repo is ever made public, or a team member’s account is compromised, those secrets are exposed. Use .gitignore to exclude .env and inject secrets at deploy time via CI/CD or a secrets manager.
:::

Beyond the repo, production secrets should not sit on the server in plaintext either. A few options, in rough order of preference:

Option 1: Docker secrets (Swarm mode)

secrets:
  db_password:
    external: true

services:
  db:
    secrets:
      - db_password
    environment:
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
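
Because the secret is declared external: true, it must already exist in Swarm before you deploy. One way to create it from a shell (the value comes from a variable so it never lands in the compose file):

```shell
# Create the secret from stdin so the value stays out of files and history
printf '%s' "$DB_PASSWORD_VALUE" | docker secret create db_password -

# List secrets; Docker never displays the values
docker secret ls
```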

Option 2: Environment variables from a secure source

Use your CI/CD system (GitHub Actions, GitLab CI) to inject secrets at deploy time, or a secrets manager (Vault, AWS Secrets Manager) that your deploy script calls before starting Compose.
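
As a hedged sketch of the CI/CD route, a GitHub Actions step might inject a repository secret into the environment and run Compose over SSH. The step, secret name, and host below are assumptions for illustration, not part of this article's setup:

```yaml
# Fragment of .github/workflows/deploy.yml (names are hypothetical)
- name: Deploy with Compose
  env:
    DATABASE_URL: ${{ secrets.DATABASE_URL }}
  run: |
    ssh deploy@example.com "cd /app && \
      DATABASE_URL='$DATABASE_URL' \
      docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d"
```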

Option 3: .env file with restricted permissions

If you must use a .env file on the server, restrict access:

chmod 600 /app/.env
chown appuser:appuser /app/.env

Never commit it. Add it to .gitignore.

Logging

:::tip
Always configure max-size and max-file on your log driver. The default JSON file driver writes unlimited logs to disk — a verbose service can fill a server disk in hours. Set max-size: "10m" and max-file: "3" as a minimum for every service in production.
:::

Default JSON file logs fill up your disk. Configure a log driver:

services:
  app:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

For centralized logging, switch to a driver such as loki, fluentd, or awslogs:

    logging:
      driver: awslogs
      options:
        awslogs-region: ap-south-1
        awslogs-group: /app/production
        awslogs-stream: app

Networking

By default, all services in a Compose file share one network. In production, segment your services:

networks:
  frontend:
  backend:

services:
  nginx:
    networks:
      - frontend

  app:
    networks:
      - frontend
      - backend

  db:
    networks:
      - backend    # not reachable from nginx directly
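
With this layout you can verify the isolation from inside a container: app resolves from nginx because they share the frontend network, while db does not (assumes the nginx image ships getent):

```shell
# Resolves: nginx and app share the "frontend" network
docker compose exec nginx getent hosts app

# Fails to resolve: nginx and db share no network
docker compose exec nginx getent hosts db
```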

Production compose example

# docker-compose.prod.yml
services:
  app:
    image: my-app:${IMAGE_TAG:-latest}
    restart: unless-stopped
    environment:
      NODE_ENV: production
      DATABASE_URL: ${DATABASE_URL}
    ports:
      - "3000:3000"
    depends_on:
      db:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
    logging:
      driver: json-file
      options:
        max-size: "20m"
        max-file: "5"
    networks:
      - frontend
      - backend

  db:
    image: postgres:16-alpine
    restart: unless-stopped
    environment:
      POSTGRES_USER: ${DB_USER}
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: ${DB_NAME}
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USER}"]
      interval: 10s
      timeout: 5s
      retries: 5
    deploy:
      resources:
        limits:
          memory: 2G
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
    networks:
      - backend

volumes:
  pgdata:

networks:
  frontend:
  backend:

Zero-downtime deploys

Compose doesn’t do rolling updates natively. For zero-downtime:

  1. Pull the new image: docker compose pull app
  2. Recreate with minimal downtime: docker compose up -d --no-deps app
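
The two steps above can be wrapped in a small deploy script (the path and the app service name follow the earlier examples):

```shell
#!/usr/bin/env sh
set -e
cd /app

COMPOSE="docker compose -f docker-compose.yml -f docker-compose.prod.yml"

# 1. Pull the new image for the app service only
$COMPOSE pull app

# 2. Recreate just the app container; db and networks stay up
$COMPOSE up -d --no-deps app

# Optional: drop old image layers
docker image prune -f
```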

Or use a proxy like Nginx/Traefik to route traffic while you swap containers.

For anything requiring true zero-downtime at scale, that’s when Docker Swarm or Kubernetes starts making sense.

Related: Docker Cheat Sheet | Write a Node.js Dockerfile


Summary

  • Change restart: no (dev default) to restart: unless-stopped for every service in production
  • Health checks let Compose and load balancers know when a container is actually ready — use depends_on with condition: service_healthy
  • Never put production secrets in .env files on the server — use Docker secrets, CI/CD injection, or a secrets manager
  • Add max-size and max-file to your logging config to prevent filling your disk with JSON logs
  • Use resource limits (cpus, memory) to prevent one misbehaving container from taking down the host

Frequently Asked Questions

Should I use Docker Compose or Kubernetes in production?

Compose is the right choice for small-to-medium deployments on a single server or a few servers. Kubernetes is for orchestrating containers across many nodes at scale. If you’re running 2–10 services on a VPS, Compose is simpler and maintainable. If you need auto-scaling, multi-zone redundancy, and rolling deploys across a cluster, that’s Kubernetes territory.

How do I update a service with zero downtime using Compose?

True zero-downtime with plain Compose requires a reverse proxy (Nginx, Traefik, Caddy) handling traffic. Pull the new image, start a new container on a different port, update the proxy config to point to it, then stop the old container. Traefik automates this with labels. For simpler setups, a brief (~1-2 second) restart with docker compose up -d --no-deps app is usually acceptable.

What’s the difference between depends_on and depends_on with condition: service_healthy?

Plain depends_on only waits for the container to start — not for the service inside to be ready. condition: service_healthy waits until the container’s health check passes. Always use the health condition for databases, caches, and any service that takes a few seconds to initialize.