Linux Shell Scripting: 10 Production Patterns Every DevOps Engineer Must Know

Introduction

Shell scripts are the duct tape of DevOps — they hold everything together. But a badly written script can cause outages, data loss, and sleepless nights. This guide covers 10 battle-tested patterns that make your scripts reliable, debuggable, and safe to run in production.

Pattern 1: The Safety Header

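Every production script should start with set -euo pipefail. The -e flag exits when a command fails (with exceptions, such as commands tested by if, while, && or ||), -u treats unset variables as errors, and -o pipefail makes a pipeline fail if any command in it fails, not only the last one. Set IFS to newline and tab (IFS=$'\n\t') to reduce word-splitting bugs on filenames with spaces; you still need to quote variable expansions. These two lines prevent a large share of common shell scripting mistakes.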
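
A minimal sketch of the header follows. The file list is made up for the demonstration, and the loop leaves $files unquoted on purpose to show how the IFS setting changes word splitting.

```shell
#!/usr/bin/env bash
set -euo pipefail   # exit on failure, error on unset variables, fail a pipeline if any stage fails
IFS=$'\n\t'         # split words on newline and tab only, not on spaces

# Illustrative data: two file names, one of which contains a space.
files=$'my report.txt\nnotes.txt'

count=0
for f in $files; do          # unquoted on purpose, to show the splitting behaviour
  count=$((count + 1))
done
echo "files seen: $count"    # prints "files seen: 2"; the default IFS would give 3
```

Note that count=$((count + 1)) is used instead of ((count++)), because the latter returns a non-zero status when count is 0 and would stop the script under -e.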

Pattern 2: Structured Logging

Never use plain echo for output. Build log, warn, and error functions that prefix every line with a timestamp and severity level, and write to both stdout and a log file using tee. This means you always have a record of what happened during a deployment even if the terminal session closes.
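
One way to sketch these functions is shown below. The LOG_FILE path and the helper name _log are assumptions for illustration, and sending warn and error to stderr rather than stdout is a choice made here, not something the pattern requires.

```shell
#!/usr/bin/env bash
set -euo pipefail

# LOG_FILE is a hypothetical path; override it from the environment.
LOG_FILE="${LOG_FILE:-${TMPDIR:-/tmp}/deploy.log}"

# _log LEVEL MESSAGE...: print a timestamped line and append it to LOG_FILE.
_log() {
  local level="$1"
  shift
  printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$*" \
    | tee -a "$LOG_FILE"
}

log()   { _log INFO  "$@"; }
warn()  { _log WARN  "$@" >&2; }   # terminal copy goes to stderr (assumption)
error() { _log ERROR "$@" >&2; }

log "deployment started"
warn "disk usage above 80%"
error "health check failed"
```

Because tee -a appends, the log file keeps the history of earlier runs. Rotate or truncate it separately if it grows too large.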

Pattern 3: Retry Logic with Exponential Backoff

Never call an external API or health check endpoint without retry logic. Networks are unreliable and services can be temporarily unavailable. Write a retry function that accepts a command and a maximum attempt count, and that doubles the delay after each failed attempt. This prevents your script from failing permanently because of a two-second blip.
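
A retry function along these lines might look like the sketch below. The argument order (attempts, initial delay, command) and the flaky stand-in command are assumptions made for the demonstration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# retry MAX_ATTEMPTS INITIAL_DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay (whole seconds) after each failure.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if (( attempt >= max_attempts )); then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed, sleeping ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Stand-in for an unreliable call: fails twice, then succeeds.
calls=0
flaky() {
  calls=$(( calls + 1 ))
  (( calls >= 3 ))
}

retry 5 1 flaky
echo "succeeded after $calls calls"
```

In a real script the command would be the external call itself, for example retry 5 2 curl -fsS https://example.com/health, where the URL is a placeholder.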

Pattern 4: Cleanup with trap

If your script creates temp files, takes locks, or starts background processes, use trap to guarantee cleanup even when the script exits due to an error. Set trap cleanup EXIT at the top of your script and define a cleanup function that removes temp directories and releases locks. Without trap, failed scripts leave orphaned resources behind.
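
The following sketch shows the shape of such a cleanup function. WORK_DIR, BG_PID and the sleep command standing in for a background helper are illustrative, not part of any real deployment.

```shell
#!/usr/bin/env bash
set -euo pipefail

# WORK_DIR and BG_PID are illustrative names.
WORK_DIR="$(mktemp -d)"
BG_PID=""

# cleanup: safe to call more than once; every step tolerates "already gone".
cleanup() {
  if [[ -n "$BG_PID" ]]; then
    kill "$BG_PID" 2>/dev/null || true
  fi
  rm -rf -- "$WORK_DIR"
}
trap cleanup EXIT   # registered as early as possible

# Stand-in for a background helper process.
sleep 300 >/dev/null 2>&1 &
BG_PID=$!

echo "scratch data" > "$WORK_DIR/partial.txt"
echo "working in $WORK_DIR"
```

The EXIT trap cannot run if the shell is killed with SIGKILL, so anything that must survive that case (such as stale lock detection) needs a separate mechanism.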

Pattern 5: Idempotent Operations

Scripts should be safe to run twice. Before every write operation ask yourself what happens if this line runs twice. Use mkdir -p instead of mkdir, use grep -qxF before appending to files, and check if a user exists before creating one. Idempotency lets you re-run scripts safely after partial failures.
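
The three checks above can be sketched as small helpers. The names ensure_line and ensure_user, the paths, and the hosts entry are assumptions for illustration; ensure_user is defined but not called here because it needs root.

```shell
#!/usr/bin/env bash
set -euo pipefail

# ensure_line FILE LINE: append LINE to FILE only if no identical line exists.
ensure_line() {
  local file="$1" line="$2"
  touch "$file"
  # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# ensure_user NAME: create a system user only if it does not exist yet.
ensure_user() {
  local name="$1"
  id -u "$name" >/dev/null 2>&1 || useradd --system "$name"
}

# Demonstration in a scratch directory: the same steps run twice.
demo_root="$(mktemp -d)"
for pass in 1 2; do
  mkdir -p "$demo_root/app/config"
  ensure_line "$demo_root/app/config/hosts" "10.0.0.5 db.internal"
done
cat "$demo_root/app/config/hosts"   # the line appears once, not twice
```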

Pattern 6: Input Validation

Always validate arguments at the top of your script before doing anything. Check that required variables are set, that files exist before reading them, and that numeric arguments are actually numbers. Print a clear usage message and exit with code 1 when validation fails. Never let a script proceed with bad input.
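
As a rough sketch, validation can live in functions that return non-zero with a clear message. The argument names (a config file and a replica count) and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

usage() {
  echo "Usage: ${0##*/} <config-file> <replica-count>" >&2
}

# require_env NAME: fail if the environment variable NAME is unset or empty.
require_env() {
  local name="$1"
  if [[ -z "${!name:-}" ]]; then
    echo "error: $name must be set" >&2
    return 1
  fi
}

# validate_args CONFIG_FILE REPLICA_COUNT
validate_args() {
  if (( $# != 2 )); then
    echo "error: expected 2 arguments, got $#" >&2
    return 1
  fi
  local config_file="$1" replicas="$2"
  if [[ ! -r "$config_file" ]]; then
    echo "error: cannot read config file: $config_file" >&2
    return 1
  fi
  if [[ ! "$replicas" =~ ^[0-9]+$ ]]; then
    echo "error: replica count must be a non-negative integer: $replicas" >&2
    return 1
  fi
}

# A real script would run this first, before any work:
#   validate_args "$@" || { usage; exit 1; }

# Demonstration with a scratch file.
sample="$(mktemp)"
validate_args "$sample" 3 && echo "accepted"
validate_args "$sample" three || echo "rejected"
```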

Pattern 7: Locking to Prevent Concurrent Runs

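If your script runs on a schedule, two instances can overlap and corrupt shared state. Use flock on a lock file, or mkdir on a lock directory, as an atomic lock operation. Acquire the lock at startup in a single step rather than checking for it and then creating it, because a separate check leaves a window in which two instances can both proceed. Release the lock in your trap cleanup function. This ensures only one instance runs at a time.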
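
Here is a sketch of the mkdir variant, chosen because it needs no extra tools. The lock path and function names are assumptions, and the else branch only prints a message so the example can run to completion; a real script would exit there.

```shell
#!/usr/bin/env bash
set -euo pipefail

# LOCK_DIR is a hypothetical path; pick one on a local filesystem.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/nightly-sync.lock}"

# acquire_lock: mkdir either creates the directory or fails, in one step,
# so two instances cannot both succeed.
acquire_lock() {
  if mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "$$" > "$LOCK_DIR/pid"   # record the owner to help debug stale locks
    return 0
  fi
  return 1
}

release_lock() {
  rm -rf -- "$LOCK_DIR"
}

if acquire_lock; then
  trap release_lock EXIT          # release only a lock this instance owns
  echo "lock acquired, running job"
else
  # A real script would exit with a non-zero status here.
  echo "another instance holds $LOCK_DIR, skipping this run" >&2
fi
```

On Linux, flock from util-linux is an alternative worth considering: the kernel drops the lock when the process holding it exits, so a killed script does not leave a stale lock behind.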

Pattern 8: Meaningful Exit Codes

Always exit with a meaningful code. Exit 0 means success. Exit 1 means a general error. Use higher codes for specific failure modes so calling scripts or CI systems can take different actions based on what went wrong. Document your exit codes at the top of the script in comments.
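
A small sketch of documented exit codes follows. The specific numbers and the deploy function are illustrative, not a standard. Codes 126 and above are avoided because the shell itself uses them (126 and 127 for commands that cannot be run or found, 128+N for termination by signal N).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Exit codes (illustrative values, not a standard):
#   0  success
#   1  general error
#   3  configuration file missing
#   4  upstream health check failed
readonly EXIT_OK=0
readonly EXIT_GENERAL=1
readonly EXIT_CONFIG_MISSING=3
readonly EXIT_UPSTREAM_DOWN=4

# deploy CONFIG_FILE HEALTH_COMMAND [ARGS...]
# Hypothetical task that reports distinct failure modes.
deploy() {
  local config_file="$1"
  shift
  [[ -f "$config_file" ]] || return "$EXIT_CONFIG_MISSING"
  "$@" || return "$EXIT_UPSTREAM_DOWN"   # "$@" stands in for a health check
  return "$EXIT_OK"
}

# A caller or CI step can branch on the code.
rc=0
deploy /nonexistent/app.conf true || rc=$?
case "$rc" in
  "$EXIT_OK")             echo "deployed" ;;
  "$EXIT_CONFIG_MISSING") echo "fix the config path, then re-run" ;;
  "$EXIT_UPSTREAM_DOWN")  echo "upstream is down, retry later" ;;
  *)                      echo "unexpected failure: $rc" ;;
esac
```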

Pattern 9: Dry Run Mode

Add a --dry-run flag to any script that modifies state. When dry run is active, print every command that would run but do not execute it. This lets you safely verify what a script will do in production before committing. Wrap every destructive command in a run function that checks the dry run flag.
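
The run wrapper could be sketched like this. The parse_flags helper and the scratch file are assumptions for the demonstration; the flag is passed explicitly here instead of coming from the command line.

```shell
#!/usr/bin/env bash
set -euo pipefail

DRY_RUN=false

# parse_flags: recognise --dry-run; other arguments are ignored in this sketch.
parse_flags() {
  local arg
  for arg in "$@"; do
    if [[ "$arg" == "--dry-run" ]]; then
      DRY_RUN=true
    fi
  done
}

# run COMMAND [ARGS...]: print the command in dry-run mode, execute it otherwise.
run() {
  if [[ "$DRY_RUN" == true ]]; then
    printf '[dry-run]'
    printf ' %q' "$@"    # %q quotes each argument so the line can be pasted into a shell
    printf '\n'
    return 0
  fi
  "$@"
}

# Demonstration in a scratch directory. A real script would call: parse_flags "$@"
scratch="$(mktemp -d)"
touch "$scratch/old-release"

parse_flags --dry-run
run rm -rf "$scratch/old-release"   # printed only; the file survives
```

A wrapper like this only covers simple commands. Pipelines and redirections need to be wrapped in a function first and then passed to run by name.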

Pattern 10: Self-Documenting Scripts

Add a usage function that prints a clear description of what the script does, what arguments it accepts, and example invocations. Call this function when --help is passed or when arguments are invalid. A script you cannot understand six months later is a liability. Write scripts as if the next person to read them is a junior engineer at 3am during an incident.
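
A usage function in this spirit might look like the sketch below. The description, options, and environment names are placeholders, and main is called with --help directly so the example produces output when run without arguments.

```shell
#!/usr/bin/env bash
set -euo pipefail

# usage: print help text. The description, options and examples are placeholders.
usage() {
  cat <<EOF
Usage: ${0##*/} [--dry-run] [--help] <environment>

Deploys the current release to <environment>.

Options:
  --dry-run   Print the commands that would run without executing them
  --help      Show this message and exit

Examples:
  ${0##*/} staging
  ${0##*/} --dry-run production
EOF
}

# main: show help on --help (status 0) or when no arguments are given (status 1).
main() {
  local arg
  for arg in "$@"; do
    if [[ "$arg" == "--help" ]]; then
      usage
      return 0
    fi
  done
  if (( $# == 0 )); then
    usage >&2
    return 1
  fi
  echo "arguments accepted: $*"
}

# A real script would end with: main "$@"
main --help
```

Sending the help text to stdout for --help and to stderr for invalid input lets callers pipe the help through a pager while still keeping error output separate.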

Key Takeaways

  • Always start with set -euo pipefail: it catches many common bugs early
  • Use trap EXIT for cleanup: it runs when the script exits on an error, not only on success
  • Retry external calls with exponential backoff: networks and APIs are unreliable
  • Design every script to be idempotent: safe to run twice without unintended side effects
  • Add a dry run mode to any script that modifies production state
  • Validate all inputs before doing any work: never proceed with bad arguments
  • Use structured logging with timestamps: you need a record when things go wrong at 3am
