Linux Shell Scripting: 10 Production Patterns Every DevOps Engineer Must Know

Introduction

Shell scripts are the duct tape of DevOps — they hold everything together. But a badly written script can cause outages, data loss, and sleepless nights. This guide covers 10 battle-tested patterns that make your scripts reliable, debuggable, and safe to run in production.

Pattern 1: The Safety Header

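Every production Bash script should start with set -euo pipefail, right after the #!/usr/bin/env bash shebang (not every /bin/sh supports pipefail). The -e flag exits as soon as a command fails, -u treats references to unset variables as errors, and -o pipefail makes a pipeline fail if any command in it fails, not just the last one. Follow it with IFS=$'\n\t' so that unquoted expansions split only on newlines and tabs, which prevents word-splitting bugs on filenames with spaces (quoting "$var" is still the primary defense). Be aware that -e has exceptions: failures inside if conditions, in && or || lists, and in commands prefixed with ! do not stop the script. This header will not catch every mistake, but it turns a large share of common silent failures into loud ones.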
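
A minimal sketch of the header, assuming Bash: pipefail and the $'...' quoting used for IFS are not portable to every /bin/sh. The two demonstrations are illustrative only and show what -o pipefail and -u change; a real script would keep just the first few lines.

```bash
#!/usr/bin/env bash
# Exit on command failures (-e), treat unset variables as errors (-u),
# and make a pipeline fail when any command in it fails (-o pipefail).
set -euo pipefail
# Split unquoted expansions only on newlines and tabs, not on spaces.
IFS=$'\n\t'

# Demo 1: with pipefail, `false | sort` fails even though `sort` succeeds.
# (An `if` condition is one of the contexts where -e does not abort.)
if false | sort; then
  pipeline_status="succeeded"
else
  pipeline_status="failed"
fi

# Demo 2: with -u, expanding an unset variable aborts the (sub)shell
# instead of silently expanding to an empty string.
unset maybe_unset_var
if ( : "$maybe_unset_var" ) 2>/dev/null; then
  unset_check="expanded"
else
  unset_check="aborted"
fi

echo "pipeline ${pipeline_status}, unset variable ${unset_check}"
```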

Pattern 2: Structured Logging

Never use bare echo for operational output. Build log, warn, and error functions that prefix every line with a timestamp and severity level and write to both the terminal and a log file using tee -a. Sending warnings and errors to stderr keeps them separate from any data your script prints on stdout. That way you always have a record of what happened during a deployment, even if the terminal session closes.
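
One way to build these helpers, assuming Bash and a writable log path. LOG_FILE and its default of /tmp/deploy.log are hypothetical, and _log, log, warn, and error are illustrative names rather than a standard API.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical log destination; set LOG_FILE in the environment to override.
LOG_FILE="${LOG_FILE:-/tmp/deploy.log}"

# _log LEVEL MESSAGE...: prefix MESSAGE with an ISO-8601 timestamp and the
# severity, print it, and append the same line to LOG_FILE via tee -a.
_log() {
  local level="$1"
  shift
  printf '%s [%s] %s\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')" "$level" "$*" \
    | tee -a "$LOG_FILE"
}

log()   { _log INFO "$@"; }
# Warnings and errors go to stderr on the terminal; the file gets them too.
warn()  { _log WARN "$@" >&2; }
error() { _log ERROR "$@" >&2; }
```

Calls then look like log "Starting deployment" or error "Rollback failed", and every line lands in the log file as well as on the terminal.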

Pattern 3: Retry Logic with Exponential Backoff

Never call an external API or health check endpoint without retry logic: networks are unreliable, and services are sometimes briefly unavailable. Write a retry function that accepts a command and a maximum attempt count, reruns the command until it succeeds or the attempts run out, and doubles the delay after each failure. This keeps a two-second blip from failing your whole script.
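
A minimal sketch of such a retry function. The name retry and its argument order (attempt count and initial delay first, then the command and its arguments) are assumptions chosen so the command can take any number of arguments; the delay is in whole seconds because Bash arithmetic is integer-only.

```bash
#!/usr/bin/env bash
set -euo pipefail

# retry MAX_ATTEMPTS INITIAL_DELAY COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# sleep between attempts. On final failure, returns COMMAND's last status.
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1 status
  while true; do
    # Running the command as an `if` condition keeps set -e from aborting.
    if "$@"; then
      return 0
    else
      status=$?
    fi
    if (( attempt >= max_attempts )); then
      echo "retry: '$*' failed after ${attempt} attempts (exit ${status})" >&2
      return "$status"
    fi
    echo "retry: attempt ${attempt} failed (exit ${status}); sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
    delay=$(( delay * 2 ))
  done
}
```

For example, retry 5 2 curl -fsS https://example.com/healthz (a placeholder URL) waits 2, 4, 8, and 16 seconds between its five attempts before giving up.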

Pattern 4: Cleanup with trap

If your script creates temp files, takes locks, or starts background processes, use trap to guarantee cleanup even when the script exits because of an error. Define a cleanup function that removes temp directories and releases locks, and register it with trap cleanup EXIT near the top of your script. Without trap, failed scripts leave orphaned resources behind. The one thing no trap can handle is a process killed with SIGKILL (kill -9).
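
A sketch of the pattern for a temporary directory, assuming Bash. WORK_DIR and the file written into it are placeholders; a real cleanup function would also stop background jobs and release locks.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Private scratch directory for this run.
WORK_DIR="$(mktemp -d)"

# cleanup: remove everything this script created, then exit with the
# original status so callers can still tell whether the script failed.
cleanup() {
  local status=$?
  rm -rf -- "$WORK_DIR"
  exit "$status"
}
# Registered right after WORK_DIR exists, so cleanup never sees it unset.
trap cleanup EXIT

# The real work goes here; a failure under set -e still triggers cleanup.
echo "scratch data" > "$WORK_DIR/build.tmp"
```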

Pattern 5: Idempotent Operations

Scripts should be safe to run twice. Before every write operation, ask yourself what happens if that line runs a second time. Use mkdir -p instead of mkdir, use grep -qxF to check whether a line is already in a file before appending it, and check whether a user exists (for example with id -u) before creating one. Idempotency lets you safely re-run a script after a partial failure.
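
Sketches of the three idempotent checks mentioned above. ensure_line, ensure_user, and the APP_DIR default are hypothetical names; ensure_user needs root, and the useradd flag shown is the one from the common shadow-utils useradd, which some minimal distributions replace with a different tool.

```bash
#!/usr/bin/env bash
set -euo pipefail

# ensure_line LINE FILE: append LINE only if an identical whole line is not
# already present (-q quiet, -x match the whole line, -F fixed string).
ensure_line() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# ensure_user NAME: create a system user only if it does not exist yet.
# Requires root; account-creation tools vary between distributions.
ensure_user() {
  local name="$1"
  if ! id -u "$name" >/dev/null 2>&1; then
    useradd --system "$name"
  fi
}

# mkdir -p succeeds whether or not the directory already exists.
APP_DIR="${APP_DIR:-/tmp/myapp}"   # hypothetical default for illustration
mkdir -p "$APP_DIR/config"
```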

Pattern 6: Input Validation

Always validate arguments at the top of your script before doing anything. Check that required variables are set, that files exist before reading them, and that numeric arguments are actually numbers. Print a clear usage message and exit with code 1 when validation fails. Never let a script proceed with bad input.
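
A sketch of up-front validation for a hypothetical script that takes a config file and a replica count and requires a DEPLOY_ENV environment variable. The argument names, messages, and exit code are illustrative assumptions.

```bash
#!/usr/bin/env bash
set -euo pipefail

usage() {
  echo "Usage: ${0##*/} <config-file> <replica-count>" >&2
}

# validate_args CONFIG_FILE REPLICAS: return 1 on the first invalid input,
# after printing what is wrong.
validate_args() {
  if (( $# != 2 )); then
    usage
    return 1
  fi
  local config_file="$1" replicas="$2"
  if [[ ! -r "$config_file" ]]; then
    echo "error: config file '$config_file' does not exist or is not readable" >&2
    usage
    return 1
  fi
  # Accept only whole numbers made of one or more digits.
  if [[ ! "$replicas" =~ ^[0-9]+$ ]]; then
    echo "error: replica count must be a whole number, got '$replicas'" >&2
    usage
    return 1
  fi
  # ${DEPLOY_ENV:-} avoids a set -u abort when the variable is unset.
  if [[ -z "${DEPLOY_ENV:-}" ]]; then
    echo "error: DEPLOY_ENV must be set" >&2
    return 1
  fi
}
```

The main script would call validate_args "$@" || exit 1 before doing any work.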

Pattern 7: Locking to Prevent Concurrent Runs

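If your script runs on a schedule, two instances can overlap and corrupt shared state. Use flock on a lock file, or mkdir on a lock directory, because both test for and take the lock in a single atomic step. Do not check whether a lock file exists and then create it in two separate steps: another instance can slip in between the check and the create. Acquire the lock at startup, exit if it is already held, and release it in your trap cleanup function (a flock lock is also released automatically when the process exits). This guarantees that only one instance runs at a time on a given host.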
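
A sketch using flock from util-linux, which ships with most Linux distributions but is not installed on macOS by default. The lock path is a placeholder; every instance must use the same one.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical lock path shared by every instance of this job.
LOCK_FILE="${LOCK_FILE:-/tmp/nightly-backup.lock}"

# Open (or create) the lock file on file descriptor 9 for the script's lifetime.
exec 9>"$LOCK_FILE"

# flock -n: try to take an exclusive lock without waiting. Testing for the
# lock and taking it are one atomic step, so there is no race window.
if ! flock -n 9; then
  echo "Another instance holds ${LOCK_FILE}; exiting." >&2
  exit 1
fi

# Explicit release on exit; the kernel also drops the lock when
# descriptor 9 is closed as the process exits.
release_lock() { flock -u 9; }
trap release_lock EXIT
```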

Pattern 8: Meaningful Exit Codes

Always exit with a meaningful code. Exit 0 means success and exit 1 means a general error. Use other codes for specific failure modes so that calling scripts or CI systems can react differently depending on what went wrong, but stay below 126: the shell itself uses 126 (command not executable), 127 (command not found), and 128+N (terminated by signal N). Document your exit codes in comments at the top of the script.
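
One way to document and use exit codes. The specific numbers and names below are hypothetical assignments for an imagined deployment script, not a standard.

```bash
#!/usr/bin/env bash
# Exit codes (hypothetical assignments for a deployment script):
#   0  success
#   1  general error, including invalid arguments
#   3  an external dependency was still unreachable after all retries
#   4  the post-deploy health check failed
#   5  another instance already holds the lock
# 2 is left unused because Bash builtins return 2 for incorrect usage;
# 126, 127 and 128+N are reserved for the shell itself.
set -euo pipefail

readonly EXIT_OK=0
readonly EXIT_ERROR=1
readonly EXIT_DEPENDENCY=3
readonly EXIT_HEALTHCHECK=4
readonly EXIT_LOCKED=5

# die CODE MESSAGE...: report MESSAGE on stderr and exit with CODE.
die() {
  local code="$1"
  shift
  echo "error: $*" >&2
  exit "$code"
}

# Example: die "$EXIT_HEALTHCHECK" "health check failed after deploy"
```

A caller can then branch on the result, for example triggering a rollback only when the script exits with 4.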

Pattern 9: Dry Run Mode

Add a --dry-run flag to any script that modifies state. When dry run is active, print every command that would run without executing it. This lets you safely verify what a script will do in production before committing to any change. Wrap every destructive command in a run function that checks the dry-run flag and either prints or executes the command.
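
A minimal sketch of the run wrapper, assuming Bash. DRY_RUN and run are illustrative names, and the loop parses only --dry-run; a real script would handle its other options in the same place.

```bash
#!/usr/bin/env bash
set -euo pipefail

DRY_RUN=false

# Parse flags; only --dry-run is recognized in this sketch.
for arg in "$@"; do
  case "$arg" in
    --dry-run) DRY_RUN=true ;;
  esac
done

# run CMD [ARGS...]: print the command in dry-run mode, execute it otherwise.
# printf %q shell-quotes each argument so the printed line can be pasted back.
run() {
  if [[ "$DRY_RUN" == true ]]; then
    printf '[dry-run]'
    printf ' %q' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Example destructive step (hypothetical path):
# run rm -rf -- /var/www/releases/old
```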

Pattern 10: Self-Documenting Scripts

Add a usage function that prints a clear description of what the script does, what arguments it accepts, and example invocations. Call this function when --help is passed or when arguments are invalid. A script you cannot understand six months later is a liability. Write scripts as if the next person to read them is a junior engineer at 3am during an incident.
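
A sketch of a usage function for a hypothetical deploy script; its arguments, options, and examples are invented for illustration. The unquoted EOF delimiter lets ${0##*/} expand to the script's own file name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# usage: describe the script, its arguments, and example invocations.
usage() {
  cat <<EOF
Usage: ${0##*/} [-h|--help] <environment> <version>

Deploy <version> of the application to <environment>.

Arguments:
  environment   Target environment (e.g. staging, production)
  version       Release tag to deploy

Options:
  -h, --help    Show this help and exit

Examples:
  ${0##*/} staging v1.4.2
  ${0##*/} production v1.4.2
EOF
}

main() {
  case "${1:-}" in
    -h|--help) usage; return 0 ;;
  esac
  # Invalid arguments: show the same help on stderr and fail.
  if (( $# < 2 )); then
    usage >&2
    return 1
  fi
  echo "Deploying $2 to $1"
}
```

In the real script, main "$@" would be the last line; under set -e a non-zero return from main becomes the script's exit status.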

Key Takeaways

Always start with set -euo pipefail: it turns many common silent failures into immediate, visible ones
Use trap EXIT for cleanup: it runs even when the script fails mid-execution
Retry external calls with exponential backoff: networks and APIs are unreliable
Design every script to be idempotent: running it twice should have the same effect as running it once
Add a dry run mode to any script that modifies production state
Validate all inputs before doing any work: never proceed with bad arguments
Use structured logging with timestamps: you need a record when things go wrong at 3am
