Incident Response Workflow

The Hook (The "Byte-Sized" Intro)

Alarms fire. Error rates spike. Users are complaining. This is where Git workflows meet real-world pressure. The incident response workflow answers: was this caused by a recent deploy? Should we revert or hotfix? How do we communicate? After the fire is out, the post-mortem uses Git history to trace exactly what happened.

📖 What is the Incident Response Workflow?

A structured process for using Git and deployment tools to detect, diagnose, fix, and learn from production incidents.

Conceptual Clarity

Incident response phases:

Phase	Git Actions	Goal
1. Detect	Check recent deploys	Was this caused by a change?
2. Triage	`git log --since`, `git diff`	What changed recently?
3. Mitigate	Revert, or switch a feature flag off	Stop the bleeding
4. Fix	Hotfix branch if needed	Ship a permanent fix
5. Verify	Deploy fix, monitor	Confirm resolution
6. Post-mortem	`git blame`, `git log`	Trace root cause
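Every phase above feeds the post-mortem timeline, so it can help to record phase transitions as they happen instead of reconstructing them afterwards. The following is a minimal sketch that assumes a plain tab-separated log file. `log_phase` and `incident-42.log` are hypothetical names and are not part of Git or any incident tool.

```bash
# Hypothetical helper: append one UTC-timestamped line per phase transition
# so the post-mortem timeline can be rebuilt from the file later.
# Usage: log_phase <logfile> <phase> <note...>
log_phase() {
  local logfile="$1" phase="$2"
  shift 2
  # Columns: ISO-8601 UTC timestamp, phase name, free-text note (tab-separated)
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$phase" "$*" >> "$logfile"
}

# Example (hypothetical incident log):
#   log_phase incident-42.log Detect   "Payment error rate above threshold"
#   log_phase incident-42.log Mitigate "Reverted last deploy"
```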

Revert vs hotfix decision:

Situation	Action
Root cause is clear + fix is simple	Hotfix
Root cause is unclear	Revert the last deploy
Multiple recent changes could be the cause	Revert all recent changes
Revert would lose important changes	Feature flag off + hotfix
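The rows of this table overlap, so applying them takes an order of precedence. The sketch below encodes the table as a shell function. It assumes a hotfix wins only when the cause is clear and the fix is simple. The feature-flag row is checked next, then the multi-change row, and "revert the last deploy" is the fallback. The function name `choose_mitigation` and its yes/no arguments are hypothetical.

```bash
# Hypothetical helper encoding the revert-vs-hotfix table.
# Each argument is "yes" or "no":
#   $1 root cause is clear        $2 fix is simple
#   $3 several recent changes could be the cause
#   $4 a revert would lose important changes
choose_mitigation() {
  local cause_clear="$1" fix_simple="$2" many_suspects="$3" revert_costly="$4"
  if [ "$cause_clear" = yes ] && [ "$fix_simple" = yes ]; then
    echo "hotfix"
  elif [ "$revert_costly" = yes ]; then
    echo "feature-flag-off+hotfix"
  elif [ "$many_suspects" = yes ]; then
    echo "revert-all-recent"
  else
    # Unclear cause (or a clear cause with a complex fix): revert first
    echo "revert-last-deploy"
  fi
}
```

For example, `choose_mitigation no no yes no` prints `revert-all-recent`.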

Real-Life Analogy

Incident response is like a fire department response. First: contain the fire (revert/mitigate). Second: investigate the cause (git log, blame). Third: prevent future fires (post-mortem action items).

Visual Architecture

flowchart TD
    ALERT["🚨 Alert"] --> TRIAGE["🔍 Triage<br/>git log, git diff"]
    TRIAGE --> DECIDE{"Clear root cause?"}
    DECIDE -->|"Yes"| HOTFIX["🩹 Hotfix"]
    DECIDE -->|"No"| REVERT["⏪ Revert Deploy"]
    HOTFIX --> VERIFY["✅ Verify"]
    REVERT --> VERIFY
    VERIFY --> POSTMORTEM["📝 Post-mortem"]
    style ALERT fill:#2d1b1b,stroke:#e94560,color:#e94560
    style VERIFY fill:#1b2d1b,stroke:#53d8fb,color:#53d8fb

Why It Matters

Speed: Knowing the workflow eliminates decision paralysis during incidents.
Safety: Revert first, investigate later — minimize user impact.
Learning: Post-mortems use Git history to trace root causes.
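Accountability: git log gives an audit trail of every commit made during the incident, including reverts and hotfixes, with author and timestamp. Actions taken outside Git, such as CI/CD rollbacks or feature-flag toggles, need to be recorded separately.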
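The Git side of that audit trail is the set of commits made during the incident window. The sketch below assumes the window boundaries are known. `incident_audit_trail` is a hypothetical helper, and it lists commits only.

```bash
# Hypothetical helper: list commits within the incident window, oldest first,
# as "<committer date> <author> <short hash> <subject>".
# --since/--until filter on committer date.
# Usage: incident_audit_trail <since> <until> [ref]   (ref defaults to HEAD)
incident_audit_trail() {
  git log --reverse --date=iso-strict \
    --format='%cd %an %h %s' \
    --since="$1" --until="$2" "${3:-HEAD}"
}
```

Passing `--all` as the third argument includes commits on hotfix branches as well as the current branch.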

Code

bash

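# ─── Step 1: What changed recently? ───
git fetch origin --tags                            # Sync remote branches and release tags first
git log --oneline --since="2 hours ago" origin/main
git diff --stat v1.0.0..v1.0.1                     # Files changed between the last two release tags

# ─── Step 2: Who changed the broken file? ───
git blame src/payments/process.js                  # Last commit and author for each line
git log --oneline -5 -- src/payments/process.js    # Last 5 commits touching the file

# ─── Step 3a: Revert (when cause is unclear) ───
git revert --no-edit HEAD     # Revert the last commit (use -m 1 if HEAD is a merge commit)
# For a multi-commit deploy: git revert --no-edit v1.0.0..v1.0.1
git push origin main          # Push the revert; CI/CD then deploys it
# (or roll back the deploy through your CI/CD tool)

# ─── Step 3b: Hotfix (when cause is clear) ───
git checkout -b hotfix/INCIDENT-42-fix v1.0.1      # Branch from the tag running in production
# Make the fix...
git add src/payments/process.js                    # Stage the fix; commit fails with nothing staged
git commit -m "fix(payments): handle null card object"
git push -u origin hotfix/INCIDENT-42-fix
# After releasing the hotfix, merge it back into main so the fix is not lost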

# ─── Step 4: Verify ───
# Monitor error rates, check logs, run smoke tests

# ─── Step 5: Post-mortem ───
# Document in a post-mortem template:
# - Timeline of events
# - Root cause (link to commit)
# - Impact (users affected, duration)
# - Action items (preventive measures)
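Reverting under pressure is easier when the team has rehearsed it. The following sketch is a throwaway drill that needs only a local `git` install. It builds a scratch repository, tags a good release and a bad one, reverts the bad release's commit range, and shows that history is kept while the content is restored. The function name `revert_drill`, the tags, and the file contents are all made up for the exercise.

```bash
# Hypothetical drill: simulate a bad deploy in a scratch repo and revert it.
# Prints the file content after the revert, then the last two commit subjects.
revert_drill() {
  (
    set -e
    repo="$(mktemp -d)"
    trap 'rm -rf "$repo"' EXIT       # Clean up the scratch repo on exit
    cd "$repo"
    git init -q .
    # Local identity so commits work on a machine with no global config
    git config user.name "Drill Bot"
    git config user.email "drill@example.com"

    echo "amount = card.amount" > process.js
    git add process.js
    git commit -q -m "feat(payments): initial processor"
    git tag v1.0.0

    echo "amount = card.total.amount" > process.js   # The "bad" change
    git commit -q -am "refactor(payments): read nested total"
    git tag v1.0.1

    # Triage: show what the bad release changed (on stderr)
    git diff --stat v1.0.0 v1.0.1 >&2

    # Mitigate: revert every commit in the bad release's range
    git revert --no-edit v1.0.0..v1.0.1 >/dev/null

    cat process.js
    git log --format=%s -2
  )
}
```

Running `revert_drill` prints `amount = card.amount`, then the new `Revert "refactor(payments): read nested total"` commit, then the original commit it undid. The bad commit stays in history for the post-mortem.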

Key Takeaways

Revert first if the root cause isn't immediately clear.
Use git log --since and git diff to triage quickly.
Hotfix when the cause is known and the fix is small.
Post-mortems use Git history to build the timeline and trace the root cause.

Interview Prep

Q: During a production incident, should you revert or hotfix? A: If the root cause is clear and the fix is simple, hotfix. If not, revert the last deploy immediately to stop the bleeding, then investigate. The priority is restoring service, not fixing the bug.
Q: How does Git help during incident response? A: git log --since shows recent changes, git diff shows what changed between versions, git blame shows which commit (and author) last changed each line, and git revert quickly undoes problematic commits by adding new commits that reverse them.
Q: What should a post-mortem include? A: Timeline (when detected, mitigated, resolved), root cause (link to specific commit), impact (users/revenue affected), and action items (tests to add, checks to implement, process improvements).