The Hook (The "Byte-Sized" Intro)
Over time, your .git folder accumulates loose objects, stale reflogs, and unreachable data. git gc is the housekeeping service: it packs loose objects into packfiles, removes unreachable objects past their expiry, and compresses references. Git runs it automatically in the background, but understanding it helps you manage large repos and know when your data becomes permanently unrecoverable.
📖 What is git gc?
git gc (garbage collection) optimizes the repository by packing loose objects, pruning unreachable objects, and compressing references.
Conceptual Clarity
What gc does:
| Action | Effect |
|---|---|
| Pack loose objects | Combines into packfiles with delta compression |
| Prune unreachable objects | Removes objects older than expiry (default: 2 weeks) |
| Expire reflog entries | Removes old reflog entries (default: 90 days reachable, 30 days unreachable) |
| Repack references | Compresses .git/refs/ into packed-refs |
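The packing step in the table above can be observed in a throwaway repository. This is a minimal sketch, assuming git is installed and on the PATH; the temporary directory, the file name file.txt, and the demo identity are made up for illustration.

```shell
#!/bin/sh
set -e

# Throwaway repository in a temporary directory (hypothetical location).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "demo"
git config user.email "demo@example.com"
git config gc.auto 0   # keep auto-gc out of the way for this demo

# Create a few commits so loose objects accumulate.
for i in 1 2 3; do
  echo "version $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# Read the loose-object and packfile counts before and after gc.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_before=$(git count-objects -v | awk '/^packs:/ {print $2}')

git gc -q

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose objects: $loose_before -> $loose_after"
echo "packfiles:     $packs_before -> $packs_after"
```

Because every object in this repository is still reachable, nothing is pruned here. The loose objects simply move into a packfile.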
Auto-gc triggers:
Git runs git gc --auto automatically when:
- Loose objects exceed roughly 6,700 (gc.auto, default 6700)
- Packfiles exceed 50 (gc.autoPackLimit, default 50)
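To see how close a repository is to these triggers, the thresholds and current counts can be compared by hand. The sketch below is a rough approximation, not Git's own check: Git estimates the loose-object count internally, while this version reads the figures reported by git count-objects. The temporary repository and the variable names are illustrative assumptions.

```shell
#!/bin/sh
set -e

# Fresh throwaway repository (hypothetical location).
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Read the thresholds, falling back to the documented defaults when unset.
auto_limit=$(git config --get gc.auto || echo 6700)
pack_limit=$(git config --get gc.autoPackLimit || echo 50)

# Current counts as reported by count-objects.
loose=$(git count-objects -v | awk '/^count:/ {print $2}')
packs=$(git count-objects -v | awk '/^packs:/ {print $2}')

echo "loose objects: $loose (threshold: $auto_limit)"
echo "packfiles:     $packs (threshold: $pack_limit)"

# gc.auto set to 0 disables auto-gc entirely, so check that first.
if [ "$auto_limit" -gt 0 ] && { [ "$loose" -gt "$auto_limit" ] || [ "$packs" -gt "$pack_limit" ]; }; then
  verdict="auto-gc would likely run"
else
  verdict="below thresholds"
fi
echo "$verdict"
```

Running the same comparison inside a real project (skipping the mktemp and git init lines) gives a quick sense of whether the next commit or merge is likely to kick off a background gc.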
The expiry timeline:
| Reflog Entry | Reflog Expiry | Object Pruned |
|---|---|---|
| Reachable from the branch tip | 90 days | Never (still reachable) |
| Unreachable from the branch tip | 30 days | At the next gc after the entry expires, once the object is older than 2 weeks |
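The two stages of this timeline can be compressed into a few seconds for demonstration. The sketch below assumes git is installed; it uses --expire=now and --prune=now to skip the 30-day and 2-week waits, which permanently destroys data, so it is meant for a throwaway repository only. The branch name topic and the file name are made up for illustration.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "demo"
git config user.email "demo@example.com"
git config gc.auto 0

echo one > file.txt
git add file.txt
git commit -q -m "first"

# Make a commit on a side branch, then delete the branch.
# Afterwards only the HEAD reflog still references that commit.
git checkout -q -b topic
echo two > file.txt
git commit -q -am "second"
lost=$(git rev-parse HEAD)
git checkout -q -
git branch -q -D topic

# Stage 1: the reflog entry keeps the commit alive, even with --prune=now.
git gc -q --prune=now
if git cat-file -e "$lost" 2>/dev/null; then after_gc="present"; else after_gc="gone"; fi

# Stage 2: expire the reflog right away instead of waiting 30 days,
# then prune right away instead of waiting for gc.pruneExpire.
git reflog expire --expire=now --expire-unreachable=now --all
git gc -q --prune=now
if git cat-file -e "$lost" 2>/dev/null; then after_expire="present"; else after_expire="gone"; fi

echo "after gc with reflog intact: $after_gc"
echo "after reflog expiry + prune: $after_expire"
```

The point of the sketch is the ordering: pruning alone does not remove the commit while a reflog entry still points at it. Only after the entry expires does the object become eligible for removal.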
Real-Life Analogy
git gc is spring cleaning for your repo. Loose papers (objects) get filed into binders (packfiles). Old receipts (unreachable objects) get shredded after 30 days. The room is cleaner, smaller, and faster to navigate.
Why It Matters
- Performance: Packed repos are faster to read and transfer.
- Disk space: Removes duplicate and unreachable data.
- Recovery window: Unreachable objects survive ~30 days before gc removes them.
- Awareness: Knowing gc timelines tells you how long recovery is possible.
Code
# ─── Run garbage collection ───
git gc
# Packs, prunes, compresses
# ─── Aggressive gc (slower, more compression) ───
git gc --aggressive
# Better compression, takes longer
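# ─── Check what gc would prune ───
git prune --dry-run --expire=2.weeks.ago
# Lists unreachable objects older than 2 weeks without removing them
# (without --expire, every unreachable object is listed regardless of age)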
# ─── Configure expiry times ───
git config gc.reflogExpire "90 days"
git config gc.reflogExpireUnreachable "30 days"
git config gc.pruneExpire "2 weeks"
# ─── Check repo size ───
git count-objects -v
# count: 0 (loose objects)
# packs: 1 (packfiles)
# size-pack: 5432 (KB)
# ─── Disable auto-gc temporarily ───
git config gc.auto 0
# Re-enable: git config gc.auto 6700
Key Takeaways
- git gc packs objects, prunes unreachable data, and compresses references.
- Git runs auto-gc in the background, so you rarely need to run it manually.
- Unreachable objects survive ~30 days before gc permanently removes them.
- After gc prunes objects, recovery via git fsck is no longer possible.
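The last takeaway also implies the reverse: until pruning happens, git fsck can still locate a commit that no ref or reflog points to. The following is a minimal sketch of that recovery path, assuming git is installed; the branch names topic and recovered, the file name, and the temporary repository are hypothetical.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "demo"
git config user.email "demo@example.com"
git config gc.auto 0

echo one > file.txt
git add file.txt
git commit -q -m "first"

# Create a commit on a side branch, then delete the branch.
git checkout -q -b topic
echo two > file.txt
git commit -q -am "second"
lost=$(git rev-parse HEAD)
git checkout -q -
git branch -q -D topic

# Simulate an expired reflog, but do NOT prune yet.
git reflog expire --expire=now --all

# fsck reports the orphaned commit as "dangling commit <sha>".
found=$(git fsck 2>/dev/null | awk '/^dangling commit/ {print $3}')
echo "fsck found: $found"

# Attach a branch to it so it is reachable again and survives later gc runs.
git branch recovered "$found"
git gc -q --prune=now
recovered_content=$(git show recovered:file.txt)
echo "recovered file content: $recovered_content"
```

Had git gc --prune=now run before the git branch recovered step, fsck would have had nothing left to report.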
Interview Prep
- Q: What does git gc do? A: It packs loose objects into packfiles (with delta compression), prunes unreachable objects older than the configured expiry, expires old reflog entries, and compresses references. This reduces disk usage and improves performance.
- Q: How long are unreachable objects recoverable? A: Reflog entries for unreachable commits expire after 30 days (default), and until then the reflog keeps the objects alive. Once the entry expires, git gc can prune any object older than gc.pruneExpire (default 2 weeks, measured from the object's modification time rather than from the reflog expiry). Under default settings the window is therefore roughly 30 days plus the wait for the next gc run. After pruning, recovery is impossible.
- Q: When would you run git gc --aggressive? A: After importing a large repository, after removing large files from history (e.g., with git filter-branch), or when the repository has grown very large and you want maximum compression. It's slower but produces better results.