The Hook (The "Byte-Sized" Intro)
A large repo with 100,000 files could store 100,000 individual blob files in .git/objects/. That's slow to read, slow to transfer, and wastes disk space. Packfiles solve this: Git compresses thousands of loose objects into a single file, using delta compression so similar objects share storage. This is how a repo with years of history can fit in megabytes.
📖 What is a Packfile?
A packfile is a single binary file containing many Git objects compressed together. Git uses delta compression — storing only the differences between similar objects — to dramatically reduce size.
Conceptual Clarity
Loose objects vs Packfiles:
| Feature | Loose Objects | Packfile |
|---|---|---|
| Location | .git/objects/ab/cdef... (one file each) | .git/objects/pack/pack-*.pack |
| Compression | zlib per object | Delta compression across objects, then zlib |
| Speed | Slow for many objects | Fast — single file read |
| Transfer | Individual objects | One file over the network |
| When created | On git add, git commit | On git gc, git push, git fetch |
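The loose-object column can be made concrete with a small sketch of how a single blob ends up at a path like .git/objects/ab/cdef.... It assumes a repository using the default SHA-1 object format; the function name `loose_object` is made up for this illustration and is not part of Git.

```python
import hashlib
import zlib


def loose_object(content: bytes, obj_type: str = "blob"):
    """Return (object_id, relative_path, compressed_bytes) for one loose object."""
    # Git hashes a short header followed by the raw content:
    # b"<type> <size>\0<content>"
    header = f"{obj_type} {len(content)}".encode() + b"\0"
    store = header + content
    object_id = hashlib.sha1(store).hexdigest()
    # The first two hex characters name the directory, the other 38 the file.
    path = f".git/objects/{object_id[:2]}/{object_id[2:]}"
    # Each loose object is zlib-compressed on its own, with no knowledge
    # of any other object in the repository.
    return object_id, path, zlib.compress(store)


oid, path, data = loose_object(b"hello\n")
print(path)
print(len(data), "compressed bytes on disk")
```

Because every object is compressed in isolation, two nearly identical files still cost two full compressed copies, which is the gap packfiles close.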
Delta compression: Instead of storing full copies of similar files, Git stores one full copy (the base) and only the differences (deltas) for the others. A 10KB file with a 100-byte change is stored as roughly 10KB plus a delta of about 100 bytes, not 20KB.
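The base-plus-delta idea can be sketched in a few lines of Python. Git's deltas are also built from "copy a range from the base" and "insert these new bytes" instructions, but everything else here is an assumption made for illustration: the tuple encoding, the helper names `make_delta` and `apply_delta`, and the use of `difflib` to find matching ranges are stand-ins, not Git's actual algorithm or on-disk format.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copy-from-base and insert-literal steps."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Reuse bytes that already exist in the base: (offset, length).
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes that appear only in the target are stored literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no step: those base bytes are simply never copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the target from the base and the delta steps."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


# A base version and a second version with one line edited.
base = b"".join(b"line %d\n" % i for i in range(200))
target = base.replace(b"line 100\n", b"LINE 100 changed\n")

delta = make_delta(base, target)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")

assert apply_delta(base, delta) == target
print(len(target), "bytes in a full copy;", literal_bytes, "literal bytes in the delta")
```

Only the literal bytes and a handful of small copy instructions need to be stored for the second version, which is where the savings come from.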
Real-Life Analogy
Loose objects are like carrying each shirt separately in your hands. Packfiles are like vacuum-sealing all your clothes into one compressed bag. Delta compression is like packing only the differences: "same as the blue shirt, but with a red collar."
Why It Matters
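- Storage: Delta compression can shrink a repo with a long history many times over; 10-100x is a rough range, and the actual savings depend on how similar the stored versions are.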
- Network: `git clone` and `git fetch` transfer packfiles, not individual objects.
- Performance: Reading one large file is faster than reading thousands of small files.
- Automatic: `git gc` creates packfiles automatically; fetches receive them.
Code
```bash
# ─── See loose objects ───
find .git/objects -type f | head -5
# .git/objects/ab/cdef1234...

# ─── See packfiles ───
ls .git/objects/pack/
# pack-abc123.pack (the compressed objects)
# pack-abc123.idx (the index for fast lookup)

# ─── Create a packfile manually ───
git gc
# Packs loose objects and prunes unreachable ones past the grace period (two weeks by default)

# ─── View packfile stats ───
git verify-pack -v .git/objects/pack/pack-*.idx | tail -5
# Lists objects with type, size, and delta depth; the last lines summarize delta chain lengths

# ─── Count objects ───
git count-objects -v
# count: 5 (loose objects)
# packs: 1 (packfiles)
# size-pack: 1234 (packfile size in KiB)
```
Key Takeaways
- Loose objects are individual compressed files — simple but slow at scale.
- Packfiles compress many objects into one file with delta compression.
- Git creates packfiles during `gc`, `push`, and `fetch` automatically.
- Delta compression stores only differences between similar objects.
Interview Prep
- Q: What is delta compression in Git packfiles? A: Instead of storing full copies of similar objects, Git stores one complete object (the base) and only the byte-level differences (deltas) for the others. This dramatically reduces storage: a 10KB file with a 100-byte edit takes about 10.1KB in total, not 20KB.
- Q: When does Git create packfiles? A: During `git gc` (garbage collection), `git push` (which packs only the objects the remote needs), and `git fetch` (the remote sends a packfile). Auto-gc is triggered by some commands once the loose object count exceeds a threshold (`gc.auto`, default: 6700).
- Q: How does a packfile index work? A: The `.idx` file is a sorted lookup table mapping object SHAs to their byte offsets within the `.pack` file. This enables O(log n) random access to any object without scanning the entire packfile.
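The lookup in the last answer can be sketched as a binary search over a sorted table. This is a simplified model, not a parser for the real binary format: an actual .idx file also carries a 256-entry fan-out table keyed by the first byte of the object ID, which narrows the search range before the binary search starts. The helper names and the offsets below are invented for illustration.

```python
import bisect


def build_index(entries):
    """Sort (object_id, offset) pairs by object ID, like the table in an .idx file."""
    table = sorted(entries)
    ids = [oid for oid, _ in table]
    offsets = [off for _, off in table]
    return ids, offsets


def find_offset(ids, offsets, object_id):
    """Binary-search the sorted IDs; return the pack offset, or None if absent."""
    pos = bisect.bisect_left(ids, object_id)
    if pos < len(ids) and ids[pos] == object_id:
        return offsets[pos]
    return None


# Hypothetical pack contents: the offsets are made up for this example.
ids, offsets = build_index([
    ("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391", 12),
    ("3b18e512dba79e4c8300dd08aeb37f8e728b8dad", 345),
    ("ce013625030ba8dba906f756967f9e9ca394464a", 6789),
])

print(find_offset(ids, offsets, "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"))
print(find_offset(ids, offsets, "0000000000000000000000000000000000000000"))
```

Once the offset is known, Git can seek straight to that position in the .pack file and decode just that object (plus its delta base, if it has one).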