
Packfiles

Intermediate
8 minutes · 4.7 · Git

The Hook (The "Byte-Sized" Intro)

A large repo with 100,000 files could store 100,000 individual blob files in .git/objects/. That's slow to read, slow to transfer, and wastes disk space. Packfiles solve this: Git compresses thousands of loose objects into a single file, using delta compression so similar objects share storage. This is how a repo with years of history can fit in megabytes.

📖 What is a Packfile?

A packfile is a single binary file containing many Git objects, each zlib-compressed. Git also uses delta compression (storing only the differences between similar objects) to dramatically reduce size.
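To watch loose objects turn into a packfile, the sketch below builds a throwaway repository, makes three commits and runs git gc. It assumes bash and a git binary on the PATH; the scratch directory (from mktemp), the file notes.txt and the commit messages are illustrative only. Before gc, count reports the loose objects the commits wrote; afterwards it should drop to 0, with at least one pack listed.

```bash
#!/usr/bin/env bash
# Sketch: make a few commits (each writes loose objects), then pack them
# with git gc and compare what `git count-objects -v` reports.
set -euo pipefail

REPO=$(mktemp -d)                        # throwaway repo (illustrative)
git init -q "$REPO"
cd "$REPO"
git config user.name "Demo"              # local identity so commits work anywhere
git config user.email "demo@example.com"

for i in 1 2 3; do
  echo "version $i" > notes.txt          # notes.txt is a hypothetical file
  git add notes.txt
  git commit -q -m "commit $i"
done

echo "Before gc:"
git count-objects -v | grep -E '^(count|packs):'

git gc -q                                # pack loose objects into a packfile

echo "After gc:"
git count-objects -v | grep -E '^(count|packs):'
ls .git/objects/pack/                    # pack-<hash>.pack, pack-<hash>.idx, ...
```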

Conceptual Clarity

Loose objects vs Packfiles:

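| Feature | Loose Objects | Packfile |
|---|---|---|
| Location | .git/objects/ab/cdef... (one file each) | .git/objects/pack/pack-*.pack (with a matching .idx) |
| Compression | zlib, per object | zlib per object, plus delta compression across similar objects |
| Speed | Slow with many objects | Fast: one file to open and read |
| Transfer | One file per object | Sent as a single pack stream over the network |
| When created | git add, git commit | git gc; also built for git push, git fetch and git clone transfers |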
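The Location and When created rows for loose objects can be checked directly. This sketch stages one file and confirms that git add wrote a loose object at .git/objects/ followed by the first two hex characters of its ID, a slash, and the remaining characters. It assumes bash, git on the PATH and the default SHA-1 object format; the file hello.txt is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: `git add` writes a loose object whose path comes from its ID:
# .git/objects/<first 2 hex chars>/<remaining hex chars>
set -euo pipefail

REPO=$(mktemp -d)                        # throwaway repo (illustrative)
git init -q "$REPO"
cd "$REPO"

echo "hello packfiles" > hello.txt       # hello.txt is a hypothetical file
git add hello.txt                        # stores the blob as a loose object

SHA=$(git hash-object hello.txt)         # computes the same ID git add used
LOOSE=".git/objects/${SHA:0:2}/${SHA:2}"
ls -l "$LOOSE"                           # one zlib-compressed file for this blob
git cat-file -t "$SHA"                   # prints the object type: blob
```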

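Delta compression: Instead of storing full copies of similar files, Git stores one full copy (the base) and only the differences (deltas) for the others. Before zlib compression, a 10KB file with a 100-byte change takes roughly 10KB plus a delta of a little over 100 bytes, not 20KB. Git usually keeps the most recent version whole and stores older versions as deltas, because recent content is what you read most often.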
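A rough way to see a delta in practice, assuming bash and git on the PATH: commit a text file of about 9KB, append one 100-character line, commit again, and pack. The file big.txt and its contents are illustrative. In git verify-pack -v output, a deltified object carries two extra columns (chain depth and base SHA), and the closing summary counts objects per chain length. Which version Git picks as the base is a heuristic, so the sketch only expects that some delta exists.

```bash
#!/usr/bin/env bash
# Sketch: two versions of a ~9 KB file that differ by one appended line,
# packed with git gc, then inspected with git verify-pack.
set -euo pipefail

REPO=$(mktemp -d)                        # throwaway repo (illustrative)
git init -q "$REPO"
cd "$REPO"
git config user.name "Demo"
git config user.email "demo@example.com"

seq 1 2000 > big.txt                     # ~8.9 KB of text (hypothetical file)
git add big.txt && git commit -q -m "v1"

printf '%0100d\n' 0 >> big.txt           # append a line of 100 zeros
git add big.txt && git commit -q -m "v2"

git gc -q                                # delta bases are chosen while packing

# Blob entries: a deltified one has two extra columns (chain depth, base SHA).
git verify-pack -v .git/objects/pack/pack-*.idx | grep -E '^[0-9a-f]{40} blob'
# Summary lines such as "non delta: ..." and "chain length = 1: ..."
git verify-pack -v .git/objects/pack/pack-*.idx | grep -E 'non delta|chain length'
```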

Real-Life Analogy

Loose objects are like carrying each shirt separately in your hands. Packfiles are like vacuum-sealing all your clothes into one compressed bag. Delta compression is like packing only the differences: "same as the blue shirt, but with a red collar."

Visual Architecture

```mermaid
flowchart LR
    LOOSE["📦 Loose Objects<br/>1 file per object"] -->|"git gc"| PACK["📦 Packfile<br/>All objects in 1 file"]
    PACK --> INDEX["📋 Pack Index<br/>Fast SHA lookup"]
    style LOOSE fill:#2d1b1b,stroke:#e94560,color:#e94560
    style PACK fill:#1b2d1b,stroke:#53d8fb,color:#53d8fb
```

Why It Matters

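  • Storage: Compared with storing every version of every file in full, delta compression can shrink a long history many times over; the exact ratio depends on how similar successive versions are.
  • Network: git clone and git fetch transfer packfiles, not individual objects.
  • Performance: Reading one large file is faster than opening thousands of small ones.
  • Automatic: Commands such as git commit, git merge and git fetch run automatic housekeeping (git gc --auto), which packs loose objects once they pass a threshold; clones and fetches receive their objects as packfiles.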
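To confirm that network transfers use packs, this sketch clones a small repository through a file:// URL. With a plain local path Git would hardlink or copy the object files directly, whereas file:// forces the regular pack-based transport. It assumes bash and git on the PATH; the repository contents and the file log.txt are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: clone through a file:// URL so Git uses its normal pack-based
# transport, then look at what landed in the clone's object store.
set -euo pipefail

SRC=$(mktemp -d)                         # source repo (illustrative)
git init -q "$SRC"
git -C "$SRC" config user.name "Demo"
git -C "$SRC" config user.email "demo@example.com"
for i in 1 2 3; do
  echo "line $i" >> "$SRC/log.txt"       # log.txt is a hypothetical file
  git -C "$SRC" add log.txt
  git -C "$SRC" commit -q -m "commit $i"
done

DST="$(mktemp -d)/clone"
git clone -q "file://$SRC" "$DST"        # objects arrive as one pack stream

ls "$DST/.git/objects/pack/"             # the received .pack and its .idx
git -C "$DST" count-objects -v | grep -E '^(count|in-pack|packs):'
```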

Code

```bash
# ─── See loose objects (files under the two-character directories) ───
find .git/objects -path '*/objects/??/*' -type f | head -5
# .git/objects/ab/cdef1234...

# ─── See packfiles ───
ls .git/objects/pack/
# pack-abc123.pack  (the compressed objects)
# pack-abc123.idx   (the index for fast lookup)

# ─── Create a packfile manually ───
git gc
# Packs loose objects; prunes unreachable ones older than gc.pruneExpire (2 weeks by default)

# ─── View packfile stats ───
git verify-pack -v .git/objects/pack/pack-*.idx | tail -5
# Without tail, -v lists every object: SHA, type, size, size in pack, offset,
# plus chain depth and base SHA for deltas. The last lines summarize delta
# chains ("non delta: ...", "chain length = N: ...") and end with "...pack: ok"

# ─── Count objects ───
git count-objects -v
# count: 5         (loose objects)
# packs: 1         (packfiles)
# size-pack: 1234  (packfile size in KiB)
```

Key Takeaways

  • Loose objects are individually zlib-compressed files: simple, but slow at scale.
  • Packfiles compress many objects into one file with delta compression.
  • Git writes packfiles during gc (often triggered automatically) and sends objects as packs during push, fetch and clone.
  • Delta compression stores only differences between similar objects.

Interview Prep

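  • Q: What is delta compression in Git packfiles? A: Instead of storing full copies of similar objects, Git stores one complete object (the base) and only the byte-level differences (deltas) for similar objects. This dramatically reduces storage: before zlib compression, a 10KB file with a 100-byte edit takes roughly 10.1KB in total, not 20KB.

  • Q: When does Git create packfiles? A: During git gc (garbage collection); during git push, which builds a pack containing only the objects the remote lacks; and during git fetch and git clone, where the remote sends a packfile. Small fetches (fewer than transfer.unpackLimit objects, 100 by default) are unpacked into loose objects instead. Automatic gc (git gc --auto) repacks when the number of loose objects exceeds gc.auto (default: 6700) or the number of packs exceeds gc.autoPackLimit (default: 50).

  • Q: How does a packfile index work? A: The .idx file stores object SHAs in sorted order together with each object's byte offset within the .pack file. A 256-entry fan-out table, indexed by the first byte of the SHA, narrows the search to objects sharing that byte, and a binary search finds the exact entry. This gives O(log n) random access to any object without scanning the entire packfile.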
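The sorted-lookup idea behind the .idx file can be imitated from the shell. git show-index reads a .idx file on standard input and prints one "offset sha (crc)" line per object in the index's sorted order; the sketch loads those lines into arrays and binary-searches them for the HEAD commit. Git's real lookup consults a 256-entry fan-out table first; this sketch skips that and searches the whole list. It assumes bash, git on the PATH and a SHA-1 repository, and uses git repack -a -d so there is exactly one pack to read. The file file.txt and all variable names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: binary-search the sorted entries of a pack index for one object.
# Git also consults a 256-entry fan-out table first; this skips that step.
set -euo pipefail
export LC_ALL=C                          # plain byte-wise string comparison

REPO=$(mktemp -d)                        # throwaway repo (illustrative)
git init -q "$REPO"
cd "$REPO"
git config user.name "Demo"
git config user.email "demo@example.com"
for i in 1 2 3 4 5; do
  echo "rev $i" > file.txt               # file.txt is a hypothetical file
  git add file.txt && git commit -q -m "rev $i"
done
git repack -a -d -q                      # exactly one pack holding every object

IDX=$(ls "$REPO"/.git/objects/pack/pack-*.idx | head -n 1)

# git show-index prints "offset sha (crc)" per object, in sorted SHA order.
SHAS=(); OFFSETS=()
while read -r offset sha _crc; do
  OFFSETS+=("$offset")
  SHAS+=("$sha")
done < <(git show-index < "$IDX")

TARGET=$(git rev-parse HEAD)             # look up the tip commit
lo=0; hi=$(( ${#SHAS[@]} - 1 )); FOUND=""
while (( lo <= hi )); do                 # classic binary search: O(log n)
  mid=$(( (lo + hi) / 2 ))
  if [[ "${SHAS[mid]}" == "$TARGET" ]]; then
    FOUND=${OFFSETS[mid]}
    break
  elif [[ "${SHAS[mid]}" < "$TARGET" ]]; then
    lo=$(( mid + 1 ))
  else
    hi=$(( mid - 1 ))
  fi
done
echo "HEAD ($TARGET) starts at byte $FOUND of $(basename "$IDX" .idx).pack"
```

In Git itself, the fan-out entry for the SHA's first byte supplies the initial lo and hi bounds, so the binary search only covers objects that share that byte.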

Topics Covered

Git Internals, Storage

Tags

#git #internals #packfiles #compression

Last Updated

2026-02-13