The Hook (The "Byte-Sized" Intro)
Git doesn't store files the way you think. There are no "files" in Git's object database: file content lives in blobs (raw bytes) and directory listings live in trees. A blob has no name, no path, no permissions. A tree is what connects names and modes to blobs. Rename a file? Same blob, new tree. Two identical files? One blob, referenced twice. This is a big part of how Git stays fast and space-efficient.
📖 What are Trees and Blobs?
Blobs store file content. Trees store directory structure, mapping filenames and permissions to blob/tree SHAs. Together they represent a snapshot of your project at any point in time.
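To make "a blob is just content" concrete, here is a minimal sketch that recomputes a blob ID by hand. It assumes the default SHA-1 object format and that GNU `sha1sum` is on your PATH (on macOS, `shasum` would be the likely substitute). Nothing about a filename goes into the hash.

```shell
# Assumes SHA-1 object format (Git's default) and GNU sha1sum.
content='hello'

# What Git hashes for a blob: "blob <size-in-bytes>\0" followed by the content.
# Here the content is "hello\n", which is 6 bytes.
manual=$(printf 'blob 6\0%s\n' "$content" | sha1sum | cut -d' ' -f1)

# What Git computes for the same bytes (no -w, so nothing is written).
via_git=$(printf '%s\n' "$content" | git hash-object --stdin)

echo "$manual"
echo "$via_git"
```

Both lines should print the same ID, which is why two files with identical bytes end up as one blob regardless of what they are called.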
Conceptual Clarity
Blob vs Tree:
| Feature | Blob | Tree |
|---|---|---|
| Stores | Raw file content | Directory entries |
| Contains | Just bytes (no metadata) | Mode + type + SHA + name |
| Named? | ❌ No filename | ✅ Lists filenames |
| Nested? | ❌ | ✅ Trees can contain trees |
Tree entry format:
<mode> <type> <sha> <name>
100644 blob abc123 README.md
100755 blob def456 run.sh
040000 tree 789abc src
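The entry format above can be tried out directly. The following is a rough sketch that builds a one-entry tree by hand in a throwaway repository; the temp directory and the README content are placeholders chosen for illustration.

```shell
# Work in a throwaway repository so nothing touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Write a blob for some placeholder README content and capture its ID.
blob=$(printf '# My Project\n' | git hash-object -w --stdin)

# mktree reads lines shaped like "<mode> <type> <sha><TAB><name>".
tree=$(printf '100644 blob %s\tREADME.md\n' "$blob" | git mktree)

# Pretty-print the tree: one entry naming README.md and pointing at the blob.
git cat-file -p "$tree"

# Ask Git what kind of object each ID refers to.
git cat-file -t "$blob"
git cat-file -t "$tree"
```

Note that the name `README.md` appears only in the tree; the blob was written before any name existed.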
File modes:
| Mode | Meaning |
|---|---|
| 100644 | Regular file |
| 100755 | Executable file |
| 120000 | Symbolic link |
| 040000 | Subdirectory (tree) |
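To see two of these modes on real entries, here is a small sketch in a scratch repository. The file names are placeholders, and the executable bit is set through the index rather than `chmod` so the result does not depend on the filesystem or on `core.fileMode`.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

printf '# notes\n' > README.md
printf '#!/bin/sh\necho hi\n' > run.sh

git add README.md run.sh
# Record the executable bit directly in the index.
git update-index --chmod=+x run.sh

# Stage listing, one line per entry: "<mode> <sha> <stage><TAB><path>"
git ls-files -s
```

The listing should show `100644` for `README.md` and `100755` for `run.sh`. The mode lives in the index entry (and later in the tree), not in the blob.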
Real-Life Analogy
Blobs are like pages ripped from a book — they have content but no title or page number. Trees are the table of contents — they list "Chapter 1 is page 5, Chapter 2 is page 12." Without the table of contents, you have content but no structure.
Why It Matters
- Deduplication: Identical files share the same blob — no wasted space.
- Rename detection: Renaming creates a new tree but reuses the blob.
- Snapshot efficiency: Unchanged files reuse existing blobs across commits.
- Foundation: Every commit points to a root tree — understanding trees unlocks Git.
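The snapshot-efficiency point can be checked with two commits. This is a sketch under assumptions: the directory layout, file contents, and the `demo` identity are all made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity so the commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir src docs
echo 'console.log(1)' > src/app.js
echo 'guide' > docs/guide.md
git add . && git commit -q -m 'first'

# Change only one file under src/.
echo 'console.log(2)' > src/app.js
git add . && git commit -q -m 'second'

# Unchanged directory: the same tree ID in both commits.
git rev-parse HEAD~1:docs HEAD:docs
# Changed directory and the root: new tree IDs.
git rev-parse HEAD~1:src HEAD:src
git rev-parse 'HEAD~1^{tree}' 'HEAD^{tree}'
```

The two `docs` IDs should match, while the `src` and root tree IDs differ: the second commit only had to write one new blob and the trees on the path from that file up to the root.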
Code
# ─── View the root tree of the latest commit ───
git cat-file -p HEAD^{tree}
# 100644 blob abc123 README.md
# 100755 blob def456 run.sh
# 040000 tree 789abc src
# ─── View a subtree ───
git cat-file -p 789abc
# 100644 blob aaa111 app.js
# 100644 blob bbb222 utils.js
# ─── View a blob (file content) ───
git cat-file -p abc123
# # My Project
# Welcome to the README.
# ─── Prove deduplication: identical files = same blob ───
echo "hello" > file1.txt
cp file1.txt file2.txt
git add file1.txt file2.txt
# List the staged entries (HEAD^{tree} would not include them until you commit)
git ls-files -s file1.txt file2.txt
# Both file1.txt and file2.txt point to the SAME blob SHA!
# ─── Create a blob manually ───
echo "hello" | git hash-object -w --stdin
# Writes the blob to the object database and prints its SHA
Key Takeaways
- Blobs store raw content with no name or path — just bytes.
- Trees map filenames + permissions to blob/tree SHAs — they ARE the directory structure.
- Identical files share the same blob — Git deduplicates automatically.
- Renaming a file changes the tree but reuses the same blob.
Interview Prep
- Q: How does Git store files internally? A: Git stores file content as blob objects (raw bytes, no filename). Directory structure is stored as tree objects that map filenames and permissions to blob/tree SHAs. A commit points to a root tree, which contains the complete project snapshot.
- Q: What happens when you rename a file in Git? A: The blob (file content) stays the same since the content hasn't changed. A new tree object is created with the new filename mapping to the same blob SHA. Git does not record the rename itself; commands such as git diff and git log --follow infer it afterwards by pairing a removed path with an added path whose blob is identical (or, for edited files, similar enough).
- Q: How does Git achieve space efficiency across commits? A: Unchanged files reuse existing blob objects; only modified files create new blobs. Trees also reuse unchanged subtrees. This means a commit with 1000 files where only 1 changed creates just 1 new blob and a chain of new trees up to the root, plus the commit object itself.
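The rename answer above can be demonstrated in a scratch repository. A minimal sketch, with placeholder file names and a made-up `demo` identity:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
# Placeholder identity so the commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'some content' > old.txt
git add old.txt && git commit -q -m 'add old.txt'
before=$(git rev-parse HEAD:old.txt)

git mv old.txt new.txt
git commit -q -m 'rename'
after=$(git rev-parse HEAD:new.txt)

# Same content, so the blob ID is unchanged by the rename.
echo "$before"
echo "$after"

# -M asks diff to pair removed and added paths; identical blobs score 100%.
status=$(git diff -M --name-status HEAD~1 HEAD)
echo "$status"
```

The last line should read `R100` followed by the old and new paths. The rename is inferred at diff time from the two trees; nothing in the commit says "renamed".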