Trees and Blobs

The Hook (The "Byte-Sized" Intro)

Git doesn't store files the way you think. There are no "files" in Git's database — just blobs (raw content) and trees (directory listings). A blob has no name, no path, no permissions. A tree is what connects names to blobs. Rename a file? Same blob, new tree. Two identical files? One blob, referenced twice. This is how Git stays fast and space-efficient.

📖 What are Trees and Blobs?

Blobs store file content. Trees store directory structure, mapping each filename and file mode to the SHA of a blob or of another tree. A commit points to one root tree, so together they represent a snapshot of your project at any point in time.
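
A minimal sketch of the "no name, just bytes" idea: two files with different names and identical content produce the same blob ID. The temporary repository and the file names are assumptions made up for illustration.

```bash
#!/usr/bin/env bash
set -eu

# Work in a throwaway repository (hypothetical temp location).
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Two files with different names but identical content.
printf 'hello\n' > greeting.txt
printf 'hello\n' > copy-of-greeting.md

# Without -w, hash-object only computes the blob ID; nothing is stored.
id_a="$(git hash-object greeting.txt)"
id_b="$(git hash-object copy-of-greeting.md)"

echo "greeting.txt        -> $id_a"
echo "copy-of-greeting.md -> $id_b"
```

The two printed IDs match because the filename is never part of what gets hashed; it only appears later, in a tree entry.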

Conceptual Clarity

Blob vs Tree:

| Feature | Blob | Tree |
| --- | --- | --- |
| Stores | Raw file content | Directory entries |
| Contains | Just bytes (no metadata) | Mode + type + SHA + name |
| Named? | ❌ No filename | ✅ Lists filenames |
| Nested? | ❌ | ✅ Trees can contain trees |

Tree entry format (as printed by `git cat-file -p` or `git ls-tree`):

<mode> <type> <sha>    <name>
100644 blob   abc123   README.md
100755 blob   def456   run.sh
040000 tree   789abc   src

File modes:

| Mode | Meaning |
| --- | --- |
| `100644` | Regular file |
| `100755` | Executable file |
| `120000` | Symbolic link |
| `040000` | Subdirectory (tree) |
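
The modes above can be observed directly. The sketch below assumes a POSIX filesystem where symlinks and the executable bit are supported; the file names are hypothetical. It stages four kinds of entries, writes the index out as a tree with `git write-tree`, and lists that tree.

```bash
#!/usr/bin/env bash
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q

printf 'notes\n' > notes.txt               # regular file
printf '#!/bin/sh\necho hi\n' > run.sh
chmod +x run.sh                            # executable file
ln -s notes.txt latest                     # symbolic link
mkdir src
printf 'x\n' > src/app.js                  # file inside a subdirectory

git add .

# write-tree turns the staged index into tree objects and prints the root ID.
tree_id="$(git write-tree)"
modes="$(git ls-tree "$tree_id")"
echo "$modes"

# A symlink is stored as a blob whose content is the link target.
link_target="$(git cat-file -p "$tree_id:latest")"
echo "latest -> $link_target"
```

Note that the symlink shows up with type `blob`: only its mode (`120000`) marks it as a link.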

Real-Life Analogy

Blobs are like pages ripped from a book — they have content but no title or page number. Trees are the table of contents — they list "Chapter 1 is page 5, Chapter 2 is page 12." Without the table of contents, you have content but no structure.

Visual Architecture

flowchart TD
    ROOT["📁 Root Tree"] --> README["📄 Blob: README.md 100644"]
    ROOT --> SRC["📁 Tree: src/ 040000"]
    ROOT --> SCRIPT["📄 Blob: run.sh 100755"]
    SRC --> APP["📄 Blob: app.js 100644"]
    SRC --> UTIL["📄 Blob: utils.js 100644"]
    style ROOT fill:#1a1a2e,stroke:#ffd700,color:#ffd700
    style SRC fill:#1a1a2e,stroke:#ffd700,color:#ffd700
    style README fill:#1b2d1b,stroke:#53d8fb,color:#53d8fb
    style APP fill:#1b2d1b,stroke:#53d8fb,color:#53d8fb
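
The same shape can be reproduced in a scratch repository. This is a sketch with placeholder file contents and a throwaway commit identity (both assumptions); `git ls-tree -r -t` walks from the root tree into `src` and prints every entry along the way.

```bash
#!/usr/bin/env bash
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q

# Recreate the layout from the diagram with placeholder content.
printf '# My Project\n' > README.md
printf '#!/bin/sh\necho running\n' > run.sh
chmod +x run.sh
mkdir src
printf 'console.log("app");\n' > src/app.js
printf 'export const noop = () => {};\n' > src/utils.js

git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "snapshot"

# -r recurses into subtrees; -t also prints the tree entries themselves.
walk="$(git ls-tree -r -t HEAD)"
echo "$walk"

# The src entry in the root tree is itself a tree object.
src_type="$(git cat-file -t HEAD:src)"
echo "HEAD:src is a $src_type"
```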

Why It Matters

Deduplication: Identical files share the same blob — no wasted space.
Cheap renames: Renaming creates a new tree but reuses the blob, so an exact rename can be spotted by matching blob SHAs.
Snapshot efficiency: Unchanged files reuse existing blobs across commits.
Foundation: Every commit points to a root tree — understanding trees unlocks Git.
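
The rename point can be checked with a short experiment. This is a sketch, not a canonical workflow: the `commit` helper, the identity it passes, and the file names are all hypothetical.

```bash
#!/usr/bin/env bash
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical helper so the demo does not depend on global Git identity.
commit() {
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

printf 'hello\n' > old-name.txt
git add .
commit "add file"

git mv old-name.txt new-name.txt
commit "rename file"

# <commit>:<path> resolves to the object stored at that path.
blob_before="$(git rev-parse HEAD~1:old-name.txt)"
blob_after="$(git rev-parse HEAD:new-name.txt)"
tree_before="$(git rev-parse 'HEAD~1^{tree}')"
tree_after="$(git rev-parse 'HEAD^{tree}')"

echo "blob before: $blob_before"
echo "blob after:  $blob_after"
echo "tree before: $tree_before"
echo "tree after:  $tree_after"
```

The blob ID is unchanged across the rename, while the root tree ID differs because the name it records is different.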

Code

bash

# ─── View the root tree of the latest commit ───
git cat-file -p 'HEAD^{tree}'   # quoted so shells like zsh don't expand ^ or {}
# 100644 blob abc123    README.md
# 100755 blob def456    run.sh
# 040000 tree 789abc    src

# ─── View a subtree ───
git cat-file -p 789abc
# 100644 blob aaa111    app.js
# 100644 blob bbb222    utils.js

# ─── View a blob (file content) ───
git cat-file -p abc123
# # My Project
# Welcome to the README.

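# ─── Prove deduplication: identical files = same blob ───
echo "hello" > file1.txt
cp file1.txt file2.txt
git add .
# HEAD's tree only changes after a commit, so inspect the staged entries instead
git ls-files -s file1.txt file2.txt
# 100644 ce013625030ba8dba906f756967f9e9ca394464a 0	file1.txt
# 100644 ce013625030ba8dba906f756967f9e9ca394464a 0	file2.txt
# Both file1.txt and file2.txt point to the SAME blob SHA!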

# ─── Create a blob manually ───
echo "hello" | git hash-object -w --stdin
# Writes the blob and returns its SHA
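
Going one step further than a hand-made blob, a tree can also be assembled by hand. The sketch below uses the plumbing command `git mktree`, which reads entries in `ls-tree` format; the file names are hypothetical. It then checks that staging the same files the normal way yields the identical tree ID.

```bash
#!/usr/bin/env bash
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q

# Store one blob, then reference it twice under different names.
blob_id="$(printf 'hello\n' | git hash-object -w --stdin)"

# mktree input format: <mode> SP <type> SP <id> TAB <name>
tree_id="$(printf '100644 blob %s\tfile1.txt\n100644 blob %s\tfile2.txt\n' \
  "$blob_id" "$blob_id" | git mktree)"

git cat-file -t "$tree_id"
git cat-file -p "$tree_id"

# Same result via the usual route: stage real files and write the index as a tree.
printf 'hello\n' > file1.txt
cp file1.txt file2.txt
git add file1.txt file2.txt
index_tree="$(git write-tree)"
echo "manual tree: $tree_id"
echo "index tree:  $index_tree"
```

Both routes produce the same tree ID because a tree, like a blob, is identified purely by its content: the list of modes, names, and object IDs.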

Key Takeaways

Blobs store raw content with no name or path — just bytes.
Trees map filenames + permissions to blob/tree SHAs — they ARE the directory structure.
Identical files share the same blob — Git deduplicates automatically.
Renaming a file changes the tree but reuses the same blob.

Interview Prep

Q: How does Git store files internally? A: Git stores file content as blob objects (raw bytes, no filename). Directory structure is stored as tree objects that map filenames and permissions to blob/tree SHAs. A commit points to a root tree, which contains the complete project snapshot.
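Q: What happens when you rename a file in Git? A: The blob (file content) stays the same since the content hasn't changed. A new tree object is created with the new filename mapping to the same blob SHA. Git does not record the rename itself; commands such as `git diff` and `git log` infer it afterwards, matching identical blob SHAs for exact renames and falling back to a content-similarity heuristic when the file was also edited.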
Q: How does Git achieve space efficiency across commits? A: Unchanged files reuse existing blob objects — only modified files create new blobs. Trees also reuse unchanged subtrees. This means a commit with 1000 files where only 1 changed creates just 1 new blob and a chain of new trees up to the root.
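
The last answer can be demonstrated on a small scale. In this sketch (directory names, contents, and the `commit` helper are assumptions), only `src/app.js` changes between two commits, so the `docs` subtree is reused while `src` and the root tree get new IDs.

```bash
#!/usr/bin/env bash
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q

# Hypothetical helper so the demo does not depend on global Git identity.
commit() {
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"
}

mkdir docs src
printf 'guide\n' > docs/guide.md
printf 'v1\n' > src/app.js
git add .
commit "first"

# Change exactly one file in one subdirectory.
printf 'v2\n' > src/app.js
git add .
commit "second"

docs_before="$(git rev-parse HEAD~1:docs)"
docs_after="$(git rev-parse HEAD:docs)"
src_before="$(git rev-parse HEAD~1:src)"
src_after="$(git rev-parse HEAD:src)"

echo "docs tree: $docs_before -> $docs_after (reused)"
echo "src tree:  $src_before -> $src_after (rewritten)"
```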