The Hook (The "Byte-Sized" Intro)
Every file you commit, every folder structure, every snapshot: Git doesn't store them as files. It stores them as objects in a content-addressable database, each identified by a 40-character hexadecimal fingerprint. It's like a library where every book has a barcode generated from its contents. Change one word, and the barcode changes. That's how Git makes any tampering with history detectable.
📖 What is Git Architecture and Objects?
Under the hood, Git is a content-addressable filesystem — a simple but powerful key-value store. It stores all data as four types of objects, each identified by a SHA-1 hash. Understanding these objects makes Git's behavior predictable and debuggable.
Conceptual Clarity
Git has exactly 4 types of objects:
| Object | What It Stores | Analogy |
|---|---|---|
| Blob | Raw file contents (no filename, no path) | A page of text |
| Tree | Directory listing — maps filenames to blobs and sub-trees | A table of contents |
| Commit | Snapshot — points to a tree + author + message + parent commit(s) | A dated, signed photo |
| Tag | A named, annotated pointer to another object (usually a commit) | A sticky note on a photo |
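As a rough illustration of the table, the sketch below builds a throwaway repository and asks Git for the type of one object of each kind. The user name, email, file name, and tag name are made up for the demo; only the git subcommands themselves are real.

```shell
# Throwaway repository; name, email, file, and tag are hypothetical demo values.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'hello\n' > index.html
git add index.html
git commit -q -m "Add homepage"
git tag -a v1.0 -m "First release"   # -a creates a tag object, not just a ref

# Resolve one object of each kind, then ask Git for its type.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:index.html)
tag_type=$(git cat-file -t v1.0)

echo "$commit_type $tree_type $blob_type $tag_type"
# prints: commit tree blob tag
```

Note that a tag created without -a (a lightweight tag) is only a ref and does not produce a tag object.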
How they connect:
- A commit points to a tree (the root folder)
- A tree points to blobs (files) and other trees (subfolders)
- Each object is identified by a SHA-1 hash — a 40-character fingerprint generated from the object's contents
Key insight: If two files have identical content, Git stores only ONE blob. Same content = same hash = same object. This is how Git stays efficient.
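The chain described above (commit, then tree, then blobs) and the single-blob claim can be checked with a small sketch. It assumes a throwaway repository with two hypothetical files that hold the same bytes; the identity values are placeholders.

```shell
# Throwaway repository; file names and identity are hypothetical demo values.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Same bytes under two names, one of them in a subfolder.
mkdir images
printf 'hello\n' > a.txt
printf 'hello\n' > images/b.txt
git add .
git commit -q -m "Add two identical files"

# Commit -> tree: the commit records the ID of the root tree.
git cat-file -p HEAD

# Tree -> blobs and sub-trees: list every file with the blob ID it points to.
git ls-tree -r HEAD

# Count distinct blob IDs across both paths (third column of ls-tree output).
unique_blobs=$(git ls-tree -r HEAD | awk '{print $3}' | sort -u | wc -l | tr -d ' ')
echo "unique blobs: $unique_blobs"
# prints: unique blobs: 1
```

Both paths appear in the listing, but they carry the same blob ID, so the content is stored once.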
Real-Life Analogy
Think of a Git repository as a library's catalog system:
- Blobs = The actual pages of text (content only, no titles)
- Trees = The shelf index cards (which book is on which shelf)
- Commits = The library log entries ("On Jan 5, Alice added 3 books to Shelf B")
- SHA-1 hashes = The ISBN barcodes (generated from content — if content changes, ISBN changes)
Why It Matters
- Integrity: SHA-1 hashes mean corruption is detectable. You can run git fsck to verify your entire repo.
- Deduplication: Identical files share one blob, saving space across branches and history.
- Immutability: Objects are never modified; new commits create new objects. Old objects stay in the database until they become unreachable and are garbage collected.
- Debugging: When something goes wrong, git cat-file lets you inspect any object directly.
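To make the integrity point concrete, here is a minimal sketch of where a blob's ID comes from. It assumes the default SHA-1 object format and the sha1sum tool from GNU coreutils (on macOS, shasum can usually stand in). Git hashes a short header, "blob", a space, the byte count, and a NUL byte, followed by the content itself.

```shell
# ID that Git assigns to the six bytes "hello\n".
git_id=$(printf 'hello\n' | git hash-object --stdin)

# Same ID computed by hand: SHA-1 over "blob <byte count>", a NUL byte, then the content.
manual_id=$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)

# Change a single character and the ID no longer matches.
changed_id=$(printf 'Hello\n' | git hash-object --stdin)

echo "git:     $git_id"
echo "manual:  $manual_id"
echo "changed: $changed_id"
```

The first two lines of output should be identical and the third should differ, which is why a flipped byte inside a stored object no longer matches the name it is stored under.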
Code
# Look at the latest commit object
git cat-file -p HEAD
# Output:
# tree f8d9e1a...
# parent e4f5a6b...
# author Alice <alice@example.com> 1706000000 +0000
# committer Alice <alice@example.com> 1706000000 +0000
#
# Add homepage
# Look at the tree (folder listing) that commit points to
git cat-file -p f8d9e1a
# Output:
# 100644 blob 3c7a8b2... index.html
# 100644 blob 9d4e5f6... style.css
# 040000 tree b2c3d4e... images
# Look at a blob (raw file content)
git cat-file -p 3c7a8b2
# Output: <raw contents of index.html>
# Check the type of any object
git cat-file -t a1b2c3d
# Output: commit
# Verify repository integrity
git fsck
# Output: Checking object directories: done.
The .git Directory
When you run git init, Git creates a hidden .git/ folder. Here's what's inside:
| Path | Purpose |
|---|---|
| .git/objects/ | All blobs, trees, commits, and tags |
| .git/refs/ | Branch and tag pointers |
| .git/HEAD | Points to the current branch |
| .git/config | Repository-level settings |
| .git/hooks/ | Automation scripts |
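The layout in the table can be explored in a scratch repository. The sketch below assumes a fresh git init with default settings. It reads HEAD, writes one blob, and then finds the file Git created for it. An object that has not yet been packed is stored under a folder named after the first two characters of its ID, with the remaining characters as the file name.

```shell
# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# HEAD is a small text file; in a fresh repository it names the initial branch.
head_content=$(cat .git/HEAD)
echo "$head_content"

# Top-level contents of the .git directory.
ls .git

# Write a blob into the database (-w) and locate the file Git created for it.
blob_id=$(printf 'hello\n' | git hash-object -w --stdin)
dir=$(echo "$blob_id" | cut -c1-2)
file=$(echo "$blob_id" | cut -c3-)
object_path=".git/objects/$dir/$file"
ls "$object_path"
```

The branch name inside HEAD depends on the init.defaultBranch setting, so it may read main, master, or something else.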
# Peek inside the objects directory
ls .git/objects/
# Output: folders named by first 2 chars of SHA-1 hashes
# e.g., 3c/ 9d/ a1/ b2/ f8/ ...
Key Takeaways
- Git stores data as 4 types of objects: blobs (file content), trees (directories), commits (snapshots), and tags (labels).
- Every object is identified by a SHA-1 hash — a fingerprint of its contents.
- Identical content = identical hash = Git stores it only once (deduplication).
- The .git/ directory is the brain of your repository: objects, refs, HEAD, and config all live there.
Interview Prep
- Q: What are the four types of objects in Git?
  A: Blob (file content), Tree (directory structure), Commit (snapshot pointing to a tree with metadata), and Tag (annotated label for a commit). Each is identified by a SHA-1 hash.
- Q: How does Git ensure data integrity?
  A: Every object is identified by a SHA-1 hash generated from its content. If even one byte changes, the hash changes, making corruption immediately detectable. You can verify integrity with git fsck.
- Q: If you commit the same file content in two different branches, does Git store it twice?
  A: No. Git is content-addressable: identical content produces the same SHA-1 hash, so Git stores only one blob object. Both branches reference the same object, saving space.