Git Architecture and Objects

Beginner
12 minutes · 4.7 · Git

The Hook (The "Byte-Sized" Intro)

Every file you commit, every folder structure, every snapshot: Git doesn't store them as ordinary files. It stores them as objects in a content-addressable database, each identified by a 40-character hexadecimal fingerprint (a SHA-1 hash, Git's default). It's like a library where every book has a barcode generated from its contents. Change one word, and the barcode changes. That's what makes Git history tamper-evident: old content can't be altered without its hash changing too.

📖 What is Git Architecture and Objects?

Under the hood, Git is a content-addressable filesystem — a simple but powerful key-value store. It stores all data as four types of objects, each identified by a SHA-1 hash. Understanding these objects makes Git's behavior predictable and debuggable.
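To make the "key-value store" idea concrete, here is a minimal sketch that stores a value and reads it back by its key. It assumes a Linux-like shell with `git` and `sha1sum` installed and a repository using the default SHA-1 format; the temporary directory and the sample text are illustrative only.

```bash
set -eu

# Throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# "Put": store content in the object database; Git hands back the key.
key=$(printf 'hello world\n' | git hash-object -w --stdin)
echo "key:    $key"

# "Get": fetch the content back using only the key.
value=$(git cat-file -p "$key")
echo "value:  $value"

# The key is SHA-1 over a small header plus the content:
# "blob <size in bytes>", then a NUL byte, then the content itself.
manual=$(printf 'blob 12\0hello world\n' | sha1sum | cut -d' ' -f1)
echo "manual: $manual"
```

The `key` and `manual` lines should print the same hash, which shows that the key depends only on the object type, size, and content, not on any filename.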

Conceptual Clarity

Git has exactly 4 types of objects:

| Object | What It Stores | Analogy |
| --- | --- | --- |
| Blob | Raw file contents (no filename, no path) | A page of text |
| Tree | Directory listing: maps filenames to blobs and sub-trees | A table of contents |
| Commit | Snapshot: points to a tree + author + message + parent commit(s) | A dated, signed photo |
| Tag | A named, annotated pointer to a commit | A sticky note on a photo |
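The tag object type is the easiest one to overlook, because only annotated tags create one. The sketch below, which assumes `git` is installed and uses a hypothetical identity, file name, and tag names, compares an annotated tag with a lightweight one.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Helper that supplies a hypothetical identity for commits and tags.
as_alice() { git -c user.name='Alice' -c user.email='alice@example.com' "$@"; }

echo 'v1' > app.txt
git add app.txt
as_alice commit -q -m 'Initial commit'

# Annotated tag: creates a real tag object that points at the commit.
as_alice tag -a v1.0 -m 'First release'
# Lightweight tag: just a ref to the commit, no tag object is created.
git tag v1.0-light

annotated_type=$(git cat-file -t v1.0)
light_type=$(git cat-file -t v1.0-light)
echo "v1.0       is a $annotated_type"
echo "v1.0-light is a $light_type"

# Both names ultimately resolve to the same commit.
git rev-parse 'v1.0^{commit}' 'v1.0-light^{commit}'
```

The annotated tag reports type `tag`, while the lightweight tag reports `commit` because it is only a name for the commit itself.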

How they connect:

  • A commit points to a tree (the root folder)
  • A tree points to blobs (files) and other trees (subfolders)
  • Each object is identified by a SHA-1 hash — a 40-character fingerprint generated from the object's contents
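The chain described above can be followed by hand. This is a sketch under stated assumptions: `git` is installed, and the file names and the identity are made up to loosely mirror the lesson's example project.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical project layout.
mkdir images
echo '<h1>Home</h1>' > index.html
echo 'fake image bytes' > images/logo.png
git add .
git -c user.name='Alice' -c user.email='alice@example.com' \
    commit -q -m 'Add homepage'

# Commit -> root tree
commit=$(git rev-parse HEAD)
root_tree=$(git rev-parse 'HEAD^{tree}')

# Root tree -> blob (index.html) and sub-tree (images/)
git ls-tree "$root_tree"
index_blob=$(git rev-parse 'HEAD:index.html')
images_tree=$(git rev-parse 'HEAD:images')

# Each hash names an object of the expected type.
echo "commit     -> $(git cat-file -t "$commit")"
echo "root tree  -> $(git cat-file -t "$root_tree")"
echo "index.html -> $(git cat-file -t "$index_blob")"
echo "images/    -> $(git cat-file -t "$images_tree")"
```

Note that the filename `index.html` appears only in the tree listing; the blob itself holds nothing but the file's bytes.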

Key insight: If two files have identical content, Git stores only ONE blob. Same content = same hash = same object. This is how Git stays efficient.
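A quick way to check this claim is to stage two files with the same bytes and count the stored objects. The sketch assumes `git` is installed; `a.txt` and `b.txt` are hypothetical file names.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Two different filenames, byte-for-byte identical content.
echo 'same content' > a.txt
cp a.txt b.txt
git add a.txt b.txt

# Blob hashes recorded in the index for each path.
hash_a=$(git rev-parse ':a.txt')
hash_b=$(git rev-parse ':b.txt')
echo "a.txt -> $hash_a"
echo "b.txt -> $hash_b"

# Number of loose objects in the database (first field of the output).
object_count=$(git count-objects | cut -d' ' -f1)
echo "objects stored: $object_count"
```

Both paths report the same hash, and the object count is 1 even though two files were staged.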

Real-Life Analogy

Think of a Git repository as a library's catalog system:

  • Blobs = The actual pages of text (content only, no titles)
  • Trees = The shelf index cards (which book is on which shelf)
  • Commits = The library log entries ("On Jan 5, Alice added 3 books to Shelf B")
  • SHA-1 hashes = The ISBN barcodes (generated from content — if content changes, ISBN changes)

Visual Architecture

flowchart TD
    C1["📸 Commit: a1b2c3d<br/>Author: Alice<br/>Message: Add homepage<br/>Parent: e4f5g6h"] --> T1["📁 Tree: f8d9e1a"]
    T1 --> B1["📄 Blob: 3c7a8b2<br/>index.html"]
    T1 --> B2["📄 Blob: 9d4e5f6<br/>style.css"]
    T1 --> T2["📁 Tree: b2c3d4e<br/>images/"]
    T2 --> B3["📄 Blob: 1a2b3c4<br/>logo.png"]
    C1 --> C0["📸 Parent Commit: e4f5g6h"]
    style C1 fill:#0f3460,stroke:#53d8fb,color:#53d8fb
    style C0 fill:#1a1a2e,stroke:#53d8fb,color:#53d8fb
    style T1 fill:#1a1a2e,stroke:#ffd700,color:#ffd700
    style T2 fill:#1a1a2e,stroke:#ffd700,color:#ffd700
    style B1 fill:#1a1a2e,stroke:#e94560,color:#e94560
    style B2 fill:#1a1a2e,stroke:#e94560,color:#e94560
    style B3 fill:#1a1a2e,stroke:#e94560,color:#e94560

Why It Matters

  • Integrity: SHA-1 hashes mean any corruption is detectable. You can run git fsck to verify your entire repo.
  • Deduplication: Identical files share one blob, saving space across branches and history.
  • Immutability: Objects are never modified; new commits create new objects. Old objects stay in the database until they become unreachable and are garbage collected.
  • Debugging: When something goes wrong, git cat-file lets you inspect any object directly.
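The immutability and integrity points can be observed directly. The following is a minimal sketch, assuming `git` is installed; `notes.txt`, the commit messages, and the identity are hypothetical.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Helper that commits with a hypothetical identity.
commit_as_alice() {
    git -c user.name='Alice' -c user.email='alice@example.com' commit -q -m "$1"
}

echo 'version 1' > notes.txt
git add notes.txt
commit_as_alice 'First version'
old_blob=$(git rev-parse 'HEAD:notes.txt')

echo 'version 2' > notes.txt
git add notes.txt
commit_as_alice 'Second version'
new_blob=$(git rev-parse 'HEAD:notes.txt')

# The edit produced a new blob; the old one is untouched and still readable.
echo "old: $old_blob -> $(git cat-file -p "$old_blob")"
echo "new: $new_blob -> $(git cat-file -p "$new_blob")"

# Re-hash every object and check the links between them.
git fsck --no-progress
```

Editing the file never rewrote the first blob; Git simply added a second one, and `git fsck` confirms that every stored object still matches its hash.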

Code

bash
# Look at the latest commit object
git cat-file -p HEAD
# Output:
# tree f8d9e1a...
# parent e4f5g6h...
# author Alice <alice@example.com> 1706000000 +0000
# committer Alice <alice@example.com> 1706000000 +0000
#
# Add homepage

# Look at the tree (folder listing) that commit points to
git cat-file -p f8d9e1a
# Output:
# 100644 blob 3c7a8b2... index.html
# 100644 blob 9d4e5f6... style.css
# 040000 tree b2c3d4e... images

# Look at a blob (raw file content)
git cat-file -p 3c7a8b2
# Output: <raw contents of index.html>

# Check the type of any object
git cat-file -t a1b2c3d
# Output: commit

# Verify repository integrity
git fsck
# Output: Checking object directories: done.

The .git Directory

When you run git init, Git creates a hidden .git/ folder. Here's what's inside:

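| Path | Purpose |
| --- | --- |
| .git/objects/ | All blobs, trees, commits, and tags |
| .git/refs/ | Branch and tag pointers |
| .git/HEAD | Points to the current branch (or directly to a commit when HEAD is detached) |
| .git/config | Repository-level settings |
| .git/hooks/ | Automation scripts |

bash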
# Peek inside the objects directory
ls .git/objects/
# Output: folders named by first 2 chars of SHA-1 hashes (plus info/ and pack/)
# e.g., 3c/ 9d/ a1/ b2/ f8/ ...
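The two-character folder names come straight from the object hash. This sketch, which assumes `git` is installed and uses an illustrative sample string, derives the on-disk path of a freshly written loose object.

```bash
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Write one blob and capture its hash.
hash=$(printf 'hello world\n' | git hash-object -w --stdin)

# First 2 hex characters name the folder, the remaining 38 name the file.
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
path=".git/objects/$dir/$file"
ls -l "$path"

# Loose objects are zlib-compressed on disk, so read them through Git
# rather than with plain cat. -s prints the content size in bytes.
size=$(git cat-file -s "$hash")
echo "content size: $size bytes"

# HEAD is a small text file naming the current branch.
cat .git/HEAD
```

Splitting objects across folders by hash prefix keeps any single directory from growing too large. Older objects may later be moved into packfiles under `.git/objects/pack/`, so not every hash will have a loose file like this.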

Key Takeaways

  • Git stores data as 4 types of objects: blobs (file content), trees (directories), commits (snapshots), and tags (labels).
  • Every object is identified by a SHA-1 hash — a fingerprint of its contents.
  • Identical content = identical hash = Git stores it only once (deduplication).
  • The .git/ directory is the brain of your repository — objects, refs, HEAD, and config all live there.

Interview Prep

  • Q: What are the four types of objects in Git?
    A: Blob (file content), Tree (directory structure), Commit (snapshot pointing to a tree with metadata), and Tag (annotated label for a commit). Each is identified by a SHA-1 hash.

  • Q: How does Git ensure data integrity?
    A: Every object is identified by a SHA-1 hash generated from its content. If even one byte changes, the hash changes, making corruption detectable. You can verify integrity with git fsck.

  • Q: If you commit the same file content in two different branches, does Git store it twice?
    A: No. Git is content-addressable: identical content produces the same SHA-1 hash, so Git stores only one blob object. Both branches reference the same object, saving space.

Topics Covered

Git Fundamentals, Git Introduction

Tags

#git #architecture #objects #sha

Last Updated

2026-02-12