SHA Hashing

Intermediate
7 minutes · 4.7 · Git

The Hook (The "Byte-Sized" Intro)

Every commit SHA you see, such as the abbreviated a1b2c3d, isn't random. It's a fingerprint computed from the object's content using SHA-1. Change one character in a file and its hash changes completely; the same content always produces the same hash. This is how Git guarantees data integrity: if any bit is corrupted, the hash no longer matches and Git knows something is wrong.

📖 What is SHA Hashing?

SHA-1 is a cryptographic hash function that maps input of any size to a 160-bit digest, which Git writes as a 40-character hexadecimal string. Git uses it to create identifiers for every object (blob, tree, commit, tag).

Conceptual Clarity

SHA properties in Git:

| Property | What It Means |
| --- | --- |
| Deterministic | Same input → always the same hash |
| Unique | Different content → different hash (virtually guaranteed) |
| Fixed length | Always 40 hex chars (160 bits) |
| One-way | Can't reverse a hash to get the content |
| Avalanche | Change 1 byte → completely different hash |
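
The properties above can be checked directly from a shell. This is a minimal sketch that hashes raw strings, so the digests differ from Git object IDs (Git adds a header before hashing). It assumes `sha1sum` from GNU coreutils is installed, and `sha1_of` is a hypothetical helper name used only for illustration.

```bash
# Hypothetical helper: SHA-1 of a string, printing only the hex digest.
# Assumes GNU coreutils `sha1sum`; on macOS, `shasum -a 1` is a close stand-in.
sha1_of() {
    printf '%s' "$1" | sha1sum | cut -d' ' -f1
}

a=$(sha1_of "Hello World")
b=$(sha1_of "Hello World")
c=$(sha1_of "Hello World!")
empty=$(sha1_of "")

# Deterministic: hashing the same input twice gives the same digest.
if [ "$a" = "$b" ]; then echo "deterministic: same digest both times"; fi

# Fixed length: 40 hex characters for any input, even an empty one.
echo "digest lengths: ${#a} and ${#empty}"

# Avalanche: one extra character yields an unrelated-looking digest.
echo "Hello World  -> $a"
echo "Hello World! -> $c"
```

The one-way property can't be demonstrated by running code; it is a claim about how hard it is to go from a digest back to an input.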

What gets hashed for each object type:

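| Object | Input to SHA-1 |
| --- | --- |
| Blob | `blob <size>\0<content>` |
| Tree | `tree <size>\0<entries>` |
| Commit | `commit <size>\0<tree+parents+author+committer+message>` |
| Tag | `tag <size>\0<object+type+tag name+tagger+message>` |

In every case `<size>` is the length of the content in bytes, written in decimal, and `\0` is a single NUL byte.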
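
To make the blob row concrete, here is a small sketch that rebuilds a blob ID without calling Git: it writes the header, a NUL byte, and the file's bytes, then hashes the result. `git_blob_hash` is a hypothetical helper name, and the sketch assumes `sha1sum` is available.

```bash
# Sketch: reproduce Git's blob ID for a file without calling Git.
# `git_blob_hash` is a hypothetical helper; assumes `sha1sum` is installed.
git_blob_hash() {
    # Size is the content length in bytes, written in decimal.
    size=$(wc -c < "$1" | tr -d ' ')
    # Header "blob <size>" plus a NUL byte, followed by the raw file content.
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

demo_file=$(mktemp)
printf 'Hello World' > "$demo_file"
git_blob_hash "$demo_file"      # compare with: git hash-object "$demo_file"

# An empty file hashes just "blob 0\0", the well-known empty-blob ID.
: > "$demo_file"
git_blob_hash "$demo_file"      # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
rm -f "$demo_file"
```

Reading from a file rather than a shell variable keeps the helper safe for binary content, since shell variables cannot hold NUL bytes.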

Real-Life Analogy

SHA hashing is like a fingerprint scanner. Every person has a unique fingerprint. If you scan the same finger twice, you get the same result. If someone swaps in a different finger, the scan won't match. Git "fingerprints" every object to ensure nothing has been tampered with.

Visual Architecture

```mermaid
flowchart LR
    CONTENT["📄 File Content"] --> HASH["🔐 SHA-1"]
    HASH --> SHA["a1b2c3d4e5f6..."]
    CONTENT2["📄 Same Content"] --> HASH2["🔐 SHA-1"]
    HASH2 --> SHA2["a1b2c3d4e5f6..."]
    style SHA fill:#0f3460,stroke:#53d8fb,color:#53d8fb
    style SHA2 fill:#0f3460,stroke:#53d8fb,color:#53d8fb
```

Why It Matters

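  • Data integrity: Corruption is detected whenever Git verifies an object (for example with git fsck), because the recomputed hash won't match.
  • Deduplication: Identical files share the same hash, so their content is stored only once.
  • Unique addressing: Identical content gets the same ID in every repository, and different content gets a different ID (barring a hash collision).
  • Distributed: No central authority is needed to assign IDs.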
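
Deduplication and integrity checking both fall out of naming data by its hash. The following is a toy model of that idea, not Git's actual storage: real Git also prepends a type header, compresses objects with zlib, and spreads them across subdirectories. The `put` and `verify` helpers and the directory layout are hypothetical.

```bash
# Toy content-addressed store: each file is saved under the SHA-1 of its bytes.
# `put` and `verify` are hypothetical helpers; assumes `sha1sum` is installed.
store=$(mktemp -d)

put() {
    id=$(sha1sum < "$1" | cut -d' ' -f1)
    cp "$1" "$store/$id"
    echo "$id"
}

verify() {
    # Recompute the hash of the stored bytes and compare it with the name.
    actual=$(sha1sum < "$store/$1" | cut -d' ' -f1)
    [ "$actual" = "$1" ]
}

work=$(mktemp -d)
printf 'same bytes\n' > "$work/a.txt"
printf 'same bytes\n' > "$work/b.txt"

id_a=$(put "$work/a.txt")
id_b=$(put "$work/b.txt")

# Deduplication: identical content maps to one ID and one stored file.
if [ "$id_a" = "$id_b" ]; then echo "both files share one id"; fi
echo "objects in store: $(ls "$store" | wc -l | tr -d ' ')"

# Integrity: overwrite the stored bytes and the name no longer matches.
if verify "$id_a"; then echo "object intact"; fi
printf 'tampered\n' > "$store/$id_a"
verify "$id_a" || echo "corruption detected"
```

Because the ID is derived from the bytes alone, two machines that never communicate still arrive at the same name for the same content, which is what makes the scheme work without a central authority.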

Code

```bash
# ─── Compute a blob hash manually ───
echo -n "Hello World" | git hash-object --stdin
# 5e1c309dae7f45e0f39b1bf3ac3cd9db12e7d689

# ─── Same content always = same hash ───
echo -n "Hello World" | git hash-object --stdin
# 5e1c309dae7f45e0f39b1bf3ac3cd9db12e7d689 (identical!)

# ─── Different content = different hash ───
echo -n "Hello World!" | git hash-object --stdin
# c57eff55ebc0c54973903af5f72bac72762cf4f4 (completely different!)

# ─── View a commit's full SHA (run inside a repository) ───
git rev-parse HEAD
# a1b2c3d4e5f6789... (40 characters; placeholder value)

# ─── Short SHA (at least 7 chars by default; Git lengthens it if needed to stay unique) ───
git rev-parse --short HEAD
# a1b2c3d
```

Key Takeaways

  • Every Git object gets a SHA-1 hash computed from its content.
  • Same content always produces the same hash — enabling deduplication.
  • Any change to content produces a completely different hash — enabling integrity checks.
  • Short SHAs (7+ chars) are used for display; full SHAs (40 chars) are stored internally.
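
Why "7+" rather than exactly 7? A short SHA only works while it matches a single object. The sketch below illustrates the idea with a made-up list of IDs; it is not Git's actual abbreviation algorithm, and `short_id` is a hypothetical helper.

```bash
# Hypothetical helper: shortest prefix (at least 7 chars) of the full ID $1
# that matches at most one line in the ID list file $2.
short_id() {
    len=7
    while [ "$len" -lt 40 ]; do
        prefix=$(printf '%s' "$1" | cut -c1-"$len")
        # grep -c exits non-zero on zero matches, so guard it with `|| true`.
        matches=$(grep -c "^$prefix" "$2" || true)
        if [ "$matches" -le 1 ]; then break; fi
        len=$((len + 1))
    done
    printf '%s' "$1" | cut -c1-"$len"
}

# Made-up IDs: the first two share their first 8 characters.
ids=$(mktemp)
cat > "$ids" <<'EOF'
a1b2c3d4e5f60718293a4b5c6d7e8f9012345678
a1b2c3d4ffff0718293a4b5c6d7e8f9012345678
5e1c309dae7f45e0f39b1bf3ac3cd9db12e7d689
EOF

short_id 5e1c309dae7f45e0f39b1bf3ac3cd9db12e7d689 "$ids"   # 5e1c309
short_id a1b2c3d4e5f60718293a4b5c6d7e8f9012345678 "$ids"   # a1b2c3d4e
```

The second call needs nine characters because seven or eight would still match two IDs.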

Interview Prep

  • Q: Why does Git use SHA-1 hashing? A: SHA-1 provides content-addressable storage (same content = same address), data integrity verification (corruption changes the hash), and globally unique identifiers without a central authority — all essential for a distributed version control system.

  • Q: What makes SHA hashes "content-addressable"? A: The hash IS the address. You don't assign IDs — the content determines its own ID. This means the same file in different repositories, branches, or commits always has the same hash, enabling efficient deduplication and comparison.

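  • Q: Is SHA-1 still secure for Git? A: SHA-1 has known cryptographic weaknesses: practical collision attacks have been demonstrated. Git mitigates this by using a SHA-1 implementation that detects known collision attack patterns, and it supports SHA-256 as an alternative object format as part of an ongoing transition. For everyday content addressing and corruption detection, the risk is extremely low.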
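
To get a feel for what the SHA-256 transition changes, the sketch below hashes the same blob input with both functions. It assumes the object header stays the same and only the hash function differs, and that `sha1sum` and `sha256sum` are installed; `blob_input` is a hypothetical helper.

```bash
# Hypothetical helper: emit the bytes Git would hash for a blob
# containing "Hello World" (header, NUL byte, then content).
blob_input() {
    printf 'blob 11\0Hello World'
}

sha1_id=$(blob_input | sha1sum | cut -d' ' -f1)
sha256_id=$(blob_input | sha256sum | cut -d' ' -f1)

# SHA-1 gives 160 bits (40 hex chars); SHA-256 gives 256 bits (64 hex chars).
echo "SHA-1:   $sha1_id (${#sha1_id} hex chars)"
echo "SHA-256: $sha256_id (${#sha256_id} hex chars)"
```

The longer digest is the visible difference; the content-addressing model itself stays the same.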

Topics Covered

Git Internals, Hashing

Tags

#git #sha #hashing #internals

Last Updated

2026-02-13