Large Repo Hygiene

Beginner
7 minutes · 4.7 · Git

The Hook (The "Byte-Sized" Intro)

A repo doesn't get big overnight. It happens one 50MB video, one committed node_modules, one accidental database dump at a time. And here's the catch: Git never forgets. Even if you delete the file in the next commit, the blob stays in history, and every full clone keeps downloading it until someone rewrites that history. Prevention is far easier than cleanup. These hygiene habits keep your repo lean from day one.

📖 What is Large Repo Hygiene?

Habits and practices that prevent repositories from growing unnecessarily large, keeping Git operations fast and clone times reasonable.

Conceptual Clarity

The hygiene checklist:

| # | Practice | Why |
|---|----------|-----|
| 1 | Comprehensive .gitignore | Keeps build artifacts, dependencies, and OS files out |
| 2 | Use Git LFS for binaries | Large files are stored outside the Git object database; the repo keeps only small pointers |
| 3 | No committed dependencies | node_modules and .venv belong in .gitignore |
| 4 | No secrets in history | Once committed, secrets persist in history |
| 5 | Review large file additions | PR checks can catch oversized files |
| 6 | Prune stale branches | fetch.prune true removes stale remote-tracking branches on every fetch |
| 7 | Regular git gc | Compresses objects and removes unreachable data |
| 8 | Monitor repo size | Track growth before it becomes a problem |
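
Items 6 and 7 of the checklist can be combined into one small maintenance routine. The following is a minimal sketch, not an official recipe: the function name `tidy_repo` is hypothetical, and it assumes Git 1.8.5 or newer for `git -C`.

```bash
# Hypothetical helper: enable automatic pruning and compact the object store.
tidy_repo() {
  repo=${1:-.}

  # Drop stale remote-tracking branches on every future fetch
  git -C "$repo" config fetch.prune true

  # Only fetch when at least one remote is configured
  if [ -n "$(git -C "$repo" remote)" ]; then
    git -C "$repo" fetch --all --prune --quiet
  fi

  # Repack objects and remove unreachable data past the grace period
  git -C "$repo" gc --quiet
}
```

Running `tidy_repo` with no argument acts on the current directory; `tidy_repo ~/code/my-project` (an example path) acts on another clone.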

Common repo bloat sources:

| Source | Size Impact | Prevention |
|--------|-------------|------------|
| node_modules/ | 500MB+ | .gitignore |
| Video/image assets | 100MB+ per file | Git LFS |
| Database dumps | 50MB+ | .gitignore |
| Build artifacts | 50-500MB | .gitignore |
| IDE files | <5MB but noisy diffs | .gitignore |
| Accidental binaries | Varies | Pre-commit hook |
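
The prevention column above maps to two small files. Here is a sketch in a throwaway repo, under stated assumptions: the directory names, dump extensions, and media extensions are illustrative examples, and the `.gitattributes` lines are the kind of entry that `git lfs track "*.mp4"` records. Git LFS itself still has to be installed for those attributes to take effect on commit.

```bash
# Work in a temporary repo so no real project is touched
demo=$(mktemp -d)
git init -q "$demo"

cat > "$demo/.gitignore" << 'EOF'
# Dependencies
node_modules/
.venv/
# Build artifacts (directory names are assumptions)
dist/
build/
# Database dumps (extensions are assumptions)
*.dump
*.sql.gz
# IDE and OS files
.idea/
.vscode/
.DS_Store
Thumbs.db
EOF

# Route large media through Git LFS (extensions are assumptions)
cat > "$demo/.gitattributes" << 'EOF'
*.mp4 filter=lfs diff=lfs merge=lfs -text
*.psd filter=lfs diff=lfs merge=lfs -text
EOF

# Create a sample dependency file, then ask Git how it would treat each path
mkdir -p "$demo/node_modules/left-pad"
: > "$demo/node_modules/left-pad/index.js"
git -C "$demo" check-ignore -q node_modules/left-pad/index.js && echo "ignored"
git -C "$demo" check-attr filter -- assets/intro.mp4
```

`git check-ignore` and `git check-attr` only evaluate the patterns, so they are a cheap way to confirm a rule before anything is committed.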

Real-Life Analogy

Repo hygiene is like kitchen hygiene. Clean as you cook (ignore files, use LFS) and the kitchen stays usable. Let dishes pile up (commit binaries, skip .gitignore) and eventually nobody can work in there.

Visual Architecture

```mermaid
flowchart TD
    COMMIT["Every commit"] --> CHECK{"Large file?"}
    CHECK -->|"Binary > 1MB"| LFS["📎 Git LFS"]
    CHECK -->|"Build artifact"| IGNORE["🚫 .gitignore"]
    CHECK -->|"Source code"| GIT["📦 Git"]
    style LFS fill:#1a1a2e,stroke:#ffd700,color:#ffd700
    style IGNORE fill:#2d1b1b,stroke:#e94560,color:#e94560
    style GIT fill:#1b2d1b,stroke:#53d8fb,color:#53d8fb
```
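
The decision in the diagram can be approximated in a few lines of shell. This is only a sketch: `classify_path` is a hypothetical helper, the artifact patterns are assumptions, and it uses file size alone as a stand-in for "binary", which a real check would refine.

```bash
# Threshold taken from the "Binary > 1MB" branch of the diagram (1 MiB here)
LFS_THRESHOLD=1048576

# Hypothetical helper: prints "gitignore", "lfs", or "git" for a given path
classify_path() {
  path=$1

  # Build artifacts and dependencies: keep them out of the repo entirely
  case "$path" in
    node_modules/*|*/node_modules/*|dist/*|build/*|*.o|*.pyc)
      echo "gitignore"
      return
      ;;
  esac

  # Arithmetic expansion strips the padding some wc implementations add
  if [ -f "$path" ]; then
    size=$(( $(wc -c < "$path") ))
  else
    size=0
  fi

  if [ "$size" -gt "$LFS_THRESHOLD" ]; then
    echo "lfs"
  else
    echo "git"
  fi
}
```

For example, `classify_path assets/intro.mp4` prints `lfs` when that file exists and is larger than 1 MiB, and `git` otherwise.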

Why It Matters

  • Clone speed: A 5GB repo can take many minutes to clone on a typical connection; a 50MB repo usually takes seconds.
  • CI cost: Large repos increase build times and storage costs.
  • Permanent: Git keeps a committed file in history until that history is rewritten, so prevention is key.
  • Team friction: Nobody wants to wait 10 minutes to clone a repo.

Code

```bash
# ─── Monitor repo size ───
git count-objects -vH
# count: 0
# size: 0 bytes
# size-pack: 45.2 MiB    ← Total compressed size

# ─── Find large files in history ───
git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
  | sed -n 's/^blob //p' \
  | sort -rnk2 \
  | head -10
# Shows the 10 largest blobs ever committed

# ─── Pre-commit hook to block large files ───
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/sh
MAX_SIZE=5242880  # 5MB in bytes

# Only added, copied, or modified files; deleted files have no staged blob.
# Reading line by line keeps paths with spaces intact.
git diff --cached --name-only --diff-filter=ACM | while IFS= read -r file; do
  # Size of the staged blob, which may differ from the working-tree copy
  size=$(git cat-file -s ":$file" 2>/dev/null || echo 0)
  if [ "$size" -gt "$MAX_SIZE" ]; then
    echo "❌ $file is $(($size / 1048576))MB. Use Git LFS for files > 5MB."
    exit 1
  fi
done || exit 1  # the loop runs in a subshell, so propagate its failure
EOF
chmod +x .git/hooks/pre-commit

# ─── Clean up (if damage is already done) ───
# Use BFG Repo Cleaner to remove large files from history:
# java -jar bfg.jar --strip-blobs-bigger-than 10M repo.git
# git reflog expire --expire=now --all && git gc --prune=now --aggressive
```

Key Takeaways

  • Prevention > cleanup. Once a large file is in history, removing it is complex.
  • Use .gitignore for build artifacts and dependencies; Git LFS for binaries.
  • A pre-commit hook can block files above a size threshold.
  • git count-objects -vH reports repo size; check it regularly.
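
Size monitoring is easier to keep up when it is scripted. Below is a minimal sketch of a size budget check; the function name `check_repo_size` and the default limit are assumptions, not a Git convention. It relies on `git count-objects -v` (without `-H`) reporting `size-pack` as a plain number of KiB, and it ignores loose objects that have not been packed yet.

```bash
# Hypothetical helper: fail when the packed size exceeds a budget in KiB
check_repo_size() {
  limit_kib=${1:-512000}  # about 500 MiB, an arbitrary example budget

  # Extract the numeric size-pack value for the repo in the current directory
  pack_kib=$(git count-objects -v | awk '/^size-pack:/ {print $2}')

  if [ "$pack_kib" -gt "$limit_kib" ]; then
    echo "WARN: packed size ${pack_kib} KiB exceeds ${limit_kib} KiB"
    return 1
  fi
  echo "OK: packed size ${pack_kib} KiB (limit ${limit_kib} KiB)"
}
```

In a CI job this might be called as `check_repo_size 204800 || exit 1`, so that growth past the budget fails the build instead of going unnoticed.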

Interview Prep

  • Q: Why is repo size management important? A: Large repos slow down cloning, CI/CD pipelines, and everyday Git operations. Since Git stores full history, large files committed once remain in the repo forever unless explicitly scrubbed from history.

  • Q: How do you remove a large file that was accidentally committed? A: Use BFG Repo-Cleaner or git filter-repo to rewrite history and remove the file from all commits. Then force-push and have all team members re-clone. This is why prevention (.gitignore, LFS, pre-commit hooks) is much better.

  • Q: What is Git LFS and when should you use it? A: Git Large File Storage replaces large files in the repo with small pointer files. The actual file content is stored on a separate LFS server. Use it for binaries, videos, images, and any files > 1-5MB that need to be versioned.
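
To make the "small pointer file" in the last answer concrete, the sketch below prints roughly what Git LFS stores in the repository in place of a large file: a version line, a SHA-256 object ID, and the size in bytes. The helper name `lfs_pointer_preview` is hypothetical and this is only a preview; the real pointer is written by the `git lfs` tooling, not by this function.

```bash
# Hypothetical helper: show the pointer text Git LFS would keep for a file
lfs_pointer_preview() {
  file=$1

  # sha256sum on most Linux systems, shasum as a fallback (for example macOS)
  if command -v sha256sum >/dev/null 2>&1; then
    oid=$(sha256sum < "$file" | awk '{print $1}')
  else
    oid=$(shasum -a 256 < "$file" | awk '{print $1}')
  fi

  # Arithmetic expansion strips the padding some wc implementations add
  size=$(( $(wc -c < "$file") ))

  printf 'version https://git-lfs.github.com/spec/v1\n'
  printf 'oid sha256:%s\n' "$oid"
  printf 'size %s\n' "$size"
}
```

Calling `lfs_pointer_preview intro.mp4` on a 100MB video (an example file name) prints three short lines, which is why an LFS-tracked file adds only a few hundred bytes to each clone's history.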

Topics Covered

Large Repos, Best Practices

Tags

#git #hygiene #large-repos #best-practices

Last Updated

2026-02-13