Large Repo Performance Tips

The Hook (The "Byte-Sized" Intro)

A 10GB repo with 500,000 files and 100,000 commits will bring git status to its knees unless you know the tricks. Shallow clones, partial clones, fsmonitor, commit graphs, and sparse checkout can make a massive repo feel nearly as responsive as a small one. These aren't workarounds; they're how companies like Microsoft manage repos with 3.5 million files.

📖 What are Large Repo Performance Tips?

Techniques and configurations that keep Git fast and responsive when working with repositories that have large file counts, deep history, or significant binary content.

Conceptual Clarity

Performance techniques ranked:

Technique	Impact	Effort	What It Helps
`core.fsmonitor`	🟢 High	Low	`git status` speed
`core.untrackedCache`	🟢 High	Low	`git status` speed
Commit graph	🟢 High	Low	`git log`, `git merge-base`
Shallow clone	🟡 Medium	Low	Clone speed, disk space
Partial clone	🟢 High	Low	Clone speed, disk space
Sparse checkout	🟢 High	Medium	Working dir size
Git LFS	🟡 Medium	Medium	Binary file handling
`feature.manyFiles`	🟢 High	Low	Enables multiple optimizations

What feature.manyFiles enables:

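core.untrackedCache true
index.version 4 (smaller, faster index)
index.skipHash true (skips the index checksum on write; recent Git versions only)

core.fsmonitor is not part of feature.manyFiles and has to be enabled separately.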
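To opt in setting by setting rather than through the umbrella option, the same values can be written into a single repository's config. The sketch below is only an illustration: the scratch repository path is hypothetical, and it covers just `core.untrackedCache` and `index.version`.

```shell
#!/bin/sh
set -eu

# Scratch repository used only for this demonstration (hypothetical path).
repo="$(mktemp -d)/manyfiles-demo"
git init --quiet "$repo"

# Per-repository equivalents of two settings implied by feature.manyFiles.
git -C "$repo" config core.untrackedCache true
git -C "$repo" config index.version 4

# Print the values now stored in this repository's config.
git -C "$repo" config --get core.untrackedCache
git -C "$repo" config --get index.version
```

Scoping the settings to one repository (no `--global`) makes it easier to compare `git status` timings before and after without touching other checkouts.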

Real-Life Analogy

Optimizing a large repo is like optimizing a warehouse. You don't carry everything — you have a catalog (commit graph), a fast lookup system (fsmonitor), and only stock what's needed on the floor (sparse checkout).

Visual Architecture

flowchart TD
    LARGE["📦 Large Repo 500K files"] --> FS["⚡ fsmonitor Fast status"]
    LARGE --> GRAPH["📊 Commit graph Fast log"]
    LARGE --> SPARSE["🔍 Sparse checkout Subset of files"]
    LARGE --> LFS["📎 Git LFS Large files"]
    FS & GRAPH & SPARSE & LFS --> FAST["✅ Fast Operations"]
    style LARGE fill:#2d1b1b,stroke:#e94560,color:#e94560
    style FAST fill:#1b2d1b,stroke:#53d8fb,color:#53d8fb

Why It Matters

Developer productivity: Slow Git = wasted time on every operation.
CI speed: Shallow/partial clones cut pipeline times significantly.
Disk space: Partial clones avoid downloading unused data.
Scale: These techniques are how Git scales to enterprise-sized repos.

Code

bash

# ─── Quick wins (set these immediately) ───
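# Turns on index version 4 and the untracked cache
git config --global feature.manyFiles true
# Built-in file system monitor: needs Git 2.37+ and a supported platform
# (initially Windows and macOS); not implied by feature.manyFiles
git config --global core.fsmonitor true
# Already implied by feature.manyFiles; harmless to set explicitly
git config --global core.untrackedCache true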

# ─── Generate commit graph (speeds up log, merge-base) ───
git commit-graph write --reachable
# Git auto-updates this on gc, but you can trigger manually

# ─── Shallow clone (CI/CD) ───
git clone --depth 1 https://github.com/team/large-repo.git
# Only 1 commit of history — fast clone

# ─── Partial clone (on-demand blobs) ───
git clone --filter=blob:none https://github.com/team/large-repo.git
# Commits and trees are downloaded up front; blobs are fetched on demand

# ─── Git LFS for large binaries ───
git lfs install
git lfs track "*.psd" "*.zip" "*.mp4"
git add .gitattributes
git commit -m "chore: track large files with LFS"

# ─── Check repo size ───
git count-objects -vH
# size-pack: 2.1 GiB  ← How much data is packed

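# ─── Aggressive garbage collection ───
# Slow on large repos and rarely needed; --prune=now deletes unreachable
# objects immediately, so run it only when nothing else is using the repo
git gc --aggressive --prune=now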
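Sparse checkout appears in the technique table but not in the commands above. A minimal sketch of it, assuming a hypothetical repository with `services/api` and `docs` directories created only for the demonstration, could look like this:

```shell
#!/bin/sh
set -eu

# Scratch repository with two top-level directories (hypothetical layout).
repo="$(mktemp -d)/sparse-demo"
git init --quiet "$repo"
cd "$repo"
mkdir -p services/api docs
echo "api code" > services/api/main.txt
echo "handbook" > docs/guide.txt
git add .
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit --quiet -m "initial layout"

# Cone mode restricts the working tree to whole directories.
git sparse-checkout init --cone
git sparse-checkout set services/api

# docs/ is still in history but is no longer materialized on disk.
git sparse-checkout list
ls
```

Running `git sparse-checkout disable` restores the full working tree, so the restriction is easy to undo.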

Key Takeaways

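feature.manyFiles true is a low-effort quick win: one setting switches on index version 4 and the untracked cache. Enable core.fsmonitor alongside it for faster git status.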
Commit graph speeds up git log, merge-base, and branch operations.
Shallow/partial clones are essential for CI/CD pipelines.
Git LFS handles large binary files without bloating the repo.
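To see the commit graph takeaway in practice, the sketch below writes the graph file in a throwaway repository and checks it. The repository path and the three sample commits are assumptions made purely for illustration.

```shell
#!/bin/sh
set -eu

# Scratch repository (hypothetical path) with a little history to index.
repo="$(mktemp -d)/graph-demo"
git init --quiet "$repo"
cd "$repo"

for n in 1 2 3; do
  echo "change $n" > file.txt
  git add file.txt
  git -c user.name="Demo" -c user.email="demo@example.com" \
      commit --quiet -m "commit $n"
done

# Write a commit-graph file covering every commit reachable from a ref.
git commit-graph write --reachable

# Check the file for corruption and confirm where it was written.
git commit-graph verify
ls .git/objects/info/commit-graph
```

Setting `fetch.writeCommitGraph` to `true` is one way to keep the file fresh after each fetch instead of waiting for `git gc`.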

Interview Prep

Q: How do you make Git faster in a large repository? A: Enable feature.manyFiles (activates the untracked cache and index v4), enable core.fsmonitor separately, generate a commit graph (git commit-graph write --reachable), use sparse checkout to limit working directory size, and use Git LFS for binary files.
Q: What is a partial clone? A: git clone --filter=blob:none downloads commit and tree objects but fetches file content (blobs) on demand as you check out or access files. This dramatically reduces initial clone time and disk usage.
Q: How does core.fsmonitor speed up git status? A: Instead of Git scanning every file for changes, fsmonitor uses the OS file system events to track which files changed. Git only checks those files, making git status near-instant even in repos with hundreds of thousands of files.
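The partial clone answer can be checked locally without a network. The following is a sketch under stated assumptions: a throwaway repository stands in for the remote server, and `uploadpack.allowFilter` is enabled on it because the serving side has to permit filtered fetches.

```shell
#!/bin/sh
set -eu

work="$(mktemp -d)"

# Source repository standing in for a remote server (hypothetical).
git init --quiet "$work/origin"
echo "large payload" > "$work/origin/data.txt"
git -C "$work/origin" add data.txt
git -C "$work/origin" -c user.name="Demo" -c user.email="demo@example.com" \
    commit --quiet -m "add data"
# The serving side must allow filtered fetches.
git -C "$work/origin" config uploadpack.allowFilter true

# Blobless clone; --no-checkout keeps Git from fetching blobs for a working tree.
git clone --quiet --no-checkout --filter=blob:none \
    "file://$work/origin" "$work/clone"

# Objects that are known but not yet downloaded are printed with a leading "?".
missing="$(git -C "$work/clone" rev-list --objects --all --missing=print \
    | grep -c '^?' || true)"
echo "blobs not yet downloaded: $missing"
```

Checking out a branch in the clone afterwards triggers the on-demand fetch for the blobs that checkout needs.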