Git Internals
Most Git users learn commands without understanding what Git does behind the scenes. Learning the internals helps build an accurate mental model — making it much easier to debug problems, understand complex operations like rebase and reset, and appreciate why Git behaves the way it does.
The core idea of Git is deceptively simple: Git is a content-addressable filesystem. Every piece of data is stored as an object and named by the SHA-1 hash of its content, prefixed with a short header that records the object's type and size. If the content is the same, the name is the same, anywhere in the world.
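The naming scheme can be checked by hand. The sketch below assumes a Unix-like shell with sha1sum available (GNU coreutils; on macOS, shasum is the usual substitute) and a Git that uses its default SHA-1 object format. It hashes the same 13 bytes twice: once with git hash-object, and once manually by prefixing the header "blob 13" and a NUL byte.

```shell
# 'test content\n' is 13 bytes, so Git's header is "blob 13" followed by a NUL byte.
git_hash=$(printf 'test content\n' | git hash-object --stdin)

# Reproduce the same hash without Git: SHA-1 over header + content.
manual_hash=$(printf 'blob 13\000test content\n' | sha1sum | cut -d' ' -f1)

echo "git hash-object: $git_hash"
echo "manual SHA-1:    $manual_hash"
```

Both lines should print the same 40-character hash, which is why identical content always receives the same name.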
The Four Object Types
Everything in Git is stored as one of four object types inside the .git/objects/ directory:
1. Blob (Binary Large Object)
A blob stores the raw contents of a file — no filename, no metadata, just the content. Two identical files in different locations have the same blob.
git cat-file -t a3f92b1 # Shows object type
# blob
git cat-file -p a3f92b1 # Shows object content
# Hello, World!
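To see that a blob ignores filenames and locations, the following sketch writes the same content to two differently named files in a scratch repository. The directory and file names (docs, greeting.txt, copy.txt) are made up for illustration.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

mkdir -p docs
printf 'Hello, World!\n' > greeting.txt
cp greeting.txt docs/copy.txt

# -w stores the object in .git/objects in addition to printing its hash
hash_a=$(git hash-object -w greeting.txt)
hash_b=$(git hash-object -w docs/copy.txt)
echo "greeting.txt:  $hash_a"
echo "docs/copy.txt: $hash_b"

blob_type=$(git cat-file -t "$hash_a")
blob_content=$(git cat-file -p "$hash_a")
echo "$blob_type"
echo "$blob_content"
```

Both files resolve to one hash, so Git stores a single blob for them.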
2. Tree
A tree represents a directory. It contains references to blobs (files) and other trees (subdirectories); each entry records a file mode, the object's type and hash, and a name. A tree is like a snapshot of a directory at a specific moment.
git cat-file -p HEAD^{tree}
# 040000 tree c1a2b3c assets
# 100644 blob a3f92b1 index.html
# 100644 blob b7c3d45 style.css
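A subdirectory entry can be followed one level down to confirm that it is itself a tree. This is a minimal sketch in a throwaway repository; the file names and the "Example User" identity are placeholders, not part of the original example.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
# Hypothetical identity so the commit works without global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

mkdir assets
echo '<h1>Hi</h1>' > index.html
echo 'body {}' > style.css
echo 'placeholder' > assets/logo.txt
git add .
git commit -q -m "Add site skeleton"

# Root tree: one entry per file or subdirectory
root_listing=$(git cat-file -p 'HEAD^{tree}')
echo "$root_listing"

# The assets entry points at another tree object
assets_tree=$(git rev-parse HEAD:assets)
assets_type=$(git cat-file -t "$assets_tree")
assets_listing=$(git cat-file -p "$assets_tree")
echo "$assets_type"
echo "$assets_listing"
```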
3. Commit
A commit object points to one tree (the project state at that moment) and contains metadata: author, committer, date, message, and a pointer to the parent commit(s). Commits form the chain that is the history.
git cat-file -p HEAD
# tree a3f92b1c8d4e5f6789...
# parent b7c3d45ef1234567...
# author Ravi Kumar <ravi@example.com> 1696842730 +0530
# committer Ravi Kumar <ravi@example.com> 1696842730 +0530
#
# Add login page
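The parent pointer is what links commits into a chain. As a rough illustration, the sketch below makes two commits in a scratch repository and compares the parent line stored in the newest commit with what git rev-parse reports for HEAD~1. The identity and file name are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
# Hypothetical identity so commits work without global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

echo 'first version' > file.txt
git add file.txt && git commit -q -m "First"
echo 'second, longer version' > file.txt
git add file.txt && git commit -q -m "Second"

git cat-file -p HEAD

# The parent recorded inside the commit object...
parent_in_object=$(git cat-file -p HEAD | awk '/^parent / {print $2}')
# ...is the same commit that HEAD~1 resolves to
parent_via_rev=$(git rev-parse 'HEAD~1')

# The root commit has no parent line at all
root_parent_lines=$(git cat-file -p 'HEAD~1' | grep -c '^parent ')
echo "parent lines in root commit: $root_parent_lines"
```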
4. Tag (Annotated)
An annotated tag object points to another object (almost always a commit) and adds metadata: tagger name, date, and tag message. A lightweight tag is just a reference and creates no object of its own.
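The difference between the two kinds of tag shows up when asking Git for the type of object each name resolves to. A small sketch, assuming a scratch repository and made-up tag names (v1.0 and v1.0-light):

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
# Hypothetical identity, also used as the tagger
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

echo 'Hello Git' > hello.txt
git add hello.txt && git commit -q -m "First commit"

git tag v1.0-light                    # lightweight: only a reference
git tag -a v1.0 -m "First release"    # annotated: a tag object plus a reference

light_type=$(git cat-file -t v1.0-light)
annotated_type=$(git cat-file -t v1.0)
echo "v1.0-light -> $light_type"
echo "v1.0       -> $annotated_type"

# The tag object carries its own metadata and points at the commit
git cat-file -p v1.0
tag_object=$(git rev-parse v1.0)
peeled_commit=$(git rev-parse 'v1.0^{commit}')
head_commit=$(git rev-parse HEAD)
```

The lightweight tag resolves straight to the commit, while the annotated tag resolves to a separate tag object that must be peeled to reach the commit.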
How Objects are Stored
Each object is stored in the .git/objects/ directory. The file path is determined by the SHA-1 hash:
- Hash: a3f92b1c8d4e5f6789abcdef1234567890abcdef
- Stored as: .git/objects/a3/f92b1c8d4e5f6789abcdef1234567890abcdef
The first two characters of the hash become a folder name, and the rest become the filename. The content is stored compressed using zlib.
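The path rule can be reproduced with a couple of cut calls. The following sketch writes one blob into a fresh repository and derives where its loose object file should be; it assumes the object has not been packed yet, which holds immediately after writing it.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

hash=$(printf 'Hello Git\n' | git hash-object -w --stdin)

# Split the hash: first 2 characters -> directory, remaining 38 -> filename
dir=$(printf '%s' "$hash" | cut -c1-2)
file=$(printf '%s' "$hash" | cut -c3-)
path=".git/objects/$dir/$file"

echo "hash: $hash"
echo "path: $path"
ls -l "$path"
```

The file itself is zlib-compressed, so it is not readable with cat; git cat-file -p is the usual way to view it.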
The Complete Structure of a Commit
Commit Object
│
└── points to → Tree (root directory snapshot)
│
├── Blob (index.html content)
├── Blob (style.css content)
└── Tree (assets/ subfolder)
│
├── Blob (logo.png content)
└── Blob (icon.svg content)
Each commit captures the entire state of the project as a tree of trees and blobs. Conceptually, Git does not store diffs; it stores complete snapshots. Because an unchanged file or directory hashes to the same blob or tree in every commit, successive snapshots share most of their objects, which keeps storage efficient.
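Object sharing between snapshots can be observed directly. In this sketch (scratch repository, placeholder identity and file contents), only index.html changes between two commits, so style.css should resolve to the same blob in both.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
# Hypothetical identity so commits work without global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

echo '<h1>Version one</h1>' > index.html
echo 'body {}' > style.css
git add . && git commit -q -m "Initial"

echo '<h1>Version two, edited</h1>' > index.html
git add index.html && git commit -q -m "Update index"

# <commit>:<path> names the blob a path had in that snapshot
old_css=$(git rev-parse 'HEAD~1:style.css')
new_css=$(git rev-parse 'HEAD:style.css')
old_html=$(git rev-parse 'HEAD~1:index.html')
new_html=$(git rev-parse 'HEAD:index.html')

echo "style.css:  $old_css -> $new_css"
echo "index.html: $old_html -> $new_html"
```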
References — How Branches and Tags Work
A branch is not a complex structure — it is simply a file containing a commit hash. For example:
cat .git/refs/heads/main
# a3f92b1c8d4e5f6789abcdef1234567890abcdef
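When a commit is made on main, Git updates this file to point to the new commit hash. That is all a branch is: a movable pointer to a commit. (After packing, a branch may be stored as a line in .git/packed-refs instead of as its own file, so git rev-parse main is the more reliable way to read it.)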
HEAD is also just a file:
cat .git/HEAD
# ref: refs/heads/main
This says HEAD points to main. In detached HEAD state, it contains a commit hash directly instead of a branch reference.
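The sketch below reads these files in a scratch repository, creates a second branch by writing a ref, and then detaches HEAD to show the file switching from a branch reference to a raw hash. It assumes Git's default files-based ref storage with unpacked refs; the branch name feature and the identity are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main" on any Git version
# Hypothetical identity so the commit works without global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

echo 'Hello Git' > hello.txt
git add hello.txt && git commit -q -m "First commit"

head_file=$(cat .git/HEAD)                # symbolic: names the current branch
branch_file=$(cat .git/refs/heads/main)   # the branch file holds a commit hash
head_commit=$(git rev-parse HEAD)
echo "HEAD file: $head_file"
echo "main file: $branch_file"

# Creating a branch only writes one more ref pointing at an existing commit
git update-ref refs/heads/feature "$head_commit"
feature_commit=$(git rev-parse refs/heads/feature)

# Detaching HEAD replaces the branch reference with the hash itself
git checkout -q --detach
detached_head=$(cat .git/HEAD)
echo "detached:  $detached_head"
```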
The Git Object Model — A Concrete Example
# Create a file and commit it
echo "Hello Git" > hello.txt
git add hello.txt
git commit -m "First commit"
# Git created these objects:
# 1. Blob: contains "Hello Git"
# 2. Tree: contains reference to the blob with name "hello.txt"
# 3. Commit: contains reference to the tree, plus author info and message
# Low-level inspection
git rev-parse HEAD # Show the full commit hash
git cat-file -p HEAD # Show commit object contents
git cat-file -p HEAD^{tree} # Show tree object (directory snapshot)
Packing — How Git Saves Space
Initially, each object is stored as a separate compressed file (a loose object). Over time, Git packs loose objects into one or more packfiles, using delta compression (storing some objects as differences against similar objects). This dramatically reduces storage size for large repositories.
# Manually trigger packing (Git does this automatically)
git gc
# See pack files
ls .git/objects/pack/
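The effect of packing is visible in the counters that git count-objects -v reports. A minimal sketch on a one-commit scratch repository follows; with so few objects there is little to delta-compress, so it only shows loose objects moving into a pack. The identity and file name are placeholders.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
# Hypothetical identity so the commit works without global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

echo 'Hello Git' > hello.txt
git add hello.txt && git commit -q -m "First commit"

# "count" is the number of loose objects (here: one blob, one tree, one commit)
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc -q

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after"
echo "packed after gc: $packed_after"
ls .git/objects/pack/
```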
The Index — The Staging Area Explained
The staging area (or index) is stored as a binary file at .git/index. It lists every tracked file along with its mode and the hash of its currently staged blob. When git add is run, Git writes the file's content to the object database as a blob and records that blob's hash in the index.
# Inspect the index in a readable format
git ls-files --stage
# Output:
# 100644 a3f92b1 0 index.html
# 100644 b7c3d45 0 style.css
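To watch the index lag behind the working directory until git add is run, the following sketch edits a file and compares the staged hash before and after staging. It uses a scratch repository and placeholder file contents.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q

echo 'version one' > index.html
git add index.html
staged_v1=$(git ls-files --stage index.html | awk '{print $2}')

# Edit the working copy only: the index still holds the old blob hash
echo 'version two, edited' > index.html
still_staged=$(git ls-files --stage index.html | awk '{print $2}')
worktree_hash=$(git hash-object index.html)

# Staging the edit swaps in the new blob hash
git add index.html
staged_v2=$(git ls-files --stage index.html | awk '{print $2}')

echo "staged before edit: $staged_v1"
echo "staged after edit:  $still_staged"
echo "staged after add:   $staged_v2"
```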
What Happens When a Commit is Made
Git takes the following steps when a commit is made:
- Before the commit, git add has already written a blob object for each staged file (if it did not already exist) and recorded its hash in the index
- Git creates tree objects from the index, one per directory, representing the entire snapshot and referencing the correct blobs
- Git creates a commit object with the root tree reference, parent commit hash, author info, and message
- Git updates the branch pointer (e.g., .git/refs/heads/main) to point to the new commit hash
- HEAD itself is left unchanged: it still refers to the branch, so it now resolves to the new commit (in detached HEAD state, HEAD is updated directly instead)
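The steps above can be sketched with plumbing commands alone, without calling git add or git commit. This is an illustration of the sequence, not a description of how git commit is implemented internally; the identity is a placeholder, and the working directory is deliberately left without a hello.txt file because only the object database, index, and ref are touched.

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/main   # name the initial branch "main" on any Git version
# Hypothetical identity so commit-tree works without global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# 1. Write a blob and register it in the index (what git add does)
blob=$(printf 'Hello Git\n' | git hash-object -w --stdin)
git update-index --add --cacheinfo 100644,"$blob",hello.txt

# 2. Turn the index into a tree object
tree=$(git write-tree)

# 3. Wrap the tree in a commit object (no parent: this is the first commit)
commit=$(git commit-tree "$tree" -m "First commit")

# 4. Move the branch pointer; HEAD already refers to refs/heads/main
git update-ref refs/heads/main "$commit"

head_commit=$(git rev-parse HEAD)
head_tree=$(git rev-parse 'HEAD^{tree}')
file_blob=$(git rev-parse HEAD:hello.txt)
file_content=$(git cat-file -p HEAD:hello.txt)
echo "blob:   $blob"
echo "tree:   $tree"
echo "commit: $commit"
```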
Useful Plumbing Commands
| Command | Purpose |
|---|---|
| git cat-file -t <hash> | Show the type of a Git object (blob, tree, commit, tag) |
| git cat-file -p <hash> | Show the content of a Git object in human-readable form |
| git ls-files --stage | Show the contents of the index (staging area) |
| git rev-parse HEAD | Show the full hash of the current commit |
| git rev-parse HEAD~3 | Show the hash of the commit 3 steps back |
| git count-objects -v | Show how many objects exist and their size |
| git fsck | Check repository integrity: find orphaned or corrupt objects |
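As one example of these commands in combination, the sketch below writes a blob that nothing references and then asks git fsck to find it. It assumes a scratch repository and a placeholder identity; Git reports such unreferenced objects as "dangling".

```shell
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
# Hypothetical identity so the commit works without global configuration
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

echo 'Hello Git' > hello.txt
git add hello.txt && git commit -q -m "First commit"

# Store a blob that no tree, commit, or index entry refers to
dangling=$(printf 'never referenced\n' | git hash-object -w --stdin)

# fsck walks the object graph and lists objects nothing points at
report=$(git fsck 2>/dev/null)
echo "$report"
```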
Summary
Git stores all data as four types of objects: blobs (file contents), trees (directory snapshots), commits (history entries), and annotated tags. Each object has a unique SHA-1 hash based on its content. Branches are just files containing a commit hash — movable pointers. HEAD is a file that points to the current branch (or directly to a commit). Understanding this model makes Git commands deeply intuitive — reset moves a pointer, checkout changes what HEAD points to, and rebase replays commits to create new objects with new hashes.
