Git Internals

Most Git users learn commands without understanding what Git does behind the scenes. Learning the internals helps build an accurate mental model — making it much easier to debug problems, understand complex operations like rebase and reset, and appreciate why Git behaves the way it does.

The core idea of Git is deceptively simple: Git is a content-addressable filesystem. Every piece of data is stored as an object and named by the SHA-1 hash of its content plus a short type-and-size header (newer Git versions can also create repositories that use SHA-256). If the content is the same, the name is the same, anywhere in the world.

The Four Object Types

Everything in Git is stored as one of four object types inside the .git/objects/ directory:

1. Blob (Binary Large Object)

A blob stores the raw contents of a file — no filename, no metadata, just the content. Two identical files in different locations have the same blob.

git cat-file -t a3f92b1   # Shows object type
# blob

git cat-file -p a3f92b1   # Shows object content
# Hello, World!

2. Tree

A tree represents a directory. It contains references to blobs (files) and other trees (subdirectories), along with each entry's name and file mode. A tree is like a snapshot of a directory at a specific moment.

git cat-file -p HEAD^{tree}
# (hashes shortened for readability; Git prints all 40 characters)
# 100644 blob a3f92b1  index.html
# 100644 blob b7c3d45  style.css
# 040000 tree c1a2b3c  assets
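
The listing above is a pretty-printed view. As a rough sketch of the stored form, assuming the SHA-1 object format, each tree entry is "<mode> <name>", a NUL byte, and the 20 raw bytes of the referenced hash, with entries sorted by name. The helpers object_id and tree_body are hypothetical names for this example only.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Hash "<type> <size>\\0" + body, as in the SHA-1 object format."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialize (mode, name, hex_id) tuples into a tree object body."""
    def sort_key(entry):
        mode, name, _ = entry
        # Git compares directory names as if they ended with "/".
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # "<mode> <name>\0" followed by the 20 raw hash bytes.
        body += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(hex_id)
    return body


blob_id = object_id("blob", b"Hello, World!\n")
tree_id = object_id("tree", tree_body([("100644", "index.html", blob_id)]))
print(tree_id)
```

Note that directories are stored with mode 40000; the leading zero in 040000 is added only when the tree is printed.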

3. Commit

A commit object points to one tree (the project state at that moment) and contains metadata: author and committer (each with a timestamp), the message, and a pointer to the parent commit(s). Because each commit names its parents, commits link into the chain that forms the project history.

git cat-file -p HEAD
# tree a3f92b1c8d4e5f6789...
# parent b7c3d45ef1234567...
# author Ravi Kumar <ravi@example.com> 1696842730 +0530
# committer Ravi Kumar <ravi@example.com> 1696842730 +0530
#
# Add login page
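
The output above is essentially the commit object's body as stored. As a sketch under the same SHA-1 assumption, the following builds such a body and hashes it. The helpers object_id and commit_body are hypothetical names, and the tree used is the well-known hash of an empty tree so that the example does not depend on a real repository.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # hash of a tree with no entries


def object_id(obj_type: str, body: bytes) -> str:
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parents, author, committer, message):
    """Build the text of a commit object: headers, blank line, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # zero, one, or several parents
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


who = "Ravi Kumar <ravi@example.com> 1696842730 +0530"
root = object_id("commit", commit_body(EMPTY_TREE, [], who, who, "Add login page"))
child = object_id("commit", commit_body(EMPTY_TREE, [root], who, who, "Add login page"))

# Same tree, author, and message, but a different parent gives a different hash.
print(root != child)  # True
```

This is why rewriting any commit changes the hash of every commit after it: the parent hash is part of the hashed content.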

4. Tag (Annotated)

An annotated tag object usually points to a commit (although it can point to any object) and adds metadata: tagger name, date, and tag message. Lightweight tags are just references, not objects.

How Objects are Stored

Each object is stored in the .git/objects/ directory. The file path is determined by the SHA-1 hash:

  • Hash: a3f92b1c8d4e5f6789abcdef1234567890abcdef
  • Stored as: .git/objects/a3/f92b1c8d4e5f6789abcdef1234567890abcdef

The first two characters of the hash become a folder name, and the rest become the filename. The content is stored compressed using zlib.
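
The path rule and the zlib step can be sketched in a few lines of Python. This is an illustration, not Git's implementation: it writes into a temporary directory standing in for .git/objects, assumes the SHA-1 object format, and uses the hypothetical helper names write_loose_object and read_loose_object.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def write_loose_object(objects_dir: Path, obj_type: str, body: bytes) -> str:
    """Store header + body, zlib-compressed, under <first 2 hex>/<remaining 38>."""
    data = f"{obj_type} {len(body)}".encode("ascii") + b"\x00" + body
    hex_id = hashlib.sha1(data).hexdigest()
    path = objects_dir / hex_id[:2] / hex_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(data))
    return hex_id


def read_loose_object(objects_dir: Path, hex_id: str):
    """Decompress a loose object and split it into (type, body)."""
    raw = zlib.decompress((objects_dir / hex_id[:2] / hex_id[2:]).read_bytes())
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


objects_dir = Path(tempfile.mkdtemp()) / "objects"  # stand-in for .git/objects
blob_id = write_loose_object(objects_dir, "blob", b"Hello Git\n")
print(blob_id[:2], blob_id[2:])                  # folder name, file name
print(read_loose_object(objects_dir, blob_id))   # ('blob', b'Hello Git\n')
```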

The Complete Structure of a Commit

Commit Object
    │
    └── points to → Tree (root directory snapshot)
                          │
                          ├── Blob (index.html content)
                          ├── Blob (style.css content)
                          └── Tree (assets/ subfolder)
                                    │
                                    ├── Blob (logo.png content)
                                    └── Blob (icon.svg content)

Each commit captures the entire state of the project as a tree of trees and blobs. Conceptually, Git does not store diffs; it stores complete snapshots. Because identical files share the same blob object and unchanged directories share the same tree object, storage remains efficient.

References — How Branches and Tags Work

A branch is not a complex structure — it is simply a file containing a commit hash. For example:

cat .git/refs/heads/main
# a3f92b1c8d4e5f6789abcdef1234567890abcdef

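When a commit is made on main, Git updates this reference to point to the new commit hash. That is all a branch is: a movable pointer to a commit. (Git may also collect references into a single .git/packed-refs file, in which case the individual file under refs/heads/ can be absent.)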

HEAD is also just a file:

cat .git/HEAD
# ref: refs/heads/main

This says HEAD points to main. In detached HEAD state, it contains a commit hash directly instead of a branch reference.
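
The two HEAD states can be told apart by reading the file. The sketch below builds a throwaway directory that imitates the relevant parts of .git and resolves HEAD to a commit hash. The function resolve_head is a hypothetical helper; it deliberately ignores .git/packed-refs and chains of symbolic references, which a complete resolver would need to handle.

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir: Path) -> str:
    """Return the commit hash HEAD refers to (loose refs only)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Attached: HEAD names a branch file that holds the hash.
        return (git_dir / head[len("ref: "):]).read_text().strip()
    # Detached: HEAD holds the hash itself.
    return head


git_dir = Path(tempfile.mkdtemp())  # stand-in for a real .git directory
commit = "a3f92b1c8d4e5f6789abcdef1234567890abcdef"
(git_dir / "refs" / "heads").mkdir(parents=True)
(git_dir / "refs" / "heads" / "main").write_text(commit + "\n")

(git_dir / "HEAD").write_text("ref: refs/heads/main\n")
attached = resolve_head(git_dir)

(git_dir / "HEAD").write_text(commit + "\n")
detached = resolve_head(git_dir)

print(attached == detached)  # True
```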

The Git Object Model — A Concrete Example

# Create a file and commit it
echo "Hello Git" > hello.txt
git add hello.txt
git commit -m "First commit"

# Git created these objects:
# 1. Blob: contains "Hello Git"
# 2. Tree: contains reference to the blob with name "hello.txt"
# 3. Commit: contains reference to the tree, plus author info and message

# Low-level inspection
git rev-parse HEAD           # Show the full commit hash
git cat-file -p HEAD         # Show commit object contents
git cat-file -p HEAD^{tree}  # Show tree object (directory snapshot)

Packing — How Git Saves Space

Initially, each object is stored as a separate compressed file (a loose object). Over time, Git packs them into one or more packfiles using delta compression (storing differences between similar objects), each accompanied by an index file for fast lookup. This can substantially reduce storage size for large repositories.

# Manually trigger packing (Git does this automatically)
git gc

# See pack files
ls .git/objects/pack/
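
To see the difference between loose and packed storage on disk, the directory layout can simply be counted. The sketch below does this for a fabricated directory tree; count_objects is a hypothetical helper, and the file names are placeholders rather than real objects.

```python
import re
import tempfile
from pathlib import Path


def count_objects(objects_dir: Path):
    """Return (number of loose objects, number of packfiles)."""
    loose = 0
    for sub in objects_dir.iterdir():
        # Loose objects live in folders named by two hex characters.
        if sub.is_dir() and re.fullmatch(r"[0-9a-f]{2}", sub.name):
            loose += sum(1 for f in sub.iterdir() if f.is_file())
    pack_dir = objects_dir / "pack"
    packs = len(list(pack_dir.glob("*.pack"))) if pack_dir.is_dir() else 0
    return loose, packs


# Fabricated layout: one loose object, one packfile with its index.
objects_dir = Path(tempfile.mkdtemp()) / "objects"
(objects_dir / "a3").mkdir(parents=True)
(objects_dir / "a3" / "f92b1c8d4e5f6789abcdef1234567890abcdef").write_bytes(b"")
(objects_dir / "pack").mkdir()
(objects_dir / "pack" / "pack-example.pack").write_bytes(b"")
(objects_dir / "pack" / "pack-example.idx").write_bytes(b"")

loose, packs = count_objects(objects_dir)
print(loose, packs)  # 1 1
```

Pointing count_objects at a real .git/objects directory before and after git gc should show loose objects moving into packs.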

The Index — The Staging Area Explained

The staging area (or Index) is stored as a binary file at .git/index. It contains a list of all tracked files and their current staged versions (as blob hashes). When git add is run, the index is updated with the new blob hash for that file.

# Inspect the index in a readable format
git ls-files --stage

# Output:
# 100644 a3f92b1 0  index.html
# 100644 b7c3d45 0  style.css
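
The binary file itself starts with a fixed 12-byte header: the signature DIRC, a 4-byte version number, and a 4-byte entry count, all big-endian. The sketch below parses only that header; parse_index_header is a hypothetical helper, and the sample bytes are fabricated rather than read from a real repository.

```python
import struct


def parse_index_header(data: bytes):
    """Return (version, entry_count) from the first 12 bytes of an index file."""
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    return version, entry_count


# Fabricated header: version 2, two entries (like the listing above).
sample = struct.pack(">4sII", b"DIRC", 2, 2)
print(parse_index_header(sample))  # (2, 2)
```

On a real repository, the same function can be fed the result of reading .git/index in binary mode; the entry count should match the number of lines printed by git ls-files --stage when there are no merge conflicts.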

What Happens When a Commit is Made

Git takes the following steps to record a commit:

  1. When git add runs, Git writes a blob object for each staged file's content (if it does not already exist) and records its hash in the index, so the blobs already exist by the time git commit runs
  2. Git creates tree objects from the index, one per directory, referencing the correct blobs and subtrees
  3. Git creates a commit object with the root tree reference, parent commit hash, author info, and message
  4. Git updates the branch pointer (e.g., .git/refs/heads/main) to point to the new commit hash
  5. HEAD, which refers to the branch by name, needs no change; only in detached HEAD state does Git write the new commit hash into HEAD itself
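
The whole sequence can be imitated in a temporary directory. The following is a simplified sketch, not Git's implementation: it assumes the SHA-1 object format, a flat project with no subdirectories, an attached HEAD, and a plain dictionary in place of the binary index. The names store and commit are hypothetical helpers.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path


def store(git_dir: Path, obj_type: str, body: bytes) -> str:
    """Write a loose object unless it already exists; return its hash."""
    data = f"{obj_type} {len(body)}".encode("ascii") + b"\x00" + body
    hex_id = hashlib.sha1(data).hexdigest()
    path = git_dir / "objects" / hex_id[:2] / hex_id[2:]
    if not path.exists():  # existing objects are reused
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(data))
    return hex_id


def commit(git_dir: Path, staged: dict, message: str, who: str) -> str:
    """staged maps file name -> blob hash (a flat stand-in for the index)."""
    # Tree: one entry per staged file, sorted by name.
    tree = b"".join(
        f"100644 {name}".encode("utf-8") + b"\x00" + bytes.fromhex(blob)
        for name, blob in sorted(staged.items())
    )
    tree_id = store(git_dir, "tree", tree)

    # Commit: tree, optional parent, author info, message.
    head = (git_dir / "HEAD").read_text().strip()
    ref_path = git_dir / head[len("ref: "):]  # assumes HEAD is attached
    lines = [f"tree {tree_id}"]
    if ref_path.exists():
        lines.append(f"parent {ref_path.read_text().strip()}")
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    commit_id = store(git_dir, "commit", "\n".join(lines).encode("utf-8"))

    # Move the branch pointer; HEAD itself is left untouched.
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    ref_path.write_text(commit_id + "\n")
    return commit_id


git_dir = Path(tempfile.mkdtemp())  # stand-in for .git
(git_dir / "HEAD").write_text("ref: refs/heads/main\n")
who = "Ravi Kumar <ravi@example.com> 1696842730 +0530"

blob_id = store(git_dir, "blob", b"Hello Git\n")  # what "git add" does
first = commit(git_dir, {"hello.txt": blob_id}, "First commit", who)
second = commit(git_dir, {"hello.txt": blob_id}, "Second commit", who)
print((git_dir / "refs" / "heads" / "main").read_text().strip() == second)  # True
```

Both commits here share one tree and one blob, since the file did not change; only the commit objects differ.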

Useful Plumbing Commands

Command                  Purpose
git cat-file -t <hash>   Show the type of a Git object (blob, tree, commit, tag)
git cat-file -p <hash>   Show the content of a Git object in human-readable form
git ls-files --stage     Show the contents of the index (staging area)
git rev-parse HEAD       Show the full hash of the current commit
git rev-parse HEAD~3     Show the hash of the commit 3 steps back
git count-objects -v     Show how many objects exist and their size
git fsck                 Check repository integrity: find orphaned or corrupt objects

Summary

Git stores all data as four types of objects: blobs (file contents), trees (directory snapshots), commits (history entries), and annotated tags. Each object is named by a SHA-1 hash derived from its type, size, and content. Branches are just references holding a commit hash, that is, movable pointers. HEAD is a file that points to the current branch (or directly to a commit). With this model in mind, many Git commands become easier to reason about: reset moves a pointer, checkout changes what HEAD points to, and rebase replays commits to create new objects with new hashes.
