
git directories

All meta-information is kept apart from the working files, in a single directory at the top of the working tree (<dir>/.git)

.git/refs/heads - has named heads (these are local branches)

each file is named after its branch and contains the sha1 id of the commit at the tip of that branch
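
A minimal Python sketch of reading these files. It assumes every branch is stored as a loose file; git can also move refs into .git/packed-refs (for example during git gc), which this sketch ignores. The function name list_branches is hypothetical:

```python
import os


def list_branches(git_dir):
    """Map each local branch name to the commit sha1 stored in its ref file.

    Sketch only: reads loose files under refs/heads and ignores .git/packed-refs.
    """
    heads_dir = os.path.join(git_dir, "refs", "heads")
    branches = {}
    for root, _dirs, files in os.walk(heads_dir):
        for name in files:
            path = os.path.join(root, name)
            # Branch names containing "/" (e.g. feature/x) become subdirectories.
            branch = os.path.relpath(path, heads_dir).replace(os.sep, "/")
            with open(path) as f:
                branches[branch] = f.read().strip()
    return branches
```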

.git/objects = object repository

each object is stored either as an individual zlib-compressed file (a "loose" object) or inside a pack of delta-compressed objects

all sha1s refer to full-sized objects (that is, individual changes such as patch hunks are not addressable in the object repository; only whole items are)

git concepts

inside the object repository are 4 types of objects:

blob = file contents

tree = directory (references other trees and blobs, with permissions)

commit = structured (but flat) text describing a commit

tag = structured text describing an annotated tag (a lightweight tag is just a ref, with no object of its own)

use git cat-file -p <id> to pretty-print any particular object in the repository
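
A tree is stored in a binary form that cat-file -p decodes for display. As a sketch, assuming the commonly documented layout (each entry is "<mode> <name>\0" followed by the 20 raw bytes of the entry's sha1, with entries sorted by name and sub-tree names compared as if they ended in "/"), a tree id could be computed like this. hash_tree is a hypothetical helper, not a git API:

```python
import hashlib


def hash_tree(entries):
    """Return the sha1 id of a tree object built from (mode, name, sha1_hex) tuples.

    Typical modes: "100644" (file), "100755" (executable), "40000" (sub-tree).
    """
    def sort_key(entry):
        mode, name, _sha = entry
        # A sub-tree's name is compared as if it ended in "/".
        return name.encode() + (b"/" if mode == "40000" else b"")

    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\0" + bytes.fromhex(sha)
        for mode, name, sha in sorted(entries, key=sort_key)
    )
    # Same header scheme as every object: "<type> <size>\0" + body.
    return hashlib.sha1(b"tree %d\0" % len(body) + body).hexdigest()
```

With no entries this gives 4b825dc642cb6eb9a060e54bf8d69288fbee4904, git's well-known empty tree id.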

git uses content-addressable storage

every item is referenced by the sha1 hash of its content (prefixed with a short header giving the object type and size)

the item is stored under .git/objects/xx/yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy, where xx is a directory named after the first two hex digits of the sha1 and yy... (the remaining 38 digits) is the filename. Note that the object repository does NOT know the original filename of the object it is storing (this is determined from 'tree' entries at runtime)
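
The id and path calculation can be sketched in Python as follows. One detail worth knowing: git hashes a short header, "<type> <size>\0", in front of the content, so the id depends on the object type as well as the bytes. object_id and loose_object_path are hypothetical names:

```python
import hashlib
import os


def object_id(obj_type, content):
    """sha1 over the header "<type> <size>\\0" followed by the raw content bytes."""
    header = b"%s %d\0" % (obj_type.encode(), len(content))
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(git_dir, sha):
    """The first two hex digits name the directory, the other 38 the file."""
    return os.path.join(git_dir, "objects", sha[:2], sha[2:])
```

For a blob, object_id should agree with what git hash-object prints for the same file, provided no line-ending or other filters apply.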

packs are compressed archives holding multiple objects, created by a garbage-collect (git gc) or repack (git repack) operation

each pack consists of an index file (.idx) and the pack file itself (.pack)

content in packs is delta-encoded and compressed

The repository does not hold "changes" to files; it holds entire files and trees

pack files are delta-encoded, but this is an implementation detail.

Conceptually, every instance (version) of every file that ever existed in the project history, and every instance of every directory configuration that ever existed, is stored in the repository.

a commit ALWAYS includes more than just a blob object: it records a whole tree, not only the files that changed

a commit is plain text, with a reference to commit parents (previous source states) and tree (current source state)

a commit refers to the top-level tree object, which refers to sub-tree objects and blobs for the entire source tree state at the time of the commit
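
Because a commit is plain text, its layout is easy to show. The sketch below assumes the usual header lines (tree, zero or more parent, author, committer), a blank line, then the message; it ignores extra headers such as gpgsig that can span several lines. make_commit and parse_commit are hypothetical helpers, and the author string in the test data is made up:

```python
import hashlib


def make_commit(tree, parents, author, committer, message):
    """Build the raw text of a commit object and return (sha1, text).

    author/committer are pre-formatted, e.g. "A U Thor <a@example.com> 1700000000 +0000".
    """
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines += ["author " + author, "committer " + committer]
    text = "\n".join(lines) + "\n\n" + message
    data = text.encode()
    sha = hashlib.sha1(b"commit %d\0" % len(data) + data).hexdigest()
    return sha, text


def parse_commit(text):
    """Split commit text into header fields and the message."""
    header, _, message = text.partition("\n\n")
    fields = {"parent": []}
    for line in header.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parent"].append(value)  # merges have several parent lines
        else:
            fields[key] = value
    return fields, message
```

A merge commit simply has more than one parent line, which is why parse_commit collects them into a list.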

there are two tree spaces in a git repository:

a commit tree - a graph of commits from a head leading back to the beginning of the repository

a source tree - a graph of trees and blobs forming the source state at a particular time
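
To make the two graphs concrete, here is a toy in-memory sketch. Commits and trees are plain dicts standing in for objects read from .git/objects (an assumption for illustration; ids such as "c1" or "t0" are placeholders, not sha1s). ancestors walks the commit tree, list_files walks the source tree of one commit:

```python
from collections import deque


def ancestors(commits, head):
    """Commit tree: every commit reachable from head via parent links.

    commits maps a commit id to its list of parent ids.
    """
    seen, order, queue = set(), [], deque([head])
    while queue:
        c = queue.popleft()
        if c in seen:
            continue  # merges can reach the same ancestor twice
        seen.add(c)
        order.append(c)
        queue.extend(commits[c])
    return order


def list_files(trees, tree_id, prefix=""):
    """Source tree: every file path under a tree.

    trees maps a tree id to {name: ("tree" or "blob", object id)}.
    """
    paths = []
    for name, (kind, oid) in sorted(trees[tree_id].items()):
        if kind == "tree":
            paths += list_files(trees, oid, prefix + name + "/")
        else:
            paths.append(prefix + name)
    return paths
```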

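refs (heads, branches, tags, remotes) are all just named pointers, kept as files outside the object repository (under .git/refs), to objects inside it; a head or remote branch points at a commit, the root of a commit tree, while a tag points at a commit or at a tag object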

pseudo-code for git operations

git add file.c

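place file.c in the object repository:

calculate the sha1 hash (of a "blob <size>" header plus the contents), zlib-compress the same bytes, and place the result under .git/objects/xx/yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

record the new blob's sha1 for file.c in the index (.git/index), so that the next commit includes it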
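
A minimal sketch of the object-storing step in Python, assuming no content filters (such as line-ending conversion) and leaving out the index update; store_blob is a hypothetical name:

```python
import hashlib
import os
import zlib


def store_blob(git_dir, path):
    """Write the contents of path into the object repository as a loose blob.

    Sketch only: does not update .git/index, apply filters, or lock files as git does.
    """
    with open(path, "rb") as f:
        content = f.read()
    data = b"blob %d\0" % len(content) + content
    sha = hashlib.sha1(data).hexdigest()
    obj_path = os.path.join(git_dir, "objects", sha[:2], sha[2:])
    if not os.path.exists(obj_path):
        os.makedirs(os.path.dirname(obj_path), exist_ok=True)
        with open(obj_path, "wb") as f:
            f.write(zlib.compress(data))  # loose objects are zlib streams
    return sha
```

Because the path is derived from the content, storing an identical file twice reuses the existing object, which is why the function only writes when the file is missing.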

git branch test_branch

create file .git/refs/heads/test_branch containing the sha1 from the file pointed to by .git/HEAD

that is, if .git/HEAD has the contents "ref: refs/heads/master", and .git/refs/heads/master contains sha1 fe42b357730f3b37e80b3664f9b761b26cee9f68, then create the file .git/refs/heads/test_branch with that sha1
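
The same steps as a Python sketch, assuming all refs are loose files (no .git/packed-refs) and skipping the name validation and lock files git uses; create_branch is a hypothetical name:

```python
import os


def create_branch(git_dir, name):
    """Copy the commit id that HEAD resolves to into refs/heads/<name>."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        head = f.read().strip()
    if head.startswith("ref: "):
        # Symbolic ref, e.g. "ref: refs/heads/master": follow it to the branch file.
        with open(os.path.join(git_dir, head[len("ref: "):])) as f:
            sha = f.read().strip()
    else:
        sha = head  # detached HEAD holds the commit id directly
    ref_path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    with open(ref_path, "w") as f:
        f.write(sha + "\n")
    return sha
```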

git revert HEAD - revert the last commit

ask for the commit message

create a commit object (plain text file), referring to the current HEAD commit

commit parent will be the current HEAD commit

the commit's tree will be the same as the tree of HEAD's parent (taking us back to the state before the current HEAD commit; this assumes HEAD is not a merge commit)

add that commit object to the object repository

move the branch that HEAD points to onto this commit (e.g. change the sha1 in .git/refs/heads/master to the sha1 of the new commit)
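
A toy in-memory version of these steps, for a non-merge HEAD commit. The dicts stand in for the object repository and .git/refs, and the new commit's id is a simplified sha1 of its fields rather than git's real commit hash; revert_head is a hypothetical name:

```python
import hashlib


def revert_head(objects, refs, branch, message):
    """Toy `git revert HEAD`.

    objects: commit id -> {"tree": ..., "parents": [...], "message": ...}
    refs:    ref name -> commit id (stand-in for the files under .git/refs)
    branch:  the ref HEAD points at, e.g. "refs/heads/master"
    """
    head = refs[branch]
    parent = objects[head]["parents"][0]  # non-merge commit: exactly one parent
    tree = objects[parent]["tree"]       # source state from before HEAD
    # Simplified id: not git's real commit hash (no header, author or committer).
    new_id = hashlib.sha1(
        ("tree %s\nparent %s\n\n%s" % (tree, head, message)).encode()
    ).hexdigest()
    objects[new_id] = {"tree": tree, "parents": [head], "message": message}
    refs[branch] = new_id  # move the branch, and with it HEAD
    return new_id
```

Note that the reverted commit stays in history as the new commit's parent; only the tree goes back.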

how does git status work?

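git status compares the HEAD commit's tree with the index (changes to be committed), and the index with the current working directory (changes not staged for commit); files in the working directory that are not in the index are listed as untracked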
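
As a sketch, treat each of the three states as a flat mapping from path to blob sha1. Real git reads these from the HEAD tree, .git/index and the file system, and uses cached file metadata to avoid rehashing unchanged files. status is a hypothetical function name, and .gitignore handling is left out:

```python
def status(head_tree, index, workdir):
    """Return (staged, unstaged, untracked) from three path -> sha1 mappings."""
    def diff(old, new):
        return {
            "added": sorted(p for p in new if p not in old),
            "deleted": sorted(p for p in old if p not in new),
            "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        }

    staged = diff(head_tree, index)   # HEAD vs index: changes to be committed
    unstaged = diff(index, workdir)   # index vs working directory
    untracked = unstaged.pop("added")  # on disk but not in the index
    return staged, unstaged, untracked
```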