Git, Github, the Autograder, and You!

Scott Walls, 2012-09-11 (revised 2013)

Abstract

This document is a partial introduction to Git and Github in the
context of EECS 482. This is its only purpose, and I do not recommend
it as a full introduction to Git, especially outside of the context of this
class.

Introduction

The purpose of this document is to help you, a bright-eyed EECS 482 student,
understand the framework we have in place for you to work on your projects.
This framework has three main components: Git, Github, and the Autograder.

First, I'll talk a little about Git in a general sense. Then, I'll talk about
Github, particularly in relation to the project infrastructure. Finally, I'll give a
small list of Git commands useful for the project. Those of you who are already
familiar with Git and Github can feel free to skim. The material specific to
EECS 482 is mostly sprinkled through the Github and Git Commands sections.

Git - What?! Why?!

In 2012, EECS 482 decided to use Git. Git is a *distributed Version Control
System*, or VCS. Let's break that definition down a bit. Better yet, let's do it
backwards!

A *Version Control System* is software that tracks changes to files over time.
Examples include CVS, SVN, and Perforce. Using a VCS is a very good idea,
since it allows you to easily collaborate with other people, provides some help in
the case of computer failure, and allows you to do nice things such as reverting
and branching code.

Now, let's talk about what it means to be a *distributed* VCS. In the old
days, the days of CVS and SVN, version control looked a little like this:

with every programmer having a *working copy*, which is a folder containing the
copy of their code that they were actually working on. They would make their
changes, then send them to the central repository. They would also get changes
that other users had made from the central repository. There are some problems
with this centralized approach.

If the central repository goes down, you can't commit code to the repository,
nor can you pull code from the repository.

If the central repository explodes, you have lost the record of your code.
Any previous versions are lost.

So, rather than centralize, Git distributes! In Git, every user has their
own copy of the entire repository (including its entire history). This solves the
aforementioned problems. It does come with the disadvantage of higher disk usage, but
disks are cheap these days. This local repository means that we no longer lose
all of our data when the central repository goes down, but it also means that
we are no longer necessarily tied to a centralized topology. Sometimes, Git
projects look more like this:

with individual users pushing directly to one another. This is
distributed to its logical extreme.

It is possible to manage your project in this entirely distributed
fashion in this class. I will mostly cover using Git much like
SVN, only with local repositories, which looks something like this:

This raises one glaring question: where will the central repository
live?

Enter Github (Way of Github? The Game of Github?)

Github is a great site that hosts off-site repositories for Git projects. It gives
free public repositories to open source projects, and it has been nice enough to
give us an eecs482 organization where we can host private repositories.
We will make repositories for you for each project (and one just for fun). In
order to access them, you need to:

Create a Github username and set up SSH keys, etc. Github has great how-tos
on this, and I've been told their GUI installer does it for you beautifully,
although I haven't tried it.

Go to the EECS 482 website and register your Github username with
the "Register your github username" link. This will give you a link to a
Github website listing your test repository (project 0). I found it
helpful to save this link.

Now you're ready to start learning some Git commands.

A Small Set of Git Commands

Here I'll talk about a few Git commands that I find integral to working with
Git and Github *with relation to EECS 482*. This is not a comprehensive list
of Git commands. See the section "Conclusions and Further Reading" for places
that may have such a list. Also, feel free to share any Git commands that you
like with fellow students.

The first thing you'll want to do is get a working copy set up on your machine,
i.e. getting what's already set up from Github to you. This is called cloning
the repository. (As a notation, I'll use YOUR_UNIQNAME to mean your
uniqname. I hope this isn't too much of a stretch.)

git clone git@github.com:eecs482/YOUR_UNIQNAME.0

The clone command sets up a working copy and repository on your computer that have
the same files as the repository on Github. It also sets up a remote repository
with the name origin, which will come into play in just a little bit. You should
now have a folder called YOUR_UNIQNAME.0 which holds your working copy
and repository. Feel free to play around with it. This repository's sole purpose
in life is to get you up to speed with Git. We will talk more later about clone and other
commands for working with remote repositories.
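
If you'd like to see what clone sets up before touching your real repository, here is a minimal sketch. It assumes a local stand-in for Github: the bare repository fake-github.git and the temporary folder are made up for illustration, and in the class you would use the git@github.com address above instead.

```shell
# Sketch: clone from a local stand-in for Github and inspect the result.
set -e
demo=$(mktemp -d)                 # scratch folder for this demo
cd "$demo"

# A bare repository plays the role of the (empty) Github repository.
git init -q --bare fake-github.git

# Clone it, just as you would clone the real project 0 repository.
# (Git warns that the repository is empty; that warning is hidden here.)
git clone -q fake-github.git YOUR_UNIQNAME.0 2>/dev/null
cd YOUR_UNIQNAME.0

# clone has already defined a remote named origin for you.
git remote -v
```

The last command lists origin twice, once for fetching and once for pushing, both pointing at the place you cloned from.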

So now, you work on your code and make a few files. You'd like to save
some of your work. The next four commands are intimately related, so I'll talk
about them all at once.

The most important piece here is commit. That means taking the code and
putting it into your local repository. Here's where Git is a bit different from
most other Version Control Systems. In Git, commit takes things from your
*staging area* and puts them into the local repository. The staging area is Git's
name for the set of files that will make it into your next commit. For a file,
the commit is the big time, so "staging area" and its limelight connotations are
appropriate! The staging area and how these commands interact with it are
shown here:

There can be files in your Git directory that Git doesn't pay much attention
to. These files are called *untracked*. Git will not commit untracked files! In order
to have git track a file, you first have to add it with the add command,
which also puts it in the staging area. To go the other way, you can use the rm
command to simultaneously delete a tracked file from the working copy and stage
its removal for the next commit.

As previously stated, commit only deals with files in your *staging area*. To
move a modified file into your staging area, you use the add command
again. For those of you used to SVN, the -a flag to commit may be
useful; it adds changes to all tracked files to the staging area before
committing. Finally, git status will
show you the status of your files, viz. whether they have been
modified, are in the staging area, or are untracked.
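
Here is a small sketch of how these four commands fit together, run in a throwaway repository. The file names (main.cc, notes.txt), the commit messages, and the example identity are all made up for illustration.

```shell
# Sketch: untracked -> staged -> committed, in a throwaway repository.
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
cd "$demo"
git init -q

echo "int main() { return 0; }" > main.cc   # hypothetical file names
echo "scratch notes" > notes.txt

git status --short          # both files are untracked (shown as ??)
git add main.cc notes.txt   # start tracking them and stage them
git commit -q -m "Add main.cc and notes.txt"

echo "// TODO: real work" >> main.cc
git status --short          # main.cc is modified but not yet staged
git commit -q -a -m "Update main.cc"   # -a stages tracked changes, then commits

git rm -q notes.txt         # delete the file and stage the deletion
git commit -q -m "Remove notes.txt"
git status --short          # prints nothing: the working copy is clean
```

Note that commit -a only picks up files git already tracks; a brand new file still needs an explicit add.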

So you've made some changes to your project. Great. What about your
teammate? She's been coding away, too! Perhaps you two should share your
changes. In order to do this, you'll need to push and pull. Say you'd like to share
your changes with your teammate. First, you commit to your local repository, as
before. Then, you push your local repository to the remote repository (Github).
Once everything's in github, your partner can pull.

One snag in this process is that the repository on Github is initially empty.
When I say empty, I mean *empty* - no files, no starting point, *nothing.*
The first time that you push commits to a new Github repository, you must
explicitly tell git what to push:

git push origin -u master

When you ran git clone, git set up a reference to a remote repository called origin which
points to the original github repository. Your default branch is called master, so this
command pushes the local master branch to the remote repository named by origin. Since
master doesn't exist there yet, this command will create it. The -u flag links these
local and remote branches together, so that in the future, you can just type git push
or git pull without the extra verbiage.

Besides git clone, there is another way to create a git repository, which you may have
stumbled upon in your own reading, or which you may be more familiar with from SVN:

git init

This command creates an empty repository in the current working folder. If you've done this,
made some changes and commits, and now you want to push them to github, this is no problem.
You simply first tell git where you're going to push to, and then you push:

git remote add defines a new *remote* (reference to a remote repository) called
origin, pointing at a github repository. The second command tells git to push
the local master branch to the origin remote repository, just as above.
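
As a hedged sketch, the two commands described above might look like the following. So that it runs anywhere, a local bare repository (fake-github.git, a made-up name) stands in for Github; for the class, the address given to git remote add would presumably be the same git@github.com one used with clone. The file name and example identity are also just for illustration.

```shell
# Sketch: push a repository created with "git init" to a remote.
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
cd "$demo"

# Stand-in for the empty repository on Github.
git init -q --bare fake-github.git

# A repository created with git init rather than git clone.
mkdir project
cd project
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
echo "hello" > README
git add README
git commit -q -m "First commit"

# First command: define a remote called origin.
# (For the class, this would be the git@github.com:eecs482/... address.)
git remote add origin "$demo/fake-github.git"

# Second command: push master and link it to origin's master.
git push -q origin -u master
```

After this, a plain git push or git pull in the project folder knows where to go, exactly as if you had cloned.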

Now, submitting to the Autograder. The Autograder automatically looks
at the github repository. When you submit from the project page on the
website, you specify a branch name for the Autograder to pull code
from. The Autograder then looks at a special file called SUBMIT and
runs the correct code. Magical! Here is a visual representation of
these remote commands:

You may want to be very sure that the files that are on the github repository
are the ones you assumed. I would personally recommend using the github web
interface for this, although there are many other ways to do it.
That's most of what you'll use on the group projects.

Conflict resolution; every group goes through it

While we hope that you find it easy to work together harmoniously with your
group members, it is quite common that *conflicts* will arise in your code.
This happens in any version control system when two people make changes to
the same part of the code. Maybe you were cleaning up some spacing or fixing
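a typo while your compatriot was adding something new. You may notice this first
when trying to push: Git rejects the push because Github has commits that you
don't, and when you pull to get them, Git reports a merge conflict.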

When you go open file.txt to see what on earth happened, you see this mess:

<<<<<<< HEAD
Text text and still more text.
=======
Text text and more text.
>>>>>>> 028663c8fcfa3d08ff77fe60d48b4ce34a8db6c4

This shows you both versions of the file, and the reason for the
conflict (how the versions differ). The first part, before the
======= line, is the local version, and the part below that line is
the remote version. (That gobbledygook at the end is the hash that
identifies the remote commit. Don't worry about it.)
Git has helpfully added those extra markers in your local copy to show
you precisely where the conflict occurred.

To fix the conflict, you need only edit the conflicting file(s), add them
(which marks their conflicts as resolved), and then commit.
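
If you'd like to practice this before it happens for real, here is a sketch that provokes a conflict on purpose and then resolves it. Everything in it is an assumption for illustration: a local bare repository (fake-github.git) stands in for Github, the folders you and teammate stand in for two people's clones, and the final merged text is arbitrary.

```shell
# Sketch: provoke a conflict between two clones, then resolve it.
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
cd "$demo"

# Stand-in for Github, with master as its default branch.
git init -q --bare fake-github.git
git --git-dir=fake-github.git symbolic-ref HEAD refs/heads/master

# You clone the empty repository and push the first version of file.txt.
git clone -q fake-github.git you 2>/dev/null
cd you
git symbolic-ref HEAD refs/heads/master
echo "Text text." > file.txt
git add file.txt
git commit -q -m "Add file.txt"
git push -q origin -u master
cd ..

# Your teammate clones, changes the line, and pushes first.
git clone -q fake-github.git teammate
cd teammate
echo "Text text and more text." > file.txt
git commit -q -a -m "Teammate's edit"
git push -q origin master
cd ..

# You change the same line. The push is rejected, and the pull conflicts.
cd you
echo "Text text and still more text." > file.txt
git commit -q -a -m "My edit"
git push -q origin master 2>/dev/null || echo "push rejected: pull first"
git pull -q --no-rebase origin master 2>/dev/null || echo "pull hit a conflict"
cat file.txt   # shows the <<<<<<< / ======= / >>>>>>> markers

# Resolve: edit the file to the text you want, add it, then commit.
echo "Text text and still more text, merged." > file.txt
git add file.txt
git commit -q -m "Merge teammate's edit"
git push -q origin master
```

The --no-rebase flag just asks pull to merge, which is the behavior this document describes; newer versions of git may otherwise ask you to choose how to reconcile the two histories.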

Branches; or, breaking the code without breaking the code

One of git's most useful features is the ability to easily create
and work with *branches*. A *branch* is simply a sequence of revisions and
a name to identify them. You've already been working with one branch in git,
called master. Even if you have no other branches in a git repository, you always
have at least master.

Suppose that you've just finished a major part of the project. All your own tests
are passing, your autograder score is getting better, and you're ready to tackle
the next component. This is a great time to create a branch, because the master
branch is currently in a good state. Though perhaps not everything is implemented,
the code that *is* finished does its job right. Having experimental changes on a
separate branch allows you to revert back to a stable version at any time, and it also
allows you to easily see the changes you've made since you started working on the new
components. Often, developers will create a local branch just to try out some new approach.
If it turns out not to work, they can just delete the branch and pretend it never happened.

The branch command with an argument will make a new branch with that
name. It will NOT move you into that branch. For that, use the checkout
command.

git branch NEWBRANCHNAME
git checkout NEWBRANCHNAME

Or equivalently:

git checkout -b NEWBRANCHNAME

The branch command with no arguments will show you a list of all
the branches on your local repository:

git branch

If you're using branches - e.g. to work on some experimental revision to your
project - you may want to share those branches with your teammates. You've actually
already done this earlier in this tutorial, via git push and git pull. The only
trick here is to remember that, if you created a branch in your local repository,
github doesn't know about that branch, and git doesn't know where to push it.
Just as before, you fix this by specifying the *remote* and the *branch* explicitly
when you push:

git push origin -u NEWBRANCHNAME

If your teammate has pushed a new branch to github, you may notice its
existence the next time you pull:

$ git pull
...
* [new branch] NEWBRANCHNAME -> origin/NEWBRANCHNAME
$

Just like any other branch, you can now switch to this branch with checkout:

git checkout NEWBRANCHNAME

Lastly, every good branch must come to an end. Once the experimental changes on a branch
are stable and your tests are passing, it's probably no longer experimental, and it's time
to integrate those changes back into your main line of development. This is called
*merging* a branch, and it's done simply as follows:

git checkout master
git merge NEWBRANCHNAME

This switches to the master branch and merges NEWBRANCHNAME back into it.
Hopefully there are no conflicts, but if there are, see above for how to resolve them.
The merge process should be familiar if you've already been pushing and pulling;
git pull is really just fetching commits from the remote branches and then merging
them.
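
To tie the branch commands together, here is a minimal sketch of the whole life cycle in a throwaway repository: create a branch, commit on it, and merge it back. The file name project.txt, the commit messages, and the example identity are made up for illustration.

```shell
# Sketch: create a branch, commit on it, and merge it back into master.
set -e
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo=$(mktemp -d)
cd "$demo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master

echo "stable code" > project.txt          # hypothetical file
git add project.txt
git commit -q -m "Stable version"

git checkout -q -b NEWBRANCHNAME          # create the branch and switch to it
echo "experimental code" >> project.txt
git commit -q -a -m "Try a new approach"

git checkout -q master                    # master still has the stable version
git merge -q NEWBRANCHNAME                # bring the experiment into master
git branch                                # lists master (current) and NEWBRANCHNAME
```

Because master had no new commits of its own in this sketch, git can simply move master forward to the branch's latest commit. If both branches had changed, git would create a merge commit instead, and might ask you to resolve conflicts.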

Conclusions and Further Reading

Well, there you have it. That's my little spiel on Git, Github, and
the Autograder. The gist is that Git is a distributed VCS, we've given
you some github repositories, and the Autograder looks directly at
those repositories.

There's so much more to learn, though! My two
favorite references for those of you who plan to use Git are: _The Git
Parable_ by Tom Preston-Werner, and the _Git Book_ by Scott Chacon. Both
are on the web and just a Google away. _The Git Parable_ is, in my
opinion, the best reference for understanding the philosophy of Git and
wrapping your head around the Git Way. _The Git Book_ is more or less
just a spec/tutorial, but much more complete than this little
document. Good luck, and remember that your instructors and your
classmates are all very good resources!