Choosing a Branching Scheme for GitHub when Teaching Newbies

There is a branching scheme commonly used in professional software development: a developer creates (and shares) a branch specifically for developing a feature, then issues a pull request from that branch into the main line.

I am skeptical that this scheme is practical for teaching students. When teaching students who have never done any computer programming before, it is very important to minimize the additional overhead of version control, especially managing branches.

Which Features of Version Control are Most Useful for Beginners?

What are our objectives in having students use Git(Hub)? While we would want this experience to acclimate our students to the rhythms they might expect in a professional software development project, this is not foremost when teaching beginners.

The objectives that are foremost in this use of Git/GitHub are, in my opinion, as follows:

Provide starting code and files to students for assignments when such initial files are necessary or beneficial

provide a means for the student and the instructor to trace the development of related assignments over time

provide a tool for the instructor to give useful feedback with in-line comments and direction

facilitate delivery of assignments to the instructor

provide a means for the student to recover work in case of errors or system problems

What aspects of Git should we avoid for beginners?

I believe it is important to only use those aspects of Git and GitHub that are critical to accomplishing these objectives. For many students, simply learning how to do computer programming is a heavy cognitive load. Consequently, we should avoid aspects of version control that add to that cognitive load. In particular, we should avoid or minimize the student’s need for these operations:

Issuing (textual) git commands via terminal/command window

selective file staging

multiple repositories

switching branches

I believe we should strenuously avoid having students do the following operations, some of which are challenging even to experienced professional developers:

merging branches

attaching submodules

resolving merge conflicts

creating branches

tagging

stashing

changing upstream/origin

SSH authentication

Key use cases

INITIATION Student retrieves starter files for assignment N to his workspace and begins edits and other work.

SUBMISSION A student performs the work on assignment N and submits it for review and grading via GitHub.

NEXT SUBMISSION After the student submits assignment N, the student begins to work on assignment N+1.

PROGRESSION A student has overlapping consecutive assignments. That is, the later assignment requires approved work from the prior assignment. How does the student use branching simply for the purpose of submitting that work?

Detailed sequence of Progression use case:

| Time | Action |
| --- | --- |
| T | student submits assignment N |
| T+1 | student begins work on assignment N+1 |
| T+2 | instructor returns assignment N to student with required revisions |
| T+3 | student fixes assignment N and resubmits |
| T+4 | instructor approves assignment N |
| T+5 | student applies corrections from assignment N to ongoing work on assignment N+1 |

List of possible branches and their meanings

To consider the impact of different branching strategies, we need a list of candidate branches. By considering these candidate branches and their usage, we can assess the work sequence, and thus the workload on the novice student, for particular branching strategies.

In the following table, the name of each branch is essentially a model. While the "master" branch has a fixed name, the other names are figurative and may, of course, be any actual name. As you may guess, the letters "xyz" in a branch name stand in for the name or number of the actual assignment.

The headings “W?” and “P?” are indicators as to whether a particular branch is intended to be used for ongoing work by the student, or for a delivery or “Pull,” or both.

| Branch | W? | P? | Purpose/Usage |
| --- | --- | --- | --- |
| ==master== | Y | Y | This is the usual Git master branch |
| ==ASGN-xyz== | Y | | This is a working branch for assignment ‘xyz’ |
| ==MILE-xyz== | | Y | This is a milestone for assignment ‘xyz’ |
| ==done== | | Y | This is a branch to which all assignments would be merged |
| ==DONE-xyz== | | Y | This is a branch specifically to be the pull request target for assignment ‘xyz’ |
| ==WORK-???== | Y | | This would be a branch created by the student with any arbitrary name the student selects |
| ==work== | Y | | This would be the branch on which the student would work for every assignment |

List of Possible Branching Schemes

So we have four possible branches that can be used as the student’s working branch, and four branches that can serve as the “Pull-To” branch. I am leaving the milestone branch out of the table below.

I am currently using option “C” for my COSC-A211 course at Loyola New Orleans. I have two dozen “pull-to” branches in the assignment repo.

| ID | Working Branch | Pull-To Branch | Notes |
| --- | --- | --- | --- |
| A | ==master== | ==master== | unusable |
| B | ==master== | ==done== | unusable |
| C | ==master== | ==DONE-xyz== | This has been the approach to date in COSC A211. Student does not create/name branches, does not switch branches, does not merge. |
| D | ==ASGN-xyz== | ==master== | Student must switch branches. Progression use case would require merges. |
| E | ==ASGN-xyz== | ==done== | Student must switch branches. Progression use case would require merges. |
| F | ==ASGN-xyz== | ==DONE-xyz== | This branching scheme is arguably the norm in professional software development organizations. Student must switch branches. Progression use case would require merges. |
| G | ==work== | ==master== | unusable |
| H | ==work== | ==done== | unusable |
| J | ==work== | ==DONE-xyz== | Comparable to scheme C. Instructor could designate ==work== branch as default, eliminating need for the student to switch branches from ==master== |
| K | ==WORK-???== | ==master== | Student would have to create branches, switch, merge. Like F, very typical for professional organizations. |
| L | ==WORK-???== | ==done== | |
| M | ==WORK-???== | ==DONE-xyz== | |

Four of these options, specifically A, B, G, and H, are non-starters. Pairing catchall branches such as “master,” “done,” and “work” with each other would prevent us from distinguishing work (and pull requests) for different assignments.
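The elimination above can be reproduced mechanically. The following is a minimal Python sketch, not part of the course tooling: it pairs each working branch with each pull-to branch in table order and marks a scheme unusable when both ends are catchall branches. The `CATCHALL` set and the function name `enumerate_schemes` are my own labels for illustration.

```python
from itertools import product
from string import ascii_uppercase

# Figurative branch names, in the same order as the scheme table.
WORKING = ["master", "ASGN-xyz", "work", "WORK-???"]
PULL_TO = ["master", "done", "DONE-xyz"]

# Assumption: a "catchall" branch is one shared by every assignment.
CATCHALL = {"master", "done", "work"}

# The scheme IDs in the table skip the letter I.
IDS = [c for c in ascii_uppercase if c != "I"]


def enumerate_schemes():
    """Return (id, working, pull_to, usable) for every pairing."""
    schemes = []
    for scheme_id, (working, pull_to) in zip(IDS, product(WORKING, PULL_TO)):
        # If both ends are catchall, nothing identifies the assignment.
        usable = not (working in CATCHALL and pull_to in CATCHALL)
        schemes.append((scheme_id, working, pull_to, usable))
    return schemes


if __name__ == "__main__":
    for scheme_id, working, pull_to, usable in enumerate_schemes():
        status = "ok" if usable else "unusable"
        print(f"{scheme_id}: {working:9} -> {pull_to:9} {status}")
```

Under that rule the sketch flags exactly A, B, G, and H, which matches the "unusable" notes in the table.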

Problems Observed

Of course, it’s not consistent with professional workflow

When a student issues a pull request to a later “Pull-To” branch, the PR contains all of the commits from all prior work, making the PR very muddy

Similarly, it’s hard to tune the notification emails to make them usable.
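To illustrate the muddy pull request problem, here is a small Python model. It relies on the fact that a pull request lists the commits that are on the head branch but not on the base branch. The commit labels are hypothetical, and the setup is an assumption on my part: every pull-to branch is created from the starter commit and never receives commits from earlier assignments.

```python
def pr_commits(head_history, base_history):
    """Commits a pull request would list: on head but not on base."""
    seen = set(base_history)
    return [c for c in head_history if c not in seen]


# Hypothetical commit labels; "starter" is the instructor's initial commit.
starter = ["starter"]
master_after_a1 = starter + ["a1-work", "a1-fix"]
master_after_a2 = master_after_a1 + ["a2-work"]

# Assumption: each DONE-xyz branch starts at the starter commit and
# never receives the commits of earlier assignments.
done_a1 = list(starter)
done_a2 = list(starter)

print(pr_commits(master_after_a1, done_a1))  # ['a1-work', 'a1-fix']
print(pr_commits(master_after_a2, done_a2))  # ['a1-work', 'a1-fix', 'a2-work']
```

The second pull request carries the two commits from the first assignment along with the one new commit, which is the muddiness described above.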

Choosing a Git strategy for teaching beginners is always hard. In my experience, some development teams adopt an existing Git strategy, while others create their own. In my opinion the latter is more complicated: Git and GitHub are two topics with their own complexity, and creating a new Git strategy could add even more complexity on top of that.

I remember teaching Git to students who knew nothing about it. They asked me about using tools such as GitKraken, or other tools for working with Git and connecting to remote repositories. I said the same thing: using Git is complicated, understanding how it works requires a lot of practice and study, and using another tool would add one more thing to understand, in my opinion.

I recommend using the GitHub Workflow, because it’s simple and easy to implement: you create a repository, the students create their own forks, they add commits to their forks, and finally they send their pull requests. I think this could work for your objectives.

But if you want to explore more options to make this more accessible, you may want to look at Probot. For this you could check the GitHub Learning Lab courses, because these courses are implemented with Probot.

You could also look into Continuous Integration. For example, I’ve worked with a CI setup where you make your commits as usual, but at the end of the day you squash them into a single commit and send it in a pull request. The CI server (Jenkins CI) immediately detects when a new PR appears and acts on it, in this case by running the test suite; if the tests pass, the CI merges the PR into the master branch. You could implement something like this with your CI system.
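As a rough sketch of that gate logic only, assuming nothing about the real Jenkins or GitHub APIs: the `PullRequest` class and `handle_pull_request` function below are hypothetical names for illustration, and the test runner is passed in as a plain function.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PullRequest:
    """Hypothetical stand-in for a PR event; not a Jenkins or GitHub type."""
    number: int
    commit_count: int
    merged: bool = False


def handle_pull_request(pr: PullRequest,
                        run_tests: Callable[[PullRequest], bool]) -> str:
    """Gate a PR: require one squashed commit, run tests, merge on success."""
    if pr.commit_count != 1:
        return "rejected: squash to one commit first"
    if not run_tests(pr):
        return "rejected: tests failed"
    pr.merged = True  # a real CI job would merge into master here
    return "merged"
```

For example, `handle_pull_request(PullRequest(number=7, commit_count=1), lambda pr: True)` returns `"merged"`, while a PR with three commits is rejected before any tests run.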

I also recommend looking into statistics, because a single repository gives you a lot of information based on Git: commits, authors, flows, branches, commit dates, features, descriptions, etc. See, for example, [this](http://tomgi.github.io/git_stats/examples/rails/activity/by_date.html) (created by a Ruby gem named Git Stats).