Tag: Git

The 5 R’s

I’ve seen the five R’s used many times in reference to the OER space (Open Educational Resources). They include the ability to allow others to: Retain, Reuse, Revise, Remix and/or Redistribute content with the appropriate use of licenses. These are all some incredibly powerful building blocks, but I feel like one particularly important building block is missing–that of the ability to allow easy accretion of knowledge over time.

Version Control

Some in the educational community may not be aware of some of the more technical communities that use the idea of version control for their daily work. The concept of version control is relatively simple and there are a multitude of platforms and software to effectuate it including Git, GitHub, GitLab, BitBucket, SVN, etc. In the old days of file and document maintenance one might save different versions of the same general file with increasingly different and complex names to their computer hard drive: Syllabus.doc, Syllabus_revised.doc, Syllabus_revisedagain.doc, Syllabus_Final.doc, Syllabus_Final_Final.doc, etc. and by using either the names or date and timestamps on the file one might try to puzzle out which one was the correct version of the file that they were working on.

For the better part of a decade now there is what is known as version control software to allow people to more easily maintain a single version of their particular document but with a timestamped list of changes kept internally to allow users to create new updates or roll back to older versions of work they’ve done. While the programs themselves are internally complicated, the user interfaces are typically relatively easy to use and in less than a day one can master most of their functionality. Most importantly, these version control systems allow many people to work on the same file or resource at a time! This means that 10 or more people can be working on a textbook, for example, at the same. They create a fork or clone of the particular project to their personal work space where they work on it and periodically save their changes. Then they can push their changes back to the original or master where they can be merged back in to make a better overall project. If there are conflicts between changes, these can be relatively easily settled without much loss of time. (For those looking for additional details, I’ve previously written Git and Version Control for Novelists, Screenwriters, Academics, and the General Public, which contains a variety of detail and resources.) Version control should be a basic tool of every educators’ digital literacy toolbox.

For the OER community, version control can add an additional level of power and capability to their particular resources. While some resources may be highly customized or single use resources, many of them, including documents like textbooks can benefit from the work of many hands in an accretive manner. If these resources are maintained in version controllable repositories then individuals can use the original 5 R’s to create their particular content.

But what if a teacher were to add several new and useful chapters to an open textbook? While it may be directly useful to their specific class, perhaps it’s also incredibly useful to the broader range of teachers and students who might use the original source in the future? If the teacher who forks the original source has a means of pushing their similarly licensed content back to the original in an easy manner, then not only will their specific class benefit from the change(s), but all future classes that might use the original source will have the benefit as well!

If you’re not sold on the value of version control, I’ll mention briefly that Microsoft spent $7.5 Billion over the summer to acquire GitHub, which is one of the most popular version control and collaboration tools on the market. Given Microsofts’ push into the open space over the past several years, this certainly bodes well for both open as well as version control for years to come.

Examples

A Math Text

As a simple example, lets say that one professor writes the bulk of a mathematics text, but twenty colleagues all contribute handfuls of particular examples or exercises over time. Instead of individually hosting those exercises on their own sites or within their individual LMSes where they’re unlikely to be easy to find for other adopters of the text, why not submit the changes back to the original to allow more options and flexibility to future teachers? Massive banks of problems will allow more flexibility for both teachers and students. Even if the additional problems aren’t maintained in the original text source, they’ll be easily accessible as adjunct materials for future adopters.

Wikipedia

One of the most powerful examples of the value of accretion in this manner is Wikipedia. While it’s somewhat different in form than some of the version control systems mentioned above, Wikipedia (and most wikis for that matter) have built in history views that allow users to see and track the trail of updates and changes over time. The Wikipedia in use today is vastly larger and more valuable today than it was on its first birthday because it allows ongoing edits to be not only improved over time, but those improvements are logged and view-able in a version controlled manner.

Google Documents

This is another example of an extensible OER platform that allows simple accretion. With the correct settings on a document, one can host an original and allow it to be available to others who can save it to their own Google Drive or other spaces. Leaving the ability for guests to suggest changes or to edit a document allows it to potentially become better over time without decreasing the value of the original 5 Rs.

Webmentions for Update Notifications

As many open educational resources are hosted online for easy retention, reuse, revision, remixing, and/or redistribution, keeping them updated with potential changes can potentially be a difficult proposition. It may not always be the case that resources are maintained on a single platform like GitHub or that users of these resources will necessarily know how to use these platforms or their functionality. As a potential “fix” I can easily see a means of leveraging the W3C recommended specification for Webmention as a means of keeping a tally of changes to resources online.

Let’s say Robin keeps a copy of her OER textbook on her WordPress website where students and other educators can easily download and utilize it. More often than not, those using it are quite likely to host changed versions of it online as well. If their CMS supports the Webmention spec like WordPress does via a simple plugin, then by providing a simple URL link as a means of crediting the original source, which they’re very likely to do as required by the Creative Commons license anyway, their site will send a notification of the copy’s existence to the original. The original can then display the webmentions as traditional comments and thus provide links to the chain of branches of copies which both the original creator as well as future users can follow to find individual changes. If nothing else, the use of Webmention will provide some direct feedback to the original author(s) to indicate their materials are being used. Commonly used education facing platforms like WordPress, Drupal, WithKnown, Grav, and many others either support the Webmention spec natively or do so with very simple plugins.

Editorial Oversight

One of the issues some may see with pushing updates back to an original surrounds potential resource bloat or lack of editorial oversight. This is a common question or issue on open source version control repositories already, so there is a long and broad history of for how these things are maintained or managed in cases where there is community disagreement, an original source’s maintainer dies, disappears, loses interest, or simply no longer maintains the original. In the end, as a community of educators we owe it to ourselves and future colleagues to make an attempt at better maintaining, archiving, and allowing our work to accrete value over time.

The 6th R: Request Update

In summation, I’d like to request that we all start talking about the 6 R’s which include the current 5 along with the addition of a Request update (or maybe pull Request, Recompile, or Report to keep it in the R family?) ability as well. OER is an incredibly powerful concept already, but could be even more so with the ability to push new updates or at least notifications of them back to the original. Having the ability to do this will make it far easier to spread and grow the value of the OER concept as well as to disrupt the education spaces OER was evolved to improve.

Bioinformatics is a broad discipline in which one common denominator is the need to produce and/or use software that can be applied to biological data in different contexts. To enable and ensure the replicability and traceability of scientific claims, it is essential that the scientific publication, the corresponding datasets, and the data analysis are made publicly available [1,2]. All software used for the analysis should be either carefully documented (e.g., for commercial software) or, better yet, openly shared and directly accessible to others [3,4]. The rise of openly available software and source code alongside concomitant collaborative development is facilitated by the existence of several code repository services such as SourceForge, Bitbucket, GitLab, and GitHub, among others. These resources are also essential for collaborative software projects because they enable the organization and sharing of programming tasks between different remote contributors. Here, we introduce the main features of GitHub, a popular web-based platform that offers a free and integrated environment for hosting the source code, documentation, and project-related web content for open-source projects. GitHub also offers paid plans for private repositories (see Box 1) for individuals and businesses as well as free plans including private repositories for research and educational use.

Revision (or version) control is used in tracking changes in computer programs, but it can easily be used for tracking changes in almost any type of writing from novels, short stories, screenplays, legal contracts, or any type of textual documentation.

Marginalia and Revision Control

At the end of April, I read an article entitled “In the Margins” in the Johns Hopkins University Arts & Sciences magazine. I was particularly struck by the comments of eminent scholar Jacques Neefs on page thirteen (or paragraph 20) about computers making marginalia a thing of the past:

Neefs believes contemporary literature is losing a valuable component in an age when technology often precludes and trumps the need to save manuscripts or rough drafts. But it is not something that keeps him up at night. ‘The modern technique of computers and everything makes [marginalia] a thing of the past,’ he says. ‘There’s a new way of creation. Some would say it’s tragic, but something new has been invented. I don’t consider it tragic. There are still great writers who write and continue to have a way to keep the process.’

Jacques Neefs (Image courtesy of Johns Hopkins University)

I actually think that he may be completely wrong and that current technology actually allows us to keep far more marginalia! (Has anyone heard of digital exhaust?) The bigger issue may be that many writers just don’t know how to keep a better running log of their work to maintain all the relevant marginalia they’re actually producing. (Of course there’s also the subsequent broader librarian’s “digital dilemma” of maintaining formats for the future. As an example, thing about how easy or hard it might be for you to read that ubiquitous 3.5 inch floppy disk you used in 1995.)

A a technologist who has spent many years in the entertainment industry, I feel compelled to point everyone towards the concept of revision control (or version control) within the realm of computer science. Though it’s primarily used in tracking changes in computer programs and is often a tool used by large teams of programmers, it can very easily be used for tracking changes in almost any type of writing from novels, short stories, screenplays, legal contracts, or any type of textual documentation of nearly any sort.

Example Use Cases for Revision Control

Publishing

As a direct example, I’m using what is known as a Git repository to track every change I make in a textbook I’m currently writing. I can literally go back and view every change I’ve made since beginning the project, so though I’m directly revising one (or more) text files, all of my “marginalia” and revisions are saved and available. Currently I’m only doing it for my own reference and for additional backup not supposing that anyone other than myself or an editor possibly may want to ever peruse it. If I was working in conjunction with otheres, there are ways for me to track the changes, edits, or notes that others (perhaps an editor or collaborator) might make.

In addition to the general back-up of the project (in case of catastrophic computer failure), I also have the ability to go back and find that paragraph (or multiple pages) I deleted last week in haste, but realize that I desperately want them back now instead of having to recreate them de n0vo.

Because it’s all digital, future scholars also won’t have problems parsing my handwriting issues as has occasionally come up in differentiating Mary Shelley’s writing from that of her husband in digital projects like the Shelley Godwin Archive. The fact that all changes are tracked and placed in a tree-like structure will indicate who wrote what and when and will indicate which changes were ultimately accepted and merged into the final version.

Screenplays in Hollywood

One particular use case I can easily see for such technology is tracking changes in screenplays over time. I’m honestly shocked that every production company or even more likely studios don’t use such technology to follow changes in drafts over time. In the end, doing such tracking will certainly make Writers Guild of America (WGA) arbitrations much easier as literally every contribution to a script can be tracked to give screenwriters appropriate credit. The end results with the easy ability to time-machine one’s way back into older drafts is truly lovely, and the outputs give so much more information about changes in the script compared to the traditional and all-too-simple (*) which screenwriters use to indicate that something/anything changed on a specific line or the different colored pages which are used on scripts during production.

I can also picture future screenwriters using services like GitHub as platforms for storing and distributing their screenplays to potential agents, managers, and producers.

Redlining Legal Documents

Having seen thousands of legal agreements go back and forth over the years, revision control is a natural tool for tracking the redlining and changes of legal documents as they change over time before they are finally (or even never) executed. I have to imagine that being able to abstract out the appropriate metadata in the long run may actually help attorneys, agents, etc. to become better negotiators, but something like this is a project for another day.

Academia

In addition to direct research for projects being undertaken by academics like Neefs, academics should look into using revision control in their own daily work and writings. While writing a book, paper, journal article, essay, monograph, etc. (or graduate students writing theses) one could use their own Git repository to not only save but to back up all of their own work not only for themselves primarily, but also future scholars who come later who would not otherwise have access to the “marginalia” one creates while manufacturing their written thoughts in digital form.

I can easily picture Git as a very simple “next step” in furthering the concept of the digital humanities as well as in helping to bridge the gap between C.P. Snow’s “two cultures.” (I’d also suggest that revision control is a relatively simple step one could take before learning a particular programming language, which I think should be a mandatory tool in everyone’s daily toolbox regardless of their field(s) of interest.)

Start Using Revision Control

“But how do I get started?” you ask.

Know going in that it may take parts of a day to get things set up and running, but once you’ve started with the basics, things are actually pretty easy and you can continue to learn the more advanced subtleties as you progress. Once things are working smoothly, the additional overhead you’ll be expending won’t be too much more than the old method of hitting Alt-S to save one of your old Word documents in the time before auto-save became ubiquitous.

First one should start by choosing one of the myriad revision control systems that exist. For the sake of brevity in this short introductory post, I’ll simply suggest that users take a very close look at Git because of its ubiquity and popularity in the computer science world and the fact that it includes a tremendously large amount of free information and support from a variety of sites on the internet. Git also has the benefit of having versions for all major operating systems (Windows, MacOS, and Linux). Git also has the benefit of a relatively long and robust life within the computer science community meaning that it’s very stable and has many more resources for the uninitiated to draw upon.

Once one has Git installed on their computer and has begun using it, I’d then recommending linking one’s local copy of the repository to a cloud storage solution like either GitHub or BitBucket. While GitHub is certainly one of the most popular Git-related services out there (because it acts, in part, as the hub for a large portion of the open internet and thus promotes sharing), I often recommend using BitBucket as it allows free unlimited private but still share-able repositories while GitHub requires a small subscription fee for keeping one’s work private. Having a repository in the cloud will help tremendously in that your work will be available and downloadable from almost anywhere and because it also serves as a de-facto back-up solution for your work.

I’ve recently been playing around with version control to help streamline the writing/editing process for a book I’ve been writing. Though Git and it’s variants probably seem more daunting than they should to the everyday user, they really represent a very powerful tool. I’ve spent less than two days learning the basics of both Git and hosted repositories (GitHub and Bitbucket), and it has been more than well worth the minor effort.

There is a huge wealth of information on revision control in general and on installing and using Git available on the internet, including full textbooks. For the complete beginners, I’d recommend starting with The Chronicle’s “A Gentle Introduction to Version Control.” Keep in mind that though some of these resources look highly technical, it’s because many are trying to enumerate every function one could potentially desire, when even just the basic core functionality is more than enough to begin with. (I could analogize it to learning to drive a car versus actually reading the full manual so that you know how to take the engine apart and put it back together from scratch. To start with revision control, you only need to learn to “drive.”) Professors might also avail themselves of the use of their local institutional libraries which may host small sessions on learning such tools, or they might avail themselves of the help of their colleagues or students in the computer science department. For others, I’d recommend taking a look at Git’s primary website. BitBucket has an excellent step-by-step tutorial (and troubleshooting) for setting up the requisite software and using it.

What do you use for revision control?

I’ll welcome any thoughts, experiences, or additional resources one might want to share with others in the comments.