Analysing student Software Engineering projects metadata

CS2103/T is the introductory software engineering module at the National University of Singapore. I took this module in AY16/17 Semester 1, from August to November 2016. For the last seven weeks of the module, students were required to build a todo list application in teams of three or four (or, in one instance, two). Students were strongly encouraged to fork their projects off an existing address book application created by the teaching team, and repurpose that codebase into a todo list app.

There were 65 teams in this module. Since teams were required to host their projects in public GitHub repositories, I used GitHub’s API to download commit and repository metadata. The following are some observations gathered from this data.
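As a rough illustration of this kind of download, here is a minimal sketch against GitHub’s public REST API, using only the Python standard library. The helper names (fetch_json, summarize_repo, summarize_commit, download_metadata) are hypothetical and are not taken from the stats module used in this post. Mapping the repository description to a title is also an assumption.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_json(url):
    """GET a GitHub API URL and decode the JSON body (unauthenticated)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_repo(repo):
    """Keep only the repository fields this kind of analysis needs."""
    return {
        "name": repo["full_name"],
        # Assumption: the repository description stands in for a project title.
        "title": repo.get("description") or "",
        "is_fork": repo["fork"],
    }


def summarize_commit(commit):
    """Extract the SHA and author timestamp from one commit-list entry."""
    return {
        "sha": commit["sha"],
        "authored_at": commit["commit"]["author"]["date"],
    }


def download_metadata(owner, name):
    """Fetch one repository and the first page (up to 100) of its commits."""
    repo = fetch_json(f"{API_ROOT}/repos/{owner}/{name}")
    commits = fetch_json(f"{API_ROOT}/repos/{owner}/{name}/commits?per_page=100")
    return summarize_repo(repo), [summarize_commit(c) for c in commits]
```

Unauthenticated requests are rate limited, so a real run over 65 repositories would probably need an access token and pagination, both of which this sketch leaves out.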

Judge an app by its cover

Each team was required to present its user interface mockup in its README file. It was a simple matter to download these mockups, convert them to JPEG and turn them into a gallery. Click on any of the mockups to go to the team’s GitHub project.

Most of these are not very pretty, since the UI framework of choice was JavaFX, and a good looking and user-friendly UI was considered an extension milestone for the project. As mentioned below, some groups opted for additional UI widgets or libraries via external dependencies. The use of these libraries does not seem to be correlated with actually having good UI.

For reference, this is what the forked app looks like.

Here are a few apps with good looking design (though not necessarily UI - I really didn’t have the time to play with all 65 apps one by one) that stood out from the crowd:

With its punny name and minimalist design, PriorityQ stood out as one of the few projects with a UI that actually looks professional. The judicious use of colors, negative space, and typography makes this app look like one you might find on an actual designer’s desktop. Remember - sometimes less is more.

TasKitty’s design stood out because it was the only app that employed an anthropomorphised avatar. This is not something that’s easy to pull off - who can forget Clippy? - but if done right it makes the app stand out from the crowd by giving it personality, and making it easier to interact with.

The images can be downloaded using

stats.download_images()

The postprocessing was done using ImageMagick’s mogrify command.
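The exact mogrify options used are not recorded here, so the following is only a sketch of what such a postprocessing step could look like when driven from Python. The function names, the 400 pixel width and the quality setting are all assumptions chosen for illustration.

```python
import subprocess


def mogrify_command(image_paths, width=400, quality=85):
    """Build a mogrify invocation that writes a resized JPEG next to each input."""
    return [
        "mogrify",
        "-format", "jpg",         # write <name>.jpg alongside the original file
        "-resize", f"{width}x",   # fix the width, keep the aspect ratio
        "-quality", str(quality),
        *image_paths,
    ]


def convert_images(image_paths):
    """Run mogrify; requires ImageMagick to be installed and on the PATH."""
    subprocess.run(mogrify_command(image_paths), check=True)
```

Building the argument list separately from running it makes the command easy to inspect before anything touches the image files.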

Last minute commits

Throughout the project, teams had to present their progress during weekly tutorials, which took place on Wednesdays, Thursdays and Fridays. Students are well known for doing their work at the last minute, and we can actually see this trend by plotting the number of commits against time of day. The two highlighted regions represent the 12 hours before and the 12 hours after the midnight that begins each of the three days of the week on which CS2103/T has tutorials.

We can see clear peaks centered on the midnight before every tutorial day, when students rush to complete their work.

The code to generate the graph can be found in render.py. The data can be extracted using

stats.analyze_commit_timing()
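The body of analyze_commit_timing is not shown here, but the underlying idea can be sketched as follows: convert each commit timestamp to Singapore time and count commits per weekday and hour. The function names are hypothetical, and the sketch assumes timestamps in the UTC format that GitHub’s API returns.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

SGT = timezone(timedelta(hours=8))  # Singapore Time, UTC+8


def to_local(timestamp):
    """Parse a GitHub-style UTC timestamp such as '2016-10-11T16:30:00Z'."""
    utc = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return utc.replace(tzinfo=timezone.utc).astimezone(SGT)


def commits_by_weekday_hour(timestamps):
    """Count commits per (weekday, hour) in local time; weekday 0 is Monday."""
    return Counter((t.weekday(), t.hour) for t in map(to_local, timestamps))
```

The timezone conversion matters: a commit made at 16:30 UTC on a Tuesday lands at 00:30 on Wednesday in Singapore, which is exactly the sort of late-night commit this graph is about.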

The final countdown

Another thing we could do with the commits is to count how many there are every week, from the start of the project to the end.

Week 1: 217
Week 2: 4281
Week 3: 5428
Week 4: 6945
Week 5: 7522
Week 6: 9146
Week 7: 3957

Week 7’s count of 3957 may seem low compared to the previous week, but the submission deadline fell on a Monday, and 3620 commits were created on that day alone. That’s an incredible 56 commits per team on average!
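A weekly tally like this one could be produced by bucketing commit dates into seven-day windows counted from the project start. The sketch below assumes exactly that. The function names and the start date in the usage example are hypothetical, not the module’s actual dates.

```python
from collections import Counter
from datetime import date


def week_number(commit_date, project_start):
    """Return the 1-based project week that a commit date falls in."""
    return (commit_date - project_start).days // 7 + 1


def commits_per_week(commit_dates, project_start):
    """Count commits per project week."""
    return Counter(week_number(d, project_start) for d in commit_dates)


if __name__ == "__main__":
    start = date(2016, 9, 26)  # hypothetical start date (a Monday)
    dates = [date(2016, 9, 26), date(2016, 10, 2), date(2016, 10, 3)]
    print(commits_per_week(dates, start))
    # Per-team average on the final day, using the figures quoted above.
    print(round(3620 / 65))  # 56
```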

Indeed, sharp-eyed readers may have noticed that the Mondays in the commits per hour graph seem to be higher than expected. This is mostly due to the additional commits created on that single day. Below are the Monday commits by hour of day, split into commits made on the final day and commits made on the other six Mondays.

Hour  Final day  Other Mondays
0     170        206
1     129        140
2     105        90
3     67         54
4     41         33
5     20         21
6     4          9
7     3          9
8     33         24
9     28         41
10    34         104
11    80         121
12    133        102
13    146        140
14    133        163
15    188        180
16    202        207
17    197        215
18    270        184
19    239        193
20    265        263
21    303        226
22    390        271
23    440        239

As you can see, more commits were made on the final day than on all other Mondays put together - the final tally is 3235 on all other Mondays vs. 3620 on the final day.
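The two totals can be checked against the hourly Monday figures. The lists below are read off the figures above, on the assumption that each hour’s pair is the final day count followed by the count for the other six Mondays. That reading reproduces both totals.

```python
# Hourly Monday commit counts (list index = hour of day, 0 to 23).
# Assumption: each hour's pair is (final day, other six Mondays).
final_day = [170, 129, 105, 67, 41, 20, 4, 3, 33, 28, 34, 80,
             133, 146, 133, 188, 202, 197, 270, 239, 265, 303, 390, 440]
other_mondays = [206, 140, 90, 54, 33, 21, 9, 9, 24, 41, 104, 121,
                 102, 140, 163, 180, 207, 215, 184, 193, 263, 226, 271, 239]

assert len(final_day) == len(other_mondays) == 24

print(sum(final_day))      # 3620
print(sum(other_mondays))  # 3235

# Hour of the final day with the most commits.
busiest_hour = max(range(24), key=final_day.__getitem__)
print(busiest_hour)  # 23
```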

Greenfield or fork?

Of the 65 projects in the module, only two projects chose to start from scratch.

On the surface, these projects don’t seem to be very different from the ones that were forked. Unfortunately I’m not privy to the students’ grades, so I can’t say if their decision to go greenfield was a wise one.

The code used to extract this:

for name, project in stats.projects.items():
    if not project.is_fork:
        print(name, project.title)

Dependencies

Out of the 65 projects, 16 projects (25%) did not use any additional dependencies. This is a surprisingly low number considering CS2103/T is only an introductory software engineering module, and all project requirements can be fulfilled with existing dependencies.

However, midway through the project the lecturer mentioned natural language parsing libraries in one of his weekly emails, which encouraged many teams to use them, as you can see below.

Additional dependencies: number of projects
0: 16
1: 23
2: 11
3: 4
4: 4
5: 4
6: 1
7: 1
16: 1
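One way a distribution like this could be derived is to parse each project’s build.gradle and subtract the dependencies already declared in the original project. The sketch below assumes dependencies are declared in the string notation common in Gradle builds of that era (for example compile 'group:name:version'). The function names and the coordinates in the usage example are made up for illustration.

```python
import re
from collections import Counter

# Matches string-notation declarations such as: compile 'org.example:lib:1.0'
# Map notation (compile group: '...', name: '...') is not handled.
DEPENDENCY = re.compile(
    r"^\s*(?:compile|testCompile|runtime|testRuntime)\s+"
    r"['\"]([^:'\"]+):([^:'\"]+)(?::[^'\"]*)?['\"]",
    re.MULTILINE,
)


def artifact_names(gradle_text):
    """Return the set of artifact names declared in a build.gradle file."""
    return {match.group(2) for match in DEPENDENCY.finditer(gradle_text)}


def added_dependencies(project_gradle, baseline_gradle):
    """Artifacts a project declares beyond those in the baseline build file."""
    return artifact_names(project_gradle) - artifact_names(baseline_gradle)


def dependency_histogram(project_gradles, baseline_gradle):
    """Map 'number of added dependencies' to 'number of projects'."""
    return Counter(
        len(added_dependencies(text, baseline_gradle)) for text in project_gradles
    )


if __name__ == "__main__":
    baseline = "dependencies {\n    compile 'org.example:base:1.0'\n}\n"
    project = (
        "dependencies {\n"
        "    compile 'org.example:base:1.0'\n"
        "    compile 'org.example:date-parser:2.3'\n"
        "    testCompile \"org.example:mock-lib:0.9\"\n"
        "}\n"
    )
    print(sorted(added_dependencies(project, baseline)))
    print(dependency_histogram([project, baseline], baseline))
```

Comparing artifact names rather than full coordinates means a version bump of an existing dependency is not counted as a new one.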

The largest number of dependencies used by a single project is 16. The build.gradle file provided in the original project already provides a number of packages.

Manual dependency management

Two of the groups used manual dependency management instead of using Gradle - that is, they downloaded all of their dependencies as JARs into a /lib folder and checked them into source control manually.

Having dependencies managed this way means packages do not need to be redownloaded, speeding up first-time setup and continuous integration. It also means the repository server going down will not cause any disruption.

The disadvantage of this is that updates have to be managed manually. This is relatively painless for Java because Java packages come in self-contained JAR files, but for other ecosystems like Node, where certain libraries have C extensions that require compilation, using a package manager would be preferable.
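How these two groups were identified is not described here. As a sketch only, a local checkout could be checked for JAR files in a lib folder as follows. The function names are hypothetical, and the lib folder name is taken from the description above.

```python
from pathlib import Path


def checked_in_jars(repo_root, lib_dir="lib"):
    """List the names of JAR files under <repo_root>/<lib_dir>, sorted."""
    lib = Path(repo_root) / lib_dir
    if not lib.is_dir():
        return []
    return sorted(path.name for path in lib.glob("*.jar"))


def uses_manual_dependencies(repo_root):
    """Heuristic: a project with JARs in its lib folder manages them by hand."""
    return bool(checked_in_jars(repo_root))
```

This is only a heuristic. A project could keep JARs in a differently named folder, or mix checked-in JARs with Gradle-managed dependencies.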

Time and natural language parsing

The most common dependencies are natural language parsing libraries. This is not surprising because parsing dates and times is very painful, and Natty came recommended by the lecturer. Other groups used Prettytime, which is useful for displaying human-readable relative time strings such as “10 minutes ago”, and its NLP subpackage, which uses Natty under the hood. A couple of groups opted to use ANTLR, a general-purpose parser generator, which is somewhat of an overkill for this project.

natty: 33
prettytime: 10
prettytime-nlp: 8
antlr-runtime: 2
antlr: 1

Testing and logging

For testing, JUnit is the testing framework used by the original project. Several projects used Mockito since dependency injection is recommended, and Mockito is very useful for stubbing and mocking. Java Faker is a library for generating fake data, while PowerMock allows private and static methods and other otherwise untestable parts of the code to be tested. Hamcrest provides additional matchers for writing more descriptive assertions.

mockito-core: 5
javafaker: 1
powermock-api-mockito: 1
hamcrest-all: 1

For logging, the original project used the java.util.logging (JUL) package that comes with Java. Some projects added SLF4J. For a simple project such as this one, JUL is sufficient, although SLF4J is a widely used logging facade in industry (and supports JUL as a backend).

Serialization

The original project used Jackson for serialization to JSON and XML. Two groups used Google’s Diff-Match-Patch library to optimize their undo/redo stack. Some used Gson and other libraries for JSON serialization, though Jackson is already included.

Common libraries

Google Core Libraries (Guava) is already included in the original project. A number of projects (including mine) also opted to include the Apache Commons libraries - in our case for the edit distance algorithm for command line parsing. Evo Inflector is an inflection library to support pluralization, and another team used a Java implementation of the Aho–Corasick algorithm for string searching.
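To illustrate how edit distance helps with command line parsing, here is a minimal sketch that suggests the closest known command for a mistyped one. It assumes the Levenshtein variant of edit distance. The command names and the distance threshold are hypothetical. Our project used the Apache Commons implementation in Java, not this code.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute, free on match
            ))
        previous = current
    return previous[-1]


def closest_command(word, commands, max_distance=2):
    """Return the known command nearest to word, or None if none is close."""
    best = min(commands, key=lambda command: edit_distance(word, command))
    return best if edit_distance(word, best) <= max_distance else None


if __name__ == "__main__":
    commands = ["add", "delete", "list"]  # hypothetical command set
    print(closest_command("delte", commands))  # delete
    print(closest_command("xyzzy", commands))  # None
```

The threshold keeps the parser from guessing when the input is nowhere near any known command.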

UI libraries

The original project used JavaFX as the UI framework. A number of projects opted to use additional theming and control libraries. An interesting observation is that better design is not correlated with the use of additional UI libraries.