Background: I am currently a junior studying Electrical Engineering at the University of New Haven in West Haven, CT, US. My main distro is CrunchBang. I prefer to use Vim and Git. Some of my other interests are stamp collecting and Atari 8-bit computers.

Benefits to Debian: The portal will make it easier to improve Debian by providing a single source of metrics against which to evaluate changes.

Deliverables (copied):

database structure to store all historical data points of all metrics

standardized declarative interface to add/remove metrics to be graphed. The interface should allow for both "local" metrics (e.g. data generated by scripts run on the machines hosting the metrics portal) and "remote" metrics (e.g. data generated by remote data sources which are then periodically gathered by the metrics portal)

cron jobs to periodically fetch new data and generate graphs

proof of concept: integration of (some of the) existing graphs in the metrics infrastructure

web interface to show updated graphs of the various metrics

client-side dynamic web interface to graph, on demand, specific metrics (possibly more than one at a time to look for correlations) over the desired time periods

(optional) produce a Debian Package of the portal code to ease deployment on Debian-based machines
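Here is a minimal sketch of how the first two deliverables might fit together. It uses a table of declared metrics plus one table of timestamped data points that keeps every historical value. Python's built-in sqlite3 stands in for the PostgreSQL/SQLAlchemy setup described under "Choice of data format". Every name in it is hypothetical, including `Metric`, `data_point`, `collect_local`, and the example metric names and URL. Only local collection is shown. A cron job would call `collect_local` (and a similar fetcher for remote metrics) periodically.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple


# Hypothetical declarative metric definition: "local" metrics are computed on
# the portal host by a callable, "remote" metrics name a URL to fetch later.
@dataclass(frozen=True)
class Metric:
    name: str
    kind: str  # "local" or "remote"
    collect: Optional[Callable[[], float]] = None
    url: Optional[str] = None


# One row per declared metric, one row per historical data point.
SCHEMA = """
CREATE TABLE IF NOT EXISTS metric (
    name TEXT PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('local', 'remote'))
);
CREATE TABLE IF NOT EXISTS data_point (
    metric TEXT NOT NULL REFERENCES metric(name),
    ts     TEXT NOT NULL,  -- ISO 8601 timestamp in UTC
    value  REAL NOT NULL,
    PRIMARY KEY (metric, ts)
);
"""


def register(conn: sqlite3.Connection, metric: Metric) -> None:
    """Add (or update) a metric declaration."""
    conn.execute(
        "INSERT OR REPLACE INTO metric (name, kind) VALUES (?, ?)",
        (metric.name, metric.kind),
    )


def collect_local(conn: sqlite3.Connection, metrics: List[Metric],
                  now: Optional[datetime] = None) -> None:
    """Store one new data point per local metric. Older points are kept;
    collecting twice at the same timestamp replaces that single point."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    for m in metrics:
        if m.kind == "local" and m.collect is not None:
            conn.execute(
                "INSERT OR REPLACE INTO data_point (metric, ts, value) VALUES (?, ?, ?)",
                (m.name, ts, float(m.collect())),
            )
    conn.commit()


def history(conn: sqlite3.Connection, name: str) -> List[Tuple[str, float]]:
    """All stored points for one metric, oldest first."""
    return conn.execute(
        "SELECT ts, value FROM data_point WHERE metric = ? ORDER BY ts", (name,)
    ).fetchall()


# Example with made-up metrics.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
METRICS = [
    Metric("example.local_count", "local", collect=lambda: 42),
    Metric("example.remote_count", "remote", url="https://example.org/metric.json"),
]
for metric in METRICS:
    register(conn, metric)
collect_local(conn, METRICS, now=datetime(2014, 5, 19, tzinfo=timezone.utc))
print(history(conn, "example.local_count"))
```

Storing values in one narrow table keyed by (metric, timestamp) means adding a metric only requires a new declaration, not a schema change.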

Project schedule: I created a Gantt chart to illustrate my anticipated breakdown of the allotted time. I scheduled two days off: the Fourth of July and my birthday. I left one week at the end of Coding Period 1 to finish any outstanding tasks. If I still have unfinished tasks at the end of Coding Period 2, I can use the week allotted for producing the Debian package. I don't anticipate needing it, and I plan to finish all tasks, including the optional packaging task. You can view the Gantt chart at http://josephbisch.com/debian-metrics-portal.html. Besides the two days off, my only other commitment is one online summer class. I will take it during the first half of summer (May 19 to June 30), and it will require no more than 4 hours per week, including lectures and studying/homework. I don't believe the class will significantly affect my ability to complete this project; I can still dedicate at least 40 hours per week to it.

Data sources:

https://wiki.debian.org/Statistics appears to list all existing sources. As part of this project I will implement metrics for the following; more metrics can be added in the future.

BTS stats, including important bugs and old bugs. Important bugs are those that have a major effect on the usability of a package without rendering it completely unusable to everyone. Old bugs are bugs older than 2 years. They may simply have been neglected, or the bug report might not be detailed enough to act on. It is important to display these in a way that makes the data accessible.

Release-critical (RC) bugs are bugs of severity critical, grave, or serious; a package with open RC bugs can be kept out of the next stable release of Debian. It is important to graph both the total number of RC bugs and the packages with the most RC bugs. This shows how close a release is to being RC-bug-free and which packages need the most attention.

Dpkg-formats - List and graph the total number of packages that use each source format. This can help us figure out why there is still so much 3.0 (native) and 1.0 format usage, possibly by correlating it with packages that are missing a maintainer or are undermaintained.

Source code stats - Display statistics about the number of lines of code and the size of releases and of individual packages. Tracking changes in line count and size between releases lets us identify where growth comes from.

VCS-usage - List and graph the total number of packages that use each VCS. This shows that git is the most popular VCS in Debian. It also lets us look into trends in VCS usage and see how tools could be improved to encourage the use of git over other VCSs.
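Most of the metrics above reduce to a few simple aggregations over rows pulled from UDD. The sketch below assumes those rows have already been fetched as plain tuples; the function names, package names and sample rows are made up for illustration. It treats the critical, grave and serious severities as release-critical, as Debian does, and approximates "older than 2 years" as 730 days.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List, Tuple

# Debian's release-critical severities.
RC_SEVERITIES = {"critical", "grave", "serious"}
# "Older than 2 years", approximated as 730 days (ignores leap days).
OLD_BUG_AGE = timedelta(days=2 * 365)


def count_by(rows: Iterable[tuple], key_index: int) -> Counter:
    """Count rows per value of one column, e.g. packages per source format or per VCS."""
    return Counter(row[key_index] for row in rows)


def top_rc_packages(bugs: Iterable[Tuple[str, str]], n: int = 10) -> List[Tuple[str, int]]:
    """bugs: (package, severity) pairs. Returns the n packages with the most RC bugs."""
    rc = Counter(pkg for pkg, severity in bugs if severity in RC_SEVERITIES)
    return rc.most_common(n)


def old_bugs(bugs: Iterable[Tuple[int, date]], today: date) -> List[int]:
    """bugs: (bug number, date filed) pairs. Returns bugs older than OLD_BUG_AGE."""
    return [bug for bug, filed in bugs if today - filed > OLD_BUG_AGE]


# Hypothetical sample rows, not real Debian data.
sources = [("foo", "3.0 (quilt)"), ("bar", "3.0 (native)"),
           ("baz", "1.0"), ("qux", "3.0 (quilt)")]
bugs_by_severity = [("foo", "serious"), ("foo", "grave"), ("bar", "important"),
                    ("bar", "critical"), ("baz", "wishlist")]
filed = [(1001, date(2011, 6, 1)), (1002, date(2013, 9, 1))]

print(count_by(sources, 1))
print(top_rc_packages(bugs_by_severity, n=1))
print(old_bugs(filed, today=date(2014, 3, 1)))
```

Each function returns one value per collection run, which is the shape a periodic job would store as a new historical data point.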

Choice of libraries: I have decided to use Jinja2 as my Python templating engine and flot.js as my JavaScript graphing library. I chose flot.js because I like its interface and the examples I saw on its website. I have already implemented a graph of the VCS data from the sources.history table and found it to be a positive experience. I chose Jinja2 because I have used Django templates before, and Jinja2's syntax is very similar.

Choice of data format: I have decided to use JSON for the graph data, since it is easy to work with in both JavaScript and Python, and I prefer it to XML. I can use SQLAlchemy to query and update the database; behind the scenes, SQLAlchemy will use psycopg2. After fetching data from the database with SQLAlchemy, I can use marshmallow to serialize the Python objects to JSON for use with flot.
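Here is a minimal sketch of the last step, turning query results into JSON that flot can plot. It uses the standard library's `json` module in place of marshmallow, and the VCS counts are invented sample values. flot's time axis expects x values as JavaScript timestamps (milliseconds since 1970-01-01 UTC), and each series is an object with a `label` and a `data` list of `[x, y]` pairs. The function names are hypothetical.

```python
import json
from datetime import date, datetime, timezone
from typing import Dict, Iterable, Tuple


def to_js_timestamp(d: date) -> int:
    """Midnight UTC on the given date, in milliseconds since the Unix epoch."""
    dt = datetime(d.year, d.month, d.day, tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def flot_series(label: str, rows: Iterable[Tuple[date, float]]) -> Dict:
    """Build one flot series from (date, value) rows."""
    return {"label": label, "data": [[to_js_timestamp(d), v] for d, v in rows]}


# Hypothetical query results: packages per VCS at two points in time.
rows_by_vcs = {
    "Git": [(date(2014, 1, 1), 10), (date(2014, 2, 1), 12)],
    "Svn": [(date(2014, 1, 1), 8), (date(2014, 2, 1), 7)],
}

payload = json.dumps([flot_series(name, rows) for name, rows in rows_by_vcs.items()])
print(payload)
```

On the page, the parsed array can be handed to `$.plot` with the x-axis in time mode (`xaxis: { mode: "time" }`). Keeping the conversion on the server means the browser only has to parse JSON.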

Why am I right for this project:

I am experienced with Python. For example, while working for my campus' tutoring center, I wrote a program that parsed the bookstore's website and generated a CSV file of all the books. I wrote it using urllib2 to get the webpages and BeautifulSoup to scrape the pages. I saved the center time by automating a task that had been done manually in past years.

I am willing to ask questions and learn. Before creating this application I had never used TaskJuggler, but I learned it to create the Gantt chart. I have already emailed the mentors with a question about the sources.history script. However, I will save questions for when sources such as the Debian Wiki and Google do not suffice.

I have experience working on a team. I am on my school's robotics team, and I was also on my high school's robotics team. I also frequently work in teams as part of my engineering education.

I have experience with JavaScript. I have worked on the websites for both robotics teams, and I have also done some freelance web design/development. I know how to integrate libraries and how to do basic manipulation of elements (e.g., showing/hiding an element with an animation when another element is clicked; changing properties of an element such as its color or innerHTML). If I discover that I need more advanced JavaScript for this project, I can learn as I go along.

Other experience:

I took an introductory C course my freshman year of college. I also took a microcontroller course that used C, and I went on to serve as a teaching assistant (TA) for that course for one semester.

I have experience with LabVIEW from high school robotics and college classes, and with MATLAB from college classes. I also took a technical writing class in college.

After graduating from high school, I mentored my high school's robotics team and gave a seminar on LabVIEW.

What have I done so far: I wrote a script to create graphs from metrics in the UDD sources.history table. I also implemented the VCS graph in JavaScript using flot.js. It allows the user to adjust the range of the x-axis, using jQuery UI for the date picker widget. I used a flot plugin to show/hide a series when you click the corresponding label in the legend, and I added support to the plugin for toggling visibility when the colored square is clicked. You can view the graph at http://josephbisch.com/debian-metrics-portal-js.html. I have begun to read the Debian Wiki, and I created a Gantt chart that estimates the breakdown of the GSoC timeline.

Potential challenges:

I don't have prior experience contributing to Debian. To minimize the impact of this, I am already familiarizing myself with the Debian wiki, IRC, mailing lists, etc.

I don't have prior experience with SQLAlchemy or with the templating engine, Jinja2. I have scheduled plenty of time to familiarize myself with them.