This year’s summer has just come to an end, and so has the Summer of Code. It’s time to go over what I did while working on a performance testing framework for DUNE.

First, I wrote a Python program that measures the run time and resource consumption of an external program. It stores quite a lot of data, but the most useful are definitely the time spent, the peak memory consumption, and computer parameters such as the number of CPUs. The tool can then write this data to a temporary log file, store it in an SQLite database, or upload it to a central server. Furthermore, it can generate nice graphical representations of the data in the form of HTML pages with JavaScript graphs. It is written in a very modular way, so while there is a script to tie it all together, each of the described actions can be done separately. This also minimizes external dependencies: if the user doesn’t have the Python SQLite module installed, the database part is simply skipped.

Then, with help from my mentors, I tied the measurement tool into the DUNE build system, or rather, into both of DUNE’s currently available build systems, autotools and CMake. With minimal configuration, you can just run “make perftest” from the build directory, and all performance tests are performed, measured, stored, uploaded and visualized. By using the build system directly, we can get information about the compiler used and its flags. This is important if you ever want to compare compilation with different compiler options. When run this way, the tool measures the compilation time and the run time separately. The compilation time can get quite long with a lot of templates, unnecessary includes and large source files in general, so such a test comes in handy for identifying compilation bottlenecks.

For displaying the results, I used Twitter Bootstrap, Dygraphs and Table.js, so the generated pages look quite nice. Graphs are interactive, and some table columns can be filtered for easier browsing. Some examples are shown here.

Results of a single test run

Graphical representation of results of repeated runs of the same test

Finally, I added a server component, implemented as a number of CGI scripts in Python. One of these endpoints receives uploaded log files, while another stores them in a ‘proper’ PostgreSQL database. This two-step process is used so that processing can be done in batches and separately from uploading. For example, one could easily upload the data in some other way, such as copying it with scp over SSH. The current uploading setup is not completely open, as it requires a username and password, but these are stored in plaintext on the server.
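On the client side, an upload to such a CGI endpoint can be sketched with the standard library alone. The URL, the plain-text body and the function name are assumptions, and the sketch assumes the server checks the username and password with HTTP Basic authentication, which the post does not specify.

```python
import base64
import urllib.request


def build_upload_request(url, log_text, username, password):
    """Build (but do not send) an HTTP POST that uploads one log file.

    Basic authentication only base64-encodes the credentials, so the URL
    should use HTTPS.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=log_text.encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )
```

Sending it is then a matter of calling urllib.request.urlopen(request); the server side only has to save the body, leaving the database step to the separate batch script.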

With a server that accepts data from multiple computers, I could add some additional views. For example, there is a page that identifies outliers with adjustable tolerance. Outliers are data points with considerable deviation from the mean, which in our case means unusually long run or compile times.
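A minimal sketch of outlier detection with an adjustable tolerance, measured in standard deviations from the mean. This is one common way to define the tolerance; the server page may use a different rule, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def find_outliers(values, tolerance=2.0):
    """Return (index, value) pairs lying more than `tolerance` standard
    deviations away from the mean of `values`."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are equal, so nothing deviates.
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) > tolerance * sigma]
```

One caveat: a single extreme value also inflates the standard deviation, so with few data points a median-based measure can be more robust.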

Server overview of all collected data

Results of a single run on a server

Aggregated results of a single test on a server

A page for finding outliers on a server

All in all, I would say my summer project was a success. It started a little slow, first with a two-week pause because of a summer school I attended, and then when I had to finish my master’s thesis a month earlier than I expected. However, despite not always following the set schedule, I tried really hard to complete everything I set out in the timeline. This project had several different components, spanning the DUNE libraries in C++, the two build systems, Python, database programming and websites, so I had to learn some new things over the summer. For this I am grateful, and I would like to thank the DUNE developers again for giving me this chance.

I’m sorry I can’t attend the DUNE developer meeting this week, even though the developers invited me and offered to cover my expenses. I’m giving a talk on the optics of liquid crystals at a conference in Kranjska Gora that happens to take place on the exact same three days. However, I can say I enjoyed working on this project, and I can only hope that my contributions will help others.

This time I would like to compare the timeline originally set in my DUNE proposal with the actual results. We’re just past the first half of the project, and the actual work already seems to deviate from the initial plan. The most important change, as per the mentors’ suggestion, is that I focused more on the local part, and on ensuring it really works, than on a central remote website.

Here is the original plan, accompanied by my notes.

Week 1 [June 17 – June 23]: During the first week, I plan to finalize a short list of test programs (benchmarks). Additional ones could be added at any time, but I would like to have a basic list before starting other work. I will write a basic script that only compiles and runs these programs, while measuring their time and memory consumption. — Done

Week 2-3 [June 24 – July 7]: I will spend most of these two weeks in Cambridge attending a summer school on the physics of liquid crystals. Any coding time will be severely limited, but I will at least finish the database specification for storing the results, and actually store them. — Done

Week 4-5 [July 8 – July 21]: Add more detailed measurements to the test running script. Big-O measurements with automatically varying problem sizes, multi-thread vs. single-thread comparison, trying different compilers, etc. It may be necessary to write a special utility program for this in something other than shell scripts (Python or C++) for cross-platform compatibility, in which case I will. — Done. The whole thing is written in Python. The program stores problem size, the compiler and its flags, etc.

Week 6 [July 22 – July 28]: Finalize the testing and measuring part of the project. If any time remains, which I think it will, I will start working on a visualization website, and the client-website communication protocol. — Done, but differently. The visualization happens on the client side, with graphs and all.

Mid-term evaluation [July 29]: At this time, as a milestone, I will have completed the part of running the test programs, measuring their performance in several ways, and allowing for automatic comparisons of problem sizes and compilers. — Partly done. Automatic changing of compilers is not supported, as this is very program-specific.

Weeks 9-10 [August 12 – August 25]: Frontend for the dashboard website. If matching the visual style of the Wiki is desired, I will use that, otherwise either Twitter Bootstrap or ZURB Foundation, depending on preference. More important than website style are visualizations with charts, using one of the widely available libraries such as Highcharts. — To do, although using Bootstrap and Dygraphs instead of Highcharts.

Weeks 11-12 [August 26 – September 8]: Add more advanced features to the website. Identifying and removing outliers, finding long-term trends, alerts for sudden performance drops, exporting data, etc. There could still be something missing from the last four weeks, which I will complete here. If not, I will start writing documentation.

Week 13 [September 9 – September 16]: Everything should work by now, so I will write extensive documentation for both running tests and the website. For projects with more than one developer I tend to document code as I write, but there will have to be user documentation as well.

Suggested pencil-down date [September 16] milestone: Everything specified in this project works. It is possible to run all benchmarks with a single command, upload the results to a central website, and look at the results there. By repeating the procedure, one can see the averages and deviations, as well as trends and comparisons between different setups.

In essence, I haven’t started writing a website yet, and instead paid more attention to the client side. Both visualizations and basic statistics, although not mentioned in the proposal, are now done locally on the developers’ machines.
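As an illustration of what the “Big-O measurements with automatically varying problem sizes” from weeks 4-5 can report, the sketch below estimates the empirical exponent k in t ≈ c·n^k with a least-squares fit on a log-log scale. This is an assumed approach, not necessarily what the tool stores, and the function name is hypothetical.

```python
import math


def scaling_exponent(sizes, times):
    """Estimate k in time ~ c * size**k by a least-squares fit of
    log(time) against log(size)."""
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("problem sizes must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx
```

An exponent close to 1 suggests linear scaling and one close to 2 quadratic scaling; real timings are noisy, so only clearly separated values say much.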

Last week was the mid-term review for the GSoC. Because of this, I spent more time polishing and completing existing things than adding new ones. The biggest item was documentation: I added docstrings to all the functions I’ve written since the start of coding. This should hopefully make it easier for everyone else to see what I did, but more importantly to extend it in the future.

This was, however, not all. Dune-perftest now has a couple (= two) example programs, written in C++ using the DUNE libraries. One is mostly empty and basically just measures the time needed for MpiHelper initialization, while the other works with matrices. Such programs will be used for monitoring the performance of DUNE itself. In order to build these C++ programs, I had to use the DUNE build system, based on autotools. I probably spent far more time than I should have on this one. As mainly a KDE developer, I am only used to CMake. I know that DUNE already supports CMake, and if I understand correctly a complete move is planned, but for now I will support both.

There are no new screenshots, because graphically nothing has changed since the last post. The actual generation of templates is somewhat improved, and the page (and graph) now only shows data for the same command. I’m pretty happy with how both Bootstrap and Dygraphs turned out. I will probably redesign the page a little, but the graphs look good enough to me. However, I will add more information, starting with the memory footprint.

Now that the first half is over, I have to start planning ahead. My short-term goals are more automation and some statistics. More automation means you should be able to test multiple programs with one command. A couple more example C++ programs would help a lot for testing this. I will also make it possible to define both compile and run commands and have them associated with the same program. DUNE is mostly a template library, and templates can often cause very long compile times. Once testing is automatic enough, there will be more data, so a need for meaningful statistics will arise. These can be fairly basic; identifying outliers and general trends will be my first priorities.
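For general trends, one simple option is the least-squares slope of run time against run number, normalized by the mean run time. The sketch below is an assumption about how such a statistic could be computed, not a finished feature; the function name is hypothetical.

```python
def relative_trend(times):
    """Least-squares slope of run time against run index, divided by the
    mean run time: roughly the fractional change per run."""
    n = len(times)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(times) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (t - y_mean) for i, t in enumerate(times))
    return (sxy / sxx) / y_mean
```

A value of 0.01 would mean the run time grows by roughly one percent per run; any alert threshold on top of that would be a project-specific choice.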

This week I managed to put together all the separate parts of measuring, storing and visualizing program performance. Now, there is a single Python command that runs an external executable, measuring its time and memory consumption, stores it first in a log file and then in a SQL database, and finally produces an HTML report with a graph. A sample output can be seen on the following screenshot.

First visual results of performance testing

The document formatting is courtesy of Twitter Bootstrap, while the graphs are made with JavaScript using the free library Dygraphs. Of course I plan to add more data to them, not just how long a program takes vs. when it was run. There is also no filtering yet: the two measurements with noticeably higher durations were actually from a slightly different test program.
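The report step can be sketched as a function that embeds the measurements as CSV in a page and hands them to Dygraphs, whose constructor takes an element, the data and an options object. The script path dygraph.min.js and the function name are assumptions; the real tool uses template files and Bootstrap styling, which are left out here.

```python
import html
import json


def render_report(title, rows):
    """Return an HTML page plotting (date, seconds) rows with Dygraphs.

    Each row is a (date string Dygraphs can parse, run time in seconds) pair.
    "dygraph.min.js" is an assumed path to a local copy of the library.
    """
    csv = "Date,Run time (s)\n" + "".join(f"{d},{t}\n" for d, t in rows)
    return f"""<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>{html.escape(title)}</title>
  <script src="dygraph.min.js"></script>
</head>
<body>
  <h1>{html.escape(title)}</h1>
  <div id="graph"></div>
  <script>
    // json.dumps turns the CSV into a valid JavaScript string literal.
    new Dygraph(document.getElementById("graph"), {json.dumps(csv)}, {{}});
  </script>
</body>
</html>
"""
```

Writing the returned string to a file next to the library is enough to open the report in a browser.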

Instructions for running the test are included in the README file in the code repository. Neither Bootstrap nor Dygraphs is included in the repository, and both have to be in a specific location to work. Apart from that, you just have to run “perftest.py” a couple of times (so that you have more than one point on the graph), and you can already see results similar to the ones above.

I started my project of bringing performance measuring to DUNE almost a month ago. Unfortunately I was attending a physics summer school in Cambridge for two weeks, so I didn’t have any results to write about yet. Now I managed to put together the first week of actual work.

So far, it is possible to measure the running time of any external command, as well as some other data like memory consumption and CPU utilization. These measurements, together with information about the host computer, are then stored in a temporary log file. Plain-text log files are not very useful for comparisons and finding trends, so I started on a kind of toolchain. A measurement is first stored in a log file, then a second program reads the contents of the file and stores them in an SQL database, and finally a third script reads the values from the database and outputs an HTML file with tables and charts.

The separation into three separate Python programs/modules is done so that only the first part has to be run locally. A user could thus measure the performance of DUNE and his own programs without installing a bunch of dependencies, which are needed for database operations and visualization.
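One way to keep the optional parts optional is to import their dependencies lazily and skip a step when the import fails. This is a sketch under that assumption; the function names and step labels are hypothetical.

```python
import importlib


def optional_module(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def available_steps():
    """Decide which pipeline steps can run on this machine.

    Measuring needs only the standard library core; storing needs sqlite3,
    which some Python builds lack.
    """
    steps = ["measure"]
    if optional_module("sqlite3") is not None:
        steps.append("store")
    return steps
```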

So far, the first part (measurement) pretty much works. I only say “pretty much” because we will probably decide to add more measured data later. The second part (database) is a little behind, because I want to first decide on the data entry format and at least most of the measured fields. These are details such as whether to store maximum or average RAM usage, or maybe both. Otherwise, interfacing with a SQLite3 database is pretty straightforward and I don’t anticipate any troubles here. I have only just started on the third, visualization part. This one is the most flexible (and the most fun), so it’s hard to tell how long it will take. I created a couple of HTML template files, and am now adding the programmatic part of reading from the DB and displaying the data.

I’m not able to do much work while in Cambridge, but I do manage to blog about it. In the week before I came here, I started working on DUNE as part of this year’s Summer of Code. My mentors and I decided it’s best to use Python for a utility that will measure and report the compilation and running time of DUNE programs. In the future, it will report more than just time, such as memory consumption and I/O, but I’m only starting now.

Basically, I now have a Python script that runs /usr/bin/time on a specified command and extracts the relevant information. The next step, which is only partially done, is to store this data in a log file. A separate script will then read the log files, store the data in a database and display it in an HTML file. However, this will have to wait while I’m here in England.
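Extracting the information from /usr/bin/time becomes simple if its output format is fixed. The sketch below assumes GNU time, whose -f option takes a format string in which %e is the elapsed wall-clock time in seconds and %M the maximum resident set size in kilobytes; the BSD and macOS versions of time do not accept -f. The function names are hypothetical.

```python
import subprocess


def run_with_gnu_time(command):
    """Run `command` under GNU time; return (elapsed seconds, max RSS in KB)."""
    result = subprocess.run(["/usr/bin/time", "-f", "%e %M"] + list(command),
                            stderr=subprocess.PIPE, text=True)
    return parse_time_output(result.stderr)


def parse_time_output(stderr_text):
    """Parse the last non-empty stderr line, written by `time -f "%e %M"`.

    GNU time writes after the command exits, so its line comes last, even
    after the program's own stderr or an "exited with non-zero status" note.
    """
    last = [line for line in stderr_text.splitlines() if line.strip()][-1]
    elapsed, max_rss_kb = last.split()
    return float(elapsed), int(max_rss_kb)
```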

This week and next I’m attending I-CAMP, an international summer school on liquid crystals, taking place in Cambridge, UK. It has a focus on optics, which is very relevant to my current work for my master’s thesis. My thesis work is now mostly complete, and I am presenting a poster on this topic at the summer school as well.

After only two days of lectures, I can already conclude that English food is bad. Breakfast is alright, but fried fish and sandwiches don’t cut it for a pasta lover like me. Additionally, the seats in the lecture hall we used today were very uncomfortable, and spending 8 hours in them made my back hurt. Apart from that, the city and its colleges are very nice, with all the grass and trees. I was surprised at first, but you are allowed to walk on most of the grass areas, which is great for any activity in sunny weather.

My poster is about numerical modeling of light propagation through a fibre filled with liquid crystals. The interesting part is that a single laser pulse splits into 8 regions. Most people won’t understand much, but the pictures are nice. You can take a look here.