Sunday, March 28, 2010

darcs hacking sprint 4 report

Updated 2010-03-29 with more photos (thanks, David Anderson!), a small correction and a note about the SFC

The Fourth Darcs Hacking Sprint took place last weekend (19 to 21 March) as part of the Zurich Haskell Hackathon. We had a very productive sprint: we wrote a bit of code, polished off many key discussions, had a little beer and a lot of fun.

Overview

In this sprint, we worked on finishing some performance work for the upcoming Darcs 2.5 release this summer (hashed storage, patch index, global caches, inventory hashing); planning our work for the Darcs 2.6 release next year (smart servers, cache cleanup, darcs rebase) and working with new users of the Darcs library.

Issues resolved

issue643 darcs send -o output - Guillaume Hoffmann

issue1473 annotate command line - Stefan Wehr

issue1456 portable darcs dist - Guillaume Hoffmann

New Darcs Hackers

We're always happy to work with new Darcs developers. At this sprint, we were joined by four new contributors.

Guillaume Hoffmann

Guillaume has been writing our Darcs Weekly News articles for a year now. Over the weekend he got his first taste of Darcs hacking, knocking out three ProbablyEasy bugs (darcs dist internals, darcs send -o UI, darcs apply with gzipped patch bundles). Guillaume reports that he can see himself doing more of this in the future!

Steven Keuchel

Steven worked on a new feature to display the file content hashes associated with any patch. This makes it easier for third party tools to inspect the patch files behind Darcs.

Stefan Wehr and David Leuschner

Stefan and David mostly worked on the Darcs Patch Manager, but to warm up, they tackled a couple of ProbablyEasy bugs, particularly a bug in darcs annotate that was affecting Redmine.

Hacking continued...

Darcs hackers at work (Saturday). Photo taken from David Anderson's Picasa site.

Bugfix: Darcs on Windows shares

Salvatore tracked down the regression in 2.4 that stopped Darcs from working on Windows shares.

Performance: Fast darcs annotate

Benedikt Schmidt continued his work on the patch index (formerly known as the filecache). The patch index keeps track of which patches affect which files. This index will bring a big boost to darcs annotate performance, particularly for files which are affected by a relatively small number of patches.
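To give a feel for why such an index helps, here is a minimal sketch in Python. It is not Benedikt's implementation: the function names, the in-memory dictionary and the patch identifiers are all assumptions made for illustration. The idea is only that annotate can look up the handful of patches touching a file instead of scanning the whole history.

```python
from collections import defaultdict


def build_patch_index(patches):
    """Map each file path to the ordered list of patch ids that touch it.

    `patches` is a hypothetical list of (patch_id, touched_files) pairs,
    oldest first. The real patch index is stored on disk and uses
    permanent file identifiers; this sketch keeps everything in memory.
    """
    index = defaultdict(list)
    for patch_id, touched_files in patches:
        for path in touched_files:
            index[path].append(patch_id)
    return dict(index)


def patches_for(index, path):
    """Return the patches that annotate would need to read for `path`."""
    return index.get(path, [])


# Hypothetical history: four patches, only one of which touches src/Util.hs.
history = [
    ("p1", ["README", "src/Main.hs"]),
    ("p2", ["src/Main.hs"]),
    ("p3", ["README"]),
    ("p4", ["src/Util.hs"]),
]
index = build_patch_index(history)
relevant = patches_for(index, "src/Util.hs")
```

With the index in hand, annotating src/Util.hs in this toy history means reading one patch rather than four.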

Performance: Global cache

Luca continued his work on breaking up the global cache ($HOME/.darcs/cache) into buckets for faster access. Working with Reinier and Petr, Luca has developed an approach to migrating from old style caches to the new style bucketed ones. He has also improved the implementation to use hard links, to avoid disk space doubling and to preserve backwards compatibility with prior versions of Darcs.
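As a rough illustration of the bucketing idea, the Python sketch below hard-links every file in a flat cache directory into a subdirectory named after the first two characters of its file name. The two-character bucket width, the function names and the directory layout are assumptions for this example, not the layout Darcs actually uses. Because the new entries are hard links, the old flat names keep working and no file contents are duplicated.

```python
import os
import tempfile


def bucket_path(cache_root, file_hash, width=2):
    """Path of a cache entry inside its bucket (hypothetical layout)."""
    return os.path.join(cache_root, file_hash[:width], file_hash)


def migrate_to_buckets(cache_root, width=2):
    """Hard-link every flat cache file into a bucket subdirectory.

    The original flat files are left in place, so older tools that expect
    the flat layout still find them, and no data is copied.
    """
    linked = []
    for name in sorted(os.listdir(cache_root)):
        old = os.path.join(cache_root, name)
        if not os.path.isfile(old):
            continue  # skip bucket directories created earlier
        new = bucket_path(cache_root, name, width)
        os.makedirs(os.path.dirname(new), exist_ok=True)
        if not os.path.exists(new):
            os.link(old, new)  # hard link: same data, second name
        linked.append(new)
    return linked


# Demonstration on a throwaway directory with made-up hash names.
with tempfile.TemporaryDirectory() as root:
    for h in ("00aa11", "00bb22", "ffcc33"):
        with open(os.path.join(root, h), "w") as f:
            f.write("contents of " + h)
    migrated = migrate_to_buckets(root)
    same_file = os.path.samefile(os.path.join(root, "00aa11"),
                                 bucket_path(root, "00aa11"))
    buckets = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
```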

Windows installer

Salvatore put together a nice Windows installer using the bamse package. It looks like we will be able to use this for the planned Darcs 2.5 release this summer. This work will also open the door to nicer integration with Windows tools, for example, using a bundled Tortoise SSH for a better experience working with SSH passphrases.

Interactive cherry picking

Florent improved the quality of the Darcs cherry picking code, making it easier to fine tune our user interface and some day support graphical interfaces via the Darcs library. Witnessed list zippers for the win?

Interactive diff

Florent also started work on adding Darcs's interactive cherry picking to darcs diff, making it possible to choose a set of patches to view as a diff.

Performance: Hashed storage completion

Darcs has a representation of file and directory trees called slurpies. Petr polished off his work to replace the slurpies with his more efficient, general purpose hashed-storage library. Slurpies are going away, and Darcs will be faster for it. He and Ganesh also discussed how to gracefully transition from repositories created before the hashed-storage refactor.

Performance: Using tags when writing patches

Petr ported work by David Roundy to solve a scalability regression in hashed repositories. For darcs commands that write out patches, we had a naive hashing operation that did not account for the fact that patches behind tags cannot be modified. Darcs was unnecessarily traversing the entire sequence of patches (i.e. O(n) time) when it could easily have been just traversing the sequence since the last tag.
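The saving can be sketched in a few lines of Python. This is only an illustration of the principle, not the Darcs code: the dictionary representation of a patch and the function name are assumptions. Scanning backwards from the newest patch and stopping at the first tag means the work is proportional to the number of patches since that tag, not to the length of the whole history.

```python
def patches_since_last_tag(patches):
    """Return the patches recorded after the most recent tag.

    `patches` is a hypothetical list of dicts, oldest first, each with a
    "name" and an "is_tag" flag. Everything up to and including the last
    tag is treated as immutable and left alone.
    """
    for i in range(len(patches) - 1, -1, -1):
        if patches[i]["is_tag"]:
            return patches[i + 1:]
    return list(patches)  # no tag yet: the whole sequence is in play


# Made-up history with one tag in the middle.
history = [
    {"name": "initial import", "is_tag": False},
    {"name": "fix typo", "is_tag": False},
    {"name": "TAG 1.0", "is_tag": True},
    {"name": "add feature", "is_tag": False},
    {"name": "refactor", "is_tag": False},
]
to_rehash = [p["name"] for p in patches_since_last_tag(history)]
```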

UTF-8 metadata

Reinier continued to improve the encoding of Darcs patch metadata. Darcs is completely agnostic with respect to the encoding of your files. Unfortunately, this agnosticism extends to patch metadata (patch name, patch author), making it difficult for people to collaborate across different locales. To address this problem, Reinier has been working to make Darcs store its patch metadata in a single encoding (UTF-8) while gracefully supporting older patches (with metadata in potentially any encoding).
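One common way to read a mix of new and old metadata is to try UTF-8 first and fall back to a legacy encoding when the bytes are not valid UTF-8. The Python sketch below shows that pattern only; it is not a description of Reinier's approach, and the function name and the choice of Latin-1 as the fallback are assumptions.

```python
def decode_metadata(raw, legacy_encoding="latin-1"):
    """Decode patch metadata bytes, preferring UTF-8.

    New-style metadata is assumed to be UTF-8. If decoding fails, the
    bytes are treated as coming from an older patch and decoded with a
    guessed legacy encoding (an assumption; the real encoding of an old
    patch is unknown in general).
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(legacy_encoding, errors="replace")


# The same author name as written by a UTF-8 locale and a Latin-1 locale.
new_style = "Zürich hacker".encode("utf-8")
old_style = "Zürich hacker".encode("latin-1")
decoded_new = decode_metadata(new_style)
decoded_old = decode_metadata(old_style)
```

Both byte strings decode to the same text here, which is the property collaborators in different locales care about.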

Discussions

The rebase discussion. Also from David's site.

Release process

The Darcs 2.4 release was quite a tricky one to navigate. We found that bugs were only being flushed out at release candidate time and sometimes after the release proper.

We would like to encourage more people to try out Darcs work in progress and give us feedback early in the release process. After chatting about this, Reinier (with Ganesh, Eric and Petr) decided that as Release Manager, he would put out a Darcs alpha every 4 weeks.

In the future we may investigate automatic nightly builds via the buildbot and a platform support policy such as the one used by Tahoe.

Darcs patch index (fast darcs annotate)

Benedikt updated us on the recent status of his ongoing patch index work (formerly known as the filecache). We discussed the things that make the patch index convincing (permanent, repo-local, unique identifiers for files), the interaction between the patch index and the type witnesses, and also ways of tuning the patch index performance and keeping it small.

We're looking forward to sharing the new patch index optimisation with you in upcoming releases. Darcs annotate may become a lot more useful in the next couple of releases!

Readable darcs annotate

Fast darcs annotate won't be useful if nobody can read it. Benedikt and Eric worked on designing a better output format for darcs annotate. Taking a page from git blame, there will be one line per source file line, with columns for patch identifier, author name, date and finally the line. One of the design questions was how we should best refer to darcs patches, the current best candidate being a prefix of the darcs patch metadata hash.
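To make the proposed layout concrete, here is a small Python sketch that formats one output line per source line with the four columns described above. The exact column widths, the separator and the eight-character hash prefix are assumptions for illustration; the final format was still under discussion. The authors and hashes are made up.

```python
def format_annotate(lines, hash_width=8):
    """Format annotate output as: hash prefix, author, date, source line.

    `lines` is a hypothetical list of (patch_hash, author, date, text)
    tuples, one per line of the annotated file. Author names are padded
    so that the date and source columns line up.
    """
    author_width = max((len(author) for _, author, _, _ in lines), default=0)
    out = []
    for patch_hash, author, date, text in lines:
        out.append(
            f"{patch_hash[:hash_width]} {author.ljust(author_width)} {date} | {text}"
        )
    return out


# Made-up annotations for a two-line file.
annotated = [
    ("0123456789abcdef", "alice", "2010-03-20", "main :: IO ()"),
    ("fedcba9876543210", "bob", "2010-03-21", "main = return ()"),
]
report = format_annotate(annotated)
```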

Fast darcs over networks

Darcs get over networks is slow, painfully slow. Petr has suggested two priorities for improving the performance of network operations. The first would be to introduce a darcs optimize --http feature which would optimise the Darcs repository for fetching over a network (for example, by creating a "snapshot" of the pristine cache to be fetched in one go). The second priority would be to develop a smart server that would provide darcs clients with only the files they need and in the optimal number of chunks. The two ideas combined would make an excellent Google Summer of Code project.

Darcs rebase

Prior to the sprint, Ganesh was working on a darcs rebase feature. Rebase will help Darcs users work with long term branches, and other cases where patch commutation by itself is not enough. At the sprint, Ganesh explained his work to everyone interested. Together we settled on a rough plan for the user interface. It looks like our new rebase command will offer a typically Darcs-ish twist: interactive cherry picking.

Darcs library

Ganesh and Florent talked with three teams building software in the Darcs ecosystem (DPM: Stefan Wehr and David Leuschner, Mac Darcs record GUI: Benedikt Huber and David Markvica, DarcsDen: Alex Suraci). There was a surprising degree of commonality.

The conversations have given us a much stronger sense of direction with the Darcs library. In particular, Ganesh is convinced that we should commit to our use of witnesses - at the very least getting them completely finished so we can run with them, probably turning them on by default, and quite possibly dropping the non-witnesses builds.

Default switches

We held a quick roundtable discussion to settle some decisions on Darcs default switches that have been hanging in the air. Our decisions for Darcs 2.5:

--no-set-scripts-executable [unchanged]

pull/push/send --no-set-default

send --edit-description

record --no-test

check --no-test

Performance presentations

Petr and Benedikt gave lightning talks, showing some of our recent performance work to the Haskell community. Some exciting numbers from Benedikt's work (notes) include a 6 second darcs annotate on a file in the GHC repository (previously this did not complete within half an hour).

Google Summer of Code

We discussed our priorities for this year's Google Summer of Code. We have decided that we would focus our attention on performance issues. If we had two GSoC students this year, we would be mainly interested in dividing them between

network performance

developing a smart server for much faster darcs get and pull over a network

We also discussed ways to make the best use of our students' time. The Darcs team has participated in GSoC twice and learned a lot from the experience. This year we would like to see if we could publish some clear guidelines both on what we expect from GSoC students and what they can expect from us. Watch the mailing list for more discussion on this topic.

Budding Ecosystem

We were pleasantly surprised to find ourselves with users of the (still unstable) Darcs API. These new arrivals give us the feeling that the collection of related software is coalescing into a new Darcs ecosystem.

Darcs Patch Manager

David Leuschner and Stefan Wehr worked on an exciting new patch management program for project maintainers. The Darcs Patch Manager (DPM) offers a new way for repository maintainers to keep track of incoming Darcs patches, including their amendments and dependencies.

Towards the end of the hackathon, Stefan gave a nice short demo of DPM in action and deftly avoided the wrath of the demo Gods.

MacOS X GUI for Darcs record

Benedikt Huber and David Markvica started work on a graphical interface to the Darcs record command. One key twist is that they make use of the Darcs API to get the kind of dependency-tracking interactive goodness that Darcs offers. Benedikt and David report that they have spent most of the hackathon getting to grips with the library. Darcs type witnesses were very helpful for avoiding errors, but they also impose a steep learning curve.

Darcsden

Alex Suraci and Simon Michael made several improvements to Darcsden, an open source hosting solution (akin to Github and Patch-tag). Some recent changes were Atom feeds, the ability to view forks of your repository and cherry-pick patches from them (work in progress). Darcsden also makes use of the Darcs API.

Darcsden fork viewer

Want to host Darcs Hacking Sprint 2010-10?

The Darcs Team would like to hold hacking sprints twice a year. These sprints are an important occasion for us to hold design discussions, hack some code, train new Darcs hackers and generally bond as a team.

Do you think you can help? Please get in touch with me if you think you may be able to host a group of around 20 Darcs hackers one of these October or November weekends.

Thanks!

Getting over 75 Haskell hackers into Zürich and having them up and running on arrival (Swiss power plugs notwithstanding) was no easy task! We'd like to thank Johan Tibell, David Anderson and the rest of the Google Crew for their hard work organising this hackathon.

Thanks also to the generous donors who chipped into our 2010 Darcs Travel Fund. We'll be looking forward to using the leftover cash for the upcoming 5th Darcs Hacking Sprint in October or November.

Speaking of donors, we'd particularly like to thank the Software Freedom Conservancy for providing us with the infrastructure (both legal and technical) for accepting donations and holding assets such as the darcs.net domain. Meta projects like the SFC are crucial for the success of volunteer-driven open source projects such as Darcs.

Finally here are some words from happy Darcs hackers:

The sprint was a wonderful social occasion, and it was great meeting most of the Darcs hackers, and also seeing other Haskell hackers interested in working in the Darcs ecosystem. I especially enjoyed teaching them how to use our API. -- Florent

The atmosphere was wonderful and I consider the sprint to have been very productive overall. -- Petr

This is coolest thing I ever did -- Luca

See you in half a year!

Participants

We had ten Darcs hackers in Zürich along with four Haskellers using the Darcs API to do awesome things (plus two more on IRC).

7 comments:

Just to be sure, you know I uploaded the rest of the Zurihac photos at http://picasaweb.google.com/david.jc.anderson/Zurihac# , and that there are a couple of much more direct shots of the darcs table, right? :-)

Ivan, perhaps Trent, our documentation manager can be persuaded to go. A large chunk of the Darcs team is based in Europe, though. While our fundraising was successful, maybe we should push our luck :-D