We had four particularly strong proposals for the "Learning to Rank"
project idea, so we decided to create a second project adding more algorithms,
to complement the project sketched out in
our ideas list.

Sorry to those we weren't able to select this year - we had to make some
difficult decisions during the selection process, and we really appreciate the
time you spent writing your proposal, working on patches, and on the rest of
the application process. We'd encourage you to remain involved with Xapian,
and to apply to us again next year if you're still eligible for GSoC.

If any applicants would like some more specific feedback on their applications,
please just come and ask us.

Analysis of Xapian GSoC 2014 Applications

As I said in my earlier post,
we received 31 proposals from students (ignoring 2 duplicates withdrawn by students).
On closer inspection, we spotted another duplicate, so discounting that, here
is how the remaining 30 proposals break down by project idea:

10 - Clustering of search results

5 - Learning to Rank

5 - Weighting Schemes

2 - Postlist encodings

2 - Improve Java bindings (one with PHP bindings too)

1 - Gmane search improvements

1 - Testsuite Improvements

1 - Performance/Relevance testing and optimization of DFR

1 - Social Media Product Analyzer

1 - Web application for fast image search

1 - Improving Arabic Support + Python Binding Improvements

In the above list, italics indicate ideas or parts of ideas which
were suggested by the student, rather than coming from our
ideas list.

As in 2012, the most popular ideas from our suggested ideas list
are those with the closest connections to Information Retrieval theory. I
think the clustering idea also seems very accessible, which is why it's been so
popular (it was only added to the list shortly before student applications
opened, as we'd already seen signs that "Learning to Rank" and "Weighting
Schemes" were likely to be very popular).

There's also a wider spread in quality for the clustering proposals
(perhaps also due to the accessibility of that idea) so don't despair
if you're a student who applied for a clustering project.

And generally, if we have more than one great proposal based on the
same project idea, we may accept more than one of them - we don't want to
duplicate effort, but it's often possible to adjust the scopes to produce
projects which don't overlap.

Xapian GSoC Applications for 2014

Student applications for GSoC closed a few hours ago,
and here are some initial stats on the proposals we received for
Xapian (for comparison, see my blog posts for
2011 and
2012).

We received a total of 31 applications this year - here's a graph of total
applications received against time:

If you're an admin or a mentor, you can produce a similar graph for your
own org(s) - just download this OpenDocument spreadsheet and follow the
instructions inside.

Of the 31, 18 were submitted in the last 12 hours, with the latest submission
a rather brave 99 seconds before the deadline.

The total number is lower than the 42 and 41 we received in previous years,
but in a quick skim through I didn't see anything we'd immediately discount as
a spam proposal and mark as invalid. So that 31 is more comparable with the
numbers after removing spam from previous years (which were 33 and 30).

I suspect the improved quality and the even more marked spike as the deadline
nears may be due to the new requirement that students upload proof that they
are enrolled before they can submit a proposal.

Xapian GSoC 2012 Projects

At the end of the previous episode,
you may remember our gallant heroes had a pile of 30 proposals to review.
We soon spotted one more to mark as invalid (just a paste with our ideas list
plus some biographical details), and another got withdrawn by the student
without explanation (but was low quality anyway), so that left us with 28.

We had six volunteers for mentoring, and in the initial allocation we received
five student slots from Google, but we asked nicely if we could have an extra
one, and were lucky enough to get it. Last year we had four students, so that's
a 50% increase.

Here are those 28, broken down by project idea:

8 - Weighting Schemes

6 - Learning to Rank

3 - Dynamic Snippets

2 - Lucene Backend

2 - QueryParser improvements

1 - Erlang Bindings

1 - Improve C# and Java bindings

1 - Improve PHP Bindings

1 - Improve Python Bindings

1 - Improving Japanese Support

1 - Node.js Bindings

1 - Postlist encodings

I find it interesting that the most popular three ideas have closer connections
to Information Retrieval theory than most - probably these appeal to
students who have taken IR courses and already have an interest in, and some
knowledge of, the project area. I think we should aim to get more ideas like
these on the list in future years.

It's worth noting that in several cases students had taken an idea in
sufficiently different directions that there wasn't much overlap, so we didn't
just pick the best proposal for each project idea to narrow things down. Also,
the proposal isn't the only factor - we like to see applicants work on patches,
and to interact with us on IRC and/or email. But in the end, as it happens, we
ended up with proposals which were all from different ideas - here are those we
selected:

My congratulations to the lucky six, and my commiserations to those we weren't
able to select. It wasn't an easy selection to make, and we truly appreciate
the time you spent writing your proposal, working on patches, and on the rest
of the application process. We'd encourage you to remain involved with Xapian,
and to apply to us again next year if you're still eligible for GSoC.

Xapian GSoC Applications for 2012

Student applications for GSoC closed a day
or so ago, and we've done an initial pass through Xapian's applications, so I thought I should post another
overview, similar to last year's.

We received a total of 41 applications this year (very close to last year's
total of 42). Here's a graph of applications against time:

If you're an admin or a mentor, you can produce a similar graph for your
own org(s) - just download this OpenDocument spreadsheet and follow the
instructions inside.

That total of 41 includes one duplicate and one application withdrawn by the
student (we had one of each last year too). I've also gone through and marked
nine spam proposals as invalid (similar to the seven we had last year). Spam
proposals are things like proposals with no connection at all to Xapian, and
proposals which are just a title and/or paste from our ideas list with a
generic biography.

So that leaves us with 30 proposals (compared to 33 last year). It's hard
to really measure, but my feeling is that the average quality is higher
than last year (and it was already pretty impressive last year).

Xapian 1.3 Branched

(Actually, we branched six weeks ago, but I've not got around to
writing about it until now.)

The development branch approach we used for 1.1.x development releases
leading to a stable 1.2.0 release seemed to work pretty well, so we're
adopting that again.

The main problem last time was that it took a long time to actually stabilise
1.1.x because we kept slipping more changes in. For 1.3.x, we need to be more
disciplined and changes should be developed on a branch and not merged
prematurely. We now have solid git mirroring,
so developing on a branch is a more pleasant experience than before. We also
need to be brutal sooner. It's better for everyone to (say) achieve two
release series in two years than have one release series take two years.

When I was in the UK back in May, Richard and I sat down and hashed out
a list of goals for a 1.4 release series. This is what we came up with
(the order is just how they came to mind, so isn't really significant):