According to the German Wikipedia, YaCy turned 10 this year. On December 15th 2003, Michael Christen first publicly mentioned his idea of a peer-to-peer-based search engine in the comments of a news article published at heise.de, a German IT news website.

When I learned about YaCy a few months later, it was still in its very humble beginnings. I think there already was a website, but there was no version control for the source code, no forum, and only a small community of 2 or 3 developers. The crawler was not yet finished, and indexing was done via the proxy, which is still included today. I still remember how excited I was when my index contained the first 1000 documents and how disappointed I was when I lost them because Michael changed the database format once again (Solr was still far, far away). 😉

During the last 10 years I have been laughed at (this has not happened for a long time) and yelled at (only once, at 26c3), but most of the time I had lots of fun learning and getting to know quite a few funny, interesting, and inspiring people. I visited places and events I would not have attended had it not been for YaCy, and I got a nice certificate. 🙂

Even though I have never been the most active contributor, I hope that YaCy will stay a part of my life for at least another 10 years.

On October 2nd YaCy was featured on FLOSS Weekly, a video podcast about Free Libre Open Source Software hosted by Randal Schwartz. The co-host of this episode was Aaron Newcomb. YaCy was represented by Michael Christen.

YaCy 1.6 has been out since yesterday. This release contains mostly enhancements, bug fixes, a special ‘greedy learning mode’ and, most importantly, a feature that many of you requested through anonymous messages to http://sayat.me/YaCy, a link shown on YaCy’s goodbye screen after a shutdown. Many people used this link to report that they would run YaCy permanently if it kept its CPU load and IO low. That is therefore the main feature of version 1.6: if YaCy is running in the background and used for searching, it will try to keep its IO and CPU load low.

Here is what we did in detail:

We examined the IO problem and found that Solr needs regular optimization runs. Without them, IO is very high during DHT transmission (the peer-to-peer sharing of search indexes). With an optimized Solr, this process runs much more efficiently.

We integrated a CPU load sensor that prevents DHT transmission while the CPU load is too high (this affects both sending and receiving of indexes).
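The load check described above can be pictured roughly as follows. This is only a minimal sketch of the idea, not YaCy's actual code: the function names and the threshold of 1.0 per core are assumptions made for illustration, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os


def current_load_per_core() -> float:
    """Return the 1-minute load average divided by the number of cores.

    Unix-only: os.getloadavg() is not available on Windows.
    """
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def dht_transfer_allowed(load_per_core: float,
                         max_load_per_core: float = 1.0) -> bool:
    """Decide whether a DHT transmission may start.

    The threshold is a hypothetical value; the same check would guard
    both sending and receiving of index data.
    """
    return load_per_core <= max_load_per_core


if __name__ == "__main__":
    if dht_transfer_allowed(current_load_per_core()):
        print("load is low, DHT transmission may proceed")
    else:
        print("load is too high, skipping DHT transmission for now")
```

Keeping the decision in a small pure function makes it easy to apply the same rule before sending and before accepting an index transfer.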

The new ‘greedy learning’ mode makes YaCy load the documents linked from the first search results until the local search index holds a total of 1000 documents. This is usually reached within the first three searches, and after that YaCy can benefit from these documents in its own searches.
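As a rough illustration of the greedy learning idea, the sketch below picks linked documents to load until the index would reach the limit. It is a simplified model under stated assumptions, not YaCy's implementation: the function name, the plain URL strings, and the set standing in for the local index are all hypothetical.

```python
from typing import Iterable, List, Set


def greedy_learning_queue(result_links: Iterable[str],
                          indexed: Set[str],
                          limit: int = 1000) -> List[str]:
    """Return the linked URLs to load, in order, until the index is full.

    `indexed` stands in for the documents already in the local index.
    Loading stops as soon as indexed plus queued documents reach `limit`.
    """
    seen = set(indexed)  # copy, so the caller's set is not modified
    queue: List[str] = []
    for url in result_links:
        if len(seen) >= limit:
            break  # the index is considered full, greedy learning ends
        if url not in seen:
            seen.add(url)
            queue.append(url)
    return queue


if __name__ == "__main__":
    links = ["http://a.example/", "http://b.example/", "http://a.example/"]
    print(greedy_learning_queue(links, indexed=set(), limit=1000))
```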

Speed

When you do a peer-to-peer search, the requesting peer must wait for the remote peers to submit their search results. To show the progress of remote searches, the search interface has a progress bar. It’s still there, but when we showed the new YaCy Release 1.4 at the Linuxtage fair in Chemnitz last weekend, people said: “Is there a bug? The progress bar does not show.” No, it’s not a bug; in many cases the bar flashes by so fast that you cannot see it any more.

Quality

Furthermore, the search result quality has increased. This is the result of the progressively deeper Solr integration, not only in local search but also in remote search. The integration is not yet finished, but it already offers a new level of flexibility, speed, and search result relevance.

Here are some more details about the main changes in YaCy 1.4:

This release mainly brings a deeper Solr integration: many more Solr fields are filled, Solr now has multi-core capabilities, and a second core with a webgraph was added (but is deactivated pending further testing).

The opensearch result writer of the integrated Solr now has all the features that YaCy’s original opensearch result servlet had, and the file search interface “yacyinteractive” now uses this new result writer instead of the old one. Searching through that interface is now much faster.

The default search process has undergone a full redesign, and a lot of testing was done to fix problems with the migration to Solr. The normal (local) search is now very fast, especially in portal mode and even in p2p mode.

The ranking was strongly enhanced: there is now support for flexible field boosts, boost functions and boost queries (see servlet /RankingSolr_p.html). All these ranking functions have been made editable, and there is a new configuration servlet for this. Furthermore, several ranking schemas are predefined: one for default internet search, one for sort-by-date, and one for intranet search requests, which is triggered automatically if a site operator is used. Intranet search ranking rates deep links higher than shallow ones, which returns more specific document types. Remote searches are done using the local ranking profile, not the remote profile.
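To make the three kinds of boosts more concrete, here is a small sketch of how they map onto the request parameters of Solr's extended DisMax query parser (`qf` for field boosts, `bf` for boost functions, `bq` for boost queries). It is an illustration only: that YaCy's ranking servlet produces exactly these parameters is an assumption, and the field names and boost values below are hypothetical examples.

```python
from typing import Dict
from urllib.parse import urlencode


def ranking_params(query: str,
                   field_boosts: Dict[str, float],
                   boost_function: str = "",
                   boost_query: str = "") -> Dict[str, str]:
    """Build Solr edismax request parameters for one ranking profile."""
    params = {
        "q": query,
        "defType": "edismax",
        # qf: space-separated list of field^boost pairs
        "qf": " ".join(f"{field}^{boost}"
                       for field, boost in field_boosts.items()),
    }
    if boost_function:
        params["bf"] = boost_function  # additive function boost
    if boost_query:
        params["bq"] = boost_query     # additive query boost
    return params


if __name__ == "__main__":
    # Hypothetical profile: titles count more than body text,
    # and recently modified documents get a small extra boost.
    profile = ranking_params(
        "yacy",
        {"title": 5.0, "text_t": 1.0},
        boost_function="recip(ms(NOW,last_modified),3.16e-11,1,1)",
    )
    print(urlencode(profile))
```

A sort-by-date or intranet profile would then simply be a different set of boosts passed to the same function.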

The selection of target peers has been enhanced: all Robinson peers that have a Solr interface are now searched using that interface rather than the old YaCy interface.

Indexing speed should also improve, as fewer requests are sent to Solr for indexing, index updates are bundled together, and forced commits have been reduced by using the new soft-commit feature of Solr 4.1.
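The combination of bundled updates and soft commits can be sketched as follows. This is only a simplified picture of the technique, not how YaCy talks to its embedded Solr: the helper names, the batch size, and the base URL are hypothetical, while `softCommit=true` is a request parameter that Solr's update handler accepts.

```python
from typing import Iterator, List, Sequence
from urllib.parse import urlencode


def batches(docs: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Split documents into bundles so that each bundle is one request."""
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])


def update_url(base_url: str, soft_commit: bool) -> str:
    """Build the update URL, optionally asking Solr for a soft commit.

    A soft commit makes new documents searchable without forcing a
    full flush to disk, which is cheaper than a hard commit.
    """
    url = f"{base_url}/update"
    if soft_commit:
        url += "?" + urlencode({"softCommit": "true"})
    return url


if __name__ == "__main__":
    docs = [{"id": str(i)} for i in range(5)]
    for bundle in batches(docs, size=2):
        # Hypothetical endpoint; no request is actually sent here.
        print(update_url("http://example.invalid/solr", True), len(bundle))
```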

There have also been fixes for some memory leaks, and overall memory usage should be lower.