Abstract

Solr (pronounced "solar") is an open source search server based on the Lucene Java library, with a web-service-like API. This means documents can be indexed by posting XML over HTTP, queried with HTTP GET, and results returned as XML. Because it is built on the Lucene search engine library, Solr provides advanced full-text search capabilities, and it can scale by connecting to other Solr search servers. Solr also has an administrative web interface. The purpose of this project is to provide full-text search for the Apache email archives using the Solr search server.
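To make "index documents via XML/HTTP" concrete, the sketch below builds an XML update message of the form `<add><doc><field name="...">...</field></doc></add>` and POSTs it to an update handler. It is a minimal sketch under stated assumptions: the class name `SolrUpdateSketch`, the update URL (for example `http://localhost:8983/solr/update`) and the field names `id` and `subject` are all hypothetical, and the real field names depend on the schema we define for the archives.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrUpdateSketch {

    // Escape the characters that are special in XML text and attribute values.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build an <add><doc>...</doc></add> message holding one document.
    static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue())).append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    // POST an XML message to the update handler and return the HTTP status code.
    // The updateUrl value is an assumption about where Solr is deployed.
    static int post(String updateUrl, String xml) throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(updateUrl).toURL().openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        OutputStream out = conn.getOutputStream();
        try {
            out.write(xml.getBytes("UTF-8"));
        } finally {
            out.close();
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<String, String>();
        doc.put("id", "msg-0001");
        doc.put("subject", "Re: <patch> for Solr & Lucene");
        // Only builds and prints the message; call post(...) against a running Solr to send it.
        System.out.println(addMessage(doc));
    }
}
```

Running `main` prints `<add><doc><field name="id">msg-0001</field><field name="subject">Re: &lt;patch&gt; for Solr &amp; Lucene</field></doc></add>`. Escaping matters here because mail subjects and bodies routinely contain `<`, `>` and `&`.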

As Yonik said, "Solr may be in the incubator, but it's already relatively stable and used in production systems", so starting development with Solr now is not a problem. Thanks, Yonik.

Project Description

The final goal of this project is to provide a search interface containing a form with text boxes for sender, subject and content, and perhaps pull-down menus for mailing list and date. Results will be sortable by any field, and the matching words will be highlighted in different colors. This project could also provide a useful tool for integrating Solr search servers with other mailing list archives like Apache's.
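The search form described above maps naturally onto a Solr query URL: each filled-in text box becomes a required clause, the chosen sort field becomes a sort parameter, and highlighting is requested with `hl=true` plus `hl.fl`. The sketch below is illustrative only: the field names `from`, `subject`, `body`, `list` and `date` are hypothetical schema fields, every value is quoted as a phrase for simplicity, and it assumes a Solr version that accepts a separate `sort` parameter (older releases instead appended the sort specification to `q` after a semicolon).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

public class ArchiveQuerySketch {

    // Quote a user value as a Lucene phrase, escaping backslashes and double quotes.
    static String phrase(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Add a required (+) clause for one form field if the user filled it in.
    static void addClause(List<String> clauses, String field, String value) {
        if (value != null && value.trim().length() > 0) {
            clauses.add("+" + field + ":" + phrase(value.trim()));
        }
    }

    // Combine the form fields into one Lucene query string.
    // Field names (from, subject, body, list) are hypothetical schema fields.
    static String buildQuery(String sender, String subject, String content, String list) {
        List<String> clauses = new ArrayList<String>();
        addClause(clauses, "from", sender);
        addClause(clauses, "subject", subject);
        addClause(clauses, "body", content);
        addClause(clauses, "list", list);
        StringBuilder sb = new StringBuilder();
        for (String clause : clauses) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(clause);
        }
        return sb.toString();
    }

    // Build the GET URL: q is the query, sort orders results by one field,
    // and hl / hl.fl ask Solr to highlight matches in the listed fields.
    static String buildUrl(String selectUrl, String q, String sortField, boolean ascending)
            throws UnsupportedEncodingException {
        return selectUrl
            + "?q=" + URLEncoder.encode(q, "UTF-8")
            + "&sort=" + URLEncoder.encode(sortField + (ascending ? " asc" : " desc"), "UTF-8")
            + "&hl=true&hl.fl=" + URLEncoder.encode("subject,body", "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String q = buildQuery("yonik", null, "faceting", "solr-user");
        // The select URL is an assumption about the deployment.
        System.out.println(buildUrl("http://localhost:8983/solr/select", q, "date", false));
    }
}
```

Empty form fields are simply skipped, so the same method serves every combination of text boxes and pull-downs.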

Planning

Java is my main programming language, so I will use Java (1.5) to develop this project. I have already developed a project with the Lucene library, and since Solr is built on the Lucene engine, using Solr to search Apache's mail archives is a good choice.

Commit is an expensive operation, so we should only issue a commit after adding a sufficiently large batch of documents.
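One way to follow this rule is to send each `<add>` message as it is produced but send `<commit/>` only once per batch, plus a final commit at the end of an import run. The sketch below is a minimal illustration, assuming a hypothetical `UpdateSender` interface that stands in for the HTTP POST to Solr's update handler, and an arbitrary batch size of 1000 that would need tuning against the real archives.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingIndexer {

    // Hypothetical transport: in practice this would POST the XML to Solr's update handler.
    interface UpdateSender {
        void send(String xml);
    }

    private final UpdateSender sender;
    private final int batchSize;
    private int pending = 0;

    BatchingIndexer(UpdateSender sender, int batchSize) {
        this.sender = sender;
        this.batchSize = batchSize;
    }

    // Send one <add> message; commit only after batchSize documents have accumulated.
    void add(String addXml) {
        sender.send(addXml);
        pending++;
        if (pending >= batchSize) {
            commit();
        }
    }

    // Send <commit/> if anything is uncommitted; call once more at the end of an import run.
    void commit() {
        if (pending > 0) {
            sender.send("<commit/>");
            pending = 0;
        }
    }

    public static void main(String[] args) {
        final List<String> log = new ArrayList<String>();
        BatchingIndexer indexer = new BatchingIndexer(new UpdateSender() {
            public void send(String xml) { log.add(xml); }
        }, 1000);
        for (int i = 0; i < 2500; i++) {
            indexer.add("<add><doc><field name=\"id\">" + i + "</field></doc></add>");
        }
        indexer.commit(); // flush the final partial batch
        int commits = 0;
        for (String m : log) {
            if (m.equals("<commit/>")) commits++;
        }
        System.out.println(commits + " commits for 2500 documents");
    }
}
```

This prints `3 commits for 2500 documents` (after documents 1000 and 2000, then the final flush), instead of the 2500 commits that committing after every message would cost.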

Step 3

When searching by sending an HTTP GET request to the Solr server, we get the results in XML format. The XML results can then be converted to HTML using XSLT.
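The XML-to-HTML step can be done with the standard `javax.xml.transform` API that ships with the JDK. The sketch below applies a tiny stylesheet that renders one `<li>` per result document. It assumes a simplified response shape (`response/result/doc` with `<str name="subject">` children, whereas a real Solr response also carries a `responseHeader` section that this stylesheet ignores), and the class name `ResultToHtml` and the `subject` field are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ResultToHtml {

    // A tiny stylesheet: one <li> per <doc>, showing its "subject" field.
    // The field name is hypothetical and depends on the Solr schema.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/\">"
      + "<ul><xsl:for-each select=\"response/result/doc\">"
      + "<li><xsl:value-of select=\"str[@name='subject']\"/></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet to a Solr XML response and return the resulting HTML.
    static String transform(String solrXml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(solrXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<response><result numFound=\"1\" start=\"0\">"
            + "<doc><str name=\"subject\">Solr &amp; Lucene</str></doc>"
            + "</result></response>";
        System.out.println(transform(xml));
    }
}
```

Keeping the stylesheet separate from the Java code means the page layout can change without recompiling; in a real deployment it would live in its own `.xsl` file rather than in a string constant.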

Step 4

Create a web interface that provides full-text search over Apache's email archives at http://mail-archives.apache.org. Apache's email archives use Ajax techniques to provide quick rendering, so adding Ajax to this web interface seems like a must-have. Use a highlighting library to provide the highlighting feature.
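The "different colors per matching word" requirement can be met by wrapping each query term in a `<span>` whose CSS class encodes which term matched. The sketch below is a swapped-in, regex-based stand-in, not the highlighting library the step refers to (Lucene's highlighter or Solr's built-in `hl` highlighting would be the more likely choice). It assumes the text is plain text without HTML entities, and the class name `TermHighlighter` and the `hl0`, `hl1`, ... class names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermHighlighter {

    // Wrap each whole-word, case-insensitive occurrence of a query term in
    // <span class="hlN">, where N is the term's index, so CSS can color each term.
    static String highlight(String text, List<String> terms) {
        if (terms.isEmpty()) {
            return text;
        }
        // One capturing group per term, so the matching group tells us which term hit.
        StringBuilder alternation = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) alternation.append('|');
            alternation.append('(').append(Pattern.quote(terms.get(i))).append(')');
        }
        Pattern p = Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int group = 1;
            while (m.group(group) == null) {
                group++; // exactly one alternative matched; find which one
            }
            m.appendReplacement(out, Matcher.quoteReplacement(
                "<span class=\"hl" + (group - 1) + "\">" + m.group() + "</span>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Solr uses Lucene; solr is fast",
            Arrays.asList("solr", "lucene")));
    }
}
```

A stylesheet rule such as `.hl0 { background: yellow }` and `.hl1 { background: cyan }` then gives each term its own color; the original casing of the matched word is preserved.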

Step 5

This is the final step: testing the whole project and fixing the bugs found along the way.

Schedule

Step 1: to be completed by June 14, 2006

Step 2: to be completed by June 29, 2006

Step 3: to be completed by July 15, 2006

Step 4: to be completed by August 7, 2006

Step 5: to be completed by August 21, 2006

Bio

I am a student in the Computer Science Department of Moscow State University, Russia. I am interested in integrating the Lucene search engine into complex web applications to provide fast full-text search.

Development Methodology

UML Modeling

Building the project with Ant (a must-have)

Version control with CVS or SVN

Generating documentation with the javadoc tool

Testing with JUnit

The best individual to do this project

I am using the Lucene library in my project JSMBSearch (https://jsmbsearch.dev.java.net), so I already have some experience with Lucene. Java is my main programming language.

I like Apache's projects and want to contribute to them, so I think I am the best individual to do this project. This summer is a good chance for me!