Update

November 6, 2012: The project is now working with Lucene.Net 3.0 and .NET Framework 4.0. Includes Visual Studio 2010 solution.

Lucene.Net: Excellent Full-Text Search Engine

Can a full-text search be coded in 37 lines? Well, I am going to cheat a bit and use Lucene.Net for the dirty work. Lucene.Net is a .NET port of the Jakarta Lucene search engine. Here is a quick list of its features:

Warning

Don't take the line count too seriously. I will show you that the core functionality doesn't take more than 37 lines of code, but to make it a real application you will need to spend some more time on it...

Demo Project

We will build a simple demo project that shows how to:

index HTML files found in a specified directory (including subdirectories).

search the index using an ASP.NET application.

highlight the query words in the search results.

But Lucene.Net has more potential. In a real-world application, you would probably want to:

Add the new documents to the index when they appear in the directory. You don't need to rebuild the whole index.

Include other file types. Lucene.Net can index any file type that you can convert to plain text.

Why Not Use Microsoft Indexing Server?

If you are happy with the Indexing Server, no problem. However, Lucene.Net has many advantages:

Lucene.Net is a single assembly of 100% managed code. It has no external dependencies.

You can use it to index any type of data (e-mails, XML, HTML files, etc.) from any source (database, web, etc.). That's because the indexer only needs plain text from you; loading and parsing the source is up to you.

It allows you to specify the attributes ("fields") that should be included in the index. You can search using these fields (e.g. by author, date, keywords).

It is open source.

It is easily extensible.

Line 1: Creating the Index

The following line of code creates a new index stored on disk. directory is a path to the directory where the index will be stored.
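As a rough illustration, creating the index against the Lucene.Net 3.0 API might look like the sketch below. The wrapper class IndexFactory and the method CreateIndex are hypothetical names used only for this example, and the choice of StandardAnalyzer with an unlimited field length is an assumption, not something the text above prescribes.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class IndexFactory
{
    // Hypothetical helper: creates a new, empty index in the given folder.
    public static IndexWriter CreateIndex(string directory)
    {
        var indexDirectory = FSDirectory.Open(new DirectoryInfo(directory));
        // Assumption: StandardAnalyzer is a reasonable default for English text.
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        // create: true starts from scratch and discards any index already stored here.
        return new IndexWriter(indexDirectory, analyzer, true,
                               IndexWriter.MaxFieldLength.UNLIMITED);
    }
}
```

The writer holds a lock on the index folder, so it should be disposed (for example with a using block) once indexing is finished.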

In this example, we create the index from scratch. This is not necessary: you can also open an existing index and add documents to it. You can update an existing document by deleting it and adding a new version.
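The delete-and-add update can be sketched as follows. This is only an illustration under a few assumptions: the index already exists in the given folder, each document carries a path field indexed as a single untokenized term (as in the next section), and IndexUpdater and ReplaceDocument are hypothetical names.

```csharp
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class IndexUpdater
{
    // Hypothetical helper: replaces the indexed copy of one file in an existing index.
    public static void ReplaceDocument(string directory, string path, string text)
    {
        var indexDirectory = FSDirectory.Open(new DirectoryInfo(directory));
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        // create: false opens the existing index instead of overwriting it.
        using (var writer = new IndexWriter(indexDirectory, analyzer, false,
                                            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("text", text, Field.Store.NO, Field.Index.ANALYZED));
            doc.Add(new Field("path", path, Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Delete the old version (matched by its exact path), then add the new one.
            writer.DeleteDocuments(new Term("path", path));
            writer.AddDocument(doc);
        }
    }
}
```

Matching on the exact path only works because that field is not analyzed; an analyzed field would be split into several terms and the delete would miss the document.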

Lines 2-12: Adding Documents

For each HTML document, we will add two fields to the index:

text field that contains the text of the HTML file (with tags stripped). The text itself won't be stored in the index.

path field that contains the file path. It will be indexed and stored in full in the index.
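The two fields above can be sketched like this with the Lucene.Net 3.0 Field constructor. HtmlIndexer, StripTags and AddHtmlDocument are hypothetical names, and the regular-expression tag stripper is a deliberately crude stand-in for whatever HTML parsing you prefer.

```csharp
using System.IO;
using System.Net;
using System.Text.RegularExpressions;
using Lucene.Net.Documents;
using Lucene.Net.Index;

public static class HtmlIndexer
{
    // Crude tag stripper, for illustration only; a real HTML parser is more robust.
    public static string StripTags(string html)
    {
        string withoutTags = Regex.Replace(html, "<[^>]*>", " ");
        // Turn entities such as &amp; back into plain characters.
        return WebUtility.HtmlDecode(withoutTags);
    }

    // Hypothetical helper: adds one HTML file to an already opened index.
    public static void AddHtmlDocument(IndexWriter writer, string path)
    {
        string text = StripTags(File.ReadAllText(path));
        var doc = new Document();
        // text: tokenized and searchable, but not stored in the index.
        doc.Add(new Field("text", text, Field.Store.NO, Field.Index.ANALYZED));
        // path: indexed as a single term and stored in full, so it can be read back from a hit.
        doc.Add(new Field("path", path, Field.Store.YES, Field.Index.NOT_ANALYZED));
        writer.AddDocument(doc);
    }
}
```

Not storing the text keeps the index small; the trade-off is that the original file has to be read again whenever you want to display or highlight its content.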