Introduction

Have you ever heard of Lucene.Net? If not, let me introduce it briefly.

Lucene.Net is a line-by-line port of the popular Apache Lucene, a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially one where you want something close to Google-style search results. And not just search results, but very fast search results, or maybe just insanely fast search results, only in your app and on your terms!

So, while it is technically possible, though somewhat challenging, to integrate the original Apache Lucene into your .NET application and get insanely fast search, it will take quite a while and will probably force you to cut corners here and there, making your site overly complex and error-prone. So, unless you absolutely have to have the fastest search on the planet (and beat Google along the way), you shouldn't go this way: for the majority of .NET applications Lucene.Net will be madly fast anyway.

The main purpose of Lucene.Net is to be easy to integrate into any .NET application while providing most of the speed and flexibility of the original Java-based library. And it does that pretty well, as you will see in this article. Even the original Apache Lucene documentation applies to Lucene.Net 99% of the time!

First of all, I found Lucene to be faster than an SQL query for text search. It stays fast when searching for a word or a phrase, no matter how many words are in your search. For example, say you search for a sentence of five different words in some text or description and want the results ordered by relevance, the same way major web search engines do. How would you do that in SQL? Or in .NET? Your code might get very complex, and your search queries long and complicated, which can add up to a slow, turtle-like kind of search...

The good news is that Lucene.Net solves most of those problems for you! No need to write complicated search logic anymore! All you need to do is integrate it correctly into your application, and that is what this article is about!

So, if you are interested in trying Lucene.Net for your .NET web site or application, continue reading, and prepare to embrace some love for Lucene!

A small disclaimer, sort of:

I am not an expert in Lucene, and this article is not only about Lucene; it is rather about how to make it work in your .NET app or site. Hence, no advanced Lucene topics will be covered (at least initially), only what is needed to get it working.

Scenario

This article walks through a simple search scenario using Lucene.Net:

Given:

You have some data source (most probably a database) which has some distinctive textual data in it

You need to be able to:

Create a Lucene search index from all the data in your data source, and delete the whole index

Add a single record to the Lucene search index, and delete a single record from it

Search all fields in the Lucene search index and get matching records ordered by relevance

Search by a particular field in the Lucene search index and get matching records ordered by relevance

We will build the search for this scenario step by step, so you can understand how it all works. I will explain more about what a Lucene search index is in the corresponding steps.

Installation

Of course, first we need to install Lucene.Net library itself, don't we?

I. Installing via NuGet

The easiest and preferred way to do that is to install the Lucene.Net NuGet package.

Method 1 - using NuGet Package Manager Console

Open Package Manager Console in Visual Studio by clicking:

View > Other Windows > Package Manager Console

Once the console pops up at the bottom, let's first search for 'lucene' by typing 'get-package -remote -filter lucene' at the prompt:

Screenshot 1.

From the search results we can see that a number of different packages are available. For the most part they only extend the default Lucene.Net functionality. They will not be covered in this article, but you can play with them later on your own. The only package we need is the bare-bones Lucene.Net.

So, to install it let's type 'install-package Lucene.Net':

Screenshot 2.

And, boom! NuGet did a good job again - Lucene.Net is installed and referenced!

Method 2 - using NuGet UI tool

If you are not a fan of typing stuff into consoles of any sort ("...Console? Console?! what a nonsense, sir/madam!...") or the console is not working for some reason, you can use the neat NuGet UI. Right-click on your project and then click 'Manage NuGet Packages...':

Screenshot 3.

A nice and clean window will pop up. There, first select Online > All, then type 'Lucene.Net' in the search box, then select Lucene.Net from the list and hit 'Install' (I already have it installed, so it shows a green icon instead of the 'Install' button):

Screenshot 4.

And that's it, you are ready to go!

II. Installing manually

Alternatively, you can just download the Lucene.Net binaries from the official site.

After that, manually copy Lucene.Net.dll from the archive to your bin folder and reference it in your project.

Implementing search, Step-by-Step

Step 1 - create sample data source

In order to use Lucene.Net, we first need to create a Lucene search index from some set of data (most probably from a database). A Lucene search index is just a set of files that Lucene.Net creates, and we'll create it later. So, for our tutorial, let's create an empty SampleData.cs file and add a generic data object to it:
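A minimal sketch of such a class, assuming the three fields the later steps refer to (an integer Id plus the Name and Description text fields):

```csharp
// Generic data object that will be indexed by Lucene.Net.
// Field names follow the rest of the article; the exact shape is an assumption.
public class SampleData
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
}
```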

This class can represent any data you wish, and of course you can create your own class or use an existing one. Also, when it comes to testing Lucene, you may want a simple static data source repository based on the SampleData class above; you can put that code into the same SampleData.cs file you've just created:
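One way such a repository might look. The method names Get() and GetAll() and the rows themselves are made up for illustration; in a real application they would read from your database:

```csharp
using System.Collections.Generic;
using System.Linq;

// Static in-memory stand-in for a real data source.
// Relies on the SampleData class defined earlier in this step.
public static class SampleDataRepository
{
    // Returns the record with the given Id, or null if there is none.
    public static SampleData Get(int id)
    {
        return GetAll().SingleOrDefault(x => x.Id == id);
    }

    // Returns all records (made-up sample rows).
    public static List<SampleData> GetAll()
    {
        return new List<SampleData>
        {
            new SampleData { Id = 1, Name = "Mumbai", Description = "City in India" },
            new SampleData { Id = 2, Name = "Munich", Description = "City in Germany" },
            new SampleData { Id = 3, Name = "Montreal", Description = "City in Canada" }
        };
    }
}
```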

We have created a sample data source: the generic data class SampleData and the sample repository SampleDataRepository to retrieve our static data.

Step 2 - create base empty Lucene search class

Now let's create an empty LuceneSearch.cs file and copy the following into it:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Web;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

namespace MvcLuceneSampleApp.Search {
    public static class LuceneSearch {
        // todo: we will add required Lucene methods here, step-by-step...
    }
}

We will be adding all the methods required for search-related tasks to that class. At this point you can make sure all your references are correct by building your site; if the build fails, fix your references.

Step summary:

We have created an empty LuceneSearch class.

Step 3 - add Lucene search index directory handler

So, let's see what we've got now: the SampleData class, which represents some data, and the LuceneSearch class, which is supposed to do the search. But it's empty. So, it needs some code to do the job, right?

A small prerequisite first. Lucene.Net needs to build its search index, which, as I mentioned earlier, is basically a set of files generated by Lucene in some local directory. So, we need to add a special property to the LuceneSearch class that represents a handler for the local directory storing the search index:
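A sketch of how that property could be written against the Lucene.Net 3.0.x API. The backing field _directoryTemp, the use of HttpRuntime.AppDomainAppPath to find the application root, and the forced unlock of a stale write lock are all assumptions of this sketch, not requirements of Lucene.Net:

```csharp
// Members of the LuceneSearch class from Step 2 (its using directives cover the types below).

// Full physical path of the index folder under the application root.
private static readonly string _luceneDir =
    Path.Combine(HttpRuntime.AppDomainAppPath, "lucene_index");

// Hypothetical backing field so the directory is opened only once.
private static FSDirectory _directoryTemp;

private static FSDirectory _directory
{
    get
    {
        if (_directoryTemp == null)
        {
            // Fully qualified: "Directory" alone is ambiguous between
            // System.IO and Lucene.Net.Store in this file.
            System.IO.Directory.CreateDirectory(_luceneDir);
            _directoryTemp = FSDirectory.Open(new DirectoryInfo(_luceneDir));
        }

        // Assumption: a single process writes to the index, so a lock found
        // here is a leftover from a crashed or aborted writer.
        if (IndexWriter.IsLocked(_directoryTemp))
            IndexWriter.Unlock(_directoryTemp);

        return _directoryTemp;
    }
}
```

Forcing the unlock is only reasonable when a single application instance writes to the index; with several writers it would defeat the purpose of the lock.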

Here _luceneDir is the full physical path to the folder, and "lucene_index" is the name of that folder, which sits in the application root. So naturally you also need to create this directory manually or write some code that creates it automatically.

Then, _directory is an instance of the Lucene.Net class FSDirectory, and it will be used by all of the search methods to access the search index.

Step summary:

We have added Lucene search index directory handler to make our LuceneSearch class ready to have search methods added.

Step 4 - add methods for Adding data to Lucene search index

Lucene will create the search index from some actual data; in our case that is a List<SampleData> with several records, or a single SampleData record.

The first method we need is a private method that creates a single search index entry from our data; it will be reused by the public methods we'll add later:
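A minimal sketch of such a method, assuming the Lucene.Net 3.0.x API and the SampleData fields from Step 1. Deleting an older entry with the same Id first is a choice of this sketch, so that re-adding a record updates it instead of duplicating it:

```csharp
// Member of the LuceneSearch class from Step 2.
private static void _addToLuceneIndex(SampleData sampleData, IndexWriter writer)
{
    var id = sampleData.Id.ToString();

    // Remove any existing entry for this record (assumption: Id is unique).
    writer.DeleteDocuments(new TermQuery(new Term("Id", id)));

    // Map the record to a Lucene Document, one Field per property.
    var doc = new Document();
    doc.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("Name", sampleData.Name ?? string.Empty,
        Field.Store.YES, Field.Index.ANALYZED));
    doc.Add(new Field("Description", sampleData.Description ?? string.Empty,
        Field.Store.YES, Field.Index.ANALYZED));

    // Add the entry to the index.
    writer.AddDocument(doc);
}
```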

Basically, it takes one SampleData record, maps it to the Lucene class Document, and adds it to the search index using IndexWriter. Please note how the Name and Description fields use the Field.Index.ANALYZED parameter, while Id uses Field.Index.NOT_ANALYZED. Basically, you want to use ANALYZED on free-text properties that should be split into searchable words, and NOT_ANALYZED on values that must be matched as a whole, like integer Ids.

Now let's add a public method that uses _addToLuceneIndex() to add a list of records to the search index:
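A sketch under the same assumptions (Lucene.Net 3.0.x, the _directory property from Step 3). The method name AddUpdateLuceneIndex and the single-record overload are hypothetical:

```csharp
// Members of the LuceneSearch class from Step 2.
public static void AddUpdateLuceneIndex(IEnumerable<SampleData> sampleDatas)
{
    var analyzer = new StandardAnalyzer(Version.LUCENE_30);
    try
    {
        // Opens the existing index, or creates one if none exists yet.
        using (var writer = new IndexWriter(_directory, analyzer,
            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            foreach (var sampleData in sampleDatas)
                _addToLuceneIndex(sampleData, writer);
        } // disposing the writer commits the changes and releases the lock
    }
    finally
    {
        analyzer.Close();
    }
}

// Convenience overload for a single record.
public static void AddUpdateLuceneIndex(SampleData sampleData)
{
    AddUpdateLuceneIndex(new List<SampleData> { sampleData });
}
```

With the sample repository from Step 1, building the whole index could then be a single call such as AddUpdateLuceneIndex(SampleDataRepository.GetAll()), assuming the repository exposes a GetAll() method.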

Whenever database records are deleted, they have to be removed from the Lucene search index too; otherwise your search will return records that no longer exist in the database. Let's first add a simple method that removes a single record from the Lucene search index by the record's Id field:
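Roughly, and again assuming Lucene.Net 3.0.x and an integer Id stored as a NOT_ANALYZED field, that method might look like this:

```csharp
// Member of the LuceneSearch class from Step 2.
public static void ClearLuceneIndexRecord(int recordId)
{
    var analyzer = new StandardAnalyzer(Version.LUCENE_30);
    try
    {
        using (var writer = new IndexWriter(_directory, analyzer,
            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // Delete every index entry whose Id field matches exactly.
            var searchQuery = new TermQuery(new Term("Id", recordId.ToString()));
            writer.DeleteDocuments(searchQuery);
        }
    }
    finally
    {
        analyzer.Close();
    }
}
```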

The code above basically searches for a record by the field Id (you can use any field, of course) and deletes all matching entries from the index. In our case it will usually delete a single record, as long as you always supply a full and unique Id value.

Note: the main Lucene search methods will be added later, and of course explained in more detail.

Secondly, whenever the database schema changes, or you simply want to clear the whole index quickly, you need a method that clears the entire index, so let's add that too:
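A sketch using IndexWriter.DeleteAll() from the Lucene.Net 3.0.x API. Returning a bool instead of letting the exception propagate is an assumption made here for brevity:

```csharp
// Member of the LuceneSearch class from Step 2.
public static bool ClearLuceneIndex()
{
    var analyzer = new StandardAnalyzer(Version.LUCENE_30);
    try
    {
        // create: true starts from a fresh index even if one already exists.
        using (var writer = new IndexWriter(_directory, analyzer, true,
            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // Remove every entry from the index.
            writer.DeleteAll();
        }
    }
    catch (Exception)
    {
        // Report failure to the caller; real code would also log the error.
        return false;
    }
    finally
    {
        analyzer.Close();
    }
    return true;
}
```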

This method simply removes the whole Lucene search index via a method built into Lucene's IndexWriter.

Now is probably a good moment to mention that Lucene puts a "lock" on the search index files while they are being updated or searched, so that they cannot be altered at the same time. It is also important that each file gets "unlocked" afterwards: if some of the files are deleted manually or by some code, and you then try to search or update, you will see a lot of errors, and we don't want that, right?

So, in order to save ourselves from embarrassment and our end users from righteous rage towards us, we always need to .Close() and .Dispose() any Lucene handlers like IndexWriter and StandardAnalyzer.

Additionally, it is beneficial to run Lucene search index optimization once in a while to speed up searches, especially if your index is getting bigger. So, let's add a small method to do just that:
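In the Lucene.Net 3.0.x API this comes down to a call to IndexWriter.Optimize(); a sketch, meant to be called manually or on a schedule rather than on every search:

```csharp
// Member of the LuceneSearch class from Step 2.
public static void Optimize()
{
    var analyzer = new StandardAnalyzer(Version.LUCENE_30);
    try
    {
        using (var writer = new IndexWriter(_directory, analyzer,
            IndexWriter.MaxFieldLength.UNLIMITED))
        {
            // Merges index segments; can take a while on a large index.
            writer.Optimize();
        }
    }
    finally
    {
        analyzer.Close();
    }
}
```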

We added three methods - ClearLuceneIndexRecord() to delete single record from Lucene search index, ClearLuceneIndex() to delete all records in the index, and Optimize() method to optimize large indices for faster search.

Step 6 - add methods to map Lucene search index data to SampleData

Hang in there, folks, this step is the last one, before we'll be adding our main search methods!

Now, before we can search: do you remember how our _addToLuceneIndex() method from Step 4 mapped our database data to the Lucene search index? Well, to get our search results in the form of a List<SampleData> or similar, we need a function that maps index entries back to our SampleData class, and here it is:
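A minimal sketch, assuming the three stored fields Id, Name and Description written in Step 4:

```csharp
// Member of the LuceneSearch class from Step 2.
private static SampleData _mapLuceneDocumentToData(Document doc)
{
    // Every stored field comes back as a string, so Id has to be converted.
    return new SampleData
    {
        Id = Convert.ToInt32(doc.Get("Id")),
        Name = doc.Get("Name"),
        Description = doc.Get("Description")
    };
}
```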

So, the method above takes a Lucene Document from the index (where each field is represented as a string) and maps it to SampleData. Pretty simple, right?

In addition, we need two more methods: one to map a list of Lucene Documents and one to map a list of Lucene ScoreDocs, each returned by a different Lucene search method (more on that in Steps 7 and 8). Both reuse the _mapLuceneDocumentToData() method we defined above:
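The two overloads could be sketched as follows. The parameter lists are assumptions; the second overload needs the IndexSearcher because a ScoreDoc only carries a document number, not the document itself (Lucene.Net 3.0.x API):

```csharp
// Members of the LuceneSearch class from Step 2.

// Maps Documents that have already been loaded from the index.
private static IEnumerable<SampleData> _mapLuceneToDataList(IEnumerable<Document> hits)
{
    return hits.Select(_mapLuceneDocumentToData).ToList();
}

// Maps search hits: loads each Document by its number, then maps it.
private static IEnumerable<SampleData> _mapLuceneToDataList(
    IEnumerable<ScoreDoc> hits, IndexSearcher searcher)
{
    // ToList() forces the lookups to run while the searcher is still open.
    return hits.Select(hit => _mapLuceneDocumentToData(searcher.Doc(hit.Doc))).ToList();
}
```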

We added methods to map results returned by Lucene to our data class so they can be reused on our site. The method _mapLuceneDocumentToData() maps a Lucene Document with search results from the index to our SampleData class, and the method _mapLuceneToDataList() in turn maps a list of Lucene Documents or ScoreDocs.

Step 7 - add main search method

And finally, the main search method. It searches the Lucene search index by a particular field (Id, Name, or Description) whenever we supply its name, or, alternatively, it searches all three fields, which is the basis for a universal search somewhat similar to your favourite internet search engine. You may notice that it is still a private method. The reason is that this method is universal for any Lucene query.

A Lucene query is a little more than just the text you are searching for. A basic example: when the query is "Mumbai", Lucene will search for an exact match of this word, and if the query is "Mum*", then all fields containing words that start with "Mum" will be returned as search results. There are definitely many more ways to write advanced Lucene queries, but they won't be covered in this article.

In our case, the query is provided by a public method, which formats your search request for a particular search scenario; it will be added in Step 8. The idea is that the private search method below shouldn't change much, and only the public one will be adjusted for our search needs.
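A sketch of what the private search method might look like with the Lucene.Net 3.0.x API. The helper parseQuery() is hypothetical: it retries with an escaped query when the raw text is not valid Lucene syntax. The sketch also assumes the _directory property from Step 3 and a _mapLuceneToDataList(hits, searcher) overload from Step 6:

```csharp
// Members of the LuceneSearch class from Step 2.

// Hypothetical helper: parse the query, falling back to an escaped version
// when the raw text contains invalid query syntax.
private static Query parseQuery(string searchQuery, QueryParser parser)
{
    try
    {
        return parser.Parse(searchQuery.Trim());
    }
    catch (ParseException)
    {
        return parser.Parse(QueryParser.Escape(searchQuery.Trim()));
    }
}

private static IEnumerable<SampleData> _search(string searchQuery, string searchField = "")
{
    // A query made only of wildcards makes Lucene throw, so bail out early.
    if (string.IsNullOrEmpty(searchQuery.Replace("*", "").Replace("?", "")))
        return new List<SampleData>();

    var analyzer = new StandardAnalyzer(Version.LUCENE_30);
    try
    {
        using (var searcher = new IndexSearcher(_directory, false))
        {
            var hits_limit = 1000;

            // One field if a name was supplied, otherwise all three fields.
            QueryParser parser = string.IsNullOrEmpty(searchField)
                ? new MultiFieldQueryParser(Version.LUCENE_30,
                      new[] { "Id", "Name", "Description" }, analyzer)
                : new QueryParser(Version.LUCENE_30, searchField, analyzer);

            var query = parseQuery(searchQuery, parser);
            var hits = searcher.Search(query, null, hits_limit, Sort.RELEVANCE).ScoreDocs;

            // Materialize results before the searcher is disposed.
            return _mapLuceneToDataList(hits, searcher).ToList();
        }
    }
    finally
    {
        analyzer.Close();
    }
}
```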

Of course, our private _search() method has many points that can be adjusted and optimized, and it is up to you to adjust it to your particular needs.

Did you notice var hits_limit = 1000;? As I mentioned in Step 5, when Lucene gets more than 1000 search results, it becomes increasingly slow, so you'd want to limit it to a number which is relevant in your case.

If you add new fields to your index, don't forget to add their names to the array of field names passed to MultiFieldQueryParser in the multi-field search.

Just remember that results returned by _search() method will be ordered by Sort.RELEVANCE, which means that more accurate results will be returned first. Another option is Sort.INDEXORDER, which returns results in the order they've been added to search index. However for most scenarios Sort.RELEVANCE will work just fine.

Step summary:

We added primary private _search() method that will perform searches by single field or multiple fields in Lucene search index based on the search query supplied.

Step 8 (and the last one) - add public methods which call main search method

Our little search engine is practically done. Now it is time to add the last methods, which will interface with our site or app.

The first one simply formats the Lucene search query and calls the main private search method _search():
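A sketch of that public method, assuming the formatting consists of appending * to every word so that each one is matched as a prefix. The companion SearchDefault() passes the raw query through unchanged; both signatures are assumptions:

```csharp
// Members of the LuceneSearch class from Step 2.

// Wildcards every word of the input, then runs the search.
public static IEnumerable<SampleData> Search(string input, string fieldName = "")
{
    if (string.IsNullOrEmpty(input)) return new List<SampleData>();

    // "mum ind" becomes "mum* ind*"; hyphens are treated as word separators.
    var terms = input.Trim().Replace("-", " ").Split(' ')
        .Where(x => !string.IsNullOrEmpty(x))
        .Select(x => x.Trim() + "*");

    return _search(string.Join(" ", terms), fieldName);
}

// Runs the input as a raw Lucene query, without adding wildcards.
public static IEnumerable<SampleData> SearchDefault(string input, string fieldName = "")
{
    return string.IsNullOrEmpty(input)
        ? new List<SampleData>()
        : _search(input, fieldName);
}
```

For example, Search("mum") would look for words starting with "mum" in all three fields, while Search("mum", "Name") would restrict the search to the Name field.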

Remember, use it wisely, because the larger the index gets, the longer it takes to load it all.
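For reference, a method that loads every record from the index, such as GetAllIndexRecords(), might be sketched like this with the Lucene.Net 3.0.x API. Walking document numbers and skipping deleted ones is an approach chosen for this sketch:

```csharp
// Member of the LuceneSearch class from Step 2.
public static IEnumerable<SampleData> GetAllIndexRecords()
{
    // Nothing to read if no index has been built yet.
    if (!IndexReader.IndexExists(_directory)) return new List<SampleData>();

    using (var reader = IndexReader.Open(_directory, true))
    {
        var docs = new List<Document>();
        for (var i = 0; i < reader.MaxDoc; i++)
        {
            // Document numbers of deleted entries stay reserved until a merge.
            if (reader.IsDeleted(i)) continue;
            docs.Add(reader.Document(i));
        }
        return _mapLuceneToDataList(docs).ToList();
    }
}
```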

Step summary:

We added two methods that so far finalize our basic LuceneSearch class - first method is Search(), which formats Lucene search query, and searches by all fields or by a single field. Second method is GetAllIndexRecords() which merely returns all records in the search index.

Using the Code

Congratulations! If you made it this far in this article, it means you are probably ready to put your new LuceneSearch class to the test. Or perhaps you already tested it along the way, got it all, and are now researching the Lucene online documentation.

To demonstrate it all in action, along with a simple UI, I went ahead and created a sample project for Visual Studio 2012 (including MVC and WebForms examples) that uses all the code from this article and provides a simple interface for Lucene.Net.

Alternatively, if you have Git installed (if not, read how to set up Git on Windows), create a local clone (copy) of this project on your computer by running this in the bash console (create some folder, right-click on it, select 'Git Bash Here', copy the command below, and paste it into the console via Shift+Insert):

And I guess that is the end of this article. I sincerely hope Lucene.Net will speed up your searches and your life!

I hope this article was at least somewhat helpful! Let me know what you think!

References

I used these articles as a source of Lucene.Net inspiration and knowledge. They are quite outdated, but they contain a good deal of theory and in-depth knowledge on Lucene (which needs adjusting to the latest Lucene.Net):

History

2/23/2012 - added a better way to parse Lucene search query (thanks to Gavin Harriss for the tip!), in the sample search site added SearchDefault() method to search using raw Lucene query, so sample search site is at version v.1.3 now!

2/9/2012 - removed writer.Optimize() from search methods, since it may degrade search performance, and placed it into a separate Optimize() method to be called manually or on schedule (thanks to dave.dolan for the tip!). Also added the analyzer.Close() line to several methods, as an unclosed analyzer was causing errors on some occasions. Sample search site is updated to v.1.2!

2/7/2012 - ClearLuceneIndex() method is updated to a built-in way of clearing the whole Lucene search index (thanks to ENOTTY for this tip!), sample search site is updated to v.1.1

2/2/2012 - Cleaned and updated attached sample search site, few small article updates as well

2/1/2012 - Initial article about using Lucene.Net 2.9.4 is released along with sample search site v.1.0

One thing that would help more is a sample that lets you search multi fields for different values or better yet basic fluent interface to build up a search if possible. Basically my objects will have many fields to be indexed and i am looking for a way to provide a simple interface to search based on one or more fields. I know it is a little outside this article so maybe only searching for multiple fields is relevant but a fluent interface would be useful to others as well i am sure. I have never built a fluent interface so i am not sure how complicated they are.

As for the fluent interface, you are right, it's a bit out of scope of this article (which is aimed to be basic and general). Fluent interface most probably will be a more or less project specific. On the other hand there might be a good way to simplify existing article and example by incorporating some fluent interface elements. I need to think about it.

Thanks for the reply. I was not clear i meant search field xxx for yyyy and field zzz for eee. The example just searches both for the same value right? linq one sounds interesting so i am going to look at that and maybe it will have some answers as well.

One other question and it is prob my setup but i find the id and other unique columns work perfectly in my search but number columns with repeating positive and/or negative numbers seem to return way more results than they should and incorrect results. I am using SearchDefault method to rule out the wild card but i figure it has to be the way i am either trying to search for an exact match or the settings i used when i inserted the values. Any ideas?

I see what you mean, but here is the thing, Lucene was designed as an advanced text search engine, not the database to store records, so it doesn't have built in support for paging, and possibly never will.

And here is the thing - in both your case and the one above, Lucene will get all search results first, and only then you are able to select a page from that set of results... So there is no way to make it search "faster", other than setting a limit on the maximum number of results.

What I am trying to say, paging implementation in Lucene cannot be native as of now. So it doesn't matter how you filter results after you already got them.

However, that is true for Lucene.net. Original Java version is always more up-to-date and may have it already...

Hopefully someone will correct me if I am wrong here.

(p.s. If I ever discover a native way to page in Lucene.net, I'll post it here)

i have couple of question of your here
----------------------------------------
1) why you replace all * & ? to empty string in search term
2) why you search once with QueryParser and again by MultiFieldQueryParser
3) how are you detect that search term has one word or many words separated by space.
4) how wild card search can be done using ur code....where to change.
5) how do u handling search for similar word like if anyone search with helo then hello related result should come.
6) when my search result will return 5000 record and then if i limit like 1000 then how could i show next 4000 in pagination fasion.pagination logic is missing. it would be better if wrote code for handling pagination.
7) one request please convert the whole code to asp.net webform if it would be possible for you because still many people not familiar with asp.net mvc.

1) why you replace all * & ? to empty string in search term
- It is a check whether the string has ONLY * or ? in it, because if it has only those symbols and no text, Lucene gives an error. So if this validation fails, an empty list of data is returned.

2) why you search once with QueryParser and again by MultiFieldQueryParser
- QueryParser is used when you search against a single field stored in the Lucene index (in this article, I store three fields: Id, Name, Description). E.g. when you want Lucene to give results based only on the Description, but not Id or Name. The field used is specified in the searchField variable. If this variable is empty, Lucene will run your search query against all three fields using MultiFieldQueryParser. So in the sample site the search is done against all three fields by default, or against the single field you select in the dropdown box next to the search button.

3) how are you detect that search term has one word or many words separated by space.
- I don't! Actually with Lucene you don't care about this, as it attempts to give you most relevant results automatically. Of course you can fiddle with Lucene query to make it extremely custom and precise, but I prefer to do advanced filtering after Lucene data is returned, via LINQ.

4) how wild card search can be done using ur code....where to change.
- My default Search method already wildcards your query by adding * at the end of each word. If you need to run an unfiltered Lucene query, inputting wildcards manually, use the SearchDefault method instead. I don't know much about making extensive Lucene queries though, as the basic addition of * at the end of each word worked well enough for me.

5) how do u handling search for similar word like if anyone search with helo then hello related result should come.
- I don't handle this at all in my code. It has to be set in the Lucene query; most probably you need to use Lucene's fuzzy search. Basically, all you do is add the '~' symbol at the end of each word, and Lucene will return similar words.

6) when my search result will return 5000 record and then if i limit like 1000 then how could i show next 4000 in pagination fasion.pagination logic is missing. it would be better if wrote code for handling pagination.
- Lucene doesn't handle pagination by itself (at least I haven't found a way yet). So far, the only way to paginate is to get all search results first, then paginate them via .NET... Consider this also: Lucene is good for complex quick searches of relevant results, so you shouldn't need it to return that many records, only relevant ones (like Google search). So it is not a complete replacement for searching via regular SQL queries.

7) one request please convert the whole code to asp.net webform if it would be possible for you because still many people not familiar with asp.net mvc.
- Done! I've posted the link in your previous question. It doesn't have pagination yet; I'll add it later though, when I get time. Again, keep in mind that the pagination will be .NET based, not Lucene based, as Lucene is not really designed for this (unless someone knows differently).
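As a rough illustration of the .NET-side pagination mentioned in answers 6 and 7, here is a small sketch using LINQ's Skip() and Take() on results that were already returned by the search. The class and method names are made up for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Returns one page of an already retrieved, relevance-ordered result list.
    // pageNumber is 1-based.
    public static List<T> GetPage<T>(IEnumerable<T> results, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");

        return results
            .Skip((pageNumber - 1) * pageSize) // drop the earlier pages
            .Take(pageSize)                    // keep one page worth of items
            .ToList();
    }
}
```

With 25 results and a page size of 10, page 3 would hold the last 5 items. Lucene still retrieves up to the hit limit on every search; only the slice shown to the user changes.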

Hi Tridip! I'm glad you like my article! No problem, I'll add a WebForms clone to the sample site and a pagination too.
Also, I'll respond to your other question little later, as I'm a bit busy right now.

I am sure Lucene can do any complex search on text in an existing index; you just need to format the Lucene query appropriately, or use an appropriate analyzer. I haven't really done much research on the topic...

I am sure there are more ways of storing indices in databases for several users, maybe even using SQL Server. I just haven't really looked into this yet, as the standard Lucene index worked well enough for me.