Introduction to SOLR Search Engine

Sundara Pandian, Technical Lead

Solr is an open source search platform that can index and search content from multiple sites and return suggestions for related content based on the search query. Solr makes it easy for programmers to develop sophisticated, high-performance search applications with advanced features such as faceting (grouping search results into categories, each shown with a count of matching documents). Solr builds on another open source search technology, Lucene, which is written in Java. It provides indexing and search technology, as well as spell checking, advanced analysis/tokenization and hit highlighting capabilities.
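To make faceting concrete, here is a minimal sketch in plain Python of the counting step behind a facet. It does not call Solr; the sample documents and the "category" field are made up for illustration, and Solr computes these counts itself inside the index.

```python
from collections import Counter

# Hypothetical search hits; the "category" field is invented for this example.
hits = [
    {"id": 1, "title": "Solr basics", "category": "search"},
    {"id": 2, "title": "Lucene internals", "category": "search"},
    {"id": 3, "title": "Hadoop clusters", "category": "big-data"},
]

def facet_counts(docs, field):
    """Count how many of the matching documents carry each value of `field`."""
    return Counter(doc[field] for doc in docs if field in doc)

# Print each facet value with its count, most frequent first.
for value, count in facet_counts(hits, "category").most_common():
    print(f"{value} ({count})")
```

A search page would typically show these value/count pairs next to the results so that the user can narrow the search with one click.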

A Solr index can accept data from many different sources, including comma-separated value (CSV) files, XML files, files in common formats such as Microsoft Word or PDF, and data extracted from tables in a database.

How does SOLR search work?

Solr is a wrapper around the Apache Lucene library. It uses Lucene classes to create an index known as an inverted index. The index maintains a list called a posting list, which maps each word, term or phrase to the places where it occurs. As a search engine, Solr lets you index a set of documents (say, news articles) and then query it to return the documents that match the user's query.
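The idea of an inverted index can be sketched in a few lines of Python. This is a simplified illustration, not how Lucene stores its index: the tokenizer, the function names and the sample documents are assumptions made for the example, and the posting list here records only document IDs, not positions.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical documents, keyed by ID.
docs = {
    1: "Solr is built on Lucene",
    2: "Lucene is a Java library",
    3: "Hadoop stores big data",
}
index = build_inverted_index(docs)
print(sorted(search(index, "lucene")))       # documents 1 and 2
print(sorted(search(index, "lucene java")))  # document 2 only
```

Because the lookup goes from term to documents, answering a query means intersecting a few small sets rather than scanning every document.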

NoSQL database − Solr can also be used as a big-data-scale NoSQL database, with search tasks distributed across a cluster.

Full text search − Solr provides the capabilities needed for full text search, such as phrase, token and wildcard matching, spell checking and auto-complete.

Highly Scalable − When Solr is used with Hadoop, we can scale its capacity by adding replicas.

Enterprise ready − Depending on the needs of the organization, Solr can be deployed on systems of any size, big or small, in standalone, distributed or cloud configurations.

RESTful APIs − Java development skills are not required to communicate with Solr; you can use its RESTful services instead. Documents are submitted to Solr in formats such as JSON, CSV and XML, and results are returned in the same formats.

Text-Centric and Sorted by Relevance − Solr is mostly used to search text documents, and the results are returned in order of their relevance to the user's query.

Extensible and Flexible − We can easily customize the components of Solr by extending its Java classes and configuring them accordingly.

Admin Interface − Solr provides an easy-to-use, feature-rich admin user interface, through which we can perform tasks such as adding, updating, deleting and searching documents and viewing logs.
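As an illustration of the RESTful APIs mentioned above, the sketch below prepares a search request and a JSON indexing request using only the Python standard library. It builds the requests without sending them, so it runs without a Solr server. The base URL assumes a local install on Solr's default port, and the core name "articles" and the document fields are hypothetical.

```python
import json
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: local Solr on the default port
CORE = "articles"                         # hypothetical core name

def select_url(query, rows=10, wt="json"):
    """Build a URL for the core's /select search handler."""
    params = urlencode({"q": query, "rows": rows, "wt": wt})
    return f"{SOLR_BASE}/{CORE}/select?{params}"

def update_request(docs):
    """Build the URL, body and headers for posting JSON documents to /update."""
    url = f"{SOLR_BASE}/{CORE}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return url, body, headers

# Hypothetical document to index, followed by a query that would find it.
url, body, headers = update_request([{"id": "1", "title": "Introduction to Solr"}])
print(url)
print(select_url("title:solr"))
```

Against a running Solr instance, these pieces could be passed to urllib.request.Request and urllib.request.urlopen (or any other HTTP client) to index the document and run the search.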

Limitations of using Solr

Latency is increased (Solr replication latency and sum of tracking).

Replicating large merges occasionally causes a heavy I/O load.

Load balancing and management are complicated.

Reconfiguration is needed if the master is lost.
