Apache Solr High Performance

Progressing

Surendra Mohan, March 2014

In setting up Apache Solr, you’ll want to ensure it’s achieving optimum search results with maximum efficiency. This book shows you just how to achieve that with a comprehensive tutorial including troubleshooting.

eBook: $20.99 (RRP $20.99)

Print + eBook: $34.99 (RRP $34.99)

Book Details

ISBN 13: 9781782164821

Paperback: 124 pages

About This Book

Achieve high scores by boosting at query time and index time, and by implementing boost queries and functions using the DisMax query parser and formulae.

Set up and use SolrCloud for distributed indexing and searching, and implement distributed search using shards.

Use geospatial search, handle homophones, and exclude listed words from being indexed and searched.

Who This Book Is For

This book is ideal for Apache Solr developers who want to learn different techniques to optimize Solr's performance efficiently and to troubleshoot the problems that usually occur while trying to boost performance. Familiarity with search servers and database querying is expected.

Table of Contents

Chapter 1: Installing Solr

Prerequisites for Solr

Summary

Chapter 2: Boost Your Search

Scoring

The dismax query parser

Function queries

Summary

Chapter 3: Performance Optimization

Solr performance factors

Solr caching

Using SolrCloud

Near real-time search

Summary

Chapter 4: Additional Performance Optimization Techniques

Documents similar to those returned in the search result

Sorting results by function values

Searching for homophones

Ignore the defined words from being searched

Summary

Chapter 5: Troubleshooting

Dealing with the corrupt index

Reducing the file count in the index

Dealing with the locked index

Truncating the index size

Dealing with a huge count of open files

Dealing with out-of-memory issues

Dealing with an infinite loop exception in shards

Dealing with expensive garbage collection

Bulk updating a single field without full indexation

Summary

Chapter 6: Performance Optimization with ZooKeeper

Getting familiar with ZooKeeper

Setting up, configuring, and deploying ZooKeeper

Applications of ZooKeeper

Summary

What You Will Learn

Boost your search based on scores, the DisMax query parser, and function queries.

Explore performance metrics and implement different kinds of Solr caching, such as document, query result, filter, and whole result page caching.

Index and search across shards, and use near real-time searching.

Get to grips with additional performance optimization activities such as fetching documents similar to the ones queried, searching homophones, or filtering searches on the basis of specific keywords.

Troubleshoot common problems efficiently, such as corrupt and locked indexes, out-of-memory issues, expensive garbage collection, and infinite loop exceptions in multi-server environments.

Set up, configure, and deploy various applications of ZooKeeper to optimize Solr’s performance.

In Detail

Apache Solr is one of the most popular open source search servers available on the web. However, simply setting up Apache Solr is not enough to ensure the success of your web product. To maximize efficiency, you need to use techniques to boost Solr performance in order to return relevant results faster. You need to implement robust techniques that focus on optimizing the performance of your Solr instances and also troubleshoot issues that are prone to arise while maintaining Solr.

Apache Solr High Performance is a practical guide that will help you explore and take full advantage of the robust nature of Apache Solr so as to achieve optimized Solr instances, especially in terms of performance.

You will learn everything you need to know to achieve a high-performing Solr instance or set of instances, as well as how to troubleshoot the common problems you are likely to face while working with single or multiple Solr servers.

This book begins by explaining the prerequisites of Apache Solr, installing it, and integrating it with the required additional components, then gradually progresses into the features that make Solr flexible enough to achieve high performance in various circumstances. It then covers several clear and highly practical concepts that will help you further optimize the performance of your Solr instances on both single and multiple servers, and shows how to troubleshoot common problems that tend to arise while using Solr. By the end of the book, you will also have learned how to set up, configure, and deploy ZooKeeper, along with other applications of ZooKeeper.

You will also learn how to handle data in multiple server environments, run searches based on specific geographical coordinates, apply different caching techniques, and use various algorithms and formulae that enable better performance, among other topics.

Authors

Surendra Mohan

Surendra Mohan, who has served a few top-notch software organizations in varied roles, is currently a freelance software consultant. He has been working on various cutting-edge technologies like Drupal, Moodle, Apache Solr, ElasticSearch, Node.js, SoapUI, and so on for the past 10 years. He also delivers technical talks at various community events like Drupal Meetups and Drupal Camps. To find out more about him, his write-ups, technical blogs, and much more, go to http://www.surendramohan.info/.

He has also written the books Administrating Solr and Apache Solr High Performance, published by Packt Publishing, and has reviewed other technical books such as Drupal 7 Multi Site Configuration and Drupal Search Engine Optimization, as well as titles on Drupal commerce, ElasticSearch, and OpsView, Drupal-related video tutorials, and many more.

Series & Level

We understand your time is important. Uniquely amongst the major publishers, we seek to develop and publish the broadest range of learning and information products on each technology. Every Packt product delivers a specific learning pathway, broadly defined by the Series type. This structured approach enables you to select the pathway which best suits your knowledge level, learning style and task objectives.

Learning

As a new user, these step-by-step tutorial guides will give you all the practical skills necessary to become competent and efficient.

Beginner's Guide

Friendly, informal tutorials that provide a practical introduction using examples, activities, and challenges.

Essentials

Fast paced, concentrated introductions showing the quickest way to put the tool to work in the real world.

Cookbook

A collection of practical self-contained recipes that all users of the technology will find useful for building more powerful and reliable systems.

Blueprints

Guides you through the most common types of project you'll encounter, giving you end-to-end guidance on how to build your specific solution quickly and reliably.

Mastering

Take your skills to the next level with advanced tutorials that will give you confidence to master the tool's most powerful features.

Starting

Accessible to readers adopting the topic, these titles get you into the tool or technology so that you can become an effective user.

Progressing

Building on core skills you already have, these titles share solutions and expertise so you become a highly productive power user.