Project Description

Apache Solr is based on Lucene and is the enterprise open source search engine. It powers the search of sites like Twitter, the Apple and iTunes Stores, Wikipedia, Netflix and many more.

Solr not only scales to very large volumes of content, but also provides rich search functionality such as faceting, geospatial search, suggestions, spelling corrections, indexing of binary formats and a wide variety of powerful tools for configuring custom search solutions. It has integrated clustering and load balancing to provide a high level of robustness.

collective.solr comes with a default configuration and setup of Solr that makes it extremely easy to get started, yet provides a vastly superior search quality compared to Plone’s integrated text search based on ZCTextIndex.

Next you should activate the collective.solr (site search) add-on in the add-on control panel of Plone.
After activation you should review the settings in the new Solr Settings control panel.
To index all your content in Solr you can call the provided maintenance view:

http://localhost:8080/plone/@@solr-maintenance/reindex

Solr connection configuration in ZCML

The connection settings for Solr can be configured in ZCML and thus in buildout. This makes it easier to copy databases between multiple Zope instances that use different Solr servers.

The code is used in production on many sites and is considered stable. This add-on can be installed in a Plone 4.1 (or later) site to enable indexing operations as well as searching (site and live search) using Solr. Doing so will not only significantly improve search quality and performance, especially for a large number of indexed objects, but also reduce the memory footprint of your Plone instance by allowing you to remove the SearchableText, Description and Title indexes from the catalog.
In large sites with 100,000 content objects or more, searches using ZCTextIndex often take 10 seconds or more and require a good deal of memory from ZODB caches. Solr will typically answer these requests in 10 ms to 50 ms, at which point network latency and the rendering speed of Plone’s page templates become the dominant factors.

Ported atomic updates from ftw.solr.
This requires you to update your Solr config, load the new Solr config and
do a full reindex. For more information, check the “feature” section.
The feature was implemented in ftw.solr by [lgraf].
[mathias.leimgruber]

Add support for using different request handlers in search requests.
[buchi]

Make CollectiveSolrLayer configurable, to allow testing different cores.
[timo]

Added context to search utility. This allows query to be used in AJAX calls.
[tomgross]

Use GET method in spell check request (as it’s an idempotent request which
does not affect server state)
[reinhardt]

Add zopectl.command for reindexing. Do not rely on positional arguments in _get_site.
[tschorr]

Move inline function out to the global scope to make it more readable.
[gforcada]

Unify all exceptions raised by collective.solr.
[gforcada]

Soft commit changes while reindexing.
This allows searches to return results while reindexing is taking place.
[gforcada]
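
As a rough illustration of what a soft commit looks like on the wire, the sketch below builds the XML commit message that Solr's update handler accepts, with the standard softCommit attribute set. The helper name commit_message is hypothetical and is not taken from collective.solr; how the add-on actually sends the message is not shown here.

```python
from xml.etree import ElementTree


def commit_message(soft=False):
    """Build a Solr XML commit message (hypothetical helper).

    A soft commit makes recent changes visible to searches without
    forcing them to be written to stable storage first.
    """
    commit = ElementTree.Element("commit")
    if soft:
        commit.set("softCommit", "true")
    return ElementTree.tostring(commit, encoding="unicode")


# Parse the message back to show which attribute was set.
parsed = ElementTree.fromstring(commit_message(soft=True))
print(parsed.tag, parsed.get("softCommit"))
```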

4.1.0 (2015-02-19)

Pep8.
[timo,do3cc]

Refactor tests. Tests are now based on plone.app.testing. You can now
use the fixture COLLECTIVE_SOLR_FIXTURE and the utility method
collective.solr.testing:activateAndReindex() to test your code with Solr.
[do3cc]

Refactor ISearch. The method buildQuery has been replaced with buildQueryAndParameters.
Responsibilities were previously divided between the search view and the utility; now they are
all in the search utility. If you used the method before, please analyse
the changes in collective.solr.dispatcher:solrSearchResults from 4.0.3 to 4.1.0.
You can probably benefit from the changes.
[do3cc]

Make sure slashes are properly escaped in the search query. Solr 4.0 added
regular expression support, which means that ‘/’ is now a special character
and must be escaped if searching for literal forward slash.
[timo]
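
A minimal sketch of this kind of escaping, assuming the search term arrives as a plain string. The function name escape_slashes is made up for illustration and the logic is simplified; the escaping code in collective.solr itself may handle more cases.

```python
def escape_slashes(term):
    """Backslash-escape forward slashes that are not already escaped.

    Simplified: a slash counts as escaped when the character directly
    before it is a backslash.
    """
    result = []
    previous = ""
    for char in term:
        if char == "/" and previous != "\\":
            result.append("\\/")
        else:
            result.append(char)
        previous = char
    return "".join(result)


print(escape_slashes("news/2015/february"))
```

Without the escaping, Solr 4.0 and later would try to read the text between two slashes as a regular expression.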

Implement the getDataOrigin method for the FlareContentListingObject that
plone.app.contentlisting defines and that plone.app.search expects to exist.
[timo]

Use tika for extracting binary content.
[tom_gross]

Plone 4.3 compatibility of search view
[tom_gross]

Introduce ICheckIndexable-adapter for checking if an object is indexable.
[tom_gross]

3.0a4 - 2011-08-22

3.0a3 - 2011-08-22

Fixed handling of intra-word hyphens to be taken literally instead of being
interpreted as syntax for text fields.
[hannosch]

Explicitly require Plone 4.1 / Zope 2.13.
[hannosch]

Depend on the new c.indexing 2.0a2.
[hannosch]

Added an archetypes.schemaextender dependency and register two fields for
all objects providing IATContentType. showinsearch is a boolean field that
can be used to hide specific content items from search results. searchwords
is a lines field, which lets you specify words that an object should be found
under.
[hannosch]

Added documentation on setting up a master-slave configuration using the
SolrReplication support.
[hannosch]

Adjust tests to work with latest collective.recipe.solrinstance = 3.3 and
its new ICU-based text field.
[hannosch]

3.0a1 - 2011-06-23

Upgrade notes

Changed the names of the indexes used to emulate the path index. You need
to adjust your schema and rename physicalPath to path_string,
physicalDepth to path_depth and parentPaths to path_parents. This
also requires a full Solr reindex to pick up the new data.
[hannosch]
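
To make the three renamed fields more concrete, here is a sketch of values they could hold for one object. It assumes the path is given as a tuple of segments, as Zope's getPhysicalPath() returns it. The helper name and the exact depth and parent conventions are assumptions for illustration, so check the attributes module for the real indexers.

```python
def path_index_values(physical_path):
    """Derive path_string, path_depth and path_parents from path segments.

    physical_path is a tuple such as ("", "plone", "news", "item").
    """
    path_string = "/".join(physical_path)
    # Assumed convention: the root segment "" is not counted.
    path_depth = len(physical_path) - 1
    # Every ancestor path, plus the path of the object itself.
    path_parents = [
        "/".join(physical_path[: n + 1])
        for n in range(1, len(physical_path))
    ]
    return {
        "path_string": path_string,
        "path_depth": path_depth,
        "path_parents": path_parents,
    }


print(path_index_values(("", "plone", "news", "item")))
```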

Changes

Added object_provides index to example schema, as it’s used in the
collection portlet to find collections.
[hannosch]

Rewrote the maintenance/sync method for more performance, dropped the
optional path restriction from it and removed the cache argument. It
should be able to sync datasets in the 100,000 object range in a matter of
a couple of minutes.
[hannosch]

Changed the maintenance/reindex method to only flush data to Solr but not
commit after each batch. Instead we only commit once at the end. You should
configure auto commit policies on the Solr server side or commitWithin.
[hannosch]
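
The flush-per-batch, commit-once pattern described here can be sketched as follows. RecordingConnection is a stand-in that only records calls; its method names and the batch_size parameter are assumptions for illustration and do not describe the add-on's connection API.

```python
class RecordingConnection:
    """Stand-in for a Solr connection; records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def add(self, doc):
        self.calls.append(("add", doc["UID"]))

    def flush(self):
        self.calls.append(("flush",))

    def commit(self):
        self.calls.append(("commit",))


def reindex(connection, documents, batch_size=2):
    """Send documents in batches, flushing per batch and committing once."""
    pending = 0
    for doc in documents:
        connection.add(doc)
        pending += 1
        if pending == batch_size:
            connection.flush()  # push the batch to the server, no commit
            pending = 0
    if pending:
        connection.flush()  # push the last, partially filled batch
    connection.commit()  # single commit at the very end


conn = RecordingConnection()
reindex(conn, [{"UID": str(i)} for i in range(3)])
print(conn.calls)
```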

Adjusted the mangleQuery function to calculate extended path indexes from
the Solr schema instead of hardcoding path. If you have any additional
extended path indexes, you need to provide indexers with the same three
suffixes as we do ourselves in the attributes module for the path index
and add those to the Solr schema.
[hannosch]

Added documentation on Java process monitoring and production settings, and
included a number of useful munin plugin configurations.
[hannosch]

Updated example config to include production settings and JMX.
[hannosch]

2.0a1 - 2011-01-10

Added zopectl.command entry points for three new scripts.
solr_clear_index will remove all entries from Solr. solr_dump_catalog
will efficiently dump the content of the catalog onto the filesystem and
solr_import_dump will import the dump into Solr. This can be used to
bootstrap an empty Solr index or update it when the boost logic has changed.
All scripts will either take the first Plone site found in the database or
accept an unnamed command line argument to specify the id. The Solr server
needs to be running and the connection info needs to be configured in the
Plone site. Example use: bin/instance solr_dump_catalog Plone. In this
example the data would be stored in var/instance/solr_dump_plone. The data
can be transferred between machines and calling solr_dump_catalog multiple
times will append new data to the existing dump. To get Solr up-to-date you
should still call @@solr-maintenance/sync.
[hannosch, witsch]

Changed search pattern syntax to use str.format syntax and make both
{value} and {base_value} available in the pattern.
[hannosch]
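
A small sketch of how such a pattern could be filled in with str.format. The pattern string, the function name and the reading of the two placeholders ({value} as the processed term, {base_value} as the term as typed) are assumptions for illustration, not the add-on's defaults.

```python
# Hypothetical pattern: boost title matches, search full text as typed.
PATTERN = "+(Title:{value}^5 OR SearchableText:{base_value})"


def build_search_term(pattern, value, base_value):
    """Substitute both placeholders using str.format."""
    return pattern.format(value=value, base_value=base_value)


print(build_search_term(PATTERN, "(plone*)", "plone"))
```

A pattern does not have to use both placeholders; str.format ignores keyword arguments that the pattern never references.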

Add logging for slow queries along with the query time as reported by Solr.
[witsch]
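
One possible shape for such logging, as a sketch only. The logger name, the function name and the threshold parameter are hypothetical. QTime is the query time in milliseconds that Solr reports in its response header.

```python
import logging

logger = logging.getLogger("example.solr.queries")  # hypothetical name


def log_slow_query(query, elapsed_ms, response_header, threshold_ms=500):
    """Log the query if the round trip took threshold_ms or longer.

    Returns True when a log entry was written.
    """
    if elapsed_ms < threshold_ms:
        return False
    qtime = response_header.get("QTime")  # time spent inside Solr, in ms
    logger.info(
        "slow query: %d ms total, %s ms in Solr: %r", elapsed_ms, qtime, query
    )
    return True


log_slow_query("SearchableText:plone", 800, {"QTime": 650})
```

Logging both numbers helps tell apart time spent inside Solr from time spent on the network or in result processing.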

Limit number of matches looked up during live search for speedier replies.
[witsch]

Renamed the batch parameters to b_start and b_size to avoid
conflicts with index names and be consistent with existing template code.
[do3cc]
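
As an illustration, the batch parameters could be separated from the index criteria and translated to Solr's standard start and rows parameters roughly like this. The function name is hypothetical, and whether the add-on performs the translation this way is an assumption.

```python
def split_batch_params(query):
    """Separate b_start/b_size from the index criteria in a query dict."""
    criteria = dict(query)  # work on a copy, leave the caller's dict alone
    b_start = int(criteria.pop("b_start", 0))
    b_size = criteria.pop("b_size", None)
    solr_params = {"start": b_start}
    if b_size is not None:
        solr_params["rows"] = int(b_size)
    return criteria, solr_params


print(split_batch_params({"SearchableText": "plone", "b_start": "20", "b_size": 10}))
```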

Added a new config option auto-commit which is enabled by default. You
can disable this, which stops the client from sending any explicit commit
messages to the Solr server. You then have to configure commit policies on
the server side instead.
[hannosch]

Added support for a special query key use_solr which forces queries to
be sent to Solr even though none of the required keys match. This can be
used to send individual catalog queries to Solr.
[hannosch]
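
The dispatch decision can be pictured with a sketch like the one below. The function name, the default for required and the exact matching rule (a required key counts when it is present with a non-empty value) are assumptions, not the dispatcher's actual code.

```python
def should_use_solr(query, required=("SearchableText",)):
    """Decide whether a catalog query is sent to Solr.

    required lists the query keys that normally trigger a Solr search.
    """
    if query.get("use_solr"):
        return True  # explicit opt-in, regardless of the other keys
    return any(query.get(key) for key in required)


print(should_use_solr({"portal_type": "Document"}))
print(should_use_solr({"portal_type": "Document", "use_solr": True}))
```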

1.0b23 - Released May 15, 2010

Add support for batching, i.e. only fetch and parse the items from Solr
that are part of the currently handled batch.
[witsch]

Fix quoting of operators for multi-word search terms.
[witsch]

Use the faster C implementations of elementtree/xml.etree if available.
[hannosch, witsch]

1.0b22 - Released February 23, 2010

Split out a BaseSolrConnectionConfig class, to be used for registering a
non-persistent connection configuration.
[hannosch]

Fix bug regarding timeout locking.
[witsch]

Convert test setup to collective.testcaselayer.
[witsch]

Only apply timeout decorator when actually committing changes to Solr,
which also re-enables the use of query parameters for maintenance views.
[witsch]

We also need to change the SearchDispatcher to use the original method
in case Solr isn’t active.
[hannosch]

Changed the searchResults monkey to store and use the method found on
the class instead of assuming it comes from the base class. This makes
things work with LinguaPlone which also patches this method.
[hannosch]
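
The idea of wrapping whatever method is currently on the class, and falling back to it when Solr is inactive, can be sketched like this. The Catalog class and the patch_search_results helper are simplified stand-ins, not the add-on's code.

```python
class Catalog:
    """Simplified stand-in for the catalog tool."""

    def searchResults(self, **query):
        return ("catalog", query)


def patch_search_results(cls, solr_active, solr_search):
    """Replace cls.searchResults, keeping the method found on the class.

    The stored original may already be another add-on's patch, so it is
    called as is whenever Solr is not active.
    """
    original = cls.searchResults

    def searchResults(self, **query):
        if solr_active():
            return solr_search(**query)
        return original(self, **query)

    cls.searchResults = searchResults
    return original


state = {"active": False}
patch_search_results(Catalog, lambda: state["active"], lambda **q: ("solr", q))
print(Catalog().searchResults(Title="x"))
```

Because the original is read from the class at patch time, a method installed earlier by another package is preserved instead of being bypassed.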

Add Dutch translation.
[WouterVH]

Refactor buildout to allow running tests against Plone 4.x.
[witsch]

Optimize reindex behavior when populating the Solr index for the first time.
[hannosch, witsch]

Only register indexable attributes the old way on Plone 3.x.
[jcbrand]

Fix timeout decorator to work ttw.
[hannosch, witsch]

Add “z3c.autoinclude.plugin” entry point, so in Plone 3.3+ you can avoid
loading the ZCML file.
[hannosch]

1.0b21 - Released February 11, 2010

Fix unindexing to not fetch more data from the objects than necessary.
[witsch]

Use decorator to lock timeouts and make sure the lock is always released.
[witsch]
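
A decorator that always releases its lock typically relies on try/finally, as in this sketch. The names timeout_lock, with_timeout_lock and failing_commit are hypothetical, and the real decorator also deals with the timeout itself, which is left out here.

```python
import threading
from functools import wraps

timeout_lock = threading.Lock()  # hypothetical module-level lock


def with_timeout_lock(func):
    """Hold timeout_lock while func runs and release it even on errors."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        timeout_lock.acquire()
        try:
            return func(*args, **kwargs)
        finally:
            timeout_lock.release()

    return wrapper


@with_timeout_lock
def failing_commit():
    raise RuntimeError("Solr unavailable")


try:
    failing_commit()
except RuntimeError:
    pass
print(timeout_lock.locked())
```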

Fix maintenance views to work without setting up a Solr connection first.
[witsch]