-<p>Couchdb-lucene is known to be incompatible with some versions of OpenJDK, because those versions bundle an earlier, incompatible version of the Rhino JavaScript library. The OpenJDK version in Ubuntu 8.10 (6b12-0ubuntu6.4) is known to work; it uses Rhino 1.7R1.</p>

-Data may be added to this document with the add method, which takes an optional second object argument that can override any of the default values above.

-

-The data is usually interpreted as a String, but couchdb-lucene handles JavaScript Date objects specially: a date is indexed as a numeric value, which allows correct sorting, and stored (if requested) in ISO 8601 format with a timezone marker.
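<p>The Date handling described above can be pictured with a small sketch. The function normalizeFieldValue is hypothetical and is not part of couchdb-lucene. It assumes the numeric form is milliseconds since the Unix epoch. It writes the stored ISO 8601 string in UTC with a "Z" marker, which may differ from the exact format couchdb-lucene produces.</p>

```javascript
// Sketch of the Date handling described above, not couchdb-lucene's code.
// Assumption: the numeric form is milliseconds since the Unix epoch.
function normalizeFieldValue(value) {
  if (value instanceof Date) {
    return {
      indexed: value.getTime(),    // a number, so dates compare chronologically
      stored: value.toISOString()  // ISO 8601 in UTC, with a "Z" timezone marker
    };
  }
  // Any other value is treated as a string.
  return { indexed: String(value), stored: String(value) };
}

const earlier = normalizeFieldValue(new Date(Date.UTC(2009, 0, 2)));
const later = normalizeFieldValue(new Date(Date.UTC(2009, 0, 10)));
console.log(earlier.indexed < later.indexed); // true
console.log(earlier.stored);                  // 2009-01-02T00:00:00.000Z
```

<p>Indexing the number rather than a string matters here. String(date) begins with the weekday name (for example "Fri Jan 02 2009 ..."), so plain text order would not be chronological.</p>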

-

-<pre>
-// Add with all the defaults.
-doc.add("value");
-
-// Add a subject field.
-doc.add("this is the subject line.", {"field":"subject"});
-
-// Add but ensure it's stored.
-doc.add("value", {"store":"yes"});
-
-// Add but don't analyze.
-doc.add("don't analyze me", {"index":"not_analyzed"});
-
-// Extract text from the named attachment and index it (but do not store it).
-doc.attachment("attachment name", {"field":"attachments"});
-</pre>
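<p>You can try the calls above outside couchdb-lucene with a minimal recording stand-in for the Document object. RecordingDocument is hypothetical and does not model attachments. The defaults passed to it are placeholders, not couchdb-lucene's real defaults. The sketch only shows how keys in the optional second argument override the matching defaults, while every other key keeps its default value.</p>

```javascript
// Hypothetical stand-in for couchdb-lucene's Document object, for trying out
// transform code outside couchdb-lucene. It only records what would be indexed.
class RecordingDocument {
  constructor(defaults) {
    this.defaults = defaults; // supplied by the caller; placeholders below
    this.fields = [];
  }

  // Same shape as add(value, [options]): keys present in options override the
  // matching defaults, and every other key keeps its default.
  add(value, options) {
    const settings = Object.assign({}, this.defaults, options || {});
    this.fields.push(Object.assign({ value: value }, settings));
  }
}

// Placeholder defaults: these are NOT couchdb-lucene's real default values.
const doc = new RecordingDocument({
  field: "DEFAULT_FIELD",
  store: "DEFAULT_STORE",
  index: "DEFAULT_INDEX"
});

doc.add("value");
doc.add("this is the subject line.", { field: "subject" });
doc.add("value", { store: "yes" });
doc.add("don't analyze me", { index: "not_analyzed" });

for (const f of doc.fields) {
  console.log(f.value, "->", f.field, f.store, f.index);
}
```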

-

-<h3>Example Transforms</h3>

-

-<h4>Index Everything</h4>

-

-<pre>
-function(doc) {
-  var ret = new Document();
-
-  // Recursively add every non-function value in the document.
-  function idx(obj) {
-    for (var key in obj) {
-      switch (typeof obj[key]) {
-      case 'object':
-        idx(obj[key]);
-        break;
-      case 'function':
-        break;
-      default:
-        ret.add(obj[key]);
-        break;
-      }
-    }
-  }
-
-  idx(doc);
-
-  if (doc._attachments) {
-    for (var i in doc._attachments) {
-      ret.attachment("attachment", i);
-    }
-  }
-
-  return ret;
-}
-</pre>
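<p>Running the Index Everything transform against a stub shows what it actually picks up. The Document function below is a hypothetical stand-in that only records add() values and attachment names. Its attachment() argument order follows the call in the transform. The sample document is made up.</p>

```javascript
// Hypothetical stand-in for couchdb-lucene's Document: it records the values
// passed to add() and the attachment names passed to attachment().
function Document() {
  this.values = [];
  this.attachments = [];
}
Document.prototype.add = function (value) {
  this.values.push(value);
};
Document.prototype.attachment = function (field, name) {
  this.attachments.push(name);
};

// A copy of the "Index Everything" transform.
var indexEverything = function (doc) {
  var ret = new Document();

  function idx(obj) {
    for (var key in obj) {
      switch (typeof obj[key]) {
      case 'object':
        idx(obj[key]); // recurse into nested objects and arrays
        break;
      case 'function':
        break;
      default:
        ret.add(obj[key]);
        break;
      }
    }
  }

  idx(doc);

  if (doc._attachments) {
    for (var i in doc._attachments) {
      ret.attachment("attachment", i);
    }
  }

  return ret;
};

// Made-up sample document.
var result = indexEverything({
  _id: "doc1",
  _rev: "1-abc",
  title: "Hello",
  tags: ["a", "b"],
  _attachments: { "notes.txt": { content_type: "text/plain", stub: true } }
});

console.log(JSON.stringify(result.values));      // ["doc1","1-abc","Hello","a","b","text/plain",true]
console.log(JSON.stringify(result.attachments)); // ["notes.txt"]
```

<p>The output shows that the recursion also indexes _id, _rev and the attachment metadata (content_type and stub), because they are ordinary properties of the document. Skip those keys in idx if you do not want them to be searchable.</p>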

-

-<h4>Index Nothing</h4>

-

-<pre>
-function(doc) {
-  return null;
-}
-</pre>

-

-<h4>Index Select Fields</h4>

-

-<pre>
-function(doc) {
-  var result = new Document();
-  result.add(doc.subject, {"field":"subject", "store":"yes"});
-  result.add(doc.content, {"field":"subject"});
-  result.add(new Date(), {"field":"indexed_at"});
-  return result;
-}
-</pre>

-

-<h4>Index Attachments</h4>

-

-<pre>
-function(doc) {
-  var result = new Document();
-  for (var a in doc._attachments) {
-    result.add_attachment(a, {"field":"attachment"});
-  }
-  return result;
-}
-</pre>

-

-<h4>A More Complex Example</h4>

-

-<pre>
-function(doc) {
-  // Build one Document for a single (name, value) pair within a group.
-  var mk = function(name, value, group) {
-    var ret = new Document();
-    ret.add(value, {"field": name, "store":"yes"});
-    ret.add(group, {"field":"group", "store":"yes"});
-    return ret;
-  };
-  var ret = [];
-  if (doc.type != "reference") return null;
-  // One Document per group and kind, returned together as an array.
-  for (var g in doc.groups) {
-    ret.push(mk("library", doc.groups[g].library, g));
-    ret.push(mk("method", doc.groups[g].method, g));
-    ret.push(mk("target", doc.groups[g].target, g));
-  }
-  return ret;
-}
-</pre>
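<p>A quick way to check the more complex example is to run it against a stub. The sketch assumes, as the example implies, that a transform may return an array of Documents, here one per group and kind. Document is a hypothetical recording stand-in and the sample document is made up. This copy of the transform collects its results with push() and uses the name argument as the field name.</p>

```javascript
// Hypothetical stand-in for couchdb-lucene's Document: records add() calls.
function Document() {
  this.fields = [];
}
Document.prototype.add = function (value, options) {
  this.fields.push({ field: options.field, value: value });
};

// A copy of the "more complex" transform.
var transform = function (doc) {
  var mk = function (name, value, group) {
    var ret = new Document();
    ret.add(value, { "field": name, "store": "yes" });
    ret.add(group, { "field": "group", "store": "yes" });
    return ret;
  };
  var ret = [];
  if (doc.type != "reference") return null;
  for (var g in doc.groups) {
    ret.push(mk("library", doc.groups[g].library, g));
    ret.push(mk("method", doc.groups[g].method, g));
    ret.push(mk("target", doc.groups[g].target, g));
  }
  return ret;
};

// Made-up sample document with two groups.
var docs = transform({
  type: "reference",
  groups: {
    core: { library: "libfoo", method: "parse", target: "x86" },
    extra: { library: "libbar", method: "emit", target: "arm" }
  }
});

console.log(docs.length);                    // 6
console.log(JSON.stringify(docs[0].fields)); // [{"field":"library","value":"libfoo"},{"field":"group","value":"core"}]
console.log(transform({ type: "note" }));    // null
```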

-

-<h2>Attachment Indexing</h2>

-

-Couchdb-lucene uses <a href="http://lucene.apache.org/tika/">Apache Tika</a> to index attachments of the following types, provided the correct content_type is set in CouchDB:

-

-<h3>Supported Formats</h3>

-

-<ul>

-<li>Excel spreadsheets (application/vnd.ms-excel)

-<li>Word documents (application/msword)

-<li>PowerPoint presentations (application/vnd.ms-powerpoint)

-<li>Visio (application/vnd.visio)

-<li>Outlook (application/vnd.ms-outlook)

-<li>XML (application/xml)

-<li>HTML (text/html)

-<li>Images (image/*)

-<li>Java class files

-<li>Java jar archives

-<li>MP3 (audio/mp3)

-<li>OpenDocument (application/vnd.oasis.opendocument.*)

-<li>Plain text (text/plain)

-<li>PDF (application/pdf)

-<li>RTF (application/rtf)

-</ul>
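<p>Extraction depends on the content_type stored with each attachment, so a transform or client may want to check that value first. The helper below is a hypothetical sketch built only from the list above. Wildcard entries match by prefix, and parameters such as "; charset=utf-8" are ignored. Java class and jar files are left out because the list gives no content type for them. Tika's own detection may accept more types than this.</p>

```javascript
// Hypothetical helper derived from the supported-formats list above.
var SUPPORTED_TYPES = [
  "application/vnd.ms-excel",
  "application/msword",
  "application/vnd.ms-powerpoint",
  "application/vnd.visio",
  "application/vnd.ms-outlook",
  "application/xml",
  "text/html",
  "image/*",
  "audio/mp3",
  "application/vnd.oasis.opendocument.*",
  "text/plain",
  "application/pdf",
  "application/rtf"
];

function isSupportedContentType(contentType) {
  // Drop parameters such as "; charset=utf-8" and compare case-insensitively.
  var type = String(contentType).split(";")[0].trim().toLowerCase();
  return SUPPORTED_TYPES.some(function (pattern) {
    if (pattern.endsWith("*")) {
      // "image/*" and "...opendocument.*" match by prefix.
      return type.startsWith(pattern.slice(0, -1));
    }
    return type === pattern;
  });
}

console.log(isSupportedContentType("image/png"));       // true
console.log(isSupportedContentType("application/zip")); // false
```

<p>In a transform, the value to test would be the content_type of an entry in doc._attachments.</p>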

-

-<h1>Searching with couchdb-lucene</h1>

-

-You can perform all types of queries using Lucene's default <a href="http://lucene.apache.org/java/2_4_0/queryparsersyntax.html">query syntax</a>. The _body field is searched by default; it includes the text extracted from all attachments. The following parameters can be passed for more sophisticated searches:

-

-<dl>

-<dt>q</dt><dd>the query to run (e.g., subject:hello). Terms that do not name a field are searched in the default field.</dd>

-<dt>lang</dt><dd>the language of the query. The available options, and the default if not specified, are the same as for the language option described above.</dd>

-<dt>sort</dt><dd>the comma-separated fields to sort on. Prefix with / for ascending order and \ for descending order (ascending is the default if not specified).</dd>

-<dt>limit</dt><dd>the maximum number of results to return</dd>

-<dt>skip</dt><dd>the number of results to skip</dd>

-<dt>include_docs</dt><dd>whether to include the source docs</dd>

-<dt>stale=ok</dt><dd>If you set the <i>stale</i> option to <i>ok</i>, couchdb-lucene may skip refreshing the index before searching. Such searches may be faster because Lucene caches important data (especially for sorting). A query without stale=ok uses the latest data committed to the index.</dd>

-<dt>debug</dt><dd>if false, a normal application/json response with the results is returned. If true, a pretty-printed HTML blob is returned instead.</dd>

-<dt>rewrite</dt><dd>(EXPERT) if true, returns a JSON response with the rewritten query and its term frequencies. This allows correct distributed scoring when combining the results from multiple nodes.</dd>

-</dl>
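<p>The parameters above are passed as an ordinary URL query string, and the sketch below assembles one. buildSearchUrl is hypothetical. The base URL http://localhost:5984/mydb/_fti is a placeholder assumption, so substitute the search URL your installation exposes. Note that ":" and "\" are percent-encoded in the result.</p>

```javascript
// Hypothetical helper: builds a couchdb-lucene search URL from the query
// parameters documented above. The base URL is a caller-supplied placeholder.
function buildSearchUrl(baseUrl, q, options) {
  if (typeof q !== "string" || q === "") {
    throw new Error("q is required"); // every other parameter is optional
  }
  options = options || {};
  var params = new URLSearchParams();
  params.set("q", q);
  if (options.sort) {
    // Each entry is { field, descending }: "/" means ascending, "\" descending.
    params.set("sort", options.sort.map(function (s) {
      return (s.descending ? "\\" : "/") + s.field;
    }).join(","));
  }
  ["lang", "limit", "skip", "include_docs", "stale", "debug", "rewrite"].forEach(function (name) {
    if (options[name] !== undefined) {
      params.set(name, String(options[name]));
    }
  });
  return baseUrl + "?" + params.toString();
}

var url = buildSearchUrl("http://localhost:5984/mydb/_fti", "subject:hello", {
  sort: [{ field: "indexed_at", descending: true }],
  limit: 10,
  stale: "ok"
});
console.log(url);
// http://localhost:5984/mydb/_fti?q=subject%3Ahello&sort=%5Cindexed_at&limit=10&stale=ok
```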

-

-<p><i>All parameters except 'q' are optional.</i></p>

-

-<h2>Special Fields</h2>

-

-<dl>

-<dt>_db</dt><dd>The source database of the document.</dd>

-<dt>_id</dt><dd>The _id of the document.</dd>

-</dl>

-

-<h2>Dublin Core</h2>

-

-<p>All Dublin Core attributes are indexed and stored if detected in the attachment. Descriptions of the fields come from the Tika javadocs.</p>

-

-<dl>

-<dt>_dc.contributor</dt><dd> An entity responsible for making contributions to the content of the resource.</dd>

-<dt>_dc.coverage</dt><dd>The extent or scope of the content of the resource.</dd>

-<dt>_dc.creator</dt><dd>An entity primarily responsible for making the content of the resource.</dd>

-<dt>_dc.date</dt><dd>A date associated with an event in the life cycle of the resource.</dd>

-<dt>_dc.description</dt><dd>An account of the content of the resource.</dd>

-<dt>_dc.format</dt><dd>Typically, Format may include the media-type or dimensions of the resource.</dd>

-<dt>_dc.identifier</dt><dd>Recommended best practice is to identify the resource by means of a string or number conforming to a formal identification system.</dd>

-<dt>_dc.language</dt><dd>A language of the intellectual content of the resource.</dd>

-<dt>_dc.modified</dt><dd>Date on which the resource was changed.</dd>

-<dt>_dc.publisher</dt><dd>An entity responsible for making the resource available.</dd>

-<dt>_dc.relation</dt><dd>A reference to a related resource.</dd>

-<dt>_dc.rights</dt><dd>Information about rights held in and over the resource.</dd>

-<dt>_dc.source</dt><dd>A reference to a resource from which the present resource is derived.</dd>

-<dt>_dc.subject</dt><dd>The topic of the content of the resource.</dd>

-<dt>_dc.title</dt><dd>A name given to the resource.</dd>

-<dt>_dc.type</dt><dd>The nature or genre of the content of the resource.</dd>
-</dl>

-<p>You will need to restart CouchDB if you change the couchdb-lucene source code, but restarting is very fast.</p>

-

-<h1>Configuration</h1>

-

-<p>couchdb-lucene recognizes the following Java system properties:</p>

-

-<dl>

-<dt>couchdb.url</dt><dd>the URL used to contact CouchDB (default is "http://localhost:5984")</dd>

-<dt>couchdb.lucene.dir</dt><dd>the path to the Lucene indexes (the default is a directory called 'lucene' relative to CouchDB's current working directory)</dd>

-<dt>couchdb.log.dir</dt><dd>the directory for the log file (which is called couchdb-lucene.log); defaults to the platform-specific temporary directory</dd>

-</dl>

-

-<p>You can override these properties with -D options on the Java command line, like this:</p>

-

-<pre>
-fti=/usr/bin/java -Dcouchdb.lucene.dir=/tmp \
--cp /home/rnewson/Source/couchdb-lucene/target/classes:\
-/home/rnewson/Source/couchdb-lucene/target/dependency \
-com.github.rnewson.couchdb.lucene.Main
-</pre>
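<p>When several properties need overriding, the -D options can be generated from a map instead of being typed by hand. toSystemPropertyOptions is a hypothetical helper. It does no shell quoting, so it rejects values that contain whitespace.</p>

```javascript
// Hypothetical helper: turns { propertyName: value } into java -D options.
// It does no shell quoting, so values containing whitespace are rejected.
function toSystemPropertyOptions(props) {
  return Object.keys(props).map(function (name) {
    var value = String(props[name]);
    if (/\s/.test(value)) {
      throw new Error("value for " + name + " contains whitespace");
    }
    return "-D" + name + "=" + value;
  });
}

var opts = toSystemPropertyOptions({
  "couchdb.url": "http://localhost:5984",
  "couchdb.lucene.dir": "/tmp"
});
console.log(opts.join(" ")); // -Dcouchdb.url=http://localhost:5984 -Dcouchdb.lucene.dir=/tmp
```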

-

-<h2>Basic Authentication</h2>

-

-<p>If you put CouchDB behind an authenticating proxy, you can still configure couchdb-lucene to pull from it by specifying additional system properties. Currently only Basic authentication is supported.</p>

-

-<dl>

-<dt>couchdb.user</dt><dd>the user to authenticate as.</dd>

-<dt>couchdb.password</dt><dd>the password to authenticate with.</dd>

-</dl>
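<p>For reference, this is what HTTP Basic authentication (RFC 7617) sends for a given couchdb.user and couchdb.password. The user and password are joined by ":", Base64-encoded and sent in an Authorization header. The Node.js sketch below illustrates the scheme and is not couchdb-lucene's code. basicAuthHeader is a hypothetical name, and the credentials are the example pair from the RFC.</p>

```javascript
// Illustration of the HTTP Basic scheme (RFC 7617): "user:password" is
// Base64-encoded and prefixed with "Basic ". Base64 is an encoding, not encryption.
function basicAuthHeader(user, password) {
  return "Basic " + Buffer.from(user + ":" + password, "utf8").toString("base64");
}

console.log(basicAuthHeader("Aladdin", "open sesame"));
// Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```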

-

-<h2>IPv6</h2>

-

-<p>The default couchdb.url (http://localhost:5984) can cause problems on an IPv6 system. Specify -Dcouchdb.url=http://[::1]:5984 to use the IPv6 loopback address instead.</p>
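<p>The square brackets in that value are required. An IPv6 literal in a URL must be bracketed (RFC 3986), otherwise its colons collide with the port separator. The sketch below checks this with the WHATWG URL class built into Node.js.</p>

```javascript
// IPv6 literals in URLs are wrapped in square brackets so the address's own
// colons are not mistaken for the host/port separator.
const loopback = new URL("http://[::1]:5984");
console.log(loopback.hostname); // [::1]
console.log(loopback.port);     // 5984

// Without brackets the URL is rejected.
let rejected = false;
try {
  new URL("http://::1:5984");
} catch (e) {
  rejected = true;
}
console.log(rejected); // true
```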

-

+ See <a href="http://github.com/rnewson/couchdb-lucene/">this page</a> for more details.