The ‘Inverted Index’ – an efficient method of finding documents
that contain given words.
In other words, instead of trying to answer the question “what words are contained
in this document?” this structure is optimized for providing quick answers to
“which documents contain word X?”
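
A minimal sketch of the idea in plain Java, not Lucene's actual on-disk structure. The class name InvertedIndexSketch and the simple split-on-non-letters tokenizer are assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index: maps each word to the IDs of the documents containing it.
class InvertedIndexSketch {
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Lowercase the text, split on non-letters, and record docId under each word.
    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, w -> new TreeSet<>()).add(docId);
            }
        }
    }

    // "Which documents contain word X?" becomes a single map lookup.
    SortedSet<Integer> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT),
                                     Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Lucene builds an inverted index");
        index.add(2, "An index maps words to documents");
        System.out.println(index.search("index"));   // [1, 2]
        System.out.println(index.search("lucene"));  // [1]
    }
}
```

The lookup cost depends on the word, not on how many documents were indexed, which is the point of inverting the document-to-words relationship.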

Lucene doesn’t offer an update(Document) method;
instead, a Document must first be deleted from an index and then re-added to it.
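
The delete-then-re-add pattern can be sketched with a toy in-memory store. ToyIndex and its methods are hypothetical names for illustration, not Lucene classes; the sketch assumes each document carries a unique "id" field that identifies the version to replace.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory "index" (hypothetical, not Lucene API) with add and delete only.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Build a document with an "id" field and a "body" field.
    static Map<String, String> doc(String id, String body) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("body", body);
        return d;
    }

    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    // Remove every document whose "id" matches; return how many were removed.
    int delete(String id) {
        int before = docs.size();
        docs.removeIf(d -> id.equals(d.get("id")));
        return before - docs.size();
    }

    // There is no in-place update: drop the old version, then add the new one.
    void update(String id, Map<String, String> replacement) {
        delete(id);
        add(replacement);
    }

    // Return the body of the first document with the given id, or null if absent.
    String body(String id) {
        for (Map<String, String> d : docs) {
            if (id.equals(d.get("id"))) {
                return d.get("body");
            }
        }
        return null;
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(doc("42", "first draft"));
        index.update("42", doc("42", "final version"));
        System.out.println(index.size() + " " + index.body("42")); // 1 final version
    }
}
```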

Use ‘doc.setBoost(float)’ to adjust the importance of documents.
Use ‘field.setBoost(float)’ to adjust the importance of individual fields.

Indexing date/time fields at high resolution (milliseconds) may cause
performance problems.
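
One common mitigation is to index dates at a coarser resolution. The sketch below assumes that each distinct indexed value becomes its own term, so fewer distinct values means fewer terms for a range search to visit. DateResolutionSketch and dayTerm are hypothetical names, and the yyyyMMdd format in UTC is one possible choice, not a Lucene requirement.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.Set;

class DateResolutionSketch {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Truncate a millisecond timestamp to a day-resolution term such as "20040115".
    static String dayTerm(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long start = Instant.parse("2004-01-15T00:00:00Z").toEpochMilli();
        Set<Long> milliTerms = new HashSet<>();
        Set<String> dayTerms = new HashSet<>();
        // 1000 timestamps, 61 seconds apart, all falling on the same UTC day.
        for (int i = 0; i < 1000; i++) {
            long ts = start + i * 61_000L;
            milliTerms.add(ts);
            dayTerms.add(dayTerm(ts));
        }
        System.out.println(milliTerms.size() + " unique millisecond terms"); // 1000
        System.out.println(dayTerms.size() + " unique day terms");           // 1
    }
}
```

The trade-off is that searches can no longer distinguish times within the same day, so the resolution should match what queries actually need.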

Use indexable numeric fields for range queries (store the size of email messages,
for example).
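
For range queries to behave, numeric values generally need to be encoded so that string order matches numeric order, assuming terms are compared as strings. A minimal sketch using zero-padding follows; PaddedNumberSketch, pad, and the width of 10 digits are illustrative assumptions, and negative numbers would need a different encoding.

```java
import java.util.Locale;

class PaddedNumberSketch {
    // Left-pad a non-negative int with zeros to a fixed width of 10 digits.
    static String pad(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%010d", n);
    }

    // Inclusive range check using plain string comparison on padded terms.
    static boolean inRange(String term, int low, int high) {
        return term.compareTo(pad(low)) >= 0 && term.compareTo(pad(high)) <= 0;
    }

    public static void main(String[] args) {
        // Unpadded, "900" sorts after "10000", which breaks range semantics.
        System.out.println("900".compareTo("10000") > 0);          // true
        // Padded, 900 sorts before 10000 as expected.
        System.out.println(pad(900).compareTo(pad(10000)) < 0);    // true
        // An email of 4096 bytes falls inside the range 1000..5000.
        System.out.println(inRange(pad(4096), 1000, 5000));        // true
    }
}
```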

Use ‘addIndexes(Directory[])’ to merge existing indexes into the index an
IndexWriter is writing to, for example to merge a RAMDirectory index into an FSDirectory.

Limit field sizes with maxFieldLength; the default is 10,000 terms per field.

Optimizing an index
— Merging segments
— Optimizing an index only affects the speed of searches
against that index, and does not affect the speed of indexing.
— API invoke pattern:
IndexWriter writer = new IndexWriter("/path/to/index", analyzer, false); // false: open the existing index, do not create a new one
writer.optimize(); // merge the index's segments
writer.close();

Ch. 3 – Search in applications

Scoring
Factors:
— tf(t in d) Term frequency factor for the term (t) in the document (d).
— idf(t) Inverse document frequency of the term.
— boost(t.field in d) Field boost, as set during indexing.
— lengthNorm(t.field in d) Normalization value of a field, given the number of terms within the
field. This value is computed during indexing and stored in the index.
— coord(q, d) Coordination factor, based on the number of query terms the
document contains.
— queryNorm(q) Normalization value for a query, given the sum of the squared weights
of each of the query terms.
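
A rough sketch of how these factors might combine, assuming the four per-term factors are multiplied together and summed over the query terms found in the document, and the sum is then scaled by coord and queryNorm. ScoreSketch and TermFactors are hypothetical names, and the factor values are made-up inputs, not output from Lucene.

```java
import java.util.Collections;
import java.util.List;

class ScoreSketch {
    // Per-term factors for one query term matched in one document.
    static final class TermFactors {
        final double tf;
        final double idf;
        final double boost;
        final double lengthNorm;

        TermFactors(double tf, double idf, double boost, double lengthNorm) {
            this.tf = tf;
            this.idf = idf;
            this.boost = boost;
            this.lengthNorm = lengthNorm;
        }
    }

    // Sum the per-term products, then scale by the two query-level factors.
    static double score(List<TermFactors> matchedTerms, double coord, double queryNorm) {
        double sum = 0.0;
        for (TermFactors t : matchedTerms) {
            sum += t.tf * t.idf * t.boost * t.lengthNorm;
        }
        return sum * coord * queryNorm;
    }

    public static void main(String[] args) {
        // Two-term query where the document matches only one term: coord = 1/2.
        List<TermFactors> matched =
                Collections.singletonList(new TermFactors(2.0, 1.5, 1.0, 0.5));
        System.out.println(score(matched, 0.5, 1.0)); // 0.75
    }
}
```

In this sketch a document that matches more of the query terms gets both more summands and a larger coord value, which is why documents covering the whole query tend to rank higher.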