PyLucene also includes a number of Lucene contrib packages: the
Snowball analyzer and stemmers, the highlighter package, analyzers
for languages other than English, regular expression queries,
specialized queries such as 'more like this', and more.

This document only covers the pythonic extensions to Lucene offered
by PyLucene, as well as some differences between the Java and Python
APIs. For documentation of the Java Lucene APIs, refer to the Java
Lucene API documentation.

To help with debugging and to support some Lucene APIs, PyLucene also
exposes some Java runtime APIs.

Before PyLucene APIs can be used from a thread that is neither the
main thread nor a thread created by the Java runtime, the
attachCurrentThread() method must be called on the
JCCEnv object returned by the initVM()
or getVMEnv() functions.
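The rule above can be packaged in a small helper. This is a minimal sketch, not a PyLucene API: run_attached() is a hypothetical name, and env is assumed to be the JCCEnv object returned by initVM() or getVMEnv(). Only its attachCurrentThread() method is called.

```python
import threading


def run_attached(env, work):
    """Run work() in a new Python thread that is attached to the JVM first.

    Hypothetical helper: env is assumed to be the JCCEnv object returned by
    lucene.initVM() or lucene.getVMEnv().
    """
    outcome = {}

    def target():
        try:
            # This thread was not created by the Java runtime, so it must be
            # attached before it calls any PyLucene API.
            env.attachCurrentThread()
            outcome['value'] = work()
        except BaseException as exc:  # hand errors back to the caller
            outcome['error'] = exc

    thread = threading.Thread(target=target)
    thread.start()
    thread.join()
    if 'error' in outcome:
        raise outcome['error']
    return outcome['value']
```

With PyLucene, env would typically come from getVMEnv() after initVM() has been called once in the main thread. The helper joins the thread immediately to keep the sketch short. Long-running worker threads would instead call attachCurrentThread() once at the start of their own run loop.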

Java arrays are returned to Python in a JArray
wrapper instance that implements the Python sequence protocol. It
is possible to change array elements but not to change the array
size.

A few Lucene APIs take array arguments and expect values to be
returned in them. To call such an API and be able to retrieve the
array values after the call, a Java array needs to be instantiated
first. TermDocs.read(), which fills caller-supplied docs and freqs
int arrays, is one such API.
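The following sketch shows this pattern for TermDocs.read(int[] docs, int[] freqs), which fills two caller-supplied arrays and returns the number of entries written, returning zero once the postings are exhausted. read_postings() is a hypothetical helper. With PyLucene, term_docs is assumed to come from IndexReader.termDocs(term) and new_int_array is assumed to be JArray('int').

```python
def read_postings(term_docs, new_int_array, batch_size=32):
    """Collect (doc, freq) pairs from a TermDocs-like object.

    Hypothetical helper: term_docs.read(docs, freqs) is expected to fill the
    two arrays in place and return how many entries it wrote (0 when done).
    new_int_array(n) must return a mutable int array of size n.
    """
    docs = new_int_array(batch_size)   # pre-allocated output array
    freqs = new_int_array(batch_size)  # pre-allocated output array
    postings = []
    while True:
        count = term_docs.read(docs, freqs)
        if count == 0:
            break
        # Only the first `count` slots were filled by this call.
        postings.extend((docs[i], freqs[i]) for i in range(count))
    return postings
```

Assuming reader is an open IndexReader, a call might look like read_postings(reader.termDocs(Term('field', 'text')), JArray('int')). Reading in batches reuses the same two Java arrays across calls instead of allocating new ones each time.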

In addition to int, the JArray
function accepts object, string,
bool, byte, char,
double, float, long
and short to create an array of the corresponding
type. The JArray('object') constructor takes a second
argument denoting the class of the object elements. This argument
is optional and defaults to Object.

To convert a char array to a Python string, use a
''.join(array) construct.

Instead of an integer denoting the size of the desired Java array,
a sequence of objects of the expected element type may be passed
to the array constructor. For example, JArray('int')([1, 2, 3])
creates a Java int array holding those three values.

All methods that expect an array also accept a sequence of Python
objects of the expected element type. If no values are expected
from the array arguments after the call, it is hence not necessary
to instantiate a Java array to make such calls.

The PyLucene API exposes all Java Lucene classes in a flat namespace
in the lucene module. For example, the Java import
statement import
org.apache.lucene.index.IndexReader; corresponds to the
Python import statement from lucene import
IndexReader.
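Because the namespace is flat, a fully qualified Java class name maps to its PyLucene class through the last dotted component alone. lucene_class() is a hypothetical helper illustrating this mapping. Its module parameter exists only so that the lookup can be exercised without PyLucene installed.

```python
import importlib


def lucene_class(java_name, module=None):
    """Resolve a fully qualified Java Lucene class name to its PyLucene class.

    Hypothetical helper: PyLucene exposes every class in one flat namespace,
    so only the last dotted component of the Java name is used.
    """
    if module is None:
        module = importlib.import_module('lucene')
    simple_name = java_name.rsplit('.', 1)[-1]  # drop the Java package path
    return getattr(module, simple_name)
```

For example, lucene_class('org.apache.lucene.index.IndexReader') would return the same object as from lucene import IndexReader.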

Downcasting is a common operation in Java but not a concept in
Python. Because wrapper objects implement exactly the APIs of the
declared type of the wrapped object, all classes implement two
class methods, instance_() and cast_(), that respectively check
whether an instance is of that class and cast it to that class.
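A common idiom is to check with instance_() before calling cast_(). downcast() is a hypothetical helper, not part of PyLucene, that raises a Python TypeError on a mismatch. It relies only on the instance_() and cast_() class methods described above.

```python
def downcast(cls, obj):
    """Downcast a PyLucene wrapper to cls, failing with a TypeError.

    Hypothetical helper: cls is any PyLucene wrapper class, which is assumed
    to provide the instance_() and cast_() class methods.
    """
    if not cls.instance_(obj):
        raise TypeError('%r is not a Java instance of %s' % (obj, cls.__name__))
    return cls.cast_(obj)
```

For example, field = downcast(Field, obj) would replace a Java (Field) obj cast. Checking first turns a wrong assumption about the Java type into an ordinary Python exception at the point of the cast.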

Java is a very verbose language. Python, on the other hand, offers
many syntactically attractive constructs for iteration, property
access, etc. As the Java Lucene samples from the Lucene in
Action book were ported to Python, PyLucene received a number
of pythonic extensions, listed here:

Iterating over search hits is a very common operation, and Hits
instances are iterable in Python. Each iteration returns one Hit,
wrapped as a plain Object, from which the hit's score and its
document can be retrieved.
The Java loop:
    for (int i = 0; i < hits.length(); i++) {
        Document doc = hits.doc(i);
        System.out.println(hits.score(i) + " : " + doc.get("title"));
    }
can be written in Python:
    for hit in hits:
        hit = Hit.cast_(hit)
        print hit.getScore(), ':', hit.getDocument()['title']
If the next() method of the iterator returned by hits.iterator()
were declared to return Hit instead of Object, the above
cast_() call would be unnecessary.
The same Java loop can also be written:
    for i in xrange(len(hits)):
        print hits.score(i), ':', hits[i]['title']
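The indexed form also lends itself to collecting results into a list. scored_titles() is a hypothetical helper. It assumes, as the loop above does, that hits supports len(), hits[i] returning the i-th Document, and hits.score(i), and that each Document supports doc[field].

```python
def scored_titles(hits, field='title'):
    """Return (score, value) pairs for every hit, in hit order.

    Hypothetical helper relying only on len(hits), hits[i], hits.score(i)
    and mapping-style field access on each Document.
    """
    return [(hits.score(i), hits[i][field]) for i in range(len(hits))]
```

Passing a different field name, for example scored_titles(hits, 'isbn'), collects that field instead.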

Document instances have fields whose values can be accessed
through the mapping protocol.
The Java expression:
    doc.get("title")
is better written in Python:
    doc['title']

The fields of a Document instance can be iterated over.
The Java loop:
    Iterator fields = doc.getFields().iterator();
    while (fields.hasNext()) {
        Field field = (Field) fields.next();
        ...
    }
is better written in Python:
    for field in doc.getFields():
        field = Field.cast_(field)
        ...
Once JCC heeds Java 1.5 type parameters and once Java Lucene
makes use of them, such casting should become unnecessary.
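Combining field iteration with the cast, a Document can be turned into a plain Python dict. fields_as_dict() is a hypothetical helper. It assumes field_cls is PyLucene's Field wrapper and relies on the Java Field methods name() and stringValue(). Values are collected into lists because a Lucene document may hold several fields with the same name.

```python
def fields_as_dict(doc, field_cls):
    """Map each field name of a Document to the list of its string values.

    Hypothetical helper: field_cls is assumed to be PyLucene's Field class,
    whose cast_() turns each element of doc.getFields() into a Field.
    """
    result = {}
    for obj in doc.getFields():
        field = field_cls.cast_(obj)  # needed while elements come back as Object
        result.setdefault(field.name(), []).append(field.stringValue())
    return result
```

For example, fields_as_dict(doc, Field) would yield something like {'title': [...], 'author': [...]} for a stored document.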

Many areas of the Lucene API expect the programmer to provide
their own implementation or specialization of a feature where
the default is inappropriate. For example, text analyzers and
tokenizers are an area where many parameters and environmental
or cultural factors call for customization.

PyLucene enables this by providing Java extension points that
serve as proxies for Java to call back into the Python
implementations of these customizations.

These extension points are simple Java classes for which JCC
generates native C++ implementations. It is easy to add more
such extension classes to the 'java' directory of the
PyLucene source tree.

To learn more about this topic, please refer to the JCC
documentation.

Please refer to the classes in the 'java' tree for the currently
available extension points. Examples of their use can be found
in PyLucene's unit tests.
