I am about to embark on a similar project and have just started
reading up on it. Isn’t the index file created by Lucene readable from
Ruby? I was planning to index the files with Lucene and read the index
from Ruby. If I cannot read the index from Ruby, I plan to expose a
web service from the Java side and consume it from the Ruby side.
Looking at Nathaniel’s post about CDBaby, I think I am on the right path.

I am about to embark on a similar project and have just started
reading up on it. Isn’t the index file created by Lucene readable from
Ruby?

Hi Kris,
Andreas is correct in stating that Ferret can read the index from Ruby.

I was planning to index the files with Lucene and read the index from
Ruby. If I cannot read the index from Ruby, I plan to expose a web
service from the Java side and consume it from the Ruby side. Looking
at Nathaniel’s post about CDBaby, I think I am on the right path.

Why use Lucene to index the files when you can use Ferret? If it’s
speed you are concerned about, it shouldn’t be a problem by the end of
the month; Ferret should be faster than Lucene by then. One good
reason to index with Lucene that I can think of is that you’ll have
better support in Java for indexing PDFs and Microsoft Office
documents. Unicode is also easier in Java.

I am looking at Ferret too, but I think Lucene is more mature.

This is true in terms of possible bugs, but the index file format and
API are the same.