About Alex Soto

Searchable documents? Yes You Can. Another reason to choose AsciiDoc

Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine for the cloud. It is based on Apache Lucene, which provides its full-text search capabilities, and it is document-oriented and schema-free.

Asciidoctor is a pure Ruby processor for converting AsciiDoc source files and strings into HTML 5, DocBook 4.5 and other formats. Apart from the Asciidoctor Ruby part, there is the Asciidoctor-java-integration project, which lets us call Asciidoctor functions from Java without noticing that Ruby code is being executed.

In this post we are going to see how we can use Elasticsearch over AsciiDoc documents to make them searchable by their header information or by their content.

Basically, we build the JSON document by calling the startObject method to start a new object, the field method to add new fields, and the startArray method to start an array. The builder is then used to render the equivalent object in JSON format. Notice that we use the readDocumentHeader method of the Asciidoctor class, which returns the header attributes of an AsciiDoc file without reading and rendering the whole document. Finally, the content field is set to the full content of the document.
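To make that call sequence concrete, here is a minimal, dependency-free sketch. MiniJsonBuilder is a hypothetical stand-in written for this example, not Elasticsearch's XContentBuilder, and the field names (title, authors, content) and sample values are assumptions; in a real converter the title and authors would come from readDocumentHeader.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in that mimics a fluent JSON builder; NOT the Elasticsearch class.
class MiniJsonBuilder {
    private final StringBuilder out = new StringBuilder();
    // One flag per open object/array: has it already received an element?
    private final Deque<Boolean> hasElement = new ArrayDeque<>();

    // Emits a comma when the enclosing object/array already holds an element.
    private void separator() {
        if (hasElement.isEmpty()) return; // top level: nothing to separate
        if (hasElement.pop()) out.append(',');
        hasElement.push(true);
    }

    // Minimal escaping (backslash, quote, newline); other control characters are not handled.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    MiniJsonBuilder startObject() { separator(); out.append('{'); hasElement.push(false); return this; }
    MiniJsonBuilder endObject() { hasElement.pop(); out.append('}'); return this; }
    MiniJsonBuilder startArray(String name) { separator(); out.append(quote(name)).append(":["); hasElement.push(false); return this; }
    MiniJsonBuilder endArray() { hasElement.pop(); out.append(']'); return this; }
    MiniJsonBuilder field(String name, String value) { separator(); out.append(quote(name)).append(':').append(quote(value)); return this; }
    MiniJsonBuilder value(String v) { separator(); out.append(quote(v)); return this; }
    String string() { return out.toString(); }
}

class AsciiDocJsonSketch {
    // Assumed document shape: a title, an array of author names and the raw content.
    static String toJson(String title, List<String> authors, String content) {
        MiniJsonBuilder builder = new MiniJsonBuilder().startObject()
                .field("title", title)
                .startArray("authors");
        for (String author : authors) {
            builder.value(author);
        }
        return builder.endArray().field("content", content).endObject().string();
    }

    public static void main(String[] args) {
        System.out.println(toJson("Sample Document", Arrays.asList("Alex Soto"),
                "= Sample Document\nAlex Soto\n\nHello."));
    }
}
```

The point of the sketch is only the order of the calls: open the object, add the header fields, open and close the array, then add the content last.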

Now we are ready to start indexing documents. Note that the populateData method receives a Client object as a parameter. This object comes from the Elasticsearch Java API and represents a connection to the Elasticsearch database.

It is important to note that the first part of the algorithm converts all our AsciiDoc files (in our case two) to XContentBuilder instances, using the previous converter class and the convert method of the Lambdaj project.

The next part inserts the documents into one index. This is done with the prepareIndex method, which requires an index name (docs), an index type (asciidoctor), and the id of the document being inserted. Then we call the setSource method, which transforms the XContentBuilder object to JSON, and finally, by calling execute().actionGet(), the data is sent to the database.
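The convert-then-index flow can be sketched with plain Java collections. This is only an illustration of the addressing by index name, type and id: InMemoryIndexSketch is a hypothetical in-memory stand-in and not the Elasticsearch Client, java.util.stream is used in place of Lambdaj, and the sequential ids and the toy title-only converter are assumptions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory stand-in; it only mimics (index, type, id) addressing.
class InMemoryIndexSketch {
    private final Map<String, String> store = new LinkedHashMap<>();

    // Stores a JSON source under the three coordinates; reusing an id overwrites the entry.
    void index(String index, String type, String id, String jsonSource) {
        store.put(index + "/" + type + "/" + id, jsonSource);
    }

    String get(String index, String type, String id) {
        return store.get(index + "/" + type + "/" + id);
    }

    int size() {
        return store.size();
    }

    // Step 1: convert every input to a JSON string. Step 2: index each one under its own id.
    static InMemoryIndexSketch populate(List<String> titles) {
        List<String> documents = titles.stream()
                .map(title -> "{\"title\":\"" + title + "\"}") // toy converter, no escaping
                .collect(Collectors.toList());
        InMemoryIndexSketch index = new InMemoryIndexSketch();
        for (int i = 0; i < documents.size(); i++) {
            index.index("docs", "asciidoctor", String.valueOf(i + 1), documents.get(i));
        }
        return index;
    }

    public static void main(String[] args) {
        InMemoryIndexSketch index = populate(Arrays.asList("First", "Second"));
        System.out.println(index.get("docs", "asciidoctor", "2"));
    }
}
```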

The final step, refreshing the indexes by calling the refresh method, is only required because we are using an embedded instance of Elasticsearch (in production this part should not be required).

From that point on we can start querying Elasticsearch to retrieve information from our AsciiDoc documents.

Let’s start with a very simple example, which returns all documents inserted.

Note that I am searching the author field for the string Alex Soto, which returns only one document; the other one is written by Jason. Interestingly, if you search for Alexander Soto, the same document is returned too. This is probably not because Elasticsearch knows that Alex and Alexander are similar names: assuming a match-style query, the search text is analyzed into separate terms and, by default, a document that contains any of them (here, Soto) counts as a hit.
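One likely explanation for the Alexander Soto result, assuming a match-style query with its default "any term" behavior, can be sketched in plain Java. MatchQuerySketch is a hypothetical illustration and not Elasticsearch code; its tokenizer only lowercases and splits on non-alphanumeric characters, which is a rough approximation of a standard analyzer.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of OR-style term matching; not the Elasticsearch implementation.
class MatchQuerySketch {
    // Lowercases the text and splits it into terms on anything that is not a letter or digit.
    static Set<String> terms(String text) {
        Set<String> result = new LinkedHashSet<>();
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                result.add(term);
            }
        }
        return result;
    }

    // The field matches when it shares at least one term with the query text.
    static boolean matchesAnyTerm(String fieldValue, String queryText) {
        Set<String> fieldTerms = terms(fieldValue);
        for (String queryTerm : terms(queryText)) {
            if (fieldTerms.contains(queryTerm)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matchesAnyTerm("Alex Soto", "Alexander Soto")); // shares the term "soto"
        System.out.println(matchesAnyTerm("Alex Soto", "Alexander"));      // shares no term
        System.out.println(matchesAnyTerm("Jason", "Alexander Soto"));     // shares no term
    }
}
```

Under these assumptions, searching for Alexander alone would not return the document, which suggests the shared surname does the work rather than any similarity between the first names.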

More queries: how about finding documents written by someone called Alex, but not Soto?
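The "Alex but not Soto" condition combines a required term with a prohibited one. Here is a minimal sketch of that logic in plain Java; it is not the Elasticsearch query API, MustMustNotSketch is a hypothetical name, and the author "Alex Smith" is a made-up value added so the filter has something to return.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration of a "must contain X, must not contain Y" filter.
class MustMustNotSketch {
    // Splits an author name into lowercase, whitespace-separated terms.
    static List<String> terms(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Keeps the authors that contain the required term and lack the excluded one.
    static List<String> authorsWith(List<String> authors, String required, String excluded) {
        String must = required.toLowerCase(Locale.ROOT);
        String mustNot = excluded.toLowerCase(Locale.ROOT);
        return authors.stream()
                .filter(author -> terms(author).contains(must))
                .filter(author -> !terms(author).contains(mustNot))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "Alex Smith" is invented for the example; the post's documents are by Alex Soto and Jason.
        List<String> authors = Arrays.asList("Alex Soto", "Jason", "Alex Smith");
        System.out.println(authorsWith(authors, "Alex", "Soto"));
    }
}
```

With only the two authors mentioned in this post, the same filter would return an empty list.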

Note that in this case we are printing the AsciiDoc content to the console, but you could use the asciidoctor.render(String content, Options options) method to render the content into the required format.

So in this post we have seen how to index documents using Elasticsearch, how to get some important information from AsciiDoc files using the Asciidoctor-java-integration project, and finally how to execute some queries against the inserted documents. Of course there are more kinds of queries in Elasticsearch, but the intent of this post wasn’t to explore all the possibilities of Elasticsearch.

As a corollary, note how useful it is to write your documents in AsciiDoc format. Without much effort you can build a search engine for your documentation. On the other hand, imagine all the code that would be required to implement the same thing using a proprietary binary format like Microsoft Word. So we have shown another reason to use AsciiDoc instead of other formats.
