Create a LucenePDFDocument instance (B), then get at the PDF document by opening an InputStream on it (C). Once you have the stream, creating the Lucene document is one step away (D). Listing 13.11 shows the output of the code from listing 13.10. If you compare listing 13.9 with listing 13.10, you'll see that less code was needed to produce essentially identical results. One more thing to cover before we move on to the next topic is the statement we made about the Contents field in table 13.1. This field is not stored in the easily accessible form we're used to retrieving with Hit.get(fieldname); it's stored as a Java StringBuffer object. Looking at listing 13.10 at (E), you can see that the Contents field is still accessible, albeit not quite as easily as with the get(fieldname) method.

Making use of third-party contributions

Subject - Testing PDFBox's LucenePDFDocument.class
Title - file1

summary - Keanu Reeves is completely wooden in this romantic misfire by Alfonso Arau (Like Water for Chocolate). Reeves plays a World War II vet who hits the road as a traveling salesman and agrees to help a desperate, pregnant woman (Aitana Sanchez Gijon)
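Opening an InputStream on the PDF (step C in listing 13.10) is also a natural place to confirm that the bytes really are a PDF before handing them to PDFBox. The following is a minimal sketch using only the JDK, assuming Java 11 or later: it peeks at the "%PDF-" header with mark/reset so the stream is left at byte 0 for whatever parser reads it next. The class and method names (PdfStreamCheck, looksLikePdf) are hypothetical. The PDFBox call itself is left out because its package name differs between PDFBox releases.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of PDFBox or Lucene.
public class PdfStreamCheck {

    // A well-formed PDF file begins with the ASCII header "%PDF-" followed by a version number.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /**
     * Reports whether the stream starts with the PDF header, then rewinds it so the
     * same stream can still be handed to a PDF parser.
     */
    static boolean looksLikePdf(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException(
                "stream must support mark/reset; wrap it in a BufferedInputStream");
        }
        in.mark(PDF_MAGIC.length);                         // remember the current position
        byte[] head = in.readNBytes(PDF_MAGIC.length);     // may return fewer bytes for tiny streams
        in.reset();                                        // rewind to the marked position
        return Arrays.equals(head, PDF_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        InputStream pdf = new ByteArrayInputStream("%PDF-1.4\n".getBytes(StandardCharsets.US_ASCII));
        InputStream notPdf = new ByteArrayInputStream("plain text".getBytes(StandardCharsets.US_ASCII));
        System.out.println(looksLikePdf(pdf));    // true
        System.out.println(looksLikePdf(notPdf)); // false
        // At this point 'pdf' is back at byte 0 and could be passed on to the PDF-to-Lucene converter.
    }
}
```

With a real file, the FileInputStream would be wrapped in a BufferedInputStream first, because that class supports mark/reset and a plain FileInputStream does not.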

So which one of these methods should you use to index your documents? That depends on how lazy, or how much of a control freak, you are. Seriously, the degree of control you need, the amount of time you have to accomplish what you need to do, and many other factors dictate which method you employ. Ultimately, you must decide.

It's time to move on to another document format. Like it or not, Microsoft document formats are ubiquitous in today's world. Knowing how to get at their content and put it into an index is a critical skill. Let's see how we can achieve this.

13.2.2 Indexing Microsoft Word files with POI

The Apache POI Project exists to create and maintain pure Java APIs for manipulating various file formats based on Microsoft's OLE 2 Compound Document format. In short, it allows you to read and write MS Excel files using Java. As we'll show with example code, you can also read and extract text from Microsoft Word documents. The project is located at http://poi.apache.org/. Here are the different APIs and the applications they are tied to:

- A set of pure Java APIs for reading and writing OLE 2 Compound Document formats
- HSSF: APIs for reading and writing Microsoft Excel 97 (Windows XP) spreadsheets
- HWPF: APIs for reading and writing Microsoft Word 97 (Windows XP) documents
- HSLF: APIs for reading and writing Microsoft PowerPoint 97 (Windows XP) documents
- HDGF: APIs for reading and writing Microsoft Visio documents
- HPSF: APIs for reading MFC property sets

POI welcomes anyone who is willing to help with the project, because a lot of work remains to be done. The developers could use help in all areas, including bug reports, feature requests, and, as with every other project, documentation. If you're interested, join their mailing lists at http://poi.apache.org/mailinglists.html and make yourself known.
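Because all of these formats live inside the same OLE 2 container, a file's first bytes tell you that it is an OLE 2 Compound Document but not which application wrote it. The following JDK-only sketch, assuming Java 11 or later, illustrates that point: it checks the 8-byte OLE 2 signature and then falls back on the file extension to pick a POI component from the list above. The class and method names (Ole2Sniffer, isOle2, poiComponentFor) are hypothetical, and the extension table is a deliberate simplification, not POI's own format detection.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, not part of Apache POI.
public class Ole2Sniffer {

    // The 8-byte signature at the start of every OLE 2 Compound Document file.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Simplified mapping from legacy Office extensions to the POI component for that format.
    private static final Map<String, String> POI_COMPONENT_BY_EXTENSION = Map.of(
        "xls", "HSSF",
        "doc", "HWPF",
        "ppt", "HSLF",
        "vsd", "HDGF");

    /** Returns true if the stream starts with the OLE 2 signature (consumes up to 8 bytes). */
    static boolean isOle2(InputStream in) throws IOException {
        byte[] head = in.readNBytes(OLE2_SIGNATURE.length);
        return Arrays.equals(head, OLE2_SIGNATURE);
    }

    /** Guesses the POI component from the file extension, or "unknown" if there is no match. */
    static String poiComponentFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return "unknown";
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return POI_COMPONENT_BY_EXTENSION.getOrDefault(ext, "unknown");
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeDoc = Arrays.copyOf(OLE2_SIGNATURE, 512); // signature followed by zero padding
        System.out.println(isOle2(new ByteArrayInputStream(fakeDoc))); // true
        System.out.println(poiComponentFor("Review.DOC"));            // HWPF
        System.out.println(poiComponentFor("notes.txt"));             // unknown
    }
}
```

A .doc, an .xls, and a .ppt file all pass the same signature check, which is why the sketch needs a second clue such as the extension. In practice you would open the container and look at the streams inside it rather than trust the file name.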