Extend the JDK Classes with Jakarta Commons, Part II

This second installment of a three-part series further explores components in Jakarta Commons and presents real world examples to demonstrate how you can use them in your projects.

by Narayanan A.R.

Nov 11, 2005


Jakarta Commons, the set of reusable classes that various Jakarta projects use, is available as separate components that you can use in your own Java projects. This article is the second in a three-part series exploring various Jakarta Commons components and demonstrating them in real-world sample applications. (See Part I.) The examples do more than illustrate the Commons components; they are complete, modular applications that highlight the useful features you can reuse in your Java projects.

In particular, this installment explores the following components:

Codec

DBCP

DBUtils

Email

i18n

The article also includes complete source code. Extract this zip file to a local drive and run it by launching the test cases with JUnit for each of the examples.

Author Note: A basic knowledge of object-oriented programming (OOP), the Gang of Four design patterns (Strategy and Decorator), and a few J2EE patterns (DAO) will be very helpful for understanding the Commons components architecture and the examples presented here.

Codec

Commons Codec contains some general encoding/decoding algorithms, including phonetic encoders, Hex and Base64 encoders, and a URL encoder. The phonetic encoders are language encoders, which are useful in applications such as search engines, spell-check functions, and digital dictionaries. Hex and Base64 encoders are useful in applications that use characters to represent binary data. The URL encoder comes with more features and is considered a replacement for the JDK classes URLEncoder and URLDecoder.

This component also contains the DigestUtils class, which is useful for creating SHA and MD5 digests. The following sections show how to use these classes in real-world examples.
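As a point of comparison, a digest can be produced with nothing but the JDK's MessageDigest class; DigestUtils wraps this kind of boilerplate behind convenience methods such as md5Hex. The following is a minimal sketch, not code from the article's download: the names DigestSketch and hexDigest are invented for illustration, and the sketch assumes a modern JDK and UTF-8 input.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Commons Codec): hex digests with the plain JDK.
final class DigestSketch {

    /** Hashes the UTF-8 bytes of text with the named algorithm and returns lowercase hex. */
    static String hexDigest(String algorithm, String text) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] hash = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                // Mask to 0..255 so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 and SHA-1 are required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hexDigest("MD5", "abc"));
        System.out.println(hexDigest("SHA-1", "abc"));
    }
}
```

With Commons Codec on the classpath, the MD5 case collapses to a single call to DigestUtils.md5Hex.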

Language Encoders

Phonetic algorithms are used to find words that sound similar. A good example is a word processing application that suggests alternatives for a typed word. Commons Codec contains four phonetic encoder classes: Soundex, Metaphone, RefinedSoundex, and DoubleMetaphone. Each class uses a different algorithm to determine whether one word sounds similar to another. Their algorithm descriptions indicate that Metaphone is more accurate than Soundex.

The first example application uses the Soundex class to determine similar words for a misspelled word. It uses the Strategy design pattern to choose among the algorithms, which also enables you to modify the application to support algorithms in the other three classes. (The application classes can be found in the package in.co.narayanan.commons.codec in the src folder of the source code.)

The words.txt file contains a small list of words. The Words class abstracts the loading of the word list from the file and adheres to the IWords interface. WordsAssistant is the entry point class for the application. It determines similar words using one of the Soundex algorithms and depends on the IWords interface to access the words. Listing 1 is the implementation of the getSimilarWords method in the WordsAssistant class. It picks a strategy from the SoundexStrategy class and iterates over the words to find matches. It determines a match by calling the isSimilar method in the ISimilarWordStrategy interface, and it adds the matching words to a list that it returns to the caller.

The classes SoundexStrategy and CharDiffStrategy implement the ISimilarWordStrategy interface and use the Commons Soundex class. Listing 2 is the definition of the ISimilarWordStrategy interface. You can plug new strategies into the sample application by implementing the isSimilar method in this interface.
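The shape of this design can be sketched in a few lines. The code below is not the article's Listing 1 or Listing 2: the interface name SimilarWordStrategy, the two-argument isSimilar signature, and the placeholder FirstLetterStrategy are all assumptions made for illustration, and the word list is passed in directly rather than loaded through IWords.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for ISimilarWordStrategy; the real signature may differ. */
interface SimilarWordStrategy {
    boolean isSimilar(String candidate, String typed);
}

/** Placeholder strategy: same first letter and a length difference of at most two. */
final class FirstLetterStrategy implements SimilarWordStrategy {
    @Override
    public boolean isSimilar(String candidate, String typed) {
        if (candidate.isEmpty() || typed.isEmpty()) {
            return false;
        }
        boolean sameInitial = Character.toLowerCase(candidate.charAt(0))
                == Character.toLowerCase(typed.charAt(0));
        return sameInitial && Math.abs(candidate.length() - typed.length()) <= 2;
    }
}

/** Hypothetical counterpart of WordsAssistant: it knows only the strategy interface. */
final class WordsAssistantSketch {
    private final List<String> words;
    private final SimilarWordStrategy strategy;

    WordsAssistantSketch(List<String> words, SimilarWordStrategy strategy) {
        this.words = words;
        this.strategy = strategy;
    }

    /** Iterates over the word list and collects every word the strategy accepts. */
    List<String> getSimilarWords(String typed) {
        List<String> matches = new ArrayList<>();
        for (String word : words) {
            if (strategy.isSimilar(word, typed)) {
                matches.add(word);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("component", "compile", "zebra");
        WordsAssistantSketch assistant =
                new WordsAssistantSketch(words, new FirstLetterStrategy());
        System.out.println(assistant.getSimilarWords("compont"));
    }
}
```

Because the assistant depends only on the interface, swapping in a Soundex-based or Metaphone-based strategy requires no change to getSimilarWords.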

In Listing 3, the method soundex in class org.apache.commons.codec.language.Soundex determines the sound similarity between words. This method returns a four-character code that is the same for similar-sounding words; the strategy then compares the codes of two words to decide whether the words are similar. For instance, the code is C515 for the words 'compont', 'component', and 'compenent'.
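To see where such a code comes from, here is a simplified sketch of the classic American Soundex rules in plain Java. It is not the Commons Codec source, and the class name SoundexSketch is invented for illustration. The first letter is kept as-is, later consonants map to digits, vowels separate repeated digits, H and W are skipped, and the result is padded or cut to four characters.

```java
import java.util.Locale;

// Simplified illustration of the Soundex rules; not the Commons Codec implementation.
final class SoundexSketch {

    // Digit for each letter A..Z. '0' marks vowels and Y; '-' marks H and W.
    private static final String CODES = "0123012-02245501262301-202";

    static String soundex(String word) {
        // Keep only the letters A..Z, in upper case.
        StringBuilder letters = new StringBuilder();
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                letters.append(c);
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        // The first letter is copied into the code unchanged.
        StringBuilder code = new StringBuilder().append(letters.charAt(0));
        char last = CODES.charAt(letters.charAt(0) - 'A');
        for (int i = 1; i < letters.length() && code.length() < 4; i++) {
            char digit = CODES.charAt(letters.charAt(i) - 'A');
            if (digit == '-') {
                continue; // H and W are ignored and do not separate equal digits
            }
            if (digit != '0' && digit != last) {
                code.append(digit); // a new consonant sound
            }
            last = digit; // a vowel resets 'last', so a repeated digit is coded again
        }
        while (code.length() < 4) {
            code.append('0'); // pad short codes with zeros
        }
        return code.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"compont", "component", "compenent"}) {
            System.out.println(w + " -> " + soundex(w));
        }
    }
}
```

All three spellings reduce to C, then M (5), P (1), and N (5), which is why a misspelling and its correction share one code.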

In Listing 4, the method difference in class org.apache.commons.codec.language.Soundex returns a number between 0 and 4, where 4 is the best match and 0 the worst. This example sets the pivot (the threshold for treating two words as similar) to 2. The JUnit test case class TestLanguageEncoders invokes the getSimilarWords method of the main class to demonstrate the application.
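The scoring idea can be sketched as follows, assuming the score is simply the number of positions at which two four-character Soundex codes agree. The class SoundexDifferenceSketch is hypothetical, it takes already-computed codes as input, and treating a score greater than or equal to the pivot as a match is an assumption, since Listing 4 is not reproduced here.

```java
// Hypothetical illustration of a Soundex-style difference score; not the article's Listing 4.
final class SoundexDifferenceSketch {

    /** Counts the positions at which two codes hold the same character (0 to 4 for Soundex). */
    static int difference(String code1, String code2) {
        int limit = Math.min(code1.length(), code2.length());
        int same = 0;
        for (int i = 0; i < limit; i++) {
            if (code1.charAt(i) == code2.charAt(i)) {
                same++;
            }
        }
        return same;
    }

    /** Assumption: a pair counts as similar when the score reaches the pivot. */
    static boolean isSimilar(String code1, String code2, int pivot) {
        return difference(code1, code2) >= pivot;
    }

    public static void main(String[] args) {
        System.out.println(difference("C515", "C515")); // identical codes
        System.out.println(difference("C515", "C513")); // last digit differs
        System.out.println(isSimilar("C515", "R163", 2));
    }
}
```

A pivot of 2 is more forgiving than an exact code comparison: two words can differ in their first letter or in one or two consonant sounds and still be suggested.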

Binary Encoders

Binary encoders are useful for transmitting binary data in ASCII form. For instance, if an image needs to be attached to a digital business card stored in XML, a binary encoder can encode the image binary data using one of the algorithms and add it to the XML file in a separate tag.

The package org.apache.commons.codec.binary contains classes Base64, BinaryCodec, and Hex, each representing a unique way of encoding binary data. The sample application demonstrates using the Base64 algorithm to encode a binary file and store it in XML. The XML file contains metadata that describes the data in name/value pair form, which makes it searchable.

The class in.co.narayanan.commons.codec.WrapIt is the only class this example uses. It encodes the binary file and creates XML content along with the metadata details. Listing 5 shows a sample XML generated by this class.

The encoded binary data is enclosed in the <binary> tag. The metadata is represented as a series of <entry> tags. Encoding the binary content and storing it in an XML file simplifies its transmission over the Internet. For instance, a visa application form in XML sent to the Web service layer for processing can carry the applicant's image and other binary content such as résumés, scanned experience letters, and degree certificates.

Listing 6 is a code snippet from the WrapIt class, which does the actual encoding by reading 1,024 bytes at a time. The code calls the truncateBytes method only once for the last set of data read from the file. The encodeBase64Chunked static method breaks the encoded content into 76-character blocks to make it more human-readable. The metadata allows the XML-formatted data to be searchable.
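The 76-character line breaking can be tried without the article's download. The sketch below swaps in the JDK's own java.util.Base64 MIME encoder (Java 8 and later) as a stand-in for the Commons encodeBase64Chunked method; it also breaks output into lines of at most 76 characters. The class name ChunkedBase64Sketch is invented, and the sketch encodes a whole byte array in one call rather than reproducing the 1,024-byte read loop of Listing 6.

```java
import java.util.Arrays;
import java.util.Base64;

// Stand-in illustration using the JDK's MIME Base64 codec, not Commons Codec.
final class ChunkedBase64Sketch {

    /** Encodes data as Base64 text broken into lines of at most 76 characters. */
    static String encodeChunked(byte[] data) {
        return Base64.getMimeEncoder().encodeToString(data);
    }

    /** Decodes line-broken Base64 text; the MIME decoder ignores the line separators. */
    static byte[] decodeChunked(String text) {
        return Base64.getMimeDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Sample binary content: 200 bytes with values 0..199.
        byte[] data = new byte[200];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        String encoded = encodeChunked(data);
        for (String line : encoded.split("\r\n")) {
            System.out.println(line.length() + ": " + line);
        }
        System.out.println(Arrays.equals(data, decodeChunked(encoded)));
    }
}
```

The line-broken text can be placed inside an XML element as character data, since the Base64 alphabet contains no characters that need XML escaping.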

URL Encoder

The class org.apache.commons.codec.net.URLCodec implements the 'www-form-urlencoded' encoding scheme for a string, object, or array of bytes. This class differs from the JDK URLEncoder class in the following ways:

It can perform encoding and decoding for a given character set.

In addition to strings, it works for objects and arrays of bytes as well.

No examples are included to illustrate the use of this class because the URLCodec javadoc is self-explanatory and straightforward.
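The encoding scheme itself is small enough to sketch in plain Java, which also shows why the character set matters. This is an illustration of the 'www-form-urlencoded' rules as commonly stated (letters, digits, and the characters - _ . * pass through, a space becomes +, and every other byte becomes %XX); it is not the URLCodec source, and the class name FormUrlEncodeSketch is invented.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of www-form-urlencoded encoding; not the URLCodec implementation.
final class FormUrlEncodeSketch {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    /** Encodes text byte by byte after converting it with the given character set. */
    static String encode(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            int v = b & 0xff;
            boolean safe = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9')
                    || v == '-' || v == '_' || v == '.' || v == '*';
            if (safe) {
                out.append((char) v);          // safe characters pass through
            } else if (v == ' ') {
                out.append('+');               // a space becomes a plus sign
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0f]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a b&c=d", StandardCharsets.UTF_8));
        // The same text yields different output under different character sets.
        System.out.println(encode("caf\u00e9", StandardCharsets.UTF_8));
        System.out.println(encode("caf\u00e9", StandardCharsets.ISO_8859_1));
    }
}
```

The last two calls encode the same string differently because the accented character occupies two bytes in UTF-8 and one byte in ISO-8859-1, so sender and receiver must agree on the character set.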