Java StAX

XML processing is an important weapon in a programmer’s armoury. It is not something new, yet it is used almost everywhere. Android embraces it, and many emerging technologies incorporate XML in some form. Java supports XML processing through JAXP, which covers SAX, DOM, TrAX and StAX, and through JAXB for binding XML to Java objects.

In this article let us learn what the StAX API is, which implementations are available and how we can use it. StAX (Streaming API for XML) is a streaming, pull-parsing Java API for reading and writing XML, and it is an alternative to SAX, DOM and TrAX. The StAX API is a set of interfaces providing a framework for a parser that is bi-directional in the sense that it can both read and write XML; traversal of a document is forward only. It provides two sets of APIs: cursor based and iterator based.

Important Note: The StAX API (JSR 173) is a set of interfaces only; a separate implementation does the actual parsing. Different implementations are available, like stax.codehaus, SJSXP and Woodstox. Since Java SE 6 the JDK bundles the API (package javax.xml.stream) together with a default implementation (SJSXP), so on Java 6 or later nothing extra is needed. Only on an older JDK, or when we prefer another implementation such as Woodstox, do we need to download its jar and add it to our classpath to use the StAX API for reading and writing XML. I am going to use Sun/Oracle’s StAX implementation. On an older JDK I used the Java EE 6 API Library (javaee-api-6.0.jar), which is available as part of the GlassFish server or can be downloaded and used with Eclipse.

SAX, DOM and StAX

SAX is a uni-directional, read-only API and follows the push model for reading; push parsing is explained in the section on pull and push parsing below. SAX and StAX are close relatives when compared to DOM, as DOM uses a completely different approach to XML processing.

A DOM parser is built on the concept of trees. The XML document is read completely and its object model is constructed as a tree stored in memory. The document is then processed by traversing the tree. This requires a lot of memory and processing power. Small documents are fine with this kind of DOM processing, but with a large document we will run into performance issues.
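To make the tree approach concrete, here is a minimal sketch using the DOM parser that ships with the JDK. The class name DomDemo, the method titles and the <title> element are made up for illustration. Note that the whole document is loaded into memory before the first title can be read.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: collects the text of every <title> element via DOM.
final class DomDemo {

    static List<String> titles(String xml) {
        List<String> result = new ArrayList<>();
        try {
            // parse() builds the complete tree in memory before returning
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Only now can we traverse the tree and pick out nodes
            NodeList nodes = doc.getElementsByTagName("title");
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>StAX</title></book>"
                + "<book><title>DOM</title></book></books>";
        System.out.println(titles(xml)); // [StAX, DOM]
    }
}
```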

StAX follows a streaming model and can both read and write. Imagine feeding a whole XML document through a tube. At every moment one XML element is in focus, and then we move on to the next element, always in the forward direction. This kind of processing is not new to us; we have seen it in ResultSet of the JDBC API. Streaming has its advantage when we want to process large documents sequentially. Since only the current part of the document is held in memory, memory use stays low irrespective of the size of the document. With mobile phones and apps getting popular, we also need to think of processing on a smaller scale compared to desktops and servers.

Example scenarios that put StAX to best use are parsing WSDL in web services, viewing relational database data stored as XML documents and, in general, parsing documents with a predictable structure.

Pull and Push XML Parsing

When processing an XML document there are three parties involved. Party one is the XML document, party two is the API doing the processing, and the final party is the client code which uses the API and gets data from the document.

Pull: Client => Parser => XML

Push: Client <= Parser => XML

In pull parsing, the client code calls the parsing API’s methods to get data, and the parser then reads the XML and returns the data required. This is on demand: the parser reads data only when the client asks for it.

In push parsing, the parser reads the XML document and, whenever an event occurs, pushes the respective data to the client and continues. It is like setting a birthday alarm. We register for an alarm on a particular date, the alerter keeps running against time and alerts us when it reaches the date.
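The difference in control flow can be sketched by reading the same document twice, once with SAX (push) and once with StAX (pull). This is only an illustration: the class PushVsPull and its two methods are hypothetical names, and both rely on the parsers bundled with the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class contrasting push (SAX) and pull (StAX) parsing.
final class PushVsPull {

    // Push: we register a handler; the parser drives and calls us back.
    static List<String> pushElementNames(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        return names;
    }

    // Pull: our loop drives; the parser advances only when we call next().
    static List<String> pullElementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>StAX</title></book></books>";
        System.out.println(pushElementNames(xml)); // [books, book, title]
        System.out.println(pullElementNames(xml)); // [books, book, title]
    }
}
```

With pull parsing the client can stop calling next() as soon as it has what it needs, whereas a SAX handler keeps receiving callbacks until the parser finishes or an exception is thrown.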

StAX Cursor API

The cursor API is very similar to JDBC ResultSet in the way it goes through the XML. It always moves forward, and once it goes past an element it cannot go back to it. The main interfaces for the cursor API are XMLStreamReader and XMLStreamWriter. The cursor API is similar to SAX in that it is unidirectional and has a lightweight memory footprint; it reads the properties of the current element directly from the reader, without creating event objects.
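As a minimal sketch of the cursor API, the following writes a small document with XMLStreamWriter and reads it back with XMLStreamReader. The class CursorDemo, its methods and the books/book/id vocabulary are assumptions chosen for the example, not something prescribed by StAX.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class: round trip of id -> title pairs with the cursor API.
final class CursorDemo {

    // Writes <books><book id="...">title</book>...</books>
    static String writeBooks(Map<String, String> titlesById) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("books");
            for (Map.Entry<String, String> entry : titlesById.entrySet()) {
                writer.writeStartElement("book");
                writer.writeAttribute("id", entry.getKey());
                writer.writeCharacters(entry.getValue()); // escapes & and <
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException ex) {
            throw new IllegalStateException("Could not write XML", ex);
        }
        return out.toString();
    }

    // Moves the cursor forward and picks up every <book> element it passes.
    static Map<String, String> readBooks(String xml) {
        Map<String, String> titlesById = new LinkedHashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "book".equals(reader.getLocalName())) {
                    // Attributes must be read while the cursor is on the start tag
                    String id = reader.getAttributeValue(null, "id");
                    // getElementText() advances the cursor to the matching end tag
                    titlesById.put(id, reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        return titlesById;
    }

    public static void main(String[] args) {
        Map<String, String> books = new LinkedHashMap<>();
        books.put("1", "StAX");
        books.put("2", "DOM");
        System.out.println(readBooks(writeBooks(books))); // {1=StAX, 2=DOM}
    }
}
```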

StAX Iterator API

The iterator API parses the XML document and returns event objects. These are events for elements, text, comments etc. This is similar to the Java Iterator in collections. The main interfaces in the iterator API are XMLEventReader and XMLEventWriter. The base of the iterator interfaces is XMLEvent. nextEvent() is the key method, which returns the next event in the XML stream; it is similar to next() in a collection Iterator. When I say events, you may wonder what exactly those events are. To clear the doubt, the event types include StartDocument, EndDocument, StartElement, EndElement, Characters, EntityReference, ProcessingInstruction, Comment, Attribute and Namespace.
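A short sketch of the iterator API follows. The class IteratorDemo and the "start:/text:/end:" labels are invented for the example; XMLEventReader, XMLEvent and nextEvent() are the standard javax.xml.stream types described above. Coalescing is switched on so that adjacent text arrives as a single Characters event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class: lists the events found in a document.
final class IteratorDemo {

    static List<String> describeEvents(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Merge adjacent character data into one Characters event
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLEventReader reader =
                    factory.createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent(); // one object per event
                if (event.isStartElement()) {
                    events.add("start:"
                            + event.asStartElement().getName().getLocalPart());
                } else if (event.isEndElement()) {
                    events.add("end:"
                            + event.asEndElement().getName().getLocalPart());
                } else if (event.isCharacters()
                        && !event.asCharacters().isWhiteSpace()) {
                    events.add("text:" + event.asCharacters().getData());
                }
                // StartDocument, EndDocument, Comment etc. are skipped here
            }
            reader.close();
        } catch (XMLStreamException ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(describeEvents("<a><b>hi</b></a>"));
        // [start:a, start:b, text:hi, end:b, end:a]
    }
}
```

Because each event is a separate object, events can be stored in a collection or passed on to an XMLEventWriter, at the cost of a little more allocation than the cursor API.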

hi joe,
how i read a file inside a servlet upto 1 hour
while server is running forever.
but file reading only one hour. please send me sone logic. sir please in every blog you give the task related some R &D purpose like above Question because we learn every day something new in core java and servlet,jsp,struts,hibernate,spring and all JEE technology…………
this is very helpful for every jobseeker and developer…………i always read your blog .it is really very helpful

I’m a Tibco developer and currently facing a problem with Large XML file. We tried to use the default parser(DOM) in Tibco but it is taking lot of time and sometimes it is throwing OOM exception. So after googling we found STax is a better API to solve this. Our requirement is to chunck the large file based on the threshold value provide(say for example if we have large xml with 20,000 elements,the code should split into multiple xml files each containing 1000 elements). So 1000 is the threshold value. Can you please provide a sample code using STax or guide me please.

Hello Joe, I have a huge XML file want to parse it using STAX and then commit the data from my POJO files to database. Now, do we have an approach to parse files in chunks and commit. Can you please let me know on this or share a sample snippet? Thank you.

Joe.. Article was good but you haven’t covered much.
One complete example for each API (Iterator/Cursor)would have been better. You haven’t covered Filtering in StAX, hoping you will include this part in near future.