<p class="meta">Polyglotted IO - http://polyglotted.io/ - Shankar Vasudevan (shankar@polyglotted.io) - updated 2015-08-17</p>
<h1>XPath-Stax parser is now live in Maven Central Repository</h1>
<p class="meta">http://polyglotted.io/2012/06/30/xpath-stax-now-live</p>
<p class="meta">30 Jun 2012 - London</p>
<p><a href="http://www.polyglotted.org">Polyglotted.org</a> is dedicated to solving big data problems.
One recent challenge we faced was handling thousands of large XML
files that are processed by different batch jobs.</p>
<p>Since the data was processed by support personnel, we had to allow
<a href="http://www.w3.org/TR/xpath/">XPath</a> expressions so that jobs were easier to build. Naturally,
the concern was parsing the large files (typically 3-5 GB in size) efficiently.
A Stack Overflow search led us to this blog post on <a href="http://andreas.haufler.info/2012/01/conveniently-processing-large-xml-files.html">Conveniently processing large xml
files</a>.
We liked the approach and decided to use the source.</p>
<p>Our jobs had specific performance expectations: each had to run within 30 minutes,
in a constrained environment with only 4 GB of RAM available. The servers
were beefy machines with 24 cores, so we could easily parallelize our application.
Because the SAX parsing model is, by default, not safe for access from multiple
threads, we had to resort to StAX parsing. However, when we processed our files,
we started running out of memory and the application could not cope.</p>
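<p>To illustrate the StAX model described above, here is a minimal sketch of cursor-based parsing with the standard <code>javax.xml.stream</code> API, where each task creates its own reader so that nothing is shared between threads. The class name <code>StaxCountSketch</code> and its methods are hypothetical and exist only for this illustration; they are not taken from our jobs or from the blog post we started with. The documents are passed as strings to keep the example self-contained, whereas a real job would read from a file stream.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class, not part of any library.
class StaxCountSketch {

    // Counts start tags with the given local name. Only the current
    // parse event is held in memory; no tree is built.
    public static int countElements(String xml, String localName) throws XMLStreamException {
        // A fresh factory and reader per call, so no parser state is shared across threads.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    // Parses several documents in parallel, one reader per task.
    // Results are returned in the same order as the input documents.
    public static List<Integer> countInParallel(List<String> documents, String localName, int threads)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String doc : documents) {
                futures.add(pool.submit(() -> countElements(doc, localName)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

<p>The design choice in this sketch is to parallelize across documents rather than within one document, since a single StAX reader is itself meant to be driven by one thread at a time.</p>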
<p>The main issue with the above source is that it still creates a lot of DOM objects,
which are really heavyweight when you are processing many thousands of them. We had to
write our own implementation, which is lightweight and uses little to no memory
for processing. Next came a requirement from a different source to map our
returned data into simple POJOs, and we decided to map them to existing
<a href="http://www.oracle.com/technetwork/articles/javase/index-140168.html">JAXB</a> bindings.</p>
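<p>The idea of matching a path while streaming, without creating DOM objects, can be sketched roughly as follows. This is not the xpath-stax API: the class <code>PathMatchSketch</code> and the method <code>textAt</code> are hypothetical, and the sketch assumes only simple absolute paths such as <code>/orders/order/id</code> with no predicates, attributes, or namespaces. It keeps just a stack of the currently open element names plus a buffer for the text of the element being matched.</p>

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not the xpath-stax library.
class PathMatchSketch {

    // Returns the text content of every element whose absolute path
    // (for example "/orders/order/id") equals the given path.
    // Text of any nested child elements is included in the result.
    public static List<String> textAt(String xml, String path) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        Deque<String> open = new ArrayDeque<>();   // names of currently open elements, root first
        List<String> matches = new ArrayList<>();
        StringBuilder text = null;                 // non-null only while inside a matching element
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        open.addLast(reader.getLocalName());
                        if (path.equals("/" + String.join("/", open))) {
                            text = new StringBuilder();
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        if (text != null) {
                            text.append(reader.getText());
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        // Close the match only when the matching element itself ends.
                        if (text != null && path.equals("/" + String.join("/", open))) {
                            matches.add(text.toString());
                            text = null;
                        }
                        open.removeLast();
                        break;
                    default:
                        break;
                }
            }
        } finally {
            reader.close();
        }
        return matches;
    }
}
```

<p>Memory use in this sketch grows with the nesting depth and the size of one matched element, not with the size of the whole file. Mapping each match onto a POJO would be a further step on top of this and is not shown here.</p>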
<p>Both the base parser and the bindings are limited in scope, as they do not have
access to the entire document in memory. However, the code is usable, so we decided
to release it as open source.</p>
<p>Documentation for xpath-stax can be found at <a href="http://www.polyglotted.org/xpath-stax">Polyglotted.org</a>.</p>
<p>xpath-stax-1.0.0 is now live in the <a href="http://search.maven.org/#browse%7C-1206644886">Maven Central Repository</a>.</p>
<p>You can find the source code for the library on <a href="https://github.com/polyglotted/xpath-stax">GitHub</a>.</p>
<p>Please feel free to comment on the library and contribute changes.</p>
<h1>Introducing Polyglotted</h1>
<p class="meta">http://polyglotted.io/2012/06/10/introducing-polyglotted</p>
<p class="meta">10 Jun 2012 - London</p>
<p>My name is Shankar Vasudevan. I am a developer and architect working for a leading
investment bank in the City of London. I have been architecting large-scale
enterprise applications for the past 15 years. You can visit my <a href="http://uk.linkedin.com/in/vshank77">professional
profile</a> on LinkedIn or follow me
<a href="http://twitter.com/#!/vshank77">@vshank77</a> on Twitter.</p>
<p>Over the past 15 years, I have developed a great fascination with data storage and
analysis. The advent of Big Data and NoSQL in recent years has got me
drooling, and my professional work in the fields of information retrieval,
analytical processing, adword classification and scoring, and enterprise search
has added fuel to the fire. More recently I have become a true believer in
<a href="http://martinfowler.com/bliki/PolyglotPersistence.html">Polyglot Persistence</a>,
a term coined by <a href="http://www.martinfowler.com">Martin Fowler</a>, and I apply the
philosophy in my day-to-day job.</p>
<p>Polyglotted.org exists to give back to the community the projects that I conceive
in the areas of information analysis and polyglot persistence.</p>
<p>If you are still interested, Martin Fowler has an interesting introduction
to <a href="http://martinfowler.com/articles/nosql-intro.pdf">NoSQL Databases and Polyglot Persistence</a>.</p>