20110620

I'm quite excited to announce the first release of a brand new eobjects.org project: SassyReader. In my opinion SassyReader is indeed something sassy, as it fills a gap that has long existed in open source applications that deal with data management (ETL tools, tools like DataCleaner and the like). SassyReader is a library for reading data in the sas7bdat format, a.k.a. the format that the SAS statistical software uses! It is written entirely in Java and reads the files directly from their binary format (i.e. it's not a connector to the SAS system, but a reader of the raw data).

So why is this important? Well, first of all because it is very difficult to create systems that interoperate with SAS. SAS does ship a JDBC driver, but its compliance with JDBC is actually very limited. Even creating a connection will typically require the use of SAS's proprietary classes, so you cannot go the standard JDBC way. There is also no JDBC metadata support, and you need to set up a server-side SAS/SHARE option to even expose the connection. Furthermore, this is an add-on product from SAS which costs additional money if you're just a Base SAS user. So doing trivial things like connecting to and querying a data set requires a lot of work and money. In my opinion this is poor practice: a legacy way of trying to lock people in to using only a particular brand of software, simply because interoperability is such a big pain.

All in all I see a great benefit in a project like SassyReader for those who simply want a way of reading the data that is stored in SAS files.

I cannot take a whole lot of credit for this project though. Most of the really challenging stuff was created by Matt Shotwell, a.k.a. BioStatMatt, who founded the sas7bdat project, which is written in R. My contribution was to port it to Java and fix a few issues along the way. Matt pulled together a lot of fragmented work describing various findings about the sas7bdat format. In other words, this is a completely reverse-engineered library, based on analysis of actual sas7bdat files. Over the last months we've had a good conversation going, fixing some of the remaining issues in parallel and contributing additions to each other's code.

Today we've released version 0.1 of SassyReader. It's not yet ready for mission-critical use, as there are still quirks in the format that we haven't figured out. There are also different shapes and sizes within the format, which apparently vary depending on (I'm guessing a bit here) the number of columns and the operating system that the file was written on. The good thing is that we have quite an extensive test set, and of the files I had lying around that I wanted to work with, the reader managed to read all but one (11 out of 12)!


SAS and all other SAS Institute Inc. product or service names are registered trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

Great and very unique solution, Kaspar! I have a question: the code worked for some demo sas7bdat files. Now I got some SAS files from our IT, and it seems that your code does not work on them. The result of int subhCount = IO.readInt(pageData, 20); is always 0. Any idea what could be the cause?

TBH I don't know what that could mean. But I suggest raising it as a bug (and please include as much description as you can, maybe even a sample file) on https://github.com/datacleaner/metamodel_extras
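
For anyone following the question above: a call like IO.readInt(pageData, 20) reads a 4-byte integer at a byte offset within a page buffer. Below is a minimal sketch of what such a helper might look like. It is NOT SassyReader's actual implementation: the class name PageInts, the explicit byte-order parameter and the sample buffer are all assumptions made for illustration. It only shows that the same four bytes yield different values depending on the byte order assumed, and that a result of 0 means all four bytes at that offset are zero.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PageInts {

    // Hypothetical helper (not SassyReader's IO.readInt): reads a 32-bit
    // integer from 'buffer' starting at 'offset', using the given byte order.
    static int readInt(byte[] buffer, int offset, ByteOrder order) {
        // Absolute getInt(index) does not change the buffer's position.
        return ByteBuffer.wrap(buffer).order(order).getInt(offset);
    }

    public static void main(String[] args) {
        // Made-up page data: bytes 20..23 are 05 00 00 00, everything else is zero.
        byte[] pageData = new byte[32];
        pageData[20] = 0x05;

        // Least significant byte first: 05 00 00 00 is read as 5.
        System.out.println(readInt(pageData, 20, ByteOrder.LITTLE_ENDIAN));

        // Most significant byte first: the same bytes are read as 0x05000000.
        System.out.println(readInt(pageData, 20, ByteOrder.BIG_ENDIAN));

        // An offset where all four bytes are zero gives 0 in either byte order.
        System.out.println(readInt(pageData, 24, ByteOrder.LITTLE_ENDIAN));
    }
}
```

When filing a bug, a hex dump of the first bytes of the affected page would show whether the bytes at that offset really are zero in the file.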