A place to discuss topics related to Cloud, Big Data, Government, Marketing, working at big companies, working at small companies, traffic in Tyson's Corner, Business Travel, Not so Business Travel, Music, Books, Movies and anything else that might strike my fancy!

Thursday, May 31, 2012

Rehashing a point of view

Being the 'new guy on the block' with regard to the storage industry, I've given myself license to wonder about everything, open old wounds and simply ponder the question, 'why?' This is not a new place for me; I've been curious about how everything works and why things are as they are ever since I was a little kid burning holes in my Mom's carpet. Um, little kids don't know the difference between AC and DC power. So, in my mind, a battery-operated electric motor would run REAL fast if you hooked the wires to an extension cord. Oops! To this day, I don't know how I survived my curiosity as a child. But, I digress.

Most recently, I've been chewing on the notion of structured vs. unstructured data. For years, I've had a notion of what I thought constituted structured data, and all else was, by definition, unstructured. Right? Admittedly, my parameters were pretty simple. Any data stored in a database was considered structured, making any data stored in some other format (can you say files?) unstructured.

But is this really an accurate way to think of information? I figured I wasn't smart enough to be the first person to ever consider this, so I hit Google. Not surprisingly, I found a number of relevant hits on the subject, from blog entries to academic papers. Really? Academic papers? Ok.

Anyway, after reading and digesting, I've come to the conclusion that whether data is structured or unstructured depends on who or what is attempting to use it. For example, information stored in a database is most certainly structured from the point of view of another computer application, yet showing the raw data in its table format to a person, especially a non-technologist, would most likely prove confusing. On the other hand, a human being sitting down to read the latest corporate memo could easily argue that the information is HIGHLY structured, yet that structure is far less apparent to a computer.
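To make the memo example a little more concrete, here's a rough sketch in Python. Everything in it is made up for illustration: the `meetings` table, the memo text, the room number and the `extract_room` helper are all hypothetical. The same fact is stored once as a database row and once as a sentence in a memo. A program can ask the database for the room by column name, but pulling it out of the memo means guessing at how the sentence is worded.

```python
import re
import sqlite3


def extract_room(text):
    """Hypothetical, naive parser: assumes the memo literally says 'room <name>'."""
    match = re.search(r"room (\w+)", text)
    return match.group(1) if match else None


# Made-up example data: the same fact stored two different ways.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meetings (topic TEXT, room TEXT, day TEXT)")
conn.execute(
    "INSERT INTO meetings VALUES (?, ?, ?)",
    ("Storage roadmap", "4B", "Friday"),
)

# To another program, the row is structured: ask for the column by name.
room_from_db = conn.execute(
    "SELECT room FROM meetings WHERE topic = ?", ("Storage roadmap",)
).fetchone()[0]

# To a person, this memo is perfectly clear; to a program it is just a string.
memo = (
    "Team, please join us Friday in conference room 4B "
    "to walk through the storage roadmap."
)
room_from_memo = extract_room(memo)

print(room_from_db, room_from_memo)              # 4B 4B
print(extract_room("We'll meet Friday in 4B."))  # None: reworded memo, parser misses it
```

Both approaches find "4B" here, but only the database lookup survives someone rewording the memo. The memo's structure lives in the language, which a person reads effortlessly and a program has to guess at.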

So what's my point in all this? Simple. In general, 'structured vs. unstructured' is a false comparison unless it is applied to a specific point of view. What most people REALLY mean when they say 'structured vs. unstructured' is 'information stored in a database vs. information stored in a file'.

So, back to being the 'new guy on the block': this seems to have some interesting implications when it comes to discussing storage solutions. When a vendor says they are good with unstructured data, do they really just mean information stored in files? I suspect I know the answer.