Monday, September 7, 2009

Data vs. Information

Records in a table typically constitute data. Tables, joined together, in a view, tend to turn that data into information.

That elicited a very, very strong reaction from a good friend and mentor. In the comments, he left this:

Turn data into information? That doesn't make a whole lot of sense to me-- All data is information. Can you clarify that statement a little?

On the face of it, that's not a very strong reaction. He tends to be a lurker though, rarely leaving comments.

Then there was Twitter, where he sent me a few more links on the subject.

I'm pretty sure he was fired up.

Once a week or so, we'll get together over beers and have excellent conversations. Occasionally, I'll try to hold my ground from the database perspective. Last week we had a discussion about whether the database should be making web service calls.

Security aside, I thought it was appropriate given the size and skills of the shop, but he and our other friend staunchly disagreed.

Point is, we have some great conversations. It has never come down to "You are stupid!" or anything like that. It's a conversation, with each side presenting its arguments.

Since my friend has like 28 degrees in Engineering, I've learned to give him the benefit of the doubt, so I wanted to study up on it.

My contention, or what I have heard and read, is that a database stores data, and only through the use of SQL or some reporting tool does that data get turned into information. I don't know where I heard or read that for the first time, but I've probably been saying it for years.

Given my friend's response and others on the mailing list, I probably need to rethink that particular statement.

Here are some relevant links provided by my friend and others on the oracle-l mailing list:

Have you ever used the phrase "data into information" or some derivation thereof? I'd like to track down where I first came across it, if possible. Thoughts on Data vs. Information as separate entities?

I'd say there is an implication that information is something more than that. Usefulness, meaning or understanding.

We store data and use information. Maybe the act of accessing or processing data makes it information.

I think there's a lot of overlap. My understanding of your comment about views is that, by linking tables together in a view, you are adding some interpretation to the facts. The same could be said of a DECODE that translates state abbreviations to long names (we only have half a dozen states in Australia, so can do that sort of logic in a decode).

Of course, metadata like constraints and even datatypes adds some meaning to data, so in practical terms I don't think there's a line to draw between the two.

I think that there's this idea that, if you look at an entire record in a well-designed database, that is data which is also information. If you look at a single value of that record, though (say the column is QUANTITY and the value is '15'), that is not really information. What does it mean? Data becomes information when given a context.

I'm not sure what the pure theory is, but in every database I've ever worked on, you definitely transform data into information.

When I think of data, I think of the actual bits that are stored or transmitted, versus information, which is the meaning of that data. For example, a zipped text file has fewer bits of data than its uncompressed version, but contains the exact same amount of information. You can measure the bits of information in an event as the base 2 logarithm of the inverse of its probability. In other words, if I tell you something that has a 50% chance of occurring, I have conveyed one bit of information (and would probably use one bit of data to do so).
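To make that measure concrete, here is a minimal Python sketch of both points in the comment above. The function name `self_information_bits` and the sample text are made up for illustration, and `zlib` simply stands in for "zipping" a file.

```python
import math
import zlib


def self_information_bits(probability: float) -> float:
    """Bits of information conveyed by an event with the given probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    # Base 2 logarithm of the inverse of the probability.
    return math.log2(1.0 / probability)


# An event with a 50% chance conveys one bit.
coin_flip_bits = self_information_bits(0.5)

# Compression: fewer bytes of data, but the same content comes back out.
# The sample text is hypothetical and deliberately repetitive.
text = ("QUANTITY=15;" * 200).encode("ascii")
compressed = zlib.compress(text)
restored = zlib.decompress(compressed)

print(coin_flip_bits)
print(len(text), len(compressed), restored == text)
```

The compressed copy takes fewer bytes than the original, yet decompressing it returns exactly the same text, which is the sense in which the data shrank while the information did not.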

As a contrived example, let's say we have a field that contains 1 bit, and we interpret this field as "if it is set to 1, the lottery numbers for this day were 45, 34, 2, etc." If it is set to 0, the lottery numbers for the day were something else. On those rare days that it IS set to 1, this field contains over 27 bits of information (assuming any particular sequence is a 200 million to 1 shot). When it is set to 0, it is obviously conveying almost NO information at all (in this case, about 0.000000007 bits).
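The arithmetic in the lottery example can be checked with a few lines of Python. This is only a sketch: the 200 million to 1 odds come from the comment itself, while the variable names are invented here.

```python
import math

# Assumed odds from the example: one specific sequence in 200 million.
ODDS = 200_000_000

p_one = 1.0 / ODDS    # probability the field is set to 1
p_zero = 1.0 - p_one  # probability the field is set to 0

# Information conveyed when the field is 1: log2(1 / p_one) = log2(ODDS).
bits_when_one = math.log2(ODDS)

# Information conveyed when the field is 0: -log2(1 - p_one).
# log1p keeps precision when p_one is tiny.
bits_when_zero = -math.log1p(-p_one) / math.log(2)

print(f"field = 1: {bits_when_one:.2f} bits")
print(f"field = 0: {bits_when_zero:.2e} bits")
```

Both values line up with the figures quoted above: a little over 27 bits when the field is 1, and roughly seven billionths of a bit when it is 0, even though the field always occupies exactly one bit of storage.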