Friday, 10 January 2014

Types of data

We talk about data all the time, but do we all know how many types data can be categorized into based on its structure?
I am trying to compile such a categorization for the data around which ETL, BI, Big Data and data analytics have evolved.
Structured data
This is the base of all the database systems that have dominated the
ETL and BI market for so many years. Structured data refers mainly to
relational databases, where all the key structures and associations are
well defined and all dimensional data is properly associated with
facts.
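
As a rough illustration of "dimensional data associated with facts", here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_product, fact_sales) and the sample rows are made up for this example and are not taken from any real system.

```python
import sqlite3

# Hypothetical star-schema fragment: one dimension table and one fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE fact_sales (
        sale_id    INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
        amount     REAL NOT NULL
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
""")

# Because keys and associations are declared up front, a plain join
# can aggregate the facts per dimension value.
totals = conn.execute("""
    SELECT d.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
conn.close()

print(totals)
```

The point is only that the schema is known before any data arrives, so the query can rely on it.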
Semi-structured data
Data in the form of Excel sheets, presentations, etc., which can be used
as structured data to some extent for analysis, but automating direct
access to it is not so easy. It basically needs to be turned into
structured data first and then analysed.
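
To give a feel for what "turning it into structured data" can involve, here is a small sketch that cleans up a spreadsheet exported as CSV. The sheet layout (a title row, a blank row, then the real header) and the function name sheet_to_records are assumptions made for the example; real sheets vary a lot, which is exactly why automation is hard.

```python
import csv
import io

# Hypothetical spreadsheet export: a title row and a blank row precede the real header.
raw_sheet = """Quarterly sales report,,
,,
Region,Quarter,Revenue
North,Q1,"1,200"
South,Q1,950
"""

def sheet_to_records(text, header_first_cell="Region"):
    """Turn a loosely laid-out sheet into a list of uniform records."""
    rows = list(csv.reader(io.StringIO(text)))
    # Locate the header row instead of assuming it is the first line.
    start = next(i for i, row in enumerate(rows) if row and row[0] == header_first_cell)
    header = rows[start]
    records = []
    for row in rows[start + 1:]:
        if not any(cell.strip() for cell in row):
            continue  # skip blank spacer rows
        record = dict(zip(header, row))
        # Normalise the numeric column: drop thousands separators.
        record["Revenue"] = int(record["Revenue"].replace(",", ""))
        records.append(record)
    return records

records = sheet_to_records(raw_sheet)
print(records)
```

Once the rows are in this shape they can be loaded into a table like any other structured data.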
Syndicated data
Providers such as Home away, Thomson Reuters, etc., which supply special
data in their own formats for analysis, are bucketed under the category
of syndicated data.
Unstructured data
Logs from systems and devices, and social media data such as Twitter and
Facebook. Even though one may be tempted to turn this data into
structured data for analysis, the volume and frequency of unstructured
data make that conversion nearly infeasible. So, unstructured data is
analysed using different methods and tools.
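
One such method, sketched very roughly here, is to scan the raw text for patterns and summarise it without first forcing every line into a fixed schema. The log lines and the function name count_levels are invented for illustration; real tooling for this kind of data is far more elaborate.

```python
import re
from collections import Counter

# Hypothetical raw log lines with no common layout.
log_lines = [
    "2014-01-10 10:00:01 ERROR disk full on /dev/sda1",
    "user login ok from 10.0.0.5",
    "2014-01-10 10:00:07 ERROR timeout talking to db",
    "WARN: retrying request",
]

def count_levels(lines):
    """Count severity keywords wherever they appear; unmatched lines go to OTHER."""
    pattern = re.compile(r"\b(ERROR|WARN|INFO)\b")
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        counts[match.group(1) if match else "OTHER"] += 1
    return counts

level_counts = count_levels(log_lines)
print(level_counts)
```

Nothing here requires the lines to share columns or types; the summary is built straight from the text.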

The one that will soon be published on BabaGyan.com uses Python and R in combination with PHP, MySQL and HTML5 too.