Personal Intelligence or...

Friday, 11 May 2012

I hope the last post gave you an understanding of what big data is and what
it can do. Let us now look at the data problems we may face as we move ahead
with this new technology.

The major problems in big data implementation are:

Processing the unstructured and semi-structured data

Deciphering the information from unstructured or semi-structured
data

What is unstructured or semi-structured data?

In simple terms, any data elements that can be stored in rows and columns in a
database are called structured data. If the data can't be stored in rows and
columns and has to be stored as BLOBs (Binary Large Objects), it is called
unstructured or semi-structured data.
(Note: there is not yet a precise, agreed definition of unstructured or
semi-structured data, but this is the baseline that people in the field are
working from.)

What are the credit card types having a minimum card limit of $5000?
etc.

Questions like these can be answered by querying the structured data with
specific inputs. From a technical standpoint, we can retrieve the information
directly by writing simple queries.
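
To make the "simple query" point concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name credit_cards, its columns, and the three sample rows are assumptions made up for illustration; they do not come from any real system.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical credit_cards table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE credit_cards (card_type TEXT, card_limit INTEGER)")
    # Sample rows invented purely for illustration.
    conn.executemany(
        "INSERT INTO credit_cards VALUES (?, ?)",
        [("Classic", 2000), ("Gold", 5000), ("Platinum", 10000)],
    )
    return conn


def cards_with_min_limit(conn, min_limit):
    """Return the card types whose limit is at least min_limit, sorted by name."""
    rows = conn.execute(
        "SELECT card_type FROM credit_cards "
        "WHERE card_limit >= ? ORDER BY card_type",
        (min_limit,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(cards_with_min_limit(conn, 5000))
```

Because the limit lives in its own numeric column, one WHERE clause answers the question and no interpretation of the data is needed.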

Now let us consider a scenario with unstructured data: we want to analyze "how
many people searched for credit cards with a maximum limit of $5000?" Suppose
these are the search terms that were entered on our website:

card limit 5k

card limit 5000

card limit 5000 cad

card limit five thousand

card limit five thousand canadian dollars

limit five thousand dollars

credit cards + 5000

5000

The first problem is understanding the unstructured data.
How can we conclude that these searches were made for credit cards with a
limit of 5000 dollars?
Example:

Search parameter no. 6 ("limit five thousand dollars"): the
user may be searching for a savings account where the minimum balance should
be 5000, or may be looking for investments with a limit of 5000.

Search parameter no. 8 ("5000"): this parameter is too
vague to correlate with credit cards.
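
The ambiguity can be seen with a rough sketch in Python. The keyword rules below are my own crude assumptions (spotting "5000", "5k" or "five thousand", and the word "card"), not a real text-understanding method; they only show that searches 6 and 8 cannot be tied to credit cards from the text alone.

```python
import re

# The eight search strings listed above, in the same order.
SEARCHES = [
    "card limit 5k",
    "card limit 5000",
    "card limit 5000 cad",
    "card limit five thousand",
    "card limit five thousand canadian dollars",
    "limit five thousand dollars",
    "credit cards + 5000",
    "5000",
]


def mentions_5000(text):
    """True if the text writes the amount as '5000', '5k' or 'five thousand'."""
    return bool(re.search(r"\b(5000|5k|five thousand)\b", text.lower()))


def mentions_card(text):
    """True if the text contains the word 'card' or 'cards'."""
    return bool(re.search(r"\bcards?\b", text.lower()))


def classify(text):
    """Crude, assumed heuristic: amount plus 'card' is treated as a likely match."""
    if mentions_5000(text) and mentions_card(text):
        return "likely credit card, limit 5000"
    if mentions_5000(text):
        return "ambiguous"
    return "unrelated"


if __name__ == "__main__":
    for number, search in enumerate(SEARCHES, start=1):
        print(number, repr(search), "->", classify(search))
```

Even the "likely" label is an assumption: the word "card" alone does not rule out, say, a debit card.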

Suppose we ignore this data-understanding problem and assume that all the
searches were for credit cards with a limit of 5000 dollars.
What will my search parameters be? How should the typical query be
structured? etc.
Currently we have to make a lot of assumptions to derive information from
unstructured or semi-structured data.

One approach that I can think of to tackle this problem is to capture the
metadata of the search. By correlating the search parameters with the metadata
of the search, we can reach certain conclusions.
Ex: on which page was the search made?
If the search was made on the credit card page, then we can
conclude that the user is looking for credit cards.
With this approach too, how to correlate the metadata with the actual data
element captured from the user is another problem.
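
As a minimal sketch of the metadata idea, assume each search is logged together with the path of the page it was made on. The SearchEvent record, the page paths, and the page-to-product mapping below are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class SearchEvent:
    query: str  # what the user typed
    page: str   # assumed metadata: path of the page where the search was made


# Hypothetical mapping from site section to the product the user is probably after.
PAGE_TO_PRODUCT = {
    "/credit-cards": "credit card",
    "/savings": "savings account",
    "/investments": "investment",
}


def infer_product(event):
    """Guess the product from the page path; None when the page tells us nothing."""
    for prefix, product in PAGE_TO_PRODUCT.items():
        if event.page.startswith(prefix):
            return product
    return None


if __name__ == "__main__":
    events = [
        SearchEvent("5000", "/credit-cards/compare"),
        SearchEvent("limit five thousand dollars", "/savings"),
        SearchEvent("5000", "/"),
    ]
    for event in events:
        print(repr(event.query), "on", event.page, "->", infer_product(event))
```

A search made from the home page still comes back as None, which is the open correlation problem mentioned above.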

Conclusion:
Big data is like a gold mine. We will have to process a huge amount of data
to get useful information that helps the business with decision making. Yet
this technology is in its infancy. With the growth of cloud computing and
parallel processing technologies, big data will be a reality in the near
future. The usefulness of big data is most visible in the fields of personal
business intelligence, the health care industry, and the marketing industry.
"To get one ounce of gold we have to process 33 tons of rock; the same goes
for big data."