Sunday, January 12, 2014

Big Data Revolution and Vision ........!!!

Big Data is one of the biggest buzzwords around at the moment, and big data will definitely change the world. Big Data refers to data sets that are too large to be processed and analyzed by traditional IT technologies. The Big Data universe is changing right before our eyes and beginning to explode. Big data absolutely has the potential to change the way governments, organizations, and academic institutions conduct business and make discoveries, and it's likely to change how everyone lives their day-to-day lives.

In the next five years, we’ll generate more data as humankind than we generated in the previous 5,000 years...!!! Records and data exist in electronic digital form, generated by everything from mobile communications and surveillance cameras to emails, web sites and transaction receipts; they can be combined with daily news, social media feeds and videos.

What is big data?

Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data.

Gartner defines Big Data as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. According to IBM, 80% of the data captured today is unstructured, and it comes from the same kinds of sources listed above. All of this unstructured data is Big Data.

In other words, big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization. The trend to larger data sets is due to the additional information (value) derivable from analysis of a single large set of related data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."

What does Hadoop solve?

Organizations are discovering that important predictions can be made by sorting through and analyzing Big Data.

However, since 80% of this data is "unstructured", it must be
formatted (or structured) in a way that makes it suitable for data
mining and subsequent analysis.

Hadoop is the core platform for structuring Big Data, and solves the problem of making it useful for analytics purposes.

In 2004, Google published a paper on a process called MapReduce that runs on a parallel, distributed architecture. The MapReduce framework provides a parallel processing model and an associated implementation for processing huge amounts of data. With MapReduce, queries are split and distributed across parallel nodes and processed in parallel (the Map step). The results are then gathered and delivered (the Reduce step). The framework was incredibly successful, so others wanted to replicate the algorithm. Therefore, an implementation of the MapReduce framework was adopted by an Apache open source project named Hadoop. The original paper is "MapReduce: Simplified Data Processing on Large Clusters" by Jeffrey Dean and Sanjay Ghemawat.
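The Map and Reduce steps described above can be illustrated with a small word count. This is only a single-process sketch in plain Python, not Hadoop's API: the function names (map_step, shuffle, reduce_step, word_count) are made up for illustration, and the "nodes" are simulated by looping over chunks of text one after another.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_step(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, between the Map and Reduce steps."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values gathered for one key into a single result."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # On a real cluster each chunk would be mapped on a different node;
    # here (an assumption for the sketch) they are processed sequentially.
    pairs = [pair for chunk in chunks for pair in map_step(chunk)]
    return dict(reduce_step(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is everywhere"]))
```

Because every chunk is mapped independently and every key is reduced independently, both steps could in principle be spread across many machines, which is the property a framework like Hadoop exploits.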

Big data spans four dimensions, the 4 Vs that characterize big data:

Volume – the vast amounts of data generated every second. Examples: terabytes, records, transactions, tables and files.

Velocity – the speed at which new data is generated and moves around (credit card fraud detection is a good example, where millions of transactions are checked for unusual patterns in almost real time). Examples: batch, near real time, real time and streams.

Variety – the increasingly different types of data (from financial data to social media feeds, from photos to sensor data, from video capture to voice recordings). Examples: structured, unstructured, semi-structured, and a mix of all three.

Veracity – the messiness of the data (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech).
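To make the Velocity example a little more concrete, here is a rough sketch of checking a stream of transaction amounts for unusual values as they arrive. It is an illustration under simple assumptions, not how any real fraud system works: the function name flag_unusual, the 3-standard-deviation threshold and the warm-up length are all hypothetical choices.

```python
import math
from typing import Iterable, Iterator


def flag_unusual(amounts: Iterable[float], threshold: float = 3.0,
                 warmup: int = 5) -> Iterator[float]:
    """Yield amounts lying more than `threshold` standard deviations
    from the running mean of the amounts seen before them."""
    count, mean, m2 = 0, 0.0, 0.0  # running statistics (Welford's method)
    for amount in amounts:
        # Only start flagging once enough history has been seen.
        if count >= warmup:
            std = math.sqrt(m2 / (count - 1))
            if std > 0 and abs(amount - mean) > threshold * std:
                yield amount
        # Update the running mean and the sum of squared deviations.
        count += 1
        delta = amount - mean
        mean += delta / count
        m2 += delta * (amount - mean)


if __name__ == "__main__":
    stream = [20.0, 22.0, 19.0, 21.0, 18.0, 23.0, 950.0, 20.0]
    print(list(flag_unusual(stream)))
```

The check keeps only three numbers in memory, so it can process each transaction as it arrives instead of waiting for a batch. Note that in this simple version a flagged amount still feeds into the running statistics, which a more careful design would probably avoid.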

How the Big Data Explosion Is Changing the World?

Big data is the term increasingly used to describe the process of applying serious computing power – the latest in machine learning and artificial intelligence – to seriously massive and often highly complex sets of information. Big data can be comparing utility costs with meteorological data to spot trends and inefficiencies. Big data can be comparing ambulance GPS information with hospital records on patient outcomes to determine the correlation between response time and survival. It can also be the tiny device you wear to track your movement, calories and sleep, so that you can monitor your own personal health and fitness.

Our daily lives generate an enormous collection of data. Whether you’re surfing the Web, shopping at the store, driving your smart car around town, boarding an airplane, visiting a doctor, or attending class at university, each day you are generating a variety of data. The benefit of the data depends on where you are and to whom you’re talking, but a lot of the ultimate potential is in the ability to discover connections and to predict outcomes in a way that wasn’t really possible before.

With more data than ever available in digital form, progressively inexpensive data storage, and more advanced computers at the ready to help process and analyze it all, companies believe that big data has the power to drive practical insights that just weren’t possible before. It’s about managing all that data and providing tools that enable everyone to answer questions, including questions they might not have even known they had. IBM CEO Ginni Rometty says big data and predictive decisions will reshape organizations, and computers that learn, like Watson, will be tech's next big wave.

It's a vision of the future. A hospital uses rapid gene sequencing to stop an outbreak of antibiotic-resistant bacteria, saving lives. A railroad company gets an alert from a train’s sensor that a preventative fix is needed, saving the cost and time of removing the train from the tracks later. A university notices that a student’s activity level has started to drop to a level consistent with dropouts, and reaches out to assist.

Classic use cases and their implementation in real-time scenarios:
----------------------------------------------------------------------------

1) Retailers can exploit the data to track sales and consumer behavior, in store and online.

2) Health professionals and epidemiologists trying to predict the spread of disease combine data from health services, border agencies and a variety of other sources.

3) The finance sector seeks to exploit one of the most valuable mother lodes of data through powerful tools that can make sense of patterns in news, trading activities and other more esoteric sources.

4) India’s Unique Identification project [Aadhaar project], spearheaded by Nandan Nilekani, will collect and process billions of data records to provide identification for each resident across the country, and would be used primarily as the basis for efficient delivery of welfare services. It would also act as a tool for effective monitoring of various programs and schemes of the Government.

5) Predicting crime: Chicago is designing a predictive software platform to identify crime patterns. Beyond the public safety uses, the platform could also help officials make better decisions for city services like restaurant inspections, snow plowing or garbage collection, and so on!

Data scientists are building specialized systems that can read through billions of bits of data, analyze them via self-learning algorithms and package the insights for immediate use.
------------------------------------------

In the next few years millions of big data-related IT jobs will be created worldwide, and there is a major shortage of the “analytical and managerial talent necessary to make the most of big data.” The United States alone faces a shortage of more than 140,000 workers with big data skills, as well as up to 1.5 million managers and analysts needed to analyze and make decisions based on big data findings.
---------------------------------------------------------------------
See also: Overview of Apache Hadoop; Watson - Era of Cognitive Computing