Big Data 2011 Preview

During the 2011 National Football League (NFL) playoff TV broadcasts — amid commercials with Anheuser-Busch Clydesdales and auto racing driver Danica Patrick — an ad appeared with an IBM researcher talking about data analytics. In the IBM TV ad, Dr. David Ferrucci discusses how an IBM Watson supercomputer competes in a Jeopardy! game by integrating analytics, natural language capabilities and rapid search of disparate data.

While at first glance NFL TV broadcasts may seem an unusual forum for a discussion of data analytics, Big Data offers important tools for enterprises of all sizes to improve operational efficiencies, grow revenues, and empower new business models.

What constitutes Big Data varies by organization: large enterprises are beginning to grapple with multiple petabytes, while for a small or mid-size enterprise, growth to tens of terabytes or more can create challenges for data management and analytics. Complexity is growing too, with the proliferation of disparate data sources including machine-to-machine data, social media and electronic healthcare records.

For healthcare provider Kaiser Permanente and its more than 8 million members, Big Data is about improving the quality of care and reducing costs. Using Kaiser’s HealthConnect electronic healthcare records and decision-support software, doctors and nurses can view the patient’s complete history including lab test results, prescriptions, diagnosis, treatment, demographics, medical plan and payment records. Further, patients can avoid unnecessary trips to the hospital by exchanging emails with their doctor and ordering prescription refills online.

The simplicity of Kaiser’s HealthConnect web interface masks the complexity of an extensive Big Data infrastructure. Inpatient, outpatient, pharmacy, finance, cost management and other groups at Kaiser all access patients’ electronic healthcare records, with appropriate role- and group-based security controls.

Kaiser’s infrastructure includes:

application and service development through a service-oriented architecture (SOA);

Oracle 9i/10g, SQL Server and Teradata databases;

Informatica PowerCenter for data integration; and

data center outsourcing services from IBM.

For most enterprises and public sector organizations, the focus is the “right tool for the job”, which can include any number of different combinations among business intelligence software; R and other open source analytics tools; spreadsheets; relational databases; Hadoop; operational data stores; column stores; and document-oriented databases.

Hadoop/MapReduce, together with other related Apache open source projects, has moved past test/development to become a viable extension or alternative to traditional relational databases. LinkedIn, for example, combines Hadoop for massive batch workloads, Project Voldemort as a NoSQL key/value storage engine, and the Azkaban open-source workflow system to support low-latency site serving and large-scale data computations covering more than 100 billion relationships a day.
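To make the MapReduce idea concrete, here is a minimal single-machine sketch in plain Python of the map, shuffle and reduce steps, counting connections per member. It does not use Hadoop itself, and it is not LinkedIn’s code: the sample data and the function names (`map_phase`, `shuffle`, `reduce_phase`, `connection_counts`) are assumptions made up for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample data: each pair is one connection between two members.
EDGES = [
    ("ann", "bob"),
    ("ann", "cat"),
    ("bob", "cat"),
    ("dan", "ann"),
]


def map_phase(edge):
    """Map step: emit (member, 1) for both ends of a connection."""
    a, b = edge
    return [(a, 1), (b, 1)]


def shuffle(pairs):
    """Shuffle step: group emitted values by key between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: sum the counts collected for one member."""
    return key, sum(values)


def connection_counts(edges):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(edge) for edge in edges)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(connection_counts(EDGES))
```

In a real cluster the map and reduce functions run in parallel on many nodes and the framework handles the shuffle, which is what lets the same pattern scale to very large batch workloads.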

Using cloud-computing technologies, organizations are experimenting with distributed data stores, cloud compute capacity for data analytics, hosted data integration and even operational databases in the cloud. For organizations with existing investments in data warehouses and data marts, technologies such as in-memory systems, flash-based accelerators, and memcached servers can help alleviate performance bottlenecks and push back hardware retirement dates.
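One common way a cache such as memcached relieves a slow data warehouse is the cache-aside pattern: check the cache first, and only run the expensive query on a miss. The sketch below illustrates that pattern with an in-process dictionary rather than a real memcached client; the `CacheAside` class, its parameters and the `slow_warehouse_query` stand-in are all hypothetical names invented for this example.

```python
import time


class CacheAside:
    """Cache-aside lookup with a time-to-live, backed by a plain dict."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # function that runs the expensive query
        self._ttl = ttl_seconds    # how long a cached value stays fresh
        self._clock = clock        # injectable clock, handy for testing
        self._store = {}           # key -> (value, expiry time)
        self.misses = 0            # number of times the loader was called

    def get(self, key):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]        # fresh cache hit: skip the slow query
        self.misses += 1
        value = self._loader(key)  # miss or expired: go to the source
        self._store[key] = (value, now + self._ttl)
        return value


def slow_warehouse_query(region):
    """Stand-in for an expensive data warehouse query (hypothetical)."""
    return {"region": region, "revenue": len(region) * 1000}


if __name__ == "__main__":
    cache = CacheAside(slow_warehouse_query, ttl_seconds=30.0)
    cache.get("west")              # first call runs the query
    cache.get("west")              # second call is served from the cache
    print(cache.misses)
```

With a real memcached deployment the dictionary would be replaced by calls to a shared cache server, so that many application servers benefit from the same cached results.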

With all of the information available today on the public Internet and within internal corporate sites, it’s easy to feel overwhelmed. Visualization and collaboration tools are important to help business users overcome a feeling of data overload, identify patterns and take actionable steps. For example, LinkedIn Maps enables users to map professional networks and understand relationships among connections. Your map is color-coded to represent different affiliations or groups from your professional career, such as your previous employer, college classmates or industries you’ve worked in.
