Big Data Spurs Big Collaborations

The U.S. government's Big Data initiative is sparking more investment in data management projects, especially joint efforts between business and government. As part of the Obama administration's initiative, the White House showcased more than 30 Big Data projects on Nov. 12 at an event hosted by the White House Office of Science and Technology Policy. Many of the projects involved commercial companies, including major pharmaceutical and IT firms.

"America is rich with institutions that are expert at generating data, but as a nation we have not fulfilled our potential to make the most of these data by merging pre-competitive resources, partnering on analytics, and sharing lessons learned," said John Holdren, assistant to the president for science and technology.

The projects featured at the event "show that we are maturing in this respect, finding synergies and collaborative opportunities that will accelerate progress in a wide range of scientific, social and economic domains," he said.

In the IT sector, a prime example of such collaboration involves the National Aeronautics and Space Administration and Amazon Web Services. On the day of the White House event, NASA announced it was making a large collection of its data available to research and educational users through an AWS cloud platform. NASA formed the partnership under a non-reimbursable agreement authorized by the Space Act.

"Under the agreement the partners bring their own resources to the table and perform the research together. NASA funds its own investigator and AWS invests its own resources. There is no exchange of money," Tsengdar Lee, high-end computing program manager at NASA headquarters, told the E-Commerce Times.

Terabytes Transferred to Cloud

The NASA-AWS program encompasses selected satellite and global data sets, including temperature, precipitation and forest cover, as well as data processing tools from the NASA Earth Exchange (NEX). Through NEX, users can explore and analyze large data sets, run and share modeling algorithms, collaborate on new or existing projects, and exchange workflows and results within and across science communities.

NASA has uploaded terabytes of data from three satellite and computer modeling data sets to the AWS platform and will upload more in the future.

"Federal agencies collect and produce an extraordinary amount of data. The critical barrier to Big Data, which has traditionally been the infrastructure required to collect, compute and collaborate, is no longer a challenge through the use of cloud computing technology and solutions like the AWS platform," Jamie Kinney, AWS senior manager for scientific computing, told the E-Commerce Times.

"In cloud environments, where storage and compute infrastructure are delivered on demand, resources can bend and flex to provide the perfect fit for analyzing any amount or kind of data. This makes it easier and less expensive to collect, store, analyze and share data than it has ever been before," he said.

"Many researchers don't have easy access to Big Data even when the data is free and open," NASA's Lee said. The partnership uses AWS cloud technology to make the data easier to obtain and analyze. The program also promotes the "open data" concept and does not require any user credentials.

"The users make their own arrangement with AWS for any additional computational resource they want," Lee noted. The effort continues NASA's adoption of cloud platforms to enhance digital services, enabling the agency to make more U.S. government data easy to find and access without users having to download large amounts of information.

The project combines data access with software tools that run on AWS high-performance computing resources, according to Kinney.

"In the past, these types of analytics were only available to NASA researchers who have access to NASA's Advanced Supercomputing facilities. It would have been logistically difficult for researchers to gain easy access to this data due to its dynamic nature and immense size without cloud technology. Limitations on download bandwidth, local storage, and on-premises processing power made in-house processing impractical," he said.
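To put the download-bandwidth point in rough numbers, the short Python sketch below estimates how long it would take to pull a data set of a given size over a given network link. The 10 TB size and the 100 Mbit/s and 1 Gbit/s link speeds are hypothetical figures chosen for illustration; the article does not state how large the NASA data sets are or what bandwidth researchers have. The `efficiency` parameter is likewise an assumption, standing in for the fraction of line rate actually achieved.

```python
"""Back-of-the-envelope download-time estimate (all figures hypothetical)."""


def transfer_days(size_terabytes: float, link_megabits_per_s: float,
                  efficiency: float = 1.0) -> float:
    """Return the days needed to download `size_terabytes` (decimal TB,
    10**12 bytes) over a link of `link_megabits_per_s` (10**6 bits/s),
    assuming only `efficiency` (0-1] of the line rate is achieved."""
    if size_terabytes < 0 or link_megabits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, link speed or efficiency")
    bits = size_terabytes * 10**12 * 8              # bytes -> bits
    bits_per_second = link_megabits_per_s * 10**6 * efficiency
    seconds = bits / bits_per_second
    return seconds / 86_400                         # seconds per day


if __name__ == "__main__":
    # Hypothetical 10 TB data set over two illustrative link speeds.
    for mbps in (100, 1_000):
        print(f"10 TB over {mbps} Mbit/s: {transfer_days(10, mbps):.1f} days")
```

At full line rate this works out to roughly nine days at 100 Mbit/s and just under a day at 1 Gbit/s, before any local storage or processing, which illustrates why running the analysis next to the data in the cloud can be more practical than downloading it.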

"NASA's security requirements also limit the agency's ability to grant access to non-NASA researchers. Hosting this data and related software tools on AWS enables researchers to interact with these data sets without NASA having to open up its firewalls," Kinney pointed out.

AWS is hosting the data under its Public Data Sets program, which also hosts other collections such as the 1000 Genomes Project data. NASA operates and pays for the OpenNEX website. The joint effort extends prior work between the partners.

"This is a new collaboration between NASA and AWS, built on the relationship we've formed over the past five years," said Kinney.

Tech Sector Active in Collaborations

Other Big Data partnership examples cited at the White House forum:

A collaboration between IBM and the National Institutes of Health on a medical diagnostics project involving data generated from electronic health records. NIH provided a US$2 million grant to IBM and two healthcare systems, Sutter Health and Geisinger Health. "By pairing IBM's expertise in Big Data analytics with the domain knowledge and data of our healthcare partners, this project will result in the development of new analytic algorithms for more accurate detection of the early onset of heart failure," said Shahram Ebadollahi, program director for Health Informatics Research at IBM.

A public outreach program in which the industry association TechAmerica sponsored several Big Data Road Shows featuring healthcare and energy sector applications. The sessions, held in major cities, included senior federal and state officials, private sector representatives and academic experts. The meetings followed up on the 2012 release of a TechAmerica Foundation report on demystifying Big Data, which provides a road map for the federal government to make better use of Big Data and related technologies.

A US$16.7 million contract between the U.S. Postal Service and FedCentric, which will incorporate systems from Silicon Graphics International into a cost-efficient, high-density commodity computing system for managing high-velocity data, including high-speed detection, processing and analysis for fraud prevention.

In addition to advancing open access to the huge amount of government data that could be made available to the public, the partnership element of the Big Data initiative can help federal agencies manage their IT more practically and cost-effectively.

"Regarding the NASA/AWS project, this is an emerging trend gathering steam. The CIA's award to AWS of a major cloud-based Big Data infrastructure services contract really put that company on everyone's radar. The CIA's stamp of approval carries a lot of weight with the entire federal government, and that is rippling through the market," said Alex Rossino, principal research analyst at Deltek.

Regardless of the commercial firm that may be involved, federal agencies are recognizing that extensive commercial computing infrastructure can be leveraged to provide large data sets and analytics to a wide variety of partners, customers and purposes, according to Rossino.

"Agencies know they cannot provide these services themselves, especially given the financial constraints they are facing," he said, "so they are leveraging commercial capabilities to get the job done."

John K. Higgins is a career business writer, with broad experience for a major publisher in a wide range of topics including energy, finance, environment and government policy. In his current freelance role, he reports mainly on government information technology issues for ECT News Network.