Could This ‘Data Lake’ Concept From GE Revolutionize Energy Analytics?

General Electric is involved in nearly every area of the clean energy market: solar engineering, wind manufacturing and development, LED lighting, distributed natural gas, metering and submetering, and grid analytics are just some of its major touch points.

It’s also a dominant force in conventional generation, monitoring 1,600 gas and steam turbines that represent nearly one-quarter of the world’s power plants.

With such a diverse array of energy assets under its control, GE has become a prodigious producer and consumer of data. Energy engineers at the conglomerate are analyzing ten times more data today than they were five years ago — bringing new insights, while also creating new complications for the data-crunching infrastructure.

The energy market is just one piece of GE’s total business, which also includes healthcare, aviation and rail transportation, manufacturing, mining and water processing. As GE expands the industrial internet and uses sensors to track the performance of everything it builds, the company is generating thousands of terabytes of data for customers.

But even a mighty giant like GE is having a hard time keeping up with it all.

“Big data is growing so fast that it is outpacing the ability of current tools to take full advantage of it,” said GE’s Vice President of Software Bill Ruh in a statement about the company’s new approach to data.

That’s why GE made a $105 million equity investment in Pivotal, a big data analytics firm, in April of last year. Over the last sixteen months, the two companies have been working on a new way of sifting through that growing pool of information. And today, they announced the result: a “data lake.”

A data lake, enabled by the open-source software Hadoop, is simply a collection of information kept in its raw format. Rather than processing the data and filing it away in a rigid, predefined structure, GE and Pivotal store it in its original form and sift through it when needed.
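A minimal sketch of that idea, sometimes called "schema-on-read," in Python. The records and field names here are invented for illustration; GE's actual implementation sits on Hadoop and Pivotal's stack, not on code like this.

```python
import json

# In a data lake, records land in their raw, heterogeneous form.
# Structure is imposed only at read time ("schema-on-read"), unlike a
# traditional database that validates and normalizes every record on
# write ("schema-on-write").

# Hypothetical raw sensor readings as they might arrive in the lake.
raw_records = [
    '{"asset": "turbine-7", "temp_c": 412.5, "ts": "2014-08-01T12:00:00Z"}',
    '{"asset": "turbine-7", "vibration_mm_s": 3.1, "ts": "2014-08-01T12:00:05Z"}',
    '{"asset": "turbine-9", "temp_c": 398.0, "ts": "2014-08-01T12:00:00Z"}',
]

def query_lake(records, field):
    """Apply a schema at read time: pull one field from whatever
    records happen to contain it, ignoring everything else."""
    for line in records:
        rec = json.loads(line)
        if field in rec:
            yield rec["asset"], rec[field]

# Ask a question the data was never pre-shaped to answer.
print(list(query_lake(raw_records, "temp_c")))
# [('turbine-7', 412.5), ('turbine-9', 398.0)]
```

The point of the sketch is that no record was rejected or reshaped on the way in; a new question (say, `vibration_mm_s` instead of `temp_c`) needs no migration, only a new read-time query.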

GE says the approach can process information 2,000 times faster than traditional methods, at one-tenth the cost, reducing analysis run times from months to days or even minutes.

Bartlett, who studied biology and ecosystems before he jumped into computer science, uses a biological metaphor to describe the data lake concept. “A data lake is like a pond in the woods — a richly diverse ecosystem,” he says. “You have complex food webs composed of millions of organisms, from algae and plants all the way up to top predators. Other factors such as water depth, available oxygen, nutrient levels, temperature, salinity and flow create the context of an intricate, interconnected ecosystem. If you throw a line in the water, you never know what you will catch. It is an exciting place to fish! The questions and analytical opportunity are almost limitless.”

“On the other hand,” he says, “a more traditional database is more like a fish farm where all the species have been preclassified and fed the same diet and health supplements. Some intensive tanks even employ biosecurity measures — a [significant] contrast from the rich, open natural ecosystem. If you throw a line in the water here, you have a pretty good idea of what you will catch! While useful, it has more limitations as to what it can teach us.”

GE’s “fishing pole” will be its Predix software, the company’s analytics platform connecting devices to the industrial internet.

So far, the data lake is only being used for airlines. GE has analyzed 15,000 flights for its customers, and plans to scale up to 10 million flights by next year. But the method will eventually be expanded to all its major industries, including energy.

The possibilities are endless, given GE’s deep reach into energy infrastructure. For example, as the company moves into the grid analytics space and sells software for managing outages, understanding customer behavior and monitoring demand, GE could potentially gain an advantage by offering this kind of analytics service. The data-crunching method also has potential for better understanding energy use in commercial buildings, manufacturing plants and within the home.

Stephen Lacey is a Senior Editor at Greentech Media, where he focuses primarily on energy efficiency. He has extensive experience reporting on the business and politics of cleantech. He was formerly Deputy Editor of Climate Progress, a climate and energy blog based at the Center for American Progress. He was also an editor/producer with Renewable Energy World. He received his B.A. in journalism from Franklin Pierce University.

“General Electric is involved in nearly every area of the clean energy market: solar engineering, wind manufacturing and development, LED lighting, distributed natural gas, metering and submetering, and grid analytics are just some of its major touch points.”

Note that while “distributed natural gas” is a GE line of business, it would not be part of GE’s involvement in the clean energy market. As it involves natural gas combustion and emissions, it would more properly be classed as part of GE’s involvement in the dirty energy market.


gualy (Guest), August 22, 2014 00:28

Sometimes I have heard that information is power, and in this case of GE’s data lake, we can see that it can also be used for our benefit, as in aviation: General Electric is making each flight safer, because GE monitors and analyzes many flights, and specifically the airplanes’ turbines, to correct possible failures or repair the turbines on site for the safety of the passengers.

In the case of energy management, we could see many benefits, because with the correct information, companies and governments could make better decisions about investing in new conventional or renewable power generation projects.

I think Mexico can take advantage of GE’s “fishing pole,” because Mexico will build many power generation megaprojects, and the Mexican government will need much support to make decisions that really benefit the Mexican people. Good report. See you.