Build Newsletter—Industrial Internet Advances Alongside Big Data

This month, the most prolific topic, whether you call it "The Industrial Internet," "The Internet of Things (IoT)," or "The Internet of Everything," has completely taken over. However, it seems that even the emergence of SkyNet will have data quality issues. We found some interesting articles detailing new methods for keeping big data clean. We finish off with news and commentary on app development, big data, in-memory data grids, open source software, and data science.

As products and services become more digitized, data-generating devices are multiplying. But there is also a new breed of information worker coming online: construction, factory, farm, equipment, and many other types of workers. Often referred to as blue collar, these occupations interact with machinery and equipment, which means they will need apps to help manage those machines. In the U.S. alone, 87 million people work in these types of jobs.

The IoT megatrend is reaching its hands into all facets of industry, getting as far down as ball bearings. Last month, the Harvard Business Review shared how sensors, powered by kinetic motion, sit inside ball bearings and transmit performance information. The manufacturer, a company called SKF Group, also provides 45 different iPad apps so that maintenance crews can monitor 8,000 different products. Half a million machines have been connected to the SKF cloud for over two years.

As a result, now we can get an iPad alert to add lubricant—or risk paying for an expensive replacement process.

The Industrial Internet: New Reference Architecture, Lack of Developer Readiness, and Revenue

Why are these traditionally slow-to-adopt B2B manufacturing companies hiring developers and architects in droves and moving quickly? Better yet, why did SiliconANGLE report that only 50% of developers are ready for IoT? Perhaps it is because there are too many different tools? Well, we techies do like to play with stuff.

Given the velocity of the Industrial Internet space, there was also compelling news in the in-memory and big data spaces, which are integrating and converging, and which are necessary to support the Industrial Internet.

With that said, our first talk explained how Apache Geode started off as GemFire, and we included a slide on the GemFire journey to date. Pivotal GemFire now has 1000+ customers, and we power portions of every major Wall Street bank, the Department of Defense, travel portals, airlines, trade clearing, online gambling, telecommunications companies, manufacturers, payroll processors, insurance giants, and the largest rail systems on earth. We also covered some of Geode’s performance, like linear scale in general and speed versus Apache Cassandra™. We then outlined the Geode roadmap—with HDFS persistence, off-heap storage, Apache Lucene™ indexes, Apache Spark™ integration, and Cloud Foundry services. The second talk covered the architecture for real-time stock prediction, which you can see here.

Last month also included the Hadoop Summit, which we sponsored too. One of the key questions was about Hadoop crossing the chasm into the mainstream. We all know there is still a road ahead to take the architecture where it needs to go. Yet, it is in process. Outside of making SQL work on Hadoop, one of Pivotal’s strongest capabilities, one of the big hurdles is going to be how companies address the quality of their data.

As a recent survey explained, developers spend as much as 90% of their time cleaning data so that it can be analyzed. The top culprits are integrating data stores, combining relational and non-relational data, and the sheer volume of data. Ultimately, this is a big waste, which is why the topic was so prevalent in the media as we gathered the info for this newsletter.

What is the impact? Another report, though it is probably pushing the premise for this metric, claims that data quality can help companies generate 70% more revenue, particularly with sales and marketing data. OK. Even if that is pushing it, we know data quality is a problem. In fact, it has been a big problem since long before data became big data; now it is just a problem at scale. For a while, some data science gurus were even telling us that data quality no longer mattered, or at least that there were different approaches to getting higher-quality information from data.

However, as pointed out in this Datanami article, it is possible to approach data quality improvement from an automation point of view. The better we can prepare and improve our source data, the more productive our data scientists and analysts will be.

Humans remain integral to most data quality improvement processes, but more and more we can augment them with automation. There have also been articles, like this one, that talk about using machine learning for self-serve data quality. Along these lines, computer scientists have been looking at evolution and genetic improvement as a way to improve source code. This article explains how such a program took 50,000 lines of source code and sped it up by 70 times; someday, the concept might be applied to data as well.
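To make the idea of cleaning automation concrete, here is a minimal Python sketch of the rule-based end of the spectrum. The record layout and the cleaning rules are our own hypothetical illustration, not taken from any of the articles above: it normalizes names, coerces dates from mixed formats to ISO 8601, and deduplicates the result so downstream analysis starts from consistent records.

```python
import re
from datetime import datetime

def clean_record(raw):
    """Apply simple, repeatable cleaning rules to one raw record."""
    rec = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
    # Collapse whitespace and normalize casing so "ACME corp" and "acme corp" match.
    rec["customer"] = re.sub(r"\s+", " ", rec["customer"]).title()
    # Coerce dates from a couple of common formats into ISO 8601.
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            rec["date"] = datetime.strptime(rec["date"], fmt).date().isoformat()
            break
        except ValueError:
            continue
    return rec

def dedupe(records):
    """Drop duplicates on (customer, date), keeping the first occurrence."""
    seen, out = set(), []
    for rec in records:
        key = (rec["customer"], rec["date"])
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out

raw_batch = [
    {"customer": "  acme   corp ", "date": "03/15/2015"},
    {"customer": "Acme Corp", "date": "2015-03-15"},
    {"customer": "Initech", "date": "2015-04-01"},
]
cleaned = dedupe(clean_record(r) for r in raw_batch)
# The two Acme rows normalize to the same record and collapse into one.
```

Rules like these are exactly what the human experts still need to write; the automation just applies them consistently at scale.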

In any event, this article by Paxata explains their perspective on the implications of data quality for IoT. The author believes IoT data will still have quality issues, particularly when it is combined with other data sources. At first, one might think that IoT data is more like log file or click stream data, but there are other opportunities. While they didn't explain the use cases in a concrete way, we believe IoT data will be combined with transactional field services data, assignment and dispatch systems, invoices and work orders, mobile and tablet-based behavioral data, video, audio, emails, texts, phone call logs, training, and more.

We think Talend’s CEO captured a great thought—the concept of managing data as a data supply chain. Would you accept poor quality parts from suppliers for a manufacturing process? Well, aren’t we manufacturing data, reports, and data science-driven outcomes? The leader of the open source data integration platform also goes on to give some unique perspectives on open source.

About the Author

Greg Chase is an enterprise software business leader with more than 20 years of experience in business development, marketing, sales, and engineering at software companies. Most recently, Greg has been focused on building the community and ecosystem around Pivotal Greenplum and Pivotal Cloud Foundry as part of the Global Ecosystem Team at Pivotal. His goal is to help create powerful solutions for Pivotal's customers and drive business for Pivotal's partners. Greg is also a wine maker, dog lover, community volunteer, and social entrepreneur.

