A Petrophysicist’s Perspective on Hadoop-based Data Discovery
Along with the Hortonworks Oil and Gas team, I have been working closely with Laurence Sones, senior petrophysicist, to understand how Hadoop-based Data Discovery is enabling Geologic and Geophysical (G&G) teams to improve decision-making across their assets. What follows is a Q&A session with Laurence discussing his perspectives on data discovery.

Kohlleffel: Laurence, you have a wealth of experience in the oil and gas industry. Please discuss your background and some of the roles that you have taken on.

Sones: Sure, I began as a field engineer with Schlumberger in the logging and perforating area. Following that, I moved into wireline sales and then did a stint as a Service Quality Manager for both open hole wireline and cased hole wireline. In addition, I was a well placement engineer for Schlumberger and then moved to Anadarko performing geosteering. Lastly, I was with Forest Oil as a petrophysicist.

Kohlleffel: Can you discuss both the geological analysis (surface, subsurface, and core drilling) and geophysical analysis (seismic, gravity, magnetic, electrical, geochemical) processes? How does working with a broad set of data allow you to make a decision or recommendation regarding high potential areas?

Sones: My foundation for effective log analysis comes from the time I spent in the field, learning to identify high-quality or poor-quality log data based on how it was collected and to understand all of the various parameters at the time of acquisition. The time that I have spent in the industry has also given me a clear view of the people who use the data and the applications that they use.

Initially, we review production for an area, and type curves for production are developed and reviewed by reservoir engineers and geologists. Next, both geologists and petrophysicists review well logs and establish a basic petrophysical model based on rock type, fluid type, etc. to find a good correlation between the properties recorded on logs and the actual physical properties of the rocks. With this, water saturation, effective porosity, and net pay can be calculated and the acreage can be graded on those properties. Reservoir modeling may be done with volumetrics that incorporate petrophysical properties and production data, which lets us correlate production to the petrophysical properties.

Looking at the geological structure is also part of standard analysis of a play – mapping formation tops and fluid contacts, if those are present – and we also review seismic data when available.

Lastly, we do core analysis, which incorporates multiple datasets including rock composition, water/oil saturation, porosity, permeability, SEM, geological description, and mechanical properties, all of which can be used for an advanced petrophysical model. Typically, we can develop a strong correlation between the recorded log properties and the actual properties of the rock.

Kohlleffel: What are some of the manual processes involved in working with all of these disparate datasets?

Sones: The most recent field that I worked was manually intensive: 90% of my time was spent on QC of logs, data quality review, and properly identifying the curves in the files to ensure that the proper information was being used to perform the analysis. The final 10% of my time on a project was available for the fun part, the analysis.

All of the manual tasks take a significant amount of time with any size field. Commonly, a geophysicist might start by checking depth references for every single well on a public site for verification to ensure that the highest quality data is being used and that at the end a high quality result is produced.

Kohlleffel: Operators are under tremendous pressure to reduce costs and this is putting enormous organizational and financial pressure on existing models. Can you comment on the feedback we are getting from operators that the use of Hadoop as an economical platform for advanced analytics is key to their ability to deliver an optimized cost model?

Sones: Producers are looking for any way to reduce costs, and I see multiple ways to do this with advanced analytics driven by Hadoop, whether it’s effectively checking your depth reference or having a powerful sensitivity analysis for driving cost down and understanding where you should be drilling. I am seeing an increased number of techs hired in some places to manage databases and data sources, but it’s not a replacement for optimization and what advanced analytics with Hadoop can bring to the table. It’s really just throwing more manpower at the problem versus applying better technology, which could benefit techs, geologists, engineers, and petrophysicists.

Kohlleffel: Laurence, please expound on the datasets that are critical to a G&G organization – can you go through those in more detail and describe the challenge in getting a single view or comprehensive map that includes the relevant data?

Sones: I’m glad to. The primary dataset is log data, which in many cases is recorded multiple times for different measurements on the same well. Log data establishes the foundation for analysis for geologists and petrophysicists. Production data is also critical, and it can reside in multiple sources; generally it’s pulled into a primary geological analysis software application.

In addition to that, the seismic data is almost always on a separate platform being used by the geophysicists. I’ve mentioned some of the data in-house, but you also have all of the legal information that’s in the public domain on the state commission websites – legal location, legal well name, API numbers, depth records, elevation, etc.

Kohlleffel: How do you feel that Hadoop is helping companies across the industry address this proliferation of data silos as well as the manual QC process?

Sones: Hadoop is well suited for the data discovery required by the G&G community because as a centralized data platform it allows us to ingest all information for a single view. This includes structured data such as production and completion records, semi-structured data such as well logs, and unstructured data such as spreadsheets and PDFs. From this wide variety of datasets, we create our own “path” to the data and combinations of datasets that we feel are most important for our analysis. That’s important, because not being constrained to a prescribed path allows for complete freedom of data discovery by an individual, and allows us to ask questions that we hadn’t considered before. We may want to perform focused analysis on a small subset of wells or perform analytical processing on an entire basin or reservoir. Hadoop makes either scenario possible.

Furthermore, we can use the end user visualization tools that we are already familiar with to do sensitivity analysis in order to get the clearest picture of what is driving production. Some of the new areas that I am exploring with Hortonworks include leveraging Hadoop to bucket curves into analysis classes, auto zoning, metadata correction, machine learning for enhanced sensitivity analysis and data exploration, and batch processing of LAS files for various conversion metrics.
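
As a rough, hypothetical illustration of the kind of discovery work Laurence describes, the PySpark sketch below joins well-log curves with production records for a quick sensitivity check. The file paths, column names, and the upstream LAS-to-table conversion are assumptions made up for the example, not part of HDP or any specific product.

```python
# Illustrative sketch only -- paths, column names, and the upstream LAS flattening
# are hypothetical assumptions, not an actual Hortonworks or HDP API.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("gg-data-discovery").getOrCreate()

# Structured data: monthly production records landed in the data lake.
production = spark.read.csv("/data/lake/production/monthly.csv", header=True, inferSchema=True)

# Semi-structured data: well-log curves, assumed already flattened from LAS files upstream.
logs = spark.read.parquet("/data/lake/well_logs/curves.parquet")

# Roll production up to one row per well, then join on the API well number.
cum_prod = production.groupBy("api_number").agg(F.sum("oil_bbl").alias("cum_oil_bbl"))
joined = logs.join(cum_prod, on="api_number")

# First-pass sensitivity check: how strongly does effective porosity track cumulative oil?
print(joined.stat.corr("effective_porosity", "cum_oil_bbl"))
```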

Kohlleffel: How important is it to you to have a 100% open source approach versus a partial open approach?

Sones: It’s an interesting question, because I’ve repeatedly seen how critical it is to be able to adopt innovative aspects and new features of software quickly, and I believe that Hortonworks’ 100% open source approach to Hadoop gives the Oil and Gas industry a distinct advantage over other approaches.

Kohlleffel: Laurence, I want to thank you for your time today and we look forward to future discussions.

Learn More About Hortonworks Data Platform for Oil & Gas

Boosting Telecomm Customer Experience with Hadoop and Customer Behavior Graphs
Communication service providers aim to enhance customer experience and build strong and long-lasting relationships with their customers. This has become increasingly difficult as customer interactions now occur across many channels. Hence, it’s important to understand customer behavior across all channels to create the best experience for each individual. Join us on August 5 for a webinar with Hortonworks and Apigee to learn more.

In today’s guest blog post, Sanjay Kumar, General Manager, Telecommunications at Hortonworks, and Sanjeev Srivastav, Vice President, Data Strategy at Apigee, discuss how service providers can capture and visualize customer behavior as a graph connecting the interaction points such as IVR, chat and call events, and combine it with network data to predict future call or chat patterns. For such analysis, many telcos have invested in big data infrastructure such as the HDP platform, and some have already begun to harness the myriad of data gathered from multiple internal systems in data lakes. Furthermore, there is the realization of the importance of assembling a single, 360-degree view of the customer, with customer-focused data sets.

Telco companies’ next challenge is to answer questions such as:

What is the likely reason for the next support call?

What post-call actions will increase customer satisfaction?

Which support requests are serviced across IVR and chat?

Where can proactive outreach be useful?

Means of answering these questions must be grounded in the reality of changing customer behavior and operational metrics. The examples presented in this blog used the HDP platform coupled with Apigee Insights software to provide not only reliable predictions but also tools usable by business managers. This allows the businessperson to understand how customers in a similar context have engaged with the company, and how effective specific interventions have been.

Understanding customer journeys

When telco subscribers visit a support site or use a care app, call customer support, experience an outage, or express an opinion on a social network, they are taking steps on a “customer journey,” which represents a sequence of events experienced by a group of customers; in other words, it represents various paths in the overall customer behavior graph.

Conventional approaches analyze these journeys in two ways:

Reducing customer journeys to a set of customer attributes, say, those who called more than 3 times per day, along with their demographic characteristics

Using algorithms based on those customer attributes to predict the actions of customers

These approaches are limited in their ability to answer questions like these:

When the number of escalations is high, what was the experience of customers leading to the escalation?

What events generate significant follow-up support activity, such as repeat calls or a technician dispatch?

At what point, while using self-care channels, do subscribers give up and drop into the call center?

The ability to ask questions like these (which involves walking forward and backward in time, across events from multiple channels of data) and to build predictive models based on these customer journeys provides a foundational layer for the analytics capability.

Behavior graphs on Hadoop data lakes in production

Once the journey analysis and predictive models are in place, the business and IT teams need to deploy the solution at scale. In this context, Hortonworks and Apigee have partnered to bring to market the Apigee Insights technology, a unique method of building aggregate behavior graphs on the Hadoop platform. Here we’ll describe work done for a communications provider, using data including customer call center records, operations data, and the network quality data.

The picture above shows a schematic graph that represents customer behavior. Note the sequential placement of events in the behavior graph. Such representations are used to compute the likelihood that a specific customer would be a repeat caller in, say, a 21-day window. Furthermore, based on the journeys experienced by a significantly large population of other customers, the system also provides scores for the likely reason behind that next call.

The specific business unit in question here receives about 100,000 calls a day, from a customer base of approximately 10 million. Over time, one can imagine the almost infinitely large number of paths that can be generated to represent the customer engagement. By applying certain threshold parameters that govern the amount of activity represented by the various possible paths, one obtains a set of “interesting” paths—but even those can number in the millions. It takes the power of big data platforms like HDP 2.x to manage the storage and computational power needed to handle this variety and volume of data.
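
As a rough illustration of the kind of computation involved, the sketch below builds per-customer call sequences and flags repeat calls inside a 21-day window. The event schema, paths, and threshold are hypothetical stand-ins; this is not the Apigee Insights API, just a minimal PySpark rendering of the idea.

```python
# Hypothetical sketch -- the event layout (customer_id, event_type, event_ts) and the
# 21-day window are illustrative assumptions, not Apigee Insights or HDP interfaces.
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("behavior-graph-sketch").getOrCreate()
events = spark.read.parquet("/data/lake/care_events")   # IVR, chat, call, outage events

calls = events.filter(F.col("event_type") == "support_call")
w = Window.partitionBy("customer_id").orderBy("event_ts")

# For every call, look ahead to the customer's next call and measure the gap in days.
calls = (calls
    .withColumn("next_call_ts", F.lead("event_ts").over(w))
    .withColumn("days_to_next", F.datediff("next_call_ts", "event_ts"))
    .withColumn("repeat_within_21d", (F.col("days_to_next") <= 21).cast("int")))

# Share of calls that were followed by another call within the 21-day window.
calls.agg(F.avg("repeat_within_21d")).show()
```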

In the screenshot below, you can see some examples of journeys that end at an “escalation” event, experienced by about 226,000 customers. Clues from these journeys and details of the customers who experienced these journeys are directly useful in effecting change, say, by increased training of support representatives or by proactive outreach to customers.

Furthermore, one can also ask the system for the exhaustive list of paths that customers potentially took between two events of interest, noting the relative fractions of customers who followed the various journeys. Such a bird’s-eye view provides an understanding of customer behavior that is very hard to glean simply from a set of customer attribute values.
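
Continuing the hypothetical events DataFrame from the previous sketch, a simplified version of that path query might collapse each customer's ordered events into a path between two events of interest and count how many customers follow each one. The event names ("outage", "escalation") and data layout are again illustrative assumptions.

```python
# Simplified sketch of counting distinct paths between two events of interest;
# reuses the hypothetical `events` DataFrame and `spark` session from the sketch above.
from pyspark.sql import functions as F

journeys = (events
    .withColumn("step", F.struct("event_ts", "event_type"))
    .groupBy("customer_id")
    .agg(F.sort_array(F.collect_list("step")).alias("steps"))
    .withColumn("path", F.expr("transform(steps, s -> s.event_type)")))

def slice_between(path, start="outage", end="escalation"):
    # Keep only the sub-path from the first `start` event to the next `end` event, if both occur.
    if path and start in path:
        i = path.index(start)
        if end in path[i:]:
            return "->".join(path[i:path.index(end, i) + 1])
    return None

paths = (journeys
    .withColumn("journey", F.udf(slice_between)("path"))
    .filter(F.col("journey").isNotNull()))

# Relative fraction of customers following each distinct journey between the two events.
total = paths.count()
paths.groupBy("journey").count().withColumn("fraction", F.col("count") / total).show(truncate=False)
```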

Machine learning for customer behavior

Besides the need to visualize past customer behavior, it is critical that predictive models keep pace with changing trends. Models that use a combination of customer profile elements and event activity data are better suited for the twin tasks of predicting the likelihood of repeat calls and the likely reason of the next support request.

Apigee’s behavior-based predictive models, which don’t require the conversion of event data into customer profile data and which preserve the information inherent in the event sequences, provide excellent predictive power.

How do you take action?

Change is inevitable, and data-driven insights help companies better manage change. For example, data about the intensity of customer activity on specific paths quantifies the impact of changes to that particular path or process.

By knowing a customer’s recent journey segment (she experienced a service outage, has called into a support center, and her service just got restored), the company can delight her by sending an automated outbound SMS.

Access to customer journeys at scale provides business analysts with a feel for the problems and opportunities for improving the customer experience. Additionally, if the predictive modeling is done using data structures that represent the journeys, analysts also get a sense of customer behaviors that most influence the predictive modeling; this is hard to achieve from a black-box predictive model.

Lastly, once predictive models have been developed, the overarching goal of improving customer experience can be reduced to specific actions that are appropriate for specific groups of customers. Call center agents or supervisors can execute these actions via existing or new applications; an intelligent API platform can further streamline the “last mile” problem of delivering predictive model scores to the applications that need them.

Without needing “features” based on insights shared by business owners, the model has high precision for identifying repeat callers (96% correct for the top 1% of the list of customers, sorted by their propensity of repeat calls); for each of the callers, it also predicted the top few reasons likely associated with the next call (actual reason of the call was ranked the number one reason for 61% of the top 1% callers).

It’s about journeys, models, and APIs

As digital interactions dominate the enterprise landscape, data about how customers engage with companies is being stored in Hadoop data lakes, for it provides critical business insights. Apigee Insights provides a powerful method of representing such engagements as behavior graphs while taking into account the raw data about the time and sequence of activities. This representation is used for descriptive analytics (examining historical data) and predictive analytics. The last mile of leveraging predictive analytics to influence or enable customer engagement is facilitated using Apigee Edge. Customer care decisions can thus be much more targeted and likely to have a positive impact.

Spark on HDInsight, Partner of Year Award Highlight WPC 2015
The Spark lit in Azure HDInsight last week was just the beginning of the energy and momentum we saw this week around the partnership at Microsoft’s Worldwide Partner Conference. During the conference keynote, Microsoft CEO Satya Nadella affirmed, “Together with our partners we are transforming the business world for the next era.”

This year’s big announcement of Apache Spark being brought to HDInsight was the latest Big Data announcement delivered by Microsoft and Hortonworks. The deep partnership between the two companies started in 2011, initially bringing HDP to a native Windows Server environment, followed by Microsoft’s Azure HDInsight, Hadoop as a service based on the core HDP platform. Our tight partnership continued with collaboration to extend HDP to Azure IaaS to meet the demands of our customers’ shift to a hybrid architecture. Today the market reaps the benefits of the flexibility and interoperability of HDP on-premises, on Azure and as-a-service with HDInsight.

Carrying the excitement and energy forward from the big Hadoop and HDInsight announcements, Hortonworks started the week being honored with awards from Microsoft’s Enterprise Partner Group for “Cloud Partner of the Year” in both the West and Central regions of the United States. Hortonworks Data Platform and Hortonworks Sandbox are offered on the Azure Marketplace. HDInsight, built with the core HDP platform, is one of the top 10 apps driving Azure usage. The momentum is apparent in these regions of the United States, where our joint customers are taking advantage of the benefits of Azure and Open Enterprise Hadoop. With no up-front costs or hardware investments and the flexibility and scalability of Azure, customers can spin up a Hadoop cluster in minutes, access the powerful cloud analytic services available in Azure, and transform their business. We are proud both to receive these awards and to celebrate our joint field efforts in building wins together such as Pier 1, Noble Energy and Rockwell Automation.

On day 2 of WPC, Hortonworks RVP of West and Central Sales, Rick Turco, had the pleasure of joining Stephen Boyle, Microsoft VP of US Enterprise Services Partners, in his keynote session to share where customers are going with Hadoop and Azure technologies in the market and why. Opening the session with a video from our joint customer, we learned how they used HDInsight to transform their business. Rick and Stephen had a valuable Q&A session, helping the partner community understand how they too can help their customers transform their business with Hortonworks Open Enterprise Hadoop and Azure.

Keying off Stephen’s keynote, we also jointly conducted SI, ISV and industry briefings, round tables and strategy sessions to get to the next level of fidelity and further enable the partner community.

Wrapping up the conference, the US Enterprise Partner Group briefed partners on its FY plans for supporting the vision and calls to action from Scott Guthrie’s and Kevin Turner’s keynote on Building the Intelligent Cloud and seizing the moment of a predicted 14% annual cloud market growth by 2018. Having made the US EPG’s top 3 list of go-to partners, we at Hortonworks are very excited to continue our strong partnership with Microsoft and help drive customer innovation and ensure success as enterprises move to the cloud and big data with Azure and HDInsight.

Hortonworks Momentum in Oil and Gas Heats Up

It was a frenetic month of June with Hadoop Summit in San Jose and the GDS International Oil and Gas CIO Summit the following week in San Antonio. Both events were well attended and very positively received by Hortonworks customers and O&G companies beginning to evaluate Hadoop for new advanced analytics applications.

Hadoop Summit was well attended by a large contingent across all segments of the industry, and we saw 500% growth in attendance from oil and gas customers over last year.

Some O&G highlights from the event included:

A keynote panel discussion on “Transformational Stories of Hadoop in the Enterprise” featuring Anil Varma, Schlumberger Vice President Data and Analytics, along with Verizon, Home Depot, Symantec, and Rogers Communications.

Hadoop at Work discussions by Frank Besch, Director of Business Innovation with Noble Energy on “Becoming a Data Driven Oil and Gas Enterprise with Advanced Analytics and Hadoop” and Vish Avasarala, Global Head of Advanced Analytics with Schlumberger on “Big Data Challenges in the Energy Sector”.

Video interview participation for the Oil and Gas industry along with a number of other industries

Key O&G stakeholders were able to discuss their challenges and vision in a 1:1 format with Hortonworks founders, executive management, and lead engineers

The week after Hadoop Summit, the GDS International Oil and Gas Summit featured a unique format coupled with an experienced group of delegates eager to understand more about Big Data and Analytics (which was a focus track at the conference).

Hortonworks participated with a select group of vendors and delegates throughout the three-day Summit in the following areas:

Focused 1:1 sessions with 14 of the delegates discussing challenges facing the industry and their companies and how we can work together to drive cost efficiencies across the lifecycle of a well, increase cash flow, and add to overall shareholder value. Consistent themes of investment interest included single view of an asset, industrial IoT, predictive analytics, and G&G advanced analytics.

The Hortonworks Oil and Gas team collaborates with our customers on a simple formula for success with Hadoop. We execute a business value workshop, identify potential pilots with high value and quick execution, and deliver and operationalize targeted business value to a specific business unit or asset team. We accomplish this over a 6-week period with the goal of helping our customers jumpstart their journey to driving decisions with data in Open Enterprise Hadoop.

For the balance of the summer, we are continuing existing discussions and starting new programs of work while looking to ensure that all of our customers have a world-class experience partnering with Hortonworks.

The View from the Hadoop Summit 2015
Earlier this month, Hortonworks had the pleasure of joining Yahoo! in hosting the 8th Annual Hadoop Summit, the leading conference for the Apache Hadoop community. Summit is always an important and exciting experience, bringing together thought leaders, technologists, and data specialists from throughout the community to explore and advance the art and science of Big Data.

This year’s event came at a pivotal time for Hadoop and Hortonworks, with news about Open Enterprise Hadoop and the launch of the newest version of Hortonworks Data Platform (HDP 2.3™) poised to transform the way large organizations in every industry process data.

I’d like to congratulate all the participants and attendees who joined us in San Jose, and highlight a few of the things that made this year’s Summit so special.

First of all, we’re thrilled with the enthusiastic crowds who converged at Summit. The event drew more than 4,000 people—a 30 percent increase over last year—and I can’t count the number of people who sought me out to tell me that it was the best industry show they’d ever attended. (That same energy from the open Hadoop community was on proud display in the 40 intrepid individuals who rose before dawn for the latest in our series of Summit Bike Rides).

This year at Summit we heard the clear voice of Hadoop practitioners who are using the platform to transform their businesses. Nearly half of our 165 sessions were led by end-users, vivid proof that Hadoop is both pervasive and enterprise-ready. On Thursday morning, leaders from our customers Home Depot, Rogers, Symantec, Schlumberger and Verizon shared their transformational stories on the customer panel.

We also heard from many of our partners in the ecosystem—the Big Data innovators doing important work with Open Enterprise Hadoop in every part of the industry. In addition, more than forty analysts came to hear firsthand about our launch of HDP 2.3 and why it is the most transformational Hadoop distribution yet.

The excitement is easy to understand. With the launch of HDP 2.3, we’re accelerating the traction of Hadoop in the enterprise by giving businesses what they need to drive transformational outcomes—specifically, these improvements in ease of use, enterprise-readiness, and proactive support:

Breakthrough User Experience – HDP 2.3 eliminates much of the administrative complexity around Hadoop and improves developer productivity with user interfaces and tools that make Hadoop development and operations easier.

Enterprise Readiness – New capabilities for data encryption and data governance help IT organizations meet security and compliance requirements, while operational simplification for both on-premises and cloud-based deployments integrates Hadoop seamlessly into today’s hybrid environments.

Proactive Support – We’re especially pleased to be able to introduce Hortonworks SmartSense™, which adds proactive cluster monitoring and delivers critical recommendations through our world-class support subscriptions for Hadoop. This service uses Hadoop predictive analytics to optimize our customers’ investments in HDP.

Our day-one keynote focused on the enterprise-readiness of Hadoop today. Hortonworks CEO Rob Bearden explored the state of the market and the ways businesses are using Open Enterprise Hadoop to cope with the incredible volume and diversity of data flooding into their organizations. Not just cope—thrive, as guest speakers from Microsoft and Forrester Research talked about the new possibilities Open Enterprise Hadoop is bringing into reality. Managing and processing unstructured data more effectively is important and valuable, but when you start talking about integrating Hadoop within your applications, then you’re talking about true disruption.

On day two, we invited key Hortonworks partners to the stage, including Microsoft, EMC, SAP, Teradata and HP Software, to talk about complementary technologies that make Hadoop adoption-ready at enterprise scale. The power of Hadoop lies in its open source origins and development. When enterprise customers combine that with infrastructure and systems from the world’s top technology providers, they get the best of both worlds in complete solutions to drive the digitalization of their businesses.

In one of my favorite moments from Summit, author David Epstein used the “Moneyball” era of sports to illustrate how a rising tide of data doesn’t necessarily translate into success unless you’ve got the tools and expertise to discover the essential truths hidden within Big Data.

Day three focused on Hortonworks customers—perhaps the most concise proof of the enterprise-readiness of HDP. Within the Fortune 100 alone, we serve half the banks, more than two-thirds of the retail companies, and three-quarters of telecom companies. Move out to the Global 1000 and you’ll find many, many more organizations using our technology.

Of course, to thrive in the mainstream, a technology must meet core enterprise requirements for rigor that any mature organization expects, in order to weave Hadoop into their established practices. It was inspiring to hear how we’ve been doing from customers including United HealthCare Group, GE, Webtrends, TrueCar, Verizon, Schlumberger, Home Depot, Rogers Communications, and Symantec. Here is some of what they shared on stage (which you can watch here, along with the rest of the keynotes):

We see it as 50 billion machines getting connected in roughly the next 5-10 years. As that happens, we’re going to see the same level of transformation and business model innovation as we saw when a billion people got connected through the consumer Internet…With that comes the need for industrial data management at scale…To do it well, we believe you need an open standard approach and we are excited about the role that Hadoop plays in that as part of the ecosystem.

–Vince Campisi, CIO
GE Software

We went from being able to spit out all this vehicle intelligence once a day on the core vehicles that we cared about to being able to do that every thirty minutes across the entire set of vehicles that we’re bringing in…Our goal is to acquire everything—literally every piece of data that we can in the automotive industry and synthesize it within 15 minutes…

…Is Hadoop going to work in the enterprise?…We talk about data governance and master data management and security. Of course. It could do it two, three, four years ago, it just was harder than it is going to be in something like HDP 2.3.

–Russell Foltz-Smith, General Manager and SVP for Data Products
TrueCar

With a wireless company as big as we are—over a 100 million customers—any small incremental change in churn percentage means significant changes in revenue…We were able to increase the scoring of our churn modeling by a factor of 20% in predictability…

In partnership with Hortonworks…we are now ingesting over 250 billion records a day…we have about 1,000 nodes in play today, driving value.

–Rob Smith, Executive Director
Verizon Wireless

Symantec processes security logs globally for just about everybody…The security system itself used to get backed up and there would be cases where, on average, the time to detection was [3-4 hours]…On the Hadoop ecosystem, we’re able to reduce that latency down to seconds.

…Going back, I would say, kill the fear. Just kill the fear. When it comes to figuring it out: smart people, cool tech, figure it out. Haters to the left, kill the fear, just go for it. Get it started and go.

Immediately after the customer panel, Geoffrey Moore—yes, that Geoffrey Moore—offered his own take on the kinds of buying decisions customers are making today and how they compare with the roadmap he laid out at Summit just three years ago. Geoff declared that Hadoop has crossed the chasm, and he discussed important use cases for mainstream adoption.

If you’re seeing a recurring theme here, it’s that Hadoop has never been more ready for business. Hortonworks is proud of our leadership role defining the emerging category of Open Enterprise Hadoop. With the amount of data in the world set to increase twentyfold over the next five years—most of it unstructured—enterprises must find new ways to manage and analyze it at scale. We never stop working to deliver the capabilities businesses need to drive transformational outcomes, and it was thrilling to see it all live and on stage at Summit. If you made it to San Jose, I hope you enjoyed it as much as we did. If you didn’t—or if you’d just like to relive the magic—you can find our keynote videos here.

Is it too soon to start looking forward to Hadoop Summit 2016?

More Summit and Hadoop Highlights

Oracle’s Big Data Integration Offering Certified on Hortonworks HDP 2.2
Oracle and Hortonworks continue to work on bringing the latest ELT and real-time transactional data streaming capabilities to the Hortonworks Data Platform (HDP). Recently Oracle completed certification testing for HDP 2.2 for both Oracle Data Integrator and Oracle GoldenGate for Big Data, both integral parts of the Oracle Data Integration product portfolio. These releases certified on HDP 2.2 are the latest in the series of advanced Big Data updates and features that Oracle Data Integration is rolling out for customers to help take their Hadoop projects to the next level of enterprise integration.

Oracle Data Integrator (ODI) for Big Data helps transform and enrich data within the big data reservoir/data lake without users having to learn the languages necessary to manipulate them. ODI for Big Data generates native code that is then run on the underlying Hadoop platform without requiring any additional agents. ODI separates the design interface, where logic is built, from the physical implementation layer, where the code runs. This allows ODI users to build business and data mappings without having to learn HiveQL, Pig Latin and MapReduce.

Oracle GoldenGate is a leader in real-time transactional data replication. With the release of Oracle GoldenGate for Big Data, also certified on Hortonworks HDP, Oracle has extended the capabilities of GoldenGate beyond relational databases and into the Hadoop ecosystem. Oracle GoldenGate for Big Data provides real-time transaction streaming to Apache Flume, Apache HDFS, Apache Hive and Apache HBase. With its easy-to-use real-time streaming solution, Oracle GoldenGate for Big Data enables IT organizations to quickly integrate with big data systems without extensive training and management resources, and facilitates better insights and timely action. The product also includes Oracle GoldenGate for Java, which enables customers to easily integrate with additional big data systems, such as Oracle NoSQL, Apache Kafka, Apache Storm, Apache Spark, and others.

In addition to the latest ODI and GoldenGate certifications, the Hortonworks Data Platform 2.2 is certified with the Oracle Big Data Connectors.

We’re delighted to be working with Oracle to bring these certified solutions to the market, helping to assure customers that these products are tested and integrated together.

Cisco and Hortonworks Team-Up to Optimize Your Enterprise Data Warehouse
As businesses continue to create data at an ever-increasing pace, data architectures are strained under the loads placed upon them. Data volumes continue to grow considerably, low-value workloads like ETL consume more and more processing resources, and new types of data can’t easily be captured and put to use. Organizations struggle with escalating costs, increasing complexity, and the challenge of expansion.

This coming Wednesday, Big Data experts will look at how Hadoop is enabling a broad range of organizations to address these challenges. By moving high data volumes to Hadoop, offloading ETL processes, and enriching existing data architectures with new data for increased value, companies can dramatically reduce their costs and build a flexible, modern data architecture.

Join the Big Data and Analytics Virtual Conference organized by Cisco on June 17, in particular the Hortonworks session from 1 pm to 1:30 pm ET, and learn how Cisco and Hortonworks have teamed up to optimize the enterprise data warehouse, reduce primary storage costs, enable access to historical data, and create new analytics opportunities within the high-performance enterprise data warehouse environment.

If you are curious about more data topics, don’t miss parts two and three of this virtual event series, respectively focused on operational analytics and business analytics (September 9 and October 7). Register once, access all events.

Driving Business Transformation with Open Enterprise Hadoop
Hadoop isn’t optional for today’s enterprises—that much is clear. But as companies race to get control over the significantly growing volumes of unstructured data in their organizations, they’ve been less certain about the right way to put Hadoop to work in their environment.

We’ve already seen a variety of wrong approaches with proprietary extensions that limit innovation, fragment architectures and trade openness for vendor lock-in. Now a new consensus is forming around an emerging category that drives truly transformational outcomes: Open Enterprise Hadoop.

Hortonworks pioneered this category, and the Global 5000 is rapidly adopting its unique approach. You can see this momentum in our Hortonworks Q1 earnings announcement. We were able to achieve 200 percent growth in customers and 167 percent growth in GAAP revenue.

In fact, Hortonworks’ innovative approach to this market has been noticed by more than just the industry analyst community. Michal Katz from RBC notes in her blog that Hortonworks’ CEO Rob Bearden is among the select group of industry leaders transforming the industry with next-generation solutions.

Here’s why so many organizational leaders are making Open Enterprise Hadoop the foundation of their big data strategy.

Open Enterprise Hadoop takes direct aim at those shortcomings that hampered previous approaches to Hadoop in the enterprise. Those earlier attempts typically relied on proprietary extensions of early Hadoop projects, a branching approach that sealed them off from subsequent innovations, locked them into vendor-specific analytics, and often undermined integration with YARN, the open data operating system.

By taking that path, those Hadoop vendors surrendered much of the rapid innovation that comes from open source development, making the platform feel all too much like the legacy technologies that it was supposed to surpass.

Instead of creating their own proprietary extensions, vendors in this category rely solely on open source components and on the open community. They harness the powerful processes governed by the Apache Software Foundation and its enterprise-savvy committers—including more than 100 at Hortonworks alone (which employs the most Hadoop committers in the industry).

As a result of this very intentional strategy, Open Enterprise Hadoop solutions:

Leverage the full power of open source development. Open Enterprise Hadoop solutions remain “on-the-trunk,” so enterprises benefit from the latest community innovations as soon as they become available.

Consolidate data silos. Open Enterprise Hadoop requires that all Hadoop ecosystem projects leverage the Apache Hadoop YARN data operating system. This makes it possible for organizations to access a centralized “data lake” via multiple heterogeneous access methods, support many different users at once, and scale to deployments managing petabytes of data. Open Enterprise Hadoop also ensures full interoperability beyond the Hadoop core through the promotion of open standards for the broad technology ecosystem.

Provide robust operations, security, and governance capabilities. Open Enterprise Hadoop vendors make the platform ready to meet those enterprise standards through the work of project committers who combine enterprise savvy with a commitment to open source principles and processes.

The latest version of Hortonworks Data Platform (HDP) will be introduced this month, and it illustrates some major advances only possible with this approach of harnessing the power of the community. Those advances in HDP 2.3 include:

A breakthrough user experience. Dramatic reductions in administrative complexity and a vastly improved user experience speed time to value. Fast setup, streamlined configurations, and simple cluster formation help you get the platform up and running quickly, while real-time dashboards make it easier to maximize cluster health.

Enhanced security and governance. HDP 2.3 extends our data governance initiative with Apache Atlas. IT can use a single administrative console to set security policy across the entire cluster. Complete capabilities for authentication, authorization, and auditing support full access control and reporting. HDP can encrypt data at rest and in motion.

Proactive support. Hortonworks frees your finite engineering resources from maintaining the internals of the data platform. While you can still self-support HDP, we provide subscriptions for 24 x 7 support along with patches, updates, and other fixes to keep your critical enterprise workloads running.

What does all this mean to you? As the industry moves toward open Hadoop solutions, Hortonworks is driving the Open Enterprise Hadoop category to deliver transformational outcomes for today’s businesses.

We’re working closely with our 437 customers—and counting—to understand the needs of enterprises across all industries, and then we leverage the power of our leadership in the open source community to innovate the technology according to those priorities. And as a community, we do it faster than any single vendor ever could.

And we’re just getting started. You’ll be hearing a lot more about Open Enterprise Hadoop in the months ahead—and you’ll like what you hear.

Learn More

Matthew Morgan is the vice president of global product marketing for Hortonworks. In this role, he leads Hortonworks product marketing, vertical solutions marketing, and worldwide sales enablement. His background includes twenty years in enterprise software, including leading worldwide product marketing organizations for Citrix, HP Software, Mercury Interactive, and Blueprint. Feel free to connect with him on LinkedIn or visit his personal blog.

HDP for Manufacturing Yield Optimization in Pharma
This is a guest blog post from Jerry Megaro, Merck’s Director of Innovation and Manufacturing Analytics. Jerry established the practice of Data Excellence and Data Sciences within the Merck Manufacturing Division and now leads initiatives to transform Merck Manufacturing into a data-driven organization that enhances the company’s performance across the supply chain.

Hortonworks’ experience working with top pharma manufacturers indicates an exciting opportunity to improve manufacturing performance by proactively managing process variability. Vaccine production is a great example to consider, since it involves the use of live, genetically engineered molecules, as well as a highly technical manufacturing process with many steps.

As a result, manufacturers need to monitor hundreds of upstream and downstream parameters to ensure the quality and purity of the ingredients and vaccines being produced. Two batches of a particular vaccine, produced using an identical manufacturing process, can exhibit significant yield variances. This unexplained variability negatively impacts manufacturing yield and overall business performance.

The potential benefits for manufacturers are tremendous. IDC Manufacturing Insights estimates that manufacturers, on average, still sacrifice between 200 and 400 basis points in margin to adverse quality. IDC also estimates that world-class quality can translate to 20 to 30 percent more revenue from loyal customers over the lifetime of their relationship with the company, as well as improve conquest sales (taking customers from competitors) by as much as 25 percent. All of this adds up to a potential before-tax margin improvement—assuming a manufacturer with $10B in revenue and 20% margins—of upwards of 40%.

At Merck, we generate a huge amount of data in our manufacturing operations. But despite the huge volumes, we have wrestled in the past with two key challenges that barred us from making use of all the data to completely understand all aspects of our manufacturing processes, which can in turn improve production performance.

Neither of these challenges is specific to pharmaceuticals or to manufacturing (my colleagues in other industries face them as well).

Challenge #1: Data Silos

The first challenge had to do with data silos. Large datasets couldn’t help us improve our yields if they were siloed across many disparate systems and data repositories, which made them extremely difficult to combine in one place for a single view of our manufacturing operations.

Merck has established many highly tuned, specialized systems to gather data. Each system gives us a different view into the manufacturing process.

We gather real-time shop floor data in time series. As we make a batch, we capture data from machine sensors to monitor values like temperature trends, humidity levels, flow rates, pressure, and agitator speeds.

We retain maintenance and calibration records on our equipment. For example, a specialized instrument like a mass spectrometer measures a concentration of off-gas. Instrument sensor data could be useful for both real-time decisions and also for historical cause-effect analysis over many batches and many years.

Throughout the various stages of our manufacturing process, we capture many quality measures on each batch. Understanding quality data is particularly important, because just one batch lost to quality issues could cost the company one million dollars and could jeopardize supply of our medicines.

Other systems manipulate and control the manufacturing facility to maintain the conditions required for the sensitive biological processes that Merck manufacturing must control very precisely. These complex processes generate huge volumes of data in a variety of types and formats, which become siloed and ultimately trapped in disparate manufacturing, quality and maintenance systems, each with its own separate file system.

Today the Hadoop ecosystem lets us bring all of that diverse data together into one environment. I have long regarded this single view of data as the “Holy Grail of manufacturing process optimization”. We now have this aggregate view, which complements our existing underlying systems and maintains the fine-grained detail of the raw data required for end-to-end process visibility and optimization.
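
As a concrete (and purely hypothetical) illustration of what such an aggregate view can look like, the sketch below rolls the shop-floor time series, maintenance history, and batch quality results into a single per-batch table. The paths, column names, and features are assumptions made up for the example, not actual Merck systems or schemas.

```python
# Illustrative sketch of a per-batch "single view"; all dataset locations and columns
# are hypothetical stand-ins for the shop-floor, maintenance, and quality silos.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch-yield-view").getOrCreate()

sensors = spark.read.parquet("/data/lake/shop_floor/timeseries")      # temperature, pressure, ...
quality = spark.read.parquet("/data/lake/quality/batch_results")      # yield and QC outcomes per batch
maint   = spark.read.parquet("/data/lake/maintenance/equipment_log")  # calibration and repair history

# Summarize the raw time series into per-batch process features.
batch_features = sensors.groupBy("batch_id").agg(
    F.avg("temperature_c").alias("avg_temp"),
    F.stddev("pressure_kpa").alias("pressure_sd"),
    F.max("agitator_rpm").alias("max_agitator_rpm"))

# One row per batch combining process behavior, equipment state, and the resulting yield.
single_view = (batch_features
    .join(quality.select("batch_id", "yield_pct"), "batch_id")
    .join(maint.select("batch_id", "days_since_calibration"), "batch_id", "left"))

single_view.write.mode("overwrite").parquet("/data/lake/analytics/batch_single_view")
```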

Challenge #2: Limited Data Retention

Some manufacturing questions are asked on a batch-by-batch level, but others need to be asked over a large number of batches that span years. Two economic factors in existing data technologies obstructed our desire to retain all operational data for long-term analysis.

It is expensive to store each unit of data in the storage technologies that we’ve been using for years. Ours is a highly measured, intensely scrutinized and regulated process. We already know which data are most valuable to retain, so it makes sense to pay a higher average cost to store that data, but there is far more data whose value is uncertain. Moreover, data of low value today may grow in value as our understanding of our processes changes. The point is this: the cost of storage can constrain the amount of additional data that we would like to capture or retain electronically after a batch has been produced and shipped. We can retain some paper records by buying more cabinets, but those can be very time-consuming to search.

Another cost driver was the need to transform multiple sources of raw data into the structure and format required by our existing storage platforms. We call this constraint “schema on write”. This approach requires that you know the questions to be asked of the data before putting the data into the correct schema.

Hadoop has a “schema on read” architecture that allows us to capture, store and bring together a huge variety of data in one shared environment, without first having to go through a costly and labor-intensive process to create a schema for this data in advance. The questions that we will want to ask in the future will then determine the data we seek. And with Hadoop, we have far more data to explore, in a shared environment where we can join it in various ways to rapidly answer new questions that will help us improve our processes.

Because of schema on read we can now create new data sets that never existed in their natural habitats, and we can keep those as long as we need them.
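As a rough sketch of what schema on read looks like in practice (the file paths and column names below are assumptions for illustration, not our actual layout), raw files can be landed in Hadoop untouched and a structure applied only when a question is asked, for example with Spark:

```python
# Schema-on-read sketch; HDFS paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("schema-on-read-sketch").getOrCreate()

# The raw files were landed as-is; structure is applied now, at read time.
sensors = spark.read.json("hdfs:///raw/shop_floor/sensors/")                  # time-series readings
quality = spark.read.option("header", True).csv("hdfs:///raw/lims/quality/")  # batch quality results

# Join data that used to sit in separate silos to ask a brand-new question:
# how does average fermentation temperature relate to final yield, per batch?
answer = (
    sensors.groupBy("batch_id")
           .agg(F.avg("temperature_c").alias("avg_temp_c"))
           .join(quality.select("batch_id", "yield_pct"), on="batch_id")
)
answer.show()
```

Nothing about the question had to be known at the time the data was stored.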

Challenge #3: The High Cost of Testing Hypotheses in the Real World

Another advantage of having all of the data together is that we can investigate some of our intuitions without having to run an experiment in a physical environment.

For example, we had a belief that we were diluting ingredients by washing them with a water chase at a certain stage in the process. It would have been too risky and costly to test that hypothesis on the shop floor. With Hadoop we were able to collect all of the historical data to see if that hypothesis was true, without having to conduct the experiment in the plant. (The data showed that there was no effect.)
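A hedged sketch of that kind of retrospective check is shown below; the table, columns and dilution measure are assumed for illustration and are not the actual analysis we ran.

```python
# Retrospective hypothesis check, sketched with hypothetical columns:
# did batches that used the water chase show a diluted ingredient?
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("water-chase-check").getOrCreate()
batches = spark.read.parquet("hdfs:///curated/batch_history/")   # assumed curated history

comparison = (
    batches.groupBy("water_chase_used")    # True / False per historical batch
           .agg(F.avg("ingredient_concentration").alias("avg_concentration"),
                F.count("*").alias("n_batches"))
)
comparison.show()   # similar averages in both groups would suggest no dilution effect
```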

Solution: Yield Optimization with Hortonworks Data Platform

We can only answer important questions about our highly variable, biological manufacturing process if we have enough data across that entire process. Now with Hadoop, we can ask the important questions, identify systemic patterns in the data and take advantage of those patterns by improving our processes. This valuable capability has enabled us to identify variables that have the greatest impact on product yields.

Here are some of the questions asked and answered with the help of Hadoop.

How can we predict how a piece of equipment will perform?

We mine the data from our equipment maintenance system. With more data on more instruments spanning years back in time, we can establish performance profiles for individual machines and their critical components based on previously unseen patterns. These profiles can then be used to monitor streaming sensor data in real time, so that we can proactively detect emerging issues and respond appropriately. This can substantially improve the overall productivity of operations and avoid unnecessary interruptions.
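One very simplified way to picture that idea (a sketch with made-up numbers and a naive threshold, not an actual production model): build a baseline profile per machine from its history, then flag live readings that drift well outside it.

```python
# Sketch only: baseline-and-deviation check per machine; the threshold and
# example values are illustrative, not an actual equipment model.
from statistics import mean, stdev

def build_profile(historical_readings):
    """historical_readings: numeric values for one machine and metric."""
    return {"mean": mean(historical_readings), "stdev": stdev(historical_readings)}

def is_anomalous(profile, live_value, n_sigma=3):
    """Flag a live reading that falls outside n_sigma of the historical baseline."""
    return abs(live_value - profile["mean"]) > n_sigma * profile["stdev"]

# Example: off-gas concentration history for one mass spectrometer channel.
history = [0.82, 0.79, 0.81, 0.80, 0.83, 0.78, 0.81]
profile = build_profile(history)
print(is_anomalous(profile, 0.95))   # True -> worth a proactive maintenance check
```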

How can we enhance the yield of a particular protein in our fermentation process?

In manufacturing, you always want to control something. We have a lot of control levers to do that. We manage heat. We change the agitation rate. We change the rate at which we add ingredients.

Fermentation is one of the biological processes that we control. We cultivate yeast cells. During the biomass growth phase, we want the cells to generate more cells but without gene expression.

In the next phase we do want to transition to gene expression. This means shifting the cells’ energy from growing new cells into making the protein. We need data to manage and control that transition.

In a biological environment this phase can be quite variable and it depends on external factors. For example, salinity is an important condition to monitor. Oxygen may dissolve from gas to liquid at different rates because of agitation. The biology is constantly changing.

What we’ve discovered is that there’s often a proxy or measure that gives you a good indication of everything that’s going on in that batch, such as a respiratory quotient. We need to understand that and use it as a feedback mechanism.

Feedback control theory is an interdisciplinary branch of engineering and mathematics that deals with the behavior of dynamic systems and how their behavior is modified by feedback. Say you’re sitting in your house and it’s a hot day out and it’s getting warm inside. The feedback is the inside temperature, and you prefer 68 degrees.

The day is getting hotter, so you turn on your air conditioning. Now it’s 75 in the house, and based on that feedback the AC knows to bring the temperature back down toward 68. There are many algorithms that can be used for that feedback-control mechanism.
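A minimal sketch of that feedback loop, using the thermostat example (the gain and the toy “house” model are invented for illustration):

```python
# Minimal proportional-control sketch of the thermostat example; the gain and
# the toy "house" dynamics are illustrative, not a production control loop.
setpoint_f = 68.0     # preferred indoor temperature
gain = 0.3            # how aggressively the controller responds to error

def control_step(current_temp_f):
    """Return cooling effort (0..1) based on how far we are above the setpoint."""
    error = current_temp_f - setpoint_f
    return max(0.0, min(1.0, gain * error))

temp = 75.0
for _ in range(10):
    cooling = control_step(temp)   # feedback: measure, compare, act
    temp -= 2.0 * cooling          # toy model: cooling lowers the temperature
    temp += 0.3                    # toy model: the hot day keeps warming the house
print(round(temp, 1))  # settles near the setpoint (pure proportional control leaves a small offset)
```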

The same concept is useful in pharmaceutical manufacturing. Having access to all the production data allows us to perform a virtual experiment to determine the best operating regions of our process. This enhances our ability to understand how we might increase the yield of the proteins of interest.

Hadoop has also helped us with another process impediment: the speed of our analysis. For example, if a batch needed to be investigated for whatever reason, it could take us months to gather all the data that may exist in a combination of both paper and electronic formats and then aggregate it to understand what caused the issue of interest.

Now we are working towards having “curated data on tap” across the end-to-end process. Data lineage is a top priority. We know where the data came from. It will still take time to analyze it, but it might take a week to find answers or refine our analysis, rather than many months.

Looking Ahead

Now we’re working on building up our ability to analyze streaming data in real time. Ideally, we would want algorithms to analyze real-time data, match that to a profile of a “golden batch” that we have in history and then alert us if any batch begins to deviate from that ideal profile. We’re not there yet, but that’s where we’d like to be.
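A minimal sketch of the kind of check we have in mind (the golden profile values and tolerance are illustrative assumptions): compare each live reading against the golden batch at the same point in the run and raise an alert when the deviation exceeds a tolerance.

```python
# Sketch of golden-batch deviation alerting; profile values and tolerance
# are illustrative, not an actual production model.
golden_profile = {   # expected temperature (C) by hour into the batch
    0: 30.0, 1: 32.5, 2: 35.0, 3: 36.5, 4: 37.0,
}
tolerance_c = 1.0

def check_reading(hour, temperature_c):
    """Return an alert string if the live batch deviates from the golden profile."""
    expected = golden_profile.get(hour)
    if expected is None:
        return None
    deviation = temperature_c - expected
    if abs(deviation) > tolerance_c:
        return f"hour {hour}: {deviation:+.1f} C from golden batch -- investigate"
    return None

print(check_reading(3, 38.2))   # "hour 3: +1.7 C from golden batch -- investigate"
```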

This is one of the reasons that our partnership with Hortonworks is so important. They’ve already worked with many customers across different industries to implement streaming analytics, so their guidance is valuable as we plan to extend our use of their platform to further optimize our yields. We trust their advice and count on their Hadoop expertise.

HARMAN and Hortonworks Collaborate for the Connected Car
http://hortonworks.com/blog/harman-and-hortonworks-collaborate-for-the-connected-car/
Wed, 03 Jun 2015 16:07:55 +0000

Today from TU-Automotive Detroit, we announced our partnership with HARMAN, the leading global infotainment, audio and software services company.

Hortonworks and HARMAN are partnering to transform the automotive enterprise by enabling the connected car ecosystem with real-time, Internet of Things (IoT) data, insights and prognostics solutions.

The widespread adoption of connected devices is accelerating. Gartner Research expects 25 billion installed devices by 2020. Together, Hortonworks and HARMAN will offer solutions to help automotive manufacturers gain valuable insights by analyzing real-time information based on data streaming from connected cars.

Hortonworks and HARMAN Drive the Automotive Enterprise

Hortonworks Data Platform (HDP) processes sensor data from the connected car – collecting it, storing it and analyzing it. This can be used for real-time alerts on driver behavior, road safety or the need for maintenance and repairs. Additionally, HDP can provide a single view of this data to inform automotive engineers about driving behavior, safety risk and car performance across all vehicles.
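As a hedged illustration of what such a real-time alert might look like once the streaming telemetry has been decoded (the thresholds and field names are assumptions, not part of the HARMAN or Hortonworks products):

```python
# Hedged sketch of a rule-based real-time alert on connected-car sensor data;
# thresholds and field names are illustrative only.
def driver_alerts(reading):
    """reading: one decoded telemetry message from a vehicle."""
    alerts = []
    if reading["speed_kph"] > 130:
        alerts.append("excessive speed")
    if reading["brake_decel_ms2"] > 7.5:
        alerts.append("harsh braking")
    if reading["engine_temp_c"] > 110:
        alerts.append("possible maintenance issue")
    return alerts

print(driver_alerts({"speed_kph": 142, "brake_decel_ms2": 8.1, "engine_temp_c": 95}))
# ['excessive speed', 'harsh braking']
```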

With HDP, connected car data can be stored in any format for processing, and integrated easily with existing data architectures through a full range of deployment options.

Incorporating offerings from HARMAN’s Symphony Teleca and Redbend companies, HARMAN is the first tier 1 automotive supplier to enable the full range of IoT and V2X applications for the connected car – from deployment of software through over-the-air (OTA) updates, diagnostics and telematics, to big data, service management and analytics – all done securely and seamlessly.

Data Discovery for Product Improvements: Analyze driver habits and correlate them with car performance and maintenance records to give your engineers empirical insights into how to optimize existing and future models.

A Single View of Manufacturing Operations: Capture sensor data from manufacturing operations and store it forever, providing real-time and historical analysis to maximize quality yields and minimize the risk of scrap or recalls.

“Combining the capabilities of HARMAN services and Hortonworks, automakers and their suppliers will have access to a scalable platform for real-time insights, new innovative service creation and predictive analytics-based solutions that can minimize the risk of costly recalls and reduce warranty expenses,” said Sanjay Dhawan, President, HARMAN Services Division. “This is a very exciting step forward in the evolution of the connected car as we speed up the time to market with powerful functionalities that benefit the OEMs and their drivers.”

If you are attending TU-Automotive Detroit, please stop by our booth (106) or Symphony Teleca’s booth (C112) to see live demos of these capabilities.

If you’re not at the conference, watch this blog for news as our partnership gains momentum.

About HARMAN and Symphony Teleca

HARMAN (NYSE:HAR) designs and engineers connected products and solutions for consumers, automakers, and enterprises worldwide, including audio, visual and infotainment systems; enterprise automation solutions; and software services. HARMAN also is a technology and integration services leader for the Automotive, Mobile, Telecommunications and Enterprise markets. More than 25 million automobiles on the road today are equipped with HARMAN audio and infotainment systems.

Symphony Teleca, a HARMAN company, is a technology services company that helps customers innovate at the convergence of smart devices, software, cloud and data. Symphony Teleca’s engineering competencies – UX Design, Analytics and Agile Engineering – and its service offerings across ideation, development and integration help customers accelerate time to market.