Introducing Availability of HDP 2.3 – Part 3

Last week, on July 22nd, we announced the general availability of HDP 2.3. The first post in this three-part blog series summarized the release's key innovations in ease of use and enterprise readiness and how those help deliver transformational outcomes, while the second focused on data access innovation. In this final part, we cover cloud provisioning, proactive support, and other general improvements across the platform:

Automated Provisioning with Cloudbreak

Proactive Support with Hortonworks SmartSense

General Platform Improvements


Automated Provisioning with Cloudbreak

Since Hortonworks’ acquisition of SequenceIQ, the integrated team has been working hard to complete the deployment automation for public clouds including Microsoft Azure, Amazon EC2, and Google Cloud. We are pleased to deliver Cloudbreak 1.0 along with HDP 2.3. Support and guidance are available to all Hortonworks customers who have an active Enterprise Plus support subscription, and we’ve published an initial set of installation and administrative documentation.

Cloudbreak is a cloud-agnostic tool for provisioning, managing and monitoring on-demand Hadoop clusters. For administrators, it provides scripting functionality to automate tasks, and through its simple user interface, administrators can manage services for any configuration.

Cloudbreak can be used to provision Hadoop across the major cloud providers: Microsoft Azure, Amazon Web Services, and Google Cloud Platform. It enables efficient use of cloud platforms via policy-based autoscaling that can expand and contract the cluster based on Hadoop usage metrics and defined policies. And it provides a centralized, secure user experience for Hadoop clusters through a rich web interface as well as a REST API and CLI shell across all cloud providers. It is fundamentally integrated with Apache Ambari and heavily leverages Ambari Blueprints, allowing users to reliably and repeatably stand up clusters based on their needs.
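Because Cloudbreak builds on Ambari Blueprints, a cluster's layout is just a declarative document. As a rough illustration, here is a minimal Python sketch that registers a blueprint through Ambari's REST API; the host, credentials, and component layout are placeholder assumptions, and Cloudbreak normally handles this step for you.

```python
# Minimal sketch: registering an Ambari Blueprint over the Ambari REST API.
# Host, credentials, and component layout are illustrative placeholders.
import json
import requests

blueprint = {
    "Blueprints": {"blueprint_name": "basic-hdp",
                   "stack_name": "HDP", "stack_version": "2.3"},
    "host_groups": [
        {"name": "master", "cardinality": "1",
         "components": [{"name": "NAMENODE"}, {"name": "RESOURCEMANAGER"}]},
        {"name": "worker", "cardinality": "3",
         "components": [{"name": "DATANODE"}, {"name": "NODEMANAGER"}]},
    ],
}

resp = requests.post(
    "http://ambari-host:8080/api/v1/blueprints/basic-hdp",
    auth=("admin", "admin"),
    headers={"X-Requested-By": "ambari"},
    data=json.dumps(blueprint))
resp.raise_for_status()  # blueprint is now available for cluster creation
```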

While Cloudbreak’s primary role is to launch on-demand Hadoop clusters in the cloud, the underlying technology actually does more: it can dynamically launch on-demand Hadoop clusters in any environment that supports Docker. Because all the setup, orchestration, networking, and cluster membership are handled dynamically, there is no need for a predefined configuration.

While we are focused initially on the public cloud deployment options and flexibility, we are excited about future possibilities of leveraging Docker and Cloudbreak to deliver the maximum deployment choice for our customers within public clouds and within their data centers.

Proactive Support with Hortonworks SmartSense™

As we’ve seen a tremendous appetite for Hadoop adoption over the past two years, we have also observed more and more mission-critical applications and workloads being placed on top of Hadoop. Not surprisingly, our rapidly growing base of customers looks to Hortonworks for guidance and best practices to minimize operational risk and make the most of their resources and staff for Hadoop operations. To meet that demand, we have developed Hortonworks SmartSense, which enriches our already world-class support offering for Hadoop by:

Providing proactive insights and recommendations to customers about their cluster utilization and its health.

Quickly and easily capturing log files and metrics for faster support case resolution.

Today, we are delivering a new user experience for SmartSense via the Ambari Views Framework, in addition to completing the integration of the corresponding recommendations through our support portal. The SmartSense View plugs seamlessly into Ambari and allows Hadoop operators to easily configure and manage how information is gathered from the cluster.

SmartSense’s capabilities, says Cheolho Minale, vice president of technology at The Mobile Majority, will allow his Hadoop team to optimize its HDP cluster’s ad performance:

At The Mobile Majority, we have been using Hortonworks Data Platform to optimize ad performance on behalf of our customers. We’re excited to look into Hortonworks SmartSense as a way to continuously optimize our HDP cluster as it grows over time.

This is only the beginning for Hortonworks SmartSense. We believe that we can share valuable insights with our customers as we gain a deeper understanding of how they use HDP, how their performance and usage peaks and ebbs, and how they optimize their clusters using SmartSense.

General Platform Improvements

Finally, I want to wrap up the HDP 2.3 blog series with a selection of improvements to key components of HDP. Each of these improvements makes a difference in terms of ease of use, enterprise readiness, and simplification. Notable enhancements in this release that we haven’t yet touched on elsewhere are described below:

Apache Hadoop 2.7.0 was released back in April, and with HDP 2.3 we are shipping Hadoop 2.7.1. The engineering work completed as part of Hadoop 2.7.1 ensures that it is stable and ready-to-use. Across its many components, here are some notable enhancements:

YARN

Non-exclusive Node Labels – applications are given preference for the label they specify, but not exclusive access (YARN-3214). This allows for greater resource sharing within a single cluster and is particularly useful for organizations where workload types shift at different times of day: nodes that are typically dedicated to interactive workloads during the day can now support nightly batch processing as well.

Fair sharing across apps for the same user in the same queue, with per-queue scheduling policies (YARN-3306). This allows the same user to submit multiple jobs within the same queue and fairly share the resources allocated to her across those jobs, as sketched in the configuration example below.
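As a rough sketch of how these two scheduler features surface in configuration, the snippet below writes illustrative capacity-scheduler.xml properties. The queue names and values are assumptions; check the CapacityScheduler documentation for your HDP version before using them. (The non-exclusive label itself would be registered with something like `yarn rmadmin -addToClusterNodeLabels "interactive(exclusive=false)"`.)

```python
# Illustrative capacity-scheduler.xml snippet generator; queue names and
# values are assumptions, not a tested HDP 2.3 configuration.
props = {
    # YARN-3306: share a queue fairly among one user's concurrent apps.
    "yarn.scheduler.capacity.root.analytics.ordering-policy": "fair",
    # YARN-3214: let the batch queue borrow "interactive"-labeled nodes.
    "yarn.scheduler.capacity.root.batch.accessible-node-labels": "interactive",
    "yarn.scheduler.capacity.root.batch."
    "accessible-node-labels.interactive.capacity": "100",
}

with open("capacity-scheduler.xml", "w") as f:
    f.write("<configuration>\n")
    for name, value in sorted(props.items()):
        f.write("  <property>\n    <name>%s</name>\n    <value>%s</value>\n"
                "  </property>\n" % (name, value))
    f.write("</configuration>\n")
```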

HDFS

Improve distcp efficiency: reduced time and processing power needed to mirror datasets across clusters (HDFS-7535, MAPREDUCE-6248)

OOZIE

Oozie is the de facto job scheduler for Hadoop. Oozie 4.2.0 adds two new actions, enabling users to define workflows that include HiveServer2 and Spark jobs. In addition, a key enhancement allows stopping (both killing and suspending) jobs by their coordinator name:

Stop (kill, suspend) and Resume jobs by coordinator name (and other filters) (OOZIE-2108)

What’s Next?

This year Hortonworks has focused on three key themes: ease of use, enterprise readiness, and simplification. We want to make HDP easy to use for all types of users. This means continuing to deliver breakthrough user experiences for cluster administrators, developers, and data workers, from data scientists to architects. We want to increase the adoption of HDP within the enterprise, and this means improving ease of operations, increasing security, and providing comprehensive data governance.

Lastly, as we bring together all the various Apache projects that make up HDP, we want to ensure that they work together as a seamless, integrated, and simple-to-use data processing platform.

We are excited about the progress we’ve made with the arrival of HDP 2.3 and we hope you enjoy the results of all of the open source developers within the community who made this possible.

A Petrophysicist’s Perspective on Hadoop-based Data Discovery
Along with the Hortonworks Oil and Gas team, I have been working closely with Laurence Sones, senior petrophysicist, to understand how Hadoop-based Data Discovery is enabling Geologic and Geophysical (G&G) teams to improve decision-making across their assets. What follows is a Q&A session with Laurence discussing his perspectives on data discovery.

Kohlleffel: Laurence, you have a wealth of experience in the oil and gas industry. Please discuss your background and some of the roles that you have taken on.

Sones: Sure, I began as a field engineer with Schlumberger in the logging and perforating area. Following that, I moved into wireline sales and then did a stint as a Service Quality Manager for both open hole wireline and cased hole wireline. In addition, I was a well placement engineer for Schlumberger and then moved to Anadarko performing geosteering. Lastly, I was with Forest Oil as a petrophysicist.

Kohlleffel: Can you discuss both the geological analysis (surface, subsurface, and core drilling) and geophysical analysis (seismic, gravity, magnetic, electrical, geochemical) processes? How does working with a broad set of data allow you to make a decision or recommendation regarding high potential areas?

Sones: My foundation for effective log analysis comes from the time I spent in the field, learning to identify high quality or poor quality log data based on how the data is collected, and then understanding all of the various parameters at the time of acquisition. The time that I have spent in the industry has also given me a clear view of the people who use the data and the applications that they use.

Initially, we review production for an area, and type curves for production are developed and reviewed by reservoir engineers and geologists. Next, both geologists and petrophysicists review well logs and establish a basic petrophysical model based on rock type, fluid type, etc., to find a good correlation between the properties recorded on logs and the actual physical properties of the rocks. With this, water saturation, effective porosity, and net pay can be calculated, and the acreage can be graded based on those properties. Reservoir modeling may be done with volumetrics that incorporate petrophysical properties and production data, correlating production to the petrophysical properties.

Looking at the geological structure is also part of the standard analysis of a play – mapping formation tops and fluid contacts, if those are present – and we also review seismic data when available.

Lastly, we do core analysis, which incorporates multiple datasets including rock composition, water/oil saturation, porosity, permeability, SEM, geological description, and mechanical properties, all of which can be used for an advanced petrophysical model. Typically, we can develop a strong correlation between the recorded log properties and the actual properties of the rock.

Kohlleffel: What are some of the manual processes involved in working with all of these disparate datasets?

Sones: Looking at the most recent field that I worked in, it was manually intensive: 90% of my time was spent on QC of logs, data quality review, and properly identifying the curves in the files to ensure that the proper information was used in the analysis. The final 10% of my time on a project was available for the fun part, the analysis.

All of the manual tasks take a significant amount of time with a field of any size. Commonly, a geophysicist might start by verifying depth references for every single well against a public site to ensure that the highest quality data is being used and that, in the end, a high quality result is produced.

Kohlleffel: Operators are under tremendous pressure to reduce costs and this is putting enormous organizational and financial pressure on existing models. Can you comment on the feedback we are getting from operators that the use of Hadoop as an economical platform for advanced analytics is key to their ability to deliver an optimized cost model?

Sones: Producers are looking for any way to reduce costs, and I see multiple ways to do this with advanced analytics driven by Hadoop, whether it’s effectively checking your depth reference or having a powerful sensitivity analysis for driving cost down and understanding where you should be drilling. I am seeing an increased number of techs hired in some places to manage databases and data sources, but it’s not a replacement for optimization and what advanced analytics with Hadoop can bring to the table. It’s really just throwing more manpower at the problem versus applying better technology, which could benefit techs, geologists, engineers, and petrophysicists.

Kohlleffel: Laurence, please expound on the datasets that are critical to a G&G organization – can you go through those in more detail and describe the challenge in getting a single view or comprehensive map that includes the relevant data?

Sones: I’m glad to. The primary dataset is that of log data which can be recorded multiple times for different measurements on the same well in many cases. Log data establishes the foundation for analysis for geologists and petrophysicists. Production data is also critical and it can reside in multiple sources; generally it’s pulled into a primary geological analysis software application.

In addition to that, the seismic data is almost always on a separate platform being used by the geophysicists. I’ve mentioned some of the data in-house, but you also have all of the legal information that’s in the public domain on the state commission websites – legal location, legal well name, API numbers, depth records, elevation, etc.

Kohlleffel: How do you feel that Hadoop is helping companies across the industry address this proliferation of data silos as well as the manual QC process?

Sones: Hadoop is well suited for the data discovery required by the G&G community because as a centralized data platform it allows us to ingest all information for a single view. This includes structured data such as production and completion records, semi-structured data such as well logs, and unstructured data such as spreadsheets and PDFs. From this wide variety of datasets, we create our own “path” to the data and combinations of datasets that we feel are most important for our analysis. That’s important, because not being constrained to a prescribed path allows for complete freedom of data discovery by an individual, and allows us to ask questions that we hadn’t considered before. We may want to perform focused analysis on a small subset of wells or perform analytical processing on an entire basin or reservoir. Hadoop makes either scenario possible.

Furthermore, we can use the end user visualization tools that we are already familiar with to do sensitivity analysis in order to get the clearest picture of what is driving production. Some of the new areas that I am exploring with Hortonworks include leveraging Hadoop to bucket curves into analysis classes, auto zoning, metadata correction, machine learning for enhanced sensitivity analysis and data exploration, and batch processing of LAS files for various conversion metrics.
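To give a flavor of what automating that LAS work can look like, here is a small, hypothetical Python sketch using the open-source lasio library; the directory, mnemonics, and alias table are assumptions rather than part of any specific Hortonworks workflow.

```python
# Hypothetical batch pass over LAS well-log files using the open-source
# lasio library (pip install lasio); paths and mnemonics are illustrative.
import glob
import lasio

# Curve mnemonics vary by logging vendor; map common aliases to one name.
GAMMA_ALIASES = {"GR", "GRGC", "SGR"}

for path in glob.glob("/data/well_logs/*.las"):
    las = lasio.read(path)
    for curve in las.curves:
        if curve.mnemonic.upper() in GAMMA_ALIASES:
            gamma = las[curve.mnemonic]  # numpy array of curve samples
            print("%s: %d gamma ray samples via %s"
                  % (path, len(gamma), curve.mnemonic))
```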

Kohlleffel: How important is it to you to have a 100% open source approach versus a partial open approach?

Sones: It’s an interesting question, because I’ve repeatedly seen how critical it is to be able to take up innovative features of software quickly, and I believe that Hortonworks’ 100% open source approach to Hadoop gives the Oil and Gas industry a distinct advantage over other approaches.

Kohlleffel: Laurence, I want to thank you for your time today and we look forward to future discussions.

Learn More About Hortonworks Data Platform for Oil & Gas

Boosting Telecomm Customer Experience with Hadoop and Customer Behavior Graphs
Communication service providers aim to enhance customer experience and build strong and long-lasting relationships with their customers. This has become increasingly difficult as customer interactions now occur across many channels. Hence, it’s important to understand customer behavior across all channels to create the best experience for each individual. Join us on August 5 for a webinar with Hortonworks and Apigee to learn more.

In today’s guest blog post, Sanjay Kumar, General Manager, Telecommunications at Hortonworks, and Sanjeev Srivastav, Vice President, Data Strategy at Apigee, discuss how service providers can capture and visualize customer behavior as a graph connecting the interaction points such as IVR, chat and call events, and combine it with network data to predict future call or chat patterns. For such analysis, many telcos have invested in big data infrastructure such as the HDP platform, and some have already begun to harness the myriad of data gathered from multiple internal systems in data lakes. Furthermore, there is the realization of the importance of assembling a single, 360-degree view of the customer, with customer-focused data sets.

Telco companies’ next challenge is to answer questions such as:

What is the likely reason for the next support call?

What post-call actions will increase customer satisfaction?

Which support requests are serviced across IVR and chat?

Where can proactive outreach be useful?

Means of answering these questions must be grounded in the reality of changing customer behavior and operational metrics. The examples presented in this blog used the HDP platform coupled with Apigee Insights software to provide not only reliable predictions but also tools usable by business managers. This allows a businessperson to understand how customers in a similar context have engaged with the company, and how effective specific interventions have been.

Understanding customer journeys

When telco subscribers visit a support site or use a care app, call customer support, experience an outage, or express an opinion on a social network, they are taking steps on a “customer journey,” which represents a sequence of events experienced by a group of customers; in other words, it represents various paths in the overall customer behavior graph.

Historically, companies have tried to understand these journeys in two main ways:

Reducing customer journeys to a set of customer attributes, say, those who called more than 3 times per day, along with their demographic characteristics

Using algorithms based on customer attributes to predict the actions of customers

These approaches are limited in their ability to answer questions like these:

When the number of escalations is high, what was the experience of customers leading to the escalation?

What events generate significant follow-up support activity, such as repeat calls or a technician dispatch?

At what point, while using self-care channels, do subscribers give up and drop into the call center?

The ability to ask questions like these (which involves walking forward and backward in time, across events from multiple channels of data) and to build predictive models based on these customer journeys provides a foundational layer for the analytics capability.

Behavior graphs on Hadoop data lakes in production

Once the journey analysis and predictive models are in place, the business and IT teams need to deploy the solution at scale. In this context, Hortonworks and Apigee have partnered to bring to market the Apigee Insights technology, a unique method of building aggregate behavior graphs on the Hadoop platform. Here we’ll describe work done for a communications provider, using data including customer call center records, operations data, and the network quality data.

The picture above shows a schematic graph that represents customer behavior. Note the sequential placement of events in the behavior graph. Such representations are used to compute the likelihood that a specific customer would be a repeat caller in, say, a 21-day window. Furthermore, based on the journeys experienced by a significantly large population of other customers, the system also provides scores for the likely reason behind that next call.

The specific business unit in question here receives about 100,000 calls a day, from a customer base of approximately 10 million. Over time, one can imagine the almost infinitely large number of paths that can be generated to represent the customer engagement. By applying certain threshold parameters that govern the amount of activity represented by the various possible paths, one obtains a set of “interesting” paths—but even those can number in the millions. It takes the power of big data platforms like HDP 2.x to manage the storage and computational power needed to handle this variety and volume of data.
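To make the path computation concrete, here is a minimal PySpark sketch of the kind of aggregation involved. It is purely illustrative: the input layout, event names, and threshold are assumptions, and the actual Apigee Insights graph construction is considerably more sophisticated.

```python
# Illustrative PySpark sketch: turn raw care events into ordered per-customer
# paths, then keep only paths traversed by many customers. Input layout,
# event names, and the threshold are assumptions.
from pyspark import SparkContext

sc = SparkContext(appName="journey-paths")

# Each line: customer_id,epoch_ts,event  e.g. "42,1437600000,IVR_ENTER"
events = (sc.textFile("hdfs:///data/care_events/*.csv")
            .map(lambda line: line.split(","))
            .map(lambda f: (f[0], (int(f[1]), f[2]))))

# Order each customer's events by time into a path such as
# IVR_ENTER -> CHAT -> AGENT_CALL -> ESCALATION
paths = (events.groupByKey()
               .mapValues(lambda evs: tuple(e for _, e in sorted(evs))))

# Keep the "interesting" paths: those shared by at least MIN_CUSTOMERS.
MIN_CUSTOMERS = 1000
popular = (paths.map(lambda kv: (kv[1], 1))
                .reduceByKey(lambda a, b: a + b)
                .filter(lambda kv: kv[1] >= MIN_CUSTOMERS))

for path, n in popular.takeOrdered(20, key=lambda kv: -kv[1]):
    print("%6d  %s" % (n, " -> ".join(path)))
```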

In the screenshot below, you can see some examples of journeys that end at an “escalation” event, experienced by about 226,000 customers. Clues from these journeys and details of the customers who experienced these journeys are directly useful in effecting change, say, by increased training of support representatives or by proactive outreach to customers.

Furthermore, one can also ask the system for the exhaustive list of paths that customers potentially took between two events of interest, noting the relative fractions of customers who follow the various journeys. Such a bird’s-eye view provides an understanding of customer behavior that is very hard to glean simply from a set of customer attribute values.

Machine learning for customer behavior

Besides the need to visualize past customer behavior, it is critical that predictive models keep pace with changing trends. Models that use a combination of customer profile elements and event activity data are better suited for the twin tasks of predicting the likelihood of repeat calls and the likely reason of the next support request.

Apigee’s behavior-based predictive models, which don’t require the conversion of event data into customer profile data, and also preserve the information inherent in the event sequences, provide excellent predictive power.

How do you take action?

Change is inevitable, and data-driven insights help companies better manage change. For example, data about the intensity of customer activity on specific paths quantifies the impact of changes to that particular path or process.

By knowing a customer’s recent journey segment (she experienced a service outage, has called into a support center, and her service just got restored), the company can delight her by sending an automated outbound SMS.

Access to customer journeys at scale provides business analysts with a feel for the problems and opportunities for improving the customer experience. Additionally, if the predictive modeling is done using data structures that represent the journeys, analysts also get a sense of customer behaviors that most influence the predictive modeling; this is hard to achieve from a black-box predictive model.

Lastly, once predictive models have been developed, the overarching goal of improving customer experience can be reduced to specific actions that are appropriate for specific groups of customers. Call center agents or supervisors can execute these actions via existing or new applications; an intelligent API platform can further streamline the “last mile” problem of delivering predictive model scores to the applications that need them.

Without needing “features” based on insights shared by business owners, the model has high precision for identifying repeat callers (96% correct for the top 1% of the list of customers, sorted by their propensity of repeat calls); for each of the callers, it also predicted the top few reasons likely associated with the next call (actual reason of the call was ranked the number one reason for 61% of the top 1% callers).

It’s about journeys, models, and APIs

As digital interactions dominate the enterprise landscape, data about how customers engage with companies is being stored in Hadoop data lakes, for it provides critical business insights. Apigee Insights provides a powerful method of representing such engagements as behavior graphs while taking into account the raw data about the time and sequence of activities. This representation is used for descriptive analytics (examining historical data) and predictive analytics. The last mile of leveraging predictive analytics to influence or enable customer engagement is facilitated using Apigee Edge. Customer care decisions can thus be much more targeted and likely to have a positive impact.

Running Operational Applications (OLTP) on Hadoop using Splice Machine and Hortonworks
On August 4th at 10:00 am PST, Eric Thorsen, General Manager Retail/CP at Hortonworks, and Krishnan Parasuraman, VP Business Development at Splice Machine, will be talking about how Hadoop can be leveraged as a scale-out relational database to be the System of Record and power mission-critical applications.

In this blog, they provide answers to some of the most frequently asked questions they have heard on the topic.

Hadoop is primarily known for running batch based, analytic workloads. Is it ready to support real-time, operational and transactional applications?

Although Hadoop’s heritage and initial success were in batch-based applications and analytic workloads, today the platform has evolved to support real-time, highly interactive applications. The introduction of HBase into the ecosystem enabled real-time, incremental writes on top of the immutable Hadoop file system. With Splice Machine, companies can now support ACID transactions on top of data resident in HBase. As a full-featured Hadoop RDBMS that supports ANSI-standard SQL, secondary indexes, constraints, complex joins and highly concurrent transactions, the Splice Machine database and the Hortonworks Data Platform enable enterprises to power real-time OLTP applications and analytics, especially as they approach big data scale.

How can enterprises, specifically in the Retail industry, take advantage of a Hadoop RDBMS?

With an increasing number of channels and customer interactions across each of them, retailers are looking for opportunities to better harness this data to drive real-time decision-making – be it personalizing their marketing activities and delivering targeted campaigns, optimizing their assortment and merchandising decisions, or improving the efficiency of their supply chains.

A retail enterprise has multiple data repositories that require RDBMS capabilities but, at the same time, is challenged with the need to scale them. For example, a Demand Signal Repository is a common System of Record that houses point-of-sale data, inventory information, forecasts, promotions and shipments. This data needs to be harmonized and maintained in a consistent state. It needs to support operational reporting, such as stock-outs, and also complex analytics, such as forecasts. We also hear from those enterprises that their existing traditional databases such as Oracle, SQL Server or DB2 that house this data are unable to scale beyond a few terabytes and become too cumbersome to maintain. This clearly spells out the need for a data platform that can scale effortlessly to manage massive volumes of data and, at the same time, provide RDBMS capabilities with feature and function parity with their existing systems.

Can we run mixed workloads – transactional (OLTP) and analytical (OLAP) – on the same Hadoop cluster?

In retail, there are various processes that encompass both transactional and analytical workloads. For example, a campaign management system needs to ingest real-time customer data from multiple sources and potentially deliver personalized messages to those individuals. This is a highly transactional process with customer profile lookups and real-time updates. It requires concurrent system access that can scale effortlessly, especially during peak shopping seasons. That same system also needs to be able to run fairly complex analytics such as audience segmentation, look-alike modeling and offer optimization.

Retailers typically run the transactional process via a campaign management or CRM application on top of a traditional database such as Oracle or SQL Server and run their analytic processing on a different data warehouse or an MPP data mart. They had to maintain separate databases for these two different workloads and move data back and forth. With a Hadoop RDBMS, they can run both the transactional (OLTP) and analytic (OLAP) workloads on the same data platform, eliminating the need to duplicate data and deal with ETL bottlenecks. This also enables their entire process to scale up affordably with increasing data volumes.
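As a simple sketch of that mixed-workload pattern, the snippet below issues a transactional write and an analytic query against the same table through Python's DB-API. Here get_connection() is a hypothetical stand-in for a real JDBC/ODBC driver setup, and the schema is invented.

```python
import datetime

# get_connection() is a hypothetical stand-in for a driver-specific
# JDBC/ODBC connection; table, columns, and paramstyle are illustrative.
conn = get_connection()
cur = conn.cursor()

# OLTP: record one customer interaction as an ACID transaction.
cur.execute(
    "INSERT INTO interactions (customer_id, channel, ts) VALUES (?, ?, ?)",
    (42, "email", datetime.datetime.now()))
conn.commit()

# OLAP: segment the last 30 days of activity on the same, always-current table.
cutoff = datetime.datetime.now() - datetime.timedelta(days=30)
cur.execute(
    "SELECT channel, COUNT(*) FROM interactions WHERE ts >= ? GROUP BY channel",
    (cutoff,))
for channel, total in cur.fetchall():
    print(channel, total)
```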

Can you give us an example of an enterprise that has modernized their data platform with a Hadoop RDBMS and the ROI they have achieved?

A good example is Harte Hanks. They are replacing the Oracle RAC database powering their campaign management solution with the Splice Machine Hadoop RDBMS. Harte Hanks is a global marketing services provider and serves some of the largest retailers in the market. They provide a 360-degree view of the customer through a customer relationship management system and enable cross-channel campaign analytics with real-time and mobile access. Their biggest challenge was that their customer queries were getting slower, in some cases taking over half an hour to complete. Expecting 30-50% future data growth, Harte Hanks was concerned that database performance issues would become increasingly worse. Harte Hanks evaluated whether to continue scaling up to larger and more expensive proprietary servers or to seek solutions that could affordably scale out on commodity hardware. Splice Machine and Hadoop now support Harte Hanks’ mixed workload applications (OLAP and OLTP). They have been able to gain a 75% cost saving with a 3-7x increase in query speeds.

Overall, they have experienced a 10-20x improvement in price/performance without significant application, BI or ETL rewrites.

Introducing Availability of HDP 2.3 – Part 2

On July 22nd, we introduced the general availability of HDP 2.3. In part 2 of this blog series, we explore notable improvements and features related to Data Access:

SQL on Hadoop

Spark 1.3.1

Stream Processing

Systems of Engagement that scale

HDP Search

We are especially excited about what these data access improvements mean for our Hortonworks subscribers.

Russell Foltz-Smith, Vice President of Data Platform at TrueCar, summed up the data access impact to his business using earlier versions of HDP, and his enthusiasm for the innovation in this latest release:

“TrueCar is in the business of providing truth and transparency to all the parties in the car-buying process,” said Foltz-Smith. “With Hortonworks Data Platform, we went from being able to report on 20 terabytes of vehicle data once a day to doing the same every thirty minutes, even as the data grew to more than 600 terabytes. We’re excited about HDP 2.3.”

SQL on Hadoop

SQL is the Hadoop user community’s most popular way to access data, and Apache Hive is the defacto standard for SQL on Hadoop. I spoke with many of our customers at Hadoop Summit in San Jose, and a recurring theme emerged. They asked us to push harder towards SQL 2011 analytic compliance.

While we started with HiveQL, a subset of the functions available within ANSI standard SQL, the request clearly highlights the need to improve the breadth of SQL semantics available to Hive.

In fact, one of the more satisfying, if not surprising, comments that we heard had to do with performance. We are hearing that the performance improvements made over the past few years through the Stinger Initiative have made such a significant difference that additional performance boosts can wait until the SQL breadth is improved.

As organizations move to use Hive and Hadoop, they do not want to perform a “SQL rewrite” of existing applications being ported onto Hadoop. The effort to reshape queries and re-test them is expensive. With that in mind, Apache Hive 1.2 was released in late May, and with HDP 2.3 it further simplifies SQL development on Hadoop with a range of new SQL features.

We will continue our focus on SQL breadth to help customers ease the transition of their existing analytic applications onto HDP and to make that transition as simple as possible.

Spark 1.3.1

HDP 2.3 includes support for Apache Spark 1.3.1. The Spark community continues to innovate at an extraordinarily rapid pace. Given our leadership in Open Enterprise Hadoop, we are eager to provide our customers with the latest and most stable versions of the various Apache projects that make up HDP.

We focused the bulk of our testing on Spark 1.3.1 to ensure its features and capabilities provide the best experience on Apache Hadoop YARN. The Spark community released Spark 1.4.1 just last week; while it provides additional capabilities and improvements, we plan to test and harden 1.4.1, fixing any issues, before we graduate the technical preview version of Spark to GA with inclusion in HDP.

Some of the new features of the Spark 1.3.1 release are:

DataFrame API (Tech Preview)

ML Pipeline API in Python

Direct Kafka support in Spark Streaming

Spark is a great tool for data science. It provides data-parallel machine learning (ML) libraries and an ML Pipeline API that make machine learning across all the data easier and deliver insights faster.
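For a flavor of the Python Pipeline API, here is a minimal sketch that chains tokenization, hashed term features, and logistic regression into one fitted model; the toy data and column names are our own, not taken from any HDP tutorial.

```python
# Minimal Spark ML Pipeline sketch (Python API, available as of Spark 1.3);
# the inline training data is a toy example.
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

sc = SparkContext(appName="ml-pipeline-sketch")
sqlContext = SQLContext(sc)

training = sqlContext.createDataFrame([
    (0, "hadoop yarn cluster tuning", 1.0),
    (1, "cats and dogs", 0.0),
    (2, "hive on tez queries", 1.0),
    (3, "weekend cooking recipes", 0.0),
], ["id", "text", "label"])

# Chain feature extraction and a classifier into a single pipeline.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10)

model = Pipeline(stages=[tokenizer, hashingTF, lr]).fit(training)
model.transform(training).select("text", "prediction").show()
```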

We also plan to provide a Notebook experience to make data science easier and more intuitive.

Recently we worked with Databricks to deliver full ORC support with Spark 1.4, and for the foreseeable future we plan to focus on contributing within the Spark community to enhance Spark’s YARN integration, security, operational experience, and machine learning capabilities. It is certainly a very exciting time for Spark and the community as a whole!

Stream Processing

As more devices and sensors join the Internet of Things (IoT), they emit growing streams of data in real time. The need to analyze this data drives adoption of Apache Storm as the distributed stream processing engine. HDP is an excellent platform for IoT — for storing, analyzing and enriching real-time data. Hortonworks is eager to help customers adopt HDP for their IoT use cases, and we made a big effort in this release to increase the enterprise readiness of both Apache Storm and Apache Kafka.

Further, we simplified the developer experience by expanding connectivity to other sources of data, including support for data coming from Apache Flume. Storm 0.10.0 is a significant step forward.

Here is a brief summary of all the stream processing improvements:

Enterprise Readiness: Security & Operations

Security

Addressing Authentication and Authorization for Kafka — including integration with Apache Ranger (KAFKA-1682)

Twitter recently announced the Heron project, which claims to provide substantial performance improvements while maintaining 100% API compatibility with Storm. The Heron project is based on Twitter’s private fork of Storm prior to Storm being contributed to Apache and before Storm’s underlying Netty-based transport was introduced.

The key point here is that the new transport layer has delivered dramatic performance improvements over the previous 0mq-based transport. The corresponding Heron research paper provides additional details regarding other architectural improvements made, but the fact that Twitter chose to maintain API compatibility with Storm is a testament to the power and flexibility of that API. Twitter has also expressed a desire to share their experiences and work with the Apache Storm community.

A number of concepts expressed in the Heron paper were already in the implementation stage within the Storm community even before it was published, and we look forward to working with Twitter to bring those and other improvements to Storm. We are also eager to continue our collaboration with Yahoo! for Storm at extreme scale.

While the 0.10.0 release of Storm is an important milestone in the evolution of Apache Storm, the Storm community is actively working on new improvements, both near and long term, continuously exploring the realm of the possible, and helping to accelerate a wide variety of IoT use cases being requested by our customers.

Systems of Engagement that Scale

The concept of Systems of Engagement has been attributed to author Geoffrey Moore. Traditional IT systems have mostly been Systems of Record that log transactions and provide the authoritative source for information. In these kinds of systems, the primary focus is on the business process and not the people involved. As a result, analytics becomes an afterthought, describing and summarizing the transactions and processes into neat reports labeled “Business Intelligence”.

In contrast to Systems of Record, Systems of Engagement are focused on people and their goal is to bring the analytics to the forefront — moving business intelligence from the back-office & descriptive mode into proactive, predictive, and ultimately prescriptive models.

The constantly-connected world powered by the web, mobile and social data has changed how customers expect to interact with businesses. Now they demand interactions that are relevant and personal. To meet this expectation, IT must move beyond the classic Systems of Record that store only business transactions and evolve into the emerging Systems of Engagement that understand users and are capable of delivering a context-rich and personalized experience.

Successful Systems of Engagement are those that manage to combine the massive volumes of customer interaction data with deep and diverse analytics. This allows Systems of Engagement to build customer profiles and give users an experience tailored to their needs through personalized recommendations. Of course, that means that Systems of Engagement must scale!

Hortonworks Data Platform gives developers the power to build scalable Systems of Engagement by combining limitless storage, deep analytics and real-time access in one integrated whole, rather than forcing developers to stitch these pieces together by hand.

Of course, all of this starts with HDFS as a massively-scalable data store. On this foundation a wide diversity of analytical solutions has been built, from Hive to Spark to Storm and many more.

Finally, applications need a way to get data out of Hadoop in real-time in a highly-available way. For this, we have Apache HBase and Apache Phoenix, which allow data to be read from Hadoop in milliseconds using a choice of NoSQL or SQL interfaces.

HBase development continues to focus on the key attributes of scalability, reliability and performance, and HDP 2.3 includes notable new additions on each of these fronts.

Apache Phoenix is an ANSI SQL layer on HBase that makes developing big data applications much easier. With Phoenix, complex logic like joins is handled for you, and performance is improved by pushing processing to the server. Having a real SQL interface is a key advantage that HBase has over other scalable database options.
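As one way to see that SQL interface from Python, here is a small sketch using the community phoenixdb adapter against a Phoenix Query Server; the host, table, and schema are placeholder assumptions.

```python
# Illustrative sketch: SQL over HBase via Apache Phoenix, using the
# community phoenixdb adapter and a Phoenix Query Server endpoint.
# Host, table, and columns are placeholders.
import phoenixdb

conn = phoenixdb.connect("http://phoenix-queryserver:8765/", autocommit=True)
cur = conn.cursor()

cur.execute(
    "CREATE TABLE IF NOT EXISTS users (id BIGINT PRIMARY KEY, name VARCHAR)")
cur.execute("UPSERT INTO users VALUES (?, ?)", (1, "alice"))

# Millisecond-scale point read through standard SQL rather than the HBase API.
cur.execute("SELECT id, name FROM users WHERE id = ?", (1,))
print(cur.fetchone())
```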

HBase is also unique in that it is a true community-driven open source database and in 2015 we continue to see a vibrant and robust community of innovation in both HBase and Phoenix. In addition to strong contribution from Hadoop vendors we’ve seen tremendous community contribution from companies such as:

Bloomberg

Cask

eBay

Facebook

Intel

Interset

Salesforce

Xiaomi

Yahoo!

We at Hortonworks thank everyone who contributes to making HBase and Phoenix great.

HDP Search

More and more customers are asking about search with Hadoop, and search is becoming a critical part of a number of our customer deployments. We see HDP Search being deployed in conjunction with HBase and Storm with increasing frequency. In HDP 2.3, HDP Search is powered by Solr 5.2.

Recent security authorization work allows Ranger to protect Solr collections, and Solr now works seamlessly on a Kerberized cluster through enhancements made for authentication. Other critically important optimization work was completed as well, including allowing administrators to define the HDFS replication factor; previously the index size was 2x larger, and replica, shard, and collection creation can now be controlled as desired. In addition, query results return nearly twice as fast when compared to Solr 4.x.
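For a quick taste of querying an HDP Search collection, here is a minimal sketch against Solr's standard HTTP select handler; the host, collection name, and fields are placeholders.

```python
# Minimal Solr query sketch over the standard HTTP API; the host,
# collection ("logs"), and field names are placeholders.
import requests

resp = requests.get(
    "http://solr-host:8983/solr/logs/select",
    params={"q": "level:ERROR", "rows": 10, "wt": "json"})
resp.raise_for_status()

for doc in resp.json()["response"]["docs"]:
    print(doc.get("id"), doc.get("message"))
```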

As customer demand for HDP Search increases, it also requires ease of use, enterprise readiness, and simplification. This release has pushed forward on all these fronts. We want to thank our partners at Lucidworks for the close collaboration and engagement on these innovations.

Final Thoughts on Data Access

As you can see, there has been a tremendous amount of work that has gone into each of these areas over the past six to eight months. The arrival of all these capabilities broadens the ability for organizations to build new, unique and compelling applications on top of HDP — with YARN at its core. We are truly excited by the possibilities and very thankful for all the contributions from the Apache community that fuel this innovation.

Enabling The World To Become Data-Driven
Today, Rob Rosen, Senior Director Partner Solutions at Platfora, tells us more about the partnership with Hortonworks and how the two companies enable the enterprise to run analytics at scale, in their big data environment.

The Hadoop market has increasingly gained traction in a world that generates over 2 billion gigabytes of data every day. Top enterprises have turned to Hadoop as their solution to better manage, store, and process their structured and unstructured data. In fact, it has become an integral part for many data-driven businesses looking to develop modern data architecture.

Additionally, today’s C-level execs are increasingly looking to extract insights from this data sitting in Hadoop. They require an end-to-end big data discovery solution that enables their business users and data scientists alike to easily and iteratively derive insights from the data.

It’s clear that businesses today need to get strategic when selecting a big data solution for the data management and governance of their Hadoop clusters. The Hortonworks and Platfora partnership enables businesses to meet those needs and increase their enterprise capabilities from their Hadoop investment.

Moving from Big Data 1.0 to 2.0

In the past, Hadoop was simply used as a cost savings solution for IT. Data was moved from large data warehouses to the Hadoop data lake as a cost-effective archiving solution. But today, businesses are looking to leverage and analyze the data for business decisions that directly enhance their line of business revenue.

Businesses also need a visualization and analytics layer on top of Hadoop to make sense of the data lake. In fact, extracting value and insights from data is a necessity for businesses now and Platfora solves this need through its partnership with Hortonworks.

Typically, extracting data insights takes months with traditional BI tools that require intense data prep from the data scientist. By enabling business users to do the work that IT traditionally does, businesses can accelerate the time to insight from months to minutes. In fact, I believe that the partnership empowers business users and IT alike as they can both dig into the data in Hadoop and find patterns quickly to inform key business decisions.

The Partnership Capitalizes on the Data Trends

The big data space is noisy, but one of the use cases trending in the market is leveraging data insights to detect security breaches.

The Hortonworks-Platfora partnership enables people to use Hadoop to implement better security in their organizations. For example, doing forensics on a data breach requires analysis of petabytes of data, allowing analysts to uncover access patterns previously unseen by traditional security tooling. However, being able to analyze data at scale was very difficult prior to Hadoop. With the Hortonworks Data Platform, users can better leverage the data in Hadoop for security breach detection.

Now that users can access data at a large scale, they need a way to take all the different structured and unstructured data, ingest it into Hadoop, and build a multi-structured dataset that they can analyze and look for irregular patterns. Platfora enables business users and data scientists to leverage and access these multi-structured datasets (which is very difficult to accomplish with traditional BI tools). In addition, our Platfora solution extends past the traditional 30-day limit that most security appliances offer, enabling users to analyze the data over a much longer period. This is critical given that 5% of dangerous traffic to the company website comes from criminals who steal information slowly, over longer time periods.

What this Means for You

It’s time to embrace modern data architecture and replace traditional BI tools with one that offers an end-to-end big data discovery solution. This requires businesses to realize that Hadoop is now a core piece of the overall modern data architecture, and having Hortonworks and Platfora is essential for these businesses to use data as a strategic asset.

The Connected World – Opportunities for Commercial Insurers
A recent article in PropertyCasualty 360, The Internet of Things: Insurers must prepare for disruption, customer impact, highlights the imperative for insurer strategies that address the emergence of the Internet of Things (IoT) and how it changes customer behaviors and their views of risk. The article predicts that as consumers have more access to data through their connected devices:

The IoT will fundamentally change what consumers know, when they know it, and how they interact with businesses that serve them. It will change the consumer dialogue about risk, and shrink consumers’ perceptions of the boundaries of “traditional, insurable” risk.

So far the P&C insurance industry has focused its IoT thinking on the connected home, wearable fitness devices and usage based car insurance (UBI). Many insurance carriers typically leverage third-party services for discounts based on IoT data. Others use it to create new products and services.

The automotive, healthcare, manufacturing, oil & gas, retail, telecommunications and transportation sectors have taken the lead in leveraging IoT to change their business operations and products. They have already improved results by managing vehicle fleets with connected car data and building smart factories with centralized systems monitoring security, HVAC and sprinklers. Others have optimized their supply chains with sensors that track the quality and location of merchandise.

So where is the insurance industry in regards to usage of its commercial customers’ IoT data, or with making internal IoT investments of its own?

Until recently, the insurance industry was relatively quiet on the usage of IoT data for commercial lines insurance (outside of a few cases of UBI in commercial auto insurance). But that is changing.

For example, MunichRe and HSB are sponsoring the “Plug and Play Internet of Things Accelerator”. In the related article in Insurance Innovation Reporter, they discuss their involvement and focus on how real-time analytics from sensor data will impact commercial lines insurance. Insurers will also be able to combine that sensor data with public data sources, such as:

Local municipality data on accidents, crime, the water supply or air quality

Industry-specific public data sources

Providing Enhanced Risk Management Services

We also anticipate that carriers will use these public sources in combination with their own internal data for new product development, underwriting and claims processes. The insurance carrier also has the opportunity to use this information to further evaluate reinsurance needs and aid reinsurers in their evaluations. The commercial industry segment could be on a path to risk-monitored usage-based premiums, enabled by new data analyzed through a “connected risk” lens.

As insurers embark on the journey of capturing and utilizing IoT data, they would do well to remember some important guidelines. First of all, they need the right data platform to build their advanced analytic apps. At Hortonworks, we have deep experience working with companies across all industries to do that.

Insurers also need to foster the right skillsets and mindsets, encouraging their teams to embrace the disruption in order to create new value in the industry.

By beginning their first clearly defined IoT project with the right mindset, and by building a data platform that can capture the volume and variety of data the IoT generates, insurers, brokers and reinsurers can seize this opportunity and deliver products and services the industry has never seen before.

Hortonworks Named to 2015 CRN Emerging Technology Vendor List
http://hortonworks.com/blog/crn-emerging-technology-vendor-2015/
Thu, 23 Jul 2015 16:09:07 +0000

For Hortonworks, working with and enabling the Hadoop ecosystem is one of our core tenets, and we're proud of the 1,100+ partners that have joined us on the journey to ensure that Open Enterprise Hadoop interoperates with your existing data center technologies. Today, we're delighted to have been named to The Channel Company's exclusive 2015 CRN® Emerging Vendors List. The annual list features technology vendors that have introduced innovative new products, creating opportunities for channel partners in North America to build solutions for customers. The Channel Company recognizes Hortonworks' demonstrated commitment to developing new technologies that meet growing market demands.

Community initiatives rally Hadoop users, developers and vendors around common objectives to deliver innovation. Our most recent contributions are showcased in the release of Hortonworks Data Platform 2.3. Our work on the Data Governance initiative in particular exemplifies the way we drive innovation: in the open, with an ecosystem that includes technology partners and customers.…

Available Now: HDP 2.3
http://hortonworks.com/blog/available-now-hdp-2-3/
Wed, 22 Jul 2015 15:55:56 +0000

We are very pleased to announce that Hortonworks Data Platform (HDP) Version 2.3 is now generally available for download. HDP 2.3 brings numerous enhancements across all elements of the platform, spanning data access, security and governance. This version delivers a compelling new user experience, making it easier than ever before to "do Hadoop" and deliver transformational business outcomes with Open Enterprise Hadoop.

As we announced at Hadoop Summit in San Jose, this release includes a number of significant innovations.

HDP 2.3 represents the very latest innovation from across the Hadoop ecosystem. Hundreds of developers have been collaborating with us to evolve each of the individual Apache Software Foundation (ASF) projects from the broader Apache Hadoop ecosystem. The various project teams have coalesced these new facets into a comprehensive and open Hortonworks Data Platform (HDP), delivering new features and closing out a wide variety of issues across Apache Hadoop and its related projects.

In conjunction with the HDP 2.3 general availability, Apache Ambari 2.1 is now also generally available. Aside from delivering a breakthrough configuration and customization experience, Ambari 2.1 includes support for installing, managing and monitoring Apache Accumulo and Apache Atlas, along with expanded high-availability support for Apache Ranger and Apache Storm.
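
For administrators who want to script against these newly supported services, Ambari exposes them through its REST API. The sketch below polls the state of each service; the host, cluster name and credentials are placeholders, and Python's requests library stands in for any HTTP client.

```python
# Minimal sketch: host, port, cluster name, and credentials are placeholders.
import requests

AMBARI = "http://ambari.example.com:8080"  # hypothetical Ambari server
CLUSTER = "mycluster"                      # hypothetical cluster name
AUTH = ("admin", "admin")                  # default credentials; change these

for service in ("ACCUMULO", "ATLAS", "RANGER", "STORM"):
    url = "{}/api/v1/clusters/{}/services/{}".format(AMBARI, CLUSTER, service)
    resp = requests.get(url, auth=AUTH,
                        params={"fields": "ServiceInfo/state"},
                        headers={"X-Requested-By": "ambari"})
    if resp.ok:
        # A healthy, running service reports state "STARTED".
        print("{}: {}".format(service, resp.json()["ServiceInfo"]["state"]))
    else:
        print("{}: HTTP {} (service may not be installed)".format(
            service, resp.status_code))
```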

Here is the up-to-date view of all components and versions that comprise HDP 2.3:

Thank you to everyone within and across the open source community who worked to deliver the staggering amount of innovation contained within these projects!

Delivering Transformational Outcomes

This release is a big step forward, and we're excited that more and more companies are transforming their businesses using HDP's unique capabilities. While many early adopters were drawn to Hadoop by its ability to process and store data cost-effectively at scale, the continued innovation within the Hadoop ecosystem that makes up HDP now delivers far more than the simple cost savings of commodity hardware and descriptive reporting. HDP is being used in an increasing number of mission-critical environments and is fueling entirely new businesses based on analytics and data. Every day, HDP powers these new businesses, as Jim Walker, vice president of marketing at EverString, attests:

For EverString, HDP serves as the backbone of our Predictive Marketing business. Our company is the realization of an entire business fundamentally built on HDP, not simply an application on Hadoop. Our customers rely on us to deliver the true value of Hadoop as a service, and our success is predicated on the reliability of Hortonworks and enterprise readiness of HDP.

What’s Next?

This blog is the first in a series of three posts. Look for the next two posts this week as we explore all the new capabilities of HDP 2.3.

Spark on HDInsight, Partner of Year Award Highlight WPC 2015
http://hortonworks.com/blog/spark-on-hdinsight-partner-of-year-award-highlight-wpc-2015/
Fri, 17 Jul 2015 20:38:57 +0000

The Spark lit in Azure HDInsight last week was just the beginning of the energy and momentum we saw around the partnership this week at Microsoft's Worldwide Partner Conference. During the conference keynote, Microsoft CEO Satya Nadella affirmed, "Together with our partners we are transforming the business world for the next era."

This year's big announcement of Apache Spark coming to HDInsight was the latest Big Data milestone delivered by Microsoft and Hortonworks. The deep partnership between the two companies started in 2011, initially bringing HDP to a native Windows Server environment, followed by Microsoft's Azure HDInsight, a Hadoop-as-a-service offering based on the core HDP platform. Our tight partnership continued with collaboration to extend HDP to Azure IaaS to meet the demands of our customers' shift to a hybrid architecture. Today the market reaps the benefits of the flexibility and interoperability of HDP on-premises, on Azure and as a service with HDInsight.

Carrying the excitement and energy forward from the big Hadoop and HDInsight announcements, Hortonworks started the week by being honored with awards from Microsoft's Enterprise Partner Group as "Cloud Partner of the Year" in both the West and Central regions of the United States. Hortonworks Data Platform and Hortonworks Sandbox are offered on the Azure Marketplace, and HDInsight, built on the core HDP platform, is one of the top 10 apps driving Azure usage. The momentum is apparent in these regions, where our joint customers are taking advantage of the benefits of Azure and Open Enterprise Hadoop. With no up-front costs or hardware investments, and with the flexibility and scalability of Azure, customers can spin up a Hadoop cluster in minutes, access the powerful cloud analytic services available in Azure, and transform their business. We are proud both to receive these awards and to celebrate our joint field efforts in building wins together with customers such as Pier 1, Noble Energy and Rockwell Automation.

On day 2 of WPC, Hortonworks RVP of West and Central Sales, Rick Turco, had the pleasure of joining Stephen Boyle, Microsoft VP of US Enterprise Services Partners, in his keynote session to share where customers are going with Hadoop and Azure technologies in the market, and why. Opening the session with a video from our joint customer, we learned how they used HDInsight to transform their business. Rick and Stephen then held a valuable Q&A session, helping the partner community understand how they too can help their customers transform their businesses with Hortonworks Open Enterprise Hadoop and Azure.

Building on Stephen's keynote, we also jointly conducted SI, ISV and industry briefings, roundtables and strategy sessions to go a level deeper and further enable the partner community.

Wrapping up the conference, the US Enterprise Partner Group briefed partners on its fiscal-year plans for supporting the vision and calls to action from Scott Guthrie's and Kevin Turner's keynotes on building the intelligent cloud and seizing the predicted 14% annual cloud market growth through 2018. Having made the US EPG's top-three list of go-to partners, we at Hortonworks are very excited to continue our strong partnership with Microsoft, helping drive customer innovation and ensure success as enterprises move to the cloud and big data with Azure and HDInsight.