Top Hortonworks Blogs from 2017

As 2017 comes to an end, we can all (hopefully) take a quick breather and prepare for the new year. Whether this means going offline and digital-free, binge-watching the shows you have been stockpiling, or something in between, we hope you have the opportunity to do a little bit of reading.

To help you with your reading list, here are some of the top blogs from Hortonworks in 2017, in order of publication date:

Try Apache Spark 2.1 & Zeppelin in Hortonworks Data Cloud by Vinay Shukla
Wanna try Spark 2.1 Now? Well, you are in luck… Hortonworks Data Cloud (“HDCloud”) for AWS gives you a quick way to launch a Spark cluster in the cloud. Read More

Machine Learning & its Impact on the Future for Insurance by Cindy Maike
First and foremost, machine learning WILL change the way insurers do business. The insurance industry is founded on forecasting future events and estimating the value/impact of those events and has used established predictive modeling practices – especially in claims loss prediction and pricing – for some time now. Read More

A Reference Architecture for the Open Banking Standard… by Vamsi Chemitiganti
Financial services firms specifically deal with manifold data types ranging from Customer Account data, Transaction Data, Wire Data, Trade Data, Customer Relationship Management (CRM), General Ledger and other systems supporting core banking functions. When one factors in social media feeds, mobile clients & other non-traditional data types, the challenge is not just one of data volumes but also variety and the need to draw conclusions from fast-moving data streams by commingling them with years of historical data. Read More

Announcing the General Availability of HDP 2.6 by Wei Wang
We develop the entire Hortonworks Data Platform to ensure our customers not only can adopt the latest innovation from the broader open source community, but also enjoy some of the enterprise-ready and ease-of-use functionalities packaged within HDP 2.6. Read More
If you are interested in HDP 2.6, you will probably also want to see this blog: Announcing the General Availability of HDP 2.6.3 and Hortonworks DataPlane Service

Top 5 Performance Boosters with Apache Hive LLAP by Carter Shanklin
Now that LLAP is generally available with HDP 2.6, let’s take some time to look at the top 5 performance boosters you’re missing out on if you’re not using LLAP. Read More

Integrate SparkR and R for Better Data Science Workflow by Yanbo Liang
To address R’s scalability issue, the Spark community developed the SparkR package, which is based on a distributed data frame that enables structured data processing with a syntax familiar to R users. Read More

Livy: A REST Interface for Apache Spark by Saisai Shao
In order to overcome the current shortcomings of executing Spark applications, and to introduce additional features, we introduce Livy, a REST-based Spark interface to run statements, jobs, and applications. Read More

Benchmark Apache HBase vs Apache Cassandra on SSD in a Cloud Environment by Will Xu
As more and more workloads are being brought onto modern hardware in the cloud, it’s important for us to understand how to pick the best databases that can leverage the best hardware. Amazon has introduced instances with directly attached SSDs (solid-state drives). Both Apache HBase and Apache Cassandra are popular key-value databases. In this benchmark, we hope to learn more about how they leverage the directly attached SSD in a cloud environment. Read More

A Category Emerges: Introducing Hortonworks DataPlane Service by Scott Gnau
Hortonworks DPS is a next-gen service to manage, govern and secure data and workloads across multiple sources (databases, EDWs, clusters, data lakes), types of data (at-rest, in-motion) and tiers (on-prem, multiple clouds, hybrid). It allows enterprises to focus on getting more value from data quicker by providing an intuitive experience for managing all data. Read More

Automated Validation for all of the Apache Hadoop Ecosystem by Ramya Sunil, Sunitha Velpula, Raja Aluri
Unlike traditional enterprise software, we deal with an inflow of hundreds of Apache commits across 25+ projects in the Apache Hadoop ecosystem. The Apache community has a rich set of unit tests, which are continuously run (often for every commit) to catch regressions early. However, they are not always sufficient to assess the impact on integrated functionality. This is where having a robust, scalable and reliable testing infrastructure to validate the multi-layer stack becomes crucial. Read More