Category Archives: Guest

Microsoft recently announced a new Impala Connector for Power BI Desktop (currently in preview, with GA expected early in 2017). Cloudera is also working with Microsoft’s Power BI Engineering team to certify the connector against Impala and ensure it meets critical enterprise requirements such as security. The following Microsoft post about the new connector, by Power BI senior program manager Miguel Llopis, is re-published below for your convenience.

Learn how the performance advantages of the Crypto cryptographic library will provide an upgrade for Spark shuffle encryption over the current approach.

When running a big data computing job, the data being processed may contain sensitive information that users don’t want anyone else to access. Encrypting that sensitive data is becoming more and more important, especially for enterprise users.

For Apache Spark, which is the emerging standard for big data processing,

The following post (Part 2 of two parts) by Vik Paruchuri, founder of data science learning platform Dataquest, offers some detailed and instructive insight about data science workflow (regardless of the tech stack involved, but in this case, using Python). We re-publish it here for your convenience.

Before we dive into exploring the data [see Part 1 for steps relating to data preparation], we’ll want to set the context,

The following post by Vik Paruchuri, founder of data science learning platform Dataquest, offers some detailed and instructive insight about data science workflow (regardless of the tech stack involved, but in this case, using Python). We re-publish it here for your convenience.

Data science companies are increasingly looking at portfolios when making hiring decisions. One of the reasons for this is that a portfolio is the best way to judge someone’s real-world skills.

For the first time, this new study by Intel software engineers analyzes the performance impact of using Apache HBase on various modern storage technologies.

As more “fast” storage technologies (such as SSD and NVMe SSD) emerge, organizations with big data use cases want to make better use of them to achieve better throughput and latency. But to this point, there have been no detailed analyses published about the true significance of that performance boost,