A big part of any ML workflow is massaging the data into the right features for use in downstream processing. To simply feature extraction, Spark provides many feature transformers out-of-the-box. The table below outlines most of the feature transformers available in Spark 1.4 along with descriptions of each one. Much of the API is inspired by scikit-learn; for reference, we provide names of similar scikit-learn transformers where available.

With the recent release of Apache Spark 1.4.1 on July 15th, 2015, I wanted to write a step-by-step guide to help new users get up and running with SparkR locally on a Windows machine using command shell and RStudio. SparkR provides an R frontend to Apache Spark and using Spark’s distributed computation engine allows R-Users to run large scale data analysis from the R shell. The steps listed here are also documented in my online book title “Getting Started with SparkR for Big Data Analysis” which can be accessed at: http://www.danielemaasit.com/getting-started-with-sparkr/. These steps will get you up and running in less than 5 mins.

I look at apps like Grindr and Tinder and see how they’ve rewritten sex culture — by creating a sexual landscape filled with vast amounts of incredibly graphic site-specific data — and I can’t help but wonder why there isn’t an app out there that

Today’s guest post is written by Vincent Warmerdam of GoDataDriven and is reposted with Vincent’s permission from blog.godatadriven.com. You can learn more about how to use SparkR with RStudio at the 2015 EARL Conference in Boston November 2-4, where Vincent will be speaking live. This document contains a tutorial on how to provision a spark […]

In this post, we outline Spark Streaming’s architecture and explain how it provides the above benefits. We also discuss some of the interesting ongoing work in the project that leverages the execution model.

New Class of Memory Unleashes the Performance of PCs, Data Centers and More NEWS HIGHLIGHTS Intel and Micron begin production on new class of non-volatile memory, creating the first new memory category in more than 25 years.New 3D XPoint™ technology brings non-volatile memory speeds up to 1,000

We’ve seen that there is one processor that needs to be added to the picture — the commodity multi-core CPU. This is already a part of many server configurations, and for some applications, e.g., Monte-Carlo pricing of American options, it can give better or comparable performance than an accelerator processor when optimized correctly. Between NVIDIA’s Kepler GPUs and Xeon Phi, the GPU wins for both of our test applications.

Written by Nicole White What’s New in RNeo4j? RNeo4j is Neo4j’s R driver – it allows you to quickly and easily interact with a Neo4j database from your R environment. Some recent updates to RNeo4j include: My contributions Functionality for… Learn More »

On behalf of the community, it’s our pleasure to announce that Kubernetes, the open source container orchestration system, has reached the v1 milestone (GitHub). This important release, built by over 400 contributors, means Kubernetes is ready for production use. While this is huge news, there’s still much work remaining to build out the entire container toolset.

This article is a part of an evolving theme. Here, I explain the basics of Deep Learning and how Deep learning algorithms could apply to IoT and Smart city domains. Specifically, as I discuss below, I am interested in complementing Deep learning algorithms using IoT datasets. I elaborate these ideas in the Data Science for Internet of Things program which enables you to work towards being a Data Scientist for the Internet of Things (modelled on the course I teach at Oxford University and UPM – Madrid). I will also present these ideas at the International conference on City Sciences at Tongji University in Shanghai and the Data Science for IoT workshop at the Iotworld event in San Francisco

Sharing your scoops to your social media accounts is a must to distribute your curated content. Not only will it drive traffic and leads through your content, but it will help show your expertise with your followers.

Integrating your curated content to your website or blog will allow you to increase your website visitors’ engagement, boost SEO and acquire new visitors. By redirecting your social media traffic to your website, Scoop.it will also help you generate more qualified traffic and leads from your curation work.

Distributing your curated content through a newsletter is a great way to nurture and engage your email subscribers will developing your traffic and visibility.
Creating engaging newsletters with your curated content is really easy.