Google Cloud Machine Learning (CML) is an effort to reduce the time and cost of building the technical infrastructure for data science projects. It can plug into Google's other storage, querying, and data-handling products to generate machine learning models. Among the data sources is Google Cloud Dataproc, a managed Hadoop and Spark platform that is now generally available.

CML is based on the TensorFlow framework, which Google uses to build machine intelligence capabilities into products like its speech-recognition API. Models built with TensorFlow outside of Google's services can be used with CML or on the Google cloud ecosystem for data ingestion, management, and training.

Included are pretrained APIs for translation, machine vision, and speech recognition (IBM has similar offerings with Watson on Bluemix).

Google Cloud Dataproc is a managed Hadoop and Spark platform for running open source tools for batch processing, querying, streaming, and machine learning. Automation helps you create clusters quickly, manage them easily, and potentially reduce costs by turning clusters off when they are not needed.

The goal is to simplify administration so that you can focus on higher-value tasks.

Data Science Skills Survey 2015

Business Over Broadway conducted a survey in 2015 of over 620 data professionals covering 25 different skills. The top two skills were communication and managing structured data. The bottom two skills were big and distributed data and cloud management.

The survey concludes that the top skills for project success are data mining and using visualization tools. For Business Managers, the key skills are statistics, machine learning, and big and distributed data. For Developers, the critical skills are product design and development, systems administration, and back-end programming. For Creatives, the important skills include math, business development, and graphical models.

The 10 most important data science skills for project success were:

S - Data Mining and Viz Tools (corr with satisfaction = .44)

S - Statistics and statistical modeling (.39)

T - Machine Learning (.38)

S - Science/Scientific Method (.38)

M - Algorithms and Simulations (.37)

M - Bayesian Statistics (.37)

M - Optimization (.33)

S - Data Management (.33)

T - NLP and text mining (.32)

M - Math (.31)
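
To make the "corr with satisfaction" figures in this list more concrete, here is a minimal sketch of how such a number could be computed. It assumes the figures are Pearson correlations between a respondent's proficiency in a skill and their satisfaction with project outcomes, which the summary above does not state. The function name `pearson_r` and all ratings below are hypothetical and invented for illustration; they are not survey data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings for six respondents (illustration only, not survey data).
proficiency = [20, 40, 40, 60, 80, 100]  # self-rated proficiency in one skill
satisfaction = [3, 2, 6, 5, 4, 8]        # satisfaction with project outcome

r = pearson_r(proficiency, satisfaction)
print(f"corr with satisfaction = {r:.2f}")
```

A value near 1 would mean that respondents who rate themselves as more proficient in the skill also tend to report higher satisfaction. Values such as .44 or .31 indicate a positive but moderate association, and a correlation by itself does not show that the skill causes project success.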

The 10 data science skills that best predict project success for Business Managers (i.e., leader, business person, entrepreneur) are:

The 10 data science skills that best predict project success for Creatives (i.e., Jack of all trades, artist, hacker) are:

M - Math (corr with satisfaction = .51)

S - Data Mining and Viz Tools (.39)

B - Business development (.34)

M - Graphical Models (.32)

M - Optimization (.31)

T - Managing Structured data (.31)

P - Database Administration (.28)

M - Algorithms and Simulations (.23)

T - Machine Learning (.22)

M - Bayesian Statistics (.21)

Note that data technologies and advanced data science techniques are changing rapidly. New technologies and cognitive computing will likely change the critical skill sets required in the future.