Amazon Web Services launches Machine Learning Service

Amazon Web Services has recently launched Amazon Machine Learning, a service that allows users to build predictive models in the cloud. Following Google's Prediction API and Microsoft's Azure Machine Learning, Amazon is the latest major cloud provider to launch such a service.

The service currently provides a learning model similar to that used in many large scale learning applications, as well as visualizations for basic data statistics and the predictive performance of the learned model, but still has some limitations in terms of flexibility, data import and export, and support for automated model parameter tuning.

In recent years, many services and products have been launched to simplify data analysis. Some have focused on simplicity by hiding most of the complexity from the user, while others try to provide a more complete set of data analysis tools for specialists.

Amazon's latest offering falls into the first category. It deals only with prediction problems. The exact underlying learning algorithm is not known, but the features it provides are very similar to Vowpal Wabbit, a fast machine learning tool developed by John Langford based on stochastic gradient descent. This algorithm works by streaming the data sequentially past the model and adapting it based on the observed prediction error. It is inherently hard to parallelize, but it is very efficient and has bounded memory usage, which makes it the workhorse behind many large-scale applications (used, for example, for ad click prediction at Google).
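The streaming update described above can be sketched in a few lines. This is a minimal illustration of online stochastic gradient descent for a linear model, not Amazon's or Vowpal Wabbit's actual implementation:

```python
# Minimal sketch of online SGD for linear regression: each example is
# seen once, the model predicts, and the weights are nudged against
# the observed prediction error. Memory use is bounded by the weight
# vector alone, regardless of how much data is streamed past.
def sgd_train(examples, n_features, learning_rate=0.01):
    weights = [0.0] * n_features
    for features, target in examples:       # stream examples one at a time
        prediction = sum(w * x for w, x in zip(weights, features))
        error = prediction - target
        for i, x in enumerate(features):    # adapt weights from the error
            weights[i] -= learning_rate * error * x
    return weights

# Toy usage: learn y ≈ 2*x from a small repeated stream of examples.
data = [([x], 2.0 * x) for x in (1.0, 2.0, 3.0, 1.5, 2.5)] * 50
w = sgd_train(data, n_features=1, learning_rate=0.05)
```

After a few hundred streamed examples the single weight converges close to 2.0, without ever holding the dataset in memory.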

In addition, Amazon Machine Learning can compute basic per-feature statistics for the training data, and it provides visualizations of the prediction performance of the learned model. These two features allow the user to inspect the data and gain a better understanding of the learned prediction model. Finally, the service has some basic features for simple data transformations, like extracting features or turning text into an n-gram representation, which is often used for textual data.
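To illustrate the kind of text transformation mentioned above, here is a small sketch of turning a string into word n-grams (bigrams in this example); the service's own transformation may differ in tokenization details:

```python
# Turn text into overlapping word n-grams, a common feature
# representation for textual data in prediction models.
def ngrams(text, n=2):
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

bigrams = ngrams("the quick brown fox")
# → ['the quick', 'quick brown', 'brown fox']
```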

There are some limitations. Data must reside in Amazon's S3 storage service or in a Redshift database, and there is no way to import or export the learned model. There is also no support for automatically training and evaluating many model variants in parallel to tune the model parameters, even though such tuning has high practical value.
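The tuning procedure the service lacks can be sketched as a simple grid search. The `train` and `evaluate` callables here are placeholders standing in for any learner and validation metric; in practice each iteration could run in parallel:

```python
# Hedged sketch of hyperparameter tuning by grid search: train one
# model per parameter setting, evaluate each, and keep the best.
def grid_search(train, evaluate, param_grid):
    best_params, best_score = None, float("-inf")
    for params in param_grid:
        model = train(**params)
        score = evaluate(model)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy usage with a synthetic metric that peaks at learning_rate=0.1.
grid = [{"learning_rate": lr} for lr in (0.01, 0.1, 1.0)]
best, score = grid_search(
    train=lambda learning_rate: learning_rate,       # stand-in learner
    evaluate=lambda lr: -(lr - 0.1) ** 2,            # stand-in metric
    param_grid=grid,
)
# best == {"learning_rate": 0.1}
```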

An early review also notes that the system's performance still lags somewhat behind simply running a tool like Vowpal Wabbit locally on a laptop.

Google's Prediction API, which was launched in 2010, falls into the same category. It only deals with prediction problems, and not with more complex problems like recommendation, or unsupervised learning methods like clustering. The interface essentially only lets you upload data, train, and evaluate a model, and use a stored model to compute predictions.

Microsoft Azure Machine Learning, on the other hand, has a much richer interface and is geared toward a more specialized audience. It exposes different kinds of learning algorithms, lets the user compose complex feature transformation pipelines, and even integrates R scripts. Other examples are PredictionIO and GraphLab Create.

Apache Spark is also developing a machine learning library that can be used, for example, via Databricks Cloud to perform complex, scalable data analysis in the cloud.