Friday, September 15, 2017

Classmate Jerry Woytash has just sent in an article about Michelangelo, Uber's machine learning platform. The write-up is long and technically detailed, but some key background information appears early in the piece:

Michelangelo consists of a mix of open source systems and components built in-house. The primary open sourced components used are HDFS, Spark, Samza, Cassandra, MLLib, XGBoost, and TensorFlow. We generally prefer to use mature open source options where possible, and will fork, customize, and contribute back as needed, though we sometimes build systems ourselves when open source solutions are not ideal for our use case.

Michelangelo is built on top of Uber’s data and compute infrastructure, providing a data lake that stores all of Uber’s transactional and logged data, Kafka brokers that aggregate logged messages from all Uber’s services, a Samza streaming compute engine, managed Cassandra clusters, and Uber’s in-house service provisioning and deployment tools.