Uber recently introduced AresDB, an open-source real-time analytics engine that uses an unconventional power source, graphics processing units (GPUs), to meet the growing demands of analysis at scale while unifying, simplifying, and improving Uber's existing solutions.

AresDB is written in C++ and Golang and was released in November 2018. It is an addition to Uber's repertoire of open-source contributions.

The realm of real-time analysis has many existing technologies, some of which, such as Apache Pinot and Elasticsearch, have been used by Uber. However, as company engineers stated in their "Introducing AresDB" post, no single solution simultaneously addressed all of Uber's functional, scalability, performance, cost, and operational requirements.

To tackle this problem, Uber turned to GPUs. Typical real-time analytical queries at Uber, which power dashboards that monitor business metrics and drive automated decisions (like trip pricing and fraud detection) based on the metrics collected, involve filtering and aggregating millions to billions of records. The parallel-processing model of general-purpose GPUs is well suited to this kind of parallelizable computation.

After assessing the performance of some existing GPU-based technologies, such as OmniSci and Kinetica, Uber built AresDB to address its specific needs.

AresDB only uses GPUs at the time of query processing. It handles data ingestion using CPUs (data is stored in host memory) and handles recovery via disks. At query time, it transfers data from host memory to GPU memory for parallel processing as evident from the following high-level overview diagram of AresDB's architecture:

Column-based storage has been implemented to enable compression for storage and query efficiency. There are two categories of stores - a Live store for recently ingested data stored in an uncompressed, unsorted format and an Archive store for mature, sorted and compressed data.
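To make the distinction between the two stores concrete, here is a minimal sketch, not AresDB's actual implementation. It assumes run-length encoding as the compression scheme and uses a made-up `trips`-style dataset; the function names `run_length_encode` and `archive` are hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Compress a list into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


# Live store (illustrative): one list per column, in arrival order,
# unsorted and uncompressed.
live = {
    "city": ["SF", "NYC", "SF", "NYC", "SF"],
    "fare": [12.0, 30.5, 8.0, 22.0, 15.5],
}


def archive(columns, sort_key):
    """Sort all columns by one column, then compress that column.

    Assumption: sorting groups equal values together, which is what
    makes run-length encoding effective on the sort column.
    """
    row_count = len(columns[sort_key])
    order = sorted(range(row_count), key=lambda i: columns[sort_key][i])
    sorted_columns = {
        name: [values[i] for i in order] for name, values in columns.items()
    }
    sorted_columns[sort_key] = run_length_encode(sorted_columns[sort_key])
    return sorted_columns


archived = archive(live, "city")
```

In this sketch the five `city` values collapse into two runs once sorted, while the `fare` column is simply reordered to match. A real engine would choose a compression scheme per column.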

Real-time upsert with primary key deduplication has been implemented to increase data accuracy and provide "near real-time data freshness" within seconds. As part of real-time ingestion, AresDB classifies records as "late" or not. Records considered "late" are put into the archive store, whereas fresh records go into the live store. A scheduled archiving process also periodically takes records from the live store (once they can be considered mature) and merges them into the archive store.
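The ingestion flow described above can be sketched roughly as follows. This is an illustration only: the `Store` class, the single numeric `cutoff` used to decide lateness, and the dictionary-based stores are all assumptions, not details from Uber's post.

```python
class Store:
    """Toy model of upsert, late-record routing, and archiving."""

    def __init__(self, cutoff):
        self.cutoff = cutoff  # assumed rule: event times below this are "late"
        self.live = {}        # primary key -> record
        self.archive = {}     # primary key -> record

    def upsert(self, key, event_time, payload):
        """Insert or overwrite the record for `key`; return the store used."""
        record = {"event_time": event_time, **payload}
        is_late = event_time < self.cutoff
        target = self.archive if is_late else self.live
        target[key] = record  # same key overwrites: primary key deduplication
        return "archive" if is_late else "live"

    def run_archiving(self, new_cutoff):
        """Move records older than `new_cutoff` from live to archive."""
        mature = [
            key for key, record in self.live.items()
            if record["event_time"] < new_cutoff
        ]
        for key in mature:
            self.archive[key] = self.live.pop(key)
        self.cutoff = new_cutoff
        return len(mature)


store = Store(cutoff=100)
first = store.upsert("trip-1", 150, {"fare": 10})
second = store.upsert("trip-1", 150, {"fare": 12})  # overwrites, no duplicate
late = store.upsert("trip-2", 50, {"fare": 7})      # older than cutoff
store.upsert("trip-3", 120, {"fare": 9})
moved = store.run_archiving(new_cutoff=130)         # trip-3 is now mature
```

The sketch ignores the case where the same key exists in both stores; a real system would need to reconcile those.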

GPU-powered query processing relies on highly parallelized data processing by GPUs to provide low query latencies. To run queries against AresDB, users need to use the Ares Query Language (AQL), in which queries are specified using JSON, YAML, and Golang objects. According to the introduction post, a benefit of not using SQL-like languages is that "In JSON-format, AQL provides better programmatic query experience than SQL for dashboard and decision system developers, because it allows them to easily compose and manipulate queries using code without worrying about issues like SQL injection."
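The following sketch illustrates the general idea of composing a query as a data structure rather than as a string. The field names (`table`, `dimensions`, `measures`, `filters`) and the `build_query` helper are hypothetical and are not taken from the AQL specification; consult the AresDB documentation for the real schema.

```python
import json


def build_query(table, dimensions, measures, filters=()):
    """Assemble a JSON-serializable query object (hypothetical schema)."""
    return {
        "table": table,
        "dimensions": [{"expression": d} for d in dimensions],
        "measures": [{"expression": m} for m in measures],
        "filters": list(filters),
    }


query = build_query("trips", ["city_id"], ["count(*)"])

# User-supplied input is carried as a plain value in its own field,
# never spliced into a query string.
user_value = "completed'; DROP TABLE trips; --"
query["filters"].append({"column": "status", "op": "=", "value": user_value})

# Queries are ordinary objects, so code can extend them freely.
query["dimensions"].append({"expression": "vehicle_type"})

payload = json.dumps(query, indent=2)
```

Because the filter value travels as data, the quote characters in `user_value` have no effect on the structure of the query once it is serialized.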

However, as stated in the announcement post, supporting SQL for querying is one of the future steps which the Uber engineering team plans to take to improve the user experience.

AresDB is open-sourced under the Apache 2.0 license and is being used at Uber to extract business insights in real time, enabling data-driven decision making to improve the user experience on the Uber platform. The introduction post also states that in the future, Uber intends to make the following improvements to the project: