Dwarf

Dwarf is Ning's data pipeline and analytics platform: a collection of open-source libraries, utilities, and servers for building large-scale analytics infrastructure.

Ning has been working with Hadoop and related technologies since 2007. Over the years, we built a large-scale data pipeline internally, which we open-sourced in 2010. It is composed of several building blocks that can be used independently:

Action core

Exposes HDFS over HTTP. It provides both a REST API and a browsing UI.

action-access library

Java library to access the Action core. It provides an API to retrieve data from and store data in HDFS over HTTP.
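To sketch the idea behind HDFS-over-HTTP access (this is not the actual action-access API): fetching a file boils down to composing a URL that embeds the HDFS path. The host, port, and /rest/1.0 path layout below are purely hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ActionUrlSketch {
    // Build a hypothetical Action-core-style URL for an HDFS path.
    // The /rest/1.0 prefix and query layout are assumptions, not the real API.
    static URI fileUrl(String host, int port, String hdfsPath) {
        String encoded = URLEncoder.encode(hdfsPath, StandardCharsets.UTF_8);
        return URI.create("http://" + host + ":" + port + "/rest/1.0?path=" + encoded);
    }

    public static void main(String[] args) {
        System.out.println(fileUrl("hadoop-gw.example.com", 8080, "/events/2010/10/01"));
    }
}
```

An HTTP GET on such a URL would then stream the file's contents, with no Hadoop client libraries needed on the caller's side.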

Collector core

Data aggregator service, similar to Scribe or Kafka. It exposes both an HTTP and a Thrift API.

eventtracker library

Java library to send data (events) to the Collector core.
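As a rough illustration of what sending an event involves (not the eventtracker API itself): an event has a type and a set of fields, which get serialized and shipped to the Collector core. The line-oriented format below is an assumption for illustration only; the real wire formats are HTTP and Thrift.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EventSketch {
    // Serialize an event to a simple key=value line; the real eventtracker
    // wire format differs -- this only illustrates the flow.
    static String toLine(String eventType, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder(eventType);
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("userId", "42");
        fields.put("page", "/home");
        // An HTTP POST of this payload to the Collector core would follow here.
        System.out.println(toLine("PageView", fields));
    }
}
```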

Goodwill core

Metadata repository for events; it stores schema definitions, which the Collector core and the Action core use for data validation.
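The kind of validation this enables can be sketched in a few lines. Note the schema representation below (a map of field name to expected Java type) is an assumption for illustration, not Goodwill's actual model:

```java
import java.util.Map;

public class SchemaCheckSketch {
    // Hypothetical schema model: field name -> expected Java type.
    // Goodwill's real schema definitions are richer than this.
    static boolean conforms(Map<String, Object> event, Map<String, Class<?>> schema) {
        if (!event.keySet().equals(schema.keySet())) return false;
        for (Map.Entry<String, Class<?>> e : schema.entrySet()) {
            Object value = event.get(e.getKey());
            if (value == null || !e.getValue().isInstance(value)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> schema = Map.of("userId", Long.class, "page", String.class);
        System.out.println(conforms(Map.of("userId", 42L, "page", "/home"), schema));  // true
        System.out.println(conforms(Map.of("userId", "42", "page", "/home"), schema)); // false: wrong type
    }
}
```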

goodwill-access library

Java library for Goodwill; it allows you to access schemata from Goodwill programmatically.

serialization library

The serialization library contains the building blocks of the Dwarf framework. Every other component depends on it.

Meteo

Real-time event processing engine. It leverages Esper for runtime analysis and can output data to different rendering engines for graphing purposes.
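A typical computation in such an engine is an aggregate over a sliding time window, e.g. the average of values seen in the last 30 seconds. Esper expresses this declaratively; the plain-Java model below (not Esper's API, and with timestamps passed in explicitly rather than taken from a clock) only illustrates the semantics:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class WindowAverageSketch {
    // Simplified model of a sliding-time-window average: each incoming event
    // is added, samples older than the window are evicted, and the average
    // over the surviving samples is returned.
    private final long windowMillis;
    private final Deque<long[]> samples = new ArrayDeque<>(); // {timestamp, value}

    WindowAverageSketch(long windowMillis) { this.windowMillis = windowMillis; }

    double onEvent(long timestampMillis, long value) {
        samples.addLast(new long[] {timestampMillis, value});
        // Evict samples that have fallen out of the window.
        while (samples.getFirst()[0] <= timestampMillis - windowMillis) {
            samples.removeFirst();
        }
        long sum = 0;
        for (long[] s : samples) sum += s[1];
        return (double) sum / samples.size();
    }

    public static void main(String[] args) {
        WindowAverageSketch avg = new WindowAverageSketch(30_000);
        System.out.println(avg.onEvent(0, 100));      // 100.0
        System.out.println(avg.onEvent(10_000, 200)); // 150.0
        System.out.println(avg.onEvent(40_000, 300)); // 300.0: earlier samples expired
    }
}
```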