Want to appear on this page? Send a quick description of your organization and usage to the mailing list or to @apachekafka or @jaykreps on Twitter and we'll add you.

Companies

LinkedIn - Apache Kafka is used at LinkedIn for activity stream data and operational metrics. This powers various products like LinkedIn Newsfeed and LinkedIn Today, in addition to our offline analytics systems like Hadoop.

Square - We use Kafka as a bus to move all system events through our various datacenters. This includes metrics, logs, custom events, etc. On the consumer side, we output into Splunk, Graphite, and Esper-like real-time alerting.

Box - At Box, Kafka is used for the production analytics pipeline & real-time monitoring infrastructure. We are planning to use Kafka for some of the new products & features.

Airbnb - Used in our event pipeline, exception tracking & more to come.

Mozilla - Kafka will soon be replacing part of our current production system to collect performance and usage data from end users' browsers for projects like Telemetry, Test Pilot, etc. Downstream consumers usually persist to either HDFS or HBase.

Cisco - Cisco is using Kafka as part of their OpenSOC (Security Operations Center). More detail here.

Tagged - Apache Kafka drives our new pub/sub system, which delivers real-time events for users in our latest game, Deckadence. It will soon be used in a host of new use cases, including group chat and back-end stats and log collection.

Foursquare - Kafka powers online-to-online messaging and online-to-offline messaging at Foursquare. We integrate with monitoring, production systems, and our offline infrastructure, including Hadoop.

Mate1.com Inc. - Apache Kafka is used at Mate1 as our main event bus, which powers our news and activity feeds and automated review systems, and will soon power real-time notifications and log distribution.

Spongecell - We use Kafka to run our entire analytics and monitoring pipeline, driving both real-time and ETL applications for our customers.

Wooga - We use Kafka to aggregate and process tracking data from all our Facebook games (which are hosted at various providers) in a central location.

AddThis - Apache Kafka is used at AddThis to collect events generated by our data network and broker that data to our analytics clusters and real-time web analytics platform.

Urban Airship - At Urban Airship we use Kafka to buffer incoming data points from mobile devices for processing by our analytics infrastructure.

Metamarkets - We use Kafka to ingest real-time event data, stream it to Storm and Hadoop, and then serve it from our Druid cluster to feed our interactive analytics dashboards. We've also built connectors for directly ingesting events from Kafka into Druid.

Simple - We use Kafka at Simple for log aggregation and to power our analytics infrastructure.

Gnip - Kafka is used in their Twitter ingestion and processing pipeline.

Loggly - Loggly is the world's most popular cloud-based log management service. It helps DevOps and technical teams make sense of massive quantities of logs. Kafka is used as part of our log collection and processing infrastructure.

VisualDNA - We use Kafka (1) as infrastructure that continuously brings tracking events from various datacenters into our central Hadoop cluster for offline processing, (2) as a propagation path for data integration, and (3) as a real-time platform for future inference and recommendation engines.

Sematext - In SPM (performance monitoring + alerting), Kafka is used for metrics collection and feeds SPM's in-memory data aggregation (OLAP cube creation) as well as our CEP/Alerts servers (see also: SPM for Kafka performance monitoring). In SA (search analytics), Kafka is used for search and click stream collection before the data is aggregated and persisted. In Logsene (log analytics), Kafka is used to pass logs and other events from front-end receivers to the persistent backend.

Wize Commerce - At Wize Commerce (previously NexTag), Kafka is used as a distributed queue in front of Storm-based processing for search index generation. We also plan to use it for collecting user-generated data on our web tier, landing the data in various data sinks like Hadoop, HBase, etc.

Quixey - At Quixey, The Search Engine for Apps, Kafka is an integral part of our eventing, logging and messaging infrastructure.

LinkSmart - Kafka is used at LinkSmart as an event stream feeding Hadoop and custom real-time systems.

LucidWorks Big Data - We use Kafka for syncing LucidWorks Search (Solr) with incoming data from Hadoop and also for sending LucidWorks Search logs back to Hadoop for analysis.

Criteo - We have used Kafka in production for over a year for stream processing and log transfer (over 2M messages/s and growing).

The Wikimedia Foundation - Uses Kafka as a log transport for analytics data from production webservers and applications. This data is consumed into Hadoop using Camus and by other processors of analytics data.

OVH - Has used Kafka in production for over a year now, as an event bus, as a data pipeline for anti-DDoS, and more to come.

Helpshift - Produces billions of events with Kafka through an Erlang-based producer, ekaf, that supports 8.0, and consumes topics primarily with Storm and Clojure.

iPinYou - iPinYou is the largest DSP in China, with its HQ in Beijing and offices in Shanghai, Guangzhou, Silicon Valley and Seattle. Kafka clusters are the central data hub at iPinYou. All kinds of Internet display advertising data, such as bid/no-bid, impression, click, advertiser, and conversion data, are collected as primary data streams into Kafka brokers in real time by LogAggregator (a substitute for Apache Flume, implemented in C/C++ by iPinYou, with customized functionality, better performance, and lower resource consumption).

MailChimp - Kafka powers MailChimp’s data pipeline that in turn powers MailChimp Pro, as well as an increasing number of other product features. You can read some of the details here.
