ClickHouse. Just makes you think faster.

Run more queries in the same amount of time

Test more hypotheses

Slice and dice your data in many more new ways

Look at your data from new angles

Discover new dimensions

Blazing Fast

ClickHouse's performance exceeds that of comparable column-oriented database management systems currently available on the market. It processes hundreds of millions to more than a billion rows and tens of gigabytes of data per second on a single server.

ClickHouse uses all available hardware to its full potential to process each query as fast as possible. Peak processing performance for a single query (measured after decompression and counting only the columns used) stands at more than 2 terabytes per second.

ClickHouse works 100-1,000x faster than traditional approaches

In contrast to common data management methods, where vast amounts of raw data in its native format are kept in a "data lake" and scanned for any given query, ClickHouse returns results instantly in most cases: the data is processed faster than it takes to write the query. Follow the link below to see detailed benchmarks by Yandex comparing ClickHouse with other database management systems. The following section also links to third-party benchmarks.

Linearly Scalable

ClickHouse allows companies to add servers to their clusters when necessary without investing time or money in any additional DBMS modification. The system has been successfully serving Yandex.Metrica while the number of servers in its main production cluster grew from 60 to 394 in two years. Those servers are located in six geographically distributed datacenters.

ClickHouse scales well both vertically and horizontally. It adapts easily to a cluster with hundreds of nodes, a single server, or even a tiny virtual machine. Current installations include some with more than two trillion rows per node and some with 100 TB of storage per node.

Hardware Efficient

ClickHouse processes typical analytical queries two to three orders of magnitude faster than traditional row-oriented systems with the same available I/O throughput. The system's columnar storage format allows fitting more hot data in RAM, which leads to shorter response times.
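The difference between the two layouts can be sketched in a few lines of Python. This is an illustration only, not ClickHouse's storage engine: the table, the column names (user_id, duration_ms, url) and the helper functions are all made up for the example.

```python
from array import array

# Hypothetical table of 1,000 page views (names and sizes are assumptions).
rows = [
    {"user_id": i, "duration_ms": i % 7, "url": f"/page/{i}"}
    for i in range(1000)
]

def sum_duration_rows(records):
    # Row layout: summing one field still walks every full record.
    return sum(r["duration_ms"] for r in records)

# Column layout: each column is stored contiguously on its own.
columns = {
    "user_id": array("q", (r["user_id"] for r in rows)),
    "duration_ms": array("q", (r["duration_ms"] for r in rows)),
    "url": [r["url"] for r in rows],
}

def sum_duration_columns(cols):
    # Only the single column the query needs is touched.
    return sum(cols["duration_ms"])

def bytes_scanned_columnar(cols, needed):
    # Rough size of the numeric columns a query has to read.
    return sum(cols[name].itemsize * len(cols[name]) for name in needed)
```

Both functions return the same sum, but the columnar version reads only the duration_ms values, which is why more of the data a query actually needs fits in memory.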

ClickHouse minimizes the number of seeks for range queries, which makes rotational disk drives more efficient to use, because it maintains locality of reference for contiguously stored data.
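Why sorted, contiguous storage helps range queries can be shown with a small Python sketch. It assumes data kept in key order and uses binary search from the standard library; the timestamps, values and function names are hypothetical and do not reflect ClickHouse internals.

```python
from bisect import bisect_left, bisect_right

# Hypothetical event timestamps, kept sorted by key, with one value per event.
timestamps = [10, 12, 15, 15, 20, 31, 42, 42, 57, 60]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

def range_slice(keys, lo, hi):
    """Return the [start, stop) positions of keys with lo <= key <= hi."""
    return bisect_left(keys, lo), bisect_right(keys, hi)

def sum_in_range(keys, vals, lo, hi):
    start, stop = range_slice(keys, lo, hi)
    # One contiguous slice: conceptually a single seek, then a sequential read.
    return sum(vals[start:stop])
```

Because matching rows sit next to each other, the query locates two boundaries and reads everything in between, rather than jumping to each matching row separately.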

By minimizing data transfers for most types of queries, ClickHouse enables companies to manage their data and create reports without using specialized networks that are aimed at high-performance computing.

Fault Tolerant

ClickHouse supports multi-master asynchronous replication and can be deployed across multiple datacenters. Downtime of a single node or of a whole datacenter does not affect the system's availability for either reads or writes. Distributed reads are automatically balanced across live replicas to avoid increasing latency. Replicated data is synchronized automatically or semi-automatically after server downtime.

Key Features

True column-oriented storage

Vectorized query execution

Data compression

Parallel and distributed query execution

Real-time query processing

Real-time data ingestion

On-disk locality of reference

Cross-datacenter replication

High availability

SQL support

Local and distributed joins

Pluggable external dimension tables

Arrays and nested data types

Approximate query processing

Probabilistic data structures

Full support of IPv6

Features for web analytics

State-of-the-art algorithms

Detailed documentation

Clean documented code

Feature Rich

ClickHouse features a user-friendly SQL query dialect with a number of built-in analytics capabilities. For example, it includes probabilistic data structures for fast and memory-efficient calculation of cardinalities and quantiles. There are functions for working with dates, times and time zones, as well as specialized ones for handling URLs and IP addresses (both IPv4 and IPv6), and many more.
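To give a feel for what a probabilistic data structure does, here is a minimal Python sketch of one well-known technique, the "k minimum values" cardinality estimator. It is chosen for brevity and is not claimed to be the algorithm ClickHouse uses; the function names and the default k are assumptions of this example.

```python
import hashlib
import heapq

def _hash01(value):
    """Map a value to a deterministic pseudo-random float in [0, 1)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def approx_distinct(values, k=256):
    """Estimate the number of distinct values from the k smallest hashes."""
    kept = set()  # hashes currently held in the heap
    heap = []     # max-heap via negated hashes, holds at most k entries
    for v in values:
        h = _hash01(v)
        if h in kept:
            continue  # duplicate of a value already tracked
        if len(heap) < k:
            heapq.heappush(heap, -h)
            kept.add(h)
        elif h < -heap[0]:
            # New hash is smaller than the largest kept one: swap them.
            evicted = -heapq.heappushpop(heap, -h)
            kept.discard(evicted)
            kept.add(h)
    if len(heap) < k:
        return len(heap)  # fewer than k distinct hashes seen: exact count
    kth_smallest = -heap[0]
    return int((k - 1) / kth_smallest)
```

The sketch keeps at most k hashes no matter how many values it sees, so memory stays constant while the typical relative error is on the order of 1/sqrt(k).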

The data organization options available in ClickHouse, such as arrays, array joins, tuples and nested data structures, are extremely efficient for managing denormalized data.
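The idea behind an array join can be sketched in Python: each row that carries an array is expanded into one row per array element. The visits table and the array_join helper below are hypothetical; the sketch drops rows with empty arrays, which is how a plain (non-LEFT) ARRAY JOIN behaves.

```python
def array_join(rows, array_column):
    """Yield one output row per element of each row's array column."""
    for row in rows:
        for element in row[array_column]:
            flat = dict(row)              # copy the scalar columns
            flat[array_column] = element  # replace the array with one element
            yield flat

# Hypothetical denormalized visits, each with an array of reached goals.
visits = [
    {"visit_id": 1, "goals": ["signup", "purchase"]},
    {"visit_id": 2, "goals": []},
    {"visit_id": 3, "goals": ["signup"]},
]
flat_visits = list(array_join(visits, "goals"))
```

Keeping the goals inside the visit row avoids a separate table and a join, while the expansion is still available when a query needs one row per goal.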

ClickHouse can join both distributed and co-located data, because it supports local joins and distributed joins. It also supports external dictionaries, which are dimension tables loaded from an external source, for seamless joins with simple syntax.
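Conceptually, an external dictionary turns a join against a dimension table into a key lookup. The Python sketch below illustrates only that idea; the region data, the dict_get helper and its default value are assumptions of this example, not ClickHouse syntax.

```python
# Hypothetical dimension data, as if loaded from an external source.
region_names = {1: "Europe", 2: "Asia", 3: "Americas"}

def dict_get(dictionary, key, default="unknown"):
    """Look up a dimension attribute by key, with a fallback for missing keys."""
    return dictionary.get(key, default)

# Hypothetical fact rows that store only the compact region_id.
events = [
    {"region_id": 1, "hits": 10},
    {"region_id": 3, "hits": 4},
    {"region_id": 9, "hits": 1},
]
enriched = [
    dict(e, region=dict_get(region_names, e["region_id"])) for e in events
]
```

The fact rows stay small, and the human-readable attribute is attached at query time without writing an explicit join.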

ClickHouse supports approximate query processing: you can trade some accuracy for speed and get results as fast as you need them, which is indispensable when dealing with terabytes and petabytes of data.
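One common way to get approximate results is to run the query on a deterministic sample of the data and scale the answer. The Python sketch below shows that general idea under stated assumptions: the hash-based sampling rule, the bucket count of 1000 and the function names are illustrative choices, not ClickHouse's implementation.

```python
import hashlib

def in_sample(key, fraction):
    """Deterministically keep roughly `fraction` of keys, chosen by hash."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 1000
    return bucket < int(fraction * 1000)

def approx_sum(rows, fraction):
    """Sum values over the sampled keys only, then scale up to the full data."""
    sampled = sum(value for key, value in rows if in_sample(key, fraction))
    return sampled / fraction

# Hypothetical (key, value) rows.
data = [(i, 1) for i in range(20000)]
```

With a fraction of 0.1 the query touches about a tenth of the rows and returns an estimate close to the exact sum; a fraction of 1.0 reads everything and is exact.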

The system's conditional aggregate functions and its calculation of totals and extremes let you get results with a single query instead of running several.
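What a single such query computes can be sketched in Python: per-group aggregates, an aggregate restricted by a condition, a totals row, and the minimum and maximum of each result column. The row layout (browser, is_mobile, hits) and the report function are hypothetical.

```python
from collections import defaultdict

def report(rows):
    """Per-group sums, a conditional sum, totals and extremes in one pass."""
    groups = defaultdict(lambda: {"hits": 0, "mobile_hits": 0})
    totals = {"hits": 0, "mobile_hits": 0}
    for browser, is_mobile, hits in rows:
        for acc in (groups[browser], totals):
            acc["hits"] += hits
            if is_mobile:  # conditional aggregate: count only matching rows
                acc["mobile_hits"] += hits
    # Extremes: (min, max) of each aggregate column across the groups.
    extremes = {
        col: (
            min(g[col] for g in groups.values()),
            max(g[col] for g in groups.values()),
        )
        for col in totals
    }
    return dict(groups), totals, extremes

sample_rows = [
    ("chrome", True, 3),
    ("chrome", False, 5),
    ("firefox", False, 2),
    ("safari", True, 4),
]
groups, totals, extremes = report(sample_rows)
```

Without these features, the mobile-only sum, the totals and the extremes would each need a separate query over the same data.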

When to use ClickHouse

ClickHouse is suited to analytics over a stream of clean, well-structured and immutable events or logs. It is recommended to put each such stream into a single wide fact table with pre-joined dimensions.

Some examples of viable applications:

Web and App analytics

Advertising networks and RTB

Telecommunications

E-commerce and finance

Information security

Monitoring and telemetry

Time series

Business intelligence

Online games

Internet of Things

When NOT to use ClickHouse

Transactional workloads (OLTP)

Key-value access with high request rate

Blob or document storage

Over-normalized data

Highly Reliable

ClickHouse has been managing petabytes of data for a number of high-load, mass-audience services of Yandex, Russia's leading search provider and one of the largest European IT companies. Since 2012, ClickHouse has been providing robust database management for the company's web analytics service, comparison e-commerce platform, public email service, online advertising platform, business intelligence tools and infrastructure monitoring.

ClickHouse can be configured as a purely distributed system located on independent nodes, without any single point of failure.

Software and hardware failures or misconfigurations do not result in loss of data. Instead of deleting "broken" data, ClickHouse saves it or asks you what to do before startup. All data is checksummed before every read or write to disk or network. It is virtually impossible to delete data by accident, as there are safeguards even against human error.
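The principle of checksumming data on the way to and from storage can be sketched in Python. CRC32 from the standard library is used here purely as a stand-in, since the text above does not say which checksum ClickHouse uses; the block layout and function names are assumptions of this example.

```python
import zlib

def write_block(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte checksum before it is stored or sent."""
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    return checksum + payload

def read_block(block: bytes) -> bytes:
    """Verify the stored checksum and return the payload, or raise if corrupted."""
    stored, payload = block[:4], block[4:]
    if zlib.crc32(payload).to_bytes(4, "big") != stored:
        raise ValueError("checksum mismatch: block is corrupted")
    return payload
```

A block that was altered on disk or in transit fails verification when it is read, so corruption is detected rather than silently returned in query results.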

ClickHouse offers flexible limits on query complexity and resource usage, which can be fine-tuned with settings. It is possible to simultaneously serve both a number of high priority low-latency requests and some long-running queries with background priority.

Simple and Handy

ClickHouse streamlines all your data processing. It's easy to use: ingest all your structured data into the system, and it is instantly available for reports. New columns for new properties or dimensions can be easily added to the system at any time without slowing it down.

ClickHouse is simple and works out of the box. As well as running on clusters of hundreds of nodes, the system can be easily installed on a single server or even a virtual machine. No development experience or code-writing skills are required to install ClickHouse.