High performance data analytics

Leading Service and Software Provider for ClickHouse

Altinity is the leading service and software provider for ClickHouse, an open source SQL data warehouse offering industry-leading query speeds on petabyte-scale data. Altinity offers the deepest ClickHouse expertise on the market to help customers deploy and run the most demanding analytic applications. We also provide innovative software to manage ClickHouse in Kubernetes, cloud, and bare metal environments.

Let Altinity software help you manage and operate ClickHouse clusters. Offerings cover operation on Kubernetes, bare metal, and cloud clusters, as well as data ingest and visualization. Plus we will gladly implement custom features in ClickHouse itself! Read more »

Accelerate your ClickHouse evaluation, design, and deployment with Altinity proof-of-concept support plans. We also offer training packages to give your team a competitive edge in building world-class data warehouse applications. Read more »

ClickHouse offers incredible flexibility to solve almost any business problem in multiple ways. Schema design plays a major role in this. For our recent benchmarking with the Time Series Benchmark Suite (TSBS) we replicated the TimescaleDB schema in order to have a fair comparison. In that design every metric is stored in a separate column. This is best for ClickHouse from a performance perspective, as it makes full use of the column store and type specialization.

Sometimes, however, the schema is not known in advance, or time series data from multiple device types needs to be stored in the same table. Having a separate column per metric may not be convenient, so a different approach is required. In this article we discuss multiple ways to design schemas for time series and do some benchmarking to validate each approach.
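To make the contrast concrete, here is a minimal sketch of both schema styles. The table and column names are invented for illustration, not the ones used in the benchmark:

    -- Column-per-metric design (hypothetical table): fastest for ClickHouse,
    -- but the set of metrics must be known when the table is created.
    CREATE TABLE cpu_narrow (
        created_at   DateTime,
        tags_id      UInt32,
        usage_user   Float64,
        usage_system Float64,
        usage_idle   Float64
    ) ENGINE = MergeTree()
    ORDER BY (tags_id, created_at);

    -- Flexible key-value design: metrics stored as parallel arrays,
    -- accommodating arbitrary metric names at some cost in speed.
    CREATE TABLE cpu_kv (
        created_at    DateTime,
        tags_id       UInt32,
        metric_names  Array(String),
        metric_values Array(Float64)
    ) ENGINE = MergeTree()
    ORDER BY (tags_id, created_at);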

One of our customers recently had a problem using ClickHouse: the simple workflow of load-analyze-present wasn't as efficient as they expected. The crux of the problem was loading and presenting IPv4 and IPv6 addresses, which are traditionally stored in ClickHouse as UInt32 and FixedString(16) columns. These types have many advantages, such as a compact footprint and easy comparison of values. But they also have shortcomings that prompted us to seek a better solution.
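For illustration, here is what the traditional approach looks like. The table is hypothetical, but the conversion functions are standard ClickHouse:

    CREATE TABLE access_log (
        ts   DateTime,
        ipv4 UInt32,           -- IPv4 address stored as a 32-bit integer
        ipv6 FixedString(16)   -- IPv6 address stored as 16 raw bytes
    ) ENGINE = MergeTree()
    ORDER BY ts;

    -- Every load and every report requires explicit conversions.
    INSERT INTO access_log VALUES
        (now(), IPv4StringToNum('192.168.1.1'), IPv6StringToNum('2001:db8::1'));

    SELECT IPv4NumToString(ipv4) AS ip4, IPv6NumToString(ipv6) AS ip6
    FROM access_log;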

The previous post surveyed connectivity benchmarks for ClickHouse to estimate the general performance of server concurrency. In this next post we take on real-life examples and explore concurrency performance when actual data is involved.

ClickHouse is an OLAP database for analytics, so the typical usage scenario involves processing a relatively small number of requests (from several per hour to many dozens or even low hundreds per second), each affecting huge ranges of data (gigabytes/millions of rows).

But how will it behave in other scenarios? Let's try to use a sledgehammer to crack nuts and check how ClickHouse deals with thousands of small requests per second. This will help us better understand the range of possible use cases and limitations.
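As a rough sketch of the contrast, the queries below are hypothetical but illustrate the two workload shapes: a classic analytic scan versus the tiny point lookups we want to stress-test:

    -- Typical OLAP query: one request scanning millions of rows.
    SELECT toStartOfDay(ts) AS day, count(), avg(latency_ms)
    FROM requests
    GROUP BY day;

    -- Key-value style point query: thousands of these per second.
    SELECT latency_ms
    FROM requests
    WHERE request_id = 123456789
    LIMIT 1;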

This post has two parts. The first part covers connectivity benchmarks and test setup. The next part covers maximum QPS in scenarios involving actual data.

The Madrid ClickHouse Meetup is over. Having attended in person I can report it was excellent: well-organized with great content. The meetup venue was the Google for Startups Campus, which has a comfortable auditorium capable of holding over 100 attendees. Between the venue and quality of the talks, it was more like a mini-conference than a meetup.

Altinity is thrilled to be among the sponsors of the event. Meet our team in the Exhibit Hall.

Use code “ALTINITY10” to get 10% off registration

Percona Live conferences provide the open source database community with an opportunity to discover and discuss the latest open source trends, technologies and innovations. The conference includes the best and brightest innovators and influencers in the open source database industry.

Join us on June 5, 2019 at 6 pm GMT / 1 pm EST / 10 am PST.

This talk shows how to get sub-second responses from datasets containing a billion rows or more. We'll start by defining the schema and loading data quickly in parallel. We will then introduce tricks like the LowCardinality datatype, ASOF joins, and materialized views that can reduce query response times to milliseconds. Finally we'll show how to use metrics and logging to analyze query performance. After this talk you'll be ready for your first billion rows and many more afterwards.
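As a taste of the techniques on the agenda, here is a brief sketch. The tables and columns are invented for illustration; the features themselves are standard ClickHouse:

    -- LowCardinality shrinks storage and speeds up filtering
    -- on repetitive string columns.
    CREATE TABLE events (
        ts     DateTime,
        device LowCardinality(String),
        price  Float64
    ) ENGINE = MergeTree()
    ORDER BY (device, ts);

    -- ASOF JOIN matches each row to the closest earlier row
    -- in another table, e.g. the latest quote before each event.
    SELECT e.device, e.ts, q.quote
    FROM events AS e
    ASOF JOIN quotes AS q ON e.device = q.device AND e.ts >= q.ts;

    -- A materialized view pre-aggregates at insert time,
    -- turning heavy scans into near-instant lookups.
    CREATE MATERIALIZED VIEW events_hourly
    ENGINE = SummingMergeTree()
    ORDER BY (device, hour)
    AS SELECT device, toStartOfHour(ts) AS hour, sum(price) AS total
    FROM events
    GROUP BY device, hour;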