More stable segment balancing

The Coordinator now performs more stable segment management, especially segment balancing. We have fixed an unexpected segment imbalance caused by conflicting decisions between the Coordinator's rule runner and its balancer.

Bug fixes in querying and result caching

We've fixed incorrect lexicographic sorting in topN queries and incorrect filter application in nested queries. A ClassCastException thrown when caching topN queries with Float dimensions has also been fixed.

Highlights

Large performance improvements for coordinator's loadstatus API

The Coordinator's loadstatus API returns the percentage of segments actually loaded in the cluster versus segments that should be loaded. The performance of this API has been greatly improved.

Fix SQLMetadataSegmentManager to allow successive start and stop

Fix default interval handling in SegmentMetadataQuery

SegmentMetadataQuery is supposed to use the interval given by druid.query.segmentMetadata.defaultHistory when no interval is specified, but it queried all segments instead, which incurred an unexpected performance hit. SegmentMetadataQuery now respects the defaultHistory option again.
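
For reference, a minimal runtime.properties sketch for this option, with an illustrative one-week window:

druid.query.segmentMetadata.defaultHistory=P1W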

Support HTTP OPTIONS request

Fix a bug where different segments could share the same segment ID in Kafka indexing

The Kafka indexing service allowed retried tasks to overwrite segments in deep storage that were written by previously failed tasks. However, this introduced another bug: the same segment ID could have different data on historicals than in deep storage. This has been fixed by using a unique segment path for each Kafka index task.

Highlights

Kafka indexing incremental handoffs and decoupled partitioning

The Kafka indexing service now supports incremental handoffs, as well as decoupling the number of segments created by a Kafka indexing task from the number of Kafka partitions. Please see #4815 (comment) for more information.

Improved automatic segment management

Automatic pending segments cleanup

Indexing tasks create entries in the "pendingSegments" table in the metadata store; prior to 0.12.0, these temporary entries were not automatically cleaned up, leading to possible cluster performance degradation over time. Druid 0.12.0 allows the coordinator to automatically clean up unused entries in the pending segments table. This feature is enabled by setting druid.coordinator.kill.pendingSegments.on=true in coordinator properties.
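
For example, a minimal coordinator runtime.properties sketch enabling this cleanup:

druid.coordinator.kill.pendingSegments.on=true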

Compaction task

Compacting segments (merging a set of segments within a given interval to create a set with larger but fewer segments) is a common Druid batch ingestion use case. Druid 0.12.0 now supports a Compaction Task that merges all segments within a given interval into a single segment. Please see http://druid.io/docs/0.12.0-rc1/ingestion/tasks.html#compaction-task for more details.
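
As an illustrative sketch (the dataSource and interval below are placeholders), a compaction task spec submitted to the Overlord might look like:

{
  "type": "compact",
  "dataSource": "wikipedia",
  "interval": "2017-01-01/2017-02-01"
}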

Query request queuing improvements

Previously, clients could inadvertently overwhelm a broker by sending too many requests, which would queue in an unbounded Jetty worker pool queue. Clients typically close the connection after a certain client-side timeout, but the broker would continue to process these requests, giving the appearance of being unresponsive. Meanwhile, clients would continue to retry, adding requests to an already overloaded broker.

The newly introduced properties druid.server.http.queueSize and druid.server.http.enableRequestLimit in the broker configuration and historical configuration allow users to configure request rejection to prevent clients from overwhelming brokers and historicals with queries.
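
For example, a hedged broker or historical runtime.properties sketch (the queue size is an illustrative value, not a recommendation):

druid.server.http.enableRequestLimit=true
druid.server.http.queueSize=20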

Parse batch support

For developers of custom ingestion parsing extensions, it is now possible for InputRowParsers to return multiple InputRows from a single input row. This can simplify ingestion pipelines by reducing the need for input transformations outside of Druid. Added by @pjain1 in #5081.

Performance improvements

SegmentWriteOutMedium

When creating new segments, Druid stores some pre-processed data in temporary buffers. Prior to 0.12.0, these buffers were always kept in temporary files on disk. In 0.12.0, PR #4762 by @leventov allows these temporary buffers to be stored in off-heap memory, thus reducing the number of disk I/O operations during ingestion. To enable using off-heap memory for these buffers, the druid.peon.defaultSegmentWriteOutMediumFactory property needs to be configured accordingly. If using off-heap memory for the temporary buffers, please ensure that -XX:MaxDirectMemorySize is increased to accommodate the higher direct memory usage.
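
As a sketch, assuming the factory type is selected with a .type suffix as in the Druid configuration reference, a peon configuration might include:

druid.peon.defaultSegmentWriteOutMediumFactory.type=offHeapMemory

If you enable this, also raise -XX:MaxDirectMemorySize in the peon JVM options (druid.indexer.runner.javaOpts).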

Parallel merging of intermediate GroupBy results

PR #4704 by @jihoonson allows the user to configure a number of processing threads to be used for parallel merging of intermediate GroupBy results that have been spilled to disk. Prior to 0.12.0, this merging step would always take place within a single thread.
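
A hedged sketch of enabling this, assuming the numParallelCombineThreads option introduced by this PR (the thread count is illustrative):

druid.query.groupBy.numParallelCombineThreads=4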

And much more!

Updating from 0.11.0 and earlier

Please see below for changes between 0.11.0 and 0.12.0 that you should be aware of before upgrading. If you're updating from an earlier version than 0.11.0, please see release notes of the relevant intermediate versions for additional notes.

Rollback restrictions

Please note that after upgrading to 0.12.0, it is no longer possible to downgrade to a version older than 0.11.0, due to changes made in #4762. It is still possible to roll back to version 0.11.0.

com.metamx.java-util library migration

The Metamarkets java-util library has been brought into Druid. As a result, classes under the com.metamx packages have moved to corresponding io.druid.java.util packages.

This will affect the druid.monitoring.monitors configuration. References to monitor classes under the old com.metamx.metrics.* package will need to be updated to reference io.druid.java.util.metrics.* instead, e.g. io.druid.java.util.metrics.JvmMonitor.

If classes under the com.metamx packages are referenced in other configurations such as log4j2.xml, those references will need to be updated as well.

Extension developers will need to update their code to use the new Druid packages as well.

Caffeine cache extension

The Caffeine cache has been moved from an extension into core Druid, and it is now the default cache implementation. Please remove druid-caffeine-cache from the extensions list, if present, when upgrading to 0.12.0. More information can be found at #4810.

Kafka indexing service changes

earlyMessageRejectPeriod

The semantics of the earlyMessageRejectPeriod configuration have changed. The earlyMessageRejectPeriod will now be added to (task start time + task duration) instead of just (task start time) when determining the bounds of the message window. Please see #4990 for more information.

Rolling upgrade

In 0.12.0, there are protocol changes between the Kafka supervisor and the Kafka indexing tasks, as well as changes to the metadata formats persisted on disk. Therefore, to support a rolling upgrade, all Middle Managers need to be upgraded before the Overlord. Note that this ordering is different from the standard upgrade order, and that it is only necessary when using the Kafka indexing service. If you are not using the Kafka indexing service, or can tolerate downtime for the Kafka supervisor, you can upgrade in any order.

Until the Overlord is upgraded, all Kafka indexing tasks will behave as before (even if they themselves have been upgraded), meaning no decoupled partitioning or incremental handoffs. Once the Overlord is upgraded, new tasks started by it will support the new features.

Roll back

Once both the Overlord and Middle Managers are rolled back, a new set of tasks should be started, and these will work properly. However, the current set of tasks may fail during the rollback. Please see #4815 for more info.

Interface Changes for Extension Developers

The ColumnSelectorFactory API has changed. Aggregator extension authors and any others who use ColumnSelectorFactory will need to update their code accordingly. Please see #4886 for more details.

The Aggregator.reset() method has been removed because it was deprecated and unused. Please see #5177 for more info.

The DataSegmentPusher interface has changed, and the push() method now has an additional replaceExisting parameter. Please see #5187 for details.

The Escalator interface has changed: the createEscalatedJettyClient method has been removed. Please see #5322 for more details.

Druid 0.11.0 contains over a hundred performance improvements, stability improvements, and bug fixes from almost 40 contributors. This release adds two major security features, TLS support and extension points for authentication and authorization.

The existing Kerberos authentication extension has been updated to implement the new Authenticator interface, please see the "Kerberos configuration changes" section under "Updating from 0.10.1 and earlier" for more information if you are using the Kerberos extension.

Double columns support

cachingCost Balancer Strategy

Users upgrading to 0.11.0 are encouraged to try the new cachingCost segment balancing strategy on their coordinators. This strategy offers large performance improvements over the existing cost balancer strategy, and it is planned to become the default strategy in the release following 0.11.0.

This strategy can be selected by setting the following property on coordinators:
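
druid.coordinator.balancer.strategy=cachingCost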

And much more!

Updating from 0.10.1 and earlier

Please see below for changes between 0.10.1 and 0.11.0 that you should be aware of before upgrading. If you're updating from an earlier version than 0.10.1, please see release notes of the relevant intermediate versions for additional notes.

Upgrading coordinators and overlords

The following patch changes the way coordinator->overlord redirects are handled: #5037

As a result of these changes, special care is needed when upgrading Coordinators or Overlords to 0.11.0. All Coordinators and Overlords must be shut down and upgraded together.

For example, to upgrade Coordinators, you would shut down all Coordinators, upgrade them to 0.11.0, and then start them. Overlords should be upgraded in the same way.

During the upgrade process, there must not be any period of time in which a non-0.11.0 Coordinator or Overlord is running simultaneously with a 0.11.0 Coordinator or Overlord.

Note that at least one Overlord should be brought back up as quickly as possible after shutting them all down, so that peons, Tranquility, etc. continue to work after some retries.

Also note that the druid.zk.paths.indexer.leaderLatchPath property is no longer used.

Service name changes

In earlier versions of Druid, / characters in service names defined by druid.service would be replaced by : characters because these service names were used in Zookeeper paths. Druid 0.11.0 no longer performs these character replacements.

Example 1: if the old configuration had a broker with the service name test/broker:

druid.service=test/broker

and a Router was configured assuming that / would be replaced with : in the broker service name:

druid.router.tierToBrokerMap={"hot":"test:broker","_default_tier":"test:broker"}

then the Router configuration should be updated to remove that assumption:

druid.router.tierToBrokerMap={"hot":"test/broker","_default_tier":"test/broker"}

Example 2: if the old configuration had an Overlord with the service name test/overlord, then the value of druid.coordinator.asOverlord.overlordService or druid.selectors.indexing.serviceName should be test/overlord, not test:overlord.

Example 3: if the old configuration had an Overlord with the service name test:overlord, then the value of druid.coordinator.asOverlord.overlordService or druid.selectors.indexing.serviceName should be test:overlord, not test/overlord.

The following service name-related configurations are also affected and should be updated to exactly match the value of the druid.service property on the node being discovered.

Users can point the Kerberos authenticator's authorizerName to an instance of an "allowAll" authorizer to replicate the pre-0.11.0 behavior of a cluster using Kerberos authentication with no authorization.

Lookups API path changes

The paths for the lookups configuration API have changed due to #5058.

Configuration paths that had the form /druid/coordinator/v1/lookups now have the form /druid/coordinator/v1/lookups/config.

Migrating to Double columns

Prior to 0.11.0, the Double* aggregators would store column values on disk as Float while performing aggregations using Double representations.

PR #4491 allows the Double aggregators to store column values on disk as Doubles. Due to concerns related to rolling updates and version downgrades, this behavior is disabled by default and Druid will continue to store Double aggregators on disk as floats.

To enable Double column storage, set the following property in the common runtime properties:

druid.indexing.doubleStorage=double

Users should not set this property during an initial rolling upgrade to 0.11.0, as any nodes running pre-0.11.0 Druid will not be able to handle Double columns created during the upgrade period. Users will also need to reindex any segments with Double columns if downgrading from 0.11.0 to an older version. Please see #4944 and #4605 for more information.

Scan query changes

The Scan query has been moved from extensions-contrib into core Druid. As part of this migration (#4751), the Scan query's handling of the time column has changed.

The time column is now returned as "__time" rather than "timestamp"; it is no longer included unless you specifically ask for it in your "columns"; and it is returned as a long rather than a string.

Users can revert the Scan query's time handling to the legacy extension behavior by setting "legacy" : true in their queries, or setting the property druid.query.scan.legacy = true. This is meant to provide a migration path for users that were formerly using the contrib extension.
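
For example, a minimal Scan query sketch with legacy mode enabled (the dataSource, interval, and columns are placeholders):

{
  "queryType": "scan",
  "dataSource": "wikipedia",
  "intervals": ["2017-01-01/2017-01-02"],
  "columns": ["timestamp", "page"],
  "legacy": true
}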

Extension Interface Changes

Aggregator double column support

The Aggregator interface has gained a getDouble() method, which defaults to casting the result of getFloat(). The getDouble() method should be re-implemented for any custom aggregators that can support doubles.

Filter interface change

The Filter.getBitmapResult() method no longer has a default implementation: #4481

Custom filter extensions will need to provide an implementation for getBitmapResult() now.

Other Notes

jvm/gc/time metric

The jvm/gc/time metric is no longer emitted, replaced by a new metric named jvm/gc/cpu for the reasons described here: #4480

Default worker select strategy

Please note that the default worker select strategy has changed from fillCapacity to equalDistribution. This change was introduced in 0.10.1, the previous release, but was not mentioned in the 0.10.1 release notes, so it is called out again here.

V8 segment creation removed

Druid now always builds V9 segments. Creating V8 segments is no longer supported, and the buildV9Directly property for ingestion tasks has been removed.

Highlights

TopN performance improvements

Processing for TopN queries with 1-2 aggregators on historical nodes is now 2-4 times faster. This is accomplished with new runtime inspection logic that generates monomorphic implementations of query processing classes, reducing polymorphism in the TopN query execution path.

Limit clause push down for GroupBy

Druid can now optimize limit clauses in GroupBy queries by distributing the limit application to historical/realtime nodes, applying the limit to partial result sets before they are sent to the broker for merging. This reduces network traffic within the cluster and reduces the merging workload on broker nodes. Please refer to http://druid.io/docs/0.10.1/querying/groupbyquery.html#query-context for more information.
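
As a hedged sketch, push down can be forced through the query context (the dataSource, dimension, and aggregator are placeholders; forcing push down when ordering on an aggregator trades exactness for speed):

{
  "queryType": "groupBy",
  "dataSource": "wikipedia",
  "granularity": "all",
  "intervals": ["2017-01-01/2017-01-02"],
  "dimensions": ["page"],
  "aggregations": [{"type": "count", "name": "rows"}],
  "limitSpec": {
    "type": "default",
    "limit": 10,
    "columns": [{"dimension": "rows", "direction": "descending"}]
  },
  "context": {"forceLimitPushDown": true}
}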

The Firehose implementations for Microsoft Azure, Rackspace Cloud Files, Google Cloud Storage, and Amazon S3 now support caching and prefetching of data. These firehoses can now operate on portions of the input data, pulling new data as needed, instead of having to fully read the firehose's input to disk first.

And much more!

Updating from 0.10.0 and earlier

Please see below for changes between 0.10.0 and 0.10.1 that you should be aware of before upgrading. If you're updating from an earlier version than 0.10.0, please see release notes of the relevant intermediate versions for additional notes.

Deprecation of support for Hadoop versions < 2.6.0

To add support for Amazon's S3A filesystem, Druid is now built against Hadoop 2.7.3 libraries, and we are deprecating support for Hadoop versions older than 2.6.0.

For users running a Hadoop version older than 2.6.0, it is possible to continue running Druid 0.10.1 with the older Hadoop version using a workaround.

The user would need to downgrade hadoop.compile.version in the main Druid pom.xml, remove the hadoop-aws dependency from pom.xml in the druid-hdfs-storage core extension, and then rebuild Druid.

Users are strongly encouraged to upgrade their Hadoop clusters to a 2.6.0+ version as of this release, as support for Hadoop <2.6.0 may be dropped completely in future releases.

Users should also update any existing druid.indexer.task.defaultHadoopCoordinates configurations to 2.7.3 versions.
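
For example, matching the bundled 2.7.3 client:

druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.7.3"]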

Kafka Broker Changes

Due to changes from #4115, the Kafka indexing service is no longer compatible with 0.9.x Kafka brokers. Users will need to upgrade their Kafka brokers to a 0.10.x version.

If lookups are being used in your prior deployment, then as part of the upgrade to 0.10.1 all Coordinators should be stopped, upgraded, and started with 0.10.1 at one time, rather than upgraded one at a time. There should never be a situation where one Coordinator is running 0.10.0 while another is running 0.10.1.

During the course of the cluster upgrade, lookup query nodes will report an error beginning with got notice to load lookup [LookupExtractorFactoryContainer{version='null'. This is not actually an error; it is a side effect of the update. See #4603 for details.

Off-heap query-time lookup cache

Please note that the off-heap query-time lookup cache is broken at this time because of an excessive memory use issue, and must not be used: #3663

Druid 0.10.0 contains hundreds of performance improvements, stability improvements, and bug fixes from over 40 contributors. Major new features include a built-in SQL layer, numeric dimensions, Kerberos authentication support, a revamp of the "index" task, a new "like" filter, large columns, ability to run the coordinator and overlord as a single service, better performing defaults, and eight new extensions.

If you are upgrading from a previous version of Druid, please see "Updating from 0.9.2 and earlier" below for upgrade notes, including some backwards incompatible changes.

Highlights

Built-in SQL

Druid now includes a built-in SQL server powered by Apache Calcite, with two SQL APIs: HTTP POST and JDBC. SQL provides an alternative to Druid's native JSON API that is more familiar to new developers, and it makes it easier to integrate pre-existing applications that natively speak SQL. Not all Druid and SQL features are supported by the SQL layer in this initial release, but we intend to expand both in future releases.

Numeric dimensions

Druid now supports numeric dimensions at ingestion and query time. Users can ingest long and float columns as dimensions (i.e., treating the numeric columns as part of the ingestion-time grouping key instead of as aggregators, if rollup is enabled). Additionally, Druid queries can now accept any long or float column as a dimension for grouping or for filtering.

There are performance tradeoffs between string and numeric columns. Numeric columns are generally faster to group on than string columns. Numeric columns don't have indexes, so they are generally slower to filter on than string columns.
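
For example, a hedged dimensionsSpec sketch declaring a long, a float, and a string dimension (all names are placeholders):

"dimensionsSpec": {
  "dimensions": [
    {"type": "long", "name": "userId"},
    {"type": "float", "name": "score"},
    "page"
  ]
}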

Kerberos authentication support

A new extension named druid-kerberos adds support for user authentication for Druid nodes using Kerberos. It uses the Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO, https://en.wikipedia.org/wiki/SPNEGO) for authentication via HTTP.

Index task revamp

The indexing task was rewritten to improve performance, particularly for jobs spanning multiple intervals that generate many shards. The segmentGranularity intervals can now be determined automatically and no longer need to be specified, although ingestion time can be reduced if both intervals and numShards are provided.

Additionally, the indexing task now supports an appendToExisting flag which causes the data to be indexed as an additional shard of the current version rather than as a new version overshadowing the previous version.

Like filter

Druid now includes a "like" filter that enables SQL LIKE-style filtering, such as foo LIKE 'bar%'. The implementation is generally faster than regex filters, and is encouraged over regex filters when possible. In particular, like filters on prefixes such as bar% are significantly faster than equivalent regex filters such as ^bar.*.
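
For example, the foo LIKE 'bar%' filter above would be expressed in a native query as:

{ "type": "like", "dimension": "foo", "pattern": "bar%" }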

Large columns

Druid now supports individual columns larger than 2GB. This feature is not typically required, since the general guidance is that segments should be 500MB–1GB in size, but it is useful in situations where one column is much larger than all the others (for example, large sketches).

This functionality is available to all Druid users and no special configuration is necessary when using the built-in column types. If you have developed a custom metric column type as a Druid extension, you can enable large column support by overriding getSerializer in your ComplexMetricsSerde.

Coordinator/Overlord combination option

Druid deployments can now be simplified by combining the Coordinator and Overlord functions into the Coordinator process. To do this, set druid.coordinator.asOverlord.enabled and druid.coordinator.asOverlord.overlordService appropriately on your Coordinators and then stop your Overlords. The Overlord console will then be available at http://coordinator-host:port/console.html.
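
A minimal sketch of the relevant Coordinator properties (the service name is illustrative; it should match what druid.selectors.indexing.serviceName is set to across the cluster):

druid.coordinator.asOverlord.enabled=true
druid.coordinator.asOverlord.overlordService=druid/overlord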

This is currently an experimental feature and is off by default. We intend to consider making this the default in a future version of Druid.

Better performing defaults

This release changes two default settings to improve out-of-the-box performance:

The buildV9Directly option introduced in Druid 0.9.0 is now enabled by default. This option improves performance of indexing by creating the v9 data format directly rather than creating v8 first and then converting to v9. If necessary, you can roll back to the old code by setting "buildV9Directly" to false in your indexing tasks.

The v2 groupBy engine introduced in Druid 0.9.2 is now enabled by default. This new groupBy engine was rewritten from the ground up for better performance and memory management. If necessary, you can roll back to the old engine by setting either "druid.query.groupBy.defaultStrategy" in your runtime.properties, or "groupByStrategy" in your query context, to "v1". See http://druid.io/docs/0.10.0/querying/groupbyquery.html for details on the differences between groupBy v1 and v2.

Other performance improvements

In addition to better performing defaults, Druid 0.10.0 includes a number of other performance improvements.

And much more!

Updating from 0.9.2 and earlier

Please see below for changes between 0.9.2 and 0.10.0 that you should be aware of before upgrading. If you're updating from an earlier version than 0.9.2, please see release notes of the relevant intermediate versions for additional notes.

Rolling updates

Query API changes

Please note the following backwards-incompatible query API changes when updating. Some queries may need to be adjusted to continue to behave as expected.

JavaScript query features are now disabled by default for security reasons (#3818). If you use these features, you can re-enable them by setting druid.javascript.enabled=true in your runtime properties. See http://druid.io/docs/0.10.0/development/javascript.html for details, including security considerations.

GroupBy queries no longer allow __time as the output name of a dimension, aggregator, or post-aggregator (#3967).

Select query pagingSpecs now default to fromNext: true behavior when fromNext is not specified (#3986). Behavior is unchanged for Select queries that did have fromNext specified. If you prefer the old default, then you can change this through the druid.query.select.enableFromNextDefault runtime property. See http://druid.io/docs/0.10.0/querying/select-query.html for details.

SegmentMetadata queries no longer include "size" analysis by default (#3773). You can still request "size" analysis by adding "size" to "analysisTypes" at query time.

Deployment and configuration changes

Please note the following deployment-related changes when updating.

Druid now requires Java 8 to run (#3914). If you are currently running on Java 7, we suggest upgrading Java first and then Druid.

Druid now defaults to the "v2" engine for groupBy rather than the legacy "v1" engine. As part of this, memory usage limits have changed from row-based to byte-based limits, so it is possible that some queries which met resource limits before will now exceed them and fail. You can avoid this by tuning the new groupBy engine appropriately. If necessary, you can roll back to the old engine by setting either "druid.query.groupBy.defaultStrategy" in your runtime.properties, or "groupByStrategy" in your query context, to "v1". See http://druid.io/docs/0.10.0/querying/groupbyquery.html for more details on the differences between groupBy v1 and v2.

This new groupBy engine was rewritten from the ground up for better performance and memory management. The query API and results are compatible between the two engines; however, there are some differences from a cluster configuration perspective: (1) groupBy v2 uses primarily off-heap memory instead of on-heap memory; (2) it requires one merge buffer for each concurrent query; (3) if you've configured caching for groupBy, it must be on historicals; and (4) it doesn't support chunkPeriod.

Druid now has a non-zero default value for druid.processing.numMergeBuffers (#3953). This will increase the amount of direct memory used in its default configuration. This change was made in connection with changing the default groupBy engine, since the new "v2" engine requires merge buffers. If you were formerly using groupBy v1, you should be able to offset this by reducing your JVM heap, since groupBy v2 uses off-heap memory rather than on-heap memory.

Druid query-related processes (broker, historical, indexing tasks) now eagerly allocate druid.processing.numThreads number of DirectByteBuffer instances of size druid.processing.buffer.sizeBytes for query processing at startup (#3628). This allocation was lazy in earlier versions of Druid. This change does not affect memory use of a long-running Druid cluster, but it means that memory for processing buffers must be available when Druid starts up.
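
As a sizing sketch, -XX:MaxDirectMemorySize should therefore be able to cover at least:

druid.processing.buffer.sizeBytes * (druid.processing.numThreads + druid.processing.numMergeBuffers + 1)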

When running in non-UTC timezones, behavior of predefined segmentGranularity constants such as "day" will change when upgrading. Formerly, this would generate segments using local days. In Druid 0.10.0, this will generate segments using UTC days. If you were running Druid in the UTC timezone, this change has no effect. Please note that running Druid in a non-UTC timezone is not officially supported (see http://druid.io/docs/0.10.0/configuration/index.html) and we recommend always running all processes in UTC timezones. To create local-day segments using Druid 0.10.0, you can use a local-time segmentGranularity such as:
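
{
  "type": "period",
  "period": "P1D",
  "timeZone": "America/Los_Angeles"
}

(The timeZone shown is an example; substitute your local zone.)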

Extension changes

AggregatorFactory, BufferAggregator, and PostAggregator interfaces have changed (#3894, #3899, #3957, #4071). If you have deployed a custom extension that includes an AggregatorFactory or PostAggregator, it will need to be recompiled. Druid's built-in extensions have all been updated for this change.