1.4.1 Release Notes

Enhancements

Bug Fixes

Fix use hbase command in HBase sources
Previously, Dremio would switch the query context to the HBase source's internal hbase system namespace. This behavior is now disabled. In the unlikely event that users need to access the system namespace, it is available through the fully qualified dataset name (e.g. MyHBase.hbase.table).
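For illustration, querying the system namespace through a fully qualified name might look like this (the source and table names are hypothetical):

```sql
-- MyHBase is the Dremio source name; hbase is the internal system namespace.
SELECT *
FROM MyHBase.hbase.mytable
```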

Parallelization fixes for preview queries
Preview queries now have expanded samples and better parallelization.

1.4.0 Release Notes

Enhancements

Source Adapters

Support custom paths as root of the FileSystem in MapRFS/HDFS sources
In earlier versions, Dremio only supported using the HDFS/MapRFS root directory as the source starting point. Users can now specify a custom path in MapRFS/HDFS as the root of the source, and will not be able to access any files outside the custom path given as root.

Support pushing down NOT LIKE to Elasticsearch
Dremio can now push down the NOT LIKE operator to Elasticsearch. This is also supported when the Boolean expression is part of a larger complex expression.
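As an illustration, a query shape that can now be pushed down might look like this (the source, index, and column names are hypothetical):

```sql
-- The NOT LIKE predicate, here part of a larger Boolean expression,
-- can now be pushed down to Elasticsearch.
SELECT *
FROM elastic.logs
WHERE message NOT LIKE '%heartbeat%' AND level = 'ERROR'
```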

Partial predicate pushdown to Elasticsearch
With this improvement, the parts of a predicate conjunction that can be pushed down to Elasticsearch are pushed down, with the remaining components handled in Dremio's execution engine.

Coordination and metadata

High Availability support using ZooKeeper
Dremio now supports multiple master nodes for HA purposes using ZooKeeper. This improvement simplifies HA configuration and significantly reduces overall failover times. Dremio now determines whether a node is a master node based on a specific configuration flag, rather than by checking whether its hostname matches the configured master host as in previous releases.

Support for group based column and row level permissions (EE only)
Dremio now supports the is_member("groupname") function, which checks the query user's group membership. The function returns true if the user belongs to the "groupname" group. It can be used in filter conditions in a VDS to define permissions on rows and columns.
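As a sketch, a VDS defined with a query like the following (the table, column, and group names are hypothetical) would hide the salary column and executive rows from users outside the finance group:

```sql
-- is_member returns true if the querying user belongs to the named group.
SELECT employee_id,
       CASE WHEN is_member('finance') THEN salary ELSE NULL END AS salary
FROM hr.employees
WHERE is_member('finance') OR department <> 'executive'
```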

Use LDAP sAMAccountName to login (EE only)
In LDAP configuration, Dremio previously supported Distinguished Name templates for usernames. Administrators can now choose a different attribute (e.g. sAMAccountName) in a DIT entry as the username for login. For details, see the LDAP section in the documentation.

Metadata indexing and caching improvements
Dremio now executes metadata queries coming from clients much faster, even when filters are present. This is achieved through improved caching and indexing of all source metadata.

Execution

Enhance Arrow Value Vectors for better performance, less heap overhead
Dremio's in-memory query execution engine is based entirely on the columnar data structures and formats provided by Arrow. During testing, we saw several opportunities for improvement in performance, memory usage, and code maintainability. We improved the Java implementation of Arrow, with better performance and reduced heap memory usage as the main focus areas.

Bug Fixes

Acceleration

Aggregation reflection with join might fail to substitute
In some cases, Dremio would not accelerate join queries that should have been accelerated by aggregation reflections. This is now fixed.

Coordination and Metadata

Rank using aggregate is not working
Dremio now supports the RANK function on aggregates such as COUNT and SUM.
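For example, RANK can now be applied over an aggregate in a window expression (the table and column names are hypothetical):

```sql
-- Rank departments by total sales.
SELECT department,
       SUM(amount) AS total_sales,
       RANK() OVER (ORDER BY SUM(amount) DESC) AS sales_rank
FROM sales
GROUP BY department
```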

Execution

Low thread usage and high wait time when we have a large number of blocking tasks
Dremio is now better at scheduling system resources when there is a mix of CPU-bound and I/O-bound tasks in the system.

Very uneven partitions in a HashJoin
Planning for a join partitioned the dataset into too few discrete partitions, causing some executor nodes to have noticeably more work than others and hence take noticeably longer to serve a query. The work is now spread more evenly among the executor nodes.

IO wait times when reading from file system sources are incorrectly reported as 0
Dremio now reports IO wait times for reads from file system sources in the query profile, to help debug latencies in query run time.

Query reattempt due to Out of Memory failure happens before fragment cleanup
Dremio has reattempt logic that determines whether the reason for a query failure is recoverable. For Out of Memory related failures, a reattempt is started. However, resource cleanup from the previous failed attempt did not complete, so the next attempt failed during the setup phase, before execution even began. Dremio now waits for all query fragments to be terminated/retired properly before issuing the next attempt.

CONCAT function fails with length > 256 characters
Some queries using the CONCAT(arg1, arg2, …) SQL function were failing when the concatenated result was longer than 256 characters. The problem was incorrect use of internal memory buffers to store the intermediate results of concatenation. This is now fixed.

Web Application and APIs

Hide some Reflection UI from Jobs from non-admins
In the Job details view, clicking a Reflection opens its definition for viewing and modification. That link was shown to regular users; it is now shown only to administrators.

Explore grid values hidden by scrollbar
When viewing a dataset in the UI, the scrollbars of the grid could sometimes draw over the data. This has been fixed so that scrollbars no longer cover any data.

Source Adapters

Elasticsearch adapter uses wrong hashCode to check for changes
The incorrect hashCode would cause Dremio to think that the Elasticsearch index mapping had changed even when it hadn't, causing some Elasticsearch queries to fail.

Dremio fails to read a Hive table containing partitions with different schemas
A Hive table can contain partitions with different schemas; this happens when the table schema is altered after one or more partitions have already been created. Partitions created after the schema change contain data in the new schema format. When reading partitions with the old schema, Dremio now converts them to the new table schema using Hive-provided SerDe utilities. Partitions with the new schema require no conversion. This behavior is similar to Hive's.