1.3.1 Release Notes

Bug Fixes

Issue querying HBase tables from Hive
Querying HBase tables from Hive in Dremio would fail in some cases. This is now fixed. Users need to copy hbase-site.xml from their HBase configuration into Dremio's /conf directory.

Improved error messages when working with Hive
Errors when reading data from Hive sources will now include more context.

Issue with data type changes when reading partitioned Hive tables
Changes in the data types for Hive tables would sometimes result in failed queries. Dremio now better handles different schemas across partitions.

1.3.0 Release Notes

Enhancements

Acceleration

Improved reflection profiles
Query profiles now include more detailed information about reflections, such as the names of reflections; which reflections were considered, matched, and chosen; details of the best-cost query plan; and the canonicalized user query.

Improved reflection matching logic when working with multiple tables
Matching performance and reflection coverage have been improved when querying multiple datasets that have multiple reflections defined.

Execution

Improved memory profiling
Dremio now records more details about memory usage. Information on peak memory usage per phase and per node is now available.

Better thread scheduling when some cores are idle
Dremio now better handles scheduling threads when some of the cores are idle. This option is disabled by default in this release. To enable it, set the debug.task.on_idle_load_shed flag and restart all execution nodes.

Performance improvements working with NULLS in Arrow
This update reduces the amount of heap churn when interacting with validity vectors for all data types and provides better performance working with NULL values.

Ability to download Parquet in Dremio UI
Datasets can now be downloaded as Parquet files, which preserves all type information. This option respects the 1,000,000 row system-wide download limit.

Support for byte-order marks (BOM) for text files
BOMs are now recognized when reading text files.
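
To illustrate what BOM recognition means for a text reader, here is a minimal Python sketch (illustration only, not Dremio's implementation): a BOM-aware decoder strips the leading mark so it does not leak into the first field.

```python
# Illustration only (not Dremio code): decoding a UTF-8 text file that
# begins with a byte-order mark. The "utf-8-sig" codec strips the BOM,
# so the first header field is parsed cleanly.
import codecs

data = codecs.BOM_UTF8 + b"name,age\nalice,30\n"

# A plain UTF-8 decode leaves a U+FEFF character on the first field:
assert data.decode("utf-8").startswith("\ufeff")

# A BOM-aware decode removes it:
text = data.decode("utf-8-sig")
assert text == "name,age\nalice,30\n"
```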

Coordination and metadata

Tableau for Mac support
Adds support for Tableau on Mac with the Dremio ODBC Connector. Requires Tableau 10.4 or higher and Dremio Connector 1.3.14 or higher installed on the machine.

Metadata store maintenance utility
The dremio-admin utility now has a clean action that can be used to compact the metadata store, delete orphan objects, delete jobs based on age and reindex the data.

Web Application

Improvements to Job information
Job information will now automatically refresh. New queries will also give detailed information about which Data Reflections were used, and which were not used.

Safari Support (experimental)
Dremio now supports Safari, starting with Safari 11.

REST API for Sources
Dremio now has a public REST API for managing sources.
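
As a sketch of how a client might call such an API (the base URL, endpoint path, and `_dremio<token>` authorization scheme below are assumptions for illustration; check the API documentation for the exact contract):

```python
# Hypothetical client sketch; the endpoint path and auth header format
# are assumptions, not a documented contract.
from urllib import request

BASE_URL = "http://localhost:9047"  # assumed coordinator address

def sources_request(token):
    """Build an authenticated GET request for the (assumed) sources endpoint."""
    req = request.Request(BASE_URL + "/api/v3/source")  # assumed path
    req.add_header("Authorization", "_dremio" + token)  # assumed scheme
    return req

# request.urlopen(sources_request(token)) would return the list of sources.
```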

Bug Fixes

Acceleration

Window queries fail if any reflection is chosen
Fixed an issue with acceleration when using certain window function patterns.

Reflection field list incorrectly shows fields as having mixed type
Fixed various bugs affecting dataset schema information when working with reflections.

Reflections on datasets from RDBMS sources are immediately marked as expired
Fixed an issue where reflections on datasets from RDBMS sources were marked as expired right after creation.

MaterializationTask fails to get the TTL of JDBC queries
Fixed bugs that were preventing reflections on JDBC datasets from being properly refreshed.

Left outer join queries not getting accelerated
Fixed an issue where left outer join queries were not getting accelerated with certain query patterns.

Partial raw materializations are not matched when doing a join that requires only available columns
Updated acceleration logic to leverage raw reflections in a larger set of scenarios.

Substitution fails to flatten the array and gives wrong results
Fixed various bugs when using queries with the FLATTEN function against datasets with reflections.

Handle "in-progress" materialization tasks on startup
If the cluster is restarted while reflection materialization tasks are running, those materializations are now marked as failed. This prevents issues with reflection maintenance after cluster restarts.

Coordination and Metadata

Use of binary collation with SQL Server
Pushdowns with string comparisons in SQL Server now use a binary collation, consistent with Dremio's own collation.

String data from SQL Server is trimmed
String comparisons in SQL Server ignore trailing spaces. For consistent behavior, Dremio now trims trailing spaces from string data fetched from SQL Server so that comparisons with other systems stay consistent.
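
The semantic difference can be sketched in Python (illustration only; SQL Server compares `'abc'` and `'abc '` as equal under its padding rules, while a byte-wise comparison does not):

```python
# Illustration only: a byte-wise comparison (Dremio-style) distinguishes
# trailing spaces, so trimming on ingest restores cross-system consistency.
a = "abc"
b = "abc "        # the same value under SQL Server's comparison rules

assert a != b                 # byte-wise: the values differ
assert a == b.rstrip(" ")     # after trimming trailing spaces: equal
```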

Edit original SQL fails after 2 or more transforms applied on virtual dataset
This should now work as expected.

Error on Exclude when selecting the "1970-01-01 00:00:00.000" date & time
Users can now select times within 100 ms of the Unix epoch.

SPLIT_PART() throws an 'IndexOutOfBoundsException'
The SPLIT_PART() function can now handle multiple parts without throwing an IndexOutOfBoundsException.
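
The fixed behavior can be sketched with a hypothetical Python equivalent (not Dremio's implementation; the out-of-range behavior shown follows the common SQL convention of returning an empty string):

```python
def split_part(s, delimiter, n):
    """Return the n-th (1-based) part of s split on delimiter.

    Out-of-range positions return an empty string instead of raising
    an index-out-of-bounds error.
    """
    parts = s.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""

assert split_part("a,b,c", ",", 2) == "b"  # a middle part
assert split_part("a,b,c", ",", 5) == ""   # past the end: no exception
```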

Issue with different metadata refresh intervals
Although Dremio has two settings for the refresh rate of names vs. dataset definitions, the name-only refresh was not working as expected for some sources, and Dremio would always update the full dataset definitions. The individual settings are now observed for all sources. Moreover, when a source is added, Dremio only needs to find the dataset names before the UI allows the user to continue. The full set of metadata is refreshed in the background.

JDBC date/time issue
In certain scenarios, date/time values returned to JDBC clients could be off by one. This issue is now fixed.

Execution

Proxy settings for S3 are ignored
Attempting to set up an S3 source through a proxy would fail in Dremio. This is now fixed: Dremio correctly propagates all proxy settings to the S3 client.

Avoid repeated object creation in reading/writing column data
The in-memory data structures in Arrow provide a read-only and a write-only view of memory through accessor and mutator interfaces, respectively. In our heap analysis, we noticed that the number of mutator objects was close to 66 million: every request for a mutator or accessor created a new object on the heap. The fix eliminates this repeated allocation.
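
The allocation pattern and the shape of the fix can be sketched abstractly (Python illustration; the class names are hypothetical stand-ins, not Arrow's API):

```python
class Mutator:
    """Write-only view over a vector's memory (illustrative stand-in)."""
    def __init__(self, vector):
        self.vector = vector

class ValueVector:
    def __init__(self):
        self._mutator = None

    def mutator_before_fix(self):
        return Mutator(self)           # a new heap object on every call

    def mutator_after_fix(self):
        if self._mutator is None:      # allocate once, then reuse
            self._mutator = Mutator(self)
        return self._mutator

v = ValueVector()
assert v.mutator_before_fix() is not v.mutator_before_fix()
assert v.mutator_after_fix() is v.mutator_after_fix()
```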

Update default value of max width per node to be average number of cores across all executor nodes
Dremio has an external option, "MAX_WIDTH_PER_NODE", to tune the degree of parallelism used during the execution of a query. The default value of this parameter used to be 70% of the number of cores on a particular node. The default value now considers the number of cores across all executor nodes in the Dremio cluster.
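
The change can be sketched numerically (hypothetical Python; whether the 70% factor is still applied on top of the average is not stated in the note, so the sketch shows only the averaging):

```python
def old_default(cores_this_node):
    # Previous default: 70% of this node's own core count.
    return int(cores_this_node * 0.7)

def new_default(cores_per_executor):
    # New default: based on the average core count across all executors.
    return sum(cores_per_executor) // len(cores_per_executor)

# A heterogeneous cluster: one 32-core and three 8-core executor nodes.
assert old_default(32) == 22              # varied node by node before
assert old_default(8) == 5
assert new_default([32, 8, 8, 8]) == 14   # one cluster-wide value now
```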

Null values in complex data types were not correctly handled by WRITER operator
Dremio's writer operator was not able to handle NULL values in complex/nested types. This is now fixed.

Reduce heap usage in Parquet reader
The Parquet reader's critical path now uses lightweight data structures with lower heap overhead. Similar changes were made to the auxiliary structures used in the hash join and hash aggregation operators.

Fix over-allocation of memory in our columnar data structures
In Dremio, all data is nullable, so an auxiliary structure tracks whether each cell value in a column is NULL. This structure was being over-allocated by 8x. This is now fixed.
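
One plausible reading of the 8x figure, a byte per value where a bit per value suffices, can be sketched (illustrative Python, not the actual allocator code):

```python
def validity_bytes_needed(value_count):
    # One validity bit per value, rounded up to whole bytes.
    return (value_count + 7) // 8

def over_allocated_bytes(value_count):
    # An 8x over-allocation corresponds to one byte per value.
    return value_count

n = 1_000_000
assert validity_bytes_needed(n) == 125_000
assert over_allocated_bytes(n) == 8 * validity_bytes_needed(n)
```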