3.0 Release Notes

What's New

General-availability Features

Execution

High Performance Parallel Exports (CTAS)
High performance parallel exports allow users to create, reorganize, download, and export large (>1 million rows) or small datasets from any source of data into any of the CTAS-supporting data sources within Dremio. When users employ the CTAS statement, Dremio stores the results of the query in one or more Parquet files (depending on the size of the source), over which users have full control of the naming, destination path, and security rules.
Like any dataset created within Dremio, the results of the
high performance parallel exports will be cataloged and made searchable so users can easily
find, share, and collaborate.
See Tables for more information.

CTAS supports all filesystem source types (S3, ADLS, NAS, HDFS, MapR-FS, etc.),
applying filesystem permissions to the written table via impersonation.

CTAS is enabled on a per-source basis during source creation through the Dremio UI.
This enables the source connector to use impersonation for permissions,
allowing WRITE access to the source via SQL-based commands.
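
As a sketch, a CTAS statement issued from a client might look like the following; the source name (`my_s3_source`), path, and table names here are invented for illustration and are not from the release notes.

```python
# Hypothetical sketch of a CTAS statement issued from a client; the source
# name ("my_s3_source"), path, and table names are made up for illustration.
ctas = (
    'CREATE TABLE "my_s3_source".exports.daily_trips AS '
    "SELECT * FROM taxi.trips WHERE pickup_date >= '2018-01-01'"
)
# cursor.execute(ctas)  # run via a client connection to a CTAS-enabled source
print(ctas)
```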

Web Application

Wikis and Tags
The Wiki feature allows you to add rich content (text, images, etc.) for a Space (and its datasets)
or a Source (and its datasets).
The Tag feature allows you to create and assign tags to all datasets.
See Data Curation for more information.

Dataset Catalog tab
The Dataset Catalog tab provides a single location for you to manage all dataset context and metadata.
It allows you to create and manage Wiki content and Tags for all datasets.
See Data Curation for more information.

Security

Apache Ranger Support (Enterprise only)
Dremio now offers Ranger-based authorization for Hive.
This authorization method checks the Ranger policy permissions for the end user logged into Dremio and
then allows/disallows access as defined by the Ranger policy.
See Ranger authorization in Hive for more information.

You enable the Hive authorization client when you add a new Hive Source to Dremio.

ODBC / JDBC Wire Encryption (Enterprise only)
Dremio now supports using TLS/SSL for encrypting communication between ODBC and JDBC clients and the server (coordinator).
You enable TLS/SSL on coordinators via the configuration file.
The clients use connection properties to enable TLS/SSL.
See Using Wire Encryption for more information.

Intra-cluster Wire Encryption (Enterprise only)
Dremio now supports using TLS/SSL for encrypting communication between nodes.
You enable TLS/SSL on all coordinators and executors via the configuration file.
See Using Wire Encryption for more information.

AWS S3 IAM Role-based Access
Dremio now supports IAM role-based access to S3 buckets.
In addition to access key/secret authentication, S3 sources can now use IAM roles
from EC2 instance metadata for access.

Preview-only Features

Preview-only features are disabled by default.

Gandiva (Preview-only)

The Gandiva feature supports efficient evaluation of arbitrary SQL expressions on Arrow buffers
using runtime code generation in LLVM.
It uses LLVM tools to generate and compile code that makes optimal use of underlying CPU architecture.
By combining LLVM with Apache Arrow libraries, Gandiva can perform low-level operations on
Arrow in-memory buffers that are highly optimized for specific runtime environments.
See Gandiva-based Execution for more information.

Gandiva provides:

Improved resource utilization

Faster, lower-cost operations of analytical workloads

[info] Note

LLVM tools are a set of modular compiler tools that deal with code generation.
They are used to compile and execute arbitrary expressions efficiently (instead of interpreting them).
In the Dremio context, this is useful for generating code at runtime for two SQL operators
that deal with arbitrary user expressions such as Project and Filter.
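
As a loose analogy (in Python, not Gandiva's actual API), compiling an arbitrary expression once and then evaluating it over many rows avoids re-interpreting the expression per row; the expression and row shape below are invented for illustration.

```python
# Loose analogy (not Gandiva's API): compile an arbitrary expression once,
# then evaluate it over many rows, instead of re-interpreting it per row.
def make_evaluator(expression: str):
    # Runtime "code generation": parse/compile the expression a single time.
    code = compile(expression, "<expr>", "eval")
    def evaluate(row: dict):
        return eval(code, {"__builtins__": {}}, row)
    return evaluate

# A Project-style operator applying the compiled expression to each row.
project = make_evaluator("price * quantity")
rows = [{"price": 2.0, "quantity": 3}, {"price": 1.5, "quantity": 4}]
totals = [project(r) for r in rows]
print(totals)  # [6.0, 6.0]
```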

Workload Management (Preview and Enterprise only)

The Workload Management feature improves workload management via user-defined job queues.
These queues are associated with different resource constraints and flexible rules
for assigning user jobs to queues.

Workload Management is displayed in the Dremio UI on the Admin console,
where the Queues and Rules sections allow you to manage your queues and rules.

Deprecations

Multi-Role Nodes in Cluster Deployment

Configuring Dremio in C/E mode (coordinator and executor instances on the same node)
is deprecated for cluster deployments.
Multi-roles are only supported in single-node installations.

Hive

Starting with Dremio 3.0, any use of Hive scalar functions is deprecated.

Fixed Issues

Coordinator

Reflection refresh reattempts can cause duplicated or incomplete datasets.
Fixed some situations where a reattempt of a reflection refresh
could cause duplicated or incomplete datasets in rare cases.

Query reattempts can cause duplicated or incomplete datasets.
Fixed some situations where a reattempt of a query
run from the UI could cause duplicated or incomplete datasets in rare cases.

Clicking the 'Edit Original SQL' link gave an error when a requested dataset
was renamed or moved after it was copied.
Fixed. Copying a dataset now starts with a clean history.

Incorrect comparison of CHAR values.
String literals in SQL queries are now treated as VARCHAR, rather than CHAR types.
This allows consistent behavior when dealing with string literals
in case statements and equality filters.
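
The behavioral difference can be sketched in Python. In a simplified model of SQL semantics (an assumption for illustration, not Dremio's implementation), a CHAR(n) value is blank-padded to n characters while a VARCHAR keeps its length, so comparing a padded value against an unpadded literal can fail.

```python
# Sketch of SQL CHAR vs VARCHAR comparison semantics (simplified model):
# a CHAR(n) value is blank-padded to n characters; a VARCHAR is stored as-is.
def as_char(value: str, n: int) -> str:
    return value.ljust(n)          # CHAR(5) stores 'abc' as 'abc  '

def as_varchar(value: str) -> str:
    return value                   # VARCHAR stores 'abc' unchanged

stored = as_char("abc", 5)         # value in a CHAR(5) column
literal = "abc"                    # string literal in the query

print(stored == as_varchar(literal))  # False: trailing pad spaces differ
print(stored.rstrip() == literal)     # True once padding is ignored
```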

Apache Parquet logs are written to a different location and are not configurable.
Resolved by integrating the Parquet logs with the internal logging system.
This results in all logs being centrally located.

When a reflection job is cancelled, the temporary directory created for materialization is not removed.
Resolved the cleanup process when cancelling reflection jobs.

Web Application

Edit Original SQL button sometimes hangs or causes confusion regarding the dataset version.
The logic for the Edit Original SQL button has been fixed.
If users have unsaved changes in their current query when they click this button,
they will be prompted to save or abandon their changes.

The Preview button is obscured by the dataset history tooltip in certain situations.
Fixed the user interface.

The Never Refresh checkbox for reflections doesn't display as checked in the
Community Edition UI.
The Never Refresh checkbox for reflections now works correctly.

Loading a canceled job gave a 'doesn't exist' error which was not descriptive enough.
Fixed by providing a new error message: "Could not load results as the query was canceled".

Previewing JSON files with union types causes a NullPointerException.
Previewing JSON files with a schema that includes a union of different types
(for example: string and integer)
exposed an issue in the underlying Arrow UnionReader,
causing a NullPointerException in Dremio.

Updating virtual dataset definitions via the REST API causes metadata issues.
Updating the SQL of a VDS would not correctly refresh the metadata of the VDS.
For example, the list of fields would not update.

Plugins

Hive queries occasionally fail when all partitions are pruned.
A valid query against a Hive table could cause multiple planner exceptions
and the query to fail if all partitions are pruned during planning.

The Dremio Hive source setting to control zerocopy is not taken into
account to enable/disable ORC zerocopy.
This setting is now applied correctly.

Hive ORC transactional tables that have not been compacted will report an incorrect row count.
In particular, if the table has never been compacted,
it will report 0 records, which results in sub-optimal query plans.

Running a preview query against an Oracle table might run longer than expected.
Fixed by pushing down the LIMIT clause to the Oracle source.

Dremio is unable to connect to an Oracle source if the password contains special characters.
When saving the source, users would previously get an "Invalid Oracle URL specified" error message.

The error message "Self-suppression not permitted" occurs when establishing an HDFS connection fails.
When the HDFS client queues requests and a connection fails to be established,
all of the requests receive the same exception instance: "Self-suppression not permitted".
This issue has been fixed.

Miscellaneous

When running COUNT on a SQL Server data set that is larger than 2,147,483,647 rows, the source will return
an arithmetic overflow error.
Resolved by pushing down the COUNT() aggregate function as COUNT_BIG() in SQL Server.
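
The overflow boundary can be sketched in Python: SQL Server's COUNT() returns a 32-bit signed INT, which tops out at 2,147,483,647, while COUNT_BIG() returns a 64-bit BIGINT.

```python
import ctypes

INT_MAX = 2_147_483_647            # SQL Server INT upper bound (32-bit signed)

def to_int32(n: int) -> int:
    # Wrap a Python integer the way a 32-bit signed counter would.
    return ctypes.c_int32(n).value

print(to_int32(INT_MAX))       # 2147483647: still fits
print(to_int32(INT_MAX + 1))   # -2147483648: overflow, hence the error
```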

When retrieving a folder ID with REST API catalog/by-path, the returned ID sometimes
utilized quotes incorrectly.
Resolved by normalizing the ID during validation.

With SQL Server, if one of the source tables is very small and a query with a join is performed, the reflection does not substitute properly.
Fixed by improving the multi-join normalization operation.

If Dremio runs out of memory, an exception occurs in the PartitionedCollector.
The buffers already allocated are not released, which causes a memory leak.
Resolved by implementing auto rollback and closing all buffers after running out of memory.

Upgrading fails if a dataset name contains the forward slash (/).
This occurred because Dremio used the forward slash to delineate between the dataset version string and the path.
Fixed the upgrade so that forward slashes in dataset names are no longer an issue.

For a non-partitioned Hive table, an incorrect split key is generated.
Resolved by showing the correct information in the exception message when an error occurs.

When MySQL returns large amounts of data in response to a query, the connection will time out.
This is because the 'net_wait_timeout' property defaults to 30 seconds, unless set by the JDBC connection.
Resolved by adding the ability to set net write timeouts on the MySQL JDBC connection.

In some circumstances, changing the setting for DREMIO_MAX_MEMORY_SIZE_MB causes Dremio to fail to start.
Resolved the issue.

Known Issues

If you submit a query using the JDBC driver and cancel it from the UI,
the query will appear to have successfully completed with no warning or exception.

When starting Dremio (after upgrading to 3.0), most reflections enqueue a refresh job
(typically, this happens only once).
This refresh occurs even if the reflection's refresh interval isn't due or the reflection has "never refresh" set.
Thereafter, if refresh intervals were set, all reflections resume their usual refresh cycle.

The datetimeoffset data type in SQL Server incorrectly gets the COLLATE clause applied to it.
Set dremio.jdbc.mssql.push-collation.disable to true to use this field.

3.0.1 Release Notes

Enhancements

Improved Hive Transactional Table Performance
Dremio uses a vectorized reader for splits in Hive-partitioned ORC transactional tables which have no deltas.
Splits that have deltas will continue to use the non-vectorized reader.

Encrypted Postgres Connections
Dremio supports encrypted connections to Postgres using SSL.
To enable SSL, check the "Encrypt connection" box when creating the source.
For further configuration, navigate to the Advanced Options tab.

See the Dealing with Mixed Types section in Datasets for more information.

Fixed Issues

Streaming aggregation is not grouping properly within the Window Aggregate query.
This occurs when data is sorted on multiple columns and one of the sort columns is dropped in the plan.
This results in dependent columns retaining the original sort order.
This issue was resolved by removing the sort order on the dependent columns.

When running a query against an ADLS source,
the "DATA_READ ERROR: Error reading data from response stream in positioned read() for file" error occurs.
This issue is resolved by upgrading the Azure Data Lake Store SDK to version 2.3.2.

Unable to use complex functions on columns of union type with a complex subtype.
Dremio now supports ASSERT_STRUCT and ASSERT_LIST functions to handle complex subtypes in a column of union type.
For example: flatten(assert_list(union_column)).

Under certain circumstances when the number of keys in Unpivots is more than 32,
an IndexOutOfBoundsException failure may occur.
Fixed by accessing bit buffers directly and using getNullByteOffset and getNullBitOffset
as the offsets to access validities and values.

When viewing a folder in a file system source that has more than 1024 files,
the "maxClauseCount is set to 1024" error is displayed.
This error occurs because Dremio has an internal limit for retrieving tags associated with files in a folder.
Resolved by retrieving tags for 200 files maximum and displaying a notification,
"Tags are only shown inline for the first 200 items.", when that maximum is reached.
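
A minimal sketch of the described cap in Python (the function name and return shape are assumptions for illustration, not Dremio's code):

```python
TAG_FETCH_LIMIT = 200  # assumed cap, mirroring the fix described above

def tags_to_fetch(files: list) -> tuple:
    """Return the files whose tags will be fetched inline, plus whether a
    'first 200 items only' notification should be shown. Illustrative only."""
    truncated = len(files) > TAG_FETCH_LIMIT
    return files[:TAG_FETCH_LIMIT], truncated

shown, notify = tags_to_fetch([f"file_{i}" for i in range(1500)])
print(len(shown), notify)  # 200 True
```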

In the Dremio UI Wiki, tables do not display properly.
Fixed the issue so that the Markdown tables display properly.

If the Dremio app is killed within the first minute of startup,
an "Unknown source INFORMATION_SCHEMA" error occurs.
This error happens when the internal index is out of sync with the internal store.
Dremio now partially reindexes the uncommitted updates.

Unable to save the Dremio UI Wiki for source file system folders.
The Dremio UI Wiki is unavailable for file system folders.

When reading Hive tables in ORC format, heap memory runs out for small tables.
Resolved by setting default options on Hive through the hive-site.xml file.

After adding a JSON file to MongoDB, a SCHEMA LEARNING error occurs when the query is run.
Fixed an issue so that the reported schema of MongoDB is consistent with the record reader schema
when there are complex references.

Joins following an aggregation using a min/max of a string column might cause query failures.
The query failure is usually manifested as an array out of bounds exception.
In rare cases, the failure might cause the executor node itself to fail.

IOException error occurs when starting up the Dremio web server with SSL.
When the web server is configured to use custom certificates, the truststore was previously optional.
This behavior regressed in 3.0.0, where the truststore became required. The truststore is optional again in 3.0.1.

3.0.5 Release Notes

Enhancements

Data Sources

Execution

Improved query performance when using LIMIT on large queries.
Dremio now more efficiently cancels the rest of query execution once the needed number of rows
has been returned from a query based on the LIMIT clause.
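
The idea can be sketched with a lazy scan in Python: stop pulling rows as soon as the LIMIT is satisfied instead of draining the whole scan (a simplification of the execution change, not Dremio's code).

```python
from itertools import islice

def scan_rows():
    # Stand-in for a scan over a large table; yields rows lazily.
    for i in range(10_000_000):
        yield i

# LIMIT-style early termination: stop consuming the scan as soon as the
# requested number of rows has been produced.
limit = 5
rows = list(islice(scan_rows(), limit))
print(rows)  # [0, 1, 2, 3, 4]
```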

Improved CPU resource utilization and balancing on execution nodes.
Execution logic has been improved to better handle longer running queries when the system load is not high.

Fixed Issues

Dremio doesn't work properly when one or more Parquet files in a directory have zero (0) records.
Dremio now works correctly when some files have zero (0) records and some files do have records.

When running a CTAS command against a filesystem configured to use impersonation, the files created
by Dremio executors are owned by the same user as the Dremio process, and not by the user who ran the query.
This issue is resolved by ensuring that during directory creation time, table and directory ownership are correct.

By default, the PostgreSQL JDBC driver caches the entire query results into memory.
This means that when doing a table scan on large tables, it is easy to run out of memory.
Resolved this issue by setting auto-commit to off when creating a connection to PostgreSQL
so that the JDBC driver properly limits the amount of memory being used by the fetch size.

For Hive, ACID tables cannot be read when hive.exec.orc.zerocopy is enabled.
Resolved this issue by fixing an improper byte starting position in Hive
when a slice covers two (2) or more zerocopy buffers.
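
This class of bug can be sketched in Python: when a logical byte range spans two or more buffers, the starting position inside each buffer must be computed relative to that buffer (names and layout here are illustrative, not Hive's code).

```python
def read_range(buffers, start, length):
    """Read `length` bytes starting at logical offset `start` from a list of
    buffers, computing the start position within each buffer correctly."""
    out = bytearray()
    offset = 0  # logical offset of the current buffer's first byte
    for buf in buffers:
        if len(out) == length:
            break
        if start < offset + len(buf):
            begin = max(start - offset, 0)   # position inside this buffer
            out += buf[begin:begin + (length - len(out))]
        offset += len(buf)
    return bytes(out)

buffers = [b"hello", b"world"]
print(read_range(buffers, 3, 4))  # b'lowo' -- the slice spans both buffers
```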

Excessive heap memory usage when working with large, complex queries.
In certain cases with large/complex queries, the query plan instructions coordinators
send to the executor nodes could result in excessive heap memory usage.
This mechanism has been improved to be more heap efficient.

AssertionError: Relational expression rel# error when joining a VALUES table (on the left side) and a
JDBC table.
When running a JOIN between a VALUES table (on the left side) and a
JDBC table, planning will fail. This issue is now fixed.

Excessive planning time when working with Hive tables when the query has many threads.
This would happen in a decently sized cluster (many cores) when working against Hive tables that have many partitions.
Resolved with improved memory management logic across threads.

Query scans extra columns when there are window functions.
Planning logic has been updated to only scan the relevant columns.

Direct memory wouldn't get cleaned up after completing a query on Hive ORC tables.
Memory allocation and clean-up logic has been updated to correctly handle this scenario.

When a window is resized, at some point double vertical scrollbars appear on the Users page.
At a certain height of the window, the scrollbars might cause page flickering.
Fixed the issue.

3.0.6 Release Notes

Enhancements

ARP Performance

Gathering schema and table information from relational sources takes too long.
Fetching schema and table information is now more efficient and takes less time
when adding new relational sources.

Fixed Issues

When viewing a folder in a non-file-system source that has more than 1024 files,
the "maxClauseCount is set to 1024" error is displayed.
This has been resolved for all affected sources.

Pushdowns into Oracle with identifier names longer than 30 characters would fail.
Queries would fail with the error: "The JDBC storage plugin failed while trying setup the SQL query".
Dremio now rewrites aliases longer than 30 characters
for Oracle to avoid errors when pushing queries to Oracle.
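
A sketch of such alias rewriting in Python (the truncation-plus-counter scheme is an assumption for illustration, not Dremio's actual algorithm):

```python
ORACLE_MAX_IDENT = 30  # identifier length limit in older Oracle versions

def shorten_alias(alias: str, used: set) -> str:
    """Rewrite an alias to fit Oracle's 30-character identifier limit,
    appending a counter to keep shortened aliases unique. Illustrative only."""
    if len(alias) <= ORACLE_MAX_IDENT:
        used.add(alias)
        return alias
    n = 0
    while True:
        suffix = f"_{n}"
        candidate = alias[:ORACLE_MAX_IDENT - len(suffix)] + suffix
        if candidate not in used:
            used.add(candidate)
            return candidate
        n += 1

used = set()
long_name = "customer_lifetime_value_projection_2018"
print(shorten_alias(long_name, used))  # 'customer_lifetime_value_proj_0'
```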

Format previews did not work when a directory has 'hidden' files
(files starting with an underscore in the file name).
Resolved by ignoring files starting with a period or an underscore when performing format previews.
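
The skip rule can be sketched as follows (a simplified illustration of the described behavior; file names are made up):

```python
def visible_files(names):
    # Skip 'hidden' files -- names starting with a period or an underscore --
    # when scanning a directory for a format preview (assumed convention).
    return [n for n in names if not n.startswith((".", "_"))]

files = ["_SUCCESS", ".DS_Store", "part-00000.parquet", "data.csv"]
print(visible_files(files))  # ['part-00000.parquet', 'data.csv']
```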