Kudu 0.10.0 and Impala Kudu released

Cloudera is happy to announce the availability of parcels and packages for Kudu 0.10.0.

Kudu 0.10.0 delivers a number of new features, bug fixes, and optimizations, detailed below.

Kudu 0.10.0 maintains wire compatibility with previous releases, meaning that applications using the Kudu client libraries can be upgraded either before, at the same time as, or after the Kudu servers. However, if you use new features of Kudu 0.10.0, such as manually range-partitioned tables, you must first upgrade all clients to this release.

This release does not maintain full Java API or ABI compatibility with Kudu 0.9.x due to a package rename and some other small changes. See below for details. We are also releasing a refresh of the Impala Kudu parcel.

Incompatible changes

Gerrit #3737 The Java client has been repackaged under org.apache.kudu instead of org.kududb. Import statements for Kudu classes must be modified to compile against 0.10.0. Wire compatibility is maintained.

Gerrit #3055 The Java client’s synchronous API methods now throw KuduException instead of Exception. Existing code that catches Exception should still compile, but introspection of an exception’s message may be impacted. This change was made to allow thrown exceptions to be queried more easily by calling KuduException.getStatus and then one of Status’s methods. For example, an operation that tries to delete a table that doesn’t exist would yield a Status that returns true when queried with isNotFound().

The Java client’s KuduTable.getTabletsLocations set of methods is now deprecated. Additionally, they now take an exclusive end partition key instead of an inclusive key. Applications should use the scan tokens API instead of these methods.

The C++ API for specifying split points on range-partitioned tables has been improved to make it easier for callers to properly manage the ownership of the provided rows. The TableCreator::split_rows API took a vector<const KuduPartialRow*>, which made it very difficult for the calling application to handle errors with cleanup when setting the fields of the KuduPartialRow. This API has been deprecated and replaced by a new method, TableCreator::add_range_split, which allows easier use of smart pointers for safe memory management.

The Java client’s internal buffering has been reworked. Previously, the number of buffered write operations was constrained on a per-tablet-server basis. Now, the configured maximum buffer size constrains the total number of buffered operations across all tablet servers in the cluster. This provides a more consistent bound on the memory usage of the client, regardless of the size of the cluster to which it is writing. This change can negatively affect the write performance of Java clients that rely on buffered writes. Consider using the setMutationBufferSpace API to increase a session’s maximum buffer size if write performance seems degraded after upgrading to Kudu 0.10.0.

The Remote Bootstrap process used to copy a tablet replica from one host to another has been renamed to Tablet Copy. This resulted in the renaming of several RPC metrics. If you were previously explicitly fetching or monitoring metrics related to Remote Bootstrap, update your scripts to reflect the new names.

The SparkSQL datasource for Kudu no longer supports mode Overwrite. Use the new KuduContext.upsertRows method instead. Additionally, inserts using the datasource are now upserts by default. The older behavior can be restored by setting the operation parameter to insert.

New features

You can now manually manage the partitioning of a range-partitioned table. When a table is created, you can specify a set of range partitions that do not cover the entire available key space. You can add or drop range partitions to existing tables. This is particularly helpful with time-series workloads in which new partitions can be created on an hourly or daily basis. Old partitions can be efficiently dropped if the application does not need to retain historical data past a certain point. This feature is experimental for the 0.10 release. More details can be found in the accompanying blog post.
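To make the time-series pattern concrete, here is a minimal sketch of a table whose range partitions do not cover the whole key space. It is a toy model written only for illustration and does not use the Kudu client API; the class and method names (RangePartitionModel, addRange, dropRange, covers) are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of a range-partitioned table. This is NOT the Kudu client API;
// every name here is hypothetical and exists only to illustrate the idea.
class RangePartitionModel {
    // Maps each partition's inclusive lower bound to its exclusive upper bound.
    private final TreeMap<Long, Long> partitions = new TreeMap<>();

    // Adds the range [lower, upper) and rejects ranges that overlap an existing one.
    void addRange(long lower, long upper) {
        if (lower >= upper) {
            throw new IllegalArgumentException("empty range");
        }
        Map.Entry<Long, Long> before = partitions.floorEntry(lower);
        if (before != null && before.getValue() > lower) {
            throw new IllegalStateException("overlaps an existing partition");
        }
        Long after = partitions.ceilingKey(lower);
        if (after != null && after < upper) {
            throw new IllegalStateException("overlaps an existing partition");
        }
        partitions.put(lower, upper);
    }

    // Drops the partition that starts at lower; returns false if there is none.
    boolean dropRange(long lower) {
        return partitions.remove(lower) != null;
    }

    // True if some partition covers the key. Keys outside every range have no home.
    boolean covers(long key) {
        Map.Entry<Long, Long> entry = partitions.floorEntry(key);
        return entry != null && key < entry.getValue();
    }

    public static void main(String[] args) {
        long day = 86_400L; // seconds per day, used as the partition width
        RangePartitionModel table = new RangePartitionModel();
        table.addRange(0, day);           // partition for day 0
        table.addRange(day, 2 * day);     // partition for day 1
        System.out.println(table.covers(2 * day)); // day 2 has no partition yet
        table.addRange(2 * day, 3 * day); // add the next daily partition
        table.dropRange(0);               // retire the oldest day
        System.out.println(table.covers(100));     // day 0 is no longer covered
    }
}
```

In this model, dropping a partition is a single map removal, which mirrors why dropping an old range can be cheaper than deleting its rows one by one.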

Support for running Kudu clusters with multiple masters has been stabilized. You can start a cluster with three or five masters to provide fault tolerance if one or two masters fail, respectively. Some tools (for example, ksck) still lack complete support for multiple masters. These deficiencies will be addressed in a following release.
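The three-or-five recommendation is consistent with majority-based agreement among masters: a group keeps working as long as more than half of its members are alive. The sketch below shows only that arithmetic, under the assumption that a strict majority is required; the class and method names are hypothetical.

```java
// Arithmetic behind majority-based fault tolerance. Names are hypothetical;
// the only assumption is that a strict majority of masters must be alive.
class MasterQuorum {
    // Smallest number of members that forms a strict majority of n.
    static int majority(int masters) {
        if (masters < 1) {
            throw new IllegalArgumentException("need at least one master");
        }
        return masters / 2 + 1;
    }

    // Number of failures that still leaves a majority alive.
    static int tolerableFailures(int masters) {
        return masters - majority(masters);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 3, 5}) {
            System.out.println(n + " masters tolerate "
                + tolerableFailures(n) + " failure(s)");
        }
    }
}
```

Under the same assumption, an even number of masters adds no tolerance over the next lower odd number (four masters still tolerate only one failure), which is why odd sizes are the natural choice.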

Kudu now supports the ability to reserve a certain amount of free disk space in each of its configured data directories. If a directory’s free disk space drops to less than the configured minimum, Kudu stops writing to that directory until space becomes available. If no space is available in any configured directory, Kudu aborts. Configure this feature using the fs_data_dirs_reserved_bytes and fs_wal_dir_reserved_bytes flags.
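The reservation rule described above can be sketched as a simple comparison per directory. The code below is an illustrative model only, not Kudu's implementation: the class and method names are hypothetical, and the 1 GiB reservation in main is an arbitrary example value rather than a Kudu default.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the reserved-space rule; not Kudu's implementation.
// All names are hypothetical.
class ReservedSpaceModel {
    // A directory stays writable while its free space is at least the reserved minimum.
    static boolean isWritable(long freeBytes, long reservedBytes) {
        return freeBytes >= reservedBytes;
    }

    // Returns the indexes of the directories that may still be written to.
    static List<Integer> writableDirs(long[] freeBytesPerDir, long reservedBytes) {
        List<Integer> writable = new ArrayList<>();
        for (int i = 0; i < freeBytesPerDir.length; i++) {
            if (isWritable(freeBytesPerDir[i], reservedBytes)) {
                writable.add(i);
            }
        }
        return writable;
    }

    // Models the "abort when nothing is writable" behavior with an exception.
    static int pickDir(long[] freeBytesPerDir, long reservedBytes) {
        List<Integer> writable = writableDirs(freeBytesPerDir, reservedBytes);
        if (writable.isEmpty()) {
            throw new IllegalStateException("no directory has enough free space");
        }
        return writable.get(0);
    }

    public static void main(String[] args) {
        long reserved = 1L << 30; // 1 GiB, an arbitrary example value
        long free = new File(".").getUsableSpace(); // free space of the current directory
        System.out.println("current directory writable: " + isWritable(free, reserved));
    }
}
```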

The Spark integration’s KuduContext now supports four new methods for writing to Kudu tables: insertRows, upsertRows, updateRows, and deleteRows. These are now the preferred way to write to Kudu tables from Spark.

Improvements and optimizations

KUDU-1516 The kudu-ksck tool has been improved and now detects more problems, such as when a tablet does not have a majority of replicas on live tablet servers, or when those replicas are not in a good state. Users who currently depend on the tool to detect inconsistencies may now see failures where previously they saw none.

Gerrit #3477 The way operations are buffered in the Java client has been reworked. Previously, the session’s buffer size was set per tablet, meaning that a buffer size of 1,000 for 10 tablets allowed 10,000 operations to be buffered at the same time. With this change, all the tablets share one buffer, so users might need to set a bigger buffer size to reach the same level of performance as before.
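The arithmetic in this change can be checked with a short sketch. It only restates the numbers from the example above (a buffer size of 1,000 and 10 tablets); the class and method names are hypothetical and nothing here calls the Kudu client.

```java
// Capacity arithmetic for the old per-tablet buffer and the new shared buffer.
// Names are hypothetical; this does not call the Kudu client.
class BufferCapacity {
    // Old behavior: each tablet had its own buffer of bufferSize operations.
    static long perTabletCapacity(int bufferSize, int tablets) {
        return (long) bufferSize * tablets;
    }

    // New behavior: all tablets share one buffer of bufferSize operations.
    static long sharedCapacity(int bufferSize) {
        return bufferSize;
    }

    // Shared buffer size that would match the old total capacity.
    static long equivalentSharedSize(int oldBufferSize, int tablets) {
        return perTabletCapacity(oldBufferSize, tablets);
    }

    public static void main(String[] args) {
        int bufferSize = 1_000;
        int tablets = 10;
        System.out.println("old total: " + perTabletCapacity(bufferSize, tablets));
        System.out.println("new total: " + sharedCapacity(bufferSize));
        System.out.println("size to match old total: "
            + equivalentSharedSize(bufferSize, tablets));
    }
}
```

A value computed this way is only a starting point for the session's buffer size (set with setMutationBufferSpace); a larger buffer also raises the client's memory use, so it is worth measuring before settling on it.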

KUDU-1444 Added support for passing back basic per-scan metrics (for example, cache hit rate) from the server to the C++ client. See the KuduScanner::GetResourceMetrics() API for detailed usage. This feature will be supported in the Java client API in a future release.

KUDU-1446 Improved the order in which the tablet server evaluates predicates, so that predicates on smaller columns are evaluated first. This may improve performance on queries that apply predicates on multiple columns of different sizes.
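The idea of evaluating predicates on smaller columns first can be sketched as a sort by column width. This is an illustration in Java, not the tablet server's actual logic; the ColumnPredicate type, its fields, and the byte sizes in main are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration of ordering predicates by column size; not the tablet server's code.
// The ColumnPredicate type and its fields are hypothetical.
class PredicateOrdering {
    static final class ColumnPredicate {
        final String column;
        final int columnSizeBytes;

        ColumnPredicate(String column, int columnSizeBytes) {
            this.column = column;
            this.columnSizeBytes = columnSizeBytes;
        }
    }

    // Returns a copy sorted so that predicates on smaller columns come first.
    // List.sort is stable, so predicates on equally sized columns keep their order.
    static List<ColumnPredicate> evaluationOrder(List<ColumnPredicate> predicates) {
        List<ColumnPredicate> sorted = new ArrayList<>(predicates);
        sorted.sort(Comparator.comparingInt((ColumnPredicate p) -> p.columnSizeBytes));
        return sorted;
    }

    public static void main(String[] args) {
        List<ColumnPredicate> predicates = new ArrayList<>();
        predicates.add(new ColumnPredicate("payload", 64)); // wide column (example size)
        predicates.add(new ColumnPredicate("flag", 1));     // narrow column
        predicates.add(new ColumnPredicate("ts", 8));       // medium column
        for (ColumnPredicate p : evaluationOrder(predicates)) {
            System.out.println(p.column);
        }
    }
}
```

The intuition is that a cheap predicate on a narrow column can rule out rows early, so less of the wide column has to be examined.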

Gerrit #3541 Fixed a problem in the Java client whereby an RPC could be dropped when a connection to a tablet server or master was forcefully closed on the server side while RPCs to that server were being encoded. The RPC was not sent, and users of the synchronous API received a TimeoutException. Several other Java client bugs that could cause similar spurious timeouts were also fixed.

Gerrit #3724 Fixed a problem in the Java client whereby an RPC could be dropped when a socket timeout was fired while that RPC was being sent to a tablet server or master. This manifested in the same way as Gerrit #3541.

KUDU-1538 Fixed a bug in which recycled block identifiers could cause the tablet server to lose data. Block identifiers are no longer reused.

Other noteworthy changes

This is the first release of Apache Kudu as a top-level (non-incubating) project!

The default false positive rate for Bloom filters has been changed from 1% to 0.01%. This increases the space consumption of Bloom filters by a factor of two (from approximately 10 bits per row to approximately 20 bits per row). This is expected to substantially improve the performance of random-write workloads at the cost of an incremental increase in disk space usage.
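The factor of two follows from the standard sizing formula for an optimally configured Bloom filter, which needs about -ln(p) / (ln 2)^2 bits per key for a false positive rate p. The sketch below evaluates that textbook formula for both rates; it assumes an ideal filter and is not taken from Kudu's code, and the class and method names are hypothetical.

```java
// Textbook Bloom filter sizing, assuming an optimally configured filter.
// Not taken from Kudu's code; names are hypothetical.
class BloomFilterSizing {
    // Bits per key needed for false positive rate p: -ln(p) / (ln 2)^2.
    static double bitsPerKey(double falsePositiveRate) {
        if (falsePositiveRate <= 0.0 || falsePositiveRate >= 1.0) {
            throw new IllegalArgumentException("rate must be between 0 and 1");
        }
        double ln2 = Math.log(2.0);
        return -Math.log(falsePositiveRate) / (ln2 * ln2);
    }

    public static void main(String[] args) {
        double oldBits = bitsPerKey(0.01);   // 1% rate: roughly 9.6 bits per key
        double newBits = bitsPerKey(0.0001); // 0.01% rate: roughly 19.2 bits per key
        System.out.println("1%    -> " + oldBits + " bits per key");
        System.out.println("0.01% -> " + newBits + " bits per key");
        System.out.println("ratio -> " + (newBits / oldBits));
    }
}
```

Because 0.0001 is the square of 0.01, its logarithm is exactly twice as large, so the bits per key double.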

The Kudu C++ client library now has Doxygen-based API documentation available online.

Upgrading to 0.10.0

Install the new Kudu packages or parcels, or install Kudu 0.10.0 from source.

Restart all Kudu services.

Rolling upgrades are not supported when upgrading from Kudu 0.9.x to 0.10.0 and are known to cause errors in this release. If you have a problem after an accidental rolling upgrade, shut down all services and then restart all services. The system should come up as expected.

For the duration of the Kudu Beta, instructions are generally provided only for upgrading from the previous latest version to the newly released version.

Downgrading from 0.10.0 to 0.9.x

After upgrading to Kudu 0.10.0, you can downgrade to 0.9.x, with the following exceptions:

Tables created in 0.10.0 are not accessible after a downgrade to 0.9.x

A multi-master setup formatted in 0.10.0 cannot be downgraded to 0.9.x

As always, your feedback is appreciated. For general Kudu questions, visit the community page. If you have any questions related to the Kudu packages provided by Cloudera, including installation or configuration using Cloudera Manager, visit the Cloudera Community Forum.