Apache Hive goes fully native on Windows

The Apache Hive developers have published the latest version of their extension for Apache Hadoop, which gives the MapReduce distributed computing and storage system an SQL-like query language. Among the 350 issues from the Apache Hive issue tracker addressed in the new version, 0.10.0, is a set of fixes that make the Windows port of Hive a native application on Microsoft's operating system. The changes remove a previous dependency on Cygwin. Improved Windows support is not the only enhancement, though.

For example, the SQL-like language's GROUP BY clause now supports CUBE and ROLLUP, and the release also adds list bucketing and optimised skewed joins. Hive can now read and write Apache Avro serialised data. In a change described as evolving Hive security "to a point where it's no longer simply preventing users from shooting their own foot", the metastore now authorises calls to it on the server side rather than the client side. Connections to the metastore layer have also been enhanced with robust retry support.
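As a rough illustration of what CUBE and ROLLUP add to a grouped query, the sketch below models the standard SQL semantics in plain Python: ROLLUP aggregates over every prefix of the grouping columns, while CUBE aggregates over every subset. It is not Hive's implementation, and the column names (region, product, sales) and the sample rows are hypothetical.

```python
from itertools import combinations


def rollup_grouping_sets(columns):
    """Grouping sets for ROLLUP: every prefix of the column list, longest first."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_grouping_sets(columns):
    """Grouping sets for CUBE: every subset of the column list, largest first."""
    sets = []
    for size in range(len(columns), -1, -1):
        sets.extend(combinations(columns, size))
    return sets


def aggregate(rows, columns, grouping_sets, measure):
    """Sum `measure` once per grouping set.

    Columns left out of a grouping set appear as None in the result key,
    much as SQL reports NULL for rolled-up columns. The sketch assumes the
    data itself contains no None values.
    """
    totals = {}
    for keep in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in keep else None for c in columns)
            totals[key] = totals.get(key, 0) + row[measure]
    return totals


# Hypothetical sample data, for illustration only.
rows = [
    {"region": "EU", "product": "A", "sales": 10},
    {"region": "EU", "product": "B", "sales": 5},
    {"region": "US", "product": "A", "sales": 7},
]
columns = ["region", "product"]

rollup = aggregate(rows, columns, rollup_grouping_sets(columns), "sales")
cube = aggregate(rows, columns, cube_grouping_sets(columns), "sales")
```

With this data, the rollup result holds the per-pair totals, one subtotal per region and a grand total. The cube result additionally holds one subtotal per product across all regions, which ROLLUP on (region, product) does not produce.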

Other new features include an optimisation that lets some simple queries which do not need to aggregate data skip the MapReduce steps and so run faster, and optimisations for UNION which, under particular conditions, cut down the number of MapReduce jobs needed to complete a request. The EXPLAIN command now produces more relevant information, and it is possible to show how a table was created and even reinstate a dropped table's structure and metadata, though not its content. Finally, the HWI web interface has been refreshed using Bootstrap and there are more statistics available.

A detailed list of all the changes is available in the release notes. The new release works with Hadoop 0.20.x, 0.23.x.y, 1.x.y and 2.x.y and is available to download from an Apache mirror site.