Talos Vulnerability Report

TALOS-2016-0177

HDF5 Group libhdf5 H5Z_NBIT Code Execution Vulnerability

November 17, 2016

CVE Number

CVE-2016-4331

Description

HDF5 is a file format that is maintained by a non-profit organization, The HDF Group. HDF5 is designed for the storage and organization of large amounts of scientific data, and is used to exchange data structures between applications in industries such as GIS, via libraries such as GDAL and OGR, or as part of software like ArcGIS.

The vulnerability exists when the library is decoding data out of a dataset encoded with the H5Z_NBIT filter. When calculating the precision that a BCD number is encoded with, the library will fail to ensure that the precision is within the bounds of the element size. Due to this, the library will calculate an index outside the bounds of the space allocated for the BCD number. Whilst decoding this data, the library will then write outside the bounds of the buffer, leading to a heap-based buffer overflow. This can lead to code execution under the context of the application using the library.

Tested Versions

hdf5-1.8.16.tar.bz2

tools/h5ls: Version 1.8.16

tools/h5stat: Version 1.8.16

tools/h5dump: Version 1.8.16

Product URLs

http://www.hdfgroup.org/HDF5/

http://www.hdfgroup.org/HDF5/release/obtainsrc.html

http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.16.tar.bz2

CVSSv3 Score

8.6 - CVSS:3.0/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H

Details

The HDF file format is intended to be a general file format that is self-describing for various types of data structures used in the scientific community [1]. These data structures are stored in two types of objects, Datasets and Groups. Paralleling the file format to a file system, a Dataset can be interpreted as a file, and a Group can be interpreted as a directory that is able to contain other Datasets or Groups. Associated with each entry is metadata containing user-defined named attributes that can be used to describe the dataset.

When reading a dataset out of the file, an application will call the library function H5Dread. After allocating space for the buffer, the library will call the internal function H5D__read, which will eventually call into H5D__chunk_lock. This function is responsible for reading the contents of the dataset chunk into a cache that the application will later be able to access.

Once the chunk has been read, the library will call into its filter pipeline to determine how to decode the data. This happens by calling H5Z_pipeline. Inside H5Z_pipeline, the library will determine which filter to apply and then call the "filter" method from a data structure that contains the methods or handlers that deal with the specific encoding type.

When handling data encoded with the nbit encoding type, the library will call H5Z_filter_nbit. This function will take inputs from the file and use them to calculate the amount of space required to decode the encoded data. This is done by taking the number of elements and multiplying it by the size of each element. With the provided proof of concept, the element size is 4 and the number of elements is 12. This results in a buffer size of 48 bytes.

When entering the H5Z_NBIT_ATOMIC case, the library copies input from the file into a structure that gets passed to H5Z_nbit_decompress_one_atomic. This loop will iterate once for each element stored in the dataset. The field that is later used to write outside the buffer allocated earlier determines the precision of a binary-coded-decimal number.

Once inside H5Z_nbit_decompress_one_atomic, the library will use the value of p.precision to calculate an index into the buffer that was allocated. Due to a lack of bounds checking, this index will allow a loop that is executed later to write outside the bounds of the buffer. If the precision is larger than datatype_len, then the index calculation can be made to overflow.