Talos Vulnerability Report

TALOS-2016-0176

HDF5 Group libhdf5 H5T_ARRAY Code Execution Vulnerability

November 17, 2016

CVE Number

CVE-2016-4330

Description

HDF5 is a file format maintained by a non-profit organization, The HDF Group. It is designed for storing and organizing large amounts of scientific data, and is used to exchange data structures between applications in industries such as GIS via libraries such as GDAL and OGR, or as part of software like ArcGIS. The vulnerability exists because the library fails to check whether the number of dimensions for an array read from the file is within the bounds of the space allocated for it. When elements are read from the file into this array, a heap-based buffer overflow occurs, potentially leading to arbitrary code execution.

Tested Versions

hdf5-1.8.16.tar.bz2
tools/h5ls: Version 1.8.16
tools/h5stat: Version 1.8.16
tools/h5dump: Version 1.8.16

Product Urls

CVSSv3 Score

8.6 -- CVSS:3.0/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H

Details

The HDF file format is intended to be a general, self-describing file format for the various types of data structures used in the scientific community [1]. These data structures are stored in two types of objects, Datasets and Groups. Comparing the file format to a filesystem, a Dataset can be interpreted as a file, and a Group can be interpreted as a directory that can contain other Datasets or Groups. Associated with each entry is metadata containing user-defined named attributes that can be used to describe the dataset.

Within the HDF file format, paths are specified in the '/'-separated POSIX format. When reading a dataset, the library opens the object using H5D__open_oid. Inside this function, the library reads the type and its location. Once both are read, the library passes the H5O_DTYPE_ID value on to H5O_msg_read.

Inside H5O_msg_read_oh, the application uses the type_id argument to determine which message type is being used for a message. This message type determines which callback handles the message. This process occurs within the macro H5O_LOAD_NATIVE at H5Omessage.c:545.

Inside the H5O_LOAD_NATIVE macro, the application selects a structure containing function pointers from the msg->type field. This structure contains the functions that are used to decode the message. When decoding a message of type H5O_DTYPE_ID, the library dispatches into the H5O_dtype_shared_decode function, which eventually calls H5O_dtype_decode. Inside H5O_dtype_decode, the library first allocates space by calling H5T__alloc. Execution then continues into H5O_dtype_decode_helper, which is responsible for decoding the datatypes.
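The dispatch pattern described above can be illustrated with a minimal sketch. This is not the library's code: the type IDs, the msg_class structure, the toy decoder, and dispatch_decode are all hypothetical names chosen for illustration. Only the general idea, selecting a table of callbacks by message type and calling its decode entry, comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message type IDs (illustrative values, not the library's). */
enum { MSG_DATASPACE_ID = 1, MSG_DTYPE_ID = 3, MSG_TYPE_COUNT = 4 };

/* Hypothetical per-type class: a named set of callbacks. */
typedef struct msg_class {
    const char *name;
    int (*decode)(const uint8_t *raw, size_t len, uint32_t *out);
} msg_class;

/* Toy decoder: returns the low nibble of the first byte. */
static int decode_dtype(const uint8_t *raw, size_t len, uint32_t *out)
{
    if (len < 1)
        return -1;
    *out = raw[0] & 0x0f;
    return 0;
}

static const msg_class dtype_class = { "datatype", decode_dtype };

/* Table indexed by type ID; entries without a handler stay NULL. */
static const msg_class *const msg_classes[MSG_TYPE_COUNT] = {
    [MSG_DTYPE_ID] = &dtype_class,
};

/* Look up the class for type_id and call its decode callback. */
int dispatch_decode(unsigned type_id, const uint8_t *raw, size_t len,
                    uint32_t *out)
{
    assert(out != NULL); /* caller error, not file-controlled data */
    if (type_id >= MSG_TYPE_COUNT || msg_classes[type_id] == NULL)
        return -1;
    return msg_classes[type_id]->decode(raw, len, out);
}
```

In this sketch an unknown or unregistered type ID is rejected before any callback runs, so the decoder only ever sees messages of the type it was written for.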

After allocating space for the H5T_array_t, the library returns to H5O_dtype_decode, which then executes the function H5O_dtype_decode_helper. When entering the case H5T_ARRAY, the library reads the number of dimensions from the file and checks that it is valid via an assertion. Because assertions are only enabled when the application is compiled in debug mode, the preprocessor removes this check from release builds. Immediately afterwards, the library enters a loop that reads DWORDs from the file into the H5T_array_t.dim field. If the value of u.array.ndims is larger than 32, this loop writes data outside the bounds of the H5T_array_t that was allocated earlier. This leads to heap corruption and can lead to code execution in the context of the application using the library.
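The difference between an assertion and a runtime check can be sketched as follows. This is a simplified illustration rather than the library's code or its actual fix: the array_info structure, the one-byte dimension count followed by little-endian 32-bit sizes, and the names decode_array_dims and read_u32le are assumptions made for the example. The limit of 32 is the one given above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANK 32 /* the 32-dimension limit described above */

/* Hypothetical stand-in for the array datatype record. */
typedef struct array_info {
    unsigned ndims;
    uint32_t dim[MAX_RANK];
} array_info;

/* Read one little-endian 32-bit value and advance the cursor. */
static uint32_t read_u32le(const uint8_t **pp)
{
    const uint8_t *p = *pp;
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    *pp = p + 4;
    return v;
}

/* Decode a dimension count and that many sizes from buf.
 * Returns 0 on success, -1 if the input is malformed. */
int decode_array_dims(const uint8_t *buf, size_t len, array_info *out)
{
    /* assert() is fine for caller mistakes; it vanishes under NDEBUG. */
    assert(buf != NULL && out != NULL);

    if (len < 1)
        return -1;
    unsigned ndims = buf[0];

    /* File-controlled value: validate with a real branch that is
     * still present in release builds. */
    if (ndims > MAX_RANK)
        return -1;
    /* Make sure the input actually holds ndims 32-bit values. */
    if (len - 1 < (size_t)ndims * 4)
        return -1;

    const uint8_t *p = buf + 1;
    out->ndims = ndims;
    for (unsigned i = 0; i < ndims; i++)
        out->dim[i] = read_u32le(&p);
    return 0;
}
```

With this structure, a count of 33 is rejected before the loop runs, whereas a check expressed only as an assertion would disappear in a build compiled with NDEBUG and the loop would write past the end of dim.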