Detailed Description

Extracts Histogram of Oriented Gradients features from the input grayscale image.

The Histogram of Oriented Gradients (HOG) vision function is split into two nodes, vxHOGCellsNode and vxHOGFeaturesNode. The specification of these nodes covers a subset of possible HOG implementations. The vxHOGCellsNode calculates the gradient orientation histograms and average gradient magnitudes for each of the cells. The vxHOGFeaturesNode uses the cell histograms and, optionally, the average gradient magnitude of the cells to produce a HOG feature vector. This involves grouping the cell histograms into blocks, which are then normalized. A moving window is applied to the input image, and for each location the block data associated with the window is concatenated to the HOG feature vector.

Function Documentation

[Graph] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.

Firstly, the gradient magnitude and gradient orientation are computed for each pixel in the input image. Two 1-D centred, point discrete derivative masks are applied to the input image in the horizontal and vertical directions.

\[ M_h = [-1, 0, 1] \]

and

\[ M_v = [-1, 0, 1]^T \]

\(G_v\) is the result of applying mask \(M_v\) to the input image, and \(G_h\) is the result of applying mask \(M_h\) to the input image. The border mode used for the gradient calculation is implementation dependent. Its behavior should be similar to VX_BORDER_UNDEFINED. The gradient magnitudes and gradient orientations for each pixel are then calculated in the following manner.

\[ G(x,y) = \sqrt{G_v(x,y)^2 + G_h(x,y)^2} \]

\[ \theta(x,y) = \arctan(G_v(x,y), G_h(x,y)) \]

where

\[ \arctan(v, h) = \begin{cases} \tan^{-1}(v/h) & \text{if } h \neq 0 \\ -\pi/2 & \text{if } v < 0 \text{ and } h = 0 \\ \pi/2 & \text{if } v > 0 \text{ and } h = 0 \\ 0 & \text{if } v = 0 \text{ and } h = 0 \end{cases} \]
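
The per-pixel gradient step can be illustrated with a minimal Python sketch. It is not part of the OpenVX API: the function names `arctan_vh` and `pixel_gradients` are hypothetical, the masks are applied as a correlation (right minus left, bottom minus top), and border pixels are simply left at zero as a stand-in for the implementation-dependent border behavior.

```python
import math

def arctan_vh(v, h):
    """Two-argument arctan as defined in the text (this is not math.atan2)."""
    if h != 0:
        return math.atan(v / h)
    if v < 0:
        return -math.pi / 2
    if v > 0:
        return math.pi / 2
    return 0.0

def pixel_gradients(image):
    """Return (magnitude, orientation) grids for a 2-D list of gray values.

    Assumption: border pixels keep the value 0.0, since the border
    behavior is implementation dependent.
    """
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    orientation = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            g_h = image[y][x + 1] - image[y][x - 1]  # mask M_h = [-1, 0, 1]
            g_v = image[y + 1][x] - image[y - 1][x]  # mask M_v = [-1, 0, 1]^T
            magnitude[y][x] = math.hypot(g_v, g_h)
            orientation[y][x] = arctan_vh(g_v, g_h)
    return magnitude, orientation
```

With this definition the orientation is returned in radians; a real implementation may map it to a different range before binning.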

Secondly, the gradient magnitudes and orientations are used to compute the bins output tensor and optional magnitudes output tensor. These tensors are computed on a cell level where the cells are rectangular in shape. The magnitudes tensor contains the average gradient magnitude for each cell.

\[ magnitudes(c) = \frac{1}{cell\_width \cdot cell\_height} \sum_{w=0}^{cell\_width - 1} \sum_{h=0}^{cell\_height - 1} G_c(w,h) \]

where \(G_c\) denotes the gradient magnitudes of the pixels in cell \(c\). The bins tensor contains histograms of gradient orientations for each cell. The gradient orientations at each pixel range from 0 to 360 degrees. These are quantised into a set of histogram bins based on the num_bins parameter. Each pixel votes for a specific cell histogram bin based on its gradient orientation. The vote itself is the pixel's gradient magnitude.
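
The cell-level step can be sketched as follows. This is an illustration under stated assumptions, not the normative behavior: the function name `cell_histograms` is hypothetical, orientations are assumed to be already expressed in degrees in [0, 360), each pixel votes into exactly one bin (no interpolation between neighboring bins), and pixels outside the last complete cell are ignored.

```python
def cell_histograms(magnitude, orientation_deg, cell_width, cell_height, num_bins):
    """Return (bins, magnitudes) computed per cell.

    bins[cy][cx] is a list of num_bins accumulated votes;
    magnitudes[cy][cx] is the average gradient magnitude of the cell.
    """
    height, width = len(magnitude), len(magnitude[0])
    cells_y, cells_x = height // cell_height, width // cell_width
    bin_width = 360.0 / num_bins
    cell_area = cell_width * cell_height
    bins = [[[0.0] * num_bins for _ in range(cells_x)] for _ in range(cells_y)]
    magnitudes = [[0.0] * cells_x for _ in range(cells_y)]
    for y in range(cells_y * cell_height):
        for x in range(cells_x * cell_width):
            cy, cx = y // cell_height, x // cell_width
            # Hard assignment: the bin index is chosen by the orientation alone.
            b = int(orientation_deg[y][x] // bin_width) % num_bins
            # The vote is the pixel's gradient magnitude.
            bins[cy][cx][b] += magnitude[y][x]
            magnitudes[cy][cx] += magnitude[y][x] / cell_area
    return bins, magnitudes
```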

[Graph] The node produces HOG features for the W1xW2 window in a sliding window fashion over the whole input image. Each position produces a HOG feature vector.

Firstly, if a magnitudes tensor is provided, the cell histograms in the bins tensor are normalized by the average cell gradient magnitudes.

\[bins(c,n) = \frac{bins(c,n)}{magnitudes(c)}\]
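
A minimal sketch of this optional step is shown below. The function name `normalize_by_magnitude` and the flat list-of-cells layout are hypothetical. The handling of a zero average magnitude is an assumption, since the text does not specify it; in that case every vote in the cell is zero as well, so the histogram is left unchanged.

```python
def normalize_by_magnitude(bins, magnitudes):
    """Divide each cell histogram by that cell's average gradient magnitude.

    bins: list of per-cell histograms; magnitudes: list of per-cell averages.
    """
    out = []
    for hist, mag in zip(bins, magnitudes):
        if mag == 0:
            # Assumption: skip the division; the histogram is all zeros anyway.
            out.append(list(hist))
        else:
            out.append([vote / mag for vote in hist])
    return out
```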

To account for changes in illumination and contrast, the cell histograms must be locally normalized, which requires grouping them into larger, spatially connected blocks. Blocks are rectangular grids described by three parameters: the number of cells per block, the number of pixels per cell, and the number of bins per cell histogram. These blocks typically overlap, so each cell histogram contributes more than once to the final descriptor. To normalize a block, its cell histograms \(h\) are concatenated to form a vector \(v = [h_1, h_2, h_3, ... , h_n]\). This vector is normalized using L2-Hys: the L2-norm is applied to the vector, the result is clipped (by limiting the maximum values of \(v\) to the threshold), and the clipped vector is renormalized. If the threshold is equal to zero, then L2-Hys normalization is not performed.

\[L2norm(v) = \frac{v}{\sqrt{\|v\|_2^2 + \epsilon^2}}\]

where \( \|v\|_k \) is the k-norm of \(v\) for \(k = 1, 2\), and \( \epsilon \) is a small constant. For a specific window, its HOG descriptor is then the concatenated vector of the components of the normalized cell histograms from all of the block regions contained in the window. The W1xW2 window starting position is at coordinates 0x0. If the input image has dimensions that are not an integer multiple of W1xW2 blocks with the specified stride, then the last positions that contain only a partial W1xW2 window will be calculated with the remaining part of the W1xW2 window padded with zeroes. The W1xW2 window must also have a size such that it contains an integer number of cells; otherwise the node is not well-defined. The final output tensor contains one HOG descriptor per window position in the input image. The output features tensor has 3 dimensions, given by:

\[ [\lfloor (image_{width} - window_{width}) / window_{stride} \rfloor + 1,\ \lfloor (image_{height} - window_{height}) / window_{stride} \rfloor + 1,\ \lfloor (window_{width} - block_{width}) / block_{stride} + 1 \rfloor \cdot \lfloor (window_{height} - block_{height}) / block_{stride} + 1 \rfloor \cdot (((block_{width} \cdot block_{height}) / (cell_{width} \cdot cell_{height})) \cdot num_{bins})] \]

See vxCreateTensor and vxCreateVirtualTensor. We recommend the output tensors always be virtual objects, with this node connected directly to the classifier. The output tensor will be very large, and using non-virtual tensors will result in a poorly optimized implementation. Merging of this node with a classifier node such as that described in the classifier extension will result in better performance. Notice that this node creation function has more parameters than the corresponding kernel. Numbering of kernel parameters (required if you create this node using the generic interface) is explicitly specified here.
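
The L2-Hys block normalization described earlier in this section can be sketched in a few lines of Python. The names `l2_norm` and `l2_hys` and the default value of `eps` are hypothetical. Returning the input unchanged when the threshold is zero is one literal reading of "L2-Hys normalization is not performed"; an implementation may instead apply a plain L2-norm in that case.

```python
import math

def l2_norm(v, eps=1e-6):
    """L2norm(v) = v / sqrt(||v||_2^2 + eps^2)."""
    denom = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / denom for x in v]

def l2_hys(v, threshold, eps=1e-6):
    """Normalize, clip each component to the threshold, then renormalize."""
    if threshold == 0:
        # Assumption: a zero threshold disables the whole step.
        return list(v)
    normalized = l2_norm(v, eps)
    clipped = [min(x, threshold) for x in normalized]
    return l2_norm(clipped, eps)
```

For example, `l2_hys([3.0, 4.0], 0.7)` first yields approximately `[0.6, 0.8]`, clips the second component to `0.7`, and then rescales the pair to unit length.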

[Immediate] Performs cell calculations for the average gradient magnitude and gradient orientation histograms.

Firstly, the gradient magnitude and gradient orientation are computed for each pixel in the input image. Two 1-D centred, point discrete derivative masks are applied to the input image in the horizontal and vertical directions.

\[ M_h = [-1, 0, 1] \]

and

\[ M_v = [-1, 0, 1]^T \]

\(G_v\) is the result of applying mask \(M_v\) to the input image, and \(G_h\) is the result of applying mask \(M_h\) to the input image. The border mode used for the gradient calculation is implementation dependent. Its behavior should be similar to VX_BORDER_UNDEFINED. The gradient magnitudes and gradient orientations for each pixel are then calculated in the following manner.

\[ G(x,y) = \sqrt{G_v(x,y)^2 + G_h(x,y)^2} \]

\[ \theta(x,y) = \arctan(G_v(x,y), G_h(x,y)) \]

where

\[ \arctan(v, h) = \begin{cases} \tan^{-1}(v/h) & \text{if } h \neq 0 \\ -\pi/2 & \text{if } v < 0 \text{ and } h = 0 \\ \pi/2 & \text{if } v > 0 \text{ and } h = 0 \\ 0 & \text{if } v = 0 \text{ and } h = 0 \end{cases} \]

Secondly, the gradient magnitudes and orientations are used to compute the bins output tensor and optional magnitudes output tensor. These tensors are computed on a cell level where the cells are rectangular in shape. The magnitudes tensor contains the average gradient magnitude for each cell.

\[ magnitudes(c) = \frac{1}{cell\_width \cdot cell\_height} \sum_{w=0}^{cell\_width - 1} \sum_{h=0}^{cell\_height - 1} G_c(w,h) \]

where \(G_c\) denotes the gradient magnitudes of the pixels in cell \(c\). The bins tensor contains histograms of gradient orientations for each cell. The gradient orientations at each pixel range from 0 to 360 degrees. These are quantised into a set of histogram bins based on the num_bins parameter. Each pixel votes for a specific cell histogram bin based on its gradient orientation. The vote itself is the pixel's gradient magnitude.

[Immediate] Computes Histogram of Oriented Gradients features for the W1xW2 window in a sliding window fashion over the whole input image.

Firstly, if a magnitudes tensor is provided, the cell histograms in the bins tensor are normalized by the average cell gradient magnitudes.

\[bins(c,n) = \frac{bins(c,n)}{magnitudes(c)}\]

To account for changes in illumination and contrast, the cell histograms must be locally normalized, which requires grouping them into larger, spatially connected blocks. Blocks are rectangular grids described by three parameters: the number of cells per block, the number of pixels per cell, and the number of bins per cell histogram. These blocks typically overlap, so each cell histogram contributes more than once to the final descriptor. To normalize a block, its cell histograms \(h\) are concatenated to form a vector \(v = [h_1, h_2, h_3, ... , h_n]\). This vector is normalized using L2-Hys: the L2-norm is applied to the vector, the result is clipped (by limiting the maximum values of \(v\) to the threshold), and the clipped vector is renormalized. If the threshold is equal to zero, then L2-Hys normalization is not performed.

\[L2norm(v) = \frac{v}{\sqrt{\|v\|_2^2 + \epsilon^2}}\]

where \( \|v\|_k \) is the k-norm of \(v\) for \(k = 1, 2\), and \( \epsilon \) is a small constant. For a specific window, its HOG descriptor is then the concatenated vector of the components of the normalized cell histograms from all of the block regions contained in the window. The W1xW2 window starting position is at coordinates 0x0. If the input image has dimensions that are not an integer multiple of W1xW2 blocks with the specified stride, then the last positions that contain only a partial W1xW2 window will be calculated with the remaining part of the W1xW2 window padded with zeroes. The W1xW2 window must also have a size such that it contains an integer number of cells; otherwise the node is not well-defined. The final output tensor contains one HOG descriptor per window position in the input image. The output features tensor has 3 dimensions, given by:

\[ [\lfloor (image_{width} - window_{width}) / window_{stride} \rfloor + 1,\ \lfloor (image_{height} - window_{height}) / window_{stride} \rfloor + 1,\ \lfloor (window_{width} - block_{width}) / block_{stride} + 1 \rfloor \cdot \lfloor (window_{height} - block_{height}) / block_{stride} + 1 \rfloor \cdot (((block_{width} \cdot block_{height}) / (cell_{width} \cdot cell_{height})) \cdot num_{bins})] \]

See vxCreateTensor and vxCreateVirtualTensor. The output tensor from this function may be very large. For this reason, it is not recommended that this "immediate mode" version of the function be used. The preferred method to perform this function is as a graph node with a virtual tensor as the output.
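
To make the size of a single window's descriptor concrete, the grouping of cells into overlapping blocks and the concatenation of block vectors can be sketched as below. This is an illustration only: the function names are hypothetical, block size and stride are expressed in cells rather than pixels for brevity, and the row-major ordering of blocks and cells is an assumption that the text does not fix.

```python
def blocks_in_window(cell_bins, block_cells_x, block_cells_y, stride_cells):
    """Group the cell histograms of one window into block vectors.

    cell_bins[cy][cx] is the histogram (a list) of one cell in the window.
    """
    cells_y, cells_x = len(cell_bins), len(cell_bins[0])
    blocks = []
    for by in range(0, cells_y - block_cells_y + 1, stride_cells):
        for bx in range(0, cells_x - block_cells_x + 1, stride_cells):
            v = []
            for cy in range(by, by + block_cells_y):
                for cx in range(bx, bx + block_cells_x):
                    v.extend(cell_bins[cy][cx])
            blocks.append(v)
    return blocks

def window_descriptor(cell_bins, block_cells_x, block_cells_y, stride_cells,
                      normalize=lambda v: list(v)):
    """Concatenate the (optionally normalized) block vectors of one window."""
    descriptor = []
    for v in blocks_in_window(cell_bins, block_cells_x, block_cells_y, stride_cells):
        descriptor.extend(normalize(v))
    return descriptor
```

A window of 3x3 cells with 2 bins per cell, 2x2-cell blocks, and a stride of one cell gives 4 blocks of 8 values each, so the descriptor has 32 entries. Multiplied by the number of window positions, this growth is why the output tensor becomes large.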