Abstract

Hierarchical Temporal Memory (HTM) is a new machine learning algorithm intended to mimic the working principle of the neocortex, the part of the human brain responsible for learning, classification, and making predictions. Although many works illustrate its effectiveness as a software algorithm, hardware design for HTM remains an open research problem. This work therefore proposes an architecture for the HTM Spatial Pooler and Temporal Memory with a learning mechanism that creates a single image for each class based on the important and unimportant features of all images in the training set. The resulting reduction in the number of templates in the database lowers the memory requirements and increases the processing speed. Moreover, face recognition analysis indicates that, for a large number of training images, the proposed design provides higher accuracy (83.5%) than the Spatial-Pooler-only design presented in previous works.

Hierarchical Temporal Memory (HTM) is a cognitive learning algorithm developed by Numenta Inc. [1]. HTM was designed on the basis of various principles of neuroscience and is therefore said to emulate the working principle of the neocortex, the part of the human brain responsible for learning, classification, and making predictions [2].

Following the successful software realization of this learning algorithm, several works, such as [3], have pursued its hardware implementation. One of the latest, [7], proposes an analog circuit design of the HTM Spatial Pooler that combines memristive crossbar circuits with CMOS technology. A main advantage of that work is that input data are processed in the analog domain, which can offer higher processing speed, primarily because it avoids the analog-to-digital and digital-to-analog converters used in digital systems. Inspired by the design described in [7] and by the idea of an analog add-on system that moves processing from the digital domain to the analog domain at the sensory level, this work proposes a system design of Hierarchical Temporal Memory for face recognition applications.

In particular, this work proposes a system-level design for HTM that combines a memristive crossbar based Spatial Pooler [7] with a conceptual analog Temporal Memory whose learning mechanism is inspired by the Hebbian rule. We also present a face recognition algorithm for the proposed system and provide its performance results and analysis.

The core of the proposed system is HTM, which consists of two parts: the Spatial Pooler and the Temporal Memory [2]. The Spatial Pooler (SP) is responsible for generating sparse distributed representations of input data and can be used on its own for feature extraction and pattern recognition applications, whereas the Temporal Memory (TM) is responsible for learning input patterns and making predictions based on the temporal changes in the given input stream [8].

HTM was initially developed as a software algorithm [8], and some research works, such as [9], were presented to illustrate and verify the capabilities of the algorithmic implementation of HTM for performing classification, learning patterns, detecting abnormalities, and making predictions.

Since HTM is a new machine learning algorithm, only a few attempts have been made to implement it in hardware. For instance, [3] presented an HTM hardware design for a digital application-specific integrated circuit (ASIC) architecture; [4] describes a design for an FPGA implementation of digital HTM; [5] proposed computing blocks for HTM using memristive crossbar arrays and spin-neurons, which process data in both the digital and analog domains; and the latest work [6] proposed circuits for the HTM Spatial Pooler based on a memristive crossbar architecture.

Introduced by Donald Hebb in 1949, Hebbian theory (also known as the Hebbian rule or Hebbian postulate) serves as one of the many learning mechanisms used in the design of artificial neural networks. In particular, it describes the basic idea behind synaptic plasticity and states that synaptic efficacy increases when a presynaptic cell repeatedly or persistently takes part in stimulating and firing a neighboring postsynaptic cell [12]. Following the Hebbian rule, the activation of postsynaptic units in neural networks depends on the weighted activations of presynaptic units, which can be represented by Equation 1:

$$y_i = \sum_j w_{ij} x_j \qquad (1)$$

where $y_i$ represents the output of neuron $i$, $x_j$ stands for the $j$th input, and $w_{ij}$ is the weight of the connection from neuron $x_j$ to $y_i$ [12].

In other words, Hebbian theory claims that the synaptic weight between two neurons increases when both neurons are activated or deactivated simultaneously, and decreases when they activate or deactivate separately. The change in the synaptic weight $\Delta w_{ij}$ of the connection is given by Equation 2, which is known as the learning mechanism of Hebbian theory [12]:

$$\Delta w_{ij} = \eta\, x_j\, y_i \qquad (2)$$

where $\eta$ is the learning rate. In artificial neural networks, this learning mechanism is used to alter the weights between neurons.
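Equations 1 and 2 can be sketched numerically as follows. This is a minimal illustration only: the function name `hebbian_step`, the example weights, and the learning rate of 0.1 are assumptions chosen for the example and are not taken from the proposed design.

```python
import numpy as np

def hebbian_step(w, x, eta=0.1):
    """Apply Equation 1 (y = W x) and then Equation 2 (delta_w = eta * x_j * y_i)."""
    y = w @ x                        # y[i] = sum_j w[i, j] * x[j]
    delta_w = eta * np.outer(y, x)   # delta_w[i, j] = eta * x[j] * y[i]
    return y, w + delta_w

# Hypothetical 2x2 example: only the first input is active.
w = np.array([[0.5, 0.0],
              [0.0, 0.5]])
x = np.array([1.0, 0.0])
y, w_new = hebbian_step(w, x, eta=0.1)
print(y)      # activations of the two postsynaptic units
print(w_new)  # only the weight joining the co-active pair has grown
```

Only the connection between the active input and the active output is strengthened, which is the behaviour the rule describes.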

Figure 1: High level block diagram of the proposed system illustrating operating principle of the HTM spatial pooler

Several memristor models are currently available that not only incorporate the characteristics of real devices but also make it possible to switch from one set of device parameters to another, so that the suitability of different devices can be assessed. For example, [13] proposed a model with nanosecond switching time, which is crucial in designing real-time systems. Recent models proposed in [14] also allow simulation of large-scale networks of memristors, which matters because parallelism and scalability play an important role in processing large amounts of data.

A high-level block diagram of the proposed system is illustrated in Figure 1. The input data controller reads the input image, places it in data storage, and sends it to the HTM Spatial Pooler in parts retrieved from the data storage. The data controller is essential because partial sending ensures that an HTM SP of the selected size can process an entire image. In turn, the HTM SP is responsible for feature extraction from the input image and thus provides a binary output. If the input image is a training image, the output data controller directs its extracted features to the HTM TM, which creates a single image (template) for each class containing the common features of all the training images belonging to that class. During the testing phase, the resulting images stored within the TM are used by the pattern matcher to calculate the similarity score between the input testing image and each of the trained classes.

In this work, we also propose Algorithm ?, which can be used to analyze the effectiveness of the proposed system. The algorithm shows the interconnections between the main processing stages of the entire system: pre-processing, HTM SP, HTM TM, and pattern matching. The pre-processing stage, shown in lines 2-3 of Algorithm ?, is necessary to convert the input image into a system-compatible format. In this stage, the input image is converted to grayscale and its quality is enhanced using standard deviation filtering. This is achieved either by external means or by the input data controller.
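The pre-processing stage could be sketched as below. Several details are assumptions made for illustration and are not specified above: the luminance weights used for the grayscale conversion, the 3×3 window of the standard deviation filter, the edge padding, and the function names `to_grayscale` and `std_filter`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def to_grayscale(rgb):
    """Luminance-weighted grayscale (the weights are an assumed convention)."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

def std_filter(gray, size=3):
    """Replace each pixel by the standard deviation of its size x size neighbourhood."""
    pad = size // 2
    padded = np.pad(gray, pad, mode="edge")             # assumed border handling
    windows = sliding_window_view(padded, (size, size))  # shape (H, W, size, size)
    return windows.std(axis=(-2, -1))

# Hypothetical input: a small image with a vertical step edge.
rgb = np.zeros((4, 4, 3))
rgb[:, 2:, :] = 1.0
filtered = std_filter(to_grayscale(rgb))
print(filtered.shape)  # same height and width as the input
```

Flat regions map to zero while pixels next to the edge receive a positive response, so the filter emphasizes local structure before feature extraction.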

The HTM SP stage models the feature extraction process performed by the Spatial Pooler block, illustrated in lines 5-21 of Algorithm ?. A random weight matrix w is first generated so that each weight in w has an analog value between 0 and 1. The weight matrix has dimensions of N×N, which also defines the dimensions of each column within the Spatial Pooler. Lines 6-10 define the connectivity of each synapse: if its weight w is higher than the threshold γ, the synapse is connected and represented by 1, but if w<γ, it is disconnected and represented by 0. Synapse connectivity is used to determine the overlap value column.overlap() for each column m, which is the sum of the products of the synaptic weight matrix w and the N×N bits of the image within region m. This overlap value represents the importance of the bits connected to each particular column.
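A minimal sketch of the connectivity and overlap computation is given below. It rests on assumptions that the description leaves open: a single weight matrix shared by all columns, non-overlapping N×N regions tiling the image, and the use of the binary connectivity (rather than the analog weights) in the products. The function name `sp_overlap` is hypothetical.

```python
import numpy as np

def sp_overlap(image, n, gamma, rng):
    """Return the per-column overlap map and the binary synapse connectivity."""
    h, w = image.shape
    rows, cols = h // n, w // n
    weights = rng.random((n, n))                  # analog weights in [0, 1)
    connected = (weights > gamma).astype(float)   # 1 = connected, 0 = disconnected
    # Split the image into non-overlapping n x n regions, one region per column.
    regions = image[:rows * n, :cols * n].reshape(rows, n, cols, n).swapaxes(1, 2)
    overlap = (regions * connected).sum(axis=(-2, -1))
    return overlap, connected

rng = np.random.default_rng(0)
image = np.ones((4, 4))          # hypothetical all-ones input
overlap, connected = sp_overlap(image, n=2, gamma=0.5, rng=rng)
print(overlap.shape)             # one overlap value per column
```

For an all-ones input, every column's overlap equals the number of connected synapses, which gives a quick sanity check of the computation.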

Lines 15-20 define the inhibition rule implemented in the proposed system. Inhibition is performed block by block, with each inhibition block inhibition.block() spanning M×M columns, and is based on the overlap values of the columns c lying within that block. The overlap value of each column c is compared with the threshold θ, defined as the maximum overlap detected within that particular inhibition block. The column or columns with an overlap value greater than or equal to θ are considered important and represented by a logical high (1); all other columns are considered unimportant and represented by a logical low (0). The binary feature-extracted output image SP.image is then formed by concatenating all inhibition blocks.
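The block-wise inhibition rule can be sketched as follows, assuming that the overlap map divides evenly into M×M blocks. The function name `inhibit` and the example values are hypothetical.

```python
import numpy as np

def inhibit(overlap, m):
    """Mark as 1 every column whose overlap equals the maximum of its m x m block."""
    rows, cols = overlap.shape
    if rows % m or cols % m:
        raise ValueError("overlap map must divide evenly into m x m blocks")
    blocks = overlap.reshape(rows // m, m, cols // m, m)
    theta = blocks.max(axis=(1, 3), keepdims=True)   # per-block threshold
    winners = (blocks >= theta).astype(np.uint8)     # ties all count as winners
    return winners.reshape(rows, cols)               # concatenate blocks back

overlap = np.array([[1, 2, 0, 0],
                    [3, 4, 0, 5],
                    [7, 7, 1, 0],
                    [0, 7, 0, 0]])
sp_image = inhibit(overlap, m=2)
print(sp_image)
```

Because the threshold is the block maximum, a block in which several columns tie (including a block whose overlaps are all zero) yields several winners, which matches the "column or columns" wording of the rule.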

The HTM TM stage defines a learning mechanism that is active during the training stage, when the binary feature-extracted image SP.image from the HTM SP is moved by the output data controller block to the HTM TM block. Lines 26-32 specify that the proposed TM creates a class.map for each of the k image classes, which reflects the temporal variations of spatial features. To do so, the TM updates class.map every time a new feature-extracted image is fetched to the TM block. Depending on whether a bit in SP.image has the value 1 or 0, the corresponding memory cell within class.map is increased or decreased by δ, respectively. At the end of the training phase, according to lines 33-37, the class.map of each class is binarized.
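The class map update and final binarization can be sketched as below. The initial cell value of 0.5 and the function name `train_class_map` are assumptions; the threshold of 0.5 follows the value used later for pattern matching, and δ = 0.01 is used only as an example.

```python
import numpy as np

def train_class_map(sp_images, delta=0.01, init=0.5, threshold=0.5):
    """Accumulate +delta for 1-bits and -delta for 0-bits, then binarize."""
    class_map = np.full(sp_images[0].shape, init, dtype=float)
    for img in sp_images:
        class_map += np.where(img == 1, delta, -delta)
    binary_map = (class_map > threshold).astype(np.uint8)
    return class_map, binary_map

# Three hypothetical 2x2 feature-extracted images of one class.
images = [np.array([[1, 0], [1, 1]]),
          np.array([[1, 0], [0, 1]]),
          np.array([[1, 1], [0, 0]])]
analog_map, binary_map = train_class_map(images)
print(analog_map)
print(binary_map)
```

Cells whose input bit was 1 in the majority of training images end above the threshold and survive binarization; the rest are cleared.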

The recognition and image classification stage defines a pattern matching process that is active during the testing phase, when the binary feature-extracted image SP.image is moved by the output data controller block to the Pattern Matcher block. According to lines 39-44, the similarity score between the extracted features SP.image of the testing image and each of the class maps stored within the HTM TM is defined as the number of logic high (1) outputs of a bitwise XOR. Since the XOR operation produces a logic high output wherever the two compared bits differ, the class of the tested image is the one whose class.map produces the lowest score() value.
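The XOR-based matching step can be sketched as follows. The function name `classify` and the example arrays are hypothetical; the class maps are assumed to be already binarized.

```python
import numpy as np

def classify(sp_image, class_maps):
    """Return the index of the class map with the fewest differing bits, plus all scores."""
    scores = [int(np.count_nonzero(np.logical_xor(sp_image, cm))) for cm in class_maps]
    return int(np.argmin(scores)), scores

test_image = np.array([[1, 0], [0, 1]])
class_maps = [np.array([[0, 1], [1, 0]]),   # differs from the test image in every bit
              np.array([[1, 0], [0, 0]])]   # differs in one bit
label, scores = classify(test_image, class_maps)
print(label, scores)
```

A lower score means fewer mismatching bits, so the second class map is selected in this example.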

In the proposed system, HTM SP processing can be implemented with the memristive crossbar based SP [7]. Memristor devices find various applications owing to their ability to memorize and to mimic neurons [15]. Figure 2 illustrates a single memristive crossbar processing unit. Memristive-CMOS circuits allow precise realization of the feature extraction process described by algorithm lines 3-10, with additional advantages in terms of parallel synaptic processing and compact storage of synaptic weights. Figure 3 illustrates a modified version of the Winner-Take-All (WTA) circuit designed by [18], which, when combined with the SP circuits presented in [7], allows implementation of the inhibition processing described by algorithm lines 11-17.

Figure 2: Single memristive crossbar processing unit.

Figure 3: Winner-Take-All circuit.

Instead of saving all the feature-extracted images, as was done in the previous SP design [7], the proposed work incorporates a conceptual analog TM into the system. The TM is intended to reduce memory requirements and processing time by creating a single training image per class, called a class map in this work. A class map incorporates the features of all training images belonging to a single class and allows pattern matching to be performed by comparing the testing image with only one image for each of the memorized (trained) classes.

This is realized by making TM learn by focusing on both important and unimportant features and by reflecting how features change with time.

Focus is achieved by placing Temporal Memory circuitry after Spatial Pooler so that the inputs are not the original training images, but feature extracted images provided at the output of Spatial Pooler. Since each of these outputs is binary in nature, such placement allows Temporal Memory to differentiate important and unimportant features.

Reflection is achieved by changing the weights of the Temporal Memory cells according to the importance of the corresponding input bit. This is realized by implementing the learning mechanism of Hebbian theory given by Equation 2, which, in general, is used to determine the weight change between presynaptic and postsynaptic units. In the proposed design of the Temporal Memory, the binary pixels of the feature-extracted images are used as presynaptic units, whereas the postsynaptic units are represented by a matrix of ones of the same size as the image. Realizing the postsynaptic units as a matrix of ones ensures that every pixel of the feature-extracted image is treated as equally important. Moreover, this arrangement allows the weights of the Temporal Memory cells to be altered with respect to their importance.

In particular, if the input bit of the feature-extracted image is 1, meaning that it represents an important feature, the weight of the corresponding Temporal Memory cell increases by the positive weight update value (+Δ). Conversely, if the input bit of the feature-extracted image is 0, meaning that it represents an unimportant feature, the weight of the corresponding Temporal Memory cell decreases by the negative weight update value (−Δ). This procedure for differentiating important and unimportant features using the learning mechanism of the Hebbian rule is illustrated in Figure 4.
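The update can be written compactly under one reading that is an assumption here rather than something stated explicitly: the binary input bit $b \in \{0, 1\}$ is mapped to the bipolar value $2b - 1$ before Equation 2 is applied, the postsynaptic unit is fixed at 1, and the learning rate $\eta$ plays the role of Δ. Under that reading,

$$\Delta w = \eta\, x\, y = \Delta\,(2b - 1)\cdot 1 = \begin{cases} +\Delta, & b = 1 \\ -\Delta, & b = 0 \end{cases}$$

For example, a cell with initial weight $w_0$ that receives the bits 1, 1, 0 over three training images with Δ = 0.01 ends at

$$w = w_0 + 0.01 + 0.01 - 0.01 = w_0 + 0.01 .$$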

As a result, instead of holding multiple binary images with extracted features, the TM creates a single analog image that incorporates the important and unimportant features of all input images belonging to a single class and has the same dimensions as each of them. Figure 5 illustrates the formation of the class map for the first class by fetching the feature-extracted binary images belonging to that class to the TM. The TM cells, which all start with the same weight, become distinguishable by the end of the training sequence.

Figure 5: The main principle of single class map formation using Temporal Memory and feature extracted images obtained from Spatial Pooler

However, such a learning mechanism requires the TM to be multi-valued. This ensures that the weights are not restricted to the values 1 and 0, as the feature-extracted images are, but can be changed in steps of the weight update value ±Δ.

Hence, Figure 6 illustrates the design of the TM required to memorize a single class map using multi-valued memory cells. The total number of required memory cells equals the product of the number of memorized classes and the number of bits in a single class map. For example, for 13 class maps, each having dimensions of 120 bits × 160 bits, the required number of memory cells is 13×19200 = 249600. The multi-valued memory cells, in turn, can be realized using the n-bit memristor-based memory described in [19].

Figure 6: The design of the Temporal Memory, consisting of 120×160=19200 memory cells, used for storing a single class map trained on input images having dimensions of 120 bits × 160 bits.

After the class maps have been formed within the TM during the training phase, testing is performed by comparing each input testing image with all the class maps learned by the system. In the proposed system, this can be achieved by fetching a feature-extracted input testing image and all the class maps into the memristive pattern matcher (Fig. ?), which is realized with the memristive XOR gates illustrated in Fig. ? and described in [7].

Figure 7 illustrates the principle used to determine the similarity score between a feature-extracted image and two arbitrary class maps. The class maps are thresholded at the mean value of 0.5 so that XOR logic can be applied. For two input images, the memristive XOR gates produce an output image with logical 0 in the regions where both images represent important features or both represent unimportant features (i.e., both have 1 or both have 0 in that region) and logical 1 in the regions where the two images represent different features. Hence, the class of an input testing image is determined by the class map that yields the fewest white bits (or the most black bits) in the corresponding XOR output.

The described pattern matching process emphasizes the advantage of the proposed design in terms of faster processing speed: the time required for the system to determine similarity score reduces, as the number of templates required to be compared with the input testing image decreases to a single image (that is class map) per class.
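The reduction in matching workload can be estimated with a short calculation. The sketch below assumes 100 classes with 13 training images each, as in the evaluation setup, and borrows the 120×160 image size from the Temporal Memory example; whether that size applies to the experiments is an assumption, and the function name `matching_workload` is hypothetical.

```python
def matching_workload(num_classes, templates_per_class, height, width):
    """Count stored templates and bitwise XOR comparisons needed per test image."""
    templates = num_classes * templates_per_class
    xor_ops = templates * height * width
    return templates, xor_ops

# Every training image kept as a template versus one class map per class.
all_images = matching_workload(100, 13, 120, 160)
class_maps = matching_workload(100, 1, 120, 160)
print(all_images)   # (1300, 24960000)
print(class_maps)   # (100, 1920000)
```

Under these assumptions, each test image is compared against 13 times fewer templates when class maps are used.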

Performance was evaluated in terms of face recognition accuracy on the AR database [20]. The database consists of 100 classes, each having 26 images taken in two sessions. These images were divided into two separate sets. The first set consisted of the 13 images of each class taken during the first session and was used to train the Temporal Memory, whereas the second set consisted of the 13 images of each class taken during the second session and was used for testing.

With this setup, the first analysis aimed to determine the optimal weight update value (±Δ) for the Temporal Memory for different numbers of training images. Figure 8 illustrates the recognition accuracy achieved for different combinations of ±Δ and training set size. As can be seen, for between 1 and 13 training images, the maximum recognition accuracy is achieved when the ±Δ value is lower than or equal to ±0.1. Moreover, as the size of the training set increases, the maximum achieved accuracy increases for a ±Δ value of ±0.01 and decreases for large values of Δ.

This result indicates that maximum recognition accuracy with a large number of training images can be achieved when the weight update value is small. It should also be noted that the value of a memristor is directly proportional to the duration of the applied constant voltage. Together, these two statements imply that a larger number of training images shortens the duration of the applied voltage required to update the weights, which means that consecutive input images can be processed at a higher speed.

Table 1: Recognition accuracy for test images in each category of the AR database, obtained by two different architectures using a single template or class map per class.

| Architecture | Emotions | Light conditions | Occlusions (glasses) | Occlusions (scarf) | Total |
|---|---|---|---|---|---|
| Spatial Pooler | 77.50% | 91.00% | 84.33% | 53.33% | 76.54% |
| Spatial Pooler and Temporal Memory | 84.25% | 96.33% | 85.67% | 67.67% | 83.48% |

After the optimal weight update value was determined to be ±0.01, an analysis was performed to compare the effectiveness of the architecture based on the Spatial Pooler only [7] with that of the proposed architecture combining the Spatial Pooler and Temporal Memory on the face recognition task. To establish common settings, for the Spatial-Pooler-only architecture the training images belonging to a single class were first averaged, and the averaged images were then processed by the Spatial Pooler to provide feature-extracted training templates. For the proposed architecture, the training images were processed by the Spatial Pooler and the extracted feature outputs were used to create class maps within the Temporal Memory. Table 1 shows the recognition accuracy for the condition in which both architectures had one template or class map for each of the 100 classes (100 templates or class maps in total), against which all 13 testing images of each class (1300 testing images in total) were compared. Comparing these results with those reported in [7], it can be seen that as the number of training images increases, the architecture incorporating the Temporal Memory provides higher recognition accuracy with lower memory requirements and faster processing at the pattern matching stage.

In this paper, we proposed a system realization of the HTM Spatial Pooler and Temporal Memory using memristor-CMOS circuits. The main difference from the existing memristor-based HTM system is that the proposed system incorporates a Temporal Memory with learning capability. The learning process involves collecting important features from the training data of a given class and creating its class map, a single image based on the extracted features. Hence, the main advantage of the system is the lower memory occupation of HTM, which in turn provides higher processing speed. The results of the performance analysis indicate that, for a large set of training images, the proposed recognition system provides higher accuracy than the results presented in the previous work.