A Brown University Research Group

Computational mechanisms of color processing

The goal of this project is to study the computational mechanisms underlying color processing in the primate brain. In recent work we have developed a novel framework for the joint processing of color and shape information in natural images [ECCV’12]. A hierarchical non-linear spatio-chromatic operator yields spatial and chromatic opponent channels, which mimics processing in the primate visual cortex. We have extended two popular object recognition systems (our very own hierarchical model of visual processing and a classical bag-of-word approach based on the SIFT descriptor) to incorporate color information along with shape information.

Neurophysiology background

Luminance neurons have distinct “ON” and “OFF” subfields and are selective to orientations (and high spatial frequency) thus contribute to edge detection.

Insights from neuroscience:

Chromatic and spatial information should be represented jointly as done in the primate visual cortex

Dedicated neural circuits for contrast gain controls play key role in color constancy.

Approach

Fig. 2. Proposed spatio-chromatic opponent image descriptors. Top: Individual R, G, B color channels are first convolved with a center and a surround filter at orientation , phase , and scale s. The corresponding color channels are further combined and rectified by half-squaring followed by divisive normalization (I). This yields eight chromatic SO channels organized in four pairs (e.g., R+G- shown here and R-G+. In stage II, an oriented filter (with both excitatory and inhibitory subunits) is further applied on each SO channel, followed by half-squaring rectification and summation over phases and opponent pairs to yield four spatio-chromatic DO channels that are invariant to figure-ground reversal (e.g., RG).

(1) Spatio-chromatic sensitivity function

The response of an SO (S1) unit is obtained by considering the dot-product between an image patch and the spatio-chromatic sensitivity function.

Fig. 3. Spatial sensitivity distributions for each individual color component. These are obtained by isolating the positive/negative subunits from linear oriented filters to form excitatory/inhibitory center or surround structures.

(2) From Single-Opponent to Double-Opponent units

DO (S1) model unit responses are obtained by filtering SO channels with the spatial sensitivity function:

Note: unlike the SO stage, the convolution here is only 2D and contains both center and surround (excitatory/inhibitory) subunits (in the SO computation excitatory and inhibitory subunits are applied on separate color components). With this difference in mind, the spatial sensitivity function used at the DO stage is the same as the one used at the SO stage but in the general case any filter with excitatory and inhibitory components could be used.

(3) Non-linear operations

Example:

Fig. 4. Processing by the SO and DO color channels. A. Original image. B. SO maps. C. DO maps (after max over all dimensions for display).

Sample results

We have evaluated our approach on four systems:

SIFT-based bag-of-words approach

HMAX model

GIST algorithm for natural scene categorization

BSDS500 for contour detection

Object recognition

Table 1. Recognition performance on the soccer team and 17-category flower datasets. Classification accuracy is reported for each feature type (data in parenthesis correspond to the original performance reported in [3, 4] using the same features as in a bag-of-words scheme.)

Soccer team

Flower

Method

Color

Shape

Both

Color

Shape

Both

Hue/SIFT

69 (67)

43 (43)

73 (73)

58 (40)

65 (65)

77 (79)

Opp/SIFT

69 (65)

43 (43)

74 (72)

57 (39)

65 (65)

74 (79)

SOSIFT/DOSIFT

82

66

83

68

69

79

SOHMAX/DOHMAX

87

76

89

77

73

83

Table 2. Recognition performance on PASCAL VOC 2007 dataset. Performance corresponds to the mean average precision (AP) over all 20 classes. Performance (in parenthesis) corresponds to the best performance reported in [5, 6].

PASCAL VOC 2007

Method

SIFT

HueSIFT

OpponentSIFT

CSIFT

SODOSIFT

SODOHMAX

AP

40 (38.4)

41

43 (42.5)

43 (44.0)

46.5 (33.3/39.8)

46.8 (30.1/36.4)

Table 3. On the need for non-linear circuits: Recognition performance on the soccer team and PASCAL VOC 2007 datasets with and without rectification or divisive normalization stages for the SO (left) and DO (right) SIFT descriptors.

Method

Soccer team

PASCAL VOC 2007

Full model

82.0/66.0

33.3/39.8

Without half-squaring

62.0/60.0

30.3/36.7

Without normalization

70.0/53.0

32.9/40.7

Table 4. All parameters used here are directly constrained by neuroscience data (k = 1 and σ = 0.225 turned out to perform best for the SO (left) and DO (right) SIFT descriptors).

Contour detection

Fig. 7. Contour detection on BSDS500. (A) Representative examples obtained using the original texton maps and our proposed color extensions. From left to right: original images, color-texton map (SOTG) and texton map (TG). (B) Precision-recall curves on BSDS500, comparing the original grayscale texture channel with the full Berkeley system [7] which combines brightness, color, and texture cues against our color-texture cue.