THE NORB DATASET, V1.0
Fu Jie Huang, Yann LeCun
Courant Institute, New York University
July 2004
last updated: October, 2005
This database is intended for experiments in 3D object recognition from shape.
It contains images of 50 toys belonging to 5 generic categories:
four-legged animals, human figures, airplanes, trucks, and cars.
The objects were imaged by two cameras under 6 lighting conditions,
9 elevations (30 to 70 degrees every 5 degrees), and 18 azimuths (0 to 340
every 20 degrees).
The training set is composed of 5 instances of each category
(instances 4, 6, 7, 8 and 9), and the test set of the remaining
5 instances (instances 0, 1, 2, 3, and 5).
TERMS / COPYRIGHT
This database is provided for research purposes. It cannot be sold.
Publications that include results obtained with this database should
reference the following paper:
Y. LeCun, F.J. Huang, L. Bottou, Learning Methods for Generic Object Recognition with
Invariance to Pose and Lighting. CVPR 2004.
CONTENT
The files are gzipped for download. Once uncompressed, they are in a simple
binary matrix format, with the file suffix ".mat". The file format is explained in a
later section.
The "-dat" files store the image sequences. The "-cat" files store the corresponding
category of the images. Each "-dat" file stores 29,160 image pairs (6 categories,
5 instances, 6 lightings, 9 elevations, and 18 azimuths). The sixth category is for
images without objects, which can be used to train a system to reject images that belong
to none of the 5 object categories. Each corresponding "-cat" file contains 29,160 category
labels (0 for animal, 1 for human, 2 for plane, 3 for truck, 4 for car, 5 for blank).
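The label scheme and the per-file count above can be checked with a short sketch. The names `CATEGORY_NAMES`, `FACTORS`, `pairs_per_file` and `label_name` are illustrative and not part of the dataset distribution.

```python
from math import prod

# Label values as listed above for the "-cat" files.
CATEGORY_NAMES = {0: "animal", 1: "human", 2: "plane", 3: "truck", 4: "car", 5: "blank"}

# Factors of the per-file image-pair count, in the order given in the text.
FACTORS = {"categories": 6, "instances": 5, "lightings": 6, "elevations": 9, "azimuths": 18}


def pairs_per_file() -> int:
    """Number of image pairs stored in one "-dat" file."""
    return prod(FACTORS.values())


def label_name(label: int) -> str:
    """Map a "-cat" label (0 to 5) to its category name."""
    return CATEGORY_NAMES[label]


print(pairs_per_file())  # 29160
print(label_name(5))     # blank
```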
Each "-info" file stores 29,160 10-dimensional vectors, which contain additional
information about the corresponding images. The first 4 elements in the vector are:
- 1. the instance in the category (0 to 9)
- 2. the elevation (0 to 8, meaning the cameras are 30, 35, 40, 45, 50, 55, 60, 65, 70
degrees from the horizontal, respectively)
- 3. the azimuth (0, 2, 4, ..., 34; multiply by 10 to get the azimuth in degrees)
- 4. the lighting condition (0 to 5)
and the next 6 elements describe the perturbations added to the object when superposed
onto a cluttered background (see the next section).
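As a minimal sketch, the elevation and azimuth codes can be converted to degrees as follows. The function names are hypothetical. Returning `None` for -1 is a design choice made here, based on the statement in this document that background images carry -1 in these fields.

```python
from typing import Optional


def elevation_degrees(index: int) -> Optional[int]:
    """Elevation index 0..8 -> 30, 35, ..., 70 degrees from the horizontal."""
    if index == -1:  # background image, no pose available
        return None
    if not 0 <= index <= 8:
        raise ValueError(f"elevation index out of range: {index}")
    return 30 + 5 * index


def azimuth_degrees(code: int) -> Optional[int]:
    """Azimuth code 0, 2, ..., 34 -> 0, 20, ..., 340 degrees."""
    if code == -1:  # background image, no pose available
        return None
    if code not in range(0, 36, 2):
        raise ValueError(f"azimuth code out of range: {code}")
    return 10 * code


print(elevation_degrees(5))  # 55
print(azimuth_degrees(10))   # 100
```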
For regular training and testing, "-dat" and "-cat" files are sufficient. "-info"
files are provided in case some other forms of classification or preprocessing are
needed.
JITTERED OBJECTS AND CLUTTERED BACKGROUND
After capturing, each image has been processed so that the object is centered in
the image (the center of mass of the object pixels is at the center of the image),
scaled so that the bounding box is roughly 80x80 pixels, and placed on a uniform
background, including the cast shadow.
Three sources of variation are then added to the data set:
- the objects are perturbed
- the objects are superposed onto a complex background
- distractor objects are added to the background
The objects are randomly perturbed in 5 ways. They are scaled by factors
between 0.78 and 1.0; rotated in-plane by -5 to +5 degrees; and shifted by -6 to
+6 pixels horizontally and vertically. The image intensities (in the range
of 0 to 255) are offset by a random value between -20 and +20; image contrasts are
scaled by a factor in the range of 0.8 to 1.3. The perturbations are stored in the last 6
elements in the "-info" files (the shift takes two elements, one per axis):
- 5. horizontal shift (-6 to +6)
- 6. vertical shift (-6 to +6)
- 7. luminance change (-20 to +20)
- 8. contrast (0.8 to 1.3)
- 9. object scale (0.78 to 1.0)
- 10. rotation (-5 to +5 degrees)
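One way to keep the ten "-info" elements straight is to give them names. The sketch below uses a hypothetical `Info` tuple whose field names are chosen here for illustration. The fields are typed as integers because the "-info" files are integer matrices.

```python
from typing import NamedTuple, Sequence


class Info(NamedTuple):
    """One row of an "-info" file, in the order documented above."""
    instance: int
    elevation: int
    azimuth: int
    lighting: int
    horizontal_shift: int
    vertical_shift: int
    luminance_change: int
    contrast: int
    scale: int
    rotation: int


def parse_info(row: Sequence[int]) -> Info:
    """Wrap a 10-element row; reject rows of any other length."""
    if len(row) != 10:
        raise ValueError(f"expected 10 elements, got {len(row)}")
    return Info(*row)


# Example row taken from the Matlab session in the FILE FORMAT section.
info = parse_info([8, 5, 10, 4, -3, 0, -6, 1, 0, -4])
print(info.instance, info.horizontal_shift, info.rotation)  # 8 -3 -4
```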
The complex background images are extracted from a subset of natural scene
images from the Corel image library. The images contain scenes with large region
contrasts, such as a lake against a mountain, and irregular region boundaries.
One distractor object is added to each image. The distractor is located toward
the boundary of the image, but can clutter the main object in the center.
There are images with only background and distractor objects. These images belong
to their own category, as indicated in the category files.
FILE FORMAT
The files are stored in the so-called "binary matrix" file format, which
is a simple format for vectors and multidimensional matrices of various
element types. Binary matrix files begin with a file header which describes
the type and size of the matrix, and then comes the binary image of the matrix.
The header is best described by a C structure:
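struct header {
    int magic;   // 4 bytes, little endian; encodes the element type
    int ndim;    // 4 bytes, little endian; number of dimensions
    int dim[3];  // 3 x 4 bytes, little endian; dimension sizes
};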
Note that when the matrix has fewer than 3 dimensions (say, it is a 1D vector),
the unused entries dim[1] and dim[2] are both 1. When the matrix has more than 3 dimensions,
the header is followed by further dimension size information. Otherwise,
after the file header comes the matrix data, which is stored with the index
in the last dimension changing the fastest.
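The header layout described above can be parsed with Python's standard `struct` module. This is a sketch, not a reference reader: `parse_header` is a hypothetical helper, and it assumes that extra dimensions beyond the third follow immediately as 4-byte little-endian integers, which matches the 4D "-dat" header shown in the Matlab session.

```python
import struct
from typing import List, Tuple


def parse_header(buf: bytes) -> Tuple[int, List[int], int]:
    """Return (magic, dims, data_offset) for a binary matrix file."""
    magic, ndim = struct.unpack_from("<Ii", buf, 0)
    if ndim < 1:
        raise ValueError(f"invalid ndim: {ndim}")
    # At least three dimension slots are always present; unused ones hold 1.
    nslots = max(3, ndim)
    slots = struct.unpack_from(f"<{nslots}i", buf, 8)
    data_offset = 8 + 4 * nslots  # the matrix data starts right after the header
    return magic, list(slots[:ndim]), data_offset


# Synthetic headers built in memory (no dataset file is needed to try this).
dat_header = struct.pack("<6i", 0x1E3D4C55, 4, 29160, 2, 108, 108)
cat_header = struct.pack("<5i", 0x1E3D4C54, 1, 29160, 1, 1)
print(parse_header(dat_header))
print(parse_header(cat_header))
```

For a real file, pass the first 24 bytes or more read in binary mode, then seek to `data_offset` to reach the matrix data.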
The magic number encodes the element type of the matrix:
- 0x1E3D4C51 for a single precision matrix
- 0x1E3D4C52 for a packed matrix
- 0x1E3D4C53 for a double precision matrix
- 0x1E3D4C54 for an integer matrix
- 0x1E3D4C55 for a byte matrix
- 0x1E3D4C56 for a short matrix
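The list above maps naturally onto a lookup table. In the sketch below, the table name and the `struct` format characters are choices made here. The element sizes for single, double and short (4, 8 and 2 bytes) are assumptions based on the usual C types. The layout of the packed type is not described in this document, so it is left unspecified.

```python
import struct
from typing import Optional, Tuple

# magic number -> (element type name, struct format character or None)
MAGIC_TO_TYPE = {
    0x1E3D4C51: ("single", "f"),
    0x1E3D4C52: ("packed", None),  # layout not specified in this document
    0x1E3D4C53: ("double", "d"),
    0x1E3D4C54: ("integer", "i"),
    0x1E3D4C55: ("byte", "B"),     # unsigned, since pixel values span 0 to 255
    0x1E3D4C56: ("short", "h"),
}


def element_type(magic: int) -> Tuple[str, Optional[str], Optional[int]]:
    """Return (name, struct code, size in bytes) for a magic number."""
    if magic not in MAGIC_TO_TYPE:
        raise ValueError(f"unknown magic number: {magic:#010x}")
    name, code = MAGIC_TO_TYPE[magic]
    size = struct.calcsize("<" + code) if code else None
    return name, code, size


print(element_type(0x1E3D4C55))  # ('byte', 'B', 1)
print(element_type(0x1E3D4C54))  # ('integer', 'i', 4)
```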
Since the files were generated on an Intel machine, they use the little-endian
scheme to encode the 4-byte integers. Take care when reading the files on
big-endian machines.
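To illustrate the byte-order issue, the sketch below decodes the same four header bytes both ways. Stating the byte order explicitly (`<` in the `struct` format) gives the same result on any host. The helper names are hypothetical.

```python
import struct


def read_int32_le(raw: bytes) -> int:
    """Decode 4 bytes as a little-endian signed 32-bit integer."""
    return struct.unpack("<i", raw)[0]


def read_int32_be(raw: bytes) -> int:
    """Decode the same 4 bytes as big-endian, shown only for contrast."""
    return struct.unpack(">i", raw)[0]


dim0_bytes = bytes([232, 113, 0, 0])  # first dimension of a "-dat" header
print(read_int32_le(dim0_bytes))      # 29160
print(read_int32_be(dim0_bytes))      # a wrong, very different value
```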
- The "-dat" files store a 4D tensor of dimensions 29160x2x108x108.
- The "-cat" files store a 1D vector of dimension 29,160.
- The "-info" files store a 2D matrix of dimensions 29160x10.
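Given these dimensions, the expected uncompressed file sizes and the byte offset of any image follow from the header layout and the last-index-fastest storage order. The sketch below uses hypothetical helper names and assumes 1-byte elements for "-dat" and 4-byte elements for "-cat" and "-info", as their magic numbers indicate.

```python
from math import prod
from typing import Sequence


def header_size(ndim: int) -> int:
    """magic + ndim + at least three dimension slots, 4 bytes each."""
    return 8 + 4 * max(3, ndim)


def expected_size(dims: Sequence[int], elem_size: int) -> int:
    """Total uncompressed file size in bytes."""
    return header_size(len(dims)) + elem_size * prod(dims)


def image_offset(sample: int, camera: int,
                 n_cameras: int = 2, height: int = 108, width: int = 108) -> int:
    """Byte offset of one 108x108 image inside a "-dat" file."""
    return header_size(4) + (sample * n_cameras + camera) * height * width


print(expected_size([29160, 2, 108, 108], 1))  # 680244504
print(expected_size([29160], 4))               # 116660
print(expected_size([29160, 10], 4))           # 1166420
print(image_offset(1, 0))                      # 23352
```

Comparing the first three numbers with the actual size on disk is a quick way to confirm that a download is complete and uncompressed.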
The following Matlab session shows how to read some example files
(to avoid endian confusion, the header is read byte by byte):
>> fid=fopen('norb-5x46789x9x18x6x2x108x108-training-10-dat.mat','r');
>> fread(fid,4,'uchar'); % result = [85 76 61 30], it's a byte matrix
>> fread(fid,4,'uchar'); % result = [4 0 0 0], ndim = 4
>> fread(fid,4,'uchar'); % result = [232 113 0 0], dim0 = 29160 (=113*256+232)
>> fread(fid,4,'uchar'); % result = [2 0 0 0], dim1 = 2
>> fread(fid,4,'uchar'); % result = [108 0 0 0], dim2 = 108
>> fread(fid,4,'uchar'); % result = [108 0 0 0], dim3 = 108
>> imshow(transpose(reshape(fread(fid,108*108),108,108)),[0 255]); % show the first image
>> fid=fopen('norb-5x46789x9x18x6x2x108x108-training-10-cat.mat','r');
>> fread(fid,4,'uchar'); % [84 76 61 30], integer matrix
>> fread(fid,4,'uchar'); % [1 0 0 0] ndim = 1
>> fread(fid,4,'uchar'); % [232 113 0 0] dim0 = 29160 (=113*256+232)
>> fread(fid,4,'uchar'); % [1 0 0 0] (ignore this)
>> fread(fid,4,'uchar'); % [1 0 0 0] (ignore this)
>> fread(fid,10,'int'); % [0 1 2 3 4 5 0 1 2 3] (on little-endian CPU)
>> fid=fopen('norb-5x46789x9x18x6x2x108x108-training-10-info.mat','r');
>> fread(fid,4,'uchar'); % [84 76 61 30], integer matrix
>> fread(fid,4,'uchar'); % [2 0 0 0] ndim = 2
>> fread(fid,4,'uchar'); % [232 113 0 0] dim0 = 29160 (=113*256+232)
>> fread(fid,4,'uchar'); % [10 0 0 0] dim1 = 10
>> fread(fid,4,'uchar'); % [1 0 0 0] (ignore this)
>> fread(fid,10,'int'); % [8 5 10 4 -3 0 -6 1 0 -4] (on little-endian CPU)
Here is a screenshot of the first 30 image pairs read from
"norb-5x46789x9x18x6x2x108x108-training-10-dat.mat", arranged top-down and left-to-right
(column major). The caption below each pair shows the content from the corresponding
"-cat.mat" and "-info.mat" files, in the form "category/instance/elevation/azimuth/lighting".
For the background images, the last 4 numbers are all -1.
DOWNLOAD
Please note that your web browser may uncompress the files without telling you.
Check the file size to see if it's uncompressed!
The file size of "*-cat.mat.gz" is 0.4 KB, uncompressed to 116 KB
The file size of "*-dat.mat.gz" is 514 MB, uncompressed to 680 MB
The file size of "*-info.mat.gz" is 157 KB, uncompressed to 1.1 MB
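Besides checking the size, the first bytes of a file show whether it is still gzipped. The sketch below relies on two facts: gzip streams start with the bytes 0x1F 0x8B, and binary matrix files start with a little-endian magic number in the range 0x1E3D4C51 to 0x1E3D4C56. The function names are hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def sniff(first_bytes: bytes) -> str:
    """Classify a file from its first four bytes."""
    if first_bytes[:2] == GZIP_MAGIC:
        return "gzip"
    # Little-endian magic: low byte 0x51..0x56, then 0x4C 0x3D 0x1E.
    if (len(first_bytes) >= 4
            and 0x51 <= first_bytes[0] <= 0x56
            and first_bytes[1:4] == b"\x4c\x3d\x1e"):
        return "binary matrix"
    return "unknown"


def sniff_file(path: str) -> str:
    """Same check, reading the first four bytes from disk."""
    with open(path, "rb") as f:
        return sniff(f.read(4))


print(sniff(b"\x1f\x8b\x08\x00"))       # gzip
print(sniff(bytes([85, 76, 61, 30])))   # binary matrix
```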
downloadable files:
norb-5x01235x9x18x6x2x108x108-testing-01-cat.mat.gz
norb-5x01235x9x18x6x2x108x108-testing-01-dat.mat.gz
norb-5x01235x9x18x6x2x108x108-testing-01-info.mat.gz
norb-5x01235x9x18x6x2x108x108-testing-02-cat.mat.gz
norb-5x01235x9x18x6x2x108x108-testing-02-dat.mat.gz
norb-5x01235x9x18x6x2x108x108-testing-02-info.mat.gz
norb-5x46789x9x18x6x2x108x108-training-01-cat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-01-dat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-01-info.mat.gz
norb-5x46789x9x18x6x2x108x108-training-02-cat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-02-dat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-02-info.mat.gz
norb-5x46789x9x18x6x2x108x108-training-03-cat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-03-dat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-03-info.mat.gz
norb-5x46789x9x18x6x2x108x108-training-04-cat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-04-dat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-04-info.mat.gz
norb-5x46789x9x18x6x2x108x108-training-05-cat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-05-dat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-05-info.mat.gz
norb-5x46789x9x18x6x2x108x108-training-06-cat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-06-dat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-06-info.mat.gz
norb-5x46789x9x18x6x2x108x108-training-07-cat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-07-dat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-07-info.mat.gz
norb-5x46789x9x18x6x2x108x108-training-08-cat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-08-dat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-08-info.mat.gz
norb-5x46789x9x18x6x2x108x108-training-09-cat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-09-dat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-09-info.mat.gz
norb-5x46789x9x18x6x2x108x108-training-10-cat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-10-dat.mat.gz
norb-5x46789x9x18x6x2x108x108-training-10-info.mat.gz
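The file names above follow a regular pattern, so the list can be generated rather than typed. The sketch below only reproduces the names listed here. It makes no assumption about where the files are hosted, and `norb_file_names` is a hypothetical helper.

```python
from typing import List


def norb_file_names() -> List[str]:
    """Return the 36 file names in the order listed above."""
    # (instance digits, split name, number of parts), as seen in the list.
    splits = [("01235", "testing", 2), ("46789", "training", 10)]
    names = []
    for instances, split, n_parts in splits:
        for part in range(1, n_parts + 1):
            for kind in ("cat", "dat", "info"):
                names.append(
                    f"norb-5x{instances}x9x18x6x2x108x108-{split}-{part:02d}-{kind}.mat.gz"
                )
    return names


names = norb_file_names()
print(len(names))  # 36
print(names[0])
print(names[-1])
```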