
Tim Miller

CSCI 5521 - Pattern Recognition

Determining Facial Attractiveness Using Eigenfaces

Final Report

Dr. Paul Schrater

December 19, 2003

Abstract

The technique of transforming images of human faces into reduced dimensional spaces using principal components analysis was first described by Kirby and Sirovich (6). This technique is used to produce what are called "eigenfaces" and has been used relatively successfully for automated facial recognition (9, 12, 14). For this project I extended the use of the eigenfaces technique to predict the attractiveness of female faces. Using attractiveness ratings assigned by human subjects, novel images are rated for attractiveness on a scale of 1 to 10. In addition, new faces in any attractiveness class can be generated using the ratings obtained from subjects.

Introduction

Literature Review

The literature on facial recognition is quite extensive, with most pattern recognition techniques having been tried at some point by some research group. Similarly, the research in the area of facial attractiveness is voluminous. There is very little overlap between the two areas, although some research uses data transformation techniques to manipulate the attractiveness of digitized facial images.

For the problem of facial recognition, both principal components analysis (PCA) (1, 3, 6, 9, 14) and Fisher's linear discriminant (1) have been used to reduce data dimensionality. These two methods are also known as "Eigenfaces" and "Fisherfaces." The most widely used method is PCA, though it is usually applied to images with uniform lighting and facial expression. Alternately, some claim that using "Fisherfaces" rather than "Eigenfaces" tolerates more variance in lighting conditions and facial expressions (1). One of the most interesting new techniques uses PCA on 3D face coordinates obtained from laser scans of a person's entire head (3). By scanning the face with different expressions, it is possible to obtain a difference vector that represents a certain facial expression. This vector can then be added to or subtracted from the mean to manipulate the facial expression of any face. This enables the data set to be much larger than the number of scans recorded, because a certain facial expression (e.g. a smile or frown) can be added to or subtracted from any face in the database to create a new face.

For the actual classification, a variety of methods have been used, including support vector machines (5), template matching (4), feature matching (4), Bayesian decision theory (9), Fisher's linear discriminant (1), and Euclidean distance (14). In each paper, the method used achieves better results than any previously used method, and thus each paper claims that its method is the best suited for facial recognition. This is partly because newer research tends to use more advanced classification schemes, leading to better results. However, it is probably also partly due to differing data sets, sample sizes, and image qualities.

Among researchers who study facial attractiveness, there seems to be agreement that average faces are attractive (7, 8, 10, 11). There is disagreement, however, about whether averageness is a cause of attractiveness, and about whether average faces are the most attractive. Some propose that evolutionary forces would cause faces near the population mean to be desirable (7, 8). Others say that while average faces may be attractive, they are not the most attractive (10, 11). In these papers, subjects rated a group of actual face images. Then an average face was created from the face images, along with an average of the most attractive face images (as determined by the subjects). If averageness made the most attractive faces, one would expect the average face and the average of the attractive faces to have approximately equal attractiveness ratings. However, this was not the case: the average of the attractive faces was rated significantly higher than the mean of all faces. This is compelling evidence that while average faces are attractive, certain non-average characteristics can result in a more attractive face.

In much of the research done to investigate facial attractiveness, digital generation and manipulation of facial images is used to probe for indicators of attractiveness (7, 8, 10, 11). "Averaged" faces are literally just images where each pixel value is the arithmetic mean of the corresponding pixel values of some group of images (7, 8). In other experiments, facial images were "feminized" or "masculinized" by adding some linear combination of face bases that had earlier been determined by subjects to be either feminine or masculine (10, 11). It was found that feminized faces tended to rate higher than masculine faces, giving another indicator of attractiveness besides averageness.
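The pixel-wise averaging described above is simple to sketch. Assuming the images are already aligned and stored as equal-sized grayscale numpy arrays (a hypothetical toy setup, not the data from these studies):

```python
import numpy as np

def average_face(images):
    """Pixel-wise arithmetic mean of a stack of equal-sized grayscale images."""
    stack = np.stack(images).astype(float)  # shape: (n_images, height, width)
    return stack.mean(axis=0)

# Two toy 2x2 "images": the averaged face is the element-wise mean.
a = np.array([[0, 100], [200, 50]])
b = np.array([[100, 100], [0, 150]])
avg = average_face([a, b])
```

With aligned real face images, the same call produces the blended "averaged" face used in these experiments.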

Other manipulations were done without respect to attractiveness (2). In this experiment, each facial image was connected by a line with the mean face and moved along that line. As an image moves away from the mean, it is called a caricature, as the non-average features are accentuated. Conversely, as the image moves toward the mean, it is called an anti-caricature, because non-average features diminish in magnitude. As a face goes past the mean, it is called an anti-face, and begins to take on some of the opposite features of the original face. This experiment showed that a "perceptual discontinuity" occurs at the mean. In other words, faces along the same line on the same side of the mean were deemed similar to a certain degree by subjects, while faces on the same line but on different sides of the mean were not deemed similar.
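This caricature manipulation amounts to moving along the line through the mean in face (or pixel) space. A minimal sketch, treating faces as plain vectors with an assumed scaling parameter alpha:

```python
import numpy as np

def caricature(face, mean_face, alpha):
    """Move a face along the line through the mean face.
    alpha > 1: caricature (non-average features exaggerated)
    0 < alpha < 1: anti-caricature (non-average features diminished)
    alpha < 0: anti-face (features flipped past the mean)."""
    return mean_face + alpha * (face - mean_face)

# Toy two-dimensional "faces" just to show the arithmetic.
mean_face = np.array([10.0, 10.0])
face = np.array([12.0, 8.0])

exaggerated = caricature(face, mean_face, 2.0)
softened = caricature(face, mean_face, 0.5)
anti = caricature(face, mean_face, -1.0)
```

Note that alpha = 1 returns the original face and alpha = 0 returns the mean, matching the discontinuity point reported in (2).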

The algorithm to transform images into eigenfaces is described well in Turk and Pentland (14), but I will describe it briefly here. Each image can be represented as an M x N matrix of pixels, with each element an 8-bit grayscale value. If all images are represented as such, each can be turned into a vector of length MN. Next, the average face is found, and each image is then represented as its difference from the average. These difference vectors form the columns of a matrix A. The eigenfaces are the eigenvectors of the covariance matrix C = A Aᵀ. However, since images are very high dimensional, the covariance matrix is likely quite large, and a straightforward eigenvector computation is prohibitively expensive.

This problem can be solved by noticing that if vᵢ is an eigenvector of Aᵀ A with eigenvalue μᵢ, then

    A Aᵀ (A vᵢ) = A (Aᵀ A vᵢ) = μᵢ (A vᵢ),

so the eigenvectors of C are the vectors A vᵢ, where the vᵢ are the eigenvectors of Aᵀ A, a square matrix with dimension equal to the number of images in the data set. Since the number of images in the data set is usually much smaller than the size of an image, this computation is likely to be tractable.
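As a sketch of this trick (on toy random data, not real images): eigendecompose the small Aᵀ A matrix, map each eigenvector up through A, and verify that the result is an eigenvector of the full covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 "images" of 200 pixels each, so n_images << n_pixels.
n_pixels, n_images = 200, 5
faces = rng.normal(size=(n_pixels, n_images))

mean_face = faces.mean(axis=1, keepdims=True)
A = faces - mean_face                      # columns: difference from the mean

# Eigendecompose the small n_images x n_images matrix A^T A ...
mu, v = np.linalg.eigh(A.T @ A)            # eigenvalues in ascending order

# ... then map the top eigenvector up: A v_i is an eigenvector of C = A A^T.
top = A @ v[:, -1]
top /= np.linalg.norm(top)                 # unit-length top eigenface

C = A @ A.T                                # the big covariance (for checking only)
check = np.allclose(C @ top, mu[-1] * top)
```

The check confirms the identity above: the decomposition costs work in the 5x5 matrix rather than the 200x200 one.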

Once the eigenfaces have been determined, each image in the training set can be projected onto this "face-space." In this way, each image is represented as a linear combination of the eigenfaces. For classification, new images are also projected onto the face-space. The weights of the new image are compared to those of the training data, and based on the distance metric in use, the new image is assigned to the class to which it is closest.
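A minimal sketch of projection and nearest-neighbour classification in face-space, with hypothetical orthonormal eigenfaces and training weights (illustrative values only):

```python
import numpy as np

def project(face, mean_face, eigenfaces):
    """Weights of a face's expansion in the eigenface basis."""
    return eigenfaces.T @ (face - mean_face)

def classify(new_face, mean_face, eigenfaces, train_weights, train_labels):
    """Assign the class of the nearest training projection (Euclidean)."""
    w = project(new_face, mean_face, eigenfaces)
    dists = np.linalg.norm(train_weights - w, axis=1)
    return train_labels[np.argmin(dists)]

# Toy face-space: 4 pixels, 2 orthonormal eigenfaces (hypothetical).
mean_face = np.zeros(4)
eigenfaces = np.array([[1, 0], [0, 1], [0, 0], [0, 0]], dtype=float)

train_weights = np.array([[0.0, 0.0], [5.0, 5.0]])  # projected training faces
train_labels = np.array([1, 6])                     # attractiveness classes

label = classify(np.array([4.0, 4.0, 0.0, 0.0]),
                 mean_face, eigenfaces, train_weights, train_labels)
```

The new face's weights [4, 4] land nearest the second training projection, so it receives class 6; other distance metrics would slot into the same structure.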

For this project, the data set used had already been converted into eigenfaces. Many of the images for which the principal components coefficients had been given were not part of the set of images on which PCA was done. As a result, these faces did not reconstruct perfectly. While they may have been suitable for machine classification, most were not suitable to be rated by humans for attractiveness. Thus, the training set for this project had to be reduced to 100 faces, many of which still did not reconstruct as well as one would like.

Methodology

The process of automatically estimating attractiveness starts with training a classifier using a training set of facial images. A total of 7 male students (3 undergraduate, 4 graduate) rated each facial image in the data set for attractiveness on a scale from one to ten. The attractiveness rating of each image is the average across all subjects for that image. Subjects were also given the option of not rating a given image if a poor reconstruction made it too difficult to assign a rating. Such an abstention was counted as a non-rating, so that it did not affect the average in either direction.
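Treating an abstention as a missing value rather than a zero is what keeps it from pulling the average down. In numpy terms (with illustrative ratings, not the project's actual data):

```python
import numpy as np

# Ratings from 7 subjects for one image; np.nan marks an abstention,
# so it is excluded from the mean rather than dragging it toward zero.
ratings = np.array([5.0, 6.0, np.nan, 4.0, 5.0, np.nan, 7.0])
mean_rating = np.nanmean(ratings)  # averages only the 5 actual ratings
```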

For classification, the vector of face attractiveness ratings was divided into six classes. The number six was determined by trial and error. Ten classes may be the most obvious division, and was the first attempt. However, the range of ratings was only 2.4 to 6.7, so dividing into ten classes would have been splitting hairs. Six was deemed the best choice, as the histogram the data produced with six classes best represented the normal distribution the ratings most likely follow.
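One way to carry out this six-way division is shown below. This is a sketch: the exact bin edges used in the project are not stated, so equal-width bins over the observed 2.4 to 6.7 range are assumed.

```python
import numpy as np

def to_classes(ratings, n_classes=6, lo=2.4, hi=6.7):
    """Map mean ratings in [lo, hi] to integer classes 1..n_classes
    using equal-width bins (an assumed binning scheme)."""
    edges = np.linspace(lo, hi, n_classes + 1)[1:-1]  # interior bin edges
    return np.digitize(ratings, edges) + 1

ratings = np.array([2.4, 4.5, 6.7])
classes = to_classes(ratings)
```

The lowest rating falls in class 1 and the highest in class 6; quantile-based bins would be a reasonable alternative if equal class sizes were preferred.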

A multitude of classification schemes was used to classify the facial images, including Fisher's linear discriminant, perceptrons, linear and non-linear support vector machines, Euclidean distance, and uniform and normal random classifiers. In the first three cases, special care was needed since there were more than two classes. For all three methods, the one-against-all approach was used: each face was assigned to the class whose discriminant placed it furthest on the positive side of the separating hyperplane.
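The one-against-all decision rule can be sketched as follows, with hypothetical per-class hyperplanes (W, b); the chosen class is the one whose discriminant places the sample deepest on its positive ("this class") side:

```python
import numpy as np

def one_vs_all_predict(x, W, b):
    """W[k], b[k] define the hyperplane separating class k from the rest.
    Pick the class with the largest signed discriminant value."""
    scores = W @ x + b            # signed distances, up to a per-class scale
    return int(np.argmax(scores))

# Toy 2D example with three classes (hypothetical hyperplanes).
W = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
b = np.zeros(3)
pred = one_vs_all_predict(np.array([2.0, 0.5]), W, b)
```

Here the first class's discriminant scores highest, so class 0 is returned; with six attractiveness classes, W would simply have six rows.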

For the perceptron algorithm, it was discovered through inspection that the classifying vector did not converge on the training set, so a variation of the pocket algorithm (15) was used. Specifically, whenever a weight vector classified the training data with fewer mistakes than the previous best weight vector, the new best vector was saved. This prevented a bad weight vector from being used simply because the last update step was an overcorrection.
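A sketch of this pocket variation, run on toy one-dimensional data with labels in {-1, +1} (the project's actual feature vectors were PCA coefficients):

```python
import numpy as np

def pocket_perceptron(X, y, epochs=50, seed=0):
    """Perceptron updates, but return the best weight vector seen so far
    (fewest training mistakes) rather than the final one."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])    # absorb the bias term
    w = np.zeros(d + 1)
    best_w, best_errors = w.copy(), n + 1
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (Xb[i] @ w) <= 0:     # misclassified: perceptron update
                w = w + y[i] * Xb[i]
                errors = int(np.sum(y * (Xb @ w) <= 0))
                if errors < best_errors:    # "pocket" the best vector so far
                    best_w, best_errors = w.copy(), errors
    return best_w

# Linearly separable toy set, so the pocket vector reaches zero mistakes.
X = np.array([[2.0], [1.5], [-1.0], [-2.0]])
y = np.array([1, 1, -1, -1])
w = pocket_perceptron(X, y)
errors = int(np.sum(y * (np.hstack([X, np.ones((4, 1))]) @ w) <= 0))
```

On non-separable data (as here, where the plain perceptron failed to converge), the pocket vector is simply the best one encountered during training.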

Since there were only 100 data samples, the data set was not large enough to break into separate training and test sets. To obtain results that would still indicate generalization ability, the leave-one-out method was used: each algorithm was run 100 times, with a different face withheld from the training set each time. That face was then tested against the discriminant functions computed without it.
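The leave-one-out procedure, sketched with a stand-in 1-nearest-neighbour classifier rather than any of the project's actual classifiers:

```python
import numpy as np

def leave_one_out_accuracy(X, y, train_fn, predict_fn):
    """Train n times, each time holding out one sample and testing on it."""
    n = len(y)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i
        model = train_fn(X[mask], y[mask])
        correct += int(predict_fn(model, X[i]) == y[i])
    return correct / n

# Stand-in 1-NN "classifier": training just stores the data.
def train_fn(X, y):
    return (X, y)

def predict_fn(model, x):
    Xtr, ytr = model
    return ytr[np.argmin(np.linalg.norm(Xtr - x, axis=1))]

# Two well-separated toy clusters, so every held-out sample is recovered.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
y = np.array([1, 1, 2, 2])
acc = leave_one_out_accuracy(X, y, train_fn, predict_fn)
```

With 100 faces, the loop runs 100 times, each run training on 99 faces and scoring the one withheld.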

Attractive face generation is based on the ratings provided by subjects. In each class, the mean and standard deviation of each coefficient is computed. Then, using a Gaussian pseudo-random number generator initialized with the computed μ and σ², a new random face in that class is generated.


Results

A summary of the results can be seen in Table 1. None of the methods used was particularly adept at guessing the exact right class. However, all methods, with the exception of support vector machines with a quartic kernel, did much better than the random classifiers. This may not seem very impressive, but given that the number of classes is so small, a random classifier is actually likely to get a large percentage of its guesses within a few points of the actual class. The perceptron algorithm worked the best for this data set, but the difference in accuracy between the different techniques is minimal. One would expect that with a larger and better data set, the classification rates would be much higher.

Table 1. Classification accuracy by method.

Method              Percent correct   Within one   Within two
Euclidean           18.00%            51.00%       71.00%
Perceptron/Pocket   26.00%            61.00%       80.00%
Linear SVM          20.00%            53.00%       77.00%
Quadratic SVM       20.00%            52.00%       78.00%
Cubic SVM           26.00%            46.00%       74.00%
Quartic SVM         10.00%            24.00%       48.00%
Uniform Random      13.00%            44.00%       66.00%
Gaussian Random     3.00%             18.00%       33.00%

It should also be mentioned that for all of the non-linear SVM classifications, a subset of the data containing only the first 50 images was used; the algorithm would not converge with a data set any larger than 50. This could be part of the reason why the non-linear support vector machines actually did slightly worse than the linear version: with the data set cut in half, performance should probably be somewhat worse.

The results of facial generation are shown in Figure 1. These images are the result of averaging all the images in each of the six classes, where class one is rated the least attractive and class six the most attractive. This averaging was done because generating a random face with a random number generator is not consistent enough to always give clear face images.

The subjects did not rate the generated images, but some subjective differences can be noticed. For one, there does not always seem to be much difference in appearance between classes. However, there is a noticeable difference in forehead size, with more attractive classes having larger foreheads than less attractive classes. In addition, there seems to be a slight softening of features in the progression from less attractive to more attractive, as the face is markedly more round in class four than in class one. Classes five and six did not reconstruct as well as the earlier classes, probably because of smaller class sizes. However, they still appear at least as attractive as the earlier classes despite the blurriness.

Future Work

To continue or improve this research, the first task would be to get better data. In the data set given, many of the faces were badly distorted by the reconstruction process. While these images may be acceptable input for a classifier, the images shown to the subjects for rating should be original images. Also, some faces occurred in the data more than once with slightly varying poses. This is probably not inherently harmful, and it may even provide some insight into which poses or lighting conditions are most attractive. However, duplicating images in an already small data set makes the amount of meaningful variation even smaller.

Given better data, it may be possible to map out an attractiveness space for female faces. Average faces are attractive, but not optimally attractive (11). It would be interesting to explore what sorts of deviations from average increase or decrease attractiveness, and to try to fit some sort of shape to attractiveness. With a large enough initial data set, one would expect many attractive faces that may be attractive for different reasons. These different attractive faces could be used to generate new images. After subjects rate these synthesized images, certain features could be picked out as more or less conducive to attractiveness.

Another improvement to this system would be increased automation. One of the nice things about the data set used is that the images were already aligned and scaled properly. If there were a system that could take in any image, find the face, scale the image, and align the eyes, then any image could be used as input, not just those prepared by one research group.

[5] J. Huang, V. Blanz, and B. Heisele, "Face Recognition Using Component-Based SVM Classification and Morphable Models," Proceedings of the First International Workshop on Pattern Recognition Using Support Vector Machines, pp. 135-143, August 10, 2002.

[6] M. Kirby and L. Sirovich, "Application of the Karhunen-Loeve procedure for the characterization of human faces," IEEE Transactions on Pattern Analysis and Machine Intelligence,