Fusion-Based Approach for Long-Range Night-Time Facial Recognition

ABSTRACT

Long-range identification using facial recognition is being pursued as a valuable surveillance tool. The capability to perform this task covertly and in total darkness greatly enhances the operator's ability to maintain a large distance between themselves and a possible hostile target. An active-SWIR video imaging system has been developed to produce high-quality, long-range, night/day facial imagery for this purpose. Most facial recognition techniques match a single input probe image against a gallery of possible match candidates. When resolution, wavelength, and uncontrolled conditions reduce the accuracy of single-image matching, multiple probe images of the same subject can be matched to the watch-list and the results fused to increase accuracy. If multiple probe images are acquired from video over a short period of time, the high correlation between the images tends to produce similar matching results, which should reduce the benefit of fusion. In contrast, fusing matching results from multiple images acquired over a longer period of time, where the images show more variability, should produce a more accurate result. In general, image variables could include pose angle, field-of-view, lighting condition, facial expression, target-to-sensor distance, contrast, and image background. Long-range short-wave infrared (SWIR) video was used to generate probe image datasets containing different levels of variability. Face matching results for each image in each dataset were fused, and the results compared.

Keywords: Image Fusion, Face Recognition, SWIR, Night Vision, Surveillance, Biometrics, Active Imaging

1. INTRODUCTION

As tensions increase in areas around the world, new technologies are continually being created and evaluated to protect people and property. Video surveillance systems are popular because they put a standoff distance between the target and the operator, can be automated for 24-hour operation, and can be used for post-processing. When teamed with add-on software packages, these surveillance systems become a formidable tool for defense applications. Facial recognition systems are an example of this and are becoming increasingly popular in many communities, including the military, commercial, and private sectors. As the desire for this technology grows, so does the requirement for its accuracy. The ability to detect and identify individuals at long range, with high accuracy, in short periods of time, day or night, in all weather conditions, while remaining completely tactical and covert, would prove highly favorable in this application. For this reason, the West Virginia High Technology Consortium Foundation (WVHTCF), under a research contract from the Office of Naval Research (ONR), with funding and oversight from the Office of the Secretary of Defense Deployable Force Protection Program (DFP), is developing the Tactical Imager for Night/Day Extended Range Surveillance (TINDERS)1,2,3. This is an actively illuminated short-wave infrared (SWIR) video imaging system that uses illumination that is invisible to conventional silicon sensors and the naked eye. The system provides imagery suitable for facial recognition at ranges of up to 400 meters in complete darkness, and imagery for tracking purposes beyond three kilometers.

Optimization of the face recognition performance can be pursued in three primary ways. First, the optical hardware can be optimized to produce the highest quality SWIR imagery.
Second, the face matching algorithms can be optimized to produce the most accurate matching of a single SWIR probe image to a database of visible-spectrum mug shots. Most of the effort to date has focused on these first two approaches. The third approach, which is discussed in this paper, is to fuse the matching results for multiple SWIR probe images of the same person to increase identification accuracy. In principle, the longer an individual is monitored by the video surveillance system, the more facial images can be captured and processed, and the more accurate the fused matching result will be. This paper investigates the hypothesis that it is more beneficial to have a low correlation than a high correlation between the fused probe images. The data suggest that the fusion of multiple images with low correlation results in a somewhat larger improvement in accuracy than the fusion of multiple images with high correlation. The results of this work may ultimately lead to the integration of new features in the TINDERS system, and may also be generalizable to other face recognition systems based on video surveillance.

Automatic Target Recognition XXIV, edited by Firooz A. Sadjadi and Abhijit Mahalanobis, Proc. of SPIE Vol. 9090, 909006 © 2014 SPIE. CCC code: 0277-786X/14/$18. doi: 10.1117/12.2052725

2. SYSTEM DESCRIPTION

2.1 System Overview

In 2009, TINDERS began as a laboratory initial proof of concept. After successful laboratory demonstration, this unit was mobilized to a country where it demonstrated long-range detection and identification in complete darkness. Development of the second generation of this system brought improvements in size, weight, durability, usability, mobility, and ruggedness.

TINDERS functions as a night/day video surveillance system. It can be deployed as a standalone system or as part of a larger network. It can be tasked or slewed to a person of interest, allowing for either manual or automatic operation. The system provides live video output, and once a person is detected, positive identification can be attempted. The TINDERS unit offers a large storage capacity, allowing for post-processing of recorded SWIR video data.

2.2 SWIR Illumination

TINDERS is an actively illuminated system with zoom capability. The illumination beam divergence angle is matched to the field of view of the motorized optics. The source is a fiber-coupled superluminescent light emitter (SLED) with a doped fiber amplifier, running at constant power. Operating at a wavelength greater than 1400 nm, the TINDERS illumination is completely invisible to the naked eye, and the system remains completely eye safe at all times as defined by the ANSI standards4. The eye-safety classification is Class 1M at the illuminator output. Operation at this wavelength allows for 65 times more light when compared to 800 nm.

Operation in the SWIR band does offer some challenges compared to typical visible-spectrum imaging. As illustrated in Figure 1, differences between SWIR and visible-spectrum facial imagery include low skin reflectivity; however, the facial landmarks, such as the edges of the nose, lips, and eyes, remain. For a system to be most useful, it must match the acquired sensor data against an existing database. For TINDERS, the challenge is to match its SWIR facial imagery against a database of visible-spectrum mug shots.

Figure 1. Visible spectrum mug shot (left) and SWIR probe image (right).

2.3 Hardware

The TINDERS system is composed of three components: the optical head unit, the electronics enclosure, and the control computer. Figure 2 depicts the components and a picture of the deployable unit.

Figure 2. TINDERS System. (left) Conceptual illustration of the TINDERS system. (right) Deployable TINDERS system showing optics head, electronics enclosure, and control computer.

The optical head unit contains all the optics, including the mechanisms to move them. Included within are a laser range finder, a focal plane array, and a video processing board. All components are encased in a climate-controlled and environmentally sealed enclosure mounted to a FLIR pan/tilt unit. Typically it is deployed on a tripod as depicted above, but it can also be mounted on a mast or tower. The electronics enclosure contains the supporting power supplies, communication electronics, and illumination source. As previously mentioned, this is a fiber-coupled system, meaning the SLED, amplifier, and filter are contained within this thermoelectrically cooled enclosure. The Mil-Spec connections facilitate 24-hour operation in all conditions. The system is currently controlled by a semi-rugged computer. The computer does not need to be co-located with the system; both simply need to be plugged into any ordinary 100 Mbit/s network and 110 VAC. The unit offers a large storage capacity for video post-processing as well as in situ operation.

2.4 Control Software

TINDERS can be controlled by an operator from the control computer console, or from a client running Windows Remote Desktop Connection if the system is network connected. The operator is presented with a number of controls to perform the desired task, whether it is surveillance, detection, identification, or tracking. The system has integrated person and face detection algorithms, allowing tracks to be initiated with little operator involvement. Once these are initiated, the operator can attempt facial recognition as discussed in the next section. TINDERS, as previously mentioned, can also be integrated into a larger network of sensors and report its findings to a database or central location for further scrutiny.
2.5 Facial Recognition

The TINDERS control software and the facial recognition package are both contained within one graphical user interface (GUI). The facial recognition platform is powered by a commercially available package developed by MorphoTrust USA. It is based on their Face Examiner Workstation5, with the addition of custom software that pre-processes the SWIR images before they are matched against a gallery that is pre-populated with a database of visible-spectrum mug shots. Selection of video frames for face recognition can be done in either an automated or manual mode. In the automated mode, faces are detected in video and automatically submitted to the face recognition process. In manual mode, the operator clicks a button to cue a predefined number of SWIR video frames to be submitted to the facial recognition process. Alternatively, the user can load previously saved files containing SWIR facial imagery. The user can then manually mark the eye positions and initiate matching. Each probe image is then matched against each gallery image and assigned a score, and the gallery candidates are ranked based on the fused score.

Figure 3 illustrates the TINDERS GUI with integrated facial recognition when used in manual mode. The image on the left represents the live video feed. The image on the right is a probe image that was searched against the visible spectrum database. The visible images below show the possible matches with the correct subject scoring Rank 1.
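The matching-and-fusion flow described above (every probe image scored against every gallery image, then candidates ranked by a fused score) can be sketched in a few lines. This is a minimal illustration of the maximum-score fusion rule the paper uses; the score values and set sizes are hypothetical, not TINDERS output.

```python
# Minimal sketch of probe-to-gallery matching with maximum-score fusion.
# All scores below are hypothetical illustrations.

def fuse_max_score(score_matrix):
    """score_matrix[p][g] = match score of probe image p against gallery
    image g. Returns one fused score per gallery image: the highest
    single-probe score that gallery image received across all probes."""
    n_gallery = len(score_matrix[0])
    return [max(row[g] for row in score_matrix) for g in range(n_gallery)]

def rank_gallery(fused_scores):
    """Gallery indices sorted from best (Rank 1) to worst fused score."""
    return sorted(range(len(fused_scores)),
                  key=lambda g: fused_scores[g], reverse=True)

# Three probe frames matched against a four-identity gallery:
scores = [
    [0.62, 0.30, 0.55, 0.10],
    [0.48, 0.71, 0.40, 0.15],
    [0.50, 0.35, 0.80, 0.12],
]
fused = fuse_max_score(scores)      # column-wise maxima
print(fused)                        # [0.62, 0.71, 0.8, 0.15]
print(rank_gallery(fused))          # [2, 1, 0, 3] -> gallery 2 is Rank 1
```

Adding more probe frames can only raise (never lower) a gallery image's fused score, which is why monitoring a subject longer tends to help.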

Figure 3. Screenshot of the TINDERS GUI, showing user controls and facial recognition in manual mode, with a correct match.

2.6 Fusion

The fusion process combines the face matching results from multiple probe images to increase the identification confidence level. The live video feed supplied by the TINDERS system provides up to 30 frames per second of SWIR facial imagery. As long as the images belong to the same person, they can continue to be matched and the matching results fused. For the results presented in this paper, the maximum-score fusion technique is used, in which each visible gallery image is assigned a fused matching score equal to its highest single-probe matching score across all of the probe images. The probe images submitted to facial recognition do not necessarily have to be identical to one another; however, in most TINDERS experiments that have been analyzed to date, test subjects were stationary and facing the camera with a neutral facial expression, having a pose angle and facial expression very similar to the enrolled visible-spectrum gallery images. As a result, the fusion process in these experiments has combined the matching results for a set of probe images that closely resemble one another, i.e. probe images with high correlation. While the fused results have always been more accurate than the single-probe results, it is possible that the lack of variability in the probe images has limited the performance improvement provided by fusion. Thus, an experiment was performed to investigate the impact of probe image variability on fusion performance.

3. EXPERIMENTAL DESCRIPTION

3.1 Database

The visible-spectrum gallery used for this experiment included enrollment images from two datasets. The first dataset was collected in summer through winter of 2012 in conjunction with a research group at West Virginia University (WVU). This dataset consists of 104 individuals.
The second dataset, which was collected specifically for this experiment, included 10 individuals. Thus, the total database size used in this experiment was 114 faces. One individual inadvertently appeared in both datasets; however, this does not affect our ability to test the hypothesis, as the duplicate record affects all matching scores in the same way.

3.2 Experimental Set-Up

In order to determine whether it is more favorable to fuse images of high or low correlation, sets of images having both high and low correlation need to be produced. These sets need to be produced in a repeatable manner.

A test sequence was developed that used the TINDERS system to record SWIR video of all 10 subjects in the second dataset with specific pose angles and facial expressions. The video was then post-processed for image extraction.

The data was collected in the outdoor parking structure of the WVHTCF facility. The TINDERS system was set up at one end of the structure. One seat was placed at a range of 70 meters from the system and another at a range of 160 meters at the other end of the structure. At both locations, backdrops were placed behind the subjects to minimize any variability in background. The parking garage is dimly lit, offering very little ambient lighting. Several chalk markings were strategically placed in front of both seats that forced the subjects to turn their heads approximately 20 degrees when looking at each one.

3.3 Image Acquisition

After the visible mug shot was acquired indoors, each subject was ushered to the 70 meter mark in the parking structure. The subject was instructed to stare with a neutral expression directly into the TINDERS system, which was already running, had the illuminator on, had the imager zoom set to the minimum field of view (FOV), and was recording live video. The subject was asked to maintain this position and expression for 10 seconds, allowing for the recording of ~300 video frames at 30 frames per second. The subject was then instructed to look at the bottom left mark and say out loud, "One-one thousand, two-one thousand," then to look at the bottom center mark and repeat. In total there were nine marks that gave a complete angular view of the face. Having the subject speak out loud introduced variability into the facial expression recorded from frame to frame. Once this was completed, the subject was instructed to repeat this progression without speaking the words out loud, maintaining a constant, neutral expression. Example imagery from 70 m can be seen in Figures 4 and 5. The subject was then asked to proceed to the 160 meter mark. At this point, the three exercises were repeated: ten-second frontal neutral expression, talking angular progression, and neutral angular progression. This completes the acquisition process.

Figure 4. High Correlation. SWIR facial imagery at 70-m range showing high correlation between images to be fused for facial recognition.

Figure 5. Low Correlation. SWIR facial imagery at 70-m range showing low correlation between images to be fused for facial recognition.

Several aspects of the experimental conditions resulted in less than optimal imagery. First, some of the 70-m imagery and all of the 160-m imagery was inadvertently overexposed, with the effect most pronounced at 160-m. Figure 6 shows examples of overexposed imagery at 70-m and 160-m. In addition, high winds in the parking structure resulted in some image blurring due to camera motion, particularly at 160 m. Due to time constraints and harsh winter weather, the data collection was not repeated, and the overall face matching performance on this dataset is expected to be far lower than typical TINDERS performance.

3.5 Cumulative Match Characteristics

The cumulative match characteristic (CMC) curve6 is a convenient tool to compare the face recognition accuracies resulting from the fusion of the different sets of probe images. Because the CMC curve directly shows how often a genuine match is highly ranked by the system, it is better suited to the current experiment than other types of metrics, such as the receiver operating characteristic6. Overlaying the CMC curves for different sets of probe images allows for easy visualization of the relative matching performance for the fusion of the different sets. To calculate a CMC curve, the fraction of subjects correctly ranked better than or equal to each rank is computed. For instance, the Rank 1 value of the CMC is just the fraction of subjects who were correctly identified at Rank 1. For Rank 2, the fraction of subjects correctly identified at Rank 2 or better is computed, etc. Equation 1 provides the general formula for the CMC, where P(r) is the fraction of test subjects whose fused gallery image matching score had a rank of r out of all gallery images in the database, and k is the rank coordinate on the horizontal axis of the CMC curve.

    CMC(k) = Σ_{r=1}^{k} P(r),   k = 1, …, m.   (1)
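Equation 1 can be sketched directly from a list of genuine-match ranks. The rank values in this example are hypothetical, not the paper's data.

```python
# Minimal sketch of the CMC computation in Equation 1: CMC(k) sums P(r)
# for r = 1..k, i.e. the fraction of subjects whose correct gallery match
# was ranked k or better. The rank values below are hypothetical.

def cmc_curve(genuine_ranks, m):
    """genuine_ranks: for each test subject, the rank (1 = best) of the
    correct gallery identity. m: number of rank positions to report.
    Returns [CMC(1), ..., CMC(m)]."""
    n = len(genuine_ranks)
    return [sum(1 for r in genuine_ranks if r <= k) / n
            for k in range(1, m + 1)]

# Ten subjects, correct-match ranks from a hypothetical matcher:
ranks = [1, 1, 2, 1, 3, 2, 5, 1, 4, 2]
print(cmc_curve(ranks, 5))   # [0.4, 0.7, 0.8, 0.9, 1.0]
```

By construction the curve is non-decreasing in k and reaches 1.0 at k equal to the gallery size, which is why overlaid CMC curves are easy to compare visually.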

3.6 Results for 70-m Data

The CMC results for the high-correlation probe sets are shown in Figure 7. The left plot shows the baseline CMC for the fusion of a set of 6 frontal, neutral-expression facial images like those in Figure 4, while the right plot overlays the CMC curves for similar sets with 15 and 24 images. The Rank 1 performance for all three sets is 40%, while the Rank 10 performance of the 15- and 24-image sets is 80%, as compared to 70% for the 6-image set. This is consistent with a possible small benefit to fusing results for a larger number of very similar probe images, although this result is within the statistical error of the experiment. Note that with a database size of 114 faces, the purely random chance of guessing a Rank 1 match would be 0.9%, and the chance of guessing Rank 10 or better would be ~9%.
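The chance baselines quoted above follow directly from the gallery size: a random ordering places the correct identity at Rank 1 with probability 1/114, and within the top 10 with probability 10/114. A quick sanity check:

```python
# Sanity check of the random-chance baselines for a 114-face gallery:
# a random ranking puts the correct identity at Rank 1 with probability
# 1/114, and within the top 10 with probability 10/114.
n_gallery = 114
print(f"Rank 1 chance:  {1 / n_gallery:.1%}")    # 0.9%
print(f"Rank 10 chance: {10 / n_gallery:.1%}")   # 8.8%
```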

Figure 7. 70-m CMC results for highly-correlated sets of probe images. (left) CMC results for the fusion of 6 frontal, neutral-expression images. (right) CMC results for the fusion of 15 and 24 highly-correlated frontal, neutral-expression images overlaid on the CMC from the left plot.

Figure 8 compares the CMC curves for low-correlation probe sets to high-correlation probe sets acquired at 70-m range. The left plot shows the CMC for the 70-m set that includes 6 neutral-frontal images and 9 neutral angle progression images, while the right plot shows the set that also includes the 9 talking angle progression images. Both plots overlay CMC curves for high-correlation probe sets for comparison. In both cases, a clear performance improvement is seen when the fused dataset includes both neutral frontal and a progression of angled poses. In particular, both of the low-correlation image sets have 90% Rank 4 performance and 100% Rank 10 performance, while the high-correlation image sets have no better than 60% Rank 4 performance and 80% Rank 10 performance. This is despite the fact that the low-correlation SWIR probe image sets fuse the original 6 neutral-frontal images with additional SWIR images that have a 20° pose angle, which typically have low matching performance against frontal neutral visible gallery images.

Figure 8. Comparison of low-correlation to high-correlation CMC curves at 70-meter range. (left) CMC curve for the set containing 6 neutral frontal and 9 neutral angle progression images, overlaid with CMC curves for 6 and 15 neutral frontal images. (right) CMC curve for the set containing 6 neutral frontal, 9 neutral angle progression, and 9 talking angle progression images, overlaid with CMC curves for 6 and 24 neutral frontal images.

3.7 Results for 160-m Data

As mentioned above, the quality of the 160-m images was severely degraded by overexposure and high winds, reducing the face matching performance well below typical TINDERS performance at this distance. When image quality is poor, face matching results are very sensitive to the marked eye locations, so 4 analysis groups of CMC curves were generated, each group having independent, but internally consistent, eye locations. Figure 9 shows the CMC curves for each of the four groups. Each plot includes CMCs for the 6- and 24-image neutral-frontal probe sets as well as the 24-image set containing 6 neutral-frontal, 9 neutral angle progression, and 9 talking angle progression images.

Figure 9. Comparison of low-correlation (green) to high-correlation (blue and red) CMC curves at 160-meter range. The low-correlation probe set includes 6 neutral-frontal images, 9 neutral angle progression, and 9 talking angle progression images. High-correlation probe sets include 6 and 24 neutral-frontal images. Image quality was degraded by overexposure and motion blur due to high winds. Each plot uses the same images but with independent manual marking of eye locations. Note the large variability in the data between plots.

As can be seen from Figure 9, there was a large amount of statistical variability between the four analysis groups, and it was not possible to discern a statistically significant difference in matching performance between the high-correlation and low-correlation probe image sets at 160 m. While the degraded imagery led to overall low matching performance, achieving only 10% - 20% Rank 1 success and only 50% - 60% Rank 10 success, the success rates were still far better than random chance, which would have predicted only 0.9% and 9% Rank 1 and Rank 10 success rates, respectively, indicating that the facial features still played an important role in determining the CMC curves. One possible explanation for the lack of observed benefit from fusing low-correlation images is the large 20° pose angle used in the angle progression images.

4. DISCUSSION

By fusing the face matching results from multiple probe images of the same person, it is possible to improve overall matching performance. It is logical to expect that probe images that closely resemble each other will produce similar matching scores against the gallery, and thus derive limited overall benefit from fusion. In contrast, it is logical to expect that probe images with varying pose angle and facial expression, within some useable range, will produce more variation in matching scores, thus deriving a larger benefit from fusion. It is also clear that probe images with very high pose angle or extreme expressions, where single-probe matching performance is extremely low, would not be expected to improve matching performance when fused with other probe images within the useable range. The 70-m results shown in Figure 8 clearly show that the fusion of 24 images with variable pose angles and expressions results in higher matching performance than the fusion of 24 images with very similar pose angle and expression. These results indicate that even images with a pose angle of 20° can have a positive impact on matching performance, when the image quality is sufficiently high. In contrast, the 160-m results shown in Figure 9, which used low-quality, overexposed probe images, showed no statistically significant improvement to matching performance when images with varying 20° poses were added to the fused probe image set. One possible explanation for this is that the range of useable pose angles decreases as image quality decreases. It is possible that a low-correlation probe image set of the same quality, using a smaller pose angle such as 10°, might have resulted in a larger improvement to the fused matching score at 160 m. Experiments that include a larger number of subjects, automatic eye location, and more low-to-intermediate pose angles will be needed to better understand and quantify the relationship between image correlation and fused face matching performance.
5. ACKNOWLEDGMENTS

This research was performed under contract N00014-09-C-0064 from the Office of Naval Research, with funding and oversight from the Deployable Force Protection Science and Technology Program. The authors would like to acknowledge important technical contributions from Jason Stanley, Kenneth Witt, William McCormick, and the MorphoTrust USA team, as well as the cooperation of the WVU Center for Identification Technology Research.

REFERENCES

[1] Robert B. Martin, Mikhail Sluch, Kristopher M. Kafka, Robert V. Ice, and Brian E. Lemoff, "Active-SWIR signatures for long-range night/day human detection and identification," Proc. SPIE 8734, Active and Passive Signatures IV, 87340J (May 23, 2013).
[2] Brian E. Lemoff, Robert B. Martin, Mikhail Sluch, Kristopher M. Kafka, William B. McCormick, and Robert V. Ice, "Long-range night/day human identification using active-SWIR imaging," Proc. SPIE 8704, Infrared Technology and Applications XXXIX, 87042J (June 18, 2013).
[3] Brian E. Lemoff, Robert B. Martin, Mikhail Sluch, Kristopher M. Kafka, William B. McCormick, and Robert V. Ice, "Automated night/day standoff detection, tracking, and identification of personnel for installation protection," Proc. SPIE 8711, Sensors, and Command, Control, Communications, and Intelligence (C3I) Technologies for Homeland Security and Homeland Defense XII, 87110N (June 6, 2013).
[4] "Laser Safety," Wikipedia, 2013. Web. 12 March 2013. <http://en.wikipedia.org/wiki/Laser_safety>
[5] MorphoTrust USA Face Examiner Workstation web page. http://www.morphotrust.com/IdentitySolutions/ForFederalAgencies/Officer360/Investigator360/ABIS/FaceExaminerWorkstation.aspx
[6] Biometrics Testing and Statistics: http://www.biometrics.gov/documents/biotestingandstats.pdf