
Abstract:

A digital image acquisition system having no photographic film, such as a
digital camera, has a flash unit for providing illumination during image
capture and a red-eye filter for detecting a region within a captured
image indicative of a red-eye phenomenon, the detection being based upon
a comparison of the captured image and a reference image of nominally the
same scene taken without flash. In the embodiment the reference image is
a preview image of lower pixel resolution than the captured image, the
filter matching the pixel resolutions of the captured and reference
images by up-sampling the preview image and/or sub-sampling the captured
image. The filter also aligns at least portions of the captured image and
reference image prior to comparison to allow for, e.g. movement in the
subject.

Claims:

1. A portable digital camera device having no photographic film,
comprising: a digital image capturing apparatus including an image sensor
for capturing a main image including a face region, focusing means
configured to acquire multiple images at least approximately of said same
scene as the main image including said face region and at different focus
distances and at different degrees of focus based on the different focus
distances; processing means configured to analyze the different degrees
of focus based on the different focus distances and to determine an
approximate distance to the face and to focus the face on the one or more
image sensors; a dedicated hardware subsystem which is provided image
data directly from the image sensor; a face filter for determining a
group of pixels that include the face region within the image data; a
subsampling module for changing a degree of subsampling based on a
complexity of calculation for said face filter; and a main system memory
which receives raw image data and metadata as implemented by the
dedicated hardware subsystem; and wherein the main system memory is
configured to receive metadata that comprises one or more identified face
regions within the main image.

2. The portable digital camera device of claim 1, wherein the multiple
images comprise a subsampled version of approximately the main image.

3. The portable digital camera device of claim 2, wherein the multiple
images comprise a subsample resolution representation of approximately
the main image.

4. The portable digital camera device of claim 1, wherein said changing
the degree of said subsampling is determined empirically.

5. The portable digital camera device of claim 1, wherein said changing
the degree of said subsampling is determined based on a size of said face
region.

6. The portable digital camera device of claim 1, wherein said changing
the degree of said subsampling is determined based on a relative size of
said face region.

7. The portable digital camera device of claim 1, wherein said changing
the degree of said subsampling is determined based on a region size.

10. The portable digital camera of claim 1, wherein one or more of the
multiple images comprise at least one precapture image.

11. The portable digital camera of claim 10, wherein the precapture image
is captured just before the main image is captured.

12. The portable digital camera of claim 1, further comprising one or
more further filters for correcting exposure, focus or composition of the
main image, or combinations thereof.

13. The portable digital camera of claim 1, further comprising a filter
for aligning two or more of the multiple images.

14. The portable digital camera of claim 13, wherein the aligning
comprises estimating camera or object movement, or both, in one or more
intervals between capture of the two or more of the multiple images.

15. A method of focusing a face region onto an image sensor of a digital
image capturing apparatus, comprising: focusing the face region onto the
image sensor including acquiring multiple images at least approximately
of said same scene as the main image including said face region and at
different focus distances and at different degrees of focus based on the
different focus distances; analyzing the different degrees of focus based
on the different focus distances and determining an approximate distance
to the face for focusing the face region onto the image sensor; providing
image data to a dedicated hardware subsystem directly from the image
sensor; determining a group of pixels that include the face region within
the image data; changing a degree of subsampling based on a complexity of
calculation for said face filter; and providing raw image data and
metadata to a main memory as implemented by the dedicated hardware
subsystem; and wherein the metadata comprises one or more identified face
regions within the main image.

16. The method of claim 15, further comprising determining empirically
said changing the degree of said subsampling.

17. The method of claim 15, further comprising determining said changing
the degree of said subsampling based on a size of said face region.

18. The method of claim 15, further comprising determining said changing
the degree of said subsampling based on a relative size of said face
region.

19. The method of claim 15, further comprising determining said changing
the degree of said subsampling based on a region size.

20. The method of claim 15, further comprising capturing one or more
precapture images just before capturing the main image.

21. The method of claim 15, further comprising aligning two or more of
the multiple images.

22. The method of claim 21, wherein the aligning comprises estimating
camera or object movement, or both, in one or more intervals between
capture of the two or more of the multiple images.

23. One or more non-transitory processor-readable media having code
embedded therein for programming one or more processors to perform a
method of focusing a face region onto an image sensor of a digital image
capturing apparatus, wherein the method comprises: focusing the face
region onto the image sensor including acquiring multiple images at least
approximately of said same scene as the main image including said face
region and at different focus distances and at different degrees of focus
based on the different focus distances; analyzing the different degrees
of focus based on the different focus distances and determining an
approximate distance to the face for focusing the face region onto the
image sensor; providing image data to a dedicated hardware subsystem
directly from the image sensor; determining a group of pixels that
include the face region within the image data; changing a degree of
subsampling based on a complexity of calculation for said face filter;
and providing raw image data and metadata to a main memory as implemented
by the dedicated hardware subsystem; and wherein the metadata comprises
one or more identified face regions within the main image.

24. The one or more non-transitory processor-readable media of claim 23,
further comprising determining empirically said changing the degree of
said subsampling.

25. The one or more non-transitory processor-readable media of claim 23,
further comprising determining said changing the degree of said
subsampling based on a size of said face region.

26. The one or more non-transitory processor-readable media of claim 23,
further comprising determining said changing the degree of said
subsampling based on a relative size of said face region.

27. The one or more non-transitory processor-readable media of claim 23,
further comprising determining said changing the degree of said
subsampling based on a region size.

28. The one or more non-transitory processor-readable media of claim 23,
further comprising capturing one or more precapture images just before
capturing the main image.

29. The one or more non-transitory processor-readable media of claim 23,
further comprising aligning two or more of the multiple images.

30. The one or more non-transitory processor-readable media of claim 29,
wherein the aligning comprises estimating camera or object movement, or
both, in one or more intervals between capture of the two or more of the
multiple images.

Description:

PRIORITY AND RELATED APPLICATIONS

[0001] This application is a Continuation of U.S. patent application Ser.
No. 13/026,303, filed Feb. 14, 2011; which is a Continuation of U.S.
patent application Ser. No. 12/707,531, filed Feb. 17, 2010; now U.S.
Pat. No. 7,973,828; which is a Continuation of U.S. patent application
Ser. No. 12/119,614, filed May 13, 2008, now U.S. Pat. No. 7,787,022;
which is a Continuation of U.S. patent application Ser. No. 10/919,226,
filed Aug. 16, 2004, now U.S. Pat. No. 7,738,015; which is a
continuation-in-part (CIP) application which claims the benefit of
priority to U.S. patent application Ser. No. 10/772,767, filed Feb. 4,
2004, now U.S. Pat. No. 7,352,394; which is a CIP that claims the benefit
of priority to U.S. patent application Ser. No. 10/635,862, filed Aug. 5,
2003, now U.S. Pat. No. 7,630,006.

[0002] This application is related to U.S. patent application Ser. No.
10/170,511, filed Jun. 12, 2002, now U.S. Pat. No. 7,042,505; which is a
Continuation of U.S. patent application Ser. No. 08/947,603, filed Oct.
9, 1997, now U.S. Pat. No. 6,407,777; which are each hereby incorporated
by reference. This application is also related to U.S. patent
applications Ser. No. 10/635,918, filed Aug. 5, 2003, now abandoned, and
Ser. No. 10/773,092, filed Feb. 4, 2004.

FIELD OF THE INVENTION

[0003] The invention relates generally to the area of digital photography,
and more specifically to filtering "red-eye" artefacts from a
flash-induced digital camera image.

BACKGROUND OF THE INVENTION

[0004] "Red-eye" is a phenomenon in flash photography where a flash is
reflected within a subject's eye and appears in a photograph as a red dot
where the black pupil of the subject's eye would normally appear. The
unnatural glowing red of an eye is due to internal reflections from the
vascular membrane behind the retina, which is rich in blood vessels. This
objectionable phenomenon is well understood to be caused in part by a
small angle between the flash of the camera and the lens of the camera.
This angle has decreased with the miniaturization of cameras with
integral flash capabilities. Additional contributors include the relative
closeness of the subject to the camera and ambient light levels.

[0005] The red-eye phenomenon can be minimized by causing the iris to
reduce the opening of the pupil. This is typically done with a
"pre-flash", a flash or illumination of light shortly before a flash
photograph is taken. This causes the iris to close. Unfortunately, the
pre-flash is an objectionable 0.2 to 0.6 seconds prior to the flash
photograph. This delay is readily discernible and easily within the
reaction time of a human subject. Consequently the subject may believe
the pre-flash is the actual photograph and be in a less than desirable
position at the time of the actual photograph. Alternately, the subject
must be informed of the pre-flash, typically losing any spontaneity of
the subject captured in the photograph.

[0006] Those familiar with the art have developed complex analysis
processes operating within a camera prior to invoking a pre-flash.
Various conditions are monitored prior to the photograph before the
pre-flash is generated; the conditions include the ambient light level
and the distance of the subject from the camera. Such a system is
described in U.S. Pat. No. 5,070,355, which is hereby incorporated by
reference. Although that invention minimizes the occurrences where a
pre-flash is used, it does not eliminate the need for a pre-flash.

[0007] Digital cameras are becoming more popular and smaller in size.
Digital cameras have several advantages over film cameras. Digital
cameras eliminate the need for film as the image is digitally captured
and stored in a memory array for display on a display screen on the
camera itself. This allows photographs to be viewed and enjoyed virtually
instantaneously as opposed to waiting for film processing. Furthermore,
the digitally captured image may be downloaded to another display device
such as a personal computer or color printer for further enhanced
viewing. Digital cameras include microprocessors for image processing and
compression and camera systems control. It is possible to exploit the
computation capabilities of such microprocessors for performing
operations to improve the red-eye detection and elimination. Thus, what
is needed is a method or better tools for eliminating the red-eye
phenomenon within, for example, a digital camera having a flash unit,
without the distraction of a pre-flash.

[0008] U.S. Patent Application 2002/0150306 (Baron), which is hereby
incorporated by reference, describes a method for the removal of flash
artefacts by capturing two digital images of a subject, one with flash
and one without flash, and subtracting one image from the other to
provide an artefact image which is then thresholded and subtracted from
the flash image. However, the technique is directed to flash artefacts in
general, and not specifically to red-eye removal. There is no attempt to
identify red-eye regions as compared to any other flash-induced
artefacts. Indeed, there is no attempt to identify particular regions at
all, since the technique is simply one of subtraction and thresholding.
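
By way of illustration only, the subtract-and-threshold approach described
above can be sketched as follows. This is not code from the cited
application; the function name, the grayscale list-of-rows image format and
the threshold value are assumptions made for the example.

```python
def remove_flash_artefacts(flash, no_flash, threshold):
    """Subtract-and-threshold sketch on grayscale images (lists of rows).

    Hypothetical helper: 'flash' and 'no_flash' are same-sized images of
    nominally the same scene, with and without flash.
    """
    result = []
    for flash_row, ambient_row in zip(flash, no_flash):
        out_row = []
        for f, a in zip(flash_row, ambient_row):
            artefact = f - a                  # difference (artefact) image
            if abs(artefact) < threshold:     # thresholding: drop small differences
                artefact = 0
            out_row.append(f - artefact)      # subtract artefact from flash image
        result.append(out_row)
    return result


flash = [[10, 200], [12, 11]]
no_flash = [[9, 50], [12, 10]]
cleaned = remove_flash_artefacts(flash, no_flash, threshold=20)
```

Note that every pixel is treated alike: nothing in the sketch knows whether
a large difference comes from an eye or from any other flash-induced
artefact, which is the limitation noted above.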

BRIEF SUMMARY OF THE INVENTION

[0009] In accordance with the present invention there is provided a
digital image acquisition system having no photographic film,
comprising a portable apparatus for capturing digital images, a flash
unit for providing illumination during image capture, and a red-eye
filter for detecting a region within a captured image indicative of a
red-eye phenomenon, said detection being based upon a comparison of said
captured image and at least one reference eye color characteristic.

[0010] According to one embodiment of the invention, the at least one
reference eye color characteristic includes a reference image of
nominally the same scene taken without flash.

[0011] According to this embodiment, the reference image may be a preview
image of lower pixel resolution than the captured image, the filter may
include programming code for matching the pixel resolutions of the
captured and reference images by up-sampling the preview image and/or
sub-sampling the captured image.

[0012] To allow for inadvertent movement in the subject between taking the
two images, preferably the filter may further include programming code
for aligning at least portions of the captured image and reference image
prior to said comparison.

[0013] In the embodiment the filter detects said region indicative of a
red-eye phenomenon by identifying a region in the captured image at least
having a color indicative of a red-eye phenomenon and comparing said
identified region with the corresponding region in the reference image.
The filter may further designate the region as indicative of a red-eye
phenomenon if said corresponding region does not have a color indicative
of a red-eye phenomenon. The decision as to whether a region has a color
indicative of a red-eye phenomenon may be determined on a statistical
basis as a global operation on the entire region.
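
By way of illustration only, the decision rule of this paragraph might be
sketched as below. The redness test, its numeric thresholds, the minimum
fraction and all function names are assumptions for the example and are not
taken from the embodiment; a region is represented simply as a list of
(R, G, B) pixels.

```python
def is_reddish(pixel, min_red=150, ratio=2.0):
    """Hypothetical per-pixel test for a color indicative of red-eye."""
    r, g, b = pixel
    return r >= min_red and r >= ratio * max(g, b, 1)


def region_has_red_eye_color(region, min_fraction=0.5):
    """Statistical decision taken globally over the entire region."""
    if not region:
        return False
    reddish = sum(1 for p in region if is_reddish(p))
    return reddish / len(region) >= min_fraction


def is_red_eye_region(captured_region, reference_region):
    """Designate red-eye only if the flash image is red and the reference is not."""
    return (region_has_red_eye_color(captured_region)
            and not region_has_red_eye_color(reference_region))


flash_pupil = [(200, 30, 30)] * 6 + [(40, 40, 40)] * 2
preview_pupil = [(20, 20, 20)] * 8
red_button = [(210, 20, 20)] * 8   # red in both images, so not red-eye
```

A region that is red in both images, such as a red object in the scene, is
rejected because the corresponding region in the reference image also has
the color in question.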

[0014] According to a further embodiment, the filter includes a shape
analyser so that the filter is configured to identify a region in the
captured image having both a shape and color indicative of a red-eye
phenomenon for subsequent comparison with a reference eye shape
characteristic and the reference eye color characteristic, respectively.

[0015] In another embodiment, a pixel modifier is included for modifying
the color of the pixels within a region indicative of a red-eye
phenomenon.

[0016] A method in accordance with the present invention is also provided
for detecting a red-eye phenomenon within a digital image acquired by a
digital image acquisition device having no photographic film. The device
includes a portable apparatus for capturing digital images, and a flash
unit for providing illumination during image capture. The method includes
identifying a region within a captured image indicative of a red-eye
phenomenon including comparing the captured image and at least one
reference eye color characteristic.

[0017] According to one embodiment, the at least one reference eye color
characteristic includes an eye color characteristic of a reference image
of nominally the same scene taken without flash.

[0018] The reference image may be a preview image of lower pixel
resolution than the captured image. The method may further include
matching the pixel resolutions of the captured and reference images
including up-sampling the preview image or sub-sampling the captured
image, or a combination thereof.

[0019] The method may further include aligning at least portions of the
captured image and reference image prior to said comparison.

[0020] The method may further include analysing a shape so that the
identifying comprises identifying a region in the captured image having
both a shape and color indicative of a red-eye phenomenon for subsequent
comparison with the corresponding region in the reference image. A shape
may be analysed to determine subsequent to the comparison whether a
region designated as indicative of a red-eye phenomenon has a shape
indicative of a red-eye phenomenon.

[0021] The method may also include detecting a region indicative of a
red-eye phenomenon by identifying a region in the captured image at least
having a color indicative of a red-eye phenomenon and comparing the
identified region with the corresponding region in the reference image,
and designating the region as indicative of a red-eye phenomenon if the
corresponding region does not have a color indicative of a red-eye
phenomenon.

[0022] The method may also include deciding whether a region has a color
indicative of a red-eye phenomenon by determining on a statistical basis
as a global operation on the entire region.

[0023] The method may also include modifying the color of the pixels
within a region indicative of a red-eye phenomenon.

BRIEF DESCRIPTION OF THE DRAWINGS

[0024] FIG. 1 is a block diagram of a camera apparatus operating in
accordance with an embodiment of the present invention.

[0025] FIG. 2 illustrates the workflow of the initial stage of a red-eye
filter using preview data according to the embodiment.

[0026] FIGS. 3-a to 3-d illustrate the pixelation process of an image in
different resolutions.

[0027] FIG. 3-e is an enlargement of a hypothetical digitization of an eye
in an image.

[0028] FIG. 4 illustrates the pixel differences between a red-eye image
and a non red-eye image.

[0029] FIGS. 5-a to 5-d and 6-a and 6-b illustrate the detailed workflow
of the red-eye filter according to the embodiment, and alternatives.

DESCRIPTION OF A PREFERRED EMBODIMENT

[0030] FIG. 1 shows a block diagram of an image acquisition system such as
a digital camera apparatus operating in accordance with the present
invention. The digital acquisition device, also generically referred to
in this application as a camera 20, includes a processor 120. It can be
appreciated that many of the processes implemented in the digital camera
may be implemented in or controlled by software operating in a
microprocessor (μProc), central processing unit (CPU), controller,
digital signal processor (DSP) and/or an application specific integrated
circuit (ASIC), collectively depicted as block 120 and termed as
"processor". Generically, all user interface and control of peripheral
components such as buttons and display is controlled by a μ-controller
122. The processor 120, in response to a user input at 122, such as half
pressing a shutter button (pre-capture mode 32), initiates and controls
the digital photographic process. Ambient light exposure is determined
using light sensor 40 in order to automatically determine if a flash is
to be used. The distance to the subject is determined using focusing
means 50 which also focuses the image on image capture means 60. If a
flash is to be used, processor 120 causes the flash means 70 to generate
a photographic flash in substantial coincidence with the recording of the
image by image capture means 60 upon full depression of the shutter
button. The image capture means 60 digitally records the image in color.
The image capture means is known to those familiar with the art and may
include a CCD (charge coupled device) or CMOS to facilitate digital
recording. The flash may be selectively generated either in response to
the light sensor 40 or a manual input 72 from the user of the camera. The
image recorded by image capture means 60 is stored in image store means
80 which may comprise computer memory such as a dynamic random access memory
or a non-volatile memory. The camera is equipped with a display 100, such
as an LCD, for preview and post-view of images. In the case of preview
images, which are generated in the pre-capture mode 32, the display 100
can assist the user in composing the image, as well as being used to
determine focusing and exposure. In case of postview, the image display
can assist the user in viewing suspected red-eye regions and to manually
decide if the region should be corrected or not after viewing it. A
temporary storage space 82 is used to store one or a plurality of the
preview images and may be part of the image store means 80 or a separate
component. The preview image is usually generated by the same image
capture means 60, and for speed and memory efficiency reasons may be
generated by subsampling the image 124 using software which can be part
of the general processor 120 or dedicated hardware, before displaying 100
or storing 82 the preview image. Depending on the settings of this
hardware subsystem, the pre-acquisition image processing may satisfy some
predetermined criteria prior to storing the preview image. Such criteria
may be chronological, such as saving an image every 0.5 seconds; more
sophisticated criteria may be analysis of the image for changes, or the
detection of faces in the image. A straightforward preferred embodiment
is to constantly replace the previous saved preview image with a new
captured preview image during the pre-capture mode 32, until the final
full resolution image is captured by full depression of the shutter
button.

[0031] The red-eye filter 90 can be integral to the camera 20 or part of
an external processing device 10 such as a desktop computer, a hand held
device, a cell phone handset or a server. In this embodiment, the filter
receives the captured image from the full resolution image storage 80 as
well as one or a plurality of preview images from the temporary storage
82. The filter 90 analyzes the stored image for characteristics of
red-eye and, if found, modifies the image and removes the red-eye
phenomenon from the image as will be described in more detail. The red-eye
filter includes a pixel locator 92 for locating pixels having a color
indicative of red-eye; a shape analyzer 94 for determining if a grouping
of at least a portion of the pixels located by the pixel locator comprise
a shape indicative of red-eye; a falsing analyzer 96 for processing the
image around the grouping for details indicative of an image of an eye;
and a pixel modifier 98 for modifying the color of pixels within the
grouping. The modified image may be either displayed on image display
100, saved on a persistent storage 112 which can be internal or a
removable storage such as CF card, SD card or the like, or downloaded to
another device, such as a personal computer, server or printer via image
output means 110 which can be tethered or wireless.

[0032] In a further embodiment where the red-eye filter 90 is located on
an external application in a separate device, 10, such as a desktop
computer, the final captured image stored in block 80 along with a
representation of the preview image as temporarily stored in 82, may be
stored prior to modification on the storage device 112, or transferred
together via the image output means 110 onto the external device 10,
later to be processed by the red-eye filter 90.

[0033] FIG. 2 details the initial stage of the workflow of this
embodiment. It will be understood that both this initial stage as well as the
subsequent red-eye correction stage (See FIGS. 5a to 5d) will typically
be performed by software in the camera and/or separate device 10. A
preview image (normally of lesser resolution than the final image) is
generated while the camera is in the pre-capture mode 32 such as when the
user half presses the shutter button. While in this mode, shown in FIG. 2
as the preview mode 210, the camera constantly captures the preview
images 220. The capture interval is usually semi-real time which means
fractions of a tenth of a second or less. The camera saves each new
preview image if it satisfies some test criteria, 222. If not, the camera
continues, 211, to capture the next preview image without saving the
previous one. The process will continue until the final full resolution
image is acquired 280 and saved 282 by fully depressing the shutter
button.
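
By way of illustration only, the preview loop of FIG. 2 might be sketched
as below, using a chronological test criterion of one saved preview every
0.5 seconds. The function names and the representation of each frame as a
(timestamp, image) pair are assumptions for the example.

```python
def run_preview_mode(frames, passes_test):
    """Return the last saved preview once the preview frames are exhausted.

    'frames' is an iterable of (timestamp_seconds, image) pairs captured
    while the shutter button is half pressed (hypothetical representation).
    """
    saved = None
    for timestamp, image in frames:
        if passes_test(timestamp, image, saved):
            saved = (timestamp, image)   # replace the previously saved preview
        # otherwise continue to the next preview without saving this one
    return saved


def every_half_second(timestamp, image, saved, interval=0.5):
    """Chronological criterion: save if nothing is saved yet or enough time passed."""
    return saved is None or timestamp - saved[0] >= interval


frames = [(0.0, "p0"), (0.1, "p1"), (0.5, "p2"), (0.7, "p3"), (0.9, "p4")]
reference = run_preview_mode(frames, every_half_second)
```

Passing a criterion that always returns True reproduces the simple case in
which each new preview replaces the previous one.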

[0034] In a simple embodiment, if the test criteria are met, or if no test
criteria exist, the system will constantly replace the previous saved
preview image with the new preview image, 230. Alternatively, where
multiple preview images can be saved, 240, the new image will be placed
on a chronological FIFO stack, namely First In First Out, where the
system continuously captures and saves new preview images 244 while each
time clearing the oldest image 242 from the stack, until the user takes
the final picture. The reason for storing multiple preview images is
based on the fact that the last image, or any single image, may not be
the best reference image for comparison with the final full resolution
image in the red-eye correction process. By storing multiple images, a
better reference image can be achieved, and a closer alignment between
the preview and the final captured image can be achieved. This concept
will be further discussed in FIGS. 5a to 5c, in the alignment stage 540.
Other reasons for capturing multiple images are that a single image may
be blurred due to motion, the subject had their eyes closed, the exposure
was not set, etc. In a yet alternative embodiment, the multiple images
may assist in creating a single higher quality reference image; either
higher resolution or by taking different portions of different regions
from the multiple images. This concept of sub-pixel resolution may be
combined with the upsampling process as described in FIGS. 5a to 5c,
block 534.
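
By way of illustration only, the chronological FIFO of preview images
might be sketched with a bounded double-ended queue. The class name and the
capacity are assumptions for the example; the embodiment does not prescribe
a data structure.

```python
from collections import deque


class PreviewBuffer:
    """Keeps the most recent preview images, oldest first (hypothetical)."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest entry when a new one
        # is appended to a full buffer, which gives First In First Out.
        self._frames = deque(maxlen=capacity)

    def save(self, image):
        self._frames.append(image)

    def candidates(self):
        """Preview images available as reference candidates, oldest first."""
        return list(self._frames)


buffer = PreviewBuffer(capacity=3)
for i in range(5):
    buffer.save(f"preview-{i}")
kept = buffer.candidates()
```

With a capacity of 1 the same structure behaves like the simple embodiment
in which the previous preview is constantly replaced.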

[0035] The test criteria 222 may involve the actual analysis of the
preview image content before deciding whether the new preview image
should replace a previously saved image. Such criteria may be based on
image analysis such as the existence of faces in the image, detection of
eyes or metadata analysis such as the exposure condition, whether a flash
is going to happen, the distance to the subjects, etc.

[0036] As part of the red-eye filter 90, the full resolution image and the
preview image or images will be loaded into working memory, 292 and 294
respectively, unless they are already in memory, in which case they will
just be accessed through a pointer.

[0037] Referring to FIGS. 3-a to 3-e, the digitization process in various
resolutions is explained and depicted. FIG. 3-a illustrates the grid-like
nature of a sensor as illustrated in FIG. 1, block 60. The sensor
comprises multiple cells 302 which determine the camera resolution. For
example a sensor of 2000×3000 cells will be a 6 million pixel
sensor (it will be understood that each cell in fact comprises a
plurality of individual sensor elements sensitive to different colors,
e.g. RGB or RGBG, to create each colored image pixel).

[0038] FIG. 3-b depicts the optical projection of a face 310 onto the
sensor 60. Each of the cells 302 records the average light information it
receives for the image. This is the process of digitization and
quantization.

[0039] The degree of details is determined by the resolution of the sensor
as depicted in FIG. 3-c. In this illustration a much smaller sensor is
used, and in this case the same face 310 is digitized into a smaller
number of pixels, or alternatively subsampled from the full resolution
sensor data into a smaller number of pixel representations.

[0040] FIG. 3-d describes the inverse process where the subsampled image
of FIG. 3-c is upsampled to the same size as the original. When comparing
the resolution, naturally some of the details are lost in this
process. As an illustrative example, while in FIG. 3-b the face was roughly
25×25=625 pixels, in FIG. 3-d the face is made of only 5×5=25
pixels.
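
By way of illustration only, the round trip from 25×25 pixels to 5×5 pixels
and back can be sketched as below. Block averaging for subsampling and
pixel repetition for upsampling are assumptions chosen for simplicity;
other sampling methods could equally be used. Images are grayscale lists of
rows.

```python
def subsample(image, factor):
    """Average each factor x factor block into a single pixel."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(0, height - height % factor, factor):
        row = []
        for x in range(0, width - width % factor, factor):
            block = [image[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample(image, factor):
    """Repeat each pixel factor times horizontally and vertically."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


image = [[x + y for x in range(25)] for y in range(25)]   # 25 x 25 = 625 pixels
small = subsample(image, 5)                               # 5 x 5 = 25 pixels
restored = upsample(small, 5)                             # 25 x 25 again
```

The restored image has the original size but only 25 distinct block values,
so detail inside each block is lost, whereas a perfectly flat image
survives the round trip unchanged.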

[0041] Of course, the above are only for illustration purposes. In
practice, due to the larger resolution of the sensors than in this
illustration, a normal eye will be depicted by a much larger pixel count
to be noticeable. FIG. 3-e displays such a digitized eye. In this figure,
an eye 350, as imaged on a sensor 60, will be roughly 25 pixels
wide, 352. Of particular interest for this invention, the inner portion,
the iris 360, in this case will be roughly 8 pixels in diameter, as
illustrated in 462.

[0042] According to a preferred embodiment of this invention, the preview
image and the final image, or portions of them, need to be aligned as
depicted in FIGS. 5a to 5c, block 540. As explained above, the reference
image and the final image may have different resolutions. The discrepancy
in resolution may lead to differences in content, or pixel values, even
though no data was changed in the subject image. In particular, edge
regions when downsampled and then upsampled may have a blurring or an
averaging effect on the pixels. Thus direct comparison of different
resolution images, even when aligned, may lead to false contouring. In
addition, the reference image may be acquired prior to or after the final
image is captured. Due to the above reasons, there is a need to match the
two images, both in content and pixel resolution, as described below.

[0043] FIG. 4 better illustrates the effect of the sub- and up-sample
process in finding the pixelwise difference between two images. In this
case, the input images are the ones illustrated in FIGS. 3-b and 3-d,
respectively high resolution and low resolution. In this figure, white
squares such as 430 mean that there is no difference between the two.
Checkered squares or pixels, such as 420, mean that there is a difference
between the images.

[0044] The flat regions should display no significant differences due to
resolution changes. The main differences arise from two causes. The first
is edge regions where changes in value occur, such as in blocks 410.
However, there is another cause for difference which is of interest to
this invention and displayed in 430. In these pixels, the difference is
caused by the actual change in the color of the eye from normal eyes to
red-eyes. Not only is there a change in the pixel value but the change is
also more specifically reflected as a change to a red or light color from
the normal color of the iris or from the black color of the pupil.
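
By way of illustration only, the resolution-induced part of the difference
can be sketched as below: a high resolution image is compared with a
version of itself in which every block has been replaced by its average,
which stands in for sub-sampling followed by up-sampling. The function
names, the block size and the tolerance are assumptions for the example.

```python
def block_average(image, factor):
    """Replace every factor x factor block by its mean value."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for by in range(0, height, factor):
        for bx in range(0, width, factor):
            ys = range(by, min(by + factor, height))
            xs = range(bx, min(bx + factor, width))
            mean = sum(image[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def difference_mask(a, b, tolerance):
    """True where two same-sized images differ by more than the tolerance."""
    return [[abs(pa - pb) > tolerance for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# A vertical edge between columns 1 and 2, not aligned with the 3-pixel blocks.
image = [[0, 0, 100, 100, 100, 100] for _ in range(6)]
low_res = block_average(image, 3)
mask = difference_mask(image, low_res, tolerance=10)
```

Only the blocks containing the edge are flagged; the flat block on the
right shows no difference. A difference that appears inside an otherwise
flat region therefore points to a real change between the two images, such
as a pupil turning red.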

[0045] FIGS. 5-a to 5-d illustrate the workflow of the red-eye filter 90
of this embodiment, as well as variations thereof.

[0046] Referring first to FIG. 5-a, there are two input images into the
filter, namely a full resolution image 510, I(x,y) which is the one that
was captured by full depression of the shutter button and needs to be
analyzed for red-eye artefacts, and a preview image 520, P(x,y) which is
used as a reference image and is nominally the same scene as the image
I(x,y) but taken without the flash. The preview image may be a result of
some image processing taking into account multiple preview images and
creating a single image, 522. Methods of improving image quality based on
multiple images are familiar to those versed in the art of image
processing. The resulting output from the analysis process of 522 is a
single preview image.

[0047] The preview image 520 is normally, but not necessarily, of lower
resolution than the full resolution image 510, typically being generated
by clocking out a subset of the sensor cells or by averaging the raw
sensor data. Therefore, the two images, or alternatively the relevant
regions in the images (i.e. the regions containing or suspected to
contain eyes, which can be determined by image processing techniques
known in the art), need to be matched in pixel resolution, 530. In the
present context "pixel resolution" means the size of the image, or
relevant region, in terms of the number of pixels constituting the image
or region concerned. Such a process may be done by either upsampling the
preview image, 534, downsampling the acquired image, 532, or a
combination thereof. Those familiar in the art will be aware of several
techniques best used for such sampling methods. The result of step 530 is
a pair of images I'(x,y) and P'(x,y) corresponding to the original images
I(x,y) and P(x,y), or relevant regions thereof, with matching pixel
resolution. The system and method of the preferred embodiment involves
the detection and removal of red-eye artefacts. The actual removal of the
red-eye will eventually be performed on the full resolution image.
However, all or portions of the detection of red-eye candidate pixel
groupings, the subsequent testing of said pixel groupings for determining
false red-eye groupings, and the initial step of the removal, where the
image is presented to the user for user confirmation of the correction,
can be performed on the entire image, the subsampled image, or a subset
of regions of the entire image or the subsampled image.
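
The resolution-matching step 530 can be sketched as follows. This is a minimal illustration, assuming single-channel images stored as lists of rows and an integer scale factor. Box averaging and pixel repetition are stand-ins chosen for brevity, not the sampling methods of the embodiment, and the function names are hypothetical.

```python
def downsample(image, factor):
    """Sub-sample by averaging each factor x factor block (cf. step 532)."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [image[r * factor + dr][c * factor + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample(image, factor):
    """Up-sample by repeating each pixel factor times in both directions (cf. step 534)."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


full = [[0, 0, 10, 10],
        [0, 0, 10, 10],
        [4, 8, 2, 2],
        [0, 4, 2, 2]]
preview = [[0, 10],
           [4, 2]]

small = downsample(full, 2)   # I'(x,y) at the preview resolution
big = upsample(preview, 2)    # P'(x,y) at the full resolution
```

Either route yields a pair of images with the same pixel dimensions; which one is preferable depends on how much of the later processing is to run at reduced resolution.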

[0048] Although nominally of the same scene, the preview image and the
finally acquired full resolution image may differ spatially due to the
temporal lag between capturing the two images. Therefore, the two images,
or relevant regions thereof, may need to be aligned, 540, especially in
respect of regions of the images containing or suspected to contain eyes.
Essentially, alignment means transforming at least one of the images, and
in this embodiment the preview image P'(x,y), to obtain maximum
correlation between the images, or relevant regions thereof, based on
measurable characteristics such as color, texture and edge analysis. Those
familiar in the art are aware of several algorithms to achieve such
alignment; see, for example, U.S. Pat. No. 6,295,367 which is hereby
incorporated by reference and which describes alignment of images due to
object and camera movement, and U.S. Pat. No. 5,933,546, which is also
hereby incorporated by reference and which addresses the use of
multi-resolution data for pattern matching.

[0049] Further discussion on the alignment is presented in FIG. 5-c. In
this Figure, the inputs are the two images I'(x,y) and P'(x,y) as defined
in FIG. 5-a. The alignment may be global for the entire image or local
for specific regions. One example is a simple linear alignment, such as a
shift in the horizontal direction by H pixels, in the vertical direction
by V pixels, or a combination of the two. Mathematically, the shifted
image, P''(x,y), can be described as:

P''(x,y)=P'(x-H, y-V)
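
The translation above, together with one possible way of choosing H and V, can be sketched as follows. This is a sketch under stated assumptions: images are grey-level lists of rows, the search is exhaustive over a small window, and the mean absolute difference over the overlapping area is used as a simple stand-in for the correlation measure. The names shift, best_shift and max_offset are hypothetical.

```python
def shift(image, h, v, fill=0):
    """Return the image moved right by h and down by v pixels; uncovered pixels get fill."""
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = x - h, y - v
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out


def best_shift(reference, moving, max_offset):
    """Search for the (H, V) that minimises the mean absolute difference on the overlap."""
    height, width = len(reference), len(reference[0])
    best = None
    for v in range(-max_offset, max_offset + 1):
        for h in range(-max_offset, max_offset + 1):
            total, count = 0, 0
            for y in range(max(0, v), min(height, height + v)):
                for x in range(max(0, h), min(width, width + h)):
                    total += abs(reference[y][x] - moving[y - v][x - h])
                    count += 1
            score = total / count
            if best is None or score < best[0]:
                best = (score, h, v)
    return best[1], best[2]


moving = [[0, 0, 0, 0, 0],
          [0, 9, 5, 0, 0],
          [0, 3, 7, 0, 0],
          [0, 0, 0, 0, 0],
          [0, 0, 0, 0, 0]]
reference = shift(moving, 2, 1)            # simulate subject or camera movement
H, V = best_shift(reference, moving, 2)    # recover the offset
aligned = shift(moving, H, V)
```

The search window must stay smaller than the image so that the overlap is never empty.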

[0050] However, a simple translation may not suffice to align the
images. Therefore, there may be a need for X-Y shearing,
which is a symmetrical shift of the object's points in the direction of
the axis to correct for perspective changes; X-Y tapering where the
object is pinched by shifting its coordinates towards the axis, the
greater the magnitude of the coordinate the further the shift; or
rotation around an arbitrary point.

[0051] In general, the alignment process may involve an affine
transformation, defined as a special class of projective transformations
that do not move any objects from the affine space R^3 to the plane at
infinity or conversely, or any transformation that preserves collinearity
(i.e. all points lying on a line initially still lie on a line after
transformation) and ratios of distances (e.g., the midpoint of a line
segment remains the midpoint after transformation). Geometric
contraction, expansion, dilation, reflection, rotation, shear, similarity
transformations, spiral similarities and translation are all affine
transformations, as are their combinations. In general, the alignment 540
may be achieved via an affine transformation which is a composition of
rotations, translations, dilations, and shears, all well-known to one
familiar in the art of image processing.

[0052] If it is determined through a correlation process that a global
transformation suffices, as determined in block 542=YES, one of the
images, and for simplicity the preview image, will undergo an affine
transformation, 544, to align itself with the final full resolution
image. Mathematically, this transformation can be depicted as:

P''=AP'+q

where A is a linear transformation and q is a translation.
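
The relation P''=AP'+q can be illustrated on pixel coordinates as follows. This is a minimal sketch, assuming a 2x2 matrix A built from a rotation and a uniform dilation; the helper names and the numeric values are hypothetical. It also checks the midpoint property mentioned above for one line segment.

```python
import math


def affine(point, a, q):
    """Map a point p to A p + q, with A a 2x2 matrix and q a translation."""
    x, y = point
    return (a[0][0] * x + a[0][1] * y + q[0],
            a[1][0] * x + a[1][1] * y + q[1])


def rotation_scale(angle_deg, scale):
    """Build A for a rotation by angle_deg combined with a uniform dilation."""
    t = math.radians(angle_deg)
    return [[scale * math.cos(t), -scale * math.sin(t)],
            [scale * math.sin(t), scale * math.cos(t)]]


A = rotation_scale(90, 2.0)
q = (3.0, -1.0)

p0, p1 = (0.0, 0.0), (4.0, 2.0)
mid = ((p0[0] + p1[0]) / 2, (p0[1] + p1[1]) / 2)

t0, t1, tm = affine(p0, A, q), affine(p1, A, q), affine(mid, A, q)
# The image of the midpoint should be the midpoint of the images.
mid_of_images = ((t0[0] + t1[0]) / 2, (t0[1] + t1[1]) / 2)
```

To warp a whole image rather than individual points, a common approach is to apply the inverse mapping to each output pixel coordinate and sample the source image there, which avoids holes in the result.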

[0053] However, in some cases a global transformation may not work well,
in particular for cases where the subject matter moved, as could happen
when photographing animate objects. In such cases, in particular in
images with multiple human subjects, and when the subjects move in
independent fashion, the process of alignment 540 may be broken down,
546, to numerous local regions each with its own affine transformation.
What is important is to align the eyes between the images. Therefore,
according to this alternative, one or multiple local alignments may be
performed, 548, for regions in the vicinity surrounding the eyes, such as
faces.

[0054] Only after the images are aligned can one compare the potential
red-eye colors.

[0055] In the preferred embodiment of FIG. 5-a, the preview image
information is used as part of the falsing stage 96. Blocks 92, 94 and 98
correspond to the same blocks in FIG. 1, being the stages of pixel
locator, shape analyzer and pixel modification respectively. This
embodiment can incorporate pixel locator 92, shape analyzer 94 and pixel
modifier 98 as described in U.S. Pat. No. 6,407,777 (DeLuca),
incorporated by reference above, the functions of the pixel locator 92
and shape analyzer 94 being performed on the image I'(x,y) and the pixel
modifier 98 operating on the original acquired image I(x,y). Block 96,
which is the falsing stage, is improved in this embodiment as compared to
the falsing stage of DeLuca.

[0056] Referring to block 96, for each region of the image I'(x,y)
suspected as red-eye, step 596-2, as identified by steps 92 and 94, the
suspected region is tested by comparing the pixel values of the region
with the pixel values of the corresponding region in the aligned preview
image P''(x,y), 596-6. However, prior to doing so, the regions need to be
prepared and modified for such comparison, 596-4.

[0057] Because the regions may not match exactly, a pixel-by-pixel
comparison may not suffice. One reason for the mismatch is the original
size discrepancy; at edges, for example, this phenomenon is graphically
illustrated in FIG. 4. Other reasons for a mismatch are potential
movement of the object, or averaging done in the low resolution preview
image that may lose high frequency color data. Such effects are referred
to as smoothing and aliasing. In addition, even if the alignment is
optimal, there may be sub-pixel misalignment that cannot be accounted
for. Moreover, there may be color differences between the preview image,
shot using available light, and the acquired full resolution image, which
is shot using flash. In many cases, the color transformation from one
image to the other is not global and uniform. Therefore, the regions must
be prepared for comparison.

[0058] This process as illustrated in block 596-4 will be further
described in FIG. 5-d. The underlying concept behind step 596-4 is to
distinguish between differences that are caused due to the acquisition
process and the differences that are caused due to the existence of
red-eye in the image. This problem is well known to one familiar in the
art of statistical pattern matching and scene analysis and image
recognition. An example of such an application taking into account
differences due to resolution is described in U.S. Pat. No. 5,933,546,
which is hereby incorporated by reference.

[0059] If a region in the aligned preview image P''(x,y) is red and the
equivalent region is also red in the image I'(x,y), 596-6, the redness
was present without flash, so that region will be eliminated as a
suspected red-eye artefact from I'(x,y), 596-9, and the corresponding
region will likewise be eliminated as a suspected red-eye artefact from
the original full resolution image I(x,y). Otherwise, the region will
continue to remain suspected as red-eye, 596-8. The process will
continue, 596-3, for all suspected regions.

[0060] The comparison of the regions for a color value is done as a global
operation on the entire region, and the answer to the question of whether
a region is red or not is made statistically for the entire region and
not pixel by pixel, i.e. it does not depend on the value of any
particular individual pixel. Such an approach accounts for
inconsistencies at the pixel level that may be resolved statistically
when analyzing the larger collection of pixels constituting a region. For
example, some regions of the eye may not be fully red, or may display
other artefacts such as a glint of high luminance. Another example of the
need for a global statistical operation is the presence of noise in the
image. Techniques are known in the art for such global comparison.
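
The region-level decision and its use in the falsing stage 96 can be sketched as follows. This is an illustrative sketch only: the per-pixel redness test, its thresholds, the use of rectangular regions and the minimum red fraction are all assumptions introduced for the example, and the function names are hypothetical.

```python
def is_red(pixel, ratio=1.5, min_red=80):
    """Hypothetical per-pixel redness test on an (R, G, B) tuple."""
    r, g, b = pixel
    return r >= min_red and r > ratio * max(g, b)


def region_is_red(image, region, min_fraction=0.5):
    """Decide redness for the whole region from the fraction of red pixels."""
    x0, y0, x1, y1 = region  # bounding box, upper bounds exclusive
    pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    red = sum(1 for p in pixels if is_red(p))
    return red / len(pixels) >= min_fraction


def falsing_stage(flash_image, preview_image, suspects):
    """Drop suspected regions that are red both with and without flash."""
    kept = []
    for region in suspects:
        if region_is_red(flash_image, region) and region_is_red(preview_image, region):
            continue  # red in both images: eliminated as a red-eye candidate
        kept.append(region)  # remains suspected as red-eye
    return kept


RED, DARK, GLINT = (200, 30, 30), (20, 15, 15), (250, 250, 250)

flash = [[RED, GLINT, RED, RED],
         [RED, RED, RED, RED]]
preview = [[DARK, DARK, RED, RED],
           [DARK, DARK, RED, RED]]
eye, button = (0, 0, 2, 2), (2, 0, 4, 2)

remaining = falsing_stage(flash, preview, [eye, button])
```

In this example the eye region remains suspected even though one of its pixels is a white glint, while the region that is red in both images is eliminated.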

[0061] Based on the information above, the regions finally identified as
red-eye artefacts can be modified, 98, to eliminate the red-eye from the
original full resolution image I(x,y). The modification can be done using
any one of numerous available techniques such as luminance reduction,
chrominance reduction, or subtraction of the artefact, as described in US
Published Patent Application 2002/0150306 (Baron), which is hereby
incorporated by reference.

[0062] FIG. 5-d describes the preparation of regions suspected of red-eye
for comparison as described in FIG. 5-a, block 596-4. As discussed above,
a simple pixel level comparison may not be enough to determine whether or
not the region is of red-eye nature. The process of preparation may
include a combination of several components such as creating color
balance between the regions of the preview image and the final image,
1510, analyzing the texture, or differences in high frequency patterns
between the two regions that may have occurred due to the change in
resolution, 1520, and comparing the edges between the two regions, 1530,
where the differences may have occurred due to change in exposure, color
balance, resolution or alignment, and in particular sub pixel alignment.
The color balance step 1510 comprises marking each red-eye region in
I'(x,y) and the corresponding region in P''(x,y), steps 1512 and 1514,
determining the difference in color balance between the region in I'(x,y)
surrounding, but not including, the suspected red-eye region and the
corresponding region of P''(x,y), step 1516, and transforming the entire
region, including the suspected red-eye region, based on the color
balance difference so determined, step 1518.
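
The color balance step 1510 can be sketched as follows. This is a minimal sketch under several assumptions not stated in the specification: the balance difference is modelled as a per-channel gain, the gain is estimated from mean channel values in the surround, regions are rectangles, and the preview region is the one transformed. All names are hypothetical.

```python
def channel_means(image, box, exclude=None):
    """Mean (R, G, B) over box, skipping pixels that fall inside exclude."""
    x0, y0, x1, y1 = box
    sums, count = [0.0, 0.0, 0.0], 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if exclude and exclude[0] <= x < exclude[2] and exclude[1] <= y < exclude[3]:
                continue
            for c in range(3):
                sums[c] += image[y][x][c]
            count += 1
    return [s / count for s in sums]


def balance_region(flash, preview, outer, inner):
    """Estimate gains from the surround (cf. step 1516) and apply them to the whole box (cf. step 1518)."""
    target = channel_means(flash, outer, exclude=inner)
    source = channel_means(preview, outer, exclude=inner)
    gains = [t / s if s else 1.0 for t, s in zip(target, source)]
    x0, y0, x1, y1 = outer
    balanced = [[tuple(min(255.0, preview[y][x][c] * gains[c]) for c in range(3))
                 for x in range(x0, x1)] for y in range(y0, y1)]
    return gains, balanced


SKIN_FLASH, SKIN_PREVIEW = (200, 160, 120), (100, 80, 60)
flash = [[SKIN_FLASH] * 3, [SKIN_FLASH, (220, 40, 40), SKIN_FLASH], [SKIN_FLASH] * 3]
preview = [[SKIN_PREVIEW] * 3, [SKIN_PREVIEW, (10, 10, 10), SKIN_PREVIEW], [SKIN_PREVIEW] * 3]

outer, inner = (0, 0, 3, 3), (1, 1, 2, 2)   # inner is the suspected red-eye region
gains, balanced = balance_region(flash, preview, outer, inner)
```

Because the suspected region is excluded when estimating the gains, the red-eye itself does not bias the correction; after balancing, the pupil in the preview is still dark, so the difference that remains is attributable to the flash artefact rather than to the lighting.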

[0063] As an alternative embodiment of this invention, the preview image
can be used as part of the pixel locator stage 92, as illustrated in FIG.
5-b, rather than as part of the falsing analyzer 96. In FIG. 5-b, blocks
510, 520, 522, 530, 532, 534, 540, 94 and 98 are identical to those in
FIG. 5-a. According to this embodiment, the use of the preview image in
order to detect red-eye artefacts is implemented as part of the red-eye
identification process, otherwise described as the pixel locator 92 in
FIG. 1 but here identified as Pixel Analyser and Region Segmenter 592.

[0064] After the suspected red-eye regions are identified, the process
continues via the shape analysis 94, false detection elimination 96 and
correction 98 as described in FIG. 1. In this case, the falsing detector
96 may be performed according to DeLuca.

[0065] According to this embodiment, after the alignment step 540 the
following steps 592-1a and 592-1b analyse both images I'(x,y) and
P''(x,y) for the presence of pixels having a color indicative of red-eye
(592-1a), for example in the manner of DeLuca, and then identify
clusters of contiguous red pixels so detected (592-1b). This is known as
segmentation and is more fully described in US Pat. Appn. 2002/0176623,
which is hereby incorporated by reference.
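
Steps 592-1a and 592-1b can be sketched as follows. This is an illustration only, not the method of DeLuca or of the cited application: the redness test and its thresholds are hypothetical, and contiguity is assumed to mean 4-connectivity.

```python
from collections import deque


def red_mask(image, ratio=1.5, min_red=80):
    """Mark pixels whose color is indicative of red-eye (cf. step 592-1a); thresholds are assumptions."""
    return [[(r >= min_red and r > ratio * max(g, b)) for (r, g, b) in row]
            for row in image]


def clusters(mask):
    """Group 4-connected marked pixels (cf. step 592-1b); returns sorted lists of (x, y)."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    found = []
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            queue, members = deque([(x, y)]), []
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                members.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            found.append(sorted(members))
    return found


R, S = (200, 30, 30), (180, 140, 120)   # a red pixel and a skin-toned pixel
image = [[R, R, S, S],
         [S, S, S, R],
         [S, S, S, R]]

regions = clusters(red_mask(image))
```

Each resulting cluster is a candidate region that can then be compared with the corresponding region of the aligned preview image.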

[0066] Now, each region (cluster) with red content in the acquired image
I'(x,y), step 592-2, is compared with the corresponding region in the
aligned preview image P''(x,y). The regions will need to be prepared,
592-4, as previously described in relation to block 596-4 of FIG. 5-a. If
the regions are red in both cases, 592-6=YES, the region will not be
marked as red-eye, no action will be taken and the process will continue
to the next suspected region, 592-3. If the region is red in the acquired
image I'(x,y) while the corresponding region is not red in the preview
image P''(x,y), 592-6=NO, then the region will be marked as suspected
red-eye, 592-8.

[0067] FIG. 6-a shows a modification of the embodiment of FIG. 5-b in
which Step 540 (Align Images) has been divided into two steps, Step 541
(If Possible Globally Align Images) and Step 592-3 (If Required Locally
Align Images). Step 541 corresponds to Steps 542 and 544 of FIG. 5-c.
However, if a global alignment is not possible or practical, the local
alignment is deferred until after red pixel identification and clustering
has been performed, since the presence of such clusters in the two images
I'(x,y) and P'(x,y) will assist in the local alignment. FIG. 6-b shows a
similar modification applied to FIG. 5-a.

[0068] In the embodiments of the invention, in the comparison stages,
592-6 and 596-6 the pixel values do not necessarily have to be compared
with red but may alternatively or additionally be compared with other
values such as yellow, white, pink, brown or other color indicative of a
red-eye phenomenon, or to a range of values, to accommodate other flash
related eye artefacts that are not purely red. Due to the fact that the
eye surface is retro-reflective (due to the smoothness created by the
tears, and the spherical shape of the eyeball), the technique as
described in this specification can assist in the detection of the eyes
in an image. Such existence of an eye can be found by comparison of the
specular reflection of the flash in the eye with the same region where no
flash was used, and thus without specular reflection. This comparison may
assist in locating eyes in general and not just eyes with red-eye
artefacts. This process may be implemented by finding the change of small
specular reflections that occur in the eye region when flash illumination
is used such as described in WO 03/026278 (Jarman). The specular
reflections may be used as another indication of suspected regions as
defined in blocks 592-2 and 596-2 by comparing the specular reflection of
the flash image with no specular reflection of the preview image.

[0069] As an alternative to a binary decision of retaining or eliminating
a region, 596-8 and 596-9, where a continuous probability is maintained
for each region, the binary decision is replaced by an adjustment of that
probability. The size of the adjustment may be determined based on
analysis of the confidence level of the comparison 592-4 and 596-4.
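
One way such a probability adjustment might look is sketched below. The specification does not give a formula, so the linear update rule, the function name and the parameter names are assumptions made purely for illustration.

```python
def update_probability(prior, preview_is_red, confidence):
    """Move a region's red-eye probability according to the preview comparison.

    prior          -- current probability that the region is red-eye, in [0, 1]
    preview_is_red -- True if the corresponding preview region is also red
    confidence     -- confidence in the comparison, in [0, 1]
    """
    confidence = max(0.0, min(1.0, confidence))
    # Red without flash argues against red-eye; not red argues for it.
    target = 0.0 if preview_is_red else 1.0
    return prior + confidence * (target - prior)


lowered = update_probability(0.6, preview_is_red=True, confidence=0.5)
raised = update_probability(0.6, preview_is_red=False, confidence=0.5)
unchanged = update_probability(0.6, preview_is_red=True, confidence=0.0)
```

With full confidence this reduces to the binary decision described earlier; with zero confidence the comparison has no effect on the region's probability.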

[0070] The preferred embodiments described above may be modified by adding
or changing operations, steps and/or components in many ways to produce
advantageous alternative embodiments. For example, the reference image
can be a post-view image rather than a preview image, i.e. an image taken
without flash immediately after the flash picture is taken.

[0071] A red-eye correction procedure may begin as described by block 92
with detecting a human face in a digital image and, based on this
detection, finding the eyes in the face (see, e.g., U.S. Pat. No.
6,252,976, U.S. Publ. Pat. App. No. 2003/0044070 and U.S. Pat. No.
6,278,491, which are hereby incorporated by reference). This procedure
may also be used for creating the regional alignment 546 and color
balance 1510.

[0072] A range of alternative techniques may be employed to detect and
verify the existence of red-eye defects in an image (see, e.g., U.S.
Publ. Pat. Apps. No. 2003/0044177 and 2003/0044178, which are hereby
incorporated by reference). These techniques may be incorporated into the
pixel locator, shape analyzer, falsing analyzer and pixel modifier
corresponding to blocks 92, 94, 96 and 98. A camera may include software
or firmware for automatically detecting a red-eye image using a variety
of image characteristics such as image brightness, contrast, the presence
of human skin and related colors. The analysis of these image
characteristics may be utilized, based on certain pre-determined
statistical thresholds, to decide if red-eye defects exist and if a flash
was used to take the original image.

[0073] The preferred embodiments described herein may involve expanded
digital acquisition technology that inherently involves digital cameras,
but that may be integrated with other devices such as cell-phones
equipped with an acquisition component, toy cameras etc. The digital
camera or other image acquisition device of the preferred embodiment has
the capability to record not only image data, but also additional data
referred to as meta-data. The file header of an image file, such as JPEG,
TIFF, JPEG-2000, etc., may include capture information including the
preview image, for processing and red-eye detection at a later post
processing stage, which may be performed in the acquisition device or in
a separate device such as a personal computer. The preferred embodiments
described herein serve to improve the detection of red-eyes in images,
while eliminating or reducing the occurrence of false positives, and to
improve the correction of the detected artefacts.

[0074] While an exemplary drawing and specific embodiments of the present
invention have been described and illustrated, it is to be understood
that the scope of the present invention is not to be limited to the
particular embodiments discussed. Thus, the embodiments shall be regarded
as illustrative rather than restrictive, and it should be understood that
variations may be made in those embodiments by workers skilled in the
arts without departing from the scope of the present invention, as set
forth in the claims below and structural and functional equivalents
thereof.

[0075] In addition, in methods that may be performed according to
preferred embodiments herein and that may have been described above, the
operations have been described in selected typographical sequences.
However, the sequences have been selected and so ordered for
typographical convenience and are not intended to imply any particular
order for performing the operations, unless expressly set forth or
understood by those skilled in the art as being necessary.

[0076] Thus, the preferred embodiments described herein provide an
improved method and apparatus for detecting red-eye phenomenon within
images taken by a digital camera having a flash while eliminating or
reducing the occurrence of false positives by using preview information.

[0077] In addition to all of the references cited above that have been
incorporated by reference, the sections entitled BACKGROUND, SUMMARY OF
THE INVENTION, ABSTRACT and BRIEF DESCRIPTION OF THE DRAWINGS, are hereby
incorporated by reference into the DETAILED DESCRIPTION OF THE PREFERRED
EMBODIMENT.

Patent applications by Eran Steinberg, San Francisco, CA US

Patent applications by Michael J. Deluca, Boca Raton, FL US

Patent applications by Peter Corcoran, Claregalway IE

Patent applications by Petronel Bigioi, Galway IE

Patent applications by Yury Prilutsky, San Mateo, CA US

Patent applications in class Including noise or undesired signal reduction
