Illumination estimation is the main step in color constancy processing and an important prerequisite for digital color image reproduction and many computer vision applications. In this paper, a method for estimating the illuminant spectrum is investigated using a digital color camera and a color chart whose spectral reflectance is known. The method is based on measuring the CIEXYZ values of the chart with the camera. The first step is to obtain the camera's color correction matrix and gamma values by photographing the chart under a standard illuminant. The second step is to photograph the chart under the illuminant to be estimated; the camera's native RGB values are converted to standard sRGB values and then to the CIEXYZ values of the chart. Based on the measured CIEXYZ values and the known spectral reflectance of the chart, the spectral power distribution (SPD) of the illuminant is estimated using Wiener estimation and smoothing estimation. To evaluate the method quantitatively, the goodness-of-fit coefficient (GFC) was used to measure the spectral match, and the CIELAB color difference metric was used to evaluate the color match between color patches under the estimated and actual SPDs. A simulated experiment was carried out to estimate CIE standard illuminants D50 and C using the X-rite ColorChecker 24-color chart, and a real experiment was carried out to estimate daylight and illuminant A using two consumer-grade cameras and the chart. The experimental results verified the feasibility of the investigated method.
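
The GFC used for the spectral match can be illustrated with a short sketch. It assumes the usual definition of the coefficient as the normalized inner product of two spectra sampled at the same wavelengths; the function name and the sample values are hypothetical and are not taken from the paper.

```python
import math


def gfc(measured, estimated):
    """Goodness-of-fit coefficient of two spectra sampled at the same wavelengths.

    Returns a value in [0, 1]; 1.0 means the two spectra have the same shape.
    """
    if len(measured) != len(estimated):
        raise ValueError("spectra must be sampled at the same wavelengths")
    dot = sum(m * e for m, e in zip(measured, estimated))
    norm_m = math.sqrt(sum(m * m for m in measured))
    norm_e = math.sqrt(sum(e * e for e in estimated))
    return abs(dot) / (norm_m * norm_e)


# Hypothetical SPD samples: the estimate is a scaled copy of the measurement,
# so the GFC is insensitive to the overall scale factor.
actual_spd = [1.0, 2.0, 3.0]
estimated_spd = [2.0, 4.0, 6.0]
score = gfc(actual_spd, estimated_spd)
```

Because the coefficient is normalized, it compares only the shape of the two spectra, which is why a separate color difference metric is still needed for the color match.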

We present a high dynamic range (HDR) imaging system design scheme based on the coded aperture technique. This
scheme can obtain HDR images that have an extended depth of field. We adopt a sparse coding algorithm to
design the coded patterns. Then we use the sensor unit to acquire coded images under different exposure settings.
Guided by the multiple exposure parameters, a series of low dynamic range (LDR) coded images are reconstructed. We
use existing algorithms to fuse these LDR images into an HDR image and display it. We build an optical simulation
model and generate simulation images to verify the proposed system.

A plenoptic camera records 4D light field data by storing both spatial and angular information. However, it
introduces a trade-off between spatial resolution and angular resolution. We propose a new camera design that is
modulated in the Fourier domain. A high-resolution 4D light field can be reconstructed from the coded image by sparse
reconstruction. A simulation is carried out to evaluate the performance of the camera design. The reconstructed light field
outperforms that of a conventional plenoptic camera.

The point spread function (PSF) of an imaging system with a coded mask is generally acquired by practical
measurement with a calibration light source. Because the thermal radiation of coded masks is more severe in infrared
imaging systems than in visible imaging systems and buries the modulation effects of the mask pattern, it is difficult
to estimate and evaluate the performance of a mask pattern from measured results. To tackle this problem, a model for
infrared imaging systems with masks is presented in this paper. The model is composed of two functional components:
coded mask imaging with ideal focused lenses and imperfect imaging with practical lenses. Ignoring
the thermal radiation, the system's PSF can then be represented by a convolution of the diffraction pattern of
the mask with the PSF of the practical lenses. To evaluate the performance of different mask patterns, a set of criteria
is designed according to different imaging and recovery methods. Furthermore, imaging results with inclined
plane waves are analyzed to obtain the variation of the PSF within the field of view. The influence of mask cell size
is also analyzed to control the diffraction pattern. Numerical results show that mask patterns for direct imaging
systems should have more random structures, while more periodic structures are needed in systems with image
reconstruction. By adjusting the combination of random and periodic arrangements, the desired diffraction pattern
can be achieved.

Accurate and fast detection of small infrared targets is very important for infrared precision guidance, early
warning, video surveillance, etc. Based on the human visual attention mechanism, an automatic detection algorithm for
small infrared targets is presented. In this paper, instead of searching for infrared targets, we model regular patches that do
not attract much attention from our visual system. This is inspired by the property that regular patches in the spatial domain
correspond to spikes in the amplitude spectrum. Unlike recent approaches using global spectral filtering,
we define the concept of local maxima suppression using local spectral filtering to smooth the spikes in the amplitude
spectrum, thereby producing the pop-out of the infrared targets. In the proposed method, we first compute the
amplitude spectrum of an input infrared image. Second, we find the local maxima of the amplitude spectrum using the cubic
facet model. Third, we suppress the local maxima by convolving the local spectrum with a low-pass Gaussian
kernel of an appropriate scale. Finally, the detection result in the spatial domain is obtained by reconstructing the 2D signal
from the original phase and the log amplitude spectrum with its local maxima suppressed. Experiments are performed
on real-life IR images, and the results show that the proposed method is effective and
robust. It is also efficient and can be further used for real-time detection and tracking.

We present a new hybrid camera system based on a spatial light modulator (SLM) to capture texture-adaptive
high-resolution hyperspectral video. The hybrid camera system records a hyperspectral video with low spatial resolution
using a grayscale camera and a high-spatial-resolution video using an RGB camera. The hyperspectral video is subsampled by
the SLM. The subsampled points can be adaptively selected according to the texture characteristics of the scene by
combining digital imaging analysis and computational processing. In this paper, we propose an adaptive sampling
method utilizing texture segmentation and the wavelet transform (WT). We also demonstrate the effectiveness of the
sampling pattern obtained on the SLM with the proposed method.

Due to absorption and scattering by water, images acquired in underwater environments have different colors from
those acquired in air, which can cause problems for image processing and object recognition. To address the problem of color
correction, this paper presents a color restoration method based on the water absorption spectrum. Considering the
nonlinear attenuation of light at different wavelengths and depths, the changes in tristimulus values are calculated.
Experiments are carried out in coastal seawater. The changes in tristimulus values are used to compensate for color loss. The
results demonstrate the feasibility of our method.

High resolution hyperspectral images have important applications in many areas, such as anomaly detection, target
recognition and image classification. Due to the limitation of the sensors, it is challenging to obtain high spatial
resolution hyperspectral images. Recently, methods that reconstruct high spatial resolution hyperspectral images
from a pair consisting of a low-resolution hyperspectral image and a high-resolution RGB image of the same scene have shown
promising results. In these methods, sparse non-negative matrix factorization (SNNMF) technique was proposed to
exploit the spectral correlations among the RGB and spectral images. However, only the spectral correlations were
exploited in these methods, ignoring the abundant spatial structural correlations of the hyperspectral images. In this
paper, we propose a novel algorithm combining the structural sparse representation and non-negative matrix
factorization technique to exploit the spectral-spatial structure correlations and nonlocal similarity of the hyperspectral
images. Compared with SNNMF, our method makes use of both the spectral and spatial redundancies of hyperspectral
images, leading to better reconstruction performance. The proposed optimization problem is efficiently solved by using
the alternating direction method of multipliers (ADMM) technique. Experiments on a public database show that our
approach outperforms other state-of-the-art methods in both visual quality and quantitative assessment.

Compressed sensing (CS) is a new branch of information theory that grew out of mathematical developments in the 21st century. CS
provides a state-of-the-art technique for reconstructing a sparse signal from a very limited number of measurements.
In CS, reconstruction algorithms often require intensive computation. Well-known algorithms such as Basis Pursuit (BP) or
Matching Pursuit (MP) are not practical to run on PCs. In this paper, we consider using the GPU (Graphics
Processing Unit) and its large-scale computation ability to solve this problem. Based on the recently released NVIDIA
CUDA 6.0 Toolkit and the CUBLAS library, we study GPU implementations of Orthogonal Matching Pursuit (OMP) and
the Two-Step Iterative Shrinkage algorithm (TwIST). The results show that, compared with the CPU,
implementing these algorithms on the GPU yields an obvious speedup without losing any accuracy.
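
To make the computational pattern of greedy reconstruction concrete, the following is a minimal CPU sketch of plain Matching Pursuit, not the GPU implementation studied in the paper. It assumes a dictionary of unit-norm atoms stored as Python lists; OMP differs in that it re-solves a least-squares problem over all selected atoms at every iteration. The function name and the toy dictionary are hypothetical.

```python
def matching_pursuit(atoms, signal, n_iter):
    """Greedy Matching Pursuit over a list of unit-norm atoms.

    Returns (coeffs, residual), where coeffs[k] is the weight of atoms[k].
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Correlate the current residual with every atom (the dense step
        # that a GPU implementation would parallelize).
        corr = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        # Pick the atom that explains the most residual energy.
        k = max(range(len(atoms)), key=lambda j: abs(corr[j]))
        coeffs[k] += corr[k]
        # Remove the selected atom's contribution from the residual.
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Toy example with an orthonormal dictionary and a 2-sparse signal.
identity_atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
coeffs, residual = matching_pursuit(identity_atoms, [0.0, 3.0, -2.0], n_iter=2)
```

The correlation step is a matrix-vector product, which is the kind of operation a BLAS library such as CUBLAS accelerates.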

Image super-resolution (SR) is widely used in civil and military fields, especially for low-resolution remote
sensing images limited by the sensor. Single-image SR refers to the task of restoring a high-resolution (HR) image from
a low-resolution image coupled with some prior knowledge as a regularization term. One classic approach regularizes
the image by total variation (TV) and/or wavelet or other transforms, which introduces some artifacts. To overcome these
shortcomings, a new framework for single-image SR is proposed that applies an adaptive filter before regularization. The key
to our model is that the adaptive filter is first used to remove the spatial correlation among pixels, and then only the
high-frequency (HF) part, which is sparser in the TV and transform domains, is considered as the regularization term. Concretely,
by transforming the original model, the SR problem can be solved by alternately iterating two sub-problems. Before
each iteration, the adaptive filter is updated to estimate the initial HF part. A high-quality HF part and HR image are
obtained by solving the first and second sub-problems, respectively. In the experimental part, a set of remote sensing
images captured by Landsat satellites is tested to demonstrate the effectiveness of the proposed framework.
Experimental results show the outstanding performance of the proposed method in quantitative evaluation and visual
fidelity compared with state-of-the-art methods.

Compared to traditional digital cameras, light field (LF) cameras measure not only the intensity of rays, but also their
light field information. As LF cameras trade a good deal of spatial resolution for extra angular information, they provide
lower spatial resolution than traditional digital cameras. In this paper, we present a hybrid imaging system consisting of an
LF camera and a high-resolution traditional digital camera, achieving both high spatial resolution and high angular
resolution. We build an example prototype using a Lytro camera and a DSLR camera to generate an LF image with 10-megapixel
spatial resolution and obtain high-resolution digitally refocused images, multi-view images and all-in-focus images.

To compensate for the shortage of 3D content, 2D to 3D video conversion (2D-to-3D) has recently attracted more
attention from both industrial and academic communities. Semi-automatic 2D-to-3D conversion, which
estimates the depth of non-key-frames from key-frames, is more desirable owing to its balance between
labor cost and 3D effects. The location of key-frames affects the quality of depth propagation.
This paper proposes a semi-automatic 2D-to-3D scheme with adaptive key-frame selection to keep temporal
continuity more reliable and reduce the depth propagation errors caused by occlusion. Potential key-frames
are localized in terms of clustered color variation and motion intensity. The key-frame interval
is also taken into account to keep the accumulated propagation errors under control and guarantee minimal user
interaction. Once the key-frame depth maps are aligned through user interaction, the non-key-frame depth maps are
automatically propagated by shifted bilateral filtering. Considering that the depth of objects may change due to
object motion or camera zoom in/out, a bi-directional depth propagation scheme is adopted in which a
non-key frame is interpolated from two adjacent key frames. The experimental results show that the proposed
scheme performs better than existing 2D-to-3D schemes with a fixed key-frame interval.

In a free viewpoint video system, the color video and the corresponding depth video are used to synthesize virtual views
by the depth image based rendering (DIBR) technique. Hence, high-quality depth video is a prerequisite for high-quality
virtual views. However, depth variation, caused by scene variance and limited depth capturing technologies, may
increase the encoding bitrate of depth videos and decrease the quality of virtual views. To tackle these problems, this paper
proposes a depth preprocessing method that smooths the texture and abrupt changes of depth videos to increase their
accuracy. First, a bilateral filter is adopted to smooth the whole depth video while preserving
its edges. Second, abrupt variation is detected using a threshold calculated from
the camera parameters of each video sequence. Holes in virtual views occur when the depth values of the left view change
sharply from low to high in the horizontal direction or the depth values of the right view change sharply from high to low.
So, for the left view, the depth difference on the left side is gradually reduced wherever it is greater than the threshold.
The right side of the right view is then processed likewise. Experimental results show that the proposed method
reduces the encoding bitrate by 25% on average while improving the quality of the synthesized virtual views by
0.39 dB on average compared with using the original depth videos. Subjective quality is also improved.
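
The edge-preserving behavior of the bilateral filter in the first step can be sketched on a single row of depth values. This is a generic one-dimensional bilateral filter under assumed parameters (window radius and the two Gaussian widths), not the paper's configuration; the function name and sample row are hypothetical.

```python
import math


def bilateral_filter_1d(values, radius, sigma_s, sigma_r):
    """Smooth a row of depth values while keeping large depth steps sharp.

    sigma_s controls the spatial falloff, sigma_r the tolerance to depth
    differences; neighbors across a large depth step get almost no weight.
    """
    n = len(values)
    out = []
    for i in range(n):
        num = 0.0
        den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            spatial = (i - j) ** 2 / (2.0 * sigma_s ** 2)
            depth_gap = (values[i] - values[j]) ** 2 / (2.0 * sigma_r ** 2)
            w = math.exp(-spatial - depth_gap)
            num += w * values[j]
            den += w
        out.append(num / den)
    return out


# Hypothetical depth row: small ripple on both sides of a strong depth edge.
row = [10, 11, 10, 11, 10, 100, 101, 100, 101, 100]
smoothed = bilateral_filter_1d(row, radius=2, sigma_s=2.0, sigma_r=5.0)
```

In this sketch the ripple on each side is averaged out, while the step between the fifth and sixth samples stays essentially intact because samples on opposite sides of the edge receive negligible weight.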

In a multi-view video system, multiple video plus depth is the main data format for 3D scene representation. Continuous virtual
views can be generated by using the depth image based rendering (DIBR) technique. The DIBR process includes geometric
mapping, hole filling and merging. Unique weights, inversely proportional to the distance between the virtual and real
cameras, are used to merge the virtual views. However, these weights might not be optimal in terms of virtual view
quality. In this paper, a novel virtual view merging algorithm is proposed. In the proposed algorithm, a machine learning
method is used to establish an optimal weight model. The model takes color, depth, color gradient and sequence
parameters into consideration. First, we render the same virtual view from the left and right views, and select the
training samples by using a threshold. Then, the eigenvalues of the samples are extracted and the optimal merging
weights are calculated as training labels. Finally, a support vector classifier (SVC) is adopted to establish the model, which
is used to guide virtual view rendering. Experimental results show that the proposed method can improve the quality
of virtual views for most sequences. It is especially effective when the distance between the virtual and real
cameras is large. Compared to the original method of virtual view synthesis, the proposed method obtains a gain of more than
0.1 dB for some sequences.
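
The conventional baseline that the abstract contrasts with, merging weights inversely proportional to camera distance, can be written in a few lines. This sketch covers only that baseline, not the learned weight model; function names and sample values are hypothetical, and distances are assumed to be non-negative baselines between the virtual camera and each real camera.

```python
def distance_weights(dist_left, dist_right):
    """Merging weights inversely proportional to camera distance, summing to 1."""
    if dist_left < 0 or dist_right < 0 or dist_left + dist_right == 0:
        raise ValueError("distances must be non-negative and not both zero")
    total = dist_left + dist_right
    # (1/dl) / (1/dl + 1/dr) simplifies to dr / (dl + dr), and likewise for
    # the right view, which also handles a zero distance without dividing by it.
    return dist_right / total, dist_left / total


def merge_pixel(left_value, right_value, dist_left, dist_right):
    """Blend the two warped pixel values of the same virtual-view position."""
    w_left, w_right = distance_weights(dist_left, dist_right)
    return w_left * left_value + w_right * right_value


# Virtual camera three times closer to the left camera than to the right one.
blended = merge_pixel(100.0, 200.0, dist_left=1.0, dist_right=3.0)
```

These weights depend only on geometry, which is the limitation the proposed method addresses by also taking color, depth and gradient information into account.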

Since stereoscopic images can give observers both a realistic and an uncomfortable viewing experience, it is necessary to
investigate the determinants of visual discomfort. Considering that the foreground object draws the most attention when
humans observe stereoscopic images, this paper proposes a new foreground-object-based visual comfort assessment
(VCA) metric. First, a suitable segmentation method is applied to the disparity map, and the foreground
object is identified as the one with the largest average disparity. Second, three visual features of the foreground
object, namely average disparity, average width and spatial complexity, are computed from the perspective of
visual attention. However, compared with disparity, the object’s width and complexity do not consistently influence the
perception of visual comfort. In accordance with this psychological phenomenon, we third divide the images
into four categories on the basis of disparity and width, and apply four different models to predict
visual comfort more precisely. Experimental results show that the proposed VCA metric outperforms
other existing metrics and achieves high consistency between objective and subjective visual comfort scores. The
Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank Order Correlation Coefficient (SROCC) are over
0.84 and 0.82, respectively.
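
The two reported consistency measures can be computed as in the following sketch, which assumes the standard definitions: PLCC is the Pearson correlation of the raw scores, and SROCC is the Pearson correlation of their ranks (ties receive the average rank). Function names and the score lists are hypothetical and unrelated to the paper's data.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical objective predictions vs. subjective scores (monotonic, nonlinear).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
subjective = [1.0, 4.0, 9.0, 16.0, 25.0]
plcc = pearson(predicted, subjective)
srocc = spearman(predicted, subjective)
```

In this toy case the relation is monotonic but not linear, so SROCC reaches 1 while PLCC stays slightly below it, which illustrates why both values are usually reported.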

Abnormal event detection in crowded scenes is one of the most challenging tasks in video surveillance for
public security control. Unlike previous learning-based work, we propose an unsupervised Interaction Power
model with an adaptive threshold strategy to detect abnormal group activity by analyzing the steady state of individuals’
behaviors in the crowded scene. First, the optical flow field of the potential pedestrians is calculated only within the
extracted foreground to reduce the computational cost. Second, each pedestrian is divided into patches of the same
size, and the interaction power of the pedestrians is represented by motion particles, which describe the motion
status at the center pixels of the patches. The motion status of each patch is computed by using the optical flows of the
pixels within the patch. For each motion particle, its interaction power, defined as its steady state of the current behavior,
is computed among all its neighboring motion particles. Finally, the dense crowds’ steady state can be represented as a
collection of motion particles’ interaction power. Here, an adaptive threshold strategy is proposed to detect abnormal
events by examining the frame power field which is a fixed-size random sampling of the interaction power of motion
particles. Experimental results on the standard UMN dataset and online videos show that our method can detect
crowd anomalies and achieves higher accuracy than other competitive methods published recently.

The task of salient region detection aims to identify the most important and informative regions of an image. In this
work, we propose a novel method that tackles this task as a process from superpixel-level localization to pixel-level refinement.
First, we over-segment the image into superpixels and compute an affinity matrix to estimate the similarity between
each pair of superpixels according to both color contrast and spatial distribution. The matrix is then used to aggregate
superpixels into several clusters by affinity propagation. To measure the saliency of each cluster, three parameters
are taken into account: color contrast, cluster compactness and proximity to the focus. We select the one to three most
salient clusters as the crude salient region. For the refinement step, we regard each selected superpixel as an
influential center. Hence, the saliency value of a pixel is simultaneously determined by all the selected superpixels.
In practice, several Gaussian curves are constructed based on the selected superpixels. The pixel-wise saliency value is determined
by the color distinction and spatial distance between the pixel and the curves’ centers. We evaluate our algorithm on a
publicly available dataset with human annotations, and experimental results show that our approach has competitive
performance.

The embryo, or germ, of a rice seed grows into the shoot and root of a seedling. In the early stage, the
germinated embryo receives food directly from the endosperm. The health of the seedling can therefore be predicted physically
by measuring the areas of the embryo and endosperm. In this work, we show for the first time how the embryo and
endosperm areas of a brown rice seed can be spatially measured. Our key design is based on a tablet
equipped with our lens module for capturing the rice seed image under white light illumination. Our Windows-based
program analyzes the image of the whole brown rice seed and separates it into the embryo and endosperm parts
within 2 seconds per seed. Our tablet-based system measures just 30×30×6 cm³ and weighs 1 kilogram, so it can easily
be carried and used in the field.

Venipuncture is the most common of all invasive medical procedures. A vein display system can make vein access
easier by capturing the vein information and projecting a visible vein image onto the skin so that it is correctly aligned with
the subject’s veins. Existing systems achieve correct alignment through a coaxial structure. Such a structure
inevitably leads to complex optical and mechanical design and large physical dimensions. In this paper, we design a
stereovision-based vein display system, which consists of a pair of cameras, a DLP projector and a near-infrared light source.
We recover the three-dimensional venous structure from the image pair acquired by the two near-infrared cameras. Then the
vein image from the viewpoint of the projector is generated from the three-dimensional venous structure and projected
precisely onto the skin by the DLP projector. Since the stereo cameras obtain the depth information of the vessels, the system can
ensure the alignment of the projected veins with the real veins without a coaxial structure. The experimental results show
that the proposed approach is a feasible solution for a portable and low-cost vein display device.

With the development of digital image manipulation techniques, digital image forensics is becoming more
and more necessary. However, determining the processing history of multiple operations is still a challenging problem. In
this paper, we improve the traditional seam insertion algorithm and propose a corresponding detection method. Then an
algorithm that focuses on detecting the processing history of seam insertion and contrast enhancement is proposed, which
can be widely applied to practical image forgeries. Based on comprehensive analysis, we have discovered the inherent
relationship between seam insertion and contrast enhancement. Different processing orders have different impacts on
images. By using the newly proposed algorithm, both contrast enhancement followed by seam insertion and seam insertion
followed by contrast enhancement can be detected correctly. Extensive experiments have been carried out to verify the
accuracy.

Phase contains important information about the diffraction or scattering property of an object, and therefore
the imaging of phase is vital to many applications including biomedicine and metrology, to name just a few.
However, due to the limited bandwidth of image sensors, it is not possible to directly detect the phase of an
optical field. Many methods including the Transport of Intensity Equation (TIE) have been well demonstrated
for quantitative and non-interferometric imaging of phase. The TIE offers an experimentally simple technique
for computing phase quantitatively from two or more defocused images. Usually, the defocused images are
obtained experimentally by shifting the camera along the optical axis in small steps. Note that light
field imaging can produce an image stack focused at different depths by digitally refocusing the
captured light field of a scene. In this paper, we propose to combine Light Field Microscopy and the TIE
method for phase imaging, taking the digital-refocusing advantage of Light Field Microscopy. We demonstrate
the proposed technique with simulation results. Compared with the traditional camera-shifting technique, light-field
imaging allows the defocused images to be captured without any mechanical instability and therefore offers an
advantage in practical applications.

Microscopic image restoration and reconstruction is a challenging topic in image processing and computer vision,
with wide applications in life science, biology, medicine, etc. A method for microscopic light field creation and
three-dimensional (3D) reconstruction is proposed for transparent or partially transparent microscopic samples,
based on the Taylor expansion theorem and polynomial fitting. First, the image stack of the specimen is divided into
several groups in an overlapping or non-overlapping way along the optical axis, and the first image of every group is
regarded as the reference image. Then intensity derivatives of different orders are calculated from all the images of every group
by polynomial fitting, based on the assumption that the structure of the specimen contained in the image stack
is smooth and linear within a small range along the optical axis. Subsequently, new images located
at any position at a distance Δz from the reference image along the optical axis can be generated by means of the
Taylor expansion theorem and the calculated intensity derivatives. Finally, the microscopic specimen can
be reconstructed in 3D using deconvolution and all the images, including both the observed images and
the generated images. The experimental results show the effectiveness and feasibility of our method.
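
The image generation step can be illustrated for a single pixel. The sketch below only evaluates a truncated Taylor series from already-known axial intensity derivatives at the reference plane; how the derivatives are fitted is not shown, and the function name and numbers are hypothetical.

```python
import math


def taylor_intensity(derivatives, dz):
    """Intensity at distance dz from the reference plane for one pixel.

    derivatives[k] is the k-th axial intensity derivative at the reference
    plane, so derivatives[0] is the reference intensity itself.
    """
    return sum(d * dz ** k / math.factorial(k) for k, d in enumerate(derivatives))


# Hypothetical pixel with reference intensity 1, first derivative 2 and
# second derivative 6, evaluated half a unit along the optical axis.
new_value = taylor_intensity([1.0, 2.0, 6.0], dz=0.5)
```

Applying the same evaluation to every pixel yields a synthetic image at the new axial position; the approximation is only expected to hold for small Δz, in line with the smoothness assumption stated above.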

Capturing four-dimensional light field data sequentially using a coded aperture camera is an effective approach but
suffers from a low signal-to-noise ratio. Although multiplexing can help raise the acquisition quality, noise is still a big issue,
especially for fast acquisition. To address this problem, this paper proposes a noise-robust light field reconstruction
method. First, a scene-dependent noise model is studied and incorporated into the light field reconstruction framework.
Then, we derive an optimization algorithm for the final reconstruction. We build a prototype by hacking an off-the-shelf
camera for data capture to prove the concept. The effectiveness of this method is validated with experiments on
real captured data.

Integral imaging is a technique capable of reproducing continuous parallax, full color, continuous viewpoints,
and real perspectives of a scene. Since the amount of information contained in an element image array (EIA) is far
greater than that in an ordinary image, storage and transmission pose great difficulties. This paper proposes a method to
compress the element images (EIs) for the case in which the depth differences among most objects in the scene are small
and the objects are not far from the camera. Since the resolution of each element image (EI)
is small, the matching displacements of all pixels in one EI are nearly the same. For instance, suppose one integral image is
composed of 12×1 EIs and the resolution of each EI is 20×20. If the matching displacement between adjacent EIs is
5 pixels, we pick 1 EI out of every 4 EIs at the same interval, which gives 3 EIs. Next, the 3 EIs are spliced together to
form one image. Processing the remaining EIs in the same way yields a total of 4 spliced images. The 4 spliced images
are then compressed with a video compression method.
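
The grouping in the worked example can be reproduced with a short sketch. It assumes that the sampling interval equals the EI width divided by the matching displacement (20 / 5 = 4 in the example); the abstract gives only the numbers, so this rule is an inference, and the function name is hypothetical.

```python
def splice_groups(num_eis, ei_width, displacement):
    """Indices of the EIs that make up each spliced image.

    Assumed rule: EIs that are ei_width // displacement apart are grouped,
    so that their accumulated displacement equals one EI width.
    """
    step = ei_width // displacement
    return [list(range(start, num_eis, step)) for start in range(step)]


# The example from the text: 12x1 EIs, each 20x20, displacement of 5 pixels.
groups = splice_groups(num_eis=12, ei_width=20, displacement=5)
```

Each inner list holds the 3 EIs that are spliced into one image, and the 4 lists correspond to the 4 spliced images that are passed to the video encoder as a short sequence.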

Perceptual stereoscopic image quality assessment (SIQA) aims to use computational models to measure image
quality consistently with human visual perception. In this research, we simulate monocular and binocular visual
perception and propose a monocular-binocular feature fidelity (MBFF) induced index for SIQA. To be more specific,
in the training stage, we learn monocular and binocular dictionaries from the training database, so that the latent response
properties can be represented as a set of basis vectors. In the quality estimation stage, we compute monocular feature
fidelity (MFF) and binocular feature fidelity (BFF) indexes based on the estimated sparse coefficient vectors, and
compute global energy response similarity (GERS) index by considering energy changes. The final quality score is
obtained by incorporating them together. Experimental results on four public 3D image quality assessment databases
demonstrate that, in comparison with the most closely related existing methods, the devised algorithm achieves high consistency
with subjective assessment.

Many computer vision tasks are hindered by image formation itself, a process that is governed by the so-called plenoptic
integral. By averaging light falling into the lens over space, angle, wavelength and time, a great deal of information is
irreversibly lost. The emerging idea of transient imaging operates on a time resolution fast enough to resolve non-stationary
light distributions in real-world scenes. It enables the discrimination of light contributions by the optical path length from
light source to receiver, a dimension unavailable in mainstream imaging to date. Until recently, such measurements used
to require high-end optical equipment and could only be acquired under extremely restricted lab conditions. To address
this challenge, we introduced a family of computational imaging techniques operating on standard time-of-flight image
sensors, for the first time allowing the user to “film” light in flight in an affordable, practical and portable way. Just as
impulse responses have proven a valuable tool in almost every branch of science and engineering, we expect light-in-flight
analysis to impact a wide variety of applications in computer vision and beyond.

This paper is devoted to generating the coordinates of partial 3D points in scene reconstruction from time-of-flight
(ToF) images. Assuming the camera does not move, only the coordinates of the points in the images are accessible.
The exposure time is two trillionths of a second, and the synthetic visualization shows light moving at half
a trillion frames per second. In global light transport, direct components are those in which the light is emitted from
a light point and reflected from a scene point only once. Considering that the camera and the light source point are
the two foci of an ellipsoid and that the total path length is constant at a given time, we take two
constraints into account: (1) the path length is the sum of the distances that light travels between the two foci and the scene
point; and (2) the focus of the camera, the scene point and the corresponding image point lie on a line. It is
worth mentioning that calibration is necessary to obtain the coordinates of the light point. The calibration can
be done in two steps: (1) choose a scene that contains some pairs of points at the same depth whose
positions are known; and (2) substitute the positions into the two constraints above to obtain the coordinates of the light
point. After calculating the coordinates of the scene points, MeshLab is used to build the partial scene model. The
proposed approach is suited to estimating the exact distance between two scene points.
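
The two constraints can be combined in closed form, as the following sketch shows. It assumes that the camera center, the calibrated light position, the viewing direction of a pixel and the total path length are already known; writing the scene point as p = c + t·u with unit direction u and v = l − c, the ellipsoid condition t + |t·u − v| = d gives t = (d² − |v|²) / (2(d − u·v)). All names and coordinates are hypothetical.

```python
import math


def scene_point(camera, light, direction, path_length):
    """Intersect a viewing ray with the ellipsoid whose foci are camera and light.

    path_length is the total distance light -> scene point -> camera.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    u = [c / norm for c in direction]            # unit viewing direction
    v = [l - c for l, c in zip(light, camera)]   # camera-to-light vector
    vv = sum(c * c for c in v)
    if path_length <= math.sqrt(vv):
        raise ValueError("path length must exceed the camera-light distance")
    uv = sum(a * b for a, b in zip(u, v))
    # Squaring |t*u - v| = d - t cancels the quadratic term and leaves t.
    t = (path_length ** 2 - vv) / (2.0 * (path_length - uv))
    return [c + t * a for c, a in zip(camera, u)]


# Hypothetical setup: camera at the origin, light 1 unit to the side, and a
# scene point 2 units in front of the camera, giving a path of 2 + sqrt(5).
point = scene_point(
    camera=[0.0, 0.0, 0.0],
    light=[1.0, 0.0, 0.0],
    direction=[0.0, 0.0, 3.0],
    path_length=2.0 + math.sqrt(5.0),
)
```

Because the path length always exceeds the distance between the two foci, the denominator is positive and the ray meets the ellipsoid exactly once in front of the camera.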

According to specific configurations, three-dimensional (3D) patterning involves both 3D bioimaging and laser
micromachining. Recent advances in bioimaging have witnessed strong interests in the exploration of novel microscopy
methods capable of dynamic imaging of living organisms with high resolution and a large field of view (FOV). Most
bioimaging applications are limited by the tradeoff among speed, resolution, and FOV in common
techniques, e.g., confocal laser scanning microscopy and two-photon microscopy. However, a recently proposed
temporal focusing (TF) technique, based on spatio/temporal shaping of femtosecond laser pulses, enables depth-resolved
bioimaging in a wide-field illumination. This lecture firstly provides a glimpse into the state-of-the-art progress of
temporal focusing for bioimaging applications. Then we reveal a bizarre point spread function (PSF) of the temporal
focusing system, both experimentally and theoretically. It can be expected that this newly emerged technique will
lead to new advances not only in 3D nonlinear bioimaging but also in femtosecond laser micromachining in the future.

This paper presents a novel method for solving the super-resolution (SR) and enhancement problem of depth maps captured
by Time-of-Flight (ToF) cameras. Using the registered color images combined with the edge information of the original
depth image as one prior, and employing the joint sparse representation model to obtain the common representation coefficients
as another prior, we obtain the solution, namely high-resolution (HR) depth maps with low noise and accurate values, from
the two priors. The results show that our approach possesses many advantages compared with previous state-of-the-art
methods.

Joint transform correlator (JTC) is a highly efficient way to measure image motion and a hybrid opto-digital JTC
(HODJTC) has been proposed by us in [CHIN. OPT. LETT., Vol. 8, No. 8]. Being different from the traditional JTC,
only one optical Fourier transform is needed and the optically generated joint power spectrum (JPS) is used to compute
the image motion in a digital way. Although a high measurement precision can be obtained through HODJTC,
defocus degrades the final result. In this paper, the influence of defocus is analyzed and an improved HODJTC,
whose sensitivity to defocus is reduced, is proposed. By introducing randomly generated defocus, a series of
cross-correlation peak images is obtained and a subsequent spatial averaging procedure is applied to these images to
generate the final cross-peak image which is used to compute the defocus invariant motion value.

We present an optical image encryption method based on a modified radial shearing interferometer. In our encryption
process, a plaintext image is first encoded into a phase-only mask (POM) and then modulated by a random phase mask
(RPM). The result serves as the input of the radial shearing interferometer and is divided into two coherent beams, one
of which is further modulated by a random amplitude mask (RAM). Finally, these two coherent beams
interfere with each other, producing an interferogram, i.e., the ciphertext. The ciphertext can be used to retrieve the
plaintext image with the help of a recursive algorithm and all correct keys. The aforementioned encryption procedure can
be achieved digitally or optically while the decryption process can be analytically accomplished. Numerical simulation is
provided to demonstrate the validity of this method.

In multi-view plus depth (MVD) 3D video coding, texture maps and depth maps are coded jointly. The depth maps
provide the scene geometry information and are used to render the virtual view at the terminal through a
Depth-Image-Based-Rendering (DIBR) technique. The distortion of the coded texture maps and depth maps will induce
synthesized virtual view distortion. Besides the coding efficiency of texture maps and depth maps, bit allocation between
texture maps and depth maps also has a great effect on the virtual view quality. In this paper, the virtual view distortion
is divided into texture-map-induced distortion and depth-map-induced distortion, and models of both components
are derived. Based on the depth-map-induced virtual view distortion model, the Rate Distortion Optimization (RDO)
of depth map coding is modified and the depth map coding efficiency is increased. Meanwhile, we also propose a
rate-distortion (R-D) model to solve
the joint bit allocation problem. Experimental results demonstrate the high accuracy of the proposed virtual view
distortion model. The R-D performance of the proposed algorithm is close to the full search algorithm that can give the
best R-D performance, while the coding complexity of the proposed algorithm is lower. Compared with a fixed texture-to-depth
bit ratio (5:1), an average gain of 0.3 dB is achieved by the proposed algorithm. The proposed algorithm has
high rate control accuracy, with an average error of less than 1%.

Accurate Point Spread Function (PSF) estimation of coded aperture cameras is a key to deblur defocus images.
There are mainly two kinds of approaches to estimate PSF: blind-deconvolution-based methods, and
measurement-based methods with point light sources. Neither kind of method provides accurate
and convenient PSFs, owing to the limits of blind deconvolution or the imperfection of point light sources. Inaccurate
PSF estimation introduces pseudo-ripple and ringing artifacts that degrade the results of image deconvolution.
In addition, there are many situations in which PSF estimation is inconvenient.
This paper proposes a novel method of PSF estimation for coded aperture cameras. It is observed and verified
that the spatially-varying point spread functions are well modeled by the convolution of the aperture pattern
and Gaussian blurring with appropriate scales and bandwidths. We use the coded aperture camera to capture
a point light source to get a rough estimate of the PSF. Then, the PSF estimation method is formulated as the
optimization of scale and bandwidth of Gaussian blurring kernel to fit the coded pattern with the observed PSF.
We also investigate the PSF estimation at arbitrary distance with a few observed PSF kernels, which allows us to
fully characterize the response of coded imaging systems with limited measurements. Experimental results show
that our method is able to accurately estimate PSF kernels, which significantly improves the deblurring performance
and convenience.

A non-intrusive gesture recognition system for human-machine interaction is proposed in this paper. To solve the hand positioning problem, which is a difficulty in current algorithms, face detection is used as a pre-processing step to narrow the search area and find the user’s hand quickly and accurately. A Hidden Markov Model (HMM) is used for gesture recognition. A certain number of basic gesture units are trained as HMM models. At the same time, an improved 8-direction feature vector is proposed and used to quantize characteristics in order to improve the detection accuracy. The proposed system can be applied in interactive equipment, such as household interactive television, without special training for users.
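
As an illustration of how a hand trajectory can be turned into discrete HMM observations, the sketch below shows the plain 8-direction quantization that such feature vectors usually start from. It is not the improved feature vector proposed in the paper, whose details are not given in the abstract; the function names are hypothetical, and the y axis is assumed to point upward (flip the sign of `dy` for image coordinates).

```python
import math

def direction_code(p0, p1):
    """Quantize the motion from p0 to p1 into one of 8 codes (0 = +x, counter-clockwise)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    angle = math.atan2(dy, dx)                      # angle in (-pi, pi]
    return int(round(angle / (math.pi / 4))) % 8    # 45-degree sectors centred on each direction

def trajectory_codes(points):
    """Convert a sequence of hand positions into a sequence of direction codes."""
    # Stationary steps (identical consecutive points) carry no direction and are skipped.
    return [direction_code(a, b) for a, b in zip(points, points[1:]) if a != b]
```

The resulting code sequence can be fed to any discrete-observation HMM implementation.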

Automatic target detection in remote sensing images remains a challenging problem. In this paper, we present
a new oil tank detection method based on salient region and geometric features. Salient region detection and
Otsu threshold are used for image segmentation to get candidate regions effectively, and four geometric features
are employed for reducing the false alarms. Experimental results show that our method can provide a promising
way to detect oil tanks accurately, and it is also robust in complicated conditions such as occlusion, shadow or
deformation.
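
For readers unfamiliar with the Otsu threshold used in the segmentation step, the following is a minimal sketch of the standard single-threshold method operating on a gray-level histogram. How the paper combines it with the saliency map is not specified in the abstract, so only the thresholding itself is shown; the function name is illustrative.

```python
def otsu_threshold(hist):
    """Return the level t that maximizes between-class variance; levels <= t form class 0."""
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in class 0
    sum0 = 0    # sum of gray levels in class 0
    for t, count in enumerate(hist):
        w0 += count
        sum0 += t * count
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue                      # one class is empty, variance undefined
        m0 = sum0 / w0                    # mean of class 0
        m1 = (sum_all - sum0) / w1        # mean of class 1
        var_between = w0 * w1 * (m0 - m1) ** 2   # proportional to between-class variance
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```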

Major parameters of an X-ray camera include spatial resolution, flat field response and dynamic range. These parameters were calibrated on a pulsed X-ray source with an energy of about 0.3 MeV. A fluorophotometric method was used to measure the spatial resolutions of the penetrating light and the reflected light. The results indicated that the two were essentially the same. The spatial resolution of the camera was measured with the edge method. At the 10% intensity level, the resolution given by the modulation transfer function (MTF) was about 5 lp/mm, while the size of the point spread function (PSF) was about 0.8 mm. Due to the system design, with both a short distance and a large field of view, the flat field non-homogeneity was about 15%. In addition, because of the relatively large gain of the scintillator and MCP image intensifier and the limited detection efficiency of the X-rays and scintillator, the image intensity of the flat field response showed a large standard deviation of about 1375. Due to crosstalk throughout the system, the maximal signal-to-noise ratio (SNR) of the X-ray camera was about 10:1. These results could provide important technical specifications for both applications of the X-ray camera and data processing of other relevant images.

This paper presents a novel example-based super-resolution (SR) algorithm with improved k-means clustering. In this
algorithm, genetic k-means (GKM) with hybrid particle swarm optimization (HPSO) is employed to improve the
reconstruction of high-resolution (HR) images, and a pre-processing of classification in frequency is used to accelerate
the procedure. Self-redundancy across different scales of a natural image is also utilized to build attached training set to
expand example-based information. Meanwhile, a reconstruction algorithm based on hybrid supervised locally linear
embedding (HSLLE) is proposed, which uses training sets, high-resolution images and self-redundancy across different
scales of a natural image. Experimental results show that patches are classified rapidly during training set processing
and that the reconstruction runtime in the super-resolution stage is at most half that of the traditional algorithm. Clustering and
the attached training set also lead to a better recovery of the low-resolution (LR) image.

Visual dictionary learning as a crucial task of image representation has gained increasing attention. Specifically,
sparse coding is widely used due to its intrinsic advantage. In this paper, we propose a novel heterogeneous
latent semantic sparse coding model. The central idea is to bridge heterogeneous modalities by capturing their
common sparse latent semantic structure so that the learned visual dictionary is able to describe both the
visual and textual properties of training data. Experiments on both image categorization and retrieval tasks
demonstrate that our model shows superior performance over several recent methods such as K-means and Sparse
Coding.

Iterated Function System (IFS) has been used to generate fractal graphics and fractal Chinese characters. A fractal
Chinese character magnification method is proposed in this paper to zoom in on arbitrarily selected areas within a fractal
Chinese character. For any selected area, a geometric transform is done to make the selected area occupy the full display
area. The mapping coefficients of the IFS for the Chinese character are modified such that the fractal pattern of the
Chinese character in the selected area can be just shown in the full display area. The experimental results demonstrate
that details are shown clearly with the magnification factor being more than 10000.
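
One way to realize such a coefficient modification, shown here only as a sketch and not necessarily the authors' exact procedure, is to conjugate every affine map w of the IFS with the zoom transform T that sends the selected rectangle to the unit display square: the attractor of the maps T∘w∘T⁻¹ is the image under T of the original attractor. The coefficient layout `(a, b, c, d, e, f)` for `(x, y) -> (a*x + b*y + e, c*x + d*y + f)` and the function names are assumptions made for this example.

```python
def apply_map(m, p):
    """Apply the affine map m = (a, b, c, d, e, f) to the point p = (x, y)."""
    a, b, c, d, e, f = m
    x, y = p
    return (a * x + b * y + e, c * x + d * y + f)

def zoom_ifs_map(m, x0, y0, w, h):
    """Return T o m o T^-1, where T maps the rectangle [x0, x0+w] x [y0, y0+h] to the unit square."""
    a, b, c, d, e, f = m
    sx, sy = 1.0 / w, 1.0 / h            # scale factors of T
    tx, ty = -x0 * sx, -y0 * sy          # translation of T
    # Linear part: S * A * S^-1 with S = diag(sx, sy).
    a2, b2 = a, b * sx / sy
    c2, d2 = c * sy / sx, d
    # Translation part: S*t + t_T - (S*A*S^-1) * t_T.
    e2 = sx * e + tx - (a2 * tx + b2 * ty)
    f2 = sy * f + ty - (c2 * tx + d2 * ty)
    return (a2, b2, c2, d2, e2, f2)
```

Rendering the modified IFS and discarding points outside the unit square then displays only the selected area, at the full resolution of the display.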

Traditional forward view synthesis prediction enables the efficient use of depth to provide synthesized frames for texture
reference in non-base layers. However, the high complexity that results from edge detection, hole-filling, up
sampling and down sampling in the forward warping technique offsets this benefit. Hence, backward
view synthesis prediction (BVSP) was proposed to remove these drawbacks while maintaining the performance. However, the fixed
depth block size used in backward view synthesis prediction limits the performance gain and the number of motion
compensation operations, which is a key concern in complexity analysis. In this paper, a block-based BVSP for
inter-layer prediction with only high-level syntax changes is implemented and an adaptive depth block size selection
method is proposed. The experimental results show that an average bitrate reduction of 3.5% is achieved and that, after
enabling adaptive depth block size selection, this gain is largely maintained while the number of motion
compensation operations is reduced to a designated level.

Most of the information in an optical wavefront is encoded in the phase, which carries more details of the object.
Conventional optical measuring apparatus can record the intensity of light relatively easily, but cannot measure the phase
of light directly. Thus it is important to recover the phase from intensity measurements of the object. In recent years,
methods based on quadratic programming such as PhaseLift and PhaseCut have been shown to recover the phase of a general signal
exactly for overdetermined systems. To retrieve the phase of a sparse signal, the Compressive Phase Retrieval (CPR)
algorithm combines the l1-minimization in Compressive Sensing (CS) with the low-rank matrix completion problem in
PhaseLift, but the result is unsatisfactory. This paper focuses on recovering the phase of a sparse signal and proposes a new
method called Compressive Phase Cut Retrieval (CPCR), which combines the CPR algorithm with the PhaseCut
algorithm. To ensure the sparsity of the recovered signal, we first use the CPR method to solve a semi-definite programming
problem. We then apply a linear transformation to the recovered signal and set the phase of the result as the initial
value of the PhaseCut problem. We use TFOCS (a library of Matlab files) to implement the proposed CPCR algorithm in
order to improve the recovered results of the CPR algorithm. Experimental results show that the proposed method
improves the accuracy of the CPR algorithm and overcomes the shortcoming of the PhaseCut method, which cannot
recover sparse signals effectively.

Current research on scannerless three-dimensional imaging LiDAR mainly focuses on phase scannerless imaging
LiDAR, multiple-slit streak tube imaging LiDAR and flash LiDAR. However, these three kinds of LiDAR have, respectively,
the disadvantages of short detection range, a complicated vacuum unit structure and the lack of grayscale
images. In this paper we develop a novel 3D imaging LiDAR that works in pushbroom mode. It
converts the time of flight (TOF) into spatial position with a digital micromirror device (DMD). When a pulse arrives at the DMD, the
micromirrors are switching from one state to another. Because the TOFs of pulses that hit different targets are different, a
streak appears on the focal plane array (FPA) of the sensor, which shows the relative position. The relative position of
the streak can be used to reconstruct the range profile of the target. Compared with other three-dimensional imaging
methods, this new method has the advantages of a high imaging rate, large field of view, simple structure and small size. First,
this article introduces the theory of digital micromirror laser 3D imaging LiDAR, and then it analyses the technical specifications
of the core component. Finally, it gives the process of computing the detection range, theoretically demonstrating the
feasibility of this technology.

The extraction of discriminative and robust feature is a crucial issue in pattern recognition and classification. In this
paper, we propose a kernel based discriminant image filter learning method (KDIFL) for local feature enhancement and
demonstrate its superiority in the application of face recognition. Instead of designing the image filter in a handcrafted or
analytical way, we propose to learn the image filter so that after filtering the within-class difference is attenuated and
the between-class difference is amplified, thus facilitating the subsequent recognition. During filter learning, the kernel trick is
employed to cope with the nonlinear feature space problem caused by expression, pose, illumination, and so on. We
show that the proposed filter is generalized and it can be concatenated with classic feature descriptors (e.g. LBP) to
further increase the discriminability of extracted features. Our extensive experiments on Yale, ORL and AR face
databases validate the effectiveness and robustness of the proposed method.

In this paper, we propose a novel method to recognize human actions using 3D human skeleton joint points. First,
we represent a skeleton pose by a feature vector with three descriptors: limb orientation, joint motion orientation
and body part relation. Then, we mine discriminative local basic motions based on the sequences of feature
vectors. These local basic motions contain the discriminative motions of key joints and can well represent human
actions. Experiments conducted on MSR Action3D Dataset and MSR Daily Activity3D Dataset demonstrate
the effectiveness of the proposed algorithm and a superior performance over the state-of-the-art techniques.

Facial landmark localization is a crucial step in many facial image analysis applications. In this paper, we propose a
combined ASEF (average of synthetic exact filters) and pictorial structure method for facial landmark detection. First,
the local maxima of the ASEF response image for each landmark are extracted as candidates. Then, the ASEF
response of candidates for each landmark and their relative positions are evaluated by the pictorial structure model.
Finally, the combination of candidates with the highest score is selected as the final detection result. We show that by
introducing the position constraint to ASEF, the detection accuracy can be greatly improved. The experimental results on
the BioID dataset verify the efficiency and accuracy of the proposed method.

Among all the existing segmentation techniques, thresholding (e.g. the maximum entropy method, Otsu’s method, and
K-means clustering) is one of the most popular due to its simplicity, robustness, and accuracy.
However, the computation time of these algorithms grows exponentially with the number of thresholds due to their
exhaustive searching strategy. As a population-based optimization algorithm, differential evolution (DE) uses a
population of potential solutions and decision-making processes. It has shown considerable success in solving complex
optimization problems within a reasonable time limit. Thus, applying this method to segmentation should be
a good choice due to its fast computational ability. In this paper, we first propose a new differential evolution algorithm with a
balance strategy, which seeks a balance between the exploration of new regions and the exploitation of the already
sampled regions. Then, we apply the new DE into the traditional Otsu’s method to shorten the computation time.
Experimental results of the new algorithm on a variety of images show that, compared with the EA-based thresholding
methods, the proposed DE algorithm gets more effective and efficient results. It also shortens the computation time of
the traditional Otsu method.

In this paper a pBRDF (polarimetric Bidirectional Reflectance Distribution Function) model of painted surfaces coupled
with atmospheric polarization characteristics is built and the method of simulating polarimetric radiation reaching the
imaging system is advanced. Firstly, the composition of the radiation reaching the sensor is analyzed. Then, the pBRDF
model of painted surfaces is developed according to the microfacet theory presented by G. Priest and the downwelled
skylight polarization is modeled based on the vector radiative transfer model RT3. Furthermore, the modeled
polarization state of light reflected from the surfaces is obtained by integrating the directional polarimetric
information over the whole hemisphere and adding the modeled polarimetric factors of incident diffuse skylight. Finally, the
polarimetric radiance reaching the sensor is summed, with the contribution of the target-sensor path assumed to be
negligible since the path is relatively short in the current imaging geometry. The modeled results depend on the solar-sensor
geometry, atmospheric conditions and the features of the painted surfaces. These results can be used to simulate
imaging under different weather conditions; validation experiments for the model remain as further work.

A three-dimensional shape measurement system based on fiber-optic image bundles is proposed to measure the
three-dimensional shape of objects in confined spaces. Fiber-optic image bundles have the advantage of flexibility.
Firstly, based on the principle of phase-shifting and the advantages of fiber-optic image bundles, the mathematical
model of the measurement system was established, and the hardware and software platform of the system was set up. Then,
the problems of calibration and poor image quality introduced by fiber-optic image bundles were analyzed, after
which a viable solution was proposed. Finally, experiments on objects in confined spaces were performed using
the three-dimensional shape measurement system. As the transmission medium of the system, fiber-optic image
bundles enable flexible image acquisition and projection. The three-dimensional shape of the object was
reconstructed after data processing of images. Experimental results indicated that the system was miniature and
flexible enough to measure the three-dimensional shape of objects in confined space. It expanded the application
range of structured-light three-dimensional shape measurement technique.
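
To make the phase-shifting principle concrete, the sketch below computes the wrapped phase at one pixel from four fringe images. The four-step scheme with 90-degree shifts is an assumption chosen for illustration, since the abstract does not state how many steps the system uses; the function name is likewise hypothetical.

```python
import math

def four_step_phase(i0, i1, i2, i3):
    """Wrapped phase from four intensities with phase shifts of 0, 90, 180 and 270 degrees.

    Assumes the fringe model I_k = A + B * cos(phi + k * pi / 2), k = 0..3,
    so that i3 - i1 = 2B*sin(phi) and i0 - i2 = 2B*cos(phi).
    """
    return math.atan2(i3 - i1, i0 - i2)
```

The background `A` and the modulation `B` cancel out, which is why the method tolerates uneven illumination; the wrapped phase still has to be unwrapped and calibrated before it yields height.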

This paper proposes an efficient fusion method for multiple remote sensing images based on sparse representation, in
which we mainly address the fusion rules for the sparse coefficients. In the proposed fusion method, the first step is to obtain the
sparse coefficients of the different source images based on three dictionaries. Considering the sparsity, the source
coefficients can be divided into large, middle, and small correlation classes. According to the analysis and comparison of
permutations, the final coefficients are fused using different fusion rules according to the correlation. Finally, the
fused image is reconstructed by combining the fused coefficients and the trained dictionaries.

Multi-projector three-dimensional (3D) display is a promising multi-view glasses-free 3D display technology that
can produce full colour high definition 3D images on its screen. One key problem of multi-projector 3D display is how to
acquire the source images of the projector array while avoiding the pseudoscopic problem. This paper first analyzes the display
characteristics of multi-projector 3D display and then proposes a projector content synthesis method using a tetrahedral
transform. A 3D video format based on a stereo image pair and the associated disparity map is presented; it is well suited to
any type of multi-projector 3D display and has the advantage of saving storage. Experimental results show that our method
solves the pseudoscopic problem.

We propose a novel multiple object tracking algorithm in a particle filter framework, where the input is a set of candidate
regions obtained from Robust Principal Component Analysis (RPCA) in each frame, and the goal is to recover
trajectories of objects over time. Our method adapts to the changing appearance of objects, due to occlusion, illumination
changes and large pose variations, by incorporating an l1 minimization-based appearance model into the maximum a
posteriori (MAP) inference. Though L1 trackers have shown impressive tracking accuracy, they are computationally
demanding for multiple object tracking. Conventional data association methods using simple nonparametric appearance
models, such as histogram-based descriptors, may suffer from drastic changes in object appearance. The robust tracking
performance of our approach has been validated with a comprehensive evaluation involving several challenging
sequences and state-of-the-art multiple object trackers.

At present, the PASI scoring system is used for evaluating erythema severity, which can help doctors to diagnose psoriasis
[1-3]. The system relies on the subjective judgment of doctors, so accuracy and stability cannot be guaranteed [4].
This paper proposes a stable and precise algorithm for erythema severity estimation. Our contributions are twofold. On
the one hand, in order to extract the multi-scale redness of erythema, we design a hierarchical feature. Different from
traditional methods, we not only utilize color statistical features, but also divide the detection window into small windows
and extract hierarchical features. Further, a feature re-ranking step is introduced, which guarantees that the extracted
features are unrelated to each other. On the other hand, an adaptive boosting classifier is applied for further feature
selection. During training, the classifier seeks out the most valuable features for evaluating erythema
severity, owing to its strong learning ability. Experimental results demonstrate the high precision and robustness of our
algorithm. The accuracy is 80.1% on a dataset comprising 116 patients’ images with various kinds of erythema.
Our system has now been applied to erythema treatment efficacy evaluation at Union Hospital, China.

The automatic segmentation of psoriatic lesions has been widely researched in recent years. It is an important step
in computer-aided methods of calculating PASI for the estimation of lesions. Current algorithms can
handle only erythema or only scaling segmentation. In practice, scaling and erythema
are often mixed together. In order to segment the lesion area, this paper proposes an
algorithm based on random forests with color and texture features. The algorithm has three steps. In the
first step, polarized light is applied during imaging, based on the skin’s Tyndall effect, to eliminate
reflections, and the Lab color space is used to match human perception. In the second step, a sliding
window and its sub-windows are used to extract texture and color features. In this step, a feature of
image roughness is defined, so that scaling can be easily separated from normal skin. In the last step,
random forests are used to ensure the generalization ability of the algorithm. This algorithm
gives reliable segmentation results even when images differ in lighting conditions and skin types. On the
data set provided by Union Hospital, more than 90% of the images are segmented accurately.

We have developed a complete range-gated laser imaging system with a maximum acquisition distance of ~3 km. The
system uses a Nd:YAG electro-optical Q-switched 532 nm laser as the transmitter and a double micro-channel plate as the gated
sensor; all the components are controlled by a trigger control unit with subnanosecond accuracy. An imaging
scheme is designed for imaging a large building ~500 m away, and a sequence of images is obtained in the
experiment, which provides the basic data for 3D reconstruction. To improve the range resolution, we study the temporal
distribution of the intensity of the received signal and use a centroid algorithm for data processing. We compare the 3D image
with the theoretical model, and the results agree.
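
A minimal sketch of the centroid idea follows, assuming that each pixel yields an intensity sample for every gate delay in the image sequence. The function name, the input layout and the use of the vacuum speed of light are assumptions made for illustration; the abstract does not give these details.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s (atmospheric refraction ignored)

def centroid_range(delays_s, intensities):
    """Estimate the range of one pixel from its intensity versus gate-delay profile.

    The intensity-weighted mean delay is taken as the round-trip time of flight.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("no signal in this pixel")
    t_centroid = sum(t * i for t, i in zip(delays_s, intensities)) / total
    return C * t_centroid / 2.0   # halve the round trip to get the one-way distance
```

Because the centroid interpolates between gate positions, the range estimate can be finer than the delay step between consecutive frames.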

A no-reference image quality assessment method for super-resolution reconstruction is proposed. The basic idea is to
first perform a contourlet multiscale decomposition of the low-resolution image and the reconstructed super-resolution image.
According to the correlation of the contourlet coefficients, the reconstructed image is divided into sharp edges, image
texture and flat regions. Then, the ringing intensity index of the sharp edges, the blur extent index of the image
texture and the directional entropy index of the high frequency components are calculated. Finally, the
reconstructed image quality is evaluated by integrating these indexes into one total image quality index. Several
experimental results using simulated images demonstrate that the new index is efficient and stable for evaluating the quality
of the reconstructed super-resolution image. It agrees well with human subjective vision.

High-resolution real-time three-dimensional imaging is important in 3D video surveillance, robot vision, and
automatic navigation. In this paper, a three-dimensional super-resolution range-gated imaging method based on inter-frame
correlation is proposed to realize high-resolution real-time 3D imaging. In this method, a CCD/CMOS with a gated
image intensifier is used as the image sensor, and depth information collapsed in 2D images is reconstructed by
spatial-temporal inter-frame correlation with a resolution of about 1000×1000 full-frame pixels within a frame.
Furthermore, under inter-frame correlation a 3D point cloud frame is generated at a video rate corresponding to that of the
CCD/CMOS used. Finally, proof-of-concept simulation experiments are presented.

Two new sequential search algorithms for feature selection in hyperspectral remote sensing images are proposed. Since
many wavebands in hyperspectral images are redundant and irrelevant, the use of feature selection to improve
classification results is highly needed. First, we present a new generalized steepest ascent (GSA) feature selection
technique that improves upon the prior steepest ascent algorithm by selecting a better starting search point and
performing a more thorough search. It is guaranteed to provide solutions that equal or exceed those of the classical
sequential forward floating selection algorithm. However, when the number of available wavebands is large, the
computational load required for the GSA algorithm becomes excessive. We thus propose a modification of the improved
floating forward selection algorithm which is more computationally efficient. Experimental results for two hyperspectral
data sets show that our proposed algorithms yield better classification results than other suboptimal search algorithms.
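
For context, the baseline that the floating and steepest-ascent variants refine is plain sequential forward selection. The sketch below shows only that baseline, not the GSA or the modified floating algorithm of the paper; `score` stands for any user-supplied criterion (for example, cross-validated classification accuracy on the selected wavebands) and is a hypothetical parameter.

```python
def sequential_forward_selection(n_features, k, score):
    """Greedily build a subset of k feature indices, adding the best-scoring one each round.

    score(subset) must return a number, higher meaning better; ties go to the lowest index.
    """
    selected = []
    remaining = set(range(n_features))
    while len(selected) < k and remaining:
        # Evaluate every candidate band when appended to the current subset.
        best = max(sorted(remaining), key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Each round costs one criterion evaluation per remaining waveband, which is why more thorough searches become expensive when the number of wavebands is large.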

By analyzing the characteristics of high-resolution images obtained by a high-resolution sensor of fixed
size, a new fast feature point detection method is put forward. Firstly, effective points are detected by sampling at a fixed
step; these are then filtered to obtain extreme feature points, which simplifies the extraction of extreme feature points.
Then, points with neighborhood features are taken as the description of the effective points, and extreme
feature points are obtained through a preset threshold calculation. Finally, correct feature points are obtained by filtering. The
effectiveness of the extraction method was validated by image matching results, which show that the
features extracted by this method ensure precision while reducing computation.

In this paper, we propose a feature-matching-based asymmetric three-dimensional (3D) image coding method with
hierarchical reconstruction quality. At the encoder, standard intra coding is applied to the main view to obtain
high reconstruction quality, while for its neighboring views extracted feature descriptors are used to calculate
the transformation matrix between views. The parameters of the transformation matrix can be transmitted at a very low bit rate
to achieve a preliminary reconstruction. Furthermore, the residues can be exploited to improve the performance. The
experimental results show that the proposed scheme can reach a very high compression ratio.

Underwater laser imaging is of great significance in underwater search, marine science and related fields. However, traditional
underwater laser imaging often suffers from noise and blur, and the image resolution is also
low. In order to obtain clear underwater images with high resolution and quality, we have designed a range-gated
underwater imaging system and implemented an image restoration approach. In this paper, after introducing
the imaging system and the image restoration algorithm, we describe an experiment in which the imaging system is placed
under water in a lake to capture underwater targets. With the proposed underwater image restoration approach,
images of high quality are retrieved, which shows that the method is able to identify a target ~10 meters away
underwater.

This paper presents a high dynamic range (HDR) algorithm based on the HSI color space. The first problem is to
preserve the hue and saturation of the original image while conforming to human visual perception; to this end, the
input image data are converted to the HSI color space, which includes an intensity dimension. The second problem is
to raise the speed of the algorithm; an integral image is used to compute the average intensity around every pixel at
a certain scale as the local intensity component of the image, and the detail intensity component is also computed.
The third problem is to adjust the overall image intensity; an S-shaped curve is obtained from the original image
information, and the local intensity component is adjusted according to this curve. The fourth problem is to
enhance detail information; the detail intensity component is adjusted according to a curve designed in advance. The
final intensity is the weighted sum of the adjusted local intensity component and the adjusted detail intensity
component. Converting this synthesized intensity and the other two dimensions to the output color space yields the
final processed image.
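
The integral-image step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the border handling (windows clipped at the image edge), and the definition of the detail component as intensity minus local mean are all assumptions made for illustration.

```python
def integral_image(img):
    """Summed-area table with an extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def local_mean(img, radius):
    """Mean intensity in a (2*radius+1)^2 window, clipped at the borders.

    Each window sum costs four table lookups, independent of the radius.
    """
    h, w = len(img), len(img[0])
    sat = integral_image(img)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def split_local_and_detail(img, radius):
    """Assumed split: detail = intensity - local mean (not stated in the abstract)."""
    local = local_mean(img, radius)
    detail = [[img[y][x] - local[y][x] for x in range(len(img[0]))]
              for y in range(len(img))]
    return local, detail


intensity = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
local, detail = split_local_and_detail(intensity, radius=1)
```

The point of the summed-area table is that the cost per pixel does not grow with the window size, which is presumably why the abstract cites it as the speed-up.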

In this work, we present a model to calculate the electric amplitude and phase field distribution of a single
nanoparticle using the finite-difference time-domain (FDTD) method. We model the light-nanoparticle interaction by
using linearly polarized light to illuminate the single nanoparticle through immersion oil and a glass substrate. The
illumination is set as a cone of plane waves limited by the aperture of the objective. The scattered field summed on a
single detector is amplified by heterodyne interference with a reference light. The amplitude and phase distributions
of particles with diameters ranging from 50 nm to 2 microns are calculated.

Object recognition has wide applications in the area of human-machine interaction and multimedia retrieval.
However, due to the problems of visual polysemy and concept polymorphism, it is still a great challenge to
obtain reliable recognition results for 2D images. Recently, with the emergence and easy availability of
RGB-D equipment such as Kinect, this challenge can be relieved because the depth channel brings more
information. A very special and important case of object recognition is hand-held object recognition, as the hand is
a direct and natural means of both human-human interaction and human-machine interaction. In this paper,
we study the problem of 3D object recognition by combining heterogeneous features with different modalities
and extraction techniques. Although hand-crafted features preserve low-level information such as shape
and color, they have shown weakness in representing high-level semantic information compared with
automatically learned features, especially deep features. Deep features have shown great advantages in
large-scale dataset recognition but are not always robust to rotation or scale variance compared with hand-crafted
features. We propose a method to combine hand-crafted point cloud features and deep learned features in the
RGB and depth channels. First, hand-held object segmentation is implemented using depth cues and human
skeleton information. Second, we combine the extracted heterogeneous 3D features in different stages using
linear concatenation and multiple kernel learning (MKL). Then a trained model is used to recognize 3D hand-held
objects. Experimental results validate the effectiveness and generalization ability of the proposed method.

Exploiting the adjustable gain multiplier of the electron-multiplying CCD (EMCCD), an image gain adjustment method
based on dynamic gray level is proposed. Compared with a fixed-value adjustment algorithm, the automatic gain
algorithm is more adaptive; even in low-light conditions, it achieves better gain values. Experimental results
show that the automatic gain algorithm, which combines mean values with the dynamic range of histograms, meets the
requirements. Whether during the day or at night, the image brightness quickly converges to the optimum
range of the gray histogram distribution, and the gray-level dynamic range accounts for more than 90%. In the
images obtained, the brightness is moderate and details are clear.

Building on the success of compressive sensing (CS), the coded aperture snapshot spectral imager (CASSI) computationally
obtains 3D spectral images from 2D compressive measurement. In CASSI, each pixel of the detector captures
spectral information only from one voxel in each band with binary weights (i.e., 0 or 1), which limits the variety
of superposition relationship among the 3D voxels in the underlying scene. Moreover, the correspondence of each
pixel of detector to each pixel of coded aperture cannot be readily achieved in the presence of dispersive prism,
due to the small pixel sizes of these elements (often on the order of micrometers). In this paper, we propose a flexible design to
improve the performance of CASSI with currently employed optical elements in CASSI. Specifically, the proposed
design integrates a kind of flexible alignment relationship along the coded aperture, the dispersive prism and
the detector. Each measurement of the detector is the summation of several voxels in each band
with random decimal weights, and different measurements correspond to overlapping voxels, which provides a
richer superposition relationship of the scene information. This flexible design helps the sensing mechanism
better satisfy the requirements of CS theory. Furthermore, the proposed design greatly reduces the alignment
complexity and the burden of system construction. Preliminary results show improved image quality, including
higher PSNR and better perceptual quality, compared with the traditional design.

Hole filling of depth maps is a core technology of Kinect-based visual systems. In this paper, we
propose a hole filling algorithm for Kinect depth maps based on separate repair of the foreground
and background. The proposed algorithm has two parts. First, a fast pre-processing
of the Kinect depth map holes is performed. In this part, we fill the background holes of Kinect depth
maps with the deepest depth image, which is constructed by combining the spatio-temporal information
of the pixels in the Kinect depth map with the corresponding color information in the Kinect color image.
The second part is the enhancement of the pre-processed depth maps. We propose a depth
enhancement algorithm based on the joint information of geometry and color. Since the geometry
information is more robust than the color, we correct the depth by an affine transform prior to utilizing
the color cues. Then we determine the filter parameters adaptively based on the local features of the
color image, which solves the texture copy problem and protects the fine structures. Since L1 norm
optimization is more robust to data outliers than L2 norm optimization, we force the filtered value to be
the solution of an L1 norm optimization. Experimental results show that the proposed algorithm can
protect the intact foreground depth, improve the accuracy of depth at object edges, and eliminate the
flickering of depth at object edges. In addition, the proposed algorithm can effectively fill
the large depth map holes generated by optical reflection.
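
The robustness argument for the L1 norm can be made concrete with a small sketch. For a set of neighboring depth samples with non-negative weights, the value minimizing the weighted sum of absolute deviations is a weighted median, while the weighted sum of squared deviations is minimized by the weighted mean. The function names and the sample values below are hypothetical, and the actual filter weights used by the authors are not given in the abstract.

```python
def weighted_median(values, weights):
    """A minimizer of sum_i w_i * |v - values[i]| (weighted L1 cost)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    acc = 0.0
    for value, weight in pairs:
        acc += weight
        if acc >= half:
            return value
    return pairs[-1][0]


def weighted_mean(values, weights):
    """The minimizer of sum_i w_i * (v - values[i])**2 (weighted L2 cost)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical depth samples in meters; 5.0 is an outlier, e.g. a bad reading.
depths = [1.0, 1.1, 0.9, 1.0, 5.0]
uniform = [1.0] * len(depths)

l1_estimate = weighted_median(depths, uniform)  # stays with the inliers
l2_estimate = weighted_mean(depths, uniform)    # pulled toward the outlier
```

With uniform weights, the L1 estimate stays at an inlier depth while the L2 estimate is dragged toward the single outlier, which is the behavior the abstract relies on.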

Local structure, e.g., local binary pattern (LBP), is widely used in texture classification. However, LBP is too
sensitive to disturbance. In this paper, we introduce a novel structure for texture classification. Research
in cognitive neuroscience indicates that the primary visual cortex presents remarkable orientation selectivity for
visual information extraction. Inspired by this, we investigate the orientation similarities among neighboring pixels,
and propose an orientation selectivity based pattern for local structure description. Experimental results on
texture classification demonstrate that the proposed structure descriptor is quite robust to disturbance.
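
For reference, the baseline LBP and its sensitivity to disturbance can be sketched as follows. This is the basic 8-neighbor LBP, not the orientation selectivity based pattern proposed here; the bit ordering and the `>=` comparison are one common convention and are assumptions of this sketch.

```python
# Clockwise neighbor offsets (dy, dx), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """Basic 8-neighbor LBP: one bit per neighbor that is >= the center pixel."""
    center = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code


flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
noisy = [[4, 5, 5], [5, 5, 5], [5, 5, 5]]  # one neighbor changed by a single gray level

code_flat = lbp_code(flat, 1, 1)
code_noisy = lbp_code(noisy, 1, 1)
```

A change of one gray level in a single neighbor of a flat patch already flips a bit and yields a different code, which illustrates the sensitivity that motivates a more robust structure descriptor.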

Video monitoring systems (VMS) have been extensively applied in target recognition, traffic management, remote sensing, auto navigation and national defence. However, a VMS depends strongly on the weather: in foggy weather, for instance, the quality of the images it receives is distinctly degraded and its effective range is also decreased. In short, a VMS performs poorly in bad weather. Research on the enhancement of fog-degraded images therefore has high theoretical and practical value. A design scheme for a fog-degraded image enhancement system based on the TI DaVinci processor is presented in this paper. The main function of the system is to acquire images captured by digital cameras and execute image enhancement processing to obtain a clear image. The processor used in this system is the dual-core TI DaVinci DM6467T (ARM@500MHz + DSP@1GHz). A MontaVista Linux operating system runs on the ARM subsystem, which handles I/O and application processing. The DSP handles signal processing, and the results are available to the ARM subsystem in shared memory. Thanks to the DaVinci processor, the system provides image processing capability equivalent to that of an x86 computer with lower power consumption and smaller volume. The results show that the system can process images at 25 frames per second at D1 resolution.

To make perspective cues emerge in reconstructed images, a new approach is proposed that incorporates
virtual variable-focal-length lenses into the computer generated Fourier hologram (CGFH). This approach is based
on a combination of the monocular vision principle and digital hologram display, and thus inherits properties of
both display models simultaneously. It can therefore overcome the unsatisfactory visual depth
perception of reconstructed three-dimensional (3D) images in holographic projection display (HPD). Firstly,
an analysis of the characteristics of conventional CGFH reconstruction is made, which indicates that a finite
depth-of-focus and a non-adjustable lateral magnification are the reasons for the lack of depth information on a
fixed image plane. Secondly, the principle of controlling lateral magnification in wave-front reconstructions by
virtual lenses is demonstrated, and a model is deduced relating the depth of the object, the parameters of the
virtual lenses, and the lateral magnification. Next, the focal lengths of the virtual lenses are determined by
considering the perspective distortion of human vision. After employing virtual lenses in the CGFH, the reconstructed
image on the focal plane can deliver the same depth cues as the monocular stereoscopic image. Finally, the
depth-of-focus enhancement produced by a virtual lens and the effect of the virtual lens on the reconstruction
quality are described. Numerical simulation and electro-optical reconstruction experimental results show that the
proposed algorithm can improve the depth perception of the reconstructed 3D image in HPD. The proposed
method provides a possibility of uniting multiple display models to enhance 3D display performance and viewer
experience.

Real images usually have two layers, namely cartoon (the piecewise smooth part of the image) and texture (the
oscillating pattern part of the image). In this paper, we solve challenging image deconvolution problems using a
variational image decomposition method that regularizes the cartoon with total variation and the texture in G space.
Different from existing schemes in the literature which can only recover the smooth structure of the image, our
deconvolution method can not only restore the smooth part of image but also recover the detailed oscillating part of the
image. Numerical simulation examples are given to demonstrate the applicability and usefulness of our proposed
algorithms in image deconvolution.

A new multiview just-noticeable-depth-difference (MJNDD) model is presented and applied to compress joint
multiview video plus depth. Many video coding algorithms remove spatial, temporal and statistical
redundancies, but they are not capable of removing perceptual redundancies. Since the final receptor of video is the
human eye, we can remove perceptual redundancy to gain higher compression efficiency according to the properties
of the human visual system (HVS). The traditional just-noticeable-distortion (JND) model in the pixel domain contains
luminance contrast and spatial-temporal masking effects, and describes perceptual redundancy quantitatively. Because
the HVS is very sensitive to depth information, the MJNDD model is built by combining the traditional JND model
with the just-noticeable-depth-difference (JNDD) model. The texture video is divided into background and
foreground areas using depth information, and different JND threshold values are assigned to these two
parts. The MJNDD model is then used to encode the texture video on JMVC. When encoding the depth video, the JNDD
model is applied to remove block artifacts and protect edges. Then we use VSRS3.5 (View Synthesis Reference
Software) to generate the intermediate views. Experimental results show that our model can tolerate more noise and that
compression efficiency is improved by 25.29 percent on average and by 54.06 percent at most compared with JMVC while
maintaining subjective quality. Hence it achieves a high compression ratio and a low bit rate.

Ballistic missile hyperspectral data, as acquired by an imaging spectrometer on a near-space platform, are generated
by a numerical method. The characteristics of the ballistic missile hyperspectral data are extracted and matched using
two different kinds of algorithms, called transverse counting and quantization coding, respectively. The simulation
results show that both algorithms extract the characteristics of the ballistic missile adequately and accurately. The
algorithm based on transverse counting has lower complexity and is easier to implement than the algorithm based
on quantization coding. The transverse counting algorithm also shows good immunity to disturbance
signals and speeds up the matching and recognition of subsequent targets.

In this paper, an ultrasonic televiewer image encoding method based on block prediction is proposed. The original image
is divided into blocks of 8-by-8 pixels. The current block to be encoded is predicted from previously encoded and
decoded blocks. The prediction mode that minimizes the differences between the original and predicted blocks is chosen
from 9 modes. The prediction difference block is transformed with the Discrete Cosine Transform (DCT), and the DCT
coefficients are quantized and encoded with a lossless algorithm. The selected prediction modes are also encoded.
Experimental results show that the proposed method performs much better than JPEG.
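
The mode selection step can be sketched as below. The abstract does not list the 9 prediction modes or the difference measure, so this sketch assumes three hypothetical modes (DC, vertical, horizontal) and the sum of absolute differences (SAD) as the cost; a 4-by-4 block is used instead of 8-by-8 only to keep the example short. The DCT, quantization and lossless coding stages are not shown.

```python
MODES = ("dc", "vertical", "horizontal")  # hypothetical subset, not the paper's 9 modes


def predict(mode, top, left, size):
    """Predict a size-by-size block from the decoded row above and column to the left."""
    if mode == "vertical":      # copy the row above downwards
        return [list(top) for _ in range(size)]
    if mode == "horizontal":    # copy the left column rightwards
        return [[left[y]] * size for y in range(size)]
    dc = (sum(top) + sum(left)) / (2.0 * size)  # mean of all reference pixels
    return [[dc] * size for _ in range(size)]


def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def best_mode(block, top, left):
    """Return the mode with the smallest SAD and the residual block to be transformed."""
    size = len(block)
    cost, mode = min((sad(block, predict(m, top, left, size)), m) for m in MODES)
    pred = predict(mode, top, left, size)
    residual = [[block[y][x] - pred[y][x] for x in range(size)] for y in range(size)]
    return mode, residual


block = [[1, 2, 3, 4] for _ in range(4)]  # vertical stripes
mode, residual = best_mode(block, top=[1, 2, 3, 4], left=[1, 1, 1, 1])
```

For a block of vertical stripes that continues the row above, the vertical mode predicts the block exactly, so the residual passed on to the transform stage is all zeros.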

New technologies adopted in seismic exploration, such as multi-dimensional, multi-component and high-precision
methods, make seismic exploration data increase explosively. Large volumes of seismic data cause serious problems in
the transmission, storage and processing of the data. In this paper, a seismic data compression method based on the
wavelet transform is proposed. The original data are decomposed into 12 detail sub-bands and 1 low-resolution sub-band
with the 2-dimensional discrete wavelet transform. The wavelet coefficients of the seismic data are encoded with the
embedded zero-tree wavelet coding algorithm. Experimental results show that the proposed method is capable of
efficient compression.
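
The sub-band count follows from the structure of the dyadic 2D transform: each level splits the current low-resolution band into one new low-resolution band and three detail bands, so four levels give 4 x 3 = 12 detail sub-bands plus 1 low-resolution sub-band. The sketch below illustrates this with an unnormalized Haar filter pair, which is an assumption; the abstract does not say which wavelet is used, and the zero-tree coding stage is not shown.

```python
def haar_1d(signal):
    """One Haar analysis step: pairwise averages and pairwise half-differences."""
    half = len(signal) // 2
    avg = [(signal[2 * i] + signal[2 * i + 1]) / 2.0 for i in range(half)]
    diff = [(signal[2 * i] - signal[2 * i + 1]) / 2.0 for i in range(half)]
    return avg, diff


def filter_columns(mat):
    """Apply haar_1d along every column; return (low, high) as row-major matrices."""
    low_cols, high_cols = [], []
    for col in zip(*mat):
        avg, diff = haar_1d(list(col))
        low_cols.append(avg)
        high_cols.append(diff)
    return [list(r) for r in zip(*low_cols)], [list(r) for r in zip(*high_cols)]


def haar_2d_level(block):
    """Split a block with even side lengths into one low band and three detail bands."""
    low_rows, high_rows = [], []
    for row in block:
        avg, diff = haar_1d(row)
        low_rows.append(avg)
        high_rows.append(diff)
    ll, detail_a = filter_columns(low_rows)
    detail_b, detail_c = filter_columns(high_rows)
    return ll, (detail_a, detail_b, detail_c)


def decompose(data, levels=4):
    """Repeatedly split the low band; returns (low-resolution band, list of detail bands)."""
    details = []
    low = data
    for _ in range(levels):
        low, bands = haar_2d_level(low)
        details.extend(bands)
    return low, details


section = [[3.0] * 16 for _ in range(16)]  # hypothetical constant 16-by-16 data block
low, details = decompose(section, levels=4)
```

On constant data all detail coefficients vanish and the single remaining low-resolution coefficient carries the mean, which is the energy compaction that zero-tree coding then exploits.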

A design method for the distortionless catadioptric panoramic imaging system is proposed in this paper. The panoramic
system mainly consists of two parts, a reflecting surface system with relay lens and a CCD camera. A mapping
relationship between the real image plane and the projection surface is established to acquire low-distortion imaging
features easily, and a freeform surface design is applied to the reflecting surfaces to correct distortion. After
iterative optimization of the freeform surfaces, the image quality is gradually improved. The simulation results show
that, compared with the traditional system, the new freeform surface system has a simple design, attains higher
performance, and has small scene distortion, making the image more suitable and convenient for observation.

Predictive Lossy Compression has been found to be an interesting alternative to conventional transform coding
techniques in multispectral image compression. Recently, the High Efficiency Video Coding (HEVC) standard has shown
significant improvement over state-of-the-art transform-based still-image coding standards. In this paper we study the
properties of multispectral image and propose a predictive lossy compression scheme based on HEVC. Empirical
analysis shows that our proposed method is superior to the existing state of the art predictive lossy compression schemes.

Based on the characteristics of a 0.5'' micro AM-OLED and the binocular parallax principle of human vision, an HD stereo
display system was designed using an ARM11 hardware platform with embedded Linux as the operating system. The system
used the S3C6410 as the MCU. A Side-by-Side or Top-and-Bottom 3D video source, input from HDMI or an SD card,
was converted to the Frame Timing Mode and Field Timing Mode video formats by the video coding
algorithm. At the same time, the output 3D synchronization signal controlled the left and right AM-OLEDs to receive the
corresponding parallax images. After magnification by the optical system, the HD stereo video on the dual AM-OLEDs
presented a natural, lifelike scene in front of the user, with an image distance equivalent to 2.5 meters from the
eyes and a diagonal dimension of 46 feet. By combining the synchronization signal with Frame Timing Mode and
Field Timing Mode, the HD binocular stereo system delivered a good display result for users.

In video retargeting, assessing how well temporal coherence is maintained has become a prominent
challenge. In this paper, we present a new objective measurement to assess temporal coherence after video
retargeting. It is a general metric for assessing jittery artifacts in both discrete and continuous video retargeting
methods, and its accuracy is verified by psycho-visual tests. The proposed assessment method is therefore
of considerable practical significance.

Gerchberg–Saxton-type (GS-type) algorithms have been widely applied in photonics to reconstruct the object structures.
However, with random guesses as the initial inputs, the reconstruction quality of GS-type algorithms is unpredictable,
and a large number of iterations is usually needed to reach convergence. In this paper, a singular value decomposition
(SVD) based method is proposed to generate an effective phase guess for GS-type algorithms using a low-rank
approximation. Experimental results demonstrate that, for the same reconstruction error, the proposed SVD-based
guesses reduce the number of iterations by more than 50% on average compared with random guesses. Furthermore,
they outperform random guesses in terms of both steady-state error and number of iterations. Compared with the average
performance of random guesses, the proposed approach reduces the steady-state error of recovered images by 70.7% on
average and reduces the number of iterations by 56.1% on average.

Grey world algorithm is a simple but widely used global white balance method for color cast images. However,
this algorithm only assumes that the mean values of the R, G, and B components tend to be equal, which may
lead to false alarms in some normal images with large single-color backgrounds, for example, images with an
ocean background. Another defect is that the grey world algorithm may cause luminance variations in channels
that have no cast. We note that, though the mean values differ, the standard deviations of the three channels are
supposed to converge in color cast images, which does not hold for the false-alarm images. Based on this discrepancy,
through a mathematical manipulation of both the mean values and the standard deviations of the three channels, a novel
color correction model is proposed by weighting the gain coefficients in the grey world model. All three weighted
gain coefficients in the proposed model tend to 1 on images containing large single-color regions so as to
avoid false alarms. For color cast images, the channel exhibiting the color cast is given a weighted gain coefficient
much less than 1 to correct the cast, while the other two channels are assigned weighted gain coefficients
approximately equal to 1, ensuring that the proposed model has little negative effect on channels with no
color cast. Experiments show that our model presents better performance in color correction.
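
As a point of reference, the unweighted grey world baseline can be sketched as follows. This is the classic algorithm only; the weighting of the gains by means and standard deviations proposed here is not reproduced, since the abstract does not give its formula. The pixel values and function names are hypothetical.

```python
def channel_means(pixels):
    """Mean of each of the R, G, B channels over a list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def grey_world_gains(pixels):
    """Classic grey world: scale every channel so its mean equals the overall grey mean."""
    means = channel_means(pixels)
    grey = sum(means) / 3.0
    return tuple(grey / m for m in means)


def apply_gains(pixels, gains):
    """Multiply each channel by its gain, clipping to the 8-bit maximum."""
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in pixels]


# Hypothetical image with a red cast: the red channel is twice as bright as G and B.
cast = [(200.0, 100.0, 100.0), (100.0, 50.0, 50.0)]
gains = grey_world_gains(cast)
corrected = apply_gains(cast, gains)
corrected_means = channel_means(corrected)
```

In this example the red gain is below 1, but the green and blue gains are above 1 even though those channels carry no cast. That side effect on cast-free channels is the defect the weighted gain coefficients are meant to suppress.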

Spectral video is crucial for monitoring of dynamic scenes, reconnaissance of moving targets, observation and tracking
of living cells, etc. Traditional spectral imaging methods need multiple exposures to capture a full-frame spectral
image, which leads to low temporal resolution and thus diminishes their value for spectral video. The coded aperture
snapshot spectral imaging (CASSI) method, which has emerged in recent years, is suitable for spectral video
acquisition owing to its high-speed snapshot and small number of measurements. Based on CASSI, this paper proposes a
compressive spectral video acquisition method with a double-channel complementary coded aperture. The method can
achieve spectral video with high temporal resolution by directly sampling the 3D spectral scene with a 2D array
sensor in only one snapshot. Furthermore, by combining the double-channel complementary coded aperture in the
compressive measurement with sparsity regularization in the optimization recovery, we obtain higher PSNR and better
visual quality than the single-channel CASSI. Simulation results demonstrate the efficacy of the proposed
method.

Detecting aircraft is important in the field of remote sensing. In past decades, researchers used various approaches
to detect aircraft based on classifiers for the aircraft as a whole. However, with the development of high-resolution
images, the internal structure of aircraft should now also be taken into consideration. To address this issue, a
novel aircraft detection method for satellite images based on a probabilistic topic model is presented. We model
aircraft as connected structural elements rather than features. The proposed method contains two major
steps: 1) use a Cascade-Adaboost classifier to identify the structural elements of aircraft; 2) connect these
structural elements into aircraft, where the relationships between elements are estimated by a hierarchical topic
model. The model places strict spatial constraints on structural elements, which makes it possible to distinguish
between similar features. The experimental results demonstrate the effectiveness of the approach.

Stereo vision is a hot research topic in the fields of computer vision and 3D video display, and disparity map
computation is one of its most crucial steps. A novel constant computational complexity algorithm based on separable
successive weight summation (SWS) is presented. The proposed algorithm eliminates iteration and is independent of the
support area, which saves computation and memory space. A gradient similarity measure is also applied to improve the
original algorithm. Image segmentation and edge detection are used in the stereo matching to accelerate the matching
algorithm and improve its accuracy. The edge image is extracted to reduce the search scope of the stereo matching
algorithm. A dense disparity map is obtained through local optimization. Experimental results show that the algorithm
is efficient and can reduce matching noise and improve matching precision at depth discontinuities and in
low-texture regions.

Fourier ptychographic microscopy (FPM) is a recently developed imaging method, which stitches together a sequence of
low-resolution images in Fourier space in an iterative manner. However, the high-resolution color image super-resolved
by this method always suffers from dispersion when compared with the high-resolution image observed under a
high-magnification lens. In this paper, we propose a new method for super-resolving multi-channel images. Instead of
simply applying the FPM algorithm to the RGB channels separately, the method considers the relationship among the
channels, which is employed to correct the result. Experimental results demonstrate that the dispersion can be
eliminated, compared with the super-resolved multi-channel image obtained with the original FPM algorithm. Besides, the
robustness of Fourier ptychographic imaging is improved and the running time of color image super-resolution is reduced.

A scalable extension design is proposed for High Efficiency Video Coding (HEVC), which can provide temporal, spatial,
and quality scalability. This technique achieves high coding efficiency and error resilience, but increases the
computational complexity. To reduce the complexity of the quality scalable video coding, this paper proposes a fast
mode selection method based on the mode distribution of coding units (CUs). Experiments show that
the proposed algorithm can achieve up to a 63.70% decrease in encoding time with a negligible loss of video quality.

Transient imaging provides a direct view of how light travels in the scene, which leads to exciting applications such as
looking around corners. Low-budget transient imagers, adapted from Time-of-Flight (ToF) cameras, reduce the barrier of
entry for performing research of this new imaging modality. However, the image quality is far from satisfactory due to
the limited resolution of PMD sensors. In this paper, we improve the resolution of transient images by modulating the
illumination. We capture the scene under three linearly independent lighting conditions, and derive a theoretical model
for the relationship between the time-profile and the corresponding 3D details of each pixel. Our key idea is that the light
flight time in each pixel patch is proportional to the cross product of the illuminating direction and the surface normal.
First we capture and reconstruct transient images by Fourier analysis at multiple illumination locations, and then fuse the
data of acquired low-spatial resolution images to calculate the surface normal. Afterwards, we use an optimization
procedure to split the pixels and finally enhance the image quality. We show that we can not only reveal the fine
structure of the object but may also uncover the reflectance properties of different materials. We hope the idea of
utilizing spatial-temporal relations will give new insights to the research and applications of transient imaging.

For typical multi-projector 3D display systems, precise calibration of projectors is extremely important for
ensuring projected images/videos to coincide exactly in the same region of the screen to obtain high quality
3D display experience. Conventional calibration is achieved by adjusting the pose of projectors manually with
the built-in keystone correction function, which is imprecise and time-consuming. In this paper, we propose
an auto-calibration approach using feature detection and matching technique via an uncalibrated camera to
improve both the calibration efficiency and precision. The whole procedure can be finished in minutes, and the
calibration error hardly increases with the number of projectors. Moreover, to the best of our knowledge,
our approach is the first fast auto-calibration approach employed in the multi-projector 3D display systems.

We present a large-scale and glasses-free 3D display system in this paper. The developed prototype consists of a
100-inch display screen and eight synchronized projectors, providing an eight-view glassless 3D experience. The
synchronization and calibration between projectors are well addressed in this paper. Our system is also designed
to be free from vertical stripe noise, which is a major drawback of many other projector-based 3D systems.
Experiment results show that both binocular disparity and motion parallax are well supported. In summary, we
provide a feasible solution for large-scale glasses-free 3D cinemas via multiple projectors.

To restore motion-blurred images caused by various vibrations and attitude variations in remote
imaging, an approach based on the joint transform correlator (JTC) is presented. An auxiliary
high-speed CCD is used to capture image sequences while the prime CCD is imaging during its exposure
period. These image sequences are processed optically by the JTC system, so that the image motion vector
can be effectively detected and the point spread function accurately modeled instantaneously, which greatly
reduces the complexity of the image restoration algorithm. Finally, a simple restoration algorithm is
proposed to restore the blurred image. We have also constructed an image restoration system based on
the joint transform correlator. The experimental results show that the proposed method
greatly improves image quality.

Fizeau interferometry is one of the most important techniques for measuring astronomical objects with high angular
resolution. This paper is part of a series dedicated to research on Fizeau interferometry carried out by the research
team of Shanghai Astronomical Observatory. It is mainly concerned with the simulation of image restoration based on
a Y-type telescope and a segmented-mirror telescope. It is shown that high-resolution images can be obtained using the
RL and OS-EM methods.

Speckle interferometry has been widely used in observational astronomy, especially for binary stars. This paper is
part of a series dedicated to the speckle imaging of binary stars carried out by the research team of Shanghai
Astronomical Observatory. The observation experiments were carried out with a 1.56-m telescope using a speckle
camera, and high-resolution images were reconstructed successfully using speckle interferometry and iterative
shift-and-add. To speed up the computation, we also developed reconstruction software based on GPU
technology and the CUDA programming model; compared with a CPU-based C++ program, the speedup reaches
about 7 times.

As people's quality of life has improved significantly, traditional 2D video technology can no longer meet the
demand for better video quality, which has led to the rapid development of 3D video technology. At the same time,
people want to watch 3D video on portable devices. To this end, we set up a remote stereoscopic
video playback platform. The platform consists of a server and clients. The server transmits video in different formats,
and the client receives the remote video for subsequent decoding and pixel rearrangement. We use
an improved version of Live555 as the video transmission server. Live555 is a cross-platform open source project that provides
streaming media solutions such as the RTSP protocol and supports transmission of multiple video formats. At the
receiving end, we use a player developed in our laboratory. This Android player, which has all the basic functions of
ordinary players and can play normal 2D video, serves as the base for further development. RTSP is also
implemented in this player for network communication. To achieve stereoscopic display, pixel
rearrangement is performed in the player's decoding part. The decoding part is native code called through the JNI interface so that
video frames can be extracted more efficiently. The video formats that we process are left-right, top-bottom and nine-grid.
In the design and development, a number of key technologies from Android application development have been
employed, including wireless transmission, pixel rearrangement and JNI calls. With these key
technologies, the design has been completed. After some updates and optimizations, the video player can
play remote 3D video well anytime and anywhere, meeting users' requirements.
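
The pixel rearrangement step can be illustrated with a small sketch. The abstract does not state which sub-pixel or column pattern the target display needs, so the code below assumes a simple two-view, column-interleaved display and frames that are already decoded into arrays. The function names split_views and interleave_columns are hypothetical, and the nine-grid format is not covered.

```python
import numpy as np

def split_views(frame, layout):
    """Split a decoded frame into (first, second) views.

    layout: "left_right" (side-by-side) or "top_bottom".
    """
    h, w = frame.shape[:2]
    if layout == "left_right":
        return frame[:, : w // 2], frame[:, w // 2 :]
    if layout == "top_bottom":
        return frame[: h // 2], frame[h // 2 :]
    raise ValueError(f"unsupported layout: {layout}")

def interleave_columns(left, right):
    """Assumed display pattern: even columns from the left view, odd from the right."""
    out = np.empty_like(left)
    out[:, 0::2] = left[:, 0::2]
    out[:, 1::2] = right[:, 1::2]
    return out
```

A real player would do this in the native decoding path rather than in Python; the sketch only shows the index arithmetic.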

The goal of phase retrieval, an important topic in optics and image processing, is to recover phase information from an intensity distribution. Algorithms based on the transport of intensity equation (TIE) only need to measure the intensity in the center plane and an adjacent plane of the light field, and reconstruct the phase object by solving a second-order differential equation. The TIE algorithm is derived for a coherent light field. A partially coherent light field is more complex to describe: the field at any point in space experiences statistical fluctuations over time. Therefore, traditional TIE algorithms cannot be applied to calculate the phase of a partially coherent light field. In this thesis, a phase retrieval algorithm is proposed for partially coherent light fields. First, the description and propagation equation of the partially coherent light field are established. Then, the phase is retrieved using a Fourier-transform-based TIE solver. Experimental results with simulated uniform and non-uniform illumination demonstrate the effectiveness of the proposed method in phase retrieval for partially coherent light fields.
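
For orientation, here is a minimal sketch of the classical Fourier-transform TIE solver for the coherent, uniform-intensity case only. It is not the partially coherent algorithm proposed above, whose equations are not given in the abstract. With uniform intensity I0, the TIE reduces to the Poisson equation ∇²φ = -(k/I0) ∂I/∂z, which is inverted in the Fourier domain. The function name tie_phase_uniform, the periodic boundary assumption, and the zero-mean choice for the phase are assumptions of this sketch.

```python
import numpy as np

def tie_phase_uniform(dI_dz, I0, wavelength, pixel_size):
    """Solve laplacian(phi) = -(k / I0) * dI/dz with FFTs (periodic boundaries).

    dI_dz: 2D array, axial intensity derivative, e.g. (I_plus - I_minus) / (2 * dz).
    Returns the phase with its mean set to zero (the TIE cannot fix the offset).
    """
    k = 2.0 * np.pi / wavelength
    ny, nx = dI_dz.shape
    fx = np.fft.fftfreq(nx, d=pixel_size)
    fy = np.fft.fftfreq(ny, d=pixel_size)
    FX, FY = np.meshgrid(fx, fy)
    # Fourier symbol of the Laplacian: -(2*pi)^2 * (fx^2 + fy^2).
    lap = -4.0 * np.pi**2 * (FX**2 + FY**2)
    lap[0, 0] = 1.0  # placeholder to avoid dividing by zero at DC
    phi_hat = np.fft.fft2(-(k / I0) * dI_dz) / lap
    phi_hat[0, 0] = 0.0  # undetermined constant phase offset
    return np.real(np.fft.ifft2(phi_hat))
```

In practice the derivative is estimated from two or more defocused images, and noise at low spatial frequencies usually calls for some regularization, which this sketch omits.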

Real-time monitoring of blood glucose concentration (BGC) is an important procedure in controlling diabetes
mellitus and preventing complications in diabetic patients. Noninvasive measurement of BGC has become a
research hotspot because it avoids the physical and psychological harm caused by invasive measurement. Photoacoustic spectroscopy is a
well-established hybrid technique that can be used to determine the BGC. According to the theory of the photoacoustic
technique, when blood is irradiated by a pulsed laser with nanosecond repetition time and micro-joule energy,
photoacoustic signals containing the BGC information are generated through the thermoelastic mechanism, and the
BGC level can then be derived from the photoacoustic signal by data analysis. In practice, however, the time-resolved
photoacoustic signals of BGC are contaminated by various kinds of noise, e.g., interference from background sounds and
from the multiple components of blood. The quality of the BGC photoacoustic signal directly affects the precision of BGC
measurement. Therefore, an improved wavelet denoising method is proposed to remove the noise contained in BGC
photoacoustic signals. To overcome the shortcomings of traditional wavelet threshold denoising, an improved
dual-threshold wavelet function is proposed in this paper. Simulation results show that the denoising
performance of the improved wavelet method is better than that of the traditional soft and hard threshold functions. To verify the
feasibility of the improved function, actual photoacoustic BGC signals were tested; the results demonstrate that the
signal-to-noise ratio (SNR) obtained with the improved function increases by about 40-80%, and the root-mean-square error (RMSE)
decreases by about 38.7-52.8%.
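
The improved dual-threshold function itself is not specified in the abstract, so it is not reproduced here. As a point of reference, the sketch below shows the two traditional threshold functions it is compared against, together with the SNR and RMSE metrics, in their common textbook forms. The functions operate on wavelet detail coefficients that are assumed to have been computed elsewhere; all function names are illustrative.

```python
import numpy as np

def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t; zero the rest."""
    c = np.asarray(coeffs, dtype=float)
    return np.where(np.abs(c) > t, c, 0.0)

def soft_threshold(coeffs, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    c = np.asarray(coeffs, dtype=float)
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def snr_db(reference, estimate):
    """SNR in dB: reference signal power over residual power."""
    r = np.asarray(reference, dtype=float)
    e = np.asarray(estimate, dtype=float)
    return float(10.0 * np.log10(np.sum(r**2) / np.sum((r - e) ** 2)))

def rmse(reference, estimate):
    """Root-mean-square error between reference and estimate."""
    r = np.asarray(reference, dtype=float)
    e = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((r - e) ** 2)))
```

Hard thresholding is discontinuous at ±t, while soft thresholding biases large coefficients by t; these are the usual shortcomings that modified threshold functions try to balance.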

We present a method for distinguishing a human face from a high-emulation mask, which is increasingly
used by criminals for activities such as stealing card numbers and passwords at ATMs. Traditional
facial recognition techniques have difficulty detecting such camouflaged criminals. In this paper, we use a
high-resolution hyperspectral video capture system to detect high-emulation masks. An RGB camera is
used for traditional facial recognition. A prism and a grayscale camera are used to capture spectral
information of the observed face. Experiments show that a mask made of silica gel has a different spectral
reflectance from human skin. As the multispectral image offers additional spectral
information about physical characteristics, high-emulation masks can be easily recognized.

The performance of high-resolution imaging with large optical instruments is severely limited by atmospheric turbulence. Adaptive
optics (AO) offers real-time compensation for turbulence. However, the correction is often only partial, and image restoration is
required to reach or approach the diffraction limit. Wavelet-based techniques have been applied to the restoration of
atmospheric-turbulence-degraded images. However, wavelets do not restore long edges with high fidelity, while curvelets are challenged by small
features. Loosely speaking, each transform has its own area of expertise, and this complementarity may be of great potential. We therefore
expect that combining different transforms can improve the quality of the result. In this paper, we propose a novel deconvolution
algorithm based on both the wavelet transform and the curvelet transform (NDbWC). It extends previous results obtained
for wavelet-based image restoration. Using these two different transforms in the same algorithm allows us to optimally detect,
at the same time, isotropic features, which are well represented by the wavelet transform, and edges, which are better represented by the curvelet transform.
The NDbWC algorithm performs better than the classical wavelet-regularization method in deconvolving turbulence-degraded images
with low SNR.

The depth quality of a time-of-flight (ToF) camera is influenced by many systematic and non-systematic errors [1]. In this paper we present a simple method to correct and reduce these errors and propose a multi-phase approach to improve the depth acquisition accuracy. Unlike traditional calibration methods, we take the position of the light source into account and calibrate the light source together with the camera to reduce depth distortion. To mitigate the sensor errors introduced in the manufacturing process, a look-up table (LUT) is used to correct pixel-related errors. In addition, we capture images with multiple phases and apply an FFT to obtain the true depth. With the proposed approach, we are able to reconstruct an accurate 3D model with the RMSE of the measured depth below 1.2 mm.
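
The multi-phase step can be sketched as follows, under assumptions that the abstract does not state: a continuous-wave ToF sensor whose N correlation samples per pixel follow s_k = offset + amplitude * cos(phi - 2*pi*k/N), with N >= 3 equally spaced reference phases. The first FFT harmonic along the phase axis then yields the phase delay phi, and depth = c * phi / (4 * pi * f_mod). The function name and the sign convention are illustrative; the light-source calibration and LUT correction described above are not included.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def depth_from_phase_samples(samples, f_mod):
    """Estimate per-pixel depth from N phase-stepped correlation images.

    samples: array of shape (N, height, width), N >= 3, sample k taken with
             reference phase 2*pi*k/N (assumed model, see lead-in).
    f_mod:   modulation frequency in Hz.
    """
    samples = np.asarray(samples, dtype=float)
    # Bin 1 of the DFT along the phase axis equals (N/2) * amplitude * exp(-1j*phi).
    first_harmonic = np.fft.fft(samples, axis=0)[1]
    phase = np.mod(-np.angle(first_harmonic), 2.0 * np.pi)
    return C * phase / (4.0 * np.pi * f_mod)
```

Depths are only unambiguous up to c / (2 * f_mod); using more than four phase steps mainly helps suppress higher harmonics of a non-sinusoidal modulation signal.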

Plasmons induced by topological insulator (TI) Bi2Se3 micro-ribbon arrays have recently been observed experimentally
(Nature Nanotechnology 2013, 8, 556-560). In this letter, the surface plasmons excited by TI Bi2Se3 micro-disk arrays
are investigated by full-wave numerical simulations. The simulation results show that thin Bi2Se3
micro-disk arrays can support dipolar plasmon resonances in the terahertz (THz) regime and that the absorption can be
tuned by the structural parameters. In addition to the plasmon mode, two phonon-mode responses are also observed,
which is consistent with the experimental results for micro-ribbon arrays. Our work further shows that TIs can be good candidates
for plasmonic platforms.

The digital camera has become a necessity in people's lives and is essential in imaging applications, so it is important to
obtain accurate colors with a digital camera. The colorimetric characterization of a digital camera is the basis of image
reproduction and the color management process. One of the traditional methods for deriving a colorimetric mapping between camera
RGB signals and the CIEXYZ tristimulus values is polynomial modeling with 3×11 polynomial transfer
matrices. In this paper, an improved polynomial model is presented, in which the normalized luminance replaces the
camera's inherent RGB values used in traditional polynomial modeling. The improved model can be described as a two-stage
model. In the first stage, the relationship between the camera RGB values and the normalized luminance of the six gray
patches in the X-rite ColorChecker 24-color card is described by a "Gamma" function, and the camera RGB values are converted into
normalized luminance using this Gamma. In the second stage, the traditional polynomial model is modified into a
colorimetric mapping between the normalized luminance and CIEXYZ. In addition, the method was applied in a daylight
lighting environment, where users cannot measure the CIEXYZ of the color target chart with professional instruments but
can still accomplish the colorimetric characterization of the digital camera. The experimental results show that
(1) the proposed method for the colorimetric characterization of digital cameras performs better than traditional
polynomial modeling; and (2) it is feasible to handle the color characteristics with this method under daylight
without professional instruments, and the result can satisfy the requirements of simple applications.
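
The two-stage model can be sketched as a least-squares fit. The abstract does not list the 11 polynomial terms or the exact form of the "Gamma" function, so the code below assumes a commonly used term set (R, G, B, RG, RB, GB, R², G², B², RGB, 1) and a plain power law for the linearization. The function names and the gamma parameter are hypothetical.

```python
import numpy as np

def linearize(rgb, gamma):
    """Stage 1 (assumed power law): camera RGB in [0, 1] -> normalized luminance."""
    return np.power(np.clip(np.asarray(rgb, dtype=float), 0.0, 1.0), gamma)

def poly11(rgb):
    """Expand N x 3 values into the assumed 11 polynomial terms (N x 11)."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.column_stack(
        [r, g, b, r * g, r * b, g * b, r * r, g * g, b * b, r * g * b, np.ones_like(r)]
    )

def fit_matrix(lin_rgb, xyz):
    """Stage 2: least-squares fit of the 3 x 11 transfer matrix from training patches."""
    coeffs, *_ = np.linalg.lstsq(poly11(lin_rgb), xyz, rcond=None)
    return coeffs.T  # shape (3, 11)

def apply_matrix(matrix, lin_rgb):
    """Map linearized camera values to estimated CIEXYZ (N x 3)."""
    return poly11(lin_rgb) @ matrix.T
```

With the 24 patches of a ColorChecker chart there are 24 equations per XYZ channel for 11 unknowns, so the fit is overdetermined; evaluating on patches not used for fitting gives a fairer picture of accuracy than the training error.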