This application paper presents a machine vision solution for the quality inspection of flat glass products. A contact
image sensor (CIS) is used to generate digital images of the glass surfaces. The presented machine vision based quality
inspection at the end of the production line aims to classify five different glass defect types. The defect images are
usually characterized by very little ‘image structure’, i.e. homogeneous regions without distinct image texture.
Additionally, these defect images usually consist of only a few pixels. At the same time the appearance of certain defect
classes can be very diverse (e.g. water drops). We used simple state-of-the-art image features such as histogram-based
features (standard deviation, kurtosis, skewness), geometric features (form factor/elongation, eccentricity, Hu moments) and
texture features (grey level run length matrix, co-occurrence matrix) to extract defect information. The main contribution
of this work lies in the systematic evaluation of various machine learning algorithms to identify appropriate
classification approaches for this specific class of images. To this end, the following machine learning algorithms were
compared: decision tree (J48), random forest, JRip rules, naive Bayes, Support Vector Machine (multi-class), neural
network (multilayer perceptron) and k-Nearest Neighbour. We used a representative image database of 2300 defect
images and applied cross validation for evaluation purposes.

In this paper, we present a stain defect detection algorithm based on differences in window mean brightness. In particular, we use the maximum squared difference of mean brightness between divided windows (MAXWDMS). Window shapes generally affect WDMS values and can make stain images clearly distinguishable. The proposed method consists of three steps: window design, stain localization using MAXWDMS, and setting the WDMS level. The proposed methodology has been successfully used in stain defect detection, achieving good detection rates in both quantitative evaluation and sensitivity estimation. Experimental results show improved detection accuracy and a satisfactory processing time.
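The abstract leaves the exact window statistic unspecified, so the following is a minimal plain-Python sketch of the idea, assuming each window's mean brightness is compared against the average of its four neighbouring windows; the function names and the neighbourhood choice are our own illustration, not the paper's definition.

```python
def window_means(img, win):
    """Mean brightness of each non-overlapping win x win window
    (img is a list of rows of grey values)."""
    h, w = len(img), len(img[0])
    means = []
    for r in range(0, h - win + 1, win):
        row = []
        for c in range(0, w - win + 1, win):
            s = sum(img[r + i][c + j] for i in range(win) for j in range(win))
            row.append(s / (win * win))
        means.append(row)
    return means

def maxwdms(img, win=2):
    """Maximum squared difference between a window mean and the mean of
    its horizontal/vertical neighbours; a high score flags a stain."""
    m = window_means(img, win)
    rows, cols = len(m), len(m[0])
    best = 0.0
    for r in range(rows):
        for c in range(cols):
            nb = [m[r + dr][c + dc]
                  for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                  if 0 <= r + dr < rows and 0 <= c + dc < cols]
            diff = m[r][c] - sum(nb) / len(nb)
            best = max(best, diff * diff)
    return best
```

Thresholding this score (the "WDMS level" in the abstract) then decides whether a stain is present.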

Following current advancements and implementations in the field of machine vision, there seem to be no limits to future developments: computing power constantly increases, new ideas are spreading, and previously challenging approaches are being introduced into the mass market. Within the past decades these advances have had dramatic impacts on our lives. Consumer electronics, e.g. computers or telephones, which once occupied large volumes, now fit in the palm of a hand. To note just a few examples: face recognition was adopted by the consumer market, 3D capturing became cheap, and, thanks to a huge community, software development got easier using sophisticated development platforms. However, there remains a gap between consumer and industrial applications: while the former have to be entertaining, the latter have to be reliable. Recent studies (e.g. VDMA [1], Germany) show a moderately increasing market for machine vision in industry. When industry is asked about its needs, the main challenges for industrial machine vision are simple usage and reliability for the process, quick support, full automation, and self/easy adjustment to changing process parameters ("forget it in the line"). A further big challenge is to support quality control: nowadays the operator has to accurately define the features to be tested for checking the samples. There is an upcoming development to let automated machine vision applications find the essential parameters at a more abstract level (top down). In this work we focus on three current and future topics for industrial machine vision: metrology supporting automation; quality control (inline/atline/offline); and visualization and analysis of datasets with steadily growing sizes. Finally, the general trend from pixel-oriented towards object-oriented evaluation is addressed. We do not directly address the field of robotics taking advantage of advances in machine vision; this fast-changing area is worth a contribution of its own.

We present a light-field multi-line-scan image acquisition and processing system intended for the 2.5/3-D inspection
of fine surface structures, such as small parts, security print, etc. in an industrial environment. The
system consists of an area-scan camera that allows for a small number of sensor lines to be extracted at high
frame rates, and a mechanism for transporting the inspected object at a constant speed. During the acquisition,
the object is moved orthogonally to the camera’s optical axis as well as to the orientation of the sensor lines. In
each time step, a predefined subset of lines is read out from the sensor and stored. Afterward, by collecting all
corresponding lines acquired over time, a 3-D light field is generated, which consists of multiple views of the
object observed from different viewing angles while transported w.r.t. the acquisition device. This structure
allows for the construction of so-called epipolar plane images (EPIs) and subsequent EPI-based analysis in order
to achieve two main goals: (i) the reliable estimation of a dense depth model and (ii) the construction of an
all-in-focus intensity image. Besides specifics of our hardware setup, we also provide a detailed description of
algorithmic solutions for the mentioned tasks. Two alternative methods for EPI-based analysis are compared
based on artificial and real-world data.

Nowadays multimedia projectors of various types are widely used in many areas of human life. A significant part of all projector devices are used in two main areas: education (e.g. lectures) and business (e.g. all sorts of meetings). What makes the projector such a popular device? It provides the user with a high-quality, full-color, large-diagonal image for a much lower price than display devices based on other popular non-projecting technologies. However, the absolute majority of projectors share a significant disadvantage: if a person steps into the beam, two major negative effects appear: a distorted image appears on the person's body, and the person's eyes are dazzled by the bright light. The first effect is undesirable for the audience, and the second is undesirable for the presenter. To avoid the above-mentioned effects, short-throw projectors can be used, but they are more expensive than regular ones and place significant limitations on the placement spot: usually the lens of a short-throw projector is designed so that it has to be placed at a fixed distance from the screen. We propose a different solution to this problem: instead of moving the projector towards the screen, let us black out the person (or other obstacle) from the projection. To do so, our system must know the exact position of the person. In this paper we propose a system with a flexible architecture that realizes the above-mentioned solution.

One of the most popular approaches to detecting lines is based on the Radon transform (RT). But in real-world applications the RT-based approach suffers from noise and clutter, because they decrease the sharpness of the local maxima. In this paper we suggest a new approach to computationally efficient line detection using the Weighted Radon Transform (WRT). The suggested WRT-based approach uses gradient direction information, so only the differences that are perpendicular to the line direction are integrated to form a local maximum corresponding to the line. Theoretical and experimental studies show the effectiveness of WRT-based line detection. The suggested WRT-based algorithm can be efficiently implemented in real-time systems using parallelization and FFT-based techniques.
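As a rough illustration of the idea (not the paper's algorithm), the sketch below weights a discrete Radon-style accumulator by the gradient component along each candidate line's normal, so only intensity differences perpendicular to the line direction contribute to its peak; the discretization and function names are assumptions.

```python
import numpy as np

def weighted_radon(img, angles):
    """Toy Weighted Radon Transform: for each angle, accumulate the
    gradient component along the candidate line's normal, binned by the
    line offset rho.  Only edges oriented across the line direction
    reinforce that line's accumulator peak."""
    gy, gx = np.gradient(img.astype(float))     # gradients along rows, cols
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    out = []
    for th in angles:
        nx, ny = np.cos(th), np.sin(th)         # unit normal of the line
        proj = gx * nx + gy * ny                # gradient along the normal
        rho = np.rint(xs * nx + ys * ny).astype(int)
        acc = np.zeros(rho.max() - rho.min() + 1)
        np.add.at(acc, rho.ravel() - rho.min(), np.abs(proj).ravel())
        out.append(acc)
    return out
```

A vertical step edge produces a strong peak for the angle whose normal is horizontal, and essentially nothing for the orthogonal angle, which is the noise-suppression property the abstract describes.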

Adaptive thresholding is a useful technique for document analysis. In medical image processing, it is also helpful for segmenting structures such as diaphragms or blood vessels. This technique sets a threshold using local information around a pixel, then binarizes the pixel according to that value. Although the technique is robust to changes in illumination, it takes a significant amount of time to compute thresholds because it requires summing all of the neighboring pixels. Integral images can alleviate this overhead; however, medical images, such as ultrasound, often come with image masks, and ordinary algorithms then cause artifacts. The main problem is that the shape of the summing area is not rectangular near the boundaries of the image mask. For example, the threshold at the boundary of the mask is incorrect because masked-out pixels are also counted. Our key idea to cope with this problem is to compute an additional integral image of the image mask in order to count the valid number of pixels. Our method is implemented on a GPU using CUDA, and experimental results show that our algorithm is 164 times faster than a naïve CPU algorithm for averaging.
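A minimal sketch of the mask-aware scheme described above, assuming a square window and a mean-based threshold: one integral image accumulates the masked intensities and a second accumulates the mask itself, so the local mean is taken over valid pixels only. This CPU reference stands in for the paper's CUDA implementation; the names and the `bias` parameter are illustrative.

```python
import numpy as np

def integral(a):
    """Zero-padded 2-D summed-area table."""
    s = a.cumsum(0).cumsum(1)
    return np.pad(s, ((1, 0), (1, 0)))

def adaptive_threshold(img, mask, win=3, bias=0.0):
    """Mask-aware adaptive threshold: the local mean is computed over
    VALID pixels only, via one integral image for the masked intensities
    and one for the mask (the valid-pixel count)."""
    ii = integral(img * mask)
    im = integral(mask.astype(float))
    h, w = img.shape
    out = np.zeros_like(img, dtype=bool)
    r = win // 2
    for y in range(h):
        for x in range(w):
            if not mask[y, x]:
                continue
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            ssum = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
            cnt = im[y1, x1] - im[y0, x1] - im[y1, x0] + im[y0, x0]
            out[y, x] = img[y, x] > ssum / cnt - bias
    return out
```

Near the mask boundary `cnt` falls below `win * win`, which is exactly where a naive integral-image threshold would be biased by the zero-valued masked pixels.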

The joint transform correlator (JTC) technique has shown attractive performance for real-time pattern recognition
applications. Among the various JTC techniques proposed in the literature, the fringe-adjusted JTC (FJTC) yields
remarkable promise for object recognition, and it has been shown that the FJTC produces a better correlation output than
alternate JTCs under varying illumination conditions of the input scene; however, it has been found that the FJTC is not
illumination invariant. Therefore, to alleviate this drawback of the FJTC, an illumination-invariant FJTC, based on a combination of the fringe-adjusted filter (FAF) and the monogenic signal, is presented. The performance of the FJTC and the proposed local-phase-based FJTC technique on unknown input images with varying illumination is investigated and compared. The proposed detection algorithm makes use of the monogenic signal from a two-dimensional object region to extract the local phase information, making the FJTC robust to illumination effects. Experimental results show that utilizing the monogenic phase information enables the FAF-based JTC to produce sharper correlation peaks and a higher peak-to-clutter ratio compared to alternate JTCs. The proposed technique may be used as a real-time region-of-interest identifier in wide-area surveillance for automatic object recognition when the target is under very dark or very bright conditions beyond human vision.

We present a 3D change detection framework designed to support various applications in changing environmental conditions. Previous efforts have focused on image filtering techniques that manipulate the intensity values of the image to create a more controlled, albeit unnatural, illumination. Since most applications require detecting changes in a scene irrespective of the time of day and the present lighting conditions, image filtering algorithms fail to suppress the illumination differences enough for Background Model (BM) subtraction to be effective. Our approach completely eliminates the illumination challenges from the change detection problem. The algorithm is based on our previous work, in which we have shown a capability to reconstruct a surrounding environment at near real-time processing speeds. The algorithm, namely Dense Point-Cloud Representation (DPR), allows for a 3D reconstruction of a scene using only a single moving camera. In order to eliminate any effects of the illumination change, we convert each point-cloud model into a 3D binary voxel grid. A `1' is assigned to voxels containing points from the model, while a `0' is assigned to voxels with no points. We detect the changes between the two environments by volumetrically subtracting the registered 3D binary voxel models. This process is extremely computationally efficient due to the logic-based operations available when handling binary models. We evaluate the 3D change detection framework by experimenting on the same scene with aerial imagery captured at various times.
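The voxel-based subtraction step can be sketched as follows, under the assumption of an axis-aligned grid with a fixed voxel size; registration of the two point clouds is presumed done beforehand. The XOR of the two binary occupancy grids marks voxels occupied in exactly one model.

```python
import numpy as np

def voxelize(points, origin, size, shape):
    """Binary occupancy grid: True where at least one point falls in a
    voxel, False elsewhere.  Points outside the grid are ignored."""
    shape = np.asarray(shape)
    idx = np.floor((np.asarray(points) - origin) / size).astype(int)
    keep = np.all((idx >= 0) & (idx < shape), axis=1)
    grid = np.zeros(shape, dtype=bool)
    grid[tuple(idx[keep].T)] = True
    return grid

def changes(cloud_a, cloud_b, origin, size, shape):
    """Voxels occupied in exactly one model: the logical XOR performs
    the volumetric subtraction in both directions at once."""
    return voxelize(cloud_a, origin, size, shape) ^ \
           voxelize(cloud_b, origin, size, shape)
```

Because both grids are boolean, the subtraction reduces to a single element-wise logic operation, which is the efficiency the abstract refers to.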

High throughput (HT) phenotyping of crops is essential to increase yield in environments deteriorated by climate
change. The controlled environment of a greenhouse offers an ideal platform to study the genotype to phenotype
linkages for crop screening. Advanced imaging technologies are used to study plants’ responses to resource limitations
such as water and nutrient deficiency. Advanced imaging technologies coupled with automation make HT phenotyping
in the greenhouse not only feasible, but practical.
Monsanto has a state-of-the-art automated greenhouse (AGH) facility. Handling of the soil, pots, water and
nutrients is completely automated. Images of the plants are acquired by multiple hyperspectral and broadband
cameras. The hyperspectral cameras cover wavelengths from visible light through short-wave infrared (SWIR). In-house
developed software analyzes the images to measure plant morphological and biochemical properties. We measure
phenotypic metrics like plant area, height, and width as well as biomass. Hyperspectral imaging allows us to measure
biochemical metrics such as chlorophyll, anthocyanin, and foliar water content.
The last 4 years of AGH operations on crops like corn, soybean, and cotton have demonstrated the successful
application of imaging and analysis technologies for high throughput plant phenotyping. Using HT phenotyping,
scientists have shown strong correlations to environmental conditions, such as water and nutrient deficits, as well
as the ability to tease apart distinct differences in the genetic backgrounds of crops.

A key step in many image quantification solutions is feature pooling, where subsets of lower-level features are combined
so that higher-level, more invariant predictions can be made. The pooling region, which defines the subsets, often has a
fixed spatial size and geometry, but data-adaptive pooling regions have also been used. In this paper we investigate
pooling strategies for the data-adaptive case and suggest a new framework for pooling that uses multiple sub-regions
instead of a single region. We show that this framework can help represent the shape of the pooling region and also
produce useful pairwise features for adjacent pooling regions. We demonstrate the utility of the framework in a number
of classification tasks relevant to image quantification in digital microscopy.

Automatic action recognition in videos is a challenging computer vision task that has become an active research area in recent years. Existing strategies usually use kernel-based learning algorithms that consider a simple combination of different features, completely disregarding how such features should be integrated to fit the given problem. Since a given feature is most suitable for describing a given image/video property, adaptive weighting of such features can improve the performance of the learning algorithm. In this paper, we investigated the use of the Multiple Kernel Learning (MKL) algorithm to adaptively search for the best linear relation among the considered features. MKL is an extension of support vector machines (SVMs) that works with a weighted linear combination of several single kernels. This approach allows the weights of the multiple-kernel combination and the underlying SVM parameters to be estimated simultaneously. In order to prove the validity of the MKL approach, we considered a descriptor composed of multiple features aligned with dense trajectories. We evaluated our approach on a database containing 36 cooking actions. Results confirm that the use of MKL improves the classification performance.
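For illustration, the sketch below forms the weighted linear combination of base kernels that MKL operates on. Real MKL learns the weights jointly with the SVM parameters; here they are fixed, and the two base kernels (linear and RBF) are arbitrary choices, not the paper's feature kernels.

```python
import numpy as np

def linear_kernel(X):
    """Gram matrix of the linear kernel k(x, y) = <x, y>."""
    return X @ X.T

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of the Gaussian RBF kernel."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def combined_kernel(X, weights):
    """Weighted linear combination of base kernels, the object MKL
    optimizes over.  The weights are normalized to a convex combination,
    which keeps the result a valid (positive semidefinite) kernel."""
    kernels = [linear_kernel(X), rbf_kernel(X)]
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    return sum(wi * Ki for wi, Ki in zip(w, kernels))
```

The combined Gram matrix can be fed to any SVM solver that accepts precomputed kernels; MKL's contribution is choosing the weights to fit the task rather than fixing them a priori.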

This paper reports the latest development of a color vision technique for detecting colonies of foodborne pathogens grown on agar plates with a hyperspectral image classification model that was developed using full hyperspectral data. The hyperspectral classification model depended on reflectance spectra measured in the visible and near-infrared spectral range from 400 to 1,000 nm (473 narrow spectral bands). Multivariate regression methods were used to estimate and predict hyperspectral data from RGB color values. The six representative non-O157 Shiga-toxin producing Escherichia coli (STEC) serogroups (O26, O45, O103, O111, O121, and O145) were grown on Rainbow agar plates. A line-scan pushbroom hyperspectral image sensor was used to scan 36 agar plates, each grown with pure STEC colonies. The 36 hyperspectral images of the agar plates were divided in half to create training and test sets. The mean R-squared value for hyperspectral image estimation was about 0.98 in the spectral range between 400 and 700 nm for the linear, quadratic and cubic polynomial regression models, and the detection accuracy of the hyperspectral image classification model with principal component analysis and k-nearest neighbors for the test set was up to 92% (99% with the original hyperspectral images). Thus, the results of the study suggest that color-based detection may be viable as a multispectral imaging solution without much loss of prediction accuracy compared to hyperspectral imaging.
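A toy version of the RGB-to-hyperspectral estimation step, assuming a degree-2 polynomial feature expansion of the RGB values and one least-squares regression per spectral band; the paper's exact feature set is not specified, so the expansion below is illustrative.

```python
import numpy as np

def poly_features(rgb):
    """Degree-2 polynomial expansion of RGB triplets, with cross terms.
    The exact expansion used in the paper is an assumption here."""
    r, g, b = rgb.T
    return np.stack([np.ones_like(r), r, g, b,
                     r * r, g * g, b * b, r * g, r * b, g * b], axis=1)

def fit_spectra_from_rgb(rgb, spectra):
    """Least-squares mapping from polynomial RGB features to per-band
    reflectance; all bands are solved jointly in one lstsq call."""
    A = poly_features(rgb)
    coef, *_ = np.linalg.lstsq(A, spectra, rcond=None)
    return coef

def predict_spectra(rgb, coef):
    """Estimate a reflectance spectrum for each RGB sample."""
    return poly_features(rgb) @ coef
```

Swapping `poly_features` for a linear or cubic expansion reproduces the other regression variants the abstract compares.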

Skin prick test is a commonly used method for diagnosis of allergic diseases (e.g., pollen allergy, food allergy, etc.) in allergy clinics. The results of this test are erythema and wheal provoked on the skin where the test is applied. The sensitivity of the patient against a specific allergen is determined by the physical size of the wheal, which can be estimated from images captured by digital cameras. Accurate wheal detection from these images is an important step for precise estimation of wheal size. In this paper, we propose a method for improved wheal detection on prick test images captured by digital cameras. Our method operates by first localizing the test region by detecting calibration marks drawn on the skin. The luminance variation across the localized region is eliminated by applying a color transformation from RGB to YCbCr and discarding the luminance channel. We enhance the contrast of the captured images for the purpose of wheal detection by performing principal component analysis on the blue-difference (Cb) and red-difference (Cr) color channels. Finally, we perform morphological operations on the contrast-enhanced image to detect the wheal on the image plane. Our experiments performed on images acquired from 36 different patients show the efficiency of the proposed method for wheal detection from skin prick test images captured in an uncontrolled environment.
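The colour-space and contrast-enhancement steps can be sketched as follows, assuming BT.601 chrominance coefficients and a projection of each pixel's (Cb, Cr) pair onto the first principal component; the calibration-mark localization and morphological steps are omitted, and the function names are our own.

```python
import numpy as np

def rgb_to_cbcr(img):
    """BT.601 chrominance channels; the luminance channel is discarded
    to suppress shading variation across the skin."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    cb = -0.169 * r - 0.331 * g + 0.500 * b
    cr = 0.500 * r - 0.419 * g - 0.081 * b
    return cb, cr

def pca_contrast(img):
    """Project each pixel's (Cb, Cr) pair onto the first principal
    component, yielding a single high-contrast channel in which the
    wheal can be thresholded."""
    cb, cr = rgb_to_cbcr(img.astype(float))
    X = np.stack([cb.ravel(), cr.ravel()], axis=1)
    X -= X.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return (X @ vt[0]).reshape(cb.shape)
```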

This paper addresses the problem of face recognition using a graphical representation to identify structure that is
common to pairs of images. Matching graphs are constructed where nodes correspond to image locations and edges are
dependent on the relative orientation of the nodes. Similarity is determined from the size of maximal matching cliques in
pattern pairs. The method uses a single reference face image to obtain recognition without a training stage. The Yale
Face Database A is used to compare performance with earlier work on faces containing variations in expression,
illumination, occlusion and pose and for the first time obtains a 100% correct recognition result.

In security applications the human face plays a fundamental role; however, we have to assume non-collaborative subjects. A face can be partially visible or occluded due to common accessories such as sunglasses, hats, scarves and so on. The posture of the head also influences the recognizability of the face. Given an input video sequence, the proposed system is able to establish whether a face is depicted in a frame and to determine its degree of recognizability in terms of clearly visible facial features. The system implements a feature-filtering scheme combined with skin-based face detection to improve its robustness to false positives and cartoon-like faces. Moreover, the system takes into account the recognizability trend over a customizable sliding time window to allow a high-level analysis of the subject's behaviour. The recognizability criteria can be tuned for each specific application. We evaluate our system both in qualitative and quantitative terms, using a data set of manually annotated videos. Experimental results confirm the effectiveness of the proposed system.

Intelligent multi-camera systems that integrate computer vision algorithms are not error free, and thus both false positive and false negative detections need to be revised by a specialized human operator. Traditional multi-camera systems usually include a control center with a wall of monitors displaying videos from each camera of the network. Nevertheless, as the number of cameras increases, switching from one camera to another becomes hard for a human operator.
In this work we propose a new method that dynamically selects and displays the content of a video camera from all the available contents in the multi-camera system. The proposed method is based on a computational model of human visual attention that integrates top-down and bottom-up cues. We believe that this is the first work that tries to use a model of human visual attention for the dynamic selection of the camera view of a multi-camera system.
The proposed method has been evaluated in a given scenario and has demonstrated its effectiveness with respect to other methods and a manually generated ground truth. The effectiveness has been measured in terms of the number of correct best views generated by the method with respect to the camera views manually selected by a human operator.

Military Operations in Urban Terrain (MOUT) require the capability to perceive and to analyze the situation around a
patrol in order to recognize potential threats. A permanent monitoring of the surrounding area is essential in order to
appropriately react to the given situation, where one relevant task is the detection of objects that can pose a threat.
Especially the robust detection of persons is important, as in MOUT scenarios threats usually arise from persons. This
task can be supported by image processing systems. However, depending on the scenario, person detection in MOUT can
be challenging, e.g. persons are often occluded in complex outdoor scenes and the person detection also suffers from low
image resolution. Furthermore, there are several requirements on person detection systems for MOUT such as the
detection of non-moving persons, as they can be a part of an ambush. Existing detectors therefore have to operate on
single images with low thresholds for detection in order to not miss any person. This, in turn, leads to a comparatively
high number of false positive detections which renders an automatic vision-based threat detection system ineffective. In
this paper, a hybrid detection approach is presented. A combination of a discriminative and a generative model is
examined. The objective is to increase the accuracy of existing detectors by integrating a separate hypotheses
confirmation and rejection step which is built by a discriminative and generative model. This enables the overall
detection system to make use of both the discriminative power and the capability to detect partly hidden objects with the
models. The approach is evaluated on benchmark data sets generated from real-world image sequences captured during
MOUT exercises. The extension shows a significant reduction of the false positive detection rate.

Threshold selection using the within-class variance, as in Otsu's method, is generally adequate, yet inappropriate for expressing class statistical distributions. Otsu uses the variance to represent the dispersion of each class, based on the squared distance from the mean to each data point. However, since the optimal threshold is biased toward the larger of the two class variances, variances cannot be used to describe the real class statistical distributions. Therefore, to express the class statistical distributions more accurately, this paper proposes the within-class standard deviation as a criterion for threshold selection; the optimal threshold is then determined by minimizing the within-class standard deviation. Experimental results confirm that the proposed method produces better performance than existing algorithms.
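A histogram-based sketch of the proposed criterion, replacing Otsu's within-class variance w0*σ0² + w1*σ1² with the within-class standard deviation w0*σ0 + w1*σ1 and minimizing it by exhaustive search over thresholds (a plain-Python illustration, not the authors' implementation):

```python
def wcsd_threshold(hist):
    """Threshold minimizing the within-class STANDARD DEVIATION
    w0*sigma0 + w1*sigma1, instead of Otsu's within-class variance
    w0*sigma0^2 + w1*sigma1^2.  hist[i] is the count of grey level i."""
    total = sum(hist)
    best_t, best = 0, float("inf")
    for t in range(1, len(hist)):
        c0 = [(i, n) for i, n in enumerate(hist[:t]) if n]
        c1 = [(i, n) for i, n in enumerate(hist[t:], t) if n]
        if not c0 or not c1:
            continue                      # one class empty: skip
        score = 0.0
        for cls in (c0, c1):
            w = sum(n for _, n in cls)
            mu = sum(i * n for i, n in cls) / w
            var = sum(n * (i - mu) ** 2 for i, n in cls) / w
            score += (w / total) * var ** 0.5
        if score < best:
            best, best_t = score, t
    return best_t
```

Taking the square root before weighting de-emphasizes the class with the larger spread, which is the bias correction the paper argues for.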

In the secondary raw materials and recycling sectors, product quality increasingly represents the key issue to pursue in order to be competitive in an ever more demanding market, where quality standards and product certification play a preeminent role. These goals assume particular importance when recycling actions are applied. Recovered products, resulting from the processing of waste materials and/or dismissed products, are in fact always viewed with a certain suspicion. An adequate response of the industry to the market can only be given through the utilization of equipment and procedures ensuring pure, high-quality production as well as efficient operation and cost. All these goals can be reached by adopting not only more efficient equipment and layouts, but also by introducing new processing logics able to realize full control of the handled material flow streams, fulfilling at the same time: i) easy management of the procedures, ii) efficient use of energy, iii) the definition and set-up of reliable and robust procedures, iv) the possibility to implement network connectivity capabilities aimed at remote monitoring and control of the processes, and v) full data storage, analysis and retrieval. Furthermore, ongoing legislation and regulation require the implementation of recycling infrastructure characterised by high resource efficiency and low environmental impact, both aspects being strongly linked to the original characteristics of the waste materials and/or dismissed products. For these reasons an optimal recycling infrastructure design primarily requires a full knowledge of the characteristics of the input waste.
What was outlined above requires the introduction of a new important concept to apply in solid waste recycling: recycling-oriented characterization, that is, the set of actions addressed to strategically determine selected attributes in order to get goal-oriented data on waste for the development, implementation or improvement of recycling strategies. The problems arising when suitable HyperSpectral Imaging (HSI) based procedures have to be developed and implemented for solid waste product characterization, in order to define time-efficient compression and interpretation techniques, are thus analyzed and discussed in the following. Particular attention was also devoted to defining an integrated hardware and software (HW and SW) platform able to perform a non-intrusive, non-contact and real-time analysis, embedding a core of analytical logics and procedures usable both at laboratory and industrial scale. Several case studies, concerning waste plastic products, are presented and discussed.

In this paper, we study a method for eye gaze tracking that provides gaze estimation from a standard webcam with a zoom lens and reduces the setup and calibration requirements for new users. Specifically, we have developed a gaze estimation method based on the relative locations of points on the top of the eyelid and the eye corners. The gaze estimation method in this paper is based on the distances between the top point of the eyelid and the eye corners detected by correlation filters. Advanced correlation filters were found to provide facial landmark detections that are accurate enough to determine the subject's gaze direction to within approximately 4-5 degrees, although calibration errors often produce a larger overall shift in the estimates. This corresponds approximately to a circle of diameter 2 inches on a screen at arm's length from the subject. At this accuracy it is possible to determine which regions of text or images the subject is looking at, but it falls short of being able to determine which word the subject has looked at.

In this paper we propose an automated solution that compensates for CMYK inkjet printer output non-uniformities. The solution is applied to a printer employing fixed printhead arrays. The algorithm initially pre-processes the scanned image of the printed output to isolate the desired region of interest. The region-of-interest information is then used to extract non-uniformities across the entire printed area. Finally, the algorithm concludes with a calibration step that enables compensation of the identified non-uniformities and provides the desired tonal target response.