The detection and classification of underwater mines is an important task, with strong implications for the safety and security of ports, harbors and the open seas. In recent years, the ability to successfully address this challenge has improved with the significant advances made in synthetic aperture sonar (SAS) imaging systems. An autonomous underwater vehicle (AUV) equipped with a state-of-the-art SAS system can provide imagery of the underwater environment with resolution on the order of a centimeter while covering more than a square kilometer per hour.

Nevertheless, even with the advent of the higher-resolution SAS systems, the task of mine classification when provided with only a single view of an object remains a challenge. This difficulty can be attributed to the relative similarity in appearance between mines and benign clutter objects in SAS imagery. For example, if a mine is observed at a certain orientation, it may be difficult to distinguish it from a rock. However, viewing the mine from a second orientation may reveal previously obscured characteristics that differentiate it.

In general, the information accrued from multiple views of an object should translate into a reduced false alarm rate, and hence, improved mine classification performance. This article describes a method of collecting multiview data for the classification of underwater mines.

Attempting to perform underwater mine classification using multiple SAS views is a relatively immature area of research. The NATO Undersea Research Centre (NURC) previously investigated an approach based on finding the maximum correlation between a set of views of a training shape and a testing shape. Classification was then performed by using this measure of similarity, termed the affinity, directly. Another approach NURC tested involved a partially observable Markov decision process to perform multiview classification of underwater objects.

The multiview mine classification technique has been demonstrated with data collected by NURC during a sea trial in 2008 conducted off the coast of Latvia, referred to as COLOSSUS 2.

(Above) Examples of a single-view (left), double-view (middle) and triple-view (right) image of a cylinder (top row), of the template for a cylinder (middle row) and of a mine-sized rock (bottom row). The cylinder templates are correlated with the images, generating model-based classification scores.

(Below) Fusion of several views of a cylindrical target. The top row shows four independent views and the main image presents the fusion result after fine coregistration.

New Approach Using Multiple Views
Recently, NURC analyzed a substantially different approach based on fusing multiple views of an object into a single image. Classification of the object is then performed by comparing the fused image to a library of simulated image templates of targets of interest.

This approach is particularly well suited to the underwater mine classification problem for several reasons. For one, it fully exploits the recent advances in SAS systems by focusing on the detailed shape information that high-resolution imagery provides. But more importantly, the approach also overcomes the unique challenges presented by the general underwater mine classification problem that cause single-view classification approaches to fail. An additional advantage is that the data about a possible contact can be presented to a human operator for visual analysis.

The fact that the image of an asymmetric mine is highly aspect-dependent increases the need for more training data. For example, the image of a broadside cylinder will look very different from the image of an end-fire cylinder. In general, the relative paucity of training data on different targets at different ranges, at different aspects and in different site conditions contributes to the difficulty of building a robust classifier.

The multiview imaging approach combats these challenges by obviating the need for explicit feature extraction and classifier construction. Moreover, the proposed multiview classification method places no constraint on the number of views that can be handled.

The proposed approach has the further advantage that fusing the views into a single image allows all of the information contained in the multiple views to be exploited simultaneously. However, even with the high-resolution imagery provided by the SAS and knowledge of the relative target-sensor orientation at which each view is obtained, performing the requisite image fusion is not trivial.

Multiview Fusion and Classification
Several approaches can be used for fusing available target views. The first choice to be made is whether to combine the single-view images into one multiview image or to treat the images separately. NURC recently developed a method to fuse several views into a single view using sonar inversion. This method produces a high-resolution multiview image of almost photographic quality that gives an accurate clue to the object's shape and that can be visually evaluated by a human operator. Additionally, a by-product of the method is an approximation of the 3D shape of the target, generated with shape-from-shading techniques, which can be presented to the operator or used to extract 3D classification features.

Following SAS image formation, the classification processing chain goes through a number of steps. First, in image registration, the multiple views are associated and coregistered in order to find a common reference frame. This is followed by image fusion—combining the data in the images—and then feature extraction, which will provide optimal discriminative information. The final step, classification, is the actual process that produces the object class and a classification score for each object of interest based on the feature values (or score) obtained.
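The processing chain described above can be sketched in Python. This is a minimal illustration, not NURC's implementation: the registration step is a placeholder, the correlation feature is a whole-image zero-mean normalized correlation, and the threshold value is purely illustrative.

```python
import numpy as np

def register(views):
    # Placeholder for fine coregistration: in practice each view is
    # rotated and shifted into a common reference frame.
    return [np.asarray(v, dtype=float) for v in views]

def fuse(views):
    # Pixel-wise maximum across the registered views.
    return np.maximum.reduce(views)

def extract_feature(fused, template):
    # Zero-mean normalized correlation with a target template,
    # retained as the single discriminative feature.
    f = fused - fused.mean()
    t = template - template.mean()
    denom = np.linalg.norm(f) * np.linalg.norm(t)
    return float((f * t).sum() / denom) if denom > 0 else 0.0

def classify(score, threshold=0.5):
    # Threshold on the correlation feature; 0.5 is illustrative only.
    return "target" if score >= threshold else "clutter"
```

The four functions mirror the four steps of the chain: registration, fusion, feature extraction and classification.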

Exemplary SAS images showing a patch of seafloor of about 40 by 40 meters containing potential mines. The case displayed on the left generates a false-alarm rate orders of magnitude higher than the case on the right.

Image Registration. After SAS image formation has been completed, an automated detection algorithm is applied to flag all potential mine-like objects. The detector provides a coarse, initial reduction of the data, with the alarm rate dictated by the capability of the subsequent classifier. The objective of the image registration is then to associate all the snippets corresponding to the same object, even if the georeferencing is not perfect and the snippets look very different.

The sonar system analyzed here, a SAS operated at a frequency of 300 kilohertz with a bandwidth of 60 kilohertz, is installed on NURC's MUSCLE AUV. The resolution of the images is two centimeters in range and three centimeters in cross-range. The single-view images are first converted from slant-range to ground-range coordinates. Next, the images are rotated to enable the overlay based on the orientation of the sonar measured with a compass.
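The slant-range-to-ground-range conversion mentioned above can be sketched as follows, under the simplifying assumption (not stated in the article) of a flat seafloor and a known vehicle altitude:

```python
import numpy as np

def slant_to_ground(slant_range, altitude):
    # Flat-seafloor geometry: the ground range is the horizontal leg of
    # a right triangle whose hypotenuse is the slant range and whose
    # vertical leg is the vehicle altitude (same units throughout).
    slant_range = np.asarray(slant_range, dtype=float)
    return np.sqrt(np.maximum(slant_range**2 - altitude**2, 0.0))
```

Ranges shorter than the altitude are clamped to zero ground range rather than producing an invalid square root.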

The registration itself is not simple. It is known that overlaying multiple images by using 2D cross-correlation and obtaining the required 2D shift from the peak position in the cross-correlation output does not give the desired result; such a method would match the highlights of every image pair, which is undesirable since each view is insonified from a different direction. A more advanced method to cope with the aspect-dependent response is to estimate the 3D target shape (with facets) from each single view using shape-from-shading techniques. These shapes are not necessarily accurate, but they do allow for incorporating both directionality and intensity for each target pixel. When correlation is applied, only the overlapping information is used, i.e., pixels that correspond to facets oriented similarly in both single-view images.
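The idea of correlating only similarly oriented facets can be sketched as a masked correlation. Here the facet orientations are assumed to come from a prior shape-from-shading step, and the 30-degree agreement threshold is an illustrative choice, not a value from the article:

```python
import numpy as np

def masked_correlation(img_a, img_b, orient_a, orient_b, max_diff_deg=30.0):
    # Keep only pixels whose facet orientations (radians) agree to
    # within max_diff_deg; the wrap-around difference handles angles
    # on either side of +/- pi.
    diff = np.abs(np.angle(np.exp(1j * (orient_a - orient_b))))
    mask = diff <= np.deg2rad(max_diff_deg)
    if mask.sum() < 2:
        return 0.0  # no overlapping information between the views
    a = img_a[mask] - img_a[mask].mean()
    b = img_b[mask] - img_b[mask].mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

Unlike a plain 2D cross-correlation, this score ignores pixels insonified from incompatible directions, which is the property the registration method relies on.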

Image Fusion. After the single-view images are successfully registered, their pixel values can be combined. One approach is based on the mean operator, so that each pixel in the fused image is the average of the single-view pixels. For a second fusion method, the mean is replaced by the maximum operator, which has the advantage that strong highlights in each single-view image are preserved. This latter fusion approach also suppresses shadows, which are unreliable for classification in complex environments. Information obtained from shadows has also been shown to be strongly dependent on target range and seafloor characteristics, a variability that is reduced with this approach.
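The two fusion operators are straightforward once the views are coregistered onto a common pixel grid; a minimal sketch:

```python
import numpy as np

def fuse_mean(views):
    # Each fused pixel is the average of the coregistered
    # single-view pixels.
    return np.mean(views, axis=0)

def fuse_max(views):
    # Per-pixel maximum: preserves each view's strongest highlights and
    # suppresses shadows, since a dark shadow pixel in one view is
    # replaced by a brighter return from another view.
    return np.max(views, axis=0)
```

The input is a stack of views with identical shape; axis 0 indexes the view.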

Model-Based Classification. The classification score is obtained by matching ideal, noise-free image templates of various mine shapes with the fused image. The templates are generated with a validated NURC sonar model called SIGMAS for which the mine shape, sonar-object geometry and sonar settings are inputs. The better the template of a mine matches the fused image, the more probable it is that the latter corresponds to an actual mine.

Robust classification requires templates for each mine type one expects to encounter. Furthermore, asymmetric mines require templates at a set of rotations. Other parameters, such as bottom slope, bottom type, target tilt or burial, were shown to have a much lower impact on performance. In the case of classifying cylinders, the template bank consists of 16 cylinder templates at different orientations. The model matching is achieved via cross-correlation of the template and the fused image, with the resulting peak correlation value retained as the lone feature associated with that template. Used in this way, the feature can be regarded as a classification score, and it is referred to as such hereafter. The following example illustrates an attempt to discriminate a cylinder from a rock.
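The peak-correlation score against a bank of rotated templates can be sketched directly. This is a brute-force normalized cross-correlation, not the SIGMAS-based matching itself; function names and the exhaustive search are illustrative:

```python
import numpy as np

def correlation_score(image, template):
    # Slide the zero-mean template over the fused image and keep the
    # peak zero-mean normalized correlation as the score.
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    H, W = image.shape
    h, w = template.shape
    best = 0.0
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            patch = image[i:i + h, j:j + w]
            p = patch - patch.mean()
            denom = np.linalg.norm(p) * t_norm
            if denom > 0:
                best = max(best, float((p * t).sum() / denom))
    return best

def bank_score(image, template_bank):
    # Score against every template (e.g., the 16 cylinder rotations)
    # and keep the best match.
    return max(correlation_score(image, t) for t in template_bank)
```

In practice this correlation would be computed via the FFT rather than by explicit sliding, but the resulting score is the same.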

Using only a single view, the maximum correlation coefficient is 0.64, but for the image resulting from fusing two views, it is 0.70. When the image of the rock, a typical representative for a false alarm, is correlated with the cylinder templates, the correlation coefficients are much lower: 0.29 for the single-view case and 0.27 for the dual-view case. This first example is promising and appears to achieve the objective, which is to obtain good correlation for the correct target and poor correlation for the false alarm. In this case the separation improves with a second view. The performance gain of a classifier based on this score can be evaluated by generating receiver operating characteristic (ROC) curves.
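Generating an ROC curve from such scores amounts to sweeping a threshold and recording detection and false-alarm rates. A minimal sketch, using the correlation values quoted above purely as a toy data set:

```python
import numpy as np

def roc_points(target_scores, clutter_scores):
    # Sweep the decision threshold over every observed score, from
    # highest to lowest, and record the probability of detection (pd)
    # and probability of false alarm (pfa) at each threshold.
    target_scores = np.asarray(target_scores, dtype=float)
    clutter_scores = np.asarray(clutter_scores, dtype=float)
    thresholds = np.unique(np.concatenate([target_scores,
                                           clutter_scores]))[::-1]
    pfa = np.array([(clutter_scores >= th).mean() for th in thresholds])
    pd = np.array([(target_scores >= th).mean() for th in thresholds])
    return pfa, pd
```

With target scores of 0.64 and 0.70 against clutter scores of 0.29 and 0.27, the curve passes through the ideal operating point: full detection at zero false alarms.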

NURC conducted an extensive sensitivity analysis on simulated data of rocks and cylinders at different ranges and aspect angles in order to quantify the benefits of each additional view. The model-based correlation score that was used for the classification was shown to increase and also to become less aspect-dependent as more views were considered. The study resulted in ROC curves that revealed classification performance as a function of range, aspect and number of views.

Of particular interest was the finding that adding a third view results in a major classification improvement. False-alarm rates decrease by more than an order of magnitude after adding a second view, by more than three orders of magnitude when adding a third view and by another two orders of magnitude for the fourth view. In the real world, however, such an improvement should not be expected, owing to imperfect image registration and to variability in targets and false alarms.

Overall, the results are promising, and NURC will continue this direction of research. Given the demonstrated success with real data, this multiview fusion approach is likely a viable solution for classification in high-clutter environments.

Conclusions and Future Work
NURC has investigated the added value of the proposed multiview mine classification approach using extensive sets of simulated data.

It has been shown that classification performance improves significantly when adding a second, third and fourth view; however, the value of additional views beyond four is rather limited. Another study has shown that target classification performance worsens as the complexity of the seabed increases because of the false alarms generated by seabed features.

Additionally, when the seabed is characterized by sand ripples, the appearance of the ripples in the resulting imagery exhibits a strong dependence on the orientation of the data-collection survey route.

Therefore, future work will assess the merits of the proposed multiview classification approach on data containing complex seabeds.

Dr. Johannes Groen joined the NATO Undersea Research Centre's mine countermeasures project as a senior scientist in 2006. For this project, he works on synthetic aperture sonar and automatic target recognition. He earned an M.Sc. degree in mathematics in 1998 and a Ph.D. in physics from the Delft University of Technology in 2006.

Dr. Enrique Coiras worked on the MUSCLE autonomous underwater vehicle's synthetic aperture sonar data and image processing at the NATO Undersea Research Centre for four years, leaving NURC in August to join the European Union's Satellite Centre to work on remote sensing topics. He holds a Ph.D. in physics.

Dr. David P. Williams has been a scientist in the mine countermeasures group at the NATO Undersea Research Centre since 2007. His research interests lie in the fields of machine learning, pattern recognition and statistical signal processing. He earned his B.S.E., M.S. and Ph.D. degrees in electrical engineering from Duke University in 2002, 2003 and 2006.

Sea Technology is read worldwide in more than 110 countries by management, engineers, scientists and technical personnel working in industry, government and educational research institutions. Readers are involved with oceanographic research, fisheries management, offshore oil and gas exploration and production, undersea defense including antisubmarine warfare, ocean mining and commercial diving.