Seeing through water

Transcription

1 Seeing through water lexei. Efros School of omputer Science arnegie Mellon University Pittsburgh, P 15213, U.S.. Volkan Isler, Jianbo Shi and Mirkó Visontai Dept. of omputer and Information Science University of Pennsylvania Philadelphia, P 1914 bstract We consider the problem of recovering an underwater image distorted by surface waves. large amount of video data of the distorted image is acquired. The problem is posed in terms of finding an undistorted image patch at each spatial location. This challenging reconstruction task can be formulated as a manifold learning problem, such that the center of the manifold is the image of the undistorted patch. To compute the center, we present a new technique to estimate global distances on the manifold. Our technique achieves robustness through convex flow computations and solves the leakage problem inherent in recent manifold embedding techniques. 1 Introduction onsider the following problem. pool of water is observed by a stationary video camera mounted above the pool and looking straight down. There are waves on the surface of the water and all the camera sees is a series of distorted images of the bottom of the pool, e.g. igure 1. The aim is to use these images to recover the undistorted image of the pool floor as if the water was perfectly still. esides obvious applications in ocean optics and underwater imaging [1], variants of this problem also arise in several other fields, including astronomy (overcoming atmospheric distortions) and structure-from-motion (learning the appearance of a deforming object). Most approaches to solve this problem try to model the distortions explicitly. In order to do this, it is critical not only to have a good parametric model of the distortion process, but also to be able to reliably extract features from the data to fit the parameters. s such, this approach is only feasible in well understood, highly controlled domains. On the opposite side of the spectrum is a very simple method used in underwater imaging: simply, average the data temporally. lthough this method performs surprisingly well in many situations, it fails when the structure of the target image is too fine with respect to the amplitude of the wave (igure 2). In this paper we propose to look at this difficult problem from a more statistical angle. We will exploit a very simple observation: if we watch a particular spot on the image plane, most of the time the picture projected there will be distorted. ut once in a while, when the water just happens to be locally flat at that point, we will be looking straight down and seeing exactly the right spot on the ground. If we can recognize when this happens uthors in alphabetical order.

2 igure 1: ifteen consecutive frames from the video. The experimental setup involved: a transparent bucket of water, the cover of a vision textbook omputer Vision/ Modern pproach. igure 2: Ground truth image and reconstruction results using mean and median and snap the right picture at each spatial location, then recovering the desired ground truth image would be simply a matter of stitching these correct observations together. In other words, the question that we will be exploring in this paper is not where to look, but when! 2 Problem setup Let us first examine the physical setup of our problem. There is a ground truth image G on the bottom of the pool. Overhead, a stationary camera pointing downwards is recording a video stream V. In the absence of any distortion V (x,y,t) = G(x,y) at any time t. However, the water surface refracts in accordance with Snell s Law. Let us consider what the camera is seeing at a particular point x on the D array, as shown in igure 3(c) (assume 1D for simplicity). If the normal to the water surface directly underneath x is pointing straight up, there is no refraction and V (x) = G(x). However, if the normal is tilted by angle θ 1, light will bend by the amount θ 2 = θ 1 sin 1 ( sinθ 1), so the camera point V (x) will see the light projected from G(x + dx) on the ground plane. It is easy to see that the relationship between the tilt of the normal to the surface θ 1 and the displacement dx is approximately linear (dx.25θ 1 h using small angle approximation, where h is the height of the water). This means that, in 2D, what the camera will be seeing over time at point V (x,y,t) are points on the ground plane sampled from a disk centered at G(x,y) and with radius related to the height of the water and the overall roughness of the water surface. similar relationship holds in the inverse direction as well: a point G(x,y) will be imaged on a disk centered around V (x,y). What about the distribution of these sample points? ccording to ox-munk Law [2], the surface normals of rough water are distributed approximately as a Gaussian centered around the vertical, assuming a large surface area and stationary waves. Our own experiments, conducted by hand-tracking (igure 3b), confirm that the distribution, though not exactly Gaussian, is definitely unimodal and smooth. Up to now, we only concerned ourselves with infinitesimally small points on the image or the ground plane. However, in practice, we must have something that we can compute with. Therefore, we will make an assumption that the surface of the water can be locally approximated by a planar patch. This means that everything that was true for points is now true for local image patches (up to a small affine distortion).

3 3 Tracking via embedding rom the description outlined above, one possible solution emerges. If the distribution of a particular ground point on the image plane is unimodal, then one could track feature points in the video sequence over time. omputing their mean positions over the entire video will give an estimate of their true positions on the ground plane. Unfortunately, tracking over long periods of time is difficult even under favorable conditions, whereas our data is so fast (undersampled) and noisy that reliable tracking is out of the question (igure 3(c)). However, since we have a lot of data, we can substitute smoothness in time with smoothness in similarity for a given patch we are more likely to find a patch similar to it somewhere in time, and will have a better chance to track the transition between them. n alternative to tracking the patches directly (which amounts to holding the ground patch G(x, y) fixed and centering the image patches V (x+dx t,y+dy t ) on top of it in each frame), is to fix the image patch V (x,y) in space and observe the patches from G(x + dx t,y + dy t ) appearing in this window. We know that this set of patches comes from a disk on the ground plane centered around patch G(x,y) our goal. If the disk was small enough compared to the size of the patch, we could just cluster the patches together, e.g. by using translational EM [3]. Unfortunately, the disk can be rather large, containing patches with no overlap at all, thus making only the local similarity comparisons possible. However, notice that our set of patches lies on a low-dimensional manifold; in fact we know precisely which manifold it s the disk on the ground plane centered at G(x,y)! So, if we could use the local patch similarities to find an embedding of the patches in V (x,y,t) on this manifold, the center of the embedding will hold our desired patch G(x,y). The problem of embedding the patches based on local similarity is related to the recent work in manifold learning [4, 5]. asic ingredients of the embedding algorithms are: defining a distance measure between points, and finding an energy function that optimally places them in the embedding space. The distance can be defined as all-pairs distance matrix, or as distance from a particular reference node. In both cases, we want the distance function to satisfy some constraints to model the underlying physical problem. The local similarity measure for our problem turned out to be particularly unreliable, so none of the previous manifold learning techniques were adequate for our purposes. In the following section we will describe our own, robust method for computing a global distance function and finding the right embedding and eventually the center of it. Surface θ 1 N h θ 2 G(x) G(x + dx) (a) (b) (c) igure 3: (a) Snell s Law (b)-(c) Tracking points of the bottom of the pool: (b) the tracked position forms a distribution close to a Gaussian, (c): a vertical line of the image shown at different time instances (horizontal axis). The discontinuity caused by rapid changes makes the tracking infeasible. 4 What is the right distance function? Let I = {I 1,...,I n } be the set of patches, where I t = V (x,y,t) and x = [x min,x max ],y = [y min,y max ] are the patch pixel coordinates. Our goal is to find a center patch to represent the set I. To achieve this goal, we need a distance function

4 d : I I IR such that d(i i,i j ) < d(i i,i k ) implies that I j is more similar to I i than I k. Once we have such a measure, the center can be found by computing: I = arg min d(i i,i j ) (1) I i I Unfortunately, the measurable distance functions, such as Normalized ross orrelation (N) are only local. common approach is to design a global distance function using the measurable local distances and transitivity [6, 4]. This is equivalent to designing a global distance function of the form: { dlocal (I d(i i,i j ) = i,i j ), if d local (I i,i j ) τ (2) d transitive (I i,i j ), otherwise. I j I where d local is a local distance function, τ is a user-specified threshold and d transitive is a global, transitive distance function which utilizes d local. The underlying assumption here is that the members of I lie on a constraint space (or manifold) S. Hence, a local similarity function such as N can be used to measure local distances on the manifold. n important research question in machine learning is to extend the local measurements into global ones, i.e. to design d transitive above. One method for designing such a transitive distance function is to build a graph G = (V,E) whose vertices correspond to the members of I. The local distance measure is used to place edges which connect only very similar members of I. fterwards, the length of pairwise shortest paths are used to estimate the true distances on the manifold S. or example, this method forms the basis of the well-known Isomap method [4]. Unfortunately, estimating the distance d transitive (, ) using shortest path computations is not robust to errors in the local distances which are very common. onsider a patch that contains the letter and another one that contains the letter. Since they are different letters, we expect that these patches would be quite distant on the manifold S. However, among the patches there will inevitably be a very blurry that would look quite similar to a very blurry producing an erroneous local distance measurement. When the transitive global distances are computed using shortest paths, a single erroneous edge will singlehandedly cause all the patches to be much closer to all the patches, short-circuiting the graph and completely distorting all the distances. Such errors lead to the leakage problem in estimating the global distances of patches. This problem is illustrated in igure 4. In this example, our underlying manifold S is a triangle. Suppose our local distance function erroneously estimates an edge between the corners of the triangle as shown in the figure. fter the erroneous edge is inserted, the shortest paths from the top of the triangle leak through this edge. Therefore, the shortest path distances will fail to reflect the true distance on the manifold. 5 Solving the leakage problem Recall that our goal is to find the center of our data set as defined in Equation 1. Note that, in order to compute the center we do not need all pairwise distances. ll we need is the quantity d I (I i ) = I j I d(i i,i j ) for all I i. The leakage problem occurs when we compute the values d I (I i ) using the shortest path metric. In this case, even a single erroneous edge may reduce the shortest paths from many different patches to I i changing the value of d I (I i ) drastically. Intuitively, in order to prevent the leakage problem we must prevent edges from getting involved in many shortest path computations to the same node (i.e. leaking edges). We can formalize this notion by casting the computation as a network flow problem.

5 Let G = (V,E) be our graph representation such that for each patch I i I, there is a vertex v i V. The edge set E is built as follows: there is an edge (v i,v j ) if d local (I i,i j ) is less than a threshold. The weight of the edge (v i,v j ) is equal to d local (I i,i j ). To compute the value d I (I i ), we build a flow network whose vertex set is also V. ll vertices in V {v i } are sources, pushing unit flow into the network. The vertex v i is a sink with infinite capacity. The arcs of the flow network are chosen using the edge set E. or each edge (v j,v k ) E we add the arcs v j v k and v k v j. oth arcs have infinite capacity and the cost of pushing one unit of flow on either arc is equal to the weight of (v j,v k ), as shown in igure 4 left (top and bottom). It can easily be seen that the minimum cost flow in this network is equal to d I (I i ). Let us call this network which is used to compute d I (I i ) as NW(I i ). The crucial factor in designing such a flow network is choosing the right cost and capacity. omputing the minimum cost flow on NW(I i ) not only gives us d I (I i ) but also allows us to compute how many times an edge is involved in the distance computation: the amount of flow through an edge is exactly the number of times that edge is used for the shortest path computations. This is illustrated in igure 4 (box ) where d 1 units of cost is charged for each unit of flow through the edge (u,w). Therefore, if we prevent too much flow going through an edge, we can prevent the leakage problem. v Error : Shortest Path d 1/ u w d 1 : onvex low d 3/ d u 2/c 2 w d 1/c 1 c 1 c 1 + c 2 u v d/ v Error : Shortest Path with apacity d 1/c 1 u w c 1 w igure 4: The leakage problem. Left: Equivalence of shortest path leakage and uncapacitated flow leakage problem. ottom-middle: fter the erroneous edge is inserted, the shortest paths from the top of the triangle to vertex v go through this edge. oxes -:lternatives for charging a unit of flow between nodes u and w. The horizontal axis of the plots is the amount of flow and the vertical axis is the cost. ox : Linear flow. The cost of a unit of flow is d 1 ox : onvex flow. Multiple edges are introduced between two nodes, with fixed capacity, and convexly increasing costs. The cost of a unit of flow increases from d 1 to d 2 and then to d 3 as the amount of flow from u to w increases. ox : Linear flow with capacity. The cost is d 1 until a capacity of c 1 is achieved and becomes infinite afterwards. One might think that the leakage problem can simply be avoided by imposing capacity constraints on the arcs of the flow network (igure 4, box ). Unfortunately, this is not very easy. Observe that in the minimum cost flow solution of the network NW(I i ), the amount of flow on the arcs will increase as the arcs get closer to I i. Therefore, when we are setting up the network NW(I i ), we must adaptively increase the capacities of arcs closer to the sink v i otherwise, there will be no feasible solution. s the structure of the graph G gets complicated, specifying this notion of closeness becomes a subtle issue. urther, the structure of the underlying space S could be such that some arcs in G must indeed

6 carry a lot of flow. Therefore imposing capacities on the arcs requires understanding the underlying structure of the graph G as well as the space S which is in fact the problem we are trying to solve! Our proposed solution to the leakage problem uses the notion of a convex flow. We do not impose a capacity on the arcs. Instead, we impose a convex cost function on the arcs such that the cost of pushing unit flow on arc a increases as the total amount of flow through a increases. See igure 4, box. This can be achieved by transforming the network NW(I i ) to a new network NW (I i ). The transformation is achieved by applying the following operation on each arc in NW(I i ): Let a be an arc from u to w in NW(I i ). In NW (I i ), we replace a by k arcs a 1,...,a k. The costs of these arcs are chosen to be uniformly increasing so that cost(a 1 ) < cost(a 2 ) <... < cost(a k ). The capacity of arc a k is infinite. The weights and capacities of the other arcs are chosen to reflect the steepness of the desired convexity (igure 4, box ). The network shown in the figure yields the following function for the cost of pushing x units of flow through the arc: { d1 x, if x c 1 cost(x) = d 1 c 1 + d 2 (x c 1 ), if c 1 x c 2 d 1 c 1 + d 2 (c 2 c 1 ) + d 3 (x c 1 c 2 ), if c 2 x The advantage of this convex flow computation is twofold. It does not require putting thresholds on the arcs a-priori. It is always feasible to have as much flow on a single arc as required. However, the minimum cost flow will avoid the leakage problem because it will be costly to use an erroneous edge to carry the flow from many different patches. 5.1 ixing the leakage in Isomap s noted earlier, the Isomap method [4] uses the shortest path measurements to estimate a distance matrix M. fterwards, M is used to find an embedding of the manifold S via MDS. s expected, this method also suffers from the leakage problem as demonstrated in igure 5. The top-left image in igure 5 shows our ground truth. In the middle row, we present an embedding of these graphs computed using Isomap which uses the shortest path length as the global distance measure. s illustrated in these figures, even though isomap does a good job in embedding the ground truth when there are no errors, the embedding (or manifold) collapses after we insert the erroneous edges. In contrast, when we use the convex-flow based technique to estimate the distances, we recover the true embedding even in the presence of erroneous edges (igure 5 bottom row). 6 Results In our experiments we used 8 image frames to reconstruct the ground truth image. We fixed 3 3 size patches in each frame at the same location (see top of igure 7 for two sets of examples), and for every location we found the center. The middle row of igure 7 shows embeddings of the patches computed using the distance derived from the convex flow. The transition path and the morphing from selected patches (,,) to the center patch () is shown at the bottom. The embedding plot on the left is considered an easier case, with a Gaussian-like embedding (the graph is denser close to the center) and smooth transitions between the patches in a transition path. The plot to the right shows a more difficult example, when the embedding has no longer a Gaussian shape, but rather a triangular one. lso note that the transitions can have jumps connecting non-similar patches which are distant in the embedding space. The two extremes of the triangle represent the blurry patches, which are so numerous and (3)

7 Ground Truth Isomap [4] onvex flow igure 5: Top row: Ground truth. fter sampling points from a triangular disk, a knn graph is constructed to provide a local measure for the embedding (left). dditional erroneous edges and are added to perturb the local measure (middle, right). Middle row: Isomap embedding. Isomap recovers the manifold for the error-free cases (left). However, all-pairs shortest path can leak through and, resulting a significant change in the embedding. ottom row: onvex flow embedding. onvex flow penalized too many paths going through the same edge correcting the leakage problem. The resulting embedding is more resistant to perturbations in the knn graph. very similar to each other, so that they are no longer treated as noise or outliers. This results in folding in the embedding and thus, moving estimated the center towards the blurry patches. To solve this problem, we introduced additional two centers, which ideally would represent the blurry patches, allowing the third center to move to the ground truth. Once we have found the centers for all patches we stitched them together to form the complete reconstructed image. In case of three centers, we use overlapping patches and dynamic programming to determine the best stitching. igure 6 shows the reconstruction igure 6: omparison of reconstruction results of different methods using the first 8 frames, top: patches stitched together which are closest to mean (left) and median (right), bottom: our results using a single (left) and three (right) centers result of our algorithm compared to simple methods of taking the mean/median of the patches and finding the closest patch to them. The bottom row shows our result for a single and for three center patches. The better performance of the latter suggests that the two new centers relieve the correct center from the blurry patches. or a graph with n vertices and m edges, the minimum cost flow computation takes O(mlog n(m + nlog n)) time, therefore finding the center I of one set of patches can be done in O(mnlog n(m + nlog n)) time. Our flow computation is based on the min-cost max-flow implementation by Goldberg [7]. The convex function used in our experiments was as described in Equation 3 with parameters d 1 = 1, c 1 = 1, d 2 = 5, c 2 = 9, d 3 = 5.

Solving Simultaneous Equations and Matrices The following represents a systematic investigation for the steps used to solve two simultaneous linear equations in two unknowns. The motivation for considering

Chapter 14: Production Possibility Frontiers 14.1: Introduction In chapter 8 we considered the allocation of a given amount of goods in society. We saw that the final allocation depends upon the initial

Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition

EXPERIMENT 8 To Determine the Wavelength of Sodium Light using Newton s Rings Introduction Please read additional instructions on the bench for setting up the PC and camera for this experiment Newton s

A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method

Optical Flow as a property of moving objects used for their registration Wolfgang Schulz Computer Vision Course Project York University Email:wschulz@cs.yorku.ca 1. Introduction A soccer game is a real

Euclidean Geometry Students are often so challenged by the details of Euclidean geometry that they miss the rich structure of the subject. We give an overview of a piece of this structure below. We start

Vision based Vehicle Tracking using a high angle camera Raúl Ignacio Ramos García Dule Shu gramos@clemson.edu dshu@clemson.edu Abstract A vehicle tracking and grouping algorithm is presented in this work

Interpolating and Approximating Implicit Surfaces from Polygon Soup 1. Briefly summarize the paper s contributions. Does it address a new problem? Does it present a new approach? Does it show new types

Single Image 3D Reconstruction of Ball Motion and Spin From Motion Blur An Experiment in Motion from Blur Giacomo Boracchi, Vincenzo Caglioti, Alessandro Giusti Objective From a single image, reconstruct:

The Calculus of Functions of Several Variables Section. Introduction to R n Calculus is the study of functional relationships and how related quantities change with each other. In your first exposure to

A network flow algorithm for reconstructing binary images from discrete X-rays Kees Joost Batenburg Leiden University and CWI, The Netherlands kbatenbu@math.leidenuniv.nl Abstract We present a new algorithm

The student will be able to: Geometry and Measurement 1. Demonstrate an understanding of the principles of geometry and measurement and operations using measurements Use the US system of measurement for

Appendix H H.Calculating Normal Vectors This appendix describes how to calculate normal vectors for surfaces. You need to define normals to use the OpenGL lighting facility, which is described in Chapter

2.2. Creaseness operator 31 2.2 Creaseness operator Antonio López, a member of our group, has studied for his PhD dissertation the differential operators described in this section [72]. He has compared

Mathematics on the Soccer Field Katie Purdy Abstract: This paper takes the everyday activity of soccer and uncovers the mathematics that can be used to help optimize goal scoring. The four situations that

MA 323 Geometric Modelling Course Notes: Day 02 Model Construction Problem David L. Finn November 30th, 2004 In the next few days, we will introduce some of the basic problems in geometric modelling, and

Geometric Camera Parameters What assumptions have we made so far? -All equations we have derived for far are written in the camera reference frames. -These equations are valid only when: () all distances

2.2. Instantaneous Velocity toc Assuming that your are not familiar with the technical aspects of this section, when you think about it, your knowledge of velocity is limited. In terms of your own mathematical

. INNER PRODUCT SPACES.. Definition So far we have studied abstract vector spaces. These are a generalisation of the geometric spaces R and R. But these have more structure than just that of a vector space.

Calculus with Analytic Geometry I Exam 10 Take Home part Textbook, Section 47, Exercises #22, 30, 32, 38, 48, 56, 70, 76 1 # 22) Find, correct to two decimal places, the coordinates of the point on the

Correlation between OATS, Fully Anechoic Room and GTEM Radiated Emissions Stephen Clay Introduction: Just a few words about myself. My name is Steve Clay and I work for Nokia Mobile Phones, and it is my

Measuring the Earth s Diameter from a Sunset Photo Robert J. Vanderbei Operations Research and Financial Engineering, Princeton University rvdb@princeton.edu ABSTRACT The Earth is not flat. We all know

Proceedings of the Third International WLT-Conference on Lasers in Manufacturing 2005,Munich, June 2005 Integrated sensors for robotic laser welding D. Iakovou *, R.G.K.M Aarts, J. Meijer University of

The Earth Is Not Flat: An Analysis of a Sunset Photo Robert J. Vanderbei Operations Research and Financial Engineering Princeton University Princeton, N.J., 08544 rvdb@princeton.edu Abstract: If the Earth

Finding Equations of Sinusoidal Functions From Real-World Data **Note: Throughout this handout you will be asked to use your graphing calculator to verify certain results, but be aware that you will NOT

Team: Projectile Motion So far you have focused on motion in one dimension: x(t). In this lab, you will study motion in two dimensions: x(t), y(t). This 2D motion, called projectile motion, consists of

Course 1 Laboratory Second Semester Experiment: Young s Modulus 1 Elasticity Measurements: Young Modulus Of Brass 1 Aims of the Experiment The aim of this experiment is to measure the elastic modulus with

Chapter 8 Graphs and Functions: Cartesian axes, coordinates and points 8.1 Pictorially we plot points and graphs in a plane (flat space) using a set of Cartesian axes traditionally called the x and y axes

Grade 8 Expressions and Equations Essential Questions: 1. How do you use patterns to understand mathematics and model situations? 2. What is algebra? 3. How are the horizontal and vertical axes related?

1 2 The calibration problem was discussed in details during lecture 3. 3 Once the camera is calibrated (intrinsics are known) and the transformation from the world reference system to the camera reference

A comparison of the OpenGIS TM Abstract Specification with the CIDOC CRM 3.2 Draft Martin Doerr ICS-FORTH, Heraklion, Crete Oct 4, 2001 1 Introduction This Mapping has the purpose to identify, if the OpenGIS

Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools

EXPERIMENT 9 Diffraction Gratings 1. How a Diffraction Grating works? Diffraction gratings are optical components with a period modulation on its surface. Either the transmission (or the phase) changes

TopHat StableTop Beamshaper The top-hat beam shaper is a diffractive optical element (DOE) used to transform a near-gaussian incident laser beam into a uniform-intensity spot of either round, rectangular,

Difference Equations to Differential Equations Section.7 Rolle s Theorem and the Mean Value Theorem The two theorems which are at the heart of this section draw connections between the instantaneous rate

2. THE x-y PLANE 2.1. The Real Line When we plot quantities on a graph we can plot not only integer values like 1, 2 and 3 but also fractions, like 3½ or 4¾. In fact we can, in principle, plot any real

Tracking Moving Objects In Video Sequences Yiwei Wang, Robert E. Van Dyck, and John F. Doherty Department of Electrical Engineering The Pennsylvania State University University Park, PA16802 Abstract{Object

MATH 23: SYSTEMS OF LINEAR EQUATIONS Systems of Linear Equations In the plane R 2 the general form of the equation of a line is ax + by = c and that the general equation of a plane in R 3 will be we call

2.3 Scheduling jobs on identical parallel machines There are jobs to be processed, and there are identical machines (running in parallel) to which each job may be assigned Each job = 1,,, must be processed

Natural Neighbour Interpolation DThe Natural Neighbour method is a geometric estimation technique that uses natural neighbourhood regions generated around each point in the data set. The method is particularly

A Reliability Point and Kalman Filter-based Vehicle Tracing Technique Soo Siang Teoh and Thomas Bräunl Abstract This paper introduces a technique for tracing the movement of vehicles in consecutive video