New Registration Method Improves Video Analysis

Nov 02, 2012 Katie Carr, ADSC

Almost every athlete knows the importance of "watching tape." Videotaping has been an essential training tool for coaches and athletes for decades; however, recent developments in computer vision research at the Advanced Digital Sciences Center in Singapore by Bernard Ghanem and Narendra Ahuja could revolutionize the way coaches train athletes.

Ghanem and Ahuja's research looks at human actions and interactions in video and creates a way to analyze these complex movements and patterns.

"Our system will help coaches understand the large numbers of video clips of sports they have," explains Ahuja, a Professor of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. "Coaches could more speedily and reliably extract statistics of different sports events, such as fumbles and goals."

Additionally, their research will allow coaches to interpret those statistics across many plays. Ahuja adds that coaches could use their system to understand the "conditions under which a quarterback adopts a certain play strategy, [or] decipher any regular patterns in the way a team responds to a certain play by another team and the weaknesses of specific players under specific play settings."

Using video registration, a coach could follow the trajectory of a player, as shown in this example from an American football game. The image on the left shows players' locations using colored boxes; after registering each video frame, the coach could see the players' respective trajectories on the right.

There are benefits for individual players in their training routines, as well. Ahuja points out that "a player could quickly access a summary of his own performance over a period and use it to focus on aspects that he needs to modify."

Ghanem and Ahuja's research is based upon the concept of video registration, which is the ability to take a video and create a map or coordinate system of the changing scenes. For example, if there are three different views of a school classroom, it's possible to register the photos together to create one complete view of the room. Since videos are simply a sequence of still images, video registration is the process of determining how video images fit together into a bigger scene.
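To make the idea concrete, here is a minimal sketch of what "fitting frames into a bigger scene" can look like in code. It assumes each frame's mapping into the shared scene is a 3x3 projective transform (a homography), a common model in computer vision that the researchers are not quoted as naming. The matrices, the landmark coordinates, and the function name `apply_homography` are all hypothetical values chosen for illustration.

```python
def apply_homography(H, point):
    """Map an image point (x, y) into scene coordinates with a 3x3 matrix H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

# Hypothetical frame-to-scene transforms for two frames of a panning camera.
# Frame 0 defines the scene coordinates; frame 1 was shot after the camera
# panned 40 pixels to the right.
H_FRAME0 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
H_FRAME1 = [[1.0, 0.0, 40.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# The same fixed landmark (say, a yard marker) as it appears in each frame.
landmark_in_frame0 = (100.0, 50.0)
landmark_in_frame1 = (60.0, 50.0)

scene0 = apply_homography(H_FRAME0, landmark_in_frame0)
scene1 = apply_homography(H_FRAME1, landmark_in_frame1)
print(scene0, scene1)  # both frames place the landmark at the same scene point
```

Once every frame has such a transform, anything seen in any frame can be placed on one common map of the scene.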

According to Ahuja, registration is a fundamental problem in computer vision that is central to many capabilities, such as stereo reconstruction analysis, multi-view understanding, and object recognition.

While the task of video registration is not new in the field of computer vision, the current standard method, which has been used for at least five years, assumes the scene being captured is not dynamic, meaning there is not a lot of movement in the scene. However, in real life, videos are often taken of highly dynamic scenes from cameras that are panning, tilting, or zooming.

"To analyze the motion of an object, you need to separate out the motion of the camera," said Ghanem, a Senior Research Scientist at ADSC and a graduate of the University of Illinois with a Ph.D. in Electrical and Computer Engineering. "You have to make sure that any motion in the video is from the object alone. There can be no camera movement whatsoever."

Ghanem and Ahuja, along with their team of five other researchers and two interns, have been using video registration to separate the motion of a moving object in a video from the apparent movement due to the pan, tilt, or zoom of a camera. Ghanem says this step is fundamental and must be accomplished before any further processing of the video, such as tracking the movement of a specific person, can be done.

"By doing that, we can render a video where the only motion is object motion," Ghanem said. "Only then can you do reliable tracking."
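A toy example shows why this separation matters for tracking. The sketch below assumes the simplest possible camera motion, a pure sideways pan whose offset is already known for each frame; real registration must estimate a richer transform from the video itself. All numbers and the helper `registered_trajectory` are hypothetical.

```python
def registered_trajectory(image_positions, camera_offsets):
    """Add each frame's camera offset to the object's position in that frame."""
    return [
        (x + dx, y + dy)
        for (x, y), (dx, dy) in zip(image_positions, camera_offsets)
    ]

# Hypothetical cumulative pan: the camera moves 10 pixels right per frame.
camera_offsets = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]

# A stationary object drifts left in the image only because the camera pans.
stationary_in_image = [(200.0, 80.0), (190.0, 80.0), (180.0, 80.0), (170.0, 80.0)]

# A runner moving right at 15 pixels per frame appears to move at only 5.
runner_in_image = [(50.0, 120.0), (55.0, 120.0), (60.0, 120.0), (65.0, 120.0)]

stationary_on_field = registered_trajectory(stationary_in_image, camera_offsets)
runner_on_field = registered_trajectory(runner_in_image, camera_offsets)

print(stationary_on_field)  # the stationary object no longer moves
print(runner_on_field)      # the runner's true 15-pixel steps are recovered
```

After compensation, the stationary object stays put and the runner's full speed appears, which is the condition a tracker needs.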

Another problem lies in how the standard technique handles mistakes. It detects points and lines in a video and matches them with those in similar images, then uses a random sampling method called RANSAC (Random Sample Consensus) to weed out erroneous matches. Unfortunately, RANSAC is strict in its assumptions, which leads to more errors, especially when the scene is dynamic. For example, it assumes that the average number of errors to expect in each video is known in advance.
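For readers curious about how RANSAC works, the following is a simplified sketch, not the standard registration pipeline itself. It assumes the camera motion between two frames is a pure 2D shift, so a single match is enough to propose a candidate. The function name, the `threshold` and `iterations` settings, and the sample data are hypothetical; those two settings are typically where prior assumptions about the errors enter.

```python
import random

def ransac_translation(matches, threshold=2.0, iterations=200, seed=0):
    """Estimate a 2D shift from point matches, some of which may be wrong.

    matches: list of ((x1, y1), (x2, y2)) pairs between two frames.
    Returns (estimated_shift, inlier_matches).
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # A shift is fixed by one match, so sample a single match at random.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Keep the matches that agree with this candidate shift.
        inliers = [
            m for m in matches
            if abs((m[1][0] - m[0][0]) - dx) <= threshold
            and abs((m[1][1] - m[0][1]) - dy) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the estimate by averaging over the largest consensus set.
    n = len(best_inliers)
    shift_x = sum(m[1][0] - m[0][0] for m in best_inliers) / n
    shift_y = sum(m[1][1] - m[0][1] for m in best_inliers) / n
    return (shift_x, shift_y), best_inliers

# Hypothetical data: eight background points shifted by (12, -3) by the camera,
# plus three matches on moving players that do not follow the camera shift.
background = [(10.0, 20.0), (40.0, 35.0), (75.0, 12.0), (120.0, 60.0),
              (15.0, 90.0), (200.0, 44.0), (160.0, 81.0), (95.0, 70.0)]
matches = [((x, y), (x + 12.0, y - 3.0)) for x, y in background]
matches += [((50.0, 50.0), (80.0, 55.0)),
            ((130.0, 30.0), (110.0, 40.0)),
            ((180.0, 70.0), (225.0, 62.0))]

shift, inliers = ransac_translation(matches)
print(shift, len(inliers))
```

With few moving players the background wins the vote easily. When most of the frame is in motion, a fixed threshold and iteration budget become much harder to choose.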

Ghanem and Ahuja's technique assumes the errors will be sparse, but makes no assumption about how many there are. This relaxed assumption makes their new method more flexible and therefore more accurate.
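A small numerical illustration, which is not the group's algorithm, hints at why a sparse-error assumption can help. Minimizing the sum of absolute errors (an L1 criterion, solved here by the median) tolerates a handful of gross errors without being told how many there are, while minimizing squared errors (the mean) is pulled off target. The measurements below are hypothetical.

```python
from statistics import mean, median

# Hypothetical horizontal shifts (in pixels) measured between two frames.
# Eight come from the static background; three gross errors come from
# points on moving players.
shifts = [12.0] * 8 + [30.0, -20.0, 45.0]

# Least-squares estimate: the mean minimizes the sum of squared errors.
l2_estimate = mean(shifts)

# Least-absolute-deviation estimate: the median minimizes the sum of
# absolute errors and ignores a sparse set of large outliers.
l1_estimate = median(shifts)

print(l2_estimate, l1_estimate)
```

The median lands on the true background shift of 12 pixels, while the mean is dragged away by the three bad measurements.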

Using recent mathematical developments in the fields of sparse and low-rank representation, in addition to new algorithms and efficient computational techniques, the group created a video registration system that is at least 8 percent, and sometimes up to 30 percent, more accurate than the current method. Additionally, while their registration problem is much more complicated to solve, Ghanem and Ahuja have put in place techniques and theoretical tools that can solve complicated problems efficiently. They expect to produce results faster than the standard method in the near future as their method is optimized.

"Right now, we're just outperforming [the standard], but we're close to their speed," Ghanem said. "This is one of the challenges behind making sparse and low-rank representation prominent in the field. If you don't have an efficient algorithm, it's not going to be widely used in computer vision."

The application of the technology could be seen in the video room of any college football team. The coach is watching tape of the latest game and wants to know whether the players on his defense are keeping equal distances between each other on a punt return. Football players want to maintain a gap between them in order to cover as much of the field as possible. Using Ghanem and Ahuja's new program, the coach would be able to track players and show the trajectories they are running to see if they are maintaining the gap.
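Once players' positions are expressed in field coordinates, the coach's question reduces to simple geometry. The sketch below assumes registered positions in yards are already available for each frame; the function names and the sample positions are hypothetical.

```python
import math

def neighbor_gaps(positions):
    """Distances between neighboring players, ordered across the field."""
    ordered = sorted(positions, key=lambda p: p[1])
    return [math.dist(a, b) for a, b in zip(ordered, ordered[1:])]

def gap_spread(positions):
    """Difference between the largest and smallest gap; 0 means even spacing."""
    gaps = neighbor_gaps(positions)
    return max(gaps) - min(gaps)

# Hypothetical field positions (yards) for four coverage players in two frames.
frame_even = [(30.0, 5.0), (30.0, 15.0), (30.0, 25.0), (30.0, 35.0)]
frame_bunched = [(32.0, 5.0), (32.0, 9.0), (32.0, 25.0), (32.0, 35.0)]

print(neighbor_gaps(frame_even), gap_spread(frame_even))
print(neighbor_gaps(frame_bunched), gap_spread(frame_bunched))
```

A spread near zero means the players are holding their lanes; a large spread flags the frames where two players have bunched together and left a hole.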

"In football, it's very important to know where everyone is at all times and where they are in respect to each other," Ghanem said. "It's good for coaches to see the patterns of their team and what other patterns other teams have and to be able to automatically discover these patterns."

The program could also help a coach compare a formation his team runs with a similar formation run by another team. Given a group of videos, the program would be able to locate and analyze that particular formation for the coach.

Ghanem and Ahuja also plan to add other attractive applications to the program as it progresses. One example is the ability to create a 3D world where the camera would be used to visualize what it would see from different places on the field. Even if the initial video was taken from the stands, the program would be able to generate an abstract video of the same play from an end-zone view or the view of a particular player.

"We can synthesize what the camera would see if the camera were on the field, or what a quarterback would be seeing at that particular time," Ghanem said. "This video is similar to what you would see if you were on the field or in the environment."

So instead of teaching athletes how to handle different formations using video from above, this will assist coaches in showing the players what they should expect to see when running the play on the field. They hope this will help solve the problem of a player seeing a video from the sideline and then having difficulty knowing how to apply himself to the play when he gets out on the field.

While Ghanem and Ahuja are focusing the application of their research on the sports domain, the technology could be applied to any moving camera or in any application where tracking people and analyzing their patterns would be beneficial. For example, it could be used by a video surveillance team in an airport or by a large supermarket that wants to track customer tendencies to optimize product placement.

"Since the video registration and tracking modules that we are developing are general, there are many applications outside of the realm of sports," Ghanem said. "…The tracking and registration results can be used to discover patterns in people's movements and tendencies anywhere."