Computer Software Cuts Videos To Remove Boring Stuff

Algorithm-based approach makes surveillance and social media less tedious.

Originally published: Jul 3 2014 - 6:00am

By: Joel N. Shurkin, Contributor

(Inside Science) -- How would you like a computer program that could sift through all the videos friends post on social media and take out all the boring parts? It could take tedious hour-long presentations and automatically cut them to four minutes, leaving just the good stuff.

Two scientists at Pittsburgh's Carnegie Mellon University have a computer algorithm that does just that, a computer form of speed-reading.

Let's say there is a burglary at a closed convenience store and the surveillance camera has recorded 16 hours of business as usual, interrupted at some time by the crime. Police could sit through all 16 hours and hope the event comes early. They can even speed up the recording and cut the potential viewing time to eight hours, but attention would wander.

With LiveLight, developed by Eric Xing, a professor of machine learning, and Bin Zhao, a doctoral student, the footage jumps right to the crime.

"It understands that the burglary is different," Xing said. The software gets there by eliminating the repetition of the empty room.

Say you have a video of your toddler on a merry-go-round that you want to show your friends, and because it is your toddler, it lasts 15 minutes with the kid doing nothing but going around expressionless. It is quite possible your friends don't want to see all 15 minutes. But what if something interesting happens in the last 12 seconds? Perhaps a dog joins the party. With LiveLight you can get it down to just two rotations of the merry-go-round and the dog--a minute or two.

The two presented their work at the Computer Vision and Pattern Recognition Conference in Columbus, Ohio, last week.

(A demo of LiveLight from Carnegie Mellon School of Computer Science)

They have formed a company, PanOptus, to market LiveLight.

LiveLight makes use of a "learned dictionary" of the first 30 seconds of the video, Xing said.

If it is a video of a traffic intersection, it notes cars going in well-defined directions and puts that in the dictionary. As the video continues and the cars continue to move in the same directions, LiveLight ignores further occurrences of that pattern. But, if there is an accident, the cars in the video would break out of the pattern. Some cars go in a different direction, and emergency vehicles show up. The software sees and saves that.

LiveLight uses a very complex process called group sparse coding. It takes the millions of pixels in an image and gives them mathematical attributes that shrink the amount of information needed to describe each pixel, then looks for frames with groups of identical pixels. If it finds any, it throws them out, leaving those that are unique -- for instance, frames containing the sudden appearance of burglars in a once-empty room.
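
For readers who want a feel for the idea, here is a minimal Python sketch of the "keep only what the dictionary cannot explain" step. It is not group sparse coding and it is not LiveLight's code: it swaps in a far simpler nearest-match test, and the three-number feature vectors, the function names and the threshold are all invented for illustration.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mismatch(segment, dictionary):
    """How far a segment is from its closest dictionary entry.

    Stand-in for a reconstruction error; real sparse coding would
    combine several entries instead of picking the single nearest one.
    """
    return min(distance(segment, entry) for entry in dictionary)


def select_novel(segments, dictionary, threshold):
    """Return the indices of segments the dictionary explains poorly."""
    return [
        i for i, segment in enumerate(segments)
        if mismatch(segment, dictionary) > threshold
    ]


# Hypothetical features: the dictionary has only seen an empty room.
empty_room = [0.0, 0.0, 0.0]
dictionary = [empty_room]

segments = [
    [0.0, 0.1, 0.0],  # empty room, slight noise
    [0.1, 0.0, 0.0],  # empty room, slight noise
    [0.9, 0.8, 0.7],  # something very different enters the frame
    [0.0, 0.0, 0.1],  # empty room again
]

# Assumed threshold; a real system would have to tune this.
kept = select_novel(segments, dictionary, threshold=0.5)
print(kept)  # [2]
```

Only the third segment, the one that looks nothing like the empty room, survives the cut.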

LiveLight can even eliminate repetitive motion.

"It makes the data manageable," said Shlomo Argamon, a professor of computer science at the Illinois Institute of Technology, in Chicago, who was not involved in the research. "It measures the underlying regularity."

"The dictionary keeps updating itself," Xing said. The video has to run at least once. As it is running, the algorithm adds to the dictionary. It can take an hour or two to edit one hour of video on a conventional laptop. Using a more powerful computer or even a supercomputer would shorten the editing time to minutes.

"The longest we have done is 24 hours," Xing said. "We don't have to wait to start processing." The video is streaming while the algorithm is processing. No one has to ever watch the raw footage.

LiveLight can also produce a list of what's in the dictionary so a human can go through and edit, perhaps restoring something the algorithm had decided to cut.
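
The self-updating dictionary and the single streaming pass can be pictured with a short Python sketch. This is a simplified stand-in, not the researchers' algorithm: it seeds a dictionary from the first few segments, keeps anything that looks new, and adds each kept segment to the dictionary so later repeats are skipped. The two-number features, the names `summarize_stream` and `seed_count`, the threshold, and the choice to leave the seed segments out of the summary are all assumptions made for illustration.

```python
from math import sqrt


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def summarize_stream(segments, seed_count, threshold):
    """Single pass over a stream of segment feature vectors.

    Returns (kept_indices, dictionary). The dictionary grows as new
    material appears, so the same event is kept only the first time.
    """
    dictionary = []
    kept = []
    for i, segment in enumerate(segments):
        if i < seed_count:
            # Opening stretch of the video: learn it, do not judge it.
            dictionary.append(segment)
            continue
        # Distance to the closest known pattern (infinite if none yet).
        error = min(
            (distance(segment, entry) for entry in dictionary),
            default=float("inf"),
        )
        if error > threshold:
            kept.append(i)              # new material: keep it
            dictionary.append(segment)  # and remember it for next time
    return kept, dictionary


# Hypothetical stream: quiet, quiet, quiet, event A twice, quiet, event B.
stream = [[0, 0], [0, 0], [0, 0], [5, 5], [5, 5], [0, 0], [9, 0]]
kept, dictionary = summarize_stream(stream, seed_count=2, threshold=1.0)
print(kept)             # [3, 6]
print(len(dictionary))  # 4
```

Because each segment is judged as it arrives, nothing here needs the whole recording in hand before it starts, and the returned dictionary is the kind of list a person could review afterward.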

The result of the editing is something like a film trailer, perfect for uploading to social media.

"This thing is very, very useful," Argamon said, "because of the amount of surveillance we have these days.

"It is useful and dangerous."

It would be particularly useful in automated factories or nuclear power stations where machinery normally runs without human observation. The software would cut costs and save time.

In George Orwell's 1984, however, there was a surveillance camera in every home. For every device there had to be someone looking at it. You would need as many watchers as people being watched, so the authorities would have to resort to sampling, Argamon said. A program like LiveLight would make that kind of spying more efficient.

This makes it much easier to have a surveillance state, he said.

And, of course, it's just a machine and has issues with context.

As Woody Allen once described this type of problem: "I took a speed reading course once and I read War and Peace in 20 minutes. It involves Russia."

Joel Shurkin is a freelance writer based in Baltimore. He is the author of nine books on science and the history of science, and has taught science journalism at Stanford University, UC Santa Cruz and the University of Alaska Fairbanks. He tweets at @shurkin.