At 11:45 p.m. on June 8 at Quicken Loans Arena in Cleveland, Ohio, the Golden State Warriors finished a sweep of the Cleveland Cavaliers with a 108-85 blowout win. As one of the most dominant players in the NBA, the Warriors’ Kevin Durant was named the 2018 NBA Finals MVP, the second time he has earned the Finals MVP award.

At the same time, on the other side of the Pacific Ocean, in a high-tech park in Beijing known as Asia’s “Silicon Valley”, the AI Vision team at IBM Research-China was hard at work applying AI technology to the game. Over the course of the three-hour game, our AI system analyzed and categorized each frame in real time. The system identified players and tagged their expressions and movements along with key actions (such as shooting, rebounding, blocking, and falling on the floor), the ball’s trajectory, and the position of the basketball net.

Prior to the Finals, our team performed the same analysis on more than 200 hours of game footage featuring Kevin Durant (from the 2007 season to today). While the players were still celebrating a historic win, an editor clicked a button, and approximately 20 seconds later a highlight reel featuring some of Durant’s most dazzling performances was ready. The first AI-edited basketball highlight reel made its debut via Tencent Sports’ online video streaming platform, where 143 million basketball fans have enjoyed it so far.

Preparing the highlight reel

This was the first time AI had been used to edit sports videos in China. IBM partnered with Tencent Sports on the project. In preparation for this work, Tencent Sports organized an online poll inviting fans to select a description (e.g., accurate, powerful, wild, consistent) to match popular NBA stars. Fans selected “accurate” as the best descriptor for Stephen Curry and “powerful” for LeBron James. Based on these descriptions, a human editor wrote a “script” for a 1- to 2-minute highlight reel for each player, dictating what types of clips should appear at what point in time. For example, the script might call for a dunk by LeBron James from 2015 to appear after a clip of him passing the ball. The editor also pre-selected music to accompany the highlight reel.

Once our AI system receives such a script from a human editor, it gets to work. Based on the description voted on by fans, and using the script and soundtrack provided by the editor, our system selects the best clips from each player’s career and stitches them together in the format of a highlight reel.
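To make the script-driven workflow concrete, here is a minimal sketch in Python of how an ordered script of clip requirements could be filled from a library of pre-analyzed clips. It is an illustration only, not our production system: the Slot and Clip types, the assemble function, the score field, and the sample data are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    """One line of the editor's script (hypothetical structure)."""
    action: str                 # e.g. "pass" or "dunk"
    year: Optional[int] = None  # optional constraint, e.g. 2015


@dataclass
class Clip:
    """One pre-analyzed clip in the library (hypothetical structure)."""
    clip_id: str
    action: str
    year: int
    score: float                # assumed precomputed quality score


def assemble(script: List[Slot], library: List[Clip]) -> List[Optional[str]]:
    """Fill each script slot, in order, with the best unused matching clip."""
    used = set()
    timeline: List[Optional[str]] = []
    for slot in script:
        matches = [
            c for c in library
            if c.clip_id not in used
            and c.action == slot.action
            and (slot.year is None or c.year == slot.year)
        ]
        if not matches:
            timeline.append(None)  # no clip satisfies this slot
            continue
        best = max(matches, key=lambda c: c.score)
        used.add(best.clip_id)
        timeline.append(best.clip_id)
    return timeline


# A pass, followed by a dunk from 2015, followed by any other dunk.
script = [Slot("pass"), Slot("dunk", year=2015), Slot("dunk")]
library = [
    Clip("p1", "pass", 2016, 0.7),
    Clip("d1", "dunk", 2015, 0.8),
    Clip("d2", "dunk", 2017, 0.9),
    Clip("d3", "dunk", 2015, 0.6),
]
timeline = assemble(script, library)
print(timeline)
```

In this sketch the editor keeps control of ordering and constraints through the script, while the search through the library is automated. A greedy, slot-by-slot choice is the simplest possible strategy and is assumed here purely for illustration.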

AI at work behind the scenes

Our AI Vision system is multi-modal, meaning it can ingest both audio and visual data. The system can track and recognize players’ faces and facial expressions (frustration, sadness, happiness), identify objects (court, basketball, baskets, jersey numbers), and categorize actions (slam dunks, shots, alley-oops, lay-ups, dives for the ball, cheering).

Using deep learning techniques, the AI system creates a confidence ratio for each clip, reflecting how confident it is that a given action is occurring, such as a dunk or a blocked shot. The AI system then matches the clips in its database to the requirements in the script provided by the human editor. For example, if a “happy” facial expression is registered in a clip where the system is confident that a dunk is occurring, that clip would be considered more exciting and thus a good candidate for inclusion in a highlight reel.
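The matching idea described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual model: the Clip type, the EXPRESSION_BONUS weights, the min_confidence threshold, and the additive scoring rule are all hypothetical, and the confidence values are made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    """A clip with the labels an analysis stage might attach (hypothetical)."""
    clip_id: str
    action: str               # most likely action, e.g. "dunk"
    action_confidence: float  # confidence in that action, from 0 to 1
    expression: str           # detected facial expression


# Assumed weights: a "happy" expression makes a clip more exciting.
EXPRESSION_BONUS = {"happy": 0.2}


def excitement(clip: Clip) -> float:
    """Combine action confidence with an expression bonus (assumed rule)."""
    return clip.action_confidence + EXPRESSION_BONUS.get(clip.expression, 0.0)


def rank_candidates(clips: List[Clip], wanted_action: str,
                    min_confidence: float = 0.5) -> List[Clip]:
    """Keep confident matches for the wanted action, most exciting first."""
    candidates = [
        c for c in clips
        if c.action == wanted_action and c.action_confidence >= min_confidence
    ]
    return sorted(candidates, key=excitement, reverse=True)


clips = [
    Clip("a", "dunk", 0.90, "neutral"),
    Clip("b", "dunk", 0.85, "happy"),
    Clip("c", "dunk", 0.40, "happy"),   # too uncertain to be a candidate
    Clip("d", "block", 0.95, "happy"),  # wrong action for this request
]
ranked = rank_candidates(clips, "dunk")
print([c.clip_id for c in ranked])
```

Here clip "b" outranks clip "a" even though its action confidence is slightly lower, because the happy expression raises its overall score. Clip "c" is dropped because the system is not confident enough that a dunk is occurring.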

Basketball analysis is one of the most complicated tasks for computer vision. Players’ fast movements, crowded screens, multiple camera angles, and rapid camera movement create extreme visual complexity, which makes basketball video editing highly challenging. For a human, this is a labor-intensive process that could take hours, but with AI it can be simplified and streamlined. A human editor still dictates the content, flow, and narrative of the highlight reel but is relieved of the tedious duty of manually searching through hours of footage to find the perfect clip. Whereas traditional editing can take hours, our AI Vision system needs only about 20 seconds to turn a two- or three-hour game it has analyzed into a one-minute final cut. Our system can make editing more efficient, freeing human editors to focus on creating and innovating.