Welcome

Welcome to my website (prof.irfanessa.com). Here you will find information related to my academic pursuits, including updates on my research projects, a list of publications, the classes I teach, and my collaborators and students. If you'd like to contact me, please see the FAQ. Students wanting to contact me about working with me are highly encouraged to read the FAQ first. My bio is also available. Use the menu bar above, or the TAGS and CATEGORIES listed in the columns, to find relevant information.

In this talk, the speaker takes you on a journey of how AI systems have evolved over time. Dr. Irfan Essa is a professor in the School of Interactive Computing and the inaugural Director of Machine Learning at the Georgia Institute of Technology. One of the fastest-growing research areas in computing, machine learning spans many disciplines that use data to discover scientific principles, infer patterns, and extract meaningful knowledge. Essa directs an interdisciplinary team studying ways machine learning connects information and actions to bring the most benefit to the most people. This talk was given at a TEDx event using the TED conference format but independently organized by a local community. Learn more at https://www.ted.com/tedx

Machine Learning at Georgia Tech Seminar Series

Speaker: Irfan Essa
Date/Time: March 1, 2017, 12:00 noon

Abstract

The Interdisciplinary Research Center (IRC) for Machine Learning at Georgia Tech (ML@GT) was established in Summer 2016 to foster research and academic activities in and around the discipline of machine learning. The center aims to create a community that leverages true cross-disciplinarity across all units on campus, establishes a home for thought leaders in the area of machine learning, and creates programs to train the next generation of pioneers. In this talk, I will introduce the center, describe how we got here, outline the goals of the center, and lay out its foundational, application, and educational thrusts. The primary purpose of this talk is to solicit feedback about these technical thrusts, which will be the areas we hope to focus on in the upcoming years. I will also briefly describe the new Ph.D. program that has been proposed and is pending approval. We will discuss upcoming events and plans for the future.

Abstract

In this talk, I will start by describing the pervasiveness of image and video content, and how such content is growing with the ubiquity of cameras. I will use this to motivate the need for better tools for analysis and enhancement of video content. I will start with some of our earlier work on temporal modeling of video, then lead up to some of our current work and describe two main projects: (1) our approach for a video stabilizer, currently implemented and running on YouTube, and its extensions; and (2) a robust and scalable method for video segmentation.

I will describe, in some detail, our video stabilization method, which generates stabilized videos and is in wide use on YouTube, with millions of users. Then I will describe an efficient and scalable technique for spatiotemporal segmentation of long video sequences using a hierarchical graph-based algorithm. I will also describe the videosegmentation.com site that we have developed to make this system available for wide use.

Finally, I will follow up with some recent work on image and video analysis in the mobile domain. I will also make some observations about the ubiquity of imaging and video in general and the need for better tools for video analysis.

Participated in the Dagstuhl Workshop on “Modeling and Simulation of Sport Games, Sport Movements, and Adaptations to Training” at the Dagstuhl Castle, September 13 – 16, 2015.

Motivation

Computational modeling and simulation are essential to analyze human motion and interaction in sports science. Applications range from game analysis and issues in training science, such as the training load-adaptation relationship and motor control and learning, to biomechanical analysis. The motivation of this seminar is to enable an interdisciplinary exchange between sports and computer scientists to advance modeling and simulation technologies in selected fields of application: sport games, sport movements, and adaptations to training. In addition, contributions to the epistemic basics of modeling and simulation are welcome.

Abstract

In this talk, I will start by describing the pervasiveness of image and video content, and how such content is growing with the ubiquity of cameras. I will use this to motivate the need for better tools for analysis and enhancement of video content. I will start with some of our earlier work on temporal modeling of video, then lead up to some of our current work and describe two main projects: (1) our approach for a video stabilizer, currently implemented and running on YouTube, and its extensions; and (2) a robust and scalable method for video segmentation.

I will describe, in some detail, our video stabilization method, which generates stabilized videos and is in wide use. Our method allows for video stabilization beyond the conventional filtering that only suppresses high-frequency jitter. It also supports the removal of rolling shutter distortions common in modern CMOS cameras, which capture the frame one scanline at a time, resulting in non-rigid image distortions such as shear and wobble. Our method does not rely on a priori knowledge and works on video from any camera or on legacy footage. I will showcase examples of this approach and also discuss how this method has launched and is running on YouTube, with millions of users.

Then I will describe an efficient and scalable technique for spatiotemporal segmentation of long video sequences using a hierarchical graph-based algorithm. This hierarchical approach generates high-quality segmentations, and we demonstrate the use of this segmentation as users interact with the video, enabling efficient annotation of objects within the video. I will also show some recent work on how this segmentation and annotation can be used to do dynamic scene understanding.

I will then follow up with some recent work on image and video analysis in the mobile domain. I will also make some observations about the ubiquity of imaging and video in general and the need for better tools for video analysis.

Abstract

In this talk, I will start by describing the pervasiveness of image and video content, and how such content is growing with the ubiquity of cameras. I will use this to motivate the need for better tools for analysis and enhancement of video content. I will start with some of our earlier work on temporal modeling of video, then lead up to some of our current work and describe two main projects: (1) our approach for a video stabilizer, currently implemented and running on YouTube, and its extensions; and (2) a robust and scalable method for video segmentation.

I will describe, in some detail, our video stabilization method, which generates stabilized videos and is in wide use. Our method allows for video stabilization beyond the conventional filtering that only suppresses high-frequency jitter. It also supports the removal of rolling shutter distortions common in modern CMOS cameras, which capture the frame one scanline at a time, resulting in non-rigid image distortions such as shear and wobble. Our method does not rely on a priori knowledge and works on video from any camera or on legacy footage. I will showcase examples of this approach and also discuss how this method has launched and is running on YouTube, with millions of users.
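To make the scanline idea concrete, here is a minimal sketch, assuming the per-block homographies have already been estimated (recovering them from feature tracks is the hard part and is omitted). It is an illustration in Python with OpenCV, not the deployed YouTube implementation, and `warp_scanline_blocks` is a hypothetical helper.

```python
# Sketch: apply a mixture-of-homographies warp, one homography per block
# of scanlines, to undo rolling shutter distortion. Homography estimation
# and inter-block blending are omitted for brevity.
import cv2
import numpy as np

def warp_scanline_blocks(frame, homographies, block_height):
    """frame: HxWx3 image; homographies: one 3x3 matrix per scanline block."""
    h, w = frame.shape[:2]
    out = np.zeros_like(frame)
    for i, H in enumerate(homographies):
        y0, y1 = i * block_height, min((i + 1) * block_height, h)
        # Warp the full frame with this block's homography, then keep only
        # the rows belonging to this block. A real implementation would
        # interpolate homographies across scanlines to avoid block seams.
        warped = cv2.warpPerspective(frame, H, (w, h))
        out[y0:y1] = warped[y0:y1]
    return out
```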

Then I will describe an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. This hierarchical approach generates high-quality segmentations, and we demonstrate the use of this segmentation as users interact with the video, enabling efficient annotation of objects within the video. I will also show some recent work on how this segmentation and annotation can be used to do dynamic scene understanding.

We address a variety of challenges in the analysis and enhancement of computational video. We present novel post-processing methods to bridge the gap between professionally produced videos and the casually shot videos mostly seen on online sites. Our research presents solutions to three well-defined problems: (1) video stabilization and rolling shutter removal in casually shot, uncalibrated videos; (2) content-aware video retargeting; and (3) spatio-temporal video segmentation to enable efficient video annotation. We showcase several real-world applications building on these techniques.

We start by proposing a novel algorithm for video stabilization that generates stabilized videos by employing L1-optimal camera paths to remove undesirable motions. We compute camera paths that are optimally partitioned into constant, linear, and parabolic segments, mimicking the camera motions employed by professional cinematographers. To achieve this, we propose a linear programming framework to minimize the first, second, and third derivatives of the resulting camera path. Our method allows for video stabilization beyond conventional filtering that only suppresses high-frequency jitter. An additional challenge in videos shot from mobile phones is rolling shutter distortion. Modern CMOS cameras capture the frame one scanline at a time, which results in non-rigid image distortions such as shear and wobble. We propose a solution based on a novel mixture model of homographies parametrized by scanline blocks to correct these rolling shutter distortions. Our method does not rely on a priori knowledge of the readout time, nor does it require prior camera calibration. Our novel video stabilization and calibration-free rolling shutter removal have been deployed on YouTube, where they have successfully stabilized millions of videos. We also discuss several extensions to the stabilization algorithm and present technical details behind the widely used YouTube Video Stabilizer.
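As a concrete illustration of the path optimization, here is a minimal one-dimensional sketch using cvxpy. The actual formulation operates on 2-D parametric camera motion models; the weights and crop bound below are illustrative assumptions, not the production values.

```python
# Sketch: L1-optimal smoothing of a 1-D camera path. Minimizing the L1
# norms of the first, second, and third derivatives yields a path made of
# piecewise constant, linear, and parabolic segments (static shot, pan,
# ease-in/out), because L1 penalties drive derivatives exactly to zero.
import numpy as np
import cvxpy as cp

def smooth_path_l1(c, crop=20.0, w1=10.0, w2=1.0, w3=100.0):
    """c: original per-frame camera positions; crop: allowed deviation
    so the virtual crop window stays inside the original frame."""
    p = cp.Variable(len(c))
    d1 = p[1:] - p[:-1]        # velocity
    d2 = d1[1:] - d1[:-1]      # acceleration
    d3 = d2[1:] - d2[:-1]      # jerk
    objective = cp.Minimize(w1 * cp.norm1(d1) +
                            w2 * cp.norm1(d2) +
                            w3 * cp.norm1(d3))
    constraints = [cp.abs(p - c) <= crop]
    cp.Problem(objective, constraints).solve()
    return p.value

# Example: recover a smooth pan from a shaky one.
t = np.arange(200)
stable = smooth_path_l1(0.5 * t + 15 * np.random.randn(200))
```

An L2 penalty would merely shrink the derivatives; the L1 penalty makes them exactly zero over long stretches, which is what produces the cinematographic constant, linear, and parabolic segments.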

We address the challenge of changing the aspect ratio of videos by proposing algorithms that retarget videos to fit the form factor of a given device without stretching or letter-boxing. Our approaches use all of the screen's pixels, while striving to deliver as much of the original video content as possible. First, we introduce a new algorithm that uses discontinuous seam-carving in both space and time for resizing videos. Our algorithm relies on a novel appearance-based temporal coherence formulation that allows for frame-by-frame processing and results in temporally discontinuous seams, as opposed to geometrically smooth and continuous seams. Second, we present a technique that builds on the above-mentioned video stabilization approach. We effectively automate classical pan-and-scan techniques by smoothly guiding a virtual crop window via saliency constraints.
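The sketch below illustrates the automated pan-and-scan idea under simplifying assumptions: horizontal retargeting only, with a moving-average smoother standing in for the L1 path optimization the full system uses. The saliency maps are taken as given, and the helper names are hypothetical.

```python
# Sketch: guide a virtual crop window along a smoothed saliency trajectory.
import numpy as np

def crop_centers(saliency_maps, win=15):
    """saliency_maps: list of HxW float arrays; returns smoothed x-centers."""
    centers = []
    for s in saliency_maps:
        xs = np.arange(s.shape[1])
        # x-coordinate of the saliency centroid for this frame
        centers.append((s.sum(axis=0) * xs).sum() / max(s.sum(), 1e-8))
    kernel = np.ones(win) / win          # stand-in for the L1 smoother
    return np.convolve(np.array(centers), kernel, mode='same')

def crop_frame(frame, cx, target_w):
    """Cut a width-target_w window centered (and clamped) at cx."""
    h, w = frame.shape[:2]
    x0 = int(np.clip(cx - target_w / 2, 0, w - target_w))
    return frame[:, x0:x0 + target_w]
```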

Finally, we introduce an efficient and scalable technique for spatio-temporal segmentation of long video sequences using a hierarchical graph-based algorithm. We begin by over-segmenting a volumetric video graph into space-time regions grouped by appearance. We then construct a region graph over the obtained segmentation and iteratively repeat this process over multiple levels to create a tree of spatio-temporal segmentations. This hierarchical approach generates high-quality segmentations and allows subsequent applications to choose from varying levels of granularity. We demonstrate the use of spatio-temporal segmentation as users interact with the video, enabling efficient annotation of objects within the video.
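For intuition, here is a minimal sketch of one level of the graph-based grouping, using a Felzenszwalb-Huttenlocher-style merge criterion. Building the full hierarchy repeats this over a region graph constructed from each level's output; constructing the edges from video voxels is left as an assumption of the sketch.

```python
# Sketch: one level of greedy graph-based grouping. Nodes are voxels (or
# regions from the previous level); edges connect space-time neighbors and
# are weighted by appearance difference. Larger k yields coarser regions.

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def segment_level(num_nodes, edges, k=300.0):
    """edges: iterable of (weight, u, v); returns a region label per node."""
    uf = UnionFind(num_nodes)
    size = [1] * num_nodes
    thresh = [k] * num_nodes          # k / |region|, with |region| = 1
    for w, u, v in sorted(edges):     # process edges by increasing weight
        ru, rv = uf.find(u), uf.find(v)
        if ru != rv and w <= min(thresh[ru], thresh[rv]):
            uf.union(ru, rv)
            r = uf.find(ru)
            size[r] = size[ru] + size[rv]
            thresh[r] = w + k / size[r]
    return [uf.find(i) for i in range(num_nodes)]
```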

Part of this talk will show attendees how to use the Video Stabilizer on YouTube and the video segmentation system at videosegmentation.com. Please find appropriate videos to test the systems.

Part of the work described above was done at Google, where Matthias Grundmann, Vivek Kwatra, and Mei Han work and where Professor Essa serves as a consultant. Other parts of the work were carried out by Matthias Grundmann, Daniel Castro, and S. Hussain Raza as part of their research as students at Georgia Tech.

The Computation + Journalism Symposium 2013, held January 31 – February 1, 2013, at the Georgia Institute of Technology, Atlanta, GA, USA, was a huge success. Please see the videos of all the sessions here. See me discuss computational journalism with Phil Meyer, along with my slides and takeaway points from the closing session.