Colloquium Details

Robust and Efficient Classification for Videos captured in the Wild

Author:

Behrooz Mahasseni, Oregon State University

Date:

October 20, 2016

Time:

15:30

Location:

220 Deschutes

Abstract

Understanding videos of human actions, recorded in an uncontrolled
setting, is an open problem in computer vision. Video surveillance,
content retrieval, autonomous driving and sports analysis are examples
of practical applications. We focus our research on efficiency and
robustness of action recognition in real-world videos.

Our initial work has been aimed at advancing traditional approaches
which use hand-crafted video features. Specifically, to improve
robustness, we have relaxed the viewpoint dependence of
existing methods and developed a multitask learning approach for
view-invariant activity recognition. To improve efficiency, we
formulated an approximate policy iteration for budgeted semantic video
segmentation.

Next, inspired by the successful application of deep learning in
computer vision, we present a multimodal deep learning framework
which improves the robustness of activity recognition via a deep
fusion of multimodal data, where diverse sources (e.g., video cameras,
3D skeletons, Kinect cameras, audio recordings) capture important cues
about the ongoing events. For fusing multimodal data, we define a new
hybrid method to regularize LSTMs across different sources of data.

Finally, we extend our initial work on efficient semantic video
segmentation to develop a deep long short-term memory (LSTM) policy
iteration for cost-efficient semantic video segmentation.

We believe these research projects advance computer vision because the
developed approaches are able to 1) meet the stringent runtime
requirements of many applications, and 2) work in less sanitized
settings with small datasets or data coming from heterogeneous sources.

Biography

Behrooz Mahasseni is a fifth-year Ph.D. student at Oregon State
University. He works under the supervision of Prof. Sinisa Todorovic, and
his main research area is video analysis and representation. He started his
Ph.D. working on view-invariant activity recognition and feature space
learning for videos recorded from different viewpoints. Since 2014,
his main focus has been understanding video content using deep learning
techniques.

His latest work concerns regularizing long short-term memory networks for
action classification in uncontrolled settings. During his summer 2016
internship at NVIDIA Research, he worked on temporal video segmentation
using deep temporal attention models.