Abstract

In this thesis, we study the topic of Lifelong Robotic Object Perception. We propose, as a long-term goal, a framework to recognize known objects and to discover unknown objects in the environment as the robot operates, for as long as the robot operates. We build the foundations for Lifelong Robotic Object Perception by focusing our study on the two critical components of this framework: 1) how to recognize and register known objects for robotic manipulation, and 2) how to automatically discover novel objects in the environment so that we can recognize them in the future.

Our work on Object Recognition and Pose Estimation addresses two main challenges in computer vision for robotics: robust performance in complex scenes, and low latency for real-time operation. We present MOPED, a framework for Multiple Object Pose Estimation and Detection that integrates single-image and multi-image object recognition and pose estimation in one optimized, robust, and scalable framework. We extend MOPED to leverage RGBD images using an adaptive image-depth fusion model based on maximum likelihood estimates. We incorporate this model into each stage of MOPED to achieve object recognition that is robust to imperfect depth data.

In Robotic Object Discovery, we address the challenges of scalability and robustness for long-term operation. As a first step towards Lifelong Robotic Object Perception, we aim to automatically process the raw video stream of an entire workday of a robotic agent to discover novel objects. The key to achieving this goal is to incorporate non-visual information, namely robotic metadata, in the discovery process. We encode the natural constraints and non-visual sensory information in service robotics to make long-term object discovery feasible. We introduce an optimized implementation, HerbDisc, that processes a 6 h 20 min video stream of challenging human environments in under 19 min and discovers 206 novel objects.

We tailor our solutions to the sensing capabilities and requirements in service robotics, with the goal of enabling our service robot, HERB, to operate autonomously in human environments.