We are on the brink of multiple major flagship applications for intelligent robots. In the transition from industrial robots to more intelligent personal and service robots, learning may be a promising way to overcome the limitations of hard-coded systems, enabling robots to adapt to uncertainty. However, a learning robot can have many limitations that may render it inefficient or ineffective. The following three projects describe my work towards overcoming challenges of intelligent systems.

Learning in Natural Environments

Vision is one of the primary sensory modalities of animals and robots, yet in robots it still has limited power in natural environments. Dynamic natural processes continuously change how an environment looks, which works against appearance-based methods for data association. As a robot is deployed again and again, the chance of finding correspondences between surveys diminishes as they become increasingly separated in time. This is a major limitation of intelligent systems targeted at precision agriculture, search and rescue, and environment monitoring. After Cedric brought me onto this project, I investigated how to overcome the variation in appearance of natural environments.

After finding that some animals have evolved to exploit an environment's structure to achieve data association across seasons, I sought to create a similar approach for robots. I proposed a new image registration technique that maximizes the use of spatial information. Dense correspondence is first applied between near-time surveys to recover a map of the environment. The map and known poses are then used to provide appearance-invariant viewpoint selection and robust image registration. This technique is called Reprojection Flow.

Our work is among the first to seriously challenge the established state of the art in pairwise dense correspondence. Unlike other approaches, ours relies on the robustness of data association that is obtained directly from the spatial layout of real scenes. Furthermore, it is grounded in the experience a robot acquires while moving around its environment. Further work is expected to show that 1) this method can be made more accurate and scalable, and 2) a robot can learn to infer dense correspondence from its experience. Our latest results have been accepted for oral presentation (top 10%) at BMVC.

Error Free Learning

I passed the Qual after investigating how robots might learn without being labeled defective. Although a large body of work already addresses how a robot might explore its environment in order to learn and adapt to it, the risks of exploration are commonly overlooked. Few papers addressed how a robot could reliably stay out of harm's way if it is left to freely explore. After I saw this problem, my research goal was to investigate how robots could avoid committing serious errors while exploring.

A step towards error free learning is made possible with a new insight about how to learn from human feedback. Feedback interpreted as a direct label on the optimality of an action can provide a way to eliminate hazardous sections of the state space. This is in contrast to most previous work in which feedback is interpreted as a reward (e.g., reward shaping), which creates something like a trail of breadcrumbs for coaxing an agent out of an undesirable or dangerous area. Our new ``policy shaping'' approach to interactive machine learning called for a fundamentally new way to use feedback with reinforcement learning.

We derived a simple yet rigorous information-theoretic algorithm to maximize the information gained from human feedback, which we named Advise. Our experiments showed that Advise significantly outperformed state-of-the-art methods in some cases and was robust to noise. It also eliminates the ad hoc parameter tuning common to methods that interpret feedback as a reward. This work was presented at the 1st Biennial Conference on Reinforcement Learning and Decision Making (RLDM), where it was one of the top four papers, and published at the 27th Annual Conference on Neural Information Processing Systems (NIPS).
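
The idea of treating feedback as a label on the optimality of an action, rather than as a reward, can be illustrated with a small sketch. It is an illustration under stated assumptions and not the Advise implementation itself: the function names `feedback_policy` and `combine` are hypothetical, the `consistency` parameter (the assumed probability that a single label is correct) is introduced here for the example, and the agent's own policy is simply passed in as a probability table.

```python
def feedback_policy(delta, consistency):
    """Estimate, per action, the probability that it is optimal.

    delta: dict mapping action -> (positive labels - negative labels)
    consistency: assumed probability that one label is correct, 0 < C < 1
    """
    if not 0.0 < consistency < 1.0:
        raise ValueError("consistency must lie strictly between 0 and 1")
    probs = {}
    for action, d in delta.items():
        good = consistency ** d          # evidence that the action is optimal
        bad = (1.0 - consistency) ** d   # evidence that it is not
        probs[action] = good / (good + bad)
    return probs


def combine(agent_policy, feedback_probs):
    """Multiply the agent's policy by the feedback estimate and renormalize.

    Actions without any feedback are treated as uninformative (0.5).
    """
    unnorm = {a: p * feedback_probs.get(a, 0.5) for a, p in agent_policy.items()}
    total = sum(unnorm.values())
    return {a: v / total for a, v in unnorm.items()}


# Hypothetical state with two actions: "left" was labeled good twice,
# "right" was labeled bad twice, and the agent itself has no preference.
labels = feedback_policy({"left": 2, "right": -2}, consistency=0.8)
shaped = combine({"left": 0.5, "right": 0.5}, labels)
```

In this sketch, an action with no net feedback keeps a probability of 0.5, so the agent's own policy is left unchanged there, while consistent negative labels drive the probability of a hazardous action towards zero without any reward tuning.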

Separating containers from non-containers: A framework for learning behavior-grounded object categories

(May 2008 - August 2011)

I earned my M.S. after 3.5 years of studying what a container is and how a humanoid robot can learn that concept.
Although a growing body of literature in robotics addressed many different container manipulation problems, individual papers only chipped away at isolated problems one by one.
This meant that the algorithms for one domain were not directly applicable to other domains.
After I saw this problem, the goal of my thesis was to identify how a robot could start to learn about containers in a more general way.

Because people have a representation of containers that generalizes across many different container manipulation problems, I looked to psychology for insight into the origins of container learning.
Psychologists observed that infants form an abstract spatial category for containers, which allows them to apply their knowledge to novel containers.
At the time, however, theories of object categorization were not clear about exactly how infants form an object category for containers.
Consequently, I looked more deeply into the psychology literature to understand how infants learn.

By drawing on many different theories and observations from psychology, I extrapolated an explanation for how infants learn object categories.
Thanks to the expertise of the whole team, we were able to create a computational framework that lets a robot learn object categories in a similar way.
Our experiments with containers showed that this method of object categorization is effective.

Our work was well received when we submitted it to the IEEE Transactions on Autonomous Mental Development (TAMD).
An eminent developmental psychologist reviewed the object categorization theory (the other two reviewers were roboticists). In the "comments to the author" that we received when the paper was accepted, she signed her name (reviews are usually anonymous) and said:

"I commend the authors on a fantastic literature review of my domain. The authors accurately cite a broad array of the relevant literature. There were no relevant articles missing. I do not have any suggested changes because I think the literature is very good as it is. ...I was tickled by the unification of citations from people that are often perceived to be in opposing theoretical camps. ...I signed this review because I hope that the authors send me a copy when they get it published. I find the work fascinating and I would like to refer to their in my own work."

In addition to technical comments that helped us improve our work, the two roboticists said "[this paper presents] an interesting and out-of-the-box way of addressing concept acquisition" and "this paper makes a significant contribution to the existing literature." In the end, my research productivity for my M.S. came to rest at roughly π papers per year (11 papers in 3.5 years).