Introduction

Heuristic search algorithms rely on a heuristic function, h, to guide search for planning. The aim of such a heuristic function is to produce a quick-to-compute estimate of the true cost-to-goal, h*, for any given state s.

A well-known property of heuristic search algorithms such as A* or IDA* is that if the heuristic never overestimates the true cost-to-goal, that is, h(s) ≤ h*(s) for every state s, then the plans produced by these algorithms are guaranteed to be optimal. Such a heuristic is called an admissible heuristic.
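
To make the admissibility property concrete, here is a minimal sketch of A* on a tiny hand-made graph. The graph, its edge costs, and the heuristic values are illustrative assumptions and do not come from the original post; the heuristic is chosen so that h(s) ≤ h*(s) for every state.

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (cost, path) of a cheapest path from start to goal, or None."""
    # Frontier entries are (f = g + h, g, state, path so far).
    frontier = [(h[start], 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if state == goal:
            return g, path
        if g > best_g.get(state, float("inf")):
            continue  # stale entry: a cheaper route to this state was found
        for nxt, cost in graph[state]:
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h[nxt], new_g, nxt, path + [nxt]))
    return None

# Hypothetical example graph: state -> list of (successor, edge cost).
graph = {
    "S": [("A", 1), ("B", 4)],
    "A": [("B", 2), ("G", 6)],
    "B": [("G", 2)],
    "G": [],
}
# True costs-to-goal h* for this graph, worked out by hand.
h_star = {"S": 5, "A": 4, "B": 2, "G": 0}
# An admissible heuristic: it never exceeds h*.
h = {"S": 4, "A": 3, "B": 2, "G": 0}

print(a_star(graph, h, "S", "G"))  # (5, ['S', 'A', 'B', 'G'])
```

The returned cost matches h*("S"), which is what admissibility guarantees for this search.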

Learning Heuristics

An alternative approach is to learn heuristics from data using machine learning algorithms. For example, consider the popular 15-puzzle. The aim of the 15-puzzle is to reach a goal state from some start state by sliding a blank tile in a direction and swapping that blank tile with the adjacent number in that direction.
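
The puzzle mechanics described above can be sketched in a few lines. The state encoding (a tuple of 16 entries with 0 as the blank), the goal layout, and the function names are assumptions made for illustration. The Manhattan distance heuristic is not mentioned in the post; it is included only as a familiar example of a hand-designed admissible heuristic for this puzzle.

```python
# Assumed goal layout: tiles 1..15 in row-major order, blank (0) in the last cell.
GOAL = tuple(range(1, 16)) + (0,)

def neighbours(state):
    """Yield every state reachable by sliding the blank one cell."""
    blank = state.index(0)
    row, col = divmod(blank, 4)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 4 and 0 <= c < 4:
            target = r * 4 + c
            nxt = list(state)
            # Swap the blank with the adjacent tile in that direction.
            nxt[blank], nxt[target] = nxt[target], nxt[blank]
            yield tuple(nxt)

def manhattan(state):
    """Sum over tiles of the grid distance to each tile's goal cell."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue  # the blank is not counted
        goal_idx = tile - 1
        total += abs(idx // 4 - goal_idx // 4) + abs(idx % 4 - goal_idx % 4)
    return total
```

Each move shifts exactly one tile by one cell, so this heuristic can never overestimate the number of moves remaining.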

Now, suppose we have a set of optimal plans for a set of 15-puzzle tasks, where each plan is for a 15-puzzle task with a different start state. Then it is possible to use these plans as training data for a supervised learning algorithm, such as a neural network, to learn a heuristic. Since supervised learning algorithms can generalise to unseen data, such heuristics can then be applied to new, previously unseen tasks, i.e. 15-puzzle tasks with different start states to what was...
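
One way optimal plans could be turned into supervised training data is sketched below. It assumes a plan is stored as the list of states visited from start to goal and that every move costs 1; both the representation and the function name are hypothetical. Because every suffix of an optimal plan is itself optimal, each state on the plan can be labelled with its exact cost-to-goal.

```python
def training_pairs(plan):
    """Label each state on an optimal plan with its remaining cost-to-goal."""
    last = len(plan) - 1
    # With unit move costs, the state at position i is (last - i) moves from the goal.
    return [(state, last - i) for i, state in enumerate(plan)]

# Placeholder state names stand in for real puzzle configurations.
pairs = training_pairs(["s0", "s1", "s2"])
print(pairs)  # [('s0', 2), ('s1', 1), ('s2', 0)]
```

The resulting (state, cost) pairs are the kind of input a regression model would be fitted to.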

Introduction

In this blog post we show how the application of curriculum learning can affect the performance of a simple reinforcement learning agent on some target task. We do this by handcrafting source tasks using knowledge of our domain and agent. Our findings show that a curriculum can both positively and negatively affect the performance of an agent on some target task, and that the sequencing of source tasks is significant.

Curriculum Learning

"Example of a mathematics curriculum. Lessons progress from simpler topics to more complex ones, with each building on the last."[1]

Curriculum learning is an area of machine learning in which the goal is to design a sequence of source tasks (a curriculum) for an agent to train on first, such that the agent's final performance or learning speed on some target task is improved. It is motivated by the desire to apply autonomous agents to increasingly difficult tasks and serves to make such tasks easier to solve.
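
The idea of training on a sequence of source tasks before the target task can be sketched as follows. This is not the agent or domain used in the experiment: the corridor environment, the tabular Q-learning agent, and all names and hyperparameters are assumptions chosen to keep the example short. The only point it illustrates is that the same value table is carried from one task to the next.

```python
import random

def q_learning(q, length, episodes, rng, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a corridor of `length` cells (reward 1 at the right end).

    `q` maps (position, action) to a value and is updated in place, so whatever
    was learned on an earlier task is the starting point for the next one.
    """
    actions = (-1, 1)
    for _ in range(episodes):
        pos = 0
        for _ in range(4 * length):  # cap on steps per episode
            if rng.random() < eps:
                a = rng.choice(actions)  # explore
            else:
                # Exploit, breaking ties between equal values at random.
                a = max(actions, key=lambda act: (q.get((pos, act), 0.0), rng.random()))
            nxt = min(max(pos + a, 0), length - 1)
            done = nxt == length - 1
            reward = 1.0 if done else 0.0
            future = 0.0 if done else max(q.get((nxt, b), 0.0) for b in actions)
            old = q.get((pos, a), 0.0)
            q[(pos, a)] = old + alpha * (reward + gamma * future - old)
            pos = nxt
            if done:
                break
    return q

def train_with_curriculum(source_lengths, target_length, episodes, seed=0):
    """Train on each source task in order, then on the target task."""
    rng = random.Random(seed)
    q = {}
    for length in source_lengths:
        q_learning(q, length, episodes, rng)
    return q_learning(q, target_length, episodes, rng)

# Hypothetical curriculum: two shorter corridors before the target corridor.
q = train_with_curriculum(source_lengths=[3, 5], target_length=8, episodes=50)
```

Reordering `source_lengths`, or passing an empty list to train on the target alone, gives a simple way to compare different curricula in this toy setting.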

Domain

We conduct our experiment on a simple grid world domain. The description and visuals below are taken directly from [2].

The world consists of a room, which can contain 4 types of objects. Keys are items the agent can pick up by moving to them and executing a pickup action. These are used to unlock locks. Each lock in a room is dependent on a set of keys. If the agent is holding the right keys, then moving to a lock and executing an unlock action opens the lock. Pits are obstacles...

Through Social Cobots: Robots with Human-Like Collaboration Skills

This study is a part of a collaboration with DAI-Labor of TU Berlin, Germany. We envision the future of collaborative robots (cobots) in industry through their fully autonomous human-like collaboration with human partners.

Our research aims to address the question: "How do we build cobots with human-like natural collaboration skills?". Existing intention-aware planning approaches often make the assumptions that a human collaborator's actions are always relevant to the collaborated task and that the human always accepts the cobot's assistance when offered.

We believe that these assumptions are a significant barrier to having social cobots in the real world. In reality, a human's dynamic desires and emotional states could result in stochastic human intentions, especially in repeated tasks. A cobot that makes these assumptions may misinterpret the human's actions, which may result in intrusive and unreasonable robot behaviors (e.g. a human gazing at an object might be interpreted as needing it, yet behind this gaze she could be deciding whether to take it herself, or staring into space and thinking of something irrelevant to the task).

Our goal is to offer a new model design as an alternative to the conventional intention-aware models by removing these assumptions. The result is our novel robot decision-making model, a partially observable Markov decision process (POMDP), that is capable of handling these...

About

Research in the RAIL lab focuses primarily on learning in autonomous systems. In particular, we are interested in the acquisition of behaviours, as well as knowledge about the environment around a learning system. Our work draws on tools from multiple fields including decision theory, machine learning, and computer vision, using techniques including reinforcement learning, Bayesian models, deep neural networks, and Monte Carlo tree search.