anyway. If an agent is the sole observer of a POI, it
gains the full value of the POI observation.

The difference reward under partial observability,
Di(PO), is calculated in the same manner as Di, but
using only the rovers and POIs that agent i can observe.
Because of this restriction, two rovers may observe the
same POI from opposite sides without either realizing
that the POI is doubly observed. The second observation
does not increase the system performance, yet both
rovers credit themselves with it. Likewise, a rover
cannot sense POIs located outside of its observation
radius. This is represented in figure 7.
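
The double-crediting effect can be made concrete with a minimal sketch. It rests on simplifying assumptions that are not taken from the article: a POI counts as observed when at least one rover lies within a single radius, the system reward is the sum of the values of observed POIs, and the same radius limits what a rover can sense. All function and variable names are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def system_reward(rovers, pois, radius):
    """Assumed system reward: sum of values of POIs with at least one observer."""
    return sum(value for pos, value in pois
               if any(dist(r, pos) <= radius for r in rovers))

def difference_reward(i, rovers, pois, radius):
    """D_i: system reward with rover i minus system reward without it."""
    others = rovers[:i] + rovers[i + 1:]
    return system_reward(rovers, pois, radius) - system_reward(others, pois, radius)

def difference_reward_po(i, rovers, pois, radius):
    """D_i(PO): same calculation, restricted to what rover i can sense."""
    me = rovers[i]
    seen_rovers = [r for k, r in enumerate(rovers)
                   if k != i and dist(me, r) <= radius]
    seen_pois = [(p, v) for p, v in pois if dist(me, p) <= radius]
    with_me = system_reward([me] + seen_rovers, seen_pois, radius)
    without_me = system_reward(seen_rovers, seen_pois, radius)
    return with_me - without_me

# Two rovers on opposite sides of one POI, too far apart to sense each other.
rovers = [(-3.0, 0.0), (3.0, 0.0)]
pois = [((0.0, 0.0), 10.0)]   # (position, value)
radius = 4.0

print(difference_reward(0, rovers, pois, radius))     # 0.0: the POI is covered either way
print(difference_reward_po(0, rovers, pois, radius))  # 10.0: rover 0 believes it is the sole observer
```

With full information each rover correctly sees that removing itself changes nothing; under partial observability each rover assigns itself the full POI value.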

Visualization of Reward Structures

Visualization is an important part of understanding
the inner workings of many systems, but particularly
those of learning systems (Agogino, Martin, and
Ghosh 1999; Bishof, Pinz, and Kropatsch 1992;
Gallagher and Downs 1997; Hinton 1989; Hoen et al.
2004; Wejchert and Tesauro 1991). Especially in costly
space systems we need additional validation that our
learning systems are likely to work. Performance
simulations can give us good performance bounds in
scenarios that we can anticipate ahead of time. However,
these simulations may not uniformly test the rovers in
all situations that they may encounter. Learning and
adaptation can allow rovers to adapt to unanticipated
scenarios, but their reward functions still have to have
high sensitivity and alignment to work. The
visualization presented here can give us greater insight
into the behavior of our reward functions. Our
visualizations can answer important questions such as
how often we think our reward will be aligned with our
overall goals and how sensitive our rewards are to a
rover's actions.

Through visual inspection we can see if there are
important gaps in our coverage, and we can increase
our confidence that a given reward system will work
reliably.

The majority of the results presented in this work
show the relative sensitivity and alignment of each
of the reward structures. We have developed a unique
method for visualizing these, which is illustrated in

[Figure 5 labels: Rover Sensor; Points of Interest Sensor; Points of Interest]

Figure 5. Rover Sensing Diagram.

Each rover has eight sensors: four rover sensors and four POI sensors, which detect the relative congestion of rovers and POIs, respectively, in each of
four quadrants that rotate with the rover as it moves.
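
The sensing scheme described in the caption can be sketched as follows. This is an illustration under stated assumptions, not the article's implementation: "relative congestion" is approximated here by summing the inverse distance to each object in a quadrant, quadrant 0 is assumed to span the first 90 degrees counterclockwise from the rover's heading, and the function name and arguments are hypothetical.

```python
import math

def quadrant_sensors(position, heading, rovers, pois):
    """Return (rover_readings, poi_readings), four values each.

    Quadrants are measured relative to `heading` (radians), so they
    rotate with the rover. Each reading is an assumed congestion
    measure: the sum of 1/distance over the objects in that quadrant.
    """
    def readings(points):
        out = [0.0, 0.0, 0.0, 0.0]
        for x, y in points:
            dx, dy = x - position[0], y - position[1]
            d = math.hypot(dx, dy)
            if d == 0.0:
                continue  # skip an object at the rover's own location
            # Bearing of the object relative to the rover's heading, in [0, 2*pi).
            angle = (math.atan2(dy, dx) - heading) % (2.0 * math.pi)
            q = int(angle // (math.pi / 2.0)) % 4
            out[q] += 1.0 / d
        return out

    return readings(rovers), readings(pois)

# One other rover ahead-left and one POI behind-right of a rover facing +x.
rover_s, poi_s = quadrant_sensors((0.0, 0.0), 0.0,
                                  rovers=[(1.0, 1.0)],
                                  pois=[(-1.0, -1.0)])
print(rover_s)  # nonzero reading only in quadrant 0
print(poi_s)    # nonzero reading only in quadrant 2
```

Turning the rover changes which quadrant an object falls in without moving the object, which is the sense in which the quadrants rotate with the rover.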