
Q-Learning Example 2

This example is simply a 16-node version of Example 1, using similar code with a few refinements.
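Example 1's code is not reproduced here, so the following is only a sketch of the kind of full-matrix Q-learning loop this example scales up to 16 nodes. The node layout (a 4x4 grid), the goal node, and all parameter values are my own assumptions for illustration, not the example's actual design:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 16        # number of nodes (states) -- hypothetical 4x4 grid layout
GOAL = 15     # hypothetical goal node
GAMMA = 0.8   # discount factor

# Reward matrix R: -1 means "no link", 0 a valid move, 100 a move into the goal.
R = -np.ones((N, N))
for s in range(N):
    r, c = divmod(s, 4)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 4 and 0 <= nc < 4:
            a = nr * 4 + nc
            R[s, a] = 100 if a == GOAL else 0

# The full 16x16 Q matrix -- mostly zeros, as the example points out.
Q = np.zeros((N, N))

for _ in range(2000):                          # training episodes
    s = rng.integers(N)                        # start at a random node
    while s != GOAL:
        actions = np.flatnonzero(R[s] >= 0)    # links out of s
        a = rng.choice(actions)                # explore randomly
        Q[s, a] = R[s, a] + GAMMA * Q[a].max() # Q-learning update
        s = a
```

After training, only the entries for real links ever become nonzero; the rest of the 256-entry matrix is dead weight, which is the point the next paragraph makes.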

Note how much memory is wasted in matrix Q: all the zeros in the results correspond to node pairs with no link between them, so there is no learning to record there. This suggests the need for a better way to store learned information. At first glance the zeros may seem to form geometric patterns in Q, but those patterns are only an artifact of how the node/link layout was designed. Now consider that the indices of matrix Q are what we use to track the agent's progress. Why not simply record those index coordinates along with their resulting learning scores, and discard anything with a zero? In any case, this example illustrates the disadvantage of using an entire matrix for Q.
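The index/score idea suggested above amounts to sparse storage: key each learned value by its (state, action) index pair and store nothing for unlinked or zero entries. This is my own illustration of that idea, not code from the example; the 4x4 grid layout, goal node, and parameters are all assumed:

```python
import random

random.seed(0)

GAMMA = 0.8
GOAL = 15

# Hypothetical link list for 16 nodes laid out as a 4x4 grid.
# Only links that exist are stored -- no "-1 for no link" padding needed.
links = {s: [] for s in range(16)}
for s in range(16):
    r, c = divmod(s, 4)
    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
        if 0 <= nr < 4 and 0 <= nc < 4:
            links[s].append(nr * 4 + nc)

# Sparse Q: (state, action) -> score. An absent pair means "nothing learned",
# which plays the role of all those zeros in the full matrix.
Q = {}

def q(s, a):
    return Q.get((s, a), 0.0)

for _ in range(2000):                    # training episodes
    s = random.randrange(16)
    while s != GOAL:
        a = random.choice(links[s])
        reward = 100 if a == GOAL else 0
        value = reward + GAMMA * max(q(a, b) for b in links[a])
        if value:                        # "forget anything with a zero"
            Q[(s, a)] = value
        s = a

print(len(Q), "stored entries instead of", 16 * 16)
```

The same update rule runs unchanged; only the storage differs. On this assumed graph the dictionary ends up holding a few dozen entries instead of 256, and every stored key is a real link.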