Now here are some results from running the passive TD algorithm on the 4 by 3 maze.
On the right, we see a graph of the average
error in the utility function, averaged across all the states.
So it starts off--for the first 5 or so trials,
the error rate is very high--it's off the charts.
But then it starts to settle down, through 10, 20, 40;
and up to about 60 or so, it's still improving;
and then, after about 60 trials, it reaches a final steady state
of about 0.05 in the average error in utility.
So that's not too bad, but it's not really converging all the way down to zero error.
And on the left, you see the utility estimates
for various different states;
and, as we see--as we get out to 500 trials,
they're starting to converge a little bit,
close to their true values.
But we see in the first 100 or so trials--
they were all over the map, and so it wasn't doing very well.
It took a while for it to converge to something close to the true values.
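To make the experiment concrete, here is a minimal sketch of passive TD learning on a 4 by 3 grid. Nothing in it comes from the lecture beyond the algorithm name and the maze size. The textbook-style world (wall at (2, 2), terminals worth +1 and -1, step reward -0.04, moves that succeed with probability 0.8), the fixed policy, the discount of 1, and the decaying learning rate 60 / (59 + n) are all assumptions, and the function names are hypothetical. The error is averaged only over states the policy has visited so far, since some states are never reached from the start square under this policy.

```python
import random

# Assumed setup (not given in the lecture): a textbook-style 4x3 grid world.
COLS, ROWS = 4, 3
WALL = (2, 2)
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STEP_REWARD = -0.04
GAMMA = 1.0
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}
PERP = {"U": "LR", "D": "LR", "L": "UD", "R": "UD"}
# Assumed fixed policy that the passive agent follows; it is never changed.
POLICY = {(1, 1): "U", (1, 2): "U", (1, 3): "R", (2, 3): "R", (3, 3): "R",
          (2, 1): "L", (3, 1): "L", (4, 1): "L", (3, 2): "U"}


def reward(s):
    return TERMINALS.get(s, STEP_REWARD)


def move(s, a):
    """Deterministic move; bumping into the wall or the edge stays in place."""
    dx, dy = MOVES[a]
    nxt = (s[0] + dx, s[1] + dy)
    if nxt == WALL or not (1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS):
        return s
    return nxt


def outcomes(s, a):
    """(probability, next_state) pairs: 0.8 intended, 0.1 for each side slip."""
    return [(0.8, move(s, a))] + [(0.1, move(s, p)) for p in PERP[a]]


def sample_next(s, a, rng):
    r = rng.random()
    for p, nxt in outcomes(s, a):
        r -= p
        if r < 0:
            break
    return nxt


def true_utilities(sweeps=1000):
    """Reference values from iterative policy evaluation with the known model."""
    U = {s: 0.0 for s in POLICY}
    U.update(TERMINALS)
    for _ in range(sweeps):
        for s, a in POLICY.items():
            U[s] = reward(s) + GAMMA * sum(p * U[n] for p, n in outcomes(s, a))
    return U


def passive_td(trials, seed=0):
    """Run passive TD; return the estimates and the per-trial average error."""
    rng = random.Random(seed)
    U = {s: 0.0 for s in POLICY}
    U.update(TERMINALS)  # a terminal state's utility is just its reward
    visits = {s: 0 for s in POLICY}
    truth = true_utilities()
    errors = []
    for _ in range(trials):
        s = (1, 1)  # assumed start square
        while s not in TERMINALS:
            nxt = sample_next(s, POLICY[s], rng)
            visits[s] += 1
            alpha = 60.0 / (59.0 + visits[s])  # assumed decaying learning rate
            # TD update: nudge U[s] toward reward(s) + GAMMA * U[nxt].
            U[s] += alpha * (reward(s) + GAMMA * U[nxt] - U[s])
            s = nxt
        seen = [x for x in POLICY if visits[x] > 0]
        errors.append(sum(abs(U[x] - truth[x]) for x in seen) / len(seen))
    return U, errors


if __name__ == "__main__":
    estimates, errors = passive_td(500)
    print("average error after trial 1:  ", round(errors[0], 3))
    print("average error after trial 500:", round(errors[-1], 3))
```

Plotting `errors` against the trial number gives a curve of the same kind as the one described above, though the exact numbers depend on the random seed, the learning-rate schedule, and the other assumptions in the sketch.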