This chapter presents a novel feature adaptation scheme based on temporal difference (TD) learning for the problem of prediction. The scheme combines exploitation and exploration by (a) finding the worst basis vector in the feature matrix at each stage and replacing it with the current best estimate of the normalized value function, and (b) replacing the second-worst basis vector with a randomly chosen vector, so that a new subspace of basis vectors is picked. The chapter applies the algorithm to a prediction problem in traffic signal control and observes good performance over two different network settings. As future work, the chapter considers applying the TD learning algorithm together with other schemes such as least squares temporal difference (LSTD) learning and least squares policy evaluation (LSPE).
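As a rough illustration of steps (a) and (b), the following minimal Python sketch performs one adaptation step on a feature matrix. The abstract does not say how the "worst" basis vector is identified, so the sketch assumes, purely for illustration, that it is the column whose learned weight has the smallest magnitude. The function name `adapt_features`, its arguments, and the Gaussian draw for the random vector are likewise hypothetical and not taken from the chapter.

```python
import numpy as np


def adapt_features(Phi, theta, rng):
    """One feature adaptation step on an (n_states x k) matrix Phi, k >= 2.

    theta holds the weights learned (e.g. by TD) for the current features.
    Returns the adapted matrix and the indices of the two replaced columns.
    """
    Phi = Phi.copy()                      # leave the caller's matrix untouched
    v_hat = Phi @ theta                   # current value function estimate
    v_norm = np.linalg.norm(v_hat)

    # ASSUMPTION: rank basis vectors by |weight|; smallest is "worst".
    order = np.argsort(np.abs(theta))
    worst, second_worst = order[0], order[1]

    # (a) Exploitation: worst column <- normalized value estimate.
    if v_norm > 0:
        Phi[:, worst] = v_hat / v_norm

    # (b) Exploration: second-worst column <- random unit vector
    # (ASSUMPTION: drawn from a standard Gaussian, then normalized).
    rand_vec = rng.standard_normal(Phi.shape[0])
    Phi[:, second_worst] = rand_vec / np.linalg.norm(rand_vec)

    return Phi, worst, second_worst
```

In a full loop one would alternate between running TD learning with the current features to obtain `theta` and calling a step like this one; how often to adapt and how to reinitialize the weights afterwards are design choices the abstract does not specify.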
Journal | Natural Sciences |
---|---|
Publisher | John Wiley & Sons, Inc. |
Open Access | No |