Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning
Shalabh Bhatnagar, Vivek S. Borkar
Published in John Wiley & Sons, Inc.
Pages: 517 - 534
Abstract

This chapter presents a novel feature adaptation scheme based on temporal difference (TD) learning for the problem of prediction. The scheme combines exploitation and exploration by (a) finding the worst basis vector in the feature matrix at each stage and replacing it with the current best estimate of the normalized value function, and (b) replacing the second-worst basis vector with a randomly chosen vector, so that a new subspace of basis vectors is picked. The chapter applies the algorithm to a prediction problem in traffic signal control and observes good performance in two different network settings. As future work, the chapter considers applying the TD learning algorithm together with other schemes such as least squares temporal difference (LSTD) learning and least squares policy evaluation (LSPE).
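
The two replacement steps described in the abstract can be illustrated with a minimal sketch. This is not the chapter's algorithm: the abstract does not say how the "worst" basis vector is identified, so the sketch assumes, purely for illustration, that the column with the smallest absolute weight is the worst. The names `adapt_features`, `normalize`, `phi`, and `theta` are hypothetical, and the random vector is assumed to be Gaussian and scaled to unit norm.

```python
import math
import random


def normalize(vec):
    """Scale vec to unit Euclidean norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def adapt_features(phi, theta, rng):
    """One adaptation step on a feature matrix phi (one row per state, k >= 2 columns).

    ASSUMPTION (not from the chapter): column j is ranked by |theta[j]|,
    with the smallest absolute weight treated as the "worst" basis vector.
    """
    k = len(theta)
    # Current value estimate V = Phi * theta, one entry per state.
    value = [sum(row[j] * theta[j] for j in range(k)) for row in phi]
    # Rank columns from worst to best under the assumed criterion.
    order = sorted(range(k), key=lambda j: abs(theta[j]))
    worst, second_worst = order[0], order[1]
    v_hat = normalize(value)  # exploitation: normalized value estimate
    rand_vec = normalize([rng.gauss(0.0, 1.0) for _ in phi])  # exploration
    new_phi = [list(row) for row in phi]  # copy so the input is not mutated
    for s, row in enumerate(new_phi):
        row[worst] = v_hat[s]
        row[second_worst] = rand_vec[s]
    return new_phi, worst, second_worst


# Toy usage: 4 states, 3 basis vectors (all numbers are made up).
rng = random.Random(0)
phi = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 1.0, 0.0], [0.5, 0.5, 1.0]]
theta = [2.0, 0.1, -0.5]
new_phi, worst, second_worst = adapt_features(phi, theta, rng)
```

In this toy run, column 1 (weight 0.1) is replaced by the normalized value estimate and column 2 (weight -0.5) by the random vector, while column 0 is left untouched. A full scheme would interleave such steps with TD learning of the weights on the new features, which the sketch omits.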

About the journal
Journal: Natural Sciences
Publisher: John Wiley & Sons, Inc.
Open Access: No