Adaptive Feature Pursuit: Online Adaptation of Features in Reinforcement Learning

L. A. Prashanth; Shalabh Bhatnagar; Vivek S. Borkar

doi:10.1002/9781118453988.ch23




Published in John Wiley & Sons, Inc.


Pages: 517 - 534

Abstract

This chapter presents a novel feature adaptation scheme based on temporal difference (TD) learning for the problem of prediction. The scheme combines exploitation and exploration by (a) finding the worst basis vector in the feature matrix at each stage and replacing it with the current best estimate of the normalized value function, and (b) replacing the second-worst basis vector with a randomly chosen vector, so that a new subspace of basis vectors is picked. The chapter applies the algorithm to a prediction problem in traffic signal control and observes good performance over two different network settings. As future work, the chapter considers applying the TD learning algorithm together with other schemes such as least squares temporal difference (LSTD) learning and least squares policy evaluation (LSPE).
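The two replacement steps in the abstract can be illustrated with a short sketch of a single adaptation stage. This is not the chapter's implementation, and several details are assumptions, because the abstract does not give them: the "worst" basis vector is taken to be the column whose weight has the smallest magnitude, the value estimate is normalized by its Euclidean norm, and the random vector is drawn uniformly from [0, 1). The name `adapt_features` and its signature are hypothetical.

```python
import math
import random


def adapt_features(phi, theta, rng=None):
    """One feature-adaptation stage (illustrative sketch only).

    phi   : feature matrix as a list of rows, one row per state
    theta : weight vector learned by TD for the current features

    Assumptions not stated in the abstract: "worst" means smallest
    |theta_j|, normalization is by the Euclidean norm, and the random
    replacement vector is uniform on [0, 1).
    """
    rng = rng or random.Random()
    n_features = len(theta)
    if n_features < 2:
        raise ValueError("need at least two basis vectors")

    # Current value-function estimate V = Phi * theta, one entry per state.
    values = [sum(row[j] * theta[j] for j in range(n_features)) for row in phi]
    norm = math.sqrt(sum(v * v for v in values))
    normalized = [v / norm for v in values] if norm > 0 else values[:]

    # Rank basis vectors (columns) by weight magnitude, smallest first.
    order = sorted(range(n_features), key=lambda j: abs(theta[j]))
    worst, second_worst = order[0], order[1]

    # Copy so the caller's feature matrix is left untouched.
    new_phi = [row[:] for row in phi]
    for s, row in enumerate(new_phi):
        row[worst] = normalized[s]        # (a) exploitation step
        row[second_worst] = rng.random()  # (b) exploration step
    return new_phi, worst, second_worst


phi = [[1.0, 0.0, 2.0],
       [0.0, 1.0, 1.0],
       [1.0, 1.0, 0.0]]
theta = [0.1, 2.0, -0.5]
new_phi, worst, second_worst = adapt_features(phi, theta, random.Random(0))
```

In a full loop, TD learning would presumably be run again on `new_phi` to obtain fresh weights before the next stage. How the chapter schedules these stages is not described in the abstract.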

About the journal

Journal	Natural Sciences
Publisher	John Wiley & Sons, Inc.
Open Access	No

Authors (1)

L. A. Prashanth
- Department of Computer Science and Engineering
