We propose a novel actor–critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear optimization problem. We also discuss an extension to incorporate function approximation and demonstrate the practicality of our algorithms on a network routing application.
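To make the setting concrete, here is a hedged sketch of a standard tabular actor–critic for a discounted-reward MDP. It does not reproduce the paper's specific descent direction or its convergence guarantees; the two-state MDP, the step sizes, and all variable names are illustrative assumptions. The actor is a softmax policy updated along the policy-gradient direction scaled by the critic's TD error, and the critic is a TD(0) value estimate.

```python
import numpy as np

# Illustrative actor-critic sketch (not the paper's algorithm):
# tabular softmax actor + TD(0) critic on a toy 2-state, 2-action MDP.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9

# Toy dynamics: action 0 always leads to state 0, action 1 to state 1;
# only action 1 yields reward, so the optimal policy takes action 1.
P = np.zeros((n_states, n_actions, n_states))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.zeros((n_states, n_actions))
R[:, 1] = 1.0

theta = np.zeros((n_states, n_actions))  # actor (softmax) parameters
V = np.zeros(n_states)                   # critic value estimates

def policy(s):
    """Softmax action probabilities in state s."""
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

alpha_critic, alpha_actor = 0.1, 0.05
s = 0
for _ in range(5000):
    probs = policy(s)
    a = rng.choice(n_actions, p=probs)
    s_next = rng.choice(n_states, p=P[s, a])

    # TD error drives both the critic and the actor updates.
    delta = R[s, a] + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * delta

    # Softmax policy-gradient direction: e_a - pi(.|s).
    grad = -probs
    grad[a] += 1.0
    theta[s] += alpha_actor * delta * grad
    s = s_next

# After training, the policy should favor the rewarding action 1.
print(policy(0), policy(1))
```

In this sketch the actor and critic share a single sample stream and use fixed step sizes; convergence analyses such as the one in the abstract typically require more care (e.g., two-timescale step sizes with the critic updating on the faster timescale).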
Journal | Systems & Control Letters
---|---
Publisher | Elsevier BV
Open Access | No