We propose a novel actor–critic algorithm with guaranteed convergence to an optimal policy for a discounted reward Markov decision process. The actor incorporates a descent direction that is motivated by the solution of a certain non-linear optimization problem. We also discuss an extension to incorporate function approximation and demonstrate the practicality of our algorithms on a network routing application.
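To make the setting concrete, here is a hedged sketch of a standard tabular actor–critic for a discounted-reward MDP. It does not reproduce the paper's specific descent direction or its convergence guarantees; the two-state MDP, the step sizes, and all variable names are illustrative assumptions. The actor is a softmax policy updated along the policy-gradient direction scaled by the critic's TD error, and the critic is a TD(0) value estimate.

```python
import numpy as np

# Illustrative actor-critic sketch (not the paper's algorithm):
# tabular softmax actor + TD(0) critic on a toy 2-state, 2-action MDP.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.9

# Toy dynamics: action 0 always leads to state 0, action 1 to state 1;
# only action 1 yields reward, so the optimal policy takes action 1.
P = np.zeros((n_states, n_actions, n_states))
P[:, 0, 0] = 1.0
P[:, 1, 1] = 1.0
R = np.zeros((n_states, n_actions))
R[:, 1] = 1.0

theta = np.zeros((n_states, n_actions))  # actor (softmax) parameters
V = np.zeros(n_states)                   # critic value estimates

def policy(s):
    """Softmax action probabilities in state s."""
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

alpha_critic, alpha_actor = 0.1, 0.05
s = 0
for _ in range(5000):
    probs = policy(s)
    a = rng.choice(n_actions, p=probs)
    s_next = rng.choice(n_states, p=P[s, a])

    # TD error drives both the critic and the actor updates.
    delta = R[s, a] + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * delta

    # Softmax policy-gradient direction: e_a - pi(.|s).
    grad = -probs
    grad[a] += 1.0
    theta[s] += alpha_actor * delta * grad
    s = s_next

# After training, the policy should favor the rewarding action 1.
print(policy(0), policy(1))
```

In this sketch the actor and critic share a single sample stream and use fixed step sizes; convergence analyses such as the one in the abstract typically require more care (e.g., two-timescale step sizes with the critic updating on the faster timescale).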
Journal | Systems & Control Letters
---|---
Publisher | Elsevier BV
Open Access | No