Actor-critic algorithms for risk-sensitive MDPs

L. A. Prashanth; M. Ghavamzadeh

Published in Advances in Neural Information Processing Systems (Neural Information Processing Systems Foundation)

2013

Abstract

In many sequential decision-making problems, we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance-related risk measures are among the most common risk-sensitive criteria in finance and operations research. However, optimizing many such criteria is known to be a hard problem. In this paper, we consider both discounted and average reward Markov decision processes. For each formulation, we first define a measure of variability for a policy, which in turn gives us a set of risk-sensitive criteria to optimize. For each of these criteria, we derive a formula for computing its gradient. We then devise actor-critic algorithms for estimating the gradient and updating the policy parameters in the ascent direction. We establish the convergence of our algorithms to locally risk-sensitive optimal policies. Finally, we demonstrate the usefulness of our algorithms in a traffic signal control application.
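
To make the idea of trading expected reward against variability concrete, here is a minimal sketch that is not the paper's method. It assumes a penalized mean-variance objective J(θ) = E[R] − λ·Var[R], which is one standard variance-related criterion, applied to a hypothetical one-step, three-armed problem with a softmax policy. The arm statistics (`MEANS`, `VARS`), the penalty weight `lam`, and the helper names are illustrative assumptions. The sketch uses exact gradients instead of sampled trajectories and a critic, so it only illustrates "update the policy parameters in the ascent direction" of a risk-sensitive criterion. The paper's own variability measures, gradient formulas and actor-critic updates for discounted and average-reward MDPs are not reproduced here.

```python
import numpy as np

# Hypothetical toy problem (assumption, not from the paper): a one-step,
# 3-armed decision where each arm has a known reward mean and variance.
MEANS = np.array([1.0, 1.2, 0.8])
VARS = np.array([0.1, 2.0, 0.05])
SECOND_MOMENTS = VARS + MEANS**2  # E[R^2 | arm] = Var + mean^2


def softmax(theta):
    """Softmax policy over arms, shifted for numerical stability."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def mean_variance_objective(theta, lam):
    """Assumed criterion J(theta) = E[R] - lam * Var[R] under the policy."""
    pi = softmax(theta)
    m = pi @ MEANS           # expected reward
    s = pi @ SECOND_MOMENTS  # expected squared reward
    return m - lam * (s - m**2)


def gradient(theta, lam):
    """Exact gradient of J with respect to the policy parameters theta."""
    pi = softmax(theta)
    # Softmax Jacobian: d pi_i / d theta_j = pi_i * (delta_ij - pi_j); symmetric.
    jac = np.diag(pi) - np.outer(pi, pi)
    m = pi @ MEANS
    dm = jac @ MEANS           # gradient of E[R]
    ds = jac @ SECOND_MOMENTS  # gradient of E[R^2]
    # Var[R] = E[R^2] - E[R]^2, so grad Var = ds - 2 * m * dm.
    return dm - lam * (ds - 2.0 * m * dm)


def ascend(lam, steps=2000, lr=0.5):
    """Plain gradient ascent on J from a uniform policy; returns the final policy."""
    theta = np.zeros(len(MEANS))
    for _ in range(steps):
        theta += lr * gradient(theta, lam)
    return softmax(theta)


risk_neutral = ascend(lam=0.0)   # concentrates on the highest-mean arm (index 1)
risk_averse = ascend(lam=1.0)    # prefers the low-variance arm (index 0)
print(np.round(risk_neutral, 3), np.round(risk_averse, 3))
```

With `lam=0` the criterion reduces to expected reward and the policy moves toward the arm with the largest mean, which is also the noisiest. With `lam=1` the variance penalty makes the steadier arm preferable. Note that mixing arms adds between-arm spread to Var[R], which is one reason variance criteria are harder to optimize than plain expectations.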

About the journal

Journal	Advances in Neural Information Processing Systems
Publisher	Neural Information Processing Systems Foundation
ISSN	1049-5258
Open Access	No

Authors (1)

L. A. Prashanth
- Department of Computer Science and Engineering
