Understanding Linear Quadratic Regulator (LQR) in the Context of Markov Decision Processes (MDP)

Deterministic System:

  • States: $x_{t}$ (the state of the system at time $t$)
  • Actions: $u_{t}$ (the control input at time $t$)
  • Transition Dynamics: $$ x_{t+1} = Ax_{t} + Bu_{t} $$
  • Cost Function (at time $t$): $c(x_{t}, u_{t}) = x_{t}^{\top} Q x_{t} + u_{t}^{\top} R u_{t}$
    • In the context of MDPs we usually maximize reward; here we minimize cost instead, which is equivalent to maximizing the reward $-c(x_{t}, u_{t})$.
  • Value Function (from time $t$ onward):
    • In MDPs, the value function $V_{t}(x_{t})$ is the optimal cumulative reward (here, cost) from time $t$ onward, starting from state $x_{t}$ and following the optimal action sequence. Since the dynamics are deterministic, no expectation is needed: $$ V_{t}(x_{t}) = \min_{u_{t}, u_{t+1}, \dots} \left[ \sum_{t' \ge t} c(x_{t'}, u_{t'}) \right] $$
  • The Bellman Equation (at time $t$):
    • The Bellman equation provides a recursive relationship for the value function. This equation implies that the value of being in state $x_{t}$ and taking the optimal action $u_{t}$ is the immediate cost $c(x_{t}, u_{t})$ plus the value of the subsequent state, $V_{t+1}(Ax_{t} + Bu_{t})$.
    • Here, the subsequent state is determined by the current state and action via the system dynamics. $$ V_{t}(x_{t}) = \min_{u_{t}} \left[ c(x_{t}, u_{t}) + V_{t+1}(Ax_{t} + Bu_{t}) \right] $$
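In the finite-horizon case, this Bellman recursion can be carried out backward in time in closed form: $V_{t}(x) = x^{\top} P_{t} x$, starting from a terminal cost matrix and stepping toward $t = 0$. A minimal sketch in Python with NumPy (the function name and interface are illustrative assumptions, not from the text):

```python
import numpy as np

def backward_riccati(A, B, Q, R, P_T, horizon):
    """Finite-horizon LQR via backward recursion on the Bellman equation.

    Assumes V_t(x) = x^T P_t x. Returns the cost matrices P_0..P_T and the
    feedback gains K_0..K_{T-1} (so that u_t = K_t x_t).
    """
    P = P_T
    Ps, Ks = [P_T], []
    for _ in range(horizon):
        # Minimizing c(x, u) + V_{t+1}(Ax + Bu) over u gives the gain K_t:
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Plugging the minimizer back in gives the Riccati recursion for P_t:
        P = Q + A.T @ P @ (A + B @ K)
        Ps.append(P)
        Ks.append(K)
    return Ps[::-1], Ks[::-1]
```

As the horizon grows, the matrices $P_{t}$ computed this way converge to the fixed point $P$ of the infinite-horizon equation discussed next.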

In the infinite horizon scenario, the value function is not explicitly dependent on time $t$, hence we denote it as $V(x)$. Thus, the Bellman equation simplifies to: $$ V(x) = \min_{u} \left( x^{\top} Q x + u^{\top} R u + V(Ax + Bu) \right) $$

Assume a quadratic form for the value function, $V(x) = x^{\top} P x$, and substitute it into the Bellman equation. The optimal control law $u^*$ that minimizes the right-hand side is then given by:

$$ u^* = - (R + B^{\top} P B)^{-1} B^{\top} P A x $$
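For completeness, this control law follows from a one-line first-order optimality condition. Substituting $V(x) = x^{\top} P x$ into the Bellman right-hand side and setting the gradient with respect to $u$ to zero (using the symmetry of $R$ and $P$):

```latex
\nabla_u \left[ x^\top Q x + u^\top R u + (Ax + Bu)^\top P (Ax + Bu) \right]
  = 2 R u + 2 B^\top P (Ax + Bu) = 0
\quad\Longrightarrow\quad
(R + B^\top P B)\, u = - B^\top P A\, x
```

Solving for $u$ gives the expression above.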

This leads to the Algebraic Riccati Equation (ARE):

$$ P = Q + A^{\top} P A - A^{\top} P B (R + B^{\top} P B)^{-1} B^{\top} P A $$

which can be used to solve for $P$.
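One simple numerical approach is to iterate the Riccati recursion until $P$ stops changing; in practice a library routine such as `scipy.linalg.solve_discrete_are` is the usual choice. A plain-NumPy sketch, with a hypothetical function name:

```python
import numpy as np

def solve_dare(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Solve the discrete-time algebraic Riccati equation by fixed-point
    iteration: repeatedly apply the Riccati recursion until P converges.
    """
    P = Q.copy()
    for _ in range(max_iter):
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
            R + B.T @ P @ B, B.T @ P @ A)
        if np.max(np.abs(P_next - P)) < tol:
            return P_next
        P = P_next
    raise RuntimeError("Riccati iteration did not converge")
```

This iteration converges when $(A, B)$ is stabilizable and the costs are positive (semi-)definite; it is exactly the finite-horizon backward recursion run until it reaches its fixed point.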

Note that this $u^*$ is a linear state feedback control law, $u^* = Kx$, where

$$ K = -(R + B^{\top} P B)^{-1} B^{\top} P A $$
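Putting the pieces together: solve for $P$, form $K$, and simulate the closed loop $x_{t+1} = (A + BK)x_{t}$, which drives the state toward the origin. The system matrices below (a discretized double integrator) are hypothetical placeholders:

```python
import numpy as np

# Hypothetical discretized double integrator, dt = 0.1 (illustrative values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q = np.eye(2)          # state cost
R = np.array([[0.1]])  # control cost

# Solve the ARE by iterating the Riccati recursion to convergence.
P = Q.copy()
for _ in range(5000):
    P = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(
        R + B.T @ P @ B, B.T @ P @ A)

# Linear state feedback gain: u = K x.
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Closed-loop rollout: x_{t+1} = (A + B K) x_t.
x = np.array([1.0, -0.5])
for _ in range(200):
    x = (A + B @ K) @ x

print(np.linalg.norm(x))  # much smaller than the initial norm
```

Because $Q \succ 0$, $R \succ 0$, and $(A, B)$ is controllable here, the closed-loop matrix $A + BK$ has spectral radius below one, so the state decays to zero.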