Deterministic System:
- States: $x_{t}$ (the state of the system at time $t$)
- Actions: $u_{t}$ (the control input at time $t$)
- Transition Dynamics: $$ x_{t+1} = Ax_{t} + Bu_{t} $$
- Cost Function (at time $t$): $c(x_{t}, u_{t}) = x_{t}^{\top} Q x_{t} + u_{t}^{\top} R u_{t}$ (a short simulation of this setup appears after this list)
- In the context of MDPs, we usually aim to maximize rewards. Here, however, we minimize cost, which is equivalent to maximizing the negative of the cost.
- Value Function (from time $t$ onward):
- In MDPs, the value function $V_{t}(x_{t})$ is the expected optimal cumulative reward (or cost) from $t$ onward, starting from state $x_{t}$ and following the optimal action sequence; here the dynamics are deterministic, so the expectation is trivial: $$ V_{t}(x_{t}) = \min_{u_{t}, \dots, u_{T}} \left[ \sum_{t'=t}^{T} c(x_{t'}, u_{t'}) \right] $$ where $T$ is the horizon and the states obey $x_{t'+1} = Ax_{t'} + Bu_{t'}$.
- The Bellman Equation (at time $t$):
- The Bellman equation provides a recursive relationship for the value function: the value of state $x_{t}$ is the immediate cost $c(x_{t}, u_{t})$ of the best action $u_{t}$ plus the value of the resulting next state, $V_{t+1}(Ax_{t} + Bu_{t})$.
- Here, the subsequent state is determined by the current state and action via the system dynamics. $$ V_{t}(x_{t}) = \min_{u_{t}} \left[ c(x_{t}, u_{t}) + V_{t+1}(Ax_{t} + Bu_{t}) \right] $$
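As promised above, here is a minimal numpy sketch of this setup: it rolls out the dynamics $x_{t+1} = Ax_{t} + Bu_{t}$ under an arbitrary control sequence and accumulates the quadratic cost. The matrices `A`, `B`, `Q`, `R`, the initial state, and the control sequence are all hypothetical illustrative values, not taken from the text.

```python
import numpy as np

# Hypothetical system (illustrative values): a discretized
# double integrator with a single control input.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state cost weight
R = np.array([[0.1]])  # control cost weight

def cost(x, u):
    """Per-step quadratic cost c(x, u) = x^T Q x + u^T R u."""
    return x @ Q @ x + u @ R @ u

def rollout(x0, controls):
    """Roll out x_{t+1} = A x_t + B u_t, accumulating the cost."""
    x, total = x0, 0.0
    for u in controls:
        total += cost(x, u)
        x = A @ x + B @ u
    return x, total

x0 = np.array([1.0, 0.0])
controls = [np.array([0.0])] * 50   # a trivial "do nothing" control sequence
xT, J = rollout(x0, controls)
print(f"final state: {xT}, accumulated cost: {J:.3f}")
```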
In the infinite horizon scenario, the value function is not explicitly dependent on time $t$, hence we denote it as $V(x)$. Thus, the Bellman equation simplifies to: $$ V(x) = \min_{u} \left( x^{\top} Q x + u^{\top} R u + V(Ax + Bu) \right) $$
Assume a quadratic form for the value function, $V(x) = x^{\top} P x$, and substitute it into the Bellman equation. Setting the gradient of the right-hand side with respect to $u$ to zero, $2Ru + 2B^{\top} P (Ax + Bu) = 0$, gives the optimal control law $u^*$:
$$ u^* = - (R + B^{\top} P B)^{-1} B^{\top} P A x $$
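As a quick sanity check (not part of the derivation), the sketch below numerically verifies that this $u^*$ minimizes the Bellman right-hand side for an arbitrary positive definite $P$: since $R + B^{\top} P B \succ 0$, the objective is convex in $u$, so no perturbation of $u^*$ should achieve a smaller value. All dimensions and matrices here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and matrices (illustrative, not from the text).
n, m = 3, 2
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, m))
Q = np.eye(n)
R = np.eye(m)
M = rng.normal(size=(n, n))
P = M @ M.T + np.eye(n)   # an arbitrary symmetric positive definite P

def bellman_rhs(x, u):
    """c(x, u) + V(Ax + Bu), with V(x) = x^T P x."""
    x_next = A @ x + B @ u
    return x @ Q @ x + u @ R @ u + x_next @ P @ x_next

x = rng.normal(size=n)
u_star = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A @ x)

# The right-hand side is a convex quadratic in u, so random
# perturbations of u* should never achieve a lower value.
for _ in range(1000):
    u = u_star + rng.normal(scale=0.5, size=m)
    assert bellman_rhs(x, u) >= bellman_rhs(x, u_star) - 1e-9
print("u* minimizes the Bellman right-hand side (checked numerically)")
```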
Substituting $u^*$ back and matching the quadratic forms in $x$ on both sides yields the (discrete-time) Algebraic Riccati Equation (ARE):
$$ P = Q + A^{\top} P A - A^{\top} P B (R + B^{\top} P B)^{-1} B^{\top} P A $$
which can be solved for $P$ (for example, by fixed-point iteration, as sketched below).
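One simple way to solve the ARE, assuming the iteration converges (it does for the illustrative stabilizable system below), is to treat it as a fixed-point equation: start from $P = Q$ and repeatedly apply the right-hand side. The sketch cross-checks the result against SciPy's dedicated DARE solver, `scipy.linalg.solve_discrete_are`.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Same hypothetical system as in the earlier sketch.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

# Fixed-point iteration: apply the ARE's right-hand side
# until P stops changing.
P = Q.copy()
for _ in range(10_000):
    P_next = Q + A.T @ P @ A \
             - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.max(np.abs(P_next - P)) < 1e-12:
        break
    P = P_next

# Cross-check against SciPy's DARE solver.
P_scipy = solve_discrete_are(A, B, Q, R)
print(np.allclose(P, P_scipy, atol=1e-8))  # True
```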
Note that this $u^*$ is a linear state-feedback control law, $u^* = Kx$, where
$$ K = -(R + B^{\top} P B)^{-1} B^{\top} P A $$
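Putting the pieces together, a short closed-loop sketch (again with the hypothetical matrices from the earlier sketches): solve for $P$, form $K$, and check that the closed-loop dynamics $x_{t+1} = (A + BK)x_{t}$ are stable, i.e., the eigenvalues of $A + BK$ lie strictly inside the unit circle, so the state is driven to the origin.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical system matrices (same illustrative values as above).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # u = K x

# All closed-loop eigenvalue magnitudes should be < 1.
eigs = np.linalg.eigvals(A + B @ K)
print("closed-loop eigenvalue magnitudes:", np.abs(eigs))

# Simulate: the state converges to the origin.
x = np.array([1.0, 0.0])
for t in range(100):
    x = (A + B @ K) @ x
print("state after 100 steps:", x)  # approximately [0, 0]
```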