Deterministic System:
- States: $x_{t}$ (the state of the system at time $t$)
- Actions: $u_{t}$ (the control input at time $t$)
- Transition Dynamics: $$ x_{t+1} = Ax_{t} + Bu_{t} $$
- Cost Function (at time $t$): $c(x_{t}, u_{t}) = x_{t}^{\top} Q x_{t} + u_{t}^{\top} R u_{t}$ (a short simulation of this setup appears after this list)
- In the context of MDPs, we usually aim to maximize rewards. Here, however, we minimize cost, which is equivalent to maximizing the negative of the cost.
- Value Function (from time $t$ onward):
- In MDPs, the value function $V_{t}(x_{t})$ is the expected optimal cumulative reward (or cost) from $t$ onward, starting from state $x_{t}$ and following the optimal action sequence; here the dynamics are deterministic, so the expectation is trivial: $$ V_{t}(x_{t}) = \min_{u_{t}, \dots, u_{T}} \left[ \sum_{t'=t}^{T} c(x_{t'}, u_{t'}) \right] $$ where $T$ is the horizon and the states obey $x_{t'+1} = Ax_{t'} + Bu_{t'}$.
- The Bellman Equation (at time $t$):
- The Bellman equation provides a recursive relationship for the value function: the value of state $x_{t}$ is the immediate cost $c(x_{t}, u_{t})$ of the best action $u_{t}$ plus the value of the resulting next state, $V_{t+1}(Ax_{t} + Bu_{t})$.
- Here, the subsequent state is determined by the current state and action via the system dynamics. $$ V_{t}(x_{t}) = \min_{u_{t}} \left[ c(x_{t}, u_{t}) + V_{t+1}(Ax_{t} + Bu_{t}) \right] $$
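As promised above, here is a minimal numpy sketch of this setup: it rolls out the dynamics $x_{t+1} = Ax_{t} + Bu_{t}$ under an arbitrary control sequence and accumulates the quadratic cost. The matrices `A`, `B`, `Q`, `R`, the initial state, and the control sequence are all hypothetical illustrative values, not taken from the text.

```python
import numpy as np

# Hypothetical system (illustrative values): a discretized
# double integrator with a single control input.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)          # state cost weight
R = np.array([[0.1]])  # control cost weight

def cost(x, u):
    """Per-step quadratic cost c(x, u) = x^T Q x + u^T R u."""
    return x @ Q @ x + u @ R @ u

def rollout(x0, controls):
    """Roll out x_{t+1} = A x_t + B u_t, accumulating the cost."""
    x, total = x0, 0.0
    for u in controls:
        total += cost(x, u)
        x = A @ x + B @ u
    return x, total

x0 = np.array([1.0, 0.0])
controls = [np.array([0.0])] * 50   # a trivial "do nothing" control sequence
xT, J = rollout(x0, controls)
print(f"final state: {xT}, accumulated cost: {J:.3f}")
```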
In the infinite horizon scenario, the value function is not explicitly dependent on time $t$, hence we denote it as $V(x)$. Thus, the Bellman equation simplifies to: $$ V(x) = \min_{u} \left( x^{\top} Q x + u^{\top} R u + V(Ax + Bu) \right) $$
Assume a quadratic form for the value function, $V(x) = x^{\top} P x$, and substitute it into the Bellman equation. Setting the gradient of the right-hand side with respect to $u$ to zero, $2Ru + 2B^{\top} P (Ax + Bu) = 0$, gives the optimal control law $u^*$:
$$ u^* = - (R + B^{\top} P B)^{-1} B^{\top} P A x $$
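As a quick sanity check (not part of the derivation), the sketch below numerically verifies that this $u^*$ minimizes the Bellman right-hand side for an arbitrary positive definite $P$: since $R + B^{\top} P B \succ 0$, the objective is convex in $u$, so no perturbation of $u^*$ should achieve a smaller value. All dimensions and matrices here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and matrices (illustrative, not from the text).
n, m = 3, 2
A = rng.normal(size=(n, n))
B = rng.normal(size=(n, m))
Q = np.eye(n)
R = np.eye(m)
M = rng.normal(size=(n, n))
P = M @ M.T + np.eye(n)   # an arbitrary symmetric positive definite P

def bellman_rhs(x, u):
    """c(x, u) + V(Ax + Bu), with V(x) = x^T P x."""
    x_next = A @ x + B @ u
    return x @ Q @ x + u @ R @ u + x_next @ P @ x_next

x = rng.normal(size=n)
u_star = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A @ x)

# The right-hand side is a convex quadratic in u, so random
# perturbations of u* should never achieve a lower value.
for _ in range(1000):
    u = u_star + rng.normal(scale=0.5, size=m)
    assert bellman_rhs(x, u) >= bellman_rhs(x, u_star) - 1e-9
print("u* minimizes the Bellman right-hand side (checked numerically)")
```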
Substituting $u^*$ back and matching the quadratic forms in $x$ on both sides yields the (discrete-time) Algebraic Riccati Equation (ARE):
$$ P = Q + A^{\top} P A - A^{\top} P B (R + B^{\top} P B)^{-1} B^{\top} P A $$
which can be solved for $P$ (for example, by fixed-point iteration, as sketched below).
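One simple way to solve the ARE, assuming the iteration converges (it does for the illustrative stabilizable system below), is to treat it as a fixed-point equation: start from $P = Q$ and repeatedly apply the right-hand side. The sketch cross-checks the result against SciPy's dedicated DARE solver, `scipy.linalg.solve_discrete_are`.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Same hypothetical system as in the earlier sketch.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

# Fixed-point iteration: apply the ARE's right-hand side
# until P stops changing.
P = Q.copy()
for _ in range(10_000):
    P_next = Q + A.T @ P @ A \
             - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.max(np.abs(P_next - P)) < 1e-12:
        break
    P = P_next

# Cross-check against SciPy's DARE solver.
P_scipy = solve_discrete_are(A, B, Q, R)
print(np.allclose(P, P_scipy, atol=1e-8))  # True
```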
Note that this $u^*$ is a linear state-feedback control law, $u^* = Kx$, where
$$ K = -(R + B^{\top} P B)^{-1} B^{\top} P A $$
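Putting the pieces together, a short closed-loop sketch (again with the hypothetical matrices from the earlier sketches): solve for $P$, form $K$, and check that the closed-loop dynamics $x_{t+1} = (A + BK)x_{t}$ are stable, i.e., the eigenvalues of $A + BK$ lie strictly inside the unit circle, so the state is driven to the origin.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical system matrices (same illustrative values as above).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

P = solve_discrete_are(A, B, Q, R)
K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # u = K x

# All closed-loop eigenvalue magnitudes should be < 1.
eigs = np.linalg.eigvals(A + B @ K)
print("closed-loop eigenvalue magnitudes:", np.abs(eigs))

# Simulate: the state converges to the origin.
x = np.array([1.0, 0.0])
for t in range(100):
    x = (A + B @ K) @ x
print("state after 100 steps:", x)  # approximately [0, 0]
```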