B5679 - OPTIMAL CONTROL AND REINFORCEMENT LEARNING M

Academic Year 2025/2026

Learning outcomes

The course focuses on theoretical and numerical methods for designing trajectories and feedback policies of dynamical systems so as to optimize a performance index and satisfy given constraints. The presented methods span the areas of optimal control, reinforcement learning and model predictive control, with a strong focus on numerical optimization. At the end of the course students will know how to (i) model optimal control and reinforcement learning problems and characterize optimality conditions, (ii) develop numerical optimization methods from optimal control and reinforcement learning to compute optimal, feasible trajectories and policies, and (iii) design optimization-based predictive control schemes for maneuvering of autonomous systems. To bridge the gap between theory and application, students will apply the studied techniques to trajectory optimization and maneuvering of autonomous systems in a number of application domains, including autonomous vehicles, intelligent robots (e.g., aerial robots) and other mechatronic systems.

Course contents

Introduction to optimal control
Motivating application domains and tasks for the optimal control of dynamical systems: maneuvering and trajectory optimization of autonomous vehicles, robotic systems (e.g., Autonomous Mobile Robots) and other mechatronic systems. Optimal control problem formulation. Examples of optimal control problems in the presented application domains.


Nonlinear optimization
Basics on unconstrained and constrained optimization: definitions and optimality conditions. Optimization algorithms: descent (line-search) methods (gradient and Newton methods), barrier function methods, Sequential Quadratic Programming (SQP). Software tools and coding on case studies for optimal control.
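As an indication of the kind of coding exercises involved, the following is a minimal sketch of a descent (line-search) method with backtracking (Armijo) line search on an assumed convex quadratic test problem; the function names and data are illustrative, not course material.

```python
import numpy as np

Q = np.array([[3.0, 0.5], [0.5, 1.0]])   # assumed positive-definite quadratic
b = np.array([1.0, -2.0])

def f(x):
    # f(x) = 0.5 x'Qx - b'x, minimized at x* = Q^{-1} b
    return 0.5 * x @ Q @ x - b @ x

def grad_f(x):
    return Q @ x - b

def armijo_step(x, d, alpha=1.0, beta=0.5, c=1e-4):
    # Backtracking line search: shrink alpha until sufficient decrease holds.
    while f(x + alpha * d) > f(x) + c * alpha * (grad_f(x) @ d):
        alpha *= beta
    return alpha

x = np.zeros(2)
for _ in range(200):
    d = -grad_f(x)                 # steepest-descent direction
    if np.linalg.norm(d) < 1e-8:   # stationarity reached
        break
    x = x + armijo_step(x, d) * d

print(x)   # should approach the minimizer Q^{-1} b
```

Replacing the direction `d` with the Newton direction `-inv(Q) @ grad_f(x)` turns the same loop into a (damped) Newton method.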


Optimality conditions for optimal control
Nonlinear programming reformulation of optimal control, KKT optimality conditions. Unconstrained optimal control: reformulation via shooting, derivation of the reduced-cost gradient, Hamiltonian definition, necessary conditions for optimal control. Pontryagin Maximum Principle.


Linear Quadratic (LQ) optimal control
Finite horizon: problem formulation, necessary and sufficient conditions for optimality via Riccati equation, feedback structure of the optimal control. Infinite-horizon optimal control. Trajectory tracking via optimal control: Linear Quadratic Regulator (LQR). Continuous-time version of the LQ optimal control. Case studies in autonomous vehicles and robotics. Software tools and coding on case studies.
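To give a concrete flavor of the finite-horizon LQ material, the following is a hedged sketch of the backward Riccati recursion yielding the time-varying feedback gains; the system matrices (a discretized double integrator) are an illustrative assumption, not taken from the course.

```python
import numpy as np

def lqr_finite_horizon(A, B, Q, R, Qf, T):
    # Backward Riccati recursion:
    #   K_t = (R + B'P B)^{-1} B'P A,   P <- Q + A'P(A - B K_t)
    P = Qf.copy()
    gains = []
    for _ in range(T):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()          # gains[t] is the optimal feedback at time t
    return gains, P          # P is the cost-to-go matrix at time 0

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # assumed double-integrator example
B = np.array([[0.0], [dt]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), 10 * np.eye(2)

gains, P0 = lqr_finite_horizon(A, B, Q, R, Qf, T=50)

# Simulate the optimal feedback u_t = -K_t x_t from an initial condition.
x = np.array([1.0, 0.0])
for K in gains:
    x = A @ x + B @ (-K @ x)
print(np.linalg.norm(x))     # state is driven toward the origin
```

The feedback structure of the optimal control is visible here: the solution is a sequence of gains, not an open-loop input trajectory.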


Deterministic dynamic programming
Principle of optimality, value function and Bellman's equation. Discrete-time Minimum Principle for optimal control. LQ optimal control via dynamic programming. Dynamic programming and Reinforcement Learning.


Numerical methods for optimal control
Gradient and Newton methods for optimal control. Barrier function method for constrained optimal control. SQP for optimal control. Software tools and coding on case studies for optimal control of autonomous vehicles and robots.
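As an illustrative (not official) sketch of the gradient method for optimal control, the snippet below solves an unconstrained discrete-time LQ problem via shooting, with the reduced-cost gradient computed by the backward costate recursion; all matrices and horizons are assumed example data.

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # assumed double-integrator example
B = np.array([[0.0], [dt]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), np.eye(2)
T = 20
x0 = np.array([1.0, 0.0])

def rollout(u):
    # Shooting: states are eliminated by forward simulation of the dynamics.
    x = np.zeros((T + 1, 2)); x[0] = x0
    for t in range(T):
        x[t + 1] = A @ x[t] + B @ u[t]
    return x

def cost(u):
    x = rollout(u)
    stage = sum(0.5 * x[t] @ Q @ x[t] + 0.5 * u[t] @ R @ u[t] for t in range(T))
    return stage + 0.5 * x[T] @ Qf @ x[T]

def gradient(u):
    # Costate recursion: lam_T = Qf x_T, lam_t = Q x_t + A' lam_{t+1};
    # reduced gradient: dJ/du_t = R u_t + B' lam_{t+1}.
    x = rollout(u)
    lam = Qf @ x[T]
    g = np.zeros_like(u)
    for t in reversed(range(T)):
        g[t] = R @ u[t] + B.T @ lam
        lam = Q @ x[t] + A.T @ lam
    return g

u = np.zeros((T, 1))
for _ in range(300):
    g = gradient(u)
    alpha = 1.0
    while cost(u - alpha * g) > cost(u) - 1e-4 * alpha * np.sum(g * g):
        alpha *= 0.5            # backtracking line search on the reduced cost
    u = u - alpha * g

print(cost(u))
```

A Newton method for optimal control replaces this first-order update with second-order information propagated by a Riccati-like backward sweep.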


Optimization-based control techniques
Optimal control for trajectory generation and optimization. Receding horizon hierarchical control schemes. Model Predictive Control: introduction, nominal schemes on linear and nonlinear systems, extensions and applications. Software tools and coding on case studies from autonomous vehicles and robots.
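A minimal receding-horizon sketch, assuming a linear unconstrained setting where each MPC subproblem reduces to a finite-horizon LQ problem solved by a backward Riccati sweep; the double-integrator data is an illustrative assumption.

```python
import numpy as np

dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # assumed double-integrator example
B = np.array([[0.0], [dt]])
Q, R, Qf = np.eye(2), np.array([[1.0]]), np.eye(2)
N = 15                                   # prediction horizon

def mpc_input(x):
    # Solve the horizon-N LQ subproblem by a backward Riccati sweep; the
    # last gain computed corresponds to the first time step of the horizon.
    P = Qf.copy()
    for _ in range(N):
        K0 = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K0)
    return -K0 @ x                       # apply only the first optimal input

# Receding-horizon closed loop: re-optimize at every sampling instant.
x = np.array([1.0, 0.0])
for _ in range(100):
    x = A @ x + B @ mpc_input(x)
print(np.linalg.norm(x))
```

With state or input constraints, the Riccati sweep inside `mpc_input` is replaced by a constrained QP (or an SQP iteration in the nonlinear case), but the receding-horizon loop is unchanged.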


Introduction to reinforcement learning
Motivating application domains and examples for reinforcement learning in robotic systems (e.g., Autonomous Mobile Robots) and other autonomous systems. Reinforcement Learning problem formulation and control notation: system, policy and reward. Markov decision processes.


Stochastic dynamic programming
Value function and action-value function. Bellman's expectation equation. Bellman's optimality equation. Policy evaluation, policy improvement, policy iteration, value iteration.
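For concreteness, a sketch of value iteration via the Bellman optimality backup on a tiny made-up MDP (states, transition probabilities and rewards below are invented for illustration only).

```python
import numpy as np

gamma = 0.9
# P[a, s, s'] : transition probabilities; R[s, a] : expected stage reward
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],   # action 0
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # action 1
])
R = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 2.0]])

V = np.zeros(3)
for _ in range(500):
    # Bellman optimality backup: V(s) <- max_a [ R(s,a) + gamma * E[V(s')] ]
    Qsa = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Qsa.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:   # gamma-contraction has converged
        V = V_new
        break
    V = V_new

# Greedy policy extracted from the converged action-value function.
policy = (R + gamma * np.einsum('ast,t->sa', P, V)).argmax(axis=1)
print(V, policy)
```

Policy iteration alternates a policy-evaluation step (a linear system in V) with the same greedy improvement step used above to extract `policy`.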


Selected algorithms for reinforcement learning
Approximation-based methods: main idea and approaches. Introduction to off-policy and on-policy algorithms. Monte Carlo and temporal-difference methods. Algorithms based on function approximation. Policy gradient. Examples from robotics and autonomous systems.
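As a hedged sketch of an off-policy temporal-difference method, the snippet below runs tabular Q-learning on a made-up 5-state chain (moving right from the last state earns reward 1 and ends the episode); the environment and parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
alpha, gamma = 0.1, 0.95

def step(s, a):
    # Deterministic chain dynamics; action 1 at the last state terminates.
    if a == 1 and s == n_states - 1:
        return None, 1.0
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, 0.0

Q = np.zeros((n_states, n_actions))
for _ in range(2000):               # episodes
    s = 0
    while s is not None:
        # Uniformly random behavior policy: learning is off-policy, since
        # the TD target bootstraps on the greedy (max) action instead.
        a = int(rng.integers(n_actions))
        s2, r = step(s, a)
        target = r if s2 is None else r + gamma * np.max(Q[s2])
        Q[s, a] += alpha * (target - Q[s, a])   # TD update
        s = s2

print(Q.argmax(axis=1))   # greedy policy learned from random behavior
```

Replacing the `max` in the target with the value of the action actually taken next would give SARSA, the on-policy counterpart.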


Readings/Bibliography

The course is partly based on the books:
"D. P. Bertsekas, Nonlinear Programming"
"A. E. Bryson, Y.-C. Ho, Applied Optimal Control"
"R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction"
"D. P. Bertsekas, A Course in Reinforcement Learning"

and a set of slides/notes which will be made available throughout the term.

Teaching methods

Classroom lectures with slides and hands-on software exercises. Lectures are oriented toward practical software implementation with application to case studies.

Assessment methods

Oral exam and discussion of a course project.

Teaching tools

"Virtuale" e-learning platform (course content and material, useful information). Software tools for the design and simulation of optimal control and reinforcement learning methods.

Office hours

See the website of Giuseppe Notarstefano