
This example shows trajectory optimization with the Hermite-Simpson collocation method and trajectory stabilization with a finite-horizon LQR controller. These methods are demonstrated on a model of an acrobot, a two-link underactuated robot. **Note:** This tutorial uses the PyCollimator library. You can find installation and usage instructions, as well as further examples, at the PyCollimator site.

An acrobot (see the schematic below) is a robotic arm with two links and a single actuator at the elbow joint. The configuration of the acrobot is described by the angles $q_1$ and $q_2$ of the two links, and the control input is the torque $u=\tau$ applied at the elbow joint.

All the parameters and variables in the acrobot model are described in the table below:

Denoting $\mathbf{q} = [q_1, q_2]^T$ and $\mathbf{u} = [u]$ as the vector of generalized coordinates and the control input, respectively, the equations of motion for the acrobot are given by (see [1] for details)

$$ \mathbf{M}(\mathbf{q})\ddot{\mathbf{q}} + \mathbf{C}(\mathbf{q}, \dot{\mathbf{q}})\dot{\mathbf{q}} = \boldsymbol{\tau}_g(\mathbf{q}) + \mathbf{B}\mathbf{u}, $$

where $\mathbf{M}(\mathbf{q})$ is the mass matrix, $\mathbf{C}(\mathbf{q}, \dot{\mathbf{q}})$ is the matrix of Coriolis and centrifugal terms, $\boldsymbol{\tau}_g(\mathbf{q})$ are the torques due to gravity, and $\mathbf{B}$ is the control input matrix. Denoting $\cos(q_i)$ as $c_i$, $\sin(q_i)$ as $s_i$, and $\sin(q_1 + q_2)$ as $s_{1+2}$, these matrices are [1]:

\begin{align} \mathbf{M}(\mathbf{q}) &= \begin{bmatrix} I_1 + I_2 + m_2 l_1^2 + 2m_2 l_1 l_{c2} c_2 & I_2 + m_2 l_1 l_{c2} c_2 \\ I_2 + m_2 l_1 l_{c2} c_2 & I_2 \end{bmatrix},\\[15pt] \mathbf{C}(\mathbf{q},\dot{\mathbf{q}}) &= \begin{bmatrix} -2 m_2 l_1 l_{c2} s_2 \dot{q}_2 & -m_2 l_1 l_{c2} s_2 \dot{q}_2 \\ m_2 l_1 l_{c2} s_2 \dot{q}_1 & 0 \end{bmatrix}, \\[15pt] \boldsymbol{\tau}_g(\mathbf{q}) &= \begin{bmatrix} -m_1 g l_{c1}s_1 - m_2 g (l_1 s_1 + l_{c2}s_{1+2}) \\ -m_2 g l_{c2} s_{1+2} \end{bmatrix},\\[15pt] \mathbf{B} &= \begin{bmatrix} 0 \\ 1 \end{bmatrix}. \end{align}

As usual, we can write the above equations as first-order differential equations by introducing the state vector $\mathbf{x} = [q_1, q_2, \dot{q}_1, \dot{q}_2]^T$.
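As a concrete, library-independent sketch, the state-space dynamics $\dot{x} = f(x, u)$ can be written in plain NumPy using the matrices above. The parameter values below are illustrative placeholders, not necessarily those used in `collimator/examples/acrobot.py`:

```python
import numpy as np

# Illustrative parameter values (assumptions for this sketch)
m1 = m2 = 1.0      # link masses
l1 = 1.0           # length of link 1
lc1 = lc2 = 0.5    # distances from joints to link centers of mass
I1 = I2 = 1.0      # link moments of inertia
g = 9.81           # gravitational acceleration

def acrobot_dynamics(x, u):
    """Continuous-time dynamics x_dot = f(x, u) for the acrobot."""
    q1, q2, q1d, q2d = x
    s1, s2 = np.sin(q1), np.sin(q2)
    c2 = np.cos(q2)
    s12 = np.sin(q1 + q2)

    M = np.array([
        [I1 + I2 + m2 * l1**2 + 2 * m2 * l1 * lc2 * c2, I2 + m2 * l1 * lc2 * c2],
        [I2 + m2 * l1 * lc2 * c2, I2],
    ])
    C = np.array([
        [-2 * m2 * l1 * lc2 * s2 * q2d, -m2 * l1 * lc2 * s2 * q2d],
        [m2 * l1 * lc2 * s2 * q1d, 0.0],
    ])
    tau_g = np.array([
        -m1 * g * lc1 * s1 - m2 * g * (l1 * s1 + lc2 * s12),
        -m2 * g * lc2 * s12,
    ])
    B = np.array([0.0, 1.0])

    qd = np.array([q1d, q2d])
    # Solve M(q) qdd = tau_g(q) + B u - C(q, qd) qd for the accelerations
    qdd = np.linalg.solve(M, tau_g + B * u - C @ qd)
    return np.concatenate([qd, qdd])
```

A quick sanity check: both the hanging-down state $[0, 0, 0, 0]^T$ and the upright state $[\pi, 0, 0, 0]^T$ are equilibria of these dynamics with zero torque.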

The code for the Acrobot can be found in the PyCollimator examples at `collimator/examples/acrobot.py`.

With the Acrobot LeafSystem, we can simulate its dynamics and visualize the results. We show this below for zero elbow torque.

A **trajectory** is a pair of functions $x(t)$ and $u(t)$. Given a reference trajectory $x_{ref}(t)$ and $u_{ref}(t)$ on an interval $t\in[t_0, t_f]$ that our plant should follow, the goal of trajectory optimization is to obtain a control $u_{opt}(t)$ which, when applied to the plant, produces a trajectory $x_{opt}(t)$ that closely follows the reference trajectory. For instance, we may want a trajectory that swings the acrobot up, starting from the vertically-down orientation. Such an optimal trajectory can be obtained by solving the following optimization problem, in which we search for $u_{opt}(t)$ and $x_{opt}(t)$ such that their discrepancy with respect to $x_{ref}(t)$ and $u_{ref}(t)$ is minimized:

\begin{align}
u_{opt}(t) = \text{arg}\,\min\limits_{u(t)} \quad & [x(t_f) - x_{ref}(t_f)]^T Q_f [x(t_f) - x_{ref}(t_f)] \\
& + \int_{t_0}^{t_f} (x - x_{ref})^T Q (x - x_{ref}) \, dt \\
& + \int_{t_0}^{t_f} (u - u_{ref})^T R (u - u_{ref}) \, dt, \\
\text{subject to} \quad & \dot{x} = f(x, u), \\
\text{and} \quad & x(t_0) = x_0,
\end{align}

where $Q_f$ and $Q$ are positive semidefinite matrices and $R$ is a positive definite matrix. $Q_f$ represents the penalty for the terminal state discrepancy at $t=t_f$, while $Q$ and $R$ represent the continuous-time penalties for the state and control vector discrepancies, respectively. The function $f(x, u)$ represents the dynamics of the system. Note that the optimization problem can be modified to suit one's needs. For example, if reaching a particular state at $t=t_f$ is important, one may impose an additional equality constraint of the form $x(t_f) = x_{ref}(t_f)$. Additionally, one may add bounds on the state and control variables.
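Once discretized, the running integrals above become finite sums. A minimal sketch of such a discretized cost, using a simple rectangle rule (the helper name and the quadrature rule are our own assumptions, not a Collimator API):

```python
import numpy as np

def discretized_cost(x, u, x_ref, u_ref, Q, R, Qf, dt):
    """Rectangle-rule approximation of the quadratic trajectory cost.

    x, x_ref: (N+1, nx) state trajectories at the knot points;
    u, u_ref: (N, nu) control trajectories; dt: step length.
    """
    ex = x[:-1] - x_ref[:-1]          # state errors at steps 0..N-1
    eu = u - u_ref                    # control errors
    running = dt * (np.einsum("ni,ij,nj->", ex, Q, ex)
                    + np.einsum("ni,ij,nj->", eu, R, eu))
    ef = x[-1] - x_ref[-1]            # terminal state error
    terminal = ef @ Qf @ ef
    return running + terminal
```

The cost is zero exactly when both trajectories match their references, and the terminal term is weighted by $Q_f$ independently of the step length.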

The above problem is posed in continuous time. To solve it, we first discretize the optimization problem (typically into $N$ discrete steps in $[t_0, t_f]$). This discretization involves many choices, leading to different methods both for transcribing the continuous-time problem into discrete time and for solving the resulting program. For example, some common methods (see [1] and references therein for more details) are:

1. Direct transcription

2. Direct shooting

3. Hermite-Simpson collocation (also referred to as direct collocation)
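To make the collocation idea concrete, here is a minimal sketch of the Hermite-Simpson defect for a single interval, following the generic formulation in [1] (not Collimator's internal implementation). The optimizer drives these defects to zero at every interval:

```python
import numpy as np

def hermite_simpson_defect(f, x0, x1, u0, u1, um, h):
    """Hermite-Simpson collocation defect for one interval of length h.

    The state is represented by a cubic Hermite polynomial on the
    interval; the defect is zero when the Simpson quadrature of the
    dynamics matches the state increment:
        x1 - x0 = (h / 6) * (f0 + 4 * fm + f1),
    where fm is the dynamics evaluated at the interpolated midpoint
        xm = (x0 + x1) / 2 + (h / 8) * (f0 - f1).
    """
    f0 = f(x0, u0)
    f1 = f(x1, u1)
    xm = 0.5 * (x0 + x1) + (h / 8.0) * (f0 - f1)
    fm = f(xm, um)
    return x1 - x0 - (h / 6.0) * (f0 + 4.0 * fm + f1)
```

For a single integrator $\dot{x} = u$ with constant control, the defect vanishes exactly; for smooth nonlinear dynamics the per-interval defect of the true solution shrinks rapidly (like $h^5$) as the step is refined.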

Here, we demonstrate how to use the Hermite-Simpson collocation method in Collimator, which uses the `IPOPT` solver for nonlinear programming, to solve the swing-up trajectory optimization problem for the acrobot model. The `solve_trajectory_optimization` method of the `HermiteSimpsonNMPC` class can be utilized to solve this problem. Documentation for the class can be found here.

For the trajectory optimization problem, we can disregard the inputs and outputs and focus on creating an object of this class. The initialization parameters reveal that the problem is discretized into $N$ steps of length $dt$. We can provide the matrices $Q$, $R$, and $QN$, the last of which is analogous to $Q_f$. Additionally, we can specify lower and upper bounds on both the state and control vectors. Finally, we can specify whether we wish to include the terminal state and terminal control vector as constraints. Let's look at the `solve_trajectory_optimization` method:

Once the object is created, we can solve the trajectory optimization problem from `t_curr` ($t_0$) and `x_curr` ($x_0$), providing `x_ref` and `u_ref` along with initial guesses. Note that for a discretization of $N$ steps, if the sizes of the state and control vectors are `nx` and `nu`, respectively, then `x_ref` and `u_ref` have shapes `(N, nx)` and `(N, nu)`, respectively, i.e. their $i^\mathrm{th}$ row provides the target/reference at the $i^\mathrm{th}$ step.

For the acrobot swing-up, we don't have a full time-varying reference trajectory. All we know is that, at the end, the acrobot should be in the upright position, i.e. $x_f = [\pi, 0, 0, 0]^T$, in some finite time, say $t_f=4$ seconds. We can choose a discretization of $N=21$ equal steps to reach $t_f$. Since we don't know the time-varying trajectory, we can choose `Q` and `QN` to be zero matrices, implying that we do not penalize deviations from the unknown time-varying reference trajectory. However, we do know the final state, so we can include it as a constraint.
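Following the shape conventions above, the reference arrays for this problem can be assembled as follows (a sketch with our own variable names; the upright target is simply tiled, since the terminal constraint, not the tracking weights, does the work):

```python
import numpy as np

N = 21          # number of discretization steps
nx, nu = 4, 1   # acrobot state and control dimensions

# Target upright state; with no known time-varying reference we tile it
# and rely on a terminal equality constraint rather than tracking weights.
x_up = np.array([np.pi, 0.0, 0.0, 0.0])
x_ref = np.tile(x_up, (N, 1))   # shape (N, nx)
u_ref = np.zeros((N, nu))       # shape (N, nu)

# With Q = QN = 0, only the control effort and the constraints matter.
Q = np.zeros((nx, nx))
QN = np.zeros((nx, nx))
R = np.eye(nu)
```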

We can visualize the solution with the `animate_acrobot` utility:

Note that this is only a solution to the optimization problem. It is not guaranteed, and in fact unlikely, that the acrobot will reproduce this exact state trajectory $x_{opt}(t)$ as a solution of its dynamics if $u_{opt}(t)$ were applied as the control.

To simulate the Acrobot with the torques planned by the trajectory optimization solution, we can linearly interpolate the solution and pass it to our Acrobot LeafSystem. We can create this interpolant quite conveniently by declaring a `SourceBlock` and providing a `vmap` version of the `jnp.interp` function as its callback.
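Outside Collimator, the same per-channel linear interpolant can be sketched in plain NumPy (the helper name `make_interpolant` is hypothetical; it stands in for vmapping `jnp.interp` over the columns of the control array):

```python
import numpy as np

def make_interpolant(t_knots, u_knots):
    """Return u(t) that linearly interpolates each column of u_knots.

    t_knots: (N,) sample times; u_knots: (N, nu) control samples.
    """
    def u_of_t(t):
        # np.interp works per channel; stack the channels into one vector
        return np.array([np.interp(t, t_knots, u_knots[:, j])
                         for j in range(u_knots.shape[1])])
    return u_of_t
```

Note that `np.interp` holds the endpoint values constant outside `[t_knots[0], t_knots[-1]]`, so the interpolant is well defined for all query times.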

With this new `InterpArray` block available to us, we can create a diagram for our acrobot system controlled by the planned torques as follows:

Next, we can simulate the system and animate the output:

The acrobot does follow our reference in the beginning but then deviates significantly from the desired trajectory. One may say that the trajectory obtained by the process of trajectory optimization is unstable. This is indeed true, and we must stabilize the trajectory. One method for this is the finite-horizon linear quadratic regulator (LQR), demonstrated next.

Previously (see the LQR example notebook), we saw how the LQR can be used to stabilize the plant around an equilibrium point. Here, we consider its extension to stabilizing the plant around a trajectory. The general idea is simple: instead of linearizing the plant around an equilibrium point, we linearize it around a nominal trajectory. Subsequently, we minimize a quadratic function representing deviations from a desired trajectory. The reader is referred to Chapter 8 of [1] for all things related to the LQR.
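To illustrate the idea, here is a minimal discrete-time sketch of the finite-horizon LQR backward Riccati recursion around a nominal trajectory. This is an illustrative analogue under discrete-time assumptions, not the implementation used by Collimator:

```python
import numpy as np

def finite_horizon_lqr(A_seq, B_seq, Q, R, Qf):
    """Backward Riccati recursion for discrete-time, time-varying LQR.

    A_seq, B_seq: lists of linearizations of the plant along the
    nominal trajectory. Returns feedback gains K_k for the control law
        u_k = u_nom_k - K_k (x_k - x_nom_k).
    """
    N = len(A_seq)
    P = Qf                      # cost-to-go matrix, initialized at t_f
    gains = [None] * N
    for k in reversed(range(N)):
        A, B = A_seq[k], B_seq[k]
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)
        gains[k] = K
        # Riccati update: P_k = Q + A^T P_{k+1} (A - B K_k)
        P = Q + A.T @ P @ (A - B @ K)
    return gains
```

For a time-invariant, unstable scalar plant and a long horizon, the early gains converge to the steady-state LQR gain and the closed loop becomes stable.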

In Collimator, the finite-horizon LQR is available as the `FiniteHorizonLinearQuadraticRegulator` block.

For stabilization of the acrobot trajectory, we can use this block and provide the solution of trajectory optimization as both the nominal and desired trajectories.

Note that our trajectory optimization solution is only available up to $t_f$. To see that our stabilization with LQR works, we would like to simulate the system for longer than $t_f$ and observe that the acrobot remains in the upright orientation. We first create two helper functions that provide the trajectory optimization solution up to $t_f$ and the constant solutions `x_up, u_up` for any time beyond $t_f$. These provide our nominal trajectories.
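Such helper functions can be sketched as follows, assuming the optimized trajectory is available as knot-point arrays (names and signatures here are our own, not the notebook's):

```python
import numpy as np

def make_nominal(t_knots, x_knots, u_knots, x_up, u_up):
    """Nominal (x, u) as functions of time.

    Interpolate the optimized trajectory up to t_f = t_knots[-1], then
    hold the constant upright solution (x_up, u_up) afterwards.
    """
    t_f = t_knots[-1]

    def x_nom(t):
        if t >= t_f:
            return x_up
        return np.array([np.interp(t, t_knots, x_knots[:, j])
                         for j in range(x_knots.shape[1])])

    def u_nom(t):
        if t >= t_f:
            return u_up
        return np.array([np.interp(t, t_knots, u_knots[:, j])
                         for j in range(u_knots.shape[1])])

    return x_nom, u_nom
```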

Next, we set up the parameters for the finite-horizon LQR:

Next, we can simulate and visualize the system:

The finite-horizon LQR stabilizes the trajectory quite nicely. We can visualize the difference between the trajectory optimization (unstable) solution and the LQR-stabilized solution.

In summary, we have demonstrated the usage of a collocation method in Collimator for trajectory optimization, and subsequent trajectory stabilization with finite-horizon LQR. We refer the reader to the excellent notes [1] on LQR for an extensive treatment.

**References**:

[1] Russ Tedrake. *Underactuated Robotics: Algorithms for Walking, Running, Swimming, Flying, and Manipulation* (Course Notes for MIT 6.832). Available online.