@@ -198,7 +198,8 @@ in your command line/shell before executing your program.
...
Once the function value at $x_{k+1}$ is less than that at $x_k$, continue to the next step of the iteration. \\
In case you encounter a zero gradient, or the gradient length falls below $p$, the algorithm terminates and returns the current position $x_k$ as an estimate of the local minimum argument.
You usually additionally choose a limit on the number of iterations to avoid being trapped in an infinite loop if the iteration diverges.\\
The class provides you with a member \code{stepsize} to be used as $a$ and a Differentiator (\code{diff}), which you can use to calculate the differential values of functions where required, together with the differentiation step $h$ given by the member \code{diff\_precision}. These three things are not passed as arguments to the function \code{GradientDescent::optimize}.\\
The members \code{diff} and \code{diff\_precision} will be present in all three classes for optimization.\par
Implement the iterated gradient descent as described in the method \code{GradientDescent::optimize}.
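The iteration described above can be sketched roughly as follows. The exercise's actual class interface is not reproduced here, so the free function, the halving rule used to shrink the trial step $a$ until the function value decreases, and the central-difference gradient (standing in for \code{diff} with step $h$) are all illustrative assumptions, not the sheet's prescribed API:

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Fn  = std::function<double(const Vec&)>;

// Central-difference gradient with differentiation step h
// (a stand-in for the Differentiator `diff` with `diff_precision`).
Vec numGradient(const Fn& f, const Vec& x, double h) {
    Vec g(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        Vec xp = x, xm = x;
        xp[i] += h;
        xm[i] -= h;
        g[i] = (f(xp) - f(xm)) / (2.0 * h);
    }
    return g;
}

double norm(const Vec& v) {
    double s = 0.0;
    for (double c : v) s += c * c;
    return std::sqrt(s);
}

// Iterated gradient descent: step along -grad f, shrinking the trial
// step until the function value decreases; terminate when the gradient
// length falls below p or after maxIter iterations.
Vec gradientDescent(const Fn& f, Vec x, double stepsize,
                    double p, double h, int maxIter) {
    for (int it = 0; it < maxIter; ++it) {
        Vec g = numGradient(f, x, h);
        if (norm(g) < p) break;            // (near-)zero gradient: done
        double a = stepsize;
        Vec trial(x.size());
        for (;;) {                         // shrink a until f decreases
            for (std::size_t i = 0; i < x.size(); ++i)
                trial[i] = x[i] - a * g[i];
            if (f(trial) < f(x)) break;
            a *= 0.5;                      // assumed shrinking rule
            if (a < 1e-16) return x;       // cannot make progress
        }
        x = trial;
    }
    return x;
}
```

For $f(x)=(x-3)^2$ with \code{stepsize} $=0.5$, a single step already lands on the minimizer, and the gradient check then terminates the loop.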
\end{homeworkProblem}
\begin{homeworkProblem}
...
@@ -210,11 +211,38 @@ in your command line/shell before executing your program.
...
If you want to skip this part of the exercise, or in order to check that your implementation works, you can use the function \code{nabla} defined in \path{include/differential.h} to generate a function that calculates the gradient of a function, by providing both a \class{Function} object and a \class{Differentiator} object.
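The actual signature of \code{nabla} lives in \path{include/differential.h} and is not reproduced here; the following is only a hypothetical sketch of the idea of such a gradient-generating helper, with a plain differentiation step $h$ standing in for the \class{Differentiator} object:

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

using Vec    = std::vector<double>;
using Fn     = std::function<double(const Vec&)>;
using GradFn = std::function<Vec(const Vec&)>;

// Hypothetical nabla-style helper: given a scalar function and a
// differentiation step h, return a NEW function that evaluates the
// gradient numerically wherever it is called.
GradFn nablaSketch(const Fn& f, double h) {
    return [f, h](const Vec& x) {
        Vec g(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            Vec xp = x, xm = x;
            xp[i] += h;
            xm[i] -= h;
            g[i] = (f(xp) - f(xm)) / (2.0 * h);  // central difference
        }
        return g;
    };
}
```

The point of such a helper is that the returned object can be passed around like any other function, so optimization code never needs to know whether a gradient is analytic or numerical.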
\end{homeworkProblem}
\begin{homeworkProblem}
Last but not least we will have a look at the \emph{CG} method of optimization, which is related to the general idea of gradient descent.
The name of this method comes from \emph{conjugate gradients}: the directions of descent $d_i$ of successive iterative steps are required to be linearly independent (conjugate) with respect to some scalar product $\left\langle Ax, y\right\rangle$, with $A$ being a positive definite matrix.
The main idea behind this is that one should not have to go in the same direction twice and should instead choose the respective step sizes wisely.
According to linear algebra this leaves us with at most $n$ linearly independent/conjugate directions $d_i$ if the vector space of our function arguments is $n$-dimensional.
Therefore the algorithm employs a substep in which $n$ successive steps are executed using conjugate directions, before the resulting end position is used as the new starting point for another substep of $n$ directions.\\
As we want to approximate local minima of functions, a good choice for the matrix $A$ is the Hesse matrix of the function $f$ that we want to minimize, evaluated at the position of the local minimum $x^\ast$.
Since we do not know the precise position of the minimum (why would we be looking to approximate it, if we were already able to locate it?), we instead use $A=\nabla^2 f(x_k)$, the Hesse matrix at the current position, as an approximation.\\
Let us now describe the substep of the CG-method in more detail:
\begin{enumerate}
\item Let $x_0$ be the starting position. Calculate $g_0=\nabla f(x_0)$ and choose the first direction $d_0=-g_0$ as in the gradient descent.
\item For $k=0,1,2,\ldots,n-1$, with $n$ being the dimension of the parameter space of $f$, repeat the following:
\begin{enumerate}[label=\alph*)]
\item Calculate the Hesse matrix $A=\nabla^2 f(x_k)$ once and store its value.
\item If $k < n-1$: Correct for conjugacy in the next direction $d_{k+1}$:
$$ d_{k+1}=-g_{k+1}+\beta_k d_k$$
\end{enumerate}
\end{enumerate}
The resulting $x_n$ is the final approximation of the substep and can afterwards be used as the input for subsequent executions of this subroutine.\par
You will now implement the CG optimization algorithm in the method \code{optimize} of the class \class{ConjugateGradient}.
To structure your code you will implement the method \code{CGstep} in that class, which executes one full iteration of the subroutine as detailed above.
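One substep of the method can be sketched as follows. Note that this fragment of the sheet does not spell out the step size $\alpha_k$ or the coefficient $\beta_k$, so the sketch below uses the standard choices for exact conjugacy with respect to the Hesse matrix, $\alpha_k=-\langle g_k,d_k\rangle/\langle d_k,Ad_k\rangle$ and $\beta_k=\langle g_{k+1},Ad_k\rangle/\langle d_k,Ad_k\rangle$; these, the free-function form, and the finite-difference derivatives are assumptions, not the exercise's prescribed \code{CGstep} interface:

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;
using Fn  = std::function<double(const Vec&)>;

// Central-difference gradient (stand-in for the Differentiator).
Vec grad(const Fn& f, const Vec& x, double h) {
    Vec g(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        Vec xp = x, xm = x;
        xp[i] += h;
        xm[i] -= h;
        g[i] = (f(xp) - f(xm)) / (2.0 * h);
    }
    return g;
}

// Numerical Hesse matrix via mixed second-order central differences.
Mat hesse(const Fn& f, const Vec& x, double h) {
    std::size_t n = x.size();
    Mat A(n, Vec(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            Vec xpp = x, xpm = x, xmp = x, xmm = x;
            xpp[i] += h; xpp[j] += h;
            xpm[i] += h; xpm[j] -= h;
            xmp[i] -= h; xmp[j] += h;
            xmm[i] -= h; xmm[j] -= h;
            A[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4.0 * h * h);
        }
    return A;
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

Vec matvec(const Mat& A, const Vec& v) {
    Vec r(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < v.size(); ++j)
            r[i] += A[i][j] * v[j];
    return r;
}

// One full CG substep: n conjugate-direction steps starting from x.
Vec cgStep(const Fn& f, Vec x, double h) {
    std::size_t n = x.size();
    Vec g = grad(f, x, h);
    Vec d(n);
    for (std::size_t i = 0; i < n; ++i) d[i] = -g[i];   // d_0 = -g_0
    for (std::size_t k = 0; k < n; ++k) {
        Mat A = hesse(f, x, h);                 // Hesse matrix at x_k
        Vec Ad = matvec(A, d);
        double denom = dot(d, Ad);
        if (std::fabs(denom) < 1e-14) break;    // degenerate direction
        double alpha = -dot(g, d) / denom;      // assumed exact step size
        for (std::size_t i = 0; i < n; ++i) x[i] += alpha * d[i];
        g = grad(f, x, h);                      // g_{k+1}
        if (k + 1 < n) {
            double beta = dot(g, Ad) / denom;   // conjugacy correction
            for (std::size_t i = 0; i < n; ++i) d[i] = -g[i] + beta * d[i];
        }
    }
    return x;
}
```

On a quadratic function this substep reaches the exact minimum after its $n$ inner steps, which is a convenient way to test an implementation.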