Leah Pope

2022-06-28

Actually, I don't know how to distinguish maximization from minimization of a nonlinear function with the Newton-Raphson method.

What I know is that, to find the optimal points, we iterate
$x_{i+1} = x_i - \left[\mathbf{H}f(x_i)\right]^{-1} \nabla f(x_i)$
So what is actually the difference between maximization and minimization with this method?

Trey Ross

Newton-Raphson is based on a local quadratic approximation: each iterate moves to the stationary point (the optimum) of the quadratic approximation built at the current point. Whether you minimize or maximize does not depend on the iteration formula (you cannot modify it to turn minimization into maximization or vice versa) but on the shape of the approximation. The approximation is convex when $Hf(x_i)$ is positive definite and concave when $Hf(x_i)$ is negative definite. (The step also requires $Hf(x_i)$ to be invertible, so a semidefinite Hessian with a zero eigenvalue is not enough.) When $Hf(x_i)$ is positive definite, you expect to move toward a local minimum; when it is negative definite, you expect to move toward a local maximum.
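One way to see why the formula cannot be switched between minimizing and maximizing: replacing $f$ by $-f$ negates both $\nabla f$ and $Hf$, so $[Hf]^{-1}\nabla f$, and hence the step, is unchanged. Below is a minimal 1-D sketch in Python; the helper `newton_step` and the choice $f(x) = \sin x$ are hypothetical, picked only for illustration.

```python
import math


def newton_step(x, df, d2f):
    """1-D Newton-Raphson step: x - f'(x) / f''(x)."""
    return x - df(x) / d2f(x)


# f(x) = sin(x): f'(x) = cos(x), f''(x) = -sin(x)
step_f = newton_step(1.0, math.cos, lambda x: -math.sin(x))

# g(x) = -f(x) = -sin(x): g'(x) = -cos(x), g''(x) = sin(x)
step_g = newton_step(1.0, lambda x: -math.cos(x), math.sin)

print(step_f, step_g)  # the same value, about 1.642
```

At $x = 1$, $f''(1) = -\sin 1 < 0$, so the model of $\sin$ is concave and the step heads toward the maximum of $\sin$ at $\pi/2$; for $-\sin$ the model is convex and the very same step heads toward its minimum at the same point.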

The easiest way to think about this is for functions $\mathbb{R} \to \mathbb{R}$, so take $f(x) = x^3$. At $x = 1$ the local quadratic approximation is $g(x) = 1 + 3(x-1) + 3(x-1)^2$, which is convex. So one Newton-Raphson iteration moves you to the minimum of $g$ (at $x = 1/2$), and you hope to find a minimum of $f$.

On the other hand, if you start at $x = -1$, the local quadratic approximation is $g(x) = -1 + 3(x+1) - 3(x+1)^2$, which is concave. So one Newton-Raphson iteration moves you to the maximum of $g$ (at $x = -1/2$), and you hope to find a maximum of $f$.
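To see the two cases numerically, here is a minimal 1-D sketch in Python (the helper `newton_step_1d` is hypothetical, written for illustration). It applies the same update at $x = 1$ and $x = -1$ and reports whether the local quadratic is convex or concave from the sign of $f''(x)$.

```python
def newton_step_1d(x, df, d2f):
    """One Newton-Raphson step in 1-D, plus the kind of optimum the quadratic model has."""
    curvature = d2f(x)
    if curvature == 0:
        raise ZeroDivisionError("f''(x) = 0: the quadratic model has no unique stationary point")
    kind = "minimum" if curvature > 0 else "maximum"  # convex vs concave model
    return x - df(x) / curvature, kind


# f(x) = x**3, so f'(x) = 3x^2 and f''(x) = 6x
def df(x):
    return 3 * x**2


def d2f(x):
    return 6 * x


print(newton_step_1d(1.0, df, d2f))   # (0.5, 'minimum')
print(newton_step_1d(-1.0, df, d2f))  # (-0.5, 'maximum')

# Iterating from x = 1: each step maps x to x - 3x^2 / (6x) = x / 2
x = 1.0
for _ in range(5):
    x, _kind = newton_step_1d(x, df, d2f)
print(x)  # 0.03125
```

The iterates approach $0$, the inflection point of $x^3$, which is neither a minimum nor a maximum. That is why a convex or concave model only lets you *hope* to reach an optimum of $f$.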

If the definiteness of $Hf$ does not agree with your goal (e.g., $Hf$ is negative definite but you want to minimize), the pure Newton step is not useful, because it heads toward the wrong kind of stationary point. It might be better to switch to another method such as gradient descent (or gradient ascent, when maximizing).
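A minimal sketch of that idea for minimization, assuming NumPy: check the eigenvalues of the Hessian and take the Newton step only when it is positive definite, otherwise fall back to a gradient-descent step. The function `safeguarded_newton_step` and its fixed step size `lr` are hypothetical illustrative choices, not a standard safeguarding scheme.

```python
import numpy as np


def safeguarded_newton_step(x, grad, hess, lr=0.1):
    """One minimization step: Newton if the Hessian is positive definite, else gradient descent."""
    g = grad(x)
    H = hess(x)
    if np.min(np.linalg.eigvalsh(H)) > 0:  # eigvalsh: H is symmetric
        # Convex quadratic model: jump to its minimizer x - H^{-1} g
        return x - np.linalg.solve(H, g), "newton"
    # Model is not convex, so the Newton step could head to a maximum or saddle;
    # take a small fixed-size gradient-descent step instead (lr is an arbitrary choice)
    return x - lr * g, "gradient"


# Convex example: f(x, y) = (x - 1)^2 + 2(y + 2)^2, minimized at (1, -2)
x1, kind1 = safeguarded_newton_step(
    np.array([0.0, 0.0]),
    lambda v: np.array([2 * (v[0] - 1), 4 * (v[1] + 2)]),
    lambda v: np.diag([2.0, 4.0]),
)

# Saddle example: f(x, y) = x^2 - y^2, whose Hessian is indefinite
x2, kind2 = safeguarded_newton_step(
    np.array([1.0, 1.0]),
    lambda v: np.array([2 * v[0], -2 * v[1]]),
    lambda v: np.diag([2.0, -2.0]),
)

print(x1, kind1)  # lands on the minimizer (1, -2) with a Newton step
print(x2, kind2)  # falls back to a gradient step, moving to about (0.8, 1.2)
```

In the saddle example a pure Newton step would jump straight to $(0, 0)$, the saddle point of $x^2 - y^2$, whereas the gradient step keeps decreasing $f$.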

Yahir Crane

Suppose we want to find the $\hat{x} \in \mathbb{R}^k$ that maximizes the twice continuously differentiable function $f: \mathbb{R}^k \to \mathbb{R}$.

By a second-order Taylor expansion,
$f(x + h) \approx a + b'h + \tfrac{1}{2} h'Ch$
where $a = f(x)$, $b = \nabla f(x)$ and $C = D^2 f(x)$.

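Note that $C$ is symmetric (because $f$ is twice continuously differentiable), so the gradient of $\tfrac{1}{2}h'Ch$ with respect to $h$ is $Ch$. This implies
$\nabla f(x + h) \approx b + Ch.$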
Thus the first-order condition for a maximum of the approximation is
$0 = b + C\hat{h}$
which, provided $C$ is invertible, implies that
$\hat{h} = -C^{-1}b$
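This stationary point is a maximum of the approximation only when $C$ is negative definite; if $C$ is positive definite, it is a minimum instead. In other words, when $C$ is negative definite, the vector that maximizes the second-order Taylor approximation to $f$ at $x$ is
$\begin{aligned} x + \hat{h} &= x - C^{-1}b \\ &= x - \left[D^2 f(x)\right]^{-1} \nabla f(x), \end{aligned}$
which is exactly your iteration with $x = x_i$ and $D^2 f(x_i) = \mathbf{H}f(x_i)$.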
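The derivation can be turned into a small maximizer. This is a minimal sketch assuming NumPy; `newton_maximize`, its tolerance and iteration cap, and the test function are hypothetical choices for illustration. It refuses to step when $C$ is not negative definite, since then $x - C^{-1}b$ need not maximize the quadratic approximation.

```python
import numpy as np


def newton_maximize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson for maximization: x <- x - C^{-1} b, with b = grad f(x), C = D^2 f(x)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        b = grad(x)
        C = hess(x)
        # The quadratic model has a maximum only if C is negative definite
        if np.max(np.linalg.eigvalsh(C)) >= 0:
            raise ValueError("Hessian is not negative definite at x")
        h = -np.linalg.solve(C, b)  # h_hat = -C^{-1} b, solved without forming the inverse
        x = x + h
        if np.linalg.norm(h) < tol:
            break
    return x


# Hypothetical test function: f(x, y) = x + y - e^x - e^y - (x - y)^2 / 2,
# strictly concave with its maximum at (0, 0)
def grad(v):
    x, y = v
    return np.array([1 - np.exp(x) - (x - y), 1 - np.exp(y) + (x - y)])


def hess(v):
    x, y = v
    return np.array([[-np.exp(x) - 1, 1.0], [1.0, -np.exp(y) - 1]])


x_hat = newton_maximize(grad, hess, [1.0, -1.0])
print(x_hat)  # close to [0, 0]
```

Solving $C\hat{h} = -b$ with `np.linalg.solve` rather than computing $C^{-1}$ explicitly is the usual numerically preferable way to evaluate the same formula.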
