10. Low-dimensional worst-case scenarios

10.1. Inexact gradient

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_inexact_gradient(L, mu, epsilon, n, verbose=True)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for an inexact gradient method and applies the trace heuristic to look for a low-dimensional worst-case example on which this guarantee is achieved. That is, it computes the smallest possible \(\tau(n,L,\mu,\varepsilon)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n,L,\mu,\varepsilon) (f(x_0) - f_\star)\]

is valid, where \(x_n\) is the output of gradient descent with an inexact descent direction, and where \(x_\star\) is the minimizer of \(f\).

The inexact descent direction is assumed to satisfy a relative inaccuracy described by (with \(0 \leqslant \varepsilon \leqslant 1\))

\[\|\nabla f(x_t) - d_t\| \leqslant \varepsilon \|\nabla f(x_t)\|,\]

where \(\nabla f(x_t)\) is the true gradient, and \(d_t\) is the approximate descent direction that is used.

Algorithm:

The inexact gradient descent under consideration can be written as

\[x_{t+1} = x_t - \frac{2}{L_{\varepsilon} + \mu_{\varepsilon}} d_t\]

where \(d_t\) is the inexact search direction, \(L_{\varepsilon} = (1 + \varepsilon)L\) and \(\mu_{\varepsilon} = (1-\varepsilon) \mu\).
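
As a minimal, self-contained sketch of this update (not part of the PEPit example; the one-dimensional quadratic and the particular choice of \(d_t\) below are illustrative assumptions), in Python:

L, mu, epsilon, n = 1., 0.1, 0.1, 2
L_eps, mu_eps = (1 + epsilon) * L, (1 - epsilon) * mu
gamma = 2 / (L_eps + mu_eps)

# Illustrative one-dimensional quadratic f(x) = mu / 2 * x**2
# (mu-strongly convex, and L-smooth since mu <= L).
f = lambda z: mu / 2 * z ** 2

x0 = x = 1.
for _ in range(n):
    grad = mu * x              # true gradient f'(x_t)
    d = (1 - epsilon) * grad   # admissible direction: |f'(x_t) - d_t| = epsilon * |f'(x_t)|
    x -= gamma * d

print(f(x) / f(x0))  # (f(x_n) - f_*) / (f(x_0) - f_*) with f_* = 0; ~0.518917 here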

Theoretical guarantee:

A tight worst-case guarantee obtained in [1, Theorem 5.3] or [2, Remark 1.6] is

\[f(x_n) - f_\star \leqslant \left(\frac{L_{\varepsilon} - \mu_{\varepsilon}}{L_{\varepsilon} + \mu_{\varepsilon}}\right)^{2n}(f(x_0) - f_\star ),\]

with \(L_{\varepsilon} = (1 + \varepsilon)L\) and \(\mu_{\varepsilon} = (1-\varepsilon) \mu\). This guarantee is achieved on one-dimensional quadratic functions.
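
For reference, the bound itself is a one-line computation; with the parameters used in the example below, it reproduces the printed theoretical value:

L, mu, epsilon, n = 1., 0.1, 0.1, 2
L_eps, mu_eps = (1 + epsilon) * L, (1 - epsilon) * mu
theoretical_tau = ((L_eps - mu_eps) / (L_eps + mu_eps)) ** (2 * n)
print(theoretical_tau)  # ~0.518917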

References: The detailed analyses can be found in [1, 2].

[1] E. De Klerk, F. Glineur, A. Taylor (2020). Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM Journal on Optimization, 30(3), 2053-2082.

[2] O. Gannot (2021). A frequency-domain analysis of inexact gradient methods. Mathematical Programming (to appear).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • epsilon (float) – level of relative inaccuracy.

  • n (int) – number of iterations.

  • verbose (bool) – if True, print conclusion.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_inexact_gradient(L=1, mu=0.1, epsilon=0.1, n=2, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 8x8
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 15 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5188661397616067
(PEPit) Postprocessing: applying trace heuristic. Currently 4 eigenvalue(s) > 1e-05 before resolve.
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.5188511226295036
(PEPit) Postprocessing: 3 eigenvalue(s) > 1e-05 after trace heuristic
*** Example file: worst-case performance of inexact gradient ***
    PEPit guarantee:             f(x_n)-f_* <= 0.518851 (f(x_0)-f_*)
    Theoretical guarantee:       f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)

10.2. Optimized gradient

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_optimized_gradient(L, n, verbose=True)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for the optimized gradient method (OGM), and applies the trace heuristic to look for a low-dimensional worst-case example on which this guarantee is achieved. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of OGM and where \(x_\star\) is a minimizer of \(f\). The trace heuristic then yields a one-dimensional function on which this guarantee is achieved.

Algorithm: The optimized gradient method is described by

\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t)\\ y_{t+1} & = & x_{t+1} + \frac{\theta_{t}-1}{\theta_{t+1}}(x_{t+1}-x_t)+\frac{\theta_{t}}{\theta_{t+1}}(x_{t+1}-y_t), \end{eqnarray}

with

\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}. \end{eqnarray}
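
A minimal Python sketch of this step-size sequence and of the iteration itself (the quadratic test function below is an arbitrary choice used only for illustration; it is not part of the PEPit example):

import numpy as np

def ogm_thetas(n):
    # theta_0, ..., theta_n from the recursion above (the last step uses the 8-factor).
    theta = [1.]
    for _ in range(n - 1):
        theta.append((1 + np.sqrt(4 * theta[-1] ** 2 + 1)) / 2)
    theta.append((1 + np.sqrt(8 * theta[-1] ** 2 + 1)) / 2)
    return theta

L, n = 3., 4
theta = ogm_thetas(n)

# Illustrative L-smooth convex quadratic f(x) = 0.5 * x^T H x.
H = np.diag([L, L / 10])
grad = lambda z: H @ z

x = y = np.array([1., 1.])
for t in range(n):
    x_new = y - grad(y) / L
    y = x_new + (theta[t] - 1) / theta[t + 1] * (x_new - x) \
              + theta[t] / theta[t + 1] * (x_new - y)
    x = x_new

print(0.5 * y @ H @ y)  # objective value at the last iterate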

Theoretical guarantee: The tight theoretical guarantee can be found in [2, Theorem 2]:

\[f(x_n)-f_\star \leqslant \frac{L\|x_0-x_\star\|^2}{2\theta_n^2}.\]
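
With \(L=3\) and \(n=4\), as in the example below, this bound evaluates to roughly 0.0767518 (the value printed on the "Theoretical guarantee" line of the output). A minimal check, reusing the ogm_thetas helper sketched above:

theta_n = ogm_thetas(4)[-1]      # reusing the helper sketched above
print(3. / (2 * theta_n ** 2))   # L / (2 * theta_n^2) with L = 3, n = 4; ~0.0767518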

References: The OGM was developed in [1, 2]. Low-dimensional worst-case functions for OGM were obtained in [3, 4].

[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming, 145(1-2), 451-482.

[2] D. Kim, J. Fessler (2016). Optimized first-order methods for smooth convex minimization. Mathematical Programming, 159(1-2), 81-107.

[3] A. Taylor, J. Hendrickx, F. Glineur (2017). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1-2), 307-345.

[4] D. Kim, J. Fessler (2017). On the convergence analysis of the optimized gradient method. Journal of Optimization Theory and Applications, 172(1), 187-205.

Parameters
  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (bool) – if True, print conclusion.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_optimized_gradient(L=3, n=4, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.07675218017587908
(PEPit) Postprocessing: applying trace heuristic. Currently 6 eigenvalue(s) > 1e-05 before resolve.
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.0767421794376856
(PEPit) Postprocessing: 1 eigenvalue(s) > 1e-05 after trace heuristic
*** Example file: worst-case performance of optimized gradient method ***
    PEPit guarantee:             f(y_n)-f_* <= 0.0767422 ||x_0 - x_*||^2
    Theoretical guarantee:       f(y_n)-f_* <= 0.0767518 ||x_0 - x_*||^2