1. Unconstrained convex minimization

1.1. Gradient descent

PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent(L, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: Gradient descent is described by

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size.
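
For illustration only, here is a minimal NumPy sketch of this recursion; the helper gradient_descent, the placeholder gradient grad_f and the test problem are illustrative and not part of PEPit.

>>> import numpy as np
>>> def gradient_descent(grad_f, x0, gamma, n):
...     """Run n steps of x_{t+1} = x_t - gamma * grad_f(x_t)."""
...     x = np.asarray(x0, dtype=float)
...     for _ in range(n):
...         x = x - gamma * grad_f(x)
...     return x
>>> # Example on the 1-smooth convex quadratic f(x) = ||x||^2 / 2, whose gradient is x.
>>> x_n = gradient_descent(grad_f=lambda x: x, x0=np.ones(3), gamma=1., n=4)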

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Theorem 3.1]:

\[f(x_n)-f_\star \leqslant \frac{L}{4nL\gamma+2} \|x_0-x_\star\|^2,\]

which is tight on some Huber loss functions.

References:

[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming 145(1–2), 451–482.

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 3
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: MOSEK); optimal value: 0.16666666497937685
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEPit guarantee:             f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
    Theoretical guarantee:       f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2

1.2. Subgradient method

PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method(M, n, gamma, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is convex and \(M\)-Lipschitz (and possibly non-smooth).

This code computes a worst-case guarantee for the subgradient method. That is, it computes the smallest possible \(\tau(n, M, \gamma)\) such that the guarantee

\[\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star \leqslant \tau(n, M, \gamma) \|x_0 - x_\star\|\]

is valid, where \(x_t\) is the output of the subgradient method after \(t\leqslant n\) steps, and where \(x_\star\) is the minimizer of \(f\).

In short, for given values of \(M\), the step-size \(\gamma\) and the number of iterations \(n\), \(\tau(n, M, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star\) when \(\|x_0-x_\star\| \leqslant 1\).

Algorithm: For \(t\in \{0, \dots, n-1 \}\)

\begin{eqnarray} g_{t} & \in & \partial f(x_t) \\ x_{t+1} & = & x_t - \gamma g_t \end{eqnarray}
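
For illustration only, a minimal NumPy sketch of this scheme; the helper subgradient_method and the placeholder subgrad_f (returning any element of the subdifferential) are illustrative and not part of PEPit.

>>> import numpy as np
>>> def subgradient_method(subgrad_f, x0, gamma, n):
...     """Run n steps of x_{t+1} = x_t - gamma * g_t, with g_t a subgradient of f at x_t."""
...     xs = [np.asarray(x0, dtype=float)]
...     for _ in range(n):
...         xs.append(xs[-1] - gamma * subgrad_f(xs[-1]))
...     return xs  # the guarantee applies to the best iterate, min_t f(x_t)
>>> # Example on the 1-Lipschitz scalar function f(x) = |x|, with subgradient sign(x).
>>> iterates = subgradient_method(subgrad_f=np.sign, x0=1., gamma=.1, n=6)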

Theoretical guarantee: The tight bound is obtained in [1, Section 3.2.3] and [2, Eq (2)]

\[\min_{0 \leqslant t \leqslant n} f(x_t)- f(x_\star) \leqslant \frac{M}{\sqrt{n+1}}\|x_0-x_\star\|,\]

and tightness follows from the lower complexity bound for this class of problems, e.g., [3, Appendix A].

References: Classical references on this topic include [1, 2].

[1] Y. Nesterov (2003). Introductory lectures on convex optimization: A basic course. Springer Science & Business Media.

[2] S. Boyd, L. Xiao, A. Mutapcic (2003). Subgradient Methods (lecture notes).

[3] Y. Drori, M. Teboulle (2016). An optimal variant of Kelley’s cutting-plane method. Mathematical Programming, 160(1), 321-351.

Parameters
  • M (float) – the Lipschitz parameter.

  • n (int) – the number of iterations.

  • gamma (float) – step-size.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> M = 2
>>> n = 6
>>> gamma = 1 / (M * sqrt(n + 1))
>>> pepit_tau, theoretical_tau = wc_subgradient_method(M=M, n=n, gamma=gamma, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 7 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 64 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.7559825331741553
*** Example file: worst-case performance of subgradient method ***
    PEPit guarantee:             min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755983 ||x_0 - x_*||
    Theoretical guarantee:       min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755929 ||x_0 - x_*||

1.3. Subgradient method under restricted secant inequality and error bound

PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method_rsi_eb(mu, L, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) verifies the “lower” restricted secant inequality (\(\mu-\text{RSI}^-\)) and the “upper” error bound (\(L-\text{EB}^+\)) [1].

This code computes a worst-case guarantee for the subgradient method with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, \mu, L, \gamma)\) such that the guarantee

\[\| x_n - x_\star \|^2 \leqslant \tau(n, \mu, L, \gamma) \| x_0 - x_\star \|^2\]

is valid, where \(x_n\) is the output of the subgradient method with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\), \(\mu\), \(L\), and \(\gamma\), \(\tau(n, \mu, L, \gamma)\) is computed as the worst-case value of \(\| x_n - x_\star \|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: The subgradient method is described by

\[x_{t+1} = x_t - \gamma g_t, \qquad g_t \in \partial f(x_t),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: The tight theoretical guarantee can be found in [1, Prop 1] (upper bound) and [1, Theorem 2] (lower bound):

\[\| x_n - x_\star \|^2 \leqslant (1 - 2\gamma\mu + L^2 \gamma^2)^n \|x_0-x_\star\|^2.\]
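
As a quick numerical check, the contraction factor of this guarantee can be evaluated directly; the helper below is a sketch (not part of PEPit) using the same values as the example at the end of this section.

>>> def rsi_eb_rate(mu, L, gamma, n):
...     """Contraction factor (1 - 2 * gamma * mu + (L * gamma) ** 2) ** n from the guarantee above."""
...     return (1 - 2 * gamma * mu + (L * gamma) ** 2) ** n
>>> rate = rsi_eb_rate(mu=.1, L=1, gamma=.1, n=4)  # ~0.9606, cf. the example below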

References:

Definition and convergence guarantees can be found in [1].

[1] C. Guille-Escuret, B. Goujaud, A. Ibrahim, I. Mitliagkas (2022). Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound.

Parameters
  • mu (float) – the RSI parameter.

  • L (float) – the EB parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> mu = .1
>>> L = 1
>>> pepit_tau, theoretical_tau = wc_subgradient_method_rsi_eb(mu=mu, L=L, gamma=mu / L**2, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 8 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9605893213566064
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEPit guarantee:             ||x_n - x_*||^2 <= 0.960589 ||x_0 - x_*||^2
    Theoretical guarantee:       ||x_n - x_*||^2 <= 0.960596 ||x_0 - x_*||^2

1.5. Conjugate gradient

PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient(L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]

is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).

Algorithm:

\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]

with

\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]

Theoretical guarantee:

The tight guarantee obtained in [1] is

\[f(x_n) - f_\star \leqslant\frac{L}{2 \theta_n^2}\|x_0-x_\star\|^2.\]

where

\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}, \end{eqnarray}

and tightness follows from [2, Theorem 3].
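
As a quick numerical check, the sequence \(\theta_t\) and the resulting bound \(\frac{L}{2\theta_n^2}\) can be computed as follows; the helper below is a sketch (not part of PEPit) using the same values as the example at the end of this section.

>>> from math import sqrt
>>> def cg_theoretical_bound(L, n):
...     """Compute theta_n from the recursion above and return L / (2 * theta_n ** 2)."""
...     theta = 1.  # theta_0
...     for t in range(1, n):
...         theta = (1 + sqrt(4 * theta ** 2 + 1)) / 2
...     theta_n = (1 + sqrt(8 * theta ** 2 + 1)) / 2
...     return L / (2 * theta_n ** 2)
>>> bound = cg_theoretical_bound(L=1, n=2)  # ~0.0618942, cf. the example below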

References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6].

[1] Y. Drori and A. Taylor (2020). Efficient first-order methods for convex minimization: a constructive approach. Mathematical Programming 184 (1), 183-220.

[2] Y. Drori (2017). The exact information-based complexity of smooth convex minimization. Journal of Complexity, 39, 1-16.

Parameters
  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_conjugate_gradient(L=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 18 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.061893515427809735
*** Example file: worst-case performance of conjugate gradient method ***
    PEPit guarantee:             f(x_n)-f_* <= 0.0618935 ||x_0 - x_*||^2
    Theoretical guarantee:       f(x_n)-f_* <= 0.0618942 ||x_0 - x_*||^2

1.6. Heavy Ball momentum

PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum(mu, L, alpha, beta, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L, \mu, \alpha, \beta)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \alpha, \beta) (f(x_0) - f_\star)\]

is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu, \alpha, \beta)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).

Algorithm:

\[x_{t+1} = x_t - \alpha \nabla f(x_t) + \beta (x_t-x_{t-1})\]

with

\[\alpha \in (0, \frac{1}{L}]\]

and

\[\beta = \sqrt{(1 - \alpha \mu)(1 - L \alpha)}\]
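
For illustration only, a minimal NumPy sketch of the scheme with this choice of \(\beta\); the helper heavy_ball, the placeholder grad_f and the quadratic test problem are illustrative and not part of PEPit.

>>> import numpy as np
>>> from math import sqrt
>>> def heavy_ball(grad_f, x0, mu, L, alpha, n):
...     """Run n steps of x_{t+1} = x_t - alpha * grad_f(x_t) + beta * (x_t - x_{t-1})."""
...     beta = sqrt((1 - alpha * mu) * (1 - L * alpha))
...     x_prev = x = np.asarray(x0, dtype=float)
...     for _ in range(n):
...         x_prev, x = x, x - alpha * grad_f(x) + beta * (x - x_prev)
...     return x
>>> # Example on f(x) = x^T diag(0.1, 1) x / 2, which is 0.1-strongly convex and 1-smooth.
>>> H = np.diag([.1, 1.])
>>> x_n = heavy_ball(grad_f=lambda x: H @ x, x0=np.ones(2), mu=.1, L=1., alpha=.5, n=2)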

Theoretical guarantee:

The upper guarantee obtained in [2, Theorem 4] is

\[f(x_n) - f_\star \leqslant (1 - \alpha \mu)^n (f(x_0) - f_\star).\]

References: This method was first introduced in [1, Section 2], and the convergence upper bound was proven in [2, Theorem 4].

[1] B.T. Polyak (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics.

[2] E. Ghadimi, H. R. Feyzmahdavian, M. Johansson (2015). Global convergence of the Heavy-ball method for convex optimization. European Control Conference (ECC).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • alpha (float) – parameter of the scheme.

  • beta (float) – parameter of the scheme such that \(0<\beta<1\) and \(0<\alpha<2(1+\beta)\).

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> mu = 0.1
>>> L = 1.
>>> alpha = 1 / (2 * L)  # alpha \in [0, 1 / L]
>>> beta = sqrt((1 - alpha * mu) * (1 - L * alpha))
>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum(mu=mu, L=L, alpha=alpha, beta=beta, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 12 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8145062493468549
*** Example file: worst-case performance of the Heavy-Ball method ***
    PEPit guarantee:             f(x_n)-f_* <= 0.753492 (f(x_0) -  f(x_*))
    Theoretical guarantee:       f(x_n)-f_* <= 0.9025 (f(x_0) -  f(x_*))

1.7. Accelerated gradient for convex objective

PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_convex(mu, L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).

This code computes a worst-case guarantee for an accelerated gradient method, a.k.a. fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: The accelerated gradient method of this example is provided by

\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}
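
For illustration only, a minimal NumPy sketch of this recursion, indexing the iterations by \(t = 1, \dots, n\) so that the first momentum weight \(\frac{t-1}{t+2}\) is zero; the helper accelerated_gradient and the placeholder grad_f are illustrative and not part of PEPit.

>>> import numpy as np
>>> def accelerated_gradient(grad_f, x0, L, n):
...     """Sketch of the accelerated gradient recursion displayed above."""
...     x = y = np.asarray(x0, dtype=float)
...     for t in range(1, n + 1):
...         x_new = y - grad_f(y) / L                    # gradient step from y_{t-1}
...         y = x_new + (t - 1) / (t + 2) * (x_new - x)  # momentum step
...         x = x_new
...     return x
>>> x_n = accelerated_gradient(grad_f=lambda x: x, x0=np.ones(3), L=1, n=1)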

Theoretical guarantee: When \(\mu=0\), a tight empirical guarantee can be found in [1, Table 1]:

\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]

where tightness is obtained on some Huber loss functions.

References:

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • mu (float) – the strong convexity parameter

  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_convex(mu=0, L=1, n=1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 6 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.16666666668209376
*** Example file: worst-case performance of accelerated gradient method ***
    PEPit guarantee:             f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
    Theoretical guarantee:       f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2

1.8. Accelerated gradient for strongly convex objective

PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_strongly_convex(mu, L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for an accelerated gradient method, a.k.a. fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right),\]

is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} y_t & = & x_t + \frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}(x_t - x_{t-1}) \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \end{eqnarray}

with \(x_{-1}:= x_0\).
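
For illustration only, a minimal NumPy sketch of this recursion; the helper and the placeholder grad_f are illustrative and not part of PEPit.

>>> import numpy as np
>>> from math import sqrt
>>> def accelerated_gradient_strongly_convex(grad_f, x0, mu, L, n):
...     """Sketch of the recursion above, with x_{-1} = x_0."""
...     beta = (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu))
...     x_prev = x = np.asarray(x0, dtype=float)
...     for _ in range(n):
...         y = x + beta * (x - x_prev)
...         x_prev, x = x, y - grad_f(y) / L
...     return x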

Theoretical guarantee:

The following upper guarantee can be found in [1, Corollary 4.15]:

\[f(x_n)-f_\star \leqslant \left(1 - \sqrt{\frac{\mu}{L}}\right)^n \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right).\]

References:

[1] A. d’Aspremont, D. Scieur, A. Taylor (2021). Acceleration Methods. Foundations and Trends in Optimization: Vol. 5, No. 1-2.

Parameters
  • mu (float) – the strong convexity parameter

  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_strongly_convex(mu=0.1, L=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 12 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.34758587217463155
*** Example file: worst-case performance of the accelerated gradient method ***
    PEPit guarantee:             f(x_n)-f_*  <= 0.347586 (f(x_0) -  f(x_*) +  mu/2*|| x_0 - x_* ||**2)
    Theoretical guarantee:       f(x_n)-f_*  <= 0.467544 (f(x_0) -  f(x_*) +  mu/2*|| x_0 - x_* ||**2)

1.9. Optimized gradient

PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient(L, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for optimized gradient method (OGM). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of OGM and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: The optimized gradient method is described by

\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t)\\ y_{t+1} & = & x_{t+1} + \frac{\theta_{t}-1}{\theta_{t+1}}(x_{t+1}-x_t)+\frac{\theta_{t}}{\theta_{t+1}}(x_{t+1}-y_t), \end{eqnarray}

with

\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}. \end{eqnarray}
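
For illustration only, a minimal NumPy sketch of OGM with this \(\theta\)-sequence; the helper optimized_gradient_method and the placeholder grad_f are illustrative and not part of PEPit.

>>> import numpy as np
>>> from math import sqrt
>>> def optimized_gradient_method(grad_f, x0, L, n):
...     """Pre-compute theta_0, ..., theta_n, then run the OGM recursion displayed above."""
...     thetas = [1.]  # theta_0
...     for t in range(1, n):
...         thetas.append((1 + sqrt(4 * thetas[-1] ** 2 + 1)) / 2)
...     thetas.append((1 + sqrt(8 * thetas[-1] ** 2 + 1)) / 2)  # theta_n
...     x = y = np.asarray(x0, dtype=float)
...     for t in range(n):
...         x_new = y - grad_f(y) / L
...         y = (x_new + (thetas[t] - 1) / thetas[t + 1] * (x_new - x)
...              + thetas[t] / thetas[t + 1] * (x_new - y))
...         x = x_new
...     return y  # last secondary iterate (the example below reports f(y_n) - f_*)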

Theoretical guarantee: The tight theoretical guarantee can be found in [2, Theorem 2]:

\[f(x_n)-f_\star \leqslant \frac{L\|x_0-x_\star\|^2}{2\theta_n^2},\]

where tightness follows from [3, Theorem 3].

References: The optimized gradient method was developed in [1, 2]; the corresponding lower bound was first obtained in [3].

[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming 145(1–2), 451–482.

[2] D. Kim, J. Fessler (2016). Optimized first-order methods for smooth convex minimization. Mathematical Programming 159.1-2: 81-107.

[3] Y. Drori (2017). The exact information-based complexity of smooth convex minimization. Journal of Complexity, 39, 1-16.

Parameters
  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_optimized_gradient(L=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: MOSEK); optimal value: 0.07675182659831646
*** Example file: worst-case performance of optimized gradient method ***
        PEPit guarantee:       f(y_n)-f_* <= 0.0767518 || x_0 - x_* ||^2
        Theoretical guarantee:  f(y_n)-f_* <= 0.0767518 || x_0 - x_* ||^2

1.10. Optimized gradient for gradient

PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient_for_gradient(L, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for optimized gradient method for gradient (OGM-G). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[\|\nabla f(x_n)\|^2 \leqslant \tau(n, L) (f(x_0) - f_\star)\]

is valid, where \(x_n\) is the output of OGM-G and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(\|\nabla f(x_n)\|^2\) when \(f(x_0)-f_\star \leqslant 1\).

Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the optimized gradient method for gradient [1, Section 6.3] is described by

\begin{eqnarray} y_{t+1} & = & x_t - \frac{1}{L} \nabla f(x_t),\\ x_{t+1} & = & y_{t+1} + \frac{(\tilde{\theta}_t-1)(2\tilde{\theta}_{t+1}-1)}{\tilde{\theta}_t(2\tilde{\theta}_t-1)}(y_{t+1}-y_t)+\frac{2\tilde{\theta}_{t+1}-1}{2\tilde{\theta}_t-1}(y_{t+1}-x_t), \end{eqnarray}

with

\begin{eqnarray} \tilde{\theta}_n & = & 1 \\ \tilde{\theta}_t & = & \frac{1 + \sqrt{4 \tilde{\theta}_{t+1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \tilde{\theta}_0 & = & \frac{1 + \sqrt{8 \tilde{\theta}_{1}^2 + 1}}{2}. \end{eqnarray}
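
As a quick numerical check, the \(\tilde{\theta}\)-sequence (computed backwards) and the bound \(\frac{2L}{\tilde{\theta}_0^2}\) can be evaluated as follows; the helper below is a sketch (not part of PEPit) using the same values as the example at the end of this section.

>>> from math import sqrt
>>> def ogm_g_tilde_thetas(n):
...     """Compute tilde_theta_n, ..., tilde_theta_0 backwards, as defined above."""
...     thetas = [1.]  # tilde_theta_n
...     for _ in range(n - 1):
...         thetas.append((1 + sqrt(4 * thetas[-1] ** 2 + 1)) / 2)
...     thetas.append((1 + sqrt(8 * thetas[-1] ** 2 + 1)) / 2)  # tilde_theta_0
...     return list(reversed(thetas))  # [tilde_theta_0, ..., tilde_theta_n]
>>> thetas = ogm_g_tilde_thetas(n=4)
>>> bound = 2 * 3 / thetas[0] ** 2  # ~0.307 for L=3, n=4, cf. the example below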

Theoretical guarantee: The tight worst-case guarantee can be found in [1, Theorem 6.1]:

\[\|\nabla f(x_n)\|^2 \leqslant \frac{2L(f(x_0)-f_\star)}{\tilde{\theta}_0^2},\]

where tightness is achieved on Huber losses, see [1, Section 6.4].

References: The optimized gradient method for gradient was developed in [1].

[1] D. Kim, J. Fessler (2021). Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. Journal of optimization theory and applications, 188(1), 192-219.

Parameters
  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_optimized_gradient_for_gradient(L=3, n=4, verbose=1)
(PEP-it) Setting up the problem: size of the main PSD matrix: 7x7
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 30 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: MOSEK); optimal value: 0.30700730460985826
*** Example file: worst-case performance of optimized gradient method for gradient ***
        PEP-it guarantee:       ||f'(x_n)|| ^ 2 <= 0.307007 (f(x_0) - f_*)
        Theoretical guarantee:  ||f'(x_n)|| ^ 2 <= 0.307007 (f(x_0) - f_*)

1.11. Robust momentum

PEPit.examples.unconstrained_convex_minimization.wc_robust_momentum(mu, L, lam, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly-convex.

This code computes a worst-case guarantee for the robust momentum method (RMM). That is, it computes the smallest possible \(\tau(n, \mu, L, \lambda)\) such that the guarantee

\[v(x_{n+1}) \leqslant \tau(n, \mu, L, \lambda) v(x_{n}),\]

is valid, where \(x_n\) is the \(n^{\mathrm{th}}\) iterate of the RMM, and \(x_\star\) is a minimizer of \(f\). The function \(v(\cdot)\) is a well-chosen Lyapunov function, defined as follows:

\begin{eqnarray} v(x_t) & = & l\|z_t - x_\star\|^2 + q_t, \\ q_t & = & (L - \mu) \left(f(x_t) - f_\star - \frac{\mu}{2}\|y_t - x_\star\|^2 - \frac{1}{2}\|\nabla f(y_t) - \mu (y_t - x_\star)\|^2 \right), \end{eqnarray}

with \(\kappa = \frac{L}{\mu}\), \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\), and \(l = \mu^2 \frac{\kappa - \kappa \rho^2 - 1}{2 \rho (1 - \rho)}\).

Algorithm:

For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_{t+1} & = & x_{t} + \beta (x_t - x_{t-1}) - \alpha \nabla f(y_t), \\ y_{t+1} & = & y_{t} + \gamma (x_t - x_{t-1}), \end{eqnarray}

with \(x_{-1}, x_0 \in \mathbb{R}^d\), and with parameters \(\alpha = \frac{\kappa (1 - \rho^2)(1 + \rho)}{L}\), \(\beta = \frac{\kappa \rho^3}{\kappa - 1}\), \(\gamma = \frac{\rho^2}{(\kappa - 1)(1 - \rho)^2(1 + \rho)}\).
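
As a quick numerical check, the rate \(\rho\) and the contraction factor \(\rho^2\) appearing in the guarantee below can be computed directly; the helper is a sketch (not part of PEPit) using the same values as the example at the end of this section.

>>> from math import sqrt
>>> def robust_momentum_rate(mu, L, lam):
...     """Compute rho (defined above) and the contraction factor rho ** 2."""
...     kappa = L / mu
...     rho = lam * (1 - 1 / kappa) + (1 - lam) * (1 - 1 / sqrt(kappa))
...     return rho, rho ** 2
>>> rho, rate = robust_momentum_rate(mu=.1, L=1, lam=.2)  # rate ~0.5286, cf. the example below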

Theoretical guarantee:

A convergence guarantee (empirically tight) is obtained in [1, Theorem 1],

\[v(x_{n+1}) \leqslant \rho^2 v(x_n),\]

with \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\).

References:

[1] S. Cyrus, B. Hu, B. Van Scoy, L. Lessard (2018). A robust accelerated optimization algorithm for strongly convex functions. American Control Conference (ACC).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • lam (float) – if \(\lambda=1\), the method reduces to gradient descent; if \(\lambda=0\), it is the Triple Momentum Method.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Examples

>>> pepit_tau, theoretical_tau = wc_robust_momentum(0.1, 1, 0.2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 6 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5285548355257013
*** Example file: worst-case performance of the Robust Momentum Method ***
    PEPit guarantee:             v(x_(n+1)) <= 0.528555 v(x_n)
    Theoretical guarantee:       v(x_(n+1)) <= 0.528555 v(x_n)

1.12. Triple momentum

PEPit.examples.unconstrained_convex_minimization.wc_triple_momentum(mu, L, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for triple momentum method (TMM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the TMM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm:

For \(t \in \{ 1, \dots, n\}\)

\begin{eqnarray} \xi_{t+1} & = & (1 + \beta) \xi_{t} - \beta \xi_{t-1} - \alpha \nabla f(y_t) \\ y_{t} & = & (1+\gamma) \xi_{t} - \gamma \xi_{t-1} \\ x_{t} & = & (1 + \delta) \xi_{t} - \delta \xi_{t-1} \end{eqnarray}

with

\begin{eqnarray} \kappa & = & \frac{L}{\mu}, \quad \rho = 1 - \frac{1}{\sqrt{\kappa}} \\ (\alpha, \beta, \gamma, \delta) & = & \left(\frac{1+\rho}{L}, \frac{\rho^2}{2-\rho}, \frac{\rho^2}{(1+\rho)(2-\rho)}, \frac{\rho^2}{1-\rho^2}\right) \end{eqnarray}

and

\begin{eqnarray} \xi_{0} & = & x_0 \\ \xi_{1} & = & x_0 \\ y_1 & = & x_0 \end{eqnarray}
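
As a quick numerical check, the parameters \((\alpha, \beta, \gamma, \delta)\) can be computed directly from these definitions; the helper is a sketch, not part of PEPit.

>>> from math import sqrt
>>> def triple_momentum_parameters(mu, L):
...     """Compute (alpha, beta, gamma, delta) from the definitions above."""
...     kappa = L / mu
...     rho = 1 - 1 / sqrt(kappa)
...     alpha = (1 + rho) / L
...     beta = rho ** 2 / (2 - rho)
...     gamma = rho ** 2 / ((1 + rho) * (2 - rho))
...     delta = rho ** 2 / (1 - rho ** 2)
...     return alpha, beta, gamma, delta
>>> alpha, beta, gamma, delta = triple_momentum_parameters(mu=.1, L=1.)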

Theoretical guarantee: A theoretical upper (empirically tight) bound can be found in [1, Theorem 1, eq. 4]:

\[f(x_n)-f_\star \leqslant \frac{\rho^{2(n+1)} L \kappa}{2}\|x_0 - x_\star\|^2.\]

References: The triple momentum method was discovered and analyzed in [1].

[1] B. Van Scoy, R. A. Freeman, K. M. Lynch (2018). The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Letters, 2(1), 49-54.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_triple_momentum(mu=0.1, L=1., n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.23893532450841679
*** Example file: worst-case performance of the Triple Momentum Method ***
    PEPit guarantee:             f(x_n)-f_* <= 0.238935 ||x_0-x_*||^2
    Theoretical guarantee:       f(x_n)-f_* <= 0.238925 ||x_0-x_*||^2

1.13. Information theoretic exact method

PEPit.examples.unconstrained_convex_minimization.wc_information_theoretic(mu, L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).

This code computes a worst-case guarantee for the information theoretic exact method (ITEM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[\|z_n - x_\star\|^2 \leqslant \tau(n, L, \mu) \|z_0 - x_\star\|^2\]

is valid, where \(z_n\) is the output of the ITEM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\|z_n - x_\star\|^2\) when \(\|z_0 - x_\star\|^2 \leqslant 1\).

Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the information theoretic exact method of this example is provided by

\begin{eqnarray} y_{t} & = & (1-\beta_t) z_t + \beta_t x_t \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ z_{t+1} & = & \left(1-q\delta_t\right) z_t+q\delta_t y_t-\frac{\delta_t}{L}\nabla f(y_t), \end{eqnarray}

with \(y_{-1}=x_0=z_0\), \(q=\frac{\mu}{L}\) (inverse condition ratio), and the scalar sequences:

\begin{eqnarray} A_{t+1} & = & \frac{(1+q)A_t+2\left(1+\sqrt{(1+A_t)(1+qA_t)}\right)}{(1-q)^2},\\ \beta_{t+1} & = & \frac{A_t}{(1-q)A_{t+1}},\\ \delta_{t+1} & = & \frac{1}{2}\frac{(1-q)^2A_{t+1}-(1+q)A_t}{1+q+q A_t}, \end{eqnarray}

with \(A_0=0\).
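
As a quick numerical check, the scalar sequences \(A_t\), \(\beta_t\) and \(\delta_t\) can be generated directly from these recursions; the helper is a sketch (not part of PEPit) using the same values as the example at the end of this section.

>>> from math import sqrt
>>> def item_sequences(mu, L, n):
...     """Generate A_0, ..., A_n and beta_t, delta_t (t = 1, ..., n) from the recursions above."""
...     q = mu / L
...     A, betas, deltas = [0.], [], []
...     for _ in range(n):
...         A_next = ((1 + q) * A[-1] + 2 * (1 + sqrt((1 + A[-1]) * (1 + q * A[-1])))) / (1 - q) ** 2
...         betas.append(A[-1] / ((1 - q) * A_next))
...         deltas.append(((1 - q) ** 2 * A_next - (1 + q) * A[-1]) / (2 * (1 + q + q * A[-1])))
...         A.append(A_next)
...     return A, betas, deltas
>>> A, betas, deltas = item_sequences(mu=.001, L=1, n=15)
>>> bound = 1 / (1 + .001 * A[-1])  # worst-case factor 1 / (1 + q * A_n), cf. the example below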

Theoretical guarantee: A tight worst-case guarantee can be found in [1, Theorem 3]:

\[\|z_n - x_\star\|^2 \leqslant \frac{1}{1+q A_n} \|z_0-x_\star\|^2,\]

where tightness is obtained on some quadratic loss functions (see [1, Lemma 2]).

References:

[1] A. Taylor, Y. Drori (2021). An optimal gradient method for smooth strongly convex minimization. arXiv 2101.09741v2.

Parameters
  • mu (float) – the strong convexity parameter.

  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_information_theoretic(mu=.001, L=1, n=15, verbose=1)
(PEP-it) Setting up the problem: size of the main PSD matrix: 17x17
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 240 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: MOSEK); optimal value: 0.7566088333863754
*** Example file: worst-case performance of the information theoretic exact method ***
        PEP-it guarantee:       ||z_n - x_* ||^2 <= 0.756609 ||z_0 - x_*||^2
        Theoretical guarantee:  ||z_n - x_* ||^2 <= 0.756605 ||z_0 - x_*||^2

1.14. Proximal point

PEPit.examples.unconstrained_convex_minimization.wc_proximal_point(gamma, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is closed, proper, and convex (and potentially non-smooth).

This code computes a worst-case guarantee for the proximal point method with step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n,\gamma)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, \gamma) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the proximal point method, and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\) and \(\gamma\), \(\tau(n,\gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm:

The proximal point method is described by

\[x_{t+1} = \arg\min_x \left\{f(x)+\frac{1}{2\gamma}\|x-x_t\|^2 \right\},\]

where \(\gamma\) is a step-size.
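
For illustration only, a minimal NumPy sketch of the proximal point iteration; the helpers proximal_point and soft_threshold (the proximal operator of the l1-norm, standing in for the prox of the objective at hand) are illustrative and not part of PEPit.

>>> import numpy as np
>>> def proximal_point(prox_f, x0, gamma, n):
...     """Run n steps of x_{t+1} = prox_{gamma * f}(x_t)."""
...     x = np.asarray(x0, dtype=float)
...     for _ in range(n):
...         x = prox_f(x, gamma)
...     return x
>>> def soft_threshold(x, gamma):
...     """Proximal operator of gamma * ||.||_1."""
...     return np.sign(x) * np.maximum(np.abs(x) - gamma, 0)
>>> x_n = proximal_point(prox_f=soft_threshold, x0=np.ones(3), gamma=3, n=4)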

Theoretical guarantee:

The tight theoretical guarantee can be found in [1, Theorem 4.1]:

\[f(x_n)-f_\star \leqslant \frac{\|x_0-x_\star\|^2}{4\gamma n},\]

where tightness is obtained on, e.g., one-dimensional linear problems on the positive orthant.

References:

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_proximal_point(gamma=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 20 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: MOSEK); optimal value: 0.020833335685727362
*** Example file: worst-case performance of proximal point method ***
    PEPit guarantee:           f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2
    Theoretical guarantee:      f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2

1.15. Accelerated proximal point

PEPit.examples.unconstrained_convex_minimization.wc_accelerated_proximal_point(A0, gammas, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is convex and possibly non-smooth.

This code computes a worst-case guarantee for an accelerated proximal point method, a.k.a. fast proximal point method (FPP). That is, it computes the smallest possible \(\tau(n, A_0, \vec{\gamma})\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, A_0, \vec{\gamma}) \left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2\right)\]

is valid, where \(x_n\) is the output of FPP (with step-size \(\gamma_t\) at step \(t\in \{0, \dots, n-1\}\)) and where \(x_\star\) is a minimizer of \(f\) and \(A_0\) is a positive number.

In short, for given values of \(n\), \(A_0\) and \(\vec{\gamma}\), \(\tau(n, A_0, \vec{\gamma})\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \leqslant 1\), for the following method.

Algorithm: For \(t\in \{0, \dots, n-1\}\):

\begin{eqnarray} y_{t+1} & = & (1-\alpha_{t} ) x_{t} + \alpha_{t} v_t \\ x_{t+1} & = & \arg\min_x \left\{f(x)+\frac{1}{2\gamma_t}\|x-y_{t+1}\|^2 \right\}, \\ v_{t+1} & = & v_t + \frac{1}{\alpha_{t}} (x_{t+1}-y_{t+1}) \end{eqnarray}

with

\begin{eqnarray} \alpha_{t} & = & \frac{\sqrt{(A_t \gamma_t)^2 + 4 A_t \gamma_t} - A_t \gamma_t}{2} \\ A_{t+1} & = & (1 - \alpha_{t}) A_t \end{eqnarray}

and \(v_0=x_0\).
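
As a quick numerical check, the sequences \(\alpha_t\), \(A_t\) and the bound in the guarantee below can be computed directly; the helper is a sketch (not part of PEPit) using the same values as the example at the end of this section.

>>> from math import sqrt
>>> def fpp_sequences_and_bound(A0, gammas):
...     """Generate alpha_t, A_t from the recursions above, and the bound 4 / (A0 * (sum_t sqrt(gamma_t)) ** 2)."""
...     A, alphas = A0, []
...     for gamma in gammas:
...         alpha = (sqrt((A * gamma) ** 2 + 4 * A * gamma) - A * gamma) / 2
...         alphas.append(alpha)
...         A = (1 - alpha) * A
...     return alphas, 4 / (A0 * sum(sqrt(g) for g in gammas) ** 2)
>>> alphas, bound = fpp_sequences_and_bound(A0=5, gammas=[(i + 1) / 1.1 for i in range(3)])  # bound ~0.0512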

Theoretical guarantee: A theoretical upper bound can be found in [1, Theorem 2.3.]:

\[f(x_n)-f_\star \leqslant \frac{4}{A_0 (\sum_{t=0}^{n-1} \sqrt{\gamma_t})^2}\left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \right).\]

References: The accelerated proximal point was first obtained and analyzed in [1].

[1] O. Güler (1992). New proximal point algorithms for convex minimization. SIAM Journal on Optimization, 2(4):649–664.

Parameters
  • A0 (float) – initial value for parameter A_0.

  • gammas (list) – sequence of step-sizes.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_point(A0=5, gammas=[(i + 1) / 1.1 for i in range(3)], n=3, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
          function 1 : 20 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.01593113594082973
*** Example file: worst-case performance of optimized gradient method ***
    PEPit guarantee:       f(x_n)-f_* <= 0.0159311  (f(x_0) - f_* + A0/2* ||x_0 - x_*||^2)
    Theoretical guarantee:  f(x_n)-f_* <= 0.0511881  (f(x_0) - f_* + A0/2* ||x_0 - x_*||^2)

1.16. Inexact gradient descent

PEPit.examples.unconstrained_convex_minimization.wc_inexact_gradient_descent(L, mu, epsilon, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for the inexact gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu, \varepsilon)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \varepsilon) (f(x_0) - f_\star)\]

is valid, where \(x_n\) is the output of the inexact gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\), \(\mu\) and \(\varepsilon\), \(\tau(n, L, \mu, \varepsilon)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).

Algorithm:

\[x_{t+1} = x_t - \gamma d_t\]

with

\[\|d_t - \nabla f(x_t)\| \leqslant \varepsilon \|\nabla f(x_t)\|\]

and

\[\gamma = \frac{2}{L_{\varepsilon} + \mu_{\varepsilon}}\]

where \(L_{\varepsilon} = (1 + \varepsilon) L\) and \(\mu_{\varepsilon} = (1 - \varepsilon) \mu\).
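
As a quick numerical check, this step-size and the worst-case factor of the guarantee below can be computed directly; the helper is a sketch (not part of PEPit) using the same values as the example at the end of this section.

>>> def inexact_gradient_stepsize_and_rate(L, mu, epsilon, n):
...     """Compute gamma = 2 / (L_eps + mu_eps) and the factor ((L_eps - mu_eps) / (L_eps + mu_eps)) ** (2 * n)."""
...     L_eps, mu_eps = (1 + epsilon) * L, (1 - epsilon) * mu
...     gamma = 2 / (L_eps + mu_eps)
...     rate = ((L_eps - mu_eps) / (L_eps + mu_eps)) ** (2 * n)
...     return gamma, rate
>>> gamma, rate = inexact_gradient_stepsize_and_rate(L=1, mu=.1, epsilon=.1, n=2)  # rate ~0.5189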

Theoretical guarantee:

The tight worst-case guarantee obtained in [1, Theorem 5.3] or [2, Remark 1.6] is

\[f(x_n) - f_\star \leqslant \left(\frac{L_{\varepsilon}-\mu_{\varepsilon}}{L_{\varepsilon}+\mu_{\varepsilon}}\right)^{2n}(f(x_0) - f_\star),\]

where tightness is achieved on simple quadratic functions.

References: The detailed analyses can be found in [1, 2].

[1] E. De Klerk, F. Glineur, A. Taylor (2020). Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM Journal on Optimization, 30(3), 2053-2082.

[2] O. Gannot (2021). A frequency-domain analysis of inexact gradient methods. Mathematical Programming (to appear).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • epsilon (float) – level of inaccuracy.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_inexact_gradient_descent(L=1, mu=.1, epsilon=.1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 8x8
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 15 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5188661397616067
*** Example file: worst-case performance of inexact gradient method in distance in function values ***
    PEPit guarantee:             f(x_n)-f_* <= 0.518866 (f(x_0)-f_*)
    Theoretical guarantee:       f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)

1.18. Inexact accelerated gradient

PEPit.examples.unconstrained_convex_minimization.wc_inexact_accelerated_gradient(L, epsilon, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for an accelerated gradient method using inexact first-order information. That is, it computes the smallest possible \(\tau(n, L, \varepsilon)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \varepsilon) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of inexact accelerated gradient descent and where \(x_\star\) is a minimizer of \(f\).

The inexact descent direction is assumed to satisfy a relative inaccuracy described by (with \(0\leqslant \varepsilon \leqslant 1\))

\[\|\nabla f(y_t) - d_t\| \leqslant \varepsilon \|\nabla f(y_t)\|,\]

where \(\nabla f(y_t)\) is the true gradient at \(y_t\) and \(d_t\) is the approximate descent direction that is used.

Algorithm: The inexact accelerated gradient method of this example is provided by

\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} d_t \\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}

Theoretical guarantee: When \(\varepsilon=0\), a tight empirical guarantee can be found in [1, Table 1]:

\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]

which is achieved on some Huber loss functions (when \(\varepsilon=0\)).

References:

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • L (float) – smoothness parameter.

  • epsilon (float) – level of inaccuracy

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_inexact_accelerated_gradient(L=1, epsilon=0.1, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 13x13
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 47 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.03944038534724904
*** Example file: worst-case performance of inexact accelerated gradient method ***
    PEPit guarantee:                                 f(x_n)-f_* <= 0.0394404 (f(x_0)-f_*)
    Theoretical guarantee for epsilon = 0 :      f(x_n)-f_* <= 0.0357143 (f(x_0)-f_*)

1.19. Gradient descent for quadratically upper bounded convex objective

PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_qg_convex(L, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) \| x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(||x_0 - x_\star||^2 \leqslant 1\).

Algorithm: Gradient descent is described by

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: When \(\gamma < \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Theorem 2.2]:

\[f(x_n)-f_\star \leqslant \frac{L}{2}\max\left(\frac{1}{2n L \gamma + 1}, L \gamma\right) \|x_0-x_\star\|^2.\]

References:

The detailed approach is available in [1, Theorem 2.2].

[1] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

Parameters
  • L (float) – the quadratic growth parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_qg_convex(L=L, gamma=.2 / L, n=4, verbose=1)
(PEP-it) Setting up the problem: size of the main PSD matrix: 7x7
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 35 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: SCS); optimal value: 0.19230811671886025
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEP-it guarantee:            f(x_n)-f_* <= 0.192308 ||x_0 - x_*||^2
    Theoretical guarantee:       f(x_n)-f_* <= 0.192308 ||x_0 - x_*||^2

1.20. Gradient descent with decreasing step sizes for quadratically upper bounded convex objective

PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_qg_convex_decreasing(L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.

This code computes a worst-case guarantee for gradient descent with decreasing step-sizes. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \| x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of gradient descent with decreasing step-sizes, and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(||x_0 - x_\star||^2 \leqslant 1\).

Algorithm: Gradient descent with decreasing step sizes is described by

\[x_{t+1} = x_t - \gamma_t \nabla f(x_t)\]

with

\[\gamma_t = \frac{1}{L u_{t+1}}\]

where the sequence \(u\) is defined by

\begin{eqnarray} u_0 & = & 1 \\ u_{t} & = & \frac{u_{t-1}}{2} + \sqrt{\left(\frac{u_{t-1}}{2}\right)^2 + 2}, \quad \text{for } t \geq 1. \end{eqnarray}
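
As a quick numerical check, the sequence \(u_t\) and the step-sizes \(\gamma_t\) can be generated directly from these definitions; the helper is a sketch (not part of PEPit) using the same values as the example at the end of this section.

>>> from math import sqrt
>>> def qg_convex_decreasing_stepsizes(L, n):
...     """Generate u_0, ..., u_n and the step-sizes gamma_t = 1 / (L * u_{t+1}) defined above."""
...     u = [1.]
...     for _ in range(n):
...         u.append(u[-1] / 2 + sqrt((u[-1] / 2) ** 2 + 2))
...     gammas = [1 / (L * u[t + 1]) for t in range(n)]
...     return u, gammas
>>> u, gammas = qg_convex_decreasing_stepsizes(L=1, n=6)
>>> bound = 1 / (2 * u[-1])  # ~0.105547, cf. the conjectured guarantee and the example below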

Theoretical guarantee: The tight theoretical guarantee is conjectured in [1, Conjecture A.3]:

\[f(x_n)-f_\star \leqslant \frac{L}{2 u_n} \|x_0-x_\star\|^2.\]

Notes:

We verify that \(u_t \sim 2\sqrt{t}\). The step sizes as well as the function values of the iterates decrease as \(O\left( \frac{1}{\sqrt{t}} \right)\).

References:

The detailed approach is available in [1, Appendix A.3].

[1] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

Parameters
  • L (float) – the quadratic growth parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_gradient_descent_qg_convex_decreasing(L=1, n=6, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 63 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.10554312873105381
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEP-it guarantee:            f(x_n)-f_* <= 0.105543 ||x_0 - x_*||^2
    Theoretical guarantee:       f(x_n)-f_* <= 0.105547 ||x_0 - x_*||^2

1.21. Conjugate gradient for quadratically upper bounded convex objective

PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient_qg_convex(L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [2]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.

This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]

is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).

Algorithm:

\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]

with

\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]

Theoretical guarantee:

The tight guarantee obtained in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper) is

\[f(x_n) - f_\star \leqslant \frac{L}{2 (n + 1)} \|x_0-x_\star\|^2.\]

References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6], and the result provided in [2, Theorem 2.4].

[1] Y. Drori and A. Taylor (2020). Efficient first-order methods for convex minimization: a constructive approach. Mathematical Programming 184 (1), 183-220.

[2] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

Parameters
  • L (float) – the quadratic growth parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_conjugate_gradient_qg_convex(L=1, n=12, verbose=1)
(PEP-it) Setting up the problem: size of the main PSD matrix: 27x27
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 351 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: SCS); optimal value: 0.038461130525391705
*** Example file: worst-case performance of conjugate gradient method ***
    PEP-it guarantee:            f(x_n)-f_* <= 0.0384611 ||x_0 - x_*||^2
    Theoretical guarantee:       f(x_n)-f_* <= 0.0384615 ||x_0 - x_*||^2

1.22. Heavy Ball momentum for quadratically upper bounded convex objective

PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum_qg_convex(L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [2]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.

This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm:

This method is described in [1]

\[x_{t+1} = x_t - \alpha_t \nabla f(x_t) + \beta_t (x_t-x_{t-1})\]

with

\[\alpha_t = \frac{1}{L} \frac{1}{t+2}\]

and

\[\beta_t = \frac{t}{t+2}\]
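
For illustration only, a minimal NumPy sketch of this recursion with the time-varying \(\alpha_t\) and \(\beta_t\); the helper heavy_ball_qg_convex and the placeholder grad_f are illustrative and not part of PEPit.

>>> import numpy as np
>>> def heavy_ball_qg_convex(grad_f, x0, L, n):
...     """Run the recursion above with alpha_t = 1 / (L * (t + 2)) and beta_t = t / (t + 2)."""
...     x_prev = x = np.asarray(x0, dtype=float)
...     for t in range(n):
...         alpha_t, beta_t = 1 / (L * (t + 2)), t / (t + 2)
...         x_prev, x = x, x - alpha_t * grad_f(x) + beta_t * (x - x_prev)
...     return x
>>> x_n = heavy_ball_qg_convex(grad_f=lambda x: x, x0=np.ones(2), L=1, n=5)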

Theoretical guarantee:

The tight guarantee obtained in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper) is

\[f(x_n) - f_\star \leqslant \frac{L}{2}\frac{1}{n+1} \|x_0 - x_\star\|^2.\]

References: This method was first introduced in [1, Section 3], and the tight convergence bound was proven in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper).

[1] E. Ghadimi, H. R. Feyzmahdavian, M. Johansson (2015). Global convergence of the Heavy-ball method for convex optimization. European Control Conference (ECC).

[2] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

Parameters
  • L (float) – the quadratic growth parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum_qg_convex(L=1, n=5, verbose=1)
(PEP-it) Setting up the problem: size of the main PSD matrix: 9x9
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 63 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: SCS); optimal value: 0.08333167067320212
*** Example file: worst-case performance of the Heavy-Ball method ***
    PEP-it guarantee:            f(x_n)-f_* <= 0.0833317 ||x_0 - x_*||^2
    Theoretical guarantee:       f(x_n)-f_* <= 0.0833333 ||x_0 - x_*||^2