3. Non-convex optimization

3.1. Gradient Descent

PEPit.examples.nonconvex_optimization.wc_gradient_descent(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth.

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \tau(n, L, \gamma) (f(x_0) - f(x_n))\]

is valid, where \(x_n\) is the n-th iterate obtained by gradient descent with fixed step-size \(\gamma\).

Algorithm: Gradient descent is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is the fixed step-size.
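As a point of reference for the notation, here is a minimal sketch of this iteration in plain NumPy on a hypothetical toy objective \(f(x) = \frac{L}{2}\|x\|^2\) (illustrative only; this is not how PEPit evaluates the worst case):

>>> import numpy as np
>>> L, gamma, n = 1, 1, 5
>>> grad_f = lambda x: L * x  # gradient of the toy objective f(x) = (L/2) ||x||^2
>>> x = np.array([1.0, -2.0])  # starting point x_0
>>> for t in range(n):
...     x = x - gamma * grad_f(x)  # x_{t+1} = x_t - gamma * grad f(x_t)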

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), an empirically tight theoretical worst-case guarantee is

\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \frac{4}{3}\frac{L}{n} (f(x_0) - f(x_n)),\]

see discussions in [1, page 190] and [2].
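For instance, with \(L = 1\) and \(n = 5\), this bound evaluates to \(\frac{4}{3}\cdot\frac{1}{5} = \frac{4}{15} \approx 0.2667\), which matches the value computed in the example below.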

References:

[1] A. Taylor (2017). Convex interpolation and performance estimation of first-order methods for convex optimization. PhD Thesis, UCLouvain.

[2] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2021). The exact worst-case convergence rate of the gradient method with fixed step lengths for L-smooth functions. Optimization Letters, 16(6), 1649-1661.

Parameters:
  • L (float) – the smoothness parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • wrapper (str) – the name of the wrapper to be used.

  • solver (str) – the name of the solver the wrapper should use.

  • verbose (int) –

    level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + solver details.

Returns:
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=gamma, n=5, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 7x7
(PEPit) Setting up the problem: performance measure is the minimum of 6 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                        Function 1 : Adding 30 scalar constraint(s) ...
                        Function 1 : 30 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.26666666551166657
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All the primal scalar constraints are verified
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative up to an error of 4.5045561757111027e-10
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 4.491578307483797e-09
(PEPit) Final upper bound (dual): 0.2666666657156721 and lower bound (primal example): 0.26666666551166657
(PEPit) Duality gap: absolute: 2.0400553468746807e-10 and relative: 7.650207583915017e-10
*** Example file: worst-case performance of gradient descent with fixed step-size ***
        PEPit guarantee:         min_i ||f'(x_i)||^2 <= 0.266667 (f(x_0)-f_*)
        Theoretical guarantee:   min_i ||f'(x_i)||^2 <= 0.266667 (f(x_0)-f_*)
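For readers curious how such a worst-case problem is assembled, the following schematic sketch uses PEPit's documented public API (PEP, declare_function, SmoothFunction, oracle, set_initial_point, set_initial_condition, set_performance_metric, solve). It is a simplified reconstruction of the setup above, not a copy of the packaged example, and details may differ:

>>> from PEPit import PEP
>>> from PEPit.functions import SmoothFunction
>>> L, gamma, n = 1, 1, 5
>>> problem = PEP()
>>> f = problem.declare_function(SmoothFunction, L=L)  # L-smooth, possibly non-convex
>>> x = x0 = problem.set_initial_point()
>>> g, f0 = f.oracle(x)  # gradient and function value at x_0
>>> fx = f0
>>> for _ in range(n):
...     problem.set_performance_metric(g ** 2)  # one candidate per iterate for min_t ||f'(x_t)||^2
...     x = x - gamma * g
...     g, fx = f.oracle(x)
>>> problem.set_performance_metric(g ** 2)  # also count the last iterate x_n
>>> problem.set_initial_condition(f0 - fx <= 1)  # normalization f(x_0) - f(x_n) <= 1
>>> pepit_tau = problem.solve(verbose=-1)  # expected to be about 4L/(3n) here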

3.2. No Lips 1

PEPit.examples.nonconvex_optimization.wc_no_lips_1(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)

Consider the constrained non-convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]

where \(f_2\) is a closed convex indicator function, \(f_1\) is possibly non-convex and \(L\)-smooth relative to \(h\), and \(h\) is closed, proper, and convex.

This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[\min_{0 \leqslant t \leqslant n-1} D_h(x_{t+1}; x_t) \leqslant \tau(n, L, \gamma) (F(x_0) - F(x_n))\]

is valid, where \(x_n\) is the output of the NoLips method, and where \(D_h\) is the Bregman distance generated by \(h\):

\[D_h(x; y) \triangleq h(x) - h(y) - \nabla h (y)^T(x - y).\]
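As a sanity check on this definition (toy NumPy code, not part of PEPit): with the Euclidean reference function \(h(x) = \frac{1}{2}\|x\|^2\), the Bregman distance reduces to \(D_h(x; y) = \frac{1}{2}\|x - y\|^2\).

>>> import numpy as np
>>> def bregman(h, grad_h, x, y):
...     # D_h(x; y) = h(x) - h(y) - <grad h(y), x - y>
...     return h(x) - h(y) - grad_h(y) @ (x - y)
>>> h = lambda x: 0.5 * (x @ x)  # Euclidean reference function
>>> x, y = np.array([1.0, 2.0]), np.array([0.5, 1.0])
>>> bool(bregman(h, lambda x: x, x, y) == 0.5 * np.sum((x - y) ** 2))
True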

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n-1}D_h(x_{t+1}; x_t)\) when \(F(x_0) - F(x_n) \leqslant 1\).

Algorithm: This method (also known as Bregman gradient, or mirror descent) can be found in, e.g., [1, Section 3]. For \(t \in \{0, \dots, n-1\}\),

\[x_{t+1} = \arg\min_{u \in \mathbb{R}^d} \left\{ f_2(u) + \nabla f_1(x_t)^T(u - x_t) + \frac{1}{\gamma} D_h(u; x_t) \right\}.\]
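For instance, with the Euclidean reference function \(h(u) = \frac{1}{2}\|u\|^2\) and \(f_2\) the indicator of a closed convex set \(C\), this update reduces to projected gradient descent: \(x_{t+1} = \Pi_C\left(x_t - \gamma \nabla f_1(x_t)\right)\).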

Theoretical guarantee: The tight theoretical upper bound, obtained in [1, Proposition 4.1], is

\[\min_{0 \leqslant t \leqslant n-1} D_h(x_{t+1}; x_t) \leqslant \frac{\gamma}{n(1 - L\gamma)}(F(x_0) - F(x_n)).\]
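For instance, with \(L = 1\), \(\gamma = \frac{1}{2L} = 0.5\), and \(n = 5\), this bound evaluates to \(\frac{0.5}{5(1 - 0.5)} = 0.2\), which matches the value computed in the example below.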

References: The detailed setup and results are available in [1]. The PEP approach for studying such settings is presented in [2].

[1] J. Bolte, S. Sabach, M. Teboulle, Y. Vaisbourd (2018). First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3), 2131-2151.

[2] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

DISCLAIMER: This example requires some experience with PEPit and PEPs (see Section 4 in [2]).

Parameters:
  • L (float) – relative-smoothness parameter.

  • gamma (float) – step-size (equal to \(\frac{1}{2L}\) for guarantee).

  • n (int) – number of iterations.

  • wrapper (str) – the name of the wrapper to be used.

  • solver (str) – the name of the solver the wrapper should use.

  • verbose (int) –

    level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + solver details.

Returns:
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / (2 * L)
>>> pepit_tau, theoretical_tau = wc_no_lips_1(L=L, gamma=gamma, n=5, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 20x20
(PEPit) Setting up the problem: performance measure is the minimum of 5 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                        Function 1 : Adding 30 scalar constraint(s) ...
                        Function 1 : 30 scalar constraint(s) added
                        Function 2 : Adding 30 scalar constraint(s) ...
                        Function 2 : 30 scalar constraint(s) added
                        Function 3 : Adding 49 scalar constraint(s) ...
                        Function 3 : 49 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.19999999999153717
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All the primal scalar constraints are verified up to an error of 2.415956323886803e-12
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative up to an error of 2.576047642211146e-12
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 4.834093507252019e-11
(PEPit) Final upper bound (dual): 0.1999999999931975 and lower bound (primal example): 0.19999999999153717
(PEPit) Duality gap: absolute: 1.6603385333269216e-12 and relative: 8.301692666985887e-12
*** Example file: worst-case performance of the NoLips in Bregman divergence ***
        PEPit guarantee:         min_t Dh(x_(t+1), x_(t)) <= 0.2 (F(x_0) - F(x_n))
        Theoretical guarantee :  min_t Dh(x_(t+1), x_(t)) <= 0.2 (F(x_0) - F(x_n))

3.3. No Lips 2

PEPit.examples.nonconvex_optimization.wc_no_lips_2(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)

Consider the constrained composite non-convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]

where \(f_2\) is a closed convex indicator function, \(f_1\) is possibly non-convex and \(L\)-smooth relative to \(h\), and \(h\) is closed, proper, and convex.

This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n,L,\gamma)\) such that the guarantee

\[\min_{0 \leqslant t \leqslant n-1} D_h(x_t;x_{t+1}) \leqslant \tau(n, L, \gamma) (F(x_0) - F(x_n))\]

is valid, where \(x_n\) is the output of the NoLips method, and where \(D_h\) is the Bregman distance generated by \(h\):

\[D_h(x; y) \triangleq h(x) - h(y) - \nabla h (y)^T(x - y).\]

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n-1}D_h(x_t;x_{t+1})\) when \(F(x_0) - F(x_n) \leqslant 1\).
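Compared to Section 3.2, the arguments of the Bregman distance in the performance measure are swapped: the quantity bounded here is \(D_h(x_t; x_{t+1})\) rather than \(D_h(x_{t+1}; x_t)\), while the algorithm itself is unchanged.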

Algorithm: This method (also known as Bregman gradient, or mirror descent) can be found in, e.g., [1, Section 3]. For \(t \in \{0, \dots, n-1\}\),

\[x_{t+1} = \arg\min_{u \in \mathbb{R}^d} \left\{ f_2(u) + \nabla f_1(x_t)^T(u - x_t) + \frac{1}{\gamma} D_h(u; x_t) \right\}.\]

Theoretical guarantee: An empirically tight worst-case guarantee is

\[\min_{0 \leqslant t \leqslant n-1}D_h(x_t;x_{t+1}) \leqslant \frac{\gamma}{n}(F(x_0) - F(x_n)).\]
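For instance, with \(L = 1\), \(\gamma = \frac{1}{L} = 1\), and \(n = 3\), this bound evaluates to \(\frac{1}{3} \approx 0.3333\), which matches the value computed in the example below.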

References: The detailed setup is presented in [1]. The PEP approach for studying such settings is presented in [2].

[1] J. Bolte, S. Sabach, M. Teboulle, Y. Vaisbourd (2018). First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3), 2131-2151.

[2] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

DISCLAIMER: This example requires some experience with PEPit and PEPs (see Section 4 in [2]).

Parameters:
  • L (float) – relative-smoothness parameter.

  • gamma (float) – step-size (equal to \(\frac{1}{2L}\) for guarantee).

  • n (int) – number of iterations.

  • wrapper (str) – the name of the wrapper to be used.

  • solver (str) – the name of the solver the wrapper should use.

  • verbose (int) –

    level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + solver details.

Returns:
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_no_lips_2(L=L, gamma=gamma, n=3, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 14x14
(PEPit) Setting up the problem: performance measure is the minimum of 3 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                        Function 1 : Adding 12 scalar constraint(s) ...
                        Function 1 : 12 scalar constraint(s) added
                        Function 2 : Adding 12 scalar constraint(s) ...
                        Function 2 : 12 scalar constraint(s) added
                        Function 3 : Adding 25 scalar constraint(s) ...
                        Function 3 : 25 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.3333333333330493
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All the primal scalar constraints are verified up to an error of 4.196643033083092e-14
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative up to an error of 4.199523786044634e-14
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 5.914928381589679e-13
(PEPit) Final upper bound (dual): 0.3333333333330737 and lower bound (primal example): 0.3333333333330493
(PEPit) Duality gap: absolute: 2.4369395390522186e-14 and relative: 7.310818617162885e-14
*** Example file: worst-case performance of the NoLips_2 in Bregman distance ***
        PEPit guarantee:         min_t Dh(x_(t-1), x_(t)) <= 0.333333 (F(x_0) - F(x_n))
        Theoretical guarantee:   min_t Dh(x_(t-1), x_(t)) <= 0.333333 (F(x_0) - F(x_n))