3 Non-convex optimization

3.1 Gradient Descent 

PEPit.examples.nonconvex_optimization.wc_gradient_descent(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth.

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \tau(n, L, \gamma) (f(x_0) - f(x_n))\]

is valid, where \(x_n\) is the n-th iterates obtained with the gradient method with fixed step-size.

Algorithm: Gradient descent is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), an empirically tight theoretical worst-case guarantee is

\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \frac{4}{3}\frac{L}{n} (f(x_0) - f(x_n)),\]

see discussions in [1, page 190] and [2].

References:

[1] Taylor, A. B. (2017). Convex interpolation and performance estimation of first-order methods for convex optimization. PhD Thesis, UCLouvain.

[2] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2021). The exact worst-case convergence rate of the gradient method with fixed step lengths for L-smooth functions. Optimization Letters, 16(6), 1649-1661.

Parameters:

L (float) – the smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
- -1: No verbose at all.
- 0: This example’s output.
- 1: This example’s output + PEPit information.
- 2: This example’s output + PEPit information + solver details.

Returns:

pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=gamma, n=5, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 7x7
(PEPit) Setting up the problem: performance measure is the minimum of 6 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                        Function 1 : Adding 30 scalar constraint(s) ...
                        Function 1 : 30 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.2666666655116666
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All the primal scalar constraints are verified
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative up to an error of 4.5045553432130066e-10
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 4.49157760320874e-09
(PEPit) Final upper bound (dual): 0.2666666657156721 and lower bound (primal example): 0.2666666655116666
(PEPit) Duality gap: absolute: 2.0400547917631684e-10 and relative: 7.650205502246835e-10
*** Example file: worst-case performance of gradient descent with fixed step-size ***
        PEPit guarantee:         min_i ||f'(x_i)||^2 <= 0.266667 (f(x_0)-f_*)
        Theoretical guarantee:   min_i ||f'(x_i)||^2 <= 0.266667 (f(x_0)-f_*)

3.2 No Lips 1 

PEPit.examples.nonconvex_optimization.wc_no_lips_1(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the constrainted non-convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]

where \(f_2\) is a closed convex indicator function and \(f_1\) is possibly non-convex and \(L\)-smooth relatively to \(h\), and where \(h\) is closed proper and convex.

This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[\min_{0 \leqslant t \leqslant n-1} D_h(x_{t+1}; x_t) \leqslant \tau(n, L, \gamma) (F(x_0) - F(x_n))\]

is valid, where \(x_n\) is the output of the NoLips method, and where \(D_h\) is the Bregman distance generated by \(h\):

\[D_h(x; y) \triangleq h(x) - h(y) - \nabla h (y)^T(x - y).\]

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n-1}D_h(x_{t+1}; x_t)\) when \(F(x_0) - F(x_n) \leqslant 1\).

Algorithms: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [1, Section 3]. For \(t \in \{0, \dots, n-1\}\),

\[x_{t+1} = \arg\min_{u \in R^d} \nabla f(x_t)^T(u - x_t) + \frac{1}{\gamma} D_h(u; x_t).\]

Theoretical guarantees: The tight theoretical upper bound is obtained in [1, Proposition 4.1]

\[\min_{0 \leqslant t \leqslant n-1} D_h(x_{t+1}; x_t) \leqslant \frac{\gamma}{n(1 - L\gamma)}(F(x_0) - F(x_n))\]

References: The detailed setup and results are availaible in [1]. The PEP approach for studying such settings is presented in [2].

[1] J. Bolte, S. Sabach, M. Teboulle, Y. Vaisbourd (2018). First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3), 2131-2151.

[2] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

DISCLAIMER: This example requires some experience with PEPit and PEPs (see Section 4 in [2]).

Parameters:

L (float) – relative-smoothness parameter.
gamma (float) – step-size (equal to 1/(2*L) for guarantee).
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
- -1: No verbose at all.
- 0: This example’s output.
- 1: This example’s output + PEPit information.
- 2: This example’s output + PEPit information + solver details.

Returns:

pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / (2 * L)
>>> pepit_tau, theoretical_tau = wc_no_lips_1(L=L, gamma=gamma, n=5, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 20x20
(PEPit) Setting up the problem: performance measure is the minimum of 5 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                        Function 1 : Adding 30 scalar constraint(s) ...
                        Function 1 : 30 scalar constraint(s) added
                        Function 2 : Adding 30 scalar constraint(s) ...
                        Function 2 : 30 scalar constraint(s) added
                        Function 3 : Adding 49 scalar constraint(s) ...
                        Function 3 : 49 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.19999999999153723
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All the primal scalar constraints are verified up to an error of 2.416178368491728e-12
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative up to an error of 2.5759179508275314e-12
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 4.8348778296212814e-11
(PEPit) Final upper bound (dual): 0.19999999999319756 and lower bound (primal example): 0.19999999999153723
(PEPit) Duality gap: absolute: 1.6603385333269216e-12 and relative: 8.301692666985885e-12
*** Example file: worst-case performance of the NoLips in Bregman divergence ***
        PEPit guarantee:         min_t Dh(x_(t+1), x_(t)) <= 0.2 (F(x_0) - F(x_n))
        Theoretical guarantee :  min_t Dh(x_(t+1), x_(t)) <= 0.2 (F(x_0) - F(x_n))

3.3 No Lips 2 

PEPit.examples.nonconvex_optimization.wc_no_lips_2(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the constrainted composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]

where \(f_2\) is a closed convex indicator function and \(f_1\) is possibly non-convex, \(L\)-smooth relatively to \(h\), and \(h\) is closed proper and convex.

This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n,L,\gamma)\) such that the guarantee

\[\min_{0 \leqslant t \leqslant n-1} D_h(x_t;x_{t+1}) \leqslant \tau(n, L, \gamma) (F(x_0) - F(x_n))\]

is valid, where \(x_n\) is the output of the NoLips method, and where \(D_h\) is the Bregman distance generated by \(h\):

\[D_h(x; y) \triangleq h(x) - h(y) - \nabla h (y)^T(x - y).\]

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n-1}D_h(x_t;x_{t+1})\) when \(F(x_0) - F(x_n) \leqslant 1\).

Algorithms: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [1, Section 3]. For \(t \in \{0, \dots, n-1\}\),

\[x_{t+1} = \arg\min_{u \in R^d} \nabla f(x_t)^T(u - x_t) + \frac{1}{\gamma} D_h(u; x_t).\]

Theoretical guarantees: An empirically tight worst-case guarantee is

\[\min_{0 \leqslant t \leqslant n-1}D_h(x_t;x_{t+1}) \leqslant \frac{\gamma}{n}(F(x_0) - F(x_n)).\]

References: The detailed setup is presented in [1]. The PEP approach for studying such settings is presented in [2].

[1] J. Bolte, S. Sabach, M. Teboulle, Y. Vaisbourd (2018). First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3), 2131-2151.

[2] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

DISCLAIMER: This example requires some experience with PEPit and PEPs (see Section 4 in [2]).

Parameters:

L (float) – relative-smoothness parameter.
gamma (float) – step-size (equal to \(\frac{1}{2L}\) for guarantee).
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
- -1: No verbose at all.
- 0: This example’s output.
- 1: This example’s output + PEPit information.
- 2: This example’s output + PEPit information + solver details.

Returns:

pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_no_lips_2(L=L, gamma=gamma, n=3, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 14x14
(PEPit) Setting up the problem: performance measure is the minimum of 3 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                        Function 1 : Adding 12 scalar constraint(s) ...
                        Function 1 : 12 scalar constraint(s) added
                        Function 2 : Adding 12 scalar constraint(s) ...
                        Function 2 : 12 scalar constraint(s) added
                        Function 3 : Adding 25 scalar constraint(s) ...
                        Function 3 : 25 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.3333333333330492
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All the primal scalar constraints are verified up to an error of 4.18554080283684e-14
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative up to an error of 4.1967204284666985e-14
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 5.917219186068013e-13
(PEPit) Final upper bound (dual): 0.3333333333330736 and lower bound (primal example): 0.3333333333330492
(PEPit) Duality gap: absolute: 2.4369395390522186e-14 and relative: 7.310818617162887e-14
*** Example file: worst-case performance of the NoLips_2 in Bregman distance ***
        PEPit guarantee:         min_t Dh(x_(t-1), x_(t)) <= 0.333333 (F(x_0) - F(x_n))
        Theoretical guarantee:   min_t Dh(x_(t-1), x_(t)) <= 0.333333 (F(x_0) - F(x_n))

3.4 Gradient descent on smooth function satisfying quadratic Lojasiewicz inequality (naive version)

PEPit.examples.nonconvex_optimization.wc_gradient_descent_quadratic_lojasiewicz_naive(L, mu, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and satisfies a quadratic Lojasiewicz inequality:

\[f(x)-f_\star \leqslant \frac{1}{2\mu}\|\nabla f(x) \|^2,\]

details can be found in [1,2,3].

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[f(x_n)-f_\star \leqslant \tau(n, L, \mu, \gamma) (f(x_0) - f(x_\star))\]

is valid, where \(x_n\) is the n-th iterates obtained with the gradient method with fixed step-size.

Algorithm: Gradient descent is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size and.

Theoretical guarantee: We compare with the guarantees from [4, Theorem 3].

References:

[1] S. Lojasiewicz (1963). Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles, 117 (1963), 87–89.

[2] B. Polyak (1963). Gradient methods for the minimisation of functionals USSR Computational Mathematics and Mathematical Physics 3(4), 864–878.

[3] J. Bolte, A. Daniilidis, and A. Lewis (2007). The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization 17, 1205–1223.

[4] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2023). Conditions for linear convergence of the gradient method for non-convex optimization. Optimization Letters.

Parameters:

L (float) – the smoothness parameter.
mu (float) – Lojasiewicz parameter.
gamma (float) – step-size.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
- -1: No verbose at all.
- 0: This example’s output.
- 1: This example’s output + PEPit information.
- 2: This example’s output + PEPit information + solver details.

Returns:

pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.

Example

>>> L, mu, gamma, n = 1, .2, 1, 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_quadratic_lojasiewicz_naive(L=L, gamma=gamma, n=n, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 4x4
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                        Function 1 : Adding 10 scalar constraint(s) ...
                        Function 1 : 10 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.7272727088286305
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All the primal scalar constraints are verified
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative up to an error of 6.45387237307569e-09
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 3.656794643018536e-08
(PEPit) Final upper bound (dual): 0.7272727115834279 and lower bound (primal example): 0.7272727088286305
(PEPit) Duality gap: absolute: 2.754797390203123e-09 and relative: 3.7878465075914795e-09
*** Example file: worst-case performance of gradient descent with fixed step-size ***
***      (smooth problem satisfying a Lojasiewicz inequality; cheap naive version) ***
        PEPit guarantee:         f(x_1) - f(x_*) <= 0.727273 (f(x_0)-f_*)
        Theoretical guarantee:   f(x_1) - f(x_*) <= 0.727273 (f(x_0)-f_*)

3.5 Gradient descent on smooth function satisfying quadratic Lojasiewicz inequality (intermediate)

PEPit.examples.nonconvex_optimization.wc_gradient_descent_quadratic_lojasiewicz_intermediate(L, mu, gamma, n, alpha, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and satisfies a quadratic Lojasiewicz inequality:

\[f(x)-f_\star \leqslant \frac{1}{2\mu}\|\nabla f(x) \|^2,\]

details can be found in [1,2,3]. The example here relies on the SmoothQuadraticLojasiewiczFunctionCheap description of smooth Lojasiewicz functions (based on [5, Proposition 3.2]).

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[f(x_n)-f_\star \leqslant \tau(n, L, \mu, \gamma) (f(x_0) - f(x_\star))\]

is valid, where \(x_n\) is the n-th iterates obtained with the gradient method with fixed step-size.

Algorithm: Gradient descent is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size and.

Theoretical guarantee: We compare with the guarantees from [4, Theorem 3].

References:

[1] S. Lojasiewicz (1963). Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles, 117 (1963), 87–89.

[2] B. Polyak (1963). Gradient methods for the minimisation of functionals USSR Computational Mathematics and Mathematical Physics 3(4), 864–878.

[3] J. Bolte, A. Daniilidis, and A. Lewis (2007). The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization 17, 1205–1223.

[4] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2023). Conditions for linear convergence of the gradient method for non-convex optimization. Optimization Letters.

[5] A. Rubbens, J.M. Hendrickx, A. Taylor (2025). A constructive approach to strengthen algebraic descriptions of function and operator classes.

Parameters:

L (float) – the smoothness parameter.
mu (float) – Lojasiewicz parameter.
gamma (float) – step-size.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
- -1: No verbose at all.
- 0: This example’s output.
- 1: This example’s output + PEPit information.
- 2: This example’s output + PEPit information + solver details.

Returns:

pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.

Example

>>> L, mu, gamma, n = 1, .2, 1, 1
>>> alpha = (2*mu/(2*L+mu))
>>> pepit_tau, theoretical_tau = wc_gradient_descent_quadratic_lojasiewicz_intermediate(L=L, gamma=gamma, n=1, alpha=alpha, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 4x4
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                        Function 1 : Adding 16 scalar constraint(s) ...
                        Function 1 : 16 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.727272727239017
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All the primal scalar constraints are verified up to an error of 9.529710354172494e-12
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative up to an error of 5.521136597015314e-11
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 3.812156896706273e-11
(PEPit) Final upper bound (dual): 0.7272727272394729 and lower bound (primal example): 0.727272727239017
(PEPit) Duality gap: absolute: 4.558575739110893e-13 and relative: 6.268041641568012e-13
*** Example file: worst-case performance of gradient descent with fixed step-size ***
***      (smooth problem satisfying a Lojasiewicz inequality; intermediate version) ***
        PEPit guarantee:         f(x_1) - f(x_*) <= 0.727273 (f(x_0)-f_*)
        Theoretical guarantee:   f(x_1) - f(x_*) <= 0.727273 (f(x_0)-f_*)

3.6 Gradient descent on smooth function satisfying quadratic Lojasiewicz inequality (expensive version)

PEPit.examples.nonconvex_optimization.wc_gradient_descent_quadratic_lojasiewicz_expensive(L, mu, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and satisfies a quadratic Lojasiewicz inequality:

\[f(x)-f_\star \leqslant \frac{1}{2\mu}\|\nabla f(x) \|^2,\]

details can be found in [1,2,3]. The example here relies on the SmoothQuadraticLojasiewiczFunctionExpensive description of smooth Lojasiewicz functions (based on [5, Proposition 3.4]).

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[f(x_n)-f_\star \leqslant \tau(n, L, \mu, \gamma) (f(x_0) - f(x_\star))\]

is valid, where \(x_n\) is the n-th iterates obtained with the gradient method with fixed step-size.

Algorithm: Gradient descent is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size and.

Theoretical guarantee: We compare with the guarantees from [4, Theorem 3].

References:

[1] S. Lojasiewicz (1963). Une propriété topologique des sous-ensembles analytiques réels. Les équations aux dérivées partielles, 117 (1963), 87–89.

[2] B. Polyak (1963). Gradient methods for the minimisation of functionals USSR Computational Mathematics and Mathematical Physics 3(4), 864–878.

[3] J. Bolte, A. Daniilidis, and A. Lewis (2007). The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM Journal on Optimization 17, 1205–1223.

[4] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2023). Conditions for linear convergence of the gradient method for non-convex optimization. Optimization Letters.

[5] A. Rubbens, J.M. Hendrickx, A. Taylor (2025). A constructive approach to strengthen algebraic descriptions of function and operator classes.

Parameters:

L (float) – the smoothness parameter.
mu (float) – Lojasiewicz parameter.
gamma (float) – step-size.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
- -1: No verbose at all.
- 0: This example’s output.
- 1: This example’s output + PEPit information.
- 2: This example’s output + PEPit information + solver details.

Returns:

pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.

Example

>>> L, mu, gamma, n = 1, .2, 1, 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_quadratic_lojasiewicz_expensive(L=L, gamma=gamma, n=n, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 4x4
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                        Function 1 : Adding 4 scalar constraint(s) ...
                        Function 1 : 4 scalar constraint(s) added
                        Function 1 : Adding 6 lmi constraint(s) ...
                 Size of PSD matrix 1: 4x4
                 Size of PSD matrix 2: 4x4
                 Size of PSD matrix 3: 4x4
                 Size of PSD matrix 4: 4x4
                 Size of PSD matrix 5: 4x4
                 Size of PSD matrix 6: 4x4
                        Function 1 : 6 lmi constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.6832669556328734
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All required PSD matrices are indeed positive semi-definite up to an error of 1.0099203333404037e-09
                All the primal scalar constraints are verified
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual matrices to lmi are positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative up to an error of 5.671954340368105e-10
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.0306640495891322e-08
(PEPit) Final upper bound (dual): 0.6832669563172779 and lower bound (primal example): 0.6832669556328734
(PEPit) Duality gap: absolute: 6.844044220244427e-10 and relative: 1.0016647466735981e-09
*** Example file: worst-case performance of gradient descent with fixed step-size ***
***      (smooth problem satisfying a Lojasiewicz inequality; expensive version) ***
        PEPit guarantee:         f(x_1) - f(x_*) <= 0.683267 (f(x_0)-f_*)
        Theoretical guarantee:   f(x_1) - f(x_*) <= 0.727273 (f(x_0)-f_*)

3.7 Difference-of-convex algorithm (DCA)

PEPit.examples.nonconvex_optimization.wc_difference_of_convex_algorithm(mu1, mu2, L1, L2, n, alpha=0, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the minimization problem

\[F_\star \triangleq \min_x f_1(x)-f_2(x),\]

where \(f_1\) and \(f_2\) are convex functions, respectively \(L_1\)-smooth and \(\mu_1\)-strongly convex and \(L_2\)-smooth and \(\mu_2\)-strongly convex.

This code computes a worst-case guarantee for DCA (difference-of-convex algorithm, also known as the convex-concave procedure). That is, it computes the smallest possible \(\tau(n, \mu_1, L_1,\mu_2, L_2)\) such that the guarantee

\[\min_{t\leqslant n} \|\nabla f_1(x_t)-\nabla f_2(x_t)\|^2 \leqslant \tau(n, \mu_1, L_1,\mu_2, L_2) (f_1(x_0)-f_2(x_0)-F_\star)\]

is valid, where \(x_n\) is the n-th iterates obtained with DCA.

Algorithm: DCA is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\[x_{t+1} \in \mathrm{argmin}_x\,\{ f_1(x) - \langle \nabla f_2(x_t), x\rangle\},\]

Theoretical guarantee: The results are compared with [1, Theorem 3]; a more complete picture can be obtained from [2], also by possibly allowing for non-convex functions \(f_1\) and \(f_2\) (i.e., possibly negative values for \(\mu_1\), \(\mu_2\)).

References:

[1] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2021). On the rate of convergence of the difference-of-convex algorithm (DCA). Journal of Optimization Theory and Applications, 202(1), 475-496.

[2] T. Rotaru, P. Patrinos, F. Glineur (2025). Tight Analysis of Difference-of-Convex Algorithm (DCA) Improves Convergence Rates for Proximal Gradient Descent. Journal of Optimization Theory and Applications, 202(1), 475-496.

Parameters:

mu1 (float) – strong convexity parameter for f1.
mu2 (float) – strong convexity parameter for f2.
L1 (float) – smoothness parameter for f1.
L2 (float) – smoothness parameter for f2.
alpha (float) – boosting parameter (defaulted to 0).
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
- -1: No verbose at all.
- 0: This example’s output.
- 1: This example’s output + PEPit information.
- 2: This example’s output + PEPit information + solver details.

Returns:

pepit_tau (float) – worst-case value.
theoretical_tau (float) – reference theoretical value [1, Theorem 3].

Example

>>> L1, L2, mu1, mu2 = 2., 5., .15, .1
>>> pepit_tau, theory = wc_difference_of_convex_algorithm(mu1=mu1, mu2=mu2, L1=L1, L2=L2, n=5, alpha=0, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 15x15
(PEPit) Setting up the problem: performance measure is the minimum of 6 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (7 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                        Function 1 : Adding 42 scalar constraint(s) ...
                        Function 1 : 42 scalar constraint(s) added
                        Function 2 : Adding 42 scalar constraint(s) ...
                        Function 2 : 42 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: prosta.prim_and_dual_feas (wrapper:mosek, solver: MOSEK); optimal value: 0.49113062165416893
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All the primal scalar constraints are verified up to an error of 8.798134450149764e-09
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative up to an error of 1.0048489946103415e-08
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.102726387109924e-07
(PEPit) Final upper bound (dual): 0.4911306282045956 and lower bound (primal example): 0.49113062165416893
(PEPit) Duality gap: absolute: 6.550426645546281e-09 and relative: 1.3337442946407816e-08
*** Example file: worst-case performance of DCA ***
        PEPit guarantee:         min_i ||f'(x_i)||^2 <= 0.491131 (f(x_0)-f_*)
        Theoretical guarantee:   min_i ||f'(x_i)||^2 <= 0.491131 (f(x_0)-f_*)

3 Non-convex optimization

3.1 Gradient Descent

3.2 No Lips 1

3.3 No Lips 2

3.4 Gradient descent on smooth function satisfying quadratic Lojasiewicz inequality (naive version)

3.5 Gradient descent on smooth function satisfying quadratic Lojasiewicz inequality (intermediate)

3.6 Gradient descent on smooth function satisfying quadratic Lojasiewicz inequality (expensive version)

3.7 Difference-of-convex algorithm (DCA)

3.1 Gradient Descent 

3.2 No Lips 1 

3.3 No Lips 2 