3. Non-convex optimization
3.1. Gradient Descent
- PEPit.examples.nonconvex_optimization.wc_gradient_descent(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \tau(n, L, \gamma) (f(x_0) - f(x_n))\]is valid, where \(x_n\) is the \(n\)-th iterate obtained with the gradient method with fixed step-size.
Algorithm: Gradient descent is described as follows, for \(t \in \{ 0, \dots, n-1\}\),
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), an empirically tight theoretical worst-case guarantee is
\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \frac{4}{3}\frac{L}{n} (f(x_0) - f(x_n)),\]see discussions in [1, page 190] and [2].
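Outside of PEPit, the iteration and the bound above can be sanity-checked on a single \(L\)-smooth instance with plain NumPy. The sketch below uses an illustrative quadratic (not a worst-case function), and the helper name is invented for this example:

```python
import numpy as np

def gradient_descent_min_grad(f_grad, x0, gamma, n):
    """Run n steps of x_{t+1} = x_t - gamma * grad f(x_t) and return the
    iterates together with min_{t <= n} ||grad f(x_t)||^2."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n):
        xs.append(xs[-1] - gamma * f_grad(xs[-1]))
    min_sq_grad = min(float(np.dot(f_grad(x), f_grad(x))) for x in xs)
    return xs, min_sq_grad

# Illustrative L-smooth instance: f(x) = (L/2) ||x||^2.
L, n = 1.0, 5
gamma = 1.0 / L
f = lambda x: 0.5 * L * float(np.dot(x, x))
f_grad = lambda x: L * x

xs, min_sq_grad = gradient_descent_min_grad(f_grad, np.array([1.0]), gamma, n)
bound = (4.0 / 3.0) * (L / n) * (f(xs[0]) - f(xs[-1]))
assert min_sq_grad <= bound + 1e-12  # the guarantee holds on this instance
```

On this particular quadratic the bound is loose (the first step already reaches the minimizer); the PEPit computation above is what certifies it over all \(L\)-smooth functions.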
References:
- Parameters:
L (float) – the smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=gamma, n=5, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 7x7
(PEPit) Setting up the problem: performance measure is the minimum of 6 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 30 scalar constraint(s) ...
        Function 1 : 30 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.26666666551166657
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite
        All the primal scalar constraints are verified
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative up to an error of 4.5045561757111027e-10
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 4.491578307483797e-09
(PEPit) Final upper bound (dual): 0.2666666657156721 and lower bound (primal example): 0.26666666551166657
(PEPit) Duality gap: absolute: 2.0400553468746807e-10 and relative: 7.650207583915017e-10
*** Example file: worst-case performance of gradient descent with fixed step-size ***
        PEPit guarantee:        min_i ||f'(x_i)||^2 <= 0.266667 (f(x_0)-f_*)
        Theoretical guarantee:  min_i ||f'(x_i)||^2 <= 0.266667 (f(x_0)-f_*)
3.2. No Lips 1
- PEPit.examples.nonconvex_optimization.wc_no_lips_1(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the constrained non-convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]where \(f_2\) is a closed convex indicator function and \(f_1\) is possibly non-convex and \(L\)-smooth relative to \(h\), where \(h\) is closed, proper, and convex.
This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[\min_{0 \leqslant t \leqslant n-1} D_h(x_{t+1}; x_t) \leqslant \tau(n, L, \gamma) (F(x_0) - F(x_n))\]is valid, where \(x_n\) is the output of the NoLips method, and where \(D_h\) is the Bregman distance generated by \(h\):
\[D_h(x; y) \triangleq h(x) - h(y) - \nabla h (y)^T(x - y).\]In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n-1}D_h(x_{t+1}; x_t)\) when \(F(x_0) - F(x_n) \leqslant 1\).
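For intuition, the Bregman distance can be evaluated directly for classical choices of \(h\); the short NumPy sketch below is illustrative and the helper names are not part of PEPit:

```python
import numpy as np

def bregman_distance(h, grad_h, x, y):
    """D_h(x; y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - float(np.dot(grad_h(y), x - y))

x = np.array([0.2, 0.8])
y = np.array([0.5, 0.5])

# h(x) = 0.5 ||x||^2 recovers half the squared Euclidean distance.
h_sq = lambda v: 0.5 * float(np.dot(v, v))
grad_h_sq = lambda v: v
d_euclid = bregman_distance(h_sq, grad_h_sq, x, y)  # 0.5 * ||x - y||^2

# h(x) = sum_i x_i log x_i (negative entropy) yields the KL divergence
# when x and y lie on the probability simplex.
h_ent = lambda v: float(np.sum(v * np.log(v)))
grad_h_ent = lambda v: np.log(v) + 1.0
d_kl = bregman_distance(h_ent, grad_h_ent, x, y)
```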
Algorithm: This method (also known as Bregman Gradient, or Mirror Descent) can be found in, e.g., [1, Section 3]. For \(t \in \{0, \dots, n-1\}\),
\[x_{t+1} = \arg\min_{u \in \mathbb{R}^d} \nabla f(x_t)^T(u - x_t) + \frac{1}{\gamma} D_h(u; x_t).\]
Theoretical guarantee: The tight theoretical upper bound is obtained in [1, Proposition 4.1]:
\[\min_{0 \leqslant t \leqslant n-1} D_h(x_{t+1}; x_t) \leqslant \frac{\gamma}{n(1 - L\gamma)}(F(x_0) - F(x_n)).\]
References: The detailed setup and results are available in [1]. The PEP approach for studying such settings is presented in [2].
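To make the update concrete: for specific choices of \(h\), the \(\arg\min\) defining the step has a closed form. The sketch below (helper names invented here) shows two standard cases: with \(h(x) = \frac{1}{2}\|x\|^2\) the step reduces to plain gradient descent, while with the negative entropy \(h(x) = \sum_i x_i \log x_i\), setting the gradient of the objective to zero gives a multiplicative update.

```python
import numpy as np

def bregman_gradient_step_entropy(x, grad_f_x, gamma):
    """One Bregman-gradient (NoLips / mirror descent) step
        x_{t+1} = argmin_u <grad f(x_t), u - x_t> + (1/gamma) D_h(u; x_t)
    for h(x) = sum_i x_i log x_i on the positive orthant.  First-order
    optimality gives log u = log x - gamma * grad f(x), hence:"""
    return x * np.exp(-gamma * grad_f_x)

def bregman_gradient_step_euclidean(x, grad_f_x, gamma):
    """Same step for h(x) = 0.5 ||x||^2: it reduces to gradient descent."""
    return x - gamma * grad_f_x

# Illustrative call with made-up inputs.
x = np.array([1.0, 2.0])
g = np.array([0.0, np.log(2.0)])
u_ent = bregman_gradient_step_entropy(x, g, gamma=1.0)
u_euc = bregman_gradient_step_euclidean(x, g, gamma=1.0)
```

Note that the entropic step keeps a positive iterate positive, which is the point of choosing \(h\) adapted to the constraint set.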
DISCLAIMER: This example requires some experience with PEPit and PEPs (see Section 4 in [2]).
- Parameters:
L (float) – relative-smoothness parameter.
gamma (float) – step-size (equal to \(\frac{1}{2L}\) for guarantee).
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> L = 1
>>> gamma = 1 / (2 * L)
>>> pepit_tau, theoretical_tau = wc_no_lips_1(L=L, gamma=gamma, n=5, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 20x20
(PEPit) Setting up the problem: performance measure is the minimum of 5 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
        Function 1 : Adding 30 scalar constraint(s) ...
        Function 1 : 30 scalar constraint(s) added
        Function 2 : Adding 30 scalar constraint(s) ...
        Function 2 : 30 scalar constraint(s) added
        Function 3 : Adding 49 scalar constraint(s) ...
        Function 3 : 49 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.19999999999153717
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite
        All the primal scalar constraints are verified up to an error of 2.415956323886803e-12
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative up to an error of 2.576047642211146e-12
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 4.834093507252019e-11
(PEPit) Final upper bound (dual): 0.1999999999931975 and lower bound (primal example): 0.19999999999153717
(PEPit) Duality gap: absolute: 1.6603385333269216e-12 and relative: 8.301692666985887e-12
*** Example file: worst-case performance of the NoLips in Bregman divergence ***
        PEPit guarantee:        min_t Dh(x_(t+1), x_(t)) <= 0.2 (F(x_0) - F(x_n))
        Theoretical guarantee:  min_t Dh(x_(t+1), x_(t)) <= 0.2 (F(x_0) - F(x_n))
3.3. No Lips 2
- PEPit.examples.nonconvex_optimization.wc_no_lips_2(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the constrained composite non-convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]where \(f_2\) is a closed convex indicator function, \(f_1\) is possibly non-convex and \(L\)-smooth relative to \(h\), and \(h\) is closed, proper, and convex.
This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n,L,\gamma)\) such that the guarantee
\[\min_{0 \leqslant t \leqslant n-1} D_h(x_t;x_{t+1}) \leqslant \tau(n, L, \gamma) (F(x_0) - F(x_n))\]is valid, where \(x_n\) is the output of the NoLips method, and where \(D_h\) is the Bregman distance generated by \(h\):
\[D_h(x; y) \triangleq h(x) - h(y) - \nabla h (y)^T(x - y).\]In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n-1}D_h(x_t;x_{t+1})\) when \(F(x_0) - F(x_n) \leqslant 1\).
Algorithm: This method (also known as Bregman Gradient, or Mirror Descent) can be found in, e.g., [1, Section 3]. For \(t \in \{0, \dots, n-1\}\),
\[x_{t+1} = \arg\min_{u \in \mathbb{R}^d} \nabla f(x_t)^T(u - x_t) + \frac{1}{\gamma} D_h(u; x_t).\]
Theoretical guarantee: An empirically tight worst-case guarantee is
\[\min_{0 \leqslant t \leqslant n-1} D_h(x_t; x_{t+1}) \leqslant \frac{\gamma}{n}(F(x_0) - F(x_n)).\]
References: The detailed setup is presented in [1]. The PEP approach for studying such settings is presented in [2].
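Note that the performance measure here is \(D_h(x_t; x_{t+1})\), with the arguments swapped relative to NoLips 1. Since Bregman distances are asymmetric in general, the two measures genuinely differ; a minimal NumPy illustration with the negative-entropy kernel (variable names invented for this sketch):

```python
import numpy as np

# Negative entropy h(x) = sum_i x_i log x_i; its Bregman distance is the
# generalized KL divergence, which is not symmetric in its arguments.
def d_h(x, y):
    return float(np.sum(x * np.log(x / y) - x + y))

x = np.array([0.2, 0.8])
y = np.array([0.5, 0.5])
d_xy = d_h(x, y)
d_yx = d_h(y, x)
print(d_xy, d_yx)  # the two orderings give different values
```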
DISCLAIMER: This example requires some experience with PEPit and PEPs (see Section 4 in [2]).
- Parameters:
L (float) – relative-smoothness parameter.
gamma (float) – step-size (equal to \(\frac{1}{2L}\) for guarantee).
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_no_lips_2(L=L, gamma=gamma, n=3, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 14x14
(PEPit) Setting up the problem: performance measure is the minimum of 3 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
        Function 1 : Adding 12 scalar constraint(s) ...
        Function 1 : 12 scalar constraint(s) added
        Function 2 : Adding 12 scalar constraint(s) ...
        Function 2 : 12 scalar constraint(s) added
        Function 3 : Adding 25 scalar constraint(s) ...
        Function 3 : 25 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.3333333333330493
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite
        All the primal scalar constraints are verified up to an error of 4.196643033083092e-14
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative up to an error of 4.199523786044634e-14
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 5.914928381589679e-13
(PEPit) Final upper bound (dual): 0.3333333333330737 and lower bound (primal example): 0.3333333333330493
(PEPit) Duality gap: absolute: 2.4369395390522186e-14 and relative: 7.310818617162885e-14
*** Example file: worst-case performance of the NoLips_2 in Bregman distance ***
        PEPit guarantee:        min_t Dh(x_(t-1), x_(t)) <= 0.333333 (F(x_0) - F(x_n))
        Theoretical guarantee:  min_t Dh(x_(t-1), x_(t)) <= 0.333333 (F(x_0) - F(x_n))