1. Unconstrained convex minimization
1.1. Gradient descent
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
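A minimal sketch of this iteration (outside PEPit; the quadratic objective and starting point below are illustrative assumptions, any \(L\)-smooth convex \(f\) works):

    import numpy as np

    # Assumed L-smooth convex objective: f(x) = 0.5 * x^T H x, L = max eigenvalue of H.
    H = np.diag([3.0, 1.0, 0.1])      # illustrative quadratic, L = 3
    grad_f = lambda x: H @ x

    L, n = 3.0, 4
    gamma = 1.0 / L                   # fixed step-size, gamma <= 1/L
    x = np.ones(3)                    # x_0
    for _ in range(n):
        x = x - gamma * grad_f(x)     # x_{t+1} = x_t - gamma * grad f(x_t)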
Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Theorem 3.1]:
\[f(x_n)-f_\star \leqslant \frac{L}{4nL\gamma+2} \|x_0-x_\star\|^2,\]which is tight on some Huber loss functions.
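For the parameter values used in the example at the end of this section (\(L=3\), \(\gamma=1/L\), \(n=4\)), this rate evaluates to \(3/18 = 1/6 \approx 0.166667\), which is exactly the value PEPit recovers; a quick check:

    L, n = 3, 4
    gamma = 1 / L
    tau = L / (4 * n * L * gamma + 2)   # = 3 / 18
    print(tau)                          # 0.1666..., matching both guarantees printed below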
References:
- Parameters:
L (float) – the smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 3
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 7x7
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 30 scalar constraint(s) ...
        Function 1 : 30 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.1666666649793712
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite up to an error of 6.325029587061441e-10
        All the primal scalar constraints are verified up to an error of 6.633613956752438e-09
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative up to an error of 7.0696173743789816e-09
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 7.547098305159261e-08
(PEPit) Final upper bound (dual): 0.16666667331941884 and lower bound (primal example): 0.1666666649793712
(PEPit) Duality gap: absolute: 8.340047652488636e-09 and relative: 5.004028642152831e-08
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
        PEPit guarantee:         f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
1.2. Subgradient method
- PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method(M, n, gamma, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex and \(M\)-Lipschitz. This problem is a (possibly non-smooth) minimization problem.
This code computes a worst-case guarantee for the subgradient method. That is, it computes the smallest possible \(\tau(n, M, \gamma)\) such that the guarantee
\[\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star \leqslant \tau(n, M, \gamma)\]is valid, where \(x_t\) are the iterates of the subgradient method after \(t\leqslant n\) steps, where \(x_\star\) is a minimizer of \(f\), and when \(\|x_0-x_\star\|\leqslant 1\).
In short, for given values of \(M\), the step-size \(\gamma\) and the number of iterations \(n\), \(\tau(n, M, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star\) when \(\|x_0-x_\star\| \leqslant 1\).
Algorithm: For \(t\in \{0, \dots, n-1 \}\)
\begin{eqnarray} g_{t} & \in & \partial f(x_t) \\ x_{t+1} & = & x_t - \gamma g_t \end{eqnarray}
Theoretical guarantee: The tight bound is obtained in [1, Section 3.2.3] and [2, Eq (2)]
\[\min_{0 \leqslant t \leqslant n} f(x_t)- f(x_\star) \leqslant \frac{M}{\sqrt{n+1}}\|x_0-x_\star\|,\]and tightness follows from the lower complexity bound for this class of problems, e.g., [3, Appendix A].
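As an illustration (not part of the PEPit computation), the method can be run on the \(M\)-Lipschitz function \(f(x) = M|x|\), an assumed example with \(f_\star = 0\) and \(x_\star = 0\); the observed value stays below the worst-case bound:

    import numpy as np

    M, n = 2.0, 6
    gamma = 1 / (M * np.sqrt(n + 1))       # step-size used in the example below
    x = 1.0                                # ||x_0 - x_*|| = 1, with x_* = 0
    vals = []
    for _ in range(n):
        vals.append(M * abs(x))            # f(x_t)
        x -= gamma * M * np.sign(x)        # subgradient step on f(x) = M * |x|
    vals.append(M * abs(x))                # f(x_n)
    print(min(vals), M / np.sqrt(n + 1))   # observed value vs. worst-case bound ~0.755929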
References: Classical references on this topic include [1, 2].
[2] S. Boyd, L. Xiao, A. Mutapcic (2003). Subgradient Methods (lecture notes).
- Parameters:
M (float) – the Lipschitz parameter.
n (int) – the number of iterations.
gamma (float) – step-size.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> from math import sqrt
>>> M = 2
>>> n = 6
>>> gamma = 1 / (M * sqrt(n + 1))
>>> pepit_tau, theoretical_tau = wc_subgradient_method(M=M, n=n, gamma=gamma, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 9x9
(PEPit) Setting up the problem: performance measure is the minimum of 7 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 64 scalar constraint(s) ...
        Function 1 : 64 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.7559287513714278
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite
        All the primal scalar constraints are verified
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative up to an error of 1.0475429120359347e-09
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 8.251581484359109e-08
(PEPit) Final upper bound (dual): 0.7559287543574007 and lower bound (primal example): 0.7559287513714278
(PEPit) Duality gap: absolute: 2.9859729133718815e-09 and relative: 3.950071892297578e-09
*** Example file: worst-case performance of subgradient method ***
        PEPit guarantee:         min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755929 ||x_0 - x_*||
        Theoretical guarantee:   min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755929 ||x_0 - x_*||
1.3. Subgradient method under restricted secant inequality and error bound
- PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method_rsi_eb(mu, L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) verifies the “lower” restricted secant inequality (\(\mu-\text{RSI}^-\)) and the “upper” error bound (\(L-\text{EB}^+\)) [1].
This code computes a worst-case guarantee for the subgradient method with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, \mu, L, \gamma)\) such that the guarantee
\[\| x_n - x_\star \|^2 \leqslant \tau(n, \mu, L, \gamma) \| x_0 - x_\star \|^2\]is valid, where \(x_n\) is the output of the subgradient method with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(\mu\), \(L\), and \(\gamma\), \(\tau(n, \mu, L, \gamma)\) is computed as the worst-case value of \(\| x_n - x_\star \|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The subgradient method is described by
\[x_{t+1} = x_t - \gamma g_t, \quad g_t \in \partial f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: The tight theoretical guarantee can be found in [1, Prop 1] (upper bound) and [1, Theorem 2] (lower bound):
\[\| x_n - x_\star \|^2 \leqslant (1 - 2\gamma\mu + L^2 \gamma^2)^n \|x_0-x_\star\|^2.\]
References: Definition and convergence guarantees can be found in [1].
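For the values used in the example below (\(\mu=0.1\), \(L=1\), \(\gamma=\mu/L^2\), \(n=4\)), the contraction factor is \(1 - 2\gamma\mu + L^2\gamma^2 = 0.99\), so the bound is \(0.99^4 \approx 0.960596\), exactly the value PEPit returns:

    mu, L, n = 0.1, 1.0, 4
    gamma = mu / L ** 2
    tau = (1 - 2 * gamma * mu + (L * gamma) ** 2) ** n   # 0.99 ** 4
    print(tau)                                           # 0.96059601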
- Parameters:
mu (float) – the RSI parameter.
L (float) – the EB parameter.
gamma (float) – step-size.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> mu = .1
>>> L = 1
>>> pepit_tau, theoretical_tau = wc_subgradient_method_rsi_eb(mu=mu, L=L, gamma=mu / L ** 2, n=4, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 6x6
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 8 scalar constraint(s) ...
        Function 1 : 8 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.9605960099986828
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite
        All the primal scalar constraints are verified
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.9007807847382036e-12
(PEPit) Final upper bound (dual): 0.960596009998838 and lower bound (primal example): 0.9605960099986828
(PEPit) Duality gap: absolute: 1.5520917884259688e-13 and relative: 1.6157591456455218e-13
*** Example file: worst-case performance of the subgradient method under RSI and EB ***
        PEPit guarantee:         ||x_n - x_*||^2 <= 0.960596 ||x_0 - x_*||^2
        Theoretical guarantee:   ||x_n - x_*||^2 <= 0.960596 ||x_0 - x_*||^2
1.4. Gradient descent with exact line search
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_exact_line_search(L, mu, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the gradient descent (GD) with exact linesearch (ELS). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the GD with ELS, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm: GD with ELS can be written as
\[x_{t+1} = x_t - \gamma_t \nabla f(x_t)\]with \(\gamma_t = \arg\min_{\gamma} f \left( x_t - \gamma \nabla f(x_t) \right)\).
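A minimal sketch of this scheme, performing the one-dimensional exact minimization numerically (the quadratic objective below is an illustrative assumption):

    import numpy as np
    from scipy.optimize import minimize_scalar

    H = np.diag([1.0, 0.1])                   # assumed quadratic: L = 1, mu = 0.1
    f = lambda x: 0.5 * x @ H @ x
    grad_f = lambda x: H @ x

    x = np.array([1.0, 1.0])
    for _ in range(2):                        # n = 2 steps, as in the example below
        g = grad_f(x)
        gamma = minimize_scalar(lambda t: f(x - t * g)).x   # exact line search
        x = x - gamma * g
    # theoretical rate over n steps: ((L - mu) / (L + mu)) ** (2 * n) = (0.9 / 1.1) ** 4 ~ 0.448125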
Theoretical guarantee: The tight worst-case guarantee for GD with ELS, obtained in [1, Theorem 1.2], is
\[f(x_n) - f_\star \leqslant \left(\frac{L-\mu}{L+\mu}\right)^{2n} (f(x_0) - f_\star).\]
References: The detailed approach (based on convex relaxations) is available in [1], along with the theoretical bound.
- Parameters:
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_gradient_exact_line_search(L=1, mu=.1, n=2, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 7x7
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 12 scalar constraint(s) ...
        Function 1 : 12 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 1 function(s)
        Function 1 : Adding 4 scalar constraint(s) ...
        Function 1 : 4 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.4481249685889447
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite up to an error of 1.5179284576304284e-08
        All the primal scalar constraints are verified up to an error of 1.8619346286996574e-08
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 1.3895833635451835e-07
(PEPit) Final upper bound (dual): 0.44812496921062095 and lower bound (primal example): 0.4481249685889447
(PEPit) Duality gap: absolute: 6.216762660216091e-10 and relative: 1.3872832571216518e-09
*** Example file: worst-case performance of gradient descent with exact linesearch (ELS) ***
        PEPit guarantee:         f(x_n)-f_* <= 0.448125 (f(x_0)-f_*)
        Theoretical guarantee:   f(x_n)-f_* <= 0.448125 (f(x_0)-f_*)
1.5. Conjugate gradient
- PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient(L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]with
\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]
Theoretical guarantee:
The tight guarantee obtained in [1] is
\[f(x_n) - f_\star \leqslant \frac{L}{2 \theta_n^2}\|x_0-x_\star\|^2,\]where
\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}, \end{eqnarray}and tightness follows from [2, Theorem 3].
References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6].
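The \(\theta\) recursion is straightforward to evaluate; for \(L=1\) and \(n=2\) (the example below), the bound comes out to approximately \(0.0618942\), matching the PEPit value:

    from math import sqrt

    L, n = 1.0, 2
    theta = 1.0                                    # theta_0
    for _ in range(n - 1):                         # t = 1, ..., n-1
        theta = (1 + sqrt(4 * theta ** 2 + 1)) / 2
    theta_n = (1 + sqrt(8 * theta ** 2 + 1)) / 2   # final step uses the 8-coefficient rule
    print(L / (2 * theta_n ** 2))                  # ~0.0618942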
- Parameters:
L (float) – the smoothness parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_conjugate_gradient(L=1, n=2, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 7x7
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 12 scalar constraint(s) ...
        Function 1 : 12 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 1 function(s)
        Function 1 : Adding 6 scalar constraint(s) ...
        Function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.061894196487033516
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite up to an error of 4.09403336792362e-09
        All the primal scalar constraints are verified up to an error of 7.438128096087793e-09
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 8.571121581640089e-08
(PEPit) Final upper bound (dual): 0.06189420236648946 and lower bound (primal example): 0.061894196487033516
(PEPit) Duality gap: absolute: 5.8794559429364845e-09 and relative: 9.49920392644276e-08
*** Example file: worst-case performance of conjugate gradient method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.0618942 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.0618942 ||x_0 - x_*||^2
1.6. Heavy Ball momentum
- PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum(mu, L, alpha, beta, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L, \mu, \alpha, \beta)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \alpha, \beta) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu, \alpha, \beta)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \alpha \nabla f(x_t) + \beta (x_t-x_{t-1})\]with
\[\alpha \in (0, \frac{1}{L}]\]and
\[\beta = \sqrt{(1 - \alpha \mu)(1 - L \alpha)}.\]
Theoretical guarantee:
The upper guarantee obtained in [2, Theorem 4] is
\[f(x_n) - f_\star \leqslant (1 - \alpha \mu)^n (f(x_0) - f_\star).\]
References: This method was first introduced in [1, Section 2], and its convergence upper bound was proven in [2, Theorem 4].
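For the parameters of the example below (\(\mu=0.1\), \(L=1\), \(\alpha=1/(2L)\), \(n=2\)), the parameters and the guaranteed factor evaluate as follows; note that the theoretical value printed in the example, \(0.9025\), is \(0.95^2\):

    from math import sqrt

    mu, L, n = 0.1, 1.0, 2
    alpha = 1 / (2 * L)                               # alpha in (0, 1/L]
    beta = sqrt((1 - alpha * mu) * (1 - L * alpha))   # ~0.6892
    print((1 - alpha * mu) ** n)                      # 0.95 ** 2 = 0.9025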
- Parameters:
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
alpha (float) – parameter of the scheme.
beta (float) – parameter of the scheme such that \(0<\beta<1\) and \(0<\alpha<2(1+\beta)\).
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> from math import sqrt
>>> mu = 0.1
>>> L = 1.
>>> alpha = 1 / (2 * L)  # alpha \in [0, 1 / L]
>>> beta = sqrt((1 - alpha * mu) * (1 - L * alpha))
>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum(mu=mu, L=L, alpha=alpha, beta=beta, n=2, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 5x5
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 12 scalar constraint(s) ...
        Function 1 : 12 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.7534930184723507
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite up to an error of 8.447704542447025e-09
        All the primal scalar constraints are verified up to an error of 2.3525640133886805e-09
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 1.1216875770062036e-07
(PEPit) Final upper bound (dual): 0.7534930169804492 and lower bound (primal example): 0.7534930184723507
(PEPit) Duality gap: absolute: -1.4919014912351258e-09 and relative: -1.979980510316926e-09
*** Example file: worst-case performance of the Heavy-Ball method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.753493 (f(x_0) - f(x_*))
        Theoretical guarantee:   f(x_n)-f_* <= 0.9025 (f(x_0) - f(x_*))
1.7. Accelerated gradient for convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_convex(mu, L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).
This code computes a worst-case guarantee for an accelerated gradient method, a.k.a. fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The accelerated gradient method of this example is provided by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}
Theoretical guarantee: When \(\mu=0\), a tight empirical guarantee can be found in [1, Table 1]:
\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]where tightness is obtained on some Huber loss functions.
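For the example below (\(L=1\), \(n=1\)), the bound evaluates to \(2/12 = 1/6 \approx 0.166667\), matching the PEPit value:

    L, n = 1, 1
    tau = 2 * L / (n ** 2 + 5 * n + 6)   # = 2 / 12
    print(tau)                           # 0.1666...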
References:
- Parameters:
mu (float) – the strong convexity parameter
L (float) – the smoothness parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_convex(mu=0, L=1, n=1, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 4x4
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 6 scalar constraint(s) ...
        Function 1 : 6 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.16666666115099577
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite up to an error of 4.820889929895971e-09
        All the primal scalar constraints are verified up to an error of 3.620050065267222e-09
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 3.1009588480346295e-08
(PEPit) Final upper bound (dual): 0.16666666498584737 and lower bound (primal example): 0.16666666115099577
(PEPit) Duality gap: absolute: 3.834851602935174e-09 and relative: 2.300911037907513e-08
*** Example file: worst-case performance of accelerated gradient method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
1.8. Accelerated gradient for strongly convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_strongly_convex(mu, L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for an accelerated gradient method, a.k.a. fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right),\]is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} y_t & = & x_t + \frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}(x_t - x_{t-1}) \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \end{eqnarray}with \(x_{-1}:= x_0\).
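A minimal sketch of this iteration (the quadratic objective below is an illustrative assumption), together with the rate factor of the guarantee that follows:

    import numpy as np

    mu, L, n = 0.1, 1.0, 2
    H = np.diag([L, mu])                       # assumed L-smooth, mu-strongly convex quadratic
    grad_f = lambda x: H @ x

    momentum = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    x_prev = x = np.array([1.0, 1.0])          # x_{-1} = x_0
    for _ in range(n):
        y = x + momentum * (x - x_prev)
        x_prev, x = x, y - grad_f(y) / L
    print((1 - np.sqrt(mu / L)) ** n)          # rate factor ~0.467544 for these values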
Theoretical guarantee:
The following upper guarantee can be found in [1, Corollary 4.15]:
\[f(x_n)-f_\star \leqslant \left(1 - \sqrt{\frac{\mu}{L}}\right)^n \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right).\]
References:
- Parameters:
mu (float) – the strong convexity parameter
L (float) – the smoothness parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_strongly_convex(mu=0.1, L=1, n=2, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 5x5
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 12 scalar constraint(s) ...
        Function 1 : 12 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.34760222631660587
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite up to an error of 1.9078112028836573e-09
        All the primal scalar constraints are verified up to an error of 1.3491461073322775e-09
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 1.191295818299753e-08
(PEPit) Final upper bound (dual): 0.3476022268314509 and lower bound (primal example): 0.34760222631660587
(PEPit) Duality gap: absolute: 5.148450554770534e-10 and relative: 1.4811327905826417e-09
*** Example file: worst-case performance of the accelerated gradient method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.347602 (f(x_0) - f(x_*) + mu/2*||x_0 - x_*||**2)
        Theoretical guarantee:   f(x_n)-f_* <= 0.467544 (f(x_0) - f(x_*) + mu/2*||x_0 - x_*||**2)
1.9. Optimized gradient
- PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient(L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for optimized gradient method (OGM). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of OGM and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The optimized gradient method is described by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t)\\ y_{t+1} & = & x_{t+1} + \frac{\theta_{t}-1}{\theta_{t+1}}(x_{t+1}-x_t)+\frac{\theta_{t}}{\theta_{t+1}}(x_{t+1}-y_t), \end{eqnarray}with
\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}. \end{eqnarray}
Theoretical guarantee: The tight theoretical guarantee can be found in [2, Theorem 2]:
\[f(x_n)-f_\star \leqslant \frac{L\|x_0-x_\star\|^2}{2\theta_n^2},\]where tightness follows from [3, Theorem 3].
References: The optimized gradient method was developed in [1, 2]; the corresponding lower bound was first obtained in [3].
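The \(\theta\) recursion above can be evaluated directly; for the example below (\(L=3\), \(n=4\)), the bound \(L/(2\theta_n^2)\) comes out to approximately \(0.0767518\), matching both printed values:

    from math import sqrt

    L, n = 3.0, 4
    theta = 1.0                                    # theta_0
    for _ in range(n - 1):                         # t = 1, ..., n-1
        theta = (1 + sqrt(4 * theta ** 2 + 1)) / 2
    theta_n = (1 + sqrt(8 * theta ** 2 + 1)) / 2
    print(L / (2 * theta_n ** 2))                  # ~0.0767518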
- Parameters:
L (float) – the smoothness parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_optimized_gradient(L=3, n=4, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 7x7
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 30 scalar constraint(s) ...
        Function 1 : 30 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.0767518265733206
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite up to an error of 4.2100544930481964e-09
        All the primal scalar constraints are verified up to an error of 4.178030258567e-09
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative up to an error of 2.3578267940913163e-09
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 3.653093053290753e-08
(PEPit) Final upper bound (dual): 0.0767518302587488 and lower bound (primal example): 0.0767518265733206
(PEPit) Duality gap: absolute: 3.6854281987297455e-09 and relative: 4.801746568479483e-08
*** Example file: worst-case performance of optimized gradient method ***
        PEPit guarantee:         f(y_n)-f_* <= 0.0767518 ||x_0 - x_*||^2
        Theoretical guarantee:   f(y_n)-f_* <= 0.0767518 ||x_0 - x_*||^2
1.10. Optimized gradient for gradient
- PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient_for_gradient(L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for optimized gradient method for gradient (OGM-G). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[\|\nabla f(x_n)\|^2 \leqslant \tau(n, L) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of OGM-G and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(\|\nabla f(x_n)\|^2\) when \(f(x_0)-f_\star \leqslant 1\).
Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the optimized gradient method for gradient [1, Section 6.3] is described by
\begin{eqnarray} y_{t+1} & = & x_t - \frac{1}{L} \nabla f(x_t),\\ x_{t+1} & = & y_{t+1} + \frac{(\tilde{\theta}_t-1)(2\tilde{\theta}_{t+1}-1)}{\tilde{\theta}_t(2\tilde{\theta}_t-1)}(y_{t+1}-y_t)+\frac{2\tilde{\theta}_{t+1}-1}{2\tilde{\theta}_t-1}(y_{t+1}-x_t), \end{eqnarray}with
\begin{eqnarray} \tilde{\theta}_n & = & 1 \\ \tilde{\theta}_t & = & \frac{1 + \sqrt{4 \tilde{\theta}_{t+1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \tilde{\theta}_0 & = & \frac{1 + \sqrt{8 \tilde{\theta}_{1}^2 + 1}}{2}. \end{eqnarray}
Theoretical guarantee: The tight worst-case guarantee can be found in [1, Theorem 6.1]:
\[\|\nabla f(x_n)\|^2 \leqslant \frac{2L(f(x_0)-f_\star)}{\tilde{\theta}_0^2},\]where tightness is achieved on Huber losses, see [1, Section 6.4].
References: The optimized gradient method for gradient was developed in [1].
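The backward \(\tilde{\theta}\) recursion can be evaluated the same way; for the example below (\(L=3\), \(n=4\)), the bound \(2L/\tilde{\theta}_0^2\) comes out to approximately \(0.307007\):

    from math import sqrt

    L, n = 3.0, 4
    theta = 1.0                                    # tilde_theta_n
    for _ in range(n - 1):                         # t = n-1, ..., 1
        theta = (1 + sqrt(4 * theta ** 2 + 1)) / 2
    theta_0 = (1 + sqrt(8 * theta ** 2 + 1)) / 2
    print(2 * L / theta_0 ** 2)                    # ~0.307007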
- Parameters:
L (float) – the smoothness parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_optimized_gradient_for_gradient(L=3, n=4, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 7x7
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 30 scalar constraint(s) ...
        Function 1 : 30 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.307007304609862
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite
        All the primal scalar constraints are verified up to an error of 1.851107461758872e-11
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative up to an error of 1.897293067292418e-11
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 3.0436663734975135e-10
(PEPit) Final upper bound (dual): 0.30700730461344283 and lower bound (primal example): 0.307007304609862
(PEPit) Duality gap: absolute: 3.5808578324747486e-12 and relative: 1.1663754505858493e-11
*** Example file: worst-case performance of optimized gradient method for gradient ***
        PEP-it guarantee:        ||f'(x_n)||^2 <= 0.307007 (f(x_0) - f_*)
        Theoretical guarantee:   ||f'(x_n)||^2 <= 0.307007 (f(x_0) - f_*)
1.11. Robust momentum
- PEPit.examples.unconstrained_convex_minimization.wc_robust_momentum(mu, L, lam, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly-convex.
This code computes a worst-case guarantee for the robust momentum method (RMM). That is, it computes the smallest possible \(\tau(n, \mu, L, \lambda)\) such that the guarantee
\[v(x_{n+1}) \leqslant \tau(n, \mu, L, \lambda) v(x_{n}),\]is valid, where \(x_n\) is the \(n^{\mathrm{th}}\) iterate of the RMM, and \(x_\star\) is a minimizer of \(f\). The function \(v(\cdot)\) is a well-chosen Lyapunov function defined as follows,
\begin{eqnarray} v(x_t) & = & l\|z_t - x_\star\|^2 + q_t, \\ q_t & = & (L - \mu) \left(f(x_t) - f_\star - \frac{\mu}{2}\|y_t - x_\star\|^2 - \frac{1}{2}\|\nabla f(y_t) - \mu (y_t - x_\star)\|^2 \right), \end{eqnarray}with \(\kappa = \frac{L}{\mu}\), \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\), and \(l = \mu^2 \frac{\kappa - \kappa \rho^2 - 1}{2 \rho (1 - \rho)}\).
Algorithm:
For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} x_{t+1} & = & x_{t} + \beta (x_t - x_{t-1}) - \alpha \nabla f(y_t), \\ y_{t+1} & = & y_{t} + \gamma (x_t - x_{t-1}), \end{eqnarray}with \(x_{-1}, x_0 \in \mathbb{R}^d\), and with parameters \(\alpha = \frac{\kappa (1 - \rho^2)(1 + \rho)}{L}\), \(\beta = \frac{\kappa \rho^3}{\kappa - 1}\), \(\gamma = \frac{\rho^2}{(\kappa - 1)(1 - \rho)^2(1 + \rho)}\).
Theoretical guarantee:
A convergence guarantee (empirically tight) is obtained in [1, Theorem 1],
\[v(x_{n+1}) \leqslant \rho^2 v(x_n),\]with \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\).
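For the example below (\(\mu=0.1\), \(L=1\), \(\lambda=0.2\), so \(\kappa=10\)), the factor \(\rho^2\) evaluates to approximately \(0.528555\), matching the PEPit value:

    from math import sqrt

    mu, L, lam = 0.1, 1.0, 0.2
    kappa = L / mu
    rho = lam * (1 - 1 / kappa) + (1 - lam) * (1 - 1 / sqrt(kappa))
    print(rho ** 2)                                # ~0.528555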
References:
- Parameters:
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
lam (float) – if \(\lambda=1\), the method reduces to gradient descent; if \(\lambda=0\), it is the Triple Momentum Method.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_robust_momentum(mu=0.1, L=1, lam=0.2, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 5x5
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 6 scalar constraint(s) ...
        Function 1 : 6 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.5285548454743232
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite
        All the primal scalar constraints are verified
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 6.797521527290181e-08
(PEPit) Final upper bound (dual): 0.5285548474610776 and lower bound (primal example): 0.5285548454743232
(PEPit) Duality gap: absolute: 1.9867544276408466e-09 and relative: 3.758842520605294e-09
*** Example file: worst-case performance of the Robust Momentum Method ***
        PEPit guarantee:         v(x_(n+1)) <= 0.528555 v(x_n)
        Theoretical guarantee:   v(x_(n+1)) <= 0.528555 v(x_n)
1.12. Triple momentum
- PEPit.examples.unconstrained_convex_minimization.wc_triple_momentum(mu, L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the triple momentum method (TMM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the TMM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
For \(t \in \{ 1, \dots, n\}\)
\begin{eqnarray} \xi_{t+1} & = & (1 + \beta) \xi_{t} - \beta \xi_{t-1} - \alpha \nabla f(y_t) \\ y_{t} & = & (1+\gamma) \xi_{t} - \gamma \xi_{t-1} \\ x_{t} & = & (1 + \delta) \xi_{t} - \delta \xi_{t-1} \end{eqnarray}with
\begin{eqnarray} \kappa & = & \frac{L}{\mu}, \quad \rho = 1 - \frac{1}{\sqrt{\kappa}} \\ (\alpha, \beta, \gamma, \delta) & = & \left(\frac{1+\rho}{L}, \frac{\rho^2}{2-\rho}, \frac{\rho^2}{(1+\rho)(2-\rho)}, \frac{\rho^2}{1-\rho^2}\right) \end{eqnarray}and
\begin{eqnarray} \xi_{0} & = & x_0 \\ \xi_{1} & = & x_0 \\ y_1 & = & x_0. \end{eqnarray}
Theoretical guarantee: A theoretical upper (empirically tight) bound can be found in [1, Theorem 1, eq. 4]:
\[f(x_n)-f_\star \leqslant \frac{\rho^{2(n+1)} L \kappa}{2}\|x_0 - x_\star\|^2.\]
References: The triple momentum method was discovered and analyzed in [1].
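The parameter tuple is easy to compute from \(\mu\) and \(L\); a short sketch for the values of the example below (\(\mu=0.1\), \(L=1\), so \(\kappa=10\)):

    from math import sqrt

    mu, L = 0.1, 1.0
    kappa = L / mu
    rho = 1 - 1 / sqrt(kappa)                      # ~0.6838
    alpha = (1 + rho) / L
    beta = rho ** 2 / (2 - rho)
    gamma = rho ** 2 / ((1 + rho) * (2 - rho))
    delta = rho ** 2 / (1 - rho ** 2)
    print(alpha, beta, gamma, delta)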
- Parameters:
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_triple_momentum(mu=0.1, L=1., n=4, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 7x7
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 30 scalar constraint(s) ...
        Function 1 : 30 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.23892507617696113
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite up to an error of 1.2421790162612716e-08
        All the primal scalar constraints are verified up to an error of 2.3083153937765444e-08
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative up to an error of 2.128560722969591e-09
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 9.890478893964213e-08
(PEPit) Final upper bound (dual): 0.23892508270020568 and lower bound (primal example): 0.23892507617696113
(PEPit) Duality gap: absolute: 6.523244555634022e-09 and relative: 2.7302469292936613e-08
*** Example file: worst-case performance of the Triple Momentum Method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.238925 ||x_0-x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.238925 ||x_0-x_*||^2
1.13. Information theoretic exact method
- PEPit.examples.unconstrained_convex_minimization.wc_information_theoretic(mu, L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).
This code computes a worst-case guarantee for the information theoretic exact method (ITEM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[\|z_n - x_\star\|^2 \leqslant \tau(n, L, \mu) \|z_0 - x_\star\|^2\]is valid, where \(z_n\) is the output of the ITEM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\|z_n - x_\star\|^2\) when \(\|z_0 - x_\star\|^2 \leqslant 1\).
Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the information theoretic exact method of this example is provided by
\begin{eqnarray} y_{t} & = & (1-\beta_t) z_t + \beta_t x_t \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ z_{t+1} & = & \left(1-q\delta_t\right) z_t+q\delta_t y_t-\frac{\delta_t}{L}\nabla f(y_t), \end{eqnarray}with \(y_{-1}=x_0=z_0\), \(q=\frac{\mu}{L}\) (inverse condition ratio), and the scalar sequences:
\begin{eqnarray} A_{t+1} & = & \frac{(1+q)A_t+2\left(1+\sqrt{(1+A_t)(1+qA_t)}\right)}{(1-q)^2},\\ \beta_{t+1} & = & \frac{A_t}{(1-q)A_{t+1}},\\ \delta_{t+1} & = & \frac{1}{2}\frac{(1-q)^2A_{t+1}-(1+q)A_t}{1+q+q A_t}, \end{eqnarray}with \(A_0=0\).
Theoretical guarantee: A tight worst-case guarantee can be found in [1, Theorem 3]:
\[\|z_n - x_\star\|^2 \leqslant \frac{1}{1+q A_n} \|z_0-x_\star\|^2,\]where tightness is obtained on some quadratic loss functions (see [1, Lemma 2]).
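The scalar sequence \(A_t\) (and hence the bound) can be computed directly from the recursion above; for the example below (\(\mu=0.001\), \(L=1\), \(n=15\)):

    from math import sqrt

    mu, L, n = 0.001, 1.0, 15
    q = mu / L
    A = 0.0                                        # A_0
    for _ in range(n):
        A = ((1 + q) * A + 2 * (1 + sqrt((1 + A) * (1 + q * A)))) / (1 - q) ** 2
    print(1 / (1 + q * A))                         # ~0.756605, the theoretical value below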
References:
- Parameters:
mu (float) – the strong convexity parameter.
L (float) – the smoothness parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_information_theoretic(mu=.001, L=1, n=15, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 17x17
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 240 scalar constraint(s) ...
        Function 1 : 240 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.7566088333863785
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite
        All the primal scalar constraints are verified up to an error of 2.6490565304427935e-09
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 5.0900580550698854e-05
(PEPit) Final upper bound (dual): 0.7566088219545419 and lower bound (primal example): 0.7566088333863785
(PEPit) Duality gap: absolute: -1.1431836588471356e-08 and relative: -1.5109308911059783e-08
*** Example file: worst-case performance of the information theoretic exact method ***
        PEP-it guarantee:        ||z_n - x_* ||^2 <= 0.756609 ||z_0 - x_*||^2
        Theoretical guarantee:   ||z_n - x_* ||^2 <= 0.756605 ||z_0 - x_*||^2
1.14. Cyclic coordinate descent
- PEPit.examples.unconstrained_convex_minimization.wc_cyclic_coordinate_descent(L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth by blocks (with \(d\) blocks) and convex.
This code computes a worst-case guarantee for cyclic coordinate descent with fixed step-sizes \(1/L_i\). That is, it computes the smallest possible \(\tau(n, d, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, d, L) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of cyclic coordinate descent with fixed step-sizes \(1/L_i\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(L\), and \(d\), \(\tau(n, d, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Cyclic coordinate descent is described by
\[x_{t+1} = x_t - \frac{1}{L_{i_t}} \nabla_{i_t} f(x_t),\]where \(L_{i_t}\) is the Lipschitz constant of the block \(i_t\), and where \(i_t\) follows a prescribed ordering.
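A minimal sketch of the scheme (the separable quadratic below is an illustrative assumption, with one coordinate per block):

    import numpy as np

    L = [1.0, 2.0, 10.0]                   # per-block smoothness constants
    H = np.diag(L)                         # assumed objective: f(x) = 0.5 * x^T H x
    x = np.ones(3)
    for t in range(9):                     # n = 9 iterations, as in the example below
        i = t % len(L)                     # cyclic ordering i_t
        x[i] -= (H @ x)[i] / L[i]          # x_{t+1} = x_t - (1/L_i) * grad_i f(x_t)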
References:
- Parameters:
L (list) – list of floats, smoothness parameters (for each block).
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (None) – no theoretical value is provided for this example
Example
>>> L = [1., 2., 10.]
>>> pepit_tau, theoretical_tau = wc_cyclic_coordinate_descent(L=L, n=9, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 34x34
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 330 scalar constraint(s) ...
        Function 1 : 330 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Setting up the problem: 1 partition(s) added
        Partition 1 with 3 blocks: Adding 363 scalar constraint(s)...
        Partition 1 with 3 blocks: 363 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 1.4892758367502887
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite up to an error of 7.745171522956692e-09
        All the primal scalar constraints are verified up to an error of 7.4478840872416185e-09
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative up to an error of 7.835133702255824e-09
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 3.797651214794727e-07
(PEPit) Final upper bound (dual): 1.4892758368167314 and lower bound (primal example): 1.4892758367502887
(PEPit) Duality gap: absolute: 6.644262917632204e-11 and relative: 4.461405169998919e-11
*** Example file: worst-case performance of cyclic coordinate descent with fixed step-sizes ***
        PEPit guarantee:         f(x_n)-f_* <= 1.48928 ||x_0 - x_*||^2
1.15. Proximal point
- PEPit.examples.unconstrained_convex_minimization.wc_proximal_point(gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is closed, proper, and convex (and potentially non-smooth).
This code computes a worst-case guarantee for the proximal point method with step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n,\gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, \gamma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the proximal point method, and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(\gamma\), \(\tau(n,\gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
The proximal point method is described by
\[x_{t+1} = \arg\min_x \left\{f(x)+\frac{1}{2\gamma}\|x-x_t\|^2 \right\},\]where \(\gamma\) is a step-size.
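A minimal sketch of this method on the assumed non-smooth choice \(f(x) = |x|\), whose proximal step is soft-thresholding:

    import numpy as np

    gamma, n = 3.0, 4
    x = 10.0                                       # x_0
    for _ in range(n):
        # prox step: argmin_x { |x| + ||x - x_t||^2 / (2 * gamma) } = soft-threshold by gamma
        x = np.sign(x) * max(abs(x) - gamma, 0.0)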
Theoretical guarantee:
The tight theoretical guarantee can be found in [1, Theorem 4.1]:
\[f(x_n)-f_\star \leqslant \frac{\|x_0-x_\star\|^2}{4\gamma n},\]where tightness is obtained on, e.g., one-dimensional linear problems on the positive orthant.
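For the example below (\(\gamma=3\), \(n=4\)), the bound is \(1/(4 \cdot 3 \cdot 4) = 1/48 \approx 0.0208333\), matching the PEPit value:

    gamma, n = 3, 4
    print(1 / (4 * gamma * n))     # 1/48 ~ 0.0208333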
References:
- Parameters:
gamma (float) – step-size.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_proximal_point(gamma=3, n=4, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 6x6
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        Function 1 : Adding 20 scalar constraint(s) ...
        Function 1 : 20 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.020833335685730252
(PEPit) Primal feasibility check:
        The solver found a Gram matrix that is positive semi-definite up to an error of 3.626659005644299e-09
        All the primal scalar constraints are verified up to an error of 1.1386158081487519e-08
(PEPit) Dual feasibility check:
        The solver found a residual matrix that is positive semi-definite
        All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 3.0464297827437203e-08
(PEPit) Final upper bound (dual): 0.020833337068527292 and lower bound (primal example): 0.020833335685730252
(PEPit) Duality gap: absolute: 1.382797040067052e-09 and relative: 6.637425042856655e-08
*** Example file: worst-case performance of proximal point method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2
1.16. Accelerated proximal point
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_proximal_point(A0, gammas, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex and possibly non-smooth.
This code computes a worst-case guarantee for an accelerated proximal point method, also known as the fast proximal point method (FPP). That is, it computes the smallest possible \(\tau(n, A_0, \vec{\gamma})\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, A_0, \vec{\gamma}) \left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2\right)\]is valid, where \(x_n\) is the output of FPP (with step-size \(\gamma_t\) at step \(t\in \{0, \dots, n-1\}\)), where \(x_\star\) is a minimizer of \(f\), and where \(A_0\) is a positive number.
In short, for given values of \(n\), \(A_0\), and \(\vec{\gamma}\), \(\tau(n, A_0, \vec{\gamma})\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \leqslant 1\), for the following method.
Algorithm: For \(t\in \{0, \dots, n-1\}\):
\begin{eqnarray} y_{t+1} & = & (1-\alpha_{t} ) x_{t} + \alpha_{t} v_t \\ x_{t+1} & = & \arg\min_x \left\{f(x)+\frac{1}{2\gamma_t}\|x-y_{t+1}\|^2 \right\}, \\ v_{t+1} & = & v_t + \frac{1}{\alpha_{t}} (x_{t+1}-y_{t+1}) \end{eqnarray}with
\begin{eqnarray} \alpha_{t} & = & \frac{\sqrt{(A_t \gamma_t)^2 + 4 A_t \gamma_t} - A_t \gamma_t}{2} \\ A_{t+1} & = & (1 - \alpha_{t}) A_t \end{eqnarray}and \(v_0=x_0\).
Theoretical guarantee: A theoretical upper bound can be found in [1, Theorem 2.3]:
\[f(x_n)-f_\star \leqslant \frac{4}{A_0 (\sum_{t=0}^{n-1} \sqrt{\gamma_t})^2}\left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \right).\]References: The accelerated proximal point method was first obtained and analyzed in [1].
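As a sanity check, both the coefficient recursion and the bound of [1, Theorem 2.3] are easy to evaluate numerically; the sketch below reuses the inputs of the example at the end of this section and reproduces its theoretical value 0.0511881.
from math import sqrt

def fpp_coefficients(A0, gammas):
    # The (alpha_t, A_t) recursion defining the method above.
    A, alphas = A0, []
    for gamma in gammas:
        alpha = (sqrt((A * gamma) ** 2 + 4 * A * gamma) - A * gamma) / 2
        alphas.append(alpha)
        A = (1 - alpha) * A  # A_{t+1} = (1 - alpha_t) A_t
    return alphas

def fpp_theoretical_bound(A0, gammas):
    # Bound of [1, Theorem 2.3]: 4 / (A0 * (sum_t sqrt(gamma_t))^2).
    return 4 / (A0 * sum(sqrt(g) for g in gammas) ** 2)

gammas = [(i + 1) / 1.1 for i in range(3)]
print(fpp_coefficients(A0=5, gammas=gammas))
print(fpp_theoretical_bound(A0=5, gammas=gammas))  # ~0.0511881, as in the example below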
- Parameters:
A0 (float) – initial value for parameter A_0.
gammas (list) – sequence of step-sizes.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_point(A0=5, gammas=[(i + 1) / 1.1 for i in range(3)], n=3, wrapper="cvxpy", solver=None, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 6x6 (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) Function 1 : Adding 20 scalar constraint(s) ... Function 1 : 20 scalar constraint(s) added (PEPit) Setting up the problem: additional constraints for 0 function(s) (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.015931148923290624 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite up to an error of 3.713119626105772e-10 All the primal scalar constraints are verified up to an error of 1.4460231649235378e-09 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative up to an error of 2.0490523713620816e-10 (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.9451787884841313e-09 (PEPit) Final upper bound (dual): 0.015931149263944334 and lower bound (primal example): 0.015931148923290624 (PEPit) Duality gap: absolute: 3.4065371010139067e-10 and relative: 2.138287148915985e-08 *** Example file: worst-case performance of fast proximal point method *** PEPit guarantee: f(x_n)-f_* <= 0.0159311 (f(x_0) - f_* + A/2* ||x_0 - x_*||^2) Theoretical guarantee: f(x_n)-f_* <= 0.0511881 (f(x_0) - f_* + A/2* ||x_0 - x_*||^2)
1.17. Inexact gradient descent
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_gradient_descent(L, mu, epsilon, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the inexact gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \varepsilon) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the inexact gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\), \(\mu\) and \(\varepsilon\), \(\tau(n, L, \mu, \varepsilon)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \gamma d_t\]with
\[\|d_t - \nabla f(x_t)\| \leqslant \varepsilon \|\nabla f(x_t)\|\]and
\[\gamma = \frac{2}{L_{\varepsilon} + \mu_{\varepsilon}}\]where \(L_{\varepsilon} = (1 + \varepsilon) L\) and \(\mu_{\varepsilon} = (1 - \varepsilon) \mu\).
Theoretical guarantee:
The tight worst-case guarantee obtained in [1, Theorem 5.3] or [2, Remark 1.6] is
\[f(x_n) - f_\star \leqslant \left(\frac{L_{\varepsilon}-\mu_{\varepsilon}}{L_{\varepsilon}+\mu_{\varepsilon}}\right)^{2n}(f(x_0) - f_\star),\]where tightness is achieved on simple quadratic functions.
References: The detailed analyses can be found in [1, 2].
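For reference, the rate above is straightforward to evaluate; the following snippet reproduces the theoretical value 0.518917 printed in the example below.
def inexact_gradient_rate(L, mu, epsilon, n):
    # Tight rate stated above: ((L_eps - mu_eps) / (L_eps + mu_eps)) ** (2 n).
    L_eps, mu_eps = (1 + epsilon) * L, (1 - epsilon) * mu
    return ((L_eps - mu_eps) / (L_eps + mu_eps)) ** (2 * n)

print(inexact_gradient_rate(L=1, mu=.1, epsilon=.1, n=2))  # ~0.518917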
- Parameters:
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
epsilon (float) – level of inaccuracy.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_gradient_descent(L=1, mu=.1, epsilon=.1, n=2, wrapper="cvxpy", solver=None, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 7x7 (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) Function 1 : Adding 12 scalar constraint(s) ... Function 1 : 12 scalar constraint(s) added (PEPit) Setting up the problem: additional constraints for 1 function(s) Function 1 : Adding 2 scalar constraint(s) ... Function 1 : 2 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.5189167048760179 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite up to an error of 3.328901122905122e-09 All the primal scalar constraints are verified up to an error of 9.223752428511034e-09 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 1.0409575365469605e-07 (PEPit) Final upper bound (dual): 0.5189166992915334 and lower bound (primal example): 0.5189167048760179 (PEPit) Duality gap: absolute: -5.584484541465429e-09 and relative: -1.0761813001953176e-08 *** Example file: worst-case performance of inexact gradient method in distance in function values *** PEPit guarantee: f(x_n)-f_* <= 0.518917 (f(x_0)-f_*) Theoretical guarantee: f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)
1.18. Inexact gradient descent with exact line search
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_gradient_exact_line_search(L, mu, epsilon, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for an inexact gradient method with exact linesearch (ELS). That is, it computes the smallest possible \(\tau(n, L, \mu, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \varepsilon) ( f(x_0) - f_\star )\]is valid, where \(x_n\) is the output of the gradient descent with an inexact descent direction and an exact linesearch, and where \(x_\star\) is the minimizer of \(f\).
The inexact descent direction \(d_t\) is assumed to satisfy a relative inaccuracy described by (with \(0 \leqslant \varepsilon < 1\))
\[\|\nabla f(x_t) - d_t\| \leqslant \varepsilon \|\nabla f(x_t)\|,\]where \(\nabla f(x_t)\) is the true gradient, and \(d_t\) is the approximate descent direction that is used.
Algorithm:
For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} \gamma_t & = & \arg\min_{\gamma \in \mathbb{R}} f(x_t- \gamma d_t), \\ x_{t+1} & = & x_t - \gamma_t d_t. \end{eqnarray}Theoretical guarantee:
The tight guarantee obtained in [1, Theorem 5.1] is
\[f(x_n) - f_\star\leqslant \left(\frac{L_{\varepsilon} - \mu_{\varepsilon}}{L_{\varepsilon} + \mu_{\varepsilon}}\right)^{2n}( f(x_0) - f_\star ),\]with \(L_{\varepsilon} = (1 + \varepsilon) L\) and \(\mu_{\varepsilon} = (1 - \varepsilon) \mu\). Tightness is achieved on simple quadratic functions.
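On a quadratic \(f(x) = \frac{1}{2} x^T Q x\), the exact line search along \(d_t\) has a closed form, which makes the iteration easy to simulate. The sketch below is illustrative only: the quadratic instance and the random perturbation model are assumptions, not the PEPit worst case.
import numpy as np

rng = np.random.default_rng(0)

def inexact_gd_els_step(x, Q, epsilon):
    # One step of inexact gradient descent with exact line search on f = 0.5 x^T Q x.
    g = Q @ x  # true gradient
    noise = rng.standard_normal(x.shape)
    noise *= epsilon * np.linalg.norm(g) / np.linalg.norm(noise)  # ||d - g|| = eps ||g||
    d = g + noise  # inexact descent direction
    gamma = (d @ g) / (d @ Q @ d)  # argmin over gamma of f(x - gamma d)
    return x - gamma * d

Q = np.diag([0.1, 1.0])  # mu = 0.1, L = 1
x = np.array([1.0, 1.0])
for _ in range(2):
    x = inexact_gd_els_step(x, Q, epsilon=0.1)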
References: The detailed approach (based on convex relaxations) is available in [1].
- Parameters:
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
epsilon (float) – level of inaccuracy.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_gradient_exact_line_search(L=1, mu=0.1, epsilon=0.1, n=2, wrapper="cvxpy", solver=None, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 9x9 (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) Function 1 : Adding 12 scalar constraint(s) ... Function 1 : 12 scalar constraint(s) added (PEPit) Setting up the problem: additional constraints for 1 function(s) Function 1 : Adding 6 scalar constraint(s) ... Function 1 : 6 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.5189166579835516 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite up to an error of 7.855519391036226e-09 All the primal scalar constraints are verified up to an error of 2.1749565343176513e-08 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 8.349049574835928e-08 (PEPit) Final upper bound (dual): 0.5189166453796771 and lower bound (primal example): 0.5189166579835516 (PEPit) Duality gap: absolute: -1.260387449963929e-08 and relative: -2.4288822310342565e-08 *** Example file: worst-case performance of inexact gradient descent with exact linesearch *** PEPit guarantee: f(x_n)-f_* <= 0.518917 (f(x_0)-f_*) Theoretical guarantee: f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)
1.19. Inexact accelerated gradient
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_accelerated_gradient(L, epsilon, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for an accelerated gradient method using inexact first-order information. That is, it computes the smallest possible \(\tau(n, L, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \varepsilon) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of inexact accelerated gradient descent and where \(x_\star\) is a minimizer of \(f\).
The inexact descent direction is assumed to satisfy a relative inaccuracy described by (with \(0\leqslant \varepsilon \leqslant 1\))
\[\|\nabla f(y_t) - d_t\| \leqslant \varepsilon \|\nabla f(y_t)\|,\]where \(\nabla f(y_t)\) is the true gradient at \(y_t\) and \(d_t\) is the approximate descent direction that is used.
Algorithm: The inexact accelerated gradient method of this example is described by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} d_t\\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}Theoretical guarantee: When \(\varepsilon=0\), a tight empirical guarantee can be found in [1, Table 1]:
\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]which is achieved on some Huber loss functions (when \(\varepsilon=0\)).
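For reference, the \(\varepsilon = 0\) bound above evaluates as follows, reproducing the value 0.0357143 printed in the example below.
def inexact_agd_bound_eps0(L, n):
    # Empirical tight bound from [1, Table 1], valid for epsilon = 0.
    return 2 * L / (n ** 2 + 5 * n + 6)

print(inexact_agd_bound_eps0(L=1, n=5))  # ~0.0357143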
References:
- Parameters:
L (float) – smoothness parameter.
epsilon (float) – level of inaccuracy
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_accelerated_gradient(L=1, epsilon=0.1, n=5, wrapper="cvxpy", solver=None, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 13x13 (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) Function 1 : Adding 42 scalar constraint(s) ... Function 1 : 42 scalar constraint(s) added (PEPit) Setting up the problem: additional constraints for 1 function(s) Function 1 : Adding 5 scalar constraint(s) ... Function 1 : 5 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.039388047868526704 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite up to an error of 9.664109261839382e-09 All the primal scalar constraints are verified up to an error of 2.7598597017106097e-08 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 1.4089929790920282e-07 (PEPit) Final upper bound (dual): 0.03938804252543366 and lower bound (primal example): 0.039388047868526704 (PEPit) Duality gap: absolute: -5.3430930443965075e-09 and relative: -1.3565264930699812e-07 *** Example file: worst-case performance of inexact accelerated gradient method *** PEPit guarantee: f(x_n)-f_* <= 0.039388 (f(x_0)-f_*) Theoretical guarantee for epsilon = 0 : f(x_n)-f_* <= 0.0357143 (f(x_0)-f_*)
1.20. Epsilon-subgradient method
- PEPit.examples.unconstrained_convex_minimization.wc_epsilon_subgradient_method(M, n, gamma, eps, R, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is closed, proper, and convex (hence possibly non-smooth).
This code computes a worst-case guarantee for the \(\varepsilon\)-subgradient method. That is, it computes the smallest possible \(\tau(n, M, \gamma, \varepsilon, R)\) such that the guarantee
\[\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star \leqslant \tau(n, M, \gamma, \varepsilon, R)\]is valid, where the \(x_t\) are the iterates of the \(\varepsilon\)-subgradient method after \(t \leqslant n\) steps, where \(x_\star\) is a minimizer of \(f\), where \(M\) is an upper bound on the norm of all \(\varepsilon\)-subgradients encountered, and when \(\|x_0-x_\star\| \leqslant R\).
In short, for given values of \(M\), of the accuracy \(\varepsilon\), of the step-size \(\gamma\), of the initial distance \(R\), and of the number of iterations \(n\), \(\tau(n, M, \gamma, \varepsilon, R)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star\).
Algorithm: For \(t\in \{0, \dots, n-1 \}\)
\begin{eqnarray} g_{t} & \in & \partial_{\varepsilon} f(x_t) \\ x_{t+1} & = & x_t - \gamma g_t \end{eqnarray}Theoretical guarantee: An upper bound is obtained in [1, Lemma 2]:
\[\min_{0 \leqslant t \leqslant n} f(x_t)- f(x_\star) \leqslant \frac{R^2+2(n+1)\gamma\varepsilon+(n+1) \gamma^2 M^2}{2(n+1) \gamma}.\]References:
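As a quick numerical check, evaluating the bound above with the inputs of the example at the end of this section reproduces its theoretical value 1.04491.
from math import sqrt

def eps_subgradient_bound(M, n, gamma, eps, R):
    # Upper bound of [1, Lemma 2], as stated above.
    return (R ** 2 + 2 * (n + 1) * gamma * eps + (n + 1) * gamma ** 2 * M ** 2) / (2 * (n + 1) * gamma)

print(eps_subgradient_bound(M=2, n=6, gamma=1 / sqrt(7), eps=.1, R=1))  # ~1.04491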
- Parameters:
M (float) – the bound on norms of epsilon-subgradients.
n (int) – the number of iterations.
gamma (float) – step-size.
eps (float) – the bound on the value of epsilon (inaccuracy).
R (float) – the bound on initial distance to an optimal solution.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> M, n, eps, R = 2, 6, .1, 1 >>> gamma = 1 / sqrt(n + 1) >>> pepit_tau, theoretical_tau = wc_epsilon_subgradient_method(M=M, n=n, gamma=gamma, eps=eps, R=R, wrapper="cvxpy", solver=None, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 21x21 (PEPit) Setting up the problem: performance measure is the minimum of 7 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (14 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) Function 1 : Adding 182 scalar constraint(s) ... Function 1 : 182 scalar constraint(s) added (PEPit) Setting up the problem: additional constraints for 1 function(s) Function 1 : Adding 6 scalar constraint(s) ... Function 1 : 6 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 1.0191560420875132 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite All the primal scalar constraints are verified up to an error of 9.992007221626409e-16 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative up to an error of 8.140035658377668e-09 (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.4537099231463084e-07 (PEPit) Final upper bound (dual): 1.019156044756485 and lower bound (primal example): 1.0191560420875132 (PEPit) Duality gap: absolute: 2.668971710306778e-09 and relative: 2.6188057570065385e-09 *** Example file: worst-case performance of the epsilon-subgradient method *** PEPit guarantee: min_(0 <= t <= n) f(x_i) - f_* <= 1.01916 Theoretical guarantee: min_(0 <= t <= n) f(x_i) - f_* <= 1.04491
1.21. Gradient descent for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_qg_convex(L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) \| x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: When \(\gamma < \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Theorem 2.2]:
\[f(x_n)-f_\star \leqslant \frac{L}{2}\max\left(\frac{1}{2n L \gamma + 1}, L \gamma\right) \|x_0-x_\star\|^2.\]References:
The detailed approach is available in [1, Theorem 2.2].
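As a quick numerical check of this bound, the snippet below reproduces the theoretical value 0.192308 printed in the example at the end of this section.
def qg_convex_gd_bound(L, gamma, n):
    # Bound of [1, Theorem 2.2]: (L / 2) * max(1 / (2 n L gamma + 1), L * gamma).
    return L / 2 * max(1 / (2 * n * L * gamma + 1), L * gamma)

print(qg_convex_gd_bound(L=1, gamma=.2, n=4))  # ~0.192308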
- Parameters:
L (float) – the quadratic growth parameter.
gamma (float) – step-size.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 1 >>> pepit_tau, theoretical_tau = wc_gradient_descent_qg_convex(L=L, gamma=.2 / L, n=4, wrapper="cvxpy", solver=None, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 7x7 (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) Function 1 : Adding 35 scalar constraint(s) ... Function 1 : 35 scalar constraint(s) added (PEPit) Setting up the problem: additional constraints for 0 function(s) (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.19230769057979225 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite All the primal scalar constraints are verified up to an error of 6.487419074163725e-10 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative up to an error of 9.613757534126084e-10 (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 9.03711530175072e-09 (PEPit) Final upper bound (dual): 0.1923076915693468 and lower bound (primal example): 0.19230769057979225 (PEPit) Duality gap: absolute: 9.895545494131852e-10 and relative: 5.145683703182944e-09 *** Example file: worst-case performance of gradient descent with fixed step-sizes *** PEPit guarantee: f(x_n)-f_* <= 0.192308 ||x_0 - x_*||^2 Theoretical guarantee: f(x_n)-f_* <= 0.192308 ||x_0 - x_*||^2
1.22. Gradient descent with decreasing step sizes for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_qg_convex_decreasing(L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for gradient descent with decreasing step-sizes. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \| x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent with decreasing step-sizes, and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Gradient descent with decreasing step sizes is described by
\[x_{t+1} = x_t - \gamma_t \nabla f(x_t)\]with
\[\gamma_t = \frac{1}{L u_{t+1}}\]where the sequence \(u\) is defined by
\begin{eqnarray} u_0 & = & 1 \\ u_{t} & = & \frac{u_{t-1}}{2} + \sqrt{\left(\frac{u_{t-1}}{2}\right)^2 + 2}, \quad \text{for } t \geqslant 1. \end{eqnarray}Theoretical guarantee: The tight theoretical guarantee is conjectured in [1, Conjecture A.3]:
\[f(x_n)-f_\star \leqslant \frac{L}{2 u_n} \|x_0-x_\star\|^2.\]Notes:
One can verify that \(u_t \sim 2\sqrt{t}\), so both the step-sizes and the worst-case function values of the iterates decrease as \(O\left(\frac{1}{\sqrt{t}}\right)\).
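The sequence \(u_t\) (and hence the step-sizes and the conjectured bound) can be computed directly; the sketch below reproduces the value 0.105547 of the example at the end of this section.
from math import sqrt

def u_sequence(n):
    # The sequence u_t defined by the recursion above, with u_0 = 1.
    u = [1.0]
    for _ in range(n):
        u.append(u[-1] / 2 + sqrt((u[-1] / 2) ** 2 + 2))
    return u

L, n = 1, 6
u = u_sequence(n)
gammas = [1 / (L * u[t + 1]) for t in range(n)]  # step-sizes gamma_t = 1 / (L * u_{t+1})
print(L / (2 * u[n]))  # ~0.105547, the conjectured worst-case value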
References:
The detailed approach is available in [1, Appendix A.3].
- Parameters:
L (float) – the quadratic growth parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_gradient_descent_qg_convex_decreasing(L=1, n=6, wrapper="cvxpy", solver=None, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 9x9 (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) Function 1 : Adding 63 scalar constraint(s) ... Function 1 : 63 scalar constraint(s) added (PEPit) Setting up the problem: additional constraints for 0 function(s) (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.10554738683923168 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite up to an error of 1.3052503007966848e-10 All the primal scalar constraints are verified up to an error of 6.537865110400887e-10 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative up to an error of 4.781039101724198e-10 (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 8.076454220612452e-09 (PEPit) Final upper bound (dual): 0.10554738755645543 and lower bound (primal example): 0.10554738683923168 (PEPit) Duality gap: absolute: 7.172237526109626e-10 and relative: 6.7952772123428116e-09 *** Example file: worst-case performance of gradient descent with fixed step-sizes *** PEPit guarantee: f(x_n)-f_* <= 0.105547 ||x_0 - x_*||^2 Theoretical conjecture: f(x_n)-f_* <= 0.105547 ||x_0 - x_*||^2
1.23. Conjugate gradient for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient_qg_convex(L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [2]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]with
\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]Theoretical guarantee:
The tight guarantee obtained in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper) is
\[f(x_n) - f_\star \leqslant \frac{L}{2 (n + 1)} \|x_0-x_\star\|^2.\]References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6], and the result is provided in [2, Theorem 2.4].
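For intuition, on a quadratic \(f(x) = \frac{1}{2} x^T Q x\) the exact span search reduces to a small linear system in the coefficients \(\gamma_i\); the sketch below is illustrative only (the quadratic instance is an assumption, not the \(\text{QG}^+\) worst case).
import numpy as np

def cg_span_search_step(x_t, grad_hist, Q):
    # Minimize f(x_t - G c) over c for f = 0.5 x^T Q x, where the columns of G
    # are the observed gradients: the optimality condition is G^T Q G c = G^T Q x_t.
    G = np.stack(grad_hist, axis=1)
    c = np.linalg.solve(G.T @ Q @ G, G.T @ (Q @ x_t))
    return x_t - G @ c

Q = np.diag([0.05, 0.4, 1.0])
x, grads = np.array([1.0, 1.0, 1.0]), []
for _ in range(3):
    grads.append(Q @ x)
    x = cg_span_search_step(x, grads, Q)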
- Parameters:
L (float) – the quadratic growth parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_conjugate_gradient_qg_convex(L=1, n=12, wrapper="cvxpy", solver=None, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 27x27 (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) Function 1 : Adding 195 scalar constraint(s) ... Function 1 : 195 scalar constraint(s) added (PEPit) Setting up the problem: additional constraints for 1 function(s) Function 1 : Adding 156 scalar constraint(s) ... Function 1 : 156 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.038461537777152104 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite up to an error of 9.102635033370141e-10 All the primal scalar constraints are verified up to an error of 6.985771551504261e-09 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative up to an error of 6.487940056145134e-09 (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.3524476901692115e-07 (PEPit) Final upper bound (dual): 0.038461545907145095 and lower bound (primal example): 0.038461537777152104 (PEPit) Duality gap: absolute: 8.129992991323665e-09 and relative: 2.113798215357174e-07 *** Example file: worst-case performance of conjugate gradient method *** PEPit guarantee: f(x_n)-f_* <= 0.0384615 ||x_0 - x_*||^2 Theoretical guarantee: f(x_n)-f_* <= 0.0384615 ||x_0 - x_*||^2
1.24. Heavy Ball momentum for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum_qg_convex(L, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [2]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
This method is described in [1]:
\[x_{t+1} = x_t - \alpha_t \nabla f(x_t) + \beta_t (x_t-x_{t-1})\]with
\[\alpha_t = \frac{1}{L} \frac{1}{t+2}\]and
\[\beta_t = \frac{t}{t+2}\]Theoretical guarantee:
The tight guarantee obtained in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper) is
\[f(x_n) - f_\star \leqslant \frac{L}{2}\frac{1}{n+1} \|x_0 - x_\star\|^2.\]References: This method was first introduced in [1, Section 3], and the tight convergence bound was proven in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper).
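A direct simulation of the iteration above takes only a few lines; the sketch below runs it on a one-dimensional quadratic (an arbitrary convex \(\text{QG}^+\) instance, chosen only for illustration).
import numpy as np

def heavy_ball_qg_convex(x0, grad, L, n):
    # Heavy-ball iteration with alpha_t = 1 / (L (t + 2)) and beta_t = t / (t + 2).
    x_prev = x = np.asarray(x0, dtype=float)
    for t in range(n):
        alpha, beta = 1 / (L * (t + 2)), t / (t + 2)
        x, x_prev = x - alpha * grad(x) + beta * (x - x_prev), x
    return x

print(heavy_ball_qg_convex([1.0], grad=lambda x: x, L=1, n=5))  # f(x) = x^2 / 2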
- Parameters:
L (float) – the quadratic growth parameter.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum_qg_convex(L=1, n=5, wrapper="cvxpy", solver=None, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 8x8 (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) Function 1 : Adding 48 scalar constraint(s) ... Function 1 : 48 scalar constraint(s) added (PEPit) Setting up the problem: additional constraints for 0 function(s) (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.08333333312648067 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite All the primal scalar constraints are verified up to an error of 1.0360445487633818e-10 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative up to an error of 9.384283773139436e-11 (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.3121924686696765e-09 (PEPit) Final upper bound (dual): 0.08333333333812117 and lower bound (primal example): 0.08333333312648067 (PEPit) Duality gap: absolute: 2.1164049679445185e-10 and relative: 2.539685967837512e-09 *** Example file: worst-case performance of the Heavy-Ball method *** PEPit guarantee: f(x_n)-f_* <= 0.0833333 ||x_0 - x_*||^2 Theoretical guarantee: f(x_n)-f_* <= 0.0833333 ||x_0 - x_*||^2
1.25. Gradient descent for smooth strongly convex quadratic objective
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_quadratics(mu, L, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f=\frac{1}{2} x^T Q x\) is \(L\)-smooth and \(\mu\)-strongly convex (i.e. \(\mu I \preceq Q \preceq LI\)).
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, \mu, L, \gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, \mu, L, \gamma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(\mu\), \(L\), and \(\gamma\), \(\tau(n, \mu, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee:
When \(\gamma \leqslant \frac{2}{L}\) and \(0 \leqslant \mu \leqslant L\), the tight theoretical guarantee is conjectured in [1, Equation (4.17)]:
\[f(x_n)-f_\star \leqslant \frac{L}{2} \max\left\{\alpha(1-\alpha L\gamma)^{2n}, (1-L\gamma)^{2n} \right\} \|x_0-x_\star\|^2,\]
where \(\alpha = \mathrm{proj}_{[\frac{\mu}{L},1]} \left(\frac{1}{L\gamma (2n+1)}\right)\).
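As a numerical check, evaluating this conjectured bound with the inputs of the example below reproduces its theoretical value 0.0649574 (the projection onto \([\frac{\mu}{L}, 1]\) is a simple clipping).
def gd_quadratics_bound(mu, L, gamma, n):
    # Conjectured tight bound of [1, Equation (4.17)], as stated above.
    alpha = min(max(1 / (L * gamma * (2 * n + 1)), mu / L), 1)  # proj onto [mu/L, 1]
    return L / 2 * max(alpha * (1 - alpha * L * gamma) ** (2 * n), (1 - L * gamma) ** (2 * n))

print(gd_quadratics_bound(mu=0.3, L=3., gamma=1 / 3., n=4))  # ~0.0649574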
References:
- Parameters:
mu (float) – the strong convexity parameter.
L (float) – the smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
wrapper (str) – the name of the wrapper to be used.
solver (str) – the name of the solver the wrapper should use.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + solver details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 3. >>> mu = 0.3 >>> pepit_tau, theoretical_tau = wc_gradient_descent_quadratics(mu=mu, L=L, gamma=1 / L, n=4, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 7x7 (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) Function 1 : Adding 21 scalar constraint(s) ... Function 1 : 21 scalar constraint(s) added Function 1 : Adding 1 lmi constraint(s) ... Size of PSD matrix 1: 6x6 Function 1 : 1 lmi constraint(s) added (PEPit) Setting up the problem: additional constraints for 0 function(s) (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.0649573836866313 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite up to an error of 6.909122375079227e-10 All required PSD matrices are indeed positive semi-definite up to an error of 8.456976505538092e-09 All the primal scalar constraints are verified up to an error of 8.495850689627105e-10 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual matrices to lmi are positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.5432361579460934 (PEPit) Final upper bound (dual): 0.06495738812047232 and lower bound (primal example): 0.0649573836866313 (PEPit) Duality gap: absolute: 4.4338410165600806e-09 and relative: 6.825769088776581e-08 *** Example file: worst-case performance of gradient descent on quadratics with fixed step-sizes *** PEPit guarantee: f(x_n)-f_* <= 0.0649574 ||x_0 - x_*||^2 Theoretical guarantee: f(x_n)-f_* <= 0.0649574 ||x_0 - x_*||^2
1.26. Gradient descent for smooth strongly convex objective with linear mapping
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_lc(mug, Lg, typeM, muM, LM, gamma, n, verbose=1)[source]
Consider the convex minimization problem
\[g_\star \triangleq \min_x g(Mx),\]where \(g\) is an \(L_g\)-smooth and \(\mu_g\)-strongly convex function, and \(M\) is a general, symmetric, or skew-symmetric matrix with \(\mu_M \leqslant \|M\| \leqslant L_M\).
Note
For general and skew-symmetric matrices, \(\mu_M\) must be set to 0.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\) on a function \(F\) defined as the composition of a function \(g\) and a linear operator \(M\), i.e. \(F(x) = g(Mx)\). That is, it computes the smallest possible \(\tau(n, \mu_g, L_g, \mu_M, L_M, \gamma)\) such that the guarantee
\[g(Mx_n) - g_\star \leqslant \tau(n, \mu_g, L_g, \mu_M, L_M, \gamma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent run on \(F\) with fixed step-size \(\gamma\), where \(x_\star\) is a minimizer of \(F(x) = g(Mx)\), and where \(g_\star = g(M x_\star)\).
In short, for given values of \(n\), \(\mu_g\), \(L_g\), \(\mu_M\), \(L_M\), and \(\gamma\), \(\tau(n, \mu_g, L_g, \mu_M, L_M, \gamma)\) is computed as the worst-case value of \(g(Mx_n)-g_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Gradient descent on such a function is described by
\[x_{t+1} = x_t - \gamma M^T \nabla g(Mx_t),\]where \(\gamma\) is a step-size.
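For intuition, one such composed gradient step is shown below on a toy instance; the symmetric matrix \(M\) and the quadratic \(g\) are arbitrary assumptions, chosen only for illustration.
import numpy as np

def gd_lc_step(x, M, grad_g, gamma):
    # Gradient step on F(x) = g(Mx); by the chain rule, grad F(x) = M^T grad_g(Mx).
    return x - gamma * (M.T @ grad_g(M @ x))

M = np.array([[0.8, 0.2], [0.2, 0.5]])  # symmetric, ||M|| <= 1
grad_g = lambda y: np.diag([3.0, 0.3]) @ y  # g(y) = 0.5 * y^T diag(3, 0.3) y
x = np.array([1.0, -1.0])
for _ in range(3):
    x = gd_lc_step(x, M, grad_g, gamma=1 / 3.)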
Theoretical guarantee: When \(\gamma \leqslant \frac{2}{L}\), \(0 \leqslant \mu_g \leqslant L_g\), and \(0 \leqslant \mu_M \leqslant L_M\), the following tight theoretical guarantee is conjectured in [1, Conjecture 4.2], and the associated lower theoretical guarantee is stated in [1, Conjecture 4.3]:
\[g(Mx_n)-g_\star \leqslant \frac{L}{2} \max\left\{\frac{\kappa_g {M^*}^2}{\kappa_g -1 + (1-\kappa_g {M^*}^2 L \gamma )^{-2n}}, (1-L\gamma)^{2n} \right\} \|x_0-x_\star\|^2,\]where \(L = L_g L_M^2\), \(\kappa_g = \frac{\mu_g}{L_g}\), \(\kappa_M = \frac{\mu_M}{L_M}\), and \(M^* = \mathrm{proj}_{[\kappa_M,1]} \left(\sqrt{\frac{h_0}{L\gamma}}\right)\), where \(h_0\) is the solution of
\[(1-\kappa_g)(1-\kappa_g h_0)^{2n+1} = 1 - (2n+1)\kappa_g h_0.\]References:
- Parameters:
mug (float) – the strong convexity parameter of \(g(y)\).
Lg (float) – the smoothness parameter of \(g(y)\).
typeM (string) – type of matrix \(M\) (“gen”, “sym” or “skew”).
muM (float) – lower bound on \(\|M\|\) (if typeM != “sym”, then muM must be set to zero).
LM (float) – upper bound on \(\|M\|\).
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns:
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> Lg = 3.; mug = 0.3 >>> typeM = "sym"; LM = 1.; muM = 0. >>> L = Lg*LM**2 >>> pepit_tau, theoretical_tau = wc_gradient_descent_lc(mug = mug, Lg=Lg, typeM=typeM, muM = muM, LM=LM, gamma=1 / L, n=3, verbose=1) (PEPit) Setting up the problem: size of the Gram matrix: 16x16 (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 3 function(s) Function 1 : Adding 20 scalar constraint(s) ... Function 1 : 20 scalar constraint(s) added Function 2 : Adding 20 scalar constraint(s) ... Function 2 : 20 scalar constraint(s) added Function 2 : Adding 2 lmi constraint(s) ... Size of PSD matrix 1: 5x5 Size of PSD matrix 2: 4x4 Function 2 : 2 lmi constraint(s) added Function 3 : Adding 0 scalar constraint(s) ... Function 3 : 0 scalar constraint(s) added (PEPit) Setting up the problem: additional constraints for 0 function(s) (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.16380641240039548 (PEPit) Primal feasibility check: The solver found a Gram matrix that is positive semi-definite up to an error of 8.261931885644884e-09 All required PSD matrices are indeed positive semi-definite All the primal scalar constraints are verified up to an error of 1.7219835096598865e-08 (PEPit) Dual feasibility check: The solver found a residual matrix that is positive semi-definite All the dual matrices to lmi are positive semi-definite All the dual scalar values associated with inequality constraints are nonnegative (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 5.282963298309745e-07 (PEPit) Final upper bound (dual): 0.1638061803039686 and lower bound (primal example): 0.16380641240039548 (PEPit) Duality gap: absolute: -2.320964268831549e-07 and relative: -1.4168946348439444e-06 *** Example file: worst-case performance of gradient descent on g(Mx) with fixed step-sizes *** PEPit guarantee: f(x_n)-f_* <= 0.163806 ||x_0 - x_*||^2 Theoretical guarantee: f(x_n)-f_* <= 0.16379 ||x_0 - x_*||^2