1. Unconstrained convex minimization
1.1. Gradient descent
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent(L, gamma, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
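For concreteness, here is a minimal NumPy sketch of this update (independent of PEPit; grad_f is an assumed callable returning \(\nabla f(x)\)):

import numpy as np

def gradient_descent(grad_f, x0, gamma, n):
    # x_{t+1} = x_t - gamma * grad f(x_t), for t = 0, ..., n-1
    x = np.array(x0, dtype=float)
    for _ in range(n):
        x = x - gamma * grad_f(x)
    return x

# Illustration on the smooth convex quadratic f(x) = 0.5 * ||x||^2 (L = 1)
x_n = gradient_descent(grad_f=lambda x: x, x0=[1.0, -2.0], gamma=1.0, n=4)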
Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Theorem 3.1]:
\[f(x_n)-f_\star \leqslant \frac{L}{4nL\gamma+2} \|x_0-x_\star\|^2,\]which is tight on some Huber loss functions.
References:
- Parameters
L (float) – the smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 3
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: MOSEK); optimal value: 0.16666666497937685
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEPit guarantee:         f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
1.2. Subgradient method
- PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method(M, n, gamma, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex and \(M\)-Lipschitz. This problem is a (possibly non-smooth) minimization problem.
This code computes a worst-case guarantee for the subgradient method. That is, it computes the smallest possible \(\tau(n, M, \gamma)\) such that the guarantee
\[\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star \leqslant \tau(n, M, \gamma) \|x_0 - x_\star\|\]is valid, where \(x_t\) is the output of the subgradient method after \(t\leqslant n\) steps, and where \(x_\star\) is the minimizer of \(f\).
In short, for given values of \(M\), the step-size \(\gamma\) and the number of iterations \(n\), \(\tau(n, M, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star\) when \(\|x_0-x_\star\| \leqslant 1\).
Algorithm: For \(t\in \{0, \dots, n-1 \}\)
\begin{eqnarray} g_{t} & \in & \partial f(x_t) \\ x_{t+1} & = & x_t - \gamma g_t \end{eqnarray}Theoretical guarantee: The tight bound is obtained in [1, Section 3.2.3] and [2, Eq (2)]
\[\min_{0 \leqslant t \leqslant n} f(x_t)- f(x_\star) \leqslant \frac{M}{\sqrt{n+1}}\|x_0-x_\star\|,\]and tightness follows from the lower complexity bound for this class of problems, e.g., [3, Appendix A].
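A minimal sketch of the scheme above (not part of PEPit; f and subgrad_f are assumed callables returning the objective value and one subgradient). Since the guarantee is on the best iterate, the sketch tracks the minimum observed value:

import numpy as np

def subgradient_method(f, subgrad_f, x0, gamma, n):
    x = np.array(x0, dtype=float)
    best_value = f(x)
    for _ in range(n):
        x = x - gamma * subgrad_f(x)        # x_{t+1} = x_t - gamma * g_t, with g_t a subgradient at x_t
        best_value = min(best_value, f(x))  # the bound controls min_t f(x_t) - f_*
    return best_value

# Illustration: f(x) = |x| is 1-Lipschitz and sign(x) is a subgradient
best = subgradient_method(f=abs, subgrad_f=np.sign, x0=0.9, gamma=1 / np.sqrt(7), n=6)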
References: Classical references on this topic include [1, 2].
[2] S. Boyd, L. Xiao, A. Mutapcic (2003). Subgradient Methods (lecture notes).
- Parameters
M (float) – the Lipschitz parameter.
n (int) – the number of iterations.
gamma (float) – step-size.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> from math import sqrt
>>> M = 2
>>> n = 6
>>> gamma = 1 / (M * sqrt(n + 1))
>>> pepit_tau, theoretical_tau = wc_subgradient_method(M=M, n=n, gamma=gamma, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 7 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 64 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.7559825331741553
*** Example file: worst-case performance of subgradient method ***
    PEPit guarantee:         min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755983 ||x_0 - x_*||
    Theoretical guarantee:   min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755929 ||x_0 - x_*||
1.3. Subgradient method under restricted secant inequality and error bound
- PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method_rsi_eb(mu, L, gamma, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) verifies the “lower” restricted secant inequality (\(\mu-\text{RSI}^-\)) and the “upper” error bound (\(L-\text{EB}^+\)) [1].
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, \mu, L, \gamma)\) such that the guarantee
\[\| x_n - x_\star \|^2 \leqslant \tau(n, \mu, L, \gamma) \| x_0 - x_\star \|^2\]is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(\mu\), \(L\), and \(\gamma\), \(\tau(n, \mu, L, \gamma)\) is computed as the worst-case value of \(\| x_n - x_\star \|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Sub-gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: The tight theoretical guarantee can be found in [1, Prop 1] (upper bound) and [1, Theorem 2] (lower bound):
\[\| x_n - x_\star \|^2 \leqslant (1 - 2\gamma\mu + L^2 \gamma^2)^n \|x_0-x_\star\|^2.\]References:
Definition and convergence guarantees can be found in [1].
- Parameters
mu (float) – the rsi parameter
L (float) – the eb parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> mu = .1
>>> L = 1
>>> pepit_tau, theoretical_tau = wc_subgradient_method_under_restricted_secant_inequality_and_error_bound(mu=mu, L=L, gamma=mu / L**2, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 8 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9605893213566064
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEPit guarantee:         f(x_n)-f_* <= 0.960589 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.960596 ||x_0 - x_*||^2
1.4. Gradient descent with exact line search
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_exact_line_search(L, mu, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the gradient descent (GD) with exact linesearch (ELS). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the GD with ELS, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm: GD with ELS can be written as
\[x_{t+1} = x_t - \gamma_t \nabla f(x_t)\]with \(\gamma_t = \arg\min_{\gamma} f \left( x_t - \gamma \nabla f(x_t) \right)\).
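The one-dimensional argmin defining \(\gamma_t\) can be emulated numerically; a small sketch (not PEPit code), assuming f and grad_f are callables and that scipy.optimize.minimize_scalar approximates the exact line search well enough:

import numpy as np
from scipy.optimize import minimize_scalar

def gradient_descent_els(f, grad_f, x0, n):
    x = np.array(x0, dtype=float)
    for _ in range(n):
        d = grad_f(x)
        # gamma_t = argmin_gamma f(x_t - gamma * d), solved numerically here
        gamma = minimize_scalar(lambda g: f(x - g * d)).x
        x = x - gamma * d
    return x

# Illustration on f(x) = 0.5 * x' diag([0.1, 1]) x, which is 0.1-strongly convex and 1-smooth
H = np.diag([0.1, 1.0])
x_n = gradient_descent_els(lambda x: 0.5 * x @ H @ x, lambda x: H @ x, x0=[1.0, 1.0], n=2)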
Theoretical guarantee: The tight worst-case guarantee for GD with ELS, obtained in [1, Theorem 1.2], is
\[f(x_n) - f_\star \leqslant \left(\frac{L-\mu}{L+\mu}\right)^{2n} (f(x_0) - f_\star).\]References: The detailed approach (based on convex relaxations) is available in [1], along with the theoretical bound.
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_gradient_exact_line_search(L=1, mu=.1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 16 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.44812204883466417
*** Example file: worst-case performance of gradient descent with exact linesearch (ELS) ***
    PEPit guarantee:         f(x_n)-f_* <= 0.448122 (f(x_0)-f_*)
    Theoretical guarantee:   f(x_n)-f_* <= 0.448125 (f(x_0)-f_*)
1.5. Conjugate gradient
- PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient(L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]with
\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]Theoretical guarantee:
The tight guarantee obtained in [1] is
\[f(x_n) - f_\star \leqslant\frac{L}{2 \theta_n^2}\|x_0-x_\star\|^2.\]where
\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}, \end{eqnarray}and tightness follows from [2, Theorem 3].
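The sequence \(\theta_t\) and the resulting bound \(\frac{L}{2\theta_n^2}\) can be evaluated directly from this recursion; a short sketch (plain Python, not part of PEPit):

from math import sqrt

def cg_theoretical_bound(L, n):
    theta = 1.0                                      # theta_0 = 1
    for _ in range(1, n):
        theta = (1 + sqrt(4 * theta ** 2 + 1)) / 2   # intermediate recursion
    theta_n = (1 + sqrt(8 * theta ** 2 + 1)) / 2     # last step uses the "8" variant
    return L / (2 * theta_n ** 2)

print(cg_theoretical_bound(L=1, n=2))  # ~0.0619, consistent with the example below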
References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6].
- Parameters
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_conjugate_gradient(L=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 18 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.061893515427809735
*** Example file: worst-case performance of conjugate gradient method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.0618935 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.0618942 ||x_0 - x_*||^2
1.6. Heavy Ball momentum
- PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum(mu, L, alpha, beta, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L, \mu, \alpha, \beta)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \alpha, \beta) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\), \(\mu\), \(\alpha\) and \(\beta\), \(\tau(n, L, \mu, \alpha, \beta)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \alpha \nabla f(x_t) + \beta (x_t-x_{t-1})\]with
\[\alpha \in (0, \frac{1}{L}]\]and
\[\beta = \sqrt{(1 - \alpha \mu)(1 - L \alpha)}\]Theoretical guarantee:
The upper guarantee obtained in [2, Theorem 4] is
\[f(x_n) - f_\star \leqslant (1 - \alpha \mu)^n (f(x_0) - f_\star).\]References: This method was first introduced in [1, Section 2], and the convergence upper bound was proven in [2, Theorem 4].
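A minimal NumPy sketch of the Heavy-ball update with the step-size and momentum choices above (not PEPit code; grad_f is an assumed callable):

import numpy as np
from math import sqrt

def heavy_ball(grad_f, x0, mu, L, n, alpha=None):
    alpha = 1 / (2 * L) if alpha is None else alpha            # any alpha in (0, 1/L]
    beta = sqrt((1 - alpha * mu) * (1 - L * alpha))
    x_prev = x = np.array(x0, dtype=float)
    for _ in range(n):
        x, x_prev = x - alpha * grad_f(x) + beta * (x - x_prev), x
    return x

# Illustration on a quadratic with mu = 0.1 and L = 1
H = np.diag([0.1, 1.0])
x_n = heavy_ball(lambda x: H @ x, x0=[1.0, 1.0], mu=0.1, L=1.0, n=2)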
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
alpha (float) – parameter of the scheme.
beta (float) – parameter of the scheme such that \(0<\beta<1\) and \(0<\alpha<2(1+\beta)\).
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> from math import sqrt
>>> mu = 0.1
>>> L = 1.
>>> alpha = 1 / (2 * L)  # alpha \in [0, 1 / L]
>>> beta = sqrt((1 - alpha * mu) * (1 - L * alpha))
>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum(mu=mu, L=L, alpha=alpha, beta=beta, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 12 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8145062493468549
*** Example file: worst-case performance of the Heavy-Ball method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.753492 (f(x_0) - f(x_*))
    Theoretical guarantee:   f(x_n)-f_* <= 0.9025 (f(x_0) - f(x_*))
1.7. Accelerated gradient for convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_convex(mu, L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).
This code computes a worst-case guarantee for an accelerated gradient method, a.k.a. fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The accelerated gradient method of this example is provided by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}Theoretical guarantee: When \(\mu=0\), a tight empirical guarantee can be found in [1, Table 1]:
\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]where tightness is obtained on some Huber loss functions.
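A minimal NumPy sketch of the accelerated (fast) gradient iteration above, using the momentum coefficient \(\frac{t-1}{t+2}\) exactly as written (not PEPit code; grad_f is an assumed callable):

import numpy as np

def accelerated_gradient(grad_f, x0, L, n):
    x = y = np.array(x0, dtype=float)
    for t in range(n):
        x_next = y - (1 / L) * grad_f(y)
        y = x_next + (t - 1) / (t + 2) * (x_next - x)   # momentum coefficient of the scheme above
        x = x_next
    return x

# Illustration on a smooth convex quadratic with L = 1
x_n = accelerated_gradient(lambda x: x, x0=[1.0, -1.0], L=1.0, n=5)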
References:
- Parameters
mu (float) – the strong convexity parameter
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_convex(mu=0, L=1, n=1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 6 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.16666666668209376
*** Example file: worst-case performance of accelerated gradient method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
1.8. Accelerated gradient for strongly convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_strongly_convex(mu, L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for an accelerated gradient method, a.k.a fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right),\]is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} y_t & = & x_t + \frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}(x_t - x_{t-1}) \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \end{eqnarray}with \(x_{-1}:= x_0\).
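A minimal NumPy sketch of this strongly convex variant, with constant momentum \(\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}\) and \(x_{-1}=x_0\) (not PEPit code; grad_f is an assumed callable):

import numpy as np
from math import sqrt

def accelerated_gradient_strongly_convex(grad_f, x0, mu, L, n):
    momentum = (sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu))
    x_prev = x = np.array(x0, dtype=float)
    for _ in range(n):
        y = x + momentum * (x - x_prev)
        x_prev, x = x, y - (1 / L) * grad_f(y)
    return x

# Illustration on a quadratic with mu = 0.1 and L = 1
H = np.diag([0.1, 1.0])
x_n = accelerated_gradient_strongly_convex(lambda x: H @ x, x0=[1.0, 1.0], mu=0.1, L=1.0, n=2)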
Theoretical guarantee:
The following upper guarantee can be found in [1, Corollary 4.15]:
\[f(x_n)-f_\star \leqslant \left(1 - \sqrt{\frac{\mu}{L}}\right)^n \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right).\]References:
- Parameters
mu (float) – the strong convexity parameter
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_strongly_convex(mu=0.1, L=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 12 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.34758587217463155
*** Example file: worst-case performance of the accelerated gradient method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.347586 (f(x_0) - f(x_*) + mu/2*|| x_0 - x_* ||**2)
    Theoretical guarantee:   f(x_n)-f_* <= 0.467544 (f(x_0) - f(x_*) + mu/2*|| x_0 - x_* ||**2)
1.9. Optimized gradient
- PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient(L, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for optimized gradient method (OGM). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of OGM and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The optimized gradient method is described by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t)\\ y_{t+1} & = & x_{t+1} + \frac{\theta_{t}-1}{\theta_{t+1}}(x_{t+1}-x_t)+\frac{\theta_{t}}{\theta_{t+1}}(x_{t+1}-y_t), \end{eqnarray}with
\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}. \end{eqnarray}Theoretical guarantee: The tight theoretical guarantee can be found in [2, Theorem 2]:
\[f(x_n)-f_\star \leqslant \frac{L\|x_0-x_\star\|^2}{2\theta_n^2},\]where tightness follows from [3, Theorem 3].
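A minimal NumPy sketch of the OGM iteration, generating the \(\theta_t\) sequence defined above (not PEPit code; grad_f is an assumed callable):

import numpy as np
from math import sqrt

def optimized_gradient_method(grad_f, x0, L, n):
    theta = [1.0]                                             # theta_0 = 1
    for _ in range(1, n):
        theta.append((1 + sqrt(4 * theta[-1] ** 2 + 1)) / 2)
    theta.append((1 + sqrt(8 * theta[-1] ** 2 + 1)) / 2)      # theta_n

    x = y = np.array(x0, dtype=float)
    for t in range(n):
        x_next = y - (1 / L) * grad_f(y)
        y = x_next + (theta[t] - 1) / theta[t + 1] * (x_next - x) + theta[t] / theta[t + 1] * (x_next - y)
        x = x_next
    return y

y_n = optimized_gradient_method(lambda x: 3 * x, x0=[1.0, -1.0], L=3.0, n=4)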
References: The optimized gradient method was developed in [1, 2]; the corresponding lower bound was first obtained in [3].
- Parameters
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_optimized_gradient(L=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: MOSEK); optimal value: 0.07675182659831646
*** Example file: worst-case performance of optimized gradient method ***
    PEPit guarantee:         f(y_n)-f_* <= 0.0767518 || x_0 - x_* ||^2
    Theoretical guarantee:   f(y_n)-f_* <= 0.0767518 || x_0 - x_* ||^2
1.10. Optimized gradient for gradient
- PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient_for_gradient(L, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for optimized gradient method for gradient (OGM-G). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[\|\nabla f(x_n)\|^2 \leqslant \tau(n, L) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of OGM-G and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(\|\nabla f(x_n)\|^2\) when \(f(x_0)-f_\star \leqslant 1\).
Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the optimized gradient method for gradient [1, Section 6.3] is described by
\begin{eqnarray} y_{t+1} & = & x_t - \frac{1}{L} \nabla f(x_t),\\ x_{t+1} & = & y_{t+1} + \frac{(\tilde{\theta}_t-1)(2\tilde{\theta}_{t+1}-1)}{\tilde{\theta}_t(2\tilde{\theta}_t-1)}(y_{t+1}-y_t)+\frac{2\tilde{\theta}_{t+1}-1}{2\tilde{\theta}_t-1}(y_{t+1}-x_t), \end{eqnarray}with
\begin{eqnarray} \tilde{\theta}_n & = & 1 \\ \tilde{\theta}_t & = & \frac{1 + \sqrt{4 \tilde{\theta}_{t+1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \tilde{\theta}_0 & = & \frac{1 + \sqrt{8 \tilde{\theta}_{1}^2 + 1}}{2}. \end{eqnarray}Theoretical guarantee: The tight worst-case guarantee can be found in [1, Theorem 6.1]:
\[\|\nabla f(x_n)\|^2 \leqslant \frac{2L(f(x_0)-f_\star)}{\tilde{\theta}_0^2},\]where tightness is achieved on Huber losses, see [1, Section 6.4].
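The \(\tilde{\theta}_t\) sequence runs backward (from \(t=n\) down to \(t=0\)); a short sketch computing it together with the bound \(\frac{2L}{\tilde{\theta}_0^2}\) (plain Python, not part of PEPit):

from math import sqrt

def ogm_g_bound(L, n):
    tilde_theta = 1.0                                          # tilde_theta_n = 1
    for _ in range(n - 1, 0, -1):
        tilde_theta = (1 + sqrt(4 * tilde_theta ** 2 + 1)) / 2
    tilde_theta_0 = (1 + sqrt(8 * tilde_theta ** 2 + 1)) / 2   # first coefficient uses the "8" variant
    return 2 * L / tilde_theta_0 ** 2

print(ogm_g_bound(L=3, n=4))  # ~0.307, consistent with the example below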
References: The optimized gradient method for gradient was developed in [1].
- Parameters
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_optimized_gradient_for_gradient(L=3, n=4, verbose=1)
(PEP-it) Setting up the problem: size of the main PSD matrix: 7x7
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 30 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: MOSEK); optimal value: 0.30700730460985826
*** Example file: worst-case performance of optimized gradient method for gradient ***
    PEP-it guarantee:        ||f'(x_n)|| ^ 2 <= 0.307007 (f(x_0) - f_*)
    Theoretical guarantee:   ||f'(x_n)|| ^ 2 <= 0.307007 (f(x_0) - f_*)
1.11. Robust momentum
- PEPit.examples.unconstrained_convex_minimization.wc_robust_momentum(mu, L, lam, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly-convex.
This code computes a worst-case guarantee for the robust momentum method (RMM). That is, it computes the smallest possible \(\tau(n, \mu, L, \lambda)\) such that the guarantee
\[v(x_{n+1}) \leqslant \tau(n, \mu, L, \lambda) v(x_{n}),\]is valid, where \(x_n\) is the \(n^{\mathrm{th}}\) iterate of the RMM, and \(x_\star\) is a minimizer of \(f\). The function \(v(.)\) is a well-chosen Lyapunov defined as follows,
\begin{eqnarray} v(x_t) & = & l\|z_t - x_\star\|^2 + q_t, \\ q_t & = & (L - \mu) \left(f(x_t) - f_\star - \frac{\mu}{2}\|y_t - x_\star\|^2 - \frac{1}{2}\|\nabla f(y_t) - \mu (y_t - x_\star)\|^2 \right), \end{eqnarray}with \(\kappa = \frac{L}{\mu}\), \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\), and \(l = \mu^2 \frac{\kappa - \kappa \rho^2 - 1}{2 \rho (1 - \rho)}\).
Algorithm:
For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} x_{t+1} & = & x_{t} + \beta (x_t - x_{t-1}) - \alpha \nabla f(y_t), \\ y_{t+1} & = & y_{t} + \gamma (x_t - x_{t-1}), \end{eqnarray}with \(x_{-1}, x_0 \in \mathbb{R}^d\), and with parameters \(\alpha = \frac{\kappa (1 - \rho^2)(1 + \rho)}{L}\), \(\beta = \frac{\kappa \rho^3}{\kappa - 1}\), \(\gamma = \frac{\rho^2}{(\kappa - 1)(1 - \rho)^2(1 + \rho)}\).
Theoretical guarantee:
A convergence guarantee (empirically tight) is obtained in [1, Theorem 1],
\[v(x_{n+1}) \leqslant \rho^2 v(x_n),\]with \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\).
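The rate is governed by \(\rho\), which interpolates between the gradient-descent rate (\(\lambda=1\)) and the Triple Momentum rate (\(\lambda=0\)); a small helper evaluating \(\rho^2\) from the formula above (plain Python, not part of PEPit):

from math import sqrt

def robust_momentum_rate(mu, L, lam):
    kappa = L / mu                  # condition ratio
    rho = lam * (1 - 1 / kappa) + (1 - lam) * (1 - 1 / sqrt(kappa))
    return rho ** 2                 # contraction factor of the Lyapunov function v

print(robust_momentum_rate(mu=0.1, L=1.0, lam=0.2))  # ~0.5286, consistent with the example below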
References:
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
lam (float) – interpolation parameter; if \(\lambda=1\), the method reduces to gradient descent, and if \(\lambda=0\), it is the Triple Momentum Method.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Examples
>>> pepit_tau, theoretical_tau = wc_robust_momentum(0.1, 1, 0.2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 6 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5285548355257013
*** Example file: worst-case performance of the Robust Momentum Method ***
    PEPit guarantee:         v(x_(n+1)) <= 0.528555 v(x_n)
    Theoretical guarantee:   v(x_(n+1)) <= 0.528555 v(x_n)
1.12. Triple momentum
- PEPit.examples.unconstrained_convex_minimization.wc_triple_momentum(mu, L, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for triple momentum method (TMM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the TMM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
For \(t \in \{ 1, \dots, n\}\)
\begin{eqnarray} \xi_{t+1} & = & (1 + \beta) \xi_{t} - \beta \xi_{t-1} - \alpha \nabla f(y_t) \\ y_{t} & = & (1+\gamma) \xi_{t} - \gamma \xi_{t-1} \\ x_{t} & = & (1 + \delta) \xi_{t} - \delta \xi_{t-1} \end{eqnarray}with
\begin{eqnarray} \kappa & = & \frac{L}{\mu}, \quad \rho = 1 - \frac{1}{\sqrt{\kappa}} \\ (\alpha, \beta, \gamma, \delta) & = & \left(\frac{1+\rho}{L}, \frac{\rho^2}{2-\rho}, \frac{\rho^2}{(1+\rho)(2-\rho)}, \frac{\rho^2}{1-\rho^2}\right) \end{eqnarray}and
\begin{eqnarray} \xi_{0} & = & x_0 \\ \xi_{1} & = & x_0 \\ y & = & x_0 \end{eqnarray}Theoretical guarantee: A theoretical upper (empirically tight) bound can be found in [1, Theorem 1, eq. 4]:
\[f(x_n)-f_\star \leqslant \frac{\rho^{2(n+1)} L \kappa}{2}\|x_0 - x_\star\|^2.\]References: The triple momentum method was discovered and analyzed in [1].
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_triple_momentum(mu=0.1, L=1., n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.23893532450841679
*** Example file: worst-case performance of the Triple Momentum Method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.238935 ||x_0-x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.238925 ||x_0-x_*||^2
1.13. Information theoretic exact method
- PEPit.examples.unconstrained_convex_minimization.wc_information_theoretic(mu, L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).
This code computes a worst-case guarantee for the information theoretic exact method (ITEM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[\|z_n - x_\star\|^2 \leqslant \tau(n, L, \mu) \|z_0 - x_\star\|^2\]is valid, where \(z_n\) is the output of the ITEM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\|z_n - x_\star\|^2\) when \(\|z_0 - x_\star\|^2 \leqslant 1\).
Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the information theoretic exact method of this example is provided by
\begin{eqnarray} y_{t} & = & (1-\beta_t) z_t + \beta_t x_t \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ z_{t+1} & = & \left(1-q\delta_t\right) z_t+q\delta_t y_t-\frac{\delta_t}{L}\nabla f(y_t), \end{eqnarray}with \(y_{-1}=x_0=z_0\), \(q=\frac{\mu}{L}\) (inverse condition ratio), and the scalar sequences:
\begin{eqnarray} A_{t+1} & = & \frac{(1+q)A_t+2\left(1+\sqrt{(1+A_t)(1+qA_t)}\right)}{(1-q)^2},\\ \beta_{t+1} & = & \frac{A_t}{(1-q)A_{t+1}},\\ \delta_{t+1} & = & \frac{1}{2}\frac{(1-q)^2A_{t+1}-(1+q)A_t}{1+q+q A_t}, \end{eqnarray}with \(A_0=0\).
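The scalar sequence \(A_t\) can be generated directly from this recursion, which also yields the worst-case ratio \(\frac{1}{1+qA_n}\) appearing in the guarantee below; a short sketch (plain Python, not part of PEPit):

from math import sqrt

def item_bound(mu, L, n):
    q = mu / L                   # inverse condition ratio
    A = 0.0                      # A_0 = 0
    for _ in range(n):
        A = ((1 + q) * A + 2 * (1 + sqrt((1 + A) * (1 + q * A)))) / (1 - q) ** 2
    return 1 / (1 + q * A)       # worst-case ratio ||z_n - x_*||^2 / ||z_0 - x_*||^2

print(item_bound(mu=.001, L=1, n=15))  # ~0.7566, consistent with the example below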
Theoretical guarantee: A tight worst-case guarantee can be found in [1, Theorem 3]:
\[\|z_n - x_\star\|^2 \leqslant \frac{1}{1+q A_n} \|z_0-x_\star\|^2,\]where tightness is obtained on some quadratic loss functions (see [1, Lemma 2]).
References:
- Parameters
mu (float) – the strong convexity parameter.
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_information_theoretic(mu=.001, L=1, n=15, verbose=1)
(PEP-it) Setting up the problem: size of the main PSD matrix: 17x17
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 240 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: MOSEK); optimal value: 0.7566088333863754
*** Example file: worst-case performance of the information theoretic exact method ***
    PEP-it guarantee:        ||z_n - x_* ||^2 <= 0.756609 ||z_0 - x_*||^2
    Theoretical guarantee:   ||z_n - x_* ||^2 <= 0.756605 ||z_0 - x_*||^2
1.14. Proximal point
- PEPit.examples.unconstrained_convex_minimization.wc_proximal_point(gamma, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is closed, proper, and convex (and potentially non-smooth).
This code computes a worst-case guarantee for the proximal point method with step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n,\gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, \gamma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the proximal point method, and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(\gamma\), \(\tau(n,\gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
The proximal point method is described by
\[x_{t+1} = \arg\min_x \left\{f(x)+\frac{1}{2\gamma}\|x-x_t\|^2 \right\},\]where \(\gamma\) is a step-size.
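Each iteration solves a proximal subproblem. As an illustration (not PEPit code), here is a sketch for a function whose proximal operator is available in closed form, \(f(x)=|x|\), whose prox is soft-thresholding:

import numpy as np

def prox_abs(x, gamma):
    # argmin_u |u| + ||u - x||^2 / (2 * gamma)  (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def proximal_point(prox_f, x0, gamma, n):
    x = np.array(x0, dtype=float)
    for _ in range(n):
        x = prox_f(x, gamma)     # x_{t+1} = prox_{gamma f}(x_t)
    return x

x_n = proximal_point(prox_abs, x0=10.0, gamma=3.0, n=4)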
Theoretical guarantee:
The tight theoretical guarantee can be found in [1, Theorem 4.1]:
\[f(x_n)-f_\star \leqslant \frac{\|x_0-x_\star\|^2}{4\gamma n},\]where tightness is obtained on, e.g., one-dimensional linear problems on the positive orthant.
References:
- Parameters
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_proximal_point(gamma=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 20 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: MOSEK); optimal value: 0.020833335685727362
*** Example file: worst-case performance of proximal point method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2
1.15. Accelerated proximal point
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_proximal_point(A0, gammas, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex and possibly non-smooth.
This code computes a worst-case guarantee for an accelerated proximal point method, a.k.a. fast proximal point method (FPP). That is, it computes the smallest possible \(\tau(n, A_0, \vec{\gamma})\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, A_0, \vec{\gamma}) \left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2\right)\]is valid, where \(x_n\) is the output of FPP (with step-size \(\gamma_t\) at step \(t\in \{0, \dots, n-1\}\)) and where \(x_\star\) is a minimizer of \(f\) and \(A_0\) is a positive number.
In short, for given values of \(n\), \(A_0\) and \(\vec{\gamma}\), \(\tau(n, A_0, \vec{\gamma})\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \leqslant 1\), for the following method.
Algorithm: For \(t\in \{0, \dots, n-1\}\):
\begin{eqnarray} y_{t+1} & = & (1-\alpha_{t} ) x_{t} + \alpha_{t} v_t \\ x_{t+1} & = & \arg\min_x \left\{f(x)+\frac{1}{2\gamma_t}\|x-y_{t+1}\|^2 \right\}, \\ v_{t+1} & = & v_t + \frac{1}{\alpha_{t}} (x_{t+1}-y_{t+1}) \end{eqnarray}with
\begin{eqnarray} \alpha_{t} & = & \frac{\sqrt{(A_t \gamma_t)^2 + 4 A_t \gamma_t} - A_t \gamma_t}{2} \\ A_{t+1} & = & (1 - \alpha_{t}) A_t \end{eqnarray}and \(v_0=x_0\).
Theoretical guarantee: A theoretical upper bound can be found in [1, Theorem 2.3.]:
\[f(x_n)-f_\star \leqslant \frac{4}{A_0 (\sum_{t=0}^{n-1} \sqrt{\gamma_t})^2}\left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \right).\]References: The accelerated proximal point was first obtained and analyzed in [1].
- Parameters
A0 (float) – initial value for parameter A_0.
gammas (list) – sequence of step-sizes.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_point(A0=5, gammas=[(i + 1) / 1.1 for i in range(3)], n=3, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 20 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.01593113594082973
*** Example file: worst-case performance of optimized gradient method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.0159311 (f(x_0) - f_* + A0/2* ||x_0 - x_*||^2)
    Theoretical guarantee:   f(x_n)-f_* <= 0.0511881 (f(x_0) - f_* + A0/2* ||x_0 - x_*||^2)
1.16. Inexact gradient descent
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_gradient_descent(L, mu, epsilon, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the inexact gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \varepsilon) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the inexact gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\), \(\mu\) and \(\varepsilon\), \(\tau(n, L, \mu, \varepsilon)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \gamma d_t\]with
\[\|d_t - \nabla f(x_t)\| \leqslant \varepsilon \|\nabla f(x_t)\|\]and
\[\gamma = \frac{2}{L_{\varepsilon} + \mu_{\varepsilon}}\]where \(L_{\varepsilon} = (1 + \varepsilon) L\) and \(\mu_{\varepsilon} = (1 - \varepsilon) \mu\).
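A minimal NumPy sketch of this scheme (not PEPit code): the inexact direction \(d_t\) is modeled here by adding a random perturbation of relative norm at most \(\varepsilon\), and the step-size is \(\frac{2}{L_{\varepsilon}+\mu_{\varepsilon}}\) as above; grad_f is an assumed callable.

import numpy as np

def inexact_gradient_descent(grad_f, x0, L, mu, epsilon, n, seed=0):
    rng = np.random.default_rng(seed)
    gamma = 2 / ((1 + epsilon) * L + (1 - epsilon) * mu)
    x = np.array(x0, dtype=float)
    for _ in range(n):
        g = grad_f(x)
        noise = rng.standard_normal(g.shape)
        noise *= epsilon * np.linalg.norm(g) / max(np.linalg.norm(noise), 1e-16)
        d = g + noise            # ||d_t - grad f(x_t)|| <= epsilon * ||grad f(x_t)||
        x = x - gamma * d
    return x

H = np.diag([0.1, 1.0])  # mu = 0.1, L = 1
x_n = inexact_gradient_descent(lambda x: H @ x, x0=[1.0, 1.0], L=1.0, mu=0.1, epsilon=0.1, n=2)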
Theoretical guarantee:
The tight worst-case guarantee obtained in [1, Theorem 5.3] or [2, Remark 1.6] is
\[f(x_n) - f_\star \leqslant \left(\frac{L_{\varepsilon}-\mu_{\varepsilon}}{L_{\varepsilon}+\mu_{\varepsilon}}\right)^{2n}(f(x_0) - f_\star),\]where tightness is achieved on simple quadratic functions.
References: The detailed analyses can be found in [1, 2].
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
epsilon (float) – level of inaccuracy.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_gradient_descent(L=1, mu=.1, epsilon=.1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 8x8
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 15 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5188661397616067
*** Example file: worst-case performance of inexact gradient method in distance in function values ***
    PEPit guarantee:         f(x_n)-f_* <= 0.518866 (f(x_0)-f_*)
    Theoretical guarantee:   f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)
1.17. Inexact gradient descent with exact line search
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_gradient_exact_line_search(L, mu, epsilon, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for an inexact gradient method with exact linesearch (ELS). That is, it computes the smallest possible \(\tau(n, L, \mu, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \varepsilon) ( f(x_0) - f_\star )\]is valid, where \(x_n\) is the output of the gradient descent with an inexact descent direction and an exact linesearch, and where \(x_\star\) is the minimizer of \(f\).
The inexact descent direction \(d\) is assumed to satisfy a relative inaccuracy described by (with \(0 \leqslant \varepsilon < 1\))
\[\|\nabla f(x_t) - d_t\| \leqslant \varepsilon \|\nabla f(x_t)\|,\]where \(\nabla f(x_t)\) is the true gradient, and \(d_t\) is the approximate descent direction that is used.
Algorithm:
For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} \gamma_t & = & \arg\min_{\gamma \in \mathbb{R}} f(x_t - \gamma d_t), \\ x_{t+1} & = & x_t - \gamma_t d_t. \end{eqnarray}Theoretical guarantee:
The tight guarantee obtained in [1, Theorem 5.1] is
\[f(x_n) - f_\star\leqslant \left(\frac{L_{\varepsilon} - \mu_{\varepsilon}}{L_{\varepsilon} + \mu_{\varepsilon}}\right)^{2n}( f(x_0) - f_\star ),\]with \(L_{\varepsilon} = (1 + \varepsilon) L\) and \(\mu_{\varepsilon} = (1 - \varepsilon) \mu\). Tightness is achieved on simple quadratic functions.
References: The detailed approach (based on convex relaxations) is available in [1],
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
epsilon (float) – level of inaccuracy.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_gradient_exact_line_search(1, 0.1, 0.1, 1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 18 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5186658287501191
*** Example file: worst-case performance of inexact gradient descent with exact linesearch ***
    PEPit guarantee:         f(x_n)-f_* <= 0.518666 (f(x_0)-f_*)
    Theoretical guarantee:   f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)
1.18. Inexact accelerated gradient
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_accelerated_gradient(L, epsilon, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for an accelerated gradient method using inexact first-order information. That is, it computes the smallest possible \(\tau(n, L, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \varepsilon) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of inexact accelerated gradient descent and where \(x_\star\) is a minimizer of \(f\).
The inexact descent direction is assumed to satisfy a relative inaccuracy described by (with \(0\leqslant \varepsilon \leqslant 1\))
\[\|\nabla f(y_t) - d_t\| \leqslant \varepsilon \|\nabla f(y_t)\|,\]where \(\nabla f(y_t)\) is the true gradient at \(y_t\) and \(d_t\) is the approximate descent direction that is used.
Algorithm: The inexact accelerated gradient method of this example is provided by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} d_t\\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}Theoretical guarantee: When \(\varepsilon=0\), a tight empirical guarantee can be found in [1, Table 1]:
\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]which is achieved on some Huber loss functions (when \(\varepsilon=0\)).
References:
- Parameters
L (float) – smoothness parameter.
epsilon (float) – level of inaccuracy
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_accelerated_gradient(L=1, epsilon=0.1, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 13x13
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 47 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.03944038534724904
*** Example file: worst-case performance of inexact accelerated gradient method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.0394404 (f(x_0)-f_*)
    Theoretical guarantee for epsilon = 0 :  f(x_n)-f_* <= 0.0357143 (f(x_0)-f_*)
1.19. Gradient descent for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_qg_convex(L, gamma, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) \| x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(||x_0 - x_\star||^2 \leqslant 1\).
Algorithm: Gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: When \(\gamma < \frac{1}{L}\), the lower theoretical guarantee can be found in [1, Theorem 2.2]:
\[f(x_n)-f_\star \leqslant \frac{L}{2}\max\left(\frac{1}{2n L \gamma + 1}, L \gamma\right) \|x_0-x_\star\|^2.\]References:
The detailed approach is available in [1, Theorem 2.2].
- Parameters
L (float) – the quadratic growth parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_qg_convex(L=L, gamma=.2 / L, n=4, verbose=1)
(PEP-it) Setting up the problem: size of the main PSD matrix: 7x7
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 35 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: SCS); optimal value: 0.19230811671886025
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEP-it guarantee:        f(x_n)-f_* <= 0.192308 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.192308 ||x_0 - x_*||^2
1.20. Gradient descent with decreasing step sizes for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_qg_convex_decreasing(L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for gradient descent with decreasing step-sizes. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \| x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent with decreasing step-sizes, and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(||x_0 - x_\star||^2 \leqslant 1\).
Algorithm: Gradient descent with decreasing step sizes is described by
\[x_{t+1} = x_t - \gamma_t \nabla f(x_t)\]with
\[\gamma_t = \frac{1}{L u_{t+1}}\]where the sequence \(u\) is defined by
\begin{eqnarray} u_0 & = & 1 \\ u_{t} & = & \frac{u_{t-1}}{2} + \sqrt{\left(\frac{u_{t-1}}{2}\right)^2 + 2}, \quad \mathrm{for } t \geq 1 \end{eqnarray}Theoretical guarantee: The tight theoretical guarantee is conjectured in [1, Conjecture A.3]:
\[f(x_n)-f_\star \leqslant \frac{L}{2 u_n} \|x_0-x_\star\|^2.\]Notes:
We verify that \(u_t \sim 2\sqrt{t}\). The step sizes as well as the function values of the iterates decrease as \(O\left( \frac{1}{\sqrt{t}} \right)\).
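The sequence \(u_t\), the induced step-sizes \(\gamma_t = \frac{1}{L u_{t+1}}\), and the conjectured bound \(\frac{L}{2u_n}\) can be generated as follows (plain Python, not part of PEPit):

from math import sqrt

def qg_convex_decreasing_steps(L, n):
    u = [1.0]                                              # u_0 = 1
    for _ in range(n):
        u.append(u[-1] / 2 + sqrt((u[-1] / 2) ** 2 + 2))
    gammas = [1 / (L * u[t + 1]) for t in range(n)]        # gamma_t = 1 / (L * u_{t+1})
    conjectured_bound = L / (2 * u[n])
    return gammas, conjectured_bound

gammas, bound = qg_convex_decreasing_steps(L=1, n=6)
print(bound)  # ~0.10555, consistent with the example below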
References:
The detailed approach is available in [1, Appendix A.3].
- Parameters
L (float) – the quadratic growth parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_gradient_descent_qg_convex_decreasing(L=1, n=6, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 63 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.10554312873105381
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEP-it guarantee:        f(x_n)-f_* <= 0.105543 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.105547 ||x_0 - x_*||^2
1.21. Conjugate gradient for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient_qg_convex(L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [2]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]with
\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]Theoretical guarantee:
The tight guarantee obtained in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper) is
\[f(x_n) - f_\star \leqslant \frac{L}{2 (n + 1)} \|x_0-x_\star\|^2.\]References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6], and the result provided in [2, Theorem 2.4].
- Parameters
L (float) – the quadratic growth parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_conjugate_gradient_qg_convex(L=1, n=12, verbose=1)
(PEP-it) Setting up the problem: size of the main PSD matrix: 27x27
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 351 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: SCS); optimal value: 0.038461130525391705
*** Example file: worst-case performance of conjugate gradient method ***
    PEP-it guarantee:        f(x_n)-f_* <= 0.0384611 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.0384615 ||x_0 - x_*||^2
1.22. Heavy Ball momentum for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum_qg_convex(L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [2]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
This method is described in [1]
\[x_{t+1} = x_t - \alpha_t \nabla f(x_t) + \beta_t (x_t-x_{t-1})\]with
\[\alpha_t = \frac{1}{L} \frac{1}{t+2}\]and
\[\beta_t = \frac{t}{t+2}\]Theoretical guarantee:
The tight guarantee obtained in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper) is
\[f(x_n) - f_\star \leqslant \frac{L}{2}\frac{1}{n+1} \|x_0 - x_\star\|^2.\]References: This method was first introduced in [1, Section 3], and the tight convergence bound was proven in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper).
- Parameters
L (float) – the quadratic growth parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum_qg_convex(L=1, n=5, verbose=1)
(PEP-it) Setting up the problem: size of the main PSD matrix: 9x9
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : 63 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: SCS); optimal value: 0.08333167067320212
*** Example file: worst-case performance of the Heavy-Ball method ***
    PEP-it guarantee:        f(x_n)-f_* <= 0.0833317 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.0833333 ||x_0 - x_*||^2