1. Unconstrained convex minimization
1.1. Gradient descent
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent(L, gamma, n, verbose=True)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) || x_0 - x_\star ||^2\]is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(||x_0 - x_\star||^2 \leqslant 1\).
Algorithm: Gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
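For illustration only (this helper is not part of PEPit), a minimal NumPy sketch of the recursion, where grad_f stands for a user-supplied gradient oracle:

import numpy as np

def gradient_descent(grad_f, x0, gamma, n):
    # x_{t+1} = x_t - gamma * grad f(x_t), repeated n times
    x = np.asarray(x0, dtype=float)
    for _ in range(n):
        x = x - gamma * grad_f(x)
    return x

# e.g., on the L-smooth convex function f(x) = (L/2) ||x||^2, whose gradient is L * x
L = 3.0
x_n = gradient_descent(lambda x: L * x, x0=np.ones(2), gamma=1 / L, n=4)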
Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Theorem 1]:
\[f(x_n)-f_\star \leqslant \frac{L||x_0-x_\star||^2}{4nL\gamma+2},\]which is tight on some Huber loss functions.
References:
- Parameters
L (float) – the smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 3
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: MOSEK); optimal value: 0.16666666497937685
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEPit guarantee: f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
    Theoretical guarantee: f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
1.2. Subgradient method
- PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method(M, n, gamma, verbose=True)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex and \(M\)-Lipschitz (in particular, possibly non-smooth).
This code computes a worst-case guarantee for the subgradient method. That is, it computes the smallest possible \(\tau(n, M, \gamma)\) such that the guarantee
\[\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star \leqslant \tau(n, M, \gamma) \|x_0 - x_\star\|\]is valid, where \(x_t\) is the output of the subgradient method after \(t\leqslant n\) steps, and where \(x_\star\) is the minimizer of \(f\).
In short, for given values of \(M\), the step-size \(\gamma\) and the number of iterations \(n\), \(\tau(n, M, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star\) when \(\|x_0-x_\star\| \leqslant 1\).
Algorithm: For \(t\in \{0, \dots, n-1 \}\)
\begin{eqnarray} g_{t} & \in & \partial f(x_t) \\ x_{t+1} & = & x_t - \gamma g_t \end{eqnarray}Theoretical guarantee: The tight bound is obtained in [1, Section 3.2.3] and [2, Eq (2)]
\[\min_{0 \leqslant t \leqslant n} f(x_t)- f(x_\star) \leqslant \frac{M}{\sqrt{n+1}}\|x_0-x_\star\|,\]and tightness follows from the lower complexity bound for this class of problems, e.g., [3, Appendix A].
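As an illustration (not part of PEPit), a minimal NumPy sketch of the scheme; f and subgrad_f stand for user-supplied oracles, and the routine returns the best observed value, which is the quantity bounded above:

import numpy as np

def subgradient_method(f, subgrad_f, x0, gamma, n):
    # x_{t+1} = x_t - gamma * g_t with g_t in the subdifferential of f at x_t;
    # returns min_{0 <= t <= n} f(x_t)
    x = np.asarray(x0, dtype=float)
    best = f(x)
    for _ in range(n):
        x = x - gamma * subgrad_f(x)
        best = min(best, f(x))
    return best

# e.g., on the M-Lipschitz function f(x) = M * ||x||_2
M, n = 2.0, 6
f = lambda x: M * np.linalg.norm(x)
subgrad_f = lambda x: M * x / np.linalg.norm(x) if np.linalg.norm(x) > 0 else np.zeros_like(x)
best_value = subgradient_method(f, subgrad_f, x0=np.ones(2) / np.sqrt(2), gamma=1 / (M * np.sqrt(n + 1)), n=n)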
References: Classical references on this topic include [1, 2].
[2] S. Boyd, L. Xiao, A. Mutapcic (2003). Subgradient Methods (lecture notes).
- Parameters
M (float) – the Lipschitz parameter.
n (int) – the number of iterations.
gamma (float) – step-size.
verbose (bool, optional) – if True, print conclusion.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> M = 2
>>> n = 6
>>> gamma = 1 / (M * sqrt(n + 1))
>>> pepit_tau, theoretical_tau = wc_subgradient_method(M=M, n=n, gamma=gamma, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 7 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 64 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.7559825331741553
*** Example file: worst-case performance of subgradient method ***
    PEPit guarantee: min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755983 ||x_0 - x_*||
    Theoretical guarantee: min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755929 ||x_0 - x_*||
1.3. Gradient descent with exact line search
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_exact_line_search(L, mu, n, verbose=True)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the gradient descent (GD) with exact linesearch (ELS). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the GD with ELS, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm: GD with ELS can be written as
\[x_{t+1} = x_t - \gamma_t \nabla f(x_t)\]with \(\gamma_t = \arg\min_{\gamma} f \left( x_t - \gamma \nabla f(x_t) \right)\).
Theoretical guarantee: The tight worst-case guarantee for GD with ELS, obtained in [1, Theorem 1.2], is
\[f(x_n) - f_\star \leqslant \left(\frac{L-\mu}{L+\mu}\right)^{2n} (f(x_0) - f_\star).\]References: The detailed approach (based on convex relaxations) is available in [1], along with the theoretical bound.
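For illustration (not part of PEPit), a sketch of this scheme with a numerically exact line search; it assumes SciPy is available and that f and grad_f are user-supplied oracles:

import numpy as np
from scipy.optimize import minimize_scalar

def gradient_descent_els(f, grad_f, x0, n):
    # at each step, move along -grad f(x_t) with the (numerically) optimal step-size
    x = np.asarray(x0, dtype=float)
    for _ in range(n):
        d = grad_f(x)
        gamma = minimize_scalar(lambda g: f(x - g * d)).x  # exact line search
        x = x - gamma * d
    return x

# e.g., on a strongly convex quadratic f(x) = 0.5 * x' H x with spectrum in [mu, L]
mu, L = 0.1, 1.0
H = np.diag([mu, L])
x_n = gradient_descent_els(lambda x: 0.5 * x @ H @ x, lambda x: H @ x, x0=np.array([1.0, 1.0]), n=2)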
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of iterations.
verbose (bool, optional) – if True, print conclusion.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_gradient_exact_line_search(L=1, mu=.1, n=2, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 16 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.44812204883466417
*** Example file: worst-case performance of gradient descent with exact linesearch (ELS) ***
    PEPit guarantee: f(x_n)-f_* <= 0.448122 (f(x_0)-f_*)
    Theoretical guarantee: f(x_n)-f_* <= 0.448125 (f(x_0)-f_*)
1.4. Conjugate gradient
- PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient(L, n, verbose=True)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]with
\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]Theoretical guarantee:
The tight guarantee obtained in [1] is
\[f(x_n) - f_\star \leqslant \frac{L}{2 \theta_n^2}\|x_0-x_\star\|^2,\]where
\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}, \end{eqnarray}and tightness follows from [2, Theorem 3].
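The \(\theta\)-sequence and the resulting bound are easy to evaluate numerically; the following small helper (not part of PEPit) does exactly that:

import numpy as np

def cg_theoretical_bound(L, n):
    # theta_0 = 1; theta_t = (1 + sqrt(4 theta_{t-1}^2 + 1)) / 2 for t = 1, ..., n-1;
    # theta_n = (1 + sqrt(8 theta_{n-1}^2 + 1)) / 2; the bound is L / (2 theta_n^2)
    theta = 1.0
    for _ in range(n - 1):
        theta = (1 + np.sqrt(4 * theta ** 2 + 1)) / 2
    theta_n = (1 + np.sqrt(8 * theta ** 2 + 1)) / 2
    return L / (2 * theta_n ** 2)

print(cg_theoretical_bound(L=1, n=2))  # ~0.0619, the theoretical value in the example below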
References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6].
- Parameters
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_conjugate_gradient(L=1, n=2, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 18 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.061893515427809735
*** Example file: worst-case performance of conjugate gradient method ***
    PEPit guarantee: f(x_n)-f_* <= 0.0618935 ||x_0 - x_*||^2
    Theoretical guarantee: f(x_n)-f_* <= 0.0618942 ||x_0 - x_*||^2
1.5. Heavy Ball momentum
- PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum(mu, L, alpha, beta, n, verbose=True)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \alpha \nabla f(x_t) + \beta (x_t-x_{t-1})\]with
\[\alpha \in (0, \frac{1}{L}]\]and
\[\beta = \sqrt{(1 - \alpha \mu)(1 - L \alpha)}\]Theoretical guarantee:
The upper guarantee obtained in [2, Theorem 4] is
\[f(x_n) - f_\star \leqslant (1 - \alpha \mu)^n (f(x_0) - f_\star).\]References: This method was first introduced in [1, Section 2], and the convergence upper bound was proven in [2, Theorem 4].
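For illustration (not part of PEPit), a minimal NumPy sketch of the recursion with the parameter choice above; grad_f stands for a user-supplied gradient oracle:

import numpy as np

def heavy_ball(grad_f, x0, alpha, beta, n):
    # x_{t+1} = x_t - alpha * grad f(x_t) + beta * (x_t - x_{t-1}), with x_{-1} = x_0
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n):
        x, x_prev = x - alpha * grad_f(x) + beta * (x - x_prev), x
    return x

# e.g., on the quadratic f(x) = 0.5 * x' diag(mu, L) x
mu, L = 0.1, 1.0
alpha = 1 / (2 * L)
beta = np.sqrt((1 - alpha * mu) * (1 - alpha * L))
x_n = heavy_ball(lambda x: np.diag([mu, L]) @ x, x0=np.ones(2), alpha=alpha, beta=beta, n=2)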
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
alpha (float) – parameter of the scheme.
beta (float) – parameter of the scheme such that \(0<\beta<1\) and \(0<\alpha<2(1+\beta)\).
n (int) – number of iterations.
verbose (bool) – if True, print conclusion.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> mu = 0.1
>>> L = 1.
>>> alpha = 1 / (2 * L)  # alpha \in [0, 1 / L]
>>> beta = sqrt((1 - alpha * mu) * (1 - L * alpha))
>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum(mu=mu, L=L, alpha=alpha, beta=beta, n=2, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 12 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8145062493468549
*** Example file: worst-case performance of the Heavy-Ball method ***
    PEPit guarantee: f(x_n)-f_* <= 0.753492 (f(x_0) - f(x_*))
    Theoretical guarantee: f(x_n)-f_* <= 0.9025 (f(x_0) - f(x_*))
1.6. Accelerated gradient for convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_convex(mu, L, n, verbose=True)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).
This code computes a worst-case guarantee for an accelerated gradient method, a.k.a. fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The accelerated gradient method of this example is provided by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}Theoretical guarantee: When \(\mu=0\), a tight empirical guarantee can be found in [1, Table 1]:
\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]where tightness is obtained on some Huber loss functions.
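For illustration (not part of PEPit), a minimal NumPy sketch mirroring the recursion displayed above, with grad_f a user-supplied gradient oracle:

import numpy as np

def accelerated_gradient(grad_f, x0, L, n):
    # x_{t+1} = y_t - (1/L) grad f(y_t);  y_{t+1} = x_{t+1} + (t-1)/(t+2) * (x_{t+1} - x_t)
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    for t in range(n):
        x_next = y - grad_f(y) / L
        y = x_next + (t - 1) / (t + 2) * (x_next - x)
        x = x_next
    return x

x_n = accelerated_gradient(lambda x: x, x0=np.ones(3), L=1.0, n=5)  # f(x) = 0.5 ||x||^2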
References:
- Parameters
mu (float) – the strong convexity parameter
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_convex(mu=0, L=1, n=1, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 6 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.16666666668209376
*** Example file: worst-case performance of accelerated gradient method ***
    PEPit guarantee: f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
    Theoretical guarantee: f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
1.7. Accelerated gradient for strongly convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_strongly_convex(mu, L, n, verbose=True)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for an accelerated gradient method, a.k.a fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right),\]is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} y_t & = & x_t + \frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}(x_t - x_{t-1}) \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \end{eqnarray}with \(x_{-1}:= x_0\).
Theoretical guarantee:
The following upper bound can be found in [1, Corollary 4.15]:
\[f(x_n)-f_\star \leqslant \left(1 - \sqrt{\frac{\mu}{L}}\right)^n \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right).\]References:
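For illustration (not part of PEPit), a minimal NumPy sketch of this recursion, with grad_f a user-supplied gradient oracle:

import numpy as np

def accelerated_gradient_strongly_convex(grad_f, x0, mu, L, n):
    # y_t = x_t + (sqrt(L)-sqrt(mu))/(sqrt(L)+sqrt(mu)) * (x_t - x_{t-1});  x_{t+1} = y_t - (1/L) grad f(y_t)
    momentum = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for _ in range(n):
        y = x + momentum * (x - x_prev)
        x, x_prev = y - grad_f(y) / L, x
    return x

x_n = accelerated_gradient_strongly_convex(lambda x: np.diag([0.1, 1.0]) @ x, x0=np.ones(2), mu=0.1, L=1.0, n=2)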
- Parameters
mu (float) – the strong convexity parameter
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_strongly_convex(mu=0.1, L=1, n=2, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 12 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.34758587217463155
*** Example file: worst-case performance of the accelerated gradient method ***
    PEPit guarantee: f(x_n)-f_* <= 0.347586 (f(x_0) - f(x_*) + mu/2*|| x_0 - x_* ||**2)
    Theoretical guarantee: f(x_n)-f_* <= 0.467544 (f(x_0) - f(x_*) + mu/2*|| x_0 - x_* ||**2)
1.8. Optimized gradient
- PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient(L, n, verbose=True)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for optimized gradient method (OGM). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of OGM and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The optimized gradient method is described by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t)\\ y_{t+1} & = & x_{t+1} + \frac{\theta_{t}-1}{\theta_{t+1}}(x_{t+1}-x_t)+\frac{\theta_{t}}{\theta_{t+1}}(x_{t+1}-y_t), \end{eqnarray}with
\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}. \end{eqnarray}Theoretical guarantee: The tight theoretical guarantee can be found in [2, Theorem 2]:
\[f(x_n)-f_\star \leqslant \frac{L\|x_0-x_\star\|^2}{2\theta_n^2},\]where tightness follows from [3, Theorem 3].
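For illustration (not part of PEPit), a minimal NumPy sketch of OGM combining the two displays above, with grad_f a user-supplied gradient oracle:

import numpy as np

def optimized_gradient_method(grad_f, x0, L, n):
    # precompute theta_0, ..., theta_n as defined above (the last step uses the modified theta_n)
    thetas = [1.0]
    for _ in range(n - 1):
        thetas.append((1 + np.sqrt(4 * thetas[-1] ** 2 + 1)) / 2)
    thetas.append((1 + np.sqrt(8 * thetas[-1] ** 2 + 1)) / 2)
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    for t in range(n):
        x_next = y - grad_f(y) / L
        y = x_next + (thetas[t] - 1) / thetas[t + 1] * (x_next - x) + thetas[t] / thetas[t + 1] * (x_next - y)
        x = x_next
    return y  # the printed guarantee in the example below is stated at y_n

y_n = optimized_gradient_method(lambda x: 3.0 * x, x0=np.ones(2), L=3.0, n=4)  # f(x) = (3/2) ||x||^2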
References: The optimized gradient method was developed in [1, 2]; the corresponding lower bound was first obtained in [3].
- Parameters
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_optimized_gradient(L=3, n=4, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: MOSEK); optimal value: 0.07675182659831646
*** Example file: worst-case performance of optimized gradient method ***
    PEPit guarantee: f(y_n)-f_* <= 0.0767518 || x_0 - x_* ||^2
    Theoretical guarantee: f(y_n)-f_* <= 0.0767518 || x_0 - x_* ||^2
1.9. Optimized gradient for gradient
- PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient_for_gradient(L, n, verbose=True)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for optimized gradient method for gradient (OGM-G). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[\|\nabla f(x_n)\|^2 \leqslant \tau(n, L) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of OGM-G and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(\|\nabla f(x_n)\|^2\) when \(f(x_0)-f_\star \leqslant 1\).
Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the optimized gradient method for gradient [1, Section 6.3] is described by
\begin{eqnarray} y_{t+1} & = & x_t - \frac{1}{L} \nabla f(x_t),\\ x_{t+1} & = & y_{t+1} + \frac{(\tilde{\theta}_t-1)(2\tilde{\theta}_{t+1}-1)}{\tilde{\theta}_t(2\tilde{\theta}_t-1)}(y_{t+1}-y_t)+\frac{2\tilde{\theta}_{t+1}-1}{2\tilde{\theta}_t-1}(y_{t+1}-x_t), \end{eqnarray}with
\begin{eqnarray} \tilde{\theta}_n & = & 1 \\ \tilde{\theta}_t & = & \frac{1 + \sqrt{4 \tilde{\theta}_{t+1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \tilde{\theta}_0 & = & \frac{1 + \sqrt{8 \tilde{\theta}_{1}^2 + 1}}{2}. \end{eqnarray}Theoretical guarantee: The tight worst-case guarantee can be found in [1, Theorem 6.1]:
\[\|\nabla f(x_n)\|^2 \leqslant \frac{2L(f(x_0)-f_\star)}{\tilde{\theta}_0^2},\]where tightness is achieved on Huber losses, see [1, Section 6.4].
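As with OGM, the backward recursion and the resulting bound are easy to evaluate; a small helper (not part of PEPit):

import numpy as np

def ogm_g_theoretical_bound(L, n):
    # tilde_theta_n = 1; tilde_theta_t = (1 + sqrt(4 tilde_theta_{t+1}^2 + 1)) / 2 for t = n-1, ..., 1;
    # tilde_theta_0 = (1 + sqrt(8 tilde_theta_1^2 + 1)) / 2; the bound is 2 L / tilde_theta_0^2
    theta = 1.0
    for _ in range(n - 1):
        theta = (1 + np.sqrt(4 * theta ** 2 + 1)) / 2
    theta_0 = (1 + np.sqrt(8 * theta ** 2 + 1)) / 2
    return 2 * L / theta_0 ** 2

print(ogm_g_theoretical_bound(L=3, n=4))  # ~0.307, the theoretical value in the example below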
References: The optimized gradient method for gradient was developed in [1].
- Parameters
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_optimized_gradient_for_gradient(L=3, n=4, verbose=True)
(PEP-it) Setting up the problem: size of the main PSD matrix: 7x7
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 30 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: MOSEK); optimal value: 0.30700730460985826
*** Example file: worst-case performance of optimized gradient method for gradient ***
    PEP-it guarantee: ||f'(x_n)|| ^ 2 <= 0.307007 (f(x_0) - f_*)
    Theoretical guarantee: ||f'(x_n)|| ^ 2 <= 0.307007 (f(x_0) - f_*)
1.10. Robust momentum
- PEPit.examples.unconstrained_convex_minimization.wc_robust_momentum(mu, L, lam, verbose=True)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly-convex.
This code computes a worst-case guarantee for the robust momentum method (RMM). That is, it computes the smallest possible \(\tau(n, \mu, L, \lambda)\) such that the guarantee
\[v(x_{n+1}) \leqslant \tau(n, \mu, L, \lambda) v(x_{n}),\]is valid, where \(x_n\) is the \(n^{\mathrm{th}}\) iterate of the RMM, and \(x_\star\) is a minimizer of \(f\). The function \(v(\cdot)\) is a well-chosen Lyapunov function defined as follows,
\begin{eqnarray} v(x_t) & = & l\|z_t - x_\star\|^2 + q_t, \\ q_t & = & (L - \mu) \left(f(x_t) - f_\star - \frac{\mu}{2}\|y_t - x_\star\|^2 - \frac{1}{2}\|\nabla f(y_t) - \mu (y_t - x_\star)\|^2 \right), \end{eqnarray}with \(\kappa = \frac{L}{\mu}\), \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\), and \(l = \mu^2 \frac{\kappa - \kappa \rho^2 - 1}{2 \rho (1 - \rho)}\).
Algorithm:
For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} x_{t+1} & = & x_{t} + \beta (x_t - x_{t-1}) - \alpha \nabla f(y_t), \\ y_{t+1} & = & y_{t} + \gamma (x_t - x_{t-1}), \end{eqnarray}with \(x_{-1}, x_0 \in \mathrm{R}^d\), and with parameters \(\alpha = \frac{\kappa (1 - \rho^2)(1 + \rho)}{L}\), \(\beta = \frac{\kappa \rho^3}{\kappa - 1}\), \(\gamma = \frac{\rho^2}{(\kappa - 1)(1 - \rho)^2(1 + \rho)}\).
Theoretical guarantee:
A convergence guarantee (empirically tight) is obtained in [1, Theorem 1],
\[v(x_{n+1}) \leqslant \rho^2 v(x_n),\]with \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\).
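The contraction factor \(\rho^2\) is straightforward to evaluate; a small helper (not part of PEPit), using \(\kappa = L/\mu\):

import numpy as np

def robust_momentum_rate(mu, L, lam):
    # rho = lam * (1 - 1/kappa) + (1 - lam) * (1 - 1/sqrt(kappa)); the Lyapunov contracts by rho^2
    kappa = L / mu
    rho = lam * (1 - 1 / kappa) + (1 - lam) * (1 - 1 / np.sqrt(kappa))
    return rho ** 2

print(robust_momentum_rate(mu=0.1, L=1.0, lam=0.2))  # ~0.5286, the value appearing in the example below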
References:
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
lam (float) – interpolation parameter: \(\lambda=1\) corresponds to gradient descent and \(\lambda=0\) to the Triple Momentum Method.
verbose (bool, optional) – if True, print conclusion.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Examples
>>> pepit_tau, theoretical_tau = wc_robust_momentum(0.1, 1, 0.2, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 6 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5285548355257013
*** Example file: worst-case performance of the Robust Momentum Method ***
    PEPit guarantee: v(x_(n+1)) <= 0.528555 v(x_n)
    Theoretical guarantee: v(x_(n+1)) <= 0.528555 v(x_n)
1.11. Triple momentum
- PEPit.examples.unconstrained_convex_minimization.wc_triple_momentum(mu, L, n, verbose=True)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for triple momentum method (TMM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the TMM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
For \(t \in \{ 1, \dots, n\}\)
\begin{eqnarray} \xi_{t+1} & = & (1 + \beta) \xi_{t} - \beta \xi_{t-1} - \alpha \nabla f(y_t) \\ y_{t} & = & (1 + \gamma) \xi_{t} - \gamma \xi_{t-1} \\ x_{t} & = & (1 + \delta) \xi_{t} - \delta \xi_{t-1} \end{eqnarray}with
\begin{eqnarray} \kappa & = & \frac{L}{\mu}, \quad \rho = 1 - \frac{1}{\sqrt{\kappa}} \\ (\alpha, \beta, \gamma, \delta) & = & \left(\frac{1+\rho}{L}, \frac{\rho^2}{2-\rho}, \frac{\rho^2}{(1+\rho)(2-\rho)}, \frac{\rho^2}{1-\rho^2}\right) \end{eqnarray}and
\begin{eqnarray} \xi_{0} = x_0 \\ \xi_{1} = x_0 \\ y = x_0 \end{eqnarray}Theoretical guarantee: A theoretical upper (empirically tight) bound can be found in [1, Theorem 1, eq. 4]:
\[f(x_n)-f_\star \leqslant \frac{\rho^{2(n+1)} L \kappa}{2}\|x_0 - x_\star\|^2.\]References: The triple momentum method was discovered and analyzed in [1].
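For illustration (not part of PEPit), a minimal NumPy sketch that sets the parameters as above and runs the recursion; grad_f stands for a user-supplied gradient oracle:

import numpy as np

def triple_momentum(grad_f, x0, mu, L, n):
    # parameter choices displayed above
    kappa = L / mu
    rho = 1 - 1 / np.sqrt(kappa)
    alpha, beta = (1 + rho) / L, rho ** 2 / (2 - rho)
    gamma, delta = rho ** 2 / ((1 + rho) * (2 - rho)), rho ** 2 / (1 - rho ** 2)
    xi_prev = np.asarray(x0, dtype=float)
    xi = xi_prev.copy()
    x = xi.copy()
    for _ in range(n):
        y = (1 + gamma) * xi - gamma * xi_prev
        x = (1 + delta) * xi - delta * xi_prev
        xi, xi_prev = (1 + beta) * xi - beta * xi_prev - alpha * grad_f(y), xi
    return x

x_n = triple_momentum(lambda x: np.diag([0.1, 1.0]) @ x, x0=np.ones(2), mu=0.1, L=1.0, n=4)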
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_triple_momentum(mu=0.1, L=1., n=4, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 30 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.23893532450841679
*** Example file: worst-case performance of the Triple Momentum Method ***
    PEPit guarantee: f(x_n)-f_* <= 0.238935 ||x_0-x_*||^2
    Theoretical guarantee: f(x_n)-f_* <= 0.238925 ||x_0-x_*||^2
1.12. Information theoretic exact method
- PEPit.examples.unconstrained_convex_minimization.wc_information_theoretic(mu, L, n, verbose=True)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).
This code computes a worst-case guarantee for the information theoretic exact method (ITEM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[\|z_n - x_\star\|^2 \leqslant \tau(n, L, \mu) \|z_0 - x_\star\|^2\]is valid, where \(z_n\) is the output of the ITEM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\|z_n - x_\star\|^2\) when \(\|z_0 - x_\star\|^2 \leqslant 1\).
Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the information theoretic exact method of this example is provided by
\begin{eqnarray} y_{t} & = & (1-\beta_t) z_t + \beta_t x_t \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ z_{t+1} & = & \left(1-q\delta_t\right) z_t+q\delta_t y_t-\frac{\delta_t}{L}\nabla f(y_t), \end{eqnarray}with \(y_{-1}=x_0=z_0\), \(q=\frac{\mu}{L}\) (inverse condition ratio), and the scalar sequences:
\begin{eqnarray} A_{t+1} & = & \frac{(1+q)A_t+2\left(1+\sqrt{(1+A_t)(1+qA_t)}\right)}{(1-q)^2},\\ \beta_{t+1} & = & \frac{A_t}{(1-q)A_{t+1}},\\ \delta_{t+1} & = & \frac{1}{2}\frac{(1-q)^2A_{t+1}-(1+q)A_t}{1+q+q A_t}, \end{eqnarray}with \(A_0=0\).
Theoretical guarantee: A tight worst-case guarantee can be found in [1, Theorem 3]:
\[\|z_n - x_\star\|^2 \leqslant \frac{1}{1+q A_n} \|z_0-x_\star\|^2,\]where tightness is obtained on some quadratic loss functions (see [1, Lemma 2]).
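The A-sequence and the resulting bound can be evaluated directly; a small helper (not part of PEPit):

import numpy as np

def item_theoretical_bound(mu, L, n):
    # A_0 = 0; A_{t+1} = ((1+q) A_t + 2 (1 + sqrt((1 + A_t)(1 + q A_t)))) / (1-q)^2 with q = mu/L;
    # the guarantee is ||z_n - x_*||^2 <= ||z_0 - x_*||^2 / (1 + q A_n)
    q = mu / L
    A = 0.0
    for _ in range(n):
        A = ((1 + q) * A + 2 * (1 + np.sqrt((1 + A) * (1 + q * A)))) / (1 - q) ** 2
    return 1 / (1 + q * A)

print(item_theoretical_bound(mu=.001, L=1, n=15))  # ~0.7566, the theoretical value in the example below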
References:
- Parameters
mu (float) – the strong convexity parameter.
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_information_theoretic(mu=.001, L=1, n=15, verbose=True)
(PEP-it) Setting up the problem: size of the main PSD matrix: 17x17
(PEP-it) Setting up the problem: performance measure is minimum of 1 element(s)
(PEP-it) Setting up the problem: initial conditions (1 constraint(s) added)
(PEP-it) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 240 constraint(s) added
(PEP-it) Compiling SDP
(PEP-it) Calling SDP solver
(PEP-it) Solver status: optimal (solver: MOSEK); optimal value: 0.7566088333863754
*** Example file: worst-case performance of the information theoretic exact method ***
    PEP-it guarantee: ||z_n - x_* ||^2 <= 0.756609 ||z_0 - x_*||^2
    Theoretical guarantee: ||z_n - x_* ||^2 <= 0.756605 ||z_0 - x_*||^2
1.13. Proximal point
- PEPit.examples.unconstrained_convex_minimization.wc_proximal_point(gamma, n, verbose=True)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is closed, proper, and convex (and potentially non-smooth).
This code computes a worst-case guarantee for the proximal point method with step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n,\gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, \gamma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the proximal point method, and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(\gamma\), \(\tau(n,\gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
The proximal point method is described by
\[x_{t+1} = \arg\min_x \left\{f(x)+\frac{1}{2\gamma}\|x-x_t\|^2 \right\},\]where \(\gamma\) is a step-size.
Theoretical guarantee:
The tight theoretical guarantee can be found in [1, Theorem 4.1]:
\[f(x_n)-f_\star \leqslant \frac{\|x_0-x_\star\|^2}{4\gamma n},\]where tightness is obtained on, e.g., one-dimensional linear problems on the positive orthant.
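For illustration (not part of PEPit), a minimal NumPy sketch of the method; prox_f stands for a user-supplied proximal oracle, here instantiated with the closed-form prox of the l1-norm (soft-thresholding):

import numpy as np

def proximal_point(prox_f, x0, gamma, n):
    # x_{t+1} = prox_{gamma f}(x_t) = argmin_x { f(x) + ||x - x_t||^2 / (2 gamma) }
    x = np.asarray(x0, dtype=float)
    for _ in range(n):
        x = prox_f(x, gamma)
    return x

# f(x) = ||x||_1 is closed, proper, convex; its prox is coordinate-wise soft-thresholding
soft_threshold = lambda x, gamma: np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
x_n = proximal_point(soft_threshold, x0=np.array([5.0, -2.0]), gamma=3, n=4)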
References:
- Parameters
gamma (float) – step-size.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_proximal_point(gamma=3, n=4, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 20 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: MOSEK); optimal value: 0.020833335685727362
*** Example file: worst-case performance of proximal point method ***
    PEPit guarantee: f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2
    Theoretical guarantee: f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2
1.14. Accelerated proximal point
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_proximal_point(A0, gammas, n, verbose=True)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex and possibly non-smooth.
This code computes a worst-case guarantee for an accelerated proximal point method, a.k.a. fast proximal point method (FPP). That is, it computes the smallest possible \(\tau(n, A_0, \vec{\gamma})\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, A_0, \vec{\gamma}) \left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2\right)\]is valid, where \(x_n\) is the output of FPP (with step-size \(\gamma_t\) at step \(t\in \{0, \dots, n-1\}\)) and where \(x_\star\) is a minimizer of \(f\) and \(A_0\) is a positive number.
In short, for given values of \(n\), \(A_0\) and \(\vec{\gamma}\), \(\tau(n)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \leqslant 1\), for the following method.
Algorithm: For \(t\in \{0, \dots, n-1\}\):
\begin{eqnarray} y_{t+1} & = & (1-\alpha_{t} ) x_{t} + \alpha_{t} v_t \\ x_{t+1} & = & \arg\min_x \left\{f(x)+\frac{1}{2\gamma_t}\|x-y_{t+1}\|^2 \right\}, \\ v_{t+1} & = & v_t + \frac{1}{\alpha_{t}} (x_{t+1}-y_{t+1}) \end{eqnarray}with
\begin{eqnarray} \alpha_{t} & = & \frac{\sqrt{(A_t \gamma_t)^2 + 4 A_t \gamma_t} - A_t \gamma_t}{2} \\ A_{t+1} & = & (1 - \alpha_{t}) A_t \end{eqnarray}and \(v_0=x_0\).
Theoretical guarantee: A theoretical upper bound can be found in [1, Theorem 2.3.]:
\[f(x_n)-f_\star \leqslant \frac{4}{A_0 (\sum_{t=0}^{n-1} \sqrt{\gamma_t})^2}\left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \right).\]References: The accelerated proximal point was first obtained and analyzed in [1].
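For illustration (not part of PEPit), a small helper computing the scalar sequences \((\alpha_t, A_t)\) from the recursion above:

import numpy as np

def fpp_coefficients(A0, gammas):
    # alpha_t = (sqrt((A_t gamma_t)^2 + 4 A_t gamma_t) - A_t gamma_t) / 2;  A_{t+1} = (1 - alpha_t) A_t
    A, alphas = A0, []
    for gamma in gammas:
        alpha = (np.sqrt((A * gamma) ** 2 + 4 * A * gamma) - A * gamma) / 2
        alphas.append(alpha)
        A = (1 - alpha) * A
    return alphas, A

alphas, A_n = fpp_coefficients(A0=5, gammas=[(i + 1) / 1.1 for i in range(3)])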
- Parameters
A0 (float) – initial value for parameter A_0.
gammas (list) – sequence of step-sizes.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_point(A0=5, gammas=[(i + 1) / 1.1 for i in range(3)], n=3, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 20 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.01593113594082973
*** Example file: worst-case performance of optimized gradient method ***
    PEPit guarantee: f(x_n)-f_* <= 0.0159311 (f(x_0) - f_* + A0/2* ||x_0 - x_*||^2)
    Theoretical guarantee: f(x_n)-f_* <= 0.0511881 (f(x_0) - f_* + A0/2* ||x_0 - x_*||^2)
1.15. Inexact gradient descent
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_gradient_descent(L, mu, epsilon, n, verbose=True)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the inexact gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \varepsilon) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the inexact gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\), \(\mu\) and \(\varepsilon\), \(\tau(n, L, \mu, \varepsilon)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \gamma d_t\]with
\[\|d_t - \nabla f(x_t)\| \leqslant \varepsilon \|\nabla f(x_t)\|\]and
\[\gamma = \frac{2}{L_{\varepsilon} + \mu_{\varepsilon}}\]where \(L_{\varepsilon} = (1 + \varepsilon) L\) and \(\mu_{\varepsilon} = (1 - \varepsilon) \mu\).
Theoretical guarantee:
The tight worst-case guarantee obtained in [1, Theorem 5.3] or [2, Remark 1.6] is
\[f(x_n) - f_\star \leqslant \left(\frac{L_{\varepsilon}-\mu_{\varepsilon}}{L_{\varepsilon}+\mu_{\varepsilon}}\right)^{2n}(f(x_0) - f_\star),\]where tightness is achieved on simple quadratic functions.
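For illustration (not part of PEPit), a minimal NumPy sketch in which the relative gradient error is simulated by a random perturbation of norm exactly \(\varepsilon \|\nabla f(x_t)\|\); grad_f stands for a user-supplied gradient oracle:

import numpy as np

def inexact_gradient_descent(grad_f, x0, mu, L, epsilon, n, seed=0):
    # gamma = 2 / (L_eps + mu_eps), and d_t satisfies ||d_t - grad f(x_t)|| <= epsilon ||grad f(x_t)||
    rng = np.random.default_rng(seed)
    gamma = 2 / ((1 + epsilon) * L + (1 - epsilon) * mu)
    x = np.asarray(x0, dtype=float)
    for _ in range(n):
        g = grad_f(x)
        noise = rng.standard_normal(x.shape)
        d = g + epsilon * np.linalg.norm(g) * noise / np.linalg.norm(noise)
        x = x - gamma * d
    return x

x_n = inexact_gradient_descent(lambda x: np.diag([0.1, 1.0]) @ x, x0=np.ones(2), mu=0.1, L=1.0, epsilon=0.1, n=2)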
References: The detailed analyses can be found in [1, 2].
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
epsilon (float) – level of inaccuracy.
n (int) – number of iterations.
verbose (bool) – if True, print conclusion.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_gradient_descent(L=1, mu=.1, epsilon=.1, n=2, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 8x8
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 15 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5188661397616067
*** Example file: worst-case performance of inexact gradient method in distance in function values ***
    PEPit guarantee: f(x_n)-f_* <= 0.518866 (f(x_0)-f_*)
    Theoretical guarantee: f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)
1.16. Inexact gradient descent with exact line search
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_gradient_exact_line_search(L, mu, epsilon, n, verbose=True)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for an inexact gradient method with exact linesearch (ELS). That is, it computes the smallest possible \(\tau(n, L, \mu, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \varepsilon) ( f(x_0) - f_\star )\]is valid, where \(x_n\) is the output of the gradient descent with an inexact descent direction and an exact linesearch, and where \(x_\star\) is the minimizer of \(f\).
The inexact descent direction \(d\) is assumed to satisfy a relative inaccuracy described by (with \(0 \leqslant \varepsilon < 1\))
\[\|\nabla f(x_t) - d_t\| \leqslant \varepsilon \|\nabla f(x_t)\|,\]where \(\nabla f(x_t)\) is the true gradient, and \(d_t\) is the approximate descent direction that is used.
Algorithm:
For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} \gamma_t & = & \arg\min_{\gamma \in \mathbb{R}} f(x_t - \gamma d_t), \\ x_{t+1} & = & x_t - \gamma_t d_t. \end{eqnarray}Theoretical guarantee:
The tight guarantee obtained in [1, Theorem 5.1] is
\[f(x_n) - f_\star\leqslant \left(\frac{L_{\varepsilon} - \mu_{\varepsilon}}{L_{\varepsilon} + \mu_{\varepsilon}}\right)^{2n}( f(x_0) - f_\star ),\]with \(L_{\varepsilon} = (1 + \varepsilon) L\) and \(\mu_{\varepsilon} = (1 - \varepsilon) \mu\). Tightness is achieved on simple quadratic functions.
References: The detailed approach (based on convex relaxations) is available in [1].
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
epsilon (float) – level of inaccuracy.
n (int) – number of iterations.
verbose (bool, optional) – if True, print conclusion.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_gradient_exact_line_search(1, 0.1, 0.1, 1, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 18 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5186658287501191
*** Example file: worst-case performance of inexact gradient descent with exact linesearch ***
    PEPit guarantee: f(x_n)-f_* <= 0.518666 (f(x_0)-f_*)
    Theoretical guarantee: f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)
1.17. Inexact accelerated gradient
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_accelerated_gradient(L, epsilon, n, verbose=True)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for an accelerated gradient method using inexact first-order information. That is, it computes the smallest possible \(\tau(n, L, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \varepsilon) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of inexact accelerated gradient descent and where \(x_\star\) is a minimizer of \(f\).
The inexact descent direction is assumed to satisfy a relative inaccuracy described by (with \(0\leqslant \varepsilon \leqslant 1\))
\[\|\nabla f(y_t) - d_t\| \leqslant \varepsilon \|\nabla f(y_t)\|,\]where \(\nabla f(y_t)\) is the true gradient at \(y_t\) and \(d_t\) is the approximate descent direction that is used.
Algorithm: The inexact accelerated gradient method of this example is provided by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} d_t\\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}Theoretical guarantee: When \(\varepsilon=0\), a tight empirical guarantee can be found in [1, Table 1]:
\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]which is achieved on some Huber loss functions (when \(\varepsilon=0\)).
References:
- Parameters
L (float) – smoothness parameter.
epsilon (float) – level of inaccuracy
n (int) – number of iterations.
verbose (bool) – if True, print conclusion
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_accelerated_gradient(L=1, epsilon=0.1, n=5, verbose=True)
(PEPit) Setting up the problem: size of the main PSD matrix: 13x13
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: initial conditions (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : 47 constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.03944038534724904
*** Example file: worst-case performance of inexact accelerated gradient method ***
    PEPit guarantee: f(x_n)-f_* <= 0.0394404 (f(x_0)-f_*)
    Theoretical guarantee for epsilon = 0 : f(x_n)-f_* <= 0.0357143 (f(x_0)-f_*)