4. Stochastic and randomized convex minimization

4.1. Stochastic gradient descent

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_sgd(L, mu, gamma, v, R, n, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the finite sum minimization problem

\[F_\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]

where \(f_1, ..., f_n\) are \(L\)-smooth and \(\mu\)-strongly convex. In the sequel, we use the notation \(\mathbb{E}\) for denoting the expectation over the uniform distribution of the index \(i \sim \mathcal{U}\left([|1, n|]\right)\), e.g., \(F(x)\equiv\mathbb{E}[f_i(x)]\). In addition, we assume a bounded variance at the optimal point (which is denoted by \(x_\star\)):

\[\mathbb{E}\left[\|\nabla f_i(x_\star)\|^2\right] = \frac{1}{n} \sum_{i=1}^n\|\nabla f_i(x_\star)\|^2 \leqslant v^2.\]

This code computes a worst-case guarantee for one step of the stochastic gradient descent (SGD) in expectation, for the distance to an optimal point. That is, it computes the smallest possible \(\tau(L, \mu, \gamma, v, R, n)\) such that

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \tau(L, \mu, \gamma, v, R, n)\]

where \(\|x_0 - x_\star\|^2 \leqslant R^2\), where \(v^2\) is the bound on the variance at \(x_\star\), and where \(x_1\) is the output of one step of SGD (note that we use the notation \(x_0,x_1\) to denote two consecutive iterates for convenience; as the bound is valid for all \(x_0\), it is also valid for any pair of consecutive iterates of the algorithm).

Algorithm: One iteration of SGD is described by:

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, n|]\right), \\ x_{t+1} & = & x_t - \gamma \nabla f_{i}(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size.
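
For concreteness, a minimal NumPy sketch of this iteration is given below (PEPit itself models the method symbolically and does not need such an implementation); the list grad_fis of gradient oracles \(\nabla f_i\) and the step-size gamma are assumptions supplied by the user.

import numpy as np

def sgd_step(x, grad_fis, gamma, rng=np.random.default_rng()):
    """One SGD step: pick i uniformly at random and move along -grad f_i(x)."""
    i = rng.integers(len(grad_fis))      # i ~ U([|1, n|]) (0-based here)
    return x - gamma * grad_fis[i](x)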

Theoretical guarantee: An empirically tight one-iteration guarantee is provided in the code of PESTO [1]:

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \frac{1}{2}\left(1-\frac{\mu}{L}\right)^2 R^2 + \frac{1}{2}\left(1-\frac{\mu}{L}\right) R \sqrt{\left(1-\frac{\mu}{L}\right)^2 R^2 + 4\frac{v^2}{L^2}} + \frac{v^2}{L^2},\]

when \(\gamma=\frac{1}{L}\). Note that we observe that the guarantee does not depend on the number \(n\) of functions for this particular setting, thereby implying that it is also valid for expectation minimization settings (i.e., when \(n\) goes to infinity).
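
For reference (and not part of PEPit), the closed-form bound above is easy to evaluate numerically; the sketch below reproduces the value 5.04165 obtained in the example further down (with \(\mu=0.1\), \(L=1\), \(v=1\), \(R=2\)).

from math import sqrt

def sgd_theoretical_bound(L, mu, v, R):
    """Evaluate the one-step SGD bound above (stated for gamma = 1/L)."""
    rho = 1 - mu / L
    return 0.5 * rho ** 2 * R ** 2 + 0.5 * rho * R * sqrt(rho ** 2 * R ** 2 + 4 * v ** 2 / L ** 2) + v ** 2 / L ** 2

print(round(sgd_theoretical_bound(L=1, mu=0.1, v=1, R=2), 5))   # 5.04165, matching the example below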

References: Empirically tight guarantee provided in the code of [1]. Using SDPs for analyzing SGD-type methods was proposed in [2, 3].

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).

[2] B. Hu, P. Seiler, L. Lessard (2020). Analysis of biased stochastic gradient descent using sequential semidefinite programs. Mathematical programming.

[3] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters:
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – the step-size.

  • v (float) – the variance bound.

  • R (float) – the initial distance.

  • n (int) – number of functions.

  • wrapper (str) – the name of the wrapper to be used.

  • solver (str) – the name of the solver the wrapper should use.

  • verbose (int) –

    level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + solver details.

Returns:
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> mu = 0.1
>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_sgd(L=L, mu=mu, gamma=gamma, v=1, R=2, n=5, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 11x11
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 5 function(s)
                        Function 1 : Adding 2 scalar constraint(s) ...
                        Function 1 : 2 scalar constraint(s) added
                        Function 2 : Adding 2 scalar constraint(s) ...
                        Function 2 : 2 scalar constraint(s) added
                        Function 3 : Adding 2 scalar constraint(s) ...
                        Function 3 : 2 scalar constraint(s) added
                        Function 4 : Adding 2 scalar constraint(s) ...
                        Function 4 : 2 scalar constraint(s) added
                        Function 5 : Adding 2 scalar constraint(s) ...
                        Function 5 : 2 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 5.041652165318314
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite up to an error of 7.85488416412956e-09
                All the primal scalar constraints are verified up to an error of 2.157126582913449e-08
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 1.9611762548955365e-07
(PEPit) Final upper bound (dual): 5.041652154374022 and lower bound (primal example): 5.041652165318314
(PEPit) Duality gap: absolute: -1.094429169512523e-08 and relative: -2.170774844486766e-09
*** Example file: worst-case performance of stochastic gradient descent with fixed step-size ***
        PEPit guarantee:         E[||x_1 - x_*||^2] <= 5.04165
        Theoretical guarantee:   E[||x_1 - x_*||^2] <= 5.04165

4.2. Stochastic gradient descent in overparametrized setting

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_sgd_overparametrized(L, mu, gamma, n, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the finite sum minimization problem

\[F_\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]

where \(f_1, ..., f_n\) are \(L\)-smooth and \(\mu\)-strongly convex. In the sequel, we use the notation \(\mathbb{E}\) for denoting the expectation over the uniform distribution of the index \(i \sim \mathcal{U}\left([|1, n|]\right)\), e.g., \(F(x)\equiv\mathbb{E}[f_i(x)]\). In addition, we assume a zero variance at the optimal point (which is denoted by \(x_\star\)):

\[\mathbb{E}\left[\|\nabla f_i(x_\star)\|^2\right] = \frac{1}{n} \sum_{i=1}^n \|\nabla f_i(x_\star)\|^2 = 0.\]

This kind of situation arises, for example, in machine learning in the interpolation regime, that is, when there exists a model \(x_\star\) such that the loss \(\mathcal{L}(x_\star, z_i) = f_i(x_\star)\) is zero for every observation \(z_i\), \(i \in [|1, n|]\).

This code computes a worst-case guarantee for one step of stochastic gradient descent (SGD) in expectation, for the distance to an optimal point. That is, it computes the smallest possible \(\tau(L, \mu, \gamma, n)\) such that

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \tau(L, \mu, \gamma, n) \|x_0 - x_\star\|^2\]

is valid, where \(x_1\) is the output of one step of SGD.

Algorithm: One iteration of SGD is described by:

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, n|]\right), \\ x_{t+1} & = & x_t - \gamma \nabla f_{i}(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size.

Theoretical guarantee: An empirically tight one-iteration guarantee is provided in the code of PESTO [1]:

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \left(1-\frac{\mu}{L}\right)^2 \|x_0-x_\star\|^2,\]

when \(\gamma=\frac{1}{L}\). Note that we observe that the guarantee does not depend on the number \(n\) of functions for this particular setting, thereby implying that it is also valid for expectation minimization settings (i.e., when \(n\) goes to infinity).
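
As a quick sanity check (not part of PEPit), the factor \(\left(1-\frac{\mu}{L}\right)^2\) is already attained on a one-dimensional quadratic instance in which every \(f_i\) has curvature \(\mu\) and common minimizer \(x_\star = 0\): with \(\gamma = \frac{1}{L}\), each step contracts the distance by exactly \(1-\frac{\mu}{L}\).

# 1-D illustration: f_i(x) = (mu/2) x^2 for all i (interpolation regime, x_star = 0).
L, mu, gamma, x0 = 1.0, 0.1, 1.0, 2.0
x1 = x0 - gamma * mu * x0            # one SGD step; the sampled index is irrelevant here
print(round((x1 / x0) ** 2, 6))      # 0.81 = (1 - mu/L)^2, matching the example below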

References: Empirically tight guarantee provided in the code of [1]. Using SDPs for analyzing SGD-type methods was proposed in [2, 3].

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).

[2] B. Hu, P. Seiler, L. Lessard (2020). Analysis of biased stochastic gradient descent using sequential semidefinite programs. Mathematical programming.

[3] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters:
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – the step-size.

  • n (int) – number of functions.

  • wrapper (str) – the name of the wrapper to be used.

  • solver (str) – the name of the solver the wrapper should use.

  • verbose (int) –

    level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + solver details.

Returns:
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> mu = 0.1
>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_sgd_overparametrized(L=L, mu=mu, gamma=gamma, n=5, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 11x11
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 5 function(s)
                        Function 1 : Adding 2 scalar constraint(s) ...
                        Function 1 : 2 scalar constraint(s) added
                        Function 2 : Adding 2 scalar constraint(s) ...
                        Function 2 : 2 scalar constraint(s) added
                        Function 3 : Adding 2 scalar constraint(s) ...
                        Function 3 : 2 scalar constraint(s) added
                        Function 4 : Adding 2 scalar constraint(s) ...
                        Function 4 : 2 scalar constraint(s) added
                        Function 5 : Adding 2 scalar constraint(s) ...
                        Function 5 : 2 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.8099999999882777
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite up to an error of 1.1380952365332445e-10
                All the primal scalar constraints are verified up to an error of 2.1146270370529728e-10
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.8089919395053243e-09
(PEPit) Final upper bound (dual): 0.8100000000728617 and lower bound (primal example): 0.8099999999882777
(PEPit) Duality gap: absolute: 8.458400646560449e-11 and relative: 1.044246993417637e-10
*** Example file: worst-case performance of stochastic gradient descent with fixed step-size and with zero variance at the optimal point ***
        PEPit guarantee:         E[||x_1 - x_*||^2] <= 0.81 ||x_0 - x_*||^2
        Theoretical guarantee:   E[||x_1 - x_*||^2] <= 0.81 ||x_0 - x_*||^2

4.3. SAGA

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_saga(L, mu, n, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the finite sum convex minimization problem

\[F_\star \triangleq \min_x \left\{F(x) \equiv h(x) + \frac{1}{n} \sum_{i=1}^{n} f_i(x)\right\},\]

where the functions \(f_i\) are assumed to be \(L\)-smooth and \(\mu\)-strongly convex, and \(h\) is closed, proper, and convex with a proximal operator readily available. In the sequel, we use the notation \(\mathbb{E}\) for denoting the expectation over the uniform distribution of the index \(i \sim \mathcal{U}\left([|1, n|]\right)\), e.g., \(F(x)\equiv h(x)+\mathbb{E}[f_i(x)]\).

This code computes the exact rate for a Lyapunov (or energy) function for SAGA [1]. That is, it computes the smallest possible \(\tau(n,L,\mu)\) such that this Lyapunov function decreases geometrically

\[\mathbb{E}[V^{(1)}] \leqslant \tau(n, L, \mu) V^{(0)},\]

where the value of the Lyapunov function at iteration \(t\) is denoted by \(V^{(t)}\) and is defined as

\[V^{(t)} \triangleq \frac{1}{n} \sum_{i=1}^n \left(f_i(\phi_i^{(t)}) - f_i(x^\star) - \langle \nabla f_i(x^\star); \phi_i^{(t)} - x^\star\rangle\right) + \frac{1}{2 n \gamma (1-\mu \gamma)} \|x^{(t)} - x^\star\|^2,\]

with \(\gamma = \frac{1}{2(\mu n+L)}\) (this Lyapunov function was proposed in [1, Theorem 1]). We consider the case \(t=0\) in the code below, without loss of generality.

In short, for given values of \(n\), \(L\), and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\mathbb{E}[V^{(1)}]\) when \(V^{(0)} \leqslant 1\).

Algorithm: One iteration of SAGA [1] is described as follows: at iteration \(t\), pick \(j\in\{1,\ldots,n\}\) uniformly at random and set:

\[\begin{split}\begin{eqnarray} \phi_j^{(t+1)} & = & x^{(t)}, \\ w^{(t+1)} & = & x^{(t)} - \gamma \left[ \nabla f_j (\phi_j^{(t+1)}) - \nabla f_j(\phi_j^{(t)}) + \frac{1}{n} \sum_{i=1}^n \nabla f_i(\phi_i^{(t)})\right], \\ x^{(t+1)} & = & \mathrm{prox}_{\gamma h} (w^{(t+1)})\triangleq \arg\min_x \left\{ \gamma h(x)+\frac{1}{2}\|x-w^{(t+1)}\|^2\right\}. \end{eqnarray}\end{split}\]
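
For illustration only (PEPit does not require such an implementation), one SAGA iteration and the Lyapunov function \(V^{(t)}\) defined above can be sketched in NumPy as follows; the oracles fis and grad_fis and the proximal operator prox_h of \(h\) are assumptions supplied by the user (for \(h \equiv 0\), take prox_h = lambda w, gamma: w).

import numpy as np

def saga_step(x, phi, grad_fis, prox_h, gamma, rng=np.random.default_rng()):
    """One SAGA iteration; phi is the list of memory points phi_i^{(t)}."""
    n = len(grad_fis)
    j = rng.integers(n)                                  # j ~ U([|1, n|])
    g_mem = np.mean([grad_fis[i](phi[i]) for i in range(n)], axis=0)
    w = x - gamma * (grad_fis[j](x) - grad_fis[j](phi[j]) + g_mem)
    phi = list(phi)
    phi[j] = x                                           # phi_j^{(t+1)} = x^{(t)}
    return prox_h(w, gamma), phi

def saga_lyapunov(x, phi, fis, grad_fis, x_star, gamma, mu):
    """Evaluate V^{(t)} from its definition above."""
    n = len(fis)
    bregman = np.mean([fis[i](phi[i]) - fis[i](x_star)
                       - np.dot(grad_fis[i](x_star), phi[i] - x_star) for i in range(n)])
    return bregman + np.dot(x - x_star, x - x_star) / (2 * n * gamma * (1 - mu * gamma))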

Theoretical guarantee: The following upper bound (empirically tight) can be found in [1, Theorem 1]:

\[\mathbb{E}[V^{(t+1)}] \leqslant \left(1-\gamma\mu \right)V^{(t)}\]
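
With the step-size \(\gamma = \frac{1}{2(\mu n + L)}\) used above, this rate is immediate to evaluate; for the parameters of the example below (\(L=1\), \(\mu=0.1\), \(n=5\)) it gives 0.966667, the worst-case value certified by PEPit.

L, mu, n = 1.0, 0.1, 5
gamma = 1 / (2 * (mu * n + L))
print(round(1 - gamma * mu, 6))   # 0.966667, matching the example below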

References:

[1] A. Defazio, F. Bach, S. Lacoste-Julien (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems (NIPS).

Parameters:
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of functions.

  • wrapper (str) – the name of the wrapper to be used.

  • solver (str) – the name of the solver the wrapper should use.

  • verbose (int) –

    level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + solver details.

Returns:
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_saga(L=1, mu=.1, n=5, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 27x27
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 6 function(s)
                        Function 1 : Adding 30 scalar constraint(s) ...
                        Function 1 : 30 scalar constraint(s) added
                        Function 2 : Adding 6 scalar constraint(s) ...
                        Function 2 : 6 scalar constraint(s) added
                        Function 3 : Adding 6 scalar constraint(s) ...
                        Function 3 : 6 scalar constraint(s) added
                        Function 4 : Adding 6 scalar constraint(s) ...
                        Function 4 : 6 scalar constraint(s) added
                        Function 5 : Adding 6 scalar constraint(s) ...
                        Function 5 : 6 scalar constraint(s) added
                        Function 6 : Adding 6 scalar constraint(s) ...
                        Function 6 : 6 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.9666666451656595
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite up to an error of 8.407941346480772e-09
                All the primal scalar constraints are verified up to an error of 1.4346716234068732e-07
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.824905589324927e-07
(PEPit) Final upper bound (dual): 0.9666666537075449 and lower bound (primal example): 0.9666666451656595
(PEPit) Duality gap: absolute: 8.541885421209372e-09 and relative: 8.836433390898197e-09
*** Example file: worst-case performance of SAGA for Lyapunov function V_t ***
        PEPit guarantee:         V^(1) <= 0.966667 V^(0)
        Theoretical guarantee:   V^(1) <= 0.966667 V^(0)

4.4. Point SAGA

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_point_saga(L, mu, n, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the finite sum minimization problem

\[F^\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]

where \(f_1, \dots, f_n\) are \(L\)-smooth and \(\mu\)-strongly convex, with proximal operators readily available. In the sequel, we use the notation \(\mathbb{E}\) for denoting the expectation over the uniform distribution of the index \(i \sim \mathcal{U}\left([|1, n|]\right)\), e.g., \(F(x)\equiv \mathbb{E}[f_i(x)]\).

This code computes a tight (one-step) worst-case guarantee using a Lyapunov function for Point SAGA [1]. The Lyapunov (or energy) function at a point \(x\) is given in [1, Theorem 5]:

\[V(x) = \frac{1}{L \mu}\frac{1}{n} \sum_{i \leq n} \|\nabla f_i(x) - \nabla f_i(x^\star)\|^2 + \|x - x^\star\|^2,\]

where \(x^\star\) denotes the minimizer of \(F\). The code computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee (in expectation):

\[\mathbb{E}\left[V\left(x^{(1)}\right)\right] \leqslant \tau(n, L, \mu) V\left(x^{(0)}\right),\]

is valid (note that we use the notation \(x^{(0)},x^{(1)}\) to denote two consecutive iterates for convenience; as the bound is valid for all \(x^{(0)}\), it is also valid for any pair of consecutive iterates of the algorithm).

In short, for given values of \(n\), \(L\), and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\mathbb{E}\left[V\left(x^{(1)}\right)\right]\) when \(V\left(x^{(0)}\right) \leqslant 1\).

Algorithm: Point SAGA is described by

\[\begin{split}\begin{eqnarray} \text{Set }\gamma & = & \frac{\sqrt{(n - 1)^2 + 4n\frac{L}{\mu}}}{2Ln} - \frac{\left(1 - \frac{1}{n}\right)}{2L}, \\ \text{Pick random }j & \sim & \mathcal{U}\left([|1, n|]\right), \\ z^{(t)} & = & x^{(t)} + \gamma \left(g_j^{(t)} - \frac{1}{n} \sum_{i\leq n}g_i^{(t)} \right), \\ x^{(t+1)} & = & \mathrm{prox}_{\gamma f_j}(z^{(t)})\triangleq \arg\min_x\left\{ \gamma f_j(x)+\frac{1}{2} \|x-z^{(t)}\|^2 \right\}, \\ g_j^{(t+1)} & = & \frac{1}{\gamma}(z^{(t)} - x^{(t+1)}). \end{eqnarray}\end{split}\]
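
Below is a minimal NumPy sketch of one Point SAGA iteration (not the worst-case instance constructed by PEPit), using quadratic components \(f_i(x) = \frac{c_i}{2}\|x - b_i\|^2\) with \(c_i \in [\mu, L]\) purely so that the proximal step has a closed form; the arrays c, b, and g are assumptions supplied by the user.

import numpy as np

def point_saga_step(x, g, c, b, L, mu, rng=np.random.default_rng()):
    """One Point SAGA step for f_i(x) = (c_i/2) ||x - b_i||^2 (explicit prox).
    x: current iterate; g: array of stored gradients g_i^{(t)}, one row per i."""
    n = len(c)
    gamma = np.sqrt((n - 1) ** 2 + 4 * n * L / mu) / (2 * L * n) - (1 - 1 / n) / (2 * L)
    j = rng.integers(n)                                      # j ~ U([|1, n|])
    z = x + gamma * (g[j] - g.mean(axis=0))
    x_new = (z + gamma * c[j] * b[j]) / (1 + gamma * c[j])   # prox_{gamma f_j}(z) in closed form
    g = g.copy()
    g[j] = (z - x_new) / gamma                               # g_j^{(t+1)}
    return x_new, g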

Theoretical guarantee: A theoretical upper bound is given in [1, Theorem 5].

\[\mathbb{E}\left[V\left(x^{(t+1)}\right)\right] \leqslant \frac{1}{1 + \mu\gamma} V\left(x^{(t)}\right)\]
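
For the parameters used in the example below (\(L=1\), \(\mu=0.01\), \(n=10\)), this bound evaluates to about 0.973292, which is the theoretical value printed by the example; the worst-case value computed by PEPit (0.971405) is slightly smaller.

from math import sqrt

L, mu, n = 1.0, 0.01, 10
gamma = sqrt((n - 1) ** 2 + 4 * n * L / mu) / (2 * L * n) - (1 - 1 / n) / (2 * L)
print(round(1 / (1 + mu * gamma), 6))   # ~0.973292, the theoretical value in the example below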

References:

[1] A. Defazio (2016). A simple practical accelerated method for finite sums. Advances in Neural Information Processing Systems (NIPS), 29, 676-684.

Parameters:
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of functions.

  • wrapper (str) – the name of the wrapper to be used.

  • solver (str) – the name of the solver the wrapper should use.

  • verbose (int) –

    level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + solver details.

Returns:
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_point_saga(L=1, mu=.01, n=10, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 31x31
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 10 function(s)
                        Function 1 : Adding 2 scalar constraint(s) ...
                        Function 1 : 2 scalar constraint(s) added
                        Function 2 : Adding 2 scalar constraint(s) ...
                        Function 2 : 2 scalar constraint(s) added
                        Function 3 : Adding 2 scalar constraint(s) ...
                        Function 3 : 2 scalar constraint(s) added
                        Function 4 : Adding 2 scalar constraint(s) ...
                        Function 4 : 2 scalar constraint(s) added
                        Function 5 : Adding 2 scalar constraint(s) ...
                        Function 5 : 2 scalar constraint(s) added
                        Function 6 : Adding 2 scalar constraint(s) ...
                        Function 6 : 2 scalar constraint(s) added
                        Function 7 : Adding 2 scalar constraint(s) ...
                        Function 7 : 2 scalar constraint(s) added
                        Function 8 : Adding 2 scalar constraint(s) ...
                        Function 8 : 2 scalar constraint(s) added
                        Function 9 : Adding 2 scalar constraint(s) ...
                        Function 9 : 2 scalar constraint(s) added
                        Function 10 : Adding 2 scalar constraint(s) ...
                        Function 10 : 2 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.9714053953788857
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite up to an error of 4.818284021357577e-10
                All the primal scalar constraints are verified up to an error of 4.496377781215699e-08
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 6.574069365446737e-07
(PEPit) Final upper bound (dual): 0.9714053958630251 and lower bound (primal example): 0.9714053953788857
(PEPit) Duality gap: absolute: 4.841393952403905e-10 and relative: 4.983906796724733e-10
*** Example file: worst-case performance of Point SAGA for a given Lyapunov function ***
        PEPit guarantee:         E[V(x^(1))] <= 0.971405 V(x^(0))
        Theoretical guarantee:   E[V(x^(1))] <= 0.973292 V(x^(0))

4.5. Randomized coordinate descent for smooth strongly convex functions

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_randomized_coordinate_descent_smooth_strongly_convex(L, mu, gamma, d, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for randomized block-coordinate descent with step-size \(\gamma\). That is, it computes the smallest possible \(\tau(L, \mu, \gamma, d)\) such that the guarantee

\[\mathbb{E}[\|x_{t+1} - x_\star \|^2] \leqslant \tau(L, \mu, \gamma, d) \|x_t - x_\star\|^2\]

holds, where \(x_\star\) denotes a minimizer of \(f\), for the given fixed step-size \(\gamma\) and number of blocks \(d\). The notation \(\mathbb{E}\) denotes the expectation over the uniform distribution of the index \(i \sim \mathcal{U}\left([|1, d|]\right)\).

In short, for given values of \(\mu\), \(L\), \(d\), and \(\gamma\), \(\tau(L, \mu, \gamma, d)\) is computed as the worst-case value of \(\mathbb{E}[\|x_{t+1} - x_\star \|^2]\) when \(\|x_t - x_\star\|^2 \leqslant 1\).

Algorithm: Randomized block-coordinate descent is described by

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, d|]\right), \\ x_{t+1} & = & x_t - \gamma \nabla_i f(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size and \(\nabla_i f(x_t)\) is the \(i^{\text{th}}\) partial gradient.
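
A minimal NumPy sketch of this update is given below (PEPit handles the partial gradients through a block partition and does not need such an implementation); the coordinates are split into \(d\) contiguous blocks, an arbitrary choice made here only for illustration, and the full-gradient oracle grad_f is an assumption supplied by the user.

import numpy as np

def rcd_step(x, grad_f, gamma, d, rng=np.random.default_rng()):
    """One randomized block-coordinate step: update only the coordinates of block i."""
    blocks = np.array_split(np.arange(x.size), d)   # partition of the coordinates into d blocks
    i = rng.integers(d)                             # i ~ U([|1, d|])
    x_new = x.copy()
    x_new[blocks[i]] -= gamma * grad_f(x)[blocks[i]]
    return x_new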

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Appendix I, Theorem 17]:

\[\mathbb{E}[\|x_{t+1} - x_\star \|^2] \leqslant \rho^2 \|x_t-x_\star\|^2,\]

where \(\rho^2 = \max \left( \frac{(\gamma\mu - 1)^2 + d - 1}{d},\frac{(\gamma L - 1)^2 + d - 1}{d} \right)\).
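
This rate is straightforward to evaluate; for the parameters of the example below (\(L=1\), \(\mu=0.1\), \(\gamma=\frac{2}{L+\mu}\), \(d=2\)) both terms of the maximum coincide and give 0.834711, the value certified by PEPit.

L, mu, d = 1.0, 0.1, 2
gamma = 2 / (L + mu)
rho2 = max(((gamma * mu - 1) ** 2 + d - 1) / d, ((gamma * L - 1) ** 2 + d - 1) / d)
print(round(rho2, 6))   # 0.834711, matching the example below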

References:

[1] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In Conference on Learning Theory (COLT).

Parameters:
  • L (float) – the smoothness parameter.

  • mu (float) – the strong-convexity parameter.

  • gamma (float) – the step-size.

  • d (int) – the number of blocks.

  • wrapper (str) – the name of the wrapper to be used.

  • solver (str) – the name of the solver the wrapper should use.

  • verbose (int) –

    level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + solver details.

Returns:
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> mu = 0.1
>>> gamma = 2 / (mu + L)
>>> pepit_tau, theoretical_tau = wc_randomized_coordinate_descent_smooth_strongly_convex(L=L, mu=mu, gamma=gamma, d=2, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 4x4
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                        Function 1 : Adding 2 scalar constraint(s) ...
                        Function 1 : 2 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Setting up the problem: 1 partition(s) added
                        Partition 1 with 2 blocks: Adding 1 scalar constraint(s)...
                        Partition 1 with 2 blocks: 1 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.8347107438584297
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite
                All the primal scalar constraints are verified up to an error of 1.4183154650737606e-11
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 4.950747594358786e-10
(PEPit) Final upper bound (dual): 0.8347107438665666 and lower bound (primal example): 0.8347107438584297
(PEPit) Duality gap: absolute: 8.136935569780235e-12 and relative: 9.748209939370677e-12
*** Example file: worst-case performance of randomized coordinate gradient descent ***
        PEPit guarantee:         E[||x_(t+1) - x_*||^2] <= 0.834711 ||x_t - x_*||^2
        Theoretical guarantee:   E[||x_(t+1) - x_*||^2] <= 0.834711 ||x_t - x_*||^2

4.6. Randomized coordinate descent for smooth convex functions

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_randomized_coordinate_descent_smooth_convex(L, gamma, d, t, wrapper='cvxpy', solver=None, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for randomized block-coordinate descent with \(d\) blocks and fixed step-size \(\gamma\). That is, it verifies that the Lyapunov function

\[\phi(t, x_t) = (t \gamma \frac{L}{d} + 1)(f(x_t) - f_\star) + \frac{L}{2} \|x_t - x_\star\|^2\]

is non-increasing in expectation along the iterations of the randomized block-coordinate descent algorithm. We use the notation \(\mathbb{E}\) for denoting the expectation over the uniform distribution of the index \(i \sim \mathcal{U}\left([|1, d|]\right)\).

In short, for given values of \(L\), \(d\), and \(\gamma\), it computes the worst-case value of \(\mathbb{E}[\phi(t, x_t)]\) when \(\phi(t-1, x_{t-1}) \leqslant 1\).

Algorithm: Randomized block-coordinate descent is described by

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, d|]\right), \\ x_{t+1} & = & x_t - \gamma \nabla_i f(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size and \(\nabla_i f(x_t)\) is the \(i^{\text{th}}\) partial gradient.

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Appendix I, Theorem 16]:

\[\mathbb{E}[\phi(t, x_t)] \leqslant \phi(t-1, x_{t-1}),\]

where \(\phi(t, x_t) = (t \gamma \frac{L}{d} + 1)(f(x_t) - f_\star) + \frac{L}{2} \|x_t - x_\star\|^2\).
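
The toy check below (not part of PEPit) evaluates \(\phi\) on a small quadratic and averages the update over the \(d\) blocks, confirming numerically that \(\mathbb{E}[\phi(t, x_t)] \leqslant \phi(t-1, x_{t-1})\); the quadratic instance and the contiguous block partition are arbitrary choices made for illustration.

import numpy as np

# Toy instance: f(x) = 0.5 * x^T A x with A diagonal, L-smooth and convex, x_star = 0, f_star = 0.
A = np.array([1.0, 0.4, 0.25, 0.1])           # curvatures in (0, L]
L, d, t = 1.0, 2, 3
gamma = 1.0 / L
f = lambda x: 0.5 * x @ (A * x)
grad = lambda x: A * x
phi = lambda k, x: (k * gamma * L / d + 1) * f(x) + 0.5 * L * x @ x

x_prev = np.array([1.0, -2.0, 0.5, 3.0])
blocks = np.array_split(np.arange(x_prev.size), d)
expected_phi = 0.0
for block in blocks:                          # exact expectation: average over the d blocks
    x_next = x_prev.copy()
    x_next[block] -= gamma * grad(x_prev)[block]
    expected_phi += phi(t, x_next) / d
print(expected_phi <= phi(t - 1, x_prev))     # True: phi decreases in expectation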

References:

[1] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In Conference on Learning Theory (COLT).

Parameters:
  • L (float) – the smoothness parameter.

  • gamma (float) – the step-size.

  • d (int) – the number of blocks.

  • t (int) – number of iterations.

  • wrapper (str) – the name of the wrapper to be used.

  • solver (str) – the name of the solver the wrapper should use.

  • verbose (int) –

    level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + solver details.

Returns:
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_randomized_coordinate_descent_smooth_convex(L=L, gamma=1 / L, d=2, t=4, wrapper="cvxpy", solver=None, verbose=1)
(PEPit) Setting up the problem: size of the Gram matrix: 6x6
(PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                        Function 1 : Adding 12 scalar constraint(s) ...
                        Function 1 : 12 scalar constraint(s) added
(PEPit) Setting up the problem: additional constraints for 0 function(s)
(PEPit) Setting up the problem: 1 partition(s) added
                        Partition 1 with 2 blocks: Adding 1 scalar constraint(s)...
                        Partition 1 with 2 blocks: 1 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 1.0000000021855517
(PEPit) Primal feasibility check:
                The solver found a Gram matrix that is positive semi-definite up to an error of 4.888278544731664e-09
                All the primal scalar constraints are verified up to an error of 8.385744333248845e-09
(PEPit) Dual feasibility check:
                The solver found a residual matrix that is positive semi-definite
                All the dual scalar values associated with inequality constraints are nonnegative
(PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 6.347767642940552e-08
(PEPit) Final upper bound (dual): 1.0000000024690172 and lower bound (primal example): 1.0000000021855517
(PEPit) Duality gap: absolute: 2.8346547331636884e-10 and relative: 2.834654726968404e-10
*** Example file: worst-case performance of randomized  coordinate gradient descent ***
        PEPit guarantee:         E[phi(t, x_t)] <= 1.0 phi(t-1, x_(t-1))
        Theoretical guarantee:   E[phi(t, x_t)] <= 1.0 phi(t-1, x_(t-1))