4. Stochastic and randomized convex minimization

4.1. Stochastic gradient descent

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_sgd(L, mu, gamma, v, R, n, verbose=1)[source]

Consider the finite sum minimization problem

\[F_\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]

where \(f_1, ..., f_n\) are \(L\)-smooth and \(\mu\)-strongly convex. In addition, we assume a bounded variance at the optimal point (which is denoted by \(x_\star\)):

\[\mathbb{E}\left[\|\nabla f_i(x_\star)\|^2\right] = \frac{1}{n} \sum_{i=1}^n\|\nabla f_i(x_\star)\|^2 \leqslant v^2.\]

This code computes a worst-case guarantee for one step of stochastic gradient descent (SGD), in expectation, for the distance to an optimal point. That is, it computes the smallest possible \(\tau(L, \mu, \gamma, v, R, n)\) such that

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \tau(L, \mu, \gamma, v, R, n)\]

where \(\|x_0 - x_\star\|^2 \leqslant R^2\), where \(v\) is the variance at \(x_\star\), and where \(x_1\) is the output of one step of SGD (note that we use the notation \(x_0,x_1\) to denote two consecutive iterates for convenience; as the bound is valid for all \(x_0\), it is also valid for any pair of consecutive iterates of the algorithm).

Algorithm: One iteration of SGD is described by:

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, n|]\right), \\ x_{t+1} & = & x_t - \gamma \nabla f_{i}(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size.
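
For concreteness, here is a minimal NumPy sketch of this iteration (illustrative only, not part of PEPit; the gradient oracles grad_fs and the random generator are assumptions):

>>> import numpy as np
>>> def sgd_step(x, grad_fs, gamma, rng=np.random.default_rng(0)):
...     # grad_fs[i](x) is assumed to return the gradient of f_i at x
...     i = rng.integers(len(grad_fs))      # pick i uniformly in {0, ..., n-1}
...     return x - gamma * grad_fs[i](x)    # x_{t+1} = x_t - gamma * grad f_i(x_t)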

Theoretical guarantee: An empirically tight one-iteration guarantee is provided in the code of PESTO [1]:

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \frac{1}{2}\left(1-\frac{\mu}{L}\right)^2 R^2 + \frac{1}{2}\left(1-\frac{\mu}{L}\right) R \sqrt{\left(1-\frac{\mu}{L}\right)^2 R^2 + 4\frac{v^2}{L^2}} + \frac{v^2}{L^2},\]

when \(\gamma=\frac{1}{L}\). Note that, for this particular setting, the observed guarantee does not depend on the number \(n\) of functions, implying that it is also valid for expectation minimization settings (i.e., when \(n\) goes to infinity).
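
As a sanity check, evaluating this closed-form bound with the parameters used in the example below (\(\mu=0.1\), \(L=1\), \(\gamma=1/L\), \(v=1\), \(R=2\)) recovers the worst-case value reported by PEPit:

>>> from math import sqrt
>>> L, mu, v, R = 1., 0.1, 1., 2.
>>> rho = 1 - mu / L
>>> tau = 0.5 * rho ** 2 * R ** 2 + 0.5 * rho * R * sqrt(rho ** 2 * R ** 2 + 4 * v ** 2 / L ** 2) + v ** 2 / L ** 2  # ~5.04165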

References: The empirically tight guarantee is provided in the code of [1]. The use of SDPs for analyzing SGD-type methods was proposed in [2, 3].

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).

[2] B. Hu, P. Seiler, L. Lessard (2020). Analysis of biased stochastic gradient descent using sequential semidefinite programs. Mathematical programming (to appear).

[3] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – the step-size.

  • v (float) – the variance bound (the expected squared gradient norm at \(x_\star\) is bounded by \(v^2\)).

  • R (float) – the upper bound on the initial distance \(\|x_0 - x_\star\|\).

  • n (int) – number of functions.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> mu = 0.1
>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_sgd(L=L, mu=mu, gamma=gamma, v=1, R=2, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 11x11
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 5 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
                 function 2 : Adding 2 scalar constraint(s) ...
                 function 2 : 2 scalar constraint(s) added
                 function 3 : Adding 2 scalar constraint(s) ...
                 function 3 : 2 scalar constraint(s) added
                 function 4 : Adding 2 scalar constraint(s) ...
                 function 4 : 2 scalar constraint(s) added
                 function 5 : Adding 2 scalar constraint(s) ...
                 function 5 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 5.041652328250217
*** Example file: worst-case performance of stochastic gradient descent with fixed step-size ***
        PEPit guarantee:         E[||x_1 - x_*||^2] <= 5.04165
        Theoretical guarantee:   E[||x_1 - x_*||^2] <= 5.04165

4.2. Stochastic gradient descent in overparametrized setting

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_sgd_overparametrized(L, mu, gamma, n, verbose=1)[source]

Consider the finite sum minimization problem

\[F_\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]

where \(f_1, ..., f_n\) are \(L\)-smooth and \(\mu\)-strongly convex. In addition, we assume a zero variance at the optimal point (which is denoted by \(x_\star\)):

\[\mathbb{E}\left[\|\nabla f_i(x_\star)\|^2\right] = \frac{1}{n} \sum_{i=1}^n \|\nabla f_i(x_\star)\|^2 = 0,\]

which happens for example in machine learning in the interpolation regime, that is, when there exists a model \(x_\star\) such that the loss \(\mathcal{L}\) on every observation is zero: \(f_i(x_\star) = \mathcal{L}(x_\star, z_i) = 0\) for all \(i \in [|1, n|]\).

This code computes a worst-case guarantee for one step of stochastic gradient descent (SGD), in expectation, for the distance to an optimal point. That is, it computes the smallest possible \(\tau(L, \mu, \gamma, n)\) such that

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \tau(L, \mu, \gamma, n) \|x_0 - x_\star\|^2\]

is valid, where \(x_1\) is the output of one step of SGD.

Algorithm: One iteration of SGD is described by:

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, n|]\right), \\ x_{t+1} & = & x_t - \gamma \nabla f_{i}(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size.

Theoretical guarantee: An empirically tight one-iteration guarantee is provided in the code of PESTO [1]:

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \left(1-\frac{\mu}{L}\right)^2 \|x_0-x_\star\|^2,\]

when \(\gamma=\frac{1}{L}\). Note that, for this particular setting, the observed guarantee does not depend on the number \(n\) of functions, implying that it is also valid for expectation minimization settings (i.e., when \(n\) goes to infinity).
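
With the parameters used in the example below (\(\mu=0.1\), \(L=1\), \(\gamma=1/L\)), this contraction factor evaluates to the value reported by PEPit:

>>> L, mu = 1., 0.1
>>> tau = (1 - mu / L) ** 2   # 0.81, matching the guarantee below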

References: The empirically tight guarantee is provided in the code of [1]. The use of SDPs for analyzing SGD-type methods was proposed in [2, 3].

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).

[2] B. Hu, P. Seiler, L. Lessard (2020). Analysis of biased stochastic gradient descent using sequential semidefinite programs. Mathematical programming (to appear).

[3] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – the step-size.

  • n (int) – number of functions.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> mu = 0.1
>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_sgd_overparametrized(L=L, mu=mu, gamma=gamma, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 11x11
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 5 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
                 function 2 : Adding 2 scalar constraint(s) ...
                 function 2 : 2 scalar constraint(s) added
                 function 3 : Adding 2 scalar constraint(s) ...
                 function 3 : 2 scalar constraint(s) added
                 function 4 : Adding 2 scalar constraint(s) ...
                 function 4 : 2 scalar constraint(s) added
                 function 5 : Adding 2 scalar constraint(s) ...
                 function 5 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8099999999798264
*** Example file: worst-case performance of stochastic gradient descent with fixed step-size and with zero variance at the optimal point ***
        PEPit guarantee:         E[||x_1 - x_*||^2] <= 0.81 ||x_0 - x_*||^2
        Theoretical guarantee:   E[||x_1 - x_*||^2] <= 0.81 ||x_0 - x_*||^2

4.3. SAGA

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_saga(L, mu, n, verbose=1)[source]

Consider the finite sum convex minimization problem

\[F_\star \triangleq \min_x \left\{F(x) \equiv h(x) + \frac{1}{n} \sum_{i=1}^{n} f_i(x)\right\},\]

where the functions \(f_i\) are assumed to be \(L\)-smooth and \(\mu\)-strongly convex, and \(h\) is closed, proper, and convex with a proximal operator readily available.

This code computes the exact rate of a Lyapunov (or energy) function for SAGA [1]. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that this Lyapunov function decreases geometrically:

\[\mathbb{E}[V^{(1)}] \leqslant \tau(n, L, \mu) V^{(0)},\]

where the value of the Lyapunov function at iteration \(t\) is denoted by \(V^{(t)}\) and is defined as

\[V^{(t)} \triangleq \frac{1}{n} \sum_{i=1}^n \left(f_i(\phi_i^{(t)}) - f_i(x^\star) - \langle \nabla f_i(x^\star); \phi_i^{(t)} - x^\star\rangle\right) + \frac{1}{2 n \gamma (1-\mu \gamma)} \|x^{(t)} - x^\star\|^2,\]

with \(\gamma = \frac{1}{2(\mu n+L)}\) (this Lyapunov function was proposed in [1, Theorem 1]). We consider the case \(t=0\) in the code below, without loss of generality.

In short, for given values of \(n\), \(L\), and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\mathbb{E}[V^{(1)}]\) when \(V^{(0)} \leqslant 1\).

Algorithm: One iteration of SAGA [1] is described as follows: at iteration \(t\), pick \(j\in\{1,\ldots,n\}\) uniformly at random and set:

\[\begin{split}\begin{eqnarray} \phi_j^{(t+1)} & = & x^{(t)}, \\ w^{(t+1)} & = & x^{(t)} - \gamma \left[ \nabla f_j (\phi_j^{(t+1)}) - \nabla f_j(\phi_j^{(t)}) + \frac{1}{n} \sum_{i=1}^n \nabla f_i(\phi_i^{(t)})\right], \\ x^{(t+1)} & = & \mathrm{prox}_{\gamma h} (w^{(t+1)}) \triangleq \arg\min_x \left\{ \gamma h(x)+\frac{1}{2}\|x-w^{(t+1)}\|^2\right\}. \end{eqnarray}\end{split}\]
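
A minimal NumPy sketch of this iteration (illustrative only, not part of PEPit; the stored gradient table phi_grads, the gradient oracles grad_fs, and the proximal operator prox_h are assumptions):

>>> import numpy as np
>>> def saga_step(x, phi_grads, grad_fs, prox_h, gamma, rng=np.random.default_rng(0)):
...     # phi_grads is an (n, dim) array storing the gradients of f_i at phi_i^{(t)}
...     j = rng.integers(len(grad_fs))                # pick j uniformly at random
...     g_new = grad_fs[j](x)                         # gradient of f_j at phi_j^{(t+1)} = x^{(t)}
...     w = x - gamma * (g_new - phi_grads[j] + phi_grads.mean(axis=0))
...     phi_grads[j] = g_new                          # update the stored gradient of f_j
...     return prox_h(w, gamma)                       # x^{(t+1)} = prox_{gamma h}(w^{(t+1)})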

Theoretical guarantee: The following upper bound (empirically tight) can be found in [1, Theorem 1]:

\[\mathbb{E}[V^{(t+1)}] \leqslant \left(1-\gamma\mu \right)V^{(t)}\]
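
For the parameters used in the example below (\(L=1\), \(\mu=0.1\), \(n=5\)), the step-size and rate of [1, Theorem 1] evaluate to:

>>> L, mu, n = 1., 0.1, 5
>>> gamma = 1 / (2 * (mu * n + L))    # = 1/3
>>> rate = 1 - gamma * mu             # ~0.966667, matching the theoretical guarantee below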

References:

[1] A. Defazio, F. Bach, S. Lacoste-Julien (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems (NIPS).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of functions.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_saga(L=1, mu=.1, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 27x27
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 6 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
                 function 2 : Adding 6 scalar constraint(s) ...
                 function 2 : 6 scalar constraint(s) added
                 function 3 : Adding 6 scalar constraint(s) ...
                 function 3 : 6 scalar constraint(s) added
                 function 4 : Adding 6 scalar constraint(s) ...
                 function 4 : 6 scalar constraint(s) added
                 function 5 : Adding 6 scalar constraint(s) ...
                 function 5 : 6 scalar constraint(s) added
                 function 6 : Adding 6 scalar constraint(s) ...
                 function 6 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9666748513396348
*** Example file: worst-case performance of SAGA for Lyapunov function V_t ***
        PEPit guarantee:         V^(1) <= 0.966675 V^(0)
        Theoretical guarantee:   V^(1) <= 0.966667 V^(0)

4.4. Point SAGA

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_point_saga(L, mu, n, verbose=1)[source]

Consider the finite sum minimization problem

\[F^\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]

where \(f_1, \dots, f_n\) are \(L\)-smooth and \(\mu\)-strongly convex, and their proximal operators are readily available.

This code computes a tight (one-step) worst-case guarantee using a Lyapunov function for Point SAGA [1]. The Lyapunov (or energy) function at a point \(x\) is given in [1, Theorem 5]:

\[V(x) = \frac{1}{L \mu}\frac{1}{n} \sum_{i \leq n} \|\nabla f_i(x) - \nabla f_i(x^\star)\|^2 + \|x - x^\star\|^2,\]

where \(x^\star\) denotes the minimizer of \(F\). The code computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee (in expectation):

\[\mathbb{E}\left[V\left(x^{(1)}\right)\right] \leqslant \tau(n, L, \mu) V\left(x^{(0)}\right),\]

is valid (note that we use the notation \(x^{(0)},x^{(1)}\) to denote two consecutive iterates for convenience; as the bound is valid for all \(x^{(0)}\), it is also valid for any pair of consecutive iterates of the algorithm).

In short, for given values of \(n\), \(L\), and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\mathbb{E}\left[V\left(x^{(1)}\right)\right]\) when \(V\left(x^{(0)}\right) \leqslant 1\).

Algorithm: Point SAGA is described by

\[\begin{split}\begin{eqnarray} \text{Set }\gamma & = & \frac{\sqrt{(n - 1)^2 + 4n\frac{L}{\mu}}}{2Ln} - \frac{\left(1 - \frac{1}{n}\right)}{2L} \\ \text{Pick random }j & \sim & \mathcal{U}\left([|1, n|]\right) \\ z^{(t)} & = & x^{(t)} + \gamma \left(g_j^{(t)} - \frac{1}{n} \sum_{i\leq n}g_i^{(t)} \right), \\ x^{(t+1)} & = & \mathrm{prox}_{\gamma f_j}(z^{(t)})\triangleq \arg\min_x\left\{ \gamma f_j(x)+\frac{1}{2} \|x-z^{(t)}\|^2 \right\}, \\ g_j^{(t+1)} & = & \frac{1}{\gamma}(z^{(t)} - x^{(t+1)}). \end{eqnarray}\end{split}\]
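
A minimal NumPy sketch of this iteration (illustrative only, not part of PEPit; the gradient table g_table and the proximal oracles prox_fs are assumptions):

>>> import numpy as np
>>> def point_saga_step(x, g_table, prox_fs, gamma, rng=np.random.default_rng(0)):
...     # g_table is an (n, dim) array storing the gradient estimates g_i^{(t)}
...     j = rng.integers(len(prox_fs))                 # pick j uniformly at random
...     z = x + gamma * (g_table[j] - g_table.mean(axis=0))
...     x_new = prox_fs[j](z, gamma)                   # x^{(t+1)} = prox_{gamma f_j}(z^{(t)})
...     g_table[j] = (z - x_new) / gamma               # g_j^{(t+1)}
...     return x_new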

Theoretical guarantee: A theoretical upper bound is given in [1, Theorem 5]:

\[\mathbb{E}\left[V\left(x^{(t+1)}\right)\right] \leqslant \frac{1}{1 + \mu\gamma} V\left(x^{(t)}\right)\]
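
For the parameters used in the example below (\(L=1\), \(\mu=0.01\), \(n=10\)), the step-size and the corresponding rate evaluate to:

>>> from math import sqrt
>>> L, mu, n = 1., 0.01, 10
>>> gamma = sqrt((n - 1) ** 2 + 4 * n * L / mu) / (2 * L * n) - (1 - 1 / n) / (2 * L)
>>> rate = 1 / (1 + mu * gamma)       # ~0.973292, matching the theoretical guarantee below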

References:

[1] A. Defazio (2016). A simple practical accelerated method for finite sums. Advances in Neural Information Processing Systems (NIPS), 29, 676-684.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of functions.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_point_saga(L=1, mu=.01, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 31x31
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 10 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
                 function 2 : Adding 2 scalar constraint(s) ...
                 function 2 : 2 scalar constraint(s) added
                 function 3 : Adding 2 scalar constraint(s) ...
                 function 3 : 2 scalar constraint(s) added
                 function 4 : Adding 2 scalar constraint(s) ...
                 function 4 : 2 scalar constraint(s) added
                 function 5 : Adding 2 scalar constraint(s) ...
                 function 5 : 2 scalar constraint(s) added
                 function 6 : Adding 2 scalar constraint(s) ...
                 function 6 : 2 scalar constraint(s) added
                 function 7 : Adding 2 scalar constraint(s) ...
                 function 7 : 2 scalar constraint(s) added
                 function 8 : Adding 2 scalar constraint(s) ...
                 function 8 : 2 scalar constraint(s) added
                 function 9 : Adding 2 scalar constraint(s) ...
                 function 9 : 2 scalar constraint(s) added
                 function 10 : Adding 2 scalar constraint(s) ...
                 function 10 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9714053941143999
*** Example file: worst-case performance of Point SAGA for a given Lyapunov function ***
        PEPit guarantee:         E[V(x^(1))] <= 0.971405 V(x^(0))
        Theoretical guarantee:   E[V(x^(1))] <= 0.973292 V(x^(0))

4.5. Randomized coordinate descent for smooth strongly convex functions

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_randomized_coordinate_descent_smooth_strongly_convex(L, mu, gamma, d, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for randomized block-coordinate descent with step-size \(\gamma\). That is, it computes the smallest possible \(\tau(L, \mu, \gamma, d)\) such that the guarantee

\[\mathbb{E}_i[\|x_{t+1}^{(i)} - x_\star \|^2] \leqslant \tau(L, \mu, \gamma, d) \|x_{t} - x_\star\|^2\]

is valid, where \(x_{t+1}^{(i)}\) denotes the value of the iterate \(x_{t+1}\) in the scenario where the \(i\)-th block of coordinates is selected for the update with fixed step-size \(\gamma\), where \(d\) is the number of blocks of coordinates, and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(\mu\), \(L\), \(d\), and \(\gamma\), \(\tau(L, \mu, \gamma, d)\) is computed as the worst-case value of \(\mathbb{E}_i[\|x_{t+1}^{(i)} - x_\star \|^2]\) when \(\|x_t - x_\star\|^2 \leqslant 1\).

Algorithm: Randomized block-coordinate descent is described by

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, d|]\right), \\ x_{t+1}^{(i)} & = & x_t - \gamma \nabla_i f(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size and \(\nabla_i f(x_t)\) is the partial derivative corresponding to the block \(i\).
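
A minimal NumPy sketch of this iteration (illustrative only, not part of PEPit; the partition blocks of the coordinates and the full-gradient oracle grad_f are assumptions):

>>> import numpy as np
>>> def coordinate_descent_step(x, grad_f, blocks, gamma, rng=np.random.default_rng(0)):
...     # blocks is a list of index arrays partitioning the coordinates into d blocks
...     i = rng.integers(len(blocks))                      # pick a block uniformly at random
...     x_new = x.copy()
...     x_new[blocks[i]] -= gamma * grad_f(x)[blocks[i]]   # update only the i-th block
...     return x_new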

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Appendix I, Theorem 17]:

\[\mathbb{E}_i[\|x_{t+1}^{(i)} - x_\star \|^2] \leqslant \rho^2 \|x_t-x_\star\|^2,\]

where \(\rho^2 = \max \left( \frac{(\gamma\mu - 1)^2 + d - 1}{d},\frac{(\gamma L - 1)^2 + d - 1}{d} \right)\).
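
With the parameters used in the example below (\(L=1\), \(\mu=0.1\), \(\gamma=2/(L+\mu)\), \(d=2\)), this expression evaluates to the value reported by PEPit:

>>> L, mu, d = 1., 0.1, 2
>>> gamma = 2 / (mu + L)
>>> rho2 = max(((gamma * mu - 1) ** 2 + d - 1) / d, ((gamma * L - 1) ** 2 + d - 1) / d)   # ~0.834711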

References:

[1] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – the step-size.

  • d (int) – the number of blocks of coordinates.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> mu = 0.1
>>> gamma = 2 / (mu + L)
>>> pepit_tau, theoretical_tau = wc_randomized_coordinate_descent_smooth_strongly_convex(L=L, mu=mu, gamma=gamma, d=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (3 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8347107377149059
*** Example file: worst-case performance of randomized coordinate gradient descent ***
        PEPit guarantee:         E||x_(n+1) - x_*||^2 <= 0.834711 ||x_n - x_*||^2
        Theoretical guarantee:   E||x_(n+1) - x_*||^2 <= 0.834711 ||x_n - x_*||^2

4.6. Randomized coordinate descent for smooth convex functions

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_randomized_coordinate_descent_smooth_convex(L, gamma, d, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is convex and \(L\)-smooth.

This code computes a worst-case guarantee for randomized block-coordinate descent with fixed step-size \(\gamma\). That is, it verifies that the inequality holds (the expectation is over the index of the block of coordinates that is randomly selected)

\[\mathbb{E}_i[\phi_{t+1}(x_{t+1}^{(i)})] \leqslant \phi_t(x_{t}),\]

where \(x_{t+1}^{(i)}\) denotes the value of the iterate \(x_{t+1}\) in the scenario where the \(i\)-th block of coordinates is selected for the update with fixed step-size \(\gamma\), and \(d\) is the number of blocks of coordinates.

In short, for given values of \(L\), \(d\), and \(\gamma\), it computes the worst-case value of \(\mathbb{E}_i[\phi_{t+1}(x_{t+1}^{(i)})]\) when \(\phi_t(x_{t}) \leqslant 1\).

Algorithm: Randomized block-coordinate descent is described by

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, d|]\right), \\ x_{t+1}^{(i)} & = & x_t - \gamma \nabla_i f(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size and \(\nabla_i f(x_t)\) is the partial derivative corresponding to the block \(i\).

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Appendix I, Theorem 16]:

\[\mathbb{E}_i[\phi_{t+1}(x^{(i)}_{t+1})] \leqslant \phi_t(x_{t}),\]

where \(\phi_t(x_t) = d_t (f(x_t) - f_\star) + \frac{L}{2} \|x_t - x_\star\|^2\), \(d_{t+1} = d_t + \frac{\gamma L}{d}\), and \(d_t \geqslant 1\).
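
The coefficient \(d_t\) of the potential thus grows by \(\frac{\gamma L}{d}\) at every iteration; a tiny sketch with the parameters of the example below (\(L=1\), \(\gamma=1/L\), \(d=2\), \(n=4\)), assuming \(d_0 = 1\):

>>> L, d, gamma, n = 1., 2, 1., 4
>>> d_t = 1.                       # d_0 >= 1
>>> for _ in range(n):
...     d_t += gamma * L / d       # d_{t+1} = d_t + gamma * L / d
>>> d_t                            # 3.0 after n = 4 iterations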

References:

[1] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – the step-size.

  • d (int) – the number of blocks of coordinates.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_randomized_coordinate_descent_smooth_convex(L=L, gamma=1 / L, d=2, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 12x12
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (9 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 42 scalar constraint(s) ...
                 function 1 : 42 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9999978377393944
*** Example file: worst-case performance of randomized  coordinate gradient descent ***
        PEPit guarantee:         E[phi_(n+1)(x_(n+1))] <= 0.999998 phi_n(x_n)
        Theoretical guarantee:   E[phi_(n+1)(x_(n+1))] <= 1.0 phi_n(x_n)