7. Potential functions

7.1. Gradient descent Lyapunov 1

PEPit.examples.potential_functions.wc_gradient_descent_lyapunov_1(L, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code verifies a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it verifies that the Lyapunov (or potential/energy) function

\[V_n \triangleq n (f(x_n) - f_\star) + \frac{L}{2} \|x_n - x_\star\|^2\]

is decreasing along all trajectories and for all smooth convex functions \(f\) (i.e., in the worst case):

\[V_{n+1} \leqslant V_n,\]

where \(x_{n+1}\) is obtained by taking one gradient step from \(x_{n}\) with fixed step-size \(\gamma=\frac{1}{L}\).

Algorithm: One iteration of gradient descent is described by

\[x_{n+1} = x_n - \gamma \nabla f(x_n),\]

where \(\gamma\) is a step-size.
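
For intuition, the decrease of \(V_n\) can also be checked numerically outside of PEPit. The following minimal NumPy sketch is an illustration only; it assumes a quadratic test function \(f(x) = \frac{1}{2} x^\top A x\) (so that \(f_\star = 0\) and \(x_\star = 0\)), performs one gradient step with \(\gamma = \frac{1}{L}\), and compares \(V_{n+1}\) to \(V_n\):

    import numpy as np

    # Illustration only (not part of PEPit): check V_{n+1} <= V_n for one gradient
    # step on an L-smooth convex quadratic f(x) = 0.5 * x^T A x (f_star = 0, x_star = 0).
    rng = np.random.default_rng(0)
    d, n = 5, 10
    M = rng.standard_normal((d, d))
    A = M.T @ M                            # positive semidefinite Hessian
    L = np.linalg.eigvalsh(A).max()        # smoothness constant
    gamma = 1 / L

    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x
    V = lambda k, x: k * f(x) + L / 2 * (x @ x)   # potential V_k at iterate x

    x_n = rng.standard_normal(d)
    x_next = x_n - gamma * grad(x_n)              # one gradient step
    print(V(n + 1, x_next) - V(n, x_n) <= 0)      # True: the potential decreases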

Theoretical guarantee: The theoretical guarantee can be found, e.g., in [1, Theorem 3.3]:

\[V_{n+1} - V_n \leqslant 0,\]

when \(\gamma=\frac{1}{L}\).
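
For completeness, here is a short sketch of the standard potential argument for this guarantee when \(\gamma=\frac{1}{L}\), in the spirit of [1]. The descent lemma, convexity, and the expansion of the squared distance give

\[f(x_{n+1}) \leqslant f(x_n) - \frac{1}{2L}\|\nabla f(x_n)\|^2, \qquad f(x_n) - f_\star \leqslant \langle \nabla f(x_n), x_n - x_\star \rangle,\]

\[\frac{L}{2}\|x_{n+1} - x_\star\|^2 = \frac{L}{2}\|x_n - x_\star\|^2 - \langle \nabla f(x_n), x_n - x_\star \rangle + \frac{1}{2L}\|\nabla f(x_n)\|^2.\]

Weighting the first inequality by \(n+1\), the second by \(1\), and adding the last identity yields

\[V_{n+1} - V_n \leqslant -\frac{n}{2L}\|\nabla f(x_n)\|^2 \leqslant 0.\]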

References: The detailed potential function can be found in [1], and the SDP approach can be found in [2].

[1] N. Bansal, A. Gupta (2019). Potential-function proofs for gradient methods. Theory of Computing, 15(1), 1-32.

[2] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – the step-size.

  • n (int) – current iteration number.

  • verbose (int) –

    Level of information details to print.

    • -1: No output at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Examples

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_lyapunov_1(L=L, gamma=1 / L, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 6 scalar constraint(s) ...
                 function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 3.3902995517363515e-18
*** Example file: worst-case performance of gradient descent with fixed step-size for a given Lyapunov function***
        PEPit guarantee:        V_(n+1) - V_(n) <= 3.3903e-18
        Theoretical guarantee:  V_(n+1) - V_(n) <= 0.0

7.2. Gradient descent Lyapunov 2

PEPit.examples.potential_functions.wc_gradient_descent_lyapunov_2(L, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code verifies a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it verifies that the Lyapunov (or potential/energy) function

\[V_n \triangleq (2n + 1) L \left(f(x_n) - f_\star\right) + n(n+2) \|\nabla f(x_n)\|^2 + L^2 \|x_n - x_\star\|^2\]

is decreasing along all trajectories and for all smooth convex functions \(f\) (i.e., in the worst case):

\[V_{n+1} \leqslant V_n,\]

where \(x_{n+1}\) is obtained by taking one gradient step from \(x_{n}\) with fixed step-size \(\gamma=\frac{1}{L}\).

Algorithm: One iteration of gradient descent is described by

\[x_{n+1} = x_n - \gamma \nabla f(x_n),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: The theoretical guarantee can be found in [1, Theorem 3]:

\[V_{n+1} - V_n \leqslant 0,\]

when \(\gamma=\frac{1}{L}\).
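
As in the previous example, the decrease can be observed numerically with a minimal NumPy sketch (illustration only, not using PEPit, and assuming a quadratic test function \(f(x) = \frac{1}{2} x^\top A x\) so that \(f_\star = 0\) and \(x_\star = 0\)):

    import numpy as np

    # Illustration only: evaluate the potential of this section before and after one
    # gradient step with gamma = 1/L on an L-smooth convex quadratic (f_star = 0, x_star = 0).
    rng = np.random.default_rng(1)
    d, n = 5, 10
    M = rng.standard_normal((d, d))
    A = M.T @ M
    L = np.linalg.eigvalsh(A).max()
    gamma = 1 / L

    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x

    def V(k, x):
        # (2k+1) L (f(x) - f_star) + k(k+2) ||grad f(x)||^2 + L^2 ||x - x_star||^2
        g = grad(x)
        return (2 * k + 1) * L * f(x) + k * (k + 2) * (g @ g) + L ** 2 * (x @ x)

    x_n = rng.standard_normal(d)
    x_next = x_n - gamma * grad(x_n)           # one gradient step
    print(V(n + 1, x_next) - V(n, x_n) <= 0)   # True: the potential decreases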

References: The detailed potential function and SDP approach can be found in [1].

[1] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – the step-size.

  • n (int) – current iteration number.

  • verbose (int) –

    Level of information details to print.

    • -1: No output at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Examples

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_lyapunov_2(L=L, gamma=1 / L, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 6 scalar constraint(s) ...
                 function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 1.894425729310791e-17
*** Example file: worst-case performance of gradient descent with fixed step size for a given Lyapunov function***
        PEPit guarantee:        V_(n+1) - V_(n) <= 1.89443e-17
        Theoretical guarantee:  V_(n+1) - V_(n) <= 0.0

7.3. Accelerated gradient method

PEPit.examples.potential_functions.wc_accelerated_gradient_method(L, gamma, lam, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code verifies a worst-case guarantee for an accelerated gradient method. That is, it verifies that the Lyapunov (or potential/energy) function

\[V_n \triangleq \lambda_n^2 (f(x_n) - f_\star) + \frac{L}{2} \|z_n - x_\star\|^2\]

is decreasing along all trajectories and for all smooth convex functions \(f\) (i.e., in the worst case):

\[V_{n+1} \leqslant V_n,\]

where \(x_{n+1}\), \(z_{n+1}\), and \(\lambda_{n+1}\) are obtained from one iteration of the accelerated gradient method below, from some arbitrary \(x_{n}\), \(z_{n}\), and \(\lambda_{n}\).

Algorithm: One iteration of the accelerated gradient method is described by

\[\begin{split}\begin{eqnarray}
\lambda_{n+1} & = & \frac{1}{2} \left(1 + \sqrt{4\lambda_n^2 + 1}\right), \\
\tau_n & = & \frac{1}{\lambda_{n+1}}, \\
\eta_n & = & \frac{\lambda_{n+1}^2 - \lambda_{n}^2}{L}, \\
y_n & = & (1 - \tau_n) x_n + \tau_n z_n, \\
z_{n+1} & = & z_n - \eta_n \nabla f(y_n), \\
x_{n+1} & = & y_n - \gamma \nabla f(y_n).
\end{eqnarray}\end{split}\]
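
To make the iteration concrete, here is a minimal NumPy sketch (illustration only, not using PEPit, and assuming a quadratic test function \(f(x) = \frac{1}{2} x^\top A x\) with \(f_\star = 0\) and \(x_\star = 0\)) that runs one such iteration from arbitrary \(x_n\), \(z_n\), \(\lambda_n\) and evaluates the potential before and after:

    import numpy as np

    # Illustration only: one accelerated gradient iteration with gamma = 1/L on an
    # L-smooth convex quadratic f(x) = 0.5 * x^T A x (f_star = 0, x_star = 0).
    rng = np.random.default_rng(2)
    d, lam = 5, 10.
    M = rng.standard_normal((d, d))
    A = M.T @ M
    L = np.linalg.eigvalsh(A).max()
    gamma = 1 / L

    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x
    V = lambda lam_k, x, z: lam_k ** 2 * f(x) + L / 2 * (z @ z)   # potential of this section

    x_n, z_n = rng.standard_normal(d), rng.standard_normal(d)     # arbitrary starting points

    lam_next = (1 + np.sqrt(4 * lam ** 2 + 1)) / 2
    tau = 1 / lam_next
    eta = (lam_next ** 2 - lam ** 2) / L

    y_n = (1 - tau) * x_n + tau * z_n
    z_next = z_n - eta * grad(y_n)
    x_next = y_n - gamma * grad(y_n)

    print(V(lam_next, x_next, z_next) - V(lam, x_n, z_n) <= 0)    # True: the potential decreases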

Theoretical guarantee: The following worst-case guarantee can be found, e.g., in [2, Theorem 5.3]:

\[V_{n+1} - V_n \leqslant 0,\]

when \(\gamma=\frac{1}{L}\).

References: The potential can be found in the historical reference [1], and in more recent works, e.g., [2, 3].

[1] Y. Nesterov (1983). A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR (Vol. 269, pp. 543-547). http://www.mathnet.ru/links/9bcb158ed2df3d8db3532aafd551967d/dan46009.pdf

[2] N. Bansal, A. Gupta (2019). Potential-function proofs for gradient methods. Theory of Computing, 15(1), 1-32.

[3] A. d’Aspremont, D. Scieur, A. Taylor (2021). Acceleration Methods. Foundations and Trends in Optimization: Vol. 5, No. 1-2.

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – the step-size.

  • lam (float) – the initial value for sequence \((\lambda_t)_t\).

  • verbose (int) –

    Level of information details to print.

    • -1: No output at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Examples

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_method(L=L, gamma=1 / L, lam=10., verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 12 scalar constraint(s) ...
                 function 1 : 12 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 5.264872499157039e-14
*** Example file: worst-case performance of accelerated gradient method for a given Lyapunov function***
        PEPit guarantee:         V_(n+1) - V_n <= 5.26487e-14
        Theoretical guarantee:   V_(n+1) - V_n <= 0.0