2. Composite convex minimization

2.1. Proximal gradient

PEPit.examples.composite_convex_minimization.wc_proximal_gradient(L, mu, gamma, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is \(L\)-smooth and \(\mu\)-strongly convex, and where \(f_2\) is closed convex and proper.

This code computes a worst-case guarantee for the proximal gradient method (PGM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[\|x_n - x_\star\|^2 \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2,\]

is valid, where \(x_n\) is the output of the proximal gradient, and where \(x_\star\) is a minimizer of \(F\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\|x_n - x_\star\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: Proximal gradient is described by

\[\begin{split}\begin{eqnarray} y_t & = & x_t - \gamma \nabla f_1(x_t), \\ x_{t+1} & = & \arg\min_x \left\{f_2(x)+\frac{1}{2\gamma}\|x-y_t\|^2 \right\}, \end{eqnarray}\end{split}\]

for \(t \in \{ 0, \dots, n-1\}\) and where \(\gamma\) is a step-size.
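For illustration, here is a minimal NumPy sketch of this iteration (not the PEPit code); grad_f1 and prox_f2 are hypothetical user-supplied oracles for the gradient of \(f_1\) and the proximal operator of \(f_2\).

import numpy as np

def proximal_gradient(x0, grad_f1, prox_f2, gamma, n):
    """Run n iterations of the proximal gradient method.

    grad_f1(x) returns the gradient of f_1 at x; prox_f2(y, gamma) returns
    argmin_x { f_2(x) + ||x - y||^2 / (2 * gamma) }. Both are hypothetical oracles.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n):
        y = x - gamma * grad_f1(x)   # forward (gradient) step on f_1
        x = prox_f2(y, gamma)        # backward (proximal) step on f_2
    return x

# Toy instance: f_1(x) = ||x||^2 / 2 (so L = mu = 1) and f_2 the indicator of the
# nonnegative orthant, whose proximal operator is the projection max(., 0).
x_n = proximal_gradient(np.array([1.0, -2.0]),
                        grad_f1=lambda x: x,
                        prox_f2=lambda y, gamma: np.maximum(y, 0.0),
                        gamma=1.0, n=2)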

Theoretical guarantee: It is well known that a tight guarantee for PGM is provided by

\[\|x_n - x_\star\|^2 \leqslant \max\{(1-L\gamma)^2,(1-\mu\gamma)^2\}^n \|x_0 - x_\star\|^2,\]

which can be found in, e.g., [1, Theorem 3.1]. It is folk knowledge, and the corresponding result for gradient descent can be found in many references; see, e.g., [2, Section 1.4: Theorem 3], [3, Section 5.1] and [4, Section 4.4].
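For instance, evaluating this rate for the parameters of the example below (\(L=1\), \(\mu=0.1\), \(\gamma=1\), \(n=2\)) recovers the value 0.6561 returned by PEPit:

>>> L, mu, gamma, n = 1, 0.1, 1, 2
>>> round(max((1 - L * gamma) ** 2, (1 - mu * gamma) ** 2) ** n, 6)
0.6561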

References:

[1] A. Taylor, J. Hendrickx, F. Glineur (2018). Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 178(2), 455-476.

[2] B. Polyak (1987). Introduction to Optimization. Optimization Software New York.

[3] E. Ryu, S. Boyd (2016). A primer on monotone operator methods. Applied and Computational Mathematics 15(1), 3-43.

[4] L. Lessard, B. Recht, A. Packard (2016). Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization 26(1), 57–95.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – proximal step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_proximal_gradient(L=1, mu=.1, gamma=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 6 scalar constraint(s) ...
                 function 1 : 6 scalar constraint(s) added
                 function 2 : Adding 6 scalar constraint(s) ...
                 function 2 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.6560999999942829
*** Example file: worst-case performance of the Proximal Gradient Method in function values***
        PEPit guarantee:         ||x_n - x_*||^2 <= 0.6561 ||x0 - xs||^2
        Theoretical guarantee:   ||x_n - x_*||^2 <= 0.6561 ||x0 - xs||^2

2.2. Accelerated proximal gradient

PEPit.examples.composite_convex_minimization.wc_accelerated_proximal_gradient(mu, L, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f(x) + h(x)\},\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex, and where \(h\) is closed convex and proper.

This code computes a worst-case guarantee for the accelerated proximal gradient method, also known as the fast proximal gradient method (FPGM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2,\]

is valid, where \(x_n\) is the output of the accelerated proximal gradient method, and where \(x_\star\) is a minimizer of \(F\).

In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: Accelerated proximal gradient is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\begin{eqnarray} x_{t+1} & = & \arg\min_x \left\{h(x)+\frac{L}{2}\|x-\left(y_{t} - \frac{1}{L} \nabla f(y_t)\right)\|^2 \right\}, \\ y_{t+1} & = & x_{t+1} + \frac{t}{t+3} (x_{t+1} - x_{t}), \end{eqnarray}

where \(y_{0} = x_0\).
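As an illustration, a minimal NumPy sketch of this recursion (not the PEPit code) could be written as follows, with hypothetical oracles grad_f and prox_h for the gradient of \(f\) and the proximal operator of \(h\).

import numpy as np

def accelerated_proximal_gradient(x0, grad_f, prox_h, L, n):
    """Sketch of FPGM with the t/(t+3) momentum described above.

    prox_h(y, gamma) returns argmin_x { h(x) + ||x - y||^2 / (2 * gamma) }.
    """
    x = np.asarray(x0, dtype=float)
    y = x.copy()                                     # y_0 = x_0
    for t in range(n):
        x_new = prox_h(y - grad_f(y) / L, 1.0 / L)   # proximal gradient step from y_t
        y = x_new + t / (t + 3) * (x_new - x)        # momentum extrapolation
        x = x_new
    return x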

Theoretical guarantee: A tight (empirical) worst-case guarantee for FPGM is obtained in [1, method FPGM1 in Sec. 4.2.1, Table 1 in Sec. 4.2.2], for \(\mu=0\):

\[F(x_n) - F_\star \leqslant \frac{2 L}{n^2+5n+2} \|x_0 - x_\star\|^2,\]

which is attained on simple one-dimensional constrained linear optimization problems.

References:

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_gradient(L=1, mu=0, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 12x12
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
                 function 2 : Adding 20 scalar constraint(s) ...
                 function 2 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.052630167313517565
(PEPit) Postprocessing: solver's output is not entirely feasible (smallest eigenvalue of the Gram matrix is: -7.28e-06 < 0).
 Small deviation from 0 may simply be due to numerical error. Big ones should be deeply investigated.
 In any case, from now the provided values of parameters are based on the projection of the Gram matrix onto the cone of symmetric semi-definite matrix.
*** Example file: worst-case performance of the Accelerated Proximal Gradient Method in function values***
        PEPit guarantee:         f(x_n)-f_* <= 0.0526302 ||x0 - xs||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.0526316 ||x0 - xs||^2

2.3. Bregman proximal point

PEPit.examples.composite_convex_minimization.wc_bregman_proximal_point(gamma, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]

where \(f_1\) and \(f_2\) are closed, convex, and proper functions.

This code computes a worst-case guarantee for the Bregman proximal point (BPP) method. That is, it computes the smallest possible \(\tau(n, \gamma)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(n, \gamma) D_{f_1}(x_\star; x_0)\]

is valid, where \(x_n\) is the output of the BPP method, where \(x_\star\) is a minimizer of \(F\), and where \(D_{f_1}\) is the Bregman distance generated by \(f_1\).

Algorithm: Bregman proximal point is described in [1, Section 2, equation (9)]. For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_{t+1} & = & \arg\min_{u} \left\{ f_2(u) + \frac{1}{\gamma} D_{f_1}(u; x_t) \right\}, \\ D_h(x; y) & = & h(x) - h(y) - \nabla h (y)^T(x - y). \end{eqnarray}

Theoretical guarantee: An empirically tight guarantee, guessed from the numerical results, is

\[F(x_n) - F(x_\star) \leqslant \frac{1}{\gamma n} D_{f_1}(x_\star; x_0).\]
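For instance, for the parameters of the example below (\(\gamma=3\), \(n=5\)), this gives the value 0.0666667 reported by the code:

>>> gamma, n = 3, 5
>>> round(1 / (gamma * n), 6)
0.066667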

References:

[1] Y. Censor, S.A. Zenios (1992). Proximal minimization algorithm with D-functions. Journal of Optimization Theory and Applications, 73(3), 451-464.

Parameters
  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Examples

>>> pepit_tau, theoretical_tau = wc_bregman_proximal_point(gamma=3, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 14x14
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
                 function 2 : Adding 42 scalar constraint(s) ...
                 function 2 : 42 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.06666740784196148
*** Example file: worst-case performance of the Bregman Proximal Point in function values ***
        PEPit guarantee:         F(x_n)-F_* <= 0.0666674 Dh(x_*; x_0)
        Theoretical guarantee:   F(x_n)-F_* <= 0.0666667 Dh(x_*; x_0)

2.4. Douglas Rachford splitting

PEPit.examples.composite_convex_minimization.wc_douglas_rachford_splitting(L, alpha, theta, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]

where \(f_1\) is convex, closed and proper, and \(f_2\) is \(L\)-smooth. Both proximal operators are assumed to be available.

This code computes a worst-case guarantee for the Douglas Rachford Splitting (DRS) method. That is, it computes the smallest possible \(\tau(n, L, \alpha, \theta)\) such that the guarantee

\[F(y_n) - F(x_\star) \leqslant \tau(n, L, \alpha, \theta) \|x_0 - x_\star\|^2,\]

is valid, where it is known that \(x_k\) and \(y_k\) converge to \(x_\star\), but not \(w_k\) (see the definitions in the Algorithm section below). Hence the initial condition is stated in terms of \(x_0\) (an arbitrary choice, partially justified by the fact that we choose \(f_2\) to be the smooth function).

Note that \(y_n\) is feasible as it has a finite value for \(f_1\) (output of the proximal operator on \(f_1\)) and as \(f_2\) is smooth.

Algorithm:

Our notations for the DRS method are as follows, for \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_t & = & \mathrm{prox}_{\alpha f_2}(w_t), \\ y_t & = & \mathrm{prox}_{\alpha f_1}(2x_t - w_t), \\ w_{t+1} & = & w_t + \theta (y_t - x_t). \end{eqnarray}

This description can be found in [1, Section 7.3].
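For illustration, a minimal NumPy sketch of this recursion (not the PEPit code), with hypothetical proximal oracles prox_f1 and prox_f2:

import numpy as np

def douglas_rachford_splitting(w0, prox_f1, prox_f2, alpha, theta, n):
    """Sketch of DRS with the notations above.

    prox_fi(v, alpha) returns argmin_x { f_i(x) + ||x - v||^2 / (2 * alpha) };
    both proximal operators are hypothetical user-supplied oracles.
    """
    w = np.asarray(w0, dtype=float)
    x = y = w
    for _ in range(n):
        x = prox_f2(w, alpha)
        y = prox_f1(2 * x - w, alpha)
        w = w + theta * (y - x)
    return x, y, w   # the guarantee above is stated on the y-sequence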

Theoretical guarantee: We compare the output with that of PESTO [2] for \(0\leqslant n \leqslant 10\) in the case where \(\alpha=\theta=L=1\).

References:

[1] E. Ryu, S. Boyd (2016). A primer on monotone operator methods. Applied and Computational Mathematics 15(1), 3-43.

[2] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).

Parameters
  • L (float) – the smoothness parameter.

  • alpha (float) – parameter of the scheme.

  • theta (float) – parameter of the scheme.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_douglas_rachford_splitting(L=1, alpha=1, theta=1, n=9, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 22x22
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 90 scalar constraint(s) ...
                 function 1 : 90 scalar constraint(s) added
                 function 2 : Adding 110 scalar constraint(s) ...
                 function 2 : 110 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.027792700548325236
*** Example file: worst-case performance of the Douglas Rachford Splitting in function values ***
        PEPit guarantee:         f(y_n)-f_* <= 0.0278 ||x0 - xs||^2
        Theoretical guarantee:   f(y_n)-f_* <= 0.0278 ||x0 - xs||^2

2.5. Douglas Rachford splitting contraction

PEPit.examples.composite_convex_minimization.wc_douglas_rachford_splitting_contraction(mu, L, alpha, theta, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x) \}\]

where \(f_1(x)\) is \(L\)-smooth and \(\mu\)-strongly convex, and \(f_2\) is convex, closed and proper. Both proximal operators are assumed to be available.

This code computes a worst-case guarantee for the Douglas Rachford Splitting (DRS) method. That is, it computes the smallest possible \(\tau(\mu,L,\alpha,\theta,n)\) such that the guarantee

\[\|w_n - w_n'\|^2 \leqslant \tau(\mu,L,\alpha,\theta,n) \|w_0 - w_0'\|^2\]

is valid, where \(w_n\) and \(w_n'\) are the outputs of the Douglas-Rachford splitting method. That is, \(\tau\) is a contraction factor computed when the algorithm is started from two different points \(w_0\) and \(w_0'\).

Algorithm:

Our notations for the DRS method are as follows [3, Section 7.3], for \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_t & = & \mathrm{prox}_{\alpha f_2}(w_t), \\ y_t & = & \mathrm{prox}_{\alpha f_1}(2x_t - w_t), \\ w_{t+1} & = & w_t + \theta (y_t - x_t). \end{eqnarray}

Theoretical guarantee:

The tight theoretical guarantee is obtained in [2, Theorem 2]:

\[\|w_n - w_n'\|^2 \leqslant \max\left(\frac{1}{1 + \mu \alpha}, \frac{\alpha L }{1 + L \alpha}\right)^{2n} \|w_0 - w_0'\|^2\]

when \(\theta=1\).
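For instance, for the parameters of the example below (\(\mu=0.1\), \(L=1\), \(\alpha=3\), \(\theta=1\), \(n=2\)), this factor evaluates to the value 0.350128 returned by PEPit:

>>> mu, L, alpha, n = 0.1, 1, 3, 2
>>> round(max(1 / (1 + mu * alpha), alpha * L / (1 + L * alpha)) ** (2 * n), 6)
0.350128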

References:

Details on the SDP formulations can be found in

[1] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.

When \(\theta = 1\), the bound can be compared with that of [2, Theorem 2]

[2] P. Giselsson, and S. Boyd (2016). Linear convergence and metric selection in Douglas-Rachford splitting and ADMM. IEEE Transactions on Automatic Control, 62(2), 532-544.

A description of the DRS method can be found in [3, Section 7.3]

[3] E. Ryu, S. Boyd (2016). A primer on monotone operator methods. Applied and Computational Mathematics 15(1), 3-43.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • alpha (float) – parameter of the scheme.

  • theta (float) – parameter of the scheme.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Examples

>>> pepit_tau, theoretical_tau = wc_douglas_rachford_splitting_contraction(mu=.1, L=1, alpha=3, theta=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 12x12
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 20 scalar constraint(s) ...
                 function 1 : 20 scalar constraint(s) added
                 function 2 : Adding 20 scalar constraint(s) ...
                 function 2 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.35012779919911946
*** Example file: worst-case performance of the Douglas-Rachford splitting in distance ***
        PEPit guarantee:         ||w - wp||^2 <= 0.350128 ||w0 - w0p||^2
        Theoretical guarantee:   ||w - wp||^2 <= 0.350128 ||w0 - w0p||^2

2.6. Accelerated Douglas Rachford splitting

PEPit.examples.composite_convex_minimization.wc_accelerated_douglas_rachford_splitting(mu, L, alpha, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is closed convex and proper, and \(f_2\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for accelerated Douglas-Rachford. That is, it computes the smallest possible \(\tau(n, L, \mu, \alpha)\) such that the guarantee

\[F(y_n) - F(x_\star) \leqslant \tau(n,L,\mu,\alpha) \|w_0 - w_\star\|^2\]

is valid, where \(\alpha\) is a parameter of the method, where \(y_n\) is the output of the accelerated Douglas-Rachford splitting method, where \(x_\star\) is a minimizer of \(F\), and where \(w_\star\) is defined such that

\[x_\star = \mathrm{prox}_{\alpha f_2}(w_\star)\]

is an optimal point.

In short, for given values of \(n\), \(L\), \(\mu\), \(\alpha\), \(\tau(n, L, \mu, \alpha)\) is computed as the worst-case value of \(F(y_n)-F_\star\) when \(\|w_0 - w_\star\|^2 \leqslant 1\).

Algorithm: The accelerated Douglas-Rachford splitting is described in [1, Section 4]. For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_{t} & = & \mathrm{prox}_{\alpha f_2} (u_t),\\ y_{t} & = & \mathrm{prox}_{\alpha f_1}(2x_t-u_t),\\ w_{t+1} & = & u_t + \theta (y_t-x_t),\\ u_{t+1} & = & \left\{\begin{array}{ll} w_{t+1}+\frac{t-1}{t+2}(w_{t+1}-w_t)\, & \text{if } t >1,\\ w_{t+1} & \text{otherwise.} \end{array}\right. \end{eqnarray}
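For illustration, a minimal NumPy sketch of this recursion (not the PEPit code), with hypothetical proximal oracles prox_f1 and prox_f2; the initial value given to the \(w\)-sequence is irrelevant since the momentum term only enters for \(t>1\).

import numpy as np

def accelerated_douglas_rachford(u0, prox_f1, prox_f2, alpha, theta, n):
    """Sketch of the accelerated Douglas-Rachford splitting iteration above."""
    u = np.asarray(u0, dtype=float)
    w_prev = u.copy()                                 # placeholder for w_t (unused while t <= 1)
    y = u
    for t in range(n):
        x = prox_f2(u, alpha)
        y = prox_f1(2 * x - u, alpha)
        w = u + theta * (y - x)
        if t > 1:
            u = w + (t - 1) / (t + 2) * (w - w_prev)  # momentum on the w-sequence
        else:
            u = w
        w_prev = w
    return y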

Theoretical guarantee: There is no known worst-case guarantee for this method beyond quadratic minimization. For quadratics, an upper bound is provided by [1, Theorem 5]:

\[F(y_n) - F_\star \leqslant \frac{2}{\alpha \theta (n + 3)^ 2} \|w_0-w_\star\|^2,\]

when \(\theta=\frac{1-\alpha L}{1+\alpha L}\) and \(\alpha < \frac{1}{L}\).

References: An analysis of the accelerated Douglas-Rachford splitting is available in [1, Theorem 5] when the convex minimization problem is quadratic.

[1] P. Patrinos, L. Stella, A. Bemporad (2014). Douglas-Rachford splitting: Complexity estimates and accelerated variants. In 53rd IEEE Conference on Decision and Control (CDC).

Parameters
  • mu (float) – the strong convexity parameter.

  • L (float) – the smoothness parameter.

  • alpha (float) – the parameter of the scheme.

  • n (int) – the number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value (upper bound for quadratics; not directly comparable).

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_douglas_rachford_splitting(mu=.1, L=1, alpha=.9, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 11x11
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 20 scalar constraint(s) ...
                 function 1 : 20 scalar constraint(s) added
                 function 2 : Adding 20 scalar constraint(s) ...
                 function 2 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.19291623136473224
*** Example file: worst-case performance of the Accelerated Douglas Rachford Splitting in function values ***
        PEPit guarantee:                         F(y_n)-F_* <= 0.192916 ||x0 - ws||^2
        Theoretical guarantee for quadratics:    F(y_n)-F_* <= 1.68889 ||x0 - ws||^2

2.7. Frank Wolfe

PEPit.examples.composite_convex_minimization.wc_frank_wolfe(L, D, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is \(L\)-smooth and convex, and where \(f_2\) is the convex indicator function of a set \(\mathcal{D}\) of diameter at most \(D\).

This code computes a worst-case guarantee for the conditional gradient method, aka Frank-Wolfe method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(n, L) D^2,\]

is valid, where \(x_n\) is the output of the conditional gradient method, and where \(x_\star\) is a minimizer of \(F\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(D \leqslant 1\).

Algorithm:

This method was first presented in [1]. A more recent version can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),

\[\begin{split}\begin{eqnarray} y_t & = & \arg\min_{s \in \mathcal{D}} \langle s \mid \nabla f_1(x_t) \rangle, \\ x_{t+1} & = & \frac{t}{t + 2} x_t + \frac{2}{t + 2} y_t. \end{eqnarray}\end{split}\]
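For illustration, a minimal NumPy sketch of this iteration (not the PEPit code); the linear minimization oracle lmo over \(\mathcal{D}\) is a hypothetical user-supplied routine, instantiated here for the unit \(\ell_1\)-ball.

import numpy as np

def frank_wolfe(x0, grad_f1, lmo, n):
    """Sketch of the conditional gradient method with the step-sizes above.

    lmo(g) returns argmin_{s in D} <s, g> (linear minimization oracle over D).
    """
    x = np.asarray(x0, dtype=float)
    for t in range(n):
        y = lmo(grad_f1(x))                        # linear minimization over D
        x = t / (t + 2) * x + 2 / (t + 2) * y      # convex averaging step
    return x

# Example of an LMO when D is the unit l1-ball: pick a signed coordinate vector.
def l1_ball_lmo(g):
    s = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    s[i] = -np.sign(g[i])
    return s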

Theoretical guarantee:

An upper bound obtained in [2, Theorem 1] is

\[F(x_n) - F(x_\star) \leqslant \frac{2L D^2}{n+2}.\]

References:

[1] M. Frank, P. Wolfe (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2), 95-110.

[2] M. Jaggi (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In 30th International Conference on Machine Learning (ICML).

Parameters
  • L (float) – the smoothness parameter.

  • D (float) – diameter of the set \(\mathcal{D}\).

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_frank_wolfe(L=1, D=1, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 26x26
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 132 scalar constraint(s) ...
                 function 1 : 132 scalar constraint(s) added
                 function 2 : Adding 325 scalar constraint(s) ...
                 function 2 : 325 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.07830185202143693
*** Example file: worst-case performance of the Conditional Gradient (Frank-Wolfe) in function value ***
        PEPit guarantee:         f(x_n)-f_* <= 0.0783019 ||x0 - xs||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x0 - xs||^2

2.8. Improved interior method

PEPit.examples.composite_convex_minimization.wc_improved_interior_algorithm(L, mu, c, lam, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is an \(L\)-smooth convex function, and \(f_2\) is a closed convex indicator function. We use a kernel function \(h\) that is assumed to be closed, proper, and strongly convex (see [1, Section 5]).

This code computes a worst-case guarantee for Improved interior gradient algorithm (IGA). That is, it computes the smallest possible \(\tau(\mu,L,c,\lambda,n)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(\mu,L,c,\lambda,n) (c D_h(x_\star;x_0) + f_1(x_0) - f_1(x_\star))\]

is valid, where \(x_n\) is the output of the IGA and where \(x_\star\) is a minimizer of \(F\) and \(D_h\) is the Bregman distance generated by \(h\).

In short, for given values of \(\mu\), \(L\), \(c\), \(\lambda\) and \(n\), \(\tau(\mu,L,c,\lambda,n)\) is computed as the worst-case value of \(F(x_n)-F_\star\) when \(c D_h(x_\star;x_0) + f_1(x_0) - f_1(x_\star)\leqslant 1\).

Algorithm: The IGA is described in [1, “Improved Interior Gradient Algorithm”]. For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} \alpha_t & = & \frac{\sqrt{(c_t\lambda)^2+4c_t\lambda}-\lambda c_t}{2},\\ y_t & = & (1-\alpha_t) x_t + \alpha_t z_t,\\ c_{t+1} & = & (1-\alpha_t)c_t,\\ z_{t+1} & = & \arg\min_{z} \left\{ \left< z;\frac{\alpha_t}{c_{t+1}}\nabla f_1(y_t)\right> +f_2(z)+D_h(z;z_t)\right\}, \\ x_{t+1} & = & (1-\alpha_t) x_t + \alpha_t z_{t+1}. \end{eqnarray}
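For illustration, a minimal NumPy sketch of this recursion (not the PEPit code), specialized to the Euclidean kernel \(h(x)=\tfrac{1}{2}\|x\|^2\) so that the \(z\)-update reduces to a standard proximal step; grad_f1 and prox_f2 are hypothetical oracles.

import numpy as np

def improved_interior_gradient(x0, grad_f1, prox_f2, c, lam, n):
    """Sketch of IGA for the particular (Euclidean) kernel h(x) = ||x||^2 / 2.

    With this kernel, D_h(z; z_t) = ||z - z_t||^2 / 2, and the z-update becomes
    prox_f2(v), where prox_f2(v) returns argmin_z { f_2(z) + ||z - v||^2 / 2 }.
    """
    x = np.asarray(x0, dtype=float)
    z = x.copy()
    c_t = c
    for _ in range(n):
        alpha = (np.sqrt((c_t * lam) ** 2 + 4 * c_t * lam) - lam * c_t) / 2
        y = (1 - alpha) * x + alpha * z
        c_next = (1 - alpha) * c_t
        z = prox_f2(z - alpha / c_next * grad_f1(y))   # Bregman step with Euclidean kernel
        x = (1 - alpha) * x + alpha * z
        c_t = c_next
    return x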

Theoretical guarantee: The following upper bound can be found in [1, Theorem 5.2]:

\[F(x_n) - F_\star \leqslant \frac{4L}{c (n+1)^2}\left(c D_h(x_\star;x_0) + f_1(x_0) - f_1(x_\star) \right).\]

References:

[1] A. Auslender, M. Teboulle (2006). Interior gradient and proximal methods for convex and conic optimization. SIAM Journal on Optimization 16.3 (2006): 697-725.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong-convexity parameter.

  • c (float) – initial value \(c_0\) of the scheme.

  • lam (float) – the step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> lam = 1 / L
>>> pepit_tau, theoretical_tau = wc_improved_interior_algorithm(L=L, mu=1, c=1, lam=lam, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 22x22
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 42 scalar constraint(s) ...
                 function 1 : 42 scalar constraint(s) added
                 function 2 : Adding 49 scalar constraint(s) ...
                 function 2 : 49 scalar constraint(s) added
                 function 3 : Adding 42 scalar constraint(s) ...
                 function 3 : 42 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal_inaccurate (solver: SCS); optimal value: 0.06675394483126838
*** Example file: worst-case performance of the Improved interior gradient algorithm in function values ***
        PEPit guarantee:         F(x_n)-F_* <= 0.0667539 (c * Dh(xs;x0) + f1(x0) - F_*)
        Theoretical guarantee:   F(x_n)-F_* <= 0.111111 (c * Dh(xs;x0) + f1(x0) - F_*)

2.9. No Lips in function value

PEPit.examples.composite_convex_minimization.wc_no_lips_in_function_value(L, gamma, n, verbose=1)[source]

Consider the constrained composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is convex and \(L\)-smooth relative to \(h\), with \(h\) closed, proper and convex, and where \(f_2\) is a closed convex indicator function.

This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[F(x_n) - F_\star \leqslant \tau(n, L) D_h(x_\star; x_0),\]

is valid, where \(x_n\) is the output of the NoLips method, where \(x_\star\) is a minimizer of \(F\), and where \(D_h\) is the Bregman divergence generated by \(h\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(F(x_n) - F_\star\) when \(D_h(x_\star; x_0) \leqslant 1\).

Algorithm: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),

\[x_{t+1} = \arg\min_{u} \{f_2(u)+\langle \nabla f_1(x_t) \mid u - x_t \rangle + \frac{1}{\gamma} D_h(u; x_t)\}.\]
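For illustration (not the PEPit code), a NumPy sketch of this update for one particular choice of kernel: taking \(h\) to be the negative entropy and \(f_2\) the indicator of the unit simplex, the update reduces to the classical entropic mirror-descent step.

import numpy as np

def nolips_entropic(x0, grad_f1, gamma, n):
    """Bregman gradient (NoLips) step for h(u) = sum_i u_i log u_i and f_2 the
    indicator of the unit simplex: a multiplicative update followed by a
    renormalization (entropic mirror descent)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n):
        x = x * np.exp(-gamma * grad_f1(x))   # multiplicative (Bregman) step
        x = x / x.sum()                       # project back onto the simplex
    return x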

Theoretical guarantee:

The tight guarantee obtained in [2, Theorem 1] is

\[F(x_n) - F_\star \leqslant \frac{1}{\gamma n} D_h(x_\star; x_0),\]

for any \(\gamma \leqslant \frac{1}{L}\); tightness is provided in [2, page 23].

References: NoLips was proposed in [1] for convex problems involving relative smoothness. The worst-case analysis using a PEP, as well as its tightness, is provided in [2].

[1] H.H. Bauschke, J. Bolte, M. Teboulle (2017). A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications. Mathematics of Operations Research, 2017, vol. 42, no 2, p. 330-348.

[2] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

Notes

Disclaimer: This example requires some experience with PEPit and PEPs ([2], section 4).

Parameters
  • L (float) – relative-smoothness parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / (2 * L)
>>> pepit_tau, theoretical_tau = wc_no_lips_in_function_value(L=L, gamma=gamma, n=3, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 15x15
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 20 scalar constraint(s) ...
                 function 1 : 20 scalar constraint(s) added
                 function 2 : Adding 20 scalar constraint(s) ...
                 function 2 : 20 scalar constraint(s) added
                 function 3 : Adding 16 scalar constraint(s) ...
                 function 3 : 16 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.6666714558260607
*** Example file: worst-case performance of the NoLips in function values ***
        PEPit guarantee:         F(x_n) - F_* <= 0.666671 Dh(x_*; x_0)
        Theoretical guarantee:   F(x_n) - F_* <= 0.666667 Dh(x_*; x_0)

2.10. No Lips in Bregman divergence

PEPit.examples.composite_convex_minimization.wc_no_lips_in_bregman_divergence(L, gamma, n, verbose=1)[source]

Consider the constrained composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is convex and \(L\)-smooth relative to \(h\), with \(h\) closed, proper and convex, and where \(f_2\) is a closed convex indicator function.

This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[\min_{t\leqslant n} D_h(x_{t-1}; x_t) \leqslant \tau(n, L) D_h(x_\star; x_0),\]

is valid, where \(x_n\) is the output of the NoLips method, where \(x_\star\) is a minimizer of \(F\), and where \(D_h\) is the Bregman divergence generated by \(h\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(\min_{t\leqslant n} D_h(x_{t-1}; x_t)\) when \(D_h(x_\star; x_0) \leqslant 1\).

Algorithm: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),

\[x_{t+1} = \arg\min_{u} \{f_2(u)+\langle \nabla f_1(x_t) \mid u - x_t \rangle + \frac{1}{\gamma} D_h(u; x_t)\}.\]

Theoretical guarantee: The upper bound obtained in [2, Proposition 4] is

\[\min_{t\leqslant n} D_h(x_{t-1}; x_t) \leqslant \frac{2}{n (n - 1)} D_h(x_\star; x_0),\]

for any \(\gamma \leqslant \frac{1}{L}\). It is empirically tight.

References:

[1] H.H. Bauschke, J. Bolte, M. Teboulle (2017). A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications. Mathematics of Operations Research, 2017, vol. 42, no 2, p. 330-348.

[2] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

Notes

Disclaimer: This example requires some experience with PEPit and PEPs ([2], section 4).

Parameters
  • L (float) – relative-smoothness parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_no_lips_in_bregman_divergence(L=L, gamma=gamma, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 36x36
(PEPit) Setting up the problem: performance measure is minimum of 10 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 132 scalar constraint(s) ...
                 function 1 : 132 scalar constraint(s) added
                 function 2 : Adding 132 scalar constraint(s) ...
                 function 2 : 132 scalar constraint(s) added
                 function 3 : Adding 121 scalar constraint(s) ...
                 function 3 : 121 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.022279210584840024
*** Example file: worst-case performance of the NoLips_2 in Bregman divergence ***
        PEPit guarantee:         min_t Dh(x_(t-1); x_t) <= 0.0222792 Dh(x_*; x_0)
        Theoretical guarantee:   min_t Dh(x_(t-1); x_t) <= 0.0222222 Dh(x_*; x_0)

2.11. Three operator splitting

PEPit.examples.composite_convex_minimization.wc_three_operator_splitting(mu1, L1, L3, alpha, theta, n, verbose=1)[source]

Consider the composite convex minimization problem,

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x) + f_3(x)\}\]

where \(f_1\) is \(L_1\)-smooth and \(\mu_1\)-strongly convex, \(f_2\) is closed, convex and proper, and \(f_3\) is \(L_3\)-smooth and convex. Proximal operators are assumed to be available for \(f_1\) and \(f_2\).

This code computes a worst-case guarantee for the Three Operator Splitting (TOS). That is, it computes the smallest possible \(\tau(n, L_1, L_3, \mu_1)\) such that the guarantee

\[\|w^{(0)}_{n} - w^{(1)}_{n}\|^2 \leqslant \tau(n, L_1, L_3, \mu_1, \alpha, \theta) \|w^{(0)}_{0} - w^{(1)}_{0}\|^2\]

is valid, where \(w^{(0)}_{0}\) and \(w^{(1)}_{0}\) are two different starting points and \(w^{(0)}_{n}\) and \(w^{(1)}_{n}\) are the two corresponding \(n^{\mathrm{th}}\) outputs of TOS (i.e., the guarantee measures how much the iterates contract when the method is started from two different initial points).

In short, for given values of \(n\), \(L_1\), \(L_3\), \(\mu_1\), \(\alpha\) and \(\theta\), the contraction factor \(\tau(n, L_1, L_3, \mu_1, \alpha, \theta)\) is computed as the worst-case value of \(\|w^{(0)}_{n} - w^{(1)}_{n}\|^2\) when \(\|w^{(0)}_{0} - w^{(1)}_{0}\|^2 \leqslant 1\).

Algorithm: One iteration of the algorithm is described in [1]. For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_t & = & \mathrm{prox}_{\alpha f_2}(w_t), \\ y_t & = & \mathrm{prox}_{\alpha f_1}(2 x_t - w_t - \alpha \nabla f_3(x_t)), \\ w_{t+1} & = & w_t + \theta (y_t - x_t). \end{eqnarray}
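For illustration, a minimal NumPy sketch of this recursion (not the PEPit code), with hypothetical oracles prox_f1, prox_f2 and grad_f3:

import numpy as np

def three_operator_splitting(w0, prox_f1, prox_f2, grad_f3, alpha, theta, n):
    """Sketch of TOS with the notations above.

    prox_fi(v, alpha) returns argmin_x { f_i(x) + ||x - v||^2 / (2 * alpha) };
    grad_f3(x) returns the gradient of f_3 at x. All oracles are hypothetical.
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(n):
        x = prox_f2(w, alpha)
        y = prox_f1(2 * x - w - alpha * grad_f3(x), alpha)
        w = w + theta * (y - x)
    return w   # the contraction above is measured on the w-sequence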

References: The TOS was introduced in [1].

[1] D. Davis, W. Yin (2017). A three-operator splitting scheme and its optimization applications. Set-valued and variational analysis, 25(4), 829-858.

Parameters
  • mu1 (float) – the strong convexity parameter of function \(f_1\).

  • L1 (float) – the smoothness parameter of function \(f_1\).

  • L3 (float) – the smoothness parameter of function \(f_3\).

  • alpha (float) – parameter of the scheme.

  • theta (float) – parameter of the scheme.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (None) – no theoretical value.

Example

>>> L3 = 1
>>> alpha = 1 / L3
>>> pepit_tau, theoretical_tau = wc_three_operator_splitting(mu1=0.1, L1=10, L3=L3, alpha=alpha, theta=1, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 26x26
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 56 scalar constraint(s) ...
                 function 1 : 56 scalar constraint(s) added
                 function 2 : Adding 56 scalar constraint(s) ...
                 function 2 : 56 scalar constraint(s) added
                 function 3 : Adding 56 scalar constraint(s) ...
                 function 3 : 56 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.47544137382115453
*** Example file: worst-case performance of the Three Operator Splitting in distance ***
        PEPit guarantee:         ||w^2_n - w^1_n||^2 <= 0.475441 ||x0 - ws||^2