8. Inexact proximal methods

8.1. Accelerated inexact forward backward

PEPit.examples.inexact_proximal_methods.wc_accelerated_inexact_forward_backward(L, zeta, n, verbose=1)[source]

Consider the composite convex minimization problem,

\[F_\star \triangleq \min_x \left\{F(x) \equiv f(x) + g(x) \right\},\]

where \(f\) is \(L\)-smooth convex, and \(g\) is closed, proper, and convex. We further assume that one can readily evaluate the gradient of \(f\) and that one has access to an inexact version of the proximal operator of \(g\) (whose level of accuracy is controlled by some parameter \(\zeta\in (0,1)\)).

This code computes a worst-case guarantee for an accelerated inexact forward backward (AIFB) method (a.k.a. inexact accelerated proximal gradient method). That is, it computes the smallest possible \(\tau(n, L, \zeta)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(n, L, \zeta) \|x_0 - x_\star\|^2,\]

is valid, where \(x_n\) is the output of the AIFB method, and where \(x_\star\) is a minimizer of \(F\).

In short, for given values of \(n\), \(L\) and \(\zeta\), \(\tau(n, L, \zeta)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: The method is presented in, e.g., [1, Algorithm 3.1]. For simplicity, we instantiate [1, Algorithm 3.1] using simple values for its parameters and for the problem setting (in the notation of [1]: \(A_0\triangleq 0\), \(\mu=0\), \(\xi_t \triangleq 0\), \(\sigma_t\triangleq 0\), \(\lambda_t \triangleq\gamma\triangleq\tfrac{1}{L}\), \(\zeta_t\triangleq\zeta\), \(\eta \triangleq (1-\zeta^2) \gamma\)), and without backtracking, arriving at the following iteration, for \(t\in\{0,1,\ldots,n-1\}\):

\begin{eqnarray} A_{t+1} & = & A_t + \frac{\eta+\sqrt{\eta^2+4\eta A_t}}{2},\\ y_{t} & = & x_t + \frac{A_{t+1}-A_t}{A_{t+1}} (z_t-x_t),\\ (x_{t+1},v_{t+1}) & \approx_{\varepsilon_t} & \left(\mathrm{prox}_{\gamma g}\left(y_t-\gamma \nabla f(y_t)\right),\, \mathrm{prox}_{g^*/\gamma}\left(\frac{y_t-\gamma \nabla f(y_t)}{\gamma}\right)\right),\\ & & \quad \text{with } \varepsilon_t = \frac{\zeta^2\gamma^2}{2}\|v_{t+1}+\nabla f(y_t)\|^2,\\ z_{t+1} & = & z_t-(A_{t+1}-A_t)\left(v_{t+1}+\nabla f(y_t)\right). \end{eqnarray}

where \(\{\varepsilon_t\}_{t\geqslant 0}\) is some sequence of accuracy parameters (whose values are fixed within the algorithm as it runs), and \(\{A_t\}_{t\geqslant 0}\) is some scalar sequence of parameters for the method (typical of accelerated methods).

The line with “\(\approx_{\varepsilon_t}\)” requires the pair \((x_{t+1},v_{t+1})\) to satisfy an accuracy requirement provided by [1, Definition 2.3]. More precisely (but without providing any intuition), it requires the existence of some \(w_{t+1}\) such that \(v_{t+1} \in \partial g(w_{t+1})\) and for which the accuracy requirement

\[\frac{1}{2} \| x_{t+1} - y_t + \gamma \left(v_{t+1} + \nabla f(y_t)\right) \|^2 + \gamma \left( g(x_{t+1}) - g(w_{t+1}) - \left\langle v_{t+1} ;\, x_{t+1} - w_{t+1} \right\rangle \right) \leqslant \varepsilon_t,\]

is valid.
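
To make the recursion concrete, here is a minimal NumPy sketch of the scheme on an illustrative composite instance (a least-squares \(f\) and \(g=\|\cdot\|_1\)); the problem data and helper names are ours, not part of PEPit. It uses an exact proximal oracle for \(g\), for which the left-hand side of the accuracy requirement above is zero, so the pair \((x_{t+1},v_{t+1})\) is admissible for any \(\zeta\).

import numpy as np

# Illustrative data: f(x) = 0.5*||Ax - b||^2 (L-smooth convex), g = ||.||_1.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
L = np.linalg.norm(A, 2) ** 2                    # smoothness constant of f
grad_f = lambda x: A.T @ (A @ x - b)

def prox_g(u, gamma):
    # exact prox of gamma*||.||_1 (soft-thresholding)
    return np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)

def aifb(x0, n, zeta):
    gamma = 1 / L
    eta = (1 - zeta ** 2) * gamma
    x, z, A_t = x0.copy(), x0.copy(), 0.0
    for _ in range(n):
        A_next = A_t + (eta + np.sqrt(eta ** 2 + 4 * eta * A_t)) / 2
        y = x + (A_next - A_t) / A_next * (z - x)
        u = y - gamma * grad_f(y)
        x = prox_g(u, gamma)                     # exact prox: accuracy gap is 0 <= eps_t
        v = (u - x) / gamma                      # matching dual point, v in dg(x)
        z = z - (A_next - A_t) * (v + grad_f(y))
        A_t = A_next
    return x

x_n = aifb(np.zeros(10), n=11, zeta=.45)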

Theoretical guarantee: A theoretical upper bound is obtained in [1, Corollary 3.5]:

\[F(x_n)-F_\star\leqslant \frac{2L \|x_0-x_\star\|^2}{(1-\zeta^2)n^2}.\]
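
As a sanity check, this bound can be evaluated directly; with the parameter values of the example below, it recovers the reported theoretical value:

>>> L, zeta, n = 1.3, .45, 11
>>> round(2 * L / ((1 - zeta ** 2) * n ** 2), 7)
0.0269437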

References: The method and theoretical result can be found in [1, Section 3].

[1] M. Barre, A. Taylor, F. Bach (2021). A note on approximate accelerated forward-backward methods with absolute and relative errors, and possibly strongly convex objectives. arXiv:2106.15536v2.

Parameters
  • L (float) – smoothness parameter.

  • zeta (float) – relative approximation parameter in (0,1).

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_inexact_forward_backward(L=1.3, zeta=.45, n=11, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 59x59
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 156 scalar constraint(s) ...
                 function 1 : 156 scalar constraint(s) added
                 function 2 : Adding 528 scalar constraint(s) ...
                 function 2 : 528 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.018869997698251897
*** Example file: worst-case performance of an inexact accelerated forward backward method ***
        PEPit guarantee:         F(x_n)-F_* <= 0.01887 ||x_0 - x_*||^2
        Theoretical guarantee:   F(x_n)-F_* <= 0.0269437 ||x_0 - x_*||^2

8.2. Partially inexact Douglas Rachford splitting

PEPit.examples.inexact_proximal_methods.wc_partially_inexact_douglas_rachford_splitting(mu, L, n, gamma, sigma, verbose=1)[source]

Consider the composite strongly convex minimization problem,

\[F_\star \triangleq \min_x \left\{ F(x) \equiv f(x) + g(x) \right\}\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex, and \(g\) is closed, proper, and convex. We denote by \(x_\star = \arg\min_x F(x)\) the minimizer of \(F\). The (exact) proximal operator of \(g\), and an approximate version of the proximal operator of \(f\), are assumed to be available.

This code computes a worst-case guarantee for a partially inexact Douglas-Rachford splitting (DRS) method. That is, it computes the smallest possible \(\tau(n,L,\mu,\sigma,\gamma)\) such that the guarantee

\[\|z_{n} - z_\star\|^2 \leqslant \tau(n,L,\mu,\sigma,\gamma) \|z_0 - z_\star\|^2\]

is valid, where \(z_n\) is the output of the DRS (initiated at \(z_0\)), \(z_\star\) is its fixed point, \(\gamma\) is a step-size, and \(\sigma\) is the level of inaccuracy.

Algorithm: The partially inexact Douglas-Rachford splitting under consideration is described by

\begin{eqnarray} x_{t} & \approx_{\sigma} & \arg\min_x \left\{ \gamma f(x)+\frac{1}{2} \|x-z_t\|^2 \right\},\\ y_{t} & = & \arg\min_y \left\{ \gamma g(y)+\frac{1}{2} \|y-(x_t-\gamma \nabla f(x_t))\|^2 \right\},\\ z_{t+1} & = & z_t + y_t - x_t. \end{eqnarray}

More precisely, the notation “\(\approx_{\sigma}\)” corresponds to requiring the existence of some \(e_{t}\) such that

\begin{eqnarray} x_{t} & = & z_t - \gamma (\nabla f(x_t) - e_t),\\ y_{t} & = & \arg\min_y \left\{ \gamma g(y)+\frac{1}{2} \|y-(x_t-\gamma \nabla f(x_t))\|^2 \right\},\\ & & \quad \text{with } \|e_t\|^2 \leqslant \frac{\sigma^2}{\gamma^2}\|y_{t} - z_t + \gamma \nabla f(x_t) \|^2,\\ z_{t+1} & = & z_t + y_t - x_t. \end{eqnarray}
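
Below is a minimal NumPy sketch of this scheme on an illustrative instance (a strongly convex quadratic \(f\) and \(g=\|\cdot\|_1\)); the test problem and helper names are ours, not part of PEPit. The \(f\)-subproblem is solved exactly here, i.e., \(e_t=0\), which satisfies the relative-error condition for any \(\sigma\geqslant 0\).

import numpy as np

# Illustrative data: f(x) = 0.5*x^T Q x with spectrum in [mu, L] = [.1, 5], g = ||.||_1.
d = 10
Q = np.diag(np.linspace(.1, 5., d))
grad_f = lambda x: Q @ x

def prox_g(u, gamma):
    # exact prox of gamma*||.||_1 (soft-thresholding)
    return np.sign(u) * np.maximum(np.abs(u) - gamma, 0.0)

def partially_inexact_drs(z0, n, gamma):
    z, I = z0.copy(), np.eye(d)
    for _ in range(n):
        x = np.linalg.solve(I + gamma * Q, z)    # exact f-prox, so e_t = 0
        y = prox_g(x - gamma * grad_f(x), gamma)
        z = z + y - x
    return z

z_n = partially_inexact_drs(np.ones(d), n=5, gamma=1.4)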

Theoretical guarantee: The following tight theoretical bound is due to [2, Theorem 5.1]:

\[\|z_{n} - z_\star\|^2 \leqslant \max\left(\frac{1 - \sigma + \gamma \mu \sigma}{1 - \sigma + \gamma \mu}, \frac{\sigma + (1 - \sigma) \gamma L}{1 + (1 - \sigma) \gamma L}\right)^{2n} \|z_0 - z_\star\|^2.\]
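
The contraction factor is straightforward to evaluate; with the parameter values of the example below, it recovers the reported theoretical value:

>>> mu, L, gamma, sigma, n = .1, 5, 1.4, .2, 5
>>> rho = max((1 - sigma + gamma * mu * sigma) / (1 - sigma + gamma * mu),
...           (sigma + (1 - sigma) * gamma * L) / (1 + (1 - sigma) * gamma * L))
>>> round(rho ** (2 * n), 6)
0.281206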

References: The method is from [1]; its PEP formulation and the worst-case analysis are from [2] (see [2, Section 4.4] for more details).

[1] J. Eckstein and W. Yao (2018). Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Mathematical Programming, 170(2), 417-444.

[2] M. Barre, A. Taylor, F. Bach (2020). Principled analyses and design of first-order methods with inexact proximal operators. arXiv:2006.06041v2.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of iterations.

  • gamma (float) – the step-size.

  • sigma (float) – noise parameter.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_partially_inexact_douglas_rachford_splitting(mu=.1, L=5, n=5, gamma=1.4, sigma=.2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 18x18
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 40 scalar constraint(s) ...
                 function 1 : 40 scalar constraint(s) added
                 function 2 : Adding 30 scalar constraint(s) ...
                 function 2 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.28120549805153155
*** Example file: worst-case performance of the partially inexact Douglas Rachford splitting ***
        PEPit guarantee:         ||z_n - z_*||^2 <= 0.281205 ||z_0 - z_*||^2
        Theoretical guarantee:   ||z_n - z_*||^2 <= 0.281206 ||z_0 - z_*||^2

8.3. Relatively inexact proximal point

PEPit.examples.inexact_proximal_methods.wc_relatively_inexact_proximal_point_algorithm(n, gamma, sigma, verbose=1)[source]

Consider the (possibly non-smooth) convex minimization problem,

\[f_\star \triangleq \min_x f(x)\]

where \(f\) is closed, convex, and proper. We denote by \(x_\star\) some optimal point of \(f\) (hence \(0\in\partial f(x_\star)\)). We further assume that one has access to an inexact version of the proximal operator of \(f\), whose level of accuracy is controlled by some parameter \(\sigma\geqslant 0\).

This code computes a worst-case guarantee for an inexact proximal point method. That is, it computes the smallest possible \(\tau(n, \gamma, \sigma)\) such that the guarantee

\[f(x_n) - f(x_\star) \leqslant \tau(n, \gamma, \sigma) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the method, \(\gamma\) is some step-size, and \(\sigma\) is the level of accuracy of the approximate proximal point oracle.

Algorithm: The approximate proximal point method under consideration is described by

\[x_{t+1} \approx_{\sigma} \arg\min_x \left\{ \gamma f(x)+\frac{1}{2} \|x-x_t\|^2 \right\},\]

where the notation “\(\approx_{\sigma}\)” corresponds to requiring the existence of some vector \(s_{t+1}\in\partial f(x_{t+1})\) and some error term \(e_{t+1}\) such that

\[x_{t+1} = x_t - \gamma s_{t+1} + e_{t+1} \quad \quad \text{with }\|e_{t+1}\|^2 \leqslant \sigma^2\|x_{t+1} - x_t\|^2.\]

We note that the case \(\sigma=0\) implies \(e_{t+1}=0\), in which case this operation reduces to a standard proximal step with step-size \(\gamma\).
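
For illustration, here is a minimal plain-Python sketch of this scheme for \(f=|\cdot|\) in one dimension (our toy choice, not part of PEPit), whose proximal step is available in closed form; solving the subproblem exactly gives \(e_{t+1}=0\), which is admissible for any \(\sigma\geqslant 0\).

def prox_abs(x, gamma):
    # exact prox of gamma*|.| (scalar soft-thresholding), yields e_{t+1} = 0
    if x > gamma:
        return x - gamma
    if x < -gamma:
        return x + gamma
    return 0.0

def inexact_ppa(x0, n, gamma):
    x = x0
    for _ in range(n):
        x_next = prox_abs(x, gamma)
        s = (x - x_next) / gamma    # s in df(x_next) by prox optimality
        x = x_next                  # i.e., x_{t+1} = x_t - gamma * s + e_{t+1} with e_{t+1} = 0
    return x

print(inexact_ppa(x0=25.0, n=8, gamma=10.))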

Theoretical guarantee: The following (empirical) upper bound is provided in [1, Section 3.5.1],

\[f(x_n) - f(x_\star) \leqslant \frac{1 + \sigma}{4 \gamma n^{\sqrt{1 - \sigma^2}}}\|x_0 - x_\star\|^2.\]
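
This empirical bound is easy to evaluate; with the parameter values of the example below, it recovers the reported theoretical value:

>>> from math import sqrt
>>> n, gamma, sigma = 8, 10, .65
>>> round((1 + sigma) / (4 * gamma * n ** sqrt(1 - sigma ** 2)), 6)
0.008494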

References: The precise formulation is presented in [1, Section 3.5.1].

[1] M. Barre, A. Taylor, F. Bach (2020). Principled analyses and design of first-order methods with inexact proximal operators. arXiv:2006.06041v2.

Parameters
  • n (int) – number of iterations.

  • gamma (float) – the step-size.

  • sigma (float) – accuracy parameter of the proximal point computation.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_relatively_inexact_proximal_point_algorithm(n=8, gamma=10, sigma=.65, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 18x18
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 88 scalar constraint(s) ...
                 function 1 : 88 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal_inaccurate (solver: SCS); optimal value: 0.007753853579615959
*** Example file: worst-case performance of an inexact proximal point method in distance in function values ***
        PEPit guarantee:         f(x_n) - f(x_*) <= 0.00775385 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n) - f(x_*) <= 0.00849444 ||x_0 - x_*||^2