Primitive steps

Inexact gradient step

PEPit.primitive_steps.inexact_gradient_step(x0, f, gamma, epsilon, notion='absolute')[source]

This routine performs a step \(x \leftarrow x_0 - \gamma d_{x_0}\) where \(d_{x_0}\) is close to the gradient of \(f\) at \(x_0\) in the following sense:

\[\begin{split}\|d_{x_0} - \nabla f(x_0)\|^2 \leqslant \left\{ \begin{eqnarray} & \varepsilon^2 & \text{if notion is set to 'absolute'}, \\ & \varepsilon^2 \|\nabla f(x_0)\|^2 & \text{if notion is set to 'relative'}. \end{eqnarray} \right.\end{split}\]

The relative notion of approximation is used in at least 3 PEPit examples, in particular in 2 unconstrained convex minimization examples: an inexact gradient descent and an inexact accelerated gradient method.
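Since PEPit manipulates Points and Expressions symbolically, a purely numerical sketch may help fix ideas. The snippet below is not PEPit code: it uses hypothetical helper names and the concrete function \(f(x) = x^2/2\) to illustrate the two inaccuracy notions and the step \(x \leftarrow x_0 - \gamma d_{x_0}\).

```python
# Numerical sketch (hypothetical names, f(x) = x**2 / 2), not the PEPit API.

def grad(x):
    # Gradient of f(x) = x**2 / 2.
    return x

def is_admissible(d, x0, epsilon, notion="absolute"):
    # Check the inexactness condition ||d - grad f(x0)||^2 <= bound.
    err_sq = (d - grad(x0)) ** 2
    if notion == "absolute":
        return err_sq <= epsilon ** 2
    if notion == "relative":
        return err_sq <= epsilon ** 2 * grad(x0) ** 2
    raise ValueError("notion must be in ['absolute', 'relative']")

x0, gamma, epsilon = 2.0, 0.5, 0.1
d = grad(x0) + 0.05   # a perturbed gradient direction
x = x0 - gamma * d    # the step x <- x0 - gamma * d
```

In PEPit itself, \(d_{x_0}\) is a symbolic Point constrained by the corresponding inequality rather than a concrete number.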

References

[1] E. De Klerk, F. Glineur, A. Taylor (2020). Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM Journal on Optimization, 30(3), 2053-2082.

Parameters:
  • x0 (Point) – starting point x0.

  • f (Function) – a function.

  • gamma (float) – the step size parameter.

  • epsilon (float) – the required accuracy.

  • notion (string) – defines the mode (absolute or relative inaccuracy). By default, notion=’absolute’.

Returns:
  • x (Point) – the output point.

  • dx0 (Point) – the approximate (sub)gradient of f at x0.

  • fx0 (Expression) – the value of the function f at x0.

Raises:

ValueError – if notion is not in [‘absolute’, ‘relative’].

Note

When \(\gamma\) is set to 0, this routine returns \(x_0\), \(d_{x_0}\), and \(f_{x_0}\). It is used in this way in the unconstrained convex minimization example called “inexact gradient exact line search”, only to access the direction \(d_{x_0}\) close to the gradient \(\nabla f(x_0)\).

Exact line-search step

PEPit.primitive_steps.exact_linesearch_step(x0, f, directions)[source]

This routine outputs some \(x\) by mimicking an exact line/span search in specified directions. It is used for instance in PEPit.examples.unconstrained_convex_minimization.wc_gradient_exact_line_search and in PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient.

The routine aims at mimicking the operation:

\begin{eqnarray} x & = & x_0 - \sum_{i=1}^{T} \gamma_i d_i,\\ \text{with } \overrightarrow{\gamma} & = & \arg\min_{\overrightarrow{\gamma}} f\left(x_0 - \sum_{i=1}^{T} \gamma_i d_i\right), \end{eqnarray}

where \(T\) denotes the number of directions \(d_i\). This operation can equivalently be described in terms of the following conditions:

\begin{eqnarray} x - x_0 & \in & \text{span}\left\{d_1,\ldots,d_T\right\}, \\ \nabla f(x) & \perp & \text{span}\left\{d_1,\ldots,d_T\right\}. \end{eqnarray}

In this routine, we instead constrain \(x\) and \(\nabla f(x)\) to satisfy

\begin{eqnarray} \forall i=1,\ldots,T: & \left< \nabla f(x);\, d_i \right> & = & 0,\\ \text{and } & \left< \nabla f(x);\, x - x_0 \right> & = & 0, \end{eqnarray}

which is a relaxation of the true line/span search conditions.
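A small numerical analogue (not PEPit code) illustrates the orthogonality conditions: for the concrete quadratic \(f(x) = \frac{1}{2}(x_1^2 + 4x_2^2)\), an exact line search along one direction admits a closed form, and the resulting point satisfies both \(\left<\nabla f(x); d\right> = 0\) and \(\left<\nabla f(x); x - x_0\right> = 0\).

```python
# Numerical sketch with f(x) = 1/2*(x[0]**2 + 4*x[1]**2); helper names are
# hypothetical and unrelated to the PEPit API.

def grad(x):
    # Gradient of f.
    return [x[0], 4.0 * x[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def exact_linesearch(x0, d):
    # Minimize f(x0 - gamma*d) over gamma; closed form for this quadratic
    # with Hessian H = diag(1, 4).
    Hd = [d[0], 4.0 * d[1]]
    gamma = dot(grad(x0), d) / dot(d, Hd)
    return [x0[0] - gamma * d[0], x0[1] - gamma * d[1]]

x0 = [3.0, 1.0]
d = grad(x0)                 # steepest-descent direction
x = exact_linesearch(x0, d)  # gradient at x is orthogonal to d and to x - x0
```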

Note

The last condition is automatically implied by the previous ones.

Warning

One can notice that this routine does not completely encode the fact that \(x - x_0\) must be a linear combination of the provided directions (i.e., this routine performs a relaxation). Therefore, if this routine is included in a PEP, the obtained value might be an upper bound on the true worst-case value.

Although not always tight, this relaxation is often observed to deliver accurate results (in particular, it automatically produces tight results under some specific conditions; see, e.g., [1]). Two such examples are provided in the conjugate gradient and gradient with exact line search example files.

References

[1] Y. Drori and A. Taylor (2020). Efficient first-order methods for convex minimization: a constructive approach. Mathematical Programming 184 (1), 183-220.

Parameters:
  • x0 (Point) – the starting point.

  • f (Function) – the function on which the (sub)gradient will be evaluated.

  • directions (List of Points) – the list of directions required to be orthogonal to the (sub)gradient of f at x.

Returns:
  • x (Point) – such that all vectors in directions are orthogonal to the (sub)gradient of f at x.

  • gx (Point) – a (sub)gradient of f at x.

  • fx (Expression) – the function f evaluated at x.

Proximal step

PEPit.primitive_steps.proximal_step(x0, f, gamma)[source]

This routine performs a proximal step of step-size gamma on function f, starting from x0. That is, it performs:

\begin{eqnarray} x \triangleq \text{prox}_{\gamma f}(x_0) & \triangleq & \arg\min_x \left\{ \gamma f(x) + \frac{1}{2} \|x - x_0\|^2 \right\}, \\ & \Updownarrow & \\ 0 & = & \gamma g_x + x - x_0 \text{ for some } g_x\in\partial f(x),\\ & \Updownarrow & \\ x & = & x_0 - \gamma g_x \text{ for some } g_x\in\partial f(x). \end{eqnarray}
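
As a concrete numerical illustration (not PEPit code), the proximal operator of \(f(x) = |x|\) has the well-known closed form of soft-thresholding, and the output satisfies \(x = x_0 - \gamma g_x\) for a subgradient \(g_x \in \partial |x|\):

```python
# Numerical sketch: prox of f(x) = |x| (soft-thresholding); the function
# name is hypothetical, not part of the PEPit API.

def prox_abs(x0, gamma):
    # prox_{gamma*|.|}(x0) = argmin_x { gamma*|x| + 1/2*(x - x0)**2 }.
    if x0 > gamma:
        return x0 - gamma
    if x0 < -gamma:
        return x0 + gamma
    return 0.0

x0, gamma = 1.5, 1.0
x = prox_abs(x0, gamma)    # the proximal point
gx = (x0 - x) / gamma      # recovered subgradient, satisfying x = x0 - gamma*gx
```
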
Parameters:
  • x0 (Point) – starting point x0.

  • f (Function) – function on which the proximal step is computed.

  • gamma (float) – step-size of the proximal step.

Returns:
  • x (Point) – proximal point.

  • gx (Point) – the (sub)gradient of f at x.

  • fx (Expression) – the function value of f on x.

Inexact proximal step

PEPit.primitive_steps.inexact_proximal_step(x0, f, gamma, opt='PD_gapII')[source]

This routine encodes an inexact proximal operation with step size \(\gamma\). That is, it outputs a tuple \((x, g\in \partial f(x), f(x), w, v\in\partial f(w), f(w), \varepsilon)\) which are described as follows.

First, \(x\) is an approximation to the proximal point of \(x_0\) on function \(f\):

\[x \approx \mathrm{prox}_{\gamma f}(x_0)\triangleq\arg\min_x \left\{ \gamma f(x) + \frac{1}{2}\|x-x_0\|^2\right\},\]

where the meaning of \(\approx\) depends on the option “opt” and is explained below. The notions of inaccuracy implemented within this routine are specified using primal and dual proximal problems, denoted by

\begin{eqnarray} &\Phi^{(p)}_{\gamma f}(x; x_0) \triangleq \gamma f(x) + \frac{1}{2}\|x-x_0\|^2,\\ &\Phi^{(d)}_{\gamma f}(v; x_0) \triangleq -\gamma f^*(v)-\frac{1}{2}\|x_0-\gamma v\|^2 + \frac{1}{2}\|x_0\|^2, \end{eqnarray}

where \(\Phi^{(p)}_{\gamma f}(x;x_0)\) and \(\Phi^{(d)}_{\gamma f}(v;x_0)\) respectively denote the primal and the dual proximal problems, and where \(f^*\) is the Fenchel conjugate of \(f\). The options below encode different meanings of “\(\approx\)” by specifying accuracy requirements on primal-dual pairs:

\[(x,v) \approx_{\varepsilon} \left(\mathrm{prox}_{\gamma f}(x_0),\,\mathrm{prox}_{f^*/\gamma}(x_0/\gamma)\right),\]

where \(\approx_{\varepsilon}\) corresponds to requiring the primal-dual pair \((x,v)\) to satisfy some primal-dual accuracy requirement:

\[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(v;x_0) \leqslant \varepsilon,\]

where \(\varepsilon\geqslant 0\) is the error magnitude, which is returned to the user so that one can constrain it to be bounded by some other values.

Relation to the exact proximal operation: In the exact case (no error in the computation, \(\varepsilon=0\)), \(v\) corresponds to the solution of the dual proximal problem and one can write

\[x = x_0-\gamma g,\]

with \(g=v=\mathrm{prox}_{f^*/\gamma}(x_0/\gamma)\in\partial f(x)\), and \(x=w\).

Reformulation of the primal-dual gap: Compared with the exact proximal computation, the inexact case under consideration here can be described as performing

\[x = x_0-\gamma v + e,\]

where \(v\) is an \(\epsilon\)-subgradient of \(f\) at \(x\) (notation \(v\in\partial_{\epsilon} f(x)\)) and \(e\) is some additional computation error. Those elements allow for a common convenient reformulation of the primal-dual gap, written in terms of the magnitudes of \(\epsilon\) and of \(e\):

\[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(v;x_0) = \frac{1}{2} \|e\|^2 + \gamma \epsilon.\]
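This identity can be checked numerically on a concrete instance (not PEPit code): for \(f(x) = \frac{1}{2}x^2\) one has \(f^*(v) = \frac{1}{2}v^2\), the \(\epsilon\)-subgradient residual is \(\epsilon = f(x) + f^*(v) - \left<v; x\right>\), and the primal-dual gap equals \(\frac{1}{2}\|e\|^2 + \gamma\epsilon\) exactly.

```python
# Numerical check of the gap identity for f(x) = 1/2*x**2, f*(v) = 1/2*v**2.
# Helper names are illustrative only.

def primal(x, x0, gamma):
    # Phi^p(x; x0) = gamma*f(x) + 1/2*(x - x0)**2.
    return gamma * 0.5 * x**2 + 0.5 * (x - x0)**2

def dual(v, x0, gamma):
    # Phi^d(v; x0) = -gamma*f*(v) - 1/2*(x0 - gamma*v)**2 + 1/2*x0**2.
    return -gamma * 0.5 * v**2 - 0.5 * (x0 - gamma * v)**2 + 0.5 * x0**2

x0, gamma, v, e = 2.0, 0.7, 1.1, 0.3
x = x0 - gamma * v + e                    # inexact step with error e
eps = 0.5 * x**2 + 0.5 * v**2 - v * x     # f(x) + f*(v) - <v; x>
gap = primal(x, x0, gamma) - dual(v, x0, gamma)
```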

Options: The following options are available (a list of such choices is presented in [4]; we provide a reference for each of those choices below).

  • ‘PD_gapI’ : the constraint imposed on the output is the vanilla primal-dual gap requirement (see, e.g., [2]):

    \[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(v;x_0) \leqslant \varepsilon.\]

This approximation requirement is used in one PEPit example: an accelerated inexact forward-backward method.

  • ‘PD_gapII’ : the constraint is stronger than the vanilla primal-dual gap, as more structure is imposed (see, e.g., [1,5]):

    \[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(g;x_0) \leqslant \varepsilon,\]

where we imposed that \(v\triangleq g\in\partial f(x)\) and \(w\triangleq x\). This approximation requirement is used in two PEPit examples: in a relatively inexact proximal point algorithm and in a partially inexact Douglas-Rachford splitting.

  • ‘PD_gapIII’ : the constraint is stronger than the vanilla primal-dual gap, as more structure is imposed (see, e.g., [3]):

    \[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(\tfrac{x_0 - x}{\gamma};x_0) \leqslant \varepsilon,\]

where we imposed that \(v \triangleq \frac{x_0 - x}{\gamma}\).

References

[1] R.T. Rockafellar (1976). Monotone operators and the proximal point algorithm. SIAM journal on control and optimization, 14(5), 877-898.

[2] R.D. Monteiro, B.F. Svaiter (2013). An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM Journal on Optimization, 23(2), 1092-1125.

[3] S. Salzo, S. Villa (2012). Inexact and accelerated proximal point algorithms. Journal of Convex analysis, 19(4), 1167-1192.

[4] M. Barre, A. Taylor, F. Bach (2020). Principled analyses and design of first-order methods with inexact proximal operators.

[5] A. d’Aspremont, D. Scieur, A. Taylor (2021). Acceleration Methods. Foundations and Trends in Optimization: Vol. 5, No. 1-2.

Parameters:
  • x0 (Point) – point for which we aim to compute an approximate proximal step.

  • f (Function) – function whose proximal operator is approximated.

  • gamma (float) – step size of the proximal step.

  • opt (string) – option (type of error requirement) among ‘PD_gapI’, ‘PD_gapII’, and ‘PD_gapIII’.

Returns:
  • x (Point) – the approximate proximal point.

  • gx (Point) – a (sub)gradient of f at x (subgradient used in evaluating the accuracy criterion).

  • fx (Expression) – f evaluated at x.

  • w (Point) – a point w such that v (see next output) is a subgradient of f at w.

  • v (Point) – the approximate solution to the dual proximal problem; it is a (sub)gradient of f evaluated at w.

  • fw (Expression) – f evaluated at w.

  • eps_var (Expression) – value of the primal-dual gap (which can be further bounded by the user).

Bregman gradient step

PEPit.primitive_steps.bregman_gradient_step(gx0, sx0, mirror_map, gamma)[source]

This routine outputs \(x\) by performing a mirror step of step-size \(\gamma\). That is, denoting \(f\) the function to be minimized and \(h\) the mirror map, it performs

\[x = \arg\min_x \left[ f(x_0) + \left< \nabla f(x_0);\, x - x_0 \right> + \frac{1}{\gamma} D_h(x; x_0) \right],\]

where \(D_h(x; x_0)\) denotes the Bregman divergence of \(h\) on \(x\) with respect to \(x_0\).

\[D_h(x; x_0) \triangleq h(x) - h(x_0) - \left< \nabla h(x_0);\, x - x_0 \right>.\]

Warning

The mirror map \(h\) is assumed differentiable.

By differentiating the previous objective function, one can observe that

\[\nabla h(x) = \nabla h(x_0) - \gamma \nabla f(x_0).\]
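
A classical concrete instance (not PEPit code) is the entropy mirror map \(h(x) = \sum_i x_i \log x_i\) on the positive orthant: since \(\nabla h(x)_i = \log x_i + 1\), the stationarity condition \(\nabla h(x) = \nabla h(x_0) - \gamma \nabla f(x_0)\) yields the multiplicative update \(x_i = (x_0)_i\, e^{-\gamma g_i}\).

```python
import math

# Numerical sketch with the entropy mirror map h(x) = sum_i x_i*log(x_i);
# names are illustrative, not the PEPit API.

def grad_h(x):
    # grad h(x)_i = log(x_i) + 1.
    return [math.log(xi) + 1.0 for xi in x]

x0 = [0.5, 0.3, 0.2]
gx0 = [1.0, -2.0, 0.5]   # some gradient of f at x0
gamma = 0.1

# Mirror step: solve grad h(x) = grad h(x0) - gamma*gx0 in closed form.
x = [xi * math.exp(-gamma * gi) for xi, gi in zip(x0, gx0)]
sx = grad_h(x)           # equals grad_h(x0) - gamma*gx0 componentwise
```
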
Parameters:
  • sx0 (Point) – starting gradient \(\textbf{sx0} \triangleq \nabla h(x_0)\).

  • gx0 (Point) – descent direction \(\textbf{gx0} \triangleq \nabla f(x_0)\).

  • mirror_map (Function) – the reference function \(h\) with respect to which the Bregman divergence is computed.

  • gamma (float) – step size.

Returns:
  • x (Point) – new iterate \(\textbf{x} \triangleq x\).

  • sx (Point) – \(h\)’s gradient on new iterate \(x\) \(\textbf{sx} \triangleq \nabla h(x)\).

  • hx (Expression) – \(h\)’s value on new iterate \(\textbf{hx} \triangleq h(x)\).

Bregman proximal step

PEPit.primitive_steps.bregman_proximal_step(sx0, mirror_map, min_function, gamma)[source]

This routine outputs \(x\) by performing a proximal mirror step of step-size \(\gamma\). That is, denoting \(f\) the function to be minimized and \(h\) the mirror map, it performs

\[x = \arg\min_x \left[ f(x) + \frac{1}{\gamma} D_h(x; x_0) \right],\]

where \(D_h(x; x_0)\) denotes the Bregman divergence of \(h\) on \(x\) with respect to \(x_0\).

\[D_h(x; x_0) \triangleq h(x) - h(x_0) - \left< \nabla h(x_0);\, x - x_0 \right>.\]

Warning

The mirror map \(h\) is assumed differentiable.

By differentiating the previous objective function, one can observe that

\[\nabla h(x) = \nabla h(x_0) - \gamma \nabla f(x).\]
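
Note that, in contrast with the Bregman gradient step, \(\nabla f\) is here evaluated at the new point \(x\), so the update is implicit. As a minimal numerical analogue (not PEPit code), with the Euclidean mirror map \(h(x) = \frac{1}{2}x^2\) one has \(\nabla h(x) = x\), so the condition becomes \(x = x_0 - \gamma \nabla f(x)\), i.e., the classical proximal step; for \(f(x) = \frac{1}{2}x^2\) it solves in closed form to \(x = x_0/(1+\gamma)\).

```python
# Numerical sketch: Bregman proximal step with h(x) = 1/2*x**2 and
# f(x) = 1/2*x**2; the implicit equation x + gamma*x = x0 is solved exactly.

x0, gamma = 3.0, 0.5
x = x0 / (1.0 + gamma)   # solves x = x0 - gamma*grad f(x)
gx = x                   # grad f(x) for f = 1/2*x**2
sx = x                   # grad h(x) for h = 1/2*x**2
```
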
Parameters:
  • sx0 (Point) – starting gradient \(\textbf{sx0} \triangleq \nabla h(x_0)\).

  • mirror_map (Function) – the reference function \(h\) with respect to which the Bregman divergence is computed.

  • min_function (Function) – function we aim to minimize.

  • gamma (float) – step size.

Returns:
  • x (Point) – new iterate \(\textbf{x} \triangleq x\).

  • sx (Point) – \(h\)’s gradient on new iterate \(x\) \(\textbf{sx} \triangleq \nabla h(x)\).

  • hx (Expression) – \(h\)’s value on new iterate \(\textbf{hx} \triangleq h(x)\).

  • gx (Point) – \(f\)’s gradient on new iterate \(x\) \(\textbf{gx} \triangleq \nabla f(x)\).

  • fx (Expression) – \(f\)’s value on new iterate \(\textbf{fx} \triangleq f(x)\).

Linear optimization step

PEPit.primitive_steps.linear_optimization_step(dir, ind)[source]

This routine outputs the result of a minimization problem with linear objective (whose direction is provided by dir) on the domain of the (closed convex) indicator function ind. That is, it outputs a solution to

\[\arg\min_{\text{ind}(x)=0} \left< \text{dir};\, x \right>.\]

One can notice that \(x\) is a solution of this problem if and only if

\[- \text{dir} \in \partial \text{ind}(x).\]
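
This is the linear minimization oracle at the heart of Frank-Wolfe-type methods. As a concrete numerical sketch (not PEPit code), over the box \([-1,1]^n\) the minimizer of a linear objective is attained at a vertex, component-wise opposite in sign to the direction:

```python
# Numerical sketch of a linear minimization oracle over the box [-1, 1]^n;
# the helper name is illustrative, not the PEPit API.

def lmo_box(direction):
    # Minimize <direction; x> over the box [-1, 1]^n: pick the vertex
    # x_i = -sign(direction_i) (any value works when direction_i = 0).
    return [-1.0 if d > 0 else 1.0 for d in direction]

direction = [2.0, -0.5, 1.0]
x = lmo_box(direction)
value = sum(d * xi for d, xi in zip(direction, x))  # optimal objective value
```
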

References

[1] M. Frank, P. Wolfe (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2), 95-110.

Parameters:
  • dir (Point) – the direction of the linear objective.

  • ind (Function) – the (closed convex) indicator function defining the feasible set.

Returns:
  • x (Point) – the optimal point.

  • gx (Point) – the (sub)gradient of ind on x.

  • fx (Expression) – the function value of ind on x.

Epsilon-subgradient step

PEPit.primitive_steps.epsilon_subgradient_step(x0, f, gamma)[source]

This routine performs a step \(x \leftarrow x_0 - \gamma g_0\) where \(g_0 \in\partial_{\varepsilon} f(x_0)\). That is, \(g_0\) is an \(\varepsilon\)-subgradient of \(f\) at \(x_0\). The set \(\partial_{\varepsilon} f(x_0)\) (referred to as the \(\varepsilon\)-subdifferential) is defined as (see [1, Section 3])

\[\partial_{\varepsilon} f(x_0)=\left\{g_0:\,\forall z,\, f(z)\geqslant f(x_0)+\left< g_0;\, z-x_0 \right>-\varepsilon \right\}.\]

An alternative characterization of \(g_0 \in\partial_{\varepsilon} f(x_0)\) consists in writing

\[f(x_0)+f^*(g_0)-\left< g_0;x_0\right>\leqslant \varepsilon.\]
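This Fenchel characterization can be checked on a concrete instance (not PEPit code): for \(f(x) = \frac{1}{2}x^2\) one has \(f^*(g) = \frac{1}{2}g^2\), so the residual \(f(x_0) + f^*(g_0) - \left<g_0; x_0\right>\) equals \(\frac{1}{2}(x_0 - g_0)^2\), and \(g_0\) is an \(\varepsilon\)-subgradient exactly when this quantity is at most \(\varepsilon\).

```python
# Numerical sketch for f(x) = 1/2*x**2 (so f*(g) = 1/2*g**2); the helper
# name is illustrative, not the PEPit API.

def in_eps_subdifferential(g0, x0, eps):
    # Fenchel check: f(x0) + f*(g0) - <g0; x0> <= eps.
    return 0.5 * x0**2 + 0.5 * g0**2 - g0 * x0 <= eps + 1e-12

x0, gamma, eps = 2.0, 0.5, 0.05
g0 = 1.7                 # residual 1/2*(2.0 - 1.7)**2 = 0.045 <= eps
x = x0 - gamma * g0      # the step x <- x0 - gamma*g0
```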

References

[1] A. Brøndsted, R.T. Rockafellar (1965). On the subdifferentiability of convex functions. Proceedings of the American Mathematical Society, 16(4), 605-611.

Parameters:
  • x0 (Point) – starting point x0.

  • f (Function) – a function.

  • gamma (float) – the step size parameter.

Returns:
  • x (Point) – the output point.

  • g0 (Point) – an \(\varepsilon\)-subgradient of f at x0.

  • f0 (Expression) – the value of the function f at x0.

  • epsilon (Expression) – the value of epsilon.