Welcome to PEPit’s documentation!

Quick start guide

The toolbox implements the performance estimation approach, pioneered by Drori and Teboulle [2]. A gentle introduction to performance estimation problems is provided in this blog post.

The PEPit implementation is in line with the framework as exposed in [3,4] and follow-up works (for which proper references are provided in the example files). A gentle introduction to the toolbox is provided in [1].

When to use PEPit?

The general purpose of the toolbox is to help researchers produce worst-case guarantees for their favorite first-order methods.

This toolbox is provided as a Python package. For users who are more comfortable with Matlab, we refer to PESTO.

How to use PEPit?

Installation

PEPit is available on PyPI, hence can be installed very simply by running

pip install pepit

Now you are all set! You should be able to run

import PEPit

in a Python interpreter.

Basic usage: getting worst-case guarantees

The main object is called a PEP. It stores the problem you will describe to PEPit.

First create a PEP object.

from PEPit import PEP
problem = PEP()

From there, you can declare functions using the declare_function method.

from PEPit.functions import SmoothConvexFunction
func = problem.declare_function(SmoothConvexFunction, L=L)  # L is the smoothness parameter, chosen beforehand (e.g., L = 1.)

Warning

To enforce the same subgradient to be returned each time one is required, we introduced the attribute reuse_gradient in the Function class. Some classes contain only differentiable functions (e.g., smooth convex functions); in those, the reuse_gradient attribute is set to True by default.

When the same subgradient is used several times in the same code and when it is difficult to keep track of it (through proximal calls for instance), it may be useful to set this parameter to True even if the function is not differentiable. This helps reduce the number of constraints and improves the accuracy of the underlying semidefinite program. See for instance the code for the improved interior method or NoLips in Bregman divergence.
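
For instance, here is a minimal hedged sketch (reusing the problem object created above; ConvexFunction is the general convex function class from PEPit.functions, and the variable name h is purely illustrative):

from PEPit.functions import ConvexFunction
h = problem.declare_function(ConvexFunction, reuse_gradient=True)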

You can also define a new point with

x0 = problem.set_initial_point()

and give a name to the value of func at x0

f0 = func(x0)

as well as the (sub)gradient of func at x0

g0 = func.gradient(x0)

or

g0 = func.subgradient(x0)

There is a more compact way to do it using the oracle method.

g0, f0 = func.oracle(x0)

You can declare a stationary point of func, defined as a point at which a (sub)gradient of func is zero, as follows:

xs = func.stationary_point()

You can combine points and gradients naturally. For instance, for some step size gamma and number of iterations n:

x = x0
for _ in range(n):
    x = x - gamma * func.gradient(x)

You must declare some initial conditions like

problem.set_initial_condition((x0 - xs) ** 2 <= 1)

as well as performance metrics like

fs = func(xs)
problem.set_performance_metric(func(x) - fs)

Finally, you can ask PEPit to solve the system for you and return the worst-case guarantee of your method.

pepit_tau = problem.solve()
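
Putting the previous snippets together, here is a minimal end-to-end sketch of the pipeline described above (the smoothness parameter L, the step size gamma, and the number of iterations n are arbitrary illustrative choices):

from PEPit import PEP
from PEPit.functions import SmoothConvexFunction

L = 1.
gamma = 1 / L
n = 5

problem = PEP()
func = problem.declare_function(SmoothConvexFunction, L=L)

# Optimal point and optimal value of func
xs = func.stationary_point()
fs = func(xs)

# Initial point and initial condition
x0 = problem.set_initial_point()
problem.set_initial_condition((x0 - xs) ** 2 <= 1)

# n steps of gradient descent
x = x0
for _ in range(n):
    x = x - gamma * func.gradient(x)

# Worst-case value of func(x) - fs after n iterations
problem.set_performance_metric(func(x) - fs)
pepit_tau = problem.solve()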

Warning

Performance estimation problems consist in reformulating the problem of finding a worst-case scenario as a semidefinite program (SDP). The dimension of the corresponding SDP is directly related to the number of function and gradient evaluations in a given code.

We encourage users to perform as few function and subgradient evaluations as possible, as the size of the corresponding SDP grows with the number of subgradient/function evaluations at different points.

Derive proofs and adversarial objectives

When one calls the solve method, PEPit does much more than just finding the worst-case value.

In particular, it stores possible values of each point, gradient and function value that achieve this worst-case guarantee, as well as the dual variable values associated with each constraint.

Values and dual variable values

Let’s consider the above example. After solving the PEP, you can ask PEPit

print(x.eval())

which returns one possible value of the output of the described algorithm at optimum.

You can also ask for gradients and function values

print(func.gradient(x).eval())
print(func(x).eval())

Recovering the values of all the points, gradients and function values at optimum allows you to reconstruct the function that achieves the worst-case complexity of your method.

You can also get the dual variable values of the constraints at optimum, which essentially allows you to write the proof of the worst-case guarantee you just obtained.

Let’s consider again the previous example, but this time, let’s give a name to a constraint before using it.

constraint = (x0 - xs) ** 2 <= 1
problem.set_initial_condition(constraint)

Then, after solving the system, you can retrieve its associated dual variable value with

constraint.eval_dual()

Output pdf

In a later release, we will provide an option to output a pdf file summarizing all those pieces of information.

Simpler worst-case scenarios

Sometimes, there are several solutions to the PEP. To obtain simpler worst-case scenarios, one may prefer low-dimensional solutions to the SDP. To this end, we provide heuristics, based on trace norm or log-det minimization, for reducing the dimension of the numerical solution to the SDP.

You can use the trace heuristic by specifying

problem.solve(dimension_reduction_heuristic="trace")

You can use n iterations of the log-det heuristic by specifying "logdetn" (with n an integer). For example, to use 5 iterations of the log-det heuristic:

problem.solve(dimension_reduction_heuristic="logdet5")

Finding Lyapunov functions

In a later release, we will provide tools to help find good Lyapunov functions for studying a given method.

This tool will be based on the very recent work [7].

References

[1] B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, A. Dieuleveut. PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python.

[2] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming, 145(1-2), 451-482.

[3] A. Taylor, J. Hendrickx, F. Glineur (2017). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1-2), 307-345.

[4] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3), 1283-1313.

[5] S. Diamond, S. Boyd (2016). CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research (JMLR), 17(83), 1-5.

[6] A. Agrawal, R. Verschueren, S. Diamond, S. Boyd (2018). A rewriting system for convex optimization problems. Journal of Control and Decision (JCD), 5(1), 42-60.

[7] A. Taylor, B. Van Scoy, L. Lessard (2018). Lyapunov functions for first-order methods: tight automated convergence guarantees. International Conference on Machine Learning (ICML).

API and modules

Main modules

PEP

class PEPit.PEP[source]

Bases: object

The class PEP is the main class of this framework. A PEP object encodes a complete performance estimation problem. It stores the following information.

Attributes
  • list_of_functions (list) – list of leaf Function objects that are defined through the pipeline.

  • list_of_points (list) – list of Point objects that are defined out of the scope of a Function. Typically the initial Point.

  • list_of_constraints (list) – list of Constraint objects that are defined out of the scope of a Function. Typically the initial Constraint.

  • list_of_performance_metrics (list) – list of Expression objects. The pep maximizes the minimum of all performance metrics.

  • counter (int) – counts the number of PEP objects. Ideally, only one is defined at a time.

A PEP object can be instantiated without any argument

Example

>>> pep = PEP()
add_constraint(constraint)[source]

Store a new Constraint to the list of constraints of this PEP.

Parameters

constraint (Constraint) – typically resulting from a comparison of 2 Expression objects.

Raises

AssertionError – if provided constraint is not a Constraint object.

add_psd_matrix(matrix_of_expressions)[source]

Store a new matrix of Expressions that we enforce to be positive semidefinite.

Parameters

matrix_of_expressions (2D ndarray of Expression) – a square matrix of Expression.

Raises
  • AssertionError – if provided matrix is not a square matrix.

  • TypeError – if provided matrix does not contain only Expressions.
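
A minimal hedged sketch, mirroring the PSDMatrix example given further below (the Expression objects here are created from scratch for illustration only):

Example

>>> import numpy as np
>>> from PEPit import PEP, Expression
>>> problem = PEP()
>>> matrix_of_expressions = np.array([Expression() for i in range(4)]).reshape(2, 2)
>>> problem.add_psd_matrix(matrix_of_expressions)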

declare_function(function_class, **kwargs)[source]

Instantiate a leaf Function and store it in the attribute list_of_functions.

Parameters
  • function_class (class) – a subclass of Function that overwrites the add_class_constraints method.

  • kwargs (dict) – dictionary of parameters that characterize the function class. Can also contain the boolean reuse_gradient, which enforces using only one subgradient per point.

Returns

f (Function) – the newly created function.
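
For instance, a short sketch declaring a smooth strongly convex function (mirroring the examples of the function classes documented below):

Example

>>> from PEPit import PEP
>>> from PEPit.functions import SmoothStronglyConvexFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=SmoothStronglyConvexFunction, mu=.1, L=1.)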

static get_nb_eigenvalues_and_corrected_matrix(M)[source]

Compute the number of (numerically) non-zero eigenvalues of M, and recompute M with corrected eigenvalues.

Parameters

M (nd.array) – a 2 dimensional array, supposedly symmetric.

Returns
  • nb_eigenvalues (int) – The number of eigenvalues of M estimated to be strictly positive.

  • eig_threshold (float) – The threshold used to determine whether an eigenvalue is 0 or not.

  • corrected_S (nd.array) – Updated M with zero eigenvalues instead of small ones.

send_constraint_to_cvxpy(constraint, F, G)[source]

Transform a PEPit Constraint into a CVXPY one.

Parameters
  • constraint (Constraint) – a Constraint object to be sent to CVXPY.

  • F (CVXPY Variable) – a CVXPY Variable referring to function values.

  • G (CVXPY Variable) – a CVXPY Variable referring to points and gradients.

Returns

cvxpy_constraint (CVXPY constraint) – the corresponding CVXPY constraint.

Raises

ValueError – if the attribute equality_or_inequality of the Constraint is neither "equality" nor "inequality".

send_lmi_constraint_to_cvxpy(psd_counter, psd_matrix, F, G, verbose)[source]

Transform a PEPit PSDMatrix into a CVXPY symmetric PSD matrix.

Parameters
  • psd_counter (int) – a counter useful for the verbose mode.

  • psd_matrix (PSDMatrix) – a matrix of expressions that is constrained to be PSD.

  • F (CVXPY Variable) – a CVXPY Variable referring to function values.

  • G (CVXPY Variable) – a CVXPY Variable referring to points and gradients.

  • verbose (int) –

    Level of information details to print (Override the CVXPY solver verbose parameter).

    • 0: No verbose at all.

    • 1: PEPit information is printed but not CVXPY’s

    • 2: Both PEPit and CVXPY details are printed

Returns

cvxpy_constraints_list (list of CVXPY constraints) – the PSD constraint as well as correspondence between the matrix and its elements.

set_initial_condition(condition)[source]

Store a new Constraint to the list of constraints of this PEP. Typically a condition of the form \(\|x_0 - x_\star\|^2 \leq 1\).

Parameters

condition (Constraint) – typically resulting from a comparison of 2 Expression objects.

Raises

AssertionError – if provided constraint is not a Constraint object.

set_initial_point()[source]

Create a new leaf Point and store it in the attribute list_of_points.

Returns

x (Point) – the newly created Point.

set_performance_metric(expression)[source]

Store a performance metric in the attribute list_of_performance_metrics. The objective of the PEP (which is maximized) is the minimum of the elements of list_of_performance_metrics.

Parameters

expression (Expression) – a new performance metric.

solve(verbose=1, return_full_cvxpy_problem=False, dimension_reduction_heuristic=None, eig_regularization=0.001, tol_dimension_reduction=1e-05, **kwargs)[source]

Transform the PEP under the SDP form, and solve it.

Parameters
  • verbose (int) –

    Level of information details to print (Override the CVXPY solver verbose parameter).

    • 0: No verbose at all.

    • 1: PEPit information is printed but not CVXPY’s.

    • 2: Both PEPit and CVXPY details are printed.

  • return_full_cvxpy_problem (bool) – If True, return the cvxpy Problem object. If False, return the worst case value only. Set to False by default.

  • dimension_reduction_heuristic (str, optional) –

    A heuristic to reduce the dimension of the solution (rank of the Gram matrix). Set to None to deactivate it (default value). Available heuristics are:

    • ”trace”: minimize \(Tr(G)\)

    • ”logdet{an integer n}”: minimize \(\log\left(\mathrm{Det}(G)\right)\) using n iterations of local approximation problems.

  • eig_regularization (float, optional) – The regularization we use to make \(G + \text{eig\_regularization}\, I_d \succ 0\) (only used when dimension_reduction_heuristic is not None). The default value is 1e-3.

  • tol_dimension_reduction (float, optional) – The error tolerance in the heuristic minimization problem. Precisely, the second problem minimizes “optimal_value - tol” (only used when “dimension_reduction_heuristic” is not None) The default value is 1e-5.

  • kwargs (keywords, optional) – Additional CVXPY solver specific arguments.

Returns

float or cp.Problem – Value of the performance metric, or the cp.Problem object corresponding to the SDP. The value only is returned by default.
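
As a hedged sketch (assuming problem is a fully specified PEP, as in the quick start guide), one may for instance recover the underlying CVXPY problem instead of the scalar value:

Example

>>> cvxpy_problem = problem.solve(verbose=0, return_full_cvxpy_problem=True)
>>> worst_case_value = cvxpy_problem.value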

Point

class PEPit.Point(is_leaf=True, decomposition_dict=None)[source]

Bases: object

A Point encodes an element of a pre-Hilbert space, either a point or a gradient.

Attributes
  • _is_leaf (bool) – True if self is defined from scratch (not as linear combination of other Point objects). False if self is defined as linear combination of other points.

  • _value (nd.array) – numerical value of self obtained after solving the PEP via SDP solver. Set to None before the call to the method PEP.solve from the PEP.

  • decomposition_dict (dict) – decomposition of self as a linear combination of leaf Point objects. Keys are Point objects. And values are their associated coefficients.

  • counter (int) – counts the number of leaf Point objects.

Point objects can be added or subtracted together. They can also be multiplied and divided by a scalar value.

Example

>>> point1 = Point()
>>> point2 = Point()
>>> new_point = (- point1 + point2) / 5

As in any pre-Hilbert space, there exists a scalar product. Therefore, Point objects can be multiplied together.

Example

>>> point1 = Point()
>>> point2 = Point()
>>> new_expr = point1 * point2

The output is a scalar of type Expression.

The corresponding squared norm can also be computed.

Example

>>> point = Point()
>>> new_expr = point ** 2

Point objects can also be instantiated via the following arguments

Parameters
  • is_leaf (bool) – True if self is a Point defined from scratch (not as linear combination of other Point objects). False if self is a linear combination of existing Point objects.

  • decomposition_dict (dict) – decomposition of self as a linear combination of leaf Point objects. Keys are Point objects. And values are their associated coefficients.

Note

If is_leaf is True, then decomposition_dict must be provided as None. Then self.decomposition_dict will be set to {self: 1}.

Instantiating the Point object of the first example can be done by

Example

>>> point1 = Point()
>>> point2 = Point()
>>> new_point = Point(is_leaf=False, decomposition_dict = {point1: -1/5, point2: 1/5})
eval()[source]

Compute, store and return the value of this Point.

Returns

self._value (np.array) – The value of this Point after the corresponding PEP was solved numerically.

Raises

ValueError("The PEP must be solved to evaluate Points!") if the PEP has not been solved yet.

get_is_leaf()[source]
Returns

self._is_leaf (bool) – allows to access the protected attribute _is_leaf.

Expression

class PEPit.Expression(is_leaf=True, decomposition_dict=None)[source]

Bases: object

An Expression is a linear combination of functions values, inner products of points and / or gradients (product of 2 Point objects), and constant scalar values.

Attributes
  • _is_leaf (bool) – True if self is a function value defined from scratch (not as linear combination of other function values). False if self is a linear combination of existing Expression objects.

  • _value (float) – numerical value of self obtained after solving the PEP via SDP solver. Set to None before the call to the method PEP.solve from the PEP.

  • decomposition_dict (dict) – decomposition of self as a linear combination of leaf Expression objects. Keys are Expression objects or tuple of 2 Point objects. And values are their associated coefficients.

  • counter (int) – counts the number of leaf Expression objects.

Expression objects can be added or subtracted together. They can also be added, subtracted, multiplied and divided by a scalar value.

Example

>>> expr1 = Expression()
>>> expr2 = Expression()
>>> new_expr = (- expr1 + expr2 - 1) / 5

Expression objects can also be compared together

Example

>>> expr1 = Expression()
>>> expr2 = Expression()
>>> inequality1 = expr1 <= expr2
>>> inequality2 = expr1 >= expr2
>>> equality = expr1 == expr2

The three outputs inequality1, inequality2 and equality are then Constraint objects.

Expression objects can also be instantiated via the following arguments

Parameters
  • is_leaf (bool) – True if self is a function value defined from scratch (not as linear combination of other function values). False if self is a linear combination of existing Expression objects.

  • decomposition_dict (dict) – decomposition of self as a linear combination of leaf Expression objects. Keys are Expression objects or tuple of 2 Point objects. And values are their associated coefficients.

Note

If is_leaf is True, then decomposition_dict must be provided as None. Then self.decomposition_dict will be set to {self: 1}.

Instantiating the Expression object of the first example can be done by

Example

>>> expr1 = Expression()
>>> expr2 = Expression()
>>> new_expr = Expression(is_leaf=False, decomposition_dict = {expr1: -1/5, expr2: 1/5, 1: -1/5})
eval()[source]

Compute, store and return the value of this Expression.

Returns

self._value (float) – Value of this Expression after the corresponding PEP was solved numerically.

Raises
  • ValueError("The PEP must be solved to evaluate Expressions!") if the PEP has not been solved yet.

  • TypeError("Expressions are made of function values, inner products and constants only!")

get_is_leaf()[source]
Returns

self._is_leaf (bool) – allows to access the protected attribute _is_leaf.

Constraint

class PEPit.Constraint(expression, equality_or_inequality)[source]

Bases: object

A Constraint encodes either an equality or an inequality between two Expression objects.

A Constraint must be understood either as self.expression = 0 or self.expression \(\leqslant\) 0 depending on the value of self.equality_or_inequality.

Attributes
  • expression (Expression) – The Expression that is compared to 0.

  • equality_or_inequality (str) – “equality” or “inequality”. Encodes the type of constraint.

  • _value (float) – numerical value of self.expression obtained after solving the PEP via SDP solver. Set to None before the call to the method PEP.solve from the PEP.

  • _dual_variable_value (float) – the associated dual variable from the numerical solution to the corresponding PEP. Set to None before the call to PEP.solve from the PEP

  • counter (int) – counts the number of Constraint objects.

A Constraint results from a comparison between two Expression objects.

Example

>>> from PEPit import Expression
>>> expr1 = Expression()
>>> expr2 = Expression()
>>> inequality1 = expr1 <= expr2
>>> inequality2 = expr1 >= expr2
>>> equality = expr1 == expr2

Constraint objects can also be instantiated via the following arguments.

Parameters
  • expression (Expression) – an object of class Expression

  • equality_or_inequality (str) – either ‘equality’ or ‘inequality’.

Instantiating the Constraint objects of the first example can be done by

Example

>>> from PEPit import Expression
>>> expr1 = Expression()
>>> expr2 = Expression()
>>> inequality1 = Constraint(expression=expr1-expr2, equality_or_inequality="inequality")
>>> inequality2 = Constraint(expression=expr2-expr1, equality_or_inequality="inequality")
>>> equality = Constraint(expression=expr1-expr2, equality_or_inequality="equality")
Raises

AssertionError – if provided equality_or_inequality argument is neither “equality” nor “inequality”.

eval()[source]

Compute, store and return the value of the underlying Expression of this Constraint.

Returns

self._value (float) – The value of the underlying Expression of this Constraint after the corresponding PEP was solved numerically.

Raises

ValueError("The PEP must be solved to evaluate Constraints!") if the PEP has not been solved yet.

eval_dual()[source]

Compute, store and return the value of the dual variable of this Constraint.

Returns

self._dual_variable_value (float) – The value of the dual variable of this Constraint after the corresponding PEP was solved numerically.

Raises

ValueError("The PEP must be solved to evaluate Constraints dual variables!") if the PEP has not been solved yet.

Symmetric positive semidefinite matrix

class PEPit.PSDMatrix(matrix_of_expressions)[source]

Bases: object

A PSDMatrix encodes a square matrix of Expression objects that is constrained to be symmetric PSD.

Attributes
  • matrix_of_expressions (2D ndarray of Expression) – a square matrix of Expression objects.

  • shape (tuple of ints) – the shape of the underlying matrix of Expression objects.

  • _value (2D ndarray of floats) – numerical values of Expression objects obtained after solving the PEP via SDP solver. Set to None before the call to the method PEP.solve from the PEP.

  • _dual_variable_value (2D ndarray of floats) – the associated dual matrix from the numerical solution to the corresponding PEP. Set to None before the call to PEP.solve from the PEP.

  • entries_dual_variable_value (2D ndarray of floats) – the dual of each correspondence between entries of the matrix and the underlying Expression objects.

  • counter (int) – counts the number of PSDMatrix objects.

Example

>>> # Defining t <= sqrt(expr) for a given expression expr.
>>> import numpy as np
>>> from PEPit import Expression
>>> from PEPit import PSDMatrix
>>> expr = Expression()
>>> t = Expression()
>>> psd_matrix = PSDMatrix(matrix_of_expressions=np.array([[expr, t], [t, 1]]))
>>> # The last line means that the matrix [[expr, t], [t, 1]] is constrained to be PSD.
>>> # This is equivalent to det([[expr, t], [t, 1]]) >= 0, i.e. expr - t^2 >= 0.

PSDMatrix objects are instantiated via the following argument.

Parameters

matrix_of_expressions (2D ndarray of Expression) – a square matrix of Expression.

Instantiating the PSDMatrix objects of the first example can be done by

Example

>>> import numpy as np
>>> from PEPit import Expression
>>> from PEPit import PSDMatrix
>>> matrix_of_expressions = np.array([Expression() for i in range(4)]).reshape(2, 2)
>>> psd_matrix = PSDMatrix(matrix_of_expressions=matrix_of_expressions)
Raises
  • AssertionError – if provided matrix is not a square matrix.

  • TypeError – if provided matrix does not contain only Expressions and / or scalar values.

eval()[source]

Compute, store and return the value of the underlying matrix of Expression objects.

Returns

self._value (np.array) – The value of the underlying matrix of Expression objects after the corresponding PEP was solved numerically.

Raises

ValueError("The PEP must be solved to evaluate PSDMatrix!") if the PEP has not been solved yet.

eval_dual()[source]

Compute, store and return the value of the dual variable of this PSDMatrix.

Returns

self._dual_variable_value (ndarray of floats) – The value of the dual variable of this PSDMatrix after the corresponding PEP was solved numerically.

Raises

ValueError("The PEP must be solved to evaluate PSDMatrix dual variables!") if the PEP has not been solved yet.

Function

class PEPit.Function(is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]

Bases: object

A Function object encodes a function or an operator.

Warning

This class must be overwritten by a child class that encodes some conditions on the Function. In particular, the method add_class_constraints must be overwritten. See the PEPit.functions and PEPit.operators modules.

Some Function objects are defined from scratch as leaf Function objects, and some are linear combinations of pre-existing ones.

Attributes
  • _is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaves.

  • decomposition_dict (dict) – decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

  • list_of_points (list) – A list of triplets storing the points where this Function has been evaluated, as well as the associated subgradients and function values.

  • list_of_stationary_points (list) – The sublist of self.list_of_points of stationary points (characterized by some subgradient=0).

  • list_of_constraints (list) – The list of Constraint objects associated with this Function.

  • counter (int) – counts the number of leaf Function objects.

Note

PEPit was initially designed for evaluating the performance of optimization algorithms. Operators are represented in the same way as functions, but function values must not be used (they do not have any meaning in this framework). Use gradient to access an operator value.
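
For instance, a minimal hedged sketch with a maximally monotone operator (see the PEPit.operators module below), whose value at a point is queried through gradient:

Example

>>> from PEPit import PEP
>>> from PEPit.operators import MonotoneOperator
>>> problem = PEP()
>>> A = problem.declare_function(MonotoneOperator)
>>> x = problem.set_initial_point()
>>> Ax = A.gradient(x)  # operator value at x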

Function objects can be added or subtracted together. They can also be multiplied and divided by a scalar value.

Example

>>> func1 = Function()
>>> func2 = Function()
>>> new_func = (- func1 + func2) / 5

Function objects can also be instantiated via the following arguments.

Parameters
  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

Note

If is_leaf is True, then decomposition_dict must be provided as None. Then self.decomposition_dict will be set to {self: 1}.

Note

reuse_gradient is typically set to True when this Function is differentiable, that is, when there exists only one subgradient per Point.

Instantiating the Function object of the first example can be done by

Example

>>> func1 = Function()
>>> func2 = Function()
>>> new_func = Function(is_leaf=False, decomposition_dict = {func1: -1/5, func2: 1/5})
add_class_constraints()[source]

Warning

Needs to be overwritten with interpolation conditions (or necessary conditions for interpolation for obtaining possibly non-tight upper bounds on the worst-case performance).

This method is run by the PEP just before solving the problem. It evaluates interpolation conditions over the lists of points stored in this Function.

Raises

NotImplementedError – This method must be overwritten in children classes

add_constraint(constraint)[source]

Store a new Constraint to the list of constraints of this Function.

Parameters

constraint (Constraint) – typically resulting from a comparison of 2 Expression objects.

Raises

AssertionError – if provided constraint is not a Constraint object.

add_point(triplet)[source]

Add a triplet (point, gradient, function_value) to the list of points of this function.

Parameters

triplet (tuple) – A tuple containing 3 elements: point (Point), gradient (Point), and function value (Expression).

add_psd_matrix(matrix_of_expressions)[source]

Store a new matrix of Expressions that we enforce to be positive semidefinite.

Parameters

matrix_of_expressions (2D ndarray of Expression) – a square matrix of Expression.

Raises
  • AssertionError – if provided matrix is not a square matrix.

  • TypeError – if provided matrix does not contain only Expressions.

fixed_point()[source]

This routine outputs a fixed point of this function, that is \(x\) such that \(x\in\partial f(x)\). If self is an operator \(A\), the fixed point is such that \(Ax = x\).

Returns
  • x (Point) – a fixed point of the differential of self.

  • gx (Point) – the corresponding (sub)gradient, which satisfies \(\nabla f(x) = x\).

  • fx (Expression) – a function value (useful only if self is a function).
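
A minimal hedged usage sketch (assuming the triplet output described above, with a non-expansive operator declared through a PEP):

Example

>>> from PEPit import PEP
>>> from PEPit.operators import LipschitzOperator
>>> problem = PEP()
>>> A = problem.declare_function(function_class=LipschitzOperator, L=1.)
>>> xs, _, _ = A.fixed_point()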

get_is_leaf()[source]
Returns

self._is_leaf (bool) – allows to access the protected attribute _is_leaf.

gradient(point)[source]

Return the gradient (or a subgradient) of this Function evaluated at point.

Parameters

point (Point) – any point.

Returns

Point – a gradient (Point) of this Function on point (Point).

Note

the method subgradient does the exact same thing.

oracle(point)[source]

Return a gradient (or a subgradient) and the function value of self evaluated at point.

Parameters

point (Point) – any point.

Returns

tuple – a (sub)gradient (Point) and a function value (Expression).

stationary_point(return_gradient_and_function_value=False)[source]

Create a new stationary point, as well as its zero gradient and its function value.

Parameters

return_gradient_and_function_value (bool) – if True, return the triplet point (Point), gradient (Point), function value (Expression). Otherwise, return only the point (Point).

Returns

Point or tuple – a stationary point (Point), or the triplet (Point, Point, Expression) if return_gradient_and_function_value is set to True.
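
For instance, a short hedged sketch (func being any previously declared Function):

Example

>>> xs, gs, fs = func.stationary_point(return_gradient_and_function_value=True)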

subgradient(point)[source]

Return a subgradient of this Function evaluated at point.

Parameters

point (Point) – any point.

Returns

Point – a subgradient (Point) of this Function on point (Point).

Note

the method gradient does the exact same thing.

value(point)[source]

Return the function value of this Function on point.

Parameters

point (Point) – any point.

Returns

Point – the function value (Expression) of this Function on point (Point).

Functions classes

Functions

Convex
class PEPit.functions.ConvexFunction(is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]

Bases: Function

The ConvexFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of convex, closed and proper (CCP) functions (i.e., convex functions whose epigraphs are non-empty closed sets).

General CCP functions are not characterized by any parameter, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.functions import ConvexFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexFunction)
Parameters
  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf.

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (CCP function).

Strongly convex
class PEPit.functions.StronglyConvexFunction(mu, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]

Bases: Function

The StronglyConvexFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of strongly convex closed proper functions (strongly convex functions whose epigraphs are non-empty closed sets).

Attributes

mu (float) – strong convexity parameter

Strongly convex functions are characterized by the strong convexity parameter \(\mu\), hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.functions import StronglyConvexFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=StronglyConvexFunction, mu=.1)

References

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1-2), 307-345.

Parameters
  • mu (float) – The strong convexity parameter.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf.

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (strongly convex closed proper function), see [1, Corollary 2].

Smooth
class PEPit.functions.SmoothFunction(L=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]

Bases: Function

The SmoothFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of smooth (not necessarily convex) functions.

Attributes

L (float) – smoothness parameter

Smooth functions are characterized by the smoothness parameter L, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.functions import SmoothFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=SmoothFunction, L=1.)

References

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • L (float) – The smoothness parameter.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf.

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

Note

Smooth functions are necessarily differentiable, hence reuse_gradient is set to True.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (smooth (not necessarily convex) function), see [1, Theorem 3.10].

Convex and smooth
class PEPit.functions.SmoothConvexFunction(L=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]

Bases: SmoothStronglyConvexFunction

The SmoothConvexFunction class implements smooth convex functions as particular cases of SmoothStronglyConvexFunction.

Attributes

L (float) – smoothness parameter

Smooth convex functions are characterized by the smoothness parameter L, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.functions import SmoothConvexFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=SmoothConvexFunction, L=1.)
Parameters
  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf.

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

  • L (float) – The smoothness parameter.

Note

Smooth convex functions are necessarily differentiable, hence reuse_gradient is set to True.

Convex and quadratically upper bounded
class PEPit.functions.ConvexQGFunction(L=1, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]

Bases: Function

The ConvexQGFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex functions.

Attributes

L (float) – The quadratic upper bound parameter

General quadratically upper bounded (\(\text{QG}^+\)) convex functions are characterized by the quadratic growth parameter L, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.functions import ConvexQGFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexQGFunction, L=1)

References:

[1] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

Parameters
  • L (float) – The quadratic upper bound parameter.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf.

  • decomposition_dict (dict) – decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (quadratically maximally growing convex function); see [1, Theorem 2.6].

Strongly convex and smooth
class PEPit.functions.SmoothStronglyConvexFunction(mu, L=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]

Bases: Function

The SmoothStronglyConvexFunction class overwrites the add_class_constraints method of Function, by implementing interpolation constraints of the class of smooth strongly convex functions.

Attributes
  • mu (float) – strong convexity parameter

  • L (float) – smoothness parameter

Smooth strongly convex functions are characterized by parameters \(\mu\) and L, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.functions import SmoothStronglyConvexFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=SmoothStronglyConvexFunction, mu=.1, L=1.)

References

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1-2), 307-345.

Parameters
  • mu (float) – The strong convexity parameter.

  • L (float) – The smoothness parameter.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf.

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

Note

Smooth strongly convex functions are necessarily differentiable, hence reuse_gradient is set to True.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (smooth strongly convex function); see [1, Theorem 4].

Convex and Lipschitz continuous
class PEPit.functions.ConvexLipschitzFunction(M=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]

Bases: Function

The ConvexLipschitzFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of convex closed proper (CCP) Lipschitz continuous functions.

Attributes

M (float) – Lipschitz parameter

CCP Lipschitz continuous functions are characterized by a parameter M, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.functions import ConvexLipschitzFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexLipschitzFunction, M=1.)

References

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • M (float) – The Lipschitz continuity parameter of self.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf.

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (CCP Lipschitz continuous function), see [1, Theorem 3.5].

Convex indicator
class PEPit.functions.ConvexIndicatorFunction(D=inf, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]

Bases: Function

The ConvexIndicatorFunction class overwrites the add_class_constraints method of Function, implementing interpolation constraints for the class of closed convex indicator functions.

Attributes

D (float) – upper bound on the diameter of the feasible set, possibly set to np.inf

Convex indicator functions are characterized by a parameter D, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.functions import ConvexIndicatorFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexIndicatorFunction, D=1)

References

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • D (float) – Diameter of the support of self.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf.

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (closed convex indicator function), see [1, Theorem 3.6].

Convex support functions
class PEPit.functions.ConvexSupportFunction(M=inf, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]

Bases: Function

The ConvexSupportFunction class overwrites the add_class_constraints method of Function, implementing interpolation constraints for the class of closed convex support functions.

Attributes

M (float) – upper bound on the Lipschitz constant

Convex support functions are characterized by a parameter M, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.functions import ConvexSupportFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexSupportFunction, M=1)

References

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • M (float) – Lipschitz constant of self.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf.

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (closed convex support function), see [1, Corollary 3.7].

Restricted secant inequality and error bound
class PEPit.functions.RsiEbFunction(mu, L=1, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]

Bases: Function

The RsiEbFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of functions verifying the “lower” restricted secant inequality (\(\text{RSI}^-\)) and the “upper” error bound (\(\text{EB}^+\)).

Attributes
  • mu (float) – Restricted secant inequality parameter

  • L (float) – Error bound parameter

\(\text{RSI}^-\) and \(\text{EB}^+\) functions are characterized by parameters \(\mu\) and L, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.functions import RsiEbFunction
>>> problem = PEP()
>>> h = problem.declare_function(function_class=RsiEbFunction, mu=.1, L=1)

References

A definition of the class of \(\text{RSI}^-\) and \(\text{EB}^+\) functions can be found in [1].

[1] C. Guille-Escuret, B. Goujaud, A. Ibrahim, I. Mitliagkas (2022). Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound. arXiv 2203.00342.

Parameters
  • mu (float) – The restricted secant inequality parameter.

  • L (float) – The upper error bound parameter.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf .

  • decomposition_dict (dict) – decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

add_class_constraints()[source]

Formulates the list of necessary conditions for interpolation of self, see [1, Theorem 1].

Operators

Monotone
class PEPit.operators.MonotoneOperator(is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]

Bases: Function

The MonotoneOperator class overwrites the add_class_constraints method of Function, implementing interpolation constraints for the class of maximally monotone operators.

Note

Operator values can be requested through gradient and function values should not be used.

General maximally monotone operators are not characterized by any parameter, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.operators import MonotoneOperator
>>> problem = PEP()
>>> h = problem.declare_function(function_class=MonotoneOperator)

References

[1] H. H. Bauschke and P. L. Combettes (2017). Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer New York, 2nd ed.

Parameters
  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf .

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (maximally monotone operator), see, e.g., [1, Theorem 20.21].

Strongly monotone
class PEPit.operators.StronglyMonotoneOperator(mu, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]

Bases: Function

The StronglyMonotoneOperator class overwrites the add_class_constraints method of Function, implementing interpolation constraints of the class of strongly monotone (maximally monotone) operators.

Note

Operator values can be requested through gradient and function values should not be used.

Attributes

mu (float) – strong monotonicity parameter

Strongly monotone (and maximally monotone) operators are characterized by the parameter \(\mu\), hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.operators import StronglyMonotoneOperator
>>> problem = PEP()
>>> h = problem.declare_function(function_class=StronglyMonotoneOperator, mu=.1)

References

Discussions and appropriate pointers for the problem of interpolation of maximally monotone operators can be found in: [1] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.

Parameters
  • mu (float) – Strong monotonicity parameter.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf .

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (strongly monotone maximally monotone operator), see, e.g., [1, Proposition 1].

Lipschitz continuous
class PEPit.operators.LipschitzOperator(L=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]

Bases: Function

The LipschitzOperator class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of Lipschitz continuous operators.

Note

Operator values can be requested through gradient and function values should not be used.

Attributes

L (float) – Lipschitz continuity parameter

Lipschitz continuous operators are characterized by the parameter \(L\), hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.operators import LipschitzOperator
>>> problem = PEP()
>>> func = problem.declare_function(function_class=LipschitzOperator, L=1.)

Notes

By setting L=1, we define a non-expansive operator.

By setting L<1, we define a contracting operator.

References

[1] M. Kirszbraun (1934). Über die zusammenziehende und Lipschitzsche Transformationen. Fundamenta Mathematicae, 22.

[2] F.A. Valentine (1943). On the extension of a vector function so as to preserve a Lipschitz condition. Bulletin of the American Mathematical Society, 49 (2).

[3] F.A. Valentine (1945). A Lipschitz condition preserving extension for a vector function. American Journal of Mathematics, 67(1).

Discussions and appropriate pointers for the interpolation problem can be found in: [4] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.

Parameters
  • L (float) – Lipschitz continuity parameter.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf .

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

Note

Lipschitz continuous operators are necessarily continuous, hence reuse_gradient is set to True.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (Lipschitz operator), see [1, 2, 3] or e.g., [4, Fact 2].

Strongly monotone and Lipschitz continuous
class PEPit.operators.LipschitzStronglyMonotoneOperator(mu, L=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]

Bases: Function

The LipschitzStronglyMonotoneOperator class overwrites the add_class_constraints method of Function, implementing some constraints (which are not necessary and sufficient for interpolation) for the class of Lipschitz continuous strongly monotone (and maximally monotone) operators.

Note

Operator values can be requested through gradient and function values should not be used.

Warning

Lipschitz strongly monotone operators do not enjoy known interpolation conditions. The conditions implemented in this class are necessary but a priori not sufficient for interpolation. Hence the numerical results obtained when using this class might be non-tight upper bounds (see Discussions in [1, Section 2]).

Attributes
  • mu (float) – strong monotonicity parameter

  • L (float) – Lipschitz parameter

Lipschitz continuous strongly monotone operators are characterized by parameters \(\mu\) and L, hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.operators import LipschitzStronglyMonotoneOperator
>>> problem = PEP()
>>> h = problem.declare_function(function_class=LipschitzStronglyMonotoneOperator, mu=.1, L=1.)

References

[1] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.

Parameters
  • mu (float) – The strong monotonicity parameter.

  • L (float) – The Lipschitz continuity parameter.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf .

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

Note

Lipschitz continuous strongly monotone operators are necessarily continuous, hence reuse_gradient is set to True.

add_class_constraints()[source]

Formulates the list of necessary conditions for interpolation of self (Lipschitz strongly monotone and maximally monotone operator), see, e.g., discussions in [1, Section 2].

Cocoercive
class PEPit.operators.CocoerciveOperator(beta=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]

Bases: Function

The CocoerciveOperator class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of cocoercive (and maximally monotone) operators.

Note

Operator values can be requested through gradient and function values should not be used.

Attributes

beta (float) – cocoercivity parameter

Cocoercive operators are characterized by the parameter \(\beta\), hence can be instantiated as

Example

>>> from PEPit import PEP
>>> from PEPit.operators import CocoerciveOperator
>>> problem = PEP()
>>> func = problem.declare_function(function_class=CocoerciveOperator, beta=1.)

References

[1] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.

Parameters
  • beta (float) – The cocoercivity parameter.

  • is_leaf (bool) – True if self is defined from scratch. False if self is defined as linear combination of leaf .

  • decomposition_dict (dict) – Decomposition of self as linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.

  • reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.

Note

Cocoercive operators are necessarily continuous, hence reuse_gradient is set to True.

add_class_constraints()[source]

Formulates the list of interpolation constraints for self (cocoercive maximally monotone operator), see, e.g., [1, Proposition 2].

Primitive steps

Inexact gradient step

PEPit.primitive_steps.inexact_gradient_step(x0, f, gamma, epsilon, notion='absolute')[source]

This routine performs a step \(x \leftarrow x_0 - \gamma d_{x_0}\), where \(d_{x_0}\) is close to the gradient of \(f\) at \(x_0\) in the following sense:

\[\|d_{x_0} - \nabla f(x_0)\|^2 \leqslant \begin{cases} \varepsilon^2 & \text{if notion is set to 'absolute'}, \\ \varepsilon^2 \|\nabla f(x_0)\|^2 & \text{if notion is set to 'relative'}. \end{cases}\]

This relative approximation is used in at least 3 PEPit examples, in particular in 2 unconstrained convex minimization examples: an inexact gradient descent and an inexact accelerated gradient method.

Parameters
  • x0 (Point) – starting point x0.

  • f (Function) – a function.

  • gamma (float) – the step size parameter.

  • epsilon (float) – the required accuracy.

  • notion (string) – defines the mode (absolute or relative inaccuracy).

Returns
  • x (Point) – the output point.

  • dx0 (Point) – the approximate (sub)gradient of f at x0.

  • fx0 (Expression) – the value of the function f at x0.

Raises

ValueError – if notion is not set in [‘absolute’, ‘relative’].

Note

When \(\gamma\) is set to 0, this routine returns \(x_0\), \(d_{x_0}\), and \(f_{x_0}\). It is used as such in the unconstrained convex minimization example called “inexact gradient exact line search”, only to access the direction \(d_{x_0}\) close to the gradient \(g_{x_0}\).
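
Example

The following is a minimal illustrative sketch (not taken from the PEPit example files); the function class and parameter values are placeholder choices.

>>> from PEPit import PEP
>>> from PEPit.functions import SmoothConvexFunction
>>> from PEPit.primitive_steps import inexact_gradient_step
>>> problem = PEP()
>>> func = problem.declare_function(function_class=SmoothConvexFunction, L=1.)
>>> x0 = problem.set_initial_point()
>>> # One inexact gradient step with relative inaccuracy epsilon = 0.1
>>> x, dx0, fx0 = inexact_gradient_step(x0, func, gamma=1., epsilon=.1, notion='relative')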

Exact line-search step

PEPit.primitive_steps.exact_linesearch_step(x0, f, directions)[source]

This routine outputs some \(x\) by mimicking an exact line/span search in the specified directions. It is used for instance in PEPit.examples.unconstrained_convex_minimization.wc_gradient_exact_line_search and in PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient.

The routine aims at mimicking the operation:

\begin{eqnarray} x & = & x_0 - \sum_{i=1}^{T} \gamma_i d_i,\\ \text{with } \overrightarrow{\gamma} & = & \arg\min_{\overrightarrow{\gamma}} f\left(x_0 - \sum_{i=1}^{T} \gamma_i d_i\right), \end{eqnarray}

where \(T\) denotes the number of directions \(d_i\). This operation can equivalently be described in terms of the following conditions:

\begin{eqnarray} x - x_0 & \in & \text{span}\left\{d_1,\ldots,d_T\right\}, \\ \nabla f(x) & \perp & \text{span}\left\{d_1,\ldots,d_T\right\}. \end{eqnarray}

In this routine, we instead constrain \(x\) and \(\nabla f(x)\) to satisfy

\begin{eqnarray} \forall i=1,\ldots,T: & \left< \nabla f(x);\, d_i \right> & = & 0,\\ \text{and } & \left< \nabla f(x);\, x - x_0 \right> & = & 0, \end{eqnarray}

which is a relaxation of the true line/span search conditions.

Note

The last condition is automatically implied by the two previous ones.

Warning

One can notice that this routine does not completely encode the fact that \(x - x_0\) must be a linear combination of the provided directions (i.e., this routine performs a relaxation). Therefore, if this routine is included in a PEP, the obtained value might be an upper bound on the true worst-case value.

Although not always tight, this relaxation is often observed to deliver fairly accurate results (in particular, it automatically produces tight results under some specific conditions, see, e.g., [1]). Two such examples are provided in the conjugate gradient and gradient with exact line search example files.

References

[1] Y. Drori and A. Taylor (2020). Efficient first-order methods for convex minimization: a constructive approach. Mathematical Programming 184 (1), 183-220.

Parameters
  • x0 (Point) – the starting point.

  • f (Function) – the function on which the (sub)gradient will be evaluated.

  • directions (List of Points) – the list of all directions required to be orthogonal to the (sub)gradient of x.

Returns
  • x (Point) – such that all vectors in directions are orthogonal to the (sub)gradient of f at x.

  • gx (Point) – a (sub)gradient of f at x.

  • fx (Expression) – the function f evaluated at x.
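
Example

A minimal illustrative sketch (not taken from the PEPit example files), using the gradient at the starting point as the only search direction; the function class and parameter values are placeholder choices.

>>> from PEPit import PEP
>>> from PEPit.functions import SmoothConvexFunction
>>> from PEPit.primitive_steps import exact_linesearch_step
>>> problem = PEP()
>>> func = problem.declare_function(function_class=SmoothConvexFunction, L=1.)
>>> x0 = problem.set_initial_point()
>>> g0 = func.gradient(x0)
>>> # x is constrained to satisfy the (relaxed) exact line-search conditions along g0
>>> x, gx, fx = exact_linesearch_step(x0, func, [g0])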

Proximal step

PEPit.primitive_steps.proximal_step(x0, f, gamma)[source]

This routine performs a proximal step of step-size gamma on function f, starting from x0. That is, it performs:

\begin{eqnarray} x \triangleq \text{prox}_{\gamma f}(x_0) & \triangleq & \arg\min_x \left\{ \gamma f(x) + \frac{1}{2} \|x - x_0\|^2 \right\}, \\ & \Updownarrow & \\ 0 & = & \gamma g_x + x - x_0 \text{ for some } g_x\in\partial f(x),\\ & \Updownarrow & \\ x & = & x_0 - \gamma g_x \text{ for some } g_x\in\partial f(x). \end{eqnarray}
Parameters
  • x0 (Point) – starting point x0.

  • f (Function) – function on which the proximal step is computed.

  • gamma (float) – step-size of the proximal step.

Returns
  • x (Point) – proximal point.

  • gx (Point) – the (sub)gradient of f at x.

  • fx (Expression) – the function value of f on x.
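
Example

A minimal illustrative sketch (not taken from the PEPit example files), assuming the ConvexFunction class is available in PEPit.functions.

>>> from PEPit import PEP
>>> from PEPit.functions import ConvexFunction
>>> from PEPit.primitive_steps import proximal_step
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexFunction)
>>> x0 = problem.set_initial_point()
>>> # x = prox_{gamma f}(x0), together with the associated subgradient and function value
>>> x, gx, fx = proximal_step(x0, func, gamma=1.)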

Inexact proximal step

PEPit.primitive_steps.inexact_proximal_step(x0, f, gamma, opt='PD_gapII')[source]

This routine encodes an inexact proximal operation with step size \(\gamma\). That is, it outputs a tuple \((x, g\in \partial f(x), f(x), w, v\in\partial f(w), f(w), \varepsilon)\) which are described as follows.

First, \(x\) is an approximation to the proximal point of \(x_0\) on function \(f\):

\[x \approx \mathrm{prox}_{\gamma f}(x_0)\triangleq\arg\min_x \left\{ \gamma f(x) + \frac{1}{2}\|x-x_0\|^2\right\},\]

where the meaning of \(\approx\) depends on the option “opt” and is explained below. The notions of inaccuracy implemented within this routine are specified using primal and dual proximal problems, denoted by

\begin{eqnarray} &\Phi^{(p)}_{\gamma f}(x; x_0) \triangleq \gamma f(x) + \frac{1}{2}\|x-x_0\|^2,\\ &\Phi^{(d)}_{\gamma f}(v; x_0) \triangleq -\gamma f^*(v)-\frac{1}{2}\|x_0-\gamma v\|^2 + \frac{1}{2}\|x_0\|^2,\\ \end{eqnarray}

where \(\Phi^{(p)}_{\gamma f}(x;x_0)\) and \(\Phi^{(d)}_{\gamma f}(v;x_0)\) respectively denote the primal and the dual proximal problems, and where \(f^*\) is the Fenchel conjugate of \(f\). The options below encode different meanings of “\(\approx\)” by specifying accuracy requirements on primal-dual pairs:

\[(x,v) \approx_{\varepsilon} \left(\mathrm{prox}_{\gamma f}(x_0),\,\mathrm{prox}_{f^*/\gamma}(x_0/\gamma)\right),\]

where \(\approx_{\varepsilon}\) corresponds to requiring the primal-dual pair \((x,v)\) to satisfy some primal-dual accuracy requirement:

\[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(v;x_0) \leqslant \varepsilon,\]

where \(\varepsilon\geqslant 0\) is the error magnitude, which is returned to the user so that one can further constrain it to be bounded by some other value.

Relation to the exact proximal operation: In the exact case (no error in the computation, \(\varepsilon=0\)), \(v\) corresponds to the solution of the dual proximal problem and one can write

\[x = x_0-\gamma g,\]

with \(g=v=\mathrm{prox}_{f^*/\gamma}(x_0/\gamma)\in\partial f(x)\), and \(x=w\).

Reformulation of the primal-dual gap: Compared with the exact proximal computation, the inexact case under consideration here can be described as performing

\[x = x_0-\gamma v + e,\]

where \(v\) is an \(\epsilon\)-subgradient of \(f\) at \(x\) (notation \(v\in\partial_{\epsilon} f(x)\)) and \(e\) is some additional computation error. Those elements allow for a common convenient reformulation of the primal-dual gap, written in terms of the magnitudes of \(\epsilon\) and of \(e\):

\[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(v;x_0) = \frac{1}{2} \|e\|^2 + \gamma \epsilon.\]

Options: The following options are available (a list of such choices is presented in [4]; we provide a reference for each of those choices below).

  • ‘PD_gapI’: the constraint imposed on the output is the vanilla primal-dual gap requirement (see, e.g., [2])

    \[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(v;x_0) \leqslant \varepsilon.\]

This approximation requirement is used in one PEPit example: an accelerated inexact forward-backward method.

  • ‘PD_gapII’: the constraint is stronger than the vanilla primal-dual gap, as more structure is imposed (see, e.g., [1,5]):

    \[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(g;x_0) \leqslant \varepsilon,\]

where we imposed that \(v\triangleq g\in\partial f(x)\) and \(w\triangleq x\). This approximation requirement is used in two PEPit examples: in a relatively inexact proximal point algorithm and in a partially inexact Douglas-Rachford splitting.

  • ‘PD_gapIII’: the constraint is stronger than the vanilla primal-dual gap, as more structure is imposed (see, e.g., [3]):

    \[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(\tfrac{x_0 - x}{\gamma};x_0) \leqslant \varepsilon,\]

where we imposed that \(v \triangleq \frac{x_0 - x}{\gamma}\).

References

[1] R.T. Rockafellar (1976). Monotone operators and the proximal point algorithm. SIAM journal on control and optimization, 14(5), 877-898.

[2] R.D. Monteiro, B.F. Svaiter (2013). An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM Journal on Optimization, 23(2), 1092-1125.

[3] S. Salzo, S. Villa (2012). Inexact and accelerated proximal point algorithms. Journal of Convex analysis, 19(4), 1167-1192.

[4] M. Barre, A. Taylor, F. Bach (2020). Principled analyses and design of first-order methods with inexact proximal operators.

[5] A. d’Aspremont, D. Scieur, A. Taylor (2021). Acceleration Methods. Foundations and Trends in Optimization: Vol. 5, No. 1-2.

Parameters
  • x0 (Point) – point for which we aim to compute an approximate proximal step.

  • f (Function) – function whose proximal operator is approximated.

  • gamma (float) – step size of the proximal step.

  • opt (string) – option (type of error requirement) among ‘PD_gapI’, ‘PD_gapII’, and ‘PD_gapIII’.

Returns
  • x (Point) – the approximated proximal point.

  • gx (Point) – a (sub)gradient of f at x (subgradient used in evaluating the accuracy criterion).

  • fx (Expression) – f evaluated at x.

  • w (Point) – a point w such that v (see next output) is a subgradient of f at w.

  • v (Point) – the approximated proximal point of the dual problem, (sub)gradient of f evaluated at w.

  • fw (Expression) – f evaluated at w.

  • eps_var (Expression) – value of the primal-dual gap (which can be further bounded by the user).
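
Example

A minimal illustrative sketch (not taken from the PEPit example files), assuming the ConvexFunction class is available in PEPit.functions; the option and step-size values are placeholder choices.

>>> from PEPit import PEP
>>> from PEPit.functions import ConvexFunction
>>> from PEPit.primitive_steps import inexact_proximal_step
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexFunction)
>>> x0 = problem.set_initial_point()
>>> # Inexact proximal step; eps_var is the primal-dual gap, which can then be further constrained
>>> x, gx, fx, w, v, fw, eps_var = inexact_proximal_step(x0, func, gamma=1., opt='PD_gapII')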

Bregman gradient step

PEPit.primitive_steps.bregman_gradient_step(gx0, sx0, mirror_map, gamma)[source]

This routine outputs \(x\) by performing a mirror step of step-size \(\gamma\). That is, denoting \(f\) the function to be minimized and \(h\) the mirror map, it performs

\[x = \arg\min_x \left[ f(x_0) + \left< \nabla f(x_0);\, x - x_0 \right> + \frac{1}{\gamma} D_h(x; x_0) \right],\]

where \(D_h(x; x_0)\) denotes the Bregman divergence of \(h\) on \(x\) with respect to \(x_0\).

\[D_h(x; x_0) \triangleq h(x) - h(x_0) - \left< \nabla h(x_0);\, x - x_0 \right>.\]

Warning

The mirror map \(h\) is assumed differentiable.

By differentiating the previous objective function, one can observe that

\[\nabla h(x) = \nabla h(x_0) - \gamma \nabla f(x_0).\]
Parameters
  • sx0 (Point) – starting gradient \(\textbf{sx0} \triangleq \nabla h(x_0)\).

  • gx0 (Point) – descent direction \(\textbf{gx0} \triangleq \nabla f(x_0)\).

  • mirror_map (Function) – the reference function \(h\) defining the Bregman divergence.

  • gamma (float) – step size.

Returns
  • x (Point) – new iterate \(\textbf{x} \triangleq x\).

  • sx (Point) – \(h\)’s gradient on new iterate \(x\) \(\textbf{sx} \triangleq \nabla h(x)\).

  • hx (Expression) – \(h\)’s value on new iterate \(\textbf{hx} \triangleq h(x)\).
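
Example

A minimal illustrative sketch (not taken from the PEPit example files); the function classes below are placeholder choices (recall that the mirror map must be differentiable).

>>> from PEPit import PEP
>>> from PEPit.functions import ConvexFunction, SmoothConvexFunction
>>> from PEPit.primitive_steps import bregman_gradient_step
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexFunction)            # function to be minimized
>>> h = problem.declare_function(function_class=SmoothConvexFunction, L=1.)  # mirror map
>>> x0 = problem.set_initial_point()
>>> gx0 = func.gradient(x0)  # descent direction: (sub)gradient of f at x0
>>> sx0 = h.gradient(x0)     # gradient of the mirror map at x0
>>> x, sx, hx = bregman_gradient_step(gx0, sx0, mirror_map=h, gamma=1.)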

Bregman proximal step

PEPit.primitive_steps.bregman_proximal_step(sx0, mirror_map, min_function, gamma)[source]

This routine outputs \(x\) by performing a proximal mirror step of step-size \(\gamma\). That is, denoting \(f\) the function to be minimized and \(h\) the mirror map, it performs

\[x = \arg\min_x \left[ f(x) + \frac{1}{\gamma} D_h(x; x_0) \right],\]

where \(D_h(x; x_0)\) denotes the Bregman divergence of \(h\) on \(x\) with respect to \(x_0\).

\[D_h(x; x_0) \triangleq h(x) - h(x_0) - \left< \nabla h(x_0);\, x - x_0 \right>.\]

Warning

The mirror map \(h\) is assumed differentiable.

By differentiating the previous objective function, one can observe that

\[\nabla h(x) = \nabla h(x_0) - \gamma \nabla f(x).\]
Parameters
  • sx0 (Point) – starting gradient \(\textbf{sx0} \triangleq \nabla h(x_0)\).

  • mirror_map (Function) – the reference function \(h\) defining the Bregman divergence.

  • min_function (Function) – function we aim to minimize.

  • gamma (float) – step size.

Returns
  • x (Point) – new iterate \(\textbf{x} \triangleq x\).

  • sx (Point) – \(h\)’s gradient on new iterate \(x\) \(\textbf{sx} \triangleq \nabla h(x)\).

  • hx (Expression) – \(h\)’s value on new iterate \(\textbf{hx} \triangleq h(x)\).

  • gx (Point) – \(f\)’s gradient on new iterate \(x\) \(\textbf{gx} \triangleq \nabla f(x)\).

  • fx (Expression) – \(f\)’s value on new iterate \(\textbf{fx} \triangleq f(x)\).
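
Example

A minimal illustrative sketch (not taken from the PEPit example files); the function classes below are placeholder choices (recall that the mirror map must be differentiable).

>>> from PEPit import PEP
>>> from PEPit.functions import ConvexFunction, SmoothConvexFunction
>>> from PEPit.primitive_steps import bregman_proximal_step
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexFunction)            # function to be minimized
>>> h = problem.declare_function(function_class=SmoothConvexFunction, L=1.)  # mirror map
>>> x0 = problem.set_initial_point()
>>> sx0 = h.gradient(x0)  # gradient of the mirror map at x0
>>> x, sx, hx, gx, fx = bregman_proximal_step(sx0, mirror_map=h, min_function=func, gamma=1.)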

Linear optimization step

PEPit.primitive_steps.linear_optimization_step(dir, ind)[source]

This routine outputs the result of a minimization problem with linear objective (whose direction is provided by dir) on the domain of the (closed convex) indicator function ind. That is, it outputs a solution to

\[\arg\min_{\text{ind}(x)=0} \left< \text{dir};\, x \right>.\]

One can notice that \(x\) is a solution of this problem if and only if

\[- \text{dir} \in \partial \text{ind}(x).\]
Parameters
  • dir (Point) – the direction of the linear objective to be minimized.

  • ind (Function) – the (closed convex) indicator function over whose domain the minimization is performed.

Returns
  • x (Point) – the optimal point.

  • gx (Point) – the (sub)gradient of ind on x.

  • fx (Expression) – the function value of ind on x.
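
Example

A minimal illustrative sketch (not taken from the PEPit example files), assuming an indicator-type class such as ConvexIndicatorFunction (with a diameter parameter D) is available in PEPit.functions; the parameter values are placeholder choices.

>>> from PEPit import PEP
>>> from PEPit.functions import SmoothConvexFunction, ConvexIndicatorFunction
>>> from PEPit.primitive_steps import linear_optimization_step
>>> problem = PEP()
>>> func = problem.declare_function(function_class=SmoothConvexFunction, L=1.)
>>> ind = problem.declare_function(function_class=ConvexIndicatorFunction, D=1.)
>>> x0 = problem.set_initial_point()
>>> # Minimize the linear objective <grad f(x0); .> over the domain of the indicator function
>>> x, gx, fx = linear_optimization_step(func.gradient(x0), ind)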

Epsilon-subgradient step

PEPit.primitive_steps.epsilon_subgradient_step(x0, f, gamma)[source]

This routine performs a step \(x \leftarrow x_0 - \gamma g_0\) where \(g_0 \in\partial_{\varepsilon} f(x_0)\). That is, \(g_0\) is an \(\varepsilon\)-subgradient of \(f\) at \(x_0\). The set \(\partial_{\varepsilon} f(x_0)\) (referred to as the \(\varepsilon\)-subdifferential) is defined as (see [1, Section 3])

\[\partial_{\varepsilon} f(x)=\left\{g:\, f(z)\geqslant f(x)+\left< g;\, z-x \right>-\varepsilon \right\}.\]

An alternative characterization of \(g_0 \in\partial_{\varepsilon} f(x_0)\) consists in writing

\[f(x_0)+f^*(g_0)-\left< g_0;x_0\right>\leqslant \varepsilon.\]

References

[1] A. Brøndsted, R.T. Rockafellar. On the subdifferentiability of convex functions. Proceedings of the American Mathematical Society 16(4), 605–611 (1965)

Parameters
  • x0 (Point) – starting point x0.

  • f (Function) – a function.

  • gamma (float) – the step size parameter.

Returns
  • x (Point) – the output point.

  • g0 (Point) – an \(\varepsilon\)-subgradient of f at x0.

  • f0 (Expression) – the value of the function f at x0.

  • epsilon (Expression) – the value of epsilon.
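
Example

A minimal illustrative sketch (not taken from the PEPit example files), assuming the ConvexFunction class is available in PEPit.functions.

>>> from PEPit import PEP
>>> from PEPit.functions import ConvexFunction
>>> from PEPit.primitive_steps import epsilon_subgradient_step
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexFunction)
>>> x0 = problem.set_initial_point()
>>> # One step along an epsilon-subgradient; epsilon can then be further constrained by the user
>>> x, g0, f0, epsilon = epsilon_subgradient_step(x0, func, gamma=1.)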

Tools

Merge two dictionaries

PEPit.tools.merge_dict(dict1, dict2)[source]

Merge the keys of dict1 and dict2. If a key appears in both dictionaries, the associated values are added.

Parameters
  • dict1 (dict) – any dictionary

  • dict2 (dict) – any dictionary

Returns

merged_dict (dict) – the union of the 2 inputs with added values.
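
Example

An illustrative sketch of the merge rule (assuming merge_dict is importable from PEPit.tools as documented above); the input values are arbitrary.

>>> from PEPit.tools import merge_dict
>>> merged = merge_dict({'a': 1, 'b': 2}, {'b': 3, 'c': 4})  # expected: {'a': 1, 'b': 5, 'c': 4}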

Multiply two dictionaries

PEPit.tools.multiply_dicts(dict1, dict2)[source]

Multiply two dictionaries in the sense of expanding a product of two sums.

Parameters
  • dict1 (dict) – any dictionary

  • dict2 (dict) – any dictionary

Returns

product_dict (dict) – the keys are pairs of keys from dict1 and dict2, and the values are the products of the corresponding values of dict1 and dict2.
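
Example

An illustrative sketch of the product rule (assuming multiply_dicts is importable from PEPit.tools, and that key pairs are represented as tuples); the input values are arbitrary.

>>> from PEPit.tools import multiply_dicts
>>> product = multiply_dicts({'a': 2}, {'b': 3})  # expected: {('a', 'b'): 6}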

Prune a dictionary

PEPit.tools.prune_dict(my_dict)[source]

Remove all keys associated with a zero value.

Parameters

my_dict (dict) – any dictionary

Returns

pruned_dict (dict) – pruned dictionary
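
Example

An illustrative sketch (assuming prune_dict is importable from PEPit.tools as documented above); the input values are arbitrary.

>>> from PEPit.tools import prune_dict
>>> pruned = prune_dict({'a': 1., 'b': 0.})  # expected: {'a': 1.}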

Examples

Unconstrained convex minimization

Gradient descent

PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent(L, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: Gradient descent is described by

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Theorem 3.1]:

\[f(x_n)-f_\star \leqslant \frac{L}{4nL\gamma+2} \|x_0-x_\star\|^2,\]

which is tight on some Huber loss functions.

References:

[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming 145(1–2), 451–482.

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 3
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.16666664596175398
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
        PEPit guarantee:         f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2

Subgradient method

PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method(M, n, gamma, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is convex and \(M\)-Lipschitz. This problem is a (possibly non-smooth) minimization problem.

This code computes a worst-case guarantee for the subgradient method. That is, it computes the smallest possible \(\tau(n, M, \gamma)\) such that the guarantee

\[\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star \leqslant \tau(n, M, \gamma)\]

is valid, where \(x_t\) are the iterates of the subgradient method after \(t\leqslant n\) steps, where \(x_\star\) is a minimizer of \(f\), and when \(\|x_0-x_\star\|\leqslant 1\).

In short, for given values of \(M\), the step-size \(\gamma\) and the number of iterations \(n\), \(\tau(n, M, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star\) when \(\|x_0-x_\star\| \leqslant 1\).

Algorithm: For \(t\in \{0, \dots, n-1 \}\)

\begin{eqnarray} g_{t} & \in & \partial f(x_t) \\ x_{t+1} & = & x_t - \gamma g_t \end{eqnarray}

Theoretical guarantee: The tight bound is obtained in [1, Section 3.2.3] and [2, Eq (2)]

\[\min_{0 \leqslant t \leqslant n} f(x_t)- f(x_\star) \leqslant \frac{M}{\sqrt{n+1}}\|x_0-x_\star\|,\]

and tightness follows from the lower complexity bound for this class of problems, e.g., [3, Appendix A].

References: Classical references on this topic include [1, 2].

[1] Y. Nesterov (2003). Introductory lectures on convex optimization: A basic course. Springer Science & Business Media.

[2] S. Boyd, L. Xiao, A. Mutapcic (2003). Subgradient Methods (lecture notes).

[3] Y. Drori, M. Teboulle (2016). An optimal variant of Kelley’s cutting-plane method. Mathematical Programming, 160(1), 321-351.

Parameters
  • M (float) – the Lipschitz parameter.

  • n (int) – the number of iterations.

  • gamma (float) – step-size.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> from math import sqrt
>>> M = 2
>>> n = 6
>>> gamma = 1 / (M * sqrt(n + 1))
>>> pepit_tau, theoretical_tau = wc_subgradient_method(M=M, n=n, gamma=gamma, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 7 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 64 scalar constraint(s) ...
                 function 1 : 64 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.7559825331741553
*** Example file: worst-case performance of subgradient method ***
        PEPit guarantee:         min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755983 ||x_0 - x_*||
        Theoretical guarantee:   min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755929 ||x_0 - x_*||

Subgradient method under restricted secant inequality and error bound

PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method_rsi_eb(mu, L, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) verifies the “lower” restricted secant inequality (\(\mu-\text{RSI}^-\)) and the “upper” error bound (\(L-\text{EB}^+\)) [1].

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, \mu, L, \gamma)\) such that the guarantee

\[\| x_n - x_\star \|^2 \leqslant \tau(n, \mu, L, \gamma) \| x_0 - x_\star \|^2\]

is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, \mu, L, \gamma)\) is computed as the worst-case value of \(\| x_n - x_\star \|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: Sub-gradient descent is described by

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: The tight theoretical guarantee can be found in [1, Prop 1] (upper bound) and [1, Theorem 2] (lower bound):

\[\| x_n - x_\star \|^2 \leqslant (1 - 2\gamma\mu + L^2 \gamma^2)^n \|x_0-x_\star\|^2.\]

References:

Definition and convergence guarantees can be found in [1].

[1] C. Guille-Escuret, B. Goujaud, A. Ibrahim, I. Mitliagkas (2022). Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound.

Parameters
  • mu (float) – the rsi parameter

  • L (float) – the eb parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> mu = .1
>>> L = 1
>>> pepit_tau, theoretical_tau = wc_subgradient_method_rsi_eb(mu=mu, L=L, gamma=mu / L ** 2, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 8 scalar constraint(s) ...
                 function 1 : 8 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9605893213566064
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
        PEPit guarantee:         f(x_n)-f_* <= 0.960589 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.960596 ||x_0 - x_*||^2

Conjugate gradient

PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient(L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]

is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).

Algorithm:

\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]

with

\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]

Theoretical guarantee:

The tight guarantee obtained in [1] is

\[f(x_n) - f_\star \leqslant \frac{L}{2 \theta_n^2}\|x_0-x_\star\|^2,\]

where

\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}, \end{eqnarray}

and tightness follows from [2, Theorem 3].

References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6].

[1] Y. Drori and A. Taylor (2020). Efficient first-order methods for convex minimization: a constructive approach. Mathematical Programming 184 (1), 183-220.

[2] Y. Drori (2017). The exact information-based complexity of smooth convex minimization. Journal of Complexity, 39, 1-16.

Parameters
  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_conjugate_gradient(L=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 18 scalar constraint(s) ...
                 function 1 : 18 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.061893515427809735
*** Example file: worst-case performance of conjugate gradient method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.0618935 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.0618942 ||x_0 - x_*||^2

Heavy Ball momentum

PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum(mu, L, alpha, beta, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L, \mu, \alpha, \beta)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \alpha, \beta) (f(x_0) - f_\star)\]

is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu, \alpha, \beta)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).

Algorithm:

\[x_{t+1} = x_t - \alpha \nabla f(x_t) + \beta (x_t-x_{t-1})\]

with

\[\alpha \in (0, \frac{1}{L}]\]

and

\[\beta = \sqrt{(1 - \alpha \mu)(1 - L \alpha)}\]

Theoretical guarantee:

The upper guarantee obtained in [2, Theorem 4] is

\[f(x_n) - f_\star \leqslant (1 - \alpha \mu)^n (f(x_0) - f_\star).\]

References: This method was first introduced in [1, Section 2], and the convergence upper bound was proven in [2, Theorem 4].

[1] B.T. Polyak (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics.

[2] E. Ghadimi, H. R. Feyzmahdavian, M. Johansson (2015). Global convergence of the Heavy-ball method for convex optimization. European Control Conference (ECC).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • alpha (float) – parameter of the scheme.

  • beta (float) – parameter of the scheme such that \(0<\beta<1\) and \(0<\alpha<2(1+\beta)\).

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> from math import sqrt
>>> mu = 0.1
>>> L = 1.
>>> alpha = 1 / (2 * L)  # alpha \in [0, 1 / L]
>>> beta = sqrt((1 - alpha * mu) * (1 - L * alpha))
>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum(mu=mu, L=L, alpha=alpha, beta=beta, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 12 scalar constraint(s) ...
                 function 1 : 12 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.753492450790045
*** Example file: worst-case performance of the Heavy-Ball method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.753492 (f(x_0) - f(x_*))
        Theoretical guarantee:   f(x_n)-f_* <= 0.9025 (f(x_0) - f(x_*))

Accelerated gradient for convex objective

PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_convex(mu, L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).

This code computes a worst-case guarantee for an accelerated gradient method, a.k.a. fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: The accelerated gradient method of this example is provided by

\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}

Theoretical guarantee: When \(\mu=0\), a tight empirical guarantee can be found in [1, Table 1]:

\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]

where tightness is obtained on some Huber loss functions.

References:

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • mu (float) – the strong convexity parameter

  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_convex(mu=0, L=1, n=1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 6 scalar constraint(s) ...
                 function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.16666666668209376
*** Example file: worst-case performance of accelerated gradient method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2

Accelerated gradient for strongly convex objective

PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_strongly_convex(mu, L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for an accelerated gradient method, a.k.a fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right),\]

is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} y_t & = & x_t + \frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}(x_t - x_{t-1}) \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \end{eqnarray}

with \(x_{-1}:= x_0\).

Theoretical guarantee:

The following upper guarantee can be found in [1, Corollary 4.15]:

\[f(x_n)-f_\star \leqslant \left(1 - \sqrt{\frac{\mu}{L}}\right)^n \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right).\]

References:

[1] A. d’Aspremont, D. Scieur, A. Taylor (2021). Acceleration Methods. Foundations and Trends in Optimization: Vol. 5, No. 1-2.

Parameters
  • mu (float) – the strong convexity parameter

  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_strongly_convex(mu=0.1, L=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 12 scalar constraint(s) ...
                 function 1 : 12 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.34758587217463155
*** Example file: worst-case performance of the accelerated gradient method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.347586 (f(x_0) - f(x_*) + mu/2*||x_0 - x_*||**2)
        Theoretical guarantee:   f(x_n)-f_* <= 0.467544 (f(x_0) - f(x_*) + mu/2*||x_0 - x_*||**2)

Optimized gradient

PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient(L, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for optimized gradient method (OGM). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of OGM and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: The optimized gradient method is described by

\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t)\\ y_{t+1} & = & x_{t+1} + \frac{\theta_{t}-1}{\theta_{t+1}}(x_{t+1}-x_t)+\frac{\theta_{t}}{\theta_{t+1}}(x_{t+1}-y_t), \end{eqnarray}

with

\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}. \end{eqnarray}

Theoretical guarantee: The tight theoretical guarantee can be found in [2, Theorem 2]:

\[f(x_n)-f_\star \leqslant \frac{L\|x_0-x_\star\|^2}{2\theta_n^2},\]

where tightness follows from [3, Theorem 3].

References: The optimized gradient method was developed in [1, 2]; the corresponding lower bound was first obtained in [3].

[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming 145(1–2), 451–482.

[2] D. Kim, J. Fessler (2016). Optimized first-order methods for smooth convex minimization. Mathematical Programming 159.1-2: 81-107.

[3] Y. Drori (2017). The exact information-based complexity of smooth convex minimization. Journal of Complexity, 39, 1-16.

Parameters
  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_optimized_gradient(L=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.07675218017587908
*** Example file: worst-case performance of optimized gradient method ***
        PEPit guarantee:         f(y_n)-f_* <= 0.0767522 ||x_0 - x_*||^2
        Theoretical guarantee:   f(y_n)-f_* <= 0.0767518 ||x_0 - x_*||^2

Optimized gradient for gradient

PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient_for_gradient(L, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for optimized gradient method for gradient (OGM-G). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[\|\nabla f(x_n)\|^2 \leqslant \tau(n, L) (f(x_0) - f_\star)\]

is valid, where \(x_n\) is the output of OGM-G and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(\|\nabla f(x_n)\|^2\) when \(f(x_0)-f_\star \leqslant 1\).

Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the optimized gradient method for gradient [1, Section 6.3] is described by

\begin{eqnarray} y_{t+1} & = & x_t - \frac{1}{L} \nabla f(x_t),\\ x_{t+1} & = & y_{t+1} + \frac{(\tilde{\theta}_t-1)(2\tilde{\theta}_{t+1}-1)}{\tilde{\theta}_t(2\tilde{\theta}_t-1)}(y_{t+1}-y_t)+\frac{2\tilde{\theta}_{t+1}-1}{2\tilde{\theta}_t-1}(y_{t+1}-x_t), \end{eqnarray}

with

\begin{eqnarray} \tilde{\theta}_n & = & 1 \\ \tilde{\theta}_t & = & \frac{1 + \sqrt{4 \tilde{\theta}_{t+1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \tilde{\theta}_0 & = & \frac{1 + \sqrt{8 \tilde{\theta}_{1}^2 + 1}}{2}. \end{eqnarray}

Theoretical guarantee: The tight worst-case guarantee can be found in [1, Theorem 6.1]:

\[\|\nabla f(x_n)\|^2 \leqslant \frac{2L(f(x_0)-f_\star)}{\tilde{\theta}_0^2},\]

where tightness is achieved on Huber losses, see [1, Section 6.4].

References: The optimized gradient method for gradient was developed in [1].

[1] D. Kim, J. Fessler (2021). Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. Journal of optimization theory and applications, 188(1), 192-219.

Parameters
  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_optimized_gradient_for_gradient(L=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.30700758289614183
*** Example file: worst-case performance of optimized gradient method for gradient ***
        PEP-it guarantee:        ||f'(x_n)||^2 <= 0.307008 (f(x_0) - f_*)
        Theoretical guarantee:   ||f'(x_n)||^2 <= 0.307007 (f(x_0) - f_*)

Robust momentum

PEPit.examples.unconstrained_convex_minimization.wc_robust_momentum(mu, L, lam, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly-convex.

This code computes a worst-case guarantee for the robust momentum method (RMM). That is, it computes the smallest possible \(\tau(n, \mu, L, \lambda)\) such that the guarantee

\[v(x_{n+1}) \leqslant \tau(n, \mu, L, \lambda) v(x_{n}),\]

is valid, where \(x_n\) is the \(n^{\mathrm{th}}\) iterate of the RMM, and \(x_\star\) is a minimizer of \(f\). The function \(v(\cdot)\) is a well-chosen Lyapunov function defined as follows:

\begin{eqnarray} v(x_t) & = & l\|z_t - x_\star\|^2 + q_t, \\ q_t & = & (L - \mu) \left(f(x_t) - f_\star - \frac{\mu}{2}\|y_t - x_\star\|^2 - \frac{1}{2}\|\nabla f(y_t) - \mu (y_t - x_\star)\|^2 \right), \end{eqnarray}

with \(\kappa = \frac{\mu}{L}\), \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\), and \(l = \mu^2 \frac{\kappa - \kappa \rho^2 - 1}{2 \rho (1 - \rho)}\).

Algorithm:

For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_{t+1} & = & x_{t} + \beta (x_t - x_{t-1}) - \alpha \nabla f(y_t), \\ y_{t+1} & = & y_{t} + \gamma (x_t - x_{t-1}), \end{eqnarray}

with \(x_{-1}, x_0 \in \mathrm{R}^d\), and with parameters \(\alpha = \frac{\kappa (1 - \rho^2)(1 + \rho)}{L}\), \(\beta = \frac{\kappa \rho^3}{\kappa - 1}\), \(\gamma = \frac{\rho^2}{(\kappa - 1)(1 - \rho)^2(1 + \rho)}\).

Theoretical guarantee:

A convergence guarantee (empirically tight) is obtained in [1, Theorem 1],

\[v(x_{n+1}) \leqslant \rho^2 v(x_n),\]

with \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\).

References:

[1] S. Cyrus, B. Hu, B. Van Scoy, L. Lessard (2018). A robust accelerated optimization algorithm for strongly convex functions. American Control Conference (ACC).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • lam (float) – if \(\lambda=1\), the method reduces to gradient descent; if \(\lambda=0\), it is the Triple Momentum Method.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Examples

>>> pepit_tau, theoretical_tau = wc_robust_momentum(mu=0.1, L=1, lam=0.2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 6 scalar constraint(s) ...
                 function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5285548355275751
*** Example file: worst-case performance of the Robust Momentum Method ***
        PEPit guarantee:         v(x_(n+1)) <= 0.528555 v(x_n)
        Theoretical guarantee:   v(x_(n+1)) <= 0.528555 v(x_n)

Triple momentum

PEPit.examples.unconstrained_convex_minimization.wc_triple_momentum(mu, L, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for triple momentum method (TMM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the TMM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm:

For \(t \in \{ 1, \dots, n\}\)

\begin{eqnarray} \xi_{t+1} & = & (1 + \beta) \xi_{t} - \beta \xi_{t-1} - \alpha \nabla f(y_t) \\ y_{t} & = & (1+\gamma) \xi_{t} - \gamma \xi_{t-1} \\ x_{t} & = & (1 + \delta) \xi_{t} - \delta \xi_{t-1} \end{eqnarray}

with

\begin{eqnarray} \kappa & = & \frac{L}{\mu}, \quad \rho = 1 - \frac{1}{\sqrt{\kappa}}, \\ (\alpha, \beta, \gamma, \delta) & = & \left(\frac{1+\rho}{L}, \frac{\rho^2}{2-\rho}, \frac{\rho^2}{(1+\rho)(2-\rho)}, \frac{\rho^2}{1-\rho^2}\right) \end{eqnarray}

and

\begin{eqnarray} \xi_{0} = x_0 \\ \xi_{1} = x_0 \\ y = x_0 \end{eqnarray}

Theoretical guarantee: A theoretical upper (empirically tight) bound can be found in [1, Theorem 1, eq. 4]:

\[f(x_n)-f_\star \leqslant \frac{\rho^{2(n+1)} L \kappa}{2}\|x_0 - x_\star\|^2.\]

References: The triple momentum method was discovered and analyzed in [1].

[1] B. Van Scoy, R. A. Freeman, K. M. Lynch (2018). The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Letters, 2(1), 49-54.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_triple_momentum(mu=0.1, L=1., n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.23893532450841679
*** Example file: worst-case performance of the Triple Momentum Method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.238935 ||x_0-x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.238925 ||x_0-x_*||^2

Information theoretic exact method

PEPit.examples.unconstrained_convex_minimization.wc_information_theoretic(mu, L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).

This code computes a worst-case guarantee for the information theoretic exact method (ITEM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[\|z_n - x_\star\|^2 \leqslant \tau(n, L, \mu) \|z_0 - x_\star\|^2\]

is valid, where \(z_n\) is the output of the ITEM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\|z_n - x_\star\|^2\) when \(\|z_0 - x_\star\|^2 \leqslant 1\).

Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the information theoretic exact method of this example is provided by

\begin{eqnarray} y_{t} & = & (1-\beta_t) z_t + \beta_t x_t \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ z_{t+1} & = & \left(1-q\delta_t\right) z_t+q\delta_t y_t-\frac{\delta_t}{L}\nabla f(y_t), \end{eqnarray}

with \(y_{-1}=x_0=z_0\), \(q=\frac{\mu}{L}\) (inverse condition ratio), and the scalar sequences:

\begin{eqnarray} A_{t+1} & = & \frac{(1+q)A_t+2\left(1+\sqrt{(1+A_t)(1+qA_t)}\right)}{(1-q)^2},\\ \beta_{t+1} & = & \frac{A_t}{(1-q)A_{t+1}},\\ \delta_{t+1} & = & \frac{1}{2}\frac{(1-q)^2A_{t+1}-(1+q)A_t}{1+q+q A_t}, \end{eqnarray}

with \(A_0=0\).

Theoretical guarantee: A tight worst-case guarantee can be found in [1, Theorem 3]:

\[\|z_n - x_\star\|^2 \leqslant \frac{1}{1+q A_n} \|z_0-x_\star\|^2,\]

where tightness is obtained on some quadratic loss functions (see [1, Lemma 2]).

References:

[1] A. Taylor, Y. Drori (2021). An optimal gradient method for smooth strongly convex minimization. arXiv 2101.09741v2.

Parameters
  • mu (float) – the strong convexity parameter.

  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_information_theoretic(mu=.001, L=1, n=15, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 17x17
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 240 scalar constraint(s) ...
                 function 1 : 240 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.7566107333964406
*** Example file: worst-case performance of the information theoretic exact method ***
        PEP-it guarantee:        ||z_n - x_* ||^2 <= 0.756611 ||z_0 - x_*||^2
        Theoretical guarantee:   ||z_n - x_* ||^2 <= 0.756605 ||z_0 - x_*||^2

Proximal point

PEPit.examples.unconstrained_convex_minimization.wc_proximal_point(gamma, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is closed, proper, and convex (and potentially non-smooth).

This code computes a worst-case guarantee for the proximal point method with step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n,\gamma)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, \gamma) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the proximal point method, and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\) and \(\gamma\), \(\tau(n,\gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm:

The proximal point method is described by

\[x_{t+1} = \arg\min_x \left\{f(x)+\frac{1}{2\gamma}\|x-x_t\|^2 \right\},\]

where \(\gamma\) is a step-size.

Theoretical guarantee:

The tight theoretical guarantee can be found in [1, Theorem 4.1]:

\[f(x_n)-f_\star \leqslant \frac{\|x_0-x_\star\|^2}{4\gamma n},\]

where tightness is obtained on, e.g., one-dimensional linear problems on the positive orthant.

References:

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_proximal_point(gamma=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 20 scalar constraint(s) ...
                 function 1 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.02083327687098447
*** Example file: worst-case performance of proximal point method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2

Accelerated proximal point

PEPit.examples.unconstrained_convex_minimization.wc_accelerated_proximal_point(A0, gammas, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is convex and possibly non-smooth.

This code computes a worst-case guarantee for an accelerated proximal point method, aka fast proximal point method (FPP). That is, it computes the smallest possible \(\tau(n, A_0,\vec{\gamma})\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, A_0, \vec{\gamma}) \left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2\right)\]

is valid, where \(x_n\) is the output of FPP (with step-size \(\gamma_t\) at step \(t\in \{0, \dots, n-1\}\)) and where \(x_\star\) is a minimizer of \(f\) and \(A_0\) is a positive number.

In short, for given values of \(n\), \(A_0\) and \(\vec{\gamma}\), \(\tau(n, A_0, \vec{\gamma})\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \leqslant 1\), for the following method.

Algorithm: For \(t\in \{0, \dots, n-1\}\):

\begin{eqnarray} y_{t+1} & = & (1-\alpha_{t} ) x_{t} + \alpha_{t} v_t \\ x_{t+1} & = & \arg\min_x \left\{f(x)+\frac{1}{2\gamma_t}\|x-y_{t+1}\|^2 \right\}, \\ v_{t+1} & = & v_t + \frac{1}{\alpha_{t}} (x_{t+1}-y_{t+1}) \end{eqnarray}

with

\begin{eqnarray} \alpha_{t} & = & \frac{\sqrt{(A_t \gamma_t)^2 + 4 A_t \gamma_t} - A_t \gamma_t}{2} \\ A_{t+1} & = & (1 - \alpha_{t}) A_t \end{eqnarray}

and \(v_0=x_0\).

Theoretical guarantee: A theoretical upper bound can be found in [1, Theorem 2.3.]:

\[f(x_n)-f_\star \leqslant \frac{4}{A_0 (\sum_{t=0}^{n-1} \sqrt{\gamma_t})^2}\left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \right).\]
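As a quick sanity check (not part of the example file), this bound can be evaluated with the parameters used in the example below (\(A_0=5\), \(\gamma_t = (t+1)/1.1\) for \(t\in\{0,1,2\}\), \(n=3\)):

A0, n = 5, 3
gammas = [(t + 1) / 1.1 for t in range(n)]
theoretical_tau = 4 / (A0 * sum(g ** .5 for g in gammas) ** 2)
# theoretical_tau is approximately 0.0511881, the value reported in the example below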

References: The accelerated proximal point method was first obtained and analyzed in [1].

[1] O. Güler (1992). New proximal point algorithms for convex minimization. SIAM Journal on Optimization, 2(4):649–664.

Parameters
  • A0 (float) – initial value for parameter A_0.

  • gammas (list) – sequence of step-sizes.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_point(A0=5, gammas=[(i + 1) / 1.1 for i in range(3)], n=3, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 20 scalar constraint(s) ...
                 function 1 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.015931135941565824
*** Example file: worst-case performance of fast proximal point method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.0159311 (f(x_0) - f_* + A/2* ||x_0 - x_*||^2)
        Theoretical guarantee:   f(x_n)-f_* <= 0.0511881 (f(x_0) - f_* + A/2* ||x_0 - x_*||^2)

Inexact gradient descent

PEPit.examples.unconstrained_convex_minimization.wc_inexact_gradient_descent(L, mu, epsilon, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for the inexact gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu, \varepsilon)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \varepsilon) (f(x_0) - f_\star)\]

is valid, where \(x_n\) is the output of the inexact gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\), \(\mu\) and \(\varepsilon\), \(\tau(n, L, \mu, \varepsilon)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).

Algorithm:

\[x_{t+1} = x_t - \gamma d_t\]

with

\[\|d_t - \nabla f(x_t)\| \leqslant \varepsilon \|\nabla f(x_t)\|\]

and

\[\gamma = \frac{2}{L_{\varepsilon} + \mu_{\varepsilon}}\]

where \(L_{\varepsilon} = (1 + \varepsilon) L\) and \(\mu_{\varepsilon} = (1 - \varepsilon) \mu\).

Theoretical guarantee:

The tight worst-case guarantee obtained in [1, Theorem 5.3] or [2, Remark 1.6] is

\[f(x_n) - f_\star \leqslant \left(\frac{L_{\varepsilon}-\mu_{\varepsilon}}{L_{\varepsilon}+\mu_{\varepsilon}}\right)^{2n}(f(x_0) - f_\star),\]

where tightness is achieved on simple quadratic functions.
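As a quick sanity check (not part of the example file), the rate can be evaluated with the parameters used in the example below (\(L=1\), \(\mu=0.1\), \(\varepsilon=0.1\), \(n=2\)):

L, mu, epsilon, n = 1, .1, .1, 2
L_eps, mu_eps = (1 + epsilon) * L, (1 - epsilon) * mu
theoretical_tau = ((L_eps - mu_eps) / (L_eps + mu_eps)) ** (2 * n)
# theoretical_tau is approximately 0.518917, matching the theoretical value reported below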

References: The detailed analyses can be found in [1, 2].

[1] E. De Klerk, F. Glineur, A. Taylor (2020). Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM Journal on Optimization, 30(3), 2053-2082.

[2] O. Gannot (2021). A frequency-domain analysis of inexact gradient methods. Mathematical Programming (to appear).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • epsilon (float) – level of inaccuracy.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_inexact_gradient_descent(L=1, mu=.1, epsilon=.1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 14 scalar constraint(s) ...
                 function 1 : 14 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5189192063892595
*** Example file: worst-case performance of inexact gradient method in distance in function values ***
        PEPit guarantee:         f(x_n)-f_* <= 0.518919 (f(x_0)-f_*)
        Theoretical guarantee:   f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)

Inexact accelerated gradient

PEPit.examples.unconstrained_convex_minimization.wc_inexact_accelerated_gradient(L, epsilon, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for an accelerated gradient method using inexact first-order information. That is, it computes the smallest possible \(\tau(n, L, \varepsilon)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \varepsilon) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of inexact accelerated gradient descent and where \(x_\star\) is a minimizer of \(f\).

The inexact descent direction is assumed to satisfy a relative inaccuracy described by (with \(0\leqslant \varepsilon \leqslant 1\))

\[\|\nabla f(y_t) - d_t\| \leqslant \varepsilon \|\nabla f(y_t)\|,\]

where \(\nabla f(y_t)\) is the true gradient at \(y_t\) and \(d_t\) is the approximate descent direction that is used.

Algorithm: The inexact accelerated gradient method of this example is provided by

\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} d_t\\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}

Theoretical guarantee: When \(\varepsilon=0\), a tight empirical guarantee can be found in [1, Table 1]:

\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]

which is achieved on some Huber loss functions (when \(\varepsilon=0\)).
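For instance, with \(L=1\) and \(n=5\) as in the example below, this bound evaluates to \(\frac{2}{25+25+6} = \frac{2}{56} \approx 0.0357143\), which is the value reported below as the theoretical guarantee for \(\varepsilon=0\).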

References:

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • L (float) – smoothness parameter.

  • epsilon (float) – level of inaccuracy

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_inexact_accelerated_gradient(L=1, epsilon=0.1, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 13x13
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 47 scalar constraint(s) ...
                 function 1 : 47 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.03944038534724904
*** Example file: worst-case performance of inexact accelerated gradient method ***
        PEPit guarantee:                         f(x_n)-f_* <= 0.0394404 (f(x_0)-f_*)
        Theoretical guarantee for epsilon = 0 :  f(x_n)-f_* <= 0.0357143 (f(x_0)-f_*)

Epsilon-subgradient method

PEPit.examples.unconstrained_convex_minimization.wc_epsilon_subgradient_method(M, n, gamma, eps, R, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is closed, convex, and proper. This problem is a (possibly non-smooth) minimization problem.

This code computes a worst-case guarantee for the \(\varepsilon\) -subgradient method. That is, it computes the smallest possible \(\tau(n, M, \gamma, \varepsilon, R)\) such that the guarantee

\[\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star \leqslant \tau(n, M, \gamma, \varepsilon, R)\]

is valid, where \(x_t\) are the iterates of the \(\varepsilon\) -subgradient method after \(t\leqslant n\) steps, where \(x_\star\) is a minimizer of \(f\), where \(M\) is an upper bound on the norm of all \(\varepsilon\)-subgradients encountered, and when \(\|x_0-x_\star\|\leqslant R\).

In short, for given values of \(M\), of the accuracy \(\varepsilon\), of the step-size \(\gamma\), of the initial distance \(R\), and of the number of iterations \(n\), \(\tau(n, M, \gamma, \varepsilon, R)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star\).

Algorithm: For \(t\in \{0, \dots, n-1 \}\)

\begin{eqnarray} g_{t} & \in & \partial_{\varepsilon} f(x_t) \\ x_{t+1} & = & x_t - \gamma g_t \end{eqnarray}

Theoretical guarantee: An upper bound is obtained in [1, Lemma 2]:

\[\min_{0 \leqslant t \leqslant n} f(x_t)- f(x_\star) \leqslant \frac{R^2+2(n+1)\gamma\varepsilon+(n+1) \gamma^2 M^2}{2(n+1) \gamma}.\]
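As a quick sanity check (not part of the example file), this bound can be evaluated with the parameters used in the example below (\(M=2\), \(n=6\), \(\varepsilon=0.1\), \(R=1\), \(\gamma=1/\sqrt{n+1}\)):

from math import sqrt

M, n, eps, R = 2, 6, .1, 1
gamma = 1 / sqrt(n + 1)
theoretical_tau = (R ** 2 + 2 * (n + 1) * gamma * eps + (n + 1) * gamma ** 2 * M ** 2) / (2 * (n + 1) * gamma)
# theoretical_tau is approximately 1.04491, matching the theoretical value reported below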

References:

[1] R.D. Millán, M.P. Machado (2019). Inexact proximal epsilon-subgradient methods for composite convex optimization problems. Journal of Global Optimization 75.4 (2019): 1029-1060.

Parameters
  • M (float) – the bound on norms of epsilon-subgradients.

  • n (int) – the number of iterations.

  • gamma (float) – step-size.

  • eps (float) – the bound on the value of epsilon (inaccuracy).

  • R (float) – the bound on initial distance to an optimal solution.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> M, n, eps, R = 2, 6, .1, 1
>>> gamma = 1 / sqrt(n + 1)
>>> pepit_tau, theoretical_tau = wc_epsilon_subgradient_method(M=M, n=n, gamma=gamma, eps=eps, R=R, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 21x21
(PEPit) Setting up the problem: performance measure is minimum of 7 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (14 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 188 scalar constraint(s) ...
                 function 1 : 188 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 1.0191201198697333
*** Example file: worst-case performance of the epsilon-subgradient method ***
        PEPit guarantee:         min_(0 <= t <= n) f(x_i) - f_* <= 1.01912
        Theoretical guarantee:   min_(0 <= t <= n) f(x_i) - f_* <= 1.04491

Gradient descent for quadratically upper bounded convex objective

PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_qg_convex(L, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) \| x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(||x_0 - x_\star||^2 \leqslant 1\).

Algorithm: Gradient descent is described by

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: When \(\gamma < \frac{1}{L}\), the lower theoretical guarantee can be found in [1, Theorem 2.2]:

\[f(x_n)-f_\star \leqslant \frac{L}{2}\max\left(\frac{1}{2n L \gamma + 1}, L \gamma\right) \|x_0-x_\star\|^2.\]
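For instance, with \(L=1\), \(\gamma=0.2\) and \(n=4\) as in the example below, the bound evaluates to \(\frac{1}{2}\max\left(\frac{1}{2.6},\, 0.2\right) \approx 0.192308\), matching the theoretical value reported below.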

References:

The detailed approach is available in [1, Theorem 2.2].

[1] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

Parameters
  • L (float) – the quadratic growth parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_qg_convex(L=L, gamma=.2 / L, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 35 scalar constraint(s) ...
                 function 1 : 35 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.19230811671886025
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
        PEPit guarantee:         f(x_n)-f_* <= 0.192308 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.192308 ||x_0 - x_*||^2

Gradient descent with decreasing step sizes for quadratically upper bounded convex objective

PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_qg_convex_decreasing(L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.

This code computes a worst-case guarantee for gradient descent with decreasing step-sizes. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \| x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of gradient descent with decreasing step-sizes, and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(||x_0 - x_\star||^2 \leqslant 1\).

Algorithm: Gradient descent with decreasing step sizes is described by

\[x_{t+1} = x_t - \gamma_t \nabla f(x_t)\]

with

\[\gamma_t = \frac{1}{L u_{t+1}}\]

where the sequence \(u\) is defined by

\begin{eqnarray} u_0 & = & 1 \\ u_{t} & = & \frac{u_{t-1}}{2} + \sqrt{\left(\frac{u_{t-1}}{2}\right)^2 + 2}, \quad \text{for } t \geqslant 1 \end{eqnarray}

Theoretical guarantee: The tight theoretical guarantee is conjectured in [1, Conjecture A.3]:

\[f(x_n)-f_\star \leqslant \frac{L}{2 u_n} \|x_0-x_\star\|^2.\]

Notes:

We verify that \(u_t \sim 2\sqrt{t}\). The step sizes as well as the function values of the iterates decrease as \(O\left( \frac{1}{\sqrt{t}} \right)\).
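For example, with \(L=1\) and \(n=6\) (the setting of the example below), the recursion can be evaluated numerically to recover the conjectured value:

L, n = 1, 6
u = 1  # u_0
for _ in range(n):
    u = u / 2 + ((u / 2) ** 2 + 2) ** .5
# u is now u_n (approximately 4.7372), and L / (2 * u) is approximately 0.105547,
# matching the conjectured value reported in the example below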

References:

The detailed approach is available in [1, Appendix A.3].

[1] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

Parameters
  • L (float) – the quadratic growth parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_gradient_descent_qg_convex_decreasing(L=1, n=6, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 63 scalar constraint(s) ...
                 function 1 : 63 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.10554312873115372
(PEPit) Postprocessing: solver's output is not entirely feasible (smallest eigenvalue of the Gram matrix is: -4.19e-06 < 0).
 Small deviation from 0 may simply be due to numerical error. Big ones should be deeply investigated.
 In any case, from now the provided values of parameters are based on the projection of the Gram matrix onto the cone of symmetric semi-definite matrix.
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
        PEPit guarantee:         f(x_n)-f_* <= 0.105543 ||x_0 - x_*||^2
        Theoretical conjecture:  f(x_n)-f_* <= 0.105547 ||x_0 - x_*||^2

Conjugate gradient for quadratically upper bounded convex objective

PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient_qg_convex(L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [2]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.

This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]

is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).

Algorithm:

\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]

with

\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]

Theoretical guarantee:

The tight guarantee obtained in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper) is

\[f(x_n) - f_\star \leqslant \frac{L}{2 (n + 1)} \|x_0-x_\star\|^2.\]

References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6], and the result is provided in [2, Theorem 2.4].

[1] Y. Drori and A. Taylor (2020). Efficient first-order methods for convex minimization: a constructive approach. Mathematical Programming 184 (1), 183-220.

[2] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

Parameters
  • L (float) – the quadratic growth parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_conjugate_gradient_qg_convex(L=1, n=12, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 27x27
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 351 scalar constraint(s) ...
                 function 1 : 351 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.038461130525391705
*** Example file: worst-case performance of conjugate gradient method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.0384611 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.0384615 ||x_0 - x_*||^2

Heavy Ball momentum for quadratically upper bounded convex objective

PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum_qg_convex(L, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [2]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.

This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm:

This method is described in [1]

\[x_{t+1} = x_t - \alpha_t \nabla f(x_t) + \beta_t (x_t-x_{t-1})\]

with

\[\alpha_t = \frac{1}{L} \frac{1}{t+2}\]

and

\[\beta_t = \frac{t}{t+2}\]

Theoretical guarantee:

The tight guarantee obtained in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper) is

\[f(x_n) - f_\star \leqslant \frac{L}{2}\frac{1}{n+1} \|x_0 - x_\star\|^2.\]

References: This method was first introduced in [1, Section 3], and the tight convergence bound was proven in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper).

[1] E. Ghadimi, H. R. Feyzmahdavian, M. Johansson (2015). Global convergence of the Heavy-ball method for convex optimization. European Control Conference (ECC).

[2] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

Parameters
  • L (float) – the quadratic growth parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum_qg_convex(L=1, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 63 scalar constraint(s) ...
                 function 1 : 63 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.08333167067320212
*** Example file: worst-case performance of the Heavy-Ball method ***
        PEPit guarantee:         f(x_n)-f_* <= 0.0833317 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.0833333 ||x_0 - x_*||^2

Composite convex minimization

Proximal gradient

PEPit.examples.composite_convex_minimization.wc_proximal_gradient(L, mu, gamma, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is \(L\)-smooth and \(\mu\)-strongly convex, and where \(f_2\) is closed convex and proper.

This code computes a worst-case guarantee for the proximal gradient method (PGM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[\|x_n - x_\star\|^2 \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2,\]

is valid, where \(x_n\) is the output of the proximal gradient, and where \(x_\star\) is a minimizer of \(F\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\|x_n - x_\star\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: Proximal gradient is described by

\[\begin{split}\begin{eqnarray} y_t & = & x_t - \gamma \nabla f_1(x_t), \\ x_{t+1} & = & \arg\min_x \left\{f_2(x)+\frac{1}{2\gamma}\|x-y_t\|^2 \right\}, \end{eqnarray}\end{split}\]

for \(t \in \{ 0, \dots, n-1\}\) and where \(\gamma\) is a step-size.

Theoretical guarantee: It is well known that a tight guarantee for PGM is provided by

\[\|x_n - x_\star\|^2 \leqslant \max\{(1-L\gamma)^2,(1-\mu\gamma)^2\}^n \|x_0 - x_\star\|^2,\]

which can be found in, e.g., [1, Theorem 3.1]. It is folk knowledge and the result can be found in many references for gradient descent; see, e.g., [2, Section 1.4: Theorem 3], [3, Section 5.1], and [4, Section 4.4].
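For instance, with \(L=1\), \(\mu=0.1\), \(\gamma=1\) and \(n=2\) as in the example below, the factor is \(\max\{0,\, 0.81\}^2 = 0.6561\), which is the value reported below.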

References:

[1] A. Taylor, J. Hendrickx, F. Glineur (2018). Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 178(2), 455-476.

[2] B. Polyak (1987). Introduction to Optimization. Optimization Software New York.

[3] E. Ryu, S. Boyd (2016). A primer on monotone operator methods. Applied and Computational Mathematics 15(1), 3-43.

[4] L. Lessard, B. Recht, A. Packard (2016). Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization 26(1), 57–95.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – proximal step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_proximal_gradient(L=1, mu=.1, gamma=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 6 scalar constraint(s) ...
                 function 1 : 6 scalar constraint(s) added
                 function 2 : Adding 6 scalar constraint(s) ...
                 function 2 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.6560999999942829
*** Example file: worst-case performance of the Proximal Gradient Method in function values***
        PEPit guarantee:         ||x_n - x_*||^2 <= 0.6561 ||x0 - xs||^2
        Theoretical guarantee:   ||x_n - x_*||^2 <= 0.6561 ||x0 - xs||^2

Accelerated proximal gradient

PEPit.examples.composite_convex_minimization.wc_accelerated_proximal_gradient(mu, L, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f(x) + h(x)\},\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex, and where \(h\) is closed convex and proper.

This code computes a worst-case guarantee for the accelerated proximal gradient method, also known as fast proximal gradient (FPGM) method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2,\]

is valid, where \(x_n\) is the output of the accelerated proximal gradient method, and where \(x_\star\) is a minimizer of \(F\).

In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: Accelerated proximal gradient is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\begin{eqnarray} x_{t+1} & = & \arg\min_x \left\{h(x)+\frac{L}{2}\|x-\left(y_{t} - \frac{1}{L} \nabla f(y_t)\right)\|^2 \right\}, \\ y_{t+1} & = & x_{t+1} + \frac{t}{t+3} (x_{t+1} - x_{t}), \end{eqnarray}

where \(y_{0} = x_0\).

Theoretical guarantee: A tight (empirical) worst-case guarantee for FPGM is obtained in [1, method FPGM1 in Sec. 4.2.1, Table 1 in sec 4.2.2], for \(\mu=0\):

\[F(x_n) - F_\star \leqslant \frac{2 L}{n^2+5n+2} \|x_0 - x_\star\|^2,\]

which is attained on simple one-dimensional constrained linear optimization problems.
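For instance, with \(L=1\), \(\mu=0\) and \(n=4\) as in the example below, the bound evaluates to \(\frac{2}{16+20+2} = \frac{2}{38} \approx 0.0526316\), matching the theoretical value reported below.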

References:

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_gradient(L=1, mu=0, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 12x12
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
                 function 2 : Adding 20 scalar constraint(s) ...
                 function 2 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.052630167313517565
(PEPit) Postprocessing: solver's output is not entirely feasible (smallest eigenvalue of the Gram matrix is: -7.28e-06 < 0).
 Small deviation from 0 may simply be due to numerical error. Big ones should be deeply investigated.
 In any case, from now the provided values of parameters are based on the projection of the Gram matrix onto the cone of symmetric semi-definite matrix.
*** Example file: worst-case performance of the Accelerated Proximal Gradient Method in function values***
        PEPit guarantee:         f(x_n)-f_* <= 0.0526302 ||x0 - xs||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.0526316 ||x0 - xs||^2

Bregman proximal point

PEPit.examples.composite_convex_minimization.wc_bregman_proximal_point(gamma, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]

where \(f_1(x)\) and \(f_2(x)\) are closed convex proper functions.

This code computes a worst-case guarantee for Bregman Proximal Point method. That is, it computes the smallest possible \(\tau(n, \gamma)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(n, \gamma) D_{f_1}(x_\star; x_0)\]

is valid, where \(x_n\) is the output of the Bregman Proximal Point (BPP) method, where \(x_\star\) is a minimizer of \(F\), and when \(D_{f_1}\) is the Bregman distance generated by \(f_1\).

Algorithm: Bregman proximal point is described in [1, Section 2, equation (9)]. For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_{t+1} & = & \arg\min_{u \in \mathbb{R}^n} f_1(u) + \frac{1}{\gamma} D_{f_2}(u; x_t), \\ D_h(x; y) & = & h(x) - h(y) - \nabla h (y)^T(x - y). \end{eqnarray}

Theoretical guarantee: A tight empirical guarantee can be conjectured from the numerical results:

\[F(x_n) - F(x_\star) \leqslant \frac{1}{\gamma n} D_{f_1}(x_\star; x_0).\]
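For instance, with \(\gamma=3\) and \(n=5\) as in the example below, this gives \(\frac{1}{15} \approx 0.0666667\), which is the theoretical value reported below.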

References:

[1] Y. Censor, S.A. Zenios (1992). Proximal minimization algorithm with D-functions. Journal of Optimization Theory and Applications, 73(3), 451-464.

Parameters
  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Examples

>>> pepit_tau, theoretical_tau = wc_bregman_proximal_point(gamma=3, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 14x14
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
                 function 2 : Adding 42 scalar constraint(s) ...
                 function 2 : 42 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.06666740784196148
*** Example file: worst-case performance of the Bregman Proximal Point in function values ***
        PEPit guarantee:         F(x_n)-F_* <= 0.0666674 Dh(x_*; x_0)
        Theoretical guarantee:   F(x_n)-F_* <= 0.0666667 Dh(x_*; x_0)

Douglas Rachford splitting

PEPit.examples.composite_convex_minimization.wc_douglas_rachford_splitting(L, alpha, theta, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]

where \(f_1(x)\) is convex, closed and proper, and \(f_2\) is \(L\)-smooth. Both proximal operators are assumed to be available.

This code computes a worst-case guarantee for the Douglas Rachford Splitting (DRS) method. That is, it computes the smallest possible \(\tau(n, L, \alpha, \theta)\) such that the guarantee

\[F(y_n) - F(x_\star) \leqslant \tau(n, L, \alpha, \theta) \|x_0 - x_\star\|^2.\]

is valid, where it is known that \(x_t\) and \(y_t\) converge to \(x_\star\), but not \(w_t\) (see definitions in the section Algorithm). Hence we require the initial condition on \(x_0\) (arbitrary choice, partially justified by the fact that we choose \(f_2\) to be the smooth function).

Note that \(y_n\) is feasible as it has a finite value for \(f_1\) (output of the proximal operator on \(f_1\)) and as \(f_2\) is smooth.

Algorithm:

Our notations for the DRS method are as follows, for \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_t & = & \mathrm{prox}_{\alpha f_2}(w_t), \\ y_t & = & \mathrm{prox}_{\alpha f_1}(2x_t - w_t), \\ w_{t+1} & = & w_t + \theta (y_t - x_t). \end{eqnarray}

This description can be found in [1, Section 7.3].
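For illustration, a simplified PEPit encoding of this scheme could look as follows. This is a sketch only (the actual example file may differ), assuming that PEPit function objects can be summed and that proximal_step from PEPit.primitive_steps returns the proximal point, a subgradient, and the function value.

from PEPit import PEP
from PEPit.functions import ConvexFunction, SmoothConvexFunction
from PEPit.primitive_steps import proximal_step

L, alpha, theta, n = 1, 1, 1, 9  # same parameters as in the example below

problem = PEP()
func1 = problem.declare_function(ConvexFunction)             # f_1: closed, proper, convex
func2 = problem.declare_function(SmoothConvexFunction, L=L)  # f_2: L-smooth, convex
total = func1 + func2                                        # F = f_1 + f_2
xs = total.stationary_point()                                # a minimizer x_*
Fs = total(xs)

w = problem.set_initial_point()                              # w_0
for t in range(n):
    x, _, _ = proximal_step(w, func2, alpha)                 # x_t = prox_{alpha f_2}(w_t)
    y, _, f1y = proximal_step(2 * x - w, func1, alpha)       # y_t = prox_{alpha f_1}(2 x_t - w_t)
    if t == 0:
        problem.set_initial_condition((x - xs) ** 2 <= 1)    # initial condition on x_0
    w = w + theta * (y - x)

problem.set_performance_metric(func2(y) + f1y - Fs)          # F evaluated at the last iterate y, minus F_*
pepit_tau = problem.solve()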

Theoretical guarantee: We compare the output with that of PESTO [2] for when \(0\leqslant n \leqslant 10\) in the case where \(\alpha=\theta=L=1\).

References:

[1] E. Ryu, S. Boyd (2016). A primer on monotone operator methods. Applied and Computational Mathematics 15(1), 3-43.

[2] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).

Parameters
  • L (float) – the smoothness parameter.

  • alpha (float) – parameter of the scheme.

  • theta (float) – parameter of the scheme.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_douglas_rachford_splitting(L=1, alpha=1, theta=1, n=9, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 22x22
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 90 scalar constraint(s) ...
                 function 1 : 90 scalar constraint(s) added
                 function 2 : Adding 110 scalar constraint(s) ...
                 function 2 : 110 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.027792700548325236
*** Example file: worst-case performance of the Douglas Rachford Splitting in function values ***
        PEPit guarantee:         f(y_n)-f_* <= 0.0278 ||x0 - xs||^2
        Theoretical guarantee:   f(y_n)-f_* <= 0.0278 ||x0 - xs||^2

Douglas Rachford splitting contraction

PEPit.examples.composite_convex_minimization.wc_douglas_rachford_splitting_contraction(mu, L, alpha, theta, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x) \}\]

where \(f_1(x)\) is \(L\)-smooth and \(\mu\)-strongly convex, and \(f_2\) is convex, closed and proper. Both proximal operators are assumed to be available.

This code computes a worst-case guarantee for the Douglas Rachford Splitting (DRS) method. That is, it computes the smallest possible \(\tau(\mu,L,\alpha,\theta,n)\) such that the guarantee

\[\|w_1 - w_1'\|^2 \leqslant \tau(\mu,L,\alpha,\theta,n) \|w_0 - w_0'\|^2.\]

is valid, where \(x_n\) is the output of the Douglas-Rachford splitting method. It is a contraction factor computed when the algorithm is started from two different points \(w_0\) and \(w_0'\).

Algorithm:

Our notations for the DRS method are as follows [3, Section 7.3], for \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_t & = & \mathrm{prox}_{\alpha f_2}(w_t), \\ y_t & = & \mathrm{prox}_{\alpha f_1}(2x_t - w_t), \\ w_{t+1} & = & w_t + \theta (y_t - x_t). \end{eqnarray}

Theoretical guarantee:

The tight theoretical guarantee is obtained in [2, Theorem 2]:

\[\|w_1 - w_1'\|^2 \leqslant \max\left(\frac{1}{1 + \mu \alpha}, \frac{\alpha L }{1 + L \alpha}\right)^{2n} \|w_0 - w_0'\|^2\]

when \(\theta=1\).
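As a quick sanity check (not part of the example file), this factor can be evaluated with the parameters used in the example below (\(\mu=0.1\), \(L=1\), \(\alpha=3\), \(\theta=1\), \(n=2\)):

mu, L, alpha, n = .1, 1, 3, 2
theoretical_tau = max(1 / (1 + mu * alpha), alpha * L / (1 + L * alpha)) ** (2 * n)
# theoretical_tau is approximately 0.350128, matching the theoretical value reported below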

References:

Details on the SDP formulations can be found in

[1] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.

When \(\theta = 1\), the bound can be compared with that of [2, Theorem 2]

[2] P. Giselsson, and S. Boyd (2016). Linear convergence and metric selection in Douglas-Rachford splitting and ADMM. IEEE Transactions on Automatic Control, 62(2), 532-544.

A description for the DRS method can be found in [3, 7.3]

[3] E. Ryu, S. Boyd (2016). A primer on monotone operator methods. Applied and Computational Mathematics 15(1), 3-43.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • alpha (float) – parameter of the scheme.

  • theta (float) – parameter of the scheme.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Examples

>>> pepit_tau, theoretical_tau = wc_douglas_rachford_splitting_contraction(mu=.1, L=1, alpha=3, theta=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 12x12
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 20 scalar constraint(s) ...
                 function 1 : 20 scalar constraint(s) added
                 function 2 : Adding 20 scalar constraint(s) ...
                 function 2 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.35012779919911946
*** Example file: worst-case performance of the Douglas-Rachford splitting in distance ***
        PEPit guarantee:         ||w - wp||^2 <= 0.350128 ||w0 - w0p||^2
        Theoretical guarantee:   ||w - wp||^2 <= 0.350128 ||w0 - w0p||^2

Accelerated Douglas Rachford splitting

PEPit.examples.composite_convex_minimization.wc_accelerated_douglas_rachford_splitting(mu, L, alpha, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is closed convex and proper, and \(f_2\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for accelerated Douglas-Rachford. That is, it computes the smallest possible \(\tau(n, L, \mu, \alpha)\) such that the guarantee

\[F(y_n) - F(x_\star) \leqslant \tau(n,L,\mu,\alpha) \|w_0 - w_\star\|^2\]

is valid, where \(\alpha\) is a parameter of the method, where \(y_n\) is the output of the accelerated Douglas-Rachford splitting method, where \(x_\star\) is a minimizer of \(F\), and where \(w_\star\) is defined such that

\[x_\star = \mathrm{prox}_{\alpha f_2}(w_\star)\]

is an optimal point.

In short, for given values of \(n\), \(L\), \(\mu\), \(\alpha\), \(\tau(n, L, \mu, \alpha)\) is computed as the worst-case value of \(F(y_n)-F_\star\) when \(\|w_0 - w_\star\|^2 \leqslant 1\).

Algorithm: The accelerated Douglas-Rachford splitting is described in [1, Section 4]. For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_{t} & = & \mathrm{prox}_{\alpha f_2} (u_t),\\ y_{t} & = & \mathrm{prox}_{\alpha f_1}(2x_t-u_t),\\ w_{t+1} & = & u_t + \theta (y_t-x_t),\\ u_{t+1} & = & \left\{\begin{array}{ll} w_{t+1}+\frac{t-1}{t+2}(w_{t+1}-w_t)\, & \text{if } t >1,\\ w_{t+1} & \text{otherwise.} \end{array}\right. \end{eqnarray}

Theoretical guarantee: There is no known worst-case guarantee for this method beyond quadratic minimization. For quadratics, an upper bound is provided by [1, Theorem 5]:

\[F(y_n) - F_\star \leqslant \frac{2}{\alpha \theta (n + 3)^ 2} \|w_0-w_\star\|^2,\]

when \(\theta=\frac{1-\alpha L}{1+\alpha L}\) and \(\alpha < \frac{1}{L}\).
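For instance, with \(\mu=0.1\), \(L=1\), \(\alpha=0.9\) and \(n=2\) as in the example below, one has \(\theta = \frac{0.1}{1.9} \approx 0.0526\), and the bound evaluates to approximately \(1.68889\), which is the value reported for quadratics below.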

References: An analysis of the accelerated Douglas-Rachford splitting is available in [1, Theorem 5] when the convex minimization problem is quadratic.

[1] P. Patrinos, L. Stella, A. Bemporad (2014). Douglas-Rachford splitting: Complexity estimates and accelerated variants. In 53rd IEEE Conference on Decision and Control (CDC).

Parameters
  • mu (float) – the strong convexity parameter.

  • L (float) – the smoothness parameter.

  • alpha (float) – the parameter of the scheme.

  • n (int) – the number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value (upper bound for quadratics; not directly comparable).

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_douglas_rachford_splitting(mu=.1, L=1, alpha=.9, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 11x11
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 20 scalar constraint(s) ...
                 function 1 : 20 scalar constraint(s) added
                 function 2 : Adding 20 scalar constraint(s) ...
                 function 2 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.19291623136473224
*** Example file: worst-case performance of the Accelerated Douglas Rachford Splitting in function values ***
        PEPit guarantee:                         F(y_n)-F_* <= 0.192916 ||x0 - ws||^2
        Theoretical guarantee for quadratics:    F(y_n)-F_* <= 1.68889 ||x0 - ws||^2

Frank Wolfe

PEPit.examples.composite_convex_minimization.wc_frank_wolfe(L, D, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is \(L\)-smooth and convex and where \(f_2\) is a convex indicator function on \(\mathcal{D}\) of diameter at most \(D\).

This code computes a worst-case guarantee for the conditional gradient method, aka Frank-Wolfe method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(n, L) D^2,\]

is valid, where \(x_n\) is the output of the conditional gradient method, and where \(x_\star\) is a minimizer of \(F\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(D \leqslant 1\).

Algorithm:

This method was first presented in [1]. A more recent version can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),

\[\begin{split}\begin{eqnarray} y_t & = & \arg\min_{s \in \mathcal{D}} \langle s \mid \nabla f_1(x_t) \rangle, \\ x_{t+1} & = & \frac{t}{t + 2} x_t + \frac{2}{t + 2} y_t. \end{eqnarray}\end{split}\]

Theoretical guarantee:

An upper bound obtained in [2, Theorem 1] is

\[F(x_n) - F(x_\star) \leqslant \frac{2L D^2}{n+2}.\]
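For instance, with \(L=1\), \(D=1\) and \(n=10\) as in the example below, this bound equals \(\frac{2}{12} \approx 0.166667\), matching the theoretical value reported below.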

References:

[1] M. Frank, P. Wolfe (1956). An algorithm for quadratic programming. Naval research logistics quarterly, 3(1-2), 95-110.

[2] M. Jaggi (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In 30th International Conference on Machine Learning (ICML).

Parameters
  • L (float) – the smoothness parameter.

  • D (float) – diameter of the domain \(\mathcal{D}\) of the indicator function \(f_2\).

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_frank_wolfe(L=1, D=1, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 26x26
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 132 scalar constraint(s) ...
                 function 1 : 132 scalar constraint(s) added
                 function 2 : Adding 325 scalar constraint(s) ...
                 function 2 : 325 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.07830185202143693
*** Example file: worst-case performance of the Conditional Gradient (Frank-Wolfe) in function value ***
        PEPit guarantee:         f(x_n)-f_* <= 0.0783019 ||x0 - xs||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x0 - xs||^2

Improved interior method

PEPit.examples.composite_convex_minimization.wc_improved_interior_algorithm(L, mu, c, lam, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is an \(L\)-smooth convex function, and \(f_2\) is a closed convex indicator function. We use a kernel function \(h\) that is assumed to be closed, proper, and strongly convex (see [1, Section 5]).

This code computes a worst-case guarantee for Improved interior gradient algorithm (IGA). That is, it computes the smallest possible \(\tau(\mu,L,c,\lambda,n)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(\mu,L,c,\lambda,n) (c D_h(x_\star;x_0) + f_1(x_0) - f_1(x_\star))\]

is valid, where \(x_n\) is the output of the IGA and where \(x_\star\) is a minimizer of \(F\) and \(D_h\) is the Bregman distance generated by \(h\).

In short, for given values of \(\mu\), \(L\), \(c\), \(\lambda\) and \(n\), \(\tau(\mu,L,c,\lambda,n)\) is computed as the worst-case value of \(F(x_n)-F_\star\) when \(c D_h(x_\star;x_0) + f_1(x_0) - f_1(x_\star)\leqslant 1\).

Algorithm: The IGA is described in [1, “Improved Interior Gradient Algorithm”]. For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} \alpha_t & = & \frac{\sqrt{(c_t\lambda)^2+4c_t\lambda}-\lambda c_t}{2},\\ y_t & = & (1-\alpha_t) x_t + \alpha_t z_t,\\ c_{t+1} & = & (1-\alpha_t)c_t,\\ z_{t+1} & = & \arg\min_{z} \left\{ \left< z;\frac{\alpha_t}{c_{t+1}}\nabla f_1(y_t)\right> +f_2(z)+D_h(z;z_t)\right\}, \\ x_{t+1} & = & (1-\alpha_t) x_t + \alpha_t z_{t+1}. \end{eqnarray}

Theoretical guarantee: The following upper bound can be found in [1, Theorem 5.2]:

\[F(x_n) - F_\star \leqslant \frac{4L}{c n^2}\left(c D_h(x_\star;x_0) + f_1(x_0) - f_1(x_\star) \right).\]

References:

[1] A. Auslender, M. Teboulle (2006). Interior gradient and proximal methods for convex and conic optimization. SIAM Journal on Optimization 16.3 (2006): 697-725.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong-convexity parameter.

  • c (float) – initial value.

  • lam (float) – the step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> lam = 1 / L
>>> pepit_tau, theoretical_tau = wc_improved_interior_algorithm(L=L, mu=1, c=1, lam=lam, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 22x22
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 42 scalar constraint(s) ...
                 function 1 : 42 scalar constraint(s) added
                 function 2 : Adding 49 scalar constraint(s) ...
                 function 2 : 49 scalar constraint(s) added
                 function 3 : Adding 42 scalar constraint(s) ...
                 function 3 : 42 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal_inaccurate (solver: SCS); optimal value: 0.06675394483126838
*** Example file: worst-case performance of the Improved interior gradient algorithm in function values ***
        PEPit guarantee:         F(x_n)-F_* <= 0.0667539 (c * Dh(xs;x0) + f1(x0) - F_*)
        Theoretical guarantee:   F(x_n)-F_* <= 0.111111 (c * Dh(xs;x0) + f1(x0) - F_*)

No Lips in function value

PEPit.examples.composite_convex_minimization.wc_no_lips_in_function_value(L, gamma, n, verbose=1)[source]

Consider the constrained composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is convex and \(L\)-smooth relative to \(h\), with \(h\) being closed, proper, and convex, and where \(f_2\) is a closed convex indicator function.

This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[F(x_n) - F_\star \leqslant \tau(n, L) D_h(x_\star; x_0),\]

is valid, where \(x_n\) is the output of the NoLips method, where \(x_\star\) is a minimizer of \(F\), and where \(D_h\) is the Bregman divergence generated by \(h\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(F(x_n) - F_\star\) when \(D_h(x_\star; x_0) \leqslant 1\).

Algorithm: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),

\[x_{t+1} = \arg\min_{u} \{f_2(u)+\langle \nabla f_1(x_t) \mid u - x_t \rangle + \frac{1}{\gamma} D_h(u; x_t)\}.\]

Theoretical guarantee:

The tight guarantee obtained in [2, Theorem 1] is

\[F(x_n) - F_\star \leqslant \frac{1}{\gamma n} D_h(x_\star; x_0),\]

for any \(\gamma \leq \frac{1}{L}\); tightness is provided in [2, page 23].

References: NoLips was proposed in [1] for convex problems involving relative smoothness. The worst-case analysis using a PEP, as well as the tightness, are provided in [2].

[1] H.H. Bauschke, J. Bolte, M. Teboulle (2017). A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications. Mathematics of Operations Research, 42(2), 330-348.

[2] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

Notes

Disclaimer: This example requires some experience with PEPit and PEPs ([2], section 4).

Parameters
  • L (float) – relative-smoothness parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / (2 * L)
>>> pepit_tau, theoretical_tau = wc_no_lips_in_function_value(L=L, gamma=gamma, n=3, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 15x15
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 20 scalar constraint(s) ...
                 function 1 : 20 scalar constraint(s) added
                 function 2 : Adding 20 scalar constraint(s) ...
                 function 2 : 20 scalar constraint(s) added
                 function 3 : Adding 16 scalar constraint(s) ...
                 function 3 : 16 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.6666714558260607
*** Example file: worst-case performance of the NoLips in function values ***
        PEPit guarantee:         F(x_n) - F_* <= 0.666671 Dh(x_*; x_0)
        Theoretical guarantee:   F(x_n) - F_* <= 0.666667 Dh(x_*; x_0)

No Lips in Bregman divergence

PEPit.examples.composite_convex_minimization.wc_no_lips_in_bregman_divergence(L, gamma, n, verbose=1)[source]

Consider the constrained composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is convex and \(L\)-smooth relative to \(h\), with \(h\) being closed, proper, and convex, and where \(f_2\) is a closed convex indicator function.

This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[\min_{t\leqslant n} D_h(x_{t-1}; x_t) \leqslant \tau(n, L) D_h(x_\star; x_0),\]

is valid, where \(x_n\) is the output of the NoLips method, where \(x_\star\) is a minimizer of \(F\), and where \(D_h\) is the Bregman divergence generated by \(h\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(\min_{t\leqslant n} D_h(x_{t-1}; x_t)\) when \(D_h(x_\star; x_0) \leqslant 1\).

Algorithm: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),

\[x_{t+1} = \arg\min_{u} \{f_2(u)+\langle \nabla f_1(x_t) \mid u - x_t \rangle + \frac{1}{\gamma} D_h(u; x_t)\}.\]

Theoretical guarantee: The upper bound obtained in [2, Proposition 4] is

\[\min_{t\leqslant n} D_h(x_{t-1}; x_t) \leqslant \frac{2}{n (n - 1)} D_h(x_\star; x_0),\]

for any \(\gamma \leq \frac{1}{L}\). It is empirically tight.
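As a quick sanity check against the example below (illustrative only), with \(n=10\) this bound evaluates to \(2/90 \approx 0.0222\):

n = 10
print(2 / (n * (n - 1)))  # 0.0222..., the theoretical value in the example below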

References:

[1] H.H. Bauschke, J. Bolte, M. Teboulle (2017). A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications. Mathematics of Operations Research, 42(2), 330-348.

[2] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

Notes

Disclaimer: This example requires some experience with PEPit and PEPs ([2], section 4).

Parameters
  • L (float) – relative-smoothness parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_no_lips_in_bregman_divergence(L=L, gamma=gamma, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 36x36
(PEPit) Setting up the problem: performance measure is minimum of 10 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 132 scalar constraint(s) ...
                 function 1 : 132 scalar constraint(s) added
                 function 2 : Adding 132 scalar constraint(s) ...
                 function 2 : 132 scalar constraint(s) added
                 function 3 : Adding 121 scalar constraint(s) ...
                 function 3 : 121 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.022279210584840024
*** Example file: worst-case performance of the NoLips_2 in Bregman divergence ***
        PEPit guarantee:         min_t Dh(x_(t-1); x_t) <= 0.0222792 Dh(x_*; x_0)
        Theoretical guarantee:   min_t Dh(x_(t-1); x_t) <= 0.0222222 Dh(x_*; x_0)

Three operator splitting

PEPit.examples.composite_convex_minimization.wc_three_operator_splitting(mu1, L1, L3, alpha, theta, n, verbose=1)[source]

Consider the composite convex minimization problem,

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x) + f_3(x)\}\]

where \(f_1\) is \(L_1\)-smooth and \(\mu_1\)-strongly convex, \(f_2\) is closed, proper, and convex, and \(f_3\) is \(L_3\)-smooth and convex. Proximal operators are assumed to be available for \(f_1\) and \(f_2\).

This code computes a worst-case guarantee for the Three Operator Splitting (TOS). That is, it computes the smallest possible \(\tau(n, L_1, L_3, \mu_1, \alpha, \theta)\) such that the guarantee

\[\|w^{(0)}_{n} - w^{(1)}_{n}\|^2 \leqslant \tau(n, L_1, L_3, \mu_1, \alpha, \theta) \|w^{(0)}_{0} - w^{(1)}_{0}\|^2\]

is valid, where \(w^{(0)}_{0}\) and \(w^{(1)}_{0}\) are two different starting points and \(w^{(0)}_{n}\) and \(w^{(1)}_{n}\) are the two corresponding \(n^{\mathrm{th}}\) outputs of TOS (i.e., how the iterates contract when the method is started from two different initial points).

In short, for given values of \(n\), \(L_1\), \(L_3\), \(\mu_1\), \(\alpha\) and \(\theta\), the contraction factor \(\tau(n, L_1, L_3, \mu_1, \alpha, \theta)\) is computed as the worst-case value of \(\|w^{(0)}_{n} - w^{(1)}_{n}\|^2\) when \(\|w^{(0)}_{0} - w^{(1)}_{0}\|^2 \leqslant 1\).

Algorithm: One iteration of the algorithm is described in [1]. For \(t \in \{0, \dots, n-1\}\),

\begin{eqnarray} x_t & = & \mathrm{prox}_{\alpha, f_2}(w_t), \\ y_t & = & \mathrm{prox}_{\alpha, f_1}(2 x_t - w_t - \alpha \nabla f_3(x_t)), \\ w_{t+1} & = & w_t + \theta (y_t - x_t). \end{eqnarray}
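For illustration only (this is not the packaged example, which encodes the method as a PEP), one TOS iteration can be written as follows, assuming the proximal operators of \(f_1\) and \(f_2\) and the gradient of \(f_3\) are available as callables; the toy functions below are arbitrary choices satisfying the assumptions.

import numpy as np

def tos_step(w, prox_f1, prox_f2, grad_f3, alpha, theta):
    # prox_fi(z, a) should return argmin_x { a * f_i(x) + .5 * ||x - z||^2 }.
    x = prox_f2(w, alpha)
    y = prox_f1(2 * x - w - alpha * grad_f3(x), alpha)
    return w + theta * (y - x)

# Toy instance: f_1(x) = (mu1/2)||x||^2, f_2 = indicator of the nonnegative
# orthant, f_3(x) = (L3/2)||x - c||^2.
mu1, L3, c = .1, 1., np.array([1., -2.])
prox_f1 = lambda z, a: z / (1 + a * mu1)
prox_f2 = lambda z, a: np.maximum(z, 0.)
grad_f3 = lambda x: L3 * (x - c)
w = np.zeros(2)
for _ in range(10):
    w = tos_step(w, prox_f1, prox_f2, grad_f3, alpha=1., theta=1.)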

References: The TOS was introduced in [1].

[1] D. Davis, W. Yin (2017). A three-operator splitting scheme and its optimization applications. Set-valued and variational analysis, 25(4), 829-858.

Parameters
  • mu1 (float) – the strong convexity parameter of function \(f_1\).

  • L1 (float) – the smoothness parameter of function \(f_1\).

  • L3 (float) – the smoothness parameter of function \(f_3\).

  • alpha (float) – parameter of the scheme.

  • theta (float) – parameter of the scheme.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (None) – no theoretical value.

Example

>>> L3 = 1
>>> alpha = 1 / L3
>>> pepit_tau, theoretical_tau = wc_three_operator_splitting(mu1=0.1, L1=10, L3=L3, alpha=alpha, theta=1, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 26x26
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 56 scalar constraint(s) ...
                 function 1 : 56 scalar constraint(s) added
                 function 2 : Adding 56 scalar constraint(s) ...
                 function 2 : 56 scalar constraint(s) added
                 function 3 : Adding 56 scalar constraint(s) ...
                 function 3 : 56 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.47544137382115453
*** Example file: worst-case performance of the Three Operator Splitting in distance ***
        PEPit guarantee:         ||w^2_n - w^1_n||^2 <= 0.475441 ||x0 - ws||^2

Non-convex optimization

Gradient Descent

PEPit.examples.nonconvex_optimization.wc_gradient_descent(L, gamma, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth.

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \tau(n, L, \gamma) (f(x_0) - f(x_n))\]

is valid, where \(x_n\) is the \(n\)-th iterate obtained with the gradient method with fixed step-size.

Algorithm: Gradient descent is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), an empirically tight theoretical worst-case guarantee is

\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \frac{4}{3}\frac{L}{n} (f(x_0) - f(x_n)),\]

see discussions in [1, page 190] and [2].
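A simplified sketch of how this worst case can be encoded with PEPit is given below; it assumes that a SmoothFunction class (for \(L\)-smooth, possibly non-convex, functions) is available in PEPit.functions, and it omits the comparison with the theoretical value performed by the packaged example.

from PEPit import PEP
from PEPit.functions import SmoothFunction  # assumed class name

L, gamma, n = 1., 1., 5

problem = PEP()
func = problem.declare_function(SmoothFunction, L=L)

x0 = problem.set_initial_point()
x = x0
for _ in range(n):
    g = func.gradient(x)
    problem.set_performance_metric(g ** 2)  # one term of the min over iterates
    x = x - gamma * g
problem.set_performance_metric(func.gradient(x) ** 2)  # last iterate

problem.set_initial_condition(func(x0) - func(x) <= 1)  # budget f(x_0) - f(x_n) <= 1

pepit_tau = problem.solve()  # expected to be close to 4 * L / (3 * n) here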

References:

[1] Taylor, A. B. (2017). Convex interpolation and performance estimation of first-order methods for convex optimization. PhD Thesis, UCLouvain.

[2] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2021). The exact worst-case convergence rate of the gradient method with fixed step lengths for L-smooth functions. Optimization Letters, 16(6), 1649-1661.

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=gamma, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 6 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.2666769474847614
*** Example file: worst-case performance of gradient descent with fixed step-size ***
        PEPit guarantee:         min_i ||f'(x_i)||^2 <= 0.266677 (f(x_0)-f_*)
        Theoretical guarantee:   min_i ||f'(x_i)||^2 <= 0.266667 (f(x_0)-f_*)

No Lips 1

PEPit.examples.nonconvex_optimization.wc_no_lips_1(L, gamma, n, verbose=1)[source]

Consider the constrained non-convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]

where \(f_2\) is a closed convex indicator function and \(f_1\) is possibly non-convex and \(L\)-smooth relative to \(h\), and where \(h\) is closed, proper, and convex.

This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[\min_{0 \leqslant t \leqslant n-1} D_h(x_{t+1}; x_t) \leqslant \tau(n, L, \gamma) (F(x_0) - F(x_n))\]

is valid, where \(x_n\) is the output of the NoLips method, and where \(D_h\) is the Bregman distance generated by \(h\):

\[D_h(x; y) \triangleq h(x) - h(y) - \nabla h (y)^T(x - y).\]

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n-1}D_h(x_{t+1}; x_t)\) when \(F(x_0) - F(x_n) \leqslant 1\).

Algorithm: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [1, Section 3]. For \(t \in \{0, \dots, n-1\}\),

\[x_{t+1} = \arg\min_{u \in \mathbb{R}^d} \nabla f(x_t)^T(u - x_t) + \frac{1}{\gamma} D_h(u; x_t).\]

Theoretical guarantee: The tight theoretical upper bound obtained in [1, Proposition 4.1] is

\[\min_{0 \leqslant t \leqslant n-1} D_h(x_{t+1}; x_t) \leqslant \frac{\gamma}{n(1 - L\gamma)}(F(x_0) - F(x_n))\]
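For instance, with the parameters used in the example below (\(L=1\), \(\gamma=1/(2L)\), \(n=5\)), this bound evaluates to \(0.2\):

L, gamma, n = 1., .5, 5
print(gamma / (n * (1 - L * gamma)))  # 0.2, the theoretical value in the example below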

References: The detailed setup and results are available in [1]. The PEP approach for studying such settings is presented in [2].

[1] J. Bolte, S. Sabach, M. Teboulle, Y. Vaisbourd (2018). First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3), 2131-2151.

[2] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

DISCLAIMER: This example requires some experience with PEPit and PEPs (see Section 4 in [2]).

Parameters
  • L (float) – relative-smoothness parameter.

  • gamma (float) – step-size (equal to 1/(2*L) for guarantee).

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / (2 * L)
>>> pepit_tau, theoretical_tau = wc_no_lips_1(L=L, gamma=gamma, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 20x20
(PEPit) Setting up the problem: performance measure is minimum of 5 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
                 function 2 : Adding 30 scalar constraint(s) ...
                 function 2 : 30 scalar constraint(s) added
                 function 3 : Adding 49 scalar constraint(s) ...
                 function 3 : 49 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.20000306821054706
*** Example file: worst-case performance of the NoLips in Bregman divergence ***
        PEPit guarantee:         min_t Dh(x_(t+1), x_(t)) <= 0.200003 (F(x_0) - F(x_n))
        Theoretical guarantee :  min_t Dh(x_(t+1), x_(t)) <= 0.2 (F(x_0) - F(x_n))

No Lips 2

PEPit.examples.nonconvex_optimization.wc_no_lips_2(L, gamma, n, verbose=1)[source]

Consider the constrained non-convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]

where \(f_2\) is a closed convex indicator function and \(f_1\) is possibly non-convex and \(L\)-smooth relative to \(h\), and where \(h\) is closed, proper, and convex.

This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n,L,\gamma)\) such that the guarantee

\[\min_{0 \leqslant t \leqslant n-1} D_h(x_t;x_{t+1}) \leqslant \tau(n, L, \gamma) (F(x_0) - F(x_n))\]

is valid, where \(x_n\) is the output of the NoLips method, and where \(D_h\) is the Bregman distance generated by \(h\):

\[D_h(x; y) \triangleq h(x) - h(y) - \nabla h (y)^T(x - y).\]

In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n-1}D_h(x_t;x_{t+1})\) when \(F(x_0) - F(x_n) \leqslant 1\).

Algorithm: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [1, Section 3]. For \(t \in \{0, \dots, n-1\}\),

\[x_{t+1} = \arg\min_{u \in \mathbb{R}^d} \nabla f(x_t)^T(u - x_t) + \frac{1}{\gamma} D_h(u; x_t).\]

Theoretical guarantee: An empirically tight worst-case guarantee is

\[\min_{0 \leqslant t \leqslant n-1}D_h(x_t;x_{t+1}) \leqslant \frac{\gamma}{n}(F(x_0) - F(x_n)).\]

References: The detailed setup is presented in [1]. The PEP approach for studying such settings is presented in [2].

[1] J. Bolte, S. Sabach, M. Teboulle, Y. Vaisbourd (2018). First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3), 2131-2151.

[2] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

DISCLAIMER: This example requires some experience with PEPit and PEPs (see Section 4 in [2]).

Parameters
  • L (float) – relative-smoothness parameter.

  • gamma (float) – step-size (equal to \(\frac{1}{L}\) for guarantee).

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_no_lips_2(L=L, gamma=gamma, n=3, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 14x14
(PEPit) Setting up the problem: performance measure is minimum of 3 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 12 scalar constraint(s) ...
                 function 1 : 12 scalar constraint(s) added
                 function 2 : Adding 12 scalar constraint(s) ...
                 function 2 : 12 scalar constraint(s) added
                 function 3 : Adding 25 scalar constraint(s) ...
                 function 3 : 25 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.33333185324089176
*** Example file: worst-case performance of the NoLips_2 in Bregman distance ***
        PEPit guarantee:         min_t Dh(x_(t-1), x_(t)) <= 0.333332 (F(x_0) - F(x_n))
        Theoretical guarantee:   min_t Dh(x_(t-1), x_(t)) <= 0.333333 (F(x_0) - F(x_n))

Stochastic and randomized convex minimization

Stochastic gradient descent

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_sgd(L, mu, gamma, v, R, n, verbose=1)[source]

Consider the finite sum minimization problem

\[F_\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]

where \(f_1, ..., f_n\) are \(L\)-smooth and \(\mu\)-strongly convex. In addition, we assume a bounded variance at the optimal point (which is denoted by \(x_\star\)):

\[\mathbb{E}\left[\|\nabla f_i(x_\star)\|^2\right] = \frac{1}{n} \sum_{i=1}^n\|\nabla f_i(x_\star)\|^2 \leqslant v^2.\]

This code computes a worst-case guarantee for one step of the stochastic gradient descent (SGD) in expectation, for the distance to an optimal point. That is, it computes the smallest possible \(\tau(L, \mu, \gamma, v, R, n)\) such that

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \tau(L, \mu, \gamma, v, R, n)\]

where \(\|x_0 - x_\star\|^2 \leqslant R^2\), where \(v\) is the variance at \(x_\star\), and where \(x_1\) is the output of one step of SGD (note that we use the notation \(x_0,x_1\) to denote two consecutive iterates for convenience; as the bound is valid for all \(x_0\), it is also valid for any pair of consecutive iterates of the algorithm).

Algorithm: One iteration of SGD is described by:

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, n|]\right), \\ x_{t+1} & = & x_t - \gamma \nabla f_{i}(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size.

Theoretical guarantee: An empirically tight one-iteration guarantee is provided in the code of PESTO [1]:

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \frac{1}{2}\left(1-\frac{\mu}{L}\right)^2 R^2 + \frac{1}{2}\left(1-\frac{\mu}{L}\right) R \sqrt{\left(1-\frac{\mu}{L}\right)^2 R^2 + 4\frac{v^2}{L^2}} + \frac{v^2}{L^2},\]

when \(\gamma=\frac{1}{L}\). Note that we observe that the guarantee does not depend on the number \(n\) of functions for this particular setting, thereby implying that it is also valid for expectation minimization settings (i.e., when \(n\) goes to infinity).
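As an illustration, this bound can be evaluated numerically for the parameters used in the example below (\(L=1\), \(\mu=0.1\), \(v=1\), \(R=2\)); it gives approximately 5.0417, matching the values reported there.

from math import sqrt

L, mu, v, R = 1., .1, 1., 2.
rho = 1 - mu / L
bound = .5 * rho ** 2 * R ** 2 \
        + .5 * rho * R * sqrt(rho ** 2 * R ** 2 + 4 * v ** 2 / L ** 2) \
        + v ** 2 / L ** 2
print(bound)  # ~5.0417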

References: Empirically tight guarantee provided in code of [1]. Using SDPs for analyzing SGD-type method was proposed in [2, 3].

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).

[2] B. Hu, P. Seiler, L. Lessard (2020). Analysis of biased stochastic gradient descent using sequential semidefinite programs. Mathematical programming (to appear).

[3] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – the step-size.

  • v (float) – the variance bound.

  • R (float) – the initial distance.

  • n (int) – number of functions.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> mu = 0.1
>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_sgd(L=L, mu=mu, gamma=gamma, v=1, R=2, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 11x11
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 5 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
                 function 2 : Adding 2 scalar constraint(s) ...
                 function 2 : 2 scalar constraint(s) added
                 function 3 : Adding 2 scalar constraint(s) ...
                 function 3 : 2 scalar constraint(s) added
                 function 4 : Adding 2 scalar constraint(s) ...
                 function 4 : 2 scalar constraint(s) added
                 function 5 : Adding 2 scalar constraint(s) ...
                 function 5 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 5.041652328250217
*** Example file: worst-case performance of stochastic gradient descent with fixed step-size ***
        PEPit guarantee:         E[||x_1 - x_*||^2] <= 5.04165 ||x0 - x_*||^2
        Theoretical guarantee:   E[||x_1 - x_*||^2] <= 5.04165 ||x0 - x_*||^2

Stochastic gradient descent in overparametrized setting

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_sgd_overparametrized(L, mu, gamma, n, verbose=1)[source]

Consider the finite sum minimization problem

\[F_\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]

where \(f_1, ..., f_n\) are \(L\)-smooth and \(\mu\)-strongly convex. In addition, we assume a zero variance at the optimal point (which is denoted by \(x_\star\)):

\[\mathbb{E}\left[\|\nabla f_i(x_\star)\|^2\right] = \frac{1}{n} \sum_{i=1}^n \|\nabla f_i(x_\star)\|^2 = 0,\]

which happens, for example, in machine learning in the interpolation regime, that is, when there exists a model \(x_\star\) such that the loss \(\mathcal{L}\) is zero on every observation: \(\mathcal{L}(x_\star, z_i) = f_i(x_\star) = 0\) for all \(i \in [|1, n|]\).

This code computes a worst-case guarantee for one step of the stochastic gradient descent (SGD) in expectation, for the distance to optimal point. That is, it computes the smallest possible \(\tau(L, \mu, \gamma, n)\) such that

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \tau(L, \mu, \gamma, n) \|x_0 - x_\star\|^2\]

is valid, where \(x_1\) is the output of one step of SGD.

Algorithm: One iteration of SGD is described by:

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, n|]\right), \\ x_{t+1} & = & x_t - \gamma \nabla f_{i}(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size.

Theoretical guarantee: An empirically tight one-iteration guarantee is provided in the code of PESTO [1]:

\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \left(1-\frac{\mu}{L}\right)^2 \|x_0-x_\star\|^2,\]

when \(\gamma=\frac{1}{L}\). Note that we observe that the guarantee does not depend on the number \(n\) of functions for this particular setting, thereby implying that it is also valid for expectation minimization settings (i.e., when \(n\) goes to infinity).

References: Empirically tight guarantee provided in code of [1]. Using SDPs for analyzing SGD-type method was proposed in [2, 3].

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).

[2] B. Hu, P. Seiler, L. Lessard (2020). Analysis of biased stochastic gradient descent using sequential semidefinite programs. Mathematical programming (to appear).

[3] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – the step-size.

  • n (int) – number of functions.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> mu = 0.1
>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_sgd_overparametrized(L=L, mu=mu, gamma=gamma, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 11x11
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 5 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
                 function 2 : Adding 2 scalar constraint(s) ...
                 function 2 : 2 scalar constraint(s) added
                 function 3 : Adding 2 scalar constraint(s) ...
                 function 3 : 2 scalar constraint(s) added
                 function 4 : Adding 2 scalar constraint(s) ...
                 function 4 : 2 scalar constraint(s) added
                 function 5 : Adding 2 scalar constraint(s) ...
                 function 5 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8099999999798264
*** Example file: worst-case performance of stochastic gradient descent with fixed step-size and with zero variance at the optimal point ***
        PEPit guarantee:         E[||x_1 - x_*||^2] <= 0.81 ||x0 - x_*||^2
        Theoretical guarantee:   E[||x_1 - x_*||^2] <= 0.81 ||x0 - x_*||^2

SAGA

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_saga(L, mu, n, verbose=1)[source]

Consider the finite sum convex minimization problem

\[F_\star \triangleq \min_x \left\{F(x) \equiv h(x) + \frac{1}{n} \sum_{i=1}^{n} f_i(x)\right\},\]

where the functions \(f_i\) are assumed to be \(L\)-smooth and \(\mu\)-strongly convex, and \(h\) is closed, proper, and convex with a proximal operator readily available.

This code computes the exact rate for a Lyapunov (or energy) function for SAGA [1]. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that this Lyapunov function decreases geometrically:

\[\mathbb{E}[V^{(1)}] \leqslant \tau(n, L, \mu) V^{(0)},\]

where the value of the Lyapunov function at iteration \(t\) is denoted by \(V^{(t)}\) and is defined as

\[V^{(t)} \triangleq \frac{1}{n} \sum_{i=1}^n \left(f_i(\phi_i^{(t)}) - f_i(x^\star) - \langle \nabla f_i(x^\star); \phi_i^{(t)} - x^\star\rangle\right) + \frac{1}{2 n \gamma (1-\mu \gamma)} \|x^{(t)} - x^\star\|^2,\]

with \(\gamma = \frac{1}{2(\mu n+L)}\) (this Lyapunov function was proposed in [1, Theorem 1]). We consider the case \(t=0\) in the code below, without loss of generality.

In short, for given values of \(n\), \(L\), and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\mathbb{E}[V^{(1)}]\) when \(V^{(0)} \leqslant 1\).

Algorithm: One iteration of SAGA [1] is described as follows: at iteration \(t\), pick \(j\in\{1,\ldots,n\}\) uniformly at random and set:

\begin{eqnarray} \phi_j^{(t+1)} & = & x^{(t)} \\ w^{(t+1)} & = & x^{(t)} - \gamma \left[ \nabla f_j (\phi_j^{(t+1)}) - \nabla f_j(\phi_j^{(t)}) + \frac{1}{n} \sum_{i=1}^n(\nabla f_i(\phi^{(t)}))\right] \\ x^{(t+1)} & = & \mathrm{prox}_{\gamma h} (w^{(t+1)})\triangleq \arg\min_x \left\{ \gamma h(x)+\frac{1}{2}\|x-w^{(t+1)}\|^2\right\} \end{eqnarray}
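For illustration only (outside of PEPit), one SAGA iteration can be sketched as follows, with the gradients of the \(f_i\) and the proximal operator of \(h\) provided as callables; the table average is recomputed naively for readability, and the toy instance at the end is an arbitrary choice.

import numpy as np

def saga_step(x, phi, grad, prox_h, gamma, rng):
    # phi: list of stored points phi_i; grad(i, y): gradient of f_i at y;
    # prox_h(z, g): proximal operator of g * h evaluated at z.
    n = len(phi)
    j = int(rng.integers(n))
    old_grad_j = grad(j, phi[j])
    avg_grad = sum(grad(i, phi[i]) for i in range(n)) / n
    phi[j] = x.copy()                        # phi_j^{(t+1)} = x^{(t)}
    w = x - gamma * (grad(j, phi[j]) - old_grad_j + avg_grad)
    return prox_h(w, gamma), phi

# Toy instance: f_i(x) = .5 * ||x - c_i||^2 (so L = mu = 1 here), h = 0.
rng = np.random.default_rng(0)
c = rng.standard_normal((5, 3))
grad = lambda i, y: y - c[i]
prox_h = lambda z, g: z
gamma = 1 / (2 * (1 * 5 + 1))                # gamma = 1/(2(mu n + L)) as above
x, phi = np.zeros(3), [np.zeros(3) for _ in range(5)]
for _ in range(20):
    x, phi = saga_step(x, phi, grad, prox_h, gamma, rng)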

Theoretical guarantee: The following upper bound (empirically tight) can be found in [1, Theorem 1]:

\[\mathbb{E}[V^{(t+1)}] \leqslant \left(1-\gamma\mu \right)V^{(t)}\]

References:

[1] A. Defazio, F. Bach, S. Lacoste-Julien (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems (NIPS).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of functions.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_saga(L=1, mu=.1, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 27x27
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 6 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
                 function 2 : Adding 6 scalar constraint(s) ...
                 function 2 : 6 scalar constraint(s) added
                 function 3 : Adding 6 scalar constraint(s) ...
                 function 3 : 6 scalar constraint(s) added
                 function 4 : Adding 6 scalar constraint(s) ...
                 function 4 : 6 scalar constraint(s) added
                 function 5 : Adding 6 scalar constraint(s) ...
                 function 5 : 6 scalar constraint(s) added
                 function 6 : Adding 6 scalar constraint(s) ...
                 function 6 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9666748513396348
*** Example file: worst-case performance of SAGA for Lyapunov function V_t ***
        PEPit guarantee:         V^(1) <= 0.966675 V^(0)
        Theoretical guarantee:   V^(1) <= 0.966667 V^(0)

Point SAGA

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_point_saga(L, mu, n, verbose=1)[source]

Consider the finite sum minimization problem

\[F^\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]

where \(f_1, \dots, f_n\) are \(L\)-smooth and \(\mu\)-strongly convex, with proximal operators readily available.

This code computes a tight (one-step) worst-case guarantee using a Lyapunov function for Point SAGA [1]. The Lyapunov (or energy) function at a point \(x\) is given in [1, Theorem 5]:

\[V(x) = \frac{1}{L \mu}\frac{1}{n} \sum_{i \leq n} \|\nabla f_i(x) - \nabla f_i(x_\star)\|^2 + \|x - x^\star\|^2,\]

where \(x^\star\) denotes the minimizer of \(F\). The code computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee (in expectation):

\[\mathbb{E}\left[V\left(x^{(1)}\right)\right] \leqslant \tau(n, L, \mu) V\left(x^{(0)}\right),\]

is valid (note that we use the notation \(x^{(0)},x^{(1)}\) to denote two consecutive iterates for convenience; as the bound is valid for all \(x^{(0)}\), it is also valid for any pair of consecutive iterates of the algorithm).

In short, for given values of \(n\), \(L\), and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\mathbb{E}\left[V\left(x^{(1)}\right)\right]\) when \(V\left(x^{(0)}\right) \leqslant 1\).

Algorithm: Point SAGA is described by

\[\begin{split}\begin{eqnarray} \text{Set }\gamma & = & \frac{\sqrt{(n - 1)^2 + 4n\frac{L}{\mu}}}{2Ln} - \frac{\left(1 - \frac{1}{n}\right)}{2L} \\ \text{Pick random }j & \sim & \mathcal{U}\left([|1, n|]\right) \\ z^{(t)} & = & x_t + \gamma \left(g_j^{(t)} - \frac{1}{n} \sum_{i\leq n}g_i^{(t)} \right), \\ x^{(t+1)} & = & \mathrm{prox}_{\gamma f_j}(z^{(t)})\triangleq \arg\min_x\left\{ \gamma f_j(x)+\frac{1}{2} \|x-z^{(t)}\|^2 \right\}, \\ g_j^{(t+1)} & = & \frac{1}{\gamma}(z^{(t)} - x^{(t+1)}). \end{eqnarray}\end{split}\]

Theoretical guarantee: A theoretical upper bound is given in [1, Theorem 5].

\[\mathbb{E}\left[V\left(x^{(t+1)}\right)\right] \leqslant \frac{1}{1 + \mu\gamma} V\left(x^{(t)}\right)\]
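With the parameters used in the example below (\(L=1\), \(\mu=0.01\), \(n=10\)), this bound can be evaluated directly:

from math import sqrt

L, mu, n = 1., .01, 10
gamma = sqrt((n - 1) ** 2 + 4 * n * L / mu) / (2 * L * n) - (1 - 1 / n) / (2 * L)
print(1 / (1 + mu * gamma))  # ~0.973292, the theoretical value in the example below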

References:

[1] A. Defazio (2016). A simple practical accelerated method for finite sums. Advances in Neural Information Processing Systems (NIPS), 29, 676-684.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of functions.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_point_saga(L=1, mu=.01, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 31x31
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 10 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
                 function 2 : Adding 2 scalar constraint(s) ...
                 function 2 : 2 scalar constraint(s) added
                 function 3 : Adding 2 scalar constraint(s) ...
                 function 3 : 2 scalar constraint(s) added
                 function 4 : Adding 2 scalar constraint(s) ...
                 function 4 : 2 scalar constraint(s) added
                 function 5 : Adding 2 scalar constraint(s) ...
                 function 5 : 2 scalar constraint(s) added
                 function 6 : Adding 2 scalar constraint(s) ...
                 function 6 : 2 scalar constraint(s) added
                 function 7 : Adding 2 scalar constraint(s) ...
                 function 7 : 2 scalar constraint(s) added
                 function 8 : Adding 2 scalar constraint(s) ...
                 function 8 : 2 scalar constraint(s) added
                 function 9 : Adding 2 scalar constraint(s) ...
                 function 9 : 2 scalar constraint(s) added
                 function 10 : Adding 2 scalar constraint(s) ...
                 function 10 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9714053941143999
*** Example file: worst-case performance of Point SAGA for a given Lyapunov function ***
        PEPit guarantee:         E[V(x^(1))] <= 0.971405 V(x^(0))
        Theoretical guarantee:   E[V(x^(1))] <= 0.973292 V(x^(0))

Randomized coordinate descent for smooth strongly convex functions

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_randomized_coordinate_descent_smooth_strongly_convex(L, mu, gamma, d, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for randomized block-coordinate descent with step-size \(\gamma\). That is, it computes the smallest possible \(\tau(L, \mu, \gamma, d)\) such that the guarantee

\[\mathbb{E}_i[\|x_{t+1}^{(i)} - x_\star \|^2] \leqslant \tau(L, \mu, \gamma, d) \|x_{t} - x_\star\|^2\]

is valid, where \(x_{t+1}^{(i)}\) denotes the value of the iterate \(x_{t+1}\) in the scenario where the \(i\)-th block of coordinates is selected for the update with fixed step-size \(\gamma\), where \(d\) is the number of blocks of coordinates, and where \(x_\star\) is a minimizer of \(f\).

In short, for given values of \(\mu\), \(L\), \(d\), and \(\gamma\), \(\tau(L, \mu, \gamma, d)\) is computed as the worst-case value of \(\mathbb{E}_i[\|x_{t+1}^{(i)} - x_\star \|^2]\) when \(\|x_t - x_\star\|^2 \leqslant 1\).

Algorithm: Randomized block-coordinate descent is described by

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, d|]\right), \\ x_{t+1}^{(i)} & = & x_t - \gamma \nabla_i f(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size and \(\nabla_i f(x_t)\) is the partial derivative corresponding to the block \(i\).

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Appendix I, Theorem 17]:

\[\mathbb{E}_i[\|x_{t+1}^{(i)} - x_\star \|^2] \leqslant \rho^2 \|x_t-x_\star\|^2,\]

where \(\rho^2 = \max \left( \frac{(\gamma\mu - 1)^2 + d - 1}{d},\frac{(\gamma L - 1)^2 + d - 1}{d} \right)\).
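With the parameters used in the example below (\(L=1\), \(\mu=0.1\), \(\gamma = 2/(\mu+L)\), \(d=2\)), this factor can be evaluated directly:

L, mu, d = 1., .1, 2
gamma = 2 / (mu + L)
rho2 = max(((gamma * mu - 1) ** 2 + d - 1) / d, ((gamma * L - 1) ** 2 + d - 1) / d)
print(rho2)  # ~0.834711, matching the example below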

References:

[1] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong-convexity parameter.

  • gamma (float) – the step-size.

  • d (int) – the dimension.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> mu = 0.1
>>> gamma = 2 / (mu + L)
>>> pepit_tau, theoretical_tau = wc_randomized_coordinate_descent_smooth_strongly_convex(L=L, mu=mu, gamma=gamma, d=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (3 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8347107377149059
*** Example file: worst-case performance of randomized coordinate gradient descent ***
        PEPit guarantee:         E||x_(n+1) - x_*||^2 <= 0.834711 ||x_n - x_*||^2
        Theoretical guarantee:   E||x_(n+1) - x_*||^2 <= 0.834711 ||x_n - x_*||^2

Randomized coordinate descent for smooth convex functions

PEPit.examples.stochastic_and_randomized_convex_minimization.wc_randomized_coordinate_descent_smooth_convex(L, gamma, d, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is convex and \(L\)-smooth.

This code computes a worst-case guarantee for randomized block-coordinate descent with fixed step-size \(\gamma\). That is, it verifies that the inequality holds (the expectation is over the index of the block of coordinates that is randomly selected)

\[\mathbb{E}_i[\phi(x_{t+1}^{(i)})] \leqslant \phi(x_{t}),\]

where \(x_{t+1}^{(i)}\) denotes the value of the iterate \(x_{t+1}\) in the scenario where the \(i\)-th block of coordinates is selected for the update with fixed step-size \(\gamma\), and \(d\) is the number of blocks of coordinates.

In short, for given values of \(L\), \(d\), and \(\gamma\), it computes the worst-case value of \(\mathbb{E}_i[\phi(x_{t+1}^{(i)})]\) such that \(\phi(x_{t}) \leqslant 1\).

Algorithm: Randomized block-coordinate descent is described by

\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, d|]\right), \\ x_{t+1}^{(i)} & = & x_t - \gamma \nabla_i f(x_t), \end{eqnarray}\end{split}\]

where \(\gamma\) is a step-size and \(\nabla_i f(x_t)\) is the partial derivative corresponding to the block \(i\).

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Appendix I, Theorem 16]:

\[\mathbb{E}_i[\phi(x^{(i)}_{t+1})] \leqslant \phi(x_{t}),\]

where \(\phi(x_t) = d_t (f(x_t) - f_\star) + \frac{L}{2} \|x_t - x_\star\|^2\), \(d_{t+1} = d_t + \frac{\gamma L}{d}\), and \(d_t \geqslant 1\).

References:

[1] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. In Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – the step-size.

  • d (int) – the dimension.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_randomized_coordinate_descent_smooth_convex(L=L, gamma=1 / L, d=2, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 12x12
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (9 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 42 scalar constraint(s) ...
                 function 1 : 42 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9999978377393944
*** Example file: worst-case performance of randomized  coordinate gradient descent ***
        PEPit guarantee:         E[phi_(n+1)(x_(n+1))] <= 0.999998 phi_n(x_n)
        Theoretical guarantee:   E[phi_(n+1)(x_(n+1))] <= 1.0 phi_n(x_n)

Monotone inclusions and variational inequalities

Proximal point

PEPit.examples.monotone_inclusions_variational_inequalities.wc_proximal_point(alpha, n, verbose=1)[source]

Consider the monotone inclusion problem

\[\mathrm{Find}\, x:\, 0\in Ax,\]

where \(A\) is maximally monotone. We denote by \(J_A = (I + A)^{-1}\) the resolvent of \(A\).

This code computes a worst-case guarantee for the proximal point method. That is, it computes the smallest possible \(\tau(n, \alpha)\) such that the guarantee

\[\|x_n - x_{n-1}\|^2 \leqslant \tau(n, \alpha) \|x_0 - x_\star\|^2,\]

is valid, where \(x_\star\) is such that \(0 \in Ax_\star\).

Algorithm: The proximal point algorithm for monotone inclusions is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\[x_{t+1} = J_{\alpha A}(x_t),\]

where \(\alpha\) is a step-size.

Theoretical guarantee: A tight theoretical guarantee can be found in [1, section 4].

\[\|x_n - x_{n-1}\|^2 \leqslant \frac{\left(1 - \frac{1}{n}\right)^{n - 1}}{n} \|x_0 - x_\star\|^2.\]
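With \(n=10\), this bound evaluates to \((0.9)^9/10 \approx 0.038742\), matching the value reported in the example below:

n = 10
print((1 - 1 / n) ** (n - 1) / n)  # ~0.038742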

Reference:

[1] G. Gu, J. Yang (2020). Tight sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problem. SIAM Journal on Optimization, 30(3), 1905-1921.

Parameters
  • alpha (float) – the step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_proximal_point(alpha=2, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 12x12
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 110 scalar constraint(s) ...
                 function 1 : 110 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.03874199421010509
*** Example file: worst-case performance of the Proximal Point Method***
        PEPit guarantee:         ||x(n) - x(n-1)||^2 <= 0.038742 ||x0 - xs||^2
        Theoretical guarantee:   ||x(n) - x(n-1)||^2 <= 0.038742 ||x0 - xs||^2

Accelerated proximal point

PEPit.examples.monotone_inclusions_variational_inequalities.wc_accelerated_proximal_point(alpha, n, verbose=1)[source]

Consider the monotone inclusion problem

\[\mathrm{Find}\, x:\, 0\in Ax,\]

where \(A\) is maximally monotone. We denote by \(J_A = (I + A)^{-1}\) the resolvent of \(A\).

This code computes a worst-case guarantee for the accelerated proximal point method proposed in [1]. That is, it computes the smallest possible \(\tau(n, \alpha)\) such that the guarantee

\[\|x_n - y_n\|^2 \leqslant \tau(n, \alpha) \|x_0 - x_\star\|^2,\]

is valid, where \(x_\star\) is such that \(0 \in Ax_\star\).

Algorithm: Accelerated proximal point is described as follows, for \(t \in \{ 0, \dots, n-1\}\)

\[\begin{split}\begin{eqnarray} x_{t+1} & = & J_{\alpha A}(y_t), \\ y_{t+1} & = & x_{t+1} + \frac{t}{t+2}(x_{t+1} - x_{t}) - \frac{t}{t+1}(x_t - y_{t-1}), \end{eqnarray}\end{split}\]

where \(x_0=y_0=y_{-1}\).

Theoretical guarantee: A tight theoretical worst-case guarantee can be found in [1, Theorem 4.1], for \(n \geqslant 1\),

\[\|x_n - y_{n-1}\|^2 \leqslant \frac{1}{n^2} \|x_0 - x_\star\|^2.\]

Reference:

[1] D. Kim (2021). Accelerated proximal point method for maximally monotone operators. Mathematical Programming, 1-31.

Parameters
  • alpha (float) – the step-size

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_point(alpha=2, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 12x12
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 110 scalar constraint(s) ...
                 function 1 : 110 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.010000353550061647
*** Example file: worst-case performance of the Accelerated Proximal Point Method***
        PEPit guarantee:         ||x_n - y_n||^2 <= 0.0100004 ||x_0 - x_s||^2
        Theoretical guarantee:   ||x_n - y_n||^2 <= 0.01 ||x_0 - x_s||^2

Optimal Strongly-monotone Proximal Point

PEPit.examples.monotone_inclusions_variational_inequalities.wc_optimal_strongly_monotone_proximal_point(n, mu, verbose=1)[source]

Consider the monotone inclusion problem

\[\mathrm{Find}\, x:\, 0\in Ax,\]

where \(A\) is maximally \(\mu\)-strongly monotone. We denote by \(J_{A}\) the resolvent of \(A\).

For any \(x\) such that \(x = J_{A} y\) for some \(y\), define the resolvent residual \(\tilde{A}x = y - J_{A}y \in Ax\).

This code computes a worst-case guarantee for the Optimal Strongly-monotone Proximal Point Method (OS-PPM). That is, it computes the smallest possible \(\tau(n, \mu)\) such that the guarantee

\[\|\tilde{A}x_n\|^2 \leqslant \tau(n, \mu) \|x_0 - x_\star\|^2,\]

is valid, where \(x_n\) is the output of the Optimal Strongly-monotone Proximal Point Method, and \(x_\star\) is a zero of \(A\). In short, for a given value of \(n, \mu\), \(\tau(n, \mu)\) is computed as the worst-case value of \(\|\tilde{A}x_n\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: The Optimal Strongly-monotone Proximal Point Method can be written as

\begin{eqnarray} x_{t+1} & = & J_{A} y_t,\\ y_{t+1} & = & x_{t+1} + \frac{\varphi_{t} - 1}{\varphi_{t+1}} (x_{t+1} - x_t) - \frac{2 \mu \varphi_{t}}{\varphi_{t+1}} (y_t - x_{t+1}) \\ & & + \frac{(1+2\mu) \varphi_{t-1}}{\varphi_{t+1}} (y_{t-1} - x_t). \end{eqnarray}

where \(\varphi_k = \sum_{i=0}^k (1+2\mu)^{2i}\) with \(\varphi_{-1}=0\) and \(x_0 = y_0 = y_{-1}\) is a starting point.

This method is equivalent to the Optimal Contractive Halpern iteration.

Theoretical guarantee: A tight worst-case guarantee for the Optimal Strongly-monotone Proximal Point Method can be found in [1, Theorem 3.2, Corollary 4.2]:

\[\|\tilde{A}x_n\|^2 \leqslant \left( \frac{1}{\sum_{k=0}^{n-1} (1+2\mu)^k} \right)^2 \|x_0 - x_\star\|^2.\]

References: The detailed approach and tight bound are available in [1].

[1] J. Park, E. Ryu (2022). Exact Optimal Accelerated Complexity for Fixed-Point Iterations. In 39th International Conference on Machine Learning (ICML).

Parameters
  • n (int) – number of iterations.

  • mu (float) – \(\mu \ge 0\). \(A\) will be maximally \(\mu\)-strongly monotone.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_optimal_strongly_monotone_proximal_point(n=10, mu=0.05, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 12x12
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 110 scalar constraint(s) ...
                 function 1 : 110 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.003937868091430488
*** Example file: worst-case performance of Optimal Strongly-monotone Proximal Point Method ***
        PEPit guarantee:         ||AxN||^2 <= 0.00393787 ||x0 - x_*||^2
        Theoretical guarantee:   ||AxN||^2 <= 0.00393698 ||x0 - x_*||^2

Douglas Rachford Splitting

PEPit.examples.monotone_inclusions_variational_inequalities.wc_douglas_rachford_splitting(L, mu, alpha, theta, verbose=1)[source]

Consider the monotone inclusion problem

\[\mathrm{Find}\, x:\, 0\in Ax + Bx,\]

where \(A\) is \(L\)-Lipschitz and maximally monotone and \(B\) is (maximally) \(\mu\)-strongly monotone. We denote by \(J_{\alpha A}\) and \(J_{\alpha B}\) the resolvents of \(A\) and \(B\), respectively, with step-size \(\alpha\).

This code computes a worst-case guarantee for the Douglas-Rachford splitting (DRS). That is, given two initial points \(w^{(0)}_t\) and \(w^{(1)}_t\), this code computes the smallest possible \(\tau(L, \mu, \alpha, \theta)\) (a.k.a. “contraction factor”) such that the guarantee

\[\|w^{(0)}_{t+1} - w^{(1)}_{t+1}\|^2 \leqslant \tau(L, \mu, \alpha, \theta) \|w^{(0)}_{t} - w^{(1)}_{t}\|^2,\]

is valid, where \(w^{(0)}_{t+1}\) and \(w^{(1)}_{t+1}\) are obtained after one iteration of DRS from respectively \(w^{(0)}_{t}\) and \(w^{(1)}_{t}\).

In short, for given values of \(L\), \(\mu\), \(\alpha\) and \(\theta\), the contraction factor \(\tau(L, \mu, \alpha, \theta)\) is computed as the worst-case value of \(\|w^{(0)}_{t+1} - w^{(1)}_{t+1}\|^2\) when \(\|w^{(0)}_{t} - w^{(1)}_{t}\|^2 \leqslant 1\).

Algorithm: One iteration of the Douglas-Rachford splitting is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\begin{eqnarray} x_{t+1} & = & J_{\alpha B} (w_t),\\ y_{t+1} & = & J_{\alpha A} (2x_{t+1}-w_t),\\ w_{t+1} & = & w_t - \theta (x_{t+1}-y_{t+1}). \end{eqnarray}
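For intuition, one DRS iteration can be transcribed in a few lines of plain Python. The one-dimensional linear operators below (chosen so that their resolvents have a closed form) are assumptions made purely for this sketch; they are not part of the PEPit example file.

# Toy 1D instance: A(x) = a*x is a-Lipschitz and monotone, B(x) = b*x is b-strongly monotone.
a, b = 1., .1
alpha, theta = 1.3, .9
JA = lambda z: z / (1 + alpha * a)  # resolvent of alpha*A in the linear case
JB = lambda z: z / (1 + alpha * b)  # resolvent of alpha*B in the linear case

def drs_step(w):
    x = JB(w)
    y = JA(2 * x - w)
    return w - theta * (x - y)

w0, w1 = 1., -1.
ratio = (drs_step(w0) - drs_step(w1)) ** 2 / (w0 - w1) ** 2
print(ratio)  # contraction observed on this particular instance; it lies below the worst-case pepit_tau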

Theoretical guarantee: Theoretical worst-case guarantees can be found in [1, Section 4, Theorem 4.3]. Since the results of [2] tighten those of [1], we compare with [2, Theorem 4.3] below. The theoretical expressions are involved, so we do not reproduce them here.

References: The detailed PEP methodology for studying operator splitting is provided in [2].

[1] W. Moursi, L. Vandenberghe (2019). Douglas–Rachford Splitting for the Sum of a Lipschitz Continuous and a Strongly Monotone Operator. Journal of Optimization Theory and Applications 183, 179–198.

[2] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.

Parameters
  • L (float) – the Lipschitz parameter.

  • mu (float) – the strongly monotone parameter.

  • alpha (float) – the step-size in the resolvent.

  • theta (float) – algorithm parameter.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_douglas_rachford_splitting(L=1, mu=.1, alpha=1.3, theta=.9, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 4 scalar constraint(s) ...
                 function 1 : 4 scalar constraint(s) added
                 function 2 : Adding 2 scalar constraint(s) ...
                 function 2 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.928770693164459
*** Example file: worst-case performance of the Douglas Rachford Splitting***
        PEPit guarantee:         ||w_(t+1)^0 - w_(t+1)^1||^2 <= 0.928771 ||w_(t)^0 - w_(t)^1||^2
        Theoretical guarantee:   ||w_(t+1)^0 - w_(t+1)^1||^2 <= 0.928771 ||w_(t)^0 - w_(t)^1||^2

Three operator splitting

PEPit.examples.monotone_inclusions_variational_inequalities.wc_three_operator_splitting(L, mu, beta, alpha, theta, verbose=1)[source]

Consider the monotone inclusion problem

\[\mathrm{Find}\, x:\, 0\in Ax + Bx + Cx,\]

where \(A\) is maximally monotone, \(B\) is \(\beta\)-cocoercive and \(C\) is the gradient of some \(L\)-smooth \(\mu\)-strongly convex function. We denote by \(J_{\alpha A}\) and \(J_{\alpha B}\) the resolvents of \(A\) and \(B\), respectively, with step-size \(\alpha\).

This code computes a worst-case guarantee for the three operator splitting (TOS). That is, given two initial points \(w^{(0)}_t\) and \(w^{(1)}_t\), this code computes the smallest possible \(\tau(L, \mu, \beta, \alpha, \theta)\) (a.k.a. “contraction factor”) such that the guarantee

\[\|w^{(0)}_{t+1} - w^{(1)}_{t+1}\|^2 \leqslant \tau(L, \mu, \beta, \alpha, \theta) \|w^{(0)}_{t} - w^{(1)}_{t}\|^2,\]

is valid, where \(w^{(0)}_{t+1}\) and \(w^{(1)}_{t+1}\) are obtained after one iteration of TOS from respectively \(w^{(0)}_{t}\) and \(w^{(1)}_{t}\).

In short, for given values of \(L\), \(\mu\), \(\beta\), \(\alpha\) and \(\theta\), the contraction factor \(\tau(L, \mu, \beta, \alpha, \theta)\) is computed as the worst-case value of \(\|w^{(0)}_{t+1} - w^{(1)}_{t+1}\|^2\) when \(\|w^{(0)}_{t} - w^{(1)}_{t}\|^2 \leqslant 1\).

Algorithm: One iteration of the algorithm is described in [1]. For \(t \in \{ 0, \dots, n-1\}\),

\begin{eqnarray} x_{t+1} & = & J_{\alpha B} (w_t),\\ y_{t+1} & = & J_{\alpha A} (2x_{t+1} - w_t - C x_{t+1}),\\ w_{t+1} & = & w_t - \theta (x_{t+1} - y_{t+1}). \end{eqnarray}
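Transcribing this update verbatim on a toy one-dimensional instance gives the sketch below; the specific operators (and the example's values of \(\alpha\) and \(\theta\)) are assumptions chosen only so that every step has a closed form.

# Toy 1D instance: A(x) = x (maximally monotone), B(x) = x (1-cocoercive),
# and C(x) = c*x, the gradient of the c-smooth and c-strongly convex function c/2 * x**2.
alpha, theta, c = .9, 1.3, .5
C = lambda x: c * x
JA = lambda z: z / (1 + alpha)  # resolvent of alpha*A in the linear case
JB = lambda z: z / (1 + alpha)  # resolvent of alpha*B in the linear case

def tos_step(w):
    x = JB(w)
    y = JA(2 * x - w - C(x))
    return w - theta * (x - y)

print(abs(tos_step(1.) - tos_step(-1.)))  # distance between the two updated iterates, starting from |w0 - w1| = 2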

References: The TOS was proposed in [1], the analysis of such operator splitting methods using PEPs was proposed in [2].

[1] D. Davis, W. Yin (2017). A three-operator splitting scheme and its optimization applications. Set-valued and variational analysis, 25(4), 829-858.

[2] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.

Parameters
  • L (float) – smoothness constant of C.

  • mu (float) – strong convexity of C.

  • beta (float) – cocoercivity of B.

  • alpha (float) – step-size (in the resolvents).

  • theta (float) – overrelaxation parameter.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (None) – no theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_three_operator_splitting(L=1, mu=.1, beta=1, alpha=.9, theta=1.3, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 8x8
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 3 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
                 function 2 : Adding 2 scalar constraint(s) ...
                 function 2 : 2 scalar constraint(s) added
                 function 3 : Adding 2 scalar constraint(s) ...
                 function 3 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.7796889999218343
*** Example file: worst-case contraction factor of the Three Operator Splitting ***
        PEPit guarantee:         ||w_(t+1)^0 - w_(t+1)^1||^2 <= 0.779689 ||w_(t)^0 - w_(t)^1||^2

Optimistic gradient

PEPit.examples.monotone_inclusions_variational_inequalities.wc_optimistic_gradient(n, gamma, L, verbose=1)[source]

Consider the monotone variational inequality

\[\mathrm{Find}\, x_\star \in C\text{ such that } \left<F(x_\star);x-x_\star\right> \geqslant 0\,\,\forall x\in C,\]

where \(C\) is a closed convex set and \(F\) is maximally monotone and Lipschitz.

This code computes a worst-case guarantee for the optimistic gradient method. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee

\[\|\tilde{x}_n - \tilde{x}_{n-1}\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2,\]

is valid, where \(\tilde{x}_n\) is the output of the optimistic gradient method and \(x_0\) its starting point.

Algorithm: The optimistic gradient method is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\begin{eqnarray} \tilde{x}_{t} & = & \mathrm{Proj}_{C} [x_t-\gamma F(\tilde{x}_{t-1})], \\ {x}_{t+1} & = & \tilde{x}_t + \gamma (F(\tilde{x}_{t-1}) - F(\tilde{x}_t)). \end{eqnarray}

where \(\gamma\) is some step-size.
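A compact plain-Python sketch of this recursion, on an unconstrained toy instance (so that the projection is the identity); the operator \(F\) and the initialization of \(\tilde{x}_{-1}\) below are assumptions made for illustration only.

F = lambda x: x           # monotone and 1-Lipschitz on R, with solution x_star = 0
gamma, n = 1 / 4, 5
x, x_tilde_prev = 1., 1.  # x_0 and tilde{x}_{-1}, both initialized at the same point
for _ in range(n):
    x_tilde = x - gamma * F(x_tilde_prev)  # projection omitted: the toy problem is unconstrained
    x = x_tilde + gamma * (F(x_tilde_prev) - F(x_tilde))
    x_tilde_prev = x_tilde
print(x)  # the iterates approach the solution x_star = 0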

Theoretical guarantee: The method and many variants of it are discussed in [1] and a PEP formulation suggesting a worst-case guarantee in \(O(1/n)\) can be found in [2, Appendix D].

References:

[1] Y.-G. Hsieh, F. Iutzeler, J. Malick, P. Mertikopoulos (2019). On the convergence of single-call stochastic extra-gradient methods. Advances in Neural Information Processing Systems, 32:6938–6948.

[2] E. Gorbunov, A. Taylor, G. Gidel (2022). Last-Iterate Convergence of Optimistic Gradient Method for Monotone Variational Inequalities.

Parameters
  • n (int) – number of iterations.

  • gamma (float) – the step-size.

  • L (float) – the Lipschitz constant.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (None) – no theoretical bound.

Example

>>> pepit_tau, theoretical_tau = wc_optimistic_gradient(n=5, gamma=1 / 4, L=1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 15x15
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 49 scalar constraint(s) ...
                 function 1 : 49 scalar constraint(s) added
                 function 2 : Adding 84 scalar constraint(s) ...
                 function 2 : 84 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.06631469189357277
*** Example file: worst-case performance of the Optimistic Gradient Method***
        PEPit guarantee:         ||x(n) - x(n-1)||^2 <= 0.0663147 ||x0 - xs||^2

Past extragradient

PEPit.examples.monotone_inclusions_variational_inequalities.wc_past_extragradient(n, gamma, L, verbose=1)[source]

Consider the monotone variational inequality

\[\mathrm{Find}\, x_\star \in C\text{ such that } \left<F(x_\star);x-x_\star\right> \geqslant 0\,\,\forall x\in C,\]

where \(C\) is a closed convex set and \(F\) is maximally monotone and Lipschitz.

This code computes a worst-case guarantee for the past extragradient method. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee

\[\|x_n - x_{n-1}\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2,\]

is valid, where \(x_n\) is the output of the past extragradient method and \(x_0\) its starting point.

Algorithm: The past extragradient method is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\begin{eqnarray} \tilde{x}_{t} & = & \mathrm{Proj}_{C} [x_t-\gamma F(\tilde{x}_{t-1})], \\ {x}_{t+1} & = & \mathrm{Proj}_{C} [x_t-\gamma F(\tilde{x}_{t})]. \end{eqnarray}

where \(\gamma\) is some step-size.

Theoretical guarantee: The method and many variants of it are discussed in [1]. A worst-case guarantee in \(O(1/n)\) can be found in [2, 3].

References:

[1] Y.-G. Hsieh, F. Iutzeler, J. Malick, P. Mertikopoulos (2019). On the convergence of single-call stochastic extra-gradient methods. Advances in Neural Information Processing Systems, 32:6938–6948.

[2] E. Gorbunov, A. Taylor, G. Gidel (2022). Last-Iterate Convergence of Optimistic Gradient Method for Monotone Variational Inequalities.

[3] Y. Cai, A. Oikonomou, W. Zheng (2022). Tight Last-Iterate Convergence of the Extragradient and the Optimistic Gradient Descent-Ascent Algorithm for Constrained Monotone Variational Inequalities.

Parameters
  • n (int) – number of iterations.

  • gamma (float) – the step-size.

  • L (float) – the Lipschitz constant.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (None) – no theoretical bound.

Example

>>> pepit_tau, theoretical_tau = wc_past_extragradient(n=5, gamma=1 / 4, L=1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 20x20
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 144 scalar constraint(s) ...
                 function 1 : 144 scalar constraint(s) added
                 function 2 : Adding 84 scalar constraint(s) ...
                 function 2 : 84 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.06026126041500441
*** Example file: worst-case performance of the Past Extragradient Method***
        PEPit guarantee:         ||x(n) - x(n-1)||^2 <= 0.0602613 ||x0 - xs||^2

Fixed point

Halpern iteration

PEPit.examples.fixed_point_problems.wc_halpern_iteration(n, verbose=1)[source]

Consider the fixed point problem

\[\mathrm{Find}\, x:\, x = Ax,\]

where \(A\) is a non-expansive operator, that is, an \(L\)-Lipschitz operator with \(L=1\).

This code computes a worst-case guarantee for the Halpern Iteration. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee

\[\|x_n - Ax_n\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the Halpern iteration, and \(x_\star\) the fixed point of \(A\).

In short, for a given value of \(n\), \(\tau(n)\) is computed as the worst-case value of \(\|x_n - Ax_n\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: The Halpern iteration can be written as

\[x_{t+1} = \frac{1}{t + 2} x_0 + \left(1 - \frac{1}{t + 2}\right) Ax_t.\]

Theoretical guarantee: A tight worst-case guarantee for Halpern iteration can be found in [1, Theorem 2.1]:

\[\|x_n - Ax_n\|^2 \leqslant \left(\frac{2}{n+1}\right)^2 \|x_0 - x_\star\|^2.\]
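For instance, with \(n=25\) as in the example below, this rate is straightforward to evaluate:

n = 25
theoretical_tau = (2 / (n + 1)) ** 2
print(theoretical_tau)  # approximately 0.00591716, matching the theoretical line of the example below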

References: The detailed approach and tight bound are available in [1].

[1] F. Lieder (2021). On the convergence rate of the Halpern-iteration. Optimization Letters, 15(2), 405-418.

Parameters
  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_halpern_iteration(n=25, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 28x28
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 702 scalar constraint(s) ...
                 function 1 : 702 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.005933984368783424
*** Example file: worst-case performance of Halpern Iterations ***
        PEPit guarantee:         ||xN - AxN||^2 <= 0.00593398 ||x0 - x_*||^2
        Theoretical guarantee:   ||xN - AxN||^2 <= 0.00591716 ||x0 - x_*||^2

Optimal Contractive Halpern iteration

PEPit.examples.fixed_point_problems.wc_optimal_contractive_halpern_iteration(n, gamma, verbose=1)[source]

Consider the fixed point problem

\[\mathrm{Find}\, x:\, x = Ax,\]

where \(A\) is a \(1/\gamma\)-contractive operator, i.e., an \(L\)-Lipschitz operator with \(L=1/\gamma\).

This code computes a worst-case guarantee for the Optimal Contractive Halpern Iteration. That is, it computes the smallest possible \(\tau(n, \gamma)\) such that the guarantee

\[\|x_n - Ax_n\|^2 \leqslant \tau(n, \gamma) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the Optimal Contractive Halpern iteration, and \(x_\star\) is the fixed point of \(A\). In short, for a given value of \(n, \gamma\), \(\tau(n, \gamma)\) is computed as the worst-case value of \(\|x_n - Ax_n\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: The Optimal Contractive Halpern iteration can be written as

\[x_{t+1} = \left(1 - \frac{1}{\varphi_{t+1}} \right) Ax_t + \frac{1}{\varphi_{t+1}} x_0.\]

where \(\varphi_k = \sum_{i=0}^k \gamma^{2i}\) and \(x_0\) is a starting point.
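A direct plain-Python transcription of this recursion, run on a toy \(1/\gamma\)-contractive map; the operator and starting point below are assumptions made for illustration (the fixed point of this map is 0).

gamma, n = 1.1, 10
A = lambda x: x / gamma  # a 1/gamma-contractive map with fixed point 0
phi = lambda k: sum(gamma ** (2 * i) for i in range(k + 1))  # phi_k = sum_{i=0}^k gamma^(2i)
x0 = 1.
x = x0
for t in range(n):
    x = (1 - 1 / phi(t + 1)) * A(x) + (1 / phi(t + 1)) * x0
print((x - A(x)) ** 2)  # fixed-point residual ||x_n - A x_n||^2 on this instance, below the worst case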

Theoretical guarantee: A tight worst-case guarantee for the Optimal Contractive Halpern iteration can be found in [1, Corollary 3.3, Theorem 4.1]:

\[\|x_n - Ax_n\|^2 \leqslant \left(1 + \frac{1}{\gamma}\right)^2 \left( \frac{1}{\sum_{k=0}^n \gamma^k} \right)^2 \|x_0 - x_\star\|^2.\]

References: The detailed approach and tight bound are available in [1].

[1] J. Park, E. Ryu (2022). Exact Optimal Accelerated Complexity for Fixed-Point Iterations. In 39th International Conference on Machine Learning (ICML).

Parameters
  • n (int) – number of iterations.

  • gamma (float) – \(\gamma \ge 1\). \(A\) will be \(1/\gamma\)-contractive.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_optimal_contractive_halpern_iteration(n=10, gamma=1.1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 13x13
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 132 scalar constraint(s) ...
                 function 1 : 132 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.010613882599724987
*** Example file: worst-case performance of Optimal Contractive Halpern Iterations ***
        PEPit guarantee:         ||xN - AxN||^2 <= 0.0106139 ||x0 - x_*||^2
        Theoretical guarantee:   ||xN - AxN||^2 <= 0.0106132 ||x0 - x_*||^2

Krasnoselskii-Mann with constant step-sizes

PEPit.examples.fixed_point_problems.wc_krasnoselskii_mann_constant_step_sizes(n, gamma, verbose=1)[source]

Consider the fixed point problem

\[\mathrm{Find}\, x:\, x = Ax,\]

where \(A\) is a non-expansive operator, that is, an \(L\)-Lipschitz operator with \(L=1\).

This code computes a worst-case guarantee for the Krasnoselskii-Mann (KM) method with constant step-size. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee

\[\frac{1}{4}\|x_n - Ax_n\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the KM method, and \(x_\star\) is some fixed point of \(A\) (i.e., \(x_\star=Ax_\star\)).

Algorithm: The constant step-size KM method is described by

\[x_{t+1} = \left(1 - \gamma\right) x_{t} + \gamma Ax_{t}.\]

Theoretical guarantee: A theoretical upper bound is provided by [1, Theorem 4.9]

\[\begin{split}\tau(n) = \left\{ \begin{eqnarray} \frac{1}{n+1}\left(\frac{n}{n+1}\right)^n \frac{1}{4 \gamma (1 - \gamma)}\quad & \text{if } \frac{1}{2}\leqslant \gamma \leqslant \frac{1}{2}\left(1+\sqrt{\frac{n}{n+1}}\right) \\ (\gamma - 1)^{2n} \quad & \text{if } \frac{1}{2}\left(1+\sqrt{\frac{n}{n+1}}\right) < \gamma \leqslant 1. \end{eqnarray} \right.\end{split}\]
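With \(n=3\) and \(\gamma=3/4\) as in the example below, the first branch applies (since \(\gamma \leqslant \frac{1}{2}\left(1+\sqrt{3/4}\right) \approx 0.933\)), and the bound can be evaluated directly:

n, gamma = 3, 3 / 4
theoretical_tau = 1 / (n + 1) * (n / (n + 1)) ** n / (4 * gamma * (1 - gamma))
print(theoretical_tau)  # 0.140625, the theoretical value printed in the example below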

Reference:

[1] F. Lieder (2018). Projection Based Methods for Conic Linear Programming Optimal First Order Complexities and Norm Constrained Quasi Newton Methods. PhD thesis, HHU Düsseldorf.

Parameters
  • n (int) – number of iterations.

  • gamma (float) – step-size between 1/2 and 1

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_krasnoselskii_mann_constant_step_sizes(n=3, gamma=3 / 4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 20 scalar constraint(s) ...
                 function 1 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.14062586461718285
*** Example file: worst-case performance of Kranoselskii-Mann iterations ***
        PEPit guarantee:         1/4||xN - AxN||^2 <= 0.140626 ||x0 - x_*||^2
        Theoretical guarantee:   1/4||xN - AxN||^2 <= 0.140625 ||x0 - x_*||^2

Krasnoselskii-Mann with increasing step-sizes

PEPit.examples.fixed_point_problems.wc_krasnoselskii_mann_increasing_step_sizes(n, verbose=1)[source]

Consider the fixed point problem

\[\mathrm{Find}\, x:\, x = Ax,\]

where \(A\) is a non-expansive operator, that is, an \(L\)-Lipschitz operator with \(L=1\).

This code computes a worst-case guarantee for the Krasnoselskii-Mann (KM) method with increasing step-sizes. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee

\[\frac{1}{4}\|x_n - Ax_n\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the KM method, and \(x_\star\) is some fixed point of \(A\) (i.e., \(x_\star=Ax_\star\)).

Algorithm: The KM method is described by

\[x_{t+1} = \frac{1}{t + 2} x_{t} + \left(1 - \frac{1}{t + 2}\right) Ax_{t}.\]

Reference: This scheme was first studied using PEPs in [1].

[1] F. Lieder (2018). Projection Based Methods for Conic Linear Programming Optimal First Order Complexities and Norm Constrained Quasi Newton Methods. PhD thesis, HHU Düsseldorf.

Parameters
  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (None) – no theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_krasnoselskii_mann_increasing_step_sizes(n=3, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 20 scalar constraint(s) ...
                 function 1 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.11963406474148795
*** Example file: worst-case performance of Kranoselskii-Mann iterations ***
        PEPit guarantee:         1/4 ||xN - AxN||^2 <= 0.119634 ||x0 - x_*||^2

Potential functions

Gradient descent Lyapunov 1

PEPit.examples.potential_functions.wc_gradient_descent_lyapunov_1(L, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code verifies a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it verifies that the Lyapunov (or potential/energy) function

\[V_n \triangleq n (f(x_n) - f_\star) + \frac{L}{2} \|x_n - x_\star\|^2\]

is decreasing along all trajectories and for all smooth convex functions \(f\) (i.e., in the worst-case):

\[V_{n+1} \leqslant V_n,\]

where \(x_{n+1}\) is obtained from a gradient step from \(x_{n}\) with fixed step-size \(\gamma=\frac{1}{L}\).

Algorithm: One iteration of gradient descent is described by

\[x_{n+1} = x_n - \gamma \nabla f(x_n),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: The theoretical guarantee can be found in e.g., [1, Theorem 3.3]:

\[V_{n+1} - V_n \leqslant 0,\]

when \(\gamma=\frac{1}{L}\).
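Such a potential-function verification can be set up with the PEPit primitives shown in the quick-start guide. The sketch below is an illustrative reconstruction under these assumptions; the actual example file may organize the computation differently.

from PEPit import PEP
from PEPit.functions import SmoothConvexFunction

L, n = 1., 10
gamma = 1 / L
problem = PEP()
func = problem.declare_function(SmoothConvexFunction, L=L)
xs = func.stationary_point()       # x_star
fs = func(xs)                      # f_star
xn = problem.set_initial_point()   # an arbitrary x_n
gn, fn = func.oracle(xn)
xnp1 = xn - gamma * gn             # one gradient step
fnp1 = func(xnp1)
# V_k = k (f(x_k) - f_star) + L/2 ||x_k - x_star||^2
Vn = n * (fn - fs) + L / 2 * (xn - xs) ** 2
Vnp1 = (n + 1) * (fnp1 - fs) + L / 2 * (xnp1 - xs) ** 2
problem.set_performance_metric(Vnp1 - Vn)  # worst-case value of V_{n+1} - V_n, expected <= 0
pepit_tau = problem.solve()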

References: The detailed potential function can be found in [1] and the SDP approach in [2].

[1] N. Bansal, A. Gupta (2019). Potential-function proofs for gradient methods. Theory of Computing, 15(1), 1-32.

[2] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – the step-size.

  • n (int) – current iteration number.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Examples

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_lyapunov_1(L=L, gamma=1 / L, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 6 scalar constraint(s) ...
                 function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 3.3902995517363515e-18
*** Example file: worst-case performance of gradient descent with fixed step-size for a given Lyapunov function***
        PEPit guarantee:        V_(n+1) - V_(n) <= 3.3903e-18
        Theoretical guarantee:  V_(n+1) - V_(n) <= 0.0

Gradient descent Lyapunov 2

PEPit.examples.potential_functions.wc_gradient_descent_lyapunov_2(L, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code verifies a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it verifies that the Lyapunov (or potential/energy) function

\[V_n \triangleq (2n + 1) L \left(f(x_n) - f_\star\right) + n(n+2) \|\nabla f(x_n)\|^2 + L^2 \|x_n - x_\star\|^2\]

is decreasing along all trajectories and for all smooth convex functions \(f\) (i.e., in the worst-case):

\[V_{n+1} \leqslant V_n,\]

where \(x_{n+1}\) is obtained from a gradient step from \(x_{n}\) with fixed step-size \(\gamma=\frac{1}{L}\).

Algorithm: One iteration of gradient descent is described by

\[x_{n+1} = x_n - \gamma \nabla f(x_n),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: The theoretical guarantee can be found in [1, Theorem 3]:

\[V_{n+1} - V_n \leqslant 0,\]

when \(\gamma=\frac{1}{L}\).

References: The detailed potential function and SDP approach can be found in [1].

[1] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – the step-size.

  • n (int) – current iteration number.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Examples

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_lyapunov_2(L=L, gamma=1 / L, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 6 scalar constraint(s) ...
                 function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 1.894425729310791e-17
*** Example file: worst-case performance of gradient descent with fixed step size for a given Lyapunov function***
        PEPit guarantee:        V_(n+1) - V_(n) <= 1.89443e-17
        Theoretical guarantee:  V_(n+1) - V_(n) <= 0.0

Accelerated gradient method

PEPit.examples.potential_functions.wc_accelerated_gradient_method(L, gamma, lam, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code verifies a worst-case guarantee for an accelerated gradient method. That is, it verifies that the Lyapunov (or potential/energy) function

\[V_n \triangleq \lambda_n^2 (f(x_n) - f_\star) + \frac{L}{2} \|z_n - x_\star\|^2\]

is decreasing along all trajectories and for all smooth convex functions \(f\) (i.e., in the worst-case):

\[V_{n+1} \leqslant V_n,\]

where \(x_{n+1}\), \(z_{n+1}\), and \(\lambda_{n+1}\) are obtained from one iteration of the accelerated gradient method below, from some arbitrary \(x_{n}\), \(z_{n}\), and \(\lambda_{n}\).

Algorithm: One iteration of accelerated gradient method is described by

\[\begin{split}\begin{eqnarray} \text{Set: }\lambda_{n+1} & = & \frac{1}{2} \left(1 + \sqrt{4\lambda_n^2 + 1}\right), \tau_n & = & \frac{1}{\lambda_{n+1}}, \text{ and } \eta_n & = & \frac{\lambda_{n+1}^2 - \lambda_{n}^2}{L} \\ y_n & = & (1 - \tau_n) x_n + \tau_n z_n,\\ z_{n+1} & = & z_n - \eta_n \nabla f(y_n), \\ x_{n+1} & = & y_n - \gamma \nabla f(y_n). \end{eqnarray}\end{split}\]
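To make the update concrete, here is a direct plain-Python transcription of one iteration; the toy quadratic \(f(x)=\frac{L}{2}x^2\) and the starting values are assumptions made for illustration only.

from math import sqrt

L = 1.
gamma = 1 / L
f = lambda x: L / 2 * x ** 2  # toy test function
f_grad = lambda x: L * x      # its gradient

def agm_step(x, z, lam):
    lam_next = (1 + sqrt(4 * lam ** 2 + 1)) / 2
    tau, eta = 1 / lam_next, (lam_next ** 2 - lam ** 2) / L
    y = (1 - tau) * x + tau * z
    return y - gamma * f_grad(y), z - eta * f_grad(y), lam_next

def potential(x, z, lam):     # V with f_star = 0 and x_star = 0 for this toy function
    return lam ** 2 * f(x) + L / 2 * z ** 2

x, z, lam = 1., 1., 10.
V_before = potential(x, z, lam)
x, z, lam = agm_step(x, z, lam)
print(potential(x, z, lam) <= V_before)  # True: the potential does not increase on this instance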

Theoretical guarantee: The following worst-case guarantee can be found in e.g., [2, Theorem 5.3]:

\[V_{n+1} - V_n \leqslant 0,\]

when \(\gamma=\frac{1}{L}\).

References: The potential can be found in the historical [1]; and in more recent works, e.g., [2, 3].

[1] Y. Nesterov (1983). A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). In Dokl. Akad. Nauk SSSR (Vol. 269, pp. 543-547). http://www.mathnet.ru/links/9bcb158ed2df3d8db3532aafd551967d/dan46009.pdf

[2] N. Bansal, A. Gupta (2019). Potential-function proofs for gradient methods. Theory of Computing, 15(1), 1-32.

[3] A. d’Aspremont, D. Scieur, A. Taylor (2021). Acceleration Methods. Foundations and Trends in Optimization: Vol. 5, No. 1-2.

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – the step-size.

  • lam (float) – the initial value for sequence \((\lambda_t)_t\).

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Examples

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_method(L=L, gamma=1 / L, lam=10., verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 12 scalar constraint(s) ...
                 function 1 : 12 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 5.264872499157039e-14
*** Example file: worst-case performance of accelerated gradient method for a given Lyapunov function***
        PEPit guarantee:         V_(n+1) - V_n <= 5.26487e-14
        Theoretical guarantee:   V_(n+1) - V_n <= 0.0

Inexact proximal methods

Accelerated inexact forward backward

PEPit.examples.inexact_proximal_methods.wc_accelerated_inexact_forward_backward(L, zeta, n, verbose=1)[source]

Consider the composite convex minimization problem,

\[F_\star \triangleq \min_x \left\{F(x) \equiv f(x) + g(x) \right\},\]

where \(f\) is \(L\)-smooth convex, and \(g\) is closed, proper, and convex. We further assume that one can readily evaluate the gradient of \(f\) and that one has access to an inexact version of the proximal operator of \(g\) (whose level of accuracy is controlled by some parameter \(\zeta\in (0,1)\)).

This code computes a worst-case guarantee for an accelerated inexact forward backward (AIFB) method (a.k.a., inexact accelerated proximal gradient method). That is, it computes the smallest possible \(\tau(n, L, \zeta)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(n, L, \zeta) \|x_0 - x_\star\|^2,\]

is valid, where \(x_n\) is the output of the AIFB, and where \(x_\star\) is a minimizer of \(F\).

In short, for given values of \(n\), \(L\) and \(\zeta\), \(\tau(n, L, \zeta)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).

Algorithm: Let \(t\in\{0,1,\ldots,n\}\). The method is presented in, e.g., [1, Algorithm 3.1]. For simplicity, we instantiate [1, Algorithm 3.1] using simple values for its parameters and for the problem setting (in the notation of [1]: \(A_0\triangleq 0\), \(\mu=0\), \(\xi_t \triangleq0\), \(\sigma_t\triangleq 0\), \(\lambda_t \triangleq\gamma\triangleq\tfrac{1}{L}\), \(\zeta_t\triangleq\zeta\), \(\eta \triangleq (1-\zeta^2) \gamma\)), and without backtracking, arriving at:

\begin{eqnarray} A_{t+1} && = A_t + \frac{\eta+\sqrt{\eta^2+4\eta A_t}}{2},\\ y_{t} && = x_t + \frac{A_{t+1}-A_t}{A_{t+1}} (z_t-x_t),\\ (x_{t+1},v_{t+1}) && \approx_{\varepsilon_t} \left(\mathrm{prox}_{\gamma g}\left(y_t-\gamma \nabla f(y_t)\right),\, \mathrm{prox}_{ g^*/\gamma}\left(\frac{y_t-\gamma \nabla f(y_t)}{\gamma}\right)\right),\\ && \text{with } \varepsilon_t = \frac{\zeta^2\gamma^2}{2}\|v_{t+1}+\nabla f(y_t) \|^2,\\ z_{t+1} && = z_t-(A_{t+1}-A_t)\left(v_{t+1}+\nabla f(y_t)\right),\\ \end{eqnarray}

where \(\{\varepsilon_t\}_{t\geqslant 0}\) is some sequence of accuracy parameters (whose values are fixed within the algorithm as it runs), and \(\{A_t\}_{t\geqslant 0}\) is some scalar sequence of parameters for the method (typical of accelerated methods).

The line with “\(\approx_{\varepsilon}\)” can be described as the pair \((x_{t+1},v_{t+1})\) satisfying an accuracy requirement provided by [1, Definition 2.3]. More precisely (but without providing any intuition), it requires the existence of some \(w_{t+1}\) such that \(v_{t+1} \in \partial g(w_{t+1})\) and for which the accuracy requirement

\[\gamma^2 || x_{t+1} - y_t + \gamma v_{t+1} ||^2 + \gamma (g(x_{t+1}) - g(w_{t+1}) - v_{t+1}(x_{t+1} - w_{t+1})) \leqslant \varepsilon_t,\]

is valid.

Theoretical guarantee: A theoretical upper bound is obtained in [1, Corollary 3.5]:

\[F(x_n)-F_\star\leqslant \frac{2L \|x_0-x_\star\|^2}{(1-\zeta^2)n^2}.\]
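Evaluated at the parameters of the example below (\(L=1.3\), \(\zeta=0.45\), \(n=11\), and \(\|x_0-x_\star\|^2 \leqslant 1\)), this bound reads:

L, zeta, n = 1.3, .45, 11
theoretical_tau = 2 * L / ((1 - zeta ** 2) * n ** 2)
print(theoretical_tau)  # approximately 0.0269437, the theoretical value reported in the example below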

References: The method and theoretical result can be found in [1, Section 3].

[1] M. Barre, A. Taylor, F. Bach (2021). A note on approximate accelerated forward-backward methods with absolute and relative errors, and possibly strongly convex objectives. arXiv:2106.15536v2.

Parameters
  • L (float) – smoothness parameter.

  • zeta (float) – relative approximation parameter in (0,1).

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_inexact_forward_backward(L=1.3, zeta=.45, n=11, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 59x59
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 156 scalar constraint(s) ...
                 function 1 : 156 scalar constraint(s) added
                 function 2 : Adding 528 scalar constraint(s) ...
                 function 2 : 528 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.018869997698251897
*** Example file: worst-case performance of an inexact accelerated forward backward method ***
        PEPit guarantee:         F(x_n)-F_* <= 0.01887 ||x_0 - x_*||^2
        Theoretical guarantee:   F(x_n)-F_* <= 0.0269437 ||x_0 - x_*||^2

Partially inexact Douglas Rachford splitting

PEPit.examples.inexact_proximal_methods.wc_partially_inexact_douglas_rachford_splitting(mu, L, n, gamma, sigma, verbose=1)[source]

Consider the composite strongly convex minimization problem,

\[F_\star \triangleq \min_x \left\{ F(x) \equiv f(x) + g(x) \right\}\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex, and \(g\) is closed convex and proper. We denote by \(x_\star = \arg\min_x F(x)\) the minimizer of \(F\). The (exact) proximal operator of \(g\), and an approximate version of the proximal operator of \(f\) are assumed to be available.

This code computes a worst-case guarantee for a partially inexact Douglas-Rachford Splitting (DRS). That is, it computes the smallest possible \(\tau(n,L,\mu,\sigma,\gamma)\) such that the guarantee

\[\|z_{n} - z_\star\|^2 \leqslant \tau(n,L,\mu,\sigma,\gamma) \|z_0 - z_\star\|^2\]

is valid, where \(z_n\) is the output of the DRS (initiated at \(x_0\)), \(z_\star\) is its fixed point, \(\gamma\) is a step-size, and \(\sigma\) is the level of inaccuracy.

Algorithm: The partially inexact Douglas-Rachford splitting under consideration is described by

\begin{eqnarray} x_{t} && \approx_{\sigma} \arg\min_x \left\{ \gamma f(x)+\frac{1}{2} \|x-z_t\|^2 \right\},\\ y_{t} && = \arg\min_y \left\{ \gamma g(y)+\frac{1}{2} \|y-(x_t-\gamma \nabla f(x_t))\|^2 \right\},\\ z_{t+1} && = z_t + y_t - x_t. \end{eqnarray}

More precisely, the notation “\(\approx_{\sigma}\)” corresponds to requiring the existence of some \(e_{t}\) such that

\begin{eqnarray} x_{t} && = z_t - \gamma (\nabla f(x_t) - e_t),\\ y_{t} && = \arg\min_y \left\{ \gamma g(y)+\frac{1}{2} \|y-(x_t-\gamma \nabla f(x_t))\|^2 \right\},\\ && \text{with } \|e_t\|^2 \leqslant \frac{\sigma^2}{\gamma^2}\|y_{t} - z_t + \gamma \nabla f(x_t) \|^2,\\ z_{t+1} && = z_t + y_t - x_t. \end{eqnarray}

Theoretical guarantee: The following tight theoretical bound is due to [2, Theorem 5.1]:

\[\|z_{n} - z_\star\|^2 \leqslant \max\left(\frac{1 - \sigma + \gamma \mu \sigma}{1 - \sigma + \gamma \mu}, \frac{\sigma + (1 - \sigma) \gamma L}{1 + (1 - \sigma) \gamma L}\right)^{2n} \|z_0 - z_\star\|^2.\]
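For the parameter values used in the example below, the maximum is attained by the first term, and the resulting contraction factor is easy to reproduce:

mu, L, n, gamma, sigma = .1, 5, 5, 1.4, .2
rho = max((1 - sigma + gamma * mu * sigma) / (1 - sigma + gamma * mu),
          (sigma + (1 - sigma) * gamma * L) / (1 + (1 - sigma) * gamma * L))
print(rho ** (2 * n))  # approximately 0.281206, the theoretical value reported in the example below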

References: The method is from [1], its PEP formulation and the worst-case analysis from [2], see [2, Section 4.4] for more details.

[1] J. Eckstein and W. Yao (2018). Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Mathematical Programming, 170(2), 417-444.

[2] M. Barre, A. Taylor, F. Bach (2020). Principled analyses and design of first-order methods with inexact proximal operators, arXiv 2006.06041v2.

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • n (int) – number of iterations.

  • gamma (float) – the step-size.

  • sigma (float) – noise parameter.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_partially_inexact_douglas_rachford_splitting(mu=.1, L=5, n=5, gamma=1.4, sigma=.2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 18x18
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 40 scalar constraint(s) ...
                 function 1 : 40 scalar constraint(s) added
                 function 2 : Adding 30 scalar constraint(s) ...
                 function 2 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.28120549805153155
*** Example file: worst-case performance of the partially inexact Douglas Rachford splitting ***
        PEPit guarantee:         ||z_n - z_*||^2 <= 0.281205 ||z_0 - z_*||^2
        Theoretical guarantee:   ||z_n - z_*||^2 <= 0.281206 ||z_0 - z_*||^2

Relatively inexact proximal point

PEPit.examples.inexact_proximal_methods.wc_relatively_inexact_proximal_point_algorithm(n, gamma, sigma, verbose=1)[source]

Consider the (possibly non-smooth) convex minimization problem,

\[f_\star \triangleq \min_x f(x)\]

where \(f\) is closed, convex, and proper. We denote by \(x_\star\) some optimal point of \(f\) (hence \(0\in\partial f(x_\star)\)). We further assume that one has access to an inexact version of the proximal operator of \(f\), whose level of accuracy is controlled by some parameter \(\sigma\geqslant 0\).

This code computes a worst-case guarantee for an inexact proximal point method. That is, it computes the smallest possible \(\tau(n, \gamma, \sigma)\) such that the guarantee

\[f(x_n) - f(x_\star) \leqslant \tau(n, \gamma, \sigma) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the method, \(\gamma\) is some step-size, and \(\sigma\) is the level of accuracy of the approximate proximal point oracle.

Algorithm: The approximate proximal point method under consideration is described by

\[x_{t+1} \approx_{\sigma} \arg\min_x \left\{ \gamma f(x)+\frac{1}{2} \|x-x_t\|^2 \right\},\]

where the notation “\(\approx_{\sigma}\)” corresponds to requiring the existence of some vector \(s_{t+1}\in\partial f(x_{t+1})\) and \(e_{t+1}\) such that

\[x_{t+1} = x_t - \gamma s_{t+1} + e_{t+1} \quad \quad \text{with }\|e_{t+1}\|^2 \leqslant \sigma^2\|x_{t+1} - x_t\|^2.\]

We note that the case \(\sigma=0\) implies \(e_{t+1}=0\), in which case this operation reduces to a standard proximal step with step-size \(\gamma\).

Theoretical guarantee: The following (empirical) upper bound is provided in [1, Section 3.5.1],

\[f(x_n) - f(x_\star) \leqslant \frac{1 + \sigma}{4 \gamma n^{\sqrt{1 - \sigma^2}}}\|x_0 - x_\star\|^2.\]
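Plugging the parameters of the example below (\(n=8\), \(\gamma=10\), \(\sigma=0.65\)) into this expression:

from math import sqrt
n, gamma, sigma = 8, 10, .65
theoretical_tau = (1 + sigma) / (4 * gamma * n ** sqrt(1 - sigma ** 2))
print(theoretical_tau)  # approximately 0.00849444, as in the example output below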

References: The precise formulation is presented in [1, Section 3.5.1].

[1] M. Barre, A. Taylor, F. Bach (2020). Principled analyses and design of first-order methods with inexact proximal operators. arXiv 2006.06041v2.

Parameters
  • n (int) – number of iterations.

  • gamma (float) – the step-size.

  • sigma (float) – accuracy parameter of the proximal point computation.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_relatively_inexact_proximal_point_algorithm(n=8, gamma=10, sigma=.65, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 18x18
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 88 scalar constraint(s) ...
                 function 1 : 88 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal_inaccurate (solver: SCS); optimal value: 0.007753853579615959
*** Example file: worst-case performance of an inexact proximal point method in distance in function values ***
        PEPit guarantee:         f(x_n) - f(x_*) <= 0.00775385 ||x_0 - x_*||^2
        Theoretical guarantee:   f(x_n) - f(x_*) <= 0.00849444 ||x_0 - x_*||^2

Adaptive methods

Polyak steps in distance to optimum

PEPit.examples.adaptive_methods.wc_polyak_steps_in_distance_to_optimum(L, mu, gamma, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex, and \(x_\star=\arg\min_x f(x)\).

This code computes a worst-case guarantee for a variant of a gradient method relying on Polyak step-sizes (PS). That is, it computes the smallest possible \(\tau(L, \mu, \gamma)\) such that the guarantee

\[\|x_{t+1} - x_\star\|^2 \leqslant \tau(L, \mu, \gamma) \|x_{t} - x_\star\|^2\]

is valid, where \(x_{t+1}\) is obtained from \(x_t\) by one step of the gradient method with PS, and \(\gamma\) is the effective value of the step-size used at that iteration.

In short, for given values of \(L\), \(\mu\), and \(\gamma\), \(\tau(L, \mu, \gamma)\) is computed as the worst-case value of \(\|x_{t+1} - x_\star\|^2\) when \(\|x_{t} - x_\star\|^2 \leqslant 1\).

Algorithm: Gradient descent is described by

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size. The Polyak step-size rule under consideration here corresponds to choosing \(\gamma\) satisfying:

\[\gamma \|\nabla f(x_t)\|^2 = 2 (f(x_t) - f_\star).\]

Theoretical guarantee: The gradient method with the variant of Polyak step-sizes under consideration enjoys the tight theoretical guarantee [1, Proposition 1]:

\[\|x_{t+1} - x_\star\|^2 \leqslant \tau(L, \mu, \gamma) \|x_{t} - x_\star\|^2,\]

where \(\gamma\) is the effective step-size used at iteration \(t\) and

\begin{eqnarray} \tau(L, \mu, \gamma) & = & \left\{\begin{array}{ll} \frac{(\gamma L-1)(1-\gamma \mu)}{\gamma(L+\mu)-1} & \text{if } \gamma\in[\tfrac{1}{L},\tfrac{1}{\mu}],\\ 0 & \text{otherwise.} \end{array}\right. \end{eqnarray}
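With \(L=1\), \(\mu=0.1\), and the effective step-size \(\gamma=2/(L+\mu)\) of the example below, \(\gamma\) indeed lies in \([\tfrac{1}{L},\tfrac{1}{\mu}]\) and the first branch gives:

L, mu = 1, 0.1
gamma = 2 / (L + mu)
theoretical_tau = (gamma * L - 1) * (1 - gamma * mu) / (gamma * (L + mu) - 1)
print(theoretical_tau)  # approximately 0.669421, matching the theoretical line of the example below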

References:

[1] M. Barré, A. Taylor, A. d’Aspremont (2020). Complexity guarantees for Polyak steps with momentum. In Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – the step-size.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> mu = 0.1
>>> gamma = 2 / (L + mu)
>>> pepit_tau, theoretical_tau = wc_polyak_steps_in_distance_to_optimum(L=L, mu=mu, gamma=gamma, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 6 scalar constraint(s) ...
                 function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.66942148764241
*** Example file: worst-case performance of Polyak steps ***
        PEPit guarantee:         ||x_1 - x_*||^2 <= 0.669421 ||x_0 - x_*||^2
        Theoretical guarantee:   ||x_1 - x_*||^2 <= 0.669421 ||x_0 - x_*||^2

Polyak steps in function value

PEPit.examples.adaptive_methods.wc_polyak_steps_in_function_value(L, mu, gamma, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex, and \(x_\star=\arg\min_x f(x)\).

This code computes a worst-case guarantee for a variant of a gradient method relying on Polyak step-sizes. That is, it computes the smallest possible \(\tau(L, \mu, \gamma)\) such that the guarantee

\[f(x_{t+1}) - f_\star \leqslant \tau(L, \mu, \gamma) (f(x_t) - f_\star)\]

is valid, where \(x_{t+1}\) is obtained from \(x_t\) by one step of the gradient method with PS, and \(\gamma\) is the effective value of the step-size used at that iteration.

In short, for given values of \(L\), \(\mu\), and \(\gamma\), \(\tau(L, \mu, \gamma)\) is computed as the worst-case value of \(f(x_{t+1})-f_\star\) when \(f(x_t)-f_\star \leqslant 1\).

Algorithm: Gradient descent is described by

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size. The Polyak step-size rule under consideration here corresponds to choosing \(\gamma\) satisfying:

\[\|\nabla f(x_t)\|^2 = 2 L (2 - L \gamma) (f(x_t) - f_\star).\]

Theoretical guarantee: The gradient method with the variant of Polyak step-sizes under consideration enjoys the tight theoretical guarantee [1, Proposition 2]:

\[f(x_{t+1})-f_\star \leqslant \tau(L,\mu,\gamma) (f(x_{t})-f_\star),\]

where \(\gamma\) is the effective step-size used at iteration \(t\) and

\begin{eqnarray} \tau(L,\mu,\gamma) & = & \left\{\begin{array}{ll} (\gamma L - 1) (L \gamma (3 - \gamma (L + \mu)) - 1) & \text{if } \gamma\in[\tfrac{1}{L},\tfrac{2L-\mu}{L^2}],\\ 0 & \text{otherwise.} \end{array}\right. \end{eqnarray}

References:

[1] M. Barré, A. Taylor, A. d’Aspremont (2020). Complexity guarantees for Polyak steps with momentum. In Conference on Learning Theory (COLT).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • gamma (float) – the step-size.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> mu = 0.1
>>> gamma = 2 / (L + mu)
>>> pepit_tau, theoretical_tau = wc_polyak_steps_in_function_value(L=L, mu=mu, gamma=gamma, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 6 scalar constraint(s) ...
                 function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.6694215432773613
*** Example file: worst-case performance of Polyak steps ***
        PEPit guarantee:         f(x_1) - f_* <= 0.669422 (f(x_0) - f_*)
        Theoretical guarantee:   f(x_1) - f_* <= 0.669421 (f(x_0) - f_*)

Low dimensional worst-cases scenarios

Inexact gradient

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_inexact_gradient(L, mu, epsilon, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for an inexact gradient method and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee using \(10\) iterations of the logdet heuristic.

That is, it computes the smallest possible \(\tau(n,L,\mu,\varepsilon)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n,L,\mu,\varepsilon) (f(x_0) - f_\star)\]

is valid, where \(x_n\) is the output of gradient descent with an inexact descent direction, and where \(x_\star\) is the minimizer of \(f\). Then, it looks for a low-dimensional example nearly achieving this performance.

The inexact descent direction is assumed to satisfy a relative inaccuracy described by (with \(0 \leqslant \varepsilon \leqslant 1\))

\[\|\nabla f(x_t) - d_t\| \leqslant \varepsilon \|\nabla f(x_t)\|,\]

where \(\nabla f(x_t)\) is the true gradient, and \(d_t\) is the approximate descent direction that is used.

Algorithm:

The inexact gradient descent under consideration can be written as

\[x_{t+1} = x_t - \frac{2}{L_{\varepsilon} + \mu_{\varepsilon}} d_t\]

where \(d_t\) is the inexact search direction, \(L_{\varepsilon} = (1 + \varepsilon)L\) and \(\mu_{\varepsilon} = (1-\varepsilon) \mu\).

Theoretical guarantee:

A tight worst-case guarantee obtained in [1, Theorem 5.3] or [2, Remark 1.6] is

\[f(x_n) - f_\star \leqslant \left(\frac{L_{\varepsilon} - \mu_{\varepsilon}}{L_{\varepsilon} + \mu_{\varepsilon}}\right)^{2n}(f(x_0) - f_\star ),\]

with \(L_{\varepsilon} = (1 + \varepsilon)L\) and \(\mu_{\varepsilon} = (1-\varepsilon) \mu\). This guarantee is achieved on one-dimensional quadratic functions.
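For the parameters of the example below (\(L=1\), \(\mu=0.1\), \(\varepsilon=0.1\), \(n=6\)), this rate evaluates to:

L, mu, epsilon, n = 1, 0.1, 0.1, 6
L_eps, mu_eps = (1 + epsilon) * L, (1 - epsilon) * mu
theoretical_tau = ((L_eps - mu_eps) / (L_eps + mu_eps)) ** (2 * n)
print(theoretical_tau)  # approximately 0.139731, the theoretical value reported in the example below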

References: The detailed analyses can be found in [1, 2]. The logdet heuristic is presented in [3].

[1] E. De Klerk, F. Glineur, A. Taylor (2020). Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM Journal on Optimization, 30(3), 2053-2082.

[2] O. Gannot (2021). A frequency-domain analysis of inexact gradient methods. Mathematical Programming (to appear).

[3] M. Fazel, H. Hindi, S. Boyd (2003). Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. American Control Conference (ACC).

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong convexity parameter.

  • epsilon (float) – level of inaccuracy

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_inexact_gradient(L=1, mu=0.1, epsilon=0.1, n=6, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 15x15
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 62 scalar constraint(s) ...
                 function 1 : 62 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.13989778793516514
(PEPit) Postprocessing: 2 eigenvalue(s) > 1.7005395180119392e-05 before dimension reduction
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.1398878008962302
(PEPit) Postprocessing: 2 eigenvalue(s) > 5.283608596989854e-06 after 1 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988778337004493
(PEPit) Postprocessing: 2 eigenvalue(s) > 5.335098252373141e-06 after 2 dimension reduction step(s)
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.1398927512487368
(PEPit) Postprocessing: 2 eigenvalue(s) > 1.2372028101610534e-05 after 3 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988824650439619
(PEPit) Postprocessing: 2 eigenvalue(s) > 2.006867894032787e-05 after 4 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988779568391294
(PEPit) Postprocessing: 2 eigenvalue(s) > 5.416953129163531e-06 after 5 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.1398889451757595
(PEPit) Postprocessing: 2 eigenvalue(s) > 3.983502472713177e-05 after 6 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988780180833413
(PEPit) Postprocessing: 2 eigenvalue(s) > 5.4785759855262395e-06 after 7 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988778218159367
(PEPit) Postprocessing: 2 eigenvalue(s) > 5.360843247635456e-06 after 8 dimension reduction step(s)
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.13988478099895965
(PEPit) Postprocessing: 2 eigenvalue(s) > 9.59529914206238e-06 after 9 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988791535665998
(PEPit) Postprocessing: 2 eigenvalue(s) > 9.339529753603287e-06 after 10 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988791535665998
(PEPit) Postprocessing: 2 eigenvalue(s) > 9.339529753603287e-06 after dimension reduction
*** Example file: worst-case performance of inexact gradient ***
        PEPit example:           f(x_n)-f_* == 0.139888 (f(x_0)-f_*)
        Theoretical guarantee:   f(x_n)-f_* <= 0.139731 (f(x_0)-f_*)

Non-convex gradient descent

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_gradient_descent(L, gamma, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth.

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\), and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee

\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \tau(n, L, \gamma) (f(x_0) - f(x_n))\]

is valid, where \(x_n\) is the n-th iterate obtained with the gradient method with fixed step-size. Then, it looks for a low-dimensional example nearly achieving this performance.

Algorithm: Gradient descent is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), an empirically tight theoretical worst-case guarantee is

\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \frac{4}{3}\frac{L}{n} (f(x_0) - f(x_n)),\]

see discussions in [1, page 190] and [2].
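A minimal sketch of how this performance estimation problem could be posed with PEPit's basic primitives (assuming a SmoothFunction class is available for \(L\)-smooth, possibly non-convex, functions; the shipped example file may differ in its details):

from PEPit import PEP
from PEPit.functions import SmoothFunction  # assumed class name for L-smooth, possibly non-convex, functions

L, gamma, n = 1., 1., 5

problem = PEP()
func = problem.declare_function(SmoothFunction, L=L)

x = x0 = problem.set_initial_point()
for _ in range(n):
    g = func.gradient(x)
    # each declared metric adds one element; the overall measure is their minimum
    problem.set_performance_metric(g ** 2)
    x = x - gamma * g
problem.set_performance_metric(func.gradient(x) ** 2)  # gradient at the last iterate x_n

# normalization f(x_0) - f(x_n) <= 1, so that the worst-case value is tau(n, L, gamma)
problem.set_initial_condition(func(x0) - func(x) <= 1)

pepit_tau = problem.solve()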

References:

[1] Taylor, A. B. (2017). Convex interpolation and performance estimation of first-order methods for convex optimization. PhD Thesis, UCLouvain.

[2] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2021). The exact worst-case convergence rate of the gradient method with fixed step lengths for L-smooth functions. Optimization Letters, 16(6), 1649-1661.

Parameters
  • L (float) – the smoothness parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> L = 1
>>> gamma = 1 / L
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=gamma, n=5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 6 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.2666769474847614
(PEPit) Postprocessing: 6 eigenvalue(s) > 0.0 before dimension reduction
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.266667996380269
(PEPit) Postprocessing: 2 eigenvalue(s) > 1.0527850440294492e-05 after 1 dimension reduction step(s)
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.2666668138016744
(PEPit) Postprocessing: 2 eigenvalue(s) > 2.510763274714993e-07 after 2 dimension reduction step(s)
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.2666668138016744
(PEPit) Postprocessing: 2 eigenvalue(s) > 2.510763274714993e-07 after dimension reduction
*** Example file: worst-case performance of gradient descent with fixed step-size ***
        PEPit example:           min_i ||f'(x_i)||^2 == 0.266667 (f(x_0)-f_*)
        Theoretical guarantee:   min_i ||f'(x_i)||^2 <= 0.266667 (f(x_0)-f_*)

Optimized gradient

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_optimized_gradient(L, n, verbose=1)[source]

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and convex.

This code computes a worst-case guarantee for optimized gradient method (OGM), and applies the trace heuristic for trying to find a low-dimensional worst-case example on which this guarantee is nearly achieved. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of OGM and where \(x_\star\) is a minimizer of \(f\). Then, it applies the trace heuristic, which allows obtaining a one-dimensional function on which the guarantee is nearly achieved.

Algorithm: The optimized gradient method is described by

\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t)\\ y_{t+1} & = & x_{t+1} + \frac{\theta_{t}-1}{\theta_{t+1}}(x_{t+1}-x_t)+\frac{\theta_{t}}{\theta_{t+1}}(x_{t+1}-y_t), \end{eqnarray}

with

\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}. \end{eqnarray}

Theoretical guarantee: The tight theoretical guarantee can be found in [2, Theorem 2]:

\[f(x_n)-f_\star \leqslant \frac{L\|x_0-x_\star\|^2}{2\theta_n^2}.\]
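As a quick check (a short standalone sketch, not part of the example file), the coefficients \(\theta_t\) and the resulting bound can be evaluated numerically:

from math import sqrt

# Standalone sketch: evaluate the OGM coefficients theta_t and the bound L / (2 * theta_n ** 2).
L, n = 3., 4

theta = 1.                                   # theta_0
for _ in range(1, n):
    theta = (1 + sqrt(4 * theta ** 2 + 1)) / 2
theta_n = (1 + sqrt(8 * theta ** 2 + 1)) / 2

print(L / (2 * theta_n ** 2))                # approximately 0.07675, as in the example below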

References: The OGM was developed in [1,2]. Low-dimensional worst-case functions for OGM were obtained in [3, 4].

[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming 145(1–2), 451–482.

[2] D. Kim, J. Fessler (2016). Optimized first-order methods for smooth convex minimization. Mathematical Programming 159.1-2: 81-107.

[3] A. Taylor, J. Hendrickx, F. Glineur (2017). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1-2), 307-345.

[4] D. Kim, J. Fessler (2017). On the convergence analysis of the optimized gradient method. Journal of Optimization Theory and Applications, 172(1), 187-205.

Parameters
  • L (float) – the smoothness parameter.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_optimized_gradient(L=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 30 scalar constraint(s) ...
                 function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.07675218017587908
(PEPit) Postprocessing: 5 eigenvalue(s) > 0.00012110342786525262 before dimension reduction
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.0767421794376856
(PEPit) Postprocessing: 1 eigenvalue(s) > 5.187978263167338e-09 after dimension reduction
*** Example file: worst-case performance of optimized gradient method ***
        PEPit example:           f(y_n)-f_* == 0.0767422 ||x_0 - x_*||^2
        Theoretical guarantee:   f(y_n)-f_* <= 0.0767518 ||x_0 - x_*||^2

Frank Wolfe

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_frank_wolfe(L, D, n, verbose=1)[source]

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where \(f_1\) is \(L\)-smooth and convex and where \(f_2\) is a convex indicator function on \(\mathcal{D}\) of diameter at most \(D\).

This code computes a worst-case guarantee for the conditional gradient method, aka Frank-Wolfe method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee using \(12\) iterations of the logdet heuristic. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(n, L) D^2,\]

is valid, where \(x_n\) is the output of the conditional gradient method, and where \(x_\star\) is a minimizer of \(F\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(D \leqslant 1\). Then, it looks for a low-dimensional example nearly achieving this performance.

Algorithm:

This method was first presented in [1]. A more recent version can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),

\[\begin{split}\begin{eqnarray} y_t & = & \arg\min_{s \in \mathcal{D}} \langle s \mid \nabla f_1(x_t) \rangle, \\ x_{t+1} & = & \frac{t}{t + 2} x_t + \frac{2}{t + 2} y_t. \end{eqnarray}\end{split}\]

Theoretical guarantee:

An upper bound obtained in [2, Theorem 1] is

\[F(x_n) - F(x_\star) \leqslant \frac{2L D^2}{n+2}.\]
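As a standalone illustration (independent of the PEPit example file), one can run the method on a smooth quadratic over the \(\ell_1\) unit ball, whose diameter is \(D = 2\), and compare the final gap with the bound above:

import numpy as np

# Standalone sketch: Frank-Wolfe on f1(x) = L / 2 * ||x - c||^2 over the l1 unit ball
# (diameter D = 2), with c inside the ball so that F_star = 0.
L, D, n, d = 1., 2., 10, 5
c = np.full(d, 1 / (2 * d))                              # ||c||_1 = 0.5 <= 1

x = np.zeros(d)
x[0] = 1.                                                # feasible starting point
for t in range(n):
    g = L * (x - c)                                      # gradient of f1
    i = np.argmax(np.abs(g))
    s = np.zeros(d)
    s[i] = -np.sign(g[i])                                # linear minimization oracle over the l1 ball
    x = t / (t + 2) * x + 2 / (t + 2) * s

print(L / 2 * np.sum((x - c) ** 2))                      # F(x_n) - F_star
print(2 * L * D ** 2 / (n + 2))                          # bound from [2, Theorem 1]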

References: The algorithm is presented in, among others, [1, 2]. The logdet heuristic is presented in [3].

[1] M. Frank, P. Wolfe (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2), 95-110.

[2] M. Jaggi (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In 30th International Conference on Machine Learning (ICML).

[3] M. Fazel, H. Hindi, S. Boyd (2003). Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. American Control Conference (ACC).

Parameters
  • L (float) – the smoothness parameter.

  • D (float) – diameter of the domain \(\mathcal{D}\) of \(f_2\).

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_frank_wolfe(L=1, D=1, n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 26x26
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 132 scalar constraint(s) ...
                 function 1 : 132 scalar constraint(s) added
                 function 2 : Adding 325 scalar constraint(s) ...
                 function 2 : 325 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.07830185202143693
(PEPit) Postprocessing: 12 eigenvalue(s) > 0.0006226631118848632 before dimension reduction
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07828372031738319
(PEPit) Postprocessing: 11 eigenvalue(s) > 4.365697148503946e-06 after 1 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07826632525166947
(PEPit) Postprocessing: 11 eigenvalue(s) > 1.2665145818615854e-05 after 2 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07824094610510846
(PEPit) Postprocessing: 11 eigenvalue(s) > 2.4505278932874855e-05 after 3 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07820114036570962
(PEPit) Postprocessing: 11 eigenvalue(s) > 4.164155031005524e-05 after 4 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07823286027467699
(PEPit) Postprocessing: 10 eigenvalue(s) > 9.73301991908838e-05 after 5 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07823446697003811
(PEPit) Postprocessing: 10 eigenvalue(s) > 0.00011791962010861412 after 6 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07823446697003811
(PEPit) Postprocessing: 10 eigenvalue(s) > 0.00011791962010861412 after dimension reduction
*** Example file: worst-case performance of the Conditional Gradient (Frank-Wolfe) in function value ***
        PEPit example:           f(x_n)-f_* == 0.0782345 ||x0 - xs||^2
        Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x0 - xs||^2

Proximal point

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_proximal_point(alpha, n, verbose=1)[source]

Consider the monotone inclusion problem

\[\mathrm{Find}\, x:\, 0\in Ax,\]

where \(A\) is maximally monotone. We denote by \(J_A = (I + A)^{-1}\) the resolvent of \(A\).

This code computes a worst-case guarantee for the proximal point method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee using the trace heuristic.

That is, it computes the smallest possible \(\tau(n, \alpha)\) such that the guarantee

\[\|x_n - x_{n-1}\|^2 \leqslant \tau(n, \alpha) \|x_0 - x_\star\|^2,\]

is valid, where \(x_\star\) is such that \(0 \in Ax_\star\). Then, it looks for a low-dimensional example nearly achieving this performance.

Algorithm: The proximal point algorithm for monotone inclusions is described as follows, for \(t \in \{ 0, \dots, n-1\}\),

\[x_{t+1} = J_{\alpha A}(x_t),\]

where \(\alpha\) is a step-size.

Theoretical guarantee: A tight theoretical guarantee can be found in [1, section 4].

\[\|x_n - x_{n-1}\|^2 \leqslant \frac{\left(1 - \frac{1}{n}\right)^{n - 1}}{n} \|x_0 - x_\star\|^2.\]
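As a standalone illustration (independent of the PEPit example file), the method can be run on a simple maximally monotone operator, here a skew linear operator whose unique zero is \(x_\star = 0\), and the last residual compared with the bound above:

import numpy as np

# Standalone sketch: proximal point method on a skew (hence maximally monotone)
# linear operator A, whose unique zero is x_star = 0; the resolvent is a linear solve.
alpha, n = 2.2, 11
A = np.array([[0., 1.], [-1., 0.]])
I = np.eye(2)

x = np.array([1., 0.])                       # ||x_0 - x_star||^2 = 1
for _ in range(n):
    x_prev, x = x, np.linalg.solve(I + alpha * A, x)   # x_{t+1} = J_{alpha A}(x_t)

print(np.sum((x - x_prev) ** 2))             # observed residual ||x_n - x_{n-1}||^2
print((1 - 1 / n) ** (n - 1) / n)            # worst-case bound, approximately 0.035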

Reference:

[1] G. Gu, J. Yang (2020). Tight sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problem. SIAM Journal on Optimization, 30(3), 1905-1921.

Parameters
  • alpha (float) – the step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value.

  • theoretical_tau (float) – theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_proximal_point(alpha=2.2, n=11, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 13x13
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 132 scalar constraint(s) ...
                 function 1 : 132 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.03504735907840766
(PEPit) Postprocessing: 2 eigenvalue(s) > 1.885183851963194e-06 before dimension reduction
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.03503739338571882
(PEPit) Postprocessing: 2 eigenvalue(s) > 1.9044504527414672e-06 after dimension reduction
*** Example file: worst-case performance of the Proximal Point Method***
        PEPit example:           ||x(n) - x(n-1)||^2 == 0.0350374 ||x0 - xs||^2
        Theoretical guarantee:   ||x(n) - x(n-1)||^2 <= 0.0350494 ||x0 - xs||^2

Halpern iteration

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_halpern_iteration(n, verbose=1)[source]

Consider the fixed point problem

\[\mathrm{Find}\, x:\, x = Ax,\]

where \(A\) is a non-expansive operator, that is, an \(L\)-Lipschitz operator with \(L=1\).

This code computes a worst-case guarantee for the Halpern Iteration, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee

\[\|x_n - Ax_n\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the Halpern iteration, and \(x_\star\) is a fixed point of \(A\).

In short, for a given value of \(n\), \(\tau(n)\) is computed as the worst-case value of \(\|x_n - Ax_n\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\). Then, it looks for a low-dimensional example nearly achieving this performance.

Algorithm: The Halpern iteration can be written as

\[x_{t+1} = \frac{1}{t + 2} x_0 + \left(1 - \frac{1}{t + 2}\right) Ax_t.\]

Theoretical guarantee: A tight worst-case guarantee for Halpern iteration can be found in [1, Theorem 2.1]:

\[\|x_n - Ax_n\|^2 \leqslant \left(\frac{2}{n+1}\right)^2 \|x_0 - x_\star\|^2.\]
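As a standalone illustration (independent of the PEPit example file), the iteration can be run on a plane rotation, a non-expansive operator whose unique fixed point is \(x_\star = 0\), and the final residual compared with the bound above:

import numpy as np

# Standalone sketch: Halpern iteration on a plane rotation, which is non-expansive
# and whose unique fixed point is x_star = 0.
n, phi = 10, 0.5
A = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])

x0 = np.array([1., 0.])                      # ||x_0 - x_star||^2 = 1
x = x0.copy()
for t in range(n):
    x = x0 / (t + 2) + (1 - 1 / (t + 2)) * A @ x

print(np.sum((x - A @ x) ** 2))              # observed residual ||x_n - A x_n||^2
print((2 / (n + 1)) ** 2)                    # worst-case bound, approximately 0.0331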

References: The detailed approach and tight bound are available in [1].

[1] F. Lieder (2021). On the convergence rate of the Halpern-iteration. Optimization Letters, 15(2), 405-418.

[2] M. Fazel, H. Hindi, S. Boyd (2003). Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. American Control Conference (ACC).

Parameters
  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_halpern_iteration(n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 13x13
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 132 scalar constraint(s) ...
                 function 1 : 132 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.033076981475854986
(PEPit) Postprocessing: 11 eigenvalue(s) > 2.538373915093237e-06 before dimension reduction
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.03306531836320572
(PEPit) Postprocessing: 2 eigenvalue(s) > 0.00010453609338097841 after 1 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.0330736415198303
(PEPit) Postprocessing: 2 eigenvalue(s) > 4.3812352924839906e-05 after 2 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.03307313275765859
(PEPit) Postprocessing: 2 eigenvalue(s) > 4.715648695840045e-05 after 3 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.03307313275765859
(PEPit) Postprocessing: 2 eigenvalue(s) > 4.715648695840045e-05 after dimension reduction
*** Example file: worst-case performance of Halpern Iterations ***
        PEPit example:           ||xN - AxN||^2 == 0.0330731 ||x0 - x_*||^2
        Theoretical guarantee:   ||xN - AxN||^2 <= 0.0330579 ||x0 - x_*||^2

Alternate projections

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_alternate_projections(n, verbose=1)[source]

Consider the convex feasibility problem:

\[\mathrm{Find}\, x\in Q_1\cap Q_2\]

where \(Q_1\) and \(Q_2\) are two closed convex sets.

This code computes a worst-case guarantee for the alternate projection method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee

\[\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the alternate projection method, and \(x_\star\in Q_1\cap Q_2\) is a solution to the convex feasibility problem.

In short, for a given value of \(n\), \(\tau(n)\) is computed as the worst-case value of \(\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\). Then, it looks for a low-dimensional example nearly achieving this performance.

Algorithm: The alternate projection method can be written as

\[\begin{split}\begin{eqnarray} y_{t+1} & = & \mathrm{Proj}_{Q_1}(x_t), \\ x_{t+1} & = & \mathrm{Proj}_{Q_2}(y_{t+1}). \end{eqnarray}\end{split}\]
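A standalone illustration of the method (independent of the PEPit example file), with two halfplanes through the origin whose boundary lines form a small angle:

import numpy as np

# Standalone sketch: alternate projections onto two halfplanes through the origin,
# whose boundary lines form an angle of 0.2 radian (x_star = 0 lies in both sets).
def proj_halfplane(x, a):
    """Projection onto the halfplane {x : <a, x> <= 0}."""
    return x - max(0., a @ x) / (a @ a) * a

n = 10
a1 = np.array([0., 1.])                                  # Q1 = {x : x2 <= 0}
a2 = np.array([np.sin(0.2), -np.cos(0.2)])               # Q2, boundary tilted by 0.2 radian

x = np.array([1., 1.]) / np.sqrt(2)                      # ||x_0 - x_star||^2 = 1
for _ in range(n):
    y = proj_halfplane(x, a1)                            # y_{t+1} = Proj_{Q1}(x_t)
    x = proj_halfplane(y, a2)                            # x_{t+1} = Proj_{Q2}(y_{t+1})

print(np.sum((proj_halfplane(x, a1) - proj_halfplane(x, a2)) ** 2))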

References: The first results on this method are due to [1]. Its translation into the PEP framework is due to [2].

[1] J. Von Neumann (1949). On rings of operators. Reduction theory. Annals of Mathematics, pp. 401–485.

[2] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

Parameters
  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (None) – no theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_alternate_projections(n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 24x24
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 144 scalar constraint(s) ...
                 function 1 : 144 scalar constraint(s) added
                 function 2 : Adding 121 scalar constraint(s) ...
                 function 2 : 121 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.018858674370385117
(PEPit) Postprocessing: 2 eigenvalue(s) > 0.0003128757392530764 before dimension reduction
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.018851597249912744
(PEPit) Postprocessing: 2 eigenvalue(s) > 7.314172662475898e-06 after 1 dimension reduction step(s)
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.018851597249912744
(PEPit) Postprocessing: 2 eigenvalue(s) > 7.314172662475898e-06 after dimension reduction
*** Example file: worst-case performance of the alternate projection method ***
        PEPit example:   ||Proj_Q1 (xn) - Proj_Q2 (xn)||^2 == 0.0188516 ||x0 - x_*||^2

Averaged projections

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_averaged_projections(n, verbose=1)[source]

Consider the convex feasibility problem:

\[\mathrm{Find}\, x\in Q_1\cap Q_2\]

where \(Q_1\) and \(Q_2\) are two closed convex sets.

This code computes a worst-case guarantee for the averaged projection method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee

\[\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the averaged projection method, and \(x_\star\in Q_1\cap Q_2\) is a solution to the convex feasibility problem.

In short, for a given value of \(n\), \(\tau(n)\) is computed as the worst-case value of \(\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\). Then, it looks for a low-dimensional example nearly achieving this performance.

Algorithm: The averaged projection method can be written as

\[\begin{eqnarray} x_{t+1} & = & \frac{1}{2} \left(\mathrm{Proj}_{Q_1}(x_t) + \mathrm{Proj}_{Q_2}(x_t)\right). \end{eqnarray}\]
Parameters
  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (None) – no theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_averaged_projections(n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 25x25
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 144 scalar constraint(s) ...
                 function 1 : 144 scalar constraint(s) added
                 function 2 : Adding 144 scalar constraint(s) ...
                 function 2 : 144 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.06845454756941292
(PEPit) Postprocessing: 2 eigenvalue(s) > 0.00014022393949281894 before dimension reduction
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.06844459892544441
(PEPit) Postprocessing: 2 eigenvalue(s) > 7.442958512820225e-07 after 1 dimension reduction step(s)
(PEPit) Solver status: optimal (solver: SCS); objective value: 0.06844459892544441
(PEPit) Postprocessing: 2 eigenvalue(s) > 7.442958512820225e-07 after dimension reduction
*** Example file: worst-case performance of the averaged projection method ***
        PEPit example:   ||Proj_Q1 (xn) - Proj_Q2 (xn)||^2 == 0.0684446 ||x0 - x_*||^2

Dykstra

PEPit.examples.low_dimensional_worst_cases_scenarios.wc_dykstra(n, verbose=1)[source]

Consider the convex feasibility problem:

\[\mathrm{Find}\, x\in Q_1\cap Q_2\]

where \(Q_1\) and \(Q_2\) are two closed convex sets.

This code computes a worst-case guarantee for the Dykstra projection method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee

\[\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]

is valid, where \(x_n\) is the output of the Dykstra projection method, and \(x_\star\in Q_1\cap Q_2\) is a solution to the convex feasibility problem.

In short, for a given value of \(n\), \(\tau(n)\) is computed as the worst-case value of \(\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\). Then, it looks for a low-dimensional example nearly achieving this performance.

Algorithm: The Dykstra projection method can be written as

\[\begin{split}\begin{eqnarray} y_{t} & = & \mathrm{Proj}_{Q_1}(x_t+p_t), \\ p_{t+1} & = & x_t + p_t - y_t,\\ x_{t+1} & = & \mathrm{Proj}_{Q_2}(y_t+q_t),\\ q_{t+1} & = & y_t + q_t - x_{t+1}. \end{eqnarray}\end{split}\]
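A standalone illustration of the method (independent of the PEPit example file), on the same kind of halfplane pair as in the alternate projections sketch, keeping track of the correction terms \(p_t\) and \(q_t\):

import numpy as np

# Standalone sketch: Dykstra's method on two halfplanes through the origin
# (x_star = 0 lies in both sets).
def proj_halfplane(x, a):
    """Projection onto the halfplane {x : <a, x> <= 0}."""
    return x - max(0., a @ x) / (a @ a) * a

n = 10
a1 = np.array([0., 1.])
a2 = np.array([np.sin(0.2), -np.cos(0.2)])

x = np.array([1., 1.]) / np.sqrt(2)                      # ||x_0 - x_star||^2 = 1
p = np.zeros(2)
q = np.zeros(2)
for _ in range(n):
    y = proj_halfplane(x + p, a1)                        # y_t = Proj_{Q1}(x_t + p_t)
    p = x + p - y                                        # p_{t+1}
    x_new = proj_halfplane(y + q, a2)                    # x_{t+1} = Proj_{Q2}(y_t + q_t)
    q = y + q - x_new                                    # q_{t+1}
    x = x_new

print(np.sum((proj_halfplane(x, a1) - proj_halfplane(x, a2)) ** 2))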

References: This method is due to [1].

[1] J.P. Boyle, R.L. Dykstra (1986). A method for finding projections onto the intersection of convex sets in Hilbert spaces. Lecture Notes in Statistics. Vol. 37. pp. 28–47.

Parameters
  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (None) – no theoretical value.

Example

>>> pepit_tau, theoretical_tau = wc_dykstra(n=10, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 24x24
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 2 function(s)
                 function 1 : Adding 144 scalar constraint(s) ...
                 function 1 : 144 scalar constraint(s) added
                 function 2 : Adding 121 scalar constraint(s) ...
                 function 2 : 121 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal_inaccurate (solver: SCS); optimal value: 0.020649148184166164
(PEPit) Postprocessing: 3 eigenvalue(s) > 0.003245910668057083 before dimension reduction
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.02124334648210737
(PEPit) Postprocessing: 3 eigenvalue(s) > 0.002134191248999246 after 1 dimension reduction step(s)
(PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.02124334648210737
(PEPit) Postprocessing: 3 eigenvalue(s) > 0.002134191248999246 after dimension reduction
*** Example file: worst-case performance of the Dykstra projection method ***
        PEPit example:   ||Proj_Q1 (xn) - Proj_Q2 (xn)||^2 == 0.0212433 ||x0 - x_*||^2

Continuous-time models

Gradient flow for strongly convex functions

PEPit.examples.continuous_time_models.wc_gradient_flow_strongly_convex(mu, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(\mu\)-strongly convex.

This code computes a worst-case guarantee for a gradient flow. That is, it computes the smallest possible \(\tau(\mu)\) such that the guarantee

\[\frac{d}{dt}\mathcal{V}(X_t) \leqslant -\tau(\mu)\mathcal{V}(X_t) ,\]

is valid, where \(\mathcal{V}(X_t) = f(X_t) - f(x_\star)\), \(X_t\) is the output of the gradient flow, and where \(x_\star\) is the minimizer of \(f\). In short, for a given value of \(\mu\), \(\tau(\mu)\) is computed as the worst-case value of the derivative of \(f(X_t)-f_\star\) when \(f(X_t) - f(x_\star)\leqslant 1\).

Algorithm: For \(t \geqslant 0\),

\[\frac{d}{dt}X_t = -\nabla f(X_t),\]

with some initialization \(X_{0}\triangleq x_0\).

Theoretical guarantee:

The following tight guarantee can be found in [1, Proposition 11]:

\[\frac{d}{dt}\mathcal{V}(X_t) \leqslant -2\mu\mathcal{V}(X_t).\]
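This rate can also be recovered by a short direct computation: \(\mu\)-strong convexity implies the Polyak-Łojasiewicz inequality \(\|\nabla f(x)\|^2 \geqslant 2\mu\,(f(x) - f_\star)\), so that, along the flow,

\[\frac{d}{dt}\mathcal{V}(X_t) = \left\langle \nabla f(X_t), \frac{d}{dt}X_t \right\rangle = -\|\nabla f(X_t)\|^2 \leqslant -2\mu\,(f(X_t) - f_\star) = -2\mu\,\mathcal{V}(X_t).\]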

The detailed approach using PEPs is available in [2, Theorem 2.1].

References:

[1] D. Scieur, V. Roulet, F. Bach and A. D’Aspremont (2017). Integration methods and accelerated optimization algorithms. In Advances in Neural Information Processing Systems (NIPS).

[2] C. Moucer, A. Taylor, F. Bach (2022). A systematic approach to Lyapunov analyses of continuous-time models in convex optimization.

Parameters
  • mu (float) – the strong convexity parameter

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_gradient_flow_strongly_convex(mu=0.1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 3x3
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: -0.20000000011533495
*** Example file: worst-case performance of the gradient flow ***
        PEPit guarantee:         d/dt[f(X_t)-f_*] <= -0.2 (f(X_t) - f(x_*))
        Theoretical guarantee:   d/dt[f(X_t)-f_*] <= -0.2 (f(X_t) - f(x_*))

Gradient flow for convex functions

PEPit.examples.continuous_time_models.wc_gradient_flow_convex(t, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is convex.

This code computes a worst-case guarantee for a gradient flow. That is, it verifies that the inequality

\[\frac{d}{dt}\mathcal{V}(X_t, t) \leqslant 0,\]

is valid, where \(\mathcal{V}(X_t, t) = t(f(X_t) - f(x_\star)) + \frac{1}{2} \|X_t - x_\star\|^2\), \(X_t\) is the output of the gradient flow, and where \(x_\star\) is the minimizer of \(f\). In short, for a given value of \(t\), it verifies that \(\frac{d}{dt}\mathcal{V}(X_t, t)\leqslant 0\).

Algorithm: For \(t \geqslant 0\),

\[\frac{d}{dt}X_t = -\nabla f(X_t),\]

with some initialization \(X_{0}\triangleq x_0\).

Theoretical guarantee:

The following tight guarantee can be found in [1, p. 7]:

\[\frac{d}{dt}\mathcal{V}(X_t, t) \leqslant 0.\]
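This can be checked by a direct computation along the flow:

\[\frac{d}{dt}\mathcal{V}(X_t, t) = f(X_t) - f_\star + t\left\langle \nabla f(X_t), \frac{d}{dt}X_t \right\rangle + \left\langle X_t - x_\star, \frac{d}{dt}X_t \right\rangle = f(X_t) - f_\star - t\,\|\nabla f(X_t)\|^2 - \left\langle \nabla f(X_t), X_t - x_\star \right\rangle \leqslant -t\,\|\nabla f(X_t)\|^2 \leqslant 0,\]

where the inequality uses convexity of \(f\), namely \(\left\langle \nabla f(X_t), X_t - x_\star \right\rangle \geqslant f(X_t) - f_\star\).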

After integrating between \(0\) and \(T\),

\[f(X_T) - f_\star \leqslant \frac{1}{2T}\|x_0 - x_\star\|^2.\]

The detailed approach using PEPs is available in [2, Theorem 2.3].

References:

[1] W. Su, S. Boyd, E. J. Candès (2016). A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights. In the Journal of Machine Learning Research (JMLR).

[2] C. Moucer, A. Taylor, F. Bach (2022). A systematic approach to Lyapunov analyses of continuous-time models in convex optimization.

Parameters
  • t (float) – time.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_gradient_flow_convex(t=2.5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 3x3
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 1.910532459863401e-18
*** Example file: worst-case performance of the gradient flow ***
        PEPit guarantee:         d/dt V(X_t) <= 1.91053e-18
        Theoretical guarantee:   d/dt V(X_t) <= 0.0

Accelerated gradient flow for strongly convex functions

PEPit.examples.continuous_time_models.wc_accelerated_gradient_flow_strongly_convex(mu, psd=True, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_{x\in\mathbb{R}^d} f(x),\]

where \(f\) is \(\mu\)-strongly convex.

This code computes a worst-case guarantee for an accelerated gradient flow. That is, it computes the smallest possible \(\tau(\mu)\) such that the guarantee

\[\frac{d}{dt}\mathcal{V}_{P}(X_t) \leqslant -\tau(\mu)\mathcal{V}_P(X_t) ,\]

is valid with

\[\mathcal{V}_{P}(X_t) = f(X_t) - f(x_\star) + (X_t - x_\star, \frac{d}{dt}X_t)^T(P \otimes I_d)(X_t - x_\star, \frac{d}{dt}X_t) ,\]

where \(I_d\) is the identity matrix, \(X_t\) is the output of an accelerated gradient flow, and where \(x_\star\) is the minimizer of \(f\).

In short, for a given value of \(\mu\), \(\tau(\mu)\) is computed as the worst-case value of \(\frac{d}{dt}\mathcal{V}_P(X_t)\) when \(\mathcal{V}_P(X_t)\leqslant 1\).

Algorithm: For \(t \geqslant 0\),

\[\frac{d^2}{dt^2}X_t + 2\sqrt{\mu}\frac{d}{dt}X_t + \nabla f(X_t) = 0,\]

with some initialization \(X_{0}\triangleq x_0\).

Theoretical guarantee:

The following tight guarantee for \(P = \frac{1}{2}\begin{pmatrix} \mu & \sqrt{\mu} \\ \sqrt{\mu} & 1\end{pmatrix}\), for which \(\mathcal{V}_{P} \geqslant 0\), can be found in [1, Appendix B] and [2, Theorem 4.3]:

\[\frac{d}{dt}\mathcal{V}_P(X_t) \leqslant -\sqrt{\mu}\mathcal{V}_P(X_t).\]

For \(P = \begin{pmatrix} \frac{4}{9}\mu & \frac{4}{3}\sqrt{\mu} \\ \frac{4}{3}\sqrt{\mu} & \frac{1}{2}\end{pmatrix}\), for which \(\mathcal{V}_{P}(X_t) \geqslant 0\) along the trajectory, the following tight guarantee can be found in [3, Corollary 2.5],

\[\frac{d}{dt}\mathcal{V}_P(X_t) \leqslant -\frac{4}{3}\sqrt{\mu}\mathcal{V}_P(X_t).\]

References:

[1] A. C. Wilson, B. Recht, M. I. Jordan (2021). A Lyapunov analysis of accelerated methods in optimization. Journal of Machine Learning Research (JMLR), 22(113):1-34.

[2] J. M. Sanz-Serna, K. C. Zygalakis (2021). The connections between Lyapunov functions for some optimization algorithms and differential equations. SIAM Journal on Numerical Analysis, 59, 1542-1565.

[3] C. Moucer, A. Taylor, F. Bach (2022). A systematic approach to Lyapunov analyses of continuous-time models in convex optimization.

Parameters
  • mu (float) – the strong convexity parameter

  • psd (boolean) – option for positivity of \(P\) in the Lyapunov function \(\mathcal{V}_{P}\)

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_flow_strongly_convex(mu=0.1, psd=True, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: -0.31622776602929215
*** Example file: worst-case performance of an accelerated gradient flow ***
        PEPit guarantee:         d/dt V(X_t,t) <= -0.316228 V(X_t,t)
        Theoretical guarantee:   d/dt V(X_t) <= -0.316228 V(X_t,t)

Accelerated gradient flow for convex functions

PEPit.examples.continuous_time_models.wc_accelerated_gradient_flow_convex(t, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is convex.

This code computes a worst-case guarantee for an accelerated gradient flow. That is, it verifies that the inequality

\[\frac{d}{dt}\mathcal{V}(X_t, t) \leqslant 0 ,\]

is valid, where \(\mathcal{V}(X_t, t) = t^2(f(X_t) - f(x_\star)) + 2 \|(X_t - x_\star) + \frac{t}{2}\frac{d}{dt}X_t \|^2\), \(X_t\) is the output of an accelerated gradient flow, and where \(x_\star\) is the minimizer of \(f\).

In short, for a given value of \(t\), it verifies that \(\frac{d}{dt}\mathcal{V}(X_t, t) \leqslant 0\).

Algorithm: For \(t \geqslant 0\),

\[\frac{d^2}{dt^2}X_t + \frac{3}{t}\frac{d}{dt}X_t + \nabla f(X_t) = 0,\]

with some initialization \(X_{0}\triangleq x_0\).

Theoretical guarantee:

The following tight guarantee can be verified in [1, Section 2]:

\[\frac{d}{dt}\mathcal{V}(X_t, t) \leqslant 0.\]

After integrating between \(0\) and \(T\),

\[f(X_T) - f_\star \leqslant \frac{2}{T^2}\|x_0 - x_\star\|^2.\]

The detailed approach using PEPs is available in [2, Theorem 2.6].

References:

[1] W. Su, S. Boyd, E. J. Candès (2016). A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights. In the Journal of Machine Learning Research (JMLR).

[2] C. Moucer, A. Taylor, F. Bach (2022). A systematic approach to Lyapunov analyses of continuous-time models in convex optimization.

Parameters
  • t (float) – time.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_flow_convex(t=3.4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: -1.2008648627755779e-18
*** Example file: worst-case performance of an accelerated gradient flow ***
        PEPit guarantee:         d/dt V(X_t,t) <= -1.20086e-18
        Theoretical guarantee:   d/dt V(X_t) <= 0.0

Tutorials

Contraction rate of gradient descent

PEPit.examples.tutorials.wc_gradient_descent_contraction(L, mu, gamma, n, verbose=1)[source]

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.

This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \mu, \gamma)\) such that the guarantee

\[\| x_n - y_n \|^2 \leqslant \tau(n, L, \mu, \gamma) \| x_0 - y_0 \|^2\]

is valid, where \(x_n\) and \(y_n\) are the outputs of the gradient descent method with fixed step-size \(\gamma\), starting respectively from \(x_0\) and \(y_0\).

In short, for given values of \(n\), \(L\), \(\mu\) and \(\gamma\), \(\tau(n, L, \mu, \gamma)\) is computed as the worst-case value of \(\| x_n - y_n \|^2\) when \(\| x_0 - y_0 \|^2 \leqslant 1\).

Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), gradient descent is described by

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where \(\gamma\) is a step-size.

Theoretical guarantee: The tight theoretical guarantee is

\[\| x_n - y_n \|^2 \leqslant \max\{(1-L\gamma)^2,(1-\mu \gamma)^2\}^n\| x_0 - y_0 \|^2,\]

which is achieved on simple quadratic functions.
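A minimal sketch of how this contraction factor can be computed with PEPit (the shipped example file may differ slightly in its details):

from PEPit import PEP
from PEPit.functions import SmoothStronglyConvexFunction

L, mu, gamma, n = 1., .1, 1., 1

problem = PEP()
func = problem.declare_function(SmoothStronglyConvexFunction, mu=mu, L=L)

# two trajectories of the same method, started from x_0 and y_0
x = x0 = problem.set_initial_point()
y = y0 = problem.set_initial_point()
problem.set_initial_condition((x0 - y0) ** 2 <= 1)

for _ in range(n):
    x = x - gamma * func.gradient(x)
    y = y - gamma * func.gradient(y)

problem.set_performance_metric((x - y) ** 2)
pepit_tau = problem.solve()  # close to 0.81 for these parameter values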

Parameters
  • L (float) – the smoothness parameter.

  • mu (float) – the strong-convexity parameter.

  • gamma (float) – step-size.

  • n (int) – number of iterations.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> L = 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_contraction(L=L, mu=0.1, gamma=1 / L, n=1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                 function 1 : Adding 2 scalar constraint(s) ...
                 function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8100016613979604
*** Example file: worst-case performance of gradient descent with fixed step-sizes in contraction ***
        PEP-it guarantee:        ||x_n - y_n||^2 <= 0.810002 ||x_0 - y_0||^2
        Theoretical guarantee:   ||x_n - y_n||^2 <= 0.81 ||x_0 - y_0||^2

What’s new in PEPit

What’s new in PEPit 0.1.0

  • Adding general constraints to your problem.

    The method add_constraint has been added to the class PEP for general constraints not necessarily related to a specific function.
For readability of your code, we suggest using the method set_initial_condition when the constraint is the initial one, and the method add_constraint for any other constraint.
  • Adding LMI constraints to your problem.

    The method add_psd_matrix has been added to the class PEP and must be used to add LMI constraints to your problem.

  • CVXPY options.

    PEPit uses CVXPY to solve the underlying SDP of your problem.
    CVXPY solver options can be provided to the method PEP.solve.
  • Optimizing dimension of the solution.

    The tracetrick option of the method PEP.solve has been replaced by dimension_reduction_heuristic.
Set to None by default, this option can be set to “trace” or “logdet{followed by a number}” to use one of those heuristics.
  • Granularity of the verbose mode has evolved.

    The verbose mode of the method PEP.solve and of the provided examples files are now integers:
    • 0: No verbose at all

    • 1: PEPit information is printed but not CVXPY’s

    • 2: Both PEPit and CVXPY details are printed

  • Parameters of function classes.

    The parameters that characterize a function class must be provided directly as arguments of this function class, not through the dict “param” anymore.
    Example: PEP.declare_function(function_class=SmoothStronglyConvexFunction, mu=.1, L=1.)
  • Initializing a Point or an Expression to 0.

    null_point and null_expression have been added to the module PEPit to facilitate the access to a Point or an Expression initialized to 0.

  • 3 new function classes have been added:

    • ConvexSupportFunction for convex support functions (see [1])

    • ConvexQGFunction, for convex and quadratically upper bounded functions (see [2])

    • RsiEbFunction, for functions verifying lower restricted secant inequality and upper error bound (see [3])

[1] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

[2] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

[3] C. Guille-Escuret, B. Goujaud, A. Ibrahim, I. Mitliagkas (2022). Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound. arXiv 2203.00342.

What’s new in PEPit 0.2.0

  • Adding possibility to set LMI constraints associated to function objects.

    The method add_psd_matrix has been added to the class Function and must be used to add LMI constraints associated to a function.

  • Storing dual values prior to dimension reduction.

Each Constraint object receives a dual value in the attribute _dual_value, which can be accessed through the method eval_dual. In previous releases, when dimension reduction was activated, the stored dual values were those of the last solved problem. From this release on, the stored dual values are always those of the original problem. Note that the primal values are those of the last solved problem, which provides an adversarial example of the smallest possible dimension.

  • Creating PSDMatrix class.

The PSDMatrix class has been added. This doesn’t affect how the methods add_psd_matrix must be used. A user must continue providing a PSD matrix in the form of an iterable of Expressions. The latter is automatically transformed into a PSDMatrix object that contains a _dual_value attribute and an eval_dual method, like any Constraint object.

  • Fixing a minor issue in pep.py.

There was an issue when the Gram matrix G did not need any eigenvalue correction, as eig_threshold in pep.get_nb_eigenvalues_and_corrected_matrix was defined as the maximum of an empty list. This issue has been fixed in this release.

  • Eigenvalues are now sorted in decreasing order in the output of the PEP, making it easier to plot low-dimensional worst-case examples (examples of such usages can be found in the exercise repository Learning-Performance-Estimation).

  • Many new examples were introduced, including for looking for low-dimensional worst-case examples, fixed-point iterations, variational inequalities, and continuous-time dynamics.

Contributing

PEPit is designed to allow users to easily contribute new features to the package. Classes of functions (or operators), as well as black-box oracles, can be implemented by following the canvas from, respectively, PEPit/functions/ (or PEPit/operators/) and PEPit/primitive_steps/.

We encourage authors of research papers presenting novel optimization methods and/or novel convergence results to submit the corresponding PEPit files in the directory PEPit/examples/.

General guidelines

We kindly ask you to follow common guidelines, namely that the provided code:

  • sticks as much as possible to the PEP8 convention.

  • is commented with Google style docstring.

  • is well covered by tests.

  • is aligned with the documentation.

  • is also mentioned in the whatsnew section of the documentation.

Adding a new function or operator class

To add a new function / operator class, please follow the format used for the other function / operator classes.

In particular:

  • your class must inherit from the class Function and overwrite its add_class_constraints method.

  • the docstring must be complete. In particular, it must contain the list of attributes and arguments as well as an example of usage via the declare_function method of the class PEP. It must also contain a clickable reference to the paper introducing it.

Adding a step / an oracle

To add a new oracle / step, please add a new file containing the oracle function in PEPit/primitive_steps.

Remark that transforming the mathematical formulation of an oracle into its PEP equivalent may require additional tricks, see e.g. PEPit/primitive_steps/proximal_step.py, or PEPit/primitive_steps/linear_optimization_step.py.

Please make sure that your docstring contains the mathematical derivation of the latter from the former.

Adding a new method as an example

We don’t require a specific code format for a new example. However, we ask the associated docstring to be precisely organized as follows:

  • Define Problem solved (introducing function notations and assumptions).

  • Name method in boldface formatting.

  • Introduce performance metric, initial condition and parameters (performance_metric < tau(parameters) initialization).

  • Describe method main step and cite reference with specified algorithm.

  • Provide theoretical result (Upper/Lower/Tight in boldface formatting + performance_metric < theoretical_bound initialization).

  • Reference block containing relevant clickable references (preferably to arXiv, with the version of the paper specified) in the format: First name initial, last name (YEAR). Title. Journal or conference (Acronym of journal or conference).

  • Args block containing parameters with their type and short description.

  • Returns block containing pepit_tau and theoretical_tau.

  • Example block containing a minimal work example of the coded function.

We provide, in PEPit/examples/example_template.py, a template that can be filled very quickly to help contributors share their method easily.

New example template

PEPit.examples.example_template.wc_example_template(arg1, arg2, arg3, verbose=1)[source]

Consider the CHARACTERISTIC (eg., convex) minimization problem

\[f_\star \triangleq \min_x f(x),\]

where \(f\) is CLASS (eg., smooth convex).

This code computes a worst-case guarantee for the ** NAME OF THE METHOD **. That is, it computes the smallest possible \(\tau(arg_1, arg_2, arg_3)\) such that the guarantee

\[\text{PERFORMANCE METRIC} \leqslant \tau(arg_1, arg_2, arg_3) \text{ INITIALIZATION}\]

is valid, where NOTATION OF THE OUTPUT is the output of the ** NAME OF THE METHOD **, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of ARGUMENTS, \(\tau(arg_1, arg_2, arg_3)\) is computed as the worst-case value of \(\text{PERFORMANCE METRIC}\) when \(\text{INITIALIZATION} \leqslant 1\).

Algorithm: The NAME OF THE METHOD of this example is provided in REFERENCE WITH SPECIFIED ALGORITHM by

\begin{eqnarray} \text{MAIN STEP} \end{eqnarray}

Theoretical guarantee: A TIGHT, UPPER OR LOWER guarantee can be found in REFERENCE WITH SPECIFIED THEOREM:

\[\text{PERFORMANCE METRIC} \leqslant \text{THEORETICAL BOUND} \text{ INITIALIZATION}\]

References:

[1] F. Name, F. Name, F. Name (YEAR). Title. Conference or journal (Acronym of conference or journal).

[2] F. Name, F. Name, F. Name (YEAR). Title. Conference or journal (Acronym of journal or conference).

[3] F. Name, F. Name, F. Name (YEAR). Title. Conference or journal (Acronym of journal or conference).

Parameters
  • arg1 (type1) – description of arg1.

  • arg2 (type2) – description of arg2.

  • arg3 (type3) – description of arg3.

  • verbose (int) –

    Level of information details to print.

    • -1: No verbose at all.

    • 0: This example’s output.

    • 1: This example’s output + PEPit information.

    • 2: This example’s output + PEPit information + CVXPY details.

Returns
  • pepit_tau (float) – worst-case value

  • theoretical_tau (float) – theoretical value

Example

>>> pepit_tau, theoretical_tau = wc_example_template(arg1=value1, arg2=value2, arg3=value3, verbose=1)
``OUTPUT MESSAGE``

New example test template

def test_[NAME_METHOD](self):
    PARAMS = PARAMS

    wc, theory = wc_[NAME_METHOD](PARAMS=PARAMS, verbose=self.verbose)
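    # Keep only the assertion(s) below that match the nature of the theoretical bound.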

    # If theoretical upper bound is tight
    self.assertAlmostEqual(theory, wc, delta=self.relative_precision * theory)

    # If theoretical upper bound is not tight
    self.assertLessEqual(wc, theory * (1 + self.relative_precision))

    # If theoretical lower bound is not tight
    self.assertLessEqual(theory, wc * (1 + self.relative_precision))

PEPit: Performance Estimation in Python


This open source Python library provides a generic way to use the PEP framework in Python. Performance estimation problems were introduced in 2014 by Yoel Drori and Marc Teboulle, see [1]. PEPit is mainly based on the formalism and developments from [2, 3] by a subset of the authors of this toolbox. A friendly informal introduction to this formalism is available in this blog post, and a corresponding Matlab library is presented in [4] (PESTO).

Website and documentation of PEPit: https://pepit.readthedocs.io/

Source Code (MIT): https://github.com/PerformanceEstimation/PEPit

Using and citing the toolbox

This code comes jointly with the following reference:

B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, A. Dieuleveut (2022).
"PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python."

When using the toolbox in a project, please cite this reference via the following BibTeX entry:

@article{pepit2022,
  title={{PEPit}: computer-assisted worst-case analyses of first-order optimization methods in {P}ython},
  author={Goujaud, Baptiste and Moucer, C\'eline and Glineur, Fran\c{c}ois and Hendrickx, Julien and Taylor, Adrien and Dieuleveut, Aymeric},
  journal={arXiv preprint arXiv:2201.04040},
  year={2022}
}

Demo

This notebook provides a demonstration of how to use PEPit to obtain a worst-case guarantee on a simple algorithm (gradient descent), and a more advanced analysis of three other examples.

Installation

The library has been tested on Linux and MacOSX. It relies on the following Python modules:

  • Numpy

  • Scipy

  • Cvxpy

  • Matplotlib (for the demo notebook)

Pip installation

You can install the toolbox through PyPI with:

pip install pepit

or get the very latest version by running:

pip install -U https://github.com/PerformanceEstimation/PEPit/archive/master.zip # with --user for user install (no root)

Post installation check

After a correct installation, you should be able to import the module without errors:

import PEPit

Online environment

You can also try the package in this Binder repository.

Example

The folder Examples contains numerous introductory examples to the toolbox.

Among the other examples, the following code (see GradientMethod) generates a worst-case scenario for n iterations of the gradient method, applied to the minimization of a smooth (possibly strongly) convex function f(x). More precisely, this code snippet allows computing the worst-case value of f(x_n) - f_* when x_n is generated by gradient descent, and when ||x_0 - x_*||^2 <= 1.

from PEPit import PEP
from PEPit.functions import SmoothStronglyConvexFunction


def wc_gradient_descent(L, gamma, n, verbose=1):
    """
    Consider the convex minimization problem

    .. math:: f_\\star \\triangleq \\min_x f(x),

    where :math:`f` is :math:`L`-smooth and convex.

    This code computes a worst-case guarantee for **gradient descent** with fixed step-size :math:`\\gamma`.
    That is, it computes the smallest possible :math:`\\tau(n, L, \\gamma)` such that the guarantee

    .. math:: f(x_n) - f_\\star \\leqslant \\tau(n, L, \\gamma) \\|x_0 - x_\\star\\|^2

    is valid, where :math:`x_n` is the output of gradient descent with fixed step-size :math:`\\gamma`, and
    where :math:`x_\\star` is a minimizer of :math:`f`.

    In short, for given values of :math:`n`, :math:`L`, and :math:`\\gamma`, :math:`\\tau(n, L, \\gamma)` is computed as the worst-case
    value of :math:`f(x_n)-f_\\star` when :math:`\\|x_0 - x_\\star\\|^2 \\leqslant 1`.

    **Algorithm**:
    Gradient descent is described by

    .. math:: x_{t+1} = x_t - \\gamma \\nabla f(x_t),

    where :math:`\\gamma` is a step-size.

    **Theoretical guarantee**:
    When :math:`\\gamma \\leqslant \\frac{1}{L}`, the **tight** theoretical guarantee can be found in [1, Theorem 3.1]:

    .. math:: f(x_n)-f_\\star \\leqslant \\frac{L}{4nL\\gamma+2} \\|x_0-x_\\star\\|^2,

    which is tight on some Huber loss functions.

    **References**:

    `[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel
    approach. Mathematical Programming 145(1–2), 451–482.
    <https://arxiv.org/pdf/1206.3209.pdf>`_

    Args:
        L (float): the smoothness parameter.
        gamma (float): step-size.
        n (int): number of iterations.
        verbose (int): Level of information details to print.

                        - -1: No verbose at all.
                        - 0: This example's output.
                        - 1: This example's output + PEPit information.
                        - 2: This example's output + PEPit information + CVXPY details.

    Returns:
        pepit_tau (float): worst-case value
        theoretical_tau (float): theoretical value

    Example:
        >>> L = 3
        >>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=1)
        (PEPit) Setting up the problem: size of the main PSD matrix: 7x7
        (PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
        (PEPit) Setting up the problem: Adding initial conditions and general constraints ...
        (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
        (PEPit) Setting up the problem: interpolation conditions for 1 function(s)
                         function 1 : Adding 30 scalar constraint(s) ...
                         function 1 : 30 scalar constraint(s) added
        (PEPit) Compiling SDP
        (PEPit) Calling SDP solver
        (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.16666664596175398
        *** Example file: worst-case performance of gradient descent with fixed step-sizes ***
                PEPit guarantee:         f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
                Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2

    """

    # Instantiate PEP
    problem = PEP()

    # Declare a smooth convex function (using the strongly convex class with mu=0)
    func = problem.declare_function(SmoothStronglyConvexFunction, mu=0, L=L)

    # Start by defining its unique optimal point xs = x_* and corresponding function value fs = f_*
    xs = func.stationary_point()
    fs = func(xs)

    # Then define the starting point x0 of the algorithm
    x0 = problem.set_initial_point()

    # Set the initial constraint that is the distance between x0 and x^*
    problem.set_initial_condition((x0 - xs) ** 2 <= 1)

    # Run n steps of the GD method
    x = x0
    for _ in range(n):
        x = x - gamma * func.gradient(x)

    # Set the performance metric to the function values accuracy
    problem.set_performance_metric(func(x) - fs)

    # Solve the PEP
    pepit_verbose = max(verbose, 0)
    pepit_tau = problem.solve(verbose=pepit_verbose)

    # Compute theoretical guarantee (for comparison)
    theoretical_tau = L / (2 * (2 * n * L * gamma + 1))

    # Print conclusion if required
    if verbose != -1:
        print('*** Example file: worst-case performance of gradient descent with fixed step-sizes ***')
        print('\tPEPit guarantee:\t f(x_n)-f_* <= {:.6} ||x_0 - x_*||^2'.format(pepit_tau))
        print('\tTheoretical guarantee:\t f(x_n)-f_* <= {:.6} ||x_0 - x_*||^2'.format(theoretical_tau))

    # Return the worst-case guarantee of the evaluated method (and the reference theoretical value)
    return pepit_tau, theoretical_tau


if __name__ == "__main__":
    L = 3
    pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=1)

Included tools

A lot of common optimization methods can be studied through this framework, using numerous steps and under a large variety of function / operator classes.
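
For instance, function classes and primitive steps combine naturally. The following minimal sketch performs one proximal gradient step on a sum f1 + f2, where f1 is smooth convex and f2 is convex (it assumes the class ConvexFunction and the helper proximal_step are importable as shown; adapt the names to the current API if needed):

from PEPit import PEP
from PEPit.functions import SmoothConvexFunction, ConvexFunction
from PEPit.primitive_steps import proximal_step

problem = PEP()

# Declare an L-smooth convex function and a (possibly non-smooth) convex function.
f1 = problem.declare_function(SmoothConvexFunction, L=1.)
f2 = problem.declare_function(ConvexFunction)

# Starting point of the method.
x0 = problem.set_initial_point()

# One proximal gradient step: x1 = prox_{gamma * f2}(x0 - gamma * grad f1(x0)).
gamma = 1.
x1, _, _ = proximal_step(x0 - gamma * f1.gradient(x0), f2, gamma)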

PEPit provides the following steps (often referred to as “oracles”):

PEPit provides the following function classes:

PEPit provides the following operator classes:

Authors

This toolbox has been created by the authors of the reference above: B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor and A. Dieuleveut.

Acknowledgments

The authors would like to thank Rémi Flamary for his feedback on preliminary versions of the toolbox, as well as for his support regarding the continuous integration.

Contributions

All external contributions are welcome. Please read the contribution guidelines.

References

[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming 145(1–2), 451–482.

[2] A. Taylor, J. Hendrickx, F. Glineur (2017). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1-2), 307-345.

[3] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.

[4] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).

[5] A. d’Aspremont, D. Scieur, A. Taylor (2021). Acceleration Methods. Foundations and Trends in Optimization: Vol. 5, No. 1-2.

[6] O. Güler (1992). New proximal point algorithms for convex minimization. SIAM Journal on Optimization, 2(4):649–664.

[7] Y. Drori (2017). The exact information-based complexity of smooth convex minimization. Journal of Complexity, 39, 1-16.

[8] E. De Klerk, F. Glineur, A. Taylor (2017). On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optimization Letters, 11(7), 1185-1199.

[9] B.T. Polyak (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics.

[10] E. Ghadimi, H. R. Feyzmahdavian, M. Johansson (2015). Global convergence of the Heavy-ball method for convex optimization. European Control Conference (ECC).

[11] E. De Klerk, F. Glineur, A. Taylor (2020). Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM Journal on Optimization, 30(3), 2053-2082.

[12] O. Gannot (2021). A frequency-domain analysis of inexact gradient methods. Mathematical Programming.

[13] D. Kim, J. Fessler (2016). Optimized first-order methods for smooth convex minimization. Mathematical Programming 159.1-2: 81-107.

[14] S. Cyrus, B. Hu, B. Van Scoy, L. Lessard (2018). A robust accelerated optimization algorithm for strongly convex functions. American Control Conference (ACC).

[15] Y. Nesterov (2003). Introductory lectures on convex optimization: A basic course. Springer Science & Business Media.

[16] S. Boyd, L. Xiao, A. Mutapcic (2003). Subgradient Methods (lecture notes).

[17] Y. Drori, M. Teboulle (2016). An optimal variant of Kelley’s cutting-plane method. Mathematical Programming, 160(1), 321-351.

[18] B. Van Scoy, R. A. Freeman, K. M. Lynch (2018). The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Letters, 2(1), 49-54.

[19] P. Patrinos, L. Stella, A. Bemporad (2014). Douglas-Rachford splitting: Complexity estimates and accelerated variants. In 53rd IEEE Conference on Decision and Control (CDC).

[20] Y. Censor, S.A. Zenios (1992). Proximal minimization algorithm with D-functions. Journal of Optimization Theory and Applications, 73(3), 451-464.

[21] E. Ryu, S. Boyd (2016). A primer on monotone operator methods. Applied and Computational Mathematics 15(1), 3-43.

[22] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.

[23] P. Giselsson, and S. Boyd (2016). Linear convergence and metric selection in Douglas-Rachford splitting and ADMM. IEEE Transactions on Automatic Control, 62(2), 532-544.

[24] M. Frank, P. Wolfe (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2), 95-110.

[25] M. Jaggi (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In 30th International Conference on Machine Learning (ICML).

[26] A. Auslender, M. Teboulle (2006). Interior gradient and proximal methods for convex and conic optimization. SIAM Journal on Optimization 16.3 (2006): 697-725.

[27] H.H. Bauschke, J. Bolte, M. Teboulle (2017). A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications. Mathematics of Operations Research, 42(2), 330-348.

[28] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.

[29] A. Taylor, J. Hendrickx, F. Glineur (2018). Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 178(2), 455-476.

[30] B. Polyak (1987). Introduction to Optimization. Optimization Software New York.

[31] L. Lessard, B. Recht, A. Packard (2016). Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization 26(1), 57–95.

[32] D. Davis, W. Yin (2017). A three-operator splitting scheme and its optimization applications. Set-valued and variational analysis, 25(4), 829-858.

[33] A. Taylor (2017). Convex interpolation and performance estimation of first-order methods for convex optimization. PhD Thesis, UCLouvain.

[34] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2021). The exact worst-case convergence rate of the gradient method with fixed step lengths for L-smooth functions. arXiv 2104.05468.

[35] J. Bolte, S. Sabach, M. Teboulle, Y. Vaisbourd (2018). First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3), 2131-2151.

[36] A. Defazio (2016). A simple practical accelerated method for finite sums. Advances in Neural Information Processing Systems (NIPS), 29, 676-684.

[37] A. Defazio, F. Bach, S. Lacoste-Julien (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems (NIPS).

[38] B. Hu, P. Seiler, L. Lessard (2020). Analysis of biased stochastic gradient descent using sequential semidefinite programs. Mathematical programming (to appear).

[39] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).

[40] D. Kim (2021). Accelerated proximal point method for maximally monotone operators. Mathematical Programming, 1-31.

[41] W. Moursi, L. Vandenberghe (2019). Douglas–Rachford Splitting for the Sum of a Lipschitz Continuous and a Strongly Monotone Operator. Journal of Optimization Theory and Applications 183, 179–198.

[42] G. Gu, J. Yang (2020). Tight sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problem. SIAM Journal on Optimization, 30(3), 1905-1921.

[43] F. Lieder (2021). On the convergence rate of the Halpern-iteration. Optimization Letters, 15(2), 405-418.

[44] F. Lieder (2018). Projection Based Methods for Conic Linear Programming Optimal First Order Complexities and Norm Constrained Quasi Newton Methods. PhD thesis, HHU Düsseldorf.

[45] Y. Nesterov (1983). A method for solving the convex programming problem with convergence rate O(1/k^2). Doklady Akademii Nauk SSSR, 269, 543-547.

[46] N. Bansal, A. Gupta (2019). Potential-function proofs for gradient methods. Theory of Computing, 15(1), 1-32.

[47] M. Barre, A. Taylor, F. Bach (2021). A note on approximate accelerated forward-backward methods with absolute and relative errors, and possibly strongly convex objectives. arXiv:2106.15536v2.

[48] J. Eckstein and W. Yao (2018). Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Mathematical Programming, 170(2), 417-444.

[49] M. Barré, A. Taylor, A. d’Aspremont (2020). Complexity guarantees for Polyak steps with momentum. In Conference on Learning Theory (COLT).

[50] D. Kim, J. Fessler (2017). On the convergence analysis of the optimized gradient method. Journal of Optimization Theory and Applications, 172(1), 187-205.

[51] S. Diamond, S. Boyd (2016). CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research (JMLR), 17(83), 1-5.

[52] A. Agrawal, R. Verschueren, S. Diamond, S. Boyd (2018). A rewriting system for convex optimization problems. Journal of Control and Decision (JCD), 5(1), 42-60.

[53] A. Taylor, B. Van Scoy, L. Lessard (2018). Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees. International Conference on Machine Learning (ICML).

[54] C. Guille-Escuret, B. Goujaud, A. Ibrahim, I. Mitliagkas (2022). Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound.

[55] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.

[56] B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, A. Dieuleveut (2022). PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python.
