Welcome to PEPit’s documentation!
Quick start guide
The toolbox implements the performance estimation approach, pioneered by Drori and Teboulle [2]. A gentle introduction to performance estimation problems is provided in this blog post.
The PEPit implementation is in line with the framework as exposed in [3,4] and follow-up works (for which proper references are provided in the example files). A gentle introduction to the toolbox is provided in [1].
When to use PEPit?
The general purpose of the toolbox is to help researchers produce worst-case guarantees for their favorite first-order methods.
The toolbox is provided as a Python package. For users who are more comfortable with Matlab, we refer to PESTO.
How to use PEPit?
Installation
PEPit is available on PyPI and can therefore be installed by simply running
pip install pepit
Now you are all set! You should be able to run
import PEPit
in a Python interpreter.
Basic usage: getting worst-case guarantees
The main object is called a PEP. It stores the problem you will describe to PEPit.
First create a PEP object.
from PEPit import PEP
problem = PEP()
From there, you can declare functions through the declare_function method.
from PEPit.functions import SmoothConvexFunction
L = 1.  # smoothness parameter (any positive value works here)
func = problem.declare_function(SmoothConvexFunction, L=L)
Warning
To enforce the same subgradient to be returned each time one is required, we introduced the attribute reuse_gradient in the Function class. Some classes of functions contain only differentiable functions (e.g. smooth convex functions). For those, the reuse_gradient attribute is set to True by default.
When the same subgradient is used several times in the same code and it is difficult to keep track of it (through proximal calls for instance), it may be useful to set this parameter to True even if the function is not differentiable. This reduces the number of constraints and improves the accuracy of the underlying semidefinite program. See for instance the code for the improved interior method or NoLips in Bregman divergence.
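For instance, one may declare a (non-differentiable) convex function while enforcing a single subgradient per point as follows; this is only a sketch, and whether it is appropriate depends on the method being analyzed.
from PEPit import PEP
from PEPit.functions import ConvexFunction

problem = PEP()
# Enforce one subgradient per point even though ConvexFunction is not differentiable in general.
func = problem.declare_function(ConvexFunction, reuse_gradient=True)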
You can also define a new point with
x0 = problem.set_initial_point()
and give a name to the value of func at x0
f0 = func(x0)
as well as the (sub)gradient of func at x0
g0 = func.gradient(x0)
or
g0 = func.subgradient(x0)
There is a more compact way to do it using the oracle method.
g0, f0 = func.oracle(x0)
You can declare a stationary point of func, defined as a point at which the (sub)gradient of func is zero, as follows:
xs = func.stationary_point()
You can combine points and gradients naturally. For instance, running n steps of gradient descent with step-size gamma writes
gamma = 1 / L  # step-size (illustrative choice)
n = 10         # number of iterations (illustrative choice)
x = x0
for _ in range(n):
    x = x - gamma * func.gradient(x)
You must then declare an initial condition, such as
problem.set_initial_condition((x0 - xs) ** 2 <= 1)
as well as a performance metric, for instance the final objective accuracy (where fs denotes the optimal function value)
fs = func(xs)
problem.set_performance_metric(func(x) - fs)
Finally, you can ask PEPit to solve the system for you and return the worst-case guarantee of your method.
pepit_tau = problem.solve()
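Putting the previous pieces together, here is a minimal end-to-end sketch computing a worst-case guarantee for gradient descent on an L-smooth convex function (the step-size 1/L and the number of iterations are illustrative choices):
from PEPit import PEP
from PEPit.functions import SmoothConvexFunction

L = 1.         # smoothness parameter
gamma = 1 / L  # step-size
n = 5          # number of iterations

problem = PEP()
func = problem.declare_function(SmoothConvexFunction, L=L)

xs = func.stationary_point()  # a minimizer of func
fs = func(xs)                 # the optimal function value

x0 = problem.set_initial_point()
problem.set_initial_condition((x0 - xs) ** 2 <= 1)

x = x0
for _ in range(n):
    x = x - gamma * func.gradient(x)

problem.set_performance_metric(func(x) - fs)
pepit_tau = problem.solve()   # worst-case value of func(x) - fs
print(pepit_tau)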
Warning
Performance estimation problems consist in reformulating the problem of finding a worst-case scenario as a semidefinite program (SDP). The dimension of the corresponding SDP is directly related to the number of function and gradient evaluations in a given code.
We encourage users to perform as few function and subgradient evaluations as possible, as the size of the corresponding SDP grows with the number of subgradient/function evaluations at different points.
Derive proofs and adversarial objectives
When one calls the solve method, PEPit does much more than just finding the worst-case value.
In particular, it stores possible values of the points, gradients and function values that achieve this worst-case guarantee, as well as the dual variable values associated with each constraint.
Values and dual variable values
Let’s consider the above example. After solving the PEP, you can ask PEPit
print(x.eval())
which returns one possible value of the output of the described algorithm at optimum.
You can also ask for gradients and function values
print(func.gradient(x).eval())
print(func(x).eval())
Recovering the values of all the points, gradients and function values at optimum allows you to reconstruct the function that achieves the worst-case complexity of your method.
You can also get the dual variable values of the constraints at optimum, which essentially allows you to write the proof of the worst-case guarantee you just obtained.
Let’s consider again the previous example, but this time, let’s give a name to a constraint before using it.
constraint = (x0 - xs) ** 2 <= 1
problem.set_initial_condition(constraint)
Then, after solving the system, you can require its associated dual variable value with
constraint.eval_dual()
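Putting the two previous snippets together, a minimal sketch of the pattern for recovering a dual certificate reads
constraint = (x0 - xs) ** 2 <= 1   # name the initial condition
problem.set_initial_condition(constraint)

pepit_tau = problem.solve()

print(constraint.eval_dual())      # dual multiplier associated with the initial condition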
Output pdf
In a later release, we will provide an option to output a pdf file summarizing all those pieces of information.
Simpler worst-case scenarios
Sometimes, the PEP admits several solutions. To obtain simpler worst-case scenarios, one may prefer low-dimensional solutions to the SDP. To this end, we provide heuristics, based on trace norm or log-det minimization, for reducing the dimension of the numerical solution to the SDP.
You can use the trace heuristic by specifying
problem.solve(dimension_reduction_heuristic="trace")
You can run n iterations of the logdet heuristic by specifying “logdet{n}”. For example, to use 5 iterations of the logdet heuristic:
problem.solve(dimension_reduction_heuristic="logdet5")
Finding Lyapunov functions
In a later release, we will provide tools to help find good Lyapunov functions for studying a given method.
This tool will be based on the very recent work [7].
References
[1] B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, A. Dieuleveut. PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python.
[2] Drori, Yoel, and Marc Teboulle. Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming 145.1-2 (2014): 451-482
[3] Taylor, Adrien B., Julien M. Hendrickx, and François Glineur. Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming 161.1-2 (2017): 307-345.
[4] Taylor, Adrien B., Julien M. Hendrickx, and François Glineur. Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization 27.3 (2017): 1283-1313.
[5] Steven Diamond and Stephen Boyd. CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research (JMLR) 17.83.1–5 (2016).
[6] Agrawal, Akshay and Verschueren, Robin and Diamond, Steven and Boyd, Stephen. A rewriting system for convex optimization problems. Journal of Control and Decision (JCD) 5.1.42–60 (2018).
[7] Adrien Taylor, Bryan Van Scoy, Laurent Lessard. Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees. International Conference on Machine Learning (ICML).
API and modules
Main modules
PEP
- class PEPit.PEP[source]
Bases: object
The class PEP is the main class of this framework. A PEP object encodes a complete performance estimation problem. It stores the following information.
- Attributes
list_of_functions (list) – list of leaf Function objects that are defined through the pipeline.
list_of_points (list) – list of Point objects that are defined out of the scope of a Function. Typically the initial Point.
list_of_constraints (list) – list of Constraint objects that are defined out of the scope of a Function. Typically the initial Constraint.
list_of_performance_metrics (list) – list of Expression objects. The PEP maximizes the minimum of all performance metrics.
counter (int) – counts the number of PEP objects. Ideally, only one is defined at a time.
A PEP object can be instantiated without any argument.
Example
>>> pep = PEP()
- add_constraint(constraint)[source]
Store a new Constraint to the list of constraints of this PEP.
- Parameters
constraint (Constraint) – typically resulting from a comparison of 2 Expression objects.
- Raises
AssertionError – if provided constraint is not a Constraint object.
- add_psd_matrix(matrix_of_expressions)[source]
Store a new matrix of Expressions that we enforce to be positive semidefinite.
- Parameters
matrix_of_expressions (2D ndarray of Expression) – a square matrix of Expression.
- Raises
AssertionError – if provided matrix is not a square matrix.
TypeError – if provided matrix does not contain only Expressions.
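A sketch of the usage, mirroring the PSDMatrix example further below (the variable names are illustrative):
>>> import numpy as np
>>> from PEPit import PEP, Expression
>>> problem = PEP()
>>> matrix_of_expressions = np.array([Expression() for _ in range(4)]).reshape(2, 2)
>>> # Enforce this 2x2 matrix of expressions to be symmetric PSD.
>>> problem.add_psd_matrix(matrix_of_expressions)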
- declare_function(function_class, **kwargs)[source]
Instantiate a leaf Function and store it in the attribute list_of_functions.
- Parameters
function_class (class) – a subclass of Function that overwrites the add_class_constraints method.
kwargs (dict) – dictionary of parameters that characterize the function class. It can also contain the boolean reuse_gradient, which enforces using only one subgradient per point.
- Returns
f (Function) – the newly created function.
- static get_nb_eigenvalues_and_corrected_matrix(M)[source]
Compute the number of truly non-zero eigenvalues of M, and recompute M with corrected eigenvalues.
- Parameters
M (nd.array) – a 2 dimensional array, supposedly symmetric.
- Returns
nb_eigenvalues (int) – The number of eigenvalues of M estimated to be strictly positive.
eig_threshold (float) – The threshold used to determine whether an eigenvalue is 0 or not.
corrected_S (nd.array) – Updated M with zero eigenvalues instead of small ones.
- send_constraint_to_cvxpy(constraint, F, G)[source]
Transform a PEPit Constraint into a CVXPY one.
- Parameters
constraint (Constraint) – a Constraint object to be sent to CVXPY.
F (CVXPY Variable) – a CVXPY Variable referring to function values.
G (CVXPY Variable) – a CVXPY Variable referring to points and gradients.
- Returns
cvxpy_constraint (CVXPY constraint) – the corresponding CVXPY constraint.
- Raises
ValueError – if the attribute equality_or_inequality of the Constraint is neither “equality” nor “inequality”.
- send_lmi_constraint_to_cvxpy(psd_counter, psd_matrix, F, G, verbose)[source]
Transform a PEPit PSDMatrix into a CVXPY symmetric PSD matrix.
- Parameters
psd_counter (int) – a counter useful for the verbose mode.
psd_matrix (PSDMatrix) – a matrix of expressions that is constrained to be PSD.
F (CVXPY Variable) – a CVXPY Variable referring to function values.
G (CVXPY Variable) – a CVXPY Variable referring to points and gradients.
verbose (int) –
Level of information details to print (Override the CVXPY solver verbose parameter).
0: No verbose at all.
1: PEPit information is printed but not CVXPY’s
2: Both PEPit and CVXPY details are printed
- Returns
cvxpy_constraints_list (list of CVXPY constraints) – the PSD constraint as well as correspondence between the matrix and its elements.
- set_initial_condition(condition)[source]
Store a new Constraint to the list of constraints of this PEP. Typically a condition of the form \(\|x_0 - x_\star\|^2 \leq 1\).
- Parameters
condition (Constraint) – typically resulting from a comparison of 2 Expression objects.
- Raises
AssertionError – if provided constraint is not a Constraint object.
- set_initial_point()[source]
Create a new leaf Point and store it in the attribute list_of_points.
- Returns
x (Point) – the newly created Point.
- set_performance_metric(expression)[source]
Store a performance metric in the attribute list_of_performance_metrics. The objective of the PEP (which is maximized) is the minimum of the elements of list_of_performance_metrics.
- Parameters
expression (Expression) – a new performance metric.
- solve(verbose=1, return_full_cvxpy_problem=False, dimension_reduction_heuristic=None, eig_regularization=0.001, tol_dimension_reduction=1e-05, **kwargs)[source]
Transform the PEP into its SDP form, and solve it.
- Parameters
verbose (int) – Level of information details to print (overrides the CVXPY solver verbose parameter). 0: No verbose at all. 1: PEPit information is printed but not CVXPY’s. 2: Both PEPit and CVXPY details are printed.
return_full_cvxpy_problem (bool) – If True, return the cvxpy Problem object. If False, return the worst case value only. Set to False by default.
dimension_reduction_heuristic (str, optional) –
A heuristic to reduce the dimension of the solution (rank of the Gram matrix). Set to None to deactivate it (default value). Available heuristics are:
”trace”: minimize \(Tr(G)\)
”logdet{an integer n}”: minimize \(\log\left(\mathrm{Det}(G)\right)\) using n iterations of local approximation problems.
eig_regularization (float, optional) – The regularization we use to make \(G + \mathrm{eig_regularization}I_d \succ 0\). (only used when “dimension_reduction_heuristic” is not None) The default value is 1e-5.
tol_dimension_reduction (float, optional) – The error tolerance in the heuristic minimization problem. Precisely, the second problem minimizes “optimal_value - tol” (only used when “dimension_reduction_heuristic” is not None) The default value is 1e-5.
kwargs (keywords, optional) – Additional CVXPY solver specific arguments.
- Returns
float or cp.Problem – Value of the performance metric, or the cp.Problem object corresponding to the SDP. The value only is returned by default.
Point
- class PEPit.Point(is_leaf=True, decomposition_dict=None)[source]
Bases: object
A Point encodes an element of a pre-Hilbert space, either a point or a gradient.
- Attributes
_is_leaf (bool) – True if self is defined from scratch (not as a linear combination of other Point objects). False if self is defined as a linear combination of other points.
_value (nd.array) – numerical value of self obtained after solving the PEP via an SDP solver. Set to None before the call to the method PEP.solve from the PEP.
decomposition_dict (dict) – decomposition of self as a linear combination of leaf Point objects. Keys are Point objects and values are their associated coefficients.
counter (int) – counts the number of leaf Point objects.
Point objects can be added or subtracted together. They can also be multiplied and divided by a scalar value.
Example
>>> point1 = Point()
>>> point2 = Point()
>>> new_point = (- point1 + point2) / 5
As in any pre-Hilbert space, there exists a scalar product. Therefore, Point objects can be multiplied together.
Example
>>> point1 = Point()
>>> point2 = Point()
>>> new_expr = point1 * point2
The output is a scalar of type Expression.
The corresponding squared norm can also be computed.
Example
>>> point = Point()
>>> new_expr = point ** 2
Point objects can also be instantiated via the following arguments.
- Parameters
is_leaf (bool) – True if self is a Point defined from scratch (not as a linear combination of other Point objects). False if self is a linear combination of existing Point objects.
decomposition_dict (dict) – decomposition of self as a linear combination of leaf Point objects. Keys are Point objects and values are their associated coefficients.
Note
If is_leaf is True, then decomposition_dict must be provided as None. Then self.decomposition_dict will be set to {self: 1}.
Instantiating the Point object of the first example can be done by
Example
>>> point1 = Point()
>>> point2 = Point()
>>> new_point = Point(is_leaf=False, decomposition_dict={point1: -1/5, point2: 1/5})
Expression
- class PEPit.Expression(is_leaf=True, decomposition_dict=None)[source]
Bases: object
An Expression is a linear combination of function values, inner products of points and / or gradients (products of 2 Point objects), and constant scalar values.
- Attributes
_is_leaf (bool) – True if self is a function value defined from scratch (not as a linear combination of other function values). False if self is a linear combination of existing Expression objects.
_value (float) – numerical value of self obtained after solving the PEP via an SDP solver. Set to None before the call to the method PEP.solve from the PEP.
decomposition_dict (dict) – decomposition of self as a linear combination of leaf Expression objects. Keys are Expression objects or tuples of 2 Point objects, and values are their associated coefficients.
counter (int) – counts the number of leaf Expression objects.
Expression objects can be added or subtracted together. They can also be added, subtracted, multiplied and divided by a scalar value.
Example
>>> expr1 = Expression()
>>> expr2 = Expression()
>>> new_expr = (- expr1 + expr2 - 1) / 5
Expression objects can also be compared together.
Example
>>> expr1 = Expression()
>>> expr2 = Expression()
>>> inequality1 = expr1 <= expr2
>>> inequality2 = expr1 >= expr2
>>> equality = expr1 == expr2
The three outputs inequality1, inequality2 and equality are then Constraint objects.
Expression objects can also be instantiated via the following arguments.
- Parameters
is_leaf (bool) – True if self is a function value defined from scratch (not as a linear combination of other function values). False if self is a linear combination of existing Expression objects.
decomposition_dict (dict) – decomposition of self as a linear combination of leaf Expression objects. Keys are Expression objects or tuples of 2 Point objects, and values are their associated coefficients.
Note
If is_leaf is True, then decomposition_dict must be provided as None. Then self.decomposition_dict will be set to {self: 1}.
Instantiating the Expression object of the first example can be done by
Example
>>> expr1 = Expression()
>>> expr2 = Expression()
>>> new_expr = Expression(is_leaf=False, decomposition_dict={expr1: -1/5, expr2: 1/5, 1: -1/5})
- eval()[source]
Compute, store and return the value of this Expression.
- Returns
self._value (np.array) – Value of this Expression after the corresponding PEP was solved numerically.
- Raises
ValueError ("The PEP must be solved to evaluate Expressions!") – if the PEP has not been solved yet.
TypeError ("Expressions are made of function values, inner products and constants only!")
Constraint
- class PEPit.Constraint(expression, equality_or_inequality)[source]
Bases: object
A Constraint encodes either an equality or an inequality between two Expression objects.
A Constraint must be understood either as self.expression = 0 or self.expression \(\leqslant\) 0 depending on the value of self.equality_or_inequality.
- Attributes
expression (Expression) – The Expression that is compared to 0.
equality_or_inequality (str) – “equality” or “inequality”. Encodes the type of constraint.
_value (float) – numerical value of self.expression obtained after solving the PEP via an SDP solver. Set to None before the call to the method PEP.solve from the PEP.
_dual_variable_value (float) – the associated dual variable from the numerical solution to the corresponding PEP. Set to None before the call to PEP.solve from the PEP.
counter (int) – counts the number of Constraint objects.
A Constraint results from a comparison between two Expression objects.
Example
>>> from PEPit import Expression
>>> expr1 = Expression()
>>> expr2 = Expression()
>>> inequality1 = expr1 <= expr2
>>> inequality2 = expr1 >= expr2
>>> equality = expr1 == expr2
Constraint objects can also be instantiated via the following arguments.
- Parameters
expression (Expression) – an object of class Expression.
equality_or_inequality (str) – either ‘equality’ or ‘inequality’.
Instantiating the Constraint objects of the first example can be done by
Example
>>> from PEPit import Expression
>>> expr1 = Expression()
>>> expr2 = Expression()
>>> inequality1 = Constraint(expression=expr1-expr2, equality_or_inequality="inequality")
>>> inequality2 = Constraint(expression=expr2-expr1, equality_or_inequality="inequality")
>>> equality = Constraint(expression=expr1-expr2, equality_or_inequality="equality")
- Raises
AssertionError – if the provided equality_or_inequality argument is neither “equality” nor “inequality”.
- eval()[source]
Compute, store and return the value of the underlying Expression of this Constraint.
- Returns
self._value (np.array) – The value of the underlying Expression of this Constraint after the corresponding PEP was solved numerically.
- Raises
ValueError ("The PEP must be solved to evaluate Constraints!") – if the PEP has not been solved yet.
- eval_dual()[source]
Compute, store and return the value of the dual variable of this Constraint.
- Returns
self._dual_variable_value (float) – The value of the dual variable of this Constraint after the corresponding PEP was solved numerically.
- Raises
ValueError ("The PEP must be solved to evaluate Constraints dual variables!") – if the PEP has not been solved yet.
Symmetric positive semidefinite matrix
- class PEPit.PSDMatrix(matrix_of_expressions)[source]
Bases: object
A PSDMatrix encodes a square matrix of Expression objects that is constrained to be symmetric PSD.
- Attributes
matrix_of_expressions (2D ndarray of Expression) – a square matrix of Expression objects.
shape (tuple of ints) – the shape of the underlying matrix of Expression objects.
_value (2D ndarray of floats) – numerical values of the Expression objects obtained after solving the PEP via an SDP solver. Set to None before the call to the method PEP.solve from the PEP.
_dual_variable_value (2D ndarray of floats) – the associated dual matrix from the numerical solution to the corresponding PEP. Set to None before the call to PEP.solve from the PEP.
entries_dual_variable_value (2D ndarray of floats) – the dual of each correspondence between entries of the matrix and the underlying Expression objects.
counter (int) – counts the number of PSDMatrix objects.
Example
>>> # Defining t <= sqrt(expr) for a given expression expr.
>>> import numpy as np
>>> from PEPit import Expression
>>> from PEPit import PSDMatrix
>>> expr = Expression()
>>> t = Expression()
>>> psd_matrix = PSDMatrix(matrix_of_expressions=np.array([[expr, t], [t, 1]]))
>>> # The last line means that the matrix [[expr, t], [t, 1]] is constrained to be PSD.
>>> # This is equivalent to det([[expr, t], [t, 1]]) >= 0, i.e. expr - t^2 >= 0.
PSDMatrix objects are instantiated via the following argument.
- Parameters
matrix_of_expressions (2D ndarray of Expression) – a square matrix of Expression.
Instantiating the PSDMatrix objects of the first example can be done by
Example
>>> import numpy as np
>>> from PEPit import Expression
>>> from PEPit import PSDMatrix
>>> matrix_of_expressions = np.array([Expression() for i in range(4)]).reshape(2, 2)
>>> psd_matrix = PSDMatrix(matrix_of_expressions=matrix_of_expressions)
- Raises
AssertionError – if provided matrix is not a square matrix.
TypeError – if provided matrix does not contain only Expressions and / or scalar values.
- eval()[source]
Compute, store and return the value of the underlying matrix of Expression objects.
- Returns
self._value (np.array) – The value of the underlying matrix of Expression objects after the corresponding PEP was solved numerically.
- Raises
ValueError ("The PEP must be solved to evaluate PSDMatrix!") – if the PEP has not been solved yet.
- eval_dual()[source]
Compute, store and return the value of the dual variable of this PSDMatrix.
- Returns
self._dual_variable_value (ndarray of floats) – The value of the dual variable of this PSDMatrix after the corresponding PEP was solved numerically.
- Raises
ValueError ("The PEP must be solved to evaluate PSDMatrix dual variables!") – if the PEP has not been solved yet.
Function
- class PEPit.Function(is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]
Bases: object
A Function object encodes a function or an operator.
Warning
This class must be overwritten by a child class that encodes some conditions on the Function. In particular, the method add_class_constraints must be overwritten. See the PEPit.functions and PEPit.operators modules.
Some Function objects are defined from scratch as leaf Function objects, and some are linear combinations of pre-existing ones.
- Attributes
_is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
list_of_points (list) – A list of triplets storing the points where this Function has been evaluated, as well as the associated subgradients and function values.
list_of_stationary_points (list) – The sublist of self.list_of_points of stationary points (characterized by some subgradient=0).
list_of_constraints (list) – The list of Constraint objects associated with this Function.
counter (int) – counts the number of leaf Function objects.
Note
PEPit was initially designed for evaluating the performances of optimization algorithms. Operators are represented in the same way as functions, but function values must not be used (they do not have any meaning in this framework). Use gradient to access an operator value.
Function objects can be added or subtracted together. They can also be multiplied and divided by a scalar value.
Example
>>> func1 = Function()
>>> func2 = Function()
>>> new_func = (- func1 + func2) / 5
Function objects can also be instantiated via the following arguments.
- Parameters
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Note
If is_leaf is True, then decomposition_dict must be provided as None. Then self.decomposition_dict will be set to {self: 1}.
Note
reuse_gradient is typically set to True when this Function is differentiable, that is when there exists only one subgradient per Point.
Instantiating the Function object of the first example can be done by
Example
>>> func1 = Function()
>>> func2 = Function()
>>> new_func = Function(is_leaf=False, decomposition_dict={func1: -1/5, func2: 1/5})
- add_class_constraints()[source]
Warning
Needs to be overwritten with interpolation conditions (or necessary conditions for interpolation for obtaining possibly non-tight upper bounds on the worst-case performance).
This method is run by the PEP just before solving the problem. It evaluates interpolation conditions for the 2 lists of points that are stored in this Function.
- Raises
NotImplementedError – This method must be overwritten in child classes.
- add_constraint(constraint)[source]
Store a new Constraint to the list of constraints of this Function.
- Parameters
constraint (Constraint) – typically resulting from a comparison of 2 Expression objects.
- Raises
AssertionError – if provided constraint is not a Constraint object.
- add_point(triplet)[source]
Add a triplet (point, gradient, function_value) to the list of points of this function.
- Parameters
triplet (tuple) – A tuple containing 3 elements: point (Point), gradient (Point), and function value (Expression).
- add_psd_matrix(matrix_of_expressions)[source]
Store a new matrix of Expressions that we enforce to be positive semidefinite.
- Parameters
matrix_of_expressions (2D ndarray of Expression) – a square matrix of Expression.
- Raises
AssertionError – if provided matrix is not a square matrix.
TypeError – if provided matrix does not contain only Expressions.
- fixed_point()[source]
This routine outputs a fixed point of this function, that is \(x\) such that \(x\in\partial f(x)\). If self is an operator \(A\), the fixed point is such that \(Ax = x\).
- Returns
x (Point) – a fixed point of the differential of self.
x (Point) – nabla f(x) = x.
fx (Expression) – a function value (useful only if self is a function).
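A usage sketch (the operator class and its parameter are illustrative choices; see the operators classes below):
>>> from PEPit import PEP
>>> from PEPit.operators import LipschitzOperator
>>> problem = PEP()
>>> A = problem.declare_function(LipschitzOperator, L=1.)
>>> xs, _, _ = A.fixed_point()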
- get_is_leaf()[source]
- Returns
self._is_leaf (bool) – allows to access the protected attribute _is_leaf.
- gradient(point)[source]
Return the gradient (or a subgradient) of this Function evaluated at point.
- Parameters
point (Point) – any point.
- Returns
Point – a gradient (Point) of this Function at point (Point).
Note
the method subgradient does the exact same thing.
- oracle(point)[source]
Return a gradient (or a subgradient) and the function value of self evaluated at point.
- Parameters
point (Point) – any point.
- Returns
tuple – a (sub)gradient (Point) and a function value (Expression).
- stationary_point(return_gradient_and_function_value=False)[source]
Create a new stationary point, as well as its zero gradient and its function value.
- Parameters
return_gradient_and_function_value (bool) – if True, return the triplet point (Point), gradient (Point), function value (Expression). Otherwise, return only the point (Point).
- Returns
Point or tuple – an optimal point.
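For instance (a sketch building on the quick start example above, where func denotes a previously declared function), the full triplet at a minimizer can be requested as
>>> xs, gs, fs = func.stationary_point(return_gradient_and_function_value=True)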
Functions classes
Functions
Convex
- class PEPit.functions.ConvexFunction(is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]
Bases: Function
The ConvexFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of convex, closed and proper (CCP) functions (i.e., convex functions whose epigraphs are non-empty closed sets).
General CCP functions are not characterized by any parameter, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.functions import ConvexFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexFunction)
- Parameters
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Strongly convex
- class PEPit.functions.StronglyConvexFunction(mu, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]
Bases: Function
The StronglyConvexFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of strongly convex closed proper functions (strongly convex functions whose epigraphs are non-empty closed sets).
- Attributes
mu (float) – strong convexity parameter
Strongly convex functions are characterized by the strong convexity parameter \(\mu\), hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.functions import StronglyConvexFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=StronglyConvexFunction, mu=.1)
References
- Parameters
mu (float) – The strong convexity parameter.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Smooth
- class PEPit.functions.SmoothFunction(L=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]
Bases: Function
The SmoothFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of smooth (not necessarily convex) functions.
- Attributes
L (float) – smoothness parameter
Smooth functions are characterized by the smoothness parameter L, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.functions import SmoothFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=SmoothFunction, L=1.)
References
- Parameters
L (float) – The smoothness parameter.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Note
Smooth functions are necessarily differentiable, hence reuse_gradient is set to True.
Convex and smooth
- class PEPit.functions.SmoothConvexFunction(L=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]
Bases: SmoothStronglyConvexFunction
The SmoothConvexFunction class implements smooth convex functions as particular cases of SmoothStronglyConvexFunction.
- Attributes
L (float) – smoothness parameter
Smooth convex functions are characterized by the smoothness parameter L, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.functions import SmoothConvexFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=SmoothConvexFunction, L=1.)
- Parameters
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
L (float) – The smoothness parameter.
Note
Smooth convex functions are necessarily differentiable, hence reuse_gradient is set to True.
Convex and quadratically upper bounded
- class PEPit.functions.ConvexQGFunction(L=1, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]
Bases: Function
The ConvexQGFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of convex and quadratically upper bounded (\(\text{QG}^+\) [1]) functions, i.e. convex functions satisfying \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\).
- Attributes
L (float) – The quadratic upper bound parameter
General quadratically upper bounded (\(\text{QG}^+\)) convex functions are characterized by the quadratic growth parameter L, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.functions import ConvexQGFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexQGFunction, L=1)
References
- Parameters
L (float) – The quadratic upper bound parameter.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Strongly convex and smooth
- class PEPit.functions.SmoothStronglyConvexFunction(mu, L=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]
Bases: Function
The SmoothStronglyConvexFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of smooth strongly convex functions.
- Attributes
mu (float) – strong convexity parameter
L (float) – smoothness parameter
Smooth strongly convex functions are characterized by parameters \(\mu\) and L, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.functions import SmoothStronglyConvexFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=SmoothStronglyConvexFunction, mu=.1, L=1.)
References
- Parameters
mu (float) – The strong convexity parameter.
L (float) – The smoothness parameter.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Note
Smooth strongly convex functions are necessarily differentiable, hence reuse_gradient is set to True.
Convex and Lipschitz continuous
- class PEPit.functions.ConvexLipschitzFunction(M=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]
Bases: Function
The ConvexLipschitzFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of convex closed proper (CCP) Lipschitz continuous functions.
- Attributes
M (float) – Lipschitz parameter
CCP Lipschitz continuous functions are characterized by a parameter M, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.functions import ConvexLipschitzFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexLipschitzFunction, M=1.)
References
- Parameters
M (float) – The Lipschitz continuity parameter of self.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Convex indicator
- class PEPit.functions.ConvexIndicatorFunction(D=inf, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]
Bases: Function
The ConvexIndicatorFunction class overwrites the add_class_constraints method of Function, implementing interpolation constraints for the class of closed convex indicator functions.
- Attributes
D (float) – upper bound on the diameter of the feasible set, possibly set to np.inf
Convex indicator functions are characterized by a parameter D, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.functions import ConvexIndicatorFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexIndicatorFunction, D=1)
References
- Parameters
D (float) – Diameter of the support of self.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Convex support functions
- class PEPit.functions.ConvexSupportFunction(M=inf, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]
Bases: Function
The ConvexSupportFunction class overwrites the add_class_constraints method of Function, implementing interpolation constraints for the class of closed convex support functions.
- Attributes
M (float) – upper bound on the Lipschitz constant
Convex support functions are characterized by a parameter M, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.functions import ConvexSupportFunction
>>> problem = PEP()
>>> func = problem.declare_function(function_class=ConvexSupportFunction, M=1)
References
- Parameters
M (float) – Lipschitz constant of self.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Restricted secant inequality and error bound
- class PEPit.functions.RsiEbFunction(mu, L=1, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]
Bases: Function
The RsiEbFunction class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of functions verifying the “lower” restricted secant inequality (\(\text{RSI}^-\)) and the “upper” error bound (\(\text{EB}^+\)).
- Attributes
mu (float) – Restricted secant inequality parameter
L (float) – Error bound parameter
\(\text{RSI}^-\) and \(\text{EB}^+\) functions are characterized by parameters \(\mu\) and L, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.functions import RsiEbFunction
>>> problem = PEP()
>>> h = problem.declare_function(function_class=RsiEbFunction, mu=.1, L=1)
References
A definition of the class of \(\text{RSI}^-\) and \(\text{EB}^+\) functions can be found in [1].
- Parameters
mu (float) – The restricted secant inequality parameter.
L (float) – The upper error bound parameter.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Operators
Monotone
- class PEPit.operators.MonotoneOperator(is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]
Bases: Function
The MonotoneOperator class overwrites the add_class_constraints method of Function, implementing interpolation constraints for the class of maximally monotone operators.
Note
Operator values can be requested through gradient, and function values should not be used.
General maximally monotone operators are not characterized by any parameter, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.operators import MonotoneOperator
>>> problem = PEP()
>>> h = problem.declare_function(function_class=MonotoneOperator)
References
[1] H. H. Bauschke and P. L. Combettes (2017). Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer New York, 2nd ed.
- Parameters
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Strongly monotone
- class PEPit.operators.StronglyMonotoneOperator(mu, is_leaf=True, decomposition_dict=None, reuse_gradient=False)[source]
Bases: Function
The StronglyMonotoneOperator class overwrites the add_class_constraints method of Function, implementing interpolation constraints of the class of strongly monotone (maximally monotone) operators.
Note
Operator values can be requested through gradient, and function values should not be used.
- Attributes
mu (float) – strong monotonicity parameter
Strongly monotone (and maximally monotone) operators are characterized by the parameter \(\mu\), hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.operators import StronglyMonotoneOperator
>>> problem = PEP()
>>> h = problem.declare_function(function_class=StronglyMonotoneOperator, mu=.1)
References
Discussions and appropriate pointers for the problem of interpolation of maximally monotone operators can be found in:
[1] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.
- Parameters
mu (float) – Strong monotonicity parameter.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Lipschitz continuous
- class PEPit.operators.LipschitzOperator(L=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]
Bases: Function
The LipschitzOperator class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of Lipschitz continuous operators.
Note
Operator values can be requested through gradient, and function values should not be used.
- Attributes
L (float) – Lipschitz continuity parameter
Lipschitz continuous operators are characterized by the parameter \(L\), hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.operators import LipschitzOperator
>>> problem = PEP()
>>> func = problem.declare_function(function_class=LipschitzOperator, L=1.)
Notes
By setting L=1, we define a non-expansive operator.
By setting L<1, we define a contracting operator.
References
[1] M. Kirszbraun (1934). Über die zusammenziehende und Lipschitzsche Transformationen. Fundamenta Mathematicae, 22 (1934).
[2] F.A. Valentine (1943). On the extension of a vector function so as to preserve a Lipschitz condition. Bulletin of the American Mathematical Society, 49 (2).
[3] F.A. Valentine (1945). A Lipschitz condition preserving extension for a vector function. American Journal of Mathematics, 67(1).
Discussions and appropriate pointers for the interpolation problem can be found in:
[4] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.
- Parameters
L (float) – Lipschitz continuity parameter.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Note
Lipschitz continuous operators are necessarily continuous, hence reuse_gradient is set to True.
Strongly monotone and Lipschitz continuous
- class PEPit.operators.LipschitzStronglyMonotoneOperator(mu, L=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]
Bases: Function
The LipschitzStronglyMonotoneOperator class overwrites the add_class_constraints method of Function, implementing some constraints (which are not necessary and sufficient for interpolation) for the class of Lipschitz continuous strongly monotone (and maximally monotone) operators.
Note
Operator values can be requested through gradient, and function values should not be used.
Warning
Lipschitz strongly monotone operators do not enjoy known interpolation conditions. The conditions implemented in this class are necessary but a priori not sufficient for interpolation. Hence the numerical results obtained when using this class might be non-tight upper bounds (see Discussions in [1, Section 2]).
- Attributes
mu (float) – strong monotonicity parameter
L (float) – Lipschitz parameter
Lipschitz continuous strongly monotone operators are characterized by parameters \(\mu\) and L, hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.operators import LipschitzStronglyMonotoneOperator
>>> problem = PEP()
>>> h = problem.declare_function(function_class=LipschitzStronglyMonotoneOperator, mu=.1, L=1.)
References
- Parameters
mu (float) – The strong monotonicity parameter.
L (float) – The Lipschitz continuity parameter.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Note
Lipschitz continuous strongly monotone operators are necessarily continuous, hence reuse_gradient is set to True.
Cocoercive
- class PEPit.operators.CocoerciveOperator(beta=1.0, is_leaf=True, decomposition_dict=None, reuse_gradient=True)[source]
Bases: Function
The CocoerciveOperator class overwrites the add_class_constraints method of Function, implementing the interpolation constraints of the class of cocoercive (and maximally monotone) operators.
Note
Operator values can be requested through gradient, and function values should not be used.
- Attributes
beta (float) – cocoercivity parameter
Cocoercive operators are characterized by the parameter \(\beta\), hence can be instantiated as
Example
>>> from PEPit import PEP
>>> from PEPit.operators import CocoerciveOperator
>>> problem = PEP()
>>> func = problem.declare_function(function_class=CocoerciveOperator, beta=1.)
References
- Parameters
beta (float) – The cocoercivity parameter.
is_leaf (bool) – True if self is defined from scratch. False if self is defined as a linear combination of leaves.
decomposition_dict (dict) – Decomposition of self as a linear combination of leaf Function objects. Keys are Function objects and values are their associated coefficients.
reuse_gradient (bool) – If True, the same subgradient is returned when one requires it several times on the same Point. If False, a new subgradient is computed each time one is required.
Note
Cocoercive operators are necessarily continuous, hence reuse_gradient is set to True.
Primitive steps
Inexact gradient step
- PEPit.primitive_steps.inexact_gradient_step(x0, f, gamma, epsilon, notion='absolute')[source]
This routine performs a step \(x \leftarrow x_0 - \gamma d_{x_0}\), where \(d_{x_0}\) is close to the gradient of \(f\) at \(x_0\) in the following sense:
\[\begin{split}\|d_{x_0} - \nabla f(x_0)\|^2 \leqslant \left\{ \begin{eqnarray} & \varepsilon^2 & \text{if notion is set to 'absolute'}, \\ & \varepsilon^2 \|\nabla f(x_0)\|^2 & \text{if notion is set to 'relative'}. \end{eqnarray} \right.\end{split}\]
This relative approximation is used in at least 3 PEPit examples, in particular in 2 unconstrained convex minimization examples: an inexact gradient descent and an inexact accelerated gradient.
- Parameters
- Returns
x (Point) – the output point.
dx0 (Point) – the approximate (sub)gradient of f at x0.
fx0 (Expression) – the value of the function f at x0.
- Raises
ValueError – if notion is not set in [‘absolute’, ‘relative’].
Note
When \(\gamma\) is set to 0, this routine returns \(x_0\), \(d_{x_0}\), and \(f_{x_0}\). It is used as such in the unconstrained convex minimization example called “inexact gradient exact line search”, only to access the direction \(d_{x_0}\) close to the gradient \(g_{x_0}\).
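A usage sketch (the function class and the parameter values are illustrative):
>>> from PEPit import PEP
>>> from PEPit.functions import SmoothStronglyConvexFunction
>>> from PEPit.primitive_steps import inexact_gradient_step
>>> problem = PEP()
>>> func = problem.declare_function(SmoothStronglyConvexFunction, mu=.1, L=1.)
>>> x0 = problem.set_initial_point()
>>> # One step x1 = x0 - gamma * d0 with ||d0 - grad f(x0)|| <= epsilon * ||grad f(x0)||.
>>> x1, dx0, fx0 = inexact_gradient_step(x0, func, gamma=1., epsilon=.1, notion='relative')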
Exact line-search step
- PEPit.primitive_steps.exact_linesearch_step(x0, f, directions)[source]
This routine outputs some \(x\) by mimicking an exact line/span search in specified directions. It is used for instance in PEPit.examples.unconstrained_convex_minimization.wc_gradient_exact_line_search and in PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient.
The routine aims at mimicking the operation:
\begin{eqnarray} x & = & x_0 - \sum_{i=1}^{T} \gamma_i d_i,\\ \text{with } \overrightarrow{\gamma} & = & \arg\min_\overrightarrow{\gamma} f\left(x_0 - \sum_{i=1}^{T} \gamma_i d_i\right), \end{eqnarray}
where \(T\) denotes the number of directions \(d_i\). This operation can equivalently be described in terms of the following conditions:
\begin{eqnarray} x - x_0 & \in & \text{span}\left\{d_1,\ldots,d_T\right\}, \\ \nabla f(x) & \perp & \text{span}\left\{d_1,\ldots,d_T\right\}. \end{eqnarray}
In this routine, we instead constrain \(x_{t}\) and \(\nabla f(x_{t})\) to satisfy
\begin{eqnarray} \forall i=1,\ldots,T: & \left< \nabla f(x);\, d_i \right> & = & 0,\\ \text{and } & \left< \nabla f(x);\, x - x_0 \right> & = & 0, \end{eqnarray}
which is a relaxation of the true line/span search conditions.
Note
The last condition is automatically implied by the two previous ones.
Warning
One can notice this routine does not encode completely the fact that \(x_{t+1} - x_t\) must be a linear combination of the provided directions (i.e., this routine performs a relaxation). Therefore, if this routine is included in a PEP, the obtained value might be an upper bound on the true worst-case value.
Although not always tight, this relaxation is often observed to deliver pretty accurate results (in particular, it automatically produces tight results under some specific conditions, see, e.g., [1]). Two such examples are provided in the conjugate gradient and gradient with exact line search example files.
References
- Parameters
- Returns
x (Point) – such that all vectors in directions are orthogonal to the (sub)gradient of f at x.
gx (Point) – a (sub)gradient of f at x.
fx (Expression) – the function f evaluated at x.
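A usage sketch (illustrative function class; the search direction here is the current gradient, as in steepest descent with exact line search):
>>> from PEPit import PEP
>>> from PEPit.functions import SmoothStronglyConvexFunction
>>> from PEPit.primitive_steps import exact_linesearch_step
>>> problem = PEP()
>>> func = problem.declare_function(SmoothStronglyConvexFunction, mu=.1, L=1.)
>>> x0 = problem.set_initial_point()
>>> g0, f0 = func.oracle(x0)
>>> # Exact line search from x0 along the single direction g0.
>>> x1, g1, f1 = exact_linesearch_step(x0, func, [g0])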
Proximal step
- PEPit.primitive_steps.proximal_step(x0, f, gamma)[source]
This routine performs a proximal step of step-size gamma, starting from x0, and on function f. That is, it performs:
\begin{eqnarray} x \triangleq \text{prox}_{\gamma f}(x_0) & \triangleq & \arg\min_x \left\{ \gamma f(x) + \frac{1}{2} \|x - x_0\|^2 \right\}, \\ & \Updownarrow & \\ 0 & = & \gamma g_x + x - x_0 \text{ for some } g_x\in\partial f(x),\\ & \Updownarrow & \\ x & = & x_0 - \gamma g_x \text{ for some } g_x\in\partial f(x). \end{eqnarray}
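A usage sketch (illustrative function class and step-size) performing one proximal step:
>>> from PEPit import PEP
>>> from PEPit.functions import ConvexFunction
>>> from PEPit.primitive_steps import proximal_step
>>> problem = PEP()
>>> func = problem.declare_function(ConvexFunction)
>>> x0 = problem.set_initial_point()
>>> # x1 = prox_{gamma * func}(x0); g1 is a subgradient of func at x1, f1 the corresponding value.
>>> x1, g1, f1 = proximal_step(x0, func, gamma=1.)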
Inexact proximal step
- PEPit.primitive_steps.inexact_proximal_step(x0, f, gamma, opt='PD_gapII')[source]
This routine encodes an inexact proximal operation with step size \(\gamma\). That is, it outputs a tuple \((x, g\in \partial f(x), f(x), w, v\in\partial f(w), f(w), \varepsilon)\) which are described as follows.
First, \(x\) is an approximation to the proximal point of \(x_0\) on function \(f\):
\[x \approx \mathrm{prox}_{\gamma f}(x_0)\triangleq\arg\min_x \left\{ \gamma f(x) + \frac{1}{2}\|x-x_0\|^2\right\},\]
where the meaning of \(\approx\) depends on the option “opt” and is explained below. The notions of inaccuracy implemented within this routine are specified using primal and dual proximal problems, denoted by
\begin{eqnarray} &\Phi^{(p)}_{\gamma f}(x; x_0) \triangleq \gamma f(x) + \frac{1}{2}\|x-x_0\|^2,\\ &\Phi^{(d)}_{\gamma f}(v; x_0) \triangleq -\gamma f^*(v)-\frac{1}{2}\|x_0-\gamma v\|^2 + \frac{1}{2}\|x_0\|^2,\\ \end{eqnarray}
where \(\Phi^{(p)}_{\gamma f}(x;x_0)\) and \(\Phi^{(d)}_{\gamma f}(v;x_0)\) respectively denote the primal and the dual proximal problems, and where \(f^*\) is the Fenchel conjugate of \(f\). The options below encode different meanings of “\(\approx\)” by specifying accuracy requirements on primal-dual pairs:
\[(x,v) \approx_{\varepsilon} \left(\mathrm{prox}_{\gamma f}(x_0),\,\mathrm{prox}_{f^*/\gamma}(x_0/\gamma)\right),\]
where \(\approx_{\varepsilon}\) corresponds to requiring the primal-dual pair \((x,v)\) to satisfy some primal-dual accuracy requirement:
\[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(v;x_0) \leqslant \varepsilon,\]
where \(\varepsilon\geqslant 0\) is the error magnitude, which is returned to the user so that one can constrain it to be bounded by some other values.
Relation to the exact proximal operation: In the exact case (no error in the computation, \(\varepsilon=0\)), \(v\) corresponds to the solution of the dual proximal problem and one can write
\[x = x_0-\gamma g,\]with \(g=v=\mathrm{prox}_{f^*/\gamma}(x_0/\gamma)\in\partial f(x)\), and \(x=w\).
Reformulation of the primal-dual gap: In regard with the exact proximal computation; the inexact case under consideration here can be described as performing
\[x = x_0-\gamma v + e,\]where \(v\) is an \(\epsilon\)-subgradient of \(f\) at \(x\) (notation \(v\in\partial_{\epsilon} f(x)\)) and \(e\) is some additional computation error. Those elements allow for a common convenient reformulation of the primal-dual gap, written in terms of the magnitudes of \(\epsilon\) and of \(e\):
\[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(v;x_0) = \frac{1}{2} \|e\|^2 + \gamma \epsilon.\]Options: The following options are available (a list of such choices is presented in [4]; we provide a reference for each of those choices below).
‘PD_gapI’: the constraint imposed on the output is the vanilla primal-dual gap requirement (see, e.g., [2])
\[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(v;x_0) \leqslant \varepsilon.\]
This approximation requirement is used in one PEPit example: an accelerated inexact forward-backward method.
‘PD_gapII’: the constraint is stronger than the vanilla primal-dual gap, as more structure is imposed (see, e.g., [1,5]):
\[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(g;x_0) \leqslant \varepsilon,\]
where we imposed that \(v\triangleq g\in\partial f(x)\) and \(w\triangleq x\). This approximation requirement is used in two PEPit examples: in a relatively inexact proximal point algorithm and in a partially inexact Douglas-Rachford splitting.
‘PD_gapIII’ : the constraint is stronger than the vanilla primal-dual gap, as more structure is imposed (see, e.g., [3]):
\[\Phi^{(p)}_{\gamma f}(x;x_0)-\Phi^{(d)}_{\gamma f}(\tfrac{x_0 - x}{\gamma};x_0) \leqslant \varepsilon,\]
where we imposed that \(v \triangleq \frac{x_0 - x}{\gamma}\).
References
- Parameters
- Returns
x (Point) – the approximated proximal point.
gx (Point) – a (sub)gradient of f at x (subgradient used in evaluating the accuracy criterion).
fx (Expression) – f evaluated at x.
w (Point) – a point w such that v (see next output) is a subgradient of f at w.
v (Point) – the approximated proximal point of the dual problem, (sub)gradient of f evaluated at w.
fw (Expression) – f evaluated at w.
eps_var (Expression) – value of the primal-dual gap (which can be further bounded by the user).
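To make the outputs concrete, here is a short sketch (not taken from the package) of a single relatively inexact proximal step, in which the returned primal-dual gap eps_var is further bounded by a fraction of the squared step length. The tolerance sigma and the add_constraint call used to attach this bound are illustrative assumptions.

from PEPit import PEP
from PEPit.functions import ConvexFunction
from PEPit.primitive_steps import inexact_proximal_step

problem = PEP()
f = problem.declare_function(ConvexFunction)
x0 = problem.set_initial_point()

gamma, sigma = 1, .5  # step-size and (hypothetical) relative accuracy level
# One inexact proximal step with the 'PD_gapII' notion of inaccuracy.
x, gx, fx, w, v, fw, eps = inexact_proximal_step(x0, f, gamma, opt='PD_gapII')
# Relative inexactness: the primal-dual gap is required to be at most a fraction
# of the squared step length (add_constraint is an assumed interface for this).
problem.add_constraint(eps <= (sigma ** 2 / (2 * gamma)) * (x - x0) ** 2)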
Bregman gradient step
- PEPit.primitive_steps.bregman_gradient_step(gx0, sx0, mirror_map, gamma)[source]
This routine outputs \(x\) by performing a mirror step of step-size \(\gamma\). That is, denoting \(f\) the function to be minimized and \(h\) the mirror map, it performs
\[x = \arg\min_x \left[ f(x_0) + \left< \nabla f(x_0);\, x - x_0 \right> + \frac{1}{\gamma} D_h(x; x_0) \right],\]where \(D_h(x; x_0)\) denotes the Bregman divergence of \(h\) on \(x\) with respect to \(x_0\).
\[D_h(x; x_0) \triangleq h(x) - h(x_0) - \left< \nabla h(x_0);\, x - x_0 \right>.\]
Warning
The mirror map \(h\) is assumed differentiable.
By differentiating the previous objective function, one can observe that
\[\nabla h(x) = \nabla h(x_0) - \gamma \nabla f(x_0).\]
- Parameters
- Returns
x (Point) – new iterate \(\textbf{x} \triangleq x\).
sx (Point) – \(h\)’s gradient on new iterate \(x\) \(\textbf{sx} \triangleq \nabla h(x)\).
hx (Expression) – \(h\)’s value on new iterate \(\textbf{hx} \triangleq h(x)\).
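A minimal sketch of one mirror step built on this routine follows; the choice of ConvexFunction for both the objective and the mirror map is an illustrative assumption (in practice the mirror map should be differentiable, as stated in the warning above).

from PEPit import PEP
from PEPit.functions import ConvexFunction
from PEPit.primitive_steps import bregman_gradient_step

problem = PEP()
f = problem.declare_function(ConvexFunction)  # function to be minimized
h = problem.declare_function(ConvexFunction)  # mirror map (assumed differentiable)

x0 = problem.set_initial_point()
gx0, fx0 = f.oracle(x0)  # (sub)gradient and value of f at x0
sx0, hx0 = h.oracle(x0)  # gradient and value of the mirror map at x0

gamma = 1
# One mirror step: returns the new point x, nabla h(x), and h(x).
x, sx, hx = bregman_gradient_step(gx0, sx0, h, gamma)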
Bregman proximal step
- PEPit.primitive_steps.bregman_proximal_step(sx0, mirror_map, min_function, gamma)[source]
This routine outputs \(x\) by performing a proximal mirror step of step-size \(\gamma\). That is, denoting \(f\) the function to be minimized and \(h\) the mirror map, it performs
\[x = \arg\min_x \left[ f(x) + \frac{1}{\gamma} D_h(x; x_0) \right],\]where \(D_h(x; x_0)\) denotes the Bregman divergence of \(h\) on \(x\) with respect to \(x_0\).
\[D_h(x; x_0) \triangleq h(x) - h(x_0) - \left< \nabla h(x_0);\, x - x_0 \right>.\]
Warning
The mirror map \(h\) is assumed differentiable.
By differentiating the previous objective function, one can observe that
\[\nabla h(x) = \nabla h(x_0) - \gamma \nabla f(x).\]
- Parameters
- Returns
x (Point) – new iterate \(\textbf{x} \triangleq x\).
sx (Point) – \(h\)’s gradient on new iterate \(x\) \(\textbf{sx} \triangleq \nabla h(x)\).
hx (Expression) – \(h\)’s value on new iterate \(\textbf{hx} \triangleq h(x)\).
gx (Point) – \(f\)’s gradient on new iterate \(x\) \(\textbf{gx} \triangleq \nabla f(x)\).
fx (Expression) – \(f\)’s value on new iterate \(\textbf{fx} \triangleq f(x)\).
Linear optimization step
- PEPit.primitive_steps.linear_optimization_step(dir, ind)[source]
This routine outputs the result of a minimization problem with linear objective (whose direction is provided by dir) on the domain of the (closed convex) indicator function ind. That is, it outputs a solution to
\[\arg\min_{\text{ind}(x)=0} \left< \text{dir};\, x \right>.\]One can notice that \(x\) is a solution of this problem if and only if
\[- \text{dir} \in \partial \text{ind}(x).\]
- Parameters
dir (Point) – direction of optimization
ind (ConvexIndicatorFunction) – convex indicator function
- Returns
x (Point) – the optimal point.
gx (Point) – the (sub)gradient of ind on x.
fx (Expression) – the function value of ind on x.
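For illustration, here is a sketch of one conditional-gradient (Frank-Wolfe) style update built on this routine; the smoothness value, the indicator's diameter parameter D, and the step-size are illustrative assumptions.

from PEPit import PEP
from PEPit.functions import SmoothConvexFunction, ConvexIndicatorFunction
from PEPit.primitive_steps import linear_optimization_step

problem = PEP()
f = problem.declare_function(SmoothConvexFunction, L=1)       # smooth part
ind = problem.declare_function(ConvexIndicatorFunction, D=1)  # indicator of a bounded set (D assumed to be its diameter parameter)

x0 = problem.set_initial_point()
gx0 = f.gradient(x0)

# Linear minimization oracle over the domain of ind, in the direction of the gradient.
y, _, _ = linear_optimization_step(gx0, ind)
lam = 2 / 3                    # illustrative step-size
x1 = (1 - lam) * x0 + lam * y  # convex combination stays in the (convex) domain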
Epsilon-subgradient step
- PEPit.primitive_steps.epsilon_subgradient_step(x0, f, gamma)[source]
This routine performs a step \(x \leftarrow x_0 - \gamma g_0\) where \(g_0 \in\partial_{\varepsilon} f(x_0)\). That is, \(g_0\) is an \(\varepsilon\)-subgradient of \(f\) at \(x_0\). The set \(\partial_{\varepsilon} f(x_0)\) (referred to as the \(\varepsilon\)-subdifferential) is defined as (see [1, Section 3])
\[\partial_{\varepsilon} f(x)=\left\{g:\, f(z)\geqslant f(x)+\left< g;\, z-x \right>-\varepsilon \right\}.\]An alternative characterization of \(g_0 \in\partial_{\varepsilon} f(x_0)\) consists in writing
\[f(x_0)+f^*(g_0)-\left< g_0;x_0\right>\leqslant \varepsilon.\]
References
- Parameters
- Returns
x (Point) – the output point.
g0 (Point) – an \(\varepsilon\)-subgradient of f at x0.
f0 (Expression) – the value of the function f at x0.
epsilon (Expression) – the value of epsilon.
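The sketch below shows one such step and how the returned inaccuracy can subsequently be bounded; the eps_max value and the add_constraint call are illustrative assumptions.

from PEPit import PEP
from PEPit.functions import ConvexFunction
from PEPit.primitive_steps import epsilon_subgradient_step

problem = PEP()
f = problem.declare_function(ConvexFunction)
x0 = problem.set_initial_point()

gamma, eps_max = .5, .1
# One step x = x0 - gamma * g0 with g0 an epsilon-subgradient of f at x0.
x, g0, f0, epsilon = epsilon_subgradient_step(x0, f, gamma)
# epsilon is a variable of the PEP; it can be bounded afterwards
# (add_constraint is an assumed interface for attaching that bound).
problem.add_constraint(epsilon <= eps_max)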
Tools
Merge two dictionaries
Multiply two dictionaries
- PEPit.tools.multiply_dicts(dict1, dict2)[source]
Multiply two dictionaries in the sense of expanding a product of two sums.
- Parameters
dict1 (dict) – any dictionary
dict2 (dict) – any dictionary
- Returns
product_dict (dict) – the keys are the couples of keys of dict1 and dict2, and the values are the products of the corresponding values of dict1 and dict2.
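To make the semantics concrete, here is a plain-Python sketch of the operation (not the package's implementation): it expands the product of two linear combinations whose coefficients are stored in dictionaries.

def multiply_dicts_sketch(dict1, dict2):
    """Expand (sum_k dict1[k]*k) * (sum_l dict2[l]*l) into coefficients indexed by (k, l)."""
    product_dict = {}
    for key1, value1 in dict1.items():
        for key2, value2 in dict2.items():
            couple = (key1, key2)
            # Accumulate, in case the same couple of keys appears several times.
            product_dict[couple] = product_dict.get(couple, 0) + value1 * value2
    return product_dict

# (2*x + 3*y) * (5*x) = 10*x*x + 15*y*x
print(multiply_dicts_sketch({"x": 2, "y": 3}, {"x": 5}))  # {('x', 'x'): 10, ('y', 'x'): 15}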
Prune a dictionary
Examples
Unconstrained convex minimization
Subgradient method under restricted secant inequality and error bound
Gradient descent for quadratically upper bounded convex objective
Gradient descent with decreasing step sizes for quadratically upper bounded convex objective
Conjugate gradient for quadratically upper bounded convex objective
Heavy Ball momentum for quadratically upper bounded convex objective
Gradient descent
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent(L, gamma, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Theorem 3.1]:
\[f(x_n)-f_\star \leqslant \frac{L}{4nL\gamma+2} \|x_0-x_\star\|^2,\]which is tight on some Huber loss functions.
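As a quick sanity check, the rate above can be evaluated for the parameter values used in the example below (L = 3, gamma = 1/L, n = 4), which gives 1/6 and matches the PEPit output.

L, n = 3, 4
gamma = 1 / L
theoretical_tau = L / (4 * n * L * gamma + 2)  # bound of [1, Theorem 3.1]
print(theoretical_tau)  # 0.1666..., i.e. the 0.166667 reported in the example below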
References:
- Parameters
L (float) – the smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 3
>>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 30 scalar constraint(s) ...
         function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.16666664596175398
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEPit guarantee:         f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
Subgradient method
- PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method(M, n, gamma, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex and \(M\)-Lipschitz. This problem is a (possibly non-smooth) minimization problem.
This code computes a worst-case guarantee for the subgradient method. That is, it computes the smallest possible \(\tau(n, M, \gamma)\) such that the guarantee
\[\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star \leqslant \tau(n, M, \gamma)\]is valid, where \(x_t\) are the iterates of the subgradient method after \(t\leqslant n\) steps, where \(x_\star\) is a minimizer of \(f\), and when \(\|x_0-x_\star\|\leqslant 1\).
In short, for given values of \(M\), the step-size \(\gamma\) and the number of iterations \(n\), \(\tau(n, M, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star\) when \(\|x_0-x_\star\| \leqslant 1\).
Algorithm: For \(t\in \{0, \dots, n-1 \}\)
\begin{eqnarray} g_{t} & \in & \partial f(x_t) \\ x_{t+1} & = & x_t - \gamma g_t \end{eqnarray}Theoretical guarantee: The tight bound is obtained in [1, Section 3.2.3] and [2, Eq (2)]
\[\min_{0 \leqslant t \leqslant n} f(x_t)- f(x_\star) \leqslant \frac{M}{\sqrt{n+1}}\|x_0-x_\star\|,\]and tightness follows from the lower complexity bound for this class of problems, e.g., [3, Appendix A].
References: Classical references on this topic include [1, 2].
[2] S. Boyd, L. Xiao, A. Mutapcic (2003). Subgradient Methods (lecture notes).
- Parameters
M (float) – the Lipschitz parameter.
n (int) – the number of iterations.
gamma (float) – step-size.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> M = 2
>>> n = 6
>>> gamma = 1 / (M * sqrt(n + 1))
>>> pepit_tau, theoretical_tau = wc_subgradient_method(M=M, n=n, gamma=gamma, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 7 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 64 scalar constraint(s) ...
         function 1 : 64 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.7559825331741553
*** Example file: worst-case performance of subgradient method ***
    PEPit guarantee:         min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755983 ||x_0 - x_*||
    Theoretical guarantee:   min_(0 \leq t \leq n) f(x_i) - f_* <= 0.755929 ||x_0 - x_*||
Subgradient method under restricted secant inequality and error bound
- PEPit.examples.unconstrained_convex_minimization.wc_subgradient_method_rsi_eb(mu, L, gamma, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) satisfies the “lower” restricted secant inequality (\(\mu-\text{RSI}^-\)) and the “upper” error bound (\(L-\text{EB}^+\)) [1].
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, \mu, L, \gamma)\) such that the guarantee
\[\| x_n - x_\star \|^2 \leqslant \tau(n, \mu, L, \gamma) \| x_0 - x_\star \|^2\]is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(\mu\), \(L\), and \(\gamma\), \(\tau(n, \mu, L, \gamma)\) is computed as the worst-case value of \(\| x_n - x_\star \|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Sub-gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: The tight theoretical guarantee can be found in [1, Prop 1] (upper bound) and [1, Theorem 2] (lower bound):
\[\| x_n - x_\star \|^2 \leqslant (1 - 2\gamma\mu + L^2 \gamma^2)^n \|x_0-x_\star\|^2.\]
References:
Definition and convergence guarantees can be found in [1].
- Parameters
mu (float) – the RSI parameter.
L (float) – the EB parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> mu = .1
>>> L = 1
>>> pepit_tau, theoretical_tau = wc_subgradient_method_rsi_eb(mu=mu, L=L, gamma=mu / L ** 2, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 8 scalar constraint(s) ...
         function 1 : 8 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9605893213566064
*** Example file: worst-case performance of gradient descent with fixed step-sizes ***
    PEPit guarantee:         f(x_n)-f_* <= 0.960589 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.960596 ||x_0 - x_*||^2
Gradient descent with exact line search
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_exact_line_search(L, mu, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the gradient descent (GD) with exact linesearch (ELS). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the GD with ELS, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm: GD with ELS can be written as
\[x_{t+1} = x_t - \gamma_t \nabla f(x_t)\]with \(\gamma_t = \arg\min_{\gamma} f \left( x_t - \gamma \nabla f(x_t) \right)\).
Theoretical guarantee: The tight worst-case guarantee for GD with ELS, obtained in [1, Theorem 1.2], is
\[f(x_n) - f_\star \leqslant \left(\frac{L-\mu}{L+\mu}\right)^{2n} (f(x_0) - f_\star).\]References: The detailed approach (based on convex relaxations) is available in [1], along with the theoretical bound.
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_gradient_exact_line_search(L=1, mu=.1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 16 scalar constraint(s) ...
         function 1 : 16 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.44812204883466417
*** Example file: worst-case performance of gradient descent with exact linesearch (ELS) ***
    PEPit guarantee:         f(x_n)-f_* <= 0.448122 (f(x_0)-f_*)
    Theoretical guarantee:   f(x_n)-f_* <= 0.448125 (f(x_0)-f_*)
Conjugate gradient
- PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient(L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]with
\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]
Theoretical guarantee:
The tight guarantee obtained in [1] is
\[f(x_n) - f_\star \leqslant \frac{L}{2 \theta_n^2}\|x_0-x_\star\|^2,\]where
\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}, \end{eqnarray}and tightness follows from [2, Theorem 3].
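The guarantee can be evaluated numerically by unrolling the recursion above; with L = 1 and n = 2 (the setting of the example below), the following sketch reproduces the theoretical value of roughly 0.0619.

from math import sqrt

L, n = 1, 2
theta = 1.                                        # theta_0
for t in range(1, n):
    theta = (1 + sqrt(4 * theta ** 2 + 1)) / 2    # theta_t for t = 1, ..., n-1
theta_n = (1 + sqrt(8 * theta ** 2 + 1)) / 2      # final step uses the modified update
theoretical_tau = L / (2 * theta_n ** 2)
print(theoretical_tau)  # ~0.06189, matching the example below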
References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6].
- Parameters
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_conjugate_gradient(L=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 18 scalar constraint(s) ...
         function 1 : 18 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.061893515427809735
*** Example file: worst-case performance of conjugate gradient method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.0618935 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.0618942 ||x_0 - x_*||^2
Heavy Ball momentum
- PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum(mu, L, alpha, beta, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L, \mu, \alpha, \beta)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \alpha, \beta) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu, \alpha, \beta)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \alpha \nabla f(x_t) + \beta (x_t-x_{t-1})\]with
\[\alpha \in (0, \frac{1}{L}]\]and
\[\beta = \sqrt{(1 - \alpha \mu)(1 - L \alpha)}\]
Theoretical guarantee:
The upper guarantee obtained in [2, Theorem 4] is
\[f(x_n) - f_\star \leqslant (1 - \alpha \mu)^n (f(x_0) - f_\star).\]References: This method was first introduced in [1, Section 2], and its convergence upper bound was proven in [2, Theorem 4].
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
alpha (float) – parameter of the scheme.
beta (float) – parameter of the scheme such that \(0<\beta<1\) and \(0<\alpha<2(1+\beta)\).
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> mu = 0.1
>>> L = 1.
>>> alpha = 1 / (2 * L)  # alpha \in [0, 1 / L]
>>> beta = sqrt((1 - alpha * mu) * (1 - L * alpha))
>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum(mu=mu, L=L, alpha=alpha, beta=beta, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 12 scalar constraint(s) ...
         function 1 : 12 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.753492450790045
*** Example file: worst-case performance of the Heavy-Ball method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.753492 (f(x_0) - f(x_*))
    Theoretical guarantee:   f(x_n)-f_* <= 0.9025 (f(x_0) - f(x_*))
Accelerated gradient for convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_convex(mu, L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).
This code computes a worst-case guarantee for an accelerated gradient method, a.k.a. fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The accelerated gradient method of this example is provided by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}Theoretical guarantee: When \(\mu=0\), a tight empirical guarantee can be found in [1, Table 1]:
\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]where tightness is obtained on some Huber loss functions.
References:
- Parameters
mu (float) – the strong convexity parameter
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_convex(mu=0, L=1, n=1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 6 scalar constraint(s) ...
         function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.16666666668209376
*** Example file: worst-case performance of accelerated gradient method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
Accelerated gradient for strongly convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_gradient_strongly_convex(mu, L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for an accelerated gradient method, a.k.a fast gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right),\]is valid, where \(x_n\) is the output of the accelerated gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} y_t & = & x_t + \frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}}(x_t - x_{t-1}) \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \end{eqnarray}with \(x_{-1}:= x_0\).
Theoretical guarantee:
The following upper guarantee can be found in [1, Corollary 4.15]:
\[f(x_n)-f_\star \leqslant \left(1 - \sqrt{\frac{\mu}{L}}\right)^n \left(f(x_0) - f(x_\star) + \frac{\mu}{2}\|x_0 - x_\star\|^2\right).\]References:
- Parameters
mu (float) – the strong convexity parameter
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_strongly_convex(mu=0.1, L=1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 12 scalar constraint(s) ...
         function 1 : 12 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.34758587217463155
*** Example file: worst-case performance of the accelerated gradient method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.347586 (f(x_0) - f(x_*) + mu/2*||x_0 - x_*||**2)
    Theoretical guarantee:   f(x_n)-f_* <= 0.467544 (f(x_0) - f(x_*) + mu/2*||x_0 - x_*||**2)
Optimized gradient
- PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient(L, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for optimized gradient method (OGM). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of OGM and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The optimized gradient method is described by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t)\\ y_{t+1} & = & x_{t+1} + \frac{\theta_{t}-1}{\theta_{t+1}}(x_{t+1}-x_t)+\frac{\theta_{t}}{\theta_{t+1}}(x_{t+1}-y_t), \end{eqnarray}with
\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}. \end{eqnarray}Theoretical guarantee: The tight theoretical guarantee can be found in [2, Theorem 2]:
\[f(x_n)-f_\star \leqslant \frac{L\|x_0-x_\star\|^2}{2\theta_n^2},\]where tightness follows from [3, Theorem 3].
References: The optimized gradient method was developed in [1, 2]; the corresponding lower bound was first obtained in [3].
- Parameters
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_optimized_gradient(L=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 30 scalar constraint(s) ...
         function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.07675218017587908
*** Example file: worst-case performance of optimized gradient method ***
    PEPit guarantee:         f(y_n)-f_* <= 0.0767522 ||x_0 - x_*||^2
    Theoretical guarantee:   f(y_n)-f_* <= 0.0767518 ||x_0 - x_*||^2
Optimized gradient for gradient
- PEPit.examples.unconstrained_convex_minimization.wc_optimized_gradient_for_gradient(L, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for optimized gradient method for gradient (OGM-G). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[\|\nabla f(x_n)\|^2 \leqslant \tau(n, L) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of OGM-G and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(\|\nabla f(x_n)\|^2\) when \(f(x_0)-f_\star \leqslant 1\).
Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the optimized gradient method for gradient [1, Section 6.3] is described by
\begin{eqnarray} y_{t+1} & = & x_t - \frac{1}{L} \nabla f(x_t),\\ x_{t+1} & = & y_{t+1} + \frac{(\tilde{\theta}_t-1)(2\tilde{\theta}_{t+1}-1)}{\tilde{\theta}_t(2\tilde{\theta}_t-1)}(y_{t+1}-y_t)+\frac{2\tilde{\theta}_{t+1}-1}{2\tilde{\theta}_t-1}(y_{t+1}-x_t), \end{eqnarray}with
\begin{eqnarray} \tilde{\theta}_n & = & 1 \\ \tilde{\theta}_t & = & \frac{1 + \sqrt{4 \tilde{\theta}_{t+1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \tilde{\theta}_0 & = & \frac{1 + \sqrt{8 \tilde{\theta}_{1}^2 + 1}}{2}. \end{eqnarray}Theoretical guarantee: The tight worst-case guarantee can be found in [1, Theorem 6.1]:
\[\|\nabla f(x_n)\|^2 \leqslant \frac{2L(f(x_0)-f_\star)}{\tilde{\theta}_0^2},\]where tightness is achieved on Huber losses, see [1, Section 6.4].
References: The optimized gradient method for gradient was developed in [1].
- Parameters
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_optimized_gradient_for_gradient(L=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 30 scalar constraint(s) ...
         function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.30700758289614183
*** Example file: worst-case performance of optimized gradient method for gradient ***
    PEP-it guarantee:        ||f'(x_n)||^2 <= 0.307008 (f(x_0) - f_*)
    Theoretical guarantee:   ||f'(x_n)||^2 <= 0.307007 (f(x_0) - f_*)
Robust momentum
- PEPit.examples.unconstrained_convex_minimization.wc_robust_momentum(mu, L, lam, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly-convex.
This code computes a worst-case guarantee for the robust momentum method (RMM). That is, it computes the smallest possible \(\tau(n, \mu, L, \lambda)\) such that the guarantee
\[v(x_{n+1}) \leqslant \tau(n, \mu, L, \lambda) v(x_{n}),\]is valid, where \(x_n\) is the \(n^{\mathrm{th}}\) iterate of the RMM, and \(x_\star\) is a minimizer of \(f\). The function \(v(\cdot)\) is a well-chosen Lyapunov function defined as follows:
\begin{eqnarray} v(x_t) & = & l\|z_t - x_\star\|^2 + q_t, \\ q_t & = & (L - \mu) \left(f(x_t) - f_\star - \frac{\mu}{2}\|y_t - x_\star\|^2 - \frac{1}{2}\|\nabla f(y_t) - \mu (y_t - x_\star)\|^2 \right), \end{eqnarray}with \(\kappa = \frac{L}{\mu}\), \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\), and \(l = \mu^2 \frac{\kappa - \kappa \rho^2 - 1}{2 \rho (1 - \rho)}\).
Algorithm:
For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} x_{t+1} & = & x_{t} + \beta (x_t - x_{t-1}) - \alpha \nabla f(y_t), \\ y_{t+1} & = & y_{t} + \gamma (x_t - x_{t-1}), \end{eqnarray}with \(x_{-1}, x_0 \in \mathrm{R}^d\), and with parameters \(\alpha = \frac{\kappa (1 - \rho^2)(1 + \rho)}{L}\), \(\beta = \frac{\kappa \rho^3}{\kappa - 1}\), \(\gamma = \frac{\rho^2}{(\kappa - 1)(1 - \rho)^2(1 + \rho)}\).
Theoretical guarantee:
A convergence guarantee (empirically tight) is obtained in [1, Theorem 1],
\[v(x_{n+1}) \leqslant \rho^2 v(x_n),\]with \(\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)\).
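For the parameter values of the example below (mu = 0.1, L = 1, lambda = 0.2), this rate is easy to evaluate; the following sketch (with kappa = L/mu, consistent with the numerical output) reproduces the value of about 0.5286.

from math import sqrt

mu, L, lam = 0.1, 1, 0.2
kappa = L / mu
rho = lam * (1 - 1 / kappa) + (1 - lam) * (1 - 1 / sqrt(kappa))
theoretical_tau = rho ** 2
print(theoretical_tau)  # ~0.528555, matching the example below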
References:
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
lam (float) – if \(\lambda=1\), the method reduces to gradient descent; if \(\lambda=0\), it corresponds to the Triple Momentum Method.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Examples
>>> pepit_tau, theoretical_tau = wc_robust_momentum(mu=0.1, L=1, lam=0.2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 5x5
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 6 scalar constraint(s) ...
         function 1 : 6 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5285548355275751
*** Example file: worst-case performance of the Robust Momentum Method ***
    PEPit guarantee:         v(x_(n+1)) <= 0.528555 v(x_n)
    Theoretical guarantee:   v(x_(n+1)) <= 0.528555 v(x_n)
Triple momentum
- PEPit.examples.unconstrained_convex_minimization.wc_triple_momentum(mu, L, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for triple momentum method (TMM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the TMM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
For \(t \in \{ 1, \dots, n\}\)
\begin{eqnarray} \xi_{t+1} & = & (1 + \beta) \xi_{t} - \beta \xi_{t-1} - \alpha \nabla f(y_t) \\ y_{t} & = & (1+\gamma) \xi_{t} - \gamma \xi_{t-1} \\ x_{t} & = & (1 + \delta) \xi_{t} - \delta \xi_{t-1} \end{eqnarray}with
\begin{eqnarray} \kappa & = & \frac{L}{\mu}, \quad \rho = 1 - \frac{1}{\sqrt{\kappa}} \\ (\alpha, \beta, \gamma, \delta) & = & \left(\frac{1+\rho}{L}, \frac{\rho^2}{2-\rho}, \frac{\rho^2}{(1+\rho)(2-\rho)}, \frac{\rho^2}{1-\rho^2}\right) \end{eqnarray}and
\begin{eqnarray} \xi_{0} = x_0 \\ \xi_{1} = x_0 \\ y = x_0 \end{eqnarray}Theoretical guarantee: A theoretical upper (empirically tight) bound can be found in [1, Theorem 1, eq. 4]:
\[f(x_n)-f_\star \leqslant \frac{\rho^{2(n+1)} L \kappa}{2}\|x_0 - x_\star\|^2.\]References: The triple momentum method was discovered and analyzed in [1].
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_triple_momentum(mu=0.1, L=1., n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 30 scalar constraint(s) ...
         function 1 : 30 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.23893532450841679
*** Example file: worst-case performance of the Triple Momentum Method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.238935 ||x_0-x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.238925 ||x_0-x_*||^2
Information theoretic exact method
- PEPit.examples.unconstrained_convex_minimization.wc_information_theoretic(mu, L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex (\(\mu\) is possibly 0).
This code computes a worst-case guarantee for the information theoretic exact method (ITEM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[\|z_n - x_\star\|^2 \leqslant \tau(n, L, \mu) \|z_0 - x_\star\|^2\]is valid, where \(z_n\) is the output of the ITEM, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\|z_n - x_\star\|^2\) when \(\|z_0 - x_\star\|^2 \leqslant 1\).
Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), the information theoretic exact method of this example is provided by
\begin{eqnarray} y_{t} & = & (1-\beta_t) z_t + \beta_t x_t \\ x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t) \\ z_{t+1} & = & \left(1-q\delta_t\right) z_t+q\delta_t y_t-\frac{\delta_t}{L}\nabla f(y_t), \end{eqnarray}with \(y_{-1}=x_0=z_0\), \(q=\frac{\mu}{L}\) (inverse condition ratio), and the scalar sequences:
\begin{eqnarray} A_{t+1} & = & \frac{(1+q)A_t+2\left(1+\sqrt{(1+A_t)(1+qA_t)}\right)}{(1-q)^2},\\ \beta_{t+1} & = & \frac{A_t}{(1-q)A_{t+1}},\\ \delta_{t+1} & = & \frac{1}{2}\frac{(1-q)^2A_{t+1}-(1+q)A_t}{1+q+q A_t}, \end{eqnarray}with \(A_0=0\).
Theoretical guarantee: A tight worst-case guarantee can be found in [1, Theorem 3]:
\[\|z_n - x_\star\|^2 \leqslant \frac{1}{1+q A_n} \|z_0-x_\star\|^2,\]where tightness is obtained on some quadratic loss functions (see [1, Lemma 2]).
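The bound above is straightforward to evaluate by unrolling the A_t recursion; the sketch below (illustrative, not taken from the package) computes A_n and 1/(1 + q A_n) for given mu, L, and n, and should reproduce the theoretical value of about 0.7566 reported in the example below.

from math import sqrt

def item_theoretical_tau(mu, L, n):
    """Unroll the A_t recursion of ITEM and return the bound 1 / (1 + q * A_n)."""
    q = mu / L
    A = 0.  # A_0
    for _ in range(n):
        A = ((1 + q) * A + 2 * (1 + sqrt((1 + A) * (1 + q * A)))) / (1 - q) ** 2
    return 1 / (1 + q * A)

print(item_theoretical_tau(mu=.001, L=1, n=15))  # expected to be close to 0.7566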
References:
- Parameters
mu (float) – the strong convexity parameter.
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_information_theoretic(mu=.001, L=1, n=15, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 17x17
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 240 scalar constraint(s) ...
         function 1 : 240 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.7566107333964406
*** Example file: worst-case performance of the information theoretic exact method ***
    PEP-it guarantee:        ||z_n - x_* ||^2 <= 0.756611 ||z_0 - x_*||^2
    Theoretical guarantee:   ||z_n - x_* ||^2 <= 0.756605 ||z_0 - x_*||^2
Proximal point
- PEPit.examples.unconstrained_convex_minimization.wc_proximal_point(gamma, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is closed, proper, and convex (and potentially non-smooth).
This code computes a worst-case guarantee for the proximal point method with step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n,\gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, \gamma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the proximal point method, and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(\gamma\), \(\tau(n,\gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
The proximal point method is described by
\[x_{t+1} = \arg\min_x \left\{f(x)+\frac{1}{2\gamma}\|x-x_t\|^2 \right\},\]where \(\gamma\) is a step-size.
Theoretical guarantee:
The tight theoretical guarantee can be found in [1, Theorem 4.1]:
\[f(x_n)-f_\star \leqslant \frac{\|x_0-x_\star\|^2}{4\gamma n},\]where tightness is obtained on, e.g., one-dimensional linear problems on the positive orthant.
References:
- Parameters
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_proximal_point(gamma=3, n=4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 20 scalar constraint(s) ...
         function 1 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.02083327687098447
*** Example file: worst-case performance of proximal point method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2
    Theoretical guarantee:   f(x_n)-f_* <= 0.0208333 ||x_0 - x_*||^2
Accelerated proximal point
- PEPit.examples.unconstrained_convex_minimization.wc_accelerated_proximal_point(A0, gammas, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex and possibly non-smooth.
This code computes a worst-case guarantee for an accelerated proximal point method, a.k.a. fast proximal point method (FPP). That is, it computes the smallest possible \(\tau(n, A_0,\vec{\gamma})\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, A_0, \vec{\gamma}) \left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2\right)\]is valid, where \(x_n\) is the output of FPP (with step-size \(\gamma_t\) at step \(t\in \{0, \dots, n-1\}\)) and where \(x_\star\) is a minimizer of \(f\) and \(A_0\) is a positive number.
In short, for given values of \(n\), \(A_0\) and \(\vec{\gamma}\), \(\tau(n)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \leqslant 1\), for the following method.
Algorithm: For \(t\in \{0, \dots, n-1\}\):
\begin{eqnarray} y_{t+1} & = & (1-\alpha_{t} ) x_{t} + \alpha_{t} v_t \\ x_{t+1} & = & \arg\min_x \left\{f(x)+\frac{1}{2\gamma_t}\|x-y_{t+1}\|^2 \right\}, \\ v_{t+1} & = & v_t + \frac{1}{\alpha_{t}} (x_{t+1}-y_{t+1}) \end{eqnarray}with
\begin{eqnarray} \alpha_{t} & = & \frac{\sqrt{(A_t \gamma_t)^2 + 4 A_t \gamma_t} - A_t \gamma_t}{2} \\ A_{t+1} & = & (1 - \alpha_{t}) A_t \end{eqnarray}and \(v_0=x_0\).
Theoretical guarantee: A theoretical upper bound can be found in [1, Theorem 2.3.]:
\[f(x_n)-f_\star \leqslant \frac{4}{A_0 (\sum_{t=0}^{n-1} \sqrt{\gamma_t})^2}\left(f(x_0) - f_\star + \frac{A_0}{2} \|x_0 - x_\star\|^2 \right).\]References: The accelerated proximal point was first obtained and analyzed in [1].
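The bound only depends on A_0 and the step-size sequence; for the values used in the example below it evaluates to about 0.0512, as the following sketch shows.

from math import sqrt

A0 = 5
gammas = [(i + 1) / 1.1 for i in range(3)]
theoretical_tau = 4 / (A0 * sum(sqrt(g) for g in gammas) ** 2)  # bound of [1, Theorem 2.3]
print(theoretical_tau)  # ~0.051188, matching the example below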
- Parameters
A0 (float) – initial value for parameter A_0.
gammas (list) – sequence of step-sizes.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_point(A0=5, gammas=[(i + 1) / 1.1 for i in range(3)], n=3, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 6x6
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 20 scalar constraint(s) ...
         function 1 : 20 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.015931135941565824
*** Example file: worst-case performance of fast proximal point method ***
    PEPit guarantee:         f(x_n)-f_* <= 0.0159311 (f(x_0) - f_* + A/2* ||x_0 - x_*||^2)
    Theoretical guarantee:   f(x_n)-f_* <= 0.0511881 (f(x_0) - f_* + A/2* ||x_0 - x_*||^2)
Inexact gradient descent
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_gradient_descent(L, mu, epsilon, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for the inexact gradient method. That is, it computes the smallest possible \(\tau(n, L, \mu, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \varepsilon) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of the inexact gradient method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\), \(L\), \(\mu\) and \(\varepsilon\), \(\tau(n, L, \mu, \varepsilon)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(f(x_0) - f_\star \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \gamma d_t\]with
\[\|d_t - \nabla f(x_t)\| \leqslant \varepsilon \|\nabla f(x_t)\|\]and
\[\gamma = \frac{2}{L_{\varepsilon} + \mu_{\varepsilon}}\]where \(L_{\varepsilon} = (1 + \varepsilon) L\) and \(\mu_{\varepsilon} = (1 - \varepsilon) \mu\).
Theoretical guarantee:
The tight worst-case guarantee obtained in [1, Theorem 5.3] or [2, Remark 1.6] is
\[f(x_n) - f_\star \leqslant \left(\frac{L_{\varepsilon}-\mu_{\varepsilon}}{L_{\varepsilon}+\mu_{\varepsilon}}\right)^{2n}(f(x_0) - f_\star),\]where tightness is achieved on simple quadratic functions.
References: The detailed analyses can be found in [1, 2].
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
epsilon (float) – level of inaccuracy.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_gradient_descent(L=1, mu=.1, epsilon=.1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 7x7
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 14 scalar constraint(s) ...
         function 1 : 14 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5189192063892595
*** Example file: worst-case performance of inexact gradient method in distance in function values ***
    PEPit guarantee:         f(x_n)-f_* <= 0.518919 (f(x_0)-f_*)
    Theoretical guarantee:   f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)
Inexact gradient descent with exact line search
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_gradient_exact_line_search(L, mu, epsilon, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for an inexact gradient method with exact linesearch (ELS). That is, it computes the smallest possible \(\tau(n, L, \mu, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \mu, \varepsilon) ( f(x_0) - f_\star )\]is valid, where \(x_n\) is the output of the gradient descent with an inexact descent direction and an exact linesearch, and where \(x_\star\) is the minimizer of \(f\).
The inexact descent direction \(d\) is assumed to satisfy a relative inaccuracy described by (with \(0 \leqslant \varepsilon < 1\))
\[\|\nabla f(x_t) - d_t\| \leqslant \varepsilon \|\nabla f(x_t)\|,\]where \(\nabla f(x_t)\) is the true gradient, and \(d_t\) is the approximate descent direction that is used.
Algorithm:
For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} \gamma_t & = & \arg\min_{\gamma \in \mathbb{R}} f(x_t - \gamma d_t), \\ x_{t+1} & = & x_t - \gamma_t d_t. \end{eqnarray}
Theoretical guarantee:
The tight guarantee obtained in [1, Theorem 5.1] is
\[f(x_n) - f_\star\leqslant \left(\frac{L_{\varepsilon} - \mu_{\varepsilon}}{L_{\varepsilon} + \mu_{\varepsilon}}\right)^{2n}( f(x_0) - f_\star ),\]with \(L_{\varepsilon} = (1 + \varepsilon) L\) and \(\mu_{\varepsilon} = (1 - \varepsilon) \mu\). Tightness is achieved on simple quadratic functions.
References: The detailed approach (based on convex relaxations) is available in [1].
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
epsilon (float) – level of inaccuracy.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_gradient_exact_line_search(L=1, mu=0.1, epsilon=0.1, n=2, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 9x9
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
         function 1 : Adding 18 scalar constraint(s) ...
         function 1 : 18 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.5191057273345401
*** Example file: worst-case performance of inexact gradient descent with exact linesearch ***
    PEPit guarantee:         f(x_n)-f_* <= 0.519106 (f(x_0)-f_*)
    Theoretical guarantee:   f(x_n)-f_* <= 0.518917 (f(x_0)-f_*)
Inexact accelerated gradient
- PEPit.examples.unconstrained_convex_minimization.wc_inexact_accelerated_gradient(L, epsilon, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for an accelerated gradient method using inexact first-order information. That is, it computes the smallest possible \(\tau(n, L, \varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \varepsilon) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of inexact accelerated gradient descent and where \(x_\star\) is a minimizer of \(f\).
The inexact descent direction is assumed to satisfy a relative inaccuracy described by (with \(0\leqslant \varepsilon \leqslant 1\))
\[\|\nabla f(y_t) - d_t\| \leqslant \varepsilon \|\nabla f(y_t)\|,\]where \(\nabla f(y_t)\) is the true gradient at \(y_t\) and \(d_t\) is the approximate descent direction that is used.
Algorithm: The inexact accelerated gradient method of this example is provided by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} d_t\\ y_{t+1} & = & x_{t+1} + \frac{t-1}{t+2} (x_{t+1} - x_t). \end{eqnarray}Theoretical guarantee: When \(\varepsilon=0\), a tight empirical guarantee can be found in [1, Table 1]:
\[f(x_n)-f_\star \leqslant \frac{2L\|x_0-x_\star\|^2}{n^2 + 5 n + 6},\]which is achieved on some Huber loss functions (when \(\varepsilon=0\)).
References:
- Parameters
L (float) – smoothness parameter.
epsilon (float) – level of inaccuracy
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_accelerated_gradient(L=1, epsilon=0.1, n=5, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 13x13 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 47 scalar constraint(s) ... function 1 : 47 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.03944038534724904 *** Example file: worst-case performance of inexact accelerated gradient method *** PEPit guarantee: f(x_n)-f_* <= 0.0394404 (f(x_0)-f_*) Theoretical guarantee for epsilon = 0 : f(x_n)-f_* <= 0.0357143 (f(x_0)-f_*)
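For readers who want to rebuild a simplified version of this setup by hand, the sketch below writes the accelerated recursion with the basic PEPit primitives, but with exact gradients (\(\varepsilon = 0\)); the packaged example additionally models the approximate direction \(d_t\) and its relative-inaccuracy constraint. This is a minimal illustration, not the example file itself.
from PEPit import PEP
from PEPit.functions import SmoothConvexFunction

L, n = 1, 5

problem = PEP()
func = problem.declare_function(SmoothConvexFunction, L=L)
xs = func.stationary_point()                          # a minimizer x_*
fs = func(xs)                                         # f(x_*)

x0 = problem.set_initial_point()
problem.set_initial_condition((x0 - xs) ** 2 <= 1)    # ||x_0 - x_*||^2 <= 1

x_prev, x, y = x0, x0, x0
for t in range(n):
    d = func.gradient(y)                              # the packaged example uses an inexact direction d_t here
    x_prev, x = x, y - (1 / L) * d
    y = x + (t - 1) / (t + 2) * (x - x_prev)

problem.set_performance_metric(func(x) - fs)          # worst-case value of f(x_n) - f_*
pepit_tau = problem.solve()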
Epsilon-subgradient method
- PEPit.examples.unconstrained_convex_minimization.wc_epsilon_subgradient_method(M, n, gamma, eps, R, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is closed, convex, and proper. This problem is a (possibly non-smooth) minimization problem.
This code computes a worst-case guarantee for the \(\varepsilon\)-subgradient method. That is, it computes the smallest possible \(\tau(n, M, \gamma, \varepsilon, R)\) such that the guarantee
\[\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star \leqslant \tau(n, M, \gamma, \varepsilon, R)\]is valid, where \(x_t\) are the iterates of the \(\varepsilon\)-subgradient method after \(t\leqslant n\) steps, where \(x_\star\) is a minimizer of \(f\), where \(M\) is an upper bound on the norm of all \(\varepsilon\)-subgradients encountered, and when \(\|x_0-x_\star\|\leqslant R\).
In short, for given values of \(M\), of the accuracy \(\varepsilon\), of the step-size \(\gamma\), of the initial distance \(R\), and of the number of iterations \(n\), \(\tau(n, M, \gamma, \varepsilon, R)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n} f(x_t) - f_\star\).
Algorithm: For \(t\in \{0, \dots, n-1 \}\)
\begin{eqnarray} g_{t} & \in & \partial_{\varepsilon} f(x_t) \\ x_{t+1} & = & x_t - \gamma g_t \end{eqnarray}Theoretical guarantee: An upper bound is obtained in [1, Lemma 2]:
\[\min_{0 \leqslant t \leqslant n} f(x_t)- f(x_\star) \leqslant \frac{R^2+2(n+1)\gamma\varepsilon+(n+1) \gamma^2 M^2}{2(n+1) \gamma}.\]References:
- Parameters
M (float) – the bound on norms of epsilon-subgradients.
n (int) – the number of iterations.
gamma (float) – step-size.
eps (float) – the bound on the value of epsilon (inaccuracy).
R (float) – the bound on initial distance to an optimal solution.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> M, n, eps, R = 2, 6, .1, 1 >>> gamma = 1 / sqrt(n + 1) >>> pepit_tau, theoretical_tau = wc_epsilon_subgradient_method(M=M, n=n, gamma=gamma, eps=eps, R=R, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 21x21 (PEPit) Setting up the problem: performance measure is minimum of 7 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (14 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 188 scalar constraint(s) ... function 1 : 188 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 1.0191201198697333 *** Example file: worst-case performance of the epsilon-subgradient method *** PEPit guarantee: min_(0 <= t <= n) f(x_i) - f_* <= 1.01912 Theoretical guarantee: min_(0 <= t <= n) f(x_i) - f_* <= 1.04491
Gradient descent for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_qg_convex(L, gamma, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L, \gamma) \| x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent with fixed step-size \(\gamma\), and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(||x_0 - x_\star||^2 \leqslant 1\).
Algorithm: Gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: When \(\gamma < \frac{1}{L}\), the lower theoretical guarantee can be found in [1, Theorem 2.2]:
\[f(x_n)-f_\star \leqslant \frac{L}{2}\max\left(\frac{1}{2n L \gamma + 1}, L \gamma\right) \|x_0-x_\star\|^2.\]References:
The detailed approach is available in [1, Theorem 2.2].
- Parameters
L (float) – the quadratic growth parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 1 >>> pepit_tau, theoretical_tau = wc_gradient_descent_qg_convex(L=L, gamma=.2 / L, n=4, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 7x7 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 35 scalar constraint(s) ... function 1 : 35 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.19230811671886025 *** Example file: worst-case performance of gradient descent with fixed step-sizes *** PEPit guarantee: f(x_n)-f_* <= 0.192308 ||x_0 - x_*||^2 Theoretical guarantee: f(x_n)-f_* <= 0.192308 ||x_0 - x_*||^2
Gradient descent with decreasing step sizes for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_gradient_descent_qg_convex_decreasing(L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [1]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for gradient descent with decreasing step-sizes. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \| x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of gradient descent with decreasing step-sizes, and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(||x_0 - x_\star||^2 \leqslant 1\).
Algorithm: Gradient descent with decreasing step sizes is described by
\[x_{t+1} = x_t - \gamma_t \nabla f(x_t)\]with
\[\gamma_t = \frac{1}{L u_{t+1}}\]where the sequence \(u\) is defined by
\begin{eqnarray} u_0 & = & 1 \\ u_{t} & = & \frac{u_{t-1}}{2} + \sqrt{\left(\frac{u_{t-1}}{2}\right)^2 + 2}, \quad \mathrm{for } t \geq 1 \end{eqnarray}Theoretical guarantee: The tight theoretical guarantee is conjectured in [1, Conjecture A.3]:
\[f(x_n)-f_\star \leqslant \frac{L}{2 u_n} \|x_0-x_\star\|^2.\]Notes:
We verify that \(u_t \sim 2\sqrt{t}\). The step sizes as well as the function values of the iterates decrease as \(O\left( \frac{1}{\sqrt{t}} \right)\).
References:
The detailed approach is available in [1, Appendix A.3].
- Parameters
L (float) – the quadratic growth parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_gradient_descent_qg_convex_decreasing(L=1, n=6, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 9x9 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 63 scalar constraint(s) ... function 1 : 63 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.10554312873115372 (PEPit) Postprocessing: solver's output is not entirely feasible (smallest eigenvalue of the Gram matrix is: -4.19e-06 < 0). Small deviation from 0 may simply be due to numerical error. Big ones should be deeply investigated. In any case, from now the provided values of parameters are based on the projection of the Gram matrix onto the cone of symmetric semi-definite matrix. *** Example file: worst-case performance of gradient descent with fixed step-sizes *** PEPit guarantee: f(x_n)-f_* <= 0.105543 ||x_0 - x_*||^2 Theoretical conjecture: f(x_n)-f_* <= 0.105547 ||x_0 - x_*||^2
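The step-size schedule is fully determined by the sequence \(u_t\) above, so it can be precomputed in plain Python before building the PEP. The sketch below illustrates this; the class name ConvexQGFunction used for the \(\text{QG}^+\) convex model is an assumption and should be checked against the function classes shipped with your PEPit version.
from math import sqrt

from PEPit import PEP
from PEPit.functions import ConvexQGFunction  # assumed name of the QG^+ convex class

L, n = 1, 6

# Precompute u_0, ..., u_n with u_0 = 1 and u_t = u_{t-1}/2 + sqrt((u_{t-1}/2)^2 + 2).
u = [1.0]
for _ in range(n):
    u.append(u[-1] / 2 + sqrt((u[-1] / 2) ** 2 + 2))

problem = PEP()
func = problem.declare_function(ConvexQGFunction, L=L)
xs = func.stationary_point()
fs = func(xs)

x0 = problem.set_initial_point()
problem.set_initial_condition((x0 - xs) ** 2 <= 1)

x = x0
for t in range(n):
    x = x - 1 / (L * u[t + 1]) * func.gradient(x)     # gamma_t = 1 / (L * u_{t+1})

problem.set_performance_metric(func(x) - fs)
pepit_tau = problem.solve()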
Conjugate gradient for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_conjugate_gradient_qg_convex(L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [2]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for the conjugate gradient (CG) method (with exact span searches). That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0-x_\star\|^2\]is valid, where \(x_n\) is the output of the conjugate gradient method, and where \(x_\star\) is a minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0-x_\star\|^2 \leqslant 1\).
Algorithm:
\[x_{t+1} = x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i)\]with
\[(\gamma_i)_{i \leqslant t} = \arg\min_{(\gamma_i)_{i \leqslant t}} f \left(x_t - \sum_{i=0}^t \gamma_i \nabla f(x_i) \right)\]Theoretical guarantee:
The tight guarantee obtained in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper) is
\[f(x_n) - f_\star \leqslant \frac{L}{2 (n + 1)} \|x_0-x_\star\|^2.\]References: The detailed approach (based on convex relaxations) is available in [1, Corollary 6], and the result is provided in [2, Theorem 2.4].
- Parameters
L (float) – the quadratic growth parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_conjugate_gradient_qg_convex(L=1, n=12, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 27x27 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 351 scalar constraint(s) ... function 1 : 351 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.038461130525391705 *** Example file: worst-case performance of conjugate gradient method *** PEPit guarantee: f(x_n)-f_* <= 0.0384611 ||x_0 - x_*||^2 Theoretical guarantee: f(x_n)-f_* <= 0.0384615 ||x_0 - x_*||^2
Heavy Ball momentum for quadratically upper bounded convex objective
- PEPit.examples.unconstrained_convex_minimization.wc_heavy_ball_momentum_qg_convex(L, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is quadratically upper bounded (\(\text{QG}^+\) [2]), i.e. \(\forall x, f(x) - f_\star \leqslant \frac{L}{2} \|x-x_\star\|^2\), and convex.
This code computes a worst-case guarantee for the Heavy-ball (HB) method, aka Polyak momentum method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the Heavy-ball (HB) method, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(f(x_n)-f_\star\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm:
This method is described in [1]
\[x_{t+1} = x_t - \alpha_t \nabla f(x_t) + \beta_t (x_t-x_{t-1})\]with
\[\alpha_t = \frac{1}{L} \frac{1}{t+2}\]and
\[\beta_t = \frac{t}{t+2}\]Theoretical guarantee:
The tight guarantee obtained in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper) is
\[f(x_n) - f_\star \leqslant \frac{L}{2}\frac{1}{n+1} \|x_0 - x_\star\|^2.\]References: This method was first introduced in [1, Section 3], and the tight convergence bound was proven in [2, Theorem 2.3] (lower) and [2, Theorem 2.4] (upper).
- Parameters
L (float) – the quadratic growth parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_heavy_ball_momentum_qg_convex(L=1, n=5, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 9x9 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 63 scalar constraint(s) ... function 1 : 63 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.08333167067320212 *** Example file: worst-case performance of the Heavy-Ball method *** PEPit guarantee: f(x_n)-f_* <= 0.0833317 ||x_0 - x_*||^2 Theoretical guarantee: f(x_n)-f_* <= 0.0833333 ||x_0 - x_*||^2
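The Heavy-ball recursion above only needs gradients and point arithmetic, so it translates directly into PEPit primitives. As before, the ConvexQGFunction class name for the \(\text{QG}^+\) convex model is an assumption; this is a sketch rather than the packaged example.
from PEPit import PEP
from PEPit.functions import ConvexQGFunction  # assumed name of the QG^+ convex class

L, n = 1, 5

problem = PEP()
func = problem.declare_function(ConvexQGFunction, L=L)
xs = func.stationary_point()
fs = func(xs)

x0 = problem.set_initial_point()
problem.set_initial_condition((x0 - xs) ** 2 <= 1)

x_prev, x = x0, x0
for t in range(n):
    alpha = 1 / (L * (t + 2))    # alpha_t = (1/L) * 1/(t+2)
    beta = t / (t + 2)           # beta_t = t/(t+2)
    x_prev, x = x, x - alpha * func.gradient(x) + beta * (x - x_prev)

problem.set_performance_metric(func(x) - fs)
pepit_tau = problem.solve()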
Composite convex minimization
Proximal gradient
- PEPit.examples.composite_convex_minimization.wc_proximal_gradient(L, mu, gamma, n, verbose=1)[source]
Consider the composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]where \(f_1\) is \(L\)-smooth and \(\mu\)-strongly convex, and where \(f_2\) is closed convex and proper.
This code computes a worst-case guarantee for the proximal gradient method (PGM). That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[\|x_n - x_\star\|^2 \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2,\]is valid, where \(x_n\) is the output of the proximal gradient, and where \(x_\star\) is a minimizer of \(F\). In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\|x_n - x_\star\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Proximal gradient is described by
\[\begin{split}\begin{eqnarray} y_t & = & x_t - \gamma \nabla f_1(x_t), \\ x_{t+1} & = & \arg\min_x \left\{f_2(x)+\frac{1}{2\gamma}\|x-y_t\|^2 \right\}, \end{eqnarray}\end{split}\]for \(t \in \{ 0, \dots, n-1\}\) and where \(\gamma\) is a step-size.
Theoretical guarantee: It is well known that a tight guarantee for PGM is provided by
\[\|x_n - x_\star\|^2 \leqslant \max\{(1-L\gamma)^2,(1-\mu\gamma)^2\}^n \|x_0 - x_\star\|^2,\]which can be found in, e.g., [1, Theorem 3.1]. It is folk knowledge and the result can be found in many references for gradient descent; see, e.g., [2, Section 1.4, Theorem 3], [3, Section 5.1] and [4, Section 4.4].
References:
[2] B. Polyak (1987). Introduction to Optimization. Optimization Software, New York.
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
gamma (float) – proximal step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_proximal_gradient(L=1, mu=.1, gamma=1, n=2, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 7x7 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 6 scalar constraint(s) ... function 1 : 6 scalar constraint(s) added function 2 : Adding 6 scalar constraint(s) ... function 2 : 6 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.6560999999942829 *** Example file: worst-case performance of the Proximal Gradient Method in function values*** PEPit guarantee: ||x_n - x_*||^2 <= 0.6561 ||x0 - xs||^2 Theoretical guarantee: ||x_n - x_*||^2 <= 0.6561 ||x0 - xs||^2
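As a rough sketch of how this PEP could be assembled by hand (the packaged example remains the reference implementation): the forward step uses func.gradient as in the quick start, while the backward step relies on a proximal_step helper from PEPit.primitive_steps. Its availability and exact signature, the class names, and the possibility of summing two declared functions are assumptions to be checked against your PEPit version.
from PEPit import PEP
from PEPit.functions import ConvexFunction, SmoothStronglyConvexFunction
from PEPit.primitive_steps import proximal_step  # assumed helper returning (prox point, subgradient, value)

L, mu, gamma, n = 1, .1, 1, 2

problem = PEP()
f1 = problem.declare_function(SmoothStronglyConvexFunction, L=L, mu=mu)
f2 = problem.declare_function(ConvexFunction)
F = f1 + f2                           # assumed: declared functions can be summed
xs = F.stationary_point()             # a minimizer of F

x0 = problem.set_initial_point()
problem.set_initial_condition((x0 - xs) ** 2 <= 1)

x = x0
for _ in range(n):
    y = x - gamma * f1.gradient(x)            # forward (gradient) step on f_1
    x, _, _ = proximal_step(y, f2, gamma)     # backward (proximal) step on f_2

problem.set_performance_metric((x - xs) ** 2)  # worst-case value of ||x_n - x_*||^2
pepit_tau = problem.solve()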
Accelerated proximal gradient
- PEPit.examples.composite_convex_minimization.wc_accelerated_proximal_gradient(mu, L, n, verbose=1)[source]
Consider the composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f(x) + h(x)\},\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex, and where \(h\) is closed convex and proper.
This code computes a worst-case guarantee for the accelerated proximal gradient method, also known as fast proximal gradient (FPGM) method. That is, it computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee
\[F(x_n) - F(x_\star) \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2,\]is valid, where \(x_n\) is the output of the accelerated proximal gradient method, and where \(x_\star\) is a minimizer of \(F\).
In short, for given values of \(n\), \(L\) and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Accelerated proximal gradient is described as follows, for \(t \in \{ 0, \dots, n-1\}\),
\begin{eqnarray} x_{t+1} & = & \arg\min_x \left\{h(x)+\frac{L}{2}\|x-\left(y_{t} - \frac{1}{L} \nabla f(y_t)\right)\|^2 \right\}, \\ y_{t+1} & = & x_{t+1} + \frac{t}{t+3} (x_{t+1} - x_{t}), \end{eqnarray}where \(y_{0} = x_0\).
Theoretical guarantee: A tight (empirical) worst-case guarantee for FPGM is obtained in [1, method FPGM1 in Sec. 4.2.1, Table 1 in Sec. 4.2.2], for \(\mu=0\):
\[F(x_n) - F_\star \leqslant \frac{2 L}{n^2+5n+2} \|x_0 - x_\star\|^2,\]which is attained on simple one-dimensional constrained linear optimization problems.
References:
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_gradient(L=1, mu=0, n=4, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 12x12 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 30 scalar constraint(s) ... function 1 : 30 scalar constraint(s) added function 2 : Adding 20 scalar constraint(s) ... function 2 : 20 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.052630167313517565 (PEPit) Postprocessing: solver's output is not entirely feasible (smallest eigenvalue of the Gram matrix is: -7.28e-06 < 0). Small deviation from 0 may simply be due to numerical error. Big ones should be deeply investigated. In any case, from now the provided values of parameters are based on the projection of the Gram matrix onto the cone of symmetric semi-definite matrix. *** Example file: worst-case performance of the Accelerated Proximal Gradient Method in function values*** PEPit guarantee: f(x_n)-f_* <= 0.0526302 ||x0 - xs||^2 Theoretical guarantee: f(x_n)-f_* <= 0.0526316 ||x0 - xs||^2
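A minimal sketch of the same recursion written with PEPit primitives is given below (momentum coefficient \(t/(t+3)\), \(\mu = 0\)); as in the previous sketch, the proximal_step helper and the ability to sum declared functions are assumptions.
from PEPit import PEP
from PEPit.functions import ConvexFunction, SmoothConvexFunction
from PEPit.primitive_steps import proximal_step  # assumed helper, see the proximal gradient sketch

L, n = 1, 4

problem = PEP()
f = problem.declare_function(SmoothConvexFunction, L=L)
h = problem.declare_function(ConvexFunction)
F = f + h                             # assumed: declared functions can be summed
xs = F.stationary_point()
Fs = F(xs)

x0 = problem.set_initial_point()
problem.set_initial_condition((x0 - xs) ** 2 <= 1)

x, y = x0, x0
for t in range(n):
    x_prev = x
    x, _, _ = proximal_step(y - (1 / L) * f.gradient(y), h, 1 / L)  # proximal step on h after a gradient step on f
    y = x + t / (t + 3) * (x - x_prev)

problem.set_performance_metric(F(x) - Fs)      # worst-case value of F(x_n) - F_*
pepit_tau = problem.solve()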
Bregman proximal point
- PEPit.examples.composite_convex_minimization.wc_bregman_proximal_point(gamma, n, verbose=1)[source]
Consider the composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]where \(f_1(x)\) and \(f_2(x)\) are closed convex proper functions.
This code computes a worst-case guarantee for Bregman Proximal Point method. That is, it computes the smallest possible \(\tau(n, \gamma)\) such that the guarantee
\[F(x_n) - F(x_\star) \leqslant \tau(n, \gamma) D_{f_1}(x_\star; x_0)\]is valid, where \(x_n\) is the output of the Bregman Proximal Point (BPP) method, where \(x_\star\) is a minimizer of \(F\), and when \(D_{f_1}\) is the Bregman distance generated by \(f_1\).
Algorithm: Bregman proximal point is described in [1, Section 2, equation (9)]. For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} x_{t+1} & = & \arg\min_{u \in R^n} f_1(u) + \frac{1}{\gamma} D_{f_2}(u; x_t), \\ D_h(x; y) & = & h(x) - h(y) - \nabla h (y)^T(x - y). \end{eqnarray}Theoretical guarantee: A tight empirical guarantee can be inferred from the numerical results:
\[F(x_n) - F(x_\star) \leqslant \frac{1}{\gamma n} D_{f_1}(x_\star; x_0).\]References:
- Parameters
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Examples
>>> pepit_tau, theoretical_tau = wc_bregman_proximal_point(gamma=3, n=5, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 14x14 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 30 scalar constraint(s) ... function 1 : 30 scalar constraint(s) added function 2 : Adding 42 scalar constraint(s) ... function 2 : 42 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.06666740784196148 *** Example file: worst-case performance of the Bregman Proximal Point in function values *** PEPit guarantee: F(x_n)-F_* <= 0.0666674 Dh(x_*; x_0) Theoretical guarantee: F(x_n)-F_* <= 0.0666667 Dh(x_*; x_0)
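Although the Bregman proximal step itself is handled inside the packaged example, the Bregman distance \(D_h(x; y)\) that appears in the guarantee is easy to express with the basic PEPit oracles. The helper below is a small illustration (an assumption, not part of the package); it relies on the product of two points being interpreted as an inner product, mirroring the quadratic expressions used in the quick start.
def bregman_divergence(h, x, y):
    """D_h(x; y) = h(x) - h(y) - <grad h(y), x - y>, written with PEPit oracles."""
    gy, hy = h.oracle(y)              # (sub)gradient and value of h at y
    return h(x) - hy - gy * (x - y)   # Point * Point is assumed to be an inner product

# Typical use inside a PEP, e.g. to normalize the initial Bregman distance:
# problem.set_initial_condition(bregman_divergence(f1, xs, x0) <= 1)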
Douglas Rachford splitting
- PEPit.examples.composite_convex_minimization.wc_douglas_rachford_splitting(L, alpha, theta, n, verbose=1)[source]
Consider the composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]where \(f_1\) is convex, closed, and proper, and \(f_2\) is \(L\)-smooth. Both proximal operators are assumed to be available.
This code computes a worst-case guarantee for the Douglas Rachford Splitting (DRS) method. That is, it computes the smallest possible \(\tau(n, L, \alpha, \theta)\) such that the guarantee
\[F(y_n) - F(x_\star) \leqslant \tau(n, L, \alpha, \theta) \|x_0 - x_\star\|^2\]is valid, where it is known that \(x_t\) and \(y_t\) converge to \(x_\star\), but not \(w_t\) (see definitions in the section Algorithm). Hence we require the initial condition on \(x_0\) (an arbitrary choice, partially justified by the fact that we choose \(f_2\) to be the smooth function).
Note that \(y_n\) is feasible as it has a finite value for \(f_1\) (output of the proximal operator on \(f_1\)) and as \(f_2\) is smooth.
Algorithm:
Our notations for the DRS method are as follows, for \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} x_t & = & \mathrm{prox}_{\alpha f_2}(w_t), \\ y_t & = & \mathrm{prox}_{\alpha f_1}(2x_t - w_t), \\ w_{t+1} & = & w_t + \theta (y_t - x_t). \end{eqnarray}This description can be found in [1, Section 7.3].
Theoretical guarantee: We compare the output with that of PESTO [2] for \(0\leqslant n \leqslant 10\) in the case where \(\alpha=\theta=L=1\).
References:
- Parameters
L (float) – the smoothness parameter.
alpha (float) – parameter of the scheme.
theta (float) – parameter of the scheme.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_douglas_rachford_splitting(L=1, alpha=1, theta=1, n=9, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 22x22 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 90 scalar constraint(s) ... function 1 : 90 scalar constraint(s) added function 2 : Adding 110 scalar constraint(s) ... function 2 : 110 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.027792700548325236 *** Example file: worst-case performance of the Douglas Rachford Splitting in function values *** PEPit guarantee: f(y_n)-f_* <= 0.0278 ||x0 - xs||^2 Theoretical guarantee: f(y_n)-f_* <= 0.0278 ||x0 - xs||^2
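A minimal sketch of this setup with PEPit primitives follows; note that the initial condition is written on \(x_0\) (the first x-iterate), as discussed above. The proximal_step helper and the summation of declared functions are again assumptions to check against your PEPit version.
from PEPit import PEP
from PEPit.functions import ConvexFunction, SmoothConvexFunction
from PEPit.primitive_steps import proximal_step  # assumed helper, see the proximal gradient sketch

L, alpha, theta, n = 1, 1, 1, 9

problem = PEP()
f1 = problem.declare_function(ConvexFunction)
f2 = problem.declare_function(SmoothConvexFunction, L=L)
F = f1 + f2                          # assumed: declared functions can be summed
xs = F.stationary_point()
Fs = F(xs)

w = problem.set_initial_point()      # w_0
for t in range(n):
    x, _, _ = proximal_step(w, f2, alpha)
    if t == 0:
        problem.set_initial_condition((x - xs) ** 2 <= 1)   # condition on x_0, see the discussion above
    y, _, _ = proximal_step(2 * x - w, f1, alpha)
    w = w + theta * (y - x)

problem.set_performance_metric(F(y) - Fs)     # worst-case value of F at the last y-iterate
pepit_tau = problem.solve()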
Douglas Rachford splitting contraction
- PEPit.examples.composite_convex_minimization.wc_douglas_rachford_splitting_contraction(mu, L, alpha, theta, n, verbose=1)[source]
Consider the composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x) \}\]where \(f_1(x)\) is \(L\)-smooth and \(\mu\)-strongly convex, and \(f_2\) is convex, closed and proper. Both proximal operators are assumed to be available.
This code computes a worst-case guarantee for the Douglas Rachford Splitting (DRS) method. That is, it computes the smallest possible \(\tau(\mu,L,\alpha,\theta,n)\) such that the guarantee
\[\|w_1 - w_1'\|^2 \leqslant \tau(\mu,L,\alpha,\theta,n) \|w_0 - w_0'\|^2\]is valid, where \(w_1\) and \(w_1'\) are the outputs of the Douglas-Rachford splitting method when started, respectively, from two different points \(w_0\) and \(w_0'\). That is, \(\tau\) is a contraction factor.
Algorithm:
Our notations for the DRS method are as follows [3, Section 7.3], for \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} x_t & = & \mathrm{prox}_{\alpha f_2}(w_t), \\ y_t & = & \mathrm{prox}_{\alpha f_1}(2x_t - w_t), \\ w_{t+1} & = & w_t + \theta (y_t - x_t). \end{eqnarray}Theoretical guarantee:
The tight theoretical guarantee is obtained in [2, Theorem 2]:
\[\|w_1 - w_1'\|^2 \leqslant \max\left(\frac{1}{1 + \mu \alpha}, \frac{\alpha L}{1 + L \alpha}\right)^{2n} \|w_0 - w_0'\|^2\]when \(\theta=1\).
References:
Details on the SDP formulations can be found in
When \(\theta = 1\), the bound can be compared with that of [2, Theorem 2]
A description of the DRS method can be found in [3, Section 7.3].
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
alpha (float) – parameter of the scheme.
theta (float) – parameter of the scheme.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Examples
>>> pepit_tau, theoretical_tau = wc_douglas_rachford_splitting_contraction(mu=.1, L=1, alpha=3, theta=1, n=2, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 12x12 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 20 scalar constraint(s) ... function 1 : 20 scalar constraint(s) added function 2 : Adding 20 scalar constraint(s) ... function 2 : 20 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.35012779919911946 *** Example file: worst-case performance of the Douglas-Rachford splitting in distance *** PEPit guarantee: ||w - wp||^2 <= 0.350128 ||w0 - w0p||^2 Theoretical guarantee: ||w - wp||^2 <= 0.350128 ||w0 - w0p||^2
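Contraction-type PEPs follow a slightly different pattern: two copies of the same method are run from two different starting points, the initial condition bounds the distance between the starting points, and the performance metric measures the distance between the final iterates. A hedged sketch (relying on the assumed proximal_step helper) is given below.
from PEPit import PEP
from PEPit.functions import ConvexFunction, SmoothStronglyConvexFunction
from PEPit.primitive_steps import proximal_step  # assumed helper, see the proximal gradient sketch

mu, L, alpha, theta, n = .1, 1, 3, 1, 2

problem = PEP()
f1 = problem.declare_function(SmoothStronglyConvexFunction, L=L, mu=mu)
f2 = problem.declare_function(ConvexFunction)

# Two runs of the same method, started from w_0 and w_0'.
w, wp = problem.set_initial_point(), problem.set_initial_point()
problem.set_initial_condition((w - wp) ** 2 <= 1)      # ||w_0 - w_0'||^2 <= 1

def drs_step(v):
    x, _, _ = proximal_step(v, f2, alpha)
    y, _, _ = proximal_step(2 * x - v, f1, alpha)
    return v + theta * (y - x)

for _ in range(n):
    w, wp = drs_step(w), drs_step(wp)

problem.set_performance_metric((w - wp) ** 2)          # contraction after n steps
pepit_tau = problem.solve()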
Accelerated Douglas Rachford splitting
- PEPit.examples.composite_convex_minimization.wc_accelerated_douglas_rachford_splitting(mu, L, alpha, n, verbose=1)[source]
Consider the composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]where \(f_1\) is closed convex and proper, and \(f_2\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for accelerated Douglas-Rachford. That is, it computes the smallest possible \(\tau(n, L, \mu, \alpha)\) such that the guarantee
\[F(y_n) - F(x_\star) \leqslant \tau(n,L,\mu,\alpha) \|w_0 - w_\star\|^2\]is valid, where \(\alpha\) is a parameter of the method, where \(y_n\) is the output of the accelerated Douglas-Rachford Splitting method, where \(x_\star\) is a minimizer of \(F\), and where \(w_\star\) is defined such that
\[x_\star = \mathrm{prox}_{\alpha f_2}(w_\star)\]is an optimal point.
In short, for given values of \(n\), \(L\), \(\mu\), \(\alpha\), \(\tau(n, L, \mu, \alpha)\) is computed as the worst-case value of \(F(y_n)-F_\star\) when \(\|w_0 - w_\star\|^2 \leqslant 1\).
Algorithm: The accelerated Douglas-Rachford splitting is described in [1, Section 4]. For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} x_{t} & = & \mathrm{prox}_{\alpha f_2} (u_t),\\ y_{t} & = & \mathrm{prox}_{\alpha f_1}(2x_t-u_t),\\ w_{t+1} & = & u_t + \theta (y_t-x_t),\\ u_{t+1} & = & \left\{\begin{array}{ll} w_{t+1}+\frac{t-1}{t+2}(w_{t+1}-w_t)\, & \text{if } t >1,\\ w_{t+1} & \text{otherwise.} \end{array}\right. \end{eqnarray}Theoretical guarantee: There is no known worst-case guarantee for this method beyond quadratic minimization. For quadratics, an upper bound is provided by [1, Theorem 5]:
\[F(y_n) - F_\star \leqslant \frac{2}{\alpha \theta (n + 3)^2} \|w_0-w_\star\|^2,\]when \(\theta=\frac{1-\alpha L}{1+\alpha L}\) and \(\alpha < \frac{1}{L}\).
References: An analysis of the accelerated Douglas-Rachford splitting is available in [1, Theorem 5] when the convex minimization problem is quadratic.
- Parameters
mu (float) – the strong convexity parameter.
L (float) – the smoothness parameter.
alpha (float) – the parameter of the scheme.
n (int) – the number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value (upper bound for quadratics; not directly comparable).
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_douglas_rachford_splitting(mu=.1, L=1, alpha=.9, n=2, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 11x11 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 20 scalar constraint(s) ... function 1 : 20 scalar constraint(s) added function 2 : Adding 20 scalar constraint(s) ... function 2 : 20 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.19291623136473224 *** Example file: worst-case performance of the Accelerated Douglas Rachford Splitting in function values *** PEPit guarantee: F(y_n)-F_* <= 0.192916 ||x0 - ws||^2 Theoretical guarantee for quadratics: F(y_n)-F_* <= 1.68889 ||x0 - ws||^2
Frank Wolfe
- PEPit.examples.composite_convex_minimization.wc_frank_wolfe(L, D, n, verbose=1)[source]
Consider the composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]where \(f_1\) is \(L\)-smooth and convex and where \(f_2\) is a convex indicator function on \(\mathcal{D}\) of diameter at most \(D\).
This code computes a worst-case guarantee for the conditional gradient method, aka Frank-Wolfe method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[F(x_n) - F(x_\star) \leqslant \tau(n, L) D^2,\]is valid, where \(x_n\) is the output of the conditional gradient method, and where \(x_\star\) is a minimizer of \(F\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(D \leqslant 1\).
Algorithm:
This method was first presented in [1]. A more recent version can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),
\[\begin{split}\begin{eqnarray} y_t & = & \arg\min_{s \in \mathcal{D}} \langle s \mid \nabla f_1(x_t) \rangle, \\ x_{t+1} & = & \frac{t}{t + 2} x_t + \frac{2}{t + 2} y_t. \end{eqnarray}\end{split}\]Theoretical guarantee:
An upper bound obtained in [2, Theorem 1] is
\[F(x_n) - F(x_\star) \leqslant \frac{2L D^2}{n+2}.\]References:
[1] M. Frank, P. Wolfe (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2), 95-110.
- Parameters
L (float) – the smoothness parameter.
D (float) – diameter of the domain \(\mathcal{D}\) of \(f_2\).
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_frank_wolfe(L=1, D=1, n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 26x26 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 132 scalar constraint(s) ... function 1 : 132 scalar constraint(s) added function 2 : Adding 325 scalar constraint(s) ... function 2 : 325 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.07830185202143693 *** Example file: worst-case performance of the Conditional Gradient (Frank-Wolfe) in function value *** PEPit guarantee: f(x_n)-f_* <= 0.0783019 ||x0 - xs||^2 Theoretical guarantee: f(x_n)-f_* <= 0.166667 ||x0 - xs||^2
Improved interior method
- PEPit.examples.composite_convex_minimization.wc_improved_interior_algorithm(L, mu, c, lam, n, verbose=1)[source]
Consider the composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]where \(f_1\) is an \(L\)-smooth convex function, and \(f_2\) is a closed convex indicator function. We use a kernel function \(h\) that is assumed to be closed, proper, and strongly convex (see [1, Section 5]).
This code computes a worst-case guarantee for Improved interior gradient algorithm (IGA). That is, it computes the smallest possible \(\tau(\mu,L,c,\lambda,n)\) such that the guarantee
\[F(x_n) - F(x_\star) \leqslant \tau(\mu,L,c,\lambda,n) (c D_h(x_\star;x_0) + f_1(x_0) - f_1(x_\star))\]is valid, where \(x_n\) is the output of the IGA and where \(x_\star\) is a minimizer of \(F\) and \(D_h\) is the Bregman distance generated by \(h\).
In short, for given values of \(\mu\), \(L\), \(c\), \(\lambda\) and \(n\), \(\tau(\mu,L,c,\lambda,n)\) is computed as the worst-case value of \(F(x_n)-F_\star\) when \(c D_h(x_\star;x_0) + f_1(x_0) - f_1(x_\star)\leqslant 1\).
Algorithm: The IGA is described in [1, “Improved Interior Gradient Algorithm”]. For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} \alpha_t & = & \frac{\sqrt{(c_t\lambda)^2+4c_t\lambda}-\lambda c_t}{2},\\ y_t & = & (1-\alpha_t) x_t + \alpha_t z_t,\\ c_{t+1} & = & (1-\alpha_t)c_t,\\ z_{t+1} & = & \arg\min_{z} \left\{ \left< z;\frac{\alpha_t}{c_{t+1}}\nabla f_1(y_t)\right> +f_2(z)+D_h(z;z_t)\right\}, \\ x_{t+1} & = & (1-\alpha_t) x_t + \alpha_t z_{t+1}. \end{eqnarray}Theoretical guarantee: The following upper bound can be found in [1, Theorem 5.2]:
\[F(x_n) - F_\star \leqslant \frac{4L}{c n^2}\left(c D_h(x_\star;x_0) + f_1(x_0) - f_1(x_\star) \right).\]References:
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong-convexity parameter.
c (float) – initial value \(c_0\) of the parameter sequence \(c_t\).
lam (float) – the step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> L = 1 >>> lam = 1 / L >>> pepit_tau, theoretical_tau = wc_improved_interior_algorithm(L=L, mu=1, c=1, lam=lam, n=5, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 22x22 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 3 function(s) function 1 : Adding 42 scalar constraint(s) ... function 1 : 42 scalar constraint(s) added function 2 : Adding 49 scalar constraint(s) ... function 2 : 49 scalar constraint(s) added function 3 : Adding 42 scalar constraint(s) ... function 3 : 42 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal_inaccurate (solver: SCS); optimal value: 0.06675394483126838 *** Example file: worst-case performance of the Improved interior gradient algorithm in function values *** PEPit guarantee: F(x_n)-F_* <= 0.0667539 (c * Dh(xs;x0) + f1(x0) - F_*) Theoretical guarantee: F(x_n)-F_* <= 0.111111 (c * Dh(xs;x0) + f1(x0) - F_*)
No Lips in function value
- PEPit.examples.composite_convex_minimization.wc_no_lips_in_function_value(L, gamma, n, verbose=1)[source]
Consider the constrained composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]where \(f_1\) is convex and \(L\)-smooth relatively to \(h\), \(h\) being closed proper and convex, and where \(f_2\) is a closed convex indicator function.
This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[F(x_n) - F_\star \leqslant \tau(n, L) D_h(x_\star; x_0),\]is valid, where \(x_n\) is the output of the NoLips method, where \(x_\star\) is a minimizer of \(F\), and where \(D_h\) is the Bregman divergence generated by \(h\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(F(x_n) - F_\star\) when \(D_h(x_\star; x_0) \leqslant 1\).
Algorithm: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),
\[x_{t+1} = \arg\min_{u} \{f_2(u)+\langle \nabla f_1(x_t) \mid u - x_t \rangle + \frac{1}{\gamma} D_h(u; x_t)\}.\]Theoretical guarantee:
The tight guarantee obtained in [2, Theorem 1] is
\[F(x_n) - F_\star \leqslant \frac{1}{\gamma n} D_h(x_\star; x_0),\]for any \(\gamma \leq \frac{1}{L}\); tightness is provided in [2, page 23].
References: NoLips was proposed in [1] for convex problems involving relative smoothness. The worst-case analysis using a PEP, as well as the tightness, are provided in [2].
Notes
Disclaimer: This example requires some experience with PEPit and PEPs ([2], section 4).
- Parameters
L (float) – relative-smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> L = 1 >>> gamma = 1 / (2 * L) >>> pepit_tau, theoretical_tau = wc_no_lips_in_function_value(L=L, gamma=gamma, n=3, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 15x15 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 3 function(s) function 1 : Adding 20 scalar constraint(s) ... function 1 : 20 scalar constraint(s) added function 2 : Adding 20 scalar constraint(s) ... function 2 : 20 scalar constraint(s) added function 3 : Adding 16 scalar constraint(s) ... function 3 : 16 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.6666714558260607 *** Example file: worst-case performance of the NoLips in function values *** PEPit guarantee: F(x_n) - F_* <= 0.666671 Dh(x_*; x_0) Theoretical guarantee: F(x_n) - F_* <= 0.666667 Dh(x_*; x_0)
No Lips in Bregman divergence
- PEPit.examples.composite_convex_minimization.wc_no_lips_in_bregman_divergence(L, gamma, n, verbose=1)[source]
Consider the constrained composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]where \(f_1\) is convex and \(L\)-smooth relatively to \(h\), \(h\) being closed proper and convex, and where \(f_2\) is a closed convex indicator function.
This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[\min_{t\leqslant n} D_h(x_{t-1}; x_t) \leqslant \tau(n, L) D_h(x_\star; x_0),\]is valid, where \(x_n\) is the output of the NoLips method, where \(x_\star\) is a minimizer of \(F\), and where \(D_h\) is the Bregman divergence generated by \(h\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(\min_{t\leqslant n} D_h(x_{t-1}; x_t)\) when \(D_h(x_\star; x_0) \leqslant 1\).
Algorithm: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),
\[x_{t+1} = \arg\min_{u} \{f_2(u)+\langle \nabla f_1(x_t) \mid u - x_t \rangle + \frac{1}{\gamma} D_h(u; x_t)\}.\]Theoretical guarantee: The upper guarantee obtained in [2, Proposition 4] is
\[\min_{t\leqslant n} D_h(x_{t-1}; x_t) \leqslant \frac{2}{n (n - 1)} D_h(x_\star; x_0),\]for any \(\gamma \leq \frac{1}{L}\). It is empirically tight.
References:
Notes
Disclaimer: This example requires some experience with PEPit and PEPs ([2], section 4).
- Parameters
L (float) – relative-smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> L = 1 >>> gamma = 1 / L >>> pepit_tau, theoretical_tau = wc_no_lips_in_bregman_divergence(L=L, gamma=gamma, n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 36x36 (PEPit) Setting up the problem: performance measure is minimum of 10 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 3 function(s) function 1 : Adding 132 scalar constraint(s) ... function 1 : 132 scalar constraint(s) added function 2 : Adding 132 scalar constraint(s) ... function 2 : 132 scalar constraint(s) added function 3 : Adding 121 scalar constraint(s) ... function 3 : 121 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.022279210584840024 *** Example file: worst-case performance of the NoLips_2 in Bregman divergence *** PEPit guarantee: min_t Dh(x_(t-1); x_t) <= 0.0222792 Dh(x_*; x_0) Theoretical guarantee: min_t Dh(x_(t-1); x_t) <= 0.0222222 Dh(x_*; x_0)
Three operator splitting
- PEPit.examples.composite_convex_minimization.wc_three_operator_splitting(mu1, L1, L3, alpha, theta, n, verbose=1)[source]
Consider the composite convex minimization problem,
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x) + f_3(x)\}\]where, \(f_1\) is \(L_1\)-smooth and \(\mu_1\)-strongly convex, \(f_2\) is closed, convex and proper, and \(f_3\) is \(L_3\)-smooth convex. Proximal operators are assumed to be available for \(f_1\) and \(f_2\).
This code computes a worst-case guarantee for the Three Operator Splitting (TOS). That is, it computes the smallest possible \(\tau(n, L_1, L_3, \mu_1)\) such that the guarantee
\[\|w^{(0)}_{n} - w^{(1)}_{n}\|^2 \leqslant \tau(n, L_1, L_3, \mu_1, \alpha, \theta) \|w^{(0)}_{0} - w^{(1)}_{0}\|^2\]is valid, where \(w^{(0)}_{0}\) and \(w^{(1)}_{0}\) are two different starting points and \(w^{(0)}_{n}\) and \(w^{(1)}_{n}\) are the two corresponding \(n^{\mathrm{th}}\) outputs of TOS (i.e., it measures how the iterates contract when the method is started from two different initial points).
In short, for given values of \(n\), \(L_1\), \(L_3\), \(\mu_1\), \(\alpha\) and \(\theta\), the contraction factor \(\tau(n, L_1, L_3, \mu_1, \alpha, \theta)\) is computed as the worst-case value of \(\|w^{(0)}_{n} - w^{(1)}_{n}\|^2\) when \(\|w^{(0)}_{0} - w^{(1)}_{0}\|^2 \leqslant 1\).
Algorithm: One iteration of the algorithm is described in [1]. For \(t \in \{0, \dots, n-1\}\),
\begin{eqnarray} x_t & = & \mathrm{prox}_{\alpha f_2}(w_t), \\ y_t & = & \mathrm{prox}_{\alpha f_1}(2 x_t - w_t - \alpha \nabla f_3(x_t)), \\ w_{t+1} & = & w_t + \theta (y_t - x_t). \end{eqnarray}References: The TOS was introduced in [1].
- Parameters
mu1 (float) – the strong convexity parameter of function \(f_1\).
L1 (float) – the smoothness parameter of function \(f_1\).
L3 (float) – the smoothness parameter of function \(f_3\).
alpha (float) – parameter of the scheme.
theta (float) – parameter of the scheme.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (None) – no theoretical value.
Example
>>> L3 = 1 >>> alpha = 1 / L3 >>> pepit_tau, theoretical_tau = wc_three_operator_splitting(mu1=0.1, L1=10, L3=L3, alpha=alpha, theta=1, n=4, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 26x26 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 3 function(s) function 1 : Adding 56 scalar constraint(s) ... function 1 : 56 scalar constraint(s) added function 2 : Adding 56 scalar constraint(s) ... function 2 : 56 scalar constraint(s) added function 3 : Adding 56 scalar constraint(s) ... function 3 : 56 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.47544137382115453 *** Example file: worst-case performance of the Three Operator Splitting in distance *** PEPit guarantee: ||w^2_n - w^1_n||^2 <= 0.475441 ||x0 - ws||^2
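The contraction setup is the same two-run pattern as in the Douglas-Rachford contraction example above, with an extra gradient step on \(f_3\) inside the update; the sketch below is again hedged on the assumed proximal_step helper.
from PEPit import PEP
from PEPit.functions import ConvexFunction, SmoothConvexFunction, SmoothStronglyConvexFunction
from PEPit.primitive_steps import proximal_step  # assumed helper, see the proximal gradient sketch

mu1, L1, L3, alpha, theta, n = 0.1, 10, 1, 1, 1, 4

problem = PEP()
f1 = problem.declare_function(SmoothStronglyConvexFunction, L=L1, mu=mu1)
f2 = problem.declare_function(ConvexFunction)
f3 = problem.declare_function(SmoothConvexFunction, L=L3)

w, wp = problem.set_initial_point(), problem.set_initial_point()
problem.set_initial_condition((w - wp) ** 2 <= 1)      # ||w^(0)_0 - w^(1)_0||^2 <= 1

def tos_step(v):
    x, _, _ = proximal_step(v, f2, alpha)
    y, _, _ = proximal_step(2 * x - v - alpha * f3.gradient(x), f1, alpha)
    return v + theta * (y - x)

for _ in range(n):
    w, wp = tos_step(w), tos_step(wp)

problem.set_performance_metric((w - wp) ** 2)          # ||w^(0)_n - w^(1)_n||^2
pepit_tau = problem.solve()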
Non-convex optimization
Gradient Descent
- PEPit.examples.nonconvex_optimization.wc_gradient_descent(L, gamma, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \tau(n, L, \gamma) (f(x_0) - f(x_n))\]is valid, where \(x_t\) are the iterates of the gradient method with fixed step-size.
Algorithm: Gradient descent is described as follows, for \(t \in \{ 0, \dots, n-1\}\),
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), an empirically tight theoretical worst-case guarantee is
\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \frac{4}{3}\frac{L}{n} (f(x_0) - f(x_n)),\]see discussions in [1, page 190] and [2].
References:
- Parameters
L (float) – the smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> L = 1 >>> gamma = 1 / L >>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=gamma, n=5, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 7x7 (PEPit) Setting up the problem: performance measure is minimum of 6 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 30 scalar constraint(s) ... function 1 : 30 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.2666769474847614 *** Example file: worst-case performance of gradient descent with fixed step-size *** PEPit guarantee: min_i ||f'(x_i)||^2 <= 0.266677 (f(x_0)-f_*) Theoretical guarantee: min_i ||f'(x_i)||^2 <= 0.266667 (f(x_0)-f_*)
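This example also illustrates two useful PEPit patterns: a constraint on function values (here \(f(x_0) - f(x_n) \leqslant 1\)) and a performance measure defined as the minimum over several quantities, obtained by calling set_performance_metric once per quantity (hence the "minimum of 6 element(s)" in the log above). The sketch below is an illustration; the SmoothFunction class name for \(L\)-smooth, possibly non-convex functions is an assumption.
from PEPit import PEP
from PEPit.functions import SmoothFunction  # assumed class name for L-smooth (possibly non-convex) functions

L, n = 1, 5
gamma = 1 / L

problem = PEP()
func = problem.declare_function(SmoothFunction, L=L)

x0 = problem.set_initial_point()

x, gradients = x0, []
for _ in range(n):
    g = func.gradient(x)
    gradients.append(g)
    x = x - gamma * g
gradients.append(func.gradient(x))                       # gradient at the last iterate x_n

problem.set_initial_condition(func(x0) - func(x) <= 1)   # f(x_0) - f(x_n) <= 1

for g in gradients:                                      # several metrics: PEPit keeps their minimum
    problem.set_performance_metric(g ** 2)

pepit_tau = problem.solve()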
No Lips 1
- PEPit.examples.nonconvex_optimization.wc_no_lips_1(L, gamma, n, verbose=1)[source]
Consider the constrained non-convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]where \(f_2\) is a closed convex indicator function and \(f_1\) is possibly non-convex and \(L\)-smooth relatively to \(h\), and where \(h\) is closed proper and convex.
This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[\min_{0 \leqslant t \leqslant n-1} D_h(x_{t+1}; x_t) \leqslant \tau(n, L, \gamma) (F(x_0) - F(x_n))\]is valid, where \(x_n\) is the output of the NoLips method, and where \(D_h\) is the Bregman distance generated by \(h\):
\[D_h(x; y) \triangleq h(x) - h(y) - \nabla h (y)^T(x - y).\]In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n-1}D_h(x_{t+1}; x_t)\) when \(F(x_0) - F(x_n) \leqslant 1\).
Algorithms: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [1, Section 3]. For \(t \in \{0, \dots, n-1\}\),
\[x_{t+1} = \arg\min_{u \in R^d} \nabla f(x_t)^T(u - x_t) + \frac{1}{\gamma} D_h(u; x_t).\]Theoretical guarantees: The tight theoretical upper bound is obtained in [1, Proposition 4.1]
\[\min_{0 \leqslant t \leqslant n-1} D_h(x_{t+1}; x_t) \leqslant \frac{\gamma}{n(1 - L\gamma)}(F(x_0) - F(x_n))\]References: The detailed setup and results are available in [1]. The PEP approach for studying such settings is presented in [2].
DISCLAIMER: This example requires some experience with PEPit and PEPs (see Section 4 in [2]).
- Parameters
L (float) – relative-smoothness parameter.
gamma (float) – step-size (equal to \(\frac{1}{2L}\) for guarantee).
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> L = 1 >>> gamma = 1 / (2 * L) >>> pepit_tau, theoretical_tau = wc_no_lips_1(L=L, gamma=gamma, n=5, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 20x20 (PEPit) Setting up the problem: performance measure is minimum of 5 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 3 function(s) function 1 : Adding 30 scalar constraint(s) ... function 1 : 30 scalar constraint(s) added function 2 : Adding 30 scalar constraint(s) ... function 2 : 30 scalar constraint(s) added function 3 : Adding 49 scalar constraint(s) ... function 3 : 49 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.20000306821054706 *** Example file: worst-case performance of the NoLips in Bregman divergence *** PEPit guarantee: min_t Dh(x_(t+1), x_(t)) <= 0.200003 (F(x_0) - F(x_n)) Theoretical guarantee : min_t Dh(x_(t+1), x_(t)) <= 0.2 (F(x_0) - F(x_n))
No Lips 2
- PEPit.examples.nonconvex_optimization.wc_no_lips_2(L, gamma, n, verbose=1)[source]
Consider the constrained composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x)+f_2(x) \}\]where \(f_2\) is a closed convex indicator function, \(f_1\) is possibly non-convex and \(L\)-smooth relative to \(h\), and \(h\) is closed, proper, and convex.
This code computes a worst-case guarantee for the NoLips method. That is, it computes the smallest possible \(\tau(n,L,\gamma)\) such that the guarantee
\[\min_{0 \leqslant t \leqslant n-1} D_h(x_t;x_{t+1}) \leqslant \tau(n, L, \gamma) (F(x_0) - F(x_n))\]is valid, where \(x_n\) is the output of the NoLips method, and where \(D_h\) is the Bregman distance generated by \(h\):
\[D_h(x; y) \triangleq h(x) - h(y) - \nabla h (y)^T(x - y).\]In short, for given values of \(n\), \(L\), and \(\gamma\), \(\tau(n, L, \gamma)\) is computed as the worst-case value of \(\min_{0 \leqslant t \leqslant n-1}D_h(x_t;x_{t+1})\) when \(F(x_0) - F(x_n) \leqslant 1\).
Algorithms: This method (also known as Bregman Gradient, or Mirror descent) can be found in, e.g., [1, Section 3]. For \(t \in \{0, \dots, n-1\}\),
\[x_{t+1} = \arg\min_{u \in R^d} \nabla f(x_t)^T(u - x_t) + \frac{1}{\gamma} D_h(u; x_t).\]Theoretical guarantees: An empirically tight worst-case guarantee is
\[\min_{0 \leqslant t \leqslant n-1}D_h(x_t;x_{t+1}) \leqslant \frac{\gamma}{n}(F(x_0) - F(x_n)).\]References: The detailed setup is presented in [1]. The PEP approach for studying such settings is presented in [2].
DISCLAIMER: This example requires some experience with PEPit and PEPs (see Section 4 in [2]).
- Parameters
L (float) – relative-smoothness parameter.
gamma (float) – step-size (equal to \(\frac{1}{2L}\) for guarantee).
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> L = 1 >>> gamma = 1 / L >>> pepit_tau, theoretical_tau = wc_no_lips_2(L=L, gamma=gamma, n=3, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 14x14 (PEPit) Setting up the problem: performance measure is minimum of 3 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 3 function(s) function 1 : Adding 12 scalar constraint(s) ... function 1 : 12 scalar constraint(s) added function 2 : Adding 12 scalar constraint(s) ... function 2 : 12 scalar constraint(s) added function 3 : Adding 25 scalar constraint(s) ... function 3 : 25 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.33333185324089176 *** Example file: worst-case performance of the NoLips_2 in Bregman distance *** PEPit guarantee: min_t Dh(x_(t-1), x_(t)) <= 0.333332 (F(x_0) - F(x_n)) Theoretical guarantee: min_t Dh(x_(t-1), x_(t)) <= 0.333333 (F(x_0) - F(x_n))
Stochastic and randomized convex minimization
Stochastic gradient descent
- PEPit.examples.stochastic_and_randomized_convex_minimization.wc_sgd(L, mu, gamma, v, R, n, verbose=1)[source]
Consider the finite sum minimization problem
\[F_\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]where \(f_1, ..., f_n\) are \(L\)-smooth and \(\mu\)-strongly convex. In addition, we assume a bounded variance at the optimal point (which is denoted by \(x_\star\)):
\[\mathbb{E}\left[\|\nabla f_i(x_\star)\|^2\right] = \frac{1}{n} \sum_{i=1}^n\|\nabla f_i(x_\star)\|^2 \leqslant v^2.\]This code computes a worst-case guarantee for one step of the stochastic gradient descent (SGD) in expectation, for the distance to an optimal point. That is, it computes the smallest possible \(\tau(L, \mu, \gamma, v, R, n)\) such that
\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \tau(L, \mu, \gamma, v, R, n)\]is valid whenever \(\|x_0 - x_\star\|^2 \leqslant R^2\), where \(v\) is the variance at \(x_\star\) and \(x_1\) is the output of one step of SGD (note that we use the notation \(x_0,x_1\) to denote two consecutive iterates for convenience; as the bound is valid for all \(x_0\), it is also valid for any pair of consecutive iterates of the algorithm).
Algorithm: One iteration of SGD is described by:
\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, n|]\right), \\ x_{t+1} & = & x_t - \gamma \nabla f_{i}(x_t), \end{eqnarray}\end{split}\]where \(\gamma\) is a step-size.
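For concreteness, one such step can be simulated in a few lines of NumPy on a toy quadratic finite sum (an illustrative instance, not the worst-case one constructed by the PEP):
import numpy as np

rng = np.random.default_rng(0)
n, d, L, mu = 5, 3, 1.0, 0.1
c = rng.uniform(mu, L, size=n)           # toy f_i(x) = c_i/2 ||x - b_i||^2: L-smooth, mu-strongly convex
b = rng.standard_normal((n, d))
grad_fi = lambda x, i: c[i] * (x - b[i])

gamma = 1 / L
x0 = rng.standard_normal(d)
i = rng.integers(n)                      # pick i uniformly at random
x1 = x0 - gamma * grad_fi(x0, i)         # one SGD step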
Theoretical guarantee: An empirically tight one-iteration guarantee is provided in the code of PESTO [1]:
\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \frac{1}{2}\left(1-\frac{\mu}{L}\right)^2 R^2 + \frac{1}{2}\left(1-\frac{\mu}{L}\right) R \sqrt{\left(1-\frac{\mu}{L}\right)^2 R^2 + 4\frac{v^2}{L^2}} + \frac{v^2}{L^2},\]when \(\gamma=\frac{1}{L}\). Note that we observe the guarantee does not depend on the number \(n\) of functions for this particular setting, thereby implying that the guarantees are also valid for expectation minimization settings (i.e., when \(n\) goes to infinity).
References: The empirically tight guarantee is provided in the code of [1]. Using SDPs for analyzing SGD-type methods was proposed in [2, 3].
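The closed-form bound above is easy to evaluate; with the parameters of the example below (\(\mu=0.1\), \(L=1\), \(v=1\), \(R=2\)) it gives approximately 5.04165, which matches the value returned by PEPit up to the solver accuracy:
from math import sqrt

L, mu, v, R = 1., .1, 1., 2.
kappa = 1 - mu / L
theoretical_tau = (kappa ** 2 * R ** 2 / 2
                   + kappa * R / 2 * sqrt(kappa ** 2 * R ** 2 + 4 * v ** 2 / L ** 2)
                   + v ** 2 / L ** 2)
print(theoretical_tau)   # approximately 5.04165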
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
gamma (float) – the step-size.
v (float) – the variance bound.
R (float) – the initial distance.
n (int) – number of functions.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> mu = 0.1 >>> L = 1 >>> gamma = 1 / L >>> pepit_tau, theoretical_tau = wc_sgd(L=L, mu=mu, gamma=gamma, v=1, R=2, n=5, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 11x11 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 5 function(s) function 1 : Adding 2 scalar constraint(s) ... function 1 : 2 scalar constraint(s) added function 2 : Adding 2 scalar constraint(s) ... function 2 : 2 scalar constraint(s) added function 3 : Adding 2 scalar constraint(s) ... function 3 : 2 scalar constraint(s) added function 4 : Adding 2 scalar constraint(s) ... function 4 : 2 scalar constraint(s) added function 5 : Adding 2 scalar constraint(s) ... function 5 : 2 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 5.041652328250217 *** Example file: worst-case performance of stochastic gradient descent with fixed step-size *** PEPit guarantee: E[||x_1 - x_*||^2] <= 5.04165 ||x0 - x_*||^2 Theoretical guarantee: E[||x_1 - x_*||^2] <= 5.04165 ||x0 - x_*||^2
Stochastic gradient descent in overparametrized setting
- PEPit.examples.stochastic_and_randomized_convex_minimization.wc_sgd_overparametrized(L, mu, gamma, n, verbose=1)[source]
Consider the finite sum minimization problem
\[F_\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]where \(f_1, ..., f_n\) are \(L\)-smooth and \(\mu\)-strongly convex. In addition, we assume a zero variance at the optimal point (which is denoted by \(x_\star\)):
\[\mathbb{E}\left[\|\nabla f_i(x_\star)\|^2\right] = \frac{1}{n} \sum_{i=1}^n \|\nabla f_i(x_\star)\|^2 = 0,\]which happens for example in machine learning in the interpolation regime, that is, when there exists a model \(x_\star\) whose loss on every observation \((z_i)_{i \in [|1, n|]}\) is zero: \(\mathcal{L}(x_\star, z_i) = f_i(x_\star) = 0\) for all \(i\).
This code computes a worst-case guarantee for one step of the stochastic gradient descent (SGD) in expectation, for the distance to an optimal point. That is, it computes the smallest possible \(\tau(L, \mu, \gamma, n)\) such that
\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \tau(L, \mu, \gamma, n) \|x_0 - x_\star\|^2\]is valid, where \(x_1\) is the output of one step of SGD.
Algorithm: One iteration of SGD is described by:
\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, n|]\right), \\ x_{t+1} & = & x_t - \gamma \nabla f_{i}(x_t), \end{eqnarray}\end{split}\]where \(\gamma\) is a step-size.
Theoretical guarantee: An empirically tight one-iteration guarantee is provided in the code of PESTO [1]:
\[\mathbb{E}\left[\|x_1 - x_\star\|^2\right] \leqslant \frac{1}{2}\left(1-\frac{\mu}{L}\right)^2 \|x_0-x_\star\|^2,\]when \(\gamma=\frac{1}{L}\). Note that we observe the guarantee does not depend on the number \(n\) of functions for this particular setting, thereby implying that the guarantees are also valid for expectation minimization settings (i.e., when \(n\) goes to infinity).
References: The empirically tight guarantee is provided in the code of [1]. Using SDPs for analyzing SGD-type methods was proposed in [2, 3].
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
gamma (float) – the step-size.
n (int) – number of functions.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> mu = 0.1 >>> L = 1 >>> gamma = 1 / L >>> pepit_tau, theoretical_tau = wc_sgd_overparametrized(L=L, mu=mu, gamma=gamma, n=5, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 11x11 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 5 function(s) function 1 : Adding 2 scalar constraint(s) ... function 1 : 2 scalar constraint(s) added function 2 : Adding 2 scalar constraint(s) ... function 2 : 2 scalar constraint(s) added function 3 : Adding 2 scalar constraint(s) ... function 3 : 2 scalar constraint(s) added function 4 : Adding 2 scalar constraint(s) ... function 4 : 2 scalar constraint(s) added function 5 : Adding 2 scalar constraint(s) ... function 5 : 2 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8099999999798264 *** Example file: worst-case performance of stochastic gradient descent with fixed step-size and with zero variance at the optimal point *** PEPit guarantee: E[||x_1 - x_*||^2] <= 0.81 ||x0 - x_*||^2 Theoretical guarantee: E[||x_1 - x_*||^2] <= 0.81 ||x0 - x_*||^2
SAGA
- PEPit.examples.stochastic_and_randomized_convex_minimization.wc_saga(L, mu, n, verbose=1)[source]
Consider the finite sum convex minimization problem
\[F_\star \triangleq \min_x \left\{F(x) \equiv h(x) + \frac{1}{n} \sum_{i=1}^{n} f_i(x)\right\},\]where the functions \(f_i\) are assumed to be \(L\)-smooth and \(\mu\)-strongly convex, and \(h\) is closed, proper, and convex with a proximal operator readily available.
This code computes the exact rate for a Lyapunov (or energy) function for SAGA [1]. That is, it computes the smallest possible \(\tau(n,L,\mu)\) such that this Lyapunov function decreases geometrically
\[\mathbb{E}[V^{(1)}] \leqslant \tau(n, L, \mu) V^{(0)},\]where the value of the Lyapunov function at iteration \(t\) is denoted by \(V^{(t)}\) and is defined as
\[V^{(t)} \triangleq \frac{1}{n} \sum_{i=1}^n \left(f_i(\phi_i^{(t)}) - f_i(x^\star) - \langle \nabla f_i(x^\star); \phi_i^{(t)} - x^\star\rangle\right) + \frac{1}{2 n \gamma (1-\mu \gamma)} \|x^{(t)} - x^\star\|^2,\]with \(\gamma = \frac{1}{2(\mu n+L)}\) (this Lyapunov function was proposed in [1, Theorem 1]). We consider the case \(t=0\) in the code below, without loss of generality.
In short, for given values of \(n\), \(L\), and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\mathbb{E}[V^{(1)}]\) when \(V^{(0)} \leqslant 1\).
Algorithm: One iteration of SAGA [1] is described as follows: at iteration \(t\), pick \(j\in\{1,\ldots,n\}\) uniformly at random and set:
\begin{eqnarray} \phi_j^{(t+1)} & = & x^{(t)} \\ w^{(t+1)} & = & x^{(t)} - \gamma \left[ \nabla f_j (\phi_j^{(t+1)}) - \nabla f_j(\phi_j^{(t)}) + \frac{1}{n} \sum_{i=1}^n\nabla f_i(\phi_i^{(t)})\right] \\ x^{(t+1)} & = & \mathrm{prox}_{\gamma h} (w^{(t+1)})\triangleq \arg\min_x \left\{ \gamma h(x)+\frac{1}{2}\|x-w^{(t+1)}\|^2\right\} \end{eqnarray}Theoretical guarantee: The following upper bound (empirically tight) can be found in [1, Theorem 1]:
\[\mathbb{E}[V^{(t+1)}] \leqslant \left(1-\gamma\mu \right)V^{(t)}\]References:
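Purely as an illustration of the update above (taking \(h=0\), so that the proximal step is the identity, and toy quadratic \(f_i\)), one SAGA iteration could be sketched as follows; this is a hypothetical instance, not the PEP construction:
import numpy as np

rng = np.random.default_rng(0)
n, d, L, mu = 5, 3, 1.0, 0.1
c = rng.uniform(mu, L, size=n)            # f_i(x) = c_i/2 ||x - b_i||^2 is L-smooth and mu-strongly convex
b = rng.standard_normal((n, d))
grad_fi = lambda x, i: c[i] * (x - b[i])

gamma = 1 / (2 * (mu * n + L))            # step-size used in the Lyapunov analysis of [1]
x = rng.standard_normal(d)
table = np.array([grad_fi(x, i) for i in range(n)])   # stores nabla f_i(phi_i^{(t)}), with phi_i^{(0)} = x^{(0)}

j = rng.integers(n)                       # pick j uniformly at random
g_new = grad_fi(x, j)                     # nabla f_j(phi_j^{(t+1)}), since phi_j^{(t+1)} = x^{(t)}
w = x - gamma * (g_new - table[j] + table.mean(axis=0))
table[j] = g_new                          # update the j-th memory slot
x = w                                     # h = 0 here, so prox_{gamma h} is the identity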
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of functions.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_saga(L=1, mu=.1, n=5, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 27x27 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 6 function(s) function 1 : Adding 30 scalar constraint(s) ... function 1 : 30 scalar constraint(s) added function 2 : Adding 6 scalar constraint(s) ... function 2 : 6 scalar constraint(s) added function 3 : Adding 6 scalar constraint(s) ... function 3 : 6 scalar constraint(s) added function 4 : Adding 6 scalar constraint(s) ... function 4 : 6 scalar constraint(s) added function 5 : Adding 6 scalar constraint(s) ... function 5 : 6 scalar constraint(s) added function 6 : Adding 6 scalar constraint(s) ... function 6 : 6 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9666748513396348 *** Example file: worst-case performance of SAGA for Lyapunov function V_t *** PEPit guarantee: V^(1) <= 0.966675 V^(0) Theoretical guarantee: V^(1) <= 0.966667 V^(0)
Point SAGA
- PEPit.examples.stochastic_and_randomized_convex_minimization.wc_point_saga(L, mu, n, verbose=1)[source]
Consider the finite sum minimization problem
\[F^\star \triangleq \min_x \left\{F(x) \equiv \frac{1}{n} \sum_{i=1}^n f_i(x)\right\},\]where \(f_1, \dots, f_n\) are \(L\)-smooth and \(\mu\)-strongly convex, with proximal operators readily available.
This code computes a tight (one-step) worst-case guarantee using a Lyapunov function for Point SAGA [1]. The Lyapunov (or energy) function at a point \(x\) is given in [1, Theorem 5]:
\[V(x) = \frac{1}{L \mu}\frac{1}{n} \sum_{i \leq n} \|\nabla f_i(x) - \nabla f_i(x_\star)\|^2 + \|x - x^\star\|^2,\]where \(x^\star\) denotes the minimizer of \(F\). The code computes the smallest possible \(\tau(n, L, \mu)\) such that the guarantee (in expectation):
\[\mathbb{E}\left[V\left(x^{(1)}\right)\right] \leqslant \tau(n, L, \mu) V\left(x^{(0)}\right),\]is valid (note that we use the notation \(x^{(0)},x^{(1)}\) to denote two consecutive iterates for convenience; as the bound is valid for all \(x^{(0)}\), it is also valid for any pair of consecutive iterates of the algorithm).
In short, for given values of \(n\), \(L\), and \(\mu\), \(\tau(n, L, \mu)\) is computed as the worst-case value of \(\mathbb{E}\left[V\left(x^{(1)}\right)\right]\) when \(V\left(x^{(0)}\right) \leqslant 1\).
Algorithm: Point SAGA is described by
\[\begin{split}\begin{eqnarray} \text{Set }\gamma & = & \frac{\sqrt{(n - 1)^2 + 4n\frac{L}{\mu}}}{2Ln} - \frac{\left(1 - \frac{1}{n}\right)}{2L} \\ \text{Pick random }j & \sim & \mathcal{U}\left([|1, n|]\right) \\ z^{(t)} & = & x_t + \gamma \left(g_j^{(t)} - \frac{1}{n} \sum_{i\leq n}g_i^{(t)} \right), \\ x^{(t+1)} & = & \mathrm{prox}_{\gamma f_j}(z^{(t)})\triangleq \arg\min_x\left\{ \gamma f_j(x)+\frac{1}{2} \|x-z^{(t)}\|^2 \right\}, \\ g_j^{(t+1)} & = & \frac{1}{\gamma}(z^{(t)} - x^{(t+1)}). \end{eqnarray}\end{split}\]Theoretical guarantee: A theoretical upper bound is given in [1, Theorem 5].
\[\mathbb{E}\left[V\left(x^{(t+1)}\right)\right] \leqslant \frac{1}{1 + \mu\gamma} V\left(x^{(t)}\right)\]References:
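As an illustration of the iteration above, the sketch below runs one Point SAGA step on toy quadratics \(f_i(x) = \frac{a_i}{2}\|x - b_i\|^2\) with \(\mu \leqslant a_i \leqslant L\), for which the proximal step has the closed form \(\mathrm{prox}_{\gamma f_j}(z) = (z + \gamma a_j b_j)/(1 + \gamma a_j)\); this hypothetical instance is not the worst case computed by PEPit:
import numpy as np

rng = np.random.default_rng(0)
n, d, L, mu = 10, 3, 1.0, 0.01
a = rng.uniform(mu, L, size=n)                 # f_i(x) = a_i/2 ||x - b_i||^2, with mu <= a_i <= L
b = rng.standard_normal((n, d))

gamma = (np.sqrt((n - 1) ** 2 + 4 * n * L / mu) / (2 * L * n)
         - (1 - 1 / n) / (2 * L))              # step-size from the algorithm description

x = rng.standard_normal(d)
g = np.array([a[i] * (x - b[i]) for i in range(n)])       # gradient table g_i^{(0)}

j = rng.integers(n)                            # pick j uniformly at random
z = x + gamma * (g[j] - g.mean(axis=0))
x_new = (z + gamma * a[j] * b[j]) / (1 + gamma * a[j])    # prox_{gamma f_j}(z), closed form here
g[j] = (z - x_new) / gamma                     # updated gradient estimate
x = x_new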
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of functions.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_point_saga(L=1, mu=.01, n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 31x31 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 10 function(s) function 1 : Adding 2 scalar constraint(s) ... function 1 : 2 scalar constraint(s) added function 2 : Adding 2 scalar constraint(s) ... function 2 : 2 scalar constraint(s) added function 3 : Adding 2 scalar constraint(s) ... function 3 : 2 scalar constraint(s) added function 4 : Adding 2 scalar constraint(s) ... function 4 : 2 scalar constraint(s) added function 5 : Adding 2 scalar constraint(s) ... function 5 : 2 scalar constraint(s) added function 6 : Adding 2 scalar constraint(s) ... function 6 : 2 scalar constraint(s) added function 7 : Adding 2 scalar constraint(s) ... function 7 : 2 scalar constraint(s) added function 8 : Adding 2 scalar constraint(s) ... function 8 : 2 scalar constraint(s) added function 9 : Adding 2 scalar constraint(s) ... function 9 : 2 scalar constraint(s) added function 10 : Adding 2 scalar constraint(s) ... function 10 : 2 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9714053941143999 *** Example file: worst-case performance of Point SAGA for a given Lyapunov function *** PEPit guarantee: E[V(x^(1))] <= 0.971405 V(x^(0)) Theoretical guarantee: E[V(x^(1))] <= 0.973292 V(x^(0))
Randomized coordinate descent for smooth strongly convex functions
- PEPit.examples.stochastic_and_randomized_convex_minimization.wc_randomized_coordinate_descent_smooth_strongly_convex(L, mu, gamma, d, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for randomized block-coordinate descent with step-size \(\gamma\). That is, it computes the smallest possible \(\tau(L, \mu, \gamma, d)\) such that the guarantee
\[\mathbb{E}_i[\|x_{t+1}^{(i)} - x_\star \|^2] \leqslant \tau(L, \mu, \gamma, d) \|x_{t} - x_\star\|^2\]is valid, where \(x_{t+1}^{(i)}\) denotes the value of the iterate \(x_{t+1}\) in the scenario where the \(i\) th block of coordinates is selected for the update with fixed step-size \(\gamma\), \(d\) is the number of blocks of coordinates, and where \(x_\star\) is a minimizer of \(f\).
In short, for given values of \(\mu\), \(L\), \(d\), and \(\gamma\), \(\tau(L, \mu, \gamma, d)\) is computed as the worst-case value of \(\mathbb{E}_i[\|x_{t+1}^{(i)} - x_\star \|^2]\) when \(\|x_t - x_\star\|^2 \leqslant 1\).
Algorithm: Randomized block-coordinate descent is described by
\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, d|]\right), \\ x_{t+1}^{(i)} & = & x_t - \gamma \nabla_i f(x_t), \end{eqnarray}\end{split}\]where \(\gamma\) is a step-size and \(\nabla_i f(x_t)\) is the partial derivative corresponding to the block \(i\).
Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Appendix I, Theorem 17]:
\[\mathbb{E}_i[\|x_{t+1}^{(i)} - x_\star \|^2] \leqslant \rho^2 \|x_t-x_\star\|^2,\]where \(\rho^2 = \max \left( \frac{(\gamma\mu - 1)^2 + d - 1}{d},\frac{(\gamma L - 1)^2 + d - 1}{d} \right)\).
References:
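The rate \(\rho^2\) above is straightforward to evaluate; the snippet below computes it for the parameters of the example (\(L=1\), \(\mu=0.1\), \(\gamma=2/(L+\mu)\), \(d=2\)) and performs one randomized block update on a toy quadratic, with blocks taken as single coordinates (an illustrative choice):
import numpy as np

L, mu, d = 1.0, 0.1, 2
gamma = 2 / (L + mu)

# Rate from [1, Appendix I, Theorem 17]
rho2 = max(((gamma * mu - 1) ** 2 + d - 1) / d,
           ((gamma * L - 1) ** 2 + d - 1) / d)
print(rho2)   # approximately 0.834711, matching the example below

# One randomized block update on a toy quadratic f(x) = 0.5 * x^T H x (blocks = single coordinates here).
rng = np.random.default_rng(0)
H = np.diag(np.linspace(mu, L, d))         # mu-strongly convex and L-smooth (illustrative)
x = rng.standard_normal(d)
i = rng.integers(d)                        # pick a block uniformly at random
x[i] -= gamma * (H @ x)[i]                 # update only the i-th block of coordinates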
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong-convexity parameter.
gamma (float) – the step-size.
d (int) – the dimension.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 1 >>> mu = 0.1 >>> gamma = 2 / (mu + L) >>> pepit_tau, theoretical_tau = wc_randomized_coordinate_descent_smooth_strongly_convex(L=L, mu=mu, gamma=gamma, d=2, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 4x4 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (3 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 2 scalar constraint(s) ... function 1 : 2 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8347107377149059 *** Example file: worst-case performance of randomized coordinate gradient descent *** PEPit guarantee: E||x_(n+1) - x_*||^2 <= 0.834711 ||x_n - x_*||^2 Theoretical guarantee: E||x_(n+1) - x_*||^2 <= 0.834711 ||x_n - x_*||^2
Randomized coordinate descent for smooth convex functions
- PEPit.examples.stochastic_and_randomized_convex_minimization.wc_randomized_coordinate_descent_smooth_convex(L, gamma, d, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex and \(L\)-smooth.
This code computes a worst-case guarantee for randomized block-coordinate descent with fixed step-size \(\gamma\). That is, it verifies that the inequality holds (the expectation is over the index of the block of coordinates that is randomly selected)
\[\mathbb{E}_i[\phi(x_{t+1}^{(i)})] \leqslant \phi(x_{t}),\]where \(x_{t+1}^{(i)}\) denotes the value of the iterate \(x_{t+1}\) in the scenario where the \(i\) th block of coordinates is selected for the update with fixed step-size \(\gamma\), and \(d\) is the number of blocks of coordinates.
In short, for given values of \(L\), \(d\), and \(\gamma\), it computes the worst-case value of \(\mathbb{E}_i[\phi(x_{t+1}^{(i)})]\) when \(\phi(x_{t}) \leqslant 1\).
Algorithm: Randomized block-coordinate descent is described by
\[\begin{split}\begin{eqnarray} \text{Pick random }i & \sim & \mathcal{U}\left([|1, d|]\right), \\ x_{t+1}^{(i)} & = & x_t - \gamma \nabla_i f(x_t), \end{eqnarray}\end{split}\]where \(\gamma\) is a step-size and \(\nabla_i f(x_t)\) is the partial derivative corresponding to the block \(i\).
Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), the tight theoretical guarantee can be found in [1, Appendix I, Theorem 16]:
\[\mathbb{E}_i[\phi(x^{(i)}_{t+1})] \leqslant \phi(x_{t}),\]where \(\phi(x_t) = d_t (f(x_t) - f_\star) + \frac{L}{2} \|x_t - x_\star\|^2\), \(d_{t+1} = d_t + \frac{\gamma L}{d}\), and \(d_t \geqslant 1\).
References:
- Parameters
L (float) – the smoothness parameter.
gamma (float) – the step-size.
d (int) – the dimension.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 1 >>> pepit_tau, theoretical_tau = wc_randomized_coordinate_descent_smooth_convex(L=L, gamma=1 / L, d=2, n=4, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 12x12 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (9 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 42 scalar constraint(s) ... function 1 : 42 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.9999978377393944 *** Example file: worst-case performance of randomized coordinate gradient descent *** PEPit guarantee: E[phi_(n+1)(x_(n+1))] <= 0.999998 phi_n(x_n) Theoretical guarantee: E[phi_(n+1)(x_(n+1))] <= 1.0 phi_n(x_n)
Monotone inclusions and variational inequalities
Proximal point
- PEPit.examples.monotone_inclusions_variational_inequalities.wc_proximal_point(alpha, n, verbose=1)[source]
Consider the monotone inclusion problem
\[\mathrm{Find}\, x:\, 0\in Ax,\]where \(A\) is maximally monotone. We denote by \(J_A = (I + A)^{-1}\) the resolvent of \(A\).
This code computes a worst-case guarantee for the proximal point method. That is, it computes the smallest possible \(\tau(n, \alpha)\) such that the guarantee
\[\|x_n - x_{n-1}\|^2 \leqslant \tau(n, \alpha) \|x_0 - x_\star\|^2,\]is valid, where \(x_\star\) is such that \(0 \in Ax_\star\).
Algorithm: The proximal point algorithm for monotone inclusions is described as follows, for \(t \in \{ 0, \dots, n-1\}\),
\[x_{t+1} = J_{\alpha A}(x_t),\]where \(\alpha\) is a step-size.
Theoretical guarantee: A tight theoretical guarantee can be found in [1, section 4].
\[\|x_n - x_{n-1}\|^2 \leqslant \frac{\left(1 - \frac{1}{n}\right)^{n - 1}}{n} \|x_0 - x_\star\|^2.\]Reference:
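As a toy illustration (not the worst-case operator identified by the PEP), one can take \(A\) skew-symmetric and linear, which is maximally monotone, so that the resolvent \(J_{\alpha A} = (I + \alpha A)^{-1}\) is a simple linear solve; the snippet also evaluates the tight bound above for the example parameters:
import numpy as np

alpha, n = 2.0, 10
A = np.array([[0., 1.], [-1., 0.]])        # skew-symmetric, hence maximally monotone (illustrative)
J = np.linalg.inv(np.eye(2) + alpha * A)   # resolvent J_{alpha A}

x = np.array([1.0, 0.0])                   # x_0, with x_star = 0 for this instance
for _ in range(n):
    x_prev, x = x, J @ x                   # x_{t+1} = J_{alpha A}(x_t)

print(np.sum((x - x_prev) ** 2))           # ||x_n - x_{n-1}||^2 for this instance
print((1 - 1 / n) ** (n - 1) / n)          # tight bound: approximately 0.038742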
- Parameters
alpha (float) – the step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_proximal_point(alpha=2, n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 12x12 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 110 scalar constraint(s) ... function 1 : 110 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.03874199421010509 *** Example file: worst-case performance of the Proximal Point Method*** PEPit guarantee: ||x(n) - x(n-1)||^2 <= 0.038742 ||x0 - xs||^2 Theoretical guarantee: ||x(n) - x(n-1)||^2 <= 0.038742 ||x0 - xs||^2
Accelerated proximal point
- PEPit.examples.monotone_inclusions_variational_inequalities.wc_accelerated_proximal_point(alpha, n, verbose=1)[source]
Consider the monotone inclusion problem
\[\mathrm{Find}\, x:\, 0\in Ax,\]where \(A\) is maximally monotone. We denote by \(J_A = (I + A)^{-1}\) the resolvent of \(A\).
This code computes a worst-case guarantee for the accelerated proximal point method proposed in [1]. That is, it computes the smallest possible \(\tau(n, \alpha)\) such that the guarantee
\[\|x_n - y_n\|^2 \leqslant \tau(n, \alpha) \|x_0 - x_\star\|^2,\]is valid, where \(x_\star\) is such that \(0 \in Ax_\star\).
Algorithm: Accelerated proximal point is described as follows, for \(t \in \{ 0, \dots, n-1\}\)
\[\begin{split}\begin{eqnarray} x_{t+1} & = & J_{\alpha A}(y_t), \\ y_{t+1} & = & x_{t+1} + \frac{t}{t+2}(x_{t+1} - x_{t}) - \frac{t}{t+1}(x_t - y_{t-1}), \end{eqnarray}\end{split}\]where \(x_0=y_0=y_{-1}\)
Theoretical guarantee: A tight theoretical worst-case guarantee can be found in [1, Theorem 4.1], for \(n \geqslant 1\),
\[\|x_n - y_{n-1}\|^2 \leqslant \frac{1}{n^2} \|x_0 - x_\star\|^2.\]Reference:
- Parameters
alpha (float) – the step-size
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_proximal_point(alpha=2, n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 12x12 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 110 scalar constraint(s) ... function 1 : 110 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.010000353550061647 *** Example file: worst-case performance of the Accelerated Proximal Point Method*** PEPit guarantee: ||x_n - y_n||^2 <= 0.0100004 ||x_0 - x_s||^2 Theoretical guarantee: ||x_n - y_n||^2 <= 0.01 ||x_0 - x_s||^2
Optimal Strongly-monotone Proximal Point
- PEPit.examples.monotone_inclusions_variational_inequalities.wc_optimal_strongly_monotone_proximal_point(n, mu, verbose=1)[source]
Consider the monotone inclusion problem
\[\mathrm{Find}\, x:\, 0\in Ax,\]where \(A\) is maximally \(\mu\)-strongly monotone. We denote by \(J_{A}\) the resolvent of \(A\).
For any \(x\) such that \(x = J_{A} y\) for some \(y\), define the resolvent residual \(\tilde{A}x = y - J_{A}y \in Ax\).
This code computes a worst-case guarantee for the Optimal Strongly-monotone Proximal Point Method (OS-PPM). That is, it computes the smallest possible \(\tau(n, \mu)\) such that the guarantee
\[\|\tilde{A}x_n\|^2 \leqslant \tau(n, \mu) \|x_0 - x_\star\|^2,\]is valid, where \(x_n\) is the output of the Optimal Strongly-monotone Proximal Point Method, and \(x_\star\) is a zero of \(A\). In short, for a given value of \(n, \mu\), \(\tau(n, \mu)\) is computed as the worst-case value of \(\|\tilde{A}x_n\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The Optimal Strongly-monotone Proximal Point Method can be written as
\begin{eqnarray} x_{t+1} & = & J_{A} y_t,\\ y_{t+1} & = & x_{t+1} + \frac{\varphi_{t} - 1}{\varphi_{t+1}} (x_{t+1} - x_t) - \frac{2 \mu \varphi_{t}}{\varphi_{t+1}} (y_t - x_{t+1}) \\ & & + \frac{(1+2\mu) \varphi_{t-1}}{\varphi_{t+1}} (y_{t-1} - x_t). \end{eqnarray}where \(\varphi_k = \sum_{i=0}^k (1+2\mu)^{2i}\) with \(\varphi_{-1}=0\) and \(x_0 = y_0 = y_{-1}\) is a starting point.
This method is equivalent to the Optimal Contractive Halpern iteration.
Theoretical guarantee: A tight worst-case guarantee for the Optimal Strongly-monotone Proximal Point Method can be found in [1, Theorem 3.2, Corollary 4.2]:
\[\|\tilde{A}x_n\|^2 \leqslant \left( \frac{1}{\sum_{k=0}^{N-1} (1+2\mu)^k} \right)^2 \|x_0 - x_\star\|^2.\]References: The detailed approach and tight bound are available in [1].
- Parameters
n (int) – number of iterations.
mu (float) – \(\mu \ge 0\). \(A\) will be maximally \(\mu\)-strongly monotone.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_optimal_strongly_monotone_proximal_point(n=10, mu=0.05, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 12x12 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 110 scalar constraint(s) ... function 1 : 110 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.003937868091430488 *** Example file: worst-case performance of Optimal Strongly-monotone Proximal Point Method *** PEPit guarantee: ||AxN||^2 <= 0.00393787 ||x0 - x_*||^2 Theoretical guarantee: ||AxN||^2 <= 0.00393698 ||x0 - x_*||^2
Douglas Rachford Splitting
- PEPit.examples.monotone_inclusions_variational_inequalities.wc_douglas_rachford_splitting(L, mu, alpha, theta, verbose=1)[source]
Consider the monotone inclusion problem
\[\mathrm{Find}\, x:\, 0\in Ax + Bx,\]where \(A\) is \(L\)-Lipschitz and maximally monotone and \(B\) is (maximally) \(\mu\)-strongly monotone. We denote by \(J_{\alpha A}\) and \(J_{\alpha B}\) the resolvents of \(A\) and \(B\), respectively, with step-size \(\alpha\).
This code computes a worst-case guarantee for the Douglas-Rachford splitting (DRS). That is, given two initial points \(w^{(0)}_t\) and \(w^{(1)}_t\), this code computes the smallest possible \(\tau(L, \mu, \alpha, \theta)\) (a.k.a. “contraction factor”) such that the guarantee
\[\|w^{(0)}_{t+1} - w^{(1)}_{t+1}\|^2 \leqslant \tau(L, \mu, \alpha, \theta) \|w^{(0)}_{t} - w^{(1)}_{t}\|^2,\]is valid, where \(w^{(0)}_{t+1}\) and \(w^{(1)}_{t+1}\) are obtained after one iteration of DRS from respectively \(w^{(0)}_{t}\) and \(w^{(1)}_{t}\).
In short, for given values of \(L\), \(\mu\), \(\alpha\) and \(\theta\), the contraction factor \(\tau(L, \mu, \alpha, \theta)\) is computed as the worst-case value of \(\|w^{(0)}_{t+1} - w^{(1)}_{t+1}\|^2\) when \(\|w^{(0)}_{t} - w^{(1)}_{t}\|^2 \leqslant 1\).
Algorithm: One iteration of the Douglas-Rachford splitting is described as follows, for \(t \in \{ 0, \dots, n-1\}\),
\begin{eqnarray} x_{t+1} & = & J_{\alpha B} (w_t),\\ y_{t+1} & = & J_{\alpha A} (2x_{t+1}-w_t),\\ w_{t+1} & = & w_t - \theta (x_{t+1}-y_{t+1}). \end{eqnarray}Theoretical guarantee: Theoretical worst-case guarantees can be found in [1, section 4, Theorem 4.3]. Since the results of [2] tighten that of [1], we compare with [2, Theorem 4.3] below. The theoretical results are complicated and we do not copy them here.
References: The detailed PEP methodology for studying operator splitting is provided in [2].
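For illustration only, one DRS iteration can be simulated with operators whose resolvents are available in closed form, e.g. \(A\) a skew-symmetric linear operator (monotone and \(L\)-Lipschitz) and \(B = \mu I\) (maximally \(\mu\)-strongly monotone), so that \(J_{\alpha B}(w) = w/(1+\alpha\mu)\) and \(J_{\alpha A} = (I+\alpha A)^{-1}\); the squared-distance ratio for this hypothetical instance is at most the contraction factor computed by PEPit:
import numpy as np

L, mu, alpha, theta = 1.0, 0.1, 1.3, 0.9
A = L * np.array([[0., 1.], [-1., 0.]])        # L-Lipschitz and maximally monotone (illustrative)
J_alpha_A = np.linalg.inv(np.eye(2) + alpha * A)
J_alpha_B = lambda w: w / (1 + alpha * mu)     # resolvent of B = mu * I

def drs_step(w):
    x = J_alpha_B(w)
    y = J_alpha_A @ (2 * x - w)
    return w - theta * (x - y)

rng = np.random.default_rng(0)
w0, w1 = rng.standard_normal(2), rng.standard_normal(2)
ratio = np.sum((drs_step(w0) - drs_step(w1)) ** 2) / np.sum((w0 - w1) ** 2)
print(ratio)   # at most the contraction factor reported by PEPit for these parameters (about 0.9288)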
- Parameters
L (float) – the Lipschitz parameter.
mu (float) – the strongly monotone parameter.
alpha (float) – the step-size in the resolvent.
theta (float) – algorithm parameter.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_douglas_rachford_splitting(L=1, mu=.1, alpha=1.3, theta=.9, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 6x6 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 4 scalar constraint(s) ... function 1 : 4 scalar constraint(s) added function 2 : Adding 2 scalar constraint(s) ... function 2 : 2 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.928770693164459 *** Example file: worst-case performance of the Douglas Rachford Splitting*** PEPit guarantee: ||w_(t+1)^0 - w_(t+1)^1||^2 <= 0.928771 ||w_(t)^0 - w_(t)^1||^2 Theoretical guarantee: ||w_(t+1)^0 - w_(t+1)^1||^2 <= 0.928771 ||w_(t)^0 - w_(t)^1||^2
Three operator splitting
- PEPit.examples.monotone_inclusions_variational_inequalities.wc_three_operator_splitting(L, mu, beta, alpha, theta, verbose=1)[source]
Consider the monotone inclusion problem
\[\mathrm{Find}\, x:\, 0\in Ax + Bx + Cx,\]where \(A\) is maximally monotone, \(B\) is \(\beta\)-cocoercive and \(C\) is the gradient of some \(L\)-smooth \(\mu\)-strongly convex function. We denote by \(J_{\alpha A}\) and \(J_{\alpha B}\) the resolvents of \(A\) and \(B\), respectively, with step-size \(\alpha\).
This code computes a worst-case guarantee for the three operator splitting (TOS). That is, given two initial points \(w^{(0)}_t\) and \(w^{(1)}_t\), this code computes the smallest possible \(\tau(L, \mu, \beta, \alpha, \theta)\) (a.k.a. “contraction factor”) such that the guarantee
\[\|w^{(0)}_{t+1} - w^{(1)}_{t+1}\|^2 \leqslant \tau(L, \mu, \beta, \alpha, \theta) \|w^{(0)}_{t} - w^{(1)}_{t}\|^2,\]is valid, where \(w^{(0)}_{t+1}\) and \(w^{(1)}_{t+1}\) are obtained after one iteration of TOS from respectively \(w^{(0)}_{t}\) and \(w^{(1)}_{t}\).
In short, for given values of \(L\), \(\mu\), \(\beta\), \(\alpha\) and \(\theta\), the contraction factor \(\tau(L, \mu, \beta, \alpha, \theta)\) is computed as the worst-case value of \(\|w^{(0)}_{t+1} - w^{(1)}_{t+1}\|^2\) when \(\|w^{(0)}_{t} - w^{(1)}_{t}\|^2 \leqslant 1\).
Algorithm: One iteration of the algorithm is described in [1]. For \(t \in \{ 0, \dots, n-1\}\),
\begin{eqnarray} x_{t+1} & = & J_{\alpha B} (w_t),\\ y_{t+1} & = & J_{\alpha A} (2x_{t+1} - w_t - C x_{t+1}),\\ w_{t+1} & = & w_t - \theta (x_{t+1} - y_{t+1}). \end{eqnarray}References: The TOS was proposed in [1], the analysis of such operator splitting methods using PEPs was proposed in [2].
- Parameters
L (float) – smoothness constant of C.
mu (float) – strong convexity of C.
beta (float) – cocoercivity of B.
alpha (float) – step-size (in the resolvents).
theta (float) – overrelaxation parameter.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (None) – no theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_three_operator_splitting(L=1, mu=.1, beta=1, alpha=.9, theta=1.3, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 8x8 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 3 function(s) function 1 : Adding 2 scalar constraint(s) ... function 1 : 2 scalar constraint(s) added function 2 : Adding 2 scalar constraint(s) ... function 2 : 2 scalar constraint(s) added function 3 : Adding 2 scalar constraint(s) ... function 3 : 2 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.7796889999218343 *** Example file: worst-case contraction factor of the Three Operator Splitting *** PEPit guarantee: ||w_(t+1)^0 - w_(t+1)^1||^2 <= 0.779689 ||w_(t)^0 - w_(t)^1||^2
Optimistic gradient
- PEPit.examples.monotone_inclusions_variational_inequalities.wc_optimistic_gradient(n, gamma, L, verbose=1)[source]
Consider the monotone variational inequality
\[\mathrm{Find}\, x_\star \in C\text{ such that } \left<F(x_\star);x-x_\star\right> \geqslant 0\,\,\forall x\in C,\]where \(C\) is a closed convex set and \(F\) is maximally monotone and Lipschitz.
This code computes a worst-case guarantee for the optimistic gradient method. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee
\[\|\tilde{x}_n - \tilde{x}_{n-1}\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2,\]is valid, where \(\tilde{x}_n\) is the output of the optimistic gradient method and \(x_0\) its starting point.
Algorithm: The optimistic gradient method is described as follows, for \(t \in \{ 0, \dots, n-1\}\),
\begin{eqnarray} \tilde{x}_{t} & = & \mathrm{Proj}_{C} [x_t-\gamma F(\tilde{x}_{t-1})], \\ {x}_{t+1} & = & \tilde{x}_t + \gamma (F(\tilde{x}_{t-1}) - F(\tilde{x}_t)). \end{eqnarray}where \(\gamma\) is some step-size.
Theoretical guarantee: The method and many variants of it are discussed in [1] and a PEP formulation suggesting a worst-case guarantee in \(O(1/n)\) can be found in [2, Appendix D].
References:
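For illustration only, the iteration can be simulated with \(C = \mathbb{R}^d\) (so that the projection is the identity) and \(F\) a skew-symmetric linear operator, which is monotone and Lipschitz; this toy instance is not the worst case identified by the PEP:
import numpy as np

n, gamma, L = 5, 1 / 4, 1.0
F = lambda x: L * np.array([x[1], -x[0]])    # skew-symmetric: monotone and L-Lipschitz (illustrative)
proj_C = lambda x: x                         # C = R^d, so the projection is the identity

x = np.array([1.0, 0.0])                     # x_0 (the solution of this toy instance is 0)
x_tilde_prev = x.copy()                      # convention for the first F(tilde{x}_{t-1})
tildes = []
for t in range(n + 1):
    x_tilde = proj_C(x - gamma * F(x_tilde_prev))
    x = x_tilde + gamma * (F(x_tilde_prev) - F(x_tilde))
    x_tilde_prev = x_tilde
    tildes.append(x_tilde)

print(np.sum((tildes[-1] - tildes[-2]) ** 2))   # squared distance between the last two extrapolated points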
- Parameters
n (int) – number of iterations.
gamma (float) – the step-size.
L (float) – the Lipschitz constant.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (None) – no theoretical bound.
Example
>>> pepit_tau, theoretical_tau = wc_optimistic_gradient(n=5, gamma=1 / 4, L=1, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 15x15 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 49 scalar constraint(s) ... function 1 : 49 scalar constraint(s) added function 2 : Adding 84 scalar constraint(s) ... function 2 : 84 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.06631469189357277 *** Example file: worst-case performance of the Optimistic Gradient Method*** PEPit guarantee: ||x(n) - x(n-1)||^2 <= 0.0663147 ||x0 - xs||^2
Past extragradient
- PEPit.examples.monotone_inclusions_variational_inequalities.wc_past_extragradient(n, gamma, L, verbose=1)[source]
Consider the monotone variational inequality
\[\mathrm{Find}\, x_\star \in C\text{ such that } \left<F(x_\star);x-x_\star\right> \geqslant 0\,\,\forall x\in C,\]where \(C\) is a closed convex set and \(F\) is maximally monotone and Lipschitz.
This code computes a worst-case guarantee for the past extragradient method. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee
\[\|x_n - x_{n-1}\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2,\]is valid, where \(x_n\) is the output of the past extragradient method and \(x_0\) its starting point.
Algorithm: The past extragradient method is described as follows, for \(t \in \{ 0, \dots, n-1\}\),
\begin{eqnarray} \tilde{x}_{t} & = & \mathrm{Proj}_{C} [x_t-\gamma F(\tilde{x}_{t-1})], \\ {x}_{t+1} & = & \mathrm{Proj}_{C} [x_t-\gamma F(\tilde{x}_{t})]. \end{eqnarray}where \(\gamma\) is some step-size.
Theoretical guarantee: The method and many variants of it are discussed in [1]. A worst-case guarantee in \(O(1/n)\) can be found in [2, 3].
References:
- Parameters
n (int) – number of iterations.
gamma (float) – the step-size.
L (float) – the Lipschitz constant.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (None) – no theoretical bound.
Example
>>> pepit_tau, theoretical_tau = wc_past_extragradient(n=5, gamma=1 / 4, L=1, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 20x20 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 144 scalar constraint(s) ... function 1 : 144 scalar constraint(s) added function 2 : Adding 84 scalar constraint(s) ... function 2 : 84 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.06026126041500441 *** Example file: worst-case performance of the Past Extragradient Method*** PEPit guarantee: ||x(n) - x(n-1)||^2 <= 0.0602613 ||x0 - xs||^2
Fixed point
Halpern iteration
- PEPit.examples.fixed_point_problems.wc_halpern_iteration(n, verbose=1)[source]
Consider the fixed point problem
\[\mathrm{Find}\, x:\, x = Ax,\]where \(A\) is a non-expansive operator, that is, an \(L\)-Lipschitz operator with \(L=1\).
This code computes a worst-case guarantee for the Halpern Iteration. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee
\[\|x_n - Ax_n\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the Halpern iteration, and \(x_\star\) the fixed point of \(A\).
In short, for a given value of \(n\), \(\tau(n)\) is computed as the worst-case value of \(\|x_n - Ax_n\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The Halpern iteration can be written as
\[x_{t+1} = \frac{1}{t + 2} x_0 + \left(1 - \frac{1}{t + 2}\right) Ax_t.\]Theoretical guarantee: A tight worst-case guarantee for Halpern iteration can be found in [1, Theorem 2.1]:
\[\|x_n - Ax_n\|^2 \leqslant \left(\frac{2}{n+1}\right)^2 \|x_0 - x_\star\|^2.\]References: The detailed approach and tight bound are available in [1].
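A toy run of the iteration (with \(A\) a rotation, which is non-expansive with fixed point \(0\); this is not the worst-case operator) together with an evaluation of the tight bound above:
import numpy as np

n, theta = 25, 0.5
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # a rotation: non-expansive, fixed point 0 (illustrative)

x0 = np.array([1.0, 0.0])
x = x0.copy()
for t in range(n):
    x = x0 / (t + 2) + (1 - 1 / (t + 2)) * (A @ x)   # Halpern iteration

print(np.sum((x - A @ x) ** 2))   # residual ||x_n - A x_n||^2 for this instance
print((2 / (n + 1)) ** 2)         # tight bound: approximately 0.00591716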
- Parameters
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_halpern_iteration(n=25, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 28x28 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 702 scalar constraint(s) ... function 1 : 702 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.005933984368783424 *** Example file: worst-case performance of Halpern Iterations *** PEPit guarantee: ||xN - AxN||^2 <= 0.00593398 ||x0 - x_*||^2 Theoretical guarantee: ||xN - AxN||^2 <= 0.00591716 ||x0 - x_*||^2
Optimal Contractive Halpern iteration
- PEPit.examples.fixed_point_problems.wc_optimal_contractive_halpern_iteration(n, gamma, verbose=1)[source]
Consider the fixed point problem
\[\mathrm{Find}\, x:\, x = Ax,\]where \(A\) is a \(1/\gamma\)-contractive operator, i.e., an \(L\)-Lipschitz operator with \(L=1/\gamma\).
This code computes a worst-case guarantee for the Optimal Contractive Halpern Iteration. That is, it computes the smallest possible \(\tau(n, \gamma)\) such that the guarantee
\[\|x_n - Ax_n\|^2 \leqslant \tau(n, \gamma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the Optimal Contractive Halpern iteration, and \(x_\star\) is the fixed point of \(A\). In short, for a given value of \(n, \gamma\), \(\tau(n, \gamma)\) is computed as the worst-case value of \(\|x_n - Ax_n\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: The Optimal Contractive Halpern iteration can be written as
\[x_{t+1} = \left(1 - \frac{1}{\varphi_{t+1}} \right) Ax_t + \frac{1}{\varphi_{t+1}} x_0.\]where \(\varphi_k = \sum_{i=0}^k \gamma^{2i}\) and \(x_0\) is a starting point.
Theoretical guarantee: A tight worst-case guarantee for the Optimal Contractive Halpern iteration can be found in [1, Corollary 3.3, Theorem 4.1]:
\[\|x_n - Ax_n\|^2 \leqslant \left(1 + \frac{1}{\gamma}\right)^2 \left( \frac{1}{\sum_{k=0}^n \gamma^k} \right)^2 \|x_0 - x_\star\|^2.\]References: The detailed approach and tight bound are available in [1].
- Parameters
n (int) – number of iterations.
gamma (float) – \(\gamma \ge 1\). \(A\) will be \(1/\gamma\)-contractive.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_optimal_contractive_halpern_iteration(n=10, gamma=1.1, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 13x13 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 132 scalar constraint(s) ... function 1 : 132 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.010613882599724987 *** Example file: worst-case performance of Optimal Contractive Halpern Iterations *** PEPit guarantee: ||xN - AxN||^2 <= 0.0106139 ||x0 - x_*||^2 Theoretical guarantee: ||xN - AxN||^2 <= 0.0106132 ||x0 - x_*||^2
Krasnoselskii-Mann with constant step-sizes
- PEPit.examples.fixed_point_problems.wc_krasnoselskii_mann_constant_step_sizes(n, gamma, verbose=1)[source]
Consider the fixed point problem
\[\mathrm{Find}\, x:\, x = Ax,\]where \(A\) is a non-expansive operator, that is, an \(L\)-Lipschitz operator with \(L=1\).
This code computes a worst-case guarantee for the Krasnoselskii-Mann (KM) method with constant step-size. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee
\[\frac{1}{4}\|x_n - Ax_n\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the KM method, and \(x_\star\) is some fixed point of \(A\) (i.e., \(x_\star=Ax_\star\)).
Algorithm: The constant step-size KM method is described by
\[x_{t+1} = \left(1 - \gamma\right) x_{t} + \gamma Ax_{t}.\]Theoretical guarantee: A theoretical upper bound is provided by [1, Theorem 4.9]
\[\begin{split}\tau(n) = \left\{ \begin{eqnarray} \frac{1}{n+1}\left(\frac{n}{n+1}\right)^n \frac{1}{4 \gamma (1 - \gamma)}\quad & \text{if } \frac{1}{2}\leqslant \gamma \leqslant \frac{1}{2}\left(1+\sqrt{\frac{n}{n+1}}\right) \\ (\gamma - 1)^{2n} \quad & \text{if } \frac{1}{2}\left(1+\sqrt{\frac{n}{n+1}}\right) < \gamma \leqslant 1. \end{eqnarray} \right.\end{split}\]Reference:
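The two-branch bound above is easy to evaluate; for the parameters of the example below (\(n=3\), \(\gamma=3/4\)) the first branch applies and gives \(0.140625\), which matches the PEPit value up to solver accuracy:
from math import sqrt

n, gamma = 3, 3 / 4                       # gamma is assumed to lie in [1/2, 1]
threshold = (1 + sqrt(n / (n + 1))) / 2
if gamma <= threshold:
    tau = 1 / (n + 1) * (n / (n + 1)) ** n / (4 * gamma * (1 - gamma))
else:
    tau = (gamma - 1) ** (2 * n)
print(tau)   # 0.140625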
- Parameters
n (int) – number of iterations.
gamma (float) – step-size between 1/2 and 1
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_krasnoselskii_mann_constant_step_sizes(n=3, gamma=3 / 4, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 6x6 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 20 scalar constraint(s) ... function 1 : 20 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.14062586461718285 *** Example file: worst-case performance of Kranoselskii-Mann iterations *** PEPit guarantee: 1/4||xN - AxN||^2 <= 0.140626 ||x0 - x_*||^2 Theoretical guarantee: 1/4||xN - AxN||^2 <= 0.140625 ||x0 - x_*||^2
Krasnoselskii-Mann with increasing step-sizes
- PEPit.examples.fixed_point_problems.wc_krasnoselskii_mann_increasing_step_sizes(n, verbose=1)[source]
Consider the fixed point problem
\[\mathrm{Find}\, x:\, x = Ax,\]where \(A\) is a non-expansive operator, that is, an \(L\)-Lipschitz operator with \(L=1\).
This code computes a worst-case guarantee for the Krasnoselskii-Mann method. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee
\[\frac{1}{4}\|x_n - Ax_n\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the KM method, and \(x_\star\) is some fixed point of \(A\) (i.e., \(x_\star=Ax_\star\)).
Algorithm: The KM method is described by
\[x_{t+1} = \frac{1}{t + 2} x_{t} + \left(1 - \frac{1}{t + 2}\right) Ax_{t}.\]Reference: This scheme was first studied using PEPs in [1].
- Parameters
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (None) – no theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_krasnoselskii_mann_increasing_step_sizes(n=3, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 6x6 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 20 scalar constraint(s) ... function 1 : 20 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.11963406474148795 *** Example file: worst-case performance of Kranoselskii-Mann iterations *** PEPit guarantee: 1/4 ||xN - AxN||^2 <= 0.119634 ||x0 - x_*||^2
Potential functions
Gradient descent Lyapunov 1
- PEPit.examples.potential_functions.wc_gradient_descent_lyapunov_1(L, gamma, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code verifies a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it verifies that the Lyapunov (or potential/energy) function
\[V_n \triangleq n (f(x_n) - f_\star) + \frac{L}{2} \|x_n - x_\star\|^2\]is decreasing along all trajectories and for all smooth convex functions \(f\) (i.e., in the worst-case):
\[V_{n+1} \leqslant V_n,\]where \(x_{n+1}\) is obtained from a gradient step from \(x_{n}\) with fixed step-size \(\gamma=\frac{1}{L}\).
Algorithm: One iteration of gradient descent is described by
\[x_{n+1} = x_n - \gamma \nabla f(x_n),\]where \(\gamma\) is a step-size.
Theoretical guarantee: The theoretical guarantee can be found in e.g., [1, Theorem 3.3]:
\[V_{n+1} - V_n \leqslant 0,\]when \(\gamma=\frac{1}{L}\).
References: The detailed potential function can be found in [1] and the SDP approach can be found in [2].
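Using only basic PEPit primitives (PEP, declare_function, set_initial_point, stationary_point, gradient, set_performance_metric, solve), such a potential verification can be set up along the following lines; this is a minimal sketch, and the actual example file may organize the computation differently:
from PEPit import PEP
from PEPit.functions import SmoothConvexFunction

L, gamma, n = 1, 1, 10            # gamma = 1 / L

problem = PEP()
func = problem.declare_function(SmoothConvexFunction, L=L)
xs = func.stationary_point()      # x_star
fs = func(xs)

xn = problem.set_initial_point()  # x_n, an arbitrary point
fn = func(xn)
xnp1 = xn - gamma * func.gradient(xn)   # one gradient step

# Potential V_k = k (f(x_k) - f_star) + L/2 ||x_k - x_star||^2
Vn = n * (fn - fs) + L / 2 * (xn - xs) ** 2
Vnp1 = (n + 1) * (func(xnp1) - fs) + L / 2 * (xnp1 - xs) ** 2

# Worst-case value of V_{n+1} - V_n (expected to be at most 0)
problem.set_performance_metric(Vnp1 - Vn)
pepit_tau = problem.solve()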
- Parameters
L (float) – the smoothness parameter.
gamma (float) – the step-size.
n (int) – current iteration number.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Examples
>>> L = 1 >>> pepit_tau, theoretical_tau = wc_gradient_descent_lyapunov_1(L=L, gamma=1 / L, n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 4x4 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 6 scalar constraint(s) ... function 1 : 6 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 3.3902995517363515e-18 *** Example file: worst-case performance of gradient descent with fixed step-size for a given Lyapunov function*** PEPit guarantee: V_(n+1) - V_(n) <= 3.3903e-18 Theoretical guarantee: V_(n+1) - V_(n) <= 0.0
Gradient descent Lyapunov 2
- PEPit.examples.potential_functions.wc_gradient_descent_lyapunov_2(L, gamma, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code verifies a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it verifies that the Lyapunov (or potential/energy) function
\[V_n \triangleq (2n + 1) L \left(f(x_n) - f_\star\right) + n(n+2) \|\nabla f(x_n)\|^2 + L^2 \|x_n - x_\star\|^2\]is decreasing along all trajectories and for all smooth convex functions \(f\) (i.e., in the worst-case):
\[V_{n+1} \leqslant V_n,\]where \(x_{n+1}\) is obtained from a gradient step from \(x_{n}\) with fixed step-size \(\gamma=\frac{1}{L}\).
Algorithm: One iteration of gradient descent is described by
\[x_{n+1} = x_n - \gamma \nabla f(x_n),\]where \(\gamma\) is a step-size.
Theoretical guarantee: The theoretical guarantee can be found in [1, Theorem 3]:
\[V_{n+1} - V_n \leqslant 0,\]when \(\gamma=\frac{1}{L}\).
References: The detailed potential function and SDP approach can be found in [1].
- Parameters
L (float) – the smoothness parameter.
gamma (float) – the step-size.
n (int) – current iteration number.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Examples
>>> L = 1 >>> pepit_tau, theoretical_tau = wc_gradient_descent_lyapunov_2(L=L, gamma=1 / L, n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 4x4 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 6 scalar constraint(s) ... function 1 : 6 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 1.894425729310791e-17 *** Example file: worst-case performance of gradient descent with fixed step size for a given Lyapunov function*** PEPit guarantee: V_(n+1) - V_(n) <= 1.89443e-17 Theoretical guarantee: V_(n+1) - V_(n) <= 0.0
Accelerated gradient method
- PEPit.examples.potential_functions.wc_accelerated_gradient_method(L, gamma, lam, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code verifies a worst-case guarantee for an accelerated gradient method. That is, it verifies that the Lyapunov (or potential/energy) function
\[V_n \triangleq \lambda_n^2 (f(x_n) - f_\star) + \frac{L}{2} \|z_n - x_\star\|^2\]is decreasing along all trajectories and for all smooth convex functions \(f\) (i.e., in the worst-case):
\[V_{n+1} \leqslant V_n,\]where \(x_{n+1}\), \(z_{n+1}\), and \(\lambda_{n+1}\) are obtained from one iteration of the accelerated gradient method below, from some arbitrary \(x_{n}\), \(z_{n}\), and \(\lambda_{n}\).
Algorithm: One iteration of accelerated gradient method is described by
\[\begin{split}\begin{eqnarray} \text{Set: }\lambda_{n+1} & = & \frac{1}{2} \left(1 + \sqrt{4\lambda_n^2 + 1}\right), \tau_n & = & \frac{1}{\lambda_{n+1}}, \text{ and } \eta_n & = & \frac{\lambda_{n+1}^2 - \lambda_{n}^2}{L} \\ y_n & = & (1 - \tau_n) x_n + \tau_n z_n,\\ z_{n+1} & = & z_n - \eta_n \nabla f(y_n), \\ x_{n+1} & = & y_n - \gamma \nabla f(y_n). \end{eqnarray}\end{split}\]Theoretical guarantee: The following worst-case guarantee can be found in e.g., [2, Theorem 5.3]:
\[V_{n+1} - V_n \leqslant 0,\]when \(\gamma=\frac{1}{L}\).
References: The potential can be found in the historical [1]; and in more recent works, e.g., [2, 3].
[1] Y. Nesterov (1983). A method for solving the convex programming problem with convergence rate \(O(1/k^2)\). Dokl. Akad. Nauk SSSR (Vol. 269, pp. 543-547). http://www.mathnet.ru/links/9bcb158ed2df3d8db3532aafd551967d/dan46009.pdf
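For readers who want to run the recursion above outside of PEPit, here is a minimal NumPy sketch of one iteration; the gradient oracle and the toy quadratic used to exercise it are illustrative choices, not part of the example file.
import numpy as np

def accelerated_step(x, z, lam, grad, L, gamma):
    # One iteration of the accelerated gradient method described above.
    lam_next = 0.5 * (1 + np.sqrt(4 * lam ** 2 + 1))
    tau = 1 / lam_next
    eta = (lam_next ** 2 - lam ** 2) / L
    y = (1 - tau) * x + tau * z
    g = grad(y)
    return y - gamma * g, z - eta * g, lam_next   # x_{n+1}, z_{n+1}, lambda_{n+1}

# Toy usage on a smooth convex quadratic.
H = np.diag([1.0, 0.1])
grad = lambda v: H @ v
x = z = np.array([3.0, -1.0])
lam = 10.0
for _ in range(30):
    x, z, lam = accelerated_step(x, z, lam, grad, L=1.0, gamma=1.0)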
- Parameters
L (float) – the smoothness parameter.
gamma (float) – the step-size.
lam (float) – the initial value for sequence \((\lambda_t)_t\).
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Examples
>>> L = 1 >>> pepit_tau, theoretical_tau = wc_accelerated_gradient_method(L=L, gamma=1 / L, lam=10., verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 6x6 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 12 scalar constraint(s) ... function 1 : 12 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 5.264872499157039e-14 *** Example file: worst-case performance of accelerated gradient method for a given Lyapunov function*** PEPit guarantee: V_(n+1) - V_n <= 5.26487e-14 Theoretical guarantee: V_(n+1) - V_n <= 0.0
Inexact proximal methods
Accelerated inexact forward backward
- PEPit.examples.inexact_proximal_methods.wc_accelerated_inexact_forward_backward(L, zeta, n, verbose=1)[source]
Consider the composite convex minimization problem,
\[F_\star \triangleq \min_x \left\{F(x) \equiv f(x) + g(x) \right\},\]where \(f\) is \(L\)-smooth convex, and \(g\) is closed, proper, and convex. We further assume that one can readily evaluate the gradient of \(f\) and that one has access to an inexact version of the proximal operator of \(g\) (whose level of accuracy is controlled by some parameter \(\zeta\in (0,1)\)).
This code computes a worst-case guarantee for an accelerated inexact forward backward (AIFB) method (a.k.a., inexact accelerated proximal gradient method). That is, it computes the smallest possible \(\tau(n, L, \zeta)\) such that the guarantee
\[F(x_n) - F(x_\star) \leqslant \tau(n, L, \zeta) \|x_0 - x_\star\|^2,\]is valid, where \(x_n\) is the output of the AIFB, and where \(x_\star\) is a minimizer of \(F\).
In short, for given values of \(n\), \(L\) and \(\zeta\), \(\tau(n, L, \zeta)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(\|x_0 - x_\star\|^2 \leqslant 1\).
Algorithm: Let \(t\in\{0,1,\ldots,n\}\). The method is presented in, e.g., [1, Algorithm 3.1]. For simplicity, we instantiate [1, Algorithm 3.1] using simple values for its parameters and for the problem setting (in the notation of [1]: \(A_0\triangleq 0\), \(\mu=0\), \(\xi_t \triangleq 0\), \(\sigma_t\triangleq 0\), \(\lambda_t \triangleq\gamma\triangleq\tfrac{1}{L}\), \(\zeta_t\triangleq\zeta\), \(\eta \triangleq (1-\zeta^2) \gamma\)), and without backtracking, arriving at:
\begin{eqnarray} A_{t+1} && = A_t + \frac{\eta+\sqrt{\eta^2+4\eta A_t}}{2},\\ y_{t} && = x_t + \frac{A_{t+1}-A_t}{A_{t+1}} (z_t-x_t),\\ (x_{t+1},v_{t+1}) && \approx_{\varepsilon_t} \left(\mathrm{prox}_{\gamma g}\left(y_t-\gamma \nabla f(y_t)\right),\, \mathrm{prox}_{ g^*/\gamma}\left(\frac{y_t-\gamma \nabla f(y_t)}{\gamma}\right)\right),\\ && \text{with } \varepsilon_t = \frac{\zeta^2\gamma^2}{2}\|v_{t+1}+\nabla f(y_t) \|^2,\\ z_{t+1} && = z_t-(A_{t+1}-A_t)\left(v_{t+1}+\nabla f(y_t)\right),\\ \end{eqnarray}where \(\{\varepsilon_t\}_{t\geqslant 0}\) is some sequence of accuracy parameters (whose values are fixed within the algorithm as it runs), and \(\{A_t\}_{t\geqslant 0}\) is some scalar sequence of parameters for the method (typical of accelerated methods).
The line with “\(\approx_{\varepsilon}\)” can be described as the pair \((x_{t+1},v_{t+1})\) satisfying an accuracy requirement provided by [1, Definition 2.3]. More precisely (but without providing any intuition), it requires the existence of some \(w_{t+1}\) such that \(v_{t+1} \in \partial g(w_{t+1})\) and for which the accuracy requirement
\[\gamma^2 || x_{t+1} - y_t + \gamma v_{t+1} ||^2 + \gamma (g(x_{t+1}) - g(w_{t+1}) - v_{t+1}(x_{t+1} - w_{t+1})) \leqslant \varepsilon_t,\]is valid.
Theoretical guarantee: A theoretical upper bound is obtained in [1, Corollary 3.5]:
\[F(x_n)-F_\star\leqslant \frac{2L \|x_0-x_\star\|^2}{(1-\zeta^2)n^2}.\]References: The method and theoretical result can be found in [1, Section 3].
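In the exact case \(\zeta=0\) (so that \(\eta=\gamma\) and \(\varepsilon_t=0\)), the scheme reduces to an accelerated proximal gradient method, and \(v_{t+1}\) can be recovered from the exact proximal step via Moreau's decomposition. The sketch below only illustrates the recursion under this simplification, assuming explicit oracles grad_f and prox_g for a toy smooth quadratic plus \(\ell_1\) instance; it is not PEPit code.
import numpy as np

L, n = 1.3, 11
gamma = 1 / L
H = np.diag([L, 0.4])                                              # toy L-smooth convex f
grad_f = lambda v: H @ v
prox_g = lambda v: np.sign(v) * np.maximum(np.abs(v) - gamma, 0)   # prox of gamma * ||.||_1

x = z = np.array([2.0, -3.0])
A = 0.0
for t in range(n):
    eta = gamma                                   # zeta = 0: eta = gamma and the prox is exact
    A_next = A + (eta + np.sqrt(eta ** 2 + 4 * eta * A)) / 2
    y = x + (A_next - A) / A_next * (z - x)
    u = y - gamma * grad_f(y)
    x = prox_g(u)                                 # x_{t+1} = prox_{gamma g}(u)
    v = (u - x) / gamma                           # v_{t+1} = prox_{g*/gamma}(u/gamma), by Moreau
    z = z - (A_next - A) * (v + grad_f(y))
    A = A_next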
- Parameters
L (float) – smoothness parameter.
zeta (float) – relative approximation parameter in (0,1).
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_inexact_forward_backward(L=1.3, zeta=.45, n=11, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 59x59 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 156 scalar constraint(s) ... function 1 : 156 scalar constraint(s) added function 2 : Adding 528 scalar constraint(s) ... function 2 : 528 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.018869997698251897 *** Example file: worst-case performance of an inexact accelerated forward backward method *** PEPit guarantee: F(x_n)-F_* <= 0.01887 ||x_0 - x_*||^2 Theoretical guarantee: F(x_n)-F_* <= 0.0269437 ||x_0 - x_*||^2
Partially inexact Douglas Rachford splitting
- PEPit.examples.inexact_proximal_methods.wc_partially_inexact_douglas_rachford_splitting(mu, L, n, gamma, sigma, verbose=1)[source]
Consider the composite strongly convex minimization problem,
\[F_\star \triangleq \min_x \left\{ F(x) \equiv f(x) + g(x) \right\}\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex, and \(g\) is closed convex and proper. We denote by \(x_\star = \arg\min_x F(x)\) the minimizer of \(F\). The (exact) proximal operator of \(g\), and an approximate version of the proximal operator of \(f\) are assumed to be available.
This code computes a worst-case guarantee for a partially inexact Douglas-Rachford Splitting (DRS). That is, it computes the smallest possible \(\tau(n,L,\mu,\sigma,\gamma)\) such that the guarantee
\[\|z_{n} - z_\star\|^2 \leqslant \tau(n,L,\mu,\sigma,\gamma) \|z_0 - z_\star\|^2\]is valid, where \(z_n\) is the output of the DRS (initiated at \(z_0\)), \(z_\star\) is its fixed point, \(\gamma\) is a step-size, and \(\sigma\) is the level of inaccuracy.
Algorithm: The partially inexact Douglas-Rachford splitting under consideration is described by
\begin{eqnarray} x_{t} && \approx_{\sigma} \arg\min_x \left\{ \gamma f(x)+\frac{1}{2} \|x-z_t\|^2 \right\},\\ y_{t} && = \arg\min_y \left\{ \gamma g(y)+\frac{1}{2} \|y-(x_t-\gamma \nabla f(x_t))\|^2 \right\},\\ z_{t+1} && = z_t + y_t - x_t. \end{eqnarray}More precisely, the notation “\(\approx_{\sigma}\)” corresponds to requiring the existence of some \(e_{t}\) such that
\begin{eqnarray} x_{t} && = z_t - \gamma (\nabla f(x_t) - e_t),\\ y_{t} && = \arg\min_y \left\{ \gamma g(y)+\frac{1}{2} \|y-(x_t-\gamma \nabla f(x_t))\|^2 \right\},\\ && \text{with } \|e_t\|^2 \leqslant \frac{\sigma^2}{\gamma^2}\|y_{t} - z_t + \gamma \nabla f(x_t) \|^2,\\ z_{t+1} && = z_t + y_t - x_t. \end{eqnarray}Theoretical guarantee: The following tight theoretical bound is due to [2, Theorem 5.1]:
\[\|z_{n} - z_\star\|^2 \leqslant \max\left(\frac{1 - \sigma + \gamma \mu \sigma}{1 - \sigma + \gamma \mu}, \frac{\sigma + (1 - \sigma) \gamma L}{1 + (1 - \sigma) \gamma L}\right)^{2n} \|z_0 - z_\star\|^2.\]References: The method is from [1], its PEP formulation and the worst-case analysis from [2], see [2, Section 4.4] for more details.
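When \(\sigma=0\), the first line becomes an exact proximal step on \(f\) (i.e., \(x_t=\mathrm{prox}_{\gamma f}(z_t)\)). The sketch below runs this exact special case on a toy one-dimensional instance where both proximal operators are available in closed form; the instance and parameters are illustrative choices, not PEPit code.
import numpy as np

mu, L, gamma, n = 0.1, 5.0, 1.4, 5
# Toy instance: f(x) = L/2 x^2 (smooth and strongly convex), g(x) = |x|.
grad_f = lambda x: L * x
prox_f = lambda z: z / (1 + gamma * L)                    # exact prox of gamma f
prox_g = lambda z: np.sign(z) * max(abs(z) - gamma, 0.0)  # exact prox of gamma g

z = 1.0                                  # z_0, with |z_0 - z_*| = 1 (here z_* = 0)
for t in range(n):
    x = prox_f(z)                        # sigma = 0: the approximate step is exact
    y = prox_g(x - gamma * grad_f(x))
    z = z + y - x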
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
n (int) – number of iterations.
gamma (float) – the step-size.
sigma (float) – noise parameter.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_partially_inexact_douglas_rachford_splitting(mu=.1, L=5, n=5, gamma=1.4, sigma=.2, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 18x18 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 40 scalar constraint(s) ... function 1 : 40 scalar constraint(s) added function 2 : Adding 30 scalar constraint(s) ... function 2 : 30 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.28120549805153155 *** Example file: worst-case performance of the partially inexact Douglas Rachford splitting *** PEPit guarantee: ||z_n - z_*||^2 <= 0.281205 ||z_0 - z_*||^2 Theoretical guarantee: ||z_n - z_*||^2 <= 0.281206 ||z_0 - z_*||^2
Relatively inexact proximal point
- PEPit.examples.inexact_proximal_methods.wc_relatively_inexact_proximal_point_algorithm(n, gamma, sigma, verbose=1)[source]
Consider the (possibly non-smooth) convex minimization problem,
\[f_\star \triangleq \min_x f(x)\]where \(f\) is closed, convex, and proper. We denote by \(x_\star\) some optimal point of \(f\) (hence \(0\in\partial f(x_\star)\)). We further assume that one has access to an inexact version of the proximal operator of \(f\), whose level of accuracy is controlled by some parameter \(\sigma\geqslant 0\).
This code computes a worst-case guarantee for an inexact proximal point method. That is, it computes the smallest possible \(\tau(n, \gamma, \sigma)\) such that the guarantee
\[f(x_n) - f(x_\star) \leqslant \tau(n, \gamma, \sigma) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the method, \(\gamma\) is some step-size, and \(\sigma\) is the level of accuracy of the approximate proximal point oracle.
Algorithm: The approximate proximal point method under consideration is described by
\[x_{t+1} \approx_{\sigma} \arg\min_x \left\{ \gamma f(x)+\frac{1}{2} \|x-x_t\|^2 \right\},\]where the notation “\(\approx_{\sigma}\)” corresponds to requiring the existence of some vector \(s_{t+1}\in\partial f(x_{t+1})\) and \(e_{t+1}\) such that
\[x_{t+1} = x_t - \gamma s_{t+1} + e_{t+1} \quad \quad \text{with }\|e_{t+1}\|^2 \leqslant \sigma^2\|x_{t+1} - x_t\|^2.\]We note that the case \(\sigma=0\) implies \(e_{t+1}=0\) and this operation reduces to a standard proximal step with step-size \(\gamma\).
Theoretical guarantee: The following (empirical) upper bound is provided in [1, Section 3.5.1],
\[f(x_n) - f(x_\star) \leqslant \frac{1 + \sigma}{4 \gamma n^{\sqrt{1 - \sigma^2}}}\|x_0 - x_\star\|^2.\]References: The precise formulation is presented in [1, Section 3.5.1].
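When \(\sigma=0\), the method is the exact proximal point algorithm \(x_{t+1}=\mathrm{prox}_{\gamma f}(x_t)\). The sketch below runs this exact special case on a toy nonsmooth instance \(f(x)=|x|\), whose proximal operator is soft-thresholding; the step-size and starting point are illustrative choices and this is not PEPit code.
import numpy as np

gamma, n = 0.1, 8                                          # illustrative step-size and horizon
f = lambda x: abs(x)                                       # closed, proper, convex; f_* = 0 at x_* = 0
prox_f = lambda x: np.sign(x) * max(abs(x) - gamma, 0.0)   # prox_{gamma f}: soft-thresholding

x = 1.0                                                    # x_0, with |x_0 - x_*| = 1
for t in range(n):
    x = prox_f(x)                                          # sigma = 0: e_{t+1} = 0, exact proximal step
print(f(x))                                                # f(x_n) - f_* on this instance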
- Parameters
n (int) – number of iterations.
gamma (float) – the step-size.
sigma (float) – accuracy parameter of the proximal point computation.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_relatively_inexact_proximal_point_algorithm(n=8, gamma=10, sigma=.65, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 18x18 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 88 scalar constraint(s) ... function 1 : 88 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal_inaccurate (solver: SCS); optimal value: 0.007753853579615959 *** Example file: worst-case performance of an inexact proximal point method in distance in function values *** PEPit guarantee: f(x_n) - f(x_*) <= 0.00775385 ||x_0 - x_*||^2 Theoretical guarantee: f(x_n) - f(x_*) <= 0.00849444 ||x_0 - x_*||^2
Adaptive methods
Polyak steps in distance to optimum
- PEPit.examples.adaptive_methods.wc_polyak_steps_in_distance_to_optimum(L, mu, gamma, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex, and \(x_\star=\arg\min_x f(x)\).
This code computes a worst-case guarantee for a variant of a gradient method relying on Polyak step-sizes (PS). That is, it computes the smallest possible \(\tau(L, \mu, \gamma)\) such that the guarantee
\[\|x_{t+1} - x_\star\|^2 \leqslant \tau(L, \mu, \gamma) \|x_{t} - x_\star\|^2\]is valid, where \(x_{t+1}\) is the output of one iteration of the gradient method with PS from \(x_t\), and \(\gamma\) is the effective value of the step-size used at this iteration.
In short, for given values of \(L\), \(\mu\), and \(\gamma\), \(\tau(L, \mu, \gamma)\) is computed as the worst-case value of \(\|x_{t+1} - x_\star\|^2\) when \(\|x_{t} - x_\star\|^2 \leqslant 1\).
Algorithm: Gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size. The Polyak step-size rule under consideration here corresponds to choosing \(\gamma\) satisfying:
\[\gamma \|\nabla f(x_t)\|^2 = 2 (f(x_t) - f_\star).\]Theoretical guarantee: The gradient method with the variant of Polyak step-sizes under consideration enjoys the tight theoretical guarantee [1, Proposition 1]:
\[\|x_{t+1} - x_\star\|^2 \leqslant \tau(L, \mu, \gamma) \|x_{t} - x_\star\|^2,\]where \(\gamma\) is the effective step-size used at iteration \(t\) and
\begin{eqnarray} \tau(L, \mu, \gamma) & = & \left\{\begin{array}{ll} \frac{(\gamma L-1)(1-\gamma \mu)}{\gamma(L+\mu)-1} & \text{if } \gamma\in[\tfrac{1}{L},\tfrac{1}{\mu}],\\ 0 & \text{otherwise.} \end{array}\right. \end{eqnarray}References:
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
gamma (float) – the step-size.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 1 >>> mu = 0.1 >>> gamma = 2 / (L + mu) >>> pepit_tau, theoretical_tau = wc_polyak_steps_in_distance_to_optimum(L=L, mu=mu, gamma=gamma, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 4x4 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 6 scalar constraint(s) ... function 1 : 6 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.66942148764241 *** Example file: worst-case performance of Polyak steps *** PEPit guarantee: ||x_1 - x_*||^2 <= 0.669421 ||x_0 - x_*||^2 Theoretical guarantee: ||x_1 - x_*||^2 <= 0.669421 ||x_0 - x_*||^2
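Outside of PEPit, the Polyak step-size rule above is straightforward to implement whenever \(f_\star\) is known. The sketch below applies it to a toy strongly convex quadratic (for which \(f_\star=0\)); the instance is an arbitrary illustrative choice.
import numpy as np

L, mu = 1.0, 0.1
H = np.diag([L, mu])               # toy L-smooth, mu-strongly convex quadratic, f_* = 0 at x_* = 0
f = lambda x: 0.5 * x @ H @ x
grad = lambda x: H @ x

x = np.array([1.0, 1.0])
for _ in range(20):
    g = grad(x)
    if g @ g < 1e-16:              # stop once the gradient (numerically) vanishes
        break
    gamma = 2 * f(x) / (g @ g)     # Polyak rule: gamma ||grad f(x)||^2 = 2 (f(x) - f_*)
    x = x - gamma * g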
Polyak steps in function value
- PEPit.examples.adaptive_methods.wc_polyak_steps_in_function_value(L, mu, gamma, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex, and \(x_\star=\arg\min_x f(x)\).
This code computes a worst-case guarantee for a variant of a gradient method relying on Polyak step-sizes. That is, it computes the smallest possible \(\tau(L, \mu, \gamma)\) such that the guarantee
\[f(x_{t+1}) - f_\star \leqslant \tau(L, \mu, \gamma) (f(x_t) - f_\star)\]is valid, where \(x_{t+1}\) is the output of one iteration of the gradient method with PS from \(x_t\), and \(\gamma\) is the effective value of the step-size used at this iteration.
In short, for given values of \(L\), \(\mu\), and \(\gamma\), \(\tau(L, \mu, \gamma)\) is computed as the worst-case value of \(f(x_{t+1})-f_\star\) when \(f(x_t)-f_\star \leqslant 1\).
Algorithm: Gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size. The Polyak step-size rule under consideration here corresponds to choosing \(\gamma\) satisfying:
\[\|\nabla f(x_t)\|^2 = 2 L (2 - L \gamma) (f(x_t) - f_\star).\]Theoretical guarantee: The gradient method with the variant of Polyak step-sizes under consideration enjoys the tight theoretical guarantee [1, Proposition 2]:
\[f(x_{t+1})-f_\star \leqslant \tau(L,\mu,\gamma) (f(x_{t})-f_\star),\]where \(\gamma\) is the effective step-size used at iteration \(t\) and
\begin{eqnarray} \tau(L,\mu,\gamma) & = & \left\{\begin{array}{ll} (\gamma L - 1) (L \gamma (3 - \gamma (L + \mu)) - 1) & \text{if } \gamma\in[\tfrac{1}{L},\tfrac{2L-\mu}{L^2}],\\ 0 & \text{otherwise.} \end{array}\right. \end{eqnarray}References:
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
gamma (float) – the step-size.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 1 >>> mu = 0.1 >>> gamma = 2 / (L + mu) >>> pepit_tau, theoretical_tau = wc_polyak_steps_in_function_value(L=L, mu=mu, gamma=gamma, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 4x4 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (2 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 6 scalar constraint(s) ... function 1 : 6 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.6694215432773613 *** Example file: worst-case performance of Polyak steps *** PEPit guarantee: f(x_1) - f_* <= 0.669422 (f(x_0) - f_*) Theoretical guarantee: f(x_1) - f_* <= 0.669421 (f(x_0) - f_*)
Low dimensional worst-cases scenarios
Inexact gradient
- PEPit.examples.low_dimensional_worst_cases_scenarios.wc_inexact_gradient(L, mu, epsilon, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for an inexact gradient method and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee using \(10\) iterations of the logdet heuristic.
That is, it computes the smallest possible \(\tau(n,L,\mu,\varepsilon)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n,L,\mu,\varepsilon) (f(x_0) - f_\star)\]is valid, where \(x_n\) is the output of gradient descent with an inexact descent direction, and where \(x_\star\) is the minimizer of \(f\). Then, it looks for a low-dimensional example nearly achieving this performance.
The inexact descent direction is assumed to satisfy a relative inaccuracy described by (with \(0 \leqslant \varepsilon \leqslant 1\))
\[\|\nabla f(x_t) - d_t\| \leqslant \varepsilon \|\nabla f(x_t)\|,\]where \(\nabla f(x_t)\) is the true gradient, and \(d_t\) is the approximate descent direction that is used.
Algorithm:
The inexact gradient descent under consideration can be written as
\[x_{t+1} = x_t - \frac{2}{L_{\varepsilon} + \mu_{\varepsilon}} d_t\]where \(d_t\) is the inexact search direction, \(L_{\varepsilon} = (1 + \varepsilon)L\) and \(\mu_{\varepsilon} = (1-\varepsilon) \mu\).
Theoretical guarantee:
A tight worst-case guarantee obtained in [1, Theorem 5.3] or [2, Remark 1.6] is
\[f(x_n) - f_\star \leqslant \left(\frac{L_{\varepsilon} - \mu_{\varepsilon}}{L_{\varepsilon} + \mu_{\varepsilon}}\right)^{2n}(f(x_0) - f_\star ),\]with \(L_{\varepsilon} = (1 + \varepsilon)L\) and \(\mu_{\varepsilon} = (1-\varepsilon) \mu\). This guarantee is achieved on one-dimensional quadratic functions.
References: The detailed analyses can be found in [1, 2]. The logdet heuristic is presented in [3].
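Independently of the PEP, the inexact update is easy to simulate: any direction \(d_t\) with \(\|\nabla f(x_t)-d_t\|\leqslant\varepsilon\|\nabla f(x_t)\|\) is admissible. The sketch below perturbs the true gradient by a random vector of relative norm \(\varepsilon\) on a toy quadratic; the instance and the random seed are illustrative choices, not PEPit code.
import numpy as np

L, mu, epsilon, n = 1.0, 0.1, 0.1, 6
L_eps, mu_eps = (1 + epsilon) * L, (1 - epsilon) * mu
H = np.diag([L, mu])                      # toy L-smooth, mu-strongly convex quadratic
grad = lambda x: H @ x

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0])
for t in range(n):
    g = grad(x)
    e = rng.standard_normal(2)
    e *= epsilon * np.linalg.norm(g) / np.linalg.norm(e)   # ||g - d|| = epsilon ||g||
    d = g + e                                              # one admissible inexact direction
    x = x - 2 / (L_eps + mu_eps) * d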
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong convexity parameter.
epsilon (float) – level of inaccuracy
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_inexact_gradient(L=1, mu=0.1, epsilon=0.1, n=6, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 15x15 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 62 scalar constraint(s) ... function 1 : 62 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.13989778793516514 (PEPit) Postprocessing: 2 eigenvalue(s) > 1.7005395180119392e-05 before dimension reduction (PEPit) Calling SDP solver (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.1398878008962302 (PEPit) Postprocessing: 2 eigenvalue(s) > 5.283608596989854e-06 after 1 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988778337004493 (PEPit) Postprocessing: 2 eigenvalue(s) > 5.335098252373141e-06 after 2 dimension reduction step(s) (PEPit) Solver status: optimal (solver: SCS); objective value: 0.1398927512487368 (PEPit) Postprocessing: 2 eigenvalue(s) > 1.2372028101610534e-05 after 3 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988824650439619 (PEPit) Postprocessing: 2 eigenvalue(s) > 2.006867894032787e-05 after 4 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988779568391294 (PEPit) Postprocessing: 2 eigenvalue(s) > 5.416953129163531e-06 after 5 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.1398889451757595 (PEPit) Postprocessing: 2 eigenvalue(s) > 3.983502472713177e-05 after 6 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988780180833413 (PEPit) Postprocessing: 2 eigenvalue(s) > 5.4785759855262395e-06 after 7 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988778218159367 (PEPit) Postprocessing: 2 eigenvalue(s) > 5.360843247635456e-06 after 8 dimension reduction step(s) (PEPit) Solver status: optimal (solver: SCS); objective value: 0.13988478099895965 (PEPit) Postprocessing: 2 eigenvalue(s) > 9.59529914206238e-06 after 9 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988791535665998 (PEPit) Postprocessing: 2 eigenvalue(s) > 9.339529753603287e-06 after 10 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.13988791535665998 (PEPit) Postprocessing: 2 eigenvalue(s) > 9.339529753603287e-06 after dimension reduction *** Example file: worst-case performance of inexact gradient *** PEPit example: f(x_n)-f_* == 0.139888 (f(x_0)-f_*) Theoretical guarantee: f(x_n)-f_* <= 0.139731 (f(x_0)-f_*)
Non-convex gradient descent
- PEPit.examples.low_dimensional_worst_cases_scenarios.wc_gradient_descent(L, gamma, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\), and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible \(\tau(n, L, \gamma)\) such that the guarantee
\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \tau(n, L, \gamma) (f(x_0) - f(x_n))\]is valid, where \(x_n\) is the \(n\)-th iterate obtained with the gradient method with fixed step-size. Then, it looks for a low-dimensional example nearly achieving this performance.
Algorithm: Gradient descent is described as follows, for \(t \in \{ 0, \dots, n-1\}\),
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: When \(\gamma \leqslant \frac{1}{L}\), an empirically tight theoretical worst-case guarantee is
\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \frac{4}{3}\frac{L}{n} (f(x_0) - f(x_n)),\]see discussions in [1, page 190] and [2].
References:
- Parameters
L (float) – the smoothness parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> L = 1 >>> gamma = 1 / L >>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=gamma, n=5, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 7x7 (PEPit) Setting up the problem: performance measure is minimum of 6 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 30 scalar constraint(s) ... function 1 : 30 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.2666769474847614 (PEPit) Postprocessing: 6 eigenvalue(s) > 0.0 before dimension reduction (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); objective value: 0.266667996380269 (PEPit) Postprocessing: 2 eigenvalue(s) > 1.0527850440294492e-05 after 1 dimension reduction step(s) (PEPit) Solver status: optimal (solver: SCS); objective value: 0.2666668138016744 (PEPit) Postprocessing: 2 eigenvalue(s) > 2.510763274714993e-07 after 2 dimension reduction step(s) (PEPit) Solver status: optimal (solver: SCS); objective value: 0.2666668138016744 (PEPit) Postprocessing: 2 eigenvalue(s) > 2.510763274714993e-07 after dimension reduction *** Example file: worst-case performance of gradient descent with fixed step-size *** PEPit example: min_i ||f'(x_i)||^2 == 0.266667 (f(x_0)-f_*) Theoretical guarantee: min_i ||f'(x_i)||^2 <= 0.266667 (f(x_0)-f_*)
Optimized gradient
- PEPit.examples.low_dimensional_worst_cases_scenarios.wc_optimized_gradient(L, n, verbose=1)[source]
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and convex.
This code computes a worst-case guarantee for the optimized gradient method (OGM), and applies the trace heuristic to try to find a low-dimensional worst-case example on which this guarantee is nearly achieved. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of OGM and where \(x_\star\) is a minimizer of \(f\). Then, it applies the trace heuristic, which yields a one-dimensional function on which the guarantee is nearly achieved.
Algorithm: The optimized gradient method is described by
\begin{eqnarray} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t)\\ y_{t+1} & = & x_{t+1} + \frac{\theta_{t}-1}{\theta_{t+1}}(x_{t+1}-x_t)+\frac{\theta_{t}}{\theta_{t+1}}(x_{t+1}-y_t), \end{eqnarray}with
\begin{eqnarray} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}. \end{eqnarray}Theoretical guarantee: The tight theoretical guarantee can be found in [2, Theorem 2]:
\[f(x_n)-f_\star \leqslant \frac{L\|x_0-x_\star\|^2}{2\theta_n^2}.\]References: The OGM was developed in [1,2]. Low-dimensional worst-case functions for OGM were obtained in [3, 4].
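The recursion above (including the modified last step of the \(\theta\) sequence) can be reproduced directly; the sketch below does so on a toy smooth convex quadratic with NumPy. It only illustrates the iterates, not the PEP itself, and the instance is an arbitrary choice.
import numpy as np

L, n = 3.0, 4
H = np.diag([L, 1.0, 0.2])                 # toy L-smooth convex quadratic
grad = lambda v: H @ v

theta = [1.0]                              # theta_0 = 1
for t in range(1, n):
    theta.append((1 + np.sqrt(4 * theta[-1] ** 2 + 1)) / 2)
theta.append((1 + np.sqrt(8 * theta[-1] ** 2 + 1)) / 2)   # last step uses the factor 8

x = y = np.array([1.0, 1.0, 1.0])          # x_0 = y_0
for t in range(n):
    x_next = y - grad(y) / L
    y = x_next + (theta[t] - 1) / theta[t + 1] * (x_next - x) \
        + theta[t] / theta[t + 1] * (x_next - y)
    x = x_next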
- Parameters
L (float) – the smoothness parameter.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_optimized_gradient(L=3, n=4, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 7x7 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 30 scalar constraint(s) ... function 1 : 30 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.07675218017587908 (PEPit) Postprocessing: 5 eigenvalue(s) > 0.00012110342786525262 before dimension reduction (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); objective value: 0.0767421794376856 (PEPit) Postprocessing: 1 eigenvalue(s) > 5.187978263167338e-09 after dimension reduction *** Example file: worst-case performance of optimized gradient method *** PEPit example: f(y_n)-f_* == 0.0767422 ||x_0 - x_*||^2 Theoretical guarantee: f(y_n)-f_* <= 0.0767518 ||x_0 - x_*||^2
Frank Wolfe
- PEPit.examples.low_dimensional_worst_cases_scenarios.wc_frank_wolfe(L, D, n, verbose=1)[source]
Consider the composite convex minimization problem
\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]where \(f_1\) is \(L\)-smooth and convex and where \(f_2\) is a convex indicator function on \(\mathcal{D}\) of diameter at most \(D\).
This code computes a worst-case guarantee for the conditional gradient method, aka Frank-Wolfe method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee using \(12\) iterations of the logdet heuristic. That is, it computes the smallest possible \(\tau(n, L)\) such that the guarantee
\[F(x_n) - F(x_\star) \leqslant \tau(n, L) D^2,\]is valid, where \(x_n\) is the output of the conditional gradient method, and where \(x_\star\) is a minimizer of \(F\). In short, for given values of \(n\) and \(L\), \(\tau(n, L)\) is computed as the worst-case value of \(F(x_n) - F(x_\star)\) when \(D \leqslant 1\). Then, it looks for a low-dimensional example nearly achieving this performance.
Algorithm:
This method was first presented in [1]. A more recent version can be found in, e.g., [2, Algorithm 1]. For \(t \in \{0, \dots, n-1\}\),
\[\begin{split}\begin{eqnarray} y_t & = & \arg\min_{s \in \mathcal{D}} \langle s \mid \nabla f_1(x_t) \rangle, \\ x_{t+1} & = & \frac{t}{t + 2} x_t + \frac{2}{t + 2} y_t. \end{eqnarray}\end{split}\]Theoretical guarantee:
An upper guarantee obtained in [2, Theorem 1] is
\[F(x_n) - F(x_\star) \leqslant \frac{2L D^2}{n+2}.\]References: The algorithm is presented in, among others, [1, 2]. The logdet heuristic is presented in [3].
[1] M. Frank, P. Wolfe (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2), 95-110.
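As an illustration of the method itself (outside of PEPit), the sketch below runs Frank-Wolfe on the probability simplex, where the linear minimization step simply selects the best vertex; the smooth objective is a toy quadratic and all parameters are illustrative choices.
import numpy as np

n, d = 10, 3
c = np.array([0.2, 0.3, 0.5])              # a point of the simplex (the constraint set D here)
H = np.diag([1.0, 0.5, 0.1])
grad = lambda v: H @ (v - c)               # toy 1-smooth convex f_1, minimized at c

x = np.ones(d) / d                         # x_0 inside the simplex
for t in range(n):
    g = grad(x)
    s = np.zeros(d)
    s[np.argmin(g)] = 1.0                  # y_t: the linear minimization oracle returns a vertex
    x = t / (t + 2) * x + 2 / (t + 2) * s  # x_{t+1}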
- Parameters
L (float) – the smoothness parameter.
D (float) – diameter of \(f_2\).
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_frank_wolfe(L=1, D=1, n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 26x26 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 132 scalar constraint(s) ... function 1 : 132 scalar constraint(s) added function 2 : Adding 325 scalar constraint(s) ... function 2 : 325 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.07830185202143693 (PEPit) Postprocessing: 12 eigenvalue(s) > 0.0006226631118848632 before dimension reduction (PEPit) Calling SDP solver (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07828372031738319 (PEPit) Postprocessing: 11 eigenvalue(s) > 4.365697148503946e-06 after 1 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07826632525166947 (PEPit) Postprocessing: 11 eigenvalue(s) > 1.2665145818615854e-05 after 2 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07824094610510846 (PEPit) Postprocessing: 11 eigenvalue(s) > 2.4505278932874855e-05 after 3 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07820114036570962 (PEPit) Postprocessing: 11 eigenvalue(s) > 4.164155031005524e-05 after 4 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07823286027467699 (PEPit) Postprocessing: 10 eigenvalue(s) > 9.73301991908838e-05 after 5 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07823446697003811 (PEPit) Postprocessing: 10 eigenvalue(s) > 0.00011791962010861412 after 6 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.07823446697003811 (PEPit) Postprocessing: 10 eigenvalue(s) > 0.00011791962010861412 after dimension reduction *** Example file: worst-case performance of the Conditional Gradient (Frank-Wolfe) in function value *** PEPit example: f(x_n)-f_* == 0.0782345 ||x0 - xs||^2 Theoretical guarantee: f(x_n)-f_* <= 0.166667 ||x0 - xs||^2
Proximal point
- PEPit.examples.low_dimensional_worst_cases_scenarios.wc_proximal_point(alpha, n, verbose=1)[source]
Consider the monotone inclusion problem
\[\mathrm{Find}\, x:\, 0\in Ax,\]where \(A\) is maximally monotone. We denote by \(J_A = (I + A)^{-1}\) the resolvent of \(A\).
This code computes a worst-case guarantee for the proximal point method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee using the trace heuristic.
That is, it computes the smallest possible \(\tau(n, \alpha)\) such that the guarantee
\[\|x_n - x_{n-1}\|^2 \leqslant \tau(n, \alpha) \|x_0 - x_\star\|^2,\]is valid, where \(x_\star\) is such that \(0 \in Ax_\star\). Then, it looks for a low-dimensional example nearly achieving this performance.
Algorithm: The proximal point algorithm for monotone inclusions is described as follows, for \(t \in \{ 0, \dots, n-1\}\),
\[x_{t+1} = J_{\alpha A}(x_t),\]where \(\alpha\) is a step-size.
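For a linear maximally monotone operator, the resolvent is a simple linear solve, so the iteration is easy to reproduce outside of PEPit. The sketch below uses a toy skew-symmetric operator (maximally monotone, with zero at the origin); the operator, step-size, and starting point are illustrative choices.
import numpy as np

alpha, n = 2.2, 11
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])                    # skew-symmetric, hence maximally monotone
J = np.linalg.inv(np.eye(2) + alpha * A)       # resolvent J_{alpha A} = (I + alpha A)^{-1}

x = np.array([1.0, 0.0])                       # x_0, with ||x_0 - x_*|| = 1 (here x_* = 0)
for t in range(n):
    x_prev, x = x, J @ x                       # x_{t+1} = J_{alpha A}(x_t)
print(np.linalg.norm(x - x_prev) ** 2)         # ||x_n - x_{n-1}||^2 on this instance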
Theoretical guarantee: A tight theoretical guarantee can be found in [1, section 4].
\[\|x_n - x_{n-1}\|^2 \leqslant \frac{\left(1 - \frac{1}{n}\right)^{n - 1}}{n} \|x_0 - x_\star\|^2.\]Reference:
- Parameters
alpha (float) – the step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value.
theoretical_tau (float) – theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_proximal_point(alpha=2.2, n=11, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 13x13 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 132 scalar constraint(s) ... function 1 : 132 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.03504735907840766 (PEPit) Postprocessing: 2 eigenvalue(s) > 1.885183851963194e-06 before dimension reduction (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); objective value: 0.03503739338571882 (PEPit) Postprocessing: 2 eigenvalue(s) > 1.9044504527414672e-06 after dimension reduction *** Example file: worst-case performance of the Proximal Point Method*** PEPit example: ||x(n) - x(n-1)||^2 == 0.0350374 ||x0 - xs||^2 Theoretical guarantee: ||x(n) - x(n-1)||^2 <= 0.0350494 ||x0 - xs||^2
Halpern iteration
- PEPit.examples.low_dimensional_worst_cases_scenarios.wc_halpern_iteration(n, verbose=1)[source]
Consider the fixed point problem
\[\mathrm{Find}\, x:\, x = Ax,\]where \(A\) is a non-expansive operator, that is, an \(L\)-Lipschitz operator with \(L=1\).
This code computes a worst-case guarantee for the Halpern Iteration, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee
\[\|x_n - Ax_n\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the Halpern iteration, and \(x_\star\) the fixed point of \(A\).
In short, for a given value of \(n\), \(\tau(n)\) is computed as the worst-case value of \(\|x_n - Ax_n\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\). Then, it looks for a low-dimensional example nearly achieving this performance.
Algorithm: The Halpern iteration can be written as
\[x_{t+1} = \frac{1}{t + 2} x_0 + \left(1 - \frac{1}{t + 2}\right) Ax_t.\]Theoretical guarantee: A tight worst-case guarantee for Halpern iteration can be found in [1, Theorem 2.1]:
\[\|x_n - Ax_n\|^2 \leqslant \left(\frac{2}{n+1}\right)^2 \|x_0 - x_\star\|^2.\]References: The detailed approach and tight bound are available in [1].
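The Halpern iteration can be simulated on any non-expansive map; the sketch below anchors at \(x_0\) as in the update above, with a toy planar rotation as the operator (an arbitrary illustrative choice, not part of the example file).
import numpy as np

n = 10
phi = 0.8
A = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])          # non-expansive, fixed point at the origin

x0 = np.array([1.0, 0.0])                            # anchor and starting point
x = x0
for t in range(n):
    x = x0 / (t + 2) + (1 - 1 / (t + 2)) * (A @ x)   # x_{t+1}
print(np.linalg.norm(x - A @ x) ** 2)                # ||x_n - A x_n||^2 on this instance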
- Parameters
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_halpern_iteration(n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 13x13 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 1 function(s) function 1 : Adding 132 scalar constraint(s) ... function 1 : 132 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.033076981475854986 (PEPit) Postprocessing: 11 eigenvalue(s) > 2.538373915093237e-06 before dimension reduction (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); objective value: 0.03306531836320572 (PEPit) Postprocessing: 2 eigenvalue(s) > 0.00010453609338097841 after 1 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.0330736415198303 (PEPit) Postprocessing: 2 eigenvalue(s) > 4.3812352924839906e-05 after 2 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.03307313275765859 (PEPit) Postprocessing: 2 eigenvalue(s) > 4.715648695840045e-05 after 3 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.03307313275765859 (PEPit) Postprocessing: 2 eigenvalue(s) > 4.715648695840045e-05 after dimension reduction *** Example file: worst-case performance of Halpern Iterations *** PEPit example: ||xN - AxN||^2 == 0.0330731 ||x0 - x_*||^2 Theoretical guarantee: ||xN - AxN||^2 <= 0.0330579 ||x0 - x_*||^2
Alternate projections
- PEPit.examples.low_dimensional_worst_cases_scenarios.wc_alternate_projections(n, verbose=1)[source]
Consider the convex feasibility problem:
\[\mathrm{Find}\, x\in Q_1\cap Q_2\]where \(Q_1\) and \(Q_2\) are two closed convex sets.
This code computes a worst-case guarantee for the alternate projection method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee
\[\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the alternate projection method, and \(x_\star\in Q_1\cap Q_2\) is a solution to the convex feasibility problem.
In short, for a given value of \(n\), \(\tau(n)\) is computed as the worst-case value of \(\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\). Then, it looks for a low-dimensional example nearly achieving this performance.
Algorithm: The alternate projection method can be written as
\[\begin{split}\begin{eqnarray} y_{t+1} & = & \mathrm{Proj}_{Q_1}(x_t), \\ x_{t+1} & = & \mathrm{Proj}_{Q_2}(y_{t+1}). \end{eqnarray}\end{split}\]References: The first results on this method are due to [1]. Its translation in PEPs is due to [2].
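The alternating projection scheme above is easy to simulate when the two projections are explicit. The sketch below uses a unit disc and a half-plane in the plane (whose intersection is non-empty); the sets and starting point are illustrative choices, not PEPit code.
import numpy as np

def proj_Q1(x):                         # projection onto the unit disc
    nx = np.linalg.norm(x)
    return x if nx <= 1 else x / nx

def proj_Q2(x):                         # projection onto the half-plane {x : x[1] >= 0}
    return np.array([x[0], max(x[1], 0.0)])

x = np.array([2.0, -1.5])               # x_0
for t in range(10):
    y = proj_Q1(x)                      # y_{t+1}
    x = proj_Q2(y)                      # x_{t+1}
print(np.linalg.norm(proj_Q1(x) - proj_Q2(x)) ** 2)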
- Parameters
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (None) – no theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_alternate_projections(n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 24x24 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 144 scalar constraint(s) ... function 1 : 144 scalar constraint(s) added function 2 : Adding 121 scalar constraint(s) ... function 2 : 121 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.018858674370385117 (PEPit) Postprocessing: 2 eigenvalue(s) > 0.0003128757392530764 before dimension reduction (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); objective value: 0.018851597249912744 (PEPit) Postprocessing: 2 eigenvalue(s) > 7.314172662475898e-06 after 1 dimension reduction step(s) (PEPit) Solver status: optimal (solver: SCS); objective value: 0.018851597249912744 (PEPit) Postprocessing: 2 eigenvalue(s) > 7.314172662475898e-06 after dimension reduction *** Example file: worst-case performance of the alternate projection method *** PEPit example: ||Proj_Q1 (xn) - Proj_Q2 (xn)||^2 == 0.0188516 ||x0 - x_*||^2
Averaged projections
- PEPit.examples.low_dimensional_worst_cases_scenarios.wc_averaged_projections(n, verbose=1)[source]
Consider the convex feasibility problem:
\[\mathrm{Find}\, x\in Q_1\cap Q_2\]where \(Q_1\) and \(Q_2\) are two closed convex sets.
This code computes a worst-case guarantee for the averaged projection method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee
\[\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the averaged projection method, and \(x_\star\in Q_1\cap Q_2\) is a solution to the convex feasibility problem.
In short, for a given value of \(n\), \(\tau(n)\) is computed as the worst-case value of \(\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\). Then, it looks for a low-dimensional example nearly achieving this performance.
Algorithm: The averaged projection method can be written as
\[\begin{eqnarray} x_{t+1} & = & \frac{1}{2} \left(\mathrm{Proj}_{Q_1}(x_t) + \mathrm{Proj}_{Q_2}(x_t)\right). \end{eqnarray}\]
- Parameters
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (None) – no theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_averaged_projections(n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 25x25 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 144 scalar constraint(s) ... function 1 : 144 scalar constraint(s) added function 2 : Adding 144 scalar constraint(s) ... function 2 : 144 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.06845454756941292 (PEPit) Postprocessing: 2 eigenvalue(s) > 0.00014022393949281894 before dimension reduction (PEPit) Calling SDP solver (PEPit) Solver status: optimal (solver: SCS); objective value: 0.06844459892544441 (PEPit) Postprocessing: 2 eigenvalue(s) > 7.442958512820225e-07 after 1 dimension reduction step(s) (PEPit) Solver status: optimal (solver: SCS); objective value: 0.06844459892544441 (PEPit) Postprocessing: 2 eigenvalue(s) > 7.442958512820225e-07 after dimension reduction *** Example file: worst-case performance of the averaged projection method *** PEPit example: ||Proj_Q1 (xn) - Proj_Q2 (xn)||^2 == 0.0684446 ||x0 - x_*||^2
Dykstra
- PEPit.examples.low_dimensional_worst_cases_scenarios.wc_dykstra(n, verbose=1)[source]
Consider the convex feasibility problem:
\[\mathrm{Find}\, x\in Q_1\cap Q_2\]where \(Q_1\) and \(Q_2\) are two closed convex sets.
This code computes a worst-case guarantee for the Dykstra projection method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible \(\tau(n)\) such that the guarantee
\[\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2 \leqslant \tau(n) \|x_0 - x_\star\|^2\]is valid, where \(x_n\) is the output of the Dykstra projection method, and \(x_\star\in Q_1\cap Q_2\) is a solution to the convex feasibility problem.
In short, for a given value of \(n\), \(\tau(n)\) is computed as the worst-case value of \(\|\mathrm{Proj}_{Q_1}(x_n)-\mathrm{Proj}_{Q_2}(x_n)\|^2\) when \(\|x_0 - x_\star\|^2 \leqslant 1\). Then, it looks for a low-dimensional example nearly achieving this performance.
Algorithm: The Dykstra projection method can be written as
\[\begin{split}\begin{eqnarray} y_{t} & = & \mathrm{Proj}_{Q_1}(x_t+p_t), \\ p_{t+1} & = & x_t + p_t - y_t,\\ x_{t+1} & = & \mathrm{Proj}_{Q_2}(y_t+q_t),\\ q_{t+1} & = & y_t + q_t - x_{t+1}. \end{eqnarray}\end{split}\]References: This method is due to [1].
- Parameters
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (None) – no theoretical value.
Example
>>> pepit_tau, theoretical_tau = wc_dykstra(n=10, verbose=1) (PEPit) Setting up the problem: size of the main PSD matrix: 24x24 (PEPit) Setting up the problem: performance measure is minimum of 1 element(s) (PEPit) Setting up the problem: Adding initial conditions and general constraints ... (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added) (PEPit) Setting up the problem: interpolation conditions for 2 function(s) function 1 : Adding 144 scalar constraint(s) ... function 1 : 144 scalar constraint(s) added function 2 : Adding 121 scalar constraint(s) ... function 2 : 121 scalar constraint(s) added (PEPit) Compiling SDP (PEPit) Calling SDP solver (PEPit) Solver status: optimal_inaccurate (solver: SCS); optimal value: 0.020649148184166164 (PEPit) Postprocessing: 3 eigenvalue(s) > 0.003245910668057083 before dimension reduction (PEPit) Calling SDP solver (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.02124334648210737 (PEPit) Postprocessing: 3 eigenvalue(s) > 0.002134191248999246 after 1 dimension reduction step(s) (PEPit) Solver status: optimal_inaccurate (solver: SCS); objective value: 0.02124334648210737 (PEPit) Postprocessing: 3 eigenvalue(s) > 0.002134191248999246 after dimension reduction *** Example file: worst-case performance of the Dykstra projection method *** PEPit example: ||Proj_Q1 (xn) - Proj_Q2 (xn)||^2 == 0.0212433 ||x0 - x_*||^2
Continuous-time models
Gradient flow for strongly convex functions
- PEPit.examples.continuous_time_models.wc_gradient_flow_strongly_convex(mu, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(\mu\)-strongly convex.
This code computes a worst-case guarantee for a gradient flow. That is, it computes the smallest possible \(\tau(\mu)\) such that the guarantee
\[\frac{d}{dt}\mathcal{V}(X_t) \leqslant -\tau(\mu)\mathcal{V}(X_t),\]is valid, where \(\mathcal{V}(X_t) = f(X_t) - f(x_\star)\), \(X_t\) is the output of the gradient flow, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(\mu\), \(\tau(\mu)\) is computed as the worst-case value of the derivative of \(f(X_t)-f_\star\) when \(f(X_t) - f(x_\star)\leqslant 1\).
Algorithm: For \(t \geqslant 0\),
\[\frac{d}{dt}X_t = -\nabla f(X_t),\]with some initialization \(X_{0}\triangleq x_0\).
Theoretical guarantee:
The following tight guarantee can be found in [1, Proposition 11]:
\[\frac{d}{dt}\mathcal{V}(X_t) \leqslant -2\mu\mathcal{V}(X_t).\]The detailed approach using PEPs is available in [2, Theorem 2.1].
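This bound can also be recovered by hand: along the flow, \(\frac{d}{dt}X_t = -\nabla f(X_t)\), and \(\mu\)-strong convexity implies the Polyak-Łojasiewicz inequality \(\|\nabla f(x)\|^2 \geqslant 2\mu (f(x) - f_\star)\), so that
\[\frac{d}{dt}\mathcal{V}(X_t) = \left\langle \nabla f(X_t), \frac{d}{dt}X_t \right\rangle = -\|\nabla f(X_t)\|^2 \leqslant -2\mu\,(f(X_t) - f_\star) = -2\mu\,\mathcal{V}(X_t).\]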
References:
- Parameters
mu (float) – the strong convexity parameter
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_gradient_flow_strongly_convex(mu=0.1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 3x3
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
function 1 : Adding 2 scalar constraint(s) ...
function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: -0.20000000011533495
*** Example file: worst-case performance of the gradient flow ***
PEPit guarantee: d/dt[f(X_t)-f_*] <= -0.2 (f(X_t) - f(x_*))
Theoretical guarantee: d/dt[f(X_t)-f_*] <= -0.2 (f(X_t) - f(x_*))
Gradient flow for convex functions
- PEPit.examples.continuous_time_models.wc_gradient_flow_convex(t, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex.
This code computes a worst-case guarantee for a gradient flow. That is, it verifies that the following inequality
\[\frac{d}{dt}\mathcal{V}(X_t, t) \leqslant 0,\]is valid, where \(\mathcal{V}(X_t, t) = t(f(X_t) - f(x_\star)) + \frac{1}{2} \|X_t - x_\star\|^2\), \(X_t\) is the output of the gradient flow, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of \(t\), it verifies \(\frac{d}{dt}\mathcal{V}(X_t, t)\leqslant 0\).
Algorithm: For \(t \geqslant 0\),
\[\frac{d}{dt}X_t = -\nabla f(X_t),\]with some initialization \(X_{0}\triangleq x_0\).
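The following minimal sketch shows how such a continuous-time verification can be encoded with PEPit. It is an illustrative sketch (assuming the ConvexFunction class), not the packaged example file; along the flow, \(\frac{d}{dt}\mathcal{V}(X_t, t) = (f(X_t) - f_\star) - t\|\nabla f(X_t)\|^2 - \langle \nabla f(X_t), X_t - x_\star\rangle\).

from PEPit import PEP
from PEPit.functions import ConvexFunction

t = 2.5
problem = PEP()
func = problem.declare_function(ConvexFunction)

xs = func.stationary_point()       # x_*, a point with zero (sub)gradient
fs = func(xs)                      # f_* = f(x_*)
x = problem.set_initial_point()    # X_t at an arbitrary time t
gx, fx = func.oracle(x)            # nabla f(X_t) and f(X_t)

# No initial condition is needed here: we only check the sign of the derivative.
dV = (fx - fs) - t * gx ** 2 - gx * (x - xs)
problem.set_performance_metric(dV)

pepit_tau = problem.solve()        # expected to be (numerically close to) non-positive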
Theoretical guarantee:
The following tight guarantee can be found in [1, p. 7]:
\[\frac{d}{dt}\mathcal{V}(X_t, t) \leqslant 0.\]After integrating between \(0\) and \(T\),
\[f(X_T) - f_\star \leqslant \frac{1}{2T}\|x_0 - x_\star\|^2.\]The detailed approach using PEPs is available in [2, Theorem 2.3].
References:
- Parameters
t (float) – time step
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_gradient_flow_convex(t=2.5, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 3x3
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
function 1 : Adding 2 scalar constraint(s) ...
function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 1.910532459863401e-18
*** Example file: worst-case performance of the gradient flow ***
PEPit guarantee: d/dt V(X_t) <= 1.91053e-18
Theoretical guarantee: d/dt V(X_t) <= 0.0
Accelerated gradient flow for strongly convex functions
- PEPit.examples.continuous_time_models.wc_accelerated_gradient_flow_strongly_convex(mu, psd=True, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_{x\in\mathbb{R}^d} f(x),\]where \(f\) is \(\mu\)-strongly convex.
This code computes a worst-case guarantee for an accelerated gradient flow. That is, it computes the smallest possible \(\tau(\mu)\) such that the guarantee
\[\frac{d}{dt}\mathcal{V}_{P}(X_t) \leqslant -\tau(\mu)\mathcal{V}_P(X_t) ,\]is valid with
\[\mathcal{V}_{P}(X_t) = f(X_t) - f(x_\star) + (X_t - x_\star, \frac{d}{dt}X_t)^T(P \otimes I_d)(X_t - x_\star, \frac{d}{dt}X_t) ,\]where \(I_d\) is the identity matrix, \(X_t\) is the output of an accelerated gradient flow, and where \(x_\star\) is the minimizer of \(f\).
In short, for given values of \(\mu\), \(\tau(\mu)\) is computed as the worst-case value of \(\frac{d}{dt}\mathcal{V}_P(X_t)\) when \(\mathcal{V}_P(X_t) \leqslant 1\).
Algorithm: For \(t \geqslant 0\),
\[\frac{d^2}{dt^2}X_t + 2\sqrt{\mu}\frac{d}{dt}X_t + \nabla f(X_t) = 0,\]with some initialization \(X_{0}\triangleq x_0\).
Theoretical guarantee:
The following tight guarantee, for \(P = \frac{1}{2}\begin{pmatrix} \mu & \sqrt{\mu} \\ \sqrt{\mu} & 1\end{pmatrix}\) (for which \(\mathcal{V}_{P} \geqslant 0\)), can be found in [1, Appendix B] and [2, Theorem 4.3]:
\[\frac{d}{dt}\mathcal{V}_P(X_t) \leqslant -\sqrt{\mu}\mathcal{V}_P(X_t).\]For \(P = \begin{pmatrix} \frac{4}{9}\mu & \frac{4}{3}\sqrt{\mu} \\ \frac{4}{3}\sqrt{\mu} & \frac{1}{2}\end{pmatrix}\), for which \(\mathcal{V}_{P}(X_t) \geqslant 0\) along the trajectory, the following tight guarantee can be found in [3, Corollary 2.5],
\[\frac{d}{dt}\mathcal{V}_P(X_t) \leqslant -\frac{4}{3}\sqrt{\mu}\mathcal{V}_P(X_t).\]References:
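Side note on the first choice above: for \(P = \frac{1}{2}\begin{pmatrix} \mu & \sqrt{\mu} \\ \sqrt{\mu} & 1\end{pmatrix}\), the quadratic form factorizes, so the Lyapunov function takes the familiar form
\[\mathcal{V}_{P}(X_t) = f(X_t) - f(x_\star) + \frac{1}{2}\left\|\sqrt{\mu}\,(X_t - x_\star) + \frac{d}{dt}X_t\right\|^2.\]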
- Parameters
mu (float) – the strong convexity parameter
psd (boolean) – option for positivity of \(P\) in the Lyapunov function \(\mathcal{V}_{P}\)
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_flow_strongly_convex(mu=0.1, psd=True, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
function 1 : Adding 2 scalar constraint(s) ...
function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: -0.31622776602929215
*** Example file: worst-case performance of an accelerated gradient flow ***
PEPit guarantee: d/dt V(X_t,t) <= -0.316228 V(X_t,t)
Theoretical guarantee: d/dt V(X_t) <= -0.316228 V(X_t,t)
Accelerated gradient flow for convex functions
- PEPit.examples.continuous_time_models.wc_accelerated_gradient_flow_convex(t, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is convex.
This code computes a worst-case guarantee for an accelerated gradient flow. That is, it verifies that the inequality
\[\frac{d}{dt}\mathcal{V}(X_t, t) \leqslant 0 ,\]is valid, where \(\mathcal{V}(X_t, t) = t^2(f(X_t) - f(x_\star)) + 2 \|(X_t - x_\star) + \frac{t}{2}\frac{d}{dt}X_t \|^2\), \(X_t\) is the output of an accelerated gradient flow, and where \(x_\star\) is the minimizer of \(f\).
In short, for given values of \(t\), it verifies \(\frac{d}{dt}\mathcal{V}(X_t, t) \leqslant 0\).
Algorithm: For \(t \geqslant 0\),
\[\frac{d^2}{dt^2}X_t + \frac{3}{t}\frac{d}{dt}X_t + \nabla f(X_t) = 0,\]with some initialization \(X_{0}\triangleq x_0\).
Theoretical guarantee:
The following tight guarantee can be verified in [1, Section 2]:
\[\frac{d}{dt}\mathcal{V}(X_t, t) \leqslant 0.\]After integrating between \(0\) and \(T\),
\[f(X_T) - f_\star \leqslant \frac{2}{T^2}\|x_0 - x_\star\|^2.\]The detailed approach using PEPs is available in [2, Theorem 2.6].
References:
- Parameters
t (float) – time step
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_accelerated_gradient_flow_convex(t=3.4, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (0 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
function 1 : Adding 2 scalar constraint(s) ...
function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: -1.2008648627755779e-18
*** Example file: worst-case performance of an accelerated gradient flow ***
PEPit guarantee: d/dt V(X_t,t) <= -1.20086e-18
Theoretical guarantee: d/dt V(X_t) <= 0.0
Tutorials
Contraction rate of gradient descent
- PEPit.examples.tutorials.wc_gradient_descent_contraction(L, mu, gamma, n, verbose=1)[source]
Consider the convex minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is \(L\)-smooth and \(\mu\)-strongly convex.
This code computes a worst-case guarantee for gradient descent with fixed step-size \(\gamma\). That is, it computes the smallest possible \(\tau(n, L, \mu, \gamma)\) such that the guarantee
\[\| x_n - y_n \|^2 \leqslant \tau(n, L, \mu, \gamma) \| x_0 - y_0 \|^2\]is valid, where \(x_n\) and \(y_n\) are the outputs of the gradient descent method with fixed step-size \(\gamma\), starting respectively from \(x_0\) and \(y_0\).
In short, for given values of \(n\), \(L\), \(\mu\) and \(\gamma\), \(\tau(n, L, \mu, \gamma)\) is computed as the worst-case value of \(\| x_n - y_n \|^2\) when \(\| x_0 - y_0 \|^2 \leqslant 1\).
Algorithm: For \(t\in\{0,1,\ldots,n-1\}\), gradient descent is described by
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]where \(\gamma\) is a step-size.
Theoretical guarantee: The tight theoretical guarantee is
\[\| x_n - y_n \|^2 \leqslant \max\{(1-L\gamma)^2,(1-\mu \gamma)^2\}^n\| x_0 - y_0 \|^2,\]which is tight on simple quadratic functions.
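A minimal sketch of how this contraction factor is encoded with PEPit (illustrative only, not the packaged example file; parameter values are arbitrary): two trajectories of the same function are run from two different starting points.

from PEPit import PEP
from PEPit.functions import SmoothStronglyConvexFunction

L, mu, gamma, n = 1., .1, 1., 1
problem = PEP()
func = problem.declare_function(SmoothStronglyConvexFunction, mu=mu, L=L)

x = problem.set_initial_point()                   # x_0
y = problem.set_initial_point()                   # y_0
problem.set_initial_condition((x - y) ** 2 <= 1)  # ||x_0 - y_0||^2 <= 1

for _ in range(n):
    x = x - gamma * func.gradient(x)
    y = y - gamma * func.gradient(y)

problem.set_performance_metric((x - y) ** 2)      # worst-case ||x_n - y_n||^2
pepit_tau = problem.solve()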
- Parameters
L (float) – the smoothness parameter.
mu (float) – the strong-convexity parameter.
gamma (float) – step-size.
n (int) – number of iterations.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> L = 1
>>> pepit_tau, theoretical_tau = wc_gradient_descent_contraction(L=L, mu=0.1, gamma=1 / L, n=1, verbose=1)
(PEPit) Setting up the problem: size of the main PSD matrix: 4x4
(PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
(PEPit) Setting up the problem: Adding initial conditions and general constraints ...
(PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
(PEPit) Setting up the problem: interpolation conditions for 1 function(s)
function 1 : Adding 2 scalar constraint(s) ...
function 1 : 2 scalar constraint(s) added
(PEPit) Compiling SDP
(PEPit) Calling SDP solver
(PEPit) Solver status: optimal (solver: SCS); optimal value: 0.8100016613979604
*** Example file: worst-case performance of gradient descent with fixed step-sizes in contraction ***
PEP-it guarantee: ||x_n - y_n||^2 <= 0.810002 ||x_0 - y_0||^2
Theoretical guarantee: ||x_n - y_n||^2 <= 0.81 ||x_0 - y_0||^2
What’s new in PEPit
What’s new in PEPit 0.1.0
Adding general constraints to your problem.
The method add_constraint has been added to the class PEP for general constraints not necessarily related to a specific function. For readability of your code, we suggest using the method set_initial_condition when the constraint is the initial one, and the method add_constraint for any other constraint.
Adding LMI constraints to your problem.
The method add_psd_matrix has been added to the class PEP and must be used to add LMI constraints to your problem.
CVXPY options.
PEPit uses CVXPY to solve the underlying SDP of your problem. CVXPY solver options can be provided to the method PEP.solve.
Optimizing the dimension of the solution.
The tracetrick option of the method PEP.solve has been replaced by dimension_reduction_heuristic. Set to None by default, this option can be set to "trace" or "logdet{followed by a number}" to use one of those heuristics (see the sketch after this list).
Granularity of the verbose mode has evolved.
The verbose modes of the method PEP.solve and of the provided example files are now integers:
0: No verbose at all
1: PEPit information is printed but not CVXPY's
2: Both PEPit and CVXPY details are printed
Parameters of function classes.
The parameters that characterize a function class must be provided directly as arguments of this function class, not through the dict "param" anymore. Example: PEP.declare_function(function_class=SmoothStronglyConvexFunction, mu=.1, L=1.)
Initializing a Point or an Expression to 0.
null_point and null_expression have been added to the module PEPit to facilitate access to a Point or an Expression initialized to 0.
3 new function classes have been added:
ConvexSupportFunction, for convex support functions (see [1])
ConvexQGFunction, for convex and quadratically upper bounded functions (see [2])
RsiEbFunction, for functions verifying the lower restricted secant inequality and upper error bound (see [3])
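To make these conventions concrete, here is a short illustrative sketch (arbitrary toy problem and parameter values, not taken from the release itself): function-class parameters are passed as keyword arguments, verbose is an integer, and dimension_reduction_heuristic replaces the former tracetrick option of PEP.solve.

from PEPit import PEP
from PEPit.functions import SmoothStronglyConvexFunction

problem = PEP()
# Function-class parameters are now passed directly as keyword arguments (no "param" dict).
func = problem.declare_function(SmoothStronglyConvexFunction, mu=.1, L=1.)

xs = func.stationary_point()
x0 = problem.set_initial_point()
problem.set_initial_condition((x0 - xs) ** 2 <= 1)

x1 = x0 - 1. * func.gradient(x0)                 # one gradient step, just to have a metric
problem.set_performance_metric((x1 - xs) ** 2)

# "trace" (or "logdet" followed by a number) asks for a low-dimensional worst-case example.
pepit_tau = problem.solve(verbose=1, dimension_reduction_heuristic="trace")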
What’s new in PEPit 0.2.0
Adding the possibility to set LMI constraints associated to function objects.
The method add_psd_matrix has been added to the class Function and must be used to add LMI constraints associated to a function.
Storing dual values prior to dimension reduction.
Each Constraint object receives a dual value in the attribute _dual_value, which can be accessed through the method eval_dual. In previous releases, when dimension reduction was activated, the stored dual values were those of the latest solved problem. From this release on, the stored dual values are always those of the original problem. Note that the primal values are those of the last solved problem, which provides an adversarial example of the smallest possible dimension.
Creating a PSDMatrix class.
A PSDMatrix class has been added. This does not affect how the method add_psd_matrix must be used: a user must continue providing a PSD matrix under the form of an Iterable of Expressions. The latter is automatically transformed into a PSDMatrix object that contains a _dual_value attribute and an eval_dual method, like any Constraint object.
Fixing a minor issue in pep.py.
There was an issue when the Gram matrix G did not need any eigenvalue correction, as eig_threshold in pep.get_nb_eigenvalues_and_corrected_matrix was defined as the maximum of an empty list. This issue has been fixed in this release.
Eigenvalues are now sorted in decreasing order in the output of the PEP, making it easier to plot low-dimensional worst-case examples (examples of such usages can be found in the exercise repository Learning-Performance-Estimation).
Many new examples were introduced, including examples looking for low-dimensional worst-case examples, and examples on fixed-point iterations, variational inequalities, and continuous-time dynamics.
Contributing
PEPit is designed to let users easily contribute new features to the package. Classes of functions (or operators) as well as black-box oracles can be implemented by following the canvas from PEPit/functions/ (or PEPit/operators/) and PEPit/primitive_steps/, respectively.
We encourage authors of research papers presenting novel optimization methods and/or novel convergence results to submit the corresponding PEPit files in the directory PEPit/examples/.
General guidelines
We kindly ask you to follow common guidelines, namely that the provided code:
sticks as much as possible to the PEP8 convention.
is commented with Google-style docstrings.
is well covered by tests.
is aligned with the documentation.
is also mentioned in the whatsnew section of the documentation.
Adding a new function or operator class
To add a new function / operator class, please follow the format used for the other function / operator classes.
In particular:
your class must inherit from the class Function and overwrite its add_class_constraints method.
the docstring must be complete. In particular, it must contain the list of attributes and arguments, as well as an example of usage via the declare_function method of the class PEP. It must also contain a clickable reference to the paper introducing the class.
A rough skeleton of such a class is sketched below.
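The skeleton below is only a hedged sketch mirroring the pattern of the existing classes (such as ConvexFunction); the attribute list_of_points and the constraint-registration call self.add_constraint follow those implementations and should be double-checked against the class you start from.

from PEPit.function import Function


class MyFunctionClass(Function):
    """<Google-style docstring: attributes, args, usage via PEP.declare_function, clickable reference.>"""

    def __init__(self, my_parameter, is_leaf=True, decomposition_dict=None, reuse_gradient=False):
        super().__init__(is_leaf=is_leaf, decomposition_dict=decomposition_dict, reuse_gradient=reuse_gradient)
        self.my_parameter = my_parameter

    def add_class_constraints(self):
        # Interpolation conditions between all pairs of sampled points
        # (plain convexity is used here as a placeholder inequality).
        for i, (xi, gi, fi) in enumerate(self.list_of_points):
            for j, (xj, gj, fj) in enumerate(self.list_of_points):
                if i != j:
                    self.add_constraint(fi - fj >= gj * (xi - xj))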
Adding a step / an oracle
To add a new oracle / step, please add a new file containing the oracle function in PEPit/primitive_steps.
Note that transforming the mathematical description of an oracle into its PEP equivalent may require additional tricks; see e.g. PEPit/primitive_steps/proximal_step.py or PEPit/primitive_steps/linear_optimization_step.py.
Please make sure that your docstring contains the mathematical derivation of the latter from the former. A hedged sketch of the usual pattern is given below.
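As an illustration of the usual pattern (a sketch in the spirit of proximal_step, assuming Point, Expression and Function.add_point behave as in the existing primitive steps), the output point is defined implicitly through a fresh (sub)gradient and function value:

from PEPit import Point, Expression


def my_proximal_like_step(x0, f, gamma):
    """Return a point x satisfying x = x0 - gamma * g, with g a subgradient of f at x."""
    gx = Point()                  # new leaf point standing for a subgradient of f at x
    fx = Expression()             # new leaf expression standing for f(x)
    x = x0 - gamma * gx           # implicit definition of the output point
    f.add_point((x, gx, fx))      # register the triplet so the interpolation constraints cover it
    return x, gx, fx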
Adding a new method as an example
We don’t require a specific code format for a new example. However, we ask the associated docstring to be organized precisely as follows:
Define the problem solved (introducing function notations and assumptions).
Name the method in boldface formatting.
Introduce the performance metric, initial condition and parameters (performance_metric < tau(parameters) initialization).
Describe the method's main step and cite the reference with the specified algorithm.
Provide the theoretical result (Upper/Lower/Tight in boldface formatting + performance_metric < theoretical_bound initialization).
Reference block containing relevant clickable references (preferably to arXiv, with the specified version of the paper) in the format: First name initial, last name (YEAR). Title. Journal or conference (acronym of the journal or conference).
Args block containing the parameters with their type and a short description.
Returns block containing pepit_tau and theoretical_tau.
Example block containing a minimal working example of the coded function.
We provide, in PEPit/examples/example_template.py, a template that can be filled very quickly to help contributors share their methods easily.
New example template
- PEPit.examples.example_template.wc_example_template(arg1, arg2, arg3, verbose=1)[source]
Consider the CHARACTERISTIC (e.g., convex) minimization problem
\[f_\star \triangleq \min_x f(x),\]where \(f\) is CLASS (e.g., smooth convex).
This code computes a worst-case guarantee for the **NAME OF THE METHOD**. That is, it computes the smallest possible \(\tau(arg_1, arg_2, arg_3)\) such that the guarantee
\[\text{PERFORMANCE METRIC} \leqslant \tau(arg_1, arg_2, arg_3) \text{ INITIALIZATION}\]is valid, where NOTATION OF THE OUTPUT is the output of the **NAME OF THE METHOD**, and where \(x_\star\) is the minimizer of \(f\). In short, for given values of ARGUMENTS, \(\tau(arg_1, arg_2, arg_3)\) is computed as the worst-case value of \(\text{PERFORMANCE METRIC}\) when \(\text{INITIALIZATION} \leqslant 1\).
Algorithm: The NAME OF THE METHOD of this example is provided in REFERENCE WITH SPECIFIED ALGORITHM by
\begin{eqnarray} \text{MAIN STEP} \end{eqnarray}
Theoretical guarantee: A TIGHT, UPPER OR LOWER guarantee can be found in REFERENCE WITH SPECIFIED THEOREM:
\[\text{PERFORMANCE METRIC} \leqslant \text{THEORETICAL BOUND} \text{ INITIALIZATION}\]References:
- Parameters
arg1 (type1) – description of arg1.
arg2 (type2) – description of arg2.
arg3 (type3) – description of arg3.
verbose (int) –
Level of information details to print.
-1: No verbose at all.
0: This example’s output.
1: This example’s output + PEPit information.
2: This example’s output + PEPit information + CVXPY details.
- Returns
pepit_tau (float) – worst-case value
theoretical_tau (float) – theoretical value
Example
>>> pepit_tau, theoretical_tau = wc_example_template(arg1=value1, arg2=value2, arg3=value3, verbose=1)
OUTPUT MESSAGE
New example test template
def test_[NAME_METHOD](self):
    PARAMS = PARAMS
    wc, theory = wc_[NAME_METHOD](PARAMS=PARAMS, verbose=self.verbose)
    # If theoretical upper bound is tight
    self.assertAlmostEqual(theory, wc, delta=self.relative_precision * theory)
    # If theoretical upper bound is not tight
    self.assertLessEqual(wc, theory * (1 + self.relative_precision))
    # If theoretical lower bound is not tight
    self.assertLessEqual(theory, wc * (1 + self.relative_precision))
PEPit: Performance Estimation in Python
This open-source Python library provides a generic way to use the PEP framework in Python. Performance estimation problems were introduced in 2014 by Yoel Drori and Marc Teboulle, see [1]. PEPit is mainly based on the formalism and developments from [2, 3] by a subset of the authors of this toolbox. A friendly informal introduction to this formalism is available in this blog post, and a corresponding Matlab library is presented in [4] (PESTO).
Website and documentation of PEPit: https://pepit.readthedocs.io/
Source Code (MIT): https://github.com/PerformanceEstimation/PEPit
Using and citing the toolbox
This code comes jointly with the following reference:
B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, A. Dieuleveut (2022).
"PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python."
When using the toolbox in a project, please refer to this note via the following BibTeX entry:
@article{pepit2022,
title={{PEPit}: computer-assisted worst-case analyses of first-order optimization methods in {P}ython},
author={Goujaud, Baptiste and Moucer, C\'eline and Glineur, Fran\c{c}ois and Hendrickx, Julien and Taylor, Adrien and Dieuleveut, Aymeric},
journal={arXiv preprint arXiv:2201.04040},
year={2022}
}
Demo
This notebook provides a demonstration of how to use PEPit to obtain a worst-case guarantee on a simple algorithm (gradient descent), and a more advanced analysis of three other examples.
Installation
The library has been tested on Linux and MacOSX. It relies on the following Python modules:
Numpy
Scipy
Cvxpy
Matplotlib (for the demo notebook)
Pip installation
You can install the toolbox through PyPI with:
pip install pepit
or get the very latest version by running:
pip install -U https://github.com/PerformanceEstimation/PEPit/archive/master.zip # with --user for user install (no root)
Post installation check
After a correct installation, you should be able to import the module without errors:
import PEPit
Online environment
Example
The folder Examples contains numerous introductory examples to the toolbox.
Among the other examples, the following code (see GradientMethod) generates a worst-case scenario for n iterations of the gradient method, applied to the minimization of a smooth (possibly strongly) convex function f(x).
More precisely, this code snippet allows computing the worst-case value of f(x_n) - f_* when x_n is generated by gradient descent, and when ||x_0 - x_*||^2 <= 1.
from PEPit import PEP
from PEPit.functions import SmoothStronglyConvexFunction
def wc_gradient_descent(L, gamma, n, verbose=1):
    """
    Consider the convex minimization problem

    .. math:: f_\\star \\triangleq \\min_x f(x),

    where :math:`f` is :math:`L`-smooth and convex.

    This code computes a worst-case guarantee for **gradient descent** with fixed step-size :math:`\\gamma`.
    That is, it computes the smallest possible :math:`\\tau(n, L, \\gamma)` such that the guarantee

    .. math:: f(x_n) - f_\\star \\leqslant \\tau(n, L, \\gamma) \\|x_0 - x_\\star\\|^2

    is valid, where :math:`x_n` is the output of gradient descent with fixed step-size :math:`\\gamma`, and
    where :math:`x_\\star` is a minimizer of :math:`f`.

    In short, for given values of :math:`n`, :math:`L`, and :math:`\\gamma`, :math:`\\tau(n, L, \\gamma)` is computed as the worst-case
    value of :math:`f(x_n)-f_\\star` when :math:`\\|x_0 - x_\\star\\|^2 \\leqslant 1`.

    **Algorithm**:
    Gradient descent is described by

    .. math:: x_{t+1} = x_t - \\gamma \\nabla f(x_t),

    where :math:`\\gamma` is a step-size.

    **Theoretical guarantee**:
    When :math:`\\gamma \\leqslant \\frac{1}{L}`, the **tight** theoretical guarantee can be found in [1, Theorem 3.1]:

    .. math:: f(x_n)-f_\\star \\leqslant \\frac{L}{4nL\\gamma+2} \\|x_0-x_\\star\\|^2,

    which is tight on some Huber loss functions.

    **References**:

    `[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel
    approach. Mathematical Programming 145(1–2), 451–482.
    <https://arxiv.org/pdf/1206.3209.pdf>`_

    Args:
        L (float): the smoothness parameter.
        gamma (float): step-size.
        n (int): number of iterations.
        verbose (int): Level of information details to print.

                        - -1: No verbose at all.
                        - 0: This example's output.
                        - 1: This example's output + PEPit information.
                        - 2: This example's output + PEPit information + CVXPY details.

    Returns:
        pepit_tau (float): worst-case value
        theoretical_tau (float): theoretical value

    Example:
        >>> L = 3
        >>> pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=1)
        (PEPit) Setting up the problem: size of the main PSD matrix: 7x7
        (PEPit) Setting up the problem: performance measure is minimum of 1 element(s)
        (PEPit) Setting up the problem: Adding initial conditions and general constraints ...
        (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
        (PEPit) Setting up the problem: interpolation conditions for 1 function(s)
        function 1 : Adding 30 scalar constraint(s) ...
        function 1 : 30 scalar constraint(s) added
        (PEPit) Compiling SDP
        (PEPit) Calling SDP solver
        (PEPit) Solver status: optimal (solver: SCS); optimal value: 0.16666664596175398
        *** Example file: worst-case performance of gradient descent with fixed step-sizes ***
        PEPit guarantee: f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2
        Theoretical guarantee: f(x_n)-f_* <= 0.166667 ||x_0 - x_*||^2

    """
    # Instantiate PEP
    problem = PEP()

    # Declare a strongly convex smooth function
    func = problem.declare_function(SmoothStronglyConvexFunction, mu=0, L=L)

    # Start by defining its unique optimal point xs = x_* and corresponding function value fs = f_*
    xs = func.stationary_point()
    fs = func(xs)

    # Then define the starting point x0 of the algorithm
    x0 = problem.set_initial_point()

    # Set the initial constraint that is the distance between x0 and x^*
    problem.set_initial_condition((x0 - xs) ** 2 <= 1)

    # Run n steps of the GD method
    x = x0
    for _ in range(n):
        x = x - gamma * func.gradient(x)

    # Set the performance metric to the function values accuracy
    problem.set_performance_metric(func(x) - fs)

    # Solve the PEP
    pepit_verbose = max(verbose, 0)
    pepit_tau = problem.solve(verbose=pepit_verbose)

    # Compute theoretical guarantee (for comparison)
    theoretical_tau = L / (2 * (2 * n * L * gamma + 1))

    # Print conclusion if required
    if verbose != -1:
        print('*** Example file: worst-case performance of gradient descent with fixed step-sizes ***')
        print('\tPEPit guarantee:\t f(x_n)-f_* <= {:.6} ||x_0 - x_*||^2'.format(pepit_tau))
        print('\tTheoretical guarantee:\t f(x_n)-f_* <= {:.6} ||x_0 - x_*||^2'.format(theoretical_tau))

    # Return the worst-case guarantee of the evaluated method (and the reference theoretical value)
    return pepit_tau, theoretical_tau


if __name__ == "__main__":
    L = 3
    pepit_tau, theoretical_tau = wc_gradient_descent(L=L, gamma=1 / L, n=4, verbose=1)
Included tools
A lot of common optimization methods can be studied through this framework, using numerous steps and under a large variety of function / operator classes.
PEPit provides the following steps (often referred to as “oracles”):
PEPit provides the following function classes:
PEPit provides the following operator classes:
Contributions
All external contributions are welcome. Please read the contribution guidelines.
References
[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming 145(1–2), 451–482.
[2] A. Taylor, J. Hendrickx, F. Glineur (2017). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1-2), 307-345.
[3] A. Taylor, J. Hendrickx, F. Glineur (2017). Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313.
[4] A. Taylor, J. Hendrickx, F. Glineur (2017). Performance Estimation Toolbox (PESTO): automated worst-case analysis of first-order optimization methods. In 56th IEEE Conference on Decision and Control (CDC).
[5] A. d’Aspremont, D. Scieur, A. Taylor (2021). Acceleration Methods. Foundations and Trends in Optimization: Vol. 5, No. 1-2.
[6] O. Güler (1992). New proximal point algorithms for convex minimization. SIAM Journal on Optimization, 2(4):649–664.
[7] Y. Drori (2017). The exact information-based complexity of smooth convex minimization. Journal of Complexity, 39, 1-16.
[8] E. De Klerk, F. Glineur, A. Taylor (2017). On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optimization Letters, 11(7), 1185-1199.
[9] B.T. Polyak (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics.
[10] E. Ghadimi, H. R. Feyzmahdavian, M. Johansson (2015). Global convergence of the Heavy-ball method for convex optimization. European Control Conference (ECC).
[11] E. De Klerk, F. Glineur, A. Taylor (2020). Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM Journal on Optimization, 30(3), 2053-2082.
[12] O. Gannot (2021). A frequency-domain analysis of inexact gradient methods. Mathematical Programming.
[13] D. Kim, J. Fessler (2016). Optimized first-order methods for smooth convex minimization. Mathematical Programming 159.1-2: 81-107.
[14] S. Cyrus, B. Hu, B. Van Scoy, L. Lessard (2018). A robust accelerated optimization algorithm for strongly convex functions. American Control Conference (ACC).
[15] Y. Nesterov (2003). Introductory lectures on convex optimization: A basic course. Springer Science & Business Media.
[16] S. Boyd, L. Xiao, A. Mutapcic (2003). Subgradient Methods (lecture notes).
[17] Y. Drori, M. Teboulle (2016). An optimal variant of Kelley’s cutting-plane method. Mathematical Programming, 160(1), 321-351.
[18] Van Scoy, B., Freeman, R. A., Lynch, K. M. (2018). The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Letters, 2(1), 49-54.
[19] P. Patrinos, L. Stella, A. Bemporad (2014). Douglas-Rachford splitting: Complexity estimates and accelerated variants. In 53rd IEEE Conference on Decision and Control (CDC).
[20] Y. Censor, S.A. Zenios (1992). Proximal minimization algorithm with D-functions. Journal of Optimization Theory and Applications, 73(3), 451-464.
[21] E. Ryu, S. Boyd (2016). A primer on monotone operator methods. Applied and Computational Mathematics 15(1), 3-43.
[22] E. Ryu, A. Taylor, C. Bergeling, P. Giselsson (2020). Operator splitting performance estimation: Tight contraction factors and optimal parameter selection. SIAM Journal on Optimization, 30(3), 2251-2271.
[23] P. Giselsson, and S. Boyd (2016). Linear convergence and metric selection in Douglas-Rachford splitting and ADMM. IEEE Transactions on Automatic Control, 62(2), 532-544.
[24] M. Frank, P. Wolfe (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly, 3(1-2), 95-110.
[25] M. Jaggi (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In 30th International Conference on Machine Learning (ICML).
[26] A. Auslender, M. Teboulle (2006). Interior gradient and proximal methods for convex and conic optimization. SIAM Journal on Optimization 16.3 (2006): 697-725.
[27] H.H. Bauschke, J. Bolte, M. Teboulle (2017). A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications. Mathematics of Operations Research, 2017, vol. 42, no 2, p. 330-348
[28] R. Dragomir, A. Taylor, A. d’Aspremont, J. Bolte (2021). Optimal complexity and certification of Bregman first-order methods. Mathematical Programming, 1-43.
[29] A. Taylor, J. Hendrickx, F. Glineur (2018). Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 178(2), 455-476.
[30] B. Polyak (1987). Introduction to Optimization. Optimization Software New York.
[31] L. Lessard, B. Recht, A. Packard (2016). Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization 26(1), 57–95.
[32] D. Davis, W. Yin (2017). A three-operator splitting scheme and its optimization applications. Set-valued and variational analysis, 25(4), 829-858.
[33] Taylor, A. B. (2017). Convex interpolation and performance estimation of first-order methods for convex optimization. PhD Thesis, UCLouvain.
[34] H. Abbaszadehpeivasti, E. de Klerk, M. Zamani (2021). The exact worst-case convergence rate of the gradient method with fixed step lengths for L-smooth functions. arXiv 2104.05468.
[35] J. Bolte, S. Sabach, M. Teboulle, Y. Vaisbourd (2018). First order methods beyond convexity and Lipschitz gradient continuity with applications to quadratic inverse problems. SIAM Journal on Optimization, 28(3), 2131-2151.
[36] A. Defazio (2016). A simple practical accelerated method for finite sums. Advances in Neural Information Processing Systems (NIPS), 29, 676-684.
[37] A. Defazio, F. Bach, S. Lacoste-Julien (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Advances in Neural Information Processing Systems (NIPS).
[38] B. Hu, P. Seiler, L. Lessard (2020). Analysis of biased stochastic gradient descent using sequential semidefinite programs. Mathematical programming (to appear).
[39] A. Taylor, F. Bach (2019). Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions. Conference on Learning Theory (COLT).
[40] D. Kim (2021). Accelerated proximal point method for maximally monotone operators. Mathematical Programming, 1-31.
[41] W. Moursi, L. Vandenberghe (2019). Douglas–Rachford Splitting for the Sum of a Lipschitz Continuous and a Strongly Monotone Operator. Journal of Optimization Theory and Applications 183, 179–198.
[42] G. Gu, J. Yang (2020). Tight sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problem. SIAM Journal on Optimization, 30(3), 1905-1921.
[43] F. Lieder (2021). On the convergence rate of the Halpern-iteration. Optimization Letters, 15(2), 405-418.
[44] F. Lieder (2018). Projection Based Methods for Conic Linear Programming Optimal First Order Complexities and Norm Constrained Quasi Newton Methods. PhD thesis, HHU Düsseldorf.
[45] Y. Nesterov (1983). A method for solving the convex programming problem with convergence rate O(1/k^2). In Dokl. Akad. Nauk SSSR (Vol. 269, pp. 543-547).
[46] N. Bansal, A. Gupta (2019). Potential-function proofs for gradient methods. Theory of Computing, 15(1), 1-32.
[47] M. Barre, A. Taylor, F. Bach (2021). A note on approximate accelerated forward-backward methods with absolute and relative errors, and possibly strongly convex objectives. arXiv:2106.15536v2.
[48] J. Eckstein and W. Yao (2018). Relative-error approximate versions of Douglas–Rachford splitting and special cases of the ADMM. Mathematical Programming, 170(2), 417-444.
[49] M. Barré, A. Taylor, A. d’Aspremont (2020). Complexity guarantees for Polyak steps with momentum. In Conference on Learning Theory (COLT).
[50] D. Kim, J. Fessler (2017). On the convergence analysis of the optimized gradient method. Journal of Optimization Theory and Applications, 172(1), 187-205.
[51] Steven Diamond and Stephen Boyd (2016). CVXPY: A Python-embedded modeling language for convex optimization. Journal of Machine Learning Research (JMLR) 17.83.1–5 (2016).
[52] Agrawal, Akshay and Verschueren, Robin and Diamond, Steven and Boyd, Stephen (2018). A rewriting system for convex optimization problems. Journal of Control and Decision (JCD) 5.1.42–60 (2018).
[53] Adrien Taylor, Bryan Van Scoy, Laurent Lessard (2018). Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees. International Conference on Machine Learning (ICML).
[54] C. Guille-Escuret, B. Goujaud, A. Ibrahim, I. Mitliagkas (2022). Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound.
[55] B. Goujaud, A. Taylor, A. Dieuleveut (2022). Optimal first-order methods for convex functions with a quadratic upper bound.
[56] B. Goujaud, C. Moucer, F. Glineur, J. Hendrickx, A. Taylor, A. Dieuleveut (2022). PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python.