Source code for PEPit.examples.stochastic_and_randomized_convex_minimization.saga

import numpy as np

from PEPit import PEP
from PEPit.functions import ConvexFunction
from PEPit.functions import SmoothStronglyConvexFunction
from PEPit.primitive_steps import proximal_step


def wc_saga(L, mu, n, wrapper="cvxpy", solver=None, verbose=1):
    """
    Consider the finite sum convex minimization problem

    .. math:: F_\\star \\triangleq \\min_x \\left\\{F(x) \\equiv h(x) + \\frac{1}{n} \\sum_{i=1}^{n} f_i(x)\\right\\},

    where the functions :math:`f_i` are assumed to be :math:`L`-smooth and :math:`\\mu`-strongly convex,
    and :math:`h` is closed, proper, and convex with a proximal operator readily available.
    In the sequel, we use the notation :math:`\\mathbb{E}` for denoting the expectation over the uniform
    distribution of the index :math:`i \\sim \\mathcal{U}\\left([|1, n|]\\right)`, e.g.,
    :math:`F(x) \\equiv h(x) + \\mathbb{E}[f_i(x)]`.

    This code computes the exact rate for a Lyapunov (or energy) function for **SAGA** [1]. That is,
    it computes the smallest possible :math:`\\tau(n, L, \\mu)` such that this Lyapunov function
    decreases geometrically

    .. math:: \\mathbb{E}[V^{(1)}] \\leqslant \\tau(n, L, \\mu) V^{(0)},

    where the value of the Lyapunov function at iteration :math:`t` is denoted by :math:`V^{(t)}`
    and is defined as

    .. math:: V^{(t)} \\triangleq \\frac{1}{n} \\sum_{i=1}^n \\left(f_i(\\phi_i^{(t)}) - f_i(x^\\star) - \\langle \\nabla f_i(x^\\star); \\phi_i^{(t)} - x^\\star\\rangle\\right) + \\frac{1}{2 n \\gamma (1 - \\mu \\gamma)} \\|x^{(t)} - x^\\star\\|^2,

    with :math:`\\gamma = \\frac{1}{2(\\mu n + L)}` (this Lyapunov function was proposed in [1, Theorem 1]).
    We consider the case :math:`t = 0` in the code below, without loss of generality.

    In short, for given values of :math:`n`, :math:`L`, and :math:`\\mu`, :math:`\\tau(n, L, \\mu)` is
    computed as the worst-case value of :math:`\\mathbb{E}[V^{(1)}]` when :math:`V^{(0)} \\leqslant 1`.

    **Algorithm**: One iteration of SAGA [1] is described as follows: at iteration :math:`t`, pick
    :math:`j \\in \\{1, \\ldots, n\\}` uniformly at random and set:

    .. math::
        :nowrap:

        \\begin{eqnarray}
            \\phi_j^{(t+1)} & = & x^{(t)} \\\\
            w^{(t+1)} & = & x^{(t)} - \\gamma \\left[ \\nabla f_j(\\phi_j^{(t+1)}) - \\nabla f_j(\\phi_j^{(t)}) + \\frac{1}{n} \\sum_{i=1}^n \\nabla f_i(\\phi_i^{(t)})\\right] \\\\
            x^{(t+1)} & = & \\mathrm{prox}_{\\gamma h}(w^{(t+1)}) \\triangleq \\arg\\min_x \\left\\{ \\gamma h(x) + \\frac{1}{2}\\|x - w^{(t+1)}\\|^2\\right\\}
        \\end{eqnarray}

    **Theoretical guarantee**: The following **upper** bound (empirically tight) can be found in
    [1, Theorem 1]:

    .. math:: \\mathbb{E}[V^{(t+1)}] \\leqslant \\left(1 - \\gamma\\mu\\right) V^{(t)}.

    **References**:

    `[1] A. Defazio, F. Bach, S. Lacoste-Julien (2014). SAGA: A fast incremental gradient method
    with support for non-strongly convex composite objectives. In Advances in Neural Information
    Processing Systems (NIPS).
    <http://papers.nips.cc/paper/2014/file/ede7e2b6d13a41ddf9f4bdef84fdc737-Paper.pdf>`_

    Args:
        L (float): the smoothness parameter.
        mu (float): the strong convexity parameter.
        n (int): number of functions.
        wrapper (str): the name of the wrapper to be used.
        solver (str): the name of the solver the wrapper should use.
        verbose (int): level of information details to print.

                        - -1: No verbose at all.
                        - 0: This example's output.
                        - 1: This example's output + PEPit information.
                        - 2: This example's output + PEPit information + solver details.
    Returns:
        pepit_tau (float): worst-case value
        theoretical_tau (float): theoretical value

    Example:
        >>> pepit_tau, theoretical_tau = wc_saga(L=1, mu=.1, n=5, wrapper="cvxpy", solver=None, verbose=1)
        (PEPit) Setting up the problem: size of the Gram matrix: 27x27
        (PEPit) Setting up the problem: performance measure is the minimum of 1 element(s)
        (PEPit) Setting up the problem: Adding initial conditions and general constraints ...
        (PEPit) Setting up the problem: initial conditions and general constraints (1 constraint(s) added)
        (PEPit) Setting up the problem: interpolation conditions for 6 function(s)

        		Function 1 : Adding 30 scalar constraint(s) ...
        		Function 1 : 30 scalar constraint(s) added
        		Function 2 : Adding 6 scalar constraint(s) ...
        		Function 2 : 6 scalar constraint(s) added
        		Function 3 : Adding 6 scalar constraint(s) ...
        		Function 3 : 6 scalar constraint(s) added
        		Function 4 : Adding 6 scalar constraint(s) ...
        		Function 4 : 6 scalar constraint(s) added
        		Function 5 : Adding 6 scalar constraint(s) ...
        		Function 5 : 6 scalar constraint(s) added
        		Function 6 : Adding 6 scalar constraint(s) ...
        		Function 6 : 6 scalar constraint(s) added
        (PEPit) Setting up the problem: additional constraints for 0 function(s)
        (PEPit) Compiling SDP
        (PEPit) Calling SDP solver
        (PEPit) Solver status: optimal (wrapper:cvxpy, solver: MOSEK); optimal value: 0.9666666451656595
        (PEPit) Primal feasibility check:
        		The solver found a Gram matrix that is positive semi-definite up to an error of 8.407941346480772e-09
        		All the primal scalar constraints are verified up to an error of 1.4346716234068732e-07
        (PEPit) Dual feasibility check:
        		The solver found a residual matrix that is positive semi-definite
        		All the dual scalar values associated with inequality constraints are nonnegative
        (PEPit) The worst-case guarantee proof is perfectly reconstituted up to an error of 2.824905589324927e-07
        (PEPit) Final upper bound (dual): 0.9666666537075449 and lower bound (primal example): 0.9666666451656595
        (PEPit) Duality gap: absolute: 8.541885421209372e-09 and relative: 8.836433390898197e-09
        *** Example file: worst-case performance of SAGA for Lyapunov function V_t ***
        	PEPit guarantee:	 V^(1) <= 0.966667 V^(0)
        	Theoretical guarantee:	 V^(1) <= 0.966667 V^(0)

    """
    # Instantiate PEP
    problem = PEP()

    # Declare a convex function and n smooth strongly convex ones
    h = problem.declare_function(ConvexFunction)
    fn = [problem.declare_function(SmoothStronglyConvexFunction, L=L, mu=mu, reuse_gradient=True) for _ in range(n)]

    # Define the objective as a linear combination of the former
    func = h + np.mean(fn)

    # Start by defining the unique optimal point xs = x_* of the objective
    xs = func.stationary_point()

    # Then define the initial points: the n reference points phi_i^(0) and the iterate x^(0)
    phi = [problem.set_initial_point() for _ in range(n)]
    x0 = problem.set_initial_point()

    # Compute the initial value of the Lyapunov function, for the step size gamma from [1, Theorem 1]
    gamma = 1 / 2 / (mu * n + L)
    c = 1 / 2 / gamma / (1 - mu * gamma) / n

    # Pre-allocate the lists of gradients and function values (all entries are overwritten below)
    g, f = [x0 for _ in range(n)], [x0 for _ in range(n)]
    g0, f0 = [x0 for _ in range(n)], [x0 for _ in range(n)]
    gs, fs = [x0 for _ in range(n)], [x0 for _ in range(n)]

    init_lyapunov = c * (xs - x0) ** 2
    for i in range(n):
        g[i], f[i] = fn[i].oracle(phi[i])
        gs[i], fs[i] = fn[i].oracle(xs)
        init_lyapunov = init_lyapunov + 1 / n * (f[i] - fs[i] - gs[i] * (phi[i] - xs))

    # Set the initial constraint as the Lyapunov function bounded by 1
    problem.set_initial_condition(init_lyapunov <= 1)

    # Compute the expected value of the Lyapunov function after one iteration
    # (that is, the expectation over the n equiprobable scenarios: one per function f_i that can be sampled)
    final_lyapunov_avg = (xs - xs) ** 2  # initialize the average with a null expression
    for i in range(n):
        # Scenario i: index j = i is drawn, so f_i's reference point is updated to x^(0)
        g0[i], f0[i] = fn[i].oracle(x0)
        w = x0 - gamma * (g0[i] - g[i])
        for j in range(n):
            w = w - gamma / n * g[j]
        x1, _, _ = proximal_step(w, h, gamma)
        final_lyapunov = c * (x1 - xs) ** 2
        for j in range(n):
            if i != j:
                # f_j keeps its previous reference point phi_j^(0)
                final_lyapunov = final_lyapunov + 1 / n * (f[j] - fs[j] - gs[j] * (phi[j] - xs))
            else:
                # f_i's reference point becomes x^(0)
                final_lyapunov = final_lyapunov + 1 / n * (f0[i] - fs[j] - gs[j] * (x0 - xs))
        final_lyapunov_avg = final_lyapunov_avg + final_lyapunov / n

    # Set the performance metric to the expected value of the Lyapunov function after one iteration
    problem.set_performance_metric(final_lyapunov_avg)

    # Solve the PEP
    pepit_verbose = max(verbose, 0)
    pepit_tau = problem.solve(wrapper=wrapper, solver=solver, verbose=pepit_verbose)

    # Compute theoretical guarantee (for comparison): the bound is given in [1, Theorem 1]
    theoretical_tau = (1 - gamma * mu)

    # Print conclusion if required
    if verbose != -1:
        print('*** Example file: worst-case performance of SAGA for Lyapunov function V_t ***')
        print('\tPEPit guarantee:\t V^(1) <= {:.6} V^(0)'.format(pepit_tau))
        print('\tTheoretical guarantee:\t V^(1) <= {:.6} V^(0)'.format(theoretical_tau))

    # Return the worst-case guarantee of the evaluated method (and the reference theoretical value)
    return pepit_tau, theoretical_tau
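
# Illustration (not part of the PEP above): a minimal NumPy sketch of the SAGA update analyzed
# in wc_saga, assuming h = 0 so that prox_{gamma h} is the identity. The function name and its
# arguments are hypothetical; they are only meant to make the update rule concrete.
def saga_step_sketch(x, phi_grads, grad_fj, j, gamma):
    """Perform one SAGA step: update the j-th stored gradient and move the iterate.

    x         : current iterate x^(t), array of shape (d,)
    phi_grads : array of shape (n, d) storing the gradients nabla f_i(phi_i^(t))
    grad_fj   : callable returning nabla f_j at a given point
    j         : index drawn uniformly at random in {0, ..., n-1}
    gamma     : step size, e.g., 1 / (2 * (mu * n + L)) as in [1, Theorem 1]
    """
    new_grad = grad_fj(x)  # nabla f_j(phi_j^(t+1)), with phi_j^(t+1) = x^(t)
    # w^(t+1) = x^(t) - gamma * [new gradient - old stored gradient + average of the old table]
    w = x - gamma * (new_grad - phi_grads[j] + phi_grads.mean(axis=0))
    phi_grads[j] = new_grad  # update the gradient table in place
    return w  # with h = 0, x^(t+1) = prox_{gamma h}(w^(t+1)) = w^(t+1)
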
if __name__ == "__main__":
    pepit_tau, theoretical_tau = wc_saga(L=1, mu=.1, n=5, wrapper="cvxpy", solver=None, verbose=1)
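
# Complementary illustration (again hypothetical, not part of PEPit): numerically evaluating the
# Lyapunov function V^(t) from [1, Theorem 1], given callables returning f_i and nabla f_i, the
# stored points phi_i^(t), the iterate x^(t), and the minimizer x_star.
def lyapunov_value_sketch(x, phis, x_star, f_oracles, g_oracles, gamma, mu, n):
    """Evaluate V^(t) = (1/n) sum_i D_{f_i}(phi_i, x*) + ||x - x*||^2 / (2 n gamma (1 - mu gamma)),
    where D_{f_i} denotes the Bregman divergence of f_i."""
    bregman_avg = sum(
        f_oracles[i](phis[i]) - f_oracles[i](x_star) - g_oracles[i](x_star) @ (phis[i] - x_star)
        for i in range(n)
    ) / n
    return bregman_avg + np.dot(x - x_star, x - x_star) / (2 * n * gamma * (1 - mu * gamma))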