Smoothed Variable Sample-Size Accelerated Proximal Methods for Nonsmooth Stochastic Convex Programs

Afrooz Jalilzadeh, Uday Shanbhag, Jose H. Blanchet and Peter W. Glynn

Stochastic Systems, Volume 12, Number 4 (2022).

JalilzadehShanbhagBlanchetG2022.pdf

We consider the unconstrained minimization of the function F, where F = f + g, f is an expectation-valued nonsmooth convex or strongly convex function, and g is a closed, convex, and proper function. (I) Strongly convex f. When f is μ-strongly convex in x, traditional stochastic subgradient schemes (SSG) often display poor behavior, arising in part from noisy subgradients and diminishing steplengths. Instead, we apply a variable sample-size accelerated proximal scheme (VS-APM) on F, the Moreau envelope of F; we term such a scheme as (mVS-APM) and in contrast with (SSG) schemes, (mVS-APM) utilizes constant steplengths and increasingly exact gradients. We consider two settings. (a) Bounded domains. In this setting, (mVS-APM) displays linear convergence in inexact gradient steps, each of which requires utilizing an inner (prox-SSG) scheme. Specically, (mVS-APM) achieves an optimal oracle complexity in prox-SSG steps of O(1/ϵ) with an iteration complexity of O(log(1/ϵ)) in inexact (outer) gradients of F to achieve an ϵ-accurate solution in mean-squared error, computed via an increasing number of inner (stochastic) subgradient steps; (b) Unbounded domains. In this regime, under an assumption of state-dependent bounds on subgradients, an unaccelerated variant (mVS-APM) is linearly convergent where increasingly exact gradients ∇xF(x) are approximated with increasing accuracy via (SSG) schemes. Notably, (mVS-APM) also displays an optimal oracle complexity of O(1/ϵ); (II) Convex f. When f is merely convex but smoothable, by suitable choices of the smoothing, steplength, and batch-size sequences, smoothed (VS-APM) (or sVS-APM) achieves an optimal oracle complexity of O(1/ϵ2) to obtain an ϵ-optimal solution. Our results can be specialized to two important cases: (a) Smooth f. Since smoothing is no longer required, we observe that (VS-APM) admits the optimal rate and oracle complexity, matching prior ndings; (b) Deterministic nonsmooth f. In the nonsmooth deterministic regime, (sVS-APM) reduces to a smoothed accelerated proximal method (s-APM) that is both asymptotically convergent and optimal in that it displays a complexity of O(1/ϵ), matching the bound provided by Nesterov in 2005 for producing ϵ-optimal solutions. Finally, (sVS-APM) and (VS-APM) produce sequences that converge almost surely to a solution of the original problem.