RSQP: Problem-Specific Architectural Customization for Accelerated Convex Quadratic Optimization

M. Wang, I. McInerney, B. Stellato, S. Boyd, and H. So

Proceedings of the 50th Annual International Symposium on Computer Architecture, article 73, pages 1–12, June 2023.

Convex optimization is at the heart of many performance-critical applications across a wide range of domains. Although many high-performance hardware accelerators have been developed for specific optimization problems, designing such an accelerator is challenging, and the resulting architecture is often so specific to the targeted application that it can hardly be reused, even in a related application within the same domain. To accelerate general-purpose optimization solvers that must operate on diverse user input at run time, an ideal hardware solver should adapt dynamically to the given optimization problem while achieving high performance and power efficiency. This work presents RSQP, a hardware-accelerated general-purpose quadratic program solver whose reconfigurable functional units and datapath enable problem-specific customization. RSQP uses a string-based encoding to describe the problem structure at fine granularity. Based on this encoding, functional units and a datapath customized to the sparsity pattern of the problem are created by solving a dictionary-based lossless string compression problem and a mixed-integer linear program, respectively. RSQP has been integrated with the general-purpose quadratic programming solver OSQP to accelerate it, and has been evaluated on an extensive benchmark of 120 optimization problems from 6 application domains. Through architectural customization, RSQP achieves up to 7x performance improvement over its baseline generic design. Furthermore, compared with a CPU implementation and a GPU-accelerated implementation, RSQP achieves up to 31.2x and 6.9x end-to-end speedup on these benchmark problems, respectively. Finally, the FPGA accelerator has up to 6.6x lower dynamic power consumption and up to 22.7x higher power efficiency than the GPU implementation, making it an attractive solution for power-conscious datacenter applications.
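The abstract does not spell out the string encoding or the compression formulation, so the following is only a toy sketch of the general idea, not RSQP's actual method. It assumes, hypothetically, that each row of a sparse matrix in CSR form is mapped to one character given by its nonzero count, and that "dictionary-based lossless compression" means covering that string with repeated patterns (each pattern being something one customized functional unit could serve). The function names `encode_row_nnz` and `greedy_parse`, the one-character-per-row scheme, and the greedy longest-match parser are all illustrative assumptions.

```python
from typing import List, Sequence


def encode_row_nnz(indptr: Sequence[int], max_symbol: int = 9) -> str:
    """Hypothetical encoding: one character per CSR row, equal to the row's
    nonzero count, capped at max_symbol so every symbol is a single digit."""
    assert 0 <= max_symbol <= 9, "single-digit symbols only in this toy scheme"
    return "".join(
        str(min(indptr[i + 1] - indptr[i], max_symbol))
        for i in range(len(indptr) - 1)
    )


def greedy_parse(s: str, dictionary: Sequence[str]) -> List[str]:
    """Split s into dictionary words by greedy longest match.

    Any position not covered by a dictionary word falls back to a
    single-character token, so parsing never fails and the split is
    lossless: "".join(tokens) == s.
    """
    # Try longer words first; two distinct words of equal length can never
    # both match at the same position, so the result is deterministic.
    words = sorted(set(dictionary), key=len, reverse=True)
    tokens: List[str] = []
    i = 0
    while i < len(s):
        for w in words:
            if w and s.startswith(w, i):
                tokens.append(w)
                i += len(w)
                break
        else:
            tokens.append(s[i])  # no dictionary word matches here
            i += 1
    return tokens


if __name__ == "__main__":
    # Row nonzero counts 2, 2, 1, 2, 2, 1 -> "221221"
    pattern = encode_row_nnz([0, 2, 4, 5, 7, 9, 10])
    print(pattern)
    print(greedy_parse(pattern, ["221"]))  # two tokens instead of six
```

In this toy setting, a good dictionary is one that covers the string with few tokens, which is a rough proxy for reusing a small set of customized units across many rows; how RSQP actually chooses its dictionary and maps it to hardware is described in the paper itself.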