Inference for Tractable Architectural Analysis
Statistical inference enables fundamentally new capabilities
for capturing relationships within parameter spaces for
microarchitectures [ASPLOS'06] and multiprocessors [HPCA'06],
[MICRO'08]. The
computational efficiency of statistical inference not only
answers prior questions far more quickly but also
answers much larger, previously intractable
questions. The proposed microarchitectural simulation
paradigm defines a comprehensive design space, simulates
sparsely sampled design points, and derives inferential
models to reveal trends. These models are inexpensively
constructed, efficient, and accurate surrogates for
simulators of multi-billion point design spaces.
Moreover, inference is equally applicable to both sides of
the hardware/software interface, whether estimating the
impact of process variations in emerging circuit technologies
(e.g., 3T1D memories [ICCD'09]) or
estimating performance scalability for tunable parallel
applications (e.g., LINPACK, Multigrid
[PPoPP'07]).
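
The sample-then-infer workflow described above can be sketched as follows. This is a minimal illustration, not the models from the cited papers: the parameter names and ranges, the toy_simulator stand-in, and the choice of regression features are all assumptions made for the example. The toy response is deliberately chosen so that the regression can represent it exactly; a real simulator would leave some residual error.

```python
import itertools
import numpy as np

# Hypothetical design space; every parameter name and range is illustrative.
DESIGN_SPACE = {
    "pipeline_depth": [12, 15, 18, 21, 24, 27, 30],
    "issue_width": [2, 4, 8],
    "l2_kb": [256, 512, 1024, 2048, 4096],
}

def toy_simulator(depth, width, l2_kb):
    """Stand-in for a detailed simulator; returns a made-up performance score."""
    return 0.5 * np.log2(width) + 0.3 * np.log2(l2_kb) - 0.002 * (depth - 18) ** 2

def features(points):
    """Map raw parameters to regression features: intercept, depth, depth^2, logs."""
    p = np.asarray(points, dtype=float)
    depth, width, l2_kb = p[:, 0], p[:, 1], p[:, 2]
    return np.column_stack(
        [np.ones(len(p)), depth, depth ** 2, np.log2(width), np.log2(l2_kb)]
    )

def fit_surrogate(sampled_points, responses):
    """Fit regression coefficients to the simulated samples by least squares."""
    coef, *_ = np.linalg.lstsq(features(sampled_points), responses, rcond=None)
    return coef

def predict(coef, points):
    """Predict the response for any design points without running the simulator."""
    return features(points) @ coef

# 1. Define the full design space (7 * 3 * 5 = 105 points in this toy case).
all_points = list(itertools.product(*DESIGN_SPACE.values()))

# 2. Simulate only a sparse random sample of it.
rng = np.random.default_rng(0)
sample_idx = rng.choice(len(all_points), size=20, replace=False)
sample = [all_points[i] for i in sample_idx]
responses = np.array([toy_simulator(*p) for p in sample])

# 3. Fit the surrogate and use it to predict every point in the space.
coef = fit_surrogate(sample, responses)
predictions = predict(coef, all_points)

# Exhaustive simulation is only affordable here because the space is tiny.
truth = np.array([toy_simulator(*p) for p in all_points])
max_error = float(np.max(np.abs(predictions - truth)))
```

In a realistic setting the simulator call dominates the cost, so the benefit comes from replacing all but the sampled simulations with calls to predict.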
Optimization of Performance, Power, Temperature
The computational efficiency of inference closes the divide
between detailed simulation and the best-known practices in classical
optimization, which can now be applied to microarchitectural performance,
power, and temperature. Pareto frontiers and contour maps for
large, comprehensive design spaces are now possible. Iterative
optimization heuristics become tractable when predictive
regression models replace simulation within the iterative loop.
Thus, efficient surrogates for detailed simulation allow
designers to leverage the wealth of literature and history in
classical optimization for qualitatively new studies of
performance-power efficiency [HPCA'08], multiprocessor
heterogeneity
[HPCA'07], and microarchitectural adaptivity [ASPLOS'08].
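
As an illustration of how a Pareto frontier falls out of surrogate predictions, the sketch below filters a list of predicted (delay, power) pairs down to the non-dominated ones. The function name and the sample values are hypothetical; in practice the pairs would come from regression models evaluated over the design space rather than from a hand-written list.

```python
def pareto_frontier(points):
    """Return the (delay, power) pairs not dominated by any other pair.

    Lower is better on both axes. After sorting by delay (ties broken by
    power), a point is on the frontier only if its power is strictly lower
    than that of every point already seen.
    """
    frontier = []
    best_power = float("inf")
    for delay, power in sorted(points):
        if power < best_power:
            frontier.append((delay, power))
            best_power = power
    return frontier

# Made-up predictions for six candidate designs: (delay, power).
predicted = [(1.0, 9.0), (1.5, 6.0), (2.0, 7.0), (2.5, 4.0), (3.0, 4.5), (1.5, 8.0)]
frontier = pareto_frontier(predicted)
# frontier == [(1.0, 9.0), (1.5, 6.0), (2.5, 4.0)]
```

The single sorted pass costs O(n log n), which matters only because a cheap surrogate makes it practical to score very large numbers of design points in the first place.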
Scalability and Robust, Emerging Technologies
For decades, technology scaling boosted performance and
increased density for integrated circuits. However,
shrinking device feature sizes and process variations hinder
reliability and limit performance gains from scaling for
current designs. Emerging circuits and devices that are more
robust to process variations and are amenable to scaling include
phase change memory (PCM) as a DRAM alternative
[ISCA'09]. On the memory
bus, PCM provides non-volatility below the processor caches
with deep implications across the hardware-software interface
[SOSP'09]. 3T1D
memory, an emerging DRAM circuit, is a viable 6T SRAM
alternative [ICCD'09b].
For existing designs, post-fabrication tuning may mitigate
process variations [ICCD'09a].
Sustainability and Digital Infrastructure Policy
Increasing centralization of compute resources, driven by the
commoditization of compute servers and economies of scale,
suggests environmental and energy effects from IT
infrastructure are most effectively monitored in and
optimized for large-scale data centers
[StGallen'07].
Within data centers, small cores provide efficiency but, as
currently architected, exact a price with respect to
robustness, flexibility, and reliability
[MSR'09]. Equally
important is validating claims of net environmental benefits
from the adoption of digital business practices and assessing
the degree to which technology is an incomplete substitute
for traditional business practices. Future research in
digital sustainability will span fundamental technology,
business management, and public policy
[StGallen'08].
Auto-Tuning for High-Performance Computing
Effective sparse matrix-vector multiply (SpMV) optimizations
need heuristics that automatically tune linear algebra
computational kernels to reflect the capabilities of current
compiler and hardware technologies
[SC'02].
In particular, optimizations for symmetric SpMV include
algorithmic, data structure, compiler, and
architecture-specific elements. These optimizations exploit
the symmetric structure of the matrix to improve performance
by as much as 2.6x while reducing storage costs and memory
traffic by 0.5x
[ICPP'04]. This implementation of symmetric SpMV is
incorporated into published libraries, which use heuristics
to search for the best choice of values for tunable
parameters
[OSKI]. Furthermore,
models can estimate upper bounds on performance as a function
of system and algorithmic parameters, thereby evaluating the
effectiveness of optimizations against theoretical peak
performance.
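
To make the symmetric-structure idea concrete, the sketch below stores only the nonzeros on or above the diagonal in compressed sparse row (CSR) form and applies each stored off-diagonal entry twice during the multiply. It is a plain, untuned illustration and not the implementation from the cited work or from OSKI: the function names and the small test matrix are assumptions, and none of the register blocking or other tuned optimizations are shown.

```python
import numpy as np

def to_upper_csr(dense):
    """Keep only the nonzeros on or above the diagonal, in CSR form."""
    n = dense.shape[0]
    values, col_idx, row_ptr = [], [], [0]
    for i in range(n):
        for j in range(i, n):
            if dense[i, j] != 0:
                values.append(dense[i, j])
                col_idx.append(j)
        row_ptr.append(len(values))
    return (
        np.array(values, dtype=float),
        np.array(col_idx, dtype=int),
        np.array(row_ptr, dtype=int),
    )

def symmetric_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for symmetric A given only its upper triangle."""
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            y[i] += a * x[j]      # contribution of A[i, j]
            if j != i:
                y[j] += a * x[i]  # contribution of the mirrored entry A[j, i]
    return y

# Small symmetric example matrix (illustrative values).
A = np.array(
    [
        [4.0, 1.0, 0.0, 2.0],
        [1.0, 3.0, 0.0, 0.0],
        [0.0, 0.0, 5.0, 1.0],
        [2.0, 0.0, 1.0, 6.0],
    ]
)
x = np.array([1.0, 2.0, 3.0, 4.0])

values, col_idx, row_ptr = to_upper_csr(A)
y = symmetric_spmv(values, col_idx, row_ptr, x)

stored_nonzeros = len(values)          # entries kept in the upper triangle
full_nonzeros = np.count_nonzero(A)    # entries a non-symmetric format would keep
```

Each off-diagonal value is read from memory once but used for two updates, which is where the storage and memory-traffic savings come from. For this example, 7 values are stored instead of 10.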