A Sequoia program is comprised of a set of abstract, parameterized task variants, and the process of mapping such a Sequoia program to a target machine consists of instantiating the abstract tasks, yielding a set of task instances.
Tasks contain four types of contructs which must be mapped to a target machine.
- Tunable parameters; these must be set to compile-time constant integer values.
- Program variables (both scalars and arrays); every data object must be placed in a concrete memory level of the machine. Further, array variables may have their layout within a memory level specified as a mapping directive.
- Control constructs, including C control statements (for, if, while, etc.), Sequoia iteration statements (mappar, mapseq, etc.), and code that operates on scalar variables in general; these control constructs must all be assigned to the control processor in the target machine that will execute them.
- Task callsites; a callsite only names the "base" task which is called, not the variant or instance. Mapping directives control which instance is called at every task callsite in the program.
The process of instantiating a task consists of making all of the above parameter choices and mapping decisions. A task instance is a task variant which has been specialized for a target machine by mapping all the above constructs.
As a running example for the explanations in the subsections of this section, consider the following implementation of matrix multiplication:
void task<inner> MatMul::Inner(in float A[M][P], in float B[P][N], inout float C[M][N])
{
tunable Mblk, Pblk, Rblk;
mappar (int i = 0 : M/Mblk,
mappar (int j = 0 : N/Nblk) {
mapseq (int k = 0 : P/Pblk) {
MatMul(A[i*Mblk;Mblk][k*Pblk;Pblk],
B[k*Pblk;Pblk][j*Nblk;Nblk],
C[i*Mblk;Mblk][j*Nblk;Nblk]);
}
}
}
}
void task<leaf> MatMul::Leaf(in float A[M][P], in float B[P][N], inout float C[M][N])
{
for (int i = 0; i < M; i++) {
for (int j = 0; j < N; j++) {
for (int k = 0; k < P; k++) {
C[i][j] += A[i][k] * B[k][j];
}
}
}
}
|
The above implementation multiplies input matrices A and B and accumulates the result into matrix C, decomposing the algorithm by splitting up the problem of multipling an M*P matrix by a P*N matrix into subproblems which multiply submatrices of sizes Mblk*Pblk and Pblk*Nblk together.
The task variant call graph corresponding to this program is as follows: