DocsNavigationUser login |
Contractual Compiler OptimizationsThe Sequoia compiler and runtime is one particular implementation of the Sequoia language, and it's possible that there could exist completely different systems which take Sequoia code as an input language. This section describes the optimizations which any Sequoia implementation must perform; these can be considered a "contract" between the programmer and the system, and the programmer may rely on these optimizations being performed when writing their code. (The Sequoia compiler performs these optimizations, and more.) CBVR Copy Elimination of Stride-1 Range Array Blocks The task calling semantics of the Sequoia language are call-by-value-result (CBVR), also known as "copy in, copy out". Clearly, if every task call in the program required a physical memory copy of array arguments then performance would be very poor, but the copies may only be semantic copies; that is, as far as the program is concerned, the copy did occur, but "under the hood", the Sequoia compiler or runtime system may have optimized the copy away, converting it instead into an alias or reference to another array if it proves that it is safe to do so. There is a very common case of argument passing, in which a multi-dimensional stride-1 range array block is passed as an argument to a subtask call, such as in the following example.
Suppose that these tasks were instantiated as in the following diagram:
In the event that the mapping interface specifies that the MyTask1Inst instance's array A and the MyTask2Inst instance's array X are in the same memory level, the semantic copy from the namespace of MyTask1Inst into MyTask2Inst that occurs on a task call would actually be a copy within the same memory module, and further, the calling semantics of tasks specify that when MyTask1Inst executes the task call "MyTask2(A);" (say), it will actually pause there until the child task returns, and thus, the compiler can actually alias array X to array A and not perform an underlying physical copy of the data. As far as the program is concerned, however, arrays X and A are completely different physical objects. Similarly, arrays Y and B can be aliased to the same physical array object. In essence, the compiler is taking a language with copy semantics on task calls and optimizing away unnecessary copies when it is safe to do so. The Sequoia compiler actually eliminates copies in other cases too (if safe), though the basic copy elimination of stride-1 range array arguments to task calls is a guaranteed feature of any Sequoia implementation. Copy Hoisting Consider the following example, consisting of an inner task implementation from a Sequoia 2D convolution program.
The language semantics are that every iteration of the nested mappar statement, the Conv2D task is called, passing array blocks of A and C and the entire array H as arguments. Suppose that the instance called at the Conv2D callsite was in a different memory level to the calling instance of Conv2D::Inner: the semantics of this code would be that every iteration, the two input array arguments would be physically copied between memory levels (which could be an expensive memory transfer operation), the subtask instance would run, and then the subtask's output data would be copied back to the calling task's C array. Clearly, the exact same array H is passed every iteration, and it would be inefficient to recopy it every time. The compiler will perform an optimization known as "loop hoisting", also known as "loop-invariant code motion", and only perform the copy of H once, at the start of the nested mappar construct, with every iteration's Conv2D call operating on this copy. The contractual optimization that any Sequoia implementation must perform is to identify the iteration variables that are used to describe an array block passed to a task call inside a nest of Sequoia iteration constructs and "hoist" out the copy of any arguments as far as possible. As a second example, consider the following inner task from a Sequoia implementation of matrix multiplication.
In this example, the array block C[i*Mblk;Mblk][j*Nblk;Nblk] doesn't depend on the k loop variable, only i and j, and so rather than copying C on every (i, j, k) iteration, it is only copied on every (i, j) iteration, with each k iteration using the same copy. |