APL is a programming language for numerical matrix work [APL]; it
is the original collection oriented language[SiBle91].
This section examines the use of buffering (a term from operating
systems) to implement such languages on uniprocessors with a memory
hierarchy. Buffering (sometimes also called batching) is a very
effective and widely used programming technique, but also has its
limits: latency suffers, sometimes performance suffers, dynamic
conditionals induce copying and increase overhead. There is no known
way to effectively handle dynamic jumps, or higher order scans,
reduces, and permutations.

Rather than loop over all the data, it can be advantageous to
limit the loop length (block size) so that the total temporary memory
required fits within the (or a) cache. Renderman and Nyquist [Nyquist] do this. This is called blocking or tiling[WoLa91]. Note that as the temporary space required increases, the
vector length must go down to remain in the cache, though this
probably only makes a difference in extreme cases.

There are, however, some drawbacks to buffering. Latency suffers
because a complete block must be processed before the first result is
available. Latency is particularly important in interactive music
applications. Throughput may suffer from increased cache and memory
traffic or from an inability to effectively use multiple functional
units.

Dynamic control constructs are very difficult to handle while
buffering. Conditionals can be handled either with mask vectors or by
copying and collapsing. I don't know any reasonable way to handle
dynamic jumps, or higher-order scans, reduces, and permutations.

With RTCG the static control flow of the inner interpreter is
eliminated. The resulting loop may be better optimized. Memory
references for temporaries in the program can be replaced with
register references. Tiling is still useful, though not as often.

See [ThoDa95] for an analysis of buffering vs compilation in the
context of audio signal synthesis.