Computing the Representation to Optimize Data Movement and Storage
It is widely recognized that data movement (from memory,
from SSD, within a parallel machine, even in the wires across a chip)
is the critical cost and performance limiter in computer systems. We
are building architectures that enable efficient, rapid transformation
of information encodings to reduce size and computation cost: the UAP,
the UDP, and now the Recoding Engine. Initial
designs and studies show that benefits of 4x to 1000x can be achieved
in specific cases. Critical challenges include how to expose these new
ideas to software (e.g., through transformer libraries, or by viewing C++ arrays
as abstract data types with a different concrete type implementation),
as well as a variety of functional and implementation architecture
issues.
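To make the "abstract data type with a different concrete implementation" idea concrete, here is a minimal software-only sketch. It is written in Python for brevity rather than C++, and it is not the UDP or Recoding Engine interface. The class name `RunLengthArray`, its method `stored_entries`, and the choice of run-length encoding are all assumptions made for illustration. The point is only that callers index a flat array while the stored representation is something smaller.

```python
from bisect import bisect_right
from itertools import groupby


class RunLengthArray:
    """Hypothetical array-like type (illustration only).

    Logical view: a flat, read-only sequence of values.
    Concrete storage: one (value, end offset) pair per run of equal values.
    """

    def __init__(self, values):
        self._values = []  # one stored value per run
        self._ends = []    # exclusive end index of each run
        end = 0
        for value, run in groupby(values):
            end += sum(1 for _ in run)
            self._values.append(value)
            self._ends.append(end)

    def __len__(self):
        # Logical length is the end offset of the last run.
        return self._ends[-1] if self._ends else 0

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("only integer indices are supported in this sketch")
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("index out of range")
        # Binary search for the first run whose end offset exceeds index.
        return self._values[bisect_right(self._ends, index)]

    def stored_entries(self):
        # Number of runs actually kept in memory.
        return len(self._values)


data = [7, 7, 7, 7, 3, 3, 9]
arr = RunLengthArray(data)
print(len(arr), arr[4], arr.stored_entries())
```

Code that only uses `len(arr)` and `arr[i]` does not need to know that three runs are stored instead of seven elements. In this sketch the decoding work happens in software on every access; the architectures described here aim to make that kind of encoding transformation cheap.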
These efforts came out of the 10x10 project, which pursued a principled, systematic approach to heterogeneity in computer architecture. A 10x10 architecture uses deep workload analysis to drive co-design of a federated heterogeneous architecture that exploits customization for energy efficiency, but federates a set of customized engines to achieve general-purpose coverage. The 10x10 project built 7 accelerators and federated them in a study that assessed overall benefit. The most interesting accelerators were all data-oriented. The three data-oriented accelerators (generalized pattern matching, small sort, and gather-scatter) were merged into a new architecture called the Unstructured Data Processor (UDP), sometimes referred to as the Unified Automata Processor (UAP).

9/20/2018 News: We are part of the new IRIS-HEP NSF grant, and we will explore acceleration of data science / high-energy physics with UDP/Recode ideas.