For many big data analytics workloads, approximate results suffice. This raises
the question of whether and how the underlying system architecture can take
advantage of such relaxations, thereby lifting constraints inherent in today’s
architectures. This position paper explores one of the possible directions.
Impression Store is a distributed storage system with the abstraction of
big data vectors. It aggregates updates internally and answers queries for the
top-K highest-valued entries. With suitable extensions, Impression
Store supports various aggregations, top-K queries, and outlier and
major mode detection. While restricted in scope, such queries represent a
substantial and important portion of many production workloads. In return, the
system has unparalleled scalability: any node in the system can process any
request, whether a read or an update. The key technique we leverage is compressive
sensing, which substantially reduces the amount of active memory
state, I/O, and traffic volume needed to achieve such scalability.
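
To make the role of compressive sensing concrete, the following is a minimal sketch, under stated assumptions, of how a node might keep only a short linear measurement y = Φx of a large vector x, apply updates directly to y, merge state with other nodes by addition, and recover high-value entries greedily. The class name ImpressionSketch, its methods, the random ±1 measurement matrix, and the use of plain matching pursuit for recovery are all illustrative choices, not the design of Impression Store itself.

```python
import random


class ImpressionSketch:
    """Hypothetical linear sketch of a large vector (illustrative only)."""

    def __init__(self, dim, num_measurements, seed=0):
        # Every node uses the same seed, so all nodes derive the same matrix.
        rng = random.Random(seed)
        self.dim = dim
        self.m = num_measurements
        # Random +/-1 measurement matrix, stored column-wise:
        # phi[j] is the column associated with key j.
        self.phi = [[rng.choice((-1, 1)) for _ in range(num_measurements)]
                    for _ in range(dim)]
        # Compressed state y = Phi x; only m numbers instead of dim.
        self.y = [0] * num_measurements

    def update(self, key, delta):
        # The update x[key] += delta becomes y += delta * Phi[:, key].
        col = self.phi[key]
        for i in range(self.m):
            self.y[i] += delta * col[i]

    def merge(self, other):
        # Measurements are linear, so merging two nodes is element-wise addition.
        for i in range(self.m):
            self.y[i] += other.y[i]

    def top_k(self, k):
        # Greedy matching pursuit: pick the key whose column correlates most
        # with the residual, estimate its value, subtract its contribution.
        residual = list(self.y)
        found = {}
        for _ in range(k):
            best_key, best_corr = None, 0
            for j, col in enumerate(self.phi):
                corr = sum(c * r for c, r in zip(col, residual))
                if abs(corr) > abs(best_corr):
                    best_key, best_corr = j, corr
            if best_key is None:  # residual fully explained
                break
            est = best_corr / self.m
            found[best_key] = found.get(best_key, 0) + est
            col = self.phi[best_key]
            residual = [r - est * c for r, c in zip(residual, col)]
        # Highest estimated values first.
        return sorted(found.items(), key=lambda kv: -kv[1])


# Two nodes absorb updates independently, then exchange compressed state.
node_a = ImpressionSketch(dim=500, num_measurements=64, seed=7)
node_b = ImpressionSketch(dim=500, num_measurements=64, seed=7)
node_a.update(42, 5)
node_b.update(42, 7)
node_a.merge(node_b)
print(node_a.top_k(1))
```

In this sketch each node holds 64 numbers rather than 500, and either node can accept an update or answer a top-K query after a merge. Recovery is approximate in general: the estimates degrade as more keys carry significant weight relative to the number of measurements, which is the accuracy-for-scalability trade the abstract describes.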