Behavior-Driven Optimizations for Big Data Exploration

The physical and biological sciences are becoming more data-driven, often due to the overwhelming quantities of data collected from satellites, telescopes, sequencers, and other sensors. One of the key issues for scientists who work with large datasets is efficient visualization of their data to extract patterns, observe anomalies, and debug their workflows. Though a variety of visualization tools exist to help people make sense of their data, these tools often rely on database management systems (DBMSs) for data processing and storage. Unfortunately, DBMSs fail to process the data fast enough to support a fluid, interactive visualization experience. My work blends optimization techniques from databases with methodology from HCI and visualization to support interactive and iterative exploration of large datasets. In this talk, I will discuss Sculpin, a visual exploration system that automatically learns user exploration patterns and exploits these patterns to pre-fetch data ahead of users as they explore. I will show that Sculpin's pre-fetching techniques provide significant performance benefits compared to existing systems. I will then discuss our ongoing work with Sculpin, which aims to avoid wasting computational resources while still providing a fluid, interactive exploration experience for users. To do this, we combine data pre-fetching with incremental data processing and visualization-focused caching optimizations, and incorporate these techniques into Sculpin to further boost performance.