We’ve previously described the Hadoop/Hive data warehouse we built in 2012 to store and process the HTTP access logs (450M records/day) and structured application event logs (170M events/day) that are generated by our service. This setup is still working well for us, but we added Impala into our cluster last year to speed up ad hoc analytic queries. This led to the promised 4x reduction in query times, but access to data in the