If the data cannot be cached because it’s too large to fit in memory, you can use a storage format other than CSV. Parquet and ORC store data in a columnar format, which allows the engine to scan only the columns required by the query instead of all the data.

// unfortunately, parquet does not play well with spaces in column names, so
// we have to define aliases for those columns
sql("""
CREATE TABLE inspections_parquet USING parquet AS
SELECT
CAMIS, DBA, BORO, BUILDING, STREET, ZIPCODE, PHONE,
`CUISINE DESCRIPTION` AS CUISINE_DESCRIPTION,
`INSPECTION DATE` AS INSPECTION_DATE, ACTION,
`VIOLATION CODE` AS VIOLATION_CODE,
`VIOLATION DESCRIPTION` AS VIOLATION_DESCRIPTION,
`CRITICAL FLAG` AS CRITICAL_FLAG,
SCORE, GRADE,
`GRADE DATE` AS GRADE_DATE,
`RECORD DATE` AS RECORD_DATE,
`INSPECTION TYPE` AS INSPECTION_TYPE
FROM inspections""")
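
As a side note, when a table has many columns, spelling out every alias by hand gets tedious. The same renaming can be generated programmatically with the DataFrame API. Below is a minimal sketch, assuming a spark-shell session where `spark` is in scope; the table name `inspections_parquet_alt` is hypothetical, used here only to avoid clashing with the table created above.

```scala
// Hypothetical alternative to the hand-written aliases: replace spaces with
// underscores in every column name, then save the result as a Parquet table.
val src = spark.table("inspections")
val renamed = src.columns.foldLeft(src) { (df, col) =>
  df.withColumnRenamed(col, col.replace(' ', '_'))
}
renamed.write.format("parquet").saveAsTable("inspections_parquet_alt")
```

This folds over the column list, renaming one column at a time, so it adapts automatically if the source schema changes.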

Now let’s run the same query as before, but on this “inspections_parquet” table: