cache table vs. parquet table performance

Hello,

I'm using the Spark Thrift Server and I'm looking for the best-performing way to query a hot set of data. I'm processing records with a nested structure containing sub-types and arrays; one record takes up several KB.

If I understood correctly, cached data should be stored in the in-memory columnar format with storage level MEMORY_AND_DISK, so data that doesn't fit in memory will be spilled to disk (I assume also in columnar format?).
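For reference, here's a minimal sketch of how the table could be cached with an explicit storage level (assuming Spark 2.3+, where Catalog.cacheTable accepts a storage level; the path and table name are placeholders):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("cache-hot-set")
  .enableHiveSupport()
  .getOrCreate()

// Register one day of data as a temp view (path is a placeholder).
spark.read.parquet("/data/events/2018-01-01")
  .createOrReplaceTempView("hot_events")

// Cache in the in-memory columnar format; partitions that don't fit in
// executor storage memory should be written to local disk rather than dropped.
spark.catalog.cacheTable("hot_events", StorageLevel.MEMORY_AND_DISK)

// Materialize the cache eagerly so the Storage tab shows the real footprint.
spark.sql("SELECT COUNT(*) FROM hot_events").show()
```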

I cached one day of data (1 M records), and according to the Spark UI Storage tab none of it was cached in memory; everything was spilled to disk. The size of the cached data was 5.7 GB.
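For completeness, this is roughly how I'd compare the cached view against reading the Parquet files directly (a sketch continuing the session above; the grouping column, query, and path are placeholders, and absolute numbers obviously depend on the cluster):

```scala
// Crude wall-clock timing helper.
def time[T](label: String)(block: => T): T = {
  val start = System.nanoTime()
  val result = block
  println(s"$label took ${(System.nanoTime() - start) / 1e6} ms")
  result
}

time("cached table") {
  spark.sql("SELECT some_col, COUNT(*) FROM hot_events GROUP BY some_col").collect()
}

time("parquet table") {
  spark.read.parquet("/data/events/2018-01-01")
    .groupBy("some_col").count().collect()
}
```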
