I'm looking for the best database for my big data project.
We are collecting data from sensors. Every row has about one hundred columns.
Every day we store several million rows.

The most common query retrieves the data for one sensor over a date range.

At the moment I use a Percona MySQL cluster. When I request data for a range of a few days, the response is fast. The problem comes when I request a month of data.
The database is already well optimized, but the response time is still not acceptable.

I would like to replace the Percona cluster with a database that can run a query in parallel on all nodes to improve response time.

With Cassandra I could partition the data across nodes (maybe based on the current date), but I have read that Cassandra cannot read across partitions in parallel, so I would have to issue one query per day (I don't know why).

Is there a database that manages sharded queries automatically, so that I can distribute the data across all nodes?

1 Answer

With Cassandra, if you split your data across multiple partitions, you can still read from those partitions in parallel by executing multiple queries asynchronously.

The Cassandra drivers help you handle this; see execute_concurrent in the Python driver.
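As a rough sketch of what that could look like for the "one sensor, one month" query: the code below assumes a hypothetical table sensor_data with primary key ((sensor_id, day), ts), so each sensor has one partition per day. The table, column names, and the helpers daily_partitions and read_range are made up for illustration; execute_concurrent_with_args is the variant of execute_concurrent that runs one prepared statement with many parameter sets.

```python
from datetime import date, timedelta


def daily_partitions(sensor_id, start, end):
    """Return one (sensor_id, day) pair per day in [start, end], inclusive."""
    days = (end - start).days + 1
    return [(sensor_id, start + timedelta(days=i)) for i in range(days)]


def read_range(session, sensor_id, start, end, concurrency=32):
    """Fetch all rows for one sensor over a date range, one query per day."""
    # Imported here so daily_partitions() can be used without the driver installed.
    from cassandra.concurrent import execute_concurrent_with_args

    # Hypothetical schema: PRIMARY KEY ((sensor_id, day), ts)
    select = session.prepare(
        "SELECT * FROM sensor_data WHERE sensor_id = ? AND day = ?"
    )

    # Up to `concurrency` queries are in flight at once; results come back
    # in the same order as the parameter list.
    results = execute_concurrent_with_args(
        session,
        select,
        daily_partitions(sensor_id, start, end),
        concurrency=concurrency,
        raise_on_first_error=False,
    )

    rows = []
    for success, result in results:
        if not success:
            raise result  # on failure, `result` holds the exception
        rows.extend(result)
    return rows
```

With a connected session, read_range(session, "sensor-42", date(2024, 1, 1), date(2024, 1, 31)) would issue 31 single-partition queries instead of one large scan. The concurrency value of 32 is an arbitrary starting point, not a recommendation.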

Moreover, the Cassandra driver is aware of the data partitioning: it knows which node holds which data. So when reading or writing, it picks an appropriate node to send each query to, according to the driver's load balancing policy (specifically the TokenAwarePolicy).
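For reference, a minimal sketch of enabling token-aware routing explicitly with the Python driver. The helper name build_session and its arguments (contact points, data center name, keyspace) are placeholders for your own setup; wrapping DCAwareRoundRobinPolicy is one common choice of child policy, not the only one.

```python
def build_session(contact_points, local_dc, keyspace):
    """Connect with a token-aware load balancing policy (illustrative helper)."""
    # Imported inside the function so the module loads without the driver installed.
    from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    # TokenAwarePolicy wraps a child policy and prefers replicas that own
    # the partition key of the statement being executed.
    profile = ExecutionProfile(
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc=local_dc)
        )
    )
    cluster = Cluster(
        contact_points,
        execution_profiles={EXEC_PROFILE_DEFAULT: profile},
    )
    return cluster.connect(keyspace)
```

Token-aware routing needs the partition key of each query, which the driver can work out on its own for prepared statements, so it pairs naturally with preparing the per-day query once and reusing it.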

Thus the client acts as a load balancer, and your per-partition queries are processed in parallel by the available nodes.