Hi,
We have a web-app platform that uses Vertica server.
Our machines are on cloud, from AWS.
Vertica cluster has 3 nodes, m3.2xlarge, 8 vCPU, 30 GB memory.
SIze of the db files - 25 GB.
The database contains around 10 tables, maximum 15.000.000 records/table, 1000 columns/table.
The tables have only the default superprojections. There isn't any possibility to know which combination of columns will be used. It is up to the web-app user to chose the columns combination (=parameters) that will be used.
Also, there is an ETL process that load/update data.
Our problem is related to memory usage.
Sometimes, during the ETL process, the web-app is stuck because the SELECT-s requested by the web-app are put on queue and then rejected due to 'timeout' or 'issuficient memory'.
As I wrote on an other topic viewtopic.php?f=47&t=2516, the resource pool used by web-app has maximum 25 G, but only 6 SELECTs (2.5GB) can run simultaneously, that means only 6 web reports can be done.
The responsiveness for only few web-app users is unacceptable.
My questions are:
- how to set the aws instances/vertica server/resource_pool to accept more queries ?
create resource pool xxxx_pool
memorysize '30%'
maxmemorysize '95%'
priority 100
runtimepriority MEDIUM
runtimeprioritythreshold 30
queuetimeout 120
runtimecap '10 seconds'
cascade to general
- how to calculate the memory necessary for 100 (?!?) SELECTs ?
- how to calculate the maximum number of supported SELECTs for 25 GB ?
- how to decrease the memory used by a SELECT ?
- how to calculate memory is the database will increase to 1 T ?
Thank you

vcarusi wrote:My question: "wide" is referring to number of rows? number of columns ?

Wide refers to the number of columns. The catalog may have improved in more recent versions, but wider tables could potentially degrade performance; especially if not optimized. Run DBD if you haven't already; it could potentially improve your encoding.

For select queries , memory allocation is related to the amount of columns on the select list (its around 2M per column) , in many cases it also related to many other factors like if your query include aggregation and sorting and it also related to your cluster setup (how many nodes )