RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)

In Impala 2.6 and higher, this query option applies only as a fallback, when statistics
are not available. By default, Impala estimates the optimal size of the Bloom filter structure
regardless of the setting for this option. (This is a change from the original behavior in
Impala 2.5.)

In Impala 2.6 and higher, when the value of this query option is used for query planning,
it is constrained by the minimum and maximum sizes specified by the
RUNTIME_FILTER_MIN_SIZE and RUNTIME_FILTER_MAX_SIZE query options.
The filter size is adjusted upward or downward if necessary to fit within the minimum/maximum range.
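
The adjustment described above amounts to a clamp. The following Python sketch only illustrates that arithmetic; it is not Impala's planner code. The function name and the sample bounds are hypothetical, so substitute the actual values of RUNTIME_FILTER_MIN_SIZE and RUNTIME_FILTER_MAX_SIZE in effect for your session.

```python
def clamp_filter_size(requested_bytes, min_bytes, max_bytes):
    """Return requested_bytes adjusted to lie within [min_bytes, max_bytes].

    Illustrative only: this mirrors the documented constraint, not
    Impala's internal implementation.
    """
    if min_bytes > max_bytes:
        raise ValueError("min_bytes must not exceed max_bytes")
    return max(min_bytes, min(requested_bytes, max_bytes))


# Sample bounds chosen for illustration (assumed, not read from a cluster).
MIN_SIZE = 1 * 1024 * 1024
MAX_SIZE = 16 * 1024 * 1024

too_small = clamp_filter_size(64 * 1024, MIN_SIZE, MAX_SIZE)          # adjusted upward
in_range = clamp_filter_size(4 * 1024 * 1024, MIN_SIZE, MAX_SIZE)     # unchanged
too_large = clamp_filter_size(64 * 1024 * 1024, MIN_SIZE, MAX_SIZE)   # adjusted downward
```

A requested size inside the range passes through unchanged; values outside it are moved to the nearest bound.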

Type: integer

Default: 1048576 (1 MB)

Maximum: 16777216 (16 MB)

Added in: Impala 2.5.0

Usage notes:

This setting affects optimizations for large and complex queries, such
as dynamic partition pruning for partitioned tables, and join optimization
for queries that join large tables.
Larger filters handle higher-cardinality input sets more effectively,
but each filter consumes more memory.

If your query filters on high-cardinality columns (for example, millions of different values)
and you do not get the expected speedup from the runtime filtering mechanism, consider
benchmarking with a higher value for RUNTIME_BLOOM_FILTER_SIZE.
The extra memory devoted to the Bloom filter data structures can make the filtering
more accurate by reducing false positives.
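
To see why size matters for high-cardinality inputs, the Python sketch below applies the textbook false-positive approximation for a classic Bloom filter, p = (1 - e^(-kn/m))^k, where m is the number of bits, n the number of distinct values, and k the number of hash functions. This is an assumption-laden illustration: Impala's actual filter layout and hash count may differ, and the function name and the default k = 8 are hypothetical. Use it only to reason about trends, then confirm with benchmarks.

```python
import math


def estimated_false_positive_rate(filter_bytes, distinct_values, num_hashes=8):
    """Textbook Bloom filter estimate: (1 - exp(-k * n / m)) ** k.

    filter_bytes    -- filter size in bytes (m = filter_bytes * 8 bits)
    distinct_values -- number of distinct values inserted (n)
    num_hashes      -- number of hash functions (k); 8 is an assumed value
    """
    if filter_bytes <= 0 or distinct_values < 0 or num_hashes <= 0:
        raise ValueError("sizes and hash count must be positive")
    bits = filter_bytes * 8
    return (1.0 - math.exp(-num_hashes * distinct_values / bits)) ** num_hashes


# Compare the default 1 MB size with the 16 MB maximum for 5 million values.
n = 5_000_000
rate_1mb = estimated_false_positive_rate(1048576, n)
rate_16mb = estimated_false_positive_rate(16 * 1048576, n)
print(f"1 MB:  {rate_1mb:.6f}")
print(f"16 MB: {rate_16mb:.6f}")
```

Under these assumptions, the larger filter yields a much lower estimated false-positive rate for the same number of distinct values, which is the effect the paragraph above describes.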

Because the runtime filtering feature applies mainly to resource-intensive
and long-running queries, adjust this query option only when tuning long-running queries
that involve large partitioned tables, joins between large tables, or both.

Because the effectiveness of this setting depends so much on query characteristics and data distribution,
you typically use it only for specific queries that need some extra tuning, and the ideal value depends
on the query. Consider setting this query option immediately before the expensive query and
unsetting it immediately afterward.
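
The set-then-restore pattern can be scripted. The following Python sketch builds the sequence of statements to submit through whatever client you use; the helper name and the sample query are hypothetical. It restores the option by setting it back to the documented default of 1048576 rather than assuming a particular unset syntax. If your session started with a different value, pass that value as restore_size_bytes.

```python
DEFAULT_FILTER_SIZE = 1048576        # documented default (1 MB)
MAX_FILTER_SIZE = 16 * 1024 * 1024   # documented maximum (16 MB)


def statements_for_tuned_query(query, filter_size_bytes,
                               restore_size_bytes=DEFAULT_FILTER_SIZE):
    """Return [SET, query, SET] so the option applies to one query only."""
    if not 0 < filter_size_bytes <= MAX_FILTER_SIZE:
        raise ValueError("filter size must be between 1 byte and 16 MB")
    return [
        f"SET RUNTIME_BLOOM_FILTER_SIZE={filter_size_bytes}",
        query.strip().rstrip(";"),
        f"SET RUNTIME_BLOOM_FILTER_SIZE={restore_size_bytes}",
    ]


# Hypothetical query; the table and column names are placeholders.
statements = statements_for_tuned_query(
    "SELECT COUNT(*) FROM big_fact f JOIN big_dim d ON f.id = d.id;",
    8 * 1024 * 1024,
)
for statement in statements:
    print(statement + ";")
```

Submitting the three statements in order within the same session keeps the larger filter size from affecting unrelated queries.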