APPX_COUNT_DISTINCT Query Option (CDH 5.2 or higher only)

Allows multiple COUNT(DISTINCT) operations within a single query, by internally rewriting each COUNT(DISTINCT) to use the
NDV() function. The resulting count is approximate rather than precise.

Type: Boolean; recognized values are 1 and 0, or true and false; any other value interpreted
as false

Default:false (shown as 0 in output of SET statement)

Examples:

The following examples show how the APPX_COUNT_DISTINCT lets you work around the restriction where a query can only evaluate COUNT(DISTINCT col_name) for a single column. By default, you can count the distinct values of one column or another, but not both in a single
query:

When you enable the APPX_COUNT_DISTINCT query option, now the query with multiple COUNT(DISTINCT) works. The reason this
behavior requires a query option is that each COUNT(DISTINCT) is rewritten internally to use the NDV() function instead, which provides
an approximate result rather than a precise count.

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required
notices. A copy of the Apache License Version 2.0 can be found here.