Monday, September 26, 2011

I don't know whether there's another situation where this characteristic would be useful, but right now due to an ETL specification I need to make sure that a particular combination of fields doesn't repeat in the source flow of a fact table . So I transformed the source query into a view and applied the DISTINCT ON clause.

DISTINCT Clause

If DISTINCT is specified, all duplicate rows are removed from the result set (one row is kept from each group of duplicates). ALL specifies the opposite: all rows are kept; that is the default.

DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example:

SELECT DISTINCT ON (location) location, time, report FROM weather_reports ORDER BY location, time DESC;

retrieves the most recent weather report for each location. But if we had not used ORDER BY to force descending order of time values for each location, we'd have gotten a report from an unpredictable time for each location.

The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.