This chapter is from the book

10.6 General Rules for the GROUP BY Clause

This section describes a number of important rules that relate to select blocks with a GROUP BY clause.

Rule 1: In Section 9.6, in Chapter 9, we gave several rules for the use of aggregation functions in the SELECT clause. We now add the following rule: If a select block does have a GROUP BY clause, any column specification specified in the SELECT clause must exclusively occur as a parameter of an aggregated function or in the list of columns given in the GROUP BY clause, or in both.

Therefore, the following statement is incorrect because the TOWN column appears in the SELECT clause, yet it is not the parameter of an aggregation function and does not occur in the list of columns by which the result is grouped.

SELECT TOWN, COUNT(*)
FROM PLAYERS
GROUP BY PLAYERNO

The reason for this restriction is as follows. The result of an aggregation function always consists of one value for each group. The result of a column specification on which grouping is performed also always consists of one value per group. These results are compatible. In contrast, the result of a column specification on which no grouping is performed consists of a set of values. This would not be compatible with the results of the other expressions in the SELECT clause.

Rule 2: In most examples, the expressions that are used to form groups also occur in the SELECT clause. However, that is not necessary. An expression that occurs in the GROUP BY clause can appear in the SELECT clause.

Rule 3: An expression that is used to form groups can also occur in the SELECT clause within a compound expression. We give an example next.

Example 10.11. Get the list with the different penalty amounts in cents.

SELECT CAST(AMOUNT * 100 AS SIGNED INTEGER)
AS AMOUNT_IN_CENTS
FROM PENALTIES
GROUP BY AMOUNT

The result is:

AMOUNT_IN_CENTS
---------------
2500
3000
5000
7500
10000

Explanation: A grouping is performed on a simple expression consisting of a column name: AMOUNT. In the SELECT clause, that same AMOUNT column occurs within a compound expression. This is allowed.

This rule is followed by the fact that no matter how complex a compound expression is, if it occurs in a GROUP BY clause, it can be included in its entirety only in the SELECT clause. For example, if the compound expression PLAYERNO * 2 occurs in a GROUP BY clause, the expressions PLAYERNO * 2, (PLAYERNO * 2) - 100, and MOD(PLAYERNO * 2, 3) - 100 can occur in the SELECT clause. On the other hand, the expressions PLAYERNO, 2 * PLAYERNO, PLAYERNO * 100, and 8 * PLAYERNO * 2 are not allowed.

Rule 4: If an expression occurs twice or more in a GROUP BY clause, double expressions are simply removed. The GROUP BY clause GROUP BY TOWN, TOWN is converted to GROUP BY TOWN. Also, GROUP BY SUBSTR(TOWN,1,1), SEX, SUBSTR(TOWN,1,1) is converted to GROUP BY SUBSTR(TOWN,1,1), SEX. Therefore, it has no use for double expressions.

Rule 5: In Section 9.4, in Chapter 9, we described the cases in which the use of DISTINCT in the SELECT clause is superfluous. The rules given in that section apply to SELECT statements without a GROUP BY clause. We add a rule for SELECT statements with a GROUP BY clause: DISTINCT (if used outside an aggregation function) that is superfluous when the SELECT clause includes all the columns specified in the GROUP BY clause. The GROUP BY clause groups the rows in such a way that the columns on which they are grouped no longer contain duplicate values.

Exercise 10.12: Describe why the following statements are incorrect:

SELECT PLAYERNO, DIVISION
FROM TEAMS
GROUP BY PLAYERNO

SELECT SUBSTR(TOWN,1,1), NAME
FROM PLAYERS
GROUP BY TOWN, SUBSTR(NAME,1,1)