Aggregates are more popular than any other lookup topic here, by a ratio of more than 2 to 1.

Basic aggregation is the simplest grouping query pattern: for column foo, display the smallest, largest, sum, average or some other statistic of column bar values:SELECT foo, MIN(bar) AS bar
FROM tbl
GROUP BY foo

Return the highest bar value for each foo, ordering top to bottom by that value:SELECT foo, MAX(bar) AS Count
FROM tbl
GROUP BY foo
ORDER BY Count DESC;

For aggregating functions like MIN() and MAX() that return a single value, there may be multiple instances of the result. If you wish to see them, put the aggregating query in a subquery and join to it from a direct query on the table:SELECT a.foo, a.bar
FROM tbl a
JOIN (
SELECT foo, MAX(bar) AS Count
FROM tbl
GROUP BY foo
) b ON a.foo=b.foo AND a.bar=b.count
ORDER BY a.foo, a.bar;

MySQL introduced the SQL extension GROUP_CONCAT(), which makes short work of listing items in groups. For example, given a table of suppliers and the parts they make ...CREATE TABLE supparts(supID char(2),partID char(2));
INSERT INTO supparts VALUES
('s1','p1'),('s1','p2'),('s1','p3'),('s1','p4'),('s1','p5'),('s1','p6'),
('s2','p1'),('s2','p2'),('s3','p2'),('s4','p2'),('s4','p4'),('s4','p5');

Two rules of thumb help with problems like this: solve one step at a time, and work from inside out. Here "inside out" means start by building the join list needed to retrieve sales data:from salespersons s
join orders o using(salespersonid)
join orderlines l using(orderid)
join products p using(productid)

If columns other than the GROUP BY column must be retrieved, and if the grouping expression does not have a strictly 1:1 relationship with those columns, then to avoid returning arbitrary values for those non-grouping columns, you must put the GROUP BY query in a subquery and join that result to the other columns, for example:SELECT s.partID, s, thiscol, s.thatcol, anothercol, x.Suppliers
FROM supparts s
JOIN (
SELECT partID,GROUP_CONCAT(supID ORDER BY supID) AS Suppliers
FROM supparts
GROUP BY partID
) x USING(partID)

You track squash court bookings and fees. A court booking has one fee, but it has two member references, and those can be the same if one member has brought a guest. For each booking row, the fee is to be split between the two members. What query correctly aggregates fees including cases where the two members of record are the same? DROP TABLE IF EXISTS bookings;
CREATE TABLE bookings (
court_id int(11) NOT NULL,
member1 int(11) NOT NULL,
member2 int(11) NOT NULL,
time timestamp NOT NULL,
fee decimal(5,2) NOT NULL
);

An efficient solution, posted by 'laptop alias' on a MySQL forum:SELECT member, ROUND(SUM(fee/2),2) AS total
FROM (
SELECT member1 AS member, fee FROM bookings
UNION ALL
SELECT member2, fee FROM bookings
) AS tmp
GROUP BY member;

Given a parent table and two child tables, a query which sums values in both child tables, grouping on a parent table column, returns sums that are exactly twice as large as they should be. In this example from the MySQL General Discussion list:DROP TABLE IF EXISTS packageItem,packageCredit,packageItemTax;
CREATE TABLE packageItem (
packageItemID INT,
packageItemName CHAR(20),
packageItemPrice DECIMAL(10,2)
);
INSERT INTO packageItem VALUES(1,'Delta Hotel',100.00);

Find the values of a table column c1 for which there are a specified number of listed values in another column c2.

To get an overview of the values of c2 for each value of c1:SELECT
c1,
GROUP_CONCAT(c2 ORDER BY c2) AS 'C2 values'
FROM table
GROUP BY c1;

To retrieve a list of c1 values for which there exist specific values in another column c2, you need an IN clause specifying the c2 values and a HAVING clause specifying the required number of different items in the list ...SELECT c1
FROM table
WHERE c2 IN (1,2,3,4)
GROUP BY c1
HAVING COUNT(DISTINCT c2)=4;

This is easy to generalise to multiple column expressions, and a HAVING clause specifying any number of items from the IN list.

You have an election database with tables for candidates, parties and districts. A candidate belongs to one party; a district may have any number of candidates:DROP TABLE IF EXISTS parties,districts,candidates;
CREATE TABLE parties (
party char(12) NOT NULL,
PRIMARY KEY (party)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
INSERT INTO parties VALUES ('Conservative'),('Liberal'),('Socialist'),('Green'),('Libertarian');

In a good introductory tutorial on MySQL subqueries, Jeremy Cole developed a triply nested query to retrieve the name, population, head of state, and number of official languages in countries with the most official languages. The query uses two tables from the MySQL world database:CREATE TABLE country (
Code char(3) NOT NULL DEFAULT '' PRIMARY KEY,
Name char(52) NOT NULL DEFAULT '',
Population int(11) NOT NULL DEFAULT '0',
HeadOfState char(60) DEFAULT NULL,
... other columns ...
);
CREATE TABLE countrylanguage (
CountryCode char(3) NOT NULL DEFAULT '' PRIMARY KEY,
Language char(30) NOT NULL DEFAULT '',
IsOfficial enum('T','F') NOT NULL DEFAULT 'F',

When you have parent-child-grandchild tables, eg companies, users, actions, and your query requirement is for per-parent aggregates from the child table and per-child aggregates from the grandchild table, then cascading joins yield spuriously multiplied counts, and correlated subqueries fail because the second correlated subquery cannot find a visible joining column.

Rather than list instances including repeats on one line, you want to list distinct instances and their counts. One way is to do a simple GROUP BY query and in your application layer remove the newlines from the result. Or you can do it in one step:drop table if exists t;
create table t (
type int(10) ,
instance int(10)
) ;
insert into t values
(1,4),(1,7),(1,9),(1,10),(2,2),(2,3),(2,5),(2,6),(2,8),(3,1),(4,11);

If you need a child table aggregate result for each row in a query, what's faster, putting the correlated aggregating subquery in the SELECT list, or in the FROM clause?

Thus, given tables stocks(id, user, ticker,lastvalue) and transactions(id, sid, action, qty, price, commission), and the requirement to retrieve stock values and per-stock transaction counts, which of these queries will run faster?select ticker, id, lastvalue, (select count(*) from transactions where sid=stocks.id) N
from stocks;

Given the table authorbook(authid INT, bookid INT), what query finds the books who have authors with more than one book in the table?

Even one level of recursion can induce a mild trance. Escape the trance by taking the problem one step at a time. First write the query that finds the authors with multiple books. Then join an outer query to that on authorid, and have the outer query select bookid:SELECT a1.bookid
FROM authorbook a1
INNER JOIN (
SELECT authid,count(bookid)
FROM authorbook a2
GROUP BY authid
HAVING COUNT(bookid)>1
) AS a3 ON a1.authid=a3.authid;

A superstore is running a promotion: each day, every fifth a customer wins a prize. Each day you're given a text file with data columns for customerID, timestamp and order amount. How do you find every fifth customer?

Load the data into a table, then write the query inside out; with this kind of problem, nested queries are a boon.

1. Using Load Data Infile, load the text file into a table (indexless for speed) named tbl with columns for customerID, timestamp and amount, and index the table on (customerID, timestamp).

2. Write a query that tracks per-customer order by timestamp with user variables:set @id='', @ord=1;
select

To group rows by a time period whose length in minutes divides evenly into 60, use this formula:GROUP BY ((60/periodMinutes) * HOUR( thistime ) + FLOOR( MINUTE( thistime ) / periodMinutes ))

where thistime is the TIME column and periodMinutes is the period length in minutes. So to group by 15-min periods, write ...SELECT ...
GROUP BY ( 4 * HOUR( thistime ) + FLOOR( MINUTE( thistime ) / 15 ))
...

Here is a simple soccer league table setup that was developed in the MySQL Forum by J Williams and a contributor named "Laptop Alias". The teams table tracks team ID and name, the games table tracks home and away team IDs and goal totals for each game. The query for standings is built by aggregating a UNION of home team and away team game results:DROP TABLE IF EXISTS teams, games;
CREATE TABLE teams(id int primary key auto_increment,tname char(32));
CREATE TABLE games(id int primary key auto_increment, date datetime,
hteam int, ateam int, hscore tinyint,ascore tinyint);

Given tables parent(id int not null primary key, etc...) and child (id int not null primary key, pid int not null references parent (id), etc...), how do we write a query that retrieves only one child row per pid even when the child table has multiple matching rows? MySQL permits use of GROUP BY even when the SELECT list specifies no aggregate function, so this will work:select p.id, c.id
from parent p
join child c on p.id=c.pid
group by p.id;

But is it accurate? No, because it displays only the first c.pid value it happens to find. For further discussion see Within-group aggregates.

... and you need an attendance summary by date and person. It turns out that this aggregating query is easy top write—first write an inner query to Group By date and person With Rollup, then write an outer query that renames the Rollup Null labels.

You have a table in which each row references one text and one keyword in the text ... DROP TABLE IF EXISTS keywords;
CREATE TABLE keywords (txtID int, keyword char(8));
INSERT INTO keywords VALUES(1 , 'foo'),(2 , 'bar'),(1 , 'foo'),(2 , 'foo');

... and you want a list of texts which include every keyword.

You might think you have to join and match. You don't. All you need to do is count the distinct keywords which occur for each text, then for each text compare that number with the entire list of distinct keywords:SELECT txtID, COUNT(DISTINCT keyword) AS N
FROM keywords
GROUP BY txtID
HAVING N = (SELECT COUNT(DISTINCT keyword) FROM keywords);
+-------+---+
| txtID | N |
+-------+---+
| 2 | 2 |
+-------+---+

Your first thought may be to GROUP BY item, but that is not guaranteed to return the correct supplier value for each minimum item price. Grouping by both item and supplier will return more information than you want. Nor can you write WHERE price=MIN(...) because the query engine will evaluate the WHERE clause before it knows the MIN value.

This is the problem of aggregating within aggregates. It is sometimes called the 'groupwise aggregates' problem, but the term 'groupwise' is ambiguous. We think better names for it are subaggregates, inner aggregates, or within-group aggregates.

It's easy to show that the within-group aggregates problem is a form of the problem of returning values from non-grouping columns in a GROUP BY query. Suppose you write ...SELECT item, supplier, Min(price)
FROM products
GROUP BY item;

Will this reliably return the correct supplier per item? No. Unless there is exactly one supplier per item, the supplier value returned will be arbitrary. To retrieve the correct supplier for each item, you need more query logic.

Here are six solutions. As you'll see, some are very slow, a couple are quite fast, and some work for only some versions of the problem:

1. Self-exclusion join solution: One way to model within-group minima or maxima is via an left self exclusion join...SELECT p1.item, p1.supplier, p1.price
FROM products AS p1
LEFT JOIN products AS p2 ON p1.item = p2.item AND p1.price > p2.price
WHERE p2.item IS NULL;

...because in the resultset built by joining on left item = right item and left price > right price, the left-sided rows for which there is no greater right-sided price are precisely the per-item rows with the smallest prices. The query runs more than an order of magnitude faster with an index on (item, supplier).

to which you then join the products table to find the matching suppliers:SELECT products.item, supplier, products.price
FROM products
JOIN tmp ON products.item = tmp.item
WHERE products.price=tmp.minprice;
UNLOCK TABLES;
DROP TABLE tmp;

3. Correlated subquery solution: From MySQL 4.1 on, the temporary table can be a correlated subquery. This is the most intuitively obvious syntax for the problem. It's also the slowest—up to a hundred times slower than the exclusion join, whether the queries are compared with or without indexes:SELECT item, supplier, price
FROM products AS p1
WHERE price = (
SELECT MIN(p2.price)
FROM products AS p2
WHERE p1.item = p2.item
);

4. FROM clause subquery solution: If we move the aggregating subquery from the WHERE clause to the FROM clause, the query improves from a hundred times slower than the self-exclusion join to twenty times faster:SELECT p.item, p.supplier, p.price
FROM products AS p
JOIN (
SELECT item, MIN(price) AS minprice
FROM products
GROUP BY item
) AS pm ON p.item = pm.item AND p.price = pm.minprice;

Some users have trouble mapping elements of this model to their instance of the problem. The model has five elements (or sets of them):

(i) a table, which might be a view, a single physical table, or a table derived from joins
(ii) one or more grouping columns,
(iii) one or more columns to aggregate,
(iv) one or more columns not mentioned in the GROUP BY clause,
(v) an aggregating job to do, typically MIN() or MAX().

5. Ordered subquery solution: A "trick" solution, made possible by MySQL's non-standard tolerance of GROUP BY when there is no aggregating SELECT expression, uses ORDER BY in a subquery to find the lowest prices, and GROUP BY in the outer query to pick them off:SELECT *
FROM (
SELECT *
FROM products
ORDER BY price ASC
) AS s
GROUP BY item;

It's succinct, it's fast if the query engine can find an index to order on, and it retrieves the prices and associated values in one step, but this query is seldom much faster than the FROM clause subquery described above.

The self-exclusion join and WHERE clause subquery methods scale badly because they're O(N2). If ordering and grouping columns are indexed, they become O(N * log N), but are still substantially slower than the FROM clause subquery method.

Each row of a games table records one game score for a team: DROP TABLE IF EXISTS games;
CREATE TABLE games(id INT, teamID INT, score INT);
INSERT INTO games VALUES
(1,1,3),(2,1,4),(3,1,5),(4,1,6),(5,2,6),
(6,2,7),(7,2,8),(8,2,7),(9,2,6),(10,2,7);

How would we write a query that returns the average of the top 50% of scores per team?

Probably more bunk has been written about correlation than about any other statistic. We'll keep this short and straight. At its simplest, correlation is a statistical measure of non-random, linear association between pairs of values in a dataset. It's denoted by r, and varies from -1 through +1, where -1 indicates perfect inverse correlation (the regression line goes down left to right), 0 indicates no correlation (there is no regression line; it's just a scatterplot), and +1 indicates perfect direct correlation (the regression line goes up left to right).

Statistically, the median is the middle value--the value that is smaller than that found in half of all remaining rows, and larger than that found in the other half:SELECT a.hours As Median
FROM BulbLife As a, bulbLife AS b
GROUP BY a.Hours
Having Sum( Sign( 1-Sign(b.Hours-a.Hours ))) = Floor(( Count(*)+1)/2) ;

This formula works, but it doesn't scale—it's O(n*n)—and it's awkward to use.

So we posted a MySQL implementation of Torben Mogenson's algorithm for calculating the median (http://ndevilla.free.fr/median/median/node20.html). It's said to be amongst the fastest, but it proved too slow. Then Joe Wynne offered an algorithm which looks correct and does scale. Here it is as a MySQL stored procedure:

Statistically, the mode is the most frequently occurring value. Given tables parent(id int) and child(pid int, cid int), where child.pid references parent.id as a foreign key, what query finds the parent.id most often represented in the child id, that is, the modal count of child.pid?SELECT pid, COUNT(*) AS frequency
FROM child
GROUP BY pid
ORDER BY frequency DESC
LIMIT 1;

Without MSSQL's RANK() aggregate function, how do we display rank order in a MySQL query, for example from a table like this?CREATE TABLE votes( name CHAR(10), votes INT );
INSERT INTO votes VALUES
('Smith',10),('Jones',15),('White',20),('Black',40),('Green',50),('Brown',20);

The query is a two-step:
1. Join the table to itself on the value to be ranked, handling ties
2. Group and order the result of the self-join on rank:

In a table of track event results, rows marked as 'WR' in the `Note` column represent world record times. How would we retrieve the history of those world records?drop table if exists results;
CREATE TABLE results (
ID int UNSIGNED PRIMARY KEY AUTO_INCREMENT,
Name varchar(50) COLLATE utf8_unicode_ci NOT NULL,
Date date NOT NULL,
Time int NOT NULL,
Note varchar(50) COLLATE utf8_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=14 ;

From table tbl( class, member ), you want to cross-tabulate all classes with their members. In SQL terms, you aggregate members over classes. In MySQL:SELECT class,GROUP_CONCAT(member)
FROM tbl
GROUP BY class;

With that simple query, you're halfway toward cross-tabulation, halfway to implementing a simple CUBE, and halfway to basic entity-attribute-value (EAV) logic. This is easier to see if we have two columns, rather than just one, to tabulate against the grouping column:DROP TABLE IF EXISTS tbl;

A pivot (or crosstab, or contingency) table aggregates sets of column values into rows of statistics, and pivots target value statistics on partitioning criteria defined by any available data.

Spreadsheet applications have intuitive point-and-click interfaces for generating pivot tables. RDBMSs generally do not. The task looks difficult in SQL, though, only until you have coded a few.

If you ported the Microsoft sample database Northwind to your MySQL database (as described in chapter 11 of Get It Done with MySQL), you can execute this example step by step. Even if you haven't ported Northwind, the example is easy to follow.

Data designs often require flexibility in numbers and names of data points per instance row: instead of saving all the data points belonging to a key value in a single row, you save each data point as a name-value pair in its own row.

because in a pair of backslashes, the second is not escaped by the first, so to compare two literals you double each backslash in the RLIKE argument. But if you are querying a table for such a string from the MySQL client, this doubling happens twice--once in the client, and once in the database--so to find a column value matching 'a\\b', you need to write...SELECT desc FROM xxx WHERE desc RLIKE 'aa\\\\\\\\bb';

This query UNIONs queries for matching column names from two tables, and keeps just those rows which occur once in the union. Those are the rows with unmatched data. Customise your column list { id, col1, col2, col3 ...} as desired, but usually you'll want it to start with the primary key: SELECT MIN(TableName) as TableName, id, col1, col2, col3, ...
FROM (
SELECT 'Table a' as TableName, a.id, a.col1, a.col2, a.col3, ...
FROM a
UNION ALL
SELECT 'Table b' as TableName, b.id, b.col1, b.col2, b.col3, ...
FROM b
) AS tmp
GROUP BY id, col1, col2, col3, ...
HAVING COUNT(*) = 1
ORDER BY 1;

To compare columns by name and ordinal position in tables test.t1 and test.t2:SELECT
MIN(TableName) AS 'Table',
column_name AS 'Column',
ordinal_position AS 'Position'
FROM (
SELECT
't1' as TableName,
column_name,
ordinal_position
FROM information_schema.columns AS i1
WHERE table_schema='test' AND table_name='t1'

One of EF Codd's rules for relational databases is the no-back-door rule: all info about tables should be accessible only by a query on tables. Since version 5, the MySQL implementation of information_schema (I_S) helps meet Codd's requirement. I_S supplies metadata in tables, so it's the first place to look for how to compare the structures of two databases.

Elsewhere on this page there is a simple query template for comparing data in two structurally similar tables:SELECT MIN(TableName) as TableName, id, col1, col2, col3, ...
FROM (
SELECT 'Table a' as TableName, a.id, a.col1, a.col2, a.col3, ...
FROM a
UNION ALL
SELECT 'Table b' as TableName, b.id, b.col1, b.col2, b.col3, ...
FROM b
) AS tmp

Generate SQL to replace 'old_text' with 'new_text' in all character columns in 'dbname':Select Concat( 'update ', table_schema, '.', table_name,
' set ', column_name,
'=replace(', column_name, ',''old_text'',''new_text'');'
)
From information_schema.columns
Where (data_type Like '%char%' or data_type like '%text' or data_type like '%binary')
And table_schema = 'dbname';

Referential integrity (RI) is data consistency between related tables. ACID-compliant database engines like InnoDB provide foreign keys to automate RI maintenance. Simple example: customers have orders, which have items; by defining a customers foreign key in the orders table and an orders foreign key in the items table, you ensure that no orders or items row gets orphaned.

To change a foreign key, first drop it, then declare the new, revised foreign key. The syntax for declaring a foreign key is ...[CONSTRAINT [constraint_name]]
FOREIGN KEY [key_name] (keycol_name,...) reference_definition

and the syntax for dropping one is ...DROP FOREIGN KEY constraint_name

Notice that you can omit the CONSTRAINT when you declare a foreign key, but the only way to DROP a foreign key is to reference it by the constraint_name which you probably never specified!

There should be a circle of hell reserved for designers who build inconsistencies like this into their tools. The only way round this one is to run SHOW CREATE TABLE to find out what the foreign key's constraint_name is, so you can write the DROP statement. Here is a wee test case:

Starting with MySQL 5, you can find all tables which have foreign key references to a given table with an information_schema query, here encapsulated in a stored procedure which takes two arguments, a database name and a table name. When the table argument is blank or NULL, the procedure returns all parent-child links where the parent table is in the specified database; otherwise it returns parent-child links for the specified parent table:DROP PROCEDURE IF EXISTS ListChildren;
DELIMITER |
CREATE PROCEDURE ListChildren( pdb CHAR(64), ptbl CHAR(64) )
BEGIN
IF ptbl = '' OR ptbl IS NULL THEN
SELECT
c.table_schema as 'Parent Schema',
u.referenced_table_name as 'Parent Table',
u.referenced_column_name as 'Parent Column',
u.table_schema as 'Child Schema',
u.table_name as 'Child Table',

List tables which are referenced by foreign key constraints in a given table.This is a simple query on two information_schema tables: table_constraints and key_column_usage. It is easy to parameterise, so we show it in stored procedures. The first sproc lists all foreign key references in a database. The second lists all foreign key references for a table.CREATE PROCEDURE ListParentsForDb( pdb CHAR(64) )
SELECT
u.table_schema AS 'Schema',
u.table_name AS 'Table',
u.column_name AS 'Key',
u.referenced_table_schema AS 'Parent Schema',
u.referenced_table_name AS 'Parent table',
u.referenced_column_name AS 'Parent key'
FROM information_schema.table_constraints AS c
JOIN information_schema.key_column_usage AS u
USING( constraint_schema, constraint_name )

The MySQL SHOW TABLES command is fine, but sometimes we want a little more information.

This simple stored procedure lists the table name, engine type, version, collation and rowcount for every table in a database. (Individual databases come and go, so we keep all such database-wide stored routines in a system database.)DROP PROCEDURE IF EXISTS showtables;
CREATE PROCEDURE showtables()
SELECT
table_name AS 'Table',
IFNULL(engine, 'VIEW') AS Engine,
version AS Version,
table_collation AS Collation,
table_rows AS Rows

The simplest support for counting business days between any two dates is a calendar table calendar(d date,isholiday bool) populated for all days in all possibly relevant years. Then the following query gives the inclusive number of business days between dates dStart and dStop:SELECT COUNT(*)
FROM calendar
WHERE d BETWEEN dStart AND dStop
AND DAYOFWEEK(d) NOT IN(1,7)
AND isholiday=0;

If that solution is not available, you have to do with a weekday count, which this function (corrected 6 Jul 2009) computes:

Date arithmetic is deceptively hard. One way to appreciate the difficulties is to read Chapter 21 in our book. Another is to try to calculate the number of Tuesdays (or another weekday) between two dates. It's not a back-of-the-napkin problem.

An earlier formula we had for this problem sometimes gave incorrect results. As a debugging aid, we wrote a brute force calculator for the problem:SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS DayCount;
DELIMITER |

Find the difference between two datetime values in seconds, minutes, hours or days. If dt1 and dt2 are datetime values of the form 'yyyy-mm-dd hh:mm:ss', the number of seconds between dt1 and dt2 isUNIX_TIMESTAMP( dt2 ) - UNIX_TIMESTAMP( dt1 )

To get the number of minutes divide by 60, for the number of hours divide by 3600, and for the number of days, divide by 3600 * 24.

You track resource booking periods. You need a query to report daily usage for a given resource.

First the problem of calculating per-diem usage. Call the starting datetime of a booked period pStart, and its ending datetime pEnd. Then for a given date pDate, if the period began before pDate. then pDate usage begins at 00:00:00, otherwise it starts at pStart; likewise if the period extends past pDate, then pDate usage ends at midnight on pDate, otherwise it ends at pEnd. Therefore the period begins at...IF( pStart < pDate, CAST(pDate AS DATETIME ), pStart )

A table tracks beginning and ending booking dates dfrom and dto. How to sum booking durations in days for a given month, say July?

A booking may begin and end in the target month, or begin, or end, or neither:select
sum(
case
when month(dfrom)=7 and month(dfrom)=month(dto) then datediff(dto,dfrom)
when month(dfrom)=7 then datediff(last_day(dfrom),dfrom)
when month(dto)=7 then datediff(dto,date_format(dto,'%Y-%m-01'))
else 0
end
) as DaysInJuly
from ...

Given a date and its weekday number (1=Sunday, ..., 7=Saturday), there are three possibilities:
1. Today is Thursday: then next Thursday is 7 days from now.
2. Today is before Thursday: then next Thursday is (5 minus today's weekday number) from now.
3. Today is after Thursday: then next Thursday is 7 + (5 minus today's weekday number).set @d=curdate();
set @n = dayofweek(curdate());
select
@d:=adddate(curdate(),0) as date,
@n:=dayofweek(adddate(curdate(),0)) as weekday,

We often need to compute datetimes based on year and month. This tiny function simplifies such queries:set global log_bin_trust_function_creators=1;
create function yearmonth(d date) returns int
return 100*year(d)+month(d);

Then to find date values within the three-month period bounded by the first day of last month and the last day of next month, write ...select d
from tbl
where yearmonth(d) between yearmonth(curdate()-interval 1 month) and yearmonth(curdate()+interval 1 month);

A table that tracks time periods may require period uniqueness. That means it has no sequenced duplicates.

If a table has columns processID, start_date and end_date, those three columns are period unique if there exists no pair of rows with the same processID and overlapping start_date and end_date values. If there is such a pair of rows, the table exhibits sequenced duplication.

Another way of saying it: if an instant is the smallest datetime unit of start_date and end_date columns, then if there are no sequenced duplicates, there is exactly one processID value at any instant.

Here is a query to find sequenced duplicates for those columns:SELECT t.processid
FROM tbl t
WHERE EXISTS (
SELECT * FROM tbl AS t3
WHERE t3.processid IS NULL
)
OR EXISTS (
SELECT * FROM tbl AS t1
WHERE 1 < (
SELECT COUNT(processid)
FROM tbl AS t2
WHERE t1.processid = t2.processid
AND t1.start_date < t2.end_date
AND t2.start_date < t1.end_date
)
);

Was employee 1 in or out at 12:30pm on 15 May 2014? Use a self-join to play the in and out events against the given datetime ...select if(count(1),'Yes','No' ) as InStore
from tbl a
join tbl b using(empno, clockdate)
where empno=1
and a.clockdate='2014-5-15'
and a.clocktime<='12:30:00' and a.clocktype='I'
and b.clocktime>'12:30:00' and b.clocktype='O';
+---------+
| InStore |
+---------+
| Yes |
+---------+

You have a visits table (id int, start datetime, end datetime), and you wish to track peak visit counts. A simple solution is to self-join on non-matching IDs and overlapping visit times, group by ID, then order by the resulting counts:SELECT a.id,group_concat(b.id) as Overlaps, count(b.id)+1 as OverlapCount
FROM visits a
JOIN visits b on a.id < b.id and a.start < b.end and b.start < a.end
GROUP BY a.id
ORDER BY OverlapCount DESC;

Given a clinic of physicians, patients and appointments, how to find an available appointment time for a given physician?

This is a variant of the [Not] Exists query pattern. Though we can write it with subqueries, performance will be crisper with a join. But finding data that is not there requires a join to data which is there. So in addition to tables for appointments, doctors and patients, we need a table of all possible appointment datetimes. Here's a schema illustrating the idea ...CREATE TABLE a_dt ( -- POSSIBLE APPOINTMENT DATES AND TIMES
d DATE,
t TIME
);
CREATE TABLE a_drs ( -- DOCTORS
did INT -- doctor id
);
CREATE TABLE a_pts ( -- PATIENTS

You have a schedule table (period, day, subject, room) with a primary key period,day to avoid duplicate bookings. You wish to display the schedule as periods, subjects and rooms in rows, and days of the week in columns.SELECT
period,
MAX(IF(day=1, CONCAT(subject,' ',room), '')) AS Mon,
MAX(IF(day=2, CONCAT(subject,' ',room), '')) AS Tue,
MAX(IF(day=3, CONCAT(subject,' ',room), '')) AS Wed,
MAX(IF(day=4, CONCAT(subject,' ',room), '')) AS Thu,
MAX(IF(day=5, CONCAT(subject,' ',room), '')) AS Fri
FROM schedule
GROUP BY period

MAX() chooses existing over blank entries, and GROUP BY lines everything up on the same row.

Open Item: match payments against individual charges, typically by carrying the charge number in the payments table

Statement: list and sum all charges and all payments, and show the difference as the outstanding balance.

The Open Item method needs a foolproof way to match payments to charges, but what if the customer neglected to return a copy of the invoice, or to write the invoice number on the cheque? Reconciliation staff spend much of their time resolving such problems.

Can we help? Yes! It won't be entirely foolproof, but it will drastically cut down the onerous work of reconciliation.

Show parents, children and grandchildren including parents without childrenSELECT parent.id AS ParentID,
IFNULL(child.parent_id,') AS ChildParentID,
IFNULL(child.id,') AS ChildID,
IFNULL(grandchild.child_id,') AS GrandchildChildID
FROM parent
LEFT JOIN child ON parent.id=child.parent_id
LEFT JOIN grandchild ON child.id=grandchild.child_id;

You have n student classes of known size, and m classrooms of known size, where m>=n. What's the best algorithm for assigning as many classes as possible to rooms of adequate size?

It's a version of the combinatorial knapsack problem. It's known to be NP-complete, which means it's possible to verify any correct solution but there is no known algorithm for quickly finding a correct solution. How then to proceed?

Early in 2010 Joe Celko resurrected the problem in a Simple Talk column, and challenged readers to improve on SQL Server solutions he'd published in the third edition of his "SQL for Smarties". Here's his small version of the problem modified for MySQL:DROP TABLE IF EXISTS Rooms, Classes;
CREATE TABLE Rooms(
room_nbr CHAR(2) NOT NULL PRIMARY KEY, room_size INTEGER NOT NULL
) ENGINE=MyISAM;
CREATE TABLE Classes(

Data-driven table relationships are hard to maintain, but sometimes they cannot be avoided. How do we build joins for them? One way is to use a CASE statement in the SELECT list to handle the joining possibilities. In this example, the parent.linktable column determines the name of the table where a particular parent row's data is. The method is fine when the number of child tables is small:USE test;
DROP TABLE IF EXISTS parent, child1, child2;

We often want to know how certain column values associate with other column values, for example "What else did buyers of x buy?", or "What projects did Sean, Ian and Gerard all work on?"

Start with buyers of x. The table that summarises this information might be a View that encapsulates joins from customers to orders to orderitems to products, perhaps scoped on a recent date range. Here we ignore all such detail. We focus only on the SQL patterns that solve this kind of problem:DROP TABLE IF EXISTS userpurchases;
CREATE TABLE userpurchases( custID INT UNSIGNED, prodID INT UNSIGNED );
INSERT INTO userpurchases
VALUES (1,1),(1,2),(2,4),(3,1),(3,2),(4,2),(4,3),(5,1),(5,2),(5,3);
SELECT custID, GROUP_CONCAT(prodID ORDER BY prodID) AS PurchaseList
FROM userpurchases
GROUP BY custID;
+--------+--------------+

Usually, a JOIN is faster than an uncorrelated subquery. For example in the sakila test database, customer is a parent of rental (via customer_id) which in turn is a parent of payment (via rental_id). The subquery version of a query for whether a customer has made payments and rentals... SELECT DISTINCT c.customer_id
FROM customer c
WHERE c.customer_id IN (
SELECT r.customer_id
FROM rental r
JOIN payment p USING (rental_id)
WHERE c.customer_id = 599;
);

Given tables parent(id INT), child(id INT,parent_id INT), how do we find parents with no children? It's the All X for which there is no Y pattern, which can be written as an exclusion join...SELECT parent.id
FROM parent
LEFT JOIN child ON parent.id = child.parent_id
WHERE child.parent_id IS NULL;

or with a NOT EXISTS subquery, which is logically equivalent to the exclusion join, but usually performs much slower:SELECT parent.id AS ParentID
FROM parent
WHERE NOT EXISTS (
SELECT parent.id
FROM parent
JOIN child ON parent.ID = child.parent_id
);

You have a parties table that holds info on peoples' names etc, and a contracts table where each row has clientID and contractorID value pointing at a parties.partyID value--that is, each contracts row points at two parties rows. You want to list the names of all contractors and their clients.SELECT clientpartyID,
pCli.name AS Client,
contractorpartyID,
pCon.name AS Contractor
FROM contracts
INNER JOIN parties AS pCli
ON contracts.clientpartyID = pCli.partyID
INNER JOIN parties AS pCon
ON contracts.contractorpartyID = pCon.partyID;

You track orders and their items in orders and orderdetails tables, as in the NorthWind database. How many of your orders have been for multiple items? We can use the standard SQL IN() operator to answer the question:SELECT orderID
FROM orders
WHERE orderID IN (
SELECT orderID
FROM orderdetails
GROUP BY orderID
HAVING COUNT(orderID) > 1
);

Given a table employee( employeeID INT, mgr_employeeID INT, salary DECIMAL(10,2)), find the managers who earn less than one or more of their subordinates.

We can write this query directly from the logic of its spec...SELECT DISTINCT employeeID
FROM employee AS e
WHERE EXISTS (
SELECT employeeID
FROM employee AS m
WHERE m.mgr_employeeID = e.employeeID AND e.salary > m.salary
);

We have a students table, an exams table, and a registrations table in which each row registers one student for one exam. How do we find the exams for which a particular student is not registered? DROP TABLE IF EXISTS students, exams, registrations;
CREATE TABLE students (
sid int(10) unsigned PRIMARY KEY auto_increment,
firstname varchar(45) NOT NULL default '',
lastname varchar(45) NOT NULL default ''
);
INSERT INTO students VALUES
(1, 'Jack', 'Malone'),(2, 'Hiro', 'Nakamura'),(3, 'Bree', 'Van de Kamp'),
(4, 'Susan', 'Mayer'),(5, 'Matt', 'Parkman'),(6, 'Claire', 'Bennet');

You have parties and contracts tables. Every contracts row has a contractorpartyID value which references a row in parties, and a clientpartyID value which also references a row in parties. How to list all parties and their contracts, showing blanks as empty strings rather than NULLs?SELECT parties.partyID,
IFNULL(contractorpartyID,'') AS contractor,
IFNULL(clientpartyID,'') AS client
FROM parties
LEFT JOIN contractor_client ON partyID=contractorpartyID
ORDER BY partyID;
+---------+------------+--------+
| partyID | contractor | client |
+---------+------------+--------+
| 1 | | |
| 2 | 2 | 1 |
| 3 | | |
+---------+------------+--------+

ISO SQL defines a ROW_NUMBER() OVER function with an optional PARTITION clause for generating a derived row number column in a resultset. Several RDBMSs—including DB2, Oracle and SQL Server—implement it. Here is the simplest possible example. Given a table with two columns i and j, generate a resultset that has a derived sequential row_number column taking the values 1,2,3,... for a defined ordering of j which resets to 1 when the value of i changes: DROP TABLE IF EXISTS test;
CREATE TABLE test(i int,j int);
INSERT INTO test
VALUES (3,31),(1,11),(4,14),(1,13),(2,21),(1,12),(2,22),(3,32),(2,23),(3,33);

You have a table of names, you have retrieved a row with name $name, and you want the row for the next name in name order. MySQL LIMIT syntax makes this very easy:SELECT *
FROM tbl
WHERE name > $name
ORDER BY name
LIMIT 1

Suppose you have a phone book of names, addresses, etc. You are displaying 20 rows per page, you're on page 100, and you want to display page 99. How do you do this knowing only what page you are on?

Assuming...

1-based page numbers

you are on page P

each page shows N rows

then the general formula for translating a 1-based page number into a first LIMIT argument is ... MAX(0,P-1) * N

which for the 99th 20-row page evaluates to 1960, and the second argument to LIMIT is just N, so to see page 99, write... SELECT ... LIMIT (1960, N);

The trouble with this is scaling. MySQL doesn't optimise LIMIT at all well. SELECT ... LIMIT 1000000,20 will unfortunately retrieve not just the twenty rows starting at the millionth row; it will retrieve a million and twenty rows before it shows you the 20 you asked for! The bigger the result, the longer LIMIT takes.

What's the alternative? Build pagination logic into the WHERE clause, and ensure that there is a covering index for the paginating column. On a table of 100,000 indexed random integers, SELECT ... WHERE ... for the last 20 integers in the table is twice as fast as the comparable LIMIT query. With a million integers, it's more than 500 times faster!

If your interface calls for showing only 20 rows per page on a given order, retrieve the twenty rows, plus the row just before the set if it exists, plus the next row after those twenty if it exists. When the user clicks the Previous Page button, adjust the WHERE clause to specify rows where the key value is <= the row just before the current set and ORDER BY the current index DESC LIMIT 20; likewise when the user clicks the Next Page button, have the WHERE clause specify key values >= that of the row just after the set, and ORDER BY the current index ASC LIMIT 20.

You have an election database with tables listing political parties, election districts, and candidates running for parties in those districts. You want to know which parties have candidates running in all districts. Under Aggregates we show a GROUP BY solution (here).

If there are reasons not to aggregate, relational division can solve the problem. The basic idea in relational division is that, aside from aggregation, SQL has no direct way to express "all Xs for which all Y are Z", but does have a NOT EXISTS operator, so we can express "all Xs for which all Y are Z" in SQL as a double negative: "all Xs for which no Y is not Z". Once you think of formulating the question this way, the query almost writes itself:SELECT DISTINCT party FROM parties
WHERE NOT EXISTS (
SELECT * FROM districts
WHERE NOT EXISTS (
SELECT * FROM candidates
WHERE candidates.party=parties.party AND candidates.district=districts.district
)
);

One way to arrive at the answer is by asking: What are the assembly-supplier pairs such that no part of the assembly is not made by the supplier? That's relational division again, formulated for two tables by Stephen Todd. Given assemblyparts(assembly,part) and partsuppliers(part,supplier) tables, here is a query that Joe Celko credits to Pierre Mullin.SELECT DISTINCT
AP1.assembly,
SP1.supplier
FROM AssemblyParts AS AP1, PartSuppliers AS SP1
WHERE NOT EXISTS (
SELECT *
FROM AssemblyParts AS AP2
WHERE AP2.assembly = AP1.assembly
AND NOT EXISTS (
SELECT SP2.part
FROM PartSuppliers AS SP2
WHERE SP2.part = AP2.part AND SP2.supplier = SP1.supplier
)
);

You have a table tbl(id int) with values (1,2,4,18,19,20,21), and you wish to find the first missing number in its sequence of id values:SELECT t1.id+1 AS Missing
FROM tbl AS t1
LEFT JOIN tbl AS t2 ON t1.id+1 = t2.id
WHERE t2.id IS NULL
ORDER BY id LIMIT 1;
+---------+
| Missing |
+---------+
| 3 |
+---------+

Given a table t(ID int, ...), how would we display each ID and its next highest value? A simple method uses aggregation on a non-equi-self-join to find the smallest value of all greater values for each value:SELECT id, MIN(b.id) AS Next
FROM t AS a
JOIN t AS b ON b.id > a.id
GROUP BY a.id;

Sometimes we need next values of a column on a given row ordering. Oracle has a LEAD(...) OVER(...) construct to simplify this query. MySQL does not. The logic is:

1. Form a resultset consisting of all relevant rows, joined with all relevant rows that have greater values in the ordering columns. For example, if the table has these rows:+------+
| 2 |
| 4 |
| 6 |
| 8 |
| 10 |
+------+

A traditional way to find the first and last values of column value sequences in a table like this ...drop table if exists t;
create table t(id int);
insert into t values(1),(2),(3),(4),(6),(7),(8);

... uses an exclusion join on the previous sequential value to find the first value of each sequence, and the minimum next value from a left join and an exclusion join on the previous sequential value to find the end of each sequence:SELECT

You have a table tbl with a column whose values can be freely overwritten and which needs to be filled with perfectly sequential values starting with 1:SET @i=0;
UPDATE tbl SET keycol=(@i:=@i+1);

But more often, what you need is a mechanism for guaranteeing an unbroken sequence of integer key values whenever a row is inserted.

Here is the bad news: auto_increment does not provide this logic, for example it does not prevent inserton failures, editing of the column, or subsequent deletion.

The good news is that the logic for guaranteeing an unbroken sequence is straightforward:

(i) use InnoDB

(ii) create a table with the desired sequence in one column and further column(s) to track use (to use such a table for sequential columns in multiple other tables, provide a tracking column for each such table)

(iii) write Triggers to fetch the next available sequential key from that table, mark it used, and assign it on Insert

Find the distance in kilometres between two points on the surface of the earth. This is just the sort of problem stored functions were made for. For a first order approximation, ignore deviations of the earth's surface from the perfectly spherical. Then the distance in radians is given by a number of trigonometric formulas. ACOS and COS behave reasonably: COS(lat1-lat2)*(1+COS(lon1-lon2)) - COS(lat1+lat2)*(1-COS(lon1-lon2))
rads = ACOS( --------------------------------------------------------------------- )
2

We need to convert degrees latitude and longitude to radians, and we need to know the length in km of one radian on the earth's surface, which is 6378.388. The function:set log_bin_trust_function_creators=TRUE;

You have a parties table that holds info on peoples' names etc, and a contracts table in which each row defines one contract, identifying a client as clientpartyID and a contractor as contractorpartyID, each of these a foreign key referencing parties.partyID. You want a list of parties showing how many contracts they have participated in as client, and how many they've participated in as contractor.SELECT
p.partyID,
p.name,
(SELECT COUNT(*) FROM contractor_client c1 WHERE c1.clientpartyID = p.partyID )
AS ClientDeals,
(SELECT COUNT(*) FROM contractor_client c2 WHERE c2.contractorpartyID = p.partyID)
AS ContractorDeals
FROM parties p
ORDER BY partyID;

In the Sakila table film, retrieve a top-down percentile ranking of film lengths:SELECT
a.film_id ,
ROUND( 100.0 * ( SELECT COUNT(*) FROM film AS b WHERE b.length <= a.length ) / total.cnt, 1 )
AS percentile
FROM film a
CROSS JOIN (
SELECT COUNT(*) AS cnt
FROM film
) AS total
ORDER BY percentile DESC;

A user variable can maintain a per-row cumulative sum of column values. Initialise it, then adjust its value as desired in the appropriate SELECT expression:SET @total=0;
SELECT id, value, @total:=@total+value AS RunningSum
FROM tbl;

Chequebook balancing programs often use this pattern. This one tracks the running balance and how much money from the most recent deposit remains:DROP TABLE IF EXISTS chequebook;

You often need to sum across several categories to total customer purchase amounts, salesperson sales amounts, political party election spending, etc.

For this example assume three tables: candidates, parties and ridings. You want to get the total amount spent in all ridings by every party in one output row. Here is the schema:CREATE TABLE candidates (
id int(11) NOT NULL default '0',
`name` char(10) ,
riding char(12) ,
party char(12) ,
amt_spent decimal(10,0) NOT NULL default '0',
PRIMARY KEY (id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

We often want to know the top 1, 2, 10 or whatever values from a query. This is dead simple in MySQL. However many JOINs and WHEREs the query has, simply ORDER BY the column(s) whose highest values are sought, and LIMIT the resultset: SELECT (somecolumn), (othercolumns) ...
FROM (some tables) ...
ORDER BY somecolumn DESC
LIMIT 10;

For this example, the query string should be of the form:SELECT ... FROM ... WHERE ... IN (

but so long as it has those elements, it can be as complex as you like. When you call the sproc:
1. Quote each argument with a pair of single quotes,
2. Separate these quoted arguments with commas,
3. Surround the whole param string with another set of single quotes: CALL passInParam( 'SELECT * FROM tbl WHERE colval IN (', ('''abc'',''def'',''ghi''' ));

If ip is for the form N.N.N.N where N may be 1, 2 or 3 digits, how to group and count by just the first three octets, ie the ip class?SELECT
LEFT(ip, CHAR_LENGTH(ip) - LOCATE('.', REVERSE(ip))) as ipclass,
COUNT(*)
FROM tbl
GROUP BY ipclass;

Hamilton Turner notes we can find the first octet with LEFT(ip,LOCATE('.',ip)-1).