10 SQL Tricks that You Didn't Think Were Possible

SQL is the winning language of Big Data. Whether you’re running a classic relational database, a column store (“NewSQL”), or a non-relational storage system (“NoSQL”), a powerful, declarative, SQL-based query language makes the difference. The SQL standard has evolved drastically in the past decades, and so have its commercial and open source implementations.
In this fast-paced talk, we’re going to look at very peculiar and interesting data problems and how we can solve them with SQL. We’ll explore common table expressions, hierarchical SQL, table-valued functions, lateral joins, row value expressions, window functions, and advanced data types, such as XML and JSON. And we’ll look at Oracle’s mysterious MODEL and MATCH_RECOGNIZE clauses, devices whose mystery is only exceeded by their power. Most importantly, however, we’re going to learn that everyone can write advanced SQL. Once you learn the basics in these tricks, you’re going to love SQL even more.

5.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Why do I talk about SQL?
SQL is the only ever successful,
mainstream, and general-
purpose 4GL (Fourth-
Generation Programming
Language)
And it is awesome!

8.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
What is SQL?
What is
SQL?

9.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
What is SQL?
SQL is the original
microservice

10.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
What is SQL?
SQL is the original
microservice
Just install a single stored
procedure in an Oracle XE
instance, deploy, done.

20.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Seriously, what does that mean?
Any program
can be written
in SQL!
(although, no one’s that crazy)

21.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
The strength of a 4GL language
You tell the
machine WHAT,
not HOW

22.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Which do you feel is more awesome? This?
Siri, what is the
meaning of life?
Image credit: https://www.flickr.com/photos/procsilas/12821454664 By Procsilas Moscas. License CC-BY 2.0

23.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Which do you feel is more awesome? Or this?
Image credit: https://www.flickr.com/photos/ajmexico/3281139507 By ajmexico. License CC-BY 2.0

24.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
That’s why the company is called “Oracle”

37.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Remember this from this talk: The SQL muscle
Image credit: https://www.flickr.com/photos/flamephoenix1991/8376271918 By _DJ_. License CC-BY SA 2.0
This is the SQL muscle.
It needs constant training
and practice

38.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Remember this from this talk: The SQL muscle
Image credit: https://www.flickr.com/photos/mikecogh/6684205707 By Michael Coghlan. License CC-BY SA 2.0
It is the same for the Java muscle

40.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
What you came here for
Enough bla bla
What you came
here for...

41.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
10 SQL tricks to convince you SQL is awesome
1. Everything is a table
2. Data generation with recursive SQL
3. Running total calculations
4. Finding the length of a series
5. Finding the largest series with no gaps
6. The subset sum problem with SQL
7. Capping a running total
8. Time series pattern recognition
9. Pivoting and unpivoting
10.Abusing XML and JSON (don’t do this at home)

69.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
3. Running total calculations
What is a running total?
Ask your project manager to give
you a crash course about the
awesome Microsoft Excel!

70.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
3. Running total calculations
What is a running total?
Ask your project manager to give
you a crash course about the
awesome Microsoft Excel!

71.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
3. Running total calculations
But first, a little theory about
window functions
There was SQL before window
functions and there was SQL after
window functions.

72.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
3. Running total calculations
-- Aggregations / rankings on a subset of
-- rows relative to the current row being
-- transformed by SELECT
function(...) OVER (
PARTITION BY ...
ORDER BY ...
ROWS BETWEEN ... AND ...
)
What are window functions?

73.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
3. Running total calculations
-- Aggregations / rankings on a subset of
-- rows relative to the current row being
-- transformed by SELECT
function(...) OVER (
PARTITION BY length
ORDER BY ...
ROWS BETWEEN ... AND ...
)

74.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
3. Running total calculations
-- Aggregations / rankings on a subset of
-- rows relative to the current row being
-- transformed by SELECT
function(...) OVER (
PARTITION BY length
ORDER BY ...
ROWS BETWEEN ... AND ...
)

75.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
3. Running total calculations
-- Aggregations / rankings on a subset of
-- rows relative to the current row being
-- transformed by SELECT
function(...) OVER (
PARTITION BY ...
ORDER BY title
ROWS BETWEEN ... AND ...
)

76.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
3. Running total calculations
-- Aggregations / rankings on a subset of
-- rows relative to the current row being
-- transformed by SELECT
function(...) OVER (
PARTITION BY ...
ORDER BY title
ROWS BETWEEN ... AND ...
)

77.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
3. Running total calculations
-- Aggregations / rankings on a subset of
-- rows relative to the current row being
-- transformed by SELECT
row_number( ) OVER (
PARTITION BY ...
ORDER BY title
ROWS BETWEEN ... AND ...
)

94.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Easy...
4. Finding the largest series with no gaps
SELECT DISTINCT
cast(login_time AS DATE) AS login_date
FROM logins
WHERE user_id = :user_id

96.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Still easy...
4. Finding the largest series with no gaps
SELECT
login_date,
row_number() OVER (ORDER BY login_date)
FROM login_dates

97.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Now, what happens if we subtract...?
4. Finding the largest series with no gaps
SELECT
login_date -
row_number() OVER (ORDER BY login_date)
FROM login_dates

100.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
4. Finding the largest series with no gaps
Such consecutive
Much row number
So gap wow

101.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Easy explanation:
1. ROW_NUMBER() never has gaps
2. Our data, however, does
4. Finding the largest series with no gaps

102.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So, just group by this difference!
4. Finding the largest series with no gaps
SELECT
min(login_date), max(login_date),
max(login_date) -
min(login_date) + 1 AS length
FROM login_date_groups
GROUP BY grp
ORDER BY length DESC

123.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
5. Finding the length of a series
SELECT
trx.*,
CASE WHEN lag(sign)
OVER (ORDER BY id DESC) != sign
THEN rn END AS lo,
CASE WHEN lead(sign)
OVER (ORDER BY id DESC) != sign
THEN rn END AS hi,
FROM trx

124.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
5. Finding the length of a series
SELECT -- With NULL handling...
trx.*,
CASE WHEN coalesce(lag(sign)
OVER (ORDER BY id DESC), 0)!=sign
THEN rn END AS lo,
CASE WHEN coalesce(lead(sign)
OVER (ORDER BY id DESC), 0)!=sign
THEN rn END AS hi,
FROM trx

131.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
5. Finding the length of a series
SELECT
trx.*,
last_value (lo) IGNORE NULLS OVER (
ORDER BY id DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) AS lo,
first_value(hi) IGNORE NULLS OVER (
ORDER BY id DESC
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) AS hi
FROM trx

132.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
5. Finding the length of a series
SELECT -- With NULL handling...
trx.*,
coalesce(last_value (lo) IGNORE NULLS OVER (
ORDER BY id DESC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW), rn) AS lo,
coalesce(first_value(hi) IGNORE NULLS OVER (
ORDER BY id DESC
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING), rn) AS hi
FROM trx

183.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
7. Capping a running total
SELECT *
FROM (
SELECT date, amount, 0 AS total
FROM amounts
)
MODEL
DIMENSION BY (row_number() OVER (ORDER BY date) AS rn)
-- Individually enumerate each row with a row number

193.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
8. Time series pattern recognition
Trigger on the 3rd
repetition of an event if
the event occurs more
than 3 times.

196.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
8. Time series pattern recognition
SELECT *
FROM series
MATCH_RECOGNIZE (
ORDER BY ...
-- Pattern matching is done in this order
)

197.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
8. Time series pattern recognition
SELECT *
FROM series
MATCH_RECOGNIZE (
ORDER BY ...
MEASURES ...
-- These are the columns produced by matches
)

198.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
8. Time series pattern recognition
SELECT *
FROM series
MATCH_RECOGNIZE (
ORDER BY ...
MEASURES ...
ALL ROWS PER MATCH
-- A short specification of what rows are
-- returned from each match
)

225.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
All others
9. Pivoting and unpivoting
SELECT
first_name, last_name,
count(CASE rating WHEN 'NC-17' THEN 1 END) AS "NC-17",
count(CASE rating WHEN 'PG' THEN 1 END) AS "PG",
count(CASE rating WHEN 'G' THEN 1 END) AS "G",
count(CASE rating WHEN 'PG-13' THEN 1 END) AS "PG-13",
count(CASE rating WHEN 'R' THEN 1 END) AS "R"
FROM actor AS a
JOIN film_actor AS fa USING (actor_id)
JOIN film AS f USING (film_id)
GROUP BY actor_id

230.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
9. Pivoting and unpivoting
Pivoting:
Values from a single column become
columns containing aggregations
Unpivoting:
Columns become values in a single column

244.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
10 SQL tricks to convince you SQL is awesome
1. Everything is a table
2. Data generation with recursive SQL
3. Running total calculations
4. Finding the length of a series
5. Finding the largest series with no gaps
6. The subset sum problem with SQL
7. Capping a running total
8. Time series pattern recognition
9. Pivoting and unpivoting
10.Abusing XML and JSON (don’t do this at home)

247.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
10 SQL tricks to convince you SQL is awesome
Noun
maze (plural mazes)
1. A labyrinth; a puzzle consisting of a
complicated network of paths or passages, the
aim of which is to find one's way.
2. Something made up of many confused or
conflicting elements; a tangle.
3. Confusion of thought; perplexity; uncertainty;
state of bewilderment.

248.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
10 SQL tricks to convince you SQL is awesome
Noun
maze (plural mazes)
1. A labyrinth; a puzzle consisting of a
complicated network of paths or passages, the
aim of which is to find one's way.
2. Something made up of many confused or
conflicting elements; a tangle.
3. Confusion of thought; perplexity; uncertainty;
state of bewilderment.

249.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Why do I talk about SQL?
SQL is the only ever successful,
mainstream, and general-
purpose 4GL (Fourth-
Generation Programming
Language)
And it is awesome!

250.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
Why do I talk about SQL?
Not a single,
explicit algorithm!

256.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database?

257.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes

258.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database?

259.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes

260.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
http://www.jooq.org/training

261.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database?

262.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database? Yes

263.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database? Yes
(... unless you're using MySQL)

264.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database? Yes
(... unless you're using MySQL)
4. Should you do it in the database?

265.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database? Yes
(... unless you're using MySQL)
4. Should you do it in the database? No

266.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database? Yes
(... unless you're using MySQL)
4. Should you do it in the database? No
JUST
KIDDING!

267.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database? Yes
(... unless you're using MySQL)
4. Should you do it in the database? Yes

268.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database? Yes
(... unless you're using MySQL)
4. Should you do it in the database? Yes
5. Do listicles attract attention?

269.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database? Yes
(... unless you're using MySQL)
4. Should you do it in the database? Yes
5. Do listicles attract attention? Yes

270.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database? Yes
(... unless you're using MySQL)
4. Should you do it in the database? Yes
5. Do listicles attract attention? Yes
6. Will this talk ever end?

271.
Copyright (c) 2009-2016 by Data Geekery GmbH. Slides licensed under CC BY SA 3.0
So what’s the key takeaway?
1. Can you do it in the database? Yes
2. Can you do it in the database? Yes
(... after visiting my 2 day SQL training)
3. Can you do it in your database? Yes
(... unless you're using MySQL)
4. Should you do it in the database? Yes
5. Do listicles attract attention? Yes
6. Will this talk ever end? Yes