My Favorite PostgreSQL Queries and Why They Matter

Databases, tables, normalization, and a solid backup plan allow us to store and maintain data.

Those combined best practices, in turn, allow us to interact with that data. In today's data-driven world, data is valuable. Not only valuable, data is oftentimes critical to the end-user solutions provided by products and services. Extracting insight, answering questions, and deriving meaningful metrics from data by way of querying and data manipulation is an integral part of SQL.

This foundational skill is critical for success in any data-driven endeavor.

Below, I present a combination of 8 differing queries or types of queries I have found interesting and engaging while exploring, studying, and learning to manipulate data sets.

They are not listed in any order of importance.

Most will probably be familiar old friends. Perhaps some will become new acquaintances.

Sample tables and data used are not as important as the actual construction of the queries themselves and what each query returns, offers, or provides. Many of them are mock and derived for demonstration purposes and should not be taken literally in their values.

1. Left join, mind any nulls on the right...

Suppose in this example, we have a running sale of two months and are getting a total of both combined.

Yet, for some reason, the second month did not pull its weight, and we want to pinpoint the days when month one picked up the slack.

These sales are represented as tables payment and fake_month for this demonstration.

To note:

We will only check for totals greater than 2000.

We will limit the output to just 10 rows.

To start, we have a Common Table Expression (CTE) 'generating' the fake_month table for us, followed by the query itself.
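The original query is not reproduced here, so below is a hypothetical sketch of the idea: generate_series() fabricates the fake_month days, which are then LEFT JOINed against daily payment totals. The date range and column names are assumptions for demonstration.

```sql
-- Hypothetical reconstruction: fake_month is 'generated' via a CTE,
-- then LEFT JOINed to daily totals from the payment table.
WITH fake_month AS (
    SELECT generate_series('2007-02-01'::date,
                           '2007-02-28'::date,
                           interval '1 day')::date AS fake_day
)
SELECT fm.fake_day, p.sale_day, p.total
FROM fake_month fm
LEFT JOIN (
    SELECT payment_date::date AS sale_day,
           SUM(amount) AS total
    FROM payment
    GROUP BY payment_date::date
    HAVING SUM(amount) > 2000   -- only totals greater than 2000
) p ON p.sale_day = fm.fake_day
ORDER BY p.sale_day NULLS FIRST -- surface the 'missing' days first
LIMIT 10;                       -- limit the output to just 10 rows
```

Rows where sale_day comes back NULL are the days with no qualifying total on the right side of the join.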

Yes, that works. So why not just use that query instead? Why does this one matter?

I feel using a LEFT JOIN with ORDER BY ... NULLS FIRST for the table on the right side of the JOIN is a great way to explore unfamiliar tables and data sets.

Confirming first what data, if any, is 'missing' on that side of the join condition enhances clarity and awareness, allowing you to then filter the results set with a WHERE <column_name> IS NULL clause to finalize things.

Of course, familiarity with the tables and datasets could potentially eliminate the need for the LEFT JOIN presented here.

It's a worthy query for anyone utilizing PostgreSQL to at least try, during exploration.

2. String Concatenation

Concatenation, the joining or appending of two strings, provides a presentation option for results sets. Many 'things' can be concatenated.

However, as noted in the documentation, the string concatenation operator ('||') accepts non-string input as long as at least one operand is a string.
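A hypothetical illustration (column names are assumed from the customer table used elsewhere in this post):

```sql
-- '||' joins the strings; the doubled single quote ('') escapes
-- the apostrophe in the possessive "'s".
SELECT first_name || ' ' || last_name
       || '''s email address is: ' || email AS customer_details
FROM customer
LIMIT 3;
```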

Notice we had to escape the single quote used with apostrophe s, using an additional single quote to show possession of the email address for each customer.

Why should you know this?

There may be times when concatenating data gives you better insight into, and understanding of, the data set you are working with. Along with reporting options, concatenating datasets shared with others could potentially make the data more readable and digestible.

3. Supplying an IN values list with subqueries

A subquery has numerous powerful uses. Of those, providing an IN list of values to check for membership is a common one.

Here's a quick use.

Suppose we have customer and payments tables in a mock DVD rental store and want to reward our top five highest spending customers who rented movies during the days of April 10 - 13.

Imagine that's a special target period. So if the customer spent more than $30, we want to acknowledge them.

Bear in mind, there are other available options for solving this type of question (i.e., joins, capturing results from multiple SELECTs, etc.), yet subqueries can handle it as well.

We will start out with the whole shebang here. This complete query returns everything we want for this particular question.

dvdrental=> SELECT first_name, last_name, email
FROM customer
WHERE customer_id IN (
SELECT customer_id FROM (
SELECT DISTINCT customer_id, SUM(amount)
FROM payment
WHERE extract(month from payment_date) = 4
AND extract(day from payment_date) BETWEEN 10 AND 13
GROUP BY customer_id
HAVING SUM(amount) > 30
ORDER BY SUM(amount) DESC
LIMIT 5) AS top_five);

This example actually contains nested subqueries, one of which is a Derived Table.

Let's start by drilling into the innermost subquery, that Derived Table.

This subquery is a standalone SELECT statement all its own, returning a customer_id and a SUM() on the amount column.

Only those customers meeting the criteria checked by the WHERE and HAVING clauses make the cut, and they are further thinned out with LIMIT 5.

Why the next subquery you ask?

Can we not just use the WHERE customer_id IN portion of the outermost SELECT here?

Let's see with a hands-on approach.

I will remove the AS top_five from the subquery and try the outermost query with it now:

dvdrental=> SELECT first_name, last_name, email
FROM customer
WHERE customer_id IN
(SELECT DISTINCT customer_id, SUM(amount)
FROM payment
WHERE extract(month from payment_date) = 4
AND extract(day from payment_date) BETWEEN 10 AND 13
GROUP BY customer_id
HAVING SUM(amount) > 30
ORDER BY SUM(amount) DESC
LIMIT 5);
ERROR: subquery has too many columns
LINE 3: WHERE customer_id IN (

Here, IN membership is being tested with only the customer_id column, yet the Derived Table returns two columns and PostgreSQL lets us know.

One remedy is to use another subquery. Selecting only the customer_id from the Derived Table results set creates the next inner nested subquery.

Now the IN predicate contains multiple rows of a single column's values to check membership against in the WHERE clause for customer_id, producing the final results set.

Why does it matter?

Utilizing subqueries in this manner is powerful because of the sheer number of values that could potentially be tested with the IN() predicate.

Imagine if there were 100? Or more?

'Hard-coding' all of them in the IN() list could become problematic and error-prone as the volume of values increases.

4. generate_series()

This set returning function is handy and super fun to use and explore. I have used generate_series() in the above examples, but it deserves a discussion of its own, focusing more on the function and its capabilities.

I find generate_series() useful for comparative queries where some or all of the data is missing, or where only partial data is available at the time I am exploring. Another handy use is populating tables with 'dummy data'.
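Here is a quick hedged sketch of both uses; the mock_sales table and its columns are hypothetical, invented purely for demonstration:

```sql
-- One row per day across a date range:
SELECT generate_series('2018-04-01'::date,
                       '2018-04-10'::date,
                       interval '1 day')::date AS sale_day;

-- Populating a hypothetical table with 100 rows of dummy data:
INSERT INTO mock_sales (id, amount)
SELECT n, round((random() * 500)::numeric, 2)
FROM generate_series(1, 100) AS n;
```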

Suppose we need to update the member_status value for each record, appending '_group' to the end of the current value.

For starters, multiple individual UPDATE statements will accomplish this no problem.

But, so can a single CASE expression.

trial=> UPDATE reward_members
SET member_status = (
CASE member_status
WHEN 'gold' THEN 'gold_group'
WHEN 'bronze' THEN 'bronze_group'
WHEN 'platinum' THEN 'platinum_group'
WHEN 'silver' THEN 'silver_group'
END
)
WHERE member_status IN ('gold', 'bronze','platinum', 'silver');
UPDATE 8

You can imagine how many round trips this would take to the server if multiple individual UPDATE statements had been run. In truth, only 4 for this example. But still, the potential for many is always there.

Yet, using an UPDATE with CASE expression, we are sending only one query instead.

7. COPY and \copy

PostgreSQL provides COPY, a command for copying data between files and tables.

Be sure to visit the provided link to see the abundant number of options available with COPY.

An important note concerning COPY: the SUPERUSER role attribute is required to execute this command against server-side files.

The psql meta-command \copy is an alternative for users without this role attribute. We will visit that command shortly.

First, let's run a COPY command to export certain columns to a CSV file on the local machine.
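Run as a role with the SUPERUSER attribute (the postgres user here), such a command might look like this; the file path and columns mirror the non-superuser attempt shown further below:

```sql
-- Executed as a SUPERUSER role; exports two columns to a CSV file
-- on the server's local filesystem.
COPY (SELECT expense_amt, member_status
      FROM reward_members
      WHERE member_status = 'gold_group')
TO '/tmp/awards_to_honor.csv'
DELIMITER ','
CSV HEADER;
```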

However, one of my test roles without the SUPERUSER attribute ran into problems writing to the /tmp folder.

See below for confirmation:

trial=# SET role log_user; -- changing from postgres user to log_user
SET

Now attempting the same COPY command, writing to the /tmp folder

trial=> COPY (SELECT expense_amt, member_status
FROM reward_members
WHERE member_status = 'gold_group')
TO '/tmp/awards_to_honor2.csv'
DELIMITER ','
CSV HEADER;
ERROR: must be superuser to COPY to or from a file
HINT: Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.

Perhaps a better measure, as suggested in the HINT, for roles without the SUPERUSER attribute is the psql \copy meta-command.

Let's carry out a similar type of command with \copy instead, using the same role, without the need for the SUPERUSER attribute.
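A sketch of the equivalent \copy invocation (note that, as a psql meta-command, \copy must be written on a single line):

```sql
-- Works for any role: the file is read/written by psql on the client
-- side, so no SUPERUSER attribute is needed.
\copy (SELECT expense_amt, member_status FROM reward_members WHERE member_status = 'gold_group') TO '/tmp/awards_to_honor2.csv' DELIMITER ',' CSV HEADER;
```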

Importing data into PostgreSQL via files is a surefire bulk upload method. Although not all of them are covered in this blog post, both COPY and \copy offer several options for working with different file formats and extensions.

By the same token, exporting data from tables, or from specific columns, is easily handled with both of these commands as well.

8. psql \help meta-command

You're in a psql command-line session. Curious about the CREATE INDEX command syntax?
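The \h (shorthand for \help) meta-command answers exactly that question, straight from the prompt. An abbreviated sketch of a session (the exact synopsis varies by PostgreSQL version):

```sql
-- Ask psql for the syntax of any SQL command:
\h CREATE INDEX
-- Command:     CREATE INDEX
-- Description: define a new index
-- Syntax:
-- CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name
--     [ USING method ]
--     ( { column_name | ( expression ) } ... )
--     ...
```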

The fact that this meta-command is super easy to use, powerful, and convenient is reason enough to mention it here. It has saved me tons of time otherwise spent searching through other documentation. And of course, being a newbie, I use it quite often!

Conclusion

This is not an exhaustive list, nor the 'be-all and end-all' of queries and data manipulation.

It is only my take on those that pique my interest and speak to me as I continue to learn and grow into a SQL Developer role. I hope that through this blog post you will find use cases for the above queries and commands, implementing them where you see fit.

Joshua Otwell is a Guest Writer for Severalnines. While his day job of Pipeline Surveyor is not a technical one, he has a drive to migrate into a SQL Developer role. He is a PostgreSQL and MySQL database technology enthusiast, focusing on inventory and asset data storage and processing. He spends his spare time with his wife and two daughters, reading fantasy novels, playing in Tabletop RPG games, and blogging his journey to SQL Developer one blog at a time. Here's a link to his blog.
