Tips and Tricks

Splitting Comma-Separated Values In MySQL

Comma-Separated Values

Every once in a while, a quick decision is made to store data in a comma-separated fashion, and the SQL analyst is left to pick up the pieces during analysis.

Let’s take an example from Periscope’s own schema: Each Periscope dashboard has a comma-separated list of users who receive that dashboard by email every day. Here’s what it looks like:

Let’s say we want to do a simple analysis: Which users receive the most dashboards by email? If we’re using Postgres, regexp_split_to_table comes to the rescue.

MySQL users, however, are in the dark. In this post, we’ll show how to split our comma-separated string into a table of values for easier analysis in MySQL.

Making a Table of Numbers

To get started, we’ll need a table that contains numbers at least as big as the length of our longest comma-separated list. We like Periscope’s Views feature for this, but in a pinch, a temporary table also works:

createtemporarytable numbers as (

select1as n

unionselect2as n

unionselect3as n

...

)

Joining Our Table To Numbers

The next thing we’ll want to do is create the structure of our resulting table. We need a row for each email address in each list.

To do that, let’s join the numbers table to our original dashboards table. We’ll use the numbers to restrict the number of rows to the length of each list:

select*

from dashboards

join numbers

onchar_length(email_recipients)

-char_length(replace(email_recipients, ',', ''))

>= n -1

Let’s take this in pieces. First is char_length, which returns the number of characters in a string. replace(email_recipients, ',', '') removes commas from email_recipients. So char_length(email_recipients) - char_length(replace(email_recipients, ',', '')) counts the commas in email_recipients.

By joining on the number of commas >= n - 1, we get exactly the number of rows as there are email_recipients!

Here are the results:

Selecting Each Item In The List

We now have the list duplicated exactly the right number of times, and as a bonus, we have a column of numbers that we can use as an array index!

We just need to select the item in the list that corresponds to n. For this, we’ll turn to MySQL’s handy substring_index function. Here’s the SQL:

select

id,

substring_index(

substring_index(email_recipients, ',', n),

',',

-1

) as email

from dashboards

join numbers

onchar_length(email_recipients)

-char_length(replace(email_recipients, ',',''))

>= n -1

substring_index returns the substring starting or ending at the i’th occurrence of the specified delimiter, where i is the third argument. We use it once with n to find the nth comma and select the entire list after that comma.

Then we call it again with -1 to find the first remaining comma, and select everything to the left of that. With this combination, we find the whole string between the nth and (n+1)th comma. That’ll be the nth email recipient!

Here’s the resulting table:

Putting It All Together

Now that we have our data schematized, a simple group-and-count can tell us who the top users of the email feature are!