All the Reddit data is extracted from BigQuery, with a few date range tweaks depending on what I'm doing.

Example:

SQL:

#standardSQL
SELECT
  title
FROM (
  SELECT
    REGEXP_REPLACE(title, '&amp;', '&') AS title,
    score
  FROM
    `fh-bigquery.reddit_posts.*`
  WHERE
    (_TABLE_SUFFIX BETWEEN "2016_01" AND "2018_10")
    AND LOWER(subreddit) IN ('legaladvice')
    AND score >= 5
    AND LENGTH(title) >= 20
)
ORDER BY score DESC
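
Since the table suffix range and the subreddit are the parts that change between runs, one way to avoid hand-editing the SQL each time is to render it from a template. The following is only a minimal sketch: `build_query`, `QUERY_TEMPLATE`, and the `min_score` / `min_length` parameters are hypothetical names introduced for illustration, not part of the original workflow, and it assumes the monthly `YYYY_MM` suffix format used in the query above.

```python
import re

# Same query as the example above, with the parts that vary turned into placeholders.
QUERY_TEMPLATE = """#standardSQL
SELECT
  title
FROM (
  SELECT
    REGEXP_REPLACE(title, '&amp;', '&') AS title,
    score
  FROM
    `fh-bigquery.reddit_posts.*`
  WHERE
    (_TABLE_SUFFIX BETWEEN "{start}" AND "{end}")
    AND LOWER(subreddit) IN ({subreddits})
    AND score >= {min_score}
    AND LENGTH(title) >= {min_length}
)
ORDER BY score DESC"""

# Assumption: table suffixes look like "2016_01" (year, underscore, month).
_SUFFIX = re.compile(r"\d{4}_(0[1-9]|1[0-2])")
# Conservative whitelist so a subreddit name cannot break out of the SQL string literal.
_SUBREDDIT = re.compile(r"[a-z0-9_]+")


def build_query(subreddits, start, end, min_score=5, min_length=20):
    """Return the query text for the given subreddits and table suffix range (hypothetical helper)."""
    for suffix in (start, end):
        if not _SUFFIX.fullmatch(suffix):
            raise ValueError(f"expected a YYYY_MM table suffix, got {suffix!r}")
    # Zero-padded YYYY_MM strings sort chronologically, so string comparison is enough.
    if start > end:
        raise ValueError("start suffix must not be later than end suffix")

    # The query compares against LOWER(subreddit), so lowercase the names here too.
    names = [name.lower() for name in subreddits]
    if not names:
        raise ValueError("at least one subreddit is required")
    for name in names:
        if not _SUBREDDIT.fullmatch(name):
            raise ValueError(f"unexpected subreddit name: {name!r}")

    in_list = ", ".join(f"'{name}'" for name in names)
    return QUERY_TEMPLATE.format(
        start=start,
        end=end,
        subreddits=in_list,
        min_score=int(min_score),
        min_length=int(min_length),
    )


if __name__ == "__main__":
    # Reproduces the example query above.
    print(build_query(["legaladvice"], "2016_01", "2018_10"))
```

The printed text can be pasted into the BigQuery console as-is. Changing the date range is then a matter of passing different suffixes, for example `build_query(["legaladvice"], "2017_01", "2017_12")`.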