Google Cloud Platform includes a variety of data services. When you're
encountering new data you'd like to explore, Google BigQuery allows
you to do so easily and scalably. It is a fully-managed service for data
exploration on arbitrarily-large datasets using an SQL syntax. Once data
has been loaded into the service, it can be queried at interactive speeds,
without having to know beforehand what you're looking for.

There has been much already written about using BigQuery for data analysis - for
example:

Okay, so there are several bits of information that the Meteoritical Society
provides us. From this sample, the fall field is always set to Found. Let's
see if there are any other values using a group by:

It seems it's far more likely (96.6% vs 3.4%) for a meteor to be found, than to
be seen fall. This is reassuring, since by extension it implies that it's
unlikely that a meteor will fall on any given person. But how frequently do they
fall in general? The data only provides information for the actual fall of 1096
/ (1096 + 30941) = 3.4% of the recorded meteors, but a count of meteors by year
should at least give us a rough upper bounds for the frequency of meteors
falling:

Aside: Doing queries like this can serve to inform the
preprocessing step,
revealing bugs and dirty data. For example, the first draft of this article
revealed some bad year data, which led directly to filters incorporated into
the cleansing pipeline.

This tells us that the median number of metoers found per year is 10, the
most metoers found in a year is 3046, with 90% of years recording less than 344
meteors. The quantile function provides approximate figures, but can give us a
decent impression of the overall shape of the data. We can compute a more
precise figure manually:

So we find for the 262 years for which we have data, 27 / 262 = 10% of the
years recorded more than 344 meteors. So in that sense, 2006 was indeed an
unusual year, though the same could be said of 2010 and 2011. In fact, recent
years do seem to have a higher number of reported meteors relative to the
quantile distribution, perhaps due to more vigilant record-keeping. We can again
use the quantiles function to get an overall impression of how reporting is
distributed across the years:

We see here that the period from 1400 to 1974 has roughly the same number of
reported meteors as the period from 2003 to 2013, though from 1974 onward, there
seems to have fairly regular reporting.

Quantiles are useful for getting a quick feel for the distribution of your data
- try using it to find the different masses of meteors found.

Finally, one might be concerned, when researching a new place to live or
establish a business, the rate you might expect meteors to land in your area.
Fortunately, BigQuery provides some functions to help compute distances between
latitude and logitude coordinates. Adapted from the
advanced examples in
the docs, we can find the number of meteors within an approximately 50-mile
radius of Google's Kirkland campus (at 47.669861, -122.197355):

Since the last meteor was found before I was born, 30 miles away, it seems that
Kirkland is relatively safe from meteor landings.

The natural next step in making this information useful and presentable might be
to create a visualization of this on a map, since this is inherently geographic
data. For more on this, check out Cloud Datalab and
Datastudio.

API Documentation

BigQuery is not just a command-line sql querying tool -

As we've seen, in addition to support for
standard SQL (and its
legacy SQL syntax used in this
tutorial), BigQuery also provides some useful additional aggregate functions.

Related Tutorials

Submit a Tutorial

Request a Tutorial

GCP Tutorials

Beta: Community Contributed Tutorials is a Beta release. This feature is not covered by any SLA or deprecation policy and may be subject to backward-incompatible changes.
Except as otherwise noted, the content of this page is licensed under the
Creative Commons Attribution 4.0 License, and code samples are licensed under the
Apache 2.0 License. For details, see our
Site Policies. Java is a registered trademark of Oracle and/or its affiliates.