Prerequisites

Prepare and load the sample data.

Obtain the data. The US Government keeps a record of the names given to newborn babies each year, going back to 1880. You can download the data from Popular Baby Names. The zip archive contains several comma-separated values (CSV) files, one for each year. The file names have the format “yobNNNN.txt”, where NNNN is the year. You can choose any file you like. For this example we’ll use the file yob2010.txt, which contains the most popular names for the year 2010.

Unzip the archive. Each CSV file has the format name, gender, count, where gender is either “M” or “F” and count is the number of children given that name. Remember this format, because you will use it for the schema when you create your table.
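To check that a file matches the name, gender, count format before loading it, you can parse it locally first. The following is a minimal Python sketch, not part of the BigQuery tooling. The NameRecord type and parse_names function are hypothetical names chosen for illustration, the sample rows are made up rather than taken from the data set, and the sketch assumes the files have no header row.

```python
import csv
import io
from typing import Iterable, NamedTuple


class NameRecord(NamedTuple):
    """One row of a yobNNNN.txt file (hypothetical helper type)."""
    name: str
    gender: str  # "M" or "F"
    count: int   # number of children given this name


def parse_names(lines: Iterable[str]) -> list[NameRecord]:
    """Parse rows of the form name,gender,count (assumes no header row)."""
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        name, gender, count = row
        if gender not in ("M", "F"):
            raise ValueError(f"unexpected gender value: {gender!r}")
        records.append(NameRecord(name, gender, int(count)))
    return records


# Made-up rows for illustration only; not real counts from the data set.
sample = io.StringIO("Alice,F,120\nBob,M,95\n")
records = parse_names(sample)
print(records[0])
```

To run the same check against the real file, pass an open file object instead of the in-memory sample, for example open("yob2010.txt", newline=""). The three fields map directly onto the three columns you will declare in the table schema.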

Note: The sample data is already in comma-separated values (CSV) format, one of the formats accepted by BigQuery (JSON is another), so no data preparation is required before loading it into BigQuery.