I’ve come up with some practical exercises for participants to try. There will be a lot of contextualisation going on as part of the workshop, the main idea is to seed some ideas of what’s possible. Here’s the Google Docs version of the guide. The focus is on what you can do with the importHTML, importFeed, importXML, importRange and importData spreadsheet formula with a few others, including QUERY, thrown into the mix. Using QUERY isn’t straight forward but it opens up lots of possibilities in how you reshape data (extra examples of these are included in the Google Docs version).

importHtml

Syntax: ImportHtml(URL, query, index)

URL is the URL of the HTML page.

Query is either “list” or “table” indicates what type of structure to pull in from the webpage. If it’s “list,” the function looks for the contents of <UL>, <OL>, or <DL> tags; if it’s “table,” it just looks for <TABLE> tags.

Index is the 1-based index of the table or the list on the source web page. The indices are maintained separately so there might be both a list #1 and a table #1.

Exercise 1: Importing a html table and graphing the result

Select Insert > Chart and then select the data ranges for country name and total medals

While still in the Chart Editor select the Charts tab, then Maps > Geo charts - regions

Still in the Chart Editor select Customise and change the No value colour to blank.

Finally click Insert

You should now have a chart that looks like this:

Notice that the chart has No values (black) for most of the countries. To fix this we need to remove the country codes in brackets. One way to do this is trim the text from the left until the first bracket “(“. This can be done using a combination of the LEFT, FIND and ARRAYFORMULA (ARRAYFORMULA allows you to apply formulas to multiple cells).

In cell H2 enter the formula =ARRAYFORMULA(LEFT(B2:B86,FIND("(",B2:B86)-2)) this should create a column of country names without brackets)

Click on your Chart and select Advanced edit.

Make sure you are on the Start tab in the Chart Editor and edit the data range forSheet1!B2:B86 to Sheet1!H2:H86 then click Update

importFeed

Syntax: =ImportFeed(URL; [feedQuery | itemQuery]; [headers]; [numItems]). The arguments to the function are as follows:

URL is the url of the RSS or ATOM feed.

feedQuery/itemQuery is one of the following query strings: "feed", "feed title", "feed author", "feed description", "feed url", "items", "items author", "items title", "items summary", "items url", or "items created". The feed queries return properties of the feed as a whole: the feed's title, the feed's author, etc.Note: To get the data included in the feed, you need to do an "items" request.

the "feed" query returns a single row with all of the feed information.

the "feed <type>" query returns a single cell with the requested feed information.

the "items" query returns a full table, with all of the item information about each item in the feed.

the "items <type>" query returns a single column with the requested information about each item.

if a query is given that begins with "feed", the numItems parameter isn't necessary and is replaced by the option headers param.

if a query is given that begins with "items", the numItems parameter is expected as the 3rd param, and headers as the 4th.

headers - "true" if column headers is desired. This will add an extra row to the top of the output labeling each column of the output.

Exercise 2: Importing a RSS feed and getting social share counts

An aside: Spreadsheet Addiction by Patrick Burns (http://goo.gl/P6pQP) - highlights the dangers of using spreadsheets for analytics. Particular issues include the ambiguity of a cell being a value or a formula. For example, if I sort cells on the value in the Twitter count column all the data is lost because cells are sorted as values but actually contain formula which get broken.

How it works

In cell A11 is the formula =IF(ISBLANK(B5),,IMPORTFEED(B5,"items",FALSE)) If the feed url is not blank this fetches the RSS feed defined in B5. Results are returned in cells A11:E30. You may have noticed that column E is hidden this is because it contains the feed item description.

The social share counts are returned by a custom function written in Google Apps Script (https://script.google.com). Google Apps Script is similar to Excel Macros, written using a JavaScript syntax. If you open Tools > Script editor in your spreadsheet you can see some of the custom script powering the spreadsheet. This includes the getSharedCount formula used in cells F11:F30 which passes the post url to the SharedCount.com API and returns social share counts. The code used is:

importXML

Syntax: =ImportXML(URL, query)

URL is the URL of the XML or HTML file.

Query is the XPath query to run on the data given at the URL. Each result from the XPath query is placed in its own row of the spreadsheet. For more information about XPath, please visithttp://zvon.org/xxl/XPathTutorial/Output/.

How it works

Using XPath we can identify parts of a XML (including HTML) page we want to extract. The screenshow below shows how parts of the page are identified. [I always struggle with XPath so use browser extensions to help (Scraper and XPath Helper)]. The results are pulled into the spreadsheet as live data so if the source page is updated the data in the spreadsheet will also be updated.

ImportRange

Syntax: =ImportRange(spreadsheet-key, range)

Spreadsheet-key is a STRING which is the key value from the spreadsheet URL.

Range is a STRING representing the range of cells you want to import, optionally including the sheet name (defaults to first sheet). You can also use a range name if you prefer. Given that the two arguments are STRINGs, you need to enclose them in quotes or refer to cells which have string values in them.

Example: =importrange("abcd123abcd123", "sheet1!A1:C10")

"abcd123abcd123" is the value in the "key=" attribute on the URL of the target spreadsheet and "sheet1!A1:C10" is the range which is desired to be imported.

Note: In order to use ImportRange, you need to have been added as a viewer or collaborator to the spreadsheet from which ImportRange is pulling the data. Otherwise, you'll get this error: "#REF! error: The requested spreadsheet key, sheet title, or cell range was not found."

[Here’s http://goo.gl/b8FXC is a copy of the completed spreadsheet used in exercises 4, 5 and 6]

Lets now reshape the data so we can generate some graphs. Lets first calculate the change in rank between 2011 and 2012

In cell E2 enter the formula =B2-A2

Fill this formula for the rest of the rows (there are a couple of ways of doing this including copying cell E2, highlighting the other empty cells in Column E and pasting, or whilst E2 is active grab and drag the bottom right corner of the cell

Now we want to get a list of the countries included in column D. To do this in cell G2 enter the formula =UNIQUE(D2:D102)

Now we want to sum the rank difference per country by entering the following formula in cell H2:=SUMIF(D$2:D$102,G2,E$2:E$102)

Copy this value down for the remaining rows

Select the data range G2:H16 and Insert > Chart > Bar chart

Graph of change in top 100 world university ranking.

Is the process for producing this chart valid? Is it displaying a meaningful representation of the data? Is this chart a lie?

Important: Notice that the calculation for France has an error:

This is because on row 93 the source data doesn’t have a number value. Because we’ve used ImportRange to get the data we can’t edit it as are changes get overwritten by the importRange formula in cell A1. In our scenario we can remove the calculated value in E93 or use a formula to handle the error. Other ways around this are to flatten the imported data by copying all of it and paste as values (other solutions exist which we cover later)

ImportData

Syntax: =ImportData(URL)

URL is the URL of the CSV or TSV file. This imports a comma- or tab-separated file.

Note: The limit on the number of ImportData functions per spreadsheet is 50.

Exercise 5: Importing CSV data from Google Maps

In the spreadsheet you created for exercise 4 Insert > New Sheet

In cell A1 of the new sheet enter the formula =FILTER(Sheet1!A:E,Sheet1!D:D="United Kingdom") to filter the data on sheet1 where column D is the United Kingdom

Import... and QUERY

Query - A query string for applying data operations. The query operates on column IDs directly from the input range and uses a subset of the SQL language. For example, "select E," "select A , B," "sum(B),C group by C," "select D where D < 'Nick' ." In certain instances, for example when using FILTER as a data source, column identifiers are Col1, Col2 etc. For more information on creating queries read see Google Visualization API Query Language

Headers (optional) - A number specifying the number of header rows in the input range. If omitted, or set to -1, the number of header rows is guessed from the input range. This parameter enables transformation of multi-header rows range input to be transformed to a single row header input which the QUERY supports.

Exercise 6: Import and QUERY

In the spreadsheet you created for exercise 4 Insert > New Sheet

Form your original sheet (Sheet1) copy the importRange formula in cell A1

In your new sheet (Sheet3) paste the formula you just copied inside a QUERY formula shown below

How it works

The QUERY function imports the data and using the Google query language selects columns 1 to 4 and also adds a fifth by taking the difference between Col2 and Col1, this new column is labeled as Difference. Notice that on row 93 the French entry no longer has an error, but is blank.

We could continue exercise 4 and get a summary chart using UNIQUE and SUMIF. An alternative would be to use the QUERY formula again to do this for us by:

In Sheet 3 enter the following formula in cell G1 =QUERY(A:E,"SELECT D, SUM(E) WHERE D <> '' GROUP BY D ORDER BY D")

This time we are setting the data source as all the data in columns A to E in Sheet3. Next we are creating a query that selects column D and a sum of column E where D is no blank and grouped by the value in column D (the country names).

A reminder that here’s a copy of the completed spreadsheet used in exercises 4, 5 and 6 http://goo.gl/b8FXC

Summary/Keypoints

Hopefully you’ve seen that Google Sheets (Spreadsheets) can be a useful tool for importing data from other sources and reshaping it to fit your needs. The last set of exercises are more advanced so don’t worry if you don’t fully understand what is happening, they are there as an indication of what is possible and hopefully inspire you to find out more.

Three points worth remembering when using import formula:

The data is a live link - if the data source changes or disappears your imported data will also change or disappear (this can be both an advantage and disadvantage).

The ambiguity of a cell - because a spreadsheet cell can a value and a formula sorting values generated by formulas can lead to unforeseen problems.

Hello Thanks for Saved my Day. That really working and help. I want to ask you this. When i tried to import data from csv source. Spreadsheet shows all of data in a row. Data looks like - mpcbm(94,[['458','a380','aloha airways ',[32.8661098,26.588951,13363.54,120.3,-1.82,-1.04],1362712241588,'458'],['460','AN140','Alphalead 15(MTA)',[25.798486,-80.2858274,5.24,19.2,-0.8,0.28],1362712210179,'460'],['462','a380','Octa Air TGBr UA CLX-770-PT',[25.7839661,-80.2159159,394.47,133.79,-15.14,-25.32],1362712254640,'462'],['476','md11','Phil. Intl. Airline',[25.7887856,-80.2642534,58.96,91.42,-7.84,-3.31],1362770679316,'476'],['480','hughes','CAPTAIN AIMAN',[-22.990834,-43.3724926,29.33,-16.6,-3.96,13.21],1362712248528,'480'],['848','cub','MumbaiSky',[19.1475676,72.9643679,442.9,86.76,-2.12,-2.89],1362712239782,'848'],['1502','a380','AAV VP (Rick)',- Data not hava column names. So how can i show this data in columns ?

Hello, i think i found which character they are using for seperating. works fine when i tried ', and show data 3 colums is working fine. But i tried using this over a table generator. how can i configure spreadsheet to seperate data with ', ..

Disclaimer

The views I express here are mine alone and do not necessarily reflect the views of my employer or any other party.

All code, applications and templates on this site are distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.