Tutorial: City of Whittlesea Workshop

Introduction

In this tutorial we will introduce you to the AURIN Workbench. AURIN is the largest single resource for accessing and interrogating thousands of datasets, spanning the physical, social, economic, and ecological aspects of Australia’s cities, towns and communities.

The AURIN Portal is AURIN’s flagship tool on the Workbench – it is an online data browser, visualisation and spatial analytical platform. It is completely free to use for any university staff member or student, and for any government employee, across Australia.

The AURIN Portal has been designed to integrate into a researcher’s existing research ecosystem, meaning data can be downloaded or uploaded as the user sees fit. However, if you wish to work within the AURIN Portal, there are over 100 analytical routines, ranging in complexity from basic statistical and visualisation techniques to complex spatial statistical routines. There are also bespoke analytical tools, including neighbourhood level walkability analyses.

In today’s tutorial we will be undertaking the following tasks on the AURIN Workbench:

Browsing AURIN Data and Help Pages, and entering the AURIN Portal

Selecting an Area and browsing and retrieving AURIN data

Creating a Choropleth Map and creating a Choropleth Centroid Map

Merging datasets and creating an interactive scatterplot

Generating variables and filtering data

Creating a static scatterplot and undertaking a correlation analysis

Undertaking a Neighbourhood Walkability Analysis for Whittlesea

Mapping a Screenshot and downloading an Attribution List

Please note that within this tutorial, there are a number of links (in AURIN Blue) to specific user guides for each Portal tool used. Please click on the links to get specific instructions for each of the tools. Important parts of the tutorial are highlighted in AURIN Orange.

AURIN Data

Use this tool to explore some of the data sets that you are interested in. You can search by:

Organisation

Keywords

Tags

License Type

Location

We recommend that you explore datasets within the AURIN data catalogue before diving in to the AURIN Portal to search for data. While a data search in the Portal is a seamless component of the workflow, it can be much easier to find specific datasets in a site dedicated to being a data catalogue!

AURIN Help and Documentation

The second tool we will use is in the AURIN Help and Documentation repository that accompanies the AURIN workbench. This can be found at:

In these pages, you will find specific user guides for each and every tool on the AURIN Workbench. In addition, we have a collection of tutorials (like this one!) which provide useful examples of how to join a range of tools together into a full workflow, as well as exploring how to undertake interesting investigations of the thousands of datasets in the AURIN Workbench.

We recommend that you bookmark both of these sites now, so that they are at your fingertips when you need to do some pre-Portal groundwork in the future – or if you get stuck!

Logging into the AURIN Portal

If this is your first time visiting the AURIN portal you will be greeted with the Australian Access Federation (AAF) login screen:

If you are a student, staff member or researcher at an Australian university (that is, you have an email address ending in .edu.au), you should already be able to access the portal by selecting your university from the drop-down list and using your regular login details. A few other institutions have organisational access: AARNET, AIMS, CSIRO, INTERSECT, NICTA and TPAC.

If you are not a member of the above list (that is, if you work for a government organisation), you will need to have requested and set up an account with us. You can register here. You will then receive a registration email from the AAF asking you to verify your account.

Then you will need to log in via the AAF Virtual Home Network, which is the option at the top of the organisation list.

Navigating the AURIN Portal Interface

Once you have logged in to the AURIN Portal, you will be greeted by the main map interface, which looks something like the image below. The numbers in the image represent elements of the portal interface that you may find interesting or useful as you navigate your way through the portal

1. The Map Interface

The AURIN Portal is presented in a map based format, which allows you to zoom, pan and navigate with either your track pad or click wheel, or through the + or – zoom function on the top left of the map. The red boundary on your map indicates the area of geography that you have selected for your study.

2. Select Your Area

You can choose what your study area is by the navigation box that appears when you first log in. This can be reopened at any point by clicking on any of the rows in the Area panel. Comprehensive information about how to select your area can be found here

3. Select Your Data

You can acquire your data by using the options present under the Data panel. Comprehensive information about how to select your data can be found here

4. Visualise Your Data

You can create maps and interactive charts from your data by using the options present under the Visualise panel. Comprehensive information on visualisations can be found here

5. Analyse Your Data

You can undertake simple and sophisticated analyses of your data by using the options present under the Analyse panel. Comprehensive information on analysis tools can be found here

6. The Help Button

On the top right of the portal is the blue Help button, which will link out to the AURIN Help pages that you are currently browsing. Wherever you see a blue ? question mark, this will also link out to the appropriate help page associated with that part of the portal.

7. MyAURIN

Clicking on the MyAURIN link will bring up a menu (shown below) with a number of options. Here you can create a new project, rename your project (always a good idea if you’re creating lots of projects!), open up a different project from your project list and reset your current project.

8. Report An Issue

If you come across a bug or a problem with the portal, please report an issue to the AURIN team by clicking on the Report an Issue button

Now that we are in the Portal, and familiar with the layout of the interface, it’s time to do some work!

Selecting a Study Area

Selecting an area of inquiry will allow you to discover, extract, analyse and visualise the data that you’re interested in.

When you log in to the portal, your project will start with Australia selected as your highest level of geography.

From here you can select the levels of geography you are interested in. There are two ways to do this, by clicking either of the options under the Area panel (shown below)

Clicking on the Area Selection option under the Area panel will bring up the Area Selection pop-up box, which gives you three options for choosing your area of enquiry. If you are interested in understanding all of these options, have a look at the Selecting your Area help documentation.

In this tutorial, we will be moving down the options until we come to 2016 Greater Melbourne Capital Statistical Area (Greater Melbourne GCCSA 2016). You can also interact with the map itself at this stage, clicking on the areas to break them down further into their constituent parts.

Clicking Done will close the browser and zoom the map to the level of geography that you have selected. Your portal session should look something like the image below

It is important that you have selected the 2016 GCCSA

Selecting some Data

After selecting your area of analysis, you now need to bring some data into your AURIN portal session to visualise, interrogate and analyse.

The Data panel (shown below) provides you with three options for selecting data. Each of these will be explained in more detail below

For this part of the tutorial, we will be using the first option, so click the + Dataset button

This button calls up the Data Browser window, giving users access to the datasets that are available for their use from the many AURIN data custodians around the country.

The first thing to note is that the number of datasets that are available to you when you open the Data Browser window is restricted by your area selection – the smaller the area that you select, the fewer the datasets that will be available within your session

There are many ways that you can filter and search for datasets within the data browser. You can do a Keyword search, limit your search to a specific level of aggregation or granularity, or limit to the organisation/data custodian that you’re interested in.

When you select a dataset (shown below), the dialogue box will then require you to select the attributes you want for the data, in this case the three shown in the image below (you can also select all of the attributes by clicking the top left check box next to Attributes). The abstract for your data is provided in the top right of the Data Browser window.

Once you have selected your dataset and attributes, you can click either Add or Add and Open. The former option will keep your Data Browser open, so that you can shop for more datasets, but it will not automatically retrieve your dataset from the custodian/source. You will still need to do this by clicking on the specific dataset entry in the Data panel later. The latter option will close your Data Browser window and automatically retrieve the dataset from the custodian/source. Click Add and Open

For this tutorial, select SA2 SEIFA 2016 – The Index of Relative Socio-economic Advantage and Disadvantage (IRSAD), select the three attributes shown below and then click Add & Open to close the data browser

This will add the dataset to your Data panel, and it will open up the table for your perusal. You can sort the table by the values in a column by clicking on that column’s header. You can also download the dataset directly to your desktop in .csv (comma separated values) format if you prefer, by clicking the small “CSV” icon on the top left of the table.
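Once a table has been downloaded as a .csv file, it can be read by most scripting languages. The following is a minimal sketch in Python, assuming a file with a header row. The column names (sa2_code, sa2_name, irsad_score) and all values are invented for illustration and are not the names used in the real export; an in-memory string stands in for the downloaded file.

```python
import csv
import io

# Stand-in for the contents of a file downloaded via the CSV icon.
# Column names and values are invented for illustration only.
SAMPLE_CSV = """sa2_code,sa2_name,irsad_score
201,Example SA2 A,1012
202,Example SA2 B,948
203,Example SA2 C,1075
"""


def load_rows(handle):
    """Read a CSV export into a list of dicts, converting the score to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["irsad_score"] = int(row["irsad_score"])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Sort in descending order of score, mirroring a click on the column header.
ranked = sorted(rows, key=lambda r: r["irsad_score"], reverse=True)
for r in ranked:
    print(r["sa2_name"], r["irsad_score"])
```

For a real download, replace io.StringIO(SAMPLE_CSV) with open("your_file.csv", newline="") and adjust the column names to match the header row of your file.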

Adding Additional Variables to a Table of Data, and adding another dataset

At this stage, we may decide that we have accidentally missed some variables that we would actually like to have in the table for further use down the track. Rather than deleting the dataset and re-shopping for it with all of the variables, we can actually add the variables to the existing table. To do this, click on the spanner icon on the right next to the data entry, and click on the Edit option (as shown below)

This will open up the list of attributes, and you can add Usual resident population to your list of selected attributes. Once you click Edit and Open this will re-shop for the dataset, and you will see the additional variable in the resultant table when it opens

Now we will repeat the shopping process for an additional dataset. Using the keyword homeless, add the following dataset to your data panel (with all of the variables selected, as per the image below)

Creating a Choropleth Map

A choropleth map is likely to be the most common kind of map visualisation that AURIN users will make with the data that they access. A choropleth (from Greek χώρο (“area/region”) + πλήθος (“multitude”)) is a thematic map in which areas are shaded or patterned in proportion to the measurement of a variable being displayed on the map.

Choropleth maps can be created quickly and easily in the AURIN Portal, and provide a useful first pass at detecting and visualising interesting spatial patterns in your data.

We are going to create a choropleth map of our SEIFA IRSAD data.

To create a choropleth map, select Maps, Charts & Graphs, then Map Visualisations and then Choropleth. This will bring up a range of fields that need to be populated. Enter your parameters as you see them in the screen shot below

This will create a map of the SEIFA IRSAD score across Melbourne’s SA2s at the 2016 Census, which should look like the map below. You can hover over each of the areas to bring up its SEIFA IRSAD Score. Also, if you open up the table of data (by clicking on the little table icon), you can hover over each row and they’ll show up on the map, and vice versa.
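To make the idea of shading "in proportion to the measurement of a variable" concrete, the sketch below groups values into classes using quantile breaks, which is one common classification method. It is an illustration only: the Portal offers its own classifiers and palettes, the function names are hypothetical, and the scores are invented.

```python
def quantile_breaks(values, n_classes):
    """Return n_classes - 1 interior break values taken at sorted positions."""
    ordered = sorted(values)
    breaks = []
    for k in range(1, n_classes):
        # Position of the k-th class boundary in the sorted list.
        idx = (k * len(ordered)) // n_classes
        breaks.append(ordered[idx])
    return breaks


def classify(value, breaks):
    """Return the 0-based class index: how many breaks the value meets or exceeds."""
    return sum(1 for b in breaks if value >= b)


# Invented index scores for nine hypothetical areas.
scores = [880, 910, 935, 960, 990, 1010, 1040, 1075, 1100]
breaks = quantile_breaks(scores, 3)
classes = [classify(s, breaks) for s in scores]

# Each class index would then be mapped to one colour in the chosen palette.
palette = ["light", "medium", "dark"]
for score, cls in zip(scores, classes):
    print(score, palette[cls])
```

With quantile breaks each class holds roughly the same number of areas, so a few extreme values do not squeeze most areas into a single colour.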

Creating a Choropleth Centroid Map

Choropleth centroid maps are another common type of AURIN visualisation tool. They are similar to classic choropleth maps, but differ in that the central point (“centroid”) of an area (a “polygon” on a map) is represented with varying colour and symbol size, rather than the entire area being shaded according to the variable.

We are going to create a choropleth centroid map of our homelessness data.

To create a choropleth centroid map, select Maps, Charts & Graphs, then Map Visualisations and then Choropleth Centroid. This will bring up a range of fields that need to be populated. Enter your parameters as you see them in the screen shot below

This will bring up a map which looks something like the image below. Larger darker circles represent SA2s with higher estimated numbers of homeless people, while smaller lighter circles represent areas with lower estimated numbers of homeless people. We have zoomed in a couple of steps so that the patterns are more visible.
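The centroid of a polygon can be computed directly from its vertices. The sketch below uses the standard area-weighted (shoelace) formula for a simple polygon in planar coordinates. It is an illustration of the concept, not the Portal's implementation; it ignores holes, multi-part areas and map projection.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices.

    The vertices may be listed clockwise or anticlockwise; the first
    vertex should not be repeated at the end.
    """
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Signed area A = twice_area / 2, and the centroid divides by 6A.
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


# A 4 x 2 rectangle: its centroid is the midpoint.
rectangle = [(0, 0), (4, 0), (4, 2), (0, 2)]
print(polygon_centroid(rectangle))
```

A symbol drawn at this point can then be sized and coloured by the attribute value for the area.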

Research Question: Do you think there is a relationship between the number of homeless people, and the socio-economic score of an SA2?

Merge the Datasets

In order to answer the research question posed above, we first need to bring our datasets into the same table. We will use the Merge Aggregated Datasets tool to do this

The Merge Aggregated Datasets tool allows you to join together datasets of the same geographical aggregation (and in the same place!) so that you can compare and investigate the relationship between the two (or more) datasets. In this tool, if a row in one table does not have a corresponding row in the other table, its cells will be empty on that side in the output table.

Once you have clicked Add and Run, this will execute the tool. When it has finished running, you can view the new table by clicking on the Display button that pops up. You will see a new table that has both of the old tables in it, unified across rows (SA2s).

It is important to rename this output table in your Data panel to something like SEIFA + Homeless. Do this by clicking the Spanner icon next to the output table, selecting Rename and then renaming the output (shown below).
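The merge behaviour described above, where unmatched rows are kept and left empty on the missing side, corresponds to what is usually called a full outer join. A minimal sketch in Python follows. The function name, the column names (sa2, irsad, homeless) and the values are all hypothetical, and the sketch assumes the two tables do not share any column names other than the key.

```python
def merge_by_key(left, right, key):
    """Full outer join of two lists of dicts on `key`.

    Rows present in only one table are kept, with None in the columns
    that come from the other table.
    """
    left_cols = sorted({c for row in left for c in row if c != key})
    right_cols = sorted({c for row in right for c in row if c != key})
    left_index = {row[key]: row for row in left}
    right_index = {row[key]: row for row in right}

    merged = []
    for k in sorted(set(left_index) | set(right_index)):
        out = {key: k}
        for c in left_cols:
            out[c] = left_index.get(k, {}).get(c)
        for c in right_cols:
            out[c] = right_index.get(k, {}).get(c)
        merged.append(out)
    return merged


# Invented example tables keyed by a hypothetical SA2 identifier.
seifa = [{"sa2": "A", "irsad": 1012}, {"sa2": "B", "irsad": 948}]
homeless = [{"sa2": "B", "homeless": 35}, {"sa2": "C", "homeless": 12}]

merged = merge_by_key(seifa, homeless, "sa2")
for row in merged:
    print(row)
```

Only area B appears in both inputs, so it is the only row with values on both sides; A and C are kept with an empty value on the side where they had no match.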

Creating a Scatterplot

A scatter plot is one of the easiest and most effective ways of investigating your datasets. The AURIN portal has two ways of creating a scatterplot from your data. The interactive scatterplot described here allows you to interact with each of the data points and see where they fit on your map, while the scatterplot described in the Chart Tools is more “bare-bones”, although it allows easy download of the image for incorporation into documents or presentations. We will show you how to create one further down this tutorial.

We will create an interactive scatterplot of socio-economic status and estimated numbers of homeless people across Melbourne SA2s.

To do this click the Maps, Charts and Graphs button, click Interactive Charts and Scatter Plot. Enter your parameters as shown in the image below and click the Add and Display button

Once you have run the tool, it produces a scatter plot as shown below, showing the relationship between SEIFA IRSAD Index Score and the estimated number of homeless people for SA2s in Melbourne. You can resize the chart and hover over either the points on the graph or the SA2s on the map, and they will show up on the other component.

Question: Do you think there is a relationship between the two variables from this scatterplot? What other tools might we need to determine this? Are there other factors we need to think about when we consider this relationship?

Generating New Variables

One of the issues that we have with our data at the moment is that we are comparing the total number of homeless people in an SA2 with socio-economic status. At this point, we would need to standardise the number to create a per capita rate, particularly if the number of homeless people is actually related to the total number of people in an area.

To do this, we will need to create a new column of data that has homeless populations standardised by total populations. We will use the Generate tool here to create a new table and column.

To do this open up the tool (Tools → Data Manipulation → Generate) and enter the parameters as you see them below, and click Add and Run

Once this has finished running, click on the Display button to view the table. You will see that a new table has been produced with a new column in it. Remember to rename this table as well, to something like SEIFA + Homeless + HL_Rate.
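The calculation behind the new column is a simple division of one column by another. The sketch below shows the idea in Python. The output column name HL_Rate follows the tutorial, but the function name, the input column names (homeless, population) and the values are invented, and the handling of zero or missing populations is an assumption rather than a description of the Generate tool.

```python
def add_rate(rows, numerator, denominator, new_column):
    """Return copies of the rows with new_column = numerator / denominator.

    Rows with a missing numerator, or a missing or zero denominator,
    get None instead of a rate.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        num = row.get(numerator)
        denom = row.get(denominator)
        if num is None or not denom:
            new_row[new_column] = None
        else:
            new_row[new_column] = num / denom
        out.append(new_row)
    return out


# Invented rows for three hypothetical areas.
table = [
    {"sa2": "A", "homeless": 50, "population": 10000},
    {"sa2": "B", "homeless": 330, "population": 15000},
    {"sa2": "C", "homeless": 4, "population": 0},
]

with_rate = add_rate(table, "homeless", "population", "HL_Rate")
for row in with_rate:
    print(row["sa2"], row["HL_Rate"])
```

Guarding against a zero denominator matters in practice, because some areas have very small or zero usual resident populations.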

Filtering by an Attribute

Recall from the scatter plot that there are three outliers at the top of the graph which may unduly skew or influence our results. If you click the HL_Rate column to sort in descending order, these three SA2s (Melbourne, St Kilda, Dandenong) rank among the top four SA2s, along with Flemington Racecourse. We may wish to remove these four values from our table.

To do this we will use the Dataset Attribute Filter tool to create a new table with these four rows removed. Open up the tool (Tools → Data Manipulation → Dataset Attribute Filter) and enter the parameters as you see them below, and click Add and Run. This will create a new dataset, which only includes SA2s (rows) that have a Homelessness Rate lower than 0.022 (as shown below).

Once the tool has completed running, a new table will appear in the right hand Data panel. You should rename this something like SEIFA + Homeless + HL_Rate Filtered.
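The filter itself is a row-wise comparison against a threshold. A minimal Python sketch of the same operation is shown below, using the tutorial's threshold of 0.022 and the HL_Rate column name; the function name and the sample rows are invented, and dropping rows with a missing rate is an assumption.

```python
def filter_below(rows, column, threshold):
    """Keep only the rows whose value in `column` is strictly below threshold.

    Rows with a missing (None) value in the column are dropped.
    """
    return [
        row for row in rows
        if row.get(column) is not None and row[column] < threshold
    ]


# Invented rates for four hypothetical areas.
table = [
    {"sa2": "A", "HL_Rate": 0.004},
    {"sa2": "B", "HL_Rate": 0.031},
    {"sa2": "C", "HL_Rate": 0.022},
    {"sa2": "D", "HL_Rate": None},
]

filtered = filter_below(table, "HL_Rate", 0.022)
print([row["sa2"] for row in filtered])
```

Because the comparison is strict, a row exactly equal to the threshold is excluded, which matches the "lower than 0.022" wording above.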

Creating a Static Scatterplot

We may now be at a stage where we would like to create some imagery for a report or paper. Part of this would be to produce a scatterplot of the SEIFA Index Score versus the Homelessness Rate for our filtered SA2s. Rather than using the interactive scatterplot above, which can’t easily be placed into a report, we will use the scatterplot chart tool, whose output can be saved easily as a .png image.

Open up the tool (Tools → Charts → Scatterplot) and enter the parameters as you see them below, and click Add and Run. This will execute the tool.

When it has finished running click Display and it will open up the resultant image as shown below. You can right click on this image and save it to the appropriate location on your computer for later use in a report.

Running a Correlation Analysis

Up until now, all of our analysis of our data has been purely visual or interpretive in nature – that is, we have produced maps and graphs, but we have only done an “eyeballing” of the data and looked for patterns. Now we need to take our analysis one step further and undertake a statistical analysis that removes visual bias from the equation and runs a statistical test on the data.

We will undertake a very basic statistical analysis: a correlation analysis, which generates a correlation coefficient that will tell us the strength and direction of any relationship between our two variables, and whether or not it is statistically significant.

Open up the correlation tool (Tools → Statistical Analysis → Correlation) and enter the parameters as you see them below. Note that for correlation analyses, you can choose more than two variables to see the correlation coefficients between a range of factors, not just between two. However, for this analysis we will only choose two variables. Click Add and Run.

Make sure to keep the other parameters (Method, Use, Alternative Hypothesis, Confidence Level) as the default values.

Once the tool has run, click on the Display pop up that appears. It should look something like the image below. This correlation coefficient value of -0.4015 shows a moderately strong negative relationship between SEIFA IRSAD Index score and the rate of homelessness across Melbourne’s SA2s at the 2016 Census. This means that as disadvantage decreases and advantage increases, the rate of homelessness in those SA2s decreases. It is also statistically significant.
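For readers who want to see what a correlation coefficient is computed from, the sketch below calculates Pearson's r and its associated t statistic in plain Python. It assumes the tool's default Method is Pearson, which the tutorial does not state, and the data are invented. Turning the t statistic into a p-value requires the t distribution with n - 2 degrees of freedom, which is left out here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Invented values, used only to exercise the functions.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = pearson_r(x, y)
t = t_statistic(r, len(x))
print(round(r, 4), round(t, 4))
```

The sign of r gives the direction of the relationship and its magnitude gives the strength; the larger the absolute t statistic for a given sample size, the stronger the evidence that the relationship is not zero.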

We are now going to switch gears completely and undertake some spatial analysis at a much finer scale. We are going to investigate walkability in a new area.

Firstly, create a new project under MyAURIN, naming it something like City of Whittlesea Walkability Analysis (shown below)

Using Bounding Box Selection

Firstly, select the City of Whittlesea as your study area. Do this by clicking on Area Selection and searching for the Whittlesea (C) 2016 LGA boundary as shown below.

When you are finished click done and your portal should look like the second image below.

Now we will create a bounding box that encompasses the northern part of this area. To do this, click on the spanner next to the Bounding Box option under the Area panel, and then Select Current Map View. This will ‘activate’ the bounding box, which you can drag around and reshape, so that it’s where you want it and the size that you want it (shown below). The advantage to this method is that it does not then restrict the datasets that are available based on geographic hierarchies – every dataset that intersects or is within that box will come up in your data search when you run it.

Once you have selected your area, make sure to click Done. This will remove the active edge and vertex circles and lock in the bounding box.
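The rule that a dataset is offered when it "intersects or is within" the bounding box can be illustrated with a simple rectangle overlap test. The sketch below is an assumption about how such a check could work, comparing the box against each dataset's rectangular extent; it is not the Portal's actual search logic, and the extents and names are invented.

```python
def boxes_intersect(a, b):
    """True if two boxes overlap or touch.

    Each box is a tuple (min_x, min_y, max_x, max_y).
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Invented study-area box and dataset extents (longitude, latitude order).
study_box = (145.00, -37.70, 145.20, -37.55)
dataset_extents = {
    "covers whole state": (140.9, -39.2, 150.0, -34.0),
    "overlaps one corner": (145.15, -37.60, 145.40, -37.40),
    "entirely elsewhere": (144.20, -38.30, 144.50, -38.00),
}

available = [
    name for name, extent in dataset_extents.items()
    if boxes_intersect(study_box, extent)
]
print(available)
```

A dataset whose extent fully contains the box, or is fully contained by it, also passes this test, which is why containment does not need a separate check.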

At this point you might want to change the base map to one that has fewer colours on it, so that you can see your data more clearly. To do this, click on MyAURIN and then Change Base Map (shown below).

Loading the right datasets

Now we need to bring some datasets in to calculate our walkability.

Firstly, you will need a street network. Add and open the OpenStreetMap – Lines (Australia) 2018 dataset, making sure that you add the blue geometry attribute (shown below).

Secondly, we will add a point level dataset around which we will measure the surrounding walkability. For this, select the Department of Health – National Toilet Map – June 2018 dataset. Again, at the minimum, the blue geometry attribute must be selected.

At this point, you may want to have a look at what these datasets actually look like. For each of them, click the spanner next to the dataset in the data panel, and click Display on Map (shown below).

Just choose the default parameters for the display of the datasets. Your two datasets should look like the following

Finally, you will need a dataset which represents the land use categories and population counts for the area. For this, add and open the Mesh Block 2016 Census for Australia dataset. For this dataset, select all of the variables, including the blue geometry variable.

You might also want to have a look at what the land use categories of your meshblocks look like. To do this, create a choropleth of your meshblock dataset, choosing “meshblock Category” as your attribute. This will automatically choose Preclassified as your breaks. Select a Qualitative palette type, so that each land use gets a discrete colour. It should look something like the map below

Measuring Walkability around Points

We will be using the Walkability Index with Gross Density (Points) tool

Open the tool (Tools → Walkability → Complete with Gross Density (Points)) and enter your parameters as shown below. These are also explained under the image.

Road Network: Line dataset representing a road or pedestrian network. We select OpenStreetMap – Lines (Australia) 2018 in this instance.

Points: This is the dataset with the points which will form the centre of our walking catchments. In this case we select the Department of Health – National Toilet Map – June 2018

Maximum Walk Distance: The maximum distance (in metres) along the line network data to be traversed from each input point. We select 600 for this analysis

Trim Distance: The width (in metres) of the buffer to be applied to the traversable line network segments. We select 50 for this analysis.

Land use polygon dataset: This is the dataset that we use to specify the different land uses to be included in the Land Use Mix component of the walkability index. Select MB Mesh Block 2016 Census for Australia

Land use classification attribute: This is where you specify the attribute that has the different land uses within it. In this instance we select Mesh Block Category

Land use classifications dataset: again select the dataset that has the different land uses in it, i.e. select MB Mesh Block 2016 Census for Australia

Land use classifications attribute: again select the attribute which has the different land uses within it i.e. select Mesh Block Category

Classification categories values: select the land uses you would like to include within your land use mix calculations.

Population dataset: select the dataset that has the population counts for your regions. In this instance, the MB Mesh Block 2016 Census for Australia dataset has population counts for each mesh block, so we select that

Population attribute: select the attribute within your population dataset that contains the population counts. In this instance, it is called Total Usual Residential Population 2016

Once you have entered your parameters, click Add and Run to execute the tool
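To give a feel for what the Maximum Walk Distance parameter controls, the sketch below finds every network node that can be reached within 600 metres of a starting point, measured along the network rather than in a straight line. It uses Dijkstra's algorithm with a distance cut-off on a small invented graph. This is a simplified illustration: the function names are hypothetical, and the real tool works on line geometry and applies the Trim Distance buffer, neither of which is modelled here.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency list from (node_a, node_b, length_m) edges."""
    graph = {}
    for a, b, length in edges:
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))
    return graph


def reachable_within(graph, start, max_distance):
    """Shortest network distance to every node within max_distance of start."""
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter route was already found
        for neighbour, length in graph.get(node, []):
            new_dist = dist + length
            if new_dist <= max_distance and new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return best


# Invented street segments with lengths in metres.
edges = [
    ("A", "B", 250),
    ("B", "C", 300),
    ("C", "D", 200),
    ("A", "E", 700),
]
graph = build_graph(edges)

catchment = reachable_within(graph, "A", 600)
print(sorted(catchment.items()))
```

Nodes D and E are left out because the shortest walk to each is longer than 600 metres, even though they are connected to the network.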

Once your tool has run, click on the Display button to bring up the output of the tool. This is a table with a large amount of information about each of the catchments around the points in the analysis, which here are the public toilets (shown below). The columns are explained in some detail under the image.

Connectivity: The total number of connections per square kilometre

Area: The total area in square metres of each walking catchment

Connections: The total number of connections in each walking catchment

LUM_X: The total square metres of each land use X falling within each walking catchment

LandUseMixMeasure: This is an ‘entropy measure’, measuring the extent to which there is an equal distribution of each land use within the catchments. Values of the land use mix range from 0 (the lowest mix) to 1 (the highest possible mix)

AverageDensity: The average population per hectare for each of the catchments.

XXX_ZScore: These are the scores for the three different components (connectivity, land use and average density) converted into Z scores, where the mean for the different catchments is zero, and the numbers indicate how many standard deviations each score is above or below the mean. Essentially, the more positive the number, the better the relative score for that attribute, and the more negative the number, the worse the relative score for that attribute. We recommend that you make sure you have a relatively large number of observations (a minimum of 30) before using Z scores in any discussion, as they rely on robust mean and standard deviation calculations, which are less reliable at smaller sample sizes.

SumZScore: This is the final Walkability Index for your catchments, and represents the sum of the component Z scores
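The output columns described above can be related to one another with a short sketch. It assumes that the land use mix is a normalised Shannon entropy (a common form of entropy measure that ranges from 0 to 1), and that Z scores use the sample standard deviation; the tutorial does not specify either detail, so treat both as assumptions. Function names and all values are invented.

```python
import math
import statistics


def land_use_mix(areas):
    """Normalised Shannon entropy of land-use areas.

    `areas` maps each land use to its area in a catchment.
    Returns 0 for a single use and 1 for a perfectly even mix.
    """
    total = sum(areas.values())
    k = len(areas)
    if total <= 0 or k < 2:
        return 0.0
    entropy = 0.0
    for area in areas.values():
        if area > 0:
            p = area / total
            entropy -= p * math.log(p)
    return entropy / math.log(k)


def z_scores(values):
    """Convert values to Z scores (assumes the sample standard deviation)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def sum_z_scores(connectivity, land_use, density):
    """Sum the three component Z scores for each catchment."""
    components = zip(z_scores(connectivity), z_scores(land_use), z_scores(density))
    return [zc + zl + zd for zc, zl, zd in components]


# Invented component values for four hypothetical catchments.
connectivity = [40.0, 55.0, 70.0, 35.0]
mix = [
    land_use_mix({"Residential": 100, "Commercial": 100}),
    land_use_mix({"Residential": 180, "Commercial": 20}),
    land_use_mix({"Residential": 120, "Commercial": 80}),
    land_use_mix({"Residential": 200, "Commercial": 0}),
]
density = [22.0, 30.0, 41.0, 12.0]

index = sum_z_scores(connectivity, mix, density)
for i, score in enumerate(index):
    print(i, round(score, 3))
```

Because each component is centred on its own mean, the index values also sum to zero across catchments, which is why the index only describes walkability relative to the other catchments in the same analysis.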

We will now take a look at the distribution of the Walkability Index across our study area. To do this, create a choropleth of the SumZScore, choosing a Diverging palette type (such as Spectral or Red to Blue) so that the middle colour represents the mean values. It should look something like the image below. If you hover over each of the points, you can see its individual attributes, and determine which of the different components let down or improved its overall walkability index.

Mapping a Screenshot

We are now at a point where we may want to embed our walkability map into a report. You can create a high resolution screen shot of your portal session now by clicking MyAURIN and then Map Screenshot, shown in the image below. This will download automatically to your desktop.

Make sure to select all of the cartographic parameters to make sure your map is cartographically sound and appropriately references the AURIN Workbench

Exporting an Attributions List

Even the most permissive license on the AURIN Workbench requires you to cite where the data was sourced from. To download a list of the appropriate citations for all datasets that you accessed in your AURIN Portal session, click on MyAURIN and then Attributions Export (shown in the image below). This will download a zipped folder that contains a text file (.txt) with the list of citations and a .ris file which can be loaded into bibliographic management software such as EndNote.
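If you would rather inspect the .ris file with a script than load it into reference software, the sketch below parses the general RIS layout, in which each line starts with a two-letter tag followed by two spaces, a hyphen and a space, and each record ends with an ER tag. The sample text is invented and is not taken from a real AURIN export, so the tags that appear in your own file may differ.

```python
def parse_ris(text):
    """Parse RIS-formatted text into a list of records.

    Each record is a dict mapping a two-letter tag to a list of values,
    because a tag (for example AU) may repeat within one record.
    """
    records = []
    current = {}
    for line in text.splitlines():
        # A tagged line looks like "TI  - Some title".
        if len(line) < 5 or line[2:5] != "  -":
            continue
        tag = line[:2]
        value = line[5:].strip()
        if tag == "ER":
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    if current:
        records.append(current)
    return records


# Invented sample content, for illustration only.
SAMPLE_RIS = """TY  - DATA
TI  - Example dataset one
AU  - Example Custodian A
PY  - 2016
ER  - 
TY  - DATA
TI  - Example dataset two
AU  - Example Custodian B
ER  - 
"""

records = parse_ris(SAMPLE_RIS)
for record in records:
    print(record["TI"][0])
```

To use it on a real export, read the file's contents with open(path, encoding="utf-8").read() and pass the text to parse_ris.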

ABOUT AURIN

AURIN is a collaborative national network of leading researchers and data providers across the academic, government, and private sectors. We provide a one-stop online workbench with access to thousands of multidisciplinary datasets, from over 100 different data sources.