Analysis and Visualization of Large Complex Data with Tessera, Brisbane

Presented by Ryan Hafen, Hafen Consulting LLC & Purdue University, on 17 February 2016 at Gardens Point Campus, Queensland University of Technology, Brisbane.

About the Workshop:

This is a repeat of the popular Tessera course that was run in Sydney in October 2015. It will be held in conjunction with the Visualisation, Big Data, Art and Science Festival 2016 on 18-19 February at QUT (see http://visualisation.matters.today/).

R is a powerful language for statistical analysis and visualization. However, most of its power is restricted to data of small or moderate size. Using Tessera, users can readily visualize and analyse large complex data sets in a familiar R environment, making use of the thousands of methods for analysis, visualization, and machine learning that are available in R.

Developed over the past two years as part of the DARPA XDATA program in the United States, Tessera (http://tessera.io) is an open source statistical computing environment that enables R users to perform deep analysis of large, complex data sets. Principal contributors to the project are statisticians and computer scientists at Purdue University and Pacific Northwest National Laboratory.

Tessera uses the Divide and Recombine (D&R) approach. In D&R, data are divided into meaningful subsets, embarrassingly parallel computations are performed on the subsets, and the results are recombined in a statistically valid manner. Through the R datadr package, Tessera provides a simple interface to distributed parallel back-end computation environments such as Hadoop. Tessera also includes a visualization component, Trelliscope, which applies the D&R approach to detailed, flexible, and interactive visualization of large complex data.
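To make the divide, compute, and recombine steps concrete, here is a minimal sketch in plain Python. It does not use Tessera, datadr, or Hadoop, and it is not how those tools are implemented; the function names `divide`, `per_subset`, and `recombine` and the sample records are hypothetical. The mean is used because subset means, weighted by subset size, recombine exactly into the overall mean.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical example records (not taken from the workshop's NYC taxi data).
trips = [
    {"borough": "Manhattan", "fare": 10.0},
    {"borough": "Manhattan", "fare": 14.0},
    {"borough": "Brooklyn", "fare": 20.0},
]

def divide(records, key):
    """Divide: split records into subsets by the value of one variable."""
    subsets = {}
    for rec in records:
        subsets.setdefault(rec[key], []).append(rec)
    return subsets

def per_subset(subset):
    """The same computation, applied independently to one subset."""
    fares = [rec["fare"] for rec in subset]
    return {"n": len(fares), "mean": mean(fares)}

def recombine(results):
    """Recombine: a size-weighted average of subset means gives the overall mean."""
    total = sum(r["n"] for r in results.values())
    return sum(r["n"] * r["mean"] for r in results.values()) / total

subsets = divide(trips, "borough")

# Subsets do not depend on one another, so they can be processed in parallel.
with ThreadPoolExecutor() as pool:
    results = dict(zip(subsets, pool.map(per_subset, subsets.values())))

overall = recombine(results)
print(overall)
```

For statistics such as model fits, recombining subset results typically gives an approximation to the all-data result rather than an exact match, which is why the choice of division and recombination method matters.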

An overview of Divide and Recombine and Tessera will be provided, followed by a hands-on introduction to the Tessera R packages datadr and Trelliscope. Once attendees have a practical feel for using Tessera for statistical analysis and visualization on small data sets, more in-depth hands-on examples will use a larger data set: one year of taxi ridership data from New York City.

Requirements

Attendees should have basic proficiency with R and RStudio. Attendees should have a laptop with the following installed:

R 3.2.x

A recent version of RStudio

An up-to-date web browser (Chrome, Safari, or Firefox)

The [datadr package](https://github.com/tesseradata/datadr)

The [trelliscope package](https://github.com/tesseradata/trelliscope)

About the Instructor:

Ryan Hafen is a statistical consultant and a remote adjunct assistant professor in the Statistics Department at Purdue University. Ryan’s research focuses on methodology, tools, and applications in exploratory analysis, statistical model building, and machine learning on large, complex datasets. He is the developer of the datadr and Trelliscope components of the Tessera project (tessera.io), as well as the rbokeh visualization package. Before becoming a statistical consultant, Ryan worked at Pacific Northwest National Laboratory, where he analyzed large complex data in many domains, including power systems engineering, nuclear forensics, high energy physics, biology, and cyber security. Ryan has a B.S. in Statistics from Utah State University, an M.Stat. in Mathematics from the University of Utah, and a Ph.D. in Statistics from Purdue University.

Course Costs:

SSA Members – $450

SSA Student Members** – $200

Non-SSA Members – $600

Non-SSA Students** – $300

Bookings open on 20 January 2016 and close on Tuesday, 9 February 2016. Bookings accepted after this date, if places are still available, will incur an additional $100 late-booking fee.

** Proof of a valid university ID is required.

The course cost is subsidized by the ARC Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS).

Course Location:

Gardens Point Campus, Queensland University of Technology, Brisbane.

Travel Expenses

Workshops are occasionally cancelled because too few people register; early registration helps prevent this. Please contact the SSAI Office to confirm that the workshop will go ahead before making any travel arrangements, as the SSAI will not be held responsible for any travel or accommodation expenses incurred due to a workshop cancellation.

Cancellation Policy

Cancellations received prior to Wednesday, 10 February 2016 will be refunded, minus a $20 administration fee.

From 10 February 2016, no part of the registration fee will be refunded. However, registrations are transferable within the same organisation. Please notify the SSAI Office of any changes.

When:

17/02/2016

Time:

9:00 am - 5:00 pm

Cost:

from $200 (SSA student members)

Location:

Queensland University of Technology,
2 George Street,
Brisbane,
QLD 4000