My new book is now available from Amazon. From the cover: The Essentials of Data Science: Knowledge Discovery Using R presents the concepts of data science through a hands-on approach using free and open source software. It systematically drives an accessible journey through data analysis and machine learning to discover and share knowledge from data. […]

Data scientists rely on the freedom to innovate that is afforded by open source software. We often deploy an open source software stack based on Ubuntu GNU/Linux and the R Statistical Software. This provides a powerful environment for the management, wrangling, analysis, modeling, and presentation of data within a tool that supports machine learning and […]

The fully open source software stack of the Ubuntu Data Science Virtual Machine (DSVM) hosted on Azure is a great place to support an R workshop, laboratory session, or training course. I record here the simple steps to set up a Linux Data Science Virtual Machine (in the main so I can remember how […]

This was originally shared as a Revolution Analytics Blog Post on 25th October 2016. Programming is an art and a way we express ourselves. As we write our programs we should keep in mind that someone else is very likely to be reading them. We can facilitate the accessibility of our programs through a clear presentation […]

I had the privilege of joining a panel in 2014 that explored big data opportunities and challenges. Together, coordinated by Professor Zhi-Hua Zhou, we captured our thoughts in a paper published in the IEEE Computational Intelligence Magazine (Volume 9, Number 4). It is an honour to learn that we have received a 2017 IEEE Outstanding […]

I have released an alpha version of Rattle with two significant updates. Eugene Dubossarsky and his team have been working on a Shiny interface to generate ggplot2 graphics interactively. It is a package called ggraptR. This is now available through Rattle’s Explore tab choosing the Interactive option. In line with Rattle’s philosophy of teaching programming […]

Data scientists have access to a grammar for preparing data (Hadley Wickham’s tidyr package in R), a grammar for data wrangling (dplyr), and a grammar for graphics (ggplot2). At an R event hosted by CSIRO in Canberra in 2011, Hadley noted that we are missing a grammar for machine learning. At the time I doodled […]

Microsoft has released a 5-video series called Data Science for Beginners. It introduces practical data science concepts to a non-technical audience, keeping the language clear and simple as an entry point to understanding data science. http://aka.ms/data-science-for-beginners-1 http://aka.ms/data-science-for-beginners-2 http://aka.ms/data-science-for-beginners-3 http://aka.ms/data-science-for-beginners-4 http://aka.ms/data-science-for-beginners-5 Graham @ Microsoft

The R package rattle provides a dataset that I have been collecting over a few years now from the Australian Bureau of Meteorology. Like most of the datasets in rattle it is also available as a CSV file as part of the package (as well as a proper R dataset) and can also be downloaded […]

A new release of Rattle has hit CRAN – this is version 4.0.0 and brings a variety of stability fixes and enhancements. For example, Jose A Magaña has added support for the display of pairs plots. An obvious addition is the Connect-R button on the toolbar – this will take you to Connect-R where R […]