My Blog about Data

Kaggle

I found a new dataset about UK broadband speeds and started analysing it in R. After cleaning the data, however, I decided that building a dashboard with Shiny would take too much time, so I moved to Tableau. I wanted to keep my analyses in one place, so I embedded the dashboard in the output HTML document (see below).

Initially I thought that RMarkdown couldn’t render embedded Tableau visualizations, because the iframe in my report looked blank after knitting. I had to open the generated HTML file in a browser to see the iframe filled with the Tableau dashboard.

Kaggle released another interesting dataset. This time it’s the loan book of a P2P lender, Lending Club.
I had a stab at analysing it. Here are some teaser charts; more can be found here.

Last month I took part in my first Kaggle competition, using BNP Paribas Cardif’s data. The competition’s aim was to accelerate the claims management process, but my personal goal was to apply machine learning techniques.
That officially makes me a Kaggler 😛
I used the xgboost R package to implement gradient boosting. The results are out, and they show that I have a long way to go with my ML skills. I will need to work more on feature engineering and on ensembling my models in future.
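
For readers new to the idea, here is a minimal sketch of what gradient boosting does, written in plain Python rather than the R I used for the competition. It is not xgboost and not my competition code: it assumes a one-feature regression problem with squared loss, and every function name in it (`fit_stump`, `fit_boosted`, `predict_boosted`) is made up for illustration. Each round fits a one-split "stump" to the current residuals and adds a shrunken copy of it to the model.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (squared error)."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # All x values are identical: fall back to predicting the mean residual.
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted(xs, ys, n_rounds=100, learning_rate=0.1):
    """Start from the mean of y, then repeatedly fit stumps to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict_boosted(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


# Toy data (made up): a step from 1 to 5 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = fit_boosted(xs, ys)
low = predict_boosted(model, 2)
high = predict_boosted(model, 5)
```

The small learning rate is the point of the exercise: no single stump fits the data, but the sum of many shrunken ones does. The competition itself was a classification task, and xgboost adds regularisation and much more on top of this basic loop.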

Kaggle publishes many interesting datasets, and one of them includes various world university rankings.
I decided to run a quick analysis of the CWUR data and create a map in R using the rworldmap package.

The initial results are here: the USA and China outnumber other countries by the number of universities in the CWUR data.

The map shows that the USA far outnumbers other countries in the top 100 universities according to CWUR.
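
The counting behind both statements is simple enough to sketch. The snippet below is in Python rather than the R I used for the map, and the rows are toy placeholders, not values from the CWUR data. The field layout (rank, institution, country) and the helper name `count_by_country` are assumptions made for illustration.

```python
from collections import Counter

# Toy rows: (world_rank, institution, country).
# Placeholder values for illustration only, NOT taken from the CWUR dataset.
rows = [
    (1, "University A", "USA"),
    (2, "University B", "United Kingdom"),
    (3, "University C", "USA"),
    (90, "University F", "Japan"),
    (150, "University D", "China"),
    (200, "University E", "China"),
]


def count_by_country(rows, max_rank=None):
    """Count institutions per country, optionally keeping only ranks <= max_rank."""
    return Counter(
        country
        for rank, _institution, country in rows
        if max_rank is None or rank <= max_rank
    )


all_counts = count_by_country(rows)
top100_counts = count_by_country(rows, max_rank=100)

print(all_counts.most_common())
print(top100_counts.most_common())
```

Counting over all rows and counting over the top 100 only can give quite different pictures, which is why the two results above are reported separately.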