The Fastest Growing Data Science Software



Last Wednesday, data science aficionados in Sofia enjoyed another exciting event brought to them by Data Science Society. This time the presenter believes so strongly in his topic that he is building his startup on it. Christian Mladenov introduced the software and programming language R in the cozy atmosphere of Eleven roof, the co-working space for start-ups in Sofia. Like most of our speakers, Christian has a diverse background: he earned his BSc in Business Administration at Hogeschool INHOLLAND and his MSc at RSM in the Netherlands. He gained experience as a software developer at Fredhopper, a marketing expert at Agilent Technologies, a product manager and business analyst at HP in the Netherlands and in Bulgaria, and a compliance intern at UBS in London. He currently runs his own business as the co-founder of Intuitics.

At Intuitics, Christian and his team are developing tools for building intuitive web applications for data analysis. At the core of this effort stands R. R is open source statistical software backed by a programming language with meta-programming capabilities. Christian revealed that R started as an academic-oriented project but has lately been gaining popularity in business circles. This is due to the plethora of extra packages developed on the fly by enthusiasts, the widely available community support, the quality of its charts, and the opportunity to code every step of the analysis process in a fully customizable programming language. Its capabilities have spawned a rich ecosystem of graphical interface applications, commercial applications, packages dedicated to data analysis and so on. Christian compared R to several competitors, and in his view only Python comes close. He mentioned that companies like Google, Facebook, Amazon, Microsoft, Dell and HP use R, sometimes for prototyping solutions before implementing them in Java or Python. Unfortunately, R has serious drawbacks for big data: datasets are limited by the available memory, and multithreading support is lacking.

After this introduction, our speaker demonstrated the main features of the programming language. As in other languages, R has objects that have classes, and functions that manipulate the objects. Unlike in most languages, you can assign a value to a function call, for example to set the names of an object. Among the most useful objects for data analysis are vectors and matrices. A very interesting concept in R is the list object, a collection of other objects that may belong to different classes within the same list. Most data analysis is conducted with data frames, a concept similar to tables in SQL and Excel.
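To make these object types concrete, here is a minimal sketch in base R. The variable names and values are our own illustrations and are not taken from Christian's slides.

```r
# Vectors and matrices hold values of a single type
acidity <- c(7.4, 7.8, 11.2)
m <- matrix(1:6, nrow = 2)            # a 2 x 3 integer matrix

# A list can mix objects of different classes
wine <- list(label = "Sample A", acidity = acidity, shape = dim(m))

# A data frame is a table: one row per observation, one column per variable
df <- data.frame(id = 1:3, acidity = acidity)

# Assigning a value to a function call (a "replacement function" in R terms)
names(acidity) <- c("a", "b", "c")

# Inspect what was built
str(wine)
str(df)
print(acidity)
```

The `names(acidity) <- ...` line is the kind of assignment that looks unusual to people coming from other languages: the left-hand side is a function call, not a plain variable.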

Christian gave a practical example of how to use R to analyze a dataset of wines. The purpose of the exercise was to determine how the chemical properties of the wines affect their quality. He started by loading the datasets into R and demonstrated how to manipulate them by adding new variables, merging them with other datasets, and filtering out rows and columns. The graphical capabilities of R are its centrepiece, and Christian showed us how to use them by preparing the data for plotting and building the plots. He also acquainted the audience with packages for neat summary tables and correlation tables.
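The steps described above might look roughly like the following sketch in base R. We do not have the dataset from the talk, so the two small tables below are invented toy data, and the column names (`alcohol`, `acidity`, `quality`, `colour`) are our assumptions. The talk also used add-on packages for summary and correlation tables; here we stick to the base functions `summary()` and `cor()`.

```r
# Toy data: all values are invented for illustration only
wines <- data.frame(
  id      = 1:6,
  alcohol = c(9.4, 9.8, 10.5, 11.2, 12.0, 12.8),
  acidity = c(0.70, 0.88, 0.58, 0.45, 0.36, 0.30),
  quality = c(5, 5, 6, 6, 7, 8)
)
origin <- data.frame(
  id     = 1:6,
  colour = c("red", "red", "white", "red", "white", "white")
)

# Add a new variable derived from an existing column
wines$strong <- wines$alcohol > 10.5

# Merge with a second dataset on the shared key
wines <- merge(wines, origin, by = "id")

# Filter rows (red wines only) and keep a subset of columns
reds <- wines[wines$colour == "red", c("id", "alcohol", "quality")]

# Summary and correlation tables for the numeric columns
numeric_cols <- wines[, c("alcohol", "acidity", "quality")]
print(summary(numeric_cols))
correlations <- cor(numeric_cols)
print(round(correlations, 2))

# A simple scatter plot of quality against alcohol content
plot(wines$alcohol, wines$quality,
     xlab = "Alcohol", ylab = "Quality",
     main = "Quality vs. alcohol (toy data)")
```

With a real dataset, the two `data.frame()` calls would typically be replaced by something like `read.csv()` on the source files.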

To predict the wine rating from its chemical properties, our speaker showed us two approaches in R: a linear regression and decision trees. For the latter, he grouped the ratings into three groups and explained that classification models might work better for datasets where the target variable is clustered around a few values, as is the case here.
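As a rough illustration of the two approaches, the sketch below fits both models on simulated data. Everything here is an assumption on our side: the data is randomly generated, the cut points for the three rating groups are arbitrary, and we use `lm()` and the `rpart` package (bundled with standard R installations), which may differ from what was shown in the talk.

```r
library(rpart)  # recommended package shipped with standard R distributions

# Simulated data: quality rises with alcohol and falls with acidity (our assumption)
set.seed(42)
n <- 200
wines <- data.frame(
  alcohol = runif(n, 8, 14),
  acidity = runif(n, 0.2, 1.0)
)
wines$quality <- round(2 + 0.5 * wines$alcohol - 2 * wines$acidity +
                         rnorm(n, sd = 0.5))

# Approach 1: linear regression on the numeric rating
fit_lm <- lm(quality ~ alcohol + acidity, data = wines)
print(coef(fit_lm))

# Approach 2: group the ratings into three classes (cut points are arbitrary)
wines$rating <- cut(wines$quality,
                    breaks = c(-Inf, 5, 7, Inf),
                    labels = c("low", "medium", "high"))

# Fit a classification tree on the grouped rating
fit_tree <- rpart(rating ~ alcohol + acidity, data = wines, method = "class")

# Compare predicted and actual classes on the training data
pred <- predict(fit_tree, wines, type = "class")
print(table(predicted = pred, actual = wines$rating))
```

Evaluating on the training data, as done here, overstates how well either model would perform on new wines; a held-out test set would give a fairer comparison.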