This week I’ve been at the R Users conference in Albacete, Spain. These conferences are a little unusual in that they are not really about research, unlike most conferences I attend. They provide a place for people to discuss and exchange ideas on how R can be used.

Here are some thoughts and highlights of the conference, in no particular order.

Håvard Rue spoke on Bayesian computing with INLA and the R-INLA package. I was unaware of INLA before, but it is a much faster way than MCMC to do some Bayesian computations. It looks useful; I might try it sometime.

Christoph Bergmeir (who has just finished visiting me at Monash for a few months) talked about the Rsiopred package (not yet on CRAN) which uses a fuzzy multicriteria approach to forecasting ETS models. Essentially, it tries to optimize RMSE, MAE and MAPE simultaneously, which will give biased forecasts of course, but hopefully more robust forecasts. The optimization is also better (in the sense of getting closer to the global optimum) than the ets() function in the forecast package. Christoph is also responsible for the big improvement in speed of the ets() function from v4.05 of the forecast package.
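To make the idea of optimizing several error measures at once concrete, here is a minimal sketch in Python. It is not the Rsiopred method: the fuzzy multicriteria step is replaced by a plain equal-weight average of the three measures, each scaled by its value at a reference parameter, and a full ETS model is replaced by simple exponential smoothing with a grid search over the smoothing parameter. The data, the function names and the scaling choice are all assumptions made for illustration.

```python
import math


def ses_one_step_forecasts(y, alpha):
    """One-step-ahead forecasts from simple exponential smoothing.

    The level is initialised at the first observation (an assumption),
    so forecasts are returned for y[1:], one per observation.
    """
    level = y[0]
    forecasts = []
    for obs in y[1:]:
        forecasts.append(level)  # forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level
    return forecasts


def accuracy(actual, forecast):
    """RMSE, MAE and MAPE (in percent); actual values must be non-zero."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


def combined_score(measures, baseline):
    """Equal-weight average of each measure relative to a baseline.

    Dividing by the baseline puts the three measures on a comparable
    scale; this stands in for the fuzzy multicriteria step.
    """
    return sum(measures[k] / baseline[k] for k in measures) / len(measures)


# Made-up series, for illustration only.
y = [12.0, 15.0, 14.0, 18.0, 17.0, 21.0, 20.0, 24.0]

grid = [i / 20 for i in range(1, 20)]  # alpha = 0.05, 0.10, ..., 0.95
results = {a: accuracy(y[1:], ses_one_step_forecasts(y, a)) for a in grid}

baseline = results[grid[0]]  # reference point used for scaling
best_alpha = min(grid, key=lambda a: combined_score(results[a], baseline))

# Compare the compromise with the minimiser of each single measure.
for name in ("RMSE", "MAE", "MAPE"):
    single_best = min(grid, key=lambda a: results[a][name])
    print(name, "alone is minimised at alpha =", single_best)
print("combined score is minimised at alpha =", best_alpha)
```

In general the three measures need not agree on the best parameter, which is why a compromise criterion of some kind is needed; any such compromise gives up optimality on each individual measure.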

José Manuel Benítez Sánchez (Christoph’s boss) talked about the efforts of his team at the University of Granada to add machine learning tools to CRAN. Their RSNNS package looks good. Next time I fit a neural net, I’ll try it out.

Duncan Murdoch gave an interesting talk on the new features in R 3.0.x and beyond. The most interesting part was that future releases will include the bug fixes and performance enhancements identified by Radford Neal. In question time, Duncan explained why we will probably never have packages dependent on specific versions of other packages.

Steve Scott from Google talked about Bayesian structural time series models with regressors. Actually, I’d heard his coauthor (and boss) Hal Varian speak on the same subject at the Operations Research conference in Rome last week. Look out for the bsts package when it is released on CRAN. It looks very useful.

As usual, there were lots of people talking about Sweave and knitr for reproducible research. I quite like knitr because it uses markdown, which is a very simple document markup language. However, for my purposes, I still prefer keeping the tex and R files separate as explained here.

I heard two nice talks on visualizing Likert scale data by Kimberly Speerschneider (on the likert package, not yet on CRAN) and Richard Heiberger (on the HH package). Likert scale data are the standard fare of surveys, so it is good to see some serious thinking being done on how to graph the data usefully.

I had my first experience of lightning talks, where each person gets 5 minutes. These were surprisingly effective and interesting. I especially enjoyed Andy South on mapping half a million signatures.

I gave a talk on the hts package entitled R tools for hierarchical time series. I am now working actively on hierarchical time series forecasting again, after a break from it for a couple of years. So expect to see more on this topic in the coming months. It has a lot of applications, and there is a lot still to be done to develop the theory, methodology and tools for handling hierarchical time series in practice.

RStudio demonstrated their new debugging features to a few of us during one lunch break. In fact, they allowed us to download the pre-release version for our own use, but I’m not allowed to tell you the URL! However, if you google [rstudio preview release] you might find it. The debugging features are excellent, so look out for v0.98, available soon.

I finally met the team from Revolution Analytics, the people who produce Revolution R. Unfortunately, Revolution R is not available for 64-bit Ubuntu, which is the platform I use.

I was amazed at how much effort is going into making R work with gigantically enormous and humungous data sets. For me a seriously big data set has a few hundred thousand observations, but many people are working with terabytes, petabytes and even exabytes of data. Mind-boggling.

I cannot get used to the Spanish habit of eating in the very late evening, while still getting up for the first session at 9am. The conference dinner began at 9.30pm, and I left just after 12.30am as I wanted to get some sleep. The party was apparently still going well after 2am, including Steve Scott who was doing the 9am talk the following morning!