Blog Archives

Now that I’m on my winter break, I’ve been taking a little bit of time to read up on some modelling techniques that I’ve never used before. Two such techniques are Random Forests and Conditional Trees. Since both can be used … Continue reading →

Recently at work we got sent a data file containing information on donations to a specific charitable organization, ranging all the way back to the ’80s. Usually, when we receive a dataset with a donation history in it, each row … Continue reading →

When I finally figured out how to successfully use the ff package to load data into R, I was apparently working with relatively pain-free data to load up through read.csv.ffdf (see my previous post). Just this past Sunday, I … Continue reading →

Penulti-what? Let me explain: Today I had to iteratively go through each row of a donor history dataset and compare a donor’s maximum yearly donation total to the second highest yearly donation total. In even more concrete terms, for each … Continue reading →

Before choosing to support the purchase of Statistica at my workplace, I came across the ff package as an option for working with really big datasets (with special attention paid to ff dataframes, or ffdf). It looked like a good … Continue reading →

The saga with Statistica continues: Statistica kept crashing on me while doing my data processing. One of the big problems was a wonderful bug that occurred when some of my text data variables were coded (unsurprisingly) as text! Under this … Continue reading →

Context: I work with data from non-profit organizations, and so a big concern in many of my analyses is if and how much people are donating from one year to the next. One of the things I normally like to do … Continue reading →

I’ve been spending a lot of time in the last month or so doing projects at work that aren’t statistics-related, hence the lack of posts! In the interim, I had to do some serious research on handling datasets bigger than … Continue reading →

This past week for work I had to create some plots to show the max, min, and median of a measure across the levels of a qualitative variable, and show the max and min of the same variable within a … Continue reading →

I recently finished a long stretch of work on a particular project that required me to draw upon four R packages. Each time I got back to my work on the project, I’d have to load the packages manually, as … Continue reading →