Data-themed articles, essays, and studies

Merging Syntax

This is a little programming thing, and if that’s not your thing, please skip it and have a great weekend.

Like most longtime programmers, I prize brevity. That is: I am lazy. Given a long sequence of data-frame merging operations to perform in R, I thought a binary-operator shorthand for natural joining of two data frames sharing a single common column might be handy. (That’s certainly a special merging case, but also not an uncommon one.)

So something like:

joined_df <- df1 | df2 | df3;

in lieu of a long sequence of merge() calls or equivalent subsetting logic. I attached a short file that implements the “pipe” shorthand as an S3 subclass with an Ops override. I used subsetting logic internally rather than merge() (probably faster), but either would be OK. Examples are included, in case you want to take a look. (Subclassing the data frames is necessary because | already has a meaning for data frames, so there is a constructor that works just like data.frame().) In my test cases (400K rows), performance was acceptable.
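For readers without the attached file, here is a minimal sketch of the idea, not the attached implementation. The class and constructor name joinable_df is a hypothetical choice made for illustration, and the join itself uses merge() rather than hand-written subsetting logic. The sketch assumes both operands were built with the same constructor and share exactly one column name.

```r
# Hypothetical constructor: takes the same arguments as data.frame(),
# then prepends a marker class so that `|` can be redefined for it.
joinable_df <- function(...) {
  df <- data.frame(...)
  class(df) <- c("joinable_df", class(df))
  df
}

# Ops group method: intercept `|` only; every other operator falls
# through to the usual data.frame behaviour via NextMethod().
Ops.joinable_df <- function(e1, e2) {
  if (.Generic != "|") {
    return(NextMethod())
  }
  # The join column is whichever single name the two frames share.
  key <- intersect(names(e1), names(e2))
  if (length(key) != 1L) {
    stop("expected exactly one shared column, found ", length(key))
  }
  # as.data.frame() drops the marker class, so merge() sees plain frames.
  out <- merge(as.data.frame(e1), as.data.frame(e2), by = key)
  class(out) <- c("joinable_df", class(out))
  out
}

# Made-up example data, for illustration only.
df1 <- joinable_df(id = c(1, 2, 3), a = c("x", "y", "z"))
df2 <- joinable_df(id = c(2, 3, 4), b = c(10, 20, 30))
df3 <- joinable_df(id = c(3, 4, 5), flag = c(TRUE, FALSE, TRUE))

# `|` is left-associative, so this evaluates as (df1 | df2) | df3.
joined_df <- df1 | df2 | df3
print(joined_df)
```

Because the result carries the marker class again, the operator chains. Mixing a joinable_df with a plain data frame is not handled in this sketch.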

So, can you implement the shorthand in R? You can. As for whether something like this is a good idea in general, I’m not entirely sanguine. Syntax can actually be too brief if we lose relevant context (here, the joining column). But for a lot of simple joins, it is handy.