which seems to add about 30% to the run time when na.rm=FALSE. Presumably na.locf has other merits: it captures more of the corner cases and allows filling up instead of down (which is an interesting exercise in the cumsum world, anyway). It's also clear that we're making at least five allocations of possibly large data (is.na(x) and its complement idx, cumsum(idx), x[idx], and x[idx][cumsum(idx)]), so there's room for further improvement, e.g., in C.
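For reference, here is a minimal sketch of the cumsum approach discussed above, with comments marking where each temporary vector is allocated. The function names fill_down and fill_up and the na_rm argument are hypothetical, chosen for illustration; they only approximate na.locf's na.rm behaviour for plain vectors and are not a drop-in replacement.

```r
# Hypothetical helper: carry the last non-NA value forward using cumsum().
fill_down <- function(x, na_rm = FALSE) {
  idx <- !is.na(x)          # two logical vectors: is.na(x) and its complement
  pos <- cumsum(idx)        # integer vector; 0 for any leading NAs
  vals <- x[idx]            # the non-NA values
  if (na_rm) {
    vals[pos]               # index 0 is dropped, so leading NAs disappear
  } else {
    c(NA, vals)[pos + 1L]   # shift by one so leading NAs stay NA
  }
}

# Filling up instead of down: reverse, fill down, reverse again.
# This costs two extra copies, so it is a sketch rather than an efficient solution.
fill_up <- function(x) rev(fill_down(rev(x)))

x <- c(NA, 1, NA, NA, 2, NA)
fill_down(x)                # NA 1 1 1 2 2
fill_down(x, na_rm = TRUE)  # 1 1 1 2 2
fill_up(x)                  # 1 1 2 2 2 NA
```

The na_rm = FALSE branch needs the extra c(NA, vals) copy and the pos + 1L vector, which is one plausible source of the additional time noted above.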