I have a dataframe of prescribing data from UK practices.
The original data is at http://datagov.ic.nhs.uk/T201207.exe. I've wrangled it into a PCT-level data frame, ordered by PCT and then by the most common prescription (descending order of the 'items' column).

There are 151 PCTs and each has over 1000 items. I want to extract the top 50 items for each PCT. I know I could write a for loop and just iterate over the levels of pct, but that's not the R way. I haven't figured out how to use apply or sapply to do the subset over the levels. Those functions seem better suited to working on entire columns than to pulling out a subset of the rows.
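
To make the task concrete, here is a minimal base R sketch of "top n rows per group", assuming the data frame is already sorted by group and by descending count as described above. The toy data, the `drug` column, and the names `rx` and `top_n_per_group` are hypothetical; only `pct` and `items` come from the description.

```r
# Toy stand-in for the real prescribing data (hypothetical values).
set.seed(1)
rx <- data.frame(
  pct   = rep(c("A", "B", "C"), each = 5),
  drug  = paste0("drug", 1:15),
  items = sample(1:1000, 15),
  stringsAsFactors = FALSE
)

# Sort by group, then by items in descending order, as in the question.
rx <- rx[order(rx$pct, -rx$items), ]

# Hypothetical helper: split by group, keep the first n rows of each
# piece, and bind the pieces back into one data frame.
top_n_per_group <- function(df, group_col, n) {
  pieces <- lapply(split(df, df[[group_col]]), head, n = n)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

top2 <- top_n_per_group(rx, "pct", 2)
```

With the real data the call would be along the lines of `top_n_per_group(df, "pct", 50)`. This takes the first n rows of each group as they are sorted, so it does nothing special about ties.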

@Arun the executable file is a 'self-expanding zip' which contains 2 csv files, which are the data. Thank the NHS.
–
Suz Feb 24 '13 at 14:56

Thank you @Anthony. I spent about an hour looking, but I guess I didn't use the correct terms. I've added a couple of tags to that one so it might be more findable for the next person.
–
Suz Feb 24 '13 at 15:05

I was going to suggest that this can be done straightforwardly with data.table, but it appears @arun already pointed that out in the question Anthony linked to. Perhaps close this as a duplicate?
–
Ricardo Saporta Feb 24 '13 at 19:30

This is a good answer, and it works. I've voted it up. I have been trying to learn the R way and stay with base functions, but I may have to give in. I keep seeing plyr used in useful ways. I have voted to close this question as it is identical to a previous one ('how to find top N values by group...'). However, the plyr way is not suggested on that question. Perhaps you could add it there. (I'm happy to vote it up.)
–
Suz Feb 24 '13 at 21:14

This answer and the one you were linked to are NOT the same. This just picks the first 50 elements, irrespective of ties. They are similar, but not identical. I don't mind voting to close the question since you've done so. But read the other post carefully and see if that's what you require, because from your question, it isn't obvious.
–
Arun Feb 24 '13 at 22:51
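
The difference between "first n rows" and "top n values including ties" can be shown with a small base R sketch. The data and the names `scores`, `grp`, `first2` and `with_ties` are invented for illustration, and the rank-based filter is just one possible way to keep ties, not necessarily the approach used in the linked question.

```r
# Invented example data with tied values inside each group.
scores <- data.frame(
  grp   = c("A", "A", "A", "A", "B", "B", "B"),
  items = c(10, 8, 8, 5, 7, 7, 7)
)
scores <- scores[order(scores$grp, -scores$items), ]

# First 2 rows per group, irrespective of ties.
first2 <- do.call(rbind, lapply(split(scores, scores$grp), head, n = 2))

# Ties-aware variant: rank within each group (ties share the lowest rank)
# and keep every row whose rank is at most 2.
rk <- ave(-scores$items, scores$grp,
          FUN = function(x) rank(x, ties.method = "min"))
with_ties <- scores[rk <= 2, ]
```

Here `first2` always has exactly 2 rows per group, while `with_ties` keeps extra rows whenever values are tied at the cut-off.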

In this case, I don't care about ties. I have ordered the data on 3 fields. I'm using one as a factor to group the data, a second as a ranking that I'm interested in, and the 3rd to define the edges (break ties). So it's well resolved. The other question includes this case as a subset, and @Ista's 1st suggestion there answered my question. Answers on that page also address the question of ties in some detail, but as a secondary issue. I don't see that the questions are sufficiently distinct to keep this question open, but perhaps your point is that ddply() won't handle the ties.
–
Suz Feb 26 '13 at 13:44

Both Ista's and my first solution there answer this, to be precise. However, that wasn't supposed to be the answer to THAT question, as Anthony specifically asked about dealing with ties in the question. Anyway, the issue seems to be resolved and the question seems to be closed. All is well. Good luck.
–
Arun Feb 26 '13 at 13:50