@MaxMackie askubuntu.com/questions/88142/…. I can't get ahold of a mod there at this hour, so I flagged it asking them to migrate if they're willing; it already has an accepted answer so I'm not sure if they will
– Michael Mrozek♦ Dec 16 '11 at 7:10

@MichaelMrozek, hmmm what usually happens in these situations? Do we simply keep the duplicates?
– maxmackie Dec 16 '11 at 7:11

5 Answers

Aside from how to cut and re-arrange the fields (covered in the other answers), there is the issue of quirky CSV fields.

If your data falls into this "quirky" category, a bit of pre- and post-filtering can take care of it. The filters shown below require that the characters \x01, \x02, \x03, \x04 do not appear anywhere in your data.

Here are the filters wrapped around a simple awk field dump.
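The original filter code is not reproduced here; as a hypothetical sketch of the idea, the pre-filter rewrites commas inside quoted fields to the token \001, a plain comma-split field dump then runs safely, and the post-filter turns the token back into a comma:

```shell
# Hypothetical sketch, not the original answer's exact filters.
# Pre-filter: in a "-split row, even-numbered chunks are inside quotes,
# so replace their commas with the \001 token before splitting on commas.
printf 'one,"two,with comma",three\n' |
  awk -F'"' -v OFS='"' '{ for (i = 2; i <= NF; i += 2) gsub(/,/, "\001", $i); print }' |
  awk -F, '{ for (i = 1; i <= NF; i++) print i ": " $i }' |
  tr '\001' ','
# 1: one
# 2: "two,with comma"
# 3: three
```

The same pre/post pair can wrap any comma-based move, cut, or delete step in place of the field dump.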

Note: field five has an invalid/incomplete "quoted field" layout, but it is benign at the end of a row (depending on the CSV parser). It would, of course, cause unexpected problems if it were swapped away from its current end-of-row position.

Update: user121196 has pointed out a bug that occurs when a comma precedes a trailing quote. Here is the fix.

how would you delete the nth column based on this filter?
– user121196 Dec 3 '12 at 10:50

@user121196 - As mentioned in its opening sentence, this answer shows a way to make the CSV data more consistent, e.g. by temporarily replacing a quote-embedded comma with a neutral token character, and then reverting it back to a comma after the move/cut/delete. Again, as mentioned, the move/cut/delete step is replaced here by a simple awk field dump.
– Peter.O Dec 3 '12 at 11:39

This depends on whether your CSV file uses commas only for delimiters, or if you have madness like:

field one,"field,two",field three
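To see why the quoted case is madness for the simple tools, note that cut is quote-unaware, so the comma embedded in the quoted field gets treated as a delimiter:

```shell
# cut splits on every comma, including the one inside "field,two"
printf 'field one,"field,two",field three\n' | cut -d, -f2
# prints: "field
```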

This assumes you're using a simple CSV file:

Removing a column

You can get rid of a single column in many ways; I used column 2 as an example. The easiest way is probably to use cut, which lets you specify a delimiter (-d) and which fields to print (-f); this tells it to split on commas and output field 1 and fields 3 through the end:

$ cut -d, -f1,3- /path/to/your/file
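For example, on some made-up sample data this drops the second column:

```shell
# hypothetical sample data; field 2 ("b" / "2") is removed
printf 'a,b,c,d\n1,2,3,4\n' | cut -d, -f1,3-
# a,c,d
# 1,3,4
```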

If you actually need to use sed, you can write a regular expression that matches the first n-1 fields, the nth field, and the rest, and skip outputting the nth (here n is 2, so the first group is matched 1 time: \{1\}):

$ sed 's/\(\([^,]\+,\)\{1\}\)[^,]\+,\(.*\)/\1\3/' /path/to/your/file
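To drop a different column n, change the repetition count to n-1 (note that \+ is a GNU sed extension; strict POSIX BRE would need \{1,\} instead). For example, \{2\} matches the first two fields and removes the third:

```shell
# remove column 3: the first group now swallows two leading fields
printf 'a,b,c,d\n' | sed 's/\(\([^,]\+,\)\{2\}\)[^,]\+,\(.*\)/\1\3/'
# a,b,d
```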

There are a number of ways to do this in awk, none of them particularly elegant. You can use a for loop, but dealing with the trailing comma is a pain; ignoring that it'd be something like:
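One sketch of such a loop (not the original answer's code) rebuilds the row from the kept fields, sidestepping the trailing-comma problem by only inserting a separator between fields already collected:

```shell
# drop field 2 by rebuilding the row from the remaining fields
printf 'a,b,c,d\n' | awk -F, '{
  out = ""
  for (i = 1; i <= NF; i++)
    if (i != 2)
      out = out (out == "" ? "" : ",") $i
  print out
}'
# a,c,d
```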