Linux Blog

Happy new year! I guess it’s time for a yearly update; I feel like everyone else has done one and now it’s my turn. Hit the jump for some more statistics that are probably only interesting to yours truly.

Top 10 Posts

Interestingly enough, none of these were written this year. Perhaps I should write a query to extract the most popular posts of this year; I’m not sure they’re getting the same search love as my older stuff.

I got a question from a user called Bastiaan. He had found my site while searching for ‘cut from end of line Linux’ and landed on the Using cut – shellscript string manipulation article. I haven’t received a lot of feedback on it, but I am happy with the feedback I have and the number of visits it gets. As I’ve said before, even if no one else reads The Linux Blog I still use it as a reference, so I am glad people are finding it useful. Anyway, Bastiaan’s problem was that he works at a university and has a file with A LOT of DNA records in it. He needed to grab the last 50 characters of each line, regardless of the line length. After some correspondence we came up with a solution.

I have experience in doing this sort of thing in other languages such as PHP but not bash. Here is what I came up with for bash:

While this was really quick to write, it is not the most efficient way in the world. It has to read each line, echo it out, calculate the length of the line and subtract 50 from it. Again, it does the job, but not very gracefully.
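A minimal sketch of that loop approach, not necessarily the exact script I sent, could look like the following. The function name last_chars and its two arguments (a file name and a character count) are made up for illustration, and printf stands in for echo so that backslashes in the data are left alone.

```shell
# last_chars FILE [N]: print the last N characters (default 50) of each line.
# Hypothetical helper for illustration; it runs one cut process per line.
last_chars() {
    local file=$1 n=${2:-50} line len start
    # The "|| [ -n ... ]" part picks up a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        len=${#line}
        # First character to keep; lines of N characters or fewer are kept whole.
        start=$(( len > n ? len - n + 1 : 1 ))
        printf '%s\n' "$line" | cut -c "${start}-"
    done < "$file"
}
```

It would be called as last_chars file.txt 50 > out.txt. Starting a new cut for every line is what makes it slow on a big file.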

Bastiaan then told me he had reversed the whole file and was processing that with cut. I had heard of tac, which reverses the order of the lines in a file, but had never heard of rev, which reverses the characters of each line. Using rev, I assumed that he was running something like the following:

rev file.txt > rev_file.txt
cat rev_file.txt | cut -c -50 | rev

That will get you the last 50 characters of each line (well, really the first 50 of each reversed line, flipped back again). That works pretty well, so the final step was to streamline it a little so that it could be done in one command.

rev file.txt | cut -c -50 | rev > out.txt

So there you have it: if you’re looking to use cut to “cut” characters from the end of the line, the above will keep the last 50 characters of each line. Obviously you can remove the trailing “> out.txt” to get the output on the screen.
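To convince yourself it does the right thing, the same pipeline can be tried on a few short made-up lines with a smaller width. This is only a sketch; the sample strings are invented and 5 stands in for 50.

```shell
# Keep the last 5 characters of each line.
# A line shorter than 5 characters comes through whole.
printf 'ACGTACGTAC\nGATTACA\nACG\n' | rev | cut -c -5 | rev
# prints:
# CGTAC
# TTACA
# ACG
```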

This post is designed to be a refresher, reference or quick intro into how to manipulate strings with the cut command in bash.

Sometimes it’s useful to take the output of a command and reformat it. I sometimes do this for aesthetic purposes or to format it for use as input to another command.
Cut has options to cut by bytes (-b), characters (-c) or fields (-f). I normally cut by character or field, but byte can come in handy sometimes.
The ranges each option accepts are below.

N N’th byte, character or field, counted from 1
N- from N’th byte, character or field, to end of line
N-M from N’th to M’th (included) byte, character or field
-M from first to M’th (included) byte, character or field

The options pretty much explain themselves, but I have included some simple examples below.

Cutting by characters (command on top, output below)

echo "123456789" | cut -c -5
12345

echo "123456789" | cut -c 5-
56789

echo "123456789" | cut -c 3-7
34567

echo "123456789" | cut -c 5
5
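The range formats can also be combined in a single comma-separated list, which is standard cut behaviour. A quick sketch using the same test string:

```shell
# Characters 1 to 3, plus everything from character 7 onwards.
echo "123456789" | cut -c 1-3,7-
# prints 123789
```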

Sometimes output from a command is delimited so a cut by characters will not work. Take the example below:

echo -e "1\t2\t3\t4\t5" | cut -c 5-7
3 4

To echo a tab you have to use the -e switch so that echo processes backslash escapes. If the desired output is 3\t4, this works fine as long as every field is exactly one character wide, but if a character is added anywhere before field 3 the output changes completely, as follows:

echo -e "1a\t2b\t3c\t4d\t5e" | cut -c 5-7
b 3

This is resolved by cutting by fields.

Cutting by fields

The syntax to cut by fields is the same as for characters or bytes. The two examples below display different output but both select the same fields (field 3 through to the end of the line).

echo -e "1\t2\t3\t4\t5" | cut -f 3-
3 4 5

echo -e "1a\t2a\t3a\t4a\t5a" | cut -f 3-
3a 4a 5a

The default delimiter is a tab. If the output is delimited another way, a custom delimiter can be specified with the -d option. It can be just about any single character; just make sure that the character is escaped (backslashed) or quoted if the shell would otherwise interpret it. In the example below I cut the string up using the pipe as the delimiter.

echo "1|2|3|4|5" | cut -f 3- -d \|
3|4|5
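Quoting the delimiter works just as well as backslashing it, and it is the comfortable way to use a space as the delimiter. A small sketch with made-up input strings:

```shell
# Single quotes stop the shell from treating | as a pipe.
echo "1|2|3|4|5" | cut -d '|' -f 3-
# prints 3|4|5

# A quoted space as the delimiter; this selects the second word.
echo "one two three" | cut -d ' ' -f 2
# prints two
```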

One great feature of cut is that the delimiter used for the input can be replaced with a different one in the output. For instance, a dash-delimited string can be turned into a comma-delimited one.
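With GNU cut this is done with the --output-delimiter option. It is a GNU extension, so this sketch assumes the coreutils version of cut; other implementations may not have it. The input string is made up for the demonstration.

```shell
# Split on dashes, select every field, and join the result with commas.
# --output-delimiter is assumed to be available (GNU coreutils cut).
echo "1-2-3-4-5" | cut -d '-' -f 1- --output-delimiter=','
# prints 1,2,3,4,5
```

The output delimiter is only placed between the fields that were selected, so selecting fewer fields gives a shorter comma-separated string.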

A more practical case is trimming the output of uptime. I pipe the output of uptime to cut and tell it to split on the comma delimiter, then choose fields 1 and 2. The output from that cut is piped into another cut that removes the space in front of the output.
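A sketch of that uptime pipeline follows. To keep it repeatable it works on a hard-coded sample line instead of live output; the sample text and the variable name are invented, and the real spacing and field layout of uptime vary between systems.

```shell
# Hypothetical sample of uptime output; real output differs by system.
sample=' 10:15:01 up 3 days,  2:04,  4 users,  load average: 0.00, 0.01, 0.05'

# Fields 1 and 2 (split on commas), then drop the single leading space.
echo "$sample" | cut -d , -f 1,2 | cut -c 2-
# prints 10:15:01 up 3 days,  2:04
```

Against the live command the same thing would be uptime | cut -d , -f 1,2 | cut -c 2-.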
Load averages extracted from uptime:

This is about the same as the previous example except that the fields changed. Instead of fields 1 and 2 I told it to display fields 4 through the end. The output from that is piped to another cut, which removes the leading spaces that were after the comma in "4 users, " by starting at the 3rd character.
The great thing about cutting by fields is that the data stays the same even if a field’s length changes. Say I now have 17 users logged in: that would have broken the output if I had used -c, since there is an extra character due to the double-digit number of users.
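The load average case can be sketched the same way. Both sample lines below are invented, and they assume the user count is not padded, so the 17 user line is one character longer than the 4 user line.

```shell
# Hypothetical uptime lines; only the number of users differs.
four=' 10:15:01 up 3 days,  2:04,  4 users,  load average: 0.00, 0.01, 0.05'
many=' 10:15:01 up 3 days,  2:04,  17 users,  load average: 0.00, 0.01, 0.05'

# Field 4 onwards, then start at the 3rd character to drop the two leading spaces.
echo "$four" | cut -d , -f 4- | cut -c 3-
echo "$many" | cut -d , -f 4- | cut -c 3-
# both print: load average: 0.00, 0.01, 0.05

# A fixed character offset tuned for the 4 user line is off by one
# on the longer line and leaves a leading space behind.
echo "$many" | cut -c 40-
```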

That just about covers everything for the cut command. Now that you know about it, you can use cut to chop up all types of strings. It is one of the many great tools available for string manipulation in bash. If you can remember what cut does, it will make your shell scripting easier. You don’t need to memorize the syntax, because all of the information on how to use cut is available here, in the man pages and all over the web.