
In my current job, we do a great deal of “ex post facto” debugging from debug data that we record during operations. To help with off-the-cuff analysis of this data, I make heavy use of egrep and sed. I have created three scripts that encapsulate the most common patterns, which previously required complicated sed lines.

The three scripts are:

clip – used to pluck out a particular portion of a line based on a regex of what comes before the portion and a regex of what comes after it

clipc – like clip except counts the number of occurrences of each unique plucked-out portion

clips – like clip except it assumes the plucked-out portion is numeric and outputs the sum of all of the plucked-out portions
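In terms of standard tools, clipc and clips behave roughly like clip piped through a counting filter or a summing filter. Here is a minimal shell sketch of those two filters, working on values that have already been plucked out. The function names count_unique and sum_values are made up for illustration and are not part of the scripts themselves.

```shell
# Hypothetical helpers: these names are illustrative, not part of clip/clipc/clips.

# Count occurrences of each unique input line, printed as "COUNT VALUE".
count_unique() {
    sort | uniq -c | sed 's/^ *//'
}

# Sum numeric input lines; prints 0 when there is no input.
sum_values() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Example use on already-plucked values.
printf '127.0.0.1\n217.0.22.3\n127.0.0.1\n' | count_unique
printf '1200\n350\n' | sum_values
```

The point of wrapping these up in clipc and clips is to avoid retyping the same pipeline tail on every ad hoc query.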

If I want to see how often GET is done from various IP addresses, I could do this:

% clipc '^' '- .* "GET' < apache.log
8 127.0.0.1
2 139.12.0.2
6 217.0.22.3
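For comparison, this is roughly the kind of hand-written sed pipeline that such a clipc call stands in for. It is only a sketch: the sample log lines are invented for illustration, the function names are hypothetical, and it assumes the client address is the first field and is followed by " - ".

```shell
# Invented sample data in Apache-style log format (not from the real apache.log).
sample_log() {
    cat <<'EOF'
127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326
217.0.22.3 - - [10/Oct/2000:13:55:40 -0700] "POST /form HTTP/1.0" 200 120
127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /logo.png HTTP/1.0" 200 512
EOF
}

# Keep the text between the start of the line and ' - ... "GET',
# then count how often each distinct value occurs.
get_counts() {
    sed -n 's/^\([^ ]*\) - .* "GET.*/\1/p' | sort | uniq -c | sed 's/^ *//'
}

sample_log | get_counts
```

The sed expression has to restate the whole line around the part to keep, which is the repetition that the before and after regexes of clip are meant to remove.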

If I wanted to add up the number of bytes sent for successful pages, I could do this:

% clips '" 200' < apache.log
50078

By default, the what-comes-before regex of clip matches an open paren, open bracket, double quote, single quote, or equal sign. The default what-comes-after regex matches a close paren, close bracket, double quote, single quote, comma, or whitespace. So, if I wanted just the first few dates from the above, I wouldn’t need any arguments, since Apache writes each timestamp inside square brackets and the defaults already cover that case.

Implementations of the Above Scripts

The clip script could be written in a few lines of Perl:

my $pre  = shift || '[\[("\'=]';
my $post = shift || '[,\s"\')\]]';

while ( my $line = <> ) {
    # Look for $pre followed by whitespace, then the (non-greedy) portion to keep,
    # followed by some (non-greedy) amount of whitespace, followed by $post.
    print "$1\n" if ( $line =~ m{$pre\s*(.*?)\s*?$post}o );
}