Can anyone suggest where to alter this code so that it calculates over lines 6001 to 10000 only?
Thank you.

trist007

08-18-2011 01:55 AM

Code:

NR > 6000 && NR < 10001

NR = number of records read so far, i.e. the current line number

David the H.

08-18-2011 02:12 AM

Please use [code][/code] tags around your code, to preserve formatting and to improve readability. Don't use quote tags, as they do not protect whitespace.

So just where are you having problems? As far as I can see, there's nothing wrong with the array loop. I don't really know enough about the math to evaluate the formulae at the end, however, or how you want to apply them.

The only thing I can suggest offhand is to keep a running total of the numbers added so far, so you don't have to hard-code it in. This minor modification worked for me in a quick test on a sample file. To average the lines 11-50, for example:
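A minimal sketch of that running-total idea follows. It assumes the values sit in the first column, and it uses `seq` to generate stand-in input (the numbers 1 to 100, one per line) in place of the real data file:

```shell
# Average column 1 of lines 11-50 with a running sum and a running count.
# The seq input and the choice of column 1 are assumptions for illustration.
avg=$(seq 1 100 | awk '
    NR >= 11 && NR <= 50 { sum += $1; n++ }          # n counts only the matched lines
    END { if (n > 0) printf "%.2f\n", sum / n }      # guard against an empty range
')
echo "$avg"
```

Because `n` is incremented only when the range matches, the divisor never needs to be hard-coded.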

The problem the OP stated, however, was that he wanted to evaluate only a subsection of entries. This means we need some way to a) match only the desired range of lines, and b) count only the number of lines matched.

Sure, you can put most of the work into the main block instead; just add an NR match and a running-count variable to the above. But it seems to me that using an intermediate array and doing the heavy stuff in the END block makes the code a little easier to read and work with.
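As a rough sketch of the array-plus-END layout, assuming column 1 holds the values and using hypothetical `first` and `last` variables for the line range (the `seq` input is only a stand-in for the real file):

```shell
# Store the matched lines in an array, then do the arithmetic in END.
# first/last and the seq input are illustrative assumptions.
result=$(seq 1 100 | awk -v first=11 -v last=50 '
    NR >= first && NR <= last { val[++n] = $1 }      # keep only lines in the range
    END {
        for (i = 1; i <= n; i++) sum += val[i]       # heavy lifting happens here
        if (n > 0) printf "n=%d mean=%.2f\n", n, sum / n
    }
')
echo "$result"
```

The array costs memory proportional to the number of matched lines, but it leaves every value available in END if the formulae need more than one pass.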

It looks like I also need to repeat what I mentioned above. DO NOT USE "QUOTE" TAGS AROUND CODE! Quote tags don't preserve formatting. Please use CODE tags ([code][/code]), which do.

AnanthaP

08-20-2011 06:21 AM

To "David_the_H"

Selecting a subset of records has nothing to do with avoiding a redundant use of an array once the required records are selected. The first (selecting a subset) is straight awk technique, which is fine and which I assume the OP understands. The second has to do with basic statistics exercises (STAT101). As the number of records grows, efficiency matters more: an avoidable array can hit RAM limits and push the process into swap. I think there is value in suggesting this method to the OP.
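To illustrate the array-free approach, here is a sketch that keeps only three running totals. It assumes the statistics wanted are the mean and sample standard deviation of column 1, which may not match the OP's actual formulae, and it uses `seq` as stand-in input:

```shell
# Mean and sample standard deviation of lines 11-50 without storing the values.
# The choice of statistics, column 1, and the seq input are assumptions.
stats=$(seq 1 100 | awk '
    NR >= 11 && NR <= 50 { n++; sum += $1; sumsq += $1 * $1 }
    END {
        if (n > 1) {
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
            printf "n=%d mean=%.2f sd=%.2f\n", n, mean, sqrt(var)
        }
    }
')
echo "$stats"
```

Memory use stays constant however many lines are read. The sum-of-squares formula can lose precision when the values are large and close together, so treat it as a starting point rather than a robust implementation.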

As to why I used QUOTE tags instead of CODE tags: it's because you need to "go advanced" to use CODE tags, which I saw as unnecessary for a 3-line code snippet. If you think CODE tags need to be used more often (which seems reasonable), how about getting it placed in the "quick reply" box? Empirically, you would then expect more posters to use the CODE tag when appropriate.

I expect that I won't get a special lecture about putting this suggestion in the suggestions thread.

By the way, when we use awk to select a subset of records by a pattern, what will NR return: the number of selected records, or the total number of records read from the files in the argument list (ref: NR, FNR etc.)? I'll be trying it out.
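A quick check suggests the answer: NR counts every record awk reads, whether or not a pattern matched it, so a separate counter is needed for the selected records. (FNR behaves the same way but restarts at 1 for each input file.) The sketch below uses `seq` as stand-in input and a hypothetical counter `n`:

```shell
# NR is the total number of records read; n counts only the selected ones.
out=$(seq 1 10 | awk 'NR > 6 { n++ } END { print NR, n }')
echo "$out"    # first field: total records read, second field: records selected
```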