I noticed using procmon that the reads done by comskip are only 32KB, and some even have the I/O flags "Non-cached, Paging I/O".

I'm thinking "Non-cached" means FILE_FLAG_NO_BUFFERING is set, which prevents the OS from doing any read-ahead or caching for you on the file. ShowAnalyzer uses 256KB reads and has "Non-cached" set on ALL its I/Os, making it thrash the disk even harder.

I know there is an MSDN page saying to use it for performance, but I think it only ends up being a benefit for totally random access; in other cases, like processing a video, which is mostly sequential, it is a huge hit.

For something mostly sequential like processing a video, I think it would be better to use FILE_FLAG_SEQUENTIAL_SCAN to hint the OS to prefetch the next section of the file into the cache, or to read the file in bigger chunks (multi-MB, ideally configurable).
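To be clear about what I mean, here's a minimal sketch of opening a file with the sequential-scan hint and doing big reads via the Win32 API (this is just my illustration of the idea, not comskip's actual code, and being Windows-only I haven't exercised it beyond compiling similar things before):

```cpp
#include <windows.h>
#include <cstdio>
#include <vector>

int main(int argc, char *argv[])
{
    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    // FILE_FLAG_SEQUENTIAL_SCAN tells the cache manager we will read
    // front to back, so it can read ahead aggressively and drop pages
    // behind us. Contrast with FILE_FLAG_NO_BUFFERING, which bypasses
    // the cache entirely (and is what "Non-cached" in procmon suggests).
    HANDLE h = CreateFileA(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, NULL);
    if (h == INVALID_HANDLE_VALUE) { fprintf(stderr, "open failed\n"); return 1; }

    std::vector<char> buf(4 * 1024 * 1024);   // multi-MB reads instead of 32KB
    DWORD got = 0;
    while (ReadFile(h, buf.data(), (DWORD)buf.size(), &got, NULL) && got > 0)
        ;                                     // real code would process buf here

    CloseHandle(h);
    return 0;
}
```

The 4MB buffer size is my own pick; anything in the multi-MB range should amortize the seek cost.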

As a test I tried priming the cache by running "E:\cygwin\bin\cat {path to a .wtv file} > nul" while 4 instances of comskip were running. This sped up processing on that instance by over 10x. (I have 12GB of RAM, so entire recordings are generally cacheable.)

BTW, I threw together a small app that I now run in parallel with comskip, which reads the file throttled to 25MB/s (I haven't made the throttling configurable yet).

This allows me to process 2 HD shows simultaneously before becoming either CPU or disk limited on the Core i7 920. (For some reason, even with 6 threads, a single show stays below 40% CPU.) With SD shows I can get closer to 35MB/s if I pull the whole file into cache when comskip starts, but HD shows run a bit slower than 25MB/s, and if comskip lags too far behind, 2 HD shows are bigger than my RAM, so they end up uncached again.

By running this alongside comskip (along with the lowres option and 6 threads configured in the ini), I can have two 1-hour HD shows processed in parallel in about 6 minutes, on a WD20EARS Green drive.

Reading an entire hour of SD without throttling takes less than 1 minute. It is kludgy with the separate process, though, and if an HD show processes slower than normal for some reason and gets kicked out of cache before comskip gets to what was read in, it is back to lots of slow disk seeks.

Here is the code of the app I threw together to read the files into cache:

Code:

// prime_cache.cpp : Defines the entry point for the console application.
//

The filling of the cache is done concurrently with comskip processing, and yes, it is many times faster.

"before being either CPU or disk limited" was bad wording on my part -- of course it is limited somewhere. Filling the cache keeps it from being disk limited, and even then the CPU isn't pegged at 100% unless I run more than 2 instances of comskip at once -- I guess it just doesn't have 4+ threads that want to do anything at any given time.

The problem with small reads on an HDD is that while the disk can transfer around 100MB/s, the time for it to seek to the track and spin around again is around 20ms (OK, more like 11ms, since we're probably already on the right track). So a 32KB read, assuming no read-ahead in the drive itself, waits about 11ms for the platter to come around just to do 0.32ms of actual reading.

I just came across the post about the missing ';' after disable_heuristics=4 in comskip.ini, which I guess was making some of the settings screwy? After adding the ';', it processes an hour of HD in about 3 minutes even without pre-caching, which is better than I had gotten it down to before, even with the pre-caching.