grep

From: Petr Pajas
Subject: grep
Date: Mon, 12 Jul 2004 12:27:50 +0200
User-agent: Gnus/5.110002 (No Gnus v0.2) Emacs/21.3 (gnu/linux)

Hi folks,
I'm using grep to extract lines that start with '15' from a file
approx. 15MB in size. On a 3GHz Linux box it runs for about 1m30s. I found
that this is due to the UTF-8 locale. If I switch to an 8-bit locale, it
only takes a fraction of a second. Strangely, with the UTF-8 locale it also
takes only about 2s when searching for lines that *contain* 15, rather
than only lines that begin with it.
$ grep --version
grep (GNU grep) 2.5.1
Copyright 1988, 1992-1999, 2000, 2001 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ LC_CTYPE=en_US.UTF-8 time grep '^15' u0057.lst >/dev/null
73.46user 0.19system 1:18.93elapsed 93%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (173major+61minor)pagefaults 0swaps
$ LC_CTYPE=en_US time grep '^15' u0057.lst >/dev/null
0.05user 0.02system 0:00.13elapsed 51%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (163major+37minor)pagefaults 0swaps
$ LC_CTYPE=en_US.UTF-8 time grep '15' u0057.lst >/dev/null
1.84user 0.01system 0:01.91elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (170major+53minor)pagefaults 0swaps
$ LC_CTYPE=en_US time grep '15' u0057.lst >/dev/null
0.07user 0.00system 0:00.13elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (160major+36minor)pagefaults 0swaps
These results make me believe there is something odd in the
implementation of either the locale support or the '^' anchor.
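For anyone who wants to try this without my file, here is a rough sketch of
a reproduction on synthetic data. The file contents, the default line count
and the helper names (make_sample, count_matches, time_grep) are made up for
illustration and are not taken from u0057.lst. It sets LC_ALL rather than
LC_CTYPE, since LC_ALL takes precedence if it happens to be set in the
environment, and it assumes that the en_US.UTF-8 locale is installed (check
with "locale -a") and that "date +%s" is available.

```shell
#!/bin/sh
# Sketch of a reproduction with synthetic data. The file contents, the
# default line count and the helper names are illustrative assumptions;
# they are not taken from the original u0057.lst.

# Write $2 lines to file $1; one line in every 100 starts with "15".
make_sample() {
    awk -v n="$2" 'BEGIN {
        for (i = 1; i <= n; i++)
            printf "%d filler text for line %d\n", i % 100, i
    }' > "$1"
}

# Print the number of lines in file $3 matching pattern $2 under locale $1.
# LC_ALL is used because it overrides LC_CTYPE and LANG if they are set.
count_matches() {
    LC_ALL="$1" grep -c -e "$2" "$3"
}

# Print elapsed wall-clock time in whole seconds for one grep run.
# "date +%s" is assumed to be available (GNU and BSD date support it).
time_grep() {
    start=$(date +%s)
    LC_ALL="$1" grep -e "$2" "$3" > /dev/null
    end=$(date +%s)
    echo $((end - start))
}

# Optional first argument: number of lines to generate.
lines=${1:-100000}
sample=${TMPDIR:-/tmp}/grep-sample.$$
make_sample "$sample" "$lines"

# Compare the anchored and unanchored pattern under both locales.
for loc in C en_US.UTF-8; do
    for pat in '^15' '15'; do
        echo "$loc '$pat': $(count_matches "$loc" "$pat" "$sample") lines," \
             "$(time_grep "$loc" "$pat" "$sample")s"
    done
done

rm -f "$sample"
```

The timing is deliberately coarse (whole seconds), which should be enough to
show a gap as large as the one above. Pass a larger line count as the first
argument to get closer to 15MB. The match counts should be identical in both
locales; only the run time should differ.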
Thanks,
-- Petr