On Mon, 18 Aug 2003 18:54:20 -0300, Gustavo Niemeyer <niemeyer at conectiva.com> wrote:
>> When looking for a particular string, rather than a pattern, you are
>> probably better off to use the find or index method of that string
>> rather than re.
>>>> >>> 'abcdefg'.find('d')
>> 3
>>Just out of curiosity, the preference for the find() method rather than
>using SRE might not be so obvious. SRE has an internal optimization
>which handles literals in a different way, avoiding some of the overhead
>from the engine. Additionally, it includes an overlap algorithm which
>speeds up the linear searching in many cases, specially with large data.
>I wonder why find doesn't take advantage of the same thing, whatever it is
(Boyer-Moore? or some running fifo window hash check?). Probably the usual ...
people who can do it are too busy to do it in free time, and no-one's throwing money ;-)
>Here is a cooked example which highlights that fact:
>>-----------------------------
>import time
>import re
>>s1 = "x"*30+"a"
>s2 = "x"*1000000+"a"
>p1 = re.compile(s1)
>>start = time.time()
>for i in range(1000):
> p1.search(s2)
> print time.time()-start
>>start = time.time()
>for i in range(1000):
> s2.find(s1)
> print time.time()-start
>-----------------------------
>>And here is a sample output:
>>% python test.py
>45.2645740509
>233.810258031
>That looks worth improving on. Looking quickly at stringobject.c,
this looks like it might be the forward search guts:
...
if (dir > 0) {
if (n == 0 && i <= last)
return (long)i;
last -= n;
for (; i <= last; ++i)
if (s[i] == sub[0] && memcmp(&s[i], sub, n) == 0)
return (long)i;
...
If so, you can see a reason for your results.
But I don't see any reason why the performances couldn't be made to match.
Regards,
Bengt Richter