On Tue, Mar 16, 2004 at 10:03:44AM -0800, Steve Harris wrote:
> All I get from a backtrace is:
>
> #0 0x400e04a9 in compress3 (num=2139062143,
> buffer=0x487ab00f
> "\202�p\001��|\203�+\201\234�dBC��k\201\216�c��(\237�s�\005\235�x\002\201\221-\004\203B\002\212h\004�W\002�w\002�z\002\204:\004\205g\005\234+\003�N\002\216p\002\210\032\002�\006\003\201h\001�r\003�")
> at compress.c:140
> 140 _s[_i++] = _r & 127;
> #1 0x7f7f7f7f in ?? ()
> Cannot access memory at address 0x7f7f7f7f

Yep, buffer overflow.
I'll have to defer to Jose for this problem.
What's the point of indexing such a large file?
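
For illustration only: line 140 in the backtrace stores 7 bits of a number
per byte, and num=2139062143 is 0x7f7f7f7f, the same value as the bogus
frame #1 address. Below is a minimal sketch of a 7-bit encoder that checks
the output size before writing. It is not the actual compress.c code; the
function name, the byte order, and the continuation-bit convention are all
assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT taken from compress.c.
 * Writes num as 7-bit groups, most significant group first, with the
 * high bit set on every byte except the last one.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t encode_varint_checked(uint32_t num, unsigned char *out, size_t out_len)
{
    unsigned char tmp[5];   /* 32 bits need at most five 7-bit groups */
    size_t n = 0;

    /* Collect the 7-bit groups, least significant first. */
    do {
        tmp[n++] = (unsigned char)(num & 127);
        num >>= 7;
    } while (num != 0);

    /* Refuse to write rather than run past the end of the buffer. */
    if (n > out_len)
        return 0;

    /* Emit the groups in reverse, flagging all but the final byte. */
    for (size_t i = 0; i < n; i++) {
        unsigned char b = tmp[n - 1 - i];
        if (i + 1 < n)
            b |= 128;
        out[i] = b;
    }
    return n;
}
```

A caller would treat a return value of 0 as "buffer too small" and grow the
buffer or abort, instead of letting the write continue onto the stack.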
>
> The file it's processing is quite large:
> $ wc /raid/swh/lit_index/segv.lit
> 5065943 9424230 50550321 /raid/swh/lit_index/segv.lit
>
> and contains some 8-bit characters, but if I run it through sort | uniq it
> doesn't cause problems. It's a fairly simple file, with one phrase per line;
> the longest line is 255 characters.
>
> There are a few thousand similar files in the directory tree that parse
> fine, but this is by far the largest. It doesn't appear to matter at what
> position it appears in the parse order.
>
> I've made the file available at http://triplestore.aktors.org/~swh/segv.lit
> in case anyone wants to test it.
>
> - Steve
>
--
Bill Moseley
moseley@hank.org