Description

We have written a crawler based on code we found online. The code used to run smoothly, but on numerous occasions it now hangs on certain pages. Those pages have been changed, so I guess it's their content that influences the behaviour. The script dies when PHP's maximum execution time is reached. An example page on which the statement "$index->addDocument($doc);" hangs is: http://www.febe.be/nl_BE/page/show/id/41

Please note that the issue is most likely not caused by the content of that page, because all subsequent pages fail as well.

We would really like this problem to be resolved soon, as we are planning to support UTF-8 in our application.

Posted by Frédéric Choquet (fchoquet) on 2009-06-26T08:42:25.000+0000

I ran into exactly the same problem as Eric.

We have several servers running Zend Lucene, and only one of them failed to create the index. I found out that mbstring.func_overload had accidentally been enabled on that server.

The symptom was an infinite loop in Zend_Search_Lucene_Index_SegmentWriter::_generateCFS.

readBytes did not return the right value: 5 bytes were missing.
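To illustrate how a length mismatch can turn a copy loop into an infinite loop, here is a minimal Python sketch. It is not Zend_Search_Lucene's actual code: the names copy_bytes, byte_len and utf8_char_len are hypothetical, and utf8_char_len only approximates what a character-counting strlen() reports once mbstring.func_overload replaces it with mb_strlen(). The loop trusts the length function to track how many bytes are still expected; when that function undercounts multibyte UTF-8 data, the source reaches end-of-file while bytes are still "expected", and every further read returns nothing. The sketch adds a no-progress check that raises instead of spinning.

```python
import io
from typing import Callable


def byte_len(data: bytes) -> int:
    """True length in bytes (what a non-overloaded strlen() reports)."""
    return len(data)


def utf8_char_len(data: bytes) -> int:
    """Counts UTF-8 characters by skipping continuation bytes (0b10xxxxxx).
    Hypothetical stand-in for an overloaded, character-counting strlen()."""
    return sum(1 for b in data if b & 0xC0 != 0x80)


def copy_bytes(src: io.BytesIO, byte_count: int,
               length_fn: Callable[[bytes], int], chunk_size: int = 4) -> bytes:
    """Copies byte_count bytes from src, trusting length_fn to measure progress."""
    out = bytearray()
    remaining = byte_count
    while remaining > 0:
        chunk = src.read(min(remaining, chunk_size))
        if not chunk:
            # Without this check the loop would never end: read() keeps
            # returning b"" at end-of-file and remaining never reaches zero.
            raise RuntimeError(
                f"no progress with {remaining} byte(s) still expected; "
                "is the length function counting characters instead of bytes?")
        out += chunk
        remaining -= length_fn(chunk)  # undercounts if length_fn counts characters
    return bytes(out)
```

With byte_len, the data "Frédéric Pęszor".encode("utf-8") (18 bytes, 15 characters) is copied intact; with utf8_char_len, the loop still expects 3 bytes at end-of-file and raises RuntimeError instead of hanging.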

Maybe it's not actually a bug (mbstring.func_overload is a really weird option that breaks binary file handling), but you should prevent execution from getting into an infinite loop and instead raise a "mbstring.func_overload not supported" exception.
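The suggested check could also run before indexing starts. The following Python sketch scans php.ini text for mbstring.func_overload and refuses to continue when bit 2 (overloading of the str*() functions) is set; the documented bits are 1 for mail(), 2 for the string functions and 4 for the ereg*() functions. The helper names are hypothetical, and the sketch assumes the last assignment in the file wins. Inside PHP itself, the live value would instead be read with ini_get('mbstring.func_overload').

```python
import re

# Bit values of mbstring.func_overload as documented by PHP.
OVERLOAD_MAIL = 1    # mail()
OVERLOAD_STRING = 2  # str*() functions, e.g. strlen() -> mb_strlen()
OVERLOAD_REGEX = 4   # ereg*() functions

# Matches an active (not commented-out) assignment, with an optional ";" comment.
_SETTING = re.compile(r"^\s*mbstring\.func_overload\s*=\s*(\d+)\s*(?:;.*)?$")


def func_overload_value(php_ini_text: str) -> int:
    """Returns the last mbstring.func_overload value in php.ini text (0 if unset).
    Assumes later assignments override earlier ones."""
    value = 0
    for line in php_ini_text.splitlines():
        match = _SETTING.match(line)
        if match:
            value = int(match.group(1))
    return value


def ensure_binary_safe_strings(php_ini_text: str) -> None:
    """Raises instead of letting index building hang on overloaded strlen()."""
    value = func_overload_value(php_ini_text)
    if value & OVERLOAD_STRING:
        raise RuntimeError(
            f"mbstring.func_overload = {value} not supported: string functions "
            "are overloaded, so byte lengths of binary index data are wrong")
```

A value of 7 sets all three bits and is rejected, while 1 (only mail() overloaded) passes.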

Posted by michal kralik (ceecko) on 2010-01-31T06:57:31.000+0000

I still experience the same issue in ZF 1.10.
I managed to solve it by disabling mbstring.func_overload. However, that prevented my app from working with UTF-8, which is not an option :(

Posted by Tomek Pęszor (admirau) on 2010-10-27T07:08:37.000+0000

How to reproduce the bug:

Add this at the end of php.ini:

mbstring.func_overload = 7

Then, while the search index is being built, the script hangs in an infinite loop at: