Description

We use persistent connections, which are normally fine. However, when a Mongo server is restarted and we don't also restart Apache, we get a series of errors about not being able to send the request. It seems the driver doesn't detect that the connection has been lost.

Kristina Chodorow (Inactive)
added a comment - Apr 05 2011 04:53:26 PM UTC
To everyone watching this issue:
I am working on a major rewrite of the connection code to use connection pooling, which should take care of these persistent connection issues. I would really appreciate it if you could try this new code (on a NON-PRODUCTION system) and let me know if it works better for you.
You can download the latest code from https://github.com/mongodb/mongo-php-driver (should show up as version 1.2.0-).
You shouldn't need to change any PHP code.
In the interest of staying on topic, please report general problems with connection pooling at PHP-132. Feel free to comment on this bug about problems reconnecting.

Mauricio Piacentini
added a comment - Aug 23 2011 01:39:39 AM UTC
Hi. I am seeing this issue (or a similar one) with updated drivers, version 1.2.3, connected to a replica set running 1.8.3. I can reproduce it reliably. There are several "game" machines running nginx + php-cgi, connected to my replica set. If I go to the primary and issue rs.stepDown(), or take it offline, the replica set reconfigures itself and elects a new primary. However, PHP code can no longer query the database: the logs that collect the exception show the "max number of retries exhausted, couldn't send query" message, and sometimes "couldn't get response header" as well.
I waited 10-15 minutes, and some queries never completed. I ran a second test with APC disabled in php.ini, just to be sure, and got the same results. If I restart php-cgi on a given machine, all queries complete immediately and the error does not come back.
I believe some queries complete after 10-15 minutes without a restart because php-cgi respawns some of its child processes by itself, but this is difficult to diagnose.
What I can reproduce reliably is that after a stepDown, or when mongod is stopped on the primary (simulating a failure), these exceptions pile up for several minutes. Again, restarting php-cgi immediately restores connectivity for that one machine (the others that are not restarted still cannot connect).
If there is anything I can try to help you diagnose this, please let me know. Also, if there is a way to "force" the persistent connection (I assume the driver maintains one behind the scenes) to be recycled when I get this exception, that would help us too. Thanks.
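
For illustration only, the "recycle the connection when the exception shows up" idea asked about above could look roughly like the sketch below. It is written in Python rather than PHP and is not the driver's code or API: `ReconnectingClient`, `StaleConnectionError`, and the `connect` factory are all hypothetical names, and the sketch assumes the application itself controls when a connection is opened and closed, which a driver-managed persistent connection may not allow.

```python
import time


class StaleConnectionError(Exception):
    """Hypothetical stand-in for a driver error such as "couldn't send query"."""


class ReconnectingClient:
    """Caches one connection and replaces it whenever an operation finds it stale."""

    def __init__(self, connect, max_retries=3, backoff_seconds=0.0):
        self._connect = connect            # hypothetical zero-argument connection factory
        self._max_retries = max_retries    # retries after the first attempt
        self._backoff = backoff_seconds    # base delay, multiplied by the attempt number
        self._conn = None

    def _recycle(self):
        # Forget the cached connection so the next attempt opens a fresh one.
        conn, self._conn = self._conn, None
        if conn is not None:
            try:
                conn.close()
            except Exception:
                pass  # the connection is already unusable; ignore close errors

    def run(self, operation):
        """Call operation(conn), reconnecting and retrying on StaleConnectionError."""
        last_error = None
        for attempt in range(self._max_retries + 1):
            if self._conn is None:
                self._conn = self._connect()
            try:
                return operation(self._conn)
            except StaleConnectionError as exc:
                last_error = exc
                self._recycle()
                time.sleep(self._backoff * attempt)
        raise last_error
```

A caller would wrap each query, for example `client.run(lambda conn: conn.query())`, where `query` is again a placeholder. Whether a fresh connection actually reaches the newly elected primary depends on the driver and the replica set state, so this only shows the retry shape, not a fix for the bug.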