Hi,<br><br>I have an issue with segfaulting moms that seems to be correlated with the server trying to ping its moms.<br>The server version is torque-2.3.6-2cri.x86_64.<br clear="all">We currently support two OSes through the same batch system, using a submit filter and node properties. Therefore, we have two different versions of the moms.<br>
Nodes 1-&gt;295 run moms from torque-2.3.6-2cri.x86_64 and nodes 296-&gt;309 run moms from torque-2.1.9-4cri.slc4.i386.<br><br>When the segfaults happen, all of the torque-2.3.6 moms die, while the torque-2.1.9 moms stay up.<br><br>
I ran one of them through GDB and can see the call stack:<br><br>Program received signal SIGSEGV, Segmentation fault.<br>0x000000000041813f in ?? ()<br>(gdb) where<br>#0 0x000000000041813f in ?? ()<br>#1 0x000000000041985e in ?? ()<br>
#2 0x0000000000419a70 in ?? ()<br>#3 0x0000000000416b97 in close_conn ()<br>#4 0x0000000000416c52 in close_conn ()<br>#5 0x00002b12d6cd7488 in wait_request () from /usr/lib64/libtorque.so.2<br>#6 0x0000000000416e1d in close_conn ()<br>
#7 0x00000000004170e1 in close_conn ()<br>#8 0x00002b12d6f2b974 in __libc_start_main () from /lib64/libc.so.6<br>#9 0x0000000000405eb9 in close_conn ()<br>#10 0x00007fff7565e368 in ?? ()<br>#11 0x0000000000000000 in ?? ()<br>
<br>Unfortunately this doesn&#39;t really give me any clues.<br>Does anyone have any other ideas?<br><br>Cheers,<br><br>Dug<br><br>-- <br>ScotGrid, Room 481, Kelvin Building, University of Glasgow<br><br>