Description of problem:
For various reasons (bugs in the code? network issues? ...), satellite-sync can
sometimes get wedged: it never dies, but it never finishes either. (I'm running
RHN Satellite 3.4.) When this happens, the regular cron job doesn't start while
the previous run is still going, so unless you're watching the emails closely,
it can be days before you realize "hey, I'm not getting any updates from
satellite-sync anymore".
So I wrote a script to run as the cron job instead. It checks on satellite-sync
every minute and kills it if it hasn't completed within 24 hours. I'd encourage
you to add this (or something better) to the product and/or the docs to make it
more resilient to satellite-sync hangs.
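#!/bin/bash
# Requires bash, not plain sh: it relies on a CHLD trap under job control.
# -b: report finished background jobs immediately, so SIGCHLD is handled
# right away; -m: enable job control, under which bash runs the CHLD trap
# for each child that exits.
set -bm

# Wait a random 0-9000 seconds (up to 2.5 hours) before starting.
perl -le 'sleep rand 9000'

# Exit as soon as satellite-sync is no longer running. Defined before the
# trap is installed so the handler exists when the first SIGCHLD arrives.
check_child()
{
    if ! ps -p "$PID" > /dev/null 2>&1; then
        exit 0
    fi
}

satellite-sync --email > /dev/null 2>&1 &
PID=$!
trap check_child CHLD

# Give satellite-sync up to 24 hours (1440 one-minute sleeps) to complete,
# and kill it after that. Each "sleep 60" exiting also fires the CHLD trap,
# so satellite-sync gets checked once a minute.
i=0
while [ "$i" -lt $((60 * 24)) ]; do
    sleep 60
    i=$((i + 1))
done
kill "$PID" > /dev/null 2>&1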
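The same watchdog idea can be factored into a reusable shell function. This is a minimal sketch, not something shipped with RHN Satellite or satellite-sync: the name `run_with_deadline`, its argument order, the 5-second grace period and the exit status 124 are all assumptions chosen for illustration. Instead of a CHLD trap it simply polls with `kill -0`, which keeps it independent of job-control settings.

```shell
#!/bin/bash
# Hypothetical helper, not part of satellite-sync or RHN Satellite:
#   run_with_deadline DEADLINE INTERVAL COMMAND [ARGS...]
# Runs COMMAND in the background, checks every INTERVAL seconds whether it is
# still alive, and stops it once roughly DEADLINE seconds have passed.
# Returns COMMAND's own exit status, or 124 if it had to be killed
# (124 borrows the convention of GNU coreutils' timeout).
run_with_deadline() {
    local deadline=$1 interval=$2
    shift 2

    "$@" &
    local pid=$! elapsed=0

    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$deadline" ]; then
            kill -TERM "$pid" 2>/dev/null   # ask it to stop cleanly
            sleep 5                         # assumed grace period
            kill -KILL "$pid" 2>/dev/null   # force it if SIGTERM was ignored
            wait "$pid" 2>/dev/null         # reap it so no zombie is left
            return 124
        fi
        sleep "$interval"
        # Counts polling intervals, so this only approximates wall-clock time.
        elapsed=$((elapsed + interval))
    done

    # The command finished on its own; pass its exit status through.
    wait "$pid"
}
```

For the cron job in this report, the call would look like `run_with_deadline $((24 * 60 * 60)) 60 satellite-sync --email > /dev/null 2>&1`. On systems whose GNU coreutils includes the `timeout` command, `timeout -k 5m 24h satellite-sync --email` gives similar behaviour (SIGTERM after 24 hours, SIGKILL five minutes later) without a custom loop.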

Can you provide more data about the circumstances in which satellite-sync wedges?
Dates and times, the exact command line used, etc. will give us the data we need
to attack the underlying performance issues you're seeing.

These have been open for years with no investigation or resolution. Since then the code base has moved on significantly, so many of them would no longer apply to the current Spacewalk code. I'm closing these requests in the hope that they're no longer necessary; if they are, they'll be rediscovered.