Display errors on queued computers (Unicast)

There are some display errors (and/or bugs?) when the max. allowed number of computers is deploying (Unicast) and some computers have been queued.

The first strange thing is that “Attempting to check in … Failed” is displayed every 5 seconds while the time counts up:
In the scheduler log I can see this message every 5 seconds (the host names match the queued computers):

@Wayne-Workman They are based on the time. If all of them have the exact same check-in time, they will ALL say 0 before me, because it doesn’t know which is which (in which case I’ll likely fall back to the ID of the tasking).

@Tom-Elliott Will the number be just all over the place or will it be consistent? Like if 30 computers all check in at exactly the same moment and maximum clients is set to 10 and 10 start, will all other 20 show that “There are 20 before me” ?

@Wayne-Workman The count issue is based off of timing now. So if 5 hosts checkin at the same time, they all have the same count value.
Normally there is a delay, but in the case of group tasks, this might not exist.

Normal cases will be accurate, and I suppose this one is accurate as well.
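
To illustrate the tie problem described above, here is a minimal sketch in Python. It is not FOG's actual code: the names `QueuedTask`, `before_me_by_time` and `before_me_with_id_fallback` are hypothetical, and it simply assumes the count is "tasks with an earlier check-in time than mine". With identical times every host gets the same count; adding the task ID as a tie-breaker gives each host a distinct position.

```python
from typing import NamedTuple


class QueuedTask(NamedTuple):
    task_id: int       # hypothetical tasking ID
    checkin_time: int  # hypothetical check-in timestamp in seconds


def before_me_by_time(me, queue):
    """Count tasks that checked in strictly earlier than `me`."""
    return sum(1 for t in queue if t.checkin_time < me.checkin_time)


def before_me_with_id_fallback(me, queue):
    """Same count, but break ties on the task ID (assumed fallback)."""
    my_key = (me.checkin_time, me.task_id)
    return sum(1 for t in queue if (t.checkin_time, t.task_id) < my_key)


# Five hosts from a group task, all with the same check-in time.
queue = [QueuedTask(task_id=i, checkin_time=1000) for i in range(1, 6)]

time_only = [before_me_by_time(t, queue) for t in queue]
with_fallback = [before_me_with_id_fallback(t, queue) for t in queue]

print(time_only)      # every host reports the same count
print(with_fallback)  # each host reports a distinct position
```

The tuple comparison only matters when times collide, so hosts that check in with a normal delay between them would keep the same order either way.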

@tian My worry for the display is that all the clients have the same time starting out. Because of this, they all get updated at the same time too. I update the check in times if they have timed out now, so order should be maintained. However, if all times are identical, there’s no real way to know which one is first or not. Again, it’s a minimal problem, though I can understand the “huh?” issue.

@Tom-Elliott Maybe the group deployment task is not working correctly with this in the trunk versions. Apart from the wrong display, everything else is working. We also use just one server (and on it the default storage/node). We mostly use group deployment tasks - single deployment tasks are not used that often and wouldn’t reach the limit we set.

@tian While I’m glad we’re closer, I hope to have finally gotten this more properly solved. The checkin process was constantly updating the time which is why you would see the numbers change (depending on the other host checking in).

It should be good now, hopefully.
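
As a rough illustration of why overwriting the check-in time on every poll disturbs the order, here is a toy simulation in Python. It is only a sketch under assumptions: the host names, timestamps and the `simulate` helper are made up, and the position is assumed to be "hosts with an earlier check-in time than mine".

```python
def position(host, checkin):
    """Number of hosts whose check-in time is earlier than this host's."""
    return sum(1 for t in checkin.values() if t < checkin[host])


def simulate(polls, initial, refresh_on_poll):
    """Replay (time, host) polls and record the position each host is shown."""
    checkin = dict(initial)  # work on a copy of the hypothetical queue
    shown = []
    for now, host in polls:
        if refresh_on_poll:
            checkin[host] = now  # overwrite the time on every check-in
        shown.append((host, position(host, checkin)))
    return shown


# Three queued hosts with distinct original check-in times.
initial = {"a": 100, "b": 101, "c": 102}
# Hosts poll again every few seconds, not always in the same order.
polls = [(105, "a"), (106, "c"), (107, "b"), (110, "a"), (111, "b"), (112, "c")]

refreshed = simulate(polls, initial, refresh_on_poll=True)
stable = simulate(polls, initial, refresh_on_poll=False)

print(refreshed)  # the polling host is always the most recent one
print(stable)     # positions follow the original check-in order
```

In this toy model the host that just polled always has the newest time, so it sees every other host as being ahead of it. Leaving the stored time alone keeps the order fixed.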

I hate to tell you, but now (Version 6981) all queued computers display “There are 0 before me” again … (It would also be fine just to display the total number of computers waiting, or just the time waited, if this problem consumes too much time at the moment.)

@Tom-Elliott Thanks for your hard work. Now the numbers are changing and are different. But these are totally mixed up now (Version 6977):
The pictures were taken from different computers.
Often the same number is still displayed on different computers at the same time (maybe the 5-second interval is too long to tell whether there are moments with duplicate numbers).

Is there anything else I can provide?

(In FOG 0.32 the queued numbers were/are fine - with the difference that there was/is a delay until free slots can be used by queued computers. Now in the recent versions the computers that start up first get the free slots - but that is ok.)

Just a little more information I just found: if I click the “force task to start” button in the “Task Management”, then the deploy task does start. By the way, if you need some testing, just send me a mail. I have good Linux and development knowledge, which may help to debug if needed.

@Tom-Elliott Sure, here it is! I create a basic task (deploy), then it just loops on “attempting to check in”. This task worked fine (same machine/image) a few hours ago, just before I updated to the latest git (svn).

Just a little message to say that I have the same problem with SVN 6975: I get an “attempting to check in” loop when trying to restore an image. It was working fine before I updated (I don’t remember which version I was on previously, but it was SVN 69xx).

@Tom-Elliott I just did a test with version 6971 and it looks different now but still seems not completely correct.
The number in “There are x before me” is counting now, but it displays the same number on all the queued computers again:

When there are four computers waiting, it is “3” on all waiting computers

with five computers it is “4” on all waiting computers

at the end all six queued computers display “There are 5 before me”
I waited 10+ minutes again, but the computers didn’t display different numbers.

Here you can see the change of the number (that takes place on all computers) when one more computer is waiting: