Description

In the swarm server-side slave creation logic, when a node with the provided name already exists, '-$IP' is appended to the name to make it unique (see https://github.com/jenkinsci/swarm-plugin/blob/master/plugin/src/main/java/hudson/plugins/swarm/PluginImpl.java#L59).
However, as far as I can tell, the new name is never passed back to the slave, so the slave has no way to connect under that name. In my experience I've seen hundreds of these collision-avoidance nodes in this setup and have never once seen one online or connected in any way.

These dead "hyphen nodes" don't hurt running builds, but they are visual noise and false positives in our offline slave metrics, so it'd be nice if they could be avoided.
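
To make the mismatch concrete, here is a minimal sketch of the behaviour described above. It is a simplified model, not the plugin's actual code: the class name, method name, and the use of a plain set of node names are all assumptions made for illustration.

```java
import java.util.Set;

final class LegacyCollisionSketch {

    /**
     * Hypothetical server-side step: register a node for the requested name,
     * appending "-<remote address>" when that name is already taken.
     */
    static String serverAssignedName(Set<String> existingNodes, String requested, String remoteAddress) {
        String name = existingNodes.contains(requested)
                ? requested + "-" + remoteAddress   // collision: derive a new name
                : requested;                        // no collision: keep the requested name
        existingNodes.add(name);                    // the node now exists on the server
        return name;                                // in the reported bug, the client never learns this value
    }
}
```

With an existing node "builder" and a second request for "builder" from the illustrative address 10.0.0.5, the server registers "builder-10.0.0.5". The client still believes it is called "builder", so the new node never gets a connection.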

Activity

Ernestas Lukoševičius
added a comment - 2015-02-13 19:24 - edited That's actually a high priority for some. All it takes to take down a cluster is to unplug the network cable for a second: the Jenkins swarm clients lose their connections and try to reconnect while Jenkins has not yet forgotten about the previous slaves. I think the check for whether a node (slave in my vocabulary) is online should be done more often and more reliably, especially when there are name collisions.

SCM/JIRA link daemon
added a comment - 2015-04-27 19:28 Code changed in jenkins
User: Stephen Connolly
Path:
client/src/main/java/hudson/plugins/swarm/Client.java
client/src/main/java/hudson/plugins/swarm/SwarmClient.java
plugin/src/main/java/hudson/plugins/swarm/PluginImpl.java
http://jenkins-ci.org/commit/swarm-plugin/ab37bc84eb9639888f3a66c68a9b1536c5882c88
Log:
[FIXED JENKINS-26558] Clients should provide a unique ID to be used for name collision avoidance
The current name collision avoidance uses the request's address, which could very likely be the same for all clients, as they could be routed through an HTTP proxy (or two), so that is not a good disambiguator.
We use a digest of the client's interfaces and MAC addresses and the remoteFSRoot to try to give a consistent ID.
We ALWAYS append the ID if we have it, as otherwise during reconnect the slaves with the same name will shuffle around, which defeats a lot of the logic that Jenkins has internally based on slaves having a consistent name.
In the event of legacy clients that do not have the ID, we will let them connect with their name as long as there is no online slave with that name. This does mean that where there are multiple legacy swarm clients with the same name, only one can be online at any moment in time, but that is an improvement on the current behavior, where once a shuffle starts, none can stay online.
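
The approach in the commit log could be sketched roughly as follows. This is not the code from the commit: the class and method names are hypothetical, SHA-256 and the 8-character truncation are assumptions (the log does not name the digest algorithm or the ID length), and returning null to refuse a legacy client is just one way to model the rule.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Enumeration;
import java.util.List;
import java.util.Set;

final class SwarmNamingSketch {

    /** Hashes interface names, MAC addresses, and the remote FS root into a short hex ID. */
    static String clientId(List<String> interfaceNames, List<byte[]> macs, String remoteFsRoot) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256"); // assumption: algorithm not given in the log
            for (String name : interfaceNames) md.update(name.getBytes(StandardCharsets.UTF_8));
            for (byte[] mac : macs) md.update(mac);
            md.update(remoteFsRoot.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) hex.append(String.format("%02x", b));
            return hex.substring(0, 8); // assumption: truncated to keep node names readable
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is mandatory on every Java platform
        }
    }

    /** Client side: collects the local interfaces (sorted by name for stability) and derives the ID. */
    static String localClientId(String remoteFsRoot) throws SocketException {
        List<NetworkInterface> interfaces = new ArrayList<>();
        Enumeration<NetworkInterface> found = NetworkInterface.getNetworkInterfaces();
        if (found != null) interfaces.addAll(Collections.list(found));
        interfaces.sort(Comparator.comparing(NetworkInterface::getName));
        List<String> names = new ArrayList<>();
        List<byte[]> macs = new ArrayList<>();
        for (NetworkInterface ni : interfaces) {
            names.add(ni.getName());
            byte[] mac = ni.getHardwareAddress(); // null for loopback and some virtual interfaces
            if (mac != null) macs.add(mac);
        }
        return clientId(names, macs, remoteFsRoot);
    }

    /** Server side: returns the node name to use, or null when a legacy client must be refused. */
    static String resolveName(String requested, String clientId, Set<String> onlineNames) {
        if (clientId != null) return requested + "-" + clientId;   // always append the ID when present
        return onlineNames.contains(requested) ? null : requested; // legacy: only if no online node has the name
    }
}
```

Because the ID depends only on the machine's interfaces and the remoteFSRoot, a client that reconnects presents the same ID and so resolves to the same node name, instead of being pushed onto a new suffix.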