How do you fix broken Hub Routes?

Today I hit an issue in Infrastructure Manager: if I log into any hub other than the Primary Hub (ex: a Tunnel Server), the other hubs turn red when I try to access them and are not accessible. The only ones I can reach are the other tunnel servers and the primary hub.

If I log into the primary hub, then all client hubs are good and I can navigate to them. What causes this, and how do you fix it? How do you update the hub route tables so all other hubs can access each other?

I could be misunderstanding, but it seems to me that it is likely just a matter of how it works when logged into a tunnel server. The primary has access and visibility to everything down via the tunnels, but from the perspective of a tunnel hub looking up, it may not have access to everything. It can't go up to the primary and see everything down from there. This is just based on my understanding of how it probably works and not from any experience with such a setup.

No. All hubs can see all the other hubs and access them via the IM tool. I've even logged into IM on client hubs to make updates, and it could still see all the other hubs and navigate to them as well. Something was wrong with the hub routes. I don't know what happened, but about 4 hours after the primary hub restart (~9:35), at around 11:35ish, it magically started working. I didn't do anything; while I was logged into the UIM Hub collector box, it started to work and I was able to access all other hubs again. IDK what fixed it other than time.

Another suggestion was that I could have tried stopping the Nimsoft Hub boxes, deleting the hubs.sds file, and then starting them back up. That would forcibly cause all the routes to be reconstructed. I did not do this, but it was suggested.

So, the hub has a nice little defect - well actually it winds up being a couple.

The fundamental issue is that every hub in your environment will periodically send out the list of hubs that it knows about and whether it thinks each hub exists.

Second, any time a hub learns about something that's different from its current known set of hubs, it will incorporate that information into its current hub list and it will then send that information to every hub it knows about.

Part of this hub information that gets sent around is an instruction that will cause other hubs to remove knowledge of a hub based on one hub's opinion that it no longer exists.

Mix this behavior with less than instantaneous communication of information and things can get interesting.

Consider what might happen should one of your hubs lose the ability to communicate with your central hub. Eventually it will flag that central hub as no longer existing and it will broadcast that information to all the hubs it knows about.

The fun part is that eventually that message will reach your central hub and it will absorb the information that it doesn't exist any more and delete itself.

Also, especially after restarting a hub, there will be some hubs that can reach it and some that can't. As a result, they will all be sending to all the other hubs knowledge about whether the hub is reachable or not. It creates a kind of electronic argument.
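To make the removal behavior described above a bit more concrete, here is a toy Python model. It is only a sketch of the idea as explained in this thread, not the actual hub protocol: the function name `simulate`, the hub names, and the assumption that every hub can deliver a message to every hub it knew about are all made up for illustration. It ignores timing, tunnels, and the periodic re-announcements that produce the back-and-forth "argument".

```python
from collections import deque


def simulate(hubs, unreachable_from):
    """Toy model of removal gossip (hypothetical, not the real hub protocol).

    hubs: list of hub names; every hub starts out knowing every hub.
    unreachable_from: (observer, target) meaning observer lost contact with target.
    Returns a dict mapping each hub to the set of hubs it still believes exist.
    """
    views = {h: set(hubs) for h in hubs}
    observer, target = unreachable_from

    # The observer decides the target no longer exists and drops it.
    views[observer].discard(target)

    # Messages are (recipient, hub_to_remove). The observer tells every
    # hub it still knows about (which no longer includes the target).
    queue = deque((peer, target) for peer in views[observer] if peer != observer)

    while queue:
        recipient, removed = queue.popleft()
        if removed not in views[recipient]:
            continue  # nothing new learned, so nothing is forwarded

        # The recipient learned something new: it forwards the removal to
        # every hub it knew about before applying it (assumption: this is
        # how the message eventually reaches the removed hub itself).
        peers_before = views[recipient] - {recipient}
        views[recipient].discard(removed)
        for peer in peers_before:
            queue.append((peer, removed))

    return views


if __name__ == "__main__":
    final = simulate(["primary", "tunnel", "client1", "client2"],
                     unreachable_from=("client1", "primary"))
    for hub, known in sorted(final.items()):
        print(hub, sorted(known))
```

In this model a single hub losing contact with the primary is enough for every hub, including the primary itself, to end up with a hub list that no longer contains the primary.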

Broadcom/CA folks, is there a way to force update the hub route across the board? Something I can do to get this to update? This did start working after several hours but that's hours of possible downtime in terms of working with the tool. How do you force a hub update so this gets fixed quicker?

And just to be clear on the process mentioned above, as it was explained to me: you have to stop all hubs (every single one of them - can't miss even one). Then delete the hubs.sds file (probably the robots.sds too for good measure). Then start the hubs one at a time. You should wait between starting hubs until all the currently running hubs agree on the status of the hub network; otherwise you run the risk of reintroducing the inconsistencies. And then you have to guarantee that no one ever rolls back to a prior snapshot, or backup image, or turns on an old server to see what it is, or anything that might introduce the old version of the hub network.
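If you wanted to script that sequence, the ordering could look roughly like the Python sketch below. Everything in it is an assumption for illustration: `rebuild_hub_network` and the four callables (`stop_hub`, `delete_state`, `start_hub`, `views_agree`) are hypothetical hooks you would have to implement yourself with whatever remote access and hub status checks you have. None of them correspond to a real Nimsoft/UIM command or API.

```python
import time


def rebuild_hub_network(hubs, stop_hub, delete_state, start_hub, views_agree,
                        poll_seconds=30, timeout_seconds=1800):
    """Sketch of the stop-all / delete / staggered-start procedure.

    All four callables are hypothetical hooks supplied by the caller:
      stop_hub(hub)       - stop the hub service on that machine
      delete_state(hub)   - remove hubs.sds (and optionally robots.sds)
      start_hub(hub)      - start the hub service again
      views_agree(running) - True once all running hubs report the same hub list
    """
    # 1. Stop every hub first; a single hub left running could
    #    reintroduce the old hub list later.
    for hub in hubs:
        stop_hub(hub)

    # 2. Only once everything is stopped, delete the cached state.
    for hub in hubs:
        delete_state(hub)

    # 3. Start hubs one at a time, waiting for agreement in between.
    running = []
    for hub in hubs:
        start_hub(hub)
        running.append(hub)
        deadline = time.monotonic() + timeout_seconds
        while not views_agree(running):
            if time.monotonic() > deadline:
                raise TimeoutError(f"hubs did not converge after starting {hub}")
            time.sleep(poll_seconds)

    return running
```

The point of the sketch is the ordering: no state file is deleted until every hub is stopped, and no hub is started until the ones already running agree with each other.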

With 3,000 unique locations with hubs, this process struck me as hilarious.

Or as one of the engineers said, that's why everything I do has retries of at least two....

On the brighter side of things, I do know that their engineering group is aware of the issue and that it is on the radar. After all, this has to be impacting the SaaS offering, so it would have a direct impact on CA's revenue now too.