Tuesday, July 10, 2007

I think everyone I know at UCT has at least one story about a nasty experience with ICTS. I tell you, I have many. One of the reasons is that I co-administer the IBM Linux Competency Center (LCC). If you tried clicking on that link and it didn't work, that means either you aren't on the UCT network (blame ICTS) or the problem I am about to describe has yet to be resolved.

A very brief background on the LCC. It is a lab with a rack of IBM servers ranging from dual-core blades through to a server with 8 cores. I took over administering the lab almost exactly a year ago together with Jason Brownbridge, who has since left UCT; Adrian Frith has taken over his duties. Because we run on our own subnet, we often have to deal with ICTS, especially since our CS admin, Matthew West, left last month.

During this year ICTS has been gradually "upgrading" the UCT network, claiming the end user will benefit, although the only new thing I've heard of is that they will have more control over the network, such as being able to disable machines remotely. On Friday 29 June, the PD Hahn building, in which the LCC is situated, underwent the upgrade. This is when all the troubles began. To give you an idea, when the CS building underwent the upgrade it indirectly caused one of our sys admins to retire.

The first problem was that the IP addresses were all changed. We appeared to resolve that issue pretty quickly, although more on that later. Then they replaced the switch with a nice new Gbit Cisco switch. This is where the real problems started. The blade center could not connect to the switch. Three people from ICTS checked it out on separate occasions - one of them checking it twice - and every time they told us it was an issue on our end. So, we decided to put in our own switch to be sure. Guess what? It worked!! We're still following this up though, as it would be nice to get the Gbit switch back.

The other problem we're still experiencing started on Monday. All of a sudden, after everything had been working on Sunday, none of the nodes could connect to the external network. First, one ICTS person told us they had been having issues deploying multicasting services on Friday and that the problem had been spreading through the PD Hahn building, affecting the various subnets. However, today another ICTS person tells us that he knows of no network issues in PD Hahn. Talk about miscommunication! He tells us that we're using the incorrect IP address. HOW?