Our Toronto office moved on the weekend. We are also getting all the configuration details ready for a switchover to a new VPN infrastructure. We hadn’t planned to go live on the new infrastructure until the end of this week or the beginning of next week, but Thursday last, two days before the Toronto office move, we found out that the old ISP for Toronto would be unable to provide service in the new location, and the only connectivity we would have would be the new infrastructure.

A flurry of router configuration, proxy server construction and general networking hackage ensued. We connected Toronto to our main office via the new VPN. Since the new infrastructure’s Internet gateway isn’t ready yet, the routers on the new VPN don’t know about the old Internet gateway at the head office, so the only way to get the Toronto users onto the Internet was a proxy server that was aware of both the old gateway and the new VPN. I built the proxy as a virtual machine on a host that was already mostly set up, deployed it at the head office, and over the weekend we got everybody in Toronto connected.
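The split routing the proxy host does amounts to a longest-prefix-match lookup: traffic for the new VPN’s subnets goes out via the new VPN router, and everything else falls through to the old Internet gateway. Here’s a minimal sketch of that lookup; the subnets and gateway addresses are made up for illustration, not our real ones.

```python
import ipaddress

# Hypothetical routing table for the proxy host. Traffic destined for the
# new VPN's address space uses the new VPN router; the 0.0.0.0/0 default
# falls through to the old Internet gateway. Addresses are illustrative.
routes = [
    (ipaddress.ip_network("10.9.0.0/16"), "192.168.1.253"),  # new VPN router
    (ipaddress.ip_network("0.0.0.0/0"), "192.168.1.1"),      # old Internet gateway
]

def next_hop(dest: str) -> str:
    """Return the gateway for dest using longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, gw) for net, gw in routes if addr in net]
    # The most specific (longest) matching prefix wins.
    net, gw = max(matches, key=lambda r: r[0].prefixlen)
    return gw

print(next_hop("10.9.3.7"))   # a host on the new VPN -> new VPN router
print(next_hop("8.8.8.8"))    # anything else -> old Internet gateway
```

In practice this is just two entries in the proxy host’s kernel routing table; the point is that only the proxy needs to know about both worlds, so the Toronto clients can stay ignorant of the old gateway.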

Unfortunately, the server I deployed the proxy on is supposed to be shipped to the site where our new gateway will live, to become the new production proxy server. I decided to take one of my lab servers from Engineering, re-deploy the virtual machine on it, and swap it in so the original server could be sent to the new gateway location.

Both boxes are identical HP ML370 servers. I set up the lab one to be configured identically to the one in the server room, then copied the working proxy server over from the live machine. I tested everything in the lab and it all worked fine. Then I waited until after hours for the Toronto office and took the lab machine next door to the server room. I shut down the live ML370, unplugged its cables, plugged them into the ML370 from the lab, and turned it on. It wouldn’t boot. It powered up, but no video signal came out.

I tried switching power cables, keyboard, mouse, monitor, and network cables. Nothing worked. The machine from the lab wouldn’t boot. Then, since the Toronto office was without internet connectivity, I restored all the original cables and put the old box back in, fired it up and saw that it worked fine.

I quit for the day after asking Stuart to look into warranty replacement for the lab machine.

This morning I took the broken lab machine back into the lab, and just for fun I hooked it up to the cables I had used when I configured it the day before. It boots up fine and works just as it should. I almost wish it hadn’t. Problems that have no discernible cause are so hard to figure out.

We’re busy like bees around here getting ready for all kinds of stuff. I’ve been working on getting ready to flip over to our new VPN provider, and helping handle network issues for an office move, screwing around with routers and firewalls, proxy servers and virtual machine hosts.

Dad’s also been back in the hospital for post-operative complications from his heart surgery, which makes things seem even more surreal.

I have been testing a beta patch for the ZLM Linux management agent, to see if it would prevent my VMware GSX server blades from crashing every few days. I installed the patch (zmd7020a) on one blade while leaving the others alone, and then let them run normally, including some VM loads, for several days. Every single blade hung up over Easter weekend except the one with the patch installed, and everything else on that blade seems to be working normally, so this morning I deployed the patch to the rest of the blades by making it into a ZLM bundle and pushing it out via ZENworks.

Mack turned the big oh-seven last week. He wanted to go bowling for his birthday so we took him and seven other boys to the St. Albert bowling alley. They had lots of fun, and Mack even got a strike! Then they ate pizza, which the bowling alley provided and which was surprisingly good, and cake (which I made, I’m proud to say). It was a good birthday. He also got spoiled by us and his grandparents. There are now several new Gamecube games in our house thanks to various people. He also got a very nice skateboard from Oma and Opa, and we got him protective gear to go with it, including a new helmet, which he already tested by landing completely upside-down on his head during one of the funniest moments of the weekend (don’t get me wrong, I don’t usually laugh at the kids wiping out, but he didn’t hurt himself and he couldn’t have done a more perfect skateboard dismount to a headstand if he’d tried).

Easter brunch saw 12 people around our table this year, and since it could easily have been fewer, what with both Jenn’s dad and mine having had open heart surgery two weeks apart, we were very grateful to have everyone there. Even Grandma, who will have her 90th birthday later this year, was quite spry and cheerful. Everyone ate decorated eggs and cold cuts and buns, but Emily stole all the swiss cheese. It’s nice to have everyone over, and since we always massively clean our house when we put on something like that, we now have a nice clean house to enjoy too.

I re-built eight of the nine blades in the BladeCenter after BrainShare. I wasn’t quite done by the weekend, so I finished that off this week. I got the VMs from our new financial management system running on the newly-rebuilt blades, and configured one of our routers so that our developers could see the VMs from their regular desktops, even though the VMs are in the normally isolated lab network. Then I started working on other stuff.

We are trying to get ready for a major switchover from our existing VPN infrastructure to a new one from a new provider. The provider has had some challenges getting things set up the way we need. We have also had problems getting fibre pulled into the various sites, with contractors shrugging the responsibility back and forth between themselves and the telco provider. I think we’re finally just about ready with that stuff.

Meanwhile, the blades started crashing overnight, so I spent a day or two figuring out what was causing that. It turns out cron would execute the ZENworks zmd (management agent) to do some maintenance function, which would cause a kernel panic and lock up the machines. I found a beta patch of the zmd piece and installed it on one server to see if it would help. I’m waiting to see if that one stays up when the other blades crash.