
I’ve mentioned in the past that I think it’s important to share and give back knowledge.

This week’s blog post will be short (sorry, they can’t all be great works of art.) But first I want to mention an event that just happened. I’m the leader of the local SQL Server User Group: CASSUG. We had our monthly meeting last night and I was grateful that Hilary Cotter was willing and able to drive up from New Jersey to present on Service Broker.

When I arrange for speakers, I always hope my group gets something out of it. Well, last night we had a new member visiting from out of town, so he’ll probably rarely make future meetings. And today, I read from him: “Hilary’s presentation was very informative and interesting.” and “Now it has piqued my interest and I’ve started a Pluralsight course to learn more.” To me, that’s success.

At our July meeting we had lightning rounds. Instead of a single presenter, we had four of our local members present on a topic of their choice for about 15 minutes each. One of them presented on using XML results in a SQL query to help build an HTML-based email. He adopted the idea from, I believe, this blog post. Twice now in the last month I’ve used it to help clean up emails I had a system sending out. Yesterday, I finally decided to clean up an old, ugly, hard-to-read, text-based email that showed the status of several scheduled jobs we were running overnight. A few hours later, after some tweaking, I had a beautiful, easy-to-read email. Excellent work, and all based on an idea I never would have come up with if my colleague had not shared it from his source.
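To give a feel for the general idea of turning result rows into an HTML table for an email body, here is a minimal Python sketch. It is not the SQL/XML technique from the presentation; it only illustrates the same rows-to-markup step. The function name, column names, and job rows are all made up for the example.

```python
from html import escape


def rows_to_html_table(headers, rows):
    """Render column headers and row tuples as a simple HTML table string."""
    # Escape every value so characters like < and & cannot break the markup.
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f'<table border="1"><tr>{head}</tr>{body}</table>'


# Hypothetical job-status rows; in practice these would come from a query.
job_rows = [
    ("Nightly backup", "Succeeded", "02:14"),
    ("Index rebuild", "Failed", "03:40"),
]
html_body = rows_to_html_table(("Job", "Status", "Finished"), job_rows)
print(html_body)
```

The resulting string can be used as the HTML body of whatever mail mechanism you already have in place.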

And that leads me to a bit of self-promotion. When I created this blog, my goal was not to have lots of posts around SQL Server. Several months ago, a mentor of mine (I don’t know if she considers herself that, but I do, since she’s the one that planted the seed in my head for my first book: IT Disaster Response: Lessons Learned in the Field) approached me at SQL Saturday Atlanta and mentioned she was now an editor for Red-Gate’s Simple-Talk blog section and asked me if I’d be interested in writing. I was.

So I’m proud to say that the first of my blog posts at the Red-Gate Simple-Talk site is up. Go read it. I’m excited. As of today it’s had over 2000 views! Far more than I get here. And there’s more to come.

And here’s the kicker. Just today I had a client say, “Hey, I need to get this data from this SQL 2014 database to a SQL 2008 Database.” I was able to say, “I’ve got JUST the answer for that!”

Sharing knowledge is a good thing. It makes us all far more capable and smarter.

I’ve mentioned in the past that I participate a lot in SQL Saturday events and also teach cave rescue. These are ways I try to give back to at least two communities I am a member of. I generally take this engagement very seriously, for two reasons.

The first, which is especially true when I teach cave rescue, is that I’m teaching critical skills that may or may not put a life on the line. I can’t go into teaching these activities without being prepared or someone may get injured or even killed.

The second is that the audience deserves my best. In some cases, they’ve paid good money to attend events I’m talking or teaching at. In all cases, they’re taking some of their valuable time and giving it to me.

All the best SQL Saturday speakers and NCRC instructors I know feel generally the same about their presentations. They want to give their best.

But here’s the ugly truth: Sometimes we’re not on our A game. There could be a variety of reasons:

We might be jet-lagged

We may have partied a bit too much last night (though for me, this is not an issue, I was never much of a party animal, even when I was younger)

You might have lost your power and Internet the day before during the time you were going to practice and found yourself busy cutting up trees

A dozen other reasons

You’ll notice one of those became singular. Ayup, that was my excuse. At the SQL Saturday Albany event, due to unforeseen circumstances the day before, the time I had allocated to run through my presentation was spent removing trees from the road, clearing my phone line and trying to track down the cable company.

So, one of my presentations on Saturday was not up to the standard I would have liked it to be. And for that, to my audience, I apologize (and did so during the presentation).

But here’s the thing: the feedback I received was still all extremely positive. In fact, the only really non-positive feedback was very constructive criticism that would have been valid even had I been as prepared as I would have liked!

I guess the truth is, sometimes we hold ourselves to a higher standard than the audience does. And I think we should.

I apologize for skipping two weeks of blog posts, but I was a bit busy; for about 11 days my family and I were visiting Europe for the first time. It was a wonderful trip. It started with a trip to Manchester UK for a SQL Saturday event.

I had sort of forgotten exactly how much further north we were until it dawned on me how early dawn was. Actually we had noticed the night before as we walked back from the amazingly wonderful speakers’ dinner how light it was despite how late it was. When I woke up at around 4:30 AM (a bit of jetlag there) I noticed despite the blackout curtains how bright it was around their edges. I later looked it up, and it appears that technically it never reached “night” there, but simply astronomical twilight.

Ever since seeing the movie “White Nights” my wife has always wanted to experience the white nights of Russia. This wasn’t that, but it was close.

This trip followed on the heels of the amazingly successful Thai Cave Rescue that I had previously commented on. As long-term readers know, I’m a caver who also teaches cave rescue and has a role as the Northeast Coordinator of the National Cave Rescue Commission. During the 18 day saga, I and others were called upon by various media outlets to give our insight and perspective. I was fortunate; I only did a little under a dozen media events. Our National Coordinator, Anmar Mirza, did well over 100, most of those in about a 5 day period. A link to one of my media events is here: The Takeaway.

I don’t want to talk about the operation itself, but I want to talk about White Knights. We love our White Knights: the term often refers to a character who will ride into town and single-handedly solve the town’s problems. The truth is that white knights rarely if ever exist and that most problems require a lot more effort to solve.

We’ve seen this in politics, and we saw this with this cave rescue. Let me start by saying I think the work Elon Musk has done with SpaceX is amazing. SpaceX has in fact single-handedly revolutionized the space launch market.

It was perhaps inevitable that Musk’s name would show up in relation to this cave rescue. Musk has previously gotten attention for attempting to help with the power outage crisis in Puerto Rico and now his vow to help the people of Flint (both by the way I think worthy causes and I wish him and more importantly the people he’s trying to help, well).

But here’s the thing, a cave rescue isn’t solved by a white knight. It’s solved by a lot of effort and planning with a lot of people with a variety of skills and experience. There’s rarely a magic breakthrough that magically makes things easier.

And I’ll be blunt: his “submarine” idea, while interesting, was at best a PR distraction and at worst, possibly caused problems.

“But Greg, he was trying to help, how could this make things worse?” I actually disengaged from an online debate with some Musk fanbois who couldn’t see why Musk’s offer was problematic. To them, he was the white knight that could never do wrong.

Here’s the thing: I know for a fact that several of us, myself included, had to take part of our allotted airtime or written coverage to address why Musk’s idea probably wouldn’t work. This meant less time or room for useful information to be passed on to the audience. Part of my role as regional coordinator is to educate people about cave rescue, and I can’t do this effectively when I’m asked to discuss distractions.

“But so what, that didn’t impact the rescue.” No, it didn’t. But, it appears from the Twitter fights I’ve seen, and other information, that at least some resources on the ground were tasked to deal with Musk. This does mean that people had to spend time dealing with both Musk and the publicity. This means those resources couldn’t be spent elsewhere. At least one report from Musk (which honestly I question) suggests he actually entered the cave during the rescue operations. This means that resources had to be spent on assuring his safety and possibly prevented another person who could have provided help in other ways (even if it was simply acting as a sherpa) from entering.

And apparently, there’s now a useless “submarine” sitting outside the cave. I’ll leave discussion of why I had problems with the submarine itself for another post.

But here’s one final reason I have a problem with Musk bringing so much attention to himself and his idea: It could have led to second guessing.

Let’s be clear: even the cave divers themselves felt that they would most likely lose some of the kids; this was exactly how dangerous the rescue was. This is coming from the folks who best knew the cave and best understand the risks and issues. Some of the best cave divers in the world, with rescue experience, who were on-site, thought that some kids would die in the attempt to rescue them. And, if reports are true, they were aware of Musk’s offer and obviously rejected it (and in fact one suggested later that Musk do something anatomically impossible with it.)

Had the rescuers’ worst fears come true, Musk fanbois would have second guessed every decision. In other words, people would have put more faith in their favorite white knight, who had zero practical experience in the ongoing operations, than they would have in the very people who were there and actively involved. I saw the comments before and during the operations from his fans and all of them were upset that their favorite white knight wasn’t being called in to save the day. I can only imagine how bad it would have been had something tragic occurred.

This is why I’m against white knights. They rarely if ever solve the problem, and worse when they do ride into town, they take time and energy away from those who are actually working on the problems. Leave the white knights on a chess board.

As I’m writing this, word has rocketed around the world that the 12 soccer players and their coach have been safely rescued from Tham Luang cave. We are awaiting word that all the rescuers themselves, including one of the doctors that had spent time with the boys since they were found, are still on their way out.

Unfortunately, one former Thai SEAL diver, Saman Kunan, who had rejoined his former teammates to help in the rescue, lost his life. This tragic outcome should not be forgotten, nor should it cast too large of a shadow on the amazing success.

What I want to talk about though is not the cave or the rescue operations, but the decision making process. The title for this post comes from Narongsak Osottanakorn’s statement several days ago when they began the evacuation operations.

The term D-Day actually predates the famous Normandy landings that everyone associates it with. However, the success of the Normandy landings and their importance in the ultimate outcome of WWII have forever cemented that phrase in history.

One of the hardest parts of any large scale operation like this is making the decision on whether to act. During the Apollo Program, they called them GO/NO GO decisions. Famously you can see this in the movie Apollo 13 where Gene Kranz goes around the room asking for a Go/No Go for launch. (It was pointed out in a Tindallgram before the Apollo 11 landing that the call after the Eagle landed should be changed to Stay/No Stay, so there was no confusion on whether they were “go to stay” or “go to leave”.)

While I’ve never been Flight Commander for a lunar mission, nor a Supreme Allied Commander for a European invasion, I have had to make life or death decisions on much smaller operations. A huge issue is not knowing the outcome. It’s like walking into a casino. If you knew you were always going to win, it would be an easy decision on how to bet. But obviously that’s not possible. The best you can do is gather as much information as you can, gather the best people you can around you, trust them and then make the decision.

What compounds the decision making process in many cases, and especially in cave rescue, is the lack of communication and lack of information. It can be very frustrating to send rescuers into the cave and not know, sometimes for hours, what is going on. Compound this with what is sometimes intense media scrutiny (which was certainly present here with the entire world watching), and one can feel compelled to rush the decision making process. It is hard, but generally necessary, to resist this. In an incident I’m familiar with, I recall a photograph of the cave rescue expert advising rescue operations, standing in the rain, near the cave entrance waiting for the waters to come down so they could send search teams in. Social media was blowing up with comments like, “they need to get divers in there now!” “Why aren’t the authorities doing anything?” The fact is, the authorities were doing exactly what the cave rescue expert recommended: waiting for it to be safe enough to act. Once the waters came down, they could send people and find the trapped cavers.

The incident in Thailand is a perfect example of the confluence of these factors:

There was media pressure from around the world, with people asking why they were taking so long to begin rescuing the boys and, once they did start to rescue them, why it took them three days. Offers and suggestions flowed in from around the world and varied from the absurd (one suggestion we received at the NCRC was the use of dolphins) to the unfortunately impractical (let’s just say Mr. Musk wasn’t the only one, nor the first, to suggest some sort of submarine or sealed bag).

There was always a lack of enough information. Even after the boys had been found, it could take hours to get information to the surface, or from the surface back to the players. This hinders the decision making process.

Finally of course are the unknowns:

When is the rain coming?

How much rain?

How will the boys react to being submerged?

What can they eat in their condition?

And finally, there is, in the back of the minds of the folks making the decisions, the fact that if the outcome turned tragic, everyone would second guess them.

Narongsak Osottanakorn and others had to weigh all the above with all the facts that they had, and the knowledge that they couldn’t have as much information as they might want and make life-impacting decisions. For this I have a great deal of respect for them and don’t envy them.

Fortunately, in this case, the decisions led to a successful outcome which is a huge relief to the families and the world.

For any operation, especially complex ones, such as this rescue, a moon landing or an invasion of the beaches of Normandy, the planning and decision making process is critically important and often overshadowed by the folks executing the operation. As important as Neil Armstrong, Buzz Aldrin and Michael Collins (who all too often gets overlooked, despite writing one of the better autobiographies of the Apollo program) were to Apollo 11, without the support of Gene Kranz, Steve Bales, and hundreds of others on the ground, they would have very likely had to abort their landing.

So, let’s not forget the people behind the scenes making the decisions.

“When does a cave rescue become a recovery?” That was the question a friend of mine asked me online about a week ago. This was before the boys and their coach had been found in the Thai cave.

Before I continue, let me add a huge caveat: this is an ongoing dynamic situation and many of the details I mention here may already be based on inaccurate or outdated information. But that’s also part of the point I ultimately hope to make: plans have to evolve as more data is gathered.

My somewhat flippant answer was “when they’re dead.” This is a bit of dark humor answer but there was actually some reasoning behind it. Before I go on, let me say that at that point I actually still had a lot of hope and reason to believe they were still alive. I’m very glad to find that they were in fact found alive and relatively safe.

There’s a truth about cave rescue: caves are literally a black-hole of information. Until you find the people you’re searching for, you have very little information. Sometimes it may be as little as, “They went into this cave and haven’t come out yet.” (Actually sometimes it can be even less than that, “We think they went into one of these caves but we’re not even sure about that.”)

So when it comes to rescue, two of the things we try to teach students when teaching cave rescue are to look for clues and to try to establish communications. A clue might be a footprint or a food wrapper. It might be the smell of a sweaty caver wafting in a certain direction. A clue might be the sound of someone calling for help. And the ultimate clue of course is the caver themselves. But there are other clues we might look for: what equipment do we think they have? What experience do they have? What are the characteristics of the cave? These can all drive how we search and what decisions we make.

Going back to the Thai cave situation, based on the media reports (which should always be taken with a huge grain of salt) it appeared that the coach and boys probably knew enough to get above the flood level and that the cave temps were in the 80s (Fahrenheit). These are two reasons I was hopeful. Honestly, had they not gotten above the flood zone, almost certainly we’d be talking about a tragedy instead. Had the cave been a typical northeast cave where the temps are in the 40s (F) I would have had a lot less hope.

Given the above details then, it was reasonable to believe the boys were still alive and to continue to treat the situation as a search and eventually rescue situation. And fortunately, that’s the way it has turned out. What happens next is still open for speculation, but I’ll say don’t be surprised if they bring in gear and people and bivouac in place for weeks or even months until the water levels come down.

During the search process, apparently a lot of phone lines were laid into parts of the cave so that easier communications could be made with the surface. Now that they have found the cavers, I’d be shocked if some sort of realtime communications is not set up in short order. This will allow the incident commander to make better informed decisions and to be able to get the most accurate and up to date data.

So, let me relate this to IT and disasters. Typically a disaster will start with, “the server has crashed” or something similar. We have an idea of the problem, but again, we’re really in a black-hole of information at that moment. Did the server crash because a hard drive failed, or because someone kicked the power cord or something else?

The first thing we need to do is to get more information. And we may need to establish communications. We often take that for granted, but the truth is, often when a major disaster occurs, the first thing to go is good communications. Imagine that the crashed server is in a datacenter across the country. How can you find out what’s going on? Perhaps you call for hands on support. But what if the reason the server has crashed is because the datacenter is on fire? You may not be able to reach anyone! You might need to call a friend in the same city and have them go over there. Or you might even turn on the news to see if there’s anything on worth noting.

But the point is, you can’t react until you have more information. Once you start to have information, you can start to develop a reaction plan. But let’s take the above situation and imagine that you find your datacenter has in fact burned down. You might start to panic and think you need to order a new server. You start to call up your CFO to ask her to let you buy some new hardware when suddenly you get a call from your tech at the remote datacenter. They tell you, “Yeah, the building burned down, but we got real lucky and our server was in an area that was undamaged and I’ve got it in the trunk of my car, what do you want me to do with it?”

Now your previous data has been invalidated and you have new information and have to develop a new plan.

This is the situation in Thailand right now. They’re continually getting new information and updating their plans as they go. And this is the way you need to handle your disasters: establish communications, gather data, create a plan, and update your plan as the data changes. And don’t give up hope until you absolutely have to.

This blog post will try to tie together several of my favorite things: Cheese, caving, and accidents.

I was making lunch the other day and I was looking at the stick of sliced Swiss cheese I had. I should note, I love Swiss cheese, especially with a good roast beef sandwich.

But first, an existential question. “What is a cave?”

Oh, that’s easy, it’s a passage through rock in the ground. In other words it’s the area where there’s no rock. Great. Let’s start simple. I think we can agree if it’s dark and I can walk through it, it’s a cave. What if I have to crawl? Yeah, that’s still a cave. What if I have to shimmy through and can barely fit? Yeah, that’s still a cave. What if I can’t fit, but one of my much smaller friends can fit through? Yeah, that’s a cave. But what if the entire thing is too small for anyone to crawl through but small animals can? What if two rooms that are large enough for humans to be in are connected by a passage too tight for a human, but say you can shine a light through, or can make a “voice connection” and hear people at the other end? Is that still part of the cave? As an aside, humans have mapped over 190 miles of Jewel Cave (and more all the time, big shout out to my friends who are mapping it!) But airflow studies estimate that we’ve only mapped about 3-5% of it. Let that sink in. But what if the other 95% is too small for a human to fit in? I don’t think anyone would say that isn’t part of the cave.

But here’s the real question. So we’ve mapped the cave. We know where the passages (i.e. lack of rock) are. We find a plug of mud and remove that. We’ve made more cave! Yeah! But what if we remove ALL the rock around the existing passage. When does the cave disappear? I mean now we just have a lot more “absence of rock”. But I think we’d agree at some point we no longer have a cave!

So back to Swiss cheese. One of the distinguishing details of such cheese is the holes, more properly called the eyes. Did you know there are actual Federal guidelines on what can be called Swiss cheese? Ayup, you can’t simply have a cheese with eyes in it. So I guess Swiss cheese is sort of like a cave. We actually have to think about it to give it some definition we can agree on. Take away all the cheese, eyes and all, and you have no more cheese and I’m quite sad.

But what about accidents? Well, there’s a model of risk analysis called the Swiss cheese model. Basically, very few accidents occur out of the blue or entirely without a relation to other factors. The idea is you have multiple slices of Swiss cheese and all the holes have to line up for the accident to occur. For example, in my own personal experience, years ago I came close to all the “pieces” of the cheese lining up; while driving through New Jersey, I came fairly close to hydroplaning off an exit ramp into the woods. Let’s look at some of the slices of cheese that came into play.

I was tired. Had I been more awake I’d have been paying a bit more attention.

It was dark. I might have noticed exactly how wet the exit ramp was during daylight.

I was travelling too fast.

I had nearly missed the ramp. I might have been travelling slower (see above) had I noticed the ramp sooner.

The instant I hit the ramp, I knew I was in trouble. I think the ONE slice that didn’t line up was, experience. Had I been 20 years younger with less experience driving, I suspect I’d have ended up off the road. I was at the very edge of being able to brake and maneuver and I called upon all my years of experience to stay on the correct side of that edge. One thin slice of “cheese” saved me that night.
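The “all the holes have to line up” idea can be put into rough numbers. The sketch below assumes each slice fails independently, which real accidents often don’t, and every probability in it is invented purely for illustration; none of them come from an actual risk analysis.

```python
from math import prod


def accident_probability(hole_probabilities):
    """Chance that every independent layer fails at once (all holes line up)."""
    return prod(hole_probabilities)


# Made-up numbers for illustration only: chance each "slice" has a hole.
slices = {
    "tired": 0.5,
    "dark": 0.5,
    "too fast": 0.4,
    "late noticing ramp": 0.5,
    "inexperienced driver": 0.1,
}

risk = accident_probability(slices.values())

# If one slice has no hole at all (probability 0), the accident is blocked
# no matter how badly the other slices line up.
slices_with_experience = {**slices, "inexperienced driver": 0.0}
blocked = accident_probability(slices_with_experience.values())

print(f"risk with all slices in play: {risk:.4f}")
print(f"risk when experience holds:   {blocked:.4f}")
```

The point of the model shows up in the arithmetic: shrinking or closing any single hole lowers the overall product, which is why accident reports are worth reading slice by slice.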

When one looks through accident reports, of almost any industry or activity, one can start to look for where the slices lined up and how any one could be changed. One reason I read the American Cave Accidents report when I receive it is to learn where the slices could have been moved so I can make sure I don’t line up my slices of cheese.

So, the question for you is where do your slices of cheese line up?

And another question is, what sort of cheese do you put on YOUR roast beef sandwich? And do you make sure your Swiss cheese eyes don’t line up so every bite is ensured a bit of cheese?

It was a pretty simple request actually. “Can you copy over the Panama database from FOO\WAS_21 to server BAR\LAX_45?”

“Sure, no problem.”

Of course it was a problem. Here’s the issue. This is at one of my clients. They have a couple of datacenters and have hundreds of servers in each. In addition, they have servers in different AD domains. This helps them partition functionality and security requirements. Normally copying files between servers within a datacenter isn’t an issue. Even copying files between the different domains in the same datacenter isn’t normally too bad. To be clear, it’s not great. Between servers in the same domain, it appears they have 1GB connections; between the domains, the firewall seems to throttle stuff down to 100MB.

The problem is when copying between different domains in different datacenters. This can be abysmally slow. That was my problem this week. WAS_21 and LAX_45 were in different datacenters, and in different domains.

Now, for small files, I can use the cut and paste functionality built into RDP and simply cut and paste. This doesn’t work for large files. The file in this case was 19GB. So this was out.

Fortunately, through the Citrix VDI they provide, I have a temp folder I can use. So, easily enough, I could copy the 19GB file from FOO\WAS_21 to that. That took just a few minutes. Then I tried to copy it from there to BAR\LAX_45. This was slow, but looked like it would work. It was going to take 4-5 hours, but they didn’t need the file for a week.

After about 4.5 hours, my RDP session locked up. I logged out and back in and saw the copy had failed. I tried again. This time at just under 4.5 hours I noticed an out of memory error. And then my session locked up.

So, apparently this wasn’t going to work. The obvious solution was to split the file (it was already compressed) into multiple files; except I’m not allowed to install most software on the servers. So that wasn’t a great option. I probably could have installed something like 7zip and then uninstalled it, but I didn’t want to deal with that and the paperwork that would result.

So I fell back to an old friend: Robocopy. This appeared to be working great. Up until about 4.5 hours. And guess what… another out of memory error.

But I LIKE challenges like this.

So I looked more closely. Robocopy has a lot of options, and two stuck out. The first was /Z, restartable mode. That looked good. I figured worst case, I’d start my copy, let it fail at about 85% done and then resume it.
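For anyone curious what “restartable” means in practice, here is a small Python sketch of the general idea: copy in fixed-size chunks and, on a retry, pick up from however many bytes already made it to the destination. This is only an illustration of the concept, not how Robocopy implements /Z, and it assumes the partial destination file is an intact prefix of the source. The function name and file names are made up.

```python
import os
import tempfile


def resumable_copy(src, dst, chunk_size=1024 * 1024):
    """Copy src to dst in fixed-size chunks, resuming from dst's current size."""
    # Whatever is already in dst is assumed to be a valid prefix of src.
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(done)
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            done += len(chunk)
    return done


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "big.bin")
    dst = os.path.join(tmp, "copy.bin")
    data = os.urandom(5000)
    with open(src, "wb") as f:
        f.write(data)
    # Simulate an interrupted transfer: only the first 2000 bytes arrived.
    with open(dst, "wb") as f:
        f.write(data[:2000])
    total = resumable_copy(src, dst, chunk_size=1024)
    with open(dst, "rb") as f:
        copied_ok = f.read() == data

print(total, copied_ok)
```

Because only one chunk is held in memory at a time, the memory used stays small regardless of the file size.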

But then the holy grail: /J :: copy using unbuffered I/O (recommended for large files).

Wow… unbuffered… that looks good. Might use less memory.

So I gambled and tried both. And lo and behold, 4:19 later… the file was copied!
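If you want to script a copy like this, a minimal Python sketch for assembling the command might look like the following. The two switches are the ones named above; the paths and file name are placeholders, not the ones from this job, and the helper names are made up.

```python
import subprocess


def robocopy_command(source_dir, dest_dir, filename):
    """Build a Robocopy argument list for copying one large file.

    /Z = restartable mode, /J = unbuffered I/O (recommended for large files).
    """
    return ["robocopy", source_dir, dest_dir, filename, "/Z", "/J"]


def run_copy(cmd):
    """Run the command on a Windows host and report whether it succeeded."""
    # Robocopy exit codes of 8 or higher indicate at least one failure;
    # lower values are success (possibly with informational flags),
    # so check the code rather than treating any nonzero value as an error.
    result = subprocess.run(cmd)
    return result.returncode < 8


# Placeholder paths and file name for illustration only.
cmd = robocopy_command(r"D:\staging", r"\\dest-server\share", "backup.zip")
print(" ".join(cmd))
```

Robocopy takes the source and destination as directories, with the file name as a separate argument, which is why the helper splits them.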

So, it was an annoying problem but… I had solved it. I like that!

So the take-away: Don’t give up. There’s always a way if you’re creative enough!