Posted by samzenpus on Friday March 30, 2012 @08:27AM
from the still-under-warranty dept.

angry tapir writes "I recently had the opportunity to sit down with Charles (Charlie) Pellerin, who was NASA's director of astrophysics when the Hubble Space Telescope launched with its seemingly fatally flawed optical system. Pellerin went on to head up the servicing mission that finally fixed the telescope and for that was awarded NASA's highest honor, a Distinguished Service Medal. Since Hubble he has done a lot of thinking about the problems that led up to the error and how organizations can best avoid making similar mistakes."

The real hero of that project was a man called Story Musgrave.
http://en.wikipedia.org/wiki/Story_Musgrave [wikipedia.org]
There was a lot of planning put into fixing it, but without him actually up there in space improvising when stuff went south, the Hubble would be useless today.

yeah, the guys that designed the corrective optics, the mechanism that deployed them, all the tooling, processes and procedures that were needed to install them and trained the astronauts didn't matter at all. It was all Story. Yup, he's the real hero.

I did say that there was a lot of planning put into fixing it. But it all comes down to the people working with the equipment. All kinds of stuff went wrong with the "fix the hubble" missions. He ended up having to improvise and almost lost his right hand because of it.

Humans are much cheaper than an equivalently versatile remote. Even when they die.

The Chinese know this and couldn't care less about a few hundred thousand dead. They will take over the stars. Americans also do, but they will send the lower classes up there to die and pave the way and cannot lose too many or the source will dry up.

The military (any military), well, they only know this when they're at war.

And NASA actually knows it, but the PR backlash is way too big, so they do everything they can to reduce

I wanted to joke about the PhDs getting drunk at their desks, but there are a couple of gems in the text:

"I saw this guy, Richard Feynman, who was a review board member, take a piece of rubber O-ring and put it in his icy water on television, and showed that it stiffened up. So immediately I said, 'Oh, that's the technical problem, they didn't do the O-ring well.'"

"That was nuts," Pellerin says. "These guys understood the O-ring, but I put that story in my head because technical people look for technical answers. I never read the conclusion of [the review board] report that said it was a social shortfall."

We see this very clearly when discussing evoting.

Then towards the end there is an interesting analogy between the Shuttle accidents and a Korean airline company with an extreme crash rate, referring to people being put under too much pressure and acting irrationally.

"There's a bunch of research I've come across in this work, where people say that the social context is a 78-80 per cent determinant of performance; individual abilities are 10 per cent. So why do we make this mistake? Because we spend all of these years in higher education being trained that it's about individual abilities."

I would also like to know the juicy details behind the Mars Climate Orbiter [wikipedia.org] not orbiting, but amm.. slamming into Mars.

Well, that one's easy. Failure by one unnamed country to use units that aren't based on the length of the Pharaoh's arm or how fast a horse with 2 bushels of corn can run on a slope that has a variable gradient, but it's close to the palace so it will do.

The basis of the units is irrelevant; consistency in their use is what matters. Unless you're able to tell me that the length of the path travelled by light in a vacuum in 1/299,792,458 of a second is directly related to landing a probe on Mars.

That's exactly my point. NASA *does* use SI units... just not consistently. It has suppliers that use non-SI units. Those units are also self-inconsistent in some cases (for example, the size of a gallon).

You should read the article, because your comment is exactly the kind of thing he is talking about: technical people who think they have found a technical problem, so the solution is to correct that problem. If the problem was really that the measurement system the US uses is 'wrong', then how can the US have so many successful space missions? The problem is not that there are multiple measurement systems, or that one is somehow superior to the other. The problem is that the teams did not communicate successfully - not a technical problem at all. And don't say 'well, if the stupid US would use the same system as the rest of the world it wouldn't be a problem', because that just shows you completely missed the point. The point is that there was ineffective communication - a leadership problem - not simply a technical problem.

Supplier did not supply SI, since it bases its measurements on US system.

Problems.

Yes, it was a communication and management error, but not entirely. It has been standard in scientific settings to use SI units for years and years. Failure to use them *especially when specifically outlined by the design brief* is not just a "communications problem" - it's a fundamental error in the product that was delivered unfit for purpose.

Well, you did miss the point. It was ENTIRELY a communications problem. Saying 'NASA specified SI' does not mean there was effective communication. That's like saying just because a teacher gave a lecture all the students have learned the information. How do you know the information was received and understood? How do you know that information was received and understood by every single person working on the project?

It is obvious that somewhere along the line there was a breakdown in communication. I

Another point: if it was 'standard' to use SI for years and years, why was Lockheed not using SI? Surely this was not Lockheed's first foray into the aerospace industry. If it was 'standard', why did no-one inside Lockheed raise a red flag and say 'wtf'? Is every Lockheed manager and employee a moron who doesn't know the 'standard'? Or maybe, just maybe, there were communication failures. Maybe using SI is not quite the 'standard' when dealing with actual hardware that people assume it is - commun

Contrary to popular belief, the mixup was not an SI vs English units problem. The problem was that the numbers were passed from Lockheed to NASA without units. Without the actual units jotted down after the numbers, the Lockheed people knew the units were lb-f. The NASA people assumed the units were Newtons.

It's an important distinction because the same error can happen even if you work entirely in SI units. If I write down a number in kilonewtons but fail to write down the units, and you assume it is just newtons, we end up with the same problem. I've seen this happen countless times in the lab and while tutoring, with kids plugging grams into an equation when they're supposed to be using kg. (Which BTW is one stupid thing about SI units - really confuses the kids that the base unit for everything else has no prefix, but the base unit for mass is a kilo-gram.) Fortunately, forcing them to write down the units after every number usually takes care of this problem.

In science and engineering, any time you see a number without units, your immediate reaction should be to ask the person who provided the numbers what the units are. (Actually you should be ripping him a new one for failing to write down the units, dimensionless numbers excepted.) Never assume the units, always ask.
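To make the "never assume the units" point concrete, here is a minimal Python sketch. Everything in it (the `Force` class, the `TO_NEWTONS` table, the `total_force_newtons` helper) is hypothetical and made up for illustration; it is not how Lockheed's or NASA's software worked. The only idea is that a number cannot travel without its unit, and a bare number gets rejected instead of guessed at.

```python
from dataclasses import dataclass

# Conversion factors to newtons (illustrative table, not from any real
# flight software). 1 lbf is defined as exactly 4.4482216152605 N.
TO_NEWTONS = {"N": 1.0, "kN": 1000.0, "lbf": 4.4482216152605}


@dataclass(frozen=True)
class Force:
    """A force value that always carries its unit (hypothetical helper)."""
    value: float
    unit: str

    def __post_init__(self):
        # Refuse units we do not know how to convert.
        if self.unit not in TO_NEWTONS:
            raise ValueError(f"unknown force unit: {self.unit!r}")

    def in_newtons(self) -> float:
        return self.value * TO_NEWTONS[self.unit]


def total_force_newtons(readings):
    """Sum readings in newtons, rejecting any number that has no unit."""
    total = 0.0
    for reading in readings:
        if not isinstance(reading, Force):
            # The software equivalent of "go ask what the units are".
            raise TypeError(f"bare number {reading!r} has no units")
        total += reading.in_newtons()
    return total
```

With this, `total_force_newtons([Force(1, "kN"), Force(2, "N")])` converts before adding, while `total_force_newtons([1, 2])` fails loudly. Note that the kN-vs-N case is caught the same way as the lbf-vs-N case, which is the parent's point: the fix is carrying the unit, not picking a unit system.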

As already pointed out somewhat, the bases of the SI units are entirely arbitrary, too. The meter was originally based on a fraction of the earth's circumference but that was tremendously awkward because the earth doesn't have a constant circumference. Also the fractional difference between a liter and its original definition as the volume of 1 kg of water is a failure for the metric system. SI units beat the older units but not by all that much, they are just the "standard" now.

But that's the point - they are a standard (and ever since their inception, the standards body has been looking for ways to define them based on invariable things, rather than on arbitrary things like a mass of platinum/iridium alloy, or a metal rod that is a certain length).

SI as a system for standardising units across science is not controversial, or even new - the fact that a multi million dollar space mission can fail so spectacularly because one supplier was using imperial units is just unforgivably mor

You are certainly correct in everything in your last post. Of all the industries to cling to the imperial system, the aerospace industry is the least forgivable. I was around (in the 70's?) when the US tried to metricise and it was shot down as a Communist plot. You have no argument from me that failure to completely abandon the imperial units is stupid by all those who haven't done it.

It was because people weren't paying attention. Had there been a second set of eyes on the specs for the control system, and then a review to make sure the programming was correct (it was the control system that had inaccurate data), there would have been no issue.
There is the misplaced notion that somehow S.I. is more accurate than Imperial. It isn't. You could create anything to be the basis for a standard of measurement. As long as your measurements (and methods thereof) are reproducible, and yo

I attended an astronomy conference a year ago that included a presentation from a NASA guy on the mars rovers. He had a few disparaging things to say about Lockheed-Martin, including blaming them for the Mars Climate Orbiter failure. He said their contract included a statement to recalibrate the thruster in the metric system but they failed to do so. (Of course, he neglected to mention that NASA was managing the project and failed to catch the error.) He also said one of the rovers drove by the heat shield (built by Lockheed-Martin) from the rover landing and there was a big disagreement over examining the heat shield up close to see how well it held up. Lockheed-Martin wanted the data but wanted to keep it secret on the grounds it was a proprietary design. NASA said all their data is public so it's either we drive by without looking, or we take a look and release all the data. They eventually did the latter.

One more thing -- the same conference included a presentation by a professional astronomer who had overseen the building of an observatory in Chile. He had disparaging things to say about NASA -- that their cost estimate was 10X over what he eventually spent on the project. Guess it all depends on your point of view.

I worked at Ball Aerospace years ago and found out the real story. NASA cut the budget for Hubble so that a final optical train alignment task was never done. The engineers had designed a laser test to check the optical path but NASA wanted to save the $50000 the test would take. So until it was turned on, in space, they had no clue how bad it was. Working with NASA was tough mostly due to their arrogance.

It's not only the $50k for the test. Most likely there were millions in cuts and this test happened to be in the mix. It would be nice if you only had to pay for the tests that showed problems. It would make engineering much easier. Unfortunately you have to test for everything, even the stuff that works fine.

Is NASA really so underfunded they cannot afford a 50k dollar test to make sure their 1.5 billion dollar telescope works?

Read the article (I know, this is Slashdot...). The problem was not that they didn't do obvious tests that would have revealed a major flaw - THEY DID! But they didn't believe the results since the far more sensitive null corrector should/could not possibly have made an error of this magnitude (it should have been accurate to 1/65 wave, but was off by 1/2 wave). They assumed that the test set-up they were using was flawed -- and the environment they were working in (schedule and budget overruns, prestige an

Thanks for that comment. Now go back into your troll-hole and read the book Knuckle Dragging for Cretins. Politics being what it is, a politician will try to wrap themselves in the glory coming from a successful project. As with anyone, they want to minimize the risk. Being a politician has nothing to do with their gender, only the glad-handing opportunism. To simplify it for your less developed neocortex:

I worked for NASA at the time of the repair. Sadly, because of the ridiculous cost of the shuttle, the cost of repair could have built 3 Hubbles, launched two using Atlas boosters to a higher, clearer and more useful orbit and kept one in reserve. Just. for. the. rescue. mission. STS was a horrendous waste of talent and opportunity.

According to TFA, they did do the final test, and it showed problems. Unfortunately, they came to the conclusion that the test was bad, not the mirror. They assumed that since the mirror was no longer on its 'bed of nails', it was sagging under gravity, and that was causing the test error.

According to TFA, they did do the final test, and it showed problems. Unfortunately, they came to the conclusion that the test was bad, not the mirror. They assumed that since the mirror was no longer on its 'bed of nails', it was sagging under gravity, and that was causing the test error.

Given the thickness error in the mirror was less than the thickness of a piece of paper, that is a reasonable explanation. It was really small error given the size and weight of the mirror and gravity unfortunately does have a huge effect in slightly deforming such heavy optics. And yes, the optics were carved with gravity deformation in mind as well.

And the other bad thing is, well, the further something is away from you, the tighter the tolerances needed in order to resolve that object, so an error as tiny as it is makes for very blurry images.

Anyone who works with telescope mirrors (even ones costing 0.001% of the cost of the Hubble mirror) knows all about gravity sagging; it is ever present, and even my 13.1" mirror requires carefully designed supports. The degree of sag with the 3-point support should have been (and probably was) a calculated, pre-known quantity, and that it was sagging far more than expected should have led to an investigation as to why that was if the cultural environment had supported proper review. So no, it was not a reason

You ought to RTFA. That was just one test out of many, and all the previous tests showed the mirror failing too. They just didn't report the failures. Why? Well, because they had other big "emergencies" going all the time, and (this is key) they were under intense pressure from management to solve all these other "emergency" problems quickly, since the whole project was already over budget by nearly a billion dollars.

Your anecdotal story is interesting, but it fits right into what he was talking about with the management failures at NASA. Clearly it wasn't the lack of that test that caused the problem. It was a management decision to not perform it. Probably under the exact same pressures. Even if it had been performed though, who's to say they wouldn't have rationalized away the results like they did all the other failed tests?

"We tested that mirror over and over and over with a different kind of device, the old style refractive null corrector," Pellerin says.
The results? "Half wave of error, half wave of error, half wave of error."
"So some people sat down and said, 'What's going on?'" Pellerin recalls. "The mindset was that the mirror can not be other than perfect. So something else is happening. They concluded that the mirror was sagging under the force of gravity in the three point mount rather than being on the bed of nails by half a wave.
"Well it turned out that was wrong. But they rationalised, rationalised, rationalised."

...

The project had suffered other challenges beyond fabricating and mounting the mirror; staff were being "hammered" all the time, Pellerin says. In addition there was constant angst about how far the project had gone over budget. "Hubble's initial budget was $434 million. We closed it at $1.8 billion just for the flight segment; big overruns."
"So the way it works is you tend to blame the people doing the work," Pellerin says. "So we're hammering on them, hammering on them so they had no free time or no inclination to track down anything that wasn't a critical problem because we have other critical problems. Difficult technical things that we couldn't solve yet."
The review board also found that a hostile environment had been created for the contractor, which meant "they told us about any problem at their peril," Pellerin says.

If you read the article, he did state that they used a second, older, less precise, but still useful test device to check the mirror after using their first super precise (but unknowingly flawed) testing device that they fabricated for the purpose. The mirror failed the tests consistently by the same amount. They then proceeded to ignore it and assume the test was at fault, not the mirror.

The article mentions that the contractor was afraid to bring up problems.

That, plus the mentality from management that people who bring up problems are "troublemakers," "negative," "not team players," etc. (because they've put too much of their ego or political capital into a project) has got to be responsible for more disasters, large and small, than any other deadly combination.

I worked for a large nonprofit that blew money on doomed projects as though money grew on trees. Each time, it started with somebody, usually a contractor or somebody else who stood to gain from it, flattering the leadership that this was huge and visionary and would make or save them millions. Then the organizational mind control started, where everybody was saying that it was the greatest thing ever. Then the flawed project management started. Then when the cracks were obvious, people who pointed them out were vilified as naysayers. It was only the lower-downs who said anything because to rise, one had to be a "team player," and the organization was hierarchical enough that lower-downs were ignored. Then came denial that there were problems, together with tossing more money at it (including adding more people to a software project at the last minute because that always works). Then even when the leadership [sic] team [sic] all realized there were problems, they all waited until the person responsible for the project was willing to concede defeat, because in a political environment, nobody wants to confront somebody who might retaliate

Those elements are the inevitable recipe for disaster for any project, but it's fear that drives virtually all of them. Fear of not looking good (note that the Congresscritter didn't yell about wasting taxpayer money, she yelled about being made to look bad), loss aversion, fear of admitting a mistake, fear of speaking up.

Pellerin was brave enough to do something technically illegal and scrape up the funds for servicing it.

In order for the lens snafu to have been a fatal flaw, it would have had to have rendered the Hubble inoperable -- or at least incapable of doing science that couldn't be done with existing telescopes. But for the years that Hubble flew before the fix, it returned pictures that were hundreds of times better than anything previously seen. Within days of its activation, we were seeing so far, and so clearly, that our understanding of the nature of the universe was being rewritten. We were seeing things beyond

Totally agree. This is actual news for nerds who are interested in how to effectively manage, and be managed, by other nerds.
Or, we could go back to arguing whether Autism is a fake diagnosis, based on, you know, our skill with Java.
I kid, I kid. Sort of.

This is the core of real engineering work, and it's one of the reasons I loved working at NASA under great management. I mostly squandered the opportunities I had there, and yet I still learned more from that time than anything else in my career. I actually started there working for a brilliant optics guy who was at Perkin Elmer during the Hubble years. Later, my direct supervisor went on to play a key role in the servicing mission, and (last I heard) was part of the JWST team.

Later, I worked in private industry for the team that (essentially) discovered the hole in the ozone layer. We got into it verbally from time to time, but I really respected his knowledge of the physics we were involved in. I once joked about getting fired if the part I was working on failed. He looked me right in the eye and said, "Oh, I won't fire you. I'll make you stay here and fix it." I smile a bit every time I think about that meeting.

I enjoyed that thoroughly. Just like Mythical Man-Month is required reading in virtually all CS programs, something like this needs to be adapted and given as required reading, too - with more anecdotal details of the failure and solutions and costs, and with more pragmatic approaches to avoiding pitfalls, of course. Really good find for us. Thanks, /. editors.

I totally agree with you & NoahsMyBro: This is 5 pages of some of the best content I've ever read out of Slashdot. For a run of the mill tech-mag, it's paradigm-shiftingly good; like maybe they're not all brokers/investors selling their con as "tech journalism."

But I'm saddened by the lack of commentary (I'm concluding that implies a lack of RTFAing) and the quality. "Imperial vs. Metric"? Anecdotes that--while interesting--fall straight on the line graphed in the linked story? Oh well, it's worth it

Totally enjoyed the article.
1) I believe this type of mismanagement is a systemic flaw which occurs today despite all the Challenger / Hubble lessons learned. This seems to happen when any project is driven by unspeakable layers of management and the top layers have goals and metrics far different than the bottom layers
2) Was the Apollo project which put man on the moon any different? How did we get all these people to put a man on the moon while today we cannot even get a rocket off the ground?

This is a classic! It shows that when doing exacting science/engineering one should NEVER hurry the staff! Things will get done when they are done, otherwise someone will apply a field-expedient solution to a problem, with the usual "humorous" results...

I think the article was in some ways flawed. It gave a good description of how the error occurred. Then it moved on to a huge tirade against the focus on "individual abilities" which it blames for the whole error. Firstly, even taking the description of how the error occurred at face value, it is not at all clear that the error had anything to do with a focus on "individual abilities". On the contrary, it seems this was just an instance of really poor management that - due to cost overruns - pushed their employees to work harder, to the point that they lost their focus on quality and maybe even started cutting corners in the fabrication process. This has absolutely nothing to do with a focus on "individual abilities". However, let me address the "anti-individual abilities agenda" anyway.

The anti-individual abilities agenda is routinely promoted by managers, project managers and other people engaged in the management layers (management consultants, business schools etc.). The motive is pretty clear: Many bosses don't like admitting that the success of their project comes down to individual abilities of a few core members on the project. After all, what is the value of management then, they ask? It's like the tail wagging the dog.

However, this is just denying reality. I can firmly say that on any project of major size I worked on, there were a few people, 5-10% of those on the project, running the show. This in itself is not very surprising; what is surprising is the fact that these 5-10% were not centered at the top of the pyramid. Rather, they were evenly spread out over all 'layers' from 'highest to lowest'. These people (by virtue of their skills and dedication to the project, something that is often lacking with the project management itself!) automatically assume a role of authority whether management likes it or not. It's simply the only way to get things done. Let's face it, on any project there's going to be a lot of 9-5'ers that don't really care. They are never the ones driving the car, nor should they be. It's the 5-10% who have both the ability to and the interest in getting the job done that count. Those that dream about the project at night and who feel their personal honour is at stake in making it succeed.
Also, as Fred Brooks noted in 'The Mythical Man-Month', some (sub)projects are like surgery. You need one highly skilled person to be in charge and carry out the job, and the rest of the team members are really just accessories of that person. Their contribution can be important of course, but at the end of the day, all choices and responsibility reside with the 'surgeon'.

I think the lesson to be learned from these observations is that management needs to accept that this is the structure that projects will generally fall into, no matter what they do. The job of management is to get the best result out of it. On projects with poor management that creates obstacles for progress and makes lots of bad choices (this often happens on politically infested projects as well as on projects where management doesn't have a clue about the technical aspects), often the project finds a way to completely bypass management. Decisions by management may be outright ignored, or important decisions are never brought up to this level but are just made behind the scenes. This is a very dangerous situation since important decisions may not be properly reviewed and may not even be known by all stake-holders. While most decisions taken may have been correct, it takes just one bad decision to jeopardize the project, and problems related to this kind of "skunkworks decisions" tend to surface very late where they may cause huge problems, sometimes disasters.

The job of management is to embrace the individual abilities, and to listen carefully (but of course not uncritically) to arguments brought forward, no matter if it is from a project manager or a "lowly" techie. They need to make a decent effort to try to understand what they are talking about, even if the explanations are not always clear and even if it can sometimes be highly technical.

it is not at all clear that the error had anything to do with a focus on "individual abilities".

If you RTFA, there was a single point of failure isolated to a specific tool used to grind the mirror, and the subsequent mistakes made by management dealing with the error (i.e. when the mirror was on the bed of nails).

What makes you think the US failed with Sputnik? Eisenhower deliberately let the Soviets be first because he didn't want to use existing military rockets and wanted them to test the water of flying a satellite over sovereign countries. Now you might argue that he was mistaken, but his plan (let the Soviets be first and follow soon after) was a success.

The US "failed" at being the first to put a satellite in orbit, but as you pointed out that was deliberate, to let the USSR be first to test the waters of sovereign "space." My big question is: what are the true intentions of our leaders? I want to know now and not decades from now.