STS-122 – Hale memo calls for a review of the ECO system’s validity

A fascinating memo, written by Shuttle manager Wayne Hale, has been acquired by this site, providing an insight into the thought process behind decisions taken during Atlantis’ two launch attempts.

Troubleshooting resumes this morning at Pad 39A, as engineers evaluate the potential cause of the two scrubbed launch attempts: the LH2 ECO (Engine Cut Off) system, about which Hale himself writes that “it is now time to consider whether we can live without it.”

Atlantis and her stack are now enclosed by the Rotating Service Structure (RSS), ahead of pad work to take her out of launch configuration. This will allow platforms to be reinstalled for engineering access.

Meanwhile, engineers are reviewing the data gathered during yesterday’s scrub, which became an opportunity for a tanking test to gather more information on potential faults downstream of the ECO sensors, which themselves are not believed to be the cause of the issue.

Meetings are taking place on Monday to discuss the forward plan, which already involves Atlantis’ External Tank, STS-123’s ET (ET-126) inside the Vehicle Assembly Building (VAB), and ET-127 at the Michoud Assembly Facility (MAF).

The plan is likely to involve a troubleshooting instrument known as a Time-Domain Reflectometer (TDR), which injects a pulse into a line and watches for reflections; the delay before a reflection returns indicates how far along the line the discontinuity lies. The aim will be to locate the open circuit using this method. Discussions are also taking place on a possible additional tanking test.
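The TDR principle can be sketched in a few lines of Python. This is an illustrative model only, not the instrument or procedure used at the pad: it assumes an ideal, lossless line with a single characteristic impedance (50 ohms) and a velocity factor of 0.66, both hypothetical placeholders rather than properties of the ECO wiring. An open circuit reflects the pulse with the same polarity (reflection coefficient +1), a short reflects it inverted (-1), and the distance to the fault is half the round-trip time multiplied by the signal's propagation speed.

```python
"""Illustrative time-domain reflectometry (TDR) arithmetic.

Assumptions (hypothetical, not taken from the article): an ideal lossless line,
a 50-ohm characteristic impedance and a velocity factor of 0.66.
"""
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def reflection_coefficient(z_load: float, z0: float) -> float:
    """Voltage reflection coefficient at a discontinuity: (Z_L - Z0) / (Z_L + Z0).

    An open circuit (infinite Z_L) returns +1, a short (Z_L = 0) returns -1 and a
    matched load returns 0.
    """
    if math.isinf(z_load):
        return 1.0
    return (z_load - z0) / (z_load + z0)


def distance_to_fault(round_trip_s: float, velocity_factor: float) -> float:
    """One-way distance in metres to the reflecting discontinuity.

    The pulse travels to the fault and back, so the distance is v * t / 2.
    """
    propagation_speed = velocity_factor * SPEED_OF_LIGHT_M_PER_S
    return propagation_speed * round_trip_s / 2.0


def classify(gamma: float, tolerance: float = 0.1) -> str:
    """Rough label for a measured reflection coefficient."""
    if gamma > 1.0 - tolerance:
        return "open circuit"
    if gamma < -1.0 + tolerance:
        return "short circuit"
    if abs(gamma) < tolerance:
        return "no significant reflection"
    return "partial reflection (impedance change)"


if __name__ == "__main__":
    # Hypothetical reading: a full positive reflection 200 ns after the pulse.
    gamma = reflection_coefficient(math.inf, 50.0)
    print(classify(gamma), f"at about {distance_to_fault(200e-9, 0.66):.1f} m")
```

With these placeholder values the sketch reports an open circuit at about 19.8 m. On a real harness, connectors and cable changes add their own smaller reflections, which is why a measured trace is normally compared against a known-good baseline.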

Mission Management Team (MMT) evaluations:

Sunday’s second attempt to launch STS-122 came after a 48-hour scrub turnaround. The first attempt was scrubbed after breaching the LCC (Launch Commit Criteria), which required three of the four LH2 ECO sensors to be in working order for the countdown to proceed to lift-off. LH2 ECO sensors 3 and 4 both failed ‘open circuit’, scrubbing the launch, and LH2 ECO sensor 1 subsequently failed while the tank was being drained.

This led to an MMT meeting to discuss whether to proceed toward a second attempt on Saturday (a 24-hour scrub turnaround) or to allow a further day of discussions and target a Sunday attempt. The latter was chosen.

Following a fully attended videoconference with engineers from all the related centers and contractors (described as passionate, with some sources going further, but still seen as healthy under the ‘new’ NASA culture of giving everyone a voice on flight safety), Sunday’s attempt was given a go to proceed under a new LCC requiring all four LH2 ECO sensors to be working.

This new LCC, it has since been revealed, was insisted on by the Astronaut Office. JSC Engineering, JSC Safety and Mission Assurance, and some ET and safety-related managers also voiced concerns before approving a plan that combined the new LCC with a risk-mitigation effort, including Flight Control procedures for monitoring the orbiter’s and ET’s propellant usage during the ride uphill.
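The difference between the original and revised commit criteria comes down to a simple count, sketched below. The function and variable names are hypothetical, and the model captures nothing beyond how many of the four LH2 ECO sensors are working. The sensor states mirror those reported in this article: sensors 3 and 4 failed open on the first attempt, and sensor 3 failed during fast-fill on Sunday (assuming, for illustration, that sensors 1, 2 and 4 were still healthy at that point).

```python
"""Illustrative count-based check of the LH2 ECO launch commit criterion.

Hypothetical model: each sensor is simply 'working' (True) or 'failed' (False).
"""
from typing import Mapping

ORIGINAL_REQUIREMENT = 3  # original LCC: 3 of 4 LH2 ECO sensors working
REVISED_REQUIREMENT = 4   # revised LCC for Sunday's attempt: all 4 working


def lcc_satisfied(sensor_ok: Mapping[int, bool], required: int) -> bool:
    """Return True if at least `required` sensors are reported as working."""
    working = sum(1 for ok in sensor_ok.values() if ok)
    return working >= required


# Sensor states at each scrub (True = working).
first_attempt = {1: True, 2: True, 3: False, 4: False}  # sensors 3 and 4 open
sunday_attempt = {1: True, 2: True, 3: False, 4: True}  # sensor 3 failed

if __name__ == "__main__":
    print("First attempt, 3/4 rule:", lcc_satisfied(first_attempt, ORIGINAL_REQUIREMENT))
    print("Sunday, 4/4 rule:", lcc_satisfied(sunday_attempt, REVISED_REQUIREMENT))
    print("Sunday, under the old 3/4 rule:", lcc_satisfied(sunday_attempt, ORIGINAL_REQUIREMENT))
```

Under this simplified model both attempts fail the criterion in force at the time, while Sunday's sensor states would have passed the original 3/4 rule. The real go/no-go decision also weighed factors a count cannot capture, such as whether a failure was understood.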

Management conversations:

It was an e-mail from Safety and Mission Assurance manager and three-time shuttle astronaut William McArthur that prompted Wayne Hale’s response.

‘I am skeptical that we will reach a consensus supporting flight with 0/4 ECO sensors within the next two days,’ noted McArthur after Friday’s scrub. ‘Most at JSC heard direction today to ‘find rationale for flight’ versus ‘is the risk of flying without the ECO sensor system acceptable?’. To me, this seems to be a huge leap.

‘The 3/4 criteria has been extensively discussed and analyzed. To respond to yet more difficulties with the system by ‘giving up’ and declaring the system valueless appears inconsistent with NASA tradition. We fix things that don’t work.

‘Perhaps this system doesn’t add value commensurate with the effort to make it work reliably. But we may be on a slippery slope pushing to resolve that question in such a short period of time.

‘That we’ve never had an LH2 low-level cut-off is valuable data but not compelling in itself. Have we ever come close? We’ve never used the Crew Escape System, but there are no proposals to trade this equipment for upmass (I’d sure consider it, though). Maybe not a fair analogy.

‘Right now I’m far from being comfortable with flying in our current condition.’

In the event, Sunday’s launch attempt was soon ended by the LCC, which had been raised to 4/4 for the LH2 ECOs. Just a couple of minutes into fast-fill of the ET, LH2 ECO sensor 3 failed, dashing hopes that the intermittent problem would not reappear on the second attempt and scrubbing the launch.

While NASA have been open with the media and public about their evaluations, Hale’s response to McArthur’s e-mail, sent to the managers of the related NASA centers, provides an in-depth look at the evaluation process.

‘I completely agree with the thought that we are unlikely to reach community agreement today on this subject. Community agreement, while somewhat desirable, has never been mandatory, and in actual practice is frequently not achieved,’ wrote Hale.

‘The MMT Chair will have to do what his job description requires him to do, make a decision in the face of conflicting recommendations. The governance model of the agency provides for appeal paths for dissenting opinions and the restraining checks and balances of the independent technical authorities. But in the end, somebody will have to decide and complete unanimity is not mandatory, nor I would say, even necessarily desirable.

‘It is good to have strong advocates on both sides of complex issues to ensure that all factors are thoroughly examined. So whether today, next week, next month, or next year, I expect that there will still be many folks who will disagree with the decision – whatever that may be – on this topic.’

Mr Hale then addressed ‘launch fever’, sometimes referred to as ‘schedule pressure’, a subject raised by the media at almost every launch-related press briefing and one that leads to what he rightly describes as ‘brazen’ headlines. The issue was particularly pertinent to STS-122, since even achieving a December attempt had required a major effort in both ground processing and ISS stage work following STS-120.

‘I am extraordinarily aware of the effects of the phenomenon termed launch fever. We are here at KSC or our home centers with a job to do, fly the shuttle safely and successfully, which thousands of folks have labored long and hard to get ready,’ Hale added. ‘(An) extraordinary effort has been made to get the vehicle to the launch pad on time.

‘The media is clamoring for us to launch (most launch pressure comes from the media) with brazen headlines in the papers etc. If we had decided to override the LCC and press to launch, or if we had come to a hasty conclusion at the end of a long and tiring day yesterday, then I would agree that our judgement had been overly influenced by launch fever or schedule pressure.

‘The fact that we stood down for 48 hours to consider our options, to cool down, to gather our facts, to discuss, debate, troubleshoot, review, etc., etc., etc., all of this is clear indication that the team is not allowing launch fever to overly influence us. I expect we will have a, ahem, dispassionate discussion. We may reach a conclusion. We may not.’

One option under consideration for interim troubleshooting was a tanking test. MSFC (Marshall Space Flight Center) provided shuttle managers with the pros and cons of carrying one out on what would have been Saturday.

‘Testing helps to characterize risk that a minimum of 2 (probably more) ECO sensor circuits will perform during cryogenic operations. Provides additional confidence through tanking / de-tanking / tanking cycle if sensors are operational after initial tanking and experiencing cryo temps,’ read the ‘benefits’ section of MSFC’s case for carrying out a tanking test ahead of the second launch attempt.

However, Hale and his team were also given an opposing view, again from MSFC, which helps explain why a dedicated test was not carried out on Saturday, with tanking data instead gathered only during the scrub of Sunday’s launch attempt, when an already-loaded tank on the pad presented an obvious opportunity.

‘Does not increase nor provide flight rationale over a typical pre-launch tanking. Another structural cycle added to tank hardware. Requires resources that could be utilized on failure analysis assessments. If ECO system performance is nominal, no guarantee that anomaly will not occur next flow, i.e., does not clear un-verified failure,’ read the ‘detractors and risks’ element of MSFC’s presentation.

‘Tanking test alone – without supporting logic – does not provide flight rationale. 3 of 4 ECO sensors failed wet, so they are clearly susceptible to the ‘failure mode’. If thermal can affect the system, vibration could affect it too. Provides only one additional data point. Tanking test doesn’t alleviate longer term concerns with overall LH2 ECO system reliability. Schedule impact. Hazardous operation. Detanking, alone, has no benefit.’

This goes some way to showing what shuttle management is faced with: ‘damned if you do, damned if you don’t’ recommendations on how to resolve an intermittent problem that has already been through a full troubleshooting effort.

‘The question, as always, is when have we spent enough time doing our homework, how long is long enough to contemplate a course of action, review the data, etc.’ Hale noted, before moving on to a technical overview of his thoughts on the validity of the ECO sensor system.

‘The proposal that the LH2 cut-off system is unreliable and we should disable it and fly without it is not new. There were discussions along these lines in some small parts of the community before the Columbia accident. This discussion came to a serious discussion at the first RTF tanking test in March of 05. At that time, I believed that such a decision would be premature.

‘There is no doubt that a properly functioning LH2 cut-off system protects for some failure cases that would otherwise be catastrophic. It behooved the program to expend considerable resources to troubleshoot and perhaps restore functionality to this system, even though the likelihood of needing the LH2 cut-off system was statistically unlikely.

‘Every time the ECO sensors have caused a launch scrub, the discussion of the usefulness or not of the LH2 system has been debated, and every time to date there was more work that could be done, so the decision to fly without them was always postponed. This is not a new idea. The question, really, is whether the time has come to consider it more thoroughly.’

Hale went on to give a historical overview of the years of work carried out on the ECO system, from related hardware such as the Point Sensor Box (PSB) on the orbiter through to the sensors themselves. He also gave his opinion on how much the aforementioned tanking test would add to the team’s understanding of the problem.

‘Troubleshooting at cryo temperatures is by definition hazardous and we have limited ways to accomplish that. If there is a credible story that a new tanking test, perhaps with new drag on instrumentation for voltages, resistance, or temperature, would help, we probably will execute that test,’ noted Hale, speaking of the separate test that would have been ordered for Saturday.

‘However, I remain skeptical at this time since the case has not been presented in a coherent manner of what we would test, what data we would find, and how that might be useful.’

Mr Hale continued his technical overview by focusing on the LH2 feedthrough connector, which has been the subject of most of the ET project’s presentations over the past couple of days, though largely to show that the project believes it is not at fault for the ECO system’s issues. ‘For open circuit condition connector failure would require disengagement of pins or pin failure: Not possible,’ noted MSFC as part of its assessment.

‘Much attention has centered on the LH2 tank feedthrough connector. An extensive series of tests and evaluations is done on each connector. After the ET-120 tanking tests, the LH2 feedthrough connector was removed and subjected to many harsh tests. Nothing was found,’ added Hale.

‘The design is relatively simple and robust. What could be done about LH2 feedthrough connectors or how to better test them to find problems before actual tanking remains a mystery. Again, every credible suggestion has been exhausted.’

This all leaves the Shuttle Program with a dilemma. It has an intermittent fault of unknown root cause, on a system that has never been called upon in the history of the Program. Yet this fault is scrubbing launch attempts that might otherwise have counted down smoothly to lift-off.

However, the Program has not reached a consensus that simply ‘doing away’ with the system should be the recommendation, given its role in mitigating the risk of the engines failing to shut down before the LH2 tank runs dry, which would lead to loss of the vehicle. Hale believes the Program should hold another discussion on whether it can move past the requirement for the LH2 ECO system.

‘Out of having completed everything that we can think of to do, we must come to the place where we have to consider what it means to fly with an unreliable system. Because it is unreliable. And because there are no more credible suggestions out there of how to make it reliable, either in the short term, or the longer term (less than 2.5 years),’ Hale noted.

‘So why do we need an LH2 cut-off system. Simply put, if you need it and the LH2 tank runs dry with the engines at full power with LO2 still coming in, inevitably a catastrophe will occur. LOX rich shutoffs are ugly in the extreme. This is a crit 1 situation. And it occurs so rapidly that human intervention is not practical.

‘The system is biased toward a LOX shutdown. There have been three LOX shutdowns in the history of the program, the abort to orbit on STS-51-F caused by faulty SSME redline sensors that erroneously shut down a perfectly good engine, and two cases STS-78 and STS-93, where failures in the system caused off nominal performance in the engines.

‘There has never been an LH2 shutdown, although post-flight reconstruction points to a few flights where it is possible that we were close, and much is made of the LH2 cut-off sensors showing dry on a couple of flights post MECO. I personally find this last evidence not compelling since the fluid remaining in the tank will slosh or rebound at MECO and it is likely that flashing dry would normally occur.’
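Hale's description of the LH2 cut-off as protection against the tank running dry at full power can be illustrated with a voting sketch. It assumes a simple rule in which the engines are commanded off once at least two of the four LH2 ECO sensors read dry; that two-vote threshold is an assumption made here for illustration, as are all names in the code, and the model ignores arming windows, timing and any filtering for slosh. It shows why a sensor that fails to the ‘wet’ state is the dangerous case: it never votes dry, so enough such failures silently remove the protection.

```python
"""Illustrative LH2 low-level cut-off vote.

Assumptions (hypothetical, for illustration only): cut-off is commanded when at
least two of the four sensors read DRY, and a sensor that has failed 'wet'
reports WET regardless of the actual liquid level.
"""
from enum import Enum


class Reading(Enum):
    WET = "wet"
    DRY = "dry"


DRY_VOTES_FOR_CUTOFF = 2  # assumed voting threshold


def sensor_readings(tank_dry: bool, failed_wet: set[int], n_sensors: int = 4) -> list[Reading]:
    """Readings seen by the vote; sensors in failed_wet always report WET."""
    return [
        Reading.WET if (sensor in failed_wet or not tank_dry) else Reading.DRY
        for sensor in range(1, n_sensors + 1)
    ]


def command_cutoff(readings: list[Reading], votes_needed: int = DRY_VOTES_FOR_CUTOFF) -> bool:
    """True if enough sensors read DRY to command engine shutdown."""
    dry_votes = sum(1 for reading in readings if reading is Reading.DRY)
    return dry_votes >= votes_needed


if __name__ == "__main__":
    for failed in (set(), {1, 2}, {1, 2, 3}):
        protected = command_cutoff(sensor_readings(tank_dry=True, failed_wet=failed))
        print(f"failed wet: {sorted(failed) or 'none'} -> cut-off when dry: {protected}")
```

Under these assumptions, two sensors failed wet still leave cut-off protection in place, but a third removes it, and nothing in the vote itself reveals the loss.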

Backing up his call for the debate on removing the troublesome system from the LCC, Hale spoke of his experiences when he was working at the Mission Control Center (MCC).

‘For many years in the MCC, I was part of the team that practiced ‘engine performance cases’. The Booster office would identify something was wrong in one or more of the engines or the MPS system, and appropriate action (declaration of a TAL or ATO abort, etc.) would be taken. Most of the time we got it right, sometimes we got it wrong, that is the nature of the practice.

‘There are a large number of cases, the signatures are sometimes clear and other times subtle, and so the team has tried to codify the exact signatures of what these cases are and how to respond to them. I am sure that since I left the MCC operations that improvements have been made. That must be part of the review.

‘However, it is clear that there are some cases that would result in subtle engine off nominal operations that would eat through the reserves (Flight Performance Reserve and Fuel Bias plus any benefit from Ascent Performance Margin and launching at the in-plane or optimum time), and would not be detectable by the MCC. These cases would be catastrophic.

‘Part of the review we need to have before agreeing to launch without a functioning LH2 cut-off system is an examination of those cases, and their likelihood. If the likelihood of encountering one of these invariably catastrophic cases is of the same order as other accepted risks, say MMOD at the 1 in 300-ish level, then the program may accept that. But we need to review it.

‘We also must review the tools that could be used to detect and evaluate these cases. How robust is the Abort Regions Determinator (high, I believe)? Is all the instrumentation crit 1 or is some of it crit 3 (I believe it is almost all crit 1)? How likely is it to maintain telemetry and communications with the MCC at the critical times (based on redundancy and past history, high)? Again, my estimation is in parentheses, and we need to review that and make sure that we understand it.’
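Hale's acceptance test, whether the chance of an undetectable catastrophic case is ‘of the same order’ as accepted risks such as MMOD at roughly 1 in 300, can be stated numerically. The sketch below interprets ‘same order’ as ‘within a factor of ten’, an assumption made here rather than a program definition, and adds the standard formula for the chance of at least one occurrence over several independent flights. The candidate probabilities and the ten-flight horizon are hypothetical.

```python
"""Comparing a per-flight risk with the ~1-in-300 MMOD figure cited in the memo.

Assumptions (for illustration): 'same order' means within a factor of ten, and
flights are independent. The candidate probabilities below are hypothetical.
"""
import math

MMOD_RISK_PER_FLIGHT = 1 / 300  # 'MMOD at the 1 in 300-ish level'


def same_order(p: float, reference: float = MMOD_RISK_PER_FLIGHT) -> bool:
    """True if p is within a factor of ten of the reference probability."""
    return abs(math.log10(p / reference)) < 1.0


def cumulative_risk(p: float, flights: int) -> float:
    """Probability of at least one occurrence across independent flights."""
    return 1.0 - (1.0 - p) ** flights


if __name__ == "__main__":
    for p in (1 / 1_000, 1 / 10_000):
        print(f"1 in {round(1 / p):,}: same order as MMOD? {same_order(p)}")
    print(f"1-in-300 risk over 10 flights: {cumulative_risk(MMOD_RISK_PER_FLIGHT, 10):.3f}")
```

Over ten flights, a 1-in-300 per-flight risk compounds to roughly a 3.3% chance of at least one occurrence, which is why the per-flight figure matters even when it looks small.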

Hale also revealed that he has been questioning the merit of sticking with the LH2 ECO system, and its continued troubleshooting effort, for some time. His view is likely to have been reinforced by the further ECO system failure during Sunday’s second launch attempt.

‘In my opinion, the program has avoided having the discussion of launching without LH2 cut-off system because until now there has always been a credible improvement/troubleshooting path that held hope of restoring the reliability of this system. In fact, the last several flights had lulled me and others into thinking we had accomplished our goal.

‘Friday’s events proved that the system is unreliable. Having exhausted all the ways we can think of to make it reliable, it is now time to consider whether we can live without it.

‘It is likely that this system has been unreliable all along. If the sensor systems fail to the wet state late in the countdown or in flight, that was undetectable until the ECO voltage instrumentation was recently installed. It seems to me likely that we have been flying the entire history of the program with a false sense of security and that we have never had reliable protection from LH2 low level cut-off. That is a really sobering thought.

‘I am considering issuing actions in other areas where we have safety systems that have never been exercised to see if we can better test and make sure they are functional to prove that we have not been fooled by some smart failures. Let’s take our time and consider this ‘old’ proposal one more time.’