Nearly one month ago, Boeing completed the first orbital test flight of its Starliner spacecraft with a near-perfect landing at White Sands Space Harbor in New Mexico.

The mission had to be cut short due to a well-publicized timing error that prevented the spacecraft's service module from performing its orbital insertion burn on schedule. This caused the thrusters on board the service module, which provides power to Starliner during most of its mission, to fire longer than expected. As a result, the spacecraft did not have enough fuel to complete a rendezvous with the International Space Station, a key component of the test flight in advance of crewed missions.

Since providing some initial information during a post-flight news conference, NASA and Boeing have gone mostly quiet about the investigation into the timing error. Two weeks ago, the space agency said it had initiated two investigations. One will find the root cause of the "mission elapsed timer anomaly" over the course of about two months, and the second will determine whether another uncrewed test flight of Starliner is required before astronauts fly on the vehicle.

The NASA release did not mention thruster performance, but an agency source told Ars that engineers are looking closely at the performance of the Starliner propulsion system. In addition to four large launch abort engines, the service module has 28 reaction control system thrusters, each with 85 pounds of thrust, and 20 more powerful orbital maneuvering thrusters, each with 1,500 pounds of thrust.
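For a sense of scale, the per-thruster figures above can be multiplied out. This is back-of-the-envelope arithmetic only: the thrusters point in different directions, so a combined figure is not the thrust available along any one axis, and the four launch abort engines are left out because no thrust figure is given for them here.

```python
# Per-thruster thrust (lbf) for the service module, using the figures quoted above.
SERVICE_MODULE_THRUSTERS = {
    "reaction control": (28, 85),
    "orbital maneuvering": (20, 1_500),
}


def combined_thrust(inventory: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Multiply thruster count by per-thruster thrust for each thruster class."""
    return {name: count * lbf for name, (count, lbf) in inventory.items()}


for name, total in combined_thrust(SERVICE_MODULE_THRUSTERS).items():
    print(f"{name}: {total:,} lbf if every unit fired at once")
```

The reaction control set sums to 2,380 lbf and the orbital maneuvering set to 30,000 lbf.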

During the post-flight news conference, Jim Chilton, Boeing's senior vice president of the Space and Launch division, said the service module thrusters were stressed due to their unconventional use in raising Starliner's orbit instead of performing one big burn. As a result, the company had to shut down one manifold, which effectively branches into several lines carrying propellant to four thrusters. "We even shut down one manifold as we saw pressure go low 'cause it had been used a lot," he said.

The NASA source said eight or more thrusters on the service module failed at one point and that one thruster never fired at all.

In response to a question about thruster performance, Boeing provided the following statement to Ars: "After the anomaly, many of the elements of the propulsion system were overstressed, with some thrusters exceeding the planned number of burns for a service module mission. We took a few cautionary measures to make sure the propulsion system stayed healthy for the remainder of the mission, including re-pressurizing the manifold, recovering that manifold’s thrusters. Over the course of the mission we turned off 13 thrusters and turned all but one back on after verifying their health."

Abort demo

Although it did not fly up to the altitude of the space station and perform a rendezvous and docking during its test flight, Starliner did fly an "abort demonstration" that simulated approaching and backing away from the space station. The NASA source said Boeing may also have failed this test due to thruster issues. Boeing denied this.

"In testing the system the spacecraft executed all the commands, but we did observe a lower than expected delta V during the backing away phase," Boeing said in a statement. "Current evidence indicates the lower delta V was due to the earlier cautionary thruster measures, but we are carefully reviewing data to determine whether this demonstration should be repeated in the subsequent mission."

A major part of the joint NASA-Boeing investigation will entail assessing the performance of the spacecraft's overall propulsion system and determining the root cause of the observed issues—be it software errors, excess stress during the orbit-raising maneuver, or some more systemic problem. In June 2018, Boeing suffered a significant setback during a ground-based test of its propulsion system. Last November, however, its four main abort engines appeared to perform nominally during a 75-second test.

Both Boeing and NASA officials said it would be premature to discuss the matter further until the investigations are complete.

488 Reader Comments

I will give Boeing a semi-pass on this one. The thrusters were used in a manner they would never be used during a mission, so it's probably not surprising they had issues after that. However, it is concerning that one thruster never fired at all. That is certainly not acceptable and, unfortunately, may be yet another failure of Boeing's quality assurance culture. In light of this additional finding, there is absolutely no way they should be allowed to do a manned flight without a successful unmanned demo. The fact that Dragon had yet another great flight adds more fuel to that argument. If SpaceX can do it, why can't Boeing?

We don't know if that thruster failed to fire because of an issue with the thruster or an issue with the fuel manifold.

Also remember that time SpaceX lost an entire capsule during a test? That's way more serious than any failure Boeing has had.

Whether or not it was "way more serious" is beside the point: the anomaly did delay Crew Dragon development, and rightfully so. Both NASA and SpaceX hunkered down and worked the problem, and there have been dozens of tests of the new system by now. What concerns people is that Boeing may not be held to these standards and may just be allowed to handwave its issues away, right up until we have another loss of life that could have been prevented by rational management and quality control.

Boeing ceased to be an "engineer's company" when they merged with McDonnell Douglas, and the failing MD's corporate structure replaced Boeing's. One wonders how many of these recent failures trace back to that change.

As a Seattleite with Boeing friends, I’d put the moment at when they moved corporate HQ to Chicago. Their objective was to reduce the power of the unions (which it did), but the significant result was to divorce management from engineering.

As if all this wasn’t enough, the NYT has an article today about how Boeing worked to bury criticism after the crash of a Turkish Airlines 737 NG about a decade before the MAX crashes. Eerie similarities include a single-sensor failure leading to a powerful automatic intervention that the pilots couldn’t counteract. Here too, weird feature decisions compromised safety (there was a paid upgrade that used two sensors, but it was never retrofitted to older aircraft).

Joe Sedor, the National Transportation Safety Board official who led the American team working on the Turkish Airlines investigation, said it was not unusual for investigating bodies to make changes to a report after receiving feedback, or for American safety officials to jointly submit their comments with Boeing.

Mr. Sedor is now overseeing the N.T.S.B.’s work on the Max crashes. He acknowledged that reliance on a single sensor was a contributing factor in both cases but cautioned against focusing on it.

The same guy who helped Boeing bury the report ten years ago is in charge of the Max investigations. Fucking A.

Joe Sedor has led many investigations for the NTSB, specializing in foreign investigations, and he's done an excellent job. Watch some "Mayday" episodes. He's a good guy.

The Turkish Airlines accident has some similarities to the MCAS saga because all Boeing jets except for the 777 and 787 use only the sensors on the captain's side to drive the autopilot and autothrottle systems. In this case, the captain's radio altimeter was erroneously reading -8 feet, and the flight crew was fully aware of that. The ground proximity warning system (GPWS) kept going off throughout the descent, and the captain noted it was due to the radio altimeter.

What happened in this accident is that after lowering the landing gear and extending the flaps for final approach, the autothrottle went into landing mode, noticed that the radio altimeter indicated the plane was on the ground, and commanded idle thrust. The crew didn't notice their airspeed dropping, and they didn't notice that the autothrottles weren't pushing forward to overcome the drag from the landing gear and flaps like on a normal approach. They just sat there and did nothing while the aircraft stalled and fell to the ground.

Unlike MCAS, all they needed to do was push the autothrottle disconnect on the thrust levers and push them forward. They should have been watching their airspeed, and they should have executed a missed approach. There was nothing preventing them from doing so.

This was definitely a case of a highly imperfect automation system failing, but in this case, Boeing could argue with a straight face that it was reasonable to expect the flight crew to function as the backup for this system, whereas MCAS put flight crews in a thoroughly unreasonable situation.
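The single-sensor issue can be illustrated with a toy comparison between a mode that trusts one radio altimeter and one that cross-checks two. Everything here (the function names, the 20 ft tolerance, the 30 ft retard threshold, and the 1,950 ft reading on the second sensor) is a hypothetical assumption for illustration, not the 737's actual autothrottle logic. Note that a cross-check only detects disagreement; a real system would still need a rule for what to do next, such as alerting the crew.

```python
from typing import Optional

# Hypothetical thresholds for illustration; not Boeing's actual autothrottle logic.
MAX_SPLIT_FT = 20.0      # largest tolerated disagreement between the two radio altimeters
RETARD_BELOW_FT = 30.0   # altitude below which an idle-thrust "retard" mode may engage


def single_sensor_retard(captain_ra_ft: float) -> bool:
    """Single-source design: trust the captain-side radio altimeter alone."""
    return captain_ra_ft < RETARD_BELOW_FT


def cross_checked_altitude(left_ft: float, right_ft: float) -> Optional[float]:
    """Return the averaged radio altitude, or None if the two sensors disagree."""
    if abs(left_ft - right_ft) > MAX_SPLIT_FT:
        return None
    return (left_ft + right_ft) / 2.0


def cross_checked_retard(left_ft: float, right_ft: float) -> bool:
    """Allow idle thrust only when both sensors agree the aircraft is near the ground."""
    altitude = cross_checked_altitude(left_ft, right_ft)
    return altitude is not None and altitude < RETARD_BELOW_FT


# A -8 ft reading on the captain's side while the other sensor (hypothetically) reads 1,950 ft:
print(single_sensor_retard(-8.0))           # True: idle thrust commanded
print(cross_checked_retard(-8.0, 1950.0))   # False: disagreement, retard inhibited
```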

"Earlier cautionary thruster measures," meaning that they had to shut things down because the thrusters were failing. Boeing's use of corporate doublespeak doesn't make them sound particularly competent.

As for giving them a pass: bad things happen in space, where equipment often needs to be used out of spec. NASA has a history of needing equipment to do the extraordinary. It would be good for Starliner not to have thrusters that are just barely adequate, and failures on a stress test like the one that occurred aren't promising.

This is more serious than your interpretation suggests. It's one thing to have lower thrust from the RCS system because some of the thrusters are down. It's quite another to have lower delta-v; this implies that, even accounting for some thrusters missing, the remaining combination of thrusters was unable to expend enough propellant to do the job of backing away.

I don't know the details of this approach and abort test, but if they couldn't generate as much delta-v on abort (i.e., backing away), then the Starliner would be left with a forward velocity. That would be pretty bad if the ISS were in front of it.
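The reasoning about delta-v versus thrust follows from the Tsiolkovsky rocket equation: delta-v depends only on specific impulse and how much propellant is expended relative to vehicle mass. Thruster count does not appear; it only sets how quickly the propellant is used. The sketch below, assuming a hypothetical 13,000 kg vehicle and an assumed 290 s specific impulse (neither is a Starliner figure), shows the equation and its inverse.

```python
import math

G0 = 9.80665  # standard gravity, m/s^2


def delta_v(isp_s: float, wet_mass_kg: float, propellant_kg: float) -> float:
    """Tsiolkovsky rocket equation: delta-v (m/s) from expending propellant_kg."""
    dry_mass_kg = wet_mass_kg - propellant_kg
    return isp_s * G0 * math.log(wet_mass_kg / dry_mass_kg)


def propellant_needed(isp_s: float, wet_mass_kg: float, target_dv: float) -> float:
    """Inverse of delta_v: propellant (kg) that must be expended to gain target_dv."""
    return wet_mass_kg * (1.0 - math.exp(-target_dv / (isp_s * G0)))


# Hypothetical vehicle: 13,000 kg, thrusters with an assumed 290 s specific impulse.
needed = propellant_needed(290.0, 13_000.0, 1.0)
print(f"{needed:.2f} kg of propellant for 1 m/s, whatever the thruster count")
```

With specific impulse unchanged, a delta-v shortfall therefore means less propellant left the nozzles during the maneuver, whatever the reason.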

I agree that it’s not surprising that the thruster system had issues after being asked to perform beyond the spec.

Surely with a limited amount of propellant on board, they could spec the engines to how much they might actually be fired? It's not like they were run for half an hour longer than thought possible.

I think the simpler thing is to ensure the thrusters aren't used pointlessly for more than 7 minutes continuously. There is no scenario where that should have happened, so make sure it doesn't happen. The thrusters were designed for 30-second precision attitude burns and instead ran for 7 minutes continuously. That is way outside their duty cycle.
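A limit of the kind described here could be expressed as a simple per-thruster guard that caps a continuous burn and then forces a cool-down. The 30-second cap reuses the comment's own figure and the 60-second cool-down is an invented placeholder; neither is a published Starliner parameter, and the class is a hypothetical sketch, not anything from Boeing's flight software.

```python
from dataclasses import dataclass


@dataclass
class DutyCycleGuard:
    """Hypothetical per-thruster guard: cap a continuous burn, then force a cool-down."""

    max_on_s: float = 30.0     # assumed continuous-burn limit (the comment's figure)
    cooldown_s: float = 60.0   # assumed mandatory off-time; purely illustrative
    on_time_s: float = 0.0     # length of the current continuous burn
    cooling_left_s: float = 0.0

    def step(self, want_fire: bool, dt: float) -> bool:
        """Advance dt seconds; return True if the thruster is allowed to fire."""
        if self.cooling_left_s > 0.0:
            self.cooling_left_s = max(0.0, self.cooling_left_s - dt)
            return False
        if not want_fire:
            self.on_time_s = 0.0
            return False
        if self.on_time_s + dt > self.max_on_s:
            # Limit reached: refuse this step and start the cool-down period.
            self.on_time_s = 0.0
            self.cooling_left_s = self.cooldown_s
            return False
        self.on_time_s += dt
        return True


# Request a 7-minute continuous burn in 1-second steps.
guard = DutyCycleGuard()
fired = sum(guard.step(True, 1.0) for _ in range(7 * 60))
print(f"allowed to fire for {fired} of {7 * 60} seconds")
```

A guard like this only refuses thrust; the surrounding software would still have to decide what to do with the refused request.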

These are cold-gas thrusters, yes? So the manifold shutdown might have been to maintain pressure. But even if the thrusters had been used too long I still don’t see how it leads to a delta-V miss unless they were completely out of reaction mass or there was a control system failure.

I love how they try to highlight how well they recovered from the "anomaly", and conveniently ignore the fact that their own programming is what caused the system to be overstressed and fire the thrusters too many times and run the manifold pressure down too low!

It's not like that mode was an "emergency save the crew" mode; it was supposed to be some kind of station-keeping/orientation-management mode, right? Seems like pretty poor software design and controls, letting a non-emergency mode drive your propulsion system into an overstressed state...

It almost sounds similar to a certain jumbo-jet's MCAS system deciding to take the word of a single sensor as proof that the whole plane should be firmly planted in the dirt instead of possibly stalling out. Who knows how many hidden systems are going to be working against the first manned crew on one of these things? I'm just imagining the capsule coasting to within 10 feet of the ISS when it decides to jettison the passenger hatch because a sensor built into the flotation buoy thinks it detected a safe water landing. Boeing is like a clown-car of bad decisions; every time you think it's finally empty, another bad decision climbs out and slams a pie in the face of safety over profits.

I looked up the Starliner thruster configuration and it's really a peculiar setup:

12 RCS thrusters in the crew module. These are used exclusively for orienting the crew module before and during reentry, after the service module has been jettisoned.

28 RCS thrusters in the service module. These are used for attitude control during the majority of the mission (as long as the service module still is around).

20 OMS thrusters in the service module. These are the 30000 lbf thrusters, distributed around the service module very much like an RCS. The "downward" pointing thrusters seem to be used for orbital manoeuvring, while the entire array of 20 thrusters is used exclusively for attitude control during low-altitude aborts, since the stack is aerodynamically unstable and needs to be controlled with an iron fist against aerodynamic forces in this situation, and the "normal" RCS isn't powerful enough.

So each of these thrusters is there for a reason and has a job; it's just that the overall design is not a very smart one.

This whole hubbub reminds me of the drumbeat a year or so ago from the "NASA is too risk averse! In the good Old Days they strapped 'em in and lit the candle. NASA has become a bunch of snowflakes!" crowd. Not much of a peep from them lately.......

But a little more seriously, and perhaps an unpopular opinion: I wish they'd just throw that flag out the ISS airlock. I have zero interest in who wins the "race." I want there to be two (or more) safe (as safe as spaceflight can be, which should be pretty safe by now into low orbit) domestic carriers into LEO. Boeing did not demonstrate this reasonable level of safety and capability in this demo flight; we need a do-over until they do. I would say the same for SpaceX or Dreamliner or a Wells Cavorite sphere.

If SpaceX gets delayed while they train up for extending the capabilities of the Crew Dragon and Boeing gets there first, fine, as long as Boeing has passed rigorous quality checks to get there.

That last is the important point public pressure needs to focus on.

Not fine, because you'd be rewarding the company with the most problems and the worse culture of the two with a huge PR win.

Positive PR is money. You'd be fine with giving free money to Boeing, even though they are the worse of the two companies here.

You might value having two systems more than who gets rewarded and who gets punished. I value rewarding the better company more than that. Because next time, Boeing might use their lobbying again to keep a better company out of a government contract (Sierra Nevada Dream Chaser). And they might succeed if they have the positive PR of having been first. If everybody saw that the small upstart is the better company, then maybe next time we get two good companies, instead of a good company and Boeing.

I would agree, which is why I was cautious about drawing parallels. But the NYT has another expert calling it a “sentinel” event, and blaming the pilots led Boeing to not learn anything from it. Like: it’s a bad idea to make people buy upgrades for better safety and you should really look hard at what an autonomous system might do based on a single erroneous sensor reading and maybe you should train pilots better if you expect them to react within seconds.

I also agree that there has been nothing yet in any of these investigations that dents the NTSB’s sterling reputation (unlike the FAA). But I would hazard a guess that Mr. Sedor has spent some sleepless nights wondering whether the NTSB should have pushed back harder.

Actually, that's a really outstanding point. If there's "X" amount of propellant on board, surely the thrusters were designed with enough "life per mission" to burn all of that propellant, yes?
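The "X amount of propellant" argument can be made concrete: each thruster's mass flow is its thrust divided by specific impulse times standard gravity, so a propellant budget converts directly into total firing seconds. The 500 kg budget and 290 s specific impulse below are placeholders, not Starliner figures; only the 85 lbf thrust comes from the article. The result is total on-time only and says nothing about how quickly that on-time is accumulated.

```python
G0 = 9.80665        # standard gravity, m/s^2
LBF_TO_N = 4.44822  # newtons per pound-force


def firing_seconds(propellant_kg: float, thrust_lbf: float, isp_s: float, thrusters: int = 1) -> float:
    """Total firing time needed for `thrusters` identical units to use up propellant_kg."""
    mass_flow_kg_s = thrust_lbf * LBF_TO_N / (isp_s * G0)  # per thruster
    return propellant_kg / (thrusters * mass_flow_kg_s)


# Hypothetical budget: 500 kg of propellant through 85 lbf thrusters with an assumed 290 s Isp.
solo = firing_seconds(500.0, 85.0, 290.0)
shared = firing_seconds(500.0, 85.0, 290.0, thrusters=28)
print(f"one thruster: {solo / 60:.0f} min; spread over 28: {shared:.0f} s each")
```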

So the "violation" here was in duty-cycle, not total thrust duration. The thrusters were operated longer, over a defined period of time, than they were designed to do, including the FOS I mentioned earlier. Likely overheated.

How exactly is it that the flight-management software allowed this without throwing an error and falling back to either a secondary mode that would shuttle between thruster groups (they certainly have ENOUGH of the things on this vehicle), or abort the burn entirely? Is it actually possible that the engine management software is open-loop and didn't have sensors to detect the thrusters (presumably) overheating?

I understand that the prime operating design case for the vehicle is to have astronauts on-board. But if the vehicle is required to fly unmanned (which, by terms of the contract for the OFT, it is), how is the flight-management software not able to handle what seems like perfectly reasonable contingencies and system-preservation tasks? What the heck?
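One way to picture the fallback asked about here is a scheduler that hands each second of thrust to the coolest healthy thruster group and aborts the burn when no group is under its thermal limit. This is a hypothetical policy, not a description of Starliner's flight software; the group structure, the normalized heat units, and the heating and cooling rates are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThrusterGroup:
    name: str
    heat: float = 0.0     # hypothetical normalized thermal load; 1.0 = limit
    healthy: bool = True


def pick_group(groups: List[ThrusterGroup], limit: float = 1.0) -> Optional[ThrusterGroup]:
    """Return the coolest healthy group below its limit, or None if none is safe."""
    candidates = [g for g in groups if g.healthy and g.heat < limit]
    return min(candidates, key=lambda g: g.heat, default=None)


def run_burn(groups: List[ThrusterGroup], seconds: int,
             heat_per_s: float = 0.05, cool_per_s: float = 0.02) -> tuple[int, bool]:
    """Fire one group per second, rotating the load; return (seconds fired, aborted)."""
    for t in range(seconds):
        active = pick_group(groups)
        if active is None:
            return t, True  # no group is safe to fire: abort the burn
        for g in groups:
            if g is active:
                g.heat += heat_per_s
            else:
                g.heat = max(0.0, g.heat - cool_per_s)
    return seconds, False


four = [ThrusterGroup(f"group {i}") for i in range(4)]
one = [ThrusterGroup("group 0")]
print(run_burn(four, 420))  # load rotates across four groups and the burn completes
print(run_burn(one, 420))   # a single group overheats and the burn is aborted
```

With four groups, each one cools for three seconds for every second it fires, so its load never builds up; a lone group has no relief and hits the limit within about 20 seconds.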

EDIT: Ninja'd a bit by Statistical and Danrarbc

Only if each set of thrusters has its own propellant tanks. If they all draw from one set of tanks (or two: one in the capsule and one in the service module), as the ability to fire the RCS thrusters for an extra-long burn to make up for the larger thrusters not firing when expected implies, then being able to burn them far longer than they were designed for in an emergency is plausible.

The OMAC (orbital maneuvering and attitude control) thrusters are 1500 lbf, not 30000.

Triggered by them setting a date for the 737 line shutdown. Still a bit surprising, I'd've expected most of the impact to've been priced in a month ago when they first announced that it was going to happen in the near future.

Can you run LINPACK on all the cores in your PC processor at full frequency indefinitely?

A lot of things are designed for statistical limits, not absolute ones. Central office phone switches are an example: if something like over 50% of subscribers pick up the receiver at once, quality of service will be compromised, if the switch doesn't fall over altogether.
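The phone-switch example is classic trunk dimensioning, which is usually done with the Erlang B formula: given an offered load in erlangs and a number of circuits, it estimates the fraction of call attempts that find every circuit busy, assuming random (Poisson) call arrivals and blocked calls being dropped. The formula is standard; the 100-trunk exchange and the loads are made-up numbers.

```python
def erlang_b(offered_erlangs: float, circuits: int) -> float:
    """Erlang B blocking probability for `circuits` trunks offered `offered_erlangs` of traffic.

    Uses the standard recurrence B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)),
    which avoids the large factorials in the closed-form expression.
    """
    blocking = 1.0
    for n in range(1, circuits + 1):
        blocking = offered_erlangs * blocking / (n + offered_erlangs * blocking)
    return blocking


# Hypothetical exchange with 100 trunks: blocking stays low under typical load
# and climbs steeply once offered traffic outruns the trunks.
for load in (60.0, 80.0, 100.0, 120.0):
    print(f"{load:5.0f} erlangs -> {erlang_b(load, 100):.3%} of calls blocked")
```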

Pretty sure you'd design the thrusters for the maximum duration possible given the amount of propellant on-board, not the "optimum" duration.

Or at least you should. Having thrusters that can't last the longest possible burn in an emergency use of the system tends to lead to said emergency turning out worse than it started out being.

If I'm right, Boeing "just" committed a software design error - albeit to the most-critical software on the vehicle.

If you're right, Boeing committed that software design error and inadequately designed thrusters for an easily foreseeable emergency scenario.

Thank you. That brings up issues of excess temperatures from excessive use. But it’s still really nagging that they couldn’t achieve the requisite delta-V. I would assume this type of test is time-limited (e.g., achieve x m/s away from the rendezvous point within y seconds). Failing it kind of implies that overuse fried the RCS system to the point where it was no longer capable of its design function. If true, that would be a hell of a lot worse than the clock issue, because you’d have to alter the RCS design (at least somewhat), and that would require a significant amount of retesting as well as simulation.
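If the test was scored against a time window, as supposed here, thruster count does matter: over a fixed window, delta-v scales with the total thrust actually applied. That would be consistent with Boeing's statement attributing the shortfall to the earlier thruster shutdowns, though it does not show that this is what happened. The sketch uses a small-burn approximation; the 20 s window, the thruster counts, and the 13,000 kg mass are hypothetical, and only the 85 lbf per-thruster figure comes from the article.

```python
LBF_TO_N = 4.44822  # newtons per pound-force


def window_delta_v(active_thrusters: int, thrust_lbf: float, window_s: float, mass_kg: float) -> float:
    """Approximate delta-v (m/s) from firing for a fixed window: N * F * t / m.

    Small-burn approximation: propellant used is treated as negligible next to
    mass_kg, and every active thruster is assumed to push along the desired axis.
    """
    return active_thrusters * thrust_lbf * LBF_TO_N * window_s / mass_kg


# Hypothetical numbers: a 20 s back-away window, 85 lbf thrusters, 13,000 kg vehicle.
full = window_delta_v(8, 85.0, 20.0, 13_000.0)
reduced = window_delta_v(6, 85.0, 20.0, 13_000.0)
print(f"8 thrusters: {full:.2f} m/s, 6 thrusters: {reduced:.2f} m/s")
```

Losing a quarter of the active thrusters costs a quarter of the delta-v within the window, even if propellant remains.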

I will give Boeing a semi-pass on this one. The thrusters were used in a manner they would never be used during a mission, so it's probably not surprising they had issues after that.

I agree that it’s not surprising that the thruster system had issues after being asked to perform beyond the spec.

Surely with a limited amount of propellant on board, they could spec the engines to how much they might actually be fired? It's not like they were run for half an hour longer than thought possible.

Actually, that's a really outstanding point. If there's "X" amount of propellant on board, surely the thrusters were designed with enough "life per mission" to burn all of that propellant, yes?

So the "violation" here was in duty-cycle, not total thrust duration. The thrusters were operated longer, over a defined period of time, than they were designed to do, including the FOS I mentioned earlier. Likely overheated.

How exactly is it that the flight-management software allowed this without throwing an error and falling back to either a secondary mode that would shuttle between thruster groups (they certainly have ENOUGH of the things on this vehicle), or abort the burn entirely? Is it actually possible that the engine management software is open-loop and didn't have sensors to detect the thrusters (presumably) overheating?

I understand that the prime operating design case for the vehicle is to have astronauts on-board. But if the vehicle is required to fly unmanned (which, by terms of the contract for the OFT, it is), how is the flight-management software not able to handle what seems like perfectly reasonable contingencies and system-preservation tasks? What the heck?

EDIT: Ninja'd a bit by Statistical and Danrarbc

Only if each set of thrusters has its own prop tanks. If they're all connected to one (or two capsule and service module) set of prop tanks - as the ability to fire the RCS thrusters for an extra long burn to make up for the bigger ones intended to do so not firing when expected to implies; then being able to burn them for far longer than they're designed to in an emergency is plausible.

Pretty sure you'd design the thrusters for the maximum duration possible given the amount of propellant on-board, not the "optimum" duration.

Or at least you should. Having thrusters that can't last the longest possible burn in an emergency use of the system tends to make that emergency worse than it started out.

If I'm right, Boeing "just" committed a software design error, albeit in the most critical software on the vehicle.

If you're right, Boeing committed that software design error and inadequately designed thrusters for an easily foreseeable emergency scenario.

Things like this have a maximum duty cycle/sustained period of continuous operation for thermal reasons (as do paper shredders and garage door openers). Designing them to support 100% duty cycle indefinitely means they have to be bigger and heavier.
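A toy thermal model can show why a duty-cycle limit exists at all. Every constant here (heating per second, cooling rate, the 150-degree limit) is made up. The only point is that the same heating rate stays bounded when the thruster is pulsed but climbs past the limit when it fires continuously.

```python
# Toy first-order thermal model; every constant is invented. Each second
# the thruster fires it gains `heat_per_s` degrees, and every second it
# sheds `cool_rate` of its excess temperature above `ambient`.

def peak_temperature(history, ambient=20.0, heat_per_s=10.0, cool_rate=0.05):
    temp = ambient
    peak = temp
    for firing in history:
        temp += heat_per_s * firing - cool_rate * (temp - ambient)
        peak = max(peak, temp)
    return peak


LIMIT = 150.0                  # hypothetical material temperature limit
pulsed = [1, 0, 0] * 100       # firing one second in three for 300 s
continuous = [1] * 300         # firing the whole 300 s

print(peak_temperature(pulsed) < LIMIT)      # True
print(peak_temperature(continuous) > LIMIT)  # True
```

Making the continuous case survivable means raising the limit or the cooling rate, which in hardware translates to the bigger, heavier design described above.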

I will give Boeing a semi-pass on this one. The thrusters were used in a manner they would never be used during a mission, so it's probably not surprising they had issues after that.

No, that isn't a realistic assumption at all. Imagine designing a thruster so it could empty the entire shared propellant tank on its own, burning at 100% thrust continuously (100% duty cycle) until the tank is exhausted. The same would have to be true for every thruster, so you would be adding pointless mass and complexity across the entire spacecraft to support something it doesn't need to do.

There is nothing wrong with the thrusters having reasonable duty-cycle limits. A 30-second duty cycle is a very long time for RCS thrusters, which are usually limited to a second or two.

The real issue is that the timer failure caused the thrusters to start precision attitude control, and then nothing in the system stopped them, even after they had exceeded their duty cycle by a factor of 10. They fired for so long that they overheated and possibly damaged the internal sensors in the thrusters. The absence of any sanity check to stop them shows just how fragile the system is.

I looked up the Starliner thruster configuration and it's really a peculiar setup:

12 RCS thrusters in the crew module. These are used exclusively for orienting the crew module before and during reentry, after the service module has been jettisoned.

28 RCS thrusters in the service module. These are used for attitude control during the majority of the mission (as long as the service module still is around).

20 OMS thrusters in the service module. These are the 1,500 lbf thrusters (about 30,000 lbf combined), and they are distributed around the service module very much like an RCS. The "downward" pointing thrusters seem to be used for orbital manoeuvring, while the entire array of 20 thrusters is used for attitude control only during low-altitude aborts. In that situation the stack is aerodynamically unstable and must be controlled with an iron fist against aerodynamic forces, and the "normal" RCS isn't powerful enough.

So each of these thrusters is there for a reason and has a job; it's just that the overall design is not a very smart one.
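For scale, the article's per-thruster figures (85 lbf for each service module RCS thruster, 1,500 lbf for each OMS thruster) can be multiplied out. The 30,000 lbf figure matches the combined thrust of all 20 OMS thrusters rather than any single one. These are simple sums of magnitudes. Because the thrusters point in different directions, they are not the net thrust along any axis. The crew module's 12 thrusters are left out because no thrust figure is given for them.

```python
# Per-thruster thrust from the article; the crew module's 12 RCS thrusters
# are omitted because no thrust figure is given for them.
service_module = {
    "RCS": {"count": 28, "thrust_lbf": 85},
    "OMS": {"count": 20, "thrust_lbf": 1_500},
}

# Sum of magnitudes only: the thrusters point in different directions.
totals = {name: t["count"] * t["thrust_lbf"] for name, t in service_module.items()}

for name, t in service_module.items():
    print(f"{name}: {t['count']} x {t['thrust_lbf']:,} lbf = {totals[name]:,} lbf combined")
# RCS: 28 x 85 lbf = 2,380 lbf combined
# OMS: 20 x 1,500 lbf = 30,000 lbf combined
```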

So you're saying they developed an aerodynamically unstable craft that relies on software and active aggressive attitude control to correct the issue?

Seems like Boeing would be better off doing a massive Engineer Purge of their lead staff.

The biggest fault behind Boeing's severe culture deficiencies lies with the suits in Chicago, not the engineering staff. I don't doubt there are engineers making bad decisions too, but far too many of Boeing's bad decisions are being made in Chicago.

No, that isn't a realistic assumption at all. Designing every thruster so that it could empty the entire shared propellant tank on its own, burning at 100% thrust continuously (100% duty cycle) until the tank is exhausted, would be dumb.

That thruster would need to be massive with a complex cooling system to ensure it could work under those conditions, conditions which it has no need to meet.

There is nothing wrong with the thrusters having reasonable duty cycles, and a 30-second duty cycle is a very long time for RCS thrusters. The real issue is that the timer failure caused the thrusters to start precision attitude control, and then nothing in the system stopped them, even after they had exceeded their duty cycle by a factor of 10.

I mean, I don't see anything inherently wrong with having the thrusters fire longer than rated in a situation where they are required for that long. A hard rule that shuts thrusters down after 30 seconds could cause problems in an off-nominal case where precise attitude control is needed for longer.

The problem as I see it was just the computer thinking it was in a situation where it needed thrusters when it did not.

Also throw in the possibility that a thruster did not fire. That scares the thrust out of me, because it creates a whole separate failure loop from the timing issue: a single thruster simply failed to act.

It feels like there's some sliding back to the same "normalisation of deviance" mentality that lost two shuttles.

Actually, that's a really outstanding point. If there's "X" amount of propellant on board, surely the thrusters were designed with enough "life per mission" to burn all of that propellant, yes?

Can you run LINPACK on all the cores of your PC's processor at full frequency indefinitely?

A lot of things are designed for statistical limits, not absolute ones. Central office phone switches are an example: if something like 50% of subscribers pick up the receiver at once, quality of service will be compromised, if the switch doesn't fall over altogether.

Agreed, but suppose you can foresee an emergency well enough to enable use of a given thruster configuration for longer than the total pack would normally be used (if the thrusters can be activated and controlled independently, for example). In that case, not designing them to withstand that foreseeable usage is inexcusable.

I'm not talking about some hack where Col. Cotton saves the B-70 with a paperclip. I mean that if you can use "thruster manifold A" for adequate RCS control, then the thrusters in manifold A had better be able to withstand burning the entire RCS fuel supply to which they have access, even if there are four manifolds and each would "normally" burn for only a quarter of the total duration.
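The manifold argument is just a ratio, sketched here with made-up numbers. The propellant mass, the mass flow rate, and the four-manifold split are all assumptions for illustration. If four manifolds normally share one supply equally, a single manifold that ends up burning the whole supply fires four times as long as its nominal share.

```python
# All numbers are hypothetical; none are Starliner figures.

def burn_time_s(propellant_kg: float, mass_flow_kg_s: float) -> float:
    """Seconds of firing before the given propellant is used up."""
    return propellant_kg / mass_flow_kg_s


RCS_PROPELLANT_KG = 400.0    # assumed shared RCS supply
FLOW_PER_MANIFOLD = 0.25     # assumed kg/s with one manifold's thrusters firing
MANIFOLDS = 4                # assumed number of manifolds sharing the supply

nominal_share = burn_time_s(RCS_PROPELLANT_KG / MANIFOLDS, FLOW_PER_MANIFOLD)
worst_case = burn_time_s(RCS_PROPELLANT_KG, FLOW_PER_MANIFOLD)

print(nominal_share, worst_case, worst_case / nominal_share)  # 400.0 1600.0 4.0
```

The factor of four does not depend on the invented numbers, only on the equal split. That gap is what the commenter argues each manifold's thrusters should be rated to cover.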

IMHO, what happened in the OFT was far more likely to have been a duty-cycle problem, where burning the thrusters for 7 minutes straight violated some (totally hypothetical here) 50% duty-cycle limit or something, and they overheated. It could be something as simple as deeply heat-soaking the nozzle throat material, leading to potential cracks, softening, etc.

In the event ALL the thrusters (save the one that didn't work) were operating during the OFT, and if the issue was that they expended their entire design life duration, plus the FOS, in that seven-minute burn, and there was still propellant left for the remainder of the mission, then there's something SERIOUSLY wrong with the design methodology for those thrusters. They shouldn't be able to do that.

If, instead, they violated a duty-cycle limit, then there's a major software issue in that the system allowed itself to do that without compensating. That's assuming they didn't make the flight-management software open-loop, which already seems to be the case for the main timer and the system operating the RCS: as it stands, neither knew what the other was doing. The RCS didn't know the main timer was 11h off and NOT firing the main orbital boost, and merrily went on its way stabilizing the vehicle for a burn that wasn't happening. That doesn't sound closed-loop to me.
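Here is a sketch of the kind of closed-loop cross-check described above, under loudly stated assumptions. The function, the mode names, and the 0.5 m/s^2 threshold are hypothetical, and the sketch assumes the vehicle can sense the acceleration produced by the main burn. The idea is that the attitude controller refuses to trust the timer alone.

```python
# Hypothetical cross-check: names and threshold are assumptions. Before
# holding tight burn attitude, confirm that the burn the timer reports is
# actually visible in sensed acceleration.

BURN_ACCEL_MIN_M_S2 = 0.5   # assumed minimum acceleration during the main burn


def attitude_mode(timer_says_burn: bool, sensed_accel_m_s2: float) -> str:
    burn_confirmed = sensed_accel_m_s2 >= BURN_ACCEL_MIN_M_S2
    if timer_says_burn and burn_confirmed:
        return "BURN_ATTITUDE_HOLD"   # tight control is justified
    if timer_says_burn:
        return "DISAGREE_ALERT"       # timer and sensors conflict: don't hold blindly
    return "COAST_DEADBAND"           # timer says no burn: loose, propellant-saving control


print(attitude_mode(True, 0.0))   # DISAGREE_ALERT
print(attitude_mode(True, 2.0))   # BURN_ATTITUDE_HOLD
print(attitude_mode(False, 0.0))  # COAST_DEADBAND
```

The first case is the scenario the commenter describes: the timer expects a burn, nothing is pushing the vehicle, and instead of stabilizing for a phantom burn the software raises the disagreement.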

like the 707 (a jet airliner? how ridiculous), the 747 (two levels? Who would want that),

Hmmm.

The 707's first commercial flight was six years after the Comet, so it was pretty clear the market existed. Especially considering the #1 request for the Comet was "can fly London-New York" and the second was "doesn't explode".

And the 747 only has a second level because it was originally designed as a cargo plane and the second level was so the pilots wouldn't get squished by the cargo containers on a short landing. It's the same reason the C-5 has a second level, as both aircraft come from the same original design contract.

So you're saying they developed an aerodynamically unstable craft that relies on software and active aggressive attitude control to correct the issue?

The management layer probably felt there was nothing wrong with doing so because every jet fighter for the last ~40 years has been aerodynamically unstable and required continuous computer stabilization to maintain safe flight.
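The simplest possible unstable system illustrates the relaxed-stability idea: a quantity that grows on its own (dx/dt = a·x with a > 0) unless a controller pushes back with u = -k·x. This is a toy model, not fighter or Starliner dynamics, and the gains are arbitrary. It only shows that continuous feedback with k > a turns a divergent system into a decaying one.

```python
# Toy model, not aircraft or spacecraft dynamics: a disturbance x grows on
# its own (dx/dt = a*x, a > 0) unless a controller applies u = -k*x.
# With k > a the closed loop dx/dt = (a - k)*x decays instead.

def simulate(a: float, k: float, x0: float = 1.0, dt: float = 0.01, steps: int = 1000) -> float:
    x = x0
    for _ in range(steps):
        u = -k * x                 # proportional stabilizing command
        x += (a * x + u) * dt      # forward-Euler integration step
    return x


print(abs(simulate(a=2.0, k=0.0)) > 1.0)  # True: uncontrolled, it diverges
print(abs(simulate(a=2.0, k=5.0)) < 1.0)  # True: feedback keeps it in check
```

The catch, and the reason such designs lean so heavily on software, is that the feedback has to run continuously: switch it off and the first case takes over.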

Statistical is right. Outside of emergency operations, no flight control mode, regardless of why it was triggered or how long it was held in effect, should damage the system to the point where it can no longer function up to design rating. It could have rotated redundant thrusters to reduce duty cycle (at the cost of some performance) or it could have gone into a safe mode. But it shouldn’t have damaged things to the point where, even with ground-commanded fixes, the RCS can’t perform to spec.

Things like this have a maximum duty cycle/sustained period of continuous operation for thermal reasons (as do paper shredders and garage door openers). Designing them to support 100% duty cycle indefinitely means they have to be bigger and heavier.

No, you and I are in violent agreement as well. I think I misunderstood your earlier post.

See my second post. I absolutely agree with the realism of duty-cycle limits, and that this was the likely issue in the OFT. Someone else earlier had suggested, I believe, that the thrusters had somehow exceeded their entire design life, and I think I mistakenly linked that to your post.

And now NASA is suggesting that Dragon 2 may finally be approved for an extended first manned stay at ISS, something the Boeing program somehow got approved for before the vehicle had ever seen air under it.

So, kudos for SpaceX, right?

Except that Bridenstine, when announcing the possibility, noted that it would require additional Dragon crew training for the extended stay, delaying the first manned launch. Again.

Anyone want to bet that the "additional training" will be just long enough in duration that Boeing can get caught back up, do an additional OFT (or be excused from it), and maybe be first to fly people?

I imagine these latest Boeing thruster issues (seriously, they didn't have enough design margin for this off-nominal flight?) will require even more Dragon crew training before they can fly people. Grrrrrrr...

I was thinking the same thing. It's been clear since day 1 that Boeing would fly the first crewed mission to the ISS. I honestly wouldn't be surprised if Boeing is allowed to do the crew demo in April or May and SpaceX ends up doing the short-duration crew mission in June, after training the astronauts for the extended-duration mission, because it is "too risky." Then in the fall Boeing still gets first dibs on the actual ISS missions.

Statistical is right. Outside of emergency operations, no flight control mode, regardless of why it was triggered or how long it was held in effect, should damage the system to the point where it can no longer function up to design rating. It could have rotated redundant thrusters to reduce duty cycle (at the cost of some performance) or it could have gone into a safe mode. But it shouldn’t have damaged things to the point where, even with ground-commanded fixes, the RCS can’t perform to spec.

... unless ceasing to fire thrusters in order to preserve them for later would create an emergency, in which case ignoring the duty cycle and continuing to fire them is the correct thing to do.

The computer thought the orbital insertion burn was still ongoing. Precise attitude control is needed during such a burn or things can go south. Maintaining attitude during a burn was the correct thing to do, even if it meant exceeding the duty cycle.

The problem is that there was no burn, and thus no need for attitude control.
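The trade-off in this exchange can be written as a one-line policy: respect the duty-cycle limit by default, and exceed it only when a burn that genuinely needs attitude control is confirmed. The 30-second rating, the function name, and its inputs are hypothetical.

```python
# Hypothetical policy sketch: the limit, names, and inputs are assumptions.
RATED_CONTINUOUS_S = 30.0   # assumed rated continuous-firing limit


def may_keep_firing(firing_s: float, burn_confirmed: bool) -> bool:
    """Allow firing past the rated limit only when a confirmed burn needs
    attitude control; otherwise respect the duty-cycle limit."""
    if firing_s < RATED_CONTINUOUS_S:
        return True
    return burn_confirmed   # exceed the rating only for a real, ongoing burn


print(may_keep_firing(300.0, burn_confirmed=True))   # True: real burn, accept the wear
print(may_keep_firing(300.0, burn_confirmed=False))  # False: no burn, protect the hardware
```

Under this policy, the OFT scenario falls into the second case, where the timer believed a burn was under way but none was. That holds only if the software can actually tell that no burn is happening.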