An anomaly on Phoenix Sol 22, but the spacecraft is in no danger

Today I received a press release from the University of Arizona about Phoenix' sol 22 activities. It talked about the new trench called "Snow White" in the Wonderland area (which I wrote about yesterday), and mentioned that TEGA was continuing work. A normal TEGA cycle includes three separate heating cycles on one sample, and since it had already done the first two on sols 18 and 20 to 35°C and 175°C (95 and 350°F), respectively, I assume that that line means that TEGA performed the final bake on sol 22 to its highest temperature of 1000°C (about 1800°F).

However, buried within that press release was the following sentence: "Newly planned science activities will resume no earlier than Sol 24 as engineers look into how the spacecraft is handling larger than expected amounts of data." In other words, no science activities were performed on sol 23 because of something wrong with the way Phoenix is handling its data. This sounded frighteningly similar to the sol 18 anomaly on Spirit. I wasn't writing this blog yet when that happened, but was supervising the Student Astronauts so I was actually inside rover operations when this unfolded. It was very frightening at the time; after an apparently normal day the rover suddenly quit talking to Earth, and over the course of the next few days it developed that "Spirit had insomnia, a fever, was getting weaker all the time, was babbling incoherently, and was largely unresponsive to commands," in the words of Mark Adler. (He wrote about those dramatic days and the recovery when he served as a guest blogger here while I was on maternity leave.)

So I sent off an interview request and was delighted that it was Phoenix mission manager Barry Goldstein who responded. Of all the mission management types who have appeared on the Phoenix press panels, he produces the best combination of detail and clarity of description. I asked him to give a little bit more detail about what was going on, and to compare and contrast it with the Spirit sol 18 anomaly.

Barry said: "When the anomaly happened with Spirit, we lost communication. We never lost communication or control of [Phoenix]. It's quite different." He explained that, late on sol 22, the spacecraft team was monitoring the downlink of spacecraft housekeeping data. Housekeeping data includes all the important bits and pieces of information that the spacecraft records about its state of health, things like battery state of charge, voltages here and temperatures there, that kind of thing. "They noticed one of the APIDs [Application Process Identifiers] for a housekeeping data packet, which is normally generated only one to three times every time we do an uplink, was generated 45,000 times. It was a surprise, to say the least."

The telemetry they were looking at indicated that these 45,000 packets existed, but it did not yet include the content of those 45,000 packets. The telemetry also told them that all these housekeeping data packets were assigned high priority. That high priority led to the loss of some low-priority science data from sol 22. Here's why. Memory management works on Phoenix a lot like it does on a handheld computer. When it's powered up, during the day, the spacecraft works out of random-access memory (RAM). Its instructions, all the data it gathers (both science and housekeeping), everything is kept in RAM. As Phoenix has opportunities to dump the data from RAM to Earth via Odyssey or Mars Reconnaissance Orbiter, it can remove some of the stuff from RAM. However, at the end of the day, there is still usually some data remaining in RAM. Before Phoenix goes to sleep (kind of like when your handheld's screen goes dark), it transfers data from RAM to flash memory, according to its priority. If Phoenix runs out of room in flash after the high-priority data has been transferred to flash, the lower-priority data will be lost when Phoenix shuts down for the night. And that's what happened -- the 45,000 packets of housekeeping data "starved out" the science data.
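For readers who like to see this kind of thing spelled out, here is a minimal sketch of the priority-ordered RAM-to-flash transfer described above. It is only an illustration, not Phoenix's flight software: the `Packet` class, the `save_to_flash` function, the packet sizes, the priority numbers, and the flash capacity are all made-up assumptions. Only the figure of 45,000 housekeeping packets comes from Barry's account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    apid: str      # hypothetical label for the packet type
    priority: int  # assumption: a lower number means higher priority
    size: int      # arbitrary units, invented for this sketch


def save_to_flash(ram, flash_capacity):
    """Copy packets from RAM to flash, highest priority first.

    Returns (saved, lost). Anything that does not fit in the remaining
    flash space is lost when the spacecraft shuts down for the night.
    """
    saved, lost = [], []
    free = flash_capacity
    # sorted() is stable, so packets of equal priority keep their RAM order.
    for packet in sorted(ram, key=lambda p: p.priority):
        if packet.size <= free:
            saved.append(packet)
            free -= packet.size
        else:
            lost.append(packet)
    return saved, lost


# A handful of low-priority science packets, plus the flood of
# 45,000 high-priority housekeeping packets.
ram = [Packet("science", priority=5, size=200) for _ in range(10)]
ram += [Packet("housekeeping", priority=1, size=1) for _ in range(45_000)]

saved, lost = save_to_flash(ram, flash_capacity=45_100)
print(len(saved), "packets saved;", len(lost), "packets lost")
```

With these invented numbers, the housekeeping packets fill nearly all of flash, and every science packet ends up in `lost`. On a normal day, with only one to three such housekeeping packets, the same science packets would fit easily.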

Barry said that "the scientists are not at all concerned" about the loss of this low-priority data. But there was a more urgent problem, as he explained. "We have a restriction on the amount of time it takes for the spacecraft to boot. I can't remember the total value but it's over 60 seconds. If it doesn't boot within a certain amount of time, it will reset and then eventually go over to the B side." Phoenix has two redundant computer systems; if it failed to boot, it would go over to its backup computer system, a situation the managers want to avoid. Incidentally, the rovers do not have backup computers. "This data structure is huge because of these 45,000 blocks, [so] it has to pull that out of the flash as part of the boot process. And so we were concerned it would take too long and therefore it would side-swap. So we took some emergency action last night. Number one, we updated the priority of that APID such that it will restrict the amount of that data type to be saved in flash. The second thing we did is, we lost science operations on sol 23." That is, no science activities were performed yesterday. "The third thing we did is up the priority of the downlink of that data structure that we generated so often, so that we could retrieve what we have so it could help us diagnose the problem.

"The current state of the spacecraft is as follows: we have the data down, we have the spacecraft under control, and we have the size of the file system in control such that we're no longer worried about the size of the file system growing and keeping us from booting appropriately." So the situation is stable; Phoenix is not at any risk of getting into the kind of hazardous situation that Spirit suffered. However, they don't understand the root cause of the problem yet. I asked Barry about that and he said he preferred not to say anything yet, since they're less than a day into diagnosis and recovery, but that they do have a suspect.

It turns out that this cloud has a silver lining, Barry said. "The only restriction we put on science activity for sol 24, which the science team is planning right now, is that they can't save the data to the flash because we want to keep the flash small -- we don't want this thing to eat us alive. [But] because we were in this anomalous state, we requested, and received, a bunch of contingency passes from MRO and Odyssey. So what ends up happening is: we told the science team 'you can do whatever you want, because the only thing we are worried about was flash, we just are not going to save it to flash when we turn off.' And we then told them we have all these [downlink] passes. So as it turns out, what the science team is planning is the most data-rich sol we've had to date, because we have all these extra passes. I was joking with Peter [Smith, the principal investigator] that he should pray for these things more often because he gets more data."

So look for nextersol to be a good one for images!

I commented, in closing, that this seems much less scary than what happened with Spirit. Barry replied, "It's much less scary but I'll feel a lot better when we know exactly what's going on."