Chips in Space: Lessons learned (Part 2)

If you have been following this blog and the reader comments, you know that the battery failed eight days into the mission. For the next mission, we need to design a power system where a shorted battery will not render the satellite inoperable. Thankfully, the ARISSat-1 battery failed open, but that was luck. For an extraordinary story about amateur-satellite batteries, read how the battery on AMSAT OSCAR 7 shorted, then reopened 21 years later.

With ARISSat-1’s perpetual off-in-eclipse and on-in-sunlight operation, we should consider adding a real-time clock (RTC) to keep chronological time. An RTC would also help whenever the satellite reboots. Presently, ARISSat-1 keeps mission elapsed time (MET) in seconds since it was last powered on. With the battery failed and the satellite effectively rebooting each orbit, the MET in the telemetry only tells us how long the satellite has been on since the start of the current sunlit period. So, we have to use the time of receipt to maintain the chronology of the telemetry.
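To make the time-of-receipt approach concrete, here is a minimal Python sketch (not actual ARISSat-1 ground software) that estimates when the satellite last powered on by subtracting the telemetry MET from the time a frame was received, then groups frames that share the same power-on estimate. It assumes MET counts whole seconds from power-on and that propagation and decoding delays are negligible; the function names, the 5-second tolerance and the sample timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

Frame = Tuple[datetime, int]  # (time of receipt, MET in seconds)


def estimate_power_on(received_at: datetime, met_seconds: int) -> datetime:
    """Estimate when the satellite last powered on.

    Assumes MET counts whole seconds from power-on and that the delay
    between transmission and receipt is negligible.
    """
    return received_at - timedelta(seconds=met_seconds)


def group_by_power_cycle(
    frames: List[Frame], tolerance_s: float = 5.0
) -> List[Tuple[datetime, List[Frame]]]:
    """Group frames into power-on cycles (one per sunlit period).

    Frames whose estimated power-on times agree within tolerance_s
    (a hypothetical value) are treated as the same cycle.
    """
    cycles: List[Tuple[datetime, List[Frame]]] = []
    for received_at, met in sorted(frames):
        boot = estimate_power_on(received_at, met)
        if cycles and abs((boot - cycles[-1][0]).total_seconds()) <= tolerance_s:
            cycles[-1][1].append((received_at, met))
        else:
            cycles.append((boot, [(received_at, met)]))
    return cycles


if __name__ == "__main__":
    t0 = datetime(2011, 9, 1, 12, 0, 0, tzinfo=timezone.utc)  # made-up timestamps
    frames = [
        (t0, 600),
        (t0 + timedelta(seconds=60), 661),
        (t0 + timedelta(minutes=92), 300),
    ]
    for boot, members in group_by_power_cycle(frames):
        print(boot.isoformat(), len(members), "frame(s)")
```

With an onboard RTC, each frame would carry its own absolute timestamp and this estimation step would no longer be needed.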

That leads me to another lesson learned: add non-volatile memory for storing whole-orbit data. Sampling whole-orbit data at a known, fixed rate would let us analyze trends much better. Presently, the telemetry is transmitted down on an “as gathered” basis.
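As an illustration of why a known sample rate matters, the sketch below models whole-orbit storage as a fixed-size ring buffer. Because samples are taken at a constant period, each timestamp can be recomputed from the start time and the sample index instead of being stored. The class name, the period and the capacity are hypothetical, a Python list stands in for non-volatile memory, and the start time is assumed to come from an RTC like the one suggested above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class WholeOrbitLog:
    """Fixed-rate ring buffer standing in for non-volatile telemetry storage.

    A Python list models the memory; period_s and capacity are hypothetical
    and would really be sized from the orbit period and available storage.
    """

    start_time_s: float  # RTC time of the first sample, in seconds
    period_s: float      # fixed interval between samples
    capacity: int        # number of samples retained before overwriting
    _samples: List[float] = field(default_factory=list, init=False)
    _count: int = field(default=0, init=False)  # total samples ever written

    def append(self, value: float) -> None:
        """Store one sample, overwriting the oldest once the buffer is full."""
        if len(self._samples) < self.capacity:
            self._samples.append(value)
        else:
            self._samples[self._count % self.capacity] = value
        self._count += 1

    def records(self) -> List[Tuple[float, float]]:
        """Return (timestamp_s, value) pairs, oldest first.

        Timestamps are derived from the sample index, so they never
        have to be written to memory.
        """
        first = max(0, self._count - self.capacity)
        return [
            (self.start_time_s + i * self.period_s, self._samples[i % self.capacity])
            for i in range(first, self._count)
        ]
```

A flight implementation would also have to keep the start time and write index in memory that survives a power loss; otherwise the implied timestamps are lost on the next reboot.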

Yardney silver-zinc battery used for development

And, in what likely will not be the last word on telemetry, Jerry Zdenek (N9YTK) adds:

“Pay more attention to the accuracy of the data. [For example] It'd be nice to have greater than 10 bits on the battery current.”
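Jerry’s point about resolution comes down to simple arithmetic: an ideal N-bit converter divides its full-scale range into 2^N steps, so each extra bit halves the smallest current change the telemetry can show. The Python sketch below assumes a hypothetical 4 A full-scale span (for example, a bipolar -2 A to +2 A charge/discharge range); ARISSat-1’s actual sensor range isn’t given here.

```python
# Hypothetical full-scale span of the battery current sensor, in amperes
# (for example, -2 A to +2 A). Not ARISSat-1's actual range.
FULL_SCALE_A = 4.0


def lsb_size(full_scale: float, bits: int) -> float:
    """Smallest change an ideal ADC with `bits` of resolution can report."""
    return full_scale / (2 ** bits)


for bits in (10, 12, 16):
    step_ma = lsb_size(FULL_SCALE_A, bits) * 1000.0
    print(f"{bits}-bit ADC: {step_ma:.3f} mA per count")
```

With those assumed numbers, a 10-bit reading resolves about 3.9 mA per count, while 12 bits would resolve about 0.98 mA. Real converters also lose some effective bits to noise and offset, so the practical gain is somewhat smaller.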

Another lesson we learned was to use a real-time operating system (RTOS) for the larger microcontrollers in the system. The ARISSat-1 firmware was written using a simple scheduler scheme. We initially thought this would provide the simplest solution. However, as we got further into the project, the team felt it really was more trouble than it was worth. An RTOS would make life a lot simpler.
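For readers who haven’t worked with one, here is a generic sketch of a simple cooperative, time-triggered scheduler of the kind an RTOS would replace. It is not the ARISSat-1 firmware; the class names and tick periods are hypothetical. Every task runs to completion on a single thread, so any task that takes too long delays all the others, and long operations have to be split by hand into small steps. An RTOS instead provides preemptive tasks, priorities and blocking primitives such as queues and semaphores.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Task:
    name: str
    period_ticks: int           # run every N scheduler ticks
    action: Callable[[], None]  # must return quickly: nothing preempts it
    next_due: int = 0


@dataclass
class CooperativeScheduler:
    """Minimal time-triggered, run-to-completion scheduler (hypothetical sketch).

    All tasks share one thread of execution; a task that blocks or runs
    long delays every other task, which is the burden an RTOS removes.
    """

    tasks: List[Task] = field(default_factory=list)
    tick: int = 0

    def add(self, task: Task) -> None:
        self.tasks.append(task)

    def run_tick(self) -> List[str]:
        """Run every task that is due on this tick; return the names run."""
        ran = []
        for task in self.tasks:
            if self.tick >= task.next_due:
                task.action()
                task.next_due = self.tick + task.period_ticks
                ran.append(task.name)
        self.tick += 1
        return ran
```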

Not-so Flat-sat (Can you identify all the subsystems in the photo?)

When designing the subsystems, insert plenty of debugging points; the more the better. You can’t have too many! And if you stack the PCBs, as was done in the Internal Housekeeping Unit (IHU), make sure you can use some sort of backplane or extender boards. Here is Jerry on his experience with the IHU stack:

“Debugging something in the middle of the stack (of course the one that needs it) was a nightmare. I had to build a bunch of different right-angle adapters and a few jumper cables to connect all the other jumpers. The Software Defined Transponder (SDX) was 90 degrees to the top; the Power Supply Unit (PSU) was 90 degrees to the bottom, jumpered to the IHU in a second spot. It was really ugly.”

We call this creating a “flat-sat,” with all of the PCBs and subsystems laid out flat on a table, everything connected and communicating in what will become the final configuration. This allows programming, debugging and probing of test points. However, there’s another catch: plan ahead for how to debug and reprogram the satellite when it is all closed up. Once everything is “buttoned up,” it’s a real pain to open it again to reach a particular test point, programming port or what have you in order to implement a change. Dismantling and reassembling is a recipe for disaster.

Closing out our Lessons Learned series, Jim Johns (KA0IQT) had an interesting, if not unique, perspective on the project:

“As far as lessons learned, I suspect that most of the team will focus on the technical challenges. In my opinion, the bigger lesson learned was that the project was an ‘MBA in a Box’ (or in this case satellite.) Looking back, all of the normal disciplines that are covered in an MBA program were an integral part of the project. Organizational Behavior (the study of the relationships among management and teams), Operations Research (methods of optimizing processes and procedures, and decision making), Finance and Accounting (developing the financial strategy and tracking the costs of the program), and Marketing (the strategy to use in selling the product). In short, the project provided an opportunity to learn and grow both technically and managerially.”

And here we thought this project was all technical challenges!

The design team has put together a survey for everyone interested in this satellite project, including you, the loyal readers of this blog. We’d love to get your input on how the ARISSat-1 project has gone, and what you’d like to see in the next satellite. Please fill it out at: https://www.surveymonkey.com/s/arissat1-operation.

Next week, I’ll conclude this limited-series blog by examining the survey results and sharing our early plans for ARISSat-2.

Hi Steve, just curious to know ... did you do some kind of failure modes and effects analysis (FMEA) at the beginning of the design phase of the project? Were battery failure modes not considered?
Anyway, I think it now becomes part of the integrated knowledge base for avoiding such problems in the future. All the best for the next one! :)

Hi Sanjib, we did think about failure modes and how to recover from them. Tony Monteiro (AA2TX) wrote about the power system in the Jan/Feb 2011 issue of The AMSAT Journal. In the article he describes the failure modes of the battery. The article is available at http://tinyurl.com/3f72xvy.

Hi Steve, thank you for sharing the article! It gives me a deeper knowledge of ARISSat-1's power system. I saw the information about the battery failure modes you are talking about.
Have you also used any of the traditional failure analysis tools, such as FMEA or Fault Tree Analysis, to organize the actions required to build recovery mechanisms for the probable failure modes of all the subsystems?

Hi Sanjib, I would say that we did not use these formal methods. Thinking about it now, they too should be a lesson learned. It takes someone on the team with this kind of knowledge to help lead a volunteer team. Thanks for the suggestion!

Steve,
I started my career at Lockheed Missiles and Space and my job was satellite payload analysis as well as failure modes and recovery methods to ensure getting the data back.
It was less 'formal' (if I can use that word) than methods available today, but was nonetheless meticulous and involved knowing every circuit, its function, how it could fail, and what that failure would do to the rest of the systems.
It was good training for anyone who later in their career had to debug any type of system.
Keep up the good work!

Again, a great series of articles! I remember trying to debug 15 x 15 in. PCBs in an array processor prototype, with suspect boards on 15" extender boards. At full speed it sometimes would not work (we had to run at half clock speed; really glad we had designed that in!). Debugging completed systems is always a challenge, but more so with compact systems!