How on Earth do you patch the software on a computer orbiting the Moon? Very carefully.
FRANK O’BRIEN - 1/30/2020, 12:30 PM
On the afternoon of January 31, 1971, Apollo 14 thundered away from the Kennedy Space Center on its Saturn V launch vehicle after only a brief 40-minute hold for weather. After the S-IVB third stage reignited for trans-lunar injection (TLI), the command module Kitty Hawk and her crew were on their way to the Moon. //
However, less than four hours before the scheduled landing, controllers noticed that according to the indications on their consoles in Mission Control, the LM's Abort pushbutton appeared to have been pressed. When asked via radio, Shepard confirmed that no one on board Antares had pressed the Abort button—which meant there was a short-circuit or other electrical issue somewhere inside the LM's complicated guts.
This was potentially a mission-ending problem. If the computer read the Abort signal as set while the descent engine was firing, the LM would begin its abort procedure the moment powered descent started, which would make a landing impossible.
Under intense time pressure, the ground had to quickly figure out what was wrong and devise a workaround. What they came up with was the most brilliant computer hack of the entire Apollo program, and possibly in the entire history of electronic computing.
To explain exactly what the hack was, how it functioned, and the issues facing the developers during its creation, we need to dig deep into how the Apollo Guidance Computer worked. Hold onto your hats, Ars readers—we're going in. //
Once again the LM’s orbit carried it behind the Moon and out of communications, leaving the crew with just a smattering of procedures and few options. The normal work of finishing the system configurations continued, and the crew maneuvered to the descent attitude, tidied up the cabin, and put on their helmets and gloves. In the meantime, Don Eyles’ team was feverishly working to find a better solution to the Abort bit issue.
Working the problem involved unraveling a complex, daisy-chained series of events. The main landing program, P63, does not perform all of the landing computations itself. It orchestrates a large number of Jobs and Waitlist Tasks, and each one performs a necessary part of the effort. One of the Jobs running concurrently was SERVICER, which sampled attitudes and accelerations that fed into the guidance equations. SERVICER, in turn, scheduled Routine R11 as a Waitlist Task that ran every 0.25 seconds. R11 first checked whether aborts were enabled (via the LETABBIT flag). If they were, it then checked the status of the Abort bit. With aborts allowed and the abort signal set (presumably because the crew had pressed the Abort pushbutton), P63 was terminated, the AGC's Major Mode switched to P70, and the abort process began. //
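The following minimal Python model makes the chain of checks concrete. It is a sketch built on stated assumptions and is not AGC code. The `Descent` class, the `run_r11` loop, and the `button_pressed_at_cs` parameter are hypothetical. A plain loop stands in for the Waitlist's periodic rescheduling. Time is counted in hundredths of a second so that the 0.25-second period is an exact integer.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the R11 abort check; not actual AGC code.
R11_PERIOD_CS = 25  # 0.25 s, counted in hundredths of a second


@dataclass
class Descent:
    letabbit: bool = True    # aborts enabled (the LETABBIT flag)
    abort_bit: bool = False  # Abort pushbutton signal as seen by the computer
    major_mode: int = 63     # P63, the main landing program


def r11(lm: Descent) -> bool:
    """One pass of R11. Returns True if this pass started an abort."""
    if not lm.letabbit:
        return False          # aborts not enabled: the Abort bit is not consulted
    if lm.abort_bit:
        lm.major_mode = 70    # P63 terminated, Major Mode switches to P70
        return True
    return False


def run_r11(lm: Descent, duration_cs: int, button_pressed_at_cs: int) -> Optional[int]:
    """Run R11 every 0.25 s, standing in for SERVICER's Waitlist scheduling.

    The Abort bit is forced on from `button_pressed_at_cs` onward, as a
    shorted switch would do. Returns the time an abort began, or None.
    """
    for t in range(0, duration_cs + 1, R11_PERIOD_CS):
        if t >= button_pressed_at_cs:
            lm.abort_bit = True
        if r11(lm):
            return t
    return None
```

With aborts enabled, a signal that appears at 1.10 s is caught on the next pass at 1.25 s. The call `run_r11(Descent(), 1000, 110)` returns `125` and leaves `major_mode` at 70. With `letabbit=False`, the same call returns `None` and P63 stays in control.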
This was the breakthrough. If R11 could be spoofed into believing that an abort was already in progress, then it didn’t matter if the Abort button was pressed or not—the button's state would be ignored.
But how did R11 determine whether an abort was already executing? The answer was in plain sight on the DSKY: the Major Mode display, under the label “PROG”. //
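The spoofing idea can be sketched in the same hypothetical style. This minimal Python model assumes that R11 asks whether the Major Mode already names an abort program before it looks at the Abort bit. P70 comes from the text. Including P71, assumed here to be the LM's other abort program, is an assumption of the sketch. `spoof_major_mode` and `p63_running` are illustrative names. The model does not reproduce the DSKY keystrokes the crew actually entered.

```python
from dataclasses import dataclass

# Abort programs. P70 is named in the article; P71 is an assumption here.
ABORT_MODES = {70, 71}


@dataclass
class Descent:
    letabbit: bool = True      # aborts enabled (the LETABBIT flag)
    abort_bit: bool = False    # Abort pushbutton signal as seen by the computer
    major_mode: int = 63       # value shown under PROG on the DSKY
    p63_running: bool = True   # hypothetical: landing jobs still active


def r11(lm: Descent) -> None:
    """Abort check that first asks: is an abort already in progress?"""
    if not lm.letabbit:
        return
    if lm.major_mode in ABORT_MODES:
        return                   # "already aborting": the button is ignored
    if lm.abort_bit:
        lm.p63_running = False   # landing program terminated
        lm.major_mode = 70       # descent abort begins


def spoof_major_mode(lm: Descent, mode: int = 71) -> None:
    """Overwrite only the Major Mode value; the landing jobs keep running."""
    lm.major_mode = mode
```

In the normal case, a set Abort bit ends P63. Once the Major Mode has been overwritten, R11 ignores the same bit and the landing jobs keep running, which is the effect the text describes.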
Less than two minutes into the descent to the Moon, the Abort pushbutton had been successfully disabled and the computer was happily managing the descent. All indications were that the next lunar landing would be successfully accomplished in eight more minutes. //
As Antares passed through 32,000 feet (about 9,750 meters), Mitchell became concerned and told controllers that the landing radar hadn’t locked on. Houston suggested pulling the radar's circuit breaker and then resetting it to power the system back on, which did the trick. Solid radar data began flowing into the computer, and the crew quickly agreed to accept it. Just a few minutes later, Shepard made a smooth and on-target touchdown at the Fra Mauro highlands.
After the mission, when asked if he would have attempted to land without the radar, the notoriously hard-charging Shepard reportedly replied, “You’ll never know.” In his autobiography, Failure Is Not an Option, Gene Kranz recounts that Flight Director Jerry Griffin was convinced that Shepard would indeed have attempted to land without radar, and would just as certainly have had to abort when fuel ran out. //
The idea that a single errant switch could derail a lunar landing attempt was unacceptable. After the mission, the developers added a new variable to the AGC code that let the crew "mask out" (that is, ignore) the Abort and Abort Stage pushbuttons. The scenario assumed that a failing switch would be recognized well before the descent began, so that commands could be entered in time to prevent an inadvertent abort. Like the fix used for Apollo 14, this would make initiating an abort through a pushbutton impossible, and any urgent abort would have to be flown using the Abort Guidance System. //
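The post-mission masking idea amounts to clearing the ignored bits before the abort logic looks at them. The sketch below is a hypothetical Python illustration. The text does not give the real variable's name or layout, so the bit values `ABORT` and `ABORT_STAGE` and the function names are invented for the example.

```python
# Hypothetical bit assignments for the two pushbutton signals.
ABORT = 0b01
ABORT_STAGE = 0b10


def effective_inputs(raw_inputs: int, ignore_mask: int) -> int:
    """Clear every pushbutton bit the crew has chosen to ignore."""
    return raw_inputs & ~ignore_mask


def abort_requested(raw_inputs: int, ignore_mask: int) -> bool:
    """True if any un-masked abort pushbutton signal is set."""
    return (effective_inputs(raw_inputs, ignore_mask) & (ABORT | ABORT_STAGE)) != 0


if __name__ == "__main__":
    shorted = ABORT  # the faulty Abort switch reads as pressed
    print(abort_requested(shorted, 0))            # True
    print(abort_requested(shorted, ABORT))        # False
    print(abort_requested(ABORT_STAGE, ABORT))    # True
```

Masking only `ABORT` leaves Abort Stage usable. Masking both makes a pushbutton abort impossible, which leaves the Abort Guidance System as the way to abort.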
The recovery from Apollo 14’s Abort switch failure can only be described as brilliant and heroic. But the most important enabler of this effort was that the software, while fiendishly complex, could be understood by a small team of developers. Modern hardware and software, with their extensive protection schemes, virtualization, and dynamic program management, would make such a simple hack impossible. Faced with a comparable problem today, even if the fix were trivial, the solution would likely require large amounts of code to be recompiled, tested, and uploaded to the spacecraft. That might not be possible in the short time available to save the mission.
In the end, Apollo 14’s fix truly represented the “Spirit of Apollo,” where talented teams made the impossible happen.