


TALES FROM THE LUNAR MODULE GUIDANCE COMPUTER






On the Internet, one may find a report written by a member of the team that developed the Apollo computer (AGC).
Here is the link to it:

Link to "Tales from the lunar module guidance computer".


This report is supposed to relate the history of the development of the Apollo computer.
A neophyte, without technological knowledge, might be fooled and think it relates something normal, with just a little humor added.
But for someone who has the required technological knowledge (which happens to be my case), it is really funny to read, full of scientific absurdities and incongruities, and it shows that the Apollo computer is in fact nothing but a farce.
I have translated this report, and I have added comments, written in white italics, which I have interleaved with the text.
These comments aim to show the absurdity lying in this report.




ABSTRACT: The Apollo 11 mission succeeded in landing on the moon despite two computer-related problems that affected the Lunar Module during the powered descent. An uncorrected problem in the rendezvous radar interface stole approximately 13% of the computer's duty cycle, resulting in five program alarms and software restarts. In a less well-known problem, caused by erroneous data, the thrust of the LM's descent engine fluctuated wildly because the throttle control algorithm was only marginally stable. The explanation of these problems provides an opportunity to describe the operating system of the Apollo flight computers and the lunar landing guidance software.

Translation: Despite the erratic behavior of the onboard computer, the astronauts managed to avoid crashing the LM by taking manual control.



Figure 1: The Lunar Module

LM-1, also known as Apollo 5, was a 6-hour unmanned mission in earth orbit for the Lunar Module (LM) only. The date was January 22, 1968. For those of us who developed the onboard software for the LM Guidance Computer (LGC) it was our first flight. An event that had once seemed impossibly distant was now upon us.

The mission included two firings of the LM's Descent Propulsion System (DPS). For the second "burn" Allan Klumpp, who designed the lunar landing guidance equations based on work by George Cherry, had devised an earth-orbit version of the lunar landing guidance. It had three parts, meant to simulate the "braking" phase, "visibility" phase, and final landing phase of a real descent. But first there was a burn meant to simulate the descent orbit insertion maneuver that preceded the landing. This was to be the first in-flight firing of the LM's descent engine, lasting about 38 seconds.

The LGC was in P41, the program for the first DPS burn. The LM had maneuvered to the burn attitude. The computer counted down to ignition. At thirty seconds a "task" called READACCS was executed for the first time. It read the accelerometers in the spacecraft's inertial measurement unit, scheduled a "job" called SERVICER to run immediately, and then scheduled itself to run again two seconds later. Having been initialized with state vectors from the onboard orbital integration software, SERVICER's "average-G" navigation equations began to use accelerometer data to update the position and velocity vectors. READACCS and SERVICER would repeat every two seconds throughout the powered-flight phase. Seven and a half seconds before ignition an "ullage" burn of the Reaction Control System (RCS) jets began, to settle the propellant in the DPS tanks. We leaned closer to the squawk box that connected us to mission control in Houston.
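The READACCS/SERVICER cycle described above can be sketched as a small simulation. The names READACCS and SERVICER and the two-second period come from the paper; the data structures, the fake accelerometer counts, and the event log are my own illustrative assumptions, not AGC code.

```python
# Minimal sketch of the two-second READACCS/SERVICER cycle: READACCS is
# a timed "task" that reads the accelerometers, schedules the SERVICER
# "job" to run immediately, and reschedules itself two seconds later.
import heapq

waitlist = []   # (time, name, task) entries, in the spirit of the AGC's waitlist
jobs = []       # jobs handed to the Executive
events = []     # record of what ran, for illustration only

def schedule_task(t, task):
    heapq.heappush(waitlist, (t, task.__name__, task))

def readaccs(t):
    pipas = (0, 0, 0)                  # accelerometer (PIPA) counts, faked here
    jobs.append(("SERVICER", pipas))   # schedule SERVICER to run immediately
    schedule_task(t + 2.0, readaccs)   # and reschedule READACCS in 2 seconds
    events.append(("READACCS", t))

def run_until(t_end):
    while waitlist and waitlist[0][0] <= t_end:
        t, _, task = heapq.heappop(waitlist)
        task(t)
        while jobs:                    # the Executive runs pending jobs
            name, data = jobs.pop(0)
            events.append((name, t))

schedule_task(0.0, readaccs)           # thirty seconds before ignition
run_until(6.0)
print(events)
```

Run over six simulated seconds, the log shows the READACCS/SERVICER pair repeating every two seconds, which is the steady heartbeat of powered flight the paper describes.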

We heard "Engine on"... several seconds passed... "Engine off".

Soon we understood what had happened. A small piece of code in SERVICER called the "delta-V monitor" had concluded that the engine had failed and sent an engine-off command. But why? To give the engine time to come up to thrust, the delta-V monitor always waited some period of time after engine-on before it began to monitor the engine. But this time, at the end of the grace period the engine was still not producing enough thrust to satisfy the monitor's thrust criterion.


So, because the engine was detected as not giving enough thrust, the solution was to switch it off... so that it produced no thrust at all!
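As a sketch, the delta-V monitor's decision described above might look like this. The grace period and thrust threshold here are illustrative assumptions, not the flight values.

```python
# Sketch of the delta-V monitor: after engine-on, wait a grace period
# for thrust to build up, then require a minimum sensed velocity change
# per cycle or conclude the engine has failed and command engine-off.
GRACE_PERIOD = 2.0   # seconds to let thrust build up (assumed value)
MIN_DELTA_V  = 1.0   # ft/sec per cycle, minimum to call the engine alive (assumed)

def delta_v_monitor(t_since_ignition, sensed_delta_v):
    """Return the monitor's verdict for one navigation cycle."""
    if t_since_ignition < GRACE_PERIOD:
        return "WAIT"              # still inside the grace period
    if sensed_delta_v < MIN_DELTA_V:
        return "ENGINE_OFF"        # thrust too low: presume engine failure
    return "OK"

print(delta_v_monitor(1.5, 0.2))   # WAIT: grace period not over
print(delta_v_monitor(2.5, 0.2))   # ENGINE_OFF: the LM-1 situation
print(delta_v_monitor(2.5, 3.0))   # OK: engine producing thrust
```

On LM-1, thrust built up more slowly than the grace-period parameter anticipated, so the middle case fired: the monitor shut down a healthy engine.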


Published accounts have attributed the slow DPS thrust buildup to the fact that the LM's tanks were only partially pressurized. The author's investigations show that the problem was elsewhere. For the DPS fuel system, the normal procedure was to open the valve that allowed fuel to enter the propellant manifold at the time the engine was armed, several seconds before ignition. But on LM-1 the control valve that regulated the passage of fuel from the manifold into the engine was suspected of being leaky. To prevent the possible, premature entry of hypergolic propellant into the engine (which could have had explosive consequences) the decision was made, shortly before flight, to delay arming the engine until the time of ignition.

Delaying the arming of the engine did not remove the problem of the leak; maybe the engine would not explode at the time it was ignited, but it still might explode later.
And if they suspected a leak, why not simply fix the leak?
Generally, when an accident results from a leak, the leak is not known before the accident.
It is only after the accident that an investigation can show there may have been a leak.



The engine was slow to start not because the tanks were less pressurized, but because the propellant had further to travel to reach the engine. It would have been easy for us to adjust the parameter that controlled how long the delta-V monitor waited before testing the engine — but nobody told us.

That's a lame excuse, because they could have checked how far the propellant had to travel to reach the engine and adjusted the parameter accordingly.
The next time one of my products has a problem at a customer's site, I'll tell him: I could have saved it by adjusting a parameter, but nobody told me... we'll see how he reacts!



Houston sent a signal to turn off the onboard computer. The main objectives of the LM-1 mission were achieved under ground control. We who programmed the LM's computer hung our heads in disappointment, and endured a public reaction that did not distinguish between a "computer error" and a mistake in the data. Yet, this was not the last time that a seemingly innocuous parameter, relating to the performance of the descent engine, would come perilously close to ruining a mission.

A "computer error" is when the computer blows itself up on its own, without reason, and a "mistake in the data" is when the computer blows itself up because it misinterpreted a piece of data, and "nobody told it".



* * *

The job of designing the guidance system for the Apollo spacecraft had fallen to the MIT Instrumentation Laboratory in Cambridge, Massachusetts. Under the leadership of its founder "Doc" Charles Stark Draper, the Lab had since 1939 played the preeminent role in perfecting inertial guidance systems. Our contract to design and program the Apollo Primary Guidance Navigation and Control System (PGNCS, pronounced "pings") was the first Apollo contract signed. Doc had volunteered to fly the mission himself.

(In 1970 the Instrumentation Laboratory was renamed the Charles Stark Draper Laboratory, and in 1973 became independent from MIT, although the two institutions remain linked. The Draper Laboratory is still deeply involved in NASA's manned spaceflight programs.)

The flight computer program for LM-1 was called SUNBURST. By the time LM-1 flew we were already working on SUNDANCE, the program that would fly the earth-orbital Apollo 9 mission. SUNDANCE in turn evolved into LUMINARY, the program for Apollo 10 and the lunar landing missions. It was LUMINARY revision 99 that flew the Apollo 11 mission in July, 1969. Revision 116 flew Apollo 12 in December, and so on.

(This paper follows nomenclature used during the Apollo Program. Program names, and the names of tags and variables within programs, were usually written in upper case.)

Informally, the programs were called "ropes" because of the durable form of read-only memory into which they were transformed for flight, which resembled a rope of woven copper wire. For the lunar missions, 36K words of "fixed" (read-only) memory, each word consisting of 15 bits plus a parity bit, were available for the program. In addition there were 2K words of artfully timeshared "erasable" or RAM memory. Allowing for the identical Apollo guidance computer (AGC) in the Command Module (CM), containing a program called COLOSSUS, it is correct to say that we landed on the moon with 152 Kbytes of computer memory.




Figure 2: Apollo LM Primary Guidance and Navigation System (PGNS)


The AGC was packaged in a sturdy, sealed, aluminum-magnesium box, anodized in a gold color, that measured about six inches, by one foot, by two feet, weighed 70 pounds and consumed about 55 watts. Its logic was made up of 5600 3-input NOR gates packaged two-each in flat-pack integrated circuits. Eldon Hall, the machine's principal designer, has related the bold decision to use integrated circuit technology for this computer despite its immaturity in the early 1960's.

The LGC (with related equipment) was mounted behind the astronauts at the back of the LM cabin. In front of the astronauts was a rigid structure called the "Nav Base" that held an alignment telescope and the Inertial Measurement Unit (IMU) in a fixed geometrical relationship. The computer's Display and Keyboard Unit (DSKY) was mounted like a desk between the two astronauts. Figure 2 illustrates the components and high-level interfaces of the LM's primary guidance system.

The IMU, packaged in a spherical case about a foot in diameter, was the heart of the guidance system. The heart of the IMU itself, enclosed by three nested gimbals, was the "stable member" — a small platform containing three accurate gyroscopes and three accelerometers — that could be "aligned" to an inertial orientation. Any deviation from the inertial alignment would be sensed by the gyros, and the gimbals would move to correct, all happening with such precision that no matter what attitude (orientation) the spacecraft took (almost), the stable member deep inside provided a steady attitude reference. A matrix called REFSMMAT expressed the stable-member alignment with respect to the reference inertial frame. The accelerometers were there to count velocity increments during powered flight in the coordinate system of the stable member.



Figure 3: Lunar Module Display and Keyboard Unit (DSKY)

The DSKY (Figure 3) was the principal man-machine interface for the LGC. For display it provided three signed five-digit registers for general-purpose use, three two-digit registers to indicate the current phase (a number between 63 and 68 for the lunar landing), and the current "verb" and "noun". Verbs and nouns provided a primitive language for communication between the crew and the computer. Phases and verb/noun combinations were determined by the software in some cases, and in other cases were entered by the crew on a keyboard of 19 keys. The contents of the three general-purpose registers depended on the current verb and noun. The DSKY also contained an array of indicator lights that were under the control of the computer, and a computer-activity light that lit when the LGC was not in its idle state.
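The verb/noun scheme can be illustrated with a toy dispatch table. Verb 16 (monitor) and Noun 68 appear later in this narrative, and Noun 68's third register held DELTAH; the other register contents and the data layout here are invented placeholders.

```python
# Toy sketch of the DSKY's verb/noun language: a verb says what to do,
# a noun says what data to do it to. The three returned values stand in
# for the DSKY's three general-purpose display registers.
NOUNS = {
    68: lambda s: (s["reg1"], s["reg2"], s["deltah"]),  # third register: DELTAH
}

VERBS = {
    6:  "DISPLAY",   # display the noun once
    16: "MONITOR",   # display the noun and keep it updating
}

def key_in(verb, noun, state):
    """Simulate keying VERB nn NOUN nn on the DSKY keyboard."""
    return VERBS[verb], NOUNS[noun](state)

state = {"reg1": 0, "reg2": 0, "deltah": -2900}
print(key_in(16, 68, state))
```

The crew memorized (or carried on cue cards) the verb/noun combinations relevant to each mission phase, which is how Aldrin could key the right monitor request within seconds.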

The AGCs in the LM and CM were programmed in two languages. The one we called "Basic", but more properly "Yul", was an assembler language of about 40 operations, authored by Hugh Blair-Smith. "Interpretive" was a list-processing interpretive language (essentially a set of subroutines) designed to facilitate guidance and navigation calculations involving double precision (30-bit fixed-point) vectors and matrices — at the cost of being very slow. The Interpreter was written by Charles Muntz.

The memory-cycle time for the AGC was 11.7 microseconds. A single-precision addition in the assembler language took two memory cycles. A double-precision vector cross-product programmed in Interpretive took about 5 milliseconds. One of the challenges in programming the AGC was juggling the two languages to obtain the best blend of speed and compactness for the given situation.

The computer programs for Apollo were still small enough to fit into one listing — typically six inches thick on 11x15 inch fan-fold paper. The listing included symbol tables that allowed threads to be traced. With a single listing we always knew that the answer was there, when we had a bug to deal with, but it might be devilish to find.

I like it: the computer program was still small enough to fit into one listing... of several thousand pages (Figure 4)! Some listing, LOL!



Figure 4: Listing of LM Computer Program LUMINARY 131


With respect to units, the LGC was eclectic. Inside the computer we used metric units, at least in the case of powered-flight navigation and guidance. At the operational level NASA, and especially the astronauts, preferred English units. This meant that before being displayed, altitude and altitude-rate (for example) were calculated from the metric state vector maintained by navigation, and then were converted to feet and ft/sec. It would have felt weird to speak of spacecraft altitude in meters, and both thrust and mass were commonly expressed in pounds. Because part of the point of this paper is to show how things were called in this era of spaceflight, I shall usually express quantities in the units that it would have felt natural to use at the time.

So the Apollo computer already had great difficulty performing its tasks in the time available (to the point that it sometimes needed a restart), but on top of that it had to convert units for lazy astronauts who couldn't make the effort of working with the metric system!
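The display-time conversion the paper describes is straightforward; a sketch, assuming the navigation state is kept in meters and only the displayed values are converted to English units. The conversion factor is exact by definition of the foot; the display rounding is my assumption.

```python
# Sketch of the unit handling described above: metric internally,
# feet and ft/sec on the DSKY.
M_TO_FT = 1.0 / 0.3048           # exact: one foot is defined as 0.3048 m

def display_altitude(alt_meters):
    """Altitude display register: whole feet."""
    return round(alt_meters * M_TO_FT)

def display_altitude_rate(rate_mps):
    """Altitude-rate display register: tenths of ft/sec."""
    return round(rate_mps * M_TO_FT, 1)

# The 49971-foot altitude quoted at ignition is about 15231.2 meters:
print(display_altitude(15231.2))       # 49971
```

This is a few multiplications per display update, a negligible cost even on the AGC.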

* * *

By now the area on the second floor of 75 Cambridge Parkway where we monitored missions had been moved to a larger space, but on July 20, 1969 the room was crowded despite efforts to keep it clear for those of us who were most involved in this phase of the mission. We listened to a squawk box in a nondescript classroom, while a quarter of a million miles away a manned spacecraft emerged from behind the moon and approached its orbital low-point (perilune) of about 50,000 feet above the cratered surface, where the lunar landing burn would begin.

The crew keyed in Verb 37 Noun 63 to select P63, the phase that controlled the preparations for Powered Descent Initiation (PDI) and stayed in control until the burn achieved its first set of targets. The computer processed an algorithm to compute the exact time for ignition and the attitude the LM should be in at that time. Next the spacecraft maneuvered to that orientation. At the time of ignition the engine bell would be pointed almost dead ahead, directly opposing the spacecraft's orbital velocity.

Now the computer issued code 500. It thought the landing radar antenna was in the wrong position. The crew saw that the relevant switches were already in the right positions, but they cycled them anyway and the warning cleared. This had no connection with the events that would follow, but it nourished our suspicion of "discretes", those signals that told the computer some fact like the position of a switch or an antenna — but sometimes lied.

The computer thought... but was not sure! And the crew had to cycle switches that were already in the right positions to clear the alarm!
If the computer works with "lying" signals, what use is it? Isn't it supposed to bring reliability?




Control passed to BURNBABY — the master ignition routine that we wrote after LM-1 to save memory by exploiting the similarities among the powered flight phases in the period leading up to ignition. Verb 06 Noun 62 appeared on the DSKY. The middle register contained a time in minutes and seconds that began to count down toward light-up. At 35 seconds the display went blank, and at 30 seconds reappeared. This was a signal that Average-G had started. At seven and a half seconds, the ullage burn began. At five seconds, the display flashed to request a "go" from the crew. Buzz Aldrin, the LM Pilot, standing on the right side of the cockpit, had the main responsibility for working the DSKY. Now he keyed PROCEED.

At Mission Elapsed Time (MET) 102:33:05 self-igniting propellants came together in the descent engine and it lit up at 10% throttle. Armstrong did not even feel the gentle push — less than 1/25 G. The display changed to Noun 63 and the three display registers now showed a total velocity of 5559.7 ft/sec, an altitude-rate of -2.2 ft/sec, and an altitude of 49971 feet. The gimbals that pivoted the descent engine moved to align the thrust vector with the spacecraft's center of mass. Then, 26 seconds into the burn, the software throttled-up the DPS to its maximum thrust of 9870 pounds (43,900 newtons), 94% of the engine's official rating of 10500 pounds, and at the same time enabled the descent guidance.

P63 was called the braking phase because its only purpose was to shed horizontal velocity. It would end in about eight minutes when the spacecraft reached target conditions known as "high gate" at about 7400 feet altitude. Figure 5 illustrates the phases of the lunar landing.



Figure 5: Phases of the Lunar Landing (Numbers Approximate)


At MET 102:36:55 Neil Armstrong, the Commander, standing on the left side of the LM cockpit, used his joystick to spin the spacecraft about its thrust axis so that the windows, which had allowed the astronauts to look down at the surface while hurtling forward feet first, would point out into space, where Earth was visible. But the spacecraft was rotating too slowly. Armstrong realized the autopilot rate switch was at 5 deg/sec and switched it to 25. Just before the maneuver was complete the landing radar signaled "data good".

If it is an "autopilot", why does it need to be adjusted? Isn't it supposed to adjust itself?
The landing radar signals that the data is "good", but what does that mean? Which data? The radar just gives an indication of the proximity of the ground.




It was not possible to navigate so accurately as to touch down safely on the lunar surface with no local knowledge of its relative distance or velocity. The landing radar provided this information. Despite the "reasonability check" performed by the software, radar data could not be incorporated into the state vector without crew (and ground) approval. So about five minutes into the burn Aldrin keyed in Verb 16 Noun 68 — a request to monitor a noun whose third register showed the difference between the altitude sensed by the radar and the computed altitude. This number, called DELTAH, was about -2900 feet. This was within the range of expected altitude error. The radar data could gradually be folded into navigation without adversely affecting the shape of the trajectory.

The radar data could not be incorporated into the guidance system without crew approval? And what if the crew forgot to incorporate it?
And if the radar data had no influence on the trajectory, what use was it?
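The "gradual folding in" of radar data that the paper mentions can be sketched as a simple weighted correction: rather than jumping the state vector to the radar altitude, only a fraction of the difference (DELTAH) is absorbed each navigation cycle, so the trajectory shape is not disturbed. The weighting factor below is an assumed illustrative value; the real weight varied with conditions.

```python
# Sketch of incorporating landing-radar altitude into navigation a
# little at a time, using the -2900 ft DELTAH from the narrative.
RADAR_WEIGHT = 0.1   # fraction of DELTAH absorbed per 2-second cycle (assumed)

def fold_in_radar(estimated_alt, radar_alt):
    deltah = radar_alt - estimated_alt
    return estimated_alt + RADAR_WEIGHT * deltah, deltah

alt = 33500.0            # computed altitude, feet
radar = 30600.0          # radar-sensed altitude, feet: initial DELTAH = -2900
for _ in range(10):      # ten navigation cycles, about 20 seconds
    alt, deltah = fold_in_radar(alt, radar)
print(round(deltah))     # DELTAH has shrunk toward zero
```

This matches what Aldrin observed on Verb 16 Noun 68: DELTAH decreasing from about -2900 feet to about 900 feet as the measurements were blended in.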




Then we heard the words "program alarm". In Cambridge we looked at each other. Onboard, Aldrin saw the PROG light go on and the display switch back to Verb 06 Noun 63. He quickly keyed in Verb 90 Noun 50. Alarm code 1202 appeared on the DSKY. This was an alarm issued when the computer was overloaded — when it had more work to do than it had time for. In Cambridge the word went around, "Executive alarm, no core sets". Then Armstrong said, with an edge, "Give us a reading on the 1202 program alarm".

Wow, Aldrin quickly reacts by keying in a given verb with a given noun; how did he know he had to key in this particular verb and noun when faced with this situation? Did he have to learn all the verb/noun combinations to key in for every possible situation?
The poor computer gets overloaded and does not know how to go on; help me, please, I'm overloaded!




From here events moved very quickly, too fast for us to have any input from Cambridge. It was up to Mission Control in Houston. The story of what happened there has often been told — how it fell to a 26-year-old mission control guidance officer named Steve Bales to say "go" or "abort". Bales had participated in a recent review of LGC alarms that had deemed 1202 a "go" unless it occurred too often or the trajectory deviated. He was supported by Jack Garman of NASA and Russ Larson of MIT in the back room. Garman said, "go". Larson gave a thumbs-up. (He later said he was too scared to form words.) So Bales answered, "go", Flight Director Gene Kranz said "go", and capsule communicator Charlie Duke passed it up to the crew. At MIT, where we realized that something mysterious was draining time from the computer, we were barely breathing.

So now enters the hero, a young man aged 26, who is here to save the situation because at MIT they are in total disarray!
He nevertheless needed advice from his superiors... and he was just one link in the chain of the "go" he transmitted, after receiving the green light from the flight director!
Something mysterious was draining time from the computer... could it have been a UFO?



Half a minute elapsed between the alarm and the "go" from Houston. During that time mission control approved the DELTAH, and Aldrin keyed in Verb 57 to allow navigation to incorporate the landing radar measurements. Then he tried Verb 16 Noun 68 again and watched DELTAH decrease to 900 feet. Again a program alarm light. Again Verb 90 Noun 50 — 1202 alarm. Again "go" from the ground.

So, for half a minute, until the saving "go" came back from Houston, the computer was left inoperative, as it did not know what to do and had not been given any directive.
Meanwhile the LM was on its own; it was probably using the occasion to dance a jig.
The next time there was a 1202 alarm, the "go" came back faster, because they were better trained.




Figure 6: Commanded (dotted line) Versus Actual Thrust
(solid line) During Powered Descent (Simulation Data)

At MET 102:39:31 the best possible confidence builder occurred — throttle down, right on time. "Ah! Throttle down... better than the simulator" commented Aldrin, "Throttle down on time!" exclaimed Armstrong, their excitement palpable. In the official transcript of communications between spacecraft and ground during the powered descent, these are the only exclamation points.

The transcriber was able to hear the exclamation points in Armstrong's spoken words!
What a talent!



The descent engine experienced excessive nozzle erosion if operated in the range between 65% and maximum thrust. Throttle down occurred when the thrust required by guidance sank to a level enough below that limit that a gradual increase through the end of the braking phase would not force a return to maximum (see Figure 6). Throttle down was a sensitive indicator of how well the guidance system was doing. It was also true that if the throttle stuck at maximum an abort might soon be necessary, because in about 40 seconds the guidance equations would command the spacecraft to invert.

It seems that the abort procedure was the magical solution to all the problems the AGC couldn't handle!
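The throttle-region constraint described above can be sketched as follows. The 65% erosion limit, the 94% maximum setting, and the 10500-pound rating come from the text; the margin used in the throttle-down decision is my assumption.

```python
# Sketch of the DPS throttle constraint: no sustained operation between
# 65% and 100% of rated thrust, so commanded thrust is either pinned at
# the maximum setting or kept below the erosion limit.
FULL_THRUST   = 10500.0   # pounds, official engine rating
MAX_USED      = 0.94      # flown at 94%, i.e. 9870 pounds
EROSION_LIMIT = 0.65      # erosion zone starts at 65% of rated thrust
MARGIN        = 0.05      # assumed margin below the limit before throttling down

def throttle_command(required_thrust):
    """Map guidance-required thrust to a permitted throttle setting."""
    frac = required_thrust / FULL_THRUST
    if frac > EROSION_LIMIT - MARGIN:
        return MAX_USED * FULL_THRUST   # stay pinned at the maximum setting
    return required_thrust              # safe region: follow guidance directly

print(throttle_command(8000.0))   # in the forbidden zone: pinned at maximum
print(throttle_command(6000.0))   # below the limit: passes through
```

The margin exists because, as the paper says, the required thrust had to sink far enough below the limit that its gradual rise through the rest of the braking phase would not force a return to maximum.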


While the LM was still facing the lunar surface Armstrong had clocked landmarks that indicated the LM was further downrange than desired. He realized now that the computer did not know the lander was going long. Otherwise the engine would have stayed at maximum thrust for longer as guidance tried to stop short.

The AGC was often working on the basis of wrong data; so it was not its fault if it made wrong decisions. No, it was not its fault; be gentle with it!


At MET 102:41:32, as the spacecraft passed through 7400 feet, sinking at 125 ft/sec, high gate was achieved. Guidance began using a new set of targets. The LM pitched forward so that the lunar surface was visible ahead. On the DSKY the mode register changed to 64 indicating the Visibility Phase, and Noun 64 replaced Noun 63. Two two-digit numbers replaced velocity in the top register. One was a "landing point designator" (LPD) angle that indicated where Armstrong should look along a reticle attached to his window to see where the LM would touch down if it were allowed to land automatically. The guidance system controlled yaw to keep the landing site along the line of the reticle. The crew could move a hand controller to shift the site. (Armstrong had stated before the flight that he planned not to use this capability, but there was apparently one inadvertent redesignation late in the visibility phase.) The second number gave the time remaining during which a redesignation could be input. With the redesignation logic now engaged, this was the busiest period of the landing.

So, on the display, the velocity was replaced by two numbers, one being an angle and the other a time!
The astronauts had better know when the change happened; imagine if they had thought that the angle and time were still the velocity!
(Or, conversely, that the display was already showing the angle and time when it was still showing the velocity.)



At MET 102:42:17 a 1201 alarm occurred. It was another Executive alarm — "No VAC areas available". About 24 seconds later there was another 1202. Just 16 seconds later, with the lander at 770 feet with a sink rate of 27 ft/sec, yet another 1202 occurred. Mission control in Houston called a "go" in each case. Neil Armstrong, whose heart rate rose from 120 to 150 during this period, put it this way:

Normally, in this time period, that is, from P64 onward, we'd be evaluating the landing site and starting LPD activity. However, the concern here was not with the landing area we were going into, but rather whether we could continue at all. Consequently, our attention was directed toward clearing the program alarms, keeping the machine flying, and assuring ourselves that control was adequate to continue without requiring an abort. Most of our attention was directed inside the cockpit during this time period and in my view this would account for our inability to study the landing site and final landing location during final descent.

So, Armstrong was so busy clearing the computer alarms that he couldn't properly choose the landing site!
This was a computer that required much attention; it was in need of love.



Nevertheless, Armstrong had time to notice that the LPD indicated "we were landing just short of a large rocky crater with very large rocks covering a high percentage of the surface". So at MET 102:43:08 (650 feet), after deciding that he could not stop short of the crater, Armstrong flipped the autopilot mode switch from AUTO to ATT HOLD to take manual control of the LM's attitude. He maneuvered to zero pitch to maintain horizontal velocity and skim over the rocky area.

Armstrong took control of the commands; I have the feeling that he was starting to lose confidence in the ability of the AGC to land the LM safely. (Some more "wrong" data, and it would have been a crash!)



(ATT HOLD meant the digital autopilot's Rate-Command Attitude-Hold mode, in which the astronaut could command an attitude rate by deflecting a joystick. After the stick was released the autopilot nulled rates to maintain the present attitude.)

At MET 102:43:20 (430 feet) Armstrong flicked a spring loaded toggle switch with his left hand, entering the rate-of-descent mode (P66). Now the computer controlled the spacecraft's thrust to maintain a rate-of-descent commanded by the ROD switch. A flick upward slowed the descent by one foot per second; a flick downward increased the descent rate by the same amount. Using the joystick, Armstrong tilted the LM to null out horizontal velocity and bring the LM to a safe area for touchdown. After some "possibly spastic" control motions because dust kicked up by the exhaust plume distorted his perception of translational velocity, at MET 102:45:40, Armstrong landed the spacecraft safely in the Sea of Tranquility.

Meanwhile, the AGC was still computing away to land the LM as safely as possible... Nobody had told it that the LM had already landed!
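The rate-of-descent (P66) mode described above amounts to a one-ft/sec-per-click command plus a thrust control loop; a minimal sketch, with an assumed proportional control law (the real throttle law was more involved).

```python
# Sketch of P66: each flick of the ROD switch moves the commanded
# descent rate by 1 ft/sec, and the computer adjusts thrust to track it.
class RateOfDescentMode:
    def __init__(self, initial_rate):
        self.commanded_rate = initial_rate   # ft/sec, positive downward

    def rod_click(self, direction):
        """direction = -1 (flick up: slow the descent) or +1 (flick down)."""
        self.commanded_rate += direction * 1.0

    def thrust_correction(self, actual_rate, gain=0.1):
        # Proportional correction toward the commanded rate (assumed law).
        return gain * (actual_rate - self.commanded_rate)

p66 = RateOfDescentMode(initial_rate=27.0)   # entered P66 sinking at 27 ft/sec
p66.rod_click(-1)                            # two upward flicks to
p66.rod_click(-1)                            # slow the descent
print(p66.commanded_rate)                    # 25.0 ft/sec commanded
```

Attitude stayed on the joystick (to null horizontal velocity) while the computer handled only the vertical channel, which is the division of labor Armstrong used for the final approach.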


* * *

Years before Apollo 11, when the guidance system was first being conceived, the onboard software was almost an afterthought — "Hal will take care of it" was the sentiment. In fact it ended up taking scores of people, with hundreds more in support, but to Hal Laning, in the early days, fell the job of figuring out how to organize the numerous software functions that must go on almost simultaneously in a real-time spacecraft control computer — in this case one of limited size and speed.

Hal's design avoided the pitfalls of a "boxcar" executive, in which the computations must be divided up explicitly between time slices. A boxcar executive is painful to implement because computations must be broken up arbitrarily. During development the allocation may need to be revised whenever any of its parts is modified or new functions are added. Worst of all, a boxcar executive is a brittle system during operation. It breaks down completely as soon as any function takes longer than the time it is allocated.

Instead, Laning envisioned a system in which software functions were allocated among various "jobs" that could be of any size and shape, as determined by the nature of their function. Each job was assigned a priority. The operating system always executed the job with the highest priority. Thus, if a low-priority job was executing and a high-priority job was scheduled, the low-priority job was suspended while the higher-priority job executed. This system gave the illusion that jobs ran simultaneously, although of course they merely took turns. Such a system was not deterministic in the sense that what executed when could be determined a priori, but its operation could be sufficiently understood and verified that in sum it enhanced reliability, safety, flexibility of use, and especially ease of development.
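Laning's priority scheme, as the paper describes it, can be sketched in a few lines. The job names and steps below are illustrative; real AGC jobs were assembly code whose state lived in core sets, not Python iterators.

```python
# Toy priority Executive in the spirit of the description above: jobs of
# any size, each with a priority; the highest-priority ready job runs,
# and a newly scheduled higher-priority job suspends the current one.
import heapq

class Executive:
    def __init__(self):
        self.ready = []          # max-heap via negated priority
        self.log = []            # (job, step) record, for illustration

    def schedule(self, priority, name, steps):
        heapq.heappush(self.ready, (-priority, name, iter(steps)))

    def run(self):
        while self.ready:
            neg_pri, name, steps = heapq.heappop(self.ready)
            for step in steps:
                self.log.append((name, step))
                # A higher-priority job is now ready: suspend this one.
                # The iterator keeps its place, like a saved core set.
                if self.ready and self.ready[0][0] < neg_pri:
                    heapq.heappush(self.ready, (neg_pri, name, steps))
                    break

agc = Executive()
agc.schedule(20, "SERVICER", ["nav", "guidance", "throttle", "display"])
agc.schedule(30, "DAP", ["attitude"])   # the autopilot outranks SERVICER
agc.run()
print(agc.log)   # DAP's step runs first, then SERVICER's four parts
```

This is the source of the "illusion that jobs ran simultaneously": they merely took turns, with the turn order set by priority rather than by fixed time slices.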

Wow, they revolutionized real time by giving the AGC functionalities... that you ordinarily find on real-time systems!
Now, I wonder how they could make the AGC run in a real-time environment:
there are not even any functions to start processes in the documentation of the AGC, nor any of the synchronization functions that a real-time system needs.
A real-time system needs a real-time kernel, which monitors the processes, and I fail to see one in the AGC.
Furthermore, a real-time system cannot work without a stack; a stack is vital for saving the data of the various processes when they interrupt each other.
The AGC does not have a stack; a subroutine cannot even call another one, for the return address is saved in a single register (and moreover the contents of the return address are also saved, which is aberrant).
It is obvious that the AGC is absolutely not fit to run in a real-time environment.
It can only run sequentially, with interrupts occurring either on a periodic schedule or on the initiative of hardware signals.



In such a design the Executive function that orchestrated the execution of jobs had to provide each job with a set of registers in which its status could be saved if it was suspended during the execution of a higher priority job. The LGC contained an array of eight such "core sets" of 12 registers each, each register having 15 bits. A core set of this size was sufficient for many jobs, but jobs that used the Interpretive language to do vector and matrix computations required more space. For such jobs an additional area of 43 registers was allocated for the storage of intermediate results. There were five such "Vector Accumulator (VAC) areas" in the LGC.
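The fixed pools of eight core sets and five VAC areas can be sketched directly, including the two Executive alarms this narrative mentions (1202 for no core sets, 1201 for no VAC areas). The pool sizes come from the text; the scheduling interface is invented for illustration.

```python
# Sketch of the Executive's fixed resource pools: a job request takes a
# core set (and a VAC area if the job uses Interpretive math); if none
# is free, the Executive raises the corresponding alarm.
CORE_SETS = 8   # eight core sets of 12 registers each
VAC_AREAS = 5   # five VAC areas of 43 registers each

class ResourcePool:
    def __init__(self):
        self.free_core_sets = CORE_SETS
        self.free_vac_areas = VAC_AREAS

    def schedule_job(self, needs_vac):
        if self.free_core_sets == 0:
            return "ALARM 1202"      # Executive alarm: no core sets
        if needs_vac and self.free_vac_areas == 0:
            return "ALARM 1201"      # Executive alarm: no VAC areas
        self.free_core_sets -= 1
        if needs_vac:
            self.free_vac_areas -= 1
        return "OK"

pool = ResourcePool()
results = [pool.schedule_job(needs_vac=True) for _ in range(6)]
print(results[-1])   # sixth Interpretive job: no VAC area left
```

Because there were fewer VAC areas than core sets, an overload of Interpretive-heavy jobs showed up as a 1201 while core sets still remained, which is exactly the mix of 1201 and 1202 alarms seen during the Apollo 11 descent.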

Allocating a given core set, or a VAC area, to a job doesn't make sense, because a job will not necessarily use all the registers of the core set or of the VAC area; the registers that are not used are lost to other jobs.
On the other hand, a job may need more storage than a core set or a VAC area provides.
Storage should be allocated to jobs dynamically by the compiler, so that they use just what they need and no more, with no waste of unused registers (especially since the AGC was so limited in memory, unlike modern computers, which can afford to waste it).




With a limited number of core sets and VAC areas, the allocation of functions to jobs had to be done thoughtfully. Functions that had a sequential relationship with each other were grouped into the same job. Thus the large SERVICER job that was active during the lunar landing (and other powered flight modes) first performed average-G navigation, then guidance equations, then throttle and attitude output, and then the updating of displays — each part using the outputs of the ones preceding.


A function could work across several jobs, with the data of each of these jobs (and some of that data could be common); so why encase it in a single job?



The availability of core sets and VAC areas limited the number of jobs that could be in the queue at any time to eight, of which up to five could require VAC areas. In normal steady-state operation, the number of jobs executed equaled the number being scheduled and therefore the average usage of core sets and VAC areas was more or less steady, although jobs that occurred on a one-shot or asynchronous basis might cause the usage to fluctuate.

Limiting the number of jobs merely by the availability of core sets and VAC areas doesn't make much sense; what would really limit the number of jobs is the time in which they must be processed, and the size of the stack if the AGC had one... but it has none!


However, if more jobs were being scheduled than were being finished, the number of core sets and VAC areas in use must rise. If the deficit continued long enough, the resources would be exhausted. The next job request could not be fulfilled.

Cut to a time about a year before Apollo 11, when we software engineers, who thought we already had enough to do, were requested to write the lunar landing software in such a way that the computer could literally be turned off and back on without interrupting the landing or any other vital maneuver! This was called "restart protection". Other factors than power transients also caused restarts. A restart was triggered if the hardware thought the software was in an endless loop, or if there were a parity failure when reading fixed memory, or for several other reasons.

How could the hardware know that the software had entered an endless loop?
And if there was a parity failure on fixed memory, restarting the computer would not fix the problem!
Unless the power was switched off and on again and the program reloaded; but, at that time, that was done from magnetic tape, and it was quite a long process; meanwhile the AGC would have remained inoperative, and the LM would have remained without guidance.



Restart protection was done by registering waypoints at suitable points during the operation of the software such that if processing happened to jump back to the last waypoint, no error would be introduced, as in the following example:

NEW_X = X + 1
register waypoint
X = NEW_X

It is evident that without the waypoint, going through this code a second time would cause X to be incremented twice.
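The effect of the waypoint can be checked with a small Python sketch. The function names and the way a "restart" is simulated (simply re-running code from a chosen point) are invented for this illustration.

```python
# Sketch of restart protection: a restart resumes at the last registered
# waypoint. With the waypoint placed between computing NEW_X and writing it
# back, redoing the code from the waypoint does not increment X twice.

def run_with_restart(restart):
    X = 5
    NEW_X = X + 1          # step 1: NEW_X = X + 1
    # --- waypoint registered here ---
    X = NEW_X              # step 2: X = NEW_X
    if restart:
        X = NEW_X          # resume at the waypoint: redo step 2 only
    return X

def run_without_waypoint(restart):
    X = 5
    NEW_X = X + 1
    X = NEW_X
    if restart:
        NEW_X = X + 1      # no waypoint: the whole sequence is redone
        X = NEW_X
    return X

print(run_with_restart(True))      # 6: the increment happens once
print(run_without_waypoint(True))  # 7: the increment happens twice
```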

These "registered waypoints" are absolutely ridiculous and absurd; they are extremely inconvenient to use; what if the job is in such a state that, when it has to restart, it must redo the instruction which comes after the waypoint?
Typically, it is rather the program counter of each job, and its essential data, that would be saved.
And how could these waypoints be registered? Another intelligence would be needed to save them.


Following a restart, such computations could be reconstructed. For each job, processing would commence at the last registered waypoint. If multiple copies of the same job were in the queue, only the most recent was restarted. Certain other computations that were not considered vital were not restart-protected. These would simply disappear if there were a restart.

More logically, it is at the location of the saved program counter that execution should resume; if the program restarts from the last registered waypoint, then the job's data should also be reset to the state it was in when the job passed that waypoint!
Multiple copies of the same job in the queue? This makes no sense; the AGC is not able to handle multiple copies of the same job.



Restart protection worked very well. On the control panel of our real-time "hybrid" simulator in Cambridge was a pushbutton that caused the AGC to restart. During simulations we sometimes pushed the button randomly, almost hoping for a failure that might lead us to one more bug. Invariably, once we got the restart protection working, operation continued seamlessly.


(The hybrid simulator combined SDS 9300 digital and Beckmann analog computers with a real AGC and realistic LM and CM cockpits.)

Restart protection was prompted by the possibility that the hardware could cause a restart, but the software could also initiate a restart if it reached a point where it did not know how to continue. This was done by transferring control to the tag BAILOUT in the Alarms and Aborts software. An error code accompanied this call.

It seems that the AGC systematically resorted to restarting as a common way of overcoming any problem it was incapable of handling.
Anyway, for the use it served, it did not matter a lot; it doesn't seem that the astronauts put much confidence in it!



This was the action taken by the Executive program if its resources were exceeded. If a job could not be scheduled because no "core sets" were available, the Executive called BAILOUT with alarm code 1202. If no "VAC areas" were available, BAILOUT was called with alarm code 1201.
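The resource check described here can be sketched in a few lines of Python. The pool sizes (8 core sets, 5 VAC areas) and the alarm codes (1201, 1202) come from the text; the function names and the request pattern are invented for this illustration.

```python
# Sketch of the Executive's resource check: a job request that cannot get a
# core set triggers BAILOUT with code 1202; one that cannot get a VAC area
# triggers 1201. Pool sizes are taken from the text above.
CORE_SETS = 8
VAC_AREAS = 5

class Bailout(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code

def findvac(core_in_use, vac_in_use):
    """Try to schedule one job that needs a core set and a VAC area."""
    if core_in_use >= CORE_SETS:
        raise Bailout(1202)      # no core sets available
    if vac_in_use >= VAC_AREAS:
        raise Bailout(1201)      # no VAC areas available
    return core_in_use + 1, vac_in_use + 1

core, vac, alarm = 0, 0, None
try:
    for _ in range(10):          # more requests than the pools can satisfy
        core, vac = findvac(core, vac)
except Bailout as b:
    alarm = b.code
print(alarm)  # 1201: the five VAC areas run out before the eight core sets
```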

In order to display the error code, the display job had to be called; but how could it start without resources, since SERVICER had drained them all?


Not all the functions executed in the LGC were "jobs". There was also a system of hardware interrupts, which could break in at any point (when not explicitly inhibited) to perform high priority functions. Interrupts were dedicated to particular functions including the digital autopilot, uplink and downlink, and keyboard operation.

Another interrupt could be used to execute any piece of code that had to be executed at a given time. Such functions, called "tasks", were scheduled by calling a subroutine called WAITLIST. A task had to be of very short duration.

Whereas jobs were scheduled to execute immediately at a given priority, tasks were scheduled to run at a given time. Tasks and jobs were often used together. A task might be scheduled to capture sensor data that needed to be read at a definite time, and the task in turn might schedule a job at an appropriate priority to perform processing based on the measurement.
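The task/job pattern described in this paragraph can be sketched as follows. The WAITLIST and Executive stand-ins, the sensor value, and all the names here are invented for this illustration; a real task would of course run from a timer interrupt, not a Python loop.

```python
# Sketch of the task/job pattern: a short, time-driven "task" reads a sensor
# at its scheduled time, then schedules a longer, priority-driven "job" to
# process the reading.
waitlist = []    # (time, task function) pairs: the WAITLIST stand-in
job_queue = []   # (priority, work description): the Executive stand-in

def schedule_task(t, fn):
    waitlist.append((t, fn))
    waitlist.sort(key=lambda pair: pair[0])   # earliest time first

def schedule_job(priority, work):
    job_queue.append((priority, work))

def read_radar_task():
    measurement = 42.0                        # pretend sensor reading
    # the task stays short: it hands the processing off to a job
    schedule_job(20, ("process_measurement", measurement))

schedule_task(0.08, read_radar_task)

now = 0.1
while waitlist and waitlist[0][0] <= now:     # run every task whose time came
    _, fn = waitlist.pop(0)
    fn()

print(job_queue)  # [(20, ('process_measurement', 42.0))]
```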


But how could an interrupt schedule a job? No function exists in the AGC to schedule a job!
What the interrupt might have done is update a piece of data from a sensor which was used by a running job.



When Hal Laning designed the Executive and Waitlist system in the mid 1960's, he made it up from whole cloth with no examples to guide him. The design is still valid today. The allocation of functions among a sensible number of asynchronous processes, under control of a rate- and priority-driven preemptive executive, still represents the state of the art in real-time GN&C computers for spacecraft.

It still represents the state of the art of fanciful Apollo computers, that's for sure...but with no successors!


* * *

To understand the root cause of the alarms on Apollo 11 during the powered descent, one must first look ahead to the rendezvous with the Command Module that followed the LM's ascent to lunar orbit. Just as it needed the landing radar to measure altitude and velocity with respect to the lunar surface during the landing, the LM, as the active vehicle during rendezvous with the CM in lunar orbit, needed the rendezvous radar (RR) to measure the range, range-rate, and direction of the other spacecraft.

The RR had several modes of operation, determined by the setting of its mode switch. As flown on Apollo 11, the available RR modes were SLEW, AUTO, and LGC. In SLEW and AUTO modes the radar operated under the control of the crew, independently of the LGC. This was the method that would be used during ascent and rendezvous if the primary guidance system failed. In SLEW mode the rendezvous radar antenna could be steered manually, but otherwise was stationary. Once the antenna was pointed near the target, the AUTO (automatic tracking) mode could be used to acquire and track the target. In these cases the RR range and range-rate, and the shaft and trunnion angles that defined where the RR antenna was pointing, were made available for display on cockpit cross-pointers and tape meters. Range and range-rate were also made available to the abort guidance system (AGS), a computer with only 6144 words of memory that was provided by TRW as a backup for use if the PGNS failed during lunar descent or ascent.

(The naming of the three rendezvous radar modes has been a source of confusion for some commentators. Based on crew input the designations were changed between LM-1 and the lunar landing missions. The mode called LGC on Apollo 11 was formerly called AUTO. The mode called AUTO on Apollo 11 was formerly MANUAL. SLEW was unchanged.)

That's absolutely hilarious: the AUTO mode on Apollo 11 formerly corresponded to a manual mode!
That must explain why, even in AUTO mode, the astronauts still had to make manual corrections to prevent it from putting the LM into a dangerous situation!



If the PGNS was healthy (as it always was) the radar was controlled by the LGC, and in this case the RR mode switch was set to LGC. The RR interface electronics made available to the software the target range and range-rate measured by the radar, and the angles of the RR antenna's shaft and trunnion, from which the direction to the target could be determined. Programs running in the LGC used this information to guide the LM to a favorable rendezvous.

It turned out that the rendezvous radar could also be operated during the powered descent, and this was done during Apollo 11. Crew procedures called for the RR to be switched on just before P63 was selected, and to be kept in SLEW or AUTO mode throughout the landing maneuver.

Many explanations have been offered for why the RR was configured in this way for the lunar landing. For example, a fanciful scheme for monitoring the landing by comparing RR data to a chart of expected readings may have been considered by some people in Houston. However, a simpler explanation is sufficient to explain the facts: The RR was on for no other purpose than to be warmed up if there were an abort, and it was in AUTO (while the LM was in a position to track the CM) or in SLEW (at other times), simply to keep the antenna from moving uselessly.



Figure 7: Interfaces Among PGNS, ATCA and the Rendezvous Radar

The diagram of figure 7 is absolutely hilarious, totally insane!
First, we can see that AUTO and SLEW do exactly the same thing.
Then, all the LGC position does is connect the RR switch to another reference of the same voltage of 28 volts oscillating at the same frequency of 800 Hz (which are completely fanciful values).
- The angle resolver (circled in blue) produces angle signals which are converted into angle pulses in the coupling data unit (circled in green); but these angle pulses are provided at two possible rates, 6.4 kpps (i.e. 6400 pps) or 400 pps. Why two rates? If the CDU can produce the pulses at 6400 pps, then it should always produce them at that rate, for it is in the LGC's interest to acquire them as fast as possible; the other rate is sixteen times slower than the first one, so why offer the possibility of using it (and how can the LGC know at what rate the pulses are produced)?
- The LGC provides a carrier at 1.024 MHz; this signal goes into the block "PCM and timing electronics", in which it is modulated with digital data; normally the block "PCM and timing electronics" should output a signal of the same frequency (but modulated with digital data); instead it outputs a signal at 1.6 kpps (equivalent to 1.6 kHz), that is, a signal with a frequency almost one thousand times smaller than the input frequency!
This diagram is completely absurd, and it is obvious that it fulfills no function at all; it's a mad doctor's diagram.



The problem has also been attributed (including by the author previously) to a "checklist error". This formulation is no more accurate than calling the delta-V monitor's premature shutdown of the engine on LM-1 a "computer error", when it was actually caused by faulty documentation. In fact, the RR switch settings on Apollo 11 should not have caused any problem. That they did so can be traced to another case of... faulty documentation.

Faulty documentation?
How is it possible that, on a project as big as the Apollo project, the documentation was made with so much carelessness?



Years previously, an interface control document (ICD) had been written to define the electrical interface between the PGNS and an electronic assembly called the attitude and translation control assembly (ATCA) that was provided by Grumman Aerospace, the builder of the Moon lander. The ICD specified that the 28-volt 800-Hz voltages in the two systems be "frequency locked", but did not say, "phase synchronized". As built, the two voltages were locked in frequency by a "frequency sync" signal sent by the LGC. They were also locked into a constant phase relationship. However, the phase angle between the two signals was completely random, depending on the instant at which the LGC, which was always powered up after the ATCA, began sending the first frequency sync signal. These interfaces are pictured in Figure 7.

No, it's impossible; if the signals were built to be at the same frequency with a constant phase angle, this phase angle could not depend on the moment the LGC was powered up.
The electronics could be adjusted to dephase the two signals more or less, but, for a given adjustment, the two signals would always be dephased by the same amount, whatever the moment the LGC was powered up.
One more absurdity!




The 800-Hz phasing problem was detected during launch site testing of LM-1 and documented — but it was never corrected. As a result, when the RR mode switch was in AUTO or SLEW, the shaft and trunnion resolvers were being excited by an 800-Hz signal from the ATCA that was very likely to be out of phase with the 800-Hz waveform used as a reference by the coupling data units (CDUs) whose job was to make sense of the resolver signals, and in turn increment (or decrement) the counters inside the computer that told the software how the antenna was pointed.

Another hilarious piece of nonsense.
A problem which was documented and never corrected!
The Apollo project seems to have been conducted with the greatest lack of seriousness: problems documented and not corrected!



On Apollo 11, however, the CDUs were being asked to comprehend a contradiction. Because they were based on a separately controlled excitation voltage, the resolver signals as received by the CDUs indicated no known angle. The discomfiture of the CDUs was at its worst when the phase angle between the two 800-Hz waveforms was near 90 or 270 degrees — and Apollo 11 evidently hit one of these sweet spots. The response of the CDUs was to increment or decrement the counters in the LGC, nearly constantly, at the maximum rate of 6400 pulses per second for each angle. This phenomenon occurred whenever the RR mode was in SLEW or AUTO, regardless of whether the rendezvous radar itself was powered up.

So the AGC was counting angle pulses even though the radar was not connected!
It was doing computations upon erroneous data, and this was supposed to safely guide the LM???
What a joke!!!
If the same 28 V reference had been used, there would have been no phasing problem at all, and there would have been no angle pulses when the radar was not connected.
They created problems which should not have existed!



The CDU interface counters in the LGC were incremented or decremented by means of external commands that were processed inside the computer as increment or decrement operations with names like PINC and MINC. Like the LGC's programmable operations, these took time, in this case one memory cycle of 11.7 microseconds, each. Moving at their maximum rate, the RR CDU counters consumed approximately 15% of the available computation time. At the time, conservatively, we assumed the time drain (called TLOSS) was about 13%, which was consistent with the behavior that was observed.
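The ~15% figure quoted above can be checked by simple arithmetic: two counters (one per angle, shaft and trunnion), each pulsed at 6400 pulses per second, each pulse costing one 11.7-microsecond memory cycle. A sketch of the check:

```python
# Check of the time drain quoted in the text: two RR CDU counters, each at
# 6400 pulses per second, each pulse stealing one 11.7-microsecond memory cycle.
pulses_per_second = 6400
counters = 2                 # shaft and trunnion angles
cycle_s = 11.7e-6            # one memory cycle, in seconds

tloss = counters * pulses_per_second * cycle_s
print(f"{tloss:.1%}")        # 15.0% of each second spent servicing the counters
```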

The hardware pulses were counted by software instructions in the AGC!
The AGC already lacked the time to perform the tasks it had to do, yet it lost still more time counting hardware signals which could have been counted by simple electronic devices!
Just by counting the hardware pulses with electronic circuits instead of counting them through software instructions, they could have spared many time cycles, which would have reduced the load and could have avoided restarts for reasons of overload.



I am indebted to LM guidance systems expert George Silver for his patient explanations of the rendezvous radar interface. Silver's role was pivotal during the Apollo 11 mission. He was at Cape Canaveral for the launch, then flew to Boston to get ready for an assignment to monitor the lunar ascent in Cambridge. On July 20 he watched the lunar landing at home on television. He heard the alarms, grasped that something was stealing CPU time, and remembered a case he had seen during LM-1 systems testing in which the rendezvous radar interface had caused wild counter activity. After some additional analysis by the team monitoring the mission in Cambridge, Silver finally got through to the MIT representatives in Houston, on the morning of July 21, less than one hour before lunar liftoff.


So George Silver got through to Houston to explain how the AGC program could have been modified if it had not presently been on the moon!
(He arrived just in time for the MIT representatives to say a prayer for the astronauts.)




I think that I must now give explanations on the way the guidance was done (or at least should have been done).
When the LM is expelled from the CM, it has a high horizontal speed, the same as that of the CM.
This speed creates a centrifugal force which compensates the lunar attraction and prevents the LM from descending to the moon.
The LM must lose its horizontal speed for two reasons:
- In order to decrease the centrifugal force, which will allow the lunar attraction to pull the LM toward the moon.
- And because, when the LM reaches the moon, it must have a null horizontal velocity.
So the LM starts in a horizontal orientation, and, with its main engine, creates a force which allows it to decrease its horizontal velocity.
As its horizontal velocity decreases, so does the centrifugal force which opposes the lunar attraction; the LM starts falling toward the moon; it has a vertical acceleration which causes an ever-increasing vertical velocity.
The LM must maneuver to null its vertical velocity when it reaches the moon, and it must also arrive at the moon vertical, and not horizontal.
So, progressively, as the horizontal velocity decreases, the LM pivots from a horizontal position to a vertical position; this is not done all of a sudden but progressively, as shown in figure 5.
The LM creates a force which opposes the velocities and creates decelerations on the two axes.
The way the engine thrust is distributed on the two axes depends on the orientation (or attitude) of the LM, as shown in figure 14.


Figure 14: The way the thrust of the engine is distributed on the two axes.
Note: Horizontal and vertical mean nothing without a reference; in this case, "vertical" means perpendicular to the surface of the moon, and "horizontal" means tangential to the surface of the moon.


If the LM is closer to horizontal than vertical, it's the horizontal force which will receive the greater part of the thrust, and vice versa.
So it is possible to obtain a given horizontal thrust and a given vertical thrust just by adjusting the couple thrust/attitude.
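The decomposition described here is ordinary trigonometry; a sketch follows, with the angle convention (theta measured from the vertical) and the numbers chosen purely for illustration:

```python
# Sketch of figure 14's idea: the engine thrust T, tilted by angle theta from
# the vertical, splits into a horizontal component T*sin(theta) and a vertical
# component T*cos(theta); conversely, a desired (Fh, Fv) pair fixes (T, theta).
import math

def components(T, theta):
    return T * math.sin(theta), T * math.cos(theta)   # (horizontal, vertical)

def thrust_and_attitude(Fh, Fv):
    return math.hypot(Fh, Fv), math.atan2(Fh, Fv)     # (T, theta)

T, theta = thrust_and_attitude(3000.0, 4000.0)        # illustrative newtons
print(round(T, 1), round(math.degrees(theta), 1))     # 5000.0 36.9
Fh, Fv = components(T, theta)
print(round(Fh, 1), round(Fv, 1))                     # 3000.0 4000.0
```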
But how could the LM be rotated to give it the desired orientation (or attitude)?
Very simply: on each side of the LM, at the places I have circled, you can see a pair of lateral engines, one horizontal and one vertical.
Why two?
Because the vertical lateral engines allow the LM to be rotated and given the desired attitude (and thus allow the distribution of the thrust on the two axes to be adjusted).
As for the horizontal lateral engines, they allow the LM to be moved laterally, and are used when the LM is close to the moon at low speed, to allow the crew to choose the appropriate landing site (and only they can do this maneuver, for the LM has no eyes to see the terrain).
The horizontal thrust must be adjusted so as to null the horizontal velocity when the LM arrives at the moon, and likewise the vertical thrust must be adjusted to null the vertical velocity when the LM arrives at the moon.
So the computer computes the horizontal and vertical decelerations the LM should have in order to arrive at the moon with null horizontal and vertical velocities.
It compares them with the actual decelerations which are read from the accelerometers.
The aim is to make the measured decelerations given by the accelerometers reach the computed desired decelerations; by differencing the desired values with the actually read values, the computer makes a correction on the thrust and the attitude which tries to make the next measured values closer to the desired values of the decelerations.
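Under the constant-deceleration simplification implied here, the required deceleration on each axis follows from the kinematic relation v² = 2·a·d. A sketch, with purely illustrative numbers (not mission data):

```python
# Sketch of the target computation: to null a velocity v over a remaining
# distance d at constant deceleration, kinematics give a = v**2 / (2*d).
# All numbers below are illustrative only.

def required_decel(v, d):
    return v * v / (2.0 * d)

v_horiz = 1500.0   # m/s remaining horizontal velocity (illustrative)
d_horiz = 450e3    # m   remaining downrange distance (illustrative)
v_vert  = 40.0     # m/s downward velocity (illustrative)
d_vert  = 8e3      # m   remaining altitude (illustrative)

print(round(required_decel(v_horiz, d_horiz), 2))  # 2.5 m/s^2
print(round(required_decel(v_vert, d_vert), 2))    # 0.1 m/s^2
```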

I must now explain how the guidance of planes is usually done; figure 15 shows how it works: the pilot has a desired tilt, which he gives to the computer; the computer continuously compares the desired tilt with the actual tilt measured by a sensor, and computes a command which tries to compensate the difference and make the next measured tilt closer to the wanted one; this is done continuously at a given period (which does not need to be extremely fast). This system of guidance is extremely efficient: the measured tilt very quickly reaches the commanded one, and follows it with great regularity, in spite of occasional perturbations.
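The loop of figure 15 is a classic proportional feedback; a minimal sketch follows, in which the gain and the response of the controlled system are invented for the illustration:

```python
# Sketch of figure 15's loop: at each period, compare the commanded tilt with
# the measured tilt and apply a correction proportional to the difference.
# The gain value and the idealized plant response are illustrative only.

def track(commanded, measured, gain=0.5, steps=20):
    history = [measured]
    for _ in range(steps):
        error = commanded - measured
        measured += gain * error     # the correction moves the measurement
        history.append(measured)
    return history

h = track(commanded=10.0, measured=0.0)
print(round(h[1], 3), round(h[-1], 3))  # first step closes half the error,
                                        # and the tilt converges near 10.0
```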


Figure 15: How the guidance works on a plane.


The AGC works exactly the same way: it compares the decelerations given by the accelerometers with the values it has computed to reach the moon with null velocities, knowing that these computed values depend on the LM's current horizontal and vertical velocities, and on its altitude, which it continuously updates. From the difference between the expected values and the measured values, the computer computes a correction to the thrust and the attitude of the LM. For instance, if the computed horizontal deceleration is greater than the currently read horizontal one, and the computed vertical deceleration is also greater than the currently read vertical one, but the difference between the vertical decelerations is greater than the difference between the horizontal ones, the computer knows that it must increase the thrust of the LM and also change its attitude (by playing on the lateral vertical engines) to make it more vertical.
So, step by step, the values read on the accelerometers stick more and more closely to the expected values which allow the LM to reach the moon with null velocities; this happens quite fast, and afterwards only small corrections are made at each step, which allow the read values to follow the expected ones ideally. On the moon, unlike on the earth, there are no atmospheric perturbations, so the guidance should be quite smooth.
In fact, the velocities will not be nulled exactly at the moon's surface, but a little before, at a moderate altitude; this altitude allows the crew to view the lunar surface and to choose an appropriate landing site by playing on the horizontal lateral engines (the vertical lateral engines being used only to change the attitude of the LM in the previous phase).
Once they are over a spot (no hole, no rock) which suits them, the astronauts move the LM down at moderate speed, with a final deceleration to land it gently on the moon; the landing radar allows them to adjust the thrust ideally for a smooth landing.
These explanations are necessary to make you understand why what follows is absurd.



* * *

The lunar landing was the busiest mission phase on Apollo. Landing guidance had to hit targets that were defined in position, velocity, acceleration (so the LM would stay right side up), and one dimension of "jerk" (the rate of change of acceleration). During the visibility phase the software permitted the crew to redesignate the landing site. The throttle had to be controlled continuously. Navigation had to incorporate landing radar measurements. (Figure 8 shows the typical duty-cycle profile between the selection of P63 and touchdown.)


"Redesignate" means nothing, because it suggests that the computer could have chosen a landing place by itself; the computer could in no way determine the spot of the ideal landing place, for it had no eyes to see the terrain; only the astronauts could do that. Once the automatic guidance had reached a point a little above the moon with null velocities, the astronauts started maneuvering the LM with the lateral engines to bring it to an ideal landing place they could see with their own eyes, one which presented no danger for them.





Figure 8: Duty Cycle During Powered Descent (Simulation Data)


Even so, we had tried to make our programs fast enough to preserve some margin against TLOSS from an unknown source. The chief constraint was the two-second period that was built into the average-G navigation used during powered-flight. This was the frequency at which the READACCS task read the accelerometers and scheduled the big SERVICER job that used those readings as the starting point for a new round of navigation, guidance, throttle, attitude-command, and display. During the lunar descent, duty-cycle simply describes how much time was used in aggregate by jobs, tasks, and interrupts, during each 2-second period.

They had to preserve a margin against a loss from some unknown source, but they never cared to find out what this "unknown" source stealing time from the CPU was, even though it could endanger the computer's operation.
Besides, "unknown source" has no meaning on a computer: anything which runs on a computer has been programmed by an engineer, and all the sources are thus known; if the engineers don't even remember what they have programmed, that's rather worrying!
Furthermore, as I explained above, they made things worse by making it perform a task it shouldn't have, namely counting angle pulses with software instructions.



During the braking phase, up to the time the landing radar locked onto the surface, the duty-cycle margin was over 15%. After the radar acquired, the extra computations involved in converting the body-referenced radar data to the navigation coordinate system lowered the margin to perhaps 13%. When a monitor display such as Verb 16 Noun 68 was added, the margin shrank again, to 10% or less. Buzz Aldrin was perceptive when he said after the second 1202 alarm, "It appears to come up when we have a 1668 up".
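The margins quoted in this paragraph can be put side by side with the ~13% counter drain discussed earlier; when the drain exceeds the remaining margin, each cycle falls behind instead of finishing early. A sketch of the comparison (the phase labels are mine):

```python
# Check of the quoted duty-cycle margins against the ~13% counter drain:
# a phase is overloaded when the drain exceeds its margin.
drain = 0.13                        # TLOSS from the RR counters
for phase, margin in [("braking, pre-radar", 0.15),
                      ("radar acquired", 0.13),
                      ("with V16 N68 display", 0.10)]:
    shortfall = drain - margin
    print(phase, "overloaded" if shortfall > 0 else "survives")
# Only the display case is overloaded: a 13% drain against a 10% margin.
```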

So the simple fact of adding a display makes the duty-cycle margin fall from 13% to 10%!
One may wonder about the performance of the AGC program!
And the astronauts seemed to have more work clearing the alarms than actually guiding the ship!



With a 10% margin and a 13% drain, the LGC simply did not have enough CPU time to perform all the functions that were required. Thanks to the flexibility of the Executive design — and quite unlike what would have happened with a boxcar structure — there was no collapse.

So they have a drain from an "unknown" source whose origin they don't care to find, and they let the duty-cycle margin fall below this drain by adding a display (one may wonder how a mere display could make the margin fall that much), and that created an overload situation which caused a restart!
But everything is OK, since the "restart" is able to recreate a situation which is no worse than the one before the restart... which is not very difficult, since the situation before the restart was probably not very helpful for the navigation anyway!





Table 1: Jobs Active During the Lunar Landing

Table 1 lists the jobs that were active during the Apollo 11 powered descent. SERVICER carried the lowest priority, but was also by far the longest. The higher-priority jobs that could break in on SERVICER were all of relatively short duration.

This table of jobs is completely incoherent: they have just said that SERVICER has the lowest priority (because of its size) and hence can be interrupted by any other job.
But SERVICER contains very important functions, like the guidance, which should be at a high priority, because they control the navigation of the ship.
On the other hand, there is a job for display and a job for keyboard management which are at higher priorities than SERVICER!
It's completely absurd, because, in a real-time application, human-interface functions, like display and keyboard management, are always at the lowest priority.
The guidance belongs to a type of task which should normally be at high priority, because its correct working requires that it be processed regularly.
And the job for IMU gyro compensation makes no sense, because there is no way a compensation can be computed from the supposed motion of the IMU.

Having a relatively low priority because of its size, SERVICER got last crack at the available computation time. With a negative time margin it was SERVICER that had not yet reached its conclusion when the next READACCS, running punctually, scheduled SERVICER again. Because it had not reached its end, the earlier SERVICER had not released its core set and VAC area — so the next time READACCS called FINDVAC to schedule SERVICER the Executive assigned a new core set and VAC area. That SERVICER also did not finish. After a short span of such operation the Executive exhausted its supply of core sets and/or VAC areas. When the next request was made the Executive, unable to comply, called BAILOUT with a 1201 or 1202 alarm code.
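The pile-up described in this paragraph can be simulated with a toy model: every 2 seconds a new SERVICER is scheduled and claims a core set; if SERVICER needs more than 2 seconds of CPU to finish, unfinished copies accumulate until the pool of 8 core sets is exhausted. The model (and its timing numbers other than the 2-second period and the pool of 8) is invented for this illustration, not real AGC timing.

```python
# Toy model of the SERVICER pile-up: each 2-second cycle schedules one new
# SERVICER; the CPU retires at most one period's worth of backlog per cycle;
# the alarm fires when a ninth core set would be needed.
import math

CORE_SETS = 8

def cycles_until_alarm(servicer_time, period=2.0, max_cycles=100):
    backlog = 0.0                    # seconds of SERVICER work not yet done
    for cycle in range(1, max_cycles + 1):
        backlog += servicer_time     # READACCS schedules a new SERVICER
        in_use = math.ceil(backlog / servicer_time)   # unfinished copies
        if in_use > CORE_SETS:
            return cycle             # ninth core set requested -> BAILOUT 1202
        backlog = max(0.0, backlog - period)          # CPU works one period
    return None                      # SERVICER keeps up: no alarm

print(cycles_until_alarm(1.9))   # None: SERVICER fits in the period
print(cycles_until_alarm(2.5))   # 37: the pool is exhausted after 37 cycles
```

The point of the sketch is the mechanism: a per-cycle deficit, however small, eventually exhausts a fixed pool of core sets.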



That's hilarious:
Because SERVICER had the lowest priority, it was processed after all the other processes.
But because SERVICER was reserving a core set and a VAC area each time it was scheduled, the other processes which had priority over SERVICER could not run for lack of available core sets and VAC areas, which SERVICER had drained (and was not releasing, because the other jobs were preventing it from running fast enough)!
This is the dumbest real-time system I have ever seen!




Figure 9: SERVICER Operation, With and Without TLOSS

Figure 9 illustrates how SERVICER behaves in the presence of severe TLOSS, and Figure 10 compares plots of core set and VAC area usage for a normal case, and a high TLOSS case in which restarts occur.



Figure 10: Effect of TLOSS on Executive and Waitlist Resources During Lunar Descent

The fanciful effect of TLOSS, with abusive use of copy and paste.
These diagrams are a little too repetitive to be credible.
There is quite a similarity between the use of the core sets and the use of the VAC areas, whereas the processes which use them are quite different in nature.


(Simulation data, starting in P63 before acquisition of radar velocity data,
ending at touchdown. Note that plots have different vertical scales.)

The interesting effect of this train of events, during P63, was that the problem fixed itself. The software restart reconstructed only the most recent incarnation of the SERVICER job, and flushed the uncompleted SERVICER "stubs" that had accumulated. In addition, it terminated functions that had not been restart protected because they were not deemed critical — including the DELTAH monitor Verb 16 Noun 68. This is why, following the two alarms in P63, the display returned from Noun 68 to Noun 63.

The "interesting" effect of the restart is that the AGC had reached such an erratic behavior that restarting it could not make the situation worse!


Here a system of restart protection that was primarily motivated by the possibility of hardware glitches synergistically provided a means to shed computational load in response to a software logjam caused by TLOSS. We had devised a real-time control system that under certain conditions was "fault tolerant".

So the hardware input that the AGC was unable to acquire correctly created an increasingly deteriorating situation, and restarting eliminated the effects of the previous bad acquisitions; after the restart, the AGC was sane again, ready to deteriorate from incorrect hardware acquisition once more!
Very fault tolerant indeed: it could restart before the incorrect hardware acquisition completely jammed it!
But why has nobody thought of working this way before Apollo... nor after it, either?



During P64 the situation was different. Added to the regular guidance equations was new processing that provided the capability to redesignate the landing site. With this addition, the essential software by itself left a duty-cycle margin of less than 10%. The alarms kept coming. There were three 1201 and 1202 alarms within 40 seconds. Each time, the software restart flushed the Executive queue but could not shed load.

During the landing phase, the phase which required the greatest attention from the astronauts, the computer alarms occurred more and more often, with systematic restarts, disturbing the astronauts at the worst moment.
Instead of being a help for the landing, the computer was exactly the opposite: a nuisance.
It's a miracle the LM didn't crash!



At MET 102:43:08, forestalling the next alarm, Armstrong switched the autopilot from AUTO to ATT HOLD mode, easing the computational burden, and then entered semi-manual mode P66, where the burden was still lighter. After 2 minutes and 20 seconds spent maneuvering in P66 without alarms, the LM landed.

Understand: Armstrong took manual control because he realized that the AGC was about to crash the LM on the lunar ground!
A computer which doesn't do much because it can't stand the load is not very useful!



* * *

Five months later Apollo 12 survived a lightning strike during boost and landed on the Moon. Thanks in part to a new noun (69) that we had defined to allow the crew to make position corrections based on ground tracking data during the braking phase, astronauts Pete Conrad and Alan Bean were able to land the LM within an easy walk of an unmanned Surveyor spacecraft that had landed on the Moon in April, 1967. Apollo 12's pinpoint landing paved the way for landings in more difficult terrain.

So, Apollo 12 was saved because the programmers allowed the astronauts to take decisions instead of the computer, and to correct the errors of the computer.
Maybe the best option would have been to switch the AGC off entirely; that might be how it would have worked best!



It was only after Apollo 12 that we began to understand the other serious problem.

It started when Clint Tillman of Grumman Aerospace (the builder of the Lunar Module) noticed throttle oscillations during simulations of the final descent, on the order of 5% of the DPS thrust. This prompted Tillman to examine telemetry data from Apollo 11 and 12, where he noticed throttle oscillations during the final landing phases that were on the order of 25% peak to peak. (See Figure 12.) This was the period when the Commander was simultaneously using the ROD switch to control altitude-rate and the joystick to maneuver the vehicle. Because plots of this data resembled the battlements and turrets of a castle (or a castellated nut) this problem got to be known as "throttle castellation".


There could not be such oscillations: without an atmosphere, there are no perturbations on the moon, so the guidance had every reason to remain smooth, with no oscillations. And the last thing the crew should have done is counter the computer's guidance by introducing artificial perturbations with a switch and a joystick... unless the computer was working very badly (and numerous restarts don't help), so that the crew had to compensate for its erratic behavior; but in no way could the crew do it as well as a correctly working computer.





Figure 11: First Report of Throttle Castellations

Klumpp, in Cambridge, traced the excitation that caused the oscillations to a previously unrecognized phenomenon that came to be called "IMU bob". The IMU was located above, and about four feet in front of, the center-of-mass of the vehicle. Small but rapid pitch maneuvers, such as those required during final descent, slung the IMU in a way that was interpreted by the accelerometers as a change in the vertical velocity of the vehicle. This in turn affected the calculations of altitude-rate, and the estimate of thrust.


This explanation is totally ridiculous: the guidance obviously remained smooth (there are no atmospheric perturbations on the moon), so the IMU had no reason to be slung that way and perturb the readings of the accelerometers.
If things really came to the point that the IMU was effectively slung that way, it meant that the LM was behaving very dangerously, that the computer was unable to guide it efficiently as it should have, and that it had every chance of crashing on the moon!



But this theory only partially explained the throttle behavior observed in the flight data.

Rocket engines that can be throttled were and still are unusual, but a throttleable engine was a necessity for making a soft landing on the Moon. A fixed-thrust engine and a very simple guidance equation could put a spacecraft through a spot on the lunar surface. But to get there right side up, moving slowly, with visibility and the ability to hover while choosing a landing area, required an engine that could balance lunar gravity while varying its thrust as the vehicle's mass decreased, as the vertical component of the thrust vector changed during attitude maneuvers, and as the astronaut requested changes in the descent rate.


It's not the mass of the vehicle which decreases, but its velocity; its mass remains the same.
In no way must the astronaut change the descent rate, which is determined by the computed trajectory.
If he did, the descent rate might become completely improper for nulling the velocities at the level of the moon (or a little before).
Eventually the computer might still have time to correct this descent rate and catch a trajectory allowing it to null the velocities at the level of the moon; but in some cases the computer might not have the time to catch such a trajectory, and would not null the velocities before reaching the lunar surface... which would result in a splendid crash!
Unless, of course, the computer works in such a disastrous way that the crew realizes that, if they let it continue, it will crash the LM on the moon; then it's still better to make manual corrections, even if these manual corrections cannot be as good as the ones made by a correctly working computer.



The guidance equations determined what acceleration was required, both in magnitude and direction. The autopilot maneuvered the vehicle to satisfy the thrust direction commanded by guidance. It was up to the throttle-control program to control the magnitude. Throttle-control started by computing the LM's mass. Knowing mass, it determined the magnitude of the thrust correction required to change vehicle acceleration from that measured by the accelerometers to that commanded by the guidance equations, converted this to the units used by the throttle assembly (about 2.8 pounds per pulse), and sent it to the hardware.
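The sequence the report describes (mass, thrust correction, conversion to throttle pulses) can be sketched as follows; everything here except the 2.8 pounds-per-pulse figure is an illustrative assumption, with made-up numbers and simplified units:

```python
# Sketch of the throttle-control step described above (hypothetical code,
# not AGC software): from commanded vs. measured acceleration to a count
# of throttle pulses, using the stated scale of about 2.8 pounds per pulse.

POUNDS_PER_PULSE = 2.8   # throttle assembly scale factor given in the report

def throttle_command(mass_lb, a_commanded, a_measured):
    """Return the throttle pulse count to send to the hardware.
    Accelerations in ft/s^2, mass in pounds-mass (converted to slugs),
    so the thrust correction F = m * delta_a comes out in pounds-force."""
    delta_a = a_commanded - a_measured              # required change in acceleration
    thrust_correction = mass_lb / 32.174 * delta_a  # slugs * ft/s^2 = lbf
    return round(thrust_correction / POUNDS_PER_PULSE)

# Illustrative case: a 33,000-lb vehicle needing 0.5 ft/s^2 more acceleration
print(throttle_command(33000, a_commanded=5.5, a_measured=5.0))
```

The point of the mass term is visible here: the same acceleration error translates into a different thrust correction as the vehicle gets lighter, which is why the routine began by computing the LM's mass.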

This is completely ridiculous and absurd: the computer doesn't have to compute the mass, it's irrelevant for the guidance; only the altitude, velocities, and decelerations are. The unit conversion is ridiculous too: the whole system could work with the same units; converting units uselessly wastes time (and the AGC is already short of time).



The accelerometers in the IMU did not really measure acceleration; they merely counted velocity increments since the last reading. Because a throttle change commanded on the previous guidance pass occurred at some time between the accelerometer readings, the measured delta-V did not show the full effect of the most recent adjustment.
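The partial-effect phenomenon the report describes is simple to quantify; here is a toy calculation with made-up numbers:

```python
# Sketch (illustrative numbers, not AGC code): why a throttle change
# commanded part-way between accelerometer readings shows only part of
# its effect in the measured delta-V for that period.

def measured_delta_v(a_old, a_new, period, change_time):
    """Velocity increment accumulated over one guidance period when the
    acceleration steps from a_old to a_new at `change_time` into it."""
    return a_old * change_time + a_new * (period - change_time)

# Acceleration commanded to rise from 4 to 6 ft/s^2; 2-second period;
# the change takes effect 1.2 s into the interval.
dv = measured_delta_v(4.0, 6.0, period=2.0, change_time=1.2)
print(dv)   # less than the 12 ft/s a full period at 6 ft/s^2 would give
```

The measured increment reflects a time-weighted mixture of the old and new thrust levels, so the most recent adjustment is systematically under-represented in the reading.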

Even if there is a little delay between the current accelerometer readings and the previously commanded throttle change, it doesn't matter much, because the correction is continuous and extremely efficient without needing to be very fast, especially since there are no perturbations on the moon and, between two steps, the changes are not very important.




Figure 12: Throttle Excursions During Apollo 12 P66

Throttle control had to compensate for this effect. The amount of compensation depended on when during the guidance period throttle commands were issued, and it also depended upon the rapidity with which the engine followed the throttle command. The applicable ICD stated that the throttle time lag was 0.3 seconds.


Firstly, there is no way that the lag time of the throttle could be compensated.
"Compensating" would consist in using, instead of the current readings of the accelerometers, a prediction of what these readings will be after the lag time of the engine.
But the readings of the accelerometers cannot be predicted (if they could, they would not be needed in the guidance), so it is not possible to compensate this lag time.
Secondly, SERVICER ran every two seconds (or even longer, because sometimes SERVICER could not manage to finish its job within this period), which means that the thrust of the engine remained the same, not updated by the readings of the accelerometers, for (at least) two seconds; so it hardly makes a difference whether the engine had a lag time of 0.3 seconds (and the compensation even ended up being only 0.2 seconds).
Thirdly, SERVICER could be interrupted by any other job at any moment (because it had the lowest priority); that means that, between the reading of the accelerometers and the application of the correction using those readings, one job (or even several) could interrupt SERVICER and create a variable delay between the readings and their use, a delay which could eventually exceed the engine's lag time.
If the lag time of the engine had to be compensated, then this variable delay between the reading of the accelerometers and the application of the correction should also have been compensated; but even though a compensation could be applied if this delay were known, it is impossible here precisely because this delay is unknown and variable!
So we are swimming in an ocean of absurdities!




It fell to the author to program and test the throttle-control routine. In plots produced by a simulation that accurately modeled the DPS using the time lag of 0.3 seconds, I observed the oscillation that occurred in the actual thrust level after a large throttle change was commanded without compensation for the throttle lag. When I compensated for 0.1 second I saw that the oscillation was reduced. When I compensated for 0.2 seconds the oscillation appeared to be virtually eliminated. There the matter rested.


Without compensation, the throttle had no reason to oscillate, even with a little delay between the command and its effect; on the other hand, the more the command is compensated, the more uncertain the extrapolated compensation becomes, and the more the throttle will tend to oscillate.
So adding compensation, instead of countering an oscillation which has no reason to exist, creates an oscillation which does exist!




Klumpp remembers me saying, "It's just like medicine, don't give it more compensation than it needs".
Klumpp knew it was not "just like medicine", but he never insisted that I program the correct number. Examining his motives 15 years later, Klumpp wrote: "

I thought it was important to nurture self-reliance, to let coworkers' decisions on small matters prevail, even when not optimum. So I withheld my thoughts and let Don's decision stand, at least until he might reconsider it independently.


I like how the programmers disagree on the amount of compensation, and one lets the other program his own compensation "to nurture his self-reliance", though he thinks this compensation is not "optimum".
Yet an error in this compensation could have serious consequences and cause dangerous behavior of the LM (supposing that this compensation has a beneficial effect on oscillations, which it doesn't).
But a good understanding between the programmers is more important, isn't it?



Examining my own motives, I believe that the annoyance I felt toward the compensation terms for cluttering up my throttle logic may have translated into a desire to compensate no more than necessary. Be that as it may, both Apollo 11 and Apollo 12 flew with 0.2 seconds of compensation for a 0.3 second throttle delay.

What's incredible is that the author seems to know what compensation was needed, and he nevertheless applied less than this compensation!


But now both Klumpp's analysis, and an independent report prepared by J. A. Sorensen at Bellcomm, concluded that "The oscillatory character of the P66 throttle command was apparently due to the actual value of the descent engine time constant being smaller than that assumed" (Sorensen). Klumpp tracked it down. The performance of the descent engine had been improved, but the ICD was not modified accordingly. The actual time lag for the descent engine was about 0.075 seconds. It turned out we had overcompensated. As a result the throttle was barely stable.

Klumpp's analysis had an even more startling result. It showed that if the software had compensated at 0.3 seconds on Apollo 11, the throttle would have been unstable. The throttle oscillations, instead of settling down, would have become greater. Following throttle-down in P63, or perhaps in P66 under the excitation of IMU bob, the DPS engine would have rapidly oscillated between minimum and maximum thrust. No doubt mission control, quite logically, would have linked the throttle behavior to the 1202 alarms that were occurring for entirely independent reasons.
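The sensitivity described here can be reproduced with a toy model. What follows is a deliberately simplified sketch (hypothetical Python, not the LGC algorithm): a first-order engine lag, a two-second guidance period, and a controller that corrects the command from the averaged delta-V while compensating with an *assumed* time constant. In this toy loop, matching the real lag settles at once, while assuming a lag much larger than the real one makes the throttle hunt around the target:

```python
import math

def unseen(tau, dt=2.0):
    """Fraction of a commanded thrust step NOT yet visible in the delta-V
    averaged over one guidance period dt, for a first-order lag tau."""
    return (tau / dt) * (1 - math.exp(-dt / tau))

def run(tau_actual, tau_assumed, dt=2.0, target=100.0, passes=12):
    """Toy closed loop (illustrative only): each pass the controller
    corrects the command from the measured average thrust, subtracting
    the part of the previous command it *believes* is still in the pipe."""
    thrust, cmd, prev_inc, avg = 0.0, 0.0, 0.0, 0.0
    errors = []
    for _ in range(passes):
        inc = (target - avg) - prev_inc * unseen(tau_assumed, dt)
        cmd += inc
        # true engine response over one period (first-order lag)
        avg = cmd - (cmd - thrust) * unseen(tau_actual, dt)
        thrust = cmd - (cmd - thrust) * math.exp(-dt / tau_actual)
        prev_inc = inc
        errors.append(target - avg)
    return errors

matched = run(tau_actual=0.075, tau_assumed=0.075)  # settles immediately
over = run(tau_actual=0.075, tau_assumed=0.3)       # oscillates around target
```

In this simplified model the overcompensated loop's oscillation decays rather than diverging; reproducing the full "unstable at 0.3 seconds" result of Klumpp's analysis would need the actual LGC control law, which the report does not give.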

In a project as important as Apollo, you might expect good communication between the engineers; yet it seems that, when a hardware improvement was made, the software team was not even warned of it, and went on programming as if there had been no improvement, with consequences which might go as far as being fatal!
It only worked because the author had decided to compensate less than necessary (with the disapproval of his superior).
The funniest thing is that there is no way any compensation could be applied at all!




An abort would have been inevitable. With all modesty, it appears to be the case that if the author had coded the "correct" compensation number in the throttle-control routine, Apollo 11 would not have landed. I invite someone with no personal stake and a grasp of the mathematics to reexamine this theory.

That's really hilarious: if the "correct" compensation had been coded, relying on a state of the engine that did not incorporate an improvement the software team had not been warned about, the LM would have crashed on the moon!
The LM was saved because Klumpp did not insist that his subordinate program the correct value, in order to "nurture his self-reliance".
Super big LOL, that's really a Hollywood scenario!
Especially when in fact it was not possible to apply any compensation!




* * *

We fixed IMU bob by removing the velocity changes caused by IMU motion from the acceleration measurements. We corrected the throttle time lag and simulations showed that this indeed fixed the throttle instability. Neither fix was on Apollo 13, but that mission was not able to attempt a lunar landing.

And they only settled this problem after several moon landing missions!



Curiously, a change made before the throttle problem came to light, which was on Apollo 13, would have offered a backup if the automatic throttle had failed. A new noun (92) was defined that the crew could select to see the throttle level desired by guidance. Logic that would have terminated automatic guidance if the throttle were (or appeared to be) switched to MANUAL was removed. These changes let the astronaut take control of the throttle during P63 or P64 while guidance continued to command attitude. I do not know whether these difficult procedures were ever practiced.

Absolutely hilarious: the crew could select the throttle level desired by guidance!!! That's contradictory, for if there is a throttle level desired by guidance and the astronauts change it, it will no longer be the one desired by guidance!
And letting the crew take control of the throttle while the computer kept control of the attitude makes absolutely no sense at all, and is totally absurd: the couple thrust/attitude determines the magnitude of the horizontal and vertical thrusts and cannot be separated; the control of the thrust and of the attitude must be given to one and the same intelligence, and the best is to give it to the computer (if it is working correctly).
It's a little as though you were driving a car and said to your passenger: "I'll now close my eyes, and you watch the road, but I keep the steering wheel"!
This type of change is the kind which could guarantee a crash of the LM on the moon.
One more absurdity to add to the already long list!



The problem of the Executive overload alarms was dealt with several times over.

The rendezvous radar mode switch was placed in LGC for ascent. For future missions the descent checklist was changed. Meanwhile we added logic to LUMINARY to check the rendezvous radar mode, and if it was not in LGC, send a signal to zero the rendezvous radar counters.

Allan Klumpp studied the Executive problem from another angle. He discovered that under conditions in which TLOSS occurred intermittently, or when the level of computer activity fluctuated in the presence of TLOSS, it was possible for incomplete SERVICER jobs that had been interrupted during the issuance of attitude commands, but had not yet been flushed by a software restart, to be resumed at a later time — with the possibility that inappropriate attitude commands could be issued to the autopilot. In time for Apollo 13 Klumpp devised a fix in which an occasional whole SERVICER job would be dropped to catch up, if necessary.

How could he determine in which cases a SERVICER job had to be wholly dropped?

But for the future, none of these changes provided fundamental relief from the constraint of the fixed, two-second guidance period. A terrain model needed to be added to the landing radar routines to allow landing in difficult terrain. Guidance modifications were waiting in the wings. Where would the time come from?

We developed a concept we called "variable SERVICER", in which the guidance period was allowed to stretch if it needed to. Fears that the two-second interval was built inextricably into the software proved unfounded. It was only necessary to measure the guidance period and use that value explicitly in place of the two seconds that was implicit in a few calculations. We got variable SERVICER working in an offline version of LUMINARY, and demonstrated its immunity to very high levels of TLOSS.
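The change described is small, which is presumably why the two-second interval proved not to be "built inextricably into the software". A minimal sketch of the idea (hypothetical Python with made-up numbers, not LUMINARY code):

```python
# Sketch of "variable SERVICER": measure the actual guidance period and
# use the measured value explicitly in the state update, instead of the
# implicit 2-second constant.

def servicer_pass(state, accel, t_now, t_prev):
    """One guidance pass with a stretchable period: the elapsed time is
    measured and used explicitly in place of a hard-coded 2 seconds."""
    dt = t_now - t_prev                    # may exceed 2 s under TLOSS
    v, h = state
    v_new = v + accel * dt                 # velocity update over the real period
    h_new = h + v * dt + 0.5 * accel * dt * dt   # altitude update likewise
    return (v_new, h_new)

# A nominal 2-second pass vs. a pass stretched to 2.6 s by TLOSS:
print(servicer_pass((-5.0, 1000.0), 0.5, t_now=102.0, t_prev=100.0))
print(servicer_pass((-5.0, 1000.0), 0.5, t_now=102.6, t_prev=100.0))
```

As long as the measured dt appears wherever the period is used, the state estimate stays consistent even when a pass overruns, which is the immunity to TLOSS the report claims.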

So, since the SERVICER job could become unable to do its whole processing within a two-second period, they imagined a variable, measured period instead of a fixed one, which allowed it to go over this limit and thus avoid the AGC getting stuck.
But it also means that its task was no longer fulfilled in a deterministic period of time.
This is what we could call "flexible" real time... but it is, by definition, no longer real time!
Maybe the AGC could avoid restarts this way, but we can have serious doubts about the reliability of its processing!
A more intelligent approach would have been (apart from not counting hardware pulses with software instructions) to separate the essential part of SERVICER, which absolutely had to run every period, from the less essential parts, which could occasionally skip one period (or even more).
The essential part would have run every period, and the less important ones would have run alternately.
This way SERVICER could have kept its deterministic two-second period and still stood the load without crashing!
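The alternating scheme proposed above could be sketched like this (purely illustrative Python; the task names are hypothetical, not actual LUMINARY routines):

```python
# Sketch of the proposed alternative: keep the deterministic 2-second
# period, run the essential work every pass, and rotate the less
# essential work so each item runs every other pass.

ESSENTIAL = ["read_accelerometers", "update_state", "command_throttle"]
ROTATING = ["radar_processing", "display_update"]   # hypothetical names

def servicer_schedule(pass_number):
    """Return the task names to run on this 2-second pass."""
    tasks = list(ESSENTIAL)
    # one non-essential item per pass, taken in rotation
    tasks.append(ROTATING[pass_number % len(ROTATING)])
    return tasks

print(servicer_schedule(0))   # essentials plus the first rotating item
print(servicer_schedule(1))   # essentials plus the second rotating item
```

Each pass carries a bounded amount of work, so the period can stay fixed; the cost is that the rotated items are serviced at half the rate.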



Freedom from the two-second straitjacket allowed other ideas to be considered. Astronaut John Young suggested a capability that we called P66 LPD. By now P66 had evolved into an even more flexible program than it was when Armstrong flew it on Apollo 11. One of its new features was that if the crew switched the attitude mode back from ATT HOLD to AUTO, guidance would then control the attitude to null the horizontal velocity. Young's idea was for the LGC to display an LPD angle (as during the visibility phase) that would show the Commander the spot over which the LM would come to hover, if at that instant the autopilot were switched to AUTO.

This makes no sense at all: the astronaut puts the switch on AUTO, checks whether the place the LM will land on, if kept on AUTO, is suitable, and if not switches back to ATT HOLD; then switches to AUTO again, checks again, and if not switches back to ATT HOLD again... It can take a long while before the astronaut sees that the place the AUTO mode will land the LM on is OK for him!
Just directly moving the LM to the desired place with the lateral thrusters is far more convenient.
One absurdity more to add to the list!



To make P66 LPD accurate, the software had to react instantly when the astronaut switched to AUTO — more quickly than the two-second period, or even the one-second period at which parts of P66 operated, allowed. We coded a version in which a job running every quarter of a second reacted to the change in autopilot mode by immediately issuing attitude and throttle commands, and responded far more quickly and precisely to inputs from the ROD switch as well. In manned simulations run at the LM Mission Simulator (LMS) at Cape Canaveral, with its fabulous terrain models visible in the LM's windows, we showed that this system facilitated very precise landings.


If they really wanted to react immediately to a change of the switch, there was an even faster way: electronically generate a hardware signal when the switch changes, triggering an interrupt which could process the switch change immediately; and it would avoid uselessly running a process when it has nothing to do (and thus contributing to the overload of the AGC)!
Moreover, it doesn't make much sense to react very fast to the switch change, because the astronauts act on it at human speed, which is not very fast.




Neither variable SERVICER nor P66 LPD ever flew. NASA had made the decision that Apollo 17 would be the last landing. With so few missions remaining, the software control board made the conservative decision — no major changes to the landing software. By synchronizing the landing radar measurements with the time the accelerometers were read, Robert Covelli gained enough time to squeeze in the terrain model for Apollo 15, 16, and 17.


For all the help these "improvements" brought, it's just as well they were never used!
And "synchronizing the landing radar measurements with the time the accelerometers were read" means nothing at all and brings no improvement: synchronized or not, the time of the radar reading will always add to the time of the accelerometer reading, whether they follow each other immediately or are separated by a little period of time; so there was no gain of time at all!




Apollo 14 brought the author a brief notoriety. The abort switch on the instrument panel was sending a spurious signal that could have spoiled Alan Shepard and Ed Mitchell's landing. I had written the code that monitored this discrete. The workaround simply changed a few registers, first to fool the abort monitor into thinking that an abort was already in progress, and then to clean up afterward so that the landing could continue unaffected. The procedure radioed up and flawlessly executed by the astronauts involved 61 DSKY keystrokes. Perhaps the most interesting part of the Apollo 14 incident has been the number of differing versions that have been offered to history. But Apollo 14 is a story for another day.


A program code to fool the abort procedure into believing it's already in progress?
Why not simply not start this abort procedure, by electronically filtering the "spurious" signal?
And if an abort was really needed for another reason, was the abort procedure also "fooled" into not happening?
I would rather think that Apollo 14 was saved because the crew did not trust the on-board computer and took manual control!




In December 1972 I traveled to Cape Canaveral for the launch of Apollo 17. At this moment spaceflight was hip. The writer Tom Wolfe was there with photographer Annie Leibovitz to write the four-part story for Rolling Stone magazine that was the precursor of "The Right Stuff". It was the only Apollo night launch. The misty Florida sky lit up orange from horizon to horizon as the huge Saturn V ripped downrange on a quarter-mile flame that licked at the end like a blowtorch.

I spent a few days at the LMS testing some procedures that we called "erasable memory programs". These were snippets of code that could be installed in unused VAC areas to handle certain malfunctions — an idea that was a legacy of the Apollo 14 incident. Then I flew back to Cambridge for the landing itself.

That's hilarious: there were cases where SERVICER drained the VAC areas that would have been needed by higher-priority jobs, and here they wasted VAC areas even further by installing snippets of code in them; this way restarts would occur even more often, because the Executive would run out of available VAC areas even more quickly (oh, I was forgetting the "variable SERVICER"... which was never installed, and is an insult to real time!).



After that came the pleasure of listening in while Gene Cernan and Jack Schmitt, a geologist by training, explored the Moon in the lunar rover, venturing over 3 miles, out of sight of the spacecraft. And that was the last time anyone walked on the Moon.

If anybody ever walked; maybe, maybe not, who knows?
Well, that was really a nice story, the kind Lewis Carroll would have appreciated!




Figure 13: Some of the People Involved.
Front Row: Vince Megna, "Doc" Charles Stark Draper, the author,
Dave Moore, Tony Cook. Back Row: Phil Felleman, Larry Berman,
Allan Klumpp, Bob Werner, Robert Lones, Sam Drake.


Some have big smiles on their faces.
Are they smiles of satisfaction... or amusement?
Some others rather grin.
Is it out of modesty, so as not to show their satisfaction too overtly... or out of sadness at being forced to play a comedy?




I have made a little humor with pictures from Apollo 11 on the theme of this report: