"By 1963, designers determined that the Apollo computer software would have a long list of capabilities." /1/
Tasks of LEM's AGC
"The software should act as backup to the Saturn booster (Saturn had its own computer, the
Saturn Launch Vehicle Digital Computer [LVDC], and software), controll aborts, target, do all navigation and flight control tasks, attitude determination and control, digital autopilot tasks, and eventually all maneuvers involving velocity changes. Programs for these tasks had to fit in the memories of two small computers, one in the
CM and one in the
LEM. Designers developed the programs using a
Honeywell 1800 computer and later an
IBM 36O, but never with the actual flight hardware which was only used for testing in some simulated environments. The development computers generated binary object code and a listing. The tape containing the object code would be tested and eventually released for
core rope manufacture. The listing served as documentation of the code." /1/
Honeywell 1800 (1960's)
"Defining requirements is the single most difficult part of the software development cycle. The specification is the customer's statement of what the software product is to do. Improperly prepared or poorly defined requirements mean that the resulting software will likely be incomplete and unusable. Depending on the type of project, the customer may have little or a lot to do with the preparation of the specification. In most cases, a team from the software developers works with the customer." /1/
IBM 360 (1964...)
"MIT worked closely with NASA in preparing the
Guidance and Navigation System Operations Plan (GSOP), which served as the requirements document for each mission.
NASA's Mission Planning and Analysis Division (MPAD) at the
Manned Spacecraft Center provided detailed guidance requirements right down to the equation level." /1/
Johnson Manned Spacecraft Center
"Often these requirements were in the form of flow charts to show detailed logic. The division fashioned these requirements into a controlled document that contained specific mission requirements, preliminary mission profile, preliminary reference trajectory, and operational requirements for spacecraft guidance and navigation. NASA planned to review the GSOP at launch minus 18 months, 16 months, 14 months and then to baseline or "freeze" it at 13.5 months before launch. The actual programs were to be finished at launch minus 10.5 months and tested until 8 months ahead, when they were released to the manufacturer, with tapes also kept at MIT and sent to Houston, North American (CM manufacturer), and Grumman (LEM manufacturer) for use in simulations. At launch minus 4 months the
core ropes were to be completed and used throughout the mission." /1/
A video from the Mission Planning and Analysis Division of the Manned Spacecraft Center in Houston, Texas (now the Johnson Space Center [JSC]).
"Even during the early 1960s, the cycle of requirements definition, design, coding, testing, and maintenance was followed, if not fully appreciated, by software developers. The main point of the Bellcomm report (and the thrust of software engineering) was that software can be treated the same way as hardware, and the same engineering principles can apply." /1/
David G. Hoag, technical design director at the MIT laboratory, examines the IMU (Inertial Measurement Unit) of Apollo.
"However, NASA was more used to hardware development than to large-scale software and, thus, initially failed adequately to control the software development.
MIT, which concentrated on the overall guidance system, similarly treated software as a secondary occupation. This was so even though MIT manager A.L. Hopkins had written early in the program that "upon its execution rests the efficiency and flexibility of the Apollo Guidance and Navigation System". Combined with NASA's inexperience, MIT's non-engineering approach to software caused serious development problems that were overcome only with great effort and expense. In the end NASA and MIT produced quality software, primarily because of the small-group nature of development at MIT and the overall dedication shown by nearly everyone associated with the Apollo program." /1/
"In the Apollo program, with an outside organization developing the software, NASA had to provide for quality control of the product. One method was a set of standing committees; the other was the acceptance cycle." /1/
Dr. Von Braun, Dr. J.P. Kuettner and Warren J. Northleft at the NASA Manned Spacecraft Center, now the Johnson Space Center (Oct.14.1964)
"Three boards contributed directly to the control of the Apollo software and hardware development. The
Apollo Spacecraft Configuration Control Board monitored and evaluated changes requested in the design and construction of the spacecraft itself, including the guidance and control system, of which the computer was a part. The
Procedures Change Control Board, chaired by Chief Astronaut Donald K. Slayton, inspected items that would affect the design of the user interfaces. Most important was the
Software Configuration Control Board, established in 1967 in response to continuing problems and chaired for a long period by Christopher Kraft. It controlled the modifications made to the on-board software. All changes in the existing specification had to be routed through this board for resolution. NASA's Stan Mann commented that MIT "could not change a single bit without permission"." /1/
Dr. Von Braun looking at computer consoles (Oct. 14, 1964)
"NASA also developed a specific set of
review points that paralleled the software development cycle. The
Critical Design Review (
CDR) resulted in acceptance of specifications and requirements for a given mission and placed them under configuration control. It followed the preparation of the requirements definition, guidance equation development, and engineering simulations of the equations. Next came a
First Article Configuration Inspection (
FACI). Following the coding and testing of programs and the production of a validation plan, it marked the completion of the development stage and placed the software code under configuration control. After testing was completed, the
Customer Acceptance Readiness Review (
CARR) certified that the validation process resulted in correct software. After the CARR, the code would be released for
core rope manufacture. Finally the
Flight Readiness Review (
FRR) was the last step in clearing the software for flight. The acceptance process was mandatory for each mission, providing for consistent evaluation of the software and ensuring reliability." /1/
"With respect to units, the LGC was eclectic. Inside the computer we used metric units, at least in the case of powered-flight navigation and guidance. At the operational level NASA, and especially the astronauts, preferred English units. This meant that before being displayed, altitude and altitude-rate (for example) were calculated from the metric state vector maintained by navigation, and then were converted to feet and ft/sec. It would have felt weird to speak of spacecraft altitude in meters, and both thrust and mass were commonly expressed in pounds. Because part of the point of this paper is to show how things were called in this era of spaceflight, I shall usually express quantities in the units that it would have felt natural to use at the time." (Don Eyles) /3/
"In software engineering practice today, the specification document is followed by a design document, from which the coding is done. Theoretically, the two together would enable any competent programmer to code the program. The GSOPs contained characteristics of both a specification and design document. But, as one of the designers of the Apollo and Shuttle software has said, "I don't think I could give you the requirements for Apollo and have you build the flight software". In fact, the plans varied both in what they included and in the level of detail requirements. This variety gave MIT considerable latitude when actually developing the flight software, thus reducing the chance that it would be easily verified and validated." /1/
The AGC Operating System
"The AGC was a
priority-interrupt system capable of handling several jobs at one time. This type of system is quite different from a
round-robin executive. In the latter programs have a fixed amount of time in which to run before being suspended while the computer moves on to the remaining pending jobs, thus giving each job the same amount of attention. A priority-interrupt system is always executing the one job with the highest priority; it then moves on to others of equal or lower priority in its queue." /1/
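The difference between the two scheduling styles can be made concrete with a small sketch. The job names, priorities, and work units below are hypothetical, and the priority version runs each job to completion rather than modelling interruption by a newly arrived job, so this is a simplification of the idea and not a model of the AGC Executive.

```python
import heapq
from collections import deque


def priority_order(jobs):
    """Run each job to completion, highest priority first (ties in arrival order)."""
    heap = [(-priority, index, name) for index, (name, priority, _) in enumerate(jobs)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


def round_robin_order(jobs, time_slice=1):
    """Give every pending job the same time slice in turn; return the slice sequence."""
    queue = deque((name, work) for name, _, work in jobs)
    trace = []
    while queue:
        name, remaining = queue.popleft()
        trace.append(name)
        remaining -= time_slice
        if remaining > 0:
            queue.append((name, remaining))  # not finished: back of the line
    return trace


# (name, priority, units of work): hypothetical jobs for illustration
jobs = [("display", 5, 2), ("guidance", 20, 3), ("telemetry", 10, 1)]
print(priority_order(jobs))
print(round_robin_order(jobs))
```

Under the priority scheme the most urgent job finishes before anything else runs; under round-robin it has to share the processor equally with less urgent work.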
AGC and DSKY
"The Apollo control programs included two related to job scheduling: the
Executive and the Waitlist. The Executive could handle up to seven jobs at once while the Waitlist had a limit of nine short tasks. Waitlist tasks had execution times of 4 milliseconds or less. If a task ran longer than that, it would be promoted by the Waitlist to "job" status and moved to the Executive's queue. The Executive checked every 20 milliseconds for jobs or tasks with higher priorities than the current ones. It also managed the DSKY displays88. If the Executive checked the priority list and found no other jobs waiting, it executed a program called DUMMY JOB continuously until another job came into the queue." /1/
AGC's placement in the LM
"The Executive had other duties as part of controlling jobs. One solution to the tight memory in the AGC was the concept of time-sharing the erasable memory. No job had permanent claim to any registers in the erasable store. When a job was being executed, the Executive would assign it a "
coreset" of 12 erasable memory locations. Also, when interpretive jobs were being ran (the Interpreter is explained below), an additional 43 cells were allocated for vector accumulation (VAC). The final lunar landing programs had eight coresets in the LEM computer and just seven in the CM. Both had five VACs. Moreover, memory locations were given multiple assignments where it was assured that the owning processes would never execute at the same time. This approach caused innumerable problems in testing as software evolved and memory conflicts were created due to the changes." /1/
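The idea of handing out a fixed number of coresets and VAC areas, and reclaiming them when a job ends, can be sketched as a simple pool. The class and method names are hypothetical; only the counts (eight coresets of 12 cells and five VACs of 43 cells for the LEM) come from the text above.

```python
class ErasablePool:
    """Fixed pool of equally sized areas lent to jobs and reclaimed afterwards."""

    def __init__(self, count: int, cells_per_area: int):
        self.cells_per_area = cells_per_area
        self.owner = [None] * count  # owner[i] is the job holding area i, or None

    def allocate(self, job: str) -> int:
        """Give the first free area to 'job'; fail if every area is in use."""
        for index, holder in enumerate(self.owner):
            if holder is None:
                self.owner[index] = job
                return index
        raise RuntimeError("no free area")

    def release(self, index: int) -> None:
        """Return an area to the pool when its job finishes."""
        self.owner[index] = None


coresets = ErasablePool(count=8, cells_per_area=12)  # LEM figures from the text
vacs = ErasablePool(count=5, cells_per_area=43)

held = [coresets.allocate(f"JOB{n}") for n in range(8)]  # pool is now exhausted
coresets.release(held[0])                                # one job ends
reused = coresets.allocate("NEWJOB")                     # its coreset is reused
```

Because no job owns memory permanently, the pool size sets a hard ceiling on how many jobs can be active at once.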
Programming Tools
"The AGCs in the LM and CM were programmed in two languages. The one we called "Basic", but more properly "Yul", was an assembler language of about 40 operations, authored by Hugh Blair-Smith. "Interpretive" was a list-processing interpretive language (essentially a set of subroutines) designed to facilitate guidance and navigation calculations involving double precision (30-bit fixed-point) vectors and matrices — at the cost of being very slow. The Interpreter was written by Charles Muntz." /3/
"The memory-cycle time for the AGC was 11.7 microseconds. A single-precision addition in the assembler language took two memory cycles. A double-precision vector cross-product programmed in Interpretive took about 5 milliseconds. One of the challenges in programming the AGC was juggling the two languages to obtain the best blend of speed and compactness for the given situation." /3/
"The interpreter got a starting location in memory, retrieved the data in that location, and interpreted the data as though it were an instruction. Instead of having only the 11 instructions available in assembler, up to 128 pseudo instructions were defined. The larger number of instructions in the interpreter meant that equations did not have to be broken down excessively. This increased the speed and accuracy of the coding." /1/
Some of the MIT staff /3/
"The MIT staff gave the resulting computer programs a variety of imaginative names. Many, such as SUNDISK, SUNBURST, and SUNDIAL, related to the sun because Apollo was the god of the sun in the classical period. But the two major lunar flight programs were called
COLOSSUS and
LUMINARY. The former was chosen because it began with "C" like the CM, and the latter because it began with "L" like the LEM97. Correspondence between NASA and MIT often shortened these program names and appended numbers. For example, SOLRUM55 was the 55th revision of SOLARIUM for the AS501 and 502 missions. BURST116 was the 116th revision of SUNBURST98. Although these programs had many similarities, COLOSSUS and LUMINARY were the only ones capable of navigating a flight to the moon. On August 9, 1968, planners decided to put the first released version of COLOSSUS on Apollo 8, which made the first circumlunar flight possible on that mission." /1/
Restart Protection
"An Apollo restart transferred control to a specified address, where a program would begin that consulted phase tables to see which jobs to schedule first. These jobs would then be directed to pick up from the last restart point. The restart point addresses were kept in a restart table. Programmers had to ensure that the restart table entries and phase table entries were kept up to date by the software as it executed. The restart program also cleared all output channels, such as control jet commands, warning lights, and engine on and off commands, so that nothing dangerous would take place outside of computer control ." /1/
"A software failure causing restarts occurred during the Apollo 11 lunar landing. The software was designed to give counter increment requests priority over instructions. This meant that if some item of hardware needed to increment the count in a memory register, its request to do so would cause the operating system to interrupt current jobs, process the request, and then pick up the suspended routines. It had been projected that if 85,000 increments arrived in a second, the effect would be to completely stop all other work in the system. Even a smaller number of requests would slow the software down to the point at which a restart might occur. During the descent of Apollo 11 to the moon, the rendezvous radar made so many increment requests that about 15% of the computer systems resources were tied up in responding. The time spent handling the interrupts meant that the interrupted jobs did not have enough computer time to complete before they were scheduled to begin again. This situation caused restarts to occur, three of which happened in a 40-second period while program P64 of LUMINARY ran during descent106. The restarts caused a series of warnings to be displayed both in the spacecraft and in Mission Control. Steven G. Bales and John R. Garman, monitoring the computer from Mission Control, recognized the origin of the problem. After consultation, Bales, reporting to the Flight Director, called the system GO for landing. They were right, and the restart software successfully handled the situation. The solution to this particular problem was to correct a switch position on the rendezvous radar which, through an arcane series of circuitry, had caused the analog-to-digital conversion circuitry to race up and down. This incident proved the need for and effectiveness of built-in software recovery for unknown or unanticipated error conditions in flight software-a philosophy that has appeared deeply embedded in all NASA manned spaceflight software since then." /1/
"Cut to a time about a year before Apollo 11, when we software engineers, who thought we already had enough to do, were requested to write the lunar landing software in such a way that the computer could literally be turned off and back on without interrupting the landing or any other vital maneuver! This was called "restart protection". Other factors than power transients also caused restarts. A restart was triggered if the hardware thought the software was in an endless loop, or if there were a parity failure when reading fixed memory, or for several other reasons. Restart protection was done by registering way points at suitable points during the operation of the software such that if processing happened to jump back to the last way point, no error would be introduced." /3/
"Following a restart, such computations could be reconstructed. For each job, processing would commence at the last registered waypoint. If multiple copies of the same job were in the queue, only the most recent was restarted. Certain other computations that were not considered vital were not restart-protected. These would simply disappear if there were a restart." /3/
"Restart protection worked very well. On the control panel of our real-time "hybrid" simulator in Cambridge was a pushbutton that caused the AGC to restart. During simulations we sometimes pushed the button randomly, almost hoping for a failure that might lead us to one more bug. Invariably, once we got the restart protection working, operation continued seamlessly." /3/
SDS 9300 digital computer and a COMCOR CI 5000 analog computer
"The hybrid simulator in Cambridge combined SDS 9300 digital and Beckman 21331 analog computers with a real AGC and realistic LM and CM cockpits."/3/
Conclusion
"NASA did successfully land a man on the moon using programs certifiably adequate for the purpose. No one doubted the quality of the software eventually produced by MIT nor the dedication and ability of the programmers and managers at the Instrumentation Lab. It was the process used in software development that caused great concern, and NASA helped to improve it143. The lessons of this endeavor were the same learned by almost every other large system development team of the 1960s: (a) documentation is crucial, (b) verification must proceed through several levels, (c) requirements must be clearly defined and carefully managed, (d) good development plans should be created and [53] executed, and (e) more programmers do not mean faster development. Fortunately, no software disasters occurred as a result of the rush to the moon, which is more a tribute to the ability of the individuals doing the work than to the quality of the tools they used." /1/
REFERENCES
/1/ http://history.nasa.gov/computers/Ch2-6.html
/2/ http://www.ibiblio.org/apollo/LVDC.html
/3/ http://www.doneyles.com/LM/Tales.html
/4/ E-2066, HYBRID SIMULATION OF THE APOLLO GUIDANCE, NAVIGATION AND CONTROL SYSTEM, Philip G. Felleman, December 1966