The IBM 7030 (STRETCH)

by Norman Hardy

This is a short memoir concerning the IBM 7030, or as it was known then, the STRETCH.
This is followed by a section of technical odds and ends, principally from a study made by Mark Smotherman.
The STRETCH, so named because it stretched all of the then known limits in computers, is considered today to be a path-finding machine.

Norman Hardy: I arrived at Livermore (the lab) from Berkeley in the middle of 1955. The lab was negotiating with Remington Rand Univac for the LARC at the time, but I was not in on that. IBM had also bid on the job, but the lab was proceeding with Remington Rand. IBM did a deal with Los Alamos for a similar machine about two years later than the LARC contract. IBM called their machine "STRETCH" or, more formally, the "IBM 7030". By about 1958, the lab ordered a STRETCH as well; the LARC was late for delivery.

The LARC was, indeed, delivered about two years late. The STRETCH was also late, but only by a few months [1]. The two machines ran largely different programs which were almost all coded in assembly language.

The machine comprised about ten boxes, each as big as an outsized refrigerator. These boxes abutted each other at their broad sides. If the row of boxes went east and west, then the works rolled out to the south for access, whereas the wiring connecting the units ran along the north side of the row. On the broad side of the last box was the engineer's console that had about 3000 incandescent indicator lamps. These ran at low voltage and glowed a gorgeous orange with the room light turned out.

This was still the era when maintenance engineers came with the machine. With earlier machines, engineers were on site 24 hours a day, but the transistor machines were reliable enough that the engineers could all sleep at home most nights. I recall that they would do preventative maintenance for a few hours about every two weeks. The core memories had ECC (error checking and correcting) codes. It was clear that this saved much downtime. The 7090s, with exactly the same memory boxes, each experienced more downtime for memory problems than the STRETCH, which had six boxes. Aside from memory problems, the machine failures were almost always in a part of the machine in which the engineer had no previous experience. They had to learn a new part of the machine with impatient programmers hovering about. The machine had about 250,000 transistors.

One of the boxes was the basic exchange which maintained several flows of 8-bit bytes between core and various IO devices. This pre-figured the channels of IBM's subsequent 360 line. Attached to the basic exchange were a card reader, card punch, line printer, operator's console and about eight magnetic tape units.

The high-speed exchange was another big box that did the same for the large disk with about 16MB of storage. The transfer rate was about one word (8 bytes) per 4 microseconds.
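As a rough check on the figures above, the quoted transfer rate can be turned into bytes per second and into the time needed to stream the whole disk once. This is only back-of-the-envelope arithmetic: it assumes "16MB" means 16 x 10^6 bytes and ignores seek and rotational delays, and the variable names are illustrative.

```python
# Figures taken from the text: one 8-byte word every 4 microseconds.
BYTES_PER_WORD = 8
WORD_TIME_US = 4

# Assumption: "about 16MB" is read as 16 * 10**6 bytes.
DISK_BYTES = 16 * 10**6

# Sustained rate in bytes per second (1,000,000 microseconds per second).
bytes_per_second = BYTES_PER_WORD * 1_000_000 // WORD_TIME_US

# Time to stream the entire disk once at that rate, ignoring seeks.
full_disk_seconds = DISK_BYTES / bytes_per_second

print(bytes_per_second, "bytes/s;", full_disk_seconds, "s for the full disk")
```

Under these assumptions the exchange moves about two million bytes per second, so the entire disk could in principle be streamed in roughly eight seconds.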

Perhaps most significant to those with problems to solve was the large memory. Previously, the largest memory was about 10^6 bits. The STRETCH had six times as much [2].

On previous machines, the working data for production jobs would not fit in core, but required magnetic tape, which was read and written by explicit application logic, overlapped with the computation proper. Such applications were complex and difficult to debug. Since application data would all fit in the core memory of the STRETCH, applications came into production months sooner than they had upon the introduction of previous machines.

This was still before time-sharing, and debugging consisted of reserving time slots on the machine and bringing card decks to the machine when it was your turn. The card reader read the deck and the machine assembled or compiled your code. Some people kept their sources on magnetic tape and brought tape and card decks that described updates to their programs. A new program listing would normally be produced at 600 lines per minute. A rudimentary operating system normally remained in place between jobs. Later, we came to the point where the OS would request magnetic tapes enough in advance of a job that they might well be fetched and mounted by the time that they were required. A few jobs used tapes for working storage. The disks were not used heavily. Occasionally, data would be left on disk from day to day. Space was informally allocated by merely negotiating with the programmers whose jobs used the disk.

Livermore developed a Fortran compiler for the STRETCH. Fortran was used for many smaller production jobs but the few bigger jobs remained in assembly language.

IBM built about ten STRETCH systems; one was a component of the Harvest that went to NSA. The STRETCH missed its original performance targets and was upstaged by the IBM 7090 that had aggressively adapted the STRETCH transistor technology and 2-microsecond core memories [3].

The more modest goals of the 7090 brought it to market before the STRETCH itself. It met an immediate need to run deployed 709 code with which it was highly compatible. (The 709 was a tube machine and about seven times slower.) T. J. Watson criticized the STRETCH effort for having missed its goals and terminated sales of the machine. Several years later, he acknowledged that the machine had indeed been strategic for the technology that it had developed for IBM.

IBM 7030 System

Figure 1: The IBM 7030 System Operator's Console
Figure 2: The IBM 7030 System Operator's Console
Other than disk I/O, all I/O and all computer activity went through this device, known as the Low Speed Exchange. Programmers such as Tad Kishi, shown here, found this console to be very helpful in debugging their programs.
Figure 3: The Low Speed Exchange
Shown here as the Low Speed Exchange was being installed are, left to right, Sid Fernbach, department head; Clarence Badger, STRETCH programmer; and Richard von Holdt, assistant department head.

I selected these additional data from Mark Smotherman's page on Eric Smith's page.

The IBM STRETCH, designed in the late 1950s, contained many aggressive organization techniques (aggressive even by today's standards). These included predecoding, memory operand prefetch, out-of-order execution, speculative execution based on branch prediction, branch misprediction recovery, and precise interrupts. These techniques appear in many more recent high-end IBM mainframes, such as the S/360 Model 91, S/370 Model 165, 3033, and 3090, as well as in the IBM RS/6000 and PowerPC microprocessors.

Five test programs were selected for simulation to help determine machine parameters: a hydrodynamics mesh problem, a Monte Carlo neutron-diffusion code, the inner loop of a second neutron diffusion code, a polynomial evaluation routine, and the inner loop of a matrix inversion routine. Several special STRETCH instructions were defined for scientific computation of this kind, such as branch on count and multiply and add.

Instructions in STRETCH flowed through two processing elements: an indexing and instruction unit that fetched, predecoded, and partially executed the instruction stream, and an arithmetic unit that executed the remainder of the instructions. A set of 16 64-bit index registers was associated with the indexing and instruction unit, and a set of 64-bit accumulators and other registers was associated with the arithmetic unit.

The indexing and instruction unit of STRETCH fetched 64-bit memory words into a two-word instruction buffer. Instructions could be either 32 or 64 bits in length, so up to four instructions could be buffered. The indexing and instruction unit directly executed indexing instructions and prepared arithmetic instructions by calculating effective addresses (i.e., adding index register contents to address fields) and starting memory operand fetches. The unit itself was a pipelined computer, and it decoded instructions in parallel with execution [Blosk, 1961]. One interesting feature of the instruction fetch logic was the addition of predecoding bits to all instructions; this was done one word at a time, so two half-word instructions could be predecoded in parallel.
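The buffering and address preparation described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, alignment of instructions within words is ignored, and no address width is assumed because the text does not give one.

```python
WORD_BITS = 64      # from the text: 64-bit memory words
BUFFER_WORDS = 2    # from the text: a two-word instruction buffer


def buffered_count(lengths):
    """Count how many leading instructions fit in the two-word buffer.

    `lengths` is a sequence of instruction lengths in bits (32 or 64).
    Simplification: alignment within words is ignored.
    """
    capacity = WORD_BITS * BUFFER_WORDS
    used = 0
    count = 0
    for bits in lengths:
        if bits not in (32, 64):
            raise ValueError("instructions are either 32 or 64 bits long")
        if used + bits > capacity:
            break
        used += bits
        count += 1
    return count


def effective_address(address_field, index_value):
    """Add the index register contents to the instruction's address field."""
    return address_field + index_value


print(buffered_count([32, 32, 32, 32, 32]))  # four half-word instructions fit
print(buffered_count([64, 64, 32]))          # two full-word instructions fit
print(effective_address(1000, 24))
```

With only half-word instructions the buffer holds four of them, which matches the "up to four" figure in the text; any full-word instruction reduces that count.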

Unconditional branches and conditional branches that depended on the state of the index registers could also be fully executed in the indexing and instruction unit. Conditional branches that depended on the state of the arithmetic registers were predicted untaken, and the untaken path was speculatively executed.
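The branch rules in this paragraph amount to a small decision table. The sketch below is one way to write it down; the branch kind labels and the function name are hypothetical, chosen only for illustration.

```python
def next_fetch(kind, target, fall_through, index_condition=False):
    """Return (next_address, speculative) for a branch.

    kind is one of:
      "unconditional" - always taken, resolved in the indexing and instruction unit
      "index"         - depends on index registers, resolved in the same unit
      "arithmetic"    - depends on arithmetic registers, predicted untaken
    """
    if kind == "unconditional":
        return target, False
    if kind == "index":
        return (target if index_condition else fall_through), False
    if kind == "arithmetic":
        # The outcome is not yet known, so fetching continues on the
        # untaken path and that work is speculative.
        return fall_through, True
    raise ValueError(f"unknown branch kind: {kind!r}")


print(next_fetch("unconditional", target=200, fall_through=104))
print(next_fetch("index", target=200, fall_through=104, index_condition=True))
print(next_fetch("arithmetic", target=200, fall_through=104))
```

Only the third case can turn out to be wrong: if the arithmetic unit later finds that the branch was taken, the speculative work has to be undone.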

All instructions, either fully or partially executed (i.e., prepared), were placed into a novel form of buffering called a 'lookahead' unit, which was contemporaneously called a 'virtual memory' but which we would view today as a combination of a completion buffer and a history buffer. A fully executed indexing instruction would be placed into one of four levels of lookahead along with its instruction address and the previous value of any index register that had been modified. This history of old values provided a way for the lookahead levels to be rolled back and thus restore the contents of the index registers on a mispredicted branch or interrupt. A partially executed arithmetic instruction would also be placed into a lookahead level along with its instruction address, and there it would wait for the completion of its memory operand fetch. Some complex instructions were broken into separate parts and thus required multiple lookahead levels.
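The history-buffer idea, saving the old value of an index register so that it can be restored later, can be sketched as follows. This is a simplified model under stated assumptions, not a description of the actual hardware: the class and method names are hypothetical, only index-register updates are modeled, and each instruction occupies exactly one level.

```python
from collections import deque

LEVELS = 4  # the text gives four levels of lookahead


class Lookahead:
    """Toy model of lookahead levels holding old index register values."""

    def __init__(self, index_regs):
        self.index_regs = list(index_regs)
        self.levels = deque()  # oldest entry on the left

    def execute_index_op(self, instr_addr, reg, new_value):
        """Apply an index update now, remembering the value it overwrote."""
        if len(self.levels) >= LEVELS:
            raise RuntimeError("lookahead full; the instruction unit would wait")
        self.levels.append((instr_addr, reg, self.index_regs[reg]))
        self.index_regs[reg] = new_value

    def retire_oldest(self):
        """The oldest level completes; its saved value is no longer needed."""
        return self.levels.popleft()

    def roll_back(self):
        """Undo every outstanding update, newest first."""
        while self.levels:
            _instr_addr, reg, old_value = self.levels.pop()
            self.index_regs[reg] = old_value


la = Lookahead([0] * 16)
la.execute_index_op(100, reg=3, new_value=7)
la.execute_index_op(104, reg=3, new_value=9)
la.retire_oldest()          # the update at address 100 is now committed
la.roll_back()              # undoes only the update at address 104
print(la.index_regs[3])
```

Undoing the entries newest-first matters when the same register was modified more than once: the value that ends up restored is the one saved by the oldest outstanding update.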

The arithmetic unit would execute an arithmetic instruction whenever its lookahead level became the oldest and its memory operand was available. Arithmetic interrupts were made precise by causing a rollback of the lookahead levels, just as in the case of a mispredicted branch.

Stores were also executed whenever their lookahead level became the oldest. Store forwarding was implemented by checking the memory address to be read by each subsequent load placed in the lookahead levels. If that address matched the memory address to be written by the store, the load was cancelled and the store value was copied directly into the buffer reserved for the loaded value. Only one outstanding store was allowed at a time. Also, because of potential instruction modification, the memory address to be written by the store was compared to each of the instruction addresses in the lookahead levels.
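The two address comparisons described here can be sketched as below. The function names and the dictionary used as memory are hypothetical. The text does not say what the machine did when a store address matched a buffered instruction address, so the second function only reports the match.

```python
def issue_load(load_addr, pending_store, memory):
    """Return (value, forwarded) for a load entering the lookahead.

    pending_store is None or a (store_addr, store_value) pair; at most one
    store is outstanding at a time, as the text states.
    """
    if pending_store is not None:
        store_addr, store_value = pending_store
        if store_addr == load_addr:
            # The memory fetch is cancelled and the store value is used.
            return store_value, True
    return memory[load_addr], False


def store_touches_instructions(store_addr, lookahead_instr_addrs):
    """Report whether a store would overwrite a buffered instruction."""
    return store_addr in lookahead_instr_addrs


memory = {0x40: 11, 0x48: 22}
print(issue_load(0x40, (0x40, 99), memory))   # forwarded from the store
print(issue_load(0x48, (0x40, 99), memory))   # read from memory
print(store_touches_instructions(0x40, [0x30, 0x38, 0x40]))
```

Allowing only one outstanding store keeps the check to a single comparison per load, at the cost of making later stores wait.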

The clock cycle time for the indexing unit and the lookahead was 300 nanoseconds -- up from the initial estimates of 100 nanoseconds. The clock cycle time for the VFL and parallel arithmetic units was 600 nanoseconds. Twenty-three levels of logic were allowed in a path, and connections of approximately 15 feet were counted as one-half level. The PAU performed one floating-point add every 1.5 microseconds and one floating-point multiply every 2.4 microseconds.
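The timing figures above can be related to one another with a little arithmetic. The sketch below only restates the numbers in the text in a common unit; the peak rates assume back-to-back operations with no stalls, which the machine would rarely have sustained.

```python
# All times in nanoseconds, taken from the text.
INDEX_CYCLE_NS = 300       # indexing unit and lookahead
INITIAL_ESTIMATE_NS = 100  # the initial estimate for that cycle
ARITH_CYCLE_NS = 600       # VFL and parallel arithmetic units
FP_ADD_NS = 1500           # one floating-point add
FP_MUL_NS = 2400           # one floating-point multiply


def per_second(period_ns):
    """Peak operations per second for one operation every period_ns."""
    return 1_000_000_000 / period_ns


slowdown = INDEX_CYCLE_NS / INITIAL_ESTIMATE_NS   # cycle grew by this factor
add_cycles = FP_ADD_NS / ARITH_CYCLE_NS           # arithmetic cycles per add
mul_cycles = FP_MUL_NS / ARITH_CYCLE_NS           # arithmetic cycles per multiply

print(slowdown, add_cycles, mul_cycles)
print(round(per_second(FP_ADD_NS)), round(per_second(FP_MUL_NS)))
```

In these terms the indexing cycle came out three times longer than first estimated, and the peak floating-point rates work out to roughly 667,000 adds or 417,000 multiplies per second.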

When STRETCH became operational in 1961, benchmarks indicated that it delivered only four times the performance of a 7090. This shortfall against the original targets was apparently due to store instruction delays and the misprediction recovery time required for taken arithmetic branches; both cases stalled the arithmetic unit. Even though STRETCH was the fastest computer in the world (and remained so until 1964), the performance shortfall caused considerable embarrassment for IBM.


Charles Bashe, Lyle Johnson, John Palmer, and Emerson Pugh. IBM's Early Computers. Cambridge, MA: MIT Press, 1986.

"IBM Stretch," section 13.3 of G.A. Blaauw and F.P. Brooks, Jr., Computer Architecture: Concepts and Evolution. Reading, MA: Addison Wesley, 1997.

Erich Bloch, "The Engineering Design of the Stretch Computer," Proc. IRE/AIEE/ACM Eastern Joint Computer Conference, Boston, December 1959, pp. 48-58.

R.T. Blosk, "The Instruction Unit of the Stretch Computer," Proc. IRE/AIEE/ACM Eastern Joint Computer Conference, New York, December 1960, pp. 299-324.

Werner Buchholz, editor. Planning A Computer System, McGraw-Hill, 1962.

John Cocke and Harwood Kolsky, "The Virtual Memory in the Stretch Computer," Proc. IRE/AIEE/ACM Eastern Joint Computer Conference, Boston, December 1959, pp. 82-93.

Edward Yasaki, "Fastest in its Time," Datamation, January 1982, pp. 34ff.

Editor's notes:

Those wishing to explore Norman Hardy's other insights may visit his page.