mark@hubcap.UUCP (Mark Smotherman) (12/21/88)
DRAFT -- 12/20/88 A Taxonomy of I/O Systems Mark Smotherman Dept. of Computer Science Clemson University, Clemson, SC 29634-1906 (803) 656-5878, INTERNET: mark@hubcap.clemson.edu Introduction I am attempting to compile a taxonomy of I/O systems that is based on the program sequencing necessary for the control of I/O devices. Typical textbook presentations of I/O systems are too simplistic in that they identify only four categories: 1) program-controlled I/O (i.e. polling), 2) interrupt-driven I/O, 3) DMA, and 4) channel I/O. A study of the historical develop- ment of I/O systems reveals a much richer design space. G.A. Blaauw and F.P. Brooks, Jr., have come much closer to a comprehensive categorization of the area. In their manuscript, Computer Architecture, these authors identify essentially seven categories of I/O systems: I. Dependent I/O A. direct B. single instruction overlap 1. private buffer per device 2. shared buffer a. dedicated to I/O usage b. general buffer in main memory II. Autonomous I/O A. channel (specialized controller/processor) B. peripheral processor (generalized processor) 1. homogeneous multiprocessor structure 2. heterogeneous multiprocessor structure I believe that an I/O taxonomy should make finer distinctions than either of those above and that multiprocessor issues should be identified explicitly. For uniprocessors, I see the major categories as synchronous I/O versus the several different ways in which overlapped I/O operations can be initiated. For the overlapped operations, I have chosen the method of completion re- porting as the basis of subcategories, and under that I have identified the method of transfer. In terms of the method of transfer, I make a distinction between a controller that can transfer only one block before requiring CPU intervention and a controller that can transfer multiple blocks in a scatter/gather type of operation (in which the blocks are identified to the controller by a chain of descriptors). Many designers and authors call a controller with the latter ca- pability an I/O channel; however, I reserve that term for a spe- cialized I/O processor that fetches instructions that have an identifiable opcode field. I note that Bell and Newell categor- ize controllers with scatter/gather capability as Pios since they consider the chain of block descriptors essentially a series of jump instructions. From Blaauw and Brooks, I accept their distinction between I/O channels and I/O processors as being the general ability to count. That is, an I/O processor should have the ability to maintain a loop or event count that is unrelated to the transfer of a given number of characters per block. For multiprocessors, the technique of I/O initiation is not as important as the symmetry of initiation; therefore, this symmetry or lack of it becomes the basis of the major categories. The method of completion reporting is the basis of subcategories, and symmetry in interruption is explicitly identified. A classification of historical machines into the different categories should help define the categories. Indeed, in line with the thinking of Blaauw and Brooks, I believe that the earli- est machine with a given feature will likely exhibit that feature in its least disguised and least compromised form. For older machines with multiple I/O options, I have chosen to classify them according to their established use (e.g. the IBM S/360 has synchronous I/O capability, which is largely unused). Some mul- tiprocessors appear in the first section; this is because they represent the first use of a given transfer initiation method. The I/O classification I present also serves as a guided tour of the history of I/O systems. However, not all categories are po- pulated with machines. This may be the result of omissions on my part, or the category may indeed be unfruitful. I would like to characterize the reasons for the latter occurrence. A New Taxonomy I. CPU - I/O INTERACTION A. Synchronous transfer ERA 1103 (delivered 1953) - interlocked I/O; word at a time (Blaauw and Brooks credit a UNIVAC 1103A at NASA as the first computer to use the interrupt concept; a 1956 paper reports that an interrupt was used to preempt a batch machine and start data collection from a wind tunnel; see also the UNIVAC I and NBS DYSEAC) IBM 702 (announced 1953) - CPU stalls while a block of charac- ters are transferred from an I/O device buffer; introduced con- trol unit concept IBM 1401 (1959) - originally designed as a printer controller but found widespread use as a small business computer; uses one opcode per I/O device: 1 is read a card into location 0 to 79, 2 is punch a card from another location, 3 is print from a third location, etc.; the CPU stalls while the characters are transferred; a card duplicating program could be written in about 20 characters and punched onto one card B. Asynchronous transfer 1. interlocked instruction to start transfer a. synchronization by interlock UNIVAC I (1951) - one 60-word tape buffer for input and one for output; initial input instruction starts transfer to buffer and then releases CPU for overlapped instruction exe- cution; subsequent input instruction dumps buffer to memory, starts next transfer, and then releases CPU; if subsequent input instruction is issued too early then interlock stalls CPU; I/O errors halt the CPU and the operator must diagnose (Codd in 1962 article credits the UNIVAC I as "one of the earliest machines to be equipped with program interruption" since he states that an arithmetic overflow would cause the program to stop; Eckert's 1951 paper mentions several checks that can stop the machine) IBM 701 (announced 1952) - "copy logic": after an initial prepare to read (or write) instruction, the program must is- sue a copy instruction for each word in the transfer; a loop is coded to update the memory addresses and issue the copies, and the loop may also perform superficial processing such as character code conversion; the copy instruction is inter- locked so that an early issue is stalled until the I/O device can provide/accept the next word; at end of file the copy in- struction causes a one-instruction skip, and at the end of block it causes a two-instruction skip b. synchronization by polling i. separate instructions to poll and transfer data (earlier machines? CDC 160?) PDP-1 (1959) - conditional skips on I/O buffer register con- tents are apparently used to poll for transfer completion PDP-8 (1965) - conditional skips on control unit status re- gisters are used for polling ii. controller transfers words of block (i.e. DMA) (did Whirlwind I have this capability? -- Everett, 1951 pa- per: "In general the computer continues to run during termi- nal equipment wait times.") IBM SAGE (started 1952, operational 1955) - I/O operations to start block transfers of data to/from drum buffers; I/O operations proceed in parallel with further CPU operations; a controller generates the sequential memory addresses for the block and decrements a counter; CPU has conditional branch to test completion of transfer; transfers are inter- locked so that CPU is stalled if second transfer is attempt- ed before previous one ends (Serrell, et al., in 1962 iden- tify "computation in parallel with I/O" as a significant new feature of SAGE). iii. controller with scatter/gather capability (often called an I/O channel) UNIVAC 1108? iv. I/O channel (with specialized I/O instruction set) IBM 709 (announced 1957) - I/O processor with specialized instruction set; CPU has instruction to start channel and conditional branch to test completion of channel function; start channel is interlocked so that CPU is stalled if second start is issued before previous one ends (note: Greenstadt paper says external interrupt available) v. I/O processor (example unknown) c. synchronization by interrupt i. separate instruction to transfer data NBS DYSEAC (2 papers published in 1954) - 2 program counters; I/O signal causes CPU to switch PC's; bit in each instruction can force switch back between PC's (Codd in 1962 article states, "in the NBS DYSEAC the very significant step was made of extending interruption to input-output opera- tions") Lincoln TX-2 (1957 paper) - "multiple sequence": 33 program counters; each I/O device has a dedicated PC and operates at a fixed priority (i.e. forerunner of interrupt vector); each instruction has break and dismiss bits; break is used to in- dicate points at which a higher-priority sequence can take over, while dismiss is used to allow lower-priority se- quences to resume (Blaauw and Brooks classify this machine as having PPUs, but I see the explicit instruction bits as a recognition of the sharing of a single CPU; thus, I consider this machine closer to interrupt vectoring than to virtual PPUs) PDP-1 (1959) - Bell, et al., credit the "16-channel sequence break system" to TX-2 influence (actual operation not described) ii. controller transfers words of block (i.e. DMA) CDC 1604 (1960) ? iii. controller with scatter/gather capability (often called an I/O channel) IBM 7070 (announced 1958) - "priority processing" (I/O in- terrupt); I/O completion causes CPU to switch to uninterrup- tible priority routine; return address stored in register; scatter/gather capability possible with chain of record de- finition words IBM STRETCH (started 1954, delivered 1961) - I/O exchange acts as byte multiplexor; I/O completion is part of a comprehensive interrupt vector facility (vector contains single instructions to be executed outside the normal in- struction cycle, these instructions can be single- instruction fixups or subroutine calls); interrupt nesting is allowed but I/O is treated as a single cause Honeywell 800 (1963) - hardware-assisted multiprogramming; 8 virtual processors, each has 2 program counters and an indi- vidual interrupt vector base register; for each memory cycle the hardware scans on a priority basis for activity on the 8 input controllers, then the 8 output controllers, and then the CPU; within the CPU the hardware scans in a cyclic manner with various exceptions for multiple memory cycle operations iv. I/O channel (with specialized I/O instruction set) IBM 7090 (announced 1958) - addition of data-channel trap (I/O interrupt) to 709 architecture; causes CPU to branch to special instruction sequence v. I/O processor UNIVAC LARC (started 1954, delivered 1960) - high-level re- quest packets (e.g. record number or key) are sent to an I/O processor, which also performs services such as device queueing; the requesting processor is interrupted when its request is complete; main memory contention is resolved by a time-slotted memory bus 3. conditional instruction to start transfer a. synchronization by polling (example unknown) b. synchronization by interrupt IBM S/360 (1964) - start I/O instruction sets condition code according to success of initiation (path may be busy and CPU must perform queueing, or error may exist); channel I/O but less complex instruction set for channel than 7090 channels 4. mailbox deposit to start transfer a. synchronization by polling CDC 6600 (1964) - in the typical OS structure, PPUs poll reserved main memory locations (input mailboxes) to determine I/O requests; after starting a device in response to an I/O request, a PPU will poll the device until completion; at com- pletion the PPU will place a completion notice in its output mailbox; programs can poll the output mailbox or can be suspended until the PPU running the OS sees the completion notice and resumes the program by an exchange jump; before an output transfer the PPU must move the data from the shared main memory to its local memory, likewise after an input transfer the PPU must move the data from its local memory to the shared main memory; PPU execution is time-shared equally TI-ASC (1972) - PPU execution can be weighted to favor cer- tain PPU's b. synchronization by interrupt (example unknown) 5. queue insert to start transfer a. synchronization by polling Burroughs B7700 (1973) - reserved locations exist in main memory that define head and tail pointers to I/O device re- quest queues and I/O completion block queues; queue manipula- tions by the CPU and I/O module are atomic actions; any IOM can handle any device; a start I/O instruction begins IOM processing on a specified device queue; processing continues until an error, interrupt, or empty queue; the CPU polls the completion block queue; optionally interrupts can be generat- ed on completion of each request (see also IBM S/370 XA where path busy queueing is handled by the channel subsystem) b. synchronization by queueing Honeywell Series 60 Level 64 (1974) - microcoded semaphore operations used in I/O ELXSI System 6400 - uses messages as a synchronization mechanism between both OS processes and I/O controllers; an I/O processor notifies the controller that a message is pend- ing, but it is the responsibility of the controller to handle queues, including processing things out of order, and han- dling errors c. synchronization by interrupt (see also IBM S/370 XA where path busy queueing is handled by the channel subsystem) 6. asynchronous instruction to start transfer a. synchronization by polling (example unknown) b. synchronization by interrupt IBM S/370 (1970) - SIOF (start I/O fast release) used to release CPU after channel has fetched CAW and before the channel has determined if I/O operation can be successfully initiated; interrupt occurs if device or path is busy (was assumed to be infrequent but on later systems the interrupt overhead canceled out performance gain from the fast release of CPU) II. MULTIPROCESSOR I/O A. Asymmetric initiation 1. synchronization by polling (example unknown) 2. synchronization by asymmetric interrupt Burroughs B5500 (1964) - up to two CPUs; only master CPU can initiate I/O; ITI instruction to test for pending interrupt at end of interrupt handling; I/O channels and CPUs are crossbarred with memory modules, also the I/O channels are crossbarred with all the peripherals IBM S/370 MP - channels and devices are dedicated to a partic- ular CPU UNIVAC 1100 Model 80 (1977) - up to four CPUs, but I/O must initiated by either of the two CPU's that connect to the storage interface unit that controls memory access for the I/O device; I/O is directed to the cache in the SIU rather than directly to main memory; I/O interrupts are made available to the two CPUs in alternation and for a limited amount of time each; if one CPU doesn't respond to the interrupt within the available period, the interrupt is passed on to the next CPU in sequence (how did earlier 1108 do this?) 3. synchronization by symmetric interrupt (in a sense, see UNIVAC 1100/80) B. Symmetric initiation 1. synchronization by polling Plessey System 250 (1969) - memory-mapped I/O with capability protection; device drivers get requests from memory queues and poll until transfer is complete; interrupt-like system is also available in which each processor periodically (i.e. 100 mi- crosec.) examines a common status word for interrupt requests (various papers differ on its use in I/O) (see also B-7700) (see also IBM S/370 XA option of masking off subclasses) 2. synchronization by queueing Intel 432 - CPU does I/O by passing data capabilities to the IOP; highest level I/O facility that has been built into a system 3. synchronization by asymmetric interrupt Honeywell 6000 (1971) - any CPU can initiate I/O operations, but interrupts are directed to a single control processor, which is determined by manually set switches (did GE- 635/645/655 have this?) 4. synchronization by symmetric interrupt Burroughs D-825 (1962) - all interrupts are transmitted to each processor; an OS-controlled mask register in each proces- sor determines if it will respond to a given interrupt IBM S/360 Model 67 (1966) - IBM S/370 XA (1983) - subchannel per device; any CPU can start I/O on any device and any CPU can accept an interrupt; option- ally, interrupt requests from subchannels can be assigned to one of eight maskable interruption subclasses, and priority schemes can be programmed so that certain high priority pro- grams can be interrupted by only a small number of subclasses; if all CPUs mask off a certain subclass, the interruption status is held pending in the channel system and can be ac- cepted later by use of the test subchannel instruction; a test pending interruption instruction is also available and is used to avoid immediate context switch after LPSW is executed by interrupt handler; path busy queueing is handled by the chan- nel subsystem Sequent Balance (1986) - SLIC interrupts processor with least priority Data General MV/20000-2 - any processor can start I/O on any channel; channels either send interrupts to processor N where N is the contents of an IOC register that the OS can set, or the IOC is in DDI mode (device directed interrupts); DDI uses a table accessed by the IOC to map device numbers to processor numbers and interrupts from a particular device then go to the processor of the OS's choice Conclusion I would like to gather responses and corrections to this new tax- onomy and to the historical information. Important categories and/or machines may have been omitted, and machines may have been misclassified. Some interesting questions have been identified by previous re- viewers of the taxonomy: 1. When did the shift from fixed peripheral addresses to vari- able addresses start? 2. How do systems deal with "unsolicited input," such as from terminals? (i.e. IBM systems typically lock the keyboard until a read is executed) 3. Which systems require that the channels perform the transla- tion of virtual to real addresses? 4. Which machines perform I/O into the cache? (see UNIVAC 1100/80 for one) 5. Which machine(s) introduced memory-mapped I/O? Acknowledgements: I wish to thank Randolph Bentson at Colorado State, Hank Dietz at Purdue, Dan Kern at Univ. of Washington, Jim Haynes at UC Santa Cruz, John Levine at ISC, Barry Margolin at Thinking Machines Corp., Robert Olson at ELXSI, and Jan Stubbs at NCR for their comments and suggestions. References Yet to be done. -- Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634 INTERNET: mark@hubcap.clemson.edu UUCP: gatech!hubcap!mark