mark@hubcap.UUCP (Mark Smotherman) (12/21/88)
DRAFT -- 12/20/88
A Taxonomy of I/O Systems
Mark Smotherman
Dept. of Computer Science
Clemson University, Clemson, SC 29634-1906
(803) 656-5878, INTERNET: mark@hubcap.clemson.edu
Introduction
I am attempting to compile a taxonomy of I/O systems that is
based on the program sequencing necessary for the control of I/O
devices. Typical textbook presentations of I/O systems are too
simplistic in that they identify only four categories: 1)
program-controlled I/O (i.e. polling), 2) interrupt-driven I/O,
3) DMA, and 4) channel I/O. A study of the historical develop-
ment of I/O systems reveals a much richer design space.
G.A. Blaauw and F.P. Brooks, Jr., have come much closer to a
comprehensive categorization of the area. In their manuscript,
Computer Architecture, these authors identify essentially seven
categories of I/O systems:
I. Dependent I/O
A. direct
B. single instruction overlap
1. private buffer per device
2. shared buffer
a. dedicated to I/O usage
b. general buffer in main memory
II. Autonomous I/O
A. channel (specialized controller/processor)
B. peripheral processor (generalized processor)
1. homogeneous multiprocessor structure
2. heterogeneous multiprocessor structure
I believe that an I/O taxonomy should make finer distinctions
than either of those above and that multiprocessor issues should
be identified explicitly. For uniprocessors, I see the major
categories as synchronous I/O versus the several different ways
in which overlapped I/O operations can be initiated. For the
overlapped operations, I have chosen the method of completion re-
porting as the basis of subcategories, and under that I have
identified the method of transfer.
In terms of the method of transfer, I make a distinction between
a controller that can transfer only one block before requiring
CPU intervention and a controller that can transfer multiple
blocks in a scatter/gather type of operation (in which the blocks
are identified to the controller by a chain of descriptors).
Many designers and authors call a controller with the latter ca-
pability an I/O channel; however, I reserve that term for a spe-
cialized I/O processor that fetches instructions that have an
identifiable opcode field. I note that Bell and Newell categor-
ize controllers with scatter/gather capability as Pios since they
consider the chain of block descriptors essentially a series of
jump instructions.
From Blaauw and Brooks, I accept their distinction between I/O
channels and I/O processors as being the general ability to
count. That is, an I/O processor should have the ability to
maintain a loop or event count that is unrelated to the transfer
of a given number of characters per block.
For multiprocessors, the technique of I/O initiation is not as
important as the symmetry of initiation; therefore, this symmetry
or lack of it becomes the basis of the major categories. The
method of completion reporting is the basis of subcategories, and
symmetry in interruption is explicitly identified.
A classification of historical machines into the different
categories should help define the categories. Indeed, in line
with the thinking of Blaauw and Brooks, I believe that the earli-
est machine with a given feature will likely exhibit that feature
in its least disguised and least compromised form. For older
machines with multiple I/O options, I have chosen to classify
them according to their established use (e.g. the IBM S/360 has
synchronous I/O capability, which is largely unused). Some mul-
tiprocessors appear in the first section; this is because they
represent the first use of a given transfer initiation method.
The I/O classification I present also serves as a guided tour of
the history of I/O systems. However, not all categories are po-
pulated with machines. This may be the result of omissions on my
part, or the category may indeed be unfruitful. I would like to
characterize the reasons for the latter occurrence.
A New Taxonomy
I. CPU - I/O INTERACTION
A. Synchronous transfer
ERA 1103 (delivered 1953) - interlocked I/O; word at a time
(Blaauw and Brooks credit a UNIVAC 1103A at NASA as the first
computer to use the interrupt concept; a 1956 paper reports
that an interrupt was used to preempt a batch machine and start
data collection from a wind tunnel; see also the UNIVAC I and
NBS DYSEAC)
IBM 702 (announced 1953) - CPU stalls while a block of charac-
ters are transferred from an I/O device buffer; introduced con-
trol unit concept
IBM 1401 (1959) - originally designed as a printer controller
but found widespread use as a small business computer; uses one
opcode per I/O device: 1 is read a card into location 0 to 79,
2 is punch a card from another location, 3 is print from a
third location, etc.; the CPU stalls while the characters are
transferred; a card duplicating program could be written in
about 20 characters and punched onto one card
B. Asynchronous transfer
1. interlocked instruction to start transfer
a. synchronization by interlock
UNIVAC I (1951) - one 60-word tape buffer for input and one
for output; initial input instruction starts transfer to
buffer and then releases CPU for overlapped instruction exe-
cution; subsequent input instruction dumps buffer to memory,
starts next transfer, and then releases CPU; if subsequent
input instruction is issued too early then interlock stalls
CPU; I/O errors halt the CPU and the operator must diagnose
(Codd in 1962 article credits the UNIVAC I as "one of the
earliest machines to be equipped with program interruption"
since he states that an arithmetic overflow would cause the
program to stop; Eckert's 1951 paper mentions several checks
that can stop the machine)
IBM 701 (announced 1952) - "copy logic": after an initial
prepare to read (or write) instruction, the program must is-
sue a copy instruction for each word in the transfer; a loop
is coded to update the memory addresses and issue the copies,
and the loop may also perform superficial processing such as
character code conversion; the copy instruction is inter-
locked so that an early issue is stalled until the I/O device
can provide/accept the next word; at end of file the copy in-
struction causes a one-instruction skip, and at the end of
block it causes a two-instruction skip
b. synchronization by polling
i. separate instructions to poll and transfer data
(earlier machines? CDC 160?)
PDP-1 (1959) - conditional skips on I/O buffer register con-
tents are apparently used to poll for transfer completion
PDP-8 (1965) - conditional skips on control unit status re-
gisters are used for polling
ii. controller transfers words of block (i.e. DMA)
(did Whirlwind I have this capability? -- Everett, 1951 pa-
per: "In general the computer continues to run during termi-
nal equipment wait times.")
IBM SAGE (started 1952, operational 1955) - I/O operations
to start block transfers of data to/from drum buffers; I/O
operations proceed in parallel with further CPU operations;
a controller generates the sequential memory addresses for
the block and decrements a counter; CPU has conditional
branch to test completion of transfer; transfers are inter-
locked so that CPU is stalled if second transfer is attempt-
ed before previous one ends (Serrell, et al., in 1962 iden-
tify "computation in parallel with I/O" as a significant new
feature of SAGE).
iii. controller with scatter/gather capability (often called
an I/O channel)
UNIVAC 1108?
iv. I/O channel (with specialized I/O instruction set)
IBM 709 (announced 1957) - I/O processor with specialized
instruction set; CPU has instruction to start channel and
conditional branch to test completion of channel function;
start channel is interlocked so that CPU is stalled if
second start is issued before previous one ends (note:
Greenstadt paper says external interrupt available)
v. I/O processor
(example unknown)
c. synchronization by interrupt
i. separate instruction to transfer data
NBS DYSEAC (2 papers published in 1954) - 2 program
counters; I/O signal causes CPU to switch PC's; bit in each
instruction can force switch back between PC's (Codd in 1962
article states, "in the NBS DYSEAC the very significant step
was made of extending interruption to input-output opera-
tions")
Lincoln TX-2 (1957 paper) - "multiple sequence": 33 program
counters; each I/O device has a dedicated PC and operates at
a fixed priority (i.e. forerunner of interrupt vector); each
instruction has break and dismiss bits; break is used to in-
dicate points at which a higher-priority sequence can take
over, while dismiss is used to allow lower-priority se-
quences to resume (Blaauw and Brooks classify this machine
as having PPUs, but I see the explicit instruction bits as a
recognition of the sharing of a single CPU; thus, I consider
this machine closer to interrupt vectoring than to virtual
PPUs)
PDP-1 (1959) - Bell, et al., credit the "16-channel sequence
break system" to TX-2 influence (actual operation not
described)
ii. controller transfers words of block (i.e. DMA)
CDC 1604 (1960) ?
iii. controller with scatter/gather capability (often called
an I/O channel)
IBM 7070 (announced 1958) - "priority processing" (I/O in-
terrupt); I/O completion causes CPU to switch to uninterrup-
tible priority routine; return address stored in register;
scatter/gather capability possible with chain of record de-
finition words
IBM STRETCH (started 1954, delivered 1961) - I/O exchange
acts as byte multiplexor; I/O completion is part of a
comprehensive interrupt vector facility (vector contains
single instructions to be executed outside the normal in-
struction cycle, these instructions can be single-
instruction fixups or subroutine calls); interrupt nesting
is allowed but I/O is treated as a single cause
Honeywell 800 (1963) - hardware-assisted multiprogramming; 8
virtual processors, each has 2 program counters and an indi-
vidual interrupt vector base register; for each memory cycle
the hardware scans on a priority basis for activity on the 8
input controllers, then the 8 output controllers, and then
the CPU; within the CPU the hardware scans in a cyclic
manner with various exceptions for multiple memory cycle
operations
iv. I/O channel (with specialized I/O instruction set)
IBM 7090 (announced 1958) - addition of data-channel trap
(I/O interrupt) to 709 architecture; causes CPU to branch to
special instruction sequence
v. I/O processor
UNIVAC LARC (started 1954, delivered 1960) - high-level re-
quest packets (e.g. record number or key) are sent to an I/O
processor, which also performs services such as device
queueing; the requesting processor is interrupted when its
request is complete; main memory contention is resolved by a
time-slotted memory bus
3. conditional instruction to start transfer
a. synchronization by polling
(example unknown)
b. synchronization by interrupt
IBM S/360 (1964) - start I/O instruction sets condition code
according to success of initiation (path may be busy and CPU
must perform queueing, or error may exist); channel I/O but
less complex instruction set for channel than 7090 channels
4. mailbox deposit to start transfer
a. synchronization by polling
CDC 6600 (1964) - in the typical OS structure, PPUs poll
reserved main memory locations (input mailboxes) to determine
I/O requests; after starting a device in response to an I/O
request, a PPU will poll the device until completion; at com-
pletion the PPU will place a completion notice in its output
mailbox; programs can poll the output mailbox or can be
suspended until the PPU running the OS sees the completion
notice and resumes the program by an exchange jump; before an
output transfer the PPU must move the data from the shared
main memory to its local memory, likewise after an input
transfer the PPU must move the data from its local memory to
the shared main memory; PPU execution is time-shared equally
TI-ASC (1972) - PPU execution can be weighted to favor cer-
tain PPU's
b. synchronization by interrupt
(example unknown)
5. queue insert to start transfer
a. synchronization by polling
Burroughs B7700 (1973) - reserved locations exist in main
memory that define head and tail pointers to I/O device re-
quest queues and I/O completion block queues; queue manipula-
tions by the CPU and I/O module are atomic actions; any IOM
can handle any device; a start I/O instruction begins IOM
processing on a specified device queue; processing continues
until an error, interrupt, or empty queue; the CPU polls the
completion block queue; optionally interrupts can be generat-
ed on completion of each request
(see also IBM S/370 XA where path busy queueing is handled by
the channel subsystem)
b. synchronization by queueing
Honeywell Series 60 Level 64 (1974) - microcoded semaphore
operations used in I/O
ELXSI System 6400 - uses messages as a synchronization
mechanism between both OS processes and I/O controllers; an
I/O processor notifies the controller that a message is pend-
ing, but it is the responsibility of the controller to handle
queues, including processing things out of order, and han-
dling errors
c. synchronization by interrupt
(see also IBM S/370 XA where path busy queueing is handled by
the channel subsystem)
6. asynchronous instruction to start transfer
a. synchronization by polling
(example unknown)
b. synchronization by interrupt
IBM S/370 (1970) - SIOF (start I/O fast release) used to
release CPU after channel has fetched CAW and before the
channel has determined if I/O operation can be successfully
initiated; interrupt occurs if device or path is busy (was
assumed to be infrequent but on later systems the interrupt
overhead canceled out performance gain from the fast release
of CPU)
II. MULTIPROCESSOR I/O
A. Asymmetric initiation
1. synchronization by polling
(example unknown)
2. synchronization by asymmetric interrupt
Burroughs B5500 (1964) - up to two CPUs; only master CPU can
initiate I/O; ITI instruction to test for pending interrupt at
end of interrupt handling; I/O channels and CPUs are
crossbarred with memory modules, also the I/O channels are
crossbarred with all the peripherals
IBM S/370 MP - channels and devices are dedicated to a partic-
ular CPU
UNIVAC 1100 Model 80 (1977) - up to four CPUs, but I/O must
initiated by either of the two CPU's that connect to the
storage interface unit that controls memory access for the I/O
device; I/O is directed to the cache in the SIU rather than
directly to main memory; I/O interrupts are made available to
the two CPUs in alternation and for a limited amount of time
each; if one CPU doesn't respond to the interrupt within the
available period, the interrupt is passed on to the next CPU
in sequence (how did earlier 1108 do this?)
3. synchronization by symmetric interrupt
(in a sense, see UNIVAC 1100/80)
B. Symmetric initiation
1. synchronization by polling
Plessey System 250 (1969) - memory-mapped I/O with capability
protection; device drivers get requests from memory queues and
poll until transfer is complete; interrupt-like system is also
available in which each processor periodically (i.e. 100 mi-
crosec.) examines a common status word for interrupt requests
(various papers differ on its use in I/O)
(see also B-7700)
(see also IBM S/370 XA option of masking off subclasses)
2. synchronization by queueing
Intel 432 - CPU does I/O by passing data capabilities to the
IOP; highest level I/O facility that has been built into a
system
3. synchronization by asymmetric interrupt
Honeywell 6000 (1971) - any CPU can initiate I/O operations,
but interrupts are directed to a single control processor,
which is determined by manually set switches (did GE-
635/645/655 have this?)
4. synchronization by symmetric interrupt
Burroughs D-825 (1962) - all interrupts are transmitted to
each processor; an OS-controlled mask register in each proces-
sor determines if it will respond to a given interrupt
IBM S/360 Model 67 (1966) -
IBM S/370 XA (1983) - subchannel per device; any CPU can start
I/O on any device and any CPU can accept an interrupt; option-
ally, interrupt requests from subchannels can be assigned to
one of eight maskable interruption subclasses, and priority
schemes can be programmed so that certain high priority pro-
grams can be interrupted by only a small number of subclasses;
if all CPUs mask off a certain subclass, the interruption
status is held pending in the channel system and can be ac-
cepted later by use of the test subchannel instruction; a test
pending interruption instruction is also available and is used
to avoid immediate context switch after LPSW is executed by
interrupt handler; path busy queueing is handled by the chan-
nel subsystem
Sequent Balance (1986) - SLIC interrupts processor with least
priority
Data General MV/20000-2 - any processor can start I/O on any
channel; channels either send interrupts to processor N where
N is the contents of an IOC register that the OS can set, or
the IOC is in DDI mode (device directed interrupts); DDI uses
a table accessed by the IOC to map device numbers to processor
numbers and interrupts from a particular device then go to the
processor of the OS's choice
Conclusion
I would like to gather responses and corrections to this new tax-
onomy and to the historical information. Important categories
and/or machines may have been omitted, and machines may have been
misclassified.
Some interesting questions have been identified by previous re-
viewers of the taxonomy:
1. When did the shift from fixed peripheral addresses to vari-
able addresses start?
2. How do systems deal with "unsolicited input," such as from
terminals? (i.e. IBM systems typically lock the keyboard until
a read is executed)
3. Which systems require that the channels perform the transla-
tion of virtual to real addresses?
4. Which machines perform I/O into the cache? (see UNIVAC
1100/80 for one)
5. Which machine(s) introduced memory-mapped I/O?
Acknowledgements: I wish to thank Randolph Bentson at Colorado
State, Hank Dietz at Purdue, Dan Kern at Univ. of Washington, Jim
Haynes at UC Santa Cruz, John Levine at ISC, Barry Margolin at
Thinking Machines Corp., Robert Olson at ELXSI, and Jan Stubbs at
NCR for their comments and suggestions.
References
Yet to be done.
--
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu UUCP: gatech!hubcap!mark