[comp.arch] Revised I/O taxonomy

mark@hubcap.UUCP (Mark Smotherman) (12/21/88)

DRAFT -- 12/20/88


                    A Taxonomy of I/O Systems

                         Mark Smotherman

                    Dept. of Computer Science
           Clemson University, Clemson, SC 29634-1906
        (803) 656-5878, INTERNET: mark@hubcap.clemson.edu



Introduction

I am attempting to compile a taxonomy  of  I/O  systems  that  is
based  on the program sequencing necessary for the control of I/O
devices.  Typical textbook presentations of I/O systems  are  too
simplistic  in  that  they  identify  only  four  categories:  1)
program-controlled I/O (i.e. polling), 2)  interrupt-driven  I/O,
3)  DMA,  and 4) channel I/O.  A study of the historical develop-
ment of I/O systems reveals a much richer design space.

G.A. Blaauw and F.P. Brooks, Jr., have  come  much  closer  to  a
comprehensive  categorization  of the area.  In their manuscript,
Computer Architecture, these authors identify  essentially  seven
categories of I/O systems:

 I.  Dependent I/O
  A.  direct
  B.  single instruction overlap
   1.  private buffer per device
   2.  shared buffer
    a.  dedicated to I/O usage
    b.  general buffer in main memory
 II.  Autonomous I/O
  A.  channel (specialized controller/processor)
  B.  peripheral processor (generalized processor)
   1.  homogeneous multiprocessor structure
   2.  heterogeneous multiprocessor structure

I believe that an I/O taxonomy  should  make  finer  distinctions
than  either of those above and that multiprocessor issues should
be identified explicitly.  For uniprocessors,  I  see  the  major
categories  as  synchronous I/O versus the several different ways
in which overlapped I/O operations can  be  initiated.   For  the
overlapped operations, I have chosen the method of completion re-
porting as the basis of subcategories,  and  under  that  I  have
identified the method of transfer.

In terms of the method of transfer, I make a distinction  between
a  controller  that  can transfer only one block before requiring
CPU intervention and a  controller  that  can  transfer  multiple
blocks in a scatter/gather type of operation (in which the blocks
are identified to the controller  by  a  chain  of  descriptors).
Many  designers and authors call a controller with the latter ca-
pability an I/O channel; however, I reserve that term for a  spe-
cialized  I/O  processor  that  fetches instructions that have an
identifiable opcode field.  I note that Bell and Newell  categor-
ize controllers with scatter/gather capability as Pios since they
consider the chain of block descriptors essentially a  series  of
jump instructions.

From Blaauw and Brooks, I accept their  distinction  between  I/O
channels  and  I/O  processors  as  being  the general ability to
count.  That is, an I/O processor  should  have  the  ability  to
maintain  a loop or event count that is unrelated to the transfer
of a given number of characters per block.

For multiprocessors, the technique of I/O initiation  is  not  as
important as the symmetry of initiation; therefore, this symmetry
or lack of it becomes the basis of  the  major  categories.   The
method of completion reporting is the basis of subcategories, and
symmetry in interruption is explicitly identified.

A  classification  of  historical  machines  into  the  different
categories  should  help  define the categories.  Indeed, in line
with the thinking of Blaauw and Brooks, I believe that the earli-
est machine with a given feature will likely exhibit that feature
in its least disguised and least  compromised  form.   For  older
machines  with  multiple  I/O  options, I have chosen to classify
them according to their established use (e.g. the IBM  S/360  has
synchronous  I/O capability, which is largely unused).  Some mul-
tiprocessors appear in the first section; this  is  because  they
represent the first use of a given transfer initiation method.

The I/O classification I present also serves as a guided tour  of
the  history of I/O systems.  However, not all categories are po-
pulated with machines.  This may be the result of omissions on my
part,  or the category may indeed be unfruitful.  I would like to
characterize the reasons for the latter occurrence.


A New Taxonomy

I.  CPU - I/O INTERACTION

 A.  Synchronous transfer

  ERA 1103 (delivered 1953) - interlocked I/O;  word  at  a  time
  (Blaauw  and  Brooks credit a UNIVAC 1103A at NASA as the first
  computer to use the interrupt concept;  a  1956  paper  reports
  that an interrupt was used to preempt a batch machine and start
  data collection from a wind tunnel; see also the UNIVAC  I  and
  NBS DYSEAC)

  IBM 702 (announced 1953) - CPU stalls while a block of  charac-
  ters are transferred from an I/O device buffer; introduced con-
  trol unit concept

  IBM 1401 (1959) - originally designed as a  printer  controller
  but found widespread use as a small business computer; uses one
  opcode per I/O device: 1 is read a card into location 0 to  79,
  2  is  punch  a  card  from another location, 3 is print from a
  third location, etc.; the CPU stalls while the  characters  are
  transferred;  a  card  duplicating  program could be written in
  about 20 characters and punched onto one card


 B.  Asynchronous transfer

  1.  interlocked instruction to start transfer

   a.  synchronization by interlock

    UNIVAC I (1951) - one 60-word tape buffer for input  and  one
    for  output;  initial  input  instruction  starts transfer to
    buffer and then releases CPU for overlapped instruction  exe-
    cution;  subsequent input instruction dumps buffer to memory,
    starts next transfer, and then releases  CPU;  if  subsequent
    input  instruction  is issued too early then interlock stalls
    CPU; I/O errors halt the CPU and the operator  must  diagnose
    (Codd  in  1962  article  credits the UNIVAC I as "one of the
    earliest machines to be equipped with  program  interruption"
    since  he  states that an arithmetic overflow would cause the
    program to stop; Eckert's 1951 paper mentions several  checks
    that can stop the machine)

    IBM 701 (announced 1952) - "copy  logic":  after  an  initial
    prepare  to read (or write) instruction, the program must is-
    sue a copy instruction for each word in the transfer; a  loop
    is coded to update the memory addresses and issue the copies,
    and the loop may also perform superficial processing such  as
    character  code  conversion;  the  copy instruction is inter-
    locked so that an early issue is stalled until the I/O device
    can provide/accept the next word; at end of file the copy in-
    struction causes a one-instruction skip, and at  the  end  of
    block it causes a two-instruction skip

   b.  synchronization by polling

    i.  separate instructions to poll and transfer data

     (earlier machines?  CDC 160?)

     PDP-1 (1959) - conditional skips on I/O buffer register con-
     tents are apparently used to poll for transfer completion

     PDP-8 (1965) - conditional skips on control unit status  re-
     gisters are used for polling

    ii.  controller transfers words of block (i.e. DMA)

     (did Whirlwind I have this capability? -- Everett, 1951  pa-
     per: "In general the computer continues to run during termi-
     nal equipment wait times.")

     IBM SAGE (started 1952, operational 1955) -  I/O  operations
     to  start  block transfers of data to/from drum buffers; I/O
     operations proceed in parallel with further CPU  operations;
     a  controller  generates the sequential memory addresses for
     the block and decrements  a  counter;  CPU  has  conditional
     branch  to test completion of transfer; transfers are inter-
     locked so that CPU is stalled if second transfer is attempt-
     ed  before previous one ends (Serrell, et al., in 1962 iden-
     tify "computation in parallel with I/O" as a significant new
     feature of SAGE).

    iii.  controller with scatter/gather capability (often called
    an I/O channel)

    UNIVAC 1108?

    iv.  I/O channel (with specialized I/O instruction set)

     IBM 709 (announced 1957) - I/O  processor  with  specialized
     instruction  set;  CPU  has instruction to start channel and
     conditional branch to test completion of  channel  function;
     start  channel  is  interlocked  so  that  CPU is stalled if
     second start is  issued  before  previous  one  ends  (note:
     Greenstadt paper says external interrupt available)

    v.  I/O processor

     (example unknown)

   c.  synchronization by interrupt

    i.  separate instruction to transfer data

     NBS  DYSEAC  (2  papers  published  in  1954)  -  2  program
     counters;  I/O signal causes CPU to switch PC's; bit in each
     instruction can force switch back between PC's (Codd in 1962
     article states, "in the NBS DYSEAC the very significant step
     was made of extending interruption  to  input-output  opera-
     tions")

     Lincoln TX-2 (1957 paper) - "multiple sequence": 33  program
     counters; each I/O device has a dedicated PC and operates at
     a fixed priority (i.e. forerunner of interrupt vector); each
     instruction has break and dismiss bits; break is used to in-
     dicate points at which a higher-priority sequence  can  take
     over,  while  dismiss  is  used  to allow lower-priority se-
     quences to resume (Blaauw and Brooks classify  this  machine
     as having PPUs, but I see the explicit instruction bits as a
     recognition of the sharing of a single CPU; thus, I consider
     this  machine  closer to interrupt vectoring than to virtual
     PPUs)

     PDP-1 (1959) - Bell, et al., credit the "16-channel sequence
     break  system"  to  TX-2  influence  (actual  operation  not
     described)

    ii.  controller transfers words of block (i.e. DMA)

     CDC 1604 (1960) ?

    iii.  controller with scatter/gather capability (often called
    an I/O channel)

     IBM 7070 (announced 1958) - "priority processing"  (I/O  in-
     terrupt); I/O completion causes CPU to switch to uninterrup-
     tible priority routine; return address stored  in  register;
     scatter/gather  capability possible with chain of record de-
     finition words

     IBM STRETCH (started 1954, delivered 1961)  -  I/O  exchange
     acts  as  byte  multiplexor;  I/O  completion  is  part of a
     comprehensive interrupt  vector  facility  (vector  contains
     single  instructions  to  be executed outside the normal in-
     struction  cycle,  these   instructions   can   be   single-
     instruction  fixups  or subroutine calls); interrupt nesting
     is allowed but I/O is treated as a single cause

     Honeywell 800 (1963) - hardware-assisted multiprogramming; 8
     virtual processors, each has 2 program counters and an indi-
     vidual interrupt vector base register; for each memory cycle
     the hardware scans on a priority basis for activity on the 8
     input controllers, then the 8 output controllers,  and  then
     the  CPU;  within  the  CPU  the  hardware scans in a cyclic
     manner with various exceptions  for  multiple  memory  cycle
     operations

    iv.  I/O channel (with specialized I/O instruction set)

     IBM 7090 (announced 1958) - addition  of  data-channel  trap
     (I/O interrupt) to 709 architecture; causes CPU to branch to
     special instruction sequence

    v.  I/O processor

     UNIVAC LARC (started 1954, delivered 1960) - high-level  re-
     quest packets (e.g. record number or key) are sent to an I/O
     processor, which  also  performs  services  such  as  device
     queueing;  the  requesting processor is interrupted when its
     request is complete; main memory contention is resolved by a
     time-slotted memory bus


 3.  conditional instruction to start transfer

   a.  synchronization by polling

    (example unknown)

   b.  synchronization by interrupt

    IBM S/360 (1964) - start I/O instruction sets condition  code
    according  to success of initiation (path may be busy and CPU
    must perform queueing, or error may exist); channel  I/O  but
    less complex instruction set for channel than 7090 channels


 4.  mailbox deposit to start transfer

   a.  synchronization by polling

    CDC 6600 (1964) - in the  typical  OS  structure,  PPUs  poll
    reserved main memory locations (input mailboxes) to determine
    I/O requests; after starting a device in response to  an  I/O
    request, a PPU will poll the device until completion; at com-
    pletion the PPU will place a completion notice in its  output
    mailbox;  programs  can  poll  the  output  mailbox or can be
    suspended until the PPU running the OS  sees  the  completion
    notice and resumes the program by an exchange jump; before an
    output transfer the PPU must move the data  from  the  shared
    main  memory  to  its  local  memory, likewise after an input
    transfer the PPU must move the data from its local memory  to
    the shared main memory; PPU execution is time-shared equally

    TI-ASC (1972) - PPU execution can be weighted to  favor  cer-
    tain PPU's

   b.  synchronization by interrupt

    (example unknown)


 5.  queue insert to start transfer

   a.  synchronization by polling

    Burroughs B7700 (1973) - reserved  locations  exist  in  main
    memory  that  define head and tail pointers to I/O device re-
    quest queues and I/O completion block queues; queue manipula-
    tions  by  the CPU and I/O module are atomic actions; any IOM
    can handle any device; a start  I/O  instruction  begins  IOM
    processing  on a specified device queue; processing continues
    until an error, interrupt, or empty queue; the CPU polls  the
    completion block queue; optionally interrupts can be generat-
    ed on completion of each request

    (see also IBM S/370 XA where path busy queueing is handled by
    the channel subsystem)

   b.  synchronization by queueing

    Honeywell Series 60 Level 64 (1974)  -  microcoded  semaphore
    operations used in I/O

    ELXSI System  6400  -  uses  messages  as  a  synchronization
    mechanism  between  both OS processes and I/O controllers; an
    I/O processor notifies the controller that a message is pend-
    ing, but it is the responsibility of the controller to handle
    queues, including processing things out of  order,  and  han-
    dling errors

   c.  synchronization by interrupt

    (see also IBM S/370 XA where path busy queueing is handled by
    the channel subsystem)


  6.  asynchronous instruction to start transfer

   a.  synchronization by polling

    (example unknown)

   b.  synchronization by interrupt

    IBM S/370 (1970) - SIOF (start  I/O  fast  release)  used  to
    release  CPU  after  channel  has  fetched CAW and before the
    channel has determined if I/O operation can  be  successfully
    initiated;  interrupt  occurs  if device or path is busy (was
    assumed to be infrequent but on later systems  the  interrupt
    overhead  canceled out performance gain from the fast release
    of CPU)



II.  MULTIPROCESSOR I/O

 A.  Asymmetric initiation

  1.  synchronization by polling

   (example unknown)

  2.  synchronization by asymmetric interrupt

   Burroughs B5500 (1964) - up to two CPUs; only master  CPU  can
   initiate I/O; ITI instruction to test for pending interrupt at
   end  of  interrupt  handling;  I/O  channels  and   CPUs   are
   crossbarred  with  memory  modules,  also the I/O channels are
   crossbarred with all the peripherals

   IBM S/370 MP - channels and devices are dedicated to a partic-
   ular CPU

   UNIVAC 1100 Model 80 (1977) - up to four CPUs,  but  I/O  must
   initiated  by  either  of  the  two  CPU's that connect to the
   storage interface unit that controls memory access for the I/O
   device;  I/O  is  directed to the cache in the SIU rather than
   directly to main memory; I/O interrupts are made available  to
   the  two  CPUs in alternation and for a limited amount of time
   each; if one CPU doesn't respond to the interrupt  within  the
   available  period,  the interrupt is passed on to the next CPU
   in sequence (how did earlier 1108 do this?)

  3.  synchronization by symmetric interrupt

   (in a sense, see UNIVAC 1100/80)


 B.  Symmetric initiation

  1.  synchronization by polling

   Plessey System 250 (1969) - memory-mapped I/O with  capability
   protection; device drivers get requests from memory queues and
   poll until transfer is complete; interrupt-like system is also
   available  in  which each processor periodically (i.e. 100 mi-
   crosec.) examines a common status word for interrupt  requests
   (various papers differ on its use in I/O)

   (see also B-7700)

   (see also IBM S/370 XA option of masking off subclasses)

  2.  synchronization by queueing

   Intel 432 - CPU does I/O by passing data capabilities  to  the
   IOP;  highest  level  I/O  facility that has been built into a
   system

  3.  synchronization by asymmetric interrupt

   Honeywell 6000 (1971) - any CPU can initiate  I/O  operations,
   but  interrupts  are  directed  to a single control processor,
   which  is  determined  by  manually  set  switches  (did   GE-
   635/645/655 have this?)

  4.  synchronization by symmetric interrupt

   Burroughs D-825 (1962) - all  interrupts  are  transmitted  to
   each processor; an OS-controlled mask register in each proces-
   sor determines if it will respond to a given interrupt

   IBM S/360 Model 67 (1966) -

   IBM S/370 XA (1983) - subchannel per device; any CPU can start
   I/O on any device and any CPU can accept an interrupt; option-
   ally, interrupt requests from subchannels can be  assigned  to
   one  of  eight  maskable interruption subclasses, and priority
   schemes can be programmed so that certain high  priority  pro-
   grams can be interrupted by only a small number of subclasses;
   if all CPUs mask off  a  certain  subclass,  the  interruption
   status  is  held  pending in the channel system and can be ac-
   cepted later by use of the test subchannel instruction; a test
   pending interruption instruction is also available and is used
   to avoid immediate context switch after LPSW  is  executed  by
   interrupt  handler; path busy queueing is handled by the chan-
   nel subsystem

   Sequent Balance (1986) - SLIC interrupts processor with  least
   priority

   Data General MV/20000-2 - any processor can start I/O  on  any
   channel;  channels either send interrupts to processor N where
   N is the contents of an IOC register that the OS can  set,  or
   the  IOC is in DDI mode (device directed interrupts); DDI uses
   a table accessed by the IOC to map device numbers to processor
   numbers and interrupts from a particular device then go to the
   processor of the OS's choice


Conclusion

I would like to gather responses and corrections to this new tax-
onomy  and  to  the historical information.  Important categories
and/or machines may have been omitted, and machines may have been
misclassified.

Some interesting questions have been identified by  previous  re-
viewers of the taxonomy:

1.  When did the shift from fixed peripheral addresses  to  vari-
 able addresses start?

2.  How do systems deal with "unsolicited input,"  such  as  from
 terminals?   (i.e. IBM systems typically lock the keyboard until
 a read is executed)

3.  Which systems require that the channels perform the  transla-
 tion of virtual to real addresses?

4.  Which machines perform  I/O  into  the  cache?   (see  UNIVAC
 1100/80 for one)

5.  Which machine(s) introduced memory-mapped I/O?


Acknowledgements: I wish to thank Randolph  Bentson  at  Colorado
State, Hank Dietz at Purdue, Dan Kern at Univ. of Washington, Jim
Haynes at UC Santa Cruz, John Levine at ISC,  Barry  Margolin  at
Thinking Machines Corp., Robert Olson at ELXSI, and Jan Stubbs at
NCR for their comments and suggestions.


References

Yet to be done.
-- 
Mark Smotherman, Comp. Sci. Dept., Clemson University, Clemson, SC 29634
INTERNET: mark@hubcap.clemson.edu    UUCP: gatech!hubcap!mark