carl@nrcaero.UUCP (Carl P. Swail) (02/22/85)
: '!/bin/sh ' : 'This is a shell archive, meaning: ' : '1. Remove everything above the #!/bin/sh line. ' : '2. Save the resulting text in a file. ' : '3. Execute the file with /bin/sh (not csh) to create the files: ' : ' d.ms ' : 'This archive created: Tue Jan 29 10:06:36 1985 ' export PATH; PATH=/bin:$PATH echo shar: extracting "'d.ms'" '(57662 characters)' if test -f 'd.ms' then echo shar: over-writing existing file "'d.ms'" fi cat << \SHAR_EOF > 'd.ms' .ta 8m +8m +8m +8m +8m +8m +8m +8m +8m +8m .nr * 0 1 .ds * (\\n+*) .ds *F (\\n*) .de Gl .br .ne \\$1 .. .nr PI 4m .TL .UX Driver Manual .br Perkin-Elmer Edition VII+ .FS +Edition VII is a trademark of The Perkin-Elmer Corporation. .FE .br Release 2.4 (10/23/84) .AU Axel T. Schreiner .AI Sektion Informatik University of Ulm (W-Germany) .QP \(co 1984 by Axel T. Schreiner, Ulm, W-Germany. .ND .sp 2 .ce 1 \fIDisclaimer\fR .QP While the information in this document is believed to be accurate, it may not be. The information was largely obtained from [Rit 78] and [Tho 78]. Some Perkin-Elmer specific details could be inferred by analyzing objects from the system libraries available under binary license. However, these details can only be verified using the system sources. It is therefore suggested that this document not be circulated indiscriminately. .NH 1 Scope and Conventions .PP This document discusses the general aspects of .UX .I block and .I character drivers. The details of the terminal driver interface, e.g., the line disciplines and thus multiplexed files and the packet driver, are beyond the scope of this document; they are discussed in [For\ 84] and [Sey\ 84]. .PP This document contains two examples: a character output driver for multiple PASLAs and the skeleton of a block driver for multiple controllers with multiple drives and a single logical disk map. The PASLA driver has been tested and it worked immediately. 
The block driver has been compiled and linked into the system, but the hardware side of it would have to be added. .PP The document assumes familiarity with the UNIX documentation, a working knowledge of C and systems programming in general, and a basic understanding of the devices involved. .PP .I Italics are used for emphasis, to introduce concepts, to name generic functions, and to name files. .B function() is the name of a kernel function or of a function in the driver examples. .B vector[] similarly is a vector name. .B object refers to other variables or program elements. .Gl 8v .NH 1 Theory of Operation .NH 2 Driver .PP A hardware device usually presents a rather ugly interface to the operating system. A .I driver is charged with implementing a uniform .I "software interface" for a specific device. The software interface will in general consist of the following generic routines: .RS .IP \fIconnect\fR .br Control access to the device, initialize it prior to the first use. .IP \fItransfer\fR .br Transfer data between the device and some portion of memory. .IP \fIcontrol\fR .br Set options concerning transfer, action upon error, etc. Return option settings. .IP \fIdisconnect\fR .br Terminate use of the device, reset running state. .RE .PP When this software interface is implemented for a particular device, i.e., for a particular .I "hardware interface" , the driver will in general additionally contain the following generic routines: .RS .IP \fIstart\fR .br Initiate a transfer to or from the device. .IP \fIinterrupt\fR .br Indicate termination of a transfer. Usually the next transfer needs to be started and a process waiting for the completion of the transfer needs to be informed. .RE .PP .I connect , .I transfer , .I control , and .I disconnect are generally called as requested by a process in the system. These routines are referred to as the .I "top half" of the driver; they are usually aware of the process calling them. 
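.PP
The four top-half routines can be pictured as a table of function pointers, one table per device.
The following is only a sketch in present-day C and not kernel code: the names
struct drv_ops, null_op(), no_op(), and call_connect() are hypothetical and are not taken from any system source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical uniform software interface: one slot per generic routine. */
struct drv_ops {
    int (*connect)(int unit);
    int (*transfer)(int unit, char *mem, size_t len, int writing);
    int (*control)(int unit, int request, void *arg);
    int (*disconnect)(int unit);
};

/* Placeholder for a call that is permitted but needs no action. */
static int null_op(int unit)
{
    (void)unit;
    return 0;
}

/* Placeholder for a call that is not permitted for this device. */
static int no_op(int unit)
{
    (void)unit;
    return ENODEV;
}

/* The caller dispatches through the table and needs no device knowledge. */
static int call_connect(const struct drv_ops *ops, int unit)
{
    assert(ops != NULL);
    return ops->connect != NULL ? ops->connect(unit) : ENODEV;
}
```

.LP
A table filled with null_op() for connect accepts the call and does nothing; one filled with no_op() rejects it.
In both cases the caller's code is the same, which is the point of a uniform interface.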
.PP .I interrupt is called "by the device", i.e., asynchronously, and .I not in conjunction with a process. This routine is referred to as the .I "bottom half" of the driver. .I start belongs to the top or bottom half of the driver depending on its caller. .NH 2 Operating System .PP The operating system mediates data transfers between process and hardware device. The file manager component of the operating system is concerned with implementing a file system structure on suitable (block-replaceable) devices. .PP The operating system usually implements a uniform interface to files and devices for a process. This .I "process interface" would in general contain the following routines: .RS .IP \fBopen()\fR .br Connect the process to a file or device by name for reading, writing, or both. .IP \fBread()\fR .br Transfer data to the process' address space from the connected file or device. .IP \fBwrite()\fR .br Transfer data from the process' address space to the connected file or device. .IP \fBlseek()\fR .br Control the point, in a file or on a device, to or from which the next transfer will take place. .IP \fBioctl()\fR .br Control options for subsequent transfers. Make current option settings available. .IP \fBclose()\fR .br Disconnect the process from the file or device, update the file structure, etc. .RE .PP The similarity between the process interface to the i/o system and the software interface to the driver is no accident. If the process accesses a device directly, the i/o system component of the operating system will do little more than verify and transmit each request to the appropriate driver routine. If the process accesses a file, the operating system may need to translate the requests and construct appropriate calls to driver routines. .PP It is very important that there is a .I uniform software interface to .I all drivers. In this fashion each process and even large parts of the i/o system can be completely unaware of the specific details of a device. 
This motivates why the software interface proposed in section 2.1 does not contain a .I position routine \(em not all devices are capable of random positioning. We shall see later, how .B lseek() might be handled. .NH 2 .UX .PP .UX has the process interface described in section 2.2. However, it has .I two software interfaces, somewhat vaguely referred to as .I block and .I character interface. .PP Officially block-replaceable devices such as disks are connected using block drivers and devices such as terminals or printers, which transmit one character at a time, are connected using character drivers. Unfortunately (for this definition), there is also a character interface for most block devices .I "for efficiency reasons" , and while a magnetic tape drive tends not to be block-replaceable, it still has a block interface. .PP The proper distinction between the two interfaces appears to be the following: .PP The .I "character interface" is aware of the process for which the transfer takes place; it can therefore inform the process directly if an operation meets with a hardware error. A character driver can in general implement two generic .I transfer routines: .I read sequentially from the device into the process address space, and .I write sequentially to the device from the process address space. The amount to be transferred, as well as the relevant address in the process address space, is available to the driver through the process description in the .UX kernel. .PP While the transfer takes place sequentially, there is a notion of position on the device to or from which the transfer should take place. This position information is also passed through the process description and it is up to the character driver to interpret it. As a consequence, .B lseek() operations are, e.g., honored by the memory driver and by the raw drivers, but they are silently ignored for terminals. .PP The .I "block interface" is .I not aware of the process for which it is invoked. 
There is no .I control routine in this interface, and there is only one generic .I transfer routine, commonly referred to as the .I strategy routine. This routine is presented with a buffer to be transferred, i.e., a buffer header structure designating transfer direction (read into memory or write to the device), memory address and length of the buffer, as well as a starting block address on the device. The memory address generally is physical. If the device is capable of direct memory access, the .I strategy routine will usually not distinguish between a transfer to kernel or to process address space. Error reports must be made in the buffer header. Once a buffer has been dealt with, responsibility for it is returned to the operating system by calling the .B iodone() routine and transmitting the address of the buffer header to it. .PP A character interface can be .I simulated for block drivers simply by translating the character transfer request into a buffer transfer request and presenting it to the .I strategy routine. The details are handled by the .B physio() routine, which may be called by a character driver for this purpose. The buffer header is then caught again by the .B iodone() routine and the necessary process synchronization takes place internally. .PP This third version of the software interface is known as the .I "raw interface" and it is in fact quite efficient, since transfers take place directly between the device and the process address space. In the normal block interface, all data must be moved between the buffer in kernel address space and the process address space. .PP The raw interface is usually implemented through a special buffer header local to a device driver. Other special buffer headers are constructed by the operating system when a process needs to be swapped. In this case the buffer header will refer to an entire process address space. 
The .I strategy routine, however, need not be aware of this, as long as it does not assume any specific buffer length. The .B iodone() routine will internally inform the operating system once the swap operation or the simulation within the raw interface has been completed. .PP Buffer headers contain a device address complete with device identification and position on the device. This is used by the operating system to implement .B lseek() operations \(em the driver need not concern itself explicitly with this aspect of the process interface. .PP The complete device address in each buffer header is also used by the operating system to treat each buffer header as a .I unique representation of a particular block on a particular device. In this fashion, reliable file sharing and a software cache for the block devices can be implemented by the operating system. Problems result only from the fact that the special buffer headers mentioned above are not considered by the operating system in this search \(em the raw interface thus provides an alternate path to information on a device and phase errors can result from indiscriminate use of the raw interface. .PP Another source of problems with the raw interface is the fact that transfer requests are passed directly from the process to the driver \(em no additional buffering takes place. Disks can usually only be addressed and read or written to the sector level. While it is generally possible to read less than an entire sector, quite horrible things tend to happen to the tail end of a sector which is only partially written. Programs accessing raw devices should therefore be written in a way that only reasonable transfer lengths are requested. .PP We summarize the important aspects of the theory of drivers in the .UX system: Character drivers transfer to and from a process, they can have independent .I read and .I write transfer routines, and they report errors directly to the calling process. 
Block drivers transfer to and from a buffer, which is presented to the .I strategy routine for transfer and which must be released through the .B iodone() routine; the buffer header designates transfer direction and memory and device address and it receives error information. Raw drivers are character drivers simulated by calls on a block driver. .NH 1 Installing a Driver .PP A driver is identified by its index in a dispatch table. There are two such tables: .B cdevsw[] for the character interface and .B bdevsw[] for the block interface. Indices in each table start at zero and are known as .I "major numbers" for the device. There is also a .I "minor number" which is handed to the driver and can be used for arbitrary purposes; usually it indicates an instance of a device (one of several printers, etc.). Major and minor numbers are 8-bit quantities which are combined into a 16-bit device number. .PP The dispatch tables are constructed by mkconf(8) during system generation. They are subsequently available in source in the file .I /usr/sys/conf/c.c . .PP Drivers are linked statically to the kernel. A driver is installed by adding a new line to the relevant dispatch table in this file, once it has been constructed by mkconf(8).\** .FS \*(*F If possible, mkconf(8) should be modified so that .I c.c need not be edited after each reconfiguration. Alternatively, an appropriate sed(1) script can be supplied to facilitate the installation of user-written drivers into .I c.c . .FE One could also replace or delete an unused line, but the line positions establish an implicit standard for major numbers, which although unpublished should perhaps be preserved. .PP If the driver needs to receive interrupts \(em and most drivers will \(em its interrupt service routine must also be defined to the .UX kernel. The procedure for this is quite processor hardware specific. .PP Once the driver has been written and installed, a new kernel is linked which includes the new driver. 
In order to access the driver, a .I "special file" referencing the driver needs to be made using mknod(8). Arguments to this command determine the interface (block or character), i.e., the dispatch table .B bdevsw[] or .B cdevsw[] ; the major number, i.e., the line number in the relevant dispatch table; and the minor number, as required and used by the new driver. .PP Once the new kernel has been booted and the special file exists, the driver can be tested, e.g., using the cat(1) or dd(1) commands for simple transfer and positioning operations. stty(1) can be used to test .I control functions \(em stty(1) arguments are applied to the standard output of stty(1), which can be redirected to the new device; the arguments are passed using .B ioctl() . If the new driver implements a block interface, mkfs(8) can be used to establish a file system which can then be tested with fsck(8) and mounted with mount(8). .NH 2 "bdevsw[]" .PP As a typical example, we shall construct and install the skeleton of a very simple disk driver for two controllers. We call the new device .B dk . By tacit convention all the externally visible names of the software interface start with the device name. The relevant part of .I c.c needs to be extended as follows: .DS I .so c.c1 struct bdevsw bdevsw[] = { ... .so c.c2 0 /* end of bdevsw */ }; .DE .PP The definition of .B "struct\ bdevsw" is in .I /usr/include/sys/conf.h : .DS I struct bdevsw { int (* d_open)(); int (* d_close)(); int (* d_strategy)(); struct buf * d_tab; short d_flags; }; .DE .PP The entries for one driver on one line of .B bdevsw[] are, in order: .RS .IP \fBd_open\fR .br The name of the .I connect routine if any. .B nulldev() is an existing routine which does nothing. This name should be entered if nothing needs to be done but the call as such is permitted. .B nodev() is an existing routine which reports an error to the process calling the routine. This name should be entered if the call is not permitted. 
Clearly in our case the .I connect call needs to be permitted, but for a disk usually nothing needs to be done. .IP \fBd_close\fR .br The name of the .I disconnect routine, if any. Again .B nulldev() or .B nodev() can be entered as appropriate. .IP \fBd_strategy\fR .br The name of the .I transfer routine which will receive the buffer to be transferred. .IP \fBd_tab\fR .br The name of the .B buf structure used as a queue header in this driver. Such a queue header must be used and its name must be published in order for the buffer cache routines to be able to locate buffers even if they have been given to a driver. .IP \fBd_flags\fR .br This entry defines certain driver characteristics. .B BD_NORMAL is entered here for normal block drivers. .RE .PP We give each controller its own major number, i.e., its own line in .B bdevsw[] . In this fashion, each controller can be operated independently of the others. The code for the driver is \(em of course \(em .I not duplicated, it is simply referenced for several major numbers. .PP Various configuration parameters can be added to .I c.c to define them external to the actual driver. In this fashion, if the configuration changes, the driver need not be recompiled. The use of these variables is explained in section 4. For .B dk we add perhaps the following: .LP .I dk.h .DS I .so dk.h .DE .Gl 25v .LP .I c.c .DS I .so c.c3 .DE .LP Note that this code must .I precede the initialization of .B bdevsw[] since .B dk_tab is referenced there. .PP .B dkparm[] describes the hardware configuration, i.e., number of disks per controller, addresses of the various units, etc. .PP For the sake of illustration, .B dkmap[] will be used to create .I "logical disks" on each physical disk drive. The structure is global for all .B dk drives and permits 4 logical drives per physical drive. The logical drives are configured for 5 MB drives and will be identified by the last two bits of the minor number for the device. 
.DS I .so c.c4 .DE .PP The disk map is initialized in a typical fashion with overlapping logical drives (or .I slices ): .DS I slice blocks +-----------------+ 0 |*****************| 9792 +-----+-----------+ 1 |*****| 3264 +-----+-----------+ 2 |***********| 6528 +-----+-----+ 3 |*****| 3264 +-----+ .DE .PP mkfs(8) and mount(8) can be used to arrange for a number of combinations of file systems with different sizes. .PP The driver will handle multiple controllers. For simplicity it is installed with one major number for each controller, and the controller number is encoded in the first three bits of the minor number.\** .FS \*(*F Otherwise, the interrupt routine has trouble discovering precisely which device called it. We would then need to install one interrupt routine per controller or fiddle with .B devmap[] ; compare section 3.3. .FE .PP The physical drive will be identified by bits 3 through 5 of the minor number. .NH 2 "cdevsw[]" .PP As an example, we shall install a very simple driver to use a lineprinter connected to a PASLA for output.\** .FS \*(*F This PASLA \(em obviously \(em must not be additionally configured as .B vdu through mkconf(8). .FE We call the new device .B pr . .PP If we additionally add a raw interface for the .B dk driver mentioned previously, we need to extend the relevant part of .I c.c as follows: .DS I .so c.c10 struct cdevsw cdevsw[] = { ... .so c.c11 0 }; .DE .PP The definition of .B "struct\ cdevsw" is in .I /usr/include/sys/conf.h : .DS I struct cdevsw { int (* d_open)(); int (* d_close)(); int (* d_read)(); int (* d_write)(); int (* d_ioctl)(); int (* d_stop)(); struct tty * d_tty; }; .DE .PP The entries for one driver on one line of .B cdevsw[] are, in order: .RS .IP \fBd_open\fR .br The name of the .I connect routine if any. For a printer the .I connect routine would be used to assure exclusive access for one process only. .IP \fBd_close\fR .br The name of the .I disconnect routine, if any. 
For a printer the completion of an exclusive access is managed here. .IP \fBd_read\fR .br The name of the .I "input transfer" routine, if any. For a printer .B nodev() is referenced, since reading is not permitted. .IP \fBd_write\fR .br The name of the .I "output transfer" routine, if any. .IP \fBd_ioctl\fR .br The name of the .I control routine, if any. For a disk .B nodev() is referenced, since a .I control call is not permitted. For a magnetic tape drive the .I control routine is usually used for device positioning operations such as .I rewind , .I "forward filemark" , .I "write filemark" , etc.\** .FS \*(*F The existence of a .I control call satisfies the .B isatty() condition, see ttyname(3). This is quite unfortunate, since printers or even magnetic tape drives can thus be confused with terminals! .FE .IP \fBd_stop\fR .br A routine specific to the terminal driver. .IP \fBd_tty\fR .br The name of a data structure specific to the terminal driver. .RE .PP Various configuration parameters can be added to .I c.c to define them external to the actual driver. For .B pr we add perhaps the following: .LP .I pr.h .DS I .so pr.h .DE .LP .I c.c .DS I .so c.c12 .DE .PP .B prparm[] contains address and mode settings for each printer, .B prnum records the number of available printers. The use of these variables is explained in section 5. .NH 2 Interrupt Handlers .PP On a Perkin-Elmer system, the devices will report back through the interrupt service pointer table. The .UX kernel uses three tables to decode which driver needs to be informed and which minor number was involved. These tables are also defined in .I c.c : .RS .IP \fBdevmap[]\fR .br This .B char vector contains the minor number for each device address used as an index.\** .FS \*(*F The major number is not needed, since the driver is expected to know, what it is, and it is not expected to use the major number for any devious purposes. 
For our .B dk driver, which intends to do just that, this presents a problem which we circumvented by encoding the controller number in the minor number. .FE .B devmap[] only needs to be changed, if the minor number is not 0, as would for example be the case for our second printer. In our case, the disk controllers are at addresses 0xB6 and 0xB8, the disk drives are at addresses 0xC6 through 0xC9, and the printers are at addresses 0x2D and 0x2F. .B devmap[] becomes: .DS I char devmap[] = { ... .so c.c20 ... .so c.c30 ... }; .DE .IP \fBhandler[]\fR .br This vector of pointers to integer-valued functions is initialized with the names of the interrupt service routines in each driver. The interrupt service routines .B dkint() for the disk drive, .B dkcint() for the disk controller, and .B print() for the .B pr driver must be defined external to .I c.c and must be added to the list: .DS I .so c.c22 int (* handler[])() = { spurint, ... .so c.c23 }; .DE .IP \fBdevint[]\fR .br This .B char vector contains a .I byte (!) offset into .B handler[] for each device address used as an index. The byte offsets in .B handler[] to our new interrupt service routines must be entered in the appropriate slots for the disk drives, disk controllers, and for each printer. Zero is the default for each entry. This would cause .B handler[0] to be called, i.e., .B spurint() , a routine in the .UX kernel which reports a spurious interrupt to the console. .B devint[] becomes: .DS I char devint[] = { ... .so c.c21 ... .so c.c31 ... }; .DE .RE .NH 2 Special Files .PP For the block interface to be controlled by the .B dk driver, we must create the following special files: .DS I mknod dk00 b 7 0; : first controller, entire first drive mknod dk00a b 7 1; : first drive, partial logical drives ... mknod dk00c b 7 3 mknod dk01 b 7 4; : entire second drive mknod dk01a b 7 5; : second drive, partial logical drives ... 
mknod dk01c b 7 7 mknod dk10 b 8 32; : second controller, first drive mknod dk10a b 8 33 ... mknod dk10c b 8 35 mknod dk11 b 8 36; : second drive mknod dk11a b 8 37 ... mknod dk11c b 8 39 .DE .LP The minor number is defined as .DS I controller * 32 + drive * 4 + slice .DE .LP where .B controller ranges from 0 to 7, .B drive ranges from 0 to 7, and .B slice ranges from 0 to 3. .PP For the raw interface to these physical drives, the following files must be created: .DS I mknod rdk00 c 16 0; : first controller, first drive mknod rdk00a c 16 1; : logical drives ... mknod rdk00c c 16 3 mknod rdk01 c 16 4; : second drive mknod rdk01a c 16 5; : logical drives ... mknod rdk01c c 16 7 mknod rdk10 c 17 32; : second controller, first drive mknod rdk10a c 17 33; : logical drives ... mknod rdk10c c 17 35 mknod rdk11 c 17 36; : second drive mknod rdk11a c 17 37; : logical drives ... mknod rdk11c c 17 39 .DE .PP For two printers, the following files must be created: .DS I mknod pr0 c 25 0; : first printer mknod pr1 c 25 1; : second printer .DE .PP One would presumably create all special files in the .I /dev directory. .PP .I Note : Unless additional logic is added, our drivers essentially believe the minor numbers presented through the special files. It is therefore extremely important that the special files only refer to devices which in fact exist in the configuration! (One should in general not keep special files in a file hierarchy for which no devices are present.) .NH 1 Block Driver .PP The following figure illustrates the basic operation of the transfer mechanism in a block driver: A .B buf is handed to the .I strategy routine. It is queued, if its transfer request is permissible, or it is immediately handed to .B iodone() , if its request is in error. .I strategy calls a .I start routine to dequeue the "next" .B buf and begin transmission if the device is available. The device will eventually \(em it better! 
\(em cause .I interrupt to be executed, from where .B buf is finally passed to .B iodone() for recycling. .I interrupt also calls .I start to keep the dequeueing mechanism active and the device as busy as possible. .DS I buf | V strategy ----+--------+ | | | add | | | call call V | if +-------+ | error | queue | | | | of | | | | buf | | V +-------+ | iodone() ---> buf | | ^ remove | | | | call V | | start <----+<-------+ | | | call V | device ~~~~~~~~~ interrupt .DE .PP As an example, let us look at the basic code for a disk driver supporting multiple controllers under multiple major numbers. Several drives can be attached to each controller and each drive supports the same set of logical disks, i.e., slices. Only the software aspects of this driver are shown. Before it can be used, some device-specific code must be inserted depending on the actual controller and selector channel. .PP We will simply present the entire file .I dev/dk.c in sequence. First a number of standard header files must be included and certain parameters (and hardware bits!) are defined. .DS I .so dk.c1 .DE .PP For each transfer the driver must first enqueue on its selector channel. Once the channel is reserved, the driver will seek (\c .B WSEEK ) and finally transfer (\c .B WIO ). The current state of the driver is recorded in the .B b_active component of the queue header .B dk->dk_tab for the controller. .PP In order to start a transfer, the driver must reserve the selector channel using the routine .B selchreq() . The routine is presented with the address of the selector channel, and a selector channel queue element .B "struct\ selchq" . The element was initialized (in section 3.1) with two functions supported in the .B dk driver: .B dkseek() will be called by the selector channel manager once the channel becomes available and .B dkscint() will be called if the selector channel receives an interrupt. 
Both routines in the driver can receive a parameter originally set up by the driver: the parameter is the third argument to the .B selchreq() call, an .B int value. We would of course like to identify the present controller and we therefore pass a pointer... .DS I .so dk.c2 .DE .PP .B error() is called if something goes wrong. The error is reported to the console using .B deverror() and up to a point the transfer is started again. The details should become clearer as we go on. .DS I .so dk.c3 .DE .PP .B iodone() is the routine to which a buffer header must be passed once the request has been acted upon. .B selchfree() is the complement of .B selchreq() ; it will release a reserved selector channel. .PP We are ready for the hard part \(em the .I strategy routine. It is presented with a pointer to a buffer header; the buffer must either be queued and later read or written or it must be disposed of through a call to .B iodone() : .DS I .so dk.c4 .DE .PP We first decode controller, drive, and slice number from the minor number of the device which we receive within the buffer header. .B minor() and .B major() are macros which extract minor and major number from a device number. .PP If .B ASSERT is defined while this driver is compiled, code is inserted to check if we can actually pass a pointer as an .B int variable. If we cannot, we are in serious trouble. .B panic() is the function which will abort the kernel in an orderly manner. Clearly it should never be called in real life. This test would only need to be executed once and should therefore be located in an .I open routine rather than in the .I strategy routine as shown here! .PP Controller and drive must be verified against the configuration parameters. If an impossible device is addressed, the buffer header can be rejected by setting the error bit(!), stating that no bytes were transferred, and passing the buffer header to .B iodone() . 
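.PP
The decoding and verification steps can be sketched outside the kernel as follows.
This is not the code from dk.c; it merely assumes the minor number layout
controller * 32 + drive * 4 + slice given for the special files, and the names
struct dk_unit, dk_decode(), and dk_valid() are hypothetical.

```c
#include <assert.h>

/* Hypothetical result of decoding one dk minor number. */
struct dk_unit {
    unsigned controller;   /* 0..7 */
    unsigned drive;        /* 0..7 */
    unsigned slice;        /* 0..3 */
};

/* Split an 8-bit minor number laid out as controller*32 + drive*4 + slice. */
static struct dk_unit dk_decode(unsigned minor_no)
{
    struct dk_unit u;

    assert(minor_no < 256);            /* minor numbers are 8-bit quantities */
    u.controller = (minor_no >> 5) & 07;
    u.drive = (minor_no >> 2) & 07;
    u.slice = minor_no & 03;
    return u;
}

/* Return 1 if the unit exists in a configuration with nctl controllers
 * and ndrv drives per controller (both assumed parameters), else 0. */
static int dk_valid(struct dk_unit u, unsigned nctl, unsigned ndrv)
{
    return u.controller < nctl && u.drive < ndrv;
}
```

.LP
With this layout, minor number 37 decodes to controller 1, drive 1, slice 1, which matches the special file dk11a.
A request for which dk_valid() returns 0 is the case where the buffer header would be rejected.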
.PP The buffer header corresponds to a particular set of 512-byte blocks on the (logical) disk. The .I strategy routine verifies that these blocks really exist in the slice. We chose to reject the entire transfer request if it does not completely fit into the slice. Alternatively, one could satisfy it in part. The code in this segment of the routine is very standard: for the benefit of raw i/o (see section 7), we must indicate an end of file rather than rejecting a block beyond the end of a slice outright.\** .FS \*(*F This is one reason why satisfying a request partially at the end of a slice would be preferable. .FE .PP If all is well (against all odds!) the buffer can be queued. We run a first-in first-out queue and the code implementing it is straightforward. Note that there would be a race condition: while we try to queue a buffer in the top half of the driver, we might be interrupted by the bottom half trying to retrieve the next request from the queue. This is why interrupts are disabled using the .B spl5() and reenabled using the .B spl0() function surrounding the critical region.\** .FS \*(*F Actually, other UNIX implementations provide several interrupt levels. .B spl5() is meant to disable that level of interrupts which the disk can cause. .B spl0() enables .I all interrupts. .FE .PP Once the buffer has been queued, we check the .B b_active field to see if we need to get the bottom half of the driver going. If so we call .B start() and we are off. Notice that .B b_actf is the beginning of the chain of queued buffers. This is why this field is always checked by the routines in the bottom half to see if anything is going on, and to retrieve the current buffer if so. .PP .B start() will eventually cause .B dkseek() to be called. The selector channel is now available and we should initiate a .I seek operation on the device: .DS I .so dk.c5 .DE .PP Once the seek operation is completed the drive is expected to interrupt. 
If we expect to receive the interrupt we can then proceed to initiate the transfer: .DS I .so dk.c6 .DE .PP Completion of the transfer should cause a selector channel interrupt, which is reported to us through the mechanism set up in .B start() , followed by a controller interrupt. If we are lucky, nothing needs to be done for the selector channel interrupt: .DS I .so dk.c7 .DE .PP Once we finally receive the controller interrupt, things are slightly more difficult. An error can be disposed of using .B error() , but a cylinder overflow may require us to initiate another .I seek sequence \(em with new starting block number, etc. In the simplest case we will dequeue the current buffer header, pass it to .B iodone() and start the dequeueing mechanism from the beginning: .DS I .so dk.c8 .DE .NH 1 Character Output Driver .PP The following figure illustrates the basic operation of the output transfer mechanism in a character driver: .B bytes are handed to the .I write routine. They are queued, if the transfer request is permissible, or the calling process is informed immediately, if the request is in error (this path is not shown). .I write calls a .I start routine to dequeue the next byte and begin transmission if the device is available. The device will eventually \(em it better! \(em cause .I interrupt to be executed, which in turn calls .I start to keep the dequeueing mechanism active and the device as busy as possible. .PP So far, character and block drivers function just about identically. The real difference is that the character driver is called by a process. This process must be suspended until the driver has completed its task. Suspending is accomplished in the top half of the driver, i.e., in the .I write routine called by the process, by calling .B sleep() after queueing the bytes to be transferred by the bottom half of the driver, i.e., the .I start and .I interrupt routines and getting the transfer cycle started. 
.PP Once suspended, the process must later be resumed. Resuming is accomplished in the bottom half of the driver by calling .B wakeup() after the queue is empty and all bytes have been transferred as requested. .QP .I Note : .B sleep() suspends the currently active, calling process. Since during the .I interrupt phase the original calling process is not known and is certainly not the currently active process, .B sleep() must .I never be called by the .I interrupt routine \(em directly or indirectly! .DS I bytes | V write -----+-- call --+ | | | | | V add call sleep() | | once informed, | | write is done. V | :::::::::::::::::::: +-------+ | :::::::::::::::::::: | queue | | wakeup() | of | | informs sleep() | bytes | | once queue is +-------+ | (near) empty | | ^ remove | | | | call V | | start <----+<---------+ | | | call V | device ~~~~~~~~~~~ interrupt .DE .PP It turns out that queue space for .B bytes usually is at a premium. If many bytes need to be transferred, only a few are queued at a time and the top half of the driver usually goes through several queueing and .B sleep() cycles. To optimize throughput, .B wakeup() is called .I before the queue is entirely empty, so that more bytes can be queued while the device is still busy. .PP As an example, let us look at the code for a very rudimentary driver to output text to a printer connected to a PASLA. This driver has virtually no provisions for hardware errors, and it does not make use of the autodriver channel. It does, however, illustrate the typical aspects. .PP We will simply present the entire file .I dev/pr.c in sequence. First a number of standard header files must be included and certain parameters and hardware bits are defined. .DS I .so pr.c1 .DE .PP A few local routines deal directly with the hardware. .B setup() is used to set the appropriate mode bits in the PASLA and to enable the write side of the device. 
Mode bits and PASLA address are obtained from the appropriate components of the .B prparm structure describing the printer; this structure is passed as an argument to .B setup() : .DS I .so pr.c2 .DE .LP .B oc() is a function in the kernel which executes an .B oc machine instruction. .PP .B start() retrieves a byte from the queue and writes it to the device. The proper printer is again described by its .B prparm structure presented as an argument to .B start() : .DS I .so pr.c3 .DE .LP .B wd() is a kernel function which executes a .B wd machine instruction. .B ss() is a kernel function which executes a .B ss machine instruction and returns its result. Data can be sent to the PASLA as long as .B ss() returns zero. .PP Queueing of characters is necessary in just about every character driver. The kernel therefore provides a global queueing mechanism for this purpose: .RS .LP The queue header must be defined as a .B "struct\ clist" ; for the .B struct declaration see file .I /usr/include/sys/tty.h . .LP .B getc() takes a queue header as an argument and returns the next character from the queue. If no more characters are available .B getc() will return -1. .LP .B putc() takes a queue header and a character as arguments and adds the character to the end of the queue. .RE .QP .I Note : All queues in the kernel share the .I "character list" space. It is a grave mistake to devour it in large chunks! .PP .B queue() adds a character to a printer's queue for output. The printer is identified through its .B prparm structure; the second argument to .B queue() is the character to be added to the queue: .DS I .so pr.c4 .DE .PP As we shall see, .B start() is only called with interrupts disabled, and it may therefore modify the queue using .B getc() without the possibility of a race condition. .B queue() , however, will be called from the top half of the driver, and it must ensure that it cannot be interrupted while adding a character to the queue using .B putc() . 
.PP The kernel function .B spl4() is called to disable interrupts.\** .FS \*(*F Actually, other UNIX implementations provide several interrupt levels. .B spl4() is meant to disable that level of interrupts which the PASLA can cause. .B spl0() enables .I all interrupts. .FE Until .B spl0() is called, no interrupts can happen, and our call to .B putc() does therefore not invite trouble. .QP .I Note : The ability to receive interrupts is vital to the health of the UNIX kernel. Code segments in which interrupts are disabled should be small, and the .B spl \c .I x \c .B () calls must be properly balanced! .PP It was stated above that we should not hog space in the character list. If the printer queue reaches the .B HIGH limit, defined as a tunable parameter of the .B pr driver, .B queue() suspends the current process: .RS .LP .B sleep() suspends the currently active process. It must not be called in an interrupt service routine. .LP .B wakeup() resumes a set of processes which have executed .B sleep() . The argument to .B wakeup() is an integer value; exactly those processes are resumed which passed the same integer value to .B sleep() when they suspended themselves. This value is shown as .B WCHAN by ps(1). .LP The second argument of .B sleep() is the .I "wakeup priority" used in scheduling the process once it is resumed. Permissible values are between 0 and 127; if the value is below .B PZERO (see file .I /usr/include/sys/param.h ), the suspending process cannot be terminated using kill(2). .RE .PP Since .B wakeup() may permit a number of processes to resume, each process must check that the condition no longer exists for which it suspended itself; .B sleep() is therefore usually called within a .B while loop. .PP Usually a local address is used as .B WCHAN value so that as few processes as possible are resumed accidentally. The value must not be zero. .PP We are now ready to present the externally visible routines of the .B pr driver. 
.B propen() checks that only an existing printer is accessed and ensures by means of the .B pr_open bit that only one process at a time may access the printer. If the access is successful, .B setup() is called to initialize the device: .DS I .so pr.c5 .DE .LP .B u refers to the .I "user structure" , i.e., to the system's representation of the currently active process. The structure is described in .I /usr/include/sys/user.h . An error can be reported to the currently active process by placing an appropriate code into .B u.u_error . The codes are shown in intro(2). .PP .B propen() will be called whenever a process accesses the .B pr devices by means of an .B open (2) call. .B prclose() , however, is only called when the .I last process \(em explicitly or implicitly \(em executes a .B close (2) call. It is clear that .B propen() and .B prclose() can and must cooperate to manage exclusive access to a printer. Since processes inherit open file connections during .B fork (2) system calls .I without .B open (2) being invoked, the technique cannot be extended to manage devices for a larger, limited number of processes. .DS I .so pr.c6 .DE .PP Finally, the .B prwrite() routine. Most of the actual work is done in the .B queue() routine presented earlier. We must, however, interpret certain characters for the device: .DS I .so pr.c7 .DE .LP .B cpass() is a kernel function which will return the next byte which the currently active process wants to transfer; -1 indicates that no more bytes are left. .B passc(ch) performs the same service for a character input driver. Both functions take care of all housekeeping chores, i.e., they update .B u_count and .B u_base in the user structure. .PP The interrupt service routine .B print() concludes the mandatory parts of the .B pr driver. 
We call .B start() to keep the dequeueing operation active, and if necessary we inform a suspended process by means of .B wakeup() : .DS I .so pr.c8 .DE .LP The .B pr_sleep bit is used to make process synchronization more efficient. .PP Our .B pr driver has a rudimentary .I control routine. The call .DS I #include <sgtty.h> struct sgttyb buf; ... buf.sg_flags = 0xee /* PASLA command 2 */; ioctl(fd, TIOCSETP, & buf); .DE .LP can be used to set a new mode for the PASLA, and .DS I ioctl(fd, TIOCGETP, & buf); .DE .LP will retrieve the current mode. .B prioctl() shows the typical skeleton of a .I control routine: .DS I .so pr.c9 .DE .LP .B copyin() and .B copyout() are kernel functions which copy a vector from and to the currently active process. Both routines return a non-zero value if the transfer is not permissible. They are used here to obtain the .B buf contents from the active process. They could also be used in place of .B cpass() or .B passc() to copy a larger number of bytes to be transferred. All of these routines interface with the currently active process, i.e., with the user structure, and must therefore not be called from an interrupt service routine. .NH 1 Character Input Driver .PP The following figure illustrates the basic operation of the input transfer mechanism in a character driver: The calling process requests that .B count bytes should be returned by the .I read routine. They must be obtained from a queue, if the transfer request is permissible, or the calling process is informed immediately, if the request is in error (this path is not shown). .I read attempts to remove the bytes from the queue. If not enough bytes can be found, .I read calls a .I start routine to ensure that interrupts are enabled for the device. The device will eventually \(em it better! \(em cause .I interrupt to be executed, which in turn will retrieve a byte from the device and place it into the queue. Interrupts for the device usually remain enabled.
The calling process meanwhile is suspended until the driver has completed its task. .DS I count | V read ----- call --+--- call ----+ ^ | | | V | remove sleep() | | once informed, | | read is done. | | :::::::::::::::::::: | +-------+ :::::::::::::::::::: | | queue | wakeup() | | of | informs sleep() | | bytes | once queue is | +-------+ (near) full | ^ ^ | | | | add call | | | | interrupt -----------+ | ! | ! | ! V device <----------------------- start .DE .PP While output transfers occur at the request of a process and are under control of the process, input transfers might be acceptable .I before a process actually calls the .I read routine. For terminals, this is known as type-ahead. Interrupts for a character input device therefore are usually enabled not by the .I read but by the .I open routine of the driver. The .I interrupt routine can then queue characters before a call to .I read is made. .PP Once again, queue space is very finite and if too many characters are typed ahead, some will have to be discarded. The terminal driver will at this point discard .I all characters in the queue. Other disciplines seem conceivable. .PP The top half of the driver usually goes through several dequeueing and .B sleep() cycles. The .B wakeup() calls must be carefully arranged if an exact .B count of bytes is to be accepted. If .B count is a maximum, and .I read is record (line) oriented, .B wakeup() can essentially be called either if .B count bytes are available or if the record terminator (newline) has been found. .PP A complete terminal driver must support input echo and line editing facilities. It interacts with the output side of the driver in many subtle ways... .NH 1 Raw Driver .PP The raw interface was introduced in section 2.3. It is the simulation of a character interface by means of a call on a block interface and some address trickery. The entire problem is handled by the .B physio() routine. 
.PP For our .B dk device, the raw interface is implemented as follows: .DS I .so dk.c99 .DE .PP .B physio() does all the work. It verifies that the transfer is legitimate, i.e., that it involves a memory region within the calling process's address space. If necessary .B physio() waits until the buffer header, passed as the second argument, is available. The buffer header is set up to point into the process's address space so that the transfer will take place directly between the device and the desired region. The device number, passed as the third argument, and the transfer direction, passed as the fourth argument, are written into the buffer header. Finally, the .I strategy routine, passed as the first argument, is called. .PP .B physio() then waits for the transfer to be completed, recycles the buffer header within the raw driver, notifies the calling process of any errors and returns. .PP The raw interface is quite easy to provide, as long as the corresponding block driver is willing to cope with arbitrary buffer lengths and arbitrary (physical) buffer addresses. Its drawback is that calling process and device are at each other's mercy, e.g., if the process tries to write one byte to a magnetic tape or to a disk, things certainly will go wrong. The advantage of the raw interface is its speed \(em if the device is capable of direct memory access, data is moved directly between the process's address space and the device. .NH 1 Kernel Services .PP Drivers use a number of functions provided by other parts of the system kernel. This section summarizes the functions which we have used. .IP "\fBint copyin(ua, ka, count) caddr_t ua, ka;\fR" .br Copies .B count bytes from address .B ua in the currently active process to address .B ka in kernel address space. .B count must be a multiple of four, the machine word size. Returns zero on success. See .I /usr/sys/conf/mch.s and .I /usr/include/sys/types.h . 
.IP "\fBint copyout(ka, ua, count) caddr_t ka, ua;\fR" .br Copies .B count bytes from address .B ka in kernel address space to address .B ua in the currently active process. .B count must be a multiple of four, the machine word size. Returns zero on success. See .I /usr/sys/conf/mch.s and .I /usr/include/sys/types.h . .IP "\fBint cpass()\fR" .br Returns -1 or the next character that a process wants to pass to the .I write side of a character driver. Updates .B u . .IP "\fBdeverror(bp, stat, ad) struct buf * bp;\fR" .br Calls .B prdev("err", bp->b_dev) .R and then prints "\c .B bn= \c .I bp->b_blkno .B er= \c .I stat .B ad= \c .I ad \c " on the console. Should really only be used after all retries are exhausted. .IP "\fBint getc(queue) struct clist * queue;\fR" .br Returns -1 or next character from list headed by .B queue . .IP "\fBiodone(bp) struct buf * bp;\fR" .br Recycles .B bp , i.e., returns it to the buffer cache, to the raw driver, or to the swap mechanism, as appropriate. .IP "\fBiomove(buf, count, flag) caddr_t buf;\fR" .br .B flag should be .B B_READ to transfer to the currently active process (as in a .I read routine) or .B B_WRITE to transfer from the currently active process (as in a .I write routine), as defined by .B "struct user u" . The transfer is for .B count bytes, to or from .B buf in kernel address space. If the move is not legal, .B u.u_error will be set. .B u is updated. .IP "\fBint major(dev)\fR" .br Returns the major number from the device number .B dev . .IP "\fBint minor(dev)\fR" .br Returns the minor number from the device number .B dev . .IP "\fBnodev()\fR" .br Sets .B u.u_error . Indicates that a call on a routine in the top half of a driver is not legal. .IP "\fBnulldev()\fR" .br Does nothing. Indicates that a call on a routine in the top half of a driver is legal, but nothing needs to be done. .Gl 4v .IP "\fBoc(ad, byte)\fR" .br Issues .B oc instruction to device address .B ad ; see file .I /usr/sys/conf/mch.s .
.IP "\fBpanic(s) char * s;\fR" .br Prints .B s on the console, issues a .B sync (2) and hangs the system. Should "never" happen. .IP "\fBint passc(ch)\fR" .br Places .B ch as next character into the address space of a process wishing to .I read from a character driver. Updates .B u . Returns -1 if this was the last character needed and zero if more should be read. .IP "\fBphysio(stgy, buf, dev, flag) int (*stgy)(); struct buf * buf;\fR" .br Implements the raw driver by setting .B buf from .B "struct\ user\ u" for device .B dev (major and minor number should be there). The buffer will be enqueued on, marked busy, and retrieved from .B iodone() automatically. It is marked for reading or writing, depending on .B flag being .B B_READ or .B B_WRITE . .B (*stgy)() is called with the buffer; it should be the .I strategy routine for the block driver in question. See .I /usr/include/sys/buf.h for .B "struct buf" . .IP "\fBprdev(msg, dev) char * msg;\fR" .br Prints "\c .I msg .B "on dev" .I major(dev) \c .B / \c .I minor(dev) \c " followed by the name of the special file on the console. .IP "\fBprintf(fmt, ...) char * fmt;\fR" .br Supports .B %d , .B %o , .B %s , and .B %x for printing to the console. Should not be used lightly! .IP "\fBint putc(ch, queue) struct clist * queue;\fR" .br Places a character .B ch into the character list headed by .B queue . Returns -1 if it cannot be done. The space in the list is global and should not be hogged; see .I /usr/sys/conf/para.c . .IP "\fBputchar(ch)\fR" .br Prints .B ch on the console. Should not be used lightly, since it interrupts normal output to the console. .IP "\fBint rd(ad)\fR" .br Returns value returned by .B rd instruction to device address .B ad ; see file .I /usr/sys/conf/mch.s . .IP "\fBint rh(ad)\fR" .br Returns value returned by .B rh instruction to device address .B ad ; see file .I /usr/sys/conf/mch.s .
.IP "\fBint rdh(ad)\fR" .br Returns value returned by .B rd followed by .B rh instructions to device address .B ad composed into three bytes; see file .I /usr/sys/conf/mch.s . .IP "\fBselchfree(ad)\fR" .br Releases the selector channel with address .B ad . .IP "\fBselchreq(ad, queue, unit) struct selchq * queue;\fR" .br Enqueues for use of the selector channel with address .B ad . .B queue is used for enqueueing; see .I /usr/include/sys/selch.h . Once the selector channel is available, .B "(* queue->sq_sstart)(unit)" will be called. Once the selector channel interrupts, .B "(* queue->sq_sintr)(dm, stat, unit)" will be called. .B unit is a parameter that can be passed through. .B dm is taken from .B devmap[] for the selector channel address, .B stat is the status returned by the selector channel interrupt. .IP "\fBsleep(wchan, prio)\fR" .br Suspends the currently active process until .B wakeup(wchan) is called, either by another process or from an interrupt service routine. .B wchan must not be zero. If .B prio is less than .B PZERO , the suspended process is immune to .B kill (2); see file .I /usr/include/sys/param.h for priority values. .B prio determines the scheduling priority of the process following .B wakeup() ; higher numbers are less favored. .B sleep() cannot be meaningfully called from an interrupt service routine. .B sleep(&lbolt,prio) can be issued to receive a .B wakeup() issued regularly by the clock interrupt routine. .IP "\fBint spl0()\fR" .br Enables interrupts, returns previous setting of interrupts. .IP "\fBint spl\fIi\fB()\fR" .br Disables interrupts, returns previous setting of interrupts. .I i can be .B 1 , .B 4 , .B 5 , .B 6 , or .B 7 ; the distinction is meaningless on a Perkin-Elmer system. .IP "\fBsplx(i)\fR" .br Sets interrupts to .B i , a value returned by a call to \fBspl\fIi\fB()\fR. This is normally not used in drivers. .IP "\fBint ss(ad)\fR" .br Returns value returned by .B ss instruction to device address .B ad ; see file .I /usr/sys/conf/mch.s .
.IP "\fBtimeout(fun, arg, clk) int (* fun)();\fR" .br Arranges a call .B (*fun)(arg) after .B clk clock intervals. Requires a timeout structure; see .I /usr/sys/conf/para.c . The function is called as a subroutine to the clock interrupt service routine, i.e., as a routine in the .I "bottom half" of a driver. Does not perpetuate, i.e., must be renewed if necessary. .IP "\fBwakeup(wchan)\fR" .br Resumes .I all processes suspended with a .B "sleep(wchan, ...)" . .IP "\fBwd(ad, byte)\fR" .br Issues .B wd instruction to device address .B ad ; see file .I /usr/sys/conf/mch.s . .IP "\fBwh(ad, halfword)\fR" .br Issues .B wh instruction to device address .B ad ; see file .I /usr/sys/conf/mch.s . .IP "\fBwdh(ad, threebyte)\fR" .br Issues .B wd followed by .B wh instruction to device address .B ad ; see file .I /usr/sys/conf/mch.s . .NH 1 Interface Definitions .PP This section summarizes the block and character driver interfaces: .IP "\fIopen\fB(dev, flag)\fR" .br Executed whenever a process (or the file manager or the swapper) connects to the driver using .B open (2). .B dev is a device number to which .B major() and .B minor() can be applied. A nonzero .B flag indicates a write access. Can set an error in .B u.u_error . .IP "\fIclose\fB(dev, flag)\fR" .br Executed for the .I last process disconnecting from the device using .B close (2) or during .B exit (2). .B dev is the device number to which .B major() and .B minor() can be applied. .B flag is nonzero if the last process had the connection open for writing. .IP "\fIstrategy\fB(bp) struct buf * bp;\fR" .br Presents a buffer header .B "struct\ buf" (\c .I /usr/include/sys/buf.h ) for processing. Must return the buffer (later) to .B iodone() . May set an error with .B B_ERROR in .B bp->b_flags . .IP "\fIread\fB(dev)\fR" .br Request to read .B u.u_count bytes from .B u.u_offset on the device to .B u.u_base in user address space. .B u is .B "struct\ user" of the currently active process (\c .I /usr/include/sys/user.h ). 
.B u.u_count must be counted down, preferably through .B passc(ch) or .B "iomove(..., B_READ)" . .B dev is a device number to which .B major() and .B minor() can be applied. Can set an error in .B u.u_error . .IP "\fIwrite\fB(dev)\fR" .br Request to write .B u.u_count bytes from .B u.u_base in user address space to .B u.u_offset on the device. .B u is .B "struct\ user" of the currently active process (\c .I /usr/include/sys/user.h ). .B u.u_count must be counted down, preferably through .B cpass() or .B "iomove(..., B_WRITE)" . .B dev is a device number to which .B major() and .B minor() can be applied. Can set an error in .B u.u_error . .IP "\fIioctl\fB(dev, cmd, addr, flag) caddr_t addr;\fR" .br Executed in response to .B ioctl (2). .B dev is a device number to which .B major() and .B minor() can be applied. .B cmd and .B addr (in user address space; see also .I /usr/include/sys/types.h ) are as for .B ioctl (2). .B copyin() or .B copyout() should be used to access .B addr to retrieve or post control information in the form of .B "struct\ ttiocb" (\c .I /usr/include/sys/tty.h ). .B flag is from .B "struct\ file" (\c .I /usr/include/sys/file.h ), .B flag&FREAD and .B flag&FWRITE are sensible tests. Can set an error in .B u.u_error . .IP "\fIinterrupt\fB(dm, stat)\fR" .br Called as a result of a hardware interrupt. Interrupts are disabled at this point, the currently active process (\c .B "struct\ user u" ) is .I not meaningful! .B dm is taken from .B devmap[] (in .I c.c ) for the address of the interrupting device. In general this is not a complete device number. .B stat is as returned from the device. The details can be inferred in the files .I /usr/sys/conf/*.s . Must return a buffer using .B iodone() . Can set an error with .B B_ERROR in .B b_flags for a block driver; cannot directly set an error for a character driver. .SH Acknowledgements .PP This document owes much to discussions about .UX and to work on drivers done jointly at Ulm. My thanks go to E.
Janich, C. Swail, and to numerous students. .SH References .IP "[Bel\ 83]" .br .UX Programmer's Manual, Seventh Edition, Volume 2, Holt, Rinehart and Winston, 1983. .IP "[For\ 84]" .br M. Forstenhaeusler, M.S.Thesis on the .UX terminal driver, Sektion Informatik, University of Ulm, to appear 1984. In German, available only to source licensees. .IP "[Rit\ 78]" .br The .UX I/O System, e.g. in [Bel\ 83]. .IP "[Sey\ 84]" .br M. Seyfried, M.S.Thesis on the .UX packet driver, Sektion Informatik, University of Ulm, to appear 1984. In German, available only to source licensees. .IP "[Tho\ 78]" .br .UX Implementation, e.g. in [Bel\ 83]. SHAR_EOF if test 57662 -ne "`wc -c 'd.ms'`" then echo shar: error transmitting "'d.ms'" '(should have been 57662 characters)' fi : ' End of shell archive ' exit 0 -- Carl Swail Mail: National Research Council of Canada Building U-66, Montreal Road Ottawa, Ontario, Canada K1A 0R6 Phone: (613) 998-3408 USENET: {pesnta,lsuc}!nrcaero!carl {cornell,uw-beaver}!utcsrgv!dciem!nrcaero!carl {allegra,decvax,duke,floyd,ihnp4,linus}!utzoo!dciem!nrcaero!carl