jsq@usenix.org (John S. Quarterman) (01/07/90)
From: jsq@usenix.org (John S. Quarterman)

Here is the White Paper on System Administration that USENIX sponsored for IEEE 1003.7. I am posting it now for three reasons:

1) It never has been posted.
2) It was one of the main reasons for the network orientation of 1003.7 that was mentioned prominently in the recent 1003.7 snitch report.
3) There is much talk of a possible White Paper for IEEE 1201.

John S. Quarterman, Institutional Representative from USENIX to IEEE 1003.

White Paper on System Administration for IEEE 1003.7

Susanne W. Smith
Windsound Consulting

John S. Quarterman
USENIX Association

ABSTRACT

The new POSIX committee on System Administration, IEEE 1003.7, is attempting to standardize an area in which there is little prior art, and no generally accepted solutions to many of the known problems. It is a large area, and one that intersects with other areas such as networking (IEEE 1003.8) and application programming (IEEE 1003.2). Some of the most applicable prior art was not designed for operating system administration, but for network administration. Perhaps most importantly, there are two basic models for system administration. One must be chosen from the outset, and the choice will affect everything the committee does.

The USENIX Association has coordinated the production of this White Paper to set forth the basic issues the committee must address, to recommend certain choices it will have to make, and to outline some of the existing solutions that must be considered.

1. Introduction

The role of the systems administrator has evolved over the years. Where once an administrator was responsible for a single machine or machines from a single vendor there is now often a network of machines from different vendors. Both the homogeneous single machine case and the heterogeneous networked case must be addressed by the systems administration committee in producing a standard.
This paper offers a description of systems administration, its component tasks, and its scope; it recommends a model upon which to build the standard; it presents an overview of some current systems administration practices; and it provides a reference list.

July 17, 1989 - 2 -

2. The Basic Model

The most basic choice for a system administration standard is between a single machine model and a model based on a network of machines.

2.1. A Single Machine

The results of 1003.7 will be applied to many machines that are not connected to any other machines, except perhaps by some indirect technique such as UUCP. The standard must be applicable to such machines. For this purpose, it need only specify a command interface and detail what the commands are supposed to accomplish.

However, there is a problem with basing the standard on a single machine as a model, because such a standard will not adapt well to a network of machines. The traditional methods used for administration of a single machine are not readily extended to a networked environment. For example, maintaining user information on a single machine requires modifications to the /etc/passwd file. In a networked environment this further requires maintaining the consistency of this file across many machines.

2.2. A Network of Machines

The number of machines connected to networks and the number of networks of computers have grown exponentially in the last several years. Many of us are accustomed to interacting with hundreds of computers on a local area network that is in turn connected to hundreds of thousands of other computers through wide area networks.

2.2.1. Remote Access

Many machines do not even have local disks: files are kept on a central server, which is accessed over the network. There may be more than one server, and two machines may even act as servers for each other for different parts of their file system trees.

2.2.2. Distribution

Databases may not have a single location.
The mapping between login names and login IDs may be distributed among several machines. The whole database may be duplicated for redundancy. Parts of it may be kept in different places, for local control. A tree structure may be used.

2.2.3. Heterogeneity

Networked environments tend to have machines with many different hardware types and many different variants of operating systems. One machine may have /etc/passwd and another may use a distributed database. The possible parameters to an operation may differ. Byte orders and word lengths vary.

2.3. Specifications

A single interface specification is not sufficient for a networking model of system administration. Three things are needed:

2.3.1. Interface

A specification of a programming interface is needed for a networked model, just as for a single machine model. Additional commands may be required for a networked model. But the specification of what the commands for the interface do has to be more complex for a networked model.

2.3.2. Database

Because of differences among machines in a heterogeneous network, such as varying byte orders, word lengths, and options supported, a generic specification of the information to be managed is needed. It is not practical to provide specifications for every type of machine and software and translations between them, because the number of specifications needed would be very large.

2.3.3. Operations

Given the interface specification of a command, and the database specification of the information it is to affect, a specification is also needed of how to communicate the necessary operation across the network. This should be done in a manner that is not specific to any of the underlying systems, but that can be translated into appropriate actions on any of them.

2.4. Network Management Standards

These issues and this kind of model have been addressed for the purpose of managing networks.
It is possible that the work can be adapted and extended for use by 1003.7. In the two protocols described below, two components, a management station and a management agent, work together to perform network management functions. The management station monitors and controls network elements. Management agents perform functions requested by the management station on the network element.

2.4.1. CMIP

The Common Management Information Protocol is the emerging ISO standard for network management. It uses a MIB (Management Information Base) and defines operations to be performed on it over a network.

2.4.2. SNMP

The Simple Network Management Protocol is in use now with TCP/IP on NSFNET. It addresses many of the basic network management problems and presents at least preliminary solutions to them. It proves the concept of a MIB with operations to manipulate it over a network.

2.4.3. ASN.1

Abstract Syntax Notation 1 is the ISO standard for encoding of information at the presentation layer of the seven-layer ISO networking model. It is similar to Sun's XDR (External Data Representation) or Apollo's NIDL (Network Interface Definition Language) or NDR (Network Data Representation), but is more general than any of them.

``ASN.1 is useful for describing structures in a machine-independent fashion. Additionally, ASN.1 definitions can be written which convey to the human reader the semantics of the objects they define.''[2]

Both CMIP and SNMP are written in terms of ASN.1.

2.5. Scope

The responsibilities of systems administrators vary widely among installations. In some environments the tasks of the systems administrator are defined as ``anything it takes to keep computing services available for the user community.'' This definition could encompass everything from hardware diagnostics to network management. In some situations the systems administrator may be responsible for user support and consulting.
In other situations the tasks of the systems administrator could be rigidly defined to include only password file maintenance and backups.

Because there is no commonly accepted definition of the scope of system administration, the committee needs to define which specific areas are included as the functions of a systems administrator. Scope and definitions are also required parts of an IEEE standard. These should be addressed before commands and facilities are defined.

The committee should consider previous work in network management. The OSI model for network management consists of five functional areas: configuration management, performance management, fault management, accounting management, and security management. These functional areas map very well from network management to operating system management.

2.5.1. Configuration Management

Configuration management in the network sense is defined as ``detection and control of the state of the network, both the logical and physical configurations of the network.''[1] Configuration management in a systems administration context would refer to the management of the information which defines a machine's functions. Configuration information determines whether a machine is a file server or client, a timesharing service or single user, diskless or diskful. The configuration data identifies the location of other machines and services.

2.5.2. Performance Management

Performance management could be defined as the collection and analysis of information that determines the performance of one or more machines. Examples could be disk throughput, service access times, or CPU utilization.

2.5.3. Fault Management

Fault management is ``the detection, isolation, and correction of abnormal operations in the network.''[1] For systems administration this would be detection of a service's failure, notification of the user community of the failure, and the initiation of a backup service.

2.5.4. Accounting Management

Accounting management would be the management of the information required to determine the cost of using the system. This type of information is traditionally collected in units of disk storage blocks, CPU usage, and connect time.

2.5.5. Security Management

Security management is composed of the functions required to regulate access to system resources. User authentication, server verification, and security logs are functions of security management.

2.6. Recommendations

We strongly recommend the adoption of a network model. We also recommend that the committee focus on the entities to be managed and not the underlying transport protocol.

2.6.1. Specifications

Every command should be specified in terms of an interface, an information database, and operations to be performed over a network. Although the first of these alone would be sufficient in a single machine case, it is not adequate for a networked environment. A network model can be reduced to handle a single machine as a special case, but a single machine model cannot readily be expanded to support a networked environment. This is the main reason that a network model should be adopted instead of a single machine model.

2.6.2. Network Management

The committee should examine the work done to date on SNMP and CMIP, and should follow the progress of the committees that are producing those protocols. The 1003.7 MIB should be written in ASN.1.

3. Prior Art

We present here some examples of areas in which there is prior art that the committee should consider. This is not an exhaustive list of either the areas to be covered or the prior art in a specific area. There are other such areas, and we encourage others to submit proposals to the committee outlining them.

The examples are grouped according to the OSI model described above. Because system administration covers a broader area than network management, the categories have been extended.
Additional categories may be required to completely include all system administration functions.

3.1. Configuration Management

In addition to the description above, configuration management could include user configuration information. This would include the information required to describe a user and their environment (e.g. the location of their home directory). This area could also include queueing systems.

3.1.1. /etc/passwd

The simplest database of user information is /etc/passwd. It is a single file which contains information about each user. /etc/passwd contains a user's login name, user ID, group ID, encrypted password, optional full name and additional information, home directory location, and program to be executed upon successful completion of the login process. User information is added, changed, or deleted by using the command vipw or one of many available shell scripts and programs. Access to the information is controlled by file permissions. This scheme works well in a single machine environment.

This method requires each machine to have an /etc/passwd file. As the number of machines on a network and the number of users increase, maintaining the file entries on each individual machine becomes an overwhelming, if not impossible, task for the system administrator. Different methods have been proposed to handle the task of maintaining an /etc/passwd file on each machine in a network.

3.1.2. Yellow Pages

Yellow Pages (yp) is a distributed network lookup service. The Yellow Pages provide configuration information for a group of machines called a domain. A machine requesting information is a yp client and the machine providing the information is a yp server. The information for a particular domain is a set of maps. Commonly the /etc/passwd and /etc/hosts files are replaced by yp maps. However, yp is indifferent to the type of data in the maps.

A master flat file resides on a master server machine.
Updates to the master file are made here. Dbm is used to transform the flat file into maps. The maps are then propagated to all slave server machines. The number of slave servers is dependent on network size and topology. A single machine may serve more than one domain.

Once yp services are available (i.e. the maps have been made and the server machines configured), routines on the yp client machine must be modified to initiate yp requests rather than reading local files. Yp requests are remote procedure calls to a yp server.

3.1.3. Moira

``The purpose of Moira is to provide a single point of contact for authoritative information about resources and services in a distributed environment.''[3] Moira is used to store information about users, the location of network services, the information needed to create the configuration files for network servers, as well as other information. Updates to the database are made using an application interface which is based on curses. Validity checks are performed on data to be updated. Access to each object in the database is controlled by an access control list. Statistics are kept about who modified the object last.

Network server configuration files are created from the Moira database and sent periodically to the appropriate servers. This eliminates the need to modify configuration files on individual machines. The Hesiod (see below) database is also created from the information in the Moira database.

3.1.4. Hesiod

Hesiod provides a read-only front end for user information and the location of network services. User information is extracted from the Moira database and formatted into ASCII files in BIND-compatible resource record format. Modifications have been made to BIND to accept and process Hesiod type queries. Hesiod is used by the login process to acquire user information. Note however that Hesiod does not authenticate the user. Authentication is performed by Kerberos.
Hesiod is also used by lpr to retrieve printer information traditionally stored in the /etc/printcap file.

3.1.5. Berkeley Print Spooling

The Berkeley print spooling system was intended for use with network print services where printers are connected directly to the network or to the serial port of a host machine on the network. The command lpr is used to start the printing process. Line printer daemons (lpd) run on each machine in the network to control the spool area, queue, printing, and network transfers.

Lpr looks up information for the requested printer in the /etc/printcap file. This file contains information about each printer, such as location, filters needed, header page format, etc. From this information lpr determines what to do with the file being printed.

The lpc command provides queue management functions. Lpc is used to restart and flush queues, abort jobs, and check the status of queues and printers.

3.1.6. MDQS - Multiple Device Queueing System

MDQS provides for local printer support, remote printer support, local and remote batch job scheduling, conversion of troff to device-specific format, and sending graphics data to plotters. MDQS consists of a queue management daemon, a general-purpose spooler, a set of device-specific despooled-data processing slaves, and utilities for setting form types, disabling service, viewing queues, etc.

A queue/device mapping table contains the queue name, device name, and the command to be executed as a slave process for the dequeued data. Remote printing and execution are handled by having slave processes which respool the data into the remote MDQS queues. The mapping table provides the flexibility for multiple devices to process from the same queue or one device to process from multiple queues.

If NFS (network file system) or some similar mechanism is used, a single spooling area and daemon with control files can reside on one machine.
This eliminates the need for respooling data into remote queues and the overhead of maintaining a local spooling area, daemon, and control files. The remote devices simply process the queue from the remotely mounted file system.

3.2. Security Management

Personal computers can be protected by making the machine physically secure. In a timesharing environment the operating system is used to protect one user from another. In a networked environment there are three approaches to preventing unauthorized access to network services: rely on the host to authenticate the user and then trust the host; require the host to prove its identity and then trust the host as to who the user is; or make the user prove who they are for each network service.

3.2.1. Kerberos

``In an open network computing environment, a workstation cannot be trusted to identify its users correctly to network services.''[4] Therefore Kerberos uses the third approach to system security: make the user prove their identity for each network service. In order for a user to prove their identity, they must be authenticated by Kerberos, not the workstation they are using. Passwords are never sent over the network, but are used locally to decrypt the authentication message from Kerberos. To prevent unauthorized use, the local workstation destroys the user's password after using it to decrypt the initial Kerberos message. Once a user has been authenticated they have the keys to request various network services.

Different applications can choose different levels of protection. The first is authentication at initiation, after which subsequent messages are simply accepted if they come from the same network address. The second is where each message is authenticated but the contents of the message are not encrypted. The third level of security is private messages, where each message is authenticated and encrypted.

The Kerberos database contains a name, private key, and expiration date for each entity that will use Kerberos services.
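
As a rough illustration of the database entry just described (a name, a private key, and an expiration date per entity), the sketch below models one such record in Python. The names Principal, is_expired, and usable_entries are assumptions made for this illustration; they are not taken from the Kerberos sources, and the key is treated as an opaque byte string.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Principal:
    """Hypothetical record for one entity known to the authentication service."""
    name: str           # user or service name
    private_key: bytes  # secret known only to the entity and the service
    expiration: date    # last day on which the entry is valid

    def is_expired(self, today: date) -> bool:
        # An entry is unusable once its expiration date has passed.
        return today > self.expiration


def usable_entries(entries, today):
    """Return the names of the entries that have not yet expired."""
    return [e.name for e in entries if not e.is_expired(today)]
```

A read-only copy of such a table could answer lookups like usable_entries, while additions and changes would be made only on the single copy that accepts modifications.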
The master Kerberos database is kept and modified on one machine. Slave servers have read-only versions of the database and provide read-only types of services. Modification to the master database is accomplished by the administration server (KDBM server). There are two parts to this service: a client, which will run on any machine in the network, and a server, which must run on the machine that houses the master database.

3.3. Accounting Management

Accounting is the recording and reporting of resource usage. This information can then be used to determine appropriate charges for a user.

3.3.1. Harvard Accounting System

This system would track disk usage, CPU time, logins, connect time, printed pages, and budget on an account-by-account basis and charge the appropriate accounts. It was designed to run in a single machine environment.

3.4. Fault Management

In order to restore service after a disk failure, a sensible backup procedure needs to have been followed by the administrative staff. Basic commands to move data from one medium to another are described below.

Tar and cpio file archiving and data interchange formats are the only backup formats specified in 1003.1.

3.4.1. System V Interface Definition (SVID)

3.4.1.1. volcopy

The volcopy command will make a literal copy of a file system. Copies can be made to another disk location or to tape.

3.4.2. SVID & Berkeley

3.4.2.1. tar

The tar command is used to create an archive file. Multiple files can be saved to and restored from a single tarfile. The tarfile can reside on various physical media. tar will read from standard input and write to standard output so that it can be part of a pipeline. This feature can be used for moving directories.

3.4.2.2. cpio

cpio copies a list of files to or from a cpio archive file. Pathnames and status information are kept along with the files.

3.4.3. Berkeley dump/rdump/restore/rrestore

The dump and rdump commands will copy all files in a filesystem to backup media. The restore and rrestore commands will copy files stored via dump to a filesystem. Rdump and rrestore provide the same functionality as dump and restore over a network. Remote dump devices are specified as a host-device combination.

The dump command allows for different levels of backup. A level 0 dump copies every file in the filesystem. A level 5 dump would copy every file that has been modified since the last dump of a lower level.

3.5. Performance Management

Performance management analyzes the output from system statistics to determine problem areas and develop solutions.

3.5.1. Berkeley Performance Monitoring Commands

The following commands are executable directly on each machine to report local status.

3.5.1.1. vmstat

The vmstat command provides information on memory usage, process status, and disk utilization.

3.5.1.2. iostat

The iostat command reports statistics related to I/O operations. Both terminal and disk I/O are included.

3.5.1.3. netstat

The netstat command displays the contents of the network-related data structures. Information is provided about established connections and gateways.

4. Work in Progress

4.1. OSF RFT

The Open Software Foundation will be issuing a Request for Technology (RFT) for Systems Administration software from the Munich office some time in August 1989.

4.2. FDDI

A group is forming to determine which variables are appropriate for inclusion in the MIB for FDDI.

4.3. Network Management Language

``NML is seen as a canonical interface between the network management application programmer and the MIXP (Management Information Exchange Protocol).''[5] It isolates the applications programmer from the specific MIXP being used.
Extending this to systems administration would enable the underlying protocol to be changed without changing the systems administrator's programming environment.

5. Acknowledgements

We would like to thank the following people for providing information, support, and inspiration: Carolyn D. Councill, John Lees, Jackie Carlson, Doug Gwyn, Keith Bostic, Clifford Neuman, Mark Ozur, Martin Schoffstall, Frank Cunningham, Paul Stutler, Ted Cook, and John Bossert.

6. Authors' Addresses

Susanne W. Smith
Windsound Consulting
6225 137th Place SW
Edmonds, WA 98020
<smith@usenix.org>

John S. Quarterman
TIC
701 Brazos, Suite 500
Austin, TX 78701-3243
<jsq@usenix.org>

References

1. Ben-Artzi, Amatzia, "The CMOT Network Management Architecture," ConneXions, vol. 3, pp. 14-19, Advanced Computing Environments, Mountain View, California, March 1989.

2. McCloghrie, Keith and Marshall T. Rose, "Network Management of TCP/IP-based internets," ConneXions, vol. 3, pp. 3-9, Advanced Computing Environments, Mountain View, California, March 1989.

3. Rosenstein, Mark A., Daniel E. Geer, Jr., and Peter J. Levine, "The Athena Service Management System," USENIX Conference Proceedings, pp. 203-212, USENIX Association, Dallas, Texas, February 9-12, 1988.

4. Steiner, Jennifer G., Clifford Neuman, and Jeffrey I. Schiller, "Kerberos: An Authentication Service for Open Network Systems," USENIX Conference Proceedings, pp. 191-202, USENIX Association, Dallas, Texas, February 9-12, 1988.

5. Warrier, Unni, "A Network Management Language," ConneXions, vol. 3, pp. 33-39, Advanced Computing Environments, Mountain View, California, March 1989.

Bibliography

1. System V Interface Definition, I, II, III, AT&T, 1986.

2. Networking on the Sun Workstation, Sun Microsystems, Inc., Mountain View, California, February 1986.

3. System Administration for the Sun Workstation, Sun Microsystems, Inc., Mountain View, California, February 1986.

4. USENIX Proceedings of the Large Installation Systems Administrators Workshop, USENIX Association, Philadelphia, Pennsylvania, April 9-10, 1987.

5. USENIX Proceedings of the Large Installation Systems Administrators Workshop, USENIX Association, Monterey, California, November 17-18, 1988.

6. Arnold, Edward R. and Marc E. Nelson, "Automatic Unix Backup in a Mass-Storage Environment," USENIX Conference Proceedings, pp. 131-136, USENIX Association, Dallas, Texas, February 9-12, 1988.

7. Ben-Artzi, Amatzia, "The CMOT Network Management Architecture," ConneXions, vol. 3, pp. 14-19, Advanced Computing Environments, Mountain View, California, March 1989.

8. DellaFera, C. Anthony, Mark W. Eichin, Robert S. French, David C. Jedlinsky, John T. Kohl, and William E. Sommerfeld, "The Zephyr Notification Service," USENIX Conference Proceedings, pp. 213-220, USENIX Association, Dallas, Texas, February 9-12, 1988.

9. Dyer, Stephen P., "The Hesiod Name Server," USENIX Conference Proceedings, pp. 183-190, USENIX Association, Dallas, Texas, February 9-12, 1988.

10. Eaton, Charles K., "Project Accounting on UNICOS," USENIX Conference Proceedings, pp. 163-170, USENIX Association, Dallas, Texas, February 9-12, 1988.

11. Epstein, Mark E., Curt Vandetta, and John Sechrest, "Asmodeus: A Daemon Servant for the System Administrator," USENIX Conference Proceedings, pp. 377-392, USENIX Association, San Francisco, California, June 20-24, 1988.

12. Fiedler, David and Bruce H. Hunter, UNIX System Administration, Hayden Books, Indianapolis, Indiana, 1988.

13. Howard, John H., "An Overview of the Andrew File System," USENIX Conference Proceedings, pp. 23-36, USENIX Association, Dallas, Texas, February 9-12, 1988.

14. Hume, Andrew, "An Incremental Backup System for UNIX," USENIX Conference Proceedings, pp. 61-72, USENIX Association, San Francisco, California, June 20-24, 1988.

15. Kingston, Douglas P., III, A Tour Through the Multi-Device Queueing System, Ballistic Research Laboratory, Aberdeen Proving Grounds, Maryland, July 25, 1984.

16. Jatkowski, Paul, "PMON: Graphical Performance Monitoring Tool," USENIX Conference Proceedings, pp. 111-118, USENIX Association, Dallas, Texas, February 9-12, 1988.

17. Jones, Von, "System Administration Daemons," USENIX Conference Proceedings, pp. 137-144, USENIX Association, Dallas, Texas, February 9-12, 1988.

18. Krempel, Henry B. J. and John F. Fowler, "High-Performance Workstations in a Model University Environment," Northeast Parallel Architectures Center Technical Report, Syracuse University, Syracuse, New York, April 7, 1988.

19. McCloghrie, Keith and Marshall T. Rose, "Network Management of TCP/IP-based internets," ConneXions, vol. 3, pp. 3-9, Advanced Computing Environments, Mountain View, California, March 1989.

20. Partridge, Craig, "A UNIX Implementation of HEMS," USENIX Conference Proceedings, pp. 89-96, USENIX Association, Dallas, Texas, February 9-12, 1988.

21. Pato, Joseph N., Elizabeth Martin, and Betsy Davis, "A User Account Registration System for a Large (Heterogeneous) UNIX Network," USENIX Conference Proceedings, pp. 155-172, USENIX Association, Dallas, Texas, February 9-12, 1988.

22. Peacock, Don and Mark Giuffrida, "Big Brother: A Network Services Expert," USENIX Conference Proceedings, pp. 393-398, USENIX Association, San Francisco, California, June 20-24, 1988.

23. Rosenstein, Mark A., Daniel E. Geer, Jr., and Peter J. Levine, "The Athena Service Management System," USENIX Conference Proceedings, pp. 203-212, USENIX Association, Dallas, Texas, February 9-12, 1988.

24. Steiner, Jennifer G., Clifford Neuman, and Jeffrey I. Schiller, "Kerberos: An Authentication Service for Open Network Systems," USENIX Conference Proceedings, pp. 191-202, USENIX Association, Dallas, Texas, February 9-12, 1988.

25. Treese, G. Winfield, "Berkeley UNIX on 1000 Workstations: Athena Changes to 4.3BSD," USENIX Conference Proceedings, pp. 175-182, USENIX Association, Dallas, Texas, February 9-12, 1988.

26. Warrier, Unni, "A Network Management Language," ConneXions, vol. 3, pp. 33-39, Advanced Computing Environments, Mountain View, California, March 1989.

27. Yeong, Wengyik, Martin Lee Schoffstall, and Mark S. Fedor, "A UNIX Implementation of the Simple Network Management Protocol," USENIX Conference Proceedings, pp. 209-218, USENIX Association, San Diego, California, January 30 - February 3, 1989.

Volume-Number: Volume 18, Number 9