djy@INEL.GOV (Daniel J Yurman) (02/04/91)
Mail to - info-futures@world.std.com

This long posting is a response to a request for thoughts and expansion on the theme of "two competing models of computing ... current polarizing development directions." This response addresses workstation and server environments in the context of applications addressing environmental problems, e.g., hazardous waste. I think the original post by Barry Shein - bzs@world.std.com - of 2/1/91 was very thoughtful and to the point. This response is not meant to be comprehensive, but hopefully will add to the discussions and perhaps stimulate others to contribute their views.

*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%*%%*%*

Proposed Workstation Configuration for a Decision Support System for Cleanup of Hazardous Waste Sites Using Geographic Information Systems and Expert Systems.

Dan Yurman
Idaho National Engineering Laboratory
PO Box 1625, Idaho Falls, ID 83415
43N, 112W -7GMT (208) 526-8591
* Internet djy@inel.gov * Bitnet djy%inel.gov@cunyvm.cuny.edu

ABSTRACT

The complexity of identifying the extent of contamination at hazardous waste sites requires the use of computer tools distributed across multiple workstations. Geographic information systems (GIS) coupled with graphics and expert systems can help shape solutions for analysts. This is an account of the design for workstations to meet the requirement for decision support systems (DSS) in this area.

Scope

I will address two topics. These are (1) systems architecture, and (2) resolving ill-defined problems.

1. System Architecture

These comments discuss the requirements of a "site characterization" workstation to support DSS. By "site" I am referring to a discrete location which is known to contain uncontrolled hazardous waste either at the surface, below the surface, or both.
By "characterization" I am referring to a process by which the volume, concentration, and spatial distribution of the uncontrolled hazardous waste are quantified, and which can serve as the basis for specifying a cleanup action.

The heart of the design is a concept labeled as the "logical workstation." This concept is based on the idea that not all functions for a DSS need reside on a single host, but can be spread among processors and platforms across a network. This is almost a necessity because of the distribution of human and technical resources needed to clean up large areas of hazardous waste contamination. Examples include the four Superfund sites contaminating 150 square miles of the Clark Fork watershed area in Montana. People and computers needed to do the work are not necessarily co-located in the same office nor even in the same city. In the case of the Clark Fork, computer resources to support the GIS are located in Helena, MT, Denver, CO, and Las Vegas, NV. The main host is in Montana, as is the plotter; workstations for some of the input and output of tabular data are in Denver; and workstations for development of applications, including Arc/Info AMLs, are at EPA's GIS Laboratory in Las Vegas.

Workstation Processes

The "logical workstation" concept states that there are functions that are inter-related and that form logically integrated work processes across a network. Each "logical workstation" provides a standard user interface for packaged application functions, and it allows the user to descend to the level of the operating system for advanced programming functions. Thus the Logical Workstation is both a physical reality and a system concept.

Each of the work processes (Input, Output, Analysis) supporting user requirements may be thought of as being serviced by a workstation. The logical workstations are related through the common data shared by each of the applications.
The applications are those supporting site characterization and environmental restoration work.

The workstation is composed of three logical workstations corresponding to Input, Analysis, and Output functions. Each logical workstation is integrated with a Support module that provides data management and system guidance functions. The integration through Support functions provides the "data drawbridge" that links and integrates the applications residing on logical workstations with each other. The Support workstation also takes advantage of surplus computing cycles available on workstations, and allocates their use for numerically intensive operations on the Analysis workstation. Finally, the Support workstation can function as a database server either directly on the main host, or as a gateway to multiple hosts.

The logical workstation concept improves the ability of a user community to match hardware and software resources in direct proportion to processing and utilization requirements. For example, the Analysis Workstation may require a high speed processor, but the other workstations may not share this need. On the other hand, the heavy utilization of the input functions may dictate implementing multiple Input Workstations. The resulting configuration would use multiple, low cost, lower power CPUs for processing input functions and higher power, more expensive, CPUs for analysis functions.

The point is that organizations designing a DSS need not purchase a single target architecture nor even a single application software package in an effort to be all things to all users. This point might seem obvious, but please bear with me in the exposition of the idea of the logical workstation.

Input Workstation

The Input Workstation would capture tabular data, chemical sampling and analyses' results, attribute information, digitized map information, etc., for a database.
The data in the DBMS would support a geographic information system (GIS), or graphics to support scientific visualization and engineering perspectives. Even sampling and analysis results from a laboratory or test data from an instrument have to go into a database before being accessed for visualization routines. Data can also be accepted from a digitizer, from keyboard input, and from other sources already in electronic form, e.g., TIGER files from the Census Bureau. The Input Workstation must provide interactive editing of spatial and attribute data.

Typically, the Input Workstation does not have significant local disk storage, but ships its data to a larger host across a network. It is assumed that the graphics and GIS functions are interfaces to the database or distributed databases located on one or several hosts across a network, but they do not operate on the Input Workstation.

There may be more than one kind of Input Workstation depending on the kind of data being entered in the system. For instance, an ASCII or X-Windowing terminal may be sufficient for entry of data into forms, but a digitizer may be required for map information. A standard '286 or '386-based PC may be used to write code to manipulate data already in electronic form on a larger host.

Analysis Workstation

The Analysis Workstation would provide tools to support hydrogeological, spatial, and visual analyses of site data. It would support validation of chemical sampling and analyses' results. Additional tools would include geophysical tools, such as cross section and fence diagrams, as well as management of lithologic and well construction data. The Analysis Workstation performs statistical analyses of chemical concentrations, and supports the use of analytical models.

In terms of a geographic information system, the Analysis Workstation is host to the GIS software using, in turn, a larger host for storage of attribute data.
All work processes to develop and visually display GIS coverages on the screen are carried out on the Analysis Workstation. Memory and local disk storage are sufficient to hold all programs and data needed for the current session. These data are acquired using SQL procedures from the networked host supported by a client/server architecture.

There may be more than one kind of Analysis Workstation depending on the analytic procedures required to visualize information, and some configurations may have more than one function. For instance, a large-screen color monitor, e.g., 19 inches, hooked to a processing unit with significant memory and storage, e.g., 16 Mb of RAM and 700 Mb of disk, may be needed for GIS work, but a '386-based PC with 4 Mb of RAM and 100 Mb of storage may be sufficient for statistical analyses of tabular information. Also, the PC may be used as an input platform to write code in a high level language to run a job on the larger workstation using a network host as a server between the two.

Groundwater Applications:

The Analysis workstation must have the capability to support specific types of applications. The following examples in the area of groundwater analyses, which are fundamental to cleaning up hazardous waste, explain some of the functions. The workstation does not place sampling wells nor does it do other kinds of field work. It is a platform upon which numeric analyses and modeling exercises are conducted which aid in the implementation of the field activities.

Types of Applications to be Supported

(a) Monitoring, (b) Source Control, (c) Fate and Transport, (d) Technology Transfer, (e) Treatment Technology, and (f) Integration of Surface and Groundwater Models.

Explanation of Application Types

(a) Monitoring ... involves the placement and spacing of wells together with acceptable procedures for sample collection, quality assurance, and quality control. This activity is a fundamental requirement for credible decisions in groundwater protection.
(b) Source Control ... assesses technological and operational strategies to reduce the risk of contamination posed by improper disposal of hazardous wastes.

(c) Transport & Fate ... involves predicting the behavior of contaminants below ground. It is one of the most difficult and most important tasks for groundwater protection. Analyses of fate and transport problems involve predicting the physical movement of groundwater and contaminants in saturated and unsaturated zones. It also involves changes in the quality of groundwater through natural or manmade degradation or differential separation of constituents.

(d) Technology Transfer ... imports information on comparative methods for attaining groundwater protection and cost data on available technologies. Field personnel deal with an extremely broad scope of data and information, and often completion of groundwater analyses is aided by consultations among field offices.

(e) Treatment Technology ... focuses on the removal of volatile and non-volatile organics, inorganics, metals, and microbes.

(f) Integration of Surface & Groundwater Models ... supports the practical assessment and prediction of occurrences in the following:

- aquifer flow simulation under unconfined flow
- aquifer flow simulation under pumping conditions
- time/series predictions under pumping conditions
- solute transport delineations and predictions
- storage of multiple contours

Output Workstation

The Output Workstation would support a full range of presentation media for representing site data and the results of analytic operations. It would support desktop publishing applications to integrate text and graphics to meet formal presentation requirements for management. It controls development of plotter jobs, visually displays them prior to execution, and translates them to graphic files, e.g., CGM, for use in other applications. This is an example of "task pipelining", a concept which will be discussed in detail later in these comments.
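
The plotter-job sequence just described (develop, display, translate) can be pictured as a chain in which each stage hands its result to the next. The following Python fragment is only a minimal sketch of that idea, not a description of any existing package. The PlotJob record, the three stage functions, and the rule that a job must be previewed before translation are all assumptions made up for illustration, and no real CGM file is written.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PlotJob:
    """Hypothetical record handed from one pipeline stage to the next."""
    name: str
    layers: List[str]
    previewed: bool = False
    output_format: str = ""
    history: List[str] = field(default_factory=list)


Stage = Callable[[PlotJob], PlotJob]


def develop(job: PlotJob) -> PlotJob:
    # Stand-in for assembling the plotter job from GIS coverages.
    job.history.append(f"developed {len(job.layers)} layers")
    return job


def preview(job: PlotJob) -> PlotJob:
    # Stand-in for displaying the job on screen prior to execution.
    job.previewed = True
    job.history.append("previewed")
    return job


def translate_to_cgm(job: PlotJob) -> PlotJob:
    # Stand-in for translation to a graphic file (assumed rule: preview first).
    if not job.previewed:
        raise ValueError("job must be previewed before translation")
    job.output_format = "CGM"
    job.history.append("translated to CGM")
    return job


def run_pipeline(job: PlotJob, stages: List[Stage]) -> PlotJob:
    """Feed the output of each stage into the next one."""
    for stage in stages:
        job = stage(job)
    return job


job = run_pipeline(
    PlotJob(name="site_map", layers=["wells", "plume", "roads"]),
    [develop, preview, translate_to_cgm],
)
print(job.output_format, job.history)
```

Because each stage takes and returns the same kind of record, stages can be reordered, dropped, or moved to another machine without changing the others.
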
There may be more than one kind of Output Workstation depending on presentation media. Examples of output media include standard 'D' or 'E' size plots, tabular printout, PostScript laser printer output, or television production quality video tape.

The preparation of plots of graphs and GIS coverages is done here, and can utilize the services of a network host, or server, to spool the job to an output device, thus freeing up this machine for new work. Memory and disk storage on this workstation are sufficient to hold all programs and data needed for the current session, but completed files are shipped back to a larger host across a network for storage between jobs. This workstation may include specialized hardware to achieve almost real-time decompression of graphics files. It may contain hardware to display graphics on a television monitor or feed directly into transmission of TV images over microwave to a remote site.

System Support

The Support workstation provides data management and user guidance functions integrated with the other Logical Workstations. Each of the Logical Workstations requires interaction with the Support workstation to execute its functions. System configuration management for network services is included here.

Data administration functions include support for data base management, query functions, data extract, and reformatting services. The guidance functions include on-line help for packaged applications, system and application tutorials, and expert system guidance to shape solutions regarding what data is to be acquired for each session and what tools are appropriate to use. The expert system would be capable of examining its own reasoning and explaining its operation.

A data "drawbridge" functions on the Support workstation, linking the GIS and DBMS functions with models, statistics, contouring, and other specialized functions. The drawbridge is linked to a problem solving function which aids the user in several ways.
First, it aids in specifying the nature of the problem. Second, it assists in structuring a solution not only in terms of conceptual approaches, but also in identifying relevant data types and data sets which can be used to test alternative solutions, e.g., to use a metaphor, in spreadsheet form. Third, it aids the user in selection of appropriate tools by evaluating the types of data and the means of computer assisted analysis which can most effectively represent the data in a solution.

The system support workstation is the host to the DBMS as well as databases of spatial information. This workstation has the greatest amount of mass storage. It is also the home for data retrieval and output tools for reports, maps, queries, and final output for maps, plots, slides, and text.

Additional ideas on the Support workstation are included later in these comments under the heading of "Resolving Ill-Defined Problems." In the remainder of this section I would like to make a few comments on how to migrate from present platforms to the design suggestions for the Logical Workstation.

Workstation Configurations

There are many ways to configure workstations to support site characterization analyses and integration with GIS systems. Each configuration has its strengths and limitations, and should be considered in light of performance, cost, future requirements, and migration path.

The new generation of workstations can give a user community new capabilities at lower cost: faster processing, high resolution graphics, and multitasking for a single user. A suggested migration path would allow a user organization to start with a single workstation, and expand to a network of stations running a number of applications.

The present standards within the user community for hardware, software, data resources, and networks will influence workstation configurations and the migration path. Four hardware configurations that are feasible for different stages of the migration path are discussed below.
These are not intended to be exhaustive descriptions of all combinations. They provide a basis for alternative combinations.

Low cost single user workstation:

This configuration is recommended for work groups with only a few users, no local area networks, and relatively low usage of current workstations to support data entry functions. These workstations may also be used for other purposes including networked office automation functions.

These low-cost, single user workstations would in the present era have 32-bit memory and processors, and offer considerably enhanced speed and capacity over current '286-based processors or dedicated terminals. These workstations can process data faster than terminals, and are a good choice for the database interface to a distributed system.

High performance single user workstation:

This machine is recommended for work groups requiring more processing power, increased local storage, and a multiuser, multitasking environment. This machine can be used for software development and to provide technical support to end users. It allows users to interface with data through packaged software from third party vendors, custom applications, or through high level programming languages which interact with third-party software or custom applications. These functions can be migrated up the power curve of workstation configurations. These machines can be linked with low cost, single user workstations and with larger hosts across a network.

Multiuser local area network configuration:

The next step in the migration path is a configuration with workstations on a local area network. This configuration is recommended for organizations with existing networking capabilities or those planning to have networking capabilities in the near future.
The single-user, high-performance workstation can be networked with other workstations with varying degrees of capability running a number of applications as the demand for more computing power and use of expensive peripherals in their computing facilities grows. At this stage in the migration path, distributed processing can take place, capitalizing on surplus cycles of individual machines to support numerically intensive operations.

Client / Server Architecture:

This type of configuration is recommended for those sites with an increased demand for data sharing, computing, and network resources. The file server is connected to a large disk that holds the network operating system, application software, and data files. This configuration could include a capability to be upgraded to support multiple processors, at least 64 Mb of RAM, and storage in excess of 1 Gb connected via Ethernet to workstations for the GIS and graphics, and other workstations for data entry, retrieval, and integration into desktop presentation or publishing tools.

Summary

In the first set of comments I have presented a proposed system architecture which could potentially support the kind of applications described in the second set of comments. In the second set I will provide some comments on ways in which these design ideas could address the process of resolving ill-defined problems.

2. Resolving Ill-Defined Problems

The Support Workstation described above is host to expert systems, a data drawbridge between applications, and a "solution shaper" function. I would like to discuss the spreadsheet metaphor as applied to groundwater analyses and geographic information systems. I reason that a comparison can be made, by analogy, between proposals for DSS and the quantum leap offered to financial analysts by the advent of the electronic spreadsheet.

The first part of the analogy is drawn from the idea of the data drawbridge.
Its role is the dynamic manipulation and selective mapping of output streams from a tool to the formalized data input structures of another tool or model. These data linkages would allow task pipelining.

The first sentence clearly relates to the idea of a spreadsheet with its named ranges (selective mapping of output streams), and the second relates to the reverse, which is the use of spreadsheets to manipulate data downloaded from larger hosts. With regard to "data linkages allowing task pipelining," this has been part of the vision of DSS for some time. These types of products have achieved the objectives of this vision. The system architecture noted earlier is moving in the direction of developing an integrated toolbox for groundwater analyses and spatial analyses which transcend the ordinary limitations of stand-alone models.

Solution Structures

Solution structures incorporate the methodological knowledge that an experienced problem solver uses in the site characterization process to analyze a problem and structure the solution. The problem is - where is the contamination and how much is there? The solution is - what is the most technologically and cost effective method to remove threats to human health and the environment?

The analyst needs attribute information about the extent of contamination and also technology information about cleanup methods. These are two distinct databases, and under the Logical Workstation concept, need not be on the same host. In the logical workstation application this concept would draw on the data integration capability provided by a global data element dictionary used by all applications.

Specifically, the solution structures would assist the user in identifying and locating within the workstation databases or across a network the types of data required to solve a particular type of problem.
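
To make the lookup just described concrete, here is a minimal Python sketch, offered under stated assumptions only. The dictionary entries, host names, database names, and problem types are all hypothetical and invented for illustration. The sketch shows one way a solution structure could ask a shared data element dictionary where each required element lives, and which elements are not yet available anywhere.

```python
from typing import Dict, List, NamedTuple, Tuple


class Location(NamedTuple):
    host: str       # machine holding the data
    database: str   # database on that host


# Hypothetical global data element dictionary shared by all applications.
DATA_DICTIONARY: Dict[str, Location] = {
    "well_location": Location("support-host", "site_attributes"),
    "contaminant_concentration": Location("support-host", "site_attributes"),
    "treatment_cost": Location("technology-host", "cleanup_methods"),
}

# Hypothetical solution structures: problem type -> data elements it needs.
SOLUTION_STRUCTURES: Dict[str, List[str]] = {
    "extent_of_contamination": ["well_location", "contaminant_concentration"],
    "cleanup_method_selection": [
        "contaminant_concentration",
        "treatment_cost",
        "aquifer_yield",
    ],
}


def locate_data(problem_type: str) -> Tuple[Dict[str, Location], List[str]]:
    """Return (elements found with their locations, elements not in the dictionary)."""
    found: Dict[str, Location] = {}
    missing: List[str] = []
    for element in SOLUTION_STRUCTURES[problem_type]:
        if element in DATA_DICTIONARY:
            found[element] = DATA_DICTIONARY[element]
        else:
            missing.append(element)
    return found, missing


found, missing = locate_data("cleanup_method_selection")
for element, loc in found.items():
    print(f"{element}: {loc.database} on {loc.host}")
print("not yet available:", missing)
```

In this sketch the attribute data and the technology data sit on different hosts, and the analyst never needs to know that; the dictionary resolves it.
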
Finally, once the solution process has been structured and data needs and availability identified, assistance in selecting the most appropriate tools for analysis and display of the results would be provided in the order in which they were needed by the user.

Guidance Tool

The user must have the option to access a guidance tool for assistance in choosing the right application and analytical method for site characterization analysis. This option would be served by an expert system, and use of this capability would result in more effective use of site characterization analyses and their integration into GIS.

The expert system must be able to extract information from the database for knowledge representation. The expert system of choice should integrate with the existing databases, user interface, and tools on the workstation. The expert system functions could include:

* Help: The expert system shell can be used as the help facility in the workstation. It could work as an integral part of the object-oriented user interface in a multitasking, windowing environment.

* Tutorial: The guidance tool can also be used as a tutorial for new users. The expert system would be integrated with the workstation shell and site data to give practical knowledge on using the workstation.

* Problem Solver: The expert system can be used as a checklist for steps to be employed and for selection of an appropriate model for site analysis. These steps include:

  - Problem Description: assistance in identifying and ordering tasks that must be completed to solve a particular problem. An example might include setting appropriate data quality objectives.

  - Solution Structuring: the methodological knowledge can be used to analyze a problem and structure the solution process.

  - Tool Identification Assistance: a knowledge-based system that would select the best algorithm for spatial data interpolation based upon the statistical characteristics and completeness of the data.
* Self Knowledge: The expert system must have an explanation facility. This explains how the system arrived at its answers. The explanations must display the inference chains and explain the rationale behind each rule used in the chain. The users will then have more faith in the results and more confidence in the system.

General Considerations

Expert systems offer a means to overcome barriers to more effective use of groundwater analyses and their integration into geographic information systems.

From an engineering perspective, expert systems can be used to support computational techniques to find acceptable answers to problems which are normally solvable by conventional means, but which require iterative, complex procedures to resolve. Expert systems can be used to address computational problems involving special data structures together with domain specific knowledge engineering techniques. As a body of techniques and methods, expert systems cut across other components of the requirements analysis and have multiple potential applications.

Use of Expert Systems

There are many barriers involving both computational and conceptual issues which relate to problem solving for groundwater analyses and development of GIS. Expert systems have the potential for intensifying the speed and degree of analysis as well as addressing the more complex problems of spatial data structures, resulting spatial data relationships, and visualization of mapped data. In general, expert systems may hold a key to more effectively managing the complexity of groundwater data.

Applications which might be built include, but are not limited to, the following examples. These examples are drawn from the work of the National Center for Geographic Information and Analysis, Santa Barbara, CA:

... the statistical analysis of spatial data, including the selection of statistical tests to be employed, and warnings in case of inappropriate data management or use;

...
modeling of spatial phenomena or a large area with many interrelated attributes;

... image interpretation such as detecting changes over time;

... cartographic design including label placement and color selection; enforcement of map design standards for a large user community.

Summary

Ill-defined problems require an "intelligent assistant" to scope an environmental problem against such variables as sources and volume of pollutants, toxicity, transport & fate, etc., and to allow the application of appropriate tools supported by task pipelining.

Conclusions

These workstation configurations could be assembled today from commercial sources. The greatest area of custom programming would be in the development of "intelligent assistants" and a common user interface for all workstations to shape solutions for cleanup of hazardous waste.

REFERENCES

[1] "Workstations for Groundwater Analyses at US EPA," Yurman, D., in Proceedings of the Second Joint Conference on Integrating Data, Session of Microcomputers and Public Policy Analysis, National Governors' Association, Washington, DC, December 14, 1988.

[2] "A Conceptual Structure for the Groundwater Workstation," Horsey, H., Center for Advanced Decision Support Water Environmental Systems (CADSWES), Boulder, Colorado, in Proceedings of the Groundwater Workstation Advisory Committee, Office of Solid Waste & Emergency Response, U.S. Environmental Protection Agency, Washington, DC, May 1988.

[3] Proceedings of the Groundwater Workstation Advisory Committee, Conference Report, Horsey, H., CADSWES, and Yurman, D., Office of Solid Waste & Emergency Response, U.S. Environmental Protection Agency, Washington, DC, May 1988.

[4] Management Plan for Geographic Information Systems, Information Management Staff, Office of Solid Waste & Emergency Response, U.S. Environmental Protection Agency, Washington, DC, November 1988. Contractor assistance provided by Selden, D., American Management Systems, Rosslyn, Virginia.

[5] "Old Southington Landfill - Use of Historical Aerial Photography and GIS," Proceedings of the Remote Sensing Coordinators Conference, Environmental Photo Interpretation Center (EPIC), U.S. Environmental Protection Agency, Vint Hill Farms, Virginia, April 1988.

[6] "Management Review of the Clark Fork, Montana, Geographic Information System," Hazardous Waste Division, U.S. Environmental Protection Agency, Region VIII, Denver, Colorado, July 1988.

[7] "Success Factors for Expert Systems," Yurman, D., in Proceedings of the American Chemical Society, Symposium Series No. 431, Washington, DC, September 1989.

* Standard disclaimer included by reference
* ------------------------------------------------------------
* Dan Yurman Idaho National Engineering Laboratory
* djy@inel.gov PO Box 1625, Idaho Falls, ID 83415-3900
* phone: (208) 526-8591 fax: (208)-526-6852
* ------------------------------------------------------------
* 43N 112W Red Shift is not a brand of chewing tobacco.