jimbi@copper.TEK.COM (Jim Bigelow) (01/29/88)
I'm not sure what the purpose of this group is, but since it has hypertext in its name I'd like to offer some of my thoughts on the use of hypertext not only in the textual world but in the arena of CASE (Computer-Aided Software Engineering). I've written a paper which I can post if people are interested.

In summary, I propose that a hypertext database can be used to hold everything associated with a software project: the specifications, designs, manuals, test suites, code, etc. Links are used to connect the project components and to make explicit the implicit relationships that already exist between them. Furthermore, links make it easier to reuse project components. A paragraph may be used in a user manual, a specification, and a comment in the code without duplicating the text.

I've built a prototype of a utility to convert a set of C source files to a hypertext structure I call the source code tree. The tree is based on the call tree of the program and modified by preprocessor commands such as #ifdef. Because all symbol references are resolved, questions such as "who calls this function?" or "where is this variable defined, and who uses it?" can be answered by inspecting the source code tree rather than flipping back and forth through the listing.

Jim Bigelow
tektronix!copper!jimbi or jimbi@copper.tek.com
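
The "who calls this function?" question can be illustrated with a minimal Python sketch. This is not Jim's C-to-hypertext utility (its code was not posted); it assumes the resolved call relationships have already been extracted into a dictionary mapping each function to the functions it calls, and the function names and the `build_caller_index` helper are hypothetical. Inverting the call tree once gives a "who calls" index.

```python
from collections import defaultdict

# Hypothetical call tree: each function maps to the functions it calls.
# In the utility described above this information would come from the C
# sources, with #ifdef branches already resolved.
CALL_TREE: dict[str, list[str]] = {
    "main": ["parse_args", "run"],
    "parse_args": ["report"],
    "run": ["read_input", "report"],
    "read_input": [],
    "report": ["format_line"],
    "format_line": [],
}


def build_caller_index(call_tree: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert the call tree so each function maps to the functions that call it."""
    callers: defaultdict[str, list[str]] = defaultdict(list)
    for caller, callees in call_tree.items():
        for callee in callees:
            callers[callee].append(caller)
    # Sort each caller list so the answer does not depend on dict order.
    return {fn: sorted(names) for fn, names in callers.items()}


index = build_caller_index(CALL_TREE)
print(index["report"])        # ['parse_args', 'run']
print(index.get("main", []))  # [] (nothing calls main)
```

Functions that nobody calls, such as `main`, simply have no entry in the index.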
norman@husc4.HARVARD.EDU (John Norman) (01/30/88)
In article <1670@copper.TEK.COM> jimbi@copper.TEK.COM (Jim Bigelow) writes:
>
>In summary, I propose that a hypertext database can be used to hold everything
>associated with a software project, the specifications, designs, manuals,
>test suites, code, etc. Links are used to connect and make explicit the
>existing implicit links between the project components. Furthermore, the
>reuse of project components is facilitated by using links. A paragraph may
>be used in a user manual, specification and a comment in the code without
>duplicating the text.
>
>I've built a prototype of a utility to convert a set of C source files to a
>hypertext structure I call the source code tree.

This reminds me of Knuth's "WEB" system for building programs with comments. It would be interesting for WEB to be available in such an on-line system.

John Norman
Department of English and American Literature and Language
Warren House Box D-12
Harvard University
Cambridge, MA 02138
617/495-2533 (Official business ONLY)
UUCP: harvard!husc4!norman
Internet: norman@hulaw1.HARVARD.EDU
BITNET: NORMAN@HULAW1
deh0654@sjfc.UUCP (Dennis Hamilton) (02/03/88)
In article <1670@copper.TEK.COM> jimbi@copper.TEK.COM (Jim Bigelow) writes:
>I'm not sure what the purpose of this group is, but since it has hypertext
>in its name I'd like to offer some of my thoughts on the use of hypertext not
>only in the textual world but in the arena of CASE (Computer-Aided Software
>Engineering).
>I've written a paper which I can post if people are interested.
>tektronix!copper!jimbi or jimbi@copper.tek.com

Please post the paper. If we can't apply these ideas to our own technology, what better proving ground will we find?

Also, have you looked at Donald Knuth's WEB system and his ideas of literate programming? Hypertext would seem to be a natural way of integrating that idea into an overall CASE structure. (Being able to pull out literate technical documentation for blasting through a document process, like TeX, could always remain an option, even if one that hypertext would tend to make less necessary.)

Dennis E. Hamilton
--
-- orcmid {uucp: ... !rochester!sjfc!deh0654
vanishing into a twisty little network of nodes all alike}
jimbi@copper.TEK.COM (Jim Bigelow) (02/12/88)
Here is a poorly formatted version of my paper; nroff is so clunky. I am posting the source so you may format it yourself for a laser printer or some such device.

Hypertext and CASE

James Bigelow
Computer Aided Software Engineering Division
Design Automation Group
Tektronix, Inc.
P.O. Box 4600
Beaverton, Oregon 97075

ABSTRACT
--------

CASE systems require a method to tie various documents, memos, source code, etc. together to provide coherent system documentation. These systems also require complete version histories of everything in a project. Hypertext meets these requirements and provides an excellent data model for CASE systems. This paper describes hypertext and the Hypertext Abstract Machine, built at Tektronix, and shows how hypertext is useful for CASE applications.

December 17, 1987

The heart of any CASE system is its database [1,2,3], which must provide at least three capabilities:

o The ability to logically associate documentation and source code.
o The ability to make annotations for recording explanations and assumptions.
o The ability to provide good version management.

A CASE environment places other demands on a database, due to the nature of large-scale projects and project teams. The database must support simultaneous access by team members and support editing and authorship over a computer network. Team members also must be able to work independently without interference from other team members and then be able to merge their work back into the main project, a taxing demand on a configuration management system. The database must allow specific configurations or version trees to be built, along with the ability to subsequently merge version branches back into the primary version.
Meeting this requirement provides the fundamentals for good version management as well as the functionality for a configuration manager.

In search of a system that meets the above requirements, researchers at Tektronix built Neptune, which demonstrates that hypertext provides an appropriate data model for CASE systems. Hypertext is a medium-grained, entity-relationship-like data model that allows an arbitrary structuring of information and keeps a complete version history of both information and structure.

Hypertext and Neptune Revealed
------------------------------

Hypertext was first conceived as early as 1945 as a means of storing all sources of information, both for ready access and for cross-referencing [4,5]. There are now a number of commercial and academic packages promoting hypertext capabilities [6]. However, the key to the future success of hypertext is an efficient, application-independent data storage method, such as the one introduced in 1986 with the development of Neptune at Tektronix.

Neptune achieves application independence by using a layered system architecture: at the bottom is a transaction-based server, the Hypertext Abstract Machine (HAM), with applications and a user interface layered above. The HAM provides distributed access over a computer network and is synchronized for multi-user access, with transaction-based recovery. The HAM also presents a generic hypertext model by defining operations for creating, modifying, and accessing hypertext components. A complete information and version history is maintained, and rapid access to any version is provided.

Hypertext Concepts and Terminology. The basic ingredients of a hypertext system are nodes and links. Nodes provide a means to store data, and links provide the relationships between the data in different nodes. The HAM identifies nodes and links by associating attribute/value pairs with them.
For example, a node's name attribute is given a value such as module 1 to identify the contents as the source code in module 1. Information is grouped into configurations by using contexts, which are collections of nodes and links. Since nodes and links form a directed graph, a collection of nodes, links, and contexts is called a graph.

An Example CASE Environment. An example C-based CASE environment, DynamicDesign, developed at Tektronix, has all of its project components in the HAM:

o requirements and specifications
o design notes and documents
o implementation notes
o source and object code
o user documentation

Nodes are used to contain the project components, and links depict the relationships between the components. Attributes are used to label the types of nodes and links. Table 1 shows the possible values of two node and link attributes.

+-------------------+----------------------------------+
| Attribute Name    | Possible Values                  |
+-------------------+----------------------------------+
| projectComponent  | requirement, spec, designNote,   |
|                   | design assumption, comment,      |
|                   | source, object, symbolTable,     |
|                   | documentation, report            |
+-------------------+----------------------------------+
| relatesTo         | leadsTo, comments, refersTo,     |
|                   | callsProcedure, followsFrom,     |
|                   | implements, isDefinedBy          |
+-------------------+----------------------------------+
Table 1 - Two Node and Link Attributes and their Values

In DynamicDesign, nodes have an attribute, projectComponent, which identifies the type of project component they contain. Links have an attribute, relatesTo, which shows the type of relation the link provides. For example, sequential information may be associated by connecting two nodes with a link whose relatesTo attribute has the value leadsTo. In Figure 1, module 1 precedes module 2, so they are both stored in nodes and connected by a link having the attribute value leadsTo.
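
As an illustration of the model only (the paper does not list the HAM's actual operations), here is a minimal Python sketch in which nodes and links each carry a dictionary of attribute/value pairs. The `Node`, `Link`, and `Graph` classes and the `follow` method are hypothetical; the attribute names projectComponent and relatesTo and the value leadsTo come from Table 1. The sketch builds the module 1 to module 2 leadsTo link just described.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    attrs: dict[str, str] = field(default_factory=dict)  # attribute/value pairs
    contents: str = ""


@dataclass
class Link:
    source: str
    target: str
    attrs: dict[str, str] = field(default_factory=dict)  # attribute/value pairs


class Graph:
    """Hypothetical in-memory graph of attributed nodes and links."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.links: list[Link] = []

    def add_node(self, name: str, contents: str = "", **attrs: str) -> Node:
        node = Node(name, dict(attrs), contents)
        self.nodes[name] = node
        return node

    def add_link(self, source: str, target: str, **attrs: str) -> Link:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both ends of a link must be existing nodes")
        link = Link(source, target, dict(attrs))
        self.links.append(link)
        return link

    def follow(self, start: str, relates_to: str) -> list[str]:
        """Names of nodes reached from `start` over links whose
        relatesTo attribute has the value `relates_to`."""
        return [link.target for link in self.links
                if link.source == start
                and link.attrs.get("relatesTo") == relates_to]


g = Graph()
g.add_node("module 1", projectComponent="source")
g.add_node("module 2", projectComponent="source")
g.add_link("module 1", "module 2", relatesTo="leadsTo")
print(g.follow("module 1", "leadsTo"))  # ['module 2']
```

Keeping attributes in open-ended dictionaries, rather than fixed fields, mirrors the paper's point that the HAM places no limit on the attributes attached to a node or link.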

The relationship between a specification and the code which implements it can be shown with links. The node containing the portion of the specification (projectComponent = spec) and a node containing the code (projectComponent = source) are related with a link which has relatesTo = implements. As another example, a link can be used to show which module contains the definition of a variable by relating the two modules with a link of type refersTo.

Nodes. Nodes are similar to the nodes in directed graphs and are used to hold any object in the CASE system: text, graphics, object code, etc. Hypertext does not place any constraint on node format, making it particularly useful for holding, in one database, the wide variety of information found in the CASE environment.

Nodes are atomic data units, so the issue of node contents is important. If a piece of data is referenced in more than one place (e.g., a section of text is in both the requirements and the comments for a section of code), the data should be in a node by itself. However, the application that uses hypertext, by determining the unit of incrementality used when processing the information, is the final arbiter of how much should be placed in one node. For example, in the case of a compiler which can recompile a changed procedure individually without recompiling the entire module that contains the procedure [7,8], the unit of incrementality is a procedure. Other compilers may enforce a larger increment, such as a module.

Version history in the HAM is kept at the node and link level (see the next section for an explanation of link history). While lookup time is proportional to the age of the version, all versions of a node's contents may be archived and retrieved on demand. When combined with contexts (see below), this creates the fundamentals of a configuration manager.
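
The paper states only that version lookup time grows with a version's age; it does not describe how the HAM stores versions. The sketch below is an assumption chosen to have that property: a hypothetical `VersionedNode` keeps every version with an integer timestamp and, to find the contents at a given time, scans backward from the newest version, so older versions take more steps to reach.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedNode:
    """Hypothetical node that archives every version of its contents."""
    name: str
    # (time, contents) pairs, oldest first
    history: list[tuple[int, str]] = field(default_factory=list)

    def update(self, time: int, contents: str) -> None:
        """Record a new version; times must increase."""
        if self.history and time <= self.history[-1][0]:
            raise ValueError("versions must be added in increasing time order")
        self.history.append((time, contents))

    def contents_at(self, time: int) -> str:
        """Contents as of `time`, found by scanning back from the newest
        version, so the cost grows with how old the requested version is."""
        for stamp, contents in reversed(self.history):
            if stamp <= time:
                return contents
        raise KeyError(f"{self.name} has no version at or before time {time}")


node = VersionedNode("module 1")
node.update(1, "int f(void);")
node.update(5, "int f(int x);")
node.update(9, "long f(int x);")
print(node.contents_at(6))  # int f(int x);
print(node.contents_at(1))  # int f(void);
```

The integer timestamps stand in for whatever clock or version numbering a real store would use.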

 -------   leads   -------   leads   -------   leads   -------
| para- |-------->| para- |-------->| para- |-------->| para- |
|graph 1|   To    |graph 2|   To    |graph 3|   To    |graph 4|
 -------           -------           -------           -------
    |                   \             /                   |
comments         comments \         / comments        comments
    |                       \     /                       |
    V                         V V                         V
 --------       calls       --------       calls       --------
| module |---------------->| module |---------------->| module |
|   1    |    Procedure    |   2    |    Procedure    |   3    |
 --------                   --------                   --------
Figure 1 - Commenting Source Code with Nodes and Links

Links. Links are thought of as the arcs in directed graphs and exist to associate two nodes. Links, along with nodes, form the essence of hypertext: information storage, and linking that allows a nonlinear organization of information. In Figure 1, the link connecting the two nodes paragraph 1 and module 1 provides the logical association that paragraph 1 comments on module 1. In addition to association, the HAM provides the ability to traverse a link in either direction. Thus, while reading paragraph 2 in Figure 1, one can traverse the link to module 2, read the module, traverse the link to paragraph 3, and read that paragraph.

In the HAM, a link is not restricted to merely pointing to the entire node. A link is attached, at each end, to any object or place in a node. Examples of objects that a link can attach to are: a character in text, an extent of text (e.g., a sentence or paragraph), an x,y coordinate in graphic information, or a graphics object, such as a process bubble in a data flow diagram.

The history of a link is the history of its attribute/value pairs and its attachment object within a node. By looking at a link's history you can tell whether its name or the location of its attachment point ever changed, and what the changes were.

Attribute/Value Pairs. Attribute/value pairs extend the power of hypertext by allowing the organization of nodes and links into sub-graphs within a single context (see below for an explanation of grouping by contexts).
The primary objective of attribute/value pairs is to make it easy to access all the information needed, and to restrict access to only what is needed. With this goal in mind, attributes are used to identify or categorize nodes, links and contexts. The HAM provides an unlimited number of attribute/value pairs, so many attributes can be used to categorize the same node, link, or context in several ways at once. Table 1 is an example of using attribute values to identify the contents of nodes and the meaning of links. Another example is identifying contexts in DynamicDesign. Contexts are given an attribute, projectCategory, which has the following values: specifications, design documentation, program documentation, user documentation, implementation notes, source code, object code, symbol tables, and product (refer to Figure 3 for the uses of these attributes). The context attribute is used in the query operations, discussed in the next section, as a method to locate or filter information. A thorough example of attribute usage in CASE environments is also presented in PMDB [1].

Attributes and Query Operations. The HAM provides a sophisticated set of query operations to both traverse and retrieve a collection of nodes, links and contexts. The query operations use predicates based on attribute/value pairs to determine which nodes, links and contexts satisfy the queries. For example, in DynamicDesign, nodes containing source code are placed in the context with projectCategory = source code and have an attribute, system, attached to them. The attribute system can take any of the following values: all, amiga, bsd, eunice, osk, sysv, and vms. A node predicate system = all used in a retrieval query returns only those nodes containing source code applicable to all systems. Conversely, a node predicate of system = vms allows access to nodes with source code applicable only to VAX/VMS+. A traversal query using a predicate such as system = all OR system = vms returns the version of the product source code tailored for the VAX/VMS environment.

+ VAX/VMS is a trademark of Digital Equipment Corporation.

The Code and Comments Problem. The documentation of a computer program is usually either squeezed into the margins of the program, where it is generally too terse to be useful, or interspersed throughout the text of the program, breaking up the flow of both the program and the documentation. DynamicDesign allows the program documentation and program source code to exist in separate nodes with links between them.

Figure 1 illustrates two node types: program code and program documentation. The nodes labeled module 1, module 2 and module 3 contain program source code, while the nodes labeled paragraph 1, paragraph 2, paragraph 3, and paragraph 4 contain the documentation. Figure 1 also shows two uses of links: the sequential link, leadsTo, and the annotative link, comments. With this arrangement, either the documentation or the source code may be read without interruption by following links typed leadsTo. However, if source code requires an explanation, the explanation is found by following the comments link back to its source in the program documentation. Also, while reading the program documentation, the corresponding source code can be viewed at any time by traversing the comments link. This method frees the program documentation from the space restrictions of conventional, in-code documentation methods.

Contexts
--------

The concept of a context, i.e., collecting or partitioning nodes and links into a set, was missing from hypertext until 1987 [9]. Contexts provide a way to group common nodes and links. But they are more powerful than that: by directly supporting partitioning, version trees, and configuration management, contexts indirectly support cooperative multi-person design and documentation of large-scale software systems. The HAM provides operations for creating and populating contexts with nodes, links and sub-contexts. A merge operation is provided to allow a subset of the nodes and links in one context to be copied into another context. The merge operation has several interesting uses and ramifications, which are discussed in the next two sections.

Contexts and Configuration Management
-------------------------------------

A context allows the designation of a configuration of nodes and links. When a group of source nodes has reached a baseline configuration, the nodes are moved, with the merge operation, into another context holding released products. Figure 2 depicts the state transitions of two contexts, project and release. The states are labeled V0, V1, V2, etc. and are used to show differences in the nodes and links that make up the contents of both contexts. Figure 2 illustrates the merging of the contents of context project, at state V0, into context release, at state V0. Development continues in project, as shown by the intermediate states V1, V2, and V3. At state V3, the contents of project are again merged into release, yielding state V1 in release.

One of the properties of the merge operation is that node and link histories are not moved from one context to another.
Therefore, while complete node and link histories are preserved in project, the version history in release consists of the versions of the nodes and links in project at the times they were merged into release. This means that the nodes and links in release have no record of states V1 and V2 in project, only of states V0 and V3.

Project Time Line -->
V0 --> V1 --> V2 --> V3 --> V4 --> V5 --> V6 --> V7 -->
 |                    |                           |
merge               merge                       merge
 |                    |                           |
 V                    V                           V
V0 ----------------> V1 -----------------------> V2 -->
Release Time Line
Figure 2 - Time Lines for Two Contexts, Project and Release, with Intermediate Versions

Contexts and Local Workspaces. Contexts can also be used to define a workspace and to partition a project into a project workspace and local workspaces [10]. A local workspace allows a developer to abstract a subset of nodes and links from the project workspace and place them in another workspace. This becomes the local workspace, in which the developer may make local modifications and test them against the rest of the project. When satisfied, the developer attempts to merge the changes back into the system. Ideally the partitioning of workspaces between developers is disjoint; however, this may not be possible due to project or requirement constraints, so two or more developers may be working on the same nodes concurrently. The possibility of concurrent development means that when a local workspace is merged back into the project workspace, some method of detecting and resolving concurrent updates must be available. Detection of changes is mandatory to avoid overwriting the work of one or more developers. Therefore, the HAM provides operations for detecting differences between both nodes and contexts.
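
The merge behaviour shown in Figure 2, and the concurrent-update check a local workspace needs, can be sketched with a hypothetical in-memory model; it is not the HAM's interface. A context is a dictionary from node names to version lists, and `merge`, `snapshot`, `node_changed`, and `changed_nodes` are illustrative names. Because `merge` copies only current versions, release ends up holding project's V0 and V3 (its own states V0 and V1) but not V1 or V2.

```python
from typing import Optional

# Hypothetical model: a context maps each node name to that node's version
# history (a list of contents, oldest first); the last entry is current.
Context = dict[str, list[str]]


def merge(source: Context, target: Context, names: list[str]) -> None:
    """Copy the current version of each named node into `target`.
    Only the current contents move; the source history stays behind."""
    for name in names:
        target.setdefault(name, []).append(source[name][-1])


def snapshot(ctx: Context) -> dict[str, str]:
    """Current contents of every node in a context."""
    return {name: versions[-1] for name, versions in ctx.items()}


def node_changed(old: Optional[str], new: Optional[str]) -> bool:
    """Node difference: do two versions of a node differ?"""
    return old != new


def changed_nodes(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """Context difference: nodes added, removed, or modified."""
    return {n for n in before.keys() | after.keys()
            if node_changed(before.get(n), after.get(n))}


# Figure 2: project V0 is merged into release, V1-V3 follow, then V3 is merged.
project: Context = {"main.c": ["V0"]}
release: Context = {}
merge(project, release, ["main.c"])
project["main.c"] += ["V1", "V2", "V3"]
merge(project, release, ["main.c"])
print(release["main.c"])  # ['V0', 'V3']

# Local workspace copied from project at V3; then both sides edit main.c.
base = snapshot(project)
local = {**base, "main.c": "local edit"}
project["main.c"].append("V4")
print(changed_nodes(base, local) & changed_nodes(base, snapshot(project)))  # {'main.c'}
```

Here `node_changed` and `changed_nodes` stand in for the HAM's two difference operations, on nodes and on contexts; a node that differs from the workspace's starting point in both the local workspace and the project is a concurrent update that must be resolved before merging.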
These two operations facilitate merging changed nodes back into the project workspace by detecting changes that may have been made in the project workspace after the developer created the local workspace.

Project Category Interconnections
---------------------------------

A project component is any piece of information or data associated with a project. There are some broad categories that the data can be placed in:

o management reports
o specifications and requirements
o design, program, and user documentation
o implementation notes
o source code
o object code
o products

Within each of these categories are the actual documents, memos, papers, binaries, etc. that make up the project. By placing all the components of a project in hypertext, they are archived, recoverable, and available for use within other parts of the project.

[Figure 3: a diagram of the project categories specifications, design documentation, program documentation, user documentation, implementation notes, source code, object code, symbol tables, and product, with lines showing their direct interconnections]
Figure 3 - The Interconnections of Project Categories in DynamicDesign

Interconnections between project components exist even in a project which uses paper documents. However, such a project also involves much duplication of both effort and documentation. Additionally, many opportunities to point out the relationships between components are missed because the effort involved is too great for the time permitted.

DynamicDesign, the example CASE environment, has all the information concerning a project in its hypertext database. Contexts are used to group the data into the categories mentioned above, shown in Figure 3 and enumerated in the section describing attribute/value pairs.
The lines in Figure 3 that connect the ellipses, which represent contexts, show the direct interconnections and interrelationships between the data in the contexts. Because of links, one piece of data can be present in several contexts. Therefore a paragraph about a design may do triple duty: as a comment in the program documentation, and as a paragraph in both the user and design documentation.

Links are used to provide traceability of functional requirements. A link from a requirement in specifications leads to a data flow diagram in design documentation. From the data flow diagram, a link leads to a structure chart or other design document and then back to the requirement. This provides a check on the one-to-one relationship between requirements and designs.

Links also exist from nodes in design documentation to nodes in program documentation, and then to nodes in source code. Implementation notes provide a repository for documenting assumptions and decisions based on the actual implementation of the design. There are links between the nodes in implementation notes and those in design documentation and source code to record the associations between them.

One real gain from this system is the ease of demonstrating that all the requirements have been fulfilled. Starting at specifications, links radiate out through the project (through design documentation, implementation notes, source code, and user documentation) and back to specifications, forming a graph cycle. Every path from specifications that does not form such a cycle indicates an unfulfilled requirement.

DynamicDesign also aids in program maintenance.
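
A simplified form of the requirement check from the paragraph on demonstrating that all requirements have been fulfilled can be sketched in Python: here a requirement is flagged when no chain of links leaving it ever returns to it. This is weaker than examining every path, and it assumes the links have been exported as a plain adjacency dictionary; the link data, node names, and the `returns_to` and `unfulfilled` helpers are hypothetical.

```python
# Hypothetical export of the project's links: node name -> linked node names.
LINKS: dict[str, list[str]] = {
    "req 1": ["dfd 1"],
    "dfd 1": ["chart 1"],
    "chart 1": ["req 1"],  # traced back to the requirement: a cycle
    "req 2": ["dfd 2"],
    "dfd 2": [],           # dead end: req 2 is never traced back
}


def returns_to(start: str, links: dict[str, list[str]]) -> bool:
    """True if following one or more links from `start` can lead back to it."""
    stack = list(links.get(start, []))
    seen: set[str] = set()
    while stack:  # iterative depth-first search
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(links.get(node, []))
    return False


def unfulfilled(requirements: list[str], links: dict[str, list[str]]) -> list[str]:
    """Requirements from which no chain of links closes a cycle."""
    return [r for r in requirements if not returns_to(r, links)]


print(unfulfilled(["req 1", "req 2"], LINKS))  # ['req 2']
```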
Maintenance personnel are able to read the designer's design documents, assumptions, and implementation notes, which are linked directly to the relevant sections of the code. So the job of gauging the effect of a program modification is aided by easy access to documentation linked to specific portions of the code.

Note in Figure 3 that the object code and symbol tables, while part of the project, are only related directly to the source code. A compiler integrated with hypertext makes good use of its storage abilities. Module symbol tables need not be reconstructed for every compile, but merely updated. A symbolic debugger can make excellent use of the module symbol tables left in hypertext and, since they are in hypertext and not in the object code, the symbol table and debugging information can be quite extensive. When a compiler is used with a facility similar to the UNIX+ make facility to generate recompilation commands, DynamicDesign becomes a programming environment.

+ UNIX is a trademark of Bell Laboratories.

Conclusions
-----------

I have shown how hypertext provides an appropriate data model for CASE, and have provided a number of examples of how to apply it. Hypertext is, however, only a part of a CASE environment, albeit a powerful part. With the other parts (editors, compilers, linkers, electronic mail systems, etc.) integrated with hypertext, there is the possibility of a powerful and productive programming environment.

Much work remains for Tektronix and others to fully explore and devise ways to exploit the capabilities of hypertext. Work on building systems using hypertext can focus on how to automate the creation of sequential and relational links. Sequential links show that one node logically follows another (leadsTo in Figure 1), and relational links show that two nodes are logically related, but not sequentially (comments in Figure 1).
By automating the linking process based on the way a node is used, the user is spared repetitive linking. However, there may always be a need for a way to create a link at the user's command, to point out a relationship that the system has missed and only a person can see.

A weakness that has not been addressed is how to represent fine-grained information. One solution is to create a partnership between hypertext and relational databases. A relational database can hold fine-grained information such as definition-use links in an incremental compiler's symbol tables. A relationally complete query language extends the functionality of hypertext to provide even more capabilities.

References: see the file in the document source code for references. Sorry.