jimbi@copper.TEK.COM (Jim Bigelow) (01/29/88)
I'm not sure what the purpose of this group is, but since it has hypertext
in its name I'd like to offer some of my thoughts on the use of hypertext not
only in the textual world but in the arena of CASE (Computer-Aided Software
Engineering).
I've written a paper which I can post if people are interested.
In summary, I propose that a hypertext database can be used to hold everything
associated with a software project: the specifications, designs, manuals,
test suites, code, etc. Links are used to connect and make explicit the
existing implicit links between the project components. Furthermore, the
reuse of project components is facilitated by using links. A paragraph may
be used in a user manual, a specification, and a comment in the code without
duplicating the text.
I've built a prototype of a utility to convert a set of C source files to a
hypertext structure I call the source code tree. The tree is based on the
call tree of the program and modified by preprocessor commands such as #ifdef.
Because all symbol references are resolved, questions such as who calls this
function, or where is this variable defined and who uses it, can be answered
by inspecting the source code tree rather than flipping back and forth
through the listing.
Jim Bigelow
tektronix!copper!jimbi or jimbi@copper.tek.com
norman@husc4.HARVARD.EDU (John Norman) (01/30/88)
In article <1670@copper.TEK.COM> jimbi@copper.TEK.COM (Jim Bigelow) writes:
>
>In summary, I propose that a hypertext database can be used to hold everything
>associated with a software project, the specifications, designs, manuals,
>test suites, code, etc. Links are used to connect and make explicit the
>existing implicit links between the project components. Furthermore, the
>reuse of project components is facilitated by using links. A paragraph may
>be used in a use manual, specification and a comment in the code without
>duplicating the text.
>
>I've built a prototype of a utility to convert a set of C source files to a
>hypertext structure I call the source code tree.
This reminds me of Knuth's "WEB" system for building programs with comments.
It would be interesting for WEB to be available in such an on-line system.
John Norman
Department of English and American Literature and Language
Warren House
Box D-12
Harvard University
Cambridge, MA 02138
617/495-2533 (Official business ONLY)
UUCP: harvard!husc4!norman
Internet: norman@hulaw1.HARVARD.EDU
BITNET: NORMAN@HULAW1
deh0654@sjfc.UUCP (Dennis Hamilton) (02/03/88)
In article <1670@copper.TEK.COM> jimbi@copper.TEK.COM (Jim Bigelow) writes:
>I'm not sure what the purpose of this group is, but since it has hypertext
>in it name I'd like to offer some of my thoughts on the use of hypertext not
>only in the textual world by in the arena of CASE (Computer-Aided Software
>Engineering).
>I've written a paper which I can post if people are interested.
>tektronix!copper!jimbi or jimbi@copper.tek.com
Please post the paper. If we can't apply these ideas to our own technology,
what better proving ground will we find?
Also, have you looked at Donald Knuth's WEB system and his ideas of literate
programming? Hypertext would seem to be a natural way of integrating that
idea into an overall CASE structure. (Being able to pull out literate
technical documentation for blasting through a document process, like TeX,
could always remain an option, even if one that hypertext would tend to make
less necessary.)
Dennis E. Hamilton
-- -- orcmid {uucp: ... !rochester!sjfc!deh0654
vanishing into a twisty little network of nodes all alike}
jimbi@copper.TEK.COM (Jim Bigelow) (02/12/88)
Here is a poorly formatted version of my paper; nroff is so clunky. I am
posting the source so you may format it yourself to a laserprinter or
some such device.
Hypertext and CASE
James Bigelow
Computer Aided Software Engineering Division
Design Automation Group
Tektronix, Inc.
P.O. Box 4600
Beaverton, Oregon 97075
ABSTRACT
--------
CASE systems require a method to tie various
documents, memos, source code, etc. together to
provide coherent system documentation. These sys-
tems also require complete version histories of
everything in a project. Hypertext meets these
requirements and provides an excellent data model
for CASE systems. This paper describes hypertext
and the Hypertext Abstract Machine, built at Tek-
tronix, and shows how hypertext is useful for CASE
applications.
December 17, 1987
The heart of any CASE system is its database [1,2,3]
which must provide at least three capabilities:
o The ability to logically associate documentation
and source code.
o The ability to make annotations for recording
explanations and assumptions.
o The ability to provide good version management.
A CASE environment places other demands on a database,
due to the nature of large scale projects and project teams.
The database must support simultaneous access by team
members and support editing and authorship in a computer
network. Team members also must be able to work indepen-
dently without interference from other team members and then
be able to merge their work back into the main project; a
taxing demand on a configuration management system.
The database must allow specific configurations or ver-
sion trees to be built, along with the ability to subse-
quently merge version branches back into the primary ver-
sion. Meeting this requirement provides the fundamentals
for good version management as well as the functionality for
a configuration manager.
In search of a system that meets the above require-
ments, researchers at Tektronix built Neptune, which demon-
strates that hypertext provides an appropriate data model
for CASE systems. Hypertext is a medium grained, entity-
relationship-like data model that allows an arbitrary struc-
turing of information and keeps a complete version history
of both information and structure.
Hypertext and Neptune Revealed
--------- --- ------- --------
Hypertext was first conceived as early as 1945 as a
means of storing all sources of information, both for ready
access and cross referencing [4,5]. There are now a number of
commercial and academic packages promoting hypertext
capabilities [6]. However, the key to the future success of
hypertext is an efficient, application independent data storage
method, such as was introduced in 1986, with the development
of Neptune, at Tektronix. Neptune achieves application
independence by using a layered system architecture; at the
bottom is a transaction-based server, the Hypertext Abstract
Machine (HAM), with applications and a user interface lay-
ered above. The HAM provides distributed access over a com-
puter network, and is synchronized for multi-user access
with transaction based recovery. The HAM also presents a
generic hypertext model by defining operations for creating,
modifying, and accessing hypertext components. A complete
information and version history is maintained and rapid
access to any version is provided.
Hypertext Concepts and Terminology. The basic
ingredients of a hypertext system are nodes and links.
Nodes provide a means to store data, and links provide the
relationship between the data in different nodes. The HAM
identifies nodes and links by associating an attribute/value
pair with a node or link. For example, a node's name attri-
bute is given a value such as module 1 to identify the con-
tents as the source code in module 1. Information is
grouped into configurations by using contexts, collections
of nodes and links. Since nodes and links may be thought of
as directed graphs, a collection of nodes, links, and con-
texts is called a graph.
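The terminology above can be sketched in a few lines of Python. This is an illustrative model only, not the HAM interface: the class names Node, Link, and Context, their fields, and the sample contents are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Holds data; attribute/value pairs identify it (hypothetical model)."""
    contents: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Link:
    """Relates two nodes; attribute/value pairs say how."""
    source: Node
    target: Node
    attributes: dict = field(default_factory=dict)


@dataclass
class Context:
    """A collection of nodes and links grouped into one configuration."""
    attributes: dict = field(default_factory=dict)
    nodes: list = field(default_factory=list)
    links: list = field(default_factory=list)


module1 = Node("int main(void) { return 0; }", {"name": "module 1"})
module2 = Node("void helper(void) {}", {"name": "module 2"})
calls = Link(module1, module2, {"relatesTo": "callsProcedure"})
project = Context({"name": "project"}, [module1, module2], [calls])

# Nodes are identified through their attributes, not through their contents.
names = [node.attributes["name"] for node in project.nodes]
```

Keeping attributes in a plain dictionary mirrors the idea that a node or link may carry any number of attribute/value pairs.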
An Example CASE Environment. An example C-based CASE
environment, DynamicDesign developed at Tektronix, has all
of its project components in the HAM:
o requirements and specifications
o design notes and documents
o implementation notes
o source and object code
o user documentation
Nodes are used to contain the project components, and
links depict the relationships between the components.
Attributes are used to label the types of nodes and links.
Table 1 shows the possible values of two node and link
attributes.
In DynamicDesign, nodes have an attribute, projectComponent,
which identifies the type of project component they
contain. Links have an attribute, relatesTo, which shows
the type of relation the link provides. For example,
sequential information may be associated by connecting two
nodes with a link whose attribute, relatesTo, has a value of
leadsTo. In Figure 1, module 1 precedes module 2 so they
are both stored in nodes and connected by a link having the
attribute value leadsTo.
--------------------------------------------------------
| Attribute Name    | Possible Values                  |
|-------------------+----------------------------------|
| projectComponent  | requirement, spec, designNote,   |
|                   | design assumption, comment,      |
|                   | source, object, symbolTable,     |
|                   | documentation, report            |
|-------------------+----------------------------------|
| relatesTo         | leadsTo, comments, refersTo,     |
|                   | callsProcedure, followsFrom,     |
|                   | implements, isDefinedBy          |
--------------------------------------------------------
Table 1 - Two Node and Link Attributes and their Values
The relationship between a specification and the code
which implements it can be shown with links. The node
containing the portion of the specification (projectComponent =
spec) and a node containing the code (projectComponent =
source) are related with a link which has relatesTo =
implements. As another example, a link can be used to show what
module contains the definition of a variable by relating the
two modules with a link of type refersTo.
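As a rough sketch of how the Table 1 values might be used, the following Python records typed nodes and links and then asks which code implements a specification. The helper names, the node names, and the choice to point an implements link from the code to the specification are assumptions for illustration; the paper does not state a link direction.

```python
# Attribute values copied from Table 1; everything else here is hypothetical.
PROJECT_COMPONENT = {
    "requirement", "spec", "designNote", "design assumption", "comment",
    "source", "object", "symbolTable", "documentation", "report",
}
RELATES_TO = {
    "leadsTo", "comments", "refersTo", "callsProcedure", "followsFrom",
    "implements", "isDefinedBy",
}


def make_node(name, component):
    """Create a node record, rejecting component types not in Table 1."""
    if component not in PROJECT_COMPONENT:
        raise ValueError(f"unknown projectComponent: {component}")
    return {"name": name, "projectComponent": component}


def make_link(source, target, relation):
    """Create a link record, rejecting relation types not in Table 1."""
    if relation not in RELATES_TO:
        raise ValueError(f"unknown relatesTo: {relation}")
    return {"from": source, "to": target, "relatesTo": relation}


def implementers(spec, links):
    """Names of the nodes whose implements link points at the given spec."""
    return [link["from"]["name"] for link in links
            if link["relatesTo"] == "implements" and link["to"] is spec]


spec = make_node("section 3.2", "spec")
code = make_node("parse.c", "source")
note = make_node("why recursive descent", "designNote")
links = [make_link(code, spec, "implements"),
         make_link(note, code, "comments")]
```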
Nodes. Nodes are similar to the nodes in directed
graphs and are used to hold any object in the CASE system:
text, graphics, object code, etc. Hypertext does not place
any constraint on node format, making it particularly useful
for holding, in one database, the wide variety of informa-
tion found in the CASE environment.
Nodes are atomic data units, so the issue of node con-
tents is important. If a piece of data is referenced in
more than one place (e.g. a section of text is in both the
requirements and the comments for a section of code) the
data should be in a node by itself. However, the applica-
tion that uses hypertext, by determining the unit of
incrementality used when processing the information, is the
final arbiter of how much should be placed in one node.
For example, in the case of a compiler, which can recompile
a changed procedure individually without recompiling the
entire module that contains the procedure [7,8], the unit of
incrementality is a procedure. Other compilers may enforce
a larger increment, such as a module.
Version history, in the HAM, is kept at the node and
link level (see the next section for an explanation of link
history). While lookup time is proportional to the age of
the version, all versions of a node's content may be
archived and retrieved on demand. When combined with con-
texts (see below) this creates the fundamentals of a confi-
guration manager.
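One way to picture node-level version history is the sketch below. It is an assumption-laden illustration, not the HAM's storage scheme: the class VersionedNode and its methods are hypothetical, and versions are kept as whole copies for simplicity. Retrieval walks back from the newest version, so older versions take more steps to reach.

```python
class VersionedNode:
    """Keeps every version of a node's contents (hypothetical sketch)."""

    def __init__(self, contents, time):
        self._versions = [(time, contents)]  # oldest first

    def update(self, contents, time):
        """Record a new version; times must strictly increase."""
        if time <= self._versions[-1][0]:
            raise ValueError("versions must be added in increasing time order")
        self._versions.append((time, contents))

    def contents_at(self, time):
        """Return the contents that were current at the given time."""
        # Walk back from the newest version; the older the requested
        # version, the more entries are visited.
        for stamp, contents in reversed(self._versions):
            if stamp <= time:
                return contents
        raise KeyError("no version existed at that time")


node = VersionedNode("draft", time=1)
node.update("reviewed", time=5)
node.update("released", time=9)
```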
 -------  leads  -------  leads  -------  leads  -------
| para- |------>| para- |------>| para- |------>| para- |
|graph 1|  To   |graph 2|  To   |graph 3|  To   |graph 4|
 -------         -------         -------         -------
    |                \             /                |
 comments    comments               comments    comments
    |                  \         /                  |
    |                   \       /                   |
    V                    \     /                    V
 --------     calls      --------     calls      --------
| module |------------->| module |------------->| module |
|   1    |  Procedure   |   2    |  Procedure   |   3    |
 --------                --------                --------
Figure 1 - Commenting Source Code with Nodes and Links
Links. Links are thought of as the arcs in directed
graphs and exist to associate two nodes. Links, along with
nodes, form the essence of hypertext; information storage
and linking to allow nonlinear organization of information.
In Figure 1, the link connecting the two nodes, paragraph 1
and module 1, provides the logical association that para-
graph 1 comments on module 1. In addition to association,
the HAM provides the ability to traverse a link in either
direction. Thus, while reading paragraph 2, in Figure 1,
one can traverse the link to module 2, read the module,
traverse the link to paragraph 3, and read that paragraph.
In the HAM, a link is not restricted to merely pointing
to the entire node. A link is attached, at each end, to any
object or place in a node. Examples of objects that a link
can attach to are: a character in text, an extent of text
(e.g. a sentence or paragraph), an x,y coordinate in graphic
information, or a graphics object, such as a process bubble
in a data flow diagram.
The history of a link is the history of its
attribute/value pairs and its attachment object within a
node. By looking at a link's history you can tell if its
name or the location of its attachment point ever changed and what
the changes were.
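The idea of a link history can be sketched as follows. The record layout is an assumption: attachment points are modeled as character offsets within the two nodes, and LinkState and attachment_changes are hypothetical names, not HAM operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkState:
    """One entry in a link's history (hypothetical layout)."""
    time: int
    relates_to: str
    source_offset: int  # character position of the attachment in the source node
    target_offset: int  # character position of the attachment in the target node


def attachment_changes(history):
    """List (time, old attachment, new attachment) for each move of either end."""
    changes = []
    for before, after in zip(history, history[1:]):
        old = (before.source_offset, before.target_offset)
        new = (after.source_offset, after.target_offset)
        if old != new:
            changes.append((after.time, old, new))
    return changes


history = [
    LinkState(time=1, relates_to="comments", source_offset=0, target_offset=120),
    LinkState(time=4, relates_to="comments", source_offset=0, target_offset=120),
    LinkState(time=7, relates_to="comments", source_offset=0, target_offset=310),
]
```

In this example the link's target end moved once, at time 7, while its relatesTo value never changed.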
Attribute/Value Pairs. Attribute/value pairs extend
the power of hypertext, by allowing the organization of
nodes and links into sub-graphs within a single context (see
below for an explanation of grouping by contexts). The pri-
mary objective of attribute/value pairs is to make it easy
to access all the information needed and to restrict the
access to only what is needed. With the preceding goal in
mind, attributes are used to identify or categorize nodes,
links and contexts. The HAM provides an unlimited number of
attribute/value pairs so numerous attributes are used to
multiply categorize nodes, links and contexts. Table 1 is
an example of using attribute values to identify the con-
tents of nodes and the meaning of links. Another example is
identifying contexts in DynamicDesign. Contexts are given
an attribute, projectCategory, which has the following
values: specifications, design documentation, program
documentation, user documentation, implementation notes,
source code, object code, symbol tables, and product (refer
to Figure 3 for the uses of these attributes). The context
attribute is used in the query operations, discussed in the
next section, as a method to locate or filter information.
Also, a thorough example of attribute usage in CASE
environments is presented in PMDB [1].
Attributes and Query Operations. The HAM provides a
sophisticated set of query operations to both traverse and
retrieve a collection of nodes, links and contexts. The
query operations use predicates based on attribute/value
pairs to determine which nodes, links and contexts satisfy
the queries. For example, in DynamicDesign, nodes containing
source code are placed in the context projectCategory =
source code and have an attribute, system, attached to them.
The attribute, system, can assume any of the following
values: all, amiga, bsd, eunice, osk, sysv, and vms. A node
predicate system = all used in a retrieval query shows only
those nodes containing source code applicable to all systems.
Conversely, a node predicate of system = vms allows
access to nodes with source code applicable only to
VAX/VMS+. A traversal query using a predicate such as
system = all OR system = vms returns the version of the
product source code tailored for the VAX/VMS environment.
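The predicate system = all OR system = vms can be sketched as a filter over attribute/value pairs. The function names and the file names below are hypothetical, and the HAM's actual query syntax is not shown in this paper; only the attribute name system and its values come from the text.

```python
def attribute_equals(name, value):
    """Predicate that is true when a node's attribute has the given value."""
    return lambda attributes: attributes.get(name) == value


def any_of(*predicates):
    """Predicate that is true when at least one of the predicates holds (OR)."""
    return lambda attributes: any(p(attributes) for p in predicates)


def query(nodes, predicate):
    """Return the names of the nodes whose attributes satisfy the predicate."""
    return [node["name"] for node in nodes if predicate(node)]


# Hypothetical source nodes, each tagged with the system attribute.
nodes = [
    {"name": "io_common.c", "system": "all"},
    {"name": "io_vms.c", "system": "vms"},
    {"name": "io_bsd.c", "system": "bsd"},
]

portable_only = query(nodes, attribute_equals("system", "all"))
vms_build = query(nodes, any_of(attribute_equals("system", "all"),
                                attribute_equals("system", "vms")))
```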
The Code and Comments Problem. The documentation of a
computer program is usually either squeezed into the margins
of the program where it is generally too terse to be useful,
or its text is interspersed throughout the text of the program,
breaking up the flow of both the program and the documentation.
DynamicDesign allows the program documentation
and program source code to exist in separate nodes with
links between them.
-------------------------
+ VAX/VMS is a trademark of Digital Equipment Corporation.
Figure 1 illustrates two node types: program code and
program documentation. The nodes labeled module 1, module 2
and module 3 contain program source code while the nodes
labeled paragraph 1, paragraph 2, paragraph 3, and paragraph
4 contain the documentation. Figure 1 also shows two uses
of links: the sequential link, leadsTo, and the annotative
link, comments. By using this arrangement, either
documentation or source code may be viewed by following links
typed leadsTo, without interruption. However, if source code
requires an explanation, it is seen by following the
comments link back to its source in the program documentation.
Also, while reading the program documentation, source code
is viewed at any time by traversing the comments link. This
method provides the program documentation with a freedom
from space restrictions not usually found in conventional,
in-code documentation methods.
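The reading patterns described for Figure 1 can be sketched with a small link table. The links are transcribed from Figure 1; the tuple representation and the function names are assumptions made for the example.

```python
# (source, relatesTo, target) triples transcribed from Figure 1.
FIGURE_1 = [
    ("paragraph 1", "leadsTo", "paragraph 2"),
    ("paragraph 2", "leadsTo", "paragraph 3"),
    ("paragraph 3", "leadsTo", "paragraph 4"),
    ("paragraph 1", "comments", "module 1"),
    ("paragraph 2", "comments", "module 2"),
    ("paragraph 3", "comments", "module 2"),
    ("paragraph 4", "comments", "module 3"),
    ("module 1", "callsProcedure", "module 2"),
    ("module 2", "callsProcedure", "module 3"),
]


def follow(start, relation):
    """Return the chain of nodes reached from start by one relation type."""
    path = [start]
    while True:
        successors = [dst for src, rel, dst in FIGURE_1
                      if src == path[-1] and rel == relation]
        if not successors or successors[0] in path:
            return path
        path.append(successors[0])


def explanations_for(module):
    """Traverse comments links backwards: which paragraphs explain a module?"""
    return [src for src, rel, dst in FIGURE_1
            if rel == "comments" and dst == module]


def code_for(paragraph):
    """Traverse comments links forwards: which modules does a paragraph discuss?"""
    return [dst for src, rel, dst in FIGURE_1
            if rel == "comments" and src == paragraph]


reading_order = follow("paragraph 1", "leadsTo")
```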
Contexts
--------
The concept of a context, i.e. collecting or partition-
ing nodes and links into a set, was missing from hypertext
until 1987 [9]. Contexts provide a way to group common nodes
and links. But they are more powerful than that because
contexts indirectly support cooperative multi-person design
and documentation of large-scale software systems by
directly supporting partitioning, version trees, and confi-
guration management. The HAM provides operations for creat-
ing and populating contexts with nodes, links and sub-
contexts. A merging operation is provided to allow a subset
of nodes and links in one context to be copied into another
context. The merge operation has several interesting uses
and ramifications that are discussed in the next two sec-
tions.
Contexts and Configuration Management
-------- --- ------------- ----------
A context allows the designation of a configuration of
nodes and links. When a group of source nodes has reached
a baseline configuration, they are moved, with the merge
operation, into another context holding released products.
Figure 2 depicts the state transitions of two contexts,
project and release. The states are labeled V0, V1, V2, etc.
and are used to show differences in the nodes and links that
comprise the content of both contexts. Figure 2 illustrates
the merging of the content of context project, at state V0,
into context release, at state V0. Development continues in
project and is shown by the intermediate states V1, V2, and
V3. At state V3, in project, the content of project is
again merged into release, yielding state V1 in release.
One of the properties of the merge operation is that node
and link histories are not moved from one context to
another. Therefore, while complete node and link histories
are preserved in project, the version history in release is
that of the versions of the nodes and links in project when
they were merged into release. This means that the nodes and
links in release do not have a record of the states V1 and
V2 in project, only states V0 and V3.
                     Project Time Line
--> V0 --> V1 --> V2 --> V3 --> V4 --> V5 --> V6 --> V7 -->
    |                    |                           |
  merge                merge                       merge
    |                    |                           |
    |                    |                           |
    V                    V                           V
    V0 ----------------> V1 -----------------------> V2 -->
                     Release Time Line
Figure 2 - Time Lines for Two Contexts, Project and Release, with
Intermediate Versions
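The property that merging does not carry history across contexts can be sketched as follows. This is a toy model under stated assumptions: a context is a dictionary mapping a node name to its list of versions, and merge is a hypothetical function, not the HAM's merge operation.

```python
def merge(source, target, names=None):
    """Copy the current version of the chosen nodes from source into target.

    Only the latest contents are copied, so the target's history records
    one new entry per merge and none of the source's intermediate versions.
    """
    for name in (names if names is not None else list(source)):
        latest = source[name][-1]
        target.setdefault(name, []).append(latest)


# Each context maps a node name to its version history (oldest first).
project = {"parser": ["V0 text"]}
release = {}

merge(project, release)                       # project V0 becomes release V0
project["parser"] += ["V1 text", "V2 text", "V3 text"]
merge(project, release)                       # project V3 becomes release V1
```

After the two merges, release holds only the V0 and V3 contents, while project still holds all four versions.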
Contexts and Local Workspaces. Contexts can also be
used to define a workspace and to partition a project into a
project workspace and local workspaces [10]. A local workspace
allows a developer to abstract a subset of nodes and links
from the project workspace and place them in another
workspace. This workspace becomes the local workspace in
which developers may make local modifications and test them
against the rest of the project. When satisfied, the
developer attempts to merge the changes back into the sys-
tem.
Ideally the partitioning of workspaces between developers
is disjoint; however, this may not be possible due to
project and/or requirement constraints. So, two or more
developers may be working on the same nodes concurrently.
The chance of concurrent development means that when merging
a local workspace back into the project workspace some
method of detecting and resolving concurrent updates must be
available. Detection of changes is mandatory to avoid
overwriting work by one or more developers. Therefore, the
HAM provides operations for detecting differences between
both nodes and contexts. These two operations facilitate
merging changed nodes back into the project workspace by
detecting the differences that may have been made in the
project workspace after the developer created the local
workspace.
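Detecting concurrent updates before a merge can be sketched as a three-way comparison. The approach and all names here are assumptions for illustration: each workspace is a dictionary from node name to contents, and base is a snapshot of the project workspace taken when the local workspace was created. The HAM's own difference operations are not specified in this paper.

```python
def classify(base, local, project):
    """Compare a local workspace and the project against their common base."""
    report = {}
    for name in sorted(set(base) | set(local) | set(project)):
        b, l, p = base.get(name), local.get(name), project.get(name)
        local_changed = l != b
        project_changed = p != b
        if local_changed and project_changed and l != p:
            report[name] = "conflict"        # both sides changed differently
        elif local_changed and not project_changed:
            report[name] = "merge"           # safe to copy the local version
        elif project_changed and not local_changed:
            report[name] = "keep project"    # someone else's work; do not overwrite
        else:
            report[name] = "nothing to do"
    return report


base = {"lexer": "v1", "parser": "v1", "emit": "v1", "util": "v1"}
local = {"lexer": "v2-alice", "parser": "v1", "emit": "v2-alice", "util": "v1"}
project = {"lexer": "v1", "parser": "v2-bob", "emit": "v2-bob", "util": "v1"}

report = classify(base, local, project)
```

Only the nodes reported as "conflict" need a person to resolve them; the rest can be handled mechanically.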
Project Category Interconnections
------- -------- ----------------
A project component is any piece of information or data
associated with a project. There are some broad categories
that the data can be placed in:
o management reports
o specification and requirements
o design, program, and user documentation
o implementation notes
o source code
o object code
o products
Within each of these categories are the actual documents,
memos, papers, binaries, etc. that make up the project. By
placing all the components of a project in hypertext, they
are archived, recoverable, and available for use within
other parts of the project.
specifications
/ | \
--------- -------
/ / | | \
/ / | | \
design --------- program ----------- User
documentation / documentation | documentation
/ / \ | /
/ / \ | /
/ / \ | /
/ / \ | /
/ / \ | /
/ / \| /
implementation-----------source ------------- object
notes code / code
/ /
/ /
/ /
/ /
symbol ---------- product
tables
Figure 3. - The Interconnections of Project Categories in DynamicDesign
Interconnections between project components exist even
in a project which uses paper documents. However, there is
also much duplication of both effort and documentation.
Additionally, many opportunities to point out the relation-
ships between components are missed because the effort
involved is too great for the time permitted.
DynamicDesign, the example CASE environment, has all
the information concerning a project in its hypertext data-
base. Contexts are used to group the data into the
categories mentioned above, shown in Figure 3 and enumerated
in the section describing attribute/value pairs. The lines,
in Figure 3 which connect the ellipses, representing con-
texts, show the direct interconnection and interrelation-
ships between the data in the contexts. Due to the ability
to use links, one piece of data can be present in several
contexts. Therefore a paragraph about a design may do tri-
ple duty, as a comment in the program documentation and a
paragraph in both the user and design documentation.
Links are used to provide traceability of functional
requirements. A link from a requirement in specifications
leads to a data flow diagram in design documentation. From
the data flow diagram, a link leads to a structure chart or
other design document and then back to the requirement.
This provides a check on the one-to-one relationship between
requirements and designs.
Links also exist from nodes in design documents to
nodes in program documents and then nodes in source code.
Implementation notes provides a repository for documenting
assumptions and decisions based on the actual implementation
of the design. There are links between the nodes in
implementation notes and those in design documentation and source
code to record the associations between them.
One real gain from this system is the ease of
demonstrating that all the requirements have been fulfilled.
Starting at specifications, links radiate out through the
project; through design documentation, implementation notes,
source code, user documentation, and back to specifications,
forming a graph cycle. Every path from specifications that
is not a cycle indicates an unfulfilled requirement.
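The cycle check can be sketched as a graph search. This is a simplified illustration under stated assumptions: the graph is a dictionary of outgoing links, the node names are invented, and a requirement is treated as fulfilled when at least one path of links leads back to it, which is weaker than requiring every path to be a cycle.

```python
def returns_to_start(graph, start):
    """True when some path of links leaving start leads back to it."""
    stack = list(graph.get(start, []))
    seen = set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


def unfulfilled(graph, requirements):
    """Requirements whose links never cycle back to the specification."""
    return [r for r in requirements if not returns_to_start(graph, r)]


# Hypothetical project: node name -> names of the nodes it links to.
graph = {
    "req: save file": ["dfd: save"],
    "dfd: save": ["src: save.c"],
    "src: save.c": ["manual: saving"],
    "manual: saving": ["req: save file"],
    "req: undo": ["dfd: undo"],
    "dfd: undo": [],
}

missing = unfulfilled(graph, ["req: save file", "req: undo"])
```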
DynamicDesign also aids in program maintenance.
Maintenance personnel are able to read the designer's design
documents, assumptions, implementation notes and have them
linked directly to the relevant sections of the code. So
the job of gauging the effect of a program modification is
aided by easy access to documentation linked to specific
portions of the code.
Note in Figure 3 that the object code and symbol
tables, while part of the project, are only related directly
to the source code. A compiler integrated with hypertext
makes good use of its storage abilities. Module symbol
tables need not be reconstructed for every compile, but
merely updated. A symbolic debugger can make excellent use
of the module symbol tables left in hypertext and, since
they are in hypertext and not in the object code, the symbol
table and debugging information can be quite extensive.
When a compiler is used with a facility similar to the UNIX+
make facility to generate recompilation commands, Dynam-
icDesign becomes a programming environment.
-------------------------
+ UNIX is a trademark of Bell Laboratories.
Conclusions
-----------
I have shown how hypertext provides an appropriate data
model for CASE, and have provided a number of examples of
how to apply it. Hypertext is, however, only a part of a
CASE environment, albeit a powerful part. With the other
parts, editors, compilers, linkers, electronic mail systems,
etc. integrated with hypertext there is the possibility of a
powerful and productive programming environment. Much work
remains for Tektronix and others to fully explore and devise
ways to exploit the capabilities of hypertext.
Work on building systems using hypertext can focus on
how to automate the creation of sequential and relational
links. Sequential links show that one node logically follows
another (leadsTo in Figure 1) and relational links show
that two nodes are logically related, but not sequentially
(comments in Figure 1). By automating the linking process
based on the way a node is used, the user is spared repetitive
linking. However, there may always be the need for the
means to create a link at the user's command, to point out a
relationship the system has missed and only a person can
see.
A weakness that has not been addressed is how to
represent fine grained information. One solution is to
create a partnership between hypertext and relational data-
bases. A relational database can hold fine grained informa-
tion such as definition-use links in an incremental
compiler's symbol tables. A relationally complete query
language extends the functionality of hypertext to provide
even more capabilities.
References:
See the file in the document source code for references. Sorry