[alt.hypertext] hypertext and CASE

jimbi@copper.TEK.COM (Jim Bigelow) (01/29/88)

I'm not sure what the purpose of this group is, but since it has hypertext
in its name I'd like to offer some of my thoughts on the use of hypertext not
only in the textual world but in the arena of CASE (Computer-Aided Software
Engineering).

I've written a paper which I can post if people are interested.
In summary, I propose that a hypertext database can be used to hold everything
associated with a software project, the specifications, designs, manuals,
test suites, code, etc.  Links are used to connect and make explicit the 
existing implicit links between the project components.  Furthermore, the
reuse of project components is facilitated by using links. A paragraph may
be used in a user manual, a specification, and a comment in the code without
duplicating the text.

I've built a prototype of a utility to convert a set of C source files to a
hypertext structure I call the source code tree.  The tree is based on the
call tree of the program and modified by preprocessor commands such as
#ifdef.  Because all symbol references are resolved, questions such as
"who calls this function?" or "where is this variable defined and who uses
it?" can be answered by inspecting the source code tree rather than
flipping back and forth through the listing.
Jim Bigelow

tektronix!copper!jimbi or jimbi@copper.tek.com

norman@husc4.HARVARD.EDU (John Norman) (01/30/88)

In article <1670@copper.TEK.COM> jimbi@copper.TEK.COM (Jim Bigelow) writes:
>
>In summary, I propose that a hypertext database can be used to hold everything
>associated with a software project, the specifications, designs, manuals,
>test suites, code, etc.  Links are used to connect and make explicit the 
>existing implicit links between the project components.  Furthermore, the
>reuse of project components is facilitated by using links. A paragraph may
>be used in a user manual, a specification, and a comment in the code without
>duplicating the text.
>
>I've built a prototype of a utility to convert a set of C source files to a
>hypertext structure I call the source code tree.

This reminds me of Knuth's "WEB" system for building programs with comments.
It would be interesting for WEB to be available in such an on-line system.


John Norman
Department of English and American Literature and Language
Warren House Box D-12
Harvard University
Cambridge, MA  02138
617/495-2533 (Official business ONLY)

UUCP:      harvard!husc4!norman
Internet:  norman@hulaw1.HARVARD.EDU
BITNET:    NORMAN@HULAW1

deh0654@sjfc.UUCP (Dennis Hamilton) (02/03/88)

In article <1670@copper.TEK.COM> jimbi@copper.TEK.COM (Jim Bigelow) writes:
>I'm not sure what the purpose of this group is, but since it has hypertext
>in its name I'd like to offer some of my thoughts on the use of hypertext not
>only in the textual world but in the arena of CASE (Computer-Aided Software
>Engineering).  
>I've written a paper which I can post if people are interested.
>tektronix!copper!jimbi or jimbi@copper.tek.com

Please post the paper.  If we can't apply these ideas to our own technology,
what better proving ground will we find?

Also, have you looked at Donald Knuth's WEB system and his ideas of literate
programming?  Hypertext would seem to be a natural way of integrating that
idea into an overall CASE structure.  (Being able to pull out literate 
technical documentation for blasting through a document process, like TeX,
could always remain an option, even if one that hypertext would tend to
make less necessary.)

Dennis E. Hamilton

--
	-- orcmid {uucp: ... !rochester!sjfc!deh0654
		   vanishing into a twisty little network of nodes all alike}

jimbi@copper.TEK.COM (Jim Bigelow) (02/12/88)

Here is a poorly formatted version of my paper; nroff is so clunky.  I am
posting the source so you may format it yourself for a laser printer or
some such device.


                     Hypertext and CASE


                       James Bigelow

        Computer Aided Software Engineering Division
                  Design Automation Group
                      Tektronix, Inc.
                       P.O. Box 4600
                  Beaverton, Oregon 97075



                          ABSTRACT
                          --------

          CASE systems require a method to tie  various
     documents,  memos,  source  code, etc. together to
     provide coherent system documentation. These  sys-
     tems  also  require  complete version histories of
     everything in a project.   Hypertext  meets  these
     requirements  and provides an excellent data model
     for CASE systems.  This paper describes  hypertext
     and  the Hypertext Abstract Machine, built at Tek-
     tronix, and shows how hypertext is useful for CASE
     applications.



December 17, 1987

     The heart of any CASE system is its database, [1,2,3]
which must provide at least three capabilities:

o         The ability to logically  associate  documentation
          and source code.

o         The ability  to  make  annotations  for  recording
          explanations and assumptions.

o         The ability to provide good version management.

     A CASE environment places other demands on a  database,
due to the nature of large-scale projects and project teams.
The  database  must  support  simultaneous  access  by  team
members  and  support  editing  and authorship in a computer
network.  Team members also must be able  to  work  indepen-
dently without interference from other team members and then
be able to merge their work back into the  main  project;  a
taxing demand on a configuration management system.

     The database must allow specific configurations or ver-
sion  trees  to  be  built, along with the ability to subse-
quently merge version branches back into  the  primary  ver-
sion.   Meeting  this  requirement provides the fundamentals
for good version management as well as the functionality for
a configuration manager.

     In search of a system that  meets  the  above  require-
ments,  researchers at Tektronix built Neptune, which demon-
strates that hypertext provides an  appropriate  data  model
for  CASE  systems.   Hypertext is a medium-grained, entity-
relationship-like data model that allows an arbitrary struc-
turing  of  information and keeps a complete version history
of both information and structure.

Hypertext and Neptune Revealed
--------- --- ------- --------

     Hypertext was first conceived as early as 1945 as a
means of storing all sources of information, both for ready
access and cross referencing. [4,5] There are now a number of
commercial and academic packages promoting hypertext
capabilities. [6] However, the key to the future success of
hypertext is an efficient, application-independent data storage
method, such as was introduced in 1986 with the development
of Neptune at Tektronix.  Neptune achieves application
independence by using a layered system architecture; at the
bottom is a transaction-based server, the Hypertext Abstract
Machine (HAM), with applications and a user interface layered
above.  The HAM provides distributed access over a computer
network, and is synchronized for multi-user access with
transaction-based recovery.  The HAM also presents a generic
hypertext model by defining operations for creating,
modifying, and accessing hypertext components.  A complete
information and version history is maintained and rapid
access to any version is provided.

     Hypertext  Concepts   and   Terminology.    The   basic
ingredients  of  a  hypertext  system  are  nodes and links.
Nodes provide a means to store data, and links  provide  the
relationship  between  the data in different nodes.  The HAM
identifies nodes and links by associating an attribute/value
pair with a node or link.  For example, a node's name attri-
bute is given a value such as module 1 to identify the  con-
tents  as  the  source  code  in  module  1.  Information is
grouped into configurations by using  contexts,  collections
of nodes and links.  Since nodes and links may be thought of
as directed graphs, a collection of nodes, links,  and  con-
texts is called a graph.
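
     The terminology above can be made concrete with a small
sketch.  This is not the HAM interface, which this paper does
not list; the class and field names below are assumptions
chosen only to show nodes, links and contexts that each carry
attribute/value pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Holds data plus attribute/value pairs that identify it."""
    contents: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Link:
    """Relates two nodes; its attributes say what the relation means."""
    source: Node
    target: Node
    attributes: dict = field(default_factory=dict)


@dataclass
class Context:
    """A named collection of nodes and links (a configuration)."""
    attributes: dict = field(default_factory=dict)
    nodes: list = field(default_factory=list)
    links: list = field(default_factory=list)


# Hypothetical contents, for illustration only.
module1 = Node("int main(void) { return 0; }", {"name": "module 1"})
paragraph1 = Node("Entry point of the program.", {"name": "paragraph 1"})
comment = Link(paragraph1, module1, {"relatesTo": "comments"})

project = Context({"name": "project"}, [module1, paragraph1], [comment])
```

Keeping attributes in an open dictionary, rather than as fixed
fields, reflects the point that any number of attribute/value
pairs may be attached to the same object.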

      An Example CASE Environment.  An example C-based  CASE
environment, DynamicDesign, developed at Tektronix, has all
of its project components in the HAM:

        o requirements and specifications
        o design notes and documents
        o implementation notes
        o source and object code
        o user documentation


     Nodes are used to contain the project components, and
links depict the relationships between the components.
Attributes are used to label the types of nodes and links.
Table 1 shows the possible values of two node and link
attributes.

  +-------------------+----------------------------------+
  |  Attribute Name   |         Possible Values          |
  +-------------------+----------------------------------+
  |  projectComponent |  requirement, spec, designNote,  |
  |                   |  design assumption, comment,     |
  |                   |  source, object, symbolTable,    |
  |                   |  documentation, report           |
  +-------------------+----------------------------------+
  |  relatesTo        |  leadsTo, comments, refersTo,    |
  |                   |  callsProcedure, followsFrom,    |
  |                   |  implements, isDefinedBy         |
  +-------------------+----------------------------------+

  Table 1 - Two Node and Link Attributes and their Values

     In DynamicDesign, nodes have an attribute,
projectComponent, which identifies the type of project
component they contain.  Links have an attribute, relatesTo,
which shows the type of relation the link provides.  For
example, sequential information may be associated by
connecting two nodes with a link whose attribute, relatesTo,
has a value of leadsTo.  In Figure 1, module 1 precedes
module 2 so they are both stored in nodes and connected by a
link having the attribute value leadsTo.

     The relationship between a specification and the code
which implements it can be shown with links.  The node
containing the portion of the specification
(projectComponent = spec) and a node containing the code
(projectComponent = source) are related with a link which has
relatesTo = implements.  As another example, a link can be
used to show which module contains the definition of a
variable by relating the two modules with a link of type
refersTo.

     Nodes.  Nodes are similar  to  the  nodes  in  directed
graphs  and  are used to hold any object in the CASE system:
text, graphics, object code, etc.  Hypertext does not place
any constraint on node format, making it particularly useful
for holding, in one database, the wide variety  of  informa-
tion found in the CASE environment.

     Nodes are atomic data units, so the issue of node
contents is important.  If a piece of data is referenced in
more than one place (e.g. a section of text is in both the
requirements and the comments for a section of code) the
data should be in a node by itself.  However, the
application that uses hypertext, by determining the unit of
incrementality used when processing the information, is the
final arbiter of how much should be placed in one node.
For example, in the case of a compiler which can recompile
a changed procedure individually without recompiling the
entire module that contains the procedure, [7,8] the unit of
incrementality is a procedure.  Other compilers may enforce
a larger increment, such as a module.

     Version history, in the HAM, is kept at the node and
link level (see the next section for an explanation of link
history).  While lookup time is proportional to the age of
the version, all versions of a node's content may be
archived and retrieved on demand.  When combined with
contexts (see below) this creates the fundamentals of a
configuration manager.
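
     One way to picture node-level version history is the
sketch below.  How the HAM actually stores versions is not
described here, so the storage layout, the class name and the
method names are all assumptions; the only property borrowed
from the text is that older versions take longer to reach.

```python
class VersionedNode:
    """Keeps every version of a node's contents, newest last."""

    def __init__(self):
        # List of (time, contents) pairs in increasing time order.
        self._versions = []

    def update(self, time, contents):
        """Record a new version; earlier versions are never discarded."""
        self._versions.append((time, contents))

    def contents_at(self, time):
        """Return the contents that were current at the given time."""
        # Walk back from the newest version, so the cost of a lookup
        # grows with the age of the version requested.
        for stamp, contents in reversed(self._versions):
            if stamp <= time:
                return contents
        raise KeyError(f"no version at or before time {time}")


node = VersionedNode()
node.update(1, "draft")
node.update(5, "reviewed")
node.update(9, "released")
```

Asking for the contents at a time between two updates returns
the version that was current then, which is the behavior a
configuration manager needs when rebuilding an old release.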

       -------  leads  -------  leads  -------  leads  -------
      | para- |------>| para- |------>| para- |------>| para- |
      |graph 1|  To   |graph 2|  To   |graph 3|  To   |graph 4|
       -------         -------         -------         -------
         |                   \         /                 |
      comments           comments   comments           comments
         |                     \     /                   |
         |                      \   /                    |
         V                       \ /                     V
      --------     calls      --------     calls      --------
     | module |------------->| module |------------->| module |
     |    1   |  Procedure   |   2    |   Procedure  |    3   |
      --------                --------                --------


   Figure 1 - Commenting Source Code with Nodes and Links


     Links.  Links are thought of as the  arcs  in  directed
graphs  and exist to associate two nodes.  Links, along with
nodes, form the essence of  hypertext:  information  storage
and  linking to allow nonlinear organization of information.
In Figure 1, the link connecting the two nodes, paragraph  1
and  module  1,  provides the logical association that para-
graph 1 comments on module 1.  In addition  to  association,
the  HAM  provides  the ability to traverse a link in either
direction.  Thus, while reading paragraph 2,  in  Figure  1,
one  can  traverse  the  link  to module 2, read the module,
traverse the link to paragraph 3, and read that paragraph.
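
     The two-way traversal just described can be sketched
over part of Figure 1.  The tuple layout and the function
name are assumptions made for illustration, not HAM
operations.

```python
# Part of Figure 1 as (source, relatesTo, target) triples.
links = [
    ("paragraph 1", "comments", "module 1"),
    ("paragraph 2", "comments", "module 2"),
    ("paragraph 3", "comments", "module 2"),
    ("paragraph 1", "leadsTo", "paragraph 2"),
    ("module 1", "callsProcedure", "module 2"),
]


def neighbors(node, relates_to):
    """Nodes reachable from `node` over links of one type, in either direction."""
    found = []
    for source, kind, target in links:
        if kind != relates_to:
            continue
        if source == node:
            found.append(target)      # forward traversal
        elif target == node:
            found.append(source)      # backward traversal
    return found


from_paragraph = neighbors("paragraph 2", "comments")
from_module = neighbors("module 2", "comments")
```

Here from_paragraph holds module 2, and from_module holds
paragraph 2 and paragraph 3, which is the walk from
paragraph 2 through module 2 to paragraph 3 described in
the text.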

     In the HAM, a link is not restricted to merely pointing
to the entire node.  A link is attached, at each end, to any
object or place in a node.  Examples of objects that a  link
can  attach  to  are: a character in text, an extent of text
(e.g. a sentence or paragraph), an x,y coordinate in graphic
information,  or a graphics object, such as a process bubble
in a data flow diagram.

     The  history  of  a  link  is  the   history   of   its
attribute/value  pairs  and  its  attachment object within a
node.  By looking at a link's history you can tell whether its
name or the location of its attachment point ever changed and
what the changes were.

     Attribute/Value Pairs.  Attribute/value pairs extend
the power of hypertext by allowing the organization of
nodes and links into sub-graphs within a single context (see
below for an explanation of grouping by contexts).  The
primary objective of attribute/value pairs is to make it easy
to access all the information needed and to restrict the
access to only what is needed.  With this goal in mind,
attributes are used to identify or categorize nodes, links
and contexts.  The HAM provides an unlimited number of
attribute/value pairs, so numerous attributes can be used to
categorize nodes, links and contexts in multiple ways.
Table 1 is an example of using attribute values to identify
the contents of nodes and the meaning of links.  Another
example is identifying contexts in DynamicDesign.  Contexts
are given an attribute, projectCategory, which has the
following values: specifications, design documentation,
program documentation, user documentation, implementation
notes, source code, object code, symbol tables, and product
(refer to Figure 3 for the uses of these attributes).  The
context attribute is used in the query operations, discussed
in the next section, as a method to locate or filter
information.  Also, a thorough example of attribute usage in
CASE environments is presented in PMDB. [1]

     Attributes and Query Operations.  The HAM provides a
sophisticated set of query operations to both traverse and
retrieve a collection of nodes, links and contexts.  The
query operations use predicates based on attribute/value
pairs to determine which nodes, links and contexts satisfy
the queries.  For example, in DynamicDesign, nodes
containing source code are placed in the context
projectCategory = source code and have an attribute, system,
attached to them.  The attribute system can assume any of
the following values: all, amiga, bsd, eunice, osk, sysv,
and vms.  A node predicate system = all used in a retrieval
query shows only those nodes containing source code
applicable to all systems.  Conversely, a node predicate of
system = vms allows access to nodes with source code
applicable only to VAX/VMS+.  A traversal query using a
predicate such as system = all OR system = vms returns the
version of the product source code tailored for the VAX/VMS
environment.
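
     The predicates above can be sketched as ordinary
filter functions.  The HAM query syntax is not given in this
paper, so the select function and the file names are
hypothetical; only the system attribute and its values come
from the text.

```python
# Source nodes tagged with the `system` attribute (file names are made up).
nodes = [
    {"name": "io.c", "system": "all"},
    {"name": "vms_io.c", "system": "vms"},
    {"name": "bsd_io.c", "system": "bsd"},
]


def select(nodes, predicate):
    """Return the names of the nodes that satisfy the predicate."""
    return [node["name"] for node in nodes if predicate(node)]


# system = all
portable = select(nodes, lambda n: n["system"] == "all")

# system = all OR system = vms
vms_build = select(nodes, lambda n: n["system"] in ("all", "vms"))
```

The second query picks up both the portable code and the
VMS-specific code, which together make up the product source
tailored for that environment.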

     The Code and Comments Problem.  The documentation of a
computer program is usually either squeezed into the margins
of the program, where it is generally too terse to be useful,
or interspersed throughout the text of the program, breaking
up the flow of both the program and the documentation.
DynamicDesign allows the program documentation and program
source code to exist in separate nodes with links between
them.

     Figure 1 illustrates two node types: program code and
program documentation.  The nodes labeled module 1, module 2
and module 3 contain program source code while the nodes
labeled paragraph 1, paragraph 2, paragraph 3, and paragraph
4 contain the documentation.  Figure 1 also shows two uses
of links: the sequential link, leadsTo, and the annotative
link, comments.  By using this arrangement, either
documentation or source code may be viewed by following
links typed leadsTo, without interruption.  However, if
source code requires an explanation, it is seen by following
the comments link back to its source in the program
documentation.  Also, while reading the program
documentation, source code is viewed at any time by
traversing the comments link.  This method provides the
program documentation with a freedom from space restrictions
not usually found in conventional, in-code documentation
methods.

-------------------------
  + VAX/VMS is a trademark of Digital Equipment Corporation.

Contexts
--------

     The concept of a context, i.e. collecting or
partitioning nodes and links into a set, was missing from
hypertext until 1987. [9] Contexts provide a way to group
common nodes and links.  But they are more powerful than
that, because contexts indirectly support cooperative
multi-person design and documentation of large-scale
software systems by directly supporting partitioning,
version trees, and configuration management.  The HAM
provides operations for creating and populating contexts
with nodes, links and sub-contexts.  A merging operation is
provided to allow a subset of nodes and links in one context
to be copied into another context.  The merge operation has
several interesting uses and ramifications that are
discussed in the next two sections.

Contexts and Configuration Management
-------- --- ------------- ----------

     A context allows the designation of a configuration of
nodes and links.  When a group of source nodes has reached
a baseline configuration, they are moved, with the merge
operation, into another context holding released products.
Figure 2 depicts the state transitions of two contexts,
project and release.  The states are labeled V0, V1, V2,
etc. and are used to show differences in the nodes and links
that comprise the content of both contexts.  Figure 2
illustrates the merging of the content of context project,
at state V0, into context release, at state V0.
Development continues in project and is shown by the
intermediate states V1, V2, and V3.  At state V3 in project,
the content of project is again merged into release,
yielding state V1 in release.  One of the properties of the
merge operation is that node and link histories are not
moved from one context to another.  Therefore, while
complete node and link histories are preserved in project,
the version history in release is that of the versions of
the nodes and links in project when they were merged into
release.  This means that the nodes and links in release do
not have a record of the states V1 and V2 in project, only
states V0 and V3.

       Project Time Line

   --> V0 -->  V1 -->  V2 -->  V3 -->  V4 -->  V5 -->  V6 -->  V7 -->
       |                       |                               |

     merge                   merge                            merge
       |                       |                               |
       |                       |                               |

       V                       V                               V
      V0 -------------------> V1 ---------------------------> V2 -->

       Release Time Line

Figure 2 - Time Lines for Two Contexts, Project and Release, with
           Intermediate Versions
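
     The effect of merging without carrying histories across
can be sketched as follows.  Representing a context as a
dictionary of per-node version lists is an assumption made
for this illustration, as is the merge function itself.

```python
def merge(source, target):
    """Copy the current version of each node; histories stay in `source`."""
    for name, history in source.items():
        target.setdefault(name, []).append(history[-1])


# Each context maps a node name to the versions that context has seen.
project = {"module 1": ["V0"]}
release = {}

merge(project, release)                      # project V0 -> release V0
project["module 1"] += ["V1", "V2", "V3"]    # development continues
merge(project, release)                      # project V3 -> release V1
```

After the second merge, release has recorded only the two
versions it received (V0 and V3), while project still holds
all four.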


     Contexts and Local Workspaces.  Contexts  can  also  be
used to define a workspace and to partition a project into a
project workspace and local workspaces. [10] A local workspace
allows  a  developer to abstract a subset of nodes and links
from  the  project  workspace  and  place  them  in  another
workspace.   This  workspace  becomes the local workspace in
which developers may make local modifications and test  them
against  the  rest  of  the  project.   When  satisfied, the
developer attempts to merge the changes back into  the  sys-
tem.

     Ideally the partitioning of workspaces between develop-
ers  is  disjoint;  however, this may not be possible due to
project and/or requirement constraints.   So,  two  or  more
developers  may  be  working on the same nodes concurrently.
The chance of concurrent development means that when merging
a  local  workspace  back  into  the  project workspace some
method of detecting and resolving concurrent updates must be
available.   Detection  of  changes  is  mandatory  to avoid
overwriting work by one or more developers.  Therefore,  the
HAM  provides  operations  for detecting differences between
both nodes and contexts.  These  two  operations  facilitate
merging  changed  nodes  back  into the project workspace by
detecting the differences that may have  been  made  in  the
project  workspace  after  the  developer  created the local
workspace.
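
     Detecting concurrent updates before a merge can be
sketched as a three-way comparison.  The HAM difference
operations are not specified in this paper; the function
below, the snapshot dictionaries and the file names are
assumptions that show only the idea of comparing both
workspaces against the state from which the local workspace
was created.

```python
def conflicts(base, local, project):
    """Names of nodes changed both locally and in the project since `base`."""
    return sorted(
        name
        for name in base
        if local.get(name) != base[name] and project.get(name) != base[name]
    )


# Snapshot taken when the local workspace was created.
base = {"parse.c": "v1", "emit.c": "v1", "main.c": "v1"}

# The developer changed parse.c and main.c in the local workspace.
local = {"parse.c": "v2-local", "emit.c": "v1", "main.c": "v2-local"}

# Meanwhile someone else changed parse.c and emit.c in the project.
project = {"parse.c": "v2-other", "emit.c": "v2-other", "main.c": "v1"}

needs_resolution = conflicts(base, local, project)
```

Only parse.c was changed on both sides, so it alone needs
resolving; main.c can be merged back and emit.c left as it
is in the project without overwriting anyone's work.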

Project Category Interconnections
------- -------- ----------------

     A project component is any piece of information or data
associated  with  a project. There are some broad categories
that the data can be placed in:

        o management reports
        o specification and requirements
        o design, program, and user documentation
        o implementation notes
        o source code
        o object code
        o products

Within each of these categories are  the  actual  documents,
memos,  papers, binaries, etc. that make up the project.  By
placing all the components of a project in  hypertext,  they
are  archived,  recoverable,  and  available  for use within
other parts of the project.

                          specifications
                          /     |     \
                ---------              -------
               /         /      |       |     \ 
              /         /       |       |      \
         design   --------- program -----------   User
       documentation /   documentation  |     documentation
                    /         /   \     |        /
                   /         /     \    |       /
                  /         /       \   |      /
                 /         /         \  |     /
                /         /           \ |    /
               /         /             \|   /
             implementation-----------source ------------- object
                 notes                 code              /  code
                                       /                /
                                      /                /
                                     /                /
                                    /                /
                             symbol ---------- product
                             tables

Figure 3 - The Interconnections of Project Categories in DynamicDesign


     Interconnections between project components exist  even
in  a project which uses paper documents.  However, there is
also much duplication  of  both  effort  and  documentation.
Additionally,  many opportunities to point out the relation-
ships between  components  are  missed  because  the  effort
involved is too great for the time permitted.

     DynamicDesign, the example CASE  environment,  has  all
the  information concerning a project in its hypertext data-
base.   Contexts  are  used  to  group  the  data  into  the
categories mentioned above, shown in Figure 3 and enumerated
in the section describing attribute/value pairs.  The lines,
in  Figure  3  which connect the ellipses, representing con-
texts, show the direct  interconnection  and  interrelation-
ships  between the data in the contexts.  Due to the ability
to use links, one piece of data can be  present  in  several
contexts.  Therefore a paragraph about a design may do
triple duty: as a comment in the program documentation and
as a paragraph in both the user and design documentation.

     Links are used to provide traceability of functional
requirements.  A link from a requirement in specifications
leads to a data flow diagram in design documentation.  From
the data flow diagram, a link leads to a structure chart or
other design document and then back to the requirement.
This provides a check on the one-to-one relationship between
requirements and designs.

     Links also exist from nodes in design documents to
nodes in program documents and then to nodes in source code.
Implementation notes provides a repository for documenting
assumptions and decisions based on the actual implementation
of the design.  There are links between the nodes in
implementation notes and those in design documentation and
source code to record the associations between them.

     One real gain from this system is the ease of
demonstrating that all the requirements have been fulfilled.
Starting at specifications, links radiate out through the
project: through design documentation, implementation notes,
source code, user documentation, and back to specifications,
forming a graph cycle.  Every path from specifications that
is not a cycle indicates an unfulfilled requirement.
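
     A simplified version of this check can be sketched as a
graph search.  The node names are invented, and the test
below is weaker than the one stated above: it asks only
whether at least one path of links leads from a requirement
back to itself, and reports the requirements for which none
does.

```python
# Directed links between nodes; every name here is hypothetical.
links = {
    "req: print report": ["dfd: report flow"],
    "dfd: report flow": ["chart: report module"],
    "chart: report module": ["req: print report"],
    "req: export data": ["dfd: export flow"],
    "dfd: export flow": [],
}


def returns_to(start):
    """True if some path of links leads from `start` back to `start`."""
    stack, seen = list(links.get(start, [])), set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(links.get(node, []))
    return False


unfulfilled = [n for n in links if n.startswith("req:") and not returns_to(n)]
```

The first requirement is traced through a data flow diagram
and a structure chart and back again, so only the second,
whose path stops at its data flow diagram, is reported.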

     DynamicDesign  also  aids  in   program   maintenance.
Maintenance personnel are able to read the designer's design
documents, assumptions, implementation notes and  have  them
linked  directly  to  the relevant sections of the code.  So
the job of gauging the effect of a program  modification  is
aided  by  easy  access  to documentation linked to specific
portions of the code.

     Note in Figure 3 that the object code and symbol
tables, while part of the project, are only related directly
to the source code.  A compiler integrated with hypertext
makes good use of its storage abilities.  Module symbol
tables need not be reconstructed for every compile, but
merely updated.  A symbolic debugger can make excellent use
of the module symbol tables left in hypertext and, since
they are in hypertext and not in the object code, the symbol
table and debugging information can be quite extensive.
When a compiler is used with a facility similar to the UNIX+
make facility to generate recompilation commands,
DynamicDesign becomes a programming environment.

-------------------------
+ UNIX is a trademark of Bell Laboratories.

Conclusions
-----------

     I have shown how hypertext provides an appropriate data
model for CASE, and have provided a number of examples of
how to apply it.  Hypertext is, however, only a part of a
CASE environment, albeit a powerful part.  With the other
parts (editors, compilers, linkers, electronic mail systems,
etc.) integrated with hypertext, there is the possibility of
a powerful and productive programming environment.  Much
work remains for Tektronix and others to fully explore and
devise ways to exploit the capabilities of hypertext.

     Work on building systems using hypertext can focus on
how to automate the creation of sequential and relational
links.  Sequential links show that one node logically
follows another (leadsTo in Figure 1) and relational links
show that two nodes are logically related, but not
sequentially (comments in Figure 1).  By automating the
linking process based on the way a node is used, the user is
spared repetitive linking.  However, there may always be the
need for the means to create a link at the user's command,
to point out a relationship the system has missed and only a
person can see.

     A weakness that has not been addressed is how to
represent fine-grained information.  One solution is to
create a partnership between hypertext and relational
databases.  A relational database can hold fine-grained
information such as definition-use links in an incremental
compiler's symbol tables.  A relationally complete query
language extends the functionality of hypertext to provide
even more capabilities.
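
     The partnership suggested above might look like the
following sketch, which keeps definition-use records in a
relational table beside the hypertext nodes.  The table
layout, the symbol names and the helper function are all
assumptions; SQLite is used only because it ships with
Python.

```python
import sqlite3

# An in-memory relational store standing in for the symbol tables.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE symbol_use (symbol TEXT, node TEXT, kind TEXT)")
db.executemany(
    "INSERT INTO symbol_use VALUES (?, ?, ?)",
    [
        ("buffer_size", "module 1", "definition"),
        ("buffer_size", "module 2", "use"),
        ("buffer_size", "module 3", "use"),
        ("open_log", "module 2", "definition"),
    ],
)


def nodes_for(symbol, kind):
    """Hypertext nodes in which `symbol` is defined or used."""
    rows = db.execute(
        "SELECT node FROM symbol_use WHERE symbol = ? AND kind = ? ORDER BY node",
        (symbol, kind),
    )
    return [node for (node,) in rows]


users = nodes_for("buffer_size", "use")
definers = nodes_for("buffer_size", "definition")
```

Each row names a hypertext node, so the answer to a question
such as "who uses this variable" is a list of nodes that can
then be visited by following links in the usual way.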

References:

See the file in the document source code for references.  Sorry