[comp.software-eng] A New Synthesis of Software Engineering Methodology

ables@lot.ACA.MCC.COM (King Ables) (02/23/90)

I am posting this for someone else who has read-only access to the net.

-------------------cut-here------------------cut-here------------------
.nf

A New Synthesis of Software Engineering Methodology
Edwin L. Cline

The following describes a methodology of programming that can be used
in the construction of large or complex software systems. Both program
intensive and data intensive concepts are integrated into it.  This
methodology is almost entirely based on concepts developed by others.
Concepts included are structured programming, procedures, path
expressions, sequentially and functionally cohesive modules,
antibugging, path testing, standard parts, abstraction, information
hiding, etc.  A heavy backround in the literature of software
methodology is required to understand the reasons for the various
programming constructs that make up this methodology, but is not
required for its use.


There are four main concerns in programming software systems:

1. Decisions: based both on events external to the computer and
   internal events.

2. Data representation: strings, numbers, fields, records,
   words, sentences, paragraphs, etc.

3. Actions: (Execution or procedures) including calculations, data
   transformation, input, output, etc.

4. Environment: the configuration of the computer, memory size, hard
   disk size, how other computers and users communicate with the computer,
   etc.  Those things which the operating system handles, and especially
   what the specific program needs to know to operate successfully.

To prevent the complexity of a program from increasing geometrically
with size, it is necessary to separate decisions, data representation,
and execution from each other; connecting them by standard interfaces
and using a structured methodology.

Basically the way this works is that:

1. All decisions are combined to determine what is to be done.
2. The data required is passed to a module from data base or local
   variables.
3. The module executes data transformations or outputs , etc. without
   making further decisions as to what path to follow.
4. The transformed data may be stored back into a data base or local
   variables, or output, etc.
5. Another combined decision is made.

The combination of decisions and their separation from execution
produces a limited number of possible paths which execution can follow,
making possible exhaustive testing of all paths using all ranges of
data.  Changes made to this kind of program do not cause side effects.
It is enough just to test the path in the program where changes were
made.  Data representation is separated from the programs which use the
data, and  programmers can make changes to the program without having
to understand the whole program.
.bp

To understand the details of this methodology it is necessary to define
and discuss the following programming constructs.

	"Functionally Cohesive Module"
	"Option Module"
	"Path Expression"
	"Goal-State"
	"Context-State"

It is important to note that each of these terms refers to a specific
highly structured programming construct to be described hereafter.
These terms are related to, but NOT the same as commonly used in
programming literature.  Be forewarned, or confusion will result.
The specific programming constructs represented by these terms are
engineering rules which when applied will produce consistent results,
rather than general guidelines which require judgement and experience.

The constructs are logically complete and sufficient to construct all
types of programs and handle every use to which computer programs are
put.  As only sequence, iteration, and alternation are used as the
basis of these constructs, the all and every refer to the effects
produced by these constructs, such as: programs created using these
constructs do not lockup (i.e.  prevents bad system states), can be
"completely" tested, keep functioning in spite of errors, are easily
traceable (i.e.  can be modified, fine-tuned, and maintained), and
finally that programs created using these constructs will fit every
computer solvable problem (i.e.  other constructs are not required).

Please note also, that these constructs are not required in 'quick and
dirty' programs; but are used in the creation of dozens of programs
which must interact with each other, and are to be maintained over
several years by many different programmers.


FUNCTIONALLY COHESIVE MODULE

A "Functionally Cohesive Module" is a callable module, procedure or
process that performs a simple calculation, output, input, operation,
etc.; Always doing it in the same way.  It may have only sequence and
iteration, but not alternation. (No IF ---- THEN ------).  Iteration
is limited to counters. Since there are no path decisions made (no
alternation), the module performs the same way every time it is
called.  Once it is determined by testing that the module actually does
what it is supposed to do, it is not necessary to test it again no
matter where or how many times it is called.  The module can be
described by an expression containing a simple verb,

for example: 
	turn_on_warning_led();		/** 8200 turn on warning led **/
	add_memory1_to_accumulator();	/** 8120 add memory1 to accumulator **/
	clear_the_screen();		/** 8320 clear the screen **/
	gchar=in_modem(); 		/** 9010 get character from modem **/
	outcr_printer();		/** 8500 output CR to printer **/

If the module is a 'read' it needs to be non-blocking(most large programs).
.bp

OPTION MODULE

An "Option Module" is a "Functionally Cohesive Module" preceded by a
simple (true false) decision. The module is executed only if the
condition is met; if the condition is not met then the module is not
executed.  (IF condition THEN module).
For example:

	IF (five lines have been output) THEN output_blank_line
		* output a blank line after every five lines

	IF (option switch is set to COLOR) THEN covert_output_to_color
		* covert to color when switch is set

The decision made in the "Option Module" is different in kind from the
ones made in "Context-States and "Goal-States".


PATH EXPRESSIONS

A "Path Expression" is a Sequentially Cohesive Module made of calls to
only "Functionally Cohesive Modules" and "Option Modules".  It is
tested with all options true and then with all options false.  Once
tested and working properly, it does not need to be tested again.  The
importance of "Path Expressions" lies not only in controlling parallel
programs, its original purpose, but as a means of logical clarity of
programs, separation of decisions and execution, and to provide
filtering to prevent the program from following unwanted paths.


CONTEXT-STATE

Context-states and Goal-states are the constructs used to make
decisions in the program regarding what actions are to be taken.
Whenever the program receives input from a "foreign" source (devices,
other computers, other programs running on the same computer, library
modules, etc.) a Context-state or Goal-state is required to interpret
the input.  Context-states and Goal-states are almost exactly what is
represented by the circle in circles and lines that engineers (hardware
& software) draw to show what a device or a program is supposed to do.
These "states" are NOT 'finite state machine' related or 'state
decision tables', but are the programming constructs described
hereafter and nothing else. The "state" of the program is the
"context-state" currently being executed, which is controlled by the
two integers "current_state" and "next_state".  For example:
--------------------------------------------------------------------
context_state()
{
	I - initialization modules
	while(current_state == next_state)	L - context loop
	{
		M - modules
		G - Goal-states
		D - decision modules
		switch(decision_variable)	S - switch
		{
		}
	}
	E - exit modules
}
--------------------------------------------------------------------
.bp

Most programs (even large ones) will not require that every part of the
context-state be present. If a place is left for each part, the program
will be able to handle any requirement that the future may bring.  If
needed, "context-states" can be table driven.

I - INITIALIZATION MODULES are those modules necessary to start the
operation of the "context-state", and are called only once when the
"context-state" is first entered. ( start timers, set or reset flags,
initialize tables, zero counters, etc.)  A standard initialization module
is first called.  This module is a path expression of modules which are
used to initialize every "context-state". Next exception modules are called
which initialize, and are specific to the current "context_state".
Initialization starts a timer.

L - LOOP  A context-state loops until current_state does not equal next_state.

M - MODULES "Functionally Cohesive Modules" and "Option Modules" that
are called on each loop of the "context-state".  First standard modules
(common to all context-states") and then exception modules that are
specific to the "context-state".

G - GOAL STATES have generally the same parts as a "context-state"
(ILMDE).  They are called by the context-state, with their only exit a
return to the calling "context-state".  "Goal-states" are a path
expression of "states" which are to accomplish a goal, requiring
interaction with a "foreign" source.  They return an integer value of
'Y' for successful completion, 'N' for unsuccessful, 'I' for Idle (no
communication at all), 'P' for partial completion, and 'A' Active (not
yet finished - in non-blocking "goal-states").
for example:
	ret=gs_obtain_environment_variables();
	ret=gs_logon_to_mainframe();
	ret=gs_enter_number_from_keypad();
	ret=gs_load_database_with_amount_field();

D - DECISION MODULES  Those modules called to determine a value to
place in the integer "decision_variable". (checks of flags and timers,
calculation, inputs, data received, the return value of "Goal-states",
etc.) First standard decision modules and then exception decision
modules.

S - SWITCH is a decision. Based on the integer "decision_variable", a
"path expression" is executed and next_state may be changed.  If the
number of cases becomes very large, a binary search may be needed, or a
linked-list can be created that is self-adjusting according to usage.
One of the cases is TIMEOUT.  Every loop (L) has a timeout. (i.e. the
program will not wait forever for some event that is not going to
happen.)  Exceptions are checked FIRST and then a standard SWITCH
module (the same for all "context-state" switches) is called.  All
expected values should be represented, so that the default will be
logically impossible, or something that will 'never happen'.  The
default logs the error, and the "context-state" number. As the
environment changes over time, this feature will save hours or days of
tracing; It also makes explicit any deviations between expectations and
reality during development.


E - EXIT MODULES are those modules necessary to finish the operation of
the "context-state", and are called only once when the "context-state"
is exited.(Standard modules and exception modules.)
.bp

The following is an outline program of a simple integer calculator and
is an example of the use of "context-states".  'K' is the keyboard
value.  'A' is the accumulator. Normally, in a larger program,
statements such as A=A+K; would be given a name.  The calculator has
four functions:  '+' add, '-' subtract, '*' multiply, and '/' divide;
and two clear keys:  CE clear entry; and CL clear all.  The program
does not require timers, initialization modules or exit modules (within
the "context_state").
--------------------------------------------------------------------

main()
{
	next_state=CS_init;
	while(next_state!=CS_exit)
	{	current_state=next_state;
		switch(next_state)
		{	case CS_init:	 cs_init(); break;
			case CS_addpend: cs_addpend(); break;
			case CS_subpend: cs_subpend(); break;
			case CS_mulpend: cs_mulpend(); break;
			case CS_divpend: cs_divpend(); break;
			case CS_exit:	 cs_exit(); break;
			default: m_out_cs_error(current_state,next_state);
				next_state=CS_init; break;
		}
	}
}

cs_stan_switch(kchar)	/** S - the switch common to all context-states **/
int kchar;
{
	switch(kchar)
	{	case '0-9': K=(10*K)+kchar; m_displayK(); break;
		case '+': m_displayA(); next_state=CS_addpend; break;
		case '-': m_displayA(); next_state=CS_subpend; break;
		case '*': m_displayA(); next_state=CS_mulpend; break;
		case '/': m_displayA(); next_state=CS_divpend; break;
		case '=': m_displayA(); next_state=CS_init; break;
		case CE: K=0; m_displayK(); break;	/** clear entry **/
		case CL: next_state=CS_init; break;	/** clear all **/
		default: m_out_stan_switch_error(current_state,kchar);
			next_state=CS_init; break;
	}
}

cs_init()		/** context-state initialization **/
{
	A=0; K=0;
	while(current_state==next_state)/** L - Loop **/
	{	kchar = get_key();	/** D - decision **/
		switch(kchar)		/** S - switch **/
		{	case '=': case '+': case '-': case '*': case '/':
			  A=K; cs_stan_switch(kchar); break;
			default: cs_stan_switch(kchar); break;
		}
	}
}
.bp


cs_addpend()		/** context-state addition pending **/
{
	while(current_state==next_state)/** L - Loop **/
	{	kchar = get_key();	/** D - decision **/
		switch(kchar)		/** S - switch **/
		{	case '=': case '+': case '-': case '*': case '/':
			  A=A+K; cs_stan_switch(kchar); break;
			default: cs_stan_switch(kchar); break;
		}
	}
}

cs_subpend()		/** context-state subtraction pending **/
{
	while(current_state==next_state)/** L - Loop **/
	{	kchar = get_key();	/** D - decision **/
		switch(kchar)		/** S - switch **/
		{	case '=': case '+': case '-': case '*': case '/':
			  A=A-K; cs_stan_switch(kchar); break;
			default: cs_stan_switch(kchar); break;
		}
	}
}

cs_mulpend()		/** context-state multiplication pending **/
{
	while(current_state==next_state)/** L - Loop **/
	{	kchar = get_key();	/** D - decision **/
		switch(kchar)		/** S - switch **/
		{	case '=': case '+': case '-': case '*': case '/':
			  A=A*K; cs_stan_switch(kchar); break;
			default: cs_stan_switch(kchar); break;
		}
	}
}

cs_divpend()		/** context-state division pending **/
{
	while(current_state==next_state)	/** L - Loop **/
	{	kchar = get_key();	/** D - decision **/
		switch(kchar)		/** S - switch **/
		{	case '=': case '+': case '-': case '*': case '/':
			  A=A/K; cs_stan_switch(kchar); break;
			default: cs_stan_switch(kchar); break;
		}
	}
}
--------------------------------------------------------------------
.bp

The next example is portions of a documentation file for a program that
logs onto a mainframe and downloads files.  This shows the name and the
numbers of the modules.  (Basic has a methodology contribution after
all.)
---------------------------------------------------------------------
/** CONTEXT-STATES 0-2999 **/
cs_init();		/** 0100 initialize program **/
cs_variable();		/** 0200 obtain user defined variables **/
cs_logon();		/** 0300 logon to mainframe **/
cs_get_list();		/** 0400 get list of files **/
cs_download();		/** 0500 download the files **/
cs_logoff();		/** 0600 logoff the mainframe **/
cs_exit();		/** 0999 exit the program **/
--------------------------------
/** GOAL-STATES 3000-5999 **/
. . . .
gs_logon();		/** 3100 Goal-state logon to mainframe **/
gs_stan_3105();		/** 3105 standard switch,CP READ,MORE,HOLDING etc.**/
  gs_init();		/** 3110 initialize gs_logon goal-state **/
  gs_termid();		/** 3120 wait for TERMINAL ID; output logon id **/
  gs_entpass();		/** 3130 wait for ENTER PASS; output passwd **/
  gs_rread();		/** 3140 wait for CP READ **/
  gs_rsemi();		/** 3150 wait for Ready; **/
  gs_running();		/** 3160 wait for RUNNING **/
   gs_exitY();		/** 3170 successful completion **/
   gs_exitN();		/** 3195 unsuccessful completion **/
   gs_exitI();		/** 3196 unsuccessful - port Idle **/
   gs_exitP();		/** 3197 partial completion **/
   gs_exitA();		/** 3198 still trying **/
gs_exit();		/** 3199 exit gs_logon, returning exit value **/
. . . .
--------------------------------
/** PATH EXPRESSIONS 6000-7999 **/
. . . .
--------------------------------
/** FUNCTIONALLY COHESIVE MODULES & OPTION MODULES 8000-8899 **/
. . . .
--------------------------------
m_out_clear();		/** 8100 output clear to main frame **/
m_outipl();		/** 8110 output 'ipl cms\r' to main frame **/
m_out_logoff();		/** 8120 output LOGOFF to main frame **/
m_flush_outbuf();	/** 8200 flush output buffer	 **/
--------------------------------
/** LIBRARY FUNCTIONALLY COHESIVE MODULES & OPTION MODULES 8900-8999 **/
. . . .
--------------------------------
/** I/O FUNCTIONALLY COHESIVE MODULES & OPTION MODULES 9000-9999 **/
. . . .
i_mfp_open(); 		/** 9110 open main frame port **/
i_mod7_open();		/** 9120 open modem port #7 **/
i_mfp_close(); 		/** 9210 close main frame port **/
i_mod7_close();		/** 9210 close modem port #7 **/
i_mfp_ioctl(); 		/** 9310 IOCTL output **/
vchar=i_mfp_in();	/** 9410 input char from main frame port **/
i_mfp_out(s); 		/** 9510 output string to main frame port **/
. . . .
---------------------------------------------------------------------
.bp

A descriptive name along with a number and functional description is
given to each module.  No parameters are passed to any numbered module
(unless it is obvious from the module name what is meant).  Things
such as "kcret=kctal(a,b,def1,&bufx[KCTAL]);" only occur inside of
"functionally cohesive modules", 8000 and above. This makes it easier
to understand, and also concentrates like modules so that the work
of one programmer in a specific area can be used by others through
the use of the descriptive name.  Modules can be pulled out of the
documentation file to create new "context-states", "path expressions",
etc.  Variables are local to logical numeric divisions of the numbered
modules. (i.e. file descriptors returned by open are local to
9000-9599, etc.)


DATA REPRESENTATION

The way data is represented in this methodology, while fairly clear to
IS programmers, will no doubt strike most 'C' programmers as strange.
Complex structures become difficult to work with, when there are
thousands of data objects and hundreds of data transformations.  The
'big ole matrix' does not work well in data intensive programs. And on
top of that, if the program is also program intensive, with hundreds
(or thousands) of modules, accessing large amounts of data, making
thousands of data tranformations, the representation of data requires
simplicity. The primary goal of data intensive programs is to make the
data independent of the programs which use that data, and standardize
the interface between programs.

The pattern of data representation is (with some modifications) based
on traditional data intensive methods, also Unix pipes, where the ouput
of one command becomes the input of next, and some Unix commands, such
as 'dd' where parameters are passed to the command in the form
(option=value:  if=infile, bs=512, conv=ascii).

The general rules are: data is passed between modules and programs, and
stored in files as a records (lines) of fields separated by '|'
character.  (there are reasons for not using white space as the field
separator.) The fields are variable length ascii strings of printable
characters.  The names of the fields can be given in the first record
of a file, or the names can be attached to each field with an '='
character.  Variables internal to the program should be of as few types
as possible:  generally character buffers, integers, and in some cases
doubles, indexed as needed.  For those who are incredulous, I am
familiar with D. Ritchie's quote on "records being an obsolete holdover
from the days of punched cards". Perhaps a more useful quote to
consider, on this point and others made in this paper, is that of
C.A.R. Hoare, about "the problems of complex software that masquerades
as power and sophistication."



The following example is a shell script used to obtain a report of the
number of males in Travis County, Texas in the Crockett Park area,
listed by age.  The census has persons and females, so that persons
less females equal males.  Note that a thousand more fields could be
added without increasing the program's complexity. Note also it is an
'open' rather than a 'closed' 4th generation program.
.bp


--------------------------------------------------------------------
gs_sql\  
	database=type1A location=corporate3\  
	select\  
	 state county tract4 tract2\  
	 p0to4 p5to9 p10to14 p15to19 p20to24 p25to29 p30to34\  
	 p25to44 p45to54 p55to59 p60to64 p65to74 p75to84 p85\  
	 f0to4 f5to9 f10to14 f15to19 f20to24 f25to29 f30to34\  
	 f25to44 f45to54 f55to59 f60to64 f65to74 f75to84 f85\  
	from census90 where state=TX and county=Travis\  
|m_remove-spaces\  
|m_calc\  
	 read state county tract4 tract2\  
	 p0to4 p5to9 p10to14 p15to19 p20to24 p25to29 p30to34\  
	 p25to44 p45to54 p55to59 p60to64 p65to74 p75to84 p85\  
	 f0to4 f5to9 f10to14 f15to19 f20to24 f25to29 f30to34\  
	 f25to44 f45to54 f55to59 f60to64 f65to74 f75to84 f85\  
\  
	tractx=trac4+{tract2/100} m0to4=p0to4-f0to4 m5to9=p5to9-f5to9\  
	m10to14=p10to14-f10to14 m15to19=p15to19-f15to19\  
	m20to24=p20to24-f20to24 m25to29=p25to29-f25to29\  
	m30to34=p30to34-f30to34 m35to44=p35to44-f35to44\  
	m45to54=p45to54-f45to54 m55to59=p55to59-f55to59\  
	m60to64=p60to64-f60to64 m65to74=p65to74-f65to74\  
	m75to84=p75to84-f75to84 m85=p85-f85\  
\  
	output tractx m0to4 m5to9 m10to14 m15to19 m20to24 m25to29 m30to34\  
	m25to44 m45to54 m55to59 m60to64 m65to74 m75to84 m85\  
|m_pattern_grep\ 		#pass on records where tractx = number  
	field=1\  
	190.09 181.07 181.06 181.14 126 127 128 129 122.03 122.01\  
|m_crossfoot\		#total across creating total field  
|m_total\		#total down creating new record at end  
|m_format\  
	read tractx m0to4 m5to9 m10to14 m15to19 m20to24 m25to29 m30to34\  
	m25to44 m45to54 m55to59 m60to64 m65to74 m75to84 m85 total\  
\  
	default_format=%d ftractx=%7.2f\  
\  
	output tractx m0to4 m5to9 m10to14 m15to19 m20to24 m25to29 m30to34\  
	m25to44 m45to54 m55to59 m60to64 m65to74 m75to84 m85 total\  
\m_print\  
	read tractx m0to4 m5to9 m10to14 m15to19 m20to24 m25to29 m30to34\  
	m25to44 m45to54 m55to59 m60to64 m65to74 m75to84 m85 total\  
	heading1='1990 CENSUS - AGE GROUP BY TRACTS'\  
	heading2='TRAVIS COUNTY - MALES'\  
	heading3='Crockett Park'\  
	columnA=AGE 0-4 5-9 10-14 15-19 20-24 25-29 30-34\  
	25-44 45-54 55-59 60-64 65-74 75-84 85+ TOTAL\  
	columnB=TRACT\  
	defaultwidth=6 width_tractx=9 width_total=8\  
	pagelength=66\  
> Travis.cp.males  
lp Travis.cp.males  
--------------------------------------------------------------------
This report took 20 minutes to write, test, run, print, and hand to the
person requesting it.  (i.e. a new report while you wait).
.bp

The next example is that of natural language understanding. The program
uses both fields and naming of variables in a free form manner, where
the output of one context-state becomes the input of the next.

--------------------------------------------------------------------
cs_init_structure();		/** 8700 get english sentence and divide **/
cs_surface_structure();		/** 8710 determine surface structure **/
cs_deep_structure();		/** 8720 determine deep structure **/
cs_numbers();			/** 8730 add word numbers **/
cs_meaning();			/** 8740 determine meaning **/



input - The small running horse, stepped in a large hole, and broke its leg.
cs_init_structure();
output-	The|small|run|ing|horse|,|step|ed|in|a|large|hole,|and|broke|it|'s|leg|.

cs_surface_structure();
output-	ART=The|ADJ=small|VERB=run|VERBMARK=ing|NOUN=horse|COMMA=,|VERB=step\
	|VERBMARK=ed|PREP=in|ART=a|ADJ=large|NOUN=hole|COMMA=,|CONJ=and\
	|VERB=broke|PRO=it|POSS='s|NOUN=leg|.

cs_deep_structure();
output-	start|S=horse,V=run,O=-| S=horse,V=step,O=-| S=it,V=break,O=leg|\
	S=it,V=owns,O=leg| S=horse,V=is,O=it| S=horse,V=is,O=small|\
	S=step,V=is-in,O=hole| S=hole,V=is,O=large|\
	S=step,V=is,O=past| S=run,V=is,O=past| S=break,V=is,O=past|stop

cs_numbers();
output-	start|
	horse=(214,215,366,373,726)|run=(29,264,274,278,281,335,348,613)|
	leg=(215,266,298,792,477,479,732)|step=(26,71,215,264,466,626,680)|
	break=(44,70,140,198,279,328)|small=(32,193,930,879)|
	large=(31,192,748,750)|hole=(7,182,189,191,251,260)|
	S=horse,V=run,O=-|
	S=horse,V=step,O=-|
	S=horse,V=break,O=leg|
	S=horse,V=owns,O=leg|
	S=horse,V=is,O=small|
	S=step,V=is-in,O=hole|
	S=hole,V=is,O=large|
	s=run,v=is,o=past|
	s=step,v=is,o=past|
	s=break,v=is,o=past|
	stop

cs_meaning();
output-	start|
	S=horse.366,	V=run.264,	O=-.00|
	S=horse.366,	V=step.264,	O=-.00|
	S=horse.366,	V=break.44,	O=leg.266|
	S=horse.366,	V=owns.00,	O=leg.266|
	S=horse.366,	V=is.00,	O=small.193|
			S=step.264,	V=is-in.00,	O=hole.260|
					      S=hole.260,V=is.00,O=large.192|
			s=run.264,	v=is.00,	o=past.00|
			s=step.264,	v=is.00,	o=past.00|
			s=break.44,	v=is.00,	o=past.00|
	stop
--------------------------------------------------------------------

Once the meaning is obtained, it can be recast into another language,
or used to answer questions, etc.
.bp

In closing, many things have been left unsaid, mostly because the time
and talent is lacking to express myself more fully. I would be very
happy to discuss the subject in more detail with any individual who has
been able to follow what I have been trying to say.

Edwin L. Cline
7401 N. Lamar  #103
Austin, Texas  78752
(512) 452-3962
(319) 326-3969