[net.arch] Automated Structured Analysis

bill@milford.UUCP (bill) (01/31/85)

One pet concern of mine has been whether the development process
could be automated to the extent that structure charts might
no longer be necessary, i.e.


UNIX as a virtual dataflow machine:
                                  

Using one of the most common examples, a spelling checker: 

                                                       ---------   
                                   ---------  words   /         \  
                                  /Have each\------->/ Sort the  \ 
                                 / word be on\       \ words, no / 
        ------------  stripped   \ a separate/        \duplics. /  
       /            \------------>\  line   /          ---------   
      /remove gramm. \document     ---------                |  
     | symbols from   |                                     |unique
     |  the text      |                                     |words  
      \              /                                      V       
    >  \            /                                    ----------   
   /    ------------             ______________         /          \  
  /                               dictionary    ------>/find words  \ 
 /document                       --------------        \  not in    / 
/                                                       \dictionary/  
                                                       / ----------   
                                                      / 
                                             <------- misspelt words


From such a DFD could be constructed the shell script:

tr -cs 'A-Za-z' '\040' |
sed "s/  */\
/g" |
sort -u |
comm -23 - dictionary

(NOTE: I haven't tested this shell script, it's just an illustration (:-)

If the DFD is a network, it gets more complicated, but the "readslow"
routine supplied by Kernighan and Pike (in The UNIX Programming
Environment) can provide the key:
                                                           

                                      --------   
                          ---------> /        \  
               ------    /           \ do_b   /\            --------   
              /      \--/\            --------  \          /        \  
             /        \   \                      -------->/          \ 
------------>\ do_a   /    \                             |do_b_do_b_do|
              \      /      \          --------           \          / 
               ------        \        /        \---------->\        /  
                              ------>\ do_c     /           --------   
                                      \        /                       
                                       --------                        
                                                                       
could become:
             
  ... | do_a | tee temp1 | do_b ...&
readslow temp1 | do_c ...

"Structurally, readslow is identical to cat except that it loops
instead of quitting when it encounters the current end of input."
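
To make this concrete, here is the shape of such a routine in C.
This is just a minimal sketch of the idea, not Kernighan and Pike's
actual code; the buffer size and the sleep interval are arbitrary
choices:

/* readslow: copy input to output, but at the current end of
 * input nap and retry instead of quitting */
#include <unistd.h>

int main(void)
{
    char buf[512];
    ssize_t n;

    for (;;) {
        while ((n = read(0, buf, sizeof buf)) > 0)
            write(1, buf, n);
        sleep(5);       /* end of input so far: wait for more */
    }
}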

Adjacent to this is the question of whether any data flow diagram
could be decomposed into a combination of 'tee', 'joinslow' (like
the do_b_do_b_do above), 'comm', and single-input-single-output
filters. That is: 1) fan_in=2, fan_out=1, 'a collector'; 2) fan_in=1,
fan_out=2, 'a splitter'; and 3) fan_in=1, fan_out=1, 'a transformer'.
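
There is no 'joinslow' tool, of course; here is a minimal C sketch
of what I have in mind, assuming line-oriented data and readslow-style
retries at the current end of input (like readslow, it never exits
on its own):

/* joinslow: a hypothetical two-input "collector".  Merge complete
 * lines from two growing files onto standard output, retrying at
 * end-of-file the way readslow does.  Output order is simply
 * "whichever input has a complete line ready". */
#include <stdio.h>
#include <unistd.h>

struct input {
    FILE *fp;
    char  buf[BUFSIZ];  /* partial line carried across retries */
    int   len;
};

/* emit one buffered line from in, if one is ready; return 1 if so */
int copyline(struct input *in)
{
    int c;

    while ((c = getc(in->fp)) != EOF) {
        in->buf[in->len++] = c;
        if (c == '\n' || in->len == (int)sizeof in->buf) {
            fwrite(in->buf, 1, in->len, stdout);
            fflush(stdout);
            in->len = 0;
            return 1;
        }
    }
    clearerr(in->fp);   /* current end of input; more may arrive */
    return 0;
}

int main(int argc, char *argv[])
{
    struct input a = { 0 }, b = { 0 };

    if (argc != 3 || (a.fp = fopen(argv[1], "r")) == NULL
                  || (b.fp = fopen(argv[2], "r")) == NULL) {
        fprintf(stderr, "usage: joinslow file1 file2\n");
        return 1;
    }
    for (;;)
        if (copyline(&a) + copyline(&b) == 0)
            sleep(5);   /* neither had a line ready; wait */
}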

These would run as slooooooow as molasses of course...
but is it conceivable that a "supercomputer" could be
set up as a virtual Unix dataflow machine?
Elements such as 'grep', 'sort', and 'tr' could be viewed as
analogous to irreducible machine operations: 'lea', 'addb', etc.
The objects acted upon by 'grep' would be lines, by 'sort'
complete files, by 'tr' single characters; when a completed
object became available to such a dataflow element, {the kernel?}
would activate that process element, hopefully producing another
object needed by another process. I suspect that what currently
eats up time in the pipelines shown above is each routine testing
again and again for objects to act upon.
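
To make the activation idea a little more concrete, here is a
user-level sketch in C; it is purely illustrative (the stage table
and the 'upcase' transformer are invented), and it uses 4.2BSD's
select(2) to wake a transformer only when its input actually has
something ready, instead of letting every stage poll for itself:

/* a dispatcher that activates a stage only when its input
 * descriptor has a completed object (here, just "some bytes") */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/select.h>

struct stage {
    int   fd;                   /* input descriptor to watch */
    void (*fire)(int);          /* transformer to activate   */
};

/* toy transformer: uppercases whatever bytes are ready */
void upcase(int fd)
{
    char buf[512];
    ssize_t i, n = read(fd, buf, sizeof buf);

    if (n <= 0)                 /* real end of input: stop */
        exit(0);
    for (i = 0; i < n; i++)
        if (buf[i] >= 'a' && buf[i] <= 'z')
            buf[i] += 'A' - 'a';
    write(1, buf, n);
}

int main(void)
{
    struct stage stages[] = { { 0, upcase } };  /* stdin -> upcase */
    int i, n = sizeof stages / sizeof stages[0];

    for (;;) {
        fd_set ready;
        int maxfd = 0;

        FD_ZERO(&ready);
        for (i = 0; i < n; i++) {
            FD_SET(stages[i].fd, &ready);
            if (stages[i].fd > maxfd)
                maxfd = stages[i].fd;
        }
        if (select(maxfd + 1, &ready, NULL, NULL, NULL) <= 0)
            return 1;
        for (i = 0; i < n; i++)
            if (FD_ISSET(stages[i].fd, &ready))
                stages[i].fire(stages[i].fd);   /* activate */
    }
}

The only point of the sketch is the dispatch loop: stages run when
their objects are ready, not on their own schedule.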

Well, enough of this 3/4 baked idea, any feedback?

donz@tekcbi.UUCP (Don Zocchi) (02/13/85)

I have used Structured Analysis (Data Flow Diagramming) for a number of years.
It is a very useful tool for helping you communicate/think about the user's
problem.  I have also carried it to extremes.  In the limit (approaching the
level of coding), SA becomes cumbersome (even with Tek's SA_Tools).
A perfect example of this is performing an SA on the FFT problem.
When I did an SA for an FFT (which I was retrofitting into an assembly language
system), I found that one line of C pseudo-code broke down into a full data
flow diagram.  Let's apply this to a 10K-line program: where are you going to
keep the diagrams?  How will you keep track of each diagram's interactions?
At the coding level, SA tends to fall apart, because it is an analysis tool.
I see merit in your striking out at the SA/code windmill, but the transition
from SA to code has been less severe than the transition from Current System
Problems to Solution (Code).

			Don Zocchi
			(The opinions expressed above don't reflect
			my employer's or anyone else's anything.)

robison@uiucdcsb.UUCP (02/14/85)

I'm working on something similar to a UNIX data-flow machine.

I have a pipe-like syntax for Backus' functional programming language.
I believe it does not have to run slowly on an ordinary von Neumann machine.
All that is necessary is a good globally optimizing compiler.  This
isn't as hard as it sounds (I hope! It's my thesis!), since the FP or
pipe description is at a high enough level that the semantics aren't buried
under a lot of bit pushing.  A typical optimization that can be seen
at this level is:

     printf ("%f",F); /* in program sourcing pipe */
     
     ...

     scanf ("%f",&F);  /* in program reading pipe */ 

Clearly the printf/scanf pair is an identity that can be removed.
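
To make that concrete, here is a toy C example of the fused result;
the 'compute' and 'consume' stages are made up for illustration:

#include <stdio.h>

/* hypothetical stand-ins for the programs at the two ends of
 * the pipe */
double compute(void)     { return 3.14159; }
void   consume(double g) { printf("got %f\n", g); }

int main(void)
{
    /* Unoptimized, the value would round-trip through text:
     * printf("%f", F) in one process, scanf("%f", &F) in the
     * other.  With that identity removed, the fused program
     * passes the value directly: */
    consume(compute());
    return 0;
}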

An advantage of FP is that the clumsy "tee" program is not necessary.
The splitting of pipes can be done with functional forms.
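
For example, Backus' construction form [f, g] applies both functions
to the same input, which is just what 'tee' and two pipes simulate.
A rough C rendering of the idea (all the names here are mine):

#include <stdio.h>

/* a pair of results, standing in for FP's sequence <f:x, g:x> */
struct pair { double first, second; };

double f(double x) { return x * 2.0; }
double g(double x) { return x + 1.0; }

/* the construction form [f, g]: one input, two outputs -- the
 * "splitter" role, with no temp file or tee in sight */
struct pair construct(double (*f1)(double), double (*f2)(double),
                      double x)
{
    struct pair p;
    p.first  = f1(x);
    p.second = f2(x);
    return p;
}

int main(void)
{
    struct pair p = construct(f, g, 10.0);
    printf("%f %f\n", p.first, p.second);
    return 0;
}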

(Backus' FP is described in his Turing lecture in the Aug. 1978 issue of CACM)