[comp.lang.misc] dataflow shell

throopw@sheol.UUCP (Wayne Throop) (11/19/89)

> dave@fps.com (Dave Smith)
> Unix pipe programs are one dimensional because they're limited by the shells
> to be one dimensional.   [...]
> I'd prefer to [..do something like..]:
>				    /-awk 'control string 1' > x1
>	producer_prog | sed | grep <
>				    \-awk 'control string 2' > x2

This particular trick can actually be done with the Bourne shell as it
currently exists.  Of course, just arbitrarily drawing two dataflows out
of the grep doesn't make the program capable of producing two
datastreams.  We would have to get used to more "pipe fitting" programs
or switches to existing programs.  I'll ignore that, and produce just
the example above:

   producer_prog|sed|(grep|awk 's1' >x1&) 3>&1 | awk 's2' >x2

(This is, of course, assuming that "grep" produces one datastream
 on descriptor 1, and another on descriptor 3.)
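The descriptor-3 trick can be tried with a stock shell if something plays the part of the two-stream grep. In this minimal sketch, split2 is a hypothetical shell function (not an option of any real grep), and the sample input and the "an" pattern are made up; the output files x1 and x2 follow the example.

```shell
#!/bin/sh
# split2 is a hypothetical "pipe fitting" standing in for a grep that can
# emit two streams: lines containing $1 go to fd 1, all other lines go to
# fd 3. It assumes the caller has opened fd 3.
split2() {
    while IFS= read -r line; do
        case $line in
            *"$1"*) printf '%s\n' "$line" ;;
            *)      printf '%s\n' "$line" >&3 ;;
        esac
    done
}

# Same shape as the one-liner: inside the braces, fd 1 feeds the first awk
# and fd 3 is a copy of the group's stdout, i.e. the pipe to the second awk.
printf 'apple\nbanana\ncherry\n' |
    { split2 an | awk '{ print "s1: " $0 }' > x1; } 3>&1 |
    awk '{ print "s2: " $0 }' > x2
```

Braces replace the parenthesized background job here, so the shell waits for both branches and x1 is complete when the command returns.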

> The other thing that would be quite nice would be to have more than one
> input to a program be a pipe [..like..]
>	pipe1 -\
>		> diff
>	pipe2 -/

This, too, is possible in the Bourne shell off-the-shelf.  Again, it is
necessary to suppose that diff can take two input streams, and so this
would require a variant of diff, or an additional switch to diff.  But
be that as it may:

      pipe1 | (pipe2 | diff) 3<&0

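(This is assuming that "diff" will take an additional input on f.d. 3,
 and that pipe2 does not itself read standard input, since inside the
 parentheses its stdin is also connected to the output of pipe1.)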
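Real diff reads no descriptor 3, but on systems that provide /dev/fd (Linux and the BSDs do; this is an assumption, not something the Bourne shell guarantees) the same fan-in can be approximated by naming the descriptor as a file. A minimal sketch, with printf standing in for pipe1 and pipe2:

```shell
#!/bin/sh
# printf stands in for pipe1 (outer) and pipe2 (inner). 3<&0 makes fd 3 of
# the group a copy of its stdin, i.e. the output of pipe1; diff reads that
# descriptor through /dev/fd/3 (assumed to exist) and its own stdin ("-")
# from pipe2. pipe2 reads /dev/null so it cannot consume pipe1's data.
printf 'a\nb\nc\n' |
    { printf 'a\nB\nc\n' < /dev/null | diff /dev/fd/3 - > d.out; } 3<&0 ||
    [ $? -eq 1 ]    # diff exits 1 when the inputs differ, as they do here
```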

> Once you've got this dataflow shell, scaling it up and adding features to it
> to make it a full-fledged language wouldn't be too hard.  One of these days...

Well, ugly as the Bourne shell syntax for these things is, and limited
as it is (it can't, for example, produce two dataflows between just two
processes, nor a circular dataflow, nor many others), it IS ALREADY a
start.  Why not start using it, and expand on it by writing pipe
fittings as needed, like the "split" option of grep, or the "join"
option of diff, and others? Then, when a fancy graphical user interface
that allows multiple data flows into and out of programs comes along, the
primitives needed to use it will already exist.

(Another very interesting "pipe fitting" program is one that will
 produce named pipes in the file system.  That way, programs which
 require filenames (such as diff) can be put into a pipe.  The above
 examples might be done like so with two primitives:
         ... |psplit "grep e1|awk 's1' >x1" "grep e2|awk 's2' >x2"
         pjoin f1 "pipe1" f2 "pipe2" 'diff $f1 $f2' | ...
 Once you've written psplit and pjoin, you can do a lot, fairly nicely.
 Further, a graphical dataflow program could be trained to use them to
 produce the various fittings in the dataflow, while using existing
 unix tools as the nodes. )
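psplit and pjoin are hypothetical; no standard programs by those names exist. Below is a minimal sketch of both as POSIX shell functions, assuming mkfifo and a mktemp that supports -d (common on Linux and the BSDs, though not in POSIX). psplit copies its stdin into one named pipe per command with tee; pjoin starts each producer writing into its own named pipe, exports the pipe's path under the given variable name, and then runs the consumer. The consumer is single-quoted so that $f1 and $f2 are expanded inside pjoin rather than by the caller's shell.

```shell
#!/bin/sh
# Hypothetical pipe fittings; psplit and pjoin are not standard programs.

# psplit CMD...: give every CMD its own copy of stdin via named pipes.
# Assumes each CMD reads all of its input; one that exits early makes tee
# fail with a broken pipe.
psplit() (
    dir=$(mktemp -d) || exit 1
    n=0
    for cmd in "$@"; do
        n=$((n + 1))
        mkfifo "$dir/s$n" || exit 1
        sh -c "$cmd" < "$dir/s$n" &    # reader n blocks until tee opens s$n
    done
    set --                             # rebuild "$@" as the list of FIFOs
    i=1
    while [ "$i" -le "$n" ]; do
        set -- "$@" "$dir/s$i"
        i=$((i + 1))
    done
    tee "$@" > /dev/null               # copy stdin into every FIFO
    wait
    rm -rf "$dir"
)

# pjoin VAR CMD [VAR CMD]... CONSUMER: run each CMD writing into a fresh
# named pipe, export that pipe's path as VAR, then run CONSUMER.
# If CONSUMER never opens one of the pipes, the final wait hangs.
pjoin() (
    dir=$(mktemp -d) || exit 1
    n=0
    while [ "$#" -gt 1 ]; do
        n=$((n + 1))
        mkfifo "$dir/j$n" || exit 1
        export "$1=$dir/j$n"           # e.g. f1=<tmpdir>/j1
        sh -c "$2" > "$dir/j$n" &      # producer blocks until CONSUMER opens it
        shift 2
    done
    sh -c "$1"                         # the consumer is the last argument
    status=$?
    wait
    rm -rf "$dir"
    exit "$status"
)

printf 'apple\nbanana\ncherry\n' | psplit 'grep an > s1.out' 'grep -v an > s2.out'

# diff exits 1 when its inputs differ, which is the expected case here.
pjoin f1 "printf 'a\nb\n'" f2 "printf 'a\nc\n'" 'diff $f1 $f2' > j.out ||
    [ $? -eq 1 ]
```

Named pipes rather than temporary files let producers and consumers run concurrently, so neither fitting has to hold a whole stream on disk.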

In any event, if one really wants to exploit complicated pipe dataflows,
the thing to do is just go ahead and do it something like the way outlined
above.  Waiting for neat graphical-oriented dataflow shells to become
widely available just hasn't worked for me in the past... this has.
--
Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org

mjs@hpfcso.HP.COM (Marc Sabatella) (11/21/89)

>> Unix pipe programs are one dimensional because they're limited by the shells
>> to be one dimensional.   [...]
>> Once you've got this dataflow shell, scaling it up and adding features to it
>> to make it a full-fledged language wouldn't be too hard.

>This particular trick can actually be done with the bourne shell as it
>currently exists.  Of course, just arbitrarily drawing two dataflows out
>of the grep doesn't make the program capable of producing two
>datastreams.  We would have to get used to more "pipe fitting" programs
>or switches to existing programs.

The real limitation here, as has been observed, is not the shell; it is
the general rule that most programs are written to understand stdin and
stdout and to ignore other streams.

As part of an MS project on adding message-based IPC to a simple multitasking
OS, we added to the OS and the interpreter (a Forth interpreter; think of it
as the shell) all the features needed to do this.  The only two reasons I
didn't actually do it were that there was no obvious convenient syntax to use,
and no time to think one up and implement it.

Anyone interested in "dataflow shells", please read on.  Can you suggest a
clean shell syntax for what we are doing?

What we had was the following (translated to be as Unix-like as possible):
All I/O is done through message ports (named pipes?), each of which can have
an arbitrary number of readers and writers.  Anyone can send (write) to
a port (assuming correct permissions if translated to Unix) with no special
preparation.  To receive messages (read), you connect to (open?) the port,
which places you on its 'mailing list'.  All future messages sent to that port
are forwarded to you, as well as to anyone else connected to the port.
Messages accumulate in your mailbox (no good analogy) and may be retrieved at
your leisure.

Given this setup, one implemented "dataflow pipes" by brute force.  You have
a command port shared by all processes in the pipe, and as many data ports as
you wish.  You start up each process with a list of the named ports it is to
use as input and output, and the code within each process would then connect to
each of its input ports.  It was not necessary to use more than one output port
(since ports can have an arbitrary number of readers) unless you wished to
produce different streams (say, stdout and stderr).  And since this is a
real-time OS, and since messages sent to a port are lost if no one is there to
receive them, each process in the pipe connected to the command port and waited
for the 'start' command.

One example of this was a pipe that took input from an arbitrary number of
sources (usually at least a disk file and a synthesizer), merged the input
streams (sorted in real time order), and wrote the result to an output port.
The output was often redirected to at least two devices - a disk file, and a
synthesizer.  Disk files and synthesizers had device drivers which allowed them
to be treated as ports for input, and as processes for output.

The commands to set this up were something like:

1. start up a file reader process for each input file
   parameters are name of input file "port" (device driver)
   and name of output port
	(this was necessary so the files would present their input to the
	 merger process without being queried - the files contain timestamped
	 messages which the reader process would send at appropriate times
	 relative to the "start" command)
2. start up merger process
   parameters are list of input ports and name of output port
3. connect output file(s) & synthesizer "processes" (device drivers)
   to merger output port
4. give "start" command
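Outside a real-time system, steps 1 to 3 have a rough batch analog in stock Unix tools. A minimal sketch, assuming each input is already ordered by a leading numeric timestamp (the sample events are made up): named pipes stand in for the reader ports, sort -m performs the time-ordered merge, and tee fans the result out to two outputs. Nothing here is real-time; the FIFO opens play the part of the "start" command, since each reader blocks until the merger opens its pipe.

```shell
#!/bin/sh
# Batch stand-in for the reader -> merger -> outputs graph (not real-time).
dir=$(mktemp -d) || exit 1
mkfifo "$dir/reader1" "$dir/reader2" || exit 1

# Step 1: the "readers". Each emits its own time-ordered stream and blocks
# opening its FIFO until the merger opens the other end.
printf '10 noteA\n30 noteC\n' > "$dir/reader1" &
printf '20 noteB\n40 noteD\n' > "$dir/reader2" &

# Steps 2 and 3: merge the already-sorted streams on the numeric timestamp
# in field 1, then fan the result out to two outputs.
sort -m -k1,1n "$dir/reader1" "$dir/reader2" | tee merged1.txt > merged2.txt

wait
rm -rf "$dir"
```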

On the "start" command, the file readers would start sending messages, and the
merger would start reading its messages.  You could also start generating input
from any of the synthesizer devices at this time.  Given a "dataflow shell",
the kludge of the "start" command would be unnecessary - at any rate, it would
be hidden from you by the shell (since this is a real time system, you
probably would have to keep the model of starting up each process and have them
wait).  The kludge of the named ports for reader & merger output could also be
done away with - the shell would presumably set these up for you.

Graphically, what we have is the following:

inputfile1 --- reader1 \
                        |
inputfile2 --- reader2  |         outputfile1 (device driver)
                      \ |        /
                       merger ---
                      / |        \
synthesizer1 ---------  |         synthesizer1 (device driver)
                        |
                       /
synthesizer2 ----------

Given that a) the Bourne shell method, which depends on knowing which file
descriptors a program uses for I/O, does not fit in well with this
message-based model; and b) it is excruciatingly ugly; can anyone suggest a
syntax for expressing this sort of thing?

--------------
Marc Sabatella
marc%hpfcrt@hplabs.hp.com

dave@fps.com (Dave Smith) (11/21/89)

In article <0207@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
 >
 >Well, ugly as the bourne shell syntax for these things is, and limited
 >as it is (it can't for example produce two dataflows between just two
 >processes, nor a circular dataflow, nor many others), it IS ALREADY a
 >start.  Why not start using it, and expand on it by writing pipe
 >fittings as needed, like the "split" option of grep, or the "join"
 >option of diff, and others? 

Because writing the spiffy graphical interface is a whole lot more fun
than writing dataflow Bourne shell scripts.  :-)

--
David L. Smith
FPS Computing, San Diego
ucsd!celerity!dave or dave@fps.com
"I'm trying to think, but nothing happens!" - Curly Fine

throopw@sheol.UUCP (Wayne Throop) (11/23/89)

> mjs@hpfcso.HP.COM (Marc Sabatella)
> Graphically, what we have is the following:
> inputfile1 --- reader1 \
> inputfile2 --- reader2 -- merger ---- outputfile1 (device driver)
> synthesizer1 ----------//       \---- synthesizer1 (device driver)
> synthesizer2 ----------/
> [...] can anyone suggest a syntax for expressing this sort of thing?

Well, for a system that can handle only nested fan-out and fan-in (but
not circular or DAG-type connections), we might just enumerate parallel
processes as a list, with familiar Lisp (lots of silly parentheses)
syntax (but with "{", say), which would make the above work out to be:

    { { <inputfile1 reader1 }
      { <inputfile2 reader2 }
      { synthesizer1 }
      { synthesizer2 } } merger { { consumer1 >outputfile1 }
                                  { consumer2 } }

(I've altered the output fan-out a little.)
--
Wayne Throop <backbone>!mcnc!rti!sheol!throopw or sheol!throopw@rti.rti.org

mjs@hpfcso.HP.COM (Marc Sabatella) (11/28/89)

>    { { <inputfile1 reader1 }
>      { <inputfile2 reader2 }
>      { synthesizer1 }
>      { synthesizer2 } } merger { { consumer1 >outputfile1 }
>                                  { consumer2 } }

>not circular or DAG-type connections

Although this was not present in my example, feedback loops are very common in
the types of musical applications we were concerned with.  In particular,
one extra path I would love to add to the above is one directly from each
input synthesizer to an output file.  This could presumably be done by assuming
redirection is non-destructive.  I don't see how to do anything more complex
than that.  But this makes a good start.

--------------
Marc Sabatella
HP Colorado Language Lab
marc%hpfcrt@hplabs.hp.com