[comp.unix.wizards] Shared Memory --- Parallel filters and piping -- Examples Needed

rbj@icst-cmr.arpa (Root Boy Jim) (02/18/88)

   From: "John S. Robinson" <jsrobin@eneevax.uucp>

   If a programmer has filters filt1, filt2, ... filtn which he wishes to
   apply in a serial fashion on a stream of data, the process can be accomplished
   in a trivial fashion by use of a sequence of pipes:
	   filt1 < <stream> |filt2 |filt3 | ... |filtn > <sink> .

   How does one handle the case where some of the above filters are to be
   applied in parallel and then be recombined:

It depends on what you mean by `recombined'. Do you want the output of the
parallel filters in order, or do you want them to run asynchronously,
mixing their output? BTW, you have two `filt5's in your diagram.

			      filt3
			     /	    \
			    /	     \
   filt1 < <stream> |filt2 /__filt4___+ filt5 | filt6 | ... filtn > <sink> .
			   \	     /
			    \	    /
			     \filt5/

My diagram will look something like this:

   f1 < <stream> | f2 | parallel 'f3' 'f4 -opts' 'f5a' | f5b ...

The program `parallel' forks once for each of its arguments and execs them
(one filter, f4, is shown with options/arguments) exactly as shown. However,
before it can do this, the parent must establish a pipe to each of its
children. The parent thus performs the function of `tee', except that
it is writing to processes rather than files.
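Here is a rough sketch of what `parallel' might look like on a modern POSIX system. It is not Jim's code. The function name `run_parallel', the use of /bin/sh -c to split an argument like 'f4 -opts', and the 4096-byte copy buffer are all assumptions. Each child gets its own pipe on stdin, and the parent copies its input into every pipe, as tee would into files:

```c
/* Hypothetical sketch of `parallel': run each argument via /bin/sh -c
   with its stdin on a pipe from us, then copy our input into every
   pipe, i.e. tee(1) writing to processes instead of files. */
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write all of buf to fd, retrying short writes; -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
  while (len > 0) {
    ssize_t n = write(fd, buf, len);
    if (n < 0)
      return -1;
    buf += n;
    len -= (size_t)n;
  }
  return 0;
}

/* Feed a copy of in_fd to each of the n commands, all running at once.
   Returns 0 if every command exited with status 0, else 1. */
int run_parallel(int n, char **cmds, int in_fd)
{
  int *to = malloc(n * sizeof *to);      /* our write end for each child */
  pid_t *pids = malloc(n * sizeof *pids);
  char buf[4096];
  ssize_t got;
  int i, status, failed = 0;

  if (to == NULL || pids == NULL)
    return 1;
  signal(SIGPIPE, SIG_IGN);      /* a child that exits early must not kill us */
  for (i = 0; i < n; i++) {
    int p[2];
    if (pipe(p) == -1 || (pids[i] = fork()) == -1)
      return 1;                  /* error cleanup trimmed for brevity */
    if (pids[i] == 0) {
      int j;
      signal(SIGPIPE, SIG_DFL);  /* ignored signals survive exec; restore */
      dup2(p[0], 0);             /* the child reads its copy on stdin */
      close(p[0]);
      close(p[1]);
      for (j = 0; j < i; j++)    /* don't hold earlier siblings' pipes open */
        close(to[j]);
      execl("/bin/sh", "sh", "-c", cmds[i], (char *)NULL);
      _exit(127);
    }
    close(p[0]);
    to[i] = p[1];
  }
  while ((got = read(in_fd, buf, sizeof buf)) > 0) {
    for (i = 0; i < n; i++)
      if (to[i] >= 0 && write_all(to[i], buf, (size_t)got) == -1) {
        close(to[i]);            /* that child stopped reading; skip it */
        to[i] = -1;
      }
  }
  for (i = 0; i < n; i++)
    if (to[i] >= 0)
      close(to[i]);              /* each child now sees EOF */
  for (i = 0; i < n; i++)
    if (waitpid(pids[i], &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0)
      failed = 1;
  free(to);
  free(pids);
  return failed;
}
```

A main for it could be as small as `return run_parallel(argc - 1, argv + 1, 0);'. The children all inherit parallel's stdout, so their output arrives intermingled, which is the `recombined' question raised at the top of this reply.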

   Note that in general the data stream will be too long to be stored first
   and then have the various filters applied to it.

Some versions of UNIX will buffer pipe data to actual files; some won't.
I am not clear on which ones do or don't, altho to hazard a guess, I think
System V does and BSD doesn't. Someone please enlighten me.

I don't see why you would have to buffer the data unless you care about
collecting the filters' output in order rather than intermingled.
To collect the output in order, you need an extra process at the end
to order the output correctly. Note that if the OS will not buffer the
pipes on disk, you don't get much parallelism, and may get deadlock.
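To make the ordering point concrete, here is a hypothetical helper (`collect_in_order' is an invented name) that drains each filter's output pipe to EOF before touching the next one. While it is reading pipe 0, the other filters can get only one pipe buffer ahead before their writes block; if the process feeding them is at the same time blocked writing to one of those stalled filters, nothing moves, which is the deadlock described above.

```c
#include <sys/types.h>
#include <unistd.h>

/* Copy each fds[i], i = 0..n-1, to out in that order, each until EOF.
   Later filters can run only one pipe buffer ahead of us while we are
   still draining an earlier one; after that their write() blocks.
   Returns 0 on success, -1 on a read or write error. */
int collect_in_order(const int *fds, int n, int out)
{
  char buf[4096];
  int i;

  for (i = 0; i < n; i++) {
    ssize_t got;
    while ((got = read(fds[i], buf, sizeof buf)) > 0) {
      ssize_t off = 0;
      while (off < got) {        /* retry short writes */
        ssize_t w = write(out, buf + off, (size_t)(got - off));
        if (w < 0)
          return -1;
        off += w;
      }
    }
    if (got < 0)
      return -1;
  }
  return 0;
}
```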

   Secondly, each of the 'filtk' programs will be programmed to accept data
   from stdin and will send data to stdout (with the exception of 'filt5',
   which would probably be getting its data from shared memory).

No problem. Just set up the plumbing correctly.
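As an illustration of setting up the plumbing, the serial case filt1 | filt2 | ... | filtn comes down to pipe(), fork(), dup2() and exec. The helper below is a sketch, not a standard API; the name `run_pipeline' and the choice to run each stage with /bin/sh -c are assumptions:

```c
/* Hypothetical sketch: run cmds[0] | cmds[1] | ... | cmds[n-1], each via
   /bin/sh -c, with in_fd as the first stdin and out_fd as the last stdout. */
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the last command's exit status, or -1 on failure. */
int run_pipeline(int n, char **cmds, int in_fd, int out_fd)
{
  pid_t *pids = malloc(n * sizeof *pids);
  int prev = in_fd;              /* read end that feeds the next command */
  int i, status, last = -1;

  if (pids == NULL)
    return -1;
  for (i = 0; i < n; i++) {
    int p[2] = { -1, out_fd };   /* the last command writes to out_fd */
    if (i < n - 1 && pipe(p) == -1)
      return -1;                 /* error cleanup trimmed for brevity */
    if ((pids[i] = fork()) == -1)
      return -1;
    if (pids[i] == 0) {
      if (prev != 0) { dup2(prev, 0); close(prev); }
      if (p[1] != 1) { dup2(p[1], 1); close(p[1]); }
      if (p[0] >= 0) close(p[0]);   /* the next command's end, not ours */
      execl("/bin/sh", "sh", "-c", cmds[i], (char *)NULL);
      _exit(127);
    }
    if (prev != in_fd) close(prev); /* the parent keeps no pipe ends */
    if (p[0] >= 0) close(p[1]);
    prev = p[0];
  }
  for (i = 0; i < n; i++)
    if (waitpid(pids[i], &status, 0) != -1 && i == n - 1 && WIFEXITED(status))
      last = WEXITSTATUS(status);
  free(pids);
  return last;
}
```

Called with the two commands "echo hello" and "tr a-z A-Z", in_fd 0 and out_fd 1, it should print HELLO. The parallel case differs only in that one process has to hand copies of its input to several children instead of one.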

   The type of solution I am looking for would be a 'control program' which
   would be given a sequence of programs to be executed as child processes. The
   control program would set up the shared memory segments and the appropriate
   unnamed or named pipes that would feed the various parallel filters. The
   algorithms must operate online and asynchronously and hence the access to
   shared memory must have some sort of handshaking so that after filt[345]
   are through using a buffer filt2 can be allowed to refill it.
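One way the handshaking could work, sketched under stated assumptions: a modern system with mmap(MAP_SHARED | MAP_ANONYMOUS) for the shared segment (the System V shmget/shmat calls are the older route), and one-byte tokens over pipes for the handshake instead of semaphores. The producer plays filt2: it fills the buffer, sends a "full" token to each consumer, then waits for one "done" token from every consumer before refilling. The consumers stand in for filt3..filt5 and merely sum the bytes they see. `shm_fanout', BUFSZ and MAXC are invented names.

```c
#define _DEFAULT_SOURCE           /* glibc: expose MAP_ANONYMOUS */
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON    /* older BSD spelling */
#endif

#define BUFSZ 8                   /* hypothetical: tiny, to force many refills */
#define MAXC 8                    /* hypothetical limit on consumers */

struct shared {                   /* mapped into every process */
  size_t len;                     /* valid bytes in data[]; 0 means EOF */
  unsigned char data[BUFSZ];
  unsigned long sum[MAXC];        /* per-consumer result; mmap zero-fills */
};

/* Producer pushes data through one shared buffer to nc consumer
   processes, each of which sums the bytes it sees.
   Returns 0 and fills sums[0..nc-1] on success, -1 on failure. */
int shm_fanout(const unsigned char *data, size_t len, int nc, unsigned long *sums)
{
  struct shared *sh;
  int go[MAXC][2], ack[2], i, st;
  pid_t pid[MAXC];
  size_t off = 0, j;
  char tok = 'x';

  if (nc < 1 || nc > MAXC)
    return -1;
  sh = mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
  if (sh == MAP_FAILED || pipe(ack) == -1)
    return -1;
  for (i = 0; i < nc; i++) {
    if (pipe(go[i]) == -1 || (pid[i] = fork()) == -1)
      return -1;                  /* error cleanup trimmed for brevity */
    if (pid[i] == 0) {            /* consumer i */
      for (;;) {
        if (read(go[i][0], &tok, 1) != 1)  /* wait for "buffer full" */
          _exit(1);
        if (sh->len == 0)                  /* end of stream */
          _exit(0);
        for (j = 0; j < sh->len; j++)
          sh->sum[i] += sh->data[j];
        if (write(ack[1], &tok, 1) != 1)   /* "done with this buffer" */
          _exit(1);
      }
    }
  }
  for (;;) {                      /* producer */
    size_t n = len - off < BUFSZ ? len - off : BUFSZ;
    memcpy(sh->data, data + off, n);
    sh->len = n;                  /* n == 0 on the last round means EOF */
    off += n;
    for (i = 0; i < nc; i++)      /* hand the full buffer to everyone */
      if (write(go[i][1], &tok, 1) != 1)
        return -1;
    if (n == 0)
      break;
    for (i = 0; i < nc; i++)      /* refill only after all nc acks */
      if (read(ack[0], &tok, 1) != 1)
        return -1;
  }
  for (i = 0; i < nc; i++) {
    waitpid(pid[i], &st, 0);
    sums[i] = sh->sum[i];
  }
  munmap(sh, sizeof *sh);
  return 0;
}
```

Because the producer never touches the buffer until all nc acknowledgements are in, no consumer can see a half-refilled buffer. The price is that the slowest consumer sets the pace, and with a single buffer the producer sits idle while the consumers work; two or more buffers used in rotation would let filling and consuming overlap.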

The other thing to do is just bite the bullet and embed the thing in a shell
script, using temporary files to distribute the input and collect and/or
order the output. It's not such a big deal; programs like sort use them
all the time, even though they look like (and behave as) filters.
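A sketch of the temporary-file approach, assuming POSIX mkstemp, /tmp as the scratch directory, and /bin/sh -c for each command (`run_via_tempfiles' and MAXF are invented names). The input is saved once; every filter then reads its own copy of the file concurrently and writes its own output file, and the outputs are concatenated in command order at the end. Ordering then costs no parallelism and no pipe can fill up and deadlock; the cost is disk space for the whole input, which is exactly what the original poster hoped to avoid.

```c
#define _DEFAULT_SOURCE             /* glibc: expose mkstemp() */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXF 16                     /* hypothetical limit on filters */

/* Copy from `from' until EOF into `to'; -1 on error. */
static int copy_fd(int from, int to)
{
  char buf[4096];
  ssize_t got;
  while ((got = read(from, buf, sizeof buf)) > 0) {
    ssize_t off = 0;
    while (off < got) {
      ssize_t w = write(to, buf + off, (size_t)(got - off));
      if (w < 0)
        return -1;
      off += w;
    }
  }
  return got < 0 ? -1 : 0;
}

/* Save in_fd to a temp file, run all n commands at once, each reading
   that file and writing its own temp file, then append the outputs to
   out_fd in command order. Returns 0 if everything succeeded. */
int run_via_tempfiles(int n, char **cmds, int in_fd, int out_fd)
{
  char in_name[] = "/tmp/parinXXXXXX";
  char out_name[MAXF][32];
  int outs[MAXF], in, i, st, started = 0, rc = 0;
  pid_t pid[MAXF];

  if (n < 1 || n > MAXF || (in = mkstemp(in_name)) == -1)
    return -1;
  if (copy_fd(in_fd, in) == -1) {
    close(in);
    unlink(in_name);
    return -1;
  }
  close(in);
  for (i = 0; i < n; i++) {
    snprintf(out_name[i], sizeof out_name[i], "/tmp/paroutXXXXXX");
    if ((outs[i] = mkstemp(out_name[i])) == -1) {
      rc = -1;
      break;
    }
    if ((pid[i] = fork()) == -1) {
      close(outs[i]);
      unlink(out_name[i]);
      rc = -1;
      break;
    }
    if (pid[i] == 0) {
      int fd = open(in_name, O_RDONLY);  /* own offset, unlike a dup */
      if (fd == -1)
        _exit(127);
      dup2(fd, 0);
      dup2(outs[i], 1);
      execl("/bin/sh", "sh", "-c", cmds[i], (char *)NULL);
      _exit(127);
    }
    started++;
  }
  for (i = 0; i < started; i++) {      /* collect in command order */
    if (waitpid(pid[i], &st, 0) == -1 || !WIFEXITED(st) || WEXITSTATUS(st) != 0)
      rc = -1;
    if (lseek(outs[i], 0, SEEK_SET) == -1 || copy_fd(outs[i], out_fd) == -1)
      rc = -1;
    close(outs[i]);
    unlink(out_name[i]);
  }
  unlink(in_name);
  return rc;
}
```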

   If anyone has examples of their use of shared memory and/or pipes in a way
   similar to the above please send me a copy. I really need help on this.

   My e-mailing address is:

	   jsrobin@eneevax.umd.edu

   The machine internet address is 128.8.133.1

   Thank you for your consideration of this problem.

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	FEELINGS are cascading over me!!!

stuart@bms-at.UUCP (Stuart D. Gathman) (02/20/88)

>  From: "John S. Robinson" <jsrobin@eneevax.uucp>

> If a programmer has filters filt1, filt2, ... filtn which he wishes to
> apply in a serial fashion on a stream of data, the process can be accomplished
> in a trivial fashion by use of a sequence of pipes:
>   filt1 < <stream> |filt2 |filt3 | ... |filtn > <sink> .

>   How does one handle the case where some of the above filters are to be
>   applied in parallel and then be recombined:

Here is my solution:

filt1|cat `@ filt2` `@ filt3`|filt4

The '@' program executes its arguments, a la 'nohup' or 'nice', with stdout
connected to a named pipe.  The name of the named pipe is printed on @'s own
standard output (which is different from the stdout of the program it runs).
Note that this function cannot be written as a shell script because the shell
insists on waiting for all background processes before exiting.

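/* @.c: run a command with its stdout on a fresh named pipe (FIFO),
   print the FIFO's name, and exit at once, so that `@ cmd` in backquotes
   expands to a file name that another program (e.g. cat) can read. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char null[] = "/dev/null";	/* printed on error, so the reader sees EOF */

int main(int argc, char **argv)
{
  char *fname;
  int fd;
  pid_t pid;

  if (argc < 2) {
    puts(null);
    return 1;
  }

  fname = tempnam(NULL, "@pipe");	/* unique name; mkfifo() fails if it exists */
  /* mkfifo() is the portable form of mknod(fname, S_IFIFO | 0600, 0) */
  if (fname == NULL || mkfifo(fname, 0600) == -1) {
    perror(fname ? fname : "tempnam");
    puts(null);
    return 1;
  }

  switch (fork()) {
  case 0:
    /* Child: swap stdout (the backquote pipe) for the FIFO.  open()
       blocks until a reader opens the other end. */
    close(1);
    fd = open(fname, O_WRONLY);
    if (fd != 1) {
      unlink(fname);
      return 1;
    }
    switch (pid = fork()) {
    case 0:
      execvp(argv[1], argv + 1);	/* try to execute direct */
      perror(argv[1]);
      _exit(127);
    case -1:
      perror("fork");
      unlink(fname);
      return 1;
    default:
      close(1);	/* so the reader gets EOF as soon as the command exits */
      while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        ;
      unlink(fname);
      return 0;
    }
  case -1:
    perror("fork");
    unlink(fname);
    puts(null);
    return 1;
  default:
    puts(fname);	/* parent: hand the FIFO name to the shell and exit */
  }
  return 0;
}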
-- 
Stuart D. Gathman	<stuart@bms-at.uucp>
			<..!{vrdxhq|dgis}!bms-at!stuart>

sow@cad.luth.se (Sven-Ove Westberg) (02/23/88)

In article <11876@brl-adm.ARPA> rbj@icst-cmr.arpa (Root Boy Jim) writes:
|
|   From: "John S. Robinson" <jsrobin@eneevax.uucp>
|
|   If a programmer has filters filt1, filt2, ... filtn which he wishes to
|   apply in a serial fashion on a stream of data, the process can be accomplished
|   in a trivial fashion by use of a sequence of pipes:
|	   filt1 < <stream> |filt2 |filt3 | ... |filtn > <sink> .
|
|   How does one handle the case where some of the above filters are to be
|   applied in parallel and then be recombined:
|
|It depends on what you mean by `recombined'. Do you want the output of the
|parallel filters in order, or do you want them to run asynchronously,
|mixing their output? BTW, you have two `filt5's in your diagram.
|
|			      filt3
|			     /	    \
|			    /	     \
|   filt1 < <stream> |filt2 /__filt4___+ filt5 | filt6 | ... filtn > <sink> .
|			   \	     /
|			    \	    /
|			     \filt5/
|
|My diagram will look something like this:
|
|   f1 < <stream> | f2 | parallel 'f3' 'f4 -opts' 'f5a' | f5b ...
|

Why not use named pipes? 

With named pipes you can use a standard program for f5b. Assuming that
f5b reads its input from n files, the syntax would then be

f1 < <stream> | f2 | parallel 'f3' 'f4 -opts' 'f5a' 'f5b' | f6 | .....

If you use named pipes you don't have to write special versions of f5.
The program parallel forks once for each argument. Filters f3 through f5a
are each run with their stdout connected to named pipes p1 p2 ... pn. f5b
is then called as f5b p1 p2 p3 (each p being a named pipe). The program
parallel could be a shell script.
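A rough C version of this scheme, for illustration only: each filter writes to its own named pipe, and the combining program (cat here, standing in for f5b) is started with the pipe names as its arguments. It assumes mkfifo and mkdtemp are available; `run_fifo_combiner' and MAXF are invented names. Feeding input to the filters is left out, so each one simply inherits stdin.

```c
#define _DEFAULT_SOURCE             /* glibc: expose mkdtemp() */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXF 16                     /* hypothetical limit on filters */

/* Run each filters[i] via /bin/sh -c with stdout on a named pipe p<i>,
   then run `combiner p0 p1 ...' with its stdout on out_fd.
   Returns the combiner's exit status, or -1 on failure. */
int run_fifo_combiner(int n, char **filters, const char *combiner, int out_fd)
{
  char dir[] = "/tmp/fifosXXXXXX";
  char names[MAXF][64], script[512];
  char *args[MAXF + 5];
  pid_t kids[MAXF + 1];
  int i, st, rc = -1;

  if (n < 1 || n > MAXF || mkdtemp(dir) == NULL)
    return -1;
  for (i = 0; i < n; i++) {
    snprintf(names[i], sizeof names[i], "%s/p%d", dir, i);
    if (mkfifo(names[i], 0600) == -1)
      return -1;                    /* cleanup trimmed for brevity */
  }
  for (i = 0; i < n; i++) {
    if ((kids[i] = fork()) == 0) {
      /* open() blocks until the combiner opens this FIFO to read it */
      int fd = open(names[i], O_WRONLY);
      if (fd == -1)
        _exit(127);
      dup2(fd, 1);
      close(fd);
      execl("/bin/sh", "sh", "-c", filters[i], (char *)NULL);
      _exit(127);
    }
  }
  /* sh -c 'combiner "$@"' sh p0 p1 ...: the pipe names become "$@" */
  snprintf(script, sizeof script, "%s \"$@\"", combiner);
  args[0] = "sh";
  args[1] = "-c";
  args[2] = script;
  args[3] = "sh";                   /* becomes $0 */
  for (i = 0; i < n; i++)
    args[4 + i] = names[i];
  args[4 + n] = NULL;
  if ((kids[n] = fork()) == 0) {
    dup2(out_fd, 1);
    execv("/bin/sh", args);
    _exit(127);
  }
  for (i = 0; i <= n; i++) {
    if (kids[i] <= 0 || waitpid(kids[i], &st, 0) == -1)
      continue;
    if (i == n && WIFEXITED(st))
      rc = WEXITSTATUS(st);
  }
  for (i = 0; i < n; i++)
    unlink(names[i]);
  rmdir(dir);
  return rc;
}
```

With filters "echo a" and "echo b" and combiner "cat", the output should be a, then b: cat reads p0 to EOF before it opens p1, and the second filter's open() waits until then. A combiner that reads its inputs in turn like this reimposes order; one that alternates between them would need non-blocking reads or select().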


Sven-Ove Westberg, CAD, University of Lulea, S-951 87 Lulea, Sweden.
Tel:     +46-920-91677  (work)                 +46-920-48390  (home)
UUCP:    {uunet,mcvax}!enea!cad.luth.se!sow
Internet: sow@cad.luth.se

rbj@icst-cmr.arpa (Root Boy Jim) (03/01/88)

   From: Sven-Ove Westberg <sow@cad.luth.se>

   In article <11876@brl-adm.ARPA> rbj@icst-cmr.arpa (Root Boy Jim) writes:
   |
   |   From: "John S. Robinson" <jsrobin@eneevax.uucp>
   |
   |			      filt3
   |			     /	    \
   |			    /	     \
   |   filt1 < <stream> |filt2 /__filt4___+ filt5 | filt6 | ... filtn > <sink>
   |			   \	     /
   |			    \	    /
   |			     \filt5/
   |
   |My diagram will look something like this:
   |
   |   f1 < <stream> | f2 | parallel 'f3' 'f4 -opts' 'f5a' | f5b ...
   |

   Why not use named pipes? 

Because they don't exist! At least not in *my* UNIX. Besides, as I wrote:

jsr|   How does one handle the case where some of the above filters are to be
jsr|   applied in parallel and then be recombined:
jsr|
rbj|It depends on what you mean by `recombined'. Do you want the output of the
rbj|parallel filters in order, or do you want them to run asynchronously,
rbj|mixing their output? BTW, you have two `filt5's in your diagram.

I think the `@' program previously posted, and the idea of using named
pipes, are rather elegant. I have often wanted to split a pipe stream
into two, but never wanted to recombine them, altho I suppose one use
might be simple collection of output into one file.

Other people have also mentioned using awk to write to two pipes,
along with caveats that it might be somewhat buggy in some environments.
I have also heard that the Korn shell can do something like this.

The key word here is `recombined'. If you want each line to be processed
by one and only one randomly chosen filter, then previously suggested
programs using named pipes seem the way to go. However, this seems
quite useless, and I wonder what is the real problem we are trying to solve.

If, on the other hand, you want each line to be processed once by each
filter, and then recombined, you need a program to clone the input and
distribute it to each filter. How the output is combined is another
question not specified very well in the original problem either. Do
we want the output randomly assembled, or do we want all the lines
from the first filter before any of the others? If we want the former,
then it is arguable that the output is meaningless; if we want the
latter, we sacrifice most of the parallelism except for short programs.

To illustrate where pipe splitting just might be useful, consider
the following example. I want a list of what gets dumped to tape
when I do backups, but I don't want to read the tape twice. What I
have to do is:

	dump 0u /
	restore tv >& DUMPLIST

What I would like to do is:

	dump 0uf - | tee /dev/rmt8 | restore tvf - >& DUMPLIST

Hey, this almost works! Unfortunately, tee's output to the magtape is
unblocked, so I would need to do something like:

	dump 0uf - | tee "| dd bs=20b > /dev/rmt8" | restore ...

Hmmm, maybe I will try the awk trick.

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	YOW!!!  I am having fun!!!