[comp.unix.questions] Can UNIX pipe connections be compiled?

dana@rucs.runet.edu (Dana Eckart) (01/19/91)

Does there exist a piece of software (or is it even possible) to compile
a pipe?  In particular, suppose you had 

	ls -l | fgrep "Dec" | cut -f 4

Is there any way to compile the above pipeline so that the pieces can
communicate more quickly?  I am looking for a general solution, not
one that works only for the above example.

The question arises because I have constructed some small programs which 
become VERY slow when piped together.  It appears that if I can get around 
the slow speed of standard (character based) i/o that things will be MUCH 
faster.

Although I suspect I am stuck (unless I rewrite my code, combining the
separate programs into a single program), perhaps some kind netter will be
able to save me a great deal of grief.

Thanks in advance...

J Dana Eckart     INTERNET: dana@rucs.runet.edu
                     SNAIL: P.O. Box 10865/Blacksburg, VA  24062-0865

barmar@think.com (Barry Margolin) (01/19/91)

In article <1991Jan18.193234.216@rucs.runet.edu> dana@rucs.runet.edu (Dana Eckart) writes:
>Does there exist a piece of software (or is it even possible) to compile
>a pipe?  In particular, suppose you had 
>
>	ls -l | fgrep "Dec" | cut -f 4
>
>is there anyway to compile the above pipeline so that the pieces can
>communicate more quickly.  I am looking for a general solution, not
>one that works only for the above example.

I'm not really sure I (or you) understand what you expect the pipe to be
compiled into.  On Unix, each program has to be run in its own process, so
they're going to have to use some form of inter-process communication to
feed the data to each other.  There are shell script compilers, but all
they do is save the overhead of parsing the commands and interpreting shell
built-ins; the compiled script still runs each command in its own process
and sets up pipes for them to communicate.

>The question arises because I have constructed some small programs which 
>become VERY slow when piped together.  It appears that if I can get around 
>the slow speed of standard (character based) i/o that things will be MUCH 
>faster.

If the programs that are used in the pipeline do character-at-a-time I/O,
then speeding up the pipeline isn't going to help.  Compiling the pipeline
wouldn't change the programs; they'll still be doing character I/O.

I strongly doubt that the speed of the pipe is the limiting factor; this is
a pretty simple mechanism whose performance is extremely important to most
Unix implementors.  I just timed the following on a Sun-4/330 running SunOS
4.0.3:

	cat file file file | cat >/dev/null

"file" is a 4Mb file on an NFS server.  The SunOS version of "cat" uses
mmap() to read in files named as arguments, so once it is all paged into
memory (I ran the command until it got zero page faults) nearly all the
overhead should be in the pipe (about 95% of the CPU time was system time,
and I doubt I was spending much time in the null device driver).  I was
getting about 4Mbyte/CPU-second throughput.
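
A rough way to repeat this kind of measurement elsewhere is sketched below.
It is not the command timed above: it assumes dd and /dev/zero are available,
the function name pipe_bytes is made up for the example, and the block size
and count are arbitrary.

```shell
# Push N blocks of 64 KB of zeros through two pipes and count what arrives.
# pipe_bytes is a hypothetical helper name; bs and the count are arbitrary.
pipe_bytes() {
    dd if=/dev/zero bs=65536 count="$1" 2>/dev/null | cat | wc -c
}

pipe_bytes 256    # 256 blocks of 65536 bytes each
```

Running the same pipeline under time(1) with a larger count gives the
user/system split, which can then be compared with the number of bytes moved.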

And I think most stdio implementations don't actually do
character-at-a-time I/O.  getc() and putc() are usually implemented as
macros that read/write a buffer, and don't actually do any I/O until the
buffer is empty/full (putc()'s output buffer will also be flushed if you
call fflush()).
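
The cost of true character-at-a-time I/O, as opposed to what a stdio buffer
does, can be seen with dd, which issues one read and one write per block.
This is only a sketch: slow_copy and fast_copy are made-up names, and dd with
bs=1 stands in for a program doing unbuffered single-byte reads and writes.

```shell
# Move the same 65536 bytes into a pipe two ways.
# bs=1 issues one read and one write per byte, like unbuffered I/O;
# bs=8192 moves the data in eight large transfers, like a stdio buffer.
slow_copy() { dd if=/dev/zero bs=1    count=65536 2>/dev/null | wc -c; }
fast_copy() { dd if=/dev/zero bs=8192 count=8     2>/dev/null | wc -c; }

slow_copy    # same byte count as fast_copy, far more system calls
fast_copy
```

Both functions deliver the same number of bytes; timing them separately
shows how much of the cost is per-call overhead rather than the pipe itself.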

>Although I suspect I am stuck (unless I rewrite my code - combining the
>pieces programs into a single program), perhaps some kind netter will be
>able to save me a great deal of grief.

Have you actually profiled your programs and found that they are spending
most of their time doing I/O to pipes?
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

darcy@druid.uucp (D'Arcy J.M. Cain) (01/19/91)

In article <1991Jan18.193234.216@rucs.runet.edu> Dana Eckart writes:
>Does there exist a piece of software (or is it even possible) to compile
>a pipe?  In particular, suppose you had 
>	ls -l | fgrep "Dec" | cut -f 4
>is there anyway to compile the above pipeline so that the pieces can
>communicate more quickly.  I am looking for a general solution, not
>one that works only for the above example.

I don't see how.  Any program that was created from the above line would
have to do everything the shell does when it sees that line and that
program has to be loaded and run as well.  If anything such a program
would slow it down.

Just a thought BTW - are you running out of memory?  If you are right
at the low limit you may be swapping when you get a large enough pipe.
My motherboard died recently and I have been running on a borrowed one
with less memory and I see a lot of slowdown with it. The swapping is
quite noticeable.

-- 
D'Arcy J.M. Cain (darcy@druid)     |
D'Arcy Cain Consulting             |   There's no government
West Hill, Ontario, Canada         |   like no government!
+1 416 281 6094                    |

raja@bombay.cps.msu.edu (Narayan S. Raja) (01/19/91)

In article <1991Jan18.230530.9331@convex>, (Tom Christiansen) writes:
< From the keyboard of dana@rucs.runet.edu (Dana Eckart):
< :Does there exist a piece of software (or is it even possible) to compile
< :a pipe?  In particular, suppose you had 
< :
< :	ls -l | fgrep "Dec" | cut -f 4
< :
< :is there anyway to compile the above pipeline so that the pieces can
< :communicate more quickly.  I am looking for a general solution, not
< :one that works only for the above example.

< In general, the answer to whether things like this can be automagically
< compiled is no, because you can't know what all the pieces are a priori.


However, wouldn't pipes be speeded up considerably
on a Sun by mounting /tmp as a tmpfs filesystem
(i.e. memory-based filesystem)?  Apparently tmpfs
is *really* quick under SunOS 4.1.1.

Pardonnez-moi if this is a dumb suggestion.


Narayan Sriranga Raja.

mike (Michael Stefanik) (01/20/91)

In article <1991Jan18.193234.216@rucs.runet.edu> rucs.runet.edu!dana (Dana Eckart) writes:
>
>Does there exist a piece of software (or is it even possible) to compile
>a pipe?  In particular, suppose you had 
>
>	ls -l | fgrep "Dec" | cut -f 4
>
>is there anyway to compile the above pipeline so that the pieces can
>communicate more quickly.  I am looking for a general solution, not
>one that works only for the above example.

Unless I'm reading you wrong, you seem to think that pipes are some coded
mechanism for communication between processes; they aren't.  An (anonymous)
pipe is a temporary entity created in the filesystem by the kernel on
behalf of two related processes that want to communicate.  It is useful
to think of a pipe as a regular file that one process writes to at one end
while another process reads from the other end.

Typically, a pipe can buffer up to about 5K of data flowing through the
pipe.  When the pipe "fills up", the writing process is blocked until
the reading process reads from the pipe.  Similarly, the reading process
will block on an empty pipe, until the writing process writes something.
Should the reading process die and the writing process attempt to write
on the pipe, a signal will be sent (SIGPIPE) to the offending writing
process (which tells it that there is no longer anything out there to
read from the pipe).  If this wasn't done, the writing process would deadlock
when the pipe buffer filled, waiting for a reading process that no longer
existed.
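
The SIGPIPE behaviour described above can be watched from the shell. In this
sketch the function name first_line is made up, and the exact status reported
for the writer depends on the shell and on whether SIGPIPE is being ignored.

```shell
# The reader quits after one line.  The writer is then stopped by SIGPIPE
# (or by a write error if SIGPIPE is ignored) instead of hanging forever.
first_line() {
    ( yes "still writing"; echo "writer status: $?" >&2 ) | head -n 1
}

first_line
```

Most shells report a writer killed by a signal as 128 plus the signal
number, so the status printed on stderr is typically well above 128.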

So, after this brief overview of piping, the answer is, no, you cannot
"compile" pipes to increase the speed of reads and writes to the pipe.
An excellent reference would be Bach's book on UNIX System V.
-- 
Michael Stefanik, Systems Engineer (JOAT), Briareus Corporation
UUCP: ...!uunet!bria!mike
--
technoignorami (tek'no-ig'no-ram`i) a group of individuals that are constantly
found to be saying things like "Well, it works on my DOS machine ..."

tchrist@convex.COM (Tom Christiansen) (01/20/91)

From the keyboard of uunet!bria!mike (Michael Stefanik):
:Unless I'm reading you wrong, you seem to think that pipes are some coded
:mechanism for communication between processes; it isn't.  An (anonymous)
:pipe is a temporary entity created in the filesystem by the kernel on
:behalf of two related processes that want to communicate.  

No, it's not.  On BSD systems, pipe(2) is implemented as a 
semi-disabled version of socketpair(2).  It's all IPC -- no
filesystem activity is involved.  All the world is not a Vax,
nor is it SysV.  

:It is useful 
:to think of a pipe as a regular file, in which one process is writing to on
:one end, and another process is reading from on the other end.
:
:Typically, a pipe can buffer up to about 5K of data flowing through the
4k on my system.
:pipe.  When the pipe "fills up", the writing process is blocked until
:the reading process reads from the pipe.  Similarly, the reading process
:will block on an empty pipe, until the writing process writes something.
:Should the reading process die and the writing process attempt to write
:on the pipe, a signal will be sent (SIGPIPE) to the offending writing
:process (which tells it that there is no longer anything out there to
:read from the pipe).  If this wasn't done, the writing process would deadlock
:when the pipe buffer filled, waiting for a reading process that no longer
:existed.

This is all true and useful information.  (As far as I know.)

:So, after this brief overview of piping, the answer is, no, you cannot
:"compile" pipes to increase the speed of reads and writes to the pipe.

But you can often rearrange your program so it doesn't shove the data
through a bunch of processes' address spaces.  A good example is the slow
old makewhatis script, which runs much faster when coded to do the work
entirely in one process.  
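
As a small illustration of the idea (not the makewhatis rewrite itself), the
fgrep and cut stages of the pipeline under discussion can be folded into one
awk process. The function name dec_field4 is made up, and the sketch assumes
tab-separated input, which is cut's default delimiter.

```shell
# One awk process in place of the fgrep and cut stages.
# index() does a fixed-string match, as fgrep does; -F '\t' matches
# cut's default tab delimiter.
dec_field4() {
    awk -F '\t' 'index($0, "Dec") { print $4 }'
}

printf 'a\tb\tc\tDec 1\nx\ty\tz\tJan 2\n' | dec_field4
```

One difference to keep in mind: cut prints a line with no delimiter in it
unchanged, while this awk version prints an empty fourth field for it.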

--tom
--
"Hey, did you hear Stallman has replaced /vmunix with /vmunix.el?  Now
 he can finally have the whole O/S built-in to his editor like he
 always wanted!" --me (Tom Christiansen <tchrist@convex.com>)

gwyn@smoke.brl.mil (Doug Gwyn) (01/20/91)

In article <1991Jan19.072755.3291@msuinfo.cl.msu.edu> raja@cpswh.cps.msu.edu writes:
>However, wouldn't pipes be speeded up considerably
>on a Sun by mounting /tmp as a tmpfs filesystem

No, genuine pipes are NOT files in /tmp!

mike@bria (01/20/91)

Tom Christiansen writes:
>From the keyboard of uunet!bria!mike (Michael Stefanik):
>>Unless I'm reading you wrong, you seem to think that pipes are some coded
>>mechanism for communication between processes; it isn't.  An (anonymous)
>>pipe is a temporary entity created in the filesystem by the kernel on
>>behalf of two related processes that want to communicate.  
>
>No, it's not.  On BSD systems, pipe(2) is implemented as a 
>semi-disabled version of socketpair(2).  It's all IPC -- no
>filesystem activity is involved.  All the world is not a Vax,
>nor is it SysV.  

Yup, and as I was writing this I thought of mentioning it, but decided
not to (someone who's having troubles with SV3 pipes ain't gonna glean
much from the BSD socket mechanism.)  However, let's not forget that
member of the pipe family that is a certain member of the filesystem,
namely, the "named pipe".  

Hmmm ... there are anonymous pipes and named pipes.  How about the
"incognito pipe"?
-- 
Michael Stefanik, Systems Engineer (JOAT), Briareus Corporation
UUCP: ...!uunet!bria!mike
--
technoignorami (tek'no-ig'no-ram`i) a group of individuals that are constantly
found to be saying things like "Well, it works on my DOS machine ..."

guy@auspex.auspex.com (Guy Harris) (01/21/91)

>However, let's not forget that member of the pipe family that is a
>certain member of the filesystem, namely, the "named pipe".  

It may be a member of the file system, but in many flavors of UNIX -
including SunOS and, I think, S5R4 - a named pipe may have a *name*
that's in the file system, but I/O to or from a named pipe doesn't go
through the file system.
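
A named pipe is easy to try from the shell. The sketch below assumes mkfifo
is available; the function name fifo_demo and the path under /tmp are made
up for the example.

```shell
# Create a FIFO, pass one line through it, then remove the name again.
fifo_demo() {
    fifo="${TMPDIR:-/tmp}/fifo.$$"
    mkfifo "$fifo" || return 1
    echo "through the fifo" > "$fifo" &   # writer blocks until a reader opens
    cat "$fifo"                           # reader sees EOF when the writer closes
    wait
    rm -f "$fifo"
}

fifo_demo
```

The name stays in the directory until it is removed, which is the only part
of the exchange that involves the filesystem namespace.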

klaus@cnix.uucp (klaus u schallhorn) (01/21/91)

In article <373@bria> uunet!bria!mike (Michael Stefanik) writes:
>In article <1991Jan18.193234.216@rucs.runet.edu> rucs.runet.edu!dana (Dana Eckart) writes:
>>
>>Does there exist a piece of software (or is it even possible) to compile
>>a pipe?  In particular, suppose you had 
>>
>>	ls -l | fgrep "Dec" | cut -f 4
>>
>>is there anyway to compile the above pipeline so that the pieces can
>>communicate more quickly.  I am looking for a general solution, not
>>one that works only for the above example.
>
>Unless I'm reading you wrong, you seem to think that pipes are some coded
>mechanism for communication between processes; it isn't.  An (anonymous)
>pipe is a temporary entity created in the filesystem by the kernel on
>behalf of two related processes that want to communicate.  It is useful 
>to think of a pipe as a regular file, in which one process is writing to on
>one end, and another process is reading from on the other end.
>
But only to THINK of a pipe as a file, under unix there never IS a file.
The fact that pipes are implemented as files on certain other operating
systems probably led to confusion someplace.

klaus

-- 
George Orwell was an Optimist

guy@auspex.auspex.com (Guy Harris) (01/23/91)

>But only to THINK of a pipe as a file, under unix there never IS a file.

Well, under *some* versions of UNIX (V7, S3, and I think it's still true
in S5, prior to S5R4), a pipe is sort-of implemented as a file, complete
with an inode *and* a list of N direct blocks pointed to by that inode;
those blocks really do end up containing the data in the file, although
if you're not unlucky, the data will be consumed before the block ever
has to be written to disk. 

rorex@locus.com (Phil Rorex) (01/24/91)

>In article <1991Jan18.193234.216@rucs.runet.edu> Dana Eckart writes:
>>Does there exist a piece of software (or is it even possible) to compile
>>a pipe?  In particular, suppose you had 
>>	ls -l | fgrep "Dec" | cut -f 4
>>is there anyway to compile the above pipeline so that the pieces can
>>communicate more quickly.  I am looking for a general solution, not
>>one that works only for the above example.
>
>
>I don't see how.  Any program that was created from the above line would
>have to do everything the shell does when it sees that line and that
>program has to be loaded and run as well.  If anything such a program
>would slow it down.
>
>Just a thought BTW - are you running out of memory?  If you are right

I agree.  Don't overlook this. ^^^^^^^^^^^^^^^^^^^^^  

>at the low limit you may be swapping when you get a large enough pipe.
>My motherboard died recently and I have been running on a borrowed one
>with less memory and I see a lot of slowdown with it. The swapping is
>quite noticeable.

I've split up many a long pipe because of excessive paging.

On a heavily loaded machine,
	ls -l > /tmp/tmp.$$.1
	fgrep "Dec" < /tmp/tmp.$$.1 > /tmp/tmp.$$.2
	cut -f 4 < /tmp/tmp.$$.2
	rm /tmp/tmp.$$.[12] &
can get in and out before all the pieces of the pipeline
	ls -l | fgrep "Dec" | cut -f 4
ever even get loaded in.
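
A variant of the same staging idea that cleans up after itself is sketched
below. It assumes mktemp -d is available (it is common but not universal),
uses grep -F, the current spelling of fgrep, and the function name staged is
made up for the example.

```shell
# Run the stages one at a time through files in a private directory,
# so only one of the three programs needs to be resident at once.
staged() {
    dir=$(mktemp -d) || return 1
    ls -l "$1"     > "$dir/1"
    grep -F "Dec"  < "$dir/1" > "$dir/2"
    cut -f 4       < "$dir/2"
    rm -rf "$dir"
}

staged .
```

Whether this beats the plain pipeline depends on how short of memory the
machine is; on a lightly loaded system the extra file I/O usually costs more.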

BTW, I've found egrep to be faster than fgrep on the paging Unixes I've been
on for scenarios like yours.    _    Your mileage may vary. 
        +1 213 337-5062        |_) |_  . |        ...!{ucla-se|uunet}!lcc!rorex
          Phillip Rorex        |   | ( | |        rorex@locus.com 
                    Disclaimer: I speak only for myself