[comp.os.minix] Bug in specifications of piping or bug in cmp.c? and a kludgy fix

wayne@csri.toronto.edu (Wayne Hayes) (09/23/90)

I've encountered what I believe to be a bug in either the specifications or
the implementation of piping in minix.

The following works fine:

# cmp - file <file

where the '-' means compare stdin to file.  However, if file is longer than
1024 bytes, then the following fails with "EOF on <stdin>":

# cat file | cmp - file

After some poking around in cmp.c, and adding a few printf's I discovered
that when stdin is reading from a pipe, then a call to read(2) to get
1025 bytes or more returns only 1024 bytes.  Thus cmp.c reads 6K directly
from the file, but it only gets 1K at a time from a pipe, and assumes it
has reached EOF on stdin.

I think I know what the gist of the problem is here, even though I haven't
yet done any serious hacking in Minix's kernel/mm/fs.  In Unix, you generally
try to "keep things moving", so when piping, you accumulate a "reasonable"
amount of output from the writer before passing it to the reader.  The
definition of "reasonable" depends on whether you want efficiency (big
buffers) or "fairness" (smaller buffers), so that the user doesn't have to
wait gobs of time to see something, for example when you do
``cat HugeFile | more''.

The problem occurs when a program (cmp, for example) doesn't need any of
the "fairness" as defined above.  It just wants maximum throughput, so it
issues an explicit call to read(2) of the form

read(filedes, hugeBuf, sizeof(hugeBuf));

but if filedes refers to a pipe, then it only gets sizeof(pipeBuf), not
sizeof(hugeBuf).  (Or perhaps pipeBuf *is* large, it's just that if
the reader is attempting to read faster than the writer is writing, and
the reader blocks, then the OS decides to give it what is there so far
rather than what the reader asked for, in the name of "fairness".)
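
The second guess is easy to check outside cmp.  Here is a minimal sketch,
assuming a POSIX-style pipe(2), read(2) and write(2); pipe_read_demo is a
made-up name, not anything from Minix or cmp.c.  It puts a known number of
bytes into a pipe, then asks read(2) for some other number and returns
whatever read(2) reports.

```c
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not from cmp.c: put `avail` bytes into a fresh pipe,
 * close the write end, then ask read(2) for `want` bytes and return what it
 * reports.  Keep `avail` well under the pipe's capacity so that write()
 * cannot block.  Returns -1 if anything goes wrong. */
long pipe_read_demo(size_t avail, size_t want)
{
    int fds[2];
    char *out, *in;
    long got = -1;

    if (pipe(fds) < 0)
        return -1;

    out = malloc(avail ? avail : 1);
    in = malloc(want ? want : 1);
    if (out != NULL && in != NULL) {
        memset(out, 'x', avail);
        if (write(fds[1], out, avail) == (ssize_t)avail) {
            /* Closing the writer first means an empty pipe reads as EOF
             * instead of blocking forever. */
            close(fds[1]);
            fds[1] = -1;
            got = (long)read(fds[0], in, want);
        }
    }

    free(out);
    free(in);
    close(fds[0]);
    if (fds[1] >= 0)
        close(fds[1]);
    return got;
}
```

On a POSIX system, pipe_read_demo(100, 4096) should come back as 100, not
4096 and not 0: read(2) hands over what is in the pipe instead of waiting
for the full request.  A short count by itself therefore says nothing about
EOF; only a return of 0 does.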

So, as far as I can see, at least one of the following is true:

1)  There is a bug in the specification of piping in Minix.
2)  There's a bug in the implementation of piping in Minix.
3)  There's a bug in the implementation of read(2) in Minix.
4)  There's a bug in cmp.c

In case it's just the last of these, here's a quick fix for cmp.c:  read the
pipe first, and then just read whatever number of bytes you got from
the pipe from the other file.  Here is cmp.c.cdif:

----------------------------------------
*** cmp.c.bak	Sun Sep 23 11:01:39 1990
--- cmp.c	Sun Sep 23 11:21:19 1990
***************
*** 74,80 ****
    exit_status = 0;
    do {
  	n1 = read(fd1, buf1, BLOCK_SIZE);
! 	n2 = read(fd2, buf2, BLOCK_SIZE);
  	n = (n1 < n2) ? n1 : n2;
  	if (n < 0) {
  		printf("cmp: Error on %s\n", (n1 < 0) ? file_1 : file_2);
--- 74,80 ----
    exit_status = 0;
    do {
  	n1 = read(fd1, buf1, BLOCK_SIZE);
! 	n2 = read(fd2, buf2, n1);
  	n = (n1 < n2) ? n1 : n2;
  	if (n < 0) {
  		printf("cmp: Error on %s\n", (n1 < 0) ? file_1 : file_2);
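
The one-line change above only helps when the pipe is the first descriptor,
and once fd1 reaches EOF (n1 == 0) the second read asks for 0 bytes, so it
can no longer notice that file_2 still has data left.  A more general
approach, sketched here under the assumption of POSIX read(2) semantics, is
to loop until the requested count arrives or read(2) returns 0.  read_fully
is a hypothetical helper name, not part of the posted cmp.c.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper, not part of the posted cmp.c.
 * Keep calling read(2) until `count` bytes have arrived, EOF is seen,
 * or an error occurs.  Returns the number of bytes read (less than
 * `count` only at EOF), or -1 on error. */
long read_fully(int fd, char *buf, size_t count)
{
    size_t total = 0;

    while (total < count) {
        ssize_t n = read(fd, buf + total, count - total);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;
        }
        if (n == 0)
            break;              /* a zero return is the real EOF */
        total += (size_t)n;     /* short read: ask again for the rest */
    }
    return (long)total;
}
```

With something like this, both reads in the loop could be written as
read_fully(fd1, buf1, BLOCK_SIZE) and read_fully(fd2, buf2, BLOCK_SIZE),
so n1 and n2 differ only when one input really is shorter than the other.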

-- 
"The number of programs that can be done with the Hubble Space Telescope has
always greatly exceeded the time available for their execution, and this
remains true even with the telescope in its current state." -- HST Science
Working Group and User's Committee Report, 1990 June 29.
Wayne Hayes	INTERNET: wayne@csri.utoronto.ca	CompuServe: 72401,3525

cechew@bruce.cs.monash.OZ.AU (Earl Chew) (09/25/90)

In <1990Sep23.115725.16689@jarvis.csri.toronto.edu> wayne@csri.toronto.edu (Wayne Hayes) writes:

>4)  There's a bug in cmp.c

Yup.

Earl
-- 
Earl Chew, Dept of Computer Science, Monash University, Australia 3168
EMAIL: cechew@bruce.cs.monash.edu.au PHONE: 03 5655447 FAX: 03 5655146
----------------------------------------------------------------------