wayne@csri.toronto.edu (Wayne Hayes) (09/23/90)
I've encountered what I believe to be a bug in either the specification
or the implementation of piping in Minix. The following works fine:

    # cmp - file <file

where the '-' means compare stdin to file. However, if file is longer
than 1024 bytes, then the following fails with "EOF on <stdin>":

    # cat file | cmp - file

After some poking around in cmp.c and adding a few printf's, I discovered
that when stdin is a pipe, a call to read(2) asking for 1025 bytes or
more returns only 1024 bytes. Thus cmp.c reads 6K directly from the file,
but it only gets 1K at a time from the pipe, and assumes it has reached
EOF on stdin.

I think I know what the gist of the problem is here, even though I
haven't yet done any serious hacking in Minix's kernel/mm/fs. In Unix,
you generally try to "keep things moving", so when piping, you accumulate
a "reasonable" amount of output from the writer before passing it to the
reader. The definition of "reasonable" depends on whether you want
efficiency (big buffers) or "fairness" in some way, so that the user
doesn't have to wait gobs of time to see something, for example when you
do ``cat HugeFile | more'' (smaller buffers).

The problem occurs when a program (cmp, for example) doesn't need any of
the "fairness" as defined above. It just wants maximum throughput, so it
issues an explicit call to read(2) of the form

    read(filedes, hugeBuf, sizeof(hugeBuf));

but if filedes refers to a pipe, then it only gets sizeof(pipeBuf), not
sizeof(hugeBuf). (Or perhaps pipeBuf *is* large, it's just that if the
reader is attempting to read faster than the writer is writing, and the
reader blocks, then the OS decides to give it what is there so far rather
than what the reader asked for, in the name of "fairness".)

So, as far as I can see, at least one of the following is true:

1) There is a bug in the specification of piping in Minix.
2) There's a bug in the implementation of piping in Minix.
3) There's a bug in the implementation of read(2) in Minix.
4) There's a bug in cmp.c

In case it's just the last of these, here's a quick fix for cmp.c: read
the pipe first, and then read from the other file only as many bytes as
you got from the pipe. Here is cmp.c.cdif:

----------------------------------------
*** cmp.c.bak	Sun Sep 23 11:01:39 1990
--- cmp.c	Sun Sep 23 11:21:19 1990
***************
*** 74,80 ****
    exit_status = 0;
    do {
  	n1 = read(fd1, buf1, BLOCK_SIZE);
! 	n2 = read(fd2, buf2, BLOCK_SIZE);
  	n = (n1 < n2) ? n1 : n2;
  	if (n < 0) {
  		printf("cmp: Error on %s\n", (n1 < 0) ? file_1 : file_2);
--- 74,80 ----
    exit_status = 0;
    do {
  	n1 = read(fd1, buf1, BLOCK_SIZE);
! 	n2 = read(fd2, buf2, n1);
  	n = (n1 < n2) ? n1 : n2;
  	if (n < 0) {
  		printf("cmp: Error on %s\n", (n1 < 0) ? file_1 : file_2);

-- 
"The number of programs that can be done with the Hubble Space Telescope
has always greatly exceeded the time available for their execution, and
this remains true even with the telescope in its current state."
 -- HST Science Working Group and User's Committee Report, 1990 June 29.
Wayne Hayes	INTERNET: wayne@csri.utoronto.ca	CompuServe: 72401,3525
cechew@bruce.cs.monash.OZ.AU (Earl Chew) (09/25/90)
In <1990Sep23.115725.16689@jarvis.csri.toronto.edu> wayne@csri.toronto.edu (Wayne Hayes) writes:

>4) There's a bug in cmp.c

Yup.

Earl
-- 
Earl Chew, Dept of Computer Science, Monash University, Australia 3168
EMAIL: cechew@bruce.cs.monash.edu.au PHONE: 03 5655447 FAX: 03 5655146
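
For reference, read(2) on a pipe is allowed to return fewer bytes than were requested, so a short count by itself does not mean EOF; only a return value of 0 does. A minimal sketch of the usual remedy on the caller's side follows. The helper name read_full is hypothetical and does not appear in the original cmp.c; it simply keeps reading until the buffer is full, EOF is reached, or an error occurs.

```c
#include <errno.h>
#include <unistd.h>

/*
 * read_full: hypothetical helper, not part of the original cmp.c.
 * Calls read(2) repeatedly until `want` bytes have arrived, EOF is
 * reached, or an error occurs.  Returns the number of bytes stored
 * in buf (less than `want` only at EOF), or -1 on error.
 */
ssize_t read_full(int fd, char *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* genuine error */
        }
        if (n == 0)
            break;              /* true end of file */
        got += (size_t)n;       /* short read: keep going */
    }
    return (ssize_t)got;
}
```

Assuming the loop in cmp.c keeps its current shape, both calls could become n1 = read_full(fd1, buf1, BLOCK_SIZE) and n2 = read_full(fd2, buf2, BLOCK_SIZE). With that change, n1 and n2 differ only when the two inputs really have different lengths, regardless of which descriptor is the pipe.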