vance@mtxinu.UUCP (Vance Vaughan) (11/09/84)
4.2 BUGLIST ABSTRACTS from MT XINU, part 8 of 10:

The following is part of the 4.2 buglist abstracts as processed by Mt Xinu. The initial line of each abstract gives the offending program or source file and source directory (separated by --), who submitted the bug, when, and whether or not it contained a proposed fix. Due to license restrictions, no source is included in these abstracts. Important general information and disclaimers about this and other lists are appended at the end of the list...

sys/kern_time.c--sys salkind@nyu (Lou Salkind) 10 Mar 84 +FIX

The timezone field in the settimeofday system call is ignored. (I discovered this when I tried to change the PST timezone on our Pyramid system.)

REPEAT BY:
Run the program below and you will see no difference.
_______________________________________________________________________________
sys/pty.c--sys Spencer W. Thomas <thomas@utah-cs> 26 Jul 83 +FIX

When writing more than TTYHOG characters to the controlling end of a PTY in cooked mode, characters will be lost. The PTY should either block or return a partial count (in non-blocking mode). Instead, the write completes, but all characters above TTYHOG have been dumped on the floor by ttyinput. [Note: the string being written should have several newlines in it.]

REPEAT BY:
This program demonstrates the problem. It writes six 65-character lines (including newline) in one write into the controlling end of a pty. A fork reads the lines from the slave end. It only successfully reads the first 3 lines. The write returns successfully with a count of 390 bytes written. When a newline is finally written to the controlling end, the slave reads one more partial line. The total number of bytes read by the slave is TTYHOG (+1 for the extra newline). If TTYHOG is greater than 390 on your system, increase the number of bytes written by the controller.
================================================================
/*
 * tstpty.c - Test pty bug.
 *
 * Author: Spencer W. Thomas
 */
#include <stdio.h>

char tststring[] =
"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n";
char sendbuf[BUFSIZ];

main()
{
    int ptcfd, ptsfd, n;

    if ( (ptcfd = open("/dev/ptyqf", 2)) < 0) {
        perror("ptyqf");
        exit(1);
    }
    if ( (ptsfd = open("/dev/ttyqf", 2)) < 0) {
        perror("ttyqf");
        exit(1);
    }
    if (fork() == 0) {
        close(ptcfd);
        while ((n = read(ptsfd, sendbuf, BUFSIZ)) > 0)
            printf( "%d:%*.*s", n, n, n, sendbuf );
        exit(0);
    }
    strcpy( sendbuf, tststring );
    for (n=0; n<5; n++)
        strcat( sendbuf, tststring );
    printf( "Buflen = %d\n", strlen(sendbuf) );
    n = write( ptcfd, sendbuf, strlen(sendbuf) );
    printf( "Write returned %d\n", n );
    sleep(2);
    printf( "Sending newline\n" );
    write( ptcfd, "\n", 1 );
    close( ptcfd );
    wait(0);
}
================================================================
_______________________________________________________________________________
sys/socket--sys Spencer W. Thomas <thomas%UTAH-GR@utah-cs> 16 Aug 83

A write to a pipe with a bad buffer address does not return an error code (under 4.1a), or returns the wrong error code (under 4.2). On 4.1a, it appears to write garbage into the pipe "forever" (longer than I was willing to wait for it).

REPEAT BY:
Compile this program:

main()
{
    write( 1, 0xabcde, 512 );
    perror("write");
}

Running a.out gives "write: Bad address". a.out >/dev/null prints "write: Error 0" (another bug, actually). a.out | see prints a lot of garbage on 4.1a (it looks to me like it's just running through the buffer pool). On 4.2 it gives the error "write: No buffer space available", obviously the wrong error message.
_______________________________________________________________________________
sys/sys_generic.c--sys Marc Shapiro 26 Jul 84 +FIX

The arguments passed to the select system call are 3 longs, which are copied into an array of 3 ints (ibits), then back from obits. The manual entry for select specifies those 3 arguments as "int *readfds, *writefds, *exceptfds".
This is non-portable to machines where an int is 2 bytes, if NOFILES>15.

REPEAT BY:
Reading the code (lines 254, 273-275, etc.) and the manual entry for select(2).

FIX:
Declare all the above variables as longs.
-------
_______________________________________________________________________________
sys/sys_generic.c--sys Mike Braca <mjb%Brown@UDel-Relay> 3 Oct 83 +FIX

I claim that exceeding file size limits does not work as advertised. According to the man page for getrlimit(2), when you hit a soft limit you should get a signal (in this case SIGXFSZ) and when you hit the hard limit things stop working. Here is what the man page for getrlimit(2) has to say about it:

    "A resource limit is specified as a soft limit and a hard limit. When a soft limit is exceeded a process may receive a signal (for example, if the cpu time is exceeded), but it will be allowed to continue execution until it reaches the hard limit (or modifies its resource limit).... A file i/o operation which would create a file which is too large will cause a signal SIGXFSZ to be generated, this normally terminates the process, but may be caught."

The way I read this is that the write should succeed if the "soft" limit is exceeded, so if you ignore SIGXFSZ you effectively ignore the soft limit. However the write fails (with the wrong error code, but that's another bug report). The man page for write(2) is no help; here's what it says about it:

    "[EFBIG] An attempt was made to write a file that exceeds the process's file size limit or the maximum file size."

It doesn't specify "soft" or "hard" limit. I, of course, understood "hard" limit.

REPEAT BY:
Read the man page for getrlimit(2) and become confused about whether or not write()s will succeed after you hit the soft file limit. Write a program that expects that when you ignore SIGXFSZ, the "soft" limit will be ignored. Set your "soft" filesize limit to something small, and run the program.
Watch in amazement as the program runs to completion, but the file it produces is incomplete. E.g. compile and run this program and watch it fail:

#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <signal.h>

main()
{
    struct rlimit lims;
    int fd, rc;

    signal(SIGXFSZ, SIG_IGN);
    lims.rlim_cur = 0;
    lims.rlim_max = RLIM_INFINITY;
    setrlimit(RLIMIT_FSIZE, &lims);
    fd = creat("/tmp/fsizetest", 0666);
    rc = write(fd, "This will not work\n", 19);
    if (rc < 0)
        perror("write");
}
_______________________________________________________________________________
sys/sys_generic.c--sys Mike Braca <mjb%Brown@UDel-Relay> 3 Oct 83 +FIX

When a process exceeds its file size limit, the write fails with error EMFILE (too many open files). It should actually fail with error EFBIG (file too big).

REPEAT BY:
Read the manual page for write(2), then compile and run the following program:

#include <sys/time.h>
#include <sys/resource.h>
#include <stdio.h>
#include <signal.h>

main()
{
    struct rlimit lims;
    int fd, rc;

    signal(SIGXFSZ, SIG_IGN);
    lims.rlim_cur = 0;
    lims.rlim_max = RLIM_INFINITY;
    setrlimit(RLIMIT_FSIZE, &lims);
    fd = creat("/tmp/fsizetest", 0666);
    rc = write(fd, "This will not work\n", 19);
    if (rc < 0)
        perror("write");
}
_______________________________________________________________________________
sys/sys_generic.c--sys rws@mit-bold (Robert W. Scheifler) 25 Feb 84 +FIX

If a SIGTSTP is generated on the controlling tty of a process that is waiting in a select() on that tty, the process will mysteriously vanish.

REPEAT BY:
Run in foreground the program:

main()
{
    int fds = 1;

    select(1, &fds, 0, 0, 0);
}

and then generate SIGTSTP from the keyboard. The process will correctly suspend, but as soon as a character becomes available for input to the terminal (i.e. as soon as you type CR to the shell), the process will vanish.

Why: At the select(), the tty t_rsel gets set to the process, but no chars are available, so the process goes into state SSLEEP on &selwait.
When the suspend character is typed, a psignal() on the process changes its state to SSTOP and sets p_cursig to SIGTSTP. When input chars are made available to the tty, a ttwakeup() is performed, which calls selwakeup() because t_rsel is still set. Since this is the only process that has done select() on the tty, there are no collisions, and selwakeup() simply calls setrun() on the process rather than calling wakeup(). Therein lies the bug, because this bogusly makes the process runnable, and it will run before the input chars are gobbled, and so the select() will succeed and try to return. However, p_cursig is still set to SIGTSTP, and syscall() will see it and call psig(), which will call exit() and the process will vanish.

Also note another bug (which I don't propose a fix for here): select() will succeed on a tty even if the process and the tty are in different process groups. So the process will think there is data to read, and then hang trying to do the actual read.
_______________________________________________________________________________
sys/sys_xxx.c--sys cbosgd!mark (Mark Horton) 29 Jul 83 +FIX

Accounting gets turned off when there is plenty of space on the /usr filesystem.

REPEAT BY:
Fill up /usr to where df shows over 92% full. The console will print "Accounting suspended" and all accounting is turned off. If your system is already over 92% full, this happens when /etc/rc tries to turn on accounting.

The manual claims that when the disk fills up (I read this to mean 100% full), accounting is turned off. The code suggests that the intent was that if it gets less than 2% free, accounting is turned off, and if it gets over 4% free, it will be turned back on. In reality, the numbers used are 8% and 16%.
_______________________________________________________________________________
sys/tty.c--sys Mike Braca <mjb%Brown@UDel-Relay> 27 Sep 83 +FIX

Setting TANDEM when in cooked mode loses big.
When the input queue gets bigger than TTYHOG/2 a STOP char is sent, but a START char will never be sent because the input queue won't get smaller until a break character is received. Since the sender is blocked, it can't send the break character!

REPEAT BY:
Get on a terminal that does ^S ^Q protocol. Type "stty tandem". Then type enough characters that the terminal locks. Notice how the terminal never unlocks.
_______________________________________________________________________________
sys/tty.c--sys davec@BERKELEY 19 Aug 83 +FIX

A bug that might be interesting to those using Un*x on machines other than Vaxen - In tty.c, the function scanc() (which is replaced by a sed script on vax and sun versions) returns an incorrect value. As you can verify by looking up the scanc instruction in the Vax Architecture handbook, scanc leaves the number of bytes remaining in r0. The C scanc() function incorrectly returns an index to the character which fit the mask. So the

    return (i);

should be changed to

    return (size - i);

It means the difference between the tty working and not working!! Hoping it's not too late for 4.2 ...

Dave Cobbley
Engineering Computing Systems
Tektronix, Inc.
tektronix!tekecs!davec
(503) 685-2383
_______________________________________________________________________________
sys/tty.c--sys chris@maryland (Chris Torek) 2 Aug 84

I'm not sure if this is a bug or a feature, but select ignores process groups when determining whether to return true for a tty.

REPEAT BY:
Run the following program in the background, then hit RETURN.

#include <sys/types.h>
#include <sys/time.h>

main ()
{
    int in, ex, sel;

    in = ex = 1;
    sel = select (1, &in, (int *) 0, &ex, (struct timeval *) 0);
    printf ("sel=%d in=%d ex=%d\n", sel, in, ex);
    exit (0);
}

There seems to be a relationship between this and the fact that select mysteriously dies if you type a ^Z followed by a return. (Csh tells you the job is ``stopped'' and then it vanishes from the list of active jobs.)
Chris
_______________________________________________________________________________
sys/tty.c--sys Michael John Muuss <mike@brl-vgr> 15 Dec 83 +FIX

If the high bit of the "local flags" is set, TIOCLGET smears that bit across the high halfword of the int by the >>16. Credit for finding this goes to Doug Gwyn, <Gwyn@BRL>.

REPEAT BY:
Set the bit with TIOCLSET, and read it back with TIOCLGET.
_______________________________________________________________________________
sys/tty.c,tty_subr.c,vaxuba/dz.c--sys koda@hobgoblin 22 Feb 84 +FIX

3Com interfaces interrupt at IPL 16 while DZ's come in at 15. The DZ code assumes that spl5 (IPL 15) is good enough to hold off any pending interrupts.

REPEAT BY:
Combination of moderate dz and ethernet activity will cause random panics.

FIX:
Replace all spl5 calls with spl6 in the above-mentioned modules. Actually spl6 (IPL 18) is a little overkill, but there is no other good pre-defined level unless you edit asm.sed.
_______________________________________________________________________________
sys/tty_pty.c--sys decvax!mcvax!jim (Jim McKie) 6 Apr 84 +FIX

1) When the slave end of a pseudo-tty closes, the controlling side is not informed if it is already trying to read from the device.

2) As the manual says, it should be possible (but isn't) to send an end-of-file to the slave side by the controlling side doing a 0-length write in TIOCREMOTE mode.

REPEAT BY:
1) The following short program may or may not cause the parent to wait forever in the read(), depending on whether it gets there before the child exits. It is always possible to put a sleep() in the child process before the exit to ensure the parent gets to read.
main()
{
    switch(fork()){
    case -1:
        perror("fork");
        break;
    case 0:
        child();
        /*NOTREACHED*/
    default:
        parent();
        /*NOTREACHED*/
    }
    exit(1);
}

child()
{
    register int fd;

    if((fd = open("/dev/ttyp4", 1)) == -1){
        perror("/dev/ttyp4");
        exit(1);
    }
    (void) write(fd, "Hello world\n", 12);
    exit(0);
}

parent()
{
    register int fd, n;
    char buf[100];

    if((fd = open("/dev/ptyp4", 2)) == -1){
        perror("/dev/ptyp4");
        exit(1);
    }
    while((n = read(fd, buf, sizeof(buf))) > 0)
        (void) write(1, buf, n);
    if(n == -1)
        perror("read");
    else
        printf("EOF");
    exit(0);
}

2) Typing EOF to a process expecting input from a shell window in EMACS - the process is undisturbed.
_______________________________________________________________________________
sys/tty_pty.c--sys Web Dove <dove@sylvester> 21 Feb 84

Using pty's as a link to remote sites means characters read from the ptc side get sent to the remote terminal. When these characters are 02xx timing characters they are sent directly. This means that if the user tty program doesn't translate them, the terminal gets broken. Since the user terminal program generally doesn't know whether the pty is in raw vs cooked mode, it isn't a simple thing for it to expand those timing characters properly.

REPEAT BY:
We have seen the problem with a non-translating server for remote terminals. Because the timing characters are not translated, the terminal gets broken.

FIX:
Add code in ptcread() to check for cooked mode and if cooked, to translate the timing characters into an appropriate number of nulls for the current speed that the terminal is operating at.
_______________________________________________________________________________
sys/tty_pty.c--sys decvax!uthub!thomson (Brian Thomson) 19 Jun 84 +FIX

Oink oink! That is the sound that data makes as it travels through 4.2BSD's pseudo-tty driver. Even in the high-volume direction (slave to controller) there is a great deal of code executed per-character.
On our otherwise idle 750 I measured the maximum pty throughput at 5K chars/sec.; after applying the following mods it reached 30K chars/sec. If your machine is often accessed through rlogin(1c) this can mean considerable savings in system-state CPU time.

REPEAT BY:
Run this program and use iostat(1) to see what your character rate is.

#include <sys/types.h>

char buf[1024];
int wsize = 1024;

main()
{
    int csock, dsock, i;

    for(i=0 ; i<wsize; i++)
        buf[i] = '0';
    csock = getpty(&dsock);
    if(csock == -1) {
        perror("ptty");
        exit(1);
    }
    if(fork() == 0) {
        /* Child, writes on slave. */
        close(csock);
        while(write(dsock, buf, wsize) != -1)
            ;
    } else {
        /* parent reads from controller */
        close(dsock);
        while(read(csock, buf, wsize) != -1)
            ;
    }
    exit(0);
}

getpty(ip)
int *ip;
{
    static char name[] = "/dev/ptyp0";
    int i;
    int res;

    for(i=0; i<16; i++) {
        name[9] = "0123456789abcdef"[i];    /* ptyp0 ... ptypf */
        res = open(name, 2);
        if(res != -1) {
            name[5] = 't';
            *ip = open(name, 2);
            if(*ip != -1)
                return(res);
            name[5] = 'p';
            close(res);
        }
    }
    return(-1);
}
_______________________________________________________________________________
sys/tty_tb.c--sys guest@ucbarpa (Guest Account) 19 Jun 84 +FIX

tbioctl() was apparently never converted to operate in a 4.2bsd environment.

REPEAT BY:
Examine code in tbioctl() in tty_tb.c.
_______________________________________________________________________________
sys/tty_tb.c--sys dagobah!bill (Bill Reeves) 13 Sep 83 +FIX

When a tablet is closed the inuse flag is not cleared. Thus after a while all tablets are unavailable.

REPEAT BY:
Just use it for a while.
_______________________________________________________________________________
sys/ufs_alloc.c--sys decvax!jmcg (Jim McGinness) 6 Feb 84 +FIX

There is a buffer etiquette bug in the cylinder group resource allocation routines `alloccg' and `ialloccg'. It causes a system buffer covering the cylinder group resource counts to be marked BUSY which eventually causes other processes to be blocked with inodes locked.
If there are sufficiently many processes trying to create, extend, or remove files from that cylinder group, the root inode will be locked and the system will appear to be hung.

REPEAT BY:
A prerequisite for this to happen is that a file system must have become full or almost full so that the resource counts in the cylinder groups are zero. The way the problem has occurred on decvax (and on cbosgd) was for the file system containing the uucp spool directories to become almost full.
_______________________________________________________________________________
sys/ufs_alloc.c--sys mckusick@ucbmonet (Kirk Mckusick) 1 Oct 84 +FIX

There are two bugs in checking to see if a fragment can be increased in size. The first bug always causes the check to fail, forcing a new fragment to be allocated. This failure causes a minor performance degradation, but is otherwise harmless. The second bug could potentially cause a system panic, but never occurs because of the first bug.

REPEAT BY:
Though generating a panic is possible in theory, constructing an example is difficult.
_______________________________________________________________________________
sys/ufs_mount.c--sys guest@ucbarpa (Guest Account) 19 Jun 84 +FIX

The mountfs() routine in ufs_mount.c fails to validate some critical data in the superblock before using the data. This can cause UNIX to crash if you inadvertently (or purposely) try to mount a disk with garbage on it.

REPEAT BY:
Mount a filesystem whose superblock contains an absurd fs_sbsize value.
_______________________________________________________________________________
sys/ufs_mount.c--sys guest@ucbarpa (Guest Account) 19 Jun 84 +FIX

getmdev() in ufs_mount.c forgets to iput() on error cases. This can result in a hung system following a rejected mount request.

REPEAT BY:
The following command sequence will wedge UNIX:

/etc/mount /dev/rhp1a /mnt
ls -l /dev/rhp1a

Any character-special device can be used in lieu of /dev/rhp1a.
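The getmdev() leak can be illustrated with a small user-space model. This is a sketch, not the 4.2BSD source (which these abstracts do not include): `struct model_inode`, `model_iget()`, `model_iput()` and `getmdev_model()` are hypothetical names, and the `locked` flag assumes, as the wedged `ls -l` suggests, that the leaked reference also leaves the inode locked. The shape of the fix is to take the reference once and route every exit, successful or not, through a single release point.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an in-core inode (not the 4.2BSD struct). */
struct model_inode {
    int refcount;   /* outstanding references */
    int locked;     /* nonzero while a caller holds the inode lock */
    int is_blkdev;  /* nonzero for a block-special file */
    int rdev;       /* device number, meaningful when is_blkdev is set */
};

/* Lookup hands back a referenced, locked inode (modelled on namei()). */
static void model_iget(struct model_inode *ip)
{
    ip->refcount++;
    ip->locked = 1;
}

/* Release: unlock and drop the reference (modelled on iput()). */
static void model_iput(struct model_inode *ip)
{
    assert(ip->refcount > 0);   /* an extra release would trip this */
    ip->locked = 0;
    ip->refcount--;
}

/*
 * getmdev()-style check: is this a block device?  Every return path
 * goes through the single model_iput() at `out`.
 * Returns 0 and stores the device in *devp, or an errno value.
 */
static int getmdev_model(struct model_inode *ip, int *devp)
{
    int error = 0;

    model_iget(ip);
    if (!ip->is_blkdev) {
        error = ENOTBLK;        /* e.g. /dev/rhp1a, a character device */
        goto out;
    }
    *devp = ip->rdev;
out:
    model_iput(ip);
    return error;
}
```

In this model a rejected request such as the /dev/rhp1a example leaves `refcount` at 0 and `locked` clear, so a later lookup of the same inode would not block.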
_______________________________________________________________________________
sys/ufs_namei.c--sys Jeff Schwab <jrs@Purdue.ARPA> 25 May 84 +FIX

It appears that some versions of the 4.2 kernel have re-implemented the concept of "sticky" directories. The existing code catches the case where a user is attempting to delete a file he does not own, but fails to catch the rename case. Under many conditions, a rename can cause many of the same problems as a delete.

REPEAT BY:
Create a file in a sticky directory that you don't own. Then try and rename it and you can!
_______________________________________________________________________________
sys/ufs_nami.c?--sys Chris Kent <kent@BERKELEY> 28 Jun 83

There seems to be a rather odd behaviour in nami. I will describe it as best I can, but since I can only cause it to happen in a particularly small set of circumstances, I don't fully understand it yet.

It began when we tried to compile uucp; creat() calls to create the temporary files which are then linked to were failing with ENOENT. The person working on the compile tried many things, and became suspicious of the chdir call. He replaced

    chdir(Spool)

with

    chdir("/usr/spool/uucp/"); /* note trailing / */

and things began working again. Removing the trailing / causes things to break. It turns out that changing chdir(Spool) to

    chdir(Spool); chdir(".");

also fixes things. Thus it would appear that some ending condition in namei() (?) is munged.

REPEAT BY:
Simple programs work fine; I can't construct a program that fails. However, all the uucp family programs fail in the same way. I have looked for aliases to chdir in uucp sources that might cause this, but have not found anything. Similarly for creat().
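Returning to the sticky-directory report from Jeff Schwab above: the gap is that the ownership test guards unlink but not rename, even though both remove a name from the directory. A minimal sketch of a shared predicate follows. `sticky_check()` is a hypothetical helper, not kernel code, and the exemptions (superuser, file owner, directory owner) are assumed from the conventional sticky-directory rule rather than taken from the 4.2 source; 01000 is the traditional ISVTX mode bit.

```c
#include <errno.h>

#define MODEL_ISVTX 01000   /* sticky bit, as the traditional ISVTX mode bit */

/*
 * Hypothetical shared check: may `uid` remove a name owned by `file_uid`
 * from a directory with mode `dir_mode` owned by `dir_uid`?
 * Returns 0 if allowed or EPERM if the sticky bit forbids it.
 * The exemptions below follow the conventional rule and are an assumption.
 */
static int sticky_check(unsigned dir_mode, int dir_uid, int file_uid, int uid)
{
    if ((dir_mode & MODEL_ISVTX) == 0)
        return 0;                       /* ordinary directory */
    if (uid == 0 || uid == file_uid || uid == dir_uid)
        return 0;                       /* superuser or an owner */
    return EPERM;
}
```

In this model the unlink path and the rename path (for the source name) would both call sticky_check() before touching the directory; the report says only the former is checked.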
_______________________________________________________________________________
sys/ufs_syscalls.c--sys mckusick@ucbmonet (Kirk Mckusick) 30 Jun 84 +FIX

There is a race condition between the `unlink' and `rename' system calls that can cause the system to leave a reference in a directory that points to an unallocated inode. The next time the entry is accessed, the system panics with a "freeing free inode panic".

REPEAT BY:
When the following two programs are run, the directory entry for "AA" eventually points to an unallocated inode and the `rename' system call panics when it tries to delete the previous file associated with "AA" in preparation for renaming "A".

main()
{
    while(1) {
        close(creat("A",0666));
        rename("A","AA");
    }
}

main()
{
    while(1) {
        unlink("A");
    }
}
_______________________________________________________________________________
sys/ufs_syscalls.c--sys watmath!arwhite (Alex White) 8 Feb 84 +FIX

copen doesn't check permissions if FTRUNC is specified but FWRITE isn't. This means you can truncate files you don't have perms on, and truncate to zero length DIRECTORIES!!!!

REPEAT BY:

#include <sys/file.h>

main()
{
    open("xyz", O_TRUNC|O_RDONLY);  /* xyz with no write perms */
    open(".", O_TRUNC|O_RDONLY);    /* Directory is truncated! */
}
_______________________________________________________________________________
sys/ufs_syscalls.c--sys mazama!stew (Stewart Levin) 10 Jan 84 +FIX

Our local software (as well as commands like `tee') relies on the ESPIPE error from lseek() to determine whether data is coming/going down a pipe. When converting to 4.2 this failed and we tracked it down to lseek setting an EINVAL rather than ESPIPE error number.

REPEAT BY:
Call lseek on a pipe.
FIX:
Change source at line 371 in ufs_syscalls to

if(fp == NULL) {
    if(u.u_error == EINVAL)
        u.u_error = ESPIPE;
    return;
}
_______________________________________________________________________________
sys/ufs_syscalls.c--sys mazama!stew (Stewart Levin) 3 Sep 84

My program was issuing relative seeks, checking for a -1 return code, and then issuing a read, again checking for a -1 return code. The read did return -1 and set EINVAL. The same arguments had been passed to read() in 20 previous calls. Finally I found that the file offset had been decremented below zero by the previous lseek.

REPEAT BY:

printf("%d\n",lseek(fd,10L,0));
printf("%d\n",lseek(fd,-30L,0));
printf("%d\n",read(fd,buffer,10));

FIX:
In lseek() copy fp->f_offset into a local variable and operate on it. If the result is negative, set u.u_error = EINVAL; otherwise store it back into fp->f_offset.
_______________________________________________________________________________
sys/ufs_tables.c--sys salkind@nyu (Lou Salkind) 22 May 84 +FIX

Both ufs_subr.c and ufs_tables.c are used by fsck. In ufs_subr.c, the location of the #include files depends on #ifdef KERNEL. Not so in ufs_tables.c!

REPEAT BY:
Compile fsck. Have different header files floating around.

FIX:
I have changed in ufs_tables.c the line

#include "../h/param.h"

to read

#ifdef KERNEL
#include "../h/param.h"
#else
#include <sys/param.h>
#endif

The other possible change would be to eliminate the KERNEL #ifdef's in ufs_tables.c.
_______________________________________________________________________________
sys/uipc_socket.c--sys sdcsvax!sdccsu3!madden@Nosc 7 Nov 83 +FIX

Under 4.2 BSD, termination of a program which has invoked a listen on a UNIX domain socket will cause an interminable loop at net interrupt level if there are pending connections which have not yet been accepted.

REPEAT BY:
Run program A below in the background. Run program B twice. Kill program A. The result should be a system hang at net interrupt level.
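The lseek() fix proposed in Stewart Levin's 3 Sep 84 report (compute the new offset in a local, commit it only if it is non-negative) can be modelled in user space. This is a sketch under stated assumptions: `struct file_model` and `lseek_model()` are hypothetical, the whence values 0, 1 and 2 correspond to 4.2BSD's L_SET, L_INCR and L_XTND, and arithmetic overflow is deliberately not handled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct file that lseek() uses. */
struct file_model {
    long f_offset;   /* current offset */
    long f_size;     /* file size, for whence == 2 */
};

/*
 * Compute the new offset in a local variable and store it into
 * fp->f_offset only if it is non-negative, as the proposed fix says.
 * whence: 0 = from start, 1 = from current offset, 2 = from end.
 * Overflow is not checked in this sketch.
 * Returns the new offset, or -1 with *errp set to EINVAL.
 */
static long lseek_model(struct file_model *fp, long off, int whence, int *errp)
{
    long newoff;

    assert(fp != NULL && errp != NULL);
    switch (whence) {
    case 0:  newoff = off; break;
    case 1:  newoff = fp->f_offset + off; break;
    case 2:  newoff = fp->f_size + off; break;
    default: *errp = EINVAL; return -1;
    }
    if (newoff < 0) {
        *errp = EINVAL;         /* reject; fp->f_offset is left unchanged */
        return -1;
    }
    fp->f_offset = newoff;
    return newoff;
}
```

Replaying the REPEAT BY against this model, the seek to 10 succeeds, the seek to -30 fails with EINVAL, and the offset stays at 10, so a following read would not see a negative offset.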
_______________________________________________________________________________
sys/uipc_socket.c--sys Mike Braca <mjb%Brown@UDel-Relay> 27 Sep 83 +FIX

If you try to write 64K or more to a pipe in a single write() system call, the system will crash with "panic: sbflush 2" when the reader closes its end of the pipe.

There is a bug in the socket sending routine (sosend()) whereby it doesn't ever do partial writes to the socket. So when you tell it to write 64KB, by golly, it just jams the data in the pipe without regard for the arbitrary pipe size limit of 4KB. This in itself is not that bad (after all, we have 6MB of memory!), but, alas, the size of the data queued in the socket is kept in a short int. So 64KB of buffers have been allocated, but the count has wrapped to 0. The read statement doesn't do anything. But on closing the 'read' half of the pipe, the system crashes because it can't figure out how to de-allocate the 64KB worth of memory buffers.

REPEAT BY:
Compile and run the following program:

main()
{
    int pipefd[2];
    char data[64*1024];

    pipe(pipefd);
    write(pipefd[1], data, 64*1024);
    /* Shouldn't get here, right? WRONG! */
    /* (it should hang because the pipe's not that big, */
    /* and no one is reading it) */
    close(pipefd[1]);
    while (read(pipefd[0], data, 1) > 0);
    close(pipefd[0]);
    /* SURPRISE! your system just crashed. */
}
_______________________________________________________________________________
sys/uipc_socket.c--sys sun!rusty (Russel Sandberg) 2 Apr 84 +FIX

Send or sendto of zero length udp packet returns with no error but doesn't send anything.

REPEAT BY:
Write a program to send zero length udp packets.
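A program along the lines the zero-length UDP REPEAT BY asks for is sketched below. It is written against the modern POSIX sockets API (socklen_t, SO_RCVTIMEO), not the 4.2BSD headers; `zero_length_udp_roundtrip()` is a hypothetical name and the two-second timeout is an arbitrary choice. It sends an empty datagram over loopback: a result of 0 means the empty datagram arrived, while -1 means setup failed or nothing arrived before the timeout, which is how the reported symptom would show up.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Send a zero-length UDP datagram to a socket bound on the loopback
 * address, then try to receive it.  Returns the length recv() reports
 * (0 when the empty datagram arrived), or -1 if setup failed or
 * nothing arrived within the (arbitrary) 2-second timeout.
 */
static int zero_length_udp_roundtrip(void)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    struct timeval tv = { 2, 0 };
    char buf[1] = { 0 };
    int rx, tx, n = -1;

    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx == -1 || tx == -1)
        goto out;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel choose a port */
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        getsockname(rx, (struct sockaddr *)&addr, &alen) == -1 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == -1)
        goto out;

    /* The length argument is 0: an empty UDP payload. */
    if (sendto(tx, buf, 0, 0, (struct sockaddr *)&addr, sizeof addr) != 0)
        goto out;
    n = (int)recv(rx, buf, sizeof buf, 0);
out:
    if (rx != -1)
        close(rx);
    if (tx != -1)
        close(tx);
    return n;
}
```

The receive timeout keeps the program from hanging forever on a kernel that silently drops the empty datagram.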
_______________________________________________________________________________
sys/uipc_socket2.c--sys genji@UCBTOPAZ.CC (Genji Schmeder) 14 Oct 83

large network buffer causing "sbflush 2" panic
_______________________________________________________________________________
sys/uipc_syscalls.c--sys Dave Rosenthal 6 Jul 84 +FIX

The value returned by a successful socketpair() call is the same as the value in sv[1]. The manual says it should be zero. A trivial bug.

REPEAT BY:

#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/file.h>

main(argc,argv)
int argc;
char **argv;
{
    int sv[2], res;

    printf("socketpair() returns %d\n", socketpair (AF_UNIX, SOCK_DGRAM, 0, sv));
    exit (0);
}
_______________________________________________________________________________
sys/uipc_usrreq.c--sys ralph (Ralph Campbell) 12 Sep 83 +FIX

If you pass more than one file descriptor in a message it won't work right.

FIX:
Apply following diff to uipc_usrreq.c/unp_externalize()

------- uipc_usrreq.c -------
473c473
< *(int *)rp = f;
---
> *(int *)rp++ = f;
_______________________________________________________________________________
sys/uipc_usrreq.c--sys watmath!arwhite (Alex White) 23 Jan 84 +FIX

Receiving data with MSG_OOB set causes panic in the unix domain. soreceive() calls pr_usrreq with a newly allocated mbuf, but the code for PRU_RCVOOB is non-existent, hence it always frees it; when it returns to soreceive, soreceive tries to free it again and panics.

REPEAT BY:
Just do a recv with the flag MSG_OOB set in the unix domain.
_______________________________________________________________________________
sys/uipc_usrreq.c--sys watmath!arwhite (Alex White) 20 Feb 84 +FIX

Accept -> soaccept -> uipc_usrreq(PRU_ACCEPT) -> bcopy

Bcopy dies as unp->unp_remaddr == 0

Why?
Because the connect which this accept refers to,

connect -> soconnect -> uipc_usrreq(PRU_CONNECT) -> unp_connect -> unp_connect2 -> m_copy;

m_copy has run out of mbufs and returns zero into unp->unp_remaddr.

REPEAT BY:
Ya gotta be kidding, it was after 18,000 requests for memory denied that we got this one. And there seem to be soooo many bugs that occur if you run out, it isn't funny; you'll never get this one a second time! You'll get hit by one of the others. However, for anybody that wants to try, I enclose changes to kern_exit.c so that when you run out of mbufs you won't panic the next time a process exits...

We keep on running out of mbufs - generally ~600 mbufs allocated to socket structures, ~600 allocated to protocol control blocks, and ~100 to socket names and addresses. (However, we've also had similar crashes without the socket name and address mbufs.) We have hundreds of students running a 5-process game communicating via pipes OR sockets in the unix domain. No, it doesn't seem to be legitimate running out because of too many pipes. There don't seem to be enough sitting around after the crash. I've looked at most students' programmes, and haven't found any yet which seem to cause any problems.
_______________________________________________________________________________
sys/uipc_usrreq.c--sys spgggm@ucbopal.CC (Greg Minshall) 31 Jan 84

If you open a socket in the Unix domain, using Datagrams, you should be able to do a connect(), and then just standard writes (or send()s, or whatever). Instead, after the connect(), a write gives you an EDESTADDRREQ. This is because connect() (actually, unp_connect2() in uipc_usrreq.c) never actually sets SS_ISCONNECTED.

REPEAT BY:
Here are two programs, main()+hintSet and main2()+hintSend, that demonstrate the problem.
(also included is makefile)

###main.c
#include <sys/time.h>
#include <signal.h>

main()
{
    int hintSet(), hintClear();
    int nfds, hintNo;
    long readfs;
    struct timeval timeout;

    timeout.tv_usec = 0;    /* micro seconds */
    timeout.tv_sec = 10;    /* seconds */

    hintNo = hintSet("hint");
    for (; 1 ;) {
        readfs = 1<<hintNo;
        nfds = select(hintNo+1, &readfs, (long *) 0, (long *) 0, &timeout);
        if (nfds > 0) {
            if ((readfs & (1<<hintNo)) != 0)
                hintClear(hintNo);
            printf("hinted\n");
        } else if (nfds == 0)
            printf("timed out\n");
        else {
            perror("select");
            exit(1);
        }
    }
    return(0);
}

###hintSet.c
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/un.h>

extern int errno;

/* hintSet - make room for someone to send us a hint

   path is a unix path name to be used as the address of the hint.
   we return an int (actually, a file descriptor), which should then
   be used in a select(2).

   the following code shows usage...

    #include <sys/time.h>
    #include <signal.h>

    main()
    {
        int hintSet(), hintClear();
        int nfds, hintNo;
        long readfs;
        struct timeval timeout;

        timeout.tv_usec = 0;    micro seconds
        timeout.tv_sec = 10;    seconds

        hintNo = hintSet("hint");
        for (; 1 ;) {
            readfs = 1<<hintNo;
            nfds = select(hintNo+1, &readfs, (long *) 0, (long *) 0, &timeout);
            if (nfds > 0) {
                if ((readfs & (1<<hintNo)) != 0)
                    hintClear(hintNo);
                printf("hinted\n");
            } else if (nfds == 0)
                printf("timed out\n");
            else {
                perror("select");
                exit(1);
            }
        }
        return(0);
    }

   Note that after the select, "hintClear" MUST be called, else any
   future selects become no-ops.

   We take error exits for strange events.

   If the file name is already in use as a socket, we attempt to unlink
   it (an unfortunate occurrence).

   If hintSet is called TWICE (even from two separate users) with the
   same pathname, the second caller will pick up all future hints from
   "hintSend".
*/

int
hintSet(path)
char *path;     /* path name for hint */
{
    int s, length, diddle, savedError;
    long mypid;
    struct sockaddr_un foo;

    s = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (s == -1) {
        perror("hintSet: socket");
        exit(1);
    }

    length = strlen(path);
    if (length > sizeof foo.sun_path)
        length = sizeof foo.sun_path;
    strncpy(foo.sun_path, path, length);
    if (bind(s, &foo, (sizeof foo)-1) == -1) {
        if ( ((savedError = errno) == EADDRINUSE) &&
             (open(path, O_RDONLY) == -1) &&
             (errno == EOPNOTSUPP) &&
             (unlink(path) != -1) &&
             (bind(s, &foo, (sizeof foo)-1) != -1) )
            ;
        else {
            errno = savedError;
            perror("hintSet: bind");
            exit(1);
        }
    }

    /* set non blocking... */
    diddle = fcntl(s, F_GETFL, 0);
    if (diddle == -1) {
        perror("hintSet: fcntl F_GETFL");
        exit(1);
    }
    diddle = fcntl(s, F_SETFL, diddle | FNDELAY);
    if (diddle == -1) {
        perror("hintSet: fcntl F_SETFL");
        exit(1);
    }
    hintClear(s);
    return(s);
}

/* hintClear - clear any hints outstanding on our area... */

int
hintClear(s)
int s;
{
    char buffer[1024];

    while (((read(s, buffer, 1024)) != -1) || (errno != EWOULDBLOCK))
        ;
}

###main2.c
#include <sys/time.h>
#include <signal.h>

main()
{
    int hintSend();

    hintSend("hint");
}

###hintSend2.c
#include <fcntl.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/un.h>

#define MESSAGE "hint"

extern int errno;

/* hintSend - send a hint to an address in the Unix domain.

   the argument 'path' is the Unix pathname waiting for the hint.

   hintSend takes error exits if too strange of things happen

   if "path" doesn't exist, or if no one is connected to it, we
   just quietly return to the caller.
*/
hintSend(path)
char *path;		/* path name for hint */
{
	int s, length;
	struct sockaddr_un foo;

	s = socket(AF_UNIX, SOCK_DGRAM, 0);
	if (s == -1) {
		perror("hintSend: socket");
		exit(1);
	}
	length = strlen(path);
	if (length > sizeof foo.sun_path)
		length = sizeof foo.sun_path;
	strncpy(foo.sun_path, path, length);
	if (connect(s, &foo, (sizeof foo)-1) == -1) {
		perror("hintSend: connect");
		exit(1);
	}
	if (write(s, MESSAGE, strlen(MESSAGE)) == -1) {
		perror("hintSend: write");
		exit(1);
	}
}

###makefile
CFLAGS = -g

main: main.o hintSet.o
	$(CC) $(CFLAGS) main.o hintSet.o -o main
main2: main2.o hintSend.o
	$(CC) $(CFLAGS) main2.o hintSend.o -o main2
main3: main2.o hintSend2.o
	$(CC) $(CFLAGS) main2.o hintSend2.o -o main3
main.o: main.c
hintSet.o: hintSet.c
main2.o: main2.c
hintSend.o: hintSend.c
hintSend2.o: hintSend2.c

FIX: In sys/sys/uipc_usrreq.c, routine unp_connect2, in the switch under "case SOCK_DGRAM", add the line

	soisconnected(so);

to get the stuff set right. (Untested fix.)
_______________________________________________________________________________
sys/various.c--sys kre@ucbmonet (Robert Elz) 21 Oct 83 +FIX
This just might (I say might, because I'm not sure yet) be that elusive inode bug we've been having. It certainly is an inode bug, but I'm yet to be convinced that all the problems we've been having ultimately descend from this (which is either 1 or 2 bugs, depending on how you look at it).

The scenario is something like this... A process is closing a char device, most probably a terminal. close() calls closef() and, when that returns, sets u_ofile to NULL. closef() calls ino_close(), which does an iput() on the inode, then calls (*devsw[].d_close)(). Now imagine that the close routine needs to wait for output queues to drain, or whatever, so it sleeps, and while it's asleep, a signal occurs. The longjmp(u_qsave) exits back to syscall(), which (let's say for simplicity) calls psig() and then exit().
exit(), noticing a u_ofile entry that is != NULL (close() doesn't set it to NULL until closef() returns, which it's not going to do), then calls closef() again (NB: exit() sets u_ofile to NULL before calling closef()). closef() again calls ino_close(), which does another iput(), causing the inode reference count to end up at -1 (and most probably doing many other nasty things).

To fix that, I have just moved the u.u_ofile[..] = NULL; to before the closef() call (I did the same thing with u_pofile[] = 0 (*pf = 0), but I am less sure that that is important).

While I was looking at that, I saw a related, and somewhat messier, bug. The scenario this time starts out the same way, down as far as the iput() in ino_close(). Just after that, in a line marked with XXX, f_count is set to 0. Then the routine belts off to the devsw[].d_close routine (or maybe just does the itrunc(), ifree(), dqrele() sequence): anything that has a sleep() in it. We don't need an interruptible sleep this time. While our process is sleeping, some other process does a falloc() and finds our file slot, which we've generously given away before we're really finished with it. It then grabs it, carefully setting f_count to 1 (or whatever) to mark the file in use. Then our process finishes its sleep and returns from ino_close() to closef(), which then sets f_count to zero. A little later, the second process does a close, which, noticing that f_count < 1, does all the right things, and no problems ensue. But if, in the meantime, some third process has found this file slot with its f_count == 0 and grabbed it again, there are now 2 refs and a ref count of 1. As each of them closes it, they both do an iput(), making the ref count go to -1 and generally stuffing things.

My fix to this is a real kludge, and is explained below. It should be done properly.

REPEAT BY: Make the load average very high (say 50 to 60 or more) with large numbers of processes opening and closing files, etc.
(You need that because of the way falloc() scans the file table.) Run the system like that continuously for a day or so. Do a "pstat -i" and look for something with a ref count of 255 (which is really 65535, or -1; pstat conveniently masks the i_count with 0377, but that is another bug entirely).
_______________________________________________________________________________
sys/vm_mem.c--sys rws@mit-bold (Robert W. Scheifler) 7 Nov 83 +FIX
On large partitions (> 2^19 blocks), the block number gets sign extended, causing "panic: munhash". The Berkeley code should work, but there appears to be a bug in the C compiler.

REPEAT BY: Try to use lots of a large partition.

FIX: In /sys/sys/vm_mem.c in memall() the code

	swapdev : mount[c->c_mdev].m_dev, (daddr_t)(u_long)c->c_blkno

should be changed to

	swapdev : mount[c->c_mdev].m_dev, c->c_blkno

and in /sys/vax/vm_machdep.c in chgprot() the code

	munhash(mount[c->c_mdev].m_dev, (daddr_t)(u_long)c->c_blkno);

should be changed to

	munhash(mount[c->c_mdev].m_dev, c->c_blkno);

because the C compiler apparently incorrectly folds the (daddr_t) and (u_long) together and sign extends anyway. Simply taking out the (daddr_t)(u_long) works, although lint will probably complain about it.
_______________________________________________________________________________
sys/vm_mem.c(?)--sys bdh@cit-750 (Brian D. Horn) 29 Jun 84 +FIX
When using a debugger (anything that uses ptrace(2); this is known to occur when using dbx and sdb), it is possible to crash the system with a "panic: munhash". When examining the traceback, it would appear that the problem originates when a ptrace(4,...) is made (modify child's text segment). We are running on a VAX-11/750 with 2 Mbytes of real memory.

REPEAT BY: Seems to be non-deterministic in nature. Best guess as to how to repeat this is to debug (using dbx or sdb) a "large" (1M or bigger) program, set a breakpoint or two, and start it running. No guarantee that this will cause the panic, however.
_______________________________________________________________________________
sys/vm_swp.c--sys lwa@mit-mrclean (Larry Allen) 30 Oct 84 +FIX
A readv or writev call to or from a raw disk only does the operation specified by the first element of the io vector.

REPEAT BY: Try to perform a readv from a raw disk, specifying a two-element io vector. The number of bytes read will equal the number of bytes specified in the first element of the io vector.
_______________________________________________________________________________
sys/vmpage.c--sys allegra!princeton!astro 6 Jun 83 +FIX
<<FIX THIS BUG IF YOU DO LARGE TRANSFERS ON RAW DMA DEVICES!!!>>
There is a paging routine bug in 4.1 BSD that affects the locking of memory for DMA on raw I/O devices. This bug can cause a process to hang at priority -24 (PSWP+1). The problem occurs when an attempt is made to lock a page that is in the process of being swapped out. The call to mlock in pagein() will block if this is the case. However, anything can happen during this block; in particular, some other process can have grabbed that page. pagein() really should start the processing of that page fault again from the beginning.
_______________________________________________________________________________
sys/vmsched.c--sys Spencer W. Thomas <thomas@utah-cs> 4 Aug 83 +FIX
The t_rm and t_vm fields in the vmtotal structure are usually too big (more than the total real memory on our system in the case of t_rm).

REPEAT BY: Use adb to examine the fields.
_______________________________________________________________________________
sys_inode.c--sys dagobah!efo (Eben Ostby) 17 Nov 83
Processes waiting for a flock lock will hang if someone waiting for the lock is killed. The ILWAIT bit never gets cleared.

REPEAT BY: You'd have to set up a messy sequence of people waiting for shared and exclusive locks, then kill the right guy.
FIX: Either ILWAIT has to be a count rather than a bit (which could be decremented when the guy dies), or everyone waiting for any kind of flock would have to be woken up when the guy dies.
_______________________________________________________________________________
syslog.c--lib Christopher A Kent <cak@Purdue.ARPA> 11 Jan 84 +FIX
Attempts to use syslog(3) fail in programs that perform other network functions. Output to the log file is garbled, or correct network system call invocations fail for no apparent reason. The syslog supplied with sendmail does, however, function correctly. Inspection of the two versions shows that the C library version neglects to bind the datagram socket to any address. Recompiling the supplied libc source and reinstalling /lib/libc.a causes the problem to go away(!).

REPEAT BY: Compile and run the following, both with and without an argument, and inspect the syslog output.

/*
 * demonstrate broken syslog
 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <syslog.h>

struct sockaddr_in sin = { AF_INET };	/* socket address */

main(argc, argv)
char **argv;
{
	int s;

	if(fork())
		exit(0);
	if(argc > 1)
		openlog("stest", LOG_PID);
	syslog(LOG_INFO, "starting up");
	sin.sin_addr.s_addr = INADDR_ANY;
	sin.sin_port = htons(77);
	s = socket(AF_INET, SOCK_STREAM, 0);
	if(s < 0){
		perror("socket");
		exit(1);
	}
	if(bind(s, &sin, sizeof(sin)) < 0){
		perror("bind");
		exit(1);
	}
	syslog(LOG_INFO, "message1");
	syslog(LOG_INFO, "message2");
}
_______________________________________________________________________________

GENERAL INFORMATION ON THE 4.2 BUGLIST FROM MT XINU
_________________________________________________________________
--IMPORTANT DISCLAIMERS--

Material in this announcement and the accompanying reports has been edited and organized by MT XINU as a service to the UNIX community on a non-profit, non-commercial basis.
MT XINU MAKES NO WARRANTY, EXPRESSED OR IMPLIED, ABOUT THE ACCURACY, COMPLETENESS, OR FITNESS FOR USE FOR ANY PURPOSE OF ANY MATERIAL INCLUDED IN THESE REPORTS.

MT XINU welcomes comments in writing about the contents of these reports via uucp or US mail. MT XINU cannot, however, accept telephone calls or enter into telephone conversations about this material.
_________________________________________________________________

Legal difficulties which have delayed the distribution of 4.2bsd buglist summaries by MT XINU have been resolved, and three versions of the buglist are now available.

The current buglist has been derived from reports submitted to 4bsd-bugs@BERKELEY (not from reports submitted only to net.bugs.4bsd, for example). Reports are integrated into the buglist as they are received, so that any distributions are current to within a week or so.

Buglists now being distributed are essentially "raw". No judgment has been passed as to whether the submitted bug is real or not or whether it has been fixed. Only minimal editing has been done to produce a manageable list. Reports which are complaints (rather than bug reports) have been eliminated; obscenities and content-free flames have been eliminated; and duplicates have been combined. The resulting collection contains over 500 bugs.

Three versions of the buglist are now ready for distribution:

2-Liners: Two lines per bug, including a concise description, the affected module, and the submitter. Approximately 55K bytes, it is being distributed to net.sources concurrently with this announcement.

All-but-Source: All material, except that all but the most innocuous of source material has been removed to meet AT&T license restrictions. Nearly a megabyte, this will be distributed to net.sources in several 50K byte pieces later this week. A paper listing or mag tape is also available; see below. Please note that local usenet size restrictions may prevent large files from being received and/or retransmitted.
MT XINU will not dump this material on the net a second time; if your site has not received material of interest to you within a reasonable time, please send for a paper or tape copy.

All-with-Source (FOR SOURCE LICENSEES ONLY): 4.2 licensees who also have a suitable AT&T source license can obtain a tape containing all the material, including proposed source fixes where such were submitted. Once again, MT XINU has not evaluated, tested or passed judgment on proposed fixes; all we have done is organize the collection and eliminate obvious irrelevancies and duplications.

A free paper copy of the All-but-Source list can be obtained by sending mail to:

	MT XINU
	739 Allston Way
	Berkeley CA 94710
	attn: buglist

or electronic mail to:

	ucbvax!mtxinu!buglist

(Be sure to include your US mail address!)

For a tape, send a check for $110 or a purchase order for $150 to cover MT XINU's costs to the address given above (California orders add sales tax). For the All-with-Source list, mail us a request for the details of license verification at either of the above addresses.