ske@pkmab.se (Kristoffer Eriksson) (08/02/89)
As I have told on occasion before, I'm working on porting MGR to Xenix386 on my spare time. (MGR is a windowing system from Bellcore, originally written for Sun systems, that has been distributed in comp.sources.unix) A major obstacle has been that the select system call, that exists in Xenix from version 2.3.1, does not work in many cases, e.g. on pty devices. MGR uses select and ptys. A couple of weeks ago, Clemens Schrimpe (csch@netcs) posted a fix for a bug in select on serial devices. Well, that was just one of many bugs in select, and the fix made no difference for ptys (which apparently do not use the serial device support). I've taken a really deep dive into the guts of Xenix386, and uncovered another couple of bugs, at least in 2.3.1. If they have been fixed in 2.3.2, please let me now. Meanwhile, I have deviced binary patches for these bugs, that I'll describe in a moment. Now for the symptoms: Whenever I perform a select() on the /dev/ptyp*-side of a pty (i suppose that is called the "master" side, and the other is the "slave" side) that is empty, and opened at both ends (the other side is /dev/ttyp*), the system dies a horrible death from "TRAP 000000E in SYSTEM [...] kernel: PANIC: non-recoverable kernel page fault" with the PC at a certain address in mptselect(). Before I applied the fix for Clemens' bug, in MGR, I could avoid the crash by using a number-of-file-descriptors argument less than 32 (MGR uses 32), in which case select() simply never would find anything on the pty file descriptor. This probably also depended on what other devices I used in the same select() call, among other things, making it very difficult to isolate the problem. In MGR, the select() was called with both stdin (the console) and the pty. After applying the fix, the behavior became more consistent: it crashed for any number of file descriptors. Several simple test programs I wrote didn't trigger any bad behavior at all. They probably didn't satisfy all the requirements I stated above. Now I have finally succeded in making a program that always crashes on my system. I have attached it at the end of this posting. I would expect that the effect can vary depending on how the kernel is configured. The bugs that cause the symptoms: 1) The select() routine in the kernel calls a routine named selscan(), that scans all the file descriptors that are indicated by the bitmaps supplied to the system call one by one. If the file is a character device, it performs the equivalent of an ioctl(fd, 0xffff, 0). Inside the kernel, that amounts to calling ioctl routine for the correspondig device, with the minor device number as the first argument. (According to the device driver manual.) The device number is available to the kernel from the in-core inode for the file. There the major and the minor device numbers are stored as two bytes of the same variable. The kernel should mask out only the minor number before passing it on to the device driver routines. That is what the ioctl() kernel routine does. Selscan() does not. Earlier versions were supposed to use the whole device number as argument. Probably the porting source of select() did so too. Seems someone forgot to check all details. Some devices do their own masking of the device number passed. Several have to mask out just portions of the minor number anyway, so they are not affected by this oversight. Som others just read the low byte of the argument. Maybe they have that argument declared as "char" in the source? A few don't care about the device number at all. The console multiscreen has the major number 0. But the pty driver (mptioctl) uses whole int argument as an index into it's private (16-entry) data structure. Guess what happens when the major and minor device number combine to form a much too big index... When this bug is fixed, the system no longer crashes, but in stead select() returns the error EINVAL for ptys whenever the pty is empty. 2) When the driver's ioctl() routine has done it's work, it should call selsuccess() or selfailure() to indicate the outcome of the poll. If neither of those is called, selscan() concludes that the device does not support the special select() poll ioctl, and returns the error EINVAL. The driver routines for the pty master side have names that begin with "mpt". The ioctl routine for that device is called mptioctl(). When it gets called with the select() ioctl, it calls mptselect(). Mptselect() calls selsuccess() when it succeeds, but it does not call selfailure in the opposite case! That bug needs patching at four different places where it just returns in stead of calling selfailure() (or five, if you count one impossible case as well), if you do it at the binary level. When this one is fixed, select() actually appears to work! I can run MGR and really get some output in the windows, for the first time now. (But don't hold your breath waiting for the finished release. I've got plenty of other things to attend to now.) Now to the patches. All fixes just require changing or adding a single instructions, so the can easily be applied directly to the executable with adb, instead of decompiling parts of the kernel to C, fix the C code, compile and link a new kernel. What follows is the adb commands that will do this. Both input commands and the resulting output is showed. Before and after each patch I show the disassembled instructions that are affected, so you can check that you get it right. If the addresses don't work for your version of Xenix386, you may try locating the instructions I show at some different nearby offset. In that case, remember to alter some of the constants in the patches to match the new address offsets. The "*" at the beginning of some lines is the adb prompt. The rest of those lines are adb commands. The lines starting with ";" are my added comments. All other lines are adb output. Notice that "w" and "W" are different commands. Don't get them mnixed up. Of course: YOU DO THIS AT YOUR OWN RISK. ----------- Start binary patches -------------- ; Make a copy to patch, and enter adb. # cp xenix xenix2 # adb -w xenix2 ; ; This is Clemens' serial devices patch in binary form in stead of the C ; rewrite he suggested. The eax register gets zeroed befor returning: ; * ttiocom+24,5?i _ttiocom+0x24: leave ret nop nop mov eax,[ebp+0x10] * ttiocom+26?wc3c9 _ttiocom+0x26: 0x9090= 0xc3c9 * ttiocom+24?wc031 _ttiocom+0x24: 0xc3c9= 0xc031 * ttiocom+24,4?i _ttiocom+0x24: xor eax,eax leave ret mov eax,[ebp+0x10] ; This is the fix to selscan to use only the minor device number. ; movsz is changed to movzx: ; * selscan+11c,2?i _selscan+0x11c: movsx eax,Word Ptr [ebp-0x6] push eax * selscan+11c?Wfa45b60f _selscan+0x11c: 0xfa45bf0f= 0xfa45b60f * selscan+11c,2?i _selscan+0x11c: movzx eax,Byte Ptr [ebp-0x6] push eax ; This changes the first errant return from mptselect into a long jump into ; the selfailure routine. The jump skips the stack frame creation at the ; beginning of selfailure. The rest of selfailure is just a mov to a ; memory location and return. There is a very convenient sequence of nops ; here to patch into. Word alignment of jump targets is a blessing for ; binary patching! The long expression used in the W command computes the ; offset to the jump target. ; * mptselect+9f,6?i _mptselect+0x9f: leave ret nop nop nop mov eax,[ebp-0x8] * mptselect+9f?we9 _mptselect+0x9f: 0xc3c9= 0xe9 * mptselect+a0?Wselfailure+3-mptselect-a0-4 _mptselect+0xa0: 0x90909000= 0xffffbd93 * mptselect+9f,2?i _mptselect+0x9f: jmp near _selfailure+0x3 mov eax,[ebp-0x8] ; The three following patches just changes three additional returns from ; mptselect() into short jumps to the above long jump. ; * mptselect+b1,4?i _mptselect+0xb1: leave ret nop test Byte Ptr [esi+0x40],0x4 * mptselect+b1?web|(100*(9f-b1-2)) _mptselect+0xb1: 0xeb= 0xeceb * .,3?i _mptselect+0xb1: jmp near _mptselect+0x9f nop test Byte Ptr [esi+0x40],0x4 * mptselect+fa,3?i _mptselect+0xfa: leave ret mov eax,[ebp-0x8] * mptselect+fa?web|(100*(9f-fa-2)) _mptselect+0xfa: 0xc3c9= 0xa3eb * .,2?i _mptselect+0xfa: jmp near _mptselect+0x9f mov eax,[ebp-0x8] * mptselect+109,4?i _mptselect+0x109: leave ret nop _msgsys: push ebp * mptselect+109?web|(100*(9f-109-2)) _mptselect+0x109: 0xc3c9= 0x94eb * .,3?i _mptselect+0x109: jmp near _mptselect+0x9f nop _msgsys: push ebp ; quit ; * $q ----------- End binary patches ---------------- To run the new kernel, reboot the system and supply the name of the new kernel at the boot prompt. Here follows the test program. When I run it on my unpatched kernel, it crashes at the second select(). If I only patch selscan(), it succeeds, but select() returns -1, and sets errno = 22 (EINVAL). When I use all the patches, it succeeds, and select() returns 0, as well as bits = 0, which looks reasonable to me. ---- Cut Here and unpack ---- #!/bin/sh # shar: Shell Archiver (v1.22) # # Run the following text with /bin/sh to create: # selectbug.c # select.uue # echo "x - extracting selectbug.c (Text)" sed 's/^X//' << 'SHAR_EOF' > selectbug.c && X/* selectbug.c */ X/* This program demonstrates some bugs in the select() system call when used X * on pty devices. If the demonstration is successfull, your system will X * CRASH. Be prepared to run fsck when the system comes up again. Don't run X * this program if you care about your filesystem. The filesystem shouldn't X * be destroyed by crashing, but you never know for sure. You run this at your X * own risk! X * X * If you have the development system version 2.2, you have to link this X * program with select.o, also supplied in this posting. X */ X X#include <sys/select.h> /* struct timeval */ X X/* Timeout used in select() call. Not needed to provoke the bug, but prevents X * the program from hanging if select() actually works. X */ Xstruct timeval timeout = { 1, 0 }; X Xmain() X{ X int fd1, fd2; X int bits, s; X extern int errno; X X fd1 = open("/dev/ptyp0", 2); X printf("ptyp fd = %d\n", fd1); X X fd2 = open("/dev/ttyp0", 2); X printf("ttyp fd = %d\n", fd2); X X printf( X "About to do select for the first time. Nothing usually happens.\n"); X printf("Press return to proceed.\n"); X getchar(); X sync(); /* Minimize damage to the file system. */ X sync(); X X bits = 1 << fd1; X s = select(16, &bits, 0, 0, &timeout); X printf("select = %d bits = %d errno = %d\n", s, bits, errno); X X printf("\nAbout to do second select. Now it should crash.\n"); X printf("Press return to proceed.\n"); X getchar(); X sync(); /* Minimize damage to the file system. */ X sync(); X X bits = 1 << fd1; X s = select(16, &bits, 0, 0, &timeout); X printf("select = %d bits = %d errno = %d\n", s, bits, errno); X X printf("\nThe system apparently did not crash...\n"); X if (s < 0) X printf("But the select still does not work.\n"); X} SHAR_EOF chmod 0666 selectbug.c || echo "restore of selectbug.c fails" set `wc -c selectbug.c`;Sum=$1 if test "$Sum" != "1690" then echo original size 1690, current size $Sum;fi echo "x - extracting select.uue (Text)" sed 's/^X//' << 'SHAR_EOF' > select.uue && Xbegin 644 select.o XM@ 0 B1XWI8- $0T]$105?5$585)68!P"I&@ # @&8C D !E]E<G)N;P#@ XMH2 0 "X*"0 )H !P!R <.C +C_____PTF=!@#D$"8! 4&0 X5#@ 0=?<V5L96-T >XH" !T X Xend SHAR_EOF chmod 0666 select.uue || echo "restore of select.uue fails" set `wc -c select.uue`;Sum=$1 if test "$Sum" != "179" then echo original size 179, current size $Sum;fi exit 0 -- Kristoffer Eriksson, Peridot Konsult AB, Hagagatan 6, S-703 40 Oerebro, Sweden Phone: +46 19-13 03 60 ! e-mail: ske@pkmab.se Fax: +46 19-11 51 03 ! or ...!{uunet,mcvax}!sunic.sunet.se!kullmar!pkmab!ske