gold@beareq.UUCP (Dan Gold) (12/21/89)
The following scenario frequently results in the unwanted rebooting of a Sun4/280 server:

(1) I rlogin to the server (perhaps not incidentally, my boot server) from a SparcStation1 to run a program with substantial memory and I/O requirements. The program is written in C, compiled under SunOS 4.0.3, and uses approximately 20-30 Megabytes of memory at its peak, malloc()'ed in 1, 2, and 8 Megabyte blocks. I/O is via NFS, reading from a disk served by a Sun3/280. XDR is NOT used; however, all structures read from the Sun3 consist SOLELY of 4-byte members. The Sun3 runs SunOS 3.5; the others run 4.0.3. The Sun4/280 has approximately 128 Megabytes of physical memory.

(2) The killer program behaves as follows. Sometimes it executes properly. Other times it hangs the SparcStation, occasionally accompanied by NFS read errors (the NFS server, the Sun3/280, not responding). When control returns to the SparcStation, it is because the Sun4/280 has rebooted automatically. Then the message "Read Error From Network. Connection reset by peer." appears, and lo, I am rlogged out. Often the program appears to execute properly, only for the Sun4/280 to reboot later with the same message, "Read Error From Network. Connection reset by peer."

1. With structures consisting solely of 4-byte members, is it necessary to use XDR when going from a Sun3/280 to a SparcStation? A Sun3/280 to a Sun4/280? A Sun4/280 to a SparcStation? I had thought the answer to all of these questions was "No!"

2. Are there any known bugs in 4.0.3 that would allow a user-mode program to crash the server?

3. What effect might rlogin have on this process?