[comp.os.minix] 1.5.0 upgrade: REPORT + BUGS + PATCHES

bill@chinet.chi.il.us (Bill Mitchell) (01/08/90)

I've been running P-H minix 1.3 on my xt clone using bios_wini.c for about
a year now, with the root on /dev/ram being loaded from /dev/hd3 at boot
time.  I have never had a crash, though I have had numerous unexpected
forced logouts.

I am now running minix 1.5.0 in bios configuration with hd3 as the root.
I really appreciate having the 240K formerly dedicated to /dev/ram back.

I've installed and am using the 1.5.0 shell, and have tried a few of the
1.5.0 commands.  I've had problems with some of them, and so am holding
off on wholesale replacement of 1.3 commands with 1.5.0 commands.

Since upgrading to 1.5.0 a couple of days ago, I've been compiling 1.5.0
commands.  In this process, I've had the system crash hard (CNTL-ALT-DEL
would not bring about a reboot) about half a dozen times.  each time, I
found that /dev/hd3 had been trashed, and had to put the root file system
back on hd3 from a backup copy on floppy.  I've also found /dev/hd3
apparently trashed when rebooting after an uneventful shutdown.  I've
also seen numerous unexpected forced logouts and unexpected forced returns
to a suspended mined session, usually (maybe always ??) associated with an
abnormal program termination.

I'm not a *NIX wizard, and so don't feel comfortable working on lib or
system stuff.  Looking into the problems I've seen with a couple of
commands, however, got me below the application level command code
pretty quickly.  Below are details of what I found in these cases, and
one patch to a lib routine.  Will the ARBITER_OF_PATCHES please look
at this?

Speaking of the ARBITER_OF_PATCHES, is there any accepted mechanism for
identifying "official" patches?  Seems that without one, 1.5.0 will
quickly develop into a bunch of separate flavors as patches like mine
below are posted, and various sites install various subsets of the posted
patches.

-----------------------------------------------------------------------

PROBLEM:  the pwd command didn't work.

This is apparently due to a problem in /usr/lib/posix/getcwd.c.  I had
a deleted directory in the root ahead of the /usr entry, and this was
causing getcwd.c to never find /usr.  Following is a patch to getcwd.c
which seeks to correct this by preventing early cutoff if a deleted or
otherwise invalid entry is encountered:

----------------------- cut here ---------------------------

*** getcwd.c.orig	Sat Jan  7 13:33:42 1989
--- getcwd.c	Sat Jan  7 13:38:08 1989
***************
*** 2,7 ****
--- 2,10 ----
  
  /* Directly derived from Adri Koppes' pwd(1).
   * Modified by Andy Tanenbaum for POSIX (29 Oct. 1989)
+  *
+  * patches:
+  * wtm1 - prevent early cutoff (bill mitchell 07 jan 90)
   */
  
  #include <lib.h>
***************
*** 57,66 ****
  		} else {
  			temp_name[0] = '\0';
  			strncat(temp_name, d.d_name, NAME_MAX);
! 			if (stat(temp_name, &dir_entry) == -1) {
! 				close(fd);
! 				return((char *)NULL);
! 			}
  			if (current.st_dev == dir_entry.st_dev &&
  			    current.st_ino == dir_entry.st_ino)
  				found = 1;
--- 60,67 ----
  		} else {
  			temp_name[0] = '\0';
  			strncat(temp_name, d.d_name, NAME_MAX);
! 			if (stat(temp_name, &dir_entry) == -1)
! 				continue; /* wtm1 */
  			if (current.st_dev == dir_entry.st_dev &&
  			    current.st_ino == dir_entry.st_ino)
  				found = 1;

----------------------- cut here ---------------------------

NOTE:  pwd.c, tar.c, and gather.c use getcwd()

-----------------------------------------------------------------------

PROBLEM:  The ls command didn't work

"ls file" worked, but "ls", "ls .", and "ls /" didnt.

This is apparently due to some problem below readdir(), so dp->d_name 
isn't returned as expected.  Things got complicated pretty fast here.
Knowing nothing about POSIX stuff, I didn't feel qualified pursue it.
I did notice that /usr/include/dirent.h had struct dirent.d_name
declared as "char d_name[1]", which looked suspicious.

croes@fwi.uva.nl (Felix A. Croes) (01/11/90)

In article <1990Jan7.205429.5247@chinet.chi.il.us> bill@chinet.chi.il.us (Bill Mitchell) writes:
> [stuff deleted]
>PROBLEM:  The ls command didn't work
>
>"ls file" worked, but "ls", "ls .", and "ls /" didnt.
>
>This is apparently due to some problem below readdir(), so dp->d_name 
>isn't returned as expected.  Things got complicated pretty fast here.
>Knowing nothing about POSIX stuff, I didn't feel qualified pursue it.
>I did notice that /usr/include/dirent.h had struct dirent.d_name
>declared as "char d_name[1]", which looked suspicious.
		   ^^^^^^^*^

I don't know about POSIX, but it should be at least 14. It was 14 in Minix 1.3,
since file names in directories are 14 bytes long. However, using 14 will cause
problems with numerous incorrect programs which assume that file names in
directories always terminate in \0 (ever tried 1.3 tar?).

The comment says: /* name of file plus a 0 byte */ , so perhaps 15 should be
used? Does any POSIX wizard know?

--

Felix Croes    (croes@fwi.uva.nl)

nfs@notecnirp.Princeton.EDU (Norbert Schlenker) (01/12/90)

In article <363@fwi.uva.nl> croes@fwi.uva.nl (Felix A. Croes) writes:
|In article <1990Jan7.205429.5247@chinet.chi.il.us> bill@chinet.chi.il.us (Bill Mitchell) writes:
|> [stuff deleted]
|>This is apparently due to some problem below readdir(), so dp->d_name 
|>isn't returned as expected.  Things got complicated pretty fast here.
|>Knowing nothing about POSIX stuff, I didn't feel qualified pursue it.
|>I did notice that /usr/include/dirent.h had struct dirent.d_name
|>declared as "char d_name[1]", which looked suspicious.
|
|I don't know about POSIX, but it should be at least 14. It was 14 in Minix 1.3,
|since file names in directories are 14 bytes long. However, using 14 will cause
|problems with numerous incorrect programs which assume that file names in
|directories always terminate in \0 (ever tried 1.3 tar?).
|
|The comment says: /* name of file plus a 0 byte */ , so perhaps 15 should be
|used? Does any POSIX wizard know?
|
|Felix Croes    (croes@fwi.uva.nl)

I'm no POSIX wizard, but the declaration is reasonable.  There has been a long
discussion of this in comp.lang.c, the apparent conclusion of which is that
this is a reasonably portable construct for a structure that is built by
some trusted routine and for which a pointer is returned to a caller (like
the return values from readdir()).

It was pointed out that the structure definition gives the average programmer
the idea that actually allocating such a beast is reasonable.  Regrettably,
this isn't the case here.  Do NOT write programs that use this structure
definition - write programs that use a pointer to it.

Norbert

paula@bcsaic.UUCP (Paul Allen) (01/13/90)

In article <363@fwi.uva.nl> croes@fwi.uva.nl (Felix A. Croes) writes:
+In article <1990Jan7.205429.5247@chinet.chi.il.us> bill@chinet.chi.il.us (Bill Mitchell) writes:
+>
+>This is apparently due to some problem below readdir(), so dp->d_name 
+>isn't returned as expected.  Things got complicated pretty fast here.
+>Knowing nothing about POSIX stuff, I didn't feel qualified pursue it.
+>I did notice that /usr/include/dirent.h had struct dirent.d_name
+>declared as "char d_name[1]", which looked suspicious.
+		   ^^^^^^^*^
+
+I don't know about POSIX, but it should be at least 14. It was 14 in Minix 1.3,
+since file names in directories are 14 bytes long. However, using 14 will cause
+problems with numerous incorrect programs which assume that file names in
+directories always terminate in \0 (ever tried 1.3 tar?).
+
+The comment says: /* name of file plus a 0 byte */ , so perhaps 15 should be
+used? Does any POSIX wizard know?

Well, I'm not a POSIX wizard, but I have tried to figure out the
directory access stuff.  As I recall, dirent structs get malloc'd
with a size that depends on the length of the filename.  The 'd_name'
identifier is then used as a pointer constant.  I don't think the
declaration of dirent.d_name is a problem.  

Paul Allen
-- 
------------------------------------------------------------------------
Paul L. Allen                       | pallen@atc.boeing.com
Boeing Advanced Technology Center   | ...!uw-beaver!bcsaic!pallen

inf2007@vax.rz.uni-wuerzburg.dbp.de (inf2007) (02/13/90)

In .... Bill Mitchell writes:
> "ls file" worked, but "ls", "ls .", and "ls /" didnt.
> Knowing nothing about POSIX stuff, I didn't feel qualified pursue it.
> I did notice that /usr/include/dirent.h had struct dirent.d_name
> declared as "char d_name[1]", which looked suspicious.

See my posting of 31. Jan:
Subject:         Bug in 1.5.0 directory functions/headerfiles

Date:         Wed, 31 Jan 90 00:05:35 GMT

Has anybody recompiled the 1.5 directory functions ?
After hours of working i realized that in the new sys/dir.h
two defines are missing:
DIRENTSIZ	and	DIRENTBASESIZ
The bug is very good hidden because in getdents.c these values,
are defined to wrong values if they are undefined !!

The missing code (earlier in sys/dirent.h ) is:

----------------------------------------
#ifdef BSD_SYSV				/* (e.g., when compiling getdents.c) */
extern struct dirent	__dirent;	/* (not actually used) */
/* The following is portable, although rather silly. */
#define	DIRENTBASESIZ		(__dirent.d_name - (char *)&__dirent.d_ino)

#else
/* The following nonportable ugliness could have been avoided by defining
   DIRENTSIZ and DIRENTBASESIZ to also have (struct dirent *) arguments.
   There shouldn't be any problem if you avoid using the DIRENTSIZ() macro. */

#define	DIRENTBASESIZ		(((struct dirent *)0)->d_name \
				- (char *)&((struct dirent *)0)->d_ino)
#endif

#define	DIRENTSIZ( namlen )	((DIRENTBASESIZ + sizeof(long) + (namlen)) \
				/ sizeof(long) * sizeof(long))
----------------------------------------


Robert Regn				University of Wuerzburg, Germany
	regn@uniwue.uucp
	uunet!mcvax!unido!uniwue!regn