[net.unix-wizards] Xenix286 Wonders, Bugs, and Patches...

robin@medstar.UUCP (Robin Cutshaw) (05/03/85)

	For the most part I have been very impressed with Xenix286 3.0 for
the AT.  It seems to be fairly fast (except for c-compiles) and has greatly
increased the amount of work that I can accomplish.  The DOS cross-development
utilities work very well (at least so far).

	I have just about completed a device driver for the IBM PC Network
adapter.  Anyone interested in beta testing please let me know.  I will
also be posting an article on writing device drivers for Xenix.

	I have found many wonderful buglets while porting code to Xenix.  Many
of these bugs were found while porting news2.10.2 and HACK, both of which now
work wonderfully (with several modifications).  Patches mentioned below will
follow shortly.  I would be interested in getting any other bugs that you find.



adb:

	./W 'abcd' writes 'cdcd'

	?M just doesn't work	/* IBM says this command is not and will not
				    be supported; it will be removed from the
				    documentation */

	size of /dev/mem is short by 1/2K.  See related variable in the kernel.

.cshrc/.login :

	set ignoreeof in .cshrc should be in .login  /* IBM agrees */

malloc :

	never returns more than ~32KB (actually 32634) /* IBM says "too bad" */

realloc :

	doesn't work in large model programs

cc :

	order of args is important (like where you put -M2el)

	arrays of exactly 64K bomb the loader with "> 64K array" error, if
	you create an array of 64K-1 it will create a 64K array due to even
	alignment

	no crypt() in libc.a	/* IBM says it cannot ship overseas with this */

	error recovery is terrible (a missing ")" can bomb every following line)

	uname() returns NULL for nodename instead of the nodename in
	/etc/systemid, you can patch the kernel with your nodename at _utsname+8
	(should not exceed 7 characters).

	exec() doesn't always work with the large model due to NULL being placed
	at the end of the arg list instead of (char *)NULL.  This is a
	programmer problem more than a bug.

	system() doesn't work in the large model if the si register is non-zero.
	This is due to the exec() error above.  They used NULL instead of
	(char *) NULL in compiling this routine, therefore any non-zero value
	on the stack prior to the exec call will not give the routine a zero
	terminator.  The si register is on the stack prior to the exec call.

	BITFIELDS initialize wrong.  Bytes are initialized instead of bits
	leading to massive initialization errors down the line.  Don't
	auto-initialize them.
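
A minimal sketch of that workaround, written in present-day C rather than the dialect the Xenix compiler accepts: give the structure no initializer and assign each field at run time. The structure and field names are hypothetical and only illustrate the idea.

```c
/* Hypothetical flags structure; the names are illustrative only. */
struct tty_flags {
    unsigned int echo  : 1;
    unsigned int raw   : 1;
    unsigned int speed : 4;
};

/* No initializer here, so the compiler's static bitfield
 * initialization is never exercised. */
static struct tty_flags flags;

/* Assign every field explicitly at run time instead. */
void init_flags(void)
{
    flags.echo  = 1;
    flags.raw   = 0;
    flags.speed = 9;
}

/* Pack the fields into one int so the result is easy to check. */
int flags_summary(void)
{
    return (flags.echo << 5) | (flags.raw << 4) | flags.speed;
}
```

Calling init_flags() once at startup, before anything reads the fields, takes the place of the initializer.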

	The -MxTyyyy option is documented wrong.  It is a hex value which
	defaults to 0xffff (don't use the 0x).

	All pointer arithmetic is 16 bits long (including the large model 32 bit
	pointers).  This can get you into trouble if you were crazy enough to
	subtract pointers in differing segments.  (ala HACK).

	The loader will only give you 4004 usable stack space.  You must run
	stackuse to see if that will be enough (GASP!) and use the -F option
	(which uses a hex value i.e. -F 2000 for 8K) to increase stack space.
	This is GROSS!!  Your program may bomb in one of several ways if you
	don't have enough stack allocated.  Stackuse doesn't always work!  The
	first data segment and stack segment share the same 64K segment, even in
	large model programs.  Good luck.
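
Because -F takes a bare hex value, the conversion is easy to get wrong by hand. A small sketch, assuming only what is stated above (that -F 2000 means 8K); the helper name is made up for the example.

```c
#include <stdio.h>

/* Format a stack size in bytes as the bare hex string that the -F
 * option is described as expecting (no 0x prefix).
 * Returns the number of characters written, as snprintf does. */
int stack_flag_value(unsigned long bytes, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%lx", bytes);
}
```

For 8192 bytes (8K) this yields 2000, and for 16384 bytes (16K) it yields 4000.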

	The O_NDELAY option to open() may not work in some instances.

make :

	make -n doesn't always give you the sequence that make alone will

	make will run out of environment space if you have several things set
	in yours, especially if you run the c-shell

tar :

	one must link /dev/mt1 to /dev/fd0 for tar to work (or your standard
	floppy device)

	tar doesn't back up empty directories (not sure that it should)

	tar uk 1150 won't work properly

restor :

	Will not work if your environment is too large, may not work anyway

function keys :

	These are user defined (GASP!!).  It would have been much better for
	application programmers if they had just given them a standard escape
	sequence and been done with it.  One cannot read their values back to
	see what they are currently set to do, so if you have multiple processes
	changing them you can get into lots of trouble.  The SETKEY command does
	not allow the standard "^*" escapes so one cannot insert control
	characters with this command; use Esc Q x "sss" instead.  All fkey defs are
	erased when you log off.  I will be posting a patch to read the fkeys
	and a patch to permanently set them to some "known" string.

cursor keys/setkey :

	Only Home, left, right, up, and down, are recognized by the kernel.
	There is a table of strings to pass on to ttin for each key and the
	other cursor keys are set to NULL.  I have a patch for this and will
	post shortly.

stackuse :

	Will barf if given more than a few programs to scan.
	Uses a .cu file for internal use, but this will overwrite the source
	file if only the .c part is used due to the filename length (BARF!)

vi :

	Cannot edit large files.  One must patch the stacksize in the x.out
	header from 0x8000 to 0x5000.  Will post patch shortly.

csh :

	No pushd-popd
	Doesn't notify you when a background task has finished
	#!/bin/sh doesn't work
	No unsetenv so you are stuck with it

sh :

	#!/bin/sh will execute csh

strings :

	The -o option for offsets doesn't work because they compiled the
	Berkeley version with "%D" instead of "%ld".  This compiler doesn't
	recognize D as a valid format for printf.

stty/gtty :

	If one passes the sgttyb struct to stty that came from gtty, the
	echoe bit will be cleared.

dmesg :

	No man entry.
	/usr/adm/msgbuf must be present before the - option works as seen in
	crontab

cron :

	Even if /etc/default/cron parameter CRONLOG is set to YES the file
	/usr/lib/cronlog must exist to get the log (it doesn't create it)

uucp :

	A bug in S3 uucp makes uustat unusable if /usr/lib/uucp and
	/usr/spool/uucp are not on the same file system due to us_open trying
	to link rstat.pid in /usr/spool/uucp to R_stat in /usr/lib/uucp. 
	This was fixed in S5.

	The basic operations guide gives an incorrect example of a dialin/dialout
	script for crontab.  A corrected one follows (/usr/lib/uucp/uuhourly) :

		#
		/bin/disable /dev/tty00
		if ($status == 0) then
			sleep 20
			/usr/lib/uucp/uucico -r1
			/bin/enable /dev/tty00
		endif


EGA support :

	Xenix only supports compatibility mode on the EGA.  A patch has already
	been posted to the net to fix this (ret for initCRTC in /xenix).
	Does not support monochrome monitor on EGA.

DOC :

	Many, many, many, misprints!  Lots of spaces between switches and arguments
	(i.e. -M2eT xxxx where it should be -M2eTxxxx).  In the device driver
	section replace all cent signs with [ and most | signs with ].


Patching the kernel :

	Permanent -
			 adb -w /xenix -

	Online -
			 adb -w /xenix /dev/kmem
			 * $x
			 * /m 18 0 e400
			 * ...your patches...


DISKINFO :

	If you use a drive other than the standard type 2 you will have to patch
	the kernel.  It seems that the diskinfo table which corresponds with the
	BIOS table of 15 is incorrect.  Also, /boot will not work if you have
	more than 4 heads.  I will post a patch to this shortly.

	/etc/hdinit will work for type 2 but not well for others.

Shared data :

	The function sdget() must include the size param at all times and the
	mode param on the first call.  This is documented wrong in the manual.

Color :

	Color works well with the ansi escape sequences as described in the DOS
	manual.  No multiple arguments are allowed though (i.e. Esc[#;#m must
	be Esc[#mEsc[#m).

		from .cshrc :

		if ($TERM == ansi) then
			set prompt="Esc[36m# Esc[0mEsc[37m" # set cyan prompt
		else
			set prompt="# "
		endif

		note : replace Esc with the escape character
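
The same restriction applies to programs that write the sequences themselves. A sketch in C, assuming the usual ANSI attribute numbers (1 for bold, 36 for cyan); the helper name is hypothetical. Each attribute goes out in its own Esc[#m sequence rather than in a combined Esc[#;#m.

```c
#include <stdio.h>

/* Build one escape sequence per attribute, e.g. "\033[1m\033[36m",
 * because the console is reported not to accept "\033[1;36m".
 * Returns the length written, or -1 if buf is too small. */
int sgr_separate(const int *attrs, int n, char *buf, size_t buflen)
{
    size_t used = 0;
    int i;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';
    for (i = 0; i < n; i++) {
        int len = snprintf(buf + used, buflen - used, "\033[%dm", attrs[i]);
        if (len < 0 || (size_t)len >= buflen - used)
            return -1;
        used += (size_t)len;
    }
    return (int)used;
}
```

The resulting string can be written to the terminal with fputs() ahead of the text it should colour.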


Questions, comments, bugs to me...

-- 
----
Robin Cutshaw
Director of Systems Research, MedSTAR, Inc.
..{akgua,gatech,gacsr}!medstar!robin

guy@sun.uucp (Guy Harris) (05/04/85)

> 	no crypt() in libc.a	/* IBM says it cannot ship overseas with this

AT&T also says so.  The problem is that the gov't has some sort of embargo
on exporting "encryption technology", and instead of AT&T trying to get
an exemption for the code in UNIX they just decided to say "no, you can't
export any encryption code" and introduced "International System V".  Feh.

> 	uname() returns NULL for nodename instead of the nodename in
> 	/etc/systemid, you can patch the kernel with your nodename at
>	_utsname+8 (should not exceed 7 characters).

Every other System(III|V) I've seen supported 9 characters (or 8 plus
a '\0' at the end) for the names in the "utsname" structure.  Don't know
why Microsoft changed it...

> 	exec() doesn't always work with the large model due to NULL being
>	placed at the end of the arg list instead of (char *)NULL.  This is a
> 	programmer problem more than a bug.
> 
> 	system() doesn't work in the large model if the si register is
>	non-zero.  This is due to the exec() error above.  They used NULL
>	instead of (char *) NULL...

The former is not at all a bug.  *Good* programmers know to cast null
pointers properly.  Other programmers write code that breaks on all the
machines out there with 16-bit "int"s and > 16-bit pointers...

> 	tar doesn't back up empty directories (not sure that it should)

Standard "tar" doesn't dump directories at all; the only reason non-empty
directories appear is that "tar" creates needed directories when extracting
files.  The 4.xBSD "tar" writes entries for directories as well as files,
which has the side effect that empty directories are dumped.

> csh :
> 
> 	#!/bin/sh doesn't work
> 
> sh :
> 
> 	#!/bin/sh will execute csh

"#!" is a kernel feature, and is only in 4.xBSD and a few other systems
(Masscomp?).  Microsoft didn't pick it up.

> strings :
> 
> 	The -o option for offsets doesn't work because they compiled the
> 	Berkeley version with "%D" instead of "%ld".  This compiler doesn't
> 	recognize D as a valid format for printf.

"printf" was changed in System III so that %[DOX] no longer are equivalent
to %l[dox].  The latter is preferable, both because System III/V don't
support the former and because %D in programs gets turned into a date by
SCCS.  Nobody should be using %D, %O, or %X anymore; %ld, %lo, and %lx work
just as well and will also work in System III/V.
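
A sketch of the portable form, assuming the offset is kept in a long. The function name and the field width of 7 are illustrative choices, not taken from the strings source.

```c
#include <stdio.h>

/* Format a file offset followed by a space into buf.
 * Returns the number of characters written, as snprintf does. */
int format_offset(long offset, char *buf, size_t buflen)
{
    /* %ld, not %D: System III/V printf does not treat %D as a
     * long decimal conversion. */
    return snprintf(buf, buflen, "%7ld ", offset);
}
```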

> stty/gtty :
> 
> 	If one passes the sgttyb struct to stty that came from gtty, the
> 	echoe bit will be cleared.

By your mention of "echoe", I presume Xenix 286 is System III or V
compatible.  If so, you should *NOT* be using "stty" or "gtty"!  They are
implemented only as, to quote the comment in the code, a "compatibility
aide"(sic).  They are backwards compatible NOT with V7, but with UNIX/TS
1.0.  UNIX/TS 1.0 didn't support "echoe" (i.e., CRT rubout), but then
neither did V7; as such, AT&T decided to have "stty" clear the "echoe" bit
(rather than leaving it alone).  V7 terminal "ioctl"s will NOT work under
System III/V's tty driver, so programs written for V7 which use "stty" or
"gtty" have to be rewritten; programs written for UNIX/TS 1.0 should be
rewritten so that, among other things, they preserve the setting of the
ECHOE bit.
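
To illustrate what preserving a bit means in practice, here is a sketch that uses the later POSIX termios interface instead of the System III termio ioctls under discussion. That is a deliberate substitution, since termios is what current systems provide, and the function names are hypothetical. The pattern is the point: fetch the settings, change only the bit you mean to change, and store them back.

```c
#include <termios.h>

/* Clear ECHO in an already-fetched settings block and leave every
 * other flag, including ECHOE, exactly as it was. */
void echo_off_keep_rest(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)ECHO;
}

/* Apply the change to a terminal: fetch, modify, store.
 * Returns 0 on success, -1 on error (for example, fd is not a tty). */
int set_echo_off(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) == -1)
        return -1;
    echo_off_keep_rest(&t);
    return tcsetattr(fd, TCSANOW, &t);
}
```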

	Guy Harris

mark@cbosgd.UUCP (Mark Horton) (05/06/85)

In article <2158@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>
>> 	exec() doesn't always work with the large model due to NULL being
>>	placed at the end of the arg list instead of (char *)NULL.  This is a
>> 	programmer problem more than a bug.
>> 
>> 	system() doesn't work in the large model if the si register is
>>	non-zero.  This is due to the exec() error above.  They used NULL
>>	instead of (char *) NULL...
>
>The former is not at all a bug.  *Good* programmers know to cast null
>pointers properly.  Other programmers write code that breaks on all the
>machines out there with 16-bit "int"s and > 16-bit pointers...

Sorry, Guy, I wish it were that simple.  Get out your Unix Programmer's
Manual and look up EXECL(3) (or EXEC(2) on System III/V).  You'll notice
that the DOCUMENTED FORM of the execl call says to pass a zero.   Not
a (char *) 0.  Just a plain old 0.  This worked fine on the PDP-11 and
the VAX, but just in the last few years people have discovered that this
single property of UNIX won't work on a machine with pointers that are
bigger than integers.  This includes some 68K implementations (although
most, including Sun, provide 32 bit integers, presumably this is one reason
why) and anything using the large model on an 8086 family machine.

Of course, the fix is easy, just cast the 0 to (char *), but there is a
lot of code out there that doesn't do this.  Also, if you make the cast,
you have to worry about what happens if your code is run on a machine with
big integers and small pointers that didn't change the interface!

I wonder what the /usr/group standard and the System V standard recently
published say?

>> stty/gtty :
>> 
>> 	If one passes the sgttyb struct to stty that came from gtty, the
>> 	echoe bit will be cleared.
>
>By your mention of "echoe", I presume Xenix 286 is System III or V
>compatible.  If so, you should *NOT* be using "stty" or "gtty"!  They are
>implemented only as, to quote the comment in the code, a "compatibility
>aide"(sic).  They are backwards compatible NOT with V7, but with UNIX/TS
>1.0.  UNIX/TS 1.0 didn't support "echoe" (i.e., CRT rubout), but then
>neither did V7; as such, AT&T decided to have "stty" clear the "echoe" bit
>(rather than leaving it alone).  V7 terminal "ioctl"s will NOT work under
>System III/V's tty driver, so programs written for V7 which use "stty" or
>"gtty" have to be rewritten; programs written for UNIX/TS 1.0 should be
>rewritten so that, among other things, they preserve the setting of the
>ECHOE bit.

You mean UNIX/TS 2.0; 1.0 was released to the public as PWB 1.0.  My
impression of Xenix was that they had enhanced the stty emulation to
the point where it handled more than just what 2.0 did - that CBREAK
was now supported and ECHOE got left alone.  (They didn't add support
for TANDEM, but that's pretty obscure.)  You can argue that you should
not use stty on a System III derivative such as Xenix, but you'll find
that lots of pieces shipped with Xenix, like vi and curses, use it.

It is possible that Xenix 286 has less support for the V7 tty driver
than Xenix 3.0 does; I think I would have noticed if my echoe bit got
cleared every time I went into vi or ran a program that uses curses.
However, not having the source to Xenix, I can't be sure.

henry@utzoo.UUCP (Henry Spencer) (05/08/85)

> I wonder what the /usr/group standard and the System V standard recently
> published say?  [about the last argument to execl]

Both get it right:  the last argument is "(char *)0", not just "0".
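
In code, that form looks like the sketch below. It assumes a POSIX-style fork() and waitpid() and that /bin/sh exists; the wrapper name is made up for the example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "sh -c <cmd>" and return its exit status, or -1 on failure. */
int run_shell(const char *cmd)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* The terminator is cast: a bare 0 is passed as an int, which
         * is the wrong size where pointers are wider than ints. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)0);
        _exit(127);             /* reached only if execl failed */
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```
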
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

mash@mips.UUCP (John Mashey) (05/08/85)

> In article <2158@sun.uucp> guy@sun.uucp (Guy Harris) writes:
> >By your mention of "echoe", I presume Xenix 286 is System III or V
> >compatible.  If so, you should *NOT* be using "stty" or "gtty"!  They are
> >implemented only as, to quote the comment in the code, a "compatibility
> >aide"(sic).  They are backwards compatible NOT with V7, but with UNIX/TS
> >1.0.  UNIX/TS 1.0 didn't support "echoe" (i.e., CRT rubout), but then
> >neither did V7; as such, AT&T decided to have "stty" clear the "echoe" bit
	....
Mark Horton <1143@cbosgd.UUCP> writes:
> 
> You mean UNIX/TS 2.0; 1.0 was released to the public as PWB 1.0.  My
> impression of Xenix was that they had enhanced the stty emulation to
> ...

Not quite: more accurately (<bad words> upon the numbering):

PWB/UNIX 1.0  was a Version 6-based system, as was 1.1 & 1.2; only 1.0
was released outside, as I recall [which was too bad: 1.2 was a really
clean, well-tuned V6].

UNIX/TS 1.0 [my manual says Nov 78] was basically V7 + few kernel changes
derived from PWB + some USG Generic 3 stuff. Its goal was to get at least the
time-sharing kernel interface standard. It didn't have SCCS & other PWB
major user-level subsystems, although little things crept in.

PWB/UNIX 2.0 [June 1979] was UNIX/TS 1.0 + the rest of the PWB stuff.

UNIX 3.0 [June 1980] was System III.  Note there was no UNIX 2.0, whose
number was taken by the last PWB release.  Most of PWB/UNIX 2.0 was included.
This is where ioctl & echoe appear.
-- 
-john mashey
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!glacier!mips!mash
ARPA:	mips!mash@SU-Glacier.ARPA
DDD:  	415-960-1200
USPS: 	MIPS, 1330 Charleston Rd, Mtn View, CA 94043

BARNES@TL-20A.ARPA (05/09/85)

In article <2158@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>
>> 	exec() doesn't always work with the large model due to NULL being
>>	placed at the end of the arg list instead of (char *)NULL.  This is a
>> 	programmer problem more than a bug.
>> 
>> 	system() doesn't work in the large model if the si register is
>>	non-zero.  This is due to the exec() error above.  They used NULL
>>	instead of (char *) NULL...
>
>The former is not at all a bug.  *Good* programmers know to cast null
>pointers properly.  Other programmers write code that breaks on all the
>machines out there with 16-bit "int"s and > 16-bit pointers...

NULL is usually declared in stdio.h as:
	#define NULL 0
Kernighan & Ritchie on page 98 state explicitly that, "In general, integers
cannot meaningfully be assigned to pointers; zero is a special case."  On the
previous page they state, "C guarantees that no pointer that validly points at
data will contain zero, ..."

Thus, the code is not buggy in the least; with or without casting.  If code
such as:
	struct mumble *p;
	...
	p = 0;
does not assign a 32-bit wide zero on a machine with 16-bit int's and 32-bit
pointers then you don't have a buggy program.  You have a buggy compiler that
needs fixing.
-------

gwyn@Brl.ARPA (VLD/VMB) (05/09/85)

You will undoubtedly get a dozen messages saying this, but there
is a world of difference between assigning 0 to a pointer (or
testing a pointer against 0) and passing an (int)0 to a function
where a (whatever *)0 is required.  The latter is not only not
guaranteed to work, it is guaranteed NOT to work on machines with
different sizes for (int) and (whatever *).
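
The distinction can be sketched with a variadic function in the style of execl(). This uses <stdarg.h>, which is newer than the compilers under discussion, and the function names are hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>

/* Count the string arguments starting at 'first', stopping at a
 * null char pointer, the way execl() finds the end of its list. */
int count_strings(const char *first, ...)
{
    va_list ap;
    int n = 0;
    const char *s;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}

/* Assignment: the compiler knows p is a pointer, so the constant 0
 * becomes a null pointer of the right width. */
const char *null_by_assignment(void)
{
    const char *p = 0;
    return p;
}
```

A correct call is count_strings("a", "b", (char *)0). With a bare 0 as the last argument the callee would fetch a pointer where only an int was passed, which is the failure described for exec() in the large model.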

Hastings@SU-SUSHI.ARPA (Andrew) (05/11/85)

Please!  Let's not start the Null pointer discussion again!

I've seen the same points made umpteen times on INFO-C.  It seems someone
would claim that null pointers don't need to be cast in function calls,
Guy Harris would argue them into submission, and the discussion would die
down.  Two weeks later, someone else who apparently hadn't seen the previous
discussion would make the same claim, and the argument would start all over
again.

I don't know if this is still going on, since I stopped reading INFO-C.

UNIX-WIZARDS is supposed to be about UNIX.  Discussions about C that are
not specific to UNIX should be in INFO-C -- especially confusion about
null pointers!

-Andy
-------

alex@ucla-cs.UUCP (05/12/85)

[Yes, Virginia, people do post without thinking!]

In article <10550@brl-tgr.ARPA> BARNES@TL-20A.ARPA writes:
>In article <2158@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>>> 	exec() doesn't always work with the large model due to NULL being
>>>	placed at the end of the arg list instead of (char *)NULL.  This is a
>>> 	programmer problem more than a bug...
>
>NULL is usually declared in stdio.h as:
>	#define NULL 0
>Kernighan & Ritchie on page 98 state explicitly that, "In general, integers
>cannot meaningfully be assigned to pointers; zero is a special case."  On the
>previous page they state, "C guarantees that no pointer that validly points at
>data will contain zero, ..."
>
>Thus, the code is not buggy in the least; with or without casting...

Aaaaarrrrrggggghhhhhhhh!  How about net.lang.c.bozo?  

When NULL is passed as a parameter the compiler doesn't necessarily know
that NULL is supposed to be a pointer and should possibly be widened or
otherwise modified.  After all NULL is simply 0; the context surrounding
its use determines whether it is a pointer.  Since functions can be
defined in different files and compiled separately, the formal parameter
declarations for the called function aren't guaranteed to be available
when the call is compiled, and one must explicitly cast the NULL
parameter to a pointer if one wants to write correct code.
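
For comparison, function prototypes as defined in ANSI C (not available in the compilers discussed here) give the compiler exactly the information it lacks. A sketch with hypothetical names:

```c
#include <stddef.h>

struct whatever { int x; };

/* The prototype is visible at every call site below, so the compiler
 * converts a plain 0 argument to a null 'struct whatever *'. */
int is_null(struct whatever *p);

int call_with_plain_zero(void)
{
    return is_null(0);          /* converted by the prototype */
}

int call_with_cast(void)
{
    /* Still needed when no prototype is in scope, and for variadic
     * arguments such as the end of an execl() list. */
    return is_null((struct whatever *)0);
}

int is_null(struct whatever *p)
{
    return p == NULL;
}
```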

The bug, as Guy pointed out, is in the programmer rather than the compiler.
And the bug seems to be amazingly difficult to get rid of.

Alex Quilici     alex@ucla-cs   {ucbvax, ihnp4}!ucla-cs!alex

jonk@tekigm.UUCP (Jonathan W. Krueger) (05/14/85)

'#define'ing NULL to be simply 0 (zero) can and will screw up C programs
on the '286 (or '86) which make the assumption that
sizeof(int) == sizeof(int *).  The following exemplifies this:
	baz()
	{
	  ...
	  foo( NULL, more_args_of_various_sizes, ... );
	  ...
	}

	foo( p, more_params_of_various_sizes, ... )
	    struct whatever *p;
	    typeof more_para....;
	{ if ( p == NULL ) /* then */ { ... }
	  ...
	}

The Xenix '286 C compiler generates the call to foo with an "int" 0 (2 bytes),
not a "long" 0 (4 bytes).  foo expects 4 bytes in that position.  This is
partly why on Xenix-286 you find three (or more) packagings of the standard C
library: one for each compiler memory model, small, medium and large.  Another
reason is that larger programs need to use "far jumps" rather than "near
jumps."

Of course, I trust you to tell me if I'm wrong.

			-- Humbly,
			   Jonathan W. Krueger
			...ihnp4!tektronix!tekigm!jonk

			"To change my opinion,
			 you must first change my reality."

robin@medstar.UUCP (Robin Cutshaw) (05/15/85)

    As pointed out earlier, the utsname structure has strings of len 9
not 8, therefore the patch needed to add your nodename is at utsname+9 not
utsname+8 and the length can be 8 characters plus the null.
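
To check what the kernel reports after patching, a few lines of C suffice. A sketch using the standard uname() call; the wrapper name is illustrative.

```c
#include <string.h>
#include <sys/utsname.h>

/* Copy the kernel's nodename into buf. Returns 0 on success, -1 if
 * uname() fails or buf is too small to hold the name. */
int get_nodename(char *buf, size_t buflen)
{
    struct utsname u;

    if (uname(&u) == -1)
        return -1;
    if (strlen(u.nodename) >= buflen)
        return -1;
    strcpy(buf, u.nodename);
    return 0;
}
```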

-- 
----
Robin Cutshaw
uucp:   ...!{akgua,gatech}!medstar!robin

caf@omen.UUCP (Chuck Forsberg WA7KGX) (05/15/85)

>NULL is usually declared in stdio.h as:
>	#define NULL 0
>Kernighan & Ritchie on page 98 state explicitly that, "In general, integers
>cannot meaningfully be assigned to pointers; zero is a special case."  On the
>previous page they state, "C guarantees that no pointer that validly points at
>data will contain zero, ..."
>
>Thus, the code is not buggy in the least; with or without casting.  If code
>such as:
>	struct mumble *p;
>	...
>	p = 0;
>does not assign a 32-bit wide zero on a machine with 16-bit int's and 32-bit
>pointers then you don't have a buggy program.  You have a buggy compiler that
>needs fixing.

The problem is not a buggy compiler (Lord knows there are enough to go
around in PC-AT Xenix) but the difficulty of transforming a sow's ear
(8086 segmented CPU) into a silk purse (32 bit CPU).

Pushing an int onto the stack pushes 16 bits but a large model program
is expecting to get a 32 bit magic cookie (pointer to char).

Passing zero to a function is not as universal as assigning zero to
something.
-- 
Chuck Forsberg WA7KGX	..!tektronix!reed!omen!caf
Omen Technology Inc 17505-V NW Sauvie IS RD Portland OR 97231
Voice: 503-621-3406	Modem: 503-621-3746 (Hit CR's for speed detect)

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (05/17/85)

> 	  foo( NULL, more_args_of_various_sizes, ... );

Coding bug.  This topic has been beaten to death more than once.
Go find a good C book (Harbison & Steele recommended).