[comp.lang.c] Distinguished pointers

dhesi@bsu-cs.UUCP (Rahul Dhesi) (07/15/87)

In article <6109@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) 
writes:
>There are several UNIX library routines in various implementations
>that attempt to return a -1 value for a function whose return type
>is (char *). . . .
>I would hope that all these
>botches could be fixed (certainly in any proposed standards!). . .
>Phase 2 -- change these functions to return
>NULL (of the appropriate type) on failure.

On the contrary, I think we need more distinguished pointer values, not
just a single zero or NULL value.  I have a set of custom I/O routines
that use the pointer value NOFILE to indicate that no file could be
opened (equivalent to (char *) 0 in current C implementations) and
another pointer value NULLFILE to indicate that the custom I/O library
routines should ignore all output to this file (equivalent to the value
(char *) -1 but conceptually equivalent to opening /dev/null for
output, except that a file descriptor isn't wasted and the existence of
a null device or its name need not be presumed).

Consider again how gets(3) indicates end-of-file and error.  If there
were two distinguished pointer values, one could test for both
end-of-file and for error without using the botched-up errno.

Upward compatibility will prohibit doing this for gets(), but the need
for more than one distinguished pointer is clear.  We already have a
(char *) -1; let's just give it a different name and keep it.  Then
sbrk can return ERRPTR on error, and we can define ERRPTR as 
(char *) -1, and remain fully compatible to boot.
-- 
Rahul Dhesi         UUCP:  {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!dhesi

ron@topaz.rutgers.edu (Ron Natalie) (07/16/87)

> On the contrary, I think we need more distinguished pointer values, not
> just a single zero or NULL value.  I have a set of custom I/O routines
> that use the pointer value NOFILE to indicate that no file could be
> opened (equivalent to (char *) 0 in current C implementations) and
> another pointer value NULLFILE to indicate that the custom I/O library
> routines should ignore all output to this file (equivalent to the value
> (char *) -1 but conceptually equivalent to opening /dev/null for
> output, except that a file descriptor isn't wasted and the existence of
> a null device or its name need not be presumed).

Actually, I think your example is sloppy programming, but there is no
problem with defining a NULLFILE in C.  They already do it with standard
I/O with stdin, stdout, and stderr.  Do this...

    FILE    null_file;
    #define NULLFILE (&null_file)

Great, now you have a new pointer value, guaranteed to be unique and to
point to nothing else that your I/O routines can test for.

This requires no modifications to existing compilers and it obviates the
need for special case code to map your "-1" pointers to something that is
storable in the architecture that the machine is working with.  Many machines
have no representable "(char *) -1."  Some consider it a botch to even do
(char *) 0.

-Ron
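A minimal sketch of the dummy-object technique. It uses a hypothetical stream type (`struct mystream`) rather than `FILE`, because some C libraries leave `FILE` an incomplete type, and on those `FILE null_file;` will not compile. Each sentinel is a separate object of the pointed-to type, so no pointer conversion is involved. The two addresses are therefore guaranteed to compare unequal to each other, to NULL, and to any real stream. All names here are illustrative assumptions.

```c
/* Hypothetical stream type for a custom I/O library. */
struct mystream {
    int fd;
};

/* Dummy objects whose only job is to have unique addresses;
 * they are never read or written. */
struct mystream mystream_nofile;
struct mystream mystream_nullfile;

#define NOFILE   (&mystream_nofile)    /* "could not open" */
#define NULLFILE (&mystream_nullfile)  /* "discard all output" */

/* Returns 1 for NOFILE, 2 for NULLFILE, 0 for an ordinary stream. */
int stream_kind(const struct mystream *s)
{
    if (s == NOFILE)
        return 1;
    if (s == NULLFILE)
        return 2;
    return 0;
}
```

In a real library, the two objects would be defined once in a .c file and declared `extern` in the header, so that every translation unit sees the same addresses.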

gwyn@brl-smoke.ARPA (Doug Gwyn ) (07/16/87)

In article <846@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>We already have a (char *) -1; let's just give it a different name and keep it.

You missed my point entirely!  "(char *)-1" is MEANINGLESS in some C
implementations and CANNOT be returned as a function value.  Sure, using
symbolic names such as SIG_ERR is helpful, since that permits all
implementations to provide a meaningful definition.  However, my specific
recommendation that routines such as sbrk() be changed to return NULL
((char *)0) instead of (char *)-1 is based on there being just one
"failed" value, along with the desire to be able to "phase in" the new
semantics.  If you really want to have to test for multiple failure modes
every time you use such a function, more power to you, but PLEASE do not
attempt to dictate the numeric values for pointers -- use symbolic names
in the specification.
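The following shows what testing against a symbolic failure value looks like from the caller's side. It uses the standard `signal()` and `SIG_ERR` from `<signal.h>`, and the caller never needs to know how `SIG_ERR` is represented. The handler and helper names are hypothetical.

```c
#include <signal.h>

/* Set by the handler; sig_atomic_t is the type a handler may safely write. */
static volatile sig_atomic_t got_interrupt = 0;

/* Hypothetical handler: just record that SIGINT arrived. */
static void on_interrupt(int sig)
{
    (void)sig;
    got_interrupt = 1;
}

/* Returns 0 on success, -1 on failure.  The test is against the
 * symbolic name SIG_ERR; its actual representation is the
 * implementation's business. */
int install_interrupt_handler(void)
{
    if (signal(SIGINT, on_interrupt) == SIG_ERR)
        return -1;
    return 0;
}
```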

guy%gorodish@Sun.COM (Guy Harris) (07/16/87)

> On the contrary, I think we need more distinguished pointer values, not
> just a single zero or NULL value.

Well, you're completely wrong.

> I have a set of custom I/O routines that use the pointer value NOFILE to
> indicate that no file could be opened (equivalent to (char *) 0 in current
> C implementations) and another pointer value NULLFILE to indicate that
> the custom I/O library routines should ignore all output to this file
> (equivalent to the value (char *) -1 but conceptually equivalent to
> opening /dev/null for output, except that a file descriptor isn't wasted
> and the existence of a null device or its name need not be presumed).

There are plenty of other ways to do this.  Consider having a flag in whatever
data structure the custom I/O routines refer to that says "ignore all
output to this file".  Then set that bit in the cases where you would
otherwise return NULLFILE.
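Here is a minimal sketch of Guy's suggestion, assuming a hypothetical `struct mystream` wrapper around a stdio stream. The "discard output" state is a flag inside the structure. As a result, NULL remains the only distinguished pointer, and it means "could not open".

```c
#include <stdio.h>

/* Hypothetical stream wrapper for a custom I/O library. */
struct mystream {
    FILE *fp;      /* underlying stdio stream (unused when discarding) */
    int discard;   /* nonzero: silently ignore all output */
};

/* Returns the number of bytes accepted, as fwrite() does. */
size_t mywrite(struct mystream *s, const void *buf, size_t len)
{
    if (s->discard)
        return len;               /* like writing to /dev/null */
    return fwrite(buf, 1, len, s->fp);
}
```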

> Consider again how gets(3) indicates end-of-file and error.  If there
> were two distinguished pointer values, one could test for both
> end-of-file and for error without using the botched-up errno.

RTFM.  You can do that *now*; look up "ferror" and "feof" in the
appropriate manual page.
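This is a sketch of the `feof()`/`ferror()` approach. It uses `fgets()` rather than `gets()`, because `gets()` cannot limit the line length and was removed from the language in C11. The helper name `read_line()` is hypothetical.

```c
#include <stdio.h>

/* Read one line into buf (size > 1).  Returns 1 if a line was read,
 * 0 at end of file, and -1 on a read error.  fgets() returns NULL in
 * both of the latter cases; ferror() tells them apart. */
int read_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) != NULL)
        return 1;
    if (ferror(fp))
        return -1;
    return 0;   /* no error, so feof(fp) is true */
}
```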

> Upward compatibility will prohibit doing this for gets(), but the need
> for more than one distinguished pointer is clear.

You haven't shown that yet.

> We already have a (char *) -1; let's just give it a different name and
> keep it.

Let's not.  Consider a system where "(char *)-1" would evaluate to a
pointer that *could* point to a legitimate object.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

chips@usfvax2.UUCP (Chip Salzenberg) (07/19/87)

In article <846@bsu-cs.UUCP>, dhesi@bsu-cs.UUCP writes:
> In article <6109@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) 
> writes:
> >There are several UNIX library routines in various implementations
> >that attempt to return a -1 value for a function whose return type
> >is (char *). . . .
> >Phase 2 -- change these functions to return
> >NULL (of the appropriate type) on failure.
> 
> On the contrary, I think we need more distinguished pointer values, not
> just a single zero or NULL value.  I have a set of custom I/O routines
> that use the pointer value NOFILE to indicate that no file could be
> opened (equivalent to (char *) 0 in current C implementations) and
> another pointer value NULLFILE to indicate that the custom I/O library
> routines should ignore all output to this file.
> [contention that (FILE *) -1 should be portable, so it can be used as
> an alternative invalid pointer]

That's easy to do, and there's no need for "another NULL".
Declare a global variable of type FILE and use its address as a "magic
number".  For example:

--- C code follows ---

#include <stdio.h>

FILE nullfile;		/* This is never opened, read, etc. */
int no_disk_io;		/* nonzero: discard all output */

FILE *myopen(const char *fname, const char *fmode)
{
	if (no_disk_io)
		return (&nullfile);
	else
		return (fopen(fname, fmode));
}

int mywrite(FILE *fp, const char *buf, int len)
{
	if (fp == NULL)
		return (-1);	/* big problem :-) */
	else if (fp == &nullfile)
		return (len);	/* do nothing */
	else
		return ((int) fwrite(buf, 1, len, fp));
}

/*
 * In the caller:
 *
 *	FILE *logfile;
 *	logfile = myopen("logfile", "w");
 *	...
 *	mywrite(logfile, buf, bufsiz);
 *	...
 */

--- End of C code ---

> Consider again how gets(3) indicates end-of-file and error.  If there
> were two distinguished pointer values, one could test for both
> end-of-file and for error without using the botched-up errno.

But where would it end?  No, if there must be an invalid pointer -- which
there is :-) -- then it must be one of a kind.

-- 
Chip Salzenberg            UUCP: "uunet!ateng!chip"  or  "chips@usfvax2.UUCP"
A.T. Engineering, Tampa    Fidonet: 137/42    CIS: 73717,366
"Use the Source, Luke!"	   My opinions do not necessarily agree with anything.

henry@utzoo.UUCP (Henry Spencer) (07/19/87)

> On the contrary, I think we need more distinguished pointer values, not
> just a single zero or NULL value.  I have a set of custom I/O routines
> that use the pointer value NOFILE... and another pointer value NULLFILE...

This is utterly trivial to do without any language extensions whatsoever.
Put the following code fragment into a file and include it in your library:

	char myio_no;
	char myio_null;

And then include this in the include file for your library:

	#define	NOFILE		((FILE *)&myio_no)
	#define	NULLFILE	((FILE *)&myio_null)

This works just fine without any unportable messes like "(char *) -1".
It isn't quite so handy for system calls, although equivalents could be
devised.
-- 
Support sustained spaceflight: fight |  Henry Spencer @ U of Toronto Zoology
the soi-disant "Planetary Society"!  | {allegra,ihnp4,decvax,utai}!utzoo!henry

bilbo.dobson@CS.UCLA.EDU (Peter Dobson) (07/22/87)

> From: Rahul Dhesi <dhesi@bsu-cs.uucp>
> Newsgroups: comp.lang.c
> Subject: Distinguished pointers (was Re: Weird syscall returns)
> Date: 15 Jul 87 16:14:16 GMT
> To:       info-c@brl-smoke.arpa

In article <846@bsu-cs.UUCP> dhesi@bsu-cs.uucp (Rahul Dhesi) writes:

> On the contrary, I think we need more distinguished pointer values, not
> just a single zero or NULL value.  I have a set of custom I/O routines
> that use the pointer value NOFILE to indicate that no file could be
> opened (equivalent to (char *) 0 in current C implementations) and
> another pointer value NULLFILE to indicate that the custom I/O library
> routines should ignore all output to this file (equivalent to the value
> (char *) -1 but conceptually equivalent to opening /dev/null for
> output, except that a file descriptor isn't wasted and the existence of
> a null device or its name need not be presumed).

A portable way to do this is to declare a global data object
that is just used as a location for a pointer to point to, like:

    /* at file scope, outside any function */
    char nofile, nullfile;

    #define NULLFILE (&nullfile)
    #define NOFILE (&nofile)

This way the pointers will be valid values on all machines, and
won't compare equal to a pointer to any other data object (the
characters nofile and nullfile are never used for anything
else).

This may fail on machines where pointers can't be cast from type
(char *) to another type and back again.  For example, in
Microsoft C on MS-DOS, in some memory models a pointer to
character cast to pointer to function and then cast back to
pointer to character doesn't work.  I don't think this is likely
to be a problem on any implementation where all the pointer types
involved point to data.

--- Peter

steve@nuchat.UUCP (Steve Nuchia) (08/01/87)

In article <6129@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> symbolic names such as SIG_ERR is helpful, since that permits all
> implementations to provide a meaningful definition.  However, my specific

This reminds me of an issue regarding symbolic constants and the
switch construct that has come up a handful of times porting stuff.

For example, let's say we are switching on an input character: we
wrote the code for UNIX, and we are porting it to OS9.

	switch ( getchar() )
	{
	/* ... */
	case '\n':
	case '\r':
		/* common code for the "enter" morpheme */
	/* ... */
	}

Now we find ourselves on OS9, where the compiler says that '\n' and
'\r' are both 13.  (They thoughtfully provide '\l' == 11).  Now the
switch doesn't work, even though all it needs is to be collapsed.

This same situation arises in porting from a featureful unix (say bsd)
to a feature-sparse environment, say a 3b2.  It happens, perhaps not
"often", but it happens, that error indicators and the like wind
up as aliases with the same value, and a similar duplicate-case error
comes up.

To "fix" the problem we have to make the code not reverse portable -
we have to break it to get it to work.  I'm not sure I want to require
the compiler to recognise and allow (warning?) the special case of
stacked identical case values, but it might be worth looking into.

	More random thoughts for your consideration,
	Steve Nuchia
	{{soma,academ}!uhnix1,sun!housun}!nuchat!steve
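The following is a present-day instance of the aliasing Steve describes, assuming a POSIX `<errno.h>`. `EAGAIN` and `EWOULDBLOCK` are different names that POSIX allows to share a value, and they do on Linux. Listing both as `case` labels is therefore a compile-time error wherever the values coincide. Guarding the second label with a preprocessor comparison keeps one source file correct on both kinds of system. The function name is hypothetical.

```c
#include <errno.h>

/* Hypothetical helper: should a failed non-blocking read() be retried?
 * POSIX lets EWOULDBLOCK have the same value as EAGAIN, so the second
 * label is compiled only where the values differ; otherwise the switch
 * would contain a duplicate case value. */
int is_retryable(int err)
{
    switch (err) {
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
    case EINTR:
        return 1;
    default:
        return 0;
    }
}
```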

peter@sugar.UUCP (Peter da Silva) (08/01/87)

> Now we find ourselves on OS9, where the compiler says that '\n' and
> '\r' are both 13.  (They thoughtfully provide '\l' == 11).  Now the
> switch doesn't work, even though all it needs is to be collapsed.

While there's nothing inherently wrong with having non-linefeed newlines,
the 'C' compiler should deal with it in such a way as not to break
existing code. MS-DOS uses CR/LF as the newline, which is broken, but
at least the fix is handled by the I/O library... which allows normal
text-oriented stuff to work. Of course this breaks other stuff (fseek
on text files... it needn't, but it does).

Suggestion: lie and say '\r' is LF.

Suggestion: give up and accept that if your file formats differ you're
going to have to use #ifdefs. At least it's not like RSX where lines are
variable length records with a word length and a bunch of byte data.

PS: seek is handled badly by a number of compilers and libraries on non-UNIX
systems. The UNIX manual states that on some systems offsets are
magic cookies, and that the only reliable way to get an offset on these
systems is to read to that point. Yet:

Some libraries don't include seek, because they would have to use magic
cookies.

Some libraries require 'C' programs to access unblocked files only.

One that I know of physically copies a file to an unblocked file when
you open it in 'C', so you can seek.

An Atari-800 'C' compiler implements 'note()' and 'point()' with identical
semantics to ftell() and fseek(), because "you can't seek to a random location
in the file"... which is true but irrelevant.

I wish these people would RTFM.
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter (I said, NO PHOTOS!)
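The portable pattern the manual describes can be sketched as follows. Take a position from `ftell()`, and later hand exactly that value back to `fseek()` with `SEEK_SET`, never doing arithmetic on it. For a text stream, that use (besides seeking to offset zero) is the only one the C standard guarantees. The helper `peek_line()` is hypothetical.

```c
#include <stdio.h>

/* Peek at the next line of a stream without consuming it: remember
 * the position with ftell(), read, then seek back.  The ftell() value
 * is treated as an opaque cookie and only ever handed back to fseek()
 * with SEEK_SET.  Returns 0 on success, -1 on failure or end of file. */
int peek_line(FILE *fp, char *buf, int size)
{
    long pos = ftell(fp);

    if (pos == -1L)
        return -1;
    if (fgets(buf, size, fp) == NULL)
        return -1;
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;
    return 0;
}
```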

richard@aiva.ed.ac.uk (Richard Tobin, JANET: R.Tobin@uk.ac.ed ) (08/03/87)

In article <8317@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>Put the following code fragment into a file and include it in your library:
>
>	char myio_no;
>	char myio_null;	
>
>And then include this in the include file for your library:
>
>	#define	NOFILE		((FILE *)&myio_no)
>	#define	NULLFILE	((FILE *)&myio_null)

Is this valid in all Ansi-conforming implementations?  That is, is a
comparison like

    if(file == NOFILE)

valid?  My copy of the draft C standard says (apropos of pointer comparisons):

"If the objects pointed to are not members of the same aggregate object, the
result is undefined"

and that seems to apply here.  I assume this restriction is to allow segmented
architectures to just compare the segment offset (or is there another reason?).
Of course, if the routines can be arranged to always return pointers from
the same array, then the out-of-band values could be in the same array, and
all would be well.

My copy of the C standard is a little (18 months) out of date, so maybe this
has changed.
-- 
Richard Tobin,                           JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,               ARPA:  R.Tobin%uk.ac.ed@cs.ucl.ac.uk
Edinburgh University.                    UUCP:  ...!ukc!ed.ac.uk!R.Tobin

mouse@mcgill-vision.UUCP (der Mouse) (08/05/87)

In article <8317@utzoo.UUCP>, henry@utzoo.UUCP (Henry Spencer) writes:
>> I think we need more distinguished pointer values [...].  I have
>> [need for other FILE * values]

> This is utterly trivial to do [in current C].  Put the following code
> fragment into a file and include it in your library:
> 	char myio_no;
> 	char myio_null;
> And then include this in the include file for your library:
> 	#define	NOFILE		((FILE *)&myio_no)
> 	#define	NULLFILE	((FILE *)&myio_null)
> This works just fine without any unportable messes like "(char *)-1".

Hm.  I see no guarantee that NOFILE != NULLFILE.  Do it right:

FILE myio_no;
FILE myio_null;
#define NOFILE (&myio_no)
#define NULLFILE (&myio_null)

					der Mouse

				(mouse@mcgill-vision.uucp)

kent@xanth.UUCP (Kent Paul Dolan) (08/05/87)

In article <274@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
[...]
>For example, lets say we are switching on an input character, we
>wrote the code for UNIX, and we are [porting] it to OS9.
>
>	switch ( getchar() )
>	{
>	......
>	case '\n':
>	case '\r':
>		common code for the "enter" morpheme
>	......
>	}
>
>Now we find ourselves on OS9, where the compiler says that '\n' and
>'\r' are both 13.  (They thoughtfully provide '\l' == 11).  Now the
>switch doesn't work, even though all it needs is to be collapsed.
>	Steve Nuchia
>	{{soma,academ}!uhnix1,sun!housun}!nuchat!steve

What ends up happening if the cases didn't have common code; that is,
you did something different for '\n' than for '\r'?  Seems you could
have 1) compiler bletches, or 2) subtle error sneaks through.

Kent, the man from xanth.

kdmoen@watcgl.UUCP (08/05/87)

In article <274@nuchat.UUCP> steve@nuchat.UUCP (Steve Nuchia) writes:
>	switch ( getchar() )
>	{
>	......
>	case '\n':
>	case '\r':
>		common code for the "enter" morpheme
>	......
>	}

>Now we find ourselves on OS9, where the compiler says that '\n' and
>'\r' are both 13.  (They thoughtfully provide '\l' == 11).  Now the
>switch doesn't work, even though all it needs is to be collapsed.

>To "fix" the problem we have to make the code not reverse portable -
>we have to break it to get it to work.  I'm not sure I want to require
>the compiler to recognise and allow (warning?) the special case of
>stacked identical case values, put it might be worth looking into.

This isn't pretty, but it solves your problem:
	switch ( getchar() )
	{
	/* ... */
	case '\n':
#if '\r' != '\n'
	case '\r':
#endif
		/* common code for the "enter" morpheme */
	/* ... */
	}
-- 
Doug Moen
University of Waterloo Computer Graphics Lab
UUCP:     {ihnp4,watmath}!watcgl!kdmoen
INTERNET: kdmoen@cgl.waterloo.edu

flaps@utcsri.UUCP (08/05/87)

>> Now we find ourselves on OS9, where the compiler says that '\n' and
>> '\r' are both 13.  (They thoughtfully provide '\l' == 11).  Now the
>> switch doesn't work, even though all it needs is to be collapsed.

[ "the switch" had a "case '\r':" immediately followed by "case '\n':". ]

In article <452@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>Suggestion: give up and accept that if your file formats differ you're
>going to have to use #ifdefs...

DON'T USE IFDEFS!  It's very easy to accidentally overuse ifdefs, but when
possible you should use the other features of the pre-processor because
they are often a more direct way of saying what you want.

switch(...) {
    case '\n':
#ifndef OS9
    case '\r':

is WRONG.  The test line should be:

#if '\n' != '\r'


-- 

      //  Alan J Rosenthal
     //
 \\ //        flaps@csri.toronto.edu, {seismo!utai or utzoo}!utcsri!flaps,
  \//              flaps@toronto on csnet, flaps at utorgpu on bitnet.


"To be whole is to be part; true voyage is return."

guy%gorodish@Sun.COM (Guy Harris) (08/06/87)

> My copy of the draft C standard says (apropos of pointer comparisons):
> 
> "If the objects pointed to are not members of the same aggregate object, the
> result is undefined"

The October 1, 1986 draft says this apropos of relational operators
on pointers, but !NOT! apropos of equality operators.  Does your
draft say this even about comparisons for equality?

> and that seems to apply here.

Since this is a comparison for equality, it *doesn't* apply.  It
says, apropos of equality operators,

	If two pointers to objects or functions compare equal, they
	point to the same object or function, respectively.

which pretty clearly indicates that comparison of pointers for
equality is meant to work regardless of whether the two pointers
point into the same array or not.

> I assume this restriction is to allow segmented architectures to just
> compare the segment offset (or is there another reason?).

The restriction on *relational* operators is there because there may
not be a straightforward order that can be imposed on addresses in
general (either because the address space is segmented, or for any
other reason); if the two addresses point to members of the same
array, "less than" and "greater than" can be defined purely within
the terms of the language by saying that pointer A is less
than/greater than pointer B iff the array index of the object pointed
to by pointer A is less than/greater than the array index of the
object pointed to by pointer B.

For *equality* operators, it is clear that they want to *forbid*
segmented architectures from just comparing the segment offset; doing
the comparison that way would be horribly bogus and stupid.

So the answer is "yes, this is valid in all ANSI-conforming
implementations."
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com
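Guy's distinction can be illustrated in a few lines (all names hypothetical). `==` and `!=` are defined between pointers to any two objects. By contrast, `<` and its relatives are essentially only defined between pointers into the same array (or the same structure), where they follow the element indices.

```c
/* Hypothetical objects: two unrelated ints and one array. */
static int first;
static int second;
static int table[4];

/* Equality is defined between pointers to any two objects, and
 * pointers to distinct objects always compare unequal. */
int distinct_objects_unequal(void)
{
    return &first != &second;
}

/* Relational operators are defined between pointers into the same
 * array; the result follows the element indices.  Writing
 * &first < &second instead would be undefined. */
int ordered_within_array(void)
{
    int *p = &table[1];
    int *q = &table[3];

    return p < q && q - p == 2;
}
```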

gwyn@brl-smoke.ARPA (Doug Gwyn ) (08/07/87)

In article <5193@utcsri.UUCP> flaps@utcsri.UUCP (Alan J Rosenthal) writes:
-    case '\n':
-#ifndef OS9
-    case '\r':
-is WRONG.  The test line should be:
-#if '\n' != '\r'

It's all wrong anyway -- in C, '\r' and '\n' represent distinct
(whitespace) characters.

drw@cullvax.UUCP (Dale Worley) (08/07/87)

guy%gorodish@Sun.COM (Guy Harris) writes:
> [The Draft Standard] says, apropos of equality operators,
> 
> 	If two pointers to objects or functions compare equal, they
> 	point to the same object or function, respectively.
> 
> which pretty clearly indicates that comparison of pointers for
> equality is meant to work regardless of whether the two pointers
> point into the same array or not.

This statement leaves a really nasty ambiguity:  If two pointers
compare *unequal*, then do they *not* point to the same thing?  It
seems possible that an implementation could have multiple
representations of pointers to a particular object, but how are we to
make sure that

	int X[10];
	int *a, *b;
	a = X;
	b = X;
	a++;
	b++;
	a == b

always returns true?

Dale
-- 
Dale Worley	Cullinet Software		ARPA: cullvax!drw@eddie.mit.edu
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
OS/2: Yesterday's software tomorrow	    Nuclear war?  There goes my career!

gwyn@brl-smoke.ARPA (Doug Gwyn ) (08/08/87)

In article <1437@cullvax.UUCP> drw@cullvax.UUCP (Dale Worley) writes:
>This statement leaves a really nasty ambiguity:  If two pointers
>compare *unequal*, then do they *not* point to the same thing?  ...

This is essentially the "aliasing" problem, which was addressed at
the June X3J11 meeting.  See my previous summary from that meeting
(posted yesterday) for the current rules.

richard@aiva.ed.ac.uk (Richard Tobin) (08/09/87)

In article <25045@sun.uucp> guy%gorodish@Sun.COM (Guy Harris)
writes (in reply to me):

>> "If the objects pointed to are not members of the same aggregate object, the
>> result is undefined"
>The October 1, 1986 draft says this apropos of relational operators
>on pointers, but !NOT! apropos of equality operators.  Does your
>draft say this even about comparisons for equality?

No, but it doesn't say this either (at least not in the section entitled
"C.3.9 Equality operators"):

>	If two pointers to objects or functions compare equal, they
>	point to the same object or function, respectively.

All it says is:

"The == (equal to) and the != (not equal to) operators are analogous to
the relational operators except for their lower precedence."

So it seems that the standard has been clarified since the version
I have.

>For *equality* operators, it is clear that they want to *forbid*
>segmented architectures from just comparing the segment offset; doing
>the comparison that way would be horribly bogus and stupid.

It certainly would, which is why I wanted to check up on it.  Thanks
for your help.
-- 
Richard Tobin,                           JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,               ARPA:  R.Tobin%uk.ac.ed@cs.ucl.ac.uk
Edinburgh University.                    UUCP:  ...!ukc!ed.ac.uk!R.Tobin

levy@ttrdc.UUCP (08/11/87)

In article <6242@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
< In article <5193@utcsri.UUCP> flaps@utcsri.UUCP (Alan J Rosenthal) writes:
< -    case '\n':
< -#ifndef OS9
< -    case '\r':
< -is WRONG.  The test line should be:
< -#if '\n' != '\r'
< 
< It's all wrong anyway -- in C, '\r' and '\n' represent distinct
< (whitespace) characters.

What is an implementation of C supposed to do on an OS/machine/character-code
combination that doesn't have the foggiest that there is such a thing
as distinct "new line" and "carriage return" characters?  From the looks of
the discussion here, I'd gather that OS9 is just such a beast and its C
compiler is making the best of this brain damaged situation that it can.

[Or perhaps, as Guy Harris likes to say, it "ain't C." :-) ]
-- 
|------------Dan Levy------------|  Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
|         an Engihacker @        |		vax135}!ttrdc!ttrda!levy
| AT&T Computer Systems Division |  Disclaimer:  i am not a Yvel Nad
|--------Skokie, Illinois--------|

guy%gorodish@Sun.COM (Guy Harris) (08/12/87)

> < It's all wrong anyway -- in C, '\r' and '\n' represent distinct
> < (whitespace) characters.
> 
> What is an implementation of C supposed to do on an OS/machine/character-code
> combination that doesn't have the foggiest that there is such a thing
> as distinct "new line" and "carriage return" characters?

1) Give up in despair.

2) Fake it.

What they are NOT supposed to do, under ANY circumstance, is to make '\r' and
'\n' be the same thing!  Doug is right; they *are* distinct characters.  From
K&R:

	2.4.3 Character constants

	   Certain non-graphic characters, the single quote ' and the backslash
	   \, may be represented according to the following table of escape
	   sequences:

		newline		NL (LF)		\n
		horizontal tab	HT		\t
		backspace	BS		\b
		carriage return	CR		\r
		form feed	FF		\f
		backslash	\		\\
		single quote	'		\'
		bit pattern	<ddd>		\<ddd>

And from the ANSI C standard:

	2.2.2 Character display semantics

	   The *active position* is that location on a display device where the
	next character output by the "fputc" function would appear.  The intent
	of writing a printable character (as defined by the "isprint" function)
	to a display device is to display a graphic representation of that
	character at the active position and then advance the active position
	to the next position on the current line.  The direction of printing is
	locale-specific.  If the active position is at the final position of a
	line (if there is one), the behavior is unspecified.

	   Alphabetic escape sequences representing nongraphic characters in
	the execution character set are intended to produce actions on display
	devices as follows:

	...

	\n (*new line*) Moves the active position to the initial position of
	   the next line.

	\r (*carriage return*) Moves the active position to the initial
	   position of the current line.

	...

	   Each of these escape sequences shall produce a unique
	implementation-defined value which can be stored in a single "char"
	object.  The external representations in a text file need not be
	identical to the internal representations, and are outside the scope of
	this Standard.

On top of all this, having '\n' and '\r' be the same character violates the
Principle of Least Surprise, at least if the system uses ASCII (which I presume
OS9 does).

> From the looks of the discussion here, I'd gather that OS9 is just such a
> beast and its C compiler is making the best of this brain damaged situation
> that it can.

Uh-uh.  No way.  They had no excuse; they screwed up.

If they could, in any way, cause a display device to do the aforementioned, and
can represent these instructions to the display device in a file, they should
have arranged that the C library produce the appropriate instructions on
output.  If they couldn't do that, then they either shouldn't have implemented
C ("give up in despair") or they should have had '\n' be LF (i.e., '\012') and
'\r' be CR (i.e., '\015'), and had the *C library* act the same way when told
to output either character ("fake it").

In other words, it really *ain't* C.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

peter@sugar.UUCP (Peter da Silva) (08/15/87)

> What is an implementation of C supposed to do on an OS/machine/character-code
> combination that doesn't have the foggiest that there is such a thing
> as distinct "new line" and "carriage return" characters?  From the looks of
> the discussion here, I'd gather that OS9 is just such a beast and its C
> compiler is making the best of this brain damaged situation that it can.

There is no "new line" character in ASCII. UNIX uses "line feed" as the new
line character. OS/9 uses "carriage return". I'd say the 'C' language itself
is suffering from parochialism here.

Disclaimer: I have never used OS/9, and I don't know anything more of this
aspect of the operating system than what I have read here. I would say that
since OS/9 is a UNIX lookalike, using CR for NL instead of LF was probably
not the best choice... but at least it's better than using both of them.
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter (I said, NO PHOTOS!)
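If files from both kinds of system have to be read, one alternative to per-system #ifdefs is to accept LF, CR, or CR LF as a line terminator. The following is a minimal sketch. It assumes the file is opened in binary mode, so the library does no newline translation of its own, and `get_any_line()` is a hypothetical name.

```c
#include <stdio.h>

/* Read one line from fp into buf (size > 1), treating LF, CR, or CR LF
 * as the line terminator; the terminator is not stored and long lines
 * are silently truncated.  Returns the stored length, or -1 at end of
 * file with nothing read. */
int get_any_line(FILE *fp, char *buf, int size)
{
    int c, n = 0;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            break;
        if (c == '\r') {
            int next = getc(fp);
            if (next != '\n' && next != EOF)
                ungetc(next, fp);   /* lone CR: put back what follows */
            break;
        }
        if (n < size - 1)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    if (c == EOF && n == 0)
        return -1;
    return n;
}
```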

am@cam-cl.UUCP (08/17/87)

In article <854@mcgill-vision.UUCP> mouse@mcgill-vision.UUCP (der Mouse) writes:
>> 	char myio_no;
>> 	char myio_null;
>> And then include this in the include file for your library:
>> 	#define	NOFILE		((FILE *)&myio_no)
>> 	#define	NULLFILE	((FILE *)&myio_null)
>
>Hm.  I see no guarantee that NOFILE != NULLFILE.  Do it right:
>
>FILE myio_no;
>FILE myio_null;
>#define NOFILE (&myio_no)
>#define NULLFILE (&myio_null)
>
Hm.  I see no guarantee that FILE is not a function type, which invalidates
your suggestion.  :-)

mouse@mcgill-vision.UUCP (der Mouse) (08/18/87)

[someone recommends the following, for getting distinguished FILE * values]
>> char myio_no;
>> char myio_null;
>> #define NOFILE   ((FILE *)&myio_no)
>> #define NULLFILE ((FILE *)&myio_null)
[then I say]
> Hm.  I see no guarantee that NOFILE != NULLFILE.  Do it right:
> FILE myio_no;
> FILE myio_null;
> #define NOFILE (&myio_no)
> #define NULLFILE (&myio_null)

I got a letter from someone I can't reply to (aimt!breck - aimt isn't
in our uucp maps), asking why this change is necessary:

> I would have sworn that the original would have sufficed.  The two
> character variables {myio_no,myio_null} are different variables and
> the compiler had better put them in different locations.  If you
> don't mind taking the time to reply, how can NOFILE and NULLFILE be
> equal in the first case?

Well, since I can't mail, and there are likely others suffering from
the same confusion, I'll explain.

&myio_no is definitely not equal to &myio_null.  However, this is not true
of ((FILE *)&myio_no) and ((FILE *)&myio_null).  (To be sure, most
common machines are byte-addressable and have C implementations that
would indeed result in NOFILE != NULLFILE, but portability only among
the common byte-addressable machines isn't what we're after here.)

Let us postulate a word-addressed 32-bit machine (by "word" I mean a
16-bit quantity).  On this machine, let us say, a natural pointer is a
32-bit quantity, meaning the address space is 8 gigabytes (2^32 x 2
bytes).  A char will surely be 8 bits.  A char pointer will then be a
32-bit pointer plus at least one more bit indicating which of the two
chars in the addressed word the pointer points to.

Now let us suppose that structures must be word-aligned (very probable
on a word-addressed machine).  Then casting from (char *) to (FILE *)
will probably just consist of dropping the extra bit (this has been
permitted by the C definition ever since K&R; see below for a supporting
quote).

Now let us suppose that the compiler optimizes for space (possibly at
the direction of the user) and puts myio_no and myio_null in the same
word.  Then &myio_no != &myio_null, but the only difference is in the
extra bit, so ((FILE *)&myio_no) == ((FILE *)&myio_null).

The legality of the postulated property of pointer conversion goes
all the way back to K&R, and I doubt it has changed in ANSI, since they
are paying even more attention to portability than K&R did (I don't have
a copy of the draft, and won't until it's available in machine-readable
form).  From Appendix A: C Reference Manual, 14.1, Explicit pointer
conversions:

	Certain conversions involving pointers are permitted but have
	implementation-dependent aspects.  [...]

	A pointer may be converted to any of the integral types large
	enough to hold it.  [...]

	An object of integral type may be explicitly converted to a
	pointer.  The mapping always carries an integer converted from
	a pointer back to the same pointer, but is otherwise machine
	dependent.

	A pointer to one type may be converted to a pointer to another
	type.  The resulting pointer may cause addressing exceptions
	upon use [...].  It is guaranteed that a pointer to an object
	of a given size may be converted to a pointer to an object of a
	smaller size and back again without change.

Note that nothing is said about converting a pointer to an object of a
given size to a pointer to an object of a larger size, unless the
original pointer was obtained by the reverse cast, which is not the
case here.

					der Mouse

				(mouse@mcgill-vision.uucp)

guy%gorodish@Sun.COM (Guy Harris) (08/19/87)

> There is no "new line" character in ASCII. UNIX uses "line feed" as the new
> line character. OS/9 uses "carriage return". I'd say the 'C' language itself
> is suffering from parochialism here.

No, not really; this is no more parochial than using "0" (converted to the
appropriate type) to represent a null pointer.  In both cases, it is possible
to properly implement C; you just have to clear your mind of the notion that
the fact that "0" is used to represent a null pointer means that a null pointer
must consist of all zero bits, or that the fact that '\n' stands for LF means
that lines in the native OS's file system must end with an LF.

If OS/9 uses CR as the line-terminator character, the C I/O routines for OS/9
(i.e., "printf", "fputs", etc.) should translate LF into CR on output, and
translate CR into LF on input.  This may, of course, require them not to ignore
the "b" modifier on "fopen" modes, so that in "text" mode this translation is
performed and in "binary" mode it isn't, but that's life.

Of course this raises the question of what the C I/O routines should translate
CR to on output, or what should be translated to CR on input, but then if OS/9
doesn't provide a mechanism to "move the active position (of an output device)
to the initial position of the current line" they do, technically, have a
problem with implementing ANSI C, at least.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

henry@utzoo.UUCP (Henry Spencer) (08/20/87)

> There is no "new line" character in ASCII. UNIX uses "line feed" as the new
> line character. OS/9 uses "carriage return". I'd say the 'C' language itself
> is suffering from parochialism here.

Tsk, tsk.  How many of the people pontificating about ASCII have actually
read the ASCII standard?  Down in the fine print, it says loud and clear
that IF a single character is used for the new line function, it shall be
the one otherwise known as linefeed, and "newline" is then a legitimate
name for it.  I don't vouch for the wording -- my copy of the standard
isn't handy -- but the meaning is clear.  OS/9 is simply wrong here.
-- 
Support sustained spaceflight: fight |  Henry Spencer @ U of Toronto Zoology
the soi-disant "Planetary Society"!  | {allegra,ihnp4,decvax,utai}!utzoo!henry

chips@usfvax2.UUCP (Chip Salzenberg) (08/20/87)

In article <503@sugar.UUCP>, peter@sugar.UUCP (Peter da Silva) writes:
>> What is an implementation of C supposed to do on an OS/machine/character-code
>> combination that doesn't have the foggiest that there is such a thing
>> as distinct "new line" and "carriage return" characters?  From the looks of
>> the discussion here, I'd gather that OS9 is just such a beast and its C
>> compiler is making the best of this brain damaged situation that it can.

This compiler botched the job.

> There is no "new line" character in ASCII. UNIX uses "line feed" as the new
> line character. OS/9 uses "carriage return". I'd say the 'C' language itself
> is suffering from parochialism here.

Not quite; no useful language can pander to every (hostile? :-]) environment.

> I would say that
> since OS/9 is a UNIX lookalike, using CR for NL instead of LF was probably
> not the best choice... but at least it's better than using both of them.

I once re-targeted a UNIX C compiler from the Z-80 to the 6809, so as to
compile OS-9 programs.  My solution to the above-mentioned problem was to
define '\n' as 0x0D and '\r' as 0x0A.  This did not produce the correct
behavior for '\r', but it did prevent the ('\r' == '\n') problem; and since
the OS-9 I/O services consider 0x0D as `newline' (CR-LF), it would have been
_very_ difficult to make '\r' behave correctly anyway.
-- 
Chip Salzenberg            UUCP: "uunet!ateng!chip"  or  "chips@usfvax2.UUCP"
A.T. Engineering, Tampa    Fidonet: 137/42    CIS: 73717,366
"Use the Source, Luke!"    My opinions do not necessarily agree with anything.

peter@sugar.UUCP (Peter da Silva) (08/23/87)

In article <8444@utzoo.UUCP>, henry@utzoo.UUCP (Henry Spencer) writes:
> Tsk, tsk.  How many of the people pontificating about ASCII have actually
> read the ASCII standard?  Down in the fine print, it says loud and clear
> that IF a single character is used for the new line function, it shall be
> the one otherwise known as linefeed, and "newline" is then a legitimate
> name for it.  I don't vouch for the wording -- my copy of the standard
> isn't handy -- but the meaning is clear.  OS/9 is simply wrong here.

The man from the Zoo comes through again.

Wow. Amazing. I never knew that.

I never even thought about looking at the Ascii standard. Where do you get
it?
-- 
-- Peter da Silva `-_-' ...!seismo!soma!uhnix1!sugar!peter (I said, NO PHOTOS!)

henry@utzoo.UUCP (Henry Spencer) (08/27/87)

> I never even thought about looking at the Ascii standard. Where do you get
> it?

Its full name is "ANSI X3.4-1977, American National Standard Code for
Information Interchange".  ANSI is at 1430 Broadway, NYC 10018; be warned
that their prices are high.

In case anyone's interested, the relevant item is in section 5.2, in the
description of LF:

	"Where appropriate, this character may have the meaning "New Line"
	(NL), a format effector that advances the active position to the
	first character position on the next line.  Use of the NL convention
	requires agreement between sender and recipient of data."

There is no analogous clause under the description of CR.
-- 
"There's a lot more to do in space   |  Henry Spencer @ U of Toronto Zoology
than sending people to Mars." --Bova | {allegra,ihnp4,decvax,utai}!utzoo!henry