[comp.lang.eiffel] Character and string literals

sommar@enea.se (Erland Sommarskog) (07/09/89)

When writing my article on Eiffel and national characters I wanted
to check what characters Eiffel allows in character and string
literals. The result was somewhat surprising and puzzling.

I wrote a class that had 256 features, a0 to a255. They were
declared as
    a0 : character is '\000';
    a1 : character is '\001';
but where "\001" was the character itself. The first attempt revealed
that newline, apostrophe and backslash caused a syntax error on which
the compiler gave up, but that was expected. Of the remaining characters
the following were regarded as "Invalid character constant": 1-31, 127,
135, 138, 146, 155, 162, 166, 170, 173, 181, 184, 192, 201, 208, 212, 
216, 219, 227, 230, 238, 247-255.
  Those below 128 are obvious. They are non-printing characters in
the ASCII set, and it's understandable that Eiffel forbids them to
be written explicitly. But above 128? Is ISE using some eight-bit
set with gaps in it at the points in the list above? It's probably
not a standard set in that case. If Eiffel were to support ISO 8859 -
which I think it should - it would forbid 1-31, 127-159 and permit
everything else, if it wanted to be restrictive in this area. I'm
not sure it should.
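
The ISO 8859 rule described above can be sketched as a small C
predicate. This is only an illustration: the function name is
hypothetical, and treating character 0 as forbidden is an assumption
(the post lists 1-31 only).

```c
/* Sketch of the "restrictive ISO 8859" rule from the post: reject the
 * control ranges, accept every other eight-bit value.
 * is_8859_graphic is a hypothetical name, not part of any Eiffel or
 * C library. */
int is_8859_graphic(unsigned char c)
{
    if (c < 32)                 /* 1-31 per the post; 0 added by assumption */
        return 0;
    if (c >= 127 && c <= 159)   /* DEL and the upper control range */
        return 0;
    return 1;                   /* 32-126 and 160-255 are permitted */
}
```

Under this rule, codes such as 162, 166 or 255 from the rejected list
above would be accepted, while 135, 138, 146 and 155 would still be
refused.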

Next thing I tried was replacing "character" with "string" and the
apostrophes with quotes and compiled again. (I also had to remove the
quote character from the list of course.) This gave the following
output:
   Pass 1 on class pelle
   Pass 2 on class pelle
   	Interface has not changed.
   Pass 4 on class pelle
   C-compiling pelle
   "pelle.c", line 3304: unexpected EOF
   "pelle.c", line 3304: newline in string or char constant
   "pelle.c", line 3305: syntax error at or near string "));?
   *** ec: C-compilation canceled
That is, what Eiffel didn't allow in character literals, it did
allow in strings! That doesn't seem like consistent behaviour to me.
  Now, what about the error the C compiler detected? The cause is
the very last string, which contains character 255. (Which corresponds
to lowercase dotted "y" in 8859/1.) Apparently the C compiler
takes this as end of file. (My knowledge of C and Unix is slight, but
isn't -1 often a code for end of file? And -1 and 255 are the same
thing for a byte.)
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
Bowlers on strike!

diamond@diamond.csl.sony.junet (Norman Diamond) (07/11/89)

comp.lang.c has been added to the distribution of this followup.

Mr. Sommarskog tested the Eiffel compiler to see which characters are
accepted in character and/or string literals.  The Eiffel compiler
generates a portable assembly code (C of course) as intermediate code.

In article <102@enea.se> sommar@enea.se (Erland Sommarskog) writes:

>  Now, what about the error the C compiler detected? The cause is
>the very last string, which contains character 255. (Which corre-
>sponds to lowercase dotted "y" in 8859/1.) Apparently the C compiler
>takes this end of file. (My knowledge of C and Unix is little, but
>isn't -1 often a code for end of file? And -1 and 255 is the same
>thing for a byte.)

Indeed yes.  There are periodic flamefests in comp.lang.c, reminding
C programmers that they should getchar() into a short or int, instead
of into a char, so that they can test the int value correctly against
the constant EOF, which is -1.  Looks like some programmer wrote a
C compiler without knowing how to use C.  (This happens a lot.)
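
The mistake described above can be sketched as follows. This is an
illustration, not the code of any actual compiler: the struct and the
function names are hypothetical, next_byte() merely imitates getc() on
an in-memory buffer, and the buggy result assumes that EOF is -1 and
that converting 255 to signed char yields -1, as on most common
implementations.

```c
#include <stddef.h>
#include <stdio.h>   /* for the EOF macro */

/* Hypothetical in-memory stand-in for a FILE, so that the example
 * needs no real input file. */
struct source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Imitates getc(): returns the next byte as 0..255, or EOF at the end. */
int next_byte(struct source *s)
{
    return s->pos < s->len ? s->data[s->pos++] : EOF;
}

/* Buggy: the result is narrowed to a (signed) char before the test, so
 * byte 255 typically becomes -1 and compares equal to EOF. */
long count_until_eof_buggy(struct source *s)
{
    signed char c;
    long n = 0;
    while ((c = (signed char)next_byte(s)) != EOF)
        n++;
    return n;
}

/* Correct: keep the full int, so 255 and EOF stay distinct. */
long count_until_eof_correct(struct source *s)
{
    int c;
    long n = 0;
    while ((c = next_byte(s)) != EOF)
        n++;
    return n;
}
```

Fed the five bytes "ab\377cd", the correct version counts all five,
while the buggy one would be expected to stop after the first two on
such an implementation, which matches the "unexpected EOF" symptom in
the original report.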

--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.jp@relay.cs.net)
 The above opinions are claimed by your machine's init process (pid 1), after
 being disowned and orphaned.  However, if you see this at Waterloo, Stanford,
 or Anterior, then their administrators must have approved of these opinions.

cowan@marob.masa.com (John Cowan) (07/27/89)

In article <10527@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:
>Mr. Sommarskog tested the Eiffel compiler to see which characters are
>accepted in character and/or string literals.  The Eiffel compiler
>generates a portable assembly code (C of course) as intermediate code.
>
>In article <102@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>
>>  Now, what about the error the C compiler detected? The cause is
>>the very last string, which contains character 255. (Which corre-
>>sponds to lowercase dotted "y" in 8859/1.) Apparently the C compiler
>>takes this end of file. (My knowledge of C and Unix is little, but
>>isn't -1 often a code for end of file? And -1 and 255 is the same
>>thing for a byte.)
>
>Indeed yes.  There are periodic flamefests in comp.lang.c, reminding
>C programmers that they should getchar() into a short or int, instead
>of into a char, so that they can test the int value correctly against
>the constant EOF, which is -1.  Looks like some programmer wrote a
>C compiler without knowing how to use C.  (This happens a lot.)


On the other hand, it would be better for the Eiffel compiler to emit
the sequence "\377" in this case, rather than the character itself.
No C program should contain characters from outside the C character set.
It's not illegal, merely a poor idea.
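
A minimal sketch of what such an emitter might look like, written in C.
The function name and interface are hypothetical and say nothing about
how the Eiffel compiler is actually built. Bytes outside printable
ASCII are written as three-digit octal escapes, so that a following
digit cannot be absorbed into the escape; trigraphs are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Writes the n bytes of src into dst as the body of a C string literal
 * (without the surrounding quotes). Returns the number of characters
 * written, not counting the terminating NUL, or -1 if dst is too small.
 * escape_for_c is a hypothetical helper. */
int escape_for_c(const unsigned char *src, size_t n, char *dst, size_t cap)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = src[i];
        char buf[5];                      /* longest piece: \377 plus NUL */

        if (c == '"' || c == '\\') {      /* must be escaped inside "..." */
            buf[0] = '\\';
            buf[1] = (char)c;
            buf[2] = '\0';
        } else if (c >= 0x20 && c <= 0x7E) {
            buf[0] = (char)c;             /* printable ASCII: copy as is */
            buf[1] = '\0';
        } else {
            /* control or eight-bit character: emit e.g. \377 */
            snprintf(buf, sizeof buf, "\\%03o", (unsigned)c);
        }

        size_t len = strlen(buf);
        if (out + len + 1 > cap)          /* keep room for the final NUL */
            return -1;
        memcpy(dst + out, buf, len);
        out += len;
    }

    if (out >= cap)
        return -1;
    dst[out] = '\0';
    return (int)out;
}
```

With this, a string containing character 255 would reach the C
compiler as the four ASCII characters \377, so the generated file
itself never contains the raw byte.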

i
n
e
w
s

i
s

a

f
a
s
c
i
s
t
-- 
Internet/Smail: cowan@marob.masa.com	Dumb: uunet!hombre!marob!cowan
Fidonet:  JOHN COWAN of 1:107/711	Magpie: JOHN COWAN, (212) 420-0527
		Charles li reis, nostre emperesdre magnes
		Set anz toz pleins at estet in Espagne.

sommar@enea.se (Erland Sommarskog) (07/30/89)

(This comes from comp.lang.eiffel originally. I cross-posted to
comp.lang.c and .misc and directed followup to the latter group,
since I see this as a general language issue. And, besides, I don't
read comp.lang.c.)

John Cowan (cowan@marob.masa.com)  = ">"
Norman Diamond (diamond@csl.sony.junet) = ">>"
Me = ">>>"

I was testing the Eiffel compiler to see which non-ASCII characters
it accepted and which it rejected. The compiler generates C as portable
assembler, and one of the characters made the C compiler choke:

>>>  Now, what about the error the C compiler detected? The cause is
>>>the very last string, which contains character 255. (Which corre-
>>>sponds to lowercase dotted "y" in 8859/1.) Apparently the C compiler
>>>takes this end of file. (My knowledge of C and Unix is little, but
>>>isn't -1 often a code for end of file? And -1 and 255 is the same
>>>thing for a byte.)

>>Indeed yes.  There are periodic flamefests in comp.lang.c, reminding
>>C programmers that they should getchar() into a short or int, instead
>>of into a char, so that they can test the int value correctly against
>>the constant EOF, which is -1.  Looks like some programmer wrote a
>>C compiler without knowing how to use C.  (This happens a lot.)

>On the other hand, it would be better for the Eiffel compiler to emit
>the sequence "\377" in this case, rather than the character itself.
>No C program should contain characters from outside the C character set.
>It's not illegal, merely a poor idea.

In that case C had better extend its character set pretty quickly. And
all other languages too. Try to convince a user of 8859/1 that he
has just made a poor choice of character. The lowercase dotted "y"
looks as legal to him as any other printable character.

John Cowan then says, with one character per line:
>inews is a fascist

Or simply replace ">" with some other string, " >" for example.

I usually don't comment on signatures in public, but:
>		Charles li reis, nostre emperesdre magnes
>		Set anz toz pleins at estet in Espagne.
What on Earth is this language? Galician? Provencal?
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
"Hey poor, you don't have to be Jesus!" - Front 242