sommar@enea.se (Erland Sommarskog) (07/09/89)
When writing my article on Eiffel and national characters I wanted to check which characters Eiffel allows in character and string literals. The result was somewhat surprising and puzzling.

I wrote a class that had 256 features, a0 to a255. They were declared as

    a0 : character is '\000';
    a1 : character is '\001';

except that where "\001" is shown here, I wrote the character itself. The first attempt revealed that newline, apostrophe and backslash caused a syntax error on which the compiler gave up, but that was expected. Of the remaining characters the following were regarded as "Invalid character constant":

    1-31, 127, 135, 138, 146, 155, 162, 166, 170, 173, 181, 184, 192,
    201, 208, 212, 216, 219, 227, 230, 238, 247-255.

Those below 128 are obvious. They are non-printing characters in the ASCII set, and it's understandable that Eiffel forbids them to be written explicitly. But above 128? Is ISE using some eight-bit set with gaps in it at the points in the list above? It's probably not a standard set in that case. If Eiffel were to support ISO 8859 - which I think it should - it would forbid 1-31 and 127-159 and permit everything else, if it wanted to be restrictive in this area. I'm not sure it should.

The next thing I tried was replacing "character" with "string" and the apostrophes with quotes, and compiling again. (I also had to remove the quote character from the list, of course.) This gave the following output:

    Pass 1 on class pelle
    Pass 2 on class pelle
    Interface has not changed.
    Pass 4 on class pelle
    C-compiling pelle
    "pelle.c", line 3304: unexpected EOF
    "pelle.c", line 3304: newline in string or char constant
    "pelle.c", line 3305: syntax error at or near string "));?
    *** ec: C-compilation canceled

That is, what Eiffel didn't allow in character literals, it did allow in strings! That doesn't seem like consistent behaviour to me.

Now, what about the error the C compiler detected? The cause is the very last string, which contains character 255. (Which corresponds to lowercase dotted "y" in 8859/1.) Apparently the C compiler takes this as end of file. (My knowledge of C and Unix is limited, but isn't -1 often a code for end of file? And -1 and 255 are the same thing for a byte.)
--
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
Bowlers on strike!
diamond@diamond.csl.sony.junet (Norman Diamond) (07/11/89)
comp.lang.c has been added to the distribution of this followup.

Mr. Sommarskog tested the Eiffel compiler to see which characters are accepted in character and/or string literals. The Eiffel compiler generates a portable assembly code (C of course) as intermediate code.

In article <102@enea.se> sommar@enea.se (Erland Sommarskog) writes:

>   Now, what about the error the C compiler detected? The cause is
>the very last string, which contains character 255. (Which
>corresponds to lowercase dotted "y" in 8859/1.) Apparently the C
>compiler takes this as end of file. (My knowledge of C and Unix is
>little, but isn't -1 often a code for end of file? And -1 and 255 are
>the same thing for a byte.)

Indeed yes. There are periodic flamefests in comp.lang.c, reminding C programmers that they should getchar() into a short or int, instead of into a char, so that they can test the int value correctly against the constant EOF, which is -1. Looks like some programmer wrote a C compiler without knowing how to use C. (This happens a lot.)
--
Norman Diamond, Sony Computer Science Lab (diamond%csl.sony.jp@relay.cs.net)
The above opinions are claimed by your machine's init process (pid 1), after being disowned and orphaned. However, if you see this at Waterloo, Stanford, or Anterior, then their administrators must have approved of these opinions.
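The getchar() pitfall described in this post can be sketched in a few lines of C. This is only an illustration under stated assumptions, not the code of any actual compiler: the names source, next_byte, count_bytes_buggy and count_bytes_correct are hypothetical, the input comes from an in-memory buffer rather than a real file, and the buggy reader uses signed char explicitly because plain char may be signed or unsigned depending on the platform.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical in-memory stand-in for a source file, so the sketch
   needs no real input. */
struct source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Mimics getc(): returns the next byte as a value in 0..255,
   or EOF (a negative int) once the buffer is exhausted. */
int next_byte(struct source *s)
{
    if (s->pos >= s->len)
        return EOF;
    return s->data[s->pos++];
}

/* Buggy reader: the result is narrowed to a signed char before the
   test. On common two's-complement machines where EOF is -1, byte 255
   becomes -1 and compares equal to EOF, so reading stops too early. */
size_t count_bytes_buggy(struct source *s)
{
    size_t n = 0;
    signed char c;
    while ((c = (signed char)next_byte(s)) != EOF)
        n++;
    return n;
}

/* Correct reader: the result stays in an int, so byte 255 and EOF
   remain distinct values. */
size_t count_bytes_correct(struct source *s)
{
    size_t n = 0;
    int c;
    while ((c = next_byte(s)) != EOF)
        n++;
    return n;
}
```

Given a buffer holding a string literal that contains byte 255, the buggy reader stops at that byte and reports a premature end of input, which would be consistent with the "unexpected EOF" message, while the int version reads to the real end.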
cowan@marob.masa.com (John Cowan) (07/27/89)
In article <10527@socslgw.csl.sony.JUNET> diamond@csl.sony.junet (Norman Diamond) writes:

>Mr. Sommarskog tested the Eiffel compiler to see which characters are
>accepted in character and/or string literals. The Eiffel compiler
>generates a portable assembly code (C of course) as intermediate code.
>
>In article <102@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>
>>   Now, what about the error the C compiler detected? The cause is
>>the very last string, which contains character 255. (Which
>>corresponds to lowercase dotted "y" in 8859/1.) Apparently the C
>>compiler takes this as end of file. (My knowledge of C and Unix is
>>little, but isn't -1 often a code for end of file? And -1 and 255 are
>>the same thing for a byte.)
>
>Indeed yes. There are periodic flamefests in comp.lang.c, reminding
>C programmers that they should getchar() into a short or int, instead
>of into a char, so that they can test the int value correctly against
>the constant EOF, which is -1. Looks like some programmer wrote a
>C compiler without knowing how to use C. (This happens a lot.)

On the other hand, it would be better for the Eiffel compiler to emit the sequence "\377" in this case, rather than the character itself. No C program should contain characters from outside the C character set. It's not illegal, merely a poor idea.

i n e w s i s a f a s c i s t
--
Internet/Smail: cowan@marob.masa.com    Dumb: uunet!hombre!marob!cowan
Fidonet: JOHN COWAN of 1:107/711        Magpie: JOHN COWAN, (212) 420-0527
        Charles li reis, nostre emperesdre magnes
        Set anz toz pleins at estet in Espagne.
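The suggestion to emit "\377" instead of the raw character could look roughly like the sketch below. The function name escape_for_c and its interface are hypothetical, chosen for illustration; nothing here is taken from the Eiffel compiler. It assumes the generator wants every byte outside printable ASCII written as a three-digit octal escape. Three digits are used because an octal escape takes at most three, so a digit that follows in the string cannot be absorbed into the escape.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical helper: writes the bytes of `in` as the body of a C
   string literal into `out`, without the surrounding quotes.
   Printable ASCII is copied, '"' and '\\' get a backslash, and every
   other byte becomes a three-digit octal escape such as \377.
   Returns the number of characters written (excluding the NUL),
   or -1 if `out` is too small. */
int escape_for_c(const unsigned char *in, size_t len,
                 char *out, size_t out_size)
{
    size_t used = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char b = in[i];
        char piece[5];   /* longest piece is "\ooo" plus the NUL */
        int n;

        if (b == '"' || b == '\\')
            n = snprintf(piece, sizeof piece, "\\%c", b);
        else if (b >= 0x20 && b <= 0x7E)
            n = snprintf(piece, sizeof piece, "%c", b);
        else
            n = snprintf(piece, sizeof piece, "\\%03o", (unsigned)b);

        /* Keep one byte in reserve for the terminating NUL. */
        if (used + (size_t)n + 1 > out_size)
            return -1;
        memcpy(out + used, piece, (size_t)n);
        used += (size_t)n;
    }

    if (out_size == 0)
        return -1;
    out[used] = '\0';
    return (int)used;
}
```

With this approach the generated C file contains only characters from the basic C character set, so it no longer matters how the C compiler's reader treats byte 255.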
sommar@enea.se (Erland Sommarskog) (07/30/89)
(This comes from comp.lang.eiffel originally. I cross-posted to comp.lang.c and .misc and directed followups to the latter group, since I see this as a general language issue. And, besides, I don't read comp.lang.c.)

John Cowan (cowan@marob.masa.com) = ">"
Norman Diamond (diamond@csl.sony.junet) = ">>"
Me = ">>>"

I was testing the Eiffel compiler to see which non-ASCII characters it accepted and which it rejected. The compiler generates C as portable assembler, and one of the characters made the C compiler choke:

>>>   Now, what about the error the C compiler detected? The cause is
>>>the very last string, which contains character 255. (Which
>>>corresponds to lowercase dotted "y" in 8859/1.) Apparently the C
>>>compiler takes this as end of file. (My knowledge of C and Unix is
>>>little, but isn't -1 often a code for end of file? And -1 and 255 are
>>>the same thing for a byte.)

>>Indeed yes. There are periodic flamefests in comp.lang.c, reminding
>>C programmers that they should getchar() into a short or int, instead
>>of into a char, so that they can test the int value correctly against
>>the constant EOF, which is -1. Looks like some programmer wrote a
>>C compiler without knowing how to use C. (This happens a lot.)

>On the other hand, it would be better for the Eiffel compiler to emit
>the sequence "\377" in this case, rather than the character itself.
>No C program should contain characters from outside the C character set.
>It's not illegal, merely a poor idea.

In that case C had better extend its character set pretty quickly. And all other languages too. Try to convince a user with an 8859/1 character set that he has just made a poor choice of character. The lowercase dotted "y" looks as legal to him as any other printable character.

John Cowan then says, with one character per line:

>inews is a fascist

Or simply replace ">" with some other string, " >" for example.

I usually don't comment on signatures in public, but:

> Charles li reis, nostre emperesdre magnes
> Set anz toz pleins at estet in Espagne.

What on Earth is this language? Galician? Provencal?
--
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
"Hey poor, you don't have to be Jesus!" - Front 242