bertrand@eiffel.UUCP (Bertrand Meyer) (01/21/90)
A planned change for the character set of Eiffel
------------------------------------------------

This is the fifth of a sequence of postings describing the changes planned for version 3 of Eiffel. These are cleanup changes and do not affect anything fundamental. The particular issue discussed here is the character set for the language and the support for national variants.

Since the preparation of version 3 is clean-up time, I have finally looked in some detail at these theoretically mundane but practically important questions. I must confess that not much thought went into these aspects when the language was designed, but now is the time to get them right. It's great to have the proper solution for the big things, but why not take care of the little ones as well?

I must further admit that I have no particular expertise in this field, and won't be offended if it turns out that someone else has a better answer. I was considerably helped by the contribution made by Erland Sommarskog last July (<101@enea.se>). In fact, the advantage I gained from that posting is almost unfair; Mr. Sommarskog did all the hard work, and I only had to draw the conclusions for Eiffel. (Of course, he bears no responsibility for any deficiency in what follows.) Anyone who wants to contribute comments or criticisms will be well advised to look at Mr. Sommarskog's message first.

The solution described below only addresses one-byte codes (such as those used for European languages other than English). No consideration has been given to multi-byte languages. (If we don't leave some work for the standardization committee, they might get bored.)

----------------------------------------------------------------------
|WARNING: The change described here is planned for version 3 of the  |
|environment, not to be released until late 1990.                    |
|                                                                    |
|Any change in the language supported by Interactive's tools         |
|will be accompanied by CONVERSION TOOLS to translate ``old'' syntax |
|into new. Programmers will NOT need to perform any significant work |
|to update existing Eiffel software.                                 |
|                                                                    |
|This posting is made solely for the purpose of informing the Eiffel |
|community about ongoing developments. Although the posting has been |
|preceded by careful reflection and internal discussions within      |
|Interactive, we make no commitment at this point that the features  |
|described here will actually be included, and, if they are, that    |
|their final form will be the exact one shown below.                 |
----------------------------------------------------------------------

Purpose of the change.
----------------------

Several problems were raised by Mr. Sommarskog with respect to the use of Eiffel on non-American keyboards/terminals.

1. - Characters such as

        @   At sign
        [   Opening bracket
        ]   Closing bracket
        {   Opening brace
        }   Closing brace
        |   Vertical bar
        \   Backslash
        ^   Circumflex
        `   Back quote
        ~   Tilde

are often pre-empted by national character set variants. For example, on many French keyboards, [ and ] appear as e' (e with an acute accent) and e` (e with a grave accent). Mr. Sommarskog cited further examples with Swedish keyboards.

These character translations make it very unpleasant for programmers working on such keyboards, who have to remember the correspondence between the character in the language manual and their local keyboard equivalents. Mr. Sommarskog went so far as to say that ``Any programming language using any of [these] characters as an operator or a delimiter is committing a crime in my eyes''. (He did not say, however, that the language *designer* was committing a crime, so I feel relatively safe even though I will be traveling to Sweden soon.)

The problem does exist in Eiffel since all of the above except ` (back quote) are used in special symbols. (| and ~ will be used for boolean operators in Eiffel 3.)

2. - The syntax of identifiers restricts them to letters, underscores and digits. ``Letters'' here means unaccented letters of the English alphabet.
A French programmer would often like to use accented letters in an identifier (e.g. e've'nement, with two acute accents), and similarly for other languages.

3. - The backslash is particularly ``criminal'' since it has an important role in strings and character constants as the ``escape'' for special characters, in the Unix-C tradition. For example, a quote in a character constant is \' (backslash-quote).

4. - As a less important but unpleasant point, special characters are specified through a three-digit octal code, as in '\756'. Why force octal? Also, why require exactly three digits, which imply leading zeroes?

The language change
-------------------

The language change is simple. First, an observation: brackets and braces are (fortunately) not strictly needed syntactically in Eiffel. Parentheses would do just as well in the places where these characters are needed. (Brackets are used for generic parameters; braces for selective exports.) As a consequence, parentheses now become legal in those places, although the forms using brackets and braces remain the standard ones for publication of program texts. Brackets and braces will continue to be used as the default form for text produced as output of tools of the Eiffel environment such as ``short'', even if the original class text uses parentheses. (Presumably, a decent troff/TEX/Interleaf/Word adapted to Swedish, Polish or French will still have those characters.)

Similarly, equivalents are defined for ^ (for which the equivalent is **, as used in Fortran for exponentiation), ~ (not) etc.

Then, the backslash loses its special role as an escape character in character and string constants. It is replaced by the exclamation mark. For example, in a character or string constant:

        !!      means !
        !"      means "
        !'      means '
        !T      means tab
        !N      means new line
        !D(27)  means the character of decimal code 27
        !O(27)  means the character of octal code 27
        !X(27)  means the character of hexadecimal code 27
        etc.
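The escape table above can be made concrete with a minimal Python sketch of how a lexer might decode the body of a character or string constant under the proposed rules. The paragraph that follows adds two rules, which the sketch also assumes: letter codes are case-insensitive, and numeric codes take a parenthesized number of any length. Because the "etc." entries are unspecified, the sketch rejects any other code. It also rejects values above 255, since the proposal covers only one-byte codes. The name decode_literal is hypothetical, and this is not Interactive's implementation.

```python
import re

# Escapes that stand for a single character. Letter codes are
# case-insensitive, so their keys are stored in upper case.
SIMPLE_CODES = {"!": "!", '"': '"', "'": "'", "T": "\t", "N": "\n"}

# Numeric escapes: D, O or X followed by a parenthesized number of any length.
NUMERIC_CODE = re.compile(r"([DOXdox])\(([0-9A-Fa-f]+)\)")
NUMERIC_BASES = {"D": 10, "O": 8, "X": 16}


def decode_literal(body: str) -> str:
    """Decode the text between the quotes of a constant (hypothetical helper).

    Assumption: the unspecified "etc." codes are unknown, so any escape not
    listed above raises ValueError.
    """
    out = []
    i = 0
    while i < len(body):
        if body[i] != "!":
            out.append(body[i])
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError(f"dangling '!' at position {i}")
        code = body[i + 1].upper()
        if code in SIMPLE_CODES:
            out.append(SIMPLE_CODES[code])
            i += 2
            continue
        match = NUMERIC_CODE.match(body, i + 1)
        if match is None:
            raise ValueError(f"unknown or malformed escape at position {i}")
        # int() itself raises ValueError for digits invalid in the chosen base.
        value = int(match.group(2), NUMERIC_BASES[match.group(1).upper()])
        if value > 255:
            raise ValueError(f"code {value} does not fit in one byte")
        out.append(chr(value))
        i = match.end()
    return "".join(out)
```

For example, decode_literal("Tab!Tthen bang!!") returns "Tab\tthen bang!". Each of !D(27), !o(33) and !X(1b) yields the ESC character (code 27).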
The convention for character strings split over two or more lines remains as before, with ! instead of backslash. In all codes involving letters (!T, !D etc.), lower- and upper-case are equivalent. For the last three codes in the above list, note that the numerical value is parenthesized, so that the number of digits is not fixed.

Finally, although the default alphabet for identifiers is still the English letters plus digits and underscore, it becomes possible to use others if they are specified in a special file (which could be called ``.characters'' in the Unix implementation). The idea of using a file rather than a compilation option is that if you deliver classes to a customer (possibly in a different country), you will deliver the .characters file as well, ensuring consistent recompilation at the target site; with compilation options this cannot be achieved. Furthermore, a file is more flexible.

Obviously, some restrictions are imposed on the characters that may be specified in the .characters file. They may not conflict with characters used in special symbols of the language, such as ``;'' or ``:'', unless these symbols have default substitutes (as with the bracket ``['', whose substitute is ``(''). Just as obviously, once a character has been selected for identifiers through the .characters file, it cannot be used as a special symbol any more. For example, if you accept the opening bracket in identifiers because it shows up as e' on your keyboard, then you may not use it as a bracket any more and must resort to parentheses.

Discussion
----------

The exclamation mark seems to be the least bad among universally possible choices. Its use as an attention-getter in ordinary language seems to fit well with its above use as a special character marker.
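The .characters rules described earlier in the posting reduce to a small consistency check. The Python sketch below makes several assumptions. The file format is assumed to be one extra character per line, since the posting does not specify one. The substitute table holds only the substitutes the posting names: [ and { become (, ] and } become ), ^ becomes **, and ~ becomes not. The list of special-symbol characters that have no substitute is deliberately partial and hypothetical. All function names are hypothetical as well.

```python
import string

# Default identifier alphabet: English letters, digits and underscore.
DEFAULT_CHARS = set(string.ascii_letters + string.digits + "_")

# Default substitutes named in the posting (the "etc." entries are unknown).
SUBSTITUTES = {"[": "(", "]": ")", "{": "(", "}": ")", "^": "**", "~": "not"}

# Hypothetical, partial list of special-symbol characters with no substitute.
NO_SUBSTITUTE = set(";:,.()\"'+-*/=<>")


def load_characters_file(text: str) -> set:
    """Parse an assumed .characters format: one extra character per line."""
    extra = set()
    for lineno, line in enumerate(text.splitlines(), start=1):
        ch = line.strip()
        if not ch:
            continue  # ignore blank lines
        if len(ch) != 1:
            raise ValueError(f"line {lineno}: expected a single character")
        if ch in NO_SUBSTITUTE:
            raise ValueError(f"line {lineno}: {ch!r} is a special symbol with no substitute")
        extra.add(ch)
    return extra


def is_identifier(name: str, extra: set) -> bool:
    """Check a name against the default alphabet plus the extra characters."""
    allowed = DEFAULT_CHARS | extra
    # Assumption: an identifier begins with a letter, and extra characters count as letters.
    starts_ok = bool(name) and (name[0] in string.ascii_letters or name[0] in extra)
    return starts_ok and all(c in allowed for c in name)


def forced_substitutes(extra: set) -> dict:
    """Special symbols that must now be written in their substitute form."""
    return {ch: SUBSTITUTES[ch] for ch in extra if ch in SUBSTITUTES}
```

Consider a file listing [ and ] on a keyboard where they show as e' and e`. There, "[v[nement" becomes a legal identifier (it displays as e've'nement), and forced_substitutes reports that generic parameters must then be written with parentheses.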
We have, of course, considered the obvious objection that a new Eiffel programmer's first attempt may contain the instruction

        putstring ("Hello world!")

which will trigger a compilation error (because the exclamation mark eats the following double quote). Tough luck. At least, we can try to produce a decent error message.
--
-- Bertrand Meyer
bertrand@eiffel.com
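A lexer can turn the "Hello world!" trap into the kind of decent error message hoped for above. The Python sketch below scans one string constant on a single line and treats ! as consuming the character after it. If the line ends before a closing quote, the sketch points at the last !" it saw. It assumes constants do not span lines, so the split-string convention is not modelled. The function name and the message wording are hypothetical.

```python
def scan_string_constant(line: str, start: int):
    """Scan the string constant whose opening quote is at line[start].

    Returns (end, None) on success, where end is the index just past the
    closing quote, or (None, message) if the line ends first.
    """
    if line[start] != '"':
        raise ValueError("scan must start at a double quote")
    i = start + 1
    escaped_quote_at = None  # index of the most recent !" sequence seen
    while i < len(line):
        if line[i] == "!":
            if line[i + 1 : i + 2] == '"':
                escaped_quote_at = i
            i += 2  # "!" consumes the next character, whatever it is
        elif line[i] == '"':
            return i + 1, None
        else:
            i += 1
    message = f"unterminated string constant starting at column {start + 1}"
    if escaped_quote_at is not None:
        message += (
            f'; the "!" at column {escaped_quote_at + 1} escapes the quote after it'
            ' (write "!!" for a literal exclamation mark)'
        )
    return None, message
```

For the instruction putstring ("Hello world!"), the scan returns None together with this message: unterminated string constant starting at column 12; the "!" at column 24 escapes the quote after it (write "!!" for a literal exclamation mark). Writing "Hello world!!" scans cleanly.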
weiner@novavax.UUCP (Bob Weiner) (01/23/90)
In article <236@eiffel.UUCP> bertrand@eiffel.UUCP (Bertrand Meyer) writes:
> Then, the backslash loses its special role as an escape character in
> character and string constants. It is replaced by the exclamation mark.
If this change were optional for international programmers it would be
fine but if it is required of all programmers it is unacceptable. This
is the equivalent of all of the personal computer vendors that create
their own regular expression syntax. They add no new functionality but
change the character set so that the largest body of programmers who use
regular expressions (UNIX programmers) frequently must relearn something
they already understand well. The backslash as a character quote has
stood the test of time for usage, readability, etc. on UNIX systems. If
UNIX International has not seen fit to eliminate its usage, there is no
reason that such should be done in Eiffel.
Just add some mechanism in the character mapping file that lets
programmers use exclamation marks instead of backslashes if they want.
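The original posting promises conversion tools from old syntax to new. For constants, such a tool has to perform a translation along the lines of the Python sketch below. The sketch assumes the old syntax used the C-style escapes \n, \t, \\, \' and \", plus exactly three octal digits; the posting itself only shows \' and '\756'. It also assumes that a lone ! in old text must be doubled, because ! becomes the escape marker. The function name is hypothetical.

```python
# Assumed old-style escapes (C tradition) and their proposed equivalents.
# A doubled backslash becomes a plain backslash, since "\" is no longer special.
OLD_SIMPLE = {"n": "!N", "t": "!T", "'": "!'", '"': '!"', "\\": "\\"}
OCTAL_DIGITS = set("01234567")


def convert_constant_body(old: str) -> str:
    """Rewrite the body of an old-style constant using the proposed ! escapes."""
    out = []
    i = 0
    while i < len(old):
        ch = old[i]
        if ch == "!":
            out.append("!!")  # "!" is now the escape marker, so double it
            i += 1
        elif ch == "\\":
            digits = old[i + 1 : i + 4]
            if len(digits) == 3 and set(digits) <= OCTAL_DIGITS:
                out.append(f"!O({digits})")  # e.g. \756 becomes !O(756)
                i += 4
            elif old[i + 1 : i + 2] in OLD_SIMPLE:
                out.append(OLD_SIMPLE[old[i + 1]])
                i += 2
            else:
                raise ValueError(f"unrecognized old-style escape at position {i}")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, the old body It\'s done!\n becomes It!'s done!!!N. Here the \' maps to !', the bare ! is doubled, and \n maps to !N.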
--
Bob Weiner, Motorola, Inc., USENET: ...!gatech!uflorida!novavax!weiner
(407) 364-2087
shelley@atc.sps.mot.com (Norman K. Shelley) (01/24/90)
I heartily agree with Weiner's comments on changing the backslash usage to an exclamation point, thereby changing a style that has been worked around in the international UNIX world already.

In fact I would appreciate the ability to use the exclamation point character as "not" and "/" (for "!=" instead of having to use "/="). A personal mapping file, preprocessor, or whatever to allow my taste (and many Unix/C programmers' taste) would be extremely nice.

Norman Shelley
Motorola - ATC
2200 W. Broadway M350
Mesa, AZ 85202
...!uunet!dover!atc!shelley
shelley@atc.sps.mot.com
(602) 962-2473
jacob@gore.com (Jacob Gore) (01/25/90)
/ comp.lang.eiffel / shelley@atc.sps.mot.com (Norman K. Shelley) / Jan 24 1990/
> In fact I would appreciate the ability to use the
> exclamation point character as "not" and "/" (for "!=" instead of having
> to use "/="). A personal mapping file, preprocessor, or whatever to allow my
> taste (and many Unix/C programmers taste) would be extremely nice.

This is going too far. It's one thing to accommodate hardware restrictions, but quite another to provide character mappings for the purpose of personal style.

Are you going to ship your personal mapping file with each file of source code? And what if there are several people working on one project -- how are you going to associate their mappings with various files? ``#include "normans_key_map.h"''?

If "/=" bothers you so much, you can always run your programs through something like "sed -e 's:!=:/=:g'" before letting the compiler (AND other people) see it.

Jacob
--
Jacob Gore       Jacob@Gore.Com       boulder!gore!jacob
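The sed one-liner above rewrites every != it finds, including those inside string constants and comments. Under the proposed ! escapes, a constant such as "!!=" (a literal != ) would be corrupted into "!/=". The Python sketch below is a slightly more careful filter that copies constants and comments verbatim. It assumes constants use the proposed ! escapes and that comments run from -- to the end of the line. The function name is hypothetical.

```python
def rewrite_not_equal(source: str) -> str:
    """Replace C-style != with Eiffel's /= outside constants and comments."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch in "\"'":
            # Copy a string or character constant verbatim; "!" skips the
            # following character, so an escaped quote does not end it.
            j = i + 1
            while j < n and source[j] not in (ch, "\n"):
                j += 2 if source[j] == "!" else 1
            j = min(j + 1, n)  # include the closing quote when present
            out.append(source[i:j])
            i = j
        elif source.startswith("--", i):
            # Copy a comment verbatim up to the end of the line.
            j = source.find("\n", i)
            j = n if j == -1 else j
            out.append(source[i:j])
            i = j
        elif source.startswith("!=", i):
            out.append("/=")
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Given the line x != y -- x != y, the filter produces x /= y -- x != y and leaves the comment alone. Given putstring ("!!="), it leaves the line unchanged, whereas the sed command would turn it into putstring ("!/=").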