sommar@enea.UUCP (08/31/87)
I'd like to write a programme that can handle text containing characters from an
extended ASCII set, to cover national characters. The LRM seems to disregard
this totally, since it states that the character type is ASCII with 128 possible
values. Also, Ada only allows printable characters within strings, and
printable is defined as the range ' '..'~'.

Easy, you might say: just define a new character type. How? I can't have quoted
strings for the new characters, since they are "non-printing". I can't extend
the ASCII package (in STANDARD), since it relies on the character type being
already defined. And even if I somehow succeed, what do I do about Text_io?
Will the compiler accept an attempt to give Text_io the new character type,
even if it's called "character"? Hardly.

Have I missed something? I hope so. If not, THIS IS A VERY SERIOUS RESTRICTION
IN ADA.

I should add that to some extent it is possible to handle these characters. My
Ada system (Verdix 5.2A for VAX/Unix) doesn't mind if I read an extended
character from a file or if I try to write it. Character'val(ch) returns the
correct character. But Character'pos(Character'val(ch)) raises Constraint_error
if ch is from the upper half. This helps only a little, though. If I want
string constants in my programme, I'm dead. What to do? Read them from a file
at start-up? :-)

-- Erland Sommarskog ENEA Data, Stockholm sommar@enea.UUCP
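[A minimal sketch of the behaviour described above, for readers who want to try
it on their own compiler. The procedure name is invented; the commented-out
assignment shows the literal restriction, and the 'POS call is the one reported
to raise CONSTRAINT_ERROR on the Verdix system when the code read is above 127.]

```ada
with TEXT_IO;
procedure SHOW_RESTRICTION is
   C : CHARACTER;
begin
   -- Legal: every character in the literal is in ' '..'~'.
   TEXT_IO.PUT_LINE ("Plain ASCII is fine");

   -- Illegal at compile time: a national character is not an ASCII
   -- graphic character, so it may not appear in a character or
   -- string literal (LRM 2.5, 2.6).
   -- C := 'A-umlaut';

   TEXT_IO.GET (C);   -- may silently accept an 8-bit code from the file
   TEXT_IO.PUT_LINE (INTEGER'IMAGE (CHARACTER'POS (C)));
   -- On the system described above, 'POS raises CONSTRAINT_ERROR
   -- when the character read had a code above 127.
end SHOW_RESTRICTION;
```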
mooremj@EGLIN-VAX.ARPA ("MARTIN J. MOORE") (08/31/87)
I encountered the same problem in attempting to use DEC's extended character
set. I worked around it by using an UNCHECKED_CONVERSION to stick 8-bit values
into CHARACTER objects (thereby making the program erroneous according to the
LRM; however, it worked). For example, to use the DEC control character CSI
(= 155) I did:

   function EIGHT_BIT_CHARACTER is
      new UNCHECKED_CONVERSION (INTEGER, CHARACTER);

   CSI : constant CHARACTER := EIGHT_BIT_CHARACTER (155);

Characters so defined could then be used in string constants, such as the
following:

   ERASE_SCREEN : constant STRING := CSI & "2J";  -- ANSI erase screen command

------------------------------------------------------------------------------
Martin Moore mooremj@eglin-vax.arpa
------
stt@ada-uts (09/02/87)
You may define your own enumeration type, but you are correct that only
"standard" ASCII graphic characters may be used in character and string
literals. For "characters" which have no standard ASCII graphic representation,
you should define normal non-character enumeration literals. You may then
define your own IO package (there is generally nothing "magical" about TEXT_IO,
except that the compiler-writer has to provide it) to provide I/O for these
extended characters and arrays of them. Extended "string" literals can be
created by concatenating strings containing only graphic characters with
enumeration literals for the extended characters.

S. Tucker Taft Intermetrics, Inc. Cambridge, MA 02138

P.S.: Here is an example:

   package Extended_ASCII is
      type X_Character is (NUL, SOH, . . ., 'a', 'b', . . .,
                           UMLAUT, ALPHA, OMEGA, . . .);
      type X_String is array (Positive range <>) of X_Character;
      pragma Pack (X_String);
   end Extended_ASCII;

   with Extended_ASCII; use Extended_ASCII;
   package X_Text_IO is
      . . .
      procedure Put_Line (Str : X_String);
      . . .
   end X_Text_IO;

   with Extended_ASCII; use Extended_ASCII;
   with X_Text_IO;
   procedure Test is
      S : constant X_String :=
        "This is an " & ALPHA & " to " & OMEGA & " test.";
   begin
      X_Text_IO.Put_Line (S);
   end Test;
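[One way the body of such an IO package might look, sketched on top of the
predefined TEXT_IO. This is not part of Taft's posting; it assumes that the
first 128 positions of X_Character mirror ASCII, and the '?' placeholder for
extended characters would be replaced by whatever code or escape sequence the
output device actually expects.]

```ada
with TEXT_IO;
package body X_Text_IO is

   procedure Put_Line (Str : X_String) is
   begin
      for I in Str'Range loop
         if X_Character'Pos (Str (I)) < 128 then
            -- Positions 0..127 mirror ASCII, so the predefined
            -- CHARACTER type can carry them directly.
            TEXT_IO.Put (Character'Val (X_Character'Pos (Str (I))));
         else
            -- Extended characters: emit whatever the device needs;
            -- '?' is only a placeholder here.
            TEXT_IO.Put ('?');
         end if;
      end loop;
      TEXT_IO.New_Line;
   end Put_Line;

end X_Text_IO;
```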
sommar@enea.UUCP (09/05/87)
In a recent article colbert@hermix.UUCP writes:
>You can create your own Character type by defining an enumeration type that
>has character literals.
>   type Character_Type is (Nul, Del, ..., 'A', 'B', ...,
>                           Koo_Kai, Khoo_Khai, ....);
>...
>Once you have this character type defined, you can create a string type by
>defining an array of this character type:
>   type String_Type is array (Positive range <>) of Character_Type;
>...
>However, you will have to use catenation to create String_Type expressions
>that contain your country's special characters (and of course non-printable
>characters).
>...
>As for the I/O of your language-specific characters, you will need to create
>a Thai_Text_IO (or something equivalent). Ada does not say that Text_IO is
>the ONLY text I/O package, only that it is the standard text I/O package. In
>this case you need something non-standard.

I think Martin Moore's solution was much simpler and more elegant. It will
work on any Ada system that doesn't check character assignments for
Constraint_error. Your solution requires a hell of a lot of work and it isn't
portable from one OS to another. Yes, I can write my own Text_IO, but guess
how much fun I find that. And I would have to write one Text_IO for each OS I
want to work with. Guess why there is a standard Text_IO: it gives you a
standard interface.

But even better, a change in the language definition would be the appropriate
fix. It's ridiculous that perfectly good letters are regarded as illegal and
unprintable.

-- Erland Sommarskog ENEA Data, Stockholm sommar@enea.UUCP
jmoody@DCA-EMS.ARPA (Jim Moody, DCA C342) (09/09/87)
It's not clear that there's a conflict between Martin Moore's solution to the
problem and that of colbert@hermix.UUCP. Colbert is clearly correct that
formally one should create a new version of text_io. Martin tells you how to
do that on certain targets. There is little thought required to turn Martin's
solution into a full-blown text_io package (about three minutes; don't invite
A. E. Housman's scorn), and not more than a couple of hours' typing. The point
is that this makes all applications which use (say) Thai_text_io portable, in
the sense that it isolates the machine dependencies (does Martin's trick work?)
into a single package. Which, I thought, was the point.

Jim Moody DCA/JDSSC
colbert@hermix.UUCP (colbert) (09/10/87)
In response to my answer to his question about character types,
sommar@seismo.css.gov (Erland Sommarskog) writes:
>I think Martin Moore's solution was much more simple and elegant. It will
>work on any Ada system that doesn't check character assignments for
>Constraint_error.
>This solution requires one hell lot of work and it isn't portable from
>OS to another. Yes, I can write my own Text_IO, but guess how fun I find
>that. And, I will have to write one Text_IO for each OS I want to work
>with. Guess why there is a standard Text_IO. It gives you a standard
>interface.

Unfortunately, Martin Moore's solution is NOT portable either. It only works
because:

 1) Unchecked_Conversion is implemented in DEC Ada.
 2) The size of type Character objects in DEC Ada is 8 bits.
 3) DEC did not give a Constraint_Error on the assignment (which may be a bug
    in DEC's implementation).
 4) DEC does not "place restrictions on unchecked conversions" (13.10.2 P2).
 5) DEC truncates the high-order bits of the source value if its size is
    greater than the size of the target type (this is really only a problem
    with the specific example given by Moore, in that he used the type Integer
    as the source type as opposed to an 8-bit type).

The principal benefit of my proposed solution is the creation of a portable
abstraction that represents the problem. Re-implementing a text I/O for this
type is a small price to pay for this benefit, especially since Moore's
technique can be used in the implementation of that text I/O, sufficiently
isolated to prevent major impact on the system that I'm implementing and later
porting (as pointed out by another reader of this group).

Take Care, Ed Colbert hermix!colbert@rand-unix.arpa

P.S. As an additional comment: at the recent SIGAda Conference, Dr. Dewar
indicated that Unchecked_Conversion could legally be implemented to always
return 0 no matter what the "value" of the source object was. I did not get a
chance to fully nail him down on what he meant by this comment, so maybe he
will respond to this message.
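[Point 5 above can be sidestepped by instantiating the conversion with a
source type that is exactly 8 bits, instead of INTEGER. A sketch, with
invented names; the length clause assumes CHARACTER'SIZE is 8 on the target,
and the construction remains erroneous by the LRM just as Moore's original is.]

```ada
with UNCHECKED_CONVERSION;
package EIGHT_BIT is

   type OCTET is range 0 .. 255;
   for OCTET'SIZE use 8;   -- match CHARACTER'SIZE on this implementation

   -- Source and target now have the same size, so no truncation
   -- behaviour needs to be relied upon.
   function TO_CHARACTER is new UNCHECKED_CONVERSION (OCTET, CHARACTER);

   CSI : constant CHARACTER := TO_CHARACTER (155);

end EIGHT_BIT;
```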
mooremj@EGLIN-VAX.ARPA ("MARTIN J. MOORE") (09/10/87)
> From: colbert <hermix!colbert@rand-unix.ARPA>
> Unfortunately, Martin Moore's solution is NOT portable either. It only works
> because: [list of reasons]

He is absolutely correct. My solution is non-portable and, as I pointed out in
my original message, erroneous as defined by the LRM. My purpose in posting it
was to possibly help the original questioner, since the solution does work on
the VAX and may work on other machines. It wasn't intended to be a universal
solution. The approach suggested by Colbert et al. is obviously the way to go
to provide portability.

Martin Moore
------
barmar@think.COM (Barry Margolin) (09/10/87)
In article <8709100440.AA04224@rand-unix.rand.org> hermix!colbert@rand-unix.ARPA writes:
>As an additional comment, at the recent SIGAda Conference, Dr. Dewer indicated
>that Unchecked_Conversion could be legally implemented to always return 0 no
>matter what the "value" of the source object was. I did not get a chance to
>full nail him down on what he ment by this comment, so may be he will respond
>to this message.

I presume that this refers to the fact that the language doesn't specify what
the result of Unchecked_Conversion is; rather, it leaves it
implementation-defined. In that case an implementation may return any value,
and returning 0 in all cases would be valid. It's not a very useful behavior,
but it isn't the purpose of a language spec to define a useful language, merely
a portable one. Since you must check the implementation spec to find out what
the result is, you'll know whether it is useful.

--- Barry Margolin Thinking Machines Corp. barmar@think.com seismo!think!barmar
sommar@enea.UUCP (Erland Sommarskog) (09/12/87)
hermix!colbert@rand-unix.ARPA writes:
>In response to my answer to his question about character types
>sommar@seismo.css.gov (Erland Sommarskog) writes:
>>I think Martin Moore's solution was much more simple and elegant. It will
>>work on any Ada system that doesn't check character assignments for
>>Constraint_error.
>Unfortunately, Martin Moore's solution is NOT portable either. It only works
>because:

Well, I didn't say it was portable, did I? I would say it is "presumably"
portable.

> 1) Unchecked_Conversion is implemented in DEC Ada.

As far as I understand, the LRM doesn't mention Unchecked_Conversion as
optional. It's true, however, that it allows for arbitrary restrictions.

> 2) The size of type Character objects in DEC is 8 bits.

A quite reasonable assumption with the architectures of today. The only time
you're in trouble is when the size is exactly seven bits.

> 3) DEC did not give a Constraint_Error on the assignemt (which may
>    be a bug in DEC's implementation).

Verdix Ada on VAX/Unix doesn't seem to bother, either. It's probably a
violation of the language definition, but somehow it seems like a frequent
violation. (Check your Ada system. Have it read a non-ASCII character and
mingle around with it. Do you get Constraint_error?)

> 4) DEC does not "place restrictions on unchecked conversions"
>    (13.10.2 P2);

True, as I said under 1), but on the other hand, what reasons for restrictions
are there in this case?

> 5) DEC truncates high order bits if the source value if its size is
>    greater than the size of the target type (this is really only a
>    problem with the specific example given by Moore, in that he used
>    the type Integer as the source type as opposed to an 8 bit type).

Yes, replace Integer with Very_short_integer and it's fixed.

>The principle benefit of my proposed solution is the creation of a portable
>abstraction that represents the problem. Re-implementing a Text I/O for this
>type is a small price to pay for this benefit (especially when Moore's
>technique can be used in the implementation of this Text I/O - Sufficiently
>issolated to prevent major impact on the system that I'm implementing and
>later porting [as pointed out by another reader of this group]).

Mr. Colbert seems to share my opinion about "presumably" portable. Else he
wouldn't propose Moore's technique inside Text_io_8_bit. Now we don't have to
rewrite Text_io_8_bit until we meet an Ada system that does not implement
things as we need them. It's perfectly true that if we stick to Moore's
original idea, we have a lot more code to rewrite. If I were to write a
10000-line system I would surely consider Text_io_8_bit. Presently, I'm not. I
just want to write some small pieces of code to demonstrate more meaningful
character comparisons than the use of a simple collating sequence. (Those who
read comp.std.internat know what I'm talking about.)

And, guys, can't we agree that it would have been much easier if the language
definition in one way or another had given room for a wider character concept
than 128 ASCII codes?

-- Erland Sommarskog ENEA Data, Stockholm sommar@enea.UUCP
jmoody@DCA-EMS.ARPA (Jim Moody, DCA C342) (09/14/87)
Erland Sommarskog (sommar@enea.uucp) gets to the heart of the
disagreement when he writes:
And, guys, can't we agree on that it would have been much easier
if the language definition in one way or another had given place
for a wider character concept than 128 ASCII codes?
No it wouldn't. Or at least, easier for whom?
Text_IO, remember, is standard. That means that all vendors must support
it. And must support it to all output devices (not just bright terminals).
That means printers with hammers which are limited to the ASCII 95 graphic
characters. The only reasonable way of requiring vendors to support
something more than the 95 characters plus ASCII.HT would be to make
Text_IO generic. This brings its own problems: there is currently no
provision for a generic formal parameter to be restricted to character
type and indeed no requirement that a compiler recognise character types
as a separate semantic category. I do not know that LRM 3.5.2 is
referenced elsewhere in the LRM. This means that doing what Sommarskog
wants imposes costs on a vendor/implementor which are not limited to
the Text_IO package but spread into the middle part of the compiler.
If we have a cost of such magnitude, we are entitled to ask what benefit
to the user community as a whole does it produce. I think that it was a
reasonable decision to limit the standard to the 95 ASCII printables plus
ASCII.HT, which means that if someone wants to use other characters, he/she
has to shoulder the cost him/herself rather than have the entire user
community pay. I emphasise that this is a cost/benefit decision which
could change in the future. One of these days, Ada standardisation will
be reopened. If at that point, it is clear that a substantial segment of
the user community is using or wants to use a bigger character set, the
benefits of centralising the cost of supporting them may outweigh the costs.
I doubt that they do now. That is, the cost to Sommarskog of implementing
the subset of text_io which he needs plus the cost to the other users
of implementing the subsets of text_io that they need for the character
sets they want to use is less than the cost (for 137 compilers at last
count) of requiring vendors to support bigger character sets. Maybe I'm
wrong. Maybe there are a thousand applications out there which need
bigger character sets (I think that's the order of magnitude needed for
it to be cheaper on the whole for vendors to support). If there are,
then ISO/ANSI/AJPO probably need to be told.
Usual disclaimer: the opinions expressed are my own and should not be
construed as the opinions of the US Government.
Sorry to go on at such length.
Jim Moody
sommar@enea.UUCP (Erland Sommarskog) (09/15/87)
jmoody@DCA-EMS.ARPA (Jim Moody, DCA C342) writes:
>Erland Sommarskog (sommar@enea.uucp) gets to the heart of the
>disagreement when he writes:
> And, guys, can't we agree on that it would have been much easier
> if the language definition in one way or another had given place
> for a wider character concept than 128 ASCII codes?
>No it wouldn't. Or at least, easier for whom?

As you may have guessed, I'm not giving in that easily.

>Text_IO, remember, is standard. That means that all vendors must support
>it. And must support it to all output devices (not just bright terminals).
>That means printers with hammers which are limited to the ASCII 95 graphic
>characters.

That is not an argument. If it were, we should immediately do away with the
lowercase letters. There are printers that don't know them and leave spaces
where text should have been. And, yes, Text_io allows you to send any ASCII
character. (Or since when did Put(ASCII.ETX) become illegal?) More to the
point, what the output device does with the bits sent to it is beyond the scope
of the language definition. Or else all vendors would be in big trouble: how
can they assure that ASCII.L_BRACKET always turns up as a left bracket? Sent to
a Swedish terminal, you are likely to get a capital A with dots over it.

>The only reasonable way of requiring vendors to support
>something more than the 95 characters plus ASCII.HT would be to make
>Text_IO generic. This brings its own problems: there is currently no
>
> ...long discussion on character generics and cost/benefit

The very easy solution is to remove the restriction that character and string
literals may consist of only printable characters. A very cheap modification,
even in 137 compilers. Also, the language must explicitly allow character codes
up to 255, which several compilers already seem to do. (Anyone who knows of one
that doesn't?) These two changes are the minimum, although the language
definition would be cleaner with some packages defining names for all
characters. (You need more than one, since there exists, or will exist, more
than one ISO standard in parallel.)

I could stop here, but let me continue to a definitely more costly solution.
Ed Colbert talked about the virtue of data abstraction when advocating the
idea of writing a new Text_io. But the character type is as concrete as it can
be. Character comparison based on the codes used for communication is
ridiculous when you think of it. It happens to give the correct result for
English, but that is the only one. (With Ada it doesn't really matter, since
it seems to disallow other languages. :-) What I would like to see is
language-dependent comparison operators, the language being selected somewhere:
in the OS, by a pragma, or dynamically. This last idea is of course not unique
to Ada, but general for all programming languages.

-- Erland Sommarskog ENEA Data, Stockholm sommar@enea.UUCP
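[A sketch of what a language-dependent comparison might look like: compare
through a collation table instead of through the raw codes. All names and the
8-bit code values below are invented for illustration; a real package would
fill in the full table for the character set actually in use.]

```ada
package SWEDISH_ORDER is
   type OCTET  is range 0 .. 255;  -- an 8-bit character code
   type WEIGHT is range 0 .. 255;  -- its position in the collating order
   function LESS (LEFT, RIGHT : OCTET) return BOOLEAN;
end SWEDISH_ORDER;

package body SWEDISH_ORDER is

   COLLATION : array (OCTET) of WEIGHT;

   function LESS (LEFT, RIGHT : OCTET) return BOOLEAN is
   begin
      -- Compare collation weights, not communication codes.
      return COLLATION (LEFT) < COLLATION (RIGHT);
   end LESS;

begin
   -- Default: plain code order, as the predefined "<" would give.
   for I in OCTET loop
      COLLATION (I) := WEIGHT (I);
   end loop;
   -- Hypothetical 8-bit codes for the Swedish letters, which sort
   -- after 'Z' in the Swedish alphabet:
   COLLATION (197) := 250;  -- A with ring
   COLLATION (196) := 251;  -- A with dots
   COLLATION (214) := 252;  -- O with dots
end SWEDISH_ORDER;
```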