dcw@doc.ic.ac.uk (Duncan C White) (06/24/89)
In article <6197@pdn.paradyne.com> alan@oz.paradyne.com (Alan Lovejoy) writes: >Frans van Otten writes: ><Niklaus Wirth writes (in pim3, the report, chapter 6.6): ><= ><=A set type defined as SET OF T .. (where T has) at most N values, ><=where N is a small constant determined by the implementation, usually ><=the computer's wordsize or a small multiple thereof. ><= >.... So the fact that a >compiler supports SET OF CHAR does not make it nonstandard, because 256 >is a small multiple of 32, namely 8. And also because the maximum size >is ultimately TO BE DETERMINED BY THE IMPLEMENTATION. Compilers which >implement small maximum set sizes do so solely at the whim of the implementors. >They are not forced to do this in order to conform to the standard. What you say, Alan, is correct, but IMHO misses the point. As you say, PIM clearly leaves the choice up to the implementor. That is, set size in Modula-2 is strictly defined as being implementation- dependent. (I always like standards double-speak :-) However, this does not ONLY say "it's implementation defined", it says "the minimum (and only) guaranteed range is the machine wordlength". Furthermore, the whole emphasis of his book is on word-sized sets: In Chapter 18 "Set Types", he says: ".. implementations of Modula set a limit to the number of elements (in a set) That limit is often the wordlength of the computer or a small multiple of it. This is quite a small number, say 16 or 32. "Although these rules restrict the generality of the set concept, set types are a powerful tool and allow to express operations <<on individual bits>> of an operand on a high level of abstraction based on a familiar and intuitively appealing mathematical concept." <<>>: My italics, not his :-) It seems very clear to me that he knows that ONLY word-sized sets "restrict the generality of the set concept" but he is only concerned with putting bit-twiddling on a higher plane!!! [Aside: the rationale for this is clear: Modula-2 was designed as a systems programming language, and was used by the ETH group to do bit-twiddling far more than general-purpose programming in which big sets would be useful. Hence BITSETs, module SYSTEM, processes and interrupt handlers] The consequence of this pervasive bias is that most Modula-2 compilers, including all (?) the versions that have come from ETH, only implement word-sized sets. Now, when we compare this situation with Pascal, we find an interesting situation: In the Pascal User Manual and Report (2nd edition) in Chapter 8 : Set Types (page 50) we find: "Implementations of PASCAL may define limits for the size of sets, which can be quite small (eg. the number bits in a word)" At first glance, this might seem to blow my argument away, as Wirth SAYS THE SAME THING about Pascal as he did about Modula-2. However, my argument is rather more subtle (it would be :-), is about the overall tone of the book. Further down page 50, we find: "Examples of set constructors: [13] [i+j,i-j] ['A'..'Z', '0'..'9']" And on page 51: "examples of declarations -and- assignment chset1 : set of 'a'..'z'; chset1 := ['d', 'a', 'g'];" And again, "Set operations are relatively fast, and can be used to eliminate more complicated tests. A simpler test for: if (ch='a') or (ch='b') or (ch='c') or (ch='d') or (ch='z') ... is: if ch in ['a'..'d', 'z'] ..." He then follows this with an example of what to do when you really need a set bigger than whatever maximum size your compiler provides: a Sieve of Eratosthenes to find primes in range 1..10000. He says: "Most implementations would therefore not accept a set with 10000 elements." (!!) I hope it is clear from these quotes that Wirth was envisaging that character-subrange sets (broadly alphanumeric ones) would very probably be supported, whereas sets with several thousand elements would not be supported. [Aside: One should recall, of course, that he was using large CDC6000s then, which had 59 (!) bits in a word, and 64 characters..almost all characters had ordinal values < wordlength] Many of his examples use the idiom: if ch in ['0'..'9'] then The result of the emphasis in the Pascal Report, then, is that compiler writers, while knowing that they were ALLOWED to restrict sets to one machine word, knew that they were EXPECTED to provide an environment in which the example programs would work! So, SET OF CHAR (or failing that the alphanumeric subrange !) was typically allowed, whereas sets with say, a couple of thousand elements were not. Returning to Modula-2, then, I contend that the whole emphasis of the discussions of sets in PIM is designed to give the impression that ONLY word-sized sets must be supported. Note: whether the restriction is Num_elements < wordlength or ORD(lower) < wordlength & ORD(higher) < wordlength is ALSO implementation defined, so not even the set constant {'0'..'9'} is guaranteed! Exit the Pascal idiom :-( In practice, this emphasis means that I, and any other user who is interested in writing programs that are PORTABLE (as well as standard-conforming) cannot safely use SET OF CHAR, or even SET OF ['0'..'9']. The blame for this can only be laid at one door: someone who designs languages for individual projects, sacrificing any generality not needed for that project, then releases them to the world at large, releases new versions of the definition document (PIM) for such reasons as "heh, we've got a new font, so we re-typeset the book" (see preface to 4th edition), and who is clearly incapable of ensuring consistency between the definitive grammar and the railroad diagrams, much less with the grammar snippets and examples given in the book! [jeez... I hope the BSI standard Modula-2 will be better than this... equally, it surely couldn't be worse (could it...) ] Duncan >Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL. >Disclaimer: I do not speak for AT&T Paradyne. They do not speak for me. >_____________________________Down with Li Peng!________________________________ >Motto: If nanomachines will be able to reconstruct you, YOU AREN'T DEAD YET. [ Reply to: dcw@doc.ic.ac.uk or ...!ukc!icdoc!dcw ] ---------------------------------------------------------------------------- Duncan White, | "In the case of Mr. Lawson, I can't say Dept. Of Computing, | his years as Chancellor have given him Imperial College, | an understanding of the economy, but he can London SW7 | probably find his way around the building" England. | Ian Aitken, Question Time
neitzel@infbs.UUCP (Martin Neitzel) (06/26/89)
One of the original questions was "Is SET OF CHAR a legal type?". Some submissions discussed the contraints suggested or required for the size N of "base type" T in "SET OF T". I just want to point out that "SET OF CHAR" must be illegal because of the other constraints put on T. To narrow the report (already cited by Frans van Otten) even more down: pim3> This must be a subrange type of the integers [...], pim3> or a (subrange of an) enumeration type ... CHAR simply doesn't qualify. There are several places in PIM3 where subranges and enumerations are allowed and it was necessary to add CHAR and BOOLEAN explicitly, for example consider index types of arrays. Chapter 16 in the tutorial section introduces BOOLEAN as a enumeration type. This is conceptually clear and consistent within in the entire language. However, this classification didn't make it into the reference part. Don't ask me why. Likewise, Modula-2 could benefit from viewing CHAR as an enumeration type in my opinion. Martin
alan@oz.nm.paradyne.com (Alan Lovejoy) (06/30/89)
In article <1296@infbs.UUCP> neitzel@infbs.UUCP (Martin Neitzel) writes: >One of the original questions was "Is SET OF CHAR a legal type?". > >I just want to point out that "SET OF CHAR" must be illegal because of >the other constraints put on T. To narrow the report (already cited >by Frans van Otten) even more down: > >pim3> This must be a subrange type of the integers [...], >pim3> or a (subrange of an) enumeration type ... > >CHAR simply doesn't qualify. Your quote comes from page 150. Note that in the phrase "of the integers" the word "integers" is NOT capitalized. Hence it is not clear that Wirth means to authorize only values of type INTEGER. As evidence that Wirth did not mean that, try using one of his compilers to compile the following: ... VAR aCardinal: CARDINAL; BEGIN aCardinal := 0; IF aCardinal IN {0} THEN WriteString("Either mixed CARDINAL-INTEGER expressions are legal,"); WriteString(("or else SETs are not restricted to containing INTEGERS"); END; END ... As further proof, consider the formal definition of set types given in the next paragraph: $ SetType = SET OF SimpleType. On page 147: $ SimpleType = qualident | enumeration | SubrangeType. On page 148: $ SubrangeType = [ident] "[" ConstExpression ".." ConstExpression "]". Examples of subrange types: [0..N-1] ["A".."Z"] [Monday..Friday] The grammar in Appendix 1 says that ConstExpression = expression. Finally, try compiling TYPE CharSet = SET OF [0C..17C]; using a Wirth compiler. The only reasonable conclusion is that SET OF CHAR is legal Modula-2. >There are several places in PIM3 where subranges and enumerations are >allowed and it was necessary to add CHAR and BOOLEAN explicitly, for >example consider index types of arrays. Chapter 16 in the tutorial >section introduces BOOLEAN as a enumeration type. This is conceptually >clear and consistent within in the entire language. However, this >classification didn't make it into the reference part. Don't ask me >why. Likewise, Modula-2 could benefit from viewing CHAR as an >enumeration type in my opinion. Notice that all of Wirth's compilers blithely accept "SET OF BOOLEAN" in spite of the fact that the Report does not grant BOOLEAN the right to be called an enumeration type. PIM simply contains errors, omissions and contradictions. The good doctor wants us to DoWhatHeMeansNotWhatHeSays. Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL. Disclaimer: I do not speak for AT&T Paradyne. They do not speak for me. ______________________________Down with Li Peng!________________________________ Motto: If nanomachines will be able to reconstruct you, YOU AREN'T DEAD YET.
fransvo@maestro.htsa.aha.nl (Frans van Otten) (06/30/89)
Alan Lovejoy writes: >Martin Neitzel writes: > >>One of the original questions was "Is SET OF CHAR a legal type?". Yesterday I went to the first Dutch ISO-meeting on standardizing Modula-2. I got a copy of the "First Working Draft Modula-2 Standard", from the British Standards Institution. It states about set types: "The base type of the set type shall be an ordinal type". I think this settles all problems. Also, it is proposed that an implementation should at least support SET OF CHAR. (A small problem was mentioned: Chinese etc. character sets are rather big...) -- Frans van Otten | fransvo@maestro.htsa.aha.nl or Algemene Hogeschool Amsterdam | fransvo@htsa.uucp or Technische en Maritieme Faculteit | [[...!]backbone!]htsa!fransvo
neitzel@infbs.UUCP (Martin Neitzel) (07/02/89)
pim3> This must be a subrange type of the integers [...], pim3> or a (subrange of an) enumeration type ... Martin> CHAR simply doesn't qualify. Alan> Note that in the phrase "of the integers" the word "integers" Alan> is NOT capitalized. Hence it is not clear that Wirth means to Alan> authorize only values of type INTEGER. I always understood "integers" in the generic meaning too. It is quite clear that this generic sense is intended here and at other places as well. However, I always thought that only (LONG)INTEGER and CARDINAL would count as integers, i.e. things you can add and multiply etc. And that CHAR would not. This was induced by the following: [PIM3, 6.1 Basic types:] pim3> pim3> The following basic types are predeclared and denoted by pim3> standard identifiers: pim3> pim3> 1. INTEGER comprises the integers between MIN(INTEGER) and pim3> MAX(INTEGER). pim3> 2. CARDINAL comprises the integers between 0 and pim3> MAX(CARDINAL). [...] pim3> 4. CHAR denotes the character set provided by the used pim3> computer system. [...] pim3> 6. LONGINT comprises the integers between MIN(LONGINT) and pim3> MAX(LONGINT). This makes me think that characters are not integers. A closer look at the "definition" of the term "integers" reveils little more: [PIM3, 3. Vocabulary and representation:] pim3> pim3> 2. `Numbers' are (unsigned) integers or real numbers. pim3> Integers are sequences of digits. If the number is followed pim3> by the letter B, it is taken as an octal number; if it is pim3> followed by the letter H, it is taken as a hexadecimal number; pim3> if it is followed by the letter C, it denotes the character pim3> with the given (octal) ordinal number (and is of type CHAR, pim3> see 6.1). As I understand this, there exist objects (characters) that can be denoted via their ordinal number. While this number is an integer, the denoted object must not necessarily be, at least in my opinion. (Else, if there weren't a difference between a character and its number, what should be the reason to take the ORD of a character at all?) Alan> As evidence that Wirth did not mean that [integer only refers Alan> to INTEGER, Martin] , try using one of his compilers to compile Alan> the following: Alan> VAR Alan> aCardinal: CARDINAL; Alan> BEGIN Alan> aCardinal := 0; Alan> IF aCardinal IN {0} THEN Alan> WriteString("Either mixed CARD-INT expressions are legal,"); Alan> WriteString(("or else SETs are not restricted to containing INTs"); Alan> END; Alan> END 1.: I already said that what I consider to be "integer". We both agree in that it includes CARDINALs. No problem here. 2.: The type of an ambigous literal like "0" may be either CARDINAL or INTEGER, as suitable for the actual context. This is why "X+3" works for X declared as INTERGER and as CARDINAL. [PIM3, Ch.3, 2.] 3.: Wirth's own compilers can fail to match his intentions as well as they can fail to match his report. (In both cases deliberately or unwillingly.) Alan> As further proof, consider the formal definition of set types Alan> given in the next paragraph: Alan> Alan> $ SetType = SET OF SimpleType. Alan> [...] This is not the "formal definition of set types". This is only the formal definition of their syntax. You may not simply forget the additional semantic constraints (given in informal language). Alan> Notice that all of Wirth's compilers blithely accept "SET OF Alan> BOOLEAN" in spite of the fact that the Report does not grant Alan> BOOLEAN the right to be called an enumeration type. Nice of him. (Even no sarcasm here.) Alan> PIM simply contains errors, omissions and contradictions. The Alan> good doctor wants us to DoWhatHeMeansNotWhatHeSays. I would happily take the "corrective" approach in my implementation if Wirth had replaced the last paragraph in the "Introduction" of the report with a modest "DoWhatIMean". (I know that one shouldn't expect even a "95% bug free definition" if it shall still be readable at the same time.) As a compiler implementor one also has sometimes obligations to his customers/users. In my case, the CS fresh(wo)men expect the compiler to behave according to the book they bought (preferably only ONE book...:-). They certainly cannot be expected to guess what Wirth may have meant and how a "sensible" implementation should differ from his books. For a mere personal use of the compiler, I wouldn't hesitate to digress from the book. But that's not the case here. Martin
alan@oz.nm.paradyne.com (Alan Lovejoy) (07/03/89)
In article <1303@infbs.UUCP< neitzel@infbs.UUCP (Martin Neitzel) writes:
<I always understood "integers" in the generic meaning too. It is
<quite clear that this generic sense is intended here and at other
<places as well. However, I always thought that only (LONG)INTEGER and
<CARDINAL would count as integers, i.e. things you can add and multiply
<etc. And that CHAR would not.
There is no formal definition given by Wirth of the generic "integer" concept.
Hence there is no basis for claiming that the operations "+", "-", "*", "MOD"
and "DIV" must be defined on any type which qualifies as an "integer" type.
There is, however a formal definition of a generic concept called "ordinal
types." One possible paraphrase of this concept is "a type whose values
or magnitudes are integral--that is, occur in discrete units." There is no
reason that Wirth could not have meant "integer" to be synonymous with
"ordinal type."
Characters are logically integral values. And physically, they are unsigned
(or on some systems, signed) integers. The purpose of the CHAR abstraction
is to hide both the number of bits in a character and the mapping between
integer values and character semantics. If the purpose had included hiding
that fact that characters are integers, then the ORD() and CHR() functions
would neither exist nor be guaranteed to be definable.
<[PIM3, 3. Vocabulary and representation:]
<pim3>
<pim3> 2. `Numbers' are (unsigned) integers or real numbers.
<pim3> Integers are sequences of digits. If the number is followed
<pim3> by the letter B, it is taken as an octal number; if it is
<pim3> followed by the letter H, it is taken as a hexadecimal number;
<pim3> if it is followed by the letter C, it denotes the character
<pim3> with the given (octal) ordinal number (and is of type CHAR,
<pim3> see 6.1).
<
<As I understand this, there exist objects (characters) that can be
<denoted via their ordinal number. While this number is an integer,
<the denoted object must not necessarily be, at least in my opinion.
<(Else, if there weren't a difference between a character and its number,
<what should be the reason to take the ORD of a character at all?)
PIM3 clearly states that "33C" is a number of type CHARACTER. And that
a number is either an integer (note the lower case) or a real number.
This quote is unambiguos support form my position.
<As a compiler implementor one also has sometimes obligations to his
<customers/users. In my case, the CS fresh(wo)men expect the compiler
<to behave according to the book they bought (preferably only ONE
<book...:-). They certainly cannot be expected to guess what Wirth may
<have meant and how a "sensible" implementation should differ from
<his books. For a mere personal use of the compiler, I wouldn't hesitate
<to digress from the book. But that's not the case here.
When creating products, implementors must choose who their target market will
be and must attempt to satisfy the needs and expectations of the customers
in that market. The same dilemma faces those who would design and/or decide
on a standard for a programming language.
Customers in the commercial Modula-2 compiler market need and expect to see
SET OF CHAR.
And I would think that educators would want to teach a language that prepared
their students for the realities of the commercial world.
Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL.
Disclaimer: I do not speak for AT&T Paradyne. They do not speak for me.
______________________________Down with Li Peng!________________________________
Motto: If nanomachines will be able to reconstruct you, YOU AREN'T DEAD YET.
dcw@doc.ic.ac.uk (Duncan C White) (07/04/89)
In article <1296@infbs.UUCP> neitzel@infbs.UUCP (Martin Neitzel) writes: > >I just want to point out that "SET OF CHAR" must be illegal because of >the other constraints put on T. To narrow the report (already cited >by Frans van Otten) even more down: > >pim3> This must be a subrange type of the integers [...], >pim3> or a (subrange of an) enumeration type ... > >CHAR simply doesn't qualify. In article <6328@pdn.paradyne.com> alan@oz.paradyne.com (Alan Lovejoy) replies: >.. Note that in the phrase "of the integers" >the word "integers" is NOT capitalized. Hence it is not clear that Wirth >means to authorize only values of type INTEGER. Well, this is certainly sloppy wording on Wirth's part, but I'm not clear whether Alan was saying that this quote is a sloppy wording of the <<syntax>> or the <<semantics>> of set base types in Modula-2. It seems to me that this quote is the sole definition of the <<semantics>> of set base types. It is immediately followed by "where N is a small constant determined by the implementation", which clearly suggests semantics to me. Alan then outlines a broadly syntactical argument: (cut down to save space) >...consider the formal definition of set types: > >$ SetType = SET OF SimpleType. >$ SimpleType = qualident | enumeration | SubrangeType. >$ SubrangeType = [ident] "[" ConstExpression ".." ConstExpression "]". >$ ConstExpression = Expression .. [Some examples and asides on behaviours of Wirth compilers deleted] .. >.. The only reasonable conclusion is that SET OF CHAR >is legal Modula-2. Alan's argument establishes that SET OF subrange_of_char is permitted syntactically, but does not establish: a). that SET OF CHAR is permitted syntactically. and b). that SET OF CHAR, or come to that even SET OF subrange_of_char, is permitted <<semantically>>. Taking (a) first, the only way the syntax fragment above can possibly generate the string "SET OF CHAR" is by the rule: $ SimpleType = qualident | ... Since CHAR certainly doesn't match the rules for enumeration or SubrangeType!! I would argue that the predeclared simple types (CHAR, BOOLEAN, INTEGER and CARDINAL) should have been explicitly classified as "simple types", but this involves modifying the grammar, which is really a different debate.. In the absence of an explicit syntactic classification, we must fall back on "leave everything to semantics" mode: that is, CHAR, BOOLEAN are classified as qualidents, as indeed is almost any random sequence of alphanumerics and dots. So, the answer to (a) is that SET OF CHAR is syntactically valid. Moving on now to (b): Are SET OF CHAR and SET OF [ "A".."Z" ] semantically valid ? Well, I think Martin's argument, plus the quote being the <<semantic>> definition of set base types in Modula-2, is sufficiently convincing to say that SET OF CHAR is outlawed, since CHAR is simply neither a subrange nor an enumeration. However, I think that Alan's argument about "integers" not being capitalized, taken together with knowing how careless Wirth is with his various editions, is sufficiently persuasive to leave open the possibility that Wirth really meant "a subrange with ordinal values between 0..N-1", which would allow SET OF subrange_of_char, and SET OF [Monday..Friday], both of which would be otherwise disallowed. So, we're right back where I entered this debate: Wirth does not explicitly ban SET OF subrange_of_char, but neither does he explicitly require them. However, I maintain that the <<emphasis>> of Modula-2 is biased heavily towards word sized sets. Alan concludes: >PIM simply contains errors, omissions and contradictions. The good doctor >wants us to DoWhatHeMeansNotWhatHeSays. Amen to that... Duncan >Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL. >Disclaimer: I do not speak for AT&T Paradyne. They do not speak for me. >____________________________Down with Li Peng!________________________________ >Motto: If nanomachines will be able to reconstruct you, YOU AREN'T DEAD YET. [ Reply to: dcw@doc.ic.ac.uk or ...!ukc!icdoc!dcw ] ---------------------------------------------------------------------------- Duncan White, | "In the case of Mr. Lawson, I can't say Dept. Of Computing, | his years as Chancellor have given him Imperial College, | an understanding of the economy, but he can London SW7 | probably find his way around the building" England. | Ian Aitken, Question Time
jem@sauna.HUT.FI (Johan Myreen) (07/11/89)
In article <990@maestro.htsa.aha.nl> fransvo@maestro.htsa.aha.nl (Frans van Otten) writes:
Yesterday I went to the first Dutch ISO-meeting on standardizing
Modula-2. I got a copy of the "First Working Draft Modula-2
Standard", from the British Standards Institution. It states
about set types:
"The base type of the set type shall be an ordinal type".
I think this settles all problems. Also, it is proposed that
an implementation should at least support SET OF CHAR. (A
small problem was mentioned: Chinese etc. character sets are
rather big...)
I don't see any problem with that. I really don't understand why you
should stop at SET OF CHAR? What is so special about 16 bytes that
makes it more difficult to represent SETs using more bytes than that?
Why not at the same time restrict array indices to some subrange, say
0..1023, or allow at most three-dimensional arrays? Restricting the base
types of sets sounds just as stupid to me.
SET OF INTEGER, for instance, is a perfectly reasonable data type.
Probably exactly the same machine code is generated from a statement
accessing an element in such a set as is generated from accessing an
element from a set of type SET OF CHAR. In both cases you have to
first calculate an offset into an array of bytes (or words) and then
fetch the bit corresponding to the referenced element.
The beutiful thing about Modula-2 sets (at least from a compiler
writers point of view) is that the type and size of every set variable
and set literal are known at compile time. This means that small sets,
which fit in a word, can be implemented quite efficiently. It also
means that allowing SET OF INTEGER does not affect the efficiency of
a relatively small set like SET OF CHAR, even if huge sets would require
more expensive instructions. The situation is analoguos to big arrays
vs. small arrays: small arrays may be indexed more efficiently on some
architectures but that is no reason to prohibit big arrays, is it?
(The type of a set literal had to be declared last time I looked. You
aren't changing that, are you?? I agree it looks nice to write
IF ch IN ['A'..'Z'] THEN ... like in Pascal, but that has really got
nothing to do with the mathematical concept of a set, and is, in my
opinion, an ugly hack.)
A variable of type SET OF INTEGER probably even fits in the memory of
a 16-bit computer. And even if it doesn't fit, that is just as much a
reason to ban them as there is reason to ban big arrays for not
fitting in the memory of some computer. Althoug I said earlier that
big sets may require more expensive instructions (which probably isn't
even true on most machines) I can't see why they would be more
difficult to implement, probably just as much work as handling the
restriction that they can't be larger than SET OF CHAR.
So prove me wrong. If there is a reason for restricting the
size of a Modula-2 set, other than that of insufficient memory, I
would very much like to hear it.
(* Johan Myreen *)
(* jem@spica.hut.fi *)