broman@noscvax.UUCP (Vincent P. Broman) (10/03/85)
<> Both Pascal and Modula-2 offer simple and useful notations for SETs, with builtin operators to manipulate them, making a powerful mathematical abstraction available in a high-level form. I use sets often. Unfortunately, the warning that SETs may be restricted by the implementor to base types with few (8, 16?) elements, forces me often to replace SETs by ARRAYs of SETs and write procedures to manipulate them, in order to write portable code. This is clumsy and detracts much from their notational convenience. I would suggest that rule 2 on page 65 of PIM2 end approximately as follows: "2. ... set a limit to the number of elements admissible in base types. This should be no more restrictive than memory or segmentation considerations necessitate. In any case, base types of up to N (256?) elements shall be supported." Supporting larger SETs in the compiler would probably require that some SET operations be realized as single machine instructions, others as subroutine calls. This is a reasonable price to pay to make SET types machine independent. Wirth slipped a bit when he said on the page (65) mentioned above, that "set types are a powerful tool ... to express operations on individual bits of an operand ...." Actually, the BITSET is the low-level tool used to manipulate bits on your machine. The user-defined SET types implement the high-level concept of a mathematical set, and if they are mapped into bit operations by the compiler, that fact has little relevance to how I should use them. Even on a decimal machine BITSET may disappear, but I would expect user-defined SETs to work. Hurray that Modula-2 has SETs. Among the pile of features pasted onto Ada, this important one seems to have been left out. Vincent Broman (619) 225-2365 MILNET: broman@nosc Analysis Branch, code 632 UUCP: {ihnp4,decvax,akgua,dcdwest, Naval Ocean Systems Center allegra,ucbvax}!sdcsvax!noscvax!broman San Diego, CA 92152 ICBM: 32deg 42min N / 117deg 14min W
quiroz@rochester.UUCP (Cesar Quiroz) (10/07/85)
In the posting of the reference, the problem of supporting reasonably large sets is considered. A comment is also made as to BITSETs being a good starting point for that. I want to add a few of my own opinions, in the hope of public scrutiny. Sets are a Good Thing To Have only if you can manage to use them without undue hassle. Pascal first and the Modula2 provide powerful *paper* set types, that become much less useful thanks to not being supported beyond the width of a machine's word (just a few elements). This, of course, doesn't prevent competent implementors from doing things in a more usable way, but this still makes the portability-minded user beware. And so those enlightened users end up writing routines that amount exactly TO THEIR OWN IMPLEMENTATION OF SETS! Now there is a reason for this (so I think). Doing Sets right in all their glory is hard. In some broad sense, it is similar from doing strings right. Representations that are good for one application slow down another and the decision is hard (or impossible) to make statically (at compile time). So it is almost sure that a good number of applications will end up requiring a different set implementation. Why this happens may be left open for discussion. At any rate, my preference here would be to left Sets (and Strings, for that matter) out of the language, but support them with adequate primitives (usable for more than implementing sets) and standard libraries (you can find a few implementation options that cover most of the applications). This connects to the last point, the support for bit strings. Saying that *sets* are a good mechanism to handle *bits* is going backwards atrociously. When you need to think of *bits*, the least you want is something else to get in your way. I barely remember that I used to be very pleased with the way bit-strings were implemented in AlgolW and still think that that is the way to go. Provide the ability to refer to arrays of bits (and guarantee only that they are densely packed in an implementation dependent way, so they may be different from array of [0..1]) and provide a reasonable way to denote constants (i.e., "masks") and you get an abstraction that can be conveniently mapped to something sensible in most architectures and that allows you to write at least one implementation of sets-of-reasonable-length as a standard library. Cesar -- Cesar Augusto Quiroz Gonzalez Department of Computer Science {allegra|seismo}!rochester!quiroz University of Rochester or Rochester, NY 14627 quiroz@ROCHESTER
bart@reed.UUCP (Bart Massey) (10/08/85)
> In the posting of the reference, the problem of supporting reasonably > large sets is considered. The other referenced poster says that he thinks set ops should be extrinsic to the language, and so should strings. I don't think I buy it for M2, for the simple reason of code legibility. For sets as a mathematical construct, like numeric expressions (another place where "a lot of tradeoffs are made"), OPERATORS, and not functions, are desirable as the atomic elements for manipulation of the object. One of the reasons it isn't very much fun to write mathematical functions in LISP, but is in APL, is that an APL function is symbolically much like what you would write on paper, whereas LISP functions bear no obvious symbolic relation to the equation they represent. And M2 of course does not allow you to define operators. In Algol68 it might make sense to make sets extrinsic. But note also that fixed size sets have a straightforward and highly efficient internal representation on most machines. It seems reasonable to make the compiler generate code to deal with this representation, instead of using more inefficient user code to do it, since set ops are often so useful. I think that this is also a good argument for adding intrinsic ops to the language to implement strings, as well as stacks, queues, and other commonly used data structures. Of course, the tradeoff is, as always, compiler complexity... As per the size of the sets, it seems reasonable to insist that compilers support sets of (bits/byte)*(largest value that fits in a WORD) elements. This is enough to support any reasonable use, and is still efficient to index... Wouldn't it be nice to be able to write the Sieve of Eratosthenes just like you think it? Bart Massey ...tektronix!reed!bart
throopw@rtp47.UUCP (Wayne Throop) (10/09/85)
> In the posting of the reference, the problem of supporting reasonably > large sets is considered. A comment is also made as to BITSETs being > a good starting point for that. I want to add a few of my own opinions, > in the hope of public scrutiny. Ditto. > Sets are a Good Thing To Have only if you can manage to use them without > undue hassle. Agreed. However, I think that you are a little pessimistic about what constitutes "undue hassle" (for the compiler writer). > Pascal first and the Modula2 provide powerful *paper* set > types, that become much less useful thanks to not being supported beyond > the width of a machine's word (just a few elements). Your tack seems to be that supporting sets beyond a machine word is hard for the compiler, and that sets should therefore be dropped. I disagree. I don't find implementing multi-word sets to be any more difficult than implementing, say, complex arithmetic. FORTRAN has had that for years, and nobody thinks of it as unreasonably hard to do. > Doing Sets right in all their > glory is hard. In some broad sense, it is similar from doing strings right. > Representations that are good for one application slow down another and > the decision is hard (or impossible) to make statically (at compile time). > So it is almost sure that a good number of applications will end up requiring > a different set implementation. This is true. But consider substituting "floating point numbers" for "sets" in the above. After all, most languages implement a single format of floating point number (sometimes with a choice of mantissa length). Sometimes this just won't serve the needs of developers, because they need an extended exponent range, or they need different error-propogation behavior, or they need 16-bit floats for space economy, or whatever. I don't take this to mean that languages shouldn't support floating point numbers. Nor do I think the above paragraph shows that sets should not be included in "primitive" language features. > Saying that > *sets* are a good mechanism to handle *bits* is going backwards atrociously. Also true. However, I think that languages should support *both* sets and bit strings as primitive notions. Just because A can be implemented using B is no reason for a language not to support A. Some examples are that characters can be implemented using integers and complex numbers can be implemented using floats. In neither case do I take this to be an argument that characters or complex numbers should not be supported. Similarly, I think that supporting arbitrary-length sets is a reasonable thing for a language to do, and further that Modula "ought to have" done it. > Cesar Augusto Quiroz Gonzalez {allegra|seismo}!rochester!quiroz -- Wayne Throop at Data General, RTP, NC <the-known-world>!mcnc!rti-sel!rtp47!throopw
paul@greipa.UUCP (Paul A. Vixie) (10/10/85)
In article <1982@reed.UUCP> bart@reed.UUCP (Bart Massey) writes: >As per the size of the sets, it seems reasonable to insist that compilers >support sets of (bits/byte)*(largest value that fits in a WORD) elements. >This is enough to support any reasonable use, and is still efficient >to index... Let's see... for a VAX, that's (8) * (2**32) This may seem efficient to index, but to me that's still alot of bits. Even so, I agree with this. A feature I used heavily when writing non-portable Pascal was constant SET OF CHAR expressions out in the code... thus if (ch = 'e') or (ch = 'q') or (ch = 'x') then ... was often written if (ch in ['e', 'q', 'x']) then ... . This was more efficient AND more legible, a rare combination. Wirth decided to limit sets to having no more elements than a WORD has bits because everybody implements sets with bit strings (anybody besides me remember when we used to multiply primes together?), and it's easier to check the status of a bit if you know in advance what byte it's in. But for the additional readability, I'd prefer a compiler smart enough to generate the extra code for manipulating sets larger than a WORD can handle. It won't be efficient, but sets that size would not often be used by system programmers, so the "usual" case (small sets) loses nothing. Even system programmers need large bit strings once in a while. Cached allocation maps for mass storage devices, for example. Will high-level Modula code for selecting an element out of an ARRAY OF BITSET be as efficient as compiler-generated code for SET OF [0..999999] ? I think not. -- Paul Vixie {decwrl dual pyramid}!greipa!paul
quiroz@rochester.UUCP (Cesar Quiroz) (10/11/85)
From article <1982@reed.UUCP> (bart@reed.UUCP (Bart Massey)): > ... >The other referenced poster says that he thinks set ops should be >extrinsic to the language, and so should strings. > >I don't think I buy it for M2, for the simple reason of code legibility. >For sets as a mathematical construct, like numeric expressions (another >place where "a lot of tradeoffs are made"), OPERATORS, and not functions, >are desirable as the atomic elements for manipulation of the object. One >of the reasons it isn't very much fun to write mathematical functions in >LISP, but is in APL, is that an APL function is symbolically much like >what you would write on paper, whereas LISP functions bear no obvious >symbolic relation to the equation they represent. And M2 of course does >not allow you to define operators. In Algol68 it might make >sense to make sets extrinsic. > ... Absolutely. I think there is agreement that most design decisions have to take a side and leave something out in order to make the rest practical to implement. If one needs a language in order to deal with a particular domain, it is best when that language presents the right level of abstraction (i.e., compiler-level support for the necessary entities as first-class objects). So you have numerically oriented languages that give you lots of power in terms of accuracy control and plenty of pertinent types (say, complex numbers of various precisions) and so on. Similarly, if the language must deal mainly with text, it is almost imperative that strings are fully supported (say, that you be able to express the replacement of a substring with another of different length, with all the surgery being hidden). This agreed (so I hope), I must say I failed in my previous posting by not making explicit the design domain I thought corresponded to M2. My personal impression comes from the first line of the Report: "Modula-2 grew out of a practical need for a general, efficiently implementable systems programming language for minicomputers." I think that those early motivations are better served leaving sets out of the language in favor, say, of better handling of bit strings. (My reasons for believing this are documented in too much a lengthy fashion in my previous article.) This doesn't prevent M2 from growing into a sound language for other types of applications, where first-order support for strings and sets would have to be deemed more pertinent. Currently, ICON and SETL are good exponents of how to deal with those domains. The argument in favor of guaranteeing sets of at least N elements, for N a big number, is certainly a good compromise. N<=32 is a total loser. I'd like to notice, also, that Bitsets in M2 are closer to a practical implementation of bit strings than Sets are to their intent. Having to think of "membership" when I mean "being on" is just the class of misuse that I dislike in this aspect of the language. -- Cesar Augusto Quiroz Gonzalez Department of Computer Science {allegra|seismo}!rochester!quiroz University of Rochester or Rochester, NY 14627 quiroz@ROCHESTER
rcd@nbires.UUCP (Dick Dunn) (10/15/85)
> > Pascal first and the Modula2 provide powerful *paper* set > > types, that become much less useful thanks to not being supported beyond > > the width of a machine's word (just a few elements). > > Your tack seems to be that supporting sets beyond a machine word is hard > for the compiler, and that sets should therefore be dropped. I > disagree... I'll also go along with the position that multi-word sets are not that hard to implement. In a project using a language based on Pascal some years ago, we implemented LARGE sets--the upper limit on set size was (for some arcane reasons) 100 000 000 elements. This is more nearly a storage limitation than a limit on set size, and certainly answers most desires for large sets! It did, however, introduce two restrictions which were not present in Pascal's set mechanism: - First, the basetype of a set was required to be evident from its left context. This was due in part to the simplistic analysis in the compiler we were using. It turns out to be a very odd re- striction to explain but one which causes no trouble in practice. An example of a statement violating the constraint is if [1,3,5]=S1 then ... (which can be fixed by exchanging the operands of =) The constraint is always met in assignments and procedure calls. We could have eliminated it. - Second, set types were considered distinct if their basetypes were distinct. This avoids the problems associated with performing set operations on sets with, say, disjoint basetypes which are nonetheless subranges of the same parent type. This is an area in which Pascal is surprisingly (needlessly?) lenient, and may be one of the reasons that large sets are not allowed-- all sets would have to be maximum size, where in our scheme sets were stored in the smallest natural storage unit that would contain the required number of bits. -- Dick Dunn {hao,ucbvax,allegra}!nbires!rcd (303)444-5710 x3086 ...Simpler is better.
broman@noscvax.UUCP (Vincent P. Broman) (10/19/85)
> I'll also go along with the position that multi-word sets are not that hard > to implement. In a project using a language based on Pascal some years... > - First, the basetype of a set was required to be evident from its > left context. This was due in part to the simplistic analysis in > > - Second, set types were considered distinct if their basetypes > were distinct. This avoids the problems associated with > performing set operations on sets with, say, disjoint basetypes > which are nonetheless subranges of the same parent type. This is > -- > Dick Dunn {hao,ucbvax,allegra}!nbires!rcd (303)444-5710 x3086 > ...Simpler is better. I suspect that the difficulties in Doing Strings Right, lie mainly in their having dynamically varying lengths, and in the lack of typename for string literals and expressions. Modula2 allows large SETs to be Done Right, in all their glory, because SET literals and expressions have types and basetypes easily known at compile time, as a result of Wirth's choice of syntax: "settypename{elt,elt,...}" for any thing but a BITSET, ... a clear winner. BTW, the intended domain of application for Modula2 seems to me to embrace general-purpose programming of single-processor computers, including systems work, numerics, CAD, etc. I think so because Lilith was designed to run on M2 alone, no f77, C, or anything else as auxilliary. A good implementation of Modula2 would give Ada a good run for its money in generality of application, save the area of real-time programming only. Vincent Broman (619) 225-2365 MILNET: broman@nosc Analysis Branch, code 632 UUCP: {ihnp4,decvax,akgua,dcdwest, Naval Ocean Systems Center allegra,ucbvax}!sdcsvax!noscvax!broman San Diego, CA 92152 ICBM: 32deg 42min N / 117deg 14min W