[comp.lang.modula2] Number of elements in a set

I1090801@DBSTU1.BITNET (09/14/87)

I'm looking for a Modula-Compiler for the CP/M-80 operating system.
Because I am very satisfied with Turbo-Pascal, I had high hopes
for Turbo-Modula, but recently someone told me, that Turbo-Modula
only allows a maximum of 16 elements in a set. Is there anyone with
some experience with this compiler who can tell me if this really is
true?
I also like to know, what other people think about the minimum number
of elements in a set, so that sets are really useful. For a while I had
to live with a Pascal-Compiler that allowed 64 elements. I remember
some programs, that I couldn't write the way I wanted, because of this
limitation. I think, there should be at least 128 elements.
What do you think about this?

                                      Ulf Gruene
                                I1090801@DBSTU1.BITNET

joel@DECWRL.DEC.COM (09/15/87)

Pascal's set constructors are untyped, and there are cases where context iss
insufficient to determine at compile time the boundaries of a set.

Most implementations thus implement sets of some small, fixed, size (like 16,
32, 128, or 256 elements).  The UCSD implementations go to some pains to allow
dynamically sized sets up to 4096 elements (8-bit microcomputers can deal with
256 16-bit words without too much grief).

Modula-2's set constructors are typed, and therefore the size of the set is
always known at compile time.

Many implementations nonetheless implement sets as the word size of the
machine, because then it's easy to generate single machine instructions.  It
is not hard  to implement sets of effectively unbounded size; further, it is
usually more efficient, and certainly more readable and useful to have large
sets than to simulate this with the corresponding user code.

One can only hope that the eventual ISO standard will say something useful
about sets, like you can have at least as many elements as, say, MAXINT.  But
people who haven't implemented such tend to vote against inclusion of such
language, so don't count on it.

- Joel McCormack {ihnp4 decvax ucbvax allegra}!decwrl!joel
		 joel@decwrl.dec.com

samlb@well.UUCP (Samuel B. Bassett) (09/15/87)

	At present, _All_ (or almost all) Modula-2 compilers have _only_
16-element bitsets, because that is how Der Verehrter Herr Doktor Professor
wanted it, and enshrined it in the language definition.
	(Whenever he is challenged about one of the many inadequacies of the
language, he replies that given the point he wanted to prove and the resources
he has {slaves... errr, graduate students} at ETH, some choices had to be
made, and, besides, that is a mere matter of implementation which can be easily
handled in a separate module.)
	The latter is certainly true -- there are at least two commercially-
available 'longset' modules that I have seen, which overcome the 16-element
set problem -- both are for Logitech's MS-DOS Modula, but do have source
available, so should port directly to most other machines . . .

	The British Standards Institute Modula-2 Standardisation effort is
working on changing the language definition to open up sets to at least
include enough memebers to define SET OF CHAR.  Progress is being made at a
glacial pace on this . . .
-- 
Sam'l Bassett, Writer/Editor/Consultant -- ideas & opinions mine!
34 Oakland Ave., San Anselmo  CA  94960;  (415) 454-7282
UUCP:  {...known world...}!hplabs OR ptsfa OR lll-crg!well!samlb;
Compuserve:  71735,1776;  WU Easylink ESL 6284-3034;  MCI SBassett

hal@newton.physics.purdue.edu (Hal Chambers) (09/15/87)

In article <8709142056.AA10394@vulcan.dec.com> joel@DECWRL.DEC.COM writes:
>...
>Modula-2's set constructors are typed, and therefore the size of the set is
>always known at compile time.
>
>Many implementations nonetheless implement sets as the word size of the
>machine, because then it's easy to generate single machine instructions.  It
>...
>- Joel McCormack {ihnp4 decvax ucbvax allegra}!decwrl!joel
>		 joel@decwrl.dec.com

I was totally shocked to see how Wirth handled BITSET in the MODULA-2
definition.  He makes the effort in the language design to require
system dependent modules to identify themselves by containing imports
from SYSTEM.  Then, instead of BITSET being defined in SYSTEM we find
that it is a STANDARD type and its import is PERVASIVE!!??
Furthermore, a set with unspecified type has type BITSET.

A set is a perfectly abstract concept that has nothing to do with
machine structures.  Yes, it has to be implemented on a machine, but
it you go to the trouble to provide it at all it shouldn't have
artificial restrictions.  Also, if there is going to be a default
set type I think it makes much more sense to make it
	SET OF CHAR;

I am not condemning the language though.  I like MODULA-2 very much.
I even like case dependent identifiers (although, I'm not thrilled
with UPPERCASE KEYWORDS).  I always use the compiler option that
enforces use of the standard language.

Cheers,
Hal Chambers

news@santra.UUCP (news) (09/18/87)

>It
>is not hard  to implement sets of effectively unbounded size; further, it is
>usually more efficient, and certainly more readable and useful to have large
>sets than to simulate this with the corresponding user code.

Right! In addition to that you cannot use operators like +, -, /, *,
IN and { ... } on larger sets implemented as user data types, they are
*nonportable* and *inefficient*. Built-in Modula-2 sets are
syntactically much better than Pascal's sets. It is unfortunate that
they are so poorly implemented.

As existing Modula-2 compilers usually don't allow large sets, you
cannot even write a (portable) compiler that uses the method of
passing sets of stopping symbols as parameters to the parsing
procedures. This method is used for error recovery in many Pascal
compilers (written in Pascal).

And how about being able to say: VAR screen[8000H]: SET OF [0..768*512-1] ?

The potential applications of large sets are numerous.

>	The British Standards Institute Modula-2 Standardisation effort is
>working on changing the language definition to open up sets to at least
>include enough memebers to define SET OF CHAR.  Progress is being made at a
>glacial pace on this . . .

Sarcastically you could imagine the following reasoning behind this
decision:
"These one-word sets, aren't they a little too restricting? Let's
represent sets as the bits of an *array* of words."
"Yes, but let's not generalize too much: surely there must be some upper
limit on the size of this array. Shall we say *four* 32-bit words?"

The point is: because the size of the set is always known at compile
time, the size of this array can also be determined at compile time.
If you only use small sets that fit in one word, then you don't even
need an array: *no* runtime overhead.  Even indexing the array can be
efficient, on a 32-bit machine, divide the number of the bit to be
tested/included/excluded/etc. by 32, that is, shift the bits five
places to the right. And if you are accessing a constant member of the
set then you don't even have to do that, the compiler can figure out
the address of the word in which to look for the appropriate bit.

-----------------------------------------------------------------------
Johan Myreen			UUCP:     ...!mcvax!hutcs!jem
Bistervagen 6 B 416             Bitnet:   jem%hutcs.uucp@fingate.bitnet
SF-02150 Esbo, Finland          Internet: jem@hutcs.hut.fi

news@santra.UUCP (news) (09/18/87)

>It
>is not hard  to implement sets of effectively unbounded size; further, it is
>usually more efficient, and certainly more readable and useful to have large
>sets than to simulate this with the corresponding user code.

Right! In addition to that you cannot use operators like +, -, /, *,
IN and { ... } on larger sets implemented as user data types, they are
*nonportable* and *inefficient*. Built-in Modula-2 sets are
syntactically much better than Pascal's sets. It is unfortunate that
they are so poorly implemented.

As existing Modula-2 compilers usually don't allow large sets, you
cannot even write a (portable) compiler that uses the method of
passing sets of stopping symbols as parameters to the parsing
procedures. This method is used for error recovery in many Pascal
compilers (written in Pascal).

And how about being able to say: VAR screen[8000H]: SET OF [0..768*512-1] ?

The potential applications of large sets are numerous.

>       The British Standards Institute Modula-2 Standardisation effort is
>working on changing the language definition to open up sets to at least
>include enough memebers to define SET OF CHAR.  Progress is being made at a
>glacial pace on this . . .

Sarcastically you could imagine the following reasoning behind this
decision:
"These one-word sets, aren't they a little too restricting? Let's
represent sets as the bits of an *array* of words."
"Yes, but let's not generalize too much: surely there must be some upper
limit on the size of this array. Shall we say *four* 32-bit words?"

The point is: because the size of the set is always known at compile
time, the size of this array can also be determined at compile time.
If you only use small sets that fit in one word, then you don't even
need an array: *no* runtime overhead.  Even indexing the array can be
efficient, on a 32-bit machine, divide the number of the bit to be
tested/included/excluded/etc. by 32, that is, shift the bits five
places to the right. And if you are accessing a constant member of the
set then you don't even have to do that, the compiler can figure out
the address of the word in which to look for the appropriate bit.

-----------------------------------------------------------------------
Johan Myreen                    UUCP:     ...!mcvax!hutcs!jem
Bistervagen 6 B 416             Bitnet:   jem%hutcs.uucp@fingate.bitnet
SF-02150 Esbo, Finland          Internet: jem@hutcs.hut.fi

konijn@ace.nl (Erik van Konijnenburg) (09/22/87)

In article <8709192023.AA27593@jade.berkeley.edu>, jem@hutcs.hut.fi writes:
> As existing Modula-2 compilers usually don't allow large sets, you
> cannot even write a (portable) compiler that uses the method of
> passing sets of stopping symbols as parameters to the parsing
> procedures. This method is used for error recovery in many Pascal
> compilers (written in Pascal).
> 
> And how about being able to say: VAR screen[8000H]: SET OF [0..768*512-1] ?
> 
> The potential applications of large sets are numerous.

Yep.  But, as you say, your code won't be very portable, since
now you not only depend on the accepted size of the set, but
also on its implementation (that it is implemented as a bitmap,
and assumptions about which set element is which pixel).

Our VAX and mc68020 compilers allow 524288 elements in a set
value, the amount that can be moved by an efficient copy loop,
so your screen would fit with room to spare, but you are still
being forced to accept an arbitrary limit.

The only inherent limit to the size of a Modula-2 set is that
the base type must be representable.  Because the set size is
known at compile time, this should not be a performance issue
except for _very_ large sets as complex code will only need to
be generated for those specific cases.

Part of the task of a standardisation committee is to define a
standard that can be accepted by a majority of users (that is,
implementors as well as programmers).  If this involves the
inclusion of minimal requirements for values of data types,
these really must be _minimal_.  Current Modula-2 practice is
based on Wirth's "Programming in Modula-2" which states that
the maximal set size is a _small_ constant determined by the
implementation, usually the computer's word size, or a small
multiple thereof (ed. 2 page 147, ed. 3 page 150).

In view of this, and assuming that it should be specified in
the standard at all, a minimum set size of 128 or 256 might
be a reasonable requirement (though _any_ value will lead to
endless discussion). These values are still arbitrary, but
arguably make _some_ sense because of current character set
sizes (though what about Japanese?), and can be implemented
on any(?) machine.

Requiring an implementation to support a set size determined
by the value of MAX(INTEGER), as was suggested, can hardly be
called minimal.

Defining the minimum in terms of CHAR, as was also suggested,
would probably not be a good choice either, as CHAR itself is
implementation dependent.  If a minimum set size is indeed
felt to be necessary perhaps it should be something along the
lines of "A minimal conforming implementation will support
set values of at least (some number, e.g. 256) elements,
however, it is recommended that such an implementation
support SET OF CHAR".

Compilers that are intended for serious work will of course
have to set their own limits (and given the growth in screen
sizes, we may have to increase ours pretty soon :-).

By the way, what about the minimum size of ARRAY types ...
(don't answer that!)

---------
Erik van Konijnenburg,	<konijn@ace.nl>, ...!mcvax!ace!konijn,
ACE Associated Computer Experts bv, Amsterdam, the Netherlands

The article really stops here.

You had the signature already.

Don't forget about the 'n' key.

F.y.i. this is a brief summary of an article that appeared in
comp.newprod.

Subject: announcing the ACE Modula-2 compiler

  - strict adherence to the language, the only extension is the inter-
    face to C-compatible subroutines such as from the Unix system-
    call library.

  - accepts edition 2 and edition 3, including NEWPROCESS, TRANSFER
    and IOTRANSFER.

  - high reliability, through a proprietary validation suite as well
    as through the BSI validation suite for which ACE is a beta-test
    site.
    
  - easy reconfiguration, allowing most runtime support routines to
    be redeclared.

  - good run-time checking: falling out of functions, NIL pointer
    dereferencing, missing defaults in case statements, range errors,
    uninitialised variables etc., are all checked.

  - amongst others, a Unix interface module and Wirth's I/O modules
    are available as Modula-2 source.

  - comprehensive documentation including all implementation defined
    decisions and how to interface to Unix and standalone systems.

  - a global optimiser that performs constant folding, expression
    reduction, size-conversion elimination and common sub-expression
    and loop-invariant code detection and other optimisations.

  - a register allocation optimiser

  - a peep-hole optimiser

  - IEEE floating point

  - Motorola 68020 code generation

  - Motorola 68881 support

  - generation of COFF or a.out images

  - mixing of code from different languages

  - available native on 68K and VAX Bsd and cross for 68K on
    VAX Bsd, VAX VMS and Gould