johnl@ee.brunel.ac.uk (John Lancaster) (12/02/86)
Will Modula-2 be Successful? NO!
I have been using Modula-2 for some time and watching the efforts of the
BSI Modula-2 Working Group [1] to standardize the language. Throughout
this period I have often asked myself:
'Will Modula-2 be efficient/versatile enough to be a winner in
the general programming language market place?'
In the form proposed by Wirth I feel the answer is no. The BSI Working
Group has done much to improve the situation and I would like to thank
its members for their unpaid efforts. They have formalised the language
definition and introduced extensions/changes to it where necessary:
multi-dimension open arrays [2] and co-routines [3] are two examples.
However, there are still deficiencies. Below I highlight a number of
problems and propose solutions to them.
PROBLEM 1. Structured Constants
At present structured constants are implemented by declaring a global
variable which is initialized at runtime by code in the body of each
module.
There are two disadvantages to this approach:
1. The "constant" is not safe because it is really a variable, hence the
compiler cannot protect it from unintentional change.
2. For those structures whose value cannot be derived by simple
computation, constants are duplicated in the data and code areas of
the program. This can be a high overhead in memory-sensitive ROM
based systems.
SOLUTION 1. Allow for the declaration of structured constants (CONST) as
provided in the Turbo Pascal [4] dialect of Pascal.
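
For comparison, here is a minimal sketch of the same idea in C, where an initialised `const` array plays the role of a structured constant. The names `days_in_month` and `days_in_year` are illustrative and not part of the proposal, and whether the table actually ends up in ROM depends on the toolchain.

```c
#include <stddef.h>

/* Hypothetical lookup table: days in each month of a non-leap year.
   'const' lets the compiler reject assignments to the table, and a
   typical linker can place it in read-only storage. */
static const unsigned char days_in_month[12] = {
    31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* Sum the table. No run-time initialisation code is needed, so the
   values are not duplicated in both the code and data areas. */
unsigned days_in_year(void)
{
    unsigned total = 0;
    size_t i;
    for (i = 0; i < sizeof days_in_month / sizeof days_in_month[0]; i++)
        total += days_in_month[i];
    return total;
}
```

An attempted assignment such as `days_in_month[1] = 29;` is rejected at compile time, which is the protection that disadvantage 1 above says is missing.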
PROBLEM 2. Parameter types.
Consider the following two procedures:
PROCEDURE MatrixOp (Op1, Op2 : ARRAY OF WORD;
VAR Result : ARRAY OF WORD)
PROCEDURE Length (String : ARRAY OF CHAR) : CARDINAL
These procedures have their input parameters passed by value. Although
this is a safe method, the time spent creating a local copy of input
parameters can have a severe effect on execution speed.
Many programmers consider the above inefficient and work around it by the
unsafe practice of declaring structured input parameters as VARs.
Unfortunately in the case of the 'Length' procedure this causes another
problem, namely the following is not legal:
Size := Length('Literal String')
SOLUTION 2. Add to the language a new formal parameter PVAR (protected
variable). The implementation of a PVAR parameter is identical to that
of a VAR parameter except that compile-time checks protect it from
modification within the procedure by not allowing assignment to it or its
use as a VAR parameter.
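
The closest existing analogue I know of is a pointer-to-`const` parameter in C. The sketch below is only an illustration of the PVAR idea, not a proposed Modula-2 implementation; the function name `length` is chosen to mirror the 'Length' procedure above.

```c
#include <stddef.h>

/* Rough analogue of Length(PVAR String : ARRAY OF CHAR): the string is
   passed by address, so no local copy is made, and 'const' makes the
   compiler reject any assignment through s inside the function. */
size_t length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Because the parameter is read-only, a call with a literal such as `length("Literal String")` is legal, which is exactly the case that fails when the parameter is declared as a plain VAR.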
As an aside I would also like to be able to code
Result := MatrixOp (Op1, Op2)
where the function is supplied with a pointer to Result and operates
directly on it rather than creating an internal result which is passed
out and assigned to Result. Although the syntax of Pascal could be
modified to accommodate it, it appears that Modula-2's syntax cannot.
Does anyone have any ideas?
PROBLEM 3. User exception handling.
There is no provision in Modula-2 for the programmer to implement
exception handling. This is primarily due to the absence of an
equivalent to the Pascal construct "GOTO Label_InOuterBlock". The BSI
proposed IO library [5] works around this shortcoming by predicate
pretesting, i.e. testing if an operation can be performed before trying
to do it. Although such an approach has merit it is not universally
applicable.
SOLUTION 3. Allow for user-written exception handlers as in Ada.
Borland [6] has proposed a possible Modula-2 implementation.
PROBLEM 4. BITSET size and syntax.
The data type BITSET provides a simple mechanism for bit addressing
(thereby allowing the crippled data type SET to be replaced by something
more useful) and performing logical operations on word wide variables.
The deficiencies of BITSET become apparent on a machine with a variable
word size addressing architecture like the 8086 & 68000 microprocessor
families. Direct bit-twiddling of the hardware registers on such
machines requires the language to support more than one size of BITSET.
For the 68000 family BITSETs of 8, 16 & 32 elements (bits wide) are
required.
SOLUTION 4. Introduce a new type construction, which needs to be
imported from SYSTEM, with syntax of the form:
RegBitMap = BITSET OF [TxOn,RxOn,NIL,NIL,Reset,NIL,NIL,Error]
The size of the memory 'word' being addressed is given by the number of
elements in the set. Allowing the elements of the set to belong to an
enumerated type in addition to a CARDINAL sub-range brings to bit
addressing the same benefits it gives to variable (word) addressing. The
reserved word NIL is used as a padding (spacing) element, but also
indicates to the compiler those bits which should not be accessed.
PROBLEM 5. Word subfields.
Consider a hardware device with the following bit allocation
---------------------------------
|. . . .|* * * *|* * . .|. . . .|
| | | | | | | | | | | | | | | | |
---------------------------------
where '.' are 1-bit flags
and '*' is a 6-bit integer
To be efficient a low-level module which accesses the integer subfield
should generate in-line code for bit shifting and sign extension.
SOLUTION 5. Require SYSTEM to export Shift and SignExtend functions
which operate on all word sizes.
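
As a sketch of the in-line code such functions would need to produce, here is the extraction written out in C. It assumes the diagram is read with bit 15 on the left, which puts the 6-bit integer in bits 11..6; the names `FIELD_SHIFT`, `FIELD_BITS`, `field_raw` and `field_value` are hypothetical.

```c
#include <stdint.h>

/* Assumed layout: bits 15..12 flags, bits 11..6 the signed 6-bit
   integer, bits 5..0 flags. */
enum { FIELD_SHIFT = 6, FIELD_BITS = 6 };

/* Shift the field down to bit 0 and mask off the neighbouring flags. */
static unsigned field_raw(uint16_t reg)
{
    return ((unsigned)reg >> FIELD_SHIFT) & ((1u << FIELD_BITS) - 1u);
}

/* Sign-extend the 6-bit value: flip the sign bit, then subtract its
   weight, so 0x3F becomes -1 and 0x20 becomes -32. */
int field_value(uint16_t reg)
{
    const int sign = 1 << (FIELD_BITS - 1);   /* 32 for a 6-bit field */
    return (int)(field_raw(reg) ^ (unsigned)sign) - sign;
}
```

The flag bits on either side of the field do not affect the result, so `field_value` can be applied to the raw register contents.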
PROBLEM 6. Low-level escape path.
When SYSTEM does not export a suitable low-level facility the programmer
must resort to calling an assembly language module. Such a solution is
unattractive because:
1. Calling a module to execute one or two machine instructions is
inefficient.
2. The special link format used by many of the currently available
Modula-2 compilers cannot (easily) be linked with non-Modula-2 object
files.
SOLUTION 6. SYSTEM must export a "code insert" facility.
I have presented above a number of extensions to the currently proposed
BSI Modula-2 standard. With the exception of the BITSET proposal their
adoption would not invalidate any existing code. I feel they are
justified either by the inability of the proposed language to implement
these features with (library) modules, or by the resulting increase in
program reliability and efficiency they offer.
Although I welcome comment on my proposals I feel that the interests of
the community are best served by submissions to the BSI Modula-2 Working
Group and encourage readers to do so.
REFERENCES:
1. The Modula-2 Working Group of the British Standards Institution can
be contacted via Barry Cornelius, Department of Computer Science,
University of Durham, Durham, DH1 3LE, United Kingdom.
Barry_Cornelius%mts.durham.ac.uk@UCL-CS.ARPA
Barry_Cornelius@uk.ac.durham.mts
bjc@uk.ac.nott.cs
2. "BSI Accepted Change: Multi-dimensional open arrays", Willy Steiger,
"MODUS Quarterly" Issue 5, pp. 8-9.
3. "Coroutines and Processes", Roger Henry, "BSI Modula-2 Working Group,
Second Open Meeting", July 24th 1986.
4. Turbo Pascal is a registered trademark of Borland International Inc.
5. "Draft BSI Standard I/O Library for Modula-2", Susan Eisenbach,
"MODUS Quarterly" Issue 5, pp. 15-18.
6. "Proposal for a standard library and an Extension to Modula-2",
Odersky, Sollich & Weisert of Borland International, "MODUS
Quarterly" Issue 4, pp. 13-25.
John Lancaster,
London, UK
johnl@uk.ac.brunel.ee
johnl%ee.brunel.ac.uk@ucl-cs.arpa
bills@cca.UUCP (Bill Stackhouse) (12/06/86)
A comment about problem #4 which had to do with a 16 bit word which has
a 6 bit integer between a bunch of 1 bit fields. Why not solve it with
the Pascal approach of a packed structured record?

    x = packed record of
        b1, b2, b3, b4 : boolean;
        i : 0..63;
        b5, b6, b7, b8, b9, b10 : boolean;
    end;

The syntax is loose but the key is that a compiler can detect the
intent that everything should fit into one word. You might want a
shift function but not to solve the given example.

Something I would like to see in all procedure based languages
is some syntax in the procedure def. that indicates that the
procedure is to be included inline at all places it is called.
That would allow the abstraction to still occur but would do
away with the overhead of calling and returning just for a few
lines of code.
--
Bill Stackhouse
Cambridge, MA.
bills@cca.cca.com
marty@ism780c.UUCP (Marty Smith) (12/08/86)
In article <11458@cca.UUCP> bills@cca.UUCP (Bill Stackhouse) writes:
>
>Something I would like to see in all procedure based languages
>is some syntax in the procedure def. that indicates that the
>procedure is to be included inline at all places it is called.
>That would allow the abstraction to still occur but would do
>away with the overhead of calling and returning just for a few
>lines of code.

Good idea.

Marty Smith
dan@prairie.UUCP (Daniel M. Frank) (12/09/86)
In article <11458@cca.UUCP> bills@cca.UUCP (Bill Stackhouse) writes:
>
>Something I would like to see in all procedure based languages
>is some syntax in the procedure def. that indicates that the
>procedure is to be included inline at all places it is called.

This is implemented in C++, and in many Ada compilers (it's an
optional pragma).
--
Dan Frank
uucp: ... uwvax!prairie!dan
arpa: dan%caseus@spool.wisc.edu
firth@sei.cmu.edu (Robert Firth) (12/09/86)
In article <4814@ism780c.UUCP> marty@ism780c.UUCP (Marty Smith) writes:
>In article <11458@cca.UUCP> bills@cca.UUCP (Bill Stackhouse) writes:
>>
>>Something I would like to see in all procedure based languages
>>is some syntax in the procedure def. that indicates that the
>>procedure is to be included inline at all places it is called.
nagler@seismo.CSS.GOV@olsen.UUCP (Robert Nagler) (12/10/86)
In response to the comments about ``inline'' procedures in M2 (and other
procedure languages), I do not see why the programmer should provide hints
to the compiler or linker as to the ``correct'' way to optimize a particular
program. Given that most linkers nowadays are static, I see no reason why
a linker couldn't decide to in-line a particular procedure given enough
information by the compiler (like the size of the procedure). After
programming in C a few years, it was particularly annoying to me to have
to figure out which variables were supposed to go in a register and which
were not. I also always wondered why I couldn't make global variables
register variables, but that is beside the point.

Clearly, I am making a common argument: let the compiler do it, it is smarter
than you are.

Volition's M2 compiler for the P-system provides a method for doing
in-line code in the definition module. It always seemed silly to me to
provide such an interface. The argument was that it was a cheap way
towards improving the speed of M2 code. I say that it is a good way to
decrease the maintainability of a program or library. The same goes for
in-line procedures which are not in the definition module. Specifically,
some machines have expensive procedure calling mechanisms and others have
cheap ones (VAX vs. RISC). It is certainly reasonable that if you in-line
something on a VAX it may go faster while the same ``optimization'' may
reduce the speed of the same code running on a RISC. The reasons for this
are complicated, but it boils down to how much is in the cache and how
fast a procedure call is.

M2 has plenty of silly programmer-directed optimization techniques and
some of them yield incorrect results when used without special checks
(e.g. the CAP function). We should work on getting a good optimizing
compiler for the language that exists instead of trying to add features
that make the language even more complicated.
Rob Nagler Olsen & Associates seismo!mcvax!cernvax!unizh!olsen!nagler
stuart@bms-at.UUCP (Stuart D. Gathman) (12/10/86)
In article <11458@cca.UUCP>, bills@cca.UUCP (Bill Stackhouse) writes:
> x = packed record of
>     b1, b2, b3, b4 : boolean;
>     i : 0..63;
>     b5, b6, b7, b8, b9, b10 : boolean;
> end;

This has my vote. Even bit arrays fit this idea (packed array of . . .)

> Something I would like to see in all procedure based languages
> is some syntax in the procedure def. that indicates that the
> procedure is to be included inline at all places it is called.

Many global optimizers do this automatically. Procedures less than
a certain size or called a small number of times are automatically
converted to inline. This optimization can even be performed on
the binary! (I.e. language independent)

Adding an 'inline' key word allows good code from simpler compilers, however.
--
Stuart D. Gathman <..!seismo!dgis!bms-at!stuart>
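
For readers unfamiliar with such a keyword, the sketch below shows how it later appeared in C (C99 `inline`). It illustrates the kind of hint under discussion and is not a Modula-2 proposal; the helper names `min_int`, `max_int` and `clamp` are hypothetical.

```c
/* 'static inline' asks the compiler to expand these helpers at each
   call site. It is a hint only: the compiler may still emit a call. */
static inline int min_int(int a, int b)
{
    return a < b ? a : b;
}

static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

/* The abstraction is kept in the source, while the call overhead for
   the two small helpers can disappear in the generated code. */
int clamp(int x, int lo, int hi)
{
    return max_int(lo, min_int(x, hi));
}
```

Whether the expansion actually happens is left to the compiler. That matches the point above: the keyword mainly helps simpler compilers.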
firth@sei.cmu.edu.UUCP (12/10/86)
In article <307@bms-at.UUCP> stuart@bms-at.UUCP (Stuart D. Gathman) writes:
>Adding an 'inline' key word allows good code from simpler compilers, however.
>--
>Stuart D. Gathman <..!seismo!dgis!bms-at!stuart>

Let me agree at once with Stuart and others that a "good" compiler
should automatically expand procedures inline where appropriate.

However, I think some pragma or hint is needed to tell most
compilers where this is indeed appropriate. Suppose, for instance,
that a DEFINITION module contains a procedure definition, and the
IMPLEMENTATION module contains the body. If the procedure is
compiled as true out-of-line code, then you can replace the
implementation (body) without recompiling dependents. If however
it is made inline, then generally you can't - all the dependents
reference the body and therefore will change. There is a tradeoff
between execution efficiency and ease of change, and in general the
compiler can't make that tradeoff, because it doesn't know whether
you are in test-&-debug mode or whether you are in performance
assessment mode, or even building the final load image.

At the highest level, therefore, you need a compiler option that
says "yes, go for inline". At a finer level of control, you might
want to say of any one procedure:

    never inline (rarely called, huge body, or whatever)
    always inline (critical code, body won't change, ...)
    sometimes inline (under option control)

But here I'd state my personal preference for more intelligent
compilers. In general, the user should give the compiler any
information it CANNOT find out for itself, and no other
information.
bobc@tikal.UUCP (12/11/86)
There are several articles based on John Lancaster's (johnl@uk.ac.brunel.ee)
article entitled "Modula-2 standard". I will not include any of the
original text to reduce net traffic. John lists six problems he sees
in Modula-2 as it is currently defined. I feel that only one of these
problems requires a "Language Change", and that most of them are
requests for what I feel should be "Compiler Options" or a larger
standard "SYSTEM" module. I don't feel that any of these problems
will make the language unsuccessful, as even without these features
I find that the language is better than either "C" or "Pascal", and is
still much smaller than Ada.
In all cases I consider Compiler Options to be embedded in comment
blocks such that the code can be ported to a compiler which does not
support the features without rewriting large parts of the code.
Problem 1. structured constants
This would require a modification to the syntax of the language; if it
is really greatly needed then perhaps it should be added to the
language. If the BSI were really on their toes they might write an
optional standard which could define the way that compilers which
support this feature should do so. (And maybe John Lancaster should
write it up and submit it to them.) An optional compiler feature which
might solve this problem is a method for the compiler (of definition
modules) to indicate read-only variables. This would allow a module to
set up the "constant" variables and disallow other modules from
modifying them; it would also mean that they can't be passed by
address except as may be required in item 2.
Problem 2 PVAR
This is a code generation problem in which the programmer wishes to
suggest to the compiler that the variable should be passed by value and
will not be changed by the module. At times I have felt that the compiler
should not allow the programmer to modify the values of "passed by
value" parameters. I feel that the correct solution is an optional
compiler switch which can be used to indicate that the programmer wishes
to have the compiler pass strings/structures by address and enforce the
rule of not allowing the "value" parameters to be modified. Note that
this "pass by address will not change" type of definition will have to
be in the definition modules, but could be done with a compiler
switch.
Problem 3. User exception handling.
I don't agree that any more support for this is needed than is already
available. I feel that a "setjmp"/"longjmp" and a signal catching
routine can be created that will be every bit as good as the ones used
by "C". I also know that Wirth's implementations supply all the low
level features required to do this (even if it does require writing a
small routine in machine language). I have an almost complete version
of coroutines for MacMETH (Macintosh compiler from ETH) which proves
that low level things like this can be done if the correct "SYSTEM"
module is provided by the compiler.
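
For reference, the C mechanism mentioned here can be sketched as follows. This shows only the standard C `setjmp`/`longjmp` pattern, not a Modula-2 implementation; the single global `handler` slot and the names `raise_error`, `checked_div` and `safe_div` are hypothetical.

```c
#include <setjmp.h>

static jmp_buf handler;   /* hypothetical single, global handler slot */

/* "Raise": unwind straight back to the most recent setjmp, which then
   appears to return the given nonzero code. */
static void raise_error(int code)
{
    longjmp(handler, code);
}

/* An operation that reports failure by raising, not by a status code. */
static int checked_div(int a, int b)
{
    if (b == 0)
        raise_error(1);
    return a / b;
}

/* Returns the quotient, or -1 if the handler was entered. */
int safe_div(int a, int b)
{
    if (setjmp(handler) != 0)
        return -1;               /* exception path */
    return checked_div(a, b);    /* normal path, no status checks */
}
```

A real library would need a stack of handlers, not one global slot. It would also need a way to tell an error from a genuine result of -1. Both are left out to keep the sketch short.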
Problem 4. BITSET size and syntax.
I don't see where this does any good. What would appear to be needed
is a compiler which defines how a set is to be represented in memory.
Then these sets can be used for hardware device registers. I also
believe that the NILs are not needed, as the programmer can create names
like "PAD1", "PAD2" ... This can be done without any changes to the
language, as it is only a compiler implementation issue.
Problem 4. PACKING OF RECORDS
I feel that some form of compiler option could be created to allow the
programmer to specify that a RECORD is to be packed into the smallest
space possible. I also feel that it should be done in a way that
is portable to the current compilers. This is why I prefer options of
the form (*$PACK*), which would allow me to quickly detect routines
which make use of special compiler features, and at the same time
allow compilers which do not support the feature to still compile the
program (with small adjustments). This feature is not required; it
is just an extra feature that allows the compiler to do some of the
decoding for you (given that many processors now support bit fields it
would be easier for the compiler to do it, but this is not true for
things like PDP-11s, 68000s etc).
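
As a point of comparison, C bit-fields give roughly this kind of packing. The sketch below mirrors the packed record suggested earlier in the thread (b1..b4, a 6-bit i, b5..b10); the names `device_reg` and `with_field` are hypothetical, and the bit order and padding inside the storage unit are implementation-defined in C, so this alone does not pin down a hardware register layout.

```c
/* Hypothetical device register: four 1-bit flags, a 6-bit unsigned
   field, then six more 1-bit flags (16 bits of payload in total). */
struct device_reg {
    unsigned b1 : 1, b2 : 1, b3 : 1, b4 : 1;
    unsigned i  : 6;
    unsigned b5 : 1, b6 : 1, b7 : 1, b8 : 1, b9 : 1, b10 : 1;
};

/* Store a value in the 6-bit field. The compiler generates the shift
   and mask code; the explicit mask keeps the value within 0..63. */
struct device_reg with_field(struct device_reg r, unsigned value)
{
    r.i = value & 0x3Fu;
    return r;
}
```

Neighbouring flags are left untouched by the assignment to `i`, which is the decoding work the compiler takes over from the programmer.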
Problem 5. SHIFT, and SignExtend.
I agree that these functions should be included in the standard SYSTEM
module (and in fact the compilers that I have used have supported them
in some form: either ASH or SHIFT, and the function LONG to do sign
extension of a short).
Problem 6. Low-level escape path.
I feel that this is a function of finding a good compiler. First, the
MacMETH compiler (also the generic 68K compiler, and I suppose the
Lilith and 32032 versions also) supports inline machine language code,
as well as allowing the definition of one-word code procedures which
get expanded in line to support Operating System Calls via TRAPs. In
any case the low-level escape paths should be considered to be machine
specific, but it would be nice if there was a standard syntax for
inserting code inline. (The current MacMETH compiler supports the pseudo
function INLINE to do this.)
I also feel that the current implementations do not support linking to
non-Modula-2 object files because of external restrictions on identifier
lengths on some systems. I currently have a compiler which runs on a
Macintosh, and can link to assembler code, and there are only a few
small changes left to allow it to link to Pascal routines. Also the
code from this compiler can be called from either assembler, "C", or
"Pascal", the only problem being that the init routines have to be
called by the program (if for any reason someone would want to do
this).
Bob Campbell
Teltone Corporation 18520 - 66th AVE NE
P.O. Box 657 Seattle, WA 98155
Kirkland, WA 98033
{amc,dataio,fluke,hplsla,sunup,uw-beaver}!tikal!bobc
marty@ism780c.UUCP (Marty Smith) (12/12/86)
In article <8612091923.AA23422@olsen.uucp> nagler@seismo.CSS.GOV@olsen.UUCP (Robert Nagler) writes:
>In response to the comments about ``inline'' procedures in M2 (and other
>procedure languages), I do not see why the programmer should provide hints
>to the compiler or linker as to the ``correct'' way to optimize a particular
>program.
> [...]
>Clearly, I am making a common argument: let the compiler do it, it is smarter
>than you are.

I know of no compiler that is smarter than I am. When I want a program
optimized for execution speed rather than memory usage, I want the
compiler to optimize for speed.

Marty Smith
ahe@k.cc.purdue.edu.UUCP (12/12/86)
In article <4814@ism780c.UUCP> marty@ism780c.UUCP (Marty Smith) writes:
>In article <11458@cca.UUCP> bills@cca.UUCP (Bill Stackhouse) writes:
>>
>>Something I would like to see in all procedure based languages
>>is some syntax in the procedure def. that indicates that the
                        ^^^^^^^^^
>>procedure is to be included inline at all places it is called.

In article <475@aw.sei.cmu.edu.sei.cmu.edu>, firth@sei.cmu.edu.UUCP writes:
> Let me agree at once with Stuart and others that a "good" compiler
> should automatically expand procedures inline where appropriate.
>
> However, I think some pragma or hint is needed to tell most
> compilers where this is indeed appropriate. Suppose, for instance,
> that a DEFINITION module contains a procedure definition, and the
> IMPLEMENTATION module contains the body. If the procedure is
> compiled as true out-of-line code, then you can replace the
> implementation (body) without recompiling dependents. [...]

The original article referenced *procedures* and not modules.
Furthermore, in the case of a module, the compiler has NO CHOICE but to
generate out-of-line code; the separateness must be maintained.

In article <475@aw.sei.cmu.edu.sei.cmu.edu>, firth@sei.cmu.edu.UUCP writes:
> But here I'd state my personal preference for more intelligent
> compilers. In general, the user should give the compiler any
> information it CANNOT find out for itself, and no other
> information.

The compiler already has all the information it needs. When one
specifies "module", the compiler can, should, and will take this
as a clear indication that the code is *not* to be expanded in-line.

Bill Wolfe
firth@sei.cmu.edu (Robert Firth) (12/15/86)
In article <1656@k.cc.purdue.edu> ahe@k.cc.purdue.edu (Bill Wolfe) writes:
>In article <4814@ism780c.UUCP> marty@ism780c.UUCP (Marty Smith) writes:
>>In article <11458@cca.UUCP> bills@cca.UUCP (Bill Stackhouse) writes:
>>>
>>>Something I would like to see in all procedure based languages
>>>is some syntax in the procedure def. that indicates that the
>                        ^^^^^^^^^
>>>procedure is to be included inline at all places it is called.
>
>In article <475@aw.sei.cmu.edu.sei.cmu.edu>, firth@sei.cmu.edu.UUCP writes:
>> Let me agree at once with Stuart and others that a "good" compiler
>> should automatically expand procedures inline where appropriate.
>>
>> However, I think some pragma or hint is needed to tell most
>> compilers where this is indeed appropriate. Suppose, for instance,
>> that a DEFINITION module contains a procedure definition, and the
>> IMPLEMENTATION module contains the body. If the procedure is
>> compiled as true out-of-line code, then you can replace the
>> implementation (body) without recompiling dependents. [...]
>
> The original article referenced *procedures* and not modules.
> Furthermore, in the case of a module, the compiler has NO CHOICE but to
> generate out-of-line code; the separateness must be maintained.
>

And I was talking about procedures defined in modules. In M2, ALL
procedures are defined in modules, so we are talking about the same
thing. The Modula-2 Report [third edition, Ch 14] requires only that
modules be separately compiled, it does not require that an
implementation module be REcompilable without visible effect. This is
mentioned in the user manual merely as one possible feature of a
"sophisticated" Modula compiler.

>In article <475@aw.sei.cmu.edu.sei.cmu.edu>, firth@sei.cmu.edu.UUCP writes:
>> But here I'd state my personal preference for more intelligent
>> compilers. In general, the user should give the compiler any
>> information it CANNOT find out for itself, and no other
>> information.
>
> The compiler already has all the information it needs. When one
> specifies "module", the compiler can, should, and will take this
> as a clear indication that the code is *not* to be expanded in-line.
>
> Bill Wolfe
>

Such a compiler would be wrong. It really does help to read the
language definition, you know.
nagler@seismo.CSS.GOV@olsen.UUCP (Robert Nagler) (12/16/86)
From Marty Smith:
When I want a program optimized for execution speed rather than memory
usage, I want the compiler to optimize for speed.
I agree with the above statement whole-heartedly. I think all compilers
should have a range of optimization: execution speed vs. space vs. compilation
time. The programmer could specify it on the command line. The compiler could
then trade off on the amount of loop unrolling, size of procedures to inline,
complexity of register coloring algorithms, etc.
From Doug Johnston:
The decision to include procedures inline may often be better done by
compilers but compilers often do not have all of the information necessary
to make good decisions.
Exactly what information does the compiler have that the programmer doesn't?
I think there are many things that the programmer shouldn't know that the
compiler does, e.g. knowing the cache size of the target is 128Kbytes (fully
associative), thus it might be a better idea to re-execute the 5 lines of code
from the cache than to re-fetch the instructions if the procedure is called
three times in a row.
...I know that programmers may not always make the best decisions but
it is not my job to control them.
Given two tasks, people always tend towards the easier one. Micro-optimizing
source code is *easy* for your average programmer, but designing quality
systems is hard (documented, maintainable, extensible, robust). I make this
statement after catching a lot of very fast bugs. As a manager, I want
the programmers I work with to design fast systems not optimize bad designs.
Sometimes, I think it would be better if we weren't even allowed to write
code and we would have to hand over a design to a data entry typist who knows
how to translate pseudo-code to Modula-2. Getting to the bits is very
important, but only in a very few cases (10% of the code does 90% of the work).
I don't know. Maybe I am wrong. If this discussion results in a Modula-2
standard for INLINE, I think we should also consider the following primitives:
LOCKDOWN - inserted before a memory declaration (procedure or
variable), the declared memory is ``locked down'' so
that swapping or paging will not occur.
REGISTER - before a variable declaration indicates that the variable
should be kept in a register (even globals). Before a procedure
parameter (in def mod of course) would specify that the parameters
should be passed in the registers.
SUBEXP[xxx] - before a parenthesized expression indicates to the
compiler that the following expression is identified as 'xxx'. If
another SUBEXP[xxx] appears, then the compiler should use the
previous value instead of generating the code for this new
expression.
REF - before a parameter declaration indicates to the compiler that
the parameter should be passed by reference, but to the caller
it should appear as a pass-by-value. The idea of course is that
the called procedure will probably not touch the variable. This
would replace the over-used VAR (see Strings.Length in Logitech's
library for a good use of this new primitive).
DONTLOOK - instead of these silly function-like coercions that always
seem to get in the way. This primitive would tell the compiler
to turn off all levels of type, range, etc-checking until a
matching END statement.
Did I leave out anything? If you would like to see a more complete list,
see the "C Puzzle Book".
Rob Nagler
[If my employer knew what I was writing, I think he would fire me for wasting
time. As far as opinions go, my employer has a lot of them.]
AlHall.osbunorth@XEROX.COM (01/06/87)
I'm interested in hearing from anyone who has given some serious thought
to implementing "User exception handling" (as per Bob Campbell's message
of 12/11/86) in Modula-2. Return codes really make me C-sick; I'd much
rather write less-than-portable code in Modula-2 than pollute my
normal-case code with a ridiculous number of IF statements [how would
you like your conscious mind cluttered with paranoid thoughts like,
"Have I been hit by a 747?" every time you cross the street?].

Implementation details would be nice, but I'm mostly interested in a
"sanity check" regarding my own thoughts on its ease of implementation
in Modula-2 as opposed to C.

Thanks,
Al