[comp.lang.modula2] TopSpeed 3.0 First Impressions

toma@sail.LABS.TEK.COM (Tom Almy) (06/19/91)

I just received my "free" upgrade, and used the half price coupon for
Top Speed C. Some first impressions:

1. The package is much bigger (even before installing C). More memory
   models have enlarged the libraries, and there are new extended-memory
   versions of ts and vid.
2. The environment is somewhat nicer, yet basically compatible with the
   old version.
3. The old .prj files don't work. There is now a .pr file that has 
   new, and even more confusing, syntax. The confusing array of options
   has now expanded into the project file.
4. A couple of my sample programs have ballooned in size by about 20%.
5. MGDEMO.MOD crashes my machine when run. In fact none of the graphics
   programs work. Their example C graphics program runs fine. I'm going
   to try and track this down, but it is a bad omen.
6. The WNDDEMO program comes in both Modula and C versions, and the C version
   is *much* faster. It is also about 10% smaller. Again the optimization
   bias is leaning even more toward C and away from Modula.
7. Documentation is much better.

I have a number of additional programs to compile, but it looks like this
version is a step backwards in code size and execution speed. And it takes
11 Megs of my disk!

I will try the C compiler on some "gut buster" code I've got here, and will
also see how it works with Microsoft Windows. I only wish they hadn't gone
the multi-language route. I'm afraid this is going to be another "the jack of
all trades is master of none" package.

Tom Almy
toma@sail.labs.tek.com
Standard Disclaimers Apply


USDGOG@VTVM1.BITNET (Greg Granger) (06/20/91)

On Tue, 18 Jun 91 21:11:21 GMT Tom Almy said:
>I just received my "free" upgrade, and used the half price coupon for
>Top Speed C. Some first impressions:
>
>1. The package is much bigger (even before installing C). More memory
>   models have enlarged the libraries, and there are new extended-memory
>   versions of ts and vid.
Sigh, doesn't surprise me, but I'll probably have to wait till I get a
new machine to install it. (I just ordered my M2 upgrade and a 1/2
price C++; I figured the C++ manuals alone should be worth that.)


>2. The environment is somewhat nicer, yet basically compatible with the
>   old version.
>3. The old .prj files don't work. There is now a .pr file that has
>   new, and even more confusing, syntax. The confusing array of options
>   has now expanded into the project file.
Gee, I can't wait <sarcastic grin>


>4. A couple of my sample programs have ballooned in size by about 20%.
>5. MGDEMO.MOD crashes my machine when run. In fact none of the graphics
>   programs work. Their example C graphics program runs fine. I'm going
>   to try and track this down, but it is a bad omen.
>6. The WNDDEMO program comes in both Modula and C versions, and the C version
>   is *much* faster. It is also about 10% smaller. Again the optimization
>   bias is leaning even more toward C and away from Modula.
Could you run a couple of simple tests (like time to do n printf's
against time to do n WrStr's)?  I'm interested in just how bad this
bias is.  It is my impression that JPI wrote the low level C calls and
is calling them via M2, so a call to WrStr invokes (at some level) a
call to printf.  This means that JPI M2 programmers have to pay for
the overhead of the transition calls plus the overhead of printf.
I noticed this first in their heap management stuff.


>7. Documentation is much better.
Gee, that shouldn't be too hard.  All the Greek text has been changed to
French <grin>.


>I have a number of additional programs to compile, but it looks like this
>version is a step backwards in code size and execution speed. And it takes
>11 Megs of my disk!
Is that just the M2 compiler or both the C and M2 compilers?


>I will try the C compiler on some "gut buster" code I've got here, and will
>also see how it works with Microsoft Windows. I only wish they hadn't gone
>the multi-language route. I'm afraid this is going to be another "the jack of
>all trades is master of none" package.
I feel that the multi-language route is fine, but they just have a
decidedly poor implementation (mainly involving 'language bleed' where
none is needed or wanted).

Greg

Greg Granger                         BITNET: USDGOG@VTVM1
Consultant, USD                    Internet: USDGOG@VTVM1.CC.VT.EDU
Computing Center, Va Tech

toma@sail.LABS.TEK.COM (Tom Almy) (06/20/91)

In article <INFO-M2%91062009222665@UCF1VM.BITNET> Modula2 List <INFO-M2%UCF1VM.BITNET@ucf1vm.cc.ucf.edu> writes:
>On Tue, 18 Jun 91 21:11:21 GMT Tom Almy said:
>>I just received my "free" upgrade, and used the half price coupon for
>>Top Speed C. Some first impressions:

>>6. The WNDDEMO program comes in both Modula and C versions, and the C version
>>   is *much* faster. It is also about 10% smaller. Again the optimization
>>   bias is leaning even more toward C and away from Modula.
>Could you run a couple of simple tests (like time to do n printf's
>against time to do n WrStr's)?  I'm interested in just how bad this
>bias is.  
Well, I thought it unfair to compare WrStr with printf, but WrStr with
fputs should be ok:

I compared the Top Speed Modula-2 with C. I wrote a program that wrote
10000 copies of "This is a test of writing speed" to the display. The result:

Language	Size	Speed
Modula-2	2854	21.1 sec
C		3116	20.6 sec

I then tried writing to a file. Since Modula-2 is unbuffered by default, while
C is buffered by default, I ran the test (now for 1000 copies) both ways:
unbuffered, and buffered with a buffer size of 1024.

Language	Size	Speed (sec)
Modula-2 Nobuf	5620	 1.64
Modula-2 Buf	6525	 6.81	(NOT a misprint!)
C Nobuf		4654	12.58
C Buffered	4750	 0.93


It looks like Modula-2 is really buffered somewhere, and that for some
reason explicitly specifying a buffer actually slows it down. At any rate,
TopSpeed C is a clear win over TopSpeed Modula-2. Incidentally, the C
Buffered code took 1.48 seconds to execute with Borland C, with a file
size of 6294, making the TopSpeed C product look very good.


Here are the sources for the buffered file versions:


MODULE test;
FROM SYSTEM IMPORT ADDRESS;  (* ADDRESS must be imported from SYSTEM *)
IMPORT FIO;
IMPORT Storage;

CONST BufferSize = 1024 + FIO.BufferOverhead;

VAR i:CARDINAL;
    dummy: FIO.File;
    buffer: ADDRESS;

BEGIN
    dummy := FIO.Create("test.out");
    Storage.ALLOCATE(buffer,BufferSize);
    FIO.AssignBuffer(dummy, buffer);
    FOR i := 0 TO 1000 DO
        FIO.WrStr(dummy,"This is a test of writing speed");
        FIO.WrLn(dummy);
    END;
    FIO.Close(dummy);        (* flush the buffer and release the handle *)
END test.

#include <stdio.h>

int main(void) {             /* "void main" is non-standard */
     int i;
     FILE *dummy = fopen("test.out","w");

     setvbuf(dummy, NULL, _IOFBF, 1024);

     for (i = 0; i <= 1000; i++) {
         fputs("This is a test of writing speed\n",dummy);
     }
     fclose(dummy);
     return 0;
}

>>I have a number of additional programs to compile, but it looks like this
>>version is a step backwards in code size and execution speed. And it takes
>>11 Megs of my disk!
>Is that just the M2 compiler or both the C and M2 compilers?

Both, but I couldn't install even the M2 compiler on my home system which
had only 2 meg free before erasing the old.

>>I will try the C compiler on some "gut buster" code I've got here, and will
>>also see how it works with Microsoft Windows. I only wish they hadn't gone
>>the multi-language route. I'm afraid this is going to be another "the jack of
>>all trades is master of none" package.

Well the results are in! The large C program (XLISP) compiled without a
hitch to an executable about 10% smaller than that of Borland, Zortech, or
Microsoft, but it was also about 10% slower than any of those. Compilation
speed was much better than Microsoft or Zortech, but much worse than Borland
(these with all optimizations on).

The Windows test was another story. The simple example program I had (About2
from the Petzold book) would not compile correctly when modified as per
the TopSpeed manual. I looked at the TopSpeed sample program and discovered
that it was set up slightly differently (in the DEF/EXP file and the project
file) than the book said. I changed the About2 accordingly and it compiled.
The executable was 2k smaller than Borland C. Unfortunately it did some
funny things with the system resources when I ran it. 

>I feel that the multi-language route is fine, but they just have a
>decidedly poor implementation (mainly involving 'language bleed' where
>none is needed or wanted).

Yes, if they had pulled it off, just having a common environment for different
languages, even if one never did mixed-language programming, would have
been great. But I just don't have the time to fool around with this when
I have other, already working compilers.

I'm going to check the crashing graphics code on another machine, and if
it fails give JPI a call. If this doesn't pan out, I'll remove the whole
mess from my system and just use version 1 -- it worked!

Tom Almy
toma@sail.labs.tek.com
Standard Disclaimers Apply

VOGT@EMBL.BITNET (Gerhardt Vogt) (06/21/91)

I have read the last two messages concerning TS 3.0 and I have a slightly
different opinion about it.
1. Size of the system: It's true, the size of the system has increased, but by
   less than mentioned in the postings. I now have 3 memory models installed
   for M2, C and C++, plus source for the M2 library, plus all multi-language
   and Windows stuff; the EXE files are packed with DIET, and I have 9.2 MB on
   my disk (2.2 MB of this for examples and library source).
2. Project files are more difficult and more powerful, but who cares. I played
   a bit and it's not that difficult. And normally it's not necessary to change
   anything in the default .pr files. If you like to play with it, it's a nice
   toy; if not, it does not bother you.
3. The size of programs did not change as far as I can see. I have some rather
   big programs (> 20 source files) and I did not notice a difference. You
   should switch off error checking before you compare. And even if the
   EXE files are bigger, the code need not be. My biggest
   program is a TSR, so I can easily check its real size when it's loaded in
   memory.
4. MGDemo works fine on my computer (386-33 with VGA). There was an error in
   an old version of the program: it did not switch into the proper mode, but
   this was a source problem, not one of the compiler. (I have not checked the
   new program and I don't remember exactly what it was.)
5. I cannot run the M2 and the C versions of the window demo because I have
   not installed the MThread library, but I can hardly believe that code
   generation should be extremely different between M2 and C. Maybe one program
   was tested with error checking and the other without.
   The idea that WrStr might call printf is somewhat strange. printf is much
   more powerful than all M2 I/O, and normally all formatted C I/O is quite slow
   because of the overhead of interpreting format strings.
   All library functions have either a Modula or an assembler implementation.
   I don't know why the writer of one of the mails thinks the heap management
   would call C subroutines; the implementation is in MSTORAGE.MOD and
   calls COREMEM.A to do the low-level work.

The system did not improve very much, but it's worth getting the upgrade. Multi-
language programming is a good thing and it's easier than in V2. I would have
liked some improvements in the environment and the debugger (it was better
than Borland's three years ago, but Borland has done a lot and JPI has not).

Anyway, I like it

Gerhard Vogt

EMBL
D-6900 Heidelberg
W Germany

USDGOG@VTVM1.BITNET (Greg Granger) (06/21/91)

On Thu, 20 Jun 91 18:21:00 +0100 Gerhardt Vogt said:
>...
>   The idea that WrStr might call printf is somewhat strange. printf is much
>   more powerful than all M2 I/O, and normally all formatted C I/O is quite slow
>   because of the overhead of interpreting format strings.
>   All library functions have either a Modula or an assembler implementation.
>   I don't know why the writer of one of the mails thinks the heap management
>   would call C subroutines; the implementation is in MSTORAGE.MOD and
>   calls COREMEM.A to do the low-level work.
>...
-------------------------------------------------
Sorry, I didn't mean to suggest that this was in V3.0 (I haven't
received my copy of V3.0 yet).  In version 2.0 you can trace the
heap calls to a low level routine named ?alloc (can't remember
the first letter).  It sure looked like C to me.  If JPI has
now 'fixed' this I'm glad, I'd much rather see the heap and IO
routines written in M2 (as in version 1.x) instead of M2 'wrappers'
for C routines (as in version 2.x).

Greg

USDGOG@VTVM1.BITNET (Greg Granger) (06/21/91)

On Thu, 20 Jun 91 16:11:20 GMT Tom Almy said:
>...
>I compared the Top Speed Modula-2 with C. I wrote a program that wrote
>10000 copies of "This is a test of writing speed" to the display. The result:
>...
Thanks, makes me feel a little better about JPI's M2 product.
BTW, I think printf/WrStr isn't an unfair comparison considering
the 'common' use for each.  Which reminds me of something in JPI's
Comm Toolbox manual.  The writer of the manual (clearly a die-hard
C programmer) couldn't understand why "the Modula-2 community
attached itself to an awkward set of I/O procedures, largely
ignoring the solutions for similar problems already found in C."
So this person wrote a kludgy Printf in M2.  Considering this
is the type of person JPI hires to write their toolboxes I guess
we are lucky they didn't 'improve' M2 by replacing all those
awkward IMPORTS/Def files with some nice %INCLUDE/.h files.

I guess this list has spoiled me, I just can't imagine that the
world is so short on M2 programmers that a compiler company can't
find one to support their flagship product  (ooopps, according to
their latest marketing noise, ONE of their flagship products, (BTW,
how many flagships can you have?))

Sigh, sorry I just kind'a "went off" ...

>...
>I'm going to check the crashing graphics code on another machine, and if
>it fails give JPI a call. If this doesn't pan out, I'll remove the whole
>mess from my system and just use version 1 -- it worked!
>...
Yes, I know what you mean, version 1.x still looks real good compared
to later versions.  Sometimes I wonder if JPI is advancing or ...
At least they make a nice YACC (!!! Yet Another C Compiler !!! :-)
(bet I woke some Unix hacks up with that last one ;-)

Greg

toma@sail.LABS.TEK.COM (Tom Almy) (06/21/91)

Regarding the crashing MGDEMO program:

1. It doesn't crash on my home system which has a Video Seven graphics card,
   but does on my work machine which has a Diamond Speedstar.
2. There appears to be a major rewrite of the graphics code since version
   2.

I'm going to check the graphics code carefully, possibly recompiling to use
VID. Somewhere in that code they must be doing some non-portable coding.

I'm still underwhelmed.

-- 
Tom Almy
toma@sail.labs.tek.com
Standard Disclaimers Apply

toma@sail.LABS.TEK.COM (Tom Almy) (06/22/91)

A few days ago I stated that my attempt to build a working Microsoft 
Windows program using TopSpeed 3.0 failed. It turns out I was missing
an additional line in the EXP file. 

I guess my mistake was to use the section on Windows programming in the
TechKit manual as the guide -- the extra line necessary was documented
in the "Windows Programming Supplement" pamphlet, which I had overlooked
among the eight glossy manuals.

The simple example program made a 4k executable. Borland C made a 9k
executable. Very impressive reduction in overhead!

It should be noted that the EXP file is *almost* the same as the DEF
file needed by the Microsoft, Borland, and Zortech linkers, but requires
two additional lines. Also the C source files are slightly different
and need some pragma statements not needed by the other C's.

I haven't tried it, but writing a Modula-2 Windows app looks easier than
using C (except for the wealth of existing C Windows examples...).

Now back to the graphics problem...


-- 
Tom Almy
toma@sail.labs.tek.com
Standard Disclaimers Apply

jordan@aero.org (Larry M. Jordan) (06/22/91)

I too received my upgrade (and added the TechKit) last night.  Someone
must have goofed, because the library source was thrown in as well.  Is
this the same as the "source kit"?  

I failed to make anything NOT WORK--GRDEMO, DEMO, and a compiler I'm 
building.  I even tried the windows demo--it too fails not to work.  I only
installed the Small and MThread models and have no idea how much
hard disk was required--I had 50Meg available.  That appeared to
be enough.

The mouse seems better supported by the environment than last time.  But,
I still must use the keyboard to resize windows--awkward.  The environment
is designed for the keyboard and the mouse is ancillary.  JPI makes no
pretense here.

The CLASS extension has evolved considerably--multiple inheritance and
aliasing (renaming of fields and methods) and "safe" downcasting (if ancestor
classes are "up") via type guards.  The class (instance) "initialization code"
looks just like the module initialization code.  I find the syntax and
semantics of this to be quite appealing.  

I had not used JPI Modula-2 for some time and had forgotten how proficient the
system is at reporting compilation errors and letting you step through them,
edit (and continue stepping without getting confused), and recompile.  Zortech
C++ v2.1 is pathetic by comparison.  TPW stops after the first error (or is
this settable?).  cfront says things like "you have an error on or near ...",
which I've always found humorous, but not extremely helpful.
  
The smart-linking capability is also great marketing hype (in addition to
being a truly valuable technology).  Shrinking virtual method
tables appears to be a gain.  Is the linker smart enough to convert virtual
methods to static methods if no descendant class overrides/redefines them?
I think Eiffel's application builder does this.

My only gripe so far: I've noticed many typographical errors in the
documentation.  I would have thought a quick review of galley proofs (or
whatever) would have caught most of these.  Such errors are especially
disturbing in coding examples.

All in all, I'm pleased.  There is a heck of a lot here.
I just wish the docs were on a par with the product.

--Larry

VOGT@EMBL.BITNET (Gerhardt Vogt) (06/24/91)

In a previous posting, Tom Almy tried to compare the speed of JPI's string
output to text files in M2 and C.

> Language        Size    Speed
> Modula-2 Nobuf  5620     1.64
> Modula-2 Buf    6525     6.81   (NOT a misprint!)
> C Nobuf         4654    12.58
> C Buffered      4750     0.93

The 6.81 seconds is not a misprint, but it is Tom's error.

> MODULE test;
> IMPORT FIO;
> IMPORT Storage;
>
> CONST BufferSize = 1024 + FIO.BufferOverhead;
>
> VAR i:CARDINAL;
>    dummy: FIO.File;
>    buffer: ADDRESS;
>
> BEGIN
>     dummy := FIO.Create("test.out");
>     Storage.ALLOCATE(buffer,BufferSize);
>     FIO.AssignBuffer(dummy, buffer);
>     FOR i := 0 TO 1000 DO
>         FIO.WrStr(dummy,"This is a test of writing speed");
>         FIO.WrLn(dummy);
>     END;
> END test.

AssignBuffer does not take the length of the buffer as an explicit
argument but uses the implicit size, which is 4 in the case of an ADDRESS
variable. Using a 4-byte buffer wouldn't be fast in any language. The
proper usage of AssignBuffer is:

TYPE
  Array = ARRAY[0 .. BufferSize - 1] OF CHAR;
VAR
  buffer : POINTER TO Array;
       .
       .
       .
 FIO.AssignBuffer(dummy, buffer^);

Here AssignBuffer gets the proper size as an implicit argument, and the program
runs in about 1.2 seconds buffered and 12 seconds unbuffered on a 16 MHz
386SX with an 18 ms disk (without SmartDrive and co.).

I agree with people who loved version 1. I have asked people from JPI several
times which powerful features of V1 do not exist anymore (for example, VID's
ability to let you examine all local and global variables after a runtime error
and to let the program continue in such a case. PMD is nice if a program
crashes rarely, but letting a program run in the debugger until it crashes is
much more convenient). And I don't understand why they removed programs like
ANALYZE, which produces a list of who exports and imports what.
On the other hand, they do not want to include features in Modula which are
standard in C, like a check for uninitialized or unused variables, which should
be quite simple because the optimizer has to maintain this information anyway
(I was told that they asked the compiler writer but that he does not
want to do it).
Anyway, I still like it very much.

Gerhard Vogt
EMBL
D-6900 Heidelberg
West Germany

toma@sail.LABS.TEK.COM (Tom Almy) (06/24/91)

In article <56F0AA44FD1F400E38@EMBL-Heidelberg.DE> Modula2 List <INFO-M2%UCF1VM.BITNET@ucf1vm.cc.ucf.edu> writes:
>In a previous posting, Tom Almy tried to compare the speed of JPI's string
>output to text files in M2 and C.
>[...]
>The 6.81 seconds are not a misprint but Tom's error.
>
Thanks for pointing out my mistake. I typically statically allocate the
buffer, but wanted to dynamically allocate it so as to be as much like the
C version as possible.

After correcting the program, results were:

Language	Size	Speed (sec)
Modula-2 Nobuf	5620	 1.64
Modula-2 Buf	6525	 0.87
C Nobuf		4654	12.58
C Buffered	4750	 0.93

Making Modula-2 consistently faster, but larger (at least for small programs).
**********************

On another note:

Concerning the crashing problem when graphics were used with my TSENG-4000
based system, I discovered the bug in graph.mod (thank God I've got the
source kit!). It was correct in version 2, and they broke it in version 3:

In procedure SetVideoMode:


        IF (Mode >= 13) AND (Mode < 19) THEN
            R.AX := 1002H;
            R.ES := Seg(PalRegs);
            R.DX := Ofs(PalRegs);
            Lib.Intr(R, 10H);
            n := 0;
            WHILE n < 16 DO
                R.AX:=1010H;	(* WAS R.AL = 010H; *)
                R.BX:=n;
                R.DH:=SHORTCARD(PalCols[n]);
                R.CH:=SHORTCARD(LONGCARD(LongSet(PalCols[n])*LongSet(0FF00H))>>8);
                R.CL:=SHORTCARD(LONGCARD(LongSet(PalCols[n])*LongSet(0FF0000H))>>16);
                Lib.Intr(R, 10H);
                INC(n);
            END;
        END;

They made the invalid assumption that AH would not be changed across the 
interrupt call.

I'll give JPI a call on this one.

Conclusion: I could be happier, but it does all work. I'll cut back on 
disk space by eliminating lots of the various memory model libraries and
by archiving the source files.



-- 
Tom Almy
toma@sail.labs.tek.com
Standard Disclaimers Apply

schoebel@bs3.informatik.uni-stuttgart.de (Thomas Schoebel) (06/25/91)

In article <9782@sail.LABS.TEK.COM> toma@sail.LABS.TEK.COM (Tom Almy) writes:
>Language	Size	Speed	
>Modula-2 Nobuf	5620	 1.64
>Modula-2 Buf	6525	 0.87
>C Nobuf	4654	12.58
>C Buffered	4750	 0.93
>
>Making Modula-2 consistantly faster, but larger (at least for small programs).

I just got V 3.01 and played a little bit with a larger project (about
70 Modules, 750K source code in M2).

Impressions:
The OS/2 version compiled and ran without problems when moving from
V 2.X. Getting the DOS version to run (moving from V 1.X) will take some
time for adapting calling conventions and system startup code.
I also played with calling conventions, with a surprising result:
Changing the calling conventions from JPI to STACK decreases the
code size by nearly 10%! However, without adapting some assembler
routines this code will not run, I just wanted to see the size.
Surprise: With STACK conventions, the size produced by V3.01 is
nearly the same as with V1.X (188K EXE, all checks on, full optimize).
*Not* worse than V1.X! 

Consequences:
The JPI conventions give only some gain for routines with small parameter
sets. As soon as a routine calls another, or is more than trivial in size,
you can observe a push orgy in the called routine where all parameters are
moved to the stack. In fairly large systems, only a small percentage of all
routines are small and trivial enough to benefit from register passing. In
most cases it produces additional overhead! Whether execution time is
affected is not certain, but it is likely that larger code will run slower.
If JPI had chosen the old stack-passing convention as the default, their
product would be better in general.

Another question concerning benchmarks: most of them are short
routines with few parameters. Did this lead JPI to choose
register passing? And what is the relevance of such benchmarks?

In practice, I believe, register passing will be worthwhile
only if you control it manually. In general, it should be turned
off for better results.

-- Thomas

gkt@iitmax.iit.edu (George Thiruvathukal) (06/25/91)

In article <11710@ifi.informatik.uni-stuttgart.de>, schoebel@bs3.informatik.uni-stuttgart.de (Thomas Schoebel) writes:

> The JPI conventions give only some gain at routines with small parameter
> sets. As soon as a routine calls another or the size of it is more than
> a trivial one, you can examine a push orgy in the called routine
> where all parameters are moved to stack.

This is a good point.  I think we could prove that the average case behaviour
(a result of the push orgies) is probably worse than straight stack-based 
allocation of parameters.  For trivial programs (i.e. some of the programs 
commonly used for benchmarks), there might be a payoff.

> If JPI had choosen the old stack passing convention as default, their
> product would be better in general.

I have to disagree with you here.  It is quite easy to use the stack-based
parameter passing conventions by choosing an appropriate compiler flag.  You
would have to own the SourceKit and rebuild it with the supplied project files.
I have a feeling, however, you really meant to say that people in general would
be more happy if JPI had used the stack-based parameter passing by default for
reasons of compatibility with existing libraries and our agreement on the 
debatable nature of the alleged "performance improvement."

> Another question concerning benchmarks: Most of them are short
> routines with few parameters. Did this JPI imply to choose
> register passing? What about relevancy of such benchmarks??

As I mentioned, the practice of benchmarking in the context of software vendors
should be taken with a grain of salt.  As yet, I have not seen a vendor publish
a benchmark result which was based on non-trivial programs.  I am under the 
impression that the choice of the JPI calling convention is based on two
points:
  1. The one you made above. Simple programs with few parameters tend to
     compile well.  These programs are characterised by graphs (for the
     allocation of registers) which have minimal, if any, conflicts.  What
     does this mean?  Well, it means that the push orgies to which Thomas
     alluded are virtually non-existent.  Conclusion: such programs are
     guaranteed to compile better with a register-based calling convention
     than with a stack-based one.

  2. Mainstream programming style.  Even to this day, programmers tend to
     use many global variables (even if their use is potentially confined
     to a handful of procedures).  While the rationale for doing so differs
     from programmer to programmer, many programmers I know do so because 
     they are aware of the overhead of procedure calls and compiler 
     optimization techniques.  Of course, many of the programmers I know
     really do not believe in structured programming.  They make claims to
     the effect of "structured programs cannot possibly be efficient."  In
     any event, programs which are written in the mainstream programming
     style tend to be characterized by register allocation graphs which are
     similar to the ones described for trivial programs.

> In practice, I believe, register passing will be worthy
> only if you manually control it. In general, it should be turned
> off for better results.

You can.  Check out the pragmas.  There is one you can use to
restrict the compiler's attempts to allocate registers.  Since I cannot
remember what it is, please look it up.

-- 
George Thiruvathukal

Laboratory for Parallel Computing and Languages
Illinois Institute of Technology
Chicago

dhinds@elaine18.Stanford.EDU (David Hinds) (06/26/91)

In article <11710@ifi.informatik.uni-stuttgart.de> schoebel@bs3.informatik.uni-stuttgart.de (Thomas Schoebel) writes:
>
>I also played with calling conventions, with a surprising result:
>Changing the calling conventions from JPI to STACK decreases the
>code size by nearly 10%! However, without adapting some assembler
>routines this code will not run, I just wanted to see the size.
>Surprise: With STACK conventions, the size produced by V3.01 is
>nearly the same as with V1.X (188K EXE, all checks on, full optimize).
>*Not* worse than V1.X! 

This may be a classic speed vs. size tradeoff.

>Consequences:
>The JPI conventions give only some gain at routines with small parameter
>sets. As soon as a routine calls another or the size of it is more than
>a trivial one, you can examine a push orgy in the called routine
>where all parameters are moved to stack. In fairly large systems,
>only a small percentage of all routines are small and trivial enough
>to get advantage from register passing. In most cases it produces
>additional overhead! Whether the execution time will be affected from
>that is not sure, but it is most likely that larger code will run slower.
>If JPI had choosen the old stack passing convention as default, their
>product would be better in general.

This is quite a strong claim, and I'm not at all sure it is valid.  Many
very serious compilers (the MIPS RISC compiler backend, for example) that
are state of the art in optimization pass some parameters in registers.
The MIPS compilers pass the first three or four word-sized parameters this
way.  The push orgy you complain about would seem to be equivalent to the
push orgy that would have been done if the parameters had to be put on the
stack in the first place, right?  You also over-generalize about properties
of procedures in large systems.  Can you show some evidence for this claim --
for instance, that the most time-critical procedures in "large systems" tend
both to have many parameters and to do relatively little work (if they do
a lot of work, the overhead of passing parameters one way or another is
insignificant)?

I guess you could argue that the 80x86 machines don't have enough registers
to justify using any of them to pass parameters, even transiently.  Or that
the JPI compiler isn't strong enough at optimization to minimize the overhead.
But you should have some timing data before you make this claim (based only
on program size, as far as I can tell).

 -David Hinds
  dhinds@cb-iris.stanford.edu

Cobus.Debeer@p0.f1.n7101.z5.fidonet.org (Cobus Debeer) (06/26/91)

I have found that the MGDEMO program crashes on certain video adaptors
and traced the cause of the problem to the code where the video mode is
set.  In line 1961 of the GRAPH.MOD file the BIOS function is called without
the AH register being set to 0.  This can be fixed by changing the
assignment AL := 010H to AX := 010H, or by setting AH to zero.  The reason
why it does not crash consistently is that AH is mostly zero.  When it
happens to be non-zero you go off into space.
 
There is another bug in the code.  When using VGA graphics, the
structure indicating the horizontal and vertical pixel count returns the
correct values, but the pixel plotting functions use the EGA limits to
decide if a point should be plotted.
 
regards
Cobus de Beer



--  
uucp: uunet!m2xenix!puddle!5!7101!1.0!Cobus.Debeer
Internet: Cobus.Debeer@p0.f1.n7101.z5.fidonet.org

schoebel@bs3.informatik.uni-stuttgart.de (Thomas Schoebel) (06/26/91)

In article <1991Jun25.172455.12384@leland.Stanford.EDU> dhinds@elaine18.Stanford.EDU (David Hinds) writes:
>This is quite a strong claim, and I'm not at all sure it is valid.  Many
>very serious compilers (the MIPS RISC compiler backend, for example) that
>are state of the art in optimization pass some parameters in registers.
>The MIPS compilers pass the first three or four word-sized parameters this
>way.  The push orgy you complain about would seem to be equivalent to the
>push orgy that would have been done if the parameters had to be put on the
>stack in the first place, right? 

You are right with respect to RISC architectures: there you have plenty
of registers, and stack traffic can largely be avoided through register
windowing.  However, TopSpeed runs on the 8086, where you have only 4
general purpose 16-bit registers.  Index registers like SI and DI, and
also the segment registers, could be used for temporary storage, but the
86 architecture is not very symmetric: some instructions don't work with
all registers, or are hard-wired to particular registers.  This results in
frequent use of registers for temporary values.  An example: if a procedure
has three parameters, two of which are pointers or VAR parameters,
you'd need 5 registers for them.  Unfortunately, AX is always chosen
for the first parameter, but AX is (most likely) the most frequently
used register.  So the push orgies mentioned above are very likely.

>You also over-generalize about properties
>of procedures in large systems.  Can you show some evidence of this claim,
>such as, that the most time-critical procedures in "large systems" tend to
>both have many parameters and also do relatively little work (if they do
>a lot of work, the overhead of passing parameters one way or another is
>insignificant)?

Well, my project is a multi-threaded one.  At least in this case my
claims most likely hold, because you have to avoid static data where
possible.  Thus nearly all structures are referenced via pointers, which
have to be passed as parameters.
Of course, if you program in FORTRAN or COBOL style, you need not use
parameters at all, but then there would be no difference between the
two calling conventions.

>I guess you could argue that the 80x86 machines don't have enough registers
>to justify using any of them to pass parameters, even transiently.  Or that
>the JPI compiler isn't strong enough at optimization to minimize the overhead.

Yes, the problem is the available number of registers: they are only
useful for parameters if you don't have too many of them.
Secondly: whenever procedure A calls B, and B's parameters are passed
via registers, A has to preserve the old values if they are needed after
the call.  Even if procedure B preserves all temporarily used registers,
there will be some overhead somewhere.  The point is that there is no
general measure for predicting the lifetime of values.  Keeping rarely
used but recently referenced values in registers leads to preservation
overhead.  The problem is that nobody can predict from a procedure
definition (e.g. in a DEFINITION MODULE) the reference pattern for
the parameters.  Allocating registers implies an assumption that
parameters will either be used frequently or can be discarded after
a few statements.  But what if the parameter is used only at the bottom
of the procedure, or if the register (e.g. AX) is used in a prior
function call for the return value?

>But you should have some timing data before you make this claim (based only
>on program size, as far as I can tell).
>
> -David Hinds
>  dhinds@cb-iris.stanford.edu

Yes, I'm currently doing some timing measurements, but there are limits
because the file accesses of my program aren't reproducible in general.
In a previous article, George Thiruvathukal also suggested a proof of
my claims.  I think a mathematical proof is not very hard, for there
are parallels to thrashing effects in paging strategies in operating
systems.  The only problem is to "prove" that having only 4 or, say, 8
registers makes it very likely that such thrashing may occur.

-- Thomas