nagler@olsen.UUCP (08/28/87)
After reading yet another message about inefficient operations in
Modula-2 versus Turbo (or C), I decided to do some benchmarks to
see if they were really true. This message contains the source
to a C and a Modula-2 program. The programs were compiled and run on
an unloaded diskless Sun 3/50. The compile times are included for
the sake of completeness. The compilers were both supplied by Sun,
but I should like to say that the Sun C compiler is much more mature
than its sibling M2 compiler. The M2 compiler was derived from the
original Lilith four-pass version and was retrofitted to Sun's standard
code generator.
The programs execute all the Modula-2 set operations in a
loop for 10,000,000 iterations. Note that Modula-2 was used as the
base, because there are more native set operations in Modula-2
than there are in C. C "register" declarations weren't used,
because the Sun M2 compiler does not do register allocation, which
is the equivalent of "register" in C. I assume that the register
allocation mechanism is the job of the code generator or optimizer.
As I found out later, the optimizer doesn't seem to do register
allocation/colouring at all.
The 2.0 FCS version of the Sun M2 compiler and the 3.2 SunOS/C
compiler were used. Note that people seem to think that the Sun
C compiler generates good code.
Here are the results:
% time m2c -nobounds -norange -O x.mod -e x -o x
1.9u 2.9s 0:14 33% 5+4k 158+86io 24pf+0w
% time cc -O -o y y.c
1.6u 1.3s 0:06 42% 5+5k 83+44io 5pf+0w
% time x
197.2u 0.2s 3:18 99% 0+0k 2+0io 0pf+0w
% time y
190.3u 0.2s 3:11 99% 0+0k 2+0io 0pf+0w
%
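For anyone unsure how to read the `time` lines above: in csh's default format the first two fields are user CPU seconds (`u`) and system CPU seconds (`s`), followed by elapsed wall-clock time and the percentage of CPU used. A minimal C sketch of pulling out the CPU figures might look like this; `parse_time_line` and `cpu_seconds` are hypothetical names and not part of the benchmark.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: extracts user and system CPU seconds from one
   csh `time` line such as "197.2u 0.2s 3:18 99% 0+0k 2+0io 0pf+0w".
   Returns 1 on success, 0 if the line does not start with "<n>u <n>s". */
static int parse_time_line(const char *line, double *user, double *sys)
{
    assert(line != NULL);
    return sscanf(line, "%lfu %lfs", user, sys) == 2;
}

/* Total CPU seconds charged to the process (user plus system).
   Returns a negative value if the line cannot be parsed. */
static double cpu_seconds(const char *line)
{
    double user, sys;

    if (!parse_time_line(line, &user, &sys))
        return -1.0;
    return user + sys;
}
```

For the optimized M2 run, `cpu_seconds("197.2u 0.2s 3:18 99% 0+0k 2+0io 0pf+0w")` gives about 197.4 seconds; the figures quoted in this note are the user times alone.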
If you don't know how to read Unix gobbledygook, the C program took
190 seconds of CPU to run and the M2 program took 197 seconds.
So it turns out that C is about 3.5% faster. The -O flag turns on the
optimizer, but it doesn't appear to do much for either Modula-2 or C.
Here are the results of an unoptimized sequence:
% time m2c -nobounds -norange x.mod -e x -o x
1.5u 2.6s 0:14 29% 5+3k 154+82io 25pf+0w
% time cc -o y y.c
1.2u 1.2s 0:05 41% 5+4k 69+31io 5pf+0w
% time x
212.5u 0.2s 3:34 99% 0+0k 2+0io 0pf+0w
% time y
201.0u 0.0s 3:21 100% 0+0k 0+0io 0pf+0w
These results tell you that the optimizer doesn't do much. Unoptimized,
the C program consumed 201 seconds of CPU, so the optimizer bought a 5%
speed-up; the M2 program took 212 seconds, so it gained 7%. The
optimizer didn't reduce the size of the code by an appreciable amount
either.
The last test I performed was to change the C program to use register
variables. The results are obviously stupendous.
% time cc -DUseRegisters -o y y.c
1.1u 1.2s 0:05 46% 5+4k 69+38io 5pf+0w
% time y
80.5u 0.0s 1:20 99% 0+0k 1+0io 0pf+0w
% time cc -O -DUseRegisters -o y y.c
1.4u 1.5s 0:06 45% 5+5k 97+43io 5pf+0w
% time y
58.6u 0.0s 0:58 99% 0+0k 2+0io 0pf+0w
The unoptimized version took 81 seconds, and with the optimizer the
program sped up by 27% to 59 seconds. The absolute speed-up was
twice that of the "non-register" C version. The marginal change
was 5 1/2 times that of the "non-register" version.
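The percentages quoted in this note can be rechecked from the user CPU times in the `time` output. The sketch below is only an illustration of that arithmetic; `seconds_saved` and `percent_saved` are hypothetical helper names.

```c
#include <assert.h>

/* Seconds of CPU saved by turning on -O. */
static double seconds_saved(double noopt, double opt)
{
    return noopt - opt;
}

/* The same saving as a percentage of the unoptimized time. */
static double percent_saved(double noopt, double opt)
{
    assert(noopt > 0.0);
    return 100.0 * (noopt - opt) / noopt;
}
```

With the register version, `percent_saved(80.5, 58.6)` is about 27.2 and `seconds_saved(80.5, 58.6)` is 21.9; without registers, `percent_saved(201.0, 190.3)` is about 5.3 and `seconds_saved(201.0, 190.3)` is 10.7. That gives roughly twice the absolute saving. The ratio of the two percentages comes out a little above 5 on the unrounded times, or about 5 1/2 if the rounded 27% and 5% are used.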
I'll let you draw your own conclusions from all of this. I am still
using Modula-2 in the hopes that some day, someone will take a couple
of months off to write an optimizer for Modula-2. I have included the
test programs at the end of this note. If nothing else, they serve
as a crib sheet for translating between C and Modula-2.
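As a rough companion to that crib sheet, the Modula-2 set operations can be sketched as small C helpers on an unsigned word. The type name `bitset` and the `set_*` function names are made up for illustration, and the sketch assumes element numbers smaller than the bit width of `unsigned int`.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned int bitset;   /* stand-in for Modula-2 BITSET */

/* Mask with only bit e set; e must fit in the word. */
static bitset bit(unsigned e)
{
    assert(e < sizeof(bitset) * CHAR_BIT);
    return (bitset)1 << e;
}

static bitset set_incl(bitset s, unsigned e)  { return s | bit(e); }        /* INCL( s, e ) */
static bitset set_excl(bitset s, unsigned e)  { return s & ~bit(e); }       /* EXCL( s, e ) */
static bitset set_union(bitset a, bitset b)   { return a | b; }             /* a + b */
static bitset set_diff(bitset a, bitset b)    { return a & ~b; }            /* a - b */
static bitset set_inter(bitset a, bitset b)   { return a & b; }             /* a * b */
static bitset set_symdiff(bitset a, bitset b) { return a ^ b; }             /* a / b */
static int    set_in(unsigned e, bitset s)    { return (s & bit(e)) != 0; } /* e IN s */
static int    set_subset(bitset a, bitset b)  { return (a & ~b) == 0; }     /* a <= b */
```

Membership and inclusion are written as explicit comparisons so that they yield 0 or 1, matching a Modula-2 BOOLEAN, rather than an arbitrary nonzero mask.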
Rob
-----------------------------------------------------------
MODULE x;
VAR
    a : BITSET;
    b : BITSET;
    c : BITSET;
    t : BOOLEAN;
    i : CARDINAL;
BEGIN
    FOR i := 0 TO 9999999 DO
        b := BITSET( CARDINAL( a ) * 2 );    (* b = a << 1; *)
        b := BITSET( CARDINAL( a ) DIV 2 );  (* b = a >> 1; *)
        INCL( a, 3 );                        (* a |= ( 1 << 3 ); *)
        EXCL( a, 3 );                        (* a &= ~( 1 << 3 ); *)
        c := a + b;                          (* c = a | b; *)
        c := a - b;                          (* c = a & ~b; *)
        c := a * b;                          (* c = a & b; *)
        c := a / b;                          (* c = a ^ b; *)
        t := 3 IN a;                         (* t = ( a & ( 1 << 3 ) ) != 0; *)
        t := a # b;                          (* t = a != b; *)
        t := a <= b;                         (* t = !( a & ~b ); *)
        (* The < and > operators are not defined for sets *)
    END;
END x.
-----------------------------------------------------------
main()
{
#ifdef UseRegisters
    register unsigned int a, b, c, t, i;
#else
    unsigned int a, b, c, t, i;
#endif /* UseRegisters */
    for ( i = 0; i <= 9999999; i++ )
    {
        /* b := BITSET( CARDINAL( a ) * 2 );   */ b = a << 1;
        /* b := BITSET( CARDINAL( a ) DIV 2 ); */ b = a >> 1;
        /* INCL( a, 3 );                       */ a |= ( 1 << 3 );
        /* EXCL( a, 3 );                       */ a &= ~( 1 << 3 );
        /* c := a + b;                         */ c = a | b;
        /* c := a - b;                         */ c = a & ~b;
        /* c := a * b;                         */ c = a & b;
        /* c := a / b;                         */ c = a ^ b;
        /* t := 3 IN a;                        */ t = ( a & ( 1 << 3 ) ) != 0;
        /* t := a # b;                         */ t = a != b;
        /* t := a <= b;                        */ t = !( a & ~b );
    }
} /* main */

michael@corona.UUCP (09/02/87)
nagler@olsen.UUCP (Robert Nagler) recently wrote:
> After reading yet another message about inefficient operations in
> Modula-2 versus Turbo (or C), I decided to do some benchmarks to
> see if they were really true.
Perhaps this is the right time to ask the net for all available
Modula-2 benchmarks. If you have one and are willing to share it,
please mail it to me.
Thanks in advance
Michael Schmidt
--
UUCP: ...!seismo!unido!pbinfo!michael | Michael Schmidt
or michael@pbinfo.UUCP | Universitaet-GH Paderborn, FB 17
CSNET: michael%pbinfo.uucp@Germany.CSNET | Warburger Str. 100
ARPA: michael%pbinfo.uucp@seismo.css.gov | D-4790 Paderborn, West Germany