[net.lang.ada] LA AdaTEC Ada Fair '84 Report

colbert@spp1.UUCP (12/07/84)

                           Report on the

                    L.A. AdaTEC Ada* Fair '84

                       Compiler Test Results


                          Bryce M. Bardin
                      Hughes Aircraft Company
                   Software Engineering Division
                       Ground Systems Group
                           Fullerton, CA





 On June 30th, 1984, L.A. AdaTEC held its second annual Ada Fair.  Again
 this year, compiler vendors were invited to run a suite of test programs
 selected by L.A. AdaTEC.  Each vendor was asked to report its own
 results in accordance with the rules supplied with the test suite.
 This report summarizes the results reported by the vendors.

 Source listings of the programs and copies of the rules were distributed
 to the people who attended the Fair.  They are now available on the
 ARPAnet by logging into EV-INFORMATION at ECLB (with a password of EV)
 and typing "HELP TESTS-ADA-FAIR-84" or by FTPing
 <EV-INFO>TESTS-ADA-FAIR-84.HLP.  As an alternative, L.A. AdaTEC, in the
 person of Ed Colbert, will mail you the tests over Usenet if you contact
 him at "trwrb!trwspp!colbert".  The test suite was assembled by Ed
 Colbert (TRW), Gerry Fisher (IBM Research), and me.

 The vendors who participated by running the tests were:

    1) Data General Corporation (DG), running the DGC/Rolm ADE compiler
       on a DG MV8000 under AOS/VS,

    2) Irvine Computer Sciences Corporation (ICSC), running the ICSC-Ada
       Compiler on a Gould 32/87, and

    3) RR Software, Inc. (RR), running the JANUS/Ada compiler on an IBM
       PC-XT under DOS.

 This year, with the advent of more validated compilers, the tests were
 chosen without trying to limit the Ada constructs used in any way.  The
 intent of the suite was to reveal the current status of Ada
 implementations to the entire Ada community, to the extent this is
 possible with a very small set of tests.






 * Ada is a registered trademark of the U.S. Government,
 Ada Joint Program Office.

 Since we wished to enable vendors and end users alike to make simple
 performance comparisons on a uniform and equitable basis, we assumed
 that package Calendar was implemented.  Because it is almost impossible
 to evaluate performance differences that stem from slight differences in
 source code, we established the rule that any unauthorized change to a
 test automatically removes the vendor from consideration on that test.

 Additionally, in order to challenge the vendors of validated compilers a
 bit, we included a few tests of features that are needed in order to
 build serious real-time embedded systems -- features that only a rather
 complete Ada implementation would be likely to support.  Where possible,
 the tests were designed to be self-checking and to report their success
 or failure.

 The tests were checked out as far as possible with validated versions of
 NYU Ada/Ed, although some features not supported by Ada/Ed were
 simulated.  In spite of our best efforts, two tests were clearly
 incorrect as given to the vendors and, in accordance with the rules,
 these tests were dropped from the suite.  The Boolean vector "and" test
 had two errors:  "v2(N) := true;" should have been "v1(N) := true;" and
 "vector_result(n) := v1(n) and v2(n);" should have been
 "vector_result := v1 and v2;".  The derived type inter-conversion test
 had the record representation clause and length clause commented out,
 which defeats the purpose of the test.  (As it turns out, no vendor
 could have performed this test even if the source text had been
 correct.)  A third
 test, the sets package, was challenged by Data General at the time their
 results were submitted.  Several experts have now agreed that the test
 (and also version 1.2.9 of Ada/Ed) is in error, so the test has been
 dropped.  The ARPAnet version of the test suite has been corrected.

 One group of the tests attempted to produce serious timing results using
 the package Calendar.  These tests were quite interesting because of the
 problems in test construction they revealed.  In order to assure
 adequate precision in the results, the vendors were instructed to modify
 the loop counts to obtain significant net time differences.  The
 criterion for an adequate loop count assumed that the resolution of the
 Clock function is determined by Duration'Small.  The tests therefore
 compared the net time with 100 times Duration'Small, in order to be sure
 of at least one percent precision in the average times.

 However, according to the Ada Reference Manual (ARM), "Duration'Small
 need not correspond to the basic clock cycle, the named number
 System.Tick" (ARM 9.6/4).  Although the ARM does not define "basic clock
 cycle", I interpret it to mean the resolution of the function
 Calendar.Clock.  On that interpretation, the comparison should instead
 have been against 100 times the larger of Duration'Small and
 System.Tick.

 Since the disparity between the clock resolution and Duration'Small may
 be very large (e.g., in the case of Data General it is 1.0 vs. 1/(2**9)
 seconds, a ratio of 512 to 1), the results of the timing tests as
 written are not guaranteed to be very accurate even when the test itself
 announces that it "passed".  It should be emphasized that the cause of
 this problem is primarily poor test design.
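
 As a worked illustration of the two criteria, here is a small sketch
 (in Python rather than Ada) that uses only the Data General figures
 quoted above.  The function and variable names are hypothetical and are
 not taken from the test suite.

```python
# Sketch only: names are illustrative, not from the L.A. AdaTEC test suite.

def required_net_time(duration_small, clock_resolution=None, factor=100):
    """Minimum net time (seconds) needed for 1-in-`factor` precision.

    With clock_resolution=None this is the criterion as written in the
    tests (factor * Duration'Small); otherwise it is the corrected one
    (factor * the larger of Duration'Small and the clock resolution).
    """
    if clock_resolution is None:
        return factor * duration_small
    return factor * max(duration_small, clock_resolution)


# Data General values quoted in the text:
# Duration'Small = 1/(2**9) seconds, Clock resolution = 1.0 seconds.
dg_small = 1.0 / 2**9
dg_resolution = 1.0

as_written = required_net_time(dg_small)                # 0.1953125 s
corrected = required_net_time(dg_small, dg_resolution)  # 100.0 s

print(f"criterion as written: net time >= {as_written} s")
print(f"corrected criterion:  net time >= {corrected} s")
```

 With these figures a test could announce that it "passed" after about
 0.2 seconds of net time, which is less than one interval of a clock that
 advances only once per second.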








 The major reasons that compilers did not pass some tests can be simply
 stated:

    1) The test was not attempted.  (We speculate that some feature needed
    for the test to function properly is either not implemented or has
    significant bugs.)

    2) The vendor was disqualified on the test because of unauthorized
    changes to the source code.  (Initially, all vendors were
    disqualified on one or more tests for this reason.  This was
    particularly likely for the non-validated implementations, since they
    need work-arounds for unimplemented features in order to make a
    program compilable.  However, in a few cases there was no apparent
    reason for the vendor to modify the code.  In such cases we asked the
    vendor to re-run the test without the modifications.)

    3) The test was run correctly, but the results did not meet the
    accuracy criterion, so the test itself indicated that it failed.
    (This was generally due to poor test design.)

 The following overall comments apply to the results from each of the
 vendors individually:

    1) The DG implementation has an apparent inconsistency in the
    implementation of the Calendar.Clock function and the definition
    of System.Tick.  The value of System.Tick is 0.1 seconds and the
    resolution of Clock is 1.0 seconds.  I believe their implementation
    to be incorrect.  (DG says that they are aware of this discrepancy
    and are taking steps to improve the resolution of their clock
    function to equal System.Tick.)  Errors were present in the output
    format for type Duration, and an apparent bug was revealed in the
    division of a value of type Duration by one of type Integer.

    2) The ICSC compiler, which is not yet validated, currently
    implements type Calendar.Duration as a (hidden) subtype of float
    and uses the floating point output routines.  This leads to an
    incorrect format for a Put of Duration values with both Fore and
    Exp set to 0.

    3) The RR compiler is also not yet validated.  Contrary to the
    benchmarking rules, no compilation or execution listings were
    provided by RR.  Their results have been compiled from the
    summary they submitted.

 How the vendors fared on each individual test is given in Table 1.

 Most of the timing results reported by the vendors are summarized in
 Table 2, regardless of whether the vendor passed the test, "failed" it
 due to insufficient precision, or was disqualified on it, since these
 results are generally not too sensitive to the work-arounds which may
 have been used.  The original intent of the tests to provide
 times accurate to 1% was not realized due to problems in test design.
 Some of the times are only accurate to about one significant digit.
 Therefore we are reporting the results in the exact format given by the
 vendor, where possible, in order to avoid biasing the data further.
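
 Table 2 marks each affected result with note c, d, or e according to
 how many clock resolution intervals its net time spanned.  A minimal
 sketch of that classification follows (in Python rather than Ada).  The
 function name is hypothetical, and the net times in the loop are made-up
 inputs, not vendor data.

```python
# Sketch only: the thresholds come from notes c, d, and e of Table 2;
# the function name and sample inputs are illustrative.

def precision_note(net_time, resolution):
    """Return the Table 2 note letter for a net time, or "" if none applies."""
    intervals = net_time / resolution
    if intervals < 1:
        return "c"   # less than one resolution interval
    if intervals < 10:
        return "d"   # at least 1 but less than 10 intervals
    if intervals < 100:
        return "e"   # at least 10 but less than 100 intervals
    return ""        # 100 or more intervals: the 1% goal was met


# Made-up net times (seconds) measured against a 1.0-second clock
for net_time in (0.5, 3.0, 42.0, 250.0):
    print(net_time, repr(precision_note(net_time, 1.0)))
```

 A net time of fewer than 10 intervals can be trusted to roughly one
 significant digit at best, so results carrying note c or d should be
 read with particular caution.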

 Interpretation of the data may be easier with the aid of the values of
 the clock function resolution and Duration'Small, which are included in
 the table along with their ratio.  The greater the ratio of the
 resolution value to Duration'Small, the less accurate the results would
 be if the minimum iteration count that met the precision criterion were
 used in the test.  In general, the iteration counts used by the vendors
 were greater than necessary to pass the Duration'Small criterion, but
 not greatly so.  All times are given in seconds.
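
 The effect of the ratio can be made concrete with a little arithmetic.
 The sketch below (in Python, with hypothetical names) takes its figures
 from Table 2.  It assumes that a measured net time may be off by as much
 as one clock resolution interval, so that a net time of exactly 100
 times Duration'Small carries a relative error of up to ratio/100.

```python
# Sketch only: the one-interval error model is an assumption, and the
# names are illustrative.  Figures are taken from Table 2.
vendors = {
    # name: (clock resolution, Duration'Small), both in seconds
    "DG":   (1.0,         1.95312e-03),
    "ICSC": (1.66667e-02, 1.66667e-02),
    "RR":   (0.0549,      0.01),
}


def worst_case_error(resolution, small, factor=100):
    """Relative error bound when the net time is exactly factor * small."""
    minimum_net_time = factor * small
    return resolution / minimum_net_time


for name, (resolution, small) in vendors.items():
    ratio = resolution / small
    error = worst_case_error(resolution, small)
    print(f"{name:4s}  ratio {ratio:7.2f}   error up to {error:.1%}")
```

 On these figures the bound is about 512% for DG, 1% for ICSC, and 5.5%
 for RR.  Under this assumption, only a ratio of 1.0 would meet the 1%
 goal at the minimum iteration count.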

 Some of the size information supplied by the vendors is summarized in
 Table 3.  Because most vendors did not report all of the sizes
 requested, only the size of the object module compiled for the test (the
 columns labelled "Object") and the maximum memory size used (the columns
 labelled "Memory") are given here.  It should be noted that the DG data
 include the stack/heap allocation in the size reported.  All sizes are
 given in (decimal) bytes.

 One thing is clear from the results:  all of the timing tests need
 further refinement and, in some cases, drastic surgery to improve their
 precision.  In particular, besides using both System.Tick
 and Duration'Small in checking the precision, better strategies are
 needed for the measurement of some of the I/O times.

 Another problem is that some of the tests were nominally "failed" for
 reasons of inadequate precision because iteration counts or array sizes
 greater than the maximum the implementation can support would have been
 required.  This is manifestly unfair when the goal of a test is to
 measure timing rather than capacity.  Future tests should have a better
 separation of test concerns, making sure that timing tests and capacity
 tests are kept distinct, and designing timing tests to run properly on
 machines with small word sizes and small address spaces wherever that is
 feasible.

 We need to iterate the test design and trial use process until the
 results are satisfactory to users and implementers alike.  I believe the
 current set of tests will have served their purpose, in spite of their
 obvious flaws, if they help to point us in the right direction.

 Test Name                      Vendor:  DG      ICSC    RR

 Ackermann's Function                    A[a,b]  A[a,c]  D[d,e]
 Boolean Vector And Test                 I       I       I
 Binary Search                           P       N       N
 Cauchy Matrices - Floating Point        F[f]    N       N
 Cauchy Matrices - Fixed Point           F[f]    N       N
 Cauchy Matrices - Universal Numbers     F[f]    N       N
 Character Direct I/O                    P[a]    P[a]    D[d,e]
 Character Enumeration I/O               P[a]    P[a]    N
 Character Text I/O                      P[a]    P[a]    D[d,e]
 Consumer/Producer                       P       N       N
 Derived Type Inter-conversion           I       I       I
 Floating Point Vector Addition          P[a]    F[a,g]  D[d,e]
 Friendliness Test                       P[h]    N       N
 Integer Direct I/O                      P[a]    P[a]    D[d,e]
 Integer Text I/O                        P[a]    P[a]    D[d,e]
 Integer Vector Addition                 P[a]    F[a,g]  D[d,e]
 Low Level Test                          N       N       N
 Procedure Call Timing                   P[a]    P[a]    D[d,e]
 Quick Sort - Parallel                   P       D[d]    N
 Quick Sort - Sequential                 P       D[d]    N
 Readers/Writers Problem                 P       N       N
 Rendezvous Call Timing                  P[a]    P[a]    N
 Sets Package                            I[i]    I       I

 Legend:  P = Passed
          A = Anomalous (Program behavior was slightly anomalous)
          F = Failed
          N = Not Attempted
          D = Disqualified
          I = Invalid Test (Test Dropped)

 Notes:
    a  Output had errors in format.
    b  Output had errors in values.
    c  Stack overflow occurred after Ackermann (3,7), but Storage_Error
       was not raised or handled.
    d  Disqualified due to source code changes.
    e  No listing was provided by the vendor.
    f  Compiler passed the syntax and semantics checking phases, but
       couldn't generate correct code.
    g  Array size could not be set large enough to give adequate timing
       precision.  Otherwise, program executed correctly.
    h  Compiled and executed correctly, but no set/use errors (use of a
       variable before initialization) or 'hard' exceptions (exceptions
       which will always be raised by the program) were detected by the
       compiler.  Procedure Dont_Do_It was not called in the generated
       code, but was included in the load module.  The run-time
       environment did not identify the name of the exception which is
       deliberately raised by the program (Program_Error).
    i  Compiler diagnosed source errors.  Vendor successfully challenged
       the validity of the test.

                     Table 1:  Overall Results

 Test Name             Vendor:      DG               ICSC            RR
                       Machine:  DG MV8000        Gould 32/87     IBM PC-XT

 Clock resolution (System.Tick)  1.0[a]           1.66667E-02     0.0549
 Duration'Small                  1.95312E-03      1.66667E-02     0.01
 Ratio (a pure number)           512.0            1.0             5.49
 Ackermann's function:                                            3.26E-4[b]
    (3,1)                        0.00000E+00[c]   0.00000E+00[c]    --
    (3,2)                        0.00000E+00[c]   3.08059E-05[d]    --
    (3,3)                        0.00000E+00[c]   1.37056E-05[d]    --
    (3,4)                        9.70214E-05[d,f] 1.13187E-05[d]    --
    (3,5)                        2.35638E-05[d,f] 1.13887E-05[e]    --
    (3,6)                        3.48365E-05[d,f] 1.13214E-05       --
    (3,7)                        3.60249E-05[e,f] 1.15515E-05       --
    (3,8)                        3.62527E-05[f]       [g]           --
    (3,9)                            [h]              --            --
 Character Direct I/O Write      1.00000E-04[d]   8.49966E-04     4.73E-3
 Character Direct I/O Read       8.33333E-05[d]   5.20812E-04     3.63E-3
 Character Enumer. I/O Write     1.33333E-03[e]   9.83294E-04       --
 Character Enumer. I/O Read      5.60000E-03[e]   1.54994E-03       --
 Character Text I/O Write        4.33333E-04[e]   4.79147E-05     1.54E-3
 Character Text I/O Read         5.33333E-04[e]   9.41629E-05     1.40E-3
 Float Vector Add                1.53846E-05[d]   0.00000E+00[c]  3.30E-4
 Integer Direct I/O Write        2.70000E-04[e]   1.09579E-03     4.88E-3
 Integer Direct I/O Read         1.10000E-04[e]   5.33312E-04     3.79E-3
 Integer Text I/O Write          2.80000E-03      1.64993E-03     3.93E-3
 Integer Text I/O Read           3.97500E-03      2.26658E-03     4.81E-3
 Integer Vector Add              2.50000E-05[d]   3.33320E-06[d]  2.70E-4
 No Parameter Call               1.50000E-05[d]   6.19975E-06     1.37E-4
 In Parameter Call               1.50000E-05[d]   5.49978E-06     2.11E-4
 Out Parameter Call              2.00000E-05[d]   5.89976E-06     1.77E-4
 In Out Parameter Call           2.00000E-05[d]   6.06642E-06     1.77E-4
 No Parameter Rendezvous         8.36666E-03      8.99964E-04       --

 Notes:
    a  Clock resolution is 1.0, although System.Tick is 0.1 seconds.
    b  No individual results were provided by vendor.
    c  Net time was less than one resolution interval.
    d  Net time was at least 1 but less than 10 resolution intervals.
    e  Net time was at least 10 but less than 100 resolution intervals.
    f  Calculated by hand from intermediate results. (Due to a compiler
       bug the values printed were all zero.)
    g  Storage_Error exception not raised or handled.  The system
       detected stack overflow and terminated the program.
    h  Terminated (as expected) by Storage_Error exception.

                     Table 2:  Timing Results

 Test Name               Vendor:     DG[a]         ICSC[b]         RR[b]
                         Size:   Object Memory  Object Memory  Object Memory

 Ackermann's function              --   348160   1792   75016   1540   86784
 Binary Search                     --   243712    --      --     --      --
 Character Direct I/O              --   251904   3392   81872   2531   90112
 Character Enumeration I/O         --   251904   4104   77328    --      --
 Character Text I/O                --   249856   3336   76560   2527   87680
 Consumer/Producer                 --   352256    --      --     --      --
 Floating Point Vector Addition    --   948224   1744   74968   1467   86656
 Friendliness Test                 --   241664    --      --     --      --
 Integer Direct I/O                --   251904   3392   81872   2528   90240
 Integer Text I/O                  --   249856   3352   76576   2557   87680
 Integer Vector Addition           --   948224   1712   74936   1422   86656
 Procedure Call Timing             --   249856   2080   75304   2083   87296
 Quick Sort - Parallel             --   354304   3080   84648    --      --
 Quick Sort - Sequential           --   243712   3296   84864    --      --
 Readers/Writers Problem           --   356352    --      --     --      --
 Rendezvous Call Timing            --   360448   1672   86824    --      --

 Notes:
    a  Stack/heap storage is included in size.
    b  Stack/heap storage is not included in size.
-------