[net.lang.apl] APL performance metric

prins@svax.cs.cornell.edu (Jan Prins) (06/16/86)

>> As absurd as it may sound, I know a guy who claims that the
>> real-world performance of an APL system can be accurately
>> predicted by finding out how fast an empty user-defined
>> function can be called in a simple loop.  Thus:
>> 
>> 
>> 	N <- 1000
>> LOOP:	FUN
>> 	-> (0 < N <- N - 1) / LOOP
>> 
>> 
>> where FUN has no arguments and no body.
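Translated out of APL, the proposed metric amounts to timing an empty call in a loop. A rough Python equivalent (names invented here; Python's own function-call overhead stands in for the interpreter's invocation cost):

```python
import time

def fun():
    # Empty user-defined function, standing in for FUN above.
    pass

def benchmark(n=1000):
    # Time n calls of the empty function; the per-call figure is
    # essentially the interpreter's fixed invocation overhead.
    start = time.perf_counter()
    for _ in range(n):
        fun()
    return (time.perf_counter() - start) / n
```

The measured per-call time contains no "real work" at all, which is exactly the point: it isolates the cost of function invocation that the metric claims predicts overall performance.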

>It would be interesting to see a system that is blindingly fast on that
>expression, but stupidly slow at some "real" problem.  But, I doubt
>that such a system exists.

IBM apparently considered (constructed?) several microprogrammed assists
for the VSAPL interpreter on various bottom-end /370 machines (/148, /135).
Each was to optimize a different aspect of APL execution.

I am not sure whether more than one ever made it into the field, but the
released /148 microcode implemented the mechanics of interpretation:
parsing, function invocation and variable localization, storage allocation,
etc.  The assist actually "popped out" to branch tables and native 370
code to perform most of the APL primitives.
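That division of labor can be caricatured in Python (a sketch only; all names are invented): the dispatch mechanics play the role of the microcoded interpreter, while each primitive is a separate routine reached through a table lookup, the analogue of the assist's "pop out" to native /370 code.

```python
# "Native" primitive routines, one per APL primitive, reached through
# a branch table; these play the role of the 370 code the assist
# popped out to.  (All names invented for illustration.)
PRIMITIVES = {
    '+': lambda a, b: [x + y for x, y in zip(a, b)],
    '-': lambda a, b: [x - y for x, y in zip(a, b)],
}

def execute(op, left, right):
    # The "microcoded" part: decode the token and dispatch.
    handler = PRIMITIVES[op]        # branch-table lookup
    return handler(left, right)     # pop out to the primitive routine
```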

This was surprising, since the "fast arithmetic and vector loop" implementations
are far easier to build and were generally thought to be the best bet for
speeding up APL execution.

IBM claimed, and this was supported by interpreter statistics, that despite
the large array opportunities in the language, most data values were very 
short or even scalar, and that interpretation dominated the cost of execution.
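The effect is easy to reproduce in any interpreter: with a fixed per-operation overhead, the cost per element of a vector operation falls as the operand grows, so short operands are dominated by interpretation rather than arithmetic. A hedged Python illustration (invented names; plain lists stand in for APL arrays):

```python
import time

def vec_add(a, b):
    # One "primitive": elementwise addition of two vectors.
    return [x + y for x, y in zip(a, b)]

def per_element_cost(n, reps=1000):
    # Average cost per element of vec_add on length-n operands.
    # The fixed call/dispatch overhead is amortized over n elements,
    # so small n is dominated by interpretation, not arithmetic.
    a = list(range(n))
    b = list(range(n))
    t0 = time.perf_counter()
    for _ in range(reps):
        vec_add(a, b)
    return (time.perf_counter() - t0) / (reps * n)
```

On scalar-sized operands the per-element figure is essentially all overhead; on long vectors it approaches the raw arithmetic cost, which is the distinction the interpreter statistics above turn on.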

This was borne out by the performance of the /148 assist.  STSC performed
some benchmarks that found the /148 VSAPL to be equal, in average performance 
(commercial time-sharing), to a /168.  The performance loss with the assist
disabled was a factor of 10, while for worst-case large numerical computations
the factor was 40.

The sort of loop described above is exactly where this system excelled.
None of the operations requires an exit from the microcode, so this represents
the top speed of the implementation.  Depending on how you read "blindingly fast"
and "stupidly slow", this system might appear to violate the performance 
metric.  But the bottom line was that it was an impressive APL engine, on
the other end of the spectrum from ANALOGIC.

In fact, this "surprise" was the cause of an enormous computing fiasco right 
here at Cornell some years ago.  Heads rolled!  A clever administrator 
realized that the main campus' /168, which was 10% idle, had spare capacity 
equal to the Med. School's /148, which could then be eliminated.  "All they 
do is run APL all day" he said as they carted the machine off....
                                    
Personally I think that the small sizes of arrays that dominated the 
statistics 10 years ago were symptomatic of poor utilization of the language,
something that might have improved with more generalized definitions (APL2)
and programmer experience.  Perhaps the time is better now for parallel
implementations that stress performance on large aggregate values?

jan prins    {vax135,decvax,ihnp4}!cornell!prins
             prins@svax.cs.cornell.edu
             prins@cornell.csnet
             PRINS@CRNLCS.BITNET