[comp.lang.apl] implementing @:

cs450a03@uc780.umd.edu (04/19/91)

There are various parts of J which are rather slow.  I've been looking
at implementing J myself, and I think I can see a lot about the
implementation from what's slow and what's fast.

Essentially, it looks like J is written in a highly self-referential
fashion.  This is excellent for generality, and is even better for
making the language consistent with itself.  There is a price in
speed, though.

One way to improve speed would be to selectively substitute faster
expressions for overly general expressions.  I haven't explored this
idea much, yet.

Another way to improve speed is to re-do the underlying
implementation.  For example, if you consider J to be a generalized
array language, then one step back would be a generalized vector
language.  Maybe implementing J in terms of only scalar and vector
routines would be a win.  (Presumably, one would use J instructions
limited to rank 1...)
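
To illustrate what I mean (a made-up sketch, not J's actual code): the
real arithmetic would live in rank-1 routines, and higher-rank verbs
would just loop over their cells:

/* Made-up sketch of the "vector language underneath" idea: the only
   real arithmetic lives in a rank-1 routine, and higher-rank verbs
   just loop over their cells. */
static void add_vector(double *z, const double *x, const double *y, long n)
{
    long i;
    for (i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}

/* x + y for two rank-2 arrays of identical shape rows-by-cols */
void add_rank2(double *z, const double *x, const double *y,
               long rows, long cols)
{
    long r;
    for (r = 0; r < rows; r++)
        add_vector(z + r*cols, x + r*cols, y + r*cols, cols);
}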

Finally, you could go whole-hog and implement every J function as a
"leaf" function in the implementation language (or, call system
routines, but nothing else).  This might increase code-size quite a
bit, but might run really fast.
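
At the other extreme, the "leaf" version of a verb would talk to
nothing but memory -- for example (again, just an illustration, not
anything from the actual implementation), reverse on a vector of
floats:

/* Illustration only: |. on a vector of doubles coded as a leaf --
   it calls nothing else in the system, just moves the data. */
void reverse_leaf(double *z, const double *y, long n)
{
    long i;
    for (i = 0; i < n; i++)
        z[i] = y[n - 1 - i];
}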

With that out of the way, I'd like to bring up a tiny algorithm.
Consider  @: v
where v is a numeric vector (not boxed)

I think that what J is doing for this case is building a table of all
permutations of i.#v, and using a transitive (dyadic) i. to find the
row which matches v.  This is nice and concise, but seems slow in the
current incarnation of J.

It's occurred to me that on a typical (scalar) machine, there is a
simple O(n^2) algorithm that should run pretty fast (a rough C sketch
follows the numbered steps):
(1) find the length of v (if 12 or less, assume an integer result; if
170 or less, assume a floating point result -- 12! is the largest
factorial that fits in a 32-bit integer, and 170! is about the largest
that fits in an IEEE double, so a length greater than 170 will quite
possibly overflow)

(2) build a vector of i.#v (call it N), and another vector of 0#~#v
(call it M).

(3) loop over the elements of v (left to right, call the loop index j)
       if (j{v){N is nonnegative, make j{M equal to (j{v){N
                                   then make (j{v){N be _1
                                   then do N =. N - (j{v) < i.#N
       if (j{v){N is negative, v is not a permutation: abort --
                                   result of @: is !#v

(4) result of @: is +/M*!|.i.#M
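
Here's a rough C sketch of steps (1)-(4), just to make the loop
concrete (names are arbitrary, and this is only how I imagine it on a
scalar machine, not anything from the actual J source):

/* Rough C sketch of the n-squared scheme above -- my reading of the
   steps, not anything from the J source.  v is the candidate
   permutation, n its length.  For brevity the result is always a
   double (the post would use an integer result for n <= 12);
   returns -1 on a length or domain problem. */
double anagram_index(const long *v, long n)
{
    long   N[170];          /* current rank of each still-unused value   */
    double M[170];          /* the "reduced representation", filled in 3 */
    double r, f;
    long   j, k, t;

    if (n < 0 || n > 170)   /* step 1: beyond 170, n! overflows a double */
        return -1.0;
    for (j = 0; j < n; j++)
        N[j] = j;           /* step 2: N =. i.#v                         */

    for (j = 0; j < n; j++) {               /* step 3                    */
        t = v[j];
        if (t < 0 || t >= n || N[t] < 0)    /* repeated or out of range  */
            return -1.0;    /* not a permutation (the post suggests !n)  */
        M[j] = (double)N[t];                /* j{M =. (j{v){N            */
        N[t] = -1;                          /* (j{v){N =. _1             */
        for (k = t + 1; k < n; k++)         /* N =. N - (j{v) < i.#N     */
            N[k]--;
    }

    r = 0.0;                                /* step 4: +/ M * !|. i.#M   */
    f = 1.0;                                /* f runs 0!, 1!, 2!, ...    */
    for (j = n - 1; j >= 0; j--) {
        r += M[j] * f;
        f *= (double)(n - j);
    }
    return r;
}

The inner loop over k is where the n-squared comes from; in J it is
the vector expression N =. N - (j{v) < i.#N .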

--------------------

Note that this algorithm would be pretty fast if M could be
constructed in one pass -- that is, updated in place -- rather than
being re-built at every iteration of loop (3).

I suppose that some attention should be paid to "intelligently"
pre-allocating memory for results.  Perhaps catenate could somehow be
told to allocate extra memory for future catenates?  (This would have
to be fairly transparent to the user.)
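
For instance (purely my own sketch of what such pre-allocation might
look like inside an interpreter, with made-up names, not a description
of what J actually does), a result could carry spare capacity and grow
geometrically, so a run of catenates costs amortized constant time per
element:

/* Sketch of a growable result buffer for repeated catenation: keep
   spare capacity and double it when it runs out, so n appended
   elements cost O(n) copying in total.  All names are made up. */
#include <stdlib.h>
#include <string.h>

typedef struct {
    double *data;
    long    len;        /* elements the user can see      */
    long    cap;        /* elements actually allocated    */
} vec;

/* catenate n items onto a, growing the allocation geometrically */
int vec_cat(vec *a, const double *items, long n)
{
    double *p;
    long newcap;

    if (a->len + n > a->cap) {
        newcap = (a->cap > 0) ? a->cap : 8;
        while (newcap < a->len + n)
            newcap *= 2;
        p = (double *)realloc(a->data, newcap * sizeof(double));
        if (p == NULL)
            return -1;                  /* out of memory  */
        a->data = p;
        a->cap  = newcap;
    }
    memcpy(a->data + a->len, items, n * sizeof(double));
    a->len += n;
    return 0;
}

The "transparent to the user" part is just that the shape the language
reports is len, never cap.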

Has anyone thought about this sort of thing?

I think J is losing a lot of time, currently, in catenation.

(Of course, I may be wrong)

Raul Rockwell

rbe@yrloc.ipsa.reuter.COM (Robert Bernecky) (04/21/91)

In article <19APR91.00433381@uc780.umd.edu> cs450a03@uc780.umd.edu writes:
>There are various parts of J which are rather slow.  I've been looking
>
>One way to improve speed would be to selectively substitute faster
>expressions for overly general expressions.  I haven't explored this
>idea much, yet.
>
>Another way to improve speed is to re-do the underlying

Of course, special-casing and embedding support for idiomatic expressions
in functional languages is one way to get lots of speed. In SHARP APL,
we obtained speedups by factors of 5-500 for primitives such as
catenate-with-rank, base-value-with-rank, and so on. Too bad we never
got time to finish that work...

My J compiler is going to have some capabilities in this area -- it has to do
so, if it's going to be successful (fast, that is).

I think Roger's design as it stands is excellent for a prototyping platform:
Radical design changes can be propagated through the entire interpreter
with minimal effort. Once you start bolting special case code into an 
interpreter, the design tends to become frozen to a large extent.

Also, special cases DO take time, and given the limited budget (temporal
and otherwise) under which J is being developed, it is understandable that
correct operation takes precedence over speed.  (I am told that some
other language designers do not share this view...   8^} )

Bob



Robert Bernecky      rbe@yrloc.ipsa.reuter.com  bernecky@itrchq.itrc.on.ca 
Snake Island Research Inc  (416) 368-6944   FAX: (416) 360-4694 
18 Fifth Street, Ward's Island
Toronto, Ontario M5J 2B9 
Canada

hui@yrloc.ipsa.reuter.COM (Roger Hui) (04/22/91)

Raul Rockwell writes:
 
> With that out of the way, I'd like to bring up a tiny algorithm.
> Consider  @: v
> where v is a numeric vector (not boxed)
>
> I think that what J is doing for this case is building a table of all
> permutations of i.#v, and using a transitive i. to find the row which
> matches v.  This is nice and concise, but seems slow in the current
> incarnation of J.

There is **no way** I would implement @:y with an algorithm of order !n (n=.#y).
FYI this is a model of the current implementation of @: , computing the
atomic representation of a permutation p (possibly in nonstandard form):
 
   ord  =. >:@(>./)
   std  =. ((i.@ord -. ]) , ]) @ (ord | ])
   rfd  =. +/@({.>}.)\."1
   base =. (- i.)@ord
   ar   =. base #. rfd@std
 
ord computes the order of a permutation, possibly in nonstandard form.
 
std converts a permutation into standard form.  std is equivalent to
'((i.n)-.n|y.),n|y. [ n=.1+>./y.' : ''
 
rfd computes the reduced representation of a permutation from its
direct (standard) representation.  rfd and its inverse dfr were
written by E.E. McDonnell, Iverson Software PARC:
   dfr =. /:^:2@,/"1
   rfd =. +/@({.>}.)\."1
 
base computes the value n-i.n .
 
ar p computes @:p, the value of the reduced representation of p in the
base (n-i.n) numbering system.
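
For example, ar 2 0 1 should give 4: std leaves 2 0 1 unchanged, rfd
gives the reduced representation 2 0 0, base gives 3 2 1, and
3 2 1 #. 2 0 0 is 2*2 + 0*1 + 0*1, i.e. 4 -- the anagram index of
2 0 1, the fifth permutation of i.3 in lexical order.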

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario  M5S 2K9
(416) 925 6096