[comp.lang.apl] JKT replies to EEM re: histograms

loc@yrloc.ipsa.reuter.COM (Leigh Clayton) (04/03/91)

 The following from Joey Tuttle, also on IPSA mailbox.
--------------------------------------------------------------------


no. 5065406 filed  3.47.05  tue  2 apr 1991
 from jkt
to   eem
cc   jig loc
subj histogram
ref  5065227
expy  3.47.00   2 may 1991

Nice trick -- as we discussed, this is something I have wanted to
do in APL for a long time.

When I tried your algorithm on (b=. ?64 64$466  -- the question off
of Net News that raised the issue) I got a ws full after 25 seconds
with j in a 2 meg partition on a Macintosh CX, not terribly
efficient (sigh).  Another thing I find uncomfortable about your
solution is the order of the result.  It has 0's in it for the
ordinal empty buckets (coming from JOINING the set of all values to
the argument) that makes sense, but the other values are in the
order they appear in the RAVEL of the array.  To make sense of the
result probably requires some sorting....  You can correct this
problem by putting the set of all values first, that and
eliminating the redundant ravel on the left of your expression
leaves

     h =. <: #/.~ (i.x),,y

Which produces a more understandable result.  On the other hand, if
you don't join all possible values to y, then at least the counts
appear in the nub order of y which seems reasonable.

Another advantage to not joining all possible values is that if the
possible number is large (e.g. 2^31, a common number) then it pays
not to count all the empty ones!  Clearly though,

     #/.~,y

is what I have asked for in the past in APL.  By making a special
performance case of (#/.~), a problem which has always been
"inappropriate for APL because of performance" could be solved very
nicely.  I suppose in a practical sense, I would want to associate
the result closely with the nub of y since a "real application"
like analysis of collected data would likely involve a very large
set of data and y would need to be segmented in a loop of some
kind.  Since the algorithm to do this probably generates a nub of
the set, perhaps it should be some varient of nub that returns the
nub and associated count.  Still leaves the interesting problem of
how to combine successive results...



-----------------------------------------------------------
loc@tmsoft.UUCP                     uunet!mnetor!tmsoft!loc
loc@yrloc.ipsa.reuter.COM                   (Leigh Clayton)

cs450a03@uc780.umd.edu (04/05/91)

Joey Tuttle writes:
>
>     #/.~,y
> 
>is what I have asked for in the past in APL.  By making a special
>performance case of (#/.~), a problem which has always been
>"inappropriate for APL because of performance" could be solved very
>nicely.

what's wrong with

    v =. /:, y
    k =. k # i. $k=. 1, (_1}.v) ~: 1}. v
 hist =. k - ($k){. 0,k

here, /:  is O(n log n)  (where n is the length of v)
and everything is is O(n)

Also note that k{v  is the nub of ,y

Yeah, this version will blow up if y is empty, but that is easily
handled (either as a special case, by coercing k, or by choosing a
different algorithm to check adjecency when computing k).

Raul Rockwell