[comp.lang.apl] MODE

ljdickey@water.waterloo.edu (Lee Dickey) (02/24/88)

 A few weeks ago someone posted an article asking about an
 efficient ways to calculate the mode of an array of numbers.
 I guess I'll stick my neck out and suggest this algorithm.
 I don't know if this is best, but I think this works:


	NUB	<-  ( ( X iota X ) =  iota rho X ) /x <- ,x

	FREQ	<-  +/ NUB jot.= X

	Z	<-  1 take ( FREQ = max/ FREQ ) / NUB

 Of course in the last line it would be better to replace the
 "1 take" by "MIDDLE", where MIDDLE is the function that selects
 the middle element of a sorted vector, but I think that is another
 topic...

 Sorry, but I lost the name and address of the original poster, so
 can not send mail.  I would be interested in seeing other solutions.

-- 
 L. J. Dickey, Faculty of Mathematics, University of Waterloo.
	ljdickey@water.waterloo.edu
	ljdickey@watdcs.BITNET
	ljdickey@water.UUCP	...!uunet!water!ljdickey

jaxon@uicsrd.csrd.uiuc.edu (03/01/88)

For a large array, the outer product in the above solution
can be slow.  A solution that makes fewer comparisons is:

$ MODE is MODE V;#CT;FREQ;NUB;T
[1]  #CT is 0   note Need transitivity of =
[2]  V is V[gradeup V]
[3]  NUB is 1,(1 drop V) neq (^1 drop V)
[4]  FREQ is T - ^1 drop 0,T is NUB/iota rho T
[5]  MODE is (FREQ=max/FREQ)/NUB/V
[6]  MODE is MODE[floor .5 times ''rho rho MODE]
$

After obtaining the FREQuency counts in line 4
you might wish to select the MODE a different way,
Line[5]  selects all of them.

Iverson's Window Reduction operator could be used
on lines [3] and [4].

Since gradeup is  o(n log n) this algorithm is
faster in the limit cases, but probably won't
beat Prof. Dickey over short vectors.


P.S.  $ = del,  ^ = high minus,  # = quad.

Greg Jaxon   (jaxon@uicsrd.uiuc.edu)

zeke@fornax.UUCP (Zeke Hoskin) (03/01/88)

The nice thing about outer-product-based mode is that
it lends itself to a one-liner which you can print on
a T shirt and tell people you are dressed in the latest
mode. I guess a shirt is an outer product in its own 
right....
   The bad evil nasty thing is that it is an O(Nsquared)
algorithm. Here is a faster one:
   X <- X[gradeup X <- ,X]
   bool <- X =/= 1 drop X,somevalueotherthangreatestX
   ends <- bool / iota rho bool
   lengths <- ends - minus1 drop 0, ends  (in origin 1)
   pos <- ends [ lengths iota gratest reduc lengths]
   mode <- X[pos]

ljdickey@water.waterloo.edu (Lee Dickey) (03/08/88)

In article <49700001@uicsrd.csrd.uiuc.edu> jaxon@uicsrd.csrd.uiuc.edu writes:
> ...
>For a large array, the outer product in the above solution
>can be slow.  A solution that makes fewer comparisons is:
> ...

I think that Greg is right, as is David Lauer, who observed that

	..."+/NUB jot.= X" can produce an
	immensely large intermediate array...

David's solution, quite similar to Greg's is :

 Z <- MODE X
 X <- X[gradeup X]            # SORT X
 I <- X not_eq 1 shift X      # FIND BOUNDARIES BETWEEN GROUPS OF LIKE NUMBERS
 I <- ( neg 1 drop I),1       # DEAL WITH EDGE CONDITIONS
 NUB <- I/X                   # FIND UNIQUE NUMBERS
 H <- I/ iota rho I           # 
 H <- H-0, neg 1 drop H       # FIND LENGTH OF RUN (IE: FREQUENCY)
 I <- H iota (max /H)         # FIND MOST FREQUENT (SMALLEST IF SEVERAL)
 Z <- NUB[I]                  # LOOK-UP MOST FREQUENT X VALUE

David's address is: David Laur <DMLAUR@PUCC.bitnet>

-- 
 L. J. Dickey, Faculty of Mathematics, University of Waterloo.
	ljdickey@waterloo.edu
	ljdickey@watdcs.BITNET
	ljdickey@water.UUCP	...!uunet!water!ljdickey