[comp.lang.apl] Statistical Functions in J

hui@yrloc.ipsa.reuter.COM (Roger Hui) (05/12/91)

The definitions of mean (m.), normalize (n.), and spread (s.) have
changed in response to a detailed critique by Professor Fraser Jackson
of the Victoria University of Wellington (uunet!matai.vuw.ac.nz!jackson).
Professor Jackson's comments are as follows (minor editing by me):
 
--------------------------------------------------------------------
 
The concept of expectation is central to both probability and
statistics.  Indeed it is quite possible to axiomatise
probability theory in terms of expectations rather than
probabilities.  One didactic advantage of doing so is that the
whole apparatus of measure theory essential for an
axiomatisation based on probabilities can be introduced much
later.  From the classical viewpoint, P. Whittle, Probability,
Penguin, 1970?  (To appear in a revised edition by Springer
Verlag) gives an illustration of this alternative approach in
a text designed to cover the subject from first principles.
He illustrates numerous results from quantum theory to
investment analysis where the expectation is the natural
function to use.  From the subjective probability perspective,
de Finetti's classic published by Wiley also provides an
axiomatisation based on expectation.
 
m.
 
I believe therefore that the function  m.  should be the
expectation.   This would give
 
     m.  y        as the usual mean
     x  m.  y     as the mean formed with weights given
                  by the vector  x%+/x
 
Hence the monadic case would just be  1 m. y
 
This definition permits a very simple treatment of something
which appears at the beginning of virtually all statistical
texts - frequency distributions.
 
For a frequency distribution with frequencies f and values y
this gives  (in expressions which I am sure can be written
more simply)
 
     the mean as   f m. y
     deviations from the mean   y  -  f m. y
     the mean squared deviation   f m. *: y - f m. y
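 
A small numerical sketch (with made-up frequencies and values) may help:
 
     f =. 2 3 5                NB. hypothetical frequencies
     y =. 1 2 4                NB. hypothetical values
     f m. y                    NB. (+/2 3 5*1 2 4)%10 , i.e. 2.8
     y - f m. y                NB. _1.8 _0.8 1.2
     f m. *: y - f m. y        NB. mean squared deviation, 1.56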
 
In doing calculations with density functions the weighted mean
is always required to find statistics of the distribution, and
if there is any transformation of the variable then that also
requires use of a weighted mean calculation to find parameters of
the transformed distribution.  Since variable transformation is
often required the function proves very useful in numerical studies.
 
The expectation of a function g of y under a density  p y
becomes just  (p y) m. g y  or better  (p m. g) y
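 
For example, with the density given as a list of probabilities
(hypothetical values),
 
     y =. 0 1 2 3              NB. support points
     p =. 0.1 0.2 0.3 0.4      NB. density at those points
     g =. *:                   NB. g is the square
     p m. g y                  NB. expectation of g of Y , i.e. 5
 
and with p expressed as a verb, the fork  (p m. g) y  gives the
same result.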
 
Using an obvious notation, the two commonest price index
numbers become
 
     laspeyres      (p0*q0) m. p1%p0
     paasche        (p0*q1) m. p1%p0
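 
For instance, with hypothetical prices and quantities,
 
     p0 =. 10 20               NB. base-period prices
     p1 =. 12 22               NB. current-period prices
     q0 =. 5 2                 NB. base-period quantities
     q1 =. 4 3                 NB. current-period quantities
     (p0*q0) m. p1%p0          NB. Laspeyres:  (+/p1*q0)%+/p0*q0 , about 1.156
     (p0*q1) m. p1%p0          NB. Paasche:    (+/p1*q1)%+/p0*q1 , i.e. 1.14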
 
In the Dictionary, the concept of using the left argument for
the degrees of freedom is introduced.  The degrees of freedom
are generally only important in second moment calculations
with unweighted data.  It seems to me that this gives a special
case unwarranted importance.  The degrees of freedom are
simply corrected for using a multiplier.  In the simplest case
it is just  (#%(_1&+&#)) y  , but a whole range of other cases
are equally simply dealt with.
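 
As a sketch in terms of the proposed  m.  (the name uvar is mine),
the unbiased variance is then just
 
     uvar =. (# % _1&+&#) * m.@*:@(- m.)   NB. (n%n-1) times the mean squared deviation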
 
I note that Table 1 uses  moment  as the description of the
dyadic case of  m.  If that represents an earlier (or later)
idea, it again seems to me inappropriate.  The moments are
highly specialised means and best dealt with using the mean of
some transformation of the variable, e.g. in the monadic case
 
     m. (y - m. y)^4
 
with the dyadic case
 
     f  m. ( y - f m. y)^4
 
n.
 
I would define  n.  to be consistent with  m.  Hence the
monadic case would be with divisor  #y, and the dyadic case
would give normalised deviations from a weighted mean.  Again
if there is a need to adjust degrees of freedom for a
particular case I would prefer to see a multiplier used since
in many cases items have non-integer weights.  Alternatively,
a similar procedure could be adopted as with the fit
conjunction, this time however using a degrees of freedom
conjunction.  That solution could, I believe, give the features
you want with the added flexibility and power permitted by
generalising to weighted means.  The DF conjunction would then
only be needed in cases where the data were individual
observations and the adjustment was required for a particular
statistic or calculation.
 
The Dictionary definition of the dyadic case seems to me to
give an expression that is almost never used, since apart from
squared deviations we seldom use a mean with a divisor less than
#y.  And even in the squared-deviations case, it is not common,
in any calculations I have had occasion to use, to normalise the
squared values.
 
s.
 
I believe a case can be made here for two functions.  The
first would give a measure of spread for the variables
considered individually, and be defined across arbitrary arrays.  
For that I would prefer either the sum of squared deviations or 
the mean squared deviation on the grounds noted below.
 
The second is a function which generates information about the
covariation of the variables in each item of the data object.
If there is to be just one function, I would favour this, in
view of the importance of many areas of multi-variate analysis.
 
The standard deviation is a useful measure but very
specialised to the univariate case.  Further the standard
deviation is not an additive measure like the variance (using
it now with divisor #y) and the sum of squares.  Variances
can be simply combined, using the m. function above.  (For a
population the variance is just the weighted mean of the
variances if the population consists of subsets with
different variances.)  The standard deviation needs to be
squared before it can be combined.  As soon as you move to
multivariate statistics it is the covariance matrix which is
the basic tool, not the correlation matrix or the variable
standard deviations.  In multivariate analyses it has been my
experience that the speed of formation of a cross product
matrix  X'X  is crucial.  This is so important that it deserves
to be a special function and programmed for speed of execution.
So my primitive function would be to form  (|:y) +/ . * y  in the
monadic case.  A case can be made for using the mean value of
the outer products of the items of y for consistency with the
previous definitions though I do not find that compelling.
The avoidance of centering within the function is deliberate.
Nearly all of the multivariate texts and the works on regression
express the theory in terms of the original observation units.
The case where the data are deviations from the mean then
becomes a special case.  With the  m.  primitive function it is
easily treated as just that.
 
The conventional covariance matrix then is
 
     (#y)%~(|:yd) +/ . * yd =. y - m. y
 
but for the occasions where that is required, if  s.  is the
cross product verb it is simple enough to enter
 
     (#y)%~ s. y - m. y
 
Using the normalised values we obtain the correlation matrix as
 
     (#y)%~ s. n. y
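 
These could be packaged as verbs (the names cov and corr are mine):
 
     cov  =. # %~ s.@(- m.)    NB. covariance matrix:   (#y)%~ s. y - m. y
     corr =. # %~ s.@n.        NB. correlation matrix:  (#y)%~ s. n. y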
 
The choice of dyadic form is more difficult.  I believe that
consistency with the previous definitions would lead to choice
of the dyadic form  x s. y  as weighting the items in y by
%:x  where x is a vector of weights.  In this way each item in
y is given a weight in the crossproduct matrix equal to the
corresponding item in x.  The weights are not normalised to sum to 1
since in the applications a range of normalisations will be used.
 
This provides a very powerful primitive function, encompassing
all of the weighted regression models, and the powerful tools of
generalised least squares as outlined by McCullagh and Nelder in
'Generalized Linear Models' (2nd ed.), Chapman and Hall, 1989.
It would also provide a primitive function for many methods based
on weighted observations in Econometrics, in particular methods
due to H. White of constructing consistent estimators in the
common case when there are heteroscedastic errors.  It would also
be a core function in those methods of non-linear optimisation
designed to find the extreme of a likelihood function.
 
An alternative would be to define it as forming the expression
(|:x)+/ . * y  since this is also often required.  This would
be of value if the code for it were much faster than using
the general expression above.
 
I prefer the definition which preserves consistency with
the concept of weighted values.  It is a more natural
extension, consistent with the usage in the proposed definitions
of m. and n.  By introducing a weighting function it significantly
generalises the monadic operation.  Second, and less important
from a language design perspective (but
perhaps as important for potential users), it is rather messy
to program otherwise, and exploiting the special structure of
the function could lead to important speed gains in execution.
 
I recognise that speed of execution has not been a primary
factor in your design, but it is an important consideration,
especially with the data intensive calculations likely in
multi-variate studies.  If the functions selected contribute
both to ease of statement of the problem, and fast execution
then they seem to me to have a strong case for inclusion.
 
Looking at these three functions now as a group, and
endeavouring to put them into Dictionary style --
 
m.
 
MEAN (_)  m. y  is the mean of the values of y, and is equivalent to  1 m. y
MEAN (_ _)  x m. y  is the weighted mean of the values of y
with weights given by x.  The weights will be normalised to
sum to one in forming the weighted mean.
 
(It may be desirable to restrict the dimensionality of the
arguments to (1 _) though the general case is often useful.)
 
n.
 
NORMALIZE (_)  The result of  n. y  is y translated and scaled to
have a mean of zero and a mean squared deviation of 1.  It is
defined as  1 n. y  .
 
NORMALIZE (_ _)  x n. y  provides normalised values of y
which, with weights x, have weighted mean zero and weighted mean
squared deviation equal to 1.  The weights x will be
normalised to sum to 1.  The result of  x n. y  is therefore
(y-x m. y)% %:x m.*:y-x m. y   Thus  x m. x n. y  is 0
and  x m. *: x n. y  is 1.  In the case where the weights are
frequencies and it is desired to standardise so that the variance
computed with divisor  (+/x)-1  is 1, the values must be
multiplied by  %:((+/x)-1)%+/x
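 
For example, with hypothetical frequencies and values,
 
     x =. 2 3 5
     y =. 1 2 4
     x m. x n. y               NB. 0 , up to rounding
     x m. *: x n. y            NB. 1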
 
(Again a restriction on the arguments to (1 _) could be
considered.)
 
s.
 
SPREAD (2)  s. y  is the crossproduct matrix of the values in
y and equal to  (|:y)+/ . * y  .  Consistent with the usage of
the other statistical functions,  s. y  is  1 s. y  .  The
covariance matrix is just  (#y)%~ s. y - m. y   and the
correlation matrix is  (#y)%~ s. n. y
 
SPREAD (1 2)  x s. y  is the crossproduct matrix
s. (%:x)*"0 1 y .  Note that the weights are not normalised
to sum to one in this case.  The dyadic form weights the outer
product of each item of y by the corresponding item in x.
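 
For integer weights this agrees with simply copying items of y
(a hypothetical check):
 
     y =. 3 2 $ 1 2 3 5 0 1    NB. 3 items of 2 variables
     x =. 2 1 3                NB. integer weights
     (x s. y) -: s. x # y      NB. 1 : weighting items matches replicating them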
 
----------------------------------------------------------
 
The new definitions have been implemented in J 3.1.  They are
modelled thus:
 
rx  =. ($@$@[ <@}. $@]) $&> [
ry  =. ($@$@] <@}. $@[) $&> ]
m2  =. (rx * ry) %&(+/) rx        NB. weighted mean:  (+/rx*ry) % +/rx
m1  =. +/ % #                     NB. arithmetic mean:  sum divided by count

dev =. ry - m.                    NB. deviations from the (weighted) mean
rms =. [ %:@m. *:@dev             NB. root of the (weighted) mean squared deviation
n2  =. dev % rms                  NB. weighted normalize
n1  =. 1&n2                       NB. monadic normalize is  1 n. y

s1  =. |: +/ . * ]                NB. cross product matrix  (|:y) +/ . * y
s2  =. s1@(%:@[ *"0 1 ])          NB. weight items of y by %:x before forming cross products
 
m.  is   m1 : m2
n.  is   n1 : n2
s.  is   s1 : s2 " 2 1 2

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario  M5S 2K9
(416) 925 6096

louk@tslwat.UUCP (Lou Kates) (05/19/91)

In article <1991May12.145907.19563@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:
>The definitions of mean (m.), normalize (n.), and spread (s.) have
>changed in response to a detailed critique by Professor Fraser Jackson
>of the Victoria University of Wellington (uunet!matai.vuw.ac.nz!jackson).
>Professor Jackson's comments are as follows (minor editing by me):
> 
>--------------------------------------------------------------------
> 
>The concept of expectation is central to both probability and
>statistics.  Indeed it is quite possible to axiomatise
>probability theory in terms of expectations rather than
>probabilities.  One didactic advantage of doing so is that the
>whole apparatus of measure theory essential for an
>axiomatisation based on probabilities can be introduced much
>later.  

Nevertheless, I believe the key  concept here is not expectation,
probability or  measure  but regression and projection. From this
viewpoint the old APL's domino  operator (or regression operator)
had it correct and the above suggestions are a step backwards.

The mean    of  a  vector  V   is the  regression coefficient  of
projecting  V onto a vector  of all ones. The space orthogonal to
this vector of ones is the deviation space, and the length of the
projection of V onto this deviation space, divided by the square
root of the dimensionality of the deviation space (which is the
length of V minus one), is the standard deviation.
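 
In J terms (a sketch, building the column of ones explicitly):
 
     V =. 2 3 5 10             NB. hypothetical data
     ones =. ((#V),1) $ 1      NB. column of ones
     V %. ones                 NB. regression of V on ones: the mean, 5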

This fits    in  with  the   geometric  interpretation of  linear
statistical  methods that is commonly taught to statisticians and
is  the standard  way   of visualizing and unifying a variety  of
statistical techniques including regression, analysis of variance
and time series analysis.

Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
519-725-0646

hui@yrloc.ipsa.reuter.COM (Roger Hui) (05/21/91)

Lou Kates writes:

> Nevertheless, I believe the key  concept here is not expectation,
> probability or  measure  but regression and projection. From this
> viewpoint the old APL's domino  operator (or regression operator)
> had it correct and the above suggestions are a step backwards.
 
> The mean    of  a  vector  V   is the  regression coefficient  of
> projecting  V onto a vector  of all ones. The space orthogonal to
> this vector of ones is the deviation space, and the length of the
> projection of V onto this deviation space, divided by the square
> root of the dimensionality of the deviation space (which is the
> length of V minus one), is the standard deviation.
 
I am puzzled.  How would a belief in regression and projection
instead of expectation as the key concept materially affect
the design of the primitives m., n., and s.?
 
There are many statistical functions worthy of inclusion in the
language.  We are adding the new functions mean, normalization, 
and cross product matrix; and assigning the cognate weighted mean, 
weighted normalization, and weighted cross product to the dyads 
of these functions seems both consistent and useful.  These new 
functions do not preclude adding other statistical functions in 
the future, nor do they affect the domino function or any other 
existing function.  Therefore, I would not characterize them 
as a backward step.

We would welcome suggestions for other statistical functions.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario  M5S 2K9
(416) 925 6096

sam@kalessin.jpl.nasa.gov (Sam Sirlin) (05/21/91)

In article <1991May21.042804.21102@yrloc.ipsa.reuter.COM>, hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:

|> We would welcome suggestions for other statistical functions.
|> 

I can calculate means and standard deviations pretty easily in APL or J.
On the other hand, what about FFT's or PSD's? These would have applicability
beyond just statistics. Of course once linkc (sp?) is available we can all
build our own. It seems to me that APL was a pioneer in including solutions
to linear equations (domino) in the language, but that now the standard is
somewhat higher (Matlab, Mathematica, ...) and includes EISPACK, LINPACK, 
and FFT's. Of course these make the code size grow...

-- 
Sam Sirlin
Jet Propulsion Laboratory         sam@kalessin.jpl.nasa.gov

weg@convx1.ccit.arizona.edu (Eythan Weg) (05/22/91)

In article <1991May21.042804.21102@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:


   Lou Kates writes:

   > Nevertheless, I believe the key  concept here is not expectation,
   > probability or  measure  but regression and projection. From this
   > viewpoint the old APL's domino  operator (or regression operator)
   > had it correct and the above suggestions are a step backwards.

   > The mean    of  a  vector  V   is the  regression coefficient  of
   > projecting  V onto a vector  of all ones. The space orthogonal to
   > this vector of ones is the deviation space, and the length of the
   > projection of V onto this deviation space, divided by the square
   > root of the dimensionality of the deviation space (which is the
   > length of V minus one), is the standard deviation.

   I am puzzled.  How would a belief in regression and projection
   instead of expectation as the key concept materially affect
   the design of the primitives m., n., and s.?

   There are many statistical functions worthy of inclusion in the
   language.  We are adding the new functions mean, normalization, 
   and cross product matrix; and assigning the cognate weighted mean, 
   weighted normalization, and weighted cross product to the dyads 
   of these functions seems both consistent and useful.  These new 
   functions do not preclude adding other statistical functions in 
   the future, nor do they affect the domino function or any other 
   existing function.  Therefore, I would not characterize them 
   as a backward step.

   We would welcome suggestions for other statistical functions.

I do not want to see J as a statistical package.  But I would agree
that the mean and its cousins are basic enough to be included in
some form.  Regression is, after all, a derived concept for which
expectation is needed to set up the appropriate model.

But since we are invited to make suggestions, I will voice my desire
for generalized means etc. that are robust versions of these, where
you obtain the regular concepts as defaults.  A generalization of this
to general robust regression would be welcome, but I am not sure it
can be done in the spirit of J.

Eythan

hui@yrloc.ipsa.reuter.COM (Roger Hui) (05/22/91)

Sam Sirlin writes:

>I can calculate means and standard deviations pretty easily in APL or J.
>On the other hand, what about FFT's or PSD's? These would have applicability
>beyond just statistics. Of course once linkc (sp?) is available we can all
>build our own. It seems to me that APL was a pioneer in including solutions
>to linear equations (domino) in the language, but that now the standard is
>somewhat higher (Matlab, Mathematica, ...) and includes EISPACK, LINPACK, 
>and FFT's. Of course these make the code size grow...

Yes, LinkJ is the preferred means for access to FFT, LP, and other more
specialized routines.  The advantages are twofold:  it avoids duplicating
the excellent work of others, and it avoids having to support the facility
forever more.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario  M5S 2K9
(416) 925 6096

louk@tslwat.UUCP (Lou Kates) (05/29/91)

In article <1991May21.042804.21102@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:
>Lou Kates writes:
>
>> Nevertheless, I believe the key  concept here is not expectation,
>> probability or  measure  but regression and projection. From this
>> viewpoint the old APL's domino  operator (or regression operator)
>> had it correct and the above suggestions are a step backwards.
> 
>I am puzzled.  How would a belief in regression and projection
>instead of expectation as the key concept materially affect
>the design of the primitives m., n., and s.?
                   ^^^^^^^^^^
When everything you can think of gets put into the language it's hard
to see how they all qualify as primitive.

> 
>There are many statistical functions worthy of inclusion in the
>language.  We are adding the new functions ...

I guess it's whether you believe in parsimony or not.

Personally I   would   rather have   a wider  family of  powerful
operations at my disposal (such as regression, constrained linear
optimization a la simplex  or Karmarkar, eigenvalue calculations,
etc.) that are not easily derivable from each other rather than a
large  set  of functions which are all readily derivable from the
ideas of regression  and   projection.  My own preference, and  I
suspect that of many others too, would be  that if   you feel the
need to  have zillions of functions at  your  disposal,  define a
standard library so that you can take them out of the language so
as to keep  the   language  smaller  and more    manageable.   If
performance is the  issue   then   there  is  nothing  to  stop a
particular  implementation  from implementing  certain   standard
library functions in the kernel.

Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
519-725-0646

dave@visual1.jhuapl.edu (Dave Weintraub) (05/30/91)

In article <414@tslwat.UUCP>, louk@tslwat.UUCP (Lou Kates) writes:
|> ...
|> 
|> Personally I   would   rather have   a wider  family of  powerful
|> operations at my disposal (such as regression, constrained linear
|> optimization a la simplex  or Karmarkar, eigenvalue calculations,
|> etc.) that are not easily derivable from each other rather than a
|> large  set  of functions which are all readily derivable from the
|> ideas of regression  and   projection.  My own preference, and  I
|> suspect that of many others too, would be  that if   you feel the
|> need to  have zillions of functions at  your  disposal,  define a
|> standard library so that you can take them out of the language so
|> as to keep  the   language  smaller  and more    manageable.   If
|> performance is the  issue   then   there  is  nothing  to  stop a
|> particular  implementation  from implementing  certain   standard
|> library functions in the kernel.
|> 
|> Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
|> 519-725-0646
I fully agree.  This is the path taken by IBM with APL2: write external
functions (in Assembler, FORTRAN, PL/I,  C, ...) and make these available
using QuadNA.

sam@kalessin.jpl.nasa.gov (Sam Sirlin) (05/30/91)

In article <1991May29.191441.1279@aplcen.apl.jhu.edu>, dave@visual1.jhuapl.edu (Dave Weintraub) writes:

|> I fully agree.  This is the path taken by IBM with APL2: write external
|> functions (in Assembler, FORTRAN, PL/I,  C, ...) and make these available
|> using QuadNA.

An interesting path in this vein is the path taken by ProMatlab. It uses the
dynamic linking capabilities of modern machines. Hence a variety of compiled
routines are available (and more can be written by the user) that are easily
linked in while running the Matlab interpreter, simply by invoking the program 
by name (Matlab then searches for the right sort of file and then does the
dynamic link). Using this approach, the kitchen sink doesn't have to be in 
the code for everyone, but standard compiled routines are available for those 
who need them at practically no overhead.

-- 
Sam Sirlin
Jet Propulsion Laboratory         sam@kalessin.jpl.nasa.gov

hui@yrloc.ipsa.reuter.COM (Roger Hui) (06/01/91)

Dave Weintraub writes:
 
> I fully agree.  This is the path taken by IBM with APL2: write external
> functions (in Assembler, FORTRAN, PL/I,  C, ...) and make these available
> using QuadNA.
 
Dave, this is in fact possible with LinkJ.  LinkJ permits calling
J from C and C from J.  C functions so introduced behave like primitive
verbs, in the sense that they may be assigned names, and (independently)
may serve as arguments to adverbs and conjunctions.  For example,
 
   f =. 10!:57     NB. f is an external function encoded by 57
   f"r y           NB. Apply f to rank r cells of y
   x f"r y         NB. Apply f to rank r cells of x and y
   f/y             NB. f insertion ("reduction")
   x f/y           NB. f table ("outer product")
   10!:362"r y     NB. Apply the external fn encoded by 362 to rank r cells
 
 
Sam Sirlin writes:
 
> An interesting path in this vein is the path taken by ProMatlab. It uses the
> dynamic linking capabilities of modern machines. Hence a variety of compiled
> routines are available (and more can be written by the user) that are easily
> linked in while running the Matlab interpreter,  ...
 
Sam, this is possible with LinkJ on systems supporting dynamic linking.
 
 
Lou Kates writes:
 
> >I am puzzled.  How would a belief in regression and projection
> >instead of expectation as the key concept materially affect
> >the design of the primitives m., n., and s.?
>                    ^^^^^^^^^^
> When everything you can think of gets put into the language it's hard
> to see how they all qualify as primitive.
>   ...
> I guess it's whether you believe in parsimony or not.
>
> Personally I   would   rather have   a wider  family of  powerful
> operations at my disposal (such as regression, constrained linear
> optimization a la simplex  or Karmarkar, eigenvalue calculations,
> etc.) that are not easily derivable from each other rather than a
> large  set  of functions which are all readily derivable from the
> ideas of regression  and   projection.  My own preference, and  I
> suspect that of many others too, would be  that if   you feel the
> need to  have zillions of functions at  your  disposal,  define a
> standard library so that you can take them out of the language so
> as to keep  the   language  smaller  and more    manageable.
>   ...
 
It may be helpful to consult your copy of the Dictionary.  In it,
you'd find that (a) not everything you can think of is in the language;
(b) we do not in fact have zillions of functions in the language;
(c) the ISO APL %. (matrix inverse and matrix divide, what you called
regression) is in the language; (d) characteristic values and vectors
are in the language.
 
Many primitives can be derived readily from each other.  One could
argue that *: (nand) makes the other boolean functions redundant, as do
- (minus) and % (divide) for + (plus) and * (times).  Similarly, one can
derive |. (reverse), |: (transpose), u;.n (cut), u . v (generalized
determinant), %. (domino), and so forth, from { (from).
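 
For instance, one formulation of reverse in terms of from is
 
     rev =. i.@-@# { ]              NB. reverse derived from { (from)
     (rev 2 3 5 7) -: |. 2 3 5 7    NB. 1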
 
There is more to parsimony than reducing the number of primitives.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario  M5S 2K9
(416) 925 6096