hui@yrloc.ipsa.reuter.COM (Roger Hui) (05/12/91)
The definitions of mean (m.), normalize (n.), and spread (s.) have
changed in response to a detailed critique by Professor Fraser Jackson
of the Victoria University of Wellington (uunet!matai.vuw.ac.nz!jackson).
Professor Jackson's comments are as follows (minor editing by me):
--------------------------------------------------------------------
The concept of expectation is central to both probability and
statistics. Indeed it is quite possible to axiomatise
probability theory in terms of expectations rather than
probabilities. One didactic advantage of doing so is that the
whole apparatus of measure theory essential for an
axiomatisation based on probabilities can be introduced much
later. From the classical viewpoint, P. Whittle, Probability,
Penguin, 1970? (To appear in a revised edition by Springer
Verlag) gives an illustration of this alternative approach in
a text designed to cover the subject from first principles.
He illustrates numerous results, from quantum theory to
investment analysis, where the expectation is the natural
function to use.
de Finetti's classic published by Wiley also provides an
axiomatisation based on expectation.
m.
I believe therefore that the function m. should be the
expectation. This would give
m. y as the usual mean
x m. y as the mean formed with weights given
by the vector x%+/x
Hence the monadic case would just be 1 m. y
This definition permits a very simple treatment of something
which appears at the beginning of virtually all statistical
texts - frequency distributions.
For a frequency distribution with frequencies f and values y
this gives (in expressions which I am sure can be written
more simply)
the mean as f m. y
deviations from the mean y - f m. y
the mean squared deviation f m. *: y - f m. y
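For concreteness, a small worked sketch (editor's illustration; the
primitive is not assumed here, and wmean below is a stand-in for the
proposed dyadic m.):

   f =. 2 3 5                   NB. hypothetical frequencies
   y =. 1 2 3                   NB. hypothetical values
   wmean =. +/@:* % +/@[        NB. model of the proposed dyad: (+/x*y) % +/x
   f wmean y                    NB. the mean
2.3
   f wmean *: y - f wmean y     NB. the mean squared deviation
0.61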
In doing calculations with density functions the weighted mean
is always required to find statistics of the distribution, and
if there is any transformation of the variable then that also
requires use of a weighted mean calculation to find parameters of
the transformed distribution. Since variable transformation is
often required, the function proves very useful in numerical studies.
The expectation of a function g of y under a density p y
becomes just (p y) m. g y or better (p m. g) y
Using an obvious notation, the two commonest price index
numbers become
laspeyres (p0*q0) m. p1%p0
paasche (p0*q1) m. p1%p0
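A small numeric check with made-up prices and quantities (editor's
sketch; wmean again stands in for the proposed dyadic m.):

   p0 =. 2 4
   q0 =. 10 5
   p1 =. 3 5
   q1 =. 8 6
   wmean =. +/@:* % +/@[
   (p0*q0) wmean p1%p0          NB. Laspeyres: (+/p1*q0) % +/p0*q0
1.375
   (p0*q1) wmean p1%p0          NB. Paasche:   (+/p1*q1) % +/p0*q1
1.35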
In the Dictionary, the concept of using the left argument for
the degrees of freedom is introduced. The degrees of freedom
are generally only important in second moment calculations
with unweighted data. It seems to me that this gives a special
case unwarranted importance. The degrees of freedom are
simply corrected for using a multiplier. In the simplest case
it is just (#%(_1&+&#)) y , but a whole range of other cases
are equally simply dealt with.
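For example (editor's sketch), the usual correction from divisor #y
to divisor (#y)-1 is just such a multiplier applied to the mean
squared deviation:

   y =. 1 2 3
   msd =. (+/ % #)@:*:@(- +/ % #)   NB. mean squared deviation, divisor #y
   msd y
0.666667
   ((#%(_1&+&#)) y) * msd y         NB. multiplier n%n-1 gives the usual sample variance
1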
I note that Table 1 uses moment as the description of the
dyadic case of m. If that represents an earlier (or later)
idea, it again seems to me inappropriate. The moments are
highly specialised means and best dealt with using the mean of
some transformation of the variable, e.g. in the monadic case
m. (y - m. y)^4
with the dyadic case
f m. ( y - f m. y)^4
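Continuing the small example above (editor's sketch, with wmean again
standing in for the proposed dyadic m.):

   f =. 2 3 5
   y =. 1 2 3
   wmean =. +/@:* % +/@[
   f wmean (y - f wmean y)^4    NB. weighted fourth central moment
0.6937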
n.
I would define n. to be consistent with m. Hence the
monadic case would be with divisor #y, and the dyadic case
would give normalised deviations from a weighted mean. Again
if there is a need to adjust degrees of freedom for a
particular case I would prefer to see a multiplier used since
in many cases items have non-integer weights. Alternatively,
a similar procedure could be adopted as with the fit
conjunction, this time however using a degrees of freedom
conjunction. That solution could give I believe the features
you want with the added flexibility and power permitted by
generalising to weighted means. The DF conjunction would then
only be needed in cases where the data were individual
observations and the adjustment was required for a particular
statistic or calculation.
The Dictionary definition of the dyadic case seems to me to
give an expression that is almost never used, since apart from
squared deviations we seldom use a mean with divisor less than
#y. And even in the squared deviations case, it is not then
common to normalise the squared values in any calculation I
have had occasion to use.
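For concreteness, a minimal model of the proposed monadic case with
divisor #y (editor's sketch, not the primitive itself):

   norm =. (- +/ % #) % %:@(+/ % #)@:*:@(- +/ % #)
   norm 1 2 3                   NB. deviations divided by the root mean squared deviation
_1.22474 0 1.22474
   (+/ % #) *: norm 1 2 3       NB. mean squared deviation of the result is 1
1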
s.
I believe a case can be made here for two functions. The
first would give a measure of spread for the variables
considered individually, and be defined across arbitrary arrays.
For that I would prefer either the sum of squared deviations or
the mean squared deviation on the grounds noted below.
The second is a function which generates information about the
covariation of the variables in each item of the data object.
If there is to be just one function, I would favour this, in
view of the importance of many areas of multi-variate analysis.
The standard deviation is a useful measure but very
specialised to the univariate case. Further, the standard
deviation is not an additive measure like the variance (using
it now with divisor #y) and the sum of squares. Variances
can be simply combined, using the m. function above. (For a
population consisting of subsets with a common mean, the overall
variance is just the weighted mean of the subset variances.)
The standard deviation needs to be
squared before it can be combined. As soon as you move to
multivariate statistics it is the covariance matrix which is
the basic tool, not the correlation matrix or the variable
standard deviations. In multivariate analyses it has been my
experience that the speed of formation of a cross product
matrix X'X is crucial. This is so important that it deserves
to be a special function and programmed for speed of execution.
So my primitive function would be to form (|:y) +/ . * y in the
monadic case. A case can be made for using the mean value of
the outer products of the items of y for consistency with the
previous definitions though I do not find that compelling.
The avoidance of centering within the function is deliberate.
Nearly all of the multivariate texts and the works on regression
express the theory in terms of the original observation units.
The case where the data are deviations from the mean then
becomes a special case. With the m. primitive function it is
easily treated as just that.
The conventional covariance matrix then is
(#y)%~(|:yd) +/ . * yd =. y - m. y
but for the occasions where that is required, if s. is the
cross product verb it is simple enough to enter
(#y)%~ s. y - m. y
Using the normalised values we obtain the correlation matrix as
(#y)%~ s. n. y
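For example, written with ordinary J idioms in place of the proposed
primitives (editor's sketch on made-up data):

   y  =. 4 2 $ 1 2  2 4  3 5  4 9   NB. four items (rows) of two variables
   yd =. y -"1 (+/ % #) y           NB. deviations from the column means
   (#y) %~ (|:yd) +/ . * yd         NB. the covariance matrix
1.25 2.75
2.75  6.5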
The choice of dyadic form is more difficult. I believe that
consistency with the previous definitions would lead to choice
of the dyadic form x s. y as weighting the items in y by
%:x where x is a vector of weights. In this way each item in
y is given a weight in the crossproduct matrix equal to the
corresponding item in x. The weights are not normalised to sum to 1
since in the applications a range of normalisations will be used.
This provides a very powerful primitive function, encompassing
all of the weighted regression models, and the powerful tools of
generalised least squares as outlined by McCullagh and Nelder in
'Generalised Linear Models' (2nd ed) Chapman and Hall, 1989.
It would also provide a primitive function for many methods based
on weighted observations in Econometrics, in particular methods
due to H. White of constructing consistent estimators in the
common case when there are heteroscedastic errors. It would also
be a core function in those methods of non-linear optimisation
designed to find the extreme of a likelihood function.
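As an editor's sketch of the connection (hypothetical names X, yv, w;
the existing %. primitive, like APL's domino, returns a least-squares
solution), square-root weighting reduces weighted least squares to
ordinary least squares, which is precisely the weighting the proposed
x s. y builds into the crossproduct:

   X  =. 4 2 $ 1 1  1 2  1 3  1 4   NB. hypothetical design matrix (intercept and one regressor)
   yv =. 1 3 2 5                    NB. hypothetical responses
   w  =. 1 1 4 4                    NB. hypothetical observation weights
   s  =. %: w
   (s * yv) %. s *"0 1 X            NB. b minimising +/ w * *: yv - X +/ . * b
_0.910112 1.32584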
An alternative would be to define it as forming the expression
(|:x)+/ . * y since this is also often required. This would
be of value if the code for it were much faster than that for
the general expression above.
I prefer the definition which preserves consistency with
the concept of weighted values. It is a more natural
extension, consistent with the usage in the proposed definitions
of m. and n. First, by introducing a weighting function it
significantly generalises the monadic operation. Second, and
less important from a language design perspective (but
perhaps as important for potential users), it is rather messy
to program otherwise, and exploiting the special structure of
the function could lead to important speed gains in execution.
I recognise that speed of execution has not been a primary
factor in your design, but it is an important consideration,
especially with the data intensive calculations likely in
multi-variate studies. If the functions selected contribute
both to ease of statement of the problem and to fast execution,
then they seem to me to have a strong case for inclusion.
Looking at these three functions now as a group, and
endeavouring to put them into Dictionary style --
m.
MEAN (_) m. y is the mean value of y, and is equivalent to 1 m. y
MEAN (_ _) x m. y is the weighted mean of the values of y
with weights given by x. The weights will be normalised to
sum to one in forming the weighted mean.
(It may be desirable to restrict the dimensionality of the
arguments to (1 _) though the general case is often useful.)
n.
NORMALIZE (_) The result of n. y is y translated and scaled
to have a mean of zero and a mean squared deviation of 1. It
is defined as 1 n. y .
NORMALIZE (_ _) x n. y provides normalised values of y
which, under the weights x, have mean zero and mean squared
deviation equal to 1. The weights x will be normalised to
sum to 1. The result of x n. y is therefore
(y - x m. y) % %: x m. *: y - x m. y . Thus x m. x n. y is 0
and x m. *: x n. y is 1. In the case where the weights are
frequencies and it is desired to standardise so that the
variance computed with divisor (+/x)-1 is 1, the values must
be multiplied by %:((+/x)-1)%+/x .
(Again a restriction on the arguments to (1 _) could be
considered.)
s.
SPREAD (2) s. y is the crossproduct matrix of the values in
y, and is equal to (|:y)+/ . * y . Consistent with the usage of
the other statistical functions, s. y is 1 s. y . The
covariance matrix is just (#y)%~ s. y - m. y and the
correlation matrix is (#y)%~ s. n. y
SPREAD (1 2) x s. y is the crossproduct matrix
s. (%:x)*"0 1 y . Note that the weights are not normalised
to sum to one in this case. The dyadic form weights the outer
product of each item of y by the corresponding item in x.
----------------------------------------------------------
The new definitions have been implemented in J 3.1. They are
modelled thus:
rx =. ($@$@[ <@}. $@]) $&> [
ry =. ($@$@] <@}. $@[) $&> ]
m2 =. (rx * ry) %&(+/) rx
m1 =. +/ % #
dev =. ry - m.
rms =. [ %:@m. *:@dev
n2 =. dev % rms
n1 =. 1&n2
s1 =. |: +/ . * ]
s2 =. s1@(%:@[ *"0 1 ])
m. is m1 : m2
n. is n1 : n2
s. is s1 : s2 " 2 1 2
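A quick check of the model verbs on made-up data (editor's sketch;
display spacing approximate):

   y =. 3 2 $ 1 2  3 4  5 6
   s1 y                         NB. (|:y) +/ . * y
35 44
44 56
   2 2 2 s2 y                   NB. each outer product of an item weighted by 2
70  88
88 112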
-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario M5S 2K9
(416) 925 6096

louk@tslwat.UUCP (Lou Kates) (05/19/91)
In article <1991May12.145907.19563@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:
>The definitions of mean (m.), normalize (n.), and spread (s.) have
>changed in response to a detailed critique by Professor Fraser Jackson
>of the Victoria University of Wellington (uunet!matai.vuw.ac.nz!jackson).
>Professor Jackson's comments are as follows (minor editing by me):
>
>--------------------------------------------------------------------
>
>The concept of expectation is central to both probability and
>statistics. Indeed it is quite possible to axiomatise
>probability theory in terms of expectations rather than
>probabilities. One didactic advantage of doing so is that the
>whole apparatus of measure theory essential for an
>axiomatisation based on probabilities can be introduced much
>later.

Nevertheless, I believe the key concept here is not expectation,
probability or measure but regression and projection. From this
viewpoint the old APL's domino operator (or regression operator)
had it correct and the above suggestions are a step backwards.

The mean of a vector V is the regression coefficient of projecting V
onto a vector of all ones. The space orthogonal to this vector of
ones is the deviation space, and the length of the projection of V
onto this deviation space divided by the dimensionality of the
deviation space (which is the length of V minus one) is the standard
deviation.

This fits in with the geometric interpretation of linear statistical
methods that is commonly taught to statisticians and is the standard
way of visualizing and unifying a variety of statistical techniques,
including regression, analysis of variance and time series analysis.

Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
519-725-0646
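[Editor's sketch: a quick check of the projection view in J, using
the existing %. (matrix divide) primitive. Note that it is the
squared length of the deviation-space projection, divided by n-1,
that gives the variance.]

   y =. 2 3 7
   y %. 3 1 $ 1            NB. regress y on a column of ones: the coefficient is the mean
4
   d =. y - 4              NB. projection of y onto the deviation space
   (+/ *: d) % <: # y      NB. squared length over n-1: the variance
7
   %: 7                    NB. the standard deviation
2.64575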
hui@yrloc.ipsa.reuter.COM (Roger Hui) (05/21/91)
Lou Kates writes:

> Nevertheless, I believe the key concept here is not expectation,
> probability or measure but regression and projection. From this
> viewpoint the old APL's domino operator (or regression operator)
> had it correct and the above suggestions are a step backwards.
> The mean of a vector V is the regression coefficient of
> projecting V onto a vector of all ones. The space orthogonal to
> this vector of ones is the deviation space and the length of the
> projection of V onto this deviation space divided by the
> dimensionality of the deviation space (which is the length of V
> minus one) is the standard deviation.

I am puzzled. How would a belief in regression and projection
instead of expectation as the key concept materially affect the
design of the primitives m., n., and s.?

There are many statistical functions worthy of inclusion in the
language. We are adding the new functions mean, normalization, and
cross product matrix; and assigning the cognate weighted mean,
weighted normalization, and weighted cross product to the dyads of
these functions seems both consistent and useful. These new
functions do not preclude adding other statistical functions in the
future, nor do they affect the domino function or any other existing
function. Therefore, I would not characterize them as a backward
step.

We would welcome suggestions for other statistical functions.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario M5S 2K9
(416) 925 6096
sam@kalessin.jpl.nasa.gov (Sam Sirlin) (05/21/91)
In article <1991May21.042804.21102@yrloc.ipsa.reuter.COM>, hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:
|> We would welcome suggestions for other statistical functions.
|>

I can calculate means and standard deviations pretty easily in APL or
J. On the other hand, what about FFT's or PSD's? These would have
applicability beyond just statistics. Of course once linkc (sp?) is
available we can all build our own.

It seems to me that APL was a pioneer in including solutions to
linear equations (domino) in the language, but that now the standard
is somewhat higher (Matlab, Mathematica, ...) and includes EISPACK,
LINPACK, and FFT's. Of course these make the code size grow...

--
Sam Sirlin
Jet Propulsion Laboratory
sam@kalessin.jpl.nasa.gov
weg@convx1.ccit.arizona.edu (Eythan Weg) (05/22/91)
In article <1991May21.042804.21102@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:

   Lou Kates writes:

   > Nevertheless, I believe the key concept here is not expectation,
   > probability or measure but regression and projection. From this
   > viewpoint the old APL's domino operator (or regression operator)
   > had it correct and the above suggestions are a step backwards.
   > The mean of a vector V is the regression coefficient of
   > projecting V onto a vector of all ones. The space orthogonal to
   > this vector of ones is the deviation space and the length of the
   > projection of V onto this deviation space divided by the
   > dimensionality of the deviation space (which is the length of V
   > minus one) is the standard deviation.

   I am puzzled. How would a belief in regression and projection
   instead of expectation as the key concept materially affect the
   design of the primitives m., n., and s.?

   There are many statistical functions worthy of inclusion in the
   language. We are adding the new functions mean, normalization,
   and cross product matrix; and assigning the cognate weighted
   mean, weighted normalization, and weighted cross product to the
   dyads of these functions seems both consistent and useful. These
   new functions do not preclude adding other statistical functions
   in the future, nor do they affect the domino function or any
   other existing function. Therefore, I would not characterize
   them as a backward step.

   We would welcome suggestions for other statistical functions.

I do not want to see J as a statistical package. But I would agree
that the mean and its cousins are basic enough to be included in some
form. Regression is after all a derived concept for which expectation
is needed to set up the appropriate model.

But since we are invited to suggest, I will voice my desire to have
generalized means etc. that are robust versions of these, where you
obtain the regular concepts as defaults. A generalization of this to
general robust regression is welcome, but I am not sure it can be
done in the spirit of J.

Eythan
hui@yrloc.ipsa.reuter.COM (Roger Hui) (05/22/91)
Sam Sirlin writes:

>I can calculate means and standard deviations pretty easily in APL or J.
>On the other hand, what about FFT's or PSD's? These would have applicability
>beyond just statistics. Of course once linkc (sp?) is available we can all
>build our own. It seems to me that APL was a pioneer in including solutions
>to linear equations (domino) in the language, but that now the standard is
>somewhat higher (Matlab, Mathematica, ...) and includes EISPACK, LINPACK,
>and FFT's. Of course these make the code size grow...

Yes, LinkJ is the preferred means for access to FFT, LP, and other
more specialized routines. The advantages are twofold: it avoids
duplicating the excellent work of others, and it avoids having to
support the facility forever more.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario M5S 2K9
(416) 925 6096
louk@tslwat.UUCP (Lou Kates) (05/29/91)
In article <1991May21.042804.21102@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:
>Lou Kates writes:
>
>> Nevertheless, I believe the key concept here is not expectation,
>> probability or measure but regression and projection. From this
>> viewpoint the old APL's domino operator (or regression operator)
>> had it correct and the above suggestions are a step backwards.
>
>I am puzzled. How would a belief in regression and projection
>instead of expectation as the key concept materially affect
>the design of the primitives m., n., and s.?
                              ^^^^^^^^^^
When everything you can think of gets put into the language, it's hard
to see how they all qualify as primitive.

>
>There are many statistical functions worthy of inclusion in the
>language. We are adding the new functions ...

I guess it's whether you believe in parsimony or not.

Personally I would rather have a wider family of powerful operations
at my disposal (such as regression, constrained linear optimization a
la simplex or Karmarkar, eigenvalue calculations, etc.) that are not
easily derivable from each other, rather than a large set of
functions which are all readily derivable from the ideas of
regression and projection. My own preference, and I suspect that of
many others too, would be that if you feel the need to have zillions
of functions at your disposal, define a standard library so that you
can take them out of the language so as to keep the language smaller
and more manageable. If performance is the issue, then there is
nothing to stop a particular implementation from implementing certain
standard library functions in the kernel.

Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
519-725-0646
dave@visual1.jhuapl.edu (Dave Weintraub) (05/30/91)
In article <414@tslwat.UUCP>, louk@tslwat.UUCP (Lou Kates) writes:
|> ...
|>
|> Personally I would rather have a wider family of powerful
|> operations at my disposal (such as regression, constrained linear
|> optimization a la simplex or Karmarkar, eigenvalue calculations,
|> etc.) that are not easily derivable from each other, rather than a
|> large set of functions which are all readily derivable from the
|> ideas of regression and projection. My own preference, and I
|> suspect that of many others too, would be that if you feel the
|> need to have zillions of functions at your disposal, define a
|> standard library so that you can take them out of the language so
|> as to keep the language smaller and more manageable. If
|> performance is the issue, then there is nothing to stop a
|> particular implementation from implementing certain standard
|> library functions in the kernel.
|>
|> Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
|> 519-725-0646

I fully agree. This is the path taken by IBM with APL2: write external
functions (in Assembler, FORTRAN, PL/I, C, ...) and make these available
using QuadNA.
sam@kalessin.jpl.nasa.gov (Sam Sirlin) (05/30/91)
In article <1991May29.191441.1279@aplcen.apl.jhu.edu>, dave@visual1.jhuapl.edu (Dave Weintraub) writes:
|> I fully agree. This is the path taken by IBM with APL2: write external
|> functions (in Assembler, FORTRAN, PL/I, C, ...) and make these available
|> using QuadNA.

An interesting path in this vein is the path taken by ProMatlab. It
uses the dynamic linking capabilities of modern machines. Hence a
variety of compiled routines are available (and more can be written
by the user) that are easily linked in while running the Matlab
interpreter, simply by invoking the program by name (Matlab then
searches for the right sort of file and then does the dynamic link).
Using this approach, the kitchen sink doesn't have to be in the code
for everyone, but standard compiled routines are available for those
who need them at practically no overhead.

--
Sam Sirlin
Jet Propulsion Laboratory
sam@kalessin.jpl.nasa.gov
hui@yrloc.ipsa.reuter.COM (Roger Hui) (06/01/91)
Dave Weintraub writes:

> I fully agree. This is the path taken by IBM with APL2: write external
> functions (in Assembler, FORTRAN, PL/I, C, ...) and make these available
> using QuadNA.

Dave, this is in fact possible with LinkJ. LinkJ permits calling J
from C and C from J. C functions so introduced behave like primitive
verbs, in the sense that they may be assigned names, and
(independently) may serve as arguments to adverbs and conjunctions.
For example,

   f =. 10!:57       NB. f is an external function encoded by 57
   f"r y             NB. Apply f to rank r cells of y
   x f"r y           NB. Apply f to rank r cells of x and y
   f/y               NB. f insertion ("reduction")
   x f/y             NB. f table ("outer product")
   10!:362"r y       NB. Apply the external fn encoded by 362 to rank r cells

Sam Sirlin writes:

> An interesting path in this vein is the path taken by ProMatlab. It uses the
> dynamic linking capabilities of modern machines. Hence a variety of compiled
> routines are available (and more can be written by the user) that are easily
> linked in while running the Matlab interpreter, ...

Sam, this is possible with LinkJ on systems supporting dynamic linking.

Lou Kates writes:

> >I am puzzled. How would a belief in regression and projection
> >instead of expectation as the key concept materially affect
> >the design of the primitives m., n., and s.?
>                               ^^^^^^^^^^
> When everything you can think of gets put into the language, it's hard
> to see how they all qualify as primitive.
> ...
> I guess it's whether you believe in parsimony or not.
>
> Personally I would rather have a wider family of powerful
> operations at my disposal (such as regression, constrained linear
> optimization a la simplex or Karmarkar, eigenvalue calculations,
> etc.) that are not easily derivable from each other, rather than a
> large set of functions which are all readily derivable from the
> ideas of regression and projection. My own preference, and I
> suspect that of many others too, would be that if you feel the
> need to have zillions of functions at your disposal, define a
> standard library so that you can take them out of the language so
> as to keep the language smaller and more manageable.
> ...

It may be helpful to consult your copy of the Dictionary. In it,
you'd find that (a) not everything you can think of is in the
language; (b) we do not in fact have zillions of functions in the
language; (c) the ISO APL %. (matrix inverse and matrix divide, what
you called regression) is in the language; (d) characteristic values
and vectors are in the language.

Many primitives can be derived readily from each other. One could
argue that *: (nand) makes the other boolean functions redundant, as
do - (minus) and % (divide) for + (plus) and * (times). Similarly,
one can derive |. (reverse), |: (transpose), u;.n (cut), u . v
(generalized determinant), %. (domino), and so forth, from { (from).
There is more to parsimony than reducing the number of primitives.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario M5S 2K9
(416) 925 6096