hui@yrloc.ipsa.reuter.COM (Roger Hui) (05/12/91)
The definitions of mean (m.), normalize (n.), and spread (s.) have changed in response to a detailed critique by Professor Fraser Jackson of the Victoria University of Wellington (uunet!matai.vuw.ac.nz!jackson). Professor Jackson's comments are as follows (minor editing by me):

--------------------------------------------------------------------

The concept of expectation is central to both probability and statistics. Indeed it is quite possible to axiomatise probability theory in terms of expectations rather than probabilities. One didactic advantage of doing so is that the whole apparatus of measure theory essential for an axiomatisation based on probabilities can be introduced much later. From the classical viewpoint, P. Whittle, Probability, Penguin, 1970? (to appear in a revised edition by Springer Verlag) gives an illustration of this alternative approach in a text designed to cover the subject from first principles. He illustrates numerous results, from quantum theory to investment analysis, where the expectation is the natural function to use. From the subjective probability perspective, de Finetti's classic published by Wiley also provides an axiomatisation based on expectation.

m.

I believe therefore that the function m. should be the expectation. This would give

   m. y     as the usual mean
   x m. y   as the mean formed with weights given by the vector x%+/x

Hence the monadic case would just be 1 m. y .

This definition permits a very simple treatment of something which appears at the beginning of virtually all statistical texts - frequency distributions. For a frequency distribution with frequencies f and values y this gives (in expressions which I am sure can be written more simply):

   the mean                      f m. y
   deviations from the mean      y - f m. y
   the mean squared deviation    f m. *: y - f m. y

In doing calculations with density functions the weighted mean is always required to find statistics of the distribution, and if there is any transformation of the variable then that also requires a weighted mean calculation to find parameters of the transformed distribution. Since variable transformation is often required, the function proves very useful in numerical studies. The expectation of a function g of y under a density p y becomes just (p y) m. g y , or better (p m. g) y .

Using an obvious notation, the two commonest price index numbers become:

   laspeyres   (p0*q0) m. p1%p0
   paasche     (p0*q1) m. p1%p0

In the Dictionary, the concept of using the left argument for the degrees of freedom is introduced. The degrees of freedom are generally only important in second moment calculations with unweighted data. It seems to me that this gives a special case unwarranted importance. The degrees of freedom are simply corrected for using a multiplier; in the simplest case it is just (#%(_1&+&#)) y , but a whole range of other cases are equally simply dealt with.

I note that Table 1 uses moment as the description of the dyadic case of m. If that represents an earlier (or later) idea, it again seems to me inappropriate. The moments are highly specialised means and are best dealt with using the mean of some transformation of the variable, e.g. in the monadic case m. (y - m. y)^4 , with the dyadic case f m. (y - f m. y)^4 .

n.

I would define n. to be consistent with m. Hence the monadic case would be with divisor #y, and the dyadic case would give normalised deviations from a weighted mean. Again, if there is a need to adjust degrees of freedom for a particular case I would prefer to see a multiplier used, since in many cases items have non-integer weights. Alternatively, a similar procedure could be adopted as with the fit conjunction, this time however using a degrees of freedom conjunction.
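As a concrete check of the proposed weighted-mean semantics, here is a small Python sketch (illustrative only; the thread's language is J, and the names wmean and mean are hypothetical):

```python
def wmean(x, y):
    """Weighted mean: the proposed x m. y, with weights normalised to sum to 1."""
    return sum(w * v for w, v in zip(x, y)) / sum(x)

def mean(y):
    """The monadic case, m. y, i.e. 1 m. y with equal weights."""
    return wmean([1] * len(y), y)

# Frequency distribution: frequencies f over values y
f = [2, 3, 5]
y = [1.0, 2.0, 4.0]

m = wmean(f, y)                        # the mean:  f m. y
dev = [v - m for v in y]               # deviations:  y - f m. y
msd = wmean(f, [d * d for d in dev])   # mean squared deviation:  f m. *: y - f m. y
```

With these frequencies the mean is (2+6+20)%10, i.e. 2.8, and the mean squared deviation follows by weighting the squared deviations by the same frequencies.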
That solution could give, I believe, the features you want, with the added flexibility and power permitted by generalising to weighted means. The DF conjunction would then only be needed in cases where the data were individual observations and the adjustment was required for a particular statistic or calculation.

The dictionary definition of the dyadic case seems to me to give an expression almost never used, since apart from squared deviations we seldom use a mean with divisor less than #y. In the squared deviations case, when it is used, it is not then common to normalise the squared values in any calculations I have had occasion to use.

s.

I believe a case can be made here for two functions. The first would give a measure of spread for the variables considered individually, and be defined across arbitrary arrays. For that I would prefer either the sum of squared deviations or the mean squared deviation, on the grounds noted below. The second is a function which generates information about the covariation of the variables in each item of the data object. If there is to be just one function, I would favour this, in view of the importance of many areas of multivariate analysis.

The standard deviation is a useful measure but very specialised to the univariate case. Further, the standard deviation is not an additive measure like the variance (using it now with divisor #y) and the sum of squares. Variances can be simply combined, using the m. function above. (For a population the variance is just the weighted mean of the variances if the population consists of subsets with different variances.) The standard deviation needs to be squared before it can be combined.

As soon as you move to multivariate statistics it is the covariance matrix which is the basic tool, not the correlation matrix or the variable standard deviations. In multivariate analyses it has been my experience that the speed of formation of a cross product matrix X'X is crucial.
This is so important that it deserves to be a special function, programmed for speed of execution. So my primitive function would form (|:y) +/ . * y in the monadic case. A case can be made for using the mean value of the outer products of the items of y for consistency with the previous definitions, though I do not find that compelling.

The avoidance of centering within the function is deliberate. Nearly all of the multivariate texts and the works on regression express the theory in terms of the original observation units. The case where the data are deviations from the mean then becomes a special case. With the m. primitive function it is easily treated as just that. The conventional covariance matrix then is

   (#y) %~ (|:yd) +/ . * yd   where yd =. y - m. y

but for the occasions where that is required, if s. is the cross product verb it is simple enough to enter

   (#y) %~ s. y - m. y

Using the normalised values we obtain the correlation matrix as

   (#y) %~ s. n. y

The choice of dyadic form is more difficult. I believe that consistency with the previous definitions would lead to choosing the dyadic form x s. y as weighting the items in y by %:x , where x is a vector of weights. In this way each item in y is given a weight in the crossproduct matrix equal to the corresponding item in x. The weights are not normalised to sum to 1, since in the applications a range of normalisations will be used.

This provides a very powerful primitive function, encompassing all of the weighted regression models and the powerful tools of generalised least squares as outlined by McCullagh and Nelder in 'Generalized Linear Models' (2nd ed), Chapman and Hall, 1989. It would also provide a primitive function for many methods based on weighted observations in Econometrics, in particular methods due to H. White of constructing consistent estimators in the common case when there are heteroscedastic errors.
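The crossproduct and covariance relationships above can be modelled in Python (an illustrative sketch; crossprod, colmeans, and cov are hypothetical names for the J expressions they mimic):

```python
def crossprod(y):
    """Monadic s. y as proposed: (|:y) +/ . * y for a table y whose rows are items."""
    n, p = len(y), len(y[0])
    return [[sum(y[k][i] * y[k][j] for k in range(n)) for j in range(p)]
            for i in range(p)]

def colmeans(y):
    """m. y over the items (rows) of a table."""
    n = len(y)
    return [sum(row[j] for row in y) / n for j in range(len(y[0]))]

def cov(y):
    """(#y) %~ s. y - m. y : the covariance matrix with divisor #y."""
    m = colmeans(y)
    yd = [[v - mj for v, mj in zip(row, m)] for row in y]
    return [[c / len(y) for c in row] for row in crossprod(yd)]
```

Note that centering happens outside crossprod, matching the design argument that the uncentered crossproduct is the primitive and the covariance matrix is a special case.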
It would also be a core function in those methods of non-linear optimisation designed to find the extreme of a likelihood function.

An alternative would be to define it as forming the expression (|:x) +/ . * y , since this is also often required. This would be of value if the code for it was much faster than using the general expression above. I prefer the definition which preserves consistency with the concept of weighted values. It is a more natural extension, consistent with the usage in the definitions of m. and n. proposed. By introducing a weighting function it significantly generalises the monadic operation. Second, and less important from a language design perspective (but perhaps as important for potential users), it is rather messy to program otherwise, and exploiting the special structure of the function could lead to important speed gains in execution.

I recognise that speed of execution has not been a primary factor in your design, but it is an important consideration, especially with the data intensive calculations likely in multivariate studies. If the functions selected contribute both to ease of statement of the problem and fast execution, then they seem to me to have a strong case for inclusion.

Looking at these three functions now as a group, and endeavouring to put them into Dictionary style --

m.  MEAN (_)
m. is the mean value, and m. y is 1 m. y

MEAN (_ _)
x m. y is the weighted mean of the values of y with weights given by x. The weights will be normalised to sum to one in forming the weighted mean. (It may be desirable to restrict the dimensionality of the arguments to (1 _) though the general case is often useful.)

n.  NORMALIZE (_)
The result of n. y is y translated to have a mean of zero and a mean squared deviation of 1. It is defined as 1 n. y .

NORMALIZE (_ _)
x n. y provides normalised values of y which, when applied with weights x, sum to zero and have mean squared deviation equal to 1. The weights x will be normalised to sum to 1.
The result of x n. y is therefore

   (y - x m. y) % %: x m. *: y - x m. y

Thus x m. x n. y is 0 and x m. *: x n. y is 1. In the case where the weights are frequencies and it is desired to standardise so that the variance is 1, the values must be multiplied by %:(+/x)%(+/x)-1 . (Again a restriction on the arguments to (1 _) could be considered.)

s.  SPREAD (2)
s. y is the crossproduct matrix of the values in y, equal to (|:y) +/ . * y . Consistent with the usage of the other statistical functions, s. y is 1 s. y . The covariance matrix is just (#y) %~ s. y - m. y and the correlation matrix is (#y) %~ s. n. y

SPREAD (1 2)
x s. y is the crossproduct matrix s. (%:x) *"0 1 y . Note that the weights are not normalised to sum to one in this case. The dyadic form weights the outer product of each item of y by the corresponding item in x.

----------------------------------------------------------

The new definitions have been implemented in J 3.1. They are modelled thus:

   rx  =. ($@$@[ <@}. $@]) $&> [
   ry  =. ($@$@] <@}. $@[) $&> ]
   m2  =. (rx * ry) %&(+/) rx
   m1  =. +/ % #
   dev =. ry - m.
   rms =. [ %:@m. *:@dev
   n2  =. dev % rms
   n1  =. 1&n2
   s1  =. |: +/ . * ]
   s2  =. s1@(%:@[ *"0 1 ])

m. is m1 : m2
n. is n1 : n2
s. is s1 : s2 " 2 1 2

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario  M5S 2K9
(416) 925 6096
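The stated identities for the dyadic normalize, that x m. x n. y is 0 and x m. *: x n. y is 1, can be checked with a small Python model (illustrative only; wmean and wnorm are hypothetical names for the J expressions):

```python
import math

def wmean(x, y):
    """x m. y with weights normalised to sum to 1."""
    return sum(w * v for w, v in zip(x, y)) / sum(x)

def wnorm(x, y):
    """x n. y : (y - x m. y) % %: x m. *: y - x m. y"""
    m = wmean(x, y)
    dev = [v - m for v in y]                           # deviations from the weighted mean
    rms = math.sqrt(wmean(x, [d * d for d in dev]))    # weighted root mean square
    return [d / rms for d in dev]

x = [2.0, 3.0, 5.0]
y = [1.0, 2.0, 4.0]
z = wnorm(x, y)
```

The normalised values z then have weighted mean 0 and weighted mean square 1 under the same weights x.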
louk@tslwat.UUCP (Lou Kates) (05/19/91)
In article <1991May12.145907.19563@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:
>The definitions of mean (m.), normalize (n.), and spread (s.) have
>changed in response to a detailed critique by Professor Fraser Jackson
>of the Victoria University of Wellington (uunet!matai.vuw.ac.nz!jackson).
>Professor Jackson's comments are as follows (minor editing by me):
>
>--------------------------------------------------------------------
>
>The concept of expectation is central to both probability and
>statistics. Indeed it is quite possible to axiomatise
>probability theory in terms of expectations rather than
>probabilities. One didactic advantage of doing so is that the
>whole apparatus of measure theory essential for an
>axiomatisation based on probabilities can be introduced much
>later.

Nevertheless, I believe the key concept here is not expectation, probability or measure but regression and projection. From this viewpoint the old APL's domino operator (or regression operator) had it correct, and the above suggestions are a step backwards.

The mean of a vector V is the regression coefficient of projecting V onto a vector of all ones. The space orthogonal to this vector of ones is the deviation space, and the length of the projection of V onto this deviation space divided by the dimensionality of the deviation space (which is the length of V minus one) is the standard deviation. This fits in with the geometric interpretation of linear statistical methods that is commonly taught to statisticians, and is the standard way of visualizing and unifying a variety of statistical techniques including regression, analysis of variance and time series analysis.

Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
519-725-0646
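The geometric reading can be verified numerically. A Python sketch with an arbitrary vector (illustrative only); note that the sample standard deviation comes from the squared length of the residual divided by n-1, then rooted:

```python
import math

v = [2.0, 4.0, 4.0, 6.0]
n = len(v)
ones = [1.0] * n

# Regression coefficient of v on the all-ones vector: (1'v)/(1'1), which is the mean
coef = sum(o * vi for o, vi in zip(ones, v)) / sum(o * o for o in ones)

# Component of v in the deviation space (orthogonal to the ones vector)
resid = [vi - coef for vi in v]
ss = sum(r * r for r in resid)      # squared length of that projection

sd = math.sqrt(ss / (n - 1))        # sample standard deviation
```

The regression coefficient equals the ordinary mean, and the residual vector is exactly the vector of deviations from the mean, which is the unification Kates describes.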
hui@yrloc.ipsa.reuter.COM (Roger Hui) (05/21/91)
Lou Kates writes:

> Nevertheless, I believe the key concept here is not expectation,
> probability or measure but regression and projection. From this
> viewpoint the old APL's domino operator (or regression operator)
> had it correct and the above suggestions are a step backwards.
> The mean of a vector V is the regression coefficient of
> projecting V onto a vector of all ones. The space orthogonal to
> this vector of ones is the deviation space and the length of the
> projection of V onto this deviation space divided by the
> dimensionality of the deviation space (which is the length of V
> minus one) is the standard deviation.

I am puzzled. How would a belief in regression and projection instead of expectation as the key concept materially affect the design of the primitives m., n., and s.?

There are many statistical functions worthy of inclusion in the language. We are adding the new functions mean, normalization, and cross product matrix; and assigning the cognate weighted mean, weighted normalization, and weighted cross product to the dyads of these functions seems both consistent and useful. These new functions do not preclude adding other statistical functions in the future, nor do they affect the domino function or any other existing function. Therefore, I would not characterize them as a backward step.

We would welcome suggestions for other statistical functions.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario  M5S 2K9
(416) 925 6096
sam@kalessin.jpl.nasa.gov (Sam Sirlin) (05/21/91)
In article <1991May21.042804.21102@yrloc.ipsa.reuter.COM>, hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:
|> We would welcome suggestions for other statistical functions.
|>

I can calculate means and standard deviations pretty easily in APL or J. On the other hand, what about FFT's or PSD's? These would have applicability beyond just statistics. Of course once linkc (sp?) is available we can all build our own.

It seems to me that APL was a pioneer in including solutions to linear equations (domino) in the language, but that now the standard is somewhat higher (Matlab, Mathematica, ...) and includes EISPACK, LINPACK, and FFT's. Of course these make the code size grow...
--
Sam Sirlin
Jet Propulsion Laboratory
sam@kalessin.jpl.nasa.gov
weg@convx1.ccit.arizona.edu (Eythan Weg) (05/22/91)
In article <1991May21.042804.21102@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:

   Lou Kates writes:

   > Nevertheless, I believe the key concept here is not expectation,
   > probability or measure but regression and projection. From this
   > viewpoint the old APL's domino operator (or regression operator)
   > had it correct and the above suggestions are a step backwards.
   > The mean of a vector V is the regression coefficient of
   > projecting V onto a vector of all ones. The space orthogonal to
   > this vector of ones is the deviation space and the length of the
   > projection of V onto this deviation space divided by the
   > dimensionality of the deviation space (which is the length of V
   > minus one) is the standard deviation.

   I am puzzled. How would a belief in regression and projection
   instead of expectation as the key concept materially affect the
   design of the primitives m., n., and s.?

   There are many statistical functions worthy of inclusion in the
   language. We are adding the new functions mean, normalization, and
   cross product matrix; and assigning the cognate weighted mean,
   weighted normalization, and weighted cross product to the dyads of
   these functions seems both consistent and useful. These new
   functions do not preclude adding other statistical functions in the
   future, nor do they affect the domino function or any other
   existing function. Therefore, I would not characterize them as a
   backward step.

   We would welcome suggestions for other statistical functions.

I do not want to see J as a statistical package. But I would agree that the mean and its cousins are basic enough to be included in some form. Regression is after all a derived concept, for which expectation is needed to set up the appropriate model. But since we are invited to suggest, I will voice my desire to have generalized means etc. that are robust versions of these, where you obtain the regular concepts as defaults.
A generalization of this to general robust regression is welcome, but I am not sure it can be done in the spirit of J.

Eythan
hui@yrloc.ipsa.reuter.COM (Roger Hui) (05/22/91)
Sam Sirlin writes:

>I can calculate means and standard deviations pretty easily in APL or J.
>On the other hand, what about FFT's or PSD's? These would have applicability
>beyond just statistics. Of course once linkc (sp?) is available we can all
>build our own. It seems to me that APL was a pioneer in including solutions
>to linear equations (domino) in the language, but that now the standard is
>somewhat higher (Matlab, Mathematica, ...) and includes EISPACK, LINPACK,
>and FFT's. Of course these make the code size grow...

Yes, LinkJ is the preferred means for access to FFT, LP, and other more specialized routines. The advantages are twofold: it avoids duplicating the excellent work of others, and it avoids having to support the facility forever more.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario  M5S 2K9
(416) 925 6096
louk@tslwat.UUCP (Lou Kates) (05/29/91)
In article <1991May21.042804.21102@yrloc.ipsa.reuter.COM> hui@yrloc.ipsa.reuter.COM (Roger Hui) writes:
>Lou Kates writes:
>
>> Nevertheless, I believe the key concept here is not expectation,
>> probability or measure but regression and projection. From this
>> viewpoint the old APL's domino operator (or regression operator)
>> had it correct and the above suggestions are a step backwards.
>
>I am puzzled. How would a belief in regression and projection
>instead of expectation as the key concept materially affect
>the design of the primitives m., n., and s.?
                   ^^^^^^^^^^

When everything you can think of gets put into the language, it's hard to see how they all qualify as primitive.

>There are many statistical functions worthy of inclusion in the
>language. We are adding the new functions ...

I guess it's whether you believe in parsimony or not.

Personally I would rather have a wider family of powerful operations at my disposal (such as regression, constrained linear optimization a la simplex or Karmarkar, eigenvalue calculations, etc.) that are not easily derivable from each other, rather than a large set of functions which are all readily derivable from the ideas of regression and projection. My own preference, and I suspect that of many others too, would be that if you feel the need to have zillions of functions at your disposal, define a standard library, so that you can take them out of the language so as to keep the language smaller and more manageable. If performance is the issue then there is nothing to stop a particular implementation from implementing certain standard library functions in the kernel.

Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
519-725-0646
dave@visual1.jhuapl.edu (Dave Weintraub) (05/30/91)
In article <414@tslwat.UUCP>, louk@tslwat.UUCP (Lou Kates) writes:
|> ...
|>
|> Personally I would rather have a wider family of powerful
|> operations at my disposal (such as regression, constrained linear
|> optimization a la simplex or Karmarker, eigenvalue calculations,
|> etc.) that are not easily derivable from each other rather than a
|> large set of functions which are all readily derivable from the
|> ideas of regression and projection. My own preference, and I
|> suspect that of many others too, would be that if you feel the
|> need to have zillions of functions at your disposal, define a
|> standard library so that you can take them out of the language so
|> as to keep the language smaller and more manageable. If
|> performance is the issue then there is nothing to stop a
|> particular implementation from implementing certain standard
|> library functions in the kernel.
|>
|> Lou Kates, Teleride Sage Ltd., louk%tslwat@watmath.waterloo.edu
|> 519-725-0646

I fully agree. This is the path taken by IBM with APL2: write external functions (in Assembler, FORTRAN, PL/I, C, ...) and make these available using QuadNA.
sam@kalessin.jpl.nasa.gov (Sam Sirlin) (05/30/91)
In article <1991May29.191441.1279@aplcen.apl.jhu.edu>, dave@visual1.jhuapl.edu (Dave Weintraub) writes:
|> I fully agree. This is the path taken by IBM with APL2: write external
|> functions (in Assembler, FORTRAN, PL/I, C, ...) and make these available
|> using QuadNA.

An interesting path in this vein is the path taken by ProMatlab. It uses the dynamic linking capabilities of modern machines. Hence a variety of compiled routines are available (and more can be written by the user) that are easily linked in while running the Matlab interpreter, simply by invoking the program by name (Matlab then searches for the right sort of file and then does the dynamic link). Using this approach, the kitchen sink doesn't have to be in the code for everyone, but standard compiled routines are available for those who need them at practically no overhead.
--
Sam Sirlin
Jet Propulsion Laboratory
sam@kalessin.jpl.nasa.gov
hui@yrloc.ipsa.reuter.COM (Roger Hui) (06/01/91)
Dave Weintraub writes:

> I fully agree. This is the path taken by IBM with APL2: write external
> functions (in Assembler, FORTRAN, PL/I, C, ...) and make these available
> using QuadNA.

Dave, this is in fact possible with LinkJ. LinkJ permits calling J from C and C from J. C functions so introduced behave like primitive verbs, in the sense that they may be assigned names, and (independently) may serve as arguments to adverbs and conjunctions. For example:

   f =. 10!:57    NB. f is an external function encoded by 57
   f"r y          NB. Apply f to rank r cells of y
   x f"r y        NB. Apply f to rank r cells of x and y
   f/y            NB. f insertion ("reduction")
   x f/y          NB. f table ("outer product")
   10!:362"r y    NB. Apply the external fn encoded by 362 to rank r cells

Sam Sirlin writes:

> An interesting path in this vein is the path taken by ProMatlab. It uses the
> dynamic linking capabilities of modern machines. Hence a variety of compiled
> routines are available (and more can be written by the user) that are easily
> linked in while running the Matlab interpreter, ...

Sam, this is possible with LinkJ on systems supporting dynamic linking.

Lou Kates writes:

> >I am puzzled. How would a belief in regression and projection
> >instead of expectation as the key concept materially affect
> >the design of the primitives m., n., and s.?
>                    ^^^^^^^^^^
> When everything you can think of gets put into the language it's hard
> to see how they all qualify as primitive.
> ...
> I guess it's whether you believe in parsimony or not.
>
> Personally I would rather have a wider family of powerful
> operations at my disposal (such as regression, constrained linear
> optimization a la simplex or Karmarkar, eigenvalue calculations,
> etc.) that are not easily derivable from each other rather than a
> large set of functions which are all readily derivable from the
> ideas of regression and projection. My own preference, and I
> suspect that of many others too, would be that if you feel the
> need to have zillions of functions at your disposal, define a
> standard library so that you can take them out of the language so
> as to keep the language smaller and more manageable.
> ...

It may be helpful to consult your copy of the Dictionary. In it, you'd find that (a) not everything you can think of is in the language; (b) we do not in fact have zillions of functions in the language; (c) the ISO APL %. (matrix inverse and matrix divide, what you called regression) is in the language; (d) characteristic values and vectors are in the language.

Many primitives can be derived readily from each other. One could argue that *: (nand) makes the other boolean functions redundant, as do - (minus) and % (divide) for + (plus) and * (times). Similarly, one can derive |. (reverse), |: (transpose), u;.n (cut), u . v (generalized determinant), %. (domino), and so forth, from { (from). There is more to parsimony than reducing the number of primitives.

-----------------------------------------------------------------
Roger Hui
Iverson Software Inc., 33 Major Street, Toronto, Ontario  M5S 2K9
(416) 925 6096
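The point that nand alone suffices for the boolean functions is easy to illustrate. A Python sketch (in J itself these would be tacit definitions over *: ; the function names here are hypothetical):

```python
def nand(a, b):
    """Boolean nand on 0/1 values, the dyadic *: of J."""
    return 1 - (a & b)

# The other boolean functions derived from nand alone
def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))
```

Each derived function reproduces its usual truth table, which is the sense in which the other booleans are "redundant" given nand, though, as noted, there is more to parsimony than reducing the number of primitives.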