henrikj@kuling.UUCP (Henrik Johansson) (04/07/91)
Least Squares Approximations on the HP48 In a way the statistic functions on the HP48 are quite good, but if one wants to do some more complicated curve fitting the built-in functions are unsatisfactory. Therefore I sat down and rewrote a LISP program I wrote a few years ago for more general least squares approximation problems into HP48 notation. Suppose you have a set of points in the plane and you know that they will fit well on a line, then you use the built-in functions to solve the problem. But if they seem to fit better on a function like "y = a*x^2 + b*x + c + d*ln(x)" or something? You don't. This program makes it possible to solve a problem like that above. The most general form of input 0is y = a_1 * f_1(x_1, x_2, ..., x_m) + a_2 * f_2(x_1, x_2, ..., x_m) + ... + a_n * f_n(x_1, x_2, ..., x_m) together with a set of data points. I call the x_i's the independent variables, and y the dependent variable. Example 1: x: 1 2 3 4 y: 3 6 11 18 and y is a quadratic function like "a*x^2 + b*x + c" then you enter Stack level 4: [ 1 2 3 4 ] x values 3: [ 3 6 11 18 ] y values (be sure that they agree by index) 2: { 'X^2' X 1 } f_i's 1: 'X' independent variable execute LSQ and get the following result (symbolic mode on) (symbolic mode off) Stack level Stack level 2: .999999999992 2: .999999999992 1: '.99999999967*X^2+ 1: [[ .99999999967 ] 1.6511E-9*X+ [ 1.6511E-9 ] 1.99999999835' [ 1.99999999835 ]] The result is that a = 1, b = 0 and c = 2. The number in level 2 tells how well the resulting coefficients (a_i) make the total function correspond to the given set of data. A value of 1 is a complete match, and 0 is a complete mismatch. -------- It may be the case that you have a `t' that are dependent on more than one variable. Example 2: x: 1 -8 5 1 2 3 3 y: 3 6 2 1 3 0 2 z: -5 7 4 1 4 6 1 t: -102 -123 108 -2 61 74 10 and a reasonable function to try is "t = a*x + b*x*z + c*y + d*y*z + e". (It is always up to the user to suggest a function!) Stack level 4: [[ 1 3 -5 ] [ -8 6 7 ] [ 5 2 4 ] [ 1 1 1 ] [ 2 3 4 ] [ 3 0 6 ] [ 3 2 1 ]] 3: [ -102 -123 108 -2 61 74 10 ] 2: { X 'X*Z' Y 'Y*Z' 1 } 1: { X Y Z } execute LSQ and get the following result (symbolic mode on) (symbolic mode off) Stack level Stack level 2: .999999999998 2: .999999999998 1: '3.00000000006*X+ 1: [[ 3.0000000006 ] 3.99999999999*(X*Z)- [ 3.9999999999 ] 6.000000000007*Y+ [ -6.00000000007 ] 4.000000000001*(Y*Z) [ 4.00000000001 ] -7.000000000064' [ -7.00000000064 ]] The result is that a = 3, b = 4, c = 6, d = 4 and e = -7. -------- As you can see there is two ways to enter the x_i-values. If there is only one dependent variable (m = 1) then its values may be entered in a vector (eg. [ 1 2 3 4 5 ]) but if there are more than one, their values has to be entered as in the Example 2. The y-values should be entered in a vector. If there is only one dependent variable it can be entered as 'X' instead of { X } in stack level 1. If you want to find a and b in "a*x^b", then you have to do some pre- and post-processing on the data to make it linear in a and b. It is always the case that `y' is a linear function of the a_i's (unfortunately). It had been nice to have a completely general function... The method I used in the program uses much memory but is somewhat faster than the other method I could have used (that would have used a lesser amount of memory). By this I don't mean that the program is fast...:-) Some parts of the code can surely be rewritten for speed, but I didn't feel the need to make it better than it is - I'm satisfied with the result. Almost. As can be seen by the examples, the numerical accuracy could have been better but I don't know how a general method for increasing the accuracy could be implemented. If anyone has any ideas - please let me know. I haven't consulted any textbooks about this problem so good advice about what to read would be welcome. If there are any questions, suggestions, comments or bugs - just write me a mail, and I will answer as soon as I can. --------- The program takes up 672.5 bytes, and has ckecksum #7543h when stored in a variable called 'LSQ'. ---- cut here ---- %%HP: T(3)A(R)F(.); \<< \<< IF DUP SIZE SIZE 1 == THEN OBJ\-> 1 SWAP + \->ARRY TRN CONJ END \>> \-> fix \<< IF DUP TYPE 5 \=/ THEN 1 \->LIST END 4 ROLL fix EVAL DUP SIZE OBJ\-> DROP \-> yv fv vv xv r c \<< IF r yv SIZE 1 GET - c vv SIZE - OR THEN "Size(s) mismatch" DOERR END 1 fv SIZE FOR I xv OBJ\-> DROP fv I GET \-> f \<< r 1 FOR J vv OBJ\-> DROP 1 c FOR K c 2 + K - ROLL SWAP STO NEXT f EVAL J 1 - c * 1 + r + J - ROLLD -1 STEP \>> NEXT fv SIZE r 2 \->LIST \->ARRY DUP yv fix EVAL DUP 4 ROLLD * DUP 4 ROLLD SWAP DUP TRN CONJ * / DUP 4 ROLLD TRN CONJ SWAP 3 ROLLD SWAP * OBJ\-> DROP SWAP DUP 1 CON TRN SWAP * OBJ\-> DROP SQ r / SWAP OVER - SWAP yv DUP DOT - NEG / SWAP IF -3 FC? THEN \-> cv \<< 0 1 fv SIZE FOR I cv I 1 2 \->LIST GET fv I GET * + NEXT \>> END \>> \>> \>> ---- and cut here too ---- ------- henrikj@kuling.docs.uu.se Henrik Johansson (Nothing fancy about me - at least not in the signature)