[comp.sys.handhelds] HP48 Least Squares Approximation

henrikj@kuling.UUCP (Henrik Johansson) (04/07/91)
      Least Squares Approximations on the HP48

In a way the statistic functions on the HP48 are quite good, but
if one wants to do some more complicated curve fitting the built-in
functions are unsatisfactory.  Therefore I sat down and rewrote a
LISP program I wrote a few years ago for more general least squares
approximation problems into HP48 notation.

Suppose you have a set of points in the plane and you know that
they will fit well on a line, then you use the built-in functions
to solve the problem.  But if they seem to fit better on a
function like "y = a*x^2 + b*x + c + d*ln(x)" or something?  You don't.

This program makes it possible to solve a problem like that above.
The most general form of input 0is

  y = a_1 * f_1(x_1, x_2, ..., x_m) +
      a_2 * f_2(x_1, x_2, ..., x_m) +
      ... +
      a_n * f_n(x_1, x_2, ..., x_m)

together with a set of data points.  I call the x_i's the independent
variables, and y the dependent variable.

Example 1:

   x:   1   2   3   4
   y:   3   6  11  18

and y is a quadratic function like "a*x^2 + b*x + c" then you enter

 Stack level
     4:     [ 1 2 3 4 ]          x values
     3:   [ 3 6 11 18 ]          y values (be sure that they agree by index)
     2:   { 'X^2' X 1 }          f_i's
     1:             'X'          independent variable

execute LSQ and get the following result

   (symbolic mode on)                       (symbolic mode off)
 Stack level                            Stack level
     2:       .999999999992                 2:       .999999999992
     1: '.99999999967*X^2+                  1: [[ .99999999967 ]
        1.6511E-9*X+                            [ 1.6511E-9 ]
        1.99999999835'                          [ 1.99999999835 ]]

The result is that a = 1, b = 0 and c = 2.

The number in level 2 tells how well the resulting coefficients (a_i)
make the total function correspond to the given set of data.
A value of 1 is a complete match, and 0 is a complete mismatch.

--------

It may be the case that you have a `t' that are dependent on more than
one variable.

Example 2:

   x:     1    -8     5     1     2     3     3
   y:     3     6     2     1     3     0     2
   z:    -5     7     4     1     4     6     1
   t:  -102  -123   108    -2    61    74    10

and a reasonable function to try is "t = a*x + b*x*z + c*y + d*y*z + e".
(It is always up to the user to suggest a function!)

 Stack level
     4:  [[ 1 3 -5 ]
          [ -8 6 7 ]
          [ 5 2 4 ]
          [ 1 1 1 ]
          [ 2 3 4 ]
          [ 3 0 6 ]
          [ 3 2 1 ]]
     3:  [ -102 -123 108 -2 61 74 10 ]
     2:  { X 'X*Z' Y 'Y*Z' 1 }
     1:  { X Y Z }

execute LSQ and get the following result

   (symbolic mode on)                       (symbolic mode off)
 Stack level                            Stack level
     2:       .999999999998                 2:       .999999999998
     1: '3.00000000006*X+                   1: [[ 3.0000000006 ]
        3.99999999999*(X*Z)-                    [ 3.9999999999 ]
        6.000000000007*Y+                       [ -6.00000000007 ]
        4.000000000001*(Y*Z)                    [ 4.00000000001 ]
        -7.000000000064'                        [ -7.00000000064 ]]

The result is that a = 3, b = 4, c = 6, d = 4 and e = -7.

--------

As you can see there is two ways to enter the x_i-values.  If there is
only one dependent variable (m = 1) then its values may be entered in
a vector (eg. [ 1 2 3 4 5 ]) but if there are more than one, their values
has to be entered as in the Example 2. The y-values should be entered in
a vector.  If there is only one dependent variable it can be entered
as 'X' instead of { X } in stack level 1.

If you want to find a and b in "a*x^b", then you have to do some
pre- and post-processing on the data to make it linear in a and b.

It is always the case that `y' is a linear
function of the a_i's (unfortunately).  It had
been nice to have a completely general function...

The method I used in the program uses much memory but is somewhat faster
than the other method I could have used (that would have used a lesser
amount of memory).  By this I don't mean that the program is fast...:-)

Some parts of the code can surely be rewritten for speed, but I didn't
feel the need to make it better than it is - I'm satisfied with the
result.  Almost.  As can be seen by the examples, the numerical accuracy
could have been better but I don't know how a general method for increasing
the accuracy could be implemented.  If anyone has any ideas - please
let me know.  I haven't consulted any textbooks about this problem so
good advice about what to read would be welcome.

If there are any questions, suggestions, comments or bugs - just write
me a mail, and I will answer as soon as I can.
---------

The program takes up 672.5 bytes, and has ckecksum #7543h when stored in a
variable called 'LSQ'.

---- cut here ----
%%HP: T(3)A(R)F(.);
\<<
  \<<
    IF DUP SIZE
SIZE 1 ==
    THEN OBJ\-> 1
SWAP + \->ARRY TRN
CONJ
    END
  \>> \-> fix
  \<<
    IF DUP TYPE 5 \=/
    THEN 1 \->LIST
    END 4 ROLL fix
EVAL DUP SIZE OBJ\->
DROP \-> yv fv vv xv
r c
    \<<
      IF r yv SIZE
1 GET - c vv SIZE -
OR
      THEN
"Size(s) mismatch"
DOERR
      END 1 fv SIZE
      FOR I xv OBJ\->
DROP fv I GET \-> f
        \<< r 1
          FOR J vv
OBJ\-> DROP 1 c
            FOR K c
2 + K - ROLL SWAP
STO
            NEXT f
EVAL J 1 - c * 1 +
r + J - ROLLD -1
          STEP
        \>>
      NEXT fv SIZE
r 2 \->LIST \->ARRY DUP
yv fix EVAL DUP 4
ROLLD * DUP 4 ROLLD
SWAP DUP TRN CONJ *
/ DUP 4 ROLLD TRN
CONJ SWAP 3 ROLLD
SWAP * OBJ\-> DROP
SWAP DUP 1 CON TRN
SWAP * OBJ\-> DROP SQ
r / SWAP OVER -
SWAP yv DUP DOT -
NEG / SWAP
      IF -3 FC?
      THEN \-> cv
        \<< 0 1 fv
SIZE
          FOR I cv
I 1 2 \->LIST GET fv
I GET * +
          NEXT
        \>>
      END
    \>>
  \>>
\>>
---- and cut here too ----

-------
henrikj@kuling.docs.uu.se               Henrik Johansson
(Nothing fancy about me - at least not in the signature)