[sci.math] Solution of quartic equation

greg@jiff.berkeley.edu (Greg) (03/15/88)

In article <1656@thorin.cs.unc.edu> turner@unc.cs.unc.edu (Douglass Turner) writes:
>Hello, does anyone know of a good, robust, method to find all roots of a 
>quartic equation?

There are various plans of attack.  I assume, since you are ray-tracing,
that you want the real roots, or perhaps even the real positive roots.
I will describe one reasonable method below.

>I have heard that direct solution is prone to numerical instability, so
>an iterative technique is called for.

The direct solution is prone to programmer's instability.  It's very
messy and doesn't generalize.  The most painful step is taking the
cube root of a complex number, which cannot be expressed in terms of
roots of real numbers.  In any case, the numerous roots in the formula
are themselves computed by iterative algorithms in the math package,
so you aren't really saving much by using the exact formula.

My algorithm is based on Newton's method and Sturm chains.  A Sturm
chain is an easy way to find out how many roots a polynomial has
in an interval.  You take the polynomial p(x), which we may call
p0(x), and you take its derivative, p1(x) = p'(x).  Now let pn(x)
be the remainder obtained when you divide p{n-2}(x) by p{n-1}(x).
Quit when you obtain a constant polynomial (the degree goes down
by at least one every time).  The sequence of polynomials p0,...,pk
is the Sturm chain.  Now let S(a) be the number of sign changes
in the sequence p0(a),p1(a),...,pk(a), i.e. the number of times
that pi(a) is positive and p{i+1}(a) is negative, or vice versa.
Then the number of roots in the interval [a,b] is S(a)-S(b).

One can use Sturm chains directly in an enlightened divide-and-conquer
approach, but if you want to zero in on the solution quickly it's best to
switch to Newton's method.  The trick with Newton's method is to avoid
the inflection points.  They are the roots of the second derivative,
p''(x).

Find the real roots of p''(x), say y1,...,ym, by recursion.  In your
case p(x) is a quartic, so you can use the quadratic formula for p''.
Pick y0 to be smaller than all of the roots, and y{m+1} to be larger
than all of the roots (a standard coefficient bound on the roots of p,
such as 1 + max|ai/an| over the lower-order coefficients, serves for
both endpoints).  Now use Sturm chains to find the number of
roots in the intervals [y0,y1], [y1,y2],...,[ym,y{m+1}].  Each interval
has at most two roots.  Use Newton's method in each interval that has
roots in it.  If an interval has two roots, Newton's method will
converge to the two roots starting from each end of the interval.  For
an interval with one root, check if p(x) is concave up or concave down
in that interval (the concavity of course alternates with successive
intervals).  If it is concave up, Newton's method will converge
starting from the endpoint at which p(x) has a positive value.
Otherwise start from the other endpoint.
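
For concreteness, here is a minimal sketch in C of the Newton stage (not
code from this posting; the quartic p here is an arbitrary example with
four real roots, and newton() is a hypothetical helper).  Start it from
an endpoint chosen by the concavity rule above:

    #include <math.h>

    /* Example quartic p(x) = x^4 - 3x^2 + x + 1 and its derivative,
       written in Horner form. */
    static double p(double x)  { return ((x * x - 3.0) * x + 1.0) * x + 1.0; }
    static double dp(double x) { return (4.0 * x * x - 6.0) * x + 1.0; }

    /* Plain Newton iteration.  If x0 is the endpoint picked by the
       concavity rule, the iterates move monotonically toward the root
       and never leave the interval. */
    static double newton(double x0, double tol, int maxit)
    {
        double x = x0, step;
        int i;
        for (i = 0; i < maxit; i++) {
            step = p(x) / dp(x);
            x -= step;
            if (fabs(step) < tol)
                break;
        }
        return x;
    }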

Since you are ray tracing, you may often only want the first
positive root.  You can discard and truncate the intervals appropriately,
and then progress from left to right, stopping at the first root you
find.

Sturm chains will fail if p(x) and p'(x) share roots, which happens
exactly when p(x) has multiple roots.  Exact numerical coincidences are
a delicate matter in floating point arithmetic.  If the terminating
constant polynomial in the Sturm chain is identically zero, then the
previous polynomial in the chain is the greatest common divisor of p(x)
and p'(x), and you can just divide p(x) by it and start over.
--
Greg

td@alice.UUCP (03/16/88)

I believe that the Sturm chain algorithm as described in the referenced article
has at least two bugs.

The article describes Sturm chains thus:
>  You take the polynomial p(x), which we may call
>p0(x), and you take its derivative, p1(x) = p'(x).  Now let pn(x)
>be the remainder obtained when you divide p{n-2}(x) by p{n-1}(x).
>Quit when you obtain a constant polynomial (the degree goes down
>by at least one every time).
Restating, this is
	p0(x)=p(x)
	p1(x)=p'(x)
	p[n+2](x)=p[n](x) mod p[n+1](x)
where a(x) mod b(x) is the remainder left after `synthetic division.'

This is incorrect.  Sturm chains have

	p[n+2](x)= -(p[n](x) mod p[n+1](x))

i.e. the sign of the remainders in jiff!greg's version must be flipped
for even n>2.
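
A sketch in C of building the chain with the corrected sign (an
illustration, not posted code; coefficients are stored lowest degree
first, and polyrem() and sturm_chain() are made-up names):

    #define MAXDEG 4

    /* Remainder of r (degree nr) on division by d (degree nd), computed
       in place; returns the degree of the remainder. */
    static int polyrem(double r[], int nr, const double d[], int nd)
    {
        int i, k;
        for (k = nr; k >= nd; k--) {
            double q = r[k] / d[nd];
            for (i = 0; i <= nd; i++)
                r[k - nd + i] -= q * d[i];
        }
        for (k = nd - 1; k > 0 && r[k] == 0.0; k--)
            ;
        return k;
    }

    /* Build p0 = p, p1 = p', p[n+2] = -(p[n] mod p[n+1]); returns the
       number of polynomials in the chain.  A remainder that comes out
       identically zero signals multiple roots: divide p by the previous
       chain member (the gcd of p and p') and start over, as in the
       original article. */
    static int sturm_chain(double chain[][MAXDEG + 1], int deg[],
                           const double p[], int n)
    {
        int i, k;
        for (i = 0; i <= n; i++)
            chain[0][i] = p[i];
        deg[0] = n;
        for (i = 1; i <= n; i++)
            chain[1][i - 1] = i * p[i];       /* derivative */
        deg[1] = n - 1;
        for (k = 2; deg[k - 1] > 0; k++) {
            for (i = 0; i <= deg[k - 2]; i++)
                chain[k][i] = chain[k - 2][i];
            deg[k] = polyrem(chain[k], deg[k - 2], chain[k - 1], deg[k - 1]);
            for (i = 0; i <= deg[k]; i++)
                chain[k][i] = -chain[k][i];   /* the corrected sign flip */
        }
        return k;
    }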

The article describes the method of counting roots on an interval [a,b]
as follows:
>  Now let S(a) be the number of sign changes
>in the sequence p0(a),p1(a),...,pk(a), i.e. the number of times
>that pi(a) is positive and p{i+1}(a) is negative, or vice versa.
>Then the number of roots in the interval [a,b] is S(a)-S(b).

This is wrong for two reasons: first, the interval must be open at
the bottom (i.e. (a,b], not [a,b]).  Verify this for yourself by
considering the case a=b where p(a)=0.  Second, the method of counting
sign changes is subtly wrong.  It is possible for p[k](a) to be zero.
Any zeros in the sequence must be deleted before counting sign changes.
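
In code, the corrected count might look like this (same coefficient
layout as the chain sketch above; again an illustration, not posted
code):

    #define MAXDEG 4   /* as in the chain sketch above */

    /* Count sign changes in the chain at x: evaluate each member by
       Horner's rule, drop exact zeros, then count alternations.  The
       number of roots in the half-open interval (a,b] is S(a) - S(b). */
    static int sign_changes(double chain[][MAXDEG + 1], const int deg[],
                            int nchain, double x)
    {
        int k, i, count = 0, last = 0, s;
        double v;
        for (k = 0; k < nchain; k++) {
            for (v = 0.0, i = deg[k]; i >= 0; i--)
                v = v * x + chain[k][i];
            if (v == 0.0)
                continue;                 /* delete zeros before counting */
            s = (v > 0.0) ? 1 : -1;
            if (last != 0 && s != last)
                count++;
            last = s;
        }
        return count;
    }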

Another criticism of the method is that his numerical method for converging
to the root once it's isolated is over-elaborate.  Since the root is bracketed,
methods like regula falsi are guaranteed to converge, albeit slowly.
A good way to proceed is to take Newton steps starting at one end of the
interval, contracting the interval as you go.  If a Newton step takes you
outside the interval, or involves division by zero, take a regula falsi
step instead.
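
A sketch of that hybrid in C (hypothetical; f and df are the polynomial
and its derivative, and [a,b] is assumed to bracket a single root with
f(a)*f(b) < 0):

    #include <math.h>

    /* Try a Newton step from the current iterate; whenever it would
       leave the bracket, or f' vanishes, substitute a regula falsi
       step.  The bracket contracts as we go. */
    static double hybrid_root(double (*f)(double), double (*df)(double),
                              double a, double b, double tol, int maxit)
    {
        double fa = f(a), fb = f(b);
        double x = a, fx = fa;
        int i;
        for (i = 0; i < maxit && b - a > tol; i++) {
            double d = df(x);
            double xn = (d != 0.0) ? x - fx / d : a;  /* d == 0 forces fallback */
            if (xn <= a || xn >= b)
                xn = a - fa * (b - a) / (fb - fa);    /* regula falsi step */
            fx = f(xn);
            x = xn;
            if (fx == 0.0)
                break;
            if ((fx < 0.0) == (fa < 0.0)) { a = xn; fa = fx; }
            else                          { b = xn; fb = fx; }
        }
        return x;
    }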

td@alice.UUCP (03/16/88)

Small error in previous article:
> i.e. the sign of the remainders in jiff!greg's version must be flipped
> for even n>2.
	     ^
should be `for even n>0.'

cfchiesa@bsu-cs.UUCP (Christopher Chiesa) (03/17/88)

In article <7662@agate.BERKELEY.EDU>, greg@jiff.berkeley.edu (Greg) writes:
> In article <1656@thorin.cs.unc.edu> turner@unc.cs.unc.edu (Douglass Turner) writes:
> >Hello, does anyone know of a good, robust, method to find all roots of a 
> >quartic equation?
> 
>    [miscellaneous introductory comments]

> 
> The direct solution is prone to programmer's instability.  It's very
> messy and doesn't generalize.  The most painful step is taking the
> cube root of a complex number, which cannot be expressed in terms of
> roots of real numbers.  In any case, the numerous roots in the formula

Perhaps I misunderstand you (heaven knows I didn't follow the Sturm chains
and so on), but...  I recall taking cube and higher roots of complex numbers
quite easily by converting from "cartesian" (A+Bi) representation to "polar"
(r cis Theta), and performing a few straightforward real-number calculations.
Basically, you take the root (let's generalize and say the Nth root) of the
"r" (radius) part of the representation, and divide the "Theta" (angle) part
by N, to find the first Nth root; then you add 2*Pi/N to the angle part
repeatedly to get the other N-1 roots.  Works quite nicely; I've never
bothered to program it, but if it's been a bottleneck for you, this may be
a way around it.
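
In C the recipe is only a few lines, something like this (a sketch;
complex_nth_roots is a made-up name, and hypot(a, b) is just
sqrt(a*a + b*b)):

    #include <math.h>

    /* All N complex Nth roots of a + b*i by the polar method: take the
       real Nth root of the radius, divide the angle by N, then step the
       angle by 2*Pi/N. */
    static void complex_nth_roots(double a, double b, int n,
                                  double re[], double im[])
    {
        double r = pow(hypot(a, b), 1.0 / n);   /* Nth root of the radius */
        double t = atan2(b, a) / n;             /* angle of the first root */
        double turn = 2.0 * 3.14159265358979323846 / n;
        int k;
        for (k = 0; k < n; k++) {
            re[k] = r * cos(t + k * turn);
            im[k] = r * sin(t + k * turn);
        }
    }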

I'd be interested to know if this has been helpful to you or anyone else...

Chris Chiesa
Senior, CS Dept
Ball State University
Muncie, IN
 
-- 
<><><><><><><><><><><><><><><><><><><><><><><><><><><><> Chris Chiesa <><><><><>
<> {ihpn4|seismo}!{iuvax|pur-ee}!bsu-cs!cfchiesa                              <>
<> cfchiesa@bsu-cs.UUCP                                                       <>
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

bumby@math.rutgers.edu (Richard Bumby) (03/17/88)

In article <7741@alice.UUCP> td@alice.UUCP writes:
>
>
>I believe that the Sturm chain algorithm as described 
>in the referenced article has at least two bugs.
>
>...(stuff omitted)...
>	p[n+2](x)= -(p[n](x) mod p[n+1](x))
>....(more stuff omitted)...
>The article describes the method of counting roots on an interval [a,b]
>as follows:
>>  Now let S(a) be the number of sign changes
>>in the sequence p0(a),p1(a),...,pk(a), i.e. the number of times
>>that pi(a) is positive and p{i+1}(a) is negative, or vice versa.
>>Then the number of roots in the interval [a,b] is S(a)-S(b).
>
>This is wrong for two reasons: first, the interval must be open at
>the bottom (i.e. (a,b], not [a,b]).
>....(more stuff omitted)...

	The proof of Sturm's theorem, which you can find in lovely
detail in Weber's "Algebra", makes clear why the method works and what
the theorem really says.  For example, the polynomial that you start
with should have no multiple roots so that any Euclidean algorithm
starting from these two polynomials will always have consecutive terms
without common factor.  Next, you observe what happens as you pass
through a zero of any of the polynomials in your chain: the role of
the sign change is to guarantee that p{i-1}(a) and p{i+1}(a) have
opposite signs if pi(a) = 0.  Thus, the number of changes of sign does
not change, however you interpret the sign of 0.  Finally, going
through a zero of p from left to right is easily seen to always remove
a change of sign between p and p'. The same method of proof gives
"Descartes' rule of signs" and other related results.

	A thoroughly robust program must be able to handle the
accidental discovery of a root of the polynomial (an evaluation point
that lands exactly on a root), but the first version should be allowed
to handle it in a sloppier fashion.
Similarly, the decision of whether to use a rapid iterative solution
once the root has been isolated can be postponed.  It is sobering to
realize that much more time is spent telling students why 'bisection'
is a poor rootfinding method than would be spent in using it for all
relevant problems in a numerical analysis course.
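
In support of that remark, bisection really is just a few lines (a
sketch, assuming a bracket [a,b] with f(a)*f(b) < 0):

    /* Bisection: one bit of the answer per step, slow but unbreakable. */
    static double bisect(double (*f)(double), double a, double b, double tol)
    {
        double fa = f(a);
        while (b - a > tol) {
            double m = 0.5 * (a + b), fm = f(m);
            if (fm == 0.0)
                return m;
            if ((fm < 0.0) == (fa < 0.0)) { a = m; fa = fm; }
            else                          b = m;
        }
        return 0.5 * (a + b);
    }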


-- 

--R. T. Bumby
** Math ** Rutgers ** New Brunswick **
(in one form or another for all kinds of mail)

greg@jiff.berkeley.edu (Greg) (03/19/88)

In article <2381@bsu-cs.UUCP> cfchiesa@bsu-cs.UUCP (Christopher Chiesa) writes:
>In article <7662@agate.BERKELEY.EDU>, greg@jiff.berkeley.edu (Greg) writes:
>> The direct solution is prone to programmer's instability.  It's very
>> messy and doesn't generalize.  The most painful step is taking the
>> cube root of a complex number,...
...
>I recall taking cube and higher roots of complex numbers
>quite easily by converting from "cartesian" (A+Bi) representation to "polar"
>(r cis Theta), and performing a few straightforward real-number calculations.

You are correct; you can take roots of complex numbers by passing to polar
coordinates.  Although the calculation is not quite as messy as I suggested,
it is still painful in the sense that it requires several invocations of
transcendental functions.  It does not seem like the fastest solution.

Be that as it may, the formula for the roots of a quartic is complicated.
Its derivation is even worse.
--
Greg

bumby@math.rutgers.edu (Richard Bumby) (03/22/88)

This discussion concerns the solution of quartic equations.  The
formula reducing the solution of quartic equations to the extraction of
roots was referred to as 'the direct method', and questions about
implementing the complex cube root it requires were addressed in some
of the articles.  Here is a quote from article <2381@bsu-cs.UUCP> by
cfchiesa@bsu-cs.UUCP (Christopher Chiesa):
    Perhaps I misunderstand you (heaven knows I didn't follow
    the Sturm chains and so on), but...  I recall taking cube
    and higher roots of complex numbers quite easily by
    converting from "cartesian" (A+Bi) representation to "polar"
    (r cis Theta), and performing a few straightforward
    real-number calculations.

This is possible, but it is totally insensitive to the topology of the
real numbers, and hence to any numerical considerations.  Similarly,
the Sturm chains are easy (I thought I posted on that, but I'm not sure
it got out), but are really used for theoretical purposes.  In this
problem, it is probably easy to separate the roots using other
considerations.  If there are 4 real roots, then Newton's method is
sure to locate the largest if you start from a value larger than it.
The smallest is found similarly.  If there are only two real roots,
this may be reliable for only one of them.  Deflation to reduce to an
equation of lower degree is probably safe if you check the roots in
the original equation.  Since I last wrote on this, I have had a
chance to consult "Numerical Recipes" and find that it does not give
very much information.  On the other hand, this is really a numerical
problem, not an algebraic one.  There is no reason to believe that
'extraction of roots', which must itself be done numerically (frequently
in software), is to be preferred to a direct rootfinding method which
does not require complicated preprocessing.
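
Deflation itself is cheap; a sketch in C (deflate() is a hypothetical
helper, with coefficients stored highest degree first):

    /* Divide a computed root r out of p by synthetic division.  c[0..n]
       holds p; on return c[0..n-1] is the quotient and c[n] the
       remainder p(r), which should be small.  Checking the deflated
       roots back in the original polynomial is the safety step
       mentioned above. */
    static void deflate(double c[], int n, double r)
    {
        int i;
        for (i = 1; i <= n; i++)
            c[i] += r * c[i - 1];
    }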


-- 

--R. T. Bumby
** Math ** Rutgers ** New Brunswick **
(in one form or another for all kinds of mail)

jadwa@henry.cs.reading.ac.uk (James Anderson) (03/23/88)

I am an absolute beginner, so let me get this clear. If I have an
equation with several roots and I start Newton's method lower
than the least one it will converge to the least one and if I
start it larger than the largest it will converge to the largest?
No matter how close the roots are? Wow!

James

(JANET) James.Anderson@reading.ac.uk

ags@s.cc.purdue.edu (Dave Seaman) (03/26/88)

In article <726@onion.cs.reading.ac.uk>, jadwa@henry.cs.reading.ac.uk (James Anderson) writes:
> I am an absolute beginner, so let me get this clear. If I have an
> equation with several roots and I start Newton's method lower
> than the least one it will converge to the least one and if I
> start it larger than the largest it will converge to the largest?
> No matter how close the roots are? Wow!

This is true if all roots are real and if you assume infinite precision
arithmetic (a very large assumption).  The reason is that a polynomial
with all roots real is either concave upward (if positive) or concave
downward (if negative) on the intervals (-inf,rmin] and [rmax,+inf),
where rmin and rmax are the smallest and largest roots.

The claim is not necessarily true if the polynomial has complex roots.
There may be extreme values which lie outside the interval [rmin,rmax],
and choosing an initial approximation near such an extreme may cause
the first Newton iteration to lead to a point that is far away on the
other side of all the real roots.
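
A quick made-up check of the all-real-roots case, in C: for
p(x) = (x-1)(x-2)(x-3)(x-4), Newton started above the largest root
walks down to 4, and started below the smallest it climbs to 1.

    #include <stdio.h>

    /* p(x) = x^4 - 10x^3 + 35x^2 - 50x + 24 and its derivative. */
    static double p(double x)  { return (((x - 10.0) * x + 35.0) * x - 50.0) * x + 24.0; }
    static double dp(double x) { return ((4.0 * x - 30.0) * x + 70.0) * x - 50.0; }

    int main(void)
    {
        double x;
        int i;
        for (x = 10.0, i = 0; i < 50; i++)
            x -= p(x) / dp(x);
        printf("from  10: %g\n", x);   /* converges to 4 */
        for (x = -10.0, i = 0; i < 50; i++)
            x -= p(x) / dp(x);
        printf("from -10: %g\n", x);   /* converges to 1 */
        return 0;
    }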

-- 
Dave Seaman	  					
ags@j.cc.purdue.edu