[comp.lang.c] a tree question

bhil@ohs.UUCP (Brian T. Hill) (08/09/89)

Does anyone have a good alternative to the AVL method of balancing
binary trees?  It seems to me that the AVL method is wasteful of 
both time and space.

Any solutions will be appreciated.

--------------------------------------------------------------------------
 /"""\       Kerr's Three Rules for a Successful College:
 |^ ^|       Have plenty of football for the alumni, sex for the students,
@|O O|@      and parking for the faculty.
 | - |
  \_/
 %-+-%
   #
 _/ \_       --Brian T. Hill  (...uunet!iconsys!ohs!bhil)  (bhil@ohs.uucp)
--------------------------------------------------------------------------

hascall@atanasoff.cs.iastate.edu (John Hascall) (08/11/89)

In article <421@ohs.UUCP> bhil@ohs.UUCP (Brian T. Hill) writes:
}Does anyone have a good alternative to the AVL method of balancing
}binary trees?  It seems to me that the AVL method is wasteful of 
}both time and space.
 
    How so?

    Insertion (balancing) is O(log n) and requires only 2 extra bits
    per node (although almost everyone uses at least a byte).

    And half the time (roughly) no rebalancing is needed, with single
    and double rotation needed about one time in four each.

    If you insist on another method, how about B-trees?

John Hascall
ISU Comp Center

amirben@TAURUS.BITNET (08/22/89)

In article <421@ohs.UUCP> bhil@ohs.UUCP (Brian T. Hill) writes:
>Does anyone have a good alternative to the AVL method of balancing
>binary trees?  It seems to me that the AVL method is wasteful of
>both time and space.
>

I don't know if you can save a lot beyond AVL trees, but you may find
one of the following methods more elegant/easy-to-implement:

red-black trees --- Sedgewick, Algorithms (Addison-Wesley 83).
2-3 trees --- Aho, Hopcroft and Ullman, The Design and Analysis of
              Computer Algorithms (Addison-Wesley 74).

------
 Amir
------

hascall@atanasoff.cs.iastate.edu (John Hascall) (08/23/89)

In article 1074 amirben%math.tau.ac.il@CUNYVM.CUNY.EDU (Ben-amram Amir) writes:
}In article 421 bhil@ohs.UUCP (Brian T. Hill) writes:

}>Does anyone have a good alternative to the AVL method of balancing...
 
}I don't know if you can save a lot beyond AVL trees, but you may find
}one of the following methods more elegant/easy-to-implement:
 
}2-3 trees --- Aho, Hopcroft and Ullman, The Design and Analysis of
}              Computer Algorithms (Addison-Wesley 74).
 
   BTW, 2-3 trees are really B-trees of order 3.
   They, along with several other trees, are discussed in Knuth V3.
 
   Since this is comp.lang.c :-)  Does anyone know of any good "Algortithm"
   books where the examples are in C?

John Hascall

gillies@m.cs.uiuc.edu (08/23/89)

One of the easiest trees to implement is the Splay tree, but there is
a high overhead (on a 1.5 MIPS Mac II, I could only achieve 55,000
comparisons [not *searches*] per second [searching numbers]).  But an
entire implementation is less than 60 lines of C code.  Splay trees
accomplish insert, delete, join, lookup, all in amortized O(log n)
time.

Send e-mail if you're interested.  The main reference is Tarjan &
Sleator, Siam J. on Computing, 1984 [sometime].

Don Gillies, Dept. of Computer Science, University of Illinois
1304 W. Springfield, Urbana, Ill 61801      
ARPA: gillies@cs.uiuc.edu   UUCP: {uunet,harvard}!uiucdcs!gillies

mccaugh@s.cs.uiuc.edu (08/25/89)

Responding to: gillies@m.cs.uiuc.edu 

> One of the easiest trees to implement is the Splay tree, but there is
> a high overhead (on a 1.5 MIPS Mac II, I could only achieve 55,000
> comparisons [not *searches*] per second [searching numbers]).  But an
> entire implementation is less than 60 lines of C code.  Splay trees
> accomplish insert, delete, join, lookup, all in amortized O(log n)
> time.
> 
 But if there is such a high overhead, how do you gain such efficiency?

gillies@p.cs.uiuc.edu (08/26/89)

> Written  1:58 am  Aug 25, 1989 by mccaugh@s.cs.uiuc.edu in comp.lang.c
> Re: But if there is such a high overhead, how do you gain such efficiency?

Well, splays trees require *no* balancing information.  This saves on
space.  I said that I could acheive 55,000 *comparisons* per second.  In
other words, if the tree has 1000 elements, then log n ~= 10, so that
means only 5500 *searches* per second.

If you've ever implemented AVL trees, you realize what a pain it is.
Splay trees are extremely simple to implement, and have some unique
properties (they are finger trees:  Searches in 'the same
neighborhood' are extremely fast).  For more info, see [1].

[1] D.D. Sleator & R.E. Tarjan, Self-Adjusting Binary Search Trees,
JACM, July 1985, pp. 652-686


Don Gillies, Dept. of Computer Science, University of Illinois
1304 W. Springfield, Urbana, Ill 61801      
ARPA: gillies@cs.uiuc.edu   UUCP: {uunet,harvard}!uiucdcs!gillies

hascall@atanasoff.cs.iastate.edu (John Hascall) (08/26/89)

In article <207600033@s.cs.uiuc.edu> mccaugh@s.cs.uiuc.edu writes:
 
}Responding to: gillies@m.cs.uiuc.edu 
 
}> One of the easiest trees to implement is the Splay tree, but there is
}> a high overhead 
}> ....O(log n) time.
  
} But if there is such a high overhead, how do you gain such efficiency?


     In typical theoretical computer science fashion: 
    
        1) Ignore the constant factor: O(log n) really is c * log n
	   (notice how it was ignored above).

        2) If someone mentions "1)", fall back to the asymptotic argument:
	   "When n is very large...".  Never mind that n has to be so huge
	   that it would/could never happen in real life.  "See when n
	   is 1E1000 records..."  :-)


        Imagine the following conversation:

	   Boss:  "This sort of student records you wrote is too slow."
	   You:   "But if we had a billion students it would be fast."
	   Boss:  (beats you severely with your keyboard)


John Hascall

gillies@p.cs.uiuc.edu (08/28/89)

Re: Hascall from Cornell criticizes the high overhead in Splay Trees
    blames problem on theoretical computer scientists.

First off the time per comparison is constant, whether you do one, or
a billion.  It only takes O(n) comparisons for Splay to achieve O(n
log n) performance.  The high overhead comes about because *every*
search involves doing rotation on the tree.  In fact, the sought-after
object is rotated all the way to the root of the tree (Splay is a
"move to front" heuristic for trees).

The basenote writer wanted something more efficient in time & space
than AVL trees.  That is a Tall Order.  He called AVL "inefficient in
time & space", and didn't give any reasons.  I tend to find today's
automobiles inefficient in time & space, too.

Splay trees are so simple to implement, they also save *code space*.
I wrote a splay package to find ways to speed up splay(), and
successfully designed two new splays.

I agree that 20 microseconds/comparison is pretty slow, and it should
be more like 1-2 microseconds a comparison, but this package is not
absolutely as fast as possible.  In fact, you could
  (0) unroll the rotations (estimate: 65k comparisons/sec)
  (1) Implement top-down splay (estimate: 100k comparisons/sec)
  (2) Implement my splay, top-down (maybe 110k/sec)

Unlike an AVL tree, 100k comparisons becomes 100k lookups, if you're
looking up the same thing repeatedly.  So Splay can blow the doors off
AVL trees if there is much locality of reference, or insertion /
deletion.

Don Gillies, Dept. of Computer Science, University of Illinois
1304 W. Springfield, Urbana, Ill 61801      
ARPA: gillies@cs.uiuc.edu   UUCP: {uunet,harvard}!uiucdcs!gillies

hascall@atanasoff.cs.iastate.edu (John Hascall) (08/29/89)

In article <77200043@p.cs.uiuc.edu> gillies@p.cs.uiuc.edu writes:
 
}Re: Hascall from Cornell criticizes the high overhead in Splay Trees
}    blames problem on theoretical computer scientists.

  Just because we are in the middle of Iowa is no reason to call
  our university "CORNell"!
 
   Anyway, I merely pointed out that asymptotic efficiency (big O)
   does not always equate to real world efficiency.  The converse
   can also be true (i.e., qsort).

   I wasn't blaming anyone, just stating that in theory many practical
   aspects are overlooked/ignored in order to make the theory more
   manageable/tractable/understandable.

   This does not make the theory any less important, rather it merely
   emphasizes that what's important is knowing how to decide which
   theory is applicable to your real world problem (and knowing how
   to apply it).


John Hascall
IOWA STATE UNIVERSITY