[comp.theory] lower-bound for searching sorted sequences

lim@iris.ucdavis.edu (Lloyd Lim) (01/31/91)

In article <56707@eerie.acsu.Buffalo.EDU> sarnath@acsu.buffalo.edu (Ramnath Sarnath) writes:
>In article <56531@eerie.acsu.Buffalo.EDU> I wrote:
>
>This needs a clarification... what I wanted to prove was that searching
>a sorted sequence (by any means) is not possible in time faster than
>O(logn).

You can search faster than O(log n), at least on average.  Interpolation
search comes to mind.  Basically, you guess or interpolate the next
position to look at based on the value you just saw.
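
Here's a rough sketch of the idea in C, assuming integer keys in a sorted
array (the function and variable names are just my own illustration):

#include <stdio.h>

/* Interpolation search: probe where the key "should" be if the values
 * were spread evenly between a[lo] and a[hi], instead of always probing
 * the middle the way binary search does. */
int interp_search(const int a[], int n, int key)
{
    int lo = 0, hi = n - 1, pos;

    while (lo <= hi && key >= a[lo] && key <= a[hi]) {
        if (a[hi] == a[lo])                   /* avoid division by zero */
            return (a[lo] == key) ? lo : -1;

        /* guess the next position from the key's relative value */
        pos = lo + (int)((long)(key - a[lo]) * (hi - lo)
                         / (a[hi] - a[lo]));

        if (a[pos] == key)
            return pos;
        else if (a[pos] < key)
            lo = pos + 1;
        else
            hi = pos - 1;
    }
    return -1;                                /* not found */
}

int main(void)
{
    int a[] = {2, 7, 11, 19, 23, 31, 42, 57, 66, 78, 91};
    int n = sizeof a / sizeof a[0];

    printf("%d\n", interp_search(a, n, 42));  /* prints 6 */
    printf("%d\n", interp_search(a, n, 50));  /* prints -1 */
    return 0;
}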

From Udi Manber's "Introduction to Algorithms" (since I can't say it any
better myself):

"The performance of interpolation search depends not only on the size of the
sequence, but also on the input itself.  There are inputs for which
interpolation search checks every number in the sequence.  However,
interpolation search is very efficient for inputs consisting of relatively
uniformly distributed elements.  It can be shown that the average number of
comparisons performed by interpolation search, where the average is taken over
all possible sequences, is O(log log n).  Although this seems to be an order of
magnitude improvement over the performance of binary search, interpolation
search is not much better than binary search in practice for two main reasons.
First, unless n is very large, the value of log n is small enough that the
logarithm of it is not much smaller.  Second, interpolation search requires
more elaborate arithmetic."
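
To put rough numbers on that first point: with n = 2^20 (about a million
elements), log n is 20 while log log n is only about 4.3 (base 2), so the
average saving is a factor of four or five probes rather than something
dramatic in absolute terms.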

+++
Lloyd Lim     Internet: lim@iris.eecs.ucdavis.edu
              Compuserve: 72647,660
              US Mail: 215 Lysle Leach Hall, U.C. Davis, Davis, CA 95616

NN1@awiwuw11.wu-wien.ac.at (03/11/91)

It was already pointed out that the best lower bound for the complexity
depends on the model of computation used. One possibility is to regard the
array components as abstract atomic objects and to assume that the only
possible operations are comparisons and assignments of objects as a whole;
in this case the lower bound for the number of comparisons is Omega(log n).
The algorithm of average complexity O(log log n) assumes some additional
structure (arithmetic for integers or accessing individual components of
lexicographically ordered strings). In this case it may be more natural,
however, to count the number of bit operations rather than the number of
comparisons, and then the lower bound, even for the average complexity,
will again be Omega(log n), unless the number of DIFFERENT array entries
is of much lower
order than n. Assume, for instance, that the array entries are strings of
length l, ordered lexicographically, and that there are only two different
symbols. (This also includes the case of integers in binary representation,
or strings over a larger alphabet, provided that the ordering of the symbols
coincides with the lexicographical one of their representation as bit
combinations.) Now, the comparison of two such strings can be done within
constant average complexity, but the same is no longer true for the
arithmetical operations required for interpolation. This complexity also
depends on l, and l must be >= log n if all array entries are different.
The proof that the number of bit operations cannot be of smaller order
than log n is the same as in the case where the array entries are
considered unstructured.
It must be admitted, though, that it depends on the assumption that strings
are distributed evenly. In fact, the Omega(log n) barrier can be broken if the
probability distribution of the inputs is rather peculiar, for instance, if
the element to be searched for is almost always smaller than the smallest
array element.
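
To make the constant-average-cost comparison concrete, here is a small C
sketch (my own illustration): for uniformly distributed binary strings the
loop below stops after about two symbols on average, although the worst
case examines all l symbols.

#include <stdio.h>

/* Lexicographic comparison of two length-l strings over {0,1};
 * returns -1, 0 or 1.  Only the symbols up to the first mismatch
 * are ever examined. */
int lex_compare(const char *a, const char *b, int l)
{
    int i;

    for (i = 0; i < l; i++)
        if (a[i] != b[i])
            return (a[i] < b[i]) ? -1 : 1;
    return 0;                    /* all l symbols agree */
}

int main(void)
{
    /* decided at the fourth symbol; prints -1 */
    printf("%d\n", lex_compare("00101101", "00110000", 8));
    return 0;
}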

The whole situation is analogous to that for sorting: in this case the
lower bound is Omega(n log n) if the array elements are assumed to be
unstructured. Without this assumption the RADIXSORT algorithm has linear
complexity, but
only if the number of bits required to represent the array entries is
considered constant. The latter assumption cannot be maintained if we let
n approach infinity, and if we assume that all (or most) entries are
different.
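
For what it is worth, here is a minimal LSD radix-sort sketch in C for
fixed-width 32-bit keys (again my own illustration): with the key width
treated as a constant the running time is O(n), and this is exactly the
assumption that breaks down once the entries must be at least log n bits
long to be distinct.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* LSD radix sort on 32-bit keys, one byte (8 bits) per pass */
void radix_sort(unsigned a[], size_t n)
{
    unsigned *tmp = malloc(n * sizeof *tmp);
    int shift, b;
    size_t i;

    if (tmp == NULL)
        return;

    for (shift = 0; shift < 32; shift += 8) {
        size_t count[256] = {0};

        for (i = 0; i < n; i++)               /* histogram of this byte */
            count[(a[i] >> shift) & 0xFF]++;
        for (b = 1; b < 256; b++)             /* prefix sums -> positions */
            count[b] += count[b - 1];
        for (i = n; i-- > 0; )                /* stable scatter, back to front */
            tmp[--count[(a[i] >> shift) & 0xFF]] = a[i];

        memcpy(a, tmp, n * sizeof *a);
    }
    free(tmp);
}

int main(void)
{
    unsigned a[] = {170, 45, 75, 90, 802, 24, 2, 66};
    size_t n = sizeof a / sizeof a[0];
    size_t i;

    radix_sort(a, n);
    for (i = 0; i < n; i++)
        printf("%u ", a[i]);                  /* 2 24 45 66 75 90 170 802 */
    printf("\n");
    return 0;
}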

All this does not mean that the algorithms requiring an asymptotically low
number of comparisons cannot be superior in practice to those where the
number of comparisons is O(log n) or O(n log n), respectively. My only point
is that from a theoretical point of view the bounds log n or n log n
respectively do remain valid as far as the number of bit operations is
concerned.

Heinrich Rolletschek
RISC-Linz
K310670@AEARN