[comp.lang.c] search

glenn@stsim.UUCP (Glenn Ford) (08/17/89)

I am writing a small project (not related to work, personal) that has to
do searches on a sorted database.  I search on a keyword, which is a string
of length N and is unique among all keys.  Right now I do a binary
search on the database, but this tends to be slow when run off a floppy
drive (yes, it has to run on a floppy drive in SOME cases).  My question
is: is there a faster, yet fairly easy to implement, routine that can
search my database?  Faster than binary search.  I don't want B-trees.  My
database is already built.

Glenn Ford
..uunet!ocsmd!stsim!glenn

chris@mimsy.UUCP (Chris Torek) (08/18/89)

In article <184@stsim.UUCP> glenn@stsim.UUCP (Glenn Ford) writes:
>... searches on a sorted database. ... Is there a faster, yet fairly easy
>to implement, routine that can search my database?  Faster than binary
>search.  I don't want B-trees.  My database is already built.

An interpolative search will run in O(log log n) time rather than
O(log n) in the `average' case.  Coding these is not too terribly
tricky (one assumes that if a[0] is 0 and a[10] is 100, then a[1] is
10, a[2] is 20, etc., until it is otherwise demonstrated).  But what
you really want to do is minimise read operations (since you are
reading off slow media), which suggests that the algorithm be tweaked
slightly: read big blocks (if they are contiguous media-wise) and look
at the edges of the block before switching blocks.
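
The interpolation step looks roughly like this (an untested sketch: it
assumes an in-memory array of ints rather than your string keys on disk,
and leaves the block-reading tweak out):

/* Interpolation search: probe where the key `ought' to be, assuming
 * the values are spread evenly between a[lo] and a[hi].  Returns the
 * index of key in a[0..n-1], or -1 if it is not there.
 */
int
isearch(int *a, int n, int key)
{
	int lo = 0, hi = n - 1, mid;

	while (lo <= hi && key >= a[lo] && key <= a[hi]) {
		if (a[hi] == a[lo])	/* avoid dividing by zero */
			mid = lo;
		else
			mid = lo + (int)((long)(key - a[lo]) * (hi - lo)
			    / (a[hi] - a[lo]));
		if (a[mid] == key)
			return mid;
		if (a[mid] < key)
			lo = mid + 1;
		else
			hi = mid - 1;
	}
	return -1;
}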

You will do better, though, if you rebuild the database.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

wjr@ftp.COM (Bill Rust) (08/18/89)

In article <184@stsim.UUCP> glenn@stsim.UUCP (Glenn Ford) writes:
>I am writing a small project (not related to work, personal) that has to
>do searches on a sorted database.  I search on a keyword, which is a string
>of length N and is unique among all keys.  Right now I do a binary
>search on the database, but this tends to be slow when run off a floppy
>drive (yes, it has to run on a floppy drive in SOME cases).  My question
>is: is there a faster, yet fairly easy to implement, routine that can
>search my database?  Faster than binary search.  I don't want B-trees.  My
>database is already built.


First, I am not sure why you don't want B-trees (or some variant),
since they are pretty easy to implement and quite fast.  You don't
mention how big the database is or what percentage of each record the
key takes up.  If the key is a small part of the record, making a
separate key file that gets read into your application in one piece
will really speed up access to the database, since the keys are then
searched at RAM speed, not disk speed.  Also, if the database is a
fixed size, a hashing function (directly converting the key to a
record position) may be most appropriate.  If you want a really good
discussion of all this, consult Knuth.  Just remember that you want to
minimize the number of read operations to disk.  One possible
longshot, since you are working with floppies, is to just read in the
whole database.  On a PC you can get upwards of 500K of data read in
with a small program, and at that point your searching technique
matters a lot less.
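
The separate key file might look something like this (an untested sketch;
the layout, KEYLEN, and the function names are made up for illustration:
fixed-length NUL-padded keys, each paired with the byte offset of its
record in the main data file):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define KEYLEN 16

struct keyent {
	char key[KEYLEN];	/* NUL-padded key string */
	long offset;		/* byte offset of the record in the data file */
};

/* comparison routine for bsearch(): key string vs. key table entry */
static int
keycmp(const void *k, const void *e)
{
	return strncmp((const char *)k,
	    ((const struct keyent *)e)->key, KEYLEN);
}

/* Search the key table (already read into memory, sorted) at RAM
 * speed, then make a single seek into the data file for the record.
 * Returns 0 on success, -1 if the key is missing or the read fails.
 */
int
get_record(struct keyent *tab, size_t n, const char *key,
    FILE *datafp, char *rec, size_t reclen)
{
	struct keyent *p = bsearch(key, tab, n, sizeof *tab, keycmp);

	if (p == NULL)
		return -1;
	if (fseek(datafp, p->offset, SEEK_SET) != 0)
		return -1;
	return fread(rec, 1, reclen, datafp) == reclen ? 0 : -1;
}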

Bill Rust (wjr@ftp.com)

cliffs@sunrock.east.sun.com (Clifford C. Skolnick) (08/19/89)

In article <184@stsim.UUCP> glenn@stsim.UUCP (Glenn Ford) writes:

   I am writing a small project (not related to work, personal) that has to
   do searches on a sorted database.  I search on a keyword, which is a string
   of length N and is unique among all keys.  Right now I do a binary
   search on the database, but this tends to be slow when run off a floppy
   drive (yes, it has to run on a floppy drive in SOME cases).  My question
   is: is there a faster, yet fairly easy to implement, routine that can
   search my database?  Faster than binary search.  I don't want B-trees.  My
   database is already built.

   Glenn Ford
   ..uunet!ocsmd!stsim!glenn

Perhaps you could make a better guess at where to start your binary search.
Say, look at the first character of the key and make a guess about where to
start in the file.  You can also branch on much less, less, equal, greater,
and much greater.  This will be a bit faster than a binary search.  You say
you are using a binary search now, so not much more logic is needed.
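
The starting guess could be as simple as this (hypothetical: keys begin
with an uppercase letter and the file holds nrec fixed-length records;
the usual binary search then narrows in around the guess):

/* Guess a starting record from the first character of the key by
 * scaling it across the record range; fall back to the middle when
 * the character is outside the assumed A-Z range.
 */
long
first_guess(const char *key, long nrec)
{
	if (key[0] < 'A' || key[0] > 'Z')
		return nrec / 2;
	return (key[0] - 'A') * nrec / 26;
}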

				Cliff

glenn@ocsmd.ocs.com (glenn ford) (08/23/89)

In article <706@ftp.COM> wjr@ftp.UUCP (Bill Rust) writes:
>with floppies, is to just read in the whole database.  On a PC you can
>get upwards of 500K of data read in with a small program, and at that
>point your searching technique matters a lot less.
This is what I ended up doing.  I just read the whole database into
memory and search it there.  My database is fairly small, but when
forced to run off a floppy drive, searching it in place was too slow.
Thanks to all for the replies!
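
What it boils down to, roughly (an untested sketch; the record layout,
RECLEN, and KEYLEN are made up for illustration: fixed-length, sorted
records with a NUL-padded key at the front of each):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define RECLEN	64	/* hypothetical fixed record length */
#define KEYLEN	16	/* key field at the front of each record */

static char *db;	/* the whole database, read into memory */
static long nrec;	/* number of records in db */

/* compare a key string against the key field of a record */
static int
reccmp(const void *k, const void *r)
{
	return strncmp((const char *)k, (const char *)r, KEYLEN);
}

/* Slurp the whole database file into memory.  Returns 0 or -1. */
int
load_db(const char *dbfile)
{
	FILE *fp = fopen(dbfile, "rb");
	long size;

	if (fp == NULL)
		return -1;
	fseek(fp, 0L, SEEK_END);
	size = ftell(fp);
	rewind(fp);
	nrec = size / RECLEN;
	db = malloc((size_t)size);
	if (db == NULL ||
	    fread(db, RECLEN, (size_t)nrec, fp) != (size_t)nrec) {
		fclose(fp);
		return -1;
	}
	fclose(fp);
	return 0;
}

/* Binary search the in-memory records; returns the record or NULL. */
char *
find(const char *key)
{
	return bsearch(key, db, (size_t)nrec, RECLEN, reccmp);
}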

Glenn Ford
..uunet!ocsmd!stsim!glenn