[comp.databases] Optimization algorithms

gbrandon@BRAHMS.BERKELEY.EDU (Greg Brandon) (12/01/86)

Is there a forum specifically for the discussion of relational database
access optimization algorithms, or is this the place for that kind of
discussion?  I am interested in both online (newsgroup, mailing list)
or paper (newsletter, digest) forums.  AdvTHANKSance.

Greg Brandon

wohler@sri-spam.istc.sri.com (Bill Wohler) (12/08/86)

Greg Brandon writes:

>Is there a forum specifically for the discussion of relational database
>access optimization algorithms, or is this the place for that kind of
>discussion?

  i was expecting to find more of a technical discussion when i began
  reading this column, but have not yet seen much.

  so hopefully to begin a more fruitful dialogue, i pose the
  following. i used to use "grep -i <pattern> phonebook" to retreive
  records from my personal online phone directory, but now that the
  file has begun to get quite large (> 200 records and counting), i
  have been tossing around ideas for implementing a more efficient and
  flexible method of retreival.

  however, the requirements are a little different from normal
  databases.  i'd like the generality of grep in my searches, that is,
  to be able to search for a record with only fragments of a key.  for
  example, "dan" would match the first names "Dan" and "Daniel", the
  last name "Danielson", and the company called "Danskin".  i currently
  key on first and last names, business name, and (at least now) month
  of birthday to get a quick printout of people who have birthdays
  this month. 

  dbm(3) loses since it needs fully qualified keys, and only allows
  one entry per key.  note that a name may occur more than once.

  i'm thinking of creating a multi-ring stucture, but can't figure out
  how to create an efficient way of finding an index from a partial
  key short of using a linear search.  perhaps there might be a way in
  a relational type database.

  a hash function for this would make a nice thesis (and a quick
  program). 

  well, food for thought...

							--bw

UH2@PSUVM.BITNET (12/09/86)

Go ahead and discuss algorithms.  It will be a welcome change
from this newsgroup's usual fare (including that submitted by me)
which comprises mostly questions about where one can get great
free software for a Heinz 57 running Foonix version 92.  8-)

Lee

brain@Shasta.STANFORD.EDU (Sam Brain) (12/09/86)

>   i'd like the generality of grep in my searches, that is,
	...
>   for example, "dan" would match the first names "Dan" and "Daniel", the
>   last name "Danielson", and the company called "Danskin".  i currently
>   key on first and last names, business name, and (at least now) month
>   of birthday to get a quick printout of people who have birthdays
>   this month. 
>   ... I can't figure out
>   how to create an efficient way of finding an index from a partial
>   key short of using a linear search.  perhaps there might be a way in
>   a relational type database.

Since all your examples are fragments at the beginning of the word,
it seems you could create inverted indices, (which are, of course
sorted lexicographically) i.e. ISAM- or B-tree-type structures, one for each
field you want to key on. This would give the added flexibility of saying
things like:

	'find lastname = "Dan*" or firstname = "Ge*"'
			or
	'find lastname = "Dan*" and firstname = "Ge*"'

depending on what you were looking for. Of course you couldn't find
strings of the form "*niel*" etc.

Most DBMSs have inverted indices of one type or another. And haven't I
seen B-tree code on {mod|net}.sources?

UH2@PSUVM.BITNET (12/10/86)

To be honest, I am not sure I understand your proposed approach.  But two
practical questions come to mind.


1.  Have you tried egrep?  In theory it takes longer to encode the
search string, but then the search is faster.

2.  By adding structure to your phone book, you are gonna lose the
transparent interface to other tools provided by the shell.

lee

billc@blia.BLI.COM (Bill Coffin) (12/10/86)

response to the article about grepping the phone list:

Most relational DBMSs will do what you want -- search on an index with
a partial key.  Most DBMSs also support rudimentary pattern matching;
much weaker than egrep, but still better than nothing.  Keep in mind
though that the very nature of pattern-matching tends to defeat 
indexing.  So, use of partial keys should be allowed in any self-respecting
DBMS, but the user should be aware that the search may degenerate into
a linear scan.  (Of course, with just 200 tuples this won't hurt much.)

paul@vcvax1.UUCP (paul) (12/14/86)

> 2.  By adding structure to your phone book, you are gonna lose the
> transparent interface to other tools provided by the shell.

Not necessarily.  The database in our (VenturCom's) Prelude
Information Management System product has a similar challenge: to use an
ASCII database format, compatible with existing UNIX tools, and
yet provide a fast look-up mechanism.

Our solution is to use a daemon, called the Record Management
System (RMS), which maintains a btree index (in a separate file)
to the database.  Programs which wish to locate a record quickly
communicate with the RMS to find records.  The RMS is also used
to update records in the table, as well as maintain locks.
However, since the actual table is always kept in ASCII format,
other UNIX programs are free to read it directly.  (The one
constraint is that they are not permitted to write to it
directly, since that would cause inconsistencies between the
file's contents and the RMS btree.)

The result is that we can maintain an ASCII, "open-architecture"
data format, and still provide fast record access when desired.

------------
Paul Kleppner
VenturCom, Inc.
617/661-1230
{seismo!harvard,genrad!mit-eddie}!cybvax0!vcvax1!paul

kimcm@olamb.UUCP (Kim Chr. Madsen) (12/17/86)

In article <8988UH2@PSUVM>, UH2@PSUVM.BITNET writes:
> 1.  Have you tried egrep?  In theory it takes longer to encode the
> search string, but then the search is faster.

All very fine for ordinary ASCII files that's not too big!
or if you have ordered them in some way to optimize (might not be
ASCII sorting) seaching even egrep to to slow to produce the answers.

Another thing is that Regular Expressions (as used in egrep) tends to
be unitelligible for non-UNIX-wizards (eg secretaries) and making shell-
scripts to hide the RE's will almost certainly limit the usability of them.

> 
> 2.  By adding structure to your phone book, you are gonna lose the
> transparent interface to other tools provided by the shell.

OK, UNIX provides you with a byte-stream, which is all very fine, makes a
lot of applications avaiable to the same files. But the concept is
already broken in several places, just think of the /etc/passwd file. A
nicely structured file, with a record structure. But you can still use
standard UNIX tools on it (cat, sed, awk, vi ...etc...).

Database systems that uses C-ISAM (or the like) will normally make
similar ASCII files, with one line containing a record, with fields
separated by a special character (not always a good idea, many uses
'|' (vertical bar) as a default separator but this is a genuine letter in
many languages - but still we have to face that UNIX is based on good ole
7-bit ASCII). This file (the table file) is normally paired with an index
file, which contains the offset-indexes to the record-entries and
field-entries in the tablefile.

Other database systems, do not use the standard UNIX filesystem but make
a disk-partition on which it uses raw access and own I/O routines. This
concept is generally used to speed up the disk-I/O which tends to be the
bottleneck in database systems (giving the system a superior performance
compared with systems using the standard UNIX FS). This may sound far
away from the UNIX approach - but again it has been done in standard UNIX
for a long time - think of the swap-partition on every UNIX disk, and
think of the performance loss it would be to implement the swapping on
the standard FS.

					Kim Chr. Madsen