gbrandon@BRAHMS.BERKELEY.EDU (Greg Brandon) (12/01/86)
Is there a forum specifically for the discussion of relational database access optimization algorithms, or is this the place for that kind of discussion? I am interested in both online (newsgroup, mailing list) or paper (newsletter, digest) forums. AdvTHANKSance. Greg Brandon
wohler@sri-spam.istc.sri.com (Bill Wohler) (12/08/86)
Greg Brandon writes: >Is there a forum specifically for the discussion of relational database >access optimization algorithms, or is this the place for that kind of >discussion? i was expecting to find more of a technical discussion when i began reading this column, but have not yet seen much. so hopefully to begin a more fruitful dialogue, i pose the following. i used to use "grep -i <pattern> phonebook" to retreive records from my personal online phone directory, but now that the file has begun to get quite large (> 200 records and counting), i have been tossing around ideas for implementing a more efficient and flexible method of retreival. however, the requirements are a little different from normal databases. i'd like the generality of grep in my searches, that is, to be able to search for a record with only fragments of a key. for example, "dan" would match the first names "Dan" and "Daniel", the last name "Danielson", and the company called "Danskin". i currently key on first and last names, business name, and (at least now) month of birthday to get a quick printout of people who have birthdays this month. dbm(3) loses since it needs fully qualified keys, and only allows one entry per key. note that a name may occur more than once. i'm thinking of creating a multi-ring stucture, but can't figure out how to create an efficient way of finding an index from a partial key short of using a linear search. perhaps there might be a way in a relational type database. a hash function for this would make a nice thesis (and a quick program). well, food for thought... --bw
UH2@PSUVM.BITNET (12/09/86)
Go ahead and discuss algorithms. It will be a welcome change from this newsgroup's usual fare (including that submitted by me) which comprises mostly questions about where one can get great free software for a Heinz 57 running Foonix version 92. 8-) Lee
brain@Shasta.STANFORD.EDU (Sam Brain) (12/09/86)
> i'd like the generality of grep in my searches, that is, ... > for example, "dan" would match the first names "Dan" and "Daniel", the > last name "Danielson", and the company called "Danskin". i currently > key on first and last names, business name, and (at least now) month > of birthday to get a quick printout of people who have birthdays > this month. > ... I can't figure out > how to create an efficient way of finding an index from a partial > key short of using a linear search. perhaps there might be a way in > a relational type database. Since all your examples are fragments at the beginning of the word, it seems you could create inverted indices, (which are, of course sorted lexicographically) i.e. ISAM- or B-tree-type structures, one for each field you want to key on. This would give the added flexibility of saying things like: 'find lastname = "Dan*" or firstname = "Ge*"' or 'find lastname = "Dan*" and firstname = "Ge*"' depending on what you were looking for. Of course you couldn't find strings of the form "*niel*" etc. Most DBMSs have inverted indices of one type or another. And haven't I seen B-tree code on {mod|net}.sources?
UH2@PSUVM.BITNET (12/10/86)
To be honest, I am not sure I understand your proposed approach. But two practical questions come to mind. 1. Have you tried egrep? In theory it takes longer to encode the search string, but then the search is faster. 2. By adding structure to your phone book, you are gonna lose the transparent interface to other tools provided by the shell. lee
billc@blia.BLI.COM (Bill Coffin) (12/10/86)
response to the article about grepping the phone list: Most relational DBMSs will do what you want -- search on an index with a partial key. Most DBMSs also support rudimentary pattern matching; much weaker than egrep, but still better than nothing. Keep in mind though that the very nature of pattern-matching tends to defeat indexing. So, use of partial keys should be allowed in any self-respecting DBMS, but the user should be aware that the search may degenerate into a linear scan. (Of course, with just 200 tuples this won't hurt much.)
paul@vcvax1.UUCP (paul) (12/14/86)
> 2. By adding structure to your phone book, you are gonna lose the > transparent interface to other tools provided by the shell. Not necessarily. The database in our (VenturCom's) Prelude Information Management System product has a similar challenge: to use an ASCII database format, compatible with existing UNIX tools, and yet provide a fast look-up mechanism. Our solution is to use a daemon, called the Record Management System (RMS), which maintains a btree index (in a separate file) to the database. Programs which wish to locate a record quickly communicate with the RMS to find records. The RMS is also used to update records in the table, as well as maintain locks. However, since the actual table is always kept in ASCII format, other UNIX programs are free to read it directly. (The one constraint is that they are not permitted to write to it directly, since that would cause inconsistencies between the file's contents and the RMS btree.) The result is that we can maintain an ASCII, "open-architecture" data format, and still provide fast record access when desired. ------------ Paul Kleppner VenturCom, Inc. 617/661-1230 {seismo!harvard,genrad!mit-eddie}!cybvax0!vcvax1!paul
kimcm@olamb.UUCP (Kim Chr. Madsen) (12/17/86)
In article <8988UH2@PSUVM>, UH2@PSUVM.BITNET writes: > 1. Have you tried egrep? In theory it takes longer to encode the > search string, but then the search is faster. All very fine for ordinary ASCII files that's not too big! or if you have ordered them in some way to optimize (might not be ASCII sorting) seaching even egrep to to slow to produce the answers. Another thing is that Regular Expressions (as used in egrep) tends to be unitelligible for non-UNIX-wizards (eg secretaries) and making shell- scripts to hide the RE's will almost certainly limit the usability of them. > > 2. By adding structure to your phone book, you are gonna lose the > transparent interface to other tools provided by the shell. OK, UNIX provides you with a byte-stream, which is all very fine, makes a lot of applications avaiable to the same files. But the concept is already broken in several places, just think of the /etc/passwd file. A nicely structured file, with a record structure. But you can still use standard UNIX tools on it (cat, sed, awk, vi ...etc...). Database systems that uses C-ISAM (or the like) will normally make similar ASCII files, with one line containing a record, with fields separated by a special character (not always a good idea, many uses '|' (vertical bar) as a default separator but this is a genuine letter in many languages - but still we have to face that UNIX is based on good ole 7-bit ASCII). This file (the table file) is normally paired with an index file, which contains the offset-indexes to the record-entries and field-entries in the tablefile. Other database systems, do not use the standard UNIX filesystem but make a disk-partition on which it uses raw access and own I/O routines. This concept is generally used to speed up the disk-I/O which tends to be the bottleneck in database systems (giving the system a superior performance compared with systems using the standard UNIX FS). This may sound far away from the UNIX approach - but again it has been done in standard UNIX for a long time - think of the swap-partition on every UNIX disk, and think of the performance loss it would be to implement the swapping on the standard FS. Kim Chr. Madsen