curtin@cbnewse.ATT.COM (C.S.Curtin) (02/23/90)
this posting is for a friend without net access. I will forward
e-mail and redirect flames > /dev/null.
dBase/Clipper gurus
I am doing an application that is having speed problems when
many records are entered into the database. The time
seek()ing records using the index do not seem to be linear.
Sure with additional records the seek time should increase
but the factor seems to be x**2. This generates several
questions for the data base internals gurus to answer. Are
there any good books on the internals of pc bases data
bases? (mainly clipper/dbase)
1. How are the index files structured? The data base
will have approximately 20,000 records and will have
several indexes active at a time. currently the size
of the index file is 250K bytes. How is the index
internally accessed by clipper, since the index cannot
be kept in core memory? Does it require clipper to
access the index file on disk first?
a. does anyone have any hints on increasing the
speed of the index access?
b. Does index key data type have that much to do
with it? I am assuming that it just does a
string compare between the requested key and the
key in the index file.
c. What does it cost to index on various data types
(ie. string+int) or should all indexes be
performed on just string "types" ? key
2. How is the skip command performed? What actually
happens when a skip command is given? Does the file
pointer just increment X bytes (whatever the width of
the record) number of bytes, or is the record read
into the buffer and flushed by clipper? It appears as
though the record is read into the buffer and the file
pointer not just incremented
is it?
lseek(dbf_fd, (num_skips)*sizeof(record), SEEK_CUR);
or?
while (num_skips--)
read(dbf_fd, rec, sizeof(struct rec);
Any assistance on these issues or others related to the
internals of clipper/dbase would be appreciated. I am
concerned about the speed of the data access and currently
cannot see how a system with many records can be maintained
using either of these databases.awd@dbase.A-T.COM (Alastair Dallas) (02/27/90)
Congratulations! You've managed to hit on precisely what my management's lawyers mean when they speak of "proprietary information." Sorry for being flip, but this is in reply to mail that asked question after question pertaining to the exact nature of Clipper (and by extension dBASE) operations and there's just no way I can be forthcoming. I can say that the main speed cost in any PC database system is reading the disk. Nothing else (string compare vs numeric compare) comes close to affecting the bottom line speed so profoundly as being able to avoid "hitting the disk" even once. Therefore, by keeping your index keys small you allow the system to pack more of them into a fixed-length block (dBASE IV supports adjustable block sizes), which ultimately reduces the number of disk reads (especially for SKIP operations). If you want to get really tricky, write code that hashes your key values into a 4-byte long and index on a UDF that uses this value to build a 4-byte Character string. That'll let you SKIP 40 times or so without reading another index node. The other thing I _can_ say is that you might look at Knuth's "Art of Computer Programming," Vol. 3: Sorting and Searching. It describes the operation of Clipper's and dBASE's indexing in sufficient abstraction so as not to perturb the lawyers. Hope it helps. /alastair/