curtin@cbnewse.ATT.COM (C.S.Curtin) (02/23/90)
This posting is for a friend without net access. I will forward e-mail and redirect flames > /dev/null.

dBase/Clipper gurus,

I am working on an application that runs into speed problems when many records are entered into the database. The time spent seek()ing records through the index does not seem to be linear. Sure, with additional records the seek time should increase, but the factor seems to be x**2. This raises several questions for the database internals gurus to answer.

Are there any good books on the internals of PC-based databases (mainly Clipper/dBASE)?

1. How are the index files structured? The database will have approximately 20,000 records and will have several indexes active at a time. Currently the size of the index file is 250K bytes. How is the index accessed internally by Clipper, since the index cannot be kept in core memory? Does Clipper have to access the index file on disk first?

   a. Does anyone have any hints on increasing the speed of index access?
   b. Does the index key data type have that much to do with it? I am assuming that it just does a string compare between the requested key and the key in the index file.
   c. What does it cost to index on various data types (i.e. string+int), or should all indexes be built on just string "type" keys?

2. How is the skip command performed? What actually happens when a skip command is given? Does the file pointer just advance by X bytes (the width of the record) per skip, or is the record read into the buffer and flushed by Clipper? It appears that the record is read into the buffer and the file pointer is not just incremented. Is it

       lseek(dbf_fd, num_skips * sizeof(struct rec), SEEK_CUR);

   or

       while (num_skips--) read(dbf_fd, rec, sizeof(struct rec));

   ?

Any assistance on these issues, or others related to the internals of Clipper/dBASE, would be appreciated.
I am concerned about the speed of the data access and currently cannot see how a system with many records can be maintained using either of these databases.
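For readers who want to try the two alternatives in question 2, here is a minimal sketch in portable C using stdio rather than raw file descriptors. It says nothing about what Clipper or dBASE actually do. The helper names skip_by_seek and skip_by_read are hypothetical, and the file is assumed to hold fixed-width records with no header.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: skip n fixed-width records by moving the file
   position only. No record data is transferred by this call.
   Returns 0 on success, nonzero on failure (as fseek does). */
int skip_by_seek(FILE *fp, long n, size_t rec_size)
{
    assert(fp != NULL && rec_size > 0);
    return fseek(fp, n * (long)rec_size, SEEK_CUR);
}

/* Hypothetical helper: skip n records by reading each one into a
   scratch buffer and discarding it.
   Returns 0 on success, -1 on allocation failure or short read. */
int skip_by_read(FILE *fp, long n, size_t rec_size)
{
    assert(fp != NULL && rec_size > 0);
    char *buf = malloc(rec_size);
    int rc = 0;

    if (buf == NULL)
        return -1;
    while (n-- > 0) {
        if (fread(buf, rec_size, 1, fp) != 1) {
            rc = -1;            /* hit end of file or a read error */
            break;
        }
    }
    free(buf);
    return rc;
}
```

Both helpers leave the file position in the same place when enough records remain. The difference is that the read loop transfers every skipped record, while the seek version transfers none.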
awd@dbase.A-T.COM (Alastair Dallas) (02/27/90)
Congratulations! You've managed to hit on precisely what my management's lawyers mean when they speak of "proprietary information." Sorry for being flip, but this is in reply to mail that asked question after question pertaining to the exact nature of Clipper (and by extension dBASE) operations, and there's just no way I can be forthcoming.

I can say that the main speed cost in any PC database system is reading the disk. Nothing else (string compare vs. numeric compare) affects bottom-line speed as profoundly as being able to avoid "hitting the disk" even once. Therefore, by keeping your index keys small you allow the system to pack more of them into a fixed-length block (dBASE IV supports adjustable block sizes), which ultimately reduces the number of disk reads (especially for SKIP operations).

If you want to get really tricky, write code that hashes your key values into a 4-byte long and index on a UDF that uses this value to build a 4-byte Character string. That'll let you SKIP 40 times or so without reading another index node.

The other thing I _can_ say is that you might look at Knuth's "Art of Computer Programming," Vol. 3: Sorting and Searching. It describes the operation of Clipper's and dBASE's indexing in sufficient abstraction so as not to perturb the lawyers.

Hope it helps.

/alastair/
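The arithmetic behind the small-key advice can be sketched in C. This is only an illustration under stated assumptions: the post names no hash function, so 32-bit FNV-1a stands in here, and the function names, the 512-byte block size, and the 8 bytes of per-entry overhead are all assumed figures, not documented Clipper or dBASE internals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a, used here as a stand-in hash (an assumption; the
   post does not say which hash to use). */
uint32_t hash_key(const char *key)
{
    uint32_t h = 2166136261u;           /* FNV offset basis */
    for (; *key != '\0'; key++) {
        h ^= (unsigned char)*key;
        h *= 16777619u;                 /* FNV prime */
    }
    return h;
}

/* Store the hash as 4 bytes, most significant byte first, so that a
   byte-wise compare orders keys the same way as the numeric hash. */
void hash_to_key_bytes(uint32_t h, unsigned char out[4])
{
    out[0] = (unsigned char)(h >> 24);
    out[1] = (unsigned char)(h >> 16);
    out[2] = (unsigned char)(h >> 8);
    out[3] = (unsigned char)h;
}

/* Number of index entries that fit in one block. entry_overhead is
   the assumed per-entry bookkeeping (pointers, record number). */
size_t keys_per_block(size_t block_size, size_t key_len,
                      size_t entry_overhead)
{
    assert(key_len + entry_overhead > 0);
    return block_size / (key_len + entry_overhead);
}
```

Under those assumed figures, keys_per_block(512, 4, 8) gives 42 entries per block, in line with the "40 times or so" above, while a 30-byte key gives only 13. Note that hashed keys no longer sort in the original key order and two different keys can hash to the same value, so a lookup would still need to compare the real field after the seek.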