RMC100@psuvm.psu.edu (Randy Carraghan) (02/06/90)
This post is probably more appropriate for comp.databases, but it seems the best hackers 'live' in this group, so... Has anybody had much experience tearing apart the structures of popular PC databases (R:base, dBASE, Paradox)? I've spent a fair amount of time deciphering the R:base file format and will soon move on to dBASE. From what I've seen in Jeff Walden's "File Formats for Popular PC Software" (1986 edition), the dBASE II and III file formats seem pretty straightforward, but I haven't seen the specs for III+ or IV yet. Are there other, more up-to-date references for these versions and/or Paradox?

Does anybody know the type of encryption dBASE uses to protect its files? R:base's protection was very simple, but I imagine dBASE will be more of a challenge (perhaps not even breakable if DES or some similar technique is used). I'd guess the protection used is something much less CPU-intensive, though, since a database's speed is a primary consideration.

My final question concerns the use of B+ trees. Keys in the R:base index file are four bytes. If the key field is an integer/date/time, its value is the key. If the key field is a string, the two lowest-order bytes are its first two characters and the two highest-order bytes are a function of the remaining 3..n characters of the string. If there is a collision, the duplicates are stored in another block. (I've sketched the scheme as I understand it at the end of this post.) The key definition seems very efficient: only four bytes are used, and for strings sharing the same first two characters there is still 65K+ resolution in the remaining two bytes. I was under the impression some databases use up to 30 characters (just straight ASCII) as the key, but that would allow far fewer keys per block. Is anyone aware of other popular techniques?

Also, it's difficult to tell the parameters a particular application uses to construct its B+ tree (when to split, distribution of keys on a split, key movement as in Knuth's B* tree, etc.), but it seems that if you were to recreate a B+ tree for an application using your own technique, it could still be used by their algorithm. Likewise, a B+ tree created by their application could still be read by your own. That is, two algorithms which build trees in two different ways (resulting in two distinct trees) could still reference and 'grow' each other's B+ trees, since both maintain the same invariants. Is this the case, or am I missing some points? (A toy illustration of what I mean is also appended below.)

And finally (I know, I said I was done before...), what is generally the best sorting method to use for large (disk-based) files? Does anyone know what algorithms the commercial products implement? I've read about so many different techniques (an internal merge or tournament sort followed by one of many varieties of merge sort to disk); a sketch of the textbook version is appended as well.

Thanks for any help/advice/references!

Randy Carraghan (rmc100@psuvm.bitnet)
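P.S. Here are the sketches promised above, in C. First, the R:base string-key scheme as I've reverse-engineered it. The function and type names are my own, and since I haven't pinned down the real "function of the remaining characters", hash_rest() below is just a stand-in, NOT what R:base actually does:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Placeholder for the unknown function of characters 3..n.
 * This is my own invention, not R:base's. */
static uint16_t hash_rest(const char *s)
{
    uint16_t h = 0;
    while (*s)
        h = (uint16_t)(h * 31 + (unsigned char)*s++);
    return h;
}

/* Integer/date/time fields: the value itself is the key. */
uint32_t make_key_int(int32_t value)
{
    return (uint32_t)value;
}

/* String fields: low two bytes = first two characters,
 * high two bytes = function of the rest.  Collisions are
 * resolved by storing duplicates in another block. */
uint32_t make_key_str(const char *s)
{
    size_t   len = strlen(s);
    uint32_t lo  = 0;

    if (len > 0) lo  = (unsigned char)s[0];
    if (len > 1) lo |= (uint32_t)(unsigned char)s[1] << 8;

    return ((uint32_t)(len > 2 ? hash_rest(s + 2) : 0) << 16) | lo;
}

int main(void)
{
    printf("%08lx\n", (unsigned long)make_key_str("SMITH"));
    return 0;
}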
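Second, a toy illustration of the split-policy point. The node layout and ORDER value are made up; the idea is that a leaf split is parameterized by where you cut, so a 50/50 policy and a B*-style policy produce different trees, yet lookup code (which only assumes sorted keys and intact invariants) works on either:

#include <stdio.h>
#include <stdint.h>

#define ORDER 8                     /* max keys per leaf (toy value) */

struct leaf {
    int          nkeys;
    uint32_t     keys[ORDER];
    struct leaf *next;              /* leaf chaining, as in a B+ tree */
};

/* Split 'full' at position 'cut', moving keys[cut..] into 'right'.
 * A 50/50 policy passes cut = ORDER/2; another policy may pass
 * something else.  Any cut in 1..nkeys-1 leaves both leaves sorted
 * and the invariants intact. */
void split_leaf(struct leaf *full, struct leaf *right, int cut)
{
    int i;
    right->nkeys = full->nkeys - cut;
    for (i = 0; i < right->nkeys; i++)
        right->keys[i] = full->keys[cut + i];
    right->next = full->next;
    full->next  = right;
    full->nkeys = cut;
}

/* Lookup assumes only sorted keys -- it never needs to know which
 * split policy built the leaf. */
int leaf_contains(const struct leaf *lf, uint32_t key)
{
    int lo = 0, hi = lf->nkeys - 1;
    while (lo <= hi) {
        int mid = (lo + hi) / 2;
        if (lf->keys[mid] == key) return 1;
        if (lf->keys[mid] < key)  lo = mid + 1;
        else                      hi = mid - 1;
    }
    return 0;
}

int main(void)
{
    struct leaf a = { 4, {10, 20, 30, 40}, NULL }, b;
    split_leaf(&a, &b, 2);          /* one policy's choice of cut */
    printf("30 in right leaf: %d\n", leaf_contains(&b, 30));
    return 0;
}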
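And last, the sorting question. The textbook scheme I've read about is replacement selection ("tournament sort") to generate initial runs, followed by a multiway merge of the runs from disk; on random input the runs average twice the memory size (Knuth, vol. 3). This sketch only does the run-generation phase, and the key-per-line text format, M, and the run-break marker are all made up for illustration -- a real sort would write fixed-size records to temp files:

#include <stdio.h>

#define M 4096                        /* records that fit in memory */

struct rec { long key; int run; };    /* run = run this record belongs to */

static struct rec heap[M];
static int nheap;

/* Min-heap priority: earlier run first, then smaller key. */
static int before(struct rec a, struct rec b)
{
    return a.run != b.run ? a.run < b.run : a.key < b.key;
}

static void sift_down(int i)
{
    for (;;) {
        int l = 2 * i + 1, r = l + 1, s = i;
        if (l < nheap && before(heap[l], heap[s])) s = l;
        if (r < nheap && before(heap[r], heap[s])) s = r;
        if (s == i) break;
        struct rec t = heap[i]; heap[i] = heap[s]; heap[s] = t;
        i = s;
    }
}

void make_runs(FILE *in, FILE *out)
{
    long k;
    int run = 0, i;

    nheap = 0;
    while (nheap < M && fscanf(in, "%ld", &k) == 1)
        heap[nheap++] = (struct rec){ k, 0 };
    for (i = nheap / 2 - 1; i >= 0; i--)
        sift_down(i);

    while (nheap > 0) {
        struct rec top = heap[0];
        if (top.run != run) {         /* current run exhausted */
            run = top.run;
            fprintf(out, "--- run break ---\n");
        }
        fprintf(out, "%ld\n", top.key);

        if (fscanf(in, "%ld", &k) == 1) {
            /* a key smaller than the one just written can't extend
             * this run, so it is fenced off for the next run */
            heap[0] = (struct rec){ k, k < top.key ? run + 1 : run };
        } else {
            heap[0] = heap[--nheap];
        }
        if (nheap > 0) sift_down(0);
    }
}

int main(void)
{
    make_runs(stdin, stdout);
    return 0;
}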