parris@itcatl.UUCP (Parris Hughes) (07/09/86)
I am building a database manager which uses B-Trees.  My references on
B-Trees are "Algorithms + Data Structures = Programs" by Wirth, and
"Searching & Sorting" by Knuth.  I am having problems understanding the
B-Tree deletion algorithm.  Given the following 2nd Order B-Tree:

                 ____________ 25 ______________
                /                              \
          ___ 10,20 ___                    _ 30,40 __
         /      |      \                  /    |     \
     5,7,8  13,15,18   22,24         26,27  32,35,38  42,45,46

delete keys 25, 45, and 24.  According to Wirth, the resultant B-Tree
should be:

          ________________ 10,22,30,40 __________________
         /        ______/        |        \______         \
        /        /               |               \         \
    5,7,8   13,15,18,20        26,27          32,35,38    42,46

Could someone please email me the intervening forms the tree takes
(i.e., after deleting 25, then after deleting 45)?  My result looks
totally different (still three levels; I don't want to waste resources
posting it).  Any help would be most appreciated.

Thanks for reading this far.

Parris
....gatech!itcatl!parris
ark@ut-sally.UUCP (Arthur M. Keller) (07/12/86)
In article <151@itcatl.UUCP> parris@itcatl.UUCP (Parris Hughes) writes:
>I am building a database manager which uses B-Trees.  My references on
>B-Trees are "Algorithms + Data Structures = Programs" by Wirth, and
>"Searching & Sorting" by Knuth.  I am having problems understanding the
>B-Tree deletion algorithm.  Given the following 2nd Order B-Tree:

I suggest that the answer is only of `academic' interest.  Because of
fanout considerations (increasing fanout decreases tree height), all
internal (non-leaf) nodes contain keys only for purposes of navigation.
All actual database keys appear only in the leaves of the B-tree (this
variant is technically called a B+-tree; the + is a superscript, the -
is a hyphen).  This means that internal node keys need not be actual
database keys, and hence are not affected by deletion.  Also, right-end
key compression is appropriate to save space (and can even be a factor
when deciding to split a node!).

If you intend your database manager to support a relational database,
you probably want multiple B-trees referring to the same collection of
records (presumably one relation).  I have a paper submitted for
publication containing algorithms and data structures for a multiple
B-tree structure that is crash-proof and deadlock-free (for disk block
locks, though not for logical record locks) and would be relatively
efficient if coupled with a block caching scheme with write-through.

I would be willing to send out advance copies to people who mail me a
message containing their physical (snail) mail address and agree to
send me comments.  I'd especially be interested in hearing from anyone
who plans to implement such algorithms, and I might be persuaded to
spend some time explaining them to such people under appropriate
circumstances.  An earlier version of these algorithms was prototyped,
and the lessons learned were used in developing the approach described
in my paper.
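Keller's point about navigation-only keys can be sketched with a toy
two-level B+-tree.  The layout and numbers below are my own
illustration (not from his paper): separators steer the search, but
only the leaves hold actual database keys, so deleting a key never
forces a change to a separator.

```python
import bisect

# Toy two-level B+-tree.  All values here are illustrative.
separators = [10, 20]                # navigation keys in the internal node
leaves = [[3, 5, 8],                 # keys k with       k < 10
          [10, 14, 18],              # keys k with 10 <= k < 20
          [22, 24, 29]]              # keys k with 20 <= k

def lookup(key):
    # Pick the leaf: keys equal to a separator route to its right.
    return key in leaves[bisect.bisect_right(separators, key)]

leaves[1].remove(10)   # delete key 10 from its leaf...
print(lookup(10))      # False -- the record is gone,
print(lookup(14))      # True  -- yet the stale separator 10 still routes correctly
```

Note that after the deletion the separator 10 no longer corresponds to
any stored key, and nothing breaks: it remains a valid routing value.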
Arthur
--
------------------------------------------------------------------------------
Arpanet:  ARK@SALLY.UTEXAS.EDU
UUCP:     {gatech,harvard,ihnp4,pyramid,seismo}!ut-sally!ark
herr@infbs.UUCP (07/14/86)
                 _________________25_______________
                /                                  \
          ___ 10,20 ___                        _ 30,40 __
         /      |      \                      /    |     \
     5,7,8  13,15,18   22,24             26,27  32,35,38  42,45,46

25 deleted:

                 _________________24________________
                /                                   \
          ___ 10,18 ___                        _ 30,40 __
         /      |      \                      /    |     \
     5,7,8   13,15     20,22             26,27  32,35,38  42,45,46

45 deleted:

                 _________________24________________
                /                                   \
          ___ 10,18 ___                        _ 30,40 __
         /      |      \                      /    |     \
     5,7,8   13,15     20,22             26,27  32,35,38  42,46

24 deleted:

                 _________________22_________________
                /                                    \
          ___ 10,18 ___                        _ 30,40 __
         /      |      \                      /    |     \
     5,7,8   13,15     ----              26,27  32,35,38  42,46
                       |20|
                       ----
                     underflow

                 _________________22_________________
                /                                    \
              ----                             _ 30,40 __
              |10| <----- underflow           /    |     \
          ____----____                   26,27  32,35,38  42,46
         /     |      \
     5,7,8   13,15   18,20

                 _____________10,22,30,40____________
                /       /         |        \         \
               /       /          |         \         \
           5,7,8  13,15,18,20   26,27    32,35,38    42,46
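The whole sequence above can be reproduced mechanically.  The following
is a minimal sketch of Wirth-style deletion on a 2nd-order B-tree (my
own illustrative code, not Wirth's; node layout and helper names are
assumptions): delete an internal key by replacing it with its in-order
predecessor, then repair any leaf underflow by borrowing through the
parent, or by merging with a sibling when the sibling is at minimum.

```python
MIN = 2  # order of the tree: minimum keys per non-root node

class Node:
    def __init__(self, keys, kids=None):
        self.keys = keys          # sorted list of keys
        self.kids = kids or []    # len(kids) == len(keys) + 1 when internal

    def leaf(self):
        return not self.kids

def delete(root, key):
    _delete(root, key)
    if not root.keys and root.kids:   # root emptied: tree height decreases
        only = root.kids[0]
        root.keys, root.kids = only.keys, only.kids
    return root

def _delete(node, key):
    if key in node.keys:
        i = node.keys.index(key)
        if node.leaf():
            node.keys.pop(i)
        else:
            pred = node.kids[i]       # replace with in-order predecessor
            while not pred.leaf():
                pred = pred.kids[-1]
            node.keys[i] = pred.keys[-1]
            _delete(node.kids[i], node.keys[i])
            _fix(node, i)
    elif not node.leaf():
        i = sum(k < key for k in node.keys)
        _delete(node.kids[i], key)
        _fix(node, i)

def _fix(node, i):
    """Repair an underflow in node.kids[i] by borrowing or merging."""
    child = node.kids[i]
    if len(child.keys) >= MIN:
        return
    if i > 0 and len(node.kids[i - 1].keys) > MIN:
        left = node.kids[i - 1]              # borrow through the parent key
        child.keys.insert(0, node.keys[i - 1])
        node.keys[i - 1] = left.keys.pop()
        if left.kids:
            child.kids.insert(0, left.kids.pop())
    elif i + 1 < len(node.kids) and len(node.kids[i + 1].keys) > MIN:
        right = node.kids[i + 1]             # borrow from the right sibling
        child.keys.append(node.keys[i])
        node.keys[i] = right.keys.pop(0)
        if right.kids:
            child.kids.append(right.kids.pop(0))
    else:                                    # merge with a sibling
        j = i - 1 if i > 0 else i            # merge kids[j] and kids[j+1]
        left, right = node.kids[j], node.kids[j + 1]
        left.keys += [node.keys.pop(j)] + right.keys
        left.kids += right.kids
        node.kids.pop(j + 1)

# Parris's tree, then the three deletions:
root = Node([25], [
    Node([10, 20], [Node([5, 7, 8]), Node([13, 15, 18]), Node([22, 24])]),
    Node([30, 40], [Node([26, 27]), Node([32, 35, 38]), Node([42, 45, 46])]),
])
for k in (25, 45, 24):
    delete(root, k)
print(root.keys)                    # [10, 22, 30, 40] -- matches Wirth
print([c.keys for c in root.kids])
```

Running it reproduces the final tree shown above:
root 10,22,30,40 over leaves 5,7,8 / 13,15,18,20 / 26,27 / 32,35,38 / 42,46.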
johnl@ima.UUCP (John R. Levine) (07/17/86)
In article <5303@ut-sally.UUCP> ark@sally.UUCP (Arthur M. Keller) writes:
>In article <151@itcatl.UUCP> parris@itcatl.UUCP (Parris Hughes) writes:
>>... I am having problems understanding the B-Tree deletion algorithm.
>
>I suggest that the answer is only of `academic' interest.  Because of
>fanout considerations (increasing fanout decreases tree height), all
>internal (non-leaf) nodes contain keys only for the purposes of
>navigation.  All actual database keys appear only in the leaves of the
>B-tree (this is technically called a B+-tree (the + is a superscript,
>the - is a hyphen)).  This means that internal node keys need not be
>actual database keys, and hence are not affected by deletion.  Also,
>right end key compression is appropriate to save space (and can even
>be a factor when deciding to split a node!).

Not necessarily.  For one thing, you shouldn't assume that all useful
B-trees are really indices into data stored external to the B-tree.  If
your records are fairly small, and if there is one key that is the
predominant access path, it makes perfect sense to put the actual
records into the B-tree.  If you need secondary indices, they would be
B-trees which map the secondary key into the primary key, which you
could then look up.  This scheme sacrifices secondary-index lookup time
for primary-index lookup time, which may be what you want.

The other point is that if you don't delete stale records from the
B-tree, the nice logarithmic access-time property of B-trees no longer
holds.  I find this particularly insidious when the keys in question
are things like dates of pending orders in a file, so that the file
tends to shrink at the back and grow at the front.  Delete your stale
records; you'll be glad you did.

Both left- and right-hand key compression are useful ways to cram more
entries into a tree node, and there's no argument that, in the usual
case where it's more expensive to fetch a node from the disk than to
process it once it's read in, they're a big win.
On the other hand, I have this blindingly fast Eagle disk drive lashed
up to a very sluggish IBM PC, and the tradeoffs are not always what
you'd assume.
--
John R. Levine, Javelin Software Corp., Cambridge MA  +1 617 494 1400
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.EDU
The opinions expressed herein are solely those of a 12-year-old hacker
who has broken into my account and not those of any person or organization.
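The left-hand (front) compression Levine mentions can be sketched as
follows.  Storing each key as a shared-prefix length plus the distinct
suffix is one common encoding; the code and the sample keys are my own
illustration, not from any of the systems being discussed.

```python
# Front (left-hand) key compression for the sorted keys inside one node:
# each key after the first is stored as (length of prefix shared with the
# previous key, remaining suffix).

def compress(keys):
    out = [(0, keys[0])]
    for prev, cur in zip(keys, keys[1:]):
        n = 0
        while n < min(len(prev), len(cur)) and prev[n] == cur[n]:
            n += 1
        out.append((n, cur[n:]))
    return out

def decompress(entries):
    keys, prev = [], ""
    for n, suffix in entries:
        prev = prev[:n] + suffix   # rebuild from the previous key's prefix
        keys.append(prev)
    return keys

keys = ["baker", "ball", "band", "bank"]
packed = compress(keys)
print(packed)                       # [(0, 'baker'), (2, 'll'), (2, 'nd'), (3, 'k')]
print(decompress(packed) == keys)   # True
```

The catch, as the thread notes, is that decompression is sequential:
reconstructing a key in the middle of the node means walking the
entries from the front, which is pure CPU cost once the node is cached.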
eric@osiris.UUCP (Eric Bergan) (07/18/86)
In article <164@ima.UUCP>, johnl@ima.UUCP (John R. Levine) writes:
> Both left and right hand key compression are useful ways to cram more
> entries into a tree node, and there's no argument that in the usual
> case that it's more expensive to fetch a node from the disk than to
> process it once it's read in, they're a big win.  On the other hand,
> I have this blindingly fast Eagle disk drive lashed up to a very
> sluggish IBM PC, and the tradeoffs are not always what you'd assume.

Actually, I have talked to a couple of different relational database
companies, and their studies indicate that key compression is not a
win.  The reasoning is that, yes, a disk access normally takes longer
than the time needed to uncompress, but since most of the bigger
systems try to cache the indices, there may not be a disk access at
all; then the compression is a straight loss.  Of course, compression
means you can fit more index into the cache, but their studies seem to
argue that compression just does not speed up access time.

--
eric
...!seismo!umcp-cs!aplcen!osiris!eric
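The fanout side of this trade-off is easy to quantify with
back-of-the-envelope arithmetic.  All of the numbers below are my own
illustrative assumptions (a 4 KB node, a 4-byte child pointer, 20-byte
keys compressing to roughly 8 bytes), not figures from the posts:

```python
import math

# Worst-case height of a B-tree over n_records entries, given how many
# (key, pointer) pairs fit in one node.  Illustrative numbers only.
def height(n_records, node_bytes=4096, ptr=4, key_bytes=20):
    fanout = node_bytes // (key_bytes + ptr)
    return math.ceil(math.log(n_records, fanout))

print(height(10**7, key_bytes=20))  # 4 levels uncompressed (fanout ~170)
print(height(10**7, key_bytes=8))   # 3 levels compressed   (fanout ~341)
```

So compression can save a whole level of the tree -- which is a disk
access if the node isn't cached, and merely extra CPU work if it is,
which is exactly the disagreement in this thread.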