[comp.sys.mac.programmer] HFS, extent, B*-trees, oh joy...

captkidd@athena.mit.edu (Ivan Cavero Belaunde) (02/14/91)

I'm trying to write a routine that takes a file identifier (filename and
WDRefNum or filename, dirID and vRefNum) and returns a pointer to a block
containing a list of the block numbers on that volume occupied by a file.
Although I understand the basic procedure pretty clearly, there are a couple
of stumbling blocks that are driving me crazy.

Basically a file consists of one or more extents (chunks of contiguous
blocks in the volume). Information about the extents is stored in a B*-tree,
on which each node has the following structure:

	struct {
	  NodePtr	ndFLink;	// Ptr to next leaf node
	  NodePtr	ndBLink;	// Ptr to prev leaf node
	  char	ndType;			// Node type
	  char	ndLevel;		// Node level
	  short	ndNRecs;		// # of records in node
	  < record 0 >
	  < record 1 >
	      ....
	  < record n >
	  < free space >
	  < offset to free space >
	  < offset to record n   >
	       ....
	  < offset to record 1 >
	  < offset to record 0 >
	} NodeRec, *NodeRec;

This structure doesn't seem to be very well defined, however. IM
says that all nodes (no matter what kind) are a fixed size, and thus
one can access the fields at the end which point to the records (which
can be of varying size). I haven't been able to find this fixed size
documented anywhere, however (anyone know?). (It's not absolutely necessary,
since I know the size records I'm dealing with, but it'd be better).

The structure of the records is not clearly documented either, but it
would seem to be:

	struct {
	  unsigned char	xkrKeyLen;	// Key length
	  char	xkrFkType;		// Fork type
	  long	xkrFNum;		// File number
	  short xkrFABN;		// Allocation block # within file
	  union {
	    ExtentDesc	extents[3];	// Extent descriptors (3 of them)
	    				//	(if leaf node)
	    long	dnNode;		// Ptr to node in lowever level
	    				//	(if index node)
	  }
	} ExtentRecord;
	
	struct {
	  short	xtFirstBlk;		// # of extent's 1st allocation block
	  short xtNumBlks;		// Number of blocks in extent
	} ExtentDesc;

The extents B*-tree is not in RAM, however. It's in a file (second FCB in the
FCB buffer). So my question is: what's the format of this file (so I
can actually navigate the B*-tree)?  Is there a header before the actual
data comes? And do the node pointers effectively translate to marks
within the extent file?

Now it would seem that what I need to do is:

	1) Open the file and get a refNum for it.
	2) Get the info from the FCB (file number)
	3) Get info from the corresponding VCB
		(vcbAlBlSt for calculating actual block #s,
		vcbXTRef [the extent file might not always be the second
		FCB in the FCB buffer])
	4) Navigate the extent tree, searching for the first extent record,
		then using the data to calculate the search key for the
		next extent record, and so on, building the list of block #s
		as we go.

Now, has anyone out there done this before? Are there any hairy issues as
far as moving the mark for the extent file FCB behind the file manager's
back? What about the extent cache, and is there a way to flush it (does it
need to be)? And finally, are my data structure definitions correct? I
can't seem to find formal definitions *anywhere*, and I just deduced those
from the descriptions in IM4.

Or maybe there's a really obscure call somewhere that will get me the info I
need :-) Doubt it, tho'.

I'd really appreciate any insights anyone has on this topic. Thanx y'all.

-Ivan Cavero Belaunde
Internet: captkidd@ATHENA.MIT.EDU

        "I suppose you think that was funny."
                "I don't know. I'll have to consult our
                 humor officer. Mr. Spock, was that funny?"
                        "I shall have to analyze it, sir. It may
                         take time."
                -From Peter David's "Star Trek" comic

andyp@treehouse.UUCP (Andy Peterman) (02/15/91)

In article <1991Feb13.210702.9407@athena.mit.edu> captkidd@athena.mit.edu (Ivan Cavero Belaunde) writes:
>I'm trying to write a routine that takes a file identifier (filename and
>WDRefNum or filename, dirID and vRefNum) and returns a pointer to a block
>containing a list of the block numbers on that volume occupied by a file.
>Although I understand the basic procedure pretty clearly, there are a couple
>of stumbling blocks that are driving me crazy.

See if you can find a manual for the old disk editor program Fedit Plus.
It has a B-Tree description that goes beyond that of IM volume IV.  I'll
try to answer some of your questions in case you can't find it:

>Basically a file consists of one or more extents (chunks of contiguous
>blocks in the volume). Information about the extents is stored in a B*-tree,
>on which each node has the following structure:
>
> ... [ stuff deleted]
>
>This structure doesn't seem to be very well defined, however. IM
>says that all nodes (no matter what kind) are a fixed size, and thus
>one can access the fields at the end which point to the records (which
>can be of varying size). I haven't been able to find this fixed size
>documented anywhere, however (anyone know?). (It's not absolutely necessary,
>since I know the size records I'm dealing with, but it'd be better).

The first node of a B-Tree is a header node (type 1).  It will have
three records.  The first one, starting at an offet of $000E into the
node, contains the following:

$000E	Logical sector number of the last leaf node
$0012	The size of each B-Tree node (always 512 bytes or one sector)
$0014	Maximum size of a key in the B-Tree.
$0016	The total number of nodes in the B-Tree.
$001A	The number of free nodes in the B-Tree.

You can see from the above that the size of a B-Tree node is specified
at an offset of $0012 into the B-Tree file (first node, first record).
The second record of the B-Tree header node (at $0078) is unused.  The
third record (at $00F8) contains a bit map of which sectors (nodes) are 
in use for the B-Tree file.  

>The extents B*-tree is not in RAM, however. It's in a file (second FCB in the
>FCB buffer). So my question is: what's the format of this file (so I
>can actually navigate the B*-tree)?  Is there a header before the actual
>data comes? And do the node pointers effectively translate to marks
>within the extent file?

To use either the extent or catalog b-tree file, get the Volume Control
Block for the volume and get the refNums from vcbXTRef or vcbCtlRef (see
IM-IV page 176-177).  These files will alread be open and all you have to
do is do a standard PBRead using these refnums.  (I might as well mention
it now - DO NOT WRITE TO EITHER OF THESE FILES!!!!).

The files just contain the b-tree nodes - if you read the first 512 bytes 
of the file, you will get the header node as described above.  The next 512
bytes will be node 2, etc.

>Now it would seem that what I need to do is:
>
>	1) Open the file and get a refNum for it.
>	2) Get the info from the FCB (file number)
>	3) Get info from the corresponding VCB
>		(vcbAlBlSt for calculating actual block #s,
>		vcbXTRef [the extent file might not always be the second
>		FCB in the FCB buffer])

No.  See above for an easier solution.

>	4) Navigate the extent tree, searching for the first extent record,
>		then using the data to calculate the search key for the
>		next extent record, and so on, building the list of block #s
>		as we go.

Well, sort of.  If you're looking for the extent of a file, you need to
start with the extent entry in its FCB (or catalog B-Tree entry).  The
first three extents of a file are stored in the fcbExtRec field of the
FCB (see IM-IV page 180).  Most files, unitl they get fragmented,
do not contain entries in the extent b-tree - they just have one to
three entries here.   

>Now, has anyone out there done this before? Are there any hairy issues as
>far as moving the mark for the extent file FCB behind the file manager's
>back? What about the extent cache, and is there a way to flush it (does it
>need to be)? And finally, are my data structure definitions correct? I
>can't seem to find formal definitions *anywhere*, and I just deduced those

I haven't used the extents file, although I have read from the catalog
b-tree file successfully.  I don't remember if there is a cache problem,
although I think I do a FlushVol before I try reading the file.

Good luck!!!

-- 
Andy Peterman                       |   Opinions expressed
treehouse!andyp@gvgpsa.gvg.tek.com  | are definitely those of
(916) 273-4569                      |      my employer!