[comp.os.cpm] CP/M directory extents

bridger%rcc@RAND-UNIX.ARPA (Bridger Mitchell) (03/15/88)

Yes, directory calculations are confusing, and not helped by confusing
terminology!

The key point is to keep physical and logical extents distinct.

A CP/M LOGICAL extent holds 16K, or 80h = 128 records (each 128 bytes).

A CP/M file can have several directory entries.  Each DIRECTORY ENTRY
(i.e. 32 bytes in the first group(s) on a disk) has 16 bytes of space
for data group numbers; the group numbers must be words if there are
>255 groups on a disk. Thus:

A CP/M PHYSICAL extent (i.e., a single DIRECTORY ENTRY) may hold
a multiple of 16K.

  The multiple depends on:
	(a) whether a group number is a byte or word
	(b) the size of one allocation group (1,2,4,8,16 K)
	(c) whether the format designer used all of the 16 bytes
	    available in a single directory entry
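
As a rough illustration of how (a), (b) and (c) combine, here is a small
Python sketch.  The function name and its parameters are made up for this
example; it only restates the arithmetic above and is not taken from any
BDOS source.

```python
LOGICAL_EXTENT_BYTES = 16 * 1024   # one logical extent = 128 records of 128 bytes


def logical_extents_per_entry(group_kbytes, word_group_numbers, slot_bytes_used=16):
    """Return how many 16K logical extents one directory entry can map.

    group_kbytes        -- (b) allocation group size in K (1, 2, 4, 8 or 16)
    word_group_numbers  -- (a) True if group numbers are 2-byte words
    slot_bytes_used     -- (c) bytes of the 16-byte group area the format uses
    """
    bytes_per_group_number = 2 if word_group_numbers else 1
    slots = slot_bytes_used // bytes_per_group_number
    capacity = slots * group_kbytes * 1024
    if capacity == 0 or capacity % LOGICAL_EXTENT_BYTES != 0:
        # e.g. 1K groups with word group numbers would map only 8K per entry
        raise ValueError("entry capacity is not a whole multiple of 16K")
    return capacity // LOGICAL_EXTENT_BYTES


print(logical_extents_per_entry(1, False))      # 16 slots * 1K = 16K
print(logical_extents_per_entry(2, False))      # 16 slots * 2K = 32K
print(logical_extents_per_entry(2, True))       #  8 slots * 2K = 16K
print(logical_extents_per_entry(2, False, 8))   # format using only 8 of 16 slots
```

The last call models case (c): a format with 2K groups and byte group
numbers that fills only half the slots ends up with one logical extent
per directory entry instead of two.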

The RECORD COUNT BYTE gives the number of records in the HIGHEST
logical extent referenced by the physical extent/directory entry it
appears in.

Similarly, the EXTENT BYTE (and its overflow into the low-order bits
of the S2 byte) gives the number of the HIGHEST logical extent
referenced by the physical extent/directory entry it appears in.
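
The two rules above can be sketched in Python as follows.  The function
names are hypothetical, the inputs are assumed to be already masked, and
the code assumes the extent byte counts 0..31 before overflowing into S2
(the usual CP/M 2.2 arrangement; check your own system).

```python
RECORDS_PER_LOGICAL_EXTENT = 0x80   # 128 records of 128 bytes = 16K
EXTENTS_PER_S2_STEP = 32            # assumption: EXT holds 0..31, then carries into S2


def highest_logical_extent(s2, ext):
    """Number of the highest logical extent referenced by this entry."""
    return s2 * EXTENTS_PER_S2_STEP + ext


def records_through_entry(s2, ext, rc):
    """Records from the start of the file through the last record
    referenced by this directory entry."""
    return highest_logical_extent(s2, ext) * RECORDS_PER_LOGICAL_EXTENT + rc


print(records_through_entry(0, 1, 0x7F))   # 80h + 7fh
print(records_through_entry(0, 1, 0x80))   # 80h + 80h
```

Note that neither function needs to know how many logical extents fit in
one directory entry; that only matters when deciding which entry of the
file this is.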


examples:
(with S2 and EXT appropriately masked to remove internal bdos flag bits)

#1	S2  = 0,  EXT = 1, RC = 7fh 

	If there is 1 logical extent per directory entry, then
	this is the second directory entry; its final record is 7fh = 127.

	If there are 2 or more logical extents per directory entry,
	this is the first directory entry; its final record is
	80h + 7fh = 255.

#2	S2 = 0,  EXT = 1, RC = 80h

	If there is 1 logical extent per directory entry, then
	this is the second directory entry; its final record is 80h = 128.

	If there are exactly 2 logical extents per directory entry,
	this is the first directory entry; its final record is
	80h + 80h = 256, and this entry is full.

	If there are more than 2 logical extents per directory entry,
	this entry is not full; the next record would result in
	EXT = 2, RC = 1.
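
The case analysis in #1 and #2 can be checked with a short Python sketch.
`describe_entry` is a name invented for this illustration; it takes masked
S2/EXT/RC values and assumes EXT carries into S2 after 31.  Entry numbers
are counted from 0, so "the second directory entry" is entry 1.

```python
RECORDS_PER_LOGICAL_EXTENT = 0x80


def describe_entry(ext, rc, extents_per_entry, s2=0):
    """Return (entry_index, records_in_entry, is_full) for one directory entry.

    extents_per_entry is the number of logical extents that one
    directory entry (physical extent) can hold on this disk format.
    """
    logical = s2 * 32 + ext                        # highest logical extent referenced
    entry_index = logical // extents_per_entry     # 0 = first directory entry
    extent_within_entry = logical % extents_per_entry
    records_in_entry = extent_within_entry * RECORDS_PER_LOGICAL_EXTENT + rc
    is_full = records_in_entry == extents_per_entry * RECORDS_PER_LOGICAL_EXTENT
    return entry_index, records_in_entry, is_full


# Example #1: EXT = 1, RC = 7fh
print(describe_entry(1, 0x7F, 1))
print(describe_entry(1, 0x7F, 2))

# Example #2: EXT = 1, RC = 80h
print(describe_entry(1, 0x80, 1))
print(describe_entry(1, 0x80, 2))
print(describe_entry(1, 0x80, 4))
```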

If the file is written sequentially, then we know in #1 that the file has
80h +7fh=255 total records.  If it is written randomly, all we know
(without inspecting the group numbers) is that there is at least one
record, the 255th; there may be others, including some in higher-numbered
extents.

In #2, assuming for example exactly 2 logical extents per directory
entry, there may be a second directory entry that is totally empty
(EXT=2, RC = 0).  This can happen when a sequentially-written file
is exactly 256 records long; the bdos internally closes the directory
entry, creates a new one, and then is told by the program to close the
file.

Regarding (c) above, several OEMs and ramdisk suppliers (Televideo,
SWP, ...) have defined disk formats that do not use all 8/16 group
slots in a single directory entry.  Apparently they weren't able to
distinguish logical and physical extents!  The result is unnecessary
extra directory entries for large files and additional headaches for
programmers.

COPYING random files (ones containing holes) is not so
straightforward, because the utility must determine how to handle a
destination disk that has a different allocation group size.  In the
general case, although the data records can be copied, I don't believe
a "perfect copy" is possible, because the destination copy may not
retain the same information about unwritten records that existed in
the original.  (Information about unwritten records is mostly inferred
from missing group numbers.)  This could conceivably lead to errors
in a database program that relied on the "unwritten-data" error from
the bdos. (Consider, for example, copying a random-record database
from a 2K to a 4K group disk, and then copying it back to a 2K disk.)
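
A toy Python model of that round trip is sketched below.  It is a
deliberate simplification and not a description of any real copy
utility: it assumes a record reads back without error exactly when its
group is allocated (ignoring record counts and missing extents), and
that the copier transfers every record it can read.  All names are
invented for the example.

```python
RECORD_BYTES = 128


def records_per_group(group_kbytes):
    return group_kbytes * 1024 // RECORD_BYTES


def readable_records(written, group_kbytes):
    """Records that read back without error once `written` has been stored
    on a disk with the given group size (toy model: whole groups)."""
    per_group = records_per_group(group_kbytes)
    groups = {r // per_group for r in written}
    return {g * per_group + i for g in groups for i in range(per_group)}


def copy_file(readable, dest_group_kbytes):
    """Copy every readable record; return what is readable on the destination."""
    return readable_records(readable, dest_group_kbytes)


# A database writes only records 0 and 40 on a 2K-group disk.
original = readable_records({0, 40}, 2)
on_4k = copy_file(original, 4)
back_on_2k = copy_file(on_4k, 2)

print(20 in original)      # record 20 lies in an unallocated 2K group
print(20 in back_on_2k)    # after the round trip its group has been allocated
print(len(original), len(back_on_2k))
```

In this model the hole covering records 16..31 disappears after the
round trip, so a program that expected an "unwritten-data" error for
record 20 would no longer get one.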


--bridger mitchell

raf@cup.portal.com (03/22/88)

Bridger Mitchell <bridger%rcc@RAND-UNIX.ARPA> writes:

> Yes, directory calculations are confusing, and not helped by confusing
> terminology!

I agree, and I apologize for contributing to the confusion by use of
misleading terminology and (as I now realize) some inaccuracy in my
previous posting.  In particular, my statement that RC = 80h implies
a full "extent" (meaning PHYSICAL extent, or directory entry) was
*incorrect*.

> The key point is to keep physical and logical extents distinct.

Yes, that certainly is the key to understanding CP/M directory entries.
Thank you for the clear explanation, Bridger.

[I think my mind was in the same place as those designers who apparently
failed to distinguish logical and physical extents, by not using
all 16 bytes of the allocation group number area.  Several years ago,
I puzzled over the DPB extent mask in a system (Zenith Z-100?) which I
believe falls into this category.  Only now, after reading your note,
do I understand that particular diskette format!]

Bob Freed                         Uucp:  ...!sun!portal!Robert_A_Freed
                              Internet:  Robert_A_Freed@cup.portal.com