m5@lynx.uucp (Mike McNally) (08/16/90)
Does 4.3BSD (or anything else) dynamically compress directories when
entries are removed?  If so, how does the system deal with consistency?
Like if I'm reading while something else is unlinking and the directory
is being compressed, how does the world go on living?
-- 
Mike McNally                                    Lynx Real-Time Systems
uucp: {voder,daver}!lynx!m5                     phone: 408 370 2233
            Where equal mind and contest equal, go.
bzs@world.std.com (Barry Shein) (08/19/90)
>Does 4.3BSD (or anything else) dynamically compress directories when
>entries are removed?  If so, how does the system deal with
>consistency?  Like if I'm reading while something else is unlinking and
>the directory is being compressed, how does the world go on living?
>-- 
>Mike McNally                                   Lynx Real-Time Systems

I believe all that is done is the freeing of (completely) emptied
blocks, so there's no problem.  There's really no need, that I can see,
to do any further squeezing (I suppose you could try to smash now-small
dirs into frags, but that's not really worthwhile).
-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD
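For concreteness, here is a rough sketch of why removing an entry
leaves a hole rather than shrinking anything: in the 4.3BSD on-disk
format, deletion either zeroes the entry's inode number (if the entry
begins a directory block) or folds its space into the record in front
of it.  Only the record layout below follows the old <sys/dir.h>
format; the function and its arguments are invented for illustration
and are not the actual ufs code.

    struct odirect {                    /* 4.3BSD-style variable-length record */
        unsigned long  d_ino;           /* inode number; 0 means "empty slot" */
        unsigned short d_reclen;        /* total length of this record */
        unsigned short d_namlen;        /* length of the name */
        char           d_name[256];     /* the name, NUL terminated */
    };

    /*
     * Sketch: delete the record at offset `off' inside the directory
     * block held in `block'.  `prevoff' is the offset of the record in
     * front of it, or equal to `off' if this record begins the block.
     */
    void
    dir_remove_sketch(char *block, int off, int prevoff)
    {
        struct odirect *dp = (struct odirect *)(block + off);
        struct odirect *prev = (struct odirect *)(block + prevoff);

        if (prevoff == off)
            dp->d_ino = 0;      /* first in block: just mark it empty */
        else
            prev->d_reclen += dp->d_reclen;  /* fold space into previous record */

        /*
         * Nothing is moved and the directory does not get any shorter
         * here; the freed bytes simply become reusable space inside
         * this block for some later create.
         */
    }

How and when the empty space at the end of a directory actually gets
trimmed is described in Chris Torek's followup below.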
chris@mimsy.umd.edu (Chris Torek) (08/19/90)
In article <7913@lynx.UUCP> m5@lynx.uucp (Mike McNally) writes:
>Does 4.3BSD (or anything else) dynamically compress directories when
>entries are removed? If so, how does the system deal with
>consistency?

I cannot speak for `anything else', but the answer for 4BSD is `no'.
Recent versions of 4BSD will reduce the sizes of directories, but not
when files are removed: only when files are created.  (There is another
answer to the above question, which appears below.)

The code goes something like this (the order is a bit different):

    if (this operation is not CREATE or RENAME)
        we can use the name cache;
    else {
        we have to scan the entire directory anyway;
        we need a slot with enough space for this name;
    }
    if (we can use the cache && we should use the cache) {
        if (cache lookup yields valid result)
            return result;
        remove invalid entry, if any, from cache;
    }
    enduseful = 0;
    for (offset = 0; offset < dirsize; offset += cur_record_len) {
        get current record stuff;
        if (entry is valid) {           /* found a nonempty slot */
            if (this entry matches) {
                if (we can use the cache)
                    add it to the cache;
                return it;
            }
            enduseful = offset + cur_record_len;
        } else {                        /* found an empty slot */
            if (this slot is big enough && we need a slot)
                remember this slot;
        }
    }
    /* entry not found */
    if (we can use the cache) {
        /* 4.3BSD-Reno only: */
        add a `known not present' entry to the cache;
    }
    save slot information, if any;
    return (NOTFOUND);

Later, if we decide to write to the directory:

    if (enduseful < dirsize) {
        /* directory can be shrunk */
        itrunc(inode for directory, enduseful);
    }

The logic is this: if we are creating a new entry (CREATE or RENAME
operations), a cache lookup will only tell us that we cannot proceed,
not that we can.  This is rarely helpful (the typical case is that we
can proceed, and looking in the cache will just mean spending some time
only to get a `no information' result).  Otherwise (for a LOOKUP or a
DELETE) we can use the cache; however, profiles gathered on ucbvax
showed that caching DELETE operations was not a win, so it is not done.

In any case, if the cache lookup fails or cannot be done, the whole
directory must be scanned.  (This is actually done in two parts: rather
than from 0 to N, the scan goes from M to N, then 0 to M, where M is
the place the last search ended.  This means that, e.g., if you read a
directory, then stat each file in the directory in directory order, the
loop always hits the next entry `perfectly'.)  While we are scanning
it, however, we might as well find the place (if any) that a new name
could be stashed, and---since it only takes one more variable---the
last place there are any non-empty slots in the directory.  If we later
decide to scribble on the directory (which means we will have to write
back the inode), we may as well trim the directory at the same time.
Since this was all `detritus', any user program that is about to read
this part of the directory is not losing any data if it gets EOF
instead.
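To make the record-walking concrete, here is a small user-level sketch
that scans one directory block of 4.3BSD-style records and computes
the analogue of `enduseful'.  Only the record layout (inode number,
record length, name length, name) is taken from the old <sys/dir.h>;
the function and its names are invented for illustration and are not
the actual ufs_lookup code.

    #define DIRBLKSIZ 512               /* one directory block (VAX size) */

    struct odirect {                    /* 4.3BSD-style variable-length record */
        unsigned long  d_ino;           /* inode number; 0 means "empty slot" */
        unsigned short d_reclen;        /* total length of this record */
        unsigned short d_namlen;        /* length of the name */
        char           d_name[256];     /* the name, NUL terminated */
    };

    /*
     * Walk the records in one directory block and return the offset
     * just past the last record whose d_ino is nonzero -- the analogue
     * of `enduseful' above.  Everything past that point is detritus
     * and could be cut off by truncating the directory there.
     */
    int
    end_useful(char *block)
    {
        int off, enduseful = 0;
        struct odirect *dp;

        for (off = 0; off < DIRBLKSIZ; off += dp->d_reclen) {
            dp = (struct odirect *)(block + off);
            if (dp->d_reclen == 0)      /* corrupt block; give up */
                break;
            if (dp->d_ino != 0)
                enduseful = off + dp->d_reclen;
        }
        return (enduseful);
    }

A real version would of course look at every block of the directory,
not just one; the sketch only shows the bookkeeping.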
This still leaves the other possible question: `Since records within
4BSD-style directories are variable length, it would be possible to
take an existing directory and make it smaller by packing the names in
a better way.  That is, we have a bin-packing problem, and when the
contents of the bins change, a better packing might become possible.
Does any 4BSD system do such a repacking?'  The answer to this question
is `sort of'.

In the pseudo-code above, there really should be three states for the
`found a big enough slot' variable: `no', `yes', and `yes, but only
after moving stuff around'.  (These are actually three enumerations,
NONE, FOUND, and COMPACT respectively.)  The `compaction' state,
however, applies only to a single directory block (512 bytes on the
VAX, 1024 bytes on the Tahoe).  Some records in one block might get
moved (uniformly towards the front, in this case) in order to combine
free space from two different records.  That is:

    entry for foo+free space|entry for bar+free space
    <----200----> <---30--->|<----200----> <---82--->   (bytes)
    <-------record 0------->|<-------record 1------->
       (total size 230)         (total size 282)

(assuming VERY long file names :-) ).  If we wanted to add a 95-byte
entry, it would fit in this 512-byte block, but only if we moved
record 1 somewhat leftwards.  In this case we might get a slot status
of `COMPACT'.  When this happens, the directory rewriting code moves
entries leftwards as necessary, then fills in the (newly expanded)
hole.

This means that a program that (stupidly) reads directories one byte
at a time may get an inconsistent picture of the directory block.  Any
program that reads a whole directory block at a time, however, will
get a consistent picture; and any program that uses
getdirentries()---this is required for NFS, among other things, and is
used by the current readdir()---will likewise get a consistent
picture.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu    Path: uunet!mimsy!chris
(New campus phone system, active sometime soon: +1 301 405 2750)
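A rough sketch of the in-block compaction described above: pack the
live entries towards the front of one directory block so that all of
its free space ends up in a single hole at the end, large enough for
the new entry.  Again, only the record layout is real; the function,
the DIRSIZ_SKETCH macro, and the one-pass everything-at-once strategy
are simplifications for illustration (the actual directory-rewriting
code is more careful, e.g. it keeps the block consistent for
whole-block readers as Chris describes).

    #include <string.h>

    #define DIRBLKSIZ 512               /* one directory block (VAX size) */

    struct odirect {                    /* 4.3BSD-style variable-length record */
        unsigned long  d_ino;           /* inode number; 0 means "empty slot" */
        unsigned short d_reclen;        /* total length of this record */
        unsigned short d_namlen;        /* length of the name */
        char           d_name[256];     /* the name, NUL terminated */
    };

    /* smallest record that holds a name of the given length, 4-byte aligned */
    #define DIRSIZ_SKETCH(namlen) \
        ((sizeof (unsigned long) + 2 * sizeof (unsigned short) + (namlen) + 1 + 3) & ~3)

    /*
     * Slide every live entry (d_ino != 0) towards the front of the
     * block, trimming each record to the minimum it needs, and give
     * all of the leftover space to the last live record.  Afterwards
     * the free space in the block is one contiguous hole at the end,
     * so a new entry that fits the total free space fits the hole.
     */
    void
    compact_block(char *block)
    {
        int off = 0, newoff = 0, reclen, need;
        struct odirect *dp, *last = 0;

        while (off < DIRBLKSIZ) {
            dp = (struct odirect *)(block + off);
            reclen = dp->d_reclen;
            if (reclen == 0)            /* corrupt block; give up */
                break;
            if (dp->d_ino != 0) {
                need = DIRSIZ_SKETCH(dp->d_namlen);
                memmove(block + newoff, block + off, need);
                last = (struct odirect *)(block + newoff);
                last->d_reclen = need;
                newoff += need;
            }
            off += reclen;              /* step by the OLD record length */
        }
        if (last != 0)                  /* last live entry absorbs the hole */
            last->d_reclen += DIRBLKSIZ - newoff;
    }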
edward@ucbarpa.Berkeley.EDU (Edward Wang) (08/19/90)
In article <26078@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:
>The logic is this: if we are creating a new entry (CREATE or RENAME
>operations), a cache lookup will only tell us that we cannot proceed,
>not that we can.  This is rarely helpful (the typical case is that we
>can proceed, and looking in the cache will just mean spending some time
>only to get a `no information' result).  Otherwise (for a LOOKUP or a
>DELETE) we can use the cache; however, profiles gathered on ucbvax
>showed that caching DELETE operations was not a win, so it is not done.

This brings up a related question.  A result of a study I did a while
ago is that failed lookups often repeat (stats of mail boxes, for
example), and caching this information brings the hit rate (cache hits
/ total number of lookups) up quite a bit.  (The cache would contain
files known not to exist.)  My impression is that BSD does not do
this.  Does anyone here know why?
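(Chris's pseudo-code above notes that 4.3BSD-Reno does add `known not
present' entries.)  The mechanism needs surprisingly little: a cache
entry just has to be able to say `this name is known not to be here',
so that a repeated failed lookup, like the mailbox stats, can be
answered without scanning the directory.  The fragment below is only
an illustration of that idea with a trivial linear list and invented
names; it is not the Reno name cache.

    #include <string.h>

    /* possible answers from a name-cache lookup */
    #define NC_MISS      0      /* no information; must scan the directory */
    #define NC_FOUND     1      /* name known to exist; *inop is valid */
    #define NC_NEGATIVE  2      /* name known NOT to exist in this directory */

    struct ncentry {
        struct ncentry *nc_next;
        unsigned long   nc_dirino;      /* inode number of the directory */
        unsigned long   nc_ino;         /* inode of the entry; 0 marks a negative entry */
        char            nc_name[64];
    };

    /*
     * Trivial linear-list lookup.  A real cache would hash on
     * (directory, name), bound the number of negative entries, and
     * invalidate them when the directory changes; the point here is
     * only the third return value, which lets a repeated failed
     * lookup be answered without touching the directory at all.
     */
    int
    cache_lookup(struct ncentry *head, unsigned long dirino,
        const char *name, unsigned long *inop)
    {
        struct ncentry *ncp;

        for (ncp = head; ncp != 0; ncp = ncp->nc_next) {
            if (ncp->nc_dirino == dirino && strcmp(ncp->nc_name, name) == 0) {
                if (ncp->nc_ino == 0)
                    return (NC_NEGATIVE);
                *inop = ncp->nc_ino;
                return (NC_FOUND);
            }
        }
        return (NC_MISS);
    }

The hard parts in a real kernel are elsewhere: deciding how many
negative entries to keep and making sure they are purged when a
matching name is later created.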