[comp.os.research] O/S's using bit maps for free disk block lists?

earle@mahendo.Jpl.Nasa.Gov (Greg Earle) (03/29/88)

In Maurice Bach's `The Design Of The UNIX Operating System' book, the last
Exercise in Chapter 4 (Internal Representation Of Files) reads:

	15. Discuss a system implementation that keeps track of free disk
	    blocks with a bit map instead of a linked list of blocks.  What
	    are the advantages and disadvantages of this scheme?

I once worked on a project (a proprietary database computer, not unlike that
made by Britton-Lee) where the O/S team member designing the file system used
the bitmap scheme for the free disk block list.  Seeing this exercise got me
curious as to whether anyone has tried implementing this as a research project
using a UNIX system as a base -- or, conversely, whether any extant O/S research
projects use this method.  If so, can anyone answer the question above (namely,
how does it compare to using a linked list kept in the Super Block)?
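
For concreteness, the core of such a scheme is just one bit per disk block.
A minimal C sketch of the basic operations (made-up names, not taken from any
of the systems under discussion):

```c
#include <assert.h>

#define NBLOCKS  4096                   /* blocks in the file system */
#define MAPBYTES (NBLOCKS / 8)          /* one bit per block         */

static unsigned char freemap[MAPBYTES]; /* 1 = allocated, 0 = free   */

/* Mark block b allocated. */
static void map_set(long b)   { freemap[b >> 3] |= (unsigned char)(1 << (b & 7)); }

/* Mark block b free again. */
static void map_clear(long b) { freemap[b >> 3] &= (unsigned char)~(1 << (b & 7)); }

/* Is block b currently allocated? */
static int map_test(long b)   { return (freemap[b >> 3] >> (b & 7)) & 1; }

/* Grab the first free block, marking it allocated; -1 if disk full. */
static long map_alloc(void)
{
    long b;

    for (b = 0; b < NBLOCKS; b++)
        if (!map_test(b)) {
            map_set(b);
            return b;
        }
    return -1;
}
```

At one bit per block the whole map for a sizeable disk fits in a few
kilobytes, so it can plausibly be kept in core, which is much of the appeal.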

[ I know VMS uses bit maps.  --DL ]


-- 
	Greg Earle		earle@mahendo.JPL.NASA.GOV
	Indep. Sun consultant	earle%mahendo@jpl-elroy.ARPA	[aka:]
	(Gainfully Unemployed)	earle%mahendo@elroy.JPL.NASA.GOV
	Lake View Terrace, CA	...!{cit-vax,ames}!elroy!jplgodo!mahendo!earle

allyn@odin.ucsd.edu (Allyn Fratkin) (03/31/88)

See:

Peacock, J. Kent, "The Counterpoint Fast File System", 
USENIX Conference Proceedings, (Winter, 1988), pp. 243-249.

for a description of an implementation of a "fast" file system
for System V.  It uses a bit map for the free list and shows throughput
approaching the Berkeley Fast File System.

From the virtual mind of 

Allyn Fratkin            allyn@sdcsvax.ucsd.edu    or
EMU Project              {ucbvax, decvax, ihnp4}!sdcsvax!allyn
U.C. San Diego

die@frog.UUCP ( David I. Emery) (03/31/88)

	Charles River Data Systems' UNOS operating system (an independently
implemented System V UN*X kernel, passing SVVS, that contains no AT&T code) has
used bit maps for inode and free disk block allocation since the
first version of its file system was designed in 1979.

	We feel that bit map allocation significantly enhances file system
reliability and robustness.  It also permits improved performance - we use it
to allocate contiguous regions on the disk to those blocks which are written
behind when a file is being written to sequentially.  We postpone actually
binding the blocks in the cache to physical sectors until enough have
accumulated and then go find a contiguous region big enough to hold them
by searching the bit map.  Having the sectors contiguous on the disk 
permits efficient read-ahead in one disk operation and minimizes head
motion when a file is accessed.
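
	In outline, that bit map search might look like the following (a
simplified sketch, assuming 1 = allocated; this is not the actual UNOS code):

```c
#include <assert.h>

/* Find a run of `want` contiguous free blocks in a bit map
   (1 = allocated, 0 = free); return the first block number of
   the run, or -1 if no run that long exists. */
static long find_contig(const unsigned char *map, long nblocks, long want)
{
    long b, run = 0;

    for (b = 0; b < nblocks; b++) {
        if ((map[b >> 3] >> (b & 7)) & 1)
            run = 0;                   /* allocated block breaks the run */
        else if (++run == want)
            return b - want + 1;       /* run complete: return its start */
    }
    return -1;
}
```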

[ You postpone writing to disk to collect enough for a contiguous region on ]
[ disk.  What metric do you use to decide when you have collected enough?   ]
[ How do you trade this off against the vulnerability to crashes?  --DL     ]

-- 
David I. Emery
Charles River Data Systems
983 Concord St.
Framingham, MA 01701
Tel: (617) 626-1102
uucp: ...!decvax!frog!die

earle@mahendo.Jpl.Nasa.Gov (Greg Earle) (03/31/88)

In article <4811@sdcsvax.UCSD.EDU> I wrote:
>I once worked on a project ... that ... used [ a ]
>bitmap scheme for the free disk block list.  Seeing this exercise got me
>curious as to whether anyone has tried implementing this as a research project
>using a UNIX system as a base, or conversely, do any extant O/S research
>projects use this method?

Everyone has been kindly reminding me that 4.x BSD UNIX uses such a scheme.
Yes, thank you, I knew that already (I'd be a pretty sad consultant otherwise).
My trusty copy of `A Fast File System for UNIX' is within easy reach ...

What I was *really* thinking when I said `UNIX system' was a System V-based or
BTL Research (i.e., V8 or V9) version (sorry, mass-Bach reading has me in
System V mode).  After having used it for 4 years I sort of forgot that 
4.x BSD UNIX can be (and still is) called a `research project'.  What I was
*really* thinking when I said [ (-: ] `research project' was things I know
little of, like Sprite for example.

Sorry for the confusion ...


-- 
	Greg Earle		earle@mahendo.JPL.NASA.GOV
	Indep. Sun consultant	earle%mahendo@jpl-elroy.ARPA	[aka:]
	(Gainfully Unemployed)	earle%mahendo@elroy.JPL.NASA.GOV
	Lake View Terrace, CA	...!{cit-vax,ames}!elroy!jplgodo!mahendo!earle

darrell@cs.ucsd.edu (Darrell Long) (04/01/88)

Melinda Shore <shor@sphinx.uchicago.edu>, says...

Unicos (Cray's version of SysV) uses a bitmap of disk free blocks.
Earlier versions used the traditional linked lists.  We never ran
the old filesystem, but apparently it really was much slower than 
the current one.  Maybe someone at Cray could give us real figures.
-- 
Melinda Shore                               ..!uunet!reason.psc.edu!shore
Pittsburgh Supercomputing Center                     shore@reason.psc.edu

emory!arnold@ucsd.edu (Arnold D. Robbins {EUCC}), says...

The Eighth and Ninth Edition Research Unix systems use a bit map to store
the free list; blocks in the file system are all 4K bytes in size. The
V8/V9 'fsck' knows how to convert an 'old' style 1K-linked-list filesystem
(i.e. the 4.1 BSD file system) into a new style 4K bitmapped filesystem, in
place.

Peter Weinberger wrote a description of this file system in a Bell Labs
Technical Journal article circa 1981; I have the journal at home, so if
someone desperately needs the reference I could look it up. As I recall,
the main speed-up came from the larger block sizes, although the ability
to use an in-memory bit map also helped. It's been a while since I read the
article. Basically, Weinberger was able to move data through the filesystem
at just about maximum practical disk transfer speeds. Note also that this
is still the traditional "all the i-node blocks at the front" sort of
file system.

A wart is that V8 uses the eighth bit in the minor device field to distinguish
between file system types; the new one has the high bit in the byte on. It
seems to me they could have done it better with the File System Switch, but
I suspect the FSS came much later than the faster file system. Perhaps V9
does it differently than V8 did; I've only seen (someone else's) V8 manual.
-- 
Arnold Robbins
ARPA, CSNET:	arnold@emory.ARPA	BITNET: arnold@emory
UUCP: { decvax, gatech, }!emory!arnold	DOMAIN: arnold@emory.edu (soon)
	``csh: just say NO!''

deh0654@sjfc.UUCP (Dennis Hamilton) (04/01/88)

[ This will be the last bit-mapped allocation posting, unless something ]
[ really new appears.  --DL                                             ]

The CP/M operating system used bit maps, and it was very efficient on small-
memory machines with slow floppy disks, for example.  The bit map is not
stored on the disk, however -- it is constructed from the allocation tables
every time the disk is "mounted."  That way, there are no problems with using
a bit map that has been invalidated by a crash (or at least, fewer problems).
The CP/M directory structure lists the allocated blocks for each file, using
directory overflow records if necessary, for large or strange, sparse files
whose virtual blocks are not all allocated.
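
In outline, the mount-time reconstruction works something like this (a C
sketch; the structure names and sizes are hypothetical, not actual CP/M
internals):

```c
#include <assert.h>
#include <string.h>

/* A directory extent: which blocks one file (or part of one) owns. */
struct extent {
    int  in_use;       /* 0 if the slot was erased             */
    int  nblocks;      /* how many block numbers are recorded  */
    long block[16];    /* the allocated block numbers          */
};

/* Rebuild the allocation bit map from the directory: start with
   every block free, then mark each block owned by a live entry. */
static void rebuild_map(unsigned char *map, long mapbytes,
                        const struct extent *dir, int nents)
{
    int i, j;

    memset(map, 0, (size_t)mapbytes);   /* everything free to start */
    for (i = 0; i < nents; i++) {
        if (!dir[i].in_use)
            continue;                   /* erased file: blocks stay free */
        for (j = 0; j < dir[i].nblocks; j++) {
            long b = dir[i].block[j];
            map[b >> 3] |= (unsigned char)(1 << (b & 7));
        }
    }
}
```

Note that an erased directory slot contributes nothing, which is exactly why
a stale on-disk map can never survive a crash -- there isn't one.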

The biggest problem with the CP/M scheme has to do with shared physical blocks.
The file erase function updated the bit map and invalidated the directory slot
of the erased (actually, released) file, but it wouldn't notice if any blocks
were shared with another file.  This allowed re-allocation of shared blocks
that were really not available.  There are remedies, and some utility software
is careful to force the equivalent of a remount every time that erasure is
done, so that the bit map will be correctly reconstituted.

Apart from the time needed to reconstruct a bit map every time a disk volume
is mounted, the only other problem that tended to reduce performance was the
need to consult directories so often, especially for file allocation overflow
records.  This was improved in later versions by improved directory caching and
knowing which devices were non-removable.

A very nice quality of the CP/M scheme was its ability to handle sparsely
allocated large virtual files.  That, and the sharing of blocks, actually
seemed to go more smoothly than the allocation-list linking that is typical
of MSDOS and other systems.  But as far as performance goes, caching is
becoming so easy that it would seem to overcome most performance objections
to any method (except, possibly, in terms of the ability to handle recovery
after crashes).

Dennis E. Hamilton

--
	-- orcmid {uucp: ... !rochester!sjfc!deh0654
		   vanishing into a twisty little network of nodes all alike}

cantrell@Alliant.COM (Paul Cantrell) (04/01/88)

The MASSCOMP unix file system, which was a derivative of the Bell file
system, was modified by Tom Teixeira (now at Stellar) to do bitmap
allocation of file blocks. This was necessary because the system supported
contiguous block allocation (required for the high speed data acquisition).
You could try mailing him at stellar for more information about it.

My experience with his file system was that allocations were very fast,
and the file system throughput was very high. Even when non-contiguous
files were allocated, a side effect of the bitmap allocation was that
the blocks were obtained in sorted order. Even with gaps between some
of the blocks, the locality obtained made for very nice file system
throughput. I can't think of any negative effects of his allocation
method...


					PC

jgy@hropus.att.com (John Young) (04/02/88)

This bit map discussion is quite interesting!
Question:  Previous posters have mentioned the use of bitmaps
for simplifying contiguous allocations.  How is this done with
respect to gap/cylinder optimizations?  Do you keep n maps, where
n is the gap size?

	John Young
	AT&T-BL

bruce@ucsd.edu (Bruce Beare) (04/05/88)

Convergent's Mightyframe (S/320, S/640, etc) uses a combination bitmap/free 
list file system. What we did was to keep the freelist intact, and slice off a 
section of the file system at a time for bitmapping.

Advantages: Minimal changes to kernel, fsck, etc.; very good performance;
	the ability to switch between the two formats (free list/combo bitmap)
	at will (fsck does it).

Disadvantages: When the bitmap is exhausted, the kernel will lock the free list
	and scan the entire free list (one pass) building a new bitmap. 
	This does not happen very often, but when it does the file system will 
	block alloc/free calls (for that file system) for several seconds.
	
Other improvements:
	When an alloc calls for more than 1 block, it gets contiguous blocks 
	from the bit map (changes to bmap, etc).  This, coupled with a 
	'direct I/O' facility, allowed us to do large file system reads/writes 
	as a single DMA to the physical device, giving us a 4X speed 
	improvement. 
	
	We peaked with the Fuji SMD drive, Interphase SMD Controller
	at the 1MB/S range for file system I/O on the AIM II benchmark.
	The typical ST506 performance was 350-400 K/S with Maxtor xt1140 drives.


	Bruce Beare
	Now @ Voysys Corporation
	..!pyramid!ctnews!voysys!bruce


Non-Standard disclaimer: 
	I never did speak for Convergent Technologies, and still don't.