ken@gvax.cs.cornell.edu (Ken Birman) (04/06/88)
Among other things, I have done a lot of work on systems that maintain large data files (for example, digitized 24-hour electrocardiograms, about 35Mbytes each) and graphics images (easily several Mbytes). Has anyone looked at file system optimization for managing such large files?

Under UNIX, what one wants would be a mechanism like the "raw" file system, but with a precessional block mapping mechanism oriented towards the size of blocks the application reads... thus, for my ECG stuff, I tend to read 10kbytes at a time, so one would want a raw file system blocked in units of 10kbytes, with a precession factor based on how long, on average, my ECG scanner delays between reads. These two numbers should be parameters, set when the file is created.

Kernel buffer pools and things like that are a total loss for this type of application. Instead, you really want read-ahead, or a shared memory scheme where two processes can cooperate to implement a read-ahead. A really nice solution to this problem could double the speed of this class of program, by eliminating copying through the kernel and by reducing the average access delay to near zero... 4.xBSD UNIX is terrible from this point of view.

Has much work been done on this sort of thing? Can people point me to a good operating system for this type of application?

----

Note to those who don't know about precessional allocation schemes: The idea here is to lay file system blocks out with gaps between them in such a way that if a typical program issues its next request within the expected time delay, the next block will be right under the read head just after the request is issued. You can do this without wasting space using tricks relating to counting mod S, where S is the number of sectors per track... for blocking factors that don't divide into S, you spiral up and down from surface to surface in a way that parallels the hardware handling of multiple-sector reads.
Most good operating systems texts talk about this sort of optimization, but few systems seem to use it.
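
For anyone who wants to see the "counting mod S" trick concretely, here is a rough sketch in Python. It is only an illustration under simplifying assumptions: a single track, a block size that divides evenly into S (so the track holds a whole number of block-sized slots), and a fixed delay between reads. The surface-to-surface spiral for blocking factors that don't divide S is not modeled. The names stride_for_delay and interleave_table are made up for this example and don't correspond to any real file system interface.

```python
import math


def stride_for_delay(delay_ms, rotation_ms, slots_per_track):
    """Hypothetical helper: how many slots to advance between logical blocks.

    After one block is read, the head sits at the start of the next slot.
    While the application is busy for delay_ms, some whole number of slots
    rotates past, so the next logical block should start just beyond them.
    """
    slot_ms = rotation_ms / slots_per_track
    return 1 + math.ceil(delay_ms / slot_ms)


def interleave_table(slots_per_track, stride):
    """Return a list where table[physical_slot] == logical block number.

    Positions are counted mod slots_per_track. When the stride lands on a
    slot that is already taken, slide forward to the next free slot, so
    every slot on the track is used and no space is wasted.
    """
    if slots_per_track <= 0 or stride <= 0:
        raise ValueError("slots_per_track and stride must be positive")
    table = [None] * slots_per_track
    pos = 0
    for logical in range(slots_per_track):
        while table[pos] is not None:
            pos = (pos + 1) % slots_per_track
        table[pos] = logical
        pos = (pos + stride) % slots_per_track
    return table
```

As an example with invented numbers: with 8 slots per track, a 16 ms rotation and a 4 ms delay between reads, stride_for_delay gives 3, and interleave_table(8, 3) lays the logical blocks out as [0, 3, 6, 1, 4, 7, 2, 5]. A program reading block 0 and then pausing 4 ms finds block 1 arriving under the head just as it asks for it.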