jpayne%bbn-vax@sri-unix.UUCP (01/05/84)
From: Jonathan Payne <jpayne@bbn-vax> Could somebody out there explain (completely) the difference between a raw device and a block device? Don't just say that raw devices are faster 'cos I already know that. Why two types? Why doesn't adb'ing a raw device work? (At least not when I tried.) I think a lot of people know part of the answer to my question, but if somebody knows EVERYTHING I would appreciate a response. Thanks, J
gwyn%brl-vld@sri-unix.UUCP (01/06/84)
From: Doug Gwyn (VLD/VMB) <gwyn@brl-vld> A raw ("character") device does not go through the block buffers; a block device does.
sdyer%bbn-unix@sri-unix.UUCP (01/06/84)
From: Steve Dyer <sdyer@bbn-unix> Reads and writes on a disk block device go through the kernel's buffer cache. That is, data transfers occur between the user's address space and the buffers in the buffer cache, so a request may complete without any immediate I/O (i.e. on a read the buffer might already be present in the cache, and on a write the actual I/O request is enqueued but not yet performed). Note that when the number of bytes to be transferred is greater than the UNIX system's buffer size, BSIZE (usually 512 or 1024), the single request given by the user program must be broken up into multiple requests, each filling one system buffer. "Raw" disk I/O occurs directly between the user program and the hardware device, bypassing any buffering. Raw I/O is faster than "cooked" I/O for two reasons. First, since data is DMA'ed directly into the user's address space, one avoids the CPU overhead of copying bytes to or from an intermediate buffer. More importantly, when performing disk operations like "?check", "fsck" or a disk-to-disk copy, all of which need to read multiple contiguous physical blocks, it is often possible (depending on the controller) to read multiple sectors in a single DMA operation. The same I/O request on the block device would have to be split into several operations, almost certainly losing revolutions between successive requests. Adb'ing the raw disk device doesn't work because of physio(), the mediator of raw "dma-type" requests. Physio() hands the disk device's strategy routine the "block number" of the request. The block number is derived quite simply as u.u_offset>>BSHIFT, where u.u_offset is the current "lseek" position of the open raw device file and BSHIFT is log2(BSIZE). Thus, all RAW I/O operations must occur on a BSIZE boundary. (Not only MUST, but DO! It's quite surprising the first time you attempt raw I/O on a non-BSIZE boundary and find that you've trashed the beginning of the block!)
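The truncation described above can be sketched in a few lines of C. This is only an illustration of the arithmetic, not the kernel's physio() code: BSHIFT is assumed to be 9 (BSIZE of 512), and the function names are hypothetical.

```c
#include <stdio.h>

/* Assumed values for illustration: a 512-byte system buffer. */
#define BSHIFT 9
#define BSIZE  (1L << BSHIFT)

/* Block number as the post describes it: the offset shifted right by BSHIFT. */
long raw_block_number(long offset)
{
    return offset >> BSHIFT;
}

/* Byte address at which the transfer really starts on the device. */
long raw_effective_offset(long offset)
{
    return raw_block_number(offset) << BSHIFT;
}

/* Print where a raw transfer requested at `offset` would land. */
void show_truncation(long offset)
{
    printf("lseek to %ld -> block %ld -> transfer starts at byte %ld\n",
           offset, raw_block_number(offset), raw_effective_offset(offset));
}
```

With these assumed values, a seek to byte 1000 maps to block 1, so the transfer begins at byte 512 rather than at byte 1000. The low-order bits of the offset are silently dropped.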
Adb, like most UNIX programs, simply lseeks to the desired spot and starts writing. Think about it. The primitive writable object on the surface of a disk is a sector, which is usually 512 bytes. To write on a disk device at other than a sector boundary would require reading the old sector into memory, modifying it, and writing it out again, something the raw device cannot do, but which the block device handles quite well, since its higher levels have already taken care of that. Now, you might ask why physio() truncates at BSIZE rather than SECTORSIZE (the two have not been one and the same since V7). I suspect it's merely a convenience, saving an extra manifest constant that would have to be kept in step with reality. /Steve Dyer sdyer@bbncca decvax!bbncca!sdyer
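The read-modify-write cycle mentioned above can be sketched as follows. This is a minimal model, not kernel code: the "disk" is an in-memory array, the 512-byte sector size is an assumption, and sector_read(), sector_write() and rmw_write() are hypothetical names.

```c
#include <string.h>

#define SECTOR 512          /* assumed sector size */
#define NSECT  4            /* tiny in-memory "disk" for the sketch */

static unsigned char disk[NSECT][SECTOR];

/* The only primitives a sector device offers: whole-sector read and write. */
static void sector_read(int n, unsigned char *buf)
{
    memcpy(buf, disk[n], SECTOR);
}

static void sector_write(int n, const unsigned char *buf)
{
    memcpy(disk[n], buf, SECTOR);
}

/* Write len bytes at an arbitrary byte offset by read-modify-write.
 * Returns 0 on success, -1 if the range falls outside the disk. */
int rmw_write(long off, const unsigned char *src, long len)
{
    unsigned char tmp[SECTOR];

    if (off < 0 || len < 0 || off + len > (long)NSECT * SECTOR)
        return -1;
    while (len > 0) {
        int  n     = (int)(off / SECTOR);   /* sector holding this byte  */
        long start = off % SECTOR;          /* position inside sector    */
        long chunk = SECTOR - start;        /* bytes left in this sector */
        if (chunk > len)
            chunk = len;
        sector_read(n, tmp);                /* read the old sector       */
        memcpy(tmp + start, src, chunk);    /* modify part of it         */
        sector_write(n, tmp);               /* write it back whole       */
        off += chunk;
        src += chunk;
        len -= chunk;
    }
    return 0;
}
```

A 4-byte write at offset 510 touches two sectors here, and the bytes on either side of the written range are preserved. That bookkeeping is what the block device does on the caller's behalf and what the raw device leaves out.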
phil@amd70.UUCP (Phil Ngai) (01/09/84)
Steve Dyer, your article was helpful to a novice like me. But it raises another question. I once had a slightly corrupt root filesystem which I fixed by using adb. I can't remember whether I used the raw or block device but from your article I must have used the block device. The question is: should I sync or not before rebooting the system? If I don't sync, then the area I adb'd won't be written. If I do sync, which takes precedence, the in-core superblock or the block I adb'd? -- Phil Ngai (408) 988-7777 {ucbvax,decwrl,ihnp4,allegra,intelca}!amd70!phil
clark.wbst@PARC-MAXC.ARPA (01/09/84)
A raw device does not go through the buffer pool... this has some side effects: you have to read and write in integer multiples of physical blocks (sectors), starting on block boundaries. Also, a raw file system is defined in terms of sectors - an offset and a length. I do not THINK there is anything to prevent you from going beyond there and tromping on the next file system. --Ray
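A program can check the alignment rule before issuing a raw transfer. The sketch below assumes a 512-byte sector; RAW_SECTOR, raw_request_ok() and round_up_to_sector() are hypothetical names, not part of any UNIX interface.

```c
#define RAW_SECTOR 512L   /* assumed physical block size */

/* Nonzero if a raw transfer of `count` bytes at `offset` starts on a
 * sector boundary and covers a whole number of sectors. */
int raw_request_ok(long offset, long count)
{
    return offset >= 0 && count > 0 &&
           offset % RAW_SECTOR == 0 && count % RAW_SECTOR == 0;
}

/* Round a byte count up to the next whole number of sectors. */
long round_up_to_sector(long count)
{
    return (count + RAW_SECTOR - 1) / RAW_SECTOR * RAW_SECTOR;
}
```

Note that this check says nothing about the end of the partition. Staying inside the file system's offset and length is still up to the caller.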
clark.wbst@PARC-MAXC.ARPA (01/09/84)
New, Related question... I seem to remember a warning once that doing a read on a raw device reads in at least the physical record size, i.e. sector on disk or record on tape, regardless of the byte count you put - so that if you have a 512 byte buffer and read a tape with an 8K block, you write past the end of buffer! Is this true? Does it depend on the device controller? --Ray
sdyer%bbn-unix@sri-unix.UUCP (01/09/84)
From: Steve Dyer <sdyer@bbn-unix> In general, most UNIX magtape drivers use the following conventions with the RAW device: read(fh, buf, nbytes) returns -1 when nbytes < physical record size; otherwise, read(fh, buf, nbytes) returns the actual number of bytes in the record (i.e., it transfers only a single record, regardless of the byte count). I have always ascribed the former behavior to a limitation of the controller; it transfers a full record or nothing. /Steve Dyer sdyer@bbncca decvax!bbncca!sdyer
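The convention above can be written down as a small model. It is a sketch of the described behavior only, not driver code, and tape_read_result() is a hypothetical name.

```c
/* Model of the raw magtape read() convention described above.
 * record_size: length of the next physical record on the tape
 * nbytes:      byte count the caller passed to read()
 * Returns -1 if the buffer is too small for the record, otherwise the
 * record length (one record per call, never more). */
long tape_read_result(long record_size, long nbytes)
{
    if (nbytes < record_size)
        return -1;
    return record_size;
}
```

Under this model a 512-byte read against an 8K record fails instead of overrunning the buffer. The practical rule is to pass a buffer at least as large as the largest record expected and use the return value as the record length.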
ron%brl-vgr@sri-unix.UUCP (01/12/84)
From: Ron Natalie <ron@brl-vgr> In one word..."buffering". An I/O on a block device is always done into a buffer in the kernel. A raw disk I/O is done directly into the buffer the user passed. Of course raw is faster if you are looking at a block and then throwing it away. You don't need the buffer cache and you don't need all that copying. There are constraints, however. Since the peripheral really transfers directly into the user buffer, the number of characters the user requested may not work. Most peripherals require word alignment, various offsets and minimum granularities. ADB is probably failing because you are not reading from the beginning of a physical disk block, or you are not reading a whole block. -Ron
scw%ucla-locus@cepu.UUCP (01/13/84)
From: Steve Woods <cepu!scw@ucla-locus> "I seem to remember a warning once that doing a read on a raw . . . end of buffer! Is this true? Does it depend on the device controller?" No, but it does depend on the controller. All DEC controllers will read as many bytes as you tell them to read; when writing, however, they will write a full sector (disks), padding the sector with zero bytes up to its full length. Tape records will be exactly as long as you tell them to be, within the limits of the controller (some tape controllers require an even number of bytes). <scw>
ron%brl-vgr@sri-unix.UUCP (01/16/84)
From: Ron Natalie <ron@brl-vgr> It depends on both the driver and the device as to what exactly is allowed during RAW I/O. When the read is initiated it is never (unless someone has really messed up the driver) set up to do more than what the user asks for. Generally, what happens is that u.u_count is just stuck in the byte count register (after conversion to words, or negation, as required by the device). Most of the tape drivers issue an error if the physical record size is greater than the DMA size (the size asked to read). -Ron
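The conversion "to words or negative" can be illustrated like this. The sketch assumes a 16-bit count register that expects a two's-complement (negated) count, as on many PDP-11 controllers. The function names are hypothetical, and a real driver would use whatever form its particular device requires.

```c
#include <stdint.h>

/* Two's-complement (negated) byte count for a 16-bit register. */
uint16_t neg_byte_count(unsigned nbytes)
{
    return (uint16_t)(0u - nbytes);
}

/* Negated count of 16-bit words; nbytes is assumed to be even. */
uint16_t neg_word_count(unsigned nbytes)
{
    return (uint16_t)(0u - (nbytes >> 1));
}
```

For a 512-byte request the byte form is 0xFE00 and the word form is 0xFF00. Either way the register holds exactly the user's count, which is why the transfer cannot run past what was asked for.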