[net.unix] Raw vs. block device. I'm confused.

jpayne%bbn-vax@sri-unix.UUCP (01/05/84)

From:  Jonathan Payne <jpayne@bbn-vax>

Could somebody out there explain (completely) the difference between a
raw device and a block device?  Don't just say that raw devices are
faster 'cos I already know that.  Why two types?  Why doesn't adb'ing
a raw device work? (At least not when I tried.)  I think a lot of people
know part of the answer to my question, but if somebody knows EVERYTHING
I would appreciate a response.
	Thanks,
		J

gwyn%brl-vld@sri-unix.UUCP (01/06/84)

From:      Doug Gwyn (VLD/VMB) <gwyn@brl-vld>

A raw ("character") device does not go through the block buffers,
a block device does.

sdyer%bbn-unix@sri-unix.UUCP (01/06/84)

From:  Steve Dyer <sdyer@bbn-unix>

Reading and writing on a disk block device participate in the kernel's
buffer cache.  That is, data transfers occur between the user's address
space and the buffers in the buffer cache, possibly implying that no I/O
was performed immediately (i.e. on a read the buffer might have already
been present in the cache, and on a write, the actual I/O request would be
enqueued, but not yet performed.) Note that when the number of bytes to be
transferred is greater than the UNIX system's buffer size, BSIZE (usually
512 or 1024), the single request given by the user program must be broken
up into multiple requests to fill a system buffer.
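
The splitting described above can be sketched roughly as follows. This is an illustration only, not kernel source: BSIZE is assumed to be 512, and `struct chunk` and `split_request` are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stddef.h>

#define BSHIFT 9                  /* assumed: log2(BSIZE) */
#define BSIZE  (1L << BSHIFT)     /* assumed: 512-byte system buffers */
#define BMASK  (BSIZE - 1)

/* One piece of a user request: the part that falls inside a single buffer. */
struct chunk {
    long blkno;   /* which BSIZE-sized block is touched */
    long start;   /* byte offset within that block */
    long len;     /* bytes copied to or from that buffer */
};

/* Split a request of `count` bytes at byte offset `off` into per-block
 * pieces.  Stores at most `max` pieces in `out` and returns the total
 * number of pieces the request needs. */
size_t split_request(long off, long count, struct chunk *out, size_t max)
{
    size_t n = 0;

    assert(off >= 0 && count >= 0);
    while (count > 0) {
        long on  = off & BMASK;     /* position inside the current block */
        long len = BSIZE - on;      /* room left in this block */

        if (len > count)
            len = count;
        if (n < max) {
            out[n].blkno = off >> BSHIFT;
            out[n].start = on;
            out[n].len   = len;
        }
        n++;
        off   += len;
        count -= len;
    }
    return n;
}
```

Under these assumptions a 1000-byte request at offset 100 becomes three pieces of 412, 512 and 76 bytes, each one a separate trip through the buffer cache.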

"Raw" disk I/O occurs directly between the user program and the hardware
device, bypassing any buffering.  Raw I/O is faster than "cooked" I/O for
two reasons: first, since data is DMA'ed directly into the user's address
space, one avoids the CPU overhead of having to copy bytes to/from an
intermediate buffer.  More importantly, when performing disk operations
like "?check", "fsck" or a disk-to-disk copy, all of which need to read
multiple contiguous physical blocks, it is often possible (depending on the
controller) to read multiple sectors in a single DMA operation.  The same
I/O request on the block device would have to be split into several
operations, almost certainly losing revolutions between successive
requests.

Adb'ing the raw disk device doesn't work because of physio(), the mediator
of raw "dma-type" requests.  Physio() hands to the disk device strategy
routine the "block number" of the request.  The block number is derived
quite simply as u.u_offset>>BSHIFT.  u.u_offset is the current "lseek"
position of the open raw device file, BSHIFT is log2(BSIZE).  Thus, all RAW
I/O operations must occur on a BSIZE boundary.  (Not only MUST, but DO!
It's quite surprising the first time you attempt raw I/O on a non-BSIZE
boundary and find that you've trashed the beginning of the block!)
Adb, like most UNIX programs, simply lseeks to the desired spot and
starts writing.
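
The truncation described above can be shown with a few lines of arithmetic. This is a sketch of the calculation as the post describes it, not the actual physio() code; BSHIFT is assumed to be 9, and the function names are made up for the example.

```c
#include <assert.h>

#define BSHIFT 9                  /* assumed: log2(BSIZE), BSIZE == 512 */

/* Block number handed to the strategy routine for a given seek offset. */
long raw_blkno(long u_offset)
{
    assert(u_offset >= 0);
    return u_offset >> BSHIFT;
}

/* Byte position on the device where the transfer really begins. */
long raw_start_byte(long u_offset)
{
    return raw_blkno(u_offset) << BSHIFT;
}

/* How far before the requested offset the transfer lands. */
long raw_misplacement(long u_offset)
{
    return u_offset - raw_start_byte(u_offset);
}
```

With these assumptions, seeking to byte 700 and writing gives block 1, so the data lands at byte 512, which is 188 bytes earlier than intended and overwrites the start of the block.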

Think about it.  The primitive writable object on the surface of a disk is
a sector, which is usually 512 bytes.  To write on a disk device at other
than a sector boundary would require reading the old sector into memory,
modifying it, and writing it out again, something the raw device cannot
do, but which the block device handles quite well, since its higher levels
have already taken care of that.  Now, you might ask why physio() truncates
at BSIZE rather than SECTORSIZE (since they are no longer, since V7, one
and the same.)  I suspect it's merely a convenience, saving an extra
manifest constant that would have to be kept in step with reality.
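
The read-modify-write cycle mentioned above might look something like this. It is a toy model, not driver code: the "disk" is an in-memory array, the sector size is assumed to be 512, and `read_sector`, `write_sector` and `rmw_write` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define SECTOR 512                /* assumed sector size */
#define NSECT  4                  /* tiny in-memory stand-in for a disk */

unsigned char disk[NSECT][SECTOR];

/* The only primitives this pretend hardware offers: whole-sector transfers. */
static void read_sector(long s, unsigned char *buf)
{
    memcpy(buf, disk[s], SECTOR);
}

static void write_sector(long s, const unsigned char *buf)
{
    memcpy(disk[s], buf, SECTOR);
}

/* Write `len` bytes at an arbitrary byte offset by reading each affected
 * sector, patching it in memory, and writing the whole sector back. */
void rmw_write(long off, const unsigned char *data, long len)
{
    unsigned char buf[SECTOR];

    assert(off >= 0 && len >= 0 && off + len <= (long)NSECT * SECTOR);
    while (len > 0) {
        long s  = off / SECTOR;   /* sector holding the next byte */
        long on = off % SECTOR;   /* position inside that sector */
        long n  = SECTOR - on;    /* bytes that fit in this sector */

        if (n > len)
            n = len;
        read_sector(s, buf);              /* read the old contents */
        memcpy(buf + on, data, (size_t)n);/* modify the part being written */
        write_sector(s, buf);             /* write the full sector back */
        off  += n;
        data += n;
        len  -= n;
    }
}
```

The block device gets this behavior from the buffer cache; the raw device has no intermediate buffer in which to do the patching.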

/Steve Dyer
sdyer@bbncca
decvax!bbncca!sdyer

phil@amd70.UUCP (Phil Ngai) (01/09/84)

Steve Dyer, your article was helpful to a novice like me. But it raises
another question. I once had a slightly corrupt root filesystem which
I fixed by using adb. I can't remember whether I used the raw or block
device but from your article I must have used the block device. The
question is: should I sync or not before rebooting the system? If I
don't sync, then the area I adb'd won't be written. If I do sync, which
takes precedence, the in-core superblock or the block I adb'd?
-- 
Phil Ngai (408) 988-7777 {ucbvax,decwrl,ihnp4,allegra,intelca}!amd70!phil

clark.wbst@PARC-MAXC.ARPA (01/09/84)

A raw device does not go through the buffer pool... this has some side effects
like you have to read and write in integer multiples of physical blocks 
(sectors), starting on block boundaries.  Also, a raw file system is defined in
terms of sectors - an offset and a length.  I do not THINK there is anything
to prevent you from going beyond there and tromping on the next file system.

--Ray

clark.wbst@PARC-MAXC.ARPA (01/09/84)

New, Related question...

	I seem to remember a warning once that doing a read on a raw 
device reads in at least the physical record size, i.e. sector on disk or
record on tape, regardless of the byte count you put - so that if you have
a 512 byte buffer and read a tape with an 8K block, you write past the
end of buffer! 

	Is this true?  Does it depend on the device controller?

--Ray

sdyer%bbn-unix@sri-unix.UUCP (01/09/84)

From:  Steve Dyer <sdyer@bbn-unix>


In general, most UNIX magtape drivers use the following conventions with
the RAW device:

	read(fh, buf, nbytes) returns -1 when nbytes < physical record size

		otherwise,

	read(fh, buf, nbytes) returns the actual number of bytes in the record
		(i.e., it transfers only a single record, regardless of the
		 byte count.)

I have always ascribed the former behavior to a limitation of the controller;
it transfers a full record or nothing.
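
The two conventions above can be modeled with a small simulation. Nothing here talks to a real drive: `struct sim_tape` and `sim_raw_read` are hypothetical, and the choices to return 0 at end of data and to move past a record even when the read fails are assumptions of the model, not something the description above states.

```c
#include <assert.h>
#include <string.h>

#define MAXREC 8                  /* hypothetical limit for this toy tape */

struct sim_tape {
    const char *data[MAXREC];     /* contents of each physical record */
    long        len[MAXREC];      /* length of each physical record */
    int         nrec;             /* number of records on the tape */
    int         pos;              /* index of the next record to read */
};

/* Model of a raw magtape read: one record per call, or -1 if the
 * caller's buffer is smaller than the record. */
long sim_raw_read(struct sim_tape *t, char *buf, long nbytes)
{
    long n;

    assert(t->nrec <= MAXREC);
    if (t->pos >= t->nrec)
        return 0;                 /* assumption: end of data reads as 0 */
    n = t->len[t->pos];
    t->pos++;                     /* assumption: tape moves past the record */
    if (nbytes < n)
        return -1;                /* buffer smaller than the record */
    memcpy(buf, t->data[t->pos - 1], (size_t)n);
    return n;                     /* actual record length, not nbytes */
}
```

In practice this is why programs reading raw tape usually pass a buffer at least as large as the biggest record they expect and use the return value to learn the record size.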

/Steve Dyer
sdyer@bbncca
decvax!bbncca!sdyer

ron%brl-vgr@sri-unix.UUCP (01/12/84)

From:      Ron Natalie <ron@brl-vgr>

In one word..."buffering".  I/O on a block device is always done into
a buffer in the kernel.  Raw disk I/O is done directly into the buffer
the user passed.  Of course raw is faster if you are looking at a block
and then throwing it away.  You don't need the buffer cache and you don't
need all that copying.  There are constraints, however.  Since the peripheral
really transfers the number of characters the user requested directly into
the user buffer, the request may not work.  Most peripherals require word
alignments, various offsets and minimum granularities.  ADB is probably
failing because you are not reading the beginning of a physical disk block,
or you are not reading a whole block.
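
A user program could check the kind of constraints listed above before attempting raw I/O. The sketch below assumes a 512-byte sector and a 2-byte word alignment requirement; real values depend on the device and driver, and `raw_request_ok` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR 512L               /* assumed minimum transfer granularity */
#define WORD   2                  /* assumed buffer alignment, in bytes */

/* Return 1 if a raw request looks acceptable under the assumed rules:
 * word-aligned buffer, sector-aligned offset, whole number of sectors. */
int raw_request_ok(const void *buf, long off, long count)
{
    assert(off >= 0);
    if ((uintptr_t)buf % WORD != 0)
        return 0;                 /* buffer not word aligned */
    if (off % SECTOR != 0)
        return 0;                 /* does not start on a sector boundary */
    if (count <= 0 || count % SECTOR != 0)
        return 0;                 /* not a whole number of sectors */
    return 1;
}
```

A debugger that seeks to an arbitrary byte and reads a few bytes fails the last two checks.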

-Ron

scw%ucla-locus@cepu.UUCP (01/13/84)

From:  Steve Woods <cepu!scw@ucla-locus>


	    I seem to remember a warning once that doing a read on a raw 
    .
    .
    .
    end of buffer! 

	    Is this true?  Does it depend on the device controller?

No, but it does depend on the controller. All DEC controllers will read as
many bytes as you tell them to read; when writing, however, they will write a full
sector (disks), padding the sector with zero bytes up to its full length. Tape
records will be exactly as long as you tell them to be, within the limits of the
controller (some tape controllers require an even number of bytes).
<scw>

ron%brl-vgr@sri-unix.UUCP (01/16/84)

From:      Ron Natalie <ron@brl-vgr>

It depends on both the driver and the device as to what exactly is
allowed during RAW IO.  When the read is initiated it is never (unless
someone has really messed up the driver) set up to do more than what the
user asks for.  Generally, what happens is u.u_count is just stuck in
the byte count register (after making conversion to words or negative
as required by the device).
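
The conversion mentioned above might be sketched like this, assuming a 16-bit count register that holds a two's-complement negative count. The register width and the function names are assumptions for illustration; which form a given device wants is up to its driver.

```c
#include <assert.h>
#include <stdint.h>

/* Value for a register that wants a negative byte count. */
uint16_t neg_byte_count(unsigned nbytes)
{
    assert(nbytes > 0 && nbytes <= 0xFFFFu);
    return (uint16_t)(0u - nbytes);
}

/* Value for a register that wants a negative count of 16-bit words.
 * An odd byte count cannot be expressed, so it is rejected here. */
uint16_t neg_word_count(unsigned nbytes)
{
    assert(nbytes > 0 && (nbytes & 1u) == 0 && (nbytes >> 1) <= 0xFFFFu);
    return (uint16_t)(0u - (nbytes >> 1));
}
```

For a 512-byte request these give 0xFE00 as a byte count and 0xFF00 as a word count. Either way the hardware is told to move exactly what the user asked for and no more.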

Most of the tape drivers issue an error if the Physical record size is
greater than the dma size (the size asked to read).

-Ron