[comp.unix.wizards] BSD 4.2 minphys

jwf@munsell.UUCP (Jim Franklin) (12/12/86)

Why does minphys() in BSD 4.2 systems still limit block i/o transfers to
63 * 1024 bytes?  I assume that this is an ancient artifact from PDP-11
days, when ints were 16 bits and 64K bytes was considered a lot of
memory.

But processor memory is very cheap now, so we can afford to allocate much
larger buffer pools.  Block devices such as disks are also much faster
and have higher densities. For example, 64K bytes is only 2 tracks on a
Fujitsu 2361.  I would like to be able to blast a 1/4 megabyte to a disk in
one i/o -- the disk, disk controller, and device driver can all deal with
this.  But minphys() is the bottleneck.

Does anyone know why minphys is still < 64 K?  Has anyone successfully
increased this, or know why it can't be done?  Thanks ...
-----
{harvard!adelie,{decvax,allegra,talcott}!encore}!munsell!jwf

Jim Franklin, Eikonix Corp., 23 Crosby Drive, Bedford, MA 01730
Phone: (617) 275-5070 x415

thomas@utah-gr.UUCP (Spencer W. Thomas) (12/16/86)

In article <376@wyszecki.munsell.UUCP> jwf@munsell.UUCP (Jim Franklin) writes:
>Does anyone know why minphys is still < 64 K?  Has anyone successfully
>increased this, or know why it can't be done?  Thanks ...

(Note: this applies to Vax/PDP-11 only.)

physio() is often used to write to Unibus devices.  These devices have a
limit of 64k bytes transferred (due to a 16 bit count register).  It has
nothing to do with host memory size.
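
A rough sketch of the arithmetic behind this, not the actual 4.2BSD source: a 16-bit count register holds at most 65535, so a full 64K request would wrap to zero, while 63 * 1024 fits. The struct, macro, and function names below are hypothetical stand-ins for illustration only.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel buffer header; only the
 * byte count matters for this illustration. */
struct xfer {
    long bcount;                /* requested transfer size in bytes */
};

/* The limit discussed in this thread (assumed name). */
#define MAXPHYS_SKETCH (63L * 1024)

/* Clamp a request to the limit, in the spirit of minphys(). */
void clamp_xfer(struct xfer *x)
{
    if (x->bcount > MAXPHYS_SKETCH)
        x->bcount = MAXPHYS_SKETCH;
}

/* What a 16-bit count register would hold for a given byte count:
 * only the low 16 bits survive the conversion. */
uint16_t count_register(long bcount)
{
    return (uint16_t)bcount;
}
```

With these definitions, a 256K request clamps to 64512 bytes, which a 16-bit register can hold, whereas count_register(64L * 1024) comes out as 0.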
-- 
=Spencer   ({ihnp4,decvax}!utah-cs!thomas, thomas@utah-cs.ARPA)

chris@mimsy.UUCP (Chris Torek) (12/16/86)

In article <1871@utah-gr.UUCP> thomas@utah-gr.UUCP (Spencer W. Thomas) writes:
>physio() is often used to write to Unibus devices.  These devices have a
>limit of 64k bytes transferred (due to a 16 bit count register).  It has
>nothing to do with host memory size.

This is true (although many Unibus devices use a word count, not
a byte count, so should be able to handle 128kb ... maybe).  That
does not explain why all the massbus code uses the same routine.

All the pages involved in physical I/O must be locked into core
during the transfer.  The MBA byte count register is 32 bits wide,
so it should be able to handle up to 4Gb, but physio itself would
probably hang or crash in pagein() if you tried to transfer more
in one shot than you have in free pages.  Rather than compute this
at runtime, it seems easier to have all the MBA drivers use the
same minphys routine.  At 63k-at-a-time, I get a raw data rate of
1.1Mb/s on a Vax 785 with Eagles on an Emulex SC788 (I am told it
is a 788; they all look alike to me).  This amounts to less than
20 interrupts per second---quite trivial; the *clock* interrupts
100 times per second.  Even with an XMD controller, at 2.2Mb/s,
that would be only 35 interrupts per second.
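
The interrupt figures above can be checked with a small sketch. It assumes exactly one interrupt per completed transfer and reads "Mb" as 2^20 bytes; both are assumptions, and the function name is made up for illustration.

```c
/* Transfer completions (and so interrupts) per second, assuming one
 * interrupt per transfer. Both arguments are in bytes. */
double interrupts_per_sec(double bytes_per_sec, double bytes_per_xfer)
{
    return bytes_per_sec / bytes_per_xfer;
}
```

Calling interrupts_per_sec(1.1 * 1024 * 1024, 63.0 * 1024) gives a value a little under 18, and doubling the data rate to 2.2 gives a value between 35 and 36, in line with the numbers quoted.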
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

henry@utzoo.UUCP (Henry Spencer) (12/17/86)

> Why does minphys() in BSD 4.2 systems still limit block i/o transfers to
> 63 * 1024 bytes?  I assume that this is an ancient artifact from PDP-11
> days...

Provided the disk controller and the bus being used can cope, I can't see
any good reason to retain the restriction.  Bear in mind that most Unibus
controllers tend to use 16-bit transfer-size counts, and any Unibus I/O
is limited by the 256KB Unibus hardware address space.  For non-Unibus
controllers, I can't see any good reason for the limitation unless there
is some other stupid problem with the VAX hardware (I don't have a VAX
and don't want one).

This doesn't mean that there isn't something silly in Berklix that will
break if you remove the limit...
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

mangler@cit-vax.Caltech.Edu (System Mangler) (12/18/86)

In article <376@wyszecki.munsell.UUCP>, jwf@munsell.UUCP (Jim Franklin) writes:
> I would like to be able to blast a 1/4 megabyte to a disk in one i/o --
> the disk, disk controller, and device driver can all deal with this.

What kind of disk controller?

In article <4763@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) writes:
> The MBA byte count register is 32 bits wide,

The MBA byte count register contains two 16-bit byte counts:  the
number of bytes transferred to/from memory, and the number transferred
to/from the drive.  On an error, they may differ, due to buffering.
The 16-bit byte counts limit the MBA to at most 127 sectors at a time.

The massbus disk driver really ought to supply its own minphys to deal
with 516 byte sectors (which happen when you're writing the headers).
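
A small sketch of the sector arithmetic, with made-up helper names. It shows that a 16-bit count allows 127 sectors of either size, and that the generic 63 * 1024 limit is a whole number of 512-byte sectors but not of 516-byte ones. That the leftover bytes are the reason for wanting a driver-specific minphys is an inference here, not something stated above.

```c
#define COUNT16_MAX 65535L          /* largest value a 16-bit count can hold */
#define GENERIC_LIMIT (63L * 1024)  /* the generic minphys limit, 64512 bytes */

/* Number of complete sectors that fit within a byte limit. */
long whole_sectors(long limit, long sector_size)
{
    return limit / sector_size;
}

/* Bytes left over after the last complete sector. */
long leftover_bytes(long limit, long sector_size)
{
    return limit % sector_size;
}
```

For example, whole_sectors(COUNT16_MAX, 512) and whole_sectors(COUNT16_MAX, 516) are both 127, while leftover_bytes(GENERIC_LIMIT, 516) is 12, so a transfer clamped at 63K would end partway through a 516-byte sector.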

> At 63k-at-a-time, I get a raw data rate of
> 1.1Mb/s on a Vax 785 with Eagles on an Emulex SC788

Even a lowly 750 can do that, given Eagles.

> This amounts to less than 20 interrupts per second---quite trivial;

Lost revolutions, not interrupts, are the issue, and only when you're
doing fairly specialized things, like a database writing raw cylinders,
fast bad-block checking, or image backup to a 200-ips 6250-bpi streamer,
all of which probably require more CPU horsepower than a VAX anyway.

Don Speck   speck@vlsi.caltech.edu  {seismo,rutgers,ames}!cit-vax!speck

hosking@convexs.UUCP (12/19/86)

We've been running with a NOP version of minphys on Convex C-1s
for several years with very few problems.  I won't guarantee anything about
VAX hardware, but there doesn't seem to be any reason why the software
can't hack the large transfer sizes, except for the possible memory deadlocks
if you get REALLY abusive.  We made a few changes to reduce the chance of
such deadlocks when doing huge transfers.  (We run a modified 4.2 BSD based
version of UNIX.)

In sys_generic.c, rwuio() was changed:
	for (i = 0; i < uio->uio_iovcnt; i++) {

		/*
		 * This check is really two checks in one.  It catches negative
		 * sizes AND requests to transfer too large a chunk at once,
		 * such as "dd if=/dev/rda0c of=/dev/rmt20 bs=100000000" on
		 * a system with only 16 MB of physical memory.  Lack of such
		 * checks can result in hangs or panics in vslock(), and other
		 * nasties.  This won't catch some pathological cases of
		 * vslock() hangs, but it should prevent the vast majority of
		 * potential hangs/panics in vslock() without being too
		 * unfriendly.  DRH 7/23/86.
		 */

+		if ((unsigned) iov->iov_len > (unsigned) (ctob(maxmem) / 2)) {
+			u.u_error = EINVAL;
+			return;
+		}
		uio->uio_resid += iov->iov_len;
		if (uio->uio_resid < 0) {
			u.u_error = EINVAL;
			return;
		}
		iov++;
	}
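
The "two checks in one" remark in the comment relies on the unsigned cast: a negative length converts to a very large unsigned value, so a single comparison rejects both negative and oversized requests. A standalone sketch of that idea follows; the function name is hypothetical, and the limit is passed in rather than computed from ctob(maxmem) / 2.

```c
/* Returns 1 if len is negative or larger than limit, else 0.
 * The cast maps negative values to large unsigned ones, so one
 * comparison covers both cases. */
int bad_length(int len, unsigned limit)
{
    return (unsigned)len > limit;
}
```

With a limit of 8 MB (half of the 16 MB mentioned in the comment), bad_length(-1, 8388608u) and bad_length(100000000, 8388608u) are both 1, while bad_length(4096, 8388608u) is 0.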

In vm_mem.c, vslock() was changed:

    /*
     * We should "never" fail the test that follows, but we do it anyway to
     * prevent hangs when brain damaged callers make unreasonable
     * requests on system memory.  This won't prevent *all* deadlocks,
     * but it should catch all but the pathological cases.  It's better
     * to get info on what's broken (and be able to sync the disks) than
     * to just hang forever, and it's a cheap check, so why not ?  DRH 7/23/86
     */
    if ((unsigned) count > (unsigned) (ctob(maxmem) / 2))
	panic("vslock: trying to wire too much memory");