[comp.unix.wizards] why limit of 16 iovec's in readv/writev?

buck@siswat.UUCP (A. Lester Buck) (11/17/89)

I just finished writing a raw Ethernet driver for AIX PS/2.  During
performance testing, I ran across the limit on 16 iovec's in each readv(2)
or writev(2) call.  This limits the maximum transfer per system call
to about 24KB.
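
For concreteness: the 24KB figure is consistent with one maximum-size
Ethernet frame per iovec, 16 x 1514 = 24224 bytes (assuming 1514-byte
frames without the FCS; the post does not spell this out).  A caller with
more packets than that has to split them across several calls.  A minimal
sketch, in which writev_batched, demo_batched_pipe, and both constants are
made-up names for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define OLD_MAX_IOV     16      /* per-call iovec limit discussed here */
#define ETHER_MAX_FRAME 1514    /* assumed: 14-byte header + 1500 data, no FCS */

/* Largest bundle one call can carry if each iovec holds one full frame. */
size_t max_bundle_bytes(void)
{
    return (size_t)OLD_MAX_IOV * ETHER_MAX_FRAME;
}

/*
 * Hypothetical helper: pass iovcnt buffers to writev() in batches of at
 * most OLD_MAX_IOV.  Returns the bytes written, or -1 on error; *calls
 * receives the number of successful writev() calls.  A short write
 * stops the loop early.
 */
ssize_t writev_batched(int fd, const struct iovec *iov, int iovcnt, int *calls)
{
    ssize_t total = 0;

    assert(calls != NULL);
    *calls = 0;
    while (iovcnt > 0) {
        int n = iovcnt < OLD_MAX_IOV ? iovcnt : OLD_MAX_IOV;
        size_t want = 0;
        ssize_t got;

        for (int i = 0; i < n; i++)
            want += iov[i].iov_len;
        got = writev(fd, iov, n);
        if (got < 0)
            return -1;
        (*calls)++;
        total += got;
        if ((size_t)got < want)
            break;
        iov += n;
        iovcnt -= n;
    }
    return total;
}

/*
 * Usage sketch: push 40 four-byte "packets" through a pipe.  Returns the
 * number of writev() calls used, or -1 if anything went wrong.
 */
int demo_batched_pipe(void)
{
    enum { NPKT = 40, PKTLEN = 4 };
    char data[NPKT][PKTLEN] = {{0}};
    struct iovec iov[NPKT];
    int fds[2], calls = 0;
    ssize_t n;

    for (int i = 0; i < NPKT; i++) {
        iov[i].iov_base = data[i];
        iov[i].iov_len = PKTLEN;
    }
    if (pipe(fds) != 0)
        return -1;
    n = writev_batched(fds[1], iov, NPKT, &calls);
    close(fds[0]);
    close(fds[1]);
    return n == NPKT * PKTLEN ? calls : -1;
}
```

With 40 packets and a limit of 16 per call, the batches are 16, 16, and 8.
A real driver may attach meaning to each call boundary, so batching like
this is only safe where the device does not care how packets are grouped.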

Since my usual assumption is that Unix does not have many gratuitous
restrictions, and this limit also exists on SunOS and 4.3BSD, could someone
please explain the reason for this limit?  After all, the sum of the byte
counts in the 16 iovec's can be anything up to 2^32-1.

Thanks a lot!

-- 
A. Lester Buck		...!texbell!moray!siswat!buck

chris@mimsy.umd.edu (Chris Torek) (11/23/89)

In article <473@siswat.UUCP> buck@siswat.UUCP (A. Lester Buck) writes:
>[why is there a limit of 16 iovec vectors]?

Since the array of iovec structures must exist in kernel space (for
reasons having to do with cleanliness and security in the rest of the
kernel), they are `created' on the kernel stack during a sendmsg() or
recvmsg() (or readv() or writev()) call, and the user values are
copied into this local array.  It has a `reasonable' bounded size.
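
A rough user-space sketch of that pattern, to make the fixed bound
visible.  This is not the actual kernel source: rwv_sketch, fake_copyin,
and MAX_IOV are invented names, and memcpy() stands in for the kernel's
user-to-kernel copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#define MAX_IOV 16      /* the bounded size of the on-stack array */

/* Stand-in for a user-to-kernel copy; here it cannot fail. */
static int fake_copyin(const void *uaddr, void *kaddr, size_t len)
{
    memcpy(kaddr, uaddr, len);
    return 0;
}

/*
 * Copy the caller's iovecs into a local array and add up the byte counts.
 * Returns 0 on success with the sum in *resid, or an errno value.
 */
int rwv_sketch(const struct iovec *uiov, int iovcnt, size_t *resid)
{
    struct iovec aiov[MAX_IOV];         /* lives on the (kernel) stack */

    assert(resid != NULL);
    if (iovcnt <= 0 || iovcnt > MAX_IOV)
        return EINVAL;                  /* too many vectors for the array */
    if (fake_copyin(uiov, aiov, (size_t)iovcnt * sizeof aiov[0]) != 0)
        return EFAULT;

    *resid = 0;
    for (int i = 0; i < iovcnt; i++)
        *resid += aiov[i].iov_len;      /* later code works from aiov only */
    return 0;
}
```

Because the array is an ordinary local variable, its size is fixed at
compile time, and every extra slot costs kernel stack on every call.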

4.4BSD already uses the kernel malloc for readv() and writev(),
and allows 8 `free' iovec structures (on the stack) or up to
1024 `expensive' iovec structures (via malloc+io+free).  The same
could be done for sendmsg and recvmsg, but I suspect there is a
bit less incentive.
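
The two-tier scheme might look roughly like this in user-space C.  It is a
sketch of the idea only, not the 4.4BSD code: rwv_sketch44, SMALL_IOV, and
BIG_IOV are invented names, and malloc()/free() stand in for the kernel
allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

#define SMALL_IOV 8     /* `free' iovecs kept on the stack */
#define BIG_IOV   1024  /* upper bound when allocating */

/*
 * Copy the caller's iovecs either into a small on-stack array or into an
 * allocated one, then add up the byte counts.  Returns 0 or an errno
 * value; *used_malloc reports which path was taken.
 */
int rwv_sketch44(const struct iovec *uiov, int iovcnt,
                 size_t *total, int *used_malloc)
{
    struct iovec small[SMALL_IOV];
    struct iovec *heap = NULL;
    struct iovec *iov = small;

    assert(total != NULL && used_malloc != NULL);
    if (iovcnt <= 0 || iovcnt > BIG_IOV)
        return EINVAL;
    if (iovcnt > SMALL_IOV) {           /* the `expensive' path */
        heap = malloc((size_t)iovcnt * sizeof *heap);
        if (heap == NULL)
            return ENOMEM;
        iov = heap;
    }
    memcpy(iov, uiov, (size_t)iovcnt * sizeof *iov);

    *total = 0;
    for (int i = 0; i < iovcnt; i++)
        *total += iov[i].iov_len;       /* the I/O itself would go here */

    *used_malloc = (heap != NULL);
    free(heap);                         /* free(NULL) is a no-op */
    return 0;
}
```

Small requests pay nothing extra; only callers that ask for more than
SMALL_IOV vectors pay for the allocate-and-free round trip.
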
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

thorinn@skinfaxe.diku.dk (Lars Henrik Mathiesen) (11/24/89)

buck@siswat.UUCP (A. Lester Buck) writes:
>I ran across the limit on 16 iovec's in each readv(2) or writev(2) call.

The limit is there because the iovecs are copied into an array on the
kernel stack; the further processing uses a pointer to this array and
a count (contained in a structure which also has total and residual
byte counts and some flags).

>This limits the maximum transfer per system call to about 24KB.

In my opinion you should not use the readv/writev mechanism to provide
packet delimiters, even for debugging. Unless you are on a machine
where memory is tight, I would suggest that your driver treat the user
buffer as an array of (e.g.) 2K subbuffers, one for each packet. The
last word in each subbuffer could then be a bytecount.
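
One way to read that layout, as a minimal sketch.  The details are
assumptions: a "word" is taken to be 32 bits, the count is stored in host
byte order, and SUBBUF_SIZE, subbuf_put, and subbuf_len are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SUBBUF_SIZE 2048                               /* one slot per packet */
#define COUNT_OFF   (SUBBUF_SIZE - sizeof(uint32_t))   /* last word of a slot */

/* Store one packet in slot `slot' and record its length in the last word.
 * Returns 0 on success, -1 if the packet does not fit. */
int subbuf_put(unsigned char *buf, size_t slot, const void *pkt, uint32_t len)
{
    unsigned char *sb;

    assert(buf != NULL && pkt != NULL);
    if (len > COUNT_OFF)
        return -1;
    sb = buf + slot * SUBBUF_SIZE;
    memcpy(sb, pkt, len);
    memcpy(sb + COUNT_OFF, &len, sizeof len);   /* memcpy avoids alignment traps */
    return 0;
}

/* Read back the byte count stored in slot `slot'. */
uint32_t subbuf_len(const unsigned char *buf, size_t slot)
{
    uint32_t len;

    assert(buf != NULL);
    memcpy(&len, buf + slot * SUBBUF_SIZE + COUNT_OFF, sizeof len);
    return len;
}
```

A single read() or write() on a buffer of N * SUBBUF_SIZE bytes would then
carry up to N packets, at the cost of the unused tail of each slot.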

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcvax!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn@diku.dk

buck@siswat.UUCP (A. Lester Buck) (11/26/89)

In article <4996@freja.diku.dk>, thorinn@skinfaxe.diku.dk (Lars Henrik Mathiesen) writes:
> The limit is there because the iovecs are copied into an array on the
> kernel stack; the further processing uses a pointer to this array and
> a count (contained in a structure which also has total and residual
> byte counts and some flags).
[same answer from Chris Torek.  Thanks for the information.]

I still think this is quite small.  Why does it need to be on the kernel
stack?  Why not use a kernel buffer?  At 8 bytes per iovec, a 1K buffer
would raise the limit to 128.  This doesn't sound like rocket science...

> >This limits the maximum transfer per system call to about 24KB.
> 
> In my opinion you should not use the readv/writev mechanism to provide
> packet delimiters, even for debugging. Unless you are on a machine
> where memory is tight, I would suggest that your driver treat the user
> buffer as an array of (e.g.) 2K subbuffers, one for each packet. The
> last word in each subbuffer could then be a bytecount.

Reasons, please?  Works just fine for me.  The original requirements for
the driver included the ability to bundle packets.  My driver background
is from System V, so I was thinking of adding ioctl's so as not to
mangle the clean semantics of read(2) and write(2).  When readv(2) and
writev(2) were pointed out, they were exactly what was needed.
For the AIX PS/2 driver, it doesn't really make that much difference,
but I am told that the bundled interface is very important for decent
performance on AIX 370, which is part of the project I am working on.

Why not use readv/writev if they are there?


-- 
A. Lester Buck		...!texbell!moray!siswat!buck

thorinn@skinfaxe.diku.dk (Lars Henrik Mathiesen) (11/26/89)

buck@siswat.UUCP (A. Lester Buck) writes:
>In article <4996@freja.diku.dk>, thorinn@skinfaxe.diku.dk (Lars Henrik Mathiesen) writes:
>> In my opinion you should not use the readv/writev mechanism to provide
>> packet delimiters.

>Reasons, please?  Works just fine for me.
>Why not use readv/writev if they are there?

Normally, the 4.3 BSD kernel does its best to treat all the iovecs in
a single call to readv/writev as one continuous buffer --- that
includes the PF_REMOTE mode of pseudo-ttys, for instance. That's why I
thought that it would be confusing to use other semantics, and that it
should therefore be avoided.

But it turns out that I had overlooked the "physical I/O" character
devices, such as raw tapes (and disks, and so on). These devices call
the physio routine, which in turn calls the driver's strategy routine
once (at least) _per_iovec_. On tapes this gives one tape block per
iovec (I think). So, after all, your usage is actually quite
consistent with the system. Sorry for the confusion.
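
The per-iovec behaviour can be pictured with a small user-space sketch.
It is not the real physio: physio_sketch, strategy_fn, struct tape_log,
and log_strategy are invented for illustration, and the real routine may
split one iovec into several strategy calls, which is the "at least"
above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define MAX_LOG 16

/* A driver entry point in miniature: handles one contiguous transfer. */
typedef int (*strategy_fn)(void *base, size_t len, void *ctx);

/* Call the strategy routine once per iovec, stopping at the first error. */
int physio_sketch(strategy_fn strategy, const struct iovec *iov,
                  int iovcnt, void *ctx)
{
    assert(strategy != NULL);
    for (int i = 0; i < iovcnt; i++) {
        int err = strategy(iov[i].iov_base, iov[i].iov_len, ctx);
        if (err != 0)
            return err;
    }
    return 0;
}

/* Toy "tape": records the size of each block it is asked to transfer. */
struct tape_log {
    int    nblocks;
    size_t sizes[MAX_LOG];
};

int log_strategy(void *base, size_t len, void *ctx)
{
    struct tape_log *log = ctx;

    (void)base;                         /* the data itself is not inspected */
    if (log->nblocks >= MAX_LOG)
        return -1;
    log->sizes[log->nblocks++] = len;
    return 0;
}
```

Handing this two iovecs of 512 and 1024 bytes produces two recorded
blocks of those sizes, so the iovec boundaries survive as block
boundaries.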

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcvax!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn@diku.dk