[comp.unix.wizards] Shared memory problems under SVr2

fmayhar@killer.UUCP (Frank Mayhar) (05/26/88)

Help!

We're working on a project that will provide an optical disk archival system for
our mainframe operating system.  The drives themselves are on a SCSI bus
connected to a Motorola Unix system that will act as an intelligent peripheral.
The Motorola system is VME-based, 68020 CPU, running System V rel 2.2.  Software
runs on the 68020 that talks to the mainframe and to the drives.  This software
uses shared memory and a couple of concurrently-running processes to let us do
double-buffering (we write one buffer while we fill another, concurrently).
We chose this scheme for performance reasons, since the data rate to the
optical disk is much slower than that to the mainframe.  Since we write large
amounts of data at once (roughly 96k), we write directly from the shared memory.
This is where we run into problems.  Apparently, the virtual address we pass
to the write (a pointer into the shared memory area) is not being mapped to 
physical memory correctly.  We put a printf() in our SCSI driver to print the
address returned by physio(), just to see what was happening.  Given a passed
pointer of 0x20001, the resolved address is 0x1.  Sigh.  Does anyone know why
this is happening, and, more importantly, how to fix it?  Or if it's fixed in
SVR3?  It's way too late to change our design (we have to meet inflexible
deadlines), so we would prefer a fix to SVR2.
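
For readers unfamiliar with the arrangement, the double-buffering scheme
described above might look roughly like the following.  This is only a sketch
under stated assumptions, not the poster's code: the function name
run_double_buffer(), the use of pipes to hand buffers back and forth, and the
fill pattern are all invented for illustration, and the point where the real
system would write() to the optical disk is marked with a comment.

```c
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

#define BUF_SIZE (96 * 1024)  /* roughly the transfer size given in the post */
#define NBUF     2            /* two buffers: one filling, one draining */

/* Pass nblocks blocks through two shared-memory buffers using a filler
 * (parent) and a writer (child).  Returns the number of blocks the writer
 * saw intact, or -1 on error.  Sketch only: the count travels in the exit
 * status, so it is limited to 255. */
int run_double_buffer(int nblocks)
{
    int full[2], empty[2];    /* pipes carrying buffer indices */
    int shmid, i, status;
    char *base, idx;
    pid_t pid;

    shmid = shmget(IPC_PRIVATE, NBUF * BUF_SIZE, IPC_CREAT | 0600);
    if (shmid < 0)
        return -1;
    base = (char *)shmat(shmid, NULL, 0);
    shmctl(shmid, IPC_RMID, NULL);    /* destroyed after the last detach */
    if (base == (char *)-1)
        return -1;
    if (pipe(full) < 0 || pipe(empty) < 0) {
        shmdt(base);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        shmdt(base);
        return -1;
    }
    if (pid == 0) {
        /* Writer: stands in for the process that writes to the device. */
        int ok = 0;
        close(full[1]);
        close(empty[0]);
        while (read(full[0], &idx, 1) == 1) {
            char *buf = base + idx * BUF_SIZE;
            /* Real code would do write(fd, buf, BUF_SIZE) here, i.e. I/O
             * directly from the shared memory segment. */
            if (buf[0] == buf[BUF_SIZE - 1])
                ok++;
            if (write(empty[1], &idx, 1) != 1)   /* hand the buffer back */
                break;
        }
        _exit(ok & 0xff);
    }

    /* Filler: stands in for the process receiving data from the mainframe. */
    close(full[0]);
    close(empty[1]);
    for (i = 0; i < nblocks; i++) {
        if (i < NBUF)
            idx = (char)i;                       /* both buffers start empty */
        else if (read(empty[0], &idx, 1) != 1)   /* wait for a free buffer */
            break;
        memset(base + idx * BUF_SIZE, 'A' + (i % 26), BUF_SIZE);
        if (write(full[1], &idx, 1) != 1)        /* pass it to the writer */
            break;
    }
    close(full[1]);                              /* EOF ends the writer loop */
    waitpid(pid, &status, 0);
    close(empty[0]);
    shmdt(base);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_double_buffer(8) pushes eight blocks through the pair of buffers.
While the writer drains one buffer the filler is free to load the other, which
is the overlap the design relies on.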

I've been talking to people at Motorola, but I thought I would try to get the
benefit of some of the expertise in this newsgroup.  Suggestions are solicited.
Please reply via email (preferably to 'Frank-Mayhar%ladc@BCO-MULTICS.ARPA',
my work address), since I don't have ready access to this newsgroup from work.
Thanks in advance!
-- 
Frank Mayhar            UUCP: ..!{ihnp4,dj3b1}!killer!fmayhar
                        ARPA: Frank-Mayhar%ladc@BCO-MULTICS.ARPA
                        USmail: 2116 Nelson Ave. Apt A, Redondo Beach, CA  90278
                        Phone: (213) 371-3979 (home)  (213) 216-6241 (work)

ron@topaz.rutgers.edu (Ron Natalie) (05/28/88)

Can't say I can fix your problem, but I can suggest what we did.
Get a better SCSI controller.  The Ciprico controllers are interesting
in that the board is capable of doing I/O queuing.  There ends up being
no active queue in the driver, you just feed the I/Os to it and wait
for them to come back.  Pretty nice.

Ron

fmayhar@killer.UUCP (Frank Mayhar) (05/29/88)

In article <May.28.09.29.06.1988.25821@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
>Can't say I can fix your problem, but I can suggest what we did.
>Get a better SCSI controller.  [...]

Well, we figured out what was wrong, and it was a bug in Motorola's version 2.2
of SVR2.  When I looked at the mmu_entry for the affected process with crash,
I noticed that the p_addr for the shared memory segment was zero.  So yesterday
we traced the shmget() code to figure out where to get the right p_addr (from
the page table for that shared memory area).  We're kludging around the problem
by using an ioctl() into our SCSI driver to fill in the proper physical address
in the mmu_entry (we do the ioctl() _after_ the shmat()), by tracing down the
page table tree.  Apparently, this problem is fixed in version 3.0.  Since it
only affects doing I/O directly from shared memory, everything else worked
properly.  Tuesday we get to test our kludge.  Please wish us luck!
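
The symptom is consistent with a segment-style translation in which the
physical address is the segment's p_addr plus the offset into the segment.
The sketch below models that arithmetic.  It is an illustration only: the
struct seg_map and both functions are hypothetical (only the field name p_addr
comes from the post), and it assumes the segment was attached at virtual
address 0x20000, which the post does not state.

```c
/* Hypothetical stand-in for one per-process mapping entry.  Only the name
 * p_addr appears in the post; the other fields are assumed. */
struct seg_map {
    unsigned long v_addr;   /* virtual base of the attached segment (assumed) */
    unsigned long p_addr;   /* physical base; left at zero by the bug */
    unsigned long length;   /* segment length in bytes (assumed) */
};

/* Translate a user virtual address as physical base plus offset.
 * Returns 0 and stores the result in *paddr, or -1 if vaddr lies outside
 * the segment. */
int seg_translate(const struct seg_map *m, unsigned long vaddr,
                  unsigned long *paddr)
{
    if (vaddr < m->v_addr || vaddr - m->v_addr >= m->length)
        return -1;
    *paddr = m->p_addr + (vaddr - m->v_addr);
    return 0;
}

/* Models the workaround: after the attach, store the physical base that was
 * found by walking the page table for the shared memory area. */
void seg_fix_p_addr(struct seg_map *m, unsigned long phys_base)
{
    m->p_addr = phys_base;
}
```

With v_addr = 0x20000 and p_addr = 0, translating 0x20001 yields 0x1, matching
what the driver's printf() reported.  Once seg_fix_p_addr() stores a real base
(0x400000, say, purely as an example), the same pointer resolves to 0x400001.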

Thanks for the couple of responses!