[comp.realtime] Shared memory with multiple processors running VxWorks

vanandel@stout.ucar.edu (Joe Van Andel) (09/09/89)

I am working on a real-time radar signal processing system that will
use multiple 68020 processors running VxWorks on a VME backplane.
Because of the large amount of data we are processing
(400Kbytes/second), I can't afford to use TCP/IP or RPC interprocess
communication methods.  As I read the manual, I conclude that VxWorks
doesn't provide facilities for managing shared memory or semaphores
between tasks executing on different processors.  I very much like
VxWorks, but I feel it needs more multi-processor support.

Has anyone else written code (that you would be willing to share) to provide
these facilities?  Do other real-time operating system vendors offer
better multi-processor support?
	Joe VanAndel  		Internet:vanandel@ncar.ucar.edu
	NCAR - RSG  			
	P.O Box 3000		Fax:	 303-497-2044
	Boulder, CO 80307-3000	Voice:	 303-497-2071 

drk@athena.mit.edu (David R Kohr) (09/11/89)

In article <4252@ncar.ucar.edu> vanandel@ncar.ucar.edu (Joe Van Andel) writes:
>[...]
>communication methods.  As I read the manual, I conclude that VxWorks
>doesn't provide facilities for managing shared memory or semaphores
>between tasks executing on different processors.  I very much like
>VxWorks, but I feel it needs more multi-processor support.
>[...]
>	Joe VanAndel  		Internet:vanandel@ncar.ucar.edu
>	NCAR - RSG  			
>	P.O Box 3000		Fax:	 303-497-2044
>	Boulder, CO 80307-3000	Voice:	 303-497-2071 

I'm using both pSOS (the well-known real-time single processor kernel)
and pRISM (an extension of the pSOS interprocess communications primitives
to multi-CPU systems) from Software Components Group (who
originally designed pSOS, I believe).  I was wondering if VxWorks supports
anything like the pRISM primitives.  Can you at least buy an add-on
package for VxWorks to get these primitives?

David R. Kohr   M.I.T. Lincoln Laboratory    Group 45 ("Radars 'R' Us")
	email:	KOHR@LL.LL.MIT.EDU   or   DRK@ATHENA.MIT.EDU
	phone:	(617)981-0775 (work),   (617)527-3908 (home)

projoe@crim.eecs.umich.edu (Joseph A. Dionise) (09/11/89)

In article <4252@ncar.ucar.edu> vanandel@ncar.ucar.edu (Joe Van Andel) writes:
>I am working on a real-time radar signal processing system that will
>use multiple 68020 processors running VxWorks on a VME backplane.
>As I read the manual, I conclude that VxWorks
>doesn't provide facilities for managing shared memory or semaphores
>between tasks executing on different processors.  

   The support is there for "low-level" communication between processors
using shared memory.  We recently set up a shared-memory buffer between 
a pair of 68020's (MVME-133A) and a single 68030 (MVME-147A).  I'll 
outline our setup.

   A single 68020 has a 64K segment (excuse the Intel slip) of 
shared memory located at the upper bounds of its onboard memory.
We tricked vxWorks into not using this memory by modifying the
sysMemTop routine (in sysLib.c) to return the total amount of 
onboard RAM minus the amount of shared memory.  Specifically, 
sysMemTop returns 0x0f0000 = 0x100000 - 0x010000.  This cpu sets 
up the shared memory data structures, initializes the semaphores, 
etc.  We used a very simple queue in the shared memory.  In this 
case, we guarded reads from the queue and writes to the queue, 
since both are destructive.  The other processors use the sysBusTas 
routine to perform a test-and-set across the bus on the global 
semaphores.  If they "win", then reading/writing to the queue takes 
place. 

Our boards enable the RMW (Read-Modify-Write) sequence through the 
use of jumpers (68020) and software (68030).  Hence, the sysBusTas 
routine is really just a call to the 68K tas instruction.  Note
that a cold boot will zero all of the onboard memory (including
any shared memory segments).  If this is not acceptable, then 
the assembly routine romInit must be modified.


################################################################################
#  Joseph A. Dionise                                                           #
#  Robot Systems Division    Internet : projoe@crim.eecs.umich.edu             #
#  University of Michigan    uucp : {..}!umich.uucp!crim.eecs.umich.edu!projoe #
#  1101 Beal Avenue          BIX  : jdionise                                   #
#  Ann Arbor, MI 48109       (313) 936-2830                                    #
################################################################################

ksh@vine.VINE.COM (Kent S. Harris) (09/11/89)

I have had as many as 11 processors communicating across a VME bus under
pSOS.  At the lowest level the communication model was one of shared
memory.  The application interface was a complete device driver in the
usual pSOS sense which included an exclusion exchange (special message)
and a synchronization exchange for supervisor and Interrupt Service
Routine (ISR) synchronization.  To keep life simple, I did not implement
packet fragmenting, so applications were limited in the size of the
message they could send, but this would be simple to do.  I did implement
a stream style interface as an application library so an application could
do byte stream i/o.  All in all, no big ditty.

hmp@cive.ri.cmu.edu (Henning Pangels) (09/14/89)

In article <4252@ncar.ucar.edu>, vanandel@stout.ucar.edu (Joe Van Andel) writes:
> I am working on a real-time radar signal processing system that will
> use multiple 68020 processors running VxWorks on a VME backplane.
> Because of the large amount of data we are processing
> (400Kbytes/second), I can't afford to use TCP/IP or RPC interprocess
> communication methods.  As I read the manual, I conclude that VxWorks
> doesn't provide facilities for managing shared memory or semaphores
> between tasks executing on different processors.  I very much like
> VxWorks, but I feel it needs more multi-processor support.
> 

	Rather than mucking around with the sysLib routines, I modified the
Makefile and usrConfig.c used to build VxWorks. In the Makefile, I define
USER_CFLAGS = -DRESERVE_MEM=0x100000, which is appended to the regular
CFLAGS. Then, in usrConfig.c, I change the kernel initialization to read

kernelInit (TRAP_KERNEL, usrRoot, ROOT_STACK_SIZE,
            FREE_RAM_ADRS, sysMemTop () - RESERVE_MEM,
            ISR_STACK_SIZE, INT_LOCK_LEVEL);

Of course, you only want to do this for the processor board on which
the shared memory actually resides - on all other processors in the system,
your application code will have to know where you've mapped your memory
spaces in order to correctly share the memory which you've reserved above.
To coordinate access to the shared memory region, I use the vxTas() routine
(which is all that sysBusTas() calls). We have implemented a
rudimentary "backplane pipe" using this mechanism, which uses mailbox- or
backplane interrupts (depending on the processor board used). Even the very
first un-optimized experimental version is almost 10 times faster than
going through the overhead of TCP/IP sockets - as usual, it's possible
to trade off some portability and generality in favor of performance.

To anyone from WRS who might be listening: I agree with comments made
by others that some mechanism like this should be made part of the vxWorks
package.

As an aside: Be careful about mapping several processors' memory spaces
contiguously -- some versions of the sysMemTop() routines work by probing
for live memory, so if there's no memory gap between boards, one processor
might actually claim another's memory for itself.

-- 
Henning Pangels                                   Field Robotics Center
ARPAnet/Internet: hmp@cive.ri.cmu.edu             Robotics Institute
(412) 268-6557                                    Carnegie-Mellon University

projoe@crim.eecs.umich.edu (Joseph A. Dionise) (09/15/89)

In article <6143@pt.cs.cmu.edu> hmp@cive.ri.cmu.edu (Henning Pangels) writes:
>
>	Rather than mucking around with the sysLib routines, I modified the
>Makefile and usrConfig.c used to build VxWorks. In the Makefile, I define
>USER_CFLAGS = -DRESERVE_MEM=0x100000, which is appended to the regular
>CFLAGS. Then, in usrConfig.c, I change the kernel initialization ...
>

   I agree.  This method is better than the approach that I outlined.
   
>
>As an aside: Be careful about mapping several processor's memory spaces
>contiguously -- some versions of the sysMemTop() routines work by probing
>for live memory, so if there's no memory gap between boards, one processor
>might actually claim another's memory for itself.
>

   We encountered this problem.  In fact, this is why I initially modified
the sysMemTop() routine.  I hard coded it to return the amount of onboard 
RAM, instead of probing for the first "open" byte.  

The moral to this story : become familiar with the sysLib library.  It is
the gateway to your processor.


################################################################################
#  Joseph A. Dionise                                                           #
#  Robot Systems Division    Internet : projoe@crim.eecs.umich.edu             #
#  University of Michigan    uucp : {..}!umich.uucp!crim.eecs.umich.edu!projoe #
#  1101 Beal Avenue          BIX  : jdionise                                   #
#  Ann Arbor, MI 48109       (313) 936-2830                                    #
################################################################################

topper@mcgill-vision.UUCP (Anthony Topper) (09/16/89)

>Has anyone else written code (that you would be willing to share) to provide
>these facilities?  Do other real-time operating system vendors offer
>better multi-processor support?
It seems that a number of vendors are weak in this area. VxWorks is no
exception. We had an application that required very fast interprocessor
communication and vxWorks didn't have it, so I wrote one. However, our
application required such raw speed that the package I created was
not used, but I believe it would fit many people's needs. Some features (long):

  o  interprocessor, interprocess communication on the same backplane.
  o  Very fast. It fits in between vxWorks pipes and sockets. Half the
     speed of pipes but 50-100 times faster than sockets.
  o  uses the vxWorks file level to be as seamless as possible. The same user code
     can be used for sockets between vxWorks "boxes", my shared mem for
     same backplane, and vxWorks pipes within a CPU. So you choose what is
     most appropriate.
  o  Has many modes of operation: queue, ring buffer, mailbox, plain buffer.
     Each of these can be blocking or non-blocking and the blocking mode
     can use interrupt daemons to wake processes or use "test-and-set" loops
     (the former uses mailbox or backplane interrupts to wake up processes on
      other cpus and requires interrupt processing; the latter takes up bus
      bandwidth but no interrupt overhead). All of these modes are available
     for each block of memory requested by the user, so the appropriate mode
     can be used where needed.
  o Supports memory in contiguous or non-contiguous blocks anywhere on VME or
    VSB bus. 
  o Does auto-synchronization on boot-up and simultaneous memory allocation.
  o Has semaphore mode for counting semaphores (minimizes allocation
    of memory).
  o  Uses about 2.2K of memory overhead per memory partition requested. Base
     overhead is about 32K, all in shared memory. Requires about 30K of
     code memory on the CPU board.
  o  can be configured to wipe the contents of memory on cold boot, or not.

  o performance:
                        1 byte           64 bytes       255 bytes
                        ------           --------       ---------
    VxWorks pipe:        360               390             490
       shMem tas:        700               780            1150
     shMem demon:       1700              1700            2800
sockets (TCP/IP):     200000            200000          200000
(datagrams are faster)

 All times in microseconds. This test used vxWorks timer functions between
 two Heurikon V2Fs (MC68020 @ 20 MHz, 0 wait) and Micro-Memory's MM6300
 shared memory board (200 ns access, I think).

  o for example user does something like:

CPU producer:

fd = open("/dev/sharedMem/VME/myBlock", "SIZE=1000, ELSIZE=10, QUEUE, BLOCK");
for (i = 1; i < however_many; i++)
  {
  /* do some processing, get some data ... */
  write(fd, buffer, 10);
  }

CPU consumer:

fd = open("/dev/sharedMem/VME/myBlock", "SIZE=1000, ELSIZE=10, QUEUE, BLOCK");
for (i = 1; i < however_many; i++)
  {
  read(fd, buffer, 10);
  /* do some processing with the data ... */
  }

The open requests are automatically synchronized, the rest follows naturally.

 o Current implementation only runs on vw3.2 and Heurikon V2F cpus. A
   port to vw4.x and generic vxWorks CPUs is quite straightforward to do,
   though supported CPUs would require mailbox interrupts.
   I started to do it, but I just don't have the time. The code is about 
   a year old and I haven't looked at it in quite some time. I did have a
   neat demo of five cpus doing a classic consumer/producer problem.

 o I also did a port of curses and Unix level 3 file I/O for vw3.2, which is
   now obsolete.

 o If people want it, they can send me a tape and I'll copy it for you. I'll
   be away all October, so be patient.

Are you listening Wind River? I'll do a no cash deal for the rights if you
are interested.

   Tony Topper
                                                     _________________
   McGill University, EE Dept.                       | / \  / \  / \ | 
   Montreal, Canada                                  \/   \/   \/   \/ 
                                                     \  ***     ***  / 
 smart mailers: topper@mcgill-vision.uucp            \  ***     ***  / 
           usa: {ihnp4,decvax,akgua,utzoo,etc}!utscri \  *  ***  *  / 
                !musocs!mcgill-vision!topper           \    ***    / 
            or                                           \   *   /
                think!mosart!mcgill-vision!topper         \     / 
       ARPAnet: topper@larry.mcrcim.mcgill.edu              \ /
        bitnet: mcgill-vision!topper@musocs.bitnet

   Bell Canada: (514) 398-3788
