[comp.lang.c++] Parallel Processing in C++

pcb@usl.usl.edu (Peter C. Bahrs) (10/13/88)

We are working on developing class definitions for 'parallelizing' C++
from the programmer's point of view.  These definitions include the
normal semaphores, message queues, shared memory...

Our target hardware environments are:
  Encore Multimax with 6 processors
  35 Sun 3 Workstations with NFS on ethernet
  30 IBM PS2's on Novell

Has anyone dealt with dynamic allocation of new objects (i.e. processes)
in C++?

Also, has anyone experimented with moving these processes across networks?

Any help, comments, suggestions or references will be appreciated.
Thanks in advance.
 
pcb@usl.usl.edu

grunwald@m.cs.uiuc.edu (10/14/88)

I have such a class library, which is now in its third incarnation.

The current version has the following three classes to support ``cpu
multiplexing''

	CpuMultiplexor	-- Gives you Threads, Semaphores, Barriers, Events,
			SpinLock, SpinBarrier and SpinFetchAndOp.

			Primary scheduler is user-selectable, either
			FCFS or priority scheduled.

			``dangerous'' events (i.e. addition/removal
			from event queues) handled by surrogate thread
			to avoid race conditions.

	SimulationMultiplexor -- Tosses in Facilities, OwnedFacilities and
			simulated time. Pending events are stored in a pairing heap.

	MonitorSimMux	-- Instrumented version of SimulationMultiplexor
				that measures ``degree of parallelism''.
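For anyone who hasn't met a pairing heap: here is a minimal, single-threaded sketch of one keyed on simulated time, the way a SimulationMultiplexor might hold its pending events. The names `Event`, `PairingHeap`, `push` and `pop` are my own hypothetical choices, not the library's interface, and there is no locking here at all.

```cpp
#include <utility>
#include <vector>

// A pending event: fire at simulated time `when` (hypothetical layout).
struct Event {
    double when;
    int id;
};

// Minimal min-pairing-heap ordered by Event::when.
class PairingHeap {
    struct Node {
        Event ev;
        Node* child = nullptr;    // leftmost child
        Node* sibling = nullptr;  // next sibling to the right
        explicit Node(const Event& e) : ev(e) {}
    };
    Node* root_ = nullptr;

    // Link two sibling-free trees: the root with the later time
    // becomes the leftmost child of the other root.
    static Node* meld(Node* a, Node* b) {
        if (!a) return b;
        if (!b) return a;
        if (b->ev.when < a->ev.when) std::swap(a, b);
        b->sibling = a->child;
        a->child = b;
        return a;
    }

    // Standard two-pass merge of the old root's children.
    static Node* mergePairs(Node* first) {
        std::vector<Node*> pairs;
        while (first) {                      // pass 1: pair up left to right
            Node* a = first;
            Node* b = a->sibling;
            first = b ? b->sibling : nullptr;
            a->sibling = nullptr;
            if (b) b->sibling = nullptr;
            pairs.push_back(meld(a, b));
        }
        Node* result = nullptr;              // pass 2: fold right to left
        for (auto it = pairs.rbegin(); it != pairs.rend(); ++it)
            result = meld(*it, result);
        return result;
    }

public:
    PairingHeap() = default;
    PairingHeap(const PairingHeap&) = delete;
    PairingHeap& operator=(const PairingHeap&) = delete;
    ~PairingHeap() { while (!empty()) pop(); }

    bool empty() const { return root_ == nullptr; }
    void push(const Event& e) { root_ = meld(root_, new Node(e)); }
    const Event& top() const { return root_->ev; }  // precondition: !empty()
    Event pop() {                                    // precondition: !empty()
        Node* old = root_;
        Event e = old->ev;
        root_ = mergePairs(old->child);
        delete old;
        return e;
    }
};

// Usage: events come back out in simulated-time order.
std::vector<int> demoOrder() {
    PairingHeap h;
    h.push({3.0, 1});
    h.push({1.5, 2});
    h.push({2.25, 3});
    h.push({0.5, 4});
    std::vector<int> ids;
    while (!h.empty()) ids.push_back(h.pop().id);
    return ids;  // {4, 2, 3, 1}
}
```

Insert is O(1) and delete-min is O(log n) amortized, which is why pairing heaps are attractive for event lists with lots of inserts.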

Each class is a subclass of the one listed before it. All the data/bss is shared.
Each UNIX process has a private stack segment for per-CPU storage,
which is used to hold CPU id numbers and pointers to per-task data structures.
Event lists are stored in shared memory so that CPUs can ``load level''
themselves by stealing work from one another.
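Roughly, the load leveling works like this: one run queue per CPU, each guarded by a spin lock, and a CPU whose own queue is empty goes and takes work from someone else's. This is a hypothetical modern-C++ sketch, not the library's code. `SpinLock` here is just a stand-in for the SpinLock class above, `RunQueues` is a name I made up, `std::function` is assumed as the unit of work, and priorities are ignored.

```cpp
#include <atomic>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Busy-waiting lock (stand-in for the post's SpinLock).
class SpinLock {
    std::atomic<bool> locked_{false};
public:
    void lock()   { while (locked_.exchange(true, std::memory_order_acquire)) { /* spin */ } }
    void unlock() { locked_.store(false, std::memory_order_release); }
};

using Task = std::function<void()>;

// One deque of runnable work per CPU, all living in shared memory.
class RunQueues {
    struct Queue {
        SpinLock lock;
        std::deque<Task> tasks;
    };
    std::vector<Queue> queues_;
public:
    explicit RunQueues(std::size_t ncpu) : queues_(ncpu) {}

    void push(std::size_t cpu, Task t) {
        std::lock_guard<SpinLock> guard(queues_[cpu].lock);
        queues_[cpu].tasks.push_back(std::move(t));
    }

    // Try our own queue first, then scan the other CPUs and steal.
    bool next(std::size_t cpu, Task& out) {
        const std::size_t n = queues_.size();
        for (std::size_t i = 0; i < n; ++i) {
            Queue& q = queues_[(cpu + i) % n];  // i == 0 is our own queue
            std::lock_guard<SpinLock> guard(q.lock);
            if (q.tasks.empty()) continue;
            if (i == 0) {                       // own work: newest first
                out = std::move(q.tasks.back());
                q.tasks.pop_back();
            } else {                            // stolen work: oldest first
                out = std::move(q.tasks.front());
                q.tasks.pop_front();
            }
            return true;
        }
        return false;
    }
};

// Usage: CPU 1 has nothing queued, so it steals CPU 0's oldest task.
int demoSteal() {
    RunQueues rq(2);
    int ran = 0;
    rq.push(0, [&ran] { ran = 10; });
    rq.push(0, [&ran] { ran = 20; });
    Task t;
    if (rq.next(1, t)) t();
    return ran;  // 10
}
```

Taking from the opposite end of the victim's queue keeps thieves and the owner mostly out of each other's way.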
	
Concurrency is represented by class Thread, which is subclassed. Each subclass
defines ``virtual void main()'' to be the ``start-off'' routine. This is
cleaner than the method used in task.h, because you don't need to copy
parameters around from one stack to another.
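Here's a sketch of the ``virtual void main()'' idea. The thread's "arguments" are just data members of the subclass, set up by its constructor, so nothing has to be copied onto the new stack. Everything here is an assumption: `std::thread` (modern C++, obviously not what the library does) stands in for the real context switching, and `start`, `join` and `SumThread` are hypothetical names.

```cpp
#include <thread>

// Base class: each subclass overrides main() as its start-off routine.
class Thread {
    std::thread impl_;
protected:
    virtual void main() = 0;
public:
    // Callers must join() before destruction: destroying a still-joinable
    // std::thread calls std::terminate.
    virtual ~Thread() = default;
    void start() { impl_ = std::thread([this] { main(); }); }
    void join()  { impl_.join(); }
};

// The parameters are ordinary members, filled in by the constructor.
class SumThread : public Thread {
    const int* data_;
    int n_;
    long result_ = 0;
protected:
    void main() override {
        for (int i = 0; i < n_; ++i) result_ += data_[i];
    }
public:
    SumThread(const int* data, int n) : data_(data), n_(n) {}
    long result() const { return result_; }
};

long demoSum() {
    int values[] = {1, 2, 3, 4, 5};
    SumThread t(values, 5);
    t.start();
    t.join();           // join before t goes out of scope
    return t.result();  // 15
}
```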

The thread constructor just initializes the thread; the user has to put
it on a queue somewhere, usually the list of currently runnable threads.
The thread destructor gets called when the thread is deleted or when the
thread returns from its ``main'' procedure.
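Here's one way to picture that lifecycle: construct, enqueue, run, delete on return. It's a deliberately simplified sketch that assumes a single CPU and threads that run to completion without blocking, which the real library doesn't assume. `ReadyList`, `runAll` and `Hello` are hypothetical names.

```cpp
#include <deque>

class Thread {
public:
    virtual ~Thread() = default;
    virtual void main() = 0;   // start-off routine
};

// FCFS list of runnable threads (one CPU, run to completion).
class ReadyList {
    std::deque<Thread*> ready_;
public:
    ReadyList() = default;
    ReadyList(const ReadyList&) = delete;
    ReadyList& operator=(const ReadyList&) = delete;
    ~ReadyList() { for (Thread* t : ready_) delete t; }

    void add(Thread* t) { ready_.push_back(t); }

    // When a thread's main() returns, the thread is deleted,
    // which runs its (virtual) destructor.
    void runAll() {
        while (!ready_.empty()) {
            Thread* t = ready_.front();
            ready_.pop_front();
            t->main();
            delete t;
        }
    }
};

struct Counts { int ran = 0; int destroyed = 0; };

class Hello : public Thread {
    Counts& c_;
public:
    explicit Hello(Counts& c) : c_(c) {}   // only initializes
    ~Hello() override { ++c_.destroyed; }
    void main() override { ++c_.ran; }
};

Counts demoLifecycle() {
    Counts c;
    ReadyList ready;
    ready.add(new Hello(c));   // constructing alone does not schedule it
    ready.add(new Hello(c));
    ready.runAll();
    return c;                  // ran == 2, destroyed == 2
}
```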

Right now, this runs on the Encore (UMAX) and under SunOS (although not using
their shared memory or tasking features yet). It looks like it'll get on
the Alliant after they fix a bug in their C compiler. 

On the Encore, you need to declare the amount of memory used.
I share the entire data area, because this greatly simplifies issues like
I/O; however, the declaration can be made *after* you've allocated large
arrays, so you don't need to be as accurate as you might think.

I've overridden the standard malloc/free to use a parallel-safe implementation
that another local group implemented.

``message passing'' as such is done using a Fifo. These are subclassed:

	AwesimeFifo		-- generic fifo, not parallel-safe
	LockedFifo		-- lock on each access
	LowerBoundedFifo	-- locked, can't take out more than you put in
	BoundedFifo		-- locked, can't take out more than you put in
				   and you can't put in too much

using Semaphores to provide the blocking.
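A BoundedFifo can be done with the usual bounded-buffer trick: one counting semaphore for filled slots, one for free slots, plus a lock on the queue itself. In this sketch the `Semaphore` is built on `std::mutex` and `std::condition_variable`, which is an assumption on my part, not the library's Semaphore class. The `put`/`get` names are hypothetical too.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>

// Counting semaphore: P() blocks while the count is zero.
class Semaphore {
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
public:
    explicit Semaphore(std::size_t initial) : count_(initial) {}
    void P() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }
    void V() {
        { std::lock_guard<std::mutex> lk(m_); ++count_; }
        cv_.notify_one();
    }
};

// get() blocks when empty (lower bound); put() blocks when full (upper bound).
template <typename T>
class BoundedFifo {
    std::mutex lock_;      // per-access lock, as in LockedFifo
    std::deque<T> items_;
    Semaphore filled_;     // items available to take out
    Semaphore free_;       // slots available to put in
public:
    explicit BoundedFifo(std::size_t capacity) : filled_(0), free_(capacity) {}
    void put(const T& x) {
        free_.P();
        { std::lock_guard<std::mutex> lk(lock_); items_.push_back(x); }
        filled_.V();
    }
    T get() {
        filled_.P();
        std::unique_lock<std::mutex> lk(lock_);
        T x = items_.front();
        items_.pop_front();
        lk.unlock();
        free_.V();
        return x;
    }
};

// Usage: a producer thread pushes 1..100 through a 2-slot fifo.
long demoFifo() {
    BoundedFifo<int> fifo(2);
    std::thread producer([&fifo] {
        for (int i = 1; i <= 100; ++i) fifo.put(i);
    });
    long sum = 0;
    for (int i = 0; i < 100; ++i) sum += fifo.get();
    producer.join();
    return sum;  // 5050
}
```

Drop the `free_` semaphore and you have something like a LowerBoundedFifo.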

My experience with this has been that the right class hierarchy is a pain
to deduce. There's a lot to consider about where & how to place
responsibility. Also, you've got to completely decentralize your data
structures. I went from having awful concurrency (2.5 out of 6 processors
on a task-switching intensive simulation) to pretty good (5.7, and it's
been improved since then) by spreading the `current threads' lists over
multiple processors. I still haven't cleaned up the decentralized structures
to enforce priority-ordered tasking.
 
Also, it's not clear to me that I should be willing to assume that you 
can share an entire data segment. However, I (as a programmer) didn't
want to have to fool around with ``share_malloc'' vs. ``malloc''. There
were too many data structures I wanted to have as global references,
and requiring everything to know about a shared segment seemed like a
lot of work. However, this may be needed for certain O/S's. I haven't
run into it yet.

Also, it has pointed out the desire (nay, *need*) for delegation-by-pointer
in C++. Suppose I have class CpuMultiplexor. I want to subclass this with
FifoCpuMux and PriorityCpuMux. I'd like either of these to be used by
a SimulationMultiplexor. Right now, I can't do that, unless I pass in a
pointer to the CpuMux I want to use, and then I have to provide ``call
redirection'' for every service in CpuMux. I hope delegation-by-pointer
gets in G++ or C++ soon; it's a very useful feature.
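Here's roughly what the pass-a-pointer workaround looks like. The class names come from the post, but the services (`add`, `next`) are placeholders I made up. The real CpuMultiplexor has many more, and each one would need its own redirection line like these.

```cpp
#include <deque>
#include <queue>

// Hypothetical services; the real CpuMultiplexor offers many more.
class CpuMultiplexor {
public:
    virtual ~CpuMultiplexor() = default;
    virtual void add(int threadId) = 0;   // make a thread runnable
    virtual int  next() = 0;              // pick the next thread to run
};

class FifoCpuMux : public CpuMultiplexor {
    std::deque<int> q_;
public:
    void add(int id) override { q_.push_back(id); }
    int next() override { int id = q_.front(); q_.pop_front(); return id; }
};

class PriorityCpuMux : public CpuMultiplexor {
    std::priority_queue<int> q_;          // assumption: larger id = higher priority
public:
    void add(int id) override { q_.push(id); }
    int next() override { int id = q_.top(); q_.pop(); return id; }
};

// Without delegation-by-pointer, SimulationMultiplexor has to hold a
// pointer to the chosen mux and forward every service by hand.
class SimulationMultiplexor : public CpuMultiplexor {
    CpuMultiplexor* cpu_;
public:
    explicit SimulationMultiplexor(CpuMultiplexor* cpu) : cpu_(cpu) {}
    void add(int id) override { cpu_->add(id); }    // call redirection
    int next() override { return cpu_->next(); }    // call redirection
    // ... simulation-specific services (simulated time, etc.) go here
};

int demoDelegation() {
    FifoCpuMux fifo;
    PriorityCpuMux prio;
    SimulationMultiplexor a(&fifo), b(&prio);
    a.add(1); a.add(5); a.add(3);
    b.add(1); b.add(5); b.add(3);
    return a.next() * 10 + b.next();   // FIFO yields 1, priority yields 5: 15
}
```

With today's compilers, another option is a template parameter (`template <class Mux> class SimulationMultiplexor : public Mux`). That fixes the choice at compile time rather than run time, but it avoids writing the forwarding by hand.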


Dirk Grunwald
Univ. of Illinois
grunwald@m.cs.uiuc.edu