[comp.sys.sgi] Parallel Programming Problem: taskcreate

cindy@cesdis2.gsfc.nasa.gov (Cindy Starr) (06/10/91)

  Howdy Folks,
 
     I am trying to implement a parallel, recursive quicksort
  routine on our 240VGX running version 3.3.1.  (Yes, I realize this
  is dangerous to do . . .)  I am using "taskcreate" to spawn
  the processes, as follows:

  if ((taskM = taskcreate(strID, parQuicksort, qsort1 , 0) ) < 0)
         {
          perror("The taskcreate(1) call failed in parQuicksort.");
          exit (-1);
         }

   After four new processes are created, I receive the following
   errors:
----------------------

The taskcreate(1) call failed in parQuicksort.: New share group member pid 9278
could not join I/O arena. error:No space left on device

The taskcreate(1) call failed in parQuicksort.: No space left on device
----------------------

   One would jump to the conclusion that disk space was limited.
   However, in this case there is plenty. Is an arena created other
   than the one I have created with "usinit"?  I have increased the
   size of the arena I created, to no avail.  I have also looked up
   "setrlimit" to see if any parameters there need to be expanded.
   I haven't found anything that looks viable.
  
   I found the error message in the man page for "sproc", but still
   have no idea how to correct the problem.  The man page states:
	
     New share group member pid # could not join I/O arena. error:<..>
                    if the new share group member could not properly join the
                    semaphored libc arena.  The new process exits with a -1.

    Would anyone know what this problem is and how I can get around
    or correct it?
 
    Many thanks!
 
    Cindy Starr
    cindy@cesdis2.gsfc.nasa.gov

micah@flobb4.csd.sgi.com (Micah Altman) (06/10/91)

In <5601@dftsrv.gsfc.nasa.gov> cindy@cesdis2.gsfc.nasa.gov (Cindy Starr) writes:


>  Howdy Folks,
> 
>     I am trying to implement a parallel, recursive quicksort
>  routine on our 240VGX running version 3.3.1.  (Yes, I realize this
>  is dangerous to do . . .)  I am using "taskcreate" to spawn
>  the processes, as follows:

>  if ((taskM = taskcreate(strID, parQuicksort, qsort1 , 0) ) < 0)
>         {
>          perror("The taskcreate(1) call failed in parQuicksort.");
>          exit (-1);
>         }

>   After four new processes are created, I receive the following
>   errors:
>----------------------

>The taskcreate(1) call failed in parQuicksort.: New share group member pid 9278
>could not join I/O arena. error:No space left on device

>The taskcreate(1) call failed in parQuicksort.: No space left on device

I think that what is happening is that you are creating more than 7
additional processes, and the extra processes can't join the arena ( which
is by default set up for 8 users, max ). At least, I can reproduce an error 
by doing something to that effect.

To get rid of this error, before you create your first arena or your
first task, call

	usconfig(CONF_INITUSERS, somebignumber)

where somebignumber is the maximum number of processes you expect to run
in parallel at any time during the program (i.e. the number of processes
in the "share group").
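
A minimal sketch of that ordering, assuming the IRIX header <ulocks.h>.
MAX_PROCS, setup_parallel() and the arena path argument are made-up names
for illustration, and the #else branch is only a stand-in so the fragment
compiles on a non-SGI machine; it is not the real library.

```c
#include <stdio.h>

#ifdef __sgi
#include <ulocks.h>     /* usconfig(), usinit(), CONF_INITUSERS */
#else
/* Stand-ins so the sketch compiles off IRIX. NOT the real library. */
#define CONF_INITUSERS 1
typedef struct { int max_users; } usptr_t;
static int stub_users = 8;              /* default limit quoted in the thread */
static int usconfig(int cmd, int value)
{
    int prev = stub_users;
    if (cmd != CONF_INITUSERS)
        return -1;
    stub_users = value;                 /* remember the new limit */
    return prev;
}
static usptr_t *usinit(const char *path)
{
    static usptr_t arena;
    (void)path;
    arena.max_users = stub_users;       /* arena picks up the current limit */
    return &arena;
}
#endif

/* Hypothetical upper bound on simultaneous share group members. */
#define MAX_PROCS 64

/* Raise the user limit FIRST, then create the arena.
 * Tasks (taskcreate/sproc) are only started after this returns. */
usptr_t *setup_parallel(const char *arena_path)
{
    usptr_t *arena;

    if (usconfig(CONF_INITUSERS, MAX_PROCS) < 0) {
        perror("usconfig(CONF_INITUSERS)");
        return NULL;
    }
    arena = usinit(arena_path);
    if (arena == NULL)
        perror("usinit");
    return arena;
}
```

The point is only the order: the usconfig() call comes before usinit() and
before the first taskcreate(), so every later share group member sees the
larger limit.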
 


--
	"Entia non sunt multiplicanda sine necessitate." - William of Ockham
	Micah Altman, "Computational Juggler"	   	   micah@csd.sgi.com
	Phone (415) 335-1866				   FAX (415) 965-2309
	Disclaimer: 	Everything in this document is a lie.	

bron@bronze.wpd.sgi.com (Bron Campbell Nelson) (06/11/91)

In article <5601@dftsrv.gsfc.nasa.gov>, cindy@cesdis2.gsfc.nasa.gov (Cindy Starr) writes:
> 
>    After four new processes are created, I receive the following
>    errors:
> 
> The taskcreate(1) call failed in parQuicksort.: New share group member pid 9278
> could not join I/O arena. error:No space left on device
> 
> The taskcreate(1) call failed in parQuicksort.: No space left on device
> 
> ...
>    I found the error message in the man page for "sproc", but still
>    have no idea how to correct the problem.  The man page states:
> 	
>      New share group member pid # could not join I/O arena. error:<..>
>                     if the new share group member could not properly join the
>                     semaphored libc arena.  The new process exits with a -1.
> 

Well, assuming I understand this (always dangerous :-)), it looks like
you got really really close to the answer here.  The particular error
you are getting is "No space left on device", i.e. an ENOSPC error.
The sproc man page describes this: 

     [ENOSPC]       If the size of the share group exceeds the number of users
                    specified via usconfig(3P) (8 by default).  Any changes
                    via usconfig(3P) must be done BEFORE the first sproc is
                    performed.

So a "usconfig(CONF_INITUSERS, big_number)" at the beginning of your main
routine might (should) cure the problem.
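
Since perror() just prints the generic ENOSPC text, it may help to test
errno yourself so the message points at the share group limit rather than
at the disk.  A rough sketch, assuming taskcreate() leaves errno set the
way the sproc man page describes; share_group_full() and
report_spawn_failure() are hypothetical helper names, not library calls.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if the errno value is ENOSPC, which the sproc
 * man page ties to the share group exceeding the usconfig user limit. */
int share_group_full(int err)
{
    return err == ENOSPC;
}

/* Hypothetical helper: print a hint instead of the bare
 * "No space left on device" text that perror() would give. */
void report_spawn_failure(const char *what, int err)
{
    if (share_group_full(err))
        fprintf(stderr,
                "%s: share group user limit reached (ENOSPC); "
                "raise CONF_INITUSERS with usconfig() before the first sproc\n",
                what);
    else
        fprintf(stderr, "%s: %s\n", what, strerror(err));
}
```

One would call report_spawn_failure("taskcreate", errno) in place of the
perror() call when taskcreate() returns a negative value.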


---------------------------------------------------------------
Bron Campbell Nelson       | "The usual approach is to pick one
Silicon Graphics, Inc.     | of several revolting kludges."
2011 N. Shoreline Blvd.    |              Henry Spencer
Mtn. View, CA  94039       |___________________________________
bron@sgi.com
These statements are my own, not those of Silicon Graphics.