[comp.parallel] Problems with Intel iPSC/2?

fenrich@cs.buffalo.edu (Rick Fenrich) (04/13/89)

I have been writing some programs on the Intel iPSC/2 with C using
dynamic memory allocation. My problem is that when I run the program
it seems to run fine and then it hangs, giving no output nor releasing
the cube. A friend of mine has had the same problem with dynamic
memory allocation. We know it is not a synchronization problem and it
is not a looping problem. Has anyone out there come across similar
problems and if so what was your approach to a solution? By the way,
my friend said his program works once he removes "free(pointer)" to
free up allocated storage.

Another problem that we have been having is that when we run a job in
the foreground and we need to cancel it we use Ctrl-C. Sometimes when
Ctrl-C is hit more than once the nodes become inoperable. We get messages
such as "cubetype not found", etc. One solution is to background the
job and then kill it, but if we use debug statements we will not be
able to retrieve the last lines of output since output is buffered.
Any suggestions on how to avoid the Ctrl-C problem?

I appreciate any help you can give!!

						Rick Fenrich

						fenrich@cs.buffalo.edu
						fenrich@sunybcs.bitnet

berryman-harry@YALE.EDU (Harry Berryman) (04/14/89)

In article <5095@hubcap.clemson.edu> fenrich@cs.buffalo.edu (Rick Fenrich) writes:
>I have been writing some programs on the Intel iPSC/2 with C using
>dynamic memory allocation. My problem is that when I run the program
>it seems to run fine and then it hangs, giving no output nor releasing
>the cube. A friend of mine has had the same problem with dynamic
>memory allocation. We know it is not a synchronization problem and it
>is not a looping problem. Has anyone out there come across similar
>problems and if so what was your approach to a solution? By the way,
>my friend said his program works once he removes "free(pointer)" to
>free up allocated storage.

One thing to do is to isolate the problem to the smallest piece of code
that reproduces it and pass it on to Intel. They've been pretty helpful
to me. Also, I use dynamic data structures on the iPSC/2 a lot, calling
malloc() and free() on a regular basis, and have had no trouble. Is it
possible that you're trying to free a pointer to nowhere? You may be
barking up the wrong tree by looking at memory management, as the iPSC/2
is known to hang when messages are not read in as fast as they are
generated.
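
To illustrate the "pointer to nowhere" case, here is a minimal sketch in
C. It is not taken from Rick's program, and the names SAFE_FREE and
demo_safe_free are hypothetical. The idea is to clear a pointer as soon
as its block is freed, so that an accidental second free does nothing
instead of corrupting the heap. It does not catch a free of a pointer
that never came from malloc().

```c
#include <stdlib.h>

/* Hypothetical helper: free a block and clear the pointer so that a
   repeated call on the same variable does nothing. The NULL check is
   kept for older C libraries where free(NULL) may not be safe. */
#define SAFE_FREE(p) \
    do { if ((p) != NULL) { free(p); (p) = NULL; } } while (0)

/* Returns 0 if the pointer is NULL after two SAFE_FREE calls,
   1 if malloc failed or the pointer was not cleared. */
int demo_safe_free(void)
{
    int *data = malloc(16 * sizeof *data);

    if (data == NULL)
        return 1;
    data[0] = 42;

    SAFE_FREE(data);   /* releases the block and sets data to NULL */
    SAFE_FREE(data);   /* no-op: data is already NULL */

    return data == NULL ? 0 : 1;
}
```

If the hang disappears after replacing every free() with a guard like
this, a double free is a likely suspect.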

>Another problem that we have been having is that when we run a job in
>the foreground and we need to cancel it we use Ctrl-C. Sometimes when
>Ctrl-C is hit more than once the nodes become inoperable. We get messages
>such as "cubetype not found", etc.

The daemon that mediates the host-node communication might be left in a
strange state because a message or series of messages to it was
interrupted. This daemon uses Unix System V IPC datagrams for
communication with host programs. "cubetype not found" does not mean
that "the nodes become inoperable." If NX had crashed on one or more of
the nodes, "node not responding" would be a more likely message.
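
One possible way to keep a second Ctrl-C from interrupting the host
program mid-message is sketched below. This is an assumption, not
something tested on the iPSC/2, and the function names are hypothetical.
The handler records the first SIGINT, ignores any further ones, and
leaves it to the main program to notice the flag and shut down in an
orderly way.

```c
#include <signal.h>

/* Set by the handler; sig_atomic_t is safe to write from a signal handler. */
static volatile sig_atomic_t interrupt_count = 0;

/* Hypothetical handler: ignore further SIGINTs, then note this one. */
static void on_interrupt(int sig)
{
    signal(sig, SIG_IGN);   /* later Ctrl-C presses are discarded */
    interrupt_count++;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_interrupt_handler(void)
{
    return signal(SIGINT, on_interrupt) == SIG_ERR ? -1 : 0;
}

/* Number of interrupts recorded so far (at most 1 with this handler). */
int interrupts_seen(void)
{
    return (int)interrupt_count;
}
```

The host program would call install_interrupt_handler() once at startup,
check interrupts_seen() between messages, and release the cube before
exiting when it becomes nonzero.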

>One solution is to background the
>job and then kill it, but if we use debug statements we will not be
>able to retrieve the last lines of output since output is buffered.

Uh, a newline will flush the output buffer. So I must be missing something.
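
One caveat, stated as an assumption about standard C stdio rather than
about the iPSC/2 libraries: a newline flushes stdout only when the
stream is line buffered, which is normally the case only when it is
attached to a terminal. When a backgrounded job writes to a file or a
pipe, stdout is usually fully buffered, and the last lines can be lost
if the job is killed. A minimal sketch of a debug print that flushes
explicitly follows; the name debug_printf is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical debug helper: format to stdout, then flush at once so
   the text survives even if the job is killed right afterwards.
   Returns the number of characters written, or -1 on error. */
int debug_printf(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vprintf(fmt, ap);
    va_end(ap);

    if (n < 0 || fflush(stdout) != 0)
        return -1;
    return n;
}
```

An alternative is to call setvbuf(stdout, NULL, _IOLBF, 0) once, before
any output, to request line buffering for the whole run.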

I know this has been of little help, but not knowing what the code looks like,
it's hard to give good advice. 

                       Scott Berryman 
                       Dept. of Computer Science

                       Yale University 
                       51 Prospect Street 
                       New Haven, CT 
                       berryman@cs.yale.edu
                       203-432-1260