[comp.sys.transputer] Helios

grabas@m.cs.uiuc.edu (12/07/88)

  For some insight on Transputers, read note 1 "What makes Transputers 
interesting" and its answers. It will give you some ideas. For more 
information, read the Inmos documentation and/or articles on Transputers. 

  For Helios, I am sorry to admit that I don't know anything about it.     

             That'll be all for today... 

                  Dominique Grabas, University of Illinois at Urbana-Champaign 
		grabas@m.cs.uiuc.edu 
 

J.Wexler@edinburgh.ac.uk (11/10/89)

Are there any Helios buffs out there?  I'm almost embarrassed to ask this
question, but I've spent some time trying to answer it from the Prentice-Hall
book - without success - so:

A port is presumably used by at least two parties, one sending a message and the
other receiving the message.  One of them will have to create the port using
NewPort.  How does the other party find out the identification of the port? - in
particular, if the two parties are on different processors?

    John Wexler

homeis@cs3.ifistg.uucp (05/02/90)

On 27 Apr 90  Tommy Leemann, Zurich, Switzerland, said:

>>* If I connect my transputers to a 2-D-grid ...
>.... I guess your resource map doesn't correspond
>to your actual hardware configuration.

I am quite sure that my resource map is OK. Maybe the reason is my small
memory (250 KB per node).

>>* Helios crashes a lot of times ...
>We had similar crash problems. The PC-server probably doesn't have enough
>memory and doesn't care to tell you about it. Try to give as much DOS memory
>as possible to the PC-server. With version 1.1A we don't have that problem.
>Here the server doesn't start with too little memory!

My server has 640 KB PC memory. And server crashes are not the only crashes.

>>* It is impossible to redirect the stderr, so we could never save ...
>We haven't had such problems at all. In our makefile we always redirect the
>error output into files.

Could you please post an example?

>>* After a while one or more transputers die without a recognizable ...
>No idea. Couldn't this be the result of a wild program damaging Helios
>resources on that transputer?

This happens even if *no* user programs are running.

Thank you for your reply.

--------------
Dieter Homeister, Universitaet Stuttgart,
Institut fuer parallele und verteilte Hoechstleistungsrechner (IPVR)
7000 Stuttgart 1, Azenbergstr. 12, Tel 0711-121-1342, W-Germany
e-mail homeister@informatik.uni-stuttgart.dbp.de

alan@perisl.UUCP (Alan Cosslett) (05/03/90)

On 25 Apr 90 Dieter Homeister, University of Stuttgart, said:

>1. Reset Problems
 =================

The original driver im_ra_b4.d was designed when we did not have any boards
with more than one processor, and it does not work with more than two
processors.

A new driver, tram_ra.d (shipped with Helios 1.1a), which is designed for use
with the INMOS reset scheme on more than one processor, solves the problems
encountered with the previous driver.

>* Helios 1.1 cannot boot more than 19 processors

As Tommy Leemann has correctly pointed out, the only reason that Helios
does not boot more than 19 transputers is a limit built in to the
standard version of Helios to enforce a licensing restriction.

Helios is currently running on a 128 processor Parsytec Supercluster.

Please contact DSL for more details if you need to run on more than
20 processors.

>>* If I connect my transputer to a 2-D grid
>... small memory (250 KB per node)

We recommend that Helios runs on transputers with not less than 1 megabyte
of memory. If you want to use transputers with less memory, we recommend
that you mark them as NATIVE in the resource map and run code on them
using our stand-alone compiler.

If anyone has any difficulties booting a particular hardware configuration,
could they please contact us directly.

>* Helios crashes a lot ...

We have tested Helios 1.1a on a 128 processor box for several days
without crashing. One copy of Helios even survived the San Francisco
earthquake! We also have a report of Helios running for 13 days, at
which point the Sun being used as host crashed, though Helios kept going.

>* The editor is totally undocumented ...

Current versions of Helios are sent out with a booklet about emacs.
Please contact DSL if you need a copy.

>* The memory management is full of errors ...

As the transputer does not implement memory protection, there is no way
that we can protect the system from `unfriendly / buggy' programs that
corrupt memory. If any program corrupts free memory, then malloc is
very likely to crash. The lack of virtual memory also means that it is
impossible to stop programs fragmenting memory (although giving programs
a suitable heap size with objed can reduce the problem). All the
problems sent to technical support so far purporting to illustrate a
problem with malloc / free have been tracked down to memory corruption
in the example program!

You can use the command map to monitor memory fragmentation.
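
The fragmentation mentioned above can be illustrated with a toy model. This
is only a sketch in Python, not the Helios allocator and not how the map
command works: the 16-cell heap, the first-fit policy, and the helper names
(free_blocks, allocate, release) are all assumptions made for illustration.
It shows that without virtual memory or compaction, plenty of memory can be
free in total while no single free block is large enough for a request.

```python
def free_blocks(heap):
    """Return (start, length) for each run of free cells in the heap."""
    runs, start = [], None
    for i, owner in enumerate(heap + [object()]):  # sentinel ends the last run
        if owner is None and start is None:
            start = i
        elif owner is not None and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


def allocate(heap, owner, size):
    """First-fit: claim the first free run that is large enough."""
    for start, length in free_blocks(heap):
        if length >= size:
            heap[start:start + size] = [owner] * size
            return start
    return None  # no single free run is big enough


def release(heap, owner):
    """Free every cell held by owner; blocks are never moved or compacted."""
    for i, cell in enumerate(heap):
        if cell == owner:
            heap[i] = None


heap = [None] * 16
for name in "abcdefgh":
    allocate(heap, name, 2)   # eight 2-cell blocks fill the heap
for name in "aceg":
    release(heap, name)       # free every other block

total_free = sum(length for _, length in free_blocks(heap))
largest = max(length for _, length in free_blocks(heap))
big = allocate(heap, "big", 4)
print(total_free, largest, big)
```

Half of the toy heap is free, yet the 4-cell request fails because the
largest free run is only 2 cells.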


>* The documentation of the resource maps ...

Yes, the original documentation could have been better. Since it was
released we have produced a number of technical notes and guides, such
as `The CDL Guide', to try to improve things. We are currently putting
a lot of effort into getting the documentation right for Helios
version 1.2 (provisionally available NOV/DEC 90).

>* It is impossible to redirect stderr ...

Yes, there is a problem here. Due to the distributed nature of Helios,
all messages must be retryable. This means that reads and writes
contain not only the data and the amount of data to write but also the
position at which to write it. In turn, when you use >& from the shell
to redirect stdout and stderr to a file, both streams send writes to the
same file, each including its own position, so the data is written on
top of itself. In Helios 1.2 we are thinking of solving this problem by
not allowing the redirection of stderr and stdout together, but allowing
separate redirection to different files. If you really want to use >& at
present, try redirecting the output to the /ram server,
e.g. make >& /00/ram/fred, as this works at present.
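
The overwrite described above can be sketched in a few lines of Python. This
is not Helios code: PositionalFile and Stream are hypothetical stand-ins,
and the sketch assumes that each stream tracks its own position and that
every write request carries that position, as the explanation states.

```python
class PositionalFile:
    """Hypothetical file server: every write request names its own offset."""

    def __init__(self):
        self.data = bytearray()

    def write_at(self, pos, payload):
        end = pos + len(payload)
        if end > len(self.data):
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[pos:end] = payload
        return len(payload)


class Stream:
    """Hypothetical client stream that keeps its own write position."""

    def __init__(self, server):
        self.server = server
        self.pos = 0

    def write(self, payload):
        # Resending this exact request after a lost reply is harmless,
        # because it lands on the same bytes again (that is the retryable part).
        self.pos += self.server.write_at(self.pos, payload)


shared = PositionalFile()
out, err = Stream(shared), Stream(shared)   # like ">&" to a single file
out.write(b"compiling foo.c\n")
err.write(b"warning\n")                     # err still thinks it is at offset 0
print(bytes(shared.data))
```

The second stream's data lands on top of the first stream's. Giving each
stream its own PositionalFile avoids the clash, which corresponds to
redirecting stdout and stderr separately to different files.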

>* The stack check of the c compiler doesn't work

We have had no problems with the stack checking. Could you
send us more details?

>* the error return codes are ugly ...

Use the fault command to get a more user-friendly message. If you get
the message `exec format error', try typing fault on its own with no
arguments to get further information.

>* the helios message passing mechanism should be documented with more
>examples ...

The average user program should not need to use message passing
directly but should use the CDL and pipes for the communication
between programs. The use of pipes is NOT inefficient under Helios;
see technical note 22 for more details. The only time you need to
know about the message passing in detail is if you are writing
new Helios servers.

>* sometimes user programs started by the shell script don't start ...

Without more details I cannot comment on the above. Version 1.1a
has fixed a few bugs to do with the CDL which meant that sometimes
programs did not run.

>* After a while one or more transputers die

We have not had any problems of this nature due to software
in Version 1.1a. We have had similar problems that have
turned out to be due to faulty hardware.

>* Helios is non-deterministic. So it is hard to reproduce errors.

You're telling us! Try debugging an operating system without memory
protection.

>* Helios boots very slowly

We are currently working on this area and anyone trying to boot very 
large transputer networks should contact us.

I hope the above has answered some of your questions.

----------------

Alan Cosslett
Technical Support
Perihelion Software Limited
The Maltings, Charlton Road, Shepton Mallet
Somerset, BA4 5QE, ENGLAND
Tel [+44] (0)749 344203
Fax [+44] (0)749 344977

email alan@perisl.uucp

-----------------

Distributed Software Limited (DSL)
670 Aztec West, Bristol BS12 4SD, ENGLAND
Tel [+44] (0)454 612777
Fax [+44] (0)454 618188