[comp.sys.transputer] 16-bit Integers

koontz%capvax.decnet@CAPSRV.JHUAPL.EDU ("CAPVAX::KOONTZ") (08/16/90)

Markus Bohn writes:

>Has anyone experience or knowledge of transputer C-compilers with
>16 Bit Integer Dataformat ???

I can tell you that Logical Systems also does not support a true 16-bit integer
(short is implemented as 32 bits; TCX Manual, p. 10).

Would it hurt to use a 32-bit representation of a 16-bit integer (other than the
2x requirement for memory)?  We've found that the transputer can operate quite 
well with 8-bit, 32-bit, and (sometimes) 64-bit values.  But 16-bit information 
is slow: the T4 or T8 just can't access 16-bit information from memory quickly 
or operate on it well.  You either have to use 32-bit or 2 8-bit values.

I have a small campfire story for you. In one of our programs, we needed to 
access data in 24-bit, 8-bit, or 32-bit widths.  Yes, 24-bit integers (this was 
how the hardware recorded the data, and we had to fetch and access it quickly 
without too many byte swaps or mask operations).  We used a simple union:

typedef union
{
  uint32 w;
  byte b[4];
}
byte_access_union;

where:
#define uint32 unsigned int
#define byte   unsigned char
(I like to use occam-like keywords for primitive data storage...)

Given a variable byte_access_union data, we could then access what we needed
quickly:

     the whole word:             data.w
     the lower 24-bits of data:  data.w BITAND #0xFFFFFF
     the upper byte:             data.b[3]

You could also use it to access 16-bit values:
     the lower 16-bits of data:  data.w BITAND #0xFFFF
     the upper 16-bits of data:  (data.w BITAND #0xFFFF0000) >> 16
                      OR
                                 data.b[2] BITOR (data.b[3] << 8)

The 16-bit accesses show the problem: using the higher 16-bits of data for
storage requires logical operations and shifts.  The transputer is slow on 
shifting (overhead time + 1 shift per clock cycle).  Also, getting that large 
32-bit mask into the machine will require 7 prefix ops.  The individual byte 
method only requires 8 shifts and an OR, but it will require 2 memory fetches.
Note this is all from a programmer's perspective: compiler writers may use 
similar tricks to create 16-bit integers, or they may just use the lower 16 bits 
of a 32-bit word, checking the sign bit where required.
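
For anyone who wants to try these accesses in standard C, here is a minimal
sketch. It is an illustration, not the code from the program described above:
uint32_t and uint8_t from <stdint.h> stand in for the uint32 and byte macros,
the function names are made up, and the byte-wise variant assumes a
little-endian machine such as the transputer.

```c
#include <stdint.h>

/* Same layout as the union above; uint32_t stands in for "uint32". */
typedef union {
    uint32_t w;     /* the whole 32-bit word */
    uint8_t  b[4];  /* the same storage viewed as four bytes */
} byte_access_union;

/* Lower 24 bits: one mask, no shift. */
static uint32_t low24(byte_access_union d) { return d.w & 0x00FFFFFFu; }

/* Lower 16 bits: one mask, no shift. */
static uint32_t low16(byte_access_union d) { return d.w & 0x0000FFFFu; }

/* Upper 16 bits via mask and shift (the first variant above). */
static uint32_t high16_shift(byte_access_union d)
{
    return (d.w & 0xFFFF0000u) >> 16;
}

/* Upper 16 bits from two byte loads (the second variant above).
   Assumption: little-endian byte order, so b[3] is the top byte. */
static uint32_t high16_bytes(byte_access_union d)
{
    return (uint32_t)d.b[2] | ((uint32_t)d.b[3] << 8);
}
```

With data.w set to 0x12345678, both high16 functions should give 0x1234 on a
little-endian machine; on a big-endian one only the shift version does.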

We hit upon an interesting snag during a port to a Sun that might interest you: 
the old big vs. little endian problem.  The access to the 24-bit value was OK, 
but the Sun's 68030 needed to get the 8-bit value out of data.b[0]!  We had to 
define keywords for the values and use a conditional compilation switch, e.g.

#define T800   1      /* distinct values: undefined names all compare as 0 in #if */
#define SUN    2
#define TARGET T800   /* TARGET = T800 | SUN */

#if (TARGET == T800)
#define data24  data.w BITAND #0x00FFFFFF
#define data8   data.b[3]
#endif

#if (TARGET == SUN)
#define data24  data.w BITAND #0x00FFFFFF
#define data8   data.b[0]
#endif

In general, we learned that ports of software which play with individual byte 
orderings are troublesome.  You might find similar problems with your port if it 
does byte and bit manipulation, particularly if the program was developed on a 
680x0.
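
One way to avoid the switch altogether, offered here as an assumption and not
as what the port actually did, is to take the top byte by shifting the word
instead of indexing the byte array. A shift acts on the value, not on the
memory layout, so the same expression works on either byte order. The names
data24_of, data8_of and is_little_endian are hypothetical.

```c
#include <stdint.h>

/* Lower 24 bits of the word; identical on either byte order. */
static uint32_t data24_of(uint32_t w) { return w & 0x00FFFFFFu; }

/* Top 8 bits of the word, taken by value rather than by address,
   so no b[3]-versus-b[0] choice is needed. */
static uint8_t data8_of(uint32_t w) { return (uint8_t)(w >> 24); }

/* Runtime probe: returns 1 on a little-endian host, 0 on a big-endian one. */
static int is_little_endian(void)
{
    union { uint32_t w; uint8_t b[4]; } probe;
    probe.w = 1u;
    return probe.b[0] == 1u;
}
```

The trade-off is the one described earlier: a 24-place shift is not cheap on
the transputer, so the byte access with a compile-time switch may still be the
faster choice there.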

Ken Koontz
The Johns Hopkins University
  Applied Physics Laboratory
Laurel, MD USA
email: koontz@capsrv.jhuapl.edu

homeis@azu.informatik.uni-stuttgart.de (Dieter Homeister) (08/18/90)

Markus Bohn writes:

>Has anyone experience or knowledge of transputer C-compilers with
>16 Bit Integer Dataformat ???

I had the same problem. I needed 1024*768 pixels of data and I have only
2 MB of RAM, so I had to pack two 16-bit ints into one 32-bit int.
My C compiler (developed by one of my students, Andreas Kaiser)
cannot handle 16-bit ints on the T414 or T800 either. Only for the T212
can the compiler produce code for both 16- and 32-bit ints.
The reason is that the 32-bit transputers do not support handling
16-bit values, so the resulting code is very inefficient, no matter
whether a compiler or a programmer generates it. For the programmer it
might be more comfortable if the compiler did this job.
I tried two solutions: shift and mask, or moving two bytes.
To my surprise the shift solution was faster, so that is what I now use.
In my case the efficiency of 16-bit access is not important, so I
didn't try register optimisation or asm parts.
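
The shift-and-mask packing could look roughly like the sketch below. This is
not the code from the post: the function names and the choice of putting the
even-numbered pixel in the low half are assumptions made for illustration.
For scale, 1024*768 pixels take 1,572,864 bytes at 16 bits each but
3,145,728 bytes at 32 bits each, which would not fit in 2 MB.

```c
#include <stdint.h>

/* Pack two 16-bit values into one word: hi in bits 31..16, lo in bits 15..0. */
static uint32_t pack16x2(uint16_t hi, uint16_t lo)
{
    return ((uint32_t)hi << 16) | (uint32_t)lo;
}

/* Extract either half again with a mask or a shift. */
static uint16_t unpack_lo(uint32_t w) { return (uint16_t)(w & 0xFFFFu); }
static uint16_t unpack_hi(uint32_t w) { return (uint16_t)(w >> 16); }

/* Replace only the low half, leaving the high half untouched. */
static uint32_t set_lo(uint32_t w, uint16_t lo)
{
    return (w & 0xFFFF0000u) | (uint32_t)lo;
}

/* Replace only the high half, leaving the low half untouched. */
static uint32_t set_hi(uint32_t w, uint16_t hi)
{
    return (w & 0x0000FFFFu) | ((uint32_t)hi << 16);
}

/* Read pixel i from an array holding two pixels per word
   (assumed layout: even pixels in the low half, odd pixels in the high half). */
static uint16_t pixel_get(const uint32_t *words, unsigned i)
{
    uint32_t w = words[i >> 1];
    return (i & 1u) ? unpack_hi(w) : unpack_lo(w);
}
```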

--------------
Dieter Homeister, Universitaet Stuttgart,
Institut fuer parallele und verteilte Hoechstleistungsrechner (IPVR)
7000 Stuttgart 1, Azenbergstr. 12, Tel 0711-121-1342, W-Germany
e-mail homeister@informatik.uni-stuttgart.dbp.de
--------------
I'm a little bit tired, so put spelling errors > /dev/null

nathan@elberton.inmos.co.uk (Nathan Sidwell) (08/20/90)

In article <9008161656.AA09519@devvax.TN.CORNELL.EDU> koontz%capvax.decnet@CAPSRV.JHUAPL.EDU ("CAPVAX::KOONTZ") writes:
>You could also use it to access 16-bit values:
>     the lower 16-bits of data:  data.w BITAND #0xFFFF
>     the upper 16-bits of data:  (data.w BITAND #0xFFFF0000) >> 16
>                      OR
>                                 data.b[2] BITOR (data.b[3] << 8)
>
>The 16-bit accesses show the problem: using the higher 16-bits of data for
>storage requires logical operations and shifts.  The transputer is slow on 
>shifting (overhead time + 1 shift per clock cycle).  Also, getting that large 
>32-bit mask into the machine will require 7 prefix ops.  The individual byte 
>method only requires 8 shifts and an OR, but it will require 2 memory fetches.

If you're using OCCAM, (which I assume that you are from the syntax of the
examples) the upper 16 bits may be extracted by just a shift op, ie

upper.16 := data.w >> 16

since >> in occam is a bitwise shift (not a rotate) that fills with zeros,
so it does not propagate the sign bit.

In C you could code it as

upper_16 = (unsigned)data.w >> 16

I suppose you could argue that the compiler should spot this, but who knows
what they do?
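
To make the point concrete, here is a small C sketch with hypothetical
function names. It assumes 32-bit words via <stdint.h>. With an unsigned
operand the shift brings in zeros from the top, so the 0xFFFF0000 mask adds
nothing; with a signed operand the cast has to come first, because
right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Unsigned operand: zeros enter from the top, so no mask is needed. */
static uint32_t upper16_unsigned(uint32_t w) { return w >> 16; }

/* Mask-then-shift, as in the quoted post: same result, one extra operation. */
static uint32_t upper16_masked(uint32_t w) { return (w & 0xFFFF0000u) >> 16; }

/* Word held in a signed int: convert to unsigned before shifting so the
   sign bit is not propagated into the result. */
static uint32_t upper16_from_signed(int32_t w) { return (uint32_t)w >> 16; }
```

Whether a given compiler drops the redundant mask by itself is not something
this sketch can tell you; inspecting the generated code is the only way to know.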

Nathan Sidwell, INMOS Ltd, UK         JANET:    nathan@uk.co.inmos
Generic disclaimer applies            UUCP:     ukc!inmos!nathan
My indicision is final (I think)      INTERNET: nathan@inmos.com