koontz%capvax.decnet@CAPSRV.JHUAPL.EDU ("CAPVAX::KOONTZ") (08/16/90)
Markus Bohn writes:
>Has anyone experience or knowledge of transputer C-compilers with
>16 Bit Integer Dataformat ???

I can tell you that Logical Systems also does not support a true 16-bit
integer (short is implemented as 32-bits, TCX Manual, Pg 10.). Would it hurt
to use a 32-bit representation of a 16-bit integer (other than the 2x
requirement for memory)?

We've found that the transputer can operate quite well with 8-bit, 32-bit,
and (sometimes) 64-bit values. But 16-bit information is slow: the T4 or T8
just can't access 16-bit information from memory quickly or operate on it
well. You either have to use 32-bit or 2 8-bit values.

I have a small campfire story for you. In one of our programs, we needed to
access data in 24-bit, 8-bit, or 32-bit widths. Yes, 24-bit integers (this
was how the hardware recorded the data, we had to fetch and access it quickly
without too many byte swaps or mask operations). We used a simple union:

    typedef union {
        uint32 w;
        byte b[4];
    } byte_access_union;

where:

    #define uint32 unsigned int
    #define byte unsigned char

(I like to use occam-like keywords for primitive data storage...)

Given a variable byte_access_union data, we could then access what we needed
quickly:

    the whole word:             data.w
    the lower 24-bits of data:  data.w BITAND #0xFFFFFF
    the upper byte:             data.b[3]

You could also use it to access 16-bit values:

    the lower 16-bits of data:  data.w BITAND #0xFFFF
    the upper 16-bits of data:  (data.w BITAND #0xFFFF0000) >> 16
                                    OR
                                data.b[2] BITOR (data.b[3] << 8)

The 16-bit accesses show the problem: using the higher 16-bits of data for
storage requires logical operations and shifts. The transputer is slow on
shifting (overhead time + 1 shift per clock cycle). Also, getting that large
32-bit mask into the machine will require 7 prefix ops. The individual byte
method only requires 8 shifts and an OR, but it will require 2 memory fetches.
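The accesses listed above can be sketched in present-day C roughly as follows. This is only an illustration: it assumes a compiler that provides <stdint.h>, it uses uint32_t and uint8_t in place of the post's uint32 and byte macros, and the function names (low24, low16, high16_mask_shift, high16_bytes_le, check_layout) are made up for the example. The byte-indexed variant is correct only where byte 0 is the least significant byte, as on the transputer.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the union in the post, written with <stdint.h> types. */
typedef union {
    uint32_t w;     /* the whole 32-bit word            */
    uint8_t  b[4];  /* the same storage viewed as bytes */
} byte_access_union;

/* Sanity check: the byte view must overlay exactly one 32-bit word. */
void check_layout(void)
{
    assert(sizeof(byte_access_union) == sizeof(uint32_t));
}

/* Lower 24 bits: a single mask, no shift. */
uint32_t low24(byte_access_union d)
{
    return d.w & 0x00FFFFFFu;
}

/* Lower 16 bits: a single mask, no shift. */
uint32_t low16(byte_access_union d)
{
    return d.w & 0x0000FFFFu;
}

/* Upper 16 bits, first method from the post: mask, then shift by 16. */
uint32_t high16_mask_shift(byte_access_union d)
{
    return (d.w & 0xFFFF0000u) >> 16;
}

/* Upper 16 bits, second method: two byte fetches, a shift by 8, an OR.
 * Assumes b[0] is the least significant byte (little-endian layout). */
uint32_t high16_bytes_le(byte_access_union d)
{
    return (uint32_t)d.b[2] | ((uint32_t)d.b[3] << 8);
}
```

Which of the two upper-half methods is cheaper depends on the target; the post's cycle estimates apply to the transputer, not to this sketch.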
Note this is all from a programmer's perspective: compiler writers may use
similar tricks to create 16-bit integers, or they may just use the lower
16 bits of a 32-bit word, making checks for the sign bit if required.

We hit upon an interesting snag during a port to a Sun that might interest
you: the old big vs. little endian problem. The access to the 24-bit value
was OK, but the Sun's 68030 needed to get the 8-bit value out of data.b[0]!
We had to define keywords for the values and use a conditional compilation
switch, e.g.

    #define TARGET T800        /* TARGET = T800 | SUN */

    #if (TARGET == T800)
    #define data24 data.w BITAND #0x00FFFFFF
    #define data8  data.b[3]
    #endif

    #if (TARGET == SUN)
    #define data24 data.w BITAND #0x00FFFFFF
    #define data8  data.b[0]
    #endif

In general, we learned that ports of software which play with individual
byte orderings are troublesome. You might find similar problems with your
port if it does byte and bit manipulation, particularly if the program was
developed on a 680x0.

Ken Koontz
The Johns Hopkins University Applied Physics Laboratory
Laurel, MD USA
email: koontz@capsrv.jhuapl.edu
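A hedged sketch of how the byte-order switch might look in standard C. Two points go beyond the post and are assumptions of this example. First, T800 and SUN are given distinct numeric values (1 and 2, chosen arbitrarily), because an identifier that is not defined as a macro evaluates to 0 inside #if, so two undefined names would compare equal. Second, the shift-based accessors avoid the switch altogether, since shifts and masks act on the value of the word and not on its layout in memory. TOP_BYTE_INDEX, data8_via_union and host_is_little_endian are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical numeric tags so that #if can tell the targets apart. */
#define T800 1
#define SUN  2

#ifndef TARGET
#define TARGET T800            /* TARGET = T800 | SUN */
#endif

typedef union {
    uint32_t w;
    uint8_t  b[4];
} byte_access_union;

/* Index of the most significant byte within b[] for each target. */
#if TARGET == T800
#define TOP_BYTE_INDEX 3       /* little-endian: top byte is stored last  */
#elif TARGET == SUN
#define TOP_BYTE_INDEX 0       /* big-endian: top byte is stored first    */
#else
#error "unknown TARGET"
#endif

/* Union-based access: right only when TARGET matches the machine. */
uint8_t data8_via_union(byte_access_union d)
{
    return d.b[TOP_BYTE_INDEX];
}

/* Value-based access: the same result on either byte order. */
uint32_t data24(uint32_t w)
{
    return w & 0x00FFFFFFu;
}

uint8_t data8(uint32_t w)
{
    return (uint8_t)(w >> 24);
}

/* Run-time probe: returns 1 if byte 0 holds the least significant byte. */
int host_is_little_endian(void)
{
    byte_access_union probe;
    probe.w = 1u;
    return probe.b[0] == 1u;
}
```

The value-based accessors trade the single byte fetch for a shift, which the post notes is slow on the transputer, so this is a portability sketch and not a performance recommendation.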
homeis@azu.informatik.uni-stuttgart.de (Dieter Homeister) (08/18/90)
Markus Bohn writes:
>Has anyone experience or knowledge of transputer C-compilers with
>16 Bit Integer Dataformat ???

I had the same problem. I needed 1024*768 pixel data and I have only 2 MB
RAM, so I had to pack two 16-bit ints into one 32-bit int.

My C compiler (developed by one of my students, Andreas Kaiser) cannot
handle 16-bit ints on the T414 or T800 either. Only for the T212 can the
compiler produce code for both 16- and 32-bit ints. The reason is that the
32-bit transputers do not support the handling of 16-bit values, so the
produced code is very inefficient, no matter whether a compiler or a
programmer generates it. For the programmer it might be more comfortable if
the compiler does this job.

I tried two solutions: shift and mask, or moving two bytes. To my surprise
the shift solution was faster, so I now use this. In my case efficiency of
the 16-bit access is not important, so I didn't try register optimisation
or asm parts.

--------------
Dieter Homeister, Universitaet Stuttgart,
Institut fuer parallele und verteilte Hoechstleistungsrechner (IPVR)
7000 Stuttgart 1, Azenbergstr. 12, Tel 0711-121-1342, W-Germany
e-mail homeister@informatik.uni-stuttgart.dbp.de
--------------
I'm a little bit tired, so put spelling errors > /dev/null
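For scale: 1024*768 pixels at two bytes each is 1,572,864 bytes, which fits in 2 MB, while four bytes per pixel would need 3,145,728 bytes. The two approaches mentioned above (shift and mask, or moving two bytes) might look roughly like this in C. The post does not show its code, so everything here is an assumption made for illustration: the function names, the choice to store even-numbered pixels in the low half of each word, and the low-byte-first order in the byte variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-mask read of 16-bit pixel i from an array of packed words.
 * Assumed layout: even pixels in bits 0..15, odd pixels in bits 16..31. */
uint16_t get_pixel(const uint32_t *words, size_t i)
{
    assert(words != NULL);
    unsigned shift = (unsigned)(i & 1u) * 16u;   /* 0 or 16 */
    return (uint16_t)((words[i >> 1] >> shift) & 0xFFFFu);
}

/* Shift-and-mask write: clear the target half, then OR in the new value. */
void set_pixel(uint32_t *words, size_t i, uint16_t value)
{
    assert(words != NULL);
    unsigned shift = (unsigned)(i & 1u) * 16u;
    uint32_t mask = (uint32_t)0xFFFFu << shift;
    words[i >> 1] = (words[i >> 1] & ~mask) | ((uint32_t)value << shift);
}

/* Two-byte variant: fetch the low and high byte separately and combine.
 * Assumes the pixel's low byte is stored first. */
uint16_t get_pixel_bytes(const uint8_t *bytes, size_t i)
{
    assert(bytes != NULL);
    return (uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
}
```

Which variant is faster has to be measured on the target; the post reports that shifting won on the transputer compiler in use there.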
nathan@elberton.inmos.co.uk (Nathan Sidwell) (08/20/90)
In article <9008161656.AA09519@devvax.TN.CORNELL.EDU> koontz%capvax.decnet@CAPSRV.JHUAPL.EDU ("CAPVAX::KOONTZ") writes:
>You could also use it to access 16-bit values:
>    the lower 16-bits of data:  data.w BITAND #0xFFFF
>    the upper 16-bits of data:  (data.w BITAND #0xFFFF0000) >> 16
>                                    OR
>                                data.b[2] BITOR (data.b[3] << 8)
>
>The 16-bit accesses show the problem: using the higher 16-bits of data for
>storage requires logical operations and shifts. The transputer is slow on
>shifting (overhead time + 1 shift per clock cycle). Also, getting that large
>32-bit mask into the machine will require 7 prefix ops. The individual byte
>method only requires 8 shifts and an OR, but it will require 2 memory fetches.

If you're using OCCAM (which I assume you are, from the syntax of the
examples), the upper 16 bits may be extracted by just a shift op, i.e.

    upper.16 := data.w >> 16

As >> is a bitwise SHIFT (not rotate) operator, it does not propagate the
sign bit, so no mask is needed. In C you could code it as

    upper_16 = (unsigned)data_w >> 16

I suppose you could argue that the compiler should spot this, but who knows
what they do?

Nathan Sidwell, INMOS Ltd, UK         JANET:    nathan@uk.co.inmos
Generic disclaimer applies            UUCP:     ukc!inmos!nathan
My indicision is final (I think)      INTERNET: nathan@inmos.com
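A small C sketch of the point about the cast, assuming 32-bit words and <stdint.h>; the function names are illustrative. In C, a right shift of an unsigned value always fills with zeros, whereas a right shift of a negative signed value is implementation-defined, which is why the conversion to unsigned comes before the shift.

```c
#include <stdint.h>

/* Mask-then-shift, as in the quoted post. */
uint32_t upper16_masked(uint32_t w)
{
    return (w & 0xFFFF0000u) >> 16;
}

/* Shift only: for an unsigned operand the vacated bits are zero,
 * so the mask adds nothing. */
uint32_t upper16_unsigned(uint32_t w)
{
    return w >> 16;
}

/* For a signed word, convert to unsigned first so that the shift
 * cannot copy the sign bit into the upper half of the result. */
uint32_t upper16_from_signed(int32_t w)
{
    return (uint32_t)w >> 16;
}
```

Both unsigned forms return the same value for every input; the shift-only form just avoids loading the 32-bit mask constant.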