[comp.compilers] SPARC tagged data

horst@techfak.uni-bielefeld.de (04/30/91)

Does anyone know what TAGGED DATA instructions are useful for and how to
use them? Tagged data is assumed to be 30 bits wide followed by two bits
set to zero. The SPARC allows add and subtract instructions on tagged
data.

HH
[Most likely it's for immediate integers in a Lisp-like system that uses
tagged pointers, but I hope someone who actually knows will tell us. -John]
-- 
Send compilers articles to compilers@iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers.  Meta-mail to compilers-request.

eb%watergate@lucid.com (Eric Benson) (04/30/91)

In article <9104291542.AA11213@flora.techfak.uni-bielefeld.de> horst@techfak.uni-bielefeld.de wrote:
> Does anyone know what TAGGED DATA instructions are useful for and how to
> use them? Tagged data is assumed to be 30 bits wide followed by two bits
> set to zero. The SPARC allows add and subtract instructions on tagged data.
> 
> [Most likely it's for immediate integers in a Lisp-like system that uses
> tagged pointers, but I hope someone who actually knows will tell us. -John]

Yes, the tagged arithmetic instructions were put in the SPARC architecture
for Lucid Common Lisp.  If the low-order two bits of a Lisp object
reference are zero, it is a 30-bit immediate fixnum.  If some of those
bits are non-zero, it may be a pointer to a floating point number or a
bignum (arbitrary-precision integer).  Generic arithmetic is generally
optimized for the fixnum case, since the overwhelming majority of
arithmetic is performed on small integers.  On many machines + is compiled
inline as

   Test low order two bits of first operand.
     If nonzero, use general case. (Operand could be a float or bignum.)
   Test low order two bits of second operand.
     If nonzero, use general case. (Operand could be a float or bignum.)
   Add two operands.
   If overflow, use general case.  (Result is a bignum).

On the SPARC this is done as one instruction (TADDCC) followed by a
conditional branch rarely taken.
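
A minimal sketch in C of the software sequence listed above, as it might
look on a machine without tagged arithmetic. The names lispobj,
make_fixnum, fixnum_value and fixnum_add are hypothetical and are not
taken from Lucid's implementation; the sketch only assumes 32-bit words
with the fixnum tag 00 in the low two bits.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t lispobj;          /* hypothetical name for a tagged word */

#define TAG_MASK 0x3              /* the low two bits hold the tag */

/* Encode a small integer as a fixnum: value in the upper 30 bits, tag 00.
   The caller must keep n within the 30-bit signed range. */
static lispobj make_fixnum(int32_t n) { return n * 4; }

/* Decode a fixnum; the division is exact because the low bits are zero. */
static int32_t fixnum_value(lispobj x) { return x / 4; }

/* Fast path. Returns true and stores the tagged sum when both operands are
   fixnums and the add does not overflow 32 bits. Returns false when the
   general case (float or bignum) has to take over. */
static bool fixnum_add(lispobj a, lispobj b, lispobj *sum)
{
    if ((a & TAG_MASK) != 0) return false;   /* first operand not a fixnum */
    if ((b & TAG_MASK) != 0) return false;   /* second operand not a fixnum */
    int64_t wide = (int64_t)a + (int64_t)b;  /* add in 64 bits to see overflow */
    if (wide > INT32_MAX || wide < INT32_MIN) return false;
    *sum = (lispobj)wide;
    return true;
}
```

The three tests in fixnum_add are the work that the single tagged add
plus one conditional branch replaces.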

eb@lucid.com 	           	 	Eric Benson
415/329-8400 x5523                      Lucid, Inc.
Telex 3791739 LUCID                     707 Laurel Street
Fax 415/329-8480                        Menlo Park, CA 94025

moss@cs.umass.edu (Eliot Moss) (04/30/91)

Well, I cannot speak for SPARC and say what the instructions were DESIGNED
for, but as the moderator pointed out, they can be used to good effect in
implementing languages such as Smalltalk and LISP, which used tagging to
distinguish (small, i.e., 30-bit) integers from pointers. One uses a tag of 00
in the low bits for integers, and a tag of 01 (say) for pointers. All offsets
from pointers are scaled by -1 to compensate for the 01 in the low bits. Note
that integer add/sub on pointers will be trapped (if you used the tagged
add/sub instructions) and pointer access off an integer can also be trapped.
This means you don't have to insert gobs of tag checking code all over the
place. Multiply and divide tend to require scaling and adjustment anyway, and
besides, they take long enough, and are rare enough (compared with add/sub)
that additional penalty in handling the tags is judged "acceptable".
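
A rough C illustration of the pointer convention described above, reading
"scaled by -1" as a constant -1 bias folded into each field offset. The
names ref, tag_pointer and load_field are made up for this sketch, and it
assumes the objects are at least 4-byte aligned so that the low two bits
of a raw address are zero.

```c
#include <stdint.h>

typedef uintptr_t ref;            /* hypothetical tagged reference */

#define PTR_TAG 0x1               /* tag 01 in the low two bits */

/* Tag a word-aligned pointer by adding the 01 tag to its address. */
static ref tag_pointer(int32_t *p) { return (uintptr_t)p + PTR_TAG; }

/* Load field `index` of the object. The -PTR_TAG is part of the constant
   offset, so no separate untagging step is needed before the load. */
static int32_t load_field(ref r, unsigned index)
{
    uintptr_t addr = r + index * sizeof(int32_t) - PTR_TAG;
    return *(int32_t *)addr;
}
```

If load_field were handed a fixnum (tag 00) instead, the computed address
would end in binary 11, which is the misaligned access that hardware such
as the SPARC can trap on.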
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu

weaver@Sun.COM (Michael Weaver) (04/30/91)

In article <9104291542.AA11213@flora.techfak.uni-bielefeld.de> horst@techfak.uni-bielefeld.de writes:
>Does anyone know what TAGGED DATA instructions are useful for and how to
>use them? ...

Tagged data instructions apparently were borrowed from the Berkeley SOAR
(Smalltalk on a RISC) project. Studies have shown that much of the time
used by Smalltalk programs is taken up in adds and subtracts, even though
for most adds and subtracts both operands are integers, because in
practice for every add (etc.) a method lookup must be done.  That is, the
types of both operands must be checked, and then a piece of code that
will perform the operation on this combination of operand types must be
found and invoked.

Tagged instructions can be used (on SPARC) by generating instructions as
though all these adds and subtracts were on integers, but using the tagged
add (etc.) instruction rather than add.

Integers are encoded by setting the two low-order bits of a word to zero
to indicate an integer, and the upper 30 to represent the value.  Data
types other than integers must have at least one of the two lowest bits
set, and the upper 30 bits can be encoded arbitrarily, so that the 32 bits
can be used (by software) to determine the type and value of the operand.

If two such integers are added with a tagged add, the result is the sum,
similarly encoded. However, if either of the operands of a tagged add has
either of its low bits set, then a trap is taken. The trap handler then
can check both operands, dispatch to appropriate code to effect the add,
and then resume execution following the add.
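
The trap-and-dispatch behaviour described above can be modelled in C
roughly as follows. This is a software model, not SPARC code: the handler
is passed as a function pointer, and the names word, slow_add_fn,
tagged_add, slow_calls and count_slow_path are invented for the sketch.
(On the SPARC itself, as far as I know, the trapping forms are TADDccTV
and TSUBccTV, while TADDcc and TSUBcc only set the overflow condition
code.)

```c
#include <stdint.h>

typedef int32_t word;
typedef word (*slow_add_fn)(word a, word b);  /* stands in for the trap handler */

/* Model of a trapping tagged add: plain add when both tags are 00 and the
   sum fits in 32 bits, otherwise hand both operands to the handler. */
static word tagged_add(word a, word b, slow_add_fn handler)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    int tag_set  = ((a | b) & 0x3) != 0;          /* either low bit pair nonzero */
    int overflow = wide > INT32_MAX || wide < INT32_MIN;
    if (tag_set || overflow)
        return handler(a, b);                     /* the "trap" */
    return (word)wide;
}

/* Placeholder handler: counts how often the slow path runs. A real one
   would inspect the operand types and dispatch to the right method. */
static int slow_calls;
static word count_slow_path(word a, word b)
{
    (void)a; (void)b;
    slow_calls++;
    return 0;
}
```

For example, tagged_add(8, 12, count_slow_path) returns 20 without
calling the handler (8 and 12 encode the integers 2 and 3), while
tagged_add(8, 13, count_slow_path) goes through the handler because 13
has a nonzero tag.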

The overall effect is that adds of integers happen quickly while adds of
other types are slowed down a bit. If most of the adds are actually on
integers, the overall run times are improved. I imagine that Lisp can
benefit from these instructions similarly.

See Bush et al., 'Compiling Smalltalk-80 to a RISC', in Proceedings of the
Second International Conference on Architectural Support for Programming
Languages and Operating Systems (ASPLOS II).

Michael Weaver.

kers@otter.hpl.hp.com (Chris Dollin) (05/07/91)

Eliot Moss says:

   Well, I cannot speak for SPARC and say what the instructions were DESIGNED
   for, but as the moderator pointed out, they can be used to good effect in
   implementing languages such as Smalltalk and LISP, which used tagging to
   distinguish (small, i.e., 30-bit) integers from pointers. One uses a tag of
   00 in the low bits for integers, and a tag of 01 (say) for pointers. All 
   offsets from pointers are scaled by -1 to compensate for the 01 in the low
   bits.

Doesn't this choice make inter-language working unnecessarily hard? It means
that structures containing pointers cannot be safely passed to (say) C
routines, because all the pointer values are wrong. (Structures that you
pass to foreign procedures need their numbers raw anyway.) Seems to me that
the fixnum tag should have been something other than 0.

Isn't it nice when hardware does *almost* what you want?
--
Regards, Kers.

pardo@june.cs.washington.edu (David Keppel) (05/09/91)

>>[SPARC tagged instructions: set low bits in pointers.]

kers@otter.hpl.hp.com (Chris Dollin) writes:
>[Inter-language hard; pointers cannot be safely passed.]
>Isn't it nice when hardware does *almost* what you want?

Or put another way: isn't it nice when your programming environment
lacks standardized representations for inter-language calls and your
compiler and linker lack hooks for taking advantage of them even if
they did exist?

		;-D on  ( Tagged code )  Pardo

henry@zoo.toronto.edu (Henry Spencer) (05/10/91)

In article <KERS.91May7093547@cdollin.hpl.hp.com> kers@otter.hpl.hp.com (Chris Dollin) writes:
>Doesn't this choice make inter-language working unnecessarily hard? It means
>that structures containing pointers cannot be safely passed to (say) C ...

It may be impossible to pass structures to C anyway, because of other design
decisions made differently.  Even calls between C and FORTRAN, which are
*much* closer in basic philosophy than C and Lisp-derived languages, have
many boobytraps and take careful attention on both ends.

Given that both ends know what is going on, actually, there is no disastrous
problem.  The C code simply has to correct the values of incoming pointers
(in an inevitably machine-specific way -- all these conventions are quite
machine-specific!) before using them.  This is, at worst, a fairly routine
problem of inter-language calls.  It can be much worse.

>the fixnum tag should have been something other than 0.

Except that then you need a special adder which knows about it, because you
don't want the tag to change during (say) fixnum addition, and 0 is the only
one with that property.  The low-bits-zero scheme potentially involves no
extra data-path hardware, because the same old adder will work and the
check-for-non-zero-bits hardware is already there for pointers.
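
The claim that 0 is the only tag an ordinary adder preserves is easy to
check. A small C sketch, with tag_after_add as a made-up helper that tags
two values identically, adds the words with a plain 32-bit add, and
reports the tag of the result:

```c
#include <stdint.h>

/* Return the two-bit tag of the sum of two words that both carry `tag`
   in their low two bits and x, y in the upper 30 bits. */
static unsigned tag_after_add(uint32_t tag, uint32_t x, uint32_t y)
{
    uint32_t a = (x << 2) | tag;
    uint32_t b = (y << 2) | tag;
    return (a + b) & 0x3u;        /* plain add, no tag-aware hardware */
}
```

With tag 00 the sum again has tag 00. With tag 01 the sum comes out with
tag 10, with tag 11 it also comes out as 10, and with tag 10 the tag bits
carry into the value field, so each nonzero fixnum tag needs a correcting
instruction or a modified adder.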

>Isn't it nice when hardware does *almost* what you want?

Most Lispish-language users consider higher execution speed more important
than more convenient interlanguage calls.  The hardware *is* doing what they
want.
-- 
Henry Spencer @ U of Toronto Zoology
 henry@zoo.toronto.edu  utzoo!henry

pardo@june.cs.washington.edu (David Keppel) (05/11/91)

henry@zoo.toronto.edu (Henry Spencer) writes:
>[It may be difficult to pass structures to C anyway because of
> other design decisions made differently.]

Indeed, it may be difficult to pass structures between C and C if the
fragments were compiled with different compilers (e.g., libraries).
Problems include:

* Structure passing (most notably return) conventions
* Alignment
* Padding

So, for example, a compiler for a machine that allows unaligned fetches
(e.g., the VAX) might implement

	struct { char c; int i; }

as:

	one byte followed by 4 bytes, non-padded
	one byte followed by 4 bytes, padded to 8 bytes
	1 byte, padded to 4 bytes, followed by 4 bytes

I think these choices are all legal, but I wouldn't swear to it.  The point
is that a compiler may legitimately derive different struct layouts from one
hardware specification.  The second one is pretty silly, but the third may
actually improve performance by reducing the number of unaligned fetches.
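
To see which layout a given compiler actually picks, something like the
following C sketch can be compiled on each side of the interface. The
struct tag pair and the function print_layout are names chosen for the
example; the printed numbers depend on the compiler and target, so none
are shown here.

```c
#include <stddef.h>
#include <stdio.h>

struct pair { char c; int i; };

/* Print where each member lands and how big the whole struct is. */
static void print_layout(void)
{
    printf("offsetof(c)  = %zu\n", offsetof(struct pair, c));
    printf("offsetof(i)  = %zu\n", offsetof(struct pair, i));
    printf("sizeof(pair) = %zu\n", sizeof(struct pair));
}
```

If two compilers print different values for offsetof(i) or sizeof(pair),
code built with one cannot safely pass this struct to code built with the
other.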

So even if all the world programmed in C, we still wouldn't have solved
the interoperability problem :-)

	;-D on  ( Inter-city commuting?  Inter-face computing )  Pardo