[net.lang.prolog] Why I think partitions are faster than tagging.

ok@edai.UUCP (Richard O'Keefe) (04/01/84)

     I've been asked to justify my contention that Prolog interpreters
using tags are likely to be slower than Prolog interpreters which, like
C-Prolog, use a few partitions for their type tests.  C-Prolog in fact
can do almost all its type testing by comparing against two boundaries
(which can be kept in registers a lot of the time) and against 0.

     The basis of my claim is that I've looked at the code produced by
the C compiler in both kinds of interpreters, and I've tried to hand-
code unification routines in assembler.  There is a very important loop
that crops up all over the place in a Prolog interpreter:

	a = <a pointer>;
	while (a <is a bound reference>) a = <contents of a>;

This is generally followed by a switch with

	case <a is unbound> : ...
	case <a is atomic>  : ...
	case <a is compound>:

which takes two comparisons or their equivalent in either case.
I can get this loop down to 4 instructions using C-Prolog's data
representation, but with *byte* tags I can only get it down to
5 whichever end I put the tag.  The VAX code is 11 bytes -vs- 19.
The number of instructions would be the same on an M68000.

     Of course the skill of the programmer is a greater factor.

AndrewT%Ucb-Vax@basser.UUCP (04/03/84)

I cannot agree that a partitioned scheme is faster than
tagging. edai!ok claims that the loop for de-referencing
a variable is faster in a partitioned scheme:

"I can get this loop down to 4 instructions using C-Prolog's
data representation, but with *byte* tags I can only get it
down to 5 whichever end I put the tag.  The VAX code is 11
bytes -vs- 19.  The number of instructions would be the same
on an M68000."


I use a tagged scheme. The compiled VAX code for the aforesaid
loop involves only 3 instructions, 8 bytes. It looks like this:

                jbr     L1
        L2:     movl    4(r11),r11
        L1:     tstl    (r11)
                jeql    L2

A tagged scheme has other advantages. The unification routine
can be table-driven. This is, I think, faster and it certainly
is much cleaner.

A partition scheme (or at least the C-Prolog version thereof)
involves horrible, heavily machine-dependent manipulations
so that the bit patterns for integers, floats and pointers can
be distinguished.

Using a tagged scheme, the various storage areas (stacks) can
be small when created and "grown" to whatever size is
necessary. This is harder to do in a partitioned scheme. The
C-Prolog approach of initially allocating ~1 megabyte is totally
impractical in our environment. We have a swapping (non-virtual
memory) system with 4Mb of memory and sometimes 20-30 students
using Prolog.

-- Andrew Taylor

Pereira%SRI-AI@sri-unix.UUCP (04/16/84)

From:  Fernando Pereira <Pereira@SRI-AI>

I think Andrew Taylor missed Richard O'Keefe's point.
Richard was talking of term representations where a
variable cell occupies a single 32 bit word.  Andrew
Taylor's code is clearly for a representation with
64 bits per cell.  For large programs, this factor of
2 in space more than offsets other considerations.
Furthermore, the number of memory accesses is a major
(THE major?) factor in the efficiency of a Prolog
system.  Cells twice as big, twice as many accesses...

Since version 1.4, C-Prolog allows the user to specify
the sizes of data areas in the command line.  The
relocation code needed to expand data areas dynamically
is already there, in the function that restores saved
states.  To ensure that the areas being expanded do not
walk over the malloc arena, malloc & friends need to be
simulated within C-Prolog.  Now if only someone out there
would take the hint...

-- Fernando Pereira
