[gnu.gcc.bug] A possible bug in sparc.md movstrsi instruction

jkp@SAUNA.HUT.FI (Jyrki Kuoppala) (07/25/89)
I'm not quite sure about this yet, as I'm not that familiar with the
GCC internals, so don't rush to apply the change to your machine.  It
probably doesn't matter for GCC itself anyway, but it may for other
compilers that use the same back end.  I'll be hacking on this more
and hope to have a better-tested version ready sometime.

gcc.texinfo describes the movstrM instruction pattern as follows:

`movstrM'
     Block move instruction.  The addresses of the destination and source
     strings are the first two operands, and both are in mode `Pmode'.
     The number of bytes to move is the third operand, in mode M.
     The fourth operand is the known shared alignment of the source and
     destination, in the form of a `const_int' rtx.
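To pin down what that contract means, here is a minimal C model of the required behavior.  It is my own sketch, not GCC code, and `movstr_reference` is a hypothetical name.  The point is that the fourth operand is a promise about the addresses, not something that can be derived from the byte count:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reference model of a movstrM expansion: copy SIZE bytes
   from SRC to DST.  ALIGN (operand 3) is the alignment in bytes that the
   caller guarantees for both addresses; a back end may use it to pick
   wider loads and stores, but must not assume anything stronger, and in
   particular must not infer alignment from SIZE.  */
static void
movstr_reference (unsigned char *dst, const unsigned char *src,
		  size_t size, size_t align)
{
  assert (align != 0);
  assert ((uintptr_t) dst % align == 0);  /* the promise of operand 3 */
  assert ((uintptr_t) src % align == 0);
  memcpy (dst, src, size);
}
```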

However, the pattern for the movstrsi instruction is:

(define_expand "movstrsi"
  [(parallel [(set (mem:BLK (match_operand:BLK 0 "general_operand" ""))
		   (mem:BLK (match_operand:BLK 1 "general_operand" "")))
	      (use (match_operand:SI 2 "arith32_operand" ""))
	      (clobber (match_dup 3))
	      (clobber (match_dup 0))
	      (clobber (match_dup 1))])]

It doesn't seem to use the fourth operand at all.  And in out-sparc.c,
the comment on output_size_for_block_move (which output_block_move
calls) says:

/* Output code to place a size count SIZE in register REG.
   If SIZE is round, then assume that we can use alignment
				        ~~~~~~~~~~~~~~~~~
   based on that roundness, and return an integer saying
   what alignment (roundness, transfer size) we will be using.

I think this assumption holds for C, but not for other languages.  I
ran into this while debugging a very early version of gpc.  I changed
the pattern to:

(define_expand "movstrsi"
  [(parallel [(set (mem:BLK (match_operand:BLK 0 "general_operand" ""))
		   (mem:BLK (match_operand:BLK 1 "general_operand" "")))
	      (use (match_operand:SI 2 "arith32_operand" ""))
	      (use (match_operand:VOID 3 "immediate_operand" ""))
	      (clobber (match_dup 4))
	      (clobber (match_dup 0))
	      (clobber (match_dup 1))])]

and changed the following pattern, which calls output_block_move,
accordingly.  That pattern now passes the alignment to
output_block_move, which uses the alignment rather than the size of
the move to decide whether it can use whole-word move instructions.  I
also rewrote a good part of output_block_move to take the alignment
into account.
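Roughly, choosing the transfer unit should then work like the following C sketch.  It is a hypothetical helper written for illustration (`choose_transfer_unit` and its parameters are not in out-sparc.c), and it uses a loop where output_block_move tests individual size bits, which is equivalent for power-of-two alignments:

```c
#include <assert.h>

/* Hypothetical helper: pick the transfer unit (1, 2 or 4 bytes) for a
   block move from the alignment passed in operand 3 and, when it is a
   compile-time constant, the size.  */
static int
choose_transfer_unit (int align, int size_is_const, long size)
{
  int unit = align;

  assert (align >= 1);
  if (unit > 4)
    unit = 4;                 /* ld/st move at most one word */
  if (size_is_const)
    while (size % unit != 0)
      unit /= 2;              /* 4 -> 2 -> 1; 1 divides every size */
  return unit;
}
```

For example, an alignment of 8 with a constant size of 16 gives word moves, while the same alignment with a constant size of 6 falls back to halfword moves.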

Please don't hesitate to tell me if I'm on the wrong track altogether.
As mentioned, I don't think the problem shows up with C, but it does
with Pascal string assignment when the strings are not aligned.  You
might want to double-word align the strings anyway (then it works
right), but the compiler should still work as documented.
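Here is a small C illustration of why size roundness is not enough.  `align_from_size` is a hypothetical reconstruction of the inference the old comment describes, and `struct record` stands in for a Pascal record with a string field.  On common ABIs the 8-character name sits at offset 1, so its size looks word-aligned while its address is not:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of the old inference: guess the
   alignment from how "round" the size is.  */
static int
align_from_size (long size)
{
  if (size % 4 == 0)
    return 4;
  if (size % 2 == 0)
    return 2;
  return 1;
}

/* The alignment an address actually has: the largest power of two,
   capped at 8, that divides it.  */
static int
align_of_address (const void *p)
{
  uintptr_t a = (uintptr_t) p;
  int align = 1;

  while (align < 8 && a % (2 * (uintptr_t) align) == 0)
    align *= 2;
  return align;
}

/* Stand-in for a Pascal record: a 1-byte tag followed by an
   8-character string.  */
struct record
{
  char tag;
  char name[8];
};
```

With the record placed on an 8-byte boundary (e.g. `_Alignas (8) struct record r;`), align_from_size (sizeof r.name) returns 4 while align_of_address (r.name) returns 1, so word moves on that field would be misaligned.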

Also, I'm having trouble with short strings (two, four or eight
characters): somewhere they get changed from BLKmode into HImode,
SImode or DImode, so the compiler doesn't call emit_block_move.
Instead it generates plain move instructions that don't take the
alignment into account, and the target program core dumps (sorry,
gets a runtime error ;-).  I haven't looked into that more closely, so
I don't know how to fix it.

//Jyrki

Here's the changed part from sparc.md:

(define_expand "movstrsi"
  [(parallel [(set (mem:BLK (match_operand:BLK 0 "general_operand" ""))
		   (mem:BLK (match_operand:BLK 1 "general_operand" "")))
	      (use (match_operand:SI 2 "arith32_operand" ""))
	      (use (match_operand:VOID 3 "immediate_operand" ""))
	      (clobber (match_dup 4))
	      (clobber (match_dup 0))
	      (clobber (match_dup 1))])]
  ""
  "
{
  operands[0] = copy_to_mode_reg (SImode, XEXP (operands[0], 0));
  operands[1] = copy_to_mode_reg (SImode, XEXP (operands[1], 0));
  operands[4] = gen_reg_rtx (SImode);
}")

(define_insn ""
  [(set (mem:BLK (match_operand:SI 0 "register_operand" "r"))
	(mem:BLK (match_operand:SI 1 "register_operand" "r")))
   (use (match_operand:SI 2 "arith32_operand" "rn"))
   (use (match_operand:SI 3 "immediate_operand" "n"))
   (clobber (match_operand:SI 4 "register_operand" "=r"))
   (clobber (match_operand:SI 5 "register_operand" "=0"))
   (clobber (match_operand:SI 6 "register_operand" "=1"))]
  ""
  "* return output_block_move (operands);")


and here are the two changed functions from out-sparc.c:

/* Output code to place a size count SIZE in register REG.
   Because block moves are pipelined, we don't include the
   first element in the transfer of SIZE to REG.  */

void
output_size_for_block_move (size, reg, align)
     rtx size, reg;
     int align;
{
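  rtx xoperands[3];
  xoperands[0] = reg;

  /* The alignment (transfer unit) is passed in by the caller.  */
  if (REG_P (size))
    {
      /* Predecrement the count by the transfer unit, not by 1;
	 otherwise word moves would use the misaligned offsets
	 SIZE-1, SIZE-5, ...  */
      xoperands[1] = size;
      xoperands[2] = gen_rtx (CONST_INT, VOIDmode, align);
      output_asm_insn ("sub %1,%2,%0", xoperands);
    }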
  else
    {
      int i = INTVAL (size);

      /* predecrement count.  */
      i -= align;
      if (i < 0) abort ();

      xoperands[1] = gen_rtx (CONST_INT, VOIDmode, i);

      output_asm_insn ("set %1,%0", xoperands);
    }
}

/* Emit code to perform a block move.

   OPERANDS[0] is the destination.
   OPERANDS[1] is the source.
   OPERANDS[2] is the size.
   OPERANDS[3] is the alignment in bytes.
   OPERANDS[4..6] are pseudos we can safely clobber as temps.  */

char *
output_block_move (operands)
     rtx *operands;
{
  /* A vector for our computed operands.  Note that output_load_address
     makes use of (and can clobber) up to the 8th element of this vector.  */
  rtx xoperands[10];
  rtx zoperands[10];
  static int movstrsi_label = 0;
  int align = INTVAL (operands[3]);
  int i, j;

  /* The code below never moves more than one word (ld/st) at a time,
     so never use a transfer unit wider than 4 bytes.  */
  if (align > 4)
    align = 4;

  /* If the size (operand 2) is a known constant, make sure it is a
     multiple of the transfer unit and fall back to smaller chunks if
     it is not.  This should never happen, but check just to make
     sure.  */
  if (GET_CODE (operands[2]) == CONST_INT)
    {
      if (INTVAL (operands[2]) & 1)
	align = 1;
      else if ((INTVAL (operands[2]) & 2) && align > 2)
	align = 2;
    }

  xoperands[0] = operands[0];
  xoperands[1] = operands[1];
  xoperands[2] = operands[4];

  /* Since we clobber untold things, nix the condition codes.  */
  CC_STATUS_INIT;

  /* Recognize special cases of block moves.  These occur
     when GNU C++ is forced to treat something as BLKmode
     to keep it in memory, when its mode could be represented
     with something smaller.

     We cannot do this for global variables, since we don't know
     what pages they don't cross.  Sigh.  */

  if (GET_CODE (operands[2]) == CONST_INT
      && INTVAL (operands[2]) <= 16
      && ! CONSTANT_ADDRESS_P (operands[0])
      && ! CONSTANT_ADDRESS_P (operands[1]))
    {
      int size = INTVAL (operands[2]);

      cc_status.flags &= ~CC_KNOW_HI_G1;
      if (align == 1)
	{
	  if (memory_address_p (QImode, plus_constant (xoperands[0], size))
	      && memory_address_p (QImode, plus_constant (xoperands[1], size)))
	    {
	      /* We will store different integers into this particular RTX.  */
	      xoperands[2] = gen_rtx (CONST_INT, VOIDmode, 13);
	      for (i = size-1; i >= 0; i--)
		{
		  INTVAL (xoperands[2]) = i;
		  output_asm_insn ("ldub [%a1+%2],%%g1\n\tstb %%g1,[%a0+%2]",
				   xoperands);
		}
	      return "";
	    }
	}
      else if (align == 2)
	{
	  if (memory_address_p (HImode, plus_constant (xoperands[0], size))
	      && memory_address_p (HImode, plus_constant (xoperands[1], size)))
	    {
	      /* We will store different integers into this particular RTX.  */
	      xoperands[2] = gen_rtx (CONST_INT, VOIDmode, 13);
	      for (i = (size>>1)-1; i >= 0; i--)
		{
		  INTVAL (xoperands[2]) = i<<1;
		  output_asm_insn ("lduh [%a1+%2],%%g1\n\tsth %%g1,[%a0+%2]",
				   xoperands);
		}
	      return "";
	    }
	}
      else
	{
	  if (memory_address_p (SImode, plus_constant (xoperands[0], size))
	      && memory_address_p (SImode, plus_constant (xoperands[1], size)))
	    {
	      /* We will store different integers into this particular RTX.  */
	      xoperands[2] = gen_rtx (CONST_INT, VOIDmode, 13);
	      for (i = (size>>2)-1; i >= 0; i--)
		{
		  INTVAL (xoperands[2]) = i<<2;
		  output_asm_insn ("ld [%a1+%2],%%g1\n\tst %%g1,[%a0+%2]",
				   xoperands);
		}
	      return "";
	    }
	}
    }

  /* This is the size of the transfer.
     Either use the register which already contains the size,
     or use a free register (used by no operands).
     Also emit code to decrement the size value by ALIGN.  */

  output_size_for_block_move (operands[2], operands[4], align);
     
  zoperands[0] = operands[0];
  zoperands[3] = plus_constant (operands[0], align);
  output_load_address (zoperands);

  xoperands[3] = gen_rtx (CONST_INT, VOIDmode, movstrsi_label++);
  xoperands[4] = gen_rtx (CONST_INT, VOIDmode, align);

  if (align == 1)
    output_asm_insn ("\nLm%3:\n\tldub [%1+%2],%%g1\n\tsubcc %2,%4,%2\n\tbge Lm%3\n\tstb %%g1,[%0+%2]", xoperands);
  else if (align == 2)
    output_asm_insn ("\nLm%3:\n\tlduh [%1+%2],%%g1\n\tsubcc %2,%4,%2\n\tbge Lm%3\n\tsth %%g1,[%0+%2]", xoperands);
  else
    output_asm_insn ("\nLm%3:\n\tld [%1+%2],%%g1\n\tsubcc %2,%4,%2\n\tbge Lm%3\n\tst %%g1,[%0+%2]", xoperands);
  return "";
}
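To check the pipelined loop by hand, here is a C model of what the emitted code does.  It is a hypothetical simulation (`simulate_movstr_loop` is not part of the patch) that follows the constant-size path: the count is predecremented by ALIGN, and the destination register is biased by ALIGN because the store in the bge delay slot sees the already-decremented count:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical C model of the loop emitted by output_block_move:

       Lm:  ld     [src+count],%g1     (ldub/lduh for ALIGN 1/2)
            subcc  count,ALIGN,count
            bge    Lm
            st     %g1,[dst'+count]    (delay slot: always executed)

   dst' is dst + ALIGN (set up via output_load_address) and COUNT
   starts at SIZE - ALIGN (output_size_for_block_move).  */
static void
simulate_movstr_loop (unsigned char *dst, const unsigned char *src,
		      long size, long align)
{
  unsigned char g1[4];                    /* stands in for %g1 */
  unsigned char *dst_biased = dst + align;
  long count = size - align;              /* predecremented count */

  assert (align >= 1 && align <= 4);
  assert (size >= align && size % align == 0);

  for (;;)
    {
      memcpy (g1, src + count, align);    /* load */
      count -= align;                     /* subcc */
      int taken = count >= 0;             /* bge condition */
      memcpy (dst_biased + count, g1, align); /* delay-slot store */
      if (!taken)
	break;
    }
}
```

On the last pass the count goes negative, the branch falls through, and the delay-slot store lands at dst_biased - ALIGN, which is exactly dst.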