jkp@SAUNA.HUT.FI (Jyrki Kuoppala) (07/25/89)
I'm not quite sure about this yet, as I'm not that familiar with the GCC internals, so don't rush to apply the change on your machine. It probably doesn't matter for GNU C itself, but it may for other compilers that use the same back end. I'll be hacking on this more and hope to have a better-tested version ready sometime.

In gcc.texinfo, there's this description of the movstrM instruction:

  `movstrM'
       Block move instruction.  The addresses of the destination and
       source strings are the first two operands, and both are in mode
       `Pmode'.  The number of bytes to move is the third operand, in
       mode M.  The fourth operand is the known shared alignment of the
       source and destination, in the form of a `const_int' rtx.

However, the movstrsi pattern in sparc.md is:

(define_expand "movstrsi"
  [(parallel [(set (mem:BLK (match_operand:BLK 0 "general_operand" ""))
                   (mem:BLK (match_operand:BLK 1 "general_operand" "")))
              (use (match_operand:SI 2 "arith32_operand" ""))
              (clobber (match_dup 3))
              (clobber (match_dup 0))
              (clobber (match_dup 1))])]

It doesn't use the fourth operand at all: operand 3 appears only in a clobber, so the alignment is never looked at. And in out-sparc.c, the comment above output_size_for_block_move (which output_block_move calls to load the byte count) says:

/* Output code to place a size count SIZE in register REG.
   If SIZE is round, then assume that we can use alignment
   ~~~~~~~~~~~~~~~~~
   based on that roundness, and return an integer saying
   what alignment (roundness, transfer size) we will be using.

I think this assumption holds for C, but not for other languages; I ran into it while debugging a very early version of gpc (GNU Pascal). I changed the pattern to:

(define_expand "movstrsi"
  [(parallel [(set (mem:BLK (match_operand:BLK 0 "general_operand" ""))
                   (mem:BLK (match_operand:BLK 1 "general_operand" "")))
              (use (match_operand:SI 2 "arith32_operand" ""))
              (use (match_operand:VOID 3 "immediate_operand" ""))
              (clobber (match_dup 4))
              (clobber (match_dup 0))
              (clobber (match_dup 1))])]

and changed the define_insn that follows it, which calls output_block_move, to match.
Now that pattern passes the alignment to output_block_move, which uses the alignment, rather than the size of the move, to decide whether it can use whole-word move instructions. I also changed quite a lot of output_block_move itself to make it take the alignment into account. Please don't hesitate to tell me if I'm altogether on the wrong track.

As mentioned, I don't think the problem shows up with C, but it does with Pascal string assignment when the strings are not aligned. You might want to double-word align the strings anyway (then it works right), but the compiler should still work as documented.

Also, I'm having trouble with short strings (two, four or eight characters). Somewhere they get changed from BLKmode into SImode or DImode, so the compiler doesn't call emit_block_move; instead it generates ordinary move instructions that ignore the alignment, and the target program core dumps (sorry, gets a runtime error ;-). I haven't looked into that more closely, so I don't know how to fix it.
//Jyrki

Here's the changed part from sparc.md:

(define_expand "movstrsi"
  [(parallel [(set (mem:BLK (match_operand:BLK 0 "general_operand" ""))
                   (mem:BLK (match_operand:BLK 1 "general_operand" "")))
              (use (match_operand:SI 2 "arith32_operand" ""))
              (use (match_operand:VOID 3 "immediate_operand" ""))
              (clobber (match_dup 4))
              (clobber (match_dup 0))
              (clobber (match_dup 1))])]
  ""
  "
{
  operands[0] = copy_to_mode_reg (SImode, XEXP (operands[0], 0));
  operands[1] = copy_to_mode_reg (SImode, XEXP (operands[1], 0));
  operands[4] = gen_reg_rtx (SImode);
}")

(define_insn ""
  [(set (mem:BLK (match_operand:SI 0 "register_operand" "r"))
        (mem:BLK (match_operand:SI 1 "register_operand" "r")))
   (use (match_operand:SI 2 "arith32_operand" "rn"))
   (use (match_operand:SI 3 "immediate_operand" "n"))
   (clobber (match_operand:SI 4 "register_operand" "=r"))
   (clobber (match_operand:SI 5 "register_operand" "=0"))
   (clobber (match_operand:SI 6 "register_operand" "=1"))]
  ""
  "* return output_block_move (operands);")

and here are the two changed functions from out-sparc.c:

/* Output code to place a size count SIZE in register REG.
   Because block moves are pipelined, we don't include the
   first element in the transfer of SIZE to REG.  */

void
output_size_for_block_move (size, reg, align)
     rtx size, reg;
     int align;
{
  rtx xoperands[3];

  xoperands[0] = reg;
  xoperands[2] = gen_rtx (CONST_INT, VOIDmode, align);

  if (REG_P (size))
    {
      /* Predecrement the count by ALIGN, the amount each loop
         iteration moves; subtracting 1 would start a halfword or
         word loop at a misaligned offset.  */
      xoperands[1] = size;
      output_asm_insn ("sub %1,%2,%0", xoperands);
    }
  else
    {
      int i = INTVAL (size);

      /* Predecrement count.  */
      i -= align;
      if (i < 0)
        abort ();

      xoperands[1] = gen_rtx (CONST_INT, VOIDmode, i);
      output_asm_insn ("set %1,%0", xoperands);
    }
}

/* Emit code to perform a block move.

   OPERANDS[0] is the destination.
   OPERANDS[1] is the source.
   OPERANDS[2] is the size.
   OPERANDS[3] is the alignment in bytes.
   OPERANDS[4..6] are pseudos we can safely clobber as temps.  */

char *
output_block_move (operands)
     rtx *operands;
{
  /* A vector for our computed operands.  Note that output_load_address
     makes use of (and can clobber) up to the 8th element of this
     vector.  */
  rtx xoperands[10];
  rtx zoperands[10];
  static int movstrsi_label = 0;
  int align = INTVAL (operands[3]);
  int i;

  /* Each chunk is moved through the single register %g1, so at most a
     word can be moved at a time; and since the loop below steps by
     ALIGN, ALIGN must not exceed that chunk size.  */
  if (align > 4)
    align = 4;

  /* If the size is a known constant, make sure we are not moving it in
     bigger chunks than it is a multiple of.  This should never happen,
     but check just to make sure.  */
  if (GET_CODE (operands[2]) == CONST_INT)
    {
      if (INTVAL (operands[2]) & 1)
        align = 1;
      else if ((INTVAL (operands[2]) & 2) && align > 2)
        align = 2;
    }

  xoperands[0] = operands[0];
  xoperands[1] = operands[1];
  xoperands[2] = operands[4];

  /* Since we clobber untold things, nix the condition codes.  */
  CC_STATUS_INIT;

  /* Recognize special cases of block moves.  These occur
     when GNU C++ is forced to treat something as BLKmode
     to keep it in memory, when its mode could be represented
     with something smaller.

     We cannot do this for global variables, since we don't know
     what pages they don't cross.  Sigh.  */
  if (GET_CODE (operands[2]) == CONST_INT
      && INTVAL (operands[2]) <= 16
      && ! CONSTANT_ADDRESS_P (operands[0])
      && ! CONSTANT_ADDRESS_P (operands[1]))
    {
      int size = INTVAL (operands[2]);

      cc_status.flags &= ~CC_KNOW_HI_G1;
      if (align == 1)
        {
          if (memory_address_p (QImode, plus_constant (xoperands[0], size))
              && memory_address_p (QImode, plus_constant (xoperands[1], size)))
            {
              /* We will store different integers into this particular RTX.  */
              xoperands[2] = gen_rtx (CONST_INT, VOIDmode, 13);
              for (i = size - 1; i >= 0; i--)
                {
                  INTVAL (xoperands[2]) = i;
                  output_asm_insn ("ldub [%a1+%2],%%g1\n\tstb %%g1,[%a0+%2]",
                                   xoperands);
                }
              return "";
            }
        }
      else if (align == 2)
        {
          if (memory_address_p (HImode, plus_constant (xoperands[0], size))
              && memory_address_p (HImode, plus_constant (xoperands[1], size)))
            {
              /* We will store different integers into this particular RTX.  */
              xoperands[2] = gen_rtx (CONST_INT, VOIDmode, 13);
              for (i = (size >> 1) - 1; i >= 0; i--)
                {
                  INTVAL (xoperands[2]) = i << 1;
                  output_asm_insn ("lduh [%a1+%2],%%g1\n\tsth %%g1,[%a0+%2]",
                                   xoperands);
                }
              return "";
            }
        }
      else
        {
          if (memory_address_p (SImode, plus_constant (xoperands[0], size))
              && memory_address_p (SImode, plus_constant (xoperands[1], size)))
            {
              /* We will store different integers into this particular RTX.  */
              xoperands[2] = gen_rtx (CONST_INT, VOIDmode, 13);
              for (i = (size >> 2) - 1; i >= 0; i--)
                {
                  INTVAL (xoperands[2]) = i << 2;
                  output_asm_insn ("ld [%a1+%2],%%g1\n\tst %%g1,[%a0+%2]",
                                   xoperands);
                }
              return "";
            }
        }
    }

  /* This is the size of the transfer.  Either use the register which
     already contains the size, or use a free register (used by no
     operands).  Also emit code to decrement the size value by ALIGN.  */
  output_size_for_block_move (operands[2], operands[4], align);

  /* The store in the loop sits in the branch delay slot, so it runs
     after the count has been decremented; point the destination
     ALIGN bytes further on to compensate.  */
  zoperands[0] = operands[0];
  zoperands[3] = plus_constant (operands[0], align);
  output_load_address (zoperands);

  xoperands[3] = gen_rtx (CONST_INT, VOIDmode, movstrsi_label++);
  xoperands[4] = gen_rtx (CONST_INT, VOIDmode, align);
  if (align == 1)
    output_asm_insn ("\nLm%3:\n\tldub [%1+%2],%%g1\n\tsubcc %2,%4,%2\n"
                     "\tbge Lm%3\n\tstb %%g1,[%0+%2]", xoperands);
  else if (align == 2)
    output_asm_insn ("\nLm%3:\n\tlduh [%1+%2],%%g1\n\tsubcc %2,%4,%2\n"
                     "\tbge Lm%3\n\tsth %%g1,[%0+%2]", xoperands);
  else
    output_asm_insn ("\nLm%3:\n\tld [%1+%2],%%g1\n\tsubcc %2,%4,%2\n"
                     "\tbge Lm%3\n\tst %%g1,[%0+%2]", xoperands);
  return "";
}