mcg@mipon2.intel.com (Steven McGeady) (05/23/89)
Here is a note I sent some time ago; I never saw it reflected onto the net. I am posting three messages: this one, which contains changes I needed to make to the base; a second, which contains suggestions for improvements in the base; and a third, which contains mods for gcc.c to provide environment variable support for locating passes and files.

I have not posted the machine-specific files for the 960. If anyone wants them they can write to me.

S. McGeady
Intel Corp.
(503) 696-4393

----------

To: rms@wheaties.ai.mit.edu
Subject: [MSG 1 of 3] Changes to 1.34 base to accommodate Intel 960 processor
Date: Fri, 21 Apr 89 13:45:42 PDT
From: mcg

Here are the changes that I have needed to make to the 1.34 base to make the retarget of GCC to the Intel 80960 work correctly. I will try to explain the rationale for each.

1. Changes involving the calling sequence, in expr.c and stmt.c.

The explanation of these changes requires a brief explanation of the calling sequence for the 960. For more information on the calling sequence, ask me, or read my paper in the recent Compcon (San Francisco) Proceedings.

The 960 passes parameters in registers g0 through g11. Registers g0 to g7 are presumed 'trashed' (without a well-defined value on return). Registers g8 through g11 are 'trashed' only if they are actually used for incoming parameters. Parameters of 1, 2, and 4 words are passed in the registers, *even if they are aggregates*. If all parameters fit in the registers, no stack space is allocated by the caller.

Parameters larger than four words, and parameters which do not wholly fit in the registers, are passed on the stack (i.e. FUNCTION_ARG_PARTIAL_NREGS is not used). Parameters subsequent to a stack parameter are also passed on the stack. The stack area created for these parameters is called an 'argument block'. The argument block, when created, always contains enough room for the 12 parameter registers to be stored (48 bytes).
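The placement rules described so far can be sketched in C. This is only an illustration under stated assumptions, not code from the 960 port: the names arg_state, placement, arg_state_init and place_param are hypothetical, the alignment of two- and four-word parameters to even and quad register boundaries is inferred from the example allocations given later in this note, and three-word parameters (which the note does not cover) are simply sent to the stack.

```c
#include <assert.h>

#define NPARM_REGS 12          /* g0..g11 */
#define WORD_BYTES 4
#define ARG_BLOCK_MIN_SIZE 48  /* room for the 12 parameter registers */

/* Hypothetical bookkeeping for one call; not the CUM type from tm-i960.h. */
struct arg_state {
    int next_reg;      /* next free parameter register, 0 means g0 */
    int on_stack;      /* nonzero once any parameter went to the stack */
    int stack_offset;  /* next free byte in the argument block */
};

struct placement {
    int reg;     /* first register used, or -1 if passed on the stack */
    int offset;  /* byte offset of the parameter in the argument block */
};

void arg_state_init(struct arg_state *st)
{
    st->next_reg = 0;
    st->on_stack = 0;
    st->stack_offset = ARG_BLOCK_MIN_SIZE;
}

struct placement place_param(struct arg_state *st, int size_bytes)
{
    struct placement p;
    int words, first;

    assert(size_bytes > 0);
    words = (size_bytes + WORD_BYTES - 1) / WORD_BYTES;

    /* Only 1-, 2- and 4-word parameters are register candidates, and only
       while no earlier parameter has been forced onto the stack. */
    if (!st->on_stack && (words == 1 || words == 2 || words == 4)) {
        /* Assumption: align the first register to the parameter size. */
        first = (st->next_reg + words - 1) / words * words;
        if (first + words <= NPARM_REGS) {
            st->next_reg = first + words;
            p.reg = first;
            p.offset = first * WORD_BYTES;  /* home slot in the block */
            return p;
        }
    }

    /* Too large, or does not wholly fit, or follows a stack parameter. */
    st->on_stack = 1;
    p.reg = -1;
    p.offset = st->stack_offset;
    st->stack_offset += words * WORD_BYTES;
    return p;
}
```

Feeding the sizes 4, 8, 16 and 100 bytes through place_param in order gives g0, g2, g4 and a stack slot at offset 48, which agrees with the first example allocation in this note.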
Register g14 is used to pass the location of the argument block. If there is no argument block, g14 must contain zero. This allows the calling of non-prototyped varargs routines. A varargs routine creates an argument block if g14 is zero, and copies the 12 parameter registers to the first 48 bytes of the argument block in any case. Standard varargs macros may now walk the argument block in a straightforward fashion. Most routines never use the argument block, and g14's value is considered preserved when not used, so it is normally not touched.

So, in GCC, some example allocations:

    struct quad { int i,j,k,l;};
    struct big { char c[100];};

    foo(int i, double d, struct quad s, struct big x)

        parm    register(s)    argument block offset
        i       g0             0(g14)
        d       g2,g3          8(g14)
        s       g4..g7         16(g14)
        x       -              48(g14)

    foo(struct big x, int i)

        parm    register(s)    argument block offset
        x       -              48(g14)
        i       -              148(g14)

I was unable to convince both expr.c and stmt.c to correctly implement this convention without modifying them. REG_PARM_STACK_SPACE, if defined, would *always* allocate stack space in the caller, even when none was needed. FIRST_PARM_CALLER_OFFSET could not be simply set to 48, because then the callee (set up in stmt.c) would assume that the first parameter was there, even if it was passed in a register.

In the following changes, the macros FUNCTION_ARG_FIX_OFFSET and FUNCTION_ARG_CALLER_FIX_OFFSET are defined in tm.h as follows:

#define FUNCTION_ARG_CALLER_FIX_OFFSET(CUM,SIZE,VAR,MODE,OFF) \
    ((CUM.ca_nstackparms == 0) ? \
     (((SIZE > 16) || VAR || (CUM.ca_cum >= 12)) ? \
      ARG_BLOCK_MIN_SIZE : OFF) : OFF)

#define FUNCTION_ARG_FIX_OFFSET(CUM,SIZE,VAR,MODE,OFF) \
    ((CUM.ca_nstackparms == 0) ? \
     (((SIZE > 16) || VAR || (CUM.ca_cum >= 12)) ? \
      ARG_BLOCK_MIN_SIZE : OFF) : OFF)

The macro REG_PARM_STACK_FIX fixes up the size of the stack after allocation so that the caller does not allocate space that is unnecessary:

#define REG_PARM_STACK_FIX(CUM,SIZE,VAR) \
    ((((CUM.ca_cum) >= NPARM_REGS) || (VAR)) ? \
     (SIZE)+ARG_BLOCK_MIN_SIZE-(CUM.ca_nregparms*4) \
     : 0)

(For the definition of CUM, see the attached tm-i960.h)

Also, there was a typo in the parameter fntype (which I believe should have been fndecl) passed to FIRST_PARM_CALLER_OFFSET(). I don't understand why REG_PARM_STACK_SPACE was linked with the definition of FIRST_PARM_CALLER_OFFSET. My macros could have gotten around this, but it seemed an unnecessary linkage.

*** expr.c Fri Mar 24 13:08:34 1989
--- ../gcc/expr.c Tue Mar 7 08:59:22 1989
***************
*** 3878,3902 ****
      /* If we know nothing, treat all args as named. */
      n_named_args = num_actuals;

    /* Make a vector to hold all the information about each arg. */
    args = (struct arg_data *) alloca (num_actuals * sizeof (struct arg_data));
    bzero (args, num_actuals * sizeof (struct arg_data));

    args_size.constant = 0;
    args_size.var = 0;
  #ifdef FIRST_PARM_CALLER_OFFSET
!   args_size.constant = FIRST_PARM_CALLER_OFFSET (fndecl);
! #ifdef REG_PARM_STACK_SPACE
    stack_count_regparms = 1;
  #endif
- #endif
    starting_args_size = args_size.constant;

    /* In this loop, we consider args in the order they are written.
       We fill up ARGS from the front of from the back if necessary
       so that in any case the first arg to be pushed ends up at the front. */

  #ifdef PUSH_ARGS_REVERSED
    i = num_actuals - 1, inc = -1;
    /* In this case, must reverse order of args
       so that we compute and push the last arg first. */
--- 3866,3888 ----
      /* If we know nothing, treat all args as named. */
      n_named_args = num_actuals;

    /* Make a vector to hold all the information about each arg. */
    args = (struct arg_data *) alloca (num_actuals * sizeof (struct arg_data));
    bzero (args, num_actuals * sizeof (struct arg_data));

    args_size.constant = 0;
    args_size.var = 0;
  #ifdef FIRST_PARM_CALLER_OFFSET
!   args_size.constant = FIRST_PARM_CALLER_OFFSET (fntype);
    stack_count_regparms = 1;
  #endif
    starting_args_size = args_size.constant;

    /* In this loop, we consider args in the order they are written.
       We fill up ARGS from the front of from the back if necessary
       so that in any case the first arg to be pushed ends up at the front. */

  #ifdef PUSH_ARGS_REVERSED
    i = num_actuals - 1, inc = -1;
    /* In this case, must reverse order of args
       so that we compute and push the last arg first. */
***************
*** 3991,4051 ****
                args[i].stack = const0_rtx;
                BLKmode_parms_sizes += TREE_INT_CST_LOW (size);

                /* If this parm's location is "below" the nominal stack pointer,
                   note to decrement the stack pointer while it is computed. */
  #ifdef FIRST_PARM_CALLER_OFFSET
                if (BLKmode_parms_first_offset == 0)
                  BLKmode_parms_first_offset
                    /* If parameter's offset is variable, assume the worst. */
                    = (args[i].offset.var
!                      ? FIRST_PARM_CALLER_OFFSET (fndecl)
                       : args[i].offset.constant);
  #endif
              }
          }

        /* If a part of the arg was put into registers,
           don't include that part in the amount pushed. */
        if (! stack_count_regparms)
          args[i].size.constant
            -= ((args[i].partial * UNITS_PER_WORD)
                / (PARM_BOUNDARY / BITS_PER_UNIT)
                * (PARM_BOUNDARY / BITS_PER_UNIT));

        /* Update ARGS_SIZE, the total stack space for args so far. */

        args_size.constant += args[i].size.constant;
-
        if (args[i].size.var)
          {
            ADD_PARM_SIZE (args_size, args[i].size.var);
          }
- #ifdef FUNCTION_ARG_FIX_OFFSET
-       args[i].offset.constant = FUNCTION_ARG_FIX_OFFSET(args_so_far,
-               args[i].size.constant,args[i].size.var,
-               TYPE_MODE(type),args[i].offset.constant);
- #endif
-

        /* Increment ARGS_SO_FAR, which has info about which arg-registers
           have been used, etc. */

        FUNCTION_ARG_ADVANCE (args_so_far, TYPE_MODE (type), type,
                              i < n_named_args);
-
      }
-
- #ifdef REG_PARM_STACK_FIX
-   args_size.constant = REG_PARM_STACK_FIX(args_so_far,
-               args_size.constant,args_size.var);
- #endif

    /* If we would have to push a partially-in-regs parm
       before other stack parms, preallocate stack space instead. */
    must_preallocate = 0;
    {
      int partial_seen = 0;
      for (i = 0; i < num_actuals; i++)
        {
          if (args[i].partial > 0)
            partial_seen = 1;
--- 3977,4024 ----
                args[i].stack = const0_rtx;
                BLKmode_parms_sizes += TREE_INT_CST_LOW (size);

                /* If this parm's location is "below" the nominal stack pointer,
                   note to decrement the stack pointer while it is computed. */
  #ifdef FIRST_PARM_CALLER_OFFSET
                if (BLKmode_parms_first_offset == 0)
                  BLKmode_parms_first_offset
                    /* If parameter's offset is variable, assume the worst. */
                    = (args[i].offset.var
!                      ? FIRST_PARM_CALLER_OFFSET (fntype)
                       : args[i].offset.constant);
  #endif
              }
          }

        /* If a part of the arg was put into registers,
           don't include that part in the amount pushed. */
        if (! stack_count_regparms)
          args[i].size.constant
            -= ((args[i].partial * UNITS_PER_WORD)
                / (PARM_BOUNDARY / BITS_PER_UNIT)
                * (PARM_BOUNDARY / BITS_PER_UNIT));

        /* Update ARGS_SIZE, the total stack space for args so far. */

        args_size.constant += args[i].size.constant;
        if (args[i].size.var)
          {
            ADD_PARM_SIZE (args_size, args[i].size.var);
          }

        /* Increment ARGS_SO_FAR, which has info about which arg-registers
           have been used, etc. */

        FUNCTION_ARG_ADVANCE (args_so_far, TYPE_MODE (type), type,
                              i < n_named_args);
      }

    /* If we would have to push a partially-in-regs parm
       before other stack parms, preallocate stack space instead. */
    must_preallocate = 0;
    {
      int partial_seen = 0;
      for (i = 0; i < num_actuals; i++)
        {
          if (args[i].partial > 0)
            partial_seen = 1;

*** stmt.c Sun Mar 12 17:13:00 1989
--- ../gcc/stmt.c Tue Mar 7 09:00:30 1989
***************
*** 3717,3747 ****
        passed_mode = TYPE_MODE (DECL_ARG_TYPE (parm));
        nominal_mode = TYPE_MODE (TREE_TYPE (parm));

        /* Get this parm's offset as an rtx. */
        stack_offset = stack_args_size;
        stack_offset.constant += first_parm_offset;

        /* Find out if the parm needs padding, and whether above or below. */
        where_pad = FUNCTION_ARG_PADDING (passed_mode,
!               expand_expr(size_in_bytes(DECL_ARG_TYPE (parm)),
                            0, VOIDmode, 0));

- #ifdef FUNCTION_ARG_CALLER_FIX_OFFSET
-       /* added for 960 varargs calling convntion - 3/9/89 - mcg */
-       stack_offset.constant = FUNCTION_ARG_CALLER_FIX_OFFSET(args_so_far,
-               int_size_in_bytes(DECL_ARG_TYPE(parm)), 0,
-               TYPE_MODE(TREE_TYPE(parm)),
-               stack_offset.constant);
- #endif
-
        /* If it is padded below, adjust the stack address
           upward over the padding. */

        if (where_pad == downward)
          {
            if (passed_mode != BLKmode)
              {
                if (GET_MODE_BITSIZE (passed_mode) % PARM_BOUNDARY)
                  stack_offset.constant
                    += (((GET_MODE_BITSIZE (passed_mode) + PARM_BOUNDARY - 1)
                         / PARM_BOUNDARY * PARM_BOUNDARY / BITS_PER_UNIT)
--- 3717,3739 ----
        passed_mode = TYPE_MODE (DECL_ARG_TYPE (parm));
        nominal_mode = TYPE_MODE (TREE_TYPE (parm));

        /* Get this parm's offset as an rtx. */
        stack_offset = stack_args_size;
        stack_offset.constant += first_parm_offset;

        /* Find out if the parm needs padding, and whether above or below. */
        where_pad = FUNCTION_ARG_PADDING (passed_mode,
!               expand_expr (size_in_bytes (DECL_ARG_TYPE (parm)),
                             0, VOIDmode, 0));

        /* If it is padded below, adjust the stack address
           upward over the padding. */

        if (where_pad == downward)
          {
            if (passed_mode != BLKmode)
              {
                if (GET_MODE_BITSIZE (passed_mode) % PARM_BOUNDARY)
                  stack_offset.constant
                    += (((GET_MODE_BITSIZE (passed_mode) + PARM_BOUNDARY - 1)
                         / PARM_BOUNDARY * PARM_BOUNDARY / BITS_PER_UNIT)
***************

2. Bug in function inlining

There was a persistent bug in function inlining which I finally tracked down to the following code in integrate.c.
The code that initially sets up parm_map:

  if (DECL_ARGUMENTS (fndecl))
    {
      tree decl = DECL_ARGUMENTS (fndecl);
      int offset = FUNCTION_ARGS_SIZE (header);

      parm_map = (rtx *)alloca ((offset / UNITS_PER_WORD) * sizeof (rtx));
      bzero (parm_map, (offset / UNITS_PER_WORD) * sizeof (rtx));
      parm_map -= first_parm_offset / UNITS_PER_WORD;

      for (formal = decl, i = 0; formal; formal = TREE_CHAIN (formal), i++)
        {
          /* Create an entry in PARM_MAP that says what pseudo register
             is associated with an address we might compute. */
          if (DECL_OFFSET (formal) >= 0)

pays no attention to whether or not parm_map will have any real length. FUNCTION_ARGS_SIZE will be zero if REG_PARM_STACK_SPACE is undefined and all the parameters are in registers. Nevertheless, a register parameter may get a spot on the stack in the called (integrable) routine, usually due to its being addressed. The address will be close to STARTING_FRAME_OFFSET. STARTING_FRAME_OFFSET has nothing whatsoever to do with FIRST_PARM_OFFSET, or with the argument block at all, which (in this case) is relative to an entirely different register.

In that case, parm_map, which is effectively zero-length, is thought to contain a mapping for the parameter, even though it came in a register and has no mapping at all, or, if it does contain a mapping, it is at FIRST_PARM_OFFSET into parm_map. A bogus rtx will be generated by referencing through either a NIL or an uninitialized parm_map, and the compiler will eventually crash.

This hasn't happened in the SPARC version because REG_PARM_STACK_SPACE is defined.

It's not clear to me that the patches below contain the correct solution for the general case, but I thought I'd point the problem out.
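The failure mode can be pictured with a small stand-alone C sketch. It is not GCC code: struct parm_map and lookup_parm are hypothetical names, pseudo registers are modelled as plain integers rather than rtx values, and the upper bounds check is an extra safeguard that goes beyond what the patch does. The point is only that a zero-length map must be detected before it is indexed.

```c
#include <assert.h>
#include <stddef.h>

#define UNITS_PER_WORD 4

/* Hypothetical stand-in for parm_map: one pseudo-register number per
   stack word, or a NULL table when every parameter arrived in a register. */
struct parm_map {
    const int *pseudo;
    int nwords;
};

/* Return the pseudo register mapped at a frame offset, or -1 when there
   is no mapping and the caller should build a frame-relative address. */
int lookup_parm(const struct parm_map *map, int offset,
                int starting_frame_offset)
{
    int c, index;

    assert(map != NULL);

    /* Bias by the frame start, as the patched code does. */
    c = offset - starting_frame_offset;

    /* Guard: never index a missing or zero-length map. */
    if (map->pseudo == NULL || map->nwords == 0 || c <= 0)
        return -1;

    index = c / UNITS_PER_WORD;
    if (index >= map->nwords)   /* assumption: added bounds check */
        return -1;

    return map->pseudo[index];
}
```

With an empty map every lookup reports "no mapping", so the caller falls through to the ordinary frame-pointer-relative address instead of dereferencing garbage.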
*** integrate.c Fri Mar 17 13:32:04 1989
--- ../gcc/integrate.c Tue Mar 7 08:59:38 1989
***************
*** 1173,1195 ****
                || XEXP (orig, 1) == arg_pointer_rtx)))
          {
            if (XEXP (orig, 0) == frame_pointer_rtx
                || XEXP (orig, 0) == arg_pointer_rtx)
              copy = XEXP (orig, 1);
            else
              copy = XEXP (orig, 0);

            if (GET_CODE (copy) == CONST_INT)
              {
!               int c = INTVAL (copy) - STARTING_FRAME_OFFSET;

!               if (parm_map && (c > 0))
                  {
                    copy = parm_map[c / UNITS_PER_WORD];
                    return XEXP (copy, 0);
                  }
                return gen_rtx (PLUS, mode,
                                frame_pointer_rtx,
                                gen_rtx (CONST_INT, SImode,
                                         c + fp_delta));
              }
            copy = copy_rtx_and_substitute (copy);
--- 1173,1195 ----
                || XEXP (orig, 1) == arg_pointer_rtx)))
          {
            if (XEXP (orig, 0) == frame_pointer_rtx
                || XEXP (orig, 0) == arg_pointer_rtx)
              copy = XEXP (orig, 1);
            else
              copy = XEXP (orig, 0);

            if (GET_CODE (copy) == CONST_INT)
              {
!               int c = INTVAL (copy);

!               if (c > 0)
                  {
                    copy = parm_map[c / UNITS_PER_WORD];
                    return XEXP (copy, 0);
                  }
                return gen_rtx (PLUS, mode,
                                frame_pointer_rtx,
                                gen_rtx (CONST_INT, SImode,
                                         c + fp_delta));
              }
            copy = copy_rtx_and_substitute (copy);
***************
*** 1269,1291 ****
          {
            rtx reg;

            if (XEXP (copy, 0) == frame_pointer_rtx
                || XEXP (copy, 0) == arg_pointer_rtx)
              reg = XEXP (copy, 0), copy = XEXP (copy, 1);
            else
              reg = XEXP (copy, 1), copy = XEXP (copy, 0);
            if (GET_CODE (copy) == CONST_INT)
              {
!               int c = INTVAL (copy) - STARTING_FRAME_OFFSET;

!               if (reg == arg_pointer_rtx && c >= first_parm_offset && parm_map)
                  {
                    int index = c / UNITS_PER_WORD;
                    int offset = c % UNITS_PER_WORD;

                    /* If we are referring to the middle of a multiword parm,
                       find the beginning of that parm.
                       OFFSET gets the offset of the reference from
                       the beginning of the parm. */

                    while (parm_map[index] == 0)
--- 1269,1291 ----
          {
            rtx reg;

            if (XEXP (copy, 0) == frame_pointer_rtx
                || XEXP (copy, 0) == arg_pointer_rtx)
              reg = XEXP (copy, 0), copy = XEXP (copy, 1);
            else
              reg = XEXP (copy, 1), copy = XEXP (copy, 0);
            if (GET_CODE (copy) == CONST_INT)
              {
!               int c = INTVAL (copy);

!               if (reg == arg_pointer_rtx && c >= first_parm_offset)
                  {
                    int index = c / UNITS_PER_WORD;
                    int offset = c % UNITS_PER_WORD;

                    /* If we are referring to the middle of a multiword parm,
                       find the beginning of that parm.
                       OFFSET gets the offset of the reference from
                       the beginning of the parm. */

                    while (parm_map[index] == 0)

3. Aggregates of more than 2 words handled in registers

The 960 has ldq, stq, and movq (load quad, store quad, and move quad) operations, which, respectively, load four words from memory, store four words to memory, or move four (aligned) registers to another four registers. It is desirable that 4-word structures be supported by the compiler, especially in a package that we have written which emulates IEEE extended-precision floating-point (which is also directly supported by one of our chips).

Changes in optabs.c and stor-layout.c were needed to support a 'movti' pattern in the machine description and the appropriate coercion of these BLKmode types into TImode types. Similar work is yet to be done for TFmode and 'long double' (extended precision) types, which the compiler appears to partially support.
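The coercion of a four-word aggregate into TImode can be illustrated with a reduced version of the mode search in stor-layout.c. This is a sketch under assumptions: the enum below lists only five integer modes (the real enum machine_mode has more entries), BITS_PER_UNIT is taken to be 8, and agg_mode_sketch with its explicit `widest` argument is a hypothetical stand-in for agg_mode() combined with WIDEST_AGGREGATE_MODE.

```c
#include <assert.h>

/* Reduced mode list, narrowest first; BLK means "no scalar mode fits". */
enum mode { QI, HI, SI, DI, TI, BLK };

/* Size of each scalar mode in bytes, indexed by enum mode. */
static const unsigned int mode_bytes[] = { 1, 2, 4, 8, 16 };

/* Pick the scalar mode whose size equals the aggregate's size, looking
   no further than `widest`; otherwise fall back to BLK. */
enum mode agg_mode_sketch(unsigned int size_bits, enum mode widest)
{
    unsigned int units;
    int t;

    assert(widest <= TI);

    if (size_bits % 8 != 0)
        return BLK;

    units = size_bits / 8;
    for (t = QI; t <= (int) widest; t++)
        if (mode_bytes[t] == units)
            return (enum mode) t;

    return BLK;
}
```

With the search capped at DI, a 128-bit struct stays BLK and is handled as a block in memory; raising the cap to TI lets the same struct be treated as a single four-word value, which is what allows a quad move to be used for it.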
The change to stor-layout.c is simply changing the constant DImode in agg_mode() to WIDEST_AGGREGATE_MODE, which is set in tm.h.

*** optabs.c Sun Mar 12 17:12:57 1989
--- ../gcc/optabs.c Tue Mar 7 09:00:09 1989
***************
*** 1944,1978 ****
      mov_optab->handlers[(int) HImode].insn_code = CODE_FOR_movhi;
  #endif
  #ifdef HAVE_movsi
    if (HAVE_movsi)
      mov_optab->handlers[(int) SImode].insn_code = CODE_FOR_movsi;
  #endif
  #ifdef HAVE_movdi
    if (HAVE_movdi)
      mov_optab->handlers[(int) DImode].insn_code = CODE_FOR_movdi;
  #endif
- #ifdef HAVE_movti
-   if (HAVE_movti)
-     mov_optab->handlers[(int) TImode].insn_code = CODE_FOR_movti;
- #endif
  #ifdef HAVE_movsf
    if (HAVE_movsf)
      mov_optab->handlers[(int) SFmode].insn_code = CODE_FOR_movsf;
  #endif
  #ifdef HAVE_movdf
    if (HAVE_movdf)
      mov_optab->handlers[(int) DFmode].insn_code = CODE_FOR_movdf;
- #endif
- #ifdef HAVE_movtf
-   if (HAVE_movtf)
-     mov_optab->handlers[(int) TFmode].insn_code = CODE_FOR_movtf;
  #endif

  #ifdef HAVE_movstrictqi
    if (HAVE_movstrictqi)
      movstrict_optab->handlers[(int) QImode].insn_code = CODE_FOR_movstrictqi;
  #endif
  #ifdef HAVE_movstricthi
    if (HAVE_movstricthi)
      movstrict_optab->handlers[(int) HImode].insn_code = CODE_FOR_movstricthi;
  #endif
--- 1944,1970 ----

*** stor-layout.c Sun Mar 12 17:13:01 1989
--- ../gcc/stor-layout.c Tue Mar 7 09:00:30 1989
***************
*** 176,196 ****
  enum machine_mode
  agg_mode (size)
       unsigned int size;
  {
    register int units = size / BITS_PER_UNIT;
    register enum machine_mode t;

    if (size % BITS_PER_UNIT != 0)
      return BLKmode;

!   for (t = QImode; (int) t <= (int) WIDEST_AGGREGATE_MODE;
         t = (enum machine_mode) ((int) t + 1))
      if (GET_MODE_SIZE (t) == units)
        return t;

    return BLKmode;
  }

  /* Return an INTEGER_CST with value V and type from `sizetype'. */

  tree
--- 176,196 ----
  enum machine_mode
  agg_mode (size)
       unsigned int size;
  {
    register int units = size / BITS_PER_UNIT;
    register enum machine_mode t;

    if (size % BITS_PER_UNIT != 0)
      return BLKmode;

!   for (t = QImode; (int) t <= (int) DImode;
         t = (enum machine_mode) ((int) t + 1))
      if (GET_MODE_SIZE (t) == units)
        return t;

    return BLKmode;
  }

  /* Return an INTEGER_CST with value V and type from `sizetype'. */

  tree
*****************

4. "real.h" is included in "tree.h", and also included by "insn-output.c"

So, if aux-output.c must include "tree.h" for some reason, multiple definitions occur. A simple guard define fixes this:

*** real.h Sun Mar 12 18:20:28 1989
--- ../gcc/real.h Tue Mar 7 09:00:19 1989
***************
*** 15,27 ****
  You should have received a copy of the GNU General Public License
  along with GNU CC; see the file COPYING. If not, write to
  the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. */

- #ifndef __REAL_H__
- #define __REAL_H__ 1
-
  /* If we are not cross-compiling, use a `double' to represent the
     floating-point value. Otherwise, use some other type
     (probably a struct containing an array of longs). */
  #ifndef REAL_VALUE_TYPE
  #define REAL_VALUE_TYPE double
--- 15,24 ----
***************
*** 82,89 ****
  #define CONST_DOUBLE_CHAIN(r) XEXP (r, 1)
  /* The MEM which represents this CONST_DOUBLE's value in memory,
     or const0_rtx if no MEM has been made for it yet,
     or cc0_rtx if it is not on the chain. */
  #define CONST_DOUBLE_MEM(r) XEXP (r, 0)
-
-
- #endif /* __REAL_H__ */
--- 79,83 ----

**************************************************************************

I have taken great care to minimize the changes to the base compiler. A companion note lists suggested improvements which I have not yet implemented or finished implementing, and features that would have eased my implementation effort substantially.

S. McGeady
zdenko@csd4.milw.wisc.edu (Zdenko Tomasic) (05/24/89)
The i960 gcc is a nice attempt. Is anybody working on an i860 gcc implementation? (Hey, you just might beat the integrated system-hardware implementation. Software before the integrated hardware! Compiler before the system! What a concept!)

--
___________________________________________________________________
Zdenko Tomasic, UWM, Chem. Dept., P.O. Box 413, Milwaukee, WI 53201
UUCP: uwvax!uwmcsd1!uwmcsd4!zdenko
ARPA: zdenko@csd4.milw.wisc.edu