mark@quintus.UUCP (Mark Spotswood) (06/06/90)
I have a question about memory initialization in Mips assembler. I would like to initialize a memory location to contain a value which is the difference between the addresses of two other labels, something like this:

    a:
            .word b-c

where b and c are two other labels. If I use the above syntax, the Mips assembler will signal an error saying that the symbol 'c' must be an absolute value.

The Mips assembler will allow things like:

    a:
            .word b

or

    a:
            .word b-2

If the assembler can figure out what b and b-2 will be, why can't it figure out what b-c will be? Is there a way to do what I want in Mips assembler?

-mark (mark@quintus.uucp)
meissner@osf.org (Michael Meissner) (06/06/90)
In article <1380@quintus.UUCP> mark@quintus.UUCP (Mark Spotswood) writes:
| I have a question about memory initialization in Mips assembler. I would
| like to initialize a memory location to contain a value which is the
| difference between the addresses of two other labels, something like this:
|
|     a:
|             .word b-c
| ...
| If the assembler can figure out what b and b-2 will be, why can't it figure
| out what b-c will be? Is there a way to do what I want in Mips assembler?

Not that I know of. I was prototyping OSF/1 shared libraries with the MIPS assembler and wanted the offset of an item in .data from the start of .data (for which I had a pointer). I finally gave up and had GCC calculate the offset itself.

--
Michael Meissner    email: meissner@osf.org    phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so
dave@imax.com (Dave Martindale) (06/12/90)
In article <1380@quintus.UUCP> mark@quintus.UUCP (Mark Spotswood) writes:
| ...
|     a:
|             .word b-c
| ...
| If the assembler can figure out what b and b-2 will be, why can't it figure
| out what b-c will be? Is there a way to do what I want in Mips assembler?

I don't know if this is true of the MIPS software specifically, but it is a limitation with some systems:

Initializing a location in memory with the difference between two external addresses requires some way for the assembler to tell the linker that it should calculate the difference between two external symbols and store the result in this location. Some object file formats simply have no way of specifying this computation.

For example, suppose you have an object file format in which every word of generated code has an associated tag that says one of:

  - this word is absolute; do not relocate it
  - this word is an offset from external symbol #N; add the value of that external symbol at link time
  - this word is an offset from the beginning of the current module; add in this module's starting address at link time

The offsets or absolute values are stored in the instruction stream, and the relocation information is stored elsewhere. This format is simple and can handle most of the ordinary kinds of relocation.
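The three-tag format described above can be modeled in a few lines. This is a minimal sketch under stated assumptions: the tag names, the `Reloc` record, and the functions `link_word` and `encode` are hypothetical, and each word is treated as a plain integer. It shows why `b-2` fits (one external symbol plus an addend stored in the word) while `b-c` has no encoding at all.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical tags mirroring the three cases in the list above.
ABSOLUTE = "absolute"   # do not relocate
EXTERNAL = "external"   # add the link-time value of one external symbol
MODULE = "module"       # add this module's starting address


@dataclass(frozen=True)
class Reloc:
    tag: str
    symbol: Optional[str] = None  # only meaningful for EXTERNAL


def link_word(stored: int, reloc: Reloc, symbols: Dict[str, int], module_base: int) -> int:
    """Linker side: combine the value stored in the instruction stream
    with the relocation tag kept elsewhere in the object file."""
    if reloc.tag == ABSOLUTE:
        return stored
    if reloc.tag == EXTERNAL:
        return stored + symbols[reloc.symbol]
    if reloc.tag == MODULE:
        return stored + module_base
    raise ValueError(f"unknown tag {reloc.tag!r}")


def encode(addend: int, plus: List[str], minus: List[str]) -> Tuple[int, Reloc]:
    """Assembler side (hypothetical): try to express
    addend + sum(plus) - sum(minus), where plus/minus name external
    symbols, as one stored word plus one tag."""
    if not plus and not minus:
        return addend, Reloc(ABSOLUTE)
    if len(plus) == 1 and not minus:
        return addend, Reloc(EXTERNAL, plus[0])
    raise ValueError("this format allows at most one added external symbol per word")


# .word b-2 : one external symbol plus a constant addend, so it is representable
stored, reloc = encode(-2, ["b"], [])
print(hex(link_word(stored, reloc, {"b": 0x1000}, module_base=0)))  # 0xffe

# .word b-c : two link-time symbols, and no tag can say that
try:
    encode(0, ["b"], ["c"])
except ValueError as err:
    print("b-c:", err)
```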
However, since each chunk of relocation information specifies at most *one* external symbol whose link-time value can be added to the corresponding instruction-stream word, there is no way to specify that one symbol's value should be subtracted from another's. To let the assembler and linker handle constant expressions that contain more than one reference to an address or size determined at link time, the object file format must allow almost arbitrary expressions to be passed from the assembler to the linker.
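One way to pass "almost arbitrary expressions" from assembler to linker is to store a small expression tree with each relocated word and evaluate it once every symbol has its final value. The sketch below assumes a hypothetical tuple encoding (`("sym", name)`, `("const", n)`, and binary `+`, `-`, `/` nodes) and a hypothetical `eval_reloc` function; it is not the MIPS COFF format or any other real object format.

```python
from typing import Dict, Tuple

# Hypothetical relocation expression: a nested tuple such as
#   ("/", ("-", ("sym", "b"), ("sym", "c")), ("const", 4))
Expr = Tuple


def eval_reloc(expr: Expr, symbols: Dict[str, int]) -> int:
    """Evaluate a link-time relocation expression once all symbols are known."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "sym":
        return symbols[expr[1]]
    lhs = eval_reloc(expr[1], symbols)
    rhs = eval_reloc(expr[2], symbols)
    if kind == "+":
        return lhs + rhs
    if kind == "-":
        return lhs - rhs
    if kind == "/":
        # truncate toward zero, as C-style integer division does
        q = abs(lhs) // abs(rhs)
        return q if (lhs >= 0) == (rhs > 0) else -q
    raise ValueError(f"unknown operator {kind!r}")


b_minus_c = ("-", ("sym", "b"), ("sym", "c"))
print(eval_reloc(b_minus_c, {"b": 0x1040, "c": 0x1000}))                       # 64
print(eval_reloc(("/", b_minus_c, ("const", 4)), {"b": 0x1040, "c": 0x1000}))  # 16
```

The price of this flexibility is a more complex linker, which is presumably why simpler formats stop at one symbol per word.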
meissner@osf.org (Michael Meissner) (06/12/90)
In article <1990Jun11.213554.15606@imax.com> dave@imax.com (Dave Martindale) writes:
| In article <1380@quintus.UUCP> mark@quintus.UUCP (Mark Spotswood) writes:
| | ...
| | If the assembler can figure out what b and b-2 will be, why can't it figure
| | out what b-c will be? Is there a way to do what I want in Mips assembler?
|
| I don't know if this is true of the MIPS software specifically, but it
| is a limitation with some systems:
|
| Initializing a location in memory with the difference between two external
| addresses requires some way for the assembler to tell the linker that it
| should calculate the difference between two external symbols and store
| the result in this location. Some object file formats simply have
| no way of specifying this computation.

Note that the MIPS assembler does not allow subtraction even when the items are in the same section and the difference is constant. For example:

            .data
    b:      .word 0
            .word 1
            .word 2
            #...
    diff:   .word (.-b)/4

I suspect that part of the reason is that the MIPS assembler reorganizes the code, and the first pass of the assembler has no means of telling the second pass to do the appropriate back-patching after any rearrangement.

I've also gnashed my teeth over the fact that the MIPS assembler does not allow instructions to be put into the data section.
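The `(.-b)/4` example only has a well-defined value once the layout is final. A toy two-pass sketch shows the back-patching idea: record the difference as a fixup, assign all addresses, and only then compute the value. It assumes every non-label item is one 4-byte word, and the item kinds (`label`, `word`, `diff`) and function names are hypothetical, not the MIPS assembler's internals. Inserting one extra word before the fixup (as a scheduler inserting a nop would) changes the result, which is why the value cannot be frozen before reordering is done.

```python
from typing import Dict, List, Tuple

Item = Tuple  # hypothetical: ("label", name) | ("word", value) | ("diff", hi, lo, divisor)
WORD = 4      # assumption: every emitted item is one 4-byte word


def assign_addresses(items: List[Item], base: int = 0) -> Dict[str, int]:
    """Pass 1: walk the final layout and give every label an address."""
    addrs, pc = {}, base
    for item in items:
        if item[0] == "label":
            addrs[item[1]] = pc
        else:
            pc += WORD
    return addrs


def emit(items: List[Item], base: int = 0) -> List[int]:
    """Pass 2: emit words, back-patching ("diff", hi, lo, div) fixups
    with (addr(hi) - addr(lo)) // div; "." means the current location."""
    addrs = assign_addresses(items, base)
    words, pc = [], base
    for item in items:
        kind = item[0]
        if kind == "label":
            continue
        if kind == "word":
            words.append(item[1])
        elif kind == "diff":
            _, hi, lo, div = item
            hi_addr = pc if hi == "." else addrs[hi]
            words.append((hi_addr - addrs[lo]) // div)
        pc += WORD
    return words


table = [("label", "b"), ("word", 0), ("word", 1), ("word", 2),
         ("label", "diff"), ("diff", ".", "b", 4)]
print(emit(table))  # [0, 1, 2, 3]

# one extra word inserted before the fixup changes the patched value
padded = table[:4] + [("word", 0)] + table[4:]
print(emit(padded))  # [0, 1, 2, 0, 4]
```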
Finally, we just discovered the hard way that the MIPS assembler screws up line numbers if you put non-instructions (such as the table of labels used to implement a switch statement) into .text. This is because the line number information is encoded as a delta from the previous line, and the assembler doesn't count the non-instructions when forming the deltas.
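The line-number problem can be illustrated with a deliberately simplified, hypothetical encoding (not the actual MIPS symbol-table format): one line delta per word in .text, which a debugger accumulates to recover the line for each address. The functions `encode_deltas` and `decode` are made up for this sketch. If non-instruction words such as a switch table get no entry, every word from the table onward maps to the wrong line.

```python
from typing import List, Optional, Tuple

Word = Tuple[bool, int]  # (is_instruction, source line) for each word in .text


def encode_deltas(words: List[Word], count_data: bool) -> List[int]:
    """Emit one line delta per word; with count_data=False, mimic the bug
    where non-instruction words get no entry at all."""
    deltas, prev = [], 0
    for is_insn, line in words:
        if is_insn or count_data:
            deltas.append(line - prev)
            prev = line
    return deltas


def decode(deltas: List[int], nwords: int) -> List[Optional[int]]:
    """Debugger side: accumulate deltas, assuming entry i describes word i.
    Words past the end of the table get None."""
    lines: List[Optional[int]] = []
    cur = 0
    for d in deltas:
        cur += d
        lines.append(cur)
    return lines + [None] * (nwords - len(lines))


# two instructions, a two-word switch table from line 12, two more instructions
text = [(True, 10), (True, 11), (False, 12), (False, 12), (True, 13), (True, 14)]
print(decode(encode_deltas(text, count_data=True), len(text)))
# [10, 11, 12, 12, 13, 14]
print(decode(encode_deltas(text, count_data=False), len(text)))
# [10, 11, 13, 14, None, None]
```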
sjc@key.COM (Steve Correll) (06/12/90)
In article <1380@quintus.UUCP> mark@quintus.UUCP (Mark Spotswood) writes:
| a:
|         .word b-c
|
| If the assembler can figure out what b and b-2 will be, why can't it figure
| out what b-c will be? Is there a way to do what I want in Mips assembler?

In article <1990Jun11.213554.15606@imax.com>, dave@imax.com (Dave Martindale) writes:
> Initializing a location in memory with the difference between two external
> addresses requires some way for the assembler to tell the linker that it
> should calculate the difference between two external symbols and store
> the result in this location. Some object file formats simply have
> no way of specifying this computation.

Indeed, the MIPS COFF object format does not provide relocation for b-c. However, the assembler can't even subtract non-relocatable labels, because it tries to process the operand of .word long before it performs instruction scheduling and reordering, and therefore it doesn't know how many nops it's going to insert, etc.

If you need the subtraction to figure out the offset to a field in a data structure, the .struct directive may well help; but if you need the difference of two labels in an instruction stream, sorry about that.
--
...{sun,pyramid}!pacbell!key!sjc    Steve Correll
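The `.struct` suggestion works because field offsets depend only on declared sizes, not on where code or data finally lands. A rough model of that idea is a running offset counter: each field label takes the current offset, then the counter advances by the field's size. This assumes back-to-back fields with no alignment padding (the real directive's exact rules may differ), and `struct_offsets` and the field names are hypothetical.

```python
from typing import Dict, List, Tuple


def struct_offsets(fields: List[Tuple[str, int]], start: int = 0) -> Tuple[Dict[str, int], int]:
    """Give each field label the current offset, then advance by the
    field's size in bytes. Returns (offsets, total size). No padding."""
    offsets, off = {}, start
    for name, size in fields:
        offsets[name] = off
        off += size
    return offsets, off - start


# hypothetical record of three 4-byte words
offsets, size = struct_offsets([("tag", 4), ("next", 4), ("value", 4)])
print(offsets, size)  # {'tag': 0, 'next': 4, 'value': 8} 12
```

Because every value here is an absolute constant, nothing needs relocation or back-patching after instruction scheduling.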
ph@ama-1.ama.caltech.edu (Paul Hardy) (12/01/90)
I've just started programming in MIPS assembler, and I've written a routine to perform fast matrix multiplies. I am sustaining a rate of approximately 6.7 Mflops on a DECstation 5000, an IRIS 4D (using only one processor), and an ESV 10. All use the 25 MHz MIPS R3000. This is a higher Mflops figure than the vendors claim their machines can do, so I guess I should be pretty happy. However, I'm wondering why it's not going faster. This is probably a question for comp.arch.mips, but there's no such newsgroup.

The main body of the multiply is a triplet of instructions: simultaneously, a load, add, and multiply are being performed on different registers. Since they're not using each other's registers, they should all execute together. According to the MIPS book, a single-precision floating-point multiply takes 6 cycles, but during the last two cycles another multiply can begin, so it effectively takes four cycles if many multiplies occur back-to-back. In reality, about 7 cycles elapse between multiplies. The code looks something like this (where A, B*, C, D, E, F are single-precision floating-point registers, and offset is a hard-coded constant):

        mul.s   A, A, B1
        lwc1    C, offset($BASE)
        add.s   E, E, D
        mul.s   C, C, B2        ## 1 cycle stall if load takes 2 cycles
        etc.

A stalled load will hold up the following multiply if the load takes more than three cycles. Stalling the add shouldn't affect speed at all, since it's working on other data. Sticking nops above all the mul.s instructions didn't make any difference, so I took them out again.

It would seem that loads are taking a long, long time. This is unfortunate, because all data is in cache. The only machine that page faulted during 100,000 iterations of the loop was the E&S machine: 9 times, which is fairly insignificant. This is a trial with 10 x 10 matrices, so all of the data fits in one 1k page. All loads in the inner loop occur from sequential memory locations; this was done in the hope of decreasing access time on subsequent lookups from the same bank in a cache RAM. I write results in integer registers; they don't get written back into the cache until I'm out of registers (I hold about 20 values, so I perform one write every 380 floating-point operations for a 10 x 10 matrix).

Does anyone have any experience with this? Where are the extra 3 cycles going? How long does it _really_ take to load a value from cache? If it does take a lot more than 2 cycles, then I could relax and make the subroutine a lot more flexible.

By the way, this is a very nice assembler language to program in!

--Paul
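Paul's figures can be related with simple arithmetic. Assuming each triple performs two floating-point operations (one mul.s and one add.s) on a 25 MHz clock, Mflops = 25 x 2 / (cycles per triple): about 7 cycles gives roughly 7.1 Mflops, close to the measured 6.7, while the hoped-for 4 cycles would give 12.5. The sketch also counts operations per element of a 10 x 10 product, assuming each dot product starts from its first product (10 multiplies, 9 adds), which matches the quoted 20 x 19 = 380. The function names are hypothetical.

```python
def mflops(clock_mhz: float, cycles_per_triple: float, flops_per_triple: int = 2) -> float:
    """Sustained rate if one mul.s + add.s triple issues every N cycles."""
    return clock_mhz * flops_per_triple / cycles_per_triple


def flops_per_element(n: int) -> int:
    """One element of an n x n product: n multiplies and n - 1 adds,
    assuming the sum starts from the first product rather than from 0."""
    return n + (n - 1)


for cycles in (7, 4):
    print(cycles, "cycles/triple ->", round(mflops(25, cycles), 2), "Mflops")
# 7 cycles/triple -> 7.14 Mflops
# 4 cycles/triple -> 12.5 Mflops

print(20 * flops_per_element(10))  # 380 operations per 20 stored results
```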
rowen@mips.COM (Chris Rowen) (12/05/90)
Paul Hardy (ph@ama-1.ama.caltech.edu) writes:
>The main body of the multiply is a triplet of instructions: simultaneously,
>a load, add, and multiply are being performed on different registers. Since
>they're not using each other's registers, they should all execute together.
>According to the MIPS book, a single-precision floating-point multiply takes
>6 cycles, but during the last two cycles another multiply can begin, so
>effectively it takes four cycles if many multiplies occur back-to-back.
>In reality, about 7 cycles elapse between multiplies. The code looks
>something like (where A, B*, C, D, E, F are single-precision floating point
>registers, and offset is a hard-coded constant):
>
>       mul.s   A, A, B1
>       lwc1    C, offset($BASE)
>       add.s   E, E, D
>       mul.s   C, C, B2        ## 1 cycle stall if load takes 2 cycles
>       etc.
>
>Does anyone have any experience with this? Where are the extra 3 cycles going?
>How long does it _really_ take to load a value from cache? If it does take a
>lot more than 2 cycles, then I could relax and make the subroutine a lot more
>flexible.

As I recall, the relevant pipelining rules of the R3010 are the following:

1) An ADD cannot start or finish in a cycle in which a MUL starts or finishes
2) Only one instruction can start in any cycle
3) A load can finish in any cycle

This means that the add cannot start until the multiply has completed.

Pipelining of instructions as coded:

CYCLE    1      2      3      4      5      6      7      8      9      10     11
mul.s    START  ------ ------ RESULT
lwc1            START  RESULT
add.s                                START  RESULT
mul.s                                              START  ------ ------ RESULT
lwc1                                                      START  RESULT
add.s                                                                          START

This is six cycles per triple. If you can reorder the code a little, it should get faster:

CYCLE    1      2      3      4      5      6      7      8      9      10     11
mul.s    START  ------ ------ RESULT
add.s           START  RESULT
lwc1                   START  RESULT
mul.s                         START  ------ ------ RESULT
add.s                                START  RESULT
lwc1                                        START  RESULT

This is three cycles per triple.

Chris Rowen
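The three rules above are mechanical enough to check with a small simulator. This is a sketch under stated assumptions: latencies are read off the tables (a mul.s produces its RESULT three cycles after START, add.s and lwc1 one cycle after), issue is strictly in order, data dependences are ignored, and the names `schedule` and `cycles_per_triple` are hypothetical. It reproduces six cycles per triple for the original order and three for the reordered one.

```python
from typing import List, Sequence

# Cycles from START to RESULT, read off the tables above (an assumption).
LATENCY = {"mul.s": 3, "add.s": 1, "lwc1": 1}


def conflicts(op_a: str, start_a: int, op_b: str, start_b: int) -> bool:
    """Rule 1: an add may not start or finish in a cycle in which a mul
    starts or finishes. Loads never conflict (rule 3)."""
    if {op_a, op_b} != {"mul.s", "add.s"}:
        return False
    events_a = {start_a, start_a + LATENCY[op_a]}
    events_b = {start_b, start_b + LATENCY[op_b]}
    return bool(events_a & events_b)


def schedule(program: Sequence[str]) -> List[int]:
    """Issue in program order; rule 2 (one start per cycle) is enforced by
    making each start strictly later than the previous one."""
    starts: List[int] = []
    cycle = 0
    for i, op in enumerate(program):
        cycle += 1
        while any(conflicts(op, cycle, program[j], starts[j]) for j in range(i)):
            cycle += 1
        starts.append(cycle)
    return starts


def cycles_per_triple(triple: Sequence[str], reps: int = 8) -> int:
    """Distance between the last two mul.s starts when the triple repeats."""
    program = list(triple) * reps
    starts = schedule(program)
    muls = [s for op, s in zip(program, starts) if op == "mul.s"]
    return muls[-1] - muls[-2]


print(cycles_per_triple(["mul.s", "lwc1", "add.s"]))  # 6 (as coded)
print(cycles_per_triple(["mul.s", "add.s", "lwc1"]))  # 3 (reordered)
```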
ph@ama-1.ama.caltech.edu (Paul Hardy) (12/05/90)
In article <43786@mips.mips.COM> rowen@mips.COM (Chris Rowen) writes:
> 1) An ADD cannot start or finish in a cycle in which a MUL starts or finishes
> 2) Only one instruction can start in any cycle
> 3) A load can finish in any cycle
> This means that the add cannot start until the multiply has completed
> ...
> If you can reorder the code a little, it should get faster
> ...
> [preferred order: mul.s, add.s, lwc1, mul.s, add.s, lwc1, etc.]
> Chris Rowen
Someone else at MIPS, Mark Johnson, mentioned this to me yesterday.
The bottom line is that the floating-point adder, multiplier, and divider
circuits all share one exponent adder. I had erroneously assumed that
they each had their own. For operations using the floating-point adder,
multiplier or divider, this exponent adder is used during the first cycle
for exponent approximation, and the last cycle for normalization.
Therefore, these operations should be arranged so that they don't end
on the same cycle, and so that one does not begin on the same cycle
that another one ends; otherwise, the pending floating-point operation
stalls. This wasn't obvious from Kane's book.
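The shared exponent adder turns scheduling into a simple resource check: each floating-point add, multiply, or divide needs the adder in its first cycle and in its last cycle, so two operations collide whenever those cycles coincide. A minimal checker, assuming exactly that two-use model and taking each operation's first and last cycles as given, might look like this; `exponent_adder_conflicts` is a hypothetical name, and the cycle numbers in the example are illustrative, not measured R3010 timings.

```python
from collections import Counter
from typing import List, Tuple

Op = Tuple[str, int, int]  # (name, first cycle, last cycle)


def exponent_adder_conflicts(ops: List[Op]) -> List[int]:
    """Return the cycles in which more than one operation needs the
    shared exponent adder (first cycle: approximation, last: normalization)."""
    use: Counter = Counter()
    for _name, first, last in ops:
        use[first] += 1
        if last != first:
            use[last] += 1
    return sorted(cycle for cycle, n in use.items() if n > 1)


# an add that begins on the cycle a multiply ends collides
print(exponent_adder_conflicts([("mul.s", 1, 4), ("add.s", 4, 5)]))  # [4]
# two operations ending on the same cycle also collide
print(exponent_adder_conflicts([("mul.s", 1, 4), ("add.s", 3, 4)]))  # [4]
# shifting the add off those cycles removes the conflict
print(exponent_adder_conflicts([("mul.s", 1, 4), ("add.s", 2, 3)]))  # []
```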
Mark also pointed me to an excellent article (written by him, Chris,
and Paul Ries): "The MIPS R3010 Floating-Point Coprocessor" IEEE Micro,
June 1988, pp. 53-62. I recommend this to anyone who wants to write
floating-point assembly code for the R3010; I have a much better
understanding of the chip after reading this article.
Thanks to both of you for your very helpful advice on this problem.
--Paul