[comp.sys.mac.programmer] Increment

kaufman@Neon.Stanford.EDU (Marc T. Kaufman) (12/18/90)

In article <ewright.661465844@convex.convex.com> ewright@convex.com (Edward V. Wright) writes:
>In <1990Dec17.172613.7941@cs.umn.edu> sec@cs.umn.edu (Stephen E. Collins) writes:

>Actually, this would have to be

->x++:         LOAD  X
->             INC   X
->             STORE X

>Unless you have an instruction to increment variables in memory!

Well, since we ARE in a Mac group, let's just look at the code MPW C generates
for just such constructs:

	i = i+1;
	MOVE.L     i,D2
	ADDQ.L     #$1,D2
	MOVE.L     D2,i

	i++;
	MOVE.L     i,D2
	ADDQ.L     #$1,i

	++i;
	ADDQ.L     #$1,i

Behold.  The 68K does, indeed, have an instruction to increment variables in
memory.

Marc Kaufman (kaufman@Neon.stanford.edu)

philip@pescadero.Stanford.EDU (Philip Machanick) (12/18/90)

In article <1990Dec18.001753.3756@Neon.Stanford.EDU>, kaufman@Neon.Stanford.EDU (Marc T. Kaufman) writes:
|> Well, since we ARE in a Mac group, let's just look at the code MPW C generates
|> for just such constructs:
|> 
|> 	i = i+1;
|> 	MOVE.L     i,D2
|> 	ADDQ.L     #$1,D2
|> 	MOVE.L     D2,i
|> 
|> 	i++;
|> 	MOVE.L     i,D2
|> 	ADDQ.L     #$1,i
|> 
|> 	++i;
|> 	ADDQ.L     #$1,i
|> 
|> Behold.  The 68K does, indeed, have an instruction to increment variables in
|> memory.

Interesting - but remember looking at "toy" examples doesn't tell you much.
In "real" code, where performance really matters, I would hope the compiler
would have loaded the variable into a register for as long as possible.
Still, it's bad that the compiler doesn't pick up i=i+small constant as a
special case - it would be even worse if the Pascal compiler also did this
since (the original point) you have no option of asking for i++.

Many of the programmer-directed "optimizations" in C, like register
variables, ought to be unnecessary with a modern optimizing compiler.
-- 
Philip Machanick
philip@pescadero.stanford.edu

kaufman@Neon.Stanford.EDU (Marc T. Kaufman) (12/18/90)

In article <1990Dec18.004838.5623@Neon.Stanford.EDU> philip@pescadero.stanford.edu writes:
>In article <1990Dec18.001753.3756@Neon.Stanford.EDU>, kaufman@Neon.Stanford.EDU (Marc T. Kaufman) writes:
|>> Well, since we ARE in a Mac group, let's just look at the code MPW C generates
|>> for just such constructs:
|>> 
|>> 	i = i+1;
|>> 	MOVE.L     i,D2
|>> 	ADDQ.L     #$1,D2
|>> 	MOVE.L     D2,i
|>> 
|>> 	i++;
|>> 	MOVE.L     i,D2
|>> 	ADDQ.L     #$1,i
|>> 
|>> 	++i;
|>> 	ADDQ.L     #$1,i
|>> 
|>> Behold.  The 68K does, indeed, have an instruction to increment variables in
|>> memory.

>Interesting - but remember looking at "toy" examples doesn't tell you much.
>In "real" code, where performance really matters, I would hope the compiler
>would have loaded the variable into a register for as long as possible.

Well, to generate the above code, I declared 'extern int i'.  The C
compiler DOES put things in registers.  However, MPW C still generates
pretty crufty code for i = i+1:

	MOVE.L	D2,D0
	ADDQ.L	#$1,D0
	MOVE.L	D0,D2

and this is with optimization ON!

The MOVE.L i,D2 in the i++ case is because the value of the expression (i++)
is (i) before the +1.  D2 is dead because we never use the expression value,
and it's a shame that the compiler doesn't remove it.  I'm going to look at
gcc and see if it doesn't generate better code.  With the complexity of today's
applications, every little 5% helps.
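
To make the expression-value point concrete, here is a minimal C sketch. The
helper names are invented for illustration and do not come from MPW or gcc;
it only shows the language rule, not what any particular compiler emits.

```c
#include <assert.h>

/* Hypothetical helpers, named only for this sketch. */

/* Returns the value of the expression (*p)++, which is the OLD value
   of *p.  The compiler has to keep a copy of the old value somewhere,
   which is the job the extra MOVE.L i,D2 does in the i++ listing. */
int post_increment(int *p)
{
    assert(p != 0);
    return (*p)++;
}

/* Returns the value of the expression ++(*p), which is the NEW value
   of *p.  No copy of the old value is needed. */
int pre_increment(int *p)
{
    assert(p != 0);
    return ++(*p);
}

/* When the expression value is discarded, both forms have the same
   observable effect, so the saved copy is dead and could be dropped. */
void bump_both_ways(int *a, int *b)
{
    assert(a != 0 && b != 0);
    (*a)++;
    ++(*b);
}
```

Calling post_increment on a variable holding 5 yields 5 and leaves 6 behind;
pre_increment on the same starting value yields 6.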

Marc Kaufman (kaufman@Neon.stanford.edu)
>Still, it's bad that the compiler doesn't pick up i=i+small constant as a
>special case - it would be even worse if the Pascal compiler also did this
>since (the original point) you have no option of asking for i++.
>
>Many of the programmer-directed "optimizations" in C, like register
>variables, ought to be unnecessary with a modern optimizing compiler.
>-- 
>Philip Machanick
>philip@pescadero.stanford.edu

wayner@cello.cs.cornell.edu (Peter Wayner) (12/18/90)

kaufman@Neon.Stanford.EDU (Marc T. Kaufman) writes:

>In article <ewright.661465844@convex.convex.com> ewright@convex.com (Edward V. Wright) writes:
>>In <1990Dec17.172613.7941@cs.umn.edu> sec@cs.umn.edu (Stephen E. Collins) writes:

>>Actually, this would have to be

>->x++:         LOAD  X
>->             INC   X
>->             STORE X

>>Unless you have an instruction to increment variables in memory!

>Well, since we ARE in a Mac group, let's just look at the code MPW C generates
>for just such constructs:

>	i = i+1;
>	MOVE.L     i,D2
>	ADDQ.L     #$1,D2
>	MOVE.L     D2,i

>	i++;
>	MOVE.L     i,D2
>	ADDQ.L     #$1,i

>	++i;
>	ADDQ.L     #$1,i

>Behold.  The 68K does, indeed, have an instruction to increment variables in
>memory.

>Marc Kaufman (kaufman@Neon.stanford.edu)

This is good information, but how many cycles does each of these take?
They are both going to memory, incrementing and returning a value. I 
suspect that the latest versions of the 68040 will heavily pipeline
this operation, perhaps, but it is not clear that just using one instruction
is faster than using 3. There are many examples from the fine old VAX
where it was faster to use 3 simple instructions instead of one super-duper
one. This observation was the impetus for the RISC movement. So what
is it? Anyone have a 680?0 manual handy?

Peter Wayner   Department of Computer Science Cornell Univ. Ithaca, NY 14850
EMail:wayner@cs.cornell.edu    Office: 607-255-9202 or 255-1008
Home: 116 Oak Ave, Ithaca, NY 14850  Phone: 607-277-6678

kaufman@Neon.Stanford.EDU (Marc T. Kaufman) (12/18/90)

In article <49832@cornell.UUCP> wayner@cello.cs.cornell.edu (Peter Wayner) writes:
>kaufman@Neon.Stanford.EDU (Marc T. Kaufman) writes:

->	++i;
->	ADDQ.L     #$1,i

->Behold.  The 68K does, indeed, have an instruction to increment variables in
->memory.

>This is good information, but how many cycles does each of these take?
>They are both going to memory, incrementing and returning a value.
>...but it is not clear that just using one instruction is faster than using 3.
>Anyone have a 680?0 manual handy?

Well, we could just resort to first principles and note that to add 1 to a
location in memory requires both reading the old value and writing the new
value.  Even RISC machines are constrained to do that.  In fact, the ADDQ
is one of the faster instructions, as it is only 16 bits with no extensions.

Marc Kaufman (kaufman@Neon.stanford.edu)

Lewis_P@cc.curtin.edu.au (Peter Lewis) (12/18/90)

In article <1990Dec18.001753.3756@Neon.Stanford.EDU>, kaufman@Neon.Stanford.EDU (Marc T. Kaufman) writes:
> Well, since we ARE in a Mac group, let's just look at the code MPW C generates
> for just such constructs:
> 
> 	i = i+1;
> 	MOVE.L     i,D2
> 	ADDQ.L     #$1,D2
> 	MOVE.L     D2,i
> 
> 	i++;
> 	MOVE.L     i,D2
> 	ADDQ.L     #$1,i
> 
> 	++i;
> 	ADDQ.L     #$1,i
> 
> Behold.  The 68K does, indeed, have an instruction to increment variables in
> memory.
> 
> Marc Kaufman (kaufman@Neon.stanford.edu)

Since the discussion was started because of the deficiency in Pascal of not
having the ++ operator (which, BTW, I agree is a deficiency), I thought it
would be interesting to see what THINK Pascal produced.  In fact, TP
compiles thusly ...

var i:integer;
begin
   i:=1;        { MOVEQ    #1,D7 }
   i:=i+1;      { ADDQ.W   #1,D7 }
end.

   So much for C being more efficient than Pascal :-).  Of course as
someone else pointed out, these are toy examples, and comparing MPW C to
THINK Pascal is a bit iffy (would someone like to tell us what THINK C produces,
with and without a register declaration?).  But it's interesting ...
   Peter.
-- 
Disclaimer:Curtin & I have an agreement:Neither of us listen to either of us.
*-------+---------+---------+---------+---------+---------+---------+-------*
Internet: Lewis_P@cc.curtin.edu.au              I Peter Lewis
ACSnet: Lewis_P@cc.cut.oz.au                    I NCRPDA, Curtin University
Bitnet: Lewis_P%cc.curtin.edu.au@cunyvm.bitnet  I GPO Box U1987
UUCP: uunet!munnari.oz!cc.curtin.edu.au!Lewis_P I Perth, WA, 6001, AUSTRALIA
Hack: ResEdit ResEdit 2.0b2, change CODE=5, 00091C: 4EBA 02A4 to 4E71 4E71

olson@bootsie.UUCP (Eric Olson) (12/18/90)

In article <49832@cornell.UUCP>, many people calmly discuss:
>>>Actually, this would have to be
>
>>->x++:         LOAD  X
>>->             INC   X
>>->             STORE X
>
>>>Unless you have an instruction to increment variables in memory!
>>...let's just look at the code MPW C generates for just such constructs:
>
[Next line quoted out of position -EO]
>This is good information, but how many cycles does each of these take?

				; For 68000, no wait state memory
>
>>	i = i+1;
>>	MOVE.L     i,D2		; 16(4/0)
>>	ADDQ.L     #$1,D2	;  8(1/0)
>>	MOVE.L     D2,i		; 16(2/2)
>				;=40(7/2)
>>	i++;
>>	MOVE.L     i,D2		; 16(4/0)
>>	ADDQ.L     #$1,i	; 12(1/2) + 12(3/0)
>				;=40(8/2)
>>	++i;
>>	ADDQ.L     #$1,i	; 12(1/2) + 12(3/0)
>				;=24(4/2)

				; For 68020, no wait state memory
>				; Best Case	Cache Case	Worst Case
>>	i = i+1;
>>	MOVE.L     i,D2		; 3(1/0/0)	7(1/0/0)	9(1/2/0)
>>	ADDQ.L     #$1,D2	; 0(0/0/0)	2(0/0/0)	3(0/1/0)
>>	MOVE.L     D2,i		; 3(0/0/1)	5(0/0/1)	7(0/1/1)
>				;=6(1/0/1)	14(1/0/1)	19(1/4/1)
>>	i++;
>>	MOVE.L     i,D2		; 3(1/0/0)	7(1/0/0)	9(1/2/0)
>>	ADDQ.L     #$1,i	; 3(0/0/1)	4(0/0/1)	6(0/1/1)
				;+3(1/0/0)	5(1/0/0)	6(1/1/0)
>				;=9(2/0/1)	16(2/0/1)	21(2/4/1)
>>	++i;
>>	ADDQ.L     #$1,i	; 3(0/0/1)	4(0/0/1)	6(0/1/1)
				;+3(1/0/0)	5(1/0/0)	6(1/1/0)
>				;=6(1/0/1)	9(1/0/1)	12(1/2/1)

For the 68000, the numbers mean:

	Total Clock Cycles (Read Cycles/Write Cycles)

	Read Cycles and Write Cycles == 4 Clock Cycles.  So, for example,
	18(3/1) is 18 clock cycles, of which 12 (4*3) are read cycles,
	4 (1*4) are write cycles, and the remainder (2) are cycles required
	for some internal function of the processor.

	The assumption that zero wait state memory is used isn't valid for
	all 68000 based Macintoshes; I can't remember which.
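
The 68000 notation above can be checked mechanically. Here is a small sketch
in C, assuming the figure of 4 clocks per bus cycle given above; the struct
and function names are made up for this example.

```c
#include <assert.h>

/* 68000 timing in the manual's notation: total(reads/writes). */
struct timing {
    int total;   /* total clock cycles */
    int reads;   /* read bus cycles    */
    int writes;  /* write bus cycles   */
};

enum { CLOCKS_PER_BUS_CYCLE = 4 };  /* 68000, zero wait states */

/* Clock cycles not accounted for by bus traffic. */
int internal_clocks(struct timing t)
{
    int bus = CLOCKS_PER_BUS_CYCLE * (t.reads + t.writes);
    assert(bus <= t.total);
    return t.total - bus;
}

/* Timings of consecutive instructions add field by field. */
struct timing add_timing(struct timing a, struct timing b)
{
    struct timing sum;
    sum.total  = a.total  + b.total;
    sum.reads  = a.reads  + b.reads;
    sum.writes = a.writes + b.writes;
    return sum;
}
```

With the figures from the i = i+1 listing, add_timing over 16(4/0), 8(1/0)
and 16(2/2) gives 40(7/2), and internal_clocks on the 18(3/1) example gives 2.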

For the 68020, the numbers mean:

	Total Clock Cycles (Read Cycles/Instruction Access Cycles/Write Cycles)

	Read, Write and Instruction Access Cycles == 3 Clock Cycles.

	The timings shown for the 68020 assume all operands are longword
	aligned, a 32-bit data bus, and zero wait state memory.

Sorry, I don't have a 68030 manual.

So, what does this all mean?

	1. 68020s are faster than 68000s.
	2. Knowing how fast anything runs on a 68020 is context dependent.
	3. Running two instructions takes longer than running one of the two.
	4. I'm a sucker when somebody says "Anybody got a manual?" :-)

Cheers!

-Eric
-- 
Eric K. Olson, Editor, Prepare()       NOTE: olson@bootsie.uucp will not work!
Lexington Software Design              Internet: olson@endor.harvard.edu
72A Lowell St., Lexington, MA 02173    Usenet:   harvard!endor!olson
(617) 863-9624                         Bitnet:   OLSON@HARVARD

leonardr@svc.portal.com (Leonard Rosenthol) (12/19/90)

In article <1990Dec18.015258.8631@Neon.Stanford.EDU>, kaufman@Neon.Stanford.EDU
(Marc T. Kaufman) writes:
> Well, to generate the above code, I declared 'extern int i'.  The C
> compiler DOES put things in registers.  However, MPW C still generates
> pretty crufty code for i = i+1:
> 
> 	MOVE.L	D2,D0
> 	ADDQ.L	#$1,D0
> 	MOVE.L	D0,D2
> 
> and this is with optimization ON!
> 
	Just out of curiosity, I tried this experiment on our handy UNIX
box (which unfortunately is a NeXT running system 2.0) running gcc -O
(GNU v1.36/NeXT v3.11) and using the routines:

alpha() { int i; i = 1; i = i +1;}
beta()  { int i; i = 1; i++;}
main()  {alpha(); beta();}

And it generated the following:
link	fp, #0
unlk	fp
rts

If a return(i) was added to both alpha and beta, then they both generate:
link	fp, #0
moveq	#2, d0
unlk	fp
rts

Seems like a REAL nice optimization, eh?!?  Oh, it should be pointed out
that the debugger (gdb) requires the link/unlk instructions, and it may be 
possible to even have THEM optimized out.
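
For anyone who wants to repeat the experiment, here are the two routines with
the return added, as a sketch in ANSI C. The check_equivalent helper is an
addition for this example and was not part of the original test; what a given
compiler emits will vary with its version and flags.

```c
#include <assert.h>

/* The two routines from the post, with the return added. */
int alpha(void) { int i; i = 1; i = i + 1; return i; }
int beta(void)  { int i; i = 1; i++;       return i; }

/* Hypothetical helper: both routines compute the constant 2, which is
   why an optimizer is free to replace either body with "return 2". */
void check_equivalent(void)
{
    assert(alpha() == 2);
    assert(beta() == 2);
}
```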

--
----------------------------------------------------------------------
+ Leonard Rosenthol              | Internet: leonardr@sv.portal.com  +
+ Software Ventures              | GEnie:    MACgician               +
+ MicroPhone II Development Team | AOL:      MACgician1              +
----------------------------------------------------------------------

urlichs@smurf.sub.org (Matthias Urlichs) (12/19/90)

In comp.sys.mac.programmer, article <1990Dec18.015258.8631@Neon.Stanford.EDU>,
  kaufman@Neon.Stanford.EDU (Marc T. Kaufman) writes:
< 
< Well, to generate the above code, I declared 'extern int i'.  The C
< compiler DOES put things in registers.  However, MPW C still generates
< pretty crufty code for i = i+1:
< 
< 	MOVE.L	D2,D0
< 	ADDQ.L	#$1,D0
< 	MOVE.L	D0,D2
< 
< and this is with optimization ON!
< 
MPW Pascal is about as bad.

< [...]  I'm going to look at gcc and see if it doesn't generate better code.

Substantially; it generates direct ADDQ instructions in all three cases
(x=x+1; x++; ++x), for registers as well as memory locations.
< 
While gcc misses a few obvious optimizations in some cases (such as removing a
redundant MOVE.L D0,D0), its code quality approaches what a human programmer
might create. That can't be said for the MPW compilers, unfortunately, and
there's no GNU Pascal yet.

(On the other hand, it's relatively easy to translate Pascal to Modula-2, and
 the p1 Modula compiler generates good code... ;-) )
-- 
Matthias Urlichs -- urlichs@smurf.sub.org -- urlichs@smurf.ira.uka.de     /(o\
Humboldtstrasse 7 - 7500 Karlsruhe 1 - FRG -- +49+721+621127(0700-2330)   \o)/

urlichs@smurf.sub.org (Matthias Urlichs) (12/19/90)

In comp.sys.mac.programmer, article <49832@cornell.UUCP>,
  wayner@cello.cs.cornell.edu (Peter Wayner) writes:
< 
< This is good information, but how many cycles does each of these take?
< They are both going to memory, incrementing and returning a value. I 
< suspect that the latest versions of the 68040 will heavily pipeline
< this operation, perhaps, but it is not clear that just using one instruction
< is faster than using 3.

Don't forget the two memory cycles to read these other instructions, which
also translate to taking more time to read them from disk and effectively
fewer instructions that can be kept in the instruction cache.

The MC68000 manual says that
	ADDQ.L	#1,(Ax)
will take 20 clock cycles (assuming four cycles per memory access), while
	MOVE.L	(Ax),Dy
	ADDQ.L	#1,Dy
	MOVE.L	Dy,(Ax)
takes 12+8+12=32 cycles. (All of these instructions take up 16 bits.)

-- 
Matthias Urlichs -- urlichs@smurf.sub.org -- urlichs@smurf.ira.uka.de     /(o\
Humboldtstrasse 7 - 7500 Karlsruhe 1 - FRG -- +49+721+621127(0700-2330)   \o)/

Invader@cup.portal.com (Michael K Donegan) (12/20/90)

This is all very interesting, but the truth is that there isn't one
program in 10,000 for which it matters what code gets generated
in this case.  And in that one, it only matters in about one
percent of its lines.
	mkd

Lawson.English@p88.f15.n300.z1.fidonet.org (Lawson English) (12/22/90)

Philip Machanick writes in a message to All

PM> Many of the programmer-directed "optimizations" in C, like register 
PM> variables, ought to be unnecessary with a modern optimizing compiler.

With pipelining, and global optimization of registers (is this last even
possible on a Mac?), the consensus is that the "register" variable days of C
are fast fading (though, aside from HyperC, I have yet to see a good optimizing
compiler on the Mac).

BTW, has anyone thought to time the trap dispatcher for the '020/030 series
computers? My work a few years back on an accelerator card indicated that the
faster the processor, the more overhead the trap dispatcher gave.

Lawson
 

 

--  
Uucp: ...{gatech,ames,rutgers}!ncar!asuvax!stjhmc!300!15.88!Lawson.English
Internet: Lawson.English@p88.f15.n300.z1.fidonet.org

Lawson.English@p88.f15.n300.z1.fidonet.org (Lawson English) (12/22/90)

Marc T. Kaufman writes in a message to All

MTK> The MOVE.L i,D2 in the i++ case is because the value of the expression
MTK> (i++) is (i) before the +1. D2 is dead because we never use the
MTK> expression value, and it's a shame that the compiler doesn't remove
MTK> it. I'm going to look at gcc and see if it doesn't generate better
MTK> code. With the complexity of today's applications, every little
MTK> 5% helps.

So are you checking to see how much time is spent in the trap dispatcher? The
average program (unless you aren't following the Mac User Interface Guidelines
or have massive amounts of number crunching) spends more time there than in
any other part of the program: 20% on a Plus, 40+% on a Mac II or higher.
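
A rough way to see what those percentages imply for the "every little 5%
helps" argument, taking the 20% and 40% figures at face value. The function
name and the simple model (dispatcher time stays fixed, everything else gets
uniformly faster) are assumptions of this sketch, not measurements.

```c
#include <assert.h>

/* Fraction of the original run time that remains after the
   application's own code gets faster while the trap dispatcher
   does not.
     trap_share : fraction of run time spent in the trap dispatcher (0..1)
     app_saving : fraction by which the remaining code's time shrinks (0..1) */
double remaining_time(double trap_share, double app_saving)
{
    assert(trap_share >= 0.0 && trap_share <= 1.0);
    assert(app_saving >= 0.0 && app_saving <= 1.0);
    return trap_share + (1.0 - trap_share) * (1.0 - app_saving);
}
```

Under this model, remaining_time(0.40, 0.05) is about 0.97, so a 5% gain in
the application's own code shows up as roughly 3% overall on a machine that
spends 40% of its time in the dispatcher.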


Mac optimization is a whole different world...


Lawson
 

 

--  
Uucp: ...{gatech,ames,rutgers}!ncar!asuvax!stjhmc!300!15.88!Lawson.English
Internet: Lawson.English@p88.f15.n300.z1.fidonet.org

keith@Apple.COM (Keith Rollin) (12/24/90)

In article <33019.27737F47@stjhmc.fidonet.org> Lawson.English@p88.f15.n300.z1.fidonet.org (Lawson English) writes:
>Marc T. Kaufman writes in a message to All
>
>MTK> I'm going to look at gcc and see if it doesn't generate better 
>MTK> code. With the complexity of today's applications, every little 
>MTK> 5% helps. 

I took a look at one of the samples posted here compiled with both
MPW C and gnu C. This was the source:

void empty(int i) {};
main() { int i = 1; empty(++i); };

Under MPW C, we get the following:

empty():
00000000: 4E56 0000      LINK       A6,#$0000
00000004: 4E5E           UNLK       A6
00000006: 4E75           RTS        

main():
00000000: 4E56 0000      LINK       A6,#$0000
00000004: 2F07           MOVE.L     D7,-(A7)
00000006: 7E01           MOVEQ      #$01,D7
00000008: 5287           ADDQ.L     #$1,D7
0000000A: 2F07           MOVE.L     D7,-(A7)
0000000C: 4EBA 0000      JSR        empty               ; id: 1
00000010: 2E2E FFFC      MOVE.L     -$0004(A6),D7
00000014: 4E5E           UNLK       A6
00000016: 4E75           RTS        

The LINK/UNLK in empty() is interesting. This doesn't occur under MPW
3.1 C, but it seems to be back in MPW 3.2 C. I'm not sure why.

Also, when compiling the sample, I got the warning from C that said
"Parameter "i" not used within the Body of the function :  empty"
However, when I removed "i" from the definition header, gC complained
with an error that "parameter name omitted".

Under gnu C, we get the following:

empty():
00000000: 4E75           RTS        

main():
00000000: 4878 0002      PEA        $0002
00000004: 4EBA 0000      JSR        empty               ; id: 5
00000008: 584F           ADDQ.W     #$4,A7
0000000A: 4E75           RTS        

Here, we see the same results that one other person noticed when
compiling on their NeXT block. Namely, that the i=1;++i; gets
optimized to "2". Another interesting thing is that empty() is
reduced to a simple RTS. Should an optimizing compiler recognize
that this is a null procedure, and remove the call to it altogether?

By the way, I also tried out gC on a much larger program. Compiled
under MPW 3.2 C, this program was 83K long. Under gC, the program
was 81K. Either:

a) MPW C is better than we thought
b) gC is not as good as we thought
c) the author of the program took advantage of constructs that lent
   themselves to being compiled better, no matter what the compiler was
   (I know that this alternative is not the case, though)
d) or the example I chose just happened to be a bad example ("bad", that
   is, for anyone trying to show that MPW C is a crummy compiler).


-- 
------------------------------------------------------------------------------
Keith Rollin  ---  Apple Computer, Inc.  ---  Developer Technical Support
INTERNET: keith@apple.com
    UUCP: {decwrl, hoptoad, nsc, sun, amdahl}!apple!keith
"Argue for your Apple, and sure enough, it's yours" - Keith Rollin, Contusions

peirce@outpost.UUCP (Michael Peirce) (12/24/90)

In article <47576@apple.Apple.COM>, keith@Apple.COM (Keith Rollin) writes:
> By the way, I also tried out gC on a much larger program. Compiled
> under MPW 3.2 C, this program was 83K long. Under gC, the program
> was 81K. Either:
> 
> a) MPW C is better than we thought
> b) gC is not as good as we thought
> c) the author of the program took advantage of constructs that lent
>    themselves to being compiled better, no matter what the compiler was
>    (I know that this alternative is not the case, though)
> d) or the example I chose just happened to be a bad example ("bad", that
>    is, for anyone trying to show that MPW C is a crummy compiler).

I'm no compiler expert (and it probably shows :-), but I think we're
barking up the wrong trees with our simple examples.

Really good compilers shine in the complex programs.  They look at
the context of big complex programs and do wondrous things.

I remember working with some DEC compilers on VAX/VMS that did such
things.  One example was some very fancy automatic inlining of functions,
after which it optimized the result down to a very few instructions.  It
allowed programmers to write very abstract code and still get the efficiency
some people believe you only get in C.

If a compiler can't figure out that i++ and i=i+1 are the same
thing, that is just plain stupid.  Now, I realize that some compilers
are still fairly stupid and we need to pay attention to details in
certain critical pieces of code to squeeze the most out of our Macs,
but someday the compilers on the Mac will catch up with the rest of
the industry. (please please please!)

Another point to keep in mind is that all the time spent hand tweaking
can often be better spent rethinking the algorithm in the first place.
Very efficient implementations of poor algorithms will be slower than
fair implementations of superior algorithms.
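
A toy illustration of that last point, as a sketch in C. The function names
are invented for this example; it contrasts a loop whose increment could be
tuned endlessly with a closed form that avoids the loop altogether.

```c
#include <assert.h>

/* Straightforward loop: n additions, however well each
   increment is compiled. */
unsigned long sum_by_loop(unsigned long n)
{
    unsigned long total = 0;
    unsigned long i;

    /* i <= n would never become false if n were the largest value. */
    assert(n < (unsigned long)-1);
    for (i = 1; i <= n; ++i)
        total += i;
    return total;
}

/* Closed form n(n+1)/2: constant work regardless of n. */
unsigned long sum_by_formula(unsigned long n)
{
    /* Halve the even factor first so the product stays smaller. */
    if (n % 2 == 0)
        return (n / 2) * (n + 1);
    return n * ((n + 1) / 2);
}
```

Both return 5050 for n = 100, but only the first one does work proportional
to n, and no amount of tuning the ++i changes that.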

-- michael, shooting his mouth off again...


--  Michael Peirce         --   {apple,decwrl}!claris!outpost!peirce
--  Peirce Software        --   Suite 301, 719 Hibiscus Place
--  Macintosh Programming  --   San Jose, California 95117
--           & Consulting  --   (408) 244-6554, AppleLink: PEIRCE

freek@fwi.uva.nl (Freek Wiedijk) (12/24/90)

keith@Apple.COM (Keith Rollin) writes:
>                  Another interesting thing is that empty() is
>reduced to a simple RTS. Should an optimizing compiler recognize
>that this is a null procedure, and remove the call to it altogether?

What happens if you declare empty static and compile the program with the
flag -finline-functions?  I do not have gcc for MPW, so I cannot try it
myself.  I am very curious what will be left.

Freek "the Pistol Major" Wiedijk                      E-mail: freek@fwi.uva.nl
#P:+/ = #+/P?*+/ = i<<*+/P?*+/ = +/i<<**P?*+/ = +/(i<<*P?)*+/ = +/+/(i<<*P?)**

keith@Apple.COM (Keith Rollin) (12/28/90)

In article <1530@carol.fwi.uva.nl> freek@fwi.uva.nl (Freek Wiedijk) writes:
>keith@Apple.COM (Keith Rollin) writes:
>>                  Another interesting thing is that empty() is
>>reduced to a simple RTS. Should an optimizing compiler recognize
>>that this is a null procedure, and remove the call to it altogether?
>
>What happens if you declare empty static and compile the program with the
>flag -finline-functions?  I do not have gcc for MPW, so I cannot try it
>myself.  I am very curious what will be left.

Pretty slick! The program:

static void empty(int i) {};
main() { int i = 1; empty(++i); };

reduces to:

Module:            Flags=$00=(Local Code)  Module="main%"(1) Segment="Main"(2)
00000000: 4E56 0000      'NV..'            LINK       A6,#$0000
00000004: 4E5E           'N^'              UNLK       A6
00000006: 4E75           'Nu'              RTS        

Now...I wonder why the LINK/UNLK are still there. I also used -mbg off, so
it's not for debugging purposes...

-- 
------------------------------------------------------------------------------
Keith Rollin  ---  Apple Computer, Inc.  ---  Developer Technical Support
INTERNET: keith@apple.com
    UUCP: {decwrl, hoptoad, nsc, sun, amdahl}!apple!keith
"Argue for your Apple, and sure enough, it's yours" - Keith Rollin, Contusions

urlichs@smurf.sub.org (Matthias Urlichs) (12/31/90)

In comp.sys.mac.programmer, article <47610@apple.Apple.COM>,
  keith@Apple.COM (Keith Rollin) writes:
< In article <1530@carol.fwi.uva.nl> freek@fwi.uva.nl (Freek Wiedijk) writes:
< >
< >What happens if you declare empty static and compile the program with the
< >flag -finline-functions?  I do not have gcc for MPW, so I cannot try it
< >myself.  I am very curious what will be left.
< 
< static void empty(int i) {};
< main() { int i = 1; empty(++i); };
<	reduces to:
< Module:            Flags=$00=(Local Code)  Module="main%"(1) Segment="Main"(2)
< 00000000: 4E56 0000      'NV..'            LINK       A6,#$0000
< 00000004: 4E5E           'N^'              UNLK       A6
< 00000006: 4E75           'Nu'              RTS        
< 
<Now...I wonder why the LINK/UNLK are still there. I also used -mbg off, so
<it's not for debugging purposes...
< 
Probably because you didn't use -fomit-frame-pointer.
(Procedure entry/exit code doesn't go thru gcc's optimizer.)
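
For reference, here is the experiment restated as a sketch in ANSI C. The
run_experiment wrapper is an addition for this example so the effect of ++i
can be observed; the flags -finline-functions and -fomit-frame-pointer are the
ones named in this thread, and nothing is claimed here about what any
particular gcc release emits with them.

```c
#include <assert.h>

/* Marked static, as suggested in the thread, so the compiler knows
   no other file can call it and is free to inline or drop it. */
static void empty(int i)
{
    (void)i;  /* parameter deliberately unused */
}

/* Same body as main() in the posts, but returning i so the
   effect of ++i stays visible to the caller. */
int run_experiment(void)
{
    int i = 1;
    empty(++i);  /* the call may vanish; the increment still counts
                    because i is used below */
    return i;
}
```

In the posted program nothing reads i after the call, which is why the whole
body could disappear.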

-- 
Matthias Urlichs -- urlichs@smurf.sub.org -- urlichs@smurf.ira.uka.de     /(o\
Humboldtstrasse 7 - 7500 Karlsruhe 1 - FRG -- +49+721+621127(0700-2330)   \o)/

lins@Apple.COM (Chuck Lins) (01/03/91)

In article <47576@apple.Apple.COM> keith@Apple.COM (Keith Rollin) writes:
>By the way, I also tried out gC on a much larger program. Compiled
>under MPW 3.2 C, this program was 83K long. Under gC, the program
>was 81K. Either:

Code size is not necessarily a valid measure of code quality. For the 020 and
030 the compiler can generate MORE instructions and yet the code will run
faster. Alignment of data to longword boundaries is very important on these
processors. (And both MPW C and Pascal are poor in this regard.) There are
other factors as well but there's no sense going into them unless you want to
write a compiler.
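
As a small illustration of what longword alignment means in practice, here is
a C sketch. The helper names are hypothetical, and this is not how MPW C or
any particular compiler lays out data; it only shows the arithmetic of
rounding an offset up to a multiple of 4 bytes.

```c
#include <assert.h>
#include <stddef.h>

enum { LONGWORD = 4 };  /* bytes in a 68K longword */

/* True when an offset or address value falls on a longword boundary. */
int is_longword_aligned(size_t offset)
{
    return offset % LONGWORD == 0;
}

/* Round an offset up to the next longword boundary, as a compiler
   might when placing a 4-byte field after a smaller one. */
size_t align_to_longword(size_t offset)
{
    return (offset + (LONGWORD - 1)) / LONGWORD * LONGWORD;
}
```

A 4-byte field that would otherwise start at offset 5 gets moved to offset 8,
trading three bytes of padding for an aligned access.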


-- 
Chuck Lins               | "Is this the kind of work you'd like to do?"
Apple Computer, Inc.     | -- Front 242
20525 Mariani Avenue     | Internet:  lins@apple.com
Mail Stop 37-BD          | AppleLink: LINS@applelink.apple.com
Cupertino, CA 95014      | "Self-proclaimed Object Oberon Evangelist"
The intersection of Apple's ideas and my ideas yields the empty set.