smcgerty@vax1.tcd.ie (02/12/91)
Hi 68000 users!

Here's a little trick that someone might find useful (maybe it's common
knowledge?).

Right, picture the problem: you want to move, say, 1200 bytes from A to B
QUICKLY, but you can't be bothered getting the Blitter to do it, or the
Blitter is busy, or you just don't know how to get the Blitter to do it.
So you do it like this:

        LEA     Source,A0
        LEA     Dest,A1
        MOVE.W  #300-1,D0               ; 1200 bytes = 300 longwords (DBRA loops count+1 times)
Loop:   MOVE.L  (A0)+,(A1)+
        DBRA    D0,Loop

How about this, which takes about 2/3 of the time of the above:

        LEA     Source,A0
        LEA     Dest,A1
        MOVE.W  #25-1,D0                ; 25*48 = 1200 bytes (DBRA loops count+1 times)
Loop:   MOVEM.L (A0)+,D1-D7/A2-A6       ; 12 longwords = 48 bytes
        MOVEM.L D1-D7/A2-A6,(A1)
        ADDA.L  #48,A1                  ; since MOVEM can't have (A1)+ as dest. operand
        DBRA    D0,Loop

OK, so it's a little register-intensive, but you can always save all the
regs before using the routine and restore them later.

Just to get a bit more speed, you could have a bigger loop which has, say,
five iterations of the original loop in one loop. That saves you 4 DBRA
instructions for every 5 iterations (I think that's almost 40 clock cycles!).
You may think that's trivial, but it all mounts up!

Anyone got any other tricks?

Stephen John McGerty
smcgerty@vax1.tcd.ie (C.Sci.)
"Hmm.. No, nothing."
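The "bigger loop" suggested in the post above (five MOVEM blocks per DBRA) might look roughly like the sketch below. It is an illustration, not the poster's code: the routine name CopyUnrolled, the section names, and the buffer definitions are made up for the example, LEA 48(A1),A1 is swapped in for ADDA.L #48,A1 as a cheaper way to advance the pointer, and the cycle counts in the comments are nominal 68000 figures that ignore chip-RAM contention.

```asm
; Sketch only: five 48-byte MOVEM blocks per pass, 5 passes = 1200 bytes.
; Assumes Source and Dest are word-aligned and do not overlap.
        section copy,code

CopyUnrolled:
        movem.l d1-d7/a2-a6,-(sp)       ; save the 12 registers we clobber
        lea     Source,a0
        lea     Dest,a1
        moveq   #5-1,d0                 ; DBRA loops count+1 times
Loop:
        movem.l (a0)+,d1-d7/a2-a6       ; 12+8*12 = 108 cycles
        movem.l d1-d7/a2-a6,(a1)        ;  8+8*12 = 104 cycles
        lea     48(a1),a1               ;  8 cycles (ADDA.L #48,A1 is 16)
        movem.l (a0)+,d1-d7/a2-a6       ; block 2
        movem.l d1-d7/a2-a6,(a1)
        lea     48(a1),a1
        movem.l (a0)+,d1-d7/a2-a6       ; block 3
        movem.l d1-d7/a2-a6,(a1)
        lea     48(a1),a1
        movem.l (a0)+,d1-d7/a2-a6       ; block 4
        movem.l d1-d7/a2-a6,(a1)
        lea     48(a1),a1
        movem.l (a0)+,d1-d7/a2-a6       ; block 5
        movem.l d1-d7/a2-a6,(a1)
        lea     48(a1),a1
        dbra    d0,Loop                 ; 10 cycles when the branch is taken
        movem.l (sp)+,d1-d7/a2-a6       ; restore registers
        rts

        section buffers,bss
Source: ds.l    300                     ; 1200 bytes (hypothetical buffers)
Dest:   ds.l    300
```

On those nominal figures the plain MOVE.L loop costs 20+10 = 30 cycles per 4 bytes (7.5 cycles/byte), the single-block MOVEM loop costs 108+104+16+10 = 238 cycles per 48 bytes (about 4.96 cycles/byte, which is where "about 2/3" comes from), and this version costs 5*220+10 = 1110 cycles per 240 bytes (about 4.6 cycles/byte).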
jesup@cbmvax.commodore.com (Randell Jesup) (02/19/91)
In article <1991Feb11.160212.7749@vax1.tcd.ie> smcgerty@vax1.tcd.ie writes:
>Here's a little trick that someone might find useful:
>(maybe it's common knowledge?)

        Yes.

[example of movem-loop follows..]

        Or you could use CopyMem() (or CopyMemQuick() when you know the
source and destination are aligned). They use movem-loops when possible.
(In fact, under 2.0 CopyMem is adaptive to the processor in use.)

        Surprising what you can do when you use the OS....

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)
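For comparison, calling the exec.library routine from assembly could look something like the sketch below. The routine name CopyViaExec and the buffers are invented for the example, and the library vector offsets are written out by hand here only to keep the fragment self-contained; normally they come from the exec include files or amiga.lib. CopyMemQuick() is used on the assumption that both pointers are longword-aligned and the size is a multiple of 4; if that is not guaranteed, CopyMem() takes the same arguments without those restrictions.

```asm
; Sketch only: copy 1200 bytes with exec.library's CopyMemQuick().
; Arguments: A0 = source, A1 = destination, D0 = size in bytes.
_LVOCopyMem      equ    -624            ; normally supplied by the exec includes
_LVOCopyMemQuick equ    -630

        section copy,code

CopyViaExec:
        move.l  a6,-(sp)                ; preserve caller's A6
        movea.l 4.w,a6                  ; ExecBase pointer lives at address 4
        lea     Source,a0
        lea     Dest,a1
        move.l  #1200,d0                ; multiple of 4, as CopyMemQuick requires
        jsr     _LVOCopyMemQuick(a6)    ; D0/D1/A0/A1 are scratch across the call
        movea.l (sp)+,a6
        rts

        section buffers,bss
        cnop    0,4                     ; force longword alignment
Source: ds.l    300                     ; hypothetical buffers, 1200 bytes each
Dest:   ds.l    300
```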
smcgerty@vax1.tcd.ie (02/21/91)
In article <19100@cbmvax.commodore.com>, jesup@cbmvax.commodore.com (Randell Jesup) writes:
> In article <1991Feb11.160212.7749@vax1.tcd.ie> smcgerty@vax1.tcd.ie writes:
>>Here's a little trick that someone might find useful:
>>(maybe it's common knowledge?)
>       Yes.

Not judging by the response I got... Remember, there's always someone lower
than you on the learning curve....

> [example of movem-loop follows..]
>       Or you could use CopyMem() (or CopyMemQuick() when you know the source
> and destination are aligned). They use movem-loops when possible. (In
> fact, under 2.0 CopyMem is adaptive to the processor in use.)
>       Surprising what you can do when you use the OS....
> --
> Randell Jesup, Keeper of AmigaDos, Commodore Engineering.

Hey, I don't doubt the OS is very fast and neat; we all use it quite often,
and it's great, etc. etc. However, as far as giving people a deeper
understanding of 68000 programming is concerned, an example of a movem-loop
in assembly is a bit better than a recommendation to use an OS routine.

By writing my example, I wasn't really trying to fulfill someone's desire to
have a fast-copy-memory routine; instead I wanted to stimulate an interest
in the techniques of using the 68000 efficiently. If everyone relied purely
on OS routines, without knowing how they worked, then there would be a lot
more ignorance about the nitty-gritty techniques of programming the Amiga.

Re-inventing the wheel is often the best way of educating yourself. I find
it helpful, and I reckon others do too.

Stephen John McGerty
smcgerty@vax1.tcd.ie (C.Sci.)
"Hmm.. No, nothing."
dillon@overload.Berkeley.CA.US (Matthew Dillon) (02/23/91)
In article <1991Feb21.115145.7828@vax1.tcd.ie> smcgerty@vax1.tcd.ie writes:
>In article <19100@cbmvax.commodore.com>, jesup@cbmvax.commodore.com (Randell Jesup) writes:
>> In article <1991Feb11.160212.7749@vax1.tcd.ie> smcgerty@vax1.tcd.ie writes:
>>...
>
>Hey, I don't doubt the OS is very fast and neat; we all use it quite often, and
>it's great, etc. etc. However, as far as giving people a deeper understanding of
>68000 programming is concerned, an example of a movem-loop in assembly is a
>bit better than a recommendation to use an OS routine.
>
>By writing my example, I wasn't really trying to fulfill someone's desire to
>have a fast-copy-memory routine, but instead I wanted to stimulate an interest
>in the techniques of using the 68000 efficiently.
>
>Re-inventing the wheel is often the best way of educating yourself. I find it
>helpful, and I reckon others do too.
>...

    I generally post this about once a year when the question comes up..
    here is a fully working MOVMEM() call that optimizes via MOVEM:

                                        -Matt

    Matthew Dillon          dillon@Overload.Berkeley.CA.US
    891 Regal Rd.           uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA

;   MOVMEM.A
;
;   (c)Copyright 1990, Matthew Dillon, All Rights Reserved

            section text,code

;   movmem(src, dst, len)       (ANSI)
;   bcopy(src, dst, len)        (UNIX)
;          A0   A1   D0         DICE-REG
;          A0   A1   D0         internal
;       4(sp) 8(sp) 12(sp)
;
;   The memory move algorithm is somewhat more of a mess
;   since we must do it either ascending or descending.

            xdef    _movmem
            xdef    _bcopy                  ; UNIX
            xdef    @movmem
            xdef    @bcopy                  ; UNIX

_bcopy:
_movmem:
            move.l  4(sp),A0
            move.l  8(sp),A1
            move.l  12(sp),D0
@bcopy:
@movmem:
            cmp.l   A0,A1                   ;move to self
            beq     xbmend
            bls     xbmup
xbmdown     adda.l  D0,A0                   ;descending copy
            adda.l  D0,A1
            move.w  A0,D1                   ;CHECK WORD ALIGNED
            lsr.l   #1,D1
            bcs     xbmdown1
            move.w  A1,D1
            lsr.l   #1,D1
            bcs     xbmdown1
            cmp.l   #259,D0                 ;chosen by calculation.
            bcs     xbmdown8

            move.l  D0,D1                   ;overhead for bmd44: ~360
            divu    #44,D1
            bvs     xbmdown8                ;too big (> 2,883,540)
            movem.l D2-D7/A2-A6,-(sp)       ;use D2-D7/A2-A6 (11 regs)
            move.l  #44,D0
            bra     xbmd44b
xbmd44a     sub.l   D0,A0                   ;8          total 214/44bytes
            movem.l (A0),D2-D7/A2-A6        ;12 + 8*11  4.86 cycles/byte
            movem.l D2-D7/A2-A6,-(A1)       ; 8 + 8*11
xbmd44b     dbf     D1,xbmd44a              ;10
            swap    D1                      ;D0<15:7> already contain 0
            move.w  D1,D0                   ;D0 = remainder
            movem.l (sp)+,D2-D7/A2-A6

xbmdown8    move.w  D0,D1                   ;D1<2:0> = #bytes left later
            lsr.l   #3,D0                   ;divide by 8
            bra     xbmd8b
xbmd8a      move.l  -(A0),-(A1)             ;20         total 50/8bytes
            move.l  -(A0),-(A1)             ;20         = 6.25 cycles/byte
xbmd8b      dbf     D0,xbmd8a               ;10
            sub.l   #$10000,D0
            bcc     xbmd8a
            move.w  D1,D0                   ;D0 = 0 to 7 bytes
            and.l   #7,D0
            bne     xbmdown1
xbmend      move.l  8(sp),D0
            rts

xbmd1a      move.b  -(A0),-(A1)             ;12         total 22/byte
xbmdown1                                    ;           = 22 cycles/byte
xbmd1b      dbf     D0,xbmd1a               ;10
            sub.l   #$10000,D0
            bcc     xbmd1a
            move.l  8(sp),D0
            rts

xbmup       move.w  A0,D1                   ;CHECK WORD ALIGNED
            lsr.l   #1,D1
            bcs     xbmup1
            move.w  A1,D1
            lsr.l   #1,D1
            bcs     xbmup1
            cmp.l   #259,D0                 ;chosen by calculation
            bcs     xbmup8

            move.l  D0,D1                   ;overhead for bmu44: ~360
            divu    #44,D1
            bvs     xbmup8                  ;too big (> 2,883,540)
            movem.l D2-D7/A2-A6,-(sp)       ;use D2-D7/A2-A6 (11 regs)
            move.l  #44,D0
            bra     xbmu44b
xbmu44a     movem.l (A0)+,D2-D7/A2-A6       ;12 + 8*11  ttl 214/44bytes
            movem.l D2-D7/A2-A6,(A1)        ;8 + 8*11   4.86 cycles/byte
            add.l   D0,A1                   ;8
xbmu44b     dbf     D1,xbmu44a              ;10
            swap    D1                      ;D0<15:7> already contain 0
            move.w  D1,D0                   ;D0 = remainder
            movem.l (sp)+,D2-D7/A2-A6

xbmup8      move.w  D0,D1                   ;D1<2:0> = #bytes left later
            lsr.l   #3,D0                   ;divide by 8
            bra     xbmu8b
xbmu8a      move.l  (A0)+,(A1)+             ;20         total 50/8bytes
            move.l  (A0)+,(A1)+             ;20         = 6.25 cycles/byte
xbmu8b      dbf     D0,xbmu8a               ;10
            sub.l   #$10000,D0
            bcc     xbmu8a
            move.w  D1,D0                   ;D0 = 0 to 7 bytes
            and.l   #7,D0
            bne     xbmup1
            move.l  8(sp),D0
            rts

xbmu1a      move.b  (A0)+,(A1)+
xbmup1
xbmu1b      dbf     D0,xbmu1a
            sub.l   #$10000,D0
            bcc     xbmu1a
            move.l  8(sp),D0
            rts

            END
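One idiom in the listing above that is easy to miss is the DBF / SUB.L #$10000 / BCC sequence, which lets a 16-bit DBF loop run a full 32-bit count. Pulled out on its own, as a sketch with a made-up routine name (CopyBytes) and labels, it could look like this:

```asm
; Sketch only: byte copy with a 32-bit count driven by a 16-bit DBF.
; In:  A0 = source, A1 = destination, D0.L = number of bytes (0 is allowed)
CopyBytes:
        bra.s   Test                    ; enter at the DBF so a count of 0 copies nothing
Copy:   move.b  (a0)+,(a1)+
Test:   dbf     d0,Copy                 ; counts the LOW word of D0 down to -1 ($FFFF)
        sub.l   #$10000,d0              ; take one from the HIGH word
        bcc.s   Copy                    ; no borrow: high word was non-zero, go round again
        rts                             ; borrow: high word was zero, all bytes copied
```

When the low word runs out, DBF leaves it at $FFFF. If the high word was non-zero, the subtraction does not borrow, one byte is copied, and DBF then runs another 65535 times, so each unit of the high word accounts for exactly 65536 bytes.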
dej@qpoint.amiga.ocunix.on.ca (David Jones) (02/23/91)
>In article <1991Feb11.160212.7749@vax1.tcd.ie> smcgerty@vax1.tcd.ie writes:
>How about this, which takes about 2/3 of the time of the above:
>
>        LEA     Source,A0
>        LEA     Dest,A1
>        MOVE.W  #25-1,D0                ; 25*48 = 1200 bytes (DBRA loops count+1 times)
>Loop:   MOVEM.L (A0)+,D1-D7/A2-A6       ; 12 longwords = 48 bytes
>        MOVEM.L D1-D7/A2-A6,(A1)
>        ADDA.L  #48,A1                  ; since MOVEM can't have (A1)+ as dest. operand
>        DBRA    D0,Loop
>
>Anyone got any other tricks?

Ya. Save yourself some code. Check out CopyMem() in exec.library
(V33 or greater). Disassemble it. Essentially, it is the above code.

--
The Q-Point                 David Jones
Amiga S/W development       UUCP: dej@qpoint.amiga.ocunix.on.ca
                            Fido: 1:163/109.8

"I can understand why someone would want to go out, get drunk
and wake up the next morning with a splitting headache and
absolutely no memory of the night before, but I *cannot*
understand why anyone would want to do that more than once."
        - Don Elgee
hughesmp@vax1.tcd.ie (03/02/91)
In article <dej.0456@qpoint.amiga.ocunix.on.ca>, dej@qpoint.amiga.ocunix.on.ca (David Jones) writes:
>>In article <1991Feb11.160212.7749@vax1.tcd.ie> smcgerty@vax1.tcd.ie writes:
>>How about this, which takes about 2/3 of the time of the above:
>>
>>[..usage of movem deleted..]
>>
>>Anyone got any other tricks?
>
> Ya. Save yourself some code. Check out CopyMem() in exec.library
> (V33 or greater). Disassemble it. Essentially, it is the above code.

Hey c'mon man, he doesn't want to hear about supplied software. Often you
find stuff written by someone else, particularly the OS, sucks. You want
one thing quick. It wants something else slow. So you write it _yourself_.
At least that way you know exactly what's going on, how fast, and everyone
will be able to use it. Not just people with V33 or greater, whatever
that is.

He asks (if you read the posting) if anyone else has any tricks. He wants
to know if there are any other ways of squeezing more out of what is
basically a not-very-fast processor. One byte per 4 cycles stinks, so
what'd it be like without movem? Are there any other ways of doing something
else faster; try and get summat out of the machine, if you don't want to
waste your money on a bigger chip in the series?

Don't say find out about the OS, because it is a heap of it. You want
_real_optimisation_ for the specific problem, for which some general ideas
may help. Movem is one. The OS is not.

Matt Dillon's program is very nice, coping with non-word boundaries and
everything, but if you want _everything_ out of the machine, forget those
checks. Align your data, and use the plain movems. Shove the loop in a
cupboard, and in-line the code. On a processor running at the speed of a
low 68000, those cycles count. Save them. Don't give a damn about memory.

Remember, only a heartless fiend can get the true max out of the machine.
Work everything to the bloody stumps, and waste everything else.

T.

SICK - the Slightly Intelligent Crazy Rosebi -
We came. We saw. We went away again.
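Taken literally, "shove the loop in a cupboard, and in-line the code" for the 1200-byte case might come out like the sketch below. The routine name CopyInline and the buffers are invented for the example, the data is assumed to be word-aligned and non-overlapping, and REPT/ENDR is an assembler directive (Devpac and vasm have it, for instance); with an assembler that lacks it, the three-instruction block would have to be pasted 25 times by hand.

```asm
; Sketch only: fully in-lined 1200-byte copy, no loop counter, no DBRA.
        section copy,code

CopyInline:
        movem.l d1-d7/a2-a6,-(sp)       ; save the 12 registers used as a buffer
        lea     Source,a0
        lea     Dest,a1
        REPT    25                      ; 25 blocks * 48 bytes = 1200 bytes
        movem.l (a0)+,d1-d7/a2-a6       ; read 48 bytes
        movem.l d1-d7/a2-a6,(a1)        ; write 48 bytes
        lea     48(a1),a1               ; advance destination by hand
        ENDR
        movem.l (sp)+,d1-d7/a2-a6       ; restore registers
        rts

        section buffers,bss
Source: ds.l    300                     ; hypothetical buffers, 1200 bytes each
Dest:   ds.l    300
```

The trade is the one the post describes: each block assembles to 12 bytes, so the copy body grows to about 300 bytes of code, in exchange for dropping the 10 cycles of DBRA per block, which is only a few percent of the nominal 220 cycles each block takes on a plain 68000.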
lkoop@pnet01.cts.com (Lamonte Koop) (03/04/91)
hughesmp@vax1.tcd.ie writes:
>In article <dej.0456@qpoint.amiga.ocunix.on.ca>, dej@qpoint.amiga.ocunix.on.ca (David Jones) writes:
>>>In article <1991Feb11.160212.7749@vax1.tcd.ie> smcgerty@vax1.tcd.ie writes:
>>>How about this, which takes about 2/3 of the time of the above:
>>>
>>>[..usage of movem deleted..]
>>>
>>>Anyone got any other tricks?
>>
>> Ya. Save yourself some code. Check out CopyMem() in exec.library
>> (V33 or greater). Disassemble it. Essentially, it is the above code.
>
>Hey c'mon man, he doesn't want to hear about supplied software. Often you
>find stuff written by someone else, particularly the OS, sucks. You want

Not in my experience. Just because the OS is "supplied" or written by
someone else, it doesn't mean you have to go about re-inventing the wheel
because you feel "it sucks"...a feeling which I strongly disagree with.
Yes, the OS has its problems, but it has quite a few excellent points to it
as well.

>one thing quick. It wants something else slow. So you write it _yourself_.
>At least that way you know exactly what's going on, how fast, and everyone
>will be able to use it. Not just people with V33 or greater, whatever
>that is. He asks (if you read the posting) if anyone else has any tricks.
>He wants to know if there are any other ways of squeezing more out of what
>is basically a not-very-fast processor. One byte per 4 cycles stinks, so
>what'd it be like without movem? Are there any other ways of doing something
>else faster; try and get summat out of the machine, if you don't want to
>waste your money on a bigger chip in the series? Don't say find out about
>the OS, because it is a heap of it. You want _real_optimisation_ for the
>specific problem, for which some general ideas may help. Movem is one. The
>OS is not. Matt Dillon's program is very nice, coping with non-word
>boundaries and everything, but if you want _everything_ out of the machine,
>forget those checks. Align your data, and use the plain movems. Shove the
>loop in a cupboard, and in-line the code. On a processor running at the
>speed of a low 68000, those cycles count. Save them. Don't give a damn about
>memory. Remember, only a heartless fiend can get the true max out of the
>machine. Work everything to the bloody stumps, and waste everything else.

From your comments, I have just a few of my own:

First of all, I have absolutely nothing against optimizing code...in fact
I am all for it, and any ideas pertaining to it. However, your attitude
seems to be quite hostile towards the OS...which is NOT "full of it". In
fact, you seem to be the sort who would write code which crashes just about
every machine except a particular model. This may not be the case, but I
don't see how you would get decently multitasking-friendly applications
when you avoid the OS.

Second of all, how do you propose to get anything done...when you insist on
reinventing everything?

>
>T.
>
>SICK - the Slightly Intelligent Crazy Rosebi -
>We came. We saw. We went away again.

LaMonte Koop
Internet: lkoop@pnet01.cts.com
ARPA: crash!pnet01!lkoop@nosc.mil
UUCP: {hplabs!hp-sdd ucsd nosc}!crash!pnet01!lkoop
"It's a dog-eat-dog world...and I'm wearing Milk Bone underwear"--Norm
jesup@cbmvax.commodore.com (Randell Jesup) (03/05/91)
In article <1991Mar2.042511.7894@vax1.tcd.ie> hughesmp@vax1.tcd.ie writes:
>In article <dej.0456@qpoint.amiga.ocunix.on.ca>, dej@qpoint.amiga.ocunix.on.ca (David Jones) writes:
>> Ya. Save yourself some code. Check out CopyMem() in exec.library
>> (V33 or greater). Disassemble it. Essentially, it is the above code.
>
>Hey c'mon man, he doesn't want to hear about supplied software. Often you
>find stuff written by someone else, particularly the OS, sucks. You want
...
>will be able to use it. Not just people with V33 or greater, whatever
>that is.

        V33 is 1.2. Anyone who is running anything earlier than 1.2 deserves
10 lashes with a wet noodle (since 1.0 and 1.1 were only available on
A1000's, and they can upgrade in a snap; almost all modern stuff requires
1.2).

>waste your money on a bigger chip in the series? Don't say find out about
>the OS, because it is a heap of it. You want _real_optimisation_ for the
>specific problem, for which some general ideas may help. Movem is one. The
>OS is not. Matt Dillon's program is very nice, coping with non-word
>boundaries and everything, but if you want _everything_ out of the machine,
>forget those checks. Align your data, and use the plain movems. Shove the
>loop in a cupboard, and in-line the code.

        Guess what: what you suggest is exactly what's in the OS. There's
CopyMem(), for non-aligned data (a la Matt's), and CopyMemQuick(), for
aligned data. It can't inline the code, but if you're transferring enough
data for movem-loops to make a difference, the cycles for a single
subroutine call to start it are WAY down in the noise (plus you win in that
on a chip-only machine, ROM access can be far faster than RAM access,
depending on video mode). And if you happen to run your code on 2.0 with an
'020 or better, suddenly your copies get even quicker, since we have
separate copy loops for different processors.

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup
sschaem@starnet.uucp (Stephan Schaem) (03/06/91)
Talking about people who don't like their OS functions....

If you think something is not fit, create your own: why be stuck with other
people's way of thinking?! I'm not saying replace the OS, but add to it or
extend it.

I don't use Intuition extensively (screens mostly), since I have other
needs and have had to fight too much to get things done the Intuition way.

The previous example was text: there should be different ways to handle
text, and I find FF or the 2.0 'emulation' not at text display 'peak'. So
when I need a special text feature I use my own library.

Always using the OS is not ALWAYS the best solution, and shouldn't be the
only way to make things work...