[comp.sys.amiga.tech] Fast 2-D Scrolling Method Needed

bosch@dg.dg.com (Derek Bosch) (02/27/89)

Hi!  I am working on a couple of projects that require extremely fast
2-D scrolling of a 16 or 32 color playfield.  The entire display is going to
be roughly 4 lo-res screens high, by 5 lo-res screens wide, so that would 
be too large to have a SuperBitmap to scroll around (and still have any free
memory left :^P.  What other options do I have?  These projects require as fast
screen updates as possible.  Any suggestions will be greatly appreciated!

	Derek Bosch
	Data General

ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) (03/02/89)

In article <112@dg.dg.com> bosch@dg.UUCP (Derek Bosch) writes:
>The entire display is going to
>be roughly 4 lo-res screens high, by 5 lo-res screens wide, so that would 
>be too large to have a SuperBitmap to scroll around (and still have any free
>memory left :^P.  What other options do I have?  [ ... ]

	Well, you could try using 'tiles'.  That is, divide the virtual
screen up into X by Y cells, each of which are a fixed size.  When you have
your cells, you can represent your virtual screen as an array of bytes/words
arranged as X by Y.  Each element in the array serves as an index into the
array of cells you have created.  Thus, you can assemble your entire screen by
blitting smaller cells into it.

	Faery Tale is done this way.  Faery Tale's virtual screen size is
HUGE.

_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Leo L. Schwab -- The Guy in The Cape	INET: well!ewhac@ucbvax.Berkeley.EDU
 \_ -_		Recumbent Bikes:	UUCP: pacbell > !{well,unicom}!ewhac
O----^o	      The Only Way To Fly.	      hplabs / (pronounced "AE-wack")
"Work FOR?  I don't work FOR anybody!  I'm just having fun."  -- The Doctor

donw@zehntel.zehntel.com (Don White) (03/03/89)

In article <112@dg.dg.com> bosch@dg.UUCP (Derek Bosch) writes:
>Hi!  I am working on a couple of projects that require extremely fast
>2-D scrolling of a 16 or 32 color playfield.  The entire display is going to
>be roughly 4 lo-res screens high, by 5 lo-res screens wide, so that would 
>	Derek Bosch
>	Data General

       This should probably be in comp.sys.amiga.tech, but here goes,
   If you allocate and store all your bitmaps in ram, you are going to
   be using a bunch of memory. There is no way around that. If you want
   speed, storing in RAM is the best way to go. After that, check the RKM
   for uses of Dual Play field and SuperBitMap. I think there are specific
   commands for doing scrolling.

   Don White
   zehntel!donw
   Box 271177 Concord, CA. 94527-1177

   Follow-ups to comp.sys.amiga.tech please.

cg@myrias.UUCP (Chris Gray) (03/04/89)

In article <10871@well.UUCP> ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) writes:

>	Well, you could try using 'tiles'.  That is, divide the virtual
>screen up into X by Y cells, each of which are a fixed size.  When you have
>your cells, you can represent your virtual screen as an array of bytes/words
>arranged as X by Y.  Each element in the array serves as an index into the
>array of cells you have created.  Thus, you can assemble your entire screen by
>blitting smaller cells into it.
>
>	Faery Tale is done this way.  Faery Tale's virtual screen size is
>HUGE.

Using tiles for a scrolling screen works well. If you look carefully at
Faery Tale, you will see that it actually uses a two-level tile system. At
a guess (I'm not in front of my Amiga to count), it uses tiles composed of
4 x 4 subtiles, each of which is about 16 x 16 pixels. It must have been quite
a chore getting the subtiles set up so that everything goes together properly,
and still looks good. You can see the tile size most easily by looking at
some of the floor tiles - the part that isn't shadowed is a basic subtile.

-- 
Chris Gray		Myrias Research, Edmonton	+1 403 428 1616
	{uunet!mnetor,ubc-vision,watmath,vax135}!alberta!myrias!cg

bosch@dg.dg.com (Derek Bosch) (03/07/89)

In article <10871@well.UUCP> ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) writes:
>	Well, you could try using 'tiles'.  That is, divide the virtual
>screen up into X by Y cells, each of which are a fixed size.  When you have
>your cells, you can represent your virtual screen as an array of bytes/words
>arranged as X by Y.  Each element in the array serves as an index into the
>array of cells you have created.  Thus, you can assemble your entire screen by
>blitting smaller cells into it.

Tiles sound like the way to go.  Which way is faster though?  1)  Create a new
bitmap by scrolling portions of the screen that stay on the screen, and blitting
in only the new portions of the screen.  2)  Create a totally new bitmap by 
blitting in each cell, each time you update the screen.  

It seems that at first glance that 1) would be faster, but I don't know.  Anyone
out there know for sure?

'tanks a lot!

-db-

chad@cup.portal.com (Chad The-Walrus Netzer) (03/08/89)

In a previous article (Derek Bosch) writes:
)In article <10871@well.UUCP> ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) writes:
)>	Well, you could try using 'tiles'.  That is, divide the virtual
)>screen up into X by Y cells, each of which are a fixed size.  When you have
)>your cells, you can represent your virtual screen as an array of bytes/words
)>arranged as X by Y.  Each element in the array serves as an index into the
)>array of cells you have created.  Thus, you can assemble your entire screen by
)>blitting smaller cells into it.
)Tiles sound like the way to go.  Which way is faster though?  1) Create a new
)bitmap by scrolling portions of the screen that stay on the screen, and
)blitting in only the new portions of the screen.  2) Create a totally new
)bitmap by blitting in each cell, each time you update the screen. 

	Well, for a scrolling screen, why not use the scrolling feature built
into the Amiga's hardware?  This way, when you are scrolling, you need
only fill/blit in the new areas as they are about to scroll onto the
screen (ie.  Tile method).  However, the scrolling itself is done by the
hardware, and will involve only a few pointer changes, rather than
repeatedly blitting whole bitmaps by just a few pixels (SLOW).
	The problems with this method are:
	- Integrating it with the Intuition environment.  It is not
impossible, but I don't know how important this is to your application.
	- Uses lots of memory.  But since ALL scrolling methods will
probably use lots of memory, you probably are prepared for this.  Things
like recopying memory sections and re-setting pointers will have to be
done, so that the bitmap doesn't 'creep' through memory (like a 'queue').
By allowing the hardware to scroll for you most of the time, and then
re-blitting the bitmap back into place each time it scrolls by 16 pixels (a
word) or so, you will reduce blitter usage by approximately 1/16th, and
improve its efficiency by operating solely on word boundaries.  Note:  For
smoothness, you will probably want to intercept the Blitter at the bottom
of the frame and NOT by using WaitTOF(), etc...  This will give you the
TRUE vertical retrace time to use the blitter when it is most efficient. 

	Keep in mind that using the hardware on this level could negate
the direct use of the Intuition and Graphpics libraries (but not
necessarily).  Decide what is important for your project.  Also, doing
this correctly does not mean you have to give up multi-tasking, just
manage your own displays more carefully.  A good knowledge of Intuition,
Graphics, and the Hardware is necessary.  You DEFINATELY need to read the
Hardware Manual's sections on Playfields, scrolling, and the Copper VERY
CAREFULLY!

	Hope this helps (and didn't discourage) you, and just ask if you
need more help/ideas.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
				Chad 'The_Walrus' Netzer -> AmigaManiac++
"Faster than a speeding Blitter..."

ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) (03/09/89)

In article <113@dg.dg.com> bosch@dg.UUCP (Derek Bosch) writes:
>Tiles sound like the way to go.  Which way is faster though?  1)  Create a new
>bitmap by scrolling portions of the screen that stay on the screen, and
>blitting in only the new portions of the screen.  2)  Create a totally new
>bitmap by blitting in each cell, each time you update the screen.  
>
	If you're doing hardware scrolling, #1 is the winning method.  If,
however, you're scrolling the image in a "window", and you're not faking it
with dual playfield, then it's a toss-up.

	A nice rule of thumb is to try to draw a given pixel as few times as
possible.  Computations and hardware scrolling are cheap.  Drawing bits is
the expensive part, so the fewer times you draw bits, the better.

_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Leo L. Schwab -- The Guy in The Cape	INET: well!ewhac@ucbvax.Berkeley.EDU
 \_ -_		Recumbent Bikes:	UUCP: pacbell > !{well,unicom}!ewhac
O----^o	      The Only Way To Fly.	      hplabs / (pronounced "AE-wack")
"Work FOR?  I don't work FOR anybody!  I'm just having fun."  -- The Doctor

nsw@cord.UUCP (Neil Weinstock) (03/14/89)

In article <10921@well.UUCP> ewhac@well.UUCP (Leo 'Bols Ewhac' Schwab) writes:
>In article <113@dg.dg.com> bosch@dg.UUCP (Derek Bosch) writes:
>>Tiles sound like the way to go.  Which way is faster though?  1)  Create a new
>>bitmap by scrolling portions of the screen that stay on the screen, and
>>blitting in only the new portions of the screen.  2)  Create a totally new
>>bitmap by blitting in each cell, each time you update the screen.  
>>
>	If you're doing hardware scrolling, #1 is the winning method.  If,
>however, you're scrolling the image in a "window", and you're not faking it
>with dual playfield, then it's a toss-up.
>
>	A nice rule of thumb is to try to draw a given pixel as few times as
>possible.  Computations and hardware scrolling are cheap.  Drawing bits is
>the expensive part, so the fewer times you draw bits, the better.

Hmmm.  Pardon my thickness, but what exactly do you mean by "hardware 
scrolling" here?  Blitting the screen down a line?  Copper manipulation?
If you're talking about blitting, then it might become expensive with several
bitplanes and very high frame update and scrolling rates.

Copper manipulation would be tricky.  This doesn't stop me, though (it never
does ;-) from pondering funky ways of solving the problem by using massive
copper list manipulation instead of massive blitting.  I present one such
idea for your consideration.

Ideally, as you say, you want to draw the bits as few times as possible.  If
it were possible, then it would be nice to allocate a bitmap slightly larger
than the screen, and simply construct the off-portion of the screen, then
scroll it on to the screen as needed by playing with the copper list.  
Unfortunately, this would result in the memory area covered by the bitmap 
"creeping" all over the place, and in the end it would be no better than 
just allocating the whole danged bitmap in the first place.

So, what if we do this.  First, divide the bitmap into horizontal slices, say
20 scan lines high for starters.  Allocate two extra slices, one for offscreen
at the top, one for offscreen at the bottom.  Unfortunately, the slices would
have to be the *full width* of the superbitmap.  A bummer, but if the region
around which you want to scroll is more vertically oriented, then we're still
coming out ahead.  Anyway, we create a copper list that successively loads in 
the address of the each succeeding slice into the bitplane pointers, creating 
a smooth contiguous display.

	Question #1:  Can the copper reload just the bitplane pointers
	    quickly enough to avoid blank scan lines between slices?  If not,
	    then we're hosed.

Now, horizontal scrolling is achieved with "standard" copper list twiddling.
Vertical scrolling, on the other hand, is achieved by creating (via tiles,
or algorithms, or who knows what) the next slice to be scrolled onto the screen,
and integrating it into the copper lists.  When a slice gets scrolled
completely off the edge of screen, it can be freed up and reused.  It may
be desirable to leave some sort of "buffer zone" at the top and bottom if you
don't know which direction you'll be needing to scroll.

So, the end result is that you have created a circular queue of display slices
which allows the display memory to be run like a treadmill.  The height of each
slice would determine how long you have to create the off-screen slice, but
also how much you have to create.  Optimal size would depend on the
application.

I am convinced that if pulled off correctly, this could make for absolutely
lightning fast scrolling of an arbitrarily tall but finitely wide field, with
many bitplanes.

	Question #2a:  Is this feasible?  I know I could do it on an Atari 800
	    using display lists, but my copper list experience is much 
 	    more limited.  I can't believe that any of Jay Miner's older 
	    machines could do something the newer ones couldn't...

	Question #2b:  If it is feasible, has it been done?

	Question #2c:  Is this what you were talking about all along Leo? ;-)

Well, that's all for today.  Whew!

 /.- -- .. --. .- .-. ..- .-.. . ... .- -- .. --. .- .-. ..- .-.. . ...\
/ Neil Weinstock | att!cord!nsw     | "One man's garbage is another     \
\ AT&T Bell Labs | nsw@cord.att.com | man's prune danish." - Harv Laser /
 \.- -- .. --. .- .-. ..- .-.. . ... .- -- .. --. .- .-. ..- .-.. . .../

jesup@cbmvax.UUCP (Randell Jesup) (03/15/89)

In article <736@cord.UUCP> nsw@cord.UUCP (Neil Weinstock) writes:
>	Question #1:  Can the copper reload just the bitplane pointers
>	    quickly enough to avoid blank scan lines between slices?  If not,
>	    then we're hosed.

	I dunno.  Try it.  It may depend on bitplanes/resolution.

>So, the end result is that you have created a circular queue of display slices
>which allows the display memory to be run like a treadmill.  The height of each
>slice would determine how long you have to create the off-screen slice, but
>also how much you have to create.  Optimal size would depend on the
>application.

	Sounds neat.

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup

ditto@cbmvax.UUCP (Michael "Ford" Ditto) (03/15/89)

[ Changed to .tech only ]

In article <736@cord.UUCP> nsw@cord.UUCP (Neil Weinstock) writes:
>	Question #1:  Can the copper reload just the bitplane pointers
>	    quickly enough to avoid blank scan lines between slices?  If not,
>	    then we're hosed.

It can.  The Amix console driver does this.  It only uses one bitplane
at the moment, so to load several planes it might have to WAIT for a
slightly earlier point.  Currently it waits for the beginning of the
line.  (BTW, if it for some reason couldn't load them fast enough, the
result would not be blank lines, but some extra continuation of the
previous "slice"'s bitmap, and the current slice would be shifted to
the right.)

>Now, horizontal scrolling is achieved with "standard" copper list twiddling.

(Hopefully this means by modifying the places in the copper list where the
bitplane-start and horizontal-scroll registers are set.)

>Vertical scrolling, on the other hand, is achieved by creating (via tiles,
>or algorithms, or who knows what) the next slice to be scrolled onto the screen,
>and integrating it into the copper lists.  When a slice gets scrolled
>completely off the edge of screen, it can be freed up and reused.  It may
>be desirable to leave some sort of "buffer zone" at the top and bottom if you
>don't know which direction you'll be needing to scroll.

The Amix console driver does this with a few variations:  A "slice" is one
scanline (i.e. it doesn't really use slices, it just starts the display
at whatever line it feels like).  Since each slice (except the first one
in memory) happens to immediately follow the previous one in memory, no
copper action is necessary between slices except when the "last" slice
wraps around into the first.  This means the copperlist really only needs
one change to support this scrolling system:  At what would have been the
end, it must WAIT for the spot where the memory bitmap will wrap around,
and just MOVE the new BPLxPT values.  The current version does not have
any "buffer zone" -- exactly the number of pixels that will be displayed
are allocated.  Since the modified copper list is only "installed" at
the top of the next field, this means that it is possible that it might
be scribbling in the part of bitmap that is supposed to be at the
bottom of the screen (and will be there after the new copper list takes
effect next field) but is currently displayed at the very top.  This is
actually noticable sometimes.

>So, the end result is that you have created a circular queue of display slices
>which allows the display memory to be run like a treadmill.  The height of each
>slice would determine how long you have to create the off-screen slice, but
>also how much you have to create.

The height of a "slice" actually has no effect, it's the amount of "buffer
zone" that determines how far you can stay ahead.  The "slice" essentially
becomes an arbitrary, variable thing equal to however much you scroll in
a given field.  For a video game you probably have to finish the update by
the next frame, so it basically limits how many pixels you can scroll per
frame without having some possibility of re-using some memory that is
still being displayed.  If you're double buffering, you wouldn't need any
buffer zone.

>I am convinced that if pulled off correctly, this could make for absolutely
>lightning fast scrolling of an arbitrarily tall but finitely wide field, with
>many bitplanes.

It does.  Type "cat /etc/termcap" and you'll be dizzy real quick.
-- 
					-=] Ford [=-

"The number of Unix installations	(In Real Life:  Mike Ditto)
has grown to 10, with more expected."	ford@kenobi.cts.com
- The Unix Programmer's Manual,		...!sdcsvax!crash!kenobi!ford
  2nd Edition, June, 1972.		ditto@cbmvax.commodore.com

nsw@cord.UUCP (Neil Weinstock) (03/17/89)

In article <6283@cbmvax.UUCP> ditto@cbmvax.UUCP (Michael "Ford" Ditto) writes:
>In article <736@cord.UUCP> nsw@cord.UUCP (Neil Weinstock) writes:
>>	Question #1:  Can the copper reload just the bitplane pointers
>>	    quickly enough to avoid blank scan lines between slices?  If not,
>>	    then we're hosed.
>
>It can.  The Amix console driver does this.  It only uses one bitplane
>at the moment, so to load several planes it might have to WAIT for a
>slightly earlier point.  Currently it waits for the beginning of the
>line.  (BTW, if it for some reason couldn't load them fast enough, the
>result would not be blank lines, but some extra continuation of the
>previous "slice"'s bitmap, and the current slice would be shifted to
>the right.)

Ahh, of course.  Given your scheme described below, it probably wouldn't even
be bothersome (maybe even not noticeable).

This Amix console driver only works if you're not running with windows or 
anything, right?  I can't even imagine a way to extend this to working with
windows....

[ ... ]
>>Vertical scrolling, on the other hand, is achieved by creating (via tiles,
>>or algorithms, or who knows what) the next slice to be scrolled onto the screen,
>>and integrating it into the copper lists.  When a slice gets scrolled
>>completely off the edge of screen, it can be freed up and reused.  It may
>>be desirable to leave some sort of "buffer zone" at the top and bottom if you
>>don't know which direction you'll be needing to scroll.
>
>The Amix console driver does this with a few variations:  A "slice" is one
>scanline (i.e. it doesn't really use slices, it just starts the display
>at whatever line it feels like).  Since each slice (except the first one
>in memory) happens to immediately follow the previous one in memory, no
>copper action is necessary between slices except when the "last" slice
>wraps around into the first.  This means the copperlist really only needs
>one change to support this scrolling system:  At what would have been the
>end, it must WAIT for the spot where the memory bitmap will wrap around,
>and just MOVE the new BPLxPT values.  The current version does not have

I hope I would have thought of this if I actually had started to work on it
in detail.  So, if memory is a sheet of paper, you've simply rolled it into
a cylinder, and only need copper list stuff at the seam.  Neat.

[ ... ]
>>...  The height of each
>>slice would determine how long you have to create the off-screen slice, but
>>also how much you have to create.
>
>The height of a "slice" actually has no effect, it's the amount of "buffer
>zone" that determines how far you can stay ahead.  The "slice" essentially
>becomes an arbitrary, variable thing equal to however much you scroll in
>a given field.  For a video game you probably have to finish the update by
>the next frame, so it basically limits how many pixels you can scroll per
>frame without having some possibility of re-using some memory that is
>still being displayed.  If you're double buffering, you wouldn't need any
>buffer zone.

My feeling is that there are certain scenarios where you might want to build
a chunk of bitmap at a time, rather than scanline by scanline, even though you
are only scrolling one (or two or whatever) scanline at a time.  However,
given the fact that the notion of slices is really superfluous when using the
technique you described above, the amount of buffer zone becomes a simple 
thing to vary, without affecting anything else seriously.

>>I am convinced that if pulled off correctly, this could make for absolutely
>>lightning fast scrolling of an arbitrarily tall but finitely wide field, with
>>many bitplanes.
>
>It does.  Type "cat /etc/termcap" and you'll be dizzy real quick.

I'd love to get dizzy.  Care to send me a 2500UX? for evaluation? ;-) ;-) ;-)
Seriously, assuming that the text routines are quick, I can't imagine there
being a faster bitmapped scrolling screen anywhere (within limits).  Might 
make an impressive demo all by itself.

I must say it is refreshing to learn that some harebrained scheme I've come 
up with actually works and is being used somewhere.  I hope I get to write a 
scrolling game someday so I can try this all out...

 /.- -- .. --. .- .-. ..- .-.. . ... .- -- .. --. .- .-. ..- .-.. . ...\
/ Neil Weinstock | att!cord!nsw     | "One man's garbage is another     \
\ AT&T Bell Labs | nsw@cord.att.com | man's prune danish." - Harv Laser /
 \.- -- .. --. .- .-. ..- .-.. . ... .- -- .. --. .- .-. ..- .-.. . .../

w-colinp@microsoft.UUCP (Colin Plumb) (03/18/89)

I'm pretty sure this is feasable.  Handling multiple bitplanes is
tricky (you have to change them all in one scan line), but I think
it can be done with some ingenuity...

Don't bother with the slices.  Each bitplane is as tall as the screen,
plus some slop for off-screen rendering.  Initially, you start the
display at row 0 and it goes to row n-1.  Rows n through m-1 are
off-screen.  Then, as you scroll down, rows 0, 1, 2, 3... fall off the
top, and rows n, n+1, n+2, n+3... appear on the bottom.  When
row m would be needed, stick a copper instruction into the display
list to reset the memory pointer to row 0.  Then this instruction moves
up the copper list until it gets to the top of the screen and we're
back where we started.

For multiple bitplanes, stagger them so the copper instructions are between
different scan lines.  Thus, only two registers (BPLxPTH and BPLxPTL,
the high and low halves of the pointer) have to be reset in one scan
line.  Seems doable.

To render into this bitplane, you can write your own graphics routines
that understand the need to wrap around, or do nasty things to the
layers.library and let it worry about the discontinuity in the
circular buffer.

You can also use this to scroll horizontally, although you have to be more
careful - I doubt you can jump from end to beginning of your circular
buffer in mid-scan-line.

Maybe someone more into graphics hacking than I (Leo?  Have you finished
Onion yet?) can write some sample code.
-- 
	-Colin (uunet!microsoft!w-colinp)

"Don't listen to me.  I never do." - The Doctor