[comp.sys.atari.st] TRAP handler question

jafischer@lily.waterloo.edu (Jonathan A. Fischer) (03/27/88)

	This will undoubtedly reveal the vast storehouse of ignorance I have
at my fingertips, but here goes anyway...

	Since "Tempus" came out, it seems that countless programmers took it
as a personal challenge to write some screen routines that were as fast. 
Myself included.  Well, writing the printing routines wasn't much trouble (a
nice intro to 68000 assembly language programming, I felt), but in trying to
hook the thing into the GEMDOS trap handler so that it catches Cconws() calls,
I've come across a pretty strange bug.  Hopefully someone can help me out. 

	I use oldvect = Setexc(...) to point TRAP #1 to my trap handler with
no problem.  In the trap handler, I check 6(sp) for the function number.  This
is where the problem is... 
==============================================================================
fp_handler_:
	cmpi.w	$9,6(sp)	/ Never results in "true" condition.
	beq	is_Cconws

	movea.l oldvect,a0	/ Not Cconws(), so jump to GEMDOS.
	jmp	(a0)

is_Cconws:
	move.l	(sp),4(sp)	/ Overwrite GEMDOS function number with return
	lea	4(sp),sp	/ address.
	jmp	fastprt		/ Jump to the fast print routine.
==============================================================================
	However, as I said, the test never results in the jump to is_Cconws.
I examined the stack with 'db', MWC's debugger, and the value '9' is certainly
on the stack (although I don't know why it's at 6(sp) rather than 4(sp)...),
but here's the weird part: this is what the stack looks like:
==============================================================================
	BEFORE executing "cmpi.w ...":
	sp -->	XXXX XXXX 0000 0009 YYYY YYYY ...
				    ^^^^^^^^^ pointer to string
			  ^^^^^^^^^	      GEMDOS function # (why long &
			  			not int?)
		^^^^^^^^^		      return address

	AFTER executing "cmpi.w ...":
	sp -->	XXXX XXXX CE21 0001 YYYY YYYY ...
==============================================================================
	That is, the cmpi instruction seems to be changing memory contents!
(I know that that can't be what's happening, so what's going on?)

	The "CE21 0001" is garbage, & the first word changes every time (the
0001 doesn't).  But why does it change from 0000 0009?  Is it that there are
still interrupts going on that change that area of memory?  (Like, maybe
debugging a TRAP is a no-no?)  Or what?  I am completely lost.

	I have ACACHE, so I had a quick look at its source, but at first
glance I couldn't see anything that it's doing differently.  Maybe I'll
examine in more detail.

	Actually, I'll probably end up using a different method, since this
will only speed up Cconws(), and a lot of programmers use Bconout(2, character)
in a loop, since it's faster than Cconws().  Ideally, I'd like to have ALL
screen text output (non-VDI) go through the fastprt() routine.  If someone can
tell me how to do this (without checking explicitly for one of Cconout(),
Cconws(), Bconout(2, ch), and Fwrite() to stdout), then I'd appreciate it.

	(And yes, I know I'll have to add a test for stdout being redirected to
a file.)
--
	- Jonathan A. Fischer,    jafischer@lily.waterloo.edu
...{ihnp4,allegra,decvax,utzoo,utcsri}!watmath!lily!jafischer
"Happiness is a dead lobber."
	- motto of a "Gauntlet" player.

fred@pnet01.cts.com (Fred Brooks) (03/28/88)

I think you should check the stack to see whether you were in supervisor or
user mode when you called the trap. If it was called from user mode, check the
USP for your function number and whatever args. Look on the supervisor stack,
where the SR was saved; you can check the supervisor bit ("BTST #5,(a7)"), as
the status is saved along with the PC on every TRAP call.

UUCP: {cbosgd hplabs!hp-sdd sdcsvax nosc}!crash!pnet01!fred
ARPA: crash!pnet01!fred@nosc.mil
INET: fred@pnet01.cts.com

david@bdt.UUCP (David Beckemeyer) (03/29/88)

In article <3738@watcgl.waterloo.edu> jafischer@lily.waterloo.edu (Jonathan A. Fischer) writes:
>	This will undoubtedly reveal the vast storehouse of ignorance I have
>at my fingertips, but here goes anyway...
>
>	Since "Tempus" came out, it seems that countless programmers took it
>as a personal challenge to write some screen routines that were as fast. 
>Myself included.  Well, writing the printing routines wasn't much trouble (a
>nice intro to 68000 assembly language programming, I felt), but in trying to
>hook the thing into the GEMDOS trap handler so that it catches Cconws() calls,
>I've come across a pretty strange bug.  Hopefully someone can help me out. 
>
>	I use oldvect = Setexc(...) to point TRAP #1 to my trap handler with
>no problem.  In the trap handler, I check 6(sp) for the function number.  This
>is where the problem is... 
>==============================================================================
>fp_handler_:
>	cmpi.w	$9,6(sp)	/ Never results in "true" condition.
>	beq	is_Cconws
>
>	movea.l oldvect,a0	/ Not Cconws(), so jump to GEMDOS.
>	jmp	(a0)
>
>is_Cconws:
>	move.l	(sp),4(sp)	/ Overwrite GEMDOS function number with return
>	lea	4(sp),sp	/ address.
>	jmp	fastprt		/ Jump to the fast print routine.
>==============================================================================

I don't want Jonathan to feel as though I'm flaming him.  This is a posting
that hopefully everybody will find informative.

First, aside from the "bug" you're describing, it's never gonna work to
catch Cconws() like this (as you mention), but I'll get to that later.

GEMDOS is entered via a TRAP #1 instruction.  The 68000 has a supervisor and
user mode, each with separate stacks.  The 68000 exception stack frame (like
from an Interrupt or Trap) looks like this:

	(sp) 	-> word SR (status register)
	2(sp)	-> long return PC (program return address)

When an exception is generated, the 68000 pushes the SR register and the
current PC onto the supervisor stack, regardless of whether the processor was
running in user or supervisor mode at the time.  SP (or A7) addresses the
Supervisor stack when the trap handler is entered. There is a bit in the SR
on the stack that indicates which mode the CPU was running in.  It can be
checked like this:

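	btst.b	#5,(sp)		* S bit of the stacked SR (bit 5 of its high byte)
	bne	in_super	* set: the caller was in supervisor mode

in_user:
	move.l	usp,a0		* arguments are on User Stack
	bra	go_on

in_super:
	lea	6(sp),a0	* arguments are on Supervisor Stack, past SR and PC

go_on:
	cmp.w	#9,(a0)		* look at GEMDOS func. code (9 = Cconws)
	beq	is_Cconws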


The above code will load the A0 register with a pointer to the GEMDOS
arguments (including the function #).  For Cconws() that means:

	(a0)	-> word 9 (Cconws func. code)
	2(a0)	-> long address of string to print.
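
As a rough illustration of the test above, here is a small C model of it.
This is not handler code for the ST: the two stacks are modelled as plain
byte arrays, and the names trap_args(), be16(), SR_SUPERVISOR and FRAME_SIZE
are made up for the sketch. It only shows how the stacked SR decides where
the function code lives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SR_SUPERVISOR 0x2000u  /* S bit: bit 13 of SR, i.e. bit 5 of its high byte */
#define FRAME_SIZE    6        /* 68000 trap frame: word SR + long PC */

/* Read a big-endian 16-bit word, as the 68000 stores it. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * ssp: simulated supervisor stack at handler entry (SR, then PC, ...).
 * usp: simulated user stack pointer at the moment of the TRAP.
 * Returns a pointer to the GEMDOS function code; its arguments follow it.
 */
static const uint8_t *trap_args(const uint8_t *ssp, const uint8_t *usp)
{
    assert(ssp != NULL && usp != NULL);
    if (be16(ssp) & SR_SUPERVISOR)
        return ssp + FRAME_SIZE;   /* caller was in supervisor mode */
    return usp;                    /* caller was in user mode */
}
```

For a user-mode caller, whatever sits at 6(sp) on the supervisor stack is
unrelated to the call, so a handler that always tests 6(sp) only sees the
function code when the caller was already in supervisor mode.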

And now for the real problem....

>	Actually, I'll probably end up using a different method, since this
>will only speed up Cconws(), and a lot of programmers use Bconout(2, character)
>in a loop, since it's faster than Cconws().  Ideally, I'd like to have ALL
>screen text output (non-VDI) go through the fastprt() routine.  If someone can
>tell me how to do this (without checking explicitly for one of Cconout(),
>Cconws(), Bconout(2, ch), and Fwrite() to stdout), then I'd appreciate it.
>
>	(And yes, I know I'll have to add a test for stdout being redirected to
>a file.)

Exactly.  Cconws() does NOT simply mean "print to the screen".  It means
"write a string to GEMDOS standard output (handle 1)". You can't just catch
the Cconws() call to make a real output handler.  You must deal with anything
that uses handle 1, including watching for redirection with Fforce(), or
Fwrite()'s to handle 1.

Catching Bconout(2, c) and Bcostat(2) is probably the best solution to
catching all non-VDI characters.  It's clean, but you will have to interpret
the VT52 escape codes if you want to remain compatible.  The trap handling
idea above will work in a similar manner to catch the BIOS (trap #13) calls.
Catching the BIOS calls will have the best chance for compatibility with the
most programs.  Messing around at the GEMDOS level could cause lots of things
to break.
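
To make the BIOS-level check concrete, here is a sketch in C of the decision
such a hook would make once it has found the argument block. It is a model
only: the argument block is a byte array in 68000 byte order, and
classify_bios_call(), hook_action and the enum names are invented for the
example. The numbers are the usual BIOS ones as far as I know (Bconout is
function 3, Bcostat is function 8, the console is device 2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BIOS_BCONOUT = 3, BIOS_BCOSTAT = 8 };  /* BIOS (trap #13) function codes */
enum { DEV_CON = 2 };                          /* BIOS console device number */

typedef enum { PASS_TO_BIOS, FAST_OUTPUT, REPORT_READY } hook_action;

/* Read a big-endian 16-bit word, as the 68000 stores it. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* args points at the BIOS function code word; its arguments follow. */
static hook_action classify_bios_call(const uint8_t *args)
{
    assert(args != NULL);
    switch (be16(args)) {
    case BIOS_BCONOUT:   /* Bconout(dev, c): only the console is ours */
        return be16(args + 2) == DEV_CON ? FAST_OUTPUT : PASS_TO_BIOS;
    case BIOS_BCOSTAT:   /* Bcostat(dev): answer for the console ourselves */
        return be16(args + 2) == DEV_CON ? REPORT_READY : PASS_TO_BIOS;
    default:             /* everything else goes to the old vector */
        return PASS_TO_BIOS;
    }
}
```

The device word is only read after the function code has matched, since
other BIOS calls do not necessarily carry a device number in that position.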

The VT52 escape codes that the ST uses are documented in the Atari
"Hitchhiker's Guide to the BIOS".
-- 
David Beckemeyer			| "Yuh gotta treat people jes' like yuh	
Beckemeyer Development Tools		| do mules. Don't try to drive 'em. Jes'
478 Santa Clara Ave, Oakland, CA 94610	| leave the gate open a mite an' let 'em
UUCP: ...!ihnp4!hoptoad!bdt!david 	| bust in!"

wes@obie.UUCP (Barnacle Wes) (03/29/88)

In article <2739@crash.cts.com>, fred@pnet01.cts.com (Fred Brooks) writes:
> I think you should check the stack to see if you were in super or user mode
> when you called the trap.

You're close here, Fred.  A trap always puts you in Super mode, so you
will have the status register on the stack.  Since you're in Super
mode, you're automagically using the ssp (as a7 or sp).  That's why
the function code is at 6(sp).  I don't know why the cmpi is changing
the data it looks at!

	Wes Peters
-- 
    /\              -  "Against Stupidity,  -    {backbones}!
   /\/\  .    /\    -  The Gods Themselves  -  utah-cs!utah-gr!
  /    \/ \/\/  \   -   Contend in Vain."   -  uplherc!sp7040!
 / U i n T e c h \  -       Schiller        -     obie!wes

t68@nikhefh.hep.nl (Jos Vermaseren) (03/30/88)

Regarding the interception of trap 1, and in particular Cconws, to make the
screen faster:

One much faster way to write to the screen is the use of Bconout(5,c) which
is a poorly documented feature. It writes characters to the screen in a 'raw'
mode. This means that the VT52 escape codes are not interpreted and all
characters are written, including escapes, hex 00 etc. The control codes
can then be done with Bconout(2,c). This gives an improvement of a factor of
two or more over using Cconout, and an improvement of about a factor of two
over Cconws. Apparently much of the time goes into filtering the escape codes.

To make another BIOS trap handler doesn't look like a solution, as most of
the screen writing time sits in the trap handler and the saving of registers
that comes with it.
There are several reasons why some programs like Tempus have such a fast
screen:
1:	The screen routines are user routines that are directly accessible
	for the program. They are written in such a way that it is not
	necessary to save registers.
2:	In Tempus the font files have been reshuffled to make the order of
	the pixels different from the GEM format.
3:	Absolutely no interpretation of tabs takes place. A good tab handler
	would slow the output down by about a factor of two (guesstimate).
Putting a trap handler in between would already spoil much of this. I think that
Tempus comes rather close to what is possible on the ST, so any general solution
will never come close to it.

What is missing from the BIOS is an equivalent of Cconws. Let us call that
Bconws. It would have to accept a device number and the number of characters
in the string, because the raw output should not be stopped by a 0x00. That
would avoid the trap handler for most of the output. If you install such an
extra function your screen will become much faster than the Bconout(5,c) call.
All other BIOS calls will become a little slower then, unless you find a
version-independent way to copy the BIOS jump tables at the installation of
your program.

Jos Vermaseren
T68@nikhefh.hep.nl

jafischer@lily.waterloo.edu (Jonathan A. Fischer) (03/31/88)

[]
	First off, thanks for your response, David.  Exactly why I posted the
question.  Oh yeah, and by "bug" I was really referring to a bug in the way I
was trying to do it, certainly not a bug on the ST's part.

	Since I posted the question, I've hardly had any time for further
twiddling, but I did discover that in fact, a7 was pointing to the supervisor
stack, although db.prg, after stopping the program at the breakpoint, had a7
pointing to the user mode stack.  That's where the main confusion lay.  I did
a dump of the supervisor stack and discovered that that's what my program was
examining.  And the words that were changing after the 'cmpi' instruction were
_from_ the supervisor stack.  I.e., when the breakpoint was reached, telling
db.prg to print out the stack (by saying "a7,6?x") resulted in:
	XXXX XXXX 0000 0009 YYYY YYYY
and after stepping through the 'cmpi' instruction, saying "a7,6?x" showed:
	XXXX XXXX AAAA AAAA YYYY YYYY,
where AAAA AAAA is the second longword from the supervisor stack, not the user
stack.  Weird or what?

	Well, I'll see what I can do as to actually trapping everything that
prints to the screen, rather than trapping Cconws().  And I'm implementing the
VT52 escapes in a lazy way, in that I just pass them to the real Cconws(). 
Didn't want to reinvent the wheel as far as that goes.
--
	- Jonathan A. Fischer,    jafischer@lily.waterloo.edu
...{ihnp4,allegra,decvax,utzoo,utcsri}!watmath!lily!jafischer
	Pascal SUCC()'s.

K538915@CZHRZU1A.BITNET (03/31/88)

Ah! For once Jos and I in total agreement...
Anyway I was going to write something along the same lines, so now just
a few comments:

t68@nikhefh.hep.nl (Jos Vermaseren) writes:
>To make another BIOS trap handler doesn't look like a solution, as most of
>the screen writing time sits in the trap handler and the saving of registers
>that comes with it.

Most of the trivial BIOS calls take 0.08 to 0.1 ms (measured with calls
from CCD/OOS Pascal and assembler), so the time spent just doing the trap
is of the same order of magnitude as a fast screen output routine.

>There are several reasons why some programs like Tempus have such a fast
>screen:
..........
>2:      In Tempus the font files have been reshuffled to make the order of
>        the pixels different from the GEM format.

As far as I know they use DEGAS format fonts, which means you can address
the scanlines of the font with the address register indirect with
postincrement addressing mode (in other words: you save an adda.w instruction).
DEGAS format fonts naturally have other limitations.

>What is missing from the BIOS is an equivalent of Cconws. Let us call that
>Bconws. It would have to accept a device number and the number of characters
>in the string, because the raw output should not be stopped by a 0x00. That
>would avoid the trap handler for most of the output....................

It would be nice if such a thing existed for input too.

>..................................................If you install such an
>extra function your screen will become much faster than the Bconout(5,c) call.
>All other BIOS calls will become a little slower then, unless you find a
>version-independent way to copy the BIOS jump tables at the installation of
>your program.

And Tempus naturally (well at least I assume so) doesn't do single character
output, it outputs strings.

BIOS traps slowing things down is a problem in UniTerm too.
One thing I did was to pack as much as possible into one call of the XBIOS
Supexec function: it returns the number of characters in the RS232 buffer,
whether a key has been pressed, the state of the DCD and Break bits and a mouse
status bit, all with the overhead of one trap. On the other hand there is
no way to read more than one character at a time from the serial port
without rewriting the RS232 handler, so you still have to do one trap per
character. As a matter of fact, the most annoying thing is that, because
the kbshift variable isn't legally accessible in pre-Blitter-TOS ROMs,
UniTerm spends a bl**dy 0.08 ms per main loop just finding out whether somebody
pressed <CapsLock>.


                                Simon Poole
                        UUCP:   ...mcvax!cernvax!forty2!poole
                        Bitnet: K538915@CZHRZU1A

jafischer@lily.waterloo.edu (Jonathan A. Fischer) (04/01/88)

In article <444@nikhefh.hep.nl> t68@nikhefh.hep.nl (Jos Vermaseren) writes:
>There are several reasons why some programs like Tempus have such a fast
>screen:
>1:	The screen routines are user routines that are directly accessible
>	for the program. They are written in such a way that it is not
>	necessary to save registers.

	The saving of registers actually represents a very tiny fraction of
the time used by a typical "fastprt()" (as mine is called) routine.  This is
because fastprt() is called with a whole string, rather than a character at a
time.  The bulk of the time consumed is the simple loop that loads a
character, finds the offset into the bitmap array, then copies 8 or 16 bytes
of bitmap data to the screen.

>3:	Absolutely no interpretation of tabs takes place. A good tab handler
>	would slow the output down by about a factor of two (guesstimate).

	Actually, handling tabs adds perhaps 0.1% of overhead, with respect to
time taken.  The same goes for handling the ESC codes, too, since all that is
added is one compare instruction, as in:

	<load character into register>
	cmpi.b	$ESC,<register>
	bgt	not_ESC

	...	/* Processing of ESC, TAB, CR, LF, ... */

not_ESC:
	...	/* Regular character handling */

As you can see, if a given character is _not_ a control character (i.e.  it is
greater than 27), then all that is executed is the "cmpi.b" instruction.  This
adds essentially no time, when compared to the total time taken to print a
typical string. 
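
The same idea in C, as a sketch rather than the actual fastprt() loop:
classify_chars() and char_counts are names made up here, and the function
only counts which path each byte would take instead of drawing anything.

```c
#include <assert.h>
#include <stddef.h>

#define ESC 27

typedef struct {
    size_t printable;  /* bytes that go straight to the glyph-copy path */
    size_t control;    /* bytes <= ESC that need the slower handling */
} char_counts;

/* Walk a counted string; only bytes <= ESC leave the fast path. */
static char_counts classify_chars(const unsigned char *s, size_t len)
{
    char_counts n = {0, 0};
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (s[i] > ESC)
            n.printable++;   /* regular character: one compare of overhead */
        else
            n.control++;     /* ESC, TAB, CR, LF, ...: handled separately */
    }
    return n;
}
```

One difference from the assembly is worth noting: bgt is a signed branch, so
after a cmpi.b the bytes $80-$FF count as negative and would fall into the
control-character path. The C version compares unsigned bytes, so it treats
them as printable.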

--
	- Jonathan A. Fischer,    jafischer@lily.waterloo.edu
...{ihnp4,allegra,decvax,utzoo,utcsri}!watmath!lily!jafischer
	Pascal SUCC()'s.

unpowell@csvax.liv.ac.uk (04/20/88)

	I've never seen or heard of this program Tempus, but one way I know
of to considerably speed up character output and one that may be used in
this program is to "hardware scroll" the screen.
	A factor I'd like to draw your attention to is that in all three screen
resolutions a row of characters occupies $500 or 1280 bytes of memory. Say
the screen is at location $70000.
	The physical screen position should first be put to $70000. All
character rows are printed normally until it comes time to scroll. Instead
of moving 32000-1280=30720 bytes to scroll the screen we instead add $500
to the physical screen position, thus displaying the area of memory one
character line further forward in memory i.e. $70500.
	Snag. We will eventually run out of memory if we keep moving the
screen onwards through memory. So we reserve an area approximately twice
the size of the standard screen, 64000 bytes say. When we are displaying
the last 32000 bytes of this block we then copy rows 1 to 24 (remember
row 0 is the top one) to the start of this extended screen area and then
display the first 32000 bytes of this area as the screen.
	Reserving 64000 bytes (enough for 50 character rows) will involve
moving 32000 bytes of memory every 25 scrolls. The standard BIOS routine
has to move (32000-1280)*25=768000 bytes every 25 scrolls. Quite a saving
in time for just an extra 32K screen memory.
	Using this method speeds up screen output fantastically.
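
The bookkeeping for this scheme can be sketched in C as below. It is a
host-side model under some assumptions: scroll_screen and scroll_up() are
invented names, the "top" field stands in for whatever call actually sets the
physical screen base, and the buffer is an ordinary array, not video memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROW_BYTES   1280u        /* one character row, as in the post */
#define ROWS        25u          /* visible character rows */
#define TOTAL_ROWS  (2u * ROWS)  /* double-height buffer: 64000 bytes */

typedef struct {
    unsigned char mem[TOTAL_ROWS * ROW_BYTES];
    size_t top;           /* first visible row; stands in for the video base */
    size_t bytes_copied;  /* running total, kept only for comparison */
} scroll_screen;

/* Scroll up one character row; returns the new byte offset of the display. */
static size_t scroll_up(scroll_screen *s)
{
    assert(s != NULL && s->top <= TOTAL_ROWS - ROWS);
    if (s->top < TOTAL_ROWS - ROWS) {
        s->top++;  /* room left: just move the display start */
    } else {
        /* Showing the last 25 rows: copy visible rows 1..24 to the start. */
        memmove(s->mem, s->mem + (s->top + 1) * ROW_BYTES,
                (ROWS - 1) * ROW_BYTES);
        s->bytes_copied += (ROWS - 1) * ROW_BYTES;
        s->top = 0;
    }
    /* The new bottom row holds stale data and is cleared before use. */
    memset(s->mem + (s->top + ROWS - 1) * ROW_BYTES, 0, ROW_BYTES);
    return s->top * ROW_BYTES;
}
```

In this model a block copy of 24 rows (30720 bytes) happens once every 26
scrolls, where a plain copying scroll moves 30720 bytes on every single one.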

	Mark Powell

********************************************************************************

 "...there's no success   JANET unpowell@uk.ac.liv.csvax
  like failure and        UUCP  {backbone}!mcvax!ukc!mupsy!liv-cs!unpowell
  failure's no success    ARPA  unpowell%csvax.liv.ac.uk@nss.cs.ucl.ac.uk
  at all..." B.Dylan

********************************************************************************

jpdres13@usl-pc.UUCP (John Joubert) (04/26/88)

-----------------------------------

Mark,

	How did you get the machine to scroll from a non-512 byte boundary?

In the past, when I changed the address of the logical screen I had to make
sure that I placed it on a 512 byte boundary.  When I did not I got a really
screwy screen if I did not crash.  How did you get around that?

----------------------------------------------------------------------------
John Joubert                         |     /\  |    /\    |     _ 
jpdres13@usl-pc.USL   or ...         |     \|<>|>|> \|<>|>|><`|`|
ut-sally!usl!usl-pc!jpdres13         |-----/|-------/|----------------------
GEnie: J.JOUBERT                     |     \/       \/
-----------------------------------------------------------------------------

hase@netmbx.UUCP (Hartmut Semken) (04/27/88)

In article <544@csvax.liv.ac.uk> unpowell@csvax.liv.ac.uk writes:
>
>	I've never seen or heard of this program Tempus, but one way I know
>of to considerably speed up character output and one that may be used in
>this program is to "hardware scroll" the screen.

And what about scrolling windows?
Tempus (new version coming Real Soon Now) supports 4 windows on screen.
One can be scrolled...
hase
-- 
Hartmut Semken, Lupsteiner Weg 67, 1000 Berlin 37 hase@netmbx.UUCP
I think, you may be right in what I think you're thinking. (Douglas Adams)

unpowell@csvax.liv.ac.uk (05/03/88)

In article <1204@usl-pc.UUCP>, jpdres13@usl-pc.UUCP (John Joubert) writes:
> -----------------------------------
> Mark,
> 
> 	How did you get the machine to scroll from a non-512 byte boundary?
> 
> In the past, when I changed the address of the logical screen I had to make
> sure that I placed it on a 512 byte boundary.  When I did not I got a really
> screwy screen if I did not crash.  How did you get around that?

	I'm not quite sure what your problem is. If you do displace the screen
by, say, 256 bytes while the desktop is being displayed you do get a "really
screwy screen". This is due to each raster line of the screen requiring
160 bytes in colour and 80 bytes in monochrome. So this displacement of the
screen puts the display somewhere in the "middle" of a "display line". The
first number that is a multiple of 160, 80 and 256 is 1280, which is
incidentally the amount of memory each character row (on a 25 row display)
occupies. Thus moving the screen by this amount, $500, will displace the
screen by an entire character row. With a careful bit of manipulation this
effect can be turned into a hardware screen scrolling routine.
	I don't know about the crashing; the ST's not "that" bad.
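
The 1280 figure is easy to check as the least common multiple of the three
numbers mentioned. A minimal C sketch (gcd() and lcm() are just helper names
for the example):

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static unsigned long gcd(unsigned long a, unsigned long b)
{
    while (b != 0) {
        unsigned long t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple; divides first to keep intermediate values small. */
static unsigned long lcm(unsigned long a, unsigned long b)
{
    assert(a != 0 && b != 0);
    return a / gcd(a, b) * b;
}
```

lcm(lcm(160, 80), 256) comes out as 1280, i.e. $500: 8 colour raster lines of
160 bytes, 16 monochrome raster lines of 80 bytes, or 5 blocks of 256 bytes.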

	Mark Powell

********************************************************************************

 "...I hate the white	JANET unpowell@uk.ac.liv.csvax
  man, and the man	UUCP  {backbone}!mcvax!ukc!mupsy!liv-cs!unpowell
  who turned you all	ARPA  unpowell%csvax.liv.ac.uk@nss.cs.ucl.ac.uk
  loose..." R. Harper

********************************************************************************