jafischer@lily.waterloo.edu (Jonathan A. Fischer) (03/27/88)
This will undoubtedly reveal the vast storehouse of ignorance I have
at my fingertips, but here goes anyway...

Since "Tempus" came out, it seems that countless programmers took it
as a personal challenge to write some screen routines that were as fast.
Myself included. Well, writing the printing routines wasn't much trouble (a
nice intro to 68000 assembly language programming, I felt), but in trying to
hook the thing into the GEMDOS trap handler so that it catches Cconws() calls,
I've come across a pretty strange bug. Hopefully someone can help me out.

I use oldvect = Setexc(...) to point TRAP #1 to my trap handler with
no problem. In the trap handler, I check 6(sp) for the function number. This
is where the problem is...
==============================================================================
fp_handler_:
        cmpi.w  $9,6(sp)        / Never results in "true" condition.
        beq     is_Cconws

        movea.l oldvect,a0      / Not Cconws(), so jump to GEMDOS.
        jmp     (a0)

is_Cconws:
        move.l  (sp),4(sp)      / Overwrite GEMDOS function number with return
        lea     4(sp),sp        / address.
        jmp     fastprt         / Jump to the fast print routine.
==============================================================================
However, as I said, the test never results in the jump to is_Cconws. I
examined the stack with 'db', MWC's debugger, and the value '9' is certainly
on the stack (although I don't know why it's at 6(sp) rather than 4(sp)...),
but here's the weird part: this is what the stack looks like:
==============================================================================
BEFORE executing "cmpi.w ...":
sp --> XXXX XXXX 0000 0009 YYYY YYYY ...
                           ^^^^^^^^^ pointer to string
                 ^^^^^^^^^ GEMDOS function # (why long & not int?)
       ^^^^^^^^^ return address

AFTER executing "cmpi.w ...":
sp --> XXXX XXXX CE21 0001 YYYY YYYY ...
==============================================================================
That is, the cmpi instruction seems to be changing memory contents!
(I know that that can't be what's happening, so what's going on?) The
"CE21 0001" is garbage, & the first word changes every time (the 0001
doesn't). But why does it change from 0000 0009? Is it that there are still
interrupts going on that change that area of memory? (Like, maybe debugging a
TRAP is a no-no?) Or what? I am completely lost.

I have ACACHE, so I had a quick look at its source, but at first glance
I couldn't see anything that it's doing differently. Maybe I'll examine it in
more detail.

Actually, I'll probably end up using a different method, since this
will only speed up Cconws(), and a lot of programmers use Bconout(1, character)
in a loop, since it's faster than Cconws(). Ideally, I'd like to have ALL
screen text output (non-VDI) go through the fastprt() routine. If someone can
tell me how to do this (without checking explicitly for one of Cconout(),
Cconws(), Bconout(1, ch), and Fwrite() to stdout), then I'd appreciate it.

(And yes, I know I'll have to add a test for stdout being redirected to
a file.)
--
    - Jonathan A. Fischer,    jafischer@lily.waterloo.edu
...{ihnp4,allegra,decvax,utzoo,utcsri}!watmath!lily!jafischer
    "Happiness is a dead lobber."
        - motto of a "Gauntlet" player.
fred@pnet01.cts.com (Fred Brooks) (03/28/88)
I think you should check the stack to see if you were in super or user mode
when you called the trap. If called from user mode, check the usp for your
function number and whatever args. Look on the supervisor stack where the SR
was saved; there you can check the supervisor bit ("BTST 5, (a7)"), as the
status register is saved along with the PC on every TRAP call.

UUCP: {cbosgd hplabs!hp-sdd sdcsvax nosc}!crash!pnet01!fred
ARPA: crash!pnet01!fred@nosc.mil
INET: fred@pnet01.cts.com
david@bdt.UUCP (David Beckemeyer) (03/29/88)
In article <3738@watcgl.waterloo.edu> jafischer@lily.waterloo.edu (Jonathan A. Fischer) writes:
> This will undoubtedly reveal the vast storehouse of ignorance I have
>at my fingertips, but here goes anyway...
>
> Since "Tempus" came out, it seems that countless programmers took it
>as a personal challenge to write some screen routines that were as fast.
>Myself included. Well, writing the printing routines wasn't much trouble (a
>nice intro to 68000 assembly language programming, I felt), but in trying to
>hook the thing into the GEMDOS trap handler so that it catches Cconws() calls,
>I've come across a pretty strange bug. Hopefully someone can help me out.
>
> I use oldvect = Setexc(...) to point TRAP #1 to my trap handler with
>no problem. In the trap handler, I check 6(sp) for the function number. This
>is where the problem is...
>==============================================================================
>fp_handler_:
>        cmpi.w  $9,6(sp)        / Never results in "true" condition.
>        beq     is_Cconws
>
>        movea.l oldvect,a0      / Not Cconws(), so jump to GEMDOS.
>        jmp     (a0)
>
>is_Cconws:
>        move.l  (sp),4(sp)      / Overwrite GEMDOS function number with return
>        lea     4(sp),sp        / address.
>        jmp     fastprt         / Jump to the fast print routine.
>==============================================================================

I don't want Jonathan to feel as though I'm flaming him. This is a posting
that hopefully everybody will find informative.

First, aside from the "bug" you're describing, it's never gonna work to
catch Cconws() like this (as you mention), but I'll get to that later.

GEMDOS is entered via a TRAP #1 instruction. The 68000 has a supervisor
and user mode, each with separate stacks. The 68000 exception stack frame
(like from an Interrupt or Trap) looks like this:

         (sp) -> word  SR (status register)
        2(sp) -> long  return PC (program return address)

When an exception is generated, the 68000 pushes the SR register and the
current PC onto the supervisor stack, regardless of whether the processor
was running in user or supervisor mode at the time. SP (or A7) addresses
the Supervisor stack when the trap handler is entered. There is a bit in the
SR on the stack that indicates which mode the CPU was running in. It can
be checked like this:

        btst.b  #5,(sp)
        bne     in_super
in_user:
        move.l  usp,a0          * arguments are on User Stack
        bra     go_on
in_super:
        lea     6(sp),a0        * arguments are on Supervisor Stack
go_on:
        cmp.w   #9,(a0)         * look at GEMDOS func. code
        beq     is_Cconws

The above code will load the A0 register with a pointer to the GEMDOS
arguments (including the function #). For Cconws() that means:

         (a0) -> word  9 (Cconws func. code)
        2(a0) -> long  address of string to print.

And now for the real problem....

> Actually, I'll probably end up using a different method, since this
>will only speed up Cconws(), and a lot of programmers use Bconout(1, character)
>in a loop, since it's faster than Cconws(). Ideally, I'd like to have ALL
>screen text output (non-VDI) go through the fastprt() routine. If someone can
>tell me how to do this (without checking explicitly for one of Cconout(),
>Cconws(), Bconout(1, ch), and Fwrite() to stdout), then I'd appreciate it.
>
> (And yes, I know I'll have to add a test for stdout being redirected to
>a file.)

Exactly. Cconws() does NOT simply mean "print to the screen". It means
"write a string to GEMDOS standard output (handle 1)". You can't just
catch the Cconws() call to make a real output handler. You must deal with
anything that uses handle 1, including watching for redirection with
Fforce(), or Fwrite()'s to handle 1.

Catching Bconout(2, c) and Bcostat(2) is probably the best solution to
catching all non-VDI characters. It's clean, but you will have to interpret
the VT-52 escape codes if you want to remain compatible. The trap handling
idea above will work in a similar manner to catch the BIOS (trap #13) calls.

Catching the BIOS calls will have the best chance for compatibility with
the most programs. Messing around at the GEMDOS level could cause lots
of things to break.

The VT-52 escape codes that the ST uses are documented in the Atari
"Hitchhikers Guide to the BIOS".
--
David Beckemeyer                        | "Yuh gotta treat people jes' like yuh
Beckemeyer Development Tools            | do mules. Don't try to drive 'em. Jes'
478 Santa Clara Ave, Oakland, CA 94610  | leave the gate open a mite an' let 'em
UUCP: ...!ihnp4!hoptoad!bdt!david       | bust in!"
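[Editor's sketch] The stack-frame reasoning in the post above can be modelled outside the ST. The following is a minimal Python sketch, not the poster's code: memory is a plain byte array, and the names `gemdos_args_addr`, `read_word`, the toy addresses and the sample SR/PC values are all hypothetical. It only illustrates the rule stated above: the saved SR sits at (sp), the return PC at 2(sp), and the GEMDOS arguments are at 6(sp) only if the caller was already in supervisor mode; otherwise they are on the user stack.

```python
SUPERVISOR_BIT = 0x2000  # S bit: bit 13 of the SR word, i.e. bit 5 of its high byte


def read_word(mem, addr):
    """Read a big-endian 16-bit word, the way the 68000 stores it."""
    return int.from_bytes(mem[addr:addr + 2], "big")


def gemdos_args_addr(mem, ssp, usp):
    """Return the address of the GEMDOS function number for a TRAP #1.

    Assumes the plain 68000 exception frame: word SR at ssp, long PC at ssp+2.
    """
    saved_sr = read_word(mem, ssp)
    if saved_sr & SUPERVISOR_BIT:
        return ssp + 6   # caller was in supervisor mode: args follow the frame
    return usp           # caller was in user mode: args are on the user stack


# Toy memory image (hypothetical addresses): supervisor stack at 0, user stack at 32.
mem = bytearray(64)
ssp, usp = 0, 32
mem[ssp:ssp + 2] = (0x0300).to_bytes(2, "big")            # saved SR, S bit clear
mem[ssp + 2:ssp + 6] = (0x00012345).to_bytes(4, "big")    # saved return PC
mem[usp:usp + 2] = (9).to_bytes(2, "big")                 # function number 9 (Cconws)
mem[usp + 2:usp + 6] = (0x00054321).to_bytes(4, "big")    # pointer to the string

args = gemdos_args_addr(mem, ssp, usp)
func = read_word(mem, args)
```

In this model a handler that always looks at 6(sp) reads whatever happens to follow the frame on the supervisor stack whenever the caller was in user mode, which is consistent with the comparison against 9 never succeeding.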
wes@obie.UUCP (Barnacle Wes) (03/29/88)
In article <2739@crash.cts.com>, fred@pnet01.cts.com (Fred Brooks) writes:
> I think you should check the stack to see if you were in super or user mode
> when you called the trap.

You're close here, Fred. A trap always puts you in Super mode, so you
will have the status register on the stack. Since you're in Super mode,
you're automagically using the ssp (as a7 or sp). That's why the function
code is at 6(sp).

I don't know why the cmpi is changing the data it looks at!

        Wes Peters
--
     /\              - "Against Stupidity,  - {backbones}!
    /\/\  .    /\    - The Gods Themselves  - utah-cs!utah-gr!
   /    \/ \/\/  \   - Contend in Vain."    - uplherc!sp7040!
  / U i n T e c h \  - Schiller             - obie!wes
t68@nikhefh.hep.nl (Jos Vermaseren) (03/30/88)
Regarding the interception of trap 1, and in particular Cconws, to make the
screen faster:

One much faster way to write to the screen is the use of Bconout(5,c),
which is a poorly documented feature. It writes characters to the screen
in a 'raw' mode. This means that the VT52 escape codes are not interpreted
and all characters are written, including escapes, hex 00 etc.
The control codes can then be done with Bconout(2,c).
This gives an improvement of a factor two or more over using Cconout and
also an improvement of about a factor two over Cconws.
Much time is used in the filtering of the escape codes, apparently.

To make another BIOS traphandler doesn't look like a solution, as most of
the screen writing time sits in the trap handler and the saving of registers
that comes with it.

The reason that some programs like Tempus have such a fast screen has several
reasons:
1: The screen routines are user routines that are directly accessible
   for the program. They are written in such a way that it is not
   necessary to save registers.
2: In Tempus the font files have been reshuffled to make the order of
   the pixels different from the GEM format.
3: Absolutely no interpretation of tabs takes place. A good tab handler
   would slow the output down by about a factor two (guesstimate).
Putting a trap handler in between would spoil much already.
I think that Tempus comes rather close to what is possible on the ST, so
any general solution will never come close to it.

What is missing from the BIOS would be an equivalent of Cconws. Let us call
that Bconws. It would have to accept a device number and the number of
characters in the string, because the raw output should not be stopped by a
0x00. That would avoid the traphandler for most of the output. If you install
such an extra function your screen will become much faster than the
Bconout(5,c) call. All other BIOS calls will become a little slower then,
unless you find a version-independent way to copy the BIOS jumptables at the
installation of your program.

Jos Vermaseren
T68@nikhefh.hep.nl
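[Editor's sketch] The point above about why the proposed Bconws would need a character count can be shown in a few lines. This is a Python illustration only; "Bconws" does not exist in the BIOS (the post proposes it), and the function names and sample bytes below are hypothetical. It contrasts output that stops at the first 0x00, as a NUL-terminated string call would, with output driven by an explicit count.

```python
def nul_terminated_output(buf):
    """Bytes a NUL-terminated string call would emit: up to the first 0x00."""
    end = buf.find(b"\x00")
    return buf if end < 0 else buf[:end]


def counted_output(buf, count):
    """Bytes the hypothetical counted call would emit: exactly `count` bytes."""
    return buf[:count]


# Raw screen data may legitimately contain 0x00 and ESC (0x1b) as printable glyphs.
raw = b"AB\x00CD\x1bE"

truncated = nul_terminated_output(raw)   # stops before the embedded 0x00
complete = counted_output(raw, len(raw)) # passes every byte through
```

With raw output, every byte value is a glyph, so a terminator byte cannot be reserved; passing the length is the usual way around that.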
jafischer@lily.waterloo.edu (Jonathan A. Fischer) (03/31/88)
[]
First off, thanks for your response, David. Exactly why I posted
the question. Oh yeah, and by "bug" I was really referring to a bug in the way
I was trying to do it, certainly not a bug on the ST's part.

Since I posted the question, I've hardly had any time for further
twiddling, but I did discover that in fact, a7 was pointing to the supervisor
stack, although db.prg, after stopping the program at the breakpoint, had a7
pointing to the user mode stack. That's where the main confusion lay.

I did a dump of the supervisor stack and discovered that that's what
my program was examining. And the words that were changing after the 'cmpi'
instruction were _from_ the supervisor stack. I.e., when the breakpoint was
reached, telling db.prg to print out the stack (by saying "a7,6?x") resulted
in:
        XXXX XXXX 0000 0009 YYYY YYYY
and after stepping through the 'cmpi' instruction, saying "a7,6?x" showed:
        XXXX XXXX AAAA AAAA YYYY YYYY,
where AAAA AAAA is the second longword from the supervisor stack, not the
user stack. Weird or what?

Well, I'll see what I can do as to actually trapping everything that
prints to the screen, rather than trapping Cconws(). And I'm implementing the
VT52 escapes in a lazy way, in that I just pass them to the real Cconws().
Didn't want to reinvent the wheel as far as that goes.
--
    - Jonathan A. Fischer,    jafischer@lily.waterloo.edu
...{ihnp4,allegra,decvax,utzoo,utcsri}!watmath!lily!jafischer
    Pascal SUCC()'s.
K538915@CZHRZU1A.BITNET (03/31/88)
Ah! For once Jos and I are in total agreement... Anyway, I was going to write
something along the same lines, so now just a few comments:

t68@nikhefh.hep.nl (Jos Vermaseren) writes:
>To make another BIOS traphandler doesn't look like a solution, as most of
>the screen writing time sits in the trap handler and the saving of registers
>that comes with it.

Most of the trivial BIOS calls take 0.08 to 0.1 ms (measured with calls from
CCD/OOS Pascal and assembler), so the time spent just doing the trap is
in the same order of magnitude as a fast screen output routine.

>The reason that some programs like Tempus have such a fast screen has several
>reasons:
..........
>2: In Tempus the font files have been reshuffled to make the order of
>   the pixels different from the GEM format.

As far as I know they use DEGAS format fonts, which means you can address
the scanlines of the font with the address register relative with post-
increment addressing mode (in other words: you save an adda.w instruction).
DEGAS format fonts naturally have other limitations.

>What is missing from the BIOS would be an equivalent of Cconws. Let us call that
>Bconws. It would have to accept a device number and the number of characters
>in the string because the raw output should not be stopped by a 0x00. That
>would avoid the traphandler for most of the output....................

It would be nice if such a thing existed for input too.

>..................................................If you install such an
>extra function your screen will become much faster than the Bconout(5,c) call.
>All other BIOS calls will become a little slower then, unless you find a
>version independent way to copy the BIOS jumptables at the installation of
>your program.

And Tempus naturally (well, at least I assume so) doesn't do single character
output, it outputs strings.

The problem with BIOS traps slowing things down is a problem in UniTerm too.
One thing I did was to pack as much as possible into one call of the
XBIOS supexec function: it returns the number of characters in the RS232
buffer, whether a key has been pressed, the state of the DCD and Break bits
and a mouse status bit, all with the overhead of one trap. On the other hand
there is no way to read more than one character at a time from the serial port
without rewriting the RS232 handler, so you still have to do one trap per
character. Matter of fact, the most annoying thing is that, because the
kbshift variable isn't legally accessible in pre-Blitter-TOS ROMs,
UniTerm spends a bl**dy 0.08 ms per main loop just finding out if somebody
pressed <CapsLock>.

Simon Poole
UUCP: ...mcvax!cernvax!forty2!poole
Bitnet: K538915@CZHRZU1A
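[Editor's sketch] The argument above, that per-character traps cost about as much as the drawing itself and that string output amortizes the trap, comes down to simple arithmetic. The Python sketch below is only a model: the 0.08 ms trap cost is the figure measured in the post, while `DRAW_MS` (time to draw one character) and the 2000-character screenful are hypothetical values chosen for illustration, not measurements.

```python
def per_char_ms(n_chars, trap_ms, draw_ms):
    """Total time when every character goes through its own trap."""
    return n_chars * (trap_ms + draw_ms)


def per_string_ms(n_chars, trap_ms, draw_ms):
    """Total time when one trap hands over the whole string."""
    return trap_ms + n_chars * draw_ms


TRAP_MS = 0.08   # lower bound for a trivial BIOS call, as quoted in the post
DRAW_MS = 0.05   # hypothetical per-character drawing cost (assumption)
N_CHARS = 2000   # roughly one 80x25 screenful

one_trap_per_char = per_char_ms(N_CHARS, TRAP_MS, DRAW_MS)
one_trap_per_string = per_string_ms(N_CHARS, TRAP_MS, DRAW_MS)
speedup = one_trap_per_char / one_trap_per_string
```

The faster the drawing routine gets, the larger the share of time the trap takes in the per-character case, which is the reason a string-oriented call (or packing several queries into one supexec call) pays off.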
jafischer@lily.waterloo.edu (Jonathan A. Fischer) (04/01/88)
In article <444@nikhefh.hep.nl> t68@nikhefh.hep.nl (Jos Vermaseren) writes:
>The reason that some programs like Tempus have such a fast screen has several
>reasons:
>1: The screen routines are user routines that are directly accessible
>   for the program. They are written in such a way that it is not
>   necessary to save registers.

The saving of registers actually represents a very tiny fraction of
the time used by a typical "fastprt()" (as mine is called) routine. This is
because fastprt() is called with a whole string, rather than a character at a
time. The bulk of the time consumed is the simple loop that loads a character,
finds the offset into the bitmap array, then copies 8 or 16 bytes of bitmap
data to the screen.

>3: Absolutely no interpretation of tabs takes place. A good tab handler
>   would slow the output down by about a factor two (guesstimate).

Actually, handling tabs adds perhaps 0.1% of overhead, with respect to
time taken. The same goes for handling the ESC codes, too, since all that is
added is one compare instruction, as in:

        <load character into register>
        cmpi.b  $ESC,<register>
        bgt     not_ESC
        ...             /* Processing of ESC, TAB, CR, LF, ... */
not_ESC:
        ...             /* Regular character handling */

As you can see, if a given character is _not_ a control character
(i.e. it is greater than 27), then all that is executed is the "cmpi.b"
instruction. This adds essentially no time, when compared to the total time
taken to print a typical string.
--
    - Jonathan A. Fischer,    jafischer@lily.waterloo.edu
...{ihnp4,allegra,decvax,utzoo,utcsri}!watmath!lily!jafischer
    Pascal SUCC()'s.
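[Editor's sketch] The single-compare dispatch described above can be mimicked in Python to see how few characters of ordinary text ever reach the control-code path. This is an illustration, not the poster's routine; `classify` and the sample string are hypothetical. One assumption is worth flagging: Python compares byte values as unsigned integers, so characters 0x80 and above take the fast path here. On the 68000 that corresponds to an unsigned branch after the compare; whether the original routine treats the ST's extended characters that way is not stated in the post.

```python
ESC = 27  # highest control code the slow path has to look at


def classify(text):
    """Return (fast, control): counts of characters taking each path."""
    fast = control = 0
    for ch in text:          # iterating bytes yields unsigned ints 0..255
        if ch > ESC:         # the one compare every character pays for
            fast += 1
        else:
            control += 1     # ESC, TAB, CR, LF, ... handled separately
    return fast, control


sample = b"Hello, world!\r\n"
fast_count, control_count = classify(sample)
```

For typical text only the line endings and the occasional tab or escape fall through to the slow path, which matches the claim that the extra compare is negligible next to the bitmap copy.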
unpowell@csvax.liv.ac.uk (04/20/88)
I've never seen or heard of this program Tempus, but one way I know
of to considerably speed up character output, and one that may be used in
this program, is to "hardware scroll" the screen.

A factor I'd like to draw your attention to is that in all three
screen resolutions a row of characters occupies $500 or 1280 bytes of memory.

Say the screen is at location $70000. The physical screen position
should first be put to $70000. All character rows are printed normally until
it comes time to scroll. Instead of moving 32000-1280=30720 bytes to scroll
the screen, we instead add $500 to the physical screen position, thus
displaying the area of memory one character line further forward in memory,
i.e. $70500.

Snag. We will eventually run out of memory if we keep moving the
screen onwards through memory. So we reserve an area approximately twice the
size of the standard screen, 64000 bytes say. When we are displaying the last
32000 bytes of this block we then copy rows 1 to 24 (remember row 0 is the
top one) to the start of this extended screen area and then display the first
32000 bytes of this area as the screen.

Reserving 64000 bytes (enough for 50 character rows) will involve
moving 32000 bytes of memory every 25 scrolls. The standard BIOS routine
has to move (32000-1280)*25=768000 bytes every 25 scrolls. Quite a saving in
time for just an extra 32K of screen memory.

Using this method speeds up screen output fantastically.

Mark Powell

********************************************************************************
"...there's no success   JANET unpowell@uk.ac.liv.csvax
like failure and         UUCP  {backbone}!mcvax!ukc!mupsy!liv-cs!unpowell
failure's no success     ARPA  unpowell%csvax.liv.ac.uk@nss.cs.ucl.ac.uk
at all..." B.Dylan
********************************************************************************
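[Editor's sketch] The byte counts in the post above can be checked with a small Python model of the two scrolling strategies. The function names and the exact wrap rule are assumptions made for this sketch: the display window slides down a 50-row buffer one row per scroll, and only when it has reached the end are the 24 surviving rows (30720 bytes) copied back to the start. Under that rule the copy happens on the scroll after the 25th, so the post's "32000 bytes every 25 scrolls" is treated here as a round figure.

```python
ROW_BYTES = 1280                          # one character row, all three resolutions
SCREEN_ROWS = 25
SCREEN_BYTES = ROW_BYTES * SCREEN_ROWS    # 32000


def software_scroll_bytes(scrolls):
    """Bytes moved when every scroll copies the screen up by one row."""
    return scrolls * (SCREEN_BYTES - ROW_BYTES)


def hardware_scroll_bytes(scrolls, buffer_rows=50):
    """Bytes moved when scrolling by advancing the screen base address."""
    max_top = buffer_rows - SCREEN_ROWS   # last row index the window may start at
    top = 0
    moved = 0
    for _ in range(scrolls):
        if top < max_top:
            top += 1                      # just add $500 to the screen address
        else:
            moved += (SCREEN_ROWS - 1) * ROW_BYTES   # copy rows 1..24 back
            top = 0
    return moved


copy_cost_25 = software_scroll_bytes(25)
hw_cost_26 = hardware_scroll_bytes(26)
hw_cost_260 = hardware_scroll_bytes(260)
```

Over a long run the model moves one 30720-byte block per 26 scrolls instead of one per scroll, which is where the large speed-up comes from.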
jpdres13@usl-pc.UUCP (John Joubert) (04/26/88)
-----------------------------------
Mark,

How did you get the machine to scroll from a non-512 byte boundary?

In the past, when I changed the address of the logical screen I had to make
sure that I placed it on a 512 byte boundary. When I did not I got a really
screwy screen if I did not crash. How did you get around that?

----------------------------------------------------------------------------
John Joubert                  |    /\     |  /\      | _
jpdres13@usl-pc.USL or ...    |    \|<>|>|>  \|<>|>|><`|`|
ut-sally!usl!usl-pc!jpdres13  |-----/|-------/|----------------------
GEnie: J.JOUBERT              |    \/      \/
-----------------------------------------------------------------------------
hase@netmbx.UUCP (Hartmut Semken) (04/27/88)
In article <544@csvax.liv.ac.uk> unpowell@csvax.liv.ac.uk writes:
>
> I've never seen or heard of this program Tempus, but one way I know
>of to considerably speed up character output and one that may be used in
>this program is to "hardware scroll" the screen.

And what about scrolling windows?
Tempus (new version coming Real Soon Now) supports 4 windows on screen.
One can be scrolled...

hase
--
Hartmut Semken, Lupsteiner Weg 67, 1000 Berlin 37 hase@netmbx.UUCP
I think, you may be right in what I think you're thinking. (Douglas Adams)
unpowell@csvax.liv.ac.uk (05/03/88)
In article <1204@usl-pc.UUCP>, jpdres13@usl-pc.UUCP (John Joubert) writes:
> -----------------------------------
> Mark,
>
> How did you get the machine to scroll from a non-512 byte boundary?
>
> In the past, when I changed the address of the logical screen I had to make
> sure that I placed it on a 512 byte boundary. When I did not I got a really
> screwy screen if I did not crash. How did you get around that?

I'm not quite sure what your problem is. If you do displace the screen
by, say, 256 bytes while the desktop is being displayed you do get a "really
screwy screen". This is due to each raster line of the screen requiring 160
bytes in colour and 80 bytes in monochrome. So this displacement of the screen
puts the display somewhere in the "middle" of a "display line". The first
number that is a multiple of 160, 80 and 256 is 1280, which is incidentally
the amount of memory each character row (on a 25 row display) occupies. Thus
moving the screen by this amount, $500, will displace the screen by an entire
character row. With a careful bit of manipulation this effect can be turned
into a hardware screen scrolling routine.

I don't know about the crashing, the ST's not "that" bad.

Mark Powell

********************************************************************************
"...I hate the white     JANET unpowell@uk.ac.liv.csvax
man, and the man         UUCP  {backbone}!mcvax!ukc!mupsy!liv-cs!unpowell
who turned you all       ARPA  unpowell%csvax.liv.ac.uk@nss.cs.ucl.ac.uk
loose..." R. Harper
********************************************************************************
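[Editor's sketch] The claim above that 1280 is the first number that is a multiple of 160, 80 and 256 is a least-common-multiple calculation, and can be verified in a few lines of Python. The helper name `lcm` and the variable names are just for this sketch; the figures 160, 80 and 256 are the ones given in the post (bytes per raster line in colour and monochrome, and the displacement used in the example).

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


# Bytes per colour raster line, per monochrome raster line, and the
# 256-byte displacement mentioned in the post.
step = reduce(lcm, (160, 80, 256))

colour_lines_per_step = step // 160   # raster lines skipped in colour modes
mono_lines_per_step = step // 80      # raster lines skipped in monochrome
```

The result is a whole number of raster lines in every resolution (8 in colour, 16 in monochrome), which is why moving the screen by that step shifts the picture by exactly one character row instead of tearing it.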