[comp.sys.ibm.pc] Screen-writing speed & snow elimination

dmt@mtunb.ATT.COM (Dave Tutelman) (03/13/89)

In the past couple of months, there have been a number of notes
(and responses) on the subjects of:
   -	Snow on the screen, and code to eliminate it.
   -	Why the BIOS is so slow.
   -	Fast routines to write to the screen.

I've recently had the occasion to check the performance of various
screen-writing routines (including the BIOS).  The attached paper
is (1) a tutorial on snow elimination techniques, and (2) the results
of my performance measurements.

Enjoy!

+---------------------------------------------------------------+
|    Dave Tutelman						|
|    Physical - AT&T Bell Labs  -  Lincroft, NJ			|
|    Logical -  ...att!mtunb!dmt				|
|    Audible -  (201) 576 2442					|
+---------------------------------------------------------------+






                    SNOW-FREE SCREEN WRITING vs. VIDEO BIOS:
                         TUTORIAL AND PERFORMANCE TESTS

                                 Dave Tutelman
                                16 Tilton Drive
                               Wayside, NJ 07712

                                (201) 922 - 9576










                      1. Principles of Snow Elimination
                        1.1. Theory
                        1.2. Practice


                      2. Performance
                        2.1. Test Measurements
                          2.1.1. Basic Measurements
                          2.1.2. Effect of Snow Elimination
                          2.1.3. Effect of String Writes
                          2.1.4. Effect of Pointer Computation
                        2.2. Why is BIOS so Slow?



























       Screen Writing                                           2-13-89





       1. Principles of Snow Elimination

       If you write programs for MSDOS PCs, you face an interesting
       dilemma: how to write to the screen.
         - If you use the BIOS, you will take a performance hit; it's
           slow.
         - If you write directly to video RAM to speed it up, you have
           to write different code for each kind of video display. And
           some displays have an added difficulty, "snow", which is
           notoriously hard to eliminate.

       Snow is the visual noise that appears on the screen of certain
       displays when a program reads or writes to the video RAM. The CGA
       (IBM Color Graphics Adapter) is particularly snowy, but is hardly
       the only offender.

       This note discusses where snow comes from, and how to eliminate
       it by writing to video RAM during retrace. It also gives some
       detailed performance measurements that show how much speed can be
       gained by avoiding the BIOS calls; improvements of a factor of
       ten are common, and the gain can be as high as a factor of
       seventy.

       1.1. Theory

       When IBM introduced the Color Graphics display adapter (CGA),
       they made an unfortunate design decision. A display adapter needs
       to read from its memory "as needed" by the raster sweep, and
       write to its memory "as needed" by the CPU. To save money on the
       board, they didn't do this with a true dual-port memory; instead,
       they allowed the CPU to take precedence over the raster when they
       access the video RAM on the same memory cycle. During such
       cycles, the video generator can't read from memory, and doesn't
       know what video signal to put out. So it guesses, and usually
       guesses wrong (hey, what's the chance of getting all eight bits
       right?). Wrong guesses look like snow on the screen.

       IBM overcame this hardware deficiency in software. They wrote
       their video driver in the BIOS so that it writes to memory only
       when the video beam is turned off. At the end of each horizontal
       line on the screen, the adapter turns off the beam and allows it
       to retrace back to the left edge of the screen to begin the next
       horizontal line. It is possible to do a "snowless" write to video
       memory if you do it only during the retrace. You can also use the
       vertical retrace (which is of much longer duration), while the
       beam is returned from the bottom of the screen to the top.

       The BIOS writes to the screen only during horizontal or vertical
       retrace, and so can your program. It is possible to tell when the
       display is retracing, because a bit in the display adapter's
       status register is 1 during horizontal retrace, and another bit
       is 1 during vertical retrace.

       Let's show some actual code to do such a write. Our first example
       will be simple (really naive); in other words, it looks good but
       doesn't work. We will evolve our code until we have a working
       snowless write.

       Suppose we want to write a word vidword to video RAM. We've saved
       the offset in video RAM in the variable vidoffset. (We'll use
       conventions of the "C" language, and Turbo C where we can't say
       things "portably".) We know that:
         - The segment part of the base address of the CGA is 0xB800.
         - The CGA's status register is at input port 0x3DA.
         - The horizontal sweep bit is bit 0; the vertical sweep bit is
           bit 3. Thus the combined sweep mask on the status byte is
           0x09.
       Thus the C code to do a snow-free write might be:

               /* Just spin until a retrace bit turns on. */
               while ( ( inportb (0x3DA) & 0x09 ) == 0 )
                   ;

               /* We're in retrace; write it. */
               /* Turbo C's poke() takes segment, offset, value. */
               poke ( 0xB800, vidoffset, vidword );

       Might be, but isn't. Unfortunately, probability says you'll
       encounter a horizontal retrace much more often than a vertical
       retrace, and the horizontal one lasts a very short time. The code
       above looks tight, but it's not nearly tight enough; by the time
       it actually writes to video RAM, the retrace will be over. If you
       ever encounter a program where the "snow" is in a vertical band a
       fixed distance from the left edge of the screen, you'll be
       observing a failed attempt at snow elimination. Instead of
       removing the snow, it just synchronizes it to the horizontal
       sweep.

       The table below shows the sweep characteristics of a CGA display.
       Other displays have similar characteristics, varying from the CGA
       by a factor of less than two.

                                    Horizontal        Vertical
                                    -----------     -------------
            Sweeps per second         15,750             60
            Duration of retrace     10 microsec     2158 microsec

       Thus the horizontal retrace is a more attractive target (we can
       write to the screen more frequently), but a much harder one to
       hit than the vertical retrace. The next section shows the
       programming techniques to catch the retrace as frequently as
       possible, in order to maximize our throughput to the screen.
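
       The trade-off in the table can be put into numbers with a few
       lines of C. This is only a sketch: the constants are copied from
       the table, the 4.77 MHz clock is that of a "slow" PC or XT, and
       the function names are invented for the illustration. It ignores
       the fact that horizontal retraces also occur during the vertical
       one.

```c
#include <assert.h>

/* Figures taken from the sweep table; the clock is an assumed 4.77 MHz. */
#define H_SWEEPS_PER_SEC   15750L
#define H_RETRACE_USEC     10L
#define V_SWEEPS_PER_SEC   60L
#define V_RETRACE_USEC     2158L
#define CPU_KHZ            4770L    /* 4.77 MHz expressed in kHz */

/* Microseconds out of every second spent in one kind of retrace. */
long retrace_usec_per_sec(long sweeps_per_sec, long retrace_usec)
{
    assert(sweeps_per_sec > 0 && retrace_usec > 0);
    return sweeps_per_sec * retrace_usec;
}

/* Whole CPU clock cycles that fit into a window of the given length. */
long cycles_in_window(long window_usec, long cpu_khz)
{
    assert(window_usec >= 0 && cpu_khz > 0);
    return window_usec * cpu_khz / 1000L;
}
```

       With these figures the horizontal retraces add up to 157,500
       microseconds per second (about 16% of the time) and the vertical
       retraces to 129,480 (about 13%), so the two are worth roughly the
       same total time. The difference is that each horizontal window
       is only about 47 clock cycles long on a 4.77 MHz machine.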

       Before proceeding, however, it's worth mentioning another
       technique for snow elimination: turning off the video altogether
       for a short burst of screen writes. The video can be disabled by
       simply turning off a bit in the mode register of the display
       adapter. It can stay off for the time required to write about 250
       characters (three lines on the screen) according to Brad Davidson
       (Usenet message number 1884@druxq.UUCP, August 6, 1985). However,
       it's not clear how often you can do this before the flicker
       becomes annoying. Since you have to wait for vertical retrace to
       start, you couldn't possibly do this more than 30 times a second
       (assuming you could get away with turning off the display every
       other frame); that's a maximum rate of 7500 characters per
       second. The use of retrace gives about the same rate, but without
       the program complexity of having to buffer three lines and send a
       burst to the screen. For this reason, I won't discuss it any
       further here.

       1.2. Practice

       I'll start off assuming that you want your code to run across a
       variety of DOS machines, including 4.77 MHz PCs and clones. If
       you're going to try for the horizontal retrace, you have to
       detect it and use it in under 10 microseconds (less than 50
       machine cycles on a "slow" PC or XT).

       You will need to program it in assembler. Pascal or C won't give
       you the speed to catch the horizontal retrace. The reason is that
       you must point a register pair (say, ES:DI, so we can use a
       "string-move" instruction) at the target area in video RAM before
       you start to look for H-retrace; if you find retrace and then
       load the pointer into the register pair, you'll be late in
       writing. It's impossible to express this constraint in Pascal or
       C; you have to code it yourself in assembler.

       Once again, we want to write vidword to location vidoffset.
       Suppose we've also stored the video segment in vidsegment and the
       port address in vidport. We've even been clever enough to put
       vidoffset and vidsegment in adjacent words, so we can load them
       into registers with a single instruction. The code we have so far
       is:

               mov  dx,vidport              ;Address of status register
               mov  cx,vidword              ;Data to be written
               les  di,dword ptr [vidoffset]  ;Load pointer into ES:DI
               mov  ah,9                    ;2-bit mask for V+H retrace


            wait_retrace:
               in   al,dx                   ;Get the status register to AL
               test al,ah                   ;V+H retrace mask
               jz   wait_retrace            ;loop till either bit turns on


               mov  ax,cx                   ;put the data in the accumulator
               stosw                        ;do the screen write

       This is a lot better than our last try. We have much less
       "synchronized snow" than we had before, but it's not all gone.
       What's wrong?

       Well, the code to detect horizontal retrace and use it is now
       shorter than the retrace itself, so we're doing some snow
       reduction. However, there's a strong possibility that we'll start
       to search for H-retrace while we're already in it, and even late
       in it. If we start our loop late in the H-retrace, we'll be out
       of it before we do the write, hence the remaining snow at the
       left edge of the screen. To eliminate all the snow, we must first
       make sure we're out of the retrace before we start to look for
       it. That way we'll be sure to find it early in the retrace. The
       code below does that.


               mov  dx,vidport              ;Address of status register
               mov  cx,vidword              ;Data to be written
               les  di,dword ptr [vidoffset]  ;Load pointer into ES:DI
               mov  ah,9                    ;2-bit mask for V+H retrace


            wait_sweep:
               in   al,dx                   ;Get the status register to AL
               test al,1                    ;H-retrace?
               jnz  wait_sweep              ;... Yes. Wait till it turns off


            wait_retrace:
               in   al,dx                   ;Get the status register to AL
               test al,ah                   ;V+H retrace mask
               jz   wait_retrace            ;loop till either bit turns on


            do_it:
               mov  ax,cx                   ;put the data in the accumulator
               stosw                        ;do the screen write
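
       The effect of the extra wait_sweep loop can be illustrated with
       a small simulation in C. It is a toy model, not a measurement:
       time advances in whole microseconds, the line period is rounded
       to 63 microseconds, each pass through a polling loop is assumed
       to cost 2 microseconds, and all names are invented for the
       illustration.

```c
#include <assert.h>

#define LINE_USEC     63   /* about 1/15750 s, rounded; an assumption */
#define RETRACE_USEC  10   /* horizontal retrace time from section 1.1 */
#define POLL_USEC      2   /* assumed cost of one pass through a loop */

/* Simulated status bit 0: 1 while the beam is in horizontal retrace. */
static int in_h_retrace(long t)
{
    return (t % LINE_USEC) >= (LINE_USEC - RETRACE_USEC);
}

/* Microseconds of retrace still left at simulated time t (0 if none). */
static long retrace_left(long t)
{
    return in_h_retrace(t) ? LINE_USEC - (t % LINE_USEC) : 0;
}

/* Single loop: spin until the retrace bit is on. */
long left_after_simple_wait(long t)
{
    assert(t >= 0);
    while (!in_h_retrace(t))
        t += POLL_USEC;
    return retrace_left(t);
}

/* Two loops: spin until the bit is off, then until it is on. */
long left_after_edge_wait(long t)
{
    assert(t >= 0);
    while (in_h_retrace(t))
        t += POLL_USEC;
    while (!in_h_retrace(t))
        t += POLL_USEC;
    return retrace_left(t);
}
```

       Starting late in a retrace (t = 61 in this model), the single
       loop falls through at once with only 2 microseconds left, while
       the two-loop version skips that retrace and catches the next one
       with 9 microseconds left. In this model the two-loop version
       never has less than RETRACE_USEC - POLL_USEC + 1 microseconds to
       work with, whatever the starting time.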

       This is almost the code that I use in my routines. However, to
       enhance performance, I add a couple of checks:
         - I keep a static variable called snowok, which I set to 1 for
           display adapters that don't put snow on the screen. If this
           variable is set, I bypass the snow suppression code.
         - Before I check for the horizontal sweep, I look to see if
           we're in vertical retrace. The V-retrace is thousands of
           microseconds, and the retrace bit turns off long before the
           beam turns back on, so we can write to video RAM any time
           we see it.
         - I change a <TEST,JNZ> pair of instructions to <RCR,JC>, which
           saves a couple of machine cycles.
       The resulting code is:

               mov  dx,vidport              ;Address of status register
               mov  cx,vidword              ;Data to be written
               les  di,dword ptr [vidoffset]  ;Load pointer into ES:DI
               mov  ah,9                    ;2-bit mask for V+H retrace
               test snowok,1                ;Is this display snow-free?
               jnz  do_it                   ;... Yes. Skip snow suppression
               in   al,dx                   ;Read the status register
               test al,8                    ;Vertical retrace in progress?
               jnz  do_it                   ;... Yes. Don't need to wait


            wait_sweep:
               in   al,dx                   ;Get the status register to AL
               rcr  al,1                    ;Faster than  TEST AL,1
               jc   wait_sweep              ;Still in H-retrace. Wait


            wait_retrace:
               in   al,dx                   ;Get the status register to AL
               test al,ah                   ;V+H retrace mask
               jz   wait_retrace            ;loop till either bit turns on


            do_it:
               mov  ax,cx                   ;put the data in the accumulator
               stosw                        ;do the screen write

       Before we move on to performance measurements, however, I'd like
       to comment on a few differences between this and a code fragment
       recently posted by Ward Christensen (Usenet message
       5082@phoenix.Princeton.EDU, January 2, 1989). Ward attributes the
       code to "FASTWRITE", by an author whose name he's forgotten.

         - Ward recommends bypassing the snow elimination code if the
           board is a monochrome board instead of a CGA; he claims that
           the sweep bit doesn't toggle with a mono board, and the
           program will hang. In my experience, bypassing snow removal
           for a mono board is a good idea, but not because it hangs the
           program; it doesn't. However, the mono board doesn't have
           inherent snow; the video memory is better designed. Without
           the retrace checks, the program will run a lot faster.

           As Ward points out, the best way to decide what to do is to
           look at the video mode variable in the BIOS. If the mode is
           7, it's a mono board. If the mode is 0 to 6, it's a CGA (or
           another display in CGA-compatible mode).

         - FASTWRITE recommends turning off interrupts while looking for
           retrace and writing to the screen. It (like other snow removal
           programs I've seen) accomplishes this by surrounding the snow
           removal code with a CLI-STI instruction pair.

           I oppose this in principle, and find it unnecessary in
           practice. I believe that turning off interrupts should only
           be done if the system or application will be corrupted by the
           occurrence of an interrupt (e.g., while switching stacks, or
           unloading the received byte from a UART before the next byte
           overwrites it). But what is the consequence of being
           interrupted in our routine? Either a spot of snow on the
           screen or a delay in the output of a character. Either is
           invisible if interrupts are infrequent events; neither
           threatens the integrity of any other operation of the
           computer.

         - FASTWRITE doesn't treat vertical retrace as a special case.
           This slows the "screen throughput" by 5%-10%, as measured by
           the techniques in the next section.
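
       The mode test described in the first item above can be sketched
       in C as follows. The structure and function names are invented
       for the illustration. The monochrome buffer segment 0xB000 is
       not given in the text above; it is the standard MDA address.
       Treating every mode from 0 to 6 as needing snow suppression is a
       deliberately conservative assumption, since the mode number
       alone cannot tell a real CGA from a snow-free compatible board.

```c
#include <assert.h>

struct video_target {
    unsigned segment;   /* segment of the text buffer */
    int      snowok;    /* 1 = safe to skip the retrace wait */
};

/* Map a BIOS video mode number to a buffer segment and a snowok flag. */
struct video_target classify_video_mode(unsigned char mode)
{
    struct video_target t;

    assert(mode <= 7);          /* only the modes discussed in the text */
    if (mode == 7) {            /* monochrome board */
        t.segment = 0xB000;     /* standard MDA segment (not from text) */
        t.snowok  = 1;
    } else {                    /* modes 0..6: CGA or CGA-compatible */
        t.segment = 0xB800;
        t.snowok  = 0;          /* conservative: assume it may snow */
    }
    return t;
}
```

       On a real machine the mode byte is kept in the BIOS data area at
       0040:0049, so under Turbo C the argument could be obtained with
       something like peekb (0x40, 0x49).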

       2. Performance

       2.1. Test Measurements

       2.1.1. Basic Measurements

       I made a series of measurements of "screen throughput" in
       characters per second, on several MSDOS Personal Computers.
       Throughput was measured by sending 20,000 to 100,000 characters
       to the screen, and timing the duration with a stopwatch. The
       calling program was written in C, and made repeated calls to a
       function dputc (y, x, character, color), which was itself coded
       in assembler. The calling program was, roughly speaking:

                 for (n=0; n<N; n++)
                     for (i=0; i<25; i++)
                         for (j=0; j<80; j++)
                             dputc (i, j, n+' ', 0x07);

       where N was chosen to give a reasonable time to measure with a
       stopwatch.

       Table 1 shows the results, for three different versions of the
       dputc () function.
         - The first version simply invokes the appropriate BIOS
           functions to move the cursor and write the character. This
           establishes a baseline throughput for each computer.
         - The second version is coded in assembler, using the snow
           elimination code discussed earlier in these notes.
         - The third version establishes a maximum throughput for the
           computer, by writing directly to video RAM without worrying
           about snow. The CGA and the AT&T 400-line displays showed
           plenty of snow with this version.
       The numbers in the table show the baseline throughput, and the
       improvement factor over the baseline for each of the faster
       versions. For instance, the AT&T PC 6300 PLUS can write to the
       screen using the BIOS at 1000 characters per second.  If it spews
       characters at the screen raw (without snow elimination), it can
       go at 15 times that speed, or 15,000 characters per second.






                                    TABLE 1
                     CHARACTER WRITING SPEED USING dputc()


         Computer        IBM     Clone   AT&T    IBM     AT&T    AT&T
                          XT      XT     6300    AT      6300+   6386
                         -----   -----   -----   -----   -----   -----
         CPU Chip        8088    8088    8086    286     286     386
                         -----   -----   -----   -----   -----   -----
         Display Board   CGA     Herc    AT&T    CGA     AT&T    EGA
                                 Mono    400-ln          400-ln
                         -----   -----   -----   -----   -----   -----
         Characters      360     380     770     830     1000    2800
            per second
            using BIOS
         Improvement
            factor
            over BIOS:
          -Snow            8      11      11       8      12       8
            eliminated
          -Just write     14      12      14      16      15      13
            to video

       The results are pretty consistent.  Calling a snow-free
       assembly-coded function improves throughput by a factor of
       roughly ten (8 to 12) over the BIOS, for a wide range of
       processors and displays. The maximum obtainable improvement
       (forgetting the snow elimination altogether and just writing
       blindly to the video RAM) is a factor of about 14 (12 to 16)
       over the BIOS.

       2.1.2. Effect of Snow Elimination

       Looking at the table, we see that we can pick up most of the
       improvement even if we wait for retrace to eliminate snow. The
       speed with snow eliminated is between 50% and 92% of the maximum
       raw speed possible.
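
       The 50% and 92% figures follow directly from the two improvement
       rows of Table 1. A short C sketch of the arithmetic is given
       here; the array and function names are invented for the
       illustration, and machines are numbered 0 to 5 in the column
       order of the table.

```c
#include <assert.h>

#define MACHINES 6

/* Improvement factors over the BIOS, copied from Table 1. */
static const int snow_free[MACHINES] = {  8, 11, 11,  8, 12,  8 };
static const int raw_write[MACHINES] = { 14, 12, 14, 16, 15, 13 };

/* Snow-free speed as a percentage of raw speed, rounded to nearest. */
int percent_of_raw(int machine)
{
    assert(machine >= 0 && machine < MACHINES);
    return (100 * snow_free[machine] + raw_write[machine] / 2)
           / raw_write[machine];
}
```

       The low end is the IBM AT with a CGA (8 against 16, or 50%) and
       the high end is the XT clone with the Hercules board (11 against
       12, or 92%).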

       I ran some measurements on the relative contributions of the
       vertical and horizontal retrace to the speed of the snow-free
       functions.

         - The functions using only horizontal retrace were about 20%
           slower than the functions using both vertical and horizontal
           retrace.

         - The functions using only vertical retrace were very slow.
           They were within a factor of two of the baseline BIOS
           results, and slower than the BIOS in some cases.

       2.1.3. Effect of String Writes

       There is a way to gain a major improvement even over these
       results. Remember that we're writing from a C program that loops
       and calls dputc() for each character.  But most programs do the
       vast majority of their screen output as character strings, not
       isolated characters. This suggests an opportunity to code an
       assembler routine dputs() that writes a snow-free string when
       called. Such a function would be faster by saving:
         - The less-efficient C code for the inner loop of the program.
         - The overhead of a function call for the vast majority of
           screen writes.
         - The need to compute the offset in video RAM for each
           character; the pointer just needs to be stepped.
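
       The difference between the two interfaces can be shown with a
       portable C sketch that writes to an ordinary array standing in
       for video RAM. It is not the assembler code measured here and
       does no retrace waiting; the _sim names and cell_at() are
       invented for the illustration. Each screen cell is one 16-bit
       word with the character in the low byte and the attribute in the
       high byte.

```c
#include <assert.h>
#include <stdint.h>

#define ROWS 25
#define COLS 80

/* Stand-in for video RAM: one 16-bit word per character cell. */
static uint16_t screen[ROWS * COLS];

/* Character at a time: the cell index is recomputed on every call. */
void dputc_sim(int row, int col, char ch, uint8_t attr)
{
    assert(row >= 0 && row < ROWS && col >= 0 && col < COLS);
    screen[row * COLS + col] = (uint16_t)((attr << 8) | (uint8_t)ch);
}

/* String at a time: compute the start once, then step the pointer.
   Output is clipped at the end of the buffer.  Returns cells written. */
int dputs_sim(int row, int col, const char *s, uint8_t attr)
{
    uint16_t *p, *end = screen + ROWS * COLS;
    int n = 0;

    assert(row >= 0 && row < ROWS && col >= 0 && col < COLS);
    for (p = screen + row * COLS + col; *s != '\0' && p < end; p++, n++)
        *p = (uint16_t)((attr << 8) | (uint8_t)*s++);
    return n;
}

/* Read back one cell, for checking. */
uint16_t cell_at(int row, int col)
{
    assert(row >= 0 && row < ROWS && col >= 0 && col < COLS);
    return screen[row * COLS + col];
}
```

       A call such as dputs_sim (0, 0, "Hi", 0x07) costs one function
       call and one multiplication for the whole string, where the
       equivalent dputc_sim () loop pays both for every character.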

       I have coded the dputs() function, and measured it with a program
       that's output-equivalent to the previous test program:

                 line [80] = '\0';
                 for (n=0; n<N; n++) {
                     for (j=0; j<80; j++)
                         line [j] = n+' ';
                     for (i=0; i<25; i++)
                         dputs (i, 0, line, 0x07);
                 }

       The results, shown in Table 2, are in the form of improvement
       factors over the corresponding row of the character-at-a-time
       tests.

                                    TABLE 2
                       STRING WRITING SPEED USING dputs()


         Computer        IBM     Clone   AT&T    IBM     AT&T    AT&T
                         XT       XT     6300    AT      6300+   6386
                         -----   -----   -----   -----   -----   -----
         CPU Chip        8088    8088    8086    286     286     386
                         -----   -----   -----   -----   -----   -----
         Display Board   CGA     Herc    AT&T    CGA     AT&T    EGA
                                 Mono    400-ln          400-ln
                         -----   -----   -----   -----   -----   -----
         Improvement
            factor over
            dputc() tests
          -BIOS            1.5     1.5     1.5     1.4     1.5     1.5
            calls
          -Snow            4       4       3       2.3     2       1.3
            eliminated
          -Just write      5.5     6       5       4       4.5     4
            to video

       We can see that, except for the 386 box, the total snow-free
       improvement over the baseline BIOS data ranges between factors
       of roughly 18 and 44. For instance, the improvement for the XT
       clone is the previous improvement factor (11), times the
       improvement due to writing strings (4), or 44. Where snow
       elimination isn't needed, the improvement factor can be even
       higher. For instance, the Hercules monochrome board, with no
       snow problem, can have its screen written 72 times (12 times 6)
       faster than the BIOS.
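
       The combined factors are simply the products of the
       snow-eliminated rows of Table 1 and Table 2. A C sketch of the
       multiplication, with invented names and with the Table 2 values
       stored times ten to stay in integer arithmetic:

```c
#include <assert.h>

#define MACHINES 6

/* Snow-eliminated rows of Table 1 and Table 2, in column order.
   The Table 2 factors are multiplied by ten (4 becomes 40, etc.). */
static const int dputc_gain[MACHINES]     = {  8, 11, 11,  8, 12,  8 };
static const int dputs_gain_x10[MACHINES] = { 40, 40, 30, 23, 20, 13 };

/* Total snow-free gain over the BIOS baseline, times ten. */
int total_gain_x10(int machine)
{
    assert(machine >= 0 && machine < MACHINES);
    return dputc_gain[machine] * dputs_gain_x10[machine];
}
```

       This gives 32, 44, 33, 18.4 and 24 for the first five machines,
       and 10.4 for the 386 box.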




       2.1.4. Effect of Pointer Computation

       To my surprise, one thing for which I've occasionally criticized
       the BIOS design was not much of a speed factor.  The BIOS
       recalculates the video RAM address offset each time it is called.
       Since the 8088 is horribly slow at multiplying, it would seem
       smart to remember the last location written and the offset
       associated with it. If the new location is close (say, in the
       same row), simply adjust the old offset by addition rather than
       computing a new one from scratch by multiplication.
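
       The two ways of computing the offset can be sketched in C as
       follows. This is an illustration of the idea only, with invented
       names; the routines that were measured are in assembler. The
       offset is in bytes, two per character cell on an 80-column
       screen.

```c
#include <assert.h>

#define COLS 80

/* Remembered position and the byte offset that goes with it. */
static int last_row = -1, last_col, last_offset;

/* From scratch: one multiplication by the row width per call. */
int offset_by_multiply(int row, int col)
{
    assert(row >= 0 && col >= 0 && col < COLS);
    return (row * COLS + col) * 2;
}

/* Incremental: if the row is unchanged, adjust the remembered offset
   by addition; otherwise fall back to the multiplication. */
int offset_by_adjust(int row, int col)
{
    assert(row >= 0 && col >= 0 && col < COLS);
    if (row == last_row) {
        int delta = col - last_col;
        last_offset += delta + delta;    /* two bytes per cell */
    } else {
        last_offset = offset_by_multiply(row, col);
    }
    last_row = row;
    last_col = col;
    return last_offset;
}
```

       Both functions return the same offset for any position; the
       second one only avoids the multiplication when consecutive
       writes stay in the same row.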

       To test this assertion, I measured throughput using both methods,
       for a few of the test machines; the results are shown in Table 3.

                                    TABLE 3
                   IMPROVEMENT DUE TO INCREMENTAL COMPUTATION


                   CPU Chip        8088    8086    286
                                   ----    ----    ----
                   Snow            1.00    1.00    1.00
                      eliminated
                   Just write      2.22    1.03    0.95
                      to video

       The "adjustment" method showed a difference only in the "raw
       write" case. Where we took the trouble to eliminate snow, the
       time spent waiting for retrace completely masked any improvement
       in the offset computation.

       Even in the case of raw writes, the results were hardly decisive.
       The slowest of the machines (using a 4.77 MHz 8088 CPU) saw a
       significant improvement. The improvement in the case of the 8 MHz
       8086 chip was barely measurable. With the 286 chip (where the
       multiply instruction is much faster), computing from scratch was
       actually faster than adjusting the old offset.


       2.2. Why is BIOS so Slow?

       Unfortunately, most of the problem is in the fundamental design
       of the BIOS, not the implementation. It is impossible to make any
       major improvements by cleverly implementing the functions as
       defined. Here are the major contributing factors:

         - The use of the software interrupt as the function interface
           carries more overhead than a function call, but not all that
           much. However, grouping functions together on an interrupt
           (and selecting which function by switching on the value of
           AH) requires every function call to be handled by a
           "dispatcher" that burns a nontrivial number of machine
           cycles.

         - The selection of function calls is itself a botch. In the
           first place, there is no string write call, and we have seen
           how big a difference it can make. (String write has recently
           been added to the BIOS, but portable programs obviously won't
           use that call since it won't be in most BIOSes in the field.)

         - An additional problem with the function set is that the
           screen write functions only write where the cursor is, and
           they don't move the cursor automatically. One reason that the
           BIOS is so slow is that, when a program writes a string, it
           has to make two BIOS calls for each character: one to write
           the character and the other to advance the cursor. My dputc()
           and dputs() functions write to an arbitrary place on the
           screen, independent of where the visible cursor is blinking.

flong@sdsu.UUCP (Fred J. E. Long) (03/13/89)

Please correct me if I am wrong, but can't you do this for fast writes:

1) Write the characters to page 2
2) Switch to page 2 for viewing
3) Write the characters to page 1
4) Go back to page 1 for viewing

Can you do this, or am I thinking of Apple ]['s ?

-- 
Fred J. E. Long	
San Diego State University, San Diego, California  92093
ARPA: flong%midgard@ucscc.ucsc.edu	
UUCP: ...!ucsd!sdsu!flong

jc58+@andrew.cmu.edu (Johnny J. Chin) (03/14/89)

Yes, you may write to page 2 and flip back and forth BUT ....
it will still cause snow.

And what's worse ... you can't do that with a standard monochrome card.

I've tried it and writing to any of the pages still makes the CGA snow.
And on the monochrome, writing to pages other than page 0 causes unpredictable
results.
      __________                                ___
     /          \                          /   /    /_/ / /\/
    _/  /   /   /                       __/.  /__  / / / / /
   /     /     /
  /           / 4730 Centre Ave. #412   ARPAnet: Johnny.J.Chin@andrew.cmu.edu
 /  -------  /  Pittsburgh, PA  15213   BITnet:  jc58@andrew
 \__________/   (412) 268-8936          UUCP: ...!harvard!andrew.cmu.edu!jc58
 Computer Dr.

Disclaimer:   The views expressed herein are STRICTLY my own, and not CMU's.

spolsky-joel@CS.YALE.EDU (Joel Spolsky) (03/14/89)

In article <3562@sdsu.UUCP> flong@sdsu.UCSD.EDU (Fred J. E. Long) writes:
| 
| Please correct me if I am wrong, but can't you do this for fast writes:
| 
| 1) Write the characters to page 2
| 2) Switch to page 2 for viewing
| 3) Write the characters to page 1
| 4) Go back to page 1 for viewing

The IBM MDA (monochrome display adapter) only has one page. So, unless
you want to write two sets of i/o routines, no, you can't.

+----------------+----------------------------------------------------------+
|  Joel Spolsky  | bitnet: spolsky@yalecs.bitnet     uucp: ...!yale!spolsky |
|                | internet: spolsky@cs.yale.edu     voicenet: 203-436-1483 |
+----------------+----------------------------------------------------------+
                                                      #include <disclaimer.h>

dmt@mtunb.ATT.COM (Dave Tutelman) (03/15/89)

>In article <3562@sdsu.UUCP> flong@sdsu.UCSD.EDU (Fred J. E. Long) writes:
>| Please correct me if I am wrong, but can't you do this for fast writes:
>| 
>| 1) Write the characters to page 2
>| 2) Switch to page 2 for viewing
>| 3) Write the characters to page 1
>| 4) Go back to page 1 for viewing

In article <53527@yale-celray.yale.UUCP> spolsky-joel@CS.YALE.EDU (Joel Spolsky) responds:
>The IBM MDA (monochrome display adapter) only has one page. So, unless
>you want to write two sets of i/o routines, no, you can't.

Actually, the MDA doesn't have a snow problem at all, so you don't need to
go to this trouble.  Fortunate indeed, because Joel is right that it
has only one page (though the Hercules monochrome has several).  Joel
is also right that you'll need multiple routines: one with the page
switching and one without.

However, let's see if there is something to be gained from page-switching.
The most speed you can hope to gain is the difference between raw screen writes
and those protected by waiting for retrace.  The price of this is the
need to keep two copies of the screen up-to-date.  (I assume you want
to make incremental updates to an existing screen, not just repaint
the whole screen each time you change it.  That means that the "off-screen"
page must be kept identical to the "on-screen" one.)

Thus, to make the two-page technique pay off, you've got to be able to do
raw screen writes at least twice as fast as the retrace-protected
writes, because you've got to do twice as many of them.
My measurements show precious few cases where the raw writes were
more than twice as fast as the retrace-protected writes.  In the majority
of machines (especially the slower ones, where algorithm performance 
buys the most), the speed gain is less than x1.5.  In those cases,
keeping two pages is a net loss.

So here are the places where keeping two screens might be worthwhile:

   -	The cases where the speedup is worth it.  From my data, that
	includes cases like a PC6300 PLUS (286 box) or a PC6386
	(386 box) with a display that NEEDS snow removal.

   -	Painting a brand new screen for the first time.  Some programs
	have places where the entire screen context changes.  Page
	switching is a way of making that happen instantly and
	snow-free.

   -	Those programs where there is "dead time" after a screen
	write, that can be used for updating the off-screen page.
	Many places in interactive programs have this characteristic.
	However, it's got to be more than one or two characters to
	pay off; even the BIOS has no apparent performance problem
	where you write less than a line.

   -	Those programs where the APPEARANCE of speed is more important
	than real speed.  With page switching, the update is done
	invisibly and then "snaps" to the screen; you don't watch
	the screen being painted.  (I've always found this sort
	of operation more annoying than watching the screen go; if
	it were actually slower, it'd be intolerable.  However, no
	accounting for taste  :-)

In summary, good idea Fred, and it works.  But its advantages are
limited to special cases.  If I wanted to write ONE way of doing
it for all my console programs, I'd use retrace-protection.

+---------------------------------------------------------------+
|    Dave Tutelman						|
|    Physical - AT&T Bell Labs  -  Lincroft, NJ			|
|    Logical -  ...att!mtunb!dmt				|
|    Audible -  (201) 576 2442					|
+---------------------------------------------------------------+

brad@looking.UUCP (Brad Templeton) (03/16/89)

It is possible to write a complete 80 by 25 page full of information on
a slow 4.77 MHz PC to a snowy colour card in 1/15 second by waiting for
retrace, and thus getting no snow.

I wrote a curses library package ages ago that's used in some of my
software packages that does just this.  1/15th of a second is just
barely noticeable.  You can't notice anything on a machine faster than a
PC.

No fancy bank switch schemes are needed, just good retrace waiting code.
-- 
Brad Templeton, Looking Glass Software Ltd.  --  Waterloo, Ontario 519/884-7473

shurr@cbnews.ATT.COM (Larry A. Shurr) (03/18/89)

In article <53527@yale-celray.yale.UUCP> spolsky-joel@CS.YALE.EDU (Joel Spolsky) writes:
}In article <3562@sdsu.UUCP> flong@sdsu.UCSD.EDU (Fred J. E. Long) writes:

}| Please correct me if I am wrong, but can't you do this for fast writes:

}| 1) Write the characters to page 2
}| 2) Switch to page 2 for viewing
}| 3) Write the characters to page 1
}| 4) Go back to page 1 for viewing

}The IBM MDA (monochrome display adapter) only has one page. So, unless
}you want to write two sets of i/o routines, no, you can't.

True and it also doesn't eliminate the need for snow removal if your
CGA requires snow removal to begin with, so it doesn't eliminate that
overhead.

regards, Larry
-- 
Signed: Larry A. Shurr (att!cbnews!cbema!las or osu-cis!apr!las)
Clever signature, Wonderful wit, Outdo the others, Be a big hit! - Burma Shave
(With apologies to the real thing.  The above represents my views only.)