pf@diab.se (Per Fogelstr|m) (07/26/88)
These are my own reflections on the recent discussion. Over the years I have worked with a few software- and hardware-driven graphics systems, and from what I have learned so far I cannot accept that a software BitBlt done by a general-purpose micro (for example a 68K) can be faster than a "hardware" one.

If we assume that the speed of the BitBlt depends on the memory bandwidth, then every bus cycle doing anything other than moving data is overhead. Agree? Okay, so even if a 68k is doing a "Blit" in straight code, it will consume some memory bandwidth for instruction fetches. A microprogrammed "hardware" blit will be able to use every memory cycle for data accesses, thus having a much higher transfer rate. Correct me if I'm wrong...

I know that the Amiga has been discussed. From what I know, the Amiga has the graphics part of the main memory isolated from the rest of the bus. This means that the Blitter in the Amiga can do a Blit without disturbing the 68k CPU. And even if the 68k accesses the graphics memory, the blitter will still have 50% of the available bandwidth left for its work, because the 68k only needs the other 50%. (I hope I'm not too misinformed.)

Assuming we have a fast micro (a 68030 or an NS32535), it would at least be supported by its on-chip caches. Even if the hit rate in these caches is as low as 50%, an external hardware Blitter could use the other 50% for its work, and the micro would run at full speed. So the CPU does not have to wait for the memory.

Someone pointed out that placing characters is the main work for the BitBlit. Yes, that is correct in some systems, and it is a problem in many cases. Placing a character can take from 20 microseconds and up, and the CPU has to wait for the blitter to be ready before placing the next character. 20 microseconds is too little for a context switch, so the CPU wastes time waiting. But 20-30 microseconds is at least faster than the CPU can place the character anyway. And if the graphics engine is smart enough, it can fetch characters from the buffer or main memory itself, offloading the micro until it needs some help with a special character (scrolling etc.).

And talking about scrolling! Everyone who has seen a Sun scroll must agree that the screen must be scrolled in less than one frame (1/70 of a second) to make a pleasant impression. I would like to see a 68k do that with a 1k x 1k x 8 plane display in one frame time.

Another argument I have heard is "But look at the Mac, it's CPU-driven and it's fast!" Well, I have just one answer to that: what is the display resolution?

OK, start flaming, I'm ready, but beware, my blitter is fast!!
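For a sense of scale, the scrolling challenge above can be turned into a bandwidth figure. This is only a sketch under stated assumptions: "1k x 1k" is taken as 1024 x 1024 pixels, each plane holds one bit per pixel, a full-screen scroll reads and writes every byte exactly once, and the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must cross the memory bus per second to scroll the whole
 * frame buffer once per refresh. Assumption: every byte of the frame
 * buffer is read once and written once per scroll. */
uint64_t scroll_bytes_per_second(uint32_t width, uint32_t height,
                                 uint32_t planes, uint32_t frames_per_second)
{
    assert(planes > 0 && frames_per_second > 0);
    uint64_t frame_bytes = (uint64_t)width * height * planes / 8;
    return 2u * frame_bytes * frames_per_second;   /* read + write */
}
```

Under those assumptions, scroll_bytes_per_second(1024, 1024, 8, 70) comes to 146,800,640 bytes per second, roughly 147 MB/s. For comparison, an 8 MHz 68000 with a 16-bit bus and a four-clock bus cycle peaks at about 4 MB/s before any instruction fetches are counted.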
jesup@cbmvax.UUCP (Randell Jesup) (07/28/88)
In article <399@ma.diab.se> pf@ma.UUCP (Per Fogelstr|m) writes:
>Someone pointed out that placing characters is the main work for the BitBlit.
>Yes, that is correct in some systems and this is a problem in many cases.
>Placing a character can take from 20 microseconds and up, and the cpu has to
>wait for the blitter to be ready before placing the next character. Then
>20 microseconds is too little for a context switch and the cpu is wasting time
>waiting. But 20-30 microseconds is at least faster than the cpu can place
>the character anyway. And if the graphic engine is smart enough it can fetch
>characters from the buffer or main memory itself and offloading the micro
>until it needs some help with a special character (scrolling etc.)

The only trouble with doing characters with a blitter is that setup time is usually longer than the actual blit (at least for 8N-wide fixed-width fonts that happen to be aligned on "nice" boundaries). For this reason, version 1.3 of the Amiga OS has something that intercepts Text() calls and renders them via the CPU if the font is 8 pixels wide and aligned on a byte boundary. This makes things like emacs really blazingly fast, with no discernible rendering time. The blitter and normal software handle the more general cases, and scrolling.

>And talking about scrolling! Everyone who has seen a Sun scroll must agree
>that the screen must be scrolled in less than one frame (1/70 of a second)
>to make a pleasant impression. I would like to see a 68k do that with
>a 1k x 1k x 8 plane display in one frame time.

I get real annoyed at Sun-2 (no blitter) scrolling speeds. And it's only dealing with 1 bitplane!

>Another argument i have heard is "But look at the Mac, its cpu driven and its
>fast !". Well, i just have one answer to that, what is the display resolution ?

Not compared to an Amiga with a blitter. Try dragging a color window on a Mac-II (remember, it has an '020 at twice the speed of the Amiga's '000).

Blitters can also do VERY fast line draws, with a few extra gates/registers, as well as area fills, etc.
--
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
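The byte-aligned special case described above is cheap for a CPU because each glyph row is a single byte store with no shifting or masking. A minimal sketch of that idea follows, assuming a one-bitplane bitmap and a font stored as one byte per glyph row; the function and parameter names are hypothetical and this is not the Amiga OS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Draw one 8-pixel-wide glyph at a byte-aligned column of a 1-bit-deep
 * bitmap.
 *   fb        start of the bitmap
 *   stride    bytes per scan line
 *   byte_col  destination column in bytes (pixel x / 8)
 *   row       top scan line of the glyph
 *   glyph     glyph image, one byte per glyph row (assumed font layout) */
void draw_glyph8(uint8_t *fb, size_t stride, size_t byte_col, size_t row,
                 const uint8_t *glyph, size_t glyph_height)
{
    assert(fb != NULL && glyph != NULL && byte_col < stride);
    uint8_t *dst = fb + row * stride + byte_col;
    for (size_t y = 0; y < glyph_height; y++) {
        *dst = glyph[y];    /* one store per scan line, no shift, no mask */
        dst += stride;
    }
}
```

A font that is not a multiple of 8 pixels wide, or a window that starts at an arbitrary pixel, falls out of this path and needs the general shift-and-mask case.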
root@cca.ucsf.edu (Computer Center) (07/28/88)
It's an old debating trick to try to make points against the opposing view by mis-characterizing it and then arguing against the distorted image.

Every one posting on this subject advocating cpu data shuffling for displays has tried to fake us all out by pretending that this is done in word sized units all neatly aligned on word boundaries.

Now let's see your software timings for real _bit_blit_ operations such as moving a block 37 bits wide aligned starting at bit 17 in the source position and starting at bit 29 in the destination position on a machine with 32-bit registers and data paths.

Thos Sumner (thos@cca.ucsf.edu) BITNET: thos@ucsfcca
(The I.G.) (...ucbvax!ucsfcgl!cca.ucsf!thos)
OS|2 -- an Operating System for puppets.
#include <disclaimer.std>
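To make the unaligned case above concrete, here is a minimal sketch of copying one row of bits between arbitrary bit positions using 32-bit words. It shows the shifting and masking involved, not a timing. Assumptions: bit 0 is the most significant bit of word 0, source and destination do not overlap, and the names copy_bits and get_bit are invented for the example. A tuned routine would work a whole destination word at a time with a fixed shift; this version trades speed for simplicity by never letting a chunk cross a word boundary on either side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one bit; bit 0 is the most significant bit of word 0. */
int get_bit(const uint32_t *w, size_t bit)
{
    return (int)((w[bit / 32] >> (31 - bit % 32)) & 1u);
}

/* Copy nbits bits from src (starting at bit sbit) to dst (starting at
 * bit dbit). Source and destination must not overlap. */
void copy_bits(uint32_t *dst, size_t dbit,
               const uint32_t *src, size_t sbit, size_t nbits)
{
    assert(dst != NULL && src != NULL);
    while (nbits > 0) {
        unsigned doff = (unsigned)(dbit % 32);
        unsigned soff = (unsigned)(sbit % 32);
        /* Largest chunk that stays inside one source word and one
         * destination word. */
        unsigned room = 32 - (doff > soff ? doff : soff);
        unsigned n = nbits < room ? (unsigned)nbits : room;
        /* Mask with the top n bits set (n is 1..32). */
        uint32_t mask = (n == 32) ? 0xFFFFFFFFu : ~(0xFFFFFFFFu >> n);
        /* Left-justify the source field, then slide it to doff. */
        uint32_t bits = (src[sbit / 32] << soff) & mask;
        uint32_t *d = &dst[dbit / 32];
        *d = (*d & ~(mask >> doff)) | (bits >> doff);
        dbit += n;
        sbit += n;
        nbits -= n;
    }
}
```

For the case in the post, copy_bits(dst, 29, src, 17, 37) touches two source words and three destination words per scan line and needs a 12-bit shift between them.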
guy@gorodish.Sun.COM (Guy Harris) (07/28/88)
> It's an old debating trick to try to make points against the opposing
> view by mis-characterizing it and then arguing against the distorted
> image.

It's an equally old trick to make a counterfactual assertion and treat it as an axiom....

> Every one posting on this subject advocating cpu data shuffling for
> displays has tried to fake us all out by pretending that this is
> done in word sized units all neatly aligned on word boundaries.
>
> Now let's see your software timings for real _bit_blit_ operations
> such as moving a block 37 bits wide aligned starting at bit 17 in the
> source position and starting at bit 29 in the destination position
> on a machine with 32-bit registers and data paths.

*Sigh* I don't think *anybody* claimed that all bit moving is "done in word size units all neatly aligned on word boundaries." *HOWEVER*:

    For applications in terminals, there are three cases of "bitblt" that dominate: drawing characters, scrolling windows and window-window operations such as exchanging off-screen data with the display. These cases also cover the most common graphics operations on personal computers.

    Drawing a character requires decoding a fount structure to find the location of the character in the fount bitmap and calling "bitblt" to draw the character on the display. For a general fount format and typical character sizes, over half the total time to draw a character on the Blit goes into overhead: at least one subroutine call and setup, opening the fount, building the argument list for "bitblt", calling "bitblt", and having "bitblt" in turn decode and clip its arguments and decide how to draw the image. Because the characters are so small -- drawing the letter 'a' touches 7 words of memory -- actually changing the pixels in the destination bitmap is relatively unimportant. Our overhead is not unreasonable; the Blit draws about 2500 characters per second in the standard fount, which is 9 pixels (not 8) by 14. An experimental version with eight-bit wide characters drawn only on byte boundaries, that avoided the overhead of calling "bitblt" and used a special fount format that was easy to decode (the current format is somewhat compressed for economy of memory), was only a factor of two faster. This is insufficient speed-up for so great a loss of generality.

    The second common case of "bitblt" is scrolling a rectangular region of a bitmap, usually the display. Since the word boundaries in the scan lines of a bitmap are at the same place in each line, the speed of scrolling depends primarily on the speed of the MC68000 instruction

        mov.l %a0@+, %a1@+

    or, in C,

        register long *p, *q;
        *p++ = *q++;

    For typical rectangles, the edges, which must be handled with more complicated code, do not dominate the performance. There is nothing hardware can do to accelerate this loop except provide faster memory access. If the display were accessed through a narrower or clumsier interface, it would take longer to move the data.

    The last common case is shuffling on- and off-screen rectangles. It can be made fast by a simple observation: the off-screen bitmaps are allocated by "balloc", which is given as argument the rectangle on the display occupied by the data. This rectangle is assigned to "rect" in the resulting "Bitmap". "balloc" can therefore allocate the bitmap so that the word boundaries occur in the same places in the image as they do in the display, reducing to the scrolling case the "bitblt" call that copies the data. This is the last feature of the "Bitmap" data structure: "Bitmap.rect" defines not only the co-ordinate system but also the word fragmentation; the "x" co-ordinate modulo 16 is 0 at the first bit of the word in every bitmap. This results in a factor of two to four speed-up for window-shuffling "bitblt" operations and combines neatly with the way textures are generated without diminishing the generality of the graphics primitives.

    Of course, there is also the wide, non-aligned case of "bitblt" to be supported, but almost by construction it occurs rarely, and the memory and software are clean enough to make it acceptably fast when it is executed.

from "Hardware/Software Trade-offs for Bitmap Graphics on the Blit", Rob Pike, Bart Locanthi, and John Reiser, Software-Practice and Experience, Vol. 15(2), 131-151 (February 1985).

I tend to believe Rob Pike and company when they say that "real _bit_blit_ operations such as moving a block 37 bits wide aligned starting at bit 17 in the source position and starting at bit 29 in the destination position on a machine with 32-bit registers and data paths" are not typical (at least in the way they used Blits) except for character painting, where overhead above and beyond the bit-pushing dominates. If you have evidence to indicate that this is not the case, let's see it.

In the aforementioned paper, they also discuss timings. They compare a Sun-1 (with a somewhat unusual frame buffer), a Sun-2 (with a conventional frame buffer with a BitBlt chip that acts only on the frame buffer), and the Blit. I don't know how much the Sun-2 with BitBlt chip resembles the "hardware BitBlt" support that has been discussed here, but here are the figures (minus those for the atypical Sun-1 frame buffer); all timings are in milliseconds:

    Operation                          Sun-2         Sun-2          Blit
                                       (display,     (memory,
                                       w/BB chip)    no BB chip)

    Scroll screen vertically           109            82.2          129
    Scroll screen horizontally         110           311            376
    Letter 'a' at random positions
      on the screen                      0.34          0.74           0.42
    Texturing a random 40x40 square      0.82          1.78           1.60

"The characters were drawn in a 9x14 pixel fount, but the bounding box for the letter 'a' is only 8x7. Both systems used "bitblt" to draw characters, rather than special purpose primitives, and executed clipping code." (from the article)

So it appears that an 8 MHz 68000 (Blit) can compete reasonably well with a 10 MHz 68010 (Sun-2), even with the assistance of the Sun-2's BitBlt chip. I don't know why the Sun-2 scrolled vertically memory-to-memory *faster* than it did display-to-display.

If a BitBlt chip is reasonably cheap, and can do the whole job, it may be worth it. Note that in the cases shown, you got at most a 3.5x speedup (scroll screen horizontally). For vertical scrolling, you got only 1.18x; for randomly drawing the letter 'a', you got only 1.23x; and for texturing a random 40x40 square, you got 1.95x. How cheap does it have to be for that to be worth it? (The "do the whole job" comes from comments made in the paper that a half-hearted hardware assist can get in the way, rather than help.)
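The scrolling case from the quoted excerpt reduces to the two-pointer loop it gives. A minimal sketch of a full-width vertical scroll built on that loop follows, assuming a bitmap whose scan lines are a whole number of 32-bit words; the function name and parameters are illustrative and are not taken from the Blit sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scroll a word-aligned bitmap up by `lines` scan lines.
 *   words_per_line  32-bit words per scan line
 *   height          number of scan lines
 * The scan lines exposed at the bottom are left unchanged. */
void scroll_up(uint32_t *bitmap, size_t words_per_line,
               size_t height, size_t lines)
{
    assert(bitmap != NULL && lines <= height);
    uint32_t *p = bitmap;                                  /* destination */
    const uint32_t *q = bitmap + lines * words_per_line;   /* source */
    size_t count = (height - lines) * words_per_line;
    /* Destination is below source in memory, so an ascending copy is
     * safe even though the regions overlap. */
    while (count-- > 0)
        *p++ = *q++;
}
```

Because every scan line has its word boundaries in the same place, the loop body needs no shifts or masks; only a scroll of a sub-rectangle with ragged edges needs extra code for the first and last word of each line.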
aglew@urbsdc.Urbana.Gould.COM (07/28/88)
>It's an old debating trick to try to make points against the opposing
>view by mis-characterizing it and then arguing against the distorted
>image.
>
>Every one posting on this subject advocating cpu data shuffling for
>displays has tried to fake us all out by pretending that this is
>done in word sized units all neatly aligned on word boundaries.
>
>Now let's see your software timings for real _bit_blit_ operations
>such as moving a block 37 bits wide aligned starting at bit 17 in the
>source position and starting at bit 29 in the destination position
>on a machine with 32-bit registers and data paths.
>
>Thos Sumner (thos@cca.ucsf.edu) BITNET: thos@ucsfcca

Has anyone got usage statistics for Blit operations? Like, how many are well aligned, on word/halfword/byte boundaries? How many are for characters, etc.?

About 3 years ago, when I was trying to choose areas for research, it was suggested that someone gather statistics for graphics operations like those we are familiar with for instruction set usage. I haven't been looking in the meantime - has anybody done work on this?

aglew@gould.com
henry@utzoo.uucp (Henry Spencer) (07/29/88)
In article <399@ma.diab.se> pf@ma.UUCP (Per Fogelstr|m) writes:
>... all bus cycles doing anything else than moving data is overhead.
>Agree ??. Okay, so even if a 68k is doing a "Blit" in straight code
>it will consume some memory bandwidth. A micro programmed "hardware" blit
>will be able to use every memory cycle for data accesses, thus having much
>higher transfer rate. Correct me if i'm wrong...

Almost right, which means "wrong". Take out the word "much" and I'll go along with it. Bulk data movement, like scrolling, can be done with 68k instructions like MOVEM, which move a couple of dozen words of data for every instruction fetch. Yes, avoiding the fetches would speed things up, but not by nearly as much as you think. People designing things like Blitters, DMA interfaces, etc., consistently ignore just how quickly a modern CPU can move data if the programmer really sits down and thinks for a while about how to do it. Most modern CPUs can nearly saturate their buses with data movement if they really try.

>Assuming we have a fast micro (an 68030 or a NS32535) the would at least be
>supported by their on chip caches. Even if the hitrate in these caches are
>as low as 50% an external hardware Blitter could use the other 50% ...

You miss an important point: those caches are not there to free up external memory cycles, they are there to help slow memory keep up with a fast CPU. It's not at all inconceivable to get 50% cache hits (which is low for an instruction cache but good for a tiny data cache like the 030's) *and* complete saturation of the external memory bandwidth, when one of those CPUs gets going.

>Someone pointed out that placing characters is the main work for the BitBlit.
>Yes, that is correct in some systems and this is a problem in many cases.
>Placing a character can take from 20 microseconds and up, and the cpu has to
>wait for the blitter to be ready before placing the next character...

This is in fact nearly irrelevant, because there are probably 200us or more of overhead required before that 20us BitBlt. Character drawing is a case where BitBlt speed is irrelevant, because character drawing speeds are TOTALLY dominated by the overhead of finding the character and deciding where to put it.
--
MSDOS is not dead, it just     | Henry Spencer at U of Toronto Zoology
smells that way.               | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu
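The overhead argument above is easy to put in numbers. The sketch below is just arithmetic on the figures quoted in the post (200us of overhead, a 20us blit); the function name is invented, and the model assumes overhead and blit time simply add, with the overhead unaffected by the hardware.

```c
#include <assert.h>

/* Overall speedup for drawing one character when only the blit itself
 * is made `blit_speedup` times faster and the overhead is unchanged. */
double char_speedup(double overhead_us, double blit_us, double blit_speedup)
{
    assert(overhead_us >= 0.0 && blit_us > 0.0 && blit_speedup > 0.0);
    return (overhead_us + blit_us) / (overhead_us + blit_us / blit_speedup);
}
```

With 200us of overhead and a 20us blit, doubling the blit speed gives 220/210, under 5% overall, and even an infinitely fast blitter cannot do better than 220/200 = 1.1x.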
gillies@p.cs.uiuc.edu (07/29/88)
Re: Render characters

Another approach (taken by the Xerox DLion) is to have a separate instruction just for displaying characters. This instruction, called (appropriately) "TextBLT", knows about the font table formats and is specialized for displaying rectangular blobs of text. TextBLT is also implemented in microcode, on the DLion's AMD2900-based CPU.
daver@nscimg.b16.sc.NSC.COM (Dave Rand) (07/29/88)
In article <1313@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Computer Center) writes:
>
>Now let's see your software timings for real _bit_blit_ operations
>such as moving a block 37 bits wide aligned starting at bit 17 in the
>source position and starting at bit 29 in the destination position
>on a machine with 32-bit registers and data paths.
>
>Thos Sumner (thos@cca.ucsf.edu) BITNET: thos@ucsfcca

On the NS32CG16 (32-bit CPU, 16-bit external data path), here are the numbers. The source is 3 words (less, really, but that's life in BITBLT). This needs to be moved to 4 (NOT 3) words of destination. The shift is 12. The height is not shown: I assume 32 lines. The times are given in clocks, and MICROseconds, assuming 15 MHz operation.

    EXTBLT  35 + ( 13 + (12*4))         * 32 = 1987 clocks, or 132 usecs
    BBFOR   48 + ( 61 + (4+4) + 25 + 4) * 32 = 3184 clocks, or 212 usecs
    BBOR    42 + (107 + (4+4) + 44 + 4) * 32 = 5258 clocks, or 350 usecs
    BBAND   45 + (111 + (4+4) + 44 + 4) * 32 = 5389 clocks, or 359 usecs

The EXTBLT instruction in the NS32CG16 drives the DP8510 BITBLT unit. This does the shift and ALU operation in hardware - the CPU provides only the addresses.

The BBFOR, BBOR, BBAND (and other BITBLT functions) are implemented in microcode directly. These instructions execute WITHOUT hardware assist of any form. The times shown include the shift of 12 (shifts of 0-8 bits are hidden by the fetch time of the destination data, due to the scheduled load/pipeline feature of the Series 32000 architecture).

BBFOR is a "Fast OR", performing a left-to-right BITBLT operation. BBOR, BBAND, EXTBLT and the other BITBLT operations in the NS32CG16 allow a full 4-direction (left-to-right, right-to-left, top down, and bottom up) BITBLT.

In moderate quantity, the price is $20-30. If you need more information, please contact me.

Dave Rand
daver@nscimg.nsc.com
{pyramid|sun}!nsc!nscimg!daver

These opinions in no way represent those of National Semiconductor.
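The word counts, the shift and the clock totals above can be reproduced with a few lines of arithmetic. The sketch below only restates the figures in the post (fixed setup plus a per-line cost times the height, at 15 MHz); the helper names are invented, and nothing here models the NS32CG16 itself.

```c
#include <assert.h>

/* Number of word_bits-wide words touched by `width` bits starting at
 * bit `start`. */
unsigned words_touched(unsigned start, unsigned width, unsigned word_bits)
{
    assert(width > 0 && word_bits > 0);
    return (start + width - 1) / word_bits - start / word_bits + 1;
}

/* Right shift needed to move data from src_bit alignment to dst_bit
 * alignment within a word. */
unsigned shift_amount(unsigned src_bit, unsigned dst_bit, unsigned word_bits)
{
    assert(word_bits > 0);
    return (dst_bit % word_bits + word_bits - src_bit % word_bits) % word_bits;
}

/* Total clocks: fixed setup plus per-scan-line cost times the height. */
unsigned long blt_clocks(unsigned setup, unsigned per_line, unsigned height)
{
    return setup + (unsigned long)per_line * height;
}

/* Convert clocks to microseconds at the given clock rate in MHz. */
double clocks_to_us(unsigned long clocks, double mhz)
{
    assert(mhz > 0.0);
    return (double)clocks / mhz;
}
```

With 16-bit words, words_touched(17, 37, 16) is 3 and words_touched(29, 37, 16) is 4, and shift_amount(17, 29, 16) is 12, matching the post. blt_clocks(35, 13 + 12*4, 32) gives the 1987 clocks quoted for EXTBLT, which is about 132 microseconds at 15 MHz.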
root@cca.ucsf.edu (Computer Center) (07/29/88)
In article <61783@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:

(In an attempt to refute my point about substituting word-blitting for bit-blitting without admitting it being a debater's trick).

>
> For applications in terminals, there are three cases of "bitblt" that
> dominate: drawing characters, scrolling windows and window-window
> operations such as exchanging off-screen data with the display. These
> cases also cover the most common graphics operations on personal
> computers.
> ...
>
> from "Hardware/Software Trade-offs for Bitmap Graphics on the Blit", Rob Pike,
> Bart Locanthi, and John Reiser, Software-Practice and Experience, Vol. 15(2),
> 131-151 (February 1985).
>

Another debater's trick: this one is called Appeal to Authority.

Never mind, I suppose, that this is 3 1/2 years after publication of the above and who knows how long before that it was written. A few things have happened in this business since then.

But the real point is that these are exactly the applications that should not be blitted at all; the video mapping controller should be handling all of that.

For example, at last month's Usenix meeting Bell Technologies was showing their Intel 82786 (I hope I got the number right) video controller running smoothly scrolled text over 2/3 of a high-res screen while occupying the remainder with instant opening and closing overlapped windows. No jerks, no glitches, no skew were to be seen.

It sure made the skew distorted scrolling of the corner cutting move-screen-bits-with-the-cpu systems look awful.

Thos Sumner (thos@cca.ucsf.edu) BITNET: thos@ucsfcca
(The I.G.) (...ucbvax!ucsfcgl!cca.ucsf!thos)
OS|2 -- an Operating System for puppets.
#include <disclaimer.std>
jesup@cbmvax.UUCP (Randell Jesup) (07/29/88)
In article <61783@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
> For applications in terminals, there are three cases of "bitblt" that
> dominate: drawing characters, scrolling windows and window-window
> operations such as exchanging off-screen data with the display. These
> cases also cover the most common graphics operations on personal
> computers.

They were dealing with straight terminals and simple rectangular windows, being used as a mainly character-oriented interface to larger machines. A very different environment from today's microcomputers, such as the Amiga, Mac, etc.

> decide how to draw the image. Because the characters are so small --
> drawing the letter 'a' touches 7 words of memory -- actually changing
> the pixels in the destination bitmap is relatively unimportant. Our

Characters are a somewhat special case, and are well worth special-casing in the code. The blitter does help a lot with proportional kerned fonts, less with monospaced fonts, and not at all with monospaced byte-multiple wide fonts aligned on byte boundaries. Unfortunately, this last case doesn't happen often, especially in a windowing environment. It can make editors that cover the screen several times faster.

> The second common case of "bitblt" is scrolling a rectangular region
> of a bitmap, usually the display. Since the word boundaries in the
> scan lines of a bitmap are at the same place in each line, the speed of
> scrolling depends primarily on the speed of the MC68000 instruction

Once again, this is true in a text-based environment. In a WIMP environment, this is much less true. Block operations usually start on arbitrary boundaries, and tend to be inconvenient widths.

> register long *p, *q;
> *p++ = *q++;
>
> For typical rectangles, the edges, which must be handled with more
> complicated code, do not dominate the performance. There is nothing
> hardware can do to accelerate this loop except provide faster memory
> access. If the display were accessed through a narrower or clumsier
> interface, it would take longer to move the data.

This is nowhere near as fast as the memory system can go nowadays, even given the slowest/cheapest DRAMs. For that loop, even unrolled, the CPU is being used at least 33% for instruction fetch, and even so the CPU only uses every other memory cycle.

> The last common case is shuffling on- and off-screen rectangles. It
> can be made fast by a simple observation: the off-screen bitmaps are
> allocated by "balloc", which is given as argument the rectangle on the
> display occupied by the data. This rectangle is assigned to "rect" in
> the resulting "Bitmap". "balloc" can therefore allocate the bitmap so
> that the word boundaries occur in the same places in the image as they
> do in the display, reducing to the scrolling case the "bitblt" call
> that copies the data.

This is nowhere near the common case on machines like the Amiga.

>I tend to believe Rob Pike and company when they say that "for real _bit_blit_
>operations such as moving a block 37 bits wide aligned starting at bit 17 in
>the source position and starting at bit 29 in the destination position on a
>machine with 32-bit registers and data paths" are not typical (at least in the
>way they used Blits) except for character painting, where overhead above and
>beyond the bit-pushing dominates. If you have evidence to indicate that this
>is not the case, let's see it.

You've said the operative clause: the way they used Blits. As I've said, blitter hardware can buy you linedraw and areafill as well, relatively cheaply. These things are MUCH faster as part of a blitter than as done by the CPU, up to 20x for linedraw.

>If a BitBlt chip is reasonably cheap, and can do the whole job, it may be worth
>it. Note that in the cases shown, you got at most a 3.5x speedup (scroll
>screen horizontally). For vertical scrolling, you got only 1.18x; for randomly
>drawing the letter 'a', you got only 1.23x; and for texturing a random 40x40
>square, you got 1.95x. How cheap does it have to be for that to be worth it?

You get bigger wins in animation or multitasking environments. A blitter is relatively cheap, if you already need video chips (of course, Commodore has chip design facilities, and uses custom chips for most things). The blitter on the Amiga is just a part of one of the graphics chips, maybe 1/4 of it.

A factor of 2-4x can make a really amazing difference in perceived speed, especially if update operations go down to 1-frame time. Using my Sun-2 (no blitter, no color) is positively painful compared to my Amiga, even though the Amiga is running in 4 colors (in this case).
--
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
elg@killer.DALLAS.TX.US (Eric Green) (07/29/88)
In message <61783@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) says:
>If a BitBlt chip is reasonably cheap, and can do the whole job, it may be worth
>it. Note that in the cases shown, you got at most a 3.5x speedup (scroll
>screen horizontally). For vertical scrolling, you got only 1.18x; for randomly
>drawing the letter 'a', you got only 1.23x; and for texturing a random 40x40
>square, you got 1.95x. How cheap does it have to be for that to be worth it?
>(The "do the whole job" comes from comments made in the paper that a
>half-hearted hardware assist can get in the way, rather than help.)
>

It's interesting to note that the Amiga chipset was originally designed for "the ultimate video game", which required a) low cost, and b) the ability to move random irregularly shaped objects with blazing speed, doing logic operations upon the operands (e.g. one favorite video game trick is EOR'ing the moving object into the background bitmap, then EOR'ing it out when it's ready to be moved, and EOR'ing it into the new location).

Amazing how well-suited such a chip is for a low-cost windowing system... well, not-so-amazing, really, since the chipset designers knew an awful lot about designing high speed video systems, while the designers of the Sun probably didn't have that experience when they were faced with the problem of speeding up their graphics rendering. Seems like the video game jocks have something to show us Unix jocks, after all....

Just some meaningless trivia to generate flames... ;-)

Eric
--
Eric Lee Green    ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg
          Snail Mail P.O. Box 92191 Lafayette, LA 70509
       MISFORTUNE, n. The kind of fortune that never misses.
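The EOR trick mentioned above works because XOR is its own inverse: applying the same sprite twice at the same place restores the background exactly, so no saved copy of the background is needed. A minimal sketch, assuming the sprite and the background region are laid out as matching runs of 16-bit words; the function name is made up, and a real sprite would also need a stride and a position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* XOR a sprite into the background. Calling this a second time with the
 * same arguments removes the sprite and restores the background. */
void xor_sprite(uint16_t *background, const uint16_t *sprite, size_t nwords)
{
    assert(background != NULL && sprite != NULL);
    for (size_t i = 0; i < nwords; i++)
        background[i] ^= sprite[i];
}
```

Moving an object is then three steps: XOR it out at the old position, update the position, XOR it in at the new one. The cost is that the object's colors are distorted wherever it overlaps non-zero background.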
henry@utzoo.uucp (Henry Spencer) (08/01/88)
In article <1315@ucsfcca.ucsf.edu> root@cca.ucsf.edu (Computer Center) writes:
>For example, at last month's Usenix meeting Bell Technologies was
>showing their Intel 82786 (I hope I got the number right) video
>controller running smoothly scrolled text over 2/3 of a high-res
>screen while occupying the remainder with instant opening and closing
>overlapped windows. No jerks, no glitches, no skew were to be seen.
>
>It sure made the skew distorted scrolling of the corner cutting
>move-screen-bits-with-the-cpu systems look awful.

To quote someone whose name I can't recall :-), "another debater's trick"! This time, comparing tomorrow's system with yesterday's. A 25 MHz AMD 29000 (note, not 2900) and a suitably cooperative memory subsystem should be able to do a *software* BitBlt that would make an Amiga look equally awful. If you want to compare the latest hot BitBlt chip, compare it against the latest hot CPU.
--
MSDOS is not dead, it just     | Henry Spencer at U of Toronto Zoology
smells that way.               | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu
glennw@nsc.nsc.com (Glenn Weinberg) (08/02/88)
In article <1988Jul28.173301.7275@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <399@ma.diab.se> pf@ma.UUCP (Per Fogelstr|m) writes:
>>Assuming we have a fast micro (an 68030 or a NS32535) the would at least be
>>supported by their on chip caches. Even if the hitrate in theese caches are
>>as low as 50% an external hardware Blitter could use the other 50% ...
>
>You miss an important point: those caches are not there to free up external
>memory cycles, they are there to help slow memory keep up with a fast CPU.
>It's not at all inconceivable to get 50% cache hits (which is low for an
>instruction cache but good for a tiny data cache like the 030's) *and*
>complete saturation of the external memory bandwidth, when one of those
>CPUs gets going.

It is not true that the sole purpose of cache is to help slow memory keep up with fast processors. In multiprocessors, in particular, one of the most important functions of caches (and especially copy-back caches) is to reduce bus traffic and so allow a relatively slow bus to support a number of fast processors.

Within the next five years, this "relatively slow bus" will need a bandwidth of multiple hundreds of Megabytes per second (you read that right -- the bus will need a bandwidth of several Gigabits per second) in order to support a multiprocessor system made up of, say, 8-16 50-MIPS processors. And the only way that you limit yourself to "only" needing hundreds of Megabytes per second is by using copy-back caches.
--
Glenn Weinberg                          Email: glennw@nsc.nsc.com
National Semiconductor Corporation      Phone: (408) 721-8102
(My opinions are strictly my own, but you can borrow them if you want.)
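The order of magnitude above can be sanity-checked with a very rough model. Everything in the sketch is an assumption made for illustration, not a figure from the post: each instruction costs one 4-byte memory reference (data references are ignored), 50 MIPS therefore means 50 million references per second per processor, and a single hit rate stands in for all cache behaviour, including the write traffic that copy-back caches save.

```c
#include <assert.h>

/* Bus traffic in MB/s generated by `cpus` processors, each issuing
 * `mrefs_per_sec` million memory references per second of
 * `bytes_per_ref` bytes, when a fraction `hit_rate` of the references
 * is satisfied by the caches and never reaches the bus. */
double bus_mbytes_per_sec(unsigned cpus, double mrefs_per_sec,
                          double bytes_per_ref, double hit_rate)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return cpus * mrefs_per_sec * bytes_per_ref * (1.0 - hit_rate);
}
```

With 16 processors and no caches this model gives 3200 MB/s; with an assumed 90% hit rate it drops to about 320 MB/s, or roughly 2.6 Gbit/s, which is in the range the post describes. Eight processors at the same hit rate need about 160 MB/s.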
guy@gorodish.Sun.COM (Guy Harris) (08/02/88)
> (In an attempt to refute my point about substituting word-blitting
> for bit-blitting without admitting it being a debater's trick).

Excuse me, but I don't *consider* it a debating trick. Unless you can demonstrate that it *is* one - which you have *not* done - I have no intention of "admitting it is (one)." However, I *do* consider blithely dismissing arguments you don't like as "debating tricks" to be a debating trick.

The point made by Pike and company is that the bulk of the operations performed on the Blit *were* word-oriented, except for some that were dominated by overhead above-and-beyond the bit-pushing, and therefore that the fact that pushing bits on arbitrary boundaries is more expensive isn't important. This is similar to the point that in many applications, integer multiplications are usually multiplications by constants, and therefore machines that don't have multiply instructions don't suffer a big performance hit in those applications. You *don't* always have to make the *general* case fast; you want to concentrate on making the *common* case fast.

> Another debater's trick: this one is called Appeal to Authority.

Umm, right. By this logic, *any* citation of *any* paper is "Appeal to Authority", and thus dismissable as a "debating trick". Clever trick, that.

Sorry, but Pike and company, at least, have demonstrated some level of expertise in the matter of making bit-mapped display hardware and software. As such, appealing to their authority is not without merit.

> Never mind, I suppose, that this is 3 1/2 years after publication
> of the above and who knows how long before that it was written.
> A few things have happened in this business since then.

In other words, the common types of bitblt operations have changed since then?
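The multiplication analogy above is the familiar shift-and-add trick: when the multiplier is a known constant, a machine without a multiply instruction can build the product from a few shifts and adds. A small sketch follows; the constants and function names are chosen only for illustration (80 is picked as a plausible bytes-per-scan-line value, not taken from any machine in the thread).

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 without a multiply: 10 = 8 + 2. */
uint32_t times10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Byte offset of scan line y in a bitmap assumed to be 80 bytes wide:
 * 80 = 64 + 16. */
uint32_t line_offset80(uint32_t y)
{
    return (y << 6) + (y << 4);
}
```

Both functions agree with the ordinary product for every input (including wraparound, since unsigned arithmetic is modular), which is why a compiler or a careful programmer can substitute them freely.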
elg@killer.DALLAS.TX.US (Eric Green) (08/02/88)
In message <62296@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) says:
$Sorry, but Pike and company, at least, have demonstrated some level of
$expertise in the matter of making bit-mapped display hardware and software.  As
$such, appealing to their authority is not without merit.
$
$$ Never mind, I suppose, that this is 3 1/2 years after publication
$$ of the above and who knows how long before that it was written.
$$ A few things have happened in this business since then.
$
$ In other words, the common types of bitblt operations have changed since then?
 No, but the common types of CPU RAM have :-). For example, the slowest
 256K DRAMs that I've seen are 150ns, while I remember the "good old
 days" of 64K DRAMs, where 250ns was fast.... I won't bother you
 further by taking the Wayback machine back to the late 70's,
 when 16K DRAMs were lucky to keep up with a 6502.
 What made sense for an 8 MHz 68000 in 1980 did not necessarily make
 sense 4 years later for that same 8 MHz 68000.....
 --
Eric Lee Green    ..!{ames,decwrl,mit-eddie,osu-cis}!killer!elg
          Snail Mail P.O. Box 92191 Lafayette, LA 70509              
       MISFORTUNE, n. The kind of fortune that never misses.
jesup@cbmvax.UUCP (Randell Jesup) (08/03/88)
In article <62296@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>> Never mind, I suppose, that this is 3 1/2 years after publication
>> of the above and who knows how long before that it was written.
>> A few things have happened in this business since then.
>
>In other words, the common types of bitblt operations have changed since then?
Yes. Pike et al were only looking at blitters used for text-oriented terminals that also had graphics capabilities.
--
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
jesup@cbmvax.UUCP (Randell Jesup) (08/03/88)
In article <1988Aug1.061714.25907@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>To quote someone whose name I can't recall :-), "another debater's
>trick"! This time, comparing tomorrow's system with yesterday's.
>A 25 MHz AMD 29000 (note, not 2900) and a suitably cooperative memory
>subsystem should be able to do a *software* BitBlt that would make
>an Amiga look equally awful. If you want to compare the latest hot
>BitBlt chip, compare it against the latest hot CPU.
Sure, and let me make a 1.2u CMOS version of the amiga blitter and it'll do the same thing to the 29000. The Amiga blitter is in 3u NMos or HMos or some such, 4+ year old tech, running at 7 Mhz with a 16bit bus.
--
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
henry@utzoo.uucp (Henry Spencer) (08/03/88)
In article <4410@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
> Sure, and let me make a 1.2u CMOS version of the amiga blitter and
>it'll do the same thing to the 29000. The Amiga blitter is in 3u NMos or
>HMos or some such, 4+ year old tech, running at 7 Mhz with a 16bit bus.
My point was that if the opposition can make unfair comparisons (new Intel hardware against ten-year-old CPU), I can make them too. I'm not sure I'd bet on 1.2u CMOS beating the 29000, though: that processor is *really good* at saturating memory bandwidth.
--
MSDOS is not dead, it just | Henry Spencer at U of Toronto Zoology
smells that way.           | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu
henry@utzoo.uucp (Henry Spencer) (08/03/88)
In article <4409@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
> ... Pike et al were only looking at blitters used for text-
>oriented terminals that also had graphics capabilities.
I would conjecture -- note that this is only a conjecture -- that even the highly graphics-oriented machines spend far more time displaying plain old text than most people think. I'd love to see numbers on this; does anybody have some?
--
MSDOS is not dead, it just | Henry Spencer at U of Toronto Zoology
smells that way.           | uunet!mnetor!utzoo!henry henry@zoo.toronto.edu
anc@camcon.co.uk (Adrian Cockcroft) (08/05/88)
In article <76700044@p.cs.uiuc.edu>, gillies@p.cs.uiuc.edu writes:
>
> Re: Render characters
>
> Another approach (taken by the Xerox DLion) is to have a separate
> instruction just for displaying characters. This instruction, called
> (appropriately) "TextBLT", knows about the font table formats and
> specialized for displaying rectangular blobs of text. TextBLT is also
> implemented in microcode, on the DLion's AMD2900-based CPU.
The Intel 82786 has a CHARBLT instruction. There are two forms. In the nicest one you define a font to the chip, up to 256 16x16 pixel characters mapped through an indirection table so that e.g. all unwanted chars map to the same glyph; you then give it a string and a charcount, and the CHARBLT instruction draws proportionally spaced characters for you (the font can be kerned for italic). This runs at full memory bandwidth speeds. The font has a header for each glyph giving its size and some mode control bits.
The 82786 can also have a very high memory bandwidth of 40 Mb/s on a 16 bit wide bus. It uses page mode DRAMs in two banks, interleaved so that a new word is read every 50ns. A burst lasting about a microsecond fills the 25 word FIFO that feeds the video output registers, leaving plenty of memory bandwidth for drawing operations. I think the blitter also does a block fetch, although it might use a RMW cycle. The CHARBLT runs at 20000 chars/sec.
In general the 82786 is probably faster than the 34010 but is less programmable.
--
  |   Adrian Cockcroft ..!uunet!mcvax!ukc!camcon!anc
-[T]- Cambridge Consultants Ltd, anc@uk.co.camcon or anc@camcon.uucp
  |   Science Park, Cambridge CB4 4DW, England, UK (0223) 358855
      (You are in a maze of twisty little C004's, all alike...)
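The bandwidth figures in this post are easy to sanity-check. The sketch below is only arithmetic on the numbers quoted above (16 bit bus, one word every 50ns, 25 word FIFO). It assumes "Mb/s" means megabytes per second, and the function names are made up for illustration, not taken from any Intel documentation.

```c
#include <assert.h>

/* Peak transfer rate in bytes per second for a bus that delivers one
 * word of `bus_bits` bits every `cycle_ns` nanoseconds. */
long peak_bytes_per_sec(int bus_bits, int cycle_ns)
{
    assert(bus_bits > 0 && bus_bits % 8 == 0 && cycle_ns > 0);
    return (bus_bits / 8) * (1000000000L / cycle_ns);
}

/* Time in nanoseconds to fetch `words` words back to back. */
long burst_ns(int words, int cycle_ns)
{
    assert(words >= 0 && cycle_ns > 0);
    return (long)words * cycle_ns;
}
```

With these, peak_bytes_per_sec(16, 50) comes to 40,000,000 bytes/sec, and burst_ns(25, 50) comes to 1250ns, which matches the "about a microsecond" quoted for filling the FIFO.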
mitch@Stride.COM (Thomas Mitchell) (08/06/88)
In article <1988Jul28.173301.7275@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <399@ma.diab.se> pf@ma.UUCP (Per Fogelstr|m) writes:
>>... all bus cycles doing anything else than movin data is overhead.
>>Agree ??. Okay, so even if a 68k is doing a "Blit" in straight code
>>Correct me if i'm wrong...
>Almost right, which means "wrong". Take out the word "much" and I'll go
>... People designing things like
>Blitters, DMA interfaces, etc., consistently ignore just how quickly a
>modern CPU can move data if the programmer really sits down and thinks
This is true, and a common surprise. Many 'DMA' processors are not as fast as the main processor. They commonly do not have a bus interface equal to the processor or instruction cache or other goodies we now expect in a micro-processor.
In a system the DMA processor also must arbitrate with the processor for control of the bus. Then communicate (message?) with the processor .... Well--
The result is that DMA processors are a loss except to the sales department.
If I was careful -- I use the words DMA processor and not DMA device. It is possible to build custom hardware (a device) that does DMA to or from main memory vastly faster than a 'programmed' transfer but such things are today rare.
--
Thomas P. Mitchell (mitch@stride1.Stride.COM)
Phone: (702)322-6868 TWX: 910-395-6073 FAX: (702)322-7975
MicroSage Computer Systems Inc.
Opinions expressed are probably mine.
anc@camcon.co.uk (Adrian Cockcroft) (08/08/88)
In article <76700044@p.cs.uiuc.edu>, gillies@p.cs.uiuc.edu writes:
>
> Re: Render characters
>
> Another approach (taken by the Xerox DLion) is to have a separate
> instruction just for displaying characters. This instruction, called
> (appropriately) "TextBLT", knows about the font table formats and
> specialized for displaying rectangular blobs of text. TextBLT is also
> implemented in microcode, on the DLion's AMD2900-based CPU.
The Intel 82786 has a charblt instruction. There are two forms. In the nicest one you define a font to the chip, up to 256 16x16 pixel characters mapped through an indirection table so that e.g. all unwanted chars map to the same glyph; you then give it a string and a charcount, and the CHARBLT instruction draws proportionally spaced characters for you (the font can be kerned for italic). This runs at full memory bandwidth speeds. The font has a header for each glyph giving its size and some mode control bits.
The 82786 can also have a very high memory bandwidth of 40 Mb/s on a 16 bit wide bus. It uses page mode DRAMs in two banks, interleaved so that a new word is read every 50ns. A burst lasting about a microsecond fills the 25 word FIFO that feeds the video output registers, leaving plenty of memory bandwidth for drawing operations. I think the blitter also does a block fetch, although it might use a RMW cycle. The CHARBLT runs at 20000 chars/sec.
The CHARBLT can draw 1 bit deep characters into 1, 2, 4 or 8 bit deep bitmaps.
--
  |   Adrian Cockcroft ..!uunet!mcvax!ukc!camcon!anc
-[T]- Cambridge Consultants Ltd, anc@uk.co.camcon or anc@camcon.uucp
  |   Science Park, Cambridge CB4 4DW, England, UK (0223) 358855
      (You are in a maze of twisty little C004's, all alike...)
rminnich@super.ORG (Ronald G Minnich) (08/08/88)
In article <1988Aug3.153415.9033@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <4409@cbmvax.UUCP> jesup@cbmvax.UUCP (Randell Jesup) writes:
>> ... Pike et al were only looking at blitters used for text-
>>oriented terminals that also had graphics capabilities.
>I would conjecture -- note that this is only a conjecture -- that even
>the highly graphics-oriented machines spend far more time displaying plain
>old text than most people think. I'd love to see numbers on this; does
>anybody have some?
well i would guess mine does, when i am not playing computer games :-) (about 50% of the time?)
One issue that has been ignored in this discussion is that we are all talking about slightly different things. Henry is talking about blitters, and others are talking about Amigas. Seems to me that there is a BIG difference between the graphics supported by the Blit and the graphics supported by the Amiga, something that many probably don't know. For example, the Amiga supports hardware windows. Now, they are a little limited, in that they have to be the width of the screen, so as a result the Amiga OS people implemented them as screens, with multiple windows of the traditional type per screen. Thus you actually have several different worlds, each with their own color map and such, each with their own set of windows. You can have a game with 10 open windows on one screen, then flip to your X-windows screen with its 5 or 10 windows and its 2-bit-plane color map, then to your Deluxe Paint screen with its own 4096-color HAM map. The Screens are one thing i like best about the Amiga, esp. since they can overlay each other on the physical display and flipping between them takes no time at all. I have yet to see this sort of graphics on anything other than an Amiga, and i miss it a lot when i use other machines.
It was mentioned somewhere at one point that the original Xerox workstations wanted to support this sort of multiple world environment, but the graphics support was not there (I think it was the Dorado). The other day i saw an IBM Peanut running X windows. The color map on the VGA changed as you moved from window to window. It just about drove me bats. That machine really needed screens, but i think it's gotta happen in hardware if it is going to happen at all.
To sum up, Amiga graphics hardware != blitter chip. and maybe >
ron
stevew@nsc.nsc.com (Steve Wilson) (08/08/88)
In article <840@stride.Stride.COM> mitch@stride.stride.com.UUCP (Thomas Mitchell) writes:
>
>This is true, and a common surprise. Many 'DMA' processors are not
>as fast as the main processor. They commonly do not have, a
>bus interface equal to the processor or instruction cache or other
>goodies we now expect in a micro-processor.
>
>In a system the DMA processor also must arbitrate with the processor
>for control of the bus. Then communicate (message?) with the
>processor .... Well--
>
>The result is that DMA processors are a loss except to the sales
>department.
If you're only moving one byte of data between interrupts to the CPU telling him you've moved some data, then you're right. A DMA controller is a WIN when you can tell it to move a large fixed length piece of data and then forget about it and go do something else. You're correct about losing memory bandwidth for the CPU, but this is a design trade-off point.
My favorite example is a serial I/O controller I designed (Henry..You listening!!) where I had 12 serial I/O channels. The processor could handle about 7000 interrupts/sec. How ya gonna do 19.2Kb of constant data flow across 12 channels with that kind of interrupt response time? DMA was a cheap answer. (Henry, I'm NOT going to put the DMA hardware on a general purpose CPU!)
Point being that there are applications where special hardware such as DMA makes sense, and there are applications where it's a dumb idea!
Steve Wilson
National Semiconductor
[The above opinions are mine, not those of my employer! ]
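The interrupt budget in the serial I/O example can be worked through in a few lines of C. This is a rough sketch under assumptions the post does not state: every one of the 12 channels runs at 19.2 kbit/s in one direction, and each character takes 10 bits on the wire (start, 8 data, stop). The function names are hypothetical.

```c
#include <assert.h>

/* Characters per second arriving on `channels` lines at `baud` bits/s,
 * assuming `bits_per_char` bits on the wire per character. */
long chars_per_sec(int channels, long baud, int bits_per_char)
{
    assert(channels > 0 && baud > 0 && bits_per_char > 0);
    return channels * (baud / bits_per_char);
}

/* Interrupts per second if one interrupt is raised per block of
 * `block` characters (block == 1 models per-character interrupts). */
long interrupts_per_sec(long chars, long block)
{
    assert(chars >= 0 && block > 0);
    return (chars + block - 1) / block;   /* round up */
}
```

Under those assumptions the 12 channels deliver 23040 chars/sec, more than three times the 7000 interrupts/sec the processor could take, so one interrupt per character cannot keep up. Moving even 4 characters per interrupt brings the load down to 5760 interrupts/sec, which is why some form of block transfer (DMA or on-board buffering) is needed.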
pf@diab.se (Per Fogelstr|m) (08/09/88)
In article <840@stride.Stride.COM> mitch@stride.stride.com.UUCP (Thomas Mitchell) writes:
>.......................... Many 'DMA' processors are not
>as fast as the main processor. They commonly do not have, a
>bus interface equal to the processor or instruction cache or other
>goodies we now expect in a micro-processor.
>
>If I was careful -- I use the words DMA processor and not DMA
>device. It is possible to build custom hardware (a device) that
>does DMA to or from main memory vastly faster than a 'programed'
>transfer but such things are today rare.
As a matter of fact, modern buses, supporting more than 100Mb transfer rates, might saturate the processor. :-)))
Okay, okay, let's be serious. What i mean is that there is no need for what was called a DMA processor back in the good ol' days. DMA channels were needed because the main processor couldn't handle the data rate from mag tapes, disks, etc. What is needed today in multiprocessor systems is a mechanism which allows the "programmed CPU" on the disk controller board to burst the data over the bus fast as H..L. This is to obey the rules for multiprocessor buses:
1. Don't use the bus.
2. If you must, be fast.
3. To be fast, transfer more data than addresses, e.g. use block transfers.
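Rule 3 can be put in numbers with a deliberately simple model. The sketch below assumes a bus on which every transaction costs one address cycle followed by some number of data cycles; real buses differ in arbitration and wait states, and the function name is made up for illustration.

```c
#include <assert.h>

/* Percentage of bus cycles that carry data when each transaction
 * costs one address cycle followed by `burst` data cycles. */
int data_cycle_percent(int burst)
{
    assert(burst > 0);
    return (100 * burst) / (burst + 1);
}
```

In this model single-word transfers spend half of all cycles on addresses (50% data), a 4-word block reaches 80%, and a 16-word block reaches 94%.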
hwt@leibniz.UUCP (Henry Troup) (08/10/88)
The estimable Mr. Spencer (got to keep all the Henrys straight) queries if anyone has numbers for how much character i/o happens as against graphics on a graphics (bitmap) terminal.
I don't know, but I do remember that character writing speed was a big thing for Macintosh QuickDraw (9k characters per second). It's in the Byte interview in 1984.
a hw (percent) leibniz@bnr-di t
daveh@cbmvax.UUCP (Dave Haynie) (08/10/88)
in article <61783@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) says:
> Keywords: BitBlit.
> The second common case of "bitblt" is scrolling a rectangular region
> of a bitmap, usually the display. Since the word boundaries in the
> scan lines of a bitmap are at the same place in each line, the speed of
> scrolling depends primarily on the speed of the MC68000 instruction
>       mov.l %a0@+, %a1@+
> or, in C,
>       register long *p, *q;
>       *p++ = *q++;
> For typical rectangles, the edges, which must be handled with more
> complicated code, do not dominate the performance. There is nothing
> hardware can do to accelerate this loop except provide faster memory
> access. If the display were accessed through a narrower or clumsier
> interface, it would take longer to move the data.
With a MC68000, not so. Given an equal memory access speed, something like a DMA controller can be several times faster than the 68000. All it needs do is fetch data from location A, dump it to location B, and increment some internal counters. While it looks like that's what the 68000 is doing, it's really also fetching the move instruction and a branch instruction of some kind. So for every word moved, you're probably fetching as many instruction words as overhead. Certainly the 68010 in some cases and the 68020 in most cases solve this problem via caching, but I can't yet buy either of these parts for the $2.50 or so I pay for a 68000.
> If a BitBlt chip is reasonably cheap, and can do the whole job, it may be worth
> it. Note that in the cases shown, you got at most a 3.5x speedup (scroll
> screen horizontally). For vertical scrolling, you got only 1.18x; for randomly
> drawing the letter 'a', you got only 1.23x; and for texturing a random 40x40
> square, you got 1.95x. How cheap does it have to be for that to be worth it?
> (The "do the whole job" comes from comments made in the paper that a
> half-hearted hardware assist can get in the way, rather than help.)
You also have to consider a few more things. For instance, if you have a blitter that operates on video memory and lets the CPU do things with non video memory in parallel (like on the Amiga, and apparently on the Sun mentioned), then you have a big advantage, in that any blit may end up costing nothing but the setup time in terms of real CPU usage. Still no good reason to use the blitter for small, single character blits, but it can really be a justification for larger things. And given that a blit chip can often be a much simpler design than the host CPU, there's a real good chance it WILL be able to have a faster path to memory.
That depends of course on the chip and the base CPU in your system. If the combination of a blitter chip and 68000 ran me more than a 68020, that had better be one heck of a blitter, or I'm wasting my $$$ -- the 68020 being more general purpose than a blitter can give you a better overall system performance. But if I can get my blitter and 68000 CPU and maybe a bunch of other functions for less than the cost of a 68010, I'm probably winning (if I'm not concerned about the 68010's virtual memory facilities, which a Sun of course obviously is).
--
Dave Haynie "The 32 Bit Guy" Commodore-Amiga "The Crew That Never Rests"
{ihnp4|uunet|rutgers}!cbmvax!daveh PLINK: D-DAVE H BIX: hazy
"I can't relax, 'cause I'm a Boinger!"
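For readers who want to see the word-aligned scroll loop under discussion in context, here is a minimal sketch in C. It is not code from either poster or from the Blit paper: the function name and parameters are made up, it assumes 32-bit words and scan lines that are a whole number of words wide, and it scrolls the whole bitmap rather than an arbitrary rectangle (so there are no edge words to mask).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scroll a bitmap up by `lines` scan lines.  `words_per_line` is the
 * scan-line width in 32-bit words, so every row starts on a word
 * boundary and the copy never has to shift bits. */
void scroll_up(uint32_t *bitmap, size_t words_per_line,
               size_t height, size_t lines)
{
    assert(bitmap != NULL && lines <= height);
    uint32_t *dst = bitmap;
    const uint32_t *src = bitmap + lines * words_per_line;
    size_t n = (height - lines) * words_per_line;

    /* Forward copy is safe: dst never runs ahead of src. */
    while (n--)
        *dst++ = *src++;

    /* Clear the rows exposed at the bottom. */
    n = lines * words_per_line;
    while (n--)
        *dst++ = 0;
}
```

The inner `*dst++ = *src++` is the loop both posts are arguing about. On a CPU without an instruction cache each iteration also has to fetch the move and the loop branch from memory, which is the overhead described above; a cached CPU or a dedicated copy engine spends those cycles on data instead.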
haahr@phoenix.Princeton.EDU (Paul Gluckauf Haahr) (08/11/88)
In article <118@leibniz.UUCP> hwt@leibniz.UUCP (Henry Troup) writes:
> The estimable Mr. Spencer (got to keep all the Henry's straight)
> queries if anyone has numbers for how much character i/o happens
> as against graphics on a graphics (bitmap) terminal.
> I don't know, but I do remember that character writing speed was a
> big thing for MacIntosh QuickDraw (9k characters per second). It's
> in the Byte interview in 1984.
The Byte reference is to the February 1984 issue, and the rendering speed given was actually 7K characters/second. (The information is given on page 37 and repeated on page 76.) Still, remarkably fast for a 68000, even given that this was done in hand-coded assembly language. The 9K seemed about a factor 5 too high, which is why I looked the article up. 7000 chars/sec is still faster than I would have expected. They do not give sizes of the characters, and say in the article that it is irrelevant, but that still probably assumes something like 9x14. Much larger characters would probably hurt performance. Later Macs may be faster (if someone recoded the QuickDraw stuff to use the bit field instructions, the Mac II could scream).
By way of comparison, the Pike/Locanthi/Reiser "Hardware/Software Tradeoffs for Bitmap Graphics on the Blit" paper gives numbers (page 146) that work out to 2400 chars/sec for the Blit, 900 chars/sec for the Sun-1, and 2950 for the Sun-2. This assumes rendering one character at a time. Locanthi's fastest example from the EUUG "Fast bitblt() with asm() and cpp" paper gives 6200 chars per second for a 16 MHz, 2 wait state 68020. Again, this is for one bitblt() call per character.
My monochrome sun-3/60 (68020, 20 MHz, bwtwo, with the normal, not high resolution, monitor), using the large console font (gallant.r.19) comes out to about 3200 chars/sec. I have no idea if they render more than one character at once. This is a very large font, however, and the output routine is in the prom monitor.
I did not try to write a program to test pixrect character speeds on a normal sized font. My own bitblt, on the same sun-3/60, for an 8x14 font, gives 8100 chars/sec, if characters are rendered individually. If characters are batched up and bitblt() is called only once, the speed is > 16000 chars/sec. This code is a combination of C (with inline assembly for fetching the characters from the font bitmap and bitblt()s narrower than one word) and compile-on-the-fly code for bitblt()s spanning word boundaries.
The real point: the Macintosh, with no hardware assist, and hand-coded assembly, draws characters very fast.
paul haahr
princeton!haahr or haahr@princeton.edu
henry@utzoo.uucp (Henry Spencer) (08/13/88)
In article <5493@nsc.nsc.com> stevew@nsc.UUCP (Steve Wilson) writes:
>>The result is that DMA processors are a loss except to the sales
>>department.
>
>If your only moving one byte of data between interupts to the CPU
>telling him you've moved some data, then your right. A DMA controller
>is a WIN when you can tell it to move a large fixed length piece of data
>and then forget about it and go do something else... [In an example] the
>processor could handle about 7000 interupts/sec. How ya gonna do
>19.2Kb of constant data flow across 12 channels with that kind of interupt
>response time. DMA was a cheap answer...
I think you've missed the point slightly. Clearly, for high data rates it is necessary to have buffering of some kind, to keep the latency requirements down to the point where the CPU can satisfy them. One way of doing that is to have the device DMA into memory. Another way is to put buffering on board, but have the CPU do the actual transfer into memory when a reasonable amount of data has accumulated.
Actual timings in several cases have clearly shown that buffered devices with CPU data movement can beat DMA devices. The main reason is that in many systems, the CPU normally has possession of the bus and the DMA device must first throw the CPU off. There can be quite substantial overhead in doing so, and if the DMA device then transfers a few bytes and goes away again, the bus-ownership-transfer overhead can hurt throughput badly. For a modern CPU which can move data quickly, it is worth considering using buffering instead of DMA in the peripherals. (Side benefits are that it makes the drivers simpler, and it's much more flexible -- most DMA schemes can't do things like putting network headers down in one place and the actual packet data down in another, or calculating an IP checksum as the data is being moved.)
--
Intel CPUs are not defective, | Henry Spencer at U of Toronto Zoology
they just act that way.       | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
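The side benefit mentioned at the end, computing an IP checksum while the CPU moves the data, might look something like the following. This is a sketch, not code from any real driver: the function name is hypothetical, the device buffer is modelled as an ordinary array of 16-bit words, and byte-order handling is left out. The folding step is the standard 16-bit one's-complement Internet checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `nwords` 16-bit words from a device buffer into memory and
 * return the 16-bit one's-complement checksum of the data moved.
 * nwords is limited so the 32-bit accumulator cannot overflow. */
uint16_t copy_with_checksum(uint16_t *dst, const volatile uint16_t *src,
                            size_t nwords)
{
    assert(dst != NULL && src != NULL && nwords <= 65536);
    uint32_t sum = 0;

    while (nwords--) {
        uint16_t w = *src++;    /* one read from the device buffer */
        *dst++ = w;             /* one write to main memory */
        sum += w;               /* checksum adds no memory traffic */
    }
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The data is touched exactly once, and the checksum falls out of the same pass. A plain DMA engine would leave the data in memory and the CPU would then have to read it all again to compute the sum.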
pf@diab.se (Per Fogelstr|m) (08/13/88)
In article <1843@gofast.camcon.co.uk> anc@camcon.co.uk (Adrian Cockcroft) writes:
>
>In general the 82786 is probably faster then the 34010 but is less
>programmable.
>--
And the NS DP8500 is much faster than the 82786 and at least as programmable as the TI 34010. And the DP8500 can have glyphs up to 256x256 and can do kerning as well. I did a CRT emulator by only adding a UART to the chip, easy as 1 2 3. Look Mom, no "CPU".
henry@utzoo.uucp (Henry Spencer) (08/14/88)
In article <1848@titan.camcon.co.uk> anc@camcon.co.uk (Adrian Cockcroft) writes:
>The Intel 82786 has a charblt instruction. There are two forms, in the nicest
>one you define a font to the chip, up to 256 16x16 pixel characters...
So if my characters are, say, 17x17, I can't use it? This is precisely the sort of stupid restriction that makes people forget the chip and do it in software instead, to save the hassle of deciding when the hardware is actually useful.
>... (the font can be kerned for italic)....
How is the kerning defined? 10-1 it's some sloppy kludge.
--
Intel CPUs are not defective, | Henry Spencer at U of Toronto Zoology
they just act that way.       | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
pf@diab.se (Per Fogelstr|m) (08/21/88)
In article <1988Aug13.205229.24467@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <1848@titan.camcon.co.uk> anc@camcon.co.uk (Adrian Cockcroft) writes:
>>The Intel 82786 has a charblt instruction. There are two forms, in the nicest
>>one you define a font to the chip, up to 256 16x16 pixel characters...
>
>So if my characters are, say, 17x17, I can't use it? This is precisely the
>sort of stupid restriction that makes people forget the chip and do it in
>software instead,
Intel always puts a lot of dumb restrictions in their silicon. However, it is possible to use larger fonts by using normal bitblit transfers. The sad thing is that you can't take advantage of the special functions used by charblt (table lookup etc.). The NS DP8500 also has a charblt instruction, but this chip can handle up to 65536 characters of up to 256x256 pixels. Yes, I know this is a restriction as well, but i think it will last for a while.
yuval@taux02.UUCP (Gideon Yuval) (09/01/88)
Are the S/W bitBLT algorithms available (preferably in "C")?
-- 
Gideon Yuval, yuval@taux01.nsc.com, +972-2-690992 (home) ,-52-522255(work)
 Paper-mail: National Semiconductor, 6 Maskit St., Herzliyah, Israel
                                                TWX: 33691, fax: +972-52-558322
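As a starting point for anyone asking the same question, here is a minimal software bitblt sketch in C. It is emphatically not the algorithm from the Pike/Locanthi/Reiser paper or from any poster's code: it works one pixel at a time, which is far slower than the word-at-a-time and compile-on-the-fly techniques discussed in this thread, but it is correct for any bit alignment and can serve as a reference to test a faster version against. The struct layout, the names, and the three operations are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A 1-bit-deep bitmap; bit 31 of each word is the leftmost pixel. */
struct bitmap {
    uint32_t *bits;
    int words_per_line;   /* scan-line stride in 32-bit words */
    int width, height;    /* in pixels */
};

enum blt_op { BLT_COPY, BLT_OR, BLT_XOR };

static int get_pixel(const struct bitmap *b, int x, int y)
{
    return (b->bits[y * b->words_per_line + x / 32] >> (31 - x % 32)) & 1;
}

static void put_pixel(struct bitmap *b, int x, int y, int v)
{
    uint32_t *w = &b->bits[y * b->words_per_line + x / 32];
    uint32_t mask = (uint32_t)1 << (31 - x % 32);
    *w = v ? (*w | mask) : (*w & ~mask);
}

/* Combine a w-by-h rectangle at (sx, sy) in src into dst at (dx, dy).
 * Pixel-at-a-time reference version: correct for any alignment, but
 * it assumes src and dst do not overlap and does no clipping. */
void bitblt(struct bitmap *dst, int dx, int dy,
            const struct bitmap *src, int sx, int sy,
            int w, int h, enum blt_op op)
{
    assert(sx >= 0 && sy >= 0 && sx + w <= src->width && sy + h <= src->height);
    assert(dx >= 0 && dy >= 0 && dx + w <= dst->width && dy + h <= dst->height);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int s = get_pixel(src, sx + x, sy + y);
            int d = get_pixel(dst, dx + x, dy + y);
            int v = op == BLT_COPY ? s : op == BLT_OR ? (d | s) : (d ^ s);
            put_pixel(dst, dx + x, dy + y, v);
        }
    }
}
```

A fast version replaces the inner loop with whole-word loads, a shift to line the source up with the destination, and masks for the partial words at the left and right edges. The word-aligned case, where no shift is needed, is the common case argued about at the start of this thread.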