rcd@ico.isc.com (Dick Dunn) (01/09/90)
woody@rpp386.cactus.org (Woodrow Baker) writes:
> baffico@adobe.COM (Tom Baffico) writes:
> >...As to the benefit of having a FPU, for most controllers the performance
> > increase is actually quite small.
>
> As to the benefits of a FPU, I am certain that the speedup would matter.

I tend to agree with Tom, and while I haven't investigated it, I assume
that Adobe has. Intuitively, it seems reasonable; I'd think the down'n'dirty
work of the font rendering would be mostly fixed point, and probably a lot
of it is not numerical work at all. Still, as I suggest, I ASSume that
Adobe knows what they're doing.

Woody - please submit some evidence that an FPU would make a useful speedup
in a PostScript controller. I don't mean your conjecture; I mean real
evidence, or solid reasoning. Since, once again, you're trying to tell
Adobe that they don't know their business, the burden of proof is on you.
--
Dick Dunn    rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd    (303)449-2870
   ...Mr. Natural says, "Use the right tool for the job."
mcdonald@aries.scs.uiuc.edu (Doug McDonald) (01/10/90)
In article <1990Jan9.044252.18617@ico.isc.com> rcd@ico.isc.com (Dick Dunn) writes:
>woody@rpp386.cactus.org (Woodrow Baker) writes:
>>
>> As to the benefits of a FPU, I am certain that the speedup would matter.
>
>I tend to agree with Tom, and while I haven't investigated it, I assume
>that Adobe has. Intuitively, it seems reasonable; I'd think the down'n'
>dirty work of the font rendering would be mostly fixed point. Probably a
>lot of it is not numerical work at all. Still, as I suggest, I ASSume that
>Adobe knows what they're doing.
>
>Woody - please submit some evidence that an FPU would make some useful
>speedup in a PostScript controller. I don't mean your conjecture; I mean
>some real evidence, or solid reasoning. Since, once again, you're trying
>to tell Adobe that they don't know their business, the burden of proof is
>on you.
>--

Adobe's business is BUSINESS - i.e. profits, as is that of printer makers.
Whether a FPU is worthwhile depends on how much more money it will make
for them, and a faster printer might not sell enough more. It's complicated.
ken@cs.rochester.edu (Ken Yap) (01/10/90)
|Adobe's business is BUSINESS - i.e. profits, as is that of printer
|makers. Whether a FPU is worthwhile depends on how much more money
|it will make for them, and a faster printer might not sell enough more.
|It's complicated.

Why don't we base this discussion on something more substantial? I repost
an article I saved. Hope this leads to better discussion.

From: adobe!taft@decwrl.DEC.COM (Ed Taft)
Date: 18 Feb 1986 1750-PST (Tuesday)
Subject: Hardware support for PostScript

Several people have suggested hardware enhancements (e.g., faster CPUs,
RasterOp chips, etc.) to improve the performance of PostScript printers.
Naturally, this is a topic of great interest to us at Adobe. I'd like to
share a few of our current thoughts with you. Please note that I am talking
only about current products; I am not speculating about future ones.

Adobe's approach to PostScript has been first to define a fully general
software model for the programming language and page description
capabilities, and only then to consider how hardware can be employed to
accelerate the software. Experience with a pure software implementation of
PostScript (of which the LaserWriter is a good example) gives us an
understanding of what parts of the implementation would benefit most from
hardware support.

There are three major activities that together account for most of the
execution time in Adobe's implementation of PostScript. These are:

(1) Low-level raster manipulations, principally painting character bitmaps
and filling trapezoids located at arbitrary bit boundaries. For typical
pages, this activity dominates everything else if all characters are
already in the font cache.

(2) Character scan conversion. This is a very compute intensive operation
because the original character definitions are at a high level and are
being pushed through the full PostScript graphics machinery. In particular,
there is a lot of arithmetic, both fixed and floating point.

(3) PostScript input scanning and interpretation.
This includes parsing the input stream, constructing tokens, looking up
names, pushing and popping stacks, etc. The amount of time consumed by this
activity varies considerably according to the type of page description and
the programming style. A text document that consists primarily of strings
and calls to simple PostScript procedures consumes relatively little time
in the interpreter; a document that executes a lot of PostScript code for
each mark placed on the page consumes proportionately more.

Of course, I have deliberately left out time spent waiting for input data
or waiting for the print engine. The effect of a slow communication channel
or a slow print engine can completely dominate everything else. More to the
point, obtaining the best performance requires the ability to perform
communication, execution, and printing activities in parallel.

The above three activities benefit from significantly different kinds of
hardware support. (Of course, in a strictly software implementation, a
faster CPU should speed all three activities.) Considering them in order:

(1) Simple hardware for shifting and masking makes a substantial difference
here; the full generality of RasterOp is not needed. The idea is to
minimize the number of CPU instructions and memory cycles needed to perform
simple, repetitive bit moving operations. A shifter-masker is included in
the Adobe Redstone controller, versions of which are used in all present
PostScript printers except the LaserWriter. This activity is one that would
benefit greatly from having a separate, parallel processor; its interface
with the rest of PostScript would be quite simple.

(2) Efficient arithmetic is of particular importance here. Also, since a
vast amount of code is being executed and all of it is written in a
high-level language (C in the case of Adobe's implementation), the overall
quality of compiled code is important.
Apart from arithmetic, no single component dominates, so it's not practical
to assembly-code much of it.

(3) Here is a place where some special hardware and/or microcode might
help. The PostScript interpreter's data structures and algorithms are
sufficiently straightforward that custom hardware may be practical. Whether
or not this makes sense economically depends on how much time is spent in
the interpreter relative to everything else, which, as I said, is highly
application dependent.
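[Editor's note: Taft's point (1) is concrete enough to sketch. Below is a
minimal, hypothetical illustration of the work a shifter-masker offloads:
OR-painting a run of source bits into a destination bitmap at an arbitrary
bit offset costs one shift, one clip mask, and a possible spill into the
next word, per 32-bit word rather than per bit. Function names, MSB-first
bit order, and the OR-paint raster op are my assumptions, not Adobe's code.]

```c
#include <stdint.h>

/* OR-paint `nbits` bits from src (packed MSB-first in 32-bit words,
 * starting at bit 0) into dst starting at bit offset dst_bit.
 * When dst_bit is not word-aligned, the caller must provide one extra
 * destination word for the spill. Hypothetical sketch of the inner loop
 * a hardware shifter-masker would accelerate. */
static void paint_row(uint32_t *dst, int dst_bit,
                      const uint32_t *src, int nbits)
{
    int shift = dst_bit & 31;          /* bit offset within first word  */
    dst += dst_bit >> 5;               /* first destination word        */
    int nwords = (nbits + 31) / 32;
    for (int i = 0; i < nwords; i++) {
        uint32_t w = src[i];
        if (i == nwords - 1 && (nbits & 31))        /* clip final word  */
            w &= 0xFFFFFFFFu << (32 - (nbits & 31));
        dst[i] |= w >> shift;                       /* high part        */
        if (shift)
            dst[i + 1] |= w << (32 - shift);        /* spill into next  */
    }
}
```

With no hardware assist, a 68000 spends several instructions per word on
exactly this shift/mask/merge sequence; that is the "simple, repetitive bit
moving" Taft describes.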
amanda@mermaid.intercon.com (Amanda Walker) (01/10/90)
In article <1990Jan9.182332.8554@cs.rochester.edu>, ken@cs.rochester.edu
(Ken Yap) writes:
> Why don't we base this discussion on something more substantial? I
> repost an article I saved. Hope this leads to better discussion.

I knew this discussion was getting familiar. Thanks!

> Experience with a pure software implementation
> of PostScript (of which the LaserWriter is a good example) gives us an
> understanding of what parts of the implementation would benefit most
> from hardware support.

Another thing I would imagine it's good for: by running the implementation
on something like a UNIX box, you can profile it and actually look at where
the time goes. This is critical for finding out what will actually make the
most difference when you speed it up.

> (1) Low-level raster manipulations, principally painting character
> bitmaps and filling trapezoids located at arbitrary bit boundaries.
> For typical pages, this activity dominates everything else if all
> characters are already in the font cache.

This sounds like a good candidate for hardware. The experience of some of
the PostScript clone controllers seems to show that a TI 34010 graphics
processor or an AMD 29000 can significantly improve the interpreter's
ability to lug bits around.

> (2) Character scan conversion. This is a very compute intensive
> operation because the original character definitions are at a high
> level and are being pushed through the full PostScript graphics
> machinery. In particular, there is a lot of arithmetic, both fixed and
> floating point.

This looks like the least tractable as far as throwing off-the-shelf
hardware at it is concerned. A good FPU (like a 68881 or 68882), perhaps
with some hand-tuned code for multiplying things by the CTM, would be my
guess at the best approach.

> (3) [...]
> A text document that consists primarily of
> strings and calls to simple PostScript procedures consumes relatively
> little time in the interpreter; a document that executes a lot of
> PostScript code for each mark placed on the page consumes
> proportionately more.

This seems to indicate that the major avenue of improvement is not the
tokenizer but the code that runs through executable arrays. Now, these look
a lot from the outside like pretty conventional token-threaded code. A
change to subroutine-threaded code could make a significant improvement in
execution speed, although it might take more space and could introduce
compatibility problems with code that does 'put's into executable arrays.

> More to the
> point, obtaining the best performance requires the ability to perform
> communication, execution, and printing activities in parallel.

Yes. One thing I noticed from using a Dataproducts LZR-2665 was that even
though its controller didn't seem too much faster than a LaserWriter at
imaging, the aggregate throughput was often much higher, since it had two
page buffers and could be imaging a page while the previous one was still
being sent out to the marking engine. Hardware buffering for serial ports
(and maybe even a second processor for AppleTalk or Ethernet) would also
reduce this kind of waiting. There's nothing like having a Linotron 300
ignore its input stream for 10 seconds per page to make you appreciate an
I/O processor :-).

> A shifter-masker is
> included in the Adobe Redstone controller, versions of which are used in all
> present PostScript printers except the LaserWriter. This activity is one
> that would benefit greatly from having a separate, parallel processor; its
> interface with the rest of PostScript would be quite simple.

Well, aside from the fact that this is a little dated, advances in the
68000 family can also gain some of these benefits.
The 68020 and 68030, for example, are much better at shifting and doing bit
field operations than the 68000 was.

Amanda Walker
Speaker To PostScript
InterCon Systems Corporation
--
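[Editor's note: Amanda's token-threaded vs. subroutine-threaded distinction
can be sketched in a few lines of C. The operator names and the int-token
encoding are hypothetical; real PostScript executable arrays hold full
object structures. The point is the dispatch cost: token threading pays a
table lookup per element, subroutine threading stores the routine addresses
directly.]

```c
static int stack[16], sp;                 /* tiny operand stack        */
static void op_push1(void) { stack[sp++] = 1; }
static void op_add(void)   { sp--; stack[sp - 1] += stack[sp]; }

/* Token-threaded: each program element is an index into a dispatch
 * table, so every step costs a load plus an indirect call. */
enum { PUSH1, ADD, END };
static void (*table[])(void) = { op_push1, op_add };

static int run_tokens(const int *prog)
{
    sp = 0;
    while (*prog != END)
        table[*prog++]();                 /* table lookup per token    */
    return stack[0];
}

/* Subroutine-threaded: each element IS the routine address; the lookup
 * disappears, at the cost of wider code (a pointer per element). */
static int run_subrs(void (*const *prog)(void))
{
    sp = 0;
    while (*prog)
        (*prog++)();                      /* direct indirect call      */
    return stack[0];
}
```

The compatibility problem Amanda mentions is visible here: a 'put' into a
token array is trivial, but a 'put' into the subroutine-threaded form
requires re-translating the stored element.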
woody@rpp386.cactus.org (Woodrow Baker) (01/10/90)
In article <1990Jan9.044252.18617@ico.isc.com>, rcd@ico.isc.com (Dick Dunn)
writes:
>
> I tend to agree with Tom, and while I haven't investigated it, I assume
> that Adobe has. Intuitively, it seems reasonable; I'd think the down'n'
> dirty work of the font rendering would be mostly fixed point. Probably a
> lot of it is not numerical work at all. Still, as I suggest, I ASSume that
> Adobe knows what they're doing.

See the repost of the Feb 1986 article from Adobe. Font outlines do have a
mix of FP and fixed-point. I don't have a 68881 FPU manual handy, but I'm
sure that the hardware is MUCH faster than software. 'nuff said. NOT a
conjecture.

Cheers
Woody

> Dick Dunn rcd@ico.isc.com uucp: {ncar,nbires}!ico!rcd (303)449-2870
> ...Mr. Natural says, "Use the right tool for the job."
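[Editor's note: the fixed-point side of that FP/fixed mix is easy to
illustrate. A 16.16 fixed-point transform of a point through a PostScript-
style six-element matrix needs only integer multiplies and shifts, which is
how a renderer can avoid the FPU in its inner loop. This is a hypothetical
sketch, not Adobe's actual rendering code.]

```c
#include <stdint.h>

typedef int32_t fix;                          /* 16.16 fixed point       */
#define FIX(x)       ((fix)((x) * 65536.0))   /* double -> fixed         */
#define FIXMUL(a, b) ((fix)(((int64_t)(a) * (b)) >> 16))

/* Transform a point by a six-element matrix [a b c d tx ty] (the shape
 * of the PostScript CTM), entirely in integer arithmetic. */
static void transform(const fix m[6], fix x, fix y, fix *xp, fix *yp)
{
    *xp = FIXMUL(m[0], x) + FIXMUL(m[2], y) + m[4];
    *yp = FIXMUL(m[1], x) + FIXMUL(m[3], y) + m[5];
}
```

Whether a 68881 beats this kind of code is exactly the question at issue:
the hardware FP instructions are far faster than software floating point,
but they are not automatically faster than well-tuned integer code.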
mikep@crackle.amd.com (Mike Parker) (01/11/90)
I'll just remove all attributions for fear of getting them all wrong...

| > Experience with a pure software implementation
| > of PostScript (of which the LaserWriter is a good example) gives us an
| > understanding of what parts of the implementation would benefit most
| > from hardware support.
|
| Another thing that I would imagine it's good for is that by running
| the implementation on something like a UNIX box, you can profile it
| and actually look at where the time goes. This is critical for finding
| out what will actually make the most difference when you speed it up.

There are far too many other first-order effects. I've spent a lot of time
trying to get a handle on where PS spends its processing time, and I do
have some hard data (was it Dick Dunn that wanted numbers?).

First, the processor makes a big difference. I work for AMD so I understand
the Am29000 much better than others, but it is clear that an external
shifter might not help the Am29000 as much as, say, the 68000. Case in
point: there are a lot of bit-blt accelerators available for the 68000
(like the Cirrus chip), but our bit-blt code for the Am29000 is completely
memory bound; the only external hardware that would help is a faster memory
system.

The memory system is another key factor. One example: on one particular
board, the Am29000 running the Phoenix clone with the Am29027 FPU is 46x a
LaserWriter Plus, while without the FPU it is 30x the Plus. But it would be
all wrong to say that the FPU gives a 50% boost to performance, because we
have other boards where the boost is much larger and others where it is
much smaller.

Choice of software is also a key contributor. Another clone, Pipeline
Associates, goes from 5.9x the NTX without the FPU to 10.2x the NTX with
the FPU on the same board as the previous Phoenix numbers. So it would
appear that Pipeline is more FP dependent than Phoenix. I'm told by people
who probably do not know that Adobe is very FP independent, so maybe they'd
see less of a hit.
Real soon now I'll be able to quote similar numbers for Bauer/uSoft.

Further evidence that the problem is SW-vendor dependent is that the
Pipeline people worked long and hard to improve performance with and
without the FPU, and were able to make very large differences and to close
the gap significantly. In particular, we found that basic add, sub, mul,
and div were not nearly the culprits that a certain few transcendentals
were. Hand-coded transcendental routines from Pipeline made huge
performance differences in the non-FPU case for some files. Pipeline
already had the advantage of a pure integer font rendering mechanism
(Nimbus-Q); they changed their bezier solver to pure integer, as well as a
few other key routines. It was a lot of work, and many less caring clone
vendors haven't done the exercise. Being older and bigger, it stands to
reason that Adobe has worked pretty hard on this.

So profiling on a LaserWriter, or worse yet a UNIX box which might have a
memory system very unlike a printer's, isn't really going to give data
applicable to PostScript printers as a whole.

| > (1) Low-level raster manipulations, principally painting character
| > bitmaps and filling trapezoids located at arbitrary bit boundaries.
| > For typical pages, this activity dominates everything else if all
| > characters are already in the font cache.
|
| This sounds like a good candidate for hardware. The experience of some
| of the PostScript clone controllers seems to show that a TI 34010 graphics
| processor or an AMD 29000 can significantly improve the interpreter's
| ability to lug bits around.

Thanks for the plug (all others flame me when the advertising content
exceeds the hard data). I'm not so sure that the low-level raster stuff
dominates. I've been told that the split is nearly 50/50. I tend to agree
with the earlier poster who said that it varies greatly for different
pages.
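[Editor's note: a pure-integer bezier solver of the kind attributed to
Pipeline above could take many forms; one classic FPU-free approach is
recursive midpoint (de Casteljau) subdivision in sub-pixel integer
coordinates. This is a hypothetical sketch, not Nimbus-Q or Pipeline's
actual code, and it subdivides to a fixed depth where production code would
recurse until a flatness test passes.]

```c
#include <stdint.h>

/* Cubic Bezier flattening by midpoint subdivision, pure integer math.
 * Coordinates are in sub-pixel units (e.g. 1/64 pixel) so the repeated
 * halving keeps enough precision. */
typedef struct { int32_t x, y; } pt;

static pt mid(pt a, pt b)
{
    pt m = { (a.x + b.x) / 2, (a.y + b.y) / 2 };
    return m;
}

/* Split at t = 1/2, `depth` times, appending segment endpoints to out. */
static void flatten(pt p0, pt p1, pt p2, pt p3, int depth, pt *out, int *n)
{
    if (depth == 0) {
        out[(*n)++] = p3;                 /* emit one line segment end  */
        return;
    }
    pt a = mid(p0, p1), b = mid(p1, p2), c = mid(p2, p3);
    pt d = mid(a, b),   e = mid(b, c);
    pt f = mid(d, e);                     /* the curve point at t = 1/2 */
    flatten(p0, a, d, f, depth - 1, out, n);
    flatten(f, e, c, p3, depth - 1, out, n);
}
```

Nothing here needs an FPU, or even an integer multiply: each split is six
adds and six shifts, which is why this style of code benefits more from a
fast memory system than from an Am29027.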
But I have hard evidence that it isn't all that low in the case cited,
where the page is all text and all hits in the font cache. We have a 9-page
pure text document that we have run on all sorts of configurations. I
believe that both Pipeline and Phoenix use the same blit code (supplied by
AMD), and yet they get very different results. The first few pages show
large differences due to differences in character rendering time for font
cache misses, and you can see the time per page curve down exponentially to
an asymptote at about the fourth page. At the asymptote, Pipeline runs at
roughly 0.5 seconds per page on the same exact hardware where the Phoenix
code runs at about 0.75 seconds per page. I can't see where any of the
difference is anything but "interpretation" (as opposed to raster
manipulation).

I have a plan and would like some input on its validity. We'll take the
same exact hardware, except we'll change the serializer crystal so we can
run at 400 dpi, and we'll tell the code to run at 400 dpi. We'll run a
variety of pages at both resolutions. Some simple algebra will then give us
the interpretation/raster split. Going from 300 to 400 dpi means (400/300)^2,
or about 78%, more pixels; so if a file takes 20% longer to run, raster
processing time must be about 20/78, or roughly 26%, of the total task. If
enough of you say that the experiment is valid, I'll run it; otherwise I'll
run it and just not tell anybody.

Please blame all gross spelling errors on a noisy line...

mikep
Mike Parker    mikep@amdcad.AMD.COM
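[Editor's note: the algebra behind Mike's two-resolution experiment is
worth pinning down. Assuming per-page time splits as t = interp + raster
and that raster time scales linearly with pixel count, two timings of the
same file at two resolutions determine the split. The function name and
example numbers are mine, not from the thread.]

```c
/* Infer what fraction of per-page time is raster work, from timings of
 * the same file at two resolutions. Model: t = interp + raster, with
 * raster time proportional to pixel count. Returns the raster share of
 * the low-resolution run. */
static double raster_fraction(double t_lo, double t_hi,
                              double dpi_lo, double dpi_hi)
{
    double r = (dpi_hi / dpi_lo) * (dpi_hi / dpi_lo);   /* pixel ratio */
    return (t_hi / t_lo - 1.0) / (r - 1.0);
}
```

For 300 vs. 400 dpi the pixel ratio is 16/9, so a run that takes 20% longer
at 400 dpi implies the raster share is 0.20 / (7/9), i.e. about 26% of the
300 dpi total. The experiment's main validity risk is the model itself:
font cache misses also cost more at 400 dpi, so scan conversion time is not
purely "interp".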