davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/17/90)
Let's pretend that we have the ear of the chip designers at Intel, and that they have asked our opinion on what windows support should be included in the new Intel 586. Please hold any negative comments about Intel, CISC, etc, it's all been said... The questions: What features should be put into the CPU to improve performance and reduce chip count? Will assumptions about graphics memory organization be made, and if so what are they? Do we assume that support will be: a) all general purpose b) mostly MS-windows c) mostly X-windows Start of opinion: From what little info I have, the 486 sales are going first to people using them as servers, second to people running a multitasking o/s, mostly unix, and only third to DOS power users. You can assume that all unix means SysV right now, and that other multitasking systems include Desqview, ms-windows, DR-DOS, and other environments to support DOS programs. Given that servers probably don't need graphics in most installations, I would assume that X-windows has the largest user base of any single window system, although it may be less than half the market. I believe that the time has come when enough address space is available to allow direct mapping of graphics memory into memory on the ISA and EISA bus, and that these busses will be used in the majority of systems using Intel CPUs for the next two years or more. The performance bottleneck of mapping a MB of data into 64k of address space would be severe, even if it were done well. Cheap processors now have enough address space to allow investing some of it in graphics. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) VMS is a text-only adventure game. If you win you can use unix.
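The cost of squeezing 1 MB of graphics memory through a 64k window can be sketched as follows. This is only an illustration: the 1024x1024, one-byte-per-pixel layout and the function names are assumptions, and the sketch counts bank-register rewrites rather than measuring any real adapter.

```python
WINDOW_SIZE = 64 * 1024      # CPU-visible aperture, as in the post
FRAME_BYTES = 1024 * 1024    # 1 MB of graphics memory, as in the post
PITCH = 1024                 # assumed: 1024 bytes per scan line


def banked_address(offset, window_size=WINDOW_SIZE):
    """Split a linear frame-buffer offset into (bank, offset inside the window)."""
    return divmod(offset, window_size)


def count_bank_switches(offsets, window_size=WINDOW_SIZE):
    """Count how often the bank register must be rewritten for a run of accesses."""
    switches = 0
    current_bank = None
    for offset in offsets:
        bank, _ = banked_address(offset, window_size)
        if bank != current_bank:
            switches += 1
            current_bank = bank
    return switches


# A vertical line down the whole (assumed) display touches every bank once.
vertical_line = [y * PITCH + 100 for y in range(1024)]

# Copying 1000 bytes from the top of the frame to the middle alternates
# between two banks on every single access.
copy_accesses = []
for i in range(1000):
    copy_accesses.append(i)                     # read source byte
    copy_accesses.append(512 * 1024 + i)        # write destination byte

print(count_bank_switches(vertical_line))               # banked window
print(count_bank_switches(copy_accesses))               # banked window
print(count_bank_switches(copy_accesses, FRAME_BYTES))  # direct mapping
```

With a direct mapping the whole frame is one "bank", so the copy needs no remapping after the first access; with the 64k window every access in that copy pays for a bank-register write.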
graeme@labtam.labtam.oz (Graeme Gill) (12/18/90)
In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes: > > Let's pretend that we have the ear of the chip designers at Intel, > and that they have asked our opinion on what windows support should be > included in the new Intel 586. > > What features should be put into the CPU to improve performance and > reduce chip count? > The thing crying out for help is overcoming the memory bandwidth bottleneck. Step number one is to add support for burst writes. As far as I know, only two mainstream processors support burst writes: The Intel 80960, and the Amd 29000. Both make dandy processors for X terminals, laser printers etc. etc. as a result. Graeme Gill Electronic Design Engineer Labtam Australia
sef@kithrup.COM (Sean Eric Fagan) (12/18/90)
In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes: > What features should be put into the CPU to improve performance and >reduce chip count? While I don't know that it would reduce chip count, a Good Thing to have would be: MORE REGISTERS!!!!!! But, of course, it won't happen. *sigh* -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
zap@lysator.liu.se (Zap Andersson) (12/18/90)
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:

> Let's pretend that we have the ear of the chip designers at Intel,
>and that they have asked our opinion on what windows support should be
>included in the new Intel 586. Please hold any negative comments about
>Intel, CISC, etc, it's all been said...

>The questions:

> What features should be put into the CPU to improve performance and
>reduce chip count?

Once upon a time I built a graphics board for my computer: 240x240 pixels, 16 shades of gray, nothing heavy, BUT I had windows-like support in HARDWARE! I have NEVER understood why this is not common practice in today's computers. What CAN be easier than building into the gfx chip the rule 'when the beam reaches this'n'that row/column, start displaying bitmap data from this'n'that memory'? The Amiga is the closest I've seen, supporting this in 'semi-hardware' (the Amiga uses a co-processor) as horizontal slices of the display. With a faster co-processor (i.e. faster than 1 pixel bitclock) you could have hardware windows support! You would NEVER need to worry about overlapping memory, or about which memory to write to. You just write to your 'virtual' screen, and the display chip takes care of it ALL.

Can SOMEONE tell me why this incredibly simple idea sees so little use today?

> Will assumptions about graphics memory organization be made, and if so
>what are they?

See above. But if you're into 586 windows handling, try to think up something NEW! Don't bother with standards with moss on top... please?

/Z
-- 
* * * * * * * * * * * * * * * * *
* My signature is smaller than  *
* yours! - zap@lysator.liu.se   *
* * * * * * * * * * * * * * * * *
torbenm@freke.diku.dk (Torben Ægidius Mogensen) (12/18/90)
graeme@labtam.labtam.oz (Graeme Gill) writes: >> What features should be put into the CPU to improve performance and >> reduce chip count? > The thing crying out for help is overcoming the memory bandwidth >bottleneck. Step number one is to add support for burst writes. As far >as I know, only two mainstream processors support burst writes: >The Intel 80960, and the Amd 29000. Both make dandy processors for >X terminals, laser printers etc. etc. as a result. There is also the ARM. And before you say that this isn't a mainstream processor, I should point out that it has a larger user base than either Intel 80960 or Amd 29000. In fact it is the second most used RISC processor (SPARC being the most used). Torben Mogensen (torbenm@diku.dk)
is@athena.cs.uga.edu ( Bob Stearns) (12/18/90)
In article <1990Dec18.082623.16648@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes: >In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes: >> What features should be put into the CPU to improve performance and >>reduce chip count? > >While I don't know that it would reduce chip count, a Good Thing to have >would be: MORE REGISTERS!!!!!! > >But, of course, it won't happen. *sigh* > While more registers sound like motherhood and apple pie, in the UNIX world they can be a distinct losing proposition. The commonest service provided by the kernel is a state switch between processes. The more registers, the longer this state switch must necessarily take. The only ways out of this require lots more hardware and discipline from both the compilers and the programmer. The first solution involves keeping track of just which registers have been used during a process and only saving those; lots of very smart (expensive) hardware, and a need for discipline to keep from using more registers than really required. The second solution is to provide enough registers so that each process has its own set which never needs to be swapped; this leads to a hard limit on the number of processes allowed in the machine at any one time, OC
erc@pai.UUCP (Eric F. Johnson) (12/18/90)
In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes: > Let's pretend that we have the ear of the chip designers at Intel, > and that they have asked our opinion on what windows support should be > included in the new Intel 586. Please hold any negative comments about > Intel, CISC, etc, it's all been said... > [...deleted...] > Given that servers probably don't need graphics in most installations, > I would assume that X-windows has the largest user base of any single > window system, although it may be less than half the market. Much less. I would guess that for graphical windowing systems, the largest user base is for the Macintosh window system. (Yes, I'm well aware that the Mac doesn't use an 80x86 chip.) My guess would be, in order: 1) Macintosh--measure user base in millions 2,3) tie- Microsoft Windows, Amiga--around the 1 million mark for MS Windows, more for the Amiga, but that will soon change (note that I'm not saying whether this will be good or bad). If you read Personal Workstation, please tell them that the Amiga sports a multi-tasking operating system with a graphical user interface (for PW's strange application watch). 4) SunView - There are still probably more users of this than X, although that should change in 1991 in favor of X. How many people who buy Suns today still use SunView? Quite a lot, I'd venture. 5) The X Window System, although this will continue to grow, since it is available on multiple architectures and operating systems. > I believe that the time has come when enough address space is > available to allow direct mapping of graphics memory into memory on the > ISA and EISA bus, and that these busses will be used in the majority of > systems using Intel CPUs for the next two years or more. The performance > bottleneck of mapping a MB of data into 64k of address space would be > severe, even if it were done well. 
Cheap processors now have enough
> address space to allow investing some of it in graphics.
> --
> bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
> VMS is a text-only adventure game. If you win you can use unix.

I like the idea of putting graphics support on chip (and it has been done before). I think it is in Intel's best interests to embed support for Microsoft Windows into the 80x86, rather than X Window support. (I would prefer X support, which would make my "Wunderclone" into a decent X machine, but from Intel's point of view, MS Windows is the way to go. Also, since X has been ported to such a wide variety of machines, I'm sure that embedding MS Windows support on-chip would also provide a lot of good features that could be used for implementing X as well.)

Have fun,
-Eric

PS, I know there are a lot of Amiga evangelists out there. I'm not trying to get your goat, just noting market reality. I am in no way trying to imply anything at all that could ever be considered bad about your wonderful machine.

-- 
Eric F. Johnson               phone: +1 612 894 0313    BTI: Industrial
Boulware Technologies, Inc.   fax:   +1 612 894 0316    automation systems
415 W. Travelers Trail        email: erc@pai.mn.org     and services
Burnsville, MN 55337 USA
jonah@dgp.toronto.edu (Jeff Lee) (12/19/90)
In article <1990Dec18.141944.5041@athena.cs.uga.edu> is@athena.cs.uga.edu ( Bob Stearns) writes: >While more registers sound like motherhood and apple pie, in the UNIX world >they can be a distinct losing proposition. The commonest service provided by >the kernel is a state switch between processes. The more registers, the longer >this state switch must necessarily take. [...] Sigh. We've been through this before: within reason, saving general purpose registers is typically not the most expensive part of a UNIX context switch. The cost of saving 8, 16, or 32 general purpose registers is often less than the cost of saving other process state information. However, the difference in code optimization with 8, 16, or 32 GP registers is often not insignificant. Thus, up to a point you win more through code optimization than you lose due to slower context switching. The tradeoff point depends on the expected rate of context switches. What *can* be annoying is having to save all registers in every exception handler. Having a separate set of GP registers for each processor mode could turn "traps" and "interrupts" into almost instantaneous co-routine switches. The tricky part might be flushing the pipeline correctly -- I don't know how easily this can be done. My caveat on this is that these additional registers should look just like the normal GP registers so that kernel code can be compiled with the same compiler as user code. Only the context save/restore code should need to access registers in another register bank. The PDP10 used to have different user/system register banks so it can be done. Does anyone have any DATA on how frequently system calls, exceptions, and interrupts (a) occur, and (b) result in context switches?
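The tradeoff described above (cycles won through better code versus cycles lost on each context switch) can be put into a few lines of arithmetic. This is a rough sketch only: the function names are made up, the one-cycle save and restore costs are assumptions, and the cycles saved by optimization is an input you would have to measure for a real workload.

```python
def extra_switch_cost(extra_registers, cycles_per_save=1, cycles_per_restore=1):
    """Extra cycles per context switch spent saving and restoring the added registers."""
    return extra_registers * (cycles_per_save + cycles_per_restore)


def net_cycles_gained(cycles_saved_per_second, switches_per_second, extra_registers):
    """Cycles per second won by better code minus cycles lost to heavier switches."""
    return cycles_saved_per_second - switches_per_second * extra_switch_cost(extra_registers)


def break_even_switch_rate(cycles_saved_per_second, extra_registers):
    """Context switches per second at which the extra registers stop paying off."""
    return cycles_saved_per_second / extra_switch_cost(extra_registers)


# Hypothetical numbers: going from 8 to 32 registers (24 extra) saves
# 480,000 cycles per second through better register allocation.
print(net_cycles_gained(480_000, 100, 24))
print(break_even_switch_rate(480_000, 24))
```

Under these invented numbers the extra registers win unless the machine switches contexts thousands of times per second, which is the shape of the argument above; real figures need the DATA asked for at the end of the post.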
sef@kithrup.COM (Sean Eric Fagan) (12/19/90)
In article <1990Dec18.141944.5041@athena.cs.uga.edu> is@athena.cs.uga.edu ( Bob Stearns) writes: >While more registers sound like motherhood and apple pie, in the UNIX world >they can be a distinct losing proposition. The commonest service provided by >the kernel is a state switch between processes. The more registers, the longer >this state switch must necessarily take. Uhm, have you taken an OS course? And actually *read* the material? Saving the registers is a tiny part of a unix context switch. Most of it is dealing with checking which process can run next, etc. On the other hand, having more registers means that you don't have to go to memory as often, which *will* speed things up. -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
borasky@ogicse.ogi.edu (M. Edward Borasky) (12/19/90)
>In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>> What features should be put into the CPU to improve performance and
>>reduce chip count?

I thought the way to improve performance was to REMOVE features! And the fewer CHIPS that make up a CPU, the slower it is for a given technology. I think they made a mistake putting the co-processor ON CHIP; surely it would be faster if the floating point were done in a specialized unit (386/387 style). That way you can get faster floating point with other people's coprocessors (Weitek, for example).
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)
In article <1990Dec18.082623.16648@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:

| While I don't know that it would reduce chip count, a Good Thing to have
| would be: MORE REGISTERS!!!!!!

I think we can assume that the 586 will be a superset of the 486. Can someone quantify what would be gained with more registers, say R0-R7? The cost of saving and restoring on procedure calls is obvious; can someone show that adding more would produce a significant net gain?

Now if you said make the existing registers more general purpose, I can see that, although the beauty of the Intel instruction set is that, because most of the instructions are a single byte, memory bandwidth is conserved for data access. The price is that you have special purpose registers.

| But, of course, it won't happen. *sigh*

Registers were added with the 286 and 386. I have yet to see a compiler which makes use of the 386 registers.

I hope people will contribute ideas for useful additions, rather than talk about how Intel can be more like {your favorite chip or style}.
-- 
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
VMS is a text-only adventure game. If you win you can use unix.
jcb@frisbee.Eng.Sun.COM (Jim Becker) (12/19/90)
sef@kithrup.COM (Sean Eric Fagan) writes: In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes: > What features should be put into the CPU to improve performance and >reduce chip count? While I don't know that it would reduce chip count, a Good Thing to have would be: MORE REGISTERS!!!!!! But, of course, it won't happen. *sigh* -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." There are a whole host of debugging registers in the 486 -- is there any way to use them? One would think that when the chip gets to the market they would have the debugging out of the way, and those registers would be freed up for use by OS and compiler people. -Jim Becker -- -- Jim Becker / jcb%frisbee@sun.com / Sun Microsystems
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)
In article <1990Dec18.115605.7411@jarvis.csri.toronto.edu> jonah@dgp.toronto.edu (Jeff Lee) writes: | What *can* be annoying is having to save all registers in every | exception handler. Having a separate set of GP registers for each | processor mode could turn "traps" and "interrupts" into almost | instantaneous co-routine switches. The tricky part might be flushing | the pipeline correctly -- I don't know how easily this can be done. | My caveat on this is that these additional registers should look just | like the normal GP registers so that kernel code can be compiled with | the same compiler as user code. Only the context save/restore code | should need to access registers in another register bank. The PDP10 | used to have different user/system register banks so it can be done. You've just described the Z80. I would think it useful to (a) disable interrupts while the registers were swapped, and (b) allow access to the alternate set. This and an instruction to "save alternate regs and enable ints" could be used if more than a few instructions were needed to service the condition. And the converse, of course. The Z80 was fast for its day when that technique was used. I had "parallel port NFS" under CP/M to get access to drives on other machines. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) VMS is a text-only adventure game. If you win you can use unix.
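A toy model of the alternate-register-set idea, for readers who have not met it. On the Z80 the swap is done by the EXX and EX AF,AF' instructions; everything else here (the class, the register names, the handler) is hypothetical. The sketch does not model nested interrupts, which is why interrupts would stay disabled while the banks are swapped.

```python
class BankedRegisters:
    """Two identical register banks; only one is visible at a time."""

    NAMES = ("a", "b", "c", "d")  # hypothetical register names

    def __init__(self):
        self.banks = [dict.fromkeys(self.NAMES, 0), dict.fromkeys(self.NAMES, 0)]
        self.active = 0

    def swap(self):
        """Make the other bank visible; nothing is copied to memory."""
        self.active ^= 1

    def __getitem__(self, name):
        return self.banks[self.active][name]

    def __setitem__(self, name, value):
        if name not in self.NAMES:
            raise KeyError(name)
        self.banks[self.active][name] = value


def service_interrupt(regs, handler):
    """Run a short handler in the alternate bank, then switch back."""
    regs.swap()
    try:
        handler(regs)
    finally:
        regs.swap()


def handler(regs):
    # The handler is free to clobber "its" registers.
    regs["a"] = 7
    regs["b"] = 9


regs = BankedRegisters()
regs["a"] = 42                    # state of the interrupted program
service_interrupt(regs, handler)
print(regs["a"], regs["b"])       # the interrupted program's values are untouched
```

The handler is ordinary code that uses the normal register names, which matches the caveat quoted above: only the swap itself needs to know that a second bank exists.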
rouellet@crhc.uiuc.edu (Roland G. Ouellette) (12/19/90)
> While more registers sound like motherhood and apple pie, in the
> UNIX world they can be a distinct losing proposition. The commonest
> service provided by the kernel is a state switch between processes.
> The more registers, the longer this state switch must necessarily
> take. The only ways out of this require lots more hardware and
> discipline from both the compilers and the programmer.

In the UNIX world this may be a problem (changing the page table maps in an MP system and figuring out which processes are runnable is probably more of a problem). However, your context switch code is likely to involve several procedure calls, each of which may save some registers. By the time the stacks are about to be swapped, most of the user registers will have been flushed out onto the stack of the outgoing process. Only the few that didn't get touched will need saving. The compiler will tell you which ones need to be saved.

PLUG: Choices, an OO OS written in an OO language (C++) here at the University of Illinois, does this. Vince managed to get g++ (and maybe Cfront) to do this for him. [He also complained loudly about hardware-enforced context switch instructions which saved every register, because his code had less overhead.]

This sort of thing might be possible in a UNIX environment, but there's a load of crufty code out there. [I've seen BSD-derived code for context switches (from a vendor to remain nameless; they may have fixed it) which simulated in SW the PCBs used on VAX computers even though some of the state was known to be fairly useless on that architecture... like 4 of the 5 stack pointers.]
-- 
= Roland G. Ouellette    ouellette@tarkin.enet.dec.com                  =
= 1203 E. Florida Ave    rouellet@[dwarfs.]crhc.uiuc.edu                =
= Urbana, IL 61801       "You rescued me; I didn't want to be saved."   =
=                                                     - Cyndi Lauper    =
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)
In article <15145@ogicse.ogi.edu> borasky@ogicse.ogi.edu (M. Edward Borasky) writes: | I think they made a mistake putting the co-processor ON CHIP; | surely it would be faster if the floating point were done in a \ | specialized unit (386/387 style). That way you can get faster floating | point with other peoples' coprocessors (Weitek, for example). The 486 uses fewer cycles than the 386 for the same instructions. The Weitek can still be added. The boards are easier to design, smaller, and have less support logic, and are thus cheaper to build. If Intel and the board vendors were not recovering design cost and making all the profit the market will bear, I think the 486 would be cheaper than a 386+387. As it is, the prices are comparable, and the cost performance is a lot better on the 486, at least at the system level. How about a new subject line if you want to continue this, it's not related to what could be added to the 586. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) VMS is a text-only adventure game. If you win you can use unix.
is@athena.cs.uga.edu ( Bob Stearns) (12/19/90)
Not only have I taken the OS courses, I have written, mucked about in and generally been involved in more OS-type work for various architectures than most people even know exist. Note that when I read "more registers" I think in terms of machines like the CYBER 205 with its 256 64-bit registers or even larger sets. Yes, when the register count is a measly 8-32 32-bit registers the save/restore overhead is fairly small, although there is also the call versus interrupt penalty, depending upon who must save/restore registers during a call/return sequence. The rest of the state is small compared to the 16K bits of registers I was considering, and the choice of next process to schedule should already have been taken care of by the process list maintenance routines, using something like a heap ordered by priority/time, so selecting the next one should be a very short algorithm. See Sedgewick on the subject.
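A minimal sketch of the heap-ordered ready list mentioned above, using Python's standard `heapq` module. The class name, the "lower number runs first" convention, and the arrival counter used to break ties are assumptions for illustration, not a description of any particular kernel.

```python
import heapq
import itertools


class ReadyQueue:
    """Ready list kept as a binary heap ordered by (priority, arrival order)."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: earlier arrivals run first

    def add(self, pid, priority):
        """Insert a runnable process; O(log n)."""
        heapq.heappush(self._heap, (priority, next(self._arrival), pid))

    def next_process(self):
        """Remove and return the process to run next, or None if nothing is runnable."""
        if not self._heap:
            return None
        _priority, _order, pid = heapq.heappop(self._heap)
        return pid

    def __len__(self):
        return len(self._heap)


queue = ReadyQueue()
queue.add("editor", priority=2)
queue.add("pager", priority=1)   # assumed convention: lower number = more urgent
queue.add("shell", priority=2)

order = [queue.next_process() for _ in range(4)]
print(order)
```

The bookkeeping happens when a process becomes runnable, so the choice made at switch time is a single pop from the heap.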
thor@thor.atd.ucar.edu (Richard Neitzel) (12/19/90)
In article <450@lysator.liu.se>, zap@lysator.liu.se (Zap Andersson) writes:
|> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
|>
|> >The questions:
|>
|> > What features should be put into the CPU to improve performance and
|> >reduce chip count?
|>
|> Once upon a time I built a graphics board for my computer. 240x240 pixels, 16
|> shades of gray, not very heavy, BUT I had windows-like support in HARDWARE!
|> Now I have
|> NEVER understood why this is not common practice in todays computers! I mean
|> what CAN be easier than to include in the gfx chip that 'when beam reaches
|> this'n'that row/column, start displaying bitmap-data from this'n'that memory!
|> The Amiga is the closes I've seen, supporting these 'semi-hardware' (the
|> amiga uses a co-processor) as horizontal slices of display. With a faster
|> co-processor (i.e. faster than 1 pixel bitclock) you could have hardware
|> windows support! You will NEVER need to worry about memorys overlapping, or
|> in what memory to write! You just write to your 'virtual' screen, and the
|> display chip takes care about it ALL.
|>
|> Can SOMEONE tell me why this increadibly simple idea have so little use today?
|>
|> > Will assumptions about graphics memory organization be made, and if so
|> >what are they?
|>
|> See above. But if your into 586 Windows handling, try to think up something
|> NEW! Don't bother with standards with moss on top....please?
|>

If I interpret correctly what you are asking for, check out the Tadpole TP-AGCV graphics board. Tadpole has a special windowing chip that allows the following to be set via registers: a window's screen x,y start point, its height and width, the starting location in memory, stacking priority, zoom factor and display enable. Moving a window, [un]displaying it, panning through video memory, setting a window's zoom factor, etc. require one or two writes. In our application, we want to switch between multiple windows nearly instantaneously.
The Tadpole board can switch between two sets of two windows faster than the screen refresh rate, which makes a neat display: you see both sets of windows at the same time (just have a loop that swaps the sets constantly!). Currently they have a 6U VME board with 4 Mb of video RAM, but you can also buy the windowing chips from Tadpole.
-- 
Richard Neitzel thor@thor.atd.ucar.edu      Torren med sitt skjegg
National Center For Atmospheric Research    lokkar borni under sole-vegg
Box 3000 Boulder, CO 80307-3000             Gjo'i med sitt shinn
303-497-2057                                jagar borni inn.
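To make the register-described window idea concrete, here is a small software model of what such a display chip decides for each screen pixel. The fields mirror the list given in the post (position, size, memory start, priority, zoom, enable); the `pitch` field, the names, and the "highest priority wins" rule are assumptions for illustration, not Tadpole's actual register layout.

```python
from dataclasses import dataclass


@dataclass
class Window:
    x: int              # screen position of the top-left corner
    y: int
    width: int          # size on screen, in pixels
    height: int
    base: int           # start address of the window's bitmap in video memory
    pitch: int          # assumed: bytes per bitmap row
    priority: int       # assumed: larger value is drawn on top
    zoom: int = 1
    enabled: bool = True


def source_address(windows, px, py):
    """Return the video-memory address shown at screen pixel (px, py), or None."""
    top = None
    for w in windows:
        if not w.enabled:
            continue
        inside = w.x <= px < w.x + w.width and w.y <= py < w.y + w.height
        if inside and (top is None or w.priority > top.priority):
            top = w
    if top is None:
        return None
    col = (px - top.x) // top.zoom   # zoom repeats each source pixel
    row = (py - top.y) // top.zoom
    return top.base + row * top.pitch + col


back = Window(x=0, y=0, width=100, height=100, base=0, pitch=100, priority=1)
front = Window(x=50, y=50, width=100, height=100, base=0x10000, pitch=100, priority=2)
windows = [back, front]

print(source_address(windows, 60, 60))   # covered by `front`
front.x = 200                            # "moving" the window is a single write
print(source_address(windows, 60, 60))   # `back` shows through again
```

Nothing in video memory is copied when the window moves or is hidden; only the description changes, which is why such operations take one or two register writes.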
ckp@grebyn.com (Checkpoint Technologies) (12/19/90)
In article <15145@ogicse.ogi.edu> borasky@ogicse.ogi.edu (M. Edward Borasky) writes:
>>In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>>> What features should be put into the CPU to improve performance and
>>>reduce chip count?
>I thought the way to improve performance was to REMOVE features! And

This is *not* an option when you have a significant software base to protect. And surely Intel has a gargantuan software base to protect. Same with the 68K line.

Just think of it. Intel releases the 586, and to improve performance they remove a few complex instructions and replace them with one or two simpler but faster instructions. No software that used those instructions will run. Intel earns a bad rep and sells zero chips as the journalists take Intel apart for producing an incompatible chip.

BTW: Something I read in a PC mag recently irked me. Paraphrased, the author wrote "'Incompatible' means that something that's supposed to work together with something else, doesn't". Well, by my book "incompatible" only means that something doesn't work together with something else; it makes no moral judgement about whether it's supposed to. But in the PC world, "incompatible" is taken to mean "bad", "wrong", "evil". I just wanted to get that off my chest...
-- 
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S      / /
                                                                    \\  / /
Then, the disclaimer:  All expressed opinions are, indeed, opinions. \  / o
Now for the witty part:    I'm pink, therefore, I'm spam!             \/
kdarling@hobbes.ncsu.edu (Kevin Darling) (12/19/90)
zap@lysator.liu.se (Zap Andersson) writes: >Once upon a time I built a graphics board for my computer. 240x240 pixels, 16 >shades of gray, not very heavy, BUT I had windows-like support in HARDWARE! >Now I have > NEVER understood why this is not common practice in todays computers! I mean Total agreement! There's no doubt that systems programmers in the not-too-distant future will look back on today as the dark ages of writing windowing software. "What?? No windows in hardware?? Auugh!" I believe the Intel 82786 gfx chip does have this support now. Each window is a different section of memory, and can be of virtually any mode... so for example, you could have a CGA-style window in the middle of a 1Kx1K display. No idea what it costs tho. Anyone know? I read that at least one new fancy terminal uses it. The Philips VSC video chip allows horizontal windows a la the Amiga; but yeah, having more than one window/line seems to be still a ways off. Some days I'm tempted to rig up my own external hardware method. best - kev <kdarling@catt.ncsu.edu>
jbuck@galileo.berkeley.edu (Joe Buck) (12/19/90)
In article <24117@grebyn.com>, ckp@grebyn.com (Checkpoint Technologies) writes:
|> Just think of it. Intel releases the 586, and to improve performance
|> they remove a few complex instructions and replace them with one or two
|> simpler but faster instructions. No software that used those
|> instructions will run. Intel earns a bad rep and sells zero chips as
|> the journalists take Intel apart for producing an incompatible chip.

No, I'm afraid not. What you do is you get an illegal instruction trap when the weird instruction is run, and the trap handler then emulates the instruction. The chip-maker releases the code for the trap handler (makes it public) and the PC-clone folks put it in their BIOS ROMs and the Unix-port people put it in their kernels. The lowly user has no idea that anything is different, since the 586 is so much faster that the emulated instruction is faster than the original.

No doubt some ignorant journalist will write an article making the point you make. Everyone in the know will proceed to laugh at that journalist.

It's been done before, of course; MicroVAXes do just this (they don't support the fancy VAX instructions but emulate them with traps).
-- 
Joe Buck
jbuck@galileo.berkeley.edu {uunet,ucbvax}!galileo.berkeley.edu!jbuck
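The trap-and-emulate scheme described above can be sketched with a toy machine. The opcodes (`add`, and the removed `madd`) and all names are invented for the example; the point is only the control flow: the "hardware" refuses the removed instruction, and a trap handler rebuilds it from instructions that still exist.

```python
class IllegalInstruction(Exception):
    """Raised by the toy 'hardware' for an opcode it no longer implements."""


def execute_native(cpu, op, args):
    """The instructions the (hypothetical) new chip still implements directly."""
    if op == "add":              # add dst, src: dst += src
        dst, src = args
        cpu[dst] += cpu[src]
    else:
        raise IllegalInstruction(op)


def emulate_madd(cpu, args):
    """Software version of a removed instruction, madd dst, src, n: dst += src * n."""
    dst, src, count = args
    for _ in range(count):
        execute_native(cpu, "add", (dst, src))


# Plays the role of the trap handler shipped in the BIOS ROM or the kernel.
TRAP_TABLE = {"madd": emulate_madd}


def run(cpu, program):
    """Execute a program; old binaries never see that 'madd' is emulated."""
    for op, args in program:
        try:
            execute_native(cpu, op, args)
        except IllegalInstruction:
            handler = TRAP_TABLE.get(op)
            if handler is None:
                raise            # genuinely unknown opcode
            handler(cpu, args)


cpu = {"a": 1, "b": 3}
old_program = [("add", ("a", "b")), ("madd", ("a", "b", 4))]
run(cpu, old_program)
print(cpu["a"])
```

Whether the emulated instruction ends up faster than the original depends on how often it occurs and how expensive the trap is; the sketch says nothing about that.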
graeme@labtam.labtam.oz (Graeme Gill) (12/19/90)
In article <1990Dec18.113834.5227@diku.dk>, torbenm@freke.diku.dk (Torben Ægidius Mogensen) writes:
> graeme@labtam.labtam.oz (Graeme Gill) writes:
>
> >as I know, only two mainstream processors support burst writes:
> >The Intel 80960, and the Amd 29000. Both make dandy processors for
> >X terminals, laser printers etc. etc. as a result.
>
> There is also the ARM. And before you say that this isn't a mainstream
> processor, I should point out that it has a larger user base than
> either Intel 80960 or Amd 29000. In fact it is the second most used
> RISC processor (SPARC being the most used).

But the ARM only has 16 generally accessible registers. From experience with the 960 I have found that 32 registers looks a bit small when you are reading and writing 4 words at a time. In this regard, the 29000 has an advantage. However, the 29000 is flawed in stalling execution while a store or load multiple instruction is executing. I suspect the ARM also suffers from this problem.

The ARM does not seem to be used much outside Europe at the present time. I do not hear much about Acorn computers in Australia, and they do not seem to have any presence outside the home computer market.

Graeme Gill
sef@kithrup.COM (Sean Eric Fagan) (12/19/90)
In article <1990Dec18.202842.11771@athena.cs.uga.edu> is@athena.cs.uga.edu ( Bob Stearns) writes: >Note that when I read "more registers" I think >in terms of machines like the CYBER 205 with its 256 64bit registers or even >larger sets. Ah. A misunderstanding... 8-) I just mentioned the 205 to someone in a response to my posting (remember that the ETA-10 is a faster and better 205). However, recall that we were discussing the *86, a machine which has *6* registers available for "general purpose" use, which really aren't (lots and lots of instructions require certain registers, or at least work better with them). More on the subject of context switching: the Elxsi had something like 16 sets of registers on board. During a context switch (e.g., from one thread to another [no supervisor mode on the machine]), it just used the next available set of registers. The hardware *knew* about threads and whatnot, so this was feasible. But I can imagine someone like MIPS or Sun (for the Sparc, of course) putting a few different sets on board, whose sole purpose would be to act as a buffer when handling faults and whatnot. Sort of like register windows, only for context switches, not subroutine calls. -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
sef@kithrup.COM (Sean Eric Fagan) (12/19/90)
In article <24117@grebyn.com> ckp@grebyn.UUCP (Checkpoint Technologies) writes:
>In article <15145@ogicse.ogi.edu> borasky@ogicse.ogi.edu (M. Edward Borasky) writes:
>>I thought the way to improve performance was to REMOVE features! And
>This is *not* an option when you have a significant software base to
>protect. And surely Intel has a gargantuan software base to protect.
>Same with the 68K line.
>Just think of it. Intel releases the 586, and to improve performance
>they remove a few complex instructions and replace them with one or two
>simpler but faster instructions. No software that used those
>instructions will run. Intel earns a bad rep and sells zero chips as
>the journalists take Intel apart for producing an incompatible chip.

Uhm... have you read about the 68030 and the 68040? The '30 removed two instructions that the '20 introduced (CALLM and RETM, I think), that few to no people used. The '40's on-board FPU does only a drastic subset of the 68882 (is that the 68k FPU?). It basically does add, subtract, mult, and div, and a few others; the rest have to be emulated by the OS (or whatever is in control of the machine).

Motorola has not earned a bad rep for that, nor have they sold zero chips.
-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
graeme@labtam.labtam.oz (Graeme Gill) (12/19/90)
In article <450@lysator.liu.se>, zap@lysator.liu.se (Zap Andersson) writes: > NEVER understood why this is not common practice in today's computers! I mean > what CAN be easier than to include in the gfx chip that 'when beam reaches > this'n'that row/column, start displaying bitmap-data from this'n'that memory! > The Amiga is the closest I've seen, supporting these 'semi-hardware' (the > amiga uses a co-processor) as horizontal slices of display. With a faster > co-processor (i.e. faster than 1 pixel bitclock) you could have hardware > windows support! You will NEVER need to worry about memories overlapping, or > in what memory to write! You just write to your 'virtual' screen, and the > display chip takes care about it ALL. > > Can SOMEONE tell me why this incredibly simple idea has so little use today? The answer to this is the usual RISC vs CISC argument. Why have very complicated hardware that tends to be locked into a particular implementation of windowing etc., when with a little bit of effort on the window library programmer's part you can get the same performance with more general hardware, i.e. a RISC processor and frame buffer? Specialised graphics hardware is usually about a generation behind mainstream processors. Doing windowing in software allows a great deal of flexibility in fixing bugs, keeping up with standards developments, ease of porting code to new generations of hardware, etc. Even some of the high end graphics vendors are throwing out their hardware pipelined 3d transform/clip engines, and putting more general purpose processors in their place, like a bunch of i80860s. There is definitely a place for hardware assist of graphics operations, but "do it all" solutions tend to date rapidly. Graeme Gill Labtam Australia
sef@kithrup.COM (Sean Eric Fagan) (12/19/90)
In article <3058@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes: > I think we can assume that the 586 will be a superset of the 486. Can >someone quantify what would be gained with more registers, say R0-R7? Yep. Optimization. Take a look at code produced by either gcc or msc for the '386 some time. Ever hear of the message, "infinite spill"? > Registers were added with the 286 and 386. I have yet to see a >compiler which makes use of the 386 registers. The registers added for ring three applications on the '386 were fs and gs (making a total of six segment registers, to match the six "gp" registers). And I've seen code use them. Remember that a) they're only 16 bits, and b) in protected mode, loading a segment register with an invalid segment number will cause a fault. I had a version of a compiler that used fs for doing certain weird things (like jumping from a 32-bit segment to a 16-bit segment *shudder*). -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
sef@kithrup.COM (Sean Eric Fagan) (12/19/90)
In article <4748@exodus.Eng.Sun.COM> jcb@frisbee.Eng.Sun.COM (Jim Becker) writes: >There are a whole host of debugging registers in the 486 -- is there >any way to use them? One would think that when the chip gets to the >market they would have the debugging out of the way, and those >registers would be freed up for use by OS and compiler people. Uhm... the debugging registers on the '486 are (if I understand what you're talking about) the same as the debugging registers on the '386 (one addition, I think). They're used for debugging. For example, CodeView under SCO UNIX uses the debugging registers to set data breakpoints (i.e., break when this address is read from, written to, or executed). They're not visible to the application, I believe, and can't be used in a multiply instruction, for example. -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
sef@kithrup.COM (Sean Eric Fagan) (12/19/90)
In article <1990Dec18.213506.645@ncsuvx.ncsu.edu> kdarling@hobbes.ncsu.edu (Kevin Darling) writes: >I believe the Intel 82786 gfx chip does have this support now. Each >window is a different section of memory, and can be of virtually any >mode... so for example, you could have a CGA-style window in the middle >of a 1Kx1K display. They should not go into CPU's (my opinion). For example, I don't have a *GA on kithrup. I have a cornerstone (early model) with 1600x1200. It doesn't look *anything* like a *GA (when in graphics mode). Having Intel put VGA onto the chip would mean that I would not use it. I'm not against having your hardware manage your graphics; I just don't want my *cpu* to do that. (See the SGI Graphics Board for a *good* example. Also see the NS32GX16 for a good example of why *not* to put this stuff in the CPU.) -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
ckp@grebyn.com (Checkpoint Technologies) (12/19/90)
In article <9876@pasteur.Berkeley.EDU> jbuck@galileo.berkeley.edu (Joe Buck) writes: >What you do is you get an illegal instruction trap >when the weird instruction is run, and the trap handler then emulates >the instruction. The chip-maker releases the code for the trap-handler >(makes it public) and the PC-clone folks put it in their BIOS ROMs and >the Unix-port people put it in their kernels. You're right. I believed BIOS compatibility would be an issue too, but maybe not. You know, Motorola has been getting away with exactly this. The 68010 took away the user-level MOVE SR,dest instruction. The 68030 took away the user-mode CALLM and RETM instructions (good riddance, I say) introduced on the 68020. But you know what else? No system I know of traps and emulates those for backward compatibility. Now the 68040 removes the user-mode trig instructions in the FPU, and replaces them with emulation support. I suspect these will be emulated in real systems, unlike the others. -- First comes the logo: C H E C K P O I N T T E C H N O L O G I E S / / \\ / / Then, the disclaimer: All expressed opinions are, indeed, opinions. \ / o Now for the witty part: I'm pink, therefore, I'm spam! \/
johnl@iecc.cambridge.ma.us (John R. Levine) (12/19/90)
In article <3058@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes: > I think we can assume that the 586 will be a superset of the 486. Indeed. It seems to me pointless to add new features to the user-mode instruction set. There are enough 386 and 486 chips around that no sane programmer would use the new features; the gain would be unlikely to be worth losing backward compatibility. Rather, we need either features that transparently improve performance, or else features that are a big enough win that it's worth forsaking backward compatibility. Here are a few suggestions: -- Per-segment paging. As has been beaten to death here before, the current paging scheme limits the total address space of a process to 4GB. With a page table per segment, you actually could map each open file to a segment (a 4GB file is still pretty big) and merge all I/O with virtual memory. -- Better segment performance. On the 286, 386, and 486 it takes forever to load a segment register. On the 486 it takes 9 cycles, compared to 1 cycle for a regular memory load. Perhaps it could cache a dozen or so recently loaded segment numbers. The FS and GS registers are no substitute; nobody has the faintest idea how to manage segment registers separately from address registers. -- (My favorite.) Better interrupt performance. There are two problems. One is that interrupts are just plain slow. A normal interrupt takes 71 cycles, but if you use the facility to run an interrupt in its own task, the interrupt takes 236 cycles. The return takes 231. I know it's doing a lot of work, but get real -- that's close to 20us for a null interrupt handler on a 25MHz part. Some lighter weight interrupts, perhaps assisted by multiple register sets, would be nice. Also, device interrupts on the 486 use the same creaky method that the 8088 did. There's a single interrupt line, and when the interrupt happens it accepts a vector from the interrupt controller. 
That controller is still an 8259A which only has 8 interrupt lines unless you cascade them which is a kludge. There is no easy way to mask some device interrupts without masking them all (you can stuff commands to the 8259 but it's slow and clumsy.) An interrupt level register that the kernel could manage easily, sort of like the PDP-11 scheme, would be helpful. To support this without having a dedicated interrupt line for each device needs a bus protocol so that devices can post a request for interrupt including the interrupt number, and the CPU can come back later and say "number 17, your interrupt is now taken." If we have all level-triggered interrupts, we could even get by without the call back. -- Graphics support of various kinds. The 860 has a little support for ray tracing, with some instructions that make it easy to whiz through your data structures and figure out what obscures what. One might also like some support for bit-aligned bit-blits, though that tends to tie up the data bus and so would be far more useful if it had some separate path to memory, at least to video memory, that didn't lock out the CPU. -- John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 864 9650 johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl "Typically supercomputers use a single microprocessor." -Boston Globe
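The interrupt level register asked for above can be sketched in a few lines of C. This is only a model under stated assumptions: the names `cpu_t`, `set_level`, and `interrupt_accepted` are hypothetical, the eight levels follow the PDP-11 convention of a processor priority field, and nothing here describes real 486 or 8259A behavior.

```c
#include <stdbool.h>

/* Hypothetical model of a PDP-11 style interrupt level register.
 * The CPU holds a single priority level from 0 to 7; a device request
 * is accepted only if its level is strictly higher. */
typedef struct {
    unsigned level;   /* current processor priority, 0..7 */
} cpu_t;

/* Set a new priority level and return the previous one, so that a
 * kernel critical section can restore it afterwards. */
static unsigned set_level(cpu_t *cpu, unsigned new_level)
{
    unsigned old = cpu->level;
    cpu->level = new_level & 7u;   /* keep the level within 0..7 */
    return old;
}

/* A request at `request_level` interrupts the CPU only when it
 * outranks the current level; equal or lower requests stay pending. */
static bool interrupt_accepted(const cpu_t *cpu, unsigned request_level)
{
    return request_level > cpu->level;
}
```

A kernel would bracket a critical section with `old = set_level(&cpu, 5)` and `set_level(&cpu, old)`. While the level is 5, requests at levels 0 through 5 wait and only levels 6 and 7 get through, which is the selective masking the post finds clumsy to do through the 8259.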
brandis@inf.ethz.ch (Marc Brandis) (12/19/90)
In article <1990Dec18.141944.5041@athena.cs.uga.edu> is@athena.cs.uga.edu ( Bob Stearns) writes: >> >>While I don't know that it would reduce chip count, a Good Thing to have >>would be: MORE REGISTERS!!!!!! >> >> >While more registers sound like motherhood and apple pie, in the UNIX world >they can be a distinct losing proposition. The commonest service provided by >the kernel is a state switch between processes. The more registers, the longer >this state switch must necessarily take. ... Of course, a process switch takes longer when you have more registers to save. However, when you look at the typical process switch times in UNIX, you will see that the register saving part is not a dominating part. UNIX process switch times are in the millisecond area, while the time required to save registers is in the microsecond area. How much will it make your process switch time longer when you would have 32 registers in the 386 instead of 8? 24 loads and 24 stores, or something between 100 and 200 processor cycles, which is between 4 and 8 microseconds on a 25 MHz machine. This accounts for around one percent of the process switch time (or less, I do not have exact numbers for 386 implementations of UNIX). Now look at the alternative, which is to let the application do more memory references because it cannot keep enough information in the registers. When you look at recent papers about computer architecture or compiler construction, you will see that a larger register file is able to reduce the number of memory references a lot. Between two process switches, you are very likely to save more than 50 references when having a larger register file, thus making the machine faster. I understand that there are applications in embedded systems where a very fast task switch is important, and where the work done per task switch is low. In these cases a processor context as small as possible is the right choice. 
However, you do not want to run such a high-overhead process switch like in UNIX on such a system. Marc-Michael Brandis Computer Systems Laboratory, ETH-Zentrum (Swiss Federal Institute of Technology) CH-8092 Zurich, Switzerland email: brandis@inf.ethz.ch
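The estimate in the post above can be checked with a couple of helper functions. This is a minimal sketch: the function names are made up for illustration, and the 1 ms process switch time used in the usage note is an assumption standing in for "the millisecond area", not a measured figure.

```c
/* Extra cost of saving and restoring a larger register file,
 * following the estimate in the post above. */

/* Convert a cycle count to microseconds at a clock rate in MHz. */
static double cycles_to_us(double cycles, double clock_mhz)
{
    return cycles / clock_mhz;   /* 1 MHz is one cycle per microsecond */
}

/* Fraction of one process switch taken up by the extra register
 * traffic, with both times given in microseconds. */
static double overhead_fraction(double extra_us, double switch_us)
{
    return extra_us / switch_us;
}
```

With the figures from the post, `cycles_to_us(100, 25)` gives 4 and `cycles_to_us(200, 25)` gives 8 microseconds. Against an assumed 1000 microsecond switch, `overhead_fraction(8, 1000)` comes out just under one percent, in line with the post's "around one percent (or less)".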
kdarling@hobbes.ncsu.edu (Kevin Darling) (12/19/90)
|In <1990Dec19.052844.4083@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes
|
|>In <1990Dec18.213506.645@ncsuvx.ncsu.edu> kdarling@hobbes.ncsu.edu (Kevin Darling) writes:
|>I believe the Intel 82786 gfx chip does have this support now. Each
|>window is a different section of memory, and can be of virtually any mode...
|
|They should not go into CPU's (my opinion).

Agreed, tho in this case it's not. The 82786 is a graphics display and coprocessor chip meant to be used in addition to the normal cpu. It's pretty nice from what I've read about it:

  Shares 4meg RAM with the cpu
  All the usual blit/draw functions, plus display zoom/pan each window
  Display modes include 640x480x256 up to 1024x1024x2
    or can sync several to go to even higher color res

But the nice thing from my standpoint (writing windowing drivers) is that each "window" is a _separate_ packed-bitmap (up to 32K x 32K pixels) in the shared memory... the 82786 takes care of combining them on the screen. You can have up to 16 displayed windows per scan-line, which seems a good start (no limit vertically). And each displayed window can be of 1-8 bits per pixel in depth (the 82786 changes modes on the fly per window). The start-pos/size of each window is settable on pixel boundaries.

So it sounds almost ideal to me. I wouldn't have to worry about overlapping windows or forcing all windows/screen into one mode, etc. Thx for reminder, btw... I need to get the price on these devils. For a good article on this chip, see BYTE August 1987 (!). After all this time, I had figured the chip had never come out, but then I saw an ad for a terminal using it, a few weeks ago.

best - kev <kdarling@catt.ncsu.edu>
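The idea of separate per-window bitmaps combined at display time can be illustrated in software. This is a conceptual sketch only: `window_t` and `screen_pixel` are hypothetical names, one byte per pixel is assumed for simplicity, and the code says nothing about how the 82786 actually fetches or describes its windows.

```c
#include <stddef.h>

/* Hypothetical window descriptor: each window owns a private packed
 * bitmap (one byte per pixel here) and a position on the screen. */
typedef struct {
    int x, y;                      /* top-left corner on the screen */
    int w, h;                      /* size in pixels */
    const unsigned char *bitmap;   /* w * h pixels, row-major */
} window_t;

/* Return the pixel shown at screen position (sx, sy).
 * wins[0] is the topmost window; later entries lie underneath. */
static unsigned char screen_pixel(const window_t *wins, size_t n,
                                  int sx, int sy, unsigned char background)
{
    for (size_t i = 0; i < n; i++) {
        const window_t *win = &wins[i];
        int lx = sx - win->x;      /* coordinates inside the window */
        int ly = sy - win->y;
        if (lx >= 0 && lx < win->w && ly >= 0 && ly < win->h)
            return win->bitmap[ly * win->w + lx];
    }
    return background;             /* no window covers this pixel */
}
```

An application draws into its own bitmap without knowing what overlaps it; moving or restacking a window only changes the descriptor list, never the pixel data. That is the property the post finds attractive for a windowing driver.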
kdarling@hobbes.ncsu.edu (Kevin Darling) (12/19/90)
OOOPS! Speaking about gfx support, sef@kithrup.COM (Sean Eric Fagan) wrote: >They should not go into CPU's (my opinion). And I replied: >Agreed, tho in this case it's not. The 82786 is a graphics display and >coprocessor chip meant to be used in addition to the normal cpu. It's >pretty nice from what I've read about it: [etc] Sorry. Brain in neutral, I guess. The thread was supposed to be about features *added to cpus*... I got sidetracked onto separate gfx support chips <sigh>. Happy holidays! - kevin <kdarling@catt.ncsu.edu>
mcdonald@aries.scs.uiuc.edu (Doug McDonald) (12/19/90)
In article <1990Dec19.052338.3911@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes: >In article <3058@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes: >> I think we can assume that the 586 will be a superset of the 486. Can >>someone quantify what would be gained with more registers, say R0-R7? > >Yep. Optimization. Yes, in some person's sense. But maybe not speed. IF you add more registers you have to add instructions to access them. The register addressing system of the 386 is already quite full. Those of you who want more registers, please explain here on the net exactly what op-codes you are going to use to access those registers. Are you going to add a byte-prefix to every register instruction that says "use the special new register set"? If so, please explain how that would speed execution. Please remember that any operand that would be put in a register would be in the cache anyway. Only once have I needed more registers than the 80286 already has. Due to the greater flexibility in use of the registers of the 386 over the 286, I was able to recode for the 386 and get everything in registers. Result: a 3% speedup. I think a FAR better idea than squeezing in more registers would be to take advantage of the fact that the 80x86 was designed from the start to have an efficient instruction set, leave it that way, and simply use the chip space to make **everything** faster. Doug McDonald
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)
In article <5800@labtam.labtam.oz> graeme@labtam.labtam.oz (Graeme Gill) writes: | The answer to this is the usual RISC vs CISC arguments. Why have very | complicated hardware, that tends to be locked into a particular implementation | of windowing etc. , when with a little bit of effort on the window library | programmers part you can get the same performance with more general hardware When this lovely generalized system can perform at a reasonable rate, then that's fine. Until then users will want hardware boost because it's more pleasant to use, companies will want it because it's more productive. A display system isn't fast enough until it has to be slowed down to avoid overrunning the input bandwidth of the eye. Until then people will want more, and today that means some hardware assists. In truth you *can't* write software as fast as dedicated hardware, with any amount of effort, much less "a little bit." -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) VMS is a text-only adventure game. If you win you can use unix.
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)
In article <1990Dec19.052338.3911@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes: | In article <3058@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes: | > I think we can assume that the 586 will be a superset of the 486. Can | >someone quantify what would be gained with more registers, say R0-R7? | | Yep. Optimization. Take a look at code produced by either gcc or msc for | the '386 some time. Ever hear of the message, "infinite spill"? I meant what I said - "quantify" rather than qualify. Yes optimization would be better and memory accesses would be down, but how much? -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) VMS is a text-only adventure game. If you win you can use unix.
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)
In article <1990Dec19.060521.16051@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes: [ lots of good stuff ] [ stuff about ints being painfully slow ] | Also, device interrupts on the 486 use the | same creaky method that the 8088 did. There's a single interrupt line, and | when the interrupt happens it accepts a vector from the interrupt controller. Yes, isn't that a nice general solution? It allows simple devices to create interrupts without needing an interrupt controller in the system at all, and yet gives 256 discrete interrupts in the vector. | That controller is still an 8259A which only has 8 interrupt lines unless you | cascade them which is a kludge. There is no easy way to mask some device | interrupts without masking them all (you can stuff commands to the 8259 but | it's slow and clumsy.) Here I disagree. While the cascade does cause some latency, it allows groups of interrupts to be enabled and disabled at once, and for some to be edge and some level triggered. How slow and clumsy can a two-instruction sequence (load the mask into a register, then out the register to the port) be? | An interrupt level register that the kernel could | manage easily, sort of like the PDP-11 scheme, would be helpful. The 8259 has a mode which disables all low priority interrupts while the current interrupt is being serviced. And one which takes them at single priority "round robin." | To support | this without having a dedicated interrupt line for each device needs a bus | protocol so that devices can post a request for interrupt including the | interrupt number, and the CPU can come back later and say "number 17, your | interrupt is now taken." If we have all level-triggered interrupts, we could | even get by without the call back. This is a bus issue, I think. Actually this whole thing is taking place off chip, so you can do anything you want for interrupts. You can use a multiplexed scheme to reduce the number of lines, with or without the 8259. 
It's not part of the CPU, except in the 80186 which had a clock, interrupt controller, and a couple of serial i/o ports (1 bit) built in. -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) VMS is a text-only adventure game. If you win you can use unix.
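The two-instruction masking sequence mentioned above looks roughly like this when written in C. It is a sketch under assumptions: `outb_sim` is a stand-in that records the write instead of executing a real OUT (which needs I/O privilege), `irq_enable` and `irq_disable` are hypothetical names, and port 0x21 is the conventional address of the master 8259A mask register on PC-compatible boards. A shadow copy of the mask is kept in memory so no read from the controller is needed.

```c
#include <stdint.h>

#define PIC1_DATA 0x21u   /* master 8259A mask register port on a PC/AT */

/* Stand-in for the real OUT instruction so the sketch runs in user
 * mode; it simply remembers the last port and value written. */
static uint8_t last_port;
static uint8_t last_value;

static void outb_sim(uint8_t port, uint8_t value)
{
    last_port = port;
    last_value = value;
}

/* Kernel's shadow copy of the mask; a 1 bit disables that IRQ line.
 * All eight lines start out masked. */
static uint8_t irq_mask = 0xFFu;

/* Mask one line: update the shadow copy, then write it out. */
static void irq_disable(unsigned irq)
{
    irq_mask |= (uint8_t)(1u << (irq & 7u));
    outb_sim(PIC1_DATA, irq_mask);   /* load register, OUT to port */
}

/* Unmask one line the same way. */
static void irq_enable(unsigned irq)
{
    irq_mask &= (uint8_t)~(1u << (irq & 7u));
    outb_sim(PIC1_DATA, irq_mask);
}
```

Once the new mask byte is known, the hardware side really is one load and one OUT. What the sketch cannot show is how long that OUT takes on a given bus, which is the part of the "slow and clumsy" complaint that depends on the system rather than on the 8259.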
dswartz@bigbootay.sw.stratus.com (Dan Swartzendruber) (12/20/90)
I believe Motorola removed user-mode access to the SR not for any issues of efficiency or compatibility, but rather to allow virtual machine support. -- Dan S.
rstewart@megatek.UUCP (Rich Stewart) (12/20/90)
In article <1990Dec19.060521.16051@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes: >-- Graphics support of various kinds. The 860 has a little support for >ray tracing, with some instructions that make it easy to whiz through your >data structures and figure out what obscures what. One might also like some >support for bit-aligned bit-blits, though that tends to tie up the data bus >and so would be far more useful if it had some separate path to memory, at >least to video memory, that didn't lock out the CPU. > >-- >John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 864 9650 >johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl >"Typically supercomputers use a single microprocessor." -Boston Globe What on the i860 supports ray tracing? It has limited z buffer support, multiple pixel output, and some color interpolation support. Back to the 586, block operations, pixel functions, and plane operations would all be real nice to support a generic window concept. -Rich
sef@kithrup.COM (Sean Eric Fagan) (12/20/90)
In article <1990Dec19.143749.3216@ux1.cso.uiuc.edu> mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes:
>Yes, in some person's sense. But maybe not speed.

Give me a break. Go read some papers on compiler design, in particular optimization. If you have more registers, you can cut down accesses to memory, which is *slow*.

>IF you add more registers
>you have to add instructions to access them. The register addressing
>system of the 386 is already quite full.

No shit.

>Those of you who want more
>registers, please explain here on the net exactly what op-codes you
>are going to use to access those registers. Are you going to add a byte-prefix
>to every register instruction that says "use the special new
>register set"? If so, please explain how that would speed execution.
>Please remember that any operand that would be put in a register would be in
>the cache anyway.

That last statement is *not* necessarily true. Second of all, I never said that it would be easy or possible to add more registers, only desirable to have more registers. Are you so fond of code like

	mov eax, DWORD PTR [ebx+ecx*8+1234]

and then, three instructions later,

	mov DWORD PTR [esp+12], eax
	mov eax, DWORD PTR [...]
	/* another two or three instructions */
	mov eax, DWORD PTR [esp+12]

Do you *really* understand what this is going to cost you in terms of performance?

>Only once have I needed more registers than the 80286 already has.

How nice for you. Now go compile some code, and get a disassembly. Note all the memory references, because the compiler had to use them when it would have been nicer to have some extra registers. Count all the spills to memory. Add up all those extra cycles. Fun, isn't it? It's *so* amazing how much faster a chip can be when it has to do a 32-bit data access every instruction!
>I think a FAR better idea than squeezing in more registers would be >to take advantage of the fact that the 80x86 was designed from the start >to have an efficient instruction set, leave it that way, and simply use >the chip space to make **everything** faster. The instruction set was designed to be efficient in a different era. Now, it's not so efficient. Why do you think that RISC chips, or even 68k's, are getting such higher performance? -- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
sef@kithrup.COM (Sean Eric Fagan) (12/20/90)
In article <3068@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
> I meant what I said - "quantify" rather than qualify. Yes optimization
>would be better and memory accesses would be down, but how much?

Well... I could suggest you go read any recent (last decade or so) papers on compiler optimization techniques, which would be chock full of them. Also read papers on the RISC chips, and why the register and instruction sets were chosen.

Here is a sample of code:

	r2 = r3 = inb (0x3b8);

	r2 |= 8;
	outb (0x3b8, r2);

(r0 through r7 are declared locally as 'unsigned long r0, r1, ...;', and inb and outb are declared as 'static inline unsigned char ...', and written using inline assembly)

Here is the code gcc generates for that:

	inb (%dx)
	movl $952,-220(%ebp)
	movw -220(%ebp),%dx
	inb (%dx)
	movb %al,-216(%ebp)
	movzbl -216(%ebp),%eax
	movl %eax,-216(%ebp)
	movzbl -216(%ebp),%eax
	movl %eax,-212(%ebp)
	movl %eax,-216(%ebp)
	movl -220(%ebp),%eax
	movl %eax,-220(%ebp)
	movl -216(%ebp),%eax
	orl $8,%eax
	movl %eax,-216(%ebp)
	movw -220(%ebp),%dx
	movb -216(%ebp),%al
	outb (%dx)

Exercise for reader: assuming 16 registers, rewrite that code using only r0 through r7 (which was all I had declared in my code). Then, take out an intel book on the '386, and figure out the timings of the old code and the new code (assume that the new register set will be accessed in the same amount of time as the old register set, since I'm talking about completely trashing the instruction set and redesigning it).

-- Sean Eric Fagan | "I made the universe, but please don't blame me for it; sef@kithrup.COM | I had a bellyache at the time." -----------------+ -- The Turtle (Stephen King, _It_) Any opinions expressed are my own, and generally unpopular with others.
graeme@labtam.labtam.oz (Graeme Gill) (12/20/90)
In article <3066@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes: > In article <5800@labtam.labtam.oz> graeme@labtam.labtam.oz (Graeme Gill) writes: > > | The answer to this is the usual RISC vs CISC arguments. Why have very > | complicated hardware, that tends to be locked into a particular implementation > | of windowing etc. , when with a little bit of effort on the window library > | programmers part you can get the same performance with more general hardware > > When this lovely generalized system can perform at a reasonable rate, > then that's fine. Until then users will want hardware boost because it's > more pleasant to use, companies will want it because it's more > productive. > > A display system isn't fast enough until it has to be slowed down to > avoid overrunning the input bandwidth of the eye. Until then people will > want more, and today that means some hardware assists. In truth you > *can't* write software as fast as dedicated hardware, with any amount of > effort, much less "a little bit." > -- I'm happy to say you are wrong. How does 208 Mbit/sec fill rate sound ? Or a 100 Mbit/sec blt rate sound ? That's equivalent to 30 frames a second fill rate on an 8 bit colour 1024 x 800 system, all done in software, no hardware support. It only took me a few weeks work, to code up the routines, and our customers don't have to know anything about it, since all they see is a standard X11 interface. This isn't pie in the sky, we've been shipping for over 12 months. If we'd used available graphics chips like the 34010, 82786, 63484 etc, rather than a general purpose CPU like the 80960 (or the 29000), then the terminals would have been a lot slower, with little or no possibility of fixing the operations those chips don't support very well. In addition, we don't need another CPU chip as well to handle ethernet i/o, X protocol processing etc. 
You will notice that all the standalone 34010 systems have an 80186 or something in the box as well. Our customers enjoy using the terminals, because they are noticeably faster and more interactive than products based on available graphics chips. This is likely to be the shape of the future. Notice that the Apple Mac accelerator cards are based on 29000 chips, and that a number of accelerator cards for the IBM PCs are starting to appear, based on RISC CPUs rather than graphics chips. As I said before, there is definitely room for hardware assist, but a bit of general purpose CPU goes a long way, especially in cost effective systems. Oh, and by the way, a lot of the operations are so fast now that they have to be slowed down in order to see what's going on. Graeme Gill Electronic Design Engineer. Labtam Australia
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/20/90)
In article <1990Dec19.222932.1446@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes: | The instruction set was designed to be efficient in a different era. Now, | it's not so efficient. Why do you think that RISC chips, or even 68k's, are | getting such higher performance? Take a look at SPECmarks and rethink that last one. The 25 MHz 486 falls between the SS1 and SS+, 33MHz is off the shelf, 40MHz is scheduled in a few months and engineering samples were out for board design, average cycles per instruction is something like 1.3, fairly close to the actual performance of most RISC machines. My point is that the term "such higher performance" is misleading: the 486 is comparable in performance to the typical single user workstation RISC CPU (not many people get a 4/490 for personal use). -- bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen) VMS is a text-only adventure game. If you win you can use unix.
schow@bcarh185.bnr.ca (Stanley T.H. Chow) (12/21/90)
In article <1990Dec19.223934.1568@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes: > [...] since I'm talking about completely >trashing the instruction set and redesigning it). I thought Intel already did exactly this! However, being sensible, they called it the 80960 instead of the i586. Stanley Chow BitNet: schow@BNR.CA BNR UUCP: ..!uunet!bnrgate!bcarh185!schow (613) 763-2831 ..!psuvax1!BNR.CA.bitnet!schow Me? Represent other people? Don't make them laugh so hard.
schow@bcarh185.bnr.ca (Stanley T.H. Chow) (12/21/90)
In article <5813@labtam.labtam.oz> graeme@labtam.labtam.oz (Graeme Gill) writes: > I'm happy to say you are wrong. How does 208 Mbit/sec fill rate sound ? >Or a 100 Mbit/sec blt rate sound ? That's equivalent to 30 frames a second >fill rate on an 8 bit colour 1024 x 800 system, all done in software, no >hardware support. It only took me a few weeks work, to code up the routines, >and our customers don't have to know anything about it, since all they >see is a standard X11 interface. This isn't pie in the sky, we've been Hmm, 208 MBit/sec = 26 MByte = 26 MPixel of 8 bits each. How do you do 26 million byte writes per second on a 80960? What speed are you running it at? How is your frame buffer organized? Also, how much CPU is left for the user applications? Stanley Chow BitNet: schow@BNR.CA BNR UUCP: ..!uunet!bnrgate!bcarh185!schow (613) 763-2831 ..!psuvax1!BNR.CA.bitnet!schow Me? Represent other people? Don't make them laugh so hard.
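The unit conversion in the post above, and the frame rate it is checking, can be reproduced with integer arithmetic. A small sketch: the function names are invented for illustration, and "Mbit" is assumed to mean 10^6 bits.

```c
/* Fill-rate arithmetic for the figures quoted above, in integers. */

/* Bytes per second for a rate given in bits per second. */
static unsigned long bits_to_bytes_per_sec(unsigned long bits_per_sec)
{
    return bits_per_sec / 8ul;
}

/* Whole frames per second that a fill rate can sustain for a frame
 * of width x height pixels at bits_per_pixel depth. */
static unsigned long frames_per_sec(unsigned long bits_per_sec,
                                    unsigned long width,
                                    unsigned long height,
                                    unsigned long bits_per_pixel)
{
    unsigned long bits_per_frame = width * height * bits_per_pixel;
    return bits_per_sec / bits_per_frame;
}
```

`bits_to_bytes_per_sec(208000000ul)` gives 26,000,000, the 26 MByte/sec in the post. `frames_per_sec(208000000ul, 1024, 800, 8)` gives 31 whole frames, since one 1024 x 800 x 8 frame is 6,553,600 bits, so the "30 frames a second" claim is consistent with the quoted fill rate.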
wallach@motcid.UUCP (Cliff H. Wallach) (12/21/90)
In article <1990Dec19.223934.1568@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
-In article <3068@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
-- I meant what I said - "quantify" rather than qualify. Yes optimization
--would be better and memory accesses would be down, but how much?
-
-Well... I could suggest you go read any recent (last decade or so) papers on
-compiler optimization techniques, which would be chock full of them. Also
-read papers on the RISC chips, and why the register and instruction sets
-were chosen.
-
-Here is a sample of code:
-
-	r2 = r3 = inb (0x3b8);
-
-	r2 |= 8;
-	outb (0x3b8, r2);
-
-(r0 through r7 are declared locally as 'unsigned long r0, r1, ...;', and
-inb and outb are declared as 'static inline unsigned char ...', and written
-using inline assembly)
-
-Here is the code gcc generates for that:
-
-	inb (%dx)
-	movl $952,-220(%ebp)
-	movw -220(%ebp),%dx
-	inb (%dx)
-	movb %al,-216(%ebp)
-	movzbl -216(%ebp),%eax
-	movl %eax,-216(%ebp)
-	movzbl -216(%ebp),%eax
-	movl %eax,-212(%ebp)
-	movl %eax,-216(%ebp)
-	movl -220(%ebp),%eax
-	movl %eax,-220(%ebp)
-	movl -216(%ebp),%eax
-	orl $8,%eax
-	movl %eax,-216(%ebp)
-	movw -220(%ebp),%dx
-	movb -216(%ebp),%al
-	outb (%dx)

Is this code for real?

-
-Exercise for reader: assuming 16 registers, rewrite that code using only
-r0 through r7 (which was all I had declared in my code). Then, take out an
-intel book on the '386, and figure out the timings of the old code and the
-new code (assume that the new register set will be accessed in the same
-amount of time as the old register set, since I'm talking about completely
-trashing the instruction set and redesigning it).
-

Exercise for compiler writers: Generate optimized code for a current architecture. Maybe something like:

	xor eax,eax
	mov edx,3b8h
	in al,dx
	mov r3[bp],eax
	or al,8
	out dx,al
	mov r2[bp],eax

Cliff Wallach ...uunet!motcid!wallach
mash@mips.COM (John Mashey) (12/21/90)
In article <3080@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>In article <1990Dec19.222932.1446@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>
>| The instruction set was designed to be efficient in a different era. Now,
>| it's not so efficient. Why do you think that RISC chips, or even 68k's, are
>| getting such higher performance?
>
> Take a look at SPECmarks and rethink that last one. The 25 MHz 486
>falls between the SS1 and SS+, 33MHz is off the shelf, 40MHz is
>scheduled in a few months and engineering samples were out for board
>design, average cycles per instruction is something like 1.3, fairly
>close to the actual performance of most RISC machines.
> My point is that the term "such higher performance" is misleading, the
>486 is comparable in performance to the typical single user workstation
>RISC CPU (not many people get a 4/490 for personal use).

1) Generally, the only people who REALLY know the CPI are the architects of
a given CPU, because there's no simple way to measure it. However, 1.3, I
think, is rather far off, as shown below.

2) A reasonable approximation, that can actually be measured, is
MHz/VAX-mips. (It actually happens that this is a pretty close approximation
for MIPS machines and others with grossly-similar instruction sets, I think).

3) If you look at MHz/SPEC-integer (a measurable idea of VAX-mips), you
find things like (numbers thru Fall SPEC):

MHz   SPECint   M/S   Cache size   Machine
25    12.4      2.0   64K          Sun SS1+, IPC
25    13.3      1.9   128K         Intel 486 (from Intel perf brief)
33    19.7      1.7   128K         Sun SS/49* (NOT a desktop)
25    19.4      1.3   64K          MIPS Magnum 3000 (1.288)
20    15.8      1.3   40K          IBM RS6000/520 (1.265)

I.e., to be more precise: a 486, with desktop/deskside package, is
comparable in integer performance (although not in FP) to a desktop SPARC
with a smaller cache. Also, recall yesterday's postings about taking care
with compiler choice, timing etc, so all of this has caveats.

However, it should be clear that the 486 does NOT have the MHz/SPEC of the
more efficient RISCs; in addition, although I don't know exactly what a
486+cache+cache control costs, the MIPS case above costs something like
$300-$400, and I suspect that's a bit less than the 486 case.
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
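The M/S column in the table above is just clock rate divided by SPECint, which the post uses as a measurable stand-in for cycles per instruction. A minimal sketch that recomputes it from the MHz and SPECint figures given in the post (the helper name and the shortened machine labels are made up for illustration):

```python
# (machine, MHz, SPECint) as listed in the post
MACHINES = [
    ("Sun SS1+, IPC", 25, 12.4),
    ("Intel 486", 25, 13.3),
    ("Sun SS/49*", 33, 19.7),
    ("MIPS Magnum 3000", 25, 19.4),
    ("IBM RS6000/520", 20, 15.8),
]


def mhz_per_specint(mhz, specint):
    """Clock cycles (in MHz) spent per unit of SPECint; lower is more efficient."""
    return mhz / specint


for name, mhz, specint in MACHINES:
    print(f"{name:18s} {mhz:3d} MHz  {specint:5.1f} SPECint  "
          f"M/S = {mhz_per_specint(mhz, specint):.3f}")
```

The recomputed ratios agree with the table to the one decimal place shown there, and the 486 comes out roughly 45% higher than the MIPS and IBM entries, which is the gap the post is pointing at.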
jpk@ingres.com (Jon Krueger) (12/21/90)
From article <1990Dec19.060521.16051@iecc.cambridge.ma.us>, by johnl@iecc.cambridge.ma.us (John R. Levine):
> With a
> page table per segment, you actually could map each open file to a segment
> (a 4GB file is still pretty big) and merge all I/O with virtual memory.

Including output that you must guarantee has been written to
nonvolatile store? In other words, output that survives operating
system crashes?

-- Jon
--
Jon Krueger, jpk@ingres.com
silos@bench.sublink.ORG (Paolo Pennisi) (12/21/90)
In article <3069@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
> In article <1990Dec19.060521.16051@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
>
> [ lots of good stuff ]
>
> [ stuff about ints being painfully slow ]
> [ stuff in defense of the 8259 approach ]
>
> Actually this whole thing is taking place off chip, so you can do
> anything you want for interrupts. You can use a multiplexed scheme to
> reduce the number of lines, with or without the 8259. It's not part of
> the CPU, except in the 80186 which had a clock, interrupt controller,
> and a couple of serial i/o ports (1 bit) built in.
> --
> bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
> VMS is a text-only adventure game. If you win you can use unix.

I don't know how long this will remain true.... Intel and AMD have
actually produced highly integrated 80286 processors (which include the
interrupt controller), and the whole clone market is stuck with the AT
interrupt style, which is surely not the best!

I think the problem with 80x86 processors arises from their principal use
in the MessDos clone market... This really huge source of money has biased
Intel towards the wrong goals for its CISC micro line. They need
compatibility with the 8086 and the 80286, and they need the 8086 virtual
mode (who cares, apart from the MessDos users, about the emulation of a
virtual crippled processor?).

I hope (for Intel's sake, because I don't like its micros) that sometime
Intel will build an x86 with only the 32-bit features of the 486, some more
registers or, at least, some more orthogonal instructions, and that will be
a great day (for them, I insist).

Paolo Pennisi
--
(ARPA) silos@bench.sublink.ORG        Paolo Pennisi
(BANG) ...!otello!bench!silos         via Solari 19
(MISC) ppennisi on BIX & PTPOSTEL     20144 Milano ITALIA
----< S U B L I N K N E T W O R K : a new way to *NIX communications >-----
sef@kithrup.COM (Sean Eric Fagan) (12/21/90)
In article <5874@avocado5.UUCP> wallach@motcid.UUCP (Cliff H. Wallach) writes:
>Is this code for real?

This code is very much for real, and was generated by a very good compiler:
gcc 1.37.1 (with a couple of modifications).

--
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/21/90)
In article <44256@mips.mips.COM> mash@mips.COM (John Mashey) writes:
| However, it should be clear that the 486 does NOT
| have the MHZ/Spec of the more efficient RISCs; in addition, although
| I don't know exactly what a 486+cache+cache control costs,
| the MIPS case above costs something like $300-$400, and I suspect that's
| a bit less than the 486 case.

Thanks for many interesting bits. The numbers are interesting for two
reasons: they indicate a higher cycles-per-instruction figure than I saw in
the original report I got from one of our business units (and I accept that
you may have better figures than I do), and they show the 486 as being
faster for SPECint than the SS+. The figures I saw may have been with less
cache, or may have included the SPECfloat as well.

As for cost, it's very hard to compare. Because the 486 bundles a lot of
stuff which is normally separate in a workstation (MMU, FPU, some cache and
a cache controller), it's hard to do a comparison to RISC which is
representative. If you count just the CPU, or CPU and cache, then the 486
looks expensive, while if you count like a vendor, and include the CPU, FPU,
MMU, cache and controller, all the glue chips needed for discrete
components, and the nebulous cost of motherboard real estate, the 486 may
look very desirable.

All in all, comparing these systems is very hard to do, even for people who
don't have any stake in the outcome.
--
bill davidsen (davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
VMS is a text-only adventure game. If you win you can use unix.
mash@mips.COM (John Mashey) (12/22/90)
In article <3082@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>In article <44256@mips.mips.COM> mash@mips.COM (John Mashey) writes:
> As for cost, it's very hard to compare. Because the 486 bundles a lot
>of stuff which is normally included in a workstation, MMU, FPU, and some
>cache and a cache controller, it's hard to do a comparison to RISC which
>is representative. If you count just the CPU, or CPU and cache, then the
>486 looks expensive, while if you count like a vendor, and include the
>CPU, FPU, MMU, cache and controller, all the glue chips needed for
>discrete components, and the nebulous cost of motherboard real estate,
>the 486 may look very desirable.

You clearly do need to compare apples to apples. About the only way I know
how to do this is to compare the "CPU cores", i.e., everything on the CPU
side of the memory bus, for example:

486:    486 itself
        SRAMs (included in all 486-based machines for which SPEC numbers
        have been published, as far as I can tell)
        cache controller
        any other glue needed to get to the memory bus (?)
        (This looks like it has 2 medium-sized VLSI parts + SRAM, plus
        (maybe) a little glue.)
MIPS:   R3000 (incl. MMU & cache controller)
        R3010 FPU
        SRAMs (direct control by CPU, no extra parts)
        misc other glue, such as read/write buffers (these days, a few
        small parts)
        (This package is what I gave the numbers for; it has 2 medium-sized
        VLSI parts + SRAM, plus a little glue...)
88K:    88100
        2-8 88200s
SPARC:  (more variable)
        Integer Unit
        FPU
        MMU (either as MMU-part, or Sun-style SRAM design)
        SRAMs for cache
        cache control, glue, etc

Fortunately, it is actually easier to do this for workstations than for,
say, embedded control, where everybody starts to argue about the
need/desirability of various features, and apples-oranges comparisons
abound :-)
--
-john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
uad1077@dircon.uucp (Ian Kemmish) (12/23/90)
In article <1990Dec18.213506.645@ncsuvx.ncsu.edu> kdarling@hobbes.ncsu.edu (Kevin Darling) writes:
>zap@lysator.liu.se (Zap Andersson) writes:
>>Once upon a time I built a graphics board for my computer. 240x240 pixels, 16
>>shades of gray, not very heavy, BUT I had windows-like support in HARDWARE!
>>Now I have
>> NEVER understood why this is not common practice in today's computers! I mean
>
>Total agreement! There's no doubt that systems programmers in the
>not-too-distant future will look back on today as the dark ages
>of writing windowing software. "What?? No windows in hardware?? Auugh!"
>

Hmmm, I've yet to see a windows-in-hardware chip that handles the input
semantics of windows or canvasses - you'd still need to handle the canvas
hierarchy in software, so having it in hardware as well just doubles
the amount of book-keeping you do.

Additionally, there is the problem of what you do when you map the
n+1'th window.... as I write this, I can see about a hundred canvasses,
a good few of them not rectangular.

Since I spend far more time drawing pictures than dragging windows,
and after all, clipping is ridiculously cheap compared to painting pixels,
I find it hard to shake the conviction that a windowing chip would
cost me more than it gained me.

If you're into graphics, the best thing to invest in is a graphics
pipeline and shaded triangle processor. If you're into 2D windows,
the best thing would be a font scaler in hardware (i.e. rendering direct
from Type1 font descriptions to the screen).

--
Ian D. Kemmish                    Tel. +44 767 601 361
18 Durham Close                   uad1077@dircon.UUCP
Biggleswade                       ukc!dircon!uad1077
Beds SG18 8HZ United Kingd        uad1077%dircon@ukc.ac.uk
kdarling@hobbes.ncsu.edu (Kevin Darling) (12/23/90)
About windowing in hardware, uad1077@dircon.uucp (Ian Kemmish) writes:
>Hmmm, I've yet to see a windows-in-hardware chip that handles the input
>semantics of windows or canvasses - you'd still need to handle the canvas
>hierarchy in software, so having it in hardware as well just doubles
>the amount of book-keeping you do.

Apologies... I'm not sure what you meant here. Yes, I'd have to keep
the bounds and depth info anyway, but I don't think that tiny amount
would be a burden. Especially as compared to the burden (code and cpu
cycles) of having either user apps or system code do multiple redraws
when one window gets unmapped or moved.

>Additionally, there is the problem of what you do when you map the
>n+1'th window....

<grin> Yes, that's always a bother. But we're talking about possible
future hardware, not just today's (quick way out of corner ;-).

>Since I spend far more time drawing pictures than dragging windows,
>and after all, clipping is ridiculously cheap compared to painting pixels,
>I find it hard to shake the conviction that a windowing chip would
>cost me more than it gained me.

I'm sure it depends on your needs and setup. In my case, I'm programming
for a realtime multitasking computer whose cpu must execute both normal
programs and windowing code. And any overlapping windows must be handled
without asking apps to do redraws, so clipping is out of the question.

I'm sure you're right that it wouldn't be a gain for you, but I'm just
as convinced it'd be a win in my situation :-). Different strokes...

cheers - kevin <kdarling@catt.ncsu.edu>
chris@mimsy.umd.edu (Chris Torek) (12/24/90)
>uad1077@dircon.uucp (Ian Kemmish) writes:
>>Hmmm, I've yet to see a windows-in-hardware chip that handles the input
>>semantics of windows or canvasses - you'd still need to handle the canvas
>>hierarchy in software, so having it in hardware as well just doubles
>>the amount of book-keeping you do.

In article <1990Dec23.093537.18481@ncsuvx.ncsu.edu> kdarling@hobbes.ncsu.edu (Kevin Darling) writes:
>Apologies... I'm not sure what you meant here.

Essentially, you must retain the clipping boundaries for all windows in
software so that you can tell where the input focus is (for `input is at
cursor hot spot' interfaces, anyway; `click-to-type' interfaces could, in
theory, ask your hardware-chip `which window number is spot (x,y)', and
this can be computed during a display scan: 1/70th of a second for focus
to take effect is not too bad). However, typically the answer to `where
is the input' is best computed by a different method than `where are the
windows', so this doubling is not quite accurate.

>>Additionally, there is the problem of what you do when you map the
>>n+1'th window....

><grin> Yes, that's always a bother. But we're talking about possible
>future hardware, not just today's (quick way out of corner ;-).

Depending on how you define a `window', future hardware might have to
handle numbers on the order of 10,000 windows. (X11 was originally
designed to make each individual window cheap, unlike SunView; as time
passed the windows got `fatter' and now in addition to `widgets', each
of which is a window, there are toolkits with `gadgets', which are not.
This is one of the reasons X11 is wrong. ---Not to belittle X11: it is
a massive effort and there is a lot to be learned from it. Still, it
has grown WAY too complicated. More in a moment:)

>... any overlapping windows must be handled without asking apps to do
>redraws,

I agree with this. The window system (as a whole, however it is built)
must provide each `window user' (application or whatever) the illusion
that it has an arbitrarily large and arbitrarily perfect screen all to
itself. There must be a way to find out what flaws exist (e.g., mapped
or monochrome instead of true color, 1536x1152 pixel rather than
infinite, etc.) for special purpose applications, but the default should
be a perfect virtual display. (This is another reason X11 is wrong.)

When you draw in an overlapped window, the draw should take place in the
window. If the covered region is exposed, the window system must put up
the result of the draw. If that means it must draw in off-screen memory,
then it must draw in off-screen memory.

(Some will make the following objection: `My high end display has
1536x1152 pixels, each with 24 bits of true color. That is 5 megabytes
per display. You want a window system to allow 100 overlapping
full-sized windows and you want it to retain all 500 megabytes?!?' The
answer to this is `yes': `How much did you pay for your high-end
display? And you mean to tell me that after that, you cannot afford
another $1500 for a 600 MB disk for virtual memory?' The usual comeback
is `but the application can recompute the display using less memory':
Yes, but so what? That requires more code in every application. Pretty
soon you have to buy a few $2500 1.2GB disks to hold the applications,
not to mention all that money on developer effort to write the extra
redisplay code, not to mention the low bandwidth between the CPU and
display compared to on-display, .... The extra data space in each
application is not free, either.)

>so clipping is out of the question.

Not at all---*within* the window system.

Anyway, to move back towards architecture, there is one key point when
it comes to doing windows in hardware:

    Working smart will always outdo working hard, but working hard
    can sometimes (often?) be cheaper.

Right now, however, I think the tradeoff remains on the side of `working
smart': i.e., doing the windows in software. It is moving towards
`working hard', but has not got there yet. Give it a few more years....
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@cs.umd.edu Path: uunet!mimsy!chris
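The backing-store objection in the post above rests on two round numbers: 5 megabytes per full-sized window and 500 megabytes for 100 of them. As a rough check, here is a minimal sketch of that arithmetic. The function name is invented for illustration, and it assumes pixels are packed at exactly 24 bits with no padding, which the post does not specify.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to retain one full window at the given depth (no padding)."""
    return width * height * bits_per_pixel // 8


per_window = framebuffer_bytes(1536, 1152, 24)
total = 100 * per_window   # 100 overlapping full-sized windows, all retained

print(f"{per_window} bytes per window ({per_window / 2**20:.2f} MiB)")
print(f"{total} bytes for 100 windows ({total / 2**20:.0f} MiB)")
```

On these assumptions a window needs a little over 5 MiB and a hundred of them need a little over 500 MiB, so the figures in the post hold as approximations. If the hardware padded each pixel to 32 bits, the totals would be a third larger.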
spot@CS.CMU.EDU (Scott Draves) (12/25/90)
In article <28774@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:

   Depending on how you define a `window', future hardware might have to
   handle numbers on the order of 10,000 windows. (X11 was originally
   designed to make each individual window cheap, unlike SunView; as time
   passed the windows got `fatter' and now in addition to `widgets', each
   of which is a window, there are toolkits with `gadgets', which are not.

X11 Windows are still as small and cheap as they ever were. The
reason Motif uses gadgets is that widgets are fat, especially its
widgets. In my Motif code I never use gadgets and find performance to
be adequate, and my code does a lot of creation and destruction of
widgets, which is their worst case.

   The window system (as a whole, however it is built)
   must provide each `window user' (application or whatever) the illusion
   that it has an arbitrarily large and arbitrarily perfect screen all to
   itself.

Well, I don't think arbitrarily large fits in with the rest of this,
but yes. This is basically what PostScript does (with the addition of
abstract coordinates).

   (This is another reason X11 is wrong.)

I don't think statements like that are warranted. For something that
is "wrong" X11 is very successful and widely used.

My understanding is that X isn't intended to be very abstract;
abstract colors and coordinates are the place of an extension or a
toolkit. Unfortunately, no such beast exists, which is a shame; I
think it would be very popular.

Similarly, X doesn't specify a user interface; that's the place of
Motif. There's a lot to be said for this modularity.

   [ refresh should be handled by the window system. apps invisibly
   draw into offscreen buffers and blit to the screen as necessary.
   The extra memory is well spent because it saves code in every
   application, and saves development time. ]

I disagree with your analysis. The amount of extra code and effort
that needs to be put into each application is near zero. You are
orders of magnitude away from balancing the cost of the hidden
windows. In any case relatively few unix programs interact with the
window system; most run in terminals (this may eventually (hopefully?)
change).

I really like what the NeXT window system does. It gives an
application three choices for handling refresh. The window system
either 1) saves bitmaps and blits to refresh the screen, 2) saves the
PostScript and rerenders to refresh the screen, or 3) calls the app.

There are many cases where saving the bits is ridiculously
inefficient. Two examples: 1) the window is displaying a bitmap
image, so the application already has its own offscreen buffer. 2)
the window is sparse, i.e. mostly background.

I have no qualms about spending memory, but you must decide if the
alternatives warrant it.
--
IBM
Scott Draves Intel
spot@cs.cmu.edu Microsoft
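The three refresh choices described in the post above, together with its two examples of when saving bits is wasteful, can be sketched as a small decision function. Everything here is hypothetical: the names, the `drawn_fraction` measure, and the 10% sparseness threshold are illustrative assumptions, not the NeXT window system's API or its actual policy.

```python
from enum import Enum


class Refresh(Enum):
    SAVE_BITS = 1   # window system keeps a bitmap and blits it back
    REPLAY = 2      # window system keeps the drawing commands and re-renders
    CALL_APP = 3    # window system asks the application to redraw


def choose_refresh(app_keeps_own_bitmap, drawn_fraction, sparse_below=0.1):
    """Pick a refresh policy for one window (hypothetical heuristic).

    app_keeps_own_bitmap: the app already holds an offscreen copy of the image.
    drawn_fraction: share of the window's pixels that are not plain background.
    sparse_below: assumed cutoff under which a window counts as "sparse".
    """
    if app_keeps_own_bitmap:
        return Refresh.CALL_APP    # a second saved copy would be redundant
    if drawn_fraction < sparse_below:
        return Refresh.REPLAY      # cheap to re-render, wasteful to store
    return Refresh.SAVE_BITS       # dense content with no other copy


print(choose_refresh(True, 0.9))    # image viewer with its own buffer
print(choose_refresh(False, 0.02))  # mostly-background window
print(choose_refresh(False, 0.8))   # dense window, no other copy
```

The point of the sketch is only that the choice can be made per window, which is the flexibility the post is praising. A real system would let the application state its preference rather than guess it.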
rcg@lpi.liant.com (Rick Gorton) (12/26/90)
> What features should be put into the CPU to improve performance and
>reduce chip count?
>

SOME REGISTERS!!!!! And not some of those silly things usable only by
instruction FOO where register q contains an address for FOO and register z
contains a count for FOO. How about a couple of GEN-YOU-WINE general purpose
32 bit registers?
--
Richard Gorton rcg@lpi.liant.com (508) 626-0006
Language Processors, Inc. Framingham, MA 01760
Hey! This is MY opinion. Opinions have little to do with corporate policy.
graeme@labtam.labtam.oz (Graeme Gill) (12/28/90)
In article <3853@bnr-rsc.UUCP>, schow@bcarh185.bnr.ca (Stanley T.H. Chow) writes:
> In article <5813@labtam.labtam.oz> graeme@labtam.labtam.oz (Graeme Gill) writes:
> > /* omitted to save space */
> >
>
> Hmm, 208 MBit/sec = 26 MByte = 26 MPixel of 8 bits each.
>
> How do you do 26 million byte writes per second on a 80960? What speed are you
> running it at? How is your frame buffer organized?
>

This is our slow machine, and it runs at 20 MHz. We have a new model that
runs at 25 MHz. The frame store is packed 4 pixels per 32 bit word. A burst
write instruction takes 2 + 4 * 2 + 1 = 11 clock cycles per 16 pixels of bus
time. At 20 MHz that equates to a theoretical rate of 29.1 Mbytes/sec
(ignoring refresh overhead, interrupt routine overhead, Ethernet DMA
overhead etc.) The measured rate using x11perf for 500x500 filled areas is
26 Mpixels/sec. Since it has a 3 deep write queue and scoreboarded reads,
the internal CPU operation proceeds in parallel with the bus cycles.

>
> Also, how much CPU is left for the user applications?
>

None, it's an X terminal :-) The application runs on the host. This
sometimes makes it a faster system than a workstation that has to both draw
and run the application. There are the usual tradeoffs of centralized vs.
distributed computing.

> Stanley Chow BitNet: schow@BNR.CA

Graeme Gill
Electronic Design Engineer
Labtam Australia
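The burst-write figure in the post above follows directly from the cycle count it gives. A minimal sketch of the calculation is below. The post states only the sum 2 + 4 * 2 + 1, so the parameter names attached to each term (setup, per-word, trailing) are guesses made for illustration, as is the function name.

```python
def burst_fill_rate(clock_hz, setup_cycles=2, words=4, cycles_per_word=2,
                    trailing_cycles=1, pixels_per_word=4):
    """Theoretical pixels/sec for back-to-back burst writes.

    The split of the 11 cycles into setup/per-word/trailing is an assumed
    reading of "2 + 4 * 2 + 1"; only the total comes from the post.
    """
    cycles_per_burst = setup_cycles + words * cycles_per_word + trailing_cycles
    pixels_per_burst = words * pixels_per_word   # 4 words x 4 pixels = 16
    return clock_hz * pixels_per_burst / cycles_per_burst


slow = burst_fill_rate(20_000_000)   # the 20 MHz machine
fast = burst_fill_rate(25_000_000)   # the 25 MHz model mentioned in the post
measured = 26_000_000                # x11perf figure quoted in the post

print(f"20 MHz: {slow / 1e6:.1f} Mpixels/sec theoretical")
print(f"25 MHz: {fast / 1e6:.1f} Mpixels/sec theoretical")
print(f"measured / theoretical at 20 MHz: {measured / slow:.0%}")
```

With 8-bit pixels, pixels/sec and bytes/sec are the same number, so the 20 MHz result matches the 29.1 Mbytes/sec in the post, and the measured 26 Mpixels/sec is roughly nine tenths of that ceiling.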
ts@cup.portal.com (Tim W Smith) (01/02/91)
< I don't think statements like that are warranted. For something that
< is "wrong" X11 is very successful and widely used.

What does wrongness have to do with width of use? Look at MS-DOS, for
example, to see that wide use does not indicate lack of wrongness.
jesup@cbmvax.commodore.com (Randell Jesup) (01/24/91)
In article <5800@labtam.labtam.oz> graeme@labtam.labtam.oz (Graeme Gill) writes:
>In article <450@lysator.liu.se>, zap@lysator.liu.se (Zap Andersson) writes:
>> NEVER understood why this is not common practice in today's computers! I mean
>> what CAN be easier than to include in the gfx chip that 'when beam reaches
>> this'n'that row/column, start displaying bitmap-data from this'n'that memory!
>> The Amiga is the closest I've seen, supporting these 'semi-hardware' (the
>> amiga uses a co-processor) as horizontal slices of display. With a faster
>> co-processor (i.e. faster than 1 pixel bitclock) you could have hardware
>> windows support! You will NEVER need to worry about memorys overlapping, or
>> in what memory to write! You just write to your 'virtual' screen, and the
>> display chip takes care about it ALL.

>about a generation behind mainstream processors. Doing windowing in software
>allows a great deal of flexibility in fixing bugs, keeping up with standards
>developments, ease of porting code to new generations of hardware, etc.,

Another reason: most current ways of doing windowing in hardware have
a fixed number of windows they can support (especially if they each have a
different color palette on a non-direct-RGB system).

The Amiga has screens ("hardware horizontal windows"), and on each
screen you can have windows. Note that there are blank lines between
screens: it needs to update the bitmap pointers, color table, etc. The
screens are draggable, though they remain a solid horizontal slice (actually,
you can sort of do HW windowing, but it's rather limited since you can't
change much on the fly across a line).

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com BIX: rjesup
The compiler runs
Like a swift-flowing river
I wait in silence. (From "The Zen of Programming") ;-)
jesup@cbmvax.commodore.com (Randell Jesup) (01/24/91)
In article <1582@pai.UUCP> erc@pai.UUCP (Eric F. Johnson) writes:
> 1) Macintosh--measure user base in millions
> 2,3) tie- Microsoft Windows, Amiga--around the 1 million mark for
> MS Windows, more for the Amiga, but that will soon change (note that
> I'm not saying whether this will be good or bad). If you read

Over 2 million now for Amiga, going up fast.

> Personal Workstation, please tell them that the Amiga sports a
> multi-tasking operating system with a graphical user interface
> (for PW's strange application watch).

But that would scale their existing entries to nil, unless they used
a log graph. ;-) X is available on the Amiga also (3rd party).

>PS, I know there are a lot of Amiga evangelists out there. I'm not trying to
>get your goat, just noting market reality. I am in no way trying to imply
>anything at all that could ever be considered bad about your wonderful
>machine.

Noted. MS Windows 3.0 can try to sell into a fairly large base of
existing machines.

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com BIX: rjesup
The compiler runs
Like a swift-flowing river
I wait in silence. (From "The Zen of Programming") ;-)