[comp.arch] Let's pretend

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/17/90)

  Let's pretend that we have the ear of the chip designers at Intel,
and that they have asked our opinion on what windows support should be
included in the new Intel 586. Please hold any negative comments about
Intel, CISC, etc, it's all been said...

The questions:

  What features should be put into the CPU to improve performance and
reduce chip count?

  Will assumptions about graphics memory organization be made, and if so
what are they?

  Do we assume that support will be:
	a) all general purpose
	b) mostly MS-windows
	c) mostly X-windows

Start of opinion:

  From what little info I have, the 486 sales are going first to people
using them as servers, second to people running a multitasking o/s,
mostly unix, and only third to DOS power users. You can assume that all
unix means SysV right now, and that other multitasking systems include
Desqview, ms-windows, DR-DOS, and other environments to support DOS
programs.

  Given that servers probably don't need graphics in most installations,
I would assume that X-windows has the largest user base of any single
window system, although it may be less than half the market.

  I believe that the time has come when enough address space is
available to allow direct mapping of graphics memory into memory on the
ISA and EISA bus, and that these busses will be used in the majority of
systems using Intel CPUs for the next two years or more. The performance
bottleneck of mapping a MB of data into 64k of address space would be
severe, even if it were done well. Cheap processors now have enough
address space to allow investing some of it in graphics.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

graeme@labtam.labtam.oz (Graeme Gill) (12/18/90)

In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
> 
>   Let's pretend that we have the ear of the chip designers at Intel,
> and that they have asked our opinion on what windows support should be
> included in the new Intel 586.
> 
>   What features should be put into the CPU to improve performance and
> reduce chip count?
> 

	The thing crying out for help is overcoming the memory bandwidth
bottleneck. Step number one is to add support for burst writes.  As far
as I know, only two mainstream processors support burst writes:
The Intel 80960, and the Amd 29000. Both make dandy processors for
X terminals, laser printers etc. etc. as a result.

	Graeme Gill
	Electronic Design Engineer
	Labtam Australia

sef@kithrup.COM (Sean Eric Fagan) (12/18/90)

In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>  What features should be put into the CPU to improve performance and
>reduce chip count?

While I don't know that it would reduce chip count, a Good Thing to have
would be:  MORE REGISTERS!!!!!!

But, of course, it won't happen.  *sigh*

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

zap@lysator.liu.se (Zap Andersson) (12/18/90)

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:


>  Let's pretend that we have the ear of the chip designers at Intel,
>and that they have asked our opinion on what windows support should be
>included in the new Intel 586. Please hold any negative comments about
>Intel, CISC, etc, it's all been said...

>The questions:

>  What features should be put into the CPU to improve performance and
>reduce chip count?

Once upon a time I built a graphics board for my computer: 240x240 pixels, 16 
shades of gray, nothing very heavy, BUT I had windows-like support in HARDWARE! 
I have NEVER understood why this is not common practice in today's computers! I mean,
what CAN be easier than to include in the gfx chip that 'when the beam reaches
this'n'that row/column, start displaying bitmap data from this'n'that memory'!
The Amiga is the closest I've seen, supporting these 'semi-hardware' windows (the
Amiga uses a co-processor) as horizontal slices of display. With a faster
co-processor (i.e. faster than a 1-pixel bitclock) you could have hardware
window support! You would NEVER need to worry about memories overlapping, or
which memory to write to! You just write to your 'virtual' screen, and the
display chip takes care of it ALL.

Can SOMEONE tell me why this incredibly simple idea has so little use today?

>  Will assumptions about graphics memory organization be made, and if so
>what are they?

See above. But if you're into 586 windows handling, try to think up something
NEW! Don't bother with standards with moss on top... please?

/Z





--
* * * * * * * * * * * * * * * * *
* My signature is smaller than  *
* yours!  - zap@lysator.liu.se  *
* * * * * * * * * * * * * * * * *

torbenm@freke.diku.dk (Torben Ægidius Mogensen) (12/18/90)

graeme@labtam.labtam.oz (Graeme Gill) writes:


>>   What features should be put into the CPU to improve performance and
>> reduce chip count?


>	The thing crying out for help is overcoming the memory bandwidth
>bottleneck. Step number one is to add support for burst writes.  As far
>as I know, only two mainstream processors support burst writes:
>The Intel 80960, and the Amd 29000. Both make dandy processors for
>X terminals, laser printers etc. etc. as a result.

There is also the ARM. And before you say that this isn't a mainstream
processor, I should point out that it has a larger user base than
either Intel 80960 or Amd 29000. In fact it is the second most used
RISC processor (SPARC being the most used).

Torben Mogensen (torbenm@diku.dk)

is@athena.cs.uga.edu ( Bob Stearns) (12/18/90)

In article <1990Dec18.082623.16648@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>>  What features should be put into the CPU to improve performance and
>>reduce chip count?
>
>While I don't know that it would reduce chip count, a Good Thing to have
>would be:  MORE REGISTERS!!!!!!
>
>But, of course, it won't happen.  *sigh*
>
While more registers sounds like motherhood and apple pie, in the UNIX world
they can be a distinct losing proposition. The commonest service provided by
the kernel is a state switch between processes. The more registers, the longer
this state switch must necessarily take. The only ways out of this require
lots more hardware and discipline from both the compilers and the programmer.
The first solution involves keeping track of just which registers have been
used during a process and only saving those; lots of very smart (expensive)
hardware, and a need for discipline to keep from using more registers than
really required. The second solution is to provide enough registers so that
each process has its own set which never needs to be swapped; this leads to a
hard limit on the number of processes allowed in the machine at any one time,
of course.

erc@pai.UUCP (Eric F. Johnson) (12/18/90)

In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>   Let's pretend that we have the ear of the chip designers at Intel,
> and that they have asked our opinion on what windows support should be
> included in the new Intel 586. Please hold any negative comments about
> Intel, CISC, etc, it's all been said...

> [...deleted...] 

>   Given that servers probably don't need graphics in most installations,
> I would assume that X-windows has the largest user base of any single
> window system, although it may be less than half the market.

Much less. I would guess that for graphical windowing systems, the
largest user base is for the Macintosh window system. (Yes, I'm well
aware that the Mac doesn't use an 80x86 chip.) My guess would be, in order:
    1) Macintosh--measure user base in millions
    2,3) tie- Microsoft Windows, Amiga--around the 1 million mark for
    MS Windows, more for the Amiga, but that will soon change (note that
    I'm not saying whether this will be good or bad). If you read
    Personal Workstation, please tell them that the Amiga sports a 
    multi-tasking operating system with a graphical user interface
    (for PW's strange application watch).
    4) SunView - There are still probably more users of this than X,
    although that should change in 1991 in favor of X. How many people
    who buy Suns today still use SunView? Quite a lot, I'd venture.
    5) The X Window System, although this will continue to grow, since
    it is available on multiple architectures and operating systems.

>   I believe that the time has come when enough address space is
> available to allow direct mapping of graphics memory into memory on the
> ISA and EISA bus, and that these busses will be used in the majority of
> systems using Intel CPUs for the next two years or more. The performance
> bottleneck of mapping a MB of data into 64k of address space would be
> severe, even if it were done well. Cheap processors now have enough
> address space to allow investing some of it in graphics.
> -- 
> bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
>     VMS is a text-only adventure game. If you win you can use unix.

I like the idea of putting graphics support on chip (and it has been
done before). I think it is in Intel's best interests to embed support
for Microsoft Windows into the 80x86, rather than X Window support.
(I would prefer X support, which would make my "Wunderclone" into a decent
X machine, but from Intel's point of view, MS Windows is the way to go.
Also, since X has been ported to such a wide variety of machines, I'm sure
that embedding MS Windows support on-chip would also provide a lot of
good features that could be used for implementing X as well.)

Have fun,
-Eric

PS, I know there are a lot of Amiga evangelists out there. I'm not trying to 
get your goat, just noting market reality. I am in no way trying to imply
anything at all that could ever be considered bad about your wonderful
machine.

-- 
Eric F. Johnson               phone: +1 612 894 0313    BTI: Industrial
Boulware Technologies, Inc.   fax:   +1 612 894 0316    automation systems
415 W. Travelers Trail        email: erc@pai.mn.org     and services
Burnsville, MN 55337 USA

jonah@dgp.toronto.edu (Jeff Lee) (12/19/90)

In article <1990Dec18.141944.5041@athena.cs.uga.edu> is@athena.cs.uga.edu ( Bob Stearns) writes:
>While more registers sound like motherhood and apple pie, in the UNIX world
>they can be a distinct losing proposition. The commonest service provided by
>the kernel is a state switch between processes. The more registers, the longer
>this state switch must necessarily take. [...]

Sigh.  We've been through this before: within reason, saving general
purpose registers is typically not the most expensive part of a UNIX
context switch.  The cost of saving 8, 16, or 32 general purpose
registers is often less than the cost of saving other process state
information.  However, the difference in code optimization with 8, 16,
or 32 GP registers is often not insignificant.  Thus, up to a point
you win more through code optimization than you lose due to slower
context switching.  The tradeoff point depends on the expected rate of
context switches.

What *can* be annoying is having to save all registers in every
exception handler.  Having a separate set of GP registers for each
processor mode could turn "traps" and "interrupts" into almost
instantaneous co-routine switches.  The tricky part might be flushing
the pipeline correctly -- I don't know how easily this can be done.
My caveat on this is that these additional registers should look just
like the normal GP registers so that kernel code can be compiled with
the same compiler as user code.  Only the context save/restore code
should need to access registers in another register bank.  The PDP10
used to have different user/system register banks so it can be done.

Does anyone have any DATA on how frequently system calls, exceptions,
and interrupts (a) occur, and (b) result in context switches?

sef@kithrup.COM (Sean Eric Fagan) (12/19/90)

In article <1990Dec18.141944.5041@athena.cs.uga.edu> is@athena.cs.uga.edu ( Bob Stearns) writes:
>While more registers sound like motherhood and apple pie, in the UNIX world
>they can be a distinct losing proposition. The commonest service provided by
>the kernel is a state switch between processes. The more registers, the longer
>this state switch must necessarily take. 

Uhm, have you taken an OS course?  And actually *read* the material?

Saving the registers is a tiny part of a unix context switch.  Most of it is
dealing with checking which process can run next, etc.

On the other hand, having more registers means that you don't have to go to
memory as often, which *will* speed things up.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

borasky@ogicse.ogi.edu (M. Edward Borasky) (12/19/90)

>In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>>  What features should be put into the CPU to improve performance and
>>reduce chip count?
I thought the way to improve performance was to REMOVE features!  And
the fewer CHIPS that make up a CPU, the slower it is for a given
technology.  I think they made a mistake putting the co-processor ON CHIP;
surely it would be faster if the floating point were done in a
specialized unit (386/387 style).  That way you can get faster floating
point with other peoples' coprocessors (Weitek, for example).

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)

In article <1990Dec18.082623.16648@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:

| While I don't know that it would reduce chip count, a Good Thing to have
| would be:  MORE REGISTERS!!!!!!

  I think we can assume that the 586 will be a superset of the 486. Can
someone quantify what would be gained with more registers, say R0-R7?
The cost of saving and restoring on procedure calls is obvious; can
someone show that the addition of more would produce a significant net
gain?

  Now if you said make the existing registers more general purpose, I
can see that, although the beauty of the Intel instruction set is that,
with most of the instructions being a single byte, memory bandwidth is
conserved for data access. The price is that you have special-purpose
registers.

| But, of course, it won't happen.  *sigh*

  Registers were added with the 286 and 386. I have yet to see a
compiler which makes use of the 386 registers.

  I hope people will contribute ideas for useful additions, rather than
talk about how Intel can be more like {your favorite chip or style}.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

jcb@frisbee.Eng.Sun.COM (Jim Becker) (12/19/90)

sef@kithrup.COM (Sean Eric Fagan) writes:

    In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
    >  What features should be put into the CPU to improve performance and
    >reduce chip count?

    While I don't know that it would reduce chip count, a Good Thing to have
    would be:  MORE REGISTERS!!!!!!

    But, of course, it won't happen.  *sigh*

    -- 
    Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
    sef@kithrup.COM  |  I had a bellyache at the time."

There are a whole host of debugging registers in the 486 -- is there
any way to use them?  One would think that when the chip gets to the
market they would have the debugging out of the way, and those
registers would be freed up for use by OS and compiler people.

-Jim Becker
--
	 Jim Becker / jcb%frisbee@sun.com  / Sun Microsystems

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)

In article <1990Dec18.115605.7411@jarvis.csri.toronto.edu> jonah@dgp.toronto.edu (Jeff Lee) writes:

| What *can* be annoying is having to save all registers in every
| exception handler.  Having a separate set of GP registers for each
| processor mode could turn "traps" and "interrupts" into almost
| instantaneous co-routine switches.  The tricky part might be flushing
| the pipeline correctly -- I don't know how easily this can be done.
| My caveat on this is that these additional registers should look just
| like the normal GP registers so that kernel code can be compiled with
| the same compiler as user code.  Only the context save/restore code
| should need to access registers in another register bank.  The PDP10
| used to have different user/system register banks so it can be done.

  You've just described the Z80. 

  I would think it useful to (a) disable interrupts while the registers
were swapped, and (b) allow access to the alternate set. This and an
instruction to "save alternate regs and enable ints" could be used if
more than a few instructions were needed to service the condition. And
the converse, of course.

  The Z80 was fast for its day when that technique was used. I had
"parallel port NFS" under CP/M to get access to drives on other machines.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

rouellet@crhc.uiuc.edu (Roland G. Ouellette) (12/19/90)

> While more registers sound like motherhood and apple pie, in the
> UNIX world they can be a distinct losing proposition. The commonest
> service provided by the kernel is a state switch between processes.
> The more registers, the longer this state switch must necessarily
> take.  The only ways out of this require lots more hardware and
> discipline from both the compilers and the programmer.

In the UNIX world this may be a problem (changing the page table
maps in an MP system and figuring out which processes are runnable is
probably more of a problem).  However your context switch code is
likely to involve several procedure calls, each of which may save some
registers.  By the time the stacks are about to be swapped, most of
the user registers will have been flushed out onto the stack of the
outgoing process.  Only the few that didn't get touched will need
saving.  The compiler will tell you which ones need to be saved.

PLUG:  Choices, an OO OS written in an OO language (C++) here at the
University of Illinois does this.  Vince managed to get g++ (and maybe
C Front) to do this for him.  [He also complained loudly about
hardware enforced context switch instructions which saved every
register because his code had less overhead.]

This sort of thing might be possible in a UNIX environment, but
there's a load of crufty code out there.  [I've seen BSD derived code
for context switches (from a vendor to remain nameless -- they may
have fixed it) which simulated in SW the PCBs used on VAX computers
even though some of the state was known to be fairly useless on that
architecture... like 4 of the 5 stack pointers.]
--
= Roland G. Ouellette			ouellette@tarkin.enet.dec.com	=
= 1203 E. Florida Ave			rouellet@[dwarfs.]crhc.uiuc.edu	=
= Urbana, IL 61801	   "You rescued me; I didn't want to be saved." =
=							- Cyndi Lauper	=

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)

In article <15145@ogicse.ogi.edu> borasky@ogicse.ogi.edu (M. Edward Borasky) writes:

|          I think they made a mistake putting the co-processor ON CHIP;
| surely it would be faster if the floating point were done in a
| specialized unit (386/387 style).  That way you can get faster floating
| point with other peoples' coprocessors (Weitek, for example).

  The 486 uses fewer cycles than the 386 for the same instructions. The
Weitek can still be added. The boards are easier to design, smaller,
have less support logic, and are thus cheaper to build.

  If Intel and the board vendors were not recovering design cost and
making all the profit the market will bear, I think the 486 would be
cheaper than a 386+387. As it is, the prices are comparable, and the
cost performance is a lot better on the 486, at least at the system
level.

  How about a new subject line if you want to continue this, it's not
related to what could be added to the 586.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

is@athena.cs.uga.edu ( Bob Stearns) (12/19/90)

Not only have I taken the OS courses, I have written, mucked about in and
generally been involved in more OS type work for various architectures than
most people even know exist. Note that when I read "more registers" I think
in terms of machines like the CYBER 205 with its 256 64bit registers or even
larger sets. Yes, when the register count is a measly 8-32 32bit registers 
the save/restore overhead is fairly small, although there is also the call
versus interrupt penalty, depending upon who must save/restore registers 
during a call/return sequence.  The rest of the state is small compared to
the 8K bits of registers I was considering, and the choice of next process to
schedule should already have been taken care of by the process list maintenance
routines, using something like a heap ordered by priority/time, so selecting the
next one should be a very short algorithm. See Sedgewick on the subject.

thor@thor.atd.ucar.edu (Richard Neitzel) (12/19/90)

In article <450@lysator.liu.se>, zap@lysator.liu.se (Zap Andersson) writes:
|> davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
|> 
|> >The questions:
|> 
|> >  What features should be put into the CPU to improve performance and
|> >reduce chip count?
|> 
|> Once upon a time I built a graphics board for my computer: 240x240 pixels, 16 
|> shades of gray, nothing very heavy, BUT I had windows-like support in HARDWARE! 
|> I have NEVER understood why this is not common practice in today's computers! I mean,
|> what CAN be easier than to include in the gfx chip that 'when the beam reaches
|> this'n'that row/column, start displaying bitmap data from this'n'that memory'!
|> The Amiga is the closest I've seen, supporting these 'semi-hardware' windows (the
|> Amiga uses a co-processor) as horizontal slices of display. With a faster
|> co-processor (i.e. faster than a 1-pixel bitclock) you could have hardware
|> window support! You would NEVER need to worry about memories overlapping, or
|> which memory to write to! You just write to your 'virtual' screen, and the
|> display chip takes care of it ALL.
|> 
|> Can SOMEONE tell me why this incredibly simple idea has so little use today?
|> 
|> >  Will assumptions about graphics memory organization be made, and if so
|> >what are they?
|> 
|> See above. But if your into 586 Windows handling, try to think up something
|> NEW! Don't bother with standards with moss on top....please?
|> 
If I interpret correctly what you are asking for, check out the Tadpole TP-AGCV graphics 
board. Tadpole has a special windowing chip that allows the following to be set via 
registers: a window's screen x,y start point, its height and width, the starting 
location in memory, stacking priority, zoom factor and display enable. Moving a window,
[un]displaying it, panning through video memory, setting a window's zoom factor, etc. 
require one or two writes. In our application, we want to switch between multiple windows
nearly instantaneously. The Tadpole board can switch between two sets of two windows
faster than the screen refresh rate - makes a neat display to see both sets of windows
at the same time (just have a loop that swaps the sets constantly!). Currently they 
have a 6U VME board with 4 Mb of video RAM, but you can also buy the windowing chips 
from Tadpole.
-- 
Richard Neitzel thor@thor.atd.ucar.edu	     	Torren med sitt skjegg
National Center For Atmospheric Research	lokkar borni under sole-vegg
Box 3000 Boulder, CO 80307-3000			Gjo'i med sitt shinn
303-497-2057					jagar borni inn.

ckp@grebyn.com (Checkpoint Technologies) (12/19/90)

In article <15145@ogicse.ogi.edu> borasky@ogicse.ogi.edu (M. Edward Borasky) writes:
>>In article <3042@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
>>>  What features should be put into the CPU to improve performance and
>>>reduce chip count?
>I thought the way to improve performance was to REMOVE features!  And

This is *not* an option when you have a significant software base to
protect.  And surely Intel has a gargantuan software base to protect.
Same with the 68K line.

Just think of it.  Intel releases the 586, and to improve performance
they remove a few complex instructions and replace them with one or two
simpler but faster instructions.  No software that used those
instructions will run.  Intel earns a bad rep and sells zero chips as
the journalists take Intel apart for producing an incompatible chip.

BTW: Something I read in a PC mag recently irked me.  Paraphrased, the
author wrote "\"Incompatible\" means that something that's supposed to
work together with something else, doesn't".  Well, by my book
"incompatible" only means that something doesn't work together with
something else, it makes no moral judgement about whether it's supposed
to.  But in the PC world, "incompatible" is taken to mean "bad", "wrong",
"evil".

I just wanted to get that off my chest...
-- 
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S      / /  
                                                                    \\ / /    
Then, the disclaimer:  All expressed opinions are, indeed, opinions. \  / o
Now for the witty part:    I'm pink, therefore, I'm spam!             \/

kdarling@hobbes.ncsu.edu (Kevin Darling) (12/19/90)

zap@lysator.liu.se (Zap Andersson) writes:
>Once upon a time I built a graphics board for my computer: 240x240 pixels, 16 
>shades of gray, nothing very heavy, BUT I had windows-like support in HARDWARE! 
>I have NEVER understood why this is not common practice in today's computers! I mean

Total agreement!  There's no doubt that systems programmers in the
not-too-distant future will look back on today as the dark ages
of writing windowing software.  "What?? No windows in hardware?? Auugh!"

I believe the Intel 82786 gfx chip does have this support now. Each
window is a different section of memory, and can be of virtually any
mode... so for example, you could have a CGA-style window in the middle
of a 1Kx1K display.

No idea what it costs tho.  Anyone know?  I read that at least one
new fancy terminal uses it.

The Philips VSC video chip allows horizontal windows a la the Amiga;
but yeah, having more than one window/line seems to be still a ways
off.  Some days I'm tempted to rig up my own external hardware method.
  best - kev <kdarling@catt.ncsu.edu>

jbuck@galileo.berkeley.edu (Joe Buck) (12/19/90)

In article <24117@grebyn.com>, ckp@grebyn.com (Checkpoint Technologies) writes:
|> Just think of it.  Intel releases the 586, and to improve performance
|> they remove a few complex instructions and replace them with one or two
|> simpler but faster instructions.  No software that used those
|> instructions will run.  Intel earns a bad rep and sells zero chips as
|> the journalists take Intel apart for producing an incompatible chip.

No, I'm afraid not.  What you do is you get an illegal instruction trap
when the weird instruction is run, and the trap handler then emulates
the instruction.  The chip-maker releases the code for the trap-handler
(makes it public) and the PC-clone folks put it in their BIOS ROMs and
the Unix-port people put it in their kernels.  The lowly user has no idea
that anything is different, since the 586 is so much faster that the
emulated instruction is faster than the original.

No doubt some ignorant journalist will write an article making the point
you make.  Everyone in the know will proceed to laugh at that journalist.

It's been done before, of course; MicroVAXes do just this (they don't
support the fancy VAX instructions but emulate them with traps).


--
Joe Buck
jbuck@galileo.berkeley.edu	 {uunet,ucbvax}!galileo.berkeley.edu!jbuck	

graeme@labtam.labtam.oz (Graeme Gill) (12/19/90)

In article <1990Dec18.113834.5227@diku.dk>, torbenm@freke.diku.dk (Torben Ægidius Mogensen) writes:
> graeme@labtam.labtam.oz (Graeme Gill) writes:
> 
> >as I know, only two mainstream processors support burst writes:
> >The Intel 80960, and the Amd 29000. Both make dandy processors for
> >X terminals, laser printers etc. etc. as a result.
> 
> There is also the ARM. And before you say that this isn't a mainstream
> processor, I should point out that it has a larger user base than
> either Intel 80960 or Amd 29000. In fact it is the second most used
> RISC processor (SPARC being the most used).

    But the ARM only has 16 generally accessible registers. From experience
with the 960 I have found that 32 registers looks a bit small when
you are reading and writing 4 words at a time. In this regard, the 29000
has an advantage. However, the 29000 is flawed in stalling execution while
a store or load multiple instruction is executing. I suspect the ARM
also suffers from this problem.
    The ARM does not seem to be used much outside Europe at the present
time. I do not hear much about Acorn computers in Australia, and they
do not seem to have any presence outside the home computer market.

	Graeme Gill

sef@kithrup.COM (Sean Eric Fagan) (12/19/90)

In article <1990Dec18.202842.11771@athena.cs.uga.edu> is@athena.cs.uga.edu ( Bob Stearns) writes:
>Note that when I read "more registers" I think
>in terms of machines like the CYBER 205 with its 256 64bit registers or even
>larger sets. 

Ah.  A misunderstanding... 8-)

I just mentioned the 205 to someone in a response to my posting (remember
that the ETA-10 is a faster and better 205).

However, recall that we were discussing the *86, a machine which has *6*
registers available for "general purpose" use, which really aren't (lots and
lots of instructions require certain registers, or at least work better with
them).

More on the subject of context switching:  the Elxsi had something like 16
sets of registers on board.  During a context switch (e.g., from one thread
to another [no supervisor mode on the machine]), it just used the next
available set of registers.  The hardware *knew* about threads and whatnot,
so this was feasible.  But I can imagine someone like MIPS or Sun (for the
Sparc, of course) putting a few different sets on board, whose sole purpose
would be to act as a buffer when handling faults and whatnot.  Sort of like
register windows, only for context switches, not subroutine calls.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

sef@kithrup.COM (Sean Eric Fagan) (12/19/90)

In article <24117@grebyn.com> ckp@grebyn.UUCP (Checkpoint Technologies) writes:
>In article <15145@ogicse.ogi.edu> borasky@ogicse.ogi.edu (M. Edward Borasky) writes:
>>I thought the way to improve performance was to REMOVE features!  And
>This is *not* an option when you have a significant software base to
>protect.  And surely Intel has a gargantuan software base to protect.
>Same with the 68K line.
>Just think of it.  Intel releases the 586, and to improve performance
>they remove a few complex instructions and replace them with one or two
>simpler but faster instructions.  No software that used those
>instructions will run.  Intel earns a bad rep and sells zero chips as
>the journalists take Intel apart for producing an incompatible chip.

Uhm... have you read about the 68030 and the 68040?  The '30 removed two
instructions that the '20 introduced (CALLM and RETM, I think), which few
if any people used.  The '40's on-board FPU implements only a drastic subset
of the 68882 (is that the 68k FPU?).  It basically does add, subtract,
multiply, and divide, and a few others; the rest have to be emulated by the
OS (or whatever is in control of the machine).

Motorola has not earned a bad rep for that, nor have they sold zero chips.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

graeme@labtam.labtam.oz (Graeme Gill) (12/19/90)

In article <450@lysator.liu.se>, zap@lysator.liu.se (Zap Andersson) writes:
>  NEVER understood why this is not common practice in today's computers! I
> mean, what CAN be easier than to include in the gfx chip that 'when the beam
> reaches this'n'that row/column, start displaying bitmap data from
> this'n'that memory'?  The Amiga is the closest I've seen, supporting these
> 'semi-hardware' windows (the Amiga uses a co-processor) as horizontal
> slices of the display.  With a faster co-processor (i.e. faster than a
> 1-pixel bitclock) you could have hardware window support!  You would NEVER
> need to worry about memories overlapping, or which memory to write to!  You
> just write to your 'virtual' screen, and the display chip takes care of it
> ALL.
> 
> Can SOMEONE tell me why this incredibly simple idea has so little use today?

    The answer to this is the usual RISC vs CISC argument. Why have very
complicated hardware, which tends to be locked into a particular
implementation of windowing etc., when with a little bit of effort on the
window library programmer's part you can get the same performance with more
general hardware - i.e. a RISC processor and a frame buffer? Specialised
graphics hardware is usually about a generation behind mainstream
processors. Doing windowing in software allows a great deal of flexibility
in fixing bugs, keeping up with standards developments, porting code to
new generations of hardware, etc.
    Even some of the high end graphics vendors are throwing out their 
hardware pipelined 3d transform/clip engines, and putting more general
purpose processors in their place, like a bunch of i80860s.
    There is definitely a place for hardware assist of graphics operations,
but "do it all" solutions tend to date rapidly.

	Graeme Gill
	Labtam Australia
	

sef@kithrup.COM (Sean Eric Fagan) (12/19/90)

In article <3058@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>  I think we can assume that the 586 will be a superset of the 486. Can
>someone quantify what would be gained with more registers, say R0-R7?

Yep.  Optimization.  Take a look at code produced by either gcc or msc for
the '386 some time.  Ever hear of the message, "infinite spill"?

>  Registers were added with the 286 and 386. I have yet to see a
>compiler which makes use of the 386 registers.

The new registers visible to ring-three applications on the '386 were fs and
gs (making a total of six segment registers, to match the six "gp"
registers).  And I've seen code use them.  Remember that a) they're only 16
bits, and b) in protected mode, loading a segment register with an invalid
segment number will cause a fault.  I had a version of a compiler that used
fs for doing certain weird things (like jumping from a 32-bit segment to a
16-bit segment *shudder*).

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

sef@kithrup.COM (Sean Eric Fagan) (12/19/90)

In article <4748@exodus.Eng.Sun.COM> jcb@frisbee.Eng.Sun.COM (Jim Becker) writes:
>There are a whole host of debugging registers in the 486 --  is  there
>any  way  to use them?  One would think that when the chip gets to the
>market  they  would  have  the  debugging  out  of  the way, and those
>registers would be freed up for use by OS and compiler people.

Uhm... the debugging registers on the '486 are (if I understand what you're
talking about) the same as the debugging registers on the '386 (with one
addition, I think).  They're used for debugging.  For example, CodeView
under SCO UNIX uses the debugging registers to set data breakpoints (i.e.,
break when a given address is read from, written to, or executed).  They're
not visible to the application, I believe, and can't be used as operands in
a multiply instruction, for example.

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

sef@kithrup.COM (Sean Eric Fagan) (12/19/90)

In article <1990Dec18.213506.645@ncsuvx.ncsu.edu> kdarling@hobbes.ncsu.edu (Kevin Darling) writes:
>I believe the Intel 82786 gfx chip does have this support now. Each
>window is a different section of memory, and can be of virtually any
>mode... so for example, you could have a CGA-style window in the middle
>of a 1Kx1K display.

They should not go into CPU's (my opinion).

For example, I don't have a *GA on kithrup.  I have a Cornerstone display
(early model) with 1600x1200.  It doesn't look *anything* like a *GA (when
in graphics mode).  Having Intel put VGA onto the chip would mean that I
would not use it.

I'm not against having your hardware manage your graphics; I just don't want
my *cpu* to do that.  (See the SGI Graphics Board for a *good* example.
Also see the NS32GX16 for a good example of why *not* to put this stuff in
the CPU.)

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

ckp@grebyn.com (Checkpoint Technologies) (12/19/90)

In article <9876@pasteur.Berkeley.EDU> jbuck@galileo.berkeley.edu (Joe Buck) writes:
>What you do is you get an illegal instruction trap
>when the wierd instruction is run, and the trap handler then emulates
>the instruction.  The chip-maker releases the code for the trap-handler
>(makes it public) and the PC-clone folks put it in their BIOS ROMs and
>the Unix-port people put it in their kernels.

You're right.  I believed BIOS compatibility would be an issue too, but
maybe not.

You know, Motorola has been getting away with exactly this.  The 68010
took away the user-level MOVE SR,dest instruction.  The 68030 took away
the user-mode CALLM and RETM instructions (good riddance, I say)
introduced on the 68020.  But you know what else?  No system I know of
traps and emulates those for backward compatibility.  Now the 68040
removes the user-mode trig instructions from the FPU and replaces them
with emulation support.  I suspect these will be emulated in real systems,
unlike the others.
-- 
First comes the logo: C H E C K P O I N T  T E C H N O L O G I E S      / /  
                                                                    \\ / /    
Then, the disclaimer:  All expressed opinions are, indeed, opinions. \  / o
Now for the witty part:    I'm pink, therefore, I'm spam!             \/

johnl@iecc.cambridge.ma.us (John R. Levine) (12/19/90)

In article <3058@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>  I think we can assume that the 586 will be a superset of the 486.

Indeed.  It seems to me pointless to add new features to the user-mode
instruction set.  There are enough 386 and 486 chips around that no sane
programmer would use the new features, the gain would be unlikely to be worth
losing backward compatibilty.  Rather, we need either features that
transparently improve performance, or else features that are a big enough win
that it's worth forsaking backward compatibility.  Here are a few
suggestions:

-- Per-segment paging.  As has been beaten to death here before, the current
paging scheme limits the total address space of a process to 4GB.  With a
page table per segment, you actually could map each open file to a segment
(a 4GB file is still pretty big) and merge all I/O with virtual memory.

-- Better segment performance.  On the 286, 386, and 486 it takes forever to
load a segment register.  On the 486 it takes 9 cycles, compared to 1 cycle
for a regular memory load.  Perhaps the chip could cache a dozen or so
recently loaded segment numbers.  The FS and GS registers are no substitute;
nobody has the faintest idea how to manage segment registers separately from
address registers.

-- (My favorite.)  Better interrupt performance.  There are two problems.
One is that interrupts are just plain slow.  A normal interrupt takes 71
cycles, but if you use the facility to run an interrupt in its own task, the
interrupt takes 236 cycles.  The return takes 231.  I know it's doing a lot
of work, but get real -- that's close to 20us for a null interrupt handler on
a 25MHz part.  Some lighter weight interrupts, perhaps assisted by multiple
register sets, would be nice.  Also, device interrupts on the 486 use the
same creaky method that the 8088 did.  There's a single interrupt line, and
when the interrupt happens it accepts a vector from the interrupt controller.
That controller is still an 8259A which only has 8 interrupt lines unless you
cascade them which is a kludge.  There is no easy way to mask some device
interrupts without masking them all (you can stuff commands to the 8259 but
it's slow and clumsy.)  An interrupt level register that the kernel could
manage easily, sort of like the PDP-11 scheme, would be helpful.  To support
this without having a dedicated interrupt line for each device needs a bus
protocol so that devices can post a request for interrupt including the
interrupt number, and the CPU can come back later and say "number 17, your
interrupt is now taken."  If we have all level-triggered interrupts, we could
even get by without the call back.

-- Graphics support of various kinds.  The 860 has a little support for
ray tracing, with some instructions that make it easy to whiz through your
data structures and figure out what obscures what.  One might also like some
support for bit-aligned bit-blits, though that tends to tie up the data bus
and so would be far more useful if it had some separate path to memory, at
least to video memory, that didn't lock out the CPU.

-- 
John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl
"Typically supercomputers use a single microprocessor." -Boston Globe

brandis@inf.ethz.ch (Marc Brandis) (12/19/90)

In article <1990Dec18.141944.5041@athena.cs.uga.edu> is@athena.cs.uga.edu ( Bob Stearns) writes:
>>
>>While I don't know that it would reduce chip count, a Good Thing to have
>>would be:  MORE REGISTERS!!!!!!
>>
>>
>While more registers sound like motherhood and apple pie, in the UNIX world
>they can be a distinct losing proposition. The commonest service provided by
>the kernel is a state switch between processes. The more registers, the longer
>this state switch must necessarily take.  ...

Of course, a process switch takes longer when you have more registers to save.
However, when you look at typical process switch times in UNIX, you will
see that register saving is not the dominating part.  UNIX process switch
times are in the millisecond range, while the time required to save
registers is in the microsecond range.  How much longer would your process
switch take if the 386 had 32 registers instead of 8?  24 extra loads and 24
extra stores, or something between 100 and 200 processor cycles, which is
between 4 and 8 microseconds on a 25 MHz machine.  That accounts for around
one percent of the process switch time (or less; I do not have exact numbers
for 386 implementations of UNIX).

Now look at the alternative, which is to let the application do more memory
references because it cannot keep enough information in the registers. 
When you look at recent papers about computer architecture or compiler 
construction, you will see that a larger register file is able to reduce the
number of memory references a lot. Between two process switches, you are very
likely to save more than 50 references when having a larger register file,
thus making the machine faster. 

I understand that there are applications in embedded systems where a very fast
task switch is important, and where the work done per task switch is low. In
these cases a processor context as small as possible is the right choice.
However, you would not want to run a high-overhead process switch like
UNIX's on such a system.


Marc-Michael Brandis
Computer Systems Laboratory, ETH-Zentrum (Swiss Federal Institute of Technology)
CH-8092 Zurich, Switzerland
email: brandis@inf.ethz.ch

kdarling@hobbes.ncsu.edu (Kevin Darling) (12/19/90)

|In <1990Dec19.052844.4083@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes
|
|>In <1990Dec18.213506.645@ncsuvx.ncsu.edu> kdarling@hobbes.ncsu.edu (Kevin Darling) writes:
|>I believe the Intel 82786 gfx chip does have this support now. Each
|>window is a different section of memory, and can be of virtually any mode...
|
|They should not go into CPU's (my opinion).

Agreed, tho in this case it's not.  The 82786 is a graphics display and
coprocessor chip meant to be used in addition to the normal cpu.  It's
pretty nice from what I've read about it:

 Shares 4meg RAM with the cpu
 All the usual blit/draw functions, plus display zoom/pan each window
 Display modes include 640x480x256 up to 1024x1024x2
  or can sync several to go to even higher color res

But the nice thing from my standpoint (writing windowing drivers) is that
each "window" is a _separate_ packed-bitmap (up to 32K x 32K pixels) in
the shared memory... the 82786 takes care of combining them on the screen.
You can have up to 16 displayed windows per scan-line, which seems a good
start (no limit vertically).  And each displayed window can be of 1-8 bits
per pixel in depth (the 82786 changes modes on the fly per window). The
start-pos/size of each window is settable on pixel boundaries.

So it sounds almost ideal to me.  I wouldn't have to worry about overlapping
windows or forcing all windows/screen into one mode, etc.  Thx for
reminder, btw... I need to get the price on these devils.  For a good
article on this chip, see BYTE August 1987 (!).  After all this time,
I had figured the chip had never come out, but then I saw an ad for a
terminal using it, a few weeks ago.  best - kev <kdarling@catt.ncsu.edu>

kdarling@hobbes.ncsu.edu (Kevin Darling) (12/19/90)

OOOPS!

Speaking about gfx support, sef@kithrup.COM (Sean Eric Fagan) wrote:
>They should not go into CPU's (my opinion).

And I replied:
>Agreed, tho in this case it's not.  The 82786 is a graphics display and
>coprocessor chip meant to be used in addition to the normal cpu.  It's
>pretty nice from what I've read about it: [etc]

Sorry. Brain in neutral, I guess. The thread was supposed to be about
features *added to cpus*... I got sidetracked onto separate gfx support
chips <sigh>. 

Happy holidays! - kevin <kdarling@catt.ncsu.edu>

mcdonald@aries.scs.uiuc.edu (Doug McDonald) (12/19/90)

In article <1990Dec19.052338.3911@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>In article <3058@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>>  I think we can assume that the 586 will be a superset of the 486. Can
>>someone quantify what would be gained with more registers, say R0-R7?
>
>Yep.  Optimization.  

Yes, in some person's sense. But maybe not speed. If you add more registers
you have to add instructions to access them. The register addressing
system of the 386 is already quite full. Those of you who want more
registers, please explain here on the net exactly which op-codes you
are going to use to access those registers. Are you going to add a byte
prefix to every register instruction that says "use the special new
register set"? If so, please explain how that would speed execution.
Please remember that any operand that would be put in a register would be in
the cache anyway. 

Only once have I needed more registers than the 80286 already has. Due
to the greater flexibility in use of the registers of the 386 over the
286, I was able to recode for the 386 and get everything in registers.
Result: a 3% speedup.

I think a FAR better idea than squeezing in more registers would be
to take advantage of the fact that the 80x86 was designed from the start
to have an efficient instruction set, leave it that way, and simply use
the chip space to make **everything** faster. 

Doug McDonald

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)

In article <5800@labtam.labtam.oz> graeme@labtam.labtam.oz (Graeme Gill) writes:

|     The answer to this is the usual RISC vs CISC arguments. Why have very
| complicated hardware, that tends to be locked into a particular implementation
| of windowing etc. , when with a little bit of effort on the window library
| programmers part you can get the same performance with more general hardware

  When this lovely generalized system can perform at a reasonable rate,
then that's fine. Until then users will want hardware boost because it's
more pleasant to use, companies will want it because it's more
productive.

  A display system isn't fast enough until it has to be slowed down to
avoid overrunning the input bandwidth of the eye. Until then people will
want more, and today that means some hardware assists. In truth you
*can't* write software as fast as dedicated hardware, with any amount of
effort, much less "a little bit."
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)

In article <1990Dec19.052338.3911@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
| In article <3058@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
| >  I think we can assume that the 586 will be a superset of the 486. Can
| >someone quantify what would be gained with more registers, say R0-R7?
| 
| Yep.  Optimization.  Take a look at code produced by either gcc or msc for
| the '386 some time.  Ever hear of the message, "infinite spill"?

  I meant what I said - "quantify" rather than qualify. Yes optimization
would be better and memory accesses would be down, but how much?
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/19/90)

In article <1990Dec19.060521.16051@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:

	[ lots of good stuff ]

	[ stuff about ints being painfully slow ]

|                                Also, device interrupts on the 486 use the
| same creaky method that the 8088 did.  There's a single interrupt line, and
| when the interrupt happens it accepts a vector from the interrupt controller.

  Yes, isn't that a nice general solution? It allows simple devices to
generate interrupts without needing an interrupt controller in the system
at all, and yet gives 256 discrete interrupts through the vector.

| That controller is still an 8259A which only has 8 interrupt lines unless you
| cascade them which is a kludge.  There is no easy way to mask some device
| interrupts without masking them all (you can stuff commands to the 8259 but
| it's slow and clumsy.)  

  Here I disagree. While the cascade does add some latency, it allows
groups of interrupts to be enabled and disabled at once, and some to be
edge triggered and some level triggered. How slow and clumsy can a
two-instruction sequence (load to register, out register to port) be?

|                         An interrupt level register that the kernel could
| manage easily, sort of like the PDP-11 scheme, would be helpful.

  The 8259 has a mode which disables all low priority interrupts while
the current interrupt is being serviced. And one which takes them at
single priority "round robin."

|                                                                   To support
| this without having a dedicated interrupt line for each device needs a bus
| protocol so that devices can post a request for interrupt including the
| interrupt number, and the CPU can come back later and say "number 17, your
| interrupt is now taken."  If we have all level-triggered interrupts, we could
| even get by without the call back.

  This is a bus issue, I think.

  Actually this whole thing is taking place off chip, so you can do
anything you want for interrupts. You can use a multiplexed scheme to
reduce the number of lines, with or without the 8259. It's not part of
the CPU, except in the 80186 which had a clock, interrupt controller,
and a couple of serial i/o ports (1 bit) built in.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

dswartz@bigbootay.sw.stratus.com (Dan Swartzendruber) (12/20/90)

I believe Motorola removed user-mode access to the SR not for
any issues of efficiency or compatibility, but rather to allow
virtual machine support.

--

Dan S.

rstewart@megatek.UUCP (Rich Stewart) (12/20/90)

In article <1990Dec19.060521.16051@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
>-- Graphics support of various kinds.  The 860 has a little support for
>ray tracing, with some instructions that make it easy to whiz through your
>data structures and figure out what obscures what.  One might also like some
>support for bit-aligned bit-blits, though that tends to tie up the data bus
>and so would be far more useful if it had some separate path to memory, at
>least to video memory, that didn't lock out the CPU.
>
>-- 
>John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 864 9650
>johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl
>"Typically supercomputers use a single microprocessor." -Boston Globe

What on the i860 supports ray tracing? It has limited z buffer support,
multiple pixel output, and some color interpolation support.

Back to the 586, block operations, pixel functions, and plane operations
would all be real nice to support a generic window concept.

-Rich

sef@kithrup.COM (Sean Eric Fagan) (12/20/90)

In article <1990Dec19.143749.3216@ux1.cso.uiuc.edu> mcdonald@aries.scs.uiuc.edu (Doug McDonald) writes:
>Yes, in some person's sense. But maybe not speed. 

Give me a break.  Go read some papers on compiler design, in particular
optimization.  If you have more registers, you can cut down accesses to
memory, which is *slow*.

>IF you add more registers
>you have to add instructions to access them. The register addressing 
>system, of the 386 is already quite full. 

No shit.

>Those of you who want more
>registers, please explain here on the net exactly what the op-codes you
>are going to use to acces those registers. Are you goint to add a byte-
>prefix to every register instruction that says "use the special new
>register set"? If so, please explain how that would speed execution.
>Please remember that any operand that would be put in a register would be in
>the cache anyway. 

That last statement is *not* necessarily true.

Second of all, I never said that it would be easy or possible to add more
registers, only desirable to have more registers.  Are you so fond of code
like

	mov	eax, DWORD PTR [ebx+ecx*8+1234]

and then, three instructions later,

	mov	DWORD PTR [esp+12], eax
	mov	eax, DWORD PTR [...]
	/* another two or three instructions */
	mov	eax, DWORD PTR [esp+12]

Do you *really* understand what this is going to cost you in terms of
performance?

>Only once have I needed more registers than the 80286 already has. 

How nice for you.  Now go compile some code, and get a disassembly.  Note
all the memory references, because the compiler had to use them when it
would have been nicer to have some extra registers.  Count all the spills to
memory.  Add up all those extra cycles.  Fun, isn't it?  It's *so* amazing
how much faster a chip can be when it has to do a 32-bit data access every
instruction!

>I think a FAR better idea than squeezing in more registers would be
>to take advantage of the fact that the 80x86 was designed from the start
>to have an efficient instruction set, leave it that way, and simply use
>the chip space to make **everything** faster. 

The instruction set was designed to be efficient in a different era.  Now,
it's not so efficient.  Why do you think that RISC chips, or even 68k's, are
getting such higher performance?

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

sef@kithrup.COM (Sean Eric Fagan) (12/20/90)

In article <3068@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>  I meant what I said - "quantify" rather than qualify. Yes optimization
>would be better and memory accesses would be down, but how much?

Well... I could suggest you go read any recent (last decade or so) papers on
compiler optimization techniques, which would be chock full of them.  Also
read papers on the RISC chips, and why the register and instruction sets
were chosen.

Here is a sample of code:

	r2 = r3 = inb (0x3b8);

	r2 |= 8;
	outb (0x3b8, r2);

(r0 through r7 are declared locally as 'unsigned long r0, r1, ...;', and
inb and outb are declared as 'static inline unsigned char ...', and written
using inline assembly)

Here is the code gcc generates for that:

	inb (%dx)
	movl $952,-220(%ebp)
	movw -220(%ebp),%dx
	inb (%dx)
	movb %al,-216(%ebp)
	movzbl -216(%ebp),%eax
	movl %eax,-216(%ebp)
	movzbl -216(%ebp),%eax
	movl %eax,-212(%ebp)
	movl %eax,-216(%ebp)
	movl -220(%ebp),%eax
	movl %eax,-220(%ebp)
	movl -216(%ebp),%eax
	orl $8,%eax
	movl %eax,-216(%ebp)
	movw -220(%ebp),%dx
	movb -216(%ebp),%al
	outb (%dx)

Exercise for the reader:  assuming 16 registers, rewrite that code using only
r0 through r7 (which was all I had declared in my code).  Then, take out an
Intel book on the '386, and figure out the timings of the old code and the
new code (assume that the new register set will be accessed in the same
amount of time as the old register set, since I'm talking about completely
trashing the instruction set and redesigning it).

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

graeme@labtam.labtam.oz (Graeme Gill) (12/20/90)

In article <3066@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
> In article <5800@labtam.labtam.oz> graeme@labtam.labtam.oz (Graeme Gill) writes:
> 
> |     The answer to this is the usual RISC vs CISC arguments. Why have very
> | complicated hardware, that tends to be locked into a particular implementation
> | of windowing etc. , when with a little bit of effort on the window library
> | programmers part you can get the same performance with more general hardware
> 
>   When this lovely generalized system can perform at a reasonable rate,
> then that's fine. Until then users will want hardware boost because it's
> more pleasant to use, companies will want it because it's more
> productive.
> 
>   A display system isn't fast enough until it has to be slowed down to
> avoid overrunning the input bandwidth of the eye. Until then people will
> want more, and today that means some hardware assists. In truth you
> *can't* write software as fast as dedicated hardware, with any amount of
> effort, much less "a little bit."
> -- 

    I'm happy to say you are wrong. How does a 208 Mbit/sec fill rate sound?
Or a 100 Mbit/sec blt rate? That's equivalent to a 30 frames-a-second
fill rate on an 8-bit colour 1024 x 800 system, all done in software, with
no hardware support. It only took me a few weeks of work to code up the
routines, and our customers don't have to know anything about it, since all
they see is a standard X11 interface. This isn't pie in the sky; we've been 
shipping for over 12 months.  If we'd used available graphics chips like
the 34010, 82786, 63484 etc, rather than a general purpose CPU like the
80960 (or the 29000), then the terminals would have been a lot slower,
with little or no possibility of fixing the operations those chips don't
support very well. In addition, we don't need another CPU chip as well
to handle ethernet i/o, X protocol processing etc. You will notice
that all the standalone 34010 systems have a 80186 or something in the
box as well.
    Our customers enjoy using the terminals, because they are noticeably
faster and more interactive than products based on available graphics
chips.
    This is likely to be the shape of the future. Notice that the Apple
Mac accelerator cards are based on 29000 chips, and that a number of
accelerator cards for the IBM PCs are starting to appear, based on
RISC CPUs rather than graphics chips.
    As I said before, there is definitely room for hardware assist, but
a bit of general purpose CPU goes a long way, especially in cost
effective systems.
	Oh, and by the way, a lot of the operations are so fast now
that they have to be slowed down in order to see what's going on.

	Graeme Gill
	Electronic Design Engineer.
	Labtam Australia

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/20/90)

In article <1990Dec19.222932.1446@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:

| The instruction set was designed to be efficient in a different era.  Now,
| it's not so efficient.  Why do you think that RISC chips, or even 68k's, are
| getting such higher performance?

  Take a look at SPECmarks and rethink that last one. The 25 MHz 486
falls between the SS1 and SS+, 33 MHz is off the shelf, 40 MHz is
scheduled in a few months (engineering samples were out for board
design), and average cycles per instruction is something like 1.3, fairly
close to the actual performance of most RISC machines.

  My point is that the term "such higher performance" is misleading, the
486 is comparable in performance to the typical single user workstation
RISC CPU (not many people get a 4/490 for personal use).
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

schow@bcarh185.bnr.ca (Stanley T.H. Chow) (12/21/90)

In article <1990Dec19.223934.1568@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>             [...]                      since I'm talking about completely
>trashing the instruction set and redesigning it).

I thought Intel already did exactly this! However, being sensible, they
called it the 80960 instead of an i586.


Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!uunet!bnrgate!bcarh185!schow
(613) 763-2831               ..!psuvax1!BNR.CA.bitnet!schow
Me? Represent other people? Don't make them laugh so hard.

schow@bcarh185.bnr.ca (Stanley T.H. Chow) (12/21/90)

In article <5813@labtam.labtam.oz> graeme@labtam.labtam.oz (Graeme Gill) writes:
>    I'm happy to say you are wrong. How does 208 Mbit/sec fill rate sound ?
>Or a 100 Mbit/sec blt rate sound ? That's equivalent to 30 frames a second
>fill rate on an 8 bit colour 1024 x 800 system, all done in software, no
>hardware support. It only took me a few weeks work, to code up the routines,
>and our customers don't have to know anything about it, since all they
>see is a standard X11 interface. This isn't pie in the sky, we've been 


Hmm, 208 Mbit/sec = 26 Mbyte/sec = 26 Mpixels/sec at 8 bits each.

How do you do 26 million byte writes per second on a 80960? What speed are you
running it at? How is your frame buffer organized?

Also, how much CPU is left for the user applications?




Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!uunet!bnrgate!bcarh185!schow
(613) 763-2831               ..!psuvax1!BNR.CA.bitnet!schow
Me? Represent other people? Don't make them laugh so hard.

wallach@motcid.UUCP (Cliff H. Wallach) (12/21/90)

In article <1990Dec19.223934.1568@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
-In article <3068@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
--  I meant what I said - "quantify" rather than qualify. Yes optimization
--would be better and memory accesses would be down, but how much?
-
-Well... I could suggest you go read any recent (last decade or so) papers on
-compiler optimization techniques, which would be chock full of them.  Also
-read papers on the RISC chips, and why the register and instruction sets
-were chosen.
-
-Here is a sample of code:
-
-	r2 = r3 = inb (0x3b8);
-
-	r2 |= 8;
-	outb (0x3b8, r2);
-
-(r0 through r7 are declared locally as 'unsigned long r0, r1, ...;', and
-inb and outb are declared as 'static inline unsigned char ...', and written
-using inline assembly)
-
-Here is the code gcc generates for that:
-
-	inb (%dx)
-	movl $952,-220(%ebp)
-	movw -220(%ebp),%dx
-	inb (%dx)
-	movb %al,-216(%ebp)
-	movzbl -216(%ebp),%eax
-	movl %eax,-216(%ebp)
-	movzbl -216(%ebp),%eax
-	movl %eax,-212(%ebp)
-	movl %eax,-216(%ebp)
-	movl -220(%ebp),%eax
-	movl %eax,-220(%ebp)
-	movl -216(%ebp),%eax
-	orl $8,%eax
-	movl %eax,-216(%ebp)
-	movw -220(%ebp),%dx
-	movb -216(%ebp),%al
-	outb (%dx)

Is this code for real?

-
-Exercise for reader:  assuming 16 registers, rewrite that code using only
-r0 through r7 (which was all I had declared in my code).  Then, take out an
-intel book on the '386, and figure out the timings of the old code and the
-new code (assume that the new register set will be accessed in the same
-amount of time as the old register set, since I'm talking about completely
-trashing the instruction set and redesigning it).
-


Exercise for compiler writers: Generate optimized code for a current
architecture.  Maybe something like:

	xor	eax,eax
	mov	edx,3b8h
	in	al,edx
	mov	r3[bp],eax
	or	al,8
	out	edx,al
	mov	r2[bp],eax


Cliff Wallach				...uunet!motcid!wallach

mash@mips.COM (John Mashey) (12/21/90)

In article <3080@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>In article <1990Dec19.222932.1446@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:
>
>| The instruction set was designed to be efficient in a different era.  Now,
>| it's not so efficient.  Why do you think that RISC chips, or even 68k's, are
>| getting such higher performance?
>
>  Take a look at SPECmarks and rethink that last one. The 25 MHz 486
>falls between the SS1 and SS+, 33MHz is off the shelf, 40MHz is
>scheduled in a few months and engineering samples were out for board
>design, average cycles per instruction is something like 1.3, fairly
>close to the actual performance of most RISC machines.

>  My point is that the term "such higher performance" is misleading, the
>486 is comparable in performance to the typical single user workstation
>RISC CPU (not many people get a 4/490 for personal use).

1) Generally, the only people who REALLY know the CPI are the architects
of a given CPU, because there's no simple way to measure it.
However, 1.3 is, I think, rather far off, as shown below.

2) A reasonable approximation, one that can actually be measured,
is MHz/VAX-mips.  (It happens that this is a pretty close
approximation for MIPS machines and others with grossly similar
instruction sets, I think.)

3) If you look at MHz/SPEC-integer (a measurable idea of VAX-mips),
you find things like (numbers thru Fall SPEC):

MHZ	SPECint	M/S	Cache size	machine
25	12.4	2.0	64K		Sun SS1+, IPC
25	13.3	1.9	128K		Intel 486 (from Intel perf brief)
33	19.7	1.7	128K		Sun SS/49* (NOT a desktop)
25	19.4	1.3	64K		MIPS Magnum 3000  (1.288)
20	15.8	1.3	40K		IBM RS6000/520  (1.265)

I.e., to be more precise: a 486, with a desktop/deskside package,
is comparable in integer performance (although not in FP) to
a desktop SPARC with a smaller cache.  Also, recall yesterday's postings
about taking care with compiler choice, timing, etc., so all of this
has caveats.  However, it should be clear that the 486 does NOT
have the MHz/SPEC of the more efficient RISCs; in addition, although
I don't know exactly what a 486+cache+cache control costs,
the MIPS case above costs something like $300-$400, and I suspect that's
a bit less than the 486 case.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

jpk@ingres.com (Jon Krueger) (12/21/90)

From article <1990Dec19.060521.16051@iecc.cambridge.ma.us>,
 by johnl@iecc.cambridge.ma.us (John R. Levine):
> With a
> page table per segment, you actually could map each open file to a segment
> (a 4GB file is still pretty big) and merge all I/O with virtual memory.

Including output that you must guarantee has been written to
nonvolatile store?  In other words, output that survives
operating system crashes?

-- Jon
--

Jon Krueger, jpk@ingres.com 

silos@bench.sublink.ORG (Paolo Pennisi) (12/21/90)

In article <3069@crdos1.crd.ge.COM>, davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) writes:
> In article <1990Dec19.060521.16051@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
> 
> 	[ lots of good stuff ]
> 
> 	[ stuff about ints being painfully slow ]
>
[ stuff in defense of the 8259 approach ]
> 
>   Actually this whole thing is taking place off chip, so you can do
> anything you want for interrupts. You can use a multiplexed scheme to
> reduce the number of lines, with or without the 8259. It's not part of
> the CPU, except in the 80186 which had a clock, interrupt controller,
> and a couple of serial i/o ports (1 bit) built in.
> -- 
> bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
>     VMS is a text-only adventure game. If you win you can use unix.

I don't know how long this will remain true....
Intel and AMD have already produced highly integrated 80286 processors
(which include the interrupt controller), and the whole clone market is
stuck with the AT interrupt style, which is surely not the best!

I think the problems with 80x86 processors arise from their
principal use in the MessDos clone market...
This really huge source of money has biased Intel towards the wrong goals
for its CISC micro line. They need compatibility with the 8086 and the
80286, and they need the 8086 virtual mode (who, apart from the MessDos
users, cares about the emulation of a virtual crippled processor?)

I hope (for Intel's sake, because I don't like its micros) that some day
Intel will build an x86 with only the 32-bit features of the 486, some
more registers or, at least, some more orthogonal instructions, and that
will be a great day (for them, I insist).

 Paolo Pennisi
-- 
 (ARPA) silos@bench.sublink.ORG				Paolo Pennisi
 (BANG) ...!otello!bench!silos				via Solari 19
 (MISC) ppennisi on BIX & PTPOSTEL			20144 Milano ITALIA
----< S U B L I N K  N E T W O R K  : a new way to *NIX communications >-----

sef@kithrup.COM (Sean Eric Fagan) (12/21/90)

In article <5874@avocado5.UUCP> wallach@motcid.UUCP (Cliff H. Wallach) writes:
>Is this code for real?

This code is very much for real, and was generated by a very good compiler:
gcc 1.37.1 (with a couple of modifications).

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/21/90)

In article <44256@mips.mips.COM> mash@mips.COM (John Mashey) writes:

|               However, it should be clear that the 486 does NOT
| have the MHZ/Spec of the more efficient RISCs; in addition, although
| I don't know exactly what a 486+cache+cache control costs,
| the MIPS case above costs something like $300-$400, and I suspect that's
| a bit les than the 486 case.

  Thanks for many interesting bits. The numbers are interesting for
several reasons: they indicate a higher cycles per instruction than I
saw in the original report I got from one of our business units (and I
accept that you may have better figures than I do), but they also show
the 486 as being faster for SPECint than the SS+. The figures I saw may
have been with less cache, or may have included SPECfloat as well.

  As for cost, it's very hard to compare. Because the 486 bundles a lot
of stuff which is normally included in a workstation (MMU, FPU, and some
cache and a cache controller), it's hard to do a representative
comparison to RISC. If you count just the CPU, or CPU and cache, then
the 486 looks expensive, while if you count like a vendor, and include
the CPU, FPU, MMU, cache and controller, all the glue chips needed for
discrete components, and the nebulous cost of motherboard real estate,
the 486 may look very desirable.

  All in all comparing these systems is very hard to do, even for people
who don't have any stake in the outcome.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

mash@mips.COM (John Mashey) (12/22/90)

In article <3082@crdos1.crd.ge.COM> davidsen@crdos1.crd.ge.com (bill davidsen) writes:
>In article <44256@mips.mips.COM> mash@mips.COM (John Mashey) writes:

>  As for cost, it's very hard to compare. Because the 486 bundles a lot
>of stuff which is normally included in a workstation (MMU, FPU, and some
>cache and a cache controller), it's hard to do a representative
>comparison to RISC. If you count just the CPU, or CPU and cache, then
>the 486 looks expensive, while if you count like a vendor, and include
>the CPU, FPU, MMU, cache and controller, all the glue chips needed for
>discrete components, and the nebulous cost of motherboard real estate,
>the 486 may look very desirable.

You clearly do need to compare apples to apples.  About the only
way I know how to do this is to compare the "CPU cores",
i.e., everything on the CPU side of the memory bus, for example:

486:
	486 itself
	SRAMs (included in all 486-based machines for which SPEC numbers
		have been published, as far as I can tell)
	cache controller
	any other glue needed to get to the memory bus (?)
	(This looks like it has 2 medium-sized VLSI parts + SRAM, plus
	(maybe) a little glue.)
MIPS:
	R3000 (incl. MMU & cache controller)
	R3010 FPU
	SRAMs (direct control by CPU, no extra parts)
	misc other glue, such as read/write buffers
		(these days, a few small parts)
	(This package is what I gave the numbers for; it has 2 medium-sized
	VLSI parts + SRAM, plus a little glue...)
88K:
	88100
	2-8 88200s

SPARC:  (more variable)
	Integer Unit
	FPU
	MMU (either as MMU-part, or Sun-style SRAM design)
	SRAMs for cache
	cache control, glue, etc

Fortunately, it is actually easier to do this for workstations
than for, for example, embedded control, where everybody starts to argue
about the need/desirability of various features, and apples-oranges
comparisons abound :-)
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

uad1077@dircon.uucp (Ian Kemmish) (12/23/90)

In article <1990Dec18.213506.645@ncsuvx.ncsu.edu> kdarling@hobbes.ncsu.edu (Kevin Darling) writes:
>zap@lysator.liu.se (Zap Andersson) writes:
>>Once upon a time I built a graphics board for my computer. 240x240 pixels, 16 
>>shades of gray, not very heavy, BUT I had windows-like support in HARDWARE! 
>>Now I have
>> NEVER understood why this is not common practice in todays computers! I mean
>
>Total agreement!  There's no doubt that systems programmers in the
>not-too-distant future will look back on today as the dark ages
>of writing windowing software.  "What?? No windows in hardware?? Auugh!"
>

Hmmm, I've yet to see a windows-in-hardware chip that handles the input
semantics of windows or canvasses - you'd still need to handle the canvas
hierarchy in software, so having it in hardware as well just doubles
the amount of book-keeping you do.  Additionally, there is the problem
of what you do when you map the n+1'th window....  as I write this,
I can see about a hundred canvasses, a good few of them not rectangular.
Since I spend far more time drawing pictures than dragging windows,
and, after all, clipping is ridiculously cheap compared to painting pixels,
I find it hard to shake the conviction that a windowing chip would
cost me more than it gained me.  If you're into graphics, the best thing
to invest in is a graphics pipeline and shaded triangle processor.  If
you're into 2D windows, the best thing would be a font scaler in hardware
(i.e. rendering direct from Type1 font descriptions to the screen).

-- 
Ian D. Kemmish                    Tel. +44 767 601 361
18 Durham Close                   uad1077@dircon.UUCP
Biggleswade                       ukc!dircon!uad1077
Beds SG18 8HZ United Kingdom  uad1077%dircon@ukc.ac.uk

kdarling@hobbes.ncsu.edu (Kevin Darling) (12/23/90)

About windowing in hardware, uad1077@dircon.uucp (Ian Kemmish) writes:
>Hmmm, I've yet to see a windows-in-hardware chip that handles the input
>semantics of windows or canvasses - you'd still need to handle the canvas
>hierarchy in software, so having it in hardware as well just doubles
>the amount of book-keeping you do.

Apologies... I'm not sure what you meant here.  Yes, I'd have to keep
the bounds and depth info anyway, but I don't think that tiny amount would
be a burden.  Especially as compared to the burden (code and cpu cycles)
of having either user apps or system code do multiple redraws when one
window gets unmapped or moved.

>Additionally, there is the problem of what you do when you map the
>n+1'th window.... 

<grin> Yes, that's always a bother.  But we're talking about possible
future hardware, not just today's (quick way out of corner ;-).

>Since I spend far more time drawing pictures than dragging windows,
>and after all, clipping is ridiculously cheap compared to painting pixels,
>I find it hard to shake the conviction that a windowing chip would
>cost me more than it gained me.

I'm sure it depends on your needs and setup.  In my case, I'm programming
for a realtime multitasking computer whose cpu must execute both normal
programs and windowing code.  And any overlapping windows must be handled
without asking apps to do redraws, so clipping is out of the question.
I'm sure you're right that it wouldn't be a gain for you, but I'm just
as convinced it'd be a win in my situation :-).  Different strokes...
  cheers - kevin <kdarling@catt.ncsu.edu>

chris@mimsy.umd.edu (Chris Torek) (12/24/90)

>uad1077@dircon.uucp (Ian Kemmish) writes:
>>Hmmm, I've yet to see a windows-in-hardware chip that handles the input
>>semantics of windows or canvasses - you'd still need to handle the canvas
>>hierarchy in software, so having it in hardware as well just doubles
>>the amount of book-keeping you do.

In article <1990Dec23.093537.18481@ncsuvx.ncsu.edu> kdarling@hobbes.ncsu.edu
(Kevin Darling) writes:
>Apologies... I'm not sure what you meant here.

Essentially, you must retain the clipping boundaries for all windows in
software so that you can tell where the input focus is (for `input is
at cursor hot spot' interfaces, anyway; `click-to-type' interfaces
could, in theory, ask your hardware-chip `which window number is spot
(x,y)', and this can be computed during a display scan: 1/70th of a
second for focus to take effect is not too bad).

However, typically the answer to `where is the input' is best computed
by a different method than `where are the windows', so this doubling is
not quite accurate.

>>Additionally, there is the problem of what you do when you map the
>>n+1'th window.... 

><grin> Yes, that's always a bother.  But we're talking about possible
>future hardware, not just today's (quick way out of corner ;-).

Depending on how you define a `window', future hardware might have to
handle numbers on the order of 10,000 windows.  (X11 was originally
designed to make each individual window cheap, unlike SunView; as time
passed the windows got `fatter' and now in addition to `widgets', each
of which is a window, there are toolkits with `gadgets', which are not.
This is one of the reasons X11 is wrong.  ---Not to belittle X11: it is
a massive effort and there is a lot to be learned from it.  Still, it
has grown WAY too complicated.  More in a moment:)

>... any overlapping windows must be handled without asking apps to do
>redraws,

I agree with this.  The window system (as a whole, however it is built)
must provide each `window user' (application or whatever) the illusion
that it has an arbitrarily large and arbitrarily perfect screen all to
itself.  There must be a way to find out what flaws exist (e.g., mapped
or monochrome instead of true color, 1536x1152 pixel rather than infinite,
etc.) for special purpose applications, but the default should be a
perfect virtual display.  (This is another reason X11 is wrong.)

When you draw in an overlapped window, the draw should take place in
the window.  If the covered region is exposed, the window system must
put up the result of the draw.  If that means it must draw in off-screen
memory, then it must draw in off-screen memory.

(Some will make the following objection:

	`My high end display has 1536x1152 pixels, each with 24 bits
	of true color.  That is 5 megabytes per display.  You want a
	window system to allow 100 overlapping full-sized windows and
	you want it to retain all 500 megabytes?!?'

The answer to this is `yes':  `How much did you pay for your high-end
display?  And you mean to tell me that after that, you cannot afford
another $1500 for a 600 MB disk for virtual memory?'  The usual
comeback is `but the application can recompute the display using less
memory':  Yes, but so what?  That requires more code in every
application.  Pretty soon you have to buy a few $2500 1.2GB disks to
hold the applications, not to mention all that money on developer
effort to write the extra redisplay code, not to mention the low
bandwidth between the CPU and display compared to on-display, ....
The extra data space in each application is not free, either.)

>so clipping is out of the question.

Not at all---*within* the window system.

Anyway, to move back towards architecture, there is one key point when
it comes to doing windows in hardware:

	Working smart will always outdo working hard, but working hard
	can sometimes (often?) be cheaper.

Right now, however, I think the tradeoff remains on the side of `working
smart': i.e., doing the windows in software.  It is moving towards
`working hard', but has not got there yet.  Give it a few more years....
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris

spot@CS.CMU.EDU (Scott Draves) (12/25/90)

In article <28774@mimsy.umd.edu> chris@mimsy.umd.edu (Chris Torek) writes:

   Depending on how you define a `window', future hardware might have to
   handle numbers on the order of 10,000 windows.  (X11 was originally
   designed to make each individual window cheap, unlike SunView; as time
   passed the windows got `fatter' and now in addition to `widgets', each
   of which is a window, there are toolkits with `gadgets', which are not.

X11 Windows are still as small and cheap as they ever were.  The
reason Motif uses gadgets is that widgets are fat, especially its
widgets.  In my Motif code I never use gadgets and find performance to
be adequate, and my code does a lot of creation and destruction of
widgets, which is their worst case.

   The window system (as a whole, however it is built)
   must provide each `window user' (application or whatever) the illusion
   that it has an arbitrarily large and arbitrarily perfect screen all to
   itself.

Well, I don't think arbitrarily large fits in with the rest of this,
but yes.  This is basically what PostScript does (with the addition of
abstract coordinates).

  (This is another reason X11 is wrong.)

I don't think statements like that are warranted.  For something that
is "wrong" X11 is very successful and widely used.

My understanding is that X isn't intended to be very abstract,
abstract colors and coordinates are the place of an extension or a
toolkit.  Unfortunately, no such beast exists, which is a shame; I
think it would be very popular.

Similarly, X doesn't specify a user interface, that's the place of
Motif.  There's a lot to be said for this modularity.

   [ refresh should be handled by the window system.  apps invisibly
     draw into offscreen buffers and blit to the screen as necessary.
     The extra memory is well spent because it saves code in every
     application, and saves developement time. ]

I disagree with your analysis.  The amount of extra code and effort
that needs to be put into each application is near zero.  You are
orders of magnitude away from balancing the cost of the hidden
windows.  In any case relatively few unix programs interact with the
window system; most run in terminals (this may eventually (hopefully?)
change).

I really like what the NeXT window system does.  It gives an
application three choices for handling refresh.  The window system
either 1) saves bitmaps and blits to refresh the screen, 2) saves the
PostScript and rerenders to refresh the screen, or 3) calls the app.

There are many cases where saving the bits is ridiculously
inefficient.  Two examples: 1) the window is displaying a bitmap
image, so the application already has its own offscreen buffer.  2)
the window is sparse, i.e. mostly background.

I have no qualms about spending memory, but you must decide if the
alternatives warrant it.
--

			IBM
Scott Draves		Intel
spot@cs.cmu.edu		Microsoft

rcg@lpi.liant.com (Rick Gorton) (12/26/90)

>  What features should be put into the CPU to improve performance and
>reduce chip count?
>

SOME REGISTERS!!!!!

And not some of those silly things usable only by instruction FOO
where register q contains an address for FOO and register z contains
a count for FOO.  How about a couple of GEN-YOU-WINE general purpose
32 bit registers?

-- 
Richard Gorton               rcg@lpi.liant.com  (508) 626-0006
Language Processors, Inc.    Framingham, MA 01760
Hey!  This is MY opinion.  Opinions have little to do with corporate policy.

graeme@labtam.labtam.oz (Graeme Gill) (12/28/90)

In article <3853@bnr-rsc.UUCP>, schow@bcarh185.bnr.ca (Stanley T.H. Chow) writes:
> In article <5813@labtam.labtam.oz> graeme@labtam.labtam.oz (Graeme Gill) writes:
> > /* omitted to save space */
> 
> 
> Hmm, 208 MBit/sec = 26 MByte = 26 MPixel of 8 bits each.
> 
> How do you do 26 million byte writes per second on a 80960? What speed are you
> running it at? How is your frame buffer organized?
>
    This is our slow machine, and it runs at 20 MHz. We have a new model
that runs at 25 MHz. The frame store is packed 4 pixels per 32-bit word.
A burst write instruction takes 2 + 4 * 2 + 1 clock cycles of bus time
per 16 pixels. At 20 MHz that equates to a theoretical rate of 29.1
Mbytes/sec (ignoring refresh overhead, interrupt routine overhead,
Ethernet DMA overhead, etc.) The measured rate using x11perf for 500x500
filled areas is 26 Mpixels/sec.
    Since it has a 3-deep write queue and scoreboarded reads, the
internal CPU operation proceeds in parallel with the bus cycles.

> 
> Also, how much CPU is left for the user applications?
> 
    None, it's an X terminal :-); the application runs on the host. This
sometimes makes it a faster system than a workstation that has to both
draw and run the application. There are the usual tradeoffs of
centralized vs. distributed computing.

> Stanley Chow        BitNet:  schow@BNR.CA

Graeme Gill
Electronic Design Engineer
Labtam Australia

ts@cup.portal.com (Tim W Smith) (01/02/91)

< I don't think statements like that are warranted.  For something that
< is "wrong" X11 is very successful and widely used.

What does wrongness have to do with width of use?  Look at MS-DOS,
for example, to see that wide use does not indicate lack of
wrongness.

jesup@cbmvax.commodore.com (Randell Jesup) (01/24/91)

In article <5800@labtam.labtam.oz> graeme@labtam.labtam.oz (Graeme Gill) writes:
>In article <450@lysator.liu.se>, zap@lysator.liu.se (Zap Andersson) writes:
>>  NEVER understood why this is not common practice in todays computers! I mean
>> what CAN be easier than to include in the gfx chip that 'when beam reaches
>> this'n'that row/column, start displaying bitmap-data from this'n'that memory!
>> The Amiga is the closest I've seen, supporting these 'semi-hardware' (the
>> amiga uses a co-processor) as horizontal slices of display. With a faster
>> co-processor (i.e. faster than 1 pixel bitclock) you could have hardware
>> windows support! You will NEVER need to worry about memorys overlapping, or
>> in what memory to write! You just write to your 'virtual' screen, and the
>> display chip takes care about it ALL.

>about a generation behind mainstream processors. Doing windowing in software
>allows a great deal of flexibility in fixing bugs, keeping up with standards
>developments, ease of porting code to new generations of hardware, etc., 

	Another reason: most current ways of doing windowing in hardware
have a fixed number of windows they can support (especially if they
each have a different color palette on a non-direct-RGB system).  The Amiga
has screens ("hardware horizontal windows"), and on each screen you can have
windows.  Note that there are blank lines between screens: it needs to update
the bitmap pointers, color table, etc.  The screens are draggable, though they
remain a solid horizontal slice (actually, you can sort of do HW windowing,
but it's rather limited since you can't change much on the fly across a line).

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)

jesup@cbmvax.commodore.com (Randell Jesup) (01/24/91)

In article <1582@pai.UUCP> erc@pai.UUCP (Eric F. Johnson) writes:
>    1) Macintosh--measure user base in millions
>    2,3) tie- Microsoft Windows, Amiga--around the 1 million mark for
>    MS Windows, more for the Amiga, but that will soon change (note that
>    I'm not saying whether this will be good or bad). If you read

	Over 2 million now for Amiga, going up fast.

>    Personal Workstation, please tell them that the Amiga sports a 
>    multi-tasking operating system with a graphical user interface
>    (for PW's strange application watch).

	But that would scale their existing entries to nil, unless they used
a log graph. ;-)  X is available on the Amiga also (3rd party).

>PS, I know there are a lot of Amiga evangelists out there. I'm not trying to 
>get your goat, just noting market reality. I am in no way trying to imply
>anything at all that could ever be considered bad about your wonderful
>machine.

	Noted.  MS Windows 3.0 can try to sell into a fairly large base of
existing machines.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)