[comp.sys.amiga.advocacy] 680x0 vs 80x86

dant@ryptyde.UUCP (Daniel Tracy) (06/22/91)

"Are you suggesting that the unified cache is a limitation designed to ensure
MS-DOS compatibility?  How would a split cache break MS-DOS?"
 
This probably has something to do with the fact that the 80x86 has
general-purpose registers, as opposed to separate address and data
registers, right? This would at least make it harder to implement?

jbickers@templar.actrix.gen.nz (John Bickers) (06/22/91)

Quoted from <92@ryptyde.UUCP> by dant@ryptyde.UUCP (Daniel Tracy):
> "Are you suggesting that the unified cache is a limitation designed to ensure
> MS-DOS compatibility?  How would a split cache break MS-DOS?"

> This probably has something to do with the fact that the 80x86 has
> general-purpose registers, as opposed to seperate address and data

    It's usually because if you have a split cache you break programs
    that use self-modifying code.

    This mightn't break MS-DOS itself, but the big PClone software
    companies (eg: Microsoft) aren't known for the production quality
    of their code.
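
    To make the hazard concrete, here is a minimal sketch of run-time
    generated code, assuming a modern x86-64 Linux box with GCC (obviously
    not what a DOS program looks like; the bytes and page size are just
    illustrative).  The routine is first written as *data* and then fetched
    as *instructions*, which is exactly the case where a split cache needs
    an explicit flush:

        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>

        /* Machine code for "mov eax, 42; ret" on the x86. */
        static const unsigned char stub[] =
            { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

        int main(void)
        {
            /* Grab a page we may both write and execute. */
            void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (page == MAP_FAILED)
                return 1;

            memcpy(page, stub, sizeof stub);    /* code written as data */

            /* On a CPU with split, non-coherent I/D caches you would need
               something like GCC's __builtin___clear_cache() right here;
               the x86 keeps its caches coherent on its own. */

            int (*fn)(void) = (int (*)(void))page;
            printf("%d\n", fn());               /* prints 42 */
            return 0;
        }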

    Also note that the address registers can be used to store data. I
    believe Microsoft did so with one of their Amiga products? Something
    like storing data in the top byte of an address register, on the
    assumption that only the low 24 bits are used for addressing
    purposes.
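
    In C terms the trick amounts to something like the sketch below (a
    hypothetical illustration, not Microsoft's actual code).  It only works
    while the CPU ignores the top 8 address bits, which stopped being true
    on the 68020 and later:

        #include <stdint.h>
        #include <stdio.h>

        /* Pack an 8-bit tag into the top byte of a 32-bit "address",
           relying on the 68000's 24-bit address bus to ignore it. */
        static uint32_t pack(uint32_t addr24, uint8_t tag)
        {
            return (addr24 & 0x00FFFFFFu) | ((uint32_t)tag << 24);
        }

        static uint32_t addr_part(uint32_t p) { return p & 0x00FFFFFFu; }
        static uint8_t  tag_part(uint32_t p)  { return (uint8_t)(p >> 24); }

        int main(void)
        {
            uint32_t p = pack(0x00C0FFEEu, 0x7F);
            printf("addr=%06lX tag=%02X\n",
                   (unsigned long)addr_part(p), (unsigned)tag_part(p));
            /* On a 68020/030/040 all 32 bits reach the address bus, so
               using p as a pointer without masking off the tag would hit
               the wrong location. */
            return 0;
        }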

    Slick people, Microsoft.
--
*** John Bickers, TAP, NZAmigaUG.        jbickers@templar.actrix.gen.nz ***
***         "Endless variations, make it all seem new" - Devo.          ***

peter@Sugar.NeoSoft.com (Peter da Silva) (06/23/91)

In article <92@ryptyde.UUCP> dant@ryptyde.UUCP (Daniel Tracy) writes:
> This probably has something to do with the fact that the 80x86 has
> general-purpose registers, as opposed to seperate address and data
> registers, right? This would at least make it harder to impliment?

You gotta be kidding... *every* register on the 80x86 just about is special
purpose. There are like a couple of GP accumulators, and the rest are string
pointers, segment pointers, stack pointers, and the like.
-- 
Peter da Silva.   `-_-'   <peter@sugar.neosoft.com>.
                   'U`    "Have you hugged your wolf today?"

dant@ryptyde.UUCP (Daniel Tracy) (06/24/91)

Responding to the following:
 
"Also note that the address registers can be used to store data. I
    believe Microsoft did so with one of their Amiga products? Something
    like storing data in the top byte of an address register, on the
    assumption that only the low 24 bits are used for addressing
    purposes."
 
Apple also did this with the original MacOS. Of course, this was in the day
of 128K machines. But the transition has been rather smooth since Apple
warned developers about 32-bitness back in 1986, so most programs are
"32-bit clean".

melling@cs.psu.edu (Michael D Mellinger) (06/24/91)

In article <105@ryptyde.UUCP> dant@ryptyde.UUCP (Daniel Tracy) writes:

   Apple also did this with the original MacOS. Of course, this was in the day
   of 128K machines. But the transition has been rather smooth since Apple
   warned developers about 32-bitness back in 1986, so most programs are
   "32-bit clean".

Yeah, the patches for LightSpeed C and Pascal will be shipping anytime
now. :-).  (Does this count?  Are these major products? :-))

-Mike

sho@gibbs.physics.purdue.edu (Sho Kuwamoto) (06/24/91)

In article <clean> melling@cs.psu.edu (Michael D Mellinger) writes:
>[Re: 32-bit clean apps for the mac]
>Yeah, the patches for LightSpeed C and Pascal will be shipping anytime
>now. :-).  (Does this count?  Are these major products? :-))

THINK C (formerly Lightspeed C) produces 32-bit clean code, but is not
itself 32-bit clean.  I have been told that a System 7 studly version
is forthcoming, but I think we all know the dangers of vaporware.  

This article may not be entirely appropriate for this group, but my
thinking was thus: 
   I will regret not having posted this if, a week from now, someone
   writes an article in which he or she states that one of the two
   main compilers for the mac is incapable of producing code which
   runs under A/UX or 32-bit mode.

After all, the only reason to talk about macs in this group is to
avoid stupid flame wars in which the mac users don't know what the
amiga is about and vice versa.  Armed with sufficient information, we
can get into stupid flame wars in which both sides are basically well
informed. 

-Sho
-- 
sho@physics.purdue.edu <<-- after all, this is the information age.

torrie@cs.stanford.edu (Evan Torrie) (06/24/91)

melling@cs.psu.edu (Michael D Mellinger) writes:


>In article <105@ryptyde.UUCP> dant@ryptyde.UUCP (Daniel Tracy) writes:

>   Apple also did this with the original MacOS. Of course, this was in the day
>   of 128K machines. But the transition has been rather smooth since Apple
>   warned developers about 32-bitness back in 1986, so most programs are
>   "32-bit clean".

>Yeah, the patches for LightSpeed C and Pascal will be shipping anytime
>now. :-).  (Does this count?  Are these major products? :-))

  Actually no.  Not compared to, say, Word, Excel, MacWrite, FileMaker, etc.
Anyway, I think the problem with these products is:
1.  They're integrated development environments which do some fairly funky
    things with the OS environment to allow debugging/editing etc all
    in the same package.
2.  Symantec (which took over Think Technologies) has been fairly negligent
    about these products over the past 2 years, to the extent that a fair
    few users have switched to the better supported MPW (which is 32-bit
    clean and also sports C++ for example).

-- 
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu   
"I didn't get where I am today without knowing a good deal when I see one,
 Reggie."  "Yes, C.J."

dant@ryptyde.UUCP (Daniel Tracy) (06/24/91)

Responding to the following:

"You gotta be kidding... *every* register on the 80x86 just about is special
purpose. There are like a couple of GP accumulators, and the rest are string
pointers, segment pointers, stack pointers, and the like."

I was referring to the general-purpose address/data registers used in the 
8086 line! You wouldn't call these general purpose?

jerry@polygen.uucp (Jerry Shekhel) (06/25/91)

jbickers@templar.actrix.gen.nz (John Bickers) writes:
>
>    It's usually because if you have a split cache you break programs
>    that use self-modifying code.
>

Doubtful, John, since every OS in existence treats code as data when it
loads it into memory for execution.

--
+-------------------+----------------------+---------------------------------+
| JERRY J. SHEKHEL  | POLYGEN CORPORATION  | When I was young, I had to walk |
| Drummers do it... | Waltham, MA USA      | to school and back every day -- |
|    ... In rhythm! | (617) 890-2175       | 20 miles, uphill both ways.     |
+-------------------+----------------------+---------------------------------+
|           ...! [ princeton mit-eddie bu sunne ] !polygen!jerry             |
|                            jerry@polygen.com                               |
+----------------------------------------------------------------------------+

mykes@amiga0.SF-Bay.ORG (Mike Schwartz) (06/25/91)

In article <1991Jun24.051233.3203@neon.Stanford.EDU> torrie@cs.stanford.edu (Evan Torrie) writes:
>2.  Symantec (which took over Think Technologies) has been fairly negligent
>    over the past 2 years of these products, to the extent that a fair
>    few users have switched to the better supported MPW (which is 32-bit
>    clean and also sports C++ for example).
>

Too bad MPW doesn't multitask.  When you start up a compile in MPW, you can't
use your MPW editor to browse your files.  This was a horrible design decision
when they made MPW in the first place.  Since MultiFinder isn't a true multitasking
solution, the best you can hope for is to run two HUGE copies of MPW and maybe
the MPW tools will be friendly enough to allow you to compile from one and edit
from the other.


--
****************************************************
* I want games that look like Shadow of the Beast  *
* but play like Leisure Suit Larry.                *
****************************************************

navas@cory.Berkeley.EDU (David C. Navas) (06/25/91)

In article <112@ryptyde.UUCP> dant@ryptyde.UUCP (Daniel Tracy) writes:
>I was referring to the general-purpose address/data registers used in the 
>8086 line! You wouldn't call these general purpose?

Uh, what registers would those be?  cs:ip, es:di, ds:si, ss:[bp|sp] are
generally address pointers.  Of course, cs:ip is used, as is ss:sp.  If
you program in 'C', most likely bp is used as well.  This leaves you
with the address pairs es:di and ds:si.  What a bonanza...

Then you have your "data" registers -- ax, bx, cx, dx.  bx can be used as
a segment offset (I think), ax is required if you're doing any mults/divs,
cx is useful for doing shifts, and that usually leaves me with dx.  Another
veritable bonanza...

It's an improvement over a 6502, maybe, but what register here would you
call general purpose?  I wouldn't call any of them general purpose.  The
closest thing would be 'bx'.

David Navas                                   navas@cory.berkeley.edu
	2.0 :: "You can't have your cake and eat it too."
Also try c186br@holden, c260-ay@ara and c184-ap@torus

peter@Sugar.NeoSoft.com (Peter da Silva) (06/25/91)

In article <112@ryptyde.UUCP> dant@ryptyde.UUCP (Daniel Tracy) writes:
> I was referring to the general-purpose address/data registers used in the 
> 8086 line! You wouldn't call these general purpose?

Oh, sure. Both of them.
-- 
Peter da Silva.   `-_-'   <peter@sugar.neosoft.com>.
                   'U`    "Have you hugged your wolf today?"

dant@ryptyde.UUCP (Daniel Tracy) (06/25/91)

Responding to the following:

>Yeah, the patches for LightSpeed C and Pascal will be shipping anytime
>now. :-).  (Does this count?  Are these major products? :-))
>
>-Mike
 
Correct me if I'm wrong, but these products don't exist anymore and aren't
supported. I think they're old versions of THINK Pascal and C. As a matter of
fact, I haven't heard those names for a while. Are they now shareware/freeware?

torrie@cs.stanford.edu (Evan Torrie) (06/25/91)

mykes@amiga0.SF-Bay.ORG (Mike Schwartz) writes:

>Too bad MPW doesn't multitask.  

  This is more a multi-threading, rather than multitasking example, since
the MPW editor and compiler run in the same address space.  I guess you 
can argue semantics of multi-threading vs multitasking.

>When you start up a compile in MPW, you can't
>use your MPW editor to browse your files.  This was a horrible design decision
>when they made MPW in the first place.  Since multifinder isn't a true multitasking
>solution, the best you can hope for is to run two HUGE copies of MPW and maybe
>the MPW tools will be friendly enough to allow you to compile from one and edit
>from the other.

  Or else, you use a different editor to edit your programs...  like
Alpha, the Emacs clone.  

-- 
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu  
"And in the death, as the last few corpses lay rotting in the slimy
 thoroughfare, the shutters lifted in inches, high on Poacher's Hill..."

rjc@churchy.gnu.ai.mit.edu (Ray Cromwell) (06/25/91)

In article <1154@stewart.UUCP> jerry@stewart.UUCP (Jerry Shekhel) writes:
>jbickers@templar.actrix.gen.nz (John Bickers) writes:
>>
>>    It's usually because if you have a split cache you break programs
>>    that use self-modifying code.
>>
>
>Doubtful, John, since every OS in existence treats code as data when it
>loads it into memory for execution.

  But this is different, Jerry, because in this case the OS KNOWS
how to clear caches. If a lot of MS-DOG programs use self-modifying
code, or if the OS itself doesn't know how to handle caches,
code will break. Hence Intel probably kept I&D unified to avoid
an MS-DOG nightmare.

  
--
/ INET:rjc@gnu.ai.mit.edu     *   // The opinions expressed here do not      \
| INET:r_cromwe@upr2.clu.net  | \X/  in any way reflect the views of my self.|
\ UUCP:uunet!tnc!m0023        *                                              /

navas@cory.Berkeley.EDU (David C. Navas) (06/26/91)

In article <1154@stewart.UUCP> jerry@stewart.UUCP (Jerry Shekhel) writes:
>jbickers@templar.actrix.gen.nz (John Bickers) writes:
>>    It's usually because if you have a split cache you break programs
>>    that use self-modifying code.

>Doubtful, John, since every OS in existence treats code as data when it
>loads it into memory for execution.

Hmm, and I thought some of your other points were good, but now I know you
were just smoking something :)

But seriously, any OS that has to worry about loading code and cache
consistency has special code to deal with that.

DOS programs, on the other hand, don't, because they've never had to.
These programs could be rewritten, but when you have 40-60 million
DOS users, you don't tend to go out and break something like that.

David Navas                                   navas@cory.berkeley.edu
	2.0 :: "You can't have your cake and eat it too."
Also try c186br@holden, c260-ay@ara and c184-ap@torus

jbickers@templar.actrix.gen.nz (John Bickers) (06/26/91)

Quoted from <1154@stewart.UUCP> by jerry@polygen.uucp (Jerry Shekhel):
> jbickers@templar.actrix.gen.nz (John Bickers) writes:

> >    It's usually because if you have a split cache you break programs
> >    that use self-modifying code.

> Doubtful, John, since every OS in existence treats code as data when it
> loads it into memory for execution.

    I believe you'll find that these OS's have functions to "flush" the
    various caches, to keep things in sync. Or rely on overflowing the
    cache...

    I also believe that loading programs is a very minor proportion of
    the total usage RAM gets on a typical machine, so it's a minor loss
    in efficiency.

--
*** John Bickers, TAP, NZAmigaUG.        jbickers@templar.actrix.gen.nz ***
***         "Endless variations, make it all seem new" - Devo.          ***

dant@ryptyde.UUCP (Daniel Tracy) (06/26/91)

In article <1991Jun25.012010.3154@Sugar.NeoSoft.com> peter@Sugar.NeoSoft.com (Peter da Silva) writes:
>In article <112@ryptyde.UUCP> dant@ryptyde.UUCP (Daniel Tracy) writes:
>> I was referring to the general-purpose address/data registers used in the 
>> 8086 line! You wouldn't call these general purpose?
>
>Oh, sure. Both of them.

I believe there are 8. All of which can be used as data registers, most
can be used as address registers, and 7 are implicit parameters to various
commands.

kls30@duts.ccc.amdahl.com (Kent L Shephard) (06/27/91)

In article <1991Jun25.165516.13021@mintaka.lcs.mit.edu> rjc@churchy.gnu.ai.mit.edu (Ray Cromwell) writes:
>In article <1154@stewart.UUCP> jerry@stewart.UUCP (Jerry Shekhel) writes:
>>jbickers@templar.actrix.gen.nz (John Bickers) writes:
>>>
>>>    It's usually because if you have a split cache you break programs
>>>    that use self-modifying code.
>>>
>>
>>Doubtful, John, since every OS in existence treats code as data when it
>>loads it into memory for execution.
>
>  But this is different, Jerry, because in this case the OS KNOWS
>how to clear caches. If a lot of MS-DOG programs used self-modifying
>programs, or if the OS itself doesn't know how to treat caches,
>code will break. Hence, Intel probably keeping I&D unified to avoid
>an MS-DOG nightmare.

Wrong.  Intel decided to go with a unified cache for one because it is
simpler to implement.  Also if you have a 4 way set assoc. cache you
have basically 4 small caches.  Also in Intel processors you have
instructions that have data included or immediately following.  Kind of
hard to separate data and instructions.

Moto went with a separate cache because the architecture is different.
The types of instructions are different.

As for self-modifying code, the machines that use Moto processors are
more guilty of this.  The Mac and Atari machines use self-modifying code
for copy protection.  When Moto started putting small caches on their
chips it created a nightmare.

Self modifying code would have broken the 386 with cache.

Also if a cache is designed properly it should be completely transparent to
software.


--
/*  -The opinions expressed are my own, not my employers.    */
/*      For I can only express my own opinions.              */
/*                                                           */
/*   Kent L. Shephard  : email - kls30@DUTS.ccc.amdahl.com   */

peter@Sugar.NeoSoft.com (Peter da Silva) (06/27/91)

In article <mykes.3740@amiga0.SF-Bay.ORG> mykes@amiga0.SF-Bay.ORG (Mike Schwartz) writes:
> Too bad MPW doesn't multitask.  the best you can hope for is to run two
> HUGE copies of MPW and maybe the MPW tools will be friendly enough to
> allow you to compile from one and edit from the other.

This is Mike "write in assembly and blow away the O/S" Schwartz?

It's a parallel world, right?
-- 
Peter da Silva.   `-_-'   <peter@sugar.neosoft.com>.
                   'U`    "Have you hugged your wolf today?"

elg@elgamy.raidernet.com (Eric Lee Green) (06/27/91)

From article <1154@stewart.UUCP>, by jerry@polygen.uucp (Jerry Shekhel):
> jbickers@templar.actrix.gen.nz (John Bickers) writes:
>>    It's usually because if you have a split cache you break programs
>>    that use self-modifying code.
> Doubtful, John, since every OS in existence treats code as data when it
> loads it into memory for execution.

Note that AmigaDOS takes that into account for the 68030/68040 class
machines: it flushes the instruction cache so that the newly loaded code is
fetched fresh. MSDOS, of course, does nothing of the sort. Which is why
MSDOS-based machines have an integrated code/data cache, which is less
efficient than a split cache (since data generally has a less than 50% hit
rate, and data accesses would keep kicking program cache lines out of the
cache... whereas code has a 90% or better hit rate with decent cache sizes).
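
For the curious, the hook AmigaOS 2.0 gives a loader for this is exec.library's
CacheClearU() (CacheClearE() exists for finer control).  A rough fragment, just
a sketch of the idea rather than the actual dos.library loader:

    #include <string.h>
    #include <proto/exec.h>      /* CacheClearU(), AmigaOS 2.0+ */

    /* Copy freshly loaded code into place, then make sure the 68030/040
       caches cannot still hold stale lines for that range. */
    void install_code(void *dest, const void *loaded, size_t len)
    {
        memcpy(dest, loaded, len);    /* the code is handled as plain data */
        CacheClearU();                /* flush/invalidate the CPU caches   */
    }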

--
Eric Lee Green   (318) 984-1820  P.O. Box 92191  Lafayette, LA 70509
elg@elgamy.RAIDERNET.COM               uunet!mjbtn!raider!elgamy!elg

torrie@cs.stanford.edu (Evan Torrie) (06/27/91)

kls30@duts.ccc.amdahl.com (Kent L Shephard) writes:

>In article <1991Jun25.165516.13021@mintaka.lcs.mit.edu> rjc@churchy.gnu.ai.mit.edu (Ray Cromwell) writes:
>>
>>  But this is different, Jerry, because in this case the OS KNOWS
>>how to clear caches. If a lot of MS-DOG programs used self-modifying
>>programs, or if the OS itself doesn't know how to treat caches,
>>code will break. Hence, Intel probably keeping I&D unified to avoid
>>an MS-DOG nightmare.

>Wrong.  Intel decided to go with a unified cache for one because it is
>simpler to implement.  Also if you have a 4 way set assoc. cache you
>have basically 4 small caches.  

  But you still have only one internal path from the CPU to the cache, thus
cutting your bandwidth in half vs a split I/D Harvard architecture.  For an
example of why this is important, check out the parallelism in any of today's
microprocessors' pipelines.  
  Does Intel still use their 386 instruction prefetch buffer in the
486?  I suppose that should shore up some of the performance loss from
having a unified cache.

>Also in Intel processors you have
>instuctions that have data included or immediately following.  Kind of
>hard to separate data and instuctions.

  An example of such an instruction?  The 68K has data in its instructions, 
the ADDQ #x, Dn for example, but this doesn't stop an I/D cache (since the 
data is non-modifiable).

>Moto went with a seperate cache because the architecture is different.
>The type of instructions are different.

  Moto went with separate caches because of their performance.

>As for self modifying code.  The machines that use Moto processors are
>more guilty of this.  The Mac and Atari machines uses self modifying code
>for copy protection.  When Moto started putting small caches on their
>chips it created a nightmare.

  So the copy-protection schemes don't use self-modifying code anymore
[in fact, most Mac programs don't use copy-protection other than
manual-type methods].
  At least Motorola could do this, unlike Intel.

>Self modifying code would have broken the 386 with cache.

  Not with a unified cache.

>Also if a cache is designed properly it should be completly transparent to
>software.

  Transparent to user software, perhaps, but often the OS has
to be intimately aware of the cache, just as it has to be aware of the 
TLB.
-- 
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu   
Murphy's Law of Intelism:  Just when you thought Intel had done everything
possible to pervert the course of computer architecture, they bring out the 860

mykes@amiga0.SF-Bay.ORG (Mike Schwartz) (06/27/91)

In article <1991Jun25.062028.2265@neon.Stanford.EDU> torrie@cs.stanford.edu (Evan Torrie) writes:
>mykes@amiga0.SF-Bay.ORG (Mike Schwartz) writes:
>
>>Too bad MPW doesn't multitask.  
>
>  This is more a multi-threading, rather than multitasking example, since
>the MPW editor and compiler run in the same address space.  I guess you 
>can argue semantics of multi-threading vs multitasking.
>

You bet I can argue the value of multitasking (what you call multithreading).  Like
the capability to run TWO MPW tools at the same time (I run a dozen CLI tools on my
Amiga simultaneously with ease).  How about the ability to make an application out
of multiple tasks?  How about the ability to RUN 10 Applications and still use
ZERO percent of the CPU time?  How about the ability to run multiple applications
on a 512K machine?  

Or how about those fancy hard disk compression utilities that abound on the Mac?  Just
rename a folder with a few hundred files in it, and the Mac goes to sleep for as long
as it takes to decompress all those files.

>>When you start up a compile in MPW, you can't
>>use your MPW editor to browse your files.  This was a horrible design decision
>>when they made MPW in the first place.  Since multifinder isn't a true multitasking
>>solution, the best you can hope for is to run two HUGE copies of MPW and maybe
>>the MPW tools will be friendly enough to allow you to compile from one and edit
>>from the other.
>
>  Or else, you use a different editor to edit your programs...  like
>Alpha, the Emacs clone.  
>

And it only eats another meg of RAM if you want to be able to edit at will.
At least there is a REAL editor for the Mac :)  Does it feature multiple
UNDO/REDO like CygnusEd?

--
****************************************************
* I want games that look like Shadow of the Beast  *
* but play like Leisure Suit Larry.                *
****************************************************

mykes@amiga0.SF-Bay.ORG (Mike Schwartz) (06/27/91)

In article <125@ryptyde.UUCP> dant@ryptyde.UUCP (Daniel Tracy) writes:
>In article <1991Jun25.012010.3154@Sugar.NeoSoft.com> peter@Sugar.NeoSoft.com (Peter da Silva) writes:
>>In article <112@ryptyde.UUCP> dant@ryptyde.UUCP (Daniel Tracy) writes:
>>> I was referring to the general-purpose address/data registers used in the 
>>> 8086 line! You wouldn't call these general purpose?
>>
>>Oh, sure. Both of them.
>
>I believe there are 8. All of which can be used as data registers, most
>can be used as address registers, and 7 are implicit parameters to various
>commands.

The 80x8x has a few registers, but that does not make them general purpose.  All but
2 are dedicated for use by various instructions of the CPU.  You CAN use some of them
as general purpose registers, but you won't want to use much of the CPU's instruction
set.  Specifically, the 80x8x has the following registers: AX,BX,CX,DX,SI,DI,BP,CS,DS,SS,ES,
and SP.  AX-DX can all be used similarly, but many instructions on the CPU require
them to be used in dedicated manners.  For example, the CX register is ALWAYS the
count register for shift and memory move operations.  The CS,DS,SS,ES registers are
"segment" registers which are only useful for allowing the CPU to address a specific
64K part of memory.  It is possible to access the entire address space of the CPU, but
it requires manipulating both a segment register and an address register.  And to
make things worse, MS-DOS precludes the use of 32-bit registers (restricted to 16
bits without major tricks), and the math required to manipulate the segment registers
in a general purpose way is expensive.

On the other hand, the 68000 has 8 general purpose data registers and 7 general purpose
address registers.  The stack pointer is an 8th address register, and can be used
as such if the software disables interrupts and deliberately disallows use of the
stack.

Every 68000 instruction that uses a data register can equally use any of the 8
data registers.  Every 68000 instruction that uses an address register can equally
use any of the 8 address registers.  As an added feature of the 68000, each of the
address registers can be used like a segment register (point to a 64K segment), but
the math issue is non-existent.
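
To put numbers on that "expensive math": a real-mode 8086 address is just
(segment * 16) + offset, so normalizing or comparing far pointers costs shifts
and 20-bit arithmetic that a flat 68000 address never needs.  A small hosted-C
model of the calculation (illustrative only):

    #include <stdint.h>
    #include <stdio.h>

    /* Real-mode 8086: 20-bit physical address = segment * 16 + offset. */
    static uint32_t linear(uint16_t seg, uint16_t off)
    {
        return ((uint32_t)seg << 4) + off;
    }

    /* "Normalize" a far pointer so the offset is < 16; needed before two
       far pointers can be compared or stepped across a 64K boundary. */
    static void normalize(uint16_t *seg, uint16_t *off)
    {
        uint32_t lin = linear(*seg, *off);
        *seg = (uint16_t)(lin >> 4);
        *off = (uint16_t)(lin & 0xFu);
    }

    int main(void)
    {
        uint16_t seg = 0x1234, off = 0xFFF0;
        printf("linear     = %05lX\n", (unsigned long)linear(seg, off)); /* 22330 */
        normalize(&seg, &off);
        printf("normalized = %04X:%X\n", (unsigned)seg, (unsigned)off);  /* 2233:0 */
        return 0;
    }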

--
****************************************************
* I want games that look like Shadow of the Beast  *
* but play like Leisure Suit Larry.                *
****************************************************

lron@easy.lrcd.com (Dwight Hubbard) (06/27/91)

In article <125@ryptyde.UUCP> dant@ryptyde.UUCP (Daniel Tracy) writes:
>In article <1991Jun25.012010.3154@Sugar.NeoSoft.com> peter@Sugar.NeoSoft.com (Peter da Silva) writes:
>>In article <112@ryptyde.UUCP> dant@ryptyde.UUCP (Daniel Tracy) writes:
>>> I was referring to the general-purpose address/data registers used in the
>>> 8086 line! You wouldn't call these general purpose?
>>
>>Oh, sure. Both of them.
>
>I believe there are 8. All of which can be used as data registers, most
>can be used as address registers, and 7 are implicit parameters to various
>commands.

Intel may call them general-purpose registers, but they certainly don't look
all that general-purpose to me.
--
----------------------------------------------------------------------
-Dwight Hubbard             INTERNET: lron@easy.lrcd.com             -
-Kaneohe, Hawaii            USENET  : ...!uunet!easy!lron            -
-                           BIX     : lron                           -
----------------------------------------------------------------------

lron@easy.lrcd.com (Dwight Hubbard) (06/27/91)

In article <1154@stewart.UUCP> jerry@polygen.uucp (Jerry Shekhel) writes:
>jbickers@templar.actrix.gen.nz (John Bickers) writes:
>>
>>    It's usually because if you have a split cache you break programs
>>    that use self-modifying code.
>>
>
>Doubtful, John, since every OS in existence treats code as data when it
>loads it into memory for execution.

He's right; on machines with split caches the OS will have to force a cache
flush after writing the CODE data to memory.  While it is no problem to
modify the OS routine that loads data from disk to flush the cache, it will
certainly cause problems if quite a few pieces of important software use
self-modifying code, as they will have to be modified as well to still function.

--
----------------------------------------------------------------------
-Dwight Hubbard             INTERNET: lron@easy.lrcd.com             -
-Kaneohe, Hawaii            USENET  : ...!uunet!easy!lron            -
-                           BIX     : lron                           -
----------------------------------------------------------------------

jasonp@oakhill.sps.mot.com (Jason Perez) (06/27/91)

    Just for my information, what is the cache size in the 80486?


-- 
Jason Perez			  |
jasonp@vulcan.sps.mot.com         | j0p7771@sigma.tamu.edu (after aug 17.)
UUCP: uunet!crash!pro-lep!jasonp  | "Gig 'em!"     "Frodo lives!"
INet: jasonp@pro-lep.cts.com      | "Don't have a cow man!"  

torrie@cs.stanford.edu (Evan Torrie) (06/28/91)

mykes@amiga0.SF-Bay.ORG (Mike Schwartz) writes:
>>  Or else, you use a different editor to edit your programs...  like
>>Alpha, the Emacs clone.  
>>

>And it only eats another meg of RAM if you want to be able to edit at will.

  The application suggests 512K.  I haven't bothered to set it any lower 
since I don't have RAM problems.

>At least there is a REAL editor for the Mac :)  Does it feature multiple
>UNDO/REDO like CygnusEd?

  Of course.

-- 
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu   
"I didn't get where I am today without knowing a good deal when I see one,
 Reggie."  "Yes, C.J."

torrie@cs.stanford.edu (Evan Torrie) (06/28/91)

jasonp@oakhill.sps.mot.com (Jason Perez) writes:


>    Just for my information, what is the cache size in the 80486?

 8K, 4-way set associative, unified cache.

-- 
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu   
"I didn't get where I am today without knowing a good deal when I see one,
 Reggie."  "Yes, C.J."

daveh@cbmvax.commodore.com (Dave Haynie) (06/28/91)

In article <e3e502oG080e01@JUTS.ccc.amdahl.com> kls30@DUTS.ccc.amdahl.com (PUT YOUR NAME HERE) writes:
>In article <1991Jun25.165516.13021@mintaka.lcs.mit.edu> rjc@churchy.gnu.ai.mit.edu (Ray Cromwell) writes:

>>  But this is different, Jerry, because in this case the OS KNOWS
>>how to clear caches. If a lot of MS-DOG programs used self-modifying
>>programs, or if the OS itself doesn't know how to treat caches,
>>code will break. Hence, Intel probably keeping I&D unified to avoid
>>an MS-DOG nightmare.

>Wrong.  Intel decided to go with a unified cache for one because it is
>simpler to implement.  

Wrong.  They did it for compatibility, plain and simple.  The additional
complexity of a split cache system is trivial compared to the work they went
through building the rest of the '486 anyway.  Plus, they've already proven
they can handle it, in the i860 and i960.  And they've also proven they're
willing to share pieces between processors when applicable, as the i860
MMU demonstrates (it's basically the clean parts of the '386 MMU).

>Also if you have a 4 way set assoc. cache you have basically 4 small caches.

Of course you do.  The set associativity is a big win, and lets you have smaller
MMU page sizes (you never want your tag index to consist of translated
addresses).  Same reason Motorola uses a 4-way set-associative cache design for
both '040 caches.

>Also in Intel processors you have instuctions that have data included or 
>immediately following.  Kind of hard to separate data and instuctions.

Not at all; it works just dandy in Motorola processors, which certainly have the
"immediate" addressing mode.  Everything's coming from the same external memory
anyway; the split cache simply gives you advantages:

	- Instruction streams are highly linear, data streams aren't always,
	  and are very often located all over the place.  With the split cache,
	  data thrashing doesn't clobber your instruction caching.
	- Instructions and data are needed at very different stages of the machine
	  pipeline.  With the split cache, you get parallel cache access to
	  both I and D; the unified cache by its nature supports only
	  sequential access.  Should one cache miss, there's no cache/bus fetch
	  contention as there would be with the unified cache.

>Moto went with a seperate cache because the architecture is different.
>The type of instructions are different.

"MOVE.L #$0000,D0" sure looks to be just the kind of instruction you're 
claiming Intel needs to back up with its unified cache.  There are certainly
differences in instruction sets and architecture, but not to the extent you're
claiming.  The fact is, the '486 could have been a Harvard machine just as 
easily as the '040.  The only thing that stopped it was code compatibility.
UNIX wouldn't be any problem, and I imagine OS/2 wouldn't either, but MS-DOS
would die in flames on such a machine.  Intel only sells '486s because they
run MS-DOS.  So despite obvious and well understood architectural arguments to
the contrary, sound marketing sense made the '486's cache unified.

>As for self modifying code.  The machines that use Moto processors are
>more guilty of this.  

Not really.  Sure, some bad programming is done on all OSs.  There were even 
five or six Amiga programs which failed due to cache problems.  But Motorola
has been moving in the separate I/D cache direction since it first put an I
cache on the '020 many, many moons ago.  So Motorola system code has been, over
all, much, much better than Intel system code.  Not only that, but the Motorola
market is fragmented, OS-wise.  Something that causes 30% of Atari programs to
fail isn't death to Motorola as long as Apple, C=, UNIX, etc. are happy.

>When Moto started putting small caches on their chips it created a nightmare.

For whom?  Certainly not for the Amiga.  It didn't seem to be much of a problem
for Sun, HP, or Apple either.  Atari's list of problems begins with "F-line
exceptions used for OS function traps"; even if they have problems with cache,
that's the least of their worries.

>Self modifying code would have broken the 386 with cache.

It doesn't have to.  All '386s have external caches, by their nature unified.
Any D to I transform is consistent within such a cache, so self-modifying code
is absolutely no problem.

>Also if a cache is designed properly it should be completly transparent to
>software.

If you have the proper OS support, they are.  Caches are never fully 
transparent to the OS.  MS-DOS has the unique problem of having no OS to manage
any caching for the applications programs, so only very simple cache designs
are successful.  Separate I/D caches are completely transparent under UNIX, 
since memory pages are specifically allocated as text and data, and any
writes to text pages are trapped by the MMU.  Amiga, and presumably Mac, OSs
have sufficient OS support to transparently manage most of the caching
problems that would get you under MS-DOS.  However, self-modifying code will
cause untrapped program failures under AmigaOS, rather than trapped program
failures under UNIX.

-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
	"This is my mistake.  Let me make it good." -R.E.M.

dant@ryptyde.UUCP (Daniel Tracy) (06/28/91)

Responding to the following:

>On the other hand, the 68000 has 8 general purpose data registers and
>7 general purpose address registers.

What I meant by "general purpose" was that many registers can be used to
hold both addresses and data. Granted, this is limiting architecturally,
and the implementation is brain-dead as you imply, since most of those
are used as implicit parameters in various instructions.

dant@ryptyde.UUCP (Daniel Tracy) (06/28/91)

Responding to the following:

>Just for my information, what is the cache size in the 80486?

I don't know if someone already answered this, I'm very behind on messages.
Just to get this in the open so it can be corrected if wrong:

The 486's cache is 8K, is 4-way set-associative, and operates in writethrough
mode only. It also has bus snooping to detect DMA operations and update the
cache accordingly.

In contrast, the 68040's cache is separated into two 4K "parallel" caches
(what is "parallel" about them? Can they both be read on the same cycle?
Or do they just "widen the bus" to the caches?).
The 68040's cache is also 4-way set-associative (something I don't really
understand well), has bus snooping, but it also has a writeback, or copyback
mode which improves performance (by not copying data back to RAM until the bus
isn't busy).

kls30@duts.ccc.amdahl.com (Kent L Shephard) (06/29/91)

In article <1991Jun27.064123.27492@neon.Stanford.EDU> torrie@cs.stanford.edu (Evan Torrie) writes:
>kls30@duts.ccc.amdahl.com (Kent L Shephard) writes:
>
>>In article <1991Jun25.165516.13021@mintaka.lcs.mit.edu> rjc@churchy.gnu.ai.mit.edu (Ray Cromwell) writes:
>>>
>>>  But this is different, Jerry, because in this case the OS KNOWS
>>>how to clear caches. If a lot of MS-DOG programs used self-modifying
>>>programs, or if the OS itself doesn't know how to treat caches,
>>>code will break. Hence, Intel probably keeping I&D unified to avoid
>>>an MS-DOG nightmare.
>
>>Wrong.  Intel decided to go with a unified cache for one because it is
>>simpler to implement.  Also if you have a 4 way set assoc. cache you
>>have basically 4 small caches.  
>

According to Hennessy and Patterson - Computer Architecture: A Quantitative
Approach, pp. 423-425.  Assuming 53% of references are instruction fetches, an
8K unified cache vs. a 4K instruction + 4K data cache, the results are as
follows.
Miss Rates
SIZE             Instruction only     Data only      Unified

8KB                  5.8%               6.8%          8.3%


You get an overall miss rate of 6.27% for data/instruction separate.
You get an overall miss rate of 8.3% for unified.
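
For anyone who wants to check the arithmetic behind those two overall figures
(the weights and miss rates are exactly the ones quoted above):

    #include <stdio.h>

    int main(void)
    {
        double i_frac = 0.53;   /* fraction of references that fetch instructions */
        double i_miss = 0.058;  /* instruction-only cache miss rate */
        double d_miss = 0.068;  /* data-only cache miss rate */
        double u_miss = 0.083;  /* unified cache miss rate */

        double split = i_frac * i_miss + (1.0 - i_frac) * d_miss;
        printf("split  : %.2f%%\n", split * 100.0);    /* 6.27% */
        printf("unified: %.2f%%\n", u_miss * 100.0);   /* 8.30% */
        return 0;
    }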


>  But you still have only one internal path from the CPU to the cache, thus
>cutting your bandwidth in half vs a split I/D Harvard architecture.  For an
>example of why this is important, check out the parallelism in any of today's
>microprocessors' pipelines.  
>  Does Intel still use their 386 instruction prefetch buffer in the
>486?  I suppose that should shore up some of the performance loss from
>having a unified cache.
>

We know both companies claim a hit rate above 90% for their caches.
You also forget that replacement algor. makes a lot of difference in the
hit rate.   Also separate caches require that you have replacement algor.
for both caches.  You also need hardware for both control circuits.  You
need two sets of tag rams, etc.

Intel made a trade between 2-3% performance improvement vs. less chip area
and complexity of design.  They also got their product out the door a LOT
faster than Motorola.

>>Also in Intel processors you have
>>instuctions that have data included or immediately following.  Kind of
>>hard to separate data and instuctions.
>
>  An example of such an instruction?  The 68K has data in its instructions, 
>the ADDQ #x, Dn for example, but this doesn't stop an I/D cache (since the 
>data is non-modifiable).
>
>>Moto went with a seperate cache because the architecture is different.
>>The type of instructions are different.
>
>  Moto went with separate caches because of their performance.

Let's face it, during design you make trade-offs.  Intel made one, Moto made
another.

>
>>As for self modifying code.  The machines that use Moto processors are
>>more guilty of this.  The Mac and Atari machines uses self modifying code
>>for copy protection.  When Moto started putting small caches on their
>>chips it created a nightmare.
>
>  So the copy-protection schemes don't use self-modifying code anymore
>[in fact, most Mac programs don't use copy-protection other than
>manual-type methods].
>  At least Motorola could do this, unlike Intel.

Mac programs now don't use self-modifying code.  They did before, and it
broke a lot of software when Moto started putting instruction and data
caches (small but there) on the 68k line of chips.

>
>>Self modifying code would have broken the 386 with cache.
>
>  Not with a unified cache.

Yes, with a unified cache you can break code that does weird things.  With
a separate cache you can break ill behaved code.

>
>>Also if a cache is designed properly it should be completly transparent to
>>software.
>
>  Transparent to user software, perhaps, but often the OS has
>to be intimately aware of the cache, just as it has to be aware of the 
>TLB.

The OS does not have to be aware of the cache unless it wants to turn it
on or off.  The CPU has to intimately know the cache.  The OS does not
need to know it is there.  The OS needs to know about the TLB because the
OS will handle page faults, loading descriptor tables, and just overall
handling of virtual memory.

The OS knows nothing about a cache miss unless the page was swapped to
disk.  You would then get a page fault, bring the page into physical
memory, then the CPU would handle the cache miss.

A cache should be transparent; if someone tells you otherwise, they are
mistaken.  I've designed memory management and cache controller units.
The cache controller has always been transparent to the software.

Even in multiprocessor systems the cache is transparent.  You would use
a cache coherency protocol like MSI, MESI, MOESI, etc. and you would
implement all your algorithms in hardware.

--
/*  -The opinions expressed are my own, not my employers.    */
/*      For I can only express my own opinions.              */
/*                                                           */
/*   Kent L. Shephard  : email - kls30@DUTS.ccc.amdahl.com   */

torrie@cs.stanford.edu (Evan Torrie) (06/30/91)

dant@ryptyde.UUCP (Daniel Tracy) writes:

>In contrast, the 68040's cache is seperated into two 4K "parallel" caches
>(what is "parallel" about them? Can they both be read on the same cycles?

  Yes.  Useful when your pipeline is overlapping EA Fetches with 
Instruction fetches.

>The 68040's cache is also 4-way set-associative (something I don't really
>understand well), 

  It means that four memory blocks whose addresses index to the same cache set
can all be resident at once without evicting each other.  Generally helps
prevent pathological access patterns from destroying your hit ratio.
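
  A toy model of the lookup may make the "four ways" clearer.  The sizes below
are picked to resemble a 4K cache with 16-byte lines, but they are
illustrative rather than the '040's exact organisation:

    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_SIZE 16    /* bytes per cache line                */
    #define NUM_SETS  64    /* 64 sets * 4 ways * 16 bytes = 4 KB  */
    #define NUM_WAYS  4     /* "4-way set-associative"             */

    struct line { bool valid; uint32_t tag; };
    static struct line cache[NUM_SETS][NUM_WAYS];

    /* An address selects exactly one set but may live in any of that set's
       4 ways, so up to 4 addresses with the same set index can coexist. */
    static bool lookup(uint32_t addr)
    {
        uint32_t set = (addr / LINE_SIZE) % NUM_SETS;
        uint32_t tag = addr / (LINE_SIZE * NUM_SETS);

        for (int way = 0; way < NUM_WAYS; way++)
            if (cache[set][way].valid && cache[set][way].tag == tag)
                return true;          /* hit */
        return false;                 /* miss: allocate a way, evict LRU */
    }

    int main(void)
    {
        return lookup(0x1000) ? 1 : 0;   /* cold cache: always a miss */
    }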

>has bus snooping, but it also has a writeback, or copyback
>mode which improves performance (by not copying data back to RAM until the bus
>isn't busy).

  The nice thing about these on the 040 is that they are selectable on
a page-by-page basis.  So you can make some addresses (such as I/O addresses)
non-cacheable, while ordinary code etc runs in copyback mode.

-- 
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu   
"Lay me place and bake me pie, I'm starving for me gravy... Leave my shoes
and door unlocked, I might just slip away - hey - just for the day."

tim@proton.amd.com (Tim Olson) (06/30/91)

In article <fbH=02SF08zd01@JUTS.ccc.amdahl.com>
kls30@DUTS.ccc.amdahl.com (PUT YOUR NAME HERE) writes:
			   ^^^^^^^^^^^^^^^^^^ Have you done it, yet??

| The OS does not have to be aware of the cache unless it wants to turn it
| on or off.  The CPU has to intimately know the cache.  The OS does not
| need to know it is there.  The OS needs to know about the TLB because the
| OS will handle page faults, loading descriptor tables, and just overall
| hadling of virtual memory.
|
| A cache should be transparent if someone tell you otherwise they are
| mistaken.  I've designed memory management and cache controller units.
| The cache controller has always been transparent to the software.

There are many different types of caching schemes.  A frequently used
scheme to speed up cache references (especially in instruction caches,
where coherency is not an issue) is to use virtual addresses for the
tag lookup and comparison.  In this case, the OS must be aware of the
cache and invalidate all or part of it on a change of
virtual-to-physical mapping, which the OS controls.
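
A rough model of the coupling being described, with the tag compare done on
virtual addresses (the names and sizes are made up for illustration):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define NUM_LINES 256

    /* Each line is tagged with a *virtual* address, so a hit never waits
       on the TLB; that is the speed win for an instruction cache. */
    struct vline { bool valid; uint32_t vtag; };
    static struct vline icache[NUM_LINES];

    /* The price: when the OS changes a virtual-to-physical mapping (or
       switches address spaces), the old virtual tags may now name other
       physical memory, so the OS itself must invalidate the cache. */
    static void on_remap(void)
    {
        memset(icache, 0, sizeof icache);   /* invalidate every line */
    }

    int main(void)
    {
        on_remap();
        return 0;
    }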

--
	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)

torrie@cs.stanford.edu (Evan Torrie) (06/30/91)

kls30@duts.ccc.amdahl.com (Kent L Shephard) writes:

>According to Hennesy and Patterson - Computer Architecture a Quantitive
>Approach, pgs 423-425.    Assuming 53% references for instructions, an 8k
>unified cache vs a 4k instruction, 4k data cache; the results are as
>follows.
>Miss Rates
>SIZE             Instruction only     Data only      Unified
>8KB                  5.8%               6.8%          8.3%


>You get an overall miss rate of  6.27% for data/instruction seperate.
>You get an overall miss rate of  8.3% for unified.

 No argument here so far.

>>  But you still have only one internal path from the CPU to the cache, thus
>>cutting your bandwidth in half vs a split I/D Harvard architecture.  For an
>>example of why this is important, check out the parallelism in any of today's
>>microprocessors' pipelines.  
>>  Does Intel still use their 386 instruction prefetch buffer in the
>>486?  I suppose that should shore up some of the performance loss from
>>having a unified cache.
>>

>We know both companies claim a hit rate above 90% for their caches.
>You also forget that replacement algor. makes a lot of difference in the
>hit rate.   

  But both use LRU replacement, so they're equivalent.

>Also separate caches require that you have replacement algor.
>for both caches.  You also need hardware for both control circuits.

  Yes, but this isn't much chip area compared to the actual cache
storage.

>You need two sets of tag rams, etc.

  But each cache is only 4K => the total # of tag rams is exactly the same
in the 486's 8K cache vs the '040's 2x4K.

>Intel made a trade between 2-3% performance improvement vs. less chip area
>and complexity of design.  

  Uhhh, sorry.  This is where I violently disagree with you.  You make the
jump from miss ratio = 2-3% difference, to suddenly asserting that overall
performance improvement is only 2-3%.
  If we read from H&P again, pg. 423:

"Unlike other levels of the memory hierarchy, caches are sometimes divided 
into instruction-only and data-only caches.  Caches can contain that
can contain either instructions or data are unified caches, or mixed
caches.  The CPU knows whether it is issuing an instruction address or
a data address, so there can be separate ports for both, thereby
doubling the bandwidth between the cache and the CPU.  (Section 6.4 in
Chapter 6 shows the advantages of dual memory ports for pipelined
execution.)  Separate caches also offers the opportunity of optimising
each cache separately: different capacities, block sizes, and
associativities may lead to better performance.  SPLITTING THUS
AFFECTS THE COST AND PERFORMANCE FAR BEYOND WHAT IS INDICATED BY THE
CHANGE IN MISS RATES.
[my emphasis].

>They also got their product out the door a LOT
>faster than Motorola.

  But ended up being 20-25% slower.

>>  Moto went with separate caches because of their performance.

>Lets face it during design you make trade offs.  Intel made one Moto made
>another.

  Most trade-offs involve a choice.  For Intel, there was no such
choice if they wanted to retain their captive market.

>>
>>>Self modifying code would have broken the 386 with cache.
>>
>>  Not with a unified cache.

>Yes, with a unified cache you can break code that does weird things. 

  An example of such "weird things"?  

>With a separate cache you can break ill behaved code.

  Namely, self-modifying code.

>The OS does not have to be aware of the cache unless it wants to turn it
>on or off.  The CPU has to intimately know the cache.  The OS does not
>need to know it is there.  The OS needs to know about the TLB because the
>OS will handle page faults, loading descriptor tables, and just overall
>hadling of virtual memory.

  This depends on how simple your cache is.  If it's some large
second-level cache, then it's probably transparent to the OS.  But, 
high speed, on-chip caches more often than not require the attention
of the OS.  For example, virtually addressed caches require flushes on
context switches.  Copy-back caches require special handling by the
OS on I/O and shared memory operations (see H&P pg 467).
  In fact, with virtually addressed caches, a cache can even make its
presence felt up in the programming languages/OS interface.  See 
H&P pg 460 for an example.

>A cache should be transparent if someone tell you otherwise they are
>mistaken.  

  H&P tells me that the OS often has to be aware of the cache.  So
are they mistaken?

>I've designed memory management and cache controller units.
>The cache controller has always been transparent to the software.

  But these are probably physically addressed, write through caches, 
right?  Both of which can cause bottlenecks in a high performance
cache design.

>Even in multiprocessor systems the cache is transparent.  You would use
>a cache coherency protocol like MSI, MESI, MOESI, etc. and you would
>impliment all your algor. in hardware.


-- 
------------------------------------------------------------------------------
Evan Torrie.  Stanford University, Class of 199?       torrie@cs.stanford.edu   
"I didn't get where I am today without knowing a good deal when I see one,
 Reggie."  "Yes, C.J."