[comp.sys.3b1] Hardware freaks Unite

Mariusz@fbits.ttank.com (Mariusz Stanczak) (03/24/91)

No offence to the Subject, but I just finished "reading" 
an interesting article in Electronic Design (ISSN 0013-4872,
Vol 39, No. 5, March 14 1991, page 59) entitled "Upgrade a
68030-Based System With a Clever Cache Design".
The article outlines how to build and add to an existing
system (i.e. no changes to architecture) a cache controller
daughterboard with 5 chips (two cache comparators and three
PALs) plus 8 static RAM chips that (on the 68030) gives
a theoretical 29% speedup.  I can barely follow traces and
read signal names, but it appears that all (similar ;-))
signals that this project uses are on the 68010, SO maybe
the brave-at-heart hardware types would be interested in
looking into the feasibility of transferring the idea to the
3B1 hardware?  Involved project no doubt, but if possible,
it'd be a safe way to boost performance of the system (as
opposed to trying to fit a more efficient processor and
making the OS angry ;-)).

Lights off,

-Mariusz

P.S.	I'd be happy to send photocopies to anyone interested
	in pursuing the thought.
-- 
INET: Mariusz@fbits.ttank.com
CIS : 71601.2430@compuserve.com
UUCP: ..!uunet!zardoz!ttank!fbits!Mariusz

botton@i88.isc.com (Brian D. Botton) (03/26/91)

In article <95@fbits.ttank.com> Mariusz@fbits.ttank.com (Mariusz Stanczak) writes:
>No offence to the Subject, but I just finished "reading" 
>an interesting article in Electronic Design (ISSN 0013-4872,
>Vol 39, No. 5, March 14 1991, page 59) entitled "Upgrade a
>68030-Based System With a Clever Cache Design".
>The article outlines how to build and add to an existing
>system (i.e. no changes to architecture) a cache controller
>daughterboard with 5 chips (two cache comparators and three
>PALs) plus 8 static RAM chips that (on the 68030) gives
>a theoretical 29% speedup.  I can barely follow traces and
>read signal names, but it appears that all (similar ;-))
>signals that this project uses are on the 68010, SO maybe
>the brave-at-heart hardware types would be interested in
>looking into the feasibility of transferring the idea to the
>3B1 hardware?  Involved project no doubt, but if possible,
>it'd be a safe way to boost performance of the system (as
>opposed to trying to fit a more efficient processor and
>making the OS angry ;-)).
>

  Motorola has an app note that describes how to make a daughter board for
the 68020 to replace a 68000 or 68010, you can also have a 68881/2.  I would
dearly love to make this board, including support for vidpal, but I haven't
had the time.  I just don't have time to do both the kernel mods and the
hardware.
  BTW, the 68020 does have an instruction cache and the app note claims ~ 100%
increase in performance.  If anyone who has access to the code would like to
collaborate I would like the help.

--
     ...     ___	     ***
   _][_n_n___i_i ________  *******		Brian D. Botton
  (____________I_I______I_I_______I		laidbak!botton  or
  /ooOOOO OOOOoo  oo oooo  oo   oo		laidbak!bilbo!brian

john@chance.UUCP (John R. MacMillan) (03/27/91)

|...  Involved project no doubt, but if possible,
|it'd be a safe way to boost performance of the system (as
|opposed to trying to fit a more efficient processor and
|making the OS angry ;-)).

Unless the cache is of physical memory and completely transparent the
virtual memory system would have to know about it.  And if that is
the case, you probably wouldn't get as big performance wins because
the physical memory is changing a lot to handle the virtual address
space.

Of course the only way to find out is for one of you folks who know
which end of a soldering iron to hold to try it out... :-)

dnichols@ceilidh.beartrack.com (DoN Nichols) (03/28/91)

In article <1991Mar27.063904.10091@chance.UUCP> john@chance.UUCP (John R. MacMillan) writes:

	[ ... ]
>
>Of course the only way to find out is for one of you folks who know
>which end of a soldering iron to hold to try it out... :-)

	Oh, the soldering iron does a good job of teaching that, the first
time you pick it up wrong after it has been plugged in. :-)

	I remember an advertisement in something like Electronic Design, in
the early '70s, in which the company had gotten a nice-looking secretary to
model for the photos to illustrate their experienced assemblers, and she was
holding the iron (an Ungar, if I remember right) by the CERAMIC HEATING
ELEMENT!

	We got a few chuckles out of that!  (From her expression, the iron
wasn't plugged in.)

	DoN.
-- 
Donald Nichols (DoN.)		| Voice (Days):	(703) 664-1585
D&D Data			| Voice (Eves):	(703) 938-4564
Disclaimer: from here - None	| Email:     <dnichols@ceilidh.beartrack.com>
	--- Black Holes are where God is dividing by zero ---

Mariusz@fbits.ttank.com (Mariusz Stanczak) (03/28/91)

In article <1991Mar27.063904.10091@chance.UUCP>, john@chance.UUCP (John R. MacMillan) writes:
> |...  Involved project no doubt, but if possible,
> |it'd be a safe way to boost performance of the system (as
> |opposed to trying to fit a more efficient processor and
> |making the OS angry ;-)).
> 
> Unless the cache is of physical memory and completely transparent the
> virtual memory system would have to know about it.  And if that is

	Yes it is.  In this particular implementation 8MB is directly
    mapped into 64KB of SRAM as 128 pages (a page consists of 16K 4-byte
    lines, with address lines A2-A15 selecting among the 16K lines of each
    page).
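
The geometry described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions: the function name
`split_address` is made up here, and the tag bits (A16-A22) are inferred
from the 8MB and 64KB figures rather than taken from the article.

```python
# Geometry from the post: 8 MB of memory, 64 KB of SRAM, 4-byte lines.
MEMORY_BYTES = 8 * 1024 * 1024
CACHE_BYTES = 64 * 1024
LINE_BYTES = 4

LINES = CACHE_BYTES // LINE_BYTES    # 16K lines in the SRAM
PAGES = MEMORY_BYTES // CACHE_BYTES  # 128 pages compete for each line


def split_address(addr):
    """Split a physical byte address into (tag, index, offset).

    offset: A0-A1, the byte within a 4-byte line
    index:  A2-A15, which of the 16K lines is selected
    tag:    A16-A22 (assumed), which of the 128 pages the line belongs to
    """
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> 2) & (LINES - 1)
    tag = (addr >> 16) & (PAGES - 1)
    return tag, index, offset


print(LINES, PAGES)
print(split_address(0x010004))
```

Two addresses that share an index but differ in tag evict each other,
which is the usual cost of a direct-mapped design.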


> the case, you probably wouldn't get as big performance wins because
> the physical memory is changing a lot to handle the virtual address
> space.

	That may be, though I wouldn't know.  The calculations are
    done for the 68030, with the statement "This design gives an average
    hit/miss ratio of about 85%" as the base for arriving, through
    some more calculations that assume '030 clock cycle efficiency
    for reads and writes, at a 29% theoretical speedup.


> Of course the only way to find out is for one of you folks who know
> which end of a soldering iron to hold to try it out... :-)

	Yes... and in the meantime, would anybody with some formal
    training in these matters care to comment on how a VM OS would affect
    the stated theoretical speedup factors that are based on physical
    mapping and the R/W-instruction efficiency of a given processor?

    -Mariusz
-- 
INET: Mariusz@fbits.ttank.com
CIS : 71601.2430@compuserve.com
UUCP: ..!uunet!zardoz!ttank!fbits!Mariusz

jbm@uncle.uucp (John B. Milton) (04/06/91)

In article <1991Mar28.041451.906@ceilidh.beartrack.com> dnichols@ceilidh.beartrack.com (DoN Nichols) writes:
>In article <1991Mar27.063904.10091@chance.UUCP> john@chance.UUCP (John R. MacMillan) writes:
>
>	[ ... ]
>>
>>Of course the only way to find out is for one of you folks who know
>>which end of a soldering iron to hold to try it out... :-)
[Ungar soldering iron]

I remember an ad for Opus floppy disks, which showed a model wearing an Opus
T-shirt and holding a floppy WITH HER FINGERS RIGHT ON THE WINDOW. The next
month I saw the exact same ad, only she was holding the floppy by the side...

John
-- 
John Bly Milton IV, jbm@uncle.UUCP, n8emr!uncle!jbm@osu-cis.cis.ohio-state.edu
(614) h:252-8544, w:785-1110; N8KSN, AMPR: 44.70.0.52; Don't FLAME, inform!

agodwin@acorn.co.uk (Adrian Godwin) (04/15/91)

In article <1991Mar26.023948.3966@i88.isc.com> botton@i88.isc.com (Brian D. Botton) writes:
>  Motorola has an app note that describes how to make a daughter board for
>the 68020 to replace a 68000 or 68010, you can also have a 68881/2.  I would
>dearly love to make this board, including support for vidpal, but I haven't
>had the time.  I just don't have time to do both the kernel mods and the
>hardware.
>  BTW, the 68020 does have an instruction cache and the app note claims ~ 100%
>increase in performance.  If anyone who has access to the code would like to
>collaborate I would like the help.

A number of companies sell daughter boards - some of them based on the Motorola
app. note - as an upgrade for the Commodore Amiga. It seems to be generally held
that they only manage about a 40% performance increase UNLESS they're also fitted
with 32-bit memory - otherwise, the use of the '020 in 16-bit mode severely limits
its performance. 

However, since most of these boards directly replace the Amiga's 64-pin DIP 68000,
and are intended to fit inside the limited space available inside an A500, they 
may be usable on the 3b1. This would at least reduce the problem to the
software part and would provide, in addition, a cheap memory upgrade path.

Apologies if the thought of putting an Amiga upgrade inside your 3b1 offends you :-).

-adrian

-- 
--------------------------------------------------------------------------
Adrian Godwin                                        (agodwin@acorn.co.uk)

ward@tsnews.Convergent.COM (Ward Griffiths) (04/16/91)

agodwin@acorn.co.uk (Adrian Godwin) writes:

>Apologies if the thought of putting an Amiga upgrade inside your 3b1 offends you :-).

Not like an idea I've seen elsewhere about designing a '286
accelerator for the DOS-73 board.

-- 
          Ward Griffiths, Unisys NCG aka Convergent Technologies
The people that make Unisys' official opinions get paid more.  A LOT more.
===========================================================================
To Hell with "Only One Earth"!  Try "At Least One Solar System"!

If I say love, I'll sound sentimental, and if I say sex, I'll sound cynical.
I'll call it pair bonding and sound scientific.         The Golden Apple

Mariusz@fbits.ttank.com (Mariusz Stanczak) (04/17/91)

In article <6386@acorn.co.uk>, agodwin@acorn.co.uk (Adrian Godwin) writes:
> A number of companies sell daughter boards - some of them based on the
> Motorola app. note - as an upgrade for the Commodore Amiga. It seems to be
> generally held that they only manage about a 40% performance increase 
> UNLESS they're also fitted with 32-bit memory - otherwise, the use of
> the '020 in 16-bit mode severely limits its performance. 

	Yes, so do others for the Mac... the question for me would be
    are those boards system specific, i.e. do they assume (and "solve")
    anything about the hardware/software quirks of the given system
    for which they are marketed?


> However, since most of these boards directly replace the Amiga's 64-pin 
> DIP 68000, and are intended to fit inside the limited space available 
> inside an A500, they may be useable on the 3b1. This would at least reduce 
> the problem to the software part [...]

	So, as above... what would be the pitfalls?  If such a "card"
    wouldn't be hardware specific, and (correct me anybody) since most
    of *nix is written in C, the stuff written for '010 should run on
    '040 as the latter is a superset of the former.  If the compiler
    writer didn't screw it up in the first place, that is (but then the
    chances are that that was not the case as, most likely, the compiler
    used was AT&T's "portable" `pcc').


> Apologies if the thought of putting an Amiga upgrade inside your 3b1 
> offends you :-).

	Could it? ;-)  Hey, for me, if it works (even with ONLY(?) a 40%
    boost) I'll take it NOW, even if it meant having to add an "a"
    (for Amiga) to the 3b1 logo on the nameplate ;-) (but NOT if it meant
    being banned from this newsgroup for committing such heresy)


> -adrian
> --------------------------------------------------------------------------
> Adrian Godwin                                        (agodwin@acorn.co.uk)


-Mariusz
-- 
INET: Mariusz@fbits.ttank.com
CIS : 71601.2430@compuserve.com
UUCP: ..!uunet!zardoz!ttank!fbits!Mariusz

agodwin@acorn.co.uk (Adrian Godwin) (04/18/91)

In article <100@fbits.ttank.com> Mariusz@fbits.ttank.com (Mariusz Stanczak) writes:
>In article <6386@acorn.co.uk>, agodwin@acorn.co.uk (Adrian Godwin) writes:
>> A number of companies sell daughter boards - some of them based on the
>> Motorola app. note - as an upgrade for the Commodore Amiga. It seems to 
>
>	Yes, so do others for the Mac... the question for me would be
>    are those boards system specific, i.e. do they assume (and "solve")
>    anything about the hardware/software quirks of the given system
>    for which they are marketed?

They may assume things about the bus timing - the critical timings on one
system may differ from those on another, and these add-ons aren't going to be as
well specified as say, an emulator. One card I know of adds some memory at a
fixed address (though it might be possible to change it with a different PAL)
- however, it does it at an address that can't exist on a 24-bit-address
68000 so it can't possibly conflict. Control registers may - or may not -
be similar.

>> inside an A500, they may be useable on the 3b1. This would at least reduce 
>> the problem to the software part [...]
>
>	So, as above... what would be the pitfals(sp).  If such a "card"
>    wouldn't be hardware specific, and (correct me anybody) since most

As I understand it, the pitfalls are mostly in the kernel. The higher 680x0
processors have a different interrupt/exception stack frame to the 68000, and the 
system needs also to recognise memory that isn't in its normal addressing 
range. A different mechanism is used to load the PSR.

Perhaps there are other differences too - it's only a drop-in replacement 
for the Amiga because its OS knows about alternative processors.

Even so, some applications do things they shouldn't, and fall over. This
should be less of a problem on the 3b1, because Unix applications are much
better behaved than those for OSs without process protection - but don't
forget to fix the diagnostics as well as the real OS ! Or use an upgrade
card that makes it possible to switch between a 68020 and a real 68000.

-- 
--------------------------------------------------------------------------
Adrian Godwin                                        (agodwin@acorn.co.uk)

murphyn@motcid.UUCP (Neal P. Murphy) (04/19/91)

agodwin@acorn.co.uk (Adrian Godwin) writes:

>In article <1991Mar26.023948.3966@i88.isc.com> botton@i88.isc.com (Brian D. Botton) writes:
>>  Motorola has an app note that describes how to make a daughter board for
>>the 68020 to replace a 68000 or 68010, you can also have a 68881/2.  I would
>>dearly love to make this board, including support for vidpal, but I haven't
>>had the time.  I just don't have time to do both the kernel mods and the
>>hardware.
>>  BTW, the 68020 does have an instruction cache and the app note claims ~ 100%
>>increase in performance.  If anyone who has access to the code would like to

I happened to find this app note in a drawer in the semiconductor sales
office here. The figures in it seemed to indicate 40%-60% increase in
performance. I believe this board just directly replaces the '000 or '010
with an '020 and an '881.

Of course, I had the wild idea that one could strap two Unix-PC mother-
boards together to create a 32-bit bus. This would probably require
major low-level code changes, though...

NPN

cgy@cs.brown.edu (Curtis Yarvin) (04/21/91)

In article <6935@bone13.UUCP> murphyn@motcid.UUCP (Neal P. Murphy) writes:
>Of course, I had the wild idea that one could strap two Unix-PC mother-
>boards together to create a 32-bit bus. This would probably require
>major low-level code changes, though...

"I had the wild idea that one could glue two ostriches together to create a
camel.  This would probably require major advances in adhesive technology,
though..."

"I had the wild idea that one could cut a Sun-3 in half to create two
souped-up Unix-PCs.  This would probably require extremely careful sawing,
though..."

et cetera.

Curtis

"I tried living in the real world
 Instead of a shell
 But I was bored before I even began." - The Smiths

Mariusz@fbits.ttank.com (Mariusz Stanczak) (04/21/91)

In article <6464@acorn.co.uk>, agodwin@acorn.co.uk (Adrian Godwin) writes:
> They may assume things about the bus timing - the critical timings on one
[...]
> Control registers may - or may not - be similar.
[...]
> The higher 680x0 processors
> have a different interrupt/exception stack frame to the 68000, and the 
> system needs also to recognise memory that isn't in its normal addressing 
> range. A different mechanism is used to load the PSR.
[...]
> -- 
> --------------------------------------------------------------------------
> Adrian Godwin                                        (agodwin@acorn.co.uk)

Thank you for your comments, Adrian.  For those reasons, especially
the stack frame differences (that affects everything!), a drop-in uP
upgrade doesn't sound like a viable route to go, though I'd love to be
proven otherwise.  A cache would be a much less obtrusive way to boost
a little the, already adequate, performance of this machine, with full
compatibility assured.  ...it'd be cheaper too ;-) (well maybe not, 25ns
memory still isn't a commodity item, but SIMPLER it would be for sure!)

-Mariusz
-- 
INET: Mariusz@fbits.ttank.com
CIS : 71601.2430@compuserve.com
UUCP: ..!uunet!zardoz!ttank!fbits!Mariusz

tkacik@kyzyl.mi.org (Tom Tkacik) (04/22/91)

In article <101@fbits.ttank.com>, Mariusz@fbits.ttank.com (Mariusz Stanczak) writes:
>                    A cache would be a much less obtrusive way to boost
> a little the, already adequate, performance of this machine, with full
> compatibility assured.  ...it'd be cheaper too ;-) (well maybe not, 25ns
> memory still isn't a commodity item, but SIMPLER it would be for sure!)

A drop-in cache would be an expensive way not to gain any performance
at all in a 3b1.  A cache is simply a way to make main memory look
faster.  It works well on machines whose processor can use 25ns memory,
but have to settle for cheaper 80ns main memory.  A little 25ns cache can
make megabytes of 80ns memory look faster.

On the 3b1 all of the memory runs at full speed.  The 68010 does not use
any wait states (even for expansion memory).  A cache
cannot speed it up at all.  Just think of your 3b1 as already having
4Meg of cache.
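
A rough way to see the argument: if a cache hit and a main-memory access
cost the same number of clocks, the average cannot improve.  The sketch
below assumes a simple hit/miss model.  The function names and the 6-clock
"slow memory" case are hypothetical; the 85% figure is the hit rate quoted
earlier in the thread, and 4 clocks is the minimum 68000-family bus cycle.

```python
def average_access_clocks(hit_rate, hit_clocks, miss_clocks):
    """Average clocks per memory access with a cache in front of main memory."""
    # Written as "miss cost minus the savings from hits" so that equal
    # hit and miss costs give back exactly the uncached figure.
    return miss_clocks - hit_rate * (miss_clocks - hit_clocks)


def cache_speedup(hit_rate, hit_clocks, miss_clocks):
    """Ratio of uncached access time to cached average access time."""
    return miss_clocks / average_access_clocks(hit_rate, hit_clocks, miss_clocks)


# 3b1-like case: main memory already answers in the minimum 4-clock cycle.
print(cache_speedup(0.85, 4, 4))  # 1.0, i.e. no gain

# Hypothetical machine whose main memory needs 2 wait states (6 clocks).
print(round(cache_speedup(0.85, 4, 6), 2))
```

This only models memory access time; it says nothing about how much of a
program's run time is spent waiting on memory in the first place.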
-- 
Tom Tkacik                |
tkacik@kyzyl.mi.org       |     To rent this space, call 1-800-555-QUIP.
...!rphroy!kyzyl!tkacik   |

Mariusz@fbits.ttank.com (Mariusz Stanczak) (04/24/91)

In article <326@kyzyl.mi.org>, tkacik@kyzyl.mi.org (Tom Tkacik) writes:
> On the 3b1 all of the memory runs at full speed.  The 68010 does not use
> any wait states, (even for expansion memory).  A cache
> cannot speed  it up at all.  Just think of your 3b1 as already having
> 4Meg of cache.
Thinking, thinking... hmmm.  Makes sense, and if that's the case, it just
shows how little I know about the hardware... the article in Electronic
Design indeed uses a '030 @ 33MHz, though there's nothing about wait 
states (only about a "retry mode" of the '030).  Are wait states all
there is to lower-than-possible processor performance, or at
11MHz with 150ns memory do such factors (if any) not come into play?
(just curious).

-Mariusz
-- 
INET: Mariusz@fbits.ttank.com
CIS : 71601.2430@compuserve.com
UUCP: ..!uunet!zardoz!ttank!fbits!Mariusz

tkacik@hobbes.cs.gmr.com (Tom Tkacik CS/50) (04/25/91)

In article <102@fbits.ttank.com>, Mariusz@fbits.ttank.com (Mariusz
Stanczak) writes:
|> In article <326@kyzyl.mi.org>, tkacik@kyzyl.mi.org (Tom Tkacik) writes:
|> > On the 3b1 all of the memory runs at full speed.  The 68010 does not use
|> > any wait states, (even for expansion memory).  A cache
|> > cannot speed  it up at all.  Just think of your 3b1 as already having
|> > 4Meg of cache.
|> Thinking, thinking... hmmm.  Makes sense, and if that's the case, it just
|> shows how little I know about the hardware... the article in Electronic
|> Design indeed uses a '030 @ 33MHz, though there's nothing about wait 
|> states (only about a "retry mode" of the '030).  Is 'wait states' all
|> there is about lower than possible processor performance, or at the
|> 11MHz with 150ns memory such factors (if any) don't come into play?
|> (just curious).

The 3B1 uses a 68010 processor running at 10MHz.  The rest of the system
was designed to complement that processor.  The memory can be accessed by
the processor as fast as the processor can go (i.e., no wait states).
Memory speed is not the bottleneck.  The processor is the bottleneck.

Maybe a little explanation of wait states is in order.
The 68010 uses 4 clock cycles to access one word of memory.
At 10MHz, that's 400ns.  The memory used in the 3B1 is 150ns.  With all of
the delays on the motherboard (and through the 68010 itself), that works
out about right.  When slower memory is used (like the ROM for example),
400ns is not enough time to get the data from the memory to the processor.
When this happens, the processor must be stopped until the memory is ready.
The processor stops by being told to wait an extra clock cycle,
(adding 100ns to the available memory access time).  Because the processor
cannot do anything useful during this time, it is called a 'wait state'.

The 3B1 has a slow access mode (used for ROM and slow I/O devices)
that adds 5 wait states (extra clock cycles).
(900ns should be enough time for just about anything).
But all of the main memory uses no wait states.
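
The arithmetic above can be checked with a short sketch.  The function
name `bus_cycle_ns` is made up for illustration; the 4-clock base cycle,
the 10MHz clock and the 5 wait states are the figures given in this post.

```python
def bus_cycle_ns(clock_mhz, wait_states=0, base_clocks=4):
    """Time for one memory access: base clocks plus one clock per wait state."""
    clock_ns = 1000.0 / clock_mhz  # one clock period in nanoseconds
    return (base_clocks + wait_states) * clock_ns


print(bus_cycle_ns(10))     # main memory, no wait states: 400.0
print(bus_cycle_ns(10, 5))  # slow access mode, 5 wait states: 900.0
```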

Some of the newer workstations, which use much faster processor speeds
(up to 66MHz in the case of the new HP 700 series), allow memory only about
15ns to give up its data.  Megabytes of memory that fast would cost a fortune.
Many wait states are required.  A good cache will allow the processor to
average fewer wait states. 

Higher performance in the 3b1 will only be gained by either
increasing the processor speed,
or by putting in a more powerful processor (68020 or 68030).

The problem with putting in a 68020 or 68030 has been hashed out before,
(the interrupt stack frame problem, which required kernel hacks to get it
to work).

The problem with putting in a faster 68010 is twofold.  First, I am not aware
of a faster 68010.  I think that 10MHz is the fastest Motorola made
(though I am sure I will be corrected on that if I'm wrong :-).
If a faster 68010 could be found, then the clock speed could be increased.
Second, faster memory would then be needed (and a cache could be used).
However, I believe that the rest of the system was also designed to run
at 10MHz, and more than just memory would start to fail.

--
Tom Tkacik
GM Research Labs
tkacik@hobbes.cs.gmr.com

Mariusz@fbits.ttank.com (Mariusz Stanczak) (04/29/91)

In article <51582@rphroy.UUCP>, tkacik@hobbes.cs.gmr.com (Tom Tkacik CS/50) writes:
> The 3B1 uses a 68010 processor running at 10MHz.  The rest of the system
[...]
> Memory speed is not the bottleneck.  The processor is the bottleneck.
[...]
> The 68010 uses 4 clock cycles to access one word of memory.
> At 10MHz, that's 400ns.  The memory used in the 3B1 is 150ns.  With all of
> the delays on the motherboard, (and through the 68010 itself), that works
> out about right.  When slower memory is used, (like the ROM for example),
[...]
> Higher performance in the 3b1 will only be gained by either
> increasing the processor speed,
> or by putting in a more powerful processor (68020 or 68030).
[...]
> The problem with putting in a 68020 or 68030 has been hashed out before,
> (the interrupt stack frame problem, which required kernel hacks to get it
> to work).

Good stuff... the right numbers tell (to the right person) the story.
And, that just about closes the topic... doesn't it? (at least for me...
I better leave speedup ideas to the better equipped).  Still it'd be
nice (dream on boy ;-)).  And the one I still have (a "realizable fantasy")
is a SCSI I/F.  How is that progressing?

-Mariusz
-- 
INET: Mariusz@fbits.ttank.com
CIS : 71601.2430@compuserve.com
UUCP: ..!uunet!zardoz!ttank!fbits!Mariusz

Alvin@cup.portal.com (Alvin Henry White) (04/29/91)

This is kind of a strange group.  I think it is the first I  have seen where
people write in to say "I don't know how to do that and I wouldn't be 
interested in doing it even if someone else figures out how to do it."
-alvin
Alvin H. White, Gen. Sect.
G.O.D.S.B.R.A.I.N.
Government Online Database Systems
Bureau for Resource Allocations to Information Networks
 alvin@cup.portal.com