[comp.sys.apple] The Apple //f: A Possible Future for the Apple //

toddpw@tybalt.caltech.edu (Todd P. Whitesel) (02/22/90)
		The Apple //f: A Possible Future for the Apple //
				by Todd Whitesel

     This paper has one purpose: to define a successor to the //gs that will
take on the Amiga and win. The Apple // needs public commitment from Apple if it
is to survive, and it MUST have a long term strategy if it is to come out on top
in the low end market. The Apple // is about the only machine left which is
simple enough to be readily made into an inexpensive performer for the 90's.

     While this is a tall order, I think Apple's exhaustive push into the
business market has left it uniquely equipped. Now that they're inviting
suggestions (and resumes, perhaps?) it is possible to present the results of
much investigation into the //gs architecture and have them considered
seriously.

     I'm going to start with immediate changes to the current system package,
and then delve into the motherboard where the radical ideas lie. Cost
considerations are Apple's prerogative, but if we invest some decent money and
think ahead this time it will more than pay off in long term sales. The
motherboard should have connectors to accept things like VRAMs and genlock as
they become available, and designing for them now (while the chip set is still
on the drawing board) will save everyone lots of trouble and money later on.

     (Non-disclaimer: We certainly won't get a machine like this overnight, but
a goal to shoot for is important too. If some of these ideas sound really
ludicrous, or I forgot something, mail me why at the addresses given later.)

     And now... the Apple //f, a machine positioned for small business, home
productivity, education, MIDI, animation, overlay, and general hacking.

     The case is about the size of a Mac IIci, with room for two internal
drives, the power supply, and expansion cards. The computer can sit on its side,
and the top of the case is molded inside to support peripheral cards while in
this position.

     The drive mechanisms from Apple 3.5's and UniDisks can be mounted
internally with a simple installation kit. The SWIM Disk Port connector on the
back can connect Disk ][s and extra UniDisks. The Disk Port is run by an
improved version of the //c+ disk coprocessor which can DMA disk blocks into
memory.

     Internal hard drives are an option and are sold with all but the bare
minimum system configurations. The SCSI port is capable of synchronous SCSI and
uses DMA. It supports both a port on the back and an internal connector.

     The earphone jack outputs true stereo, and the analog section is better
isolated for noise immunity. A volume control is accessible from outside the
case.

     The Apple RGB has stereo RCAs and a composite input. This complements the
CPU's stereo and the monitor can now be used with VCRs, saving budget conscious
buyers the expense of a separate television.

     HyperCard GS (or equivalent) is provided as part of the System Software
package. An HFS FST is included.

     A choice of keyboard is provided for those who prefer the Standard ADB
Keyboard or a third party keyboard.

     The Game I/O port is connected to previously unused pins on the ADB
microcontroller, which recognizes a Game I/O joystick as an ADB joystick and
performs the software timing loops required to read it. This frees up the CPU
and makes the joystick attractive as a controller for new games.

     The CPU and its system components can be purchased separately, allowing
customers to easily personalize their system package.

     Disk ][s are sold separately from the CPU, as few new users need them.

     The power supply is a heavy duty one on par with Applied Engineering's and
has a quiet fan built in. The fan's air flow is simple and efficient: in through
the fan, over the motherboard and between the expansion cards, and out.

     Every component of the system is backed by a one year warranty.

     A list of stable mail and Email addresses, those of the Apple Computer
Customer Input Division (or something like that), is included with every
product.

     The 65816/65832 runs at 7 MHz or better from a user-upgradable cache
system or processor direct slot. The cache controller has a direct path to main
RAM. Virtual memory is provided for by running appropriate signals to the
processor direct slot. Latch-on-write is implemented to cushion the effect of
wait-stated writes.

     The custom chip set is completely new. The internal architecture is well
balanced and supports enough DMA bandwidth to move data around without adversely
affecting the CPU. All peripherals require less CPU attention because they are
controlled by coprocessors or are DMA compatible.

     The CPU controller implements true vectored interrupts. Software interrupt
signatures are latched by the controller, and peripheral interrupt lines are fed
to a priority encoder whose output is used to generate the vector address of an
interrupt request. This reduces interrupt overhead to a bare minimum and allows
efficient emulation of 65832 COP floating point unit instructions (perhaps via
Floating Point Engine drivers). The BRK opcode is reserved for debugging or
perhaps software emulation of the WDM instruction's future functions. Other CPU
lines, especially RDY, are used to advantage by the system or at least made
available on the processor direct slot.
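
     To make the priority encoder idea concrete, here is a minimal sketch in
Python. It is only an illustration: the vector table base, the two byte vector
size, and the rule that line 0 has the highest priority are all assumptions
made for the example, not part of any Apple or WDC specification.

```python
# Hypothetical model of priority-encoded interrupt vectoring.
# VECTOR_BASE, VECTOR_SIZE and the "line 0 wins" rule are assumptions.
VECTOR_BASE = 0xFF00   # assumed start of the vector table in bank 0
VECTOR_SIZE = 2        # assumed 16 bit vectors


def encode_priority(irq_lines):
    """Return the lowest-numbered asserted line, or None if none are asserted.

    irq_lines is a bit mask: bit n set means peripheral line n is requesting.
    """
    for line in range(irq_lines.bit_length()):
        if irq_lines & (1 << line):
            return line
    return None


def vector_address(irq_lines):
    """Return the address the controller would fetch the handler vector from."""
    line = encode_priority(irq_lines)
    if line is None:
        return None
    return VECTOR_BASE + VECTOR_SIZE * line
```

     With lines 1 and 2 both asserted, the encoder picks line 1 and the handler
address is fetched directly, with no software polling of each device.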

     The memory map is fleshed out and is fully supported by the chip set. To
keep the cost reasonable, no motherboard is sold 'full'; the expectation is that
falling chip prices will allow users to upgrade by themselves or via third party
expansion cards.

     All internal peripherals and I/O run at 3.58 MHz or faster. This simplifies
bus synchronization and DMA logic, and improves internal I/O performance
immensely. A memory map outline reads as follows:

     Banks 00-BF: Main RAM. Four SIMM sockets, maybe eight, rated 120 ns at the
slowest and probably faster. RAM configurations with 256K / 1M / 4M SIMMs in small upgrade
steps are supported, and all 12 Megabytes are DMA compatible. The DRAM
controller supports DRAM page mode operation to assist the cache system.

     Banks C0-CF: Video RAM expansion. A connector for up to 1 Meg of DMA
compatible VRAM in addition to the 128K standard VRAM in banks $E0-E1. 24 bit
frame buffer cards are expected in the future when VRAM prices drop. All VRAM is
controlled by a fully programmable video generator and graphics coprocessor,
hopefully redefining the basic video hardware for the last time. A Game Graphics
Toolset is introduced in versions for both the //gs Turborez and the //f
blitter, allowing blit-aware games to run on both platforms.

     Banks D0-DF: Sound RAM and expansion. The sound RAM is memory mapped and is
DMA compatible, simplifying the sound tools immensely. New sound tools enforce
memory management of the sound RAM, but the low 64K is reserved for old sound
tools if they are running. An enhanced DOC supports the new sound RAM,
eliminates the swap mode bug, and supports a high quality digitizer.

     Bank E0: Largely compatible with the present //gs memory map. Banks $E0 and
$E1 are stored in VRAMs, thereby reducing the video display overhead of standard
Apple // video modes to one access per scan line. Super hires display overhead
is considerably reduced. The VRAMs are controlled by a simple priority system,
with priority given to video refresh, CAS before RAS DRAM refresh, slow (slot)
DMA, CPU, and fast DMA, in that order. Should the CPU have to wait, the RDY line
is used to insert wait states.

     Bank E1: Also compatible with the //gs, but with I/O space given to
internal peripherals. DOC registers are memory mapped to simplify note
sequencing. SCC registers are memory mapped to assist time-critical serial
drivers.

     Banks E2-EF: Fast I/O space. A contiguous 128K per slot is reserved for
fast bus extensions. (This is my sketchiest idea -- more later.)

     Banks F0-FF: ROM for tools, fonts, a 16 bit BASIC that is toolbox aware
(Beagle Compiler meets the toolbox?), and a built-in ROMdisk driver. No ROM
address space is wasted on the ROMdisk itself; a connector for third party EPROM
disks supports a 'slinky' and DMA compatible 16 Megabyte ROMdisk. A lightning
fast WORM disk can be made using currently available 128K EPROMs and one EEPROM
to store directories and keep track of rewritten blocks. The toolbox aware BASIC
is important because it lets people just fool around on the Apple again. If the
interpreter is properly extended to support pointers and structures (and better
flow controls), acceptable applications and games could be written and
distributed in BASIC. This would open up the machine to a new generation of
hackers and casual programmers; while the project could take a while, it would
be well worth it.
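
     The outline above can be checked with a small table. The sketch below is
in Python; the bank ranges come straight from the outline, while the function
names and the rule that fast I/O slots are packed in order starting at bank E2
are assumptions made for illustration.

```python
BANK_SIZE = 64 * 1024  # one 65816 bank is 64K

# (first bank, last bank, description), taken from the outline above.
MEMORY_MAP = [
    (0x00, 0xBF, "Main RAM"),
    (0xC0, 0xCF, "Video RAM expansion"),
    (0xD0, 0xDF, "Sound RAM and expansion"),
    (0xE0, 0xE1, "Standard VRAM and I/O"),
    (0xE2, 0xEF, "Fast I/O space"),
    (0xF0, 0xFF, "ROM"),
]


def region_size(first, last):
    """Size in bytes of an inclusive range of banks."""
    return (last - first + 1) * BANK_SIZE


def region_for(address):
    """Return the description of the region holding a 24 bit address."""
    bank = (address >> 16) & 0xFF
    for first, last, name in MEMORY_MAP:
        if first <= bank <= last:
            return name
    raise ValueError("bank %02X is not in the map" % bank)


def fast_io_banks(slot):
    """Banks for slot 1..7, ASSUMING slots are packed in order from bank E2."""
    if not 1 <= slot <= 7:
        raise ValueError("slot must be 1..7")
    first = 0xE2 + 2 * (slot - 1)
    return first, first + 1
```

     The arithmetic works out: banks 00-BF are 12 Megabytes, C0-CF are 1 Meg,
and the fourteen banks E2-EF are exactly seven slots of 128K each.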

			*    *    *

     How hard would it actually be to implement or provide for all of this? No
one but Apple can answer that. But while they're still open to ideas, here are
some comments and some specific ideas in various stages of development. (Sorry
if I repeat something, I'm pretty much going through the 2nd Ed. Hardware
Reference to jog my memory as I write this.)

     "[This manual] .... will also be useful to anyone wanting to know how to
take advantage of all the features of these computers." (p. xix)
     "The Apple IIGS is, above all, an Apple II." (p. xx)

     Bravo. Couldn't put it better.

     "The design of the Apple IIGS is radically different from that of the
standard Apple II." (p. 11)

     I don't know about that. As somebody on America Online griped (and started
a folder to say so), it's more like a //e on steroids, primarily because of the
Mega //. The Mega // is more like an IOU and an MMU and some glue on one chip,
and because it has to run at 1 MHz the VGC goes through all sorts of contortions
to make super hires work. Worse, the Mega // has lots of circuitry that has to
be duplicated in the FPI and the VGC, increasing the cost and complexity of an
otherwise simple design.

     Please don't let this happen again.

     At a bare minimum, all the memory systems should be running on 3.58M.
Simple priority logic and RDY-inserted wait states can be used to arbitrate CPU
and DMA access to the video and sound RAM.
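
     A rough model of that arbitration, in Python, using the priority order
given for the VRAMs in the memory map outline (video refresh, DRAM refresh,
slow DMA, CPU, fast DMA). The function name and the way requests are
represented are assumptions; real hardware would do this in a few gates per
memory cycle.

```python
# Priority order from the memory map outline, highest first.
PRIORITY = ["video refresh", "DRAM refresh", "slow DMA", "CPU", "fast DMA"]


def arbitrate(requests):
    """Decide who gets the next memory cycle.

    requests is a set of requester names drawn from PRIORITY.
    Returns (winner, rdy): winner is the granted requester (or None), and
    rdy is False when the CPU asked for the cycle but lost, meaning the RDY
    line is pulled low and the CPU sits in a wait state.
    """
    winner = None
    for who in PRIORITY:
        if who in requests:
            winner = who
            break
    cpu_must_wait = "CPU" in requests and winner != "CPU"
    return winner, not cpu_must_wait
```

     The CPU only sees a wait state when something more urgent wants the same
cycle; fast DMA simply yields to the CPU.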

     Slow side synchronization is a major killer in time critical operations.
This is why all internal I/O should be run at 3.58M with RDY wait states.

     The whole machine should be DMA compatible, especially if a DMA controller
is added that transfers data across the bus instead of using the read/write
approach. The cost of that twofold speedup might be too high, though;
while adding DMA address counters to the glue chips would work (they generate
the multiplexed RAM addresses anyway) it would also require extra request and
acknowledge lines. In any case, a DMA controller that handles all the bulk data
moving is a must, especially if it can do many channels concurrently without too
much overhead. On the fly Run-Length-Encoding decompression would be easy to add
in and would make compressed animation possible (without a blitter and even
without fill mode).
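
     For reference, run-length decoding really is simple enough to do on the
fly. The Python sketch below assumes one particular encoding, pairs of (count,
value) bytes, purely for illustration; a real DMA controller could use any
scheme.

```python
def rle_decode(data):
    """Expand (count, value) byte pairs into the original byte stream.

    The pair format is an assumption for this example, not a defined format.
    """
    if len(data) % 2 != 0:
        raise ValueError("input must be whole (count, value) pairs")
    out = bytearray()
    for i in range(0, len(data), 2):
        count, value = data[i], data[i + 1]
        out.extend(bytes([value]) * count)  # emit `value` count times
    return bytes(out)
```

     In hardware this is one counter and one holding register in the DMA data
path: load the pair, then repeat the write until the counter runs out.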

     Video stuff: The video really needs to be redone on one chip. This saves a
lot of extra logic, and if VRAMs are used for banks $E0-E1 then the refresh
overhead is pretty negligible. It also sets us up for bigger VRAMs in the
expansion slot, which should be more of a direct path into and out of the VRAMs
than an actual video card. The actual video expansion connector would have all
the control lines on it, like sync, clocks and video (including special signals
like 'keycolor' for overlay purposes), but also have access to the VRAM shift
registers, because a digitizer card could feed data into the shift registers for
real-time snapshots of a video frame (even interlaced ones).

     It would be great if the reduced logic 'classic video generator' was then
enhanced by making it more programmable, say by adding address registers for the
various buffers (on reset they assume //gs values of course), expanding the SCB
into a real word and allowing pixel sizes up to 24/32 bits, programmable dot
clocks, character clocks, sync timing and interlace (for use with arbitrary
monitors), and external synchronization for genlock. Allowing the use of
different dot clocks is important because it simplifies super hires and higher
resolution video modes. (By switching crystals in mid stream, you get your
different dot clock with only minor glitches in the output, because the sync
counters just keep going, and the character clock can be reprogrammed fairly
quickly.) In fact, mixing of modes would be nice because then you could have an
old game running with a black & white image of its screen in a window. Some
pretty neat software and hardware tricks would be needed to pull this off,
though.

     The main reason I want a fully programmable video generator is that we will
be stuck with our chip set for as long as we own the computer, and we want that
to last a long time. The Amiga chips are socketed and you can bet Commodore is
working on revisions. Unfortunately (for them) their video still uses DMA
entirely so there is a fairly low limit on the number of colors they can display
at high resolution. Since 64K VRAMs are getting cheap and 256K's will come down
in a few years (I hope) we can get a jump on them by planning for the new
technology now.

     Line drawing and area fill (as blitter features) would also come in handy
for compressed animation and Quickdraw primitives, and quadratic curve drawing
for rasterization of Royal fonts. (I don't know how tough any of this is to
implement so I'll keep my mouth shut.)

     Another bizarre idea I had was that of making the blitter speak regions,
which are not entirely Rects (the usual blitter object) but could be lists of
Rects. Actually, a region could be rasterized into a list of horizontal strips
and that could be compactly represented, so that might be a blitter structure to
use. Again, I don't know how complicated that would be. But it would be nice if
you could use the same structure to specify an arbitrary shaped frame buffer
(and maybe the graphics modes of each part?) to both the blitter and the video
generator, and since they would probably be the same chip then it should save us
some logic. This is one trick that can get windows in hardware, and while large
sprites (or two whole movable display rectangles) would be better, it would
allow the ultimate in page flipping.
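
     Here is one way the "list of horizontal strips" structure could look,
sketched in Python. The Rect layout (left, top, right, bottom, with right and
bottom exclusive) and the strip format are assumptions chosen for the example;
they are not Quickdraw's actual region format.

```python
def region_to_strips(rects):
    """Turn a list of rects into horizontal strips.

    Each rect is (left, top, right, bottom), right and bottom exclusive.
    Returns a list of (y0, y1, spans) where spans is a sorted list of
    non-overlapping (x0, x1) runs covered on every scan line y0 <= y < y1.
    """
    rects = [r for r in rects if r[0] < r[2] and r[1] < r[3]]  # drop empties
    # Every rect edge is a place where the covered spans can change.
    ys = sorted({y for (_, top, _, bottom) in rects for y in (top, bottom)})
    strips = []
    for y0, y1 in zip(ys, ys[1:]):
        spans = sorted((l, r) for (l, t, r, b) in rects if t <= y0 and b >= y1)
        merged = []
        for l, r in spans:
            if merged and l <= merged[-1][1]:
                # Overlaps or touches the previous run: extend it.
                merged[-1] = (merged[-1][0], max(merged[-1][1], r))
            else:
                merged.append((l, r))
        if merged:
            strips.append((y0, y1, merged))
    return strips
```

     An L-shaped region made of two Rects comes out as two strips, each with a
single run, which is about as compact as a blitter or video generator could
ask for.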

     What we really could use is a hardwired Quickdraw coprocessor that does
blit operations and understands fairly sophisticated graphics objects. The
AM29000 is (I believe) too expensive and (IMHO) overkill because you can build a
workstation around one of those things. The TMS34010 would be a nice choice
(it's used by many VGA cards, for example, and is only a 68 pin PLCC). It has
many of the programmable capabilities I'd like to see in a video generator, but
I don't know how much it costs in quantity.

     I think the best solution would be to check out the Turborez board, and
possibly license some of their technology, or even the whole thing if hardware
compatibility is a real issue by then. How games will support both the Turborez
and an Apple-sanctioned blitter is a nasty programming issue if the interfaces
are sufficiently different, which is why I proposed a Game Graphics toolset. I
assume that the toolset could be written to provide low level subroutines for
the game program, thus providing reasonable performance without sacrificing the
toolbox philosophy. Unfortunately, very careful and agonizing planning would be
involved in such a toolset, so it is almost easier to let games write their own
drivers. Horrors, I know, but can you think of a clean and generic blitter model
for the toolset to assume?

     Sound: now that we finally know how to use the DOC properly, it would help
if the sound system was done some justice by removing all the little bugs and
annoyances. Many were disappointed that the ROM 3 didn't have true stereo on
board, especially when there is a circuit in the hardware reference manual to do
it that costs about $10 in parts.

     One good reason to memory map the DOC registers is that the sound interface
loses any context which might have to be saved. You could then have an expansion
card DMAing commands to the DOC without fighting for the Sound GLU, which is a
really cheap interface anyway.  Ensuring that a slow slot DMA read of the DOC
registers will be completed in time could be a small problem though. Sound RAM
could potentially be worse. The problem here is that slow DMA protocol never
said "thou shalt use RDY." Nonuse of the RDY pin is a major reason for many of
the weird hardware interfaces in the //gs.

     ADB: it's hard to improve on what the ROM 3 did here. The only thing I can
think of is adding Game I/O support to the microcontroller now that there are
pins free. I would rather play Tunnels of Armageddon and Crystal Quest with a
joystick, and until there are lots of ADB joysticks out there (don't hold your
breath) many new games will not want to waste their time in PREAD so we will
have to use the mouse. (For me it is worse because I have the new wimpy-ball
mouse and its responsiveness for games is not good at all. I don't use a mouse
pad, but my desk is clean and so is the mouse. The ball just simply doesn't
weigh enough to get the kind of precise traction you need for games.)

     Disk Port: all I can say is put a SWIM on it, but don't necessarily make
the FDHD standard; and coprocess it like on the //c+. (I wish dealers would push
the //c+ more, it's a totally underrated machine.)

     Serial Port: if there were two 16 byte spaces (E1/C09x, E1/C0Ax) that
caused a register write of A0-A3, and then a data read or write (holding down
RDY if necessary), it would simplify the time-critical parts of the Appletalk
drivers quite a bit, and might make 57.6Kbaud possible. The SCC is a nice chip
with some amazing capabilities. If DMA and interrupts are set properly it makes
Appletalk packet reception a piece of cake.
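
     The decoding being proposed can be sketched in Python as follows. The SCC
normally needs two accesses per register (write the register number, then
read or write the data); here the low four address bits supply the register
number and the hardware does both steps. Which window belongs to which SCC
channel is not specified above, so the window labels are placeholders.

```python
# The two proposed 16 byte windows in bank E1. The labels are placeholders;
# the mapping of windows to SCC channels is not defined in the proposal.
SCC_WINDOWS = {0xE1C090: "C09x", 0xE1C0A0: "C0Ax"}


def scc_sequence(address, is_write):
    """Return the two hardware steps for one CPU access, or None.

    None means the address is outside both proposed windows.
    """
    base = address & 0xFFFFF0          # strip A0-A3
    if base not in SCC_WINDOWS:
        return None
    register = address & 0x0F          # A0-A3 select the SCC register
    data_step = "write data" if is_write else "read data"
    return [("select register", register), (data_step, SCC_WINDOWS[base])]
```

     From the driver's point of view each SCC register becomes a plain memory
location, so a time-critical loop saves one bus access per register touched.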

     Clock: the Clock control & border color register will probably require the
clock interface to stay in the video generator. It'd be nice if we had a VIA or
two and ran the clock through one the way the Mac does. VIAs are cheap and
they provide an important capability: 1 microsecond resolution timers. You can
do some sneaky things with these and programs could be forced to use a toolbox
call to gain exclusive access to them. (I don't see anything wrong with that.)

     Slots: here's the fun one. Lots of people would like a faster bus, but it
will take some doing. Totally new slots are out because that axes compatibility.
16 bit data paths have too many sacrifices involved, like bus sizing and byte
swapping, because too much of the machine has always been 8 bits. Before long,
it becomes reasonable to just pump up the bus clock.

     (Totally new slots for video, EPROMs, VRAMs, etc. don't fit the above
paragraph because they [a] have no restrictions from the past, [b] are essential
to the future fulfillment of the machine, and [c] are real cheap, like D-sub
connectors or some other mounting which can be prototyped easily.)

     One thing first: I vote we nuke Inhibit. It's practically useless on the
//gs and nobody needs language cards in the //gs anyway. We can probably reuse
it for something else.

     Now, how can we do a faster bus? We want it to have these properties:

     1. uses original 50-pin card edge connectors
     2. uses a minimum of new pins (only a couple are still free)
     3. utilizes old pins if at all possible
     4. supports 'passive' slow cards that use select lines as per the //gs
     5. attempts to support slow DMA cards
     6. supports fast single byte reads and DMA transfers
     7. supports block DMA transfers, though not necessarily a 'burst mode'
     8. arbitration should be simple, efficient, and out of band
     9. try to reach full peak bandwidth if possible

     I've had plenty of random ideas on how to do this, but so far nothing has
really been consistent enough to hold up for very long. We can either use the
standard Apple DMA model (ok, who does the address come from next) or we can try
to set up a distributed model that uses handshaking instead of addresses to keep
block transfers moving after they have been set up with the address bus. This
requires some pretty good handshaking which I'm not sure can be made simple
enough. NO WAY are we going to do a byte wide NuBus. While I like the idea of
transactions and 'one address then data-data-data', NuBus arbitration sucks so
ours will be out of band (DMA IN/OUT maybe).

     I have been trying to develop a peer to peer with external arbiter idea
because it has some nice advantages: address transactions only needed to start a
block transfer, handshaking takes care of it after that; all multi cycle
transfers are inherently split (say the bus runs at 14 MHz with 74HCT646's doing
the isolation and data latching) and sustained transfers get inherently
pipelined; full bandwidth is only used if enough transfers are happening
concurrently so that someone is ready every bus cycle.

     It also has a lot of problems, such as: every bus member limited to one
read and one write transfer simultaneously (if you use 646's); this means the
real limit is how many transfers can be going in or out of a given memory system
and full bandwidth utilization is hard to get, except for intensive demos; and
the arbitration chip could get really nasty.

     It would almost be easier to just use the old model and run it faster. This
means less peak bandwidth (all those addresses flying around) but you could
still slap an HCT646 on every bus member to implement latch on write, which
would save the DMA controller some time. Block DMA cries out for page mode, and
if there was a way to run four or eight (or more?) bytes around in rapid
succession (stored in small FIFOs on the GLU chips, say) then you could get
reasonable bandwidth from a read/write DMA generator.
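
     Some back-of-the-envelope numbers for that trade-off, as a Python sketch.
It assumes a byte wide bus, one bus cycle per address and one per data byte,
and borrows the 14 MHz figure from the example above; none of these are
measured values.

```python
def block_bandwidth(bus_hz, burst_len, address_cycles=1):
    """Bytes per second when each burst costs the address cycle(s) plus one
    cycle per data byte. All cycle costs here are assumptions."""
    return bus_hz * burst_len / (burst_len + address_cycles)


BUS_HZ = 14_000_000  # the "say 14 MHz" figure from the discussion above

# One address per byte (the old model run faster): half the cycles are data.
single = block_bandwidth(BUS_HZ, 1)

# Four and eight byte bursts out of small FIFOs on the GLU chips.
burst4 = block_bandwidth(BUS_HZ, 4)
burst8 = block_bandwidth(BUS_HZ, 8)
```

     Under these assumptions even four byte bursts recover 80% of the peak,
which suggests the old model plus small FIFOs gets most of the benefit without
a new arbitration scheme.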

     Does any of this make sense? Does anyone have their own ideas?

     I'd be glad to discuss via Email (hope I'm not signing a death warrant on
my free time) and you can reach me at the following addresses:

Internet: toddpw @ tybalt.caltech.edu (use it if you can, it's free)
America Online: toddpw

One last thing: I called Western Design Center and they said the 65832/65032
projects were on hold until they had a customer account for them, and then they
would be produced specially for that customer. These 32 bit extensions of the
65816 are intended to address many of the 65816's shortcomings with respect to
high-level languages and math operations, and would be a very good thing to
happen to the Apple //. We never hear anything about Apple and WDC discussing
them and I hope the silence does not last for too long.

Todd Whitesel