[unix-pc.general] hardware solution for direct access to video ram

botton@laidbak.UUCP (Brian D. Botton) (07/28/89)

  Hi netlanders.  I am posting this to both unix-pc.general and comp.sys.att,
even though there seem to be some people that would disagree.

  This article is fairly long as it gives the theory and step-by-step method
for constructing a daughter board that allows user level code to access the
video ram on a 3B1.  The driving force behind this board was the desire to port
Mgr to the 3B1 while NOT making any hardware modifications to the actual
mother board.  The idea for the board came from a friend and co-worker,
Brad Bosch, who is a Mgr fanatic.  Brad and I have been running our modified
3B1's continuously for over a month without a problem and I am confident that
there are no problems lurking to bite us later.  We have also gotten the
portable bit blit routines for Mgr working.  So, without further delay, I will
begin.

THEORY OF OPERATION

  The 3B1 memory map allocates addresses 0x420000 to 0x42ffff for video ram,
even though all of these addresses are not displayed on the CRT.  As most of
you know, addressing beyond 0x3fffff causes a memory fault, which is why you
cannot access the video ram.  If you take a look at the equations for the MMU
pal, page A-10 of the UNIX PC Reference Manual, you will find the following:

	BERREN = BGC* X PA22* X SPA23* X LPS0* X LPS1* X T90 X PAS
		 + BGC* X SUPV* X SPA23* X PA22* X KADDR X T90 X PAS
		 + BGC* X SUPV* X SPA23* X PA22* X RW X LWE* X T90
		 + BGC* X SUPV* X SPA23* X PA22 X IODTACK
		 + BGC* X SUPV* X SPA23* X DTACK

  The signal BERREN is what causes a bus error on pin 22 of the 68010.
Address 0x42XXXX means that SPA23 will be low and PA22 will be high, and thus
the first 3 terms of the equation will be false and have no effect.  Therefore,
in order to stop a bus error from occurring when addressing the video ram, the
last two terms must be made false also.  Looking at the other MMU pal equations,
and the available inputs, it becomes clear that simply reprogramming the pal
will not do you any good.  However, if you follow the SUPV signal back to its
origin, you will see that it is just FC2 on pin 26 of the 68010.  This is the
key to my solution.  

  The solution is a daughter board, placed between the 68010 and the mother
board, that taps the address lines needed to decode video ram access and forces
the mother board, and MMU pal, to think that it is in supervisor, a.k.a. kernel,
mode.  Yes, this means that ANY user level process can access the video ram, but
when you examine the consequences of this, most of them are trivial, especially
when you now have direct access to the screen.  I suspect the people porting X
to the 3B1 will be very happy to have this direct video ram access.

  To continue, if you look at sheet 4 of the CPU schematic, you will see that
IC 26G does the actual decoding.  Address lines A16, A17, A18, A19, and GATE1*
are inputs, where GATE1* is A24* X A23.  Notice that A20 and A21 are not used
at all to decode device addresses.  What this means is the memory map for the
3B1 has several places where addresses are aliased, i.e., video ram is at
0x42XXXX, 0x52XXXX, 0x62XXXX, and 0x72XXXX.  Therefore, the following equation
expresses what needs to be done:

	SUPV = FC2 + (A23* X A22 X A19* X A18* X A17 X A16*)
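
  A minimal C sketch of this equation, assuming bit n of the 24-bit address
corresponds to address line An.  The name supv_out and the bit macros are made
up for illustration; the real logic lives in the pal, not in software.

```c
#include <stdint.h>

/* Address lines the pal watches; bit n of the address is taken to be An. */
#define A23 (UINT32_C(1) << 23)
#define A22 (UINT32_C(1) << 22)
#define A19 (UINT32_C(1) << 19)
#define A18 (UINT32_C(1) << 18)
#define A17 (UINT32_C(1) << 17)
#define A16 (UINT32_C(1) << 16)

/*
 * Model of SUPV = FC2 + (A23* X A22 X A19* X A18* X A17 X A16*).
 * addr: address on the 68010 bus.
 * fc2:  1 when the processor is in supervisor mode, 0 in user mode.
 * Returns the level the mother board would see on SUPV.
 */
int supv_out(uint32_t addr, int fc2)
{
    uint32_t watched = A23 | A22 | A19 | A18 | A17 | A16;
    uint32_t video   = A22 | A17;   /* pattern for 0x42xxxx; A20/A21 ignored */

    return fc2 || ((addr & watched) == video);
}
```

  With fc2 = 0, supv_out() returns 1 for 0x420000, 0x520000, 0x620000, and
0x720000, and 0 for ordinary user addresses such as 0x3fffff.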

  Changing the SUPV signal in any manner is risky; too many propagation delays
could easily cause the MMU pal to function incorrectly.  To keep delays down,
I decided to use a pal with a 10nsec maximum delay time.  Using a pal also makes
manufacturing the daughter board much easier because there are fewer components
to worry about.  After I verified that my ideas were correct, I programmed a
25nsec UV pal and it worked also.

  The following is the listing of the program file used by abel, the pal
programming tool that I have access to, to burn the pal.  As you can see, it is
a simple program that anyone can follow and duplicate/modify for his/her pal
programmer.

*** CUT HERE - vidpal.abl ****** CUT HERE ************

module VIDPAL
title 'Map video ram into user space on 3B1.
Brian D. Botton		June 27, 1989'

"
"   Permission is granted to use this file in any way as long as:
"	1.  No profit is made off of my design.
"	2.  Credit is given where it is due.
"
"
"   U1 - MC68010L10
"   U2 - PAL16L8DCN

	U2	device	'P16L8';

"   Constants:
	x = .X.;

"   Pass FC2 through, unless addressing 0x42xxxx, 0x52xxxx,
"   0x62xxxx, or 0x72xxxx, then force SUPV high.

	A23, A22, A19, A18	pin	1, 2, 3, 4;
	A17, A16, FC2		pin	5, 6, 9;
	SUPV			pin	12;

"   Tie all unused inputs to Vcc, pins 7, 8, & 11.

equations
	SUPV = FC2 # (!A23 & A22 & !A19 & !A18 & A17 & !A16);

test_vectors 'SUPV test'
	([A23, A22, A19, A18, A17, A16, FC2] -> SUPV)

"	 A  A  A  A  A  A  F	  S
"	 2  2  1  1  1  1  C	  U
"	 3  2  9  8  7  6  2	  P
"				  V

	[x, x, x, x, x, x, 1] ->  1;
	[0, 1, 0, 0, 1, 0, x] ->  1;
	[x, 0, x, x, x, x, 0] ->  0;
	[x, x, x, x, 0, x, 0] ->  0;

end VIDPAL

*** CUT HERE ****** CUT HERE ************
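
  For readers without abel, the four test vectors can be checked against the
equation in plain C.  This is only a sketch: it treats each .X. as "try both
0 and 1", which is stricter than what a pal programmer does when it applies a
vector, and the names vector_holds and all_vectors_hold are invented here.

```c
#define X (-1)                      /* don't care, standing in for .X. */

/* Inputs in listing order: A23 A22 A19 A18 A17 A16 FC2 */
struct vector { int in[7]; int out; };

static const struct vector vectors[4] = {
    {{X, X, X, X, X, X, 1}, 1},
    {{0, 1, 0, 0, 1, 0, X}, 1},
    {{X, 0, X, X, X, X, 0}, 0},
    {{X, X, X, X, 0, X, 0}, 0},
};

/* SUPV = FC2 # (!A23 & A22 & !A19 & !A18 & A17 & !A16) */
static int supv(const int b[7])
{
    return b[6] || (!b[0] && b[1] && !b[2] && !b[3] && b[4] && !b[5]);
}

/* Return 1 if every 0/1 assignment matching v's pattern yields v->out. */
int vector_holds(const struct vector *v)
{
    int combo, i, b[7];

    for (combo = 0; combo < 128; combo++) {
        int matches = 1;
        for (i = 0; i < 7; i++) {
            b[i] = (combo >> i) & 1;
            if (v->in[i] != X && v->in[i] != b[i])
                matches = 0;
        }
        if (matches && supv(b) != v->out)
            return 0;
    }
    return 1;
}

/* Return 1 if all four vectors from the listing agree with the equation. */
int all_vectors_hold(void)
{
    int i;

    for (i = 0; i < 4; i++)
        if (!vector_holds(&vectors[i]))
            return 0;
    return 1;
}
```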


REQUIRED SUPPLIES

	1.  Double sided PC board blank, approximately 3 3/8 x 1 5/8 inches.
	2.  One roll of 4 pt. and one roll of 2 pt. artist tape.
	3.  One 64 pin low insertion force socket, P/N 44F8066, page 769.
	4.  Three 40 pin carrier with socket pin terminals, P/N 46N6790,
	    page 736.
	5.  One .1 micro farad tantalum capacitor, 7 volts or greater.
	6.  One MMI16L8DCN pal (10nsec), see text.
	7.  2 inches of 28 gauge solid wire, no insulation.
	8.  Solder, etchant, one XXX drill bit, one drill bit smaller than
	    XXX, drill, hobby knife, diagonal cutters, a jewelers file.
	9.  Rosin core solder and a soldering iron, small wattage.
	10. A spare 20 pin IC socket, needed only for layout purposes.

  The part and page numbers are from the Newark Electronics catalog #110.
The XXX drill bit should be sized to allow the socket pin terminals to be easily
inserted into the PC board.  Sorry, I don't know the size, I just happened to
have one that worked and the catalog doesn't give the diameter.

STEP-BY-STEP INSTRUCTIONS

  The purpose of the daughter board is to allow the FC2 signal to be intercepted
and to provide a mounting point for the pal.  You will want to orient the PC
board such that pin 1 of the 68010, U1, is at the top left corner of the board
with the pal, U2, to the right of U1, with U2's pin 1 horizontal to pin 49 of
U1.  While you may certainly make the daughter board using a bread-board and
point-to-point wiring, I liked the etched PC board approach better.  All
references to the board are made assuming the long axis of the board is vertical
and the short axis is horizontal.  So let's begin.

1.  First we want to determine the size of the daughter board.  To do this, lay
    the 64 pin IC socket along the left side of the PC board and the 20 pin IC
    socket to the right.  You want to allow about 3/8 of an inch clearance
    between the two sockets to allow room for the traces to be laid down.  Mark
    the square formed by the two sockets, allowing about 1/8 inch of overhang on
    each side to give the board strength.  After marking the square, cut it out,
    smooth the edges with a file or sandpaper, and clean both surfaces until
    shiny.

2.  Carefully place the 64 pin socket on the left side of the PC board and mark
    where the pin 1 and 32 are.  Remove the socket, and draw a line connecting
    the two marks with a straight edge.  Replace the socket and mark where the
    other 30 pins touch the line.  Needless to say, this is very critical, you
    cannot be too accurate, so TAKE YOUR TIME.  Using the point of the hobby
    knife, make starting indentations where the pins are marked.  Use a smaller
    than XXX drill bit and drill 32 starter holes, then switch to the XXX drill
    bit and drill the finish holes.  Make sure the holes are not too tight, this
    will make inserting the pin sockets more difficult and it will be easier to
    break some pins.  Again, be very patient drilling these holes, especially
    the starter holes.

3.  Place the socket on the board, aligning the left holes with the drilled
    holes, and mark where pin 33 and 64 should go.  Again, draw a line between
    the marks and use the socket to mark where the other 30 holes go. 

	CAUTION:  You will be running traces between pins 38 & 39, 40 & 41, and
		  41 & 42.  Make sure you get good, equal spacing around these
		  pins, it will make your life much easier.

    As in step two, drill the holes.

4.  Place the 20 pin IC socket along the right edge of the board and align it
    so that its pin 1 is directly to the right of pins 16 and 49 of U1.  Using
    the same mark and drill procedure described above, drill the 10 left holes
    and then the 10 right holes.  We now have to drill holes to allow signals to
    pass from one side of the board to the other.

5.  Drill a small hole half way between pins 1 and 20 and about 1/4 inch above
    U2, call this hole A.  Drill hole B 2/3 of the way from U1 to U2 and between
    pins 41 & 42.  Drill hole C 1/3 of the way from U1 to U2 and between pins
    40 & 41.  Drill hole D 2/3 of the way from U1 to U2 and between pins 38 &
    39.  Drill hole E 1/4 inch to the left and center of pins 40 & 41, and
    finally, drill hole F 1/4 inch to the right and center of pins 26 & 27.

6.  The next step is to make solder pads for the pin sockets and feed through
    holes.  To do this we will use the 4 pt. artist tape and hobby knife, and
    make sure the board is clean.  It is quite likely you will need to clean
    both sides again.  Pads are made by placing the tape across the hole
    horizontally so the hole is in the center of the tape and trim the tape so
    that one width's worth of tape is to the left and right of the hole.  After
    the tape has been applied to the hole, carefully use the blunt end of the
    hobby knife to burnish the tape to the board.  This is important because
    otherwise the etchant will undercut the tape and ruin the board.  Mash it
    down good, but don't distort the tape.  If needed, don't be afraid to remove
    a piece of tape and try again.  On the top side of the board, put pads on
    all feed through holes and on pins 26 and 49 of U1.  On the bottom side of
    board, put pads on all holes.

	NOTE:  While at the art supply store, pick up a set of small
	       donut pads.  While optional, they make the feed throughs,
	       A through F, a lot easier to do and look nicer.

7.  Now we start to put down the signal traces using the 2 pt. artist tape.
    Starting on the top side, run a trace from U1, pin 49 to hole A above U2,
    this will provide +5 volts to the pal.  Run a trace from U1, pin 26 above
    hole E (don't touch E!), between holes 41 & 42 to hole B, this is FC2.  Run
    a trace from hole E between holes 40 & 41 to hole C, this is GND to the pal.
    And run a trace from hole F to hole D between holes 38 & 39, this is the new
    SUPV signal.  It is easiest if you press the tape down at one hole and
    burnish as you lay the tape.  If you do it this way you will find it is
    easy to control the path of the tape and make any required curves.  Be
    careful when laying out curves, any kinks in the tape will allow etchant
    under the tape.

8.  Turn the board over and start laying out the bottom traces.  For those of
    you who are unfamiliar with doing this kind of work, take your time so you
    don't make any mistakes.  Remember, the top is still at the top, but the
    right and left sides are reversed.  Run a trace from U1, pin 16 to hole E,
    and from hole C to U2, pin 10, this completes the ground.  Run a trace from
    hole A to U2 pins 20, 7, 8, and 11.  I recommend running the trace down the
    middle of U2, this completes the +5 volts.  U2 pins 7 & 8 are unused and are
    tied to +5 as per the spec. sheets for the pal.  Run traces from U1, pin 52
    to U2, pin 1, from U1, pin 51 to U2, pin 2, from U1, pin 47 to U2, pin 3,
    from U1, pin 46 to U2, pin 4, from U1, pin 45 to U2, pin 5, and from U1, pin
    44 to U2, pin 6.  These are address lines A23, A22, A19, A18, A17, and A16,
    respectively.  Run a trace from hole B to U2, pin 9, this completes FC2.
    Finally, run a trace from hole D, around the bottom of U2 to U2, pin 12,
    this completes the SUPV signal and all of the traces.

	WARNING:  NEVER allow any metal to touch the etchant, violent
		  chemical reactions can occur.

9.  Double check all of your work, making sure traces go where they belong, and
    that all of the tape is securely attached to the PC board.  Pour the etchant
    into a glass or plastic bowl and place the card into the etchant.  To etch
    the board evenly, you want to arrange things so the card remains on edge
    while immersed in the solution.  This allows both sides to etch at the same
    rate and greatly increases the quality of the etching.  I find that it is
    helpful to have a couple of plastic sticks to rock the board around and to
    stir the etchant.  After all of the excess copper has been removed, remove
    the board from the etchant and run under hot water for several minutes,
    while scrubbing the board to remove any etchant.  Remove the tape and
    carefully shine up the copper traces.  If you have used a fiberglass PC
    board, and you should, you can now hold up the board to a strong light and
    see all the traces.  Double check to make sure they go where they belong and
    that there are no shorts, especially where traces go between pins.

10. Insert 64 of the pin sockets all the way into the holes for U1.  If any
    happen to break, save them, we'll use them in a minute.  Solder the top
    of pins 16, 26, and 49 to the pads beneath the shoulders of the pin sockets.
    Turn the board over and solder all 64 pins to their pads.  Make sure that
    the traces on the top side of the board don't short to any of the pin
    sockets.  You may need to use the hobby knife to trim a trace or two.  Next
    insert the 20 pin sockets into U2, if you happened to break any of the pins
    off of other pin sockets, you may use them on U2.  Solder the 20 pin sockets
    to their pads and using diagonal cutters, cut off all 20 pins of U2 only.
    Take the 28 gauge wire and strip off a couple of inches of insulation.  For
    each hole, A through E, pass the end of the wire through about 1/16 of an
    inch and bend it over 90 degrees.  Holding the bent wire to the pad beneath
    it, cut the wire off on the other side leaving 1/16 of an inch.  Bend this
    end over, leaving the wire in a U shape and solidly in the hole.  Solder the
    wire on both sides of the board.

	NOTE:  When I say cut off the pin of the pin sockets, I mean the small
	       piece below the larger cylindrical socket.

11. Cut off U1, pin 26, we don't want to pass FC2 to the mother board.  Using a
    jewelers file, file off the bottom part of the pin socket so it will have no
    chance of shorting when inserted into the 64 pin IC socket.  Take that 64
    pin IC socket and file the socket (top half) of pin 26, again, so it has no
    chance of shorting to the filed down pin 26 on the daughter board.  This
    should leave a small trough in the plastic of the socket that is more than
    big enough for a piece of 28 gauge wire to lie in without sticking out above
    the top of the socket.

12. Strip 1 inch of insulation off of the 28 gauge wire and stick the end of the
    wire into what is left of IC socket pin 26.  Very carefully solder the wire
    to the pin 26 socket.  The plastic of the IC socket melts very easily so you
    must work fast and accurately.  When the solder cools, cut the wire, leaving
    about 1/2 inch and bend the wire at a 90 degree angle into the trough and
    toward the center of the socket.

13. Very carefully insert the daughter board U1 pin sockets into the 64 pin IC
    socket.  Don't bend any of the pin sockets, or you'll have to unsolder the
    pin and replace it.  Likewise, don't bend any of the IC socket pins or you
    will have to replace it also.  If you were careful when drilling U1's
    holes, it should be fairly easy to get the daughter board into the socket.
    When the socket is completely on, pass the 28 gauge wire soldered to IC
    socket pin 26 through hole F.  Make sure there is no contact with daughter
    board pin 26, solder the wire to the top pad of hole F, and clip the excess
    wire off.

14. Take the .1 micro farad capacitor and position it on the top of the daughter
    board between U1 and U2 and solder one leg to hole A and the other to hole
    C.  This will provide power supply decoupling to the pal.  I ran my board
    for several days before I put a scope on the +5 and saw all the spikes.
    While the board works without the capacitor, it's good practice to use it
    anyway, and it did clean up the +5 quite a bit.

15. Ohm out all traces to make sure there are no shorts.  Also, check to make
    sure that none of the pins short to any traces.  If there is a short, fix
    it with the hobby knife.

	The crowd roars as they realize you have finished your
	daughter board, :-).  I have made three boards, the first
	took about 5 hours because I had to find all my PC board
	junk.  The second and third took about 3 3/4 hours each
	to complete.

INSTALLING THE DAUGHTER BOARD

1.  Install the pal on the daughter board.

2.  Take the cover off your machine, this has been described elsewhere on the
    net.

3.  Carefully remove the 68010 from the mother board.

4.  Insert the 68010 into the pin sockets on the daughter board.

5.  Carefully insert the IC socket pins of the daughter board into the IC socket
    pins on the mother board.

6.  Re-assemble your machine.

7.  You're done!!!

TESTING

  Declare an unsigned short pointer such as:

	unsigned short *video = (unsigned short *)0x420000;

  Don't forget the cast, or the pointer will point to the wrong place,
namely 0x0000.  Now you just place a value where video points to:

	*video = 0xffff;
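
  A slightly larger test can be sketched as a helper that stores the same
value into a run of words.  The name fill_words is made up here, and the base
is passed in as a parameter so the routine can also be tried on an ordinary
array on a machine without the daughter board.

```c
#include <stddef.h>

/*
 * Store `value` into `nwords` consecutive 16-bit words starting at `base`.
 * volatile keeps the compiler from dropping or merging the stores, which
 * matters when `base` is hardware memory rather than ordinary ram.
 */
void fill_words(volatile unsigned short *base, size_t nwords,
                unsigned short value)
{
    size_t i;

    for (i = 0; i < nwords; i++)
        base[i] = value;
}
```

  On a modified 3B1 the call would presumably look like
fill_words((volatile unsigned short *)0x420000, n, 0xffff).  On any other
machine that address is not video ram and the store will most likely fault.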

HOW DO I GET A PAL?

  I got my pals from Hamilton Avnet Electronics, but they have a $50 minimum
order, and unless you are going to make several boards this may not be the best
answer.  Jameco Electronics lists a 16L8 pal which I suspect is either 25 or
35nsec.
  To add further controversy, I don't know what effects other upcoming
hardware projects will have, so I don't know if you should use the 10nsec or
a 25nsec pal.  The 10nsec pal uses 180 ma of current, or .9 watts!  You
can get a 25nsec pal that uses 45 ma of current, or .225 watts.  It may be
that some people will find they have a current or heat problem and the slower
pal is better.  On the other hand, having as short a propagation delay as
possible may be more important.  All I can say is that the 10nsec pal works
fine in my machine, a 3B1 with 1 Meg on board ram, a two port combo card
with an additional .5 Meg, and a 40 Meg hard disk.  Brad's 3B1 has 1 Meg of
ram and an 80 Meg drive and hasn't had any problems either.
  Finally, I'm not so sure I want to get into the pal burning business.  For
one, I don't know how much interest this will generate.  For another, I don't
have a whole lot of free time to dedicate to this kind of thing.  So, what I
propose is that all of those people who are interested in this little project
send me mail at !laidbak!botton.
  What I would like to know is how many people are interested in this project,
even if you can get and burn your own pal.  Also, tell me if you would like me
to provide a programmed pal, which speed, and if you would like the sockets
provided as well.  Let's set a cutoff date of August 15 for this survey, and
we'll see what happens.

MISCELLANEOUS

  Again, there is no reason why you must use a double sided PC board, that was
just my method.  It can easily be done with point-to-point wiring or on a single
sided PC board, but each has its own set of problems.

  Enjoy your newfound video freedom.

DISCLAIMER - Junk to cover my ___.

  This project was done on my own time, and even though it works for
me, there is no guarantee that it will work for you.

  Also, do with this as you may, as long as:

	1.  You do not profit financially from it.
	2.  Credit is given where it is due.

-- 
     ...     ___
   _][_n_n___i_i ________		Brian D. Botton
  (____________I I______I		laidbak!botton
  /ooOOOO OOOOoo  oo oooo

scs@itivax.iti.org (Steve Simmons) (07/28/89)

botton@laidbak.UUCP (Brian D. Botton) writes:

>  Hi netlanders.

Hi yourself.  Nice introductory article.

>  This article is fairly long as is gives the theory and step-by-step method
>for constructing a daughter board that allows user level code to access the
>video ram on a 3B1. . . .

Usually these hardware articles make me nod my head sagely and wish I
wasn't such an idiot with a soldering iron.  However, there are some
interesting implications....

>THEORY OF OPERATION

>  The 3B1 memory map allocates addresses 0x420000 to 0x42ffff for video ram,
>even though all of these addresses are not displayed on the CRT.  As most of
>you know, addressing beyond 0x3fffff causes a memory fault, which is why you
>cannot access the video ram.  If you take a look at the equations for the MMU
>pal, page A-10 of the UNIX PC Reference Manual, you will find the following:
> [details removed]
> . . . it becomes clear that simply reprogramming the pal
>will not do you any good.  However, if you follow the SUPV signal back to it's
>origin, you will see that it is just FC2 on pin 26 of the 68010.  This is the
>key to my solution.  
>  Which is, a daughter board is placed between the 68010 and the mother board,
>that takes off the address lines needed . . .

There might be a very interesting implication here.  He's talking
about direct access to the video ram (wouldn't this be easier and safer
with an appropriate new device driver for fb?  never mind, it's irrelevant
to the implication).  But it's clear he's found and removed one bottleneck
to addressing >4MB!  With ram prices coming down (and down, and down) I
see an interesting idea.

Using a similar scheme is it possible to put >4M in the Unix-PC?  This
should be of great interest to the mondo combo people and the X people.
At first glance I thought a killer gating item might be the number of lines
on the backplane, but it looks like this might enable one to completely
bypass the backplane for system memory expansion!  Mod the mondo combo to
use SIMM modules (1Megabyte, 100ns are $180 and still dropping), and think
about a 8 or 16MB 3b1.  Brrrr!  You'd still need the backplane, but only
for power and accessing the non-memory parts of the board (serial ports,
etc).  This also has the benefit that when your 3b1 finally dies, you can
possibly put the SIMMs into the replacement.  Yes, I know SIMMs stick up
too high to fit thru the openings into the backplane.  Didn't a large
chunk of the mondo combo stick out the back?  Anyway, it's an idea.
-- 
Steve Simmons		          scs@vax3.iti.org
Industrial Technology Institute     Ann Arbor, MI.
"Velveeta -- the Spam of Cheeses!" -- Uncle Bonsai

botton@laidbak.UUCP (Brian D. Botton) (07/29/89)

In article <2434@itivax.iti.org> scs@itivax.iti.org (Steve Simmons) writes:
>botton@laidbak.UUCP (Brian D. Botton) writes:
>
>>  Hi netlanders.
>
>Hi yourself.  Nice introductory article.
>

Thank you, ;-).

>							 . . . He's talking
>about direct access to the video ram (wouldn't this be easier and safer
>with an appropriate new device driver for fb?

  I'm afraid not, that requires a system call trap which has a fair amount
of overhead.  Just to write one 16 bit word to memory requires setting up
the stack, trapping to the kernel, finally doing the write, return from the
trap, and cleaning up the stack.
  If you want speed, forget about using a device driver.  Most high speed
graphics devices allow you to map the video memory into your process space,
but unfortunately the 3B1 doesn't allow this.  If you look on page 16 of the
cpu schematic, you'll see that A12-A21 go into the page table ram chips, and
MA12-MA21 come out.  Therefore there is no way to modify the page table entry
to address > 0x3fffff.  The really tragic thing here is that there are 3
unused bits in the page table, so A22 and A23 COULD have been added to the
virtual memory space, and given us 16Meg process size, :-).  I think I could
come up with a way to do this in hardware, but there would have to be a major
amount of work done to the software, i.e. kernel, shared libraries, etc.
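
  The limit can be sketched in C.  This assumes the low address bits, A0-A11,
pass through untranslated as the offset within a 4K page, which is an inference
from A12-A21 being the only lines that go through the page table.  The function
names are invented, and the real page table entry layout is not modeled.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u      /* assumed: A0-A11 are the offset within a page */
#define PAGE_COUNT 1024u    /* A12-A21 give 10 bits of page number */

/* Page table slot selected by a virtual address (A12-A21). */
uint32_t page_index(uint32_t vaddr)
{
    return (vaddr >> PAGE_SHIFT) & (PAGE_COUNT - 1u);
}

/* Offset within the page (bits below A12). */
uint32_t page_offset(uint32_t vaddr)
{
    return vaddr & ((UINT32_C(1) << PAGE_SHIFT) - 1u);
}

/*
 * Physical address when the table supplies `frame` on MA12-MA21.
 * Only 10 bits come out of the table, so the result never exceeds 0x3fffff
 * no matter what is stored in the entry.
 */
uint32_t translate(uint32_t frame, uint32_t vaddr)
{
    return ((frame & (PAGE_COUNT - 1u)) << PAGE_SHIFT) | page_offset(vaddr);
}
```
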
  On a side note, the page tables are big enough for only ONE process at a
time.  That means a context switch has to swap out the table to run another
process, I can hear my disk thrashing now!  Sure would have been nice if they
had used a little larger ram chips, even one additional process worth of ram
would have made a big difference.

> 		. . .  But it's clear he's found and removed one bottleneck
>to addressing >4MB!

  Yes, but there are problems with that.  If you aren't careful you could
allow access to ANY area in memory with disastrous results.  That's why I
went to the trouble of decoding the address.  In fact, I originally tried
to include address lines 20 & 21 in the decode logic.  Unfortunately that
generated too many terms for the pal and I was lucky to discover that
they weren't needed.  In addition, this method does open up the addresses to
ALL processes.  While the paranoid won't like the security problems, there
isn't any way to hurt the system.

>Using a similar scheme is it possible put >4M in the Unix-PC?

  Not unless you go above the 16 Meg mark.  As I mentioned in my original
posting, the video ram has alias addresses at 0x52XXXX, 0x62XXXX, and 0x72XXXX.
If you take a look at the way other addresses are decoded above 0x3fffff, you
will see that most of them are aliased in a similar manner.
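
  The aliasing can be sketched in C as folding A20 and A21 away.  This assumes
every address above 0x3fffff is decoded without A20 and A21, which holds for
video ram but is only claimed for "most" other devices; lowest_alias is a
made-up name.

```c
#include <stdint.h>

#define ALIAS_BITS UINT32_C(0x300000)   /* A21 and A20 */
#define HIGH_BITS  UINT32_C(0xc00000)   /* A23 and A22 */

/*
 * Fold an address onto its lowest alias.  Addresses up to 0x3fffff are
 * left alone, since A20 and A21 are significant there.
 */
uint32_t lowest_alias(uint32_t addr)
{
    if ((addr & HIGH_BITS) == 0)
        return addr;
    return addr & ~ALIAS_BITS;
}
```

  For example, lowest_alias(0x52abcd) and lowest_alias(0x72abcd) both give
0x42abcd, so no new address space is opened up below the 16 Meg mark.
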
  How do you get above the 16 Meg mark? Not with a 68010, but I've had dreams
of a 68020 and 68881 . . . .

>Steve Simmons		          scs@vax3.iti.org

  Thanks for the comments Steve.  I wasn't sure how this would be received,
especially being as long as it was.

-- 
     ...     ___
   _][_n_n___i_i ________		Brian D. Botton
  (____________I I______I		laidbak!botton
  /ooOOOO OOOOoo  oo oooo

botton@laidbak.UUCP (Brian D. Botton) (08/06/89)

  A week ago or so I posted an article describing how I gained access to
the video ram on my 3B1.  To tell the truth, I've been a little under-whelmed
with the response I've received.  I did receive a few letters, one from John Bly
Milton IV asking some questions about why I went to such extreme measures.
I hope John doesn't mind if I answer his questions/comments publicly so that
others will understand just why it is worth the trouble.

JOHN:  Seems a bit brute force.

  It is, but because of the design of the 3B1 there is little alternative.
If you want to access an area of protected memory you have basically three
choices, device driver (see below), use the virtual memory system to map that
memory into your process's address space, or allow direct access.  Personally
I would have preferred the second option, but the page table rams do not allow
access to memory greater than 4Meg, and therefore this wouldn't work. It's too
bad too, because there are three unused bits in the page table rams that could
have been used, which would have allowed this :-(.

JOHN: Is mgr really that good?

  I'll admit that I don't get my jollies from window managers, but it is
public domain and for those who have used sunview on a Sun 3/50 under 4.0,
painful isn't it ;-), Mgr is easily an order of magnitude faster.  It is
fairly small, ~200k, so it shouldn't eat up gobs of memory.  It sure is nice
when your window manager doesn't have to be paged in or out.

JOHN: Why didn't you try a software (loadable device driver) approach?

  A very good question and one that bears answering.  Let's take a look at what
happens when you do a system call, such as a write.  Assuming you have already
opened the device, the sequence of events is:

	1.  The user process puts data into a buffer, it doesn't matter what
	    kind of buffer, variable, array, malloced memory, etc.
	2.  The user process calls write() with the proper parameters, this
	    causes several bytes to be pushed onto the process's stack.
	3.  That write() routine is actually a stub routine, probably written
	    in assembly, that further manipulates several bytes on the stack.
	    The stub then executes a trap instruction that forces the processor
	    into the supervisor mode and transfers execution to the kernel.
	    To do this several more bytes are written onto the stack, take a
	    look at the Motorola documents on the 680x0 family for details.
	4.  The kernel figures out which system call was desired from a lookup
	    table and then jumps to that routine.
	5.  The device driver retrieves the address of the buffer and transfers
	    the data, in this case to video memory.  If this had been a block
	    device instead of a character device, data would have been
	    transferred to a buffer after it was allocated, but for video ram
	    we would have a character device and thus no extra buffer.
	6.  Step 4, 3, and 2 are reversed, undoing all of those stack
	    manipulations.  Also, when the system call returns the kernel takes
	    the opportunity to check if another process should run, so you may
	    lose the processor until the next context switch.
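
  From the user side, the sequence above might look like the following C
sketch.  It assumes a driver that accepts lseek() to pick the word and write()
to store it; that interface and the name fd_put_word are assumptions for
illustration, not a description of any existing 3B1 driver.

```c
#include <sys/types.h>
#include <fcntl.h>      /* open() and O_RDWR, needed by callers to get fd */
#include <unistd.h>

/*
 * Write one 16-bit word at word index n through an open file descriptor.
 * Every call costs two traps into the kernel: one for lseek(), one for
 * write().  Returns 0 on success, -1 on failure.
 */
int fd_put_word(int fd, long n, unsigned short value)
{
    off_t pos = (off_t)n * (off_t)sizeof value;

    if (lseek(fd, pos, SEEK_SET) == (off_t)-1)
        return -1;
    if (write(fd, &value, sizeof value) != (ssize_t)sizeof value)
        return -1;
    return 0;
}
```

  Any seekable descriptor will do for trying this out, including an ordinary
file opened with open(path, O_RDWR).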

  Now let's take a look at my solution, assuming that you have already set the
video pointer like so:

	unsigned short data, *video = (unsigned short *)0x420000;

  And let's assume you want to write to the 23rd u_short, you would do:

	video[22] = data;

  I'm sorry folks, but this seems like a heck of a lot easier.
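
  The address arithmetic the compiler does for video[22] can be spelled out as
a small C sketch; video_word_addr is a made-up name, and the base is the
0x420000 given earlier.

```c
#include <stdint.h>

#define VIDEO_BASE UINT32_C(0x420000)

/*
 * Byte address of the nth 16-bit word of video ram, counting from 0.
 * Each unsigned short occupies two bytes, so the index is doubled.
 */
uint32_t video_word_addr(uint32_t n)
{
    return VIDEO_BASE + 2u * n;
}
```

  So the 23rd word, video[22], sits at 0x42002c, and the last word of the
0x420000 to 0x42ffff range, video[32767], sits at 0x42fffe.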

  There are some additional benefits to this approach, such as:

	1.  You don't have to spend who knows how many hours writing
	    and testing a device driver.  I wrote a device driver for
	    a Ramtek graphics device on a BSD 4.3 VAX when I was in
	    college and I know how hard it can be to find subtle bugs.
	    But I must admit, a device driver for access to the video
	    ram is fairly trivial, just look at the vidram device driver
	    that has been posted on the net by Mike "Ditto" Ford.
	2.  The special window functions are now in user level code which
	    is far easier to debug.  When Brad and I were working on the
	    portable bit blit code it was made a lot easier than if we had
	    to keep reloading a device driver.  And who knows how many times
	    we would have crashed our machines getting it right.
	3.  Because you now have one screen worth of ram available where it
	    belongs, you can allocate one less buffer. Plus you don't have to
	    make expensive system calls to update the buffer.  For those people
	    who are sleeping, this works too:

	    data = video[22];

	4.  Many window managers expect to see the video memory mapped into
	    user space, Mgr does, and I suspect that X does also, even though
	    I haven't seen any code.  Having this access makes porting a whole
	    lot easier.  In fact, Brad and I weren't going to do the port until
	    we came up with an easy way to get to the video ram.
	5.  It is fast, as fast as the 3B1 with a 10 MHz clock will ever get.
	    My method requires two operations, an offset added to the base and
	    one word of data transferred to that address, i.e, a few machine
	    instructions with < 10 memory references.  The device driver method
	    requires what, 10 - 30 instructions, 10 - 30 instruction fetches,
	    and all those stack writes and reads.  Plus the device driver has to
	    have a way to calculate the offset, possibly requiring an address to
	    be sent in the data stream.
	6.  This is a security hole.  If the page table could have been modified
	    then the MMU pal would take care of this for us.  But since it can't
	    we have a hardware mod.  But this really isn't that big of a deal
	    on a small system.  It isn't like there are a hundred users and you
	    have to protect the screen from peepers.  Security is one of the
	    reasons I went to the trouble of using all those address lines in
	    my pal.
	7.  Window manager code doesn't belong in the kernel anyway.  When we
	    get Mgr working all the way we're going to remove the wind.o 
	    driver, which will give us better than 40k of precious kernel space
	    back.
	8.  I don't know if I should mention this, but I don't see any reason
	    to hide it.  The displayable portion of the video does not use up
	    all of the video ram.  So we also have an automatic shared memory
	    segment at the end of video ram.  BUT, it is wide open and you're
	    probably a fool to use it and an idiot to rely on it.
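
  As an illustration of points 3 and 5, the "offset added to the base, one
word transferred" access might look roughly like the sketch below.  It is not
code from Mgr or from the port described here.  The names VID_COLS, VID_ROWS,
vid_word(), vid_mask() and vid_set() are invented for the example, the
720 x 348 geometry is assumed, and the choice of bit 0 as the leftmost pixel
in each 16-bit word is an assumption that should be checked on a real screen.

```c
#include <assert.h>

#define VID_BASE  0x420000UL   /* start of video ram in the 3B1 memory map */
#define VID_COLS  720          /* displayed pixels per row (assumed) */
#define VID_ROWS  348          /* displayed rows (assumed) */
#define VID_WORDS_PER_ROW (VID_COLS / 16)   /* 45 16-bit words per row */

/* Index of the 16-bit word that holds pixel (x, y). */
static unsigned vid_word(unsigned x, unsigned y)
{
    assert(x < VID_COLS && y < VID_ROWS);
    return y * VID_WORDS_PER_ROW + x / 16;
}

/* Mask for pixel x within its word; ASSUMES bit 0 is the leftmost pixel. */
static unsigned short vid_mask(unsigned x)
{
    return (unsigned short)(1u << (x % 16));
}

/* Turn pixel (x, y) on.  `video` is any buffer laid out like the screen. */
static void vid_set(volatile unsigned short *video, unsigned x, unsigned y)
{
    video[vid_word(x, y)] |= vid_mask(x);
}
```

  On a modified 3B1 the buffer would be (volatile unsigned short *)VID_BASE.
On any other machine, an ordinary array of VID_ROWS * VID_WORDS_PER_ROW shorts
is enough to try out the arithmetic.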

  I hope this answers some questions and piques some curiosity about what we,
the 3B1/7300 user community, can do with our machines.  Personally, I think the
ability to get away from ua and use a "real" window system is worth the
afternoon it takes to make a daughter board.  We also get to have source for a
major part of the system; that alone is enough for me to want to change to Mgr,
X, or whatever.
  Again, I welcome comments, good and bad.  And if you too need a pal and don't
have access to a programmer, let me know and we'll see what happens.

-- 
     ...     ___
   _][_n_n___i_i ________		Brian D. Botton
  (____________I I______I		laidbak!botton
  /ooOOOO OOOOoo  oo oooo

jbm@uncle.UUCP (John B. Milton) (08/09/89)

In article <2575@laidbak.UUCP> botton@laidbak.UUCP (Brian D. Botton) writes:
...
>JOHN: Why didn't you try a software (loadable device driver) approach?
I was referring to an everything-in-the-driver approach, where access would
still be fast, but your point is well taken. It is wonderful to be able to work
on drivers the way we can on this machine, but it's still a bitch next to
regular user level programs. The idea is beginning to grow on me, which does
bring up a bit of a dilemma as far as how I'm going to implement the screen
part of the X server.

>	8.  I don't know if I should mention this, but I don't see any reason
>	    to hide it.  The displayable portion of the video does not use up
>	    all of the video ram.  So we also have an automatic shared memory
>	    segment at the end of video ram.  BUT, it is wide open and you're
>	    probably a fool to use it and an idiot to rely on it.

I WAS hoping you wouldn't mention it, but it is 32768-31320=1448 bytes...
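
The subtraction can be checked with a few lines of C.  This is only a sketch:
the 720 x 348 displayed geometry is assumed (it gives the 31320 figure), the
32768 is the video ram size used above, and the names are invented for the
example.

```c
#include <assert.h>

enum {
    VRAM_BYTES = 32768,   /* video ram size used in the subtraction above */
    COLS = 720,           /* displayed pixels per row (assumed) */
    ROWS = 348            /* displayed rows (assumed) */
};

/* Bytes of video ram that are actually displayed, one bit per pixel. */
static long displayed_bytes(void)
{
    assert(COLS % 8 == 0);
    return (long)COLS * ROWS / 8;
}

/* Bytes left over past the end of the displayed area. */
static long spare_bytes(void)
{
    return VRAM_BYTES - displayed_bytes();
}
```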

John
-- 
John Bly Milton IV, jbm@uncle.UUCP, n8emr!uncle!jbm@osu-cis.cis.ohio-state.edu
(614) h:294-4823, w:785-1110; N8KSN, AMPR: 44.70.0.52; Don't FLAME, inform!

tkacik@rphroy.UUCP (Tom Tkacik) (08/14/89)

In article <2434@itivax.iti.org> scs@itivax.iti.org (Steve Simmons) writes:
>botton@laidbak.UUCP (Brian D. Botton) writes:
>
>>  Hi netlanders.
>
>Hi yourself.  Nice introductory article.
>
>>  This article is fairly long as is gives the theory and step-by-step method
>>for constructing a daughter board that allows user level code to access the
>>video ram on a 3B1. . . .
>
>Usually these hardware articles make me nod my head sagely and wish I
>wasn't such an idiot with a soldering iron.  However, there are some
>interesting implications....

>Using a similar scheme is it possible to put >4M in the Unix-PC?  This
>should be of great interest to the mondo combo people and the X people.

This is an interesting idea.  But I do not think that it is possible.
I see three problems which will prevent it from working.

1) Memory map RAM.  To add more memory, the memory management must be
increased.  Page Mapping RAM goes from  0x400000 to 0x4007ff.  This could
be doubled to allow another 4Meg, because there is nothing in the
0x400800 to 0x400fff range.  But I think that timing would be a problem.
The way that page mapping works requires very fast page mapping RAMs, and
I am not sure that a fast enough circuit could be designed on a 
daughter board.  (I would like to hear someone prove me wrong :-).

2) The kernel must know that another 4Meg exists.  I am not sure what is
involved, but I do know that the current kernel will have to be modified.
Can anybody say that this really is a minor change?  I have my doubts.

3) The killer.  Where to put another 4Meg.  The address space defined
by Convergent is really designed with the current 4Meg limit.
They split it up into four 4Meg partitions.
	1) Fast memory -- RAM		(400nsec access)
	2) Fast I/O    -- video ram etc.
	3) Slow memory -- ROM		(900nsec access, or is it 1usec?)
	4) Slow I/O    -- misc. I/O devices

There is no place left to address more memory.  It looks like the fast I/O
only uses 0x400000 to 0x4fffff, and that perhaps we could steal
0x500000 to 0x7fffff (3Meg).  Well, 3Meg is still better than none, and
at least it is in the fast access area, so there would be no speed penalty.
Unfortunately, as Brian Botton mentioned, the address lines are not fully
decoded so that 0x400000 looks like 0x500000, 0x600000, and 0x700000.
The addressing would have to be reworked, to fully decode the I/O in this
range.
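
The aliasing can be pictured with a small sketch.  It assumes, purely for
illustration, that address bits 20 and 21 are simply ignored inside the
0x400000 to 0x7fffff partition, which is the simplest decoding that would
produce the four images listed above.  The real decode logic may differ, and
the names are invented for the example.

```c
#include <assert.h>

#define FASTIO_BASE 0x400000UL   /* start of the fast I/O partition */
#define FASTIO_END  0x7fffffUL   /* end of the fast I/O partition */
#define IGNORED     0x300000UL   /* bits 20-21, ASSUMED not decoded */

/* Address the hardware would actually respond to for a fast I/O access. */
static unsigned long fastio_decode(unsigned long addr)
{
    assert(addr >= FASTIO_BASE && addr <= FASTIO_END);
    return addr & ~IGNORED;
}
```

Under that assumption, fully decoding the range means comparing those two
bits as well, so that only 0x4xxxxx selects the existing devices and
0x500000 to 0x7fffff becomes free for RAM.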


It is an interesting idea, and if these points are really minor, and
someone thinks it can be done, I am willing to help try.

Any more comments?

---
Tom Tkacik		GM Research Labs,   Warren MI  48090
uunet!edsews!rphroy!megatron!tkacik
"If you can't stand the bugs, stay out of the roach-motel."  Ron Guilmette