[comp.os.minix] what I'm doing with protected mode Minix

Leisner.Henr@xerox.com (Marty) (01/21/88)
What follows is a summary of what I've been doing with my Minix system for the
last 6 months on a PC-AT.  I welcome comments and opinions.  I feel a number of
things I changed make Minix a more robust system.  I'm not sure how the
following fits into the official Minix strategy.  It kinda assumes you're
reasonably familiar with the 286 architecture and know:
	TSS = task state segment
	LDT = local descriptor table
	GDT = global descriptor table, etc.
	
  Enjoy....

Here is a summary of what steps I took to get a protected mode Minix system up
and running.  I made a number of changes to a system in real mode in order to
prepare for a protected mode system. This is an outline of what I did.  

The scope of what I did became a rewrite of a number of sections.  Whenever
possible, I moved processor dependent code into separate files.

I haven't done much with it in the last 6 weeks.  I have the system running on
a separate AT in my office and use it regularly to unshar archives, untar
various tar files, etc.  I've had the system up and running for several weeks at
a time with no major problems -- occasionally I get a protection violation
(which causes a core dump) and very occasionally I get a double fault.  I don't
attempt any error recovery on an exception except a core dump -- if the system
produces the exception (as opposed to a user process), my system is dead (can't
recover).

I've made extensive changes to the kernel and the memory manager -- the file
system hasn't been touched.  Well, almost -- I had to change something from int
to unsigned to get around an Aztec bug (my pipes leaked ;-).

I used Bach's Design of the Unix Operating System for reference at times (I
wanted to see how sys V did some things).

I pretty much am running a quasi-1.2 system.  I haven't implemented the termcap
stuff yet or the boot from hard disk patches (I'm gonna get back to this soon).

Oh, by the way, I'm using an HP1631D Logic State Analyzer with a pod for the
80286 and an 80286 disassembler.  Don't leave home without one!!

DESIGN GOALS

I did a lot of kludgey things to get it to work.  I wanted to do a number of
things via the C preprocessor, but this often didn't work out the way I planned.
My basic design included these goals:
1) fork would share text
2) The memory manager would start to think in terms of attaching regions to
processes (where a region is a contiguous block of memory).
3) The memory manager would allocate memory in 256 byte pages (so a 16 bit
number would span 16Mbytes of physical memory).
4) Run two hole lists to get at extended memory (one below 640K, one above 1
Mbyte).
5) Be able to run more complicated memory models (at a minimum 1 data segment, 1
code segment and 1 stack segment).   
6) Take advantage of the processor architecture (i.e. perhaps use call-gates
instead of always passing messages).
7) Start using 32 bit virtual addresses instead of T, D or S + 16 bit offset.  
8) Be able to implement some reasonable form of shared memory.
9) Be able to use it as a real-time system (one task running on a 1kHz interrupt
for process-control type things)
10) Try and cleanly split processor/system dependent from independent code.
This isn't easy.  It turned out the modularity of the system didn't always buy
me that much because I often had to change the MM and kernel together.  Or put
in special code in the MM to be compatible with the way the kernel worked.  Or
... (you get the picture)...
11) When appropriate, rewrite rather than patching.
12) Let the system structure dictate the data structures.  I didn't want to go
through translation between internal data structures and processor dependent
data structures.  The segmented architecture of the x286 throws too much stuff
out the window (I think).  
13) Write in C whenever possible.  If possible use canned assembler routines to
take advantage of architectural features C compilers generally won't employ
(i.e. string instructions on x286 for block moves).
14) Maintain binary compatibility with older Minix versions.
15) Be able to add server tasks easily.
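
The arithmetic behind goals 3 and 4 can be sketched as follows.  This is only an
illustration, not code from the system described here: the names click_t,
click_to_phys, bytes_to_clicks and the two boundary constants are made up for
the example, and it is written in present-day C rather than Aztec C.

```c
#include <assert.h>
#include <stdint.h>

#define CLICK_SHIFT 8                       /* 256-byte allocation unit */
#define CLICK_SIZE  (1UL << CLICK_SHIFT)

/* Assumed boundaries of the two hole lists, expressed in clicks. */
#define LOW_MEM_TOP_CLICK  0x0A00           /* 640K    >> 8 */
#define EXT_MEM_BASE_CLICK 0x1000           /* 1 Mbyte >> 8 */

typedef uint16_t click_t;                   /* hypothetical: a 16 bit click number */

/* Physical byte address of the start of a click (always fits in 24 bits). */
static uint32_t click_to_phys(click_t c)
{
    return (uint32_t)c << CLICK_SHIFT;
}

/* Number of clicks needed to hold `bytes` bytes, rounded up. */
static uint32_t bytes_to_clicks(uint32_t bytes)
{
    assert(bytes <= (1UL << 24));           /* 16 Mbytes of physical memory */
    return (uint32_t)((bytes + CLICK_SIZE - 1) >> CLICK_SHIFT);
}

/* Nonzero if the click lies in extended memory (the second hole list). */
static int click_is_extended(click_t c)
{
    return c >= EXT_MEM_BASE_CLICK;
}
```

The largest 16 bit click number, 0xFFFF, maps to physical address 0xFFFF00, so
65536 clicks of 256 bytes cover exactly the 16 Mbytes the 286 can address.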

REAL MODE ENHANCEMENTS/CHANGES
This is a list of changes made which were tested in real mode:
Kernel:
1) used lidt instruction to define a new interrupt vector table.  Reprogrammed
the interrupt controllers to put the hardware vectors at 40H-4FH (interrupts
8H-FH are reserved for internal x286 exceptions).
2) took out reboot code (in order to reboot, I have to power down)
3) started using disable/restore like Xinu (instead of lock/unlock)
4) used the Aztec port subroutines (inportb(port) returns the result of the in
instruction).  I found the port_in(number, &result) a little strange -- I prefer
something like result = port_in(number).
5) I generally have separate subroutines to read/write from/to user/kernel
space.  I found this a little cleaner than doing two umaps followed by a
phys_copy.  I also am using the Aztec routine movblock(char far *src, char far
*dst, int num_bytes) and limiting block moves to 64K.  Since my process memory
space is no longer contiguous, this appears to be ok.
6) My kernel relocates dynamically.  I'm booting off DOS first, the kernel gets
its code segment and treats that as its base address (instead of hardcoding the
boot address at 0x600).  I found this easier so I don't have to format a floppy
disk each time I build a system.
7) trapped a few exceptions in real mode (e.g. illegal opcode: interrupt 6,
segment overrun: interrupt 13).  This makes a system surprisingly more robust.
8) Initialize date/time off CMOS clock
9) changed head.asm to give fs, mm and init a stack after the end of BSS.  
10) don't sort partitions in the winchester driver (so the hd numbers agreed
with my dos fdisk).  Besides the partition sorting was kinda dead code.
11) I wanted the kernel, mm and fs to use stacks at the end of their bss area (a
patch to the a.out file was sufficient).  The kernel assumes the stack starts at
the end of the bss.  This way the kernel could set up separate stack segments if
desired (I wasn't thrilled how kernel, mm, fs and init have their stacks
embedded in their data area).
12) Removed some if(pc_at) code -- since I only use PC-ATs and my enhancements
are generally PC-AT specific.
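
For item 1 above, the usual way to move the hardware vectors on an AT is to
rerun the 8259A initialization sequence with a new vector base.  The sketch
below shows that sequence for a base of 40H.  It is not taken from the kernel
described here: remap_pics and outb_logged are hypothetical names, and the port
writes are only logged to an array so the example can run anywhere.

```c
#include <assert.h>
#include <stdint.h>

#define PIC1_CMD  0x20      /* master 8259A on the AT */
#define PIC1_DATA 0x21
#define PIC2_CMD  0xA0      /* slave 8259A */
#define PIC2_DATA 0xA1

/* Stand-in for a real port write: it only records what would be written. */
struct port_write { uint16_t port; uint8_t value; };
static struct port_write io_log[16];
static int io_count;

static void outb_logged(uint16_t port, uint8_t value)
{
    assert(io_count < 16);
    io_log[io_count].port = port;
    io_log[io_count].value = value;
    io_count++;
}

/* Reinitialize both controllers so IRQ0-7 arrive at `base` and IRQ8-15 at
 * base + 8. */
static void remap_pics(uint8_t base)
{
    assert((base & 7) == 0);              /* low 3 bits of ICW2 are the IRQ number */
    outb_logged(PIC1_CMD,  0x11);         /* ICW1: edge triggered, cascade, ICW4 follows */
    outb_logged(PIC1_DATA, base);         /* ICW2: master vector base */
    outb_logged(PIC1_DATA, 0x04);         /* ICW3: slave attached to IRQ2 */
    outb_logged(PIC1_DATA, 0x01);         /* ICW4: 8086 mode */
    outb_logged(PIC2_CMD,  0x11);         /* ICW1 for the slave */
    outb_logged(PIC2_DATA, (uint8_t)(base + 8)); /* ICW2: slave vector base */
    outb_logged(PIC2_DATA, 0x02);         /* ICW3: slave identity 2 */
    outb_logged(PIC2_DATA, 0x01);         /* ICW4: 8086 mode */
}
```

Calling remap_pics(0x40) logs eight writes and leaves the master at 40H-47H and
the slave at 48H-4FH.  A real kernel would replace outb_logged with its own port
output routine and reload the interrupt masks afterwards.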

Memory Manager:
1) Memory manager allocates regions for text, data and stack.  How fork works
becomes memory model dependent.  For shared I&D, no sharing takes place between
processes and text, data and stack all share 1 segment (maximum 64K).  For
split I&D, text is shared (not copied on fork()) and data/stack is created.
2) Since the memory manager knows where everything is located, it does its
own copying (rather than passing this onto the system task).
3) Brk is a problem.  It doesn't apply much to 8086 architectures.  The best you
can hope to do is brk individual segments when more complicated memory models
exist.  I'm kludging brk now.  I've communicated with others who've implemented
Unix on segmented architectures and the agreement is brk is pretty useless.
4) changed memory allocation size from 16 byte units to 256 byte units.
5) I kinda took out stack checking for the interim.  I have enough other
protection violations that I can tell when the stack causes problems.
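
One way to picture the split I&D fork from item 1 is a region table with
reference counts.  The sketch below is a guess at the general shape, not the
contents of region.c: struct region, fork_regions and the toy bump allocator
are all invented for illustration, and no memory is actually copied.

```c
#include <assert.h>
#include <stddef.h>

#define NR_REGIONS 8

struct region {
    unsigned base;          /* start, in 256 byte units */
    unsigned size;          /* length, in 256 byte units */
    int      refs;          /* processes attached; 0 means the slot is free */
};

struct proc_regions {
    struct region *text;
    struct region *data;    /* data and stack together */
};

static struct region region_table[NR_REGIONS];
static unsigned next_free = 0x100;   /* toy bump allocator, not a hole list */

static struct region *region_alloc(unsigned size)
{
    for (int i = 0; i < NR_REGIONS; i++) {
        if (region_table[i].refs == 0) {
            region_table[i].base = next_free;
            region_table[i].size = size;
            region_table[i].refs = 1;
            next_free += size;
            return &region_table[i];
        }
    }
    return NULL;                     /* table full */
}

/* Split I&D fork: the child attaches to the parent's text region and gets a
 * new data/stack region of the same size. */
static int fork_regions(const struct proc_regions *parent,
                        struct proc_regions *child)
{
    assert(parent->text != NULL && parent->data != NULL);
    child->data = region_alloc(parent->data->size);
    if (child->data == NULL)
        return -1;
    /* a real MM would copy the parent's data/stack contents here */
    child->text = parent->text;
    child->text->refs++;
    return 0;
}

/* Drop one attachment; the slot is reusable once the count reaches 0. */
static void region_detach(struct region *r)
{
    assert(r->refs > 0);
    r->refs--;
}
```

With this shape, the text region is only released when the last process using
it exits, which is the bookkeeping that shared text needs.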


PROTECTED MODE IMPLEMENTATION

Oh boy.  This gets complicated.

1) kernel, mm and fs run code and data/stack out of GDT privilege level 0.
2) Each user level process runs out of an LDT with 2 selectors (currently) --
text and data/stack.  User level processes run at level 3.
3) Each selector in each process's LDT is aliased as a GDT entry (read/write
access) at level 0.    This solves such mundane problems as:  how does the FS
load code on exec into an executable segment?  MM also knows where these GDT
entries are (essentially each region allocated maps to 1 GDT entry).
4) Added to the system task some new messages.
They are MAKE_REGION, DESTROY_REGION.  When the MM makes/destroys regions it
needs to construct/delete GDT entries.  It's somewhat kludgey but seemed
reasonable at the time.
5) Certain initialization of the TTY console driver is necessary.  Since we are
now using virtual addressing, we don't have to be concerned about where the
video ram is located once we have a virtual address.  Video ram is accessed at
level 0.
6) Interrupt initialization becomes somewhat more involved.  All interrupts are
currently task gates.  I didn't see much of a reason to use trap and interrupt
gates anywhere, since in Minix all context is saved on the way into an interrupt
anyway.  We may as well finish the task switch.  This means the kernel now
becomes a task (in unprotected Minix it seems the kernel just assumes the
identity of the caller).
7) System call server and task switching.
After I drop into protected mode, I execute the following subroutine:
/* this acts as a system call server -- it hangs the kernel task in the
 * while loop
 */
static void startup_protected_mode()
{
	set_task_register(KERNEL_TSS_SELECTOR);
	
	
	disable();
	while(1) {
		restart();		/* run new proc */
		
		/* can only get here from context of previous scheduled 
		 * process -- calling semantics put function in CX,
		 * src/dest in ax and message pointer in bx.
		 */
		sys_call(proc_ptr->proc_context.cx_image, cur_proc,
			 proc_ptr->proc_context.ax_image,
			 proc_ptr->proc_context.bx_image);
	}
}

I've totally munged the proc structure.  I put the TSS in the proc table.  The
LDTs also currently sit in the proc table.

Restart looks like this now:
PUBLIC void restart()
{	
	short ps;
	
	ps = disable();		/* no interrupt while task switching */
	
	clear_out_backlink_chain();
	if(cur_proc == IDLE)
		far_jump(0, BUILD_SELECTOR(IDLE_TSS_INDEX, GDT, 0));
	else	
		far_jump(0, proc_ptr->proc_tss_selector);	/* do a task switch */
	
	
	restore(ps);	/* put interrupts back */
}

I hope the above two code examples give a feel for what I'm doing.
An Intr32 will cause a task switch into the kernel.  A far jump to a Task State
Segment starts a new task running and saves the old state.  Pretty neat (by the
way, a task switch on the 286 takes about 185 clock cycles).

8) Had to supply necessary 286 opcodes via codemacros for Aztec assembler.
Supplied simple subroutine library to access these special opcodes from C.
9) Had to build GDT and IDT in real mode before switching into protected mode.
10) Removed some address space checks in the kernel.  Replaced the umap
mechanism with the following function:
/* Checks to see if the selected virtual addresses are legal for the
 * selected process.
 * If it returns TRUE, the access will not generate exceptions.
 * If it returns FALSE, the access will cause problems (exceptions) if attempted.
 * Runs with current ldt of calling process.
 *
 * It basically takes the place of umap in a virtual addressed machine.
 */
int check_proc_addr_space(rp, address, selector, size)
register PROC *rp;
int selector;		/* segment (to become selector) */
char *address;		/* address within segment 	*/
unsigned size;		/* size of block within segment */
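
The body of check_proc_addr_space is not shown above.  As a rough illustration
of the central test such a routine presumably needs, the sketch below checks a
block against a segment limit.  seg_range_ok is a hypothetical helper, it
assumes an expand-up segment whose limit is the last valid offset, and it
leaves out the selector and privilege checks entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the block [offset, offset + size) lies inside an expand-up
 * segment whose descriptor limit (the last valid offset) is `limit`,
 * otherwise 0. */
static int seg_range_ok(uint16_t limit, uint16_t offset, uint32_t size)
{
    if (size == 0)
        return 1;                   /* nothing would be touched */
    if (offset > limit)
        return 0;                   /* starts past the end of the segment */
    /* compare in 32 bits so offset + size cannot wrap around 64K */
    return size - 1 <= (uint32_t)limit - offset;
}
```

For a 4K segment (limit 0x0FFF), seg_range_ok(0x0FFF, 0, 0x1000) accepts the
whole segment while seg_range_ok(0x0FFF, 0, 0x1001) rejects a block one byte
too long.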


OBSERVATIONS

The protection mechanisms make things much easier to debug when there are major
malfunctions in software.  Generally, the offending CS:IP is displayed, so it is
possible to see where the failure occurred.

This was also of use bringing up the system.  Once I got I/O going to the
screen, a large amount of the system became self-diagnostic.

I did have one bad bug which required having a separate idle task.  Also kernel
mode must run with interrupts disabled.  I initially idled in the kernel
halting (wait for interrupt), but eventually it caused glitches.  I understood
the problem well enough to know how to fix it (make idle a separate task) but
didn't do a thorough analysis on it.

I totally changed dmp around.  F1 reports virtual addresses, F2 reports the
memory map.  Kinda neat to see all my user tasks running in the same
virtual address space (LDT selector 0 for code, LDT selector 1 for data, both at
level 3).
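
For reference, a 286 selector packs a table index, a table indicator bit and a
requested privilege level into 16 bits.  The sketch below shows that packing,
assuming "selector 0" and "selector 1" above mean LDT indices 0 and 1.
build_selector is a hypothetical function written for this example; it is not
the BUILD_SELECTOR macro used in restart().

```c
#include <assert.h>
#include <stdint.h>

#define TI_GDT 0u       /* table indicator: global descriptor table */
#define TI_LDT 1u       /* table indicator: local descriptor table */

/* Index goes in bits 15..3, table indicator in bit 2, requested privilege
 * level in bits 1..0. */
static uint16_t build_selector(unsigned index, unsigned table, unsigned rpl)
{
    assert(index < 8192 && table <= 1 && rpl <= 3);
    return (uint16_t)((index << 3) | (table << 2) | rpl);
}
```

Under that assumption every user process sees the same two selector values:
build_selector(0, TI_LDT, 3) is 0x0007 for code and build_selector(1, TI_LDT, 3)
is 0x000F for data.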

With some optimization, certain things should be much faster (e.g. context
switching).  I'm not sure of performance with respect to the old kernel -- but I
know the protection mechanisms buy much more reliability.  Minor performance
degradation is definitely worth it.

DISTRIBUTION

I'm not sure at the moment I have the facilities to release this.  Everything
I'm doing is based on Aztec C and I use a good percentage of the Aztec library
for processor dependent stuff.  I use a few simple Aztec subroutines (e.g.
index, movblock, movmem, port access instructions) + some Xinu-like stuff
(disable()/restore() instead of lock()/unlock()).

When I recompiled the kernel on the Aztec 4.1 compiler, it crashed.  There were
some new (and innovative) bugs in the 4.1 compiler which my kernel tripped over.
So I'm compiling the kernel with the 3.4 compiler and the memory manager with
the 4.1 compiler.  If you have 3.4, you should have no problems.

I suppose I could supply specifications for the interested programmer to rewrite
the Aztec supplied subroutines (they're really trivial) or I could rewrite them
myself (a few hours of work) or see if Manx will allow me to distribute a few
hundred lines of copyrighted assembler.  In addition, the Manx assembler
supports some nice assembler features in a macro package (e.g. a procdef macro
to automatically pull stuff off the stack).  I generally find stuff like:
	mov	ax,16[bp]
kinda impossible to follow.

The changes I've made were so extensive, I'm not sure it would be reasonable to
post diffs.  I'd be willing to put out source and binary versions of what I'm
working with if it's okay with Andy.

To the kernel I've added the following files:
   
   286opcod.asm -- provides C interfaces to 286 opcodes
   intr286.c    -- interrupt support routines for managing IDT and handling
                   exceptions
   mpx286.asm   --
   misc286.c    -- assorted code (generally support routines to manage
                   descriptor tables)
   klib286.asm  -- 286 dependent assembler support (e.g. dma_read/write,
                   read cmos ram)
   286info.h    -- defines template structures for 286 descriptors/TSS/etc.
                   Provides a number of macros for a large number of
                   predefined segments.
   80286.h      -- code macros for the assembler for 80286 special opcodes

The above accounts for about 2000 lines of new, 286 specific code in addition to
all the changes sprinkled in with the base system. 
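
For readers who don't have the 286 descriptor layout in their head, the sketch
below shows an 8 byte segment descriptor and how a level 3 code segment can be
given a level 0 read/write data alias, as described in the protected mode
section.  It is not the contents of 286info.h: struct desc286, make_desc,
make_rw_alias and the ACC_ macros are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>

struct desc286 {
    uint16_t limit;         /* last valid offset in the segment */
    uint16_t base_low;      /* physical base, bits 0-15 */
    uint8_t  base_high;     /* physical base, bits 16-23 */
    uint8_t  access;        /* present, DPL, segment type, accessed */
    uint16_t reserved;      /* must be zero on the 286 */
};

#define ACC_PRESENT  0x80
#define ACC_DPL(n)   ((n) << 5)
#define ACC_SEGMENT  0x10   /* code or data, not a system descriptor */
#define ACC_CODE     0x08   /* executable */
#define ACC_RW       0x02   /* readable (code) or writable (data) */

static struct desc286 make_desc(uint32_t base, uint16_t limit, uint8_t access)
{
    struct desc286 d;
    assert(base < (1UL << 24));          /* 24 bit physical addresses */
    d.limit     = limit;
    d.base_low  = (uint16_t)(base & 0xFFFF);
    d.base_high = (uint8_t)(base >> 16);
    d.access    = access;
    d.reserved  = 0;
    return d;
}

/* Level 0 read/write data descriptor covering the same memory as `d`. */
static struct desc286 make_rw_alias(struct desc286 d)
{
    d.access = ACC_PRESENT | ACC_DPL(0) | ACC_SEGMENT | ACC_RW;
    return d;
}
```

A level 3 readable code segment gets access byte 0xFA, and its alias gets 0x92
with the same base and limit, which is what lets level 0 code write into a
segment that user code can only execute.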
 
To the memory manager, I added a file called region.c which started to allocate
memory in regions.  I also made extensive changes to the fork/exec
implementation as outlined above.

FURTHER WORK

The following is a list of some things I'm going to be doing (in no particular
order):
1) start replacing some system task messages with call gates (i.e. let mm or fs
treat them as normal subroutine calls).
2) Develop a separate interface between user procs and the kernel to treat all
pointers as 32 bit virtual addresses.  This would be in addition to the way the
current kernel interface works.  The kernel, mm and fs would start to treat
addresses as 32 bit virtual addresses and the Interrupt 32 handler would have to
repackage the 16 bit addresses into 32 bit virtual addresses.
3) Let kernel, mm and fs map in the user proc address space (in LDT) to access
user memory.
  Prior to a move to/from user/supervisor space, privilege has to be examined to
make sure the OS won't cause a protection exception.
4) Encode the initial IP in the a.out files (instead of defaulting to 0).  I
find it a pain in the ass to have to link in crtso in front of every project.
This looks like a simple enough enhancement.
5) Develop a scheme for installable device drivers.  Or at least a reasonable
scheme to include device drivers with code, data and stack space separate from
the kernel.  This may be handy for certain uses (ethernet?)
6) My boot loader sometimes hangs.  Don't know why.  Sometimes I need to
power down a few times before I finally can boot.
7) Look into bringing up ethernet capability using the 3com ethernet board.
I'd want to do most reasonable things (ftp, login, file server) via XNS and
TCP/IP.  I'd also think it would be spiffy to bring up a subset of Cornell
Bridge on Minix (Bridge allows XDE to act as a windowed front end for a BSD 4.3
system running XNS).
8) Start using extended memory.
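
The repackaging mentioned in item 2 could look something like the sketch below,
which puts a selector in the high word and a 16 bit offset in the low word (the
same layout as a far pointer).  The type vaddr_t and the three helper functions
are assumptions made for illustration, not part of the system described here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t vaddr_t;   /* hypothetical: selector high word, offset low word */

/* Combine a selector and a 16 bit offset into one 32 bit virtual address. */
static vaddr_t make_vaddr(uint16_t selector, uint16_t offset)
{
    return ((vaddr_t)selector << 16) | offset;
}

static uint16_t vaddr_selector(vaddr_t v)
{
    return (uint16_t)(v >> 16);
}

static uint16_t vaddr_offset(vaddr_t v)
{
    return (uint16_t)(v & 0xFFFF);
}
```

An Interrupt 32 handler built this way would pair the 16 bit pointer from the
message with the caller's data selector, e.g. make_vaddr(0x000F, 0x1234) gives
0x000F1234.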

marty
ARPA:	leisner.henr@xerox.com
GV:  leisner.henr
NS:  martin leisner:henr801c:xerox