[comp.os.minix] Announcement - Experimental 32-bit Kernel

mcm@peach.ucsb.edu (Marcelo Mourier) (09/25/90)
Hi everybody!

	For the last couple of months I've been working on an experimental
32-bit kernel for the i386. Like Bruce's 32-bit kernel, this one is based on
PC Minix 1.5.  However, several changes have been made, especially in the way
memory is managed.

	As I don't have Bruce's kernel installed, I'm not very familiar with
it.  However, I think (correct me if I'm wrong) that even though he uses
paging, the memory management strategy is still the same as in old Minix;
i.e., it is basically segment oriented.  There's one code and one data segment
(now up to 4GB long) both starting at virtual address zero.  These virtual
addresses are mapped into linear addresses by the 386's segmentation mechanism,
based on the memory map information in the code and data segment descriptors
stored in the process' LDT.  With paging enabled, these linear addresses are
finally mapped into physical addresses by the 386's paging mechanism. However,
this mapping is "constant", in that it never changes after being set up by the
kernel initialization routine.  Basically, it is used for skipping the various
holes in the PC's physical memory layout, thus presenting a "neat" physical
address space onto which the linear addresses can be mapped.  In this way, the
relocation and protection of the various virtual address spaces is done by the
segmentation unit of the 386's MMU.
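A rough C sketch of the two-step translation described above. It assumes (hypothetically) that the boot-time page mapping only has to hide the 384K hole between 640K and 1MB; I don't know exactly which holes Bruce's kernel skips, and the names seg_to_linear and linear_to_phys are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define HOLE_START 0xA0000u   /* 640K: start of video RAM / ROM area (assumed hole) */
#define HOLE_SIZE  0x60000u   /* 384K, up to the 1MB mark */

/* Step 1, segmentation: check the offset against the descriptor limit,
   then add the segment base taken from the process' LDT descriptor. */
static int seg_to_linear(uint32_t base, uint32_t limit, uint32_t offset,
                         uint32_t *linear)
{
    if (offset > limit)
        return 0;                 /* a real 386 would raise a fault here */
    *linear = base + offset;
    return 1;
}

/* Step 2, paging: a fixed mapping set up once at boot.  Linear addresses
   below 640K map 1:1; everything above is shifted past the hole, so the
   linear space looks like one contiguous block of RAM. */
static uint32_t linear_to_phys(uint32_t linear)
{
    assert(linear <= 0xFFFFFFFFu - HOLE_SIZE);   /* avoid wrap-around */
    return linear < HOLE_START ? linear : linear + HOLE_SIZE;
}
```

All per-process relocation happens in the first step (the base in the LDT descriptor); the second step is the same for every process.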

	In my experimental kernel (MEK) the memory management strategy is quite
different.  The story starts with the way virtual memory is laid out in a
process.  Each process has its own 4GB virtual address space, shared by its
three logical segments (text, data, stack) and by the kernel.  The kernel
occupies the top-most 8MB of each process' virtual address space; the remaining
4GB-8MB are left for the process' text, data, and stack segments.  The text
segment starts at virtual address zero, the data segment starts at the first
available 4MB boundary after the end of the text segment (e.g., at virtual
address 0x00400000 in most cases), and the stack segment grows downward from
virtual address 0xFF800000.  As I said before, the last 8MB of the process'
virtual address space are reserved for mapping the kernel.  This is how the
kernel is shared among all processes in the system.  Addresses 0xFF800000 to
0xFFBFFFFF hold the kernel's text segment, and addresses 0xFFC00000 to
0xFFFFFFFF hold the kernel's data segment.  The reason for starting the data
segment at the first 4MB boundary after the end of the text segment has to do
with code sharing.  Since each page table maps exactly 4MB (one PDE), starting
the data segment at that address puts code and data in separate page tables.
Therefore, code can be
shared among several processes (by sharing the PDE that points to the code page
table), without having to share any piece of the data segment.
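The layout arithmetic above can be sketched in C as follows. The constants are the addresses given above; data_seg_start and pde_index are hypothetical helper names, and the sketch assumes a non-empty text segment starting at virtual address zero.

```c
#include <assert.h>
#include <stdint.h>

#define MB4        0x00400000u   /* one PDE maps 1024 pages * 4KB = 4MB */
#define USER_TOP   0xFF800000u   /* stack grows down from here */
#define KTEXT_BASE 0xFF800000u   /* kernel text: 0xFF800000-0xFFBFFFFF */
#define KDATA_BASE 0xFFC00000u   /* kernel data: 0xFFC00000-0xFFFFFFFF */

/* Data starts at the first 4MB boundary at or after the end of text. */
static uint32_t data_seg_start(uint32_t text_size)
{
    assert(text_size > 0 && text_size < USER_TOP);
    return (text_size + MB4 - 1) & ~(MB4 - 1);
}

/* Index of the page directory entry (hence the page table) covering va. */
static unsigned pde_index(uint32_t va)
{
    return va >> 22;
}
```

Rounding the data start up to a 4MB boundary is exactly what guarantees that text and data never share a PDE; the kernel's text and data end up in PDEs 1022 and 1023, and the top of the user stack in PDE 1021.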

	MEK doesn't use LDTs for defining a process' address space.  It uses
only the GDT, which contains eight segment descriptors: null, kernel code,
kernel data, task code, task data, user code, user data, and TSS.  All tasks
share the task code and data descriptors, and all user processes (including MM
and FS) share the user code and data descriptors.  The kernel and task 
descriptors define segments that span the whole 4GB virtual address space, and
that are based at linear address zero.  In this way, the kernel has access to
the whole address space of a process.  The user descriptors define segments 
that start at linear address zero and end at linear address 0xFF7FFFFF, thus
limiting the user's virtual address space size to 4GB-8MB.  The code segment
descriptors have different DPLs: the kernel's has DPL=0, the task's has DPL=1,
and the user's has DPL=3.  In this way we make tasks run at CPL=1 and user
processes run at CPL=3, in the same way as in current PC Minix.  As all
segments are based at linear address zero, there's no distinction between
virtual addresses and linear addresses.  The relocation of the different
process' virtual address spaces is done by the paging unit of the 386's MMU.
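For reference, a hedged sketch of how the flat code and data descriptors described above could be encoded. It assumes 4KB granularity (G=1) and 32-bit segments (D=1), which is how a 4GB limit has to be expressed on the 386; I'm not quoting MEK's actual descriptor values, the helper names are hypothetical, and the TSS descriptor (a system descriptor with a different type field) is not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 386 code or data segment descriptor.  limit_pages is the 20-bit
   limit in 4KB units (G=1); D=1 selects 32-bit default operand size. */
static uint64_t make_desc(uint32_t base, uint32_t limit_pages,
                          unsigned dpl, int is_code)
{
    uint8_t access;
    uint64_t d = 0;

    assert(limit_pages <= 0xFFFFFu && dpl <= 3);
    /* P=1, DPL, S=1 (code/data); type 0xA = execute/read, 0x2 = read/write */
    access = (uint8_t)(0x80u | (dpl << 5) | 0x10u | (is_code ? 0x0Au : 0x02u));

    d |= (uint64_t)(limit_pages & 0xFFFFu);             /* limit 15..0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;            /* base 23..0   */
    d |= (uint64_t)access << 40;                        /* access byte  */
    d |= (uint64_t)((limit_pages >> 16) & 0xFu) << 48;  /* limit 19..16 */
    d |= (uint64_t)0xCu << 52;                          /* G=1, D=1     */
    d |= (uint64_t)(base >> 24) << 56;                  /* base 31..24  */
    return d;
}

/* Highest valid byte offset for a page-granular limit. */
static uint32_t limit_bytes(uint32_t limit_pages)
{
    return (limit_pages << 12) | 0xFFFu;
}
```

With G=1 the user limit 0xFF7FFFFF is stored as 0xFF7FF pages, so a user-mode reference at or above 0xFF800000 faults instead of reaching the kernel's mapping.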
In MEK each process has its own page directory and page tables, which are
an integral part of its memory map information.  This set consists of
at least four pages: the page directory table (PDT), one code page table (CPT),
one data page table (DPT), and one stack page table (SPT).  In addition, there
are the kernel code page table and the kernel data page table which are shared
by all processes.  The proc structure has a new entry, p_pdt, which contains
the physical address of the PDT of the process.  When a process is restarted by
restart(), register CR3 in the 386 is reloaded with the value stored in the
p_pdt field of the process table entry of the process being restarted.  In this
way, the virtual -> physical mapping is switched to that of the new process.
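The effect of reloading CR3 in restart() can be illustrated with a small user-space simulation of the 386's two-level page walk. Everything here is hypothetical scaffolding (a 64KB array stands in for physical memory, and translate/map_page are made-up names); only the directory/table/offset split of the address and the present bit follow the 386's format, and the R/W and U/S bits a real kernel would set are omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NPAGES    16u
#define PG_P      0x001u              /* "present" bit in a PDE or PTE */
#define FRAME(x)  ((x) & ~0xFFFu)     /* strip the flag bits */

static uint8_t phys[NPAGES * PAGE_SIZE];   /* simulated physical memory */

static uint32_t rd32(uint32_t pa)
{
    uint32_t v;
    assert(pa + 4 <= sizeof phys);
    memcpy(&v, &phys[pa], sizeof v);
    return v;
}

static void wr32(uint32_t pa, uint32_t v)
{
    assert(pa + 4 <= sizeof phys);
    memcpy(&phys[pa], &v, sizeof v);
}

/* Bits 31-22 index the page directory, bits 21-12 the page table,
   bits 11-0 are the offset.  Returns 0 on a not-present entry. */
static int translate(uint32_t cr3, uint32_t va, uint32_t *pa)
{
    uint32_t pde = rd32(FRAME(cr3) + (va >> 22) * 4);
    if (!(pde & PG_P))
        return 0;
    uint32_t pte = rd32(FRAME(pde) + ((va >> 12) & 0x3FFu) * 4);
    if (!(pte & PG_P))
        return 0;
    *pa = FRAME(pte) | (va & 0xFFFu);
    return 1;
}

/* Hypothetical helper: install va -> frame using page table pt under
   page directory pdt (both already allocated and zeroed). */
static void map_page(uint32_t pdt, uint32_t pt, uint32_t va, uint32_t frame)
{
    wr32(FRAME(pdt) + (va >> 22) * 4, FRAME(pt) | PG_P);
    wr32(FRAME(pt) + ((va >> 12) & 0x3FFu) * 4, FRAME(frame) | PG_P);
}
```

Switching processes amounts to passing a different cr3: the same virtual address resolves to a different frame, while a page table installed in every directory (like MEK's kernel page tables) resolves identically for all processes.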

	Managing address translations in this way has some interesting 
consequences.  First, at any given time the kernel knows about only one
memory map (that of the currently active process), which means that any
virtual address from a different process is meaningless to it.  Second, the
kernel ONLY knows about the physical memory used by itself and by the currently
active process; any other piece of memory is not directly addressable by it.
This has some nasty consequences when a block of physical memory needs to be
copied to some arbitrary physical address.  In order to be able to access ANY
page of physical memory, the kernel uses the following trick.  Two pages in the
kernel's data address space are reserved for use as source and destination
"window-pages".  The kernel dynamically maps these pages to the corresponding
physical pages it wants to access.  After the mapping is set up, the kernel
can copy the data by reading from the source window and writing to the
destination window.
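A sketch of the window-page copy loop described above, simulated in user space. The function names are hypothetical, and map_window only returns a pointer into a simulated memory array; in a real kernel it would rewrite the window's PTE and flush the old translation from the TLB (on a 386 by reloading CR3, since INVLPG only arrived with the 486). The key detail is that each window covers a single page, so the copy has to proceed in chunks that never cross a page boundary on either side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NPAGES    8u

/* Simulated physical memory; a kernel would reach RAM through its windows. */
static uint8_t phys[NPAGES * PAGE_SIZE];

/* Hypothetical stand-in for pointing a window page at a physical frame. */
static uint8_t *map_window(uint32_t frame)
{
    assert(frame % PAGE_SIZE == 0 && frame < sizeof phys);
    return &phys[frame];
}

/* Copy n bytes between arbitrary, non-overlapping physical addresses,
   one chunk at a time so each chunk stays inside both current windows. */
static void phys_copy(uint32_t src, uint32_t dst, size_t n)
{
    while (n > 0) {
        uint32_t soff = src % PAGE_SIZE;
        uint32_t doff = dst % PAGE_SIZE;
        size_t chunk = PAGE_SIZE - (soff > doff ? soff : doff);
        if (chunk > n)
            chunk = n;

        uint8_t *swin = map_window(src - soff);   /* source window-page */
        uint8_t *dwin = map_window(dst - doff);   /* destination window-page */
        memcpy(dwin + doff, swin + soff, chunk);

        src += (uint32_t)chunk;
        dst += (uint32_t)chunk;
        n -= chunk;
    }
}
```

For example, copying 6000 bytes from an address 100 bytes into one page to an address 3000 bytes into another takes four chunks, because the destination window hits its page boundary after 1096 bytes.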

	Accessing video memory is done in the same way.  A third page in the
kernel's data address space is reserved for the "video window", which is
mapped onto the video RAM in the video adapter during system initialization.
The same goes for the BIOS interrupt vector table and BIOS data area, which are located in the
first page of physical memory.  This page frame is mapped into the kernel's
BIOS window-page, also during initialization.
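A small sketch of addressing through fixed windows like these. The window addresses are hypothetical (all I've said is that the windows live in the kernel's data address space), and the sketch assumes an 80x25 text mode, whose 4000 bytes of character/attribute pairs fit in a single 4KB window page; on a color adapter the text buffer starts at physical 0xB8000 (0xB0000 on a monochrome one). The BIOS window maps physical page frame 0, which holds the real-mode interrupt vector table (4 bytes per vector, starting at address 0) and the BIOS data area starting at 0x400.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE    4096u
#define COLS         80u
#define ROWS         25u
/* Hypothetical window addresses inside the kernel data region. */
#define VIDEO_WINDOW 0xFFFFD000u   /* mapped at init onto the text buffer */
#define BIOS_WINDOW  0xFFFFE000u   /* mapped at init onto page frame 0 */

/* Virtual address of a cell's character byte; its attribute byte follows. */
static uint32_t video_cell_va(unsigned row, unsigned col)
{
    assert(row < ROWS && col < COLS);
    return VIDEO_WINDOW + (row * COLS + col) * 2u;
}

/* Virtual address, through the BIOS window, of a physical address in frame 0. */
static uint32_t bios_va(uint32_t pa)
{
    assert(pa < PAGE_SIZE);   /* only the first physical page is mapped */
    return BIOS_WINDOW + pa;
}
```

Because these mappings never change after initialization, the kernel can keep plain pointers into both windows.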

	MEK is in its debugging stage; that is, it is NOT working yet.  Making
it work may take a couple of days or a couple of months, depending on how many
bugs there are.  The main purpose of this mail is to see if there's anyone out
there who would be interested in helping during the debugging.  By sharing the
job among a couple of people we might have it up and running sooner (if ever!).
The "prerequisites" for this kind of job are: (1) a solid understanding of how
PC Minix 1.5 works in protected mode (even better, how Bruce's 32-bit kernel
works). (2) be very familiar with the architecture of the i386, especially 
everything related to its MMU. (3) have a 32-bit development system on which to
assemble/compile MEK.  This last point requires more explanation.  As I said
above I don't have Bruce's 32-bit Minix installed in my machine.  I did all
MEK's work under SCO Unix Sys V/386.  This Unix comes with the Microsoft 32-bit
Software Development Tools (masm, cc, etc.).  This means that if you have a
different assembler (gas, etc.), all the assembly files will have to be 
rewritten...  I know, picking up Microsoft's Macro Assembler was not a very
good decision, but at that time it was the only well documented set of tools
I had available...  I'm far from being a good assembly language programmer (if
such a thing exists!); so, if there's anybody familiar with some other 386
assembler (gas?), who's also good at squeezing bytes and clock cycles, and who
is willing to revise and translate the assembly files, please also contact me.
The executable file format used by my link-editor is AT&T's COFF (Common Object
File Format).  This means that if your link-editor uses a different file format, some
modifications will have to be made in 'mkimage', the program that creates the
binary image of the OS from the various pieces that constitute it.
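mkimage itself isn't shown here, but as a hedged illustration of the format-specific part it needs, here is a reader for the standard 20-byte COFF file header that every COFF object or executable begins with. The field names follow the traditional filehdr.h layout and 0x014C is the i386 magic number; read_coff_header is a hypothetical name. A link-editor with a different output format would need its own reader in place of this one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define I386MAGIC 0x014Cu   /* COFF magic number for the i386 */
#define FILHSZ    20u       /* size of the COFF file header in bytes */

struct coff_filehdr {
    uint16_t f_magic;    /* machine magic number */
    uint16_t f_nscns;    /* number of section headers */
    uint32_t f_timdat;   /* time and date stamp */
    uint32_t f_symptr;   /* file offset of the symbol table */
    uint32_t f_nsyms;    /* number of symbol table entries */
    uint16_t f_opthdr;   /* size of the optional header */
    uint16_t f_flags;    /* flags */
};

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the little-endian file header; returns 0 if the buffer is too
   short or the magic number is not the i386 one. */
static int read_coff_header(const uint8_t *buf, size_t len,
                            struct coff_filehdr *h)
{
    assert(buf != NULL && h != NULL);
    if (len < FILHSZ)
        return 0;
    h->f_magic  = le16(buf + 0);
    h->f_nscns  = le16(buf + 2);
    h->f_timdat = le32(buf + 4);
    h->f_symptr = le32(buf + 8);
    h->f_nsyms  = le32(buf + 12);
    h->f_opthdr = le16(buf + 16);
    h->f_flags  = le16(buf + 18);
    return h->f_magic == I386MAGIC;
}
```

The fields are decoded byte by byte rather than by reading the struct directly, so the code doesn't depend on the host's struct padding or byte order.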

	Finally, what I've done is just the kernel work.  Some changes will
have to be made in MM, in order to reflect the new way memory is managed.
These changes, however, should be much smaller than the ones required in the
kernel.

	I hope all this blah, blah, blah is of some interest to some of you.
Otherwise, sorry for the noise!  For those of you who would like to help in
any way, please drop me a mail.  If you have further questions, I'll try to
answer them.  However, this project is keeping me VERY busy, so don't hold
your breath waiting for my reply...

Bye everyone!



--
Marcelo Mourier (mcm@cs.ucsb.edu)