mcm@peach.ucsb.edu (Marcelo Mourier) (09/25/90)
Hi everybody!

For the last couple of months I've been working on an experimental 32-bit kernel for the i386. Like Bruce's 32-bit kernel, this one is based on PC Minix 1.5. However, several changes have been made, especially in the way memory is managed.

As I don't have Bruce's kernel installed, I'm not very familiar with it. However, I think that (correct me if I'm wrong), even though he uses paging, the memory management strategy is still the same as in old Minix; i.e., it is basically segment oriented. There's one code and one data segment (now up to 4GB long), both starting at virtual address zero. These virtual addresses are mapped into linear addresses by the 386's segmentation mechanism, based on the memory map information in the code and data segment descriptors stored in the process' LDT. With paging enabled, these linear addresses are finally mapped into physical addresses by the 386's paging mechanism. However, this mapping is "constant", in that it never changes after being set up by the kernel initialization routine. Basically, it is used for skipping the various holes in the PC's physical memory layout, thus presenting a "neat" physical address space into which the linear addresses can be mapped. In this way, the relocation and protection of the various virtual address spaces is done by the segmentation unit of the 386's MMU.

In my experimental kernel (MEK) the memory management strategy is quite different. The story starts with the way virtual memory is laid out in a process. Each process has its own 4GB virtual address space, shared by its three logical segments (text, data, stack) and by the kernel. The kernel occupies the top-most 8MB of each process' virtual address space; the remaining 4GB-8MB are left for the process' text, data, and stack segments.
The text segment starts at virtual address zero, the data segment starts at the first available 4MB boundary after the end of the text segment (e.g., at virtual address 0x00400000 in most cases), and the stack segment starts (ends?) at virtual address 0xFF800000. As I said before, the last 8MB of the process' virtual address space are reserved for mapping the kernel. This is how the kernel is shared among all processes in the system. Addresses 0xFF800000 to 0xFFBFFFFF hold the kernel's text segment, and addresses 0xFFC00000 to 0xFFFFFFFF hold the kernel's data segment.

The reason for starting the data segment at the first 4MB boundary after the end of the text segment has to do with code sharing. Each page table maps 4MB of address space, so by having the data segment start at that address we keep the code page table and the data page table separate. Therefore, code can be shared among several processes (by sharing the PDE that points to the code page table), without having to share any piece of the data segment.

MEK doesn't use LDTs for defining a process' address space. It uses only the GDT, which contains eight segment descriptors: null, kernel code, kernel data, task code, task data, user code, user data, and TSS. All tasks share the task code and data descriptors, and all user processes (including MM and FS) share the user code and data descriptors. The kernel and task descriptors define segments that span the whole 4GB virtual address space and are based at linear address zero. In this way, the kernel has access to the whole address space of a process. The user descriptors define segments that start at linear address zero and end at linear address 0xFF7FFFFF, thus limiting the user's virtual address space size to 4GB-8MB. The code segment descriptors have different DPLs: the kernel has DPL=0, the task has DPL=1, and the user has DPL=3. In this way we make tasks run at CPL=1 and user processes run at CPL=3, in the same way as in current PC Minix.
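The address arithmetic behind this layout can be sketched in a few lines of C. This is only an illustration, not code from MEK: the macro and function names (PT_SPAN, data_base, pde_index, seg_limit_field) are made up here, and the sketch assumes the standard i386 scheme of 4KB pages with 1024 entries per page table, and a text segment small enough to fit in user space.

```c
#include <stdint.h>

/* All names below are hypothetical; only the addresses come from the post. */
#define PT_SPAN      0x00400000u  /* 4MB: address range covered by one page table */
#define STACK_TOP    0xFF800000u  /* user stack sits just below the kernel */
#define KERNEL_TEXT  0xFF800000u  /* kernel text: 0xFF800000 - 0xFFBFFFFF */
#define KERNEL_DATA  0xFFC00000u  /* kernel data: 0xFFC00000 - 0xFFFFFFFF */
#define USER_LIMIT   0xFF7FFFFFu  /* last byte addressable through user segments */

/* Index of the page directory entry (PDE) that covers a linear address. */
static uint32_t pde_index(uint32_t addr)
{
    return addr >> 22;
}

/*
 * Base of the data segment: the first 4MB boundary at or after the end of
 * a text segment that starts at address zero and is text_size bytes long.
 * Assumes text_size is far below 4GB, so the addition cannot overflow.
 */
static uint32_t data_base(uint32_t text_size)
{
    uint32_t base = (text_size + PT_SPAN - 1) & ~(PT_SPAN - 1);

    if (base == 0)          /* empty text: still keep PDE 0 for text alone */
        base = PT_SPAN;
    return base;
}

/*
 * 20-bit limit field of a page-granular (G=1) segment descriptor whose
 * last valid byte is last_byte. The CPU expands the field back to
 * (field << 12) | 0xFFF.
 */
static uint32_t seg_limit_field(uint32_t last_byte)
{
    return last_byte >> 12;
}
```

For a 0x1234-byte text segment, data_base() returns 0x00400000, so text is reached through PDE 0 and data through PDE 1, which is what allows the text PDE alone to be shared. The kernel text and data fall under PDEs 1022 and 1023, and seg_limit_field(USER_LIMIT) is 0xFF7FF.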
As all segments are based at linear address zero, there's no distinction between virtual addresses and linear addresses. The relocation of the different processes' virtual address spaces is done by the paging unit of the 386's MMU. In MEK each process has its own set of page directory and page tables, which is an integral part of its memory map information. This set consists of at least four pages: the page directory table (PDT), one code page table (CPT), one data page table (DPT), and one stack page table (SPT). In addition, there are the kernel code page table and the kernel data page table, which are shared by all processes. The proc structure has a new entry, p_pdt, which contains the physical address of the PDT of the process. When a process is restarted by restart(), register CR3 in the 386 is reloaded with the value stored in the p_pdt field of the process table entry of the process being restarted. In this way, the virtual -> physical mapping is switched to that of the new process.

Managing address translations in this way has some interesting consequences. For one thing, at any given time the kernel knows about only one memory map (the one of the currently active process), which means that any virtual address from a different process is meaningless to it. Secondly, the kernel ONLY knows about the physical memory used by itself and by the currently active process; any other piece of memory is not directly addressable by it. This has some nasty consequences when a block of physical memory needs to be copied to some arbitrary physical address.

In order to be able to access ANY page of physical memory, the kernel uses the following trick. Two pages in the kernel's data address space are reserved for use as source and destination "window-pages". The kernel dynamically maps these pages to the corresponding physical pages it wants to access.
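The window-page trick can be illustrated with a small user-space simulation in C. This is a sketch under stated assumptions, not MEK code: physical memory is modelled as a plain array, the two window page table entries as an array of two words, and every name (phys_mem, window_pte, map_window, window_ptr, phys_copy) is hypothetical. It also assumes the source and destination ranges do not overlap.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE    4096u
#define NFRAMES      16u          /* hypothetical size of the simulated RAM */
#define PTE_PRESENT  0x1u         /* i386 page table entry: present bit */
#define PTE_WRITE    0x2u         /* i386 page table entry: read/write bit */

enum { SRC_WINDOW, DST_WINDOW, NWINDOWS };

static uint8_t  phys_mem[NFRAMES * PAGE_SIZE]; /* stands in for physical RAM */
static uint32_t window_pte[NWINDOWS];          /* PTEs of the two window pages */

/* Point window w at the page frame starting at physical address frame. */
static void map_window(int w, uint32_t frame)
{
    assert(frame % PAGE_SIZE == 0 && frame / PAGE_SIZE < NFRAMES);
    window_pte[w] = frame | PTE_PRESENT | PTE_WRITE;
    /* A real kernel would also have to flush the stale TLB entry here,
     * e.g. by reloading CR3 on the i386. */
}

/* What the MMU would do: resolve a window to the frame it is mapped to. */
static uint8_t *window_ptr(int w)
{
    assert(window_pte[w] & PTE_PRESENT);
    return &phys_mem[window_pte[w] & ~(PAGE_SIZE - 1)];
}

/* Copy count bytes between arbitrary physical addresses via the windows. */
static void phys_copy(uint32_t src, uint32_t dst, uint32_t count)
{
    while (count > 0) {
        uint32_t soff = src % PAGE_SIZE;
        uint32_t doff = dst % PAGE_SIZE;
        /* Largest chunk that stays inside both the source and the
         * destination page. */
        uint32_t n = PAGE_SIZE - (soff > doff ? soff : doff);

        if (n > count)
            n = count;
        map_window(SRC_WINDOW, src - soff);
        map_window(DST_WINDOW, dst - doff);
        memmove(window_ptr(DST_WINDOW) + doff,
                window_ptr(SRC_WINDOW) + soff, n);
        src += n;
        dst += n;
        count -= n;
    }
}
```

The point of the chunking loop is that a copy never touches memory outside the two currently mapped frames, so two window pages are enough regardless of how large the block is or how it is aligned.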
After the mapping is set up, the kernel can copy the data by reading from the source window and writing to the destination window.

Accessing video memory is done in the same way. A third page in the kernel's data address space is reserved for the "video window", which is mapped into the video RAM in the video adapter during system initialization. The same goes for the BIOS vector table and BIOS data area, which are located in the first page of physical memory. This page frame is mapped into the kernel's BIOS window-page, also during initialization.

MEK is in its debugging stage; that is, it is NOT working yet. Making it work may take a couple of days or a couple of months, depending on how many bugs there are. The main purpose of this mail is to see if there's anyone out there who would be interested in helping with the debugging. By sharing the job among a couple of people we might have it up and running sooner (if ever!). The "prerequisites" for this kind of job are:

(1) a solid understanding of how PC Minix 1.5 works in protected mode (even better, how Bruce's 32-bit kernel works).

(2) being very familiar with the architecture of the i386, especially everything related to its MMU.

(3) having a 32-bit development system on which to assemble/compile MEK.

This last point requires more explanation. As I said above, I don't have Bruce's 32-bit Minix installed on my machine. I did all of MEK's work under SCO Unix Sys V/386. This Unix comes with the Microsoft 32-bit Software Development Tools (masm, cc, etc.). This means that if you have a different assembler (gas, etc.), all the assembly files will have to be rewritten... I know, picking Microsoft's Macro Assembler was not a very good decision, but at that time it was the only well documented set of tools I had available...
I'm far from being a good assembly language programmer (if such a thing exists!); so, if there's anybody familiar with some other 386 assembler (gas?), who's also good at squeezing bytes and clock cycles, and who is willing to revise and translate the assembly files, please also contact me.

The executable file format used by my link-editor is AT&T's COFF (Common Object File Format). This means that if your link-editor uses another file format, some modifications will have to be made in 'mkimage', the program that creates the binary image of the OS from the various pieces that constitute it.

Finally, what I've done is just the kernel work. Some changes will have to be made in MM, in order to reflect the new way memory is managed. These changes, however, should be much smaller than the ones required in the kernel.

I hope all this blah, blah, blah is of some interest to some of you. Otherwise, sorry for the noise! Those of you who would like to help in any way, please drop me a mail. If you have further questions, I'll try to answer them. However, this project is keeping me VERY busy, so don't hold your breath waiting for my reply...

Bye everyone!

--
Marcelo Mourier (mcm@cs.ucsb.edu)