aglew@crhc.uiuc.edu (Andy Glew) (09/19/90)
> [T]he UMAX4.3 kernel, which, in
> turn, should not be confused in any way with a parallelized version of
> the 4.3BSD kernel.  UMAX4.3 only provides 4.3BSD-like kernel features
> over a base UMAX "4.2" kernel (VM, process control, etc) - it is not
> a "port" of the 4.3BSD kernel.  [The kernel network code is a major
> exception to this].  That base kernel is still a reflection of the
> original (1985) design for UMAX.  This is not to say that design is
> wrong (which would be wrong to say, because it mostly works!), but
> rather to point out that some continuing problems - such as VM -
> are because of the original design.
>
> John Robert LoVerso        Home: john@loverso.loem.ma.us

I'm jumping onto this quote because it is evidence of a problem I have
seen several times in OS development that takes advantage of specialized
features of a particular computer architecture (in this case,
parallelism).  I'm posting to comp.arch and comp.os.research because
this is an OS/architecture issue; I'm posting to comp.sys.encore because
the original post was there.

In short: if you take advantage of specific architectural features in
your OS, you may make it difficult to update the OS.

Many computer companies nowadays do not really develop their OS from
scratch - rather, they take BSD, or MACH, or System V, and port it to
their hardware.  These companies are thus caught in the middle: they
would like to pass on to their customers updates of the OS they are
based on, as well as their own "value added" for the OS.

If the changes to the generic OS that support the company's architecture
are well isolated and encapsulated, there's little problem - the changes
are re-done on each release, and the customer receives BSD4.3 just a
little after the company received it.  Examples of such "controlled"
architectural dependencies in the OS are different virtual memory
structures, device drivers, etc.

However, if the changes to the generic OS that support the company's
architecture are widespread, then a lot of code needs to be changed on
each update of the base OS.  "Parallelizing" the UNIX kernel seems to be
an example of such a change: it takes advantage of a specific
architectural feature, and it produces widespread changes to the kernel
source.  The initial steps of parallelization are small: typically a
global kernel semaphore, then a few finer-grain semaphores to allow
concurrent file activity, and so on.  Eventually there is a lot of
parallel synchronization spread in different places throughout the
code - and it becomes rather a pain to update the base version of the
OS.
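To make that progression concrete, here is a minimal sketch of the two
stages, using POSIX mutexes to stand in for kernel semaphores.  All the
names (giant_lock, file_table_lock, and so on) are hypothetical
illustrations - this is not code from any actual parallel UNIX:

    /*
     * Sketch of the locking progression: stage 1 is a single global
     * kernel semaphore; stage 2 refines it into per-subsystem locks.
     * POSIX mutexes stand in for kernel semaphores; all names are
     * hypothetical.
     */
    #include <pthread.h>
    #include <stdio.h>

    /* Stage 1: one global semaphore serializes the whole kernel,
     * so only one processor executes kernel code at a time. */
    static pthread_mutex_t giant_lock = PTHREAD_MUTEX_INITIALIZER;

    static void syscall_stage1(void)
    {
        pthread_mutex_lock(&giant_lock);
        /* ... the entire system call runs under the one lock ... */
        pthread_mutex_unlock(&giant_lock);
    }

    /* Stage 2: finer-grain locks protect individual subsystems, so
     * file activity on one processor can proceed concurrently with,
     * say, process management on another. */
    static pthread_mutex_t file_table_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t proc_table_lock = PTHREAD_MUTEX_INITIALIZER;

    static void file_op_stage2(void)
    {
        pthread_mutex_lock(&file_table_lock);
        /* ... touch only file system state ... */
        pthread_mutex_unlock(&file_table_lock);
    }

    static void proc_op_stage2(void)
    {
        pthread_mutex_lock(&proc_table_lock);
        /* ... touch only process state ... */
        pthread_mutex_unlock(&proc_table_lock);
    }

    int main(void)
    {
        syscall_stage1();
        file_op_stage2();
        proc_op_stage2();
        printf("both locking stages exercised\n");
        return 0;
    }

Note what the sketch makes visible: stage 1 touches only the system
call entry and exit paths, while stage 2 scatters lock operations
across every subsystem that is made concurrent - exactly the kind of
widespread change that makes merging each new base OS release painful.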
I believe that we have seen this occur with both of the vendors of
shared-memory parallel machines, Sequent and Encore.  Both had, I
believe, a BSD4.2 porting base for their parallel versions.  Both took a
long time getting up to BSD4.3 (I remember watching from a competitor
that was BSD4.3 almost before (:-{) 4.3 was official, wondering what was
taking Sequent so long), and when they announced BSD4.3 it was really
just an update of user-level programs and a few kernel things
(networking especially) around the underlying BSD4.2 kernel.  Sequent
may have a "true" BSD4.3 kernel by now (or their kernel may be so
different that it can no longer be called BSDish), but the post that
started me off seems to indicate that Encore is not yet "true" BSD4.3.

"Parallelizing" the UNIX kernel really isn't all that hard.  It's been
done many times, at many different companies.  Some have taken the
gradual approach of applying coarse-grain locks first and then refining
them; some have "totally redesigned" the underlying kernel.  The reason
we do not see a great variety of parallel UNIXes is not that it's
difficult to do, but that parallelizing UNIX means you're in for a lot
of grungy, expensive software maintenance work, treading water trying to
keep up with updates from BSD or AT&T or SUN.

How to avoid this problem?

(1) Become very good at the *process* of parallelizing your kernel.
    Develop tools, etc., so that you can easily parallelize the updates
    from your kernel OS supplier, even if the underlying OS changes
    greatly.

(2) Give your OS changes back to your OS supplier.  Really specific
    architectural changes may not be interesting to the supplier, but
    parallelism is of pretty general interest and utility.  A reasonably
    portable shared-memory parallel kernel is probably possible.

There are several gotchas here:

a) Persuading your company that you should give all of this proprietary
   code away may be difficult - isn't that giving away all of your
   competitive advantage?  Well, yes, it is: if everybody can use the
   same parallel OS, then all you have to compete on is your hardware
   performance.

b) Even if your company is willing to give your parallel OS back to the
   OS supplier, the supplier may not want to take it.  They may not
   trust you - they may be afraid that your parallel OS will only run
   well on your hardware.  They may have NIH syndrome (AT&T suffers that
   greatly).  Or they may just not want to bother - after all, they
   don't have a need for the features you've added.

Eventually, something like the features you want will become part of the
standard BSD or AT&T UNIX; so maybe the best plan is just to admit that
whatever you're building is a stopgap, until your OS supplier starts
supplying an OS with the feature you want.

--
Andy Glew, a-glew@uiuc.edu [get ph nameserver from uxc.cso.uiuc.edu:net/qi]