john@dido.UUCP (John Collins) (11/29/84)
Why do so many commands such as 'ps', 'ipcs' and what have you have to use the /unix namelist to find out kernel addresses?

I'd like to propose subdivisions of /dev/kmem, thus

    /dev/kmem/proc     for the process table
    /dev/kmem/inode    for the inode table

and so forth. Implementation would be trivial.

Think of the advantages:

1. "Anyone" could write their own "ps" without being superuser with X-ray vision on /dev/*mem etc.
2. You could control access to the various bits as you wished - no worrying about people monitoring clists for passwords etc.
3. Ps would run a lot faster not having to pick its way through the symbol table of /unix.
4. Ps (and other such programs) would not have to know if the current system wasn't /unix. Should be an environment variable at present anyway.

Ok - what have I overlooked..... Start flaming now!!

-- John Collins Please note that I am visiting Sweden. Address all replies to ist!inset!jmc Phone: +44 727 57267 Snail mail: 47 Cedarwood Drive, St Albans, Herts, AL4 0DN, England.
olson@fortune.UUCP (Dave Olson) (12/02/84)
At Fortune Systems, we took a somewhat different approach. What we did was to fix forever in low core the important variables (or pointers thereto, in the case of tables such as the proc table). This does indeed speed up ps, pstat, vmstat, etc. We also put the SIZES of the various structures (user, proc, etc.) in low core. This means that (with suitable re-writing of ps, pstat, etc.) you can even change the sizes of the structures without re-compiling all the utilities, as long as you are simply adding new stuff at the end. It means a bit more work in the auto-configuration code, but since we load drivers from prom, and the driver will have to work with multiple releases of the OS, the drivers can't have the values compiled into them, and therefore have to be able to look them up in known locations. As a side note, the numbers of procs, clists, disk buffers, etc. are settable by a user mode program, which writes the info into an EAROM, which is then examined by the kernel at auto-configuration, so we absolutely CAN'T have table sizes compiled in. Dave Olson, Fortune Systems UUCP: {ihnp4,ucbvax!amd}!fortune!olson ARPA: amd!fortune!olson@BERKELEY
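To see why publishing the structure sizes helps, here is a minimal sketch in which an older utility steps through a table using the entry size recorded by the running kernel rather than its own compiled-in sizeof, so fields appended at the end are simply skipped. The low-core layout (`struct lowcore` and its fields) and `struct proc_v1` are hypothetical, and a byte buffer stands in for kernel memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "low core" header that the kernel would publish at a
 * fixed address; the field names are invented for this sketch. */
struct lowcore {
    size_t proc_off;   /* where the proc table starts in the image  */
    size_t nproc;      /* number of entries, chosen at boot         */
    size_t proc_size;  /* sizeof(struct proc) in the RUNNING kernel */
};

/* The fields this (older) utility was compiled to understand. */
struct proc_v1 {
    int p_pid;
    int p_stat;
};

/* Copy the known prefix of proc entry i into *out. The stride comes
 * from the kernel's header, not from sizeof *out, so fields appended
 * by a newer release are skipped. The header is trusted to describe
 * memory that really exists. Returns 0, or -1 if i is out of range or
 * the kernel's entries are smaller than the prefix we expect. */
int read_proc(const unsigned char *mem, const struct lowcore *lc,
              size_t i, struct proc_v1 *out)
{
    assert(mem != NULL && lc != NULL && out != NULL);
    if (i >= lc->nproc || lc->proc_size < sizeof *out)
        return -1;
    memcpy(out, mem + lc->proc_off + i * lc->proc_size, sizeof *out);
    return 0;
}
```

The scheme only works under the rule Dave states: new fields go at the end, so the common prefix keeps its offsets from release to release.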
perry@heurikon.UUCP (Perry Kivolowitz) (12/02/84)
> Why do so many commands such as 'ps', 'ipcs' and what have you have to use
> the /unix namelist to find out kernel addresses?
>
> I'd like to propose subdivisions of /dev/kmem, thus
>
> /dev/kmem/proc for the process table
> /dev/kmem/inode for the inode table
>
> and so forth. Implementation would be trivial.
>
> Think of the advantages:
>
> 1. "Anyone" could write their own "ps" without being superuser with
> X-ray vision on /dev/*mem etc.
>
> 2. You could control access to the various bits as you wished - no
> worrying about people monitoring clists for passwords etc.
>
> 3. Ps would run a lot faster not having to pick its way through the
> symbol table of /unix.
>
> 4. Ps (and other such programs) would not have to know if the current
> system wasn't /unix. Should be an environment variable at present
> anyway.
>
> Ok - what have I overlooked..... Start flaming now!!
>
> --
> John Collins

It has been said often that the best ideas are the ones which are simple. This seems to me to be an effective yet simple way to break key data structures into the open. The idea of doing so is not a new one to this newsgroup; numerous persons voiced arguments both for and against such a tack. (I can't recall any of the arguments against... someone want to refresh me?)

To my mind the most important of the points listed by John is making programs which currently require the name-list of the currently executing o.s., name-list independent. It's a good idea - and fits in well with established UNIX* philosophy.

Another point (though pretty obvious, it is well worth making) is that implementing /dev/kmem/data-structure files will not in any way make the current ps, df, etc. incompatible. Not a point to be dismissed lightly.

Related topics: Research at Stony Brook University; viewing data structures in the kernel as relations and operating on them as such. ``Processes as Files'', Thomas J. Killian, 1984 Summer Usenix Conference.

Perry S. Kivolowitz
Heurikon Corporation
-----------------------------------------------
*UNIX is a trademark of a once proud but now morally destitute company.
sjr@ubu.UUCP (Stephen J. Rumsby) (12/02/84)
I too would be interested in details of this. I have heard about these devices before but have, as yet, been unable to track down any details. Has anybody out there implemented them? Steve. PS. When I was told about them they were given the name "Kernel tables".
joe@fluke.UUCP (Joe Kelsey) (12/03/84)
I think that one system call UNIX has been missing for years is one to return kernel structures. Sure, /dev/kmem is a very general way to provide whatever access you want, but it is an open hole into the system and as such it is open to security problems, etc. A system call can be made much more secure and can also provide MUCH faster access to the required structures than hacking around with namelists and kmem... However, it does complicate the case of ps, etc., reading core dumps. No matter what change you propose to /dev/kmem, it is bound to break ps, pstat, etc., access to crash dumps. I really like being able to use ps on crashes, but I also dislike the speed penalty you pay for runtime-access. There must be a better (or at least faster) way! As I see it, there is no easy solution (or free lunch for that matter). /Joe
gijs@sara70.UUCP (gijs) (12/04/84)
In <161@dido.UUCP> john@dido.UUCP (John Collins) writes:
>Why do so many commands such as 'ps', 'ipcs' and what have you have to use
>the /unix namelist to find out kernel addresses?
>I'd like to propose subdivisions of /dev/kmem, thus
> /dev/kmem/proc for the process table
> /dev/kmem/inode for the inode table
>and so forth. Implementation would be trivial.
>Think of the advantages:
>1. "Anyone" could write their own "ps" without being superuser with
> X-ray vision on /dev/*mem etc.

And what about the X-ray vision on /dev/swap?

>2. You could control access to the various bits as you wished - no
> worrying about people monitoring clists for passwords etc.

The best thing to do is *not* to give any access to system tables. Accessible tables will soon be used by many application programs which prevents format changes of such tables. And even if you restrict access to the super user it would be a bad idea. Adding some feature to a ps-like program should never mean adding another "view" to a kernel driver.

There are also practical problems:
- how do you interpret pointers to things?
- what to do if you need many tables (=files) open at the same time on small machines?

>3. Ps would run a lot faster not having to pick its way through the
> symbol table of /unix.

I suspect that ps spends most of its time reading swap files.

Gijs Mos, Free University Dept of Biology Amsterdam {seismo,decvax,philabs}!mcvax!sara70!gijs
chris@umcp-cs.UUCP (Chris Torek) (12/05/84)
Actually, I profiled ``ps'' to see why it was so slow. On 4.2BSD at least, ps aux spends the majority of its time in the stat() system call, poking through /dev to find all the tty numbers.... -- (This line accidentally left nonblank.) In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690 UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris CSNet: chris@umcp-cs ARPA: chris@maryland
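Chris's measurement points at one stat() per /dev entry. A common remedy, sketched here with POSIX calls, is to pay for the scan once and then answer each "which tty has this device number?" question from a table in memory. The table size and the names `scan_devs` and `devname_of` are illustrative assumptions, not taken from the 4.2BSD ps source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAXDEVS 4096

/* Hypothetical helper names; not from any real ps. */
struct devname {
    dev_t rdev;       /* device number of a character special file */
    char  name[64];   /* its name within the scanned directory     */
};

static struct devname devtab[MAXDEVS];
static int ndevs;

/* Scan dir once, paying one stat() per entry, and remember every
 * character device. Returns the number recorded, or -1 on error. */
int scan_devs(const char *dir)
{
    DIR *d;
    struct dirent *de;
    char path[1024];
    struct stat st;

    assert(dir != NULL);
    if ((d = opendir(dir)) == NULL)
        return -1;
    ndevs = 0;
    while ((de = readdir(d)) != NULL && ndevs < MAXDEVS) {
        snprintf(path, sizeof path, "%s/%s", dir, de->d_name);
        if (stat(path, &st) == 0 && S_ISCHR(st.st_mode)) {
            devtab[ndevs].rdev = st.st_rdev;
            snprintf(devtab[ndevs].name, sizeof devtab[ndevs].name,
                     "%s", de->d_name);
            ndevs++;
        }
    }
    closedir(d);
    return ndevs;
}

/* Map a device number to a name with no system calls at all. */
const char *devname_of(dev_t rdev)
{
    for (int i = 0; i < ndevs; i++)
        if (devtab[i].rdev == rdev)
            return devtab[i].name;
    return NULL;
}
```

With the table built once, a ps-like loop over many processes costs one pass over /dev instead of one pass per process.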
jim@haring.UUCP (12/05/84)
>3. Ps would run a lot faster not having to pick its way through the > symbol table of /unix. A long time ago on V7 I changed 'ps' to only read the name list once on boot, then store the useful stuff in a file somewhere. It only then needs to check that the file exists and do a stat on /(vm)unix to tell whether it really needs to plow through the namelist once more. I'm sure a lot of others did the same; I notice the most recent version of the 4.2BSD 'ps' from Berkeley does it. Jim McKie Centrum voor Wiskunde en Informatica, Amsterdam mcvax!jim
chuqui@nsc.UUCP (Cheshire Chuqui) (12/05/84)
References: <161@dido.UUCP> <1974@vax4.fluke.UUCP>
Reply-To: chuqui@nsc.UUCP (Cheshire Chuqui)
Organization: Plaid Heaven

>However, it does complicate the case of ps, etc., reading core dumps.
>No matter what change you propose to /dev/kmem, it is bound to break
>ps, pstat, etc., access to crash dumps. I really like being able to
>use ps on crashes, but I also dislike the speed penalty you pay for
>runtime-access. There must be a better (or at least faster) way!
>
>As I see it, there is no easy solution (or free lunch for that matter).

Actually, there is. If there were decent crash analysis tools, we wouldn't need to hack out the run-time tools to do crash analysis for us and take the associated run-time penalties. I have a distinct aversion to programs that run exceptionally slowly because of a seldom-used special case-- better to make two versions: one fast for normal use and one for the special case.

I've looked at converting kmem reads to system calls-- in my copious free time one of these days I'd like to implement it. Kmem, to put it simply, scares me.

chuq
-- From the center of a Plaid pentagram: Chuq Von Rospach {cbosgd,decwrl,fortune,hplabs,ihnp4,seismo}!nsc!chuqui nsc!chuqui@decwrl.ARPA ~But you know, monsieur, that as long as she wears the claw of the dragon upon her breast you can do nothing-- her soul belongs to me!~
rogers@dadla.UUCP (Roger Southwick) (12/06/84)
What I'd really love to see is a system call from which you could get the information needed for ps. This would simplify the dickens out of "ps", "w", and other programs which need to find out about non-children processes (like an auto-logout based on time program we have that was hacked from the "w" code...).

Anyway, if the complete system call is not possible, I would like a cleaner interface into /dev/?mem. Perhaps a system call which you could use to get addresses of various things rather than just knowing how big items are, and looking from the beginning of tables (or however ps handles it). This would make those tables whose size is determined at boot time a breeze to look thru, and really wouldn't cost that much. Something like:

    getaddrof(item, addressp)
    int item;
    char *addressp;

"item" could be selected from a list of defines, and we would supply a char pointer to stuff the address into ("addressp", above). The call should only be good for the super user, so users can't go looking at other people's data (possibly getting a password...).

Well, enough of the strangeness. I suppose it's only wishful thinking on my part. I guess the handstands one must go through are really job security. I mean, gosh.. then EVEN the unskilled could write a portable "ps"... :-)

-Roger UUCP: HOST!tektronix!dadla!rogers Where HOST is any one of: masscomp,decvax,allegra,uf-cgrl,mit-eddie,mit-ems, uoregon,psu-cs,orstcs,zehntel,ucbcad,ucbvax,purdue, uw-beaver,reed,ogcvax,ihnp4,tekred,minn-ua,cbosg CSnet: rogers%dadla@tektronix ARPAnet: rogers%dadla%tektronix@csnet-relay
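Roger's getaddrof() could be little more than a bounds-checked table lookup inside the kernel. The sketch below simulates it in user space. The item numbers (`GA_PROC` and friends), the table contents, and the explicit effective-uid argument standing in for the kernel's own credential check are all hypothetical; a real call would also copy the address out to the user rather than store it directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item numbers; new items would only ever be appended. */
enum { GA_PROC = 0, GA_INODE = 1, GA_FILE = 2, GA_NITEMS };

/* Stand-in for addresses the kernel would fill in at boot time. */
static unsigned long kaddrs[GA_NITEMS] = { 0x1000UL, 0x2400UL, 0x3800UL };

/* Store the address of item in *addressp. Superuser only (euid 0), as
 * suggested above. Returns 0, or -1 on a bad item or no permission. */
int getaddrof(int item, unsigned long *addressp, int euid)
{
    assert(addressp != NULL);
    if (euid != 0 || item < 0 || item >= GA_NITEMS)
        return -1;
    *addressp = kaddrs[item];
    return 0;
}
```

Because the table is filled in when the kernel boots, tables sized at boot time need no special handling in the caller.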
rpw3@redwood.UUCP (Rob Warnock) (12/08/84)
Just a hint... Other systems have been faced with this problem in the past. It might be worth looking around to see how others have addressed it. For one "reasonable" implementation of system calls to fetch kernel tables, go look at a copy of the TOPS-10 Monitor Calls Manual (I hear they're still sold at some local DEC offices) and look at the "gettab" call and the data structures it references.

- There are "n" tables, sub-indexed by item number. New tables (added as the kernel functionality changes) are added to the end. The low numbers don't change across releases. Often the sub-index is the job (UNIX "process") number. Much more (but not all) of the per-job information is kept in main memory than in UNIX (i.e., the "proc" tables are bigger and the "upages" are smaller). Functions are provided to assist in finding things on swap space.
- Tables can be variable length, and range-checking is done when you ask for an item.
- Some tables are privileged to the super-user "[1,2]"; some are wide open; and some can be read by anyone if you are asking about "this" job (but only [1,2] can ask about other jobs).
- As expected, all of the tables and all of the fields and all of the magic values within fields have symbolic names. Unfortunately, only languages with macro capabilities (such as MACRO-10 or BLISS) can easily get at them. (In UNIX read "as" and "C".)
- Provisions are there (and used) to handle sub-tables of arbitrary extent, such as when you have multiple CPUs and want per-CPU data rather than per-system (TOPS-10/SMP supported up to 6 CPUs, last time I looked), such as # of interrupts by CPU, or idle time by CPU, etc.
- For any table, you can get the address of the table (if you are [1,2]).
- Instead of /dev/mem, per se, there are the general "peek" and "poke" calls, for those (hopefully rare) times one needs to twiddle bits. (Cf. the next item.)
- A system call named "spy" (apt name) allows the kernel itself to be mapped read-only into your address space, ENORMOUSLY speeding up "systat" (a.k.a. "ps") and "sysdpy" (a.k.a. "mon" in net.sources recently).
- TOPS-10 has a restricted form of setuid programs ("jacct" programs), so all of this can be hidden safely for ordinary users to access (like "ps"). [See Note]
- There is MUCH MUCH more in these tables than I have ever seen displayed on any TOPS-10 program, or any 4.1bsd program for that matter. One sometimes wonders at the overhead of maintaining the data in the tables! (Although much of it is there just to help DEC's internal developers tune the system.) But looking more closely one sees that you're only paying for pointers to tables that have to be there anyway (pointers are cheap).
- Along with the "gettab" facility, there is the "meter" facility, which allows a data-gathering program to activate "meter points" in the kernel. The program gets sent an event record each time control passes through such a meter point. One can reduce the number of events recorded by qualifying the event with such things as job number, number of times per event, contents of some data field, etc.

All in all, I think UNIX hackers could gain a number of good ideas by browsing through the manual for this ancient but venerable O.S. In particular, the software-interrupt, async I/O, and real-time features should drop into UNIX quite nicely (inasmuch as they were added progressively to TOPS-10 itself).

[Note] It's my understanding that the Bell Labs patent on "setuid" applies only to cases where the setuid is an attribute of the FILE, and also only when the "uid" to set to can be a VARIABLE, since systems such as TOPS-10 had the restricted case of setuid-root (based on a kernel table of magic filenames) some time before UNIX.
Rob Warnock Systems Architecture Consultant UUCP: {ihnp4,ucbvax!dual}!fortune!redwood!rpw3 DDD: (415)572-2607 Envoy: rob.warnock/kingfisher USPS: 510 Trinidad Ln, Foster City, CA 94404
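A user-space sketch of the gettab shape Rob describes: tables selected by number, entries selected by index, range checking on every request, and a per-table access class (open, [1,2] only, or "this job" unless [1,2]). The table numbers, contents, and the `GT_*` names are invented for illustration and are not the TOPS-10 manual's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Access classes modeled on the description above (names hypothetical). */
enum access { GT_OPEN, GT_PRIV, GT_SELF };

struct table {
    enum access acc;
    size_t      len;     /* tables may be variable length */
    const long *data;
};

/* Illustrative data: two per-job tables indexed by job number and one
 * system-wide table. New tables would be appended at the end. */
static const long jobstate[] = { 1, 3, 2, 0 };
static const long jobcpu[]   = { 10, 250, 75, 0 };
static const long sysinfo[]  = { 6, 512 };

static const struct table tabs[] = {
    { GT_SELF, 4, jobstate },
    { GT_PRIV, 4, jobcpu },
    { GT_OPEN, 2, sysinfo },
};
#define NTABS (sizeof tabs / sizeof tabs[0])

/* Fetch item idx of table tab on behalf of job `self`; `priv` is
 * nonzero for a [1,2] caller. Returns 0 and stores the value, or -1
 * for a bad table, an out-of-range index, or insufficient privilege. */
int gettab(size_t tab, size_t idx, int self, int priv, long *out)
{
    assert(out != NULL);
    if (tab >= NTABS || idx >= tabs[tab].len)
        return -1;
    if (tabs[tab].acc == GT_PRIV && !priv)
        return -1;
    if (tabs[tab].acc == GT_SELF && !priv && idx != (size_t)self)
        return -1;
    *out = tabs[tab].data[idx];
    return 0;
}
```

Since callers name tables by number rather than by address, the kernel is free to move or resize the underlying data between releases.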
rcd@opus.UUCP (Dick Dunn) (12/08/84)
>...
> >However, it does complicate the case of ps, etc., reading core dumps.
> >No matter what change you propose to /dev/kmem, it is bound to break
> >ps, pstat, etc., access to crash dumps. I really like being able to
> >use ps on crashes, but I also dislike the speed penalty you pay for
> >runtime-access. There must be a better (or at least faster) way!
> >
> >As I see it, there is no easy solution (or free lunch for that matter).
>
> Actually, there is. If there were decent crash analysis tools, we wouldn't
> need to hack out the run-time tools to do crash analysis for us and take
> the associated run-time penalties...

But there's a big win in having the same tool used for a look at a running system and at a crash dump--you're MUCH more likely to get the same answer. If you see something odd in a ps or a pstat of a sick system, you might like to be able to euthanatize it and find the same thing in the corpse. Moreover, there are probably reasonable ways to streamline or eliminate the digging through the namelist without breaking the usefulness as a debugging tool.

> I've looked at converting kmem reads to system calls-- in my copious free
> time one of these days I'd like to implement it. Kmem, to put it simply,
> scares me.

Agree wholeheartedly. I've always been uneasy about it; I've been absolutely paranoid since a moderately clever programmer showed me the program he wrote in about a day to spy on clists and watch any terminal he chose. (Admittedly it missed characters, but...) I'd probably even give up the ability to "adb -w /vmunix /dev/kmem" to indulge my paranoia a bit.

Why not retain the current model but put some finer-grained protection in? That is, let the kernel be selective about what areas of memory it will read for a process. This is perhaps ratty, but you don't end up trying to reorganize a system call when you discover that ps, pstat, or some new tool you've designed needs access to a data structure you didn't plan on at first.
-- Dick Dunn {hao,ucbvax,allegra}!nbires!rcd (303)444-5710 x3086 ...Are you making this up as you go along?
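Dick's finer-grained protection amounts to a range check in the kmem read path: before copying anything, the driver asks whether [offset, offset + count) lies entirely inside one region this caller may see. A minimal sketch, in which `struct region`, `read_allowed`, and the idea of a per-caller region list are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A window of kernel memory that an unprivileged reader may see. */
struct region {
    unsigned long start;
    unsigned long len;
};

/* Return 1 if [off, off + count) lies wholly inside one allowed region,
 * 0 otherwise. Written so that off + count is never computed, which
 * avoids unsigned overflow on huge requests. */
int read_allowed(const struct region *r, size_t nr,
                 unsigned long off, unsigned long count)
{
    assert(r != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++) {
        if (off >= r[i].start && off - r[i].start <= r[i].len
            && count <= r[i].len - (off - r[i].start))
            return 1;
    }
    return 0;
}
```

Adding a data structure for a new tool then means adding a region to the list rather than designing a new system call, which is the trade-off Dick is after.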
geoff@utcs.UUCP (Geoff Collyer) (12/13/84)
I beg to differ, Rob. I used TOPS-10 heavily for five years while in high school and university. I think that UNIX does most things better than TOPS-10, and in particular I don't think either software interrupts or asynchronous I/O should be grafted onto UNIX. Berkeley has taken the approach of trying to turn signals into perfect software interrupts and in doing so has complicated UNIX programs that wish to use signals correctly. If you wish to continue in this vein, then TOPS-10-style software interrupts and asynchronous I/O make some sense, but please, spare us the control blocks in user address space. The problem with software interrupts (and thus signal catching) is that they introduce asynchrony into user processes. Since asynchrony is a cause of hard-to-find timing bugs and since it complicates code which exploits it, I would prefer it to be sublimated into activities synchronous with user processes. The Thoth operating system (and more recently Verex and the V system) provides no form of asynchrony within a process, but rather provides cheap processes, message-passing and process destruction. In principle, something happening asynchronously with a process either kills the process, doesn't affect it, or generates a message for it. These mechanisms have proven sufficient for the tasks undertaken on Thoth and seem sufficient in general to avoid software interrupts and their associated bugs. This is somewhat simplified, but I have yet to see a genuine need for software interrupts, given a Thoth-like environment. I would suggest the Thoth book to anyone who wants more details (I think the full title is Multi-process Structuring and Portability: the Thoth System by David Cheriton, ex of Waterloo, now at Stanford).
gnu@sun.uucp (John Gilmore) (12/13/84)
> Just a hint... Other systems have been faced with this problem in the
> past. It might be worth looking around to see how others have addressed it.

The Data General AOS and AOS/VS systems solved this problem cleanly. (Actually I'm impressed, now that I've seen Unix, at HOW cleanly they built a system with just about all the features of Unix, with many fewer wild kludges and many more features.) One of the defined system calls took a PID as argument and returned a struct giving info about the process. Think of it as "stat" on a process instead of a file. There was no reason to restrict access to this system call, so you could do it on any process. To find all the jobs in the system, start at pid 1 and recurse asking about all its daughters (that's part of the info returned). Note that the struct returned need not correspond to any particular data structure in the kernel, but can be built at the time you ask for it. Now maybe it took a day longer to write than 'ps', but in cleanliness it certainly is closer to Heaven.
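A sketch of the walk John describes: one call returns a freshly built snapshot for a pid, including its daughters, and a recursive walk from pid 1 visits every process. The "system call" here is a hypothetical function over a small in-memory table, and the `struct pinfo` fields are invented for illustration. Because orphans are adopted by pid 1 (as noted later in the thread), a walk from pid 1 still reaches every process.

```c
#include <assert.h>
#include <stddef.h>

#define MAXKIDS 8

/* What a hypothetical "stat a process" call might return. It is built
 * on request and need not match any kernel data structure. */
struct pinfo {
    int pid;
    int ppid;
    int nkids;
    int kids[MAXKIDS];
};

/* Stand-in for the kernel's process table: { pid, parent pid }. */
static const int ptab[][2] = { {1, 0}, {2, 1}, {3, 1}, {4, 2}, {5, 4} };
#define NPTAB (sizeof ptab / sizeof ptab[0])

/* Fill *pi for pid; return 0, or -1 if there is no such process. */
int proc_stat(int pid, struct pinfo *pi)
{
    int found = 0;

    assert(pi != NULL);
    pi->pid = pid;
    pi->ppid = -1;
    pi->nkids = 0;
    for (size_t i = 0; i < NPTAB; i++) {
        if (ptab[i][0] == pid) {
            pi->ppid = ptab[i][1];
            found = 1;
        }
        if (ptab[i][1] == pid && pi->nkids < MAXKIDS)
            pi->kids[pi->nkids++] = ptab[i][0];
    }
    return found ? 0 : -1;
}

/* Visit pid and, recursively, all its daughters; return the count. */
int walk(int pid)
{
    struct pinfo pi;
    int n = 1;

    if (proc_stat(pid, &pi) != 0)
        return 0;
    for (int i = 0; i < pi.nkids; i++)
        n += walk(pi.kids[i]);
    return n;
}
```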
jmc@ist.UUCP (12/14/84)
> Why not retain the current model but put some finer-grained protection in?
> That is, let the kernel be selective about what areas of memory it will
> read for a process. This is perhaps ratty, but you don't end up trying to
> reorganize a system call when you discover that ps, pstat, or some new tool
> you've designed needs access to a data structure you didn't plan on at
> first.
> Dick Dunn

Gosh, it does sound rather like what I originally suggested.....

-- John Collins calling courtesy of ist Please reply to ...!mcvax!ist!inset!jmc Phone: +44 727 57267 Snail: 47 Cedarwood Drive, St Albans, Herts, AL4 0DN, England
jans@mako.UUCP (Jan Steinman) (12/15/84)
In article <1869@sun.uucp> gnu@sun.uucp (John Gilmore) writes:
>The Data General AOS and AOS/VS systems solved this problem cleanly... To
>find all the jobs in the system, start at pid 1 and recurse asking about all
>its daughters...

How did it deal with daemons? Were all processes required to have living parents? (Please don't flame me! I'm just a curious person masquerading as a wizard!)

-- :::::: Jan Steinman Box 1000, MS 61-161 (w)503/685-2843 :::::: :::::: tektronix!tekecs!jans Wilsonville, OR 97070 (h)503/657-7703 ::::::
gnu@sun.uucp (John Gilmore) (12/25/84)
Jan Steinman of Tek asked:
> In article <1869@sun.uucp> gnu@sun.uucp (John Gilmore) writes:
> >The Data General AOS and AOS/VS systems solved this problem cleanly... To
> >find all the jobs in the system, start at pid 1 and recurse asking about all
> >its daughters...
>
> How did it deal with daemons? Were all processes required to have living parents?

I believe processes without parents were adopted by pid 1, as in Unix. At any rate, every process had a live parent, maybe not the one it started with.
dae@psuvax1.UUCP (Dave Eckhardt) (12/27/84)
[The parent article has expired here. Oh well...]

I've been thinking on this problem, and have come to what I think might be a satisfactory answer. Caveats:

(1) I am not really a kernel person or a wizard.
(2) This is still in a rough form. I hope you can read it.
(3) I'm lazy and busy. If anybody wants to write the code...

Well...here goes (~/src/ideas/kernsym/proposal):

How about a

    struct usymbol {
        char    *Sym_name;
        caddr_t  Sym_start;
        off_t    Sym_extent;
        int      Sym_id;
    } usymbols[] = {
        { "symbols", (caddr_t)usymbols, 0, 0 },   /* extent filled in later */
        { "avenrun", (caddr_t)avenrun, sizeof(avenrun), 0 },
        { "proc",    (caddr_t)proc,    sizeof(proc),    0 },
        { 0, 0, 0, 0 }
    };

in the kernel? and the following:

    struct usymbol getSymbol(name)
        returns the usymbol entry to the user.

    int Symread(symid, offset, buffer, nchars)
        by analogy to int read(file, buffer, nchars)

That way the "average" proc could say

    struct usymbol procsym;
    struct proc p;
    int n, ngot;

    procsym = getSymbol("proc");
    for (n = 0; n < NPROC; n++) {   /* gee, is NPROC a usymbol, too? */
        ngot = Symread(procsym.Sym_id, n * sizeof(p), &p, sizeof(p));
        .
        .
        .
    }

Notes: the Sym_id field is just the array subscript...I guess either (a) an initSyms() in the kernel fills them in, or else (b) the kernel has a modified usymbol structure w/o the Sym_id...

Rationale:

(1) It makes sense to me that symbols should be named. In order to provide for that, there is the getSymbol call. However, since Symread uses a *numerical* id, all the kernel needs to do is check that

    the Sym_id does exist
    the offset to Symread is >= 0
    offset + nchars <= extent

and then copyout the data, instead of searching the array each time.
(2) I think an initSyms is a good idea, esp. since it is conceptually possible that the size of some things is not known at compile-time.
(3) It's faster than nlist by a good bit, I should think.
(4) Each site has total control over what symbols are available.
(5) No reason to take away (k)mem--it's *useful*, just too potent for the average user.
(After all, you can use kmem to update the usymbols table on the fly, no? :-) Comments, anybody? Flames? Please respond to ...psuvax1!gondor!dae if you think of it. -- From the furnace of Daemon ( ...{psuvax1,gondor,shire}!dae ) (814) 237-1901 "I will have no covenants but proximities" [Emerson] Don't worry--I'm just a piglet of your immigration...
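Dave's check list for Symread() can be exercised in user space. In this sketch a static array stands in for kernel memory, memcpy() stands in for copyout(), `const void *` replaces caddr_t, and the -1 error return is an assumption; the usymbol field names follow his post.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>   /* off_t */

struct usymbol {
    const char *Sym_name;
    const void *Sym_start;
    off_t       Sym_extent;
    int         Sym_id;
};

static long avenrun_demo[3] = { 100, 80, 60 };  /* stand-in kernel data */

static struct usymbol usymbols[] = {
    { "avenrun", avenrun_demo, sizeof avenrun_demo, 0 },
};
#define NUSYM ((int)(sizeof usymbols / sizeof usymbols[0]))

/* Copy nchars bytes starting at offset within symbol symid into buffer.
 * Returns the number of bytes copied, or -1 if any check fails. */
int Symread(int symid, off_t offset, void *buffer, int nchars)
{
    assert(buffer != NULL);
    if (symid < 0 || symid >= NUSYM)              /* the Sym_id exists  */
        return -1;
    if (offset < 0 || nchars < 0)                 /* offset >= 0        */
        return -1;
    if (offset > usymbols[symid].Sym_extent       /* offset + nchars    */
        || nchars > usymbols[symid].Sym_extent - offset)  /* <= extent  */
        return -1;
    memcpy(buffer, (const char *)usymbols[symid].Sym_start + offset,
           (size_t)nchars);                       /* "copyout" the data */
    return nchars;
}
```

The extent test is written as a subtraction rather than `offset + nchars` so that a huge offset cannot overflow past the check.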
smh@mit-eddie.UUCP (Steven M. Haflich) (12/28/84)
These various ideas to make kernel structure accesses more efficient are certainly well motivated, but I will never understand the current propensity for installing less-than-absolutely-necessary hacks in the kernel. I paraphrase a theorem from DMR's [?] dimly-remembered but venerable paper on the implementation of Unix: The kernel is [ought to be] little more than an I/O multiplexer. My corollary: Every byte of kernel code implementing unnecessary or little-used features is a byte permanently unavailable for buffer cache and paging user code. Historically the kernel namelist problem has been elegantly solved many times by various commands which, upon first invocation after a reboot, create a predigested symbol table file somewhere (like in /tmp, which gets cleared out by every reboot). It seems to me that the right thing to do is to standardize this procedure, run it just once per boot from /etc/rc, and be done with it. Existing tools are perfectly adequate:

    nm -pg /vmunix | fgrep -f /interesting_symbols > /kernel_symbols

The resulting /kernel_symbols file is quite short and can be read and parsed trivially and quickly using scanf. The mechanism seems elegant and Unixy, something becoming rare in these days of .5MB kernels.
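Steven says the digested file "can be read and parsed trivially and quickly using scanf". A sketch of such a reader, assuming each line keeps nm's usual "value type name" layout with a hexadecimal value; the function name `lookup_sym` and the 63-character name limit are illustrative choices, and the caller opens the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Search an nm-style stream for sym and store its value in *valp.
 * Returns 0 if found, -1 otherwise. Lines that don't parse are skipped. */
int lookup_sym(FILE *fp, const char *sym, unsigned long *valp)
{
    char line[256], name[64], type;
    unsigned long val;

    assert(fp != NULL && sym != NULL && valp != NULL);
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%lx %c %63s", &val, &type, name) == 3
            && strcmp(name, sym) == 0) {
            *valp = val;
            return 0;
        }
    }
    return -1;
}
```

A ps-like program would open /kernel_symbols once, look up each symbol it needs (rewinding between lookups, or reading all lines into a table), and never touch /vmunix.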
jre@amdahl.UUCP (Joe Eykholt) (01/15/85)
I know I'm coming into this discussion towards the end and I haven't been keeping up with all the other proposals, but: I think the method my predecessors at Amdahl chose to solve this problem is rather clean. They added a special file that references the process table memory, exactly like /dev/mem. Actually this file is just another minor number for the /dev/mem driver. A read from offset 0 in /dev/smem/proc reads from the start of the process table. Our modified ps doesn't require a namelist. If none is specified it just opens /dev/smem/proc and reads process structures until it gets an EOF. While I agree that this feature isn't necessary, and that the cost of every kernel enhancement should be carefully considered, I think it would be nice to get away from programs which need to interpret the namelist, and have to know a lot about the layout of the kernel. Other special memory files we provided are: /dev/smem/file, inode, stats (for sysinfo), text, and var. -- Joe Eykholt ...{hplabs,ihnp4,amd,drivax,nsc,sun}!amdahl!jre [Opinions expressed by me are not necessarily held by any other entity.]
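The namelist-free loop Joe describes is just "read fixed-size records from offset 0 until EOF". A sketch with a hypothetical two-field `struct proc` and a made-up "p_stat == 0 means unused slot" convention; a real program would open /dev/smem/proc and use the system's own proc structure instead. The function takes any descriptor, so it can be tried on an ordinary file or pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Hypothetical stand-in for the kernel's struct proc. */
struct proc {
    int p_pid;
    int p_stat;   /* 0 = unused slot (assumption for this sketch) */
};

/* Read process structures from fd until EOF, counting live slots.
 * Returns the count, or -1 on a read error or a short record. */
int count_procs(int fd)
{
    struct proc p;
    ssize_t n;
    int live = 0;

    assert(fd >= 0);
    while ((n = read(fd, &p, sizeof p)) == (ssize_t)sizeof p)
        if (p.p_stat != 0)
            live++;
    return n == 0 ? live : -1;
}
```

Because the driver, not the program, decides where the table starts and ends, the table's size can change without the program needing a namelist or a recompile.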