jr@oglvee.UUCP (Jim Rosenberg) (01/28/89)
Our system is an Altos 2000 running Xenix System V. The CPU is a 386, and the
C compiler produces 4 as sizeof(int). However we seem to be hitting rollover
of pids at 32K, implying that the kernel must be using short as the type of a
pid -- at least internally. I have two questions. Why wouldn't the kernel use
a true int for a pid, preventing rollover until 2147483647 or so? Surely this
isn't just because someone thought it would louse up the output format of ps??

As system administrator should I be concerned about letting the pids roll over?
We've had this happen several times with no apparent ill effects. I'm not
concerned about the kernel -- it seems to know what to do when pids roll over.
But what about all those programs using mktemp() or $$ ? Does anyone have any
horror stories about applications that behaved badly after pid rollover?
--
Jim Rosenberg                        pitt
Oglevee Computer Systems                 >--!amanue!oglvee!jr
151 Oglevee Lane                      cgh
Connellsville, PA 15425                #include <disclaimer.h>
jfh@rpp386.Dallas.TX.US (John F. Haugh II) (01/29/89)
In article <460@oglvee.UUCP> jr@oglvee.UUCP (Jim Rosenberg) writes:
>Our system is an Altos 2000 running Xenix System V. The CPU is a 386, and the
>C compiler produces 4 as sizeof(int). However we seem to be hitting rollover
>of pids at 32K, implying that the kernel must be using short as the type of a
>pid -- at least internally. I have two questions. Why wouldn't the kernel use
>a true int for a pid, preventing rollover until 2147483647 or so? Surely this
>isn't just because someone thought it would louse up the output format of ps??

I can't answer for the EXACT reason this was done this way, but can explain
the behaviour better.

For starters, PID's roll over at 30000 exactly. The code goes something
like

        if (nextpid == 30000)
                nextpid = 1;

So it is quite intentional. Why an unsigned short wasn't used might
really be a good question since 64K can be expressed in as many digits
as 30,000 can.

>As system administrator should I be concerned about letting the pids roll over?
>We've had this happen several times with no apparent ill effects.

I administer several systems. This one rolls over every 4 days or so,
the large system at work rolls over about once a week. They do this so
long as they are up continuously and never have had problems because of
it.

> I'm not
>concerned about the kernel -- it seems to know what to do when pids roll over.

Correct. In the same code there is another loop which scans the process
table looking for a process already using that PID. If it finds one, it
increments `nextpid' and starts all over again. Eventually it must find
a process ID it can use ...

>But what about all those programs using mktemp() or $$ ? Does anyone have any
>horror stories about applications that behaved badly after pid rollover?

Ensuring the process ID is unique should do the trick. Failing that, I
have used the process group ID or parent process ID if a process was
supposed to leave temp files remaining to be used by another process
after it had exited.

I don't think you would have to worry unless your system is generating
new processes at a rate high enough to have multiple rollovers per day.

On a side bar - there are systems where the process ID is a long for
exactly this sort of reason. Imagine how many active processes an
Amdahl 58xx or Cray X/MP would have? If a 3MIP 68020 has 400 process
slots, how many would you put in your 400 user mainframe?
--
John F. Haugh II                        +-Ad of the Week:----------------------
VoiceNet: (214) 250-3311   Data: -6272  |"Your hole is our goal"
InterNet: jfh@rpp386.Dallas.TX.US       |         -- Gearhart Wireline Services
UucpNet : <backbone>!killer!rpp386!jfh  +------ Shrevesport, LA -------------
dave@lsuc.uucp (David Sherman) (01/30/89)
In article <460@oglvee.UUCP> jr@oglvee.UUCP (Jim Rosenberg) writes:
>Our system is an Altos 2000 running Xenix System V. The CPU is a 386, and the
>C compiler produces 4 as sizeof(int). However we seem to be hitting rollover
>of pids at 32K, implying that the kernel must be using short as the type of a
>pid -- at least internally. I have two questions. Why wouldn't the kernel use
>a true int for a pid, preventing rollover until 2147483647 or so? Surely this
>isn't just because someone thought it would louse up the output format of ps??

The rollover has been at 30000 since time immemorial. On the 16-bit
PDP11, where pid was stored in a (short) int, there obviously would
have been a problem after 32767, and I suspect the original design
of a 30000 cutoff was simply to make it easier to track how many
rollovers there had been (they were rare in those days, remember?).

>As system administrator should I be concerned about letting the pids roll over?
>We've had this happen several times with no apparent ill effects. I'm not
>concerned about the kernel -- it seems to know what to do when pids roll over.
>But what about all those programs using mktemp() or $$ ? Does anyone have any
>horror stories about applications that behaved badly after pid rollover?

There's a 1/30000 chance, so it's likely happened somewhere along
the line. However, it would be hard to spot (imagine guessing at
that as the cause of a bug!), so I doubt too many people will have
had the horror story and have lived to tell of it :-)

David Sherman
--
Moderator, mail.yiddish
{ uunet!attcan att pyramid!utai utzoo } !lsuc!dave
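The mktemp()/$$ worry in the posts above comes down to a pid-derived file name colliding with a file left behind by an earlier process that happened to get the same pid. A minimal sketch, assuming a POSIX system where open() supports O_CREAT|O_EXCL: instead of trusting the pid alone, let the kernel refuse an existing name and retry with the next suffix. The function name `open_unique_temp`, the `/tmp/demo.` prefix, and the limit of 100 attempts are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a temp file whose name contains the pid,
 * without relying on the pid alone for uniqueness. O_CREAT|O_EXCL makes
 * open() fail with EEXIST if the name is already taken (for example by a
 * stale file from an earlier process that had the same pid), and we then
 * retry with the next numeric suffix.
 *
 * Returns an open file descriptor and leaves the name in path, or
 * returns -1 if no free name was found or open() failed for another
 * reason.
 */
int open_unique_temp(char *path, size_t pathlen)
{
    int attempt;

    assert(path != NULL && pathlen > 0);

    for (attempt = 0; attempt < 100; attempt++) {
        int fd;
        int n = snprintf(path, pathlen, "/tmp/demo.%ld.%d",
                         (long)getpid(), attempt);

        if (n < 0 || (size_t)n >= pathlen)
            return -1;              /* buffer too small for the name */

        fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd >= 0)
            return fd;              /* we created it, so it is ours */
        if (errno != EEXIST)
            return -1;              /* real error, not a name clash */
    }
    return -1;                      /* every candidate name was taken */
}
```

A caller would pass a buffer such as `char name[64]`, use the returned descriptor, and unlink the file when finished. The point of the design is that a recycled pid then costs one extra open() call rather than a clobbered file.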
root@spdyne.UUCP (01/30/89)
In article <460@oglvee.UUCP> jr@oglvee.UUCP (Jim Rosenberg) writes:
>.... However we seem to be hitting rollover
>of pids at 32K, implying that the kernel must be using short as the type of a
>pid -- at least internally.

And John F. Haugh II (jfh@rpp386.Dallas.TX.US) writes:
>I can't answer for the EXACT reason this was done this way, but can explain
>the behaviour better.
>For starters, PID's roll over at 30000 exactly. The code goes something
>like
>
>        if (nextpid == 30000)
>                nextpid = 1;

I believe that it sets nextpid to 100, at least it did on our BSD system.

-Chert Pellett
 root@spdyne
guy@auspex.UUCP (Guy Harris) (02/01/89)
>>        if (nextpid == 30000)
>>                nextpid = 1;
>
> I believe that it sets nextpid to 100, at least it did on our BSD system.

And it doesn't on his (S5?) system. Different systems do it
differently. (Actually, the S5R3 code sets "nextpid" - actually, "mpid"
- to 0, but if process ID 0 isn't already in use something rather
strange has happened on your system....)
chip@ateng.ateng.com (Chip Salzenberg) (02/01/89)
According to jr@oglvee.UUCP (Jim Rosenberg):
>Our system is an Altos 2000 running Xenix System V. The CPU is a 386, and
>the C compiler produces 4 as sizeof(int). However we seem to be hitting
>rollover of pids at 32K, implying that the kernel must be using short as the
>type of a pid -- at least internally.

The getpid() system call has to work for '286 binaries under Xenix/386.
So all pids must be representable as 16-bit integers.
--
Chip Salzenberg             <chip@ateng.com> or <uunet!ateng!chip>
A T Engineering             Me?  Speak for my company?  Surely you jest!
          "It's no good.  They're tapping the lines."
jfh@rpp386.Dallas.TX.US (John F. Haugh II) (02/01/89)
In article <923@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
>And it doesn't on his (S5?) system. Different systems do it
>differently. (Actually, the S5R3 code sets "nextpid" - actually, "mpid"
>- to 0, but if process ID 0 isn't already in use something rather
>strange has happened on your system....)

The general effect of the entire ordeal is something like

        struct proc *pp;

again:
        if (++mpid == MAXPID) {
                mpid = 0;
                goto again;
        }
        for (pp = proc;pp < v.ve_proc;pp++)
                if (pp->p_stat && pp->p_pid == mpid) /* oops, exists */
                        goto again;

The kernel repeatedly scans for a valid pid. So the concern over
conflicting pid's is unfounded.

There is much more involved as well. Unless Guy wants to post the
source to newproc(), I think this is sufficient detail to answer the
original question.

Of course, with school back in session, you really didn't expect me
to go posting code with a goto in it ;-) See what you made me do? ;-)
--
John F. Haugh II                        +-Ad of the Week:----------------------
VoiceNet: (214) 250-3311   Data: -6272  |"Your hole is our goal"
InterNet: jfh@rpp386.Dallas.TX.US       |         -- Gearhart Wireline Services
UucpNet : <backbone>!killer!rpp386!jfh  +------ Shrevesport, LA -------------
debra@alice.UUCP (Paul De Bra) (02/02/89)
In article <1989Jan31.164710.19502@ateng.ateng.com> chip@ateng.ateng.com (Chip Salzenberg) writes:
}According to jr@oglvee.UUCP (Jim Rosenberg):
}>Our system is an Altos 2000 running Xenix System V. The CPU is a 386, and
}>the C compiler produces 4 as sizeof(int). However we seem to be hitting
}>rollover of pids at 32K, implying that the kernel must be using short as the
}>type of a pid -- at least internally.
}
}The getpid() system call has to work for '286 binaries under Xenix/386.
}So all pids must be representable as 16-bit integers.

This is not the reason at all. The limit on pid's is 30000, which is just
an arbitrary number, for historical reasons, and which appears to be the
same on all Unix systems (at least all I've seen). It probably is smaller than
32767 because Unix originally ran on 16-bit machines (PDP), but the number
is just arbitrary. Any number greater than the maximal number of processes
would do.

Paul.
--
------------------------------------------------------
|debra@research.att.com   | uunet!research!debra     |
------------------------------------------------------
john@frog.UUCP (John Woods) (02/03/89)
In article <12118@rpp386.Dallas.TX.US>, jfh@rpp386.Dallas.TX.US (John F. Haugh II) writes:
> In article <923@auspex.UUCP> guy@auspex.UUCP (Guy Harris) writes:
> >And it doesn't on his (S5?) system. Different systems do it
> >differently. (Actually, the S5R3 code sets "nextpid" - actually, "mpid"
> >- to 0, but if process ID 0 isn't already in use something rather
> >strange has happened on your system....)
> The kernel repeatedly scans for a valid pid. So the concern over
> conflicting pid's is unfounded.

It is possible, nay probable, that the reason someone had it cycle from 100
was that they noticed that on their system, enough of the low numbers were in
use by daemons that it wasted a lot of time finding the lowest unused one.
Though the cost per hour of uptime may be nearly negligible, the cost of
avoiding it IS negligible.
--
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

Presumably this means that it is vital to get the wrong answers quickly.
                Kernighan and Plauger, The Elements of Programming Style
friedl@vsi.COM (Stephen J. Friedl) (02/05/89)
> I believe that it sets nextpid to 100, at least it did on our BSD system.

In article <923@auspex.UUCP>, guy@auspex.UUCP (Guy Harris) writes:
> And it doesn't on his (S5?) system. Different systems do it
> differently. (Actually, the S5R3 code sets "nextpid" - actually, "mpid"
> - to 0, but if process ID 0 isn't already in use something rather
> strange has happened on your system....)

Please note that PIDs are not necessarily monotonically
increasing on all systems. On the AT&T 3B15 (the master CPU for
the multiprocessor 3B4000) the PIDs jump all over the place. For
example, a trivial program to fork 20 times prints PIDs in the
following order:

12331 12331 8236 12331 8236 16397 12331 8236 16397 20535
12331 8236 16397 20535 28918 12331 8236 16397 20535 28918

Note the reassignment of old PIDs here. You have to look
at the PIDs in hex to get any kind of pattern, and it probably
reflects processor assignment or some such.

     Steve
--
Stephen J. Friedl        3B2-kind-of-guy            friedl@vsi.com
V-Systems, Inc.        I speak for you only      attmail!vsi!friedl
Santa Ana, CA  USA       +1 714 545 6442    {backbones}!vsi!friedl
Nancy Reagan on these *stupid* .signatures: "Enough already, OK?"
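To make the hex view mentioned in this post easier to try, here is a small sketch that prints each of the five distinct PIDs in decimal and hex and splits it into a high and a low field. The 4-bit/12-bit split and all function names are assumptions made purely for illustration; nothing in the post says how the 3B15 actually composes its PIDs.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed split, for illustration only: the top 4 bits of a 16-bit pid
 * versus the low 12 bits. The post does not confirm this layout. */
unsigned pid_high(unsigned pid) { return (pid >> 12) & 0xFu; }
unsigned pid_low(unsigned pid)  { return pid & 0xFFFu; }

/* Print one pid in decimal, in hex, and split into the two fields. */
void show_pid(unsigned pid)
{
    assert(pid <= 0xFFFFu);   /* the pids in the post all fit in 16 bits */
    printf("%5u = 0x%04X  high=0x%X  low=0x%03X\n",
           pid, pid, pid_high(pid), pid_low(pid));
}

/* Print the five distinct pids listed in the post. */
void show_all(void)
{
    static const unsigned pids[] = { 12331, 8236, 16397, 20535, 28918 };
    size_t i;

    for (i = 0; i < sizeof pids / sizeof pids[0]; i++)
        show_pid(pids[i]);
}
```

In hex the five values are 0x302B, 0x202C, 0x400D, 0x5037 and 0x70F6, so each one has a different top nibble. Whether that nibble reflects processor assignment, as the post guesses, cannot be told from the numbers alone.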
boyd@necisa.necisa.oz (Boyd Roberts) (02/06/89)
In article <12118@rpp386.Dallas.TX.US> jfh@rpp386.Dallas.TX.US (John F. Haugh II) writes:
>        for (pp = proc;pp < v.ve_proc;pp++)
>                if (pp->p_stat && pp->p_pid == mpid) /* oops, exists */
>                        goto again;
>

Guess again, bozo. You've also got to check for p->p_pgrp clashes.
You can't have mpid == an active process group id.

Enough of this discussion. Obviously 30000 was chosen as a "nice" number
that fits into a signed 16 bit int. And, yes, I'm sure it's not interesting
modulo Jon Bon Jovi's short size.

Who cares about p->p_pid? As long as it's unique and != any active pgrp.


Boyd Roberts                    NEC Information Systems Australia

boyd@necisa.necisa.oz

``When the going gets weird, the weird turn pro...''
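Taking the quoted loop together with the process-group point made here, the allocation can be sketched as a small user-space simulation: advance a counter, wrap it at the limit, and reject any candidate that matches either the pid or the process group id of a live table entry. The structure layout, the table size NPROC, and the names `struct proc_entry`, `proc_table`, `pid_in_use` and `alloc_pid` are hypothetical; only the 30000 limit, the wrap to 0 and the two checks come from the thread. This is not the actual newproc() source.

```c
#include <assert.h>
#include <stddef.h>

#define MAXPID 30000   /* wrap point quoted in the thread */
#define NPROC  8       /* tiny table, for illustration only */

/* Hypothetical stand-in for the kernel's proc structure. */
struct proc_entry {
    int p_stat;        /* 0 means the slot is free */
    int p_pid;
    int p_pgrp;
};

struct proc_entry proc_table[NPROC];
int mpid;              /* last pid handed out */

/* Nonzero if a live entry uses candidate as its pid or as its pgrp. */
static int pid_in_use(int candidate)
{
    size_t i;

    for (i = 0; i < NPROC; i++) {
        const struct proc_entry *pp = &proc_table[i];
        if (pp->p_stat &&
            (pp->p_pid == candidate || pp->p_pgrp == candidate))
            return 1;
    }
    return 0;
}

/*
 * Return the next unused pid. The loop terminates as long as the live
 * pids and pgrps number fewer than MAXPID, which the small NPROC
 * guarantees here. Pid 0 is expected to belong to a live entry, so the
 * wrap to 0 normally just moves on to 1.
 */
int alloc_pid(void)
{
    assert(2 * NPROC < MAXPID);

    do {
        if (++mpid >= MAXPID)
            mpid = 0;
    } while (pid_in_use(mpid));

    return mpid;
}
```

With live entries for pid 0, pid 1, and a process whose pgrp is 2, a call made just before the wrap point skips 0, 1 and 2 and returns 3. The linear scan per candidate is the cost that later posts in the thread try to avoid.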
aegl@root.co.uk (Tony Luck) (02/08/89)
In article <8857@alice.UUCP> debra@alice.UUCP () writes:
>This is not the reason at all. The limit on pid's is 30000, which is just
>an arbitrary number, for historical reasons, and which appears to be the
>same on all Unix systems (at least all I've seen). It probably is smaller than
>32767 because Unix originally ran on 16-bit machines (PDP), but the number
>is just arbitrary. Any number greater than the maximal number of processes
>would do.                     ^^^^^^^

Why "greater"? If you reduce the range of pids enough, eventually you
get to the state where the number of possible pids is equal to the
number of slots in the proc table, and thus you could do away with the
few places that still search the proc table for pids by just defining
the pid to be the index into the proc table. (and WOW, you could save
2 bytes from the proc structure by not having a p_pid element at all!).

What would break if you did this (do the big mainframes already do
something like this anyway ... if you want 1000 users on a machine you
must be able to cope with 10,000 active processes ... and 30,000 pids
would wrap round every few minutes anyway) ... it might be rather
disconcerting to run a script like:

        while :
        do
                echo "" &
                wait
        done

and have the *same* pid printed for every echo (assuming an otherwise idle
system, and a proc table allocator that gave you the same slot every
time)

But would anything actually break?

-Tony Luck <aegl@root.co.uk>
edler@cmcl2.NYU.EDU (Jan Edler) (02/09/89)
In article <1043@vsi.COM> friedl@vsi.COM (Stephen J. Friedl) writes:
>Please note that PIDs are not necessarily monotonically
>increasing on all systems. On the AT&T 3B15 (the master CPU for
>the multiprocessor 3B4000) the PIDs jump all over the place. For
>example, a trivial program to fork 20 times prints PIDs in the
>following order:
>
>12331 12331 8236 12331 8236 16397 12331 8236 16397 20535
>12331 8236 16397 20535 28918 12331 8236 16397 20535 28918
>
>Note the reassignment of old PIDs here. You have to look
>at the PIDs in hex to get any kind of pattern, and it probably
>reflects processor assignment or some such.

I find this surprising. I claim that an implicit assumption prevails
in conventional UNIX usage, against immediate reuse of pids. There are
many cases where such immediate reuse can lead to problems.

A simple example is a background process that you want to kill. So you
send it a signal, but in the meantime it already terminated and another
process inherited the pid. If the new process belongs to someone else,
it may not matter (unless you are root!), but it could just as easily
be yours, and you'll end up hitting the wrong process.

In fact, this problem exists in all UNIX's I'm familiar with: it is
possible (but probably rare) for a pid to be reused immediately. This
possibility is generally ignored. Has it ever bitten anyone? Who
knows?

I don't see any way to reliably use pids (or any other names within a
limited namespace) without some kind of assurance against immediate
reuse (except for the use of comparing the return value of fork against
the return value of wait). Even with such an assumption, how long must
the prohibition on reuse last? I don't know the answer to this, but in
practice the pid space needs to be large enough that reuse isn't likely
to be "soon".

If such an assumption were not needed, we might as well limit the pid
space to the maximum number of concurrently active processes, NPROC,
typically only a few hundred or thousand. In fact, in conventional
implementations, we could just let the pid be the index into the proc
table.

Jan Edler
NYU Ultracomputer Research Laboratory
edler@nyu.edu
...!cmcl2!edler
(212) 998-3353
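A minimal sketch of the pid-as-table-index idea, and of why immediate reuse is the catch. The table size and the names `slot_used`, `spawn` and `reap` are hypothetical, and the lowest-free-slot policy is an assumption; a real allocator may pick slots differently.

```c
#include <assert.h>

#define NPROC 4                 /* tiny table, for illustration only */

int slot_used[NPROC];           /* nonzero while a process owns the slot */

/* Allocate the lowest free slot; the slot number doubles as the pid.
 * Returns -1 when the table is full. No search for a unique pid is
 * needed, because a slot holds at most one process at a time. */
int spawn(void)
{
    int i;

    for (i = 0; i < NPROC; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Release the slot when the process exits. */
void reap(int pid)
{
    assert(pid >= 0 && pid < NPROC && slot_used[pid]);
    slot_used[pid] = 0;
}
```

Under this policy, calling spawn(), then reap() on the result, then spawn() again hands back the same pid. A signal aimed at the first process after it exited would therefore land on the second one, which is the hazard described above.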
walker@ficc.uu.net (Walker Mangum) (02/10/89)
In article <697@root44.co.uk>, aegl@root.co.uk (Tony Luck) writes:
> Why "greater"? If you reduce the range of pids enough, eventually you
> get to the state where the number of possible pids is equal to the
> number of slots in the proc table, and thus you could do away with the
> few places that still search the proc table for pids by just defining
> the pid to be the index into the proc table. (and WOW, you could save
> 2 bytes from the proc structure by not having a p_pid element at all!).
> What would break if you did this (do the big mainframes already do
> something like this anyway ... if you want 1000 users on a machine you
> must be able to cope with 10,000 active processes ... and 30,000 pids
>

Actually, there's a much easier method (since the pid really is arbitrary).
Many OS's for real-time type systems (where getting to a process's control
info with efficiency is important) simply assign a "pid" (task id) that is
the *index* into the "process table", or is actually an address in the
"process table". A system that comes to mind is Modcomp's MAX 32 OS. It
ain't Unix, but it *can* process 50,000 *external* interrupts and do 10,000
process context switches per second!
--
Walker Mangum                                  |  Adytum, Incorporated
phone: (713) 333-1509                          |  1100 NASA Road One
UUCP: uunet!ficc!walker (walker@ficc.uu.net)   |  Houston, TX 77058
Disclaimer: $#!+ HAPPENS
smb@ulysses.homer.nj.att.com (Steven M. Bellovin) (02/13/89)
In article <3059@ficc.uu.net>, walker@ficc.uu.net (Walker Mangum) writes:
} Actually, there's a much easier method (since the pid really is arbitrary).
} Many OS's for real-time type systems (where getting to a process's control
} info with efficiency is important) simply assign a "pid" (task id) that is
} the *index* into the "process table", or is actually an address in the
} "process table". A system that comes to mind is Modcomp's MAX 32 OS. It
} ain't Unix, but it *can* process 50,000 *external* interrupts and do 10,000
} process context switches per second!

Many years ago, I saw a mod to a V6 UNIX(r) system that used the process
table slot as the low-order 8 bits of the pid, and incremented a counter
in the high-order bits to ensure that a pid wasn't reused too soon.
greg@ncr-sd.SanDiego.NCR.COM (Greg Noel) (02/13/89)
In article <11212@ulysses.homer.nj.att.com> smb@ulysses.homer.nj.att.com (Steven M. Bellovin) writes:
>Many years ago, I saw a mod to a V6 UNIX(r) system that used the process
>table slot as the low-order 8 bits of the pid, and incremented a counter
>in the high-order bits to ensure that a pid wasn't reused too soon.

I confess -- I did it. It seemed to me that the computer would be wasting
a lot of time looking through the process table to see if a PID was already
in use, so I dreamed up the hack of using the low-order bits as the table
index to avoid the search.

It was OK on a PDP-11/40, but on a PDP-11/70 we were approaching the maximum
number of processes, and the process numbers themselves were cycling too
fast -- if processes were being created and dying rapidly, the same process
slot tended to be reused, so after 128 tries, the same PID came up again.
Replacing the free-process stack with a queue so that different process
slots were being used simply put back in the overhead that removing the
search had taken out.

But the real killer was that I could never measure any performance improvement
and the code required was bigger. Since space was always tight (you youngsters
haven't lived until you've shoehorned a full networking kernel into 49,152
bytes) I eventually took it out.

(MO, would this be another Greg Noel Hack?)
--
-- Greg Noel, NCR Rancho Bernardo   Greg.Noel@SanDiego.NCR.COM  or  greg@ncr-sd
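The scheme described in the last two posts can be sketched as follows: the low-order 8 bits of the pid are the process-table slot, and a counter in the high-order bits is bumped each time a slot is reused. The sketch assumes pids must stay positive in a signed 16-bit int, which leaves 7 counter bits and is consistent with the figure of 128 reuses before a pid repeats. Keeping one counter per slot is also an assumption, and all names are hypothetical; the original V6 modification is not shown in the thread.

```c
#include <assert.h>

#define SLOT_BITS 8
#define GEN_BITS  7                       /* assumed: 15-bit positive pids */
#define NSLOTS    (1 << SLOT_BITS)
#define GEN_MASK  ((1 << GEN_BITS) - 1)

unsigned generation[NSLOTS];              /* per-slot reuse counter */

/* Build the pid for the next process placed in the given slot:
 * bump the slot's counter (wrapping at 128) and put it above the
 * 8-bit slot number. */
int make_pid(int slot)
{
    assert(slot >= 0 && slot < NSLOTS);
    generation[slot] = (generation[slot] + 1) & GEN_MASK;
    return (int)(generation[slot] << SLOT_BITS) | slot;
}

/* Recover the table index from a pid without searching the table. */
int pid_to_slot(int pid)
{
    return pid & (NSLOTS - 1);
}
```

Lookup becomes a mask instead of a table scan, but a slot that is recycled rapidly returns to the same pid after 128 allocations, which is the fast cycling reported above.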