dave@onfcanim.UUCP (Dave Martindale) (01/18/88)
This morning, arbitron ran automatically and its output was mailed to me.
I was very surprised to find that we have 27 active users on this machine.
On BSD unix, arbitron generally uses the pipeline
nusers=`last | sort -u +0 -1 | wc -l`
to count the number of users who have logged in this month, in order to
ignore dormant users in the passwd file. Running the "last | sort"
component of the pipeline by hand did indeed produce 27 lines of output,
which consisted of:
13 living, breathing users
10 uucp logins
2 dummy entries "reboot" and "shutdown"
2 lines due to the printf("\nwtmp begins %s\n", ctime(whatever))
that last does after printing all data in the file
So, 13 out of 27 reported users were real. I expect figures to be similarly
inflated on most BSD systems.
I've replaced the pipeline above with:
nusers=`last | sed -e '/^$/d' -e '/^wtmp begins/d' -e '/ ~ /d' -e '/^U/d' \
| sort -u +0 -1 | wc -l`
Since it depends on the fact that our uucp logins all begin with 'U', you
would have to edit it as appropriate for your username conventions.
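The same filter can be sketched as a small shell function that reads `last`-style output on stdin, which makes it easy to try against saved output before touching arbitron itself. This is only a sketch: the function name and the UUCP_PREFIX variable are made up for illustration and are not part of arbitron, and it assumes the BSD `last` layout described above (pseudo-entries carry "~" in the tty column).

```shell
#!/bin/sh
# Hypothetical knob (not part of arbitron): the prefix that marks uucp
# logins at your site.  'U' matches the convention described above.
UUCP_PREFIX=${UUCP_PREFIX:-U}

# count_real_users: read `last`-style output on stdin and print the number
# of distinct login names left after dropping blank lines, the trailing
# "wtmp begins" line, reboot/shutdown pseudo-entries, and uucp logins.
count_real_users() {
    sed -e '/^$/d' \
        -e '/^wtmp begins/d' \
        -e '/ ~ /d' \
        -e "/^${UUCP_PREFIX}/d" |
    awk '{ print $1 }' |    # keep only the login-name column
    sort -u |               # one line per distinct name
    wc -l | tr -d ' '       # some wc implementations pad with blanks
}
```

Usage would be `nusers=$(last | count_real_users)`. Printing the first column and then running `sort -u` is meant to give the same count as the old-style `sort -u +0 -1` key syntax, which newer sort implementations may not accept.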
The pipeline used for USG systems probably needs patching in a similar manner.
If everyone fixed this, I wonder how much the "total users" count
would go down net-wide?
Dave Martindale
{musocs,watmath}!onfcanim!dave
rsk@s.cc.purdue.edu (Rock Wombat) (01/20/88)
In article <15538@onfcanim.UUCP> dave@onfcanim.UUCP (Dave Martindale) writes:
>On BSD unix, arbitron generally uses the pipeline
>	nusers=`last | sort -u +0 -1 | wc -l`
>to count the number of users who have logged in this month, in order to
>ignore dormant users in the passwd file.
This can also result in a count that is too low.  On our systems, we zero
the utmp and wtmp files every 24 hours, so this pipeline would only reveal
those users who have logged in since the last zap.  I'm not sure that there's
a good solution to this problem; is there a general way to answer the question
"How many active users does this machine have?" that avoids this difficulty?
--
Rich Kulawiec, rsk@s.cc.purdue.edu, s.cc.purdue.edu!rsk
PUCC Unix Staff
reid@decwrl.dec.com (Brian Reid) (01/20/88)
Counting the number of users on the system is probably the most error-prone
of all the measurements. The various counting procedures are extremely
sensitive to local administrative policies. This is the main reason why the
script mails a copy of the results to the system administrator as well as to
me. I've never believed them to be more accurate than maybe to within half
or double the posted amount, and the comments attached to each month's
Arbitron report say as much.
page@swan.ulowell.edu (Bob Page) (01/20/88)
YaBut...
# @(#)arbitron 2.4.2 06/05/87
# Range of /etc/passwd UID's that represent actual people (rather than
# maintenance accounts or daemons or whatever)
lowUID=90
highUID=20000
#
# ###### Scheme #1: fast but usually returns too big a number
nusers=`awk -F: "BEGIN {N=0}\\$3>=$lowUID && \\$3<=$highUID{N=N+1}END{print N}" </etc/passwd`
Maybe arbitron needs to combine 1 & 2.  Do a 'last' to get the active
users, then filter out daemons etc.  Or awk the passwd file then prune
it based on active users (in 'last').
No, I'm not volunteering for anything.
..Bob
--
Bob Page, U of Lowell CS Dept.  page@swan.ulowell.edu  ulowell!page
"I don't know such stuff.  I just do eyes."  -- from 'Blade Runner'
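One way the combination suggested above might look, as a rough sketch rather than anything taken from arbitron itself: load the login names whose UID falls in the lowUID..highUID range, then count how many of them actually appear in `last` output. The function name is invented for illustration, and the lowUID/highUID defaults are simply the values quoted above.

```shell
#!/bin/sh
# UID range taken from the arbitron excerpt quoted above; adjust locally.
lowUID=${lowUID:-90}
highUID=${highUID:-20000}

# count_active_in_range PASSWD_FILE
# Reads `last`-style output on stdin.  Prints the number of distinct login
# names that both appear in that output and have a UID inside the range.
# Pseudo-entries such as "reboot" or "wtmp begins" drop out automatically
# because they have no in-range passwd entry.
count_active_in_range() {
    awk -v pw="$1" -v lo="$lowUID" -v hi="$highUID" '
    BEGIN {
        # Load the passwd file first: name is field 1, UID is field 3.
        while ((getline line < pw) > 0) {
            split(line, f, ":")
            if (f[3] + 0 >= lo && f[3] + 0 <= hi)
                inrange[f[1]] = 1
        }
        close(pw)
    }
    # Count each in-range login name the first time it is seen.
    NF > 0 && ($1 in inrange) && !seen[$1]++ { n++ }
    END { print n + 0 }'
}
```

Usage would be `nusers=$(last | count_active_in_range /etc/passwd)`. This still undercounts at sites that truncate wtmp frequently, and it only matches names exactly as `last` prints them.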
jerry@oliveb.olivetti.com (Jerry Aguirre) (01/21/88)
In article <1978@s.cc.purdue.edu> rsk@s.cc.purdue.edu.UUCP (Rock Wombat) writes:
>This can also result in a count that is too low.  On our systems, we zero
>the utmp and wtmp files every 24 hours, so this pipeline would only reveal
>those users who have logged in since the last zap.  I'm not sure that there's
>a good solution to this problem; is there a general way to answer the question
>"How many active users does this machine have?" that avoids this difficulty?
How about extracting home directories from the password file and then
checking to see if they have a .login or .profile that has been accessed
in the measurement interval.  This will filter out inactive users and
daemon logins (UUCP).
Of course users with multiple accounts will still show up as more than
one person but I see no automatic way to avoid that.
That problem is even worse here because I send in reports for 5 systems.
Many of the users have secondary accounts on other systems so they get
counted multiple times.  What I really need is a way to merge the data
from the 5 systems and submit a single report with duplicates merged.
I am sure that other sites using NNTP for reading have the same problem.
It is common at some sites for a new user to automatically get an
account on every system.
Jerry Aguirre @ Olivetti ATC
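A minimal sketch of the home-directory check described above, assuming the conventional passwd layout in which the home directory is the sixth colon-separated field. The function name and the default 30-day interval are illustrative only. It relies on `find -atime`, so it presumably gives poor results on filesystems where access times are not maintained.

```shell
#!/bin/sh
# count_recent_logins PASSWD_FILE [DAYS]
# Prints the number of distinct home directories whose .login or .profile
# has an access time within the last DAYS days (default 30).
count_recent_logins() {
    passwd_file=$1
    days=${2:-30}
    # Field 6 is the home directory; accounts sharing a home count once.
    awk -F: '$6 != "" { print $6 }' "$passwd_file" | sort -u |
    while IFS= read -r home; do
        for f in "$home/.login" "$home/.profile"; do
            # find prints the file only if it was accessed recently enough.
            if [ -f "$f" ] &&
               [ -n "$(find "$f" -atime "-$days" -print 2>/dev/null)" ]; then
                echo "$home"
                break       # one hit per home directory is enough
            fi
        done
    done | wc -l | tr -d ' '
}
```

Usage would be `nusers=$(count_recent_logins /etc/passwd 30)`. Daemon logins without a startup file drop out on their own, but the script needs permission to see into each home directory, so it is best run as the superuser.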
wundt@wundt.psy.vu.nl (Wundt Administrator) (01/21/88)
There are several suggestions given other than last | sort
to count the number of users.
One of these requires that "active" users be grouped within
a contiguous range of uid numbers, but you can sort.  See arbitron.sh
(the program as distributed) for all the suggestions given.
The first example I read about (complaining) said that only 13
of the 27 users were real.  If uucp daemons, etc. are kept in the range
0-20 or 0-100 (then root doesn't get counted either) this could
solve one problem.  Further, users who are not allowed to read
news could be given very high UIDs, e.g. greater than 10,000 (or 1000).
The "problem" with this suggestion is that it requires the password
file to be "manageable".  If process/daemon users are spread
throughout it, the count probably won't be fixed.
(I'd include the text here, but if you are really interested, read
the distribution next time around if you haven't got it on hand.)
michael felt
matt@oddjob.UChicago.EDU ("Don't even know my real name!") (01/22/88)
Rock Wombat writes:
) is there a general way to answer the question
) "How many active users does this machine have?" that avoids this difficulty?
How ubiquitous and uniform is /usr/adm/lastlog?
Matt
rsalz@bbn.com (Rich Salz) (01/22/88)
-In article rsk@s.cc.purdue.edu.UUCP (Rock Wombat) writes:
-...is there a general way to answer the question
-"How many active users does this machine have?"
Not really, short of hardcoding a number into your arbitron script;
which you might not want to rule out-of-hand...
In news.admin, jerry@oliveb.UUCP (Jerry Aguirre) writes:
>multiple times.  What I really need is a way to merge the data from the
>5 systems and submit a single report with duplicates merged.
Here's a script that basically does it, feed it a bunch of reports.
#! /bin/sh
##  arb-merge.  Read set of arbitron reports and merge them.
##  This needs to be made portable and configurable the way the real arbitron
##  stuff is, but for now... it works.
HOST=`hostname`
DATE=`date | sed -e 's/....\(...\).*19\(..\)/\119\2/'`
cat $@ | awk '\
BEGIN {
    # Are we ignoring the current system (e.g., duplicate)?
    Ignore = 1
    # Total number of users and readers for all systems.
    Users = 0
    NetReaders = 0
    # List of systems whose reports we have processed.
    SysCount = 1
    SysList[0] = "--ERR--"
    # List of newsgroup names and count thereof.
    GroupCount = 1
    GroupName[0] = "--ERR--"
    # Associative array of number of readers, indexed by group name.
    GroupReaders[0] = "--ERR--"
    # Associative array of "seen this newsgroup?", indexed by group name.
    HaveGroup["--ERR--"] = "no"
}
$1 == "Host" {
    # We assume there are not lots of hosts, so no associative array.
    Ignore = 0
    for (i = 1; i < SysCount; i++)
        if (SysList[i] == $2)
            Ignore = 1
    if (Ignore == 0) {
        SysList[SysCount] = $2
        SysCount++
    }
}
$1 == "Users" {
    if (Ignore == 0)
        Users += $2
}
$1 == "NetReaders" {
    if (Ignore == 0)
        NetReaders += $2
}
$1 == "ReportDate" {
    # We could (should?) check for a bad date here, and ignore reports,
    # but that means we might have to back out of bumping the Users and
    # NetReaders somehow -- not worth it.
}
$1 ~ /[0-9]+/ && $2 ~ /comp\.|misc\.|news\.|rec\.|sci\.|soc\.|talk\./ {
    if (Ignore == 0)
        if (HaveGroup[$2] != "yes") {
            HaveGroup[$2] = "yes"
            GroupName[GroupCount] = $2
            GroupCount++
            GroupReaders[$2] = $1
        }
        else
            GroupReaders[$2] += $1
}
END {
    printf "99999 Host\t\tHOST\n"
    printf "99998 Users\t\t%d\n", Users
    printf "99997 NetReaders\t%d\n", NetReaders
    printf "99996 ReportDate\tDATE\n"
    printf "99995 SystemType\tnews-arbitron-2.4\n"
    for (i = 1; i < SysCount; i++)
        printf "99994 OtherHost\t%s\n", SysList[i]
    for (i = 1; i < GroupCount; i++)
        print GroupReaders[GroupName[i]], GroupName[i]
}' \
    | sort -nr \
    | sed -e "s/HOST/${HOST}/" -e "s/DATE/${DATE}/" -e "s/9999[0-9] //"
We don't use it at BBN yet, but eventually I'll have some daemon on all
major servers run arbitron and mail the results to me or a program, or
have clients just mail me their .newsrc anonymously...
--
For comp.sources.unix stuff, mail to sources@uunet.uu.net.
jc@minya.UUCP (01/23/88)
In article <14258@oddjob.UChicago.EDU>, matt@oddjob.UChicago.EDU ("Don't even know my real name!") writes:
> Rock Wombat writes:
>
> ) is there a general way to answer the question
> ) "How many active users does this machine have?" that avoids this difficulty?
>
> How ubiquitous and uniform is /usr/adm/lastlog?
>
Well, here there is no such file.  This machine is used by only two
users, and we aren't interested in running accounting.  We have much
better uses for the cycles and disk blocks.
With the growth of the workstation market, we will see more and more
machines run like this.  I've explained to quite a lot of users how
to free up space and time on their workstations by eliminating all the
stuff that makes sense only on a multi-user system.
Perhaps we need mods to readnews, vnews, rn, and so on that keep a
file of readership history?
Well, actually, we don't; I've determined the number of readers on
several systems by something like:
	find /usr -name ".newsrc" -mtime -10 -print | wc -l
How's that for a cpu-gobbling solution?
Does anyone know an easy way to produce a list of the home directories
of all users?  Maybe awk could be used to chew up /etc/passwd and
spit out the sixth field of each, "/.newsrc" could be appended to
each, the mtime of each could be tested, and so on.  Now if I only
understood awk well enough to get anything other than "bailing out
near line 1" 98% of the time...
--
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
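The awk idea floated above could be sketched roughly as follows. This is an illustration, not a tested arbitron patch: the function name is invented, the 10-day window just mirrors the find command quoted above, and the home directory is taken from the sixth colon-separated field of the passwd file.

```shell
#!/bin/sh
# count_news_readers PASSWD_FILE [DAYS]
# Prints how many distinct home directories contain a .newsrc that was
# modified within the last DAYS days (default 10).
count_news_readers() {
    passwd_file=$1
    days=${2:-10}
    # Build one candidate .newsrc path per distinct home directory.
    awk -F: '$6 != "" { print $6 "/.newsrc" }' "$passwd_file" | sort -u |
    while IFS= read -r rc; do
        # find prints the path only if the mtime test passes.
        [ -f "$rc" ] && find "$rc" -mtime "-$days" -print
    done | wc -l | tr -d ' '
}
```

Usage would be `readers=$(count_news_readers /etc/passwd 10)`. Because it looks only at the listed home directories, it avoids walking all of /usr, at the cost of missing any .newsrc kept elsewhere.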
dave@lsuc.uucp (David Sherman) (01/25/88)
In article <181@wundt.psy.vu.nl> wundt@psy.vu.nl (Wundt Administrator) writes:
>There are several suggestions given other than last | sort
>to count the number of users.
>
>One of these requires that "active" users be grouped within
>a contiguous range of uid numbers, but you can sort.
That's OK, but a bit magical, and apt to be lost track of at some point.
Better is to look into the last field of /etc/passwd, and do something
such as assuming everyone without either a null entry (implying /bin/sh)
or /bin/csh is a non-real user.  That'll take care of uucico processes
and other non-shell users.
Of course, the various names for shells on your site may vary from
being just sh and csh.
David Sherman
The Law Society of Upper Canada
Toronto
--
{ uunet!mnetor pyramid!utai decvax!utcsri ihnp4!utzoo } !lsuc!dave
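The shell-field test described above might be sketched like this. The function name and the REAL_SHELLS variable are invented for illustration; the default list holds just the two shells named in the post, with an empty field treated as /bin/sh.

```shell
#!/bin/sh
# Hypothetical knob: space-separated list of login shells that indicate a
# real user.  Extend it to match the shells in use at your site.
REAL_SHELLS=${REAL_SHELLS:-"/bin/sh /bin/csh"}

# count_shell_users PASSWD_FILE
# Prints the number of passwd entries whose last (seventh) field is empty
# or is one of the shells listed in REAL_SHELLS.
count_shell_users() {
    awk -F: -v shells="$REAL_SHELLS" '
    BEGIN {
        n = split(shells, s, " ")
        for (i = 1; i <= n; i++)
            ok[s[i]] = 1
    }
    # Skip blank or malformed lines; an empty shell field implies /bin/sh.
    NF >= 6 && ($7 == "" || ($7 in ok)) { count++ }
    END { print count + 0 }' "$1"
}
```

Usage would be `nusers=$(count_shell_users /etc/passwd)`. Note that root and any other system account with an ordinary shell still get counted, so this probably works best combined with a UID range or an activity check.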
rwhite@nusdhub.UUCP (Robert C. White Jr.) (02/10/88)
Hi all, you are going about this thing all wrong..... [if I may.]
Each of us has a certain number of entries in our /etc/passwd files
which do not represent valid users, that is obvious, it is also
relatively constant as a scalar quantity.  While the actual number
of valid users may vary greatly on some systems [schools especially]
the actual number of daemons is about the same on any given system
over any given time.  Granted, when you add an SNA adapter and
associated software, or a new LAN, or some such, you do tend to
add an "owner" or daemon, but other than that, you get a
one-user-one-entry relation.
Arbitron should have a SYSDEPUSR=## line which contains the
number of system level user entries in the /etc/passwd file.
We then count up the number of system level entries our passwd file
contains when we install arbitron.  Every time thereafter that
arbitron runs, it can simply apply
	COUNT=`expr ${COUNT} - ${SYSDEPUSR}`
to the result of the normal figuring method and BANG!!!! you get
more reasonable numbers.
As far as this goes, an install script could even be manufactured
which would implement all those nasty suggestions once and only
once, and then leave the rest of the maintenance to whoever cares.
[i.e. If I think my arbitron numbers are all wrong, I simply rerun
the install script, and answer a few simple questions like
ignore(y/n) ?]
Rob.
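As a rough illustration of the SYSDEPUSR idea above (arbitron has no such setting as far as this thread shows, so the variable and the function name are both hypothetical), the adjustment could be wrapped so that a stale offset never drives the count below zero:

```shell
#!/bin/sh
# Hypothetical install-time setting: how many passwd entries are system
# accounts rather than people.  Zero means no adjustment.
SYSDEPUSR=${SYSDEPUSR:-0}

# adjust_count RAW_COUNT
# Prints RAW_COUNT minus SYSDEPUSR, clamped so it never goes negative.
adjust_count() {
    adjusted=$(( $1 - $SYSDEPUSR ))
    if [ "$adjusted" -lt 0 ]; then
        adjusted=0
    fi
    echo "$adjusted"
}
```

With SYSDEPUSR=14, `adjust_count 27` prints 13. The clamp is a design choice for the case where accounts have been removed since the offset was last counted.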
sa@ttidca.TTI.COM (Steve Alter) (02/19/88)
In article <585@nusdhub> rwhite@nusdhub (Robert C. White Jr.) writes:
} Each of us has a certain number of entries in our /etc/passwd files
} which do not represent valid users, that is obvious, it is also
} relatively constant as a scalar quantity.  While the actual number
} of valid users may vary greatly on some systems [schools especially]
} the actual number of daemons is about the same on any given system
} over any given time.
(:-) (:-) (:-) (:-) (:-) (:-) (:-) (:-) (:-) (:-) (:-) (:-)
Relatively constant as a scalar quantity?  My foot!
Fully *two-thirds* of our password-file (which was around 1700 lines
this morning) is occupied by these non-user accounts!  They're project
accounts; they're scattered throughout the file, and more of them appear
every week.
I had to do some major hacking on that little section of arbitron to
get it to come up with a semi-accurate count of our real users.
Of course, we're definitely the exception.  {8-)
Smileys included because this is just a fun (non-informative) and
non-flaming posting, but it's still the truth.
(-:) (-:) (-:) (-:) (-:) (-:) (-:) (-:) (-:) (-:) (-:) (-:)
--
Steve Alter
...!{csun,rdlvax,trwrb,psivax}!ttidca!alter  or  alter@tti.com
Citicorp/TTI, Santa Monica CA  (213) 452-9191 x2541