reid@decwrl.dec.com (Brian Reid) (01/01/70)
In article <358@white.gcm> dc@white.UUCP (Dave Caswell) writes:
>In article <74@bacchus.DEC.COM> reid@decwrl.UUCP (Brian Reid) writes:
> and it is equally ridiculous to have sources
>-that are posted automatically every month be in a newsgroup that is archived
>-in thousands of sites all over the world.
>
>It is equally ridiculous to post something automatically every month that is
>archived in thousands of sites all over the world.

I thought that was what I just said. The reason I post it to a discussion
group is so that it won't be archived. The whole purpose of the discussion
groups is to provide a forum for things like this that the poster knows
should not be archived.
reid@decwrl.UUCP (Brian Reid) (06/01/87)
This is the source for the "arbitron" program that is used to produce the
data for the monthly USENET readership surveys in news.lists. It is posted to
this newsgroup because there is no unmoderated sources newsgroup any more.

#! /bin/sh
# @(#)arbitron 2.4.1 02/09/87
# arbitron -- this program produces rating sweeps for USENET.
#
# Usage: arbitron
#
# To use this program, edit the "configuration" section below so that the
# information is correct for your site, and then run it. It will produce a
# readership survey for your machine and mail that survey to decwrl, with
# a cc to you.
#
# To participate in the international monthly ratings sweeps,
# run "arbitron" every month. I will run the statistics program on the last
# day of each month; it will include any report that has reached it by that
# time. To make sure your site's data is included, run the survey program no
# later than the 20th day of each month.
#
# Brian Reid, DEC Western Research Lab, reid@decwrl
# Updated and bugfixed by
#     Spencer Thomas, U.of Utah
#     Geoff Kuenning, SAH Consulting
# Updated to work with 2.10.1 and older news systems by
#     Lindsay Cleveland, AT&T Technologies/Bell Labs
# Made to work with 16-bit address spaces by
#     Andy Walker, Maths Dept., University of Nottingham, UK
#
# Note that the results of this program are dependent on the rate at which
# you expire news. If you are a small site that expires news rapidly, the
# results may indicate fewer active readers than you actually have.
#
###########################################################################
# Configuration information. Edit this section to reflect your site data.
#
TMPDIR=/tmp
NEWS=/usr/lib/news
SPOOL=/usr/spool/news

# Make a crude stab at determining the system type. If your installation has
# only one type of system, you can edit out the "if" statement and just turn
# this into an assignment statement of the correct value.
if [ -d /usr/ucb ]
then
    STYPE="bsd"
else
    STYPE="usg"
fi

# Range of /etc/passwd UID's that represent actual people (rather than
# maintenance accounts or daemons or whatever)
lowUID=5
highUID=9999

# If you aren't running a distributed news system (nntpd & rrn, usually),
# leave NEWSHOST blank. Else set it to the name of the host from which you
# can rcp a copy of the active file.
NEWSHOST=

# uucp path: {ihnp4, decvax, ucbvax}!decwrl!netsurvey
# summarypath="netsurvey@decwrl.dec.com $USER"
summarypath="ihnp4!decwrl!netsurvey $USER"

# We need to find the uucp name of your host. If this code doesn't work,
# then just put it in literally like this:
# hostname="ihnp4"
case $STYPE in
    bsd) cmd='hostname || uuname -l';;
    usg|sysv) cmd='uname -n || uuname -l || hostname';;
    *) cmd='uuname -l';;
esac;
hostname=`sh -c "$cmd" 2>&-`

PATH=$NEWS:/usr/local/bin:/usr/ucb:/usr/bin:/bin
############################################################################
export PATH
# ---------------------------------------------------------------------------
trap "rm -f $TMPDIR/arb.*.$$; exit" 0 1 2 3 15
set `date`
dat="$2$6"
destination="${MAILER-mail} $summarypath"

################################
# Here are several expressions, each of which figures out approximately how
# many people use this machine. Comment out all but 1 of them; pick the one
# you like best. Initially the most universal but least reliable of them is
# uncommented.
#
# ###### Scheme #1: fast but usually returns too big a number
nusers=`awk -F: "BEGIN {N=0}\\$3>=$lowUID && \\$3<=$highUID{N=N+1}END{print N}" </etc/passwd`
#
# ###### Scheme #2 (works with BSD systems)
#nusers=`last | sort -u +0 -1 | wc -l`
#
# ###### Scheme #3 (works with USG systems)
#nusers=`who /etc/wtmp | sort -u +0 -1 | wc -l`
################################
#
# Set up awk scripts; these are too large to pass as arguments on most
# systems.
#
# This awk script generates the actual output report.
# We use 'sed' to substitute in the shell variables to save ourselves
# endless hassle trying to find quoting/backslashing problems.
#
# The input to this script consists of two types of lines (pre-sorted):
#
# (1) Active-file lines. These have four fields: newsgroup name,
#     last article number, first existing article, 'y' or 'n'
#     to allow/disallow posting.
#         mod.mac 00001 00001 y
#
# (2) .newsrc-derived lines. These have three fields: the newsgroup
#     name, the user name and the articles-read information. The latter
#     can be arbitrarily complex. It can also be arbitrarily long;
#     this can potentially break either awk or sed, in which
#     case the script will not work.
#         mod.map joe 1-199
#
# The script uses the type 1 lines to define the newsgroups
# and their active article ranges. The .newsrc (type 2) lines are
# then used to deduce which users are reading that group (a group
# is being read if the last article seen is in that group's active
# article range).
#
sed "/^#/d
s/NUSERS/$nusers/g
s/HOSTNAME/$hostname/g
s/DATE/$dat/g" > $TMPDIR/arb.fmt.$$ << 'DOG'
# makereport -- utility for "arbitron". Early versions were copied from a
# similar script distributed with "subscribers.sh" by Blonder, McCreery, and
# Herron.
#
BEGIN { rdrcount = 0 ; reader = "" ; grpcount = 0 ; realusers = 0}
#
# Active file line: dispose of previous group (if any), record group, and
# record first and last article numbers. Set group's reader count to none.
NF == 4 {
    if (grpname != "") {
        printf("%d %s\n",grpcount, grpname)
    }
    grpname = $1
    grpfirst = $3
    grplast = $2
    grpcount = 0
}
#
# .newsrc line. Break out the final number, which is the last article that
# has actually been read. This is a pretty good indicator of the person's
# true interest in the group. If 'lastread' for the group is a current
# (unexpired) article, record a reader for that group. Finally, record
# the user as a "real" user of the news system.
#
NF == 3 {
    if ($1 != grpname) next;
    n1 = split($3, n2, "-")
    n3 = split(n2[n1], n4, ",")
    lastread = n4[n3]
    if ((grpfirst != grplast) && (lastread >= grpfirst) && (lastread <= grplast)) {
        grpcount++
        if (realuser[$2] != 1) {
            realuser[$2] = 1
            realusers++
        }
    }
}
#
# End of file. Print the report in 2 columns.
END {
    printf("9999 Host\t\t%s\n","HOSTNAME")
    printf("9998 Users\t\t%d\n",NUSERS)
    printf("9997 NetReaders\t%d\n",realusers)
    printf("9996 ReportDate\t%s\n","DATE")
    printf("9995 SystemType\tnews-arbitron-2.4\n")
# For reorganized network, report a group even if nobody reads it. This will
# help us keep track of where the groups propagate.
    printf("%d %s\n",grpcount, grpname)
}
DOG

cat >$TMPDIR/arb.pwd.$$ <<'MOUSE'
BEGIN { seen["/"]=1; seen[""] = 1; }
{
    if (seen[$6]!=1) {
        printf("if [ -r %s/.newsrc ] ; then ", $6)
        printf("sed -n '/: [0-9]/s/:/ %s/p' <%s/.newsrc; fi\n",$1,$6)
        seen[$6]=1;
    }
}
MOUSE

# First, make sure we have an active file
if [ -z "$NEWSHOST" ]
then
    ACTIVE=$NEWS/active
else
    ACTIVE=/tmp/arb.active.$$
    rcp $NEWSHOST:$NEWS/active $ACTIVE
fi

if [ ! -s $ACTIVE ]
then
    echo arbitron: ACTIVE file missing or empty. Cannot continue.
    exit 1
fi

# Next, get the list of .newsrc files with duplicates and unreadable files
# removed.
awk -F: -f $TMPDIR/arb.pwd.$$ </etc/passwd | sh >$TMPDIR/arb.tmp.$$

# Check to make sure that we found some
if [ -s $TMPDIR/arb.tmp.$$ ]
then
    # See if "active" file has 4 fields or only two (pre-2.10.2)
    set `sed 1q < $ACTIVE` > /dev/null 2>&1
    if [ $# -eq 2 ]
    then
        egrep '^[a-z]*\.' $ACTIVE | while read group last
        do
            dir=`echo "$group" | sed 's;\.;/;g'`
            # Keep only all-numeric names (article files), not subdirectories.
            first=`ls $SPOOL/$dir | grep '^[0-9][0-9]*$' | sort -n | sed 1q`
            case $STYPE in
                usg) echo "$group $last ${first:-$last} X";;
                *) echo "$group $last ${first-$last} X"
            esac
        done
    else
        egrep '^[a-z]*\.' $ACTIVE
    fi |
    sort - $TMPDIR/arb.tmp.$$ |
    awk -f $TMPDIR/arb.fmt.$$ |
    sort -nr |
    sed '/^$/d
s/^999[0-9] //' |
    $destination
else
    echo Unable to find any readable .newsrc files 1>&2
    exit 1
fi
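The reader-detection rule that the script's comments describe (take the last article number named in a user's .newsrc entry and count the user as a reader if that number falls inside the group's active range) can be tried in isolation. What follows is only a minimal sketch under stated assumptions, not part of arbitron: the function names last_read and is_reader are hypothetical, and the split-on-hyphen-then-comma step mirrors the awk fragment in the script above.

```shell
#!/bin/sh
# Hypothetical helper (not in arbitron): print the last article number
# named in a .newsrc read-list such as "1-199" or "1-50,60,75-80".
last_read() {
    echo "$1" | awk '{
        n1 = split($1, byhyphen, "-")           # pieces between hyphens
        n3 = split(byhyphen[n1], bycomma, ",")  # final piece, split on commas
        print bycomma[n3]                       # last number mentioned
    }'
}

# Hypothetical helper: is_reader READLIST FIRST LAST
# Succeeds when the last article read lies within FIRST..LAST and the
# range is not degenerate (the script skips groups where FIRST == LAST).
is_reader() {
    lr=`last_read "$1"`
    [ "$2" -ne "$3" ] && [ "$lr" -ge "$2" ] && [ "$lr" -le "$3" ]
}
```

For example, `is_reader 1-199 100 250` succeeds, while `is_reader 1-199 300 400` fails because article 199 has already expired from a group whose oldest article is 300. This is also why a site that expires news quickly may report fewer readers than it really has.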
reid@decwrl.UUCP (Brian Reid) (10/01/87)
This is the source for the "arbitron" program that is used to produce the
data for the monthly USENET readership surveys in news.lists. It is posted to
this newsgroup because there is no unmoderated sources newsgroup any more.

#! /bin/sh
# @(#)arbitron 2.4.2 06/05/87
# arbitron -- this program produces rating sweeps for USENET.
#
# Usage: arbitron
#
# To use this program, edit the "configuration" section below so that the
# information is correct for your site, and then run it. It will produce a
# readership survey for your machine and mail that survey to decwrl, with
# a cc to you.
#
# To participate in the international monthly ratings sweeps,
# run "arbitron" every month. I will run the statistics program on the last
# day of each month; it will include any report that has reached it by that
# time. To make sure your site's data is included, run the survey program no
# later than the 20th day of each month.
#
# Brian Reid, DEC Western Research Lab, reid@decwrl
# Updated and bugfixed by
#     Spencer Thomas, U.of Utah
#     Geoff Kuenning, SAH Consulting
# Updated to work with 2.10.1 and older news systems by
#     Lindsay Cleveland, AT&T Technologies/Bell Labs
# Made to work with 16-bit address spaces by
#     Andy Walker, Maths Dept., University of Nottingham, UK
# Nagging Bourne shell bug fixed by
#     Tom Donahue, Rabbit Software Corp
#
# Note that the results of this program are dependent on the rate at which
# you expire news. If you are a small site that expires news rapidly, the
# results may indicate fewer active readers than you actually have.
#
###########################################################################
# Configuration information. Edit this section to reflect your site data.
#
TMPDIR=/tmp
NEWS=/usr/lib/news
SPOOL=/usr/spool/news

# Make a crude stab at determining the system type. If your installation has
# only one type of system, you can edit out the "if" statement and just turn
# this into an assignment statement of the correct value.
if [ -d /usr/ucb ]
then
    STYPE="bsd"
else
    STYPE="usg"
fi

# Range of /etc/passwd UID's that represent actual people (rather than
# maintenance accounts or daemons or whatever)
lowUID=5
highUID=9999

# If you aren't running a distributed news system (nntpd & rrn, usually),
# leave NEWSHOST blank. Else set it to the name of the host from which you
# can rcp a copy of the active file.
NEWSHOST=

# uucp path: {ihnp4, decvax, ucbvax}!decwrl!netsurvey
# summarypath="netsurvey@decwrl.dec.com $USER"
summarypath="ihnp4!decwrl!netsurvey $USER"

# We need to find the uucp name of your host. If this code doesn't work,
# then just put it in literally like this:
# hostname="ihnp4"
case $STYPE in
    bsd) cmd='hostname || uuname -l';;
    usg|sysv) cmd='uname -n || uuname -l || hostname';;
    *) cmd='uuname -l';;
esac;
hostname=`sh -c "$cmd" 2>&-`

PATH=$NEWS:/usr/local/bin:/usr/ucb:/usr/bin:/bin
############################################################################
export PATH
# ---------------------------------------------------------------------------
trap "rm -f $TMPDIR/arb.*.$$; exit" 0 1 2 3 15
set `date`
dat="$2$6"
destination="${MAILER-mail} $summarypath"

################################
# Here are several expressions, each of which figures out approximately how
# many people use this machine. Comment out all but 1 of them; pick the one
# you like best. Initially the most universal but least reliable of them is
# uncommented.
#
# ###### Scheme #1: fast but usually returns too big a number
nusers=`awk -F: "BEGIN {N=0}\\$3>=$lowUID && \\$3<=$highUID{N=N+1}END{print N}" </etc/passwd`
#
# ###### Scheme #2 (works with BSD systems)
#nusers=`last | sort -u +0 -1 | wc -l`
#
# ###### Scheme #3 (works with USG systems)
#nusers=`who /etc/wtmp | sort -u +0 -1 | wc -l`
################################
#
# Set up awk scripts; these are too large to pass as arguments on most
# systems.
#
# This awk script generates the actual output report.
# We use 'sed' to substitute in the shell variables to save ourselves
# endless hassle trying to find quoting/backslashing problems.
#
# The input to this script consists of two types of lines (pre-sorted):
#
# (1) Active-file lines. These have four fields: newsgroup name,
#     last article number, first existing article, 'y' or 'n'
#     to allow/disallow posting.
#         mod.mac 00001 00001 y
#
# (2) .newsrc-derived lines. These have three fields: the newsgroup
#     name, the user name and the articles-read information. The latter
#     can be arbitrarily complex. It can also be arbitrarily long;
#     this can potentially break either awk or sed, in which
#     case the script will not work.
#         mod.map joe 1-199
#
# The script uses the type 1 lines to define the newsgroups
# and their active article ranges. The .newsrc (type 2) lines are
# then used to deduce which users are reading that group (a group
# is being read if the last article seen is in that group's active
# article range).
#
sed "/^#/d
s/NUSERS/$nusers/g
s/HOSTNAME/$hostname/g
s/DATE/$dat/g" > $TMPDIR/arb.fmt.$$ << 'DOG'
# makereport -- utility for "arbitron". Early versions were copied from a
# similar script distributed with "subscribers.sh" by Blonder, McCreery, and
# Herron.
#
BEGIN { rdrcount = 0 ; reader = "" ; grpcount = 0 ; realusers = 0}
#
# Active file line: dispose of previous group (if any), record group, and
# record first and last article numbers. Set group's reader count to none.
NF == 4 {
    if (grpname != "") {
        printf("%d %s\n",grpcount, grpname)
    }
    grpname = $1
    grpfirst = $3
    grplast = $2
    grpcount = 0
}
#
# .newsrc line. Break out the final number, which is the last article that
# has actually been read. This is a pretty good indicator of the person's
# true interest in the group. If 'lastread' for the group is a current
# (unexpired) article, record a reader for that group. Finally, record
# the user as a "real" user of the news system.
#
NF == 3 {
    if ($1 != grpname) next;
    n1 = split($3, n2, "-")
    n3 = split(n2[n1], n4, ",")
    lastread = n4[n3]
    if ((grpfirst != grplast) && (lastread >= grpfirst) && (lastread <= grplast)) {
        grpcount++
        if (realuser[$2] != 1) {
            realuser[$2] = 1
            realusers++
        }
    }
}
#
# End of file. Print the report in 2 columns.
END {
    printf("9999 Host\t\t%s\n","HOSTNAME")
    printf("9998 Users\t\t%d\n",NUSERS)
    printf("9997 NetReaders\t%d\n",realusers)
    printf("9996 ReportDate\t%s\n","DATE")
    printf("9995 SystemType\tnews-arbitron-2.4\n")
# For reorganized network, report a group even if nobody reads it. This will
# help us keep track of where the groups propagate.
    printf("%d %s\n",grpcount, grpname)
}
DOG

cat >$TMPDIR/arb.pwd.$$ <<'MOUSE'
BEGIN { seen["/"]=1; seen[""] = 1; }
{
    if (seen[$6]!=1) {
        printf("if [ -r %s/.newsrc ] ; then ", $6)
        printf("sed -n '/: [0-9]/s/:/ %s/p' <%s/.newsrc; fi\n",$1,$6)
        seen[$6]=1;
    }
}
MOUSE

# First, make sure we have an active file
if [ -z "$NEWSHOST" ]
then
    ACTIVE=$NEWS/active
else
    ACTIVE=/tmp/arb.active.$$
    rcp $NEWSHOST:$NEWS/active $ACTIVE
fi

if [ ! -s $ACTIVE ]
then
    echo arbitron: ACTIVE file missing or empty. Cannot continue.
    exit 1
fi

# Next, get the list of .newsrc files with duplicates and unreadable files
# removed.
awk -F: -f $TMPDIR/arb.pwd.$$ </etc/passwd | sh >$TMPDIR/arb.tmp.$$

# Check to make sure that we found some
if [ -s $TMPDIR/arb.tmp.$$ ]
then
    # See if "active" file has 4 fields or only two (pre-2.10.2)
    set `sed 1q < $ACTIVE`
    if [ $# -eq 2 ]
    then
        egrep '^[a-z]*\.' $ACTIVE | while read group last
        do
            dir=`echo "$group" | sed 's;\.;/;g'`
            # Keep only all-numeric names (article files), not subdirectories.
            first=`ls $SPOOL/$dir | grep '^[0-9][0-9]*$' | sort -n | sed 1q`
            case $STYPE in
                usg) echo "$group $last ${first:-$last} X";;
                *) echo "$group $last ${first-$last} X"
            esac
        done
    else
        egrep '^[a-z]*\.' $ACTIVE
    fi |
    sort - $TMPDIR/arb.tmp.$$ |
    awk -f $TMPDIR/arb.fmt.$$ |
    sort -nr |
    sed '/^$/d
s/^999[0-9] //' |
    $destination
else
    echo Unable to find any readable .newsrc files 1>&2
    exit 1
fi
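The step that turns a user's .newsrc into the three-field "type 2" lines can also be run by hand on a sample file. The sketch below reuses the sed expression that the script generates for each home directory; the function name newsrc_lines and the sample file are hypothetical, and it assumes a user name with no "/" or "&" in it (either would confuse sed).

```shell
#!/bin/sh
# Hypothetical helper (not in arbitron): newsrc_lines USER FILE
# Prints one "group user read-list" line for each subscribed group in FILE
# that has at least one article marked as read.
newsrc_lines() {
    # Same expression the script emits: select lines containing ": <digit>"
    # and replace the first colon with " USER". Unsubscribed groups (marked
    # with "!") and groups with an empty read-list do not match.
    sed -n "/: [0-9]/s/:/ $1/p" < "$2"
}
```

Given a file containing the lines `comp.sources.d: 1-120`, `news.lists! 1-40` and `mod.map: `, the call `newsrc_lines joe FILE` prints only `comp.sources.d joe 1-120`. Lines of that form are what the script sorts together with the active-file lines before feeding both to the report generator.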
woods@hao.UUCP (10/03/87)
In article <11731@decwrl.DEC.COM> reid@decwrl.UUCP (Brian Reid) writes:
>This is the source for the "arbitron" program that is used to produce the
>data for the monthly USENET readership surveys in news.lists. It is posted to
>this newsgroup because there is no unmoderated sources newsgroup any more.

This is true but irrelevant. Do you really think the moderator of
comp.sources.misc is going to refuse to post it? Give me a break! Is the one
or two day delay caused by the moderation of that group going to cause the
net survey to collapse? Good grief!

--Greg
--
UUCP: {husc6, gatech, oddjob, ames, noao}!hao!woods
CSNET: woods@ncar.csnet
INTERNET: woods@hao.ucar.edu
matt@oddjob.UUCP (10/04/87)
Brian Reid writes:
) >This is the source for the "arbitron" program ... It is posted to
) >this newsgroup because there is no unmoderated sources newsgroup any more.

Greg Woods writes:
) This is true but irrelevant. Do you really think the moderator of
) comp.sources.misc is going to refuse to post it?

To argue by analogy, one may wish to bypass or evade a censorship
board, even if one is certain of receiving the censors' approval.

Matt
rick@seismo.CSS.GOV (Rick Adams) (10/04/87)
Yet one has to wonder: If alt.sources is "proof" of the need/demand for
unmoderated sources, why does one feel obligated to post to a group
specifically created for discussion only?

Is this an admission that alt.sources is inadequate? (Or maybe inappropriate?)
You would think it would at least be cross posted...

--rick
matt@oddjob.UChicago.EDU (My Name Here) (10/07/87)
) Is this an admission that alt.sources is inadequate?
) (Or maybe inappropriate?)

Or insufficiently-propagated.

Matt
reid@decwrl.dec.com (Brian Reid) (10/09/87)
In article <44103@beno.seismo.CSS.GOV> rick@seismo.CSS.GOV (Rick Adams) writes:
>Yet one has to wonder: If alt.sources is "proof" of the need/demand for
>unmoderated sources, why does one feel obligated to post to a group
>specifically created for discussion only?
>
>Is this an admission that alt.sources is inadequate? (Or maybe inappropriate?)
>You would think it would at least be cross posted...

Hey, guys, woof woof. This is not posted by me, it is posted by crontab using
a shell script I wrote back before alt.sources existed. So I forgot to update
it to include alt.sources. I still intend to crosspost to comp.sources.d just
because alt.sources doesn't go everywhere and it is to everyone's benefit for
as many sites as possible to run the arbitron script.

But I'm delighted to see the discussion. Sure, some people are annoyed, but
everything annoys *somebody*. My goal is to get as many people as possible to
install and run the script, and any technique that works to draw attention to
it is fine with me.

Also it's ridiculous to have sources that are posted automatically every
month be sent to a moderator, and it is equally ridiculous to have sources
that are posted automatically every month be in a newsgroup that is archived
in thousands of sites all over the world.
webber@brandx.rutgers.edu (Webber) (10/11/87)
In article <74@bacchus.DEC.COM>, reid@decwrl.dec.com (Brian Reid) writes:
> ...
> Also it's ridiculous to have sources that are posted automatically every
> month be sent to a moderator, and it is equally ridiculous to have sources
> that are posted automatically every month be in a newsgroup that is archived
> in thousands of sites all over the world.

Well, the world is full of ridiculous things. Indeed, the world is actually
a reductio ad absurdum argument between G*d and Lucifer that somehow went
wrong.

Software that should be seen by everyone should really go to news.announce.
Software that is only relevant to sysadmins should go to news.sysadmins.
Software seeking a discerning audience should go to alt.sources. Of course,
since software describes itself, it is always appropriate to send it to
comp.sources.d.

---------- BOB (webber@aramis.rutgers.edu ; rutgers!aramis.rutgers.edu!webber)
dc@gcm (Dave Caswell) (10/11/87)
In article <74@bacchus.DEC.COM> reid@decwrl.UUCP (Brian Reid) writes:
-
-Hey, guys, woof woof. This is not posted by me, it is posted by crontab using
-a shell script I wrote back before alt.sources existed. So I forgot to update
-it to include alt.sources. I still intend to crosspost to comp.sources.d just
-because alt.sources doesn't go everywhere and it is to everyone's benefit for
-as many sites as possible to run the arbitron script.
-
-Also it's ridiculous to have sources that are posted automatically every
-month be sent to a moderator, and it is equally ridiculous to have sources
-that are posted automatically every month be in a newsgroup that is archived
-in thousands of sites all over the world.
It is equally ridiculous to post something automatically every month that is
archived in thousands of sites all over the world.
jack@swlabs.UUCP (Jack Bonn) (10/12/87)
In article <74@bacchus.DEC.COM>, reid@decwrl.dec.com (Brian Reid) writes:
> Also it's ridiculous to have sources that are posted automatically every
> month ...

Enough said.

By the way, I know I am quoting out of context and I know that the original
poster intended another conclusion than that that I'm drawing. But posting
arbitron repeatedly is not going to "get my attention" in any way but a
negative one, similar to the poster who wasn't sure he was getting out on the
net and was going to post every day until someone replied.

--
Jack Bonn, <> Software Labs, Ltd, Box 451, Easton CT 06612
uunet!swlabs!jack
allbery@ncoast.UUCP (Brandon Allbery) (10/15/87)
As quoted from <74@bacchus.DEC.COM> by reid@decwrl.dec.com (Brian Reid):
+---------------
| Also it's ridiculous to have sources that are posted automatically every
| month be sent to a moderator, and it is equally ridiculous to have sources
| that are posted automatically every month be in a newsgroup that is archived
| in thousands of sites all over the world.
+---------------

No argument from me. However, I suspect that news.groups (or maybe news.misc)
would be a more appropriate place for it. Yes, it's a source, but it's also a
news-specific program. (The Notesfiles version could go to news.notes; this
also means that it doesn't have to be carried by the non-Notes sites on the
net.)

--
Brandon S. Allbery, moderator of comp.sources.misc
{{harvard,mit-eddie}!necntc,well!hoptoad,sun!mandrill!hal}!ncoast!allbery
ARPA: necntc!ncoast!allbery@harvard.harvard.edu Fido: 157/502 MCI: BALLBERY
<<ncoast Public Access UNIX: +1 216 781 6201 24hrs. 300/1200/2400 baud>>
"Just one word, Data: _it_didn't_happen_!" - Tasha Yar