[news.groups] USENET READERSHIP SUMMARY REPORT FOR MAR 88

reid@decwrl.dec.com (Brian Reid) (04/03/88)

USENET READERSHIP SUMMARY REPORT for Mar 88

This is the first article in a monthly posting series from the Network
Measurement Project at the DEC Western Research Laboratory in Palo Alto,
California. 

This survey is based on a sample of data taken from various USENET sites.
At the end of this message there is a short explanation of the measurement
techniques and the meaning of the various statistics. The messages that
follow this one show survey data sorted by various criteria.

The newsgroup volume and article counts that I post are often significantly
different from the ones posted by Rick Adams, because he includes the size of
a crossposted article in every group to which it is posted, whereas I charge
that size only to the first-named group. 

The complete set of readership data (of which this is a summary) is posted
in news.lists. The software that will let your site participate in the
survey is in comp.sources.d and news.admin.

			Brian Reid


OVERALL SUMMARY:
                             This            Estimated
                            Sample         for entire net
Sites:                      715                 9700
Fraction reporting:        7.37%                 100%
Users with accounts:      87246              1183000
Netreaders:               19605               265000

Average readers per site:                          27
Percent of users who are netreaders:            22.47%
Average traffic per day (megabytes):            3.921
Average traffic per day (messages):              1714
Traffic measurement interval:    last              21 days
Readership measurement interval: last              75 days
Sites used to measure propagation:                562


Valid data received from these sites:

16bits.dec.com 3comvax 3d.dec.com 4gl.nl 60600.dec.com abstl
abvax.abnet.com acer acheron acornrc actisb adelie agora akgua akguc
akofin.dec.com akov11.dec.com alberta alien.dec.com alliant
alpha.eecs.nwu.edu alv amdahl amdcad ames ames.arc.nasa.gov
andrew.cmu.edu ant.dec.com apollo aqua.dec.com aramis.rutgers.edu
arran.tcom.stc.co.uk array arthur.cs.purdue.edu ascvax asd.dec.com
astroatc atari ateng atom aurora auscso axis b11 babbage bacchus
bagels.dec.com banana.uq.oz bartle.dec.com basser bcd-dyn bcsfse bdmrrr
beach.cis.ufl.edu bellman beno beowulf.ucsd.edu bigboy bjs black bms-at
bnl bnr-rsc boole briar.philips.com brillig brillig.umd.edu brspyr1
bsu-cs btnix bu-it bucasb bucsb bucsd buengc bunker bute.tcom.stc.co.uk
c3engr c3pe cacilj cadomin caip.rutgers.edu calgary calyx canopus
cascade casee.dec.com castor.dec.com catfish catlabs caus-dp cavell
cbterra cca ccng ccnysci celica.dec.com cf-cm cfisun cg-atla cgcha
cgl.ucsf.edu cgofs.dec.com chalmers charlie chemabs
cheviot.newcastle.ac.uk child.dec.com chinet cimamt.dec.com cit-vax
cit-vlsi clinet clio clt.dec.com cmcl2 cocktrice cognex cogpsi coherent
concurrent.co.uk condor cookie.dec.com cooper coplex cornell cortex cos
cosmo cp1 cpro crash crete.cs.glasgow.ac.uk crin cs.glasgow.ac.uk
cs.hw.ac.uk cs.nott.ac.uk csadfa cseg csi csib csm9a csuchico csustan
culdev1 curie.dec.com curium.dec.com cutsys cutter cuuxb cxsea cybvax0
daisy dalcs dalcsug dale dalton dandelion darth dasys1 dayton dciem
dcl-cs dcrlg1 dcrtg1 ddsw1 ddtcg1 decuac decwet.dec.com decwrl.dec.com
desint devon dgis dhw68k diamond.bbn.com dinl dino discg1 dkstar
douglass.columbia.edu drexel dri1 dri2 drra dsacg1 dsacg2 dsacg3
dsachg1 dscatl dsinc dssdev.dec.com dukempd dvlmarv dycom earvax
ecrcvax ector.cs.purdue.edu edison elan elbereth.rutgers.edu elric
elrond elroy elsie ems encore eneevax.umd.edu
enterprise.mtl.u-tokyo.junet entropy.ms.washington.edu eos eplunix
ernie.berkeley.edu euclid euler.rutgers.edu eur3b2 euraiv1 exec expya
exunido fai falcon.mtl.u-tokyo.junet fas fedeva fermat.rutgers.edu
fgvaxz.dec.com flab fortune fuksi gamma.eecs.nwu.edu gang-of-four
garfield gargoyle.uchicago.edu gatech gauss.rutgers.edu gcm geac
genghis genrad gidday.dec.com gilroy glacier gold gondor.cs.psu.edu
gray grebyn grian gryphon gvgpsa gyre.umd.edu haddock.ima.isc.com
hadron hammer handel hardees.rutgers.edu harvisr
hawaii.cs.glasgow.ac.uk hawkmoon hcx1 hechcx helios helps hi.unm.edu
hillst.dec.com hodge hoptoad hoser.berkeley.edu hpstek.dec.com hqda-ai
hscfvax hsi hurratio husc4 husc7 hutch ibmpa.pasc.ibm.com ico icot32
icus idsssd iisat ileaf ima imagen imsys imt3b2 indian.dec.com infinet
intrin ipso.oz iraul1 isg100 islabs ivory izimbra.css.gov jasper jhereg
ji.berkeley.edu jimi.cs.unlv.edu jolnet jplgodo jrdv04.dec.com jumbo
kaos karhu ki4pv killer klinzhai.rutgers.edu kodak kolvi korppi kpd
krebs labrea lakesys lamc laura leadsv.leads.lmsc.com leah.albany.edu
lehi3b15 leia lifia lindy lithium liuida ll1 ll1a lll-tis lll-winken
loglule looking lowlif.dec.com lsuc lucy.dec.com lxn lzaz lznv lzsc
lzwi m2c maccs madnix mancol mandrill mas1 materna math.rutgers.edu
maxwell maynard mcdchg mcf mcgill-vision mck-csc meccsd meccts
medusa.cs.purdue.edu megatest memory.dec.com mendel metasoft metavax
methods mhres midevl.dec.com mildew.dec.com mimsy mimsy.umd.edu mind
minnow mit-eddie mitsumi mnetor mntgfx moogvax mordred.cs.purdue.edu
mosaic.dec.com moscom mrstve mss mstar mtgzy mtgzz mtunb mtund mtune
mtunf mtung mtunh mtuni mtunj mtunk mtuxo mtwain.dec.com mtx5a mtx5c
mtx5d mtx5e mtxinu munsell musky2 mvcad3.dec.com myrias nac.dec.com
natinst navajo navion.dec.com nbires ncoast ncr-sd ncrats ncrcae ncrcpx
ndmath ndsuvax necntc nesac2 netsys neumann newton.rutgers.edu
nexus.dec.com nez nicmad nitro noao novavax nttlab nuchat nucleus
nucsrl nud nusdhub nutmeg.dec.com obdient occrsh occrsh.att.com octopus
oddjob oddput odyssee ogg olgb1 olivea oliveb olivee olivej oliven
omepd ondine onecom onfcanim ontenv opus orca orchid orcisi
orion.arc.nasa.gov oscvax osiris oswego otter otto owlmnt palo-alto
panda parsely paul.rutgers.edu pbhya pbhyb pbhyc pbhyd pbhye pbhyf
pbhyg pcrat pdn pegasus percival peregrine pgg.dec.com phoenix
phoenix.princeton.edu phri piaget pixar plaid polecat popeye portnoy
poseidon potomac princeton psc psuvax1.cs.psu.edu psy.vu.nl ptsfa puck
pvab pwa-b pwcs pyr pyramid pyrdc qetzal qiclab qtc quest questar quick
radio rainbo.dec.com ramblr.dec.com ravine.dec.com raybed2 rayssd
rayssdb rayssde re.sics.se redwood reed remsit remus.rutgers.edu rencon
renoir.berkeley.edu retix rgb.dec.com rhesus.primate.wisc.edu rhi
richp1 rlgvax rmi rochester rocky rocky.oswego.edu roll.dec.com rolls
romp rosevax rti ruby rutgers.rutgers.edu sauron scenic.dec.com scgvaxd
scicom scoman.dec.com scribe.dec.com sdcrdcf sdcsmb sdcsvax sdn sdti
se-sd shamash shark shasta shell shell-gw shigeo.dec.com sialis sics
sics.se sigma sigmast siouxi.dec.com sis slxsys small-u sneaky
sniff.dec.com solaris soma sphinx.uchicago.edu splut sq sri-spam
ssdevo.dec.com sstat sstmv1.dec.com starfish stcns3 stevie.cs.unlv.edu
stl stratus stride suadb sugar sunybcs sw1e swan.ulowell.edu
swaps.dec.com swatsun swlabs t9103 tahoe.unr.edu tallis.dec.com tantra
td2cad teddy tekecs tekgvs teliut tellab5 temvax teqila.dec.com
terminus teti tfh.dec.com thebay.dec.com thelink titan.arc.nasa.gov
titan.arpa tkov58.dec.com tmsoft tnosoes toccata.rutgers.edu
took.dec.com topaz.rutgers.edu toroid.dec.com tove.umd.edu trancept
trillium troa01.dec.com tropix trwrb.dsd.trw.com tsc.dec.com tslanpar
ttidca tucos turtlevax tut.cis.ohio-state.edu tybalt.caltech.edu tymix
ubvax ucbarpa.berkeley.edu ucla-an ucqais ucsd.ucsd.edu udiego ufqtp
uhnix1 uhnix2 uiucuxa uiucuxc uiucuxf ukecc ultra.dec.com umd5
umix.cc.umich.edu unicom uokmax uop uqcspe.oz urth usceast
usmrw2.dec.com utacs uthelios utstat uunet uvabme uw-warp uxa uxe uxf
valkyr.dec.com van-bc vdelta video.dec.com videovax.tek.com
viking.dec.com violet.berkeley.edu viper vireo.dec.com virgil virginia
viusys voder.nsc.com voodoo vrdxhq vsi1 vu-vlsi w3vh wa3wbu watale
watcgl watdcsu watdragon water watmath watt watvlsi watyew wb3ffv well
westend.columbia.edu whyvax.dec.com wiley wilkie.dec.com wjvax
wookie.dec.com wp3b01 wright xanth xicom xray xworld.dec.com yendor
yetti yoda.dec.com yunexus zap zaphod zinn zorch zycad zyx

------------------------------------------------------------------------------
		EXPLANATION OF THE MEASUREMENTS AND STATISTICS

Survey data is taken by having one person at each site run a program called
"arbitron", which looks at the news or notes files and determines the
newsgroups that the user has read within a recent interval. To "read" a
newsgroup means to have been presented with the opportunity to look at at
least one message in it. Going through a newsgroup with the "n" key counts
as reading it. For a news site, "user X reads group Y" means that user X's
.newsrc file has marked at least one unexpired message in Y. If there is no
traffic in a newsgroup for the measurement period, then the survey will show
that nobody reads the group. For a notes site, "user X reads group Y" means
that user X has been in the notesfile with the sequencer in the last 14 days.
The "14 days" interval for notesfiles corresponds to "unexpired" for news.
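
To make the news-site rule above concrete, here is a minimal Python sketch.
It is not the arbitron code. It assumes the usual .newsrc line layout
("group: 1-120,125" for a subscribed group, "group! ..." for an unsubscribed
one), and it assumes the lowest unexpired article number for the group is
obtained elsewhere (for example from the news system's active file). The
names reads_group and lowest_unexpired are hypothetical, and skipping
unsubscribed lines is a choice made for the sketch only.

```python
def parse_ranges(spec):
    """Parse a .newsrc article list such as '1-120,125,130-140'."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges


def reads_group(newsrc_line, lowest_unexpired):
    """Return (group, has_read) for a subscribed line, or None otherwise.

    has_read is True when at least one marked article number is still
    unexpired, i.e. is >= lowest_unexpired (an assumed input).
    """
    name, sep, spec = newsrc_line.partition(":")
    if not sep:
        # Unsubscribed ('!') or malformed line: ignored in this sketch.
        return None
    has_read = any(high >= lowest_unexpired for _, high in parse_ranges(spec))
    return name.strip(), has_read
```

Under these assumptions, a user whose line reads "comp.lang.c: 1-100" does
not count as a reader once articles 1 through 100 have all expired.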

The "arbitron" program is periodically posted to comp.sources.d, or is
available from me (decwrl!reid). The notesfiles version of the program should
be available through standard notesfiles software distribution channels as
well.

SITES SURVEYED IN THIS SAMPLE

"This Sample" means the set of sites that have sent in an arbitron report
within the past "Readership measurement interval" days. In every case the
most recent report from each site is used. At the moment, some of the
readership reports are several months old. In future postings those reports
will have expired and will not be included.

One might argue that the sample is self-selected, and therefore biased. It
does in fact have a certain self-selection factor in it, because we only get
data from sites at which someone participates in the survey. However, we do
not require the participation of every user at a site, only one user. The
survey program returns data for every user on the system on which it was run.
Since there are an average of 27 people per site reading news, there is a
certain amount of randomness introduced that way. Of course, the sample is
biased in favor of large sites (they are more likely to have a user willing
to run the survey program) and software-development-oriented sites (more
likely to have a user *able* to run the survey program).

NETWORK SIZE

I determine the network size by looking at the set of sites that are
mentioned in the Path lines of news articles arriving at decwrl. This number
is consistently higher than the number of sites that posted a message (as
measured and posted from uunet) because it includes passive sites that are
on the paths between posting sites and decwrl. Each month I store the names
of the hosts that are named that month, and for this report I used the past
13 months worth of data.
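
The counting step described above can be sketched in Python as follows. This
is an illustration only, not the program actually run at decwrl. It assumes
that Path header values are "!"-separated and that the final component is
the poster's login name rather than a host. The function names are
hypothetical.

```python
def hosts_in_path(path_header):
    """Return the host names in a Path value such as 'decwrl!ucbvax!sun!alice'."""
    parts = [p for p in path_header.strip().split("!") if p]
    # Assumption: the last component is the poster's login, not a site.
    return parts[:-1]


def count_sites(path_headers):
    """Count the distinct hosts named in a collection of Path header values."""
    seen = set()
    for header in path_headers:
        seen.update(hosts_in_path(header))
    return len(seen)
```

Because every host on the path is added to the set, a site that only relays
news is counted even if nobody there ever posts.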

There are 9667 different sites in the Path lines of articles that
arrived at decwrl in the last 13 months. The comp.mail.maps data give a
different count, but comp.mail.maps includes every site that participates
in uucp; there is a considerable number of machines that exchange uucp
mail but do not get USENET. Of those 9667 sites, 77 (about 0.8%) are DEC
E-net hosts that are not part of uucp, and which therefore are not
included in the comp.mail.maps figure.

Despite these various difficulties, I believe that 9700 is the best
estimate for the size of USENET. Because it is actually a measurement of the
number of sites that have posted a message or that are on the path to a site
that has posted a message, it will be slightly smaller than the number of
sites that actually read netnews. Any site that believes it is not being
counted can just ensure that it posts at least one message a year, so that
it will be counted.


NUMBER OF USERS

The number of users at each site is determined in a site-specific fashion.
Sometimes it is done by counting the number of user accounts that have
shells and login directories. Sometimes it is done by counting the number of
people who have logged in to the machine in some interval. Sometimes other
techniques are used. This number is probably not very accurate--certainly
not more accurate than to within a factor of two.


ESTIMATED TOTAL NUMBER OF PEOPLE WHO READ THIS GROUP, WORLDWIDE

There are two sources of error in this number. The number is computed by
multiplying the number of people in the sample who actually read the group by
the ratio of estimated network size to sample size. The estimated total can
therefore be biased by errors in the network size estimate (see above) and
also by errors in the determination of whether or not someone reads a group.
Assuming that "reading a group" is roughly the same as "thumbing through a
magazine", in that you don't necessarily have to read anything, but you have
to browse through it and see what is there, then the measurement error will
come primarily from inability to locate .newsrc files, which can either be
protected or moved out of root directories. There is no way of measuring the
effect on the measurements from unlocated .newsrc files, but it is not likely
to be more than a few percent of the total news readers.
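
The scaling described in this section amounts to one multiplication. A
minimal Python sketch, with hypothetical function and parameter names:

```python
def estimate_worldwide(sample_readers, sample_sites, network_sites):
    """Scale a reader count from the sample up to the whole network.

    Multiplies the readers seen in the sample by the ratio of estimated
    network size to sample size.
    """
    if sample_sites <= 0:
        raise ValueError("sample_sites must be positive")
    return sample_readers * network_sites / sample_sites
```

With the figures from the summary table, estimate_worldwide(19605, 715, 9700)
comes to roughly 266,000, close to the 265000 netreaders reported there. The
report does not say how its figures are rounded, so an exact match should
not be expected.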

PROPAGATION: HOW MANY SITES RECEIVE THIS GROUP AT ALL

This number is the percent of the sites that are even receiving this
newsgroup. The information necessary to compute propagation was not generated
by early versions of the arbitron program, so the "basis" (number of sites)
used to generate the Propagation figure is smaller than the "Sites in this
sample" figure. A site's data will be used to compute propagation if either
(a) it reports zero readers for at least one group, or (b) it is using an
arbitron with an explicit version number that is high enough. 
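
One way to picture the basis rule is the Python sketch below. It is a guess
at the bookkeeping, not the actual processing program. It assumes each site
report is a dict with a "version" entry (None for old arbitrons) and a
"readers" dict mapping group name to reader count, and it assumes that a
site "receives" a group when the group appears in its report at all.
MIN_VERSION is a made-up threshold.

```python
MIN_VERSION = 2  # hypothetical: lowest arbitron version that reports all groups


def in_basis(report):
    """Apply rules (a) and (b): may this site be used to compute propagation?"""
    reports_zero = any(count == 0 for count in report["readers"].values())
    version = report.get("version")
    new_enough = version is not None and version >= MIN_VERSION
    return reports_zero or new_enough


def propagation(reports, group):
    """Percent of basis sites that list `group`, or None if the basis is empty."""
    basis = [r for r in reports if in_basis(r)]
    if not basis:
        return None
    receiving = sum(1 for r in basis if group in r["readers"])
    return 100.0 * receiving / len(basis)
```

Rule (a) works because a site that lists a group with zero readers is
evidently reporting every group it carries, not just the ones being read.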

MESSAGES PER MONTH AND KILOBYTES PER MONTH

Traffic is measured at decwrl, in Palo Alto, California. Any message that has
arrived at decwrl within the last "Traffic measurement interval" days is
counted, regardless of when it was posted. Monthly rates are computed by
taking the total traffic, dividing by the number of days in the traffic
measurement interval, and multiplying by 30. Decwrl runs 2.10.3 news, which
does not store the "Date-Received", "Relay-version" or "Posting-version"
header lines; the amount of space occupied at your site might be higher, and
the number of bytes transmitted between machines is probably higher. By
definition this number is correct, because it is an exact measurement, but it
may differ from the traffic at your site by as much as 15% due to timing
differences and news version differences. Timing differences will be random,
but will average out in the long run. News version differences will cause a
systematic error that is additively uniform across all newsgroups, and which
therefore does not significantly affect ratios.

If a message is crossposted to several groups simultaneously, it is charged
only to the first-named group in the list. Note that this differs from the
statistics posted from uunet every 2 weeks: the uunet data charge a message
equally to every group that it is crossposted to.
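
The crossposting rule and the monthly scaling can be sketched together in
Python. This is only an illustration of the arithmetic described above. It
assumes each article is given as a pair of its Newsgroups header value and
its size in bytes, and it takes a kilobyte to be 1000 bytes. The function
name is hypothetical.

```python
from collections import defaultdict


def monthly_traffic(articles, interval_days):
    """Return {group: (messages_per_month, kbytes_per_month)}.

    articles: iterable of (newsgroups_header, size_in_bytes) pairs seen
    during the last interval_days days.
    """
    messages = defaultdict(int)
    kbytes = defaultdict(float)
    for newsgroups, size in articles:
        # A crossposted article is charged only to the first-named group.
        first = newsgroups.split(",")[0].strip()
        messages[first] += 1
        kbytes[first] += size / 1000
    scale = 30 / interval_days  # convert interval totals to a 30-day month
    return {g: (messages[g] * scale, kbytes[g] * scale) for g in messages}
```

An article posted to "comp.lang.c,comp.unix.wizards" therefore adds to
comp.lang.c only. Under the uunet convention it would be added to both.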


PARTICIPATION RATIO: MESSAGES per MONTH per 1000 READERS

This number is exactly what it says: the number of messages per month in
that newsgroup, divided by the number of readers expressed in thousands. It
is an indication of how involved the readers of the group are in the
traffic, of whether they are mostly listeners or mostly talkers. Its
accuracy is limited by the accuracy of its two components. The messages per
month figure is exact; the reader count is only as accurate as the network
size estimate, which is in the worst case accurate to 40%. Therefore you
should treat this number as having an error margin of plus or minus 40%.
However, ratios between participation
ratios for different newsgroups are quite accurate, since the network-size
component divides out.
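
As a small worked illustration in Python (the function name is
hypothetical):

```python
def participation_ratio(messages_per_month, estimated_readers):
    """Messages per month per 1000 readers, or None when there are no readers."""
    if estimated_readers <= 0:
        return None
    return messages_per_month / (estimated_readers / 1000)
```

A group with 500 messages per month and an estimated 25000 readers has a
participation ratio of 20. If the network size estimate were off by some
factor, every group's reader estimate would be off by the same factor, which
is why comparisons between groups are unaffected.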

COST RATIO: DOLLARS PER MONTH PER READER

The most controversial field in the survey report is the "$US per month per
reader". It is the estimated number of dollars that are being spent on
behalf of each reader, worldwide, on telephone costs to transmit this
newsgroup. The cost ratio does not include the cost of disk storage to store
the news or of computer time to process it; both of those are assumed to be
free.

The cost ratio is computed as follows:

$US/month/reader = ($USPerMonthPerSite * numberOfSites) / numberOfReaders
$USPerMonthPerSite = KBytesTrafficPerMonth * $USPerKByte * PropagationFactor
$USPerKByte = ($USPerMinute / KBytesPerMinute) * (1 - CompressionFactor)
$USPerMinute = 0.10       [ten cents per minute avg phone cost]
KBytesPerMinute = 60 * BytesPerSecond / 1000
BytesPerSecond = 100      [average transfer rate over 1200-baud line]
CompressionFactor = 0.4   [40% compression is typical for netnews]

Combining all these gives

$USPerMonthPerSite =
    KBytesTrafficPerMonth * (0.10 / 6) * (1 - 0.4)
  = KBytesTrafficPerMonth / 100

Therefore:

$US/month/reader =
    (KBytesTrafficPerMonth * numberOfSites) / (100 * numberOfReaders)

The accuracy of this number is in fact better than the accuracy of the
participation ratio, because the source of error--the network size
estimate--is present both in the numerator and the denominator, and therefore
cancels out. The primary source of bias in this number comes from the bias in
the "estimated number of readers, worldwide", which is described above. Treat
this value as being accurate to within about 25%.
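
The cost computation can be written out as a short Python sketch using the
constants given above. The function names are hypothetical. The propagation
parameter defaults to 1.0 because the combined formula above does not carry
a propagation term. That default is an assumption of this sketch, not
something the report states.

```python
USD_PER_MINUTE = 0.10      # ten cents per minute average phone cost
BYTES_PER_SECOND = 100     # average transfer rate over a 1200-baud line
COMPRESSION_FACTOR = 0.4   # 40% compression is typical for netnews


def usd_per_kbyte():
    """Phone cost of moving one kilobyte of (compressed) news."""
    kbytes_per_minute = 60 * BYTES_PER_SECOND / 1000
    return (USD_PER_MINUTE / kbytes_per_minute) * (1 - COMPRESSION_FACTOR)


def usd_per_month_per_reader(kbytes_per_month, number_of_sites,
                             number_of_readers, propagation=1.0):
    """Estimated dollars per month spent on behalf of each reader."""
    per_site = kbytes_per_month * usd_per_kbyte() * propagation
    return per_site * number_of_sites / number_of_readers
```

usd_per_kbyte() works out to one cent per kilobyte, which is where the
division by 100 in the combined formula comes from.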


SITE PARTICIPATION

I would like to receive data from every site on USENET. The arbitron programs
(posted to comp.sources.d along with this report) work on news 2.9, 2.10.[1-3],
2.11, and on many versions of notesfiles.


Brian Reid
DEC Western Research Laboratory, Palo Alto CA
reid@decwrl.DEC.COM
{ihnp4,allegra,decvax,ucbvax,sun,pyramid,cbosgd}!decwrl!reid

reid@decwrl.dec.com (Brian Reid) (04/06/88)

I think that Watunga is trying to punish me for not making any April Fool's
postings. Yet another obscure (and relatively harmless) bug has shown up in
the monthly readership postings, which caused the listing for comp.sys.amiga
to be erroneously merged with the listing for comp.sys.amiga.tech.

Here's what happened. In order to cope with various brain-damaged software
that some people insist on using, such as 2.9 news or old notesfiles,
my arbitron processing program has some heuristics in it to compensate for
newsgroup names that are truncated to 14 characters. For example, if it sees
a report saying that some site has 11 readers of a group named
comp.sys.zenit or soc.culture.ja, it consults the table of real group
names and fills in "comp.sys.zenith" or "soc.culture.japan".

Specifically, what happens is this:

   if reported-newsgroup-name has exactly 14 characters
      and those 14 characters exactly match the first 14 characters of
         a legal newsgroup name
   then
      substitute the longer name for the shorter name in the report.

The problem is that "comp.sys.amiga" is exactly 14 characters, and
"comp.sys.amiga.tech" is a group name whose first 14 characters exactly match.
So everyplace that I got a report for "comp.sys.amiga" the program
automatically substituted "comp.sys.amiga.tech".

I have changed the logic as follows:

   if reported-newsgroup-name is not a known newsgroup
      and reported-newsgroup-name has exactly 14 characters
      and those 14 characters exactly match the first 14 characters of
         a legal newsgroup name
      and the remaining characters (15 through n) of that longer legal
         newsgroup name do not contain a '.' (period)
   then
      substitute the longer name for the shorter name in the report
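
For readers who prefer code, the revised rule might look like this in
Python. It is a sketch of the logic as described, not the actual arbitron
processing program. The function name is hypothetical, and taking the first
candidate in sorted order when several long names match is a choice made
here, since the description does not cover that case.

```python
TRUNCATED_LENGTH = 14


def expand_truncated(reported, known_groups):
    """Map a possibly truncated newsgroup name to its full legal name."""
    if reported in known_groups:
        return reported  # already a legal name, e.g. comp.sys.amiga
    if len(reported) != TRUNCATED_LENGTH:
        return reported
    for full in sorted(known_groups):
        # The tail (characters 15 through n) must not contain a period.
        if full.startswith(reported) and "." not in full[TRUNCATED_LENGTH:]:
            return full
    return reported
```

With comp.sys.amiga, comp.sys.amiga.tech and comp.sys.zenith all in the
table, "comp.sys.amiga" is left alone while "comp.sys.zenit" still becomes
"comp.sys.zenith".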

It's not worth re-running and re-posting for this month, but this should all
work next month.

Brian