[news.groups] USENET READERSHIP SUMMARY REPORT FOR JUN 88

reid@decwrl.dec.com (Brian Reid) (07/02/88)

USENET READERSHIP SUMMARY REPORT for Jun 88
New features this month:

  * "Crossposting percentage" computed for each group.
  * New cost figures for per-reader cost, based on Trailblazer connection
    to uunet's 800 number ($0.0025/kilobyte).
----------------------------------------------------------------------------

This is the first article in a monthly posting series from the Network
Measurement Project at the DEC Western Research Laboratory in Palo Alto,
California. 

This survey is based on a sample of data taken from various USENET sites.
At the end of this message there is a short explanation of the measurement
techniques and the meaning of the various statistics. The messages that
follow this one show survey data sorted by various criteria.

The newsgroup volume and article counts that I post are often significantly
different from the ones posted by Rick Adams, because he includes the size of
a crossposted article in every group to which it is posted, whereas I charge
that size only to the first-named group. 

The complete set of readership data (of which this is a summary) is posted
in news.lists. The software that will let your site participate in the
survey is in comp.sources.d and news.admin.

			Brian Reid


OVERALL SUMMARY:
                             This            Estimated
                            Sample         for entire net
Sites:                      767                 8811
Fraction reporting:        8.71%                 100%
Users with accounts:      96773              1111000
Netreaders:               19509               224000

Average readers per site:                          25
Percent of users who are netreaders:            20.16%
Average traffic per day (megabytes):            3.867
Average traffic per day (messages):              1654
Traffic measurement interval:    last              28 days
Readership measurement interval: last              75 days
Sites used to measure propagation:                617


Valid data received from these sites:

16bits.dec.com 3comvax 4gl.nl abstl abvax.abnet.com abvax.icd.abnet.com
abyss.dec.com acer acetes acheron acornrc actisb adelie agora akgua
akguc alberta alembic aleytys alv alva amc-vlsi amdahl amdcad
ames.arc.nasa.gov andrew.cmu.edu ant.dec.com apollo aramis.rutgers.edu
ariel arnor arran.tcom.stc.co.uk array arrow.garage.nj.att.com
arthur.cs.purdue.edu ascvax ashok.dec.com aspen2.dec.com astroatc atari
ateng atlact.dec.com atom atssc aurora axis b11 babbage bacchus banzai
bartle.dec.com basser bcd-dyn bcsfse beach.cis.ufl.edu
bearcat.garage.nj.att.com belfast bellman beno
bentley.garage.nj.att.com bigboy bjs black bms-at bnlux0 bnr-rsc
briar.philips.com brillig.umd.edu brspyr1 brushtail bsu-cs btnix bu-it
bu-tyng bucasb bucsb bucsd buengc bunker bute.tcom.stc.co.uk c3engr
c3pe cacilj cad cad.dec.com cadomin cadse.dec.com caelum
caip.rutgers.edu calgary carola cascade catfish catlabs caus-dp cavell
cbterra cca ccng ccnysci ceetm1 certes cfisun cg-atla cgl.ucsf.edu
chalmers charlie chemabs chgv04.dec.com child.dec.com chinet chip
cimamt.dec.com cimnet.dec.com circe cit-vax cit-vlsi claris clinet clio
clt.dec.com cmcl2 coherent comdesign concurrent.co.uk condor
cookie.dec.com coplex cord.garage.nj.att.com cortex corum cos cosmo cp1
cpro cpsc6b crash creamy crete.cs.glasgow.ac.uk crick crin cs.hw.ac.uk
cs.nott.ac.uk csadfa cseg csi csib csinn csm9a csuf3b csustan culdev1
curium.dec.com cuuxb cxsea cybvax0 daisy daitc dalcs dalcsug dale
dandelion darth dasys1 dataspan dayton dcl-cs dcrlg1 ddsw1 ddtcg1
decsim.dec.com decuac decvax.dec.com decwet.dec.com decwin.dec.com
decwrl.dec.com denning desint devcim.dec.com devlab.dec.com devon dgis
dhw68k diamond.bbn.com dino douglass.columbia.edu
douglass.cs.columbia.edu dpdmai.dec.com drexel dri1 dri2 dsacg1 dsacg2
dsacg3 dsachg1 dscatl dsinc dssdev.dec.com dukempd dvlmarv dycom earvax
eastend.columbia.edu ecrcvax edison edrsys edsel.garage.nj.att.com
egvideo elan elbereth.rutgers.edu elroy elsie ems encore
eneevax.umd.edu enterprise.mtl.u-tokyo.junet entire
entropy.ms.washington.edu eos eplrx7 eplunix eridan erlang.dec.com
ernie.berkeley.edu eros euclid euclid.dec.com euler.rutgers.edu eur3b2
euraiv1 exec expya fai falcon.mtl.u-tokyo.junet fdcv06.dec.com fedeva
fermat.rutgers.edu flab fnatte fortune fuksi gaboon gang-of-four
garfield gargoyle.uchicago.edu gatech gauss.rutgers.edu gcm geac genrad
giamem.dec.com gilroy glacier goby.dec.com gold gondor.cs.psu.edu
granjon.garage.nj.att.com gray grebyn grian gt5000 gvgpsa gyre.umd.edu
haddock.ima.isc.com hadron hammer handel hardees.rutgers.edu hardy
harvisr hawaii.cs.glasgow.ac.uk hawkmoon hcx1 hechcx helios herman
hi-csc hi.unm.edu hodge home hpscad.dec.com hqda-ai hscfvax hsi
hurratio husc4 husc7 hutch ibmpa.pasc.ibm.com ico icot32 idsssd iisat
iitmax ileaf ima ima.isc.com imagen imsys imt3b2 inco indian.dec.com
infinet intrin io ipso.oz iscuva iscuvb iscuvc iscuvd iscuve iscuvf
isg100 islabs istg.dec.com itivax itrpe izimbra.css.gov jack jackson
jaimes.dec.com jarvis.csri jasper jclyde jetson jfcl.dec.com jhereg
ji.berkeley.edu jimi.cs.unlv.edu jon.dec.com jonlab jrdvax.dec.com
jumbo juniper kaoa01.dec.com kaos karhu ki4pv klinzhai.rutgers.edu
kodak kolvi korppi kpd krebs kylie labrea lakesys lamc leah.albany.edu
lehi3b15 leia leman.dec.com lifia lily lindy lithium liuida ll1 ll1a
lll-tis lll-winken logico loglule lownlab lsuc lucy.dec.com lxn lzaz
lzfmd lzfme lznv lzsc m2c maccs madnix maestro mandrill mas1 materna
math.rutgers.edu maynard mcdchg mcf mcgill-vision mck-csc mcmi meccsd
meccts medusa.cs.purdue.edu megatest metasoft metavax mhres
mildew.dec.com mimsy.umd.edu mind minnow mipos2 mipos2.intel.com mipos3
mipos3.intel.com mips miranda mit-eddie mmd1 mnetor mntgfx moogvax
mordred.cs.purdue.edu mosart moscom mss mstar mtfmc mtfmi mtgzd mtgzf
mtgzg mtgzi mtgzk mtgzm mtgzn mtgzp mtgzq mtgzt mtgzy mtgzz mtunb mtund
mtune mtunf mtung mtunh mtuni mtunj mtunk mtunn mtuxj mtuxo
mtwain.dec.com mtx5a mtx5c mtx5e mtxinu munsell musky2 mvcad3.dec.com
myrias n8ino nac.dec.com nacad.dec.com nagano natinst navajo nbifet
ncoast ncr-sd ncrats ncrcae ncrcpx ndmath ndsuvax nesac2 netsys
newton.rutgers.edu nexus.dec.com nicmad nitro njin.rutgers.edu noao
novavax nttlab nttta nuchat nucleus nud nwnexus occrsh octopus oddjob
oddput odyssee ohsu-hcx olgb1 olivea oliveb olivee olivej oliven omepd
ondine onfcanim ontenv orca orchid orcisi orion.arc.nasa.gov
orion.cf.uci.edu oscvax osiris osiris.sics.se oswego oswego.oswego.edu
otishq otter otto owlmnt oxtrap packard.garage.nj.att.com palo-alto
panda park.columbia.edu parsely pbhya pbhyb pbhyc pbhyd pbhye pbhyf
pbhyg pcrat pdn pedev pegasus peregrine pernod.dec.com phoenix
phoenix.princeton.edu phri pierce.garage.nj.att.com pixar plaid polya
polyof.poly.edu polyslo popeye portia poseidon possum potomac potoroo
prcpto princeton prodix psc psuhcx psuvax1.cs.psu.edu psy.vu.nl ptsfa
puck pvab pwa-b pwcs pyr pyr.gatech.edu pyramid pyrdc qiclab qtc
quark.dec.com quick.quick.com quokka radio rainbo.dec.com
rancho.dec.com rangly.dec.com rayssd rayssdb rayssde rdvax.dec.com
re.sics.se redwood reed rel remus.rutgers.edu rencon
renoir.berkeley.edu retix rgb.dec.com rhesus.primate.wisc.edu rhi
riverside.columbia.edu rlgvax rmi rocky.oswego.edu rolls rosevax
rsts32.dec.com rti ruby rutgers.rutgers.edu sales.dec.com sandoz saturn
sauron scenic.dec.com scgvaxd scicom scobee scoman.dec.com sdchem
sdcrdcf sdcsmb sdcsvax sdn sdti se-sd shamash shark shasta shell
shire.dec.com sialis sics.se sigma sigmast sirius sirius.ua.oz sis
skylrk.dec.com smdvx1 smdvx1.intel.com sneaky sniff.dec.com softway
solaris soma sphinx.uchicago.edu splut sri-spam ssdevo.dec.com starfish
stb stcns3 stereo.dec.com stevie.cs.unlv.edu stratus stride studsys
suadb sugar sunybcs sw1e swan.ulowell.edu swatsun swlabs t9103
tahoe.unr.edu tank td2cad teddy tekbspa tekecs tekgvs teliut tellab5
teti tfh.dec.com thor thundr.dec.com titan.arc.nasa.gov tle.dec.com
tmsoft tnosoes toccata.rutgers.edu topaz.rutgers.edu tove.dec.com
tove.umd.edu tropix trwrb.dsd.trw.com tsc.dec.com ttidca tucos
tunatx.dec.com tybalt.caltech.edu tymix ubvax ubvax.ub.com
ucbarpa.berkeley.edu ucla-an ucqais ucsd.ucsd.edu ufqtp uhnix1 uhnix2
uisc1 uiucuxa uiucuxc ukecc ukma ultra.dec.com umb umd5
umix.cc.umich.edu unicom uokmax uqcspe.oz usceast usmrw2.dec.com utacs
utstat uunet uw-june uxe uxf valkyr.dec.com vanuata.cs.glasgow.ac.uk
vector video.dec.com videovax.tek.com violet.berkeley.edu viper virgil
virginia viusys voder.nsc.com voodoo vrdxhq vsi1 vu-vlsi vulcan w3vh
wa3wbu watale watcgl watdaffy watdcsu watdewey watdonald watdragon
water wathuey watlouie watmath watscrooge watson watt watvlsi watyew
wb3ffv well widgit wiley winery.dec.com wjvax wombat wookie.dec.com
wp3b01 wright wundt wuphys xanth xicom xray yarra yendor yetti
yoda.dec.com york.columbia.edu yunccn yunexus zap zaphod ziebmef zinn
zorch zycad

------------------------------------------------------------------------------
		EXPLANATION OF THE MEASUREMENTS AND STATISTICS

Survey data is taken by having one person at each site run a program called
"arbitron", which looks at the news or notes files and determines the
newsgroups that the user has read within a recent interval. To "read" a
newsgroup means to have been presented with the opportunity to look at at
least one message in it. Going through a newsgroup with the "n" key counts
as reading it. For a news site, "user X reads group Y" means that user X's
.newsrc file has marked at least one unexpired message in Y. If there is no
traffic in a newsgroup for the measurement period, then the survey will show
that nobody reads the group. For a notes site, "user X reads group Y" means
that user X has been in the notesfile with the sequencer in the last 14 days.
The "14 days" interval for notesfiles corresponds to "unexpired" for news.
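
A minimal sketch of the news-site test described above. This is not the
arbitron code. It assumes the usual .newsrc layout (group name, then ":" or
"!", then comma-separated article-number ranges), and the names
parse_newsrc_line, reads_group and oldest_unexpired are hypothetical. The
":"/"!" subscription mark is ignored here, since the text does not say how
arbitron treats it.

```python
import re

# Assumed .newsrc line layout: "<group>: 1-2045,2050" or "<group>! 1-17"
_NEWSRC_LINE = re.compile(r"^([^:!\s]+)([:!])\s*(.*)$")


def parse_newsrc_line(line):
    """Return (group, [(low, high), ...]) for one line, or None if unparsable."""
    match = _NEWSRC_LINE.match(line.strip())
    if match is None:
        return None
    group, _mark, rest = match.groups()
    ranges = []
    for part in rest.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        # A single number such as "2050" is treated as the range 2050-2050.
        ranges.append((int(low), int(high or low)))
    return group, ranges


def reads_group(ranges, oldest_unexpired):
    """True if at least one marked article number is still unexpired.

    oldest_unexpired is a hypothetical input: the lowest article number
    still present in the group's spool directory at this site.
    """
    return any(high >= oldest_unexpired for _low, high in ranges)


group, ranges = parse_newsrc_line("comp.lang.c: 1-2045,2050")
is_reader = reads_group(ranges, oldest_unexpired=2040)
```

With these inputs the user counts as a reader of comp.lang.c, because
articles 2040 and later are both marked and unexpired. A user whose marks
stop below the oldest unexpired article would not be counted.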

The "arbitron" program is periodically posted to comp.sources.d, or is
available from me (decwrl!reid). The notesfiles version of the program should
be available through standard notesfiles software distribution channels as
well.

SITES SURVEYED IN THIS SAMPLE

"This Sample" means the set of sites that have sent in an arbitron report
within the past "Readership measurement interval" days. In every case the
most recent report from each site is used. At the moment, some of the
readership reports are several months old. In future postings those reports
will have expired and will not be included.

One might argue that the sample is self-selected, and thereby biased. It
does in fact have a certain self-selection factor in it, because we only get
data from sites at which someone participates in the survey. However, we do
not require the participation of every user at a site, only one user. The
survey program returns data for every user on the system on which it was run.
Since there are an average of 25 people per site reading news, there is a
certain amount of randomness introduced that way. Of course, the sample is
biased in favor of large sites (they are more likely to have a user willing
to run the survey program) and software-development-oriented sites (more
likely to have a user *able* to run the survey program).

NETWORK SIZE

I determine the network size by looking at the set of sites that are
mentioned in the Path lines of news articles arriving at decwrl. This number
is consistently higher than the number of sites that posted a message (as
measured and posted from uunet) because it includes passive sites that are
on the paths between posting sites and decwrl. Each month I store the names
of the hosts that are named that month, and for this report I used the past
13 months' worth of data.
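
The counting step can be sketched as follows. This is an illustration, not
the program actually run at decwrl. It assumes each Path value is a
"!"-separated list whose last component is the poster's user name rather
than a site; the function names are hypothetical.

```python
def sites_in_path(path_value):
    """Return the set of site names mentioned in one Path header value."""
    hops = [hop for hop in path_value.strip().split("!") if hop]
    # Assumption: the final component is the sender's login name, not a site.
    return set(hops[:-1])


def count_sites(path_values):
    """Count the distinct sites seen across many Path header values."""
    seen = set()
    for value in path_values:
        seen |= sites_in_path(value)
    return len(seen)


example_paths = [
    "decwrl!sun!ucbvax!reid",
    "decwrl!pyramid!ucbvax!joe",
]
distinct_sites = count_sites(example_paths)
```

Here ucbvax and decwrl are each counted once, and a relay site such as sun
or pyramid is counted even if nobody there ever posts, which is why this
figure runs higher than a count of posting sites.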

There are 8729 different sites in the Path lines of articles that arrived at
decwrl in the last 13 months. There are 4797 different sites in the
comp.mail.maps data, but comp.mail.maps includes every site that participates
in uucp; there is a considerable number of machines that exchange uucp mail
but do not get USENET. Of those 8729 sites, 87 (about 1%) are DEC E-net hosts
that are not part of uucp, and which therefore are not included in the 4797
figure.

Despite these various difficulties, I believe that 8811 is the best
estimate for the size of USENET. Because it is actually a measurement of the
number of sites that have posted a message or that are on the path to a site
that has posted a message, it will be slightly smaller than the number of
sites that actually read netnews. Any site that believes it is not being
counted can just ensure that it posts at least one message a year, so that
it will be counted.


NUMBER OF USERS

The number of users at each site is determined in a site-specific fashion.
Sometimes it is done by counting the number of user accounts that have
shells and login directories. Sometimes it is done by counting the number of
people who have logged in to the machine in some interval. Sometimes other
techniques are used. This number is probably not very accurate--certainly
not more accurate than to within a factor of two.


ESTIMATED TOTAL NUMBER OF PEOPLE WHO READ THIS GROUP, WORLDWIDE

There are two sources of error in this number. The number is computed by
multiplying the number of people in the sample who actually read the group by
the ratio of estimated network size to sample size. The estimated total can
therefore be biased by errors in the network size estimate (see above) and
also by errors in the determination of whether or not someone reads a group.
Assuming that "reading a group" is roughly the same as "thumbing through a
magazine", in that you don't necessarily have to read anything, but you have
to browse through it and see what is there, then the measurement error will
come primarily from inability to locate .newsrc files, which can either be
protected or moved out of root directories. There is no way of measuring the
effect on the measurements from unlocated .newsrc files, but it is not likely
to be more than a few percent of the total news readers.
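
As a worked check of the scaling step, here is a small sketch using the
figures from the OVERALL SUMMARY (767 sites in the sample, 8811 estimated
sites). The function name is hypothetical, and the survey's own rounding
rule is not stated, so the results are only compared loosely with the
published totals.

```python
def estimate_worldwide(sample_count, sample_sites, estimated_sites):
    """Scale a count seen in the sample up to the estimated whole network."""
    return sample_count * estimated_sites / sample_sites


# Figures taken from the OVERALL SUMMARY table.
netreaders = estimate_worldwide(19509, sample_sites=767, estimated_sites=8811)
users = estimate_worldwide(96773, sample_sites=767, estimated_sites=8811)
```

These come out near the 224000 netreaders and 1111000 users shown in the
summary. The same multiplication is applied to each group's sample reader
count, so any error in the 8811 figure carries straight through to every
estimated total.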

PROPAGATION: HOW MANY SITES RECEIVE THIS GROUP AT ALL

This number is the percent of the sites that are even receiving this
newsgroup. The information necessary to compute propagation was not generated
by early versions of the arbitron program, so the "basis" (number of sites)
used to generate the Propagation figure is smaller than the "Sites in this
sample" figure. A site's data will be used to compute propagation if either
(a) it reports zero readers for at least one group, or (b) it is using an
arbitron with an explicit version number that is high enough. 


MESSAGES PER MONTH AND KILOBYTES PER MONTH

Traffic is measured at decwrl, in Palo Alto, California. Any message that has
arrived at decwrl within the last "Traffic measurement interval" days is
counted, regardless of when it was posted. Monthly rates are computed by
taking the total traffic, dividing by the number of days in the traffic
measurement interval, and multiplying by 30. Decwrl runs 2.10.3 news, which
does not store the "Date-Received", "Relay-version" or "Posting-version"
header lines; the amount of space occupied at your site might be higher, and
the number of bytes transmitted between machines is probably higher. By
definition this number is correct, because it is an exact measurement, but it
may differ from the traffic at your site by as much as 15% due to timing
differences and news version differences. Timing differences will be random,
but will average out in the long run. News version differences will cause a
systematic error that is additively uniform across all newsgroups, and which
therefore does not significantly affect ratios.
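
The monthly-rate arithmetic is simple enough to show directly. A small
sketch, with a hypothetical function name, using the 28-day interval and
the 3.867 megabytes-per-day average from the summary (the 28-day total is
derived from those two figures for illustration, not reported separately):

```python
def monthly_rate(total, interval_days, days_per_month=30):
    """Normalize traffic measured over interval_days to a 30-day month."""
    return total / interval_days * days_per_month


# 3.867 MB/day over a 28-day interval implies about 108.276 MB in total.
total_megabytes = 3.867 * 28
megabytes_per_month = monthly_rate(total_megabytes, interval_days=28)
```

This gives roughly 116 megabytes per month for the whole net as seen at
decwrl.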

If a message is crossposted to several groups simultaneously, it is charged
only to the first-named group in the list. Note that this differs from the
statistics posted from uunet every 2 weeks: the uunet data charge a message
equally to every group that it is crossposted to.
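
The difference between the two accounting rules can be sketched like this.
The input format (a Newsgroups header value paired with a size in bytes) and
both function names are assumptions made for illustration; neither function
is the code used at decwrl or at uunet.

```python
from collections import defaultdict


def split_groups(newsgroups_value):
    """Split a comma-separated Newsgroups header value into group names."""
    return [g.strip() for g in newsgroups_value.split(",") if g.strip()]


def charge_first_group(articles):
    """Charge each article's size only to its first-named group."""
    totals = defaultdict(int)
    for newsgroups_value, size in articles:
        totals[split_groups(newsgroups_value)[0]] += size
    return dict(totals)


def charge_every_group(articles):
    """Charge each article's full size to every group it is posted to."""
    totals = defaultdict(int)
    for newsgroups_value, size in articles:
        for group in split_groups(newsgroups_value):
            totals[group] += size
    return dict(totals)


articles = [("comp.unix,comp.misc", 1000), ("comp.misc", 500)]
first_only = charge_first_group(articles)
every_group = charge_every_group(articles)
```

Under the first rule the per-group totals add up to the real traffic (1500
bytes here); under the second, comp.misc is charged 1500 bytes and the
totals add up to 2500.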


CROSSPOSTING PERCENTAGE: WHAT FRACTION OF THE ARTICLES ARE CROSSPOSTED

"Crossposting" means to post the same article simultaneously in more than one
newsgroup. In genuine "news" systems crossposting is implemented with Unix
links and does not increase the storage or transmission cost, though in some
other systems crossposted articles are unbundled and must be stored and
transmitted separately.

The "crossposting percentage" is the percentage of the articles in this group
that are crossposted to at least one other group. If every article in this
group is crossposted, the percentage will be 100%; if none is crossposted,
then the percentage will be 0%. The crossposting percentage figure does not
take the size of the article into account, only the number of articles.
Crossposting a 50,000-byte article and crossposting a 50-byte article each
add the same amount to the tally.
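
A minimal sketch of that tally, assuming the input is the Newsgroups header
value of every article that appears in the group. The function name is
hypothetical.

```python
def crossposting_percentage(newsgroups_values):
    """Percentage of articles whose Newsgroups line names more than one group."""
    values = list(newsgroups_values)
    if not values:
        return 0.0
    crossposted = 0
    for value in values:
        groups = [g for g in value.split(",") if g.strip()]
        if len(groups) > 1:
            crossposted += 1
    # Article size plays no part: each article counts exactly once.
    return 100.0 * crossposted / len(values)


sample = ["rec.arts.sf,rec.arts.books", "rec.arts.sf", "rec.arts.sf",
          "rec.arts.sf,misc.misc"]
percent = crossposting_percentage(sample)
```

Two of the four example articles are crossposted, so the figure is 50%.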


COST RATIO: DOLLARS PER MONTH PER READER

The most controversial field in the survey report is the "$US per month per
reader". It is the estimated number of dollars that are being spent on behalf
of each reader, worldwide, on telephone and computer costs to transmit this
newsgroup. The rate of $.0025 per kilobyte is the same value used in the
UUNET statistics reported biweekly. It is based on discussions among system
administrators about the true cost of news transmission.

The cost ratio is computed as follows:

$US/month/reader = ($USPerMonthPerSite * numberOfSites) / numberOfReaders
$USPerMonthPerSite = KBytesTrafficPerMonth * $USPerKByte * PropagationFactor
$USPerKByte = 0.0025

Combining all these gives

$USPerMonthPerSite =
    KBytesTrafficPerMonth * 0.0025 * PropagationFactor
  = (KBytesTrafficPerMonth * PropagationFactor) / 400

Therefore:

$US/month/reader =
    (KBytesTrafficPerMonth * PropagationFactor * numberOfSites)
        / (400 * numberOfReaders)

The accuracy of this number is in fact better than the accuracy of the
participation ratio, because the source of error--the network size
estimate--is present both in the numerator and the denominator, and therefore
cancels out. The primary source of bias in this number comes from the bias in
the "estimated number of readers, worldwide", which is described above. Treat
this value as being accurate to within about 25%.
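
The cost computation can be sketched as below. The propagation factor is
carried through as in the per-site definition, expressed as a fraction
between 0 and 1. The function name and the per-group input figures are
hypothetical; only the $0.0025 rate and the 8811 site count come from this
report.

```python
USD_PER_KBYTE = 0.0025  # rate quoted in this report


def usd_per_month_per_reader(kbytes_per_month, propagation,
                             number_of_sites, number_of_readers):
    """Estimated dollars spent per month on behalf of each reader of a group.

    propagation is the fraction (0..1) of sites that receive the group;
    number_of_readers is the estimated worldwide readership of the group.
    """
    if number_of_readers <= 0:
        raise ValueError("cost per reader is undefined with no readers")
    usd_per_site = kbytes_per_month * USD_PER_KBYTE * propagation
    return usd_per_site * number_of_sites / number_of_readers


# Hypothetical group: 1000 KB/month, received by 90% of 8811 sites,
# with an estimated 10000 readers worldwide.
cost = usd_per_month_per_reader(1000, 0.9, 8811, 10000)
```

For this made-up group the result is a little under $2 per reader per
month. A group with the same traffic but a tenth of the readers would cost
ten times as much per reader.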


SITE PARTICIPATION

I would like to receive data from every site on USENET. The arbitron programs
(posted to comp.sources.d along with this report) work on news 2.9, 2.10.[1-3],
2.11, and on many versions of notesfiles.


Brian Reid
DEC Western Research Laboratory, Palo Alto CA
reid@decwrl.DEC.COM
{ihnp4,allegra,decvax,ucbvax,sun,pyramid}!decwrl!reid