[net.news] polling and statistics

lauren@vortex.UUCP (Lauren Weinstein) (03/18/86)

Just a technical point about polling.  Comparing Usenet polls of this
sort to Gallup, Nielsen (they're the ones with the boxes on the TVs,
not Arbitron), etc. is inappropriate.  The polling companies use VERY
carefully selected samples so that small sample populations can
represent larger total populations.  What we see on the net
are what would be termed "self-selected" polls, where anyone can
choose whether or not to participate.  While the results of such
self-selected polls (similar to 900-number telephone polls, when you
think about it) may be interesting, they normally cannot be applied in
a statistical sense to larger populations.  In other words,
self-selected polls tell you what the people who voted thought, or say
they did.  They say little about what the people who DIDN'T vote
are thinking or doing--they simply do not have much statistical validity
for generalization to larger populations.

They can still be fun, though--900 numbers sure are popular.

There are some definite "traps" in self-selected polling.  For example,
consistency checks (looking for "uniform" sorts of responses)
across a self-selected population may be misleading, since the respondents
may be selecting themselves for reasons unknowable to the polling
entity.  To put it bluntly, any conclusions drawn from a self-selected
population must be considered at least partially suspect when
it comes to generalization.

As a statistics professor I once knew used to say, "If you didn't pick
the sample population yourself, according to valid statistical criteria,
nothing you do later [with a self-select population] can validate the
sample for generalization."

A classic case involves 900-number polls.  It was found that
people responding to dial-in 900 polls tended to have incomes above a certain
level.  Consistency checks on the data indicated that this was true across
the entire population of callers, regardless of geographic region.  Of course,
to presume that this means the U.S. population as a whole has that sort
of income would be incorrect--it simply means that the people who CHOSE
to CALL generally fit the higher income category.
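
To make the trap concrete, here is a small simulation sketch in Python.
Everything in it--the income distribution, the "will call" rule, the
region list--is invented for illustration; it is not data from any real
900 poll.

    import random

    random.seed(1)

    REGIONS = ["northeast", "south", "midwest", "west"]

    def income():
        # Skewed income distribution; the parameters are arbitrary.
        return random.lognormvariate(10.0, 0.6)

    def will_call(x):
        # The self-selection rule: richer people are likelier to call.
        return random.random() < min(1.0, x / 100000.0)

    for region in REGIONS:
        people = [income() for _ in range(20000)]
        callers = [x for x in people if will_call(x)]
        pop_mean = sum(people) / len(people)
        caller_mean = sum(callers) / len(callers)
        print(f"{region:9s}  population mean ${pop_mean:8.0f}"
              f"   callers' mean ${caller_mean:8.0f}")

    # The callers' mean beats the population mean in EVERY region, so a
    # region-by-region consistency check "confirms" the biased figure
    # instead of exposing the selection effect.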

As for the glacier polls, I think they *are* interesting, but I would not
want to see any important decisions based on the information so
gleaned.  In particular, I feel that the cost figures could be off
by large amounts (not necessarily are, but *could* be), since
nobody on the network really has even an approximate handle on the true
costs involved in netnews transmission.  There are only guesses.
The group readership info, on the other hand, is less critical
and can probably be taken as indicative of the
reading habits of certain segments of the Usenet population--not
the entire population, but certain segments.

While I most certainly do *not* suggest that this book has anything
whatever to do with the glacier poll, I might recommend that persons
unfamiliar with some of the fine points of statistics and polling
read the classic book "How to Lie with Statistics."  It is pretty much
required reading for anyone who wants to be able to interpret many
of the "statistics" we find quoted by the mass media these days.

Once again, I am definitely *not* suggesting that there is any lying or 
planned bias of any kind in the glacier poll.  I only bring up 
this book since it *is* good reading for people interested in
statistics in general.

--Lauren--

reid@glacier.ARPA (Brian Reid) (03/19/86)

Several people have brought up the issue of self-selected samples in private
mail to me; I've answered individually. Since Lauren posted this note I feel
it's time to post an explanation.

Summary: this is not a self-selected poll. It has a certain self-selected
flavor to it, though an indirect one, but the classic problems of
self-selected respondents do not apply here. Unfortunately, there is no way
to tell how much the self-select factor is affecting things, but I am
confident that its effect is small, and I am certain that its effect is not
dominating the data.

The reason for this is that I don't need a response from every user, I only
need a response from one user per site. Naturally the poll selects in favor
of sites that have users who are willing to respond, but if the average
population of a site is high enough, then the results are reasonably
unbiased. It is therefore biased in favor of larger sites, but that turns
out to be OK because the larger sites are where most of the readers are.
Sites "well", "ritcv", "gitpyr", and "cod" have 700 netnews readers among
the four of them (1 service bureau, 2 university machines, 1 government lab).
Those sites (and dozens of others like them) totally dominate the 5-user
Unisoft 68000 machines whose 5 users are all too busy to respond.
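
To see roughly how that works out, here is a small Python sketch.  The
site sizes, the response rule, and the 20% "true" readership share are
all invented for illustration; only the combined 700-reader total for
the four big sites echoes the figure above.

    import random

    random.seed(2)

    # Four big sites with 700 readers among them (the split is invented)
    # plus two hundred 5-user machines.
    site_sizes = [300, 200, 120, 80] + [5] * 200

    TRUE_SHARE = 0.20    # invented "true" fraction reading some group

    def responds(size):
        # Bigger sites are likelier to contain at least one volunteer.
        return random.random() < min(1.0, size / 50.0)

    reporting = [s for s in site_sizes if responds(s)]

    total_readers = sum(reporting)
    group_readers = sum(random.random() < TRUE_SHARE
                        for s in reporting for _ in range(s))

    print(f"sites reporting: {len(reporting)} of {len(site_sizes)}")
    print(f"estimated share: {group_readers / total_readers:.3f}"
          f"  (true share: {TRUE_SHARE})")

    # Most 5-user sites never answer, but most READERS live on the big
    # sites, so the share estimate lands near the truth anyway.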

I've been discussing this issue at great length with various statisticians
around Stanford, and although everybody agrees that USENET is far too
complex and amorphous for quantitative analysis of the quality done for,
say, television ratings or presidential elections, they point out that I am
polling for ratios and not for absolute counts, which tends to compensate
for a number of different kinds of bias.
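
A toy example of why ratios help (the group names just echo this
thread; the reader counts and the bias factor are invented): if every
group's count is distorted by the same unknown multiplicative
factor--say, systematic undercounting--the factor cancels when two
groups are compared.

    # Invented counts; BIAS is unknown to the pollster.
    true_counts = {"net.cooks": 8000, "net.religion.christian": 2000}
    BIAS = 0.37

    measured = {g: n * BIAS for g, n in true_counts.items()}

    print(measured["net.cooks"] / measured["net.religion.christian"])        # 4.0
    print(true_counts["net.cooks"] / true_counts["net.religion.christian"])  # 4.0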

My own calculations (I'm reasonably well trained in statistics) lead me to
believe that the "what percentage of the population reads this group"
columns are accurate to within about 25% (i.e. a share of 10% could be
12.5% or it could be 8%), and that the "how many people read this group,
worldwide" column is accurate to within 100% (i.e. a figure of 5000 people
reading it could be 2500 or it could be 10000). The ratios are a lot more
accurate than the absolute numbers, and the ratios between absolute numbers
are probably even more accurate.
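
Those "within N%" figures are multiplicative bands, as the
parenthetical examples show.  A two-line Python helper makes the
arithmetic explicit (the helper is just a restatement of the examples,
not anything from the survey itself):

    # "Accurate to within N%" as a multiplicative band: divide by
    # (1 + N/100) for the low end, multiply for the high end.
    def band(estimate, rel_err_pct):
        f = 1.0 + rel_err_pct / 100.0
        return estimate / f, estimate * f

    print(band(10.0, 25))     # share of 10%  -> (8.0, 12.5)
    print(band(5000, 100))    # 5000 readers  -> (2500.0, 10000.0)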

The "Dollars per reader per month" column should perhaps have been labeled
"cost per reader per month". It's true that there is no guarantee that those
numbers are anything resembling dollars, but it is also true that whatever
units they are in, they are the same for all newsgroups and therefore can be
compared in ratio. In other words, if net.religion.christian (the most
expensive group per reader) costs 30.00 units per reader, and net.cooks
costs 1.00 unit per reader, then it is quite true that
net.religion.christian is 30 times more expensive than net.cooks FOR THE
SAMPLED POPULATION. Whether it is 30 times more expensive for the whole
network, or only 25 times more expensive, or 35 times more expensive, is
determined with the same accuracy as the readership ratios, which I believe
to be 25%.
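
Applying that same 25% band to the cost ratio (the 30.00 and 1.00
per-reader figures come from the paragraph above; the band arithmetic
is illustrative):

    # Per-reader costs in unknown units; the units cancel in the ratio.
    cost = {"net.religion.christian": 30.00, "net.cooks": 1.00}

    ratio = cost["net.religion.christian"] / cost["net.cooks"]    # 30.0

    f = 1.25    # the claimed 25% multiplicative uncertainty
    print(ratio / f, ratio * f)    # 24.0 37.5 -- brackets the
                                   # "25 times to 35 times" range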

The way to improve the data, of course, is to take more of it. Send in your
arbitron results to netsurvey@glacier.
-- 
	Brian Reid	decwrl!glacier!reid
	Stanford	reid@SU-Glacier.ARPA

chris@minnie.UUCP (Chris Grevstad) (03/20/86)

lauren@vortex.UUCP (Lauren Weinstein) says:
>Just a technical point about polling.  Comparing Usenet polls of this
>sort to Gallup, Nielsen (they're the ones with the boxes on the TVs,
>not Arbitron), ...

Just a small thing here.  My family was recently a Nielsen family and they did
NOT hook up a box to our TV.  Rather, they gave us a logbook in which we
recorded our viewing for the day.  Although there were some shows that I did
not watch that I think are worthwhile, I was sorely tempted to report that I
did watch them.  This certainly makes me wonder about the validity of the
Nielsen ratings.

-- 
	Chris Grevstad
	{sdcsvax,hplabs}!sdcrdcf!psivax!nrcvax!chris
	ucbvax!calma!nrcvax!chris
	ihnp4!nrcvax!chris

   "No, I'll tell you the truth.  She was a striptease dancer.  I saw
    her first in an obscene movie, one of those things in obscenacolor.
    Naked of course. They had a Kodiak bear strapped to a table, muzzled..."

lauren@vortex.UUCP (Lauren Weinstein) (03/22/86)

The portion of the Nielsen TV ratings normally based on
the small sample of "box" families is the
"overnight" rating and share figures.  Nielsen and the other
ratings firms also rely very heavily on user-maintained logs for
information.  The accuracy of these logs is frequently called
into question by various groups.  In fact, the accuracy of the boxes
is also questioned, since there is evidence that some families in the
small sample may try to "stack" the statistics in various ways, by leaving
the TV on when nobody is watching and similar tactics.  How much
effect such tactics actually have is difficult to determine.

--Lauren--

chuq@sun.uucp (Chuq Von Rospach) (03/22/86)

> lauren@vortex.UUCP (Lauren Weinstein) says:
> >Just a technical point about polling.  Comparing Usenet polls of this
> >sort to Gallup, Nielsen (they're the ones with the boxes on the TVs,
> >not Arbitron), ...
> 
> Just a small thing here.  My family was recently a Nielsen family and they did
> NOT hook up a box to our TV.  Rather, they gave us a logbook in which we
> recorded our viewing for the day.  Although there were some shows that I did
> not watch that I think are worthwhile, I was sorely tempted to report that I
> did watch them.  This certainly makes me wonder about the validity of the
> Nielsen ratings.

Nielsen does two types of rating.  The ratings you all hear about are based on
the rating boxes hooked up to TVs at 'Nielsen Family' households.  They do a
secondary and much larger survey using TV diaries to check the
validity of their family mix and make sure their smaller sample is really
representative.  There does tend to be a bit of cheating in both the diary and
box samples (for example, it used to be that the boxes couldn't tell which
UHF station was tuned in, so the box was preset to a specific preferred UHF
channel, which got the vote regardless of the channel actually being watched),
but statistically these effects tend to wash out.

chuq

-- 
:From catacombs of a past participle:   Chuq Von Rospach 
chuqi%plaid@sun.ARPA			FidoNet: 125/84
CompuServe: 73317,635
{decwrl,decvax,hplabs,ihnp4,pyramid,seismo,ucbvax}!sun!plaid!chuq

I used to really worry about splitting my infinitives until I realized
that most people had never heard of them.