[net.news] Chuq on Statistics

woods@hao.UUCP (Greg Woods) (03/24/86)

> Lauren's article (as rebutted by Brian) was a LOT more inaccurate than
> the statistics he attempted to discredit. 

   Could you substantiate that claim, please? That sounds like an opinion to
me, not the fact you present it as.

> Yes, I'm MORE than ready to use
> the results of the statistics to try to streamline the net 

   Why? Do they happen to agree with YOUR opinions, by chance? The question is
not whether the results are "good" or "bad", but whether they accurately 
reflect the habits of the net as a whole. I'm not saying they definitely
don't, but you are acting as though they definitely do.


> what Brian has really done is blow away the USENET attitudes regarding
> volume and utility -- there is now REAL evidence that volume and readership
> are completely unconnected

  My point is that the statistics, which come from a (as far as we know)
random 3% of the net, do NOT show a damn thing, necessarily. They indicate
possibilities, which should be investigated, but they do not in themselves
validate any conclusions.
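
  To make the worry concrete, here is a minimal sketch with invented numbers
(nothing here comes from the actual survey data). It assumes that sites where
a group is read are more likely to send in a report than sites where it is
not, and computes what the respondents alone would appear to show. The
function name and all three rates are hypothetical.

```python
def respondent_estimate(true_read_rate, p_respond_reader, p_respond_nonreader):
    """Return (response_rate, apparent_read_rate) for a self-selected survey.

    All three inputs are hypothetical fractions between 0 and 1:
    the true share of sites reading a group, and the chance that a
    reading / non-reading site responds to the survey.
    """
    responding_readers = true_read_rate * p_respond_reader
    responding_nonreaders = (1.0 - true_read_rate) * p_respond_nonreader
    response_rate = responding_readers + responding_nonreaders
    # Share of respondents that read the group: this is all the survey sees.
    apparent_read_rate = responding_readers / response_rate
    return response_rate, apparent_read_rate


# Invented numbers: 20% of sites really read the group, but reading sites
# respond 6% of the time and non-reading sites only 2.25% of the time.
rate, apparent = respondent_estimate(0.20, 0.06, 0.0225)
print(f"response rate: {rate:.1%}, apparent readership: {apparent:.1%}")
```

  Under these made-up assumptions the overall response rate is 3%, yet the
respondents suggest 40% readership where the true figure is 20%. If both
kinds of site responded at the same rate, the estimate would match the truth;
the trouble is that nobody knows whether they do.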

> the ego-based write mostly groups.

  The so-called "soapbox" groups have already been eliminated here. The 
argumentative tone of your article makes it look like at least YOU think
I'm trying to save those groups. Untrue. In fact, we may not be around
much longer as a net site anyway because of the volume of what I consider
to be garbage. No, I don't want to enforce my opinion on the rest of the net,
but I don't want the opinion of 3% of the net forced on ME, either.
> 
> This may seem silly, but I think that it is logical for streamlining 
> of the net to be done by getting rid of the high volume/low readership
> groups -- the most effect for the least netwide trauma (except to the people
> who like to hear themselves type).

  Again, you are missing my point, and also trying to cast me as supporting
garbage. I would agree with your statement above, but I claim it has nothing
to do with the real issue here, which is whether this survey currently being
taken accurately reflects the entire net. 

> I didn't realize you were trained in statistics. 

   I have had courses in the subject, but do not claim to be an expert. 
Nevertheless, I can recognize the most obvious pitfalls.

> How would you recommend  improving the data then?

  Two ways: the best would be to PROPERLY select a subset of the net to
respond to the survey; this is probably impossible. Secondly, a more
standardized program (I have twice before suggested a C program; that you
have conveniently ignored this shows that you haven't really listened to
a thing I've said in your haste to make me look bad). I suppose a third
way would be to think up some way to encourage more people who CAN respond
to the current survey to do so. No, I don't have all the answers, but that
isn't a criterion for pointing out flaws in the current procedure.
> No offense intended, but I prefer to listen to the
> people trained in the discipline...

  Sure sounds like you intended offense to me! And if Brian were truly
trained in statistics as he claims he is (and I don't doubt that, I think
he is just overlooking the obvious for reasons only he knows) then he
would know better than to hold these data up as representing the entire
net. Once again, I ENCOURAGE him to continue the survey; if several
weeks from now, he has data from 60% of the net who could and happened
to feel like responding, then I'll be more inclined to believe the results.
> 
>  I'm not bitching directly at Greg on this

   You could have fooled me!

> 
> First, Brian's stats are showing some real fallacies in the way things are
> done on Usenet. 

  You are still ignoring my salient point: so far, these stats show NOTHING
with any degree of reliability.

> One is the assumption that volume == utility, which is
> being shown to be definitely not true. 
   I might be willing to agree that this may be true, but I still don't
agree that this is 'definitely' being shown.

> In many groups, a few very
> vociferous users can completely overwhelm the rest of the readership.
   
   You don't need a survey to tell that! Look at Rich Rosen! :-(

> Second, there is the implied 'it isn't good for me, so we can't do it'.

  What are you talking about?

> Eventually we're going to have to make decisions about what the net is
> really here for, as volume and costs continue to rise. The LOGICAL thing is
> to streamline that which affects the least users, which is difficult to do
> currently because we've never before known who is reading things 

   ..and we still DON'T!!!
--Greg
--
{ucbvax!hplabs | decvax!noao | mcvax!seismo | ihnp4!seismo}
       		        !hao!woods

CSNET: woods@ncar.csnet  ARPA: woods%ncar@CSNET-RELAY.ARPA

"If the game is lost, we're all the same; no one left to place or take the 
blame; Will we leave this place an empty stone, or a shining ball of earth,
we can call our home"

david@ukma.UUCP (David Herron, NPR Lover) (03/26/86)

In article <2019@hao.UUCP> woods@hao.UUCP (Greg Woods) writes:
>...
>> How would you recommend  improving the data then?
>
>  Two ways: the best would be to PROPERLY select a subset of the net to
>respond to the survey; this is probably impossible. Secondly, a more
>standardized program (I have twice before suggested a C program; that you
>have conveniently ignored this shows that you haven't really listened to
>a thing I've said in your haste to make me look bad). I suppose a third
>way would be to think up some way to encourage more people who CAN respond
>to the current survey to do so. No, I don't have all the answers, but that
>isn't a criterion for pointing out flaws in the current procedure.

Why do you keep suggesting C programs?  I don't understand.  I *wrote*
large portions of arbitron, I *know* how much easier it was to write
the report generator with awk than it would have been in C.  (And I'm a very
accomplished C programmer as well).  Sure the result would run faster
but the shell script is 1) easier to understand/write/maintain, 
2) only run once a month (or less often), and 3) smaller, both in
source and executable.

Be that as it may, I'm convinced that it's useful to know who is 
reading the newsgroups.  When I wrote my version I was wanting to
make an informed decision as to which newsgroups to cut due to phone
bill problems, and I didn't have a good idea which newsgroups were
being read.  [oooh.. I just realized something about arbitron...
it only works with 2.10.2 and above, which keeps everybody else from
responding].  

I do agree tho that until we have large portions of the net responding,
the data isn't all that reliable.  But the first returns are rather
interesting.

If a newsgroup isn't read on half the machines on the net, then
why should the net transport it everywhere?

(One answer is so that new users at each site can have a chance
to start reading that newsgroup... but that isn't the answer
I'm interested in... there's still another "49%" of the net
that doesn't read that newsgroup.)
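
The "read on half the machines" test can be sketched as follows. This is
only an illustration, not how arbitron itself works: the input format (one
set of read newsgroups per reporting site), the site names, and the function
names are all made up for the example.

```python
from collections import Counter


def read_fraction(site_reports):
    """Map each newsgroup to the fraction of reporting sites with a reader.

    site_reports: dict of site name -> set of newsgroups read at that site
    (a hypothetical input format).
    """
    counts = Counter()
    for groups in site_reports.values():
        counts.update(set(groups))  # count each group once per site
    total = len(site_reports)
    return {group: n / total for group, n in counts.items()}


def below_half(site_reports):
    """Newsgroups read on fewer than half of the reporting sites, sorted."""
    fractions = read_fraction(site_reports)
    return sorted(g for g, f in fractions.items() if f < 0.5)


# Invented reports from four imaginary sites.
reports = {
    "siteA": {"net.news", "net.unix", "net.flame"},
    "siteB": {"net.news", "net.unix"},
    "siteC": {"net.news"},
    "siteD": {"net.news", "net.unix"},
}
print(below_half(reports))
```

Note that the fractions are taken over the sites that reported, not over the
whole net, so the result is only as good as the set of sites responding. A
group with no readers at any reporting site does not appear at all.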
-- 
		Gone to New Orleans for Spring Break!!!

Pressing business requiring postmaster attention can be brought
to the attention of my substitute by mailing to postmaster@ukma
(or postmaster@uky.csnet).
--
David Herron,  cbosgd!ukma!david, david@UKMA.BITNET, david@uky.csnet