[news.misc] Validity of USENET news statistics

tale@pawl.rpi.edu (David C Lawrence) (09/22/89)

In <18401@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
Brad> Greg thinks no, because at his site they have an NNTP server
Brad> where nobody reads news and lots of clients which are never
Brad> counted in arbitron surveys.  Is this typical?  I also know
Brad> sites which have a server where lots of people read news on the
Brad> server, and others read on the clients.  Which is more common?

Well, to provide at least one more data point in this, we are set up
much like NCAR.  Only usenet@rpi.edu has a .newsrc that has been
touched at all recently and then only for testing things.  Only one
other person who can log into the server has a .newsrc there, which
hasn't been touched since mid-Feb.  Reading is done via both NNTP and
NFS, and, like Greg, I can't login to every host which uses rpi.edu
for USENET service.  rpi.edu would provide some pretty non-representative
statistics for Rensselaer.

Brad> But I fail to see why this is a problem.  Are many arbitron
Brad> reports sent in for servers with no readers?  Why would anybody
Brad> bother to send in such a report?  Arbitron senders, let me know?

We don't send it, as much as I wish I could collect the data.  I might
begin working soon on a way to provide some useful data from our nntpd
syslogging, but it seems pretty inaccurate.  For one thing, different
users can't really be told apart, and for another there is the NFS end,
which nntpd never sees.
I think most of the information provided via syslog is only really
useful locally.

Another source of USENET statistics, inpaths, is much easier to
deal with because it just has to run on the server.  Now presumably
some of that data could be corrupted if it were to run on machines
which used that server, but in general it has far fewer of the
problems which arbitron has.  Then again, they are measuring
completely different things.  Perhaps it is time to find another
interesting aspect of the net to measure, too.

Dave
--
 (setq mail '("tale@pawl.rpi.edu" "tale@itsgw.rpi.edu" "tale@rpitsmts.bitnet"))

karl@ficc.uu.net (Karl Lehenbauer) (09/22/89)

In article <1989Sep22.061846.4560@rpi.edu> tale@pawl.rpi.edu (David C Lawrence) writes:
>Only usenet@rpi.edu has a .newsrc that has been
>touched at all recently and then only for testing things.  

>... Reading is done via both NNTP and
>NFS...

Sounds like we need improved collection methods, perhaps an enhanced arbitron?
-- 
-- uunet!ficc!karl	"The last thing one knows in constructing a work 
			 is what to put first."  -- Pascal

pst@anise.acc.com (Paul Traina) (09/23/89)

NNTP server sites are becoming more typical.  At the Baltimore USENIX NNTP
and News BOFs I suggested that we add a non-secure "USER" command which
would be used just for statistics gathering purposes.  If a client (such as rn)
were modified to send out "USER user@host.domain" upon connection to the
server,  we could gather enough statistics to build arbitron or other reports.
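
A minimal sketch of the idea in Python, assuming the line arrives exactly
as "USER user@host.domain".  The wire format, the regular expression, and
the names parse_user_command and count_readers_per_host are hypothetical
and are not part of the NNTP specification or of any nntpd release.

```python
import re

# Hypothetical format taken from the proposal: "USER user@host.domain".
# Assumption: exactly one "@", no whitespace inside the address.
USER_RE = re.compile(r"^USER\s+([^\s@]+)@([^\s@]+)\s*$", re.IGNORECASE)


def parse_user_command(line):
    """Return (user, host) for a statistics-only USER line, or None."""
    match = USER_RE.match(line.strip())
    if match is None:
        return None
    user, host = match.groups()
    # Host names are case-insensitive, so fold them for counting.
    return user, host.lower()


def count_readers_per_host(lines):
    """Count distinct users seen per client host in a list of commands."""
    seen = {}
    for line in lines:
        parsed = parse_user_command(line)
        if parsed is None:
            continue  # some other command, or a malformed USER line
        user, host = parsed
        seen.setdefault(host, set()).add(user)
    return {host: len(users) for host, users in seen.items()}
```

Since the command is non-secure, the counts are only as trustworthy as
the clients that send it; nothing here verifies the claimed identity.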

Since this has come up again and again,  I'll hack in the changes to nntpd
and rn tonight.  I'll publish the changes and someone (or myself) can come
up with some awk scripts to build arbitron reports.  If we like it,  let's
make it a formal hack.

Paul
-- 
The difference between this school and a cactus plant is that the
cactus has the pricks on the outside.

tale@pawl.rpi.edu (David C Lawrence) (09/23/89)

In <1989Sep22.205741.29056@anise.acc.com> pst@anise.acc.com (Paul Traina):
Paul> Since this has come up again and again, I'll hack in the changes
Paul> to nntpd and rn tonight.  I'll publish the changes and someone
Paul> (or myself) can come up with some awk scripts to build arbitron
Paul> reports.  If we like it, let's make it a formal hack.

Very interesting; the USER-type command was exactly the same sort of
thing I thought of this morning (though I was calling it STATS, but
that is pretty irrelevant).  I would be more than happy to work on
some awk scripts when you let us know the log format to expect.

Speaking of log format, please make it another compile-time option so
I can tell it where to put the logfile; I personally would prefer to
have the log separate from syslog activity, but that is open for
discussion.

What should be in the log?  I would expect the most useful information
now would be at the very least user@host and each .newsrc line for
subscribed groups.  The client inews would have to provide the
information, which should be pretty easy.  Any other information that
would be useful?  How, for example, could data from unsubscribed
groups be useful?
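
As a rough sketch of what could be done with such a log, assuming each
entry pairs a user@host with that user's .newsrc text: in a .newsrc a
subscribed group is written "group: ranges" and an unsubscribed one
"group! ranges".  The function names and the dictionary layout below are
hypothetical; no log format has been settled.

```python
from collections import Counter


def subscribed_groups(newsrc_text):
    """Return the group names marked as subscribed in .newsrc text."""
    groups = []
    for raw in newsrc_text.splitlines():
        line = raw.strip()
        # "name: 1-100,105" is subscribed; "name! 1-100" is unsubscribed.
        name, sep, _ranges = line.partition(":")
        name = name.strip()
        if sep and name and "!" not in name:
            groups.append(name)
    return groups


def readers_per_group(newsrcs):
    """Tally subscribers per group.

    newsrcs maps a 'user@host' string to that user's .newsrc text
    (an assumed layout, chosen only for this illustration).
    """
    counts = Counter()
    for text in newsrcs.values():
        # A set guards against a group listed twice in one .newsrc.
        counts.update(set(subscribed_groups(text)))
    return counts
```

This counts subscriptions only, which is at best a rough proxy for who
is actually reading a group.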

Dave

By the way, besides working on the awk scripts, I'll add the
appropriate connection-time USER command to GNUS.
--
 (setq mail '("tale@pawl.rpi.edu" "tale@itsgw.rpi.edu" "tale@rpitsmts.bitnet"))

pst@anise.acc.com (Paul Traina) (09/23/89)

I posted the code I wrote for the USER command and promptly got a note from
Brian Kantor asking me not to diddle things since we're talking about changes
to the protocol.  Since that's the case,  I've decided it was best to cancel
the article containing my diffs.  Anyone that wants it on an individual
basis may send me mail and I will forward it to you as long as you understand
that Brian and cohorts are working to revamp the protocol.  Unfortunately
I don't remember anyone talking about user identification (or more aggressively,
authentication) except myself the last time this was in a public forum.

In any case, drop me a note if you want it, or sit tight and wait to see
what the folks in the know are doing (which will probably be a more elegant
way of doing things in any case).

-- 
The difference between this school and a cactus plant is that the
cactus has the pricks on the outside.