[news.admin] News Versions, another approach

egisin@mks.com (Eric Gisin) (11/16/89)

I wrote a short awk program that identifies news versions
based on Message-ID syntax. It wasn't too accurate
because B news, Notes, VMS, and P news all use the same syntax.
I did discover that there are several other versions
that I could not identify, which I have called E, F, and Other.

This analysis differs from the version-control-message method in two ways.
First, it only recognizes sites that have posted in the last two weeks.
Second, it also recognizes sites that do not respond to version messages,
which makes this method more reliable. Many C news sites don't seem to respond.
It is also including many mailing list gateways.

Here are the results from running it on a 2.4MB history file (2 weeks).

type	# distinct domains
B	2162
C	477
E	36
F	448
Other	325
Total	3448

Here are the REs I used in the awk program. The Message-ID is <$2@$3>,
so $2 is the part before the @ and $3 is the domain.
$2 ~ /^[0-9]+$/ {                               # B news
$2 ~ /^(1989|[A-Z0-9]*\.89)[JFMASOND]/ {        # C news
$2 ~ /^[JFMASOND].*1989/ {                      # E news
$2 ~ /^[0-9.]+AA/ {                             # F news
{                                               # Other
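
For readers who want to try the idea, here is a minimal Python sketch of
the same classification. It is not the original awk program, whose
action bodies are not shown. The patterns are copied from the rules
above. The function names, the first-match-wins ordering, and the way
distinct domains are counted are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Patterns copied from the awk rules above, tried in the same order.
# Assumption: the first matching rule wins; the original actions are elided.
RULES = [
    ("B", re.compile(r"^[0-9]+$")),
    ("C", re.compile(r"^(1989|[A-Z0-9]*\.89)[JFMASOND]")),
    ("E", re.compile(r"^[JFMASOND].*1989")),
    ("F", re.compile(r"^[0-9.]+AA")),
]


def classify(message_id):
    """Return (type, domain) for a Message-ID of the form <local@domain>."""
    local, _, domain = message_id.strip("<>").partition("@")
    for name, pattern in RULES:
        if pattern.search(local):
            return name, domain
    return "Other", domain


def count_domains(message_ids):
    """Count the distinct domains seen for each type (hypothetical helper)."""
    seen = defaultdict(set)
    for mid in message_ids:
        kind, domain = classify(mid)
        seen[kind].add(domain)
    return {kind: len(domains) for kind, domains in seen.items()}
```

Under these assumptions, the ID <1989Nov15.213552.9499@mks.com> quoted
later in this thread is classified as C, and an AMS-style ID with no
recognizable structure falls through to Other.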

tale@pawl.rpi.edu (David C Lawrence) (11/17/89)

In <1989Nov15.213552.9499@mks.com> egisin@mks.com (Eric Gisin) writes:
Eric> This analysis differs from the version-control-message method in
Eric> two ways.  First, it only recognizes sites that have posted in
Eric> the last two weeks.  Second, it also recognizes sites that do
Eric> not respond to version messages, which makes this method more
Eric> reliable. Many C news sites don't seem to respond.  It is also
Eric> including many mailing list gateways.

It is a little more reliable as far as finding those sites goes, as
long as someone has posted a message from there.  It is also a little
inaccurate, though, because some sites change the format of their
Message-IDs regardless of which news version they are running.  (Is
CMU running B or C News?  They've got some hairy M-IDs.)  As an
example, if you had done this in late late September then Rensselaer
would have been reported as both a C News site and a B News site, even
though we were running C News the whole time; in fact, in the very
near future we will be consistently misrepresented as a B News site
by this method.

Then there are all of the other M-IDs which find their way into news,
from either mailers (as you mention) or client programmes like GNUS.
Add to that all of the extra site names which appear in M-IDs (is
rpi.edu one site, or is it six, one for each *.rpi.edu sub-domain?  Or
is it six hundred, one for each *.*.rpi.edu machine?) and the
heuristic rapidly begins to fail.  It seems like a good general idea
but I think it needs refinement.
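
To make the site-counting question concrete, here is a small Python
sketch showing how the number of "distinct domains" depends on how many
trailing labels of each domain are kept. The helper names and the host
list in the usage note are hypothetical. This is one possible
refinement, not something the original awk program is known to do.

```python
def collapse(domain, labels):
    """Keep only the last `labels` dot-separated parts of a domain.

    `labels` must be at least 1. A domain with fewer parts is
    returned whole (lowercased).
    """
    parts = domain.lower().split(".")
    return ".".join(parts[-labels:])


def distinct_sites(domains, labels):
    """Count distinct sites after collapsing every domain to `labels` parts."""
    return len({collapse(d, labels) for d in domains})
```

With the made-up list rpi.edu, pawl.rpi.edu, cs.rpi.edu, a.cs.rpi.edu,
keeping two labels counts one site, three labels counts three, and four
labels counts four, so the totals in a table like Eric's depend heavily
on which depth is chosen.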

Dave
-- 
 (setq mail '("tale@pawl.rpi.edu" "tale@ai.mit.edu" "tale@rpitsmts.bitnet"))

wesommer@athena.mit.edu (Bill Sommerfeld) (11/17/89)

In article <1989Nov16.234445.16469@rpi.edu> tale@pawl.rpi.edu (David C Lawrence) writes:

   Is CMU running B or C News?  

Neither.  The site "andrew.cmu.edu" is using locally written software
known as the "Andrew Message System" (AMS), which is notable
for some fairly spiffy user interfaces and multi-media capabilities.

They hook up to the CMU-CS news server via NNTP, and accept and post
articles that way; the software in question is something of a hack,
and incorporates a special case for alt.gourmand.

   They've got some hairy M-IDs.

AMS message IDs look like this:

AZMrfK8GFU3JROj054

that is, exactly 18 characters long, always, with no "sub-structure".
There is a program provided with AMS called decode_id which, when fed
an AMS message ID, prints:

``0ZMrg2IGFU3J1Ok058'': generated 16 Nov 1989 at 22:29:06 EST from
[18.70.0.213], pid 23232, ctr (mod 256) of 1.
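
The report above shows which fields an AMS ID carries: a timestamp, the
originating host address, a process ID, and a counter. As a rough
illustration, the Python sketch below pulls those fields out of a
report in that format. The pattern is modelled only on the single
sample quoted here, so other decode_id output may not match it. The
function name and field names are made up, and the sketch does not
attempt to decode the 18-character ID itself.

```python
import re

# Pattern modelled on the one sample report quoted above (an assumption:
# real decode_id output may vary in spacing or wording).
DECODED = re.compile(
    r"``(?P<id>[^']+)'': generated (?P<date>\d+ \w+ \d+) at "
    r"(?P<time>\d\d:\d\d:\d\d) (?P<zone>\w+) from\s+"
    r"\[(?P<host>[\d.]+)\], pid (?P<pid>\d+), "
    r"ctr \(mod 256\) of (?P<ctr>\d+)\."
)


def parse_decoded(text):
    """Return the fields of a decode_id report as a dict, or None if no match."""
    match = DECODED.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["ctr"] = int(fields["ctr"])
    return fields
```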

					- Bill
--
Henry Spencer is so much of a  |    Bill Sommerfeld at MIT/Project Athena
minimalist that I often forget |    sommerfeld@mit.edu
he's there - anonymous         |