[news.software.b] C News milestone

henry@zoo.toronto.edu (Henry Spencer) (11/14/90)

As I believe I've mentioned before, we regularly run statistics on usage
of different news systems by analyzing our history file.  The major news
systems generate message-IDs of distinctive forms, so it's not hard to
get an idea of how many people are running which system.  This counting
approach has flaws -- in particular, it obviously tends to miss sites
that seldom post anything to network-wide newsgroups -- but it has the
enormous advantage that it's a quick, cheap, local operation, so we can
do it weekly for keeping running track of the situation.

The general patterns in recent times have been fairly consistent.  We
categorize news systems into B, C, and ?, the last being message-IDs
which fit no known format.  ? has been growing rapidly, probably as a
result of more and more inter-network gatewaying; there is no single
dominant pattern among the ? message-IDs, but a lot of them are clearly
the results of gatewaying.  B has been declining very slowly.  And C
has been growing steadily, with definite signs that the growth is
accelerating.
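
As a rough illustration of counting by message-ID form, here is a minimal
sketch in C.  The patterns are assumptions inferred only from IDs visible in
this thread (a stock C News ID such as <1990Nov13.215847.23684@host>, and a
B News ID with an all-digit local part such as <846@host>); they are not the
actual rules used for the statistics, and classify_msgid is a made-up name.

```c
#include <ctype.h>
#include <string.h>

/* Count consecutive decimal digits starting at s. */
static size_t digit_run(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/*
 * Classify a message-ID as 'B', 'C', or '?'.
 * ASSUMED patterns (hypothetical, inferred from examples only):
 *   C: <YYYYMonDD.HHMMSS.pid@host>
 *   B: <digits@host>
 */
char classify_msgid(const char *id)
{
    const char *p, *at;
    size_t n;

    if (id[0] != '<' || (at = strrchr(id, '@')) == NULL)
        return '?';
    p = id + 1;

    n = digit_run(p);
    if (n > 0 && p + n == at)
        return 'B';                  /* all-digit local part */

    if (n != 4)
        return '?';
    p += 4;                          /* skip the year */
    if (!(isupper((unsigned char)p[0]) && islower((unsigned char)p[1])
          && islower((unsigned char)p[2])))
        return '?';
    p += 3;                          /* skip the month abbreviation */
    n = digit_run(p);
    if (n < 1 || n > 2 || p[n] != '.')
        return '?';
    p += n + 1;                      /* skip the day and the dot */
    if (digit_run(p) != 6 || p[6] != '.')
        return '?';
    p += 7;                          /* skip HHMMSS and the dot */
    n = digit_run(p);
    if (n == 0 || p + n != at)
        return '?';                  /* the rest must be a numeric pid */
    return 'C';
}
```

Running something like this over the message-ID field of each history line
and tallying the three results per site would give the kind of counts
described above; anything produced by a gateway or a locally modified ID
generator simply lands in '?'.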

Anyway, the current milestone is that last weekend (while I was away at
Windycon, which is why you're just hearing about it now), C passed 1000.
Undoubtedly the C News site count actually hit four digits quite some
time ago, since we know of major C News users who seldom post anything to
the outside world, but it's definite now.
-- 
"I don't *want* to be normal!"         | Henry Spencer at U of Toronto Zoology
"Not to worry."                        |  henry@zoo.toronto.edu   utzoo!henry

rk@theep.boston.ma.us (Robert A. Kukura) (11/14/90)

In article <1990Nov13.215847.23684@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:

   As I believe I've mentioned before, we regularly run statistics on usage
   of different news systems by analyzing our history file.  The major news
   systems generate message-IDs of distinctive forms, so it's not hard to
   get an idea of how many people are running which system.  This counting
   approach has flaws -- in particular, it obviously tends to miss sites
   that seldom post anything to network-wide newsgroups -- but it has the
   enormous advantage that it's a quick, cheap, local operation, so we can
   do it weekly for keeping running track of the situation.

   The general patterns in recent times have been fairly consistent.  We
   categorize news systems into B, C, and ?, the last being message-IDs
   which fit no known format.  ? has been growing rapidly, probably as a
   result of more and more inter-network gatewaying; there is no single
   dominant pattern among the ? message-IDs, but a lot of them are clearly
   the results of gatewaying.  B has been declining very slowly.  And C
   has been growing steadily, with definite signs that the growth is
   accelerating.

For one, you are probably missing sites where the emacs gnus
newsreader is used because it generates its own ids of the form:

	Message-ID: <RK.90Nov14085626@theep.boston.ma.us>

I don't know if any other newsreaders/posters generate their own
message ids.

   Anyway, the current milestone is that last weekend (while I was away at
   Windycon, which is why you're just hearing about it now), C passed 1000.
   Undoubtedly the C News site count actually hit four digits quite some
   time ago, since we know of major C News users who seldom post anything to
   the outside world, but it's definite now.

Congratulations.

   -- 
   "I don't *want* to be normal!"         | Henry Spencer at U of Toronto Zoology
   "Not to worry."                        |  henry@zoo.toronto.edu   utzoo!henry

-- 
-Bob Kukura		internet: rk@theep.boston.ma.us
			uucp: spdcc!theep!rk

peter@ficc.ferranti.com (Peter da Silva) (11/15/90)

In article <1990Nov13.215847.23684@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
> The general patterns in recent times have been fairly consistent.  We
> categorize news systems into B, C, and ?, the last being message-IDs
> which fit no known format.  ? has been growing rapidly, probably as a
> result of more and more inter-network gatewaying;

Probably more because people are getting tired of the C-News message ID
format and are installing programs like "mkid". For example, this message
was posted via C news, but I'm sure the format gets put in '?'.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

tanner@cdis-1.compu.com (Dr. T. Andrews) (11/16/90)

In article <1990Nov13.215847.23684@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
) As I believe I've mentioned before, we regularly run statistics on
) usage of different news systems by analyzing our history file.  The
) major news systems generate message-IDs of distinctive forms, ...
Right, and as you note there are people out there whose news systems
don't generate the ``expected'' message-ID form.  I cite particularly
ours, which generates 7-digit (base 36 digits) message IDs.

We did not find the default C news message-IDs, which read like
Russian novels, suitable.  The program "seq.c" which generates
these message-IDs is available as a ``shar'' file upon e-mailed
application to me at the address shown above.  Act now, and receive
at no extra charge the patch to the "anne.jones" script to use
this program.
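
The "seq.c" source is not reproduced here.  Purely as an illustration of
what a 7-digit base-36 identifier looks like, the sketch below encodes a
sequence number that way.  The name base36_id, the use of a plain counter as
input, and the lowercase digit set are all assumptions, not a description of
the real program.

```c
/*
 * Write n as exactly 7 base-36 digits (0-9, a-z), zero-padded, into out,
 * which must have room for 8 bytes.  Returns 0 on success, or -1 if n
 * does not fit in 7 digits (n >= 36^7 = 78364164096).
 * Hypothetical helper for illustration only.
 */
int base36_id(unsigned long long n, char out[8])
{
    static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
    int i;

    if (n >= 78364164096ULL)
        return -1;
    for (i = 6; i >= 0; i--) {       /* fill from the least significant digit */
        out[i] = digits[n % 36];
        n /= 36;
    }
    out[7] = '\0';
    return 0;
}
```

Seven base-36 digits cover about 7.8e10 distinct values, so a per-site
counter would take a very long time to wrap.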

aardvark@cunix7.prime.com (Don Koch) (11/20/90)

In article <=8:6F68@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
...
|> Probably more because people are getting tired of the C-News message ID
|> format and are installing programs like "mkid". For example, this message
|> was posted via C news, but I'm sure the format gets put in '?'.

Which causes some 'notes' to sometimes gag on the article ID.  Definitely
a notes defect; but I guess it's one way to cut down on the flaming :-).

Just to confuse things, we use C news but have outside posters that generate
their own ids: xrn, nn and notes (rn? what's that??).

--
Don Koch
aardvark@primerd.prime.com
These are only my opinions and not necessarily those of my employer.

peter@ficc.ferranti.com (Peter da Silva) (11/20/90)

In article <1990Nov19.231927@cunix7.prime.com> aardvark@cunix7.prime.com (Don Koch) writes:
> Which causes some 'notes' to sometimes gag on the article ID. 

What characters does "notes" gag on? I've already trimmed my article ID
to allow for old-fashioned B news sites that create file names in tmp based
on the ID (L.<fxy/ba@voodoo.com> is a great file name :->), so a few more
would be no problem.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

wisner@hayes.ims.alaska.edu (Bill Wisner) (11/21/90)

>It should get put in "Message IDs which don't conform to the RFCs".
>1036 says quite clearly that the form should be "<unique@full_domain_name>"
>in order to conform with 822.

Er, Dave, <=8:6F68@xds13.ferranti.com> does look like
<unique@full_domain_name> to me...

Bill Wisner <wisner@hayes.ims.alaska.edu> Gryphon Gang Fairbanks AK 99775
"Hang it in your ear, Wisner." -- Jay Maynard <jay@splut.conmicro.com>

cudep@warwick.ac.uk (Ian Dickinson) (11/21/90)

In article <1990Nov13.215847.23684@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>Anyway, the current milestone is that last weekend (while I was away at
>Windycon, which is why you're just hearing about it now), C passed 1000.
>Undoubtedly the C News site count actually hit four digits quite some
>time ago, since we know of major C News users who seldom post anything to
>the outside world, but it's definite now.

Nice to hear that a lot of people have changed over...

But I'm going to change the Message-ID format soon since it's too long.
I expect a fair proportion of sites will end up doing this.
This will probably break your method of analysis further.

At the moment though, this certainly gives a different picture than a
version control, since generating automatic replies isn't always easy...

However, is there any chance that the patchdate could be included in
the version reply with future versions of Cnews?

The simple 'C' we get now, doesn't give much information really.

Thanks,

--
\/ato.  Ian Dickinson.      GNU's feelin' horny.       Kunst und Wahnsinn.
vato@warwick.ac.uk                Sabeq.                  Mind the gap!
vato@tardis.cs.ed.ac.uk
gdd046@cck.cov.ac.uk            "I know what you sell - I don't want to buy!"

peter@ficc.ferranti.com (Peter da Silva) (11/21/90)

Regarding my message <=8:6F68@xds13.ferranti.com>:

In article <NS|^SD|@rpi.edu> tale@rpi.edu (David C Lawrence) writes:
> It should get put in "Message IDs which don't conform to the RFCs".
> 1036 says quite clearly that the form should be "<unique@full_domain_name>"
> in order to conform with 822.

I don't understand what you're getting at. <=8:6F68@xds13.ferranti.com>
consists of a unique string followed by this system's full domain name.

% uuname -l
xds13
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

henry@zoo.toronto.edu (Henry Spencer) (11/22/90)

In article <1990Nov21.101834.15370@warwick.ac.uk> cudep@warwick.ac.uk (Ian Dickinson) writes:
>But I'm going to change the Message-ID format soon since it's too long.
>I expect a fair proportion of sites will end up doing this.

There are a significant number of people who have taken this tack, but the
number of people who've just installed it "straight" is much greater.

Shorter message-IDs are coming.

>However, is there any chance that the patchdate could be included in
>the version reply with future versions of Cnews?

It's on the "to be looked at" list.
-- 
"I don't *want* to be normal!"         | Henry Spencer at U of Toronto Zoology
"Not to worry."                        |  henry@zoo.toronto.edu   utzoo!henry

newsadm@iddth.id.dk (Nick Sandru (news adm)) (11/23/90)

tanner@cdis-1.compu.com (Dr. T. Andrews) writes:

>In article <1990Nov13.215847.23684@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>) As I believe I've mentioned before, we regularly run statistics on
>) usage of different news systems by analyzing our history file.  The
>) major news systems generate message-IDs of distinctive forms, ...
>Right, and as you note there are people out there whose news systems
>don't generate the ``expected'' message-ID form.  I cite particularly
>ours, which generates 7-digit (base 36 digits) message IDs.

We have at our site a c-news system, an nntp server and nn news readers on
the client machines. nn generates its own message-IDs to be used by
nntp's inews, so c-news doesn't generate its own ones. Only the control
messages get c-news-style message-IDs.

Long Haired Nick.

rayan@cs.toronto.edu (Rayan Zachariassen) (11/28/90)

peter@ficc.ferranti.com (Peter da Silva) writes:

>I don't understand what you're getting at. <=8:6F68@xds13.ferranti.com>
>consists of a unique string followed by this system's full domain name.

Uniqueness is a necessary but insufficient quality.  It must also use
valid syntax.  Since : is an RFC822 special character, '=8:6F68' is not
a valid local-part for the message-id.

rayan,
pedant

brad@looking.on.ca (Brad Templeton) (11/28/90)

In article <4_377B@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>Regarding my message <=8:6F68@xds13.ferranti.com>:
>
>In article <NS|^SD|@rpi.edu> tale@rpi.edu (David C Lawrence) writes:
>> It should get put in "Message IDs which don't conform to the RFCs".
>> 1036 says quite clearly that the form should be "<unique@full_domain_name>"
>> in order to conform with 822.
>
>I don't understand what you're getting at. <=8:6F68@xds13.ferranti.com>
>consists of a unique string followed by this system's full domain name.

RFC822 says that message-ids must be  <valid-domain-email-address> which
is not just <unique@domain> but <validusername@domain>, which means that
certain characters are out, like colons, commas, etc.

In fact, the set of no-nos is:

		'<>"():,; \t[]@\\

quote, angles, double quote, parens, colon, comma, semi, space, tab,
brackets, at-sign, backslash.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

Makey@Snoopy.Logicon.COM (Jeff Makey) (11/29/90)

In article <1990Nov27.230750.3478@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>RFC822 says that message-ids must be  <valid-domain-email-address> which
>is not just <unique@domain> but <validusername@domain>

Nowhere does RFC 822 (or any other relevant specification that I know
of) suggest that the local-part of a Message-Id field should be a
valid user name.  That would be rather silly, because then it would be
impossible to distinguish (using Message-Id) two different messages
from the same person.  (Boy, would *that* cut down on Usenet traffic!)

To quote most of the relevant RFC 822 syntax rules:

      msg-id      =  "<" addr-spec ">"            ; Unique message id

      addr-spec   =  local-part "@" domain        ; global address

      local-part  =  word *("." word)             ; uninterpreted
                                                  ; case-preserved

      word        =  atom / quoted-string

      atom        =  1*<any CHAR except specials, SPACE and CTLs>

      specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
                  /  "," / ";" / ":" / "\" / <">  ;  string, to use
                  /  "." / "[" / "]"              ;  within a word.

It is quite clear that any of the objectionable characters are allowed
by RFC 822 if they are quoted.
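
The atom rule quoted above can be turned into a small check.  This is a
minimal sketch that handles only the unquoted form: it accepts a local-part
made of dot-separated atoms and rejects everything else, including
quoted-strings, which a full parser would have to accept.  The function name
is_plain_local_part is made up for this illustration.

```c
#include <string.h>

/*
 * Return 1 if local is word *("." word) with every word an atom, i.e.
 * one or more characters other than specials, SPACE, and CTLs.
 * Quoted-strings are deliberately NOT handled in this sketch.
 */
int is_plain_local_part(const char *local)
{
    /* RFC 822 specials; "." is treated separately as the word separator. */
    static const char specials[] = "()<>@,;:\\\"[]";
    size_t atom_len = 0;
    const unsigned char *p;

    for (p = (const unsigned char *)local; *p != '\0'; p++) {
        if (*p == '.') {
            if (atom_len == 0)
                return 0;            /* empty word before the dot */
            atom_len = 0;
        } else if (*p <= ' ' || *p >= 127 || strchr(specials, *p) != NULL) {
            return 0;                /* CTL, SPACE, DEL/non-ASCII, or special */
        } else {
            atom_len++;
        }
    }
    return atom_len > 0;             /* reject empty input or a trailing dot */
}
```

By this check the local part "1990Nov13.215847.23684" passes, while
"=8:6F68" fails because of the unquoted colon.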

                           :: Jeff Makey

Department of Tautological Pleonasms and Superfluous Redundancies Department
    Disclaimer: All opinions are strictly those of the author.
    Domain: Makey@Logicon.COM    UUCP: {ucsd,nosc}!snoopy!Makey

henry@zoo.toronto.edu (Henry Spencer) (11/29/90)

In article <846@Snoopy.Logicon.COM> Makey@Snoopy.Logicon.COM (Jeff Makey) writes:
>It is quite clear that any of the objectionable characters are allowed
>by RFC 822 if they are quoted.

Be careful, however, because RFC1036 puts a few extra restrictions on for
news.  (Notably, ">" cannot appear in the message ID no matter what sort
of games you play with quotes.)
-- 
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry

urlichs@smurf.sub.org (Matthias Urlichs) (11/29/90)

In news.software.b, article <1990Nov27.230750.3478@looking.on.ca>,
  brad@looking.on.ca (Brad Templeton) writes:
< 
< In fact, the set of no-nos is:
< 
< 		'<>"():,; \t[]@\\
< 
Unfortunately, some of these work in message-IDs quite well and without any
problems whatever (e.g. colon, brackets), while e.g. a slash / would be
valid except that it is the separator for UNIX path names.

-- 
Matthias Urlichs -- urlichs@smurf.sub.org -- urlichs@smurf.ira.uka.de     /(o\
Humboldtstrasse 7 - 7500 Karlsruhe 1 - FRG -- +49+721+621127(0700-2330)   \o)/

peter@ficc.ferranti.com (Peter da Silva) (11/29/90)

In article <1990Nov27.230750.3478@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
> RFC822 says that message-ids must be  <valid-domain-email-address> which
> is not just <unique@domain> but <validusername@domain>, which means that
> certain characters are out, like colons, commas, etc.

Valid user name? That sort of implies you should be able to handle mail
directed to the message ID... hmmm...

> [no] quote, angles, double quote, parens, colon, comma, semi, space, tab,
> brackets, at-sign, backslash.

And, for the sake of B News sites that create temp files that match the
message ID for locking, it's a good idea to eschew slashes...

Anyway, I'm compliant now... you want a copy of this version of mkid?
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com 

lear@turbo.bio.net (Eliot) (11/30/90)

brad@looking.on.ca (Brad Templeton) writes:

>RFC822 says that message-ids must be  <valid-domain-email-address> which
>is not just <unique@domain> but <validusername@domain>, which means that
>certain characters are out, like colons, commas, etc.

RFC 822 states exactly the following:

     4.6.1.  MESSAGE-ID / RESENT-MESSAGE-ID

             This field contains a unique identifier  (the  local-part
        address  unit)  which  refers to THIS version of THIS message.
        The uniqueness of the message identifier is guaranteed by  the
        host  which  generates  it.  This identifier is intended to be
        machine readable and not necessarily meaningful to humans.   A
        message  identifier pertains to exactly one instantiation of a
        particular message; subsequent revisions to the message should
        each receive new message identifiers.

In addition,

     msg-id      =  "<" addr-spec ">"            ; Unique message id
     addr-spec   =  local-part "@" domain        ; global address
     local-part  =  word *("." word)             ; uninterpreted
                                                 ; case-preserved
     word        =  atom / quoted-string
     quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                 ;   quoted chars.

     qtext       =  <any CHAR excepting <">,     ; => may be folded
                     "\" & CR, and including
                     linear-white-space>

So basically, the only characters that cannot appear unescaped in a quoted
local part of a message id are <">, "\" and CR.
-- 
Eliot Lear
[lear@turbo.bio.net]

brad@looking.on.ca (Brad Templeton) (11/30/90)

In article <gr#mg2.r[1@smurf.sub.org> urlichs@smurf.sub.org (Matthias Urlichs) writes:
>In news.software.b, article <1990Nov27.230750.3478@looking.on.ca>,
>  brad@looking.on.ca (Brad Templeton) writes:
>< 
>< In fact, the set of no-nos is:
>< 
>< 		'<>"():,; \t[]@\\
>< 
>Unfortunately, some of these work in message-IDs quite well and without any
>problems whatever (e.g. colon, brackets), while e.g. a slash / would be
>valid except that it is the separator for UNIX path names.

They "work" in message-ids, that's true.   In the sense that most of the
current software allows them.  But they are not valid according to the
standard -- except when quoted, as somebody correctly pointed out.

Colons are an example of something you're wrong on.  I used to put colons in
message-ids, but I got a couple of complaints.   I still put in single quotes
without complaint, but I am removing this in the next version of my ClariNet
message processor.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

aardvark@cunix7.prime.com (Don Koch) (11/30/90)

In article <92373K5@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
|> In article <1990Nov19.231927@cunix7.prime.com> aardvark@cunix7.prime.com (Don Koch) writes:
|> > Which causes some 'notes' to sometimes gag on the article ID. 
|> 
|> What characters does "notes" gag on? I've already trimmed my article ID
|> to allow for old-fashioned B news sites that create file names in tmp based
|> on the ID (L.<fxy/ba@voodoo.com> is a great file name :->), so a few more
|> would be no problem.
|> -- 
|> Peter da Silva.   `-_-'
|> +1 713 274 5180.   'U`
|> peter@ferranti.com 

Notes looks for a number, in which your given example has none.  It doesn't
generate a file name from it, it just stores the number in its internal
database.  Besides, L.<fxy/ba@voodoo.com> is a lousy filename for a Unix
based system: the slash isn't allowed.  It's also a lousy article ID with
the L. hanging off the front of it.  (I won't mention the other disallowed
characters due to various RFCs since that's been hashed over several times
already.  Too bad worm-cans don't have consumer warning labels on them. :-))

Peter, you've got to learn to pick better examples :-).

I'll grant that notes isn't great at handling all article IDs.  It can handle
C News ID, though.

--
Don Koch
aardvark@primerd.prime.com
These are only my opinions and not necessarily those of my employer.
"alt - the Fox TV of Netnews." -me

cudep@warwick.ac.uk (Ian Dickinson) (12/07/90)

In article <KHW7.C@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>Anyway, I'm compliant now... you want a copy of this version of mkid?

Yes please, Peter.
--
\/ato.  Ian Dickinson.      GNU's feelin' horny.    Send your dollars, Homeboy,
vato@warwick.ac.uk                Sabeq.            I'm a Pink Boy for "Bob"
vato@tardis.cs.ed.ac.uk
gdd046@cck.cov.ac.uk  I live a life of `going to' and I'll die with nuthin done

amanda@visix.com (Amanda Walker) (12/11/90)

Here's what I use to generate C News message IDs:

--------

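#include <stdio.h>
#include <time.h>
#include <unistd.h>     /* getpid() */

char charset[] = "abcdefghijklmnopqrstuvwxyz012345";    /* 32 "digits" */

int main(void)
{
        /* Use only the low 32 bits of the time and 16 bits of the PID. */
        unsigned long t = (unsigned long)time(NULL) & 0xFFFFFFFFUL;
        unsigned short p = (unsigned short)(getpid() & 0xFFFF);

        /* 5 bits per character: six for the time, three for the PID,
           and a final one holding the 3 bits left over. */
        printf("%c%c%c%c%c%c%c%c%c%c",
                charset[(t >> 25) & 31],
                charset[(t >> 20) & 31],
                charset[(t >> 15) & 31],
                charset[(t >> 10) & 31],
                charset[(t >>  5) & 31],
                charset[t & 31],
                charset[(p >> 10) & 31],
                charset[(p >> 5) & 31],
                charset[p & 31],
                charset[((t >> 30) & 3) + ((p >> 13) & 4)]); /* extra bits */
        return 0;
}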

--------

It assumes that the time is 32 bits, and that the PID is 16, but
changing it to cope with other sizes is trivial.

It's not quite a minimum number of characters, but 10 characters seems close
enough for Usenet, and they're nice well-behaved alphanumeric strings.


Amanda Walker
Visix Software Inc.
-- 
"I have never seen anything fill up a vacuum so fast and still suck."
		--Rob Pike commenting on the X Window System

brad@looking.on.ca (Brad Templeton) (12/13/90)

Well, hey, if you want to get really small, you can use a 64 char safe set
without problems (letters, digits, dash and dot) and also divide the time by
64, unless you suspect your system will reboot and reassign your pid within
64 seconds.  In fact, if it takes over 64 seconds to boot, you could take only
the lower 11 bits of the process number, unless your system will fork 2K
processes per minute.   Subtract today's date from the timestamp and you will
also keep within 20 bits on the date for the next couple of years.   Thus 6
chars will do it for as long as your software exists.   One could get even
smaller.
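
A minimal sketch of that packing in C, under the stated assumptions: the
time is taken relative to a fixed base date and divided by 64 to fit in 20
bits, only the low 11 bits of the pid are kept, and the resulting 31 bits
are written as 6 characters of 6 bits each.  The function name short_id, the
parameter names, and the ordering of the character set are made up for
illustration.

```c
/* 64-character set: letters, digits, dash and dot. */
static const char safe64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-.";

/*
 * Pack (now - base) / 64 into 20 bits and the low 11 bits of pid into
 * the next 11, then emit the 31-bit value as 6 characters.
 * out must have room for 7 bytes.  Hypothetical helper, not from any
 * existing news software.
 */
void short_id(unsigned long now, unsigned long base, unsigned long pid,
              char out[7])
{
    unsigned long ticks = ((now - base) / 64) & 0xFFFFFUL;   /* 20 bits */
    unsigned long v = (ticks << 11) | (pid & 0x7FFUL);       /* 31 bits */
    int i;

    for (i = 5; i >= 0; i--) {       /* least significant 6 bits go last */
        out[i] = safe64[v & 63];
        v >>= 6;
    }
    out[6] = '\0';
}
```

Twenty bits of 64-second intervals last 2^20 * 64 seconds, roughly 2.1
years, before wrapping.  One caveat: with "." in the set the result can
start or end with a dot or contain "..", which the unquoted local-part
syntax quoted earlier in the thread does not allow, so swapping "." for
another non-special character such as "_" may be safer.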


Actually "guaranteed unique short ascii string" might be a handy standard
library routine to have around, to use in place of all the getpid() file
names in the world.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473