henry@zoo.toronto.edu (Henry Spencer) (11/14/90)
As I believe I've mentioned before, we regularly run statistics on usage of different news systems by analyzing our history file. The major news systems generate message-IDs of distinctive forms, so it's not hard to get an idea of how many people are running which system. This counting approach has flaws -- in particular, it obviously tends to miss sites that seldom post anything to network-wide newsgroups -- but it has the enormous advantage that it's a quick, cheap, local operation, so we can do it weekly for keeping running track of the situation.

The general patterns in recent times have been fairly consistent. We categorize news systems into B, C, and ?, the last being message-IDs which fit no known format. ? has been growing rapidly, probably as a result of more and more inter-network gatewaying; there is no single dominant pattern among the ? message-IDs, but a lot of them are clearly the results of gatewaying. B has been declining very slowly. And C has been growing steadily, with definite signs that the growth is accelerating.

Anyway, the current milestone is that last weekend (while I was away at Windycon, which is why you're just hearing about it now), C passed 1000. Undoubtedly the C News site count actually hit four digits quite some time ago, since we know of major C News users who seldom post anything to the outside world, but it's definite now.
--
"I don't *want* to be normal!" | Henry Spencer at U of Toronto Zoology
"Not to worry." | henry@zoo.toronto.edu utzoo!henry
rk@theep.boston.ma.us (Robert A. Kukura) (11/14/90)
In article <1990Nov13.215847.23684@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>As I believe I've mentioned before, we regularly run statistics on usage
>of different news systems by analyzing our history file. The major news
>systems generate message-IDs of distinctive forms, so it's not hard to
>get an idea of how many people are running which system. This counting
>approach has flaws -- in particular, it obviously tends to miss sites
>that seldom post anything to network-wide newsgroups -- but it has the
>enormous advantage that it's a quick, cheap, local operation, so we can
>do it weekly for keeping running track of the situation.
>
>The general patterns in recent times have been fairly consistent. We
>categorize news systems into B, C, and ?, the last being message-IDs
>which fit no known format. ? has been growing rapidly, probably as a
>result of more and more inter-network gatewaying; there is no single
>dominant pattern among the ? message-IDs, but a lot of them are clearly
>the results of gatewaying. B has been declining very slowly. And C
>has been growing steadily, with definite signs that the growth is
>accelerating.

For one, you are probably missing sites where the emacs gnus
newsreader is used because it generates its own ids of the form:
Message-ID: <RK.90Nov14085626@theep.boston.ma.us>
I don't know if any other newsreaders/posters generate their own
message ids.

>Anyway, the current milestone is that last weekend (while I was away at
>Windycon, which is why you're just hearing about it now), C passed 1000.
>Undoubtedly the C News site count actually hit four digits quite some
>time ago, since we know of major C News users who seldom post anything to
>the outside world, but it's definite now.

Congratulations.

>--
>"I don't *want* to be normal!" | Henry Spencer at U of Toronto Zoology
>"Not to worry." | henry@zoo.toronto.edu utzoo!henry
--
-Bob Kukura internet: rk@theep.boston.ma.us
uucp: spdcc!theep!rk
peter@ficc.ferranti.com (Peter da Silva) (11/15/90)
In article <1990Nov13.215847.23684@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
> The general patterns in recent times have been fairly consistent. We
> categorize news systems into B, C, and ?, the last being message-IDs
> which fit no known format. ? has been growing rapidly, probably as a
> result of more and more inter-network gatewaying;

Probably more because people are getting tired of the C-News message ID format and are installing programs like "mkid". For example, this message was posted via C news, but I'm sure the format gets put in '?'.
--
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com
tanner@cdis-1.compu.com (Dr. T. Andrews) (11/16/90)
In article <1990Nov13.215847.23684@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
) As I believe I've mentioned before, we regularly run statistics on
) usage of different news systems by analyzing our history file. The
) major news systems generate message-IDs of distinctive forms, ...
Right, and as you note there are people out there whose news systems
don't generate the ``expected'' message-ID form. I cite particularly
ours, which generates 7-digit (base 36 digits) message IDs.
We did not find the default C news message-IDs, which read like
Russian novels, suitable. The program "seq.c" which generates
these message-IDs is available as a ``shar'' file upon e-mailed
application to me at the address shown above. Act now, and receive
at no extra charge the patch to the "anne.jones" script to use
this program.
aardvark@cunix7.prime.com (Don Koch) (11/20/90)
In article <=8:6F68@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
...
|> Probably more because people are getting tired of the C-News message ID
|> format and are installing programs like "mkid". For example, this message
|> was posted via C news, but I'm sure the format gets put in '?'.

Which causes some 'notes' to sometimes gag on the article ID. Definitely a notes defect; but I guess it's one way to cut down on the flaming :-).

Just to confuse things, we use C news but have outside posters that generate their own ids: xrn, nn and notes (rn? what's that??).
--
Don Koch aardvark@primerd.prime.com
These are only my opinions and not necessarily those of my employer.
peter@ficc.ferranti.com (Peter da Silva) (11/20/90)
In article <1990Nov19.231927@cunix7.prime.com> aardvark@cunix7.prime.com (Don Koch) writes:
> Which causes some 'notes' to sometimes gag on the article ID.

What characters does "notes" gag on? I've already trimmed my article ID to allow for old-fashioned B news sites that create file names in tmp based on the ID (L.<fxy/ba@voodoo.com> is a great file name :->), so a few more would be no problem.
--
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com
wisner@hayes.ims.alaska.edu (Bill Wisner) (11/21/90)
>It should get put in "Message IDs which don't conform to the RFCs".
>1036 says quite clearly that the form should be "<unique@full_domain_name>"
>in order to conform with 822.

Er, Dave, <=8:6F68@xds13.ferranti.com> does look like <unique@full_domain_name> to me...

Bill Wisner <wisner@hayes.ims.alaska.edu> Gryphon Gang Fairbanks AK 99775
"Hang it in your ear, Wisner." -- Jay Maynard <jay@splut.conmicro.com>
cudep@warwick.ac.uk (Ian Dickinson) (11/21/90)
In article <1990Nov13.215847.23684@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>Anyway, the current milestone is that last weekend (while I was away at
>Windycon, which is why you're just hearing about it now), C passed 1000.
>Undoubtedly the C News site count actually hit four digits quite some
>time ago, since we know of major C News users who seldom post anything to
>the outside world, but it's definite now.

Nice to hear that a lot of people have changed over...

But I'm going to change the Message-ID format soon since it's too long. I expect a fair proportion of sites will end up doing this. This will probably break your method of analysis further.

At the moment though, this certainly gives a different picture than a version control message, since generating automatic replies isn't always easy...

However, is there any chance that the patchdate could be included in the version reply with future versions of Cnews? The simple 'C' we get now doesn't give much information really.

Thanks,
--
\/ato. Ian Dickinson. GNU's feelin' horny. Kunst und Wahnsinn. vato@warwick.ac.uk Sabeq. Mind the gap! vato@tardis.cs.ed.ac.uk gdd046@cck.cov.ac.uk "I know what you sell - I don't want to buy!"
peter@ficc.ferranti.com (Peter da Silva) (11/21/90)
Regarding my message <=8:6F68@xds13.ferranti.com>:

In article <NS|^SD|@rpi.edu> tale@rpi.edu (David C Lawrence) writes:
> It should get put in "Message IDs which don't conform to the RFCs".
> 1036 says quite clearly that the form should be "<unique@full_domain_name>"
> in order to conform with 822.

I don't understand what you're getting at. <=8:6F68@xds13.ferranti.com> consists of a unique string followed by this system's full domain name.

% uuname -l
xds13
--
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com
henry@zoo.toronto.edu (Henry Spencer) (11/22/90)
In article <1990Nov21.101834.15370@warwick.ac.uk> cudep@warwick.ac.uk (Ian Dickinson) writes:
>But I'm going to change the Message-ID format soon since it's too long.
>I expect a fair proportion of sites will end up doing this.

There are a significant number of people who have taken this tack, but the number of people who've just installed it "straight" is much greater. Shorter message-IDs are coming.

>However, is there any chance that the patchdate could be included in
>the version reply with future versions of Cnews?

It's on the "to be looked at" list.
--
"I don't *want* to be normal!" | Henry Spencer at U of Toronto Zoology
"Not to worry." | henry@zoo.toronto.edu utzoo!henry
newsadm@iddth.id.dk (Nick Sandru (news adm)) (11/23/90)
tanner@cdis-1.compu.com (Dr. T. Andrews) writes:
>In article <1990Nov13.215847.23684@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>) As I believe I've mentioned before, we regularly run statistics on
>) usage of different news systems by analyzing our history file. The
>) major news systems generate message-IDs of distinctive forms, ...
>Right, and as you note there are people out there whose news systems
>don't generate the ``expected'' message-ID form. I cite particularly
>ours, which generates 7-digit (base 36 digits) message IDs.

We have at our site a c-news system, an nntp server and nn news readers on the client machines. nn generates its own message-IDs to be used by nntp's inews, so c-news doesn't generate its own. Only the control messages get c-news-style message-IDs.

Long Haired Nick.
rayan@cs.toronto.edu (Rayan Zachariassen) (11/28/90)
peter@ficc.ferranti.com (Peter da Silva) writes:
>I don't understand what you're getting at. <=8:6F68@xds13.ferranti.com>
>consists of a unique string followed by this system's full domain name.

Uniqueness is a necessary but insufficient quality. It must also use valid syntax. Since : is an RFC822 special character, '=8:6F68' is not a valid local-part for the message-id.

rayan, pedant
brad@looking.on.ca (Brad Templeton) (11/28/90)
In article <4_377B@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>Regarding my message <=8:6F68@xds13.ferranti.com>:
>
>In article <NS|^SD|@rpi.edu> tale@rpi.edu (David C Lawrence) writes:
>> It should get put in "Message IDs which don't conform to the RFCs".
>> 1036 says quite clearly that the form should be "<unique@full_domain_name>"
>> in order to conform with 822.
>
>I don't understand what you're getting at. <=8:6F68@xds13.ferranti.com>
>consists of a unique string followed by this system's full domain name.

RFC822 says that message-ids must be <valid-domain-email-address> which is not just <unique@domain> but <validusername@domain>, which means that certain characters are out, like colons, commas, etc.

In fact, the set of no-nos is:

'<>"():,; \t[]@\\

quote, angles, double quote, parens, colon, comma, semi, space, tab, brackets, at-sign, backslash.
--
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
Makey@Snoopy.Logicon.COM (Jeff Makey) (11/29/90)
In article <1990Nov27.230750.3478@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>RFC822 says that message-ids must be <valid-domain-email-address> which
>is not just <unique@domain> but <validusername@domain>

Nowhere does RFC 822 (or any other relevant specification that I know of) suggest that the local-part of a Message-Id field should be a valid user name. That would be rather silly, because then it would be impossible to distinguish (using Message-Id) two different messages from the same person. (Boy, would *that* cut down on Usenet traffic!)

To quote most of the relevant RFC 822 syntax rules:

     msg-id      =  "<" addr-spec ">"            ; Unique message id
     addr-spec   =  local-part "@" domain        ; global address
     local-part  =  word *("." word)             ; uninterpreted
                                                 ; case-preserved
     word        =  atom / quoted-string
     atom        =  1*<any CHAR except specials, SPACE and CTLs>
     specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
                 /  "," / ";" / ":" / "\" / <">  ;  string, to use
                 /  "." / "[" / "]"              ;  within a word.

It is quite clear that any of the objectionable characters are allowed by RFC 822 if they are quoted.

::
Jeff Makey
Department of Tautological Pleonasms and Superfluous Redundancies Department
Disclaimer: All opinions are strictly those of the author.
Domain: Makey@Logicon.COM UUCP: {ucsd,nosc}!snoopy!Makey
henry@zoo.toronto.edu (Henry Spencer) (11/29/90)
In article <846@Snoopy.Logicon.COM> Makey@Snoopy.Logicon.COM (Jeff Makey) writes:
>It is quite clear that any of the objectionable characters are allowed
>by RFC 822 if they are quoted.

Be careful, however, because RFC1036 puts a few extra restrictions on for news. (Notably, ">" cannot appear in the message ID no matter what sort of games you play with quotes.)
--
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu utzoo!henry
urlichs@smurf.sub.org (Matthias Urlichs) (11/29/90)
In news.software.b, article <1990Nov27.230750.3478@looking.on.ca>,
brad@looking.on.ca (Brad Templeton) writes:
<
< In fact, the set of no-nos is:
<
< '<>"():,; \t[]@\\
<
Unfortunately, some of these work in message-IDs quite well and without any
problems whatever (e.g. colon, brackets), while e.g. a slash / would be
valid except that it is the separator for UNIX path names.
--
Matthias Urlichs -- urlichs@smurf.sub.org -- urlichs@smurf.ira.uka.de /(o\
Humboldtstrasse 7 - 7500 Karlsruhe 1 - FRG -- +49+721+621127(0700-2330) \o)/
peter@ficc.ferranti.com (Peter da Silva) (11/29/90)
In article <1990Nov27.230750.3478@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
> RFC822 says that message-ids must be <valid-domain-email-address> which
> is not just <unique@domain> but <validusername@domain>, which means that
> certain characters are out, like colons, commas, etc.

Valid user name? That sort of implies you should be able to handle mail directed to the message ID... hmmm...

> [no] quote, angles, double quote, parens, colon, comma, semi, space, tab,
> brackets, at-sign, backslash.

And, for the sake of B News sites that create temp files that match the message ID for locking, it's a good idea to eschew slashes...

Anyway, I'm compliant now... you want a copy of this version of mkid?
--
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com
lear@turbo.bio.net (Eliot) (11/30/90)
brad@looking.on.ca (Brad Templeton) writes:
>RFC822 says that message-ids must be <valid-domain-email-address> which
>is not just <unique@domain> but <validusername@domain>, which means that
>certain characters are out, like colons, commas, etc.

RFC 822 states exactly the following:

     4.6.1.  MESSAGE-ID / RESENT-MESSAGE-ID

             This field contains a unique identifier (the local-part
        address unit) which refers to THIS version of THIS message.
        The uniqueness of the message identifier is guaranteed by the
        host which generates it.  This identifier is intended to be
        machine readable and not necessarily meaningful to humans.  A
        message identifier pertains to exactly one instantiation of a
        particular message; subsequent revisions to the message should
        each receive new message identifiers.

In addition,

     msg-id      =  "<" addr-spec ">"            ; Unique message id
     addr-spec   =  local-part "@" domain        ; global address
     local-part  =  word *("." word)             ; uninterpreted
                                                 ; case-preserved
     word        =  atom / quoted-string
     quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                 ;   quoted chars.
     qtext       =  <any CHAR excepting <">,     ; => may be folded
                     "\" & CR, and including
                     linear-white-space>

So basically, the only characters that cannot be in the local part of a message id are <">, "\" and CR.
--
Eliot Lear [lear@turbo.bio.net]
brad@looking.on.ca (Brad Templeton) (11/30/90)
In article <gr#mg2.r[1@smurf.sub.org> urlichs@smurf.sub.org (Matthias Urlichs) writes:
>In news.software.b, article <1990Nov27.230750.3478@looking.on.ca>,
> brad@looking.on.ca (Brad Templeton) writes:
><
>< In fact, the set of no-nos is:
><
>< '<>"():,; \t[]@\\
><
>Unfortunately, some of these work in message-IDs quite well and without any
>problems whatever (e.g. colon, brackets), while e.g. a slash / would be
>valid except that it is the separator for UNIX path names.

They "work" in message-ids, that's true, in the sense that most of the current software allows them. But they are not valid according to the standard -- except when quoted, as somebody correctly pointed out.

Colons are an example of something you're wrong on. I used to put colons in message-ids, but I got a couple of complaints. I still put in single quotes without complaint, but I am removing this in the next version of my ClariNet message processor.
--
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
aardvark@cunix7.prime.com (Don Koch) (11/30/90)
In article <92373K5@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
|> In article <1990Nov19.231927@cunix7.prime.com> aardvark@cunix7.prime.com (Don Koch) writes:
|> > Which causes some 'notes' to sometimes gag on the article ID.
|>
|> What characters does "notes" gag on? I've already trimmed my article ID
|> to allow for old-fashioned B news sites that create file names in tmp based
|> on the ID (L.<fxy/ba@voodoo.com> is a great file name :->), so a few more
|> would be no problem.
|> --
|> Peter da Silva.   `-_-'
|> +1 713 274 5180.   'U`
|> peter@ferranti.com

Notes looks for a number, and your example has none. It doesn't generate a file name from it; it just stores the number in its internal database.

Besides, L.<fxy/ba@voodoo.com> is a lousy filename for a Unix based system: the slash isn't allowed. It's also a lousy article ID with the L. hanging off the front of it. (I won't mention the other disallowed characters due to various RFCs since that's been hashed over several times already. Too bad worm-cans don't have consumer warning labels on them. :-))

Peter, you've got to learn to pick better examples :-).

I'll grant that notes isn't great at handling all article IDs. It can handle C News IDs, though.
--
Don Koch aardvark@primerd.prime.com
These are only my opinions and not necessarily those of my employer.
"alt - the Fox TV of Netnews." -me
cudep@warwick.ac.uk (Ian Dickinson) (12/07/90)
In article <KHW7.C@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>Anyway, I'm compliant now... you want a copy of this version of mkid?

Yes please, Peter.
--
\/ato. Ian Dickinson. GNU's feelin' horny. Send your dollars, Homeboy, vato@warwick.ac.uk Sabeq. I'm a Pink Boy for "Bob" vato@tardis.cs.ed.ac.uk gdd046@cck.cov.ac.uk I live a life of `going to' and I'll die with nuthin done
amanda@visix.com (Amanda Walker) (12/11/90)
Here's what I use to generate C News message IDs:
--------
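#include <stdio.h>
#include <time.h>
#include <unistd.h>     /* getpid() */

char charset[] = "abcdefghijklmnopqrstuvwxyz012345"; /* 32 "digits" */

int main(void)
{
    unsigned long t;
    unsigned int p;

    t = (unsigned long)time(NULL);        /* time() takes a time_t *, so don't pass &t */
    p = (unsigned int)getpid() & 0xFFFF;  /* low 16 bits of the PID */
    printf("%c%c%c%c%c%c%c%c%c%c\n",
        charset[(t >> 25) & 31],          /* time bits 25-29 */
        charset[(t >> 20) & 31],
        charset[(t >> 15) & 31],
        charset[(t >> 10) & 31],
        charset[(t >> 5) & 31],
        charset[t & 31],                  /* time bits 0-4 */
        charset[(p >> 10) & 31],          /* PID bits 10-14 */
        charset[(p >> 5) & 31],
        charset[p & 31],                  /* PID bits 0-4 */
        charset[((t >> 30) & 3) + ((p >> 13) & 4)]); /* time bits 30-31, PID bit 15 */
    return 0;
}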
--------
It assumes that the time is 32 bits, and that the PID is 16, but
changing it to cope with other sizes is trivial.
It's not quite a minimum number of characters, but 10 characters seems close
enough for Usenet, and they're nice well-behaved alphanumeric strings.
Amanda Walker
Visix Software Inc.
--
"I have never seen anything fill up a vacuum so fast and still suck."
--Rob Pike commenting on the X Window System
brad@looking.on.ca (Brad Templeton) (12/13/90)
Well, hey, if you want to get really small, you can use a 64 char safe set without problems (letters, digits, dash and dot) and also divide the time by 64, unless you suspect your system will reboot and assign your pid in 60 seconds.

In fact, if it takes over 64 seconds to boot, you could take only the lower 11 bits of the process number, unless your system will fork 2K processes per minute. Subtract today from your date and you will also keep within 20 bits on the date for the next couple of years. Thus 6 chars will do it for as long as your software exists. One could get even smaller.

Actually "guaranteed unique short ascii string" might be a handy standard library routine to have around, to use in place of all the getpid() file names in the world.
--
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473