[news.admin] Message-ID bugs in nntp and netnews?

cfe+@andrew.cmu.edu (Craig F. Everhart) (04/12/88)

I happened to notice the message from der Mouse arguing that Message-IDs aren't
supposed to be case-sensitive.  We both agreed that according to RFC822,
they're case-sensitive (well, the part to the left of the @-sign is
case-sensitive).  Here are parts of an interchange we had on the subject.

> Date: Sat, 9 Apr 88 02:00:17 EDT
> From: der Mouse  <mouse@Larry.McRCIM.McGill.EDU>
> To: cfe+@andrew.cmu.edu
> Subject: Re: Message-IDs: how they're built

> > Yes, we running andrew.cmu.edu are under the impression that
> > Message-IDs are case-sensitive.  I can find you chapter and verse in
> > RFC822 if you like.

> I've found it myself.  (I was surprised by this when I noticed it in
> nntp.  Struck me as a bit of a misfeature, and now you tell me it's an
> out-and-out bug.)

> > Does netnews not conform?

> Well, our copy of 2.10.3 doesn't.  And I haven't touched the history
> file code.

> > How does it differ in this regard?

> When accessing the dbm-format history file, Message-IDs are lowercased.
> The text history file contains the Message-ID in the original case, but
> the dbm file contains it lowercased, and on lookup of course it is
> lowercased as well.

> > What might suffer if two different messages, with message-ID fields
> > differing only in case, hit the netnews distribution mechanism?

> Systems that have this bug, of which there are doubtless many, would
> reject whichever article they receive later, under the impression
> they've already seen it.  It will thus neither appear at nor propagate
> through such sites.

Thus, there's a clear problem.  We read the standards as permitting
case-sensitive local-parts in Message-IDs (even if the domain part can be
case-folded), yet widely distributed software doesn't follow that convention.

How about if Netnews software only lowercased the @domain part of message-IDs
instead of the whole thing?  This would be correct, and would also deal with
the principal source of varying case in mail headers: varying capitalizations
of the same domain name.
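
A minimal sketch of what "lowercase only the @domain part" could look like in
C.  The name fold_domain() is made up for illustration and is not taken from
the netnews source; the sketch assumes the ID is a NUL-terminated string of the
form <local-part@domain> and treats everything after the last '@' as the
domain.

```c
#include <ctype.h>
#include <string.h>

/*
 * Lowercase only the part of a Message-ID that follows the last '@',
 * leaving the local part untouched.  Works in place.
 * fold_domain() is a hypothetical helper, not a netnews routine.
 */
void fold_domain(char *msgid)
{
    char *p = strrchr(msgid, '@');

    if (p == NULL)
        return;                 /* no domain part: leave the ID alone */
    for (p++; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
}
```

Applied to <IWMHjSy00VsLM4h7Rc@Andrew.CMU.EDU>, this would leave the local part
as it is and turn the domain into andrew.cmu.edu.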

As you can probably tell, Andrew message-IDs are a bunch of bits (time,
composing machine's IP address, other stuff) spelled out in a large-base number
(currently it's a base-64 number).  If the problem is serious enough, we could
switch to a smaller alphabet that doesn't require that upper and lower case be
treated distinctly.  This would, of course, make the message-IDs longer, and no
more transparently decodable.
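
To put a rough number on "longer", here is a small sketch that counts the
digits needed to spell a value in a given base.  The 64-bit value is only an
assumption for illustration (the actual number of bits in an Andrew Message-ID
isn't stated here), and digits_in_base() is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Count how many digits are needed to write `value` in the given base
 * (base must be at least 2).  digits_in_base() is a hypothetical helper.
 */
unsigned digits_in_base(uint64_t value, unsigned base)
{
    unsigned n = 1;

    while (value >= base) {
        value /= base;
        n++;
    }
    return n;
}
```

For a full 64-bit value (UINT64_MAX) this gives 11 digits in base 64 and 13
digits in base 32 or base 36, so a single-case alphabet would cost a couple of
extra characters per 64 bits encoded.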

We went to the base-64 scheme a year or so ago because our Message-IDs were too
long for some old netnews systems (longer than 64 characters), so we're a
little reluctant to make yet another change (to conform to netnews
requirements) without knowing if there are additional (#$@*%*) constraints that
we'd also have to meet.

I'm interested in hearing about (a) any additional constraints, other than
what's already in RFC822, (b) whether there's hope of getting ``the current''
news software fixed if it's indeed broken in this way, and (c) how far one
might realistically hope such fixes might propagate, and how soon.

If there's a better forum for this discussion, I'm happy to be educated.

                Thanks,
                Craig Everhart
                Andrew message system
                Internet: cfe+@andrew.cmu.edu

eggert@sea.sm.unisys.com (Paul Eggert) (04/13/88)

In article <IWMHjSy00VsLM4h7Rc@andrew.cmu.edu> cfe+@andrew.cmu.edu (Craig F.
Everhart) writes:

	We went to the base-64 scheme a year or so ago because our Message-IDs
	were too long for some old netnews systems....  I'm interested in
	hearing about ... any additional constraints, other than what's
	already in RFC822.

Beware "%".  We use base 90 here, employing the following ASCII characters
(spaces denote ASCII characters that we don't use):

	 !"#$ &'()*+,-./0123456789:; = ?
	 ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
	`abcdefghijklmnopqrstuvwxyz{|}~

The following printable characters were omitted:

	<,@,>	part of Message-ID syntax
	%	tickles a news.2.11 bug: a printf(x) should be a printf("%s",x)

Perhaps < need not be omitted?  I lack RFC822.  --paul
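
For anyone unfamiliar with the "%" hazard mentioned above, a short sketch of
the general pattern in C.  This is not the actual news 2.11 code;
format_header() is a hypothetical helper, and snprintf() stands in for
whichever printf-family call is involved.

```c
#include <stdio.h>

/*
 * Format a header value into buf.  The commented-out call shows the
 * unsafe pattern: it treats the value as a format string, so a "%s"
 * or "%d" inside a Message-ID would make the call read arguments that
 * were never passed.  format_header() is a hypothetical helper.
 */
int format_header(char *buf, size_t size, const char *value)
{
    /* return snprintf(buf, size, value);      -- unsafe */
    return snprintf(buf, size, "%s", value);   /* safe: value is data */
}
```

With the "%s" form, a Message-ID containing percent signs is copied through
unchanged.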

cfe+@andrew.cmu.edu (Craig F. Everhart) (04/13/88)

Thanks for the reply, and for an example of another system that assumes that
its Message-IDs won't be case-folded.

(By the way, Eric Raymond tells me that ``3.0 never lowercases or otherwise
modifies article-IDs. The length limit is presently 64, but could be changed
simply by tweaking a #define,'' so that suggests that the future won't
necessarily have this problem.)

RFC822 (and at least the ucbvax gateway) prohibits the following characters,
besides control characters, space, and rubout:
        "(),.:;<>@[\]
Looks like you are trying to use most of them (in particular:
        "(),.:;[\]
), so I'd recommend that you pull those last 10 characters and stick with a
base-80 representation.
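
As a check on the arithmetic, a sketch that builds the reduced alphabet:
printable ASCII (excluding space) minus the RFC822 characters listed above and
minus "%".  build_alphabet() is a hypothetical helper, and the code assumes an
ASCII character set.

```c
#include <stddef.h>
#include <string.h>

/*
 * Fill `out` (at least 95 bytes) with the printable ASCII characters
 * that remain after dropping the RFC822 specials listed above plus '%',
 * and return how many were kept.  build_alphabet() is hypothetical.
 */
size_t build_alphabet(char *out)
{
    static const char excluded[] = "\"(),.:;<>@[\\]%";
    size_t n = 0;
    int c;

    for (c = '!'; c <= '~'; c++) {      /* printable, excluding space */
        if (strchr(excluded, c) == NULL)
            out[n++] = (char)c;
    }
    out[n] = '\0';
    return n;
}
```

There are 94 printable non-space characters and 14 are excluded, so the
function returns 80.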

                Craig Everhart
                Andrew message system

rees@apollo.uucp (Jim Rees) (04/16/88)

    I happened to notice the message from der Mouse arguing that Message-IDs aren't
    supposed to be case-sensitive.  We both agreed that according to RFC822,
    they're case-sensitive (well, the part to the left of the @-sign is
    case-sensitive).  Here are parts of an interchange we had on the subject.

RFC822 is not necessarily the correct standard.  There is another RFC that
covers news articles, although it generally defers to RFC822 when appropriate.

The news standard doesn't say anything about case folding, but it does say
that the part after the '@' is a site name, the implication being that this
part should be case-folded.  Unfortunately the current 2.11 software folds
the entire message-id, not just the site name part.  So if you want to make
sure your articles are seen, you had best make sure they are unique even if
the entire message-id is folded.
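
A site generating its own IDs could test for that with something like the
following sketch.  ids_collide_when_folded() is a hypothetical helper, not part
of the news software; it only reports whether two IDs would look the same to a
system that lowercases the whole string.

```c
#include <ctype.h>

/*
 * Return 1 if two Message-IDs become identical once every character is
 * lowercased, 0 otherwise.  ids_collide_when_folded() is hypothetical.
 */
int ids_collide_when_folded(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* both must end at the same point */
}
```

Two IDs such as <AbC@host> and <aBc@host> collide under this test, which is
the case a folding site would treat as a duplicate.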

It would probably be a good idea if future releases of the news software
would only fold the sitename part of the message-id.  But for the foreseeable
future, we're stuck with the current scheme.  If anyone wants to tackle it,
the fix goes around the call to lcase() in history(), ifuncs.c.  You would
also need to fix expire, I think.

Historical note:  Back when 2.10 first came out, and article-id was turned
into message-id, the news software used to do some gratuitous domain-mucking
to guess that the domain of a pre-2.10 site was probably "UUCP."  Those of
us who weren't in the "UUCP" domain (I was the admin at uw-beaver.arpa
at the time) got screwed by this, because all our articles would show up
as duplicates, one with a .UUCP domain and the other with a .arpa domain.
To make a long story short, part of the fix was to case-fold the site
(and domain) part of the message-id.  I wondered at the time if we would
get in trouble for also folding the non-sitename part.  I guess maybe we have.

P.S.:  My message-id consists of a time stamp plus a unique node id assigned
at the factory to my Apollo node.  If you are on the Apollo corporate net
you can find out what node I'm using from the message-id.