cfe+@andrew.cmu.edu (Craig F. Everhart) (04/13/88)
Folks,

I apologize for the wide posting, but I've just gotten wind of a problem I'd like to nab as quickly as possible. (In addition to the personal recipients, this is being posted to newsgroups news.software.b, news.software.nntp, and news.admin.)

Apparently, a lot of software that processes netnews (nntp, in particular) treats the local-parts of Message-ID fields as case-insensitive. We at andrew.cmu.edu have been generating case-sensitive local-parts for Message-ID fields (basically using A-Z, a-z, 0-9, and two other legal characters as the 64 ``digits'' for a base-64 representation of a large integer). Case-sensitivity is defended in RFC822 for the local-part type, which is what's used for a Message-ID; it's discussed in section 3.4.7 of RFC822. (There's an exception when the local-part is the string ``Postmaster'' in any alphabetic case, but we don't use that.)

According to der Mouse, at least 2.10.3 lowercases Message-IDs when dealing with the dbm-format history file. There may well be additional instances of the problem; part of the reason for my wide broadcast is to determine how widely the problem is distributed.

Why might Message-IDs be lowercased? Perhaps in order to canonicalize the capitalization of the domain name (as you recall, a Message-ID is basically "<" local-part "@" domain ">"). If this is a reasonable goal, it can still be done without lowercasing (``canonicalizing'') the capitalization of the local-part. I'm not interested in arguing whether it's a reasonable goal, but since nobody should be going into the domain part of a Message-ID and re-capitalizing the domain name there, this lowercasing is probably completely unnecessary. Even if it's necessary, it's pretty simple to replace code like:

    lowerall(MsgIDBuf);

with

    char *ptr;

    ptr = rindex(MsgIDBuf, '@');
    if (ptr != NULL) lowerall(ptr);

(This will fail to lowercase all of the ``domain'' part only if there's a (quoted) ``@'' in the domain part. It will survive quoted ``@'' characters in the local-part. Missing the recapitalization of part of the domain part once every 50 years shouldn't be a problem: as previously stated, the domain part could probably stand never being lowercased.) Alternatively, the call to (e.g.) lowerall() could just be eliminated.

As I said in an earlier post to news.admin, we've already dinked with the format of Andrew's message-IDs to coexist with netnews (its unpublished limitation to a maximum length). Finding out about this new limitation on the semantics of netnews's Message-IDs suggests that maybe there are some more constraints buried in there. I'd like to know (a) what those additional constraints are, and (b) whether it's feasible to make this fix to as-yet-unreleased versions of news software. Alternatively, if somebody can defend the case-insensitive treatment of local-parts of Message-ID fields, I'm all ears.

I fully realize that changing software sources is radically different from changing the behavior of widely-installed software. What I'm after, ideally, is assurance that the problem will be corrected in new and soon-to-be-released versions of netnews software (so that local-parts will be treated in a case-sensitive manner), that backbone sites will pick up the new software reasonably soon, and that in five years or so, everybody will have converted to code that handles Message-IDs properly. With assurances like those, the probability of a collision in the Message-ID fields we generate is low enough that I wouldn't plan on worrying about it.

Why wouldn't I just change our message-ID format? Well, given the spec that's in place, can you tell me how many other composers of message-IDs might generate case-sensitive ones? (``It's not just *my* status--I have to be thinking of the precedent I set for *future* former Presidents...'')

Thanks,
Craig Everhart
Andrew message system
Internet: cfe+@andrew.cmu.edu
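
For anyone who wants to try the domain-only lowercasing suggested above without depending on a particular news source tree, here is a minimal, self-contained sketch. The function name lower_domain() is hypothetical (it is not taken from any netnews release), and it assumes the Message-ID sits in a writable, NUL-terminated buffer. It uses the standard strrchr() in place of rindex(), and it has the same limitation as the fragment in the post: a quoted ``@'' in the domain part would leave some of the domain un-lowercased.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper (not from any netnews source): lowercase only the
 * domain part of a Message-ID in place, leaving the local-part untouched.
 * The domain is taken to begin at the LAST '@' in the buffer, so a quoted
 * '@' inside the local-part does not confuse it.
 */
void lower_domain(char *msgid)
{
    char *p = strrchr(msgid, '@');   /* last '@' separates local-part from domain */

    if (p == NULL)
        return;                      /* no '@' found: leave the buffer unchanged */

    for (; *p != '\0'; p++)
        *p = (char)tolower((unsigned char)*p);
}
```

Calling lower_domain() on a buffer holding <AbC123@Andrew.CMU.EDU> leaves the local-part AbC123 as it was and lowercases only what follows the ``@''.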