[news.software.b] Assumptions in mthreads about References/Message-Id lines

dlee@pallas.athenanet.com (Doug Lee) (01/29/91)

I've been running mthreads for a few days with maximum verbosity and have
noticed some problems with message ids.  In particular, the following
types of ids are considered 'Bad ref's by mthreads:

<CBM.91.03.13>  (all comp.binaries.mac ids look like this)
<HANK@vm.tau.ac.il>writes  (legal id, but mthreads doesn't stop at the '>')
    [The above is from a "cited-text" reference]
<1991Jan22.215801.4557@Neon.Sta  (too long;
  source:  "References: <1991Jan18.231330.16290@Neon.Stanford.EDU>")
[null reference]  (resulting from the following lines, with the indentation
    below "References:" preserved)
  References: <1991Jan23.213736.28220@Neon.Stanford.EDU> 
   <1991Jan24.152931.1325@NCoast.ORG><1991Jan25.073516.29644@Neon.Stanford.EDU> 
   <1991Jan26.035750.11786@NCoast.ORG><1991Jan27.014242.2863@Neon.Stanford.EDU>

Admittedly, some of the above references are obnoxious in one way or another
(IMHO, 42 characters is unnecessarily long for a message id), but my copy of
standard.mn (_Standard for Interchange of USENET Messages_, October 20, 1986)
advises against making ANY assumptions about message ids except that they
    1) are begun with '<' and terminated with '>', and
    2) contain no white space, non-printing characters, or extra '<'s or '>'s.
(Further limitations are, however, strongly advised in this standard.)  As for
the multi-line example, I have not found any standard SPECIFICALLY permitting
or forbidding this practice.  I suspect the null reference has something to do
with the first two lines of the multi-line "References:" header ending with a
space, but this is purely guesswork.
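
To make the looser interpretation concrete, here is a minimal sketch of a
parser that assumes only the two rules above, plus the assumption that
"printing" means printable ASCII.  It is written in Python for brevity and is
not the mthreads code; MESSAGE_ID and extract_references are hypothetical
names chosen for this illustration.

```python
import re

# Rules 1 and 2 as quoted above: an id begins with '<', ends with '>', and
# contains no white space, non-printing characters, or extra '<' / '>'.
# Assumption: "printing" is taken to mean ASCII 0x21-0x7e.
MESSAGE_ID = re.compile(r"<[\x21-\x3b\x3d\x3f-\x7e]+>")


def extract_references(header_lines):
    """Return the message ids in a (possibly multi-line) References header.

    header_lines is a list of strings, one per physical line.
    """
    # Strip each physical line and join with a single space, so trailing
    # blanks and continuation-line indentation become ordinary separators.
    value = " ".join(line.strip() for line in header_lines)
    # Drop the header name if the caller passed it along.
    if value.lower().startswith("references:"):
        value = value[len("references:"):]
    # findall stops each id at its own '>', so ids written back to back
    # ("<a@b><c@d>") and trailing text ("<a@b>writes") are both handled.
    return MESSAGE_ID.findall(value)
```

Under these assumptions, the three-line header shown earlier yields five ids
and no null reference, "<HANK@vm.tau.ac.il>writes" yields the id alone, and a
truncated id with no closing '>' yields nothing rather than a partial match.
No length limit is imposed anywhere.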

Considering the above examples, would there be any objection to loosening the
requirements for valid message ids in the trn/mthreads code?

I am posting rather than mailing this question so as to permit further
observations and/or discussion regarding this problem.  Also, as it is quite
possible that the standard I referred to has been superseded, I welcome
corrections.

-- 
Doug Lee  (dlee@athenanet.com or uunet!pallas!dlee)

eggert@twinsun.com (Paul Eggert) (01/29/91)

dlee@pallas.athenanet.com (Doug Lee) writes:

	my copy of standard.mn (_Standard for Interchange of USENET Messages_,
	October 20, 1986) advises against making ANY assumptions about message
	ids except that they

		1) are begun with '<' and terminated with '>', and
		2) contain no white space, non-printing characters, or extra
			'<'s or '>'s.

	... it is quite possible that the standard I referred to has been
	superseded, I welcome corrections.

Message-ids are not as simple as they should be.  No news system I know of
implements message-ids completely.  But if you want to know the current
news-related standards, the following list may be a bit more up-to-date than
your copy:

    RFC 822 specifies the format of messages; RFC 1036 uses this.
    RFC 977 specifies NNTP, the Network News Transfer Protocol.
    RFC 1036 specifies the format of Usenet articles.
    RFC 1123 amends RFC 822.
    RFC 1153 specifies the digest format sometimes used in moderated groups.

(An RFC is a Request for Comments, a de facto standard in the Internet
community.  It is a form of published software standard, done through the
Network Information Center (NIC) at SRI.  Copies of RFCs are often posted to
the net and are obtainable from archive sites.)

henry@zoo.toronto.edu (Henry Spencer) (01/30/91)

In article <1991Jan29.064624.357@twinsun.com> eggert@twinsun.com (Paul Eggert) writes:
>... if you want to know the current
>news-related standards, the following list may be a bit more up-to-date than
>your copy:
>
>    RFC 822 specifies the format of messages; RFC 1036 uses this.
>    RFC 977 specifies NNTP, the Network News Transfer Protocol.
>    RFC 1036 specifies the format of Usenet articles.
>    RFC 1123 amends RFC 822.
>    RFC 1153 specifies the digest format sometimes used in moderated groups.

You might also want to look at notebook/rfcerrata in the C News distribution,
which corrects a number of minor errors in RFC 1036 and discusses some
ambiguous and/or contentious points about it.
-- 
If the Space Shuttle was the answer,   | Henry Spencer at U of Toronto Zoology
what was the question?                 |  henry@zoo.toronto.edu   utzoo!henry