[comp.sources.bugs] trn valid_message_id

em@dce.ie (Eamonn McManus) (01/11/91)

When looking at message ids in References lines, trn performs a number of
checks to see if they are valid.  Apart from the obvious test of
well-formedness, which ensures that each id has the form <...@...>, it
considers invalid any id whose local part contains any lower case letter
and no digits.  This is the relevant code, from mt-process.c:

    /* Try to weed-out non-ids (user@domain) by looking for lower-case without
    ** digits in the unique portion.  B news ids are all digits; standard C
    ** news are digits with mixed case; and Zeeff message ids are any mixture
    ** of digits, certain punctuation characters and upper-case.
    */
    lower_case = 0;
    do {
	if( *start <= '9' && *start >= '0' ) {
	    return 1;					/* RETURN */
	}
	lower_case = lower_case || (*start >= 'a' && *start <= 'z');
    } while( ++start < mid );

    return !lower_case;

As a consequence, any followup to this article <whyme@dce.ie> will be
divorced from its parent, appearing as a parallel thread if it preserves
the Subject line, or as a completely different thread if the Subject is
changed.
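
To make the effect concrete, here is a minimal sketch that wraps the quoted
loop in a function and applies it to a few ids.  The names local_part_ok and
check_id are hypothetical and do not come from the trn sources, and the
well-formedness test is reduced to its bare minimum; only the loop itself is
taken from the code above.

```c
#include <string.h>

/* Hypothetical wrapper around the quoted loop.  `start` points at the first
 * character after '<' and `mid` at the '@'; the caller must guarantee that
 * start < mid.  Returns 1 if the local part is accepted, 0 if rejected. */
static int local_part_ok(const char *start, const char *mid)
{
    int lower_case = 0;

    do {
        if (*start <= '9' && *start >= '0')
            return 1;               /* any digit: accept immediately */
        lower_case = lower_case || (*start >= 'a' && *start <= 'z');
    } while (++start < mid);

    return !lower_case;             /* no digits: accept only without lower case */
}

/* Hypothetical helper: a simplified <...@...> check followed by the
 * local-part test.  This is an assumption about how the pieces fit
 * together, not a copy of what trn does. */
static int check_id(const char *id)
{
    const char *mid = strchr(id, '@');

    if (id[0] != '<' || mid == NULL || mid == id + 1)
        return 0;                   /* no '<', no '@', or empty local part */
    if (strchr(mid, '>') == NULL)
        return 0;                   /* no closing '>' */
    return local_part_ok(id + 1, mid);
}
```

Under these assumptions check_id("<whyme@dce.ie>") returns 0, because the
local part has lower-case letters and no digit, while "<555@joe.ca.us>",
"<WHYME@dce.ie>" and "<a1@dce.ie>" all return 1.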

What I would like to know is, What is the above test trying to accomplish?
How would <user@domain> get into a References or Message-Id line in the
first place?  Is there any reason why I should not hack the above code out
of mt-process.c (as in fact I have)?

,
Eamonn

davison%borla@kithrup.com (01/12/91)

> [trn] considers invalid any id whose local part contains any lower case
> letter and no digits.

This should read "...no upper case letters or digits."

Yes, it currently does this.  My copy hasn't been doing this for about 5
months now, and everyone else's copy will have this section removed in the
next patch (Coming Soon).

> What I would like to know is, What is the above test trying to accomplish?

If you've ever spent some time weeding through all the message ids that
come into your system, you'll notice some real weird ones out there.  An
easy way to get a feel for this is to run mthreads with the -v (verbose)
option and check out all the errors it mentions in the mt.log file.

There appear to be some gateways (or something) that don't support the
References line that have decided to use it for a path, a mailing list name
or something equally bizarre in the form of an otherwise well-formed message
id.  I was trying to weed these out, but the effort is not worth it.  As I
said, this code will disappear shortly.

The worst problem that mthreads has to deal with is the people who run their
article through some sort of substitution that changes all '>'s into another
character.  They're trying to get around the cited-text limit, but in the
process they deform all the references on the References line.  Mthreads
attempts to deal with this by recognizing a number of popular transformations,
but it has to try to distinguish this specific mangling from, for example,
a truncated message id (which is another really common problem).  Because of
this, anyone who changes their citation character to a '.' in this reference-
mangling manner will not get their article put on the tree in the right
place.  Mthreads is unable to determine if "<555@joe.ca.us." was meant to
continue or not unless it memorizes the popular domain endings, and I
decided this wasn't worth the effort.
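
The ambiguity can be illustrated with a rough sketch that classifies a single
reference by its last character.  Everything here is an assumption made for
illustration: the enum, the function name classify_ref_end, and especially
the set of substitute characters, which is invented and is not the list of
transformations that mthreads actually recognizes.

```c
#include <string.h>

/* Hypothetical outcomes for one whitespace-delimited reference. */
enum ref_end { REF_CLOSED, REF_MANGLED, REF_AMBIGUOUS, REF_TRUNCATED };

/* Invented set of characters that a '>' might have been rewritten to.
 * An assumption for this sketch only. */
#define ASSUMED_SUBSTITUTES "|:}#"

static enum ref_end classify_ref_end(const char *ref)
{
    size_t len = strlen(ref);
    char last;

    if (len == 0)
        return REF_TRUNCATED;
    last = ref[len - 1];
    if (last == '>')
        return REF_CLOSED;          /* intact closing bracket */
    if (last == '.')
        return REF_AMBIGUOUS;       /* mangled '>' or a domain cut off mid-way? */
    if (strchr(ASSUMED_SUBSTITUTES, last) != NULL)
        return REF_MANGLED;         /* looks like a substituted '>' */
    return REF_TRUNCATED;           /* anything else: treat as cut off */
}
```

With this sketch "<555@joe.ca.us." comes back as REF_AMBIGUOUS: without a
table of domain endings there is no way to tell whether the '.' replaced the
'>' or whether the rest of the domain was lost.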

** Reference line mangling -- just say *No*! **
-- 
 \  /| / /|\/ /| /(_)     Wayne Davison
(_)/ |/ /\|/ / |/  \      0004475895@mcimail.com (preferred)
   (W   A  Y   N   e)     davison@dri.com (...!uunet!drivax!davison)

em@dce.ie (Eamonn McManus) (01/17/91)

0004475895@mcimail.com writes:
>> [trn] considers invalid any id whose local part contains any lower case
>> letter and no digits.
>
>This should read "...no upper case letters or digits."

I would like to know how the code I quoted could be interpreted this way.

,
Eamonn

davison%borla@kithrup.com (01/19/91)

I claimed:
> This should read "...no upper case letters or digits."

but Eamonn McManus wants to know:
> ... how the code I quoted could be interpreted this way.

Simple.  Through the marvels of faulty memory (mine).  You were, of course,
correct in your evaluation of the code.  I was allowing a faulty recollection
of what I thought I had implemented to cloud my interpretation of the logic.

Anyway, keep an eye out for trn patch #2 -- it will be arriving in the next
few days (I've sent the completed patch out to a few sites for checking).
-- 
 \  /| / /|\/ /| /(_)     Wayne Davison
(_)/ |/ /\|/ / |/  \      0004475895@mcimail.com (preferred)
   (W   A  Y   N   e)     davison@dri.com (...!uunet!drivax!davison)