[news.software.b] Why are C news message-IDs so non-minimalist?

roy@phri.nyu.edu (Roy Smith) (12/01/89)

	Given the underlying minimalist philosophy of C news, I'm surprised
they went from the minimalist B news Message-IDs of the form <sequence@host>
to the verbose <year.month.day.hour.minute.phase-of-moon@host> style, which
is only exceeded in verbosity by Andrew Message-IDs.  Why?  As far as I can
tell, it's just another 18 or so bytes to bloat the length of messages and
(more importantly) history files.
-- 
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
{att,philabs,cmcl2,rutgers,hombre}!phri!roy -or- roy@alanine.phri.nyu.edu
"The connector is the network"

lmb@vicom.com (Larry Blair) (12/01/89)

In article <1989Nov30.162609.9435@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes:
=
=	Given the underlying minimalist philosophy of C news, I'm surprised
=they went from the minimalist B news Message-IDs of the form <sequence@host>
=to the verbose <year.month.day.hour.minute.phase-of-moon@host> style, which
=is only exceeded in verbosity by Andrew Message-IDs.  Why?  As far as I can
=tell, it's just another 18 or so bytes to bloat the length of messages and
=(more importantly) history files.

This is an important question that has been asked repeatedly.  Besides causing
the citation line to run on (like the one above), it enlarges the history and
causes rn to barf on the long References: line.

Even though there have been informal patches posted to fix this, the vast
majority of C News sites are reluctant to include anything that is
non-official.

Henry and Geoff: this _is_ a problem that is growing as the use of C News
grows.  It affects everyone no matter what news system they are running.
It is only fair to the entire net that you post an official patch to
reduce the size of the Message-ID:.
-- 
Larry Blair   ames!vsi1!lmb   lmb@vicom.com

geoff@utstat.uucp (Geoff Collyer) (12/01/89)

I don't want to cope with rewriting a sequence-number file, since it's
a nuisance to avoid damaging it if the system crashes during the update.
The current verbose format is easy to generate in a shell script (i.e.
inews) and should be unique, though I regret the verbosity.  I'm in the
midst of revising inews and the new one should invoke a little program
to generate a compact and unique message-id (well, local-part) without
the aid of a sequence-number file.

What surprises me is that no one has complained about the host-part of
message-ids.  utstat still claims to be utstat.uucp; if it were to
claim to be utstat.toronto.edu (or even utstat.utstat.toronto.edu), the
host-part would meet or exceed the size of the current, bloated
local-part.  And we have short domain names; even given B-style
local-parts, nothing can help
<1234@national-institute-for-medical-research.mrc.ac.uk> or
<5678@vax.cancer-clinical-trials-unit.birmingham.ac.uk>, not even
compress (I did not make up the host-parts, honest).

department-of-statistics-university-of-toronto-ontario-canada-m5s-1a1.utstat.toronto.edu!geoff
-- 
Geoff Collyer		utzoo!utstat!geoff, geoff@utstat.toronto.edu

henry@utzoo.uucp (Henry Spencer) (12/01/89)

In article <1989Nov30.162609.9435@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes:
>	Given the underlying minimalist philosophy of C news, I'm surprised
>they went from the minimalist B news Message-IDs of the form <sequence@host>
>to the verbose <year.month.day.hour.minute.phase-of-moon@host> style, which
>is only exceeded in verbosity by Andrew Message-IDs.  Why?  ...

The crucial observation is that the *machinery* for generating the old
style is C code and is somewhat fragile in the presence of crashes etc.,
while the equivalent for the new style is robust and decentralized and
can be implemented in shell.  That is, we are being minimalist, but in
a non-obvious way.
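
For comparison, here is a minimal C sketch of the verbose format.  It
assumes the fields are year, abbreviated month, day of month, HHMMSS and a
process ID, which is only inferred from IDs quoted in this thread such as
<1989Nov30.162609.9435@phri.nyu.edu>.  The name verbose_msgid() is
hypothetical, the use of GMT is an assumption, and this is not the C news
code (which does the job in shell).

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper, not C news code: build a verbose message-id such as
 * <1989Nov30.162609.9435@phri.nyu.edu> from a timestamp, a process ID and
 * a host name.  Assumes the date and time fields are taken in GMT.
 * Returns the number of characters written, or -1 on failure or if buf
 * is too small.
 */
static int verbose_msgid(char *buf, size_t buflen,
                         time_t when, long pid, const char *host)
{
    char ym[16];    /* year and abbreviated month, e.g. "1989Nov" */
    char hms[16];   /* zero-padded time of day, e.g. "162609" */
    struct tm *tm;
    int n;

    assert(buf != NULL && host != NULL);

    tm = gmtime(&when);
    if (tm == NULL ||
        strftime(ym, sizeof ym, "%Y%b", tm) == 0 ||
        strftime(hms, sizeof hms, "%H%M%S", tm) == 0)
        return -1;

    /* The day of month is not zero-padded in the IDs quoted in the thread. */
    n = snprintf(buf, buflen, "<%s%d.%s.%ld@%s>",
                 ym, tm->tm_mday, hms, pid, host);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

Called with time 628446369 (1989-11-30 16:26:09 GMT), pid 9435 and host
"phri.nyu.edu", this yields <1989Nov30.162609.9435@phri.nyu.edu>.  The
date-and-time prefix accounts for 17 of its 36 characters, and no state
file has to survive a crash.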

Actually, we *do* agree that the message-ids are a bit long, and changes
to this are in the works.  One problem is that we can't do anything about
the length of the "host" part... and host names 20+ characters long are
not at all rare in articles these days.  The record in the survey I did
a few days ago was over 40, and Geoff claims to have seen still longer.
-- 
Mars can wait:  we've barely   |     Henry Spencer at U of Toronto Zoology
started exploring the Moon.    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

jm36+@andrew.cmu.edu (John Gardiner Myers) (12/01/89)

In <1989Nov30.162609.9435@phri.nyu.edu>, roy@phri.nyu.edu (Roy Smith)
writes:
> <year.month.day.hour.minute.phase-of-moon@host> style, which
> is only exceeded in verbosity by Andrew Message-IDs.  

Um, Andrew Message-ID's are shorter.  They contain much the same
information plus an IP address, though.
-- 
_.John G. Myers		Internet: John.G.Myers@andrew.cmu.edu
(412) 268-2984		LoseNet:  ...!seismo!ihnp4!wiscvm.wisc.edu!give!up

jay@splut.conmicro.com (Jay "you ignorant splut!" Maynard) (12/03/89)

In <1989Nov30.162609.9435@phri.nyu.edu>, roy@phri.nyu.edu (Roy Smith)
writes:
> <year.month.day.hour.minute.phase-of-moon@host> style, which
> is only exceeded in verbosity by Andrew Message-IDs.  

Not all C news message IDs are 47K long...look at mine. It was generated
by a program written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us), and
the program is easy to splice into the C news system.

-- 
Jay Maynard, EMT-P, K5ZC, PP-ASEL   | Never ascribe to malice that which can
jay@splut.conmicro.com       (eieio)| adequately be explained by stupidity.
{attctc,bellcore}!texbell!splut!jay +----------------------------------------
 "...when hasn't gibberish been legal C?" -- Tom Horsley, tom@ssd.harris.com

henry@utzoo.uucp (Henry Spencer) (12/03/89)

In article <5:CS3_@splut.conmicro.com> jay@splut.conmicro.com (Jay "you ignorant splut!" Maynard) writes:
>Not all C news message IDs are 47K long...look at mine. It was generated
>by a program written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us), and
>the program is easy to splice into the C news system.

Jay, you might want to check whether Jon's program is using both uppercase
and lowercase in its message-ids.  It shouldn't, since the "local part" of
the message-id (before the "@") is case-insensitive.
-- 
Mars can wait:  we've barely   |     Henry Spencer at U of Toronto Zoology
started exploring the Moon.    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (12/04/89)

Here it is if anyone wants a copy.  I believe that there was something
similar posted a while back but I didn't have a copy.

--- cut here --- mkid.c ----

/*
    Put this in /usr/lib/newsbin/inject and change anne.jones to
    use it to make a message id.
*/

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* string of some valid message id characters */

static const char string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";

/* number of usable characters, i.e. the base (excludes the trailing NUL) */
#define size ((long)(sizeof string - 1))

static void print_num(long num);

int main(void)
{
   /* subtract off the time I wrote this and assume that pids never get
      reused in 60 seconds. */

   print_num(((long)time((time_t *)0) - 627672773L) / 60);
   print_num((long)getpid());

   return 0;
}


/* print num in base `size', least significant digit first */
static void print_num(long num)
{
   do {
      (void) printf("%c", string[num % size]);
      num /= size;
   } while (num);

}

-- 
Jon Zeeff    		<zeeff@b-tech.ann-arbor.mi.us>
Branch Technology 	<zeeff@b-tech.mi.org>

henry@utzoo.uucp (Henry Spencer) (12/04/89)

In article <1989Dec3.073310.18501@utzoo.uucp> I wrote:
>Jay, you might want to check whether Jon's program is using both uppercase
>and lowercase in its message-ids.  It shouldn't, since the "local part" of
>the message-id (before the "@") is case-insensitive.

Sigh...  I must start getting 8 hours of sleep a night.  The situation is
actually more complicated.  The rules (RFC1036 and 822) say that the
domain part -- after the "@" -- is case-insensitive, but the local part --
before the "@" -- is case-sensitive except for some odd special cases.
So one would think that case distinctions in the local part would be okay.
Unfortunately, B2.11 considers *both* parts case-insensitive for some
bizarre reason, and 2.11 is much too widely distributed to ignore.
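
A small C sketch of the two comparison rules may make the difference
concrete.  Both function names are hypothetical, neither is taken from
B news or C news, and the "odd special cases" mentioned above are ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Compare n bytes of a and b, ignoring ASCII case. */
static int same_nocase(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Rule as described for the RFCs: local part exact, domain part caseless. */
static int msgid_equal_rfc(const char *a, const char *b)
{
    const char *at_a, *at_b;
    size_t local_len;

    assert(a != NULL && b != NULL);
    at_a = strrchr(a, '@');
    at_b = strrchr(b, '@');
    if (at_a == NULL || at_b == NULL)
        return strcmp(a, b) == 0;       /* no "@": fall back to exact match */

    local_len = (size_t)(at_a - a);
    if (local_len != (size_t)(at_b - b) || strncmp(a, b, local_len) != 0)
        return 0;
    return strlen(at_a) == strlen(at_b) &&
           same_nocase(at_a, at_b, strlen(at_a));
}

/* Behaviour attributed to B2.11 above: the whole id is caseless. */
static int msgid_equal_caseless(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strlen(a) == strlen(b) && same_nocase(a, b, strlen(a));
}
```

Given <5:CS3_@splut.conmicro.com> and <5:cs3_@SPLUT.conmicro.com>,
msgid_equal_rfc() reports two different IDs while msgid_equal_caseless()
reports one, so a generator that relies on case to tell IDs apart would
presumably see collisions at sites that compare the second way.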
-- 
Mars can wait:  we've barely   |     Henry Spencer at U of Toronto Zoology
started exploring the Moon.    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

jay@splut.conmicro.com (Jay "you ignorant splut!" Maynard) (12/04/89)

In article <1989Dec3.073310.18501@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <5:CS3_@splut.conmicro.com> jay@splut.conmicro.com (Jay "you ignorant splut!" Maynard) writes:
>>Not all C news message IDs are 47K long...look at mine. It was generated
>>by a program written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us), and
>>the program is easy to splice into the C news system.
>Jay, you might want to check whether Jon's program is using both uppercase
>and lowercase in its message-ids.  It shouldn't, since the "local part" of
>the message-id (before the "@") is case-insensitive.

I just looked; here's the declaration for the set of characters it will use:

char	string[] = "=:+#-&._ABCDFGHJKLMNPQRSTVWXYZ1234567890";

It takes the current minute, adds the current process ID, and then prints
it in the base of the length of the string above, using the characters
of the string as the numbers. (Did that make sense?) It's about 35 lines
long, and fast.
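
To make the "base of the length of the string" idea concrete, here is a
short C sketch using the 40-character set quoted above.  append_num() is a
hypothetical name, and writing into a caller-supplied buffer (rather than
printing) is a choice made for this sketch.

```c
#include <assert.h>
#include <string.h>

/* The character set quoted above: 40 characters, so numbers are base 40. */
static const char digits[] = "=:+#-&._ABCDFGHJKLMNPQRSTVWXYZ1234567890";

/*
 * Hypothetical helper: append num to the NUL-terminated string in out,
 * written in base (sizeof digits - 1), least significant digit first
 * (the order that the do/while loop in mkid.c's print_num produces).
 * outlen is the total size of out; digits that do not fit are dropped.
 */
static void append_num(char *out, size_t outlen, unsigned long num)
{
    const size_t base = sizeof digits - 1;
    size_t len;

    assert(out != NULL && outlen > 0);
    len = strlen(out);

    do {
        if (len + 1 >= outlen)
            break;                  /* out of room: stop rather than overflow */
        out[len++] = digits[num % base];
        num /= base;
    } while (num != 0);
    out[len] = '\0';
}
```

Starting from an empty buffer, append_num(id, sizeof id, 41) gives "::"
because 41 = 1*40 + 1 and digit 1 is ':'.  A further
append_num(id, sizeof id, 40) appends "=:" (40 = 1*40 + 0, low digit
first), leaving "::=:".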

-- 
Jay Maynard, EMT-P, K5ZC, PP-ASEL   | Never ascribe to malice that which can
jay@splut.conmicro.com       (eieio)| adequately be explained by stupidity.
{attctc,bellcore}!texbell!splut!jay +----------------------------------------
 "...when hasn't gibberish been legal C?" -- Tom Horsley, tom@ssd.harris.com

weening@polya.Stanford.EDU (Joe Weening) (12/06/89)

In article <YRQVP1@b-tech.mi.org> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
>
>/* string of some valid message id characters */
>
>char	string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";

This code, or something like it, is failing with older B News software
that doesn't like "/" in the Message-ID's, because it uses them to
construct filenames in /tmp.  If you are currently doing this, please
consider changing your code for the sake of others' sanity.

zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (12/06/89)

>>/* string of some valid message id characters */
>>
>>char	string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";
>
>This code, or something like it, is failing with older B News software
>that doesn't like "/" in the Message-ID's, because it uses them to
>construct filenames in /tmp.  If you are currently doing this, please
>consider changing your code for the sake of others' sanity.

Anyone using mkid should remove '/' to accommodate this B news bug.  Does
anyone foresee any other problems with the character set (even though they
are legal according to the rfcs)?

-- 
Jon Zeeff    		<zeeff@b-tech.ann-arbor.mi.us>
Branch Technology 	<zeeff@b-tech.mi.org>

henry@utzoo.uucp (Henry Spencer) (12/07/89)

In article <TM+HD^@b-tech.uucp> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
>>>char	string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";
>>
>Anyone using mkid should remove '/' to accommodate this B news bug.  Does
>anyone foresee any other problems with the character set (even though they
>are legal according to the rfcs)?

I would be a little bit nervous about |~`{} due to the obscenities
sometimes perpetrated when news flows over Bitnet links.
-- 
1233 EST, Dec 7, 1972:         |     Henry Spencer at U of Toronto Zoology
last ship sails for the Moon.  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

IRWIN@pucc.Princeton.EDU (Irwin Tillman) (12/08/89)

In article <1989Dec6.200813.5267@utzoo.uucp>, henry@utzoo.uucp
(Henry Spencer) writes:

>>>>char string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";
>>>
>>Anyone using mkid should remove '/' to accommodate this B news bug.  Does
>>anyone foresee any other problems with the character set (even though they
>>are legal according to the rfcs)?
>
>I would be a little bit nervous about |~`{} due to the obscenities
>sometimes perpetrated when news flows over Bitnet links.

In addition, I'd suggest that you avoid the circumflex.  It also tends to get
munged when news flows over links that don't do ASCII/EBCDIC translation well.

james@bigtex.cactus.org (James Van Artsdalen) (12/09/89)

In <1989Dec6.200813.5267@utzoo.uucp>, henry@utzoo (Henry Spencer) writes:

| In <TM+HD^@b-tech.uucp> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
| char	string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";

> > Anyone using mkid should remove '/' to accommodate this B news bug.  Does
> > anyone foresee any other problems with the character set (even though they
> > are legal according to the rfcs)?

> I would be a little bit nervous about |~`{} due to the obscenities
> sometimes perpetrated when news flows over Bitnet links.

Why go to the effort of finding out which characters won't work, rather
than just not tempting fate?  I don't feel that irresistible urge to
discover another thousand ways to break news systems world-wide.
I can see no reason to use anything other than alphanumerics.

My failure mode: I run ihave/sendme messages through at(1) to delay
them before sending them out (a 48-hour delay makes ihave/sendme work
nicely for backup news feeds).  A while back some mysterious error
messages came from cron.  The problem was that the body of the ihave
was in the at(1) script via "<<", and the shell was interpreting $ and `.
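
The characters flagged in this thread can be collected into a small check.
This is only a C sketch: both function names are hypothetical, and the
"risky" list is just what posters here reported, not a complete survey.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical check: return the first character of a message-id local
 * part that was flagged as risky in this thread, or '\0' if there is none.
 *   /       older B news builds /tmp filenames from the id
 *   |~`{}   may be mangled on Bitnet links
 *   ^       may be mangled by poor ASCII/EBCDIC translation
 *   $ `     interpreted by the shell inside a "<<" here-document
 */
static char first_risky_char(const char *local)
{
    static const char risky[] = "/|~`{}^$";

    assert(local != NULL);
    for (; *local != '\0'; local++)
        if (strchr(risky, *local) != NULL)
            return *local;
    return '\0';
}

/* The stricter rule suggested above: nothing but letters and digits. */
static int alnum_only(const char *local)
{
    assert(local != NULL);
    if (*local == '\0')
        return 0;                   /* an empty local part is not acceptable */
    for (; *local != '\0'; local++)
        if (!isalnum((unsigned char)*local))
            return 0;
    return 1;
}
```

Of the local parts seen in this thread, "TM+HD^" trips first_risky_char()
on '^', "5:CS3_" passes it but fails alnum_only(), and "YRQVP1" passes
both.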