roy@phri.nyu.edu (Roy Smith) (12/01/89)
Given the underlying minimalist philosophy of C news, I'm surprised they went from the minimalist B news Message-IDs of the form <sequence@host> to the verbose <year.month.day.hour.minute.phase-of-moon@host> style, which is only exceeded in verbosity by Andrew Message-IDs. Why? As far as I can tell, it's just another 18 or so bytes to bloat the length of messages and (more importantly) history files.
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
{att,philabs,cmcl2,rutgers,hombre}!phri!roy -or- roy@alanine.phri.nyu.edu
"The connector is the network"
lmb@vicom.com (Larry Blair) (12/01/89)
In article <1989Nov30.162609.9435@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes:
=
= Given the underlying minimalist philosophy of C news, I'm surprised
=they went from the minimalist B news Message-IDs of the form <sequence@host>
=to the verbose <year.month.day.hour.minute.phase-of-moon@host> style, which
=is only exceeded in verbosity by Andrew Message-IDs. Why? As far as I can
=tell, it's just another 18 or so bytes to bloat the length of messages and
=(more importantly) history files.
This is an important question that has been asked repeatedly. Besides causing
the citation line to run on (like the one above), it enlarges the history and
causes rn to barf on the long References: line.
Even though there have been informal patches posted to fix this, the vast
majority of C News sites are reluctant to include anything that is
non-official.
Henry and Geoff: this _is_ a problem that is growing as the use of C News
grows. It affects everyone no matter what news system they are running.
It is only fair to the entire net that you post an official patch to
reduce the size of the Message-ID:.
--
Larry Blair ames!vsi1!lmb lmb@vicom.com
geoff@utstat.uucp (Geoff Collyer) (12/01/89)
I don't want to cope with rewriting a sequence-number file, since it's a nuisance to avoid damaging it if the system crashes during the update. The current verbose format is easy to generate in a shell script (i.e. inews) and should be unique, though I regret the verbosity. I'm in the midst of revising inews and the new one should invoke a little program to generate a compact and unique message-id (well, local-part) without the aid of a sequence-number file.
What surprises me is that no one has complained about the host-part of message-ids. utstat still claims to be utstat.uucp; if it were to claim to be utstat.toronto.edu (or even utstat.utstat.toronto.edu), the host-part would meet or exceed the size of the current, bloated local-part. And we have short domain names; even given B-style local-parts, nothing can help <1234@national-institute-for-medical-research.mrc.ac.uk> or <5678@vax.cancer-clinical-trials-unit.birmingham.ac.uk>, not even compress (I did not make up the host-parts, honest).
department-of-statistics-university-of-toronto-ontario-canada-m5s-1a1.utstat.toronto.edu!geoff
--
Geoff Collyer utzoo!utstat!geoff, geoff@utstat.toronto.edu
henry@utzoo.uucp (Henry Spencer) (12/01/89)
In article <1989Nov30.162609.9435@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes:
> Given the underlying minimalist philosophy of C news, I'm surprised
>they went from the minimalist B news Message-IDs of the form <sequence@host>
>to the verbose <year.month.day.hour.minute.phase-of-moon@host> style, which
>is only exceeded in verbosity by Andrew Message-IDs. Why? ...
The crucial observation is that the *machinery* for generating the old style is C code and is somewhat fragile in the presence of crashes etc., while the equivalent for the new style is robust and decentralized and can be implemented in shell. That is, we are being minimalist, but in a non-obvious way.
Actually, we *do* agree that the message-ids are a bit long, and changes to this are in the works. One problem is that we can't do anything about the length of the "host" part... and host names 20+ characters long are not at all rare in articles these days. The record in the survey I did a few days ago was over 40, and Geoff claims to have seen still longer.
--
Mars can wait: we've barely | Henry Spencer at U of Toronto Zoology
started exploring the Moon. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
jm36+@andrew.cmu.edu (John Gardiner Myers) (12/01/89)
In <1989Nov30.162609.9435@phri.nyu.edu>, roy@phri.nyu.edu (Roy Smith) writes:
> <year.month.day.hour.minute.phase-of-moon@host> style, which
> is only exceeded in verbosity by Andrew Message-IDs.
Um, Andrew Message-ID's are shorter. They contain much the same information plus an IP address, though.
--
_.John G. Myers Internet: John.G.Myers@andrew.cmu.edu
(412) 268-2984 LoseNet: ...!seismo!ihnp4!wiscvm.wisc.edu!give!up
" Maynard) (12/03/89)
In <1989Nov30.162609.9435@phri.nyu.edu>, roy@phri.nyu.edu (Roy Smith) writes:
> <year.month.day.hour.minute.phase-of-moon@host> style, which
> is only exceeded in verbosity by Andrew Message-IDs.
Not all C news message IDs are 47K long...look at mine. It was generated by a program written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us), and the program is easy to splice into the C news system.
--
Jay Maynard, EMT-P, K5ZC, PP-ASEL | Never ascribe to malice that which can
jay@splut.conmicro.com (eieio)| adequately be explained by stupidity.
{attctc,bellcore}!texbell!splut!jay +----------------------------------------
"...when hasn't gibberish been legal C?" -- Tom Horsley, tom@ssd.harris.com
henry@utzoo.uucp (Henry Spencer) (12/03/89)
In article <5:CS3_@splut.conmicro.com> jay@splut.conmicro.com (Jay "you ignorant splut!" Maynard) writes:
>Not all C news message IDs are 47K long...look at mine. It was generated
>by a program written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us), and
>the program is easy to splice into the C news system.
Jay, you might want to check whether Jon's program is using both uppercase and lowercase in its message-ids. It shouldn't, since the "local part" of the message-id (before the "@") is case-insensitive.
--
Mars can wait: we've barely | Henry Spencer at U of Toronto Zoology
started exploring the Moon. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (12/04/89)
Here it is if anyone wants a copy. I believe that there was something similar posted awhile back but I didn't have a copy.
--- cut here --- mkid.c ----
/* Put this in /usr/lib/newsbin/inject and change anne.jones to use it
   to make a message id. */
/* string of some valid message id characters */
char string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";
#define size (sizeof string - 1)
main()
{
    long time();
    /* subtract off the time I wrote this and assume that pids never get
       reused in 60 seconds. */
    print_num((time((long *)0) - 627672773) / 60);
    print_num((long)getpid());
    return 0;
}
print_num(num)
long num;
{
    do {
        (void) printf("%c", string[num % size]);
        num /= size;
    } while (num);
}
--
Jon Zeeff <zeeff@b-tech.ann-arbor.mi.us> Branch Technology <zeeff@b-tech.mi.org>
henry@utzoo.uucp (Henry Spencer) (12/04/89)
In article <1989Dec3.073310.18501@utzoo.uucp> I wrote:
>Jay, you might want to check whether Jon's program is using both uppercase
>and lowercase in its message-ids. It shouldn't, since the "local part" of
>the message-id (before the "@") is case-insensitive.
Sigh... I must start getting 8 hours of sleep a night. The situation is actually more complicated. The rules (RFC1036 and 822) say that the domain part -- after the "@" -- is case-insensitive, but the local part -- before the "@" -- is case-sensitive except for some odd special cases. So one would think that case distinctions in the local part would be okay. Unfortunately, B2.11 considers *both* parts case-insensitive for some bizarre reason, and 2.11 is much too widely distributed to ignore.
--
Mars can wait: we've barely | Henry Spencer at U of Toronto Zoology
started exploring the Moon. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
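A minimal sketch of the distinction Henry draws, under stated assumptions: `msgid_equal_rfc` compares the local part exactly and the domain part without regard to case, which is the rule as he describes it, while `msgid_equal_folded` ignores case everywhere, as he reports B 2.11 does. Both function names are hypothetical, the split is made at the last "@", and the "odd special cases" he mentions are not handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare n bytes of a and b, ignoring ASCII case. Returns 1 if equal. */
static int same_nocase(const char *a, const char *b, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Hypothetical helper: local part (before '@') must match exactly,
 * domain part (from '@' onward) is compared case-insensitively. */
int msgid_equal_rfc(const char *a, const char *b)
{
    const char *at_a, *at_b;
    size_t local_len, dom_len;

    assert(a != NULL && b != NULL);
    at_a = strrchr(a, '@');
    at_b = strrchr(b, '@');
    if (at_a == NULL || at_b == NULL)
        return strcmp(a, b) == 0;      /* no '@': fall back to exact match */

    local_len = (size_t)(at_a - a);
    if (local_len != (size_t)(at_b - b) || memcmp(a, b, local_len) != 0)
        return 0;                      /* local parts differ */

    dom_len = strlen(at_a);
    return dom_len == strlen(at_b) && same_nocase(at_a, at_b, dom_len);
}

/* Hypothetical helper: the whole ID is compared case-insensitively,
 * modelling the behaviour attributed to B 2.11 in the post above. */
int msgid_equal_folded(const char *a, const char *b)
{
    size_t n;

    assert(a != NULL && b != NULL);
    n = strlen(a);
    return n == strlen(b) && same_nocase(a, b, n);
}
```

Two IDs that differ only in the case of the local part are distinct under the first function but identical under the second, which is why a generator that wants to be safe everywhere would avoid relying on case.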
" Maynard) (12/04/89)
In article <1989Dec3.073310.18501@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <5:CS3_@splut.conmicro.com> jay@splut.conmicro.com (Jay "you ignorant splut!" Maynard) writes:
>>Not all C news message IDs are 47K long...look at mine. It was generated
>>by a program written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us), and
>>the program is easy to splice into the C news system.
>Jay, you might want to check whether Jon's program is using both uppercase
>and lowercase in its message-ids. It shouldn't, since the "local part" of
>the message-id (before the "@") is case-insensitive.
I just looked; here's the declaration for the set of characters it will use:
char string[] = "=:+#-&._ABCDFGHJKLMNPQRSTVWXYZ1234567890";
It takes the current minute, adds the current process ID, and then prints it in the base of the length of the string above, using the characters of the string as the numbers. (Did that make sense?) It's about 35 lines long, and fast.
--
Jay Maynard, EMT-P, K5ZC, PP-ASEL | Never ascribe to malice that which can
jay@splut.conmicro.com (eieio)| adequately be explained by stupidity.
{attctc,bellcore}!texbell!splut!jay +----------------------------------------
"...when hasn't gibberish been legal C?" -- Tom Horsley, tom@ssd.harris.com
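The scheme Jay describes, printing a number in the base given by the length of a character string, can be sketched as below. This is an illustration and not Jon's program: the names `encode_num` and `make_local_part` are hypothetical, the character set is an assumption (only the letters and digits from the posted strings, all punctuation dropped), and the minute count and process ID are passed in as parameters so the routine can be exercised without a clock. As in mkid.c, digits come out least significant first and the two numbers are simply written one after the other.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed character set: letters and digits only, single case. */
static const char charset[] = "ABCDFGHJKLMNPQRSTVWXYZ1234567890";
#define BASE (sizeof charset - 1)

/* Write num in base BASE into out, least significant digit first.
 * Returns the number of characters written (excluding the NUL),
 * or 0 if the buffer is too small. */
size_t encode_num(unsigned long num, char *out, size_t outsize)
{
    size_t n = 0;

    assert(out != NULL);
    do {
        if (n + 1 >= outsize)
            return 0;                  /* no room for digit plus NUL */
        out[n++] = charset[num % BASE];
        num /= BASE;
    } while (num != 0);
    out[n] = '\0';
    return n;
}

/* Build a local part from a minute count and a process ID by
 * encoding each in turn. Returns the total length, or 0 on overflow. */
size_t make_local_part(unsigned long minutes, unsigned long pid,
                       char *out, size_t outsize)
{
    size_t a, b;

    a = encode_num(minutes, out, outsize);
    if (a == 0)
        return 0;
    b = encode_num(pid, out + a, outsize - a);
    if (b == 0)
        return 0;
    return a + b;
}
```

A caller following mkid.c would pass the minutes elapsed since some fixed starting time and the value of getpid(), then append "@" and the host name to form the full Message-ID.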
weening@polya.Stanford.EDU (Joe Weening) (12/06/89)
In article <YRQVP1@b-tech.mi.org> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
>
>/* string of some valid message id characters */
>
>char string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";
This code, or something like it, is failing with older B News software that doesn't like "/" in the Message-ID's, because it uses them to construct filenames in /tmp. If you are currently doing this, please consider changing your code for the sake of others' sanity.
zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) (12/06/89)
>>/* string of some valid message id characters */
>>
>>char string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";
>
>This code, or something like it, is failing with older B News software
>that doesn't like "/" in the Message-ID's, because it uses them to
>construct filenames in /tmp. If you are currently doing this, please
>consider changing your code for the sake of others' sanity.
Anyone using mkid should remove '/' to accommodate this B news bug. Does anyone foresee any other problems with the character set (even though they are legal according to the rfcs)?
--
Jon Zeeff <zeeff@b-tech.ann-arbor.mi.us> Branch Technology <zeeff@b-tech.mi.org>
henry@utzoo.uucp (Henry Spencer) (12/07/89)
In article <TM+HD^@b-tech.uucp> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
>>>char string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";
>>
>Anyone using mkid should remove '/' to accommodate this B news bug. Does
>anyone foresee any other problems with the character set (even though they
>are legal according to the rfcs)?
I would be a little bit nervous about |~`{} due to the obscenities sometimes perpetrated when news flows over Bitnet links.
--
1233 EST, Dec 7, 1972: | Henry Spencer at U of Toronto Zoology
last ship sails for the Moon. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
IRWIN@pucc.Princeton.EDU (Irwin Tillman) (12/08/89)
In article <1989Dec6.200813.5267@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
>>>>char string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";
>>>
>>Anyone using mkid should remove '/' to accommodate this B news bug. Does
>>anyone foresee any other problems with the character set (even though they
>>are legal according to the rfcs)?
>
>I would be a little bit nervous about |~`{} due to the obscenities
>sometimes perpetrated when news flows over Bitnet links.
In addition, I'd suggest that you avoid the circumflex. It also tends to get munged when news flows over links that don't do ASCII/EBCDIC translation well.
james@bigtex.cactus.org (James Van Artsdalen) (12/09/89)
In <1989Dec6.200813.5267@utzoo.uucp>, henry@utzoo (Henry Spencer) writes:
| In <TM+HD^@b-tech.uucp> zeeff@b-tech.ann-arbor.mi.us (Jon Zeeff) writes:
| char string[] = "!#$%^&*_+|-=~`{}'/?ABCDFGHJKLMNPQRSTVWXYZ1234567890";
> > Anyone using mkid should remove '/' to accommodate this B news bug. Does
> > anyone foresee any other problems with the character set (even though they
> > are legal according to the rfcs)?
> I would be a little bit nervous about |~`{} due to the obscenities
> sometimes perpetrated when news flows over Bitnet links.
Why go to the effort of finding out which characters won't work, instead of just not tempting fate? I don't feel that irresistible urge to discover another thousand ways to break news systems world-wide. I can see no reason to use anything other than alphanumerics.
My failure mode: I run ihave/sendme messages through at(1) to delay them before sending them out (a 48 hour delay makes ihave/sendme work nice for backup news feeds). A while back some mysterious error messages came from cron. The problem was that the body of the ihave was in the at(1) script via "<<", and the shell was interpreting $ and `.
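James's suggestion of using nothing but alphanumerics can be sketched as a filter over a candidate character set. The function name `keep_alnum` is hypothetical, and the sketch assumes the default "C" locale, where isalnum() accepts only ASCII letters and digits. Applied to the string posted in mkid.c, it drops every character the thread flags ('/', '|', '~', '`', '{', '}', '^', '$') along with the rest of the punctuation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy only the letters and digits from src into dst, truncating if
 * dst is too small. Returns the number of characters kept. */
size_t keep_alnum(const char *src, char *dst, size_t dstsize)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL && dstsize > 0);
    for (; *src != '\0'; src++) {
        if (isalnum((unsigned char)*src)) {
            if (n + 1 >= dstsize)
                break;                 /* leave room for the NUL */
            dst[n++] = *src;
        }
    }
    dst[n] = '\0';
    return n;
}
```

The cost of the smaller set is a lower base and therefore slightly longer local parts; with the letters and digits of the posted string the base falls from 51 to 32.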