rsalz@bbn.com (Rich Salz) (03/22/91)
I know there are several Message-ID formats out there:
    B news sequence-number style.
    C News verbose date style.
    Various radix-64 compressions of the above.

How about this one
    <yydddss.pppp@host>
where
    yy      Last two digits of the year
    ddd     The day of the year, 000-365.
    ss      The current number of seconds, 00-59.
    pppp    The Process ID (not fixed format).
    host    The hostname (not fixed format).
For example
    9203212.1856@papaya.bbn.com
This is 27 bytes long. The host-part is invariant, and the unique-part is
only 12 bytes, but it will vary by a couple depending on the pid.

Obviously, the only time this will have a problem is if the same process
submits two articles within the same second. I don't think that's likely
to happen.

Comments?
    /r$
--
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.
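A minimal sketch of how the <yydddss.pppp@host> layout could be assembled, assuming the caller supplies a broken-down time and a process ID. The name format_msgid and its argument list are made up for illustration and do not come from any news software.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Build "<yydddss.pppp@host>" into buf.
 * Hypothetical helper: name and parameters are assumptions of this sketch.
 * Returns 0 on success, -1 if buf is too small.
 */
int format_msgid(char *buf, size_t len, const struct tm *tm,
                 long pid, const char *host)
{
    int n;

    assert(buf != NULL && tm != NULL && host != NULL);
    n = snprintf(buf, len, "<%02d%03d%02d.%ld@%s>",
                 tm->tm_year % 100,   /* yy:  last two digits of the year */
                 tm->tm_yday,         /* ddd: day of the year, from 000   */
                 tm->tm_sec,          /* ss:  seconds within the minute   */
                 pid,                 /* pppp: variable width             */
                 host);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

In real use the struct tm would probably come from gmtime() and the pid from getpid(). Because only the seconds within the minute are recorded, two posts on the same day share an ID whenever their pid and seconds field both match.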
francis@wolfman.cis.ohio-state.edu (RD Francis) (03/22/91)
In article <3427@litchi.bbn.com> rsalz@bbn.com (Rich Salz) writes:
> How about this one
>     <yydddss.pppp@host>
> where
>     yy      Last two digits of the year
>     ddd     The day of the year, 000-365.
>     ss      The current number of seconds, 00-59.
>     pppp    The Process ID (not fixed format).
>     host    The hostname (not fixed format).
> For example
OK, let's risk making a real fool of myself; as far as I have been
able to tell in the past, when a machine is rebooted, it starts
counting processes from 1 (or whatever) again. Imagine a machine
that goes down and comes back up twice in one day, and it'd be easy to
imagine a situation where someone could *possibly* hit a conflict,
unlikely as it is.
Am I exposing my relative ignorance of the guts of Unix here?
--
R David Francis francis@cis.ohio-state.edu
wisner@ims.alaska.edu (Bill Wisner) (03/22/91)
>     <yydddss.pppp@host>
> where
>     yy      Last two digits of the year
>     ddd     The day of the year, 000-365.
>     ss      The current number of seconds, 00-59.
>     pppp    The Process ID (not fixed format).
>     host    The hostname (not fixed format).

I'd make the year four digits, to guarantee uniqueness (at least for the
next eight thousand years).

Also, this scheme has a big hole you could pilot a B-2 through. I've used
machines that were used heavily enough to cycle through all 30,000 PIDs
several times in a day. On such a machine, it's almost inevitable that
sooner or later a process will manage to repeat a PID/seconds combination.

Bill Wisner <wisner@ims.alaska.edu> Gryphon Gang Fairbanks AK 99775
"If you have a problem with one of my users, take it to me, and if I need to
kill them, I will." -- Eliot Lear <lear@turbo.bio.net>
tar@math.ksu.edu (Tim Ramsey) (03/22/91)
rsalz@bbn.com (Rich Salz) writes:
>How about this one
>    <yydddss.pppp@host>

How about:

    <tttttttt.pppp@host>

where:  tttttttt is the return value of time(2) in base-16
and:    pppp is the process id

At the time I posted this, t == 27e988be. That's only 8 characters.
If you went to a larger radix this would be smaller.
--
Tim Ramsey (tar@math.ksu.edu) (913) 532-6750 (voice) (913) 532-7004 (FAX)
Department of Mathematics, Kansas State University, Manhattan KS 66506-2602
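The <tttttttt.pppp@host> idea might look roughly like this in C. This is only a sketch: format_hex_msgid is a hypothetical name, the timestamp is masked to 32 bits so it always prints as eight hex digits, and the pid is printed in decimal as an assumption since the post does not say.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "<tttttttt.pppp@host>": the time(2) value as eight hex digits,
 * then the pid in decimal (variable width).  Hypothetical helper.
 * Returns 0 on success, -1 if buf is too small.
 */
int format_hex_msgid(char *buf, size_t len, unsigned long t,
                     long pid, const char *host)
{
    int n;

    assert(buf != NULL && host != NULL && strlen(host) > 0);
    n = snprintf(buf, len, "<%08lx.%ld@%s>",
                 t & 0xFFFFFFFFUL,   /* keep the low 32 bits only */
                 pid, host);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would typically pass (unsigned long)time(NULL) and (long)getpid(). Unlike the yydddss form, the full timestamp distinguishes posts made in the same second of different minutes.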
brad@looking.on.ca (Brad Templeton) (03/22/91)
Is there any reason for the message-id to be readable? The date is elsewhere in the message, and indeed elsewhere in the history file to some extent. I say make it as small as you can, either the sequence number, which is smallest, or a radix 85 (or however many safe characters there are in message-ids) encoding of the minute and process-id, with epoch when you started your site up. -- Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
billd@fps.com (Bill Davidson) (03/22/91)
In article <3427@litchi.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>How about this one
>    <yydddss.pppp@host>
>where
>    yy      Last two digits of the year
>    ddd     The day of the year, 000-365.
>    ss      The current number of seconds, 00-59.
>    pppp    The Process ID (not fixed format).
>    host    The hostname (not fixed format).
>For example
>    9203212.1856@papaya.bbn.com
>This is 27 bytes long. The host-part is invariant, and the unique-part is
>only 12 bytes, but it will vary by a couple depending on the pid.
>
>Obviously, the only time this will have a problem is if the same process
>submits two articles within the same second. I don't think that's likely
>to happen.

The same PID can occur with two different processes during the same day
due to turn-over. This is quite common on fast machines that run with
dozens of users. Two posts could easily occur during the same second of
their given minute (obviously they are probably hours apart). Sure it's
unlikely but with as many machines as are on the net do you really want
to take the gamble? If enough machines run with this scheme for enough
hours, it will break.

Maybe something more along the lines of hhmmss. It adds four more chars
but still guarantees uniqueness. It also makes the Message-ID hard to
predict for sendme message abusers (one of the goals of the Cnews style).
Alternatively, the number of seconds that have occurred in that day since
midnight could be used. This would only add three more chars since there
are only 86400 seconds in a day. You could put it in hex and do the number
of seconds since 12am Jan 1 and only add 2 chars. I guess I'm starting to
run amuck. Sorry.

--Bill
--
*ANOTHER* dumb move! -- Dick Spanner, Private Investigator
rsalz@bbn.com (Rich Salz) (03/22/91)
I got email pointing out that if the PID wraps around in less than 24 hours,
and the new process posts within the same second, there will be a conflict.
Hmm... I don't think this is likely unless the machine crashes a lot.
(Maybe it *IS* that likely. :-)

At any rate, here's what I'm going to do now:
    <ddd.sss.ppp@fqdn>
where
    ddd     The day of the year, in radix 64 (not fixed width)
    sss     The second within the day, in radix 64 (not fixed width)
    ppp     The process ID, in radix 64 (not fixed width)
    fqdn    The fully-qualified domain name

Here's some sample code:

/*
**  Test program to generate Message-ID's.
**  The ID includes the day number, the second within the day, and the current
**  process ID.  To conserve space, they are encoded into radix-64 strings,
**  using [0-9a-zA-Z.+] to represent 0..63.  Assumes 32-bit longs.
**
**  Rich $alz <rsalz@bbn.com>, 22-March-1991.
*/
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Digit i is ALPHABET[i]; the order matches the header comment. */
static char ALPHABET[] =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.+";

/*
**  Turn a number into a Radix-64 string.
*/
void
Radix64(unsigned long l, char *buff)
{
    char    *p;
    int     i;
    char    temp[20];

    /* Simple sanity checks. */
    l &= 0xFFFFFFFF;
    if (l == 0) {
        *buff++ = '0';
        *buff = '\0';
        return;
    }

    /* Format the string, in reverse. */
    for (p = temp; l; l >>= 6)
        *p++ = ALPHABET[(int)(l & 077)];

    /* Reverse it. */
    for (i = p - temp; --i >= 0; )
        *buff++ = *--p;
    *buff = '\0';
}

/*
**  Decode and print a radix-64 string as a number.
*/
void
Decode64(char *what, long l, char *p)
{
    long    l2;
    char    *cp;

    printf("%s: %ld = %s = ", what, l, p);
    for (l2 = 0; *p; p++) {
        if ((cp = strchr(ALPHABET, *p)) == NULL) {
            printf("-->Invalid char %c\n", *p);
            return;
        }
        l2 = (l2 << 6) + (cp - ALPHABET);
    }
    printf("%ld\n", l2);
}

/*
**  Stub routine to get the fully-qualified domain name of this host.
*/
char *
GetFQDN(void)
{
    static char buff[256];

    gethostname(buff, sizeof buff);
    buff[sizeof buff - 1] = '\0';   /* gethostname may not terminate */
    return buff;
}

int
main(void)
{
    struct tm       *gmt;
    time_t          now;
    char            day64[20];
    char            pid64[20];
    char            sec64[20];
    unsigned long   day;
    unsigned long   sec;
    unsigned long   pid;

    (void)time(&now);
    gmt = gmtime(&now);
    day = gmt->tm_year * 1000L + gmt->tm_yday;
    sec = gmt->tm_hour * 3600L + gmt->tm_min * 60L + gmt->tm_sec;
    pid = getpid();

    Radix64(day, day64);
    Radix64(sec, sec64);
    Radix64(pid, pid64);
    printf("<%s.%s.%s@%s>\n", day64, sec64, pid64, GetFQDN());

    Decode64("day", day, day64);
    Decode64("sec", sec, sec64);
    Decode64("pid", pid, pid64);
    return 0;
}
--
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.
rees@pisa.citi.umich.edu (Jim Rees) (03/22/91)
There are lots of times when you want a unique identifier. NFS file handles, user/group identifiers, IPC port ids, and so on. Some operating systems provide a way to get an opaque bag-of-bits that is unique for all time, for any application that needs it. The OS that I use has such a feature, so I use it to generate message ids. They contain a time stamp and a cpu serial number. I'm not sure why Mach doesn't have unique ids (uids), as many older CMU OSs had them, as did Eden, which was partly CMU inspired (Guy Almes was from CMU). Uids may be just another Multics-era idea that got lost in the quest for "simplicity" (is Unix still simpler than Multics?)
henry@zoo.toronto.edu (Henry Spencer) (03/23/91)
In article <1991Mar22.044131.3764@maverick.ksu.ksu.edu> tar@math.ksu.edu (Tim Ramsey) writes:
>    <tttttttt.pppp@host>
>
>where:  tttttttt is the return value of time(2) in base-16
>and:    pppp is the process id
>...If you went to a larger radix this would be smaller.

This is what's planned for C News, in fact, with a carefully-chosen alphabet.
(You can't be too ambitious with the alphabet if you want to get it past all
the broken systems, e.g. B2.11 which does completely case-insensitive
message-ID matching, but you can do better than hex.)
--
"[Some people] positively *wish* to     | Henry Spencer @ U of Toronto Zoology
believe ill of the modern world."-R.Peto|  henry@zoo.toronto.edu   utzoo!henry
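To illustrate "better than hex" while staying safe under case-insensitive matching, here is a small sketch that encodes a number in radix 36 using only digits and lowercase letters. The alphabet and the name radix36 are assumptions for this example; the alphabet C News actually chose is not given in the thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* 36 symbols that stay distinct under case-insensitive comparison
 * (an assumed alphabet, not necessarily the one C News uses). */
static const char DIGITS36[] = "0123456789abcdefghijklmnopqrstuvwxyz";

/*
 * Encode v in radix 36, most significant digit first.
 * buf must have room for 14 bytes (13 digits for a 64-bit value + NUL).
 */
void radix36(unsigned long v, char *buf)
{
    char tmp[16];
    int i = 0;

    assert(buf != NULL);
    do {                         /* emit digits least significant first */
        tmp[i++] = DIGITS36[v % 36];
        v /= 36;
    } while (v != 0);
    while (i > 0)                /* then reverse them into buf */
        *buf++ = tmp[--i];
    *buf = '\0';
}
```

With this encoding the timestamp 27e988be from the earlier post comes out as the six characters b2o79a, two fewer than in hex.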
tale@rpi.edu (David C Lawrence) (03/23/91)
As I am discovering now, articles are silently failing to be delivered to some (lots?) of sites because of some unspecified quality of my message-ids. I even trimmed the set by several characters, though they were always all valid RFC-822 id characters. This is a problem for me because of news.announce.newgroups, but I want to know just what it is that is causing the problem before I hack into it and change it to just use base 37 ([a-z0-9.]). As Henry has already pointed out, your base 64 is going to have a problem because of older B News sites, of which there are an appreciable amount to be annoying --- they do case-insensitive id handling. -- (setq mail '("tale@rpi.edu" "uupsi!rpi!tale" "tale@rpitsmts.bitnet"))
palkovic@linac.fnal.gov (John Palkovic) (03/23/91)
>>>>> On 22 Mar 91 05:07:49 GMT, brad@looking.on.ca (Brad Templeton) said:

> Is there any reason for the message-id to be readable? ...

Why should the headers be readable? The articles usually aren't. :-)

How about this little subroutine? Look in the header of this article
for an example. As long as pid's don't repeat in 60 sec it is fine.

/*
 * The following was inspired by a program apparently written by Jon Zeeff
 * (zeeff@b-tech.ann-arbor.mi.us). Palkovic@linac.fnal.gov, 3/16/91.
 */
#include <time.h>
#include <unistd.h>

/*
 * A string of some valid message id characters
 */
char string[] = "!#$%^&*_+|-=~`{}'?ABCDFGHJKLMNPQRSTVWXYZ1234567890";
#define size ((long) (sizeof(string) - 1))

/*
 * Write the encoded minute count followed by the encoded pid into s,
 * least significant digit first, and NUL-terminate the result.
 */
void
rand_id(char *s)
{
    long num;

    /* Minutes elapsed since a fixed epoch of 658216800 seconds. */
    num = ((long) time((time_t *) 0) - 658216800) / 60;
    do {
        *s++ = string[num % size];
        num /= size;
    } while (num);

    num = (long) getpid();
    do {
        *s++ = string[num % size];
        num /= size;
    } while (num);

    *s = '\0';
}
wb8foz@mthvax.cs.miami.edu (David Lesher) (03/23/91)
(Brad Templeton) writes:
>Is there any reason for the message-id to be readable?

I guess I'm the outcast. I *like* the cnews message-id's.
Machines that hoard old news, then suddenly dump them back
on the net with new dates seem, alas, to be a regular "feature"
in recent years. The message-id's are a good sanity check on
the "Is this pointless argument STILL going on?" feeling when this
happens.
--
A host is a host from coast to coast.....wb8foz@mthvax.cs.miami.edu
& no one will talk to a host that's close............(305) 255-RTFM
Unless the host (that isn't close)......................pob 570-335
is busy, hung or dead....................................33257-0335
peter@taronga.hackercorp.com (Peter da Silva) (03/23/91)
brad@looking.on.ca (Brad Templeton) writes:
> I say make it as small as you can, either the sequence number, which
> is smallest, or a radix 85 (or however many safe characters there are in
> message-ids) encoding of the minute and process-id, with epoch when
> you started your site up.

I use radix-36. The number of safe characters including punctuation really
doesn't save that much space... I'm not going to worry about an extra byte
or two.
--
(peter@taronga.uucp.ferranti.com) `-_-' 'U`
igb@fulcrum.bt.co.uk (Ian G Batten) (03/26/91)
In article <5084abb1.1bc5b@pisa.citi.umich.edu> rees@citi.umich.edu (Jim Rees) writes:
> CMU). Uids may be just another Multics-era idea that got lost in the quest
> for "simplicity" (is Unix still simpler than Multics?)

I recall that Multics unique identifiers are unique only per installation,
and only provided the clock is never reset. They provided a base-foo coding
of the clock, which was ticking microseconds from 1900. I was told that
there were some hairy interlocks on the clock so that in multi-processor
set-ups only one cpu could get a given clock value.

ian
fitz@wang.com (Tom Fitzgerald) (03/27/91)
brad@looking.on.ca (Brad Templeton) writes:

> I say make it as small as you can, either the sequence number, which
> is smallest, or a radix 85 (or however many safe characters there are in
> message-ids) encoding of the minute and process-id, with epoch when
> you started your site up.

The number of safe characters is way below 85 unfortunately. But above
36 it really doesn't do you a lot of good. If you want to crunch 31 bits
of timestamp and 15 bits of process ID into a string, it's easy to get a
10-character result (like you'll see in the message ID of this article)
and a pain to get any shorter than that. Some points on the curve are:

for 31-bit date:
    6 characters, alphabet size must be 36 or greater
    5 characters, alphabet size must be 74 or greater
for 15-bit process ID:
    3 characters, alphabet size must be 32 or greater
    2 characters, alphabet size must be 182 or greater

So using lowercase letters and digits gives you a 10 character identifier
(with the separating dot). It's impossible to get the alphabet size to 74
characters since some systems (early C news systems? VMS systems running
ANU news? Somebody...) require case-insensitive message IDs. You could
easily get rid of the dot separator by ALWAYS using 6+3 characters.

By treating the timestamp and process ID as a single 46-bit number, things
can get even smaller:

for 46-bit combined timestamp and process ID:
    9 characters, alphabet size must be 35 or greater
    8 characters, alphabet size must be 54 or greater

So for an 8-character identifier, the alphabet can be letters, digits and
18 random punctuation marks, which isn't too hard.

All this assumes that the world will come to an end in January of 2038,
but we all understand that. Do any systems use 16-bit process IDs?

---
Tom Fitzgerald Wang Labs fitz@wang.com
1-508-967-5278 Lowell MA, USA ...!uunet!wang!fitz
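The "points on the curve" follow from one inequality: a string of n characters over an alphabet of size b can represent 2^k distinct values only if b^n >= 2^k. A short sketch that searches for the smallest such b (the names covers and min_alphabet are made up for this example):

```c
#include <assert.h>

/*
 * Return 1 if base^len >= 2^bits, computed without overflow.
 * Assumes bits <= 62 so that 2^bits fits in an unsigned long long.
 */
static int covers(unsigned long long base, unsigned len, unsigned bits)
{
    unsigned long long target = 1ULL << bits;
    unsigned long long r = 1;
    unsigned i;

    for (i = 0; i < len; i++) {
        /* r * base >= target exactly when r >= ceil(target / base) */
        if (r >= (target + base - 1) / base)
            return 1;
        r *= base;               /* safe: r * base < target here */
    }
    return r >= target;
}

/* Smallest alphabet size that packs `bits` bits into `len` characters. */
unsigned long long min_alphabet(unsigned bits, unsigned len)
{
    unsigned long long base;

    assert(bits <= 62 && len > 0);
    for (base = 2; !covers(base, len, bits); base++)
        ;
    return base;
}
```

For the cases listed above, min_alphabet(31, 6) is 36, min_alphabet(31, 5) is 74, min_alphabet(15, 3) is 32, min_alphabet(15, 2) is 182, min_alphabet(46, 9) is 35 and min_alphabet(46, 8) is 54.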
irwin@uvmark.uucp (Frank Irwin) (03/28/91)
In article <b2x77c.9uo@wang.com> fitz@wang.com (Tom Fitzgerald) writes:
>brad@looking.on.ca (Brad Templeton) writes:
>
>The number of safe characters is way below 85 unfortunately. But above
>36 it really doesn't do you a lot of good. If you want to crunch 31 bits
>of timestamp and 15 bits of process ID into a string, it's easy to get a
                  ^^^^^^^
>
>Do any systems use 16-bit process IDs?

The IBM RS/6000 uses 31-bit (yup, thirty-one) process IDs. You can always
use the process slot in the kernel, which is encoded into the PID, but
that still uses 17 bits.
--
====================================================================
Frank Irwin               | "I'll bet $50 on that flush."
Vmark Software, Inc.      | Whooooosh!
..uunet!merk!uvmark!irwin | "Aaaaiiiieeee! Not *that* flush!"
rees@pisa.citi.umich.edu (Jim Rees) (03/29/91)
In article <b2x77c.9uo@wang.com>, fitz@wang.com (Tom Fitzgerald) writes:
> The number of safe characters is way below 85 unfortunately. But above
> 36 it really doesn't do you a lot of good. If you want to crunch 31 bits
> of timestamp and 15 bits of process ID into a string, it's easy to get a
> 10-character result (like you'll see in the message ID of this article)
> and a pain to get any shorter than that.
Wait, I've got an idea. How about numbering each article, starting with '1'
and going up. You could keep a counter, say in a file in /usr/lib/news, and
increment it each time. You could use decimal and not reach that 10
character limit until you posted 10 billion articles.
henry@zoo.toronto.edu (Henry Spencer) (03/29/91)
In article <50a41197.1bc5b@pisa.citi.umich.edu> rees@citi.umich.edu (Jim Rees) writes:
>Wait, I've got an idea. How about numbering each article, starting with '1'
>and going up. You could keep a counter, say in a file in /usr/lib/news, and
>increment it each time...

How do you coordinate simultaneous access to that file by multiple posters?
Across a network filesystem? Across NFS (the thing that shambles like a
filesystem)? What happens if it gets scrambled?

Shared databases are a lot trickier than they look. C News abandoned that
approach deliberately.
--
"The stories one hears about putting up | Henry Spencer @ U of Toronto Zoology
SunOS 4.1.1 are all true."  -D. Harrison|  henry@zoo.toronto.edu   utzoo!henry
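To make the coordination question concrete, here is a rough sketch of what a locked counter file involves on a single POSIX host. It is not how any news system does it: next_sequence is a hypothetical name, the sketch relies on fcntl() advisory locks, and whether those locks behave over a network filesystem depends on the implementation, which is exactly the worry raised above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Read the counter in `path`, add one, write it back, all under an
 * exclusive advisory lock.  Returns the new value, or -1 on error.
 * Hypothetical helper for illustration only.
 */
long next_sequence(const char *path)
{
    struct flock lk;
    char buf[32];
    ssize_t len;
    long n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    memset(&lk, 0, sizeof lk);
    lk.l_type = F_WRLCK;          /* exclusive lock                  */
    lk.l_whence = SEEK_SET;       /* start 0, length 0: whole file   */
    if (fcntl(fd, F_SETLKW, &lk) < 0) {   /* block until we own it   */
        close(fd);
        return -1;
    }

    len = read(fd, buf, sizeof buf - 1);
    if (len < 0) {
        close(fd);
        return -1;
    }
    buf[len] = '\0';
    n = strtol(buf, NULL, 10) + 1;        /* an empty file counts as 0 */

    len = snprintf(buf, sizeof buf, "%ld\n", n);
    if (lseek(fd, 0, SEEK_SET) < 0 || ftruncate(fd, 0) < 0 ||
        write(fd, buf, (size_t)len) != len) {
        close(fd);
        return -1;
    }
    close(fd);                    /* closing the file drops the lock */
    return n;
}
```

Even this small version has to decide what a scrambled file means (here, anything unparsable silently restarts the count), and a crash between the truncate and the write loses the counter entirely.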
kherron@ms.uky.edu (Kenneth Herron) (03/29/91)
rees@pisa.citi.umich.edu (Jim Rees) writes:
>In article <b2x77c.9uo@wang.com>, fitz@wang.com (Tom Fitzgerald) writes:
>Wait, I've got an idea. How about numbering each article, starting with '1'
>and going up. You could keep a counter, say in a file in /usr/lib/news, and
>increment it each time.

I thought of this once, but the locking could get painful and this has uses
beyond news anyway. How about a "unique number server" that does nothing
but provide numbers on demand. Give it a period of a million or a billion
and it'll take months or years to repeat...

Obviously there are problems with this; consider it a Partially Baked Idea.
--
Kenneth Herron kherron@ms.uky.edu
University of Kentucky (606) 257-2975
Department of Mathematics "Never trust gimmicky gadgets" -- the Doctor
louie@sayshell.umd.edu (Louis A. Mamakos) (03/29/91)
In article <b2x77c.9uo@wang.com> fitz@wang.com (Tom Fitzgerald) writes:
>The number of safe characters is way below 85 unfortunately. But above
>36 it really doesn't do you a lot of good. If you want to crunch 31 bits
>of timestamp and 15 bits of process ID into a string,

I had an interesting thought; on BSD flavored systems with
gettimeofday(), you will always get a unique time returned (provided
the clock is not reset, but only slewed using adjtime()). If
gettimeofday() is called more than once between clock interrupts, such
that the same time would have been returned, the low order bits of
tv_usec are farbled to ensure a unique value.

You might just dispense with the process id completely, and use a
unique time value composed of the time (though in that case, you've
got 64 bits 'o time, 32 each in tv_sec and tv_usec in struct timeval).

See, another reason to run NTP to synchronize your clocks and to beat
on your vendors that can't get a UNIX kernel to keep correct time and
not drop clock interrupts.

Just a random thought,

louie
brad@looking.on.ca (Brad Templeton) (03/29/91)
Down the road, operating system designers probably should consider getunique() as an operating system service. A very simple function, it would simply guarantee that it never, ever, returns the same string. Would be handy. It might have a few modes, providing strings that are unique for the process, day, system-forever and universe-forever. -- Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
jerry@olivey.ATC.Olivetti.Com (Jerry Aguirre) (03/30/91)
In article <1991Mar29.032900.548@ni.umd.edu> louie@sayshell.umd.edu (Louis A. Mamakos) writes:
>You might just dispense with the process id completely, and use a
>unique time value composed of the time (though in that case, you've
>got 64 bits 'o time, 32 each in tv_sec and tv_usec in struct timeval).

What! Trade 15 bits of PID for 32 bits of tv_usec? :-)

Actually the tv_usec doesn't use all 32 bits, only 20. It only counts
to 999,999. Even if it was bigger the correct action would be to scale
the excess into the seconds.

If one encoded it as <ssss.uuu@domain>, with the usec as a variable width
field, then one could get smaller message IDs by just looping on
gettimeofday until the usec returned is a small value. Delaying some
postings a fraction of a second to save the world from having to handle
bigger message-IDs. :-)
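A small sketch of the two points above: counting the bits that tv_usec really needs, and formatting <ssss.uuu@domain> with a variable-width microsecond field. The names bits_needed and format_tv_msgid are hypothetical, and decimal is an assumption since the post does not name a radix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of bits needed to hold v (0 for v == 0). */
unsigned bits_needed(unsigned long v)
{
    unsigned n = 0;

    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/*
 * Build "<ssss.uuu@domain>" with both fields in variable-width decimal.
 * sec and usec stand in for the tv_sec and tv_usec fields that
 * gettimeofday() fills in.  Returns 0 on success, -1 if buf is too small.
 */
int format_tv_msgid(char *buf, size_t len, long sec, long usec,
                    const char *domain)
{
    int n;

    assert(buf != NULL && domain != NULL);
    assert(usec >= 0 && usec <= 999999L);   /* tv_usec never exceeds this */
    n = snprintf(buf, len, "<%ld.%ld@%s>", sec, usec, domain);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

bits_needed(999999) is 20, matching the figure in the post, and a posting that happens to land on a small microsecond count yields a shorter ID, which is the basis of the tongue-in-cheek "loop until usec is small" suggestion.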
amanda@visix.com (Amanda Walker) (04/02/91)
rees@pisa.citi.umich.edu (Jim Rees) writes:
> You could keep a counter, say in a file in
> /usr/lib/news, and increment it each time.
Henry brought up locking, but there's also the issue of file system
access/writeability (NFS-mounted news spool/NNTP).
--
Amanda Walker amanda@visix.com
Visix Software Inc. ...!uunet!visix!amanda
--
Q.: What do you get if you cross a godfather with a lawyer?
A.: Someone who makes an offer you can't understand.
hks@nic.funet.fi (Harri Salminen) (04/17/91)
Would it be possible to have after the time a checksum calculated over
most of the message? The checksum calculation could include at least the
newsgroup name and subject if not everything. It's unlikely that even an
automatic program sends within one second two messages with same subject
to same newsgroup. If you're worried that it still might be the same in
some very rare circumstances, it should be relatively easy to have the
rejection routine make a diff (or just compare word counts) with the
original message and send it to the news manager for perusal.

Including the newsgroup name would make it possible to munge the
message-ids to become consistently different when gatewayed to two
different newsgroups from mail. In theory you shouldn't tamper with
message-ids if they are already present but in practise you might have
to or the message might get lost. The other way around the problem
would be to change all history database implementations to include the
newsgroup name or number in some form, which would have to be changed in
all nodes wanting to benefit from this feature...

The third and best alternative, which I hope could someday be achieved,
is to standardize mailing lists at least as clearly as news messages so
that crossposting, followups, references etc. would work nicely, giving
us a truly global distributed group communication service (some would
call it computer conferencing I presume).

The other advantage is that this style of message id (marked with some
special delimiter?) could be used to detect problems in message transport.

Since most of us haven't yet migrated to some single ISO standard
character set, only the US ASCII representations of 0-9, a-z and A-Z,
which are common to almost all systems, should be used.

Harri
--
Harri K Salminen - Finnish University & Research Network project
hks@funet.fi, LK-HS at FINHUTC, tut!hks, OPMVAX::hks, OH2LGE@OH2RBI
FUNET c/o VTKK/TLP, PL 40, 02101 Espoo, Finland - +358-0-4572288
"Virtually, I don't work, I just netWORK :-)"
henry@zoo.toronto.edu (Henry Spencer) (04/18/91)
In article <1991Apr16.174706.4963@nic.funet.fi> hks@funet.fi writes:
>Would it be possible to have after the time a checksum calculated over
>most of the message? The checksum calculation could include at
>least the newsgroup name and subject if not everything. It's unlikely
>that even an automatic program sends within one second two messages
>with same subject to same newsgroup...

I'm not sure what your objective is here. What this is essentially
doing is adding a random number to the message ID. Using the process ID
accomplishes the same thing, with random numbers that are *guaranteed
unique* over the whole system, making collisions essentially impossible.

>Including the newsgroup name would make it possible to munge the
>message-ids to become consistently different when gatewayed to two
>different newsgroups from mail. In theory you shouldn't tamper with
>message-ids if they are already present but in practise you might have
>to or the message might get lost.

Can you explain this in more detail? I don't see why you ever have to
tamper with a legal message ID, and you most certainly should never have
to assign more than one to the same message. Gatewaying to multiple
newsgroups should be done with a cross-posting, not by posting the same
article to each newsgroup in turn!

>The other advantage of this style of message id (marked with some special
>delimiter?) could be used to detect problems in message transport...

Geoff and I thought about this long and hard during C News development.
Some early versions generated a Checksum header for this purpose. We
eventually deleted it.

The problem is that articles which go via broken networks like Bitnet are
often changed slightly in harmless ways, like having tabs expanded to
spaces or empty lines changed to contain a single space. So you get a lot
of spurious checksum mismatches. Given this, we couldn't see a use for the
checksums. You can't just discard articles with bad checksums. Messages
complaining about it will be frequent enough that people will ignore them.
The software problems that cause them are mostly already known, so
alerting people won't do any good. Checking the checksum on every article
is costly, especially if the algorithm is trying to be clever and ignore
harmless kinds of damage.

There are perhaps rare circumstances where it would be useful to know
whether an article was damaged or not, but they didn't seem common enough
to justify hauling the checksum along in every message.
--
And the bean-counter replied,           | Henry Spencer @ U of Toronto Zoology
"beans are more important".             |  henry@zoo.toronto.edu   utzoo!henry
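For readers wondering what a checksum that tries to "ignore harmless kinds of damage" might look like, here is one possible sketch. It is not the algorithm C News used (the thread does not describe it): it is a Fletcher-16-style sum that simply skips spaces and tabs, and article_checksum is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fletcher-16-style checksum over text, skipping spaces and tabs so that
 * tab expansion and "empty line became a single space" do not change it.
 * Illustrative only; an assumption of this sketch, not C News code.
 */
unsigned article_checksum(const char *text)
{
    unsigned a = 0, b = 0;

    assert(text != NULL);
    for (; *text != '\0'; text++) {
        if (*text == ' ' || *text == '\t')
            continue;                        /* harmless whitespace */
        a = (a + (unsigned char)*text) % 255;
        b = (b + a) % 255;                   /* makes the sum order-sensitive */
    }
    return (b << 8) | a;
}
```

The price of this tolerance is that "ab" and "a b" now hash alike, and every article still has to be scanned byte by byte on arrival, which is the cost mentioned above.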
wayne@dsndata.uucp (Wayne Schlitt) (04/18/91)
In article <1991Apr17.212354.12236@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
> In article <1991Apr16.174706.4963@nic.funet.fi> hks@funet.fi writes:
> >Would it be possible to have after the time a checksum calculated over
> >most of the message? The checksum calculation could include at
> >least the newsgroup name and subject if not everything. It's unlikely
> >that even an automatic program sends within one second two messages
> >with same subject to same newsgroup...
>
> I'm not sure what your objective is here. What this is essentially
> doing is adding a random number to the message ID. Using the process ID
> accomplishes the same thing, with random numbers that are *guaranteed
> unique* over the whole system, making collisions essentially impossible.

while looking through a long list of message id's, i came across one
message id format that i kind of like. they used
<time-date-stamp.username@site>. i hadn't really thought about it, but
having the login name instead of the process id has a real advantage
in that you will often have a valid email address to the person who
posted the article.

granted, using the login name has the same problem as a process id, in
that the same user can generate more than one article per second, but
it isn't any worse than the process id either. the login name may also
be a little bit longer on average than the process id, but probably
not by that much. you would also end up with lots of articles coming
from user names of "gateway", or "news", but even that is no _worse_
than the process id. getting the user's name in a portable way may be a
problem, i am not sure...

anyway, just think of the fun hacks you could add to expire to kill
things from known net idiots quicker, and leave articles from doug
gwyn, chris torek and (of course) henry spencer around longer.

just a thought...

-wayne
wisner@ims.alaska.edu (Bill Wisner) (04/19/91)
In article <WAYNE.91Apr17195303@dsndata.uucp> wayne@dsndata.uucp (Wayne Schlitt) writes:
>granted, using the login name has the same problem as a process id, in
>that the same user can generate more than one article per second, but
>it isnt any worse than the process id either.

Wrong. The message ID is generated by inews or anne.jones, which gets
invoked each time an article is posted. Thus, the process ID is different
for every article. The username is constant. If the username replaces the
PID, two articles posted by the same user in one second will have the same
message ID. Using the PID, the IDs will be different since the PID will
have changed.

I think it's safe to say that it's very unlikely for any system to cycle
through all 30,000 process IDs in one second.

Bill Wisner <wisner@ims.alaska.edu> Gryphon Gang Fairbanks AK 99775
bnug, dude yeah .
richard@locus.com (Richard M. Mathews) (04/20/91)
wisner@ims.alaska.edu (Bill Wisner) writes:
>Wrong. The message ID is generated by inews or anne.jones, which gets
>invoked each time an article is posted. Thus, the process ID is different
>for every article.

Actually, this isn't safe on all systems. On "secure" systems which
generate pseudo-random PIDs, you are certain that there will not be two
processes at the same time with the same PID; but successive processes
could have the same PID. The probability of two with the same PID being
created during different parts of the same second is small, but it doesn't
require cycling through 30000 processes.

Richard M. Mathews                      Lietuva laisva = Free Lithuania
richard@locus.com                       Brivu Latviju = Free Latvia
lcc!richard@seas.ucla.edu               Eesti vabaks = Free Estonia
...!{uunet|ucla-se|turnkey}!lcc!richard
res@colnet.uucp (Rob Stampfli) (04/23/91)
While we are on the subject of Message-IDs, how about this: Choose any workable standard for message-IDs you like, provided it has some randomness to it (already a good idea for other reasons). Then pass this format thru a one-way authenticating function (one that is hard to invert) which produces a one-to-one mapping of inputs to outputs. Use the output of this function, suitably reformatted to match the specification for the Message-ID field, in the Message-ID field of the message being posted. Finally, save the unencrypted Message-ID at the site in a file accessible only to the News software. Because of the one-to-one mapping, it can be guaranteed that encrypted Message-IDs will be unique if they are generated from a subset of unique unencrypted Message-IDs. Now, once this is in place, change the News software to demand that the unencrypted Message-ID be sent along with any cancel message (define a new "Authentication: " field) if a cancel request is for a Message-ID which matches the format of an encrypted Message-ID. The result is that cancel messages can no longer be forged. Only the poster or the News administrator at the site the message originates at can cancel it. Of course, a given site can kill the message locally (and for downstream sites), but it cannot purge the message globally. I think such a scheme would have significant advantages in the anarchy we call Usenet. -- Rob Stampfli, 614-864-9377, res@kd8wk.uucp (osu-cis!kd8wk!res), kd8wk@n8jyv.oh
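The flow of this proposal can be sketched in a few lines of C. Big caveat: the hash below is FNV-1a, chosen only because it fits in a dozen lines. It is neither hard to invert nor one-to-one, so it is a placeholder for the real one-way function the scheme needs, and all three function names are invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * 64-bit FNV-1a.  PLACEHOLDER ONLY: not cryptographically one-way and
 * not collision-free; a real deployment would need a proper one-way
 * function in its place.
 */
static unsigned long long toy_hash(const char *s)
{
    unsigned long long h = 14695981039346656037ULL;   /* FNV offset basis */

    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;                        /* FNV prime */
    }
    return h;
}

/* Derive the published ID (16 hex digits) from the private, unencrypted
 * one that stays at the posting site.  buf must hold at least 17 bytes. */
void public_id(const char *private_id, char *buf)
{
    assert(private_id != NULL && buf != NULL);
    sprintf(buf, "%016llx", toy_hash(private_id));
}

/* A cancel is honoured only if the private ID it carries (in the proposed
 * "Authentication: " field) maps onto the published Message-ID. */
int cancel_is_authentic(const char *published, const char *claimed_private)
{
    char buf[17];

    public_id(claimed_private, buf);
    return strcmp(buf, published) == 0;
}
```

Every site can run the check because the function itself is public; only someone who holds the private ID can produce a cancel that passes it.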
hks@nic.funet.fi (Harri Salminen) (04/26/91)
henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <1991Apr16.174706.4963@nic.funet.fi> hks@funet.fi writes:
>>Would it be possible to have after the time a checksum calculated over
>>the most of the message? The checksum calculation should could include at
>>least the newsgroup name and subject if not everything. It's unlikely
>>that even an automatic program sends within one second two messages
>>with same subject to same newsgroup...
>I'm not sure what your objective is here. What this is essentially
>doing is adding a random number to the message ID. Using the process ID
>accomplishes the same thing, with random numbers that are *guaranteed
>unique* over the whole system, making collisions essentially impossible.

We need a unique identifier, but making the PID part of it does not
guarantee uniqueness: some systems don't rotate PIDs, others might not
have them at all, and a third group of systems might not even use a
separate process for each message. Of course implementors will choose
some way (even a counter) to make the IDs unique, but I thought the idea
was to give the randomness some meaning and make use of it. The idea of
one-way unique encryption sounds fine if one can find a suitable
algorithm, which may be hard... You could use RSA or some other
public-key system, but the message-IDs might get several lines long :-)
Wasn't there an RFC on authenticated mail that could be utilized?

>>Including the newsgroup name would make it possible to munge the
>>message-ids to become consistently different when gatewayed to two
>>different newsgroups from mail. In theory you shouldn't tamper with
>>message-ids if they are already present but in practise you might have
>>to or the message might get lost.
>Can you explain this in more detail? I don't see why you ever have to
>tamper with a legal message ID, and you most certainly should never have
>to assign more than one to the same message.

I believe that gateways should be liberal in what they accept but output
only "legal" format. Although it's rare nowadays, you'll sometimes get
messages with illegal characters in message-IDs (Zmailer is the only
RFC-822 mailer I know that really cares what the message-ID looks
like...). Some popular gateways just map them to legal ones and pass
them through. The same applies to other headers. I don't like systems
that discard or return a message if it has just some minor errors (like
a "non-existing" timezone) and has already reached all list subscribers
without problems. Zmailer strikes quite a nice compromise by letting the
message through and stating very clearly what was wrong according to
which RFC. It even checks the Message-ID and References fields...

>Gatewaying to multiple
>newsgroups should be done with a cross-posting, not by posting the same
>article to each newsgroup in turn!

It's almost impossible to do crossposting when you have multiple
gateways in different places. To do so, each gateway would have to work
out from the mail headers and a global gateway database which groups the
message is going to (sometimes this is impossible, since the list name
might not even appear in an X-Resent-Cc: field in the message body, not
to mention the millions of ways a list's address can be represented).
The only reliable method (unless you gateway a listserv list with the
right newsgroup header inserted) is to have an alias for each incoming
list on the gateway host. Sometimes the same list is gatewayed to two
different distributions until one of the groups is removed. We need to
define mailing lists better and coordinate gateways to improve the
situation. When a message reaches two different newsgroups via two
different gateways, it will be shown only in the group where it arrives
first. Fortunately it's common that a person reads both groups, so you
can live with it because the message is at least somewhere...

To solve this small problem you need either different message-IDs or
history checking that includes the newsgroup. Maybe the latter is the
cleaner way after all?

>>The other advantage of this style of message id (marked with some special
>>delimiter?) could be used to detect problems in message transport...
>--
>And the bean-counter replied, | Henry Spencer @ U of Toronto Zoology
>"beans are more important". | henry@zoo.toronto.edu utzoo!henry
--
Harri K Salminen - Finnish University & Research Network project
hks@funet.fi, LK-HS at FINHUTC, tut!hks, OPMVAX::hks, OH2LGE@OH2RBI
FUNET c/o VTKK/TLP, PL 40, 02101 Espoo, Finland - +358-0-4572288
"Virtually, I don't work, I just netWORK :-)"
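The second option in the post above, history checking that includes the newsgroup, might look roughly like this. It is only a sketch: the `History` class and its `accept` method are hypothetical names, and a real news system keeps its history on disk rather than in an in-memory set.

```python
class History:
    """Duplicate filter keyed on (message-ID, newsgroup), not message-ID alone."""

    def __init__(self):
        self._seen = set()

    def accept(self, message_id, newsgroup):
        """Return True the first time an ID is seen in a given newsgroup."""
        key = (message_id, newsgroup)
        if key in self._seen:
            return False  # true duplicate: same article, same group
        self._seen.add(key)
        return True
```

With this keying, a list message that arrives through two gateways into two different groups is accepted once in each group, while a second copy arriving in the same group is still rejected.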
hks@nic.funet.fi (Harri Salminen) (04/26/91)
I forgot to note that if you just ADD a string to the already unique
message-ID just before the @, you'll get an ID that also differs from
one newsgroup to another. The string could be a CRC of the newsgroup
name & subject, or some other hash of the newsgroup name. That way you
don't have to change anything but the gateway, although it isn't as
clean as checking the newsgroup name in the history check.

Harri
--
Harri K Salminen - Finnish University & Research Network project
hks@funet.fi, LK-HS at FINHUTC, tut!hks, OPMVAX::hks, OH2LGE@OH2RBI
FUNET c/o VTKK/TLP, PL 40, 02101 Espoo, Finland - +358-0-4572288
"Virtually, I don't work, I just netWORK :-)"
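As an illustration of the gateway-side idea in the post above, here is a short sketch that inserts a CRC of the newsgroup name and subject just before the @. The function name `tag_message_id`, the "." separator, and the 8-digit hex encoding are all assumptions made for the example; the post does not specify them.

```python
import zlib


def tag_message_id(message_id, newsgroup, subject):
    """Insert a CRC-32 of newsgroup + subject before the @ of a message-ID."""
    local, sep, host = message_id.strip("<>").rpartition("@")
    if not sep:
        raise ValueError("message-ID has no @: %r" % message_id)
    # CRC-32 over "newsgroup\nsubject"; the newline separator is an assumption.
    crc = zlib.crc32(f"{newsgroup}\n{subject}".encode("utf-8")) & 0xFFFFFFFF
    return f"<{local}.{crc:08x}@{host}>"
```

The mapping is deterministic, so every gateway that feeds the same list message into the same group produces the same ID, while gateways feeding different groups produce different ones.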