schaefer@ogicse.ogc.edu (Barton E. Schaefer) (02/06/90)
In article <7129@ogicse.ogc.edu> I wrote:
} In article <3257@taux01.UUCP> crehta@taux01.nsc.com (Ran Ever-Hadani) writes:
} } How hard should it be to fix the folder sort-by-date to take
} } time zones into consideration?
}
} Mush currently does absolutely *nothing* with time zones -- it doesn't
} even store them in its internal representation of the date.  I've begun
} working on some code for timezones but it isn't anywhere near finished.

OK, folks, I need some help here.

What follows is a list of all the date formats mush understands.  Some of
them have time zone fields, some appear not to.  All were reported by users
of mush at one time or another.  I'll run through the ones that I know have
TZ first, to give you an idea what I'm after, and then if you recognize any
of the other formats and can tell me if and where they give a TZ -- probably
at the far right somewhere, but I can't be sure -- we'll be one step closer
to doing this right.

First comes mush's comment-text description of the format, and then some
reconstructions as best I can do them -- I've never seen most of these.

===
The formats that have time zones.  Note that we've had complaints about
formats that lack a seconds field in the time, so we have to check for
both cases if we scan past a time field looking for a timezone.

---
 * day_name month_name day_number time timezone year_number

	Mon Feb 5 14:05:57 PST 1990
	Mon Feb 5 14:05:57 -0800 1990
	Mon Feb 5 14:05 PST 1990
	(This should be good ol' ctime, but somebody mangles it anyway.)

---
 * day_number month_name year_number time timezone ...

	5 Feb 1990 14:05:57 PST
	5 Feb 1990 14:05:57 -0800
	5 Feb 1990 14:05 PST
	(The first two are RFC822 format, which, interestingly, RFC822
	violates in its own examples section -- there, it uses something
	like the last one, except with no `:' between hours/minutes.  No
	one has ever complained about encountering that format.)
---
 * day_number month_name year_number time-timezone (day)
 *                                   ^no colon separator

	5 Feb 1990 140557-PST (Mon)
	5 Feb 1990 1405-PST (Mon)
	5 Feb 1990 1405-0800 (Mon)	[??]
	(Does this ever show up with a GMT-offset as zone?  Does it ALWAYS
	show up with a GMT-offset, and the "-" is part of the time rather
	than a separator?)

===
This next format is a bit suspicious.  I wonder if that "-" floating out
there is the beginning of a GMT-offset zone.  Anybody know?

---
 * day_number month_name year_number, time "-"

	5 Feb 1990, 14:05:57 -
	5 Feb 1990, 14:05:57 -0800	[??]

===
These formats are not known to have time zones.  Recognize any of them?

---
 * day_name month_name day_number time year_number

	Mon Feb 5 14:05:57 1990

---
 * day_name month_name day_number year_number time

	Mon Feb 5 1990 14:05:57

---
 * day_number month_name year_number 12_hour_time am_or_pm

	Mon Feb 5 1990 02:05:57 pm

---
 * day_name day_number month_name year_number time

	Mon 5 Feb 1990 14:05:57

---
 * day_number month_name year_number time

	5 Feb 1990 14:05:57

---
 * day_number-month_name-year time

	5-Feb-1990 14:05:57

---
 * day_name, day_number-month_name-year time

	Mon, 5-Feb-1990 14:05:57

===
Any help you can send along is appreciated, but please don't bombard me
with mail unless you have some idea what you're talking about, or can show
me an example of one of the time-zone-less formats above showing where the
timezone can be found.

Thanks!
-- 
Bart Schaefer      "February.  The hangnail on the big toe of the year."
                                                                -- Duffy
schaefer@cse.ogi.edu (used to be cse.ogc.edu)
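[Editorial sketch, not part of the original post.] The scan described above, looking past a time field that may or may not carry seconds to see whether a zone follows, could be approximated as shown here. This is not mush's code: the pattern name `TIME_ZONE_RE` and the helper `find_zone` are hypothetical, and the sketch assumes a zone is either a 3-to-5-letter uppercase abbreviation or a signed four-digit GMT offset. It deliberately does not handle the colon-less `1405-PST` form.

```python
import re

# Time field: HH:MM with optional :SS, then an optional zone token that is
# either a 3-to-5-letter uppercase abbreviation or a signed 4-digit offset.
# (Assumed shape of a zone; the thread itself leaves this open.)
TIME_ZONE_RE = re.compile(
    r"\b(?P<hour>\d{1,2}):(?P<minute>\d{2})(?::(?P<second>\d{2}))?"
    r"(?:\s+(?P<zone>[A-Z]{3,5}|[+-]\d{4}))?(?=\s|$)"
)


def find_zone(date_text):
    """Return the zone token that follows the time field, or None."""
    match = TIME_ZONE_RE.search(date_text)
    if match is None:
        return None
    return match.group("zone")


if __name__ == "__main__":
    for sample in ("Mon Feb 5 14:05:57 PST 1990",
                   "Mon Feb 5 14:05 PST 1990",
                   "5 Feb 1990 14:05:57 -0800",
                   "Mon Feb 5 14:05:57 1990"):
        print(sample, "->", find_zone(sample))
```

A lone trailing "-" (the "suspicious" format) yields no zone here, which matches the uncertainty expressed in the post.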
loverso@XYLOGICS.COM (John Robert LoVerso) (02/06/90)
Hmmm - is this date-parsing code based upon something like unctime(), of B news? If not, that code already parses lots of different date formats. If you've got additional formats, you could add it to unctime() and hand it back to the rest of the world... John "I never try to reinvent the wheel; But when I do, I just make it square" LoVerso
levin@magnus.Hotline.Com (Michael M Levin) (02/07/90)
In article <9002060315.AA10418@xenna.Xylogics.COM> loverso@XYLOGICS.COM (John Robert LoVerso) writes:
>Hmmm - is this date-parsing code based upon something like unctime(), of
>B news?  If not, that code already parses lots of different date formats.
>If you've got additional formats, you could add it to unctime() and hand
>it back to the rest of the world...

Me thinks, perhaps, that we are going to find ourselves beating the entire
timezone issue to death, since there are NO real standards recognized by
the 'entire civilized world'.  I believe that a slightly different approach
is in order- like, maybe, deciding on just what the OFFICIAL standard
really ought to be (such as the -0800 format), and if there really isn't
any pressing reason, _maybe_ just decide on using a header field which is
called by a slightly different name-- like "Std-time: ", which could then
be expressed in Greenwich format.  If a PD routine to generate time in this
format were widely distributed, in a fashion suitable for inclusion in all
sendmail or smail generated email, and also suitable for inclusion in the
various mail-shells, then a convention for sorting based on a universally
adopted time standard would make everybody real-real-real happy.

cost-- $0.02

					Mike Levin

-- 
Michael Levin        SilentRadio Headquarters- Los Angeles
20732 Lassen Street, Chatsworth CA 91311 U.S.A.
E-Mail: levin@Hotline.Com        {att|csun|srhqla}!magnus!mml
davidsen@crdos1.crd.ge.COM (02/07/90)
I doubt that anything which depends on other people doing things to their mailer is going to find a lot of non-compliance. I would be much happier having a really strong date interpreter on my end than expecting people to add another field to their headers.
schaefer@ogicse.ogc.edu (Barton E. Schaefer) (02/08/90)
In article <630@magnus.Hotline.Com> levin@magnus.Hotline.Com (Michael M Levin) writes:
} In article <9002060315.AA10418@xenna.Xylogics.COM> loverso@XYLOGICS.COM (John Robert LoVerso) writes:
} >Hmmm - is this date-parsing code based upon something like unctime(), of
} >B news?  If not, that code already parses lots of different date formats.
} >If you've got additional formats, you could add it to unctime() and hand
} >it back to the rest of the world...
}
} Me thinks, perhaps, that we are going to find ourselves beating the entire
} timezone issue to death, since there are NO real standards recognized by
} the 'entire civilized world'.

Just to clarify a point:

The standard to which Mush adheres (or attempts to) is RFC822, Standard for
the Format of ARPA Internet Text Messages.  (Mush will also eventually
support X.400 format if that is different -- I have yet to obtain a copy of
the X.400 specs, so I can't say.)  The format specified by RFC822 is:

	Day, Date Month Year Hour:Minute:Second Timezone

where "Day," is optional.  Day and Month are 3-letter abbreviations; Date,
Year, Hour, Minute, and Second are two digits each; and Timezone is either
an offset from Universal Time (GMT) or one of a short list of North American
3-letter timezone abbreviations.  Offsets from UT are of the form -HHMM or
+HHMM, e.g. PST is -0800, and Newfoundland is (I think) -0330 (just to show
that the minutes are indeed necessary).

Obviously, this doesn't cover everybody.  Though almost everyone who is not
using X.400 nominally complies with 822, there are a number of minor
variations (omitting the comma after Day, using a 4-digit Year, omitting
the Seconds, swapping the places of Date and Month, etc.) and there are
lots of 3- 4- and 5-letter timezone abbreviations outside NA.  Mush has so
far avoided dealing with the time zone question (hence my original posting)
but it handles all the other variations.
} I believe that a slightly different approach
} is in order- like, maybe, deciding on just what the OFFICIAL standard
} really ought to be (such as the -0800 format), and if there really isn't
} any pressing reason, _maybe_ just decide on using a header field which is
} called by a slightly different name-- like "Std-time: ", which could then
} be expressed in Greenwich format.

In article <5D01C5E2E4@crdos1> davidsen@crdos1.crd.ge.com writes:
}
} I doubt that anything which depends on other people doing things to
} their mailer is going to find a lot of non-compliance. I would be much
} happier having a really strong date interpreter on my end than expecting
} people to add another field to their headers.

Bill has the right idea once you delete "non-" from that first sentence.

However, I don't think it's worthwhile for Mush to go so far as parsing some
of the really outlandish forms.  No mailer is going to generate "Saturday,
February Third, Nineteen Ninety, Twelve Fifty-Seven Thirteen Post Meridian,
Pacific Standard Time".  I'll admit that Mush's present date parser could
stand improvement, but it accepts every date format that has been reported
since Mush first appeared.
-- 
Bart Schaefer      "February.  The hangnail on the big toe of the year."
                                                                -- Duffy
schaefer@cse.ogi.edu (used to be cse.ogc.edu)
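[Editorial sketch, not part of the original post.] The -HHMM / +HHMM offsets and the North American zone names described in the post above can be reduced to a single number of minutes, which is what a sort-by-date would need in order to compare messages from different zones. The names `RFC822_ZONES`, `zone_to_minutes`, and `to_universal` are hypothetical and this is not Mush's implementation; the table holds only the named zones RFC 822 defines, and military single-letter zones are left out.

```python
from datetime import datetime, timedelta

# Zone names defined by RFC 822, as minutes east of Universal Time.
RFC822_ZONES = {
    "UT": 0, "GMT": 0,
    "EST": -300, "EDT": -240,
    "CST": -360, "CDT": -300,
    "MST": -420, "MDT": -360,
    "PST": -480, "PDT": -420,
}


def zone_to_minutes(zone):
    """Convert '+HHMM', '-HHMM', or an RFC 822 zone name to minutes east of UT."""
    if zone in RFC822_ZONES:
        return RFC822_ZONES[zone]
    if len(zone) == 5 and zone[0] in "+-" and zone[1:].isdigit():
        minutes = int(zone[1:3]) * 60 + int(zone[3:5])
        return -minutes if zone[0] == "-" else minutes
    raise ValueError("unrecognized zone: %r" % zone)


def to_universal(local, zone):
    """Shift a naive local datetime to UT so that mixed-zone dates sort correctly."""
    return local - timedelta(minutes=zone_to_minutes(zone))


if __name__ == "__main__":
    sent_pst = to_universal(datetime(1990, 2, 5, 14, 5, 57), "PST")
    sent_est = to_universal(datetime(1990, 2, 5, 16, 0, 0), "EST")
    # The EST message was sent earlier in absolute terms despite the later clock time.
    print(sorted([sent_pst, sent_est]))
```

Keeping the offset in minutes rather than hours is what lets a half-hour zone such as -0330 be represented at all.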
kjones@talos.uu.net (Kyle Jones) (02/13/90)
Barton E. Schaefer writes:
 >  * day_number month_name year_number time timezone ...
 >
 > 	5 Feb 1990 14:05:57 PST
 > 	5 Feb 1990 14:05:57 -0800
 > 	5 Feb 1990 14:05 PST
 > 	(The first two are RFC822 format, which, interestingly, RFC822
 > 	violates in its own examples section -- there, it uses something
 > 	like the last one, except with no `:' between hours/minutes.  No
 > 	one has ever complained about encountering that format.)

None of these are RFC 822 compliant because the year number is supposed
to only have two digits.

I think you're working too hard with these date formats.  At some point
you've got to blow off all these nonstandard variants and just stick with the
standard.  You're going to be old and gray with all but six marbles gone
before you manage to grok all the weird date formats out there.

I suggest that you handle two: RFC 822 and the format the UNIX date(1)
command returns.
schaefer@ogicse.ogc.edu (Barton E. Schaefer) (02/13/90)
In article <1990Feb12.181547.27427@talos.uu.net> kyle@xanth.cs.odu.edu writes:
} Barton E. Schaefer writes:
}  >  * day_number month_name year_number time timezone ...
}  >
}  > 	5 Feb 1990 14:05:57 PST
}  > 	5 Feb 1990 14:05:57 -0800
}  > 	5 Feb 1990 14:05 PST
}
} None of these are RFC 822 compliant because the year number is supposed
} to only have two digits.

Typo, my apologies.  You'll find that mush actually does use only 2 digits
whenever it creates such a date.  It will, however, accept either 2 or 4
digits when parsing it, which is part of the reason I mistyped here.

} I think you're working too hard with these date formats.  At some point
} you've got to blow off all these nonstandard variants and just stick with the
} standard.  You're going to be old and gray with all but six marbles gone
} before you manage to grok all the weird date formats out there.

Undoubtedly, but I think we should at least try to do a better job with
the several that we already grok.
-- 
Bart Schaefer      "February.  The hangnail on the big toe of the year."
                                                                -- Duffy
schaefer@cse.ogi.edu (used to be cse.ogc.edu)
steve@thelake.mn.org (Steve Yelvington) (02/13/90)
[In article <1990Feb12.181547.27427@talos.uu.net>,
 kjones@talos.uu.net (Kyle Jones) writes ... ]

> Barton E. Schaefer writes:
>  >  * day_number month_name year_number time timezone ...
>  >
>  > 	5 Feb 1990 14:05:57 PST
>  > 	5 Feb 1990 14:05:57 -0800
>  > 	5 Feb 1990 14:05 PST
>  > 	(The first two are RFC822 format, which, interestingly, RFC822
>  > 	violates in its own examples section -- there, it uses something
>  > 	like the last one, except with no `:' between hours/minutes.  No
>  > 	one has ever complained about encountering that format.)
>
> None of these are RFC 822 compliant because the year number is supposed
> to only have two digits.

That is no longer true; see the following excerpt:

   RFC1123                  MAIL -- SMTP & RFC-822              October 1989

      5.2.14  RFC-822 Date and Time Specification: RFC-822 Section 5

         The syntax for the date is hereby changed to:

            date = 1*2DIGIT month 2*4DIGIT

         All mail software SHOULD use 4-digit years in dates, to ease
         the transition to the next century.

         There is a strong trend towards the use of numeric timezone
         indicators, and implementations SHOULD use numeric timezones
         instead of timezone names.  However, all implementations MUST
         accept either notation.  If timezone names are used, they MUST
         be exactly as defined in RFC-822.

         The military time zones are specified incorrectly in RFC-822:
         they count the wrong way from UT (the signs are reversed).  As
         a result, military time zones in RFC-822 headers carry no
         information.

         Finally, note that there is a typo in the definition of "zone"
         in the syntax summary of appendix D; the correct definition
         occurs in Section 3 of RFC-822.

(Followups are directed to comp.mail.headers.)

-- 
   Steve Yelvington at the (thin ice today) lake in Minnesota
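[Editorial sketch, not part of the original post.] The revised grammar quoted above, `date = 1*2DIGIT month 2*4DIGIT`, accepts both the two-digit years of RFC 822 and the four-digit years that RFC 1123 recommends. A parser that accepts either might look like the following. The names `DATE_RE` and `parse_date` are hypothetical, and the `century` default of 1900 is purely an assumption for illustration; neither RFC quoted in this thread says how a two-digit year should be expanded.

```python
import re

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

# date = 1*2DIGIT month 2*4DIGIT
DATE_RE = re.compile(r"^(\d{1,2})\s+([A-Z][a-z]{2})\s+(\d{2,4})$")


def parse_date(text, century=1900):
    """Parse 'D Mon YY' or 'D Mon YYYY' into a (year, month, day) tuple."""
    match = DATE_RE.match(text.strip())
    if match is None or match.group(2) not in MONTHS:
        raise ValueError("not an RFC 1123 date: %r" % text)
    day = int(match.group(1))
    month = MONTHS.index(match.group(2)) + 1
    year_digits = match.group(3)
    year = int(year_digits)
    if len(year_digits) == 2:
        # Assumption: a bare two-digit year belongs to the given century.
        year += century
    # Three-digit years pass the grammar and are returned unchanged.
    return (year, month, day)


if __name__ == "__main__":
    print(parse_date("5 Feb 90"))
    print(parse_date("5 Feb 1990"))
```

Both calls yield the same tuple, so a sort keyed on the result treats the two spellings of the year identically.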