[comp.mail.mush] Time Zones -- help me out

schaefer@ogicse.ogc.edu (Barton E. Schaefer) (02/06/90)

In article <7129@ogicse.ogc.edu> I wrote:
} In article <3257@taux01.UUCP> crehta@taux01.nsc.com (Ran Ever-Hadani) writes:
} } How hard should it be to fix the folder sort-by-date to take
} } time zones into consideration?
} 
} Mush currently does absolutely *nothing* with time zones -- it doesn't
} even store them in its internal representation of the date.  I've begun
} working on some code for timezones but it isn't anywhere near finished.

OK, folks, I need some help here.  What follows is a list of all the date
formats mush understands.  Some of them have time zone fields, some appear
not to.  All were reported by users of mush at one time or another.

I'll run through the ones that I know have TZ first, to give you an idea
what I'm after, and then if you recognize any of the other formats and can
tell me if and where they give a TZ -- probably at the far right somewhere,
but I can't be sure -- we'll be one step closer to doing this right.  First
comes mush's comment-text description of the format, and then some
reconstructions as best I can do them -- I've never seen most of these.

===
The formats that have time zones.  Note that we've had complaints about
formats that lack a seconds field in the time, so we have to check for
both cases if we scan past a time field looking for a timezone.
---
     *   day_name month_name day_number time timezone year_number

	Mon Feb  5 14:05:57 PST 1990
	Mon Feb  5 14:05:57 -0800 1990
	Mon Feb  5 14:05 PST 1990
	(This should be good ol' ctime, but somebody mangles it anyway.)
---
     *   day_number month_name year_number time timezone ...

	5 Feb 1990 14:05:57 PST
	5 Feb 1990 14:05:57 -0800
	5 Feb 1990 14:05 PST
	(The first two are RFC822 format, which, interestingly, RFC822
	violates in its own examples section -- there, it uses something
	like the last one, except with no `:' between hours/minutes.  No
	one has ever complained about encountering that format.)
---
     *   day_number month_name year_number time-timezone (day)
     *                                       ^no colon separator

	5 Feb 1990 140557-PST (Mon)
	5 Feb 1990 1405-PST (Mon)
	5 Feb 1990 1405-0800 (Mon)	[??]
	(Does this ever show up with a GMT-offset as zone?  Does it
	ALWAYS show up with a GMT-offset, and the "-" is part of the
	time rather than a separator?)
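The "check for both cases" scan described above can be sketched roughly as
follows.  This is an illustration in Python, not Mush's actual code; the
names TIME_RE, ZONE_RE and zone_after_time are made up for the example.  It
only covers the whitespace-separated forms (the first two groups above) and
deliberately leaves the "1405-PST" form alone, since the role of that "-" is
still an open question.

```python
import re

# HH:MM or HH:MM:SS -- the seconds field is optional.
TIME_RE = re.compile(r"^\d{1,2}:\d{2}(:\d{2})?$")
# Numeric offset (+HHMM / -HHMM) or an alphabetic zone name of 3-5 letters.
ZONE_RE = re.compile(r"^([+-]\d{4}|[A-Za-z]{3,5})$")


def zone_after_time(date_text):
    """Return the token right after the time field if it looks like a zone."""
    tokens = date_text.split()
    # The last token cannot be followed by a zone, so it is skipped.
    for i, tok in enumerate(tokens[:-1]):
        if TIME_RE.match(tok) and ZONE_RE.match(tokens[i + 1]):
            return tokens[i + 1]
    return None


print(zone_after_time("Mon Feb  5 14:05 PST 1990"))   # time without seconds
print(zone_after_time("5 Feb 1990 14:05:57 -0800"))   # numeric offset
print(zone_after_time("Mon Feb 5 14:05:57 1990"))     # no zone present
```

A year following the time is rejected because it has no sign and no letters,
so the zone-less ctime form falls through to None.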
===
This next format is a bit suspicious.  I wonder if that "-" floating
out there is the beginning of a GMT-offset zone.  Anybody know?
---

     *   day_number month_name year_number, time "-"

	5 Feb 1990, 14:05:57 -
	5 Feb 1990, 14:05:57 -0800	[??]
===
These formats are not known to have time zones.  Recognize any of them?
---
     *   day_name month_name day_number time year_number

	Mon Feb 5 14:05:57 1990
---
     *   day_name month_name day_number year_number time

	Mon Feb 5 1990 14:05:57
---
     *   day_number month_name year_number 12_hour_time am_or_pm

	5 Feb 1990 02:05:57 pm
---
     *   day_name day_number month_name year_number time

	Mon 5 Feb 1990 14:05:57
---
     *   day_number month_name year_number time

	5 Feb 1990 14:05:57
---
     *   day_number-month_name-year time

	5-Feb-1990 14:05:57
---
     *   day_name, day_number-month_name-year time

	Mon, 5-Feb-1990 14:05:57
===

Any help you can send along is appreciated, but please don't bombard me
with mail unless you have some idea what you're talking about, or can
show me an example of one of the time-zone-less formats above showing
where the timezone can be found.  Thanks!
-- 
Bart Schaefer          "February.  The hangnail on the big toe of the year."
                                                                    -- Duffy

schaefer@cse.ogi.edu (used to be cse.ogc.edu)

loverso@XYLOGICS.COM (John Robert LoVerso) (02/06/90)

Hmmm - is this date-parsing code based upon something like unctime(), of
B news?  If not, that code already parses lots of different date formats.
If you've got additional formats, you could add it to unctime() and hand
it back to the rest of the world...

John
"I never try to reinvent the wheel; But when I do, I just make it square"
LoVerso

levin@magnus.Hotline.Com (Michael M Levin) (02/07/90)

In article <9002060315.AA10418@xenna.Xylogics.COM> loverso@XYLOGICS.COM (John Robert LoVerso) writes:
>Hmmm - is this date-parsing code based upon something like unctime(), of
>B news?  If not, that code already parses lots of different date formats.
>If you've got additional formats, you could add it to unctime() and hand
>it back to the rest of the world...

Me thinks, perhaps, that we are going to find ourselves beating the entire
timezone issue to death, since there are NO real standards recognized by
the 'entire civilized world'.  I believe that a slightly different approach
is in order-  like, maybe, deciding on just what the OFFICIAL standard
really ought to be (such as the -0800 format), and if there really isn't
any pressing reason, _maybe_ just decide on using a header field which is
called by a slightly different name-- like "Std-time: ", which could then
be expressed in Greenwich format.  If a PD routine to generate time in this
format were widely distributed, in a fashion suitable for inclusion in all
sendmail or smail generated email, and also suitable for inclusion in the
various mail-shells, then a convention for sorting based on a universally
adopted time standard would make everybody real-real-real happy.

		cost--  $0.02

					Mike Levin


-- 
 _            _           
| | ___  ___ |_| ___   Michael Levin     SilentRadio Headquarters- Los Angeles
| |/ ._\| | || ||   \  20732 Lassen Street,    Chatsworth  CA  91311    U.S.A.
|_|\___/ \_/ |_||_|_|  E-Mail: levin@Hotline.Com  {att|csun|srhqla}!magnus!mml

davidsen@crdos1.crd.ge.COM (02/07/90)

  I doubt that anything which depends on other people doing things to
their mailer is going to find a lot of non-compliance. I would be much
happier having a really strong date interpreter on my end than expecting
people to add another field to their headers.

schaefer@ogicse.ogc.edu (Barton E. Schaefer) (02/08/90)

In article <630@magnus.Hotline.Com> levin@magnus.Hotline.Com (Michael M Levin) writes:
} In article <9002060315.AA10418@xenna.Xylogics.COM> loverso@XYLOGICS.COM (John Robert LoVerso) writes:
} >Hmmm - is this date-parsing code based upon something like unctime(), of
} >B news?  If not, that code already parses lots of different date formats.
} >If you've got additional formats, you could add it to unctime() and hand
} >it back to the rest of the world...
} 
} Me thinks, perhaps, that we are going to find ourselves beating the entire
} timezone issue to death, since there are NO real standards recognized by
} the 'entire civilized world'.

Just to clarify a point:  The standard to which Mush adheres (or attempts
to) is RFC822, Standard for the Format of ARPA Internet Text Messages.
(Mush will also eventually support X.400 format if that is different -- I
have yet to obtain a copy of the X.400 specs, so I can't say.)  The format
specified by RFC822 is:

	Day, Date Month Year Hour:Minute:Second Timezone

where "Day," is optional.  Day and Month are 3-letter abbreviations; Date,
Year, Hour, Minute, and Second are two digits each; and Timezone is either
an offset from Universal Time (GMT) or one of a short list of North American
3-letter timezone abbreviations.  Offsets from UT are of the form -HHMM or
+HHMM, e.g. PST is -0800, and Newfoundland is (I think) -0330 (just to
show that the minutes are indeed necessary).
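To make the two Timezone notations concrete, here is a small sketch that maps
either one to minutes east of UT.  It is written in Python for brevity and is
not taken from Mush; RFC822_ZONES and zone_to_minutes are names invented for
the example, and the table lists only the non-military names that RFC822
defines.

```python
# Zone names defined by RFC822 (military letters omitted), as minutes east of UT.
RFC822_ZONES = {
    "UT": 0, "GMT": 0,
    "EST": -300, "EDT": -240,
    "CST": -360, "CDT": -300,
    "MST": -420, "MDT": -360,
    "PST": -480, "PDT": -420,
}


def zone_to_minutes(zone):
    """Return the offset from UT in minutes, or None if the zone is unknown."""
    if zone in RFC822_ZONES:
        return RFC822_ZONES[zone]
    # Numeric form: a sign followed by exactly four digits, HHMM.
    if len(zone) == 5 and zone[0] in "+-" and zone[1:].isdigit():
        sign = -1 if zone[0] == "-" else 1
        return sign * (int(zone[1:3]) * 60 + int(zone[3:5]))
    return None


print(zone_to_minutes("PST"))    # same value as the numeric form below
print(zone_to_minutes("-0800"))
print(zone_to_minutes("-0330"))  # the minutes part matters here
```

Keeping the result in minutes rather than hours is what lets an offset like
-0330 survive the conversion.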

Obviously, this doesn't cover everybody.  Though almost everyone who is
not using X.400 nominally complies with 822, there are a number of minor
variations (omitting the comma after Day, using a 4-digit Year, omitting
the Seconds, swapping the places of Date and Month, etc.) and there are
lots of 3-, 4-, and 5-letter timezone abbreviations outside NA.  Mush has
so far avoided dealing with the time zone question (hence my original
posting) but it handles all the other variations.
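For the sort-by-date question that started this thread, the idea would be to
reduce every date to UT before comparing.  The following is only a sketch of
that idea in Python, not a description of how Mush stores or sorts dates;
utc_sort_key and the two sample messages are invented for the illustration,
and the offsets are assumed to be already known in minutes east of UT.

```python
from datetime import datetime, timedelta


def utc_sort_key(local_time, offset_minutes):
    """Convert a local time plus its offset from UT into a UT time."""
    # local = UT + offset, so UT = local - offset.
    return local_time - timedelta(minutes=offset_minutes)


# (label, local time as written in the header, offset in minutes east of UT)
messages = [
    ("A", datetime(1990, 2, 5, 14, 5, 57), -480),  # 14:05:57 PST
    ("B", datetime(1990, 2, 5, 16, 0, 0), -300),   # 16:00:00 EST
]

by_local = sorted(messages, key=lambda m: m[1])
by_ut = sorted(messages, key=lambda m: utc_sort_key(m[1], m[2]))

print([m[0] for m in by_local])  # order when zones are ignored
print([m[0] for m in by_ut])     # order once zones are taken into account
```

Message B was sent about an hour before A in absolute terms, even though its
local time reads later; ignoring the zone sorts the two the wrong way round.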

} I believe that a slightly different approach
} is in order-  like, maybe, deciding on just what the OFFICIAL standard
} really ought to be (such as the -0800 format), and if there really isn't
} any pressing reason, _maybe_ just decide on using a header field which is
} called by a slightly different name-- like "Std-time: ", which could then
} be expressed in Greenwich format.

In article <5D01C5E2E4@crdos1> davidsen@crdos1.crd.ge.com writes:
} 
}   I doubt that anything which depends on other people doing things to
} their mailer is going to find a lot of non-compliance. I would be much
} happier having a really strong date interpreter on my end than expecting
} people to add another field to their headers.

Bill has the right idea once you delete "non-" from that first sentence.  
However, I don't think it's worthwhile for Mush to go so far as parsing
some of the really outlandish forms.  No mailer is going to generate
"Saturday, February Third, Nineteen Ninety, Twelve Fifty-Seven Thirteen
Post Meridian, Pacific Standard Time".  I'll admit that Mush's present
date parser could stand improvement, but it accepts every date format
that has been reported since Mush first appeared.

kjones@talos.uu.net (Kyle Jones) (02/13/90)

Barton E. Schaefer writes:
 >      *   day_number month_name year_number time timezone ...
 > 
 > 	5 Feb 1990 14:05:57 PST
 > 	5 Feb 1990 14:05:57 -0800
 > 	5 Feb 1990 14:05 PST
 > 	(The first two are RFC822 format, which, interestingly, RFC822
 > 	violates in its own examples section -- there, it uses something
 > 	like the last one, except with no `:' between hours/minutes.  No
 > 	one has ever complained about encountering that format.)

None of these are RFC 822 compliant because the year number is supposed
to only have two digits.

I think you're working too hard with these date formats.  At some point
you've got to blow off all these nonstandard variants and just stick with the
standard.  You're going to be old and gray with all but six marbles gone
before you manage to grok all the weird date formats out there.

I suggest that you handle two: RFC 822 and the format the UNIX
date(1) command returns.

schaefer@ogicse.ogc.edu (Barton E. Schaefer) (02/13/90)

In article <1990Feb12.181547.27427@talos.uu.net> kyle@xanth.cs.odu.edu writes:
} Barton E. Schaefer writes:
}  >      *   day_number month_name year_number time timezone ...
}  > 
}  > 	5 Feb 1990 14:05:57 PST
}  > 	5 Feb 1990 14:05:57 -0800
}  > 	5 Feb 1990 14:05 PST
} 
} None of these are RFC 822 compliant because the year number is supposed
} to only have two digits.

Typo, my apologies.  You'll find that mush actually does use only 2 digits
whenever it creates such a date.  It will, however, accept either 2 or 4
digits when parsing it, which is part of the reason I mistyped here.
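Accepting either year width on input might look something like the sketch
below.  It is Python, not Mush source, and normalize_year is a made-up name.
The assumption that a 2-digit year belongs to the 1900s is mine, chosen
because it fits mail dated 1990; it is exposed as a parameter rather than
hard-wired.

```python
def normalize_year(text, century=1900):
    """Turn a 2- or 4-digit year field into a full year number."""
    if not text.isdigit() or len(text) not in (2, 4):
        raise ValueError("expected a 2- or 4-digit year, got %r" % text)
    year = int(text)
    # Assumption: 2-digit years are offsets into the given century.
    return year + century if len(text) == 2 else year


print(normalize_year("90"))
print(normalize_year("1990"))
```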

} I think you're working too hard with these date formats.  At some point
} you've got to blow off all these nonstandard variants and just stick with the
} standard.  You're going to be old and gray with all but six marbles gone
} before you manage to grok all the weird date formats out there.

Undoubtedly, but I think we should at least try to do a better job with
the several that we already grok.

steve@thelake.mn.org (Steve Yelvington) (02/13/90)

[In article <1990Feb12.181547.27427@talos.uu.net>,
     kjones@talos.uu.net (Kyle Jones) writes ... ]

> Barton E. Schaefer writes:
>  >      *   day_number month_name year_number time timezone ...
>  > 
>  > 	5 Feb 1990 14:05:57 PST
>  > 	5 Feb 1990 14:05:57 -0800
>  > 	5 Feb 1990 14:05 PST
>  > 	(The first two are RFC822 format, which, interestingly, RFC822
>  > 	violates in its own examples section -- there, it uses something
>  > 	like the last one, except with no `:' between hours/minutes.  No
>  > 	one has ever complained about encountering that format.)
> 
> None of these are RFC 822 compliant because the year number is supposed
> to only have two digits.

That is no longer true; see the following excerpt:

RFC1123                  MAIL -- SMTP & RFC-822             October 1989

      5.2.14  RFC-822 Date and Time Specification: RFC-822 Section 5

         The syntax for the date is hereby changed to:

            date = 1*2DIGIT month 2*4DIGIT

         All mail software SHOULD use 4-digit years in dates, to ease
         the transition to the next century.

         There is a strong trend towards the use of numeric timezone
         indicators, and implementations SHOULD use numeric timezones
         instead of timezone names.  However, all implementations MUST
         accept either notation.  If timezone names are used, they MUST
         be exactly as defined in RFC-822.

         The military time zones are specified incorrectly in RFC-822:
         they count the wrong way from UT (the signs are reversed).  As
         a result, military time zones in RFC-822 headers carry no
         information.

         Finally, note that there is a typo in the definition of "zone"
         in the syntax summary of appendix D; the correct definition
         occurs in Section 3 of RFC-822.
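As a rough illustration of the revised syntax quoted above, the rule
"1*2DIGIT month 2*4DIGIT" translates fairly directly into a pattern.  This is
a Python sketch under my own naming (DATE_RE, parse_date,
zone_carries_information); it is not code from any RFC or mailer.  The second
function reflects the excerpt's remark that military zones carry no
information, on the assumption that those are the single-letter zones.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
# 1*2DIGIT month 2*4DIGIT: one or two day digits, a month name, 2-4 year digits.
DATE_RE = re.compile(r"^(\d{1,2})\s+(" + MONTHS + r")\s+(\d{2,4})$")


def parse_date(text):
    """Return (day, month, year) if text fits the revised syntax, else None."""
    m = DATE_RE.match(text.strip())
    if m is None:
        return None
    day, month, year = m.groups()
    return int(day), month, int(year)


def zone_carries_information(zone):
    """Treat single-letter (military) zones as telling us nothing."""
    return not (len(zone) == 1 and zone.isalpha())


print(parse_date("5 Feb 1990"))
print(parse_date("5 Feb 90"))
print(parse_date("Feb 5 1990"))
print(zone_carries_information("A"), zone_carries_information("-0800"))
```

Note that the year comes back exactly as written, so a 2-digit year still has
to be widened by the caller.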


(Followups are directed to comp.mail.headers.)

-- 
   Steve Yelvington at the (thin ice today) lake in Minnesota