[comp.std.internat] Data compression standard

gordoni@berlioz.ua.oz (Gordon Irlam) (06/22/91)

From article <859@spam.ua.oz>, by ross@spam.ua.oz.au (Ross Williams)

  [A meta standard for data compression.]

> Date string - A  date string is a standard string  of length 11 having
>     the format "dd-mmm-yyyy" where dd  is in the range "01".."31", mmm
>     is in the range "Jan","Feb",.."Dec"  (Case dependent), and yyyy is
>     in the range "1900" and "9999".

Hence,

    20-Jun-1991

But this creates yet another incompatible date/time format.  It would
be better to adopt a standard date/time format.

A fairly common date/time format on the Internet is that used in
RFC 822.  It looks something like this.

    20 Jun 91 15:48:18 GMT

Unfortunately this date/time format has several disadvantages:

    - It contains "white space" characters.

    - Even without the white space the mapping from literal strings to
      date/times is many to 1.

    - The representation of the month only makes sense in English
      speaking countries.

    - It doesn't include the century.  (What will happen to Usenet on
      the 1st of January 2000?)

It would probably be better to choose one of the following.

    1991-06-20

    1991-06-20T15:48:18Z

These are the "extended format complete representation of a calendar
date", and the "extended format complete representation of a moment of
Coordinated Universal Time" as specified by ISO 8601.

Advantages of the yyyy-mm-dd format over the widely used dd-mm-yyyy
format include:

    - "The avoidance of confusion in comparison with existing national
       conventions using different systems of ascending order."

      (the U.S. national format is easily confused with that of most
       other countries)

    - "The ease with which the whole date may be treated as a single
       number for the purposes of filing and classification."

      (ie. sorting)

    - "The possibility of continuing the order by adding digits for
       hour-minute-second."
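
The sorting point can be illustrated with a short Python sketch.  The
sample dates below are made up for illustration; only the two formats
come from the discussion above.

```python
# Illustrative dates only; these values are not from the original post.
iso_dates = ["1991-06-20", "1990-12-31", "1991-01-05"]
dmy_dates = ["20-06-1991", "31-12-1990", "05-01-1991"]

# yyyy-mm-dd: plain string order is already chronological order.
iso_sorted = sorted(iso_dates)

# dd-mm-yyyy: plain string order compares the day first,
# so the result is not chronological.
dmy_sorted = sorted(dmy_dates)

print(iso_sorted)
print(dmy_sorted)
```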

Note that ISO 8601 includes representations for many other date/time
quantities that are not relevant here.  These include yyyy-ddd and
yyyy-Www-d formats, basic formats, reduced precision, truncated
representations, fractional hours, minutes, and seconds, periods of
time, and time zone differences.

It would be a serious mistake to allow any ISO 8601 date/time format
since writing a program to parse an arbitrary ISO date/time
representation would be a big challenge.  Instead adopt just one
possible representation.  (I would suggest the second of the two
formats presented above.)
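
As a rough sketch of how small a parser becomes once only one
representation is allowed, the Python below accepts exactly
yyyy-mm-ddThh:mm:ssZ and rejects everything else.  The function names
parse_stamp and format_stamp are hypothetical, chosen for this example.

```python
import re
from datetime import datetime, timezone

# Accept exactly yyyy-mm-ddThh:mm:ssZ (ASCII digits only) and nothing else.
_STAMP = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})Z"
)

def parse_stamp(text):
    """Return an aware UTC datetime, or raise ValueError."""
    m = _STAMP.fullmatch(text)
    if m is None:
        raise ValueError("not in yyyy-mm-ddThh:mm:ssZ form: %r" % text)
    year, month, day, hour, minute, second = map(int, m.groups())
    # datetime() itself rejects out-of-range fields such as month 13.
    return datetime(year, month, day, hour, minute, second,
                    tzinfo=timezone.utc)

def format_stamp(moment):
    """Render an aware datetime in the same single representation."""
    u = moment.astimezone(timezone.utc)
    return "%04d-%02d-%02dT%02d:%02d:%02dZ" % (
        u.year, u.month, u.day, u.hour, u.minute, u.second)

print(format_stamp(parse_stamp("1991-06-20T15:48:18Z")))
```

Because there is one spelling per moment (to the second), formatting a
parsed value gives back the input string unchanged.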

                                       Gordon Irlam.
                                       (gordoni@cs.adelaide.edu.au)

enag@ifi.uio.no (Erik Naggum) (06/24/91)

Gordon Irlam <gordoni@berlioz.ua.oz> writes:
|
|   A fairly common date/time format on the Internet is that used in
|   RFC 822.  It looks something like this.
|
|       20 Jun 91 15:48:18 GMT
|
|   Unfortunately this date/time format has several disadvantages:
|
|       - It doesn't include the century.  (What will happen to Usenet on
|	  the 1st of January 2000?)

The IETF Host Requirements working group sensibly recommended
four-digit year specification in RFC 1123:

      5.2.14  RFC-822 Date and Time Specification: RFC-822 Section 5

         The syntax for the date is hereby changed to:

            date = 1*2DIGIT month 2*4DIGIT

         All mail software SHOULD use 4-digit years in dates, to ease
         the transition to the next century.
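
For illustration, Python's standard email.utils module writes dates in
this style with a four-digit year.  This is only a sketch of what such
a header value looks like; the sample moment is the one used earlier in
the thread.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

moment = datetime(1991, 6, 20, 15, 48, 18, tzinfo=timezone.utc)

# usegmt=True writes the zone as "GMT" rather than "+0000".
header = format_datetime(moment, usegmt=True)
print(header)  # Thu, 20 Jun 1991 15:48:18 GMT

# Parsing the header gives back the same moment.
roundtrip = parsedate_to_datetime(header)
print(roundtrip == moment)  # True
```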

|   It would probably be better to choose one of the following.
|
|       1991-06-20

This is probably the most readable choice.

|       1991-06-20T15:48:18Z

This seems needlessly cluttered.

|   It would be a serious mistake to allow any ISO 8601 date/time format
|   since writing a program to parse an arbitrary ISO date/time
|   representation would be a big challenge.  Instead adopt just one
|   possible representation.  (I would suggest the second of the two
|   formats presented above.)

I'm sure writing an ISO 8601 parser which returns the UNIX standard
time representation (seconds since 1970-01-01 00:00:00 +0000, or
whatever ;-) is a challenge, but it would be very, very useful.  Is
there anybody out there who would like to work with me on this?

</Erik>
--
Erik Naggum             Professional Programmer            +47-2-836-863
Naggum Software             Electronic Text             <erik@naggum.no>
0118 OSLO, NORWAY       Computer Communications        <enag@ifi.uio.no>

shores@fergvax.unl.edu (Shores) (06/24/91)

In <3761@sirius.ucs.adelaide.edu.au> gordoni@berlioz.ua.oz (Gordon Irlam) writes:

>From article <859@spam.ua.oz>, by ross@spam.ua.oz.au (Ross Williams)

>  [A meta standard for data compression.]

>> Date string - A  date string is a standard string  of length 11 having
>>     the format "dd-mmm-yyyy" where dd  is in the range "01".."31", mmm
>>     is in the range "Jan","Feb",.."Dec"  (Case dependent), and yyyy is
>>     in the range "1900" and "9999".

>Hence,

>    20-Jun-1991

>But this creates yet another incompatible date/time format.  It would
>be better to adopt a standard date/time format.

>A fairly common date/time format on the Internet is that used in
>RFC 822.  It looks something like this.

>    20 Jun 91 15:48:18 GMT

>Unfortunately this date/time format has several disadvantages:

>    - It contains "white space" characters.

>    - Even without the white space the mapping from literal strings to
>      date/times is many to 1.

>    - The representation of the month only makes sense in English
>      speaking countries.

>    - It doesn't include the century.  (What will happen to Usenet on
>      the 1st of January 2000?)

>It would probably be better to choose one of the following.

>    1991-06-20

>    1991-06-20T15:48:18Z

>These are the "extended format complete representation of a calendar
>date", and the "extended format complete representation of a moment of
>Coordinated Universal Time" as specified by ISO 8601.

I have a better idea.  Instead of storing a date STRING, why not just
store a number?  The Macintosh stores dates as a 4 byte number,
representing the seconds elapsed since Jan 1, 1904.  Unix has a similar
convention, only from 1970 (better, IMHO).  Then it should be up to the
user program to represent the date.   The Mac has IUDateString, unix and
others have ctime(), etc.
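
A minimal Python sketch of the "store a number, format on display"
idea.  It assumes both counts are taken in UTC and ignores time zones
entirely (the classic Mac clock kept local time); the Unix count runs
from 1970-01-01 00:00:00 UTC.  The names mac_to_unix and display are
hypothetical.

```python
from datetime import datetime, timezone

MAC_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Seconds between the two epochs, computed rather than hard-coded.
MAC_TO_UNIX = int((UNIX_EPOCH - MAC_EPOCH).total_seconds())

def mac_to_unix(mac_seconds):
    """Convert a seconds-since-1904 count to seconds-since-1970."""
    return mac_seconds - MAC_TO_UNIX

def display(unix_seconds):
    """Leave presentation to the program: render the stored number."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")

print(MAC_TO_UNIX)                        # 2082844800
print(display(mac_to_unix(MAC_TO_UNIX)))  # 1970-01-01 00:00:00 UTC
```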

--tom shores

PS: considering the nature of this group, shouldn't it be called
    "comp.ression" :-)


   Tom... Tommy... Thomas... the Tom-ster, the Tom-boy, the Tomminator...
   ... Tom Shores, Department of Mathematics, University of Nebraska.
   ... shores@fergvax.unl.edu

msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) (06/25/91)

In <shores.677707660@fergvax> shores@fergvax.unl.edu (Shores) writes:

>I have a better idea.  Instead of storing a date STRING, why not just
>store a number?  The Macintosh stores dates as a 4 byte number,
>representing the seconds elapsed since Jan 1, 1904.  Unix has a similar
>convention, only from 1970 (better, IMHO).  Then it should be up to the
>user program to represent the date.   The Mac has IUDateString, unix and
>others have ctime(), etc.

That is totally non-human-readable, however.  That may not matter for
the application that started this thread, whatever that was, but it
can be useful.  Further, everyone will probably use a different
scheme, and one 32 bit number looks the same as the next.  Also, if
you decide that you need tenth-second accuracy, the string can be
extended without breaking the logic of the scheme, even if it does
--

< Michael Pereckas  <>  m-pereckas@uiuc.edu  <>  Just another student... >
   "This desoldering braid doesn't work.  What's this cheap stuff made
    of, anyway?"  "I don't know, looks like solder to me."

campbell@redsox.bsw.com (Larry Campbell) (06/25/91)

I believe there is an ISO standard (sorry, don't know the number).  It's
very simple.  Dates and times are represented as a string of 12 to 16
decimal digits:

	YYYYMMDDHHMMSSHH

where the last two digits represent hundredths of a second;  I believe the
seconds, and hundredths of seconds, are optional.  You could, of course, add
as many trailing digits as you like, if you need to achieve nanosecond
precision, without ambiguity.

This format is completely unambiguous, is easily understood by both humans
and computers, sorts easily, is not Anglocentric, and is compact.

Of course, the time represented is assumed to be UTC, so no time zone
decorations are required.  Your user interface software should know how to
display this in the local time zone, in the local language.
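
A hedged Python sketch of reading such a digit string.  The layout is
taken from the description above and has not been checked against any
standard; accepting only whole fields (12, 14 or 16 digits) is an
assumption, and parse_digits is a hypothetical name.

```python
import re
from datetime import datetime, timezone

# 12 digits (to the minute), 14 (seconds) or 16 (hundredths of a second).
_DIGITS = re.compile(r"[0-9]{12}(?:[0-9]{2}(?:[0-9]{2})?)?")

def parse_digits(text):
    """Parse YYYYMMDDHHMM[SS[HH]], taken as UTC, into an aware datetime."""
    if _DIGITS.fullmatch(text) is None:
        raise ValueError("expected 12, 14 or 16 decimal digits: %r" % text)
    # Pad missing seconds and hundredths with zeros, then slice fixed fields.
    padded = text.ljust(16, "0")
    year, month, day = int(padded[0:4]), int(padded[4:6]), int(padded[6:8])
    hour, minute = int(padded[8:10]), int(padded[10:12])
    second, hundredths = int(padded[12:14]), int(padded[14:16])
    # One hundredth of a second is 10,000 microseconds.
    return datetime(year, month, day, hour, minute, second,
                    hundredths * 10_000, tzinfo=timezone.utc)

print(parse_digits("1991062513512744").isoformat())
```

Note that the parser has to be told the field widths in advance; the
string itself carries no hint of where one field ends and the next
begins.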
-- 
Larry Campbell             The Boston Software Works, Inc., 120 Fulton Street
campbell@redsox.bsw.com    Boston, Massachusetts 02109 (USA)

enag@ifi.uio.no (Erik Naggum) (06/25/91)

Lessee, it's June 25th, 1991, 3:51pm local time, or 19910625135127.
Alternatively it's 1991-06-25 15:51:27 +02:00.  From both a machine-
and human-readable point of view, delimiters can be very helpful in
disambiguating syntaxes.  A string of digits is not much different
from a string of bits, as I see it.  Especially as the string gets
longer, it's hard for humans to sort out what's what.  Pick out the
day of the month from 19910625135127 and 1991-06-25 15:51:27, as a
simple exercise.

Further, I don't want to see heuristics added to the parsing algorithm
in order to find out what time was _really_ intended.  E.g. is
91-06-25 a date 1900 years ago or just a sloppy syntax?  Is
91062515512744 now, with centisecond precision, 1900 years ago with
centisecond precision or the 15th day of the 25th month of the year
9106 at 51:27:44?  Ok, so it isn't the latter, because it's absurd,
but there are cases where it would be hard to figure it out.

</Erik>

hpa@casbah.acns.nwu.edu (H. Peter Anvin) (06/26/91)

In article <ENAG.91Jun25161013@gyda.ifi.uio.no> of comp.std.internat,
  enag@ifi.uio.no (Erik Naggum) writes:
> Lessee, it's June 25th, 1991, 3:51pm local time, or 19910625135127.
> Alternatively it's 1991-06-25 15:51:27 +02:00.  From both a machine-
> and human-readable point of view, delimiters can be very helpful in
> disambiguating syntaxes.  A string of digits is not much different
> from a string of bits, as I see it.  Especially as the string gets
> longer, it's hard for humans to sort out what's what.  Pick out the
> day of the month from 19910625135127 and 1991-06-25 15:51:27, as a
> simple exercise.
> 
> Further, I don't want to see heuristics added to the parsing algorithm
> in order to find out what time was _really_ intended.  E.g. is
> 91-06-25 a date 1900 years ago or just a sloppy syntax? 

A standardized non-computer-related way of codifying dates in numeric form
is the Julian day number.  It is designed to be zero at noon UTC, 1 Jan
4713 B.C. in the proleptic Julian calendar, if I remember it right.  It
increments by one every 24 hours; an arbitrary number of
decimals (or binals) can be added to the number for arbitrary precision.
Presume you need millisecond precision.  There are 86,400,000 ms in a day,
so you need either 8 decimals or 27 binals.  There are roughly 2,500,000
days since the zero point, so in order to describe that range and cover the
same time span into the future (to sometime in the 68th century) you need 7
digits or 23 bits.  Total: 15 digits or 50 bits.  If you can cut down to
centisecond precision, only 14 digits or 47 bits.

Consult an astronomical book or table for the exact Julian day number
epoch; much of the astronomical literature lists the Julian day number
for the first day of the year.

Personally, I think the JDN would fit very well in a 48-bit, 2's-complement
format with subcentisecond precision: 24 binals, 24 integer bits.  Could
cover almost 46,000 years ranging from 27,000 B.C. to 18,000 A.D.
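
The arithmetic above can be checked with a small Python sketch.  It
relies on the well-known correspondence that 1970-01-01 00:00:00 UTC
has Julian date 2440587.5, and it ignores leap seconds; the helper
names are hypothetical.

```python
import math

UNIX_EPOCH_JD = 2440587.5   # Julian date of 1970-01-01 00:00:00 UTC
SECONDS_PER_DAY = 86400

def unix_to_jd(unix_seconds):
    """Seconds since 1970 to a fractional Julian date (no leap seconds)."""
    return unix_seconds / SECONDS_PER_DAY + UNIX_EPOCH_JD

def jd_to_unix(jd):
    """Fractional Julian date back to seconds since 1970."""
    return (jd - UNIX_EPOCH_JD) * SECONDS_PER_DAY

def bits_for(count):
    """Bits needed to tell `count` distinct values apart."""
    return math.ceil(math.log2(count))

ms_bits = bits_for(86_400_000)      # fraction of a day in millisecond steps
day_bits = bits_for(2 * 2_500_000)  # days so far, plus the same span ahead

print(unix_to_jd(946728000))        # 2000-01-01 12:00 UTC -> 2451545.0
print(ms_bits, day_bits, ms_bits + day_bits)  # 27 23 50
```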

	/Peter 


-- 
MAIL: hpa@casbah.acns.nwu.edu   (hpa@nwu.edu after this summer)
"finger" the address above for more information.

yfcw14@castle.ed.ac.uk (K P Donnelly) (06/27/91)

If I received a message with a time stamp of
          1991-06-25 15:51:27 +02:00
I would assume, if I didn't know better, that it had been sent at
17:51 Universal Time, since 15+2=17.  In fact, of course, it was sent at
          1991-06-25 13:51:27

Anyone agree with me that the sign convention for time-zones is unfortunate?

   Kevin Donnelly

hpa@casbah.acns.nwu.edu (H. Peter Anvin) (06/28/91)

In article <11329@castle.ed.ac.uk>, yfcw14@castle.ed.ac.uk (K P Donnelly) writes:
|> If I received a message with a time stamp of
|>           1991-06-25 15:51:27 +02:00
|> I would assume, if I didn't know better, that it had been sent at
|> 17:51 Universal Time, since 15+2=17.

The sign convention is NOT unfortunate, it is only the way it has been
made to look on USENET.  "+02:00" is more properly written as "GMT+02:00"
or "UTC+2".  The problem is the juxtaposition of a time with its time zone
code, the latter stripped of "GMT" or "UTC".

	/Peter

-- 
INTERNET: hpa@casbah.acns.nwu.edu   (hpa@nwu.edu after this summer)
BITNET:   HPA@NUACC           HAM RADIO:  N9ITP, SM4TKN
FIDONET:  1:115/989.4
"finger" the Internet address above for more information.

enag@ifi.uio.no (Erik Naggum) (06/29/91)

K P Donnelly <yfcw14@castle.ed.ac.uk> writes:
|
|   If I received a message with a time stamp of
|	      1991-06-25 15:51:27 +02:00
|   I would assume, if I didn't know better, that it had been sent at
|   17:51 Universal Time, since 15+2=17.  In fact, of course, it was sent at
|	      1991-06-25 13:51:27

Well, it's actually short-hand for GMT+02:00.  The time listed is two
hours more than UT.
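
In other words, local time = UT + offset, so UT = local time - offset.
A short Python sketch using the time stamp from this thread:

```python
from datetime import datetime, timedelta, timezone

# The stamp under discussion: 15:51:27 local, in a zone two hours ahead of UT.
local = datetime(1991, 6, 25, 15, 51, 27,
                 tzinfo=timezone(timedelta(hours=2)))

# Converting to UT subtracts the offset.
utc = local.astimezone(timezone.utc)

print(local.isoformat())  # 1991-06-25T15:51:27+02:00
print(utc.isoformat())    # 1991-06-25T13:51:27+00:00
```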

I think it's intuitive, because I view it from UT, not from local time
to UT.  I think this makes a lot of sense.  In addition, if we change,
we will get massive confusion, and time zone indications as useless as
the military time zones in RFC 822 (which were listed wrong, and
consequently are rendered ambiguous -> meaningless).

</Erik>