Ladies and gentlemen programmers,
I am currently building a database of old Usenet postings (spanning from 1988 to 1999) from which I want to extract the respective time stamps. As expected, there were a lot of non-standard date strings in use at the time. Nonetheless, the Date.parse() method offers promising results, although it handles time zones in a rather idiosyncratic way.
Let’s review the cases of interest:
1. If the string contains (in whichever form) valid date, time, and time zone information, there is obviously no problem. E.g. "8 Aug 93 02:49:00 PDT" will be interpreted as (JSON-style) 1993-08-08T09:49:00Z.
2. If the string does not contain any valid information at all, and hence cannot be parsed, Date.parse() returns NaN, which, again, is fine.
3. However, if the string contains valid date and time information, but no time zone information, Date.parse() acts on its own and assumes the local time zone. E.g. "8 Aug 93 02:49:00" will be interpreted as 1993-08-08T00:49:00Z because I’m currently in the CEST, or UTC+2, time zone.
4. Moreover, if the string contains a valid date, but no time information, Date.parse() again acts on its own, this time additionally assuming midnight in the local time zone. E.g. "8 Aug 93" will be interpreted as 1993-08-07T22:00:00Z, again because I’m currently in the CEST time zone.
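To make the four cases reproducible, here is a minimal sketch that checks them without hard-coding CEST. It assumes an engine whose legacy (non-ISO) parsing behaves as described above, such as V8 in Node.js; the language specification leaves such formats implementation-defined, so other engines may differ. The variable names are only illustrative.

```javascript
// Case 1: date, time and zone are all present, so the instant is unambiguous.
const full = Date.parse("8 Aug 93 02:49:00 PDT");

// Case 2: nothing parseable in the string, so the result is NaN.
const invalid = Date.parse("garbage");

// Case 3: no zone given; the wall-clock time is read as local time.
// new Date(y, m, d, h, min, s) also uses local time, which gives a reference value.
const noZone = Date.parse("8 Aug 93 02:49:00");
const localWallClock = new Date(1993, 7, 8, 2, 49, 0).getTime();

// Case 4: neither time nor zone given; the result is local midnight.
const dateOnly = Date.parse("8 Aug 93");
const localMidnight = new Date(1993, 7, 8).getTime();

console.log(new Date(full).toISOString());
console.log(Number.isNaN(invalid));
console.log(noZone === localWallClock);
console.log(dateOnly === localMidnight);
```

Comparing against locally constructed dates keeps the check valid in whichever time zone the program runs.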
Cases 3 and 4 constitute a major problem for the task at hand, because the resulting Date object will be indistinguishable from one obtained by parsing a completely specified string. For instance, "8 Aug 93 02:49:00" (case 3) will be interpreted the same as "8 Aug 93 00:49:00 GMT", but only in the time zone the program is currently running in! Or, "8 Aug 93" (case 4) will, in the end, be indistinguishable from "8 Aug 93 00:00:00 GMT+2" (again, assuming CEST).
So my question is: Can Date.parse() be forced to tell the programmer whether, and if so which, assumptions it made during the parsing process?
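Date.parse() returns only a number (or NaN), so it cannot carry such a report itself. One possible workaround, sketched here under assumptions, is to inspect the string separately before parsing. The helper name inspectDateString, the regular expressions, and the list of zone abbreviations are all hypothetical and would need tuning against the real archive.

```javascript
// Hypothetical helper, not part of any library. The patterns are assumptions:
// a time of day looks like H:MM or HH:MM[:SS]; a zone is either a known
// abbreviation or a numeric offset such as -0700 or +02:00.
const TIME_OF_DAY = /\b\d{1,2}:\d{2}(?::\d{2})?\b/;
const ZONE_NAME = /\b(?:UT|UTC|GMT|EST|EDT|CST|CDT|MST|MDT|PST|PDT)\b/i;
const ZONE_OFFSET = /\s[+-]\d{2}:?\d{2}\b/;

function inspectDateString(text) {
  const timestamp = Date.parse(text);
  const valid = !Number.isNaN(timestamp);
  const hasTime = TIME_OF_DAY.test(text);
  const hasZone = ZONE_NAME.test(text) || ZONE_OFFSET.test(text);
  return {
    timestamp,
    valid,
    hasTime,
    hasZone,
    // True when the parser had to fill in something the string did not state.
    assumedTime: valid && !hasTime,
    assumedZone: valid && !hasZone,
  };
}

const report = inspectDateString("8 Aug 93 02:49:00");
console.log(report.hasTime, report.hasZone); // true false
```

The flags come from the string alone, so they do not depend on the time zone the program runs in; only the timestamp field does.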
Ideally, I want to be able to review incomplete information myself, and possibly correct it, based on otherwise available information. For instance, if the original text was posted in the month of August, from a server of the University of Washington, then the time zone should be PDT – and not my current, local one; nor UTC.
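A correction of this kind could be sketched as follows, assuming the zone-less strings follow a "D Mon YY[YY] [HH:MM[:SS]]" layout and that two-digit years belong to the 1900s (the archive spans 1988 to 1999). The function name parseWithAssumedOffset and its offsetMinutes parameter are hypothetical; the sketch bypasses Date.parse() for these strings so that the local time zone never enters the calculation.

```javascript
const MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"];

// Assumed layout: day, month name, 2- or 4-digit year, optional time of day.
const PATTERN =
  /^\s*(\d{1,2})\s+([A-Za-z]{3})[A-Za-z]*\s+(\d{2}|\d{4})(?:\s+(\d{1,2}):(\d{2})(?::(\d{2}))?)?\s*$/;

// offsetMinutes is the externally known UTC offset, e.g. -420 for PDT (UTC-7).
function parseWithAssumedOffset(text, offsetMinutes) {
  const m = PATTERN.exec(text);
  if (!m) return NaN;
  const month = MONTHS.indexOf(m[2].toLowerCase());
  if (month < 0) return NaN;
  let year = Number(m[3]);
  if (year < 100) year += 1900; // assumption: all postings predate 2000
  const num = (v) => (v === undefined ? 0 : Number(v)); // missing time -> 0
  // Treat the wall-clock reading as if it were UTC, then shift by the offset.
  const wallClockAsUtc = Date.UTC(year, month, Number(m[1]), num(m[4]), num(m[5]), num(m[6]));
  return wallClockAsUtc - offsetMinutes * 60000;
}

const fixed = parseWithAssumedOffset("8 Aug 93 02:49:00", -420);
console.log(new Date(fixed).toISOString()); // 1993-08-08T09:49:00.000Z
```

With the PDT offset supplied from outside, the zone-less string from case 3 yields the same instant as the fully specified string from case 1.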
Thanks in advance for any advice you may be able to offer.
Günther Wollenweber