readr (version 0.1.0)

parse_datetime: Parse a character vector of dates or date times.

Description

Parse a character vector of dates or date times.

Usage

parse_datetime(x, format = "", tz = "UTC")
col_datetime(format = "", tz = "UTC")
parse_date(x, format = "%Y-%m-%d")
col_date(format = "%Y-%m-%d")

Arguments

x
A character vector of dates to parse.
format
A format specification, as described below. If omitted, parses dates according to the ISO8601 specification (with caveats, as described below).

Unlike strptime, the format specification must match the complete string.

tz
Default tz. This is used both for input (if the time zone isn't present in individual strings), and for output (to control the default display). The default is to use "UTC", a time zone that does not use daylight saving time (DST) and hence is typically most useful for data. The absence of time zones makes it approximately 50x faster to generate UTC times than any other time zone.

Use "" to use the system default time zone, but beware that this will not be reproducible across systems.

For a complete list of possible time zones, see OlsonNames(). Americans, note that "EST" is a Canadian time zone that does not have DST. It is not Eastern Standard Time. It's better to use "US/Eastern", "US/Central" etc.
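
The difference between "EST" and "US/Eastern" can be checked with base R alone. The following is a minimal sketch, assuming the time zone database on your system provides both names; it uses as.POSIXct() and format() rather than readr, and the variable names are illustrative only:

```r
# A fixed instant in July, when US Eastern time observes DST.
t <- as.POSIXct("2010-07-01 12:00:00", tz = "UTC")

# "EST" has no DST, so it stays 5 hours behind UTC all year round.
est_hour <- format(t, "%H", tz = "EST")

# "US/Eastern" follows DST, so in July it is only 4 hours behind UTC.
eastern_hour <- format(t, "%H", tz = "US/Eastern")

c(EST = est_hour, Eastern = eastern_hour)
```

The two hours differ by one in summer, which is why "EST" is rarely what you want for US data.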

Value

A POSIXct vector with tzone attribute set to tz. Elements that could not be parsed (or did not generate valid dates) will be set to NA, and a warning message will inform you of the total number of failures.
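
As a rough illustration of the return value, the sketch below parses one valid and one invalid string. It assumes readr is installed and relies on the default time zone ("UTC"); the warning is suppressed only to keep the output short.

```r
library(readr)

# The second element does not match the format, so it becomes NA.
# readr also warns about the failure; the warning is silenced here.
x <- suppressWarnings(
  parse_datetime(c("2010-01-01", "not a date"), "%Y-%m-%d")
)

is.na(x)                # which elements failed to parse
inherits(x, "POSIXct")  # the result is a POSIXct vector
attr(x, "tzone")        # the time zone used for display
```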

Format specification

readr uses a format specification similar to strptime. There are three types of element:
  1. Date components are specified with "%" followed by a letter. For example, "%Y" matches a 4 digit year, "%m" matches a 2 digit month and "%d" matches a 2 digit day.
  2. Whitespace is any sequence of zero or more whitespace characters.
  3. Any other character is matched exactly.
parse_datetime recognises the following format specifications:
  • Year: "%Y" (4 digits), "%y" (2 digits); 00-69 -> 2000-2069, 70-99 -> 1970-1999.
  • Month: "%m" (2 digits), "%b" (abbreviated name in current locale), "%B" (full name in current locale).
  • Day: "%d" (2 digits), "%e" (optional leading space)
  • Hour: "%H"
  • Minutes: "%M"
  • Seconds: "%S" (integer seconds), "%OS" (partial seconds)
  • Time zone: "%Z" (as name, e.g. "America/Chicago"), "%z" (as offset from UTC, e.g. "+0800")
  • Non-digits: "%." skips one non-digit character, "%*" skips any number of non-digit characters.
  • Shortcuts: "%D" = "%m/%d/%y", "%F" = "%Y-%m-%d", "%R" = "%H:%M", "%T" = "%H:%M:%S", "%x" = "%y/%m/%d".
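
The specifications listed above can be combined freely. The following sketch, which assumes readr is installed and uses the default "UTC" time zone, shows a full format, the same format written with shortcuts, the "%." separator skip, and the two-digit year rule; the variable names are illustrative only.

```r
library(readr)

# Date components, literal characters, and whitespace in one format.
full <- parse_datetime("2010-01-02 03:04:05", "%Y-%m-%d %H:%M:%S")

# The same format written with the "%F" and "%T" shortcuts.
short <- parse_datetime("2010-01-02 03:04:05", "%F %T")

# "%." skips exactly one non-digit, so any single separator is accepted.
dotted <- parse_datetime("2010.01.02", "%Y%.%m%.%d")

# Two-digit years: 00-69 map to 2000-2069, 70-99 map to 1970-1999.
years <- parse_datetime(c("01/02/69", "01/02/70"), "%d/%m/%y")
format(years, "%Y")
```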

ISO8601 support

Currently, readr does not support all of ISO8601. Missing features:
  • Week & weekday specifications, e.g. "2013-W05", "2013-W05-10"
  • Ordinal dates, e.g. "2013-095".
  • Using a comma instead of a period as the decimal separator.
The parser is also a little laxer than ISO8601:
  • Dates and times can be separated with a space, not just T.
  • Mostly correct specifications like "2009-05-19 14:" and "200912-01" work.
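
The laxer separator handling can be seen by parsing the same instant both ways. This is a small sketch, assuming readr is installed; no format is given, so the ISO8601 parser is used.

```r
library(readr)

# Standard ISO8601 form, with "T" between the date and the time.
strict <- parse_datetime("1979-10-14T10:11:12")

# A space in place of the "T" separator is accepted as well.
lax <- parse_datetime("1979-10-14 10:11:12")

# Both calls should yield the same instant.
isTRUE(all.equal(as.numeric(strict), as.numeric(lax)))
```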

Examples

# Format strings --------------------------------------------------------
parse_datetime("01/02/2010", "%d/%m/%Y")
parse_datetime("01/02/2010", "%m/%d/%Y")
# Handle any separator
parse_datetime("01/02/2010", "%m%.%d%.%Y")

# Dates look the same, but internally they use the number of days since
# 1970-01-01 instead of the number of seconds. This avoids a whole lot
# of trouble related to time zones, so use them if you can.
parse_date("01/02/2010", "%d/%m/%Y")
parse_date("01/02/2010", "%m/%d/%Y")

# You can parse timezones from strings (as listed in OlsonNames())
parse_datetime("2010/01/01 12:00 US/Central", "%Y/%m/%d %H:%M %Z")
# Or from offsets
parse_datetime("2010/01/01 12:00 -0600", "%Y/%m/%d %H:%M %z")

# Use the tz parameter to control the default time zone
# (but note UTC is considerably faster than other options)
parse_datetime("2010/01/01 12:00", "%Y/%m/%d %H:%M", tz = "US/Central")
parse_datetime("2010/01/01 12:00", "%Y/%m/%d %H:%M", tz = "US/Eastern")

# Unlike strptime, the format specification must match the complete
# string (ignoring leading and trailing whitespace). This avoids common
# errors:
strptime("01/02/2010", "%d/%m/%y")
parse_datetime("01/02/2010", "%d/%m/%y")

# Failures -------------------------------------------------------------
parse_datetime("01/01/2010", "%d/%m/%Y")
x <- parse_datetime(c("01/ab/2010", "32/01/2010"), "%d/%m/%Y")
problems(x)

# ISO8601 --------------------------------------------------------------
# With separators
parse_datetime("1979-10-14")
parse_datetime("1979-10-14T10")
parse_datetime("1979-10-14T10:11")
parse_datetime("1979-10-14T10:11:12")
parse_datetime("1979-10-14T10:11:12.12345")

# Without separators
parse_datetime("19791014")
parse_datetime("19791014T101112")

# Time zones
parse_datetime("1979-10-14T1010", tz = "US/Central")
parse_datetime("1979-10-14T1010-0500", tz = "US/Central")
parse_datetime("1979-10-14T1010Z", tz = "US/Central")
