Identify date columns and return their formats. The function tries a set of default formats, and you can also supply your own.
identifyDates(
dataSet,
cols = "auto",
formats = NULL,
n_test = 30,
ambiguities = "IGNORE",
verbose = TRUE
)
dataSet: Matrix, data.frame or data.table
cols: List of column names of dataSet to look into. To check all columns, set it to "auto". (character, default to "auto")
formats: List of additional date formats to check (see strptime)
n_test: Number of non-null rows on which to test (numeric, default to 30)
ambiguities: How ambiguities should be treated (see details in the ambiguities section) (character, default to "IGNORE")
verbose: Should the algorithm talk? (logical, default to TRUE)
A named list whose names are column names of dataSet and whose values are the identified formats.
Dates are often ambiguous. For example, for the date 2017/01/01, there is no way to know
whether the format is YYYY/MM/DD or YYYY/DD/MM.
Sometimes a human can resolve the ambiguity. For example, given
17/12/31, a human might guess that it is YY/MM/DD, but there is no sure way to know.
To be safe, findAndTransformDates doesn't try to guess when a date is ambiguous.
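The two examples above can be reproduced with base R alone. This is a minimal sketch using as.Date, not the package's own detection code; the variable names are illustrative only.

```r
# Both formats accept "2017/01/01", so this value alone cannot tell them apart.
a <- as.Date("2017/01/01", format = "%Y/%m/%d")
b <- as.Date("2017/01/01", format = "%Y/%d/%m")
c(a, b)  # neither is NA

# "17/12/31" also parses under two formats, and the resulting dates differ.
ymd <- as.Date("17/12/31", format = "%y/%m/%d")  # 31 December 2017
dmy <- as.Date("17/12/31", format = "%d/%m/%y")  # 17 December 2031
c(ymd, dmy)
```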
To handle ambiguous dates, use the ambiguities parameter.
It can take one of the following values:
IGNORE: the function takes the first format that matches (fast, but can make some mistakes).
WARN: the function tries all formats and tells you, via prints, that there are multiple matches (and won't perform the date transformation).
SOLVE: the function tries to resolve the ambiguity by going through more lines, so it is slower.
If it is able to resolve it, it transforms the column; if not, it prints the acceptable formats.
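The idea behind SOLVE, looking at more rows until only one format still matches, can be sketched in base R. This is a hypothetical illustration, not the package's implementation: formats_ok, candidates and dates are names invented for the example.

```r
# Hypothetical illustration, not the package's code.
candidates <- c("%Y/%m/%d", "%Y/%d/%m")
dates <- c("2017/01/01", "2017/02/02", "2017/01/31")

# Keep the candidate formats that parse the first n values without any NA.
formats_ok <- function(x, n) {
  head_x <- head(x, n)
  Filter(function(f) !anyNA(as.Date(head_x, format = f)), candidates)
}

first_two <- formats_ok(dates, 2)  # both candidates still match: ambiguous
all_rows  <- formats_ok(dates, 3)  # "2017/01/31" rules out "%Y/%d/%m"
```

With only two rows both formats remain possible; the third row has a day of 31, which cannot be a month, so a single format is left.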
This function looks for a perfect transformation.
If dataSet contains erroneous values, consider setting them to NA beforehand.
In the unlikely case where you have numeric values higher than as.numeric(as.POSIXct("1990-01-01")),
they will be treated as timestamps, which may cause issues. On the other hand,
if you have timestamps before 1990-01-01, they won't be detected, but you can use
setColAsDate
to force the transformation.
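To get a feel for the threshold mentioned above, it can be computed in base R. This is a sketch only: tz = "UTC" is an assumption added here so the number is reproducible (without it the value depends on the local time zone), and the package's internal check may differ.

```r
# Threshold from the text, computed in UTC (assumption for reproducibility).
threshold <- as.numeric(as.POSIXct("1990-01-01", tz = "UTC"))
threshold  # seconds since 1970-01-01

# Illustrative numeric values: one above the threshold, one below it.
x <- c(1500000000, 600000000)
above <- x > threshold

# What these numbers mean when read as Unix timestamps.
as.POSIXct(x, origin = "1970-01-01", tz = "UTC")
```

Only the first value is above the threshold; the second corresponds to a date before 1990 and, according to the text, would not be detected as a timestamp.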
# NOT RUN {
# Load the package and the example data set
library(dataPreparation)
data(messy_adult)
head(messy_adult)
# Identify date columns, testing on 5 non-null rows
identifyDates(messy_adult, n_test = 5)
# }