datediff
: Returns the number of days from y
to x
.
If y
is later than x
then the result is positive.
months_between
: Returns number of months between dates y
and x
.
If y
is later than x
, then the result is positive. If y
and x
are on the same day of month, or both are the last day of month, time of day will be ignored.
Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits.
date_format
: Converts a date/timestamp/string to a value of string in the format
specified by the date format given by the second argument. A pattern could be for instance
dd.MM.yyyy
and could return a string like '18.03.1993'. All
pattern letters of java.time.format.DateTimeFormatter
can be used.
Note: Use when ever possible specialized functions like year
. These benefit from a
specialized implementation.
from_utc_timestamp
: This is a common function for databases supporting TIMESTAMP WITHOUT
TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a
timestamp in UTC, and renders that timestamp as a timestamp in the given time zone.
However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
timezone-agnostic. So in Spark this function just shift the timestamp value from UTC timezone to
the given timezone.
This function may return confusing result if the input is a string with timezone, e.g.
(2018-03-13T06:18:23+00:00
). The reason is that, Spark firstly cast the string to
timestamp according to the timezone in the string, and finally display the result by converting
the timestamp to string according to the session local timezone.
next_day
: Given a date column, returns the first date which is later than the value of
the date column that is on the specified day of the week. For example,
next_day("2015-07-27", "Sunday")
returns 2015-08-02 because that is the first Sunday
after 2015-07-27. Day of the week parameter is case insensitive, and accepts first three or
two characters: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun".
to_utc_timestamp
: This is a common function for databases supporting TIMESTAMP WITHOUT
TIMEZONE. This function takes a timestamp which is timezone-agnostic, and interprets it as a
timestamp in the given timezone, and renders that timestamp as a timestamp in UTC.
However, timestamp in Spark represents number of microseconds from the Unix epoch, which is not
timezone-agnostic. So in Spark this function just shift the timestamp value from the given
timezone to UTC timezone.
This function may return confusing result if the input is a string with timezone, e.g.
(2018-03-13T06:18:23+00:00
). The reason is that, Spark firstly cast the string to
timestamp according to the timezone in the string, and finally display the result by converting
the timestamp to string according to the session local timezone.
add_months
: Returns the date that is numMonths (x
) after startDate (y
).
date_add
: Returns the date that is x
days after.
date_sub
: Returns the date that is x
days before.