tbl_df class
The tbl_df class is a subclass of data.frame, created in order to have different default behaviour. The colloquial term "tibble" refers to a data frame that has the tbl_df class. Tibble is the central data structure for the set of packages known as the tidyverse, including dplyr, ggplot2, tidyr, and readr.
The general ethos is that tibbles are lazy and surly: they do less and complain more than base data.frames. This forces problems to be tackled earlier and more explicitly, typically leading to code that is more expressive and robust.
Objects of class tbl_df have:
A class attribute of c("tbl_df", "tbl", "data.frame").
A base type of "list", where each element of the list has the same vctrs::vec_size().
A names attribute that is a character vector the same length as the underlying list.
A row.names attribute, included for compatibility with data.frame. This attribute is only consulted to query the number of rows; any row names that might be stored there are ignored by most tibble methods.
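The properties listed above can be checked directly on a small tibble. This is a minimal sketch, assuming the tibble and vctrs packages are installed; the object name df and its columns are illustrative only.

```r
library(tibble)

# A small tibble with two columns of equal size.
df <- tibble(x = 1:3, y = c("a", "b", "c"))

# The class attribute lists tbl_df first, then tbl, then data.frame.
cls <- class(df)

# The base type is a list; each element is one column.
base_type <- typeof(df)

# Every column reports the same size according to vctrs.
col_sizes <- vapply(df, vctrs::vec_size, integer(1))

# The names attribute has one entry per column.
col_names <- names(df)

# The row.names attribute exists, but here it only encodes the row count.
n_rows <- nrow(df)

print(cls)
print(base_type)
print(col_sizes)
print(col_names)
print(n_rows)
```

Because a tibble still inherits from data.frame, functions written for base data frames generally accept it, while tibble-specific methods are dispatched first.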
How default behaviour of tibbles differs from that of data.frames, during creation and access:
Column data is not coerced. A character vector is not turned into a factor. List-columns are expressly anticipated and do not require special tricks. Read more in tibble().
Recycling only happens for a length 1 input. Read more in vctrs::vec_recycle().
Column names are not munged, although missing names are auto-populated. Empty and duplicated column names are strongly discouraged; when they occur, the user must indicate how to resolve them. Read more in vctrs::vec_as_names().
Row names are not added and are strongly discouraged, in favor of storing that info as a column. Read more in rownames.
df[, j] returns a tibble; it does not automatically extract the column inside. df[, j, drop = FALSE] is the default. Read more in subsetting.
There is no partial matching when $ is used to index by name. df$name for a nonexistent name generates a warning. Read more in subsetting.
Printing and inspection are a very high priority. The goal is to convey as much information as possible, in a concise way, even for large and complex tibbles. Read more in formatting.
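The creation and access differences described above can be illustrated with a short session. This is a sketch rather than an exhaustive tour, assuming the tibble package is installed; all object and column names (df, recycled, spaced, repaired, cars, and so on) are made up for illustration.

```r
library(tibble)

# Creation: character data stays character, and a list-column needs no tricks.
df <- tibble(
  name  = c("a", "b", "c"),
  value = 1:3,
  items = list(1:2, letters, "z")
)
name_class <- class(df$name)      # "character", not "factor"
items_is_list <- is.list(df$items)

# Recycling: a length 1 input is recycled; other mismatched lengths are an error.
recycled <- tibble(x = 1:4, y = 1)
recycle_error <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) e
)

# Column names: a non-syntactic name is kept as given.
spaced <- tibble(`a b` = 1)

# Duplicated names are rejected unless a repair strategy is requested.
dup_error <- tryCatch(tibble(x = 1, x = 2), error = function(e) e)
repaired <- tibble(x = 1, x = 2, .name_repair = "unique")

# Row names: store them in a column instead.
cars <- as_tibble(mtcars, rownames = "model")

# Access: single-column subsetting with [ still returns a tibble.
sub <- df[, "value"]
base_sub <- as.data.frame(df)[, "value"]   # base data.frame drops to a vector

# Use [[ (or $ with the full name) to extract the column itself.
extracted <- df[["value"]]

# $ does no partial matching: an unknown name gives a warning.
missing_col <- tryCatch(df$val, warning = function(w) conditionMessage(w))
base_partial <- as.data.frame(df)$val      # base data.frame matches "value"

print(df)
print(sub)
print(missing_col)
```

Wrapping the failing calls in tryCatch() keeps the script running so that the error and warning conditions can be inspected; in interactive use they would surface immediately, which is the "complain more" behaviour the page describes.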