These functions use the Arrow C++ CSV reader to read into a data.frame. Arrow C++ options have been mapped to argument names that follow those of readr::read_delim(), and col_select was inspired by vroom::vroom().
read_delim_arrow(
file,
delim = ",",
quote = "\"",
escape_double = TRUE,
escape_backslash = FALSE,
col_names = TRUE,
col_select = NULL,
na = c("", "NA"),
quoted_na = TRUE,
skip_empty_rows = TRUE,
skip = 0L,
parse_options = NULL,
convert_options = NULL,
read_options = NULL,
as_data_frame = TRUE
)
read_csv_arrow(
file,
quote = "\"",
escape_double = TRUE,
escape_backslash = FALSE,
col_names = TRUE,
col_select = NULL,
na = c("", "NA"),
quoted_na = TRUE,
skip_empty_rows = TRUE,
skip = 0L,
parse_options = NULL,
convert_options = NULL,
read_options = NULL,
as_data_frame = TRUE
)
read_tsv_arrow(
file,
quote = "\"",
escape_double = TRUE,
escape_backslash = FALSE,
col_names = TRUE,
col_select = NULL,
na = c("", "NA"),
quoted_na = TRUE,
skip_empty_rows = TRUE,
skip = 0L,
parse_options = NULL,
convert_options = NULL,
read_options = NULL,
as_data_frame = TRUE
)
file: A character file name, raw vector, or an Arrow input stream.
delim: Single character used to separate fields within a record.
quote: Single character used to quote strings.
escape_double: Does the file escape quotes by doubling them? That is, if this option is TRUE, the value """" represents a single quote, \".
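As a minimal sketch of the doubled-quote convention, the snippet below writes a small file and reads it back with the default escape_double = TRUE. The file contents and column names are made up for illustration.

```r
library(arrow)

tf <- tempfile(fileext = ".csv")
# The second field is quoted and wraps the word hi in doubled quotes
writeLines(c('id,text', '1,"She said ""hi"""'), tf)

df <- read_csv_arrow(tf)
# Each doubled quote inside the quoted field should come back as one quote
df$text

unlink(tf)
```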
escape_backslash: Does the file use backslashes to escape special characters? This is more general than escape_double, as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like \n.
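The following sketch illustrates backslash escaping of the delimiter. It assumes an illustrative file whose second field contains a comma protected by a backslash. Exact handling of other escape sequences may vary with the installed Arrow version.

```r
library(arrow)

tf <- tempfile(fileext = ".csv")
# "a\\,b" in R source writes the characters a\,b to the file
writeLines(c("id,text", "1,a\\,b"), tf)

df <- read_csv_arrow(tf, escape_backslash = TRUE)
# The escaped comma should stay inside the value instead of splitting the field
df$text

unlink(tf)
```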
col_names: If TRUE, the first row of the input will be used as the column names and will not be included in the data frame. If FALSE, column names will be generated by Arrow, starting with "f0", "f1", ..., "fN". Alternatively, you can specify a character vector of column names.
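A short sketch of the two non-default forms of col_names, using a made-up header-less file. The names "id" and "label" are illustrative only.

```r
library(arrow)

tf <- tempfile(fileext = ".csv")
# No header row: every line is data
writeLines(c("1,a", "2,b"), tf)

# Let Arrow generate the column names
auto <- read_csv_arrow(tf, col_names = FALSE)
names(auto)

# Supply the column names explicitly
named <- read_csv_arrow(tf, col_names = c("id", "label"))
names(named)

unlink(tf)
```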
col_select: A character vector of column names to keep, as in the "select" argument to data.table::fread(), or a tidy selection specification of columns, as used in dplyr::select().
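Both forms of col_select can be sketched with the built-in iris data. The example assumes the tidyselect helpers (here starts_with()) are available once arrow is attached, as the package's own example suggests.

```r
library(arrow)

tf <- tempfile(fileext = ".csv")
write.csv(iris, file = tf, row.names = FALSE)

# Character vector of column names
by_name <- read_csv_arrow(tf, col_select = c("Sepal.Length", "Species"))
names(by_name)

# Tidy selection helper
by_helper <- read_csv_arrow(tf, col_select = starts_with("Petal"))
names(by_helper)

unlink(tf)
```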
na: A character vector of strings to interpret as missing values.
quoted_na: Should missing values inside quotes be treated as missing values (the default) or strings? (Note that this is different from the Arrow C++ default for the corresponding convert option, strings_can_be_null.)
skip_empty_rows: Should blank rows be ignored altogether? If TRUE, blank rows will not be represented at all. If FALSE, they will be filled with missings.
skip: Number of lines to skip before reading data.
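As an illustration of skip, the sketch below assumes a file that starts with one line of preamble before the header row. The preamble text is invented for the example.

```r
library(arrow)

tf <- tempfile(fileext = ".csv")
# One line of preamble, then the header, then two data rows
writeLines(c("# exported by some tool", "x,y", "1,a", "2,b"), tf)

# Skip the preamble so that the next line is used as the header
df <- read_csv_arrow(tf, skip = 1)
names(df)
nrow(df)

unlink(tf)
```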
parse_options, convert_options, read_options: See file reader options. If given, parse_options overrides any parsing options provided in other arguments (e.g. delim, quote, etc.).
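A minimal sketch of passing parse options directly. It assumes the CsvParseOptions$create() constructor from the arrow package and its delimiter argument, so check both against your installed version. The semicolon-separated file is made up for the example.

```r
library(arrow)

tf <- tempfile(fileext = ".txt")
writeLines(c("x;y", "1;a", "2;b"), tf)

# Build the Arrow parse options by hand instead of using delim = ";"
opts <- CsvParseOptions$create(delimiter = ";")
df <- read_delim_arrow(tf, parse_options = opts)
names(df)

unlink(tf)
```

Because parse_options takes precedence over the readr-style arguments, the default delim = "," is not used here.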
as_data_frame: Should the function return a data.frame (default) or an Arrow Table?
A data.frame, or a Table if as_data_frame = FALSE.
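The difference between the two return types can be sketched as follows, using a small illustrative file. The Table can be converted later with as.data.frame().

```r
library(arrow)

tf <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "2,b"), tf)

# Keep the result as an Arrow Table instead of converting it
tab <- read_csv_arrow(tf, as_data_frame = FALSE)
inherits(tab, "Table")
tab$num_rows

# Convert to a data.frame only when it is needed in R
df <- as.data.frame(tab)

unlink(tf)
```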
read_csv_arrow() and read_tsv_arrow() are wrappers around read_delim_arrow() that specify a delimiter.
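To illustrate the wrapper relationship, this sketch reads the same made-up tab-separated file through read_tsv_arrow() and through read_delim_arrow() with an explicit delimiter. The two results are expected to match.

```r
library(arrow)

tf <- tempfile(fileext = ".tsv")
writeLines(c("x\ty", "1\ta", "2\tb"), tf)

via_wrapper <- read_tsv_arrow(tf)
via_delim <- read_delim_arrow(tf, delim = "\t")

# Compare as plain data frames to ignore any class differences
same <- identical(as.data.frame(via_wrapper), as.data.frame(via_delim))
same

unlink(tf)
```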
Note that not all readr options are currently implemented here. Please file an issue if you encounter one that arrow should support.
If you need to control Arrow-specific reader parameters that don't have an equivalent in readr::read_csv(), you can either provide them in the parse_options, convert_options, or read_options arguments, or you can use CsvTableReader directly for lower-level access.
library(arrow)

tf <- tempfile()
write.csv(iris, file = tf)
df <- read_csv_arrow(tf)
dim(df)
# Can select columns
df <- read_csv_arrow(tf, col_select = starts_with("Sepal"))
# Remove the temporary file when done
unlink(tf)