Parquet is a columnar storage file format. This function enables you to write Parquet files from R.
write_parquet(
  x,
  sink,
  chunk_size = NULL,
  version = NULL,
  compression = NULL,
  compression_level = NULL,
  use_dictionary = NULL,
  write_statistics = NULL,
  data_page_size = NULL,
  properties = ParquetWriterProperties$create(x, version = version,
    compression = compression, compression_level = compression_level,
    use_dictionary = use_dictionary, write_statistics = write_statistics,
    data_page_size = data_page_size),
  use_deprecated_int96_timestamps = FALSE,
  coerce_timestamps = NULL,
  allow_truncated_timestamps = FALSE,
  arrow_properties = ParquetArrowWriterProperties$create(
    use_deprecated_int96_timestamps = use_deprecated_int96_timestamps,
    coerce_timestamps = coerce_timestamps,
    allow_truncated_timestamps = allow_truncated_timestamps)
)
x: An arrow::Table, or an object convertible to it.

sink: an arrow::io::OutputStream or a string which is interpreted as a file path.

chunk_size: chunk size in number of rows. If NULL, the total number of rows is used.

version: parquet version, "1.0" or "2.0". Default "1.0".

compression: compression algorithm. Default "snappy". See details.

compression_level: compression level. Meaning depends on compression algorithm.

use_dictionary: Specify if we should use dictionary encoding. Default TRUE.

write_statistics: Specify if we should write statistics. Default TRUE.

data_page_size: Set a target threshold for the approximate encoded size of data pages within a column chunk (in bytes). Default 1 MiB.

properties: properties for the parquet writer, derived from the arguments version, compression, compression_level, use_dictionary, write_statistics and data_page_size. You should not specify any of these arguments if you also provide a properties argument, as they will be ignored.

use_deprecated_int96_timestamps: Write timestamps to INT96 Parquet format. Default FALSE.

coerce_timestamps: Cast timestamps to a particular resolution. Can be NULL, "ms" or "us". Default NULL (no casting).

allow_truncated_timestamps: Allow loss of data when coercing timestamps to a particular resolution. E.g. if microsecond or nanosecond data is lost when coercing to "ms", do not raise an exception.

arrow_properties: Arrow-specific writer properties, derived from the arguments use_deprecated_int96_timestamps, coerce_timestamps and allow_truncated_timestamps. You should not specify any of these arguments if you also provide an arrow_properties argument, as they will be ignored.

Value: the input x, invisibly.
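As a brief illustration of the timestamp arguments (a sketch, assuming the arrow package is installed and attached; the sample data frame is hypothetical): R's POSIXct values can carry sub-millisecond fractions, so coercing to "ms" loses data and, per the argument descriptions above, must be explicitly permitted with allow_truncated_timestamps.

```r
library(arrow)

# A timestamp with microsecond-level fractional seconds
df <- data.frame(ts = as.POSIXct("2020-01-01 12:00:00.123456", tz = "UTC"))
tf <- tempfile(fileext = ".parquet")

# Coerce timestamps to millisecond resolution on write; allow the
# sub-millisecond part to be truncated instead of raising an error
write_parquet(df, tf,
  coerce_timestamps = "ms",
  allow_truncated_timestamps = TRUE
)
```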
The parameters compression, compression_level, use_dictionary and write_statistics support various patterns:

- The default NULL leaves the parameter unspecified, and the C++ library uses an appropriate default for each column (defaults listed above).
- A single, unnamed value (e.g. a single string for compression) applies to all columns.
- An unnamed vector, of the same size as the number of columns, specifies a value for each column, in positional order.
- A named vector specifies the value for the named columns; the default value for the setting is used for columns not supplied.
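The patterns above can be sketched as follows (assuming the arrow package is attached; the data frame and its column names x and y are illustrative):

```r
library(arrow)

df <- data.frame(x = 1:5, y = letters[1:5])
tf <- tempfile(fileext = ".parquet")

# A single unnamed value applies to all columns
write_parquet(df, tf, use_dictionary = FALSE)

# An unnamed vector supplies one value per column, in positional order
write_parquet(df, tf, use_dictionary = c(TRUE, FALSE))

# A named vector sets the value for the named columns only;
# unnamed columns fall back to the default
if (codec_is_available("gzip")) {
  write_parquet(df, tf, compression = c(x = "gzip"))
}
```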
The compression argument can be any of the following (case insensitive): "uncompressed", "snappy", "gzip", "brotli", "zstd", "lz4", "lzo" or "bz2". Only "uncompressed" is guaranteed to be available, but "snappy" and "gzip" are almost always included. See codec_is_available().

The default "snappy" is used if available, otherwise "uncompressed". To disable compression, set compression = "uncompressed". Note that "uncompressed" columns may still have dictionary encoding.
tf1 <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:5), tf1)

# using compression
if (codec_is_available("gzip")) {
  tf2 <- tempfile(fileext = ".gz.parquet")
  write_parquet(data.frame(x = 1:5), tf2, compression = "gzip", compression_level = 5)
}