
SparkR (version 3.1.2)

read.df: Load a SparkDataFrame

Description

Loads a dataset from a data source and returns it as a SparkDataFrame.

Usage

read.df(path = NULL, source = NULL, schema = NULL, na.strings = "NA", ...)

loadDF(path = NULL, source = NULL, schema = NULL, ...)

Arguments

path

The path of files to load.

source

The name of the external data source.

schema

The data schema defined in structType or a DDL-formatted string.

na.strings

The string value that is interpreted as NA when source is "csv".

...

Additional named properties specific to the external data source.

Value

SparkDataFrame

Details

The data source is specified by source and a set of options (...). If source is not specified, the default data source configured by "spark.sql.sources.default" is used. Similar to R's read.csv, when source is "csv", the string "NA" is interpreted as NA by default.
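
For example, a minimal sketch of these two behaviors, assuming a SparkSession has been started and using hypothetical file paths: na.strings marks a custom string (here "null") as missing instead of the default "NA", and omitting source falls back to the data source configured by "spark.sql.sources.default" (typically "parquet").

sparkR.session()
# Read a CSV file, treating the string "null" as NA;
# "header" is passed through as a CSV data source option
df_csv <- read.df("path/to/people.csv", source = "csv",
                  na.strings = "null", header = "true")
# Omit source to use the default data source configured by
# "spark.sql.sources.default"
df_default <- read.df("path/to/data.parquet")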

See Also

read.json

Examples

# NOT RUN {
sparkR.session()
# Read a JSON file, letting Spark infer the schema
df1 <- read.df("path/to/file.json", source = "json")
# Read multi-line JSON with an explicit schema built from structType/structField
schema <- structType(structField("name", "string"),
                     structField("info", "map<string,double>"))
df2 <- read.df(mapTypeJsonPath, "json", schema, multiLine = TRUE)
# loadDF is equivalent to read.df; extra named arguments are passed to the data source
df3 <- loadDF("data/test_table", "parquet", mergeSchema = "true")
# The schema can also be given as a DDL-formatted string
stringSchema <- "name STRING, info MAP<STRING, DOUBLE>"
df4 <- read.df(mapTypeJsonPath, "json", stringSchema, multiLine = TRUE)
# }
