Learn R Programming

openxlsx2 (version 1.4)

wb_to_df: Create a data frame from a Workbook

Description

Simple function to create a data.frame from a sheet in workbook. Simple as in it was simply written down. read_xlsx() and wb_read() are just internal wrappers of wb_to_df() intended for people coming from other packages.

Usage

wb_to_df(
  file,
  sheet,
  start_row = 1,
  start_col = NULL,
  row_names = FALSE,
  col_names = TRUE,
  skip_empty_rows = FALSE,
  skip_empty_cols = FALSE,
  skip_hidden_rows = FALSE,
  skip_hidden_cols = FALSE,
  rows = NULL,
  cols = NULL,
  detect_dates = TRUE,
  na.strings = "#N/A",
  na.numbers = NA,
  fill_merged_cells = FALSE,
  dims,
  show_formula = FALSE,
  convert = TRUE,
  types,
  named_region,
  keep_attributes = FALSE,
  ...
)

read_xlsx( file, sheet, start_row = 1, start_col = NULL, row_names = FALSE, col_names = TRUE, skip_empty_rows = FALSE, skip_empty_cols = FALSE, rows = NULL, cols = NULL, detect_dates = TRUE, named_region, na.strings = "#N/A", na.numbers = NA, fill_merged_cells = FALSE, ... )

wb_read( file, sheet = 1, start_row = 1, start_col = NULL, row_names = FALSE, col_names = TRUE, skip_empty_rows = FALSE, skip_empty_cols = FALSE, rows = NULL, cols = NULL, detect_dates = TRUE, named_region, na.strings = "NA", na.numbers = NA, ... )

Arguments

file

An xlsx file, wbWorkbook object or URL to xlsx file.

sheet

Either sheet name or index. When missing the first sheet in the workbook is selected.

start_row

first row to begin looking for data.

start_col

first column to begin looking for data.

row_names

If TRUE, the first col of data will be used as row names.

col_names

If TRUE, the first row of data will be used as column names.

skip_empty_rows

If TRUE, empty rows are skipped.

skip_empty_cols

If TRUE, empty columns are skipped.

skip_hidden_rows

If TRUE, hidden rows are skipped.

skip_hidden_cols

If TRUE, hidden columns are skipped.

rows

A numeric vector specifying which rows in the xlsx file to read. If NULL, all rows are read.

cols

A numeric vector specifying which columns in the xlsx file to read. If NULL, all columns are read.

detect_dates

If TRUE, attempt to recognize dates and perform conversion.

na.strings

A character vector of strings which are to be interpreted as NA. Blank cells will be returned as NA.

na.numbers

A numeric vector of digits which are to be interpreted as NA. Blank cells will be returned as NA.

fill_merged_cells

If TRUE, the value in a merged cell is given to all cells within the merge.

dims

Character string of type "A1:B2" as optional dimensions to be imported.

show_formula

If TRUE, the underlying Excel formulas are shown.

convert

If TRUE, a conversion to dates and numerics is attempted.

types

A named numeric indicating, the type of the data. Names must match the returned data. See Details for more.

named_region

Character string with a named_region (defined name or table). If no sheet is selected, the first appearance will be selected. See wb_get_named_regions()

keep_attributes

If TRUE additional attributes are returned. (These are used internally to define a cell type.)

...

additional arguments

Details

The returned data frame will have named rows matching the rows of the worksheet. With col_names = FALSE the returned data frame will have column names matching the columns of the worksheet. Otherwise the first row is selected as column name.

Depending if the R package hms is loaded, wb_to_df() returns hms variables or string variables in the hh:mm:ss format.

The types argument must be a named numeric.

  • 0: character

  • 1: numeric

  • 2: date

  • 3: posixt (datetime)

  • 4: logical

wb_to_df() will not pick up formulas added to a workbook object via wb_add_formula(). This is because only the formula is written and left to be evaluated when the file is opened in a spreadsheet software. Opening, saving and closing the file in a spreadsheet software will resolve this.

See Also

wb_get_named_regions()

Examples

Run this code
###########################################################################
# numerics, dates, missings, bool and string
example_file <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
wb1 <- wb_load(example_file)

# import workbook
wb_to_df(wb1)

# do not convert first row to column names
wb_to_df(wb1, col_names = FALSE)

# do not try to identify dates in the data
wb_to_df(wb1, detect_dates = FALSE)

# return the underlying Excel formula instead of their values
wb_to_df(wb1, show_formula = TRUE)

# read dimension without colNames
wb_to_df(wb1, dims = "A2:C5", col_names = FALSE)

# read selected cols
wb_to_df(wb1, cols = c("A:B", "G"))

# read selected rows
wb_to_df(wb1, rows = c(2, 4, 6))

# convert characters to numerics and date (logical too?)
wb_to_df(wb1, convert = FALSE)

# erase empty rows from dataset
wb_to_df(wb1, skip_empty_rows = TRUE)

# erase empty columns from dataset
wb_to_df(wb1, skip_empty_cols = TRUE)

# convert first row to rownames
wb_to_df(wb1, sheet = 2, dims = "C6:G9", row_names = TRUE)

# define type of the data.frame
wb_to_df(wb1, cols = c(2, 5), types = c("Var1" = 0, "Var3" = 1))

# start in row 5
wb_to_df(wb1, start_row = 5, col_names = FALSE)

# na string
wb_to_df(wb1, na.strings = "a")

###########################################################################
# Named regions
file_named_region <- system.file("extdata", "namedRegions3.xlsx", package = "openxlsx2")
wb2 <- wb_load(file_named_region)

# read dataset with named_region (returns global first)
wb_to_df(wb2, named_region = "MyRange", col_names = FALSE)

# read named_region from sheet
wb_to_df(wb2, named_region = "MyRange", sheet = 4, col_names = FALSE)

# read_xlsx() and wb_read()
example_file <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
read_xlsx(file = example_file)
df1 <- wb_read(file = example_file, sheet = 1)
df2 <- wb_read(file = example_file, sheet = 1, rows = c(1, 3, 5), cols = 1:3)

Run the code above in your browser using DataLab