wb_to_df: Create a data frame from a Workbook

Description

Simple function to create a data.frame from a sheet in workbook. Simple as in it was simply written down. read_xlsx() and wb_read() are just internal wrappers of wb_to_df() intended for people coming from other packages.

Usage

wb_to_df(
  file,
  sheet,
  start_row = 1,
  start_col = NULL,
  row_names = FALSE,
  col_names = TRUE,
  skip_empty_rows = FALSE,
  skip_empty_cols = FALSE,
  skip_hidden_rows = FALSE,
  skip_hidden_cols = FALSE,
  rows = NULL,
  cols = NULL,
  detect_dates = TRUE,
  na.strings = "#N/A",
  na.numbers = NA,
  fill_merged_cells = FALSE,
  dims,
  show_formula = FALSE,
  convert = TRUE,
  types,
  named_region,
  keep_attributes = FALSE,
  ...
)
read_xlsx(
  file,
  sheet,
  start_row = 1,
  start_col = NULL,
  row_names = FALSE,
  col_names = TRUE,
  skip_empty_rows = FALSE,
  skip_empty_cols = FALSE,
  rows = NULL,
  cols = NULL,
  detect_dates = TRUE,
  named_region,
  na.strings = "#N/A",
  na.numbers = NA,
  fill_merged_cells = FALSE,
  ...
)
wb_read(
  file,
  sheet = 1,
  start_row = 1,
  start_col = NULL,
  row_names = FALSE,
  col_names = TRUE,
  skip_empty_rows = FALSE,
  skip_empty_cols = FALSE,
  rows = NULL,
  cols = NULL,
  detect_dates = TRUE,
  named_region,
  na.strings = "NA",
  na.numbers = NA,
  ...
)

Arguments

file: An xlsx file, wbWorkbook object or URL to xlsx file.
sheet: Either sheet name or index. When missing the first sheet in the workbook is selected.
start_row: first row to begin looking for data.
start_col: first column to begin looking for data.
row_names: If TRUE, the first col of data will be used as row names.
col_names: If TRUE, the first row of data will be used as column names.
skip_empty_rows: If TRUE, empty rows are skipped.
skip_empty_cols: If TRUE, empty columns are skipped.
skip_hidden_rows: If TRUE, hidden rows are skipped.
skip_hidden_cols: If TRUE, hidden columns are skipped.
rows: A numeric vector specifying which rows in the xlsx file to read. If NULL, all rows are read.
cols: A numeric vector specifying which columns in the xlsx file to read. If NULL, all columns are read.
detect_dates: If TRUE, attempt to recognize dates and perform conversion.
na.strings: A character vector of strings which are to be interpreted as NA. Blank cells will be returned as NA.
na.numbers: A numeric vector of digits which are to be interpreted as NA. Blank cells will be returned as NA.
fill_merged_cells: If TRUE, the value in a merged cell is given to all cells within the merge.
dims: Character string of type "A1:B2" as optional dimensions to be imported.
show_formula: If TRUE, the underlying Excel formulas are shown.
convert: If TRUE, a conversion to dates and numerics is attempted.
types: A named numeric indicating, the type of the data. Names must match the returned data. See Details for more.
named_region: Character string with a named_region (defined name or table). If no sheet is selected, the first appearance will be selected. See wb_get_named_regions()
keep_attributes: If TRUE additional attributes are returned. (These are used internally to define a cell type.)
...: additional arguments

Details

The returned data frame will have named rows matching the rows of the worksheet. With col_names = FALSE the returned data frame will have column names matching the columns of the worksheet. Otherwise the first row is selected as column name.

Depending if the R package hms is loaded, wb_to_df() returns hms variables or string variables in the hh:mm:ss format.

The types argument must be a named numeric.

0: character
1: numeric
2: date
3: posixt (datetime)
4: logical

wb_to_df() will not pick up formulas added to a workbook object via wb_add_formula(). This is because only the formula is written and left to be evaluated when the file is opened in a spreadsheet software. Opening, saving and closing the file in a spreadsheet software will resolve this.

Examples

Run this code

###########################################################################
# numerics, dates, missings, bool and string
example_file <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
wb1 <- wb_load(example_file)

# import workbook
wb_to_df(wb1)

# do not convert first row to column names
wb_to_df(wb1, col_names = FALSE)

# do not try to identify dates in the data
wb_to_df(wb1, detect_dates = FALSE)

# return the underlying Excel formula instead of their values
wb_to_df(wb1, show_formula = TRUE)

# read dimension without colNames
wb_to_df(wb1, dims = "A2:C5", col_names = FALSE)

# read selected cols
wb_to_df(wb1, cols = c("A:B", "G"))

# read selected rows
wb_to_df(wb1, rows = c(2, 4, 6))

# convert characters to numerics and date (logical too?)
wb_to_df(wb1, convert = FALSE)

# erase empty rows from dataset
wb_to_df(wb1, skip_empty_rows = TRUE)

# erase empty columns from dataset
wb_to_df(wb1, skip_empty_cols = TRUE)

# convert first row to rownames
wb_to_df(wb1, sheet = 2, dims = "C6:G9", row_names = TRUE)

# define type of the data.frame
wb_to_df(wb1, cols = c(2, 5), types = c("Var1" = 0, "Var3" = 1))

# start in row 5
wb_to_df(wb1, start_row = 5, col_names = FALSE)

# na string
wb_to_df(wb1, na.strings = "a")

###########################################################################
# Named regions
file_named_region <- system.file("extdata", "namedRegions3.xlsx", package = "openxlsx2")
wb2 <- wb_load(file_named_region)

# read dataset with named_region (returns global first)
wb_to_df(wb2, named_region = "MyRange", col_names = FALSE)

# read named_region from sheet
wb_to_df(wb2, named_region = "MyRange", sheet = 4, col_names = FALSE)

# read_xlsx() and wb_read()
example_file <- system.file("extdata", "openxlsx2_example.xlsx", package = "openxlsx2")
read_xlsx(file = example_file)
df1 <- wb_read(file = example_file, sheet = 1)
df2 <- wb_read(file = example_file, sheet = 1, rows = c(1, 3, 5), cols = 1:3)

Run the code above in your browser using DataLab

Description

Usage

Arguments

Details

See Also

Examples