pointblank (version 0.7.0)

file_tbl: Get a table from a local or remote file

Description

If your target table is in a file, stored either locally or remotely, the file_tbl() function makes it possible to access it in a single function call. Compatible file types for this function are: CSV (.csv), TSV (.tsv), RDA (.rda), and RDS (.rds) files. This function generates an in-memory tbl_df object, which can be used as a target table for create_agent() and create_informant(). The ideal way to access data with file_tbl() is to use this function as the read_fn parameter in either of the aforementioned create_*() functions. This can be done by using a leading ~ (e.g., read_fn = ~file_tbl(...)).

In the remote data use case, we can specify a URL starting with http://, https://, etc., and ending with the file containing the data table. If data files are available in a GitHub repository then we can use the from_github() function to specify the name and location of the table data in a repository.

Usage

file_tbl(file, type = NULL, ..., keep = FALSE, verify = TRUE)

Arguments

file

The complete file path leading to a compatible data table, either on the user's system or at an http://, https://, ftp://, or ftps:// URL. For a file hosted in a GitHub repository, a call to the from_github() function can be used here.

type

The file type. This is normally inferred from the file extension and is NULL by default, which indicates that the extension will dictate the type of file reading that is performed internally. The valid extensions are .csv, .tsv, .rda, and .rds. If the file has no extension, we can provide the type as one of csv, tsv, rda, or rds.
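As a minimal sketch of the explicit type case, the code below writes a small CSV to a temporary file that has no extension and then reads it with type = "csv". It assumes the pointblank package (and the readr package it relies on) is installed; the path name no_ext_path and the file contents are illustrative and not part of the package.

```r
# Sketch only: assumes pointblank is installed
library(pointblank)

# Create a temporary file with no extension and write CSV text to it
# (`no_ext_path` and its contents are made up for illustration)
no_ext_path <- tempfile()
writeLines(c("a,b", "1,x", "2,y"), no_ext_path)

# The file type cannot be inferred from an extension here,
# so it is stated explicitly through `type`
tbl <- file_tbl(file = no_ext_path, type = "csv")

# Remove the temporary file once the table is in memory
unlink(no_ext_path)
```

When the file does have one of the recognized extensions, the type argument can be left at its default of NULL.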

...

Options passed to readr's read_csv() or read_tsv() function. Both functions have the same arguments and one or the other will be used internally based on the file extension or an explicit value given to type.
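The following sketch shows reading options being forwarded through ... for a TSV file. It assumes pointblank is installed and that col_types and na are handed on to readr::read_tsv() as described above; the file contents and the name tsv_path are illustrative only.

```r
# Sketch only: assumes pointblank is installed
library(pointblank)

# Write a small tab-separated file to a temporary location
# (`tsv_path` and its contents are made up for illustration)
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("id\tscore", "1\t3.5", "2\t-"), tsv_path)

# `col_types` and `na` are readr options: "id" requests an integer
# column followed by a double column, and `na = "-"` asks for a
# lone dash to be treated as a missing value
tbl_tsv <-
  file_tbl(
    file = tsv_path,
    col_types = "id",
    na = "-"
  )

# Remove the temporary file once the table is in memory
unlink(tsv_path)
```

Because the extension is .tsv, no type value is needed in this call.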

keep

In the case of a downloaded file, should it be stored in the working directory (keep = TRUE) or should it be downloaded to a temporary directory? By default, this is FALSE.

verify

If TRUE (the default), the data object is checked to confirm that it has the data.frame class.
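As a small illustration of a non-delimited file type, the sketch below saves a data frame to an RDS file and reads it back with file_tbl(). It assumes pointblank is installed; rds_path and the example data are illustrative. Since the stored object is a data frame, it should satisfy the class check that verify = TRUE performs.

```r
# Sketch only: assumes pointblank is installed
library(pointblank)

# Save a small data frame to a temporary RDS file
# (`rds_path` and the data are made up for illustration)
rds_path <- tempfile(fileext = ".rds")
saveRDS(
  data.frame(a = 1:3, b = c("x", "y", "z")),
  file = rds_path
)

# The .rds extension determines how the file is read; `verify`
# is left at its default of TRUE
tbl_rds <- file_tbl(file = rds_path)

# Remove the temporary file once the table is in memory
unlink(rds_path)
```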

Value

A tbl_df object.

Function ID

1-7

See Also

Other Planning and Prep: action_levels(), create_agent(), create_informant(), db_tbl(), scan_data(), tbl_get(), tbl_source(), tbl_store(), validate_rmd()

Examples

# NOT RUN {
# A local CSV file can be obtained as
# a tbl object by supplying a path to
# the file and some CSV reading options
# (the ones used by `readr::read_csv()`)
# to the `file_tbl()` function; for
# this example we could obtain a path
# to a CSV file in the pointblank
# package with `system.file()`:
csv_path <- 
  system.file(
    "data_files", "small_table.csv",
    package = "pointblank"
  )

# Then use that path in `file_tbl()`
# with the option to specify the column
# types in that CSV  
tbl <- 
  file_tbl(
    file = csv_path,
    col_types = "TDdcddlc"
  )
  
# Now that we have a `tbl` object that
# is a tibble, it can be introduced to
# `create_agent()` for validation
agent <- create_agent(tbl = tbl)

# A different strategy is to provide
# the data-reading function call
# directly to `create_agent()`:
agent <- 
  create_agent(
    read_fn = ~ file_tbl(
      file = system.file(
        "data_files", "small_table.csv",
        package = "pointblank"
      ),
      col_types = "TDdcddlc"
    )
  ) %>%
  col_vals_gt(vars(a), value = 0)

# All of the file-reading instructions
# are encapsulated in the `read_fn` so
# the agent will always obtain the most
# recent version of the dataset (and the
# logic can be translated to YAML, for
# later use)

if (interactive()) {

# A CSV can be obtained from a public
# GitHub repo by using the `from_github()`
# helper function; let's create an agent
# and supply a table-prep formula that
# gets the same CSV file from the GitHub
# repository for the pointblank package 
agent <- 
  create_agent(
    read_fn = ~ file_tbl(
      file = from_github(
        file = "inst/data_files/small_table.csv",
        repo = "rich-iannone/pointblank"
      ),
      col_types = "TDdcddlc"
    )
  ) %>%
  col_vals_gt(vars(a), value = 0) %>%
  interrogate()

# This interrogated the data that was
# obtained from the remote source file,
# and there's nothing to clean up (by
# default, the downloaded file goes into
# a system temp directory)

# Storing table-prep formulas in a table
# store makes it easier to work with
# tabular data originating from files;
# here's how to generate a table store
# with two named entries for table
# preparations
tbls <-
  tbl_store(
    small_table_file ~ file_tbl(
      file = system.file(
        "data_files", "small_table.csv",
        package = "pointblank"
      ),
      col_types = "TDdcddlc"
    ),
    small_high_file ~ file_tbl(
      file = system.file(
        "data_files", "small_table.csv",
        package = "pointblank"
      ),
      col_types = "TDdcddlc"
    ) %>%
      dplyr::filter(f == "high")
  )

# Now it's easy to access either of these
# tables (the second is a filtered version)
# via the `tbl_get()` function
tbl_get("small_table_file", store = tbls)
tbl_get("small_high_file", store = tbls)

# The table-prep formulas in `tbls`
# could also be used in functions with
# the `read_fn` argument; this is thanks
# to the `tbl_source()` function
agent <- 
  create_agent(
    read_fn = ~ tbl_source(
      "small_table_file",
      store = tbls
    )
  )

informant <- 
  create_informant(
    read_fn = ~ tbl_source(
      "small_high_file",
      store = tbls
    )
  )

}

# }
