
reproducible (version 1.1.1)

preProcess: Download, Checksum, Extract files

Description

This function downloads files (via downloadFile), checksums them (via Checksums), and extracts them from archives (via extractFromArchive). It also cleans up input arguments (e.g., paths, function names). It is the first of the three stages used in prepInputs.
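As a rough illustration of the three steps named above, the following base-R sketch copies an archive into a destination directory (a stand-in for downloading), computes its MD5 checksum, and extracts one file from it. sketchPreProcess() is a hypothetical helper, not part of reproducible; the real downloadFile(), Checksums() and extractFromArchive() handle many more cases, such as real URLs, zip/rar archives and CHECKSUMS.txt bookkeeping.

```r
# Hypothetical sketch of the download -> checksum -> extract sequence,
# using base R only. The "download" is simulated with file.copy().
sketchPreProcess <- function(remoteArchive, targetFile, destinationPath) {
  dir.create(destinationPath, recursive = TRUE, showWarnings = FALSE)

  # 1. "Download": copy the archive into destinationPath
  localArchive <- file.path(destinationPath, basename(remoteArchive))
  file.copy(remoteArchive, localArchive, overwrite = TRUE)

  # 2. Checksum the local copy (MD5)
  checksum <- unname(tools::md5sum(localArchive))

  # 3. Extract only targetFile from the (uncompressed) tar archive
  utils::untar(localArchive, files = targetFile,
               exdir = destinationPath, tar = "internal")

  list(checksum = checksum,
       targetFilePath = file.path(destinationPath, targetFile))
}

# Build a small tar archive to act as the "remote" file
src <- tempfile("remote")
dir.create(src)
write.csv(data.frame(x = 1:3), file.path(src, "data.csv"), row.names = FALSE)
old <- setwd(src)
utils::tar("inputs.tar", files = "data.csv", tar = "internal")
setwd(old)

res <- sketchPreProcess(file.path(src, "inputs.tar"), "data.csv",
                        destinationPath = tempfile("dest"))
dat <- read.csv(res$targetFilePath)
```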

Usage

preProcess(
  targetFile = NULL,
  url = NULL,
  archive = NULL,
  alsoExtract = NULL,
  destinationPath = getOption("reproducible.destinationPath", "."),
  fun = NULL,
  dlFun = NULL,
  quick = getOption("reproducible.quick"),
  overwrite = getOption("reproducible.overwrite", FALSE),
  purge = FALSE,
  useCache = getOption("reproducible.useCache", FALSE),
  .tempPath,
  ...
)

Arguments

targetFile

Character string giving the path to the eventual file (raster, shapefile, csv, etc.) after downloading and extracting from a zip or tar archive. This is the file before it is passed to postProcess. Currently, the internal checksumming does not checksum the file after it has been postProcessed (e.g., cropped/reprojected/masked); wrapping prepInputs in Cache is sufficient in these cases. See the table of combinations below.

url

Optional character string indicating the URL to download from. If not specified, no download will be attempted. If no entry for the file exists in CHECKSUMS.txt (in destinationPath), CHECKSUMS.txt will be created or an entry appended to it. This entry will be used in subsequent calls to prepInputs or preProcess to compare the file on hand with the ad hoc CHECKSUMS.txt. See the table of combinations below.
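The write-or-append behaviour and the later comparison can be sketched as follows. appendChecksum() and checksumMatches() are hypothetical helpers, and the two-column tab-separated layout ("file", "checksum") is a simplification assumed for illustration; it is not the exact CHECKSUMS.txt format written by reproducible.

```r
# Hypothetical helpers; the tab-separated "file"/"checksum" layout is a
# simplification, not reproducible's actual CHECKSUMS.txt format.
appendChecksum <- function(filePath, checksumsFile) {
  entry <- data.frame(file = basename(filePath),
                      checksum = unname(tools::md5sum(filePath)),
                      stringsAsFactors = FALSE)
  if (file.exists(checksumsFile)) {
    # Append: keep other files' entries, replace this file's entry
    existing <- utils::read.delim(checksumsFile, colClasses = "character")
    entry <- rbind(existing[existing$file != entry$file, , drop = FALSE], entry)
  }
  utils::write.table(entry, checksumsFile, sep = "\t",
                     quote = FALSE, row.names = FALSE)
  invisible(entry)
}

# Compare the file on hand with its recorded checksum
checksumMatches <- function(filePath, checksumsFile) {
  if (!file.exists(checksumsFile)) return(FALSE)
  recorded <- utils::read.delim(checksumsFile, colClasses = "character")
  hit <- recorded$checksum[recorded$file == basename(filePath)]
  length(hit) == 1 && hit == unname(tools::md5sum(filePath))
}

d <- tempfile("dest"); dir.create(d)
ck <- file.path(d, "CHECKSUMS.txt")
f <- file.path(d, "a.txt"); writeLines("hello", f)
g <- file.path(d, "b.txt"); writeLines("world", g)
appendChecksum(f, ck)          # creates CHECKSUMS.txt
appendChecksum(g, ck)          # appends a second entry
before <- checksumMatches(f, ck)
writeLines("changed", f)       # file on hand no longer matches its entry
after <- checksumMatches(f, ck)
```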

archive

Optional character string giving the path of an archive containing targetFile, or a vector giving a set of nested archives (e.g., c("xxx.tar", "inner.zip", "inner.rar")). If there are inner archives but their names are unknown, the function will try each of them until it finds targetFile. See the table of combinations below.
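The "try each inner archive until targetFile is found" idea can be sketched recursively. findInNestedTar() is a hypothetical helper that handles only tar archives via utils::untar(); it is an illustration of the search strategy, not the package's extractFromArchive() logic, which also covers zip and rar.

```r
# Hypothetical helper: search a tar archive, then any inner .tar archives,
# until targetFile is found. Returns the extracted path, or NULL.
findInNestedTar <- function(archive, targetFile, exdir) {
  members <- utils::untar(archive, list = TRUE, tar = "internal")
  if (targetFile %in% members) {
    utils::untar(archive, files = targetFile, exdir = exdir, tar = "internal")
    return(file.path(exdir, targetFile))
  }
  # targetFile not at this level: try each inner archive in turn
  for (inner in members[grepl("\\.tar$", members)]) {
    utils::untar(archive, files = inner, exdir = exdir, tar = "internal")
    found <- findInNestedTar(file.path(exdir, inner), targetFile, exdir)
    if (!is.null(found)) return(found)
  }
  NULL
}

# Build outer.tar containing readme.txt and inner.tar (which holds target.csv)
src <- tempfile("src"); dir.create(src)
old <- setwd(src)
writeLines(c("id,value", "1,10"), "target.csv")
utils::tar("inner.tar", files = "target.csv", tar = "internal")
writeLines("unrelated", "readme.txt")
utils::tar("outer.tar", files = c("readme.txt", "inner.tar"), tar = "internal")
setwd(old)

found <- findInNestedTar(file.path(src, "outer.tar"), "target.csv",
                         exdir = tempfile("out"))
```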

alsoExtract

Optional character string naming files other than targetFile that must be extracted from the archive. If NULL (the default), all files will be extracted. Other options: "similar" will extract all files that have the same file name as targetFile, ignoring the file extension; NA will extract nothing other than targetFile; a character vector of specific file names will cause only those files to be extracted. See the table of combinations below.
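A small pure-function sketch makes the four alsoExtract options concrete. filesToExtract() is hypothetical and only selects names from an archive listing; it follows the option descriptions above, plus the table's note that "similar" cannot be interpreted without a targetFile (in which case all files are extracted).

```r
# Hypothetical helper mirroring the documented alsoExtract options.
filesToExtract <- function(members, targetFile = NULL, alsoExtract = NULL) {
  if (is.null(alsoExtract)) return(members)                 # NULL: everything
  if (length(alsoExtract) == 1 && is.na(alsoExtract))       # NA: targetFile only
    return(intersect(targetFile, members))
  if (identical(alsoExtract, "similar")) {
    if (is.null(targetFile)) return(members)                # no base name to match
    base <- tools::file_path_sans_ext(basename(targetFile))
    return(members[tools::file_path_sans_ext(basename(members)) == base])
  }
  intersect(c(targetFile, alsoExtract), members)            # named files only
}

members <- c("roads.shp", "roads.dbf", "roads.shx", "roads.prj",
             "rivers.shp", "README.txt")
similar    <- filesToExtract(members, "roads.shp", "similar")
onlyTarget <- filesToExtract(members, "roads.shp", NA)
named      <- filesToExtract(members, "roads.shp", "README.txt")
```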

destinationPath

Character string giving the directory in which to download and save the file from url; this is also where the function will look for archive or targetFile. NOTE (still experimental): to prevent repeated downloads in different locations, the user can also set options("reproducible.inputPaths") to one or more local file paths to search for the file before attempting to download. The default for that option is NULL, meaning do not search locally.
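The local search that reproducible.inputPaths enables can be pictured as a lookup that runs before any download. findLocalCopy() is a hypothetical helper; checking destinationPath first and then the inputPaths entries is an ordering assumed here for illustration, not taken from the package.

```r
# Hypothetical sketch: return the first existing copy of filename, or NULL
# to signal that a download is needed. The search order is an assumption.
findLocalCopy <- function(filename, destinationPath,
                          inputPaths = getOption("reproducible.inputPaths")) {
  candidates <- file.path(c(destinationPath, inputPaths), filename)
  hits <- candidates[file.exists(candidates)]
  if (length(hits)) hits[1] else NULL
}

shared <- tempfile("shared"); dir.create(shared)
writeLines("x", file.path(shared, "dem.tif"))

oldOpt <- options(reproducible.inputPaths = shared)
hit  <- findLocalCopy("dem.tif", destinationPath = tempfile("dest"))
miss <- findLocalCopy("missing.tif", destinationPath = tempfile("dest"))
options(oldOpt)   # restore the previous option value
```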

fun

Function or character string indicating the function to use to load targetFile into an R object, e.g., a string that includes the package name, such as "raster::raster". NOTE: passing NULL will skip loading the object into R.
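A "package::function" string has to be turned into a function object before it can load anything. resolveFun() is a hypothetical helper showing one way to do that with base R's getExportedValue() and match.fun(); reproducible does its own parsing internally, so treat this as a sketch.

```r
# Hypothetical helper: turn fun (a function, NULL, or a "pkg::fn" string)
# into a function object.
resolveFun <- function(fun) {
  if (is.null(fun) || is.function(fun)) return(fun)   # NULL means "do not load"
  parts <- strsplit(fun, "::", fixed = TRUE)[[1]]
  if (length(parts) == 2) {
    getExportedValue(parts[1], parts[2])             # e.g. "utils::read.csv"
  } else {
    match.fun(fun)                                   # bare name, e.g. "readRDS"
  }
}

csv <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:2), csv, row.names = FALSE)
loader <- resolveFun("utils::read.csv")
obj <- loader(csv)
```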

dlFun

Optional "download function" name, such as "raster::getData", which does custom downloading, in addition to loading into R. Still experimental.

quick

Logical. This is passed internally to Checksums (as the quickCheck argument) and to Cache (as the quick argument). This results in faster, though less robust, checking of inputs. See the respective functions.
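To see why a quick check is "faster, though less robust", consider comparing file size instead of hashing the contents. fileUnchanged() is hypothetical, and equating "quick" with a size-only comparison is our assumption; see ?Checksums for what quickCheck actually compares.

```r
# Hypothetical sketch: quick = TRUE compares only file size (cheap),
# quick = FALSE hashes the full contents (robust).
fileUnchanged <- function(path, recorded, quick = FALSE) {
  if (quick) {
    file.size(path) == recorded$size
  } else {
    unname(tools::md5sum(path)) == recorded$md5
  }
}

f <- tempfile(fileext = ".txt")
writeLines("abc", f)
recorded <- list(size = file.size(f), md5 = unname(tools::md5sum(f)))

writeLines("xyz", f)   # different content, same size
quickResult <- fileUnchanged(f, recorded, quick = TRUE)
fullResult  <- fileUnchanged(f, recorded, quick = FALSE)
```

Here the quick check wrongly reports the file as unchanged, while the full hash catches the edit.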

overwrite

Logical. Should downloading and all other actions occur even if the files pass the checksums or are already present?

purge

Logical or integer. 0/FALSE (default) keeps the existing CHECKSUMS.txt file, and prepInputs will write or append to it. 1/TRUE will delete the entire CHECKSUMS.txt file. For other options, see Details.
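The two purge values documented here reduce to "keep" versus "delete" for CHECKSUMS.txt. applyPurge() is a hypothetical helper that models only 0/FALSE and 1/TRUE; the other integer options mentioned under Details are not represented.

```r
# Hypothetical helper covering only purge = 0/FALSE and 1/TRUE.
applyPurge <- function(purge, checksumsFile) {
  if (as.integer(purge) == 1L && file.exists(checksumsFile)) {
    file.remove(checksumsFile)       # 1/TRUE: delete the whole file
  }
  invisible(NULL)                    # 0/FALSE: keep it for writing/appending
}

ck <- tempfile("CHECKSUMS", fileext = ".txt")
writeLines("file\tchecksum", ck)
applyPurge(FALSE, ck); keptAfterFalse <- file.exists(ck)
applyPurge(TRUE, ck);  keptAfterTrue  <- file.exists(ck)
```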

useCache

Passed to Cache in various places. Defaults to getOption("reproducible.useCache").

.tempPath

Optional temporary path for intermediate files created in internal steps. It will be cleared on exit from this function (via on.exit).
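The cleanup described for .tempPath follows the standard R on.exit() pattern: the directory is removed when the function returns, and also when it exits with an error. withTempPath() is a hypothetical wrapper used only to demonstrate that pattern.

```r
# Hypothetical wrapper: run work() with a scratch directory that is
# always deleted when the function exits, normally or via an error.
withTempPath <- function(work, .tempPath = tempfile("preProcess_")) {
  dir.create(.tempPath, recursive = TRUE, showWarnings = FALSE)
  on.exit(unlink(.tempPath, recursive = TRUE), add = TRUE)
  work(.tempPath)
}

seen <- NULL
result <- withTempPath(function(tmp) {
  seen <<- tmp
  writeLines("intermediate", file.path(tmp, "step1.txt"))
  file.exists(file.path(tmp, "step1.txt"))
})

# Cleanup also happens when an error is thrown
seen2 <- NULL
try(withTempPath(function(tmp) { seen2 <<- tmp; stop("boom") }), silent = TRUE)
```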

...

Additional arguments passed to fun (i.e., user-supplied), postProcess and Cache. Since ... is passed to postProcess, these arguments will also be passed into its inner functions, e.g., cropInputs. See details and examples.

Value

A list with 5 elements: checkSums (the result of a Checksums call after downloading), dots (the cleaned-up ..., including deprecated argument checks), fun (the function to be used to load the preProcessed object from disk), and targetFilePath (the fully qualified path to the targetFile).
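One way the named elements fit together is that fun is applied to targetFilePath, with dots supplying extra arguments. The list below is a hand-built mock containing only the four elements named above, not output from preProcess(), and calling fun this way is our illustration rather than the package's own loading code.

```r
# Mock of a preProcess-style return value (hand-built, hypothetical)
csv <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2), csv, row.names = FALSE)

out <- list(checkSums      = data.frame(),     # placeholder
            dots           = list(),           # no extra arguments here
            fun            = utils::read.csv,
            targetFilePath = csv)

# Load the object: call fun on targetFilePath, forwarding any dots
obj <- do.call(out$fun, c(list(out$targetFilePath), out$dots))
```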

Combinations of targetFile, url, archive, alsoExtract

| # Params | url | targetFile | archive | alsoExtract | Result | Checksum 1st time | Checksum 2nd time |
|---|---|---|---|---|---|---|---|
| 1 | char | NULL | NULL | NULL | Download, extract all files if an archive, guess at targetFile, load into R | Write or append all new files | Same as 1st; no targetFile* |
| 1 | NULL | char | NULL | NULL | Load targetFile into R | Write or append targetFile | No downloading, so checksums not used |
| 1 | NULL | NULL | char | NULL | Extract all files, guess at targetFile, load into R | Write or append all new files | No downloading, so checksums not used |
| 1 | NULL | NULL | NULL | char | Guess at targetFile from files in alsoExtract, load into R | Write or append all new files | No downloading, so checksums not used |
| 2 | char | char | NULL | NULL | Download, extract all files if an archive, load targetFile into R | Write or append all new files | Use Checksums, skip downloading |
| 2 | char | NULL | char | NULL | Download, extract all files, guess at targetFile, load into R | Write or append all new files | Same as 1st; no targetFile* |
| 2 | char | NULL | NULL | char | Download, extract only named files in alsoExtract, guess at targetFile, load into R | Write or append all new files | Same as 1st; no targetFile* |
| 2 | NULL | char | NULL | char | Load targetFile into R | Write or append all new files | No downloading, so checksums not used |
| 2 | NULL | char | char | NULL | Extract all files, load targetFile into R | Write or append all new files | No downloading, so checksums not used |
| 2 | NULL | NULL | char | char | Extract only named files in alsoExtract, guess at targetFile, load into R | Write or append all new files | No downloading, so checksums not used |
| 3 | char | char | char | NULL | Download, extract all files, load targetFile into R | Write or append all new files | Use Checksums, skip downloading |
| 3 | char | NULL | char | char | Download, extract files named in alsoExtract, guess at targetFile, load into R | Write or append all new files | Use Checksums, skip downloading |
| 3 | char | NULL | char | "similar" | Download, extract all files (can't understand "similar"), guess at targetFile, load into R | Write or append all new files | Same as 1st; no targetFile* |
| 3 | char | char | NULL | char | Download, if an archive, extract files named in targetFile and alsoExtract, load targetFile into R | Write or append all new files | Use Checksums, skip downloading |
| 3 | char | char | NULL | "similar" | Download, if an archive, extract files with same base as targetFile, load targetFile into R | Write or append all new files | Use Checksums, skip downloading |
| 3 | char | char | char | NULL | Download, extract all files from archive, load targetFile into R | Write or append all new files | Use Checksums, skip downloading |
| 3 | NULL | char | char | char | Extract files named in alsoExtract from archive, load targetFile into R | Write or append all new files | No downloading, so checksums not used |
| 4 | char | char | char | char | Download, extract files named in targetFile and alsoExtract, load targetFile into R | Write or append all new files | Use Checksums, skip downloading |
| 4 | char | char | char | "similar" | Download, extract all files with same base as targetFile, load targetFile into R | Write or append all new files | Use Checksums, skip downloading |

* If the url points to a file on Google Drive, checksumming will work even without a targetFile specified, because there is an initial attempt to get the remote file information (e.g., the file name). With that, the url can be linked to the file name used in the CHECKSUMS.txt file.
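The "Checksum 2nd time" column of the combinations table follows a compact rule: without a url nothing is downloaded; with a url, a second call can rely on Checksums and skip downloading if targetFile is given, or if both archive and a non-"similar" alsoExtract are given. secondCall() is a hypothetical helper derived by reading the table, not a function in reproducible, and it does not model the Google Drive exception described in the footnote.

```r
# Hypothetical helper encoding the "Checksum 2nd time" column of the table.
secondCall <- function(url = NULL, targetFile = NULL,
                       archive = NULL, alsoExtract = NULL) {
  if (is.null(url)) return("no download")            # checksums not used
  if (!is.null(targetFile)) return("skip download")  # entry can be looked up
  if (!is.null(archive) && !is.null(alsoExtract) &&
      !identical(alsoExtract, "similar")) return("skip download")
  "same as 1st"                                      # no targetFile to match
}

secondCall(url = "u", archive = "a")                        # cannot skip
secondCall(url = "u", archive = "a", alsoExtract = "e")     # can skip
```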