This does downloading (via downloadFile), checksumming (Checksums), and extracting from archives (extractFromArchive), plus cleaning up of input arguments (e.g., paths, function names). This is the first stage of three used in prepInputs.
preProcess(
targetFile = NULL,
url = NULL,
archive = NULL,
alsoExtract = NULL,
destinationPath = getOption("reproducible.destinationPath", "."),
fun = NULL,
dlFun = NULL,
quick = getOption("reproducible.quick"),
overwrite = getOption("reproducible.overwrite", FALSE),
purge = FALSE,
useCache = getOption("reproducible.useCache", FALSE),
.tempPath,
...
)
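As a rough usage sketch, the call below points preProcess at a file that already exists locally, so no download is attempted. The file name demo.csv and the directory name are made up for illustration, and the block only calls the function if the reproducible package is installed. The exact messages and the full contents of the returned list may differ between package versions.

```r
# Hypothetical local file used only for this illustration
destDir <- file.path(tempdir(), "preProcessDemo")
dir.create(destDir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(destDir, "demo.csv"), row.names = FALSE)

out <- NULL
if (requireNamespace("reproducible", quietly = TRUE)) {
  # No url is given, so the file on disk is used and its checksum is recorded
  out <- reproducible::preProcess(
    targetFile = "demo.csv",
    destinationPath = destDir
  )
  print(names(out))          # element names of the returned list
  print(out$targetFilePath)  # fully qualified path to the target file
}
```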
targetFile
Character string giving the path to the eventual file (raster, shapefile, csv, etc.) after downloading and extracting from a zip or tar archive. This is the file before it is passed to postProcess. Currently, the internal checksumming does not checksum the file after it is postProcessed (e.g., cropped/reprojected/masked). Using Cache around prepInputs will do a sufficient job in these cases. See table in preProcess.
url
Optional character string indicating the URL to download from. If not specified, then no download will be attempted. If no entry exists in the CHECKSUMS.txt (in destinationPath), an entry will be created or appended to it. This CHECKSUMS.txt entry will be used in subsequent calls to prepInputs or preProcess, comparing the file on hand with the ad hoc CHECKSUMS.txt. See table in preProcess.
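The idea of comparing the file on hand with a recorded checksum can be sketched in base R. This is only an illustration: the record file layout, the helper name needsDownload, and the use of MD5 via tools::md5sum are assumptions made for the example, not the format or algorithm the package uses for CHECKSUMS.txt.

```r
# Set up a small file and record its checksum (hypothetical layout)
demoDir <- tempfile("checksumDemo")
dir.create(demoDir)
dataFile <- file.path(demoDir, "data.csv")
writeLines(c("x,y", "1,2"), dataFile)

record <- data.frame(
  file = basename(dataFile),
  checksum = unname(tools::md5sum(dataFile)),
  stringsAsFactors = FALSE
)
recordFile <- file.path(demoDir, "checksums-demo.txt")
write.table(record, recordFile, row.names = FALSE, quote = FALSE, sep = "\t")

# Hypothetical helper: TRUE when the file is missing or does not match the record
needsDownload <- function(path, recordFile) {
  if (!file.exists(path) || !file.exists(recordFile)) {
    return(TRUE)
  }
  rec <- read.table(recordFile, header = TRUE, sep = "\t",
                    colClasses = "character")
  expected <- rec$checksum[rec$file == basename(path)]
  length(expected) != 1 || !identical(unname(tools::md5sum(path)), expected)
}

unchanged <- needsDownload(dataFile, recordFile)  # file matches its record
```

On a later call, a matching checksum means the download step can be skipped, while a missing or altered file triggers it again.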
archive
Optional character string giving the path of an archive containing targetFile, or a vector giving a set of nested archives (e.g., c("xxx.tar", "inner.zip", "inner.rar")). If there are inner archives but their names are unknown, the function will try all of them until it finds the targetFile. See table in preProcess.
alsoExtract
Optional character string naming files other than targetFile that must be extracted from the archive. If NULL, the default, then it will extract all files. Other options: "similar" will extract all files with the same filename without file extension as targetFile. NA will extract nothing other than targetFile. A character string of specific file names will cause only those to be extracted. See table in preProcess.
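The four alsoExtract behaviours described above can be mimicked with a small base R function. The helper selectFiles and the file names are hypothetical and only illustrate the documented semantics; this is not the package's internal code.

```r
# Hypothetical helper: which archive members would be extracted?
selectFiles <- function(archiveFiles, targetFile, alsoExtract = NULL) {
  if (is.null(alsoExtract)) {
    return(archiveFiles)  # NULL (default): extract everything
  }
  if (length(alsoExtract) == 1 && is.na(alsoExtract)) {
    return(intersect(archiveFiles, targetFile))  # NA: only targetFile
  }
  if (identical(alsoExtract, "similar")) {
    # "similar": same file name once the extension is removed
    base <- tools::file_path_sans_ext(basename(targetFile))
    keep <- tools::file_path_sans_ext(basename(archiveFiles)) == base
    return(archiveFiles[keep])
  }
  # Specific names: targetFile plus the named files only
  intersect(archiveFiles, union(targetFile, alsoExtract))
}

members <- c("roads.shp", "roads.dbf", "roads.shx", "rivers.shp", "readme.txt")
similarFiles <- selectFiles(members, "roads.shp", alsoExtract = "similar")
```

The "similar" option is convenient for shapefiles, where the .shp file is unusable without its companion files.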
destinationPath
Character string naming the directory in which to download and save the file that comes from url; this is also where the function will look for archive or targetFile. NOTE (still experimental): To prevent repeated downloads in different locations, the user can also set options("reproducible.inputPaths") to one or more local file paths to search for the file before attempting to download. The default for that option is NULL, meaning do not search locally.
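The "search locally before downloading" idea can be sketched as follows. The helper findLocalCopy and the file name dem.tif are hypothetical; the sketch only assumes that the option holds a character vector of directories, as described above, and does not reproduce how the package performs the search.

```r
# Hypothetical helper: return the first local copy found, or NULL
findLocalCopy <- function(fileName,
                          searchPaths = getOption("reproducible.inputPaths")) {
  if (is.null(searchPaths)) {
    return(NULL)  # default option value: do not search locally
  }
  candidates <- file.path(searchPaths, fileName)
  hits <- candidates[file.exists(candidates)]
  if (length(hits) > 0) hits[1] else NULL
}

# Two candidate directories; only the second one holds the file
pathA <- tempfile("inputsA")
pathB <- tempfile("inputsB")
dir.create(pathA)
dir.create(pathB)
writeLines("placeholder", file.path(pathB, "dem.tif"))

found <- findLocalCopy("dem.tif", searchPaths = c(pathA, pathB))
```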
fun
Function or character string indicating the function to use to load targetFile into an R object, e.g., in the form that includes the package name: "raster::raster". NOTE: passing NULL will skip loading the object into R.
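One way a value such as "raster::raster" could be turned into a callable function is sketched below in base R. The helper resolveFun is hypothetical and is not how the package necessarily resolves fun; utils::read.csv stands in for a loader so that the example does not depend on extra packages.

```r
# Hypothetical helper: accept a function, a "pkg::name" string, a bare name, or NULL
resolveFun <- function(fun) {
  if (is.null(fun) || is.function(fun)) {
    return(fun)  # NULL means "do not load"; functions pass through unchanged
  }
  parts <- strsplit(fun, "::", fixed = TRUE)[[1]]
  if (length(parts) == 2) {
    getExportedValue(parts[1], parts[2])  # look up an exported object by name
  } else {
    get(fun, mode = "function")           # bare name found on the search path
  }
}

loader <- resolveFun("utils::read.csv")
csvFile <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), csvFile)
loaded <- loader(csvFile)
```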
dlFun
Optional "download function" name, such as "raster::getData", which does custom downloading, in addition to loading into R. Still experimental.
overwrite
Logical. Should downloading and all the other actions occur even if the files pass the checksums or are all already present?
purge
Logical or Integer. 0/FALSE (default) keeps the existing CHECKSUMS.txt file, and prepInputs will write or append to it. 1/TRUE will delete the entire CHECKSUMS.txt file. For other options, see details.
useCache
Passed to Cache in various places. Defaults to getOption("reproducible.useCache").
.tempPath
Optional temporary path for intermediate files created internally. Will be cleared on.exit from this function.
...
Additional arguments passed to fun (i.e., user supplied), postProcess and Cache. Since ... is passed to postProcess, these arguments will also be passed into the inner functions, e.g., cropInputs. See details and examples.
A list with 5 elements, including checkSums (the result of a Checksums call after downloading), dots (cleaned up ..., including deprecated argument checks), fun (the function to be used to load the preProcessed object from disk), and targetFilePath (the fully qualified path to the targetFile).
# Params | url | targetFile | archive | alsoExtract | Result | Checksum 1st time | Checksum 2nd time |
------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
1 | char | NULL | NULL | NULL | Download, extract all files if an archive, guess at targetFile, load into R | write or append all new files | same as 1st -- no targetFile * |
1 | NULL | char | NULL | NULL | load targetFile into R | write or append targetFile | no downloading, so no checksums used |
1 | NULL | NULL | char | NULL | extract all files, guess at targetFile, load into R | write or append all new files | no downloading, so no checksums used |
1 | NULL | NULL | NULL | char | guess at targetFile from files in alsoExtract, load into R | write or append all new files | no downloading, so no checksums used |
2 | char | char | NULL | NULL | Download, extract all files if an archive, load targetFile into R | write or append all new files | use Checksums, skip downloading |
2 | char | NULL | char | NULL | Download, extract all files, guess at targetFile, load into R | write or append all new files | same as 1st -- no targetFile * |
2 | char | NULL | NULL | char | Download, extract only named files in alsoExtract, guess at targetFile, load into R | write or append all new files | same as 1st -- no targetFile * |
2 | NULL | char | NULL | char | load targetFile into R | write or append all new files | no downloading, so no checksums used |
2 | NULL | char | char | NULL | Extract all files, load targetFile into R | write or append all new files | no downloading, so no checksums used |
2 | NULL | NULL | char | char | Extract only named files in alsoExtract, guess at targetFile, load into R | write or append all new files | no downloading, so no checksums used |
3 | char | char | char | NULL | Download, extract all files, load targetFile into R | write or append all new files | use Checksums, skip downloading |
3 | char | NULL | char | char | Download, extract files named in alsoExtract, guess at targetFile, load into R | write or append all new files | use Checksums, skip downloading |
3 | char | NULL | char | "similar" | Download, extract all files (can't understand "similar"), guess at targetFile, load into R | write or append all new files | same as 1st -- no targetFile * |
3 | char | char | NULL | char | Download, if an archive, extract files named in targetFile and alsoExtract, load targetFile into R | write or append all new files | use Checksums, skip downloading |
3 | char | char | NULL | "similar" | Download, if an archive, extract files with same base as targetFile, load targetFile into R | write or append all new files | use Checksums, skip downloading |
3 | char | char | char | NULL | Download, extract all files from archive, load targetFile into R | write or append all new files | use Checksums, skip downloading |
3 | NULL | char | char | char | Extract files named in alsoExtract from archive, load targetFile into R | write or append all new files | no downloading, so no checksums used |
4 | char | char | char | char | Download, extract files named in targetFile and alsoExtract, load targetFile into R | write or append all new files | use Checksums, skip downloading |
4 | char | char | char | "similar" | Download, extract all files with same base as targetFile, load targetFile into R | write or append all new files | use Checksums, skip downloading |
* If the url is a file on Google Drive, checksumming will work even without a targetFile specified because there is an initial attempt to get the remote file information (e.g., file name). With that, the connection between the url and the filename used in the CHECKSUMS.txt file can be made.
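The Result column of the table follows a coarse pattern that can be written down as a small function. This is a simplified reading of the table, not the package's logic: the helper planSteps is hypothetical, and it ignores the details of which files are extracted (the role of alsoExtract) as well as the checksum columns.

```r
# Hypothetical helper: coarse sequence of steps implied by which arguments are supplied
planSteps <- function(url = NULL, targetFile = NULL,
                      archive = NULL, alsoExtract = NULL) {
  steps <- character()
  if (!is.null(url)) {
    steps <- c(steps, "download")
  }
  if (!is.null(archive)) {
    steps <- c(steps, "extract")
  } else if (!is.null(url)) {
    steps <- c(steps, "extract if an archive")
  }
  if (is.null(targetFile)) {
    steps <- c(steps, "guess at targetFile")
  }
  c(steps, "load into R")
}

urlOnly <- planSteps(url = "https://example.com/data.zip")
localOnly <- planSteps(targetFile = "data.csv")
```

Here the URL is a placeholder, and alsoExtract is accepted only so that the signature mirrors the four table columns.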