Create a list of all the files (in all subfolders) of an FTP server.
Defaults to the German Weather Service (DWD, Deutscher WetterDienst) OpenData server at
https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/.
The R package RCurl must be available to do this.
It is not recommended to run this for all folders, as it can take quite some time
and you may get kicked off the FTP server. This package contains an index
of the climatic observations at weather stations (fileIndex)
and gridded datasets (gridIndex).
If they are out of date, please let me know!
Getting banned from the FTP Server
Normally, this shouldn't happen anymore: since version 0.10.10 (2018-11-26),
a single RCurl handle is used for all FTP requests, and since version 1.0.17 (2019-05-14),
the file tree provided by the DWD is used to obtain all folders first,
eliminating the recursive calls.
There's a provision in case the FTP server detects bot requests and denies access.
If RCurl::getURL() fails, there will still be an output
which you can pass in a second run via folder to extract the remaining dirs.
You might need to wait a bit and set sleep to a higher value in that case.
Here's an example:
gridindex <- indexFTP("", gridbase)
gridindex <- indexFTP(gridindex, gridbase, sleep=15)
Of course, with a higher sleep value, the execution will take longer!
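The pause controlled by sleep is described in the argument documentation as a random number of seconds between 0 and sleep, passed to Sys.sleep() after each folder is read. A minimal sketch of that idea follows. The helper name polite_pause is hypothetical and not part of rdwd, and the package's actual implementation may differ.

```r
# Hypothetical helper for illustration; not an rdwd function.
# Waits a random number of seconds between 0 and 'sleep'
# and invisibly returns the waiting time that was used.
polite_pause <- function(sleep = 0) {
  if (sleep != 0) {
    pause <- stats::runif(1, min = 0, max = sleep)
    Sys.sleep(pause)
    return(invisible(pause))
  }
  invisible(0)
}

# With sleep = 15, each folder read would be followed by a pause of up to
# 15 seconds, so the average delay per folder is about 7.5 seconds.
waited <- polite_pause(0.01)
```

Randomizing the pause, rather than always waiting the full sleep duration, keeps the total run time lower while still spacing out the requests.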
indexFTP(
folder = "currentfindex",
base = dwdbase,
is.file.if.has.dot = TRUE,
exclude.latest.bin = TRUE,
fast = TRUE,
sleep = 0,
dir = "DWDdata",
filename = folder[1],
overwrite = FALSE,
quiet = rdwdquiet(),
progbar = !quiet,
verbose = FALSE
)
Returns a vector with file paths.
folder: Folder(s) to be indexed recursively, e.g. "/hourly/wind/". Leading slashes will be removed. Use folder="" to search at the location of base itself. If folder is "currentfindex" (the default) and base is the default, folder is changed to all observational folders listed in the current tree file at https://opendata.dwd.de/weather/tree.html. With "currentgindex" and gridbase, the grid folders in the tree are used. DEFAULT: "currentfindex"
base: Main directory of FTP server. Trailing slashes will be removed. DEFAULT: dwdbase
is.file.if.has.dot: Logical: if some of the input paths contain a dot, treat those as files, i.e. do not try to read them as if they were folders. Only set this to FALSE if you know what you're doing. DEFAULT: TRUE
exclude.latest.bin: Exclude latest file at opendata.dwd.de/weather/radar/radolan? RCurl::getURL indicates this is a pointer to the last regularly named file. DEFAULT: TRUE
fast: Read tree file with data.table::fread() (1 sec) instead of readLines() (10 secs)? DEFAULT: TRUE
sleep: If not 0, a random number of seconds between 0 and sleep is passed to Sys.sleep() after each read folder to avoid getting kicked off the FTP server, see note above. DEFAULT: 0
dir: Writeable directory name where to save the downloaded file. Created if not existent. DEFAULT: "DWDdata" at current getwd()
filename: Character: Part of output filename. "INDEX_of_DWD_" is prepended, "/" replaced with "_", ".txt" appended. DEFAULT: folder[1]
overwrite: Logical: Overwrite existing file? If not, "_n" is added to the filename, see berryFunctions::newFilename(). DEFAULT: FALSE
quiet: Suppress progbars and message about directory/files? DEFAULT: FALSE through rdwdquiet()
progbar: Logical: present a progress bar in each level? DEFAULT: TRUE
verbose: Logical: write a lot of messages from RCurl::getURL()? DEFAULT: FALSE (usually, you don't need all the curl information)
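The path handling described for the arguments can be sketched in base R as follows. All helper names here are hypothetical, and the example.org URL is a placeholder. This is a rough illustration of the documented behavior, not the code rdwd actually uses.

```r
# Hypothetical helpers for illustration; these are not rdwd functions.

# Remove leading slashes from folders and trailing slashes from the base URL.
clean_folder <- function(folder) sub("^/+", "", folder)
clean_base   <- function(base)   sub("/+$", "", base)

# Build the index file name from the 'filename' argument:
# prepend "INDEX_of_DWD_", replace "/" with "_", append ".txt".
index_filename <- function(filename) {
  paste0("INDEX_of_DWD_", gsub("/", "_", filename, fixed = TRUE), ".txt")
}

# Paths containing a dot are treated as files rather than folders.
looks_like_file <- function(path) grepl(".", path, fixed = TRUE)

clean_folder("/hourly/wind/")            # "hourly/wind/"
clean_base("https://example.org/data/")  # "https://example.org/data"
index_filename("daily/solar")            # "INDEX_of_DWD_daily_solar.txt"
looks_like_file(c("daily/solar/", "daily/solar/stations.txt"))  # FALSE TRUE
```

The dot check is a simple heuristic, which is presumably why the documentation advises setting is.file.if.has.dot to FALSE only with care: a folder name containing a dot would be misclassified as a file.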
Berry Boessenkool, berry-b@gmx.de, Oct 2016
createIndex(), updateIndexes(), website index chapter
if (FALSE) { ## Needs internet connection
sol <- indexFTP(folder="/daily/solar", dir=tempdir())
head(sol)
# mon <- indexFTP(folder="/monthly/kl", dir=tempdir(), verbose=TRUE)
}