Usage
getLinkContent(corpus, links = sapply(corpus, meta, "origin"),
  timeout.request = 30, chunksize = 20, verbose = getOption("verbose"),
  curlOpts = curlOptions(verbose = FALSE, followlocation = TRUE,
    maxconnects = 5, maxredirs = 20, timeout = timeout.request,
    connecttimeout = timeout.request, ssl.verifyhost = FALSE,
    ssl.verifypeer = FALSE, useragent = "R", cookiejar = tempfile()),
  retry.empty = 3, sleep.time = 3, extractor = ArticleExtractor,
  .encoding = integer(), ...)
Arguments
corpus
object of class Corpus for which link content should be downloaded
links
character vector specifying links to be used for download, defaults to sapply(corpus, meta, "origin")
timeout.request
timeout (in seconds) to be used for connections/requests, defaults to 30
chunksize
Size of download chunks to be used for parallel retrieval, defaults to 20
verbose
Specifies if retrieval info should be printed, defaults to getOption("verbose")
curlOpts
curl options to be passed to getURL
retry.empty
Specifies number of times empty content sites should be retried, defaults to 3
sleep.time
Sleep time to be used between chunked download, defaults to 3 (seconds)
extractor
Extractor to be used for content extraction, defaults to ArticleExtractor
.encoding
encoding to be used for getURL, defaults to integer() (=autodetect)
...
additional parameters to getURL
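
The interplay of chunksize and sleep.time can be pictured with a small base R sketch. This is only an illustration of chunked retrieval under stated assumptions, not the package's actual implementation: the example.com links are hypothetical placeholders, no download takes place, and sleep.time is set to 0 here (instead of the documented default of 3) so the sketch runs instantly.

```r
# Hypothetical links; in practice these would come from
# sapply(corpus, meta, "origin").
links <- sprintf("http://example.com/article/%d", 1:45)

chunksize  <- 20  # same value as the documented default
sleep.time <- 0   # documented default is 3 seconds; 0 keeps the sketch fast

# Assign each link to a chunk: links 1-20 go to chunk 1,
# links 21-40 to chunk 2, and links 41-45 to chunk 3.
chunk_id <- ceiling(seq_along(links) / chunksize)
chunks   <- split(links, chunk_id)

for (i in seq_along(chunks)) {
  # A real retrieval step would download chunks[[i]] here;
  # this sketch only reports the chunk size.
  message(sprintf("chunk %d: %d links", i, length(chunks[[i]])))
  # Pause between chunks, but not after the last one.
  if (i < length(chunks)) Sys.sleep(sleep.time)
}

chunk_sizes <- lengths(chunks)
```

With the package loaded and a corpus whose documents carry an "origin" meta entry, the corresponding call would be along the lines of getLinkContent(corpus, chunksize = 20, sleep.time = 3).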