Description

Downloading documents asynchronously involves some trade-offs. Switching between the different streams and detecting when input is available on any of them involves a little more processing, and so increases the consumption of CPU cycles. On the other hand, there is a potentially large saving in total download time.
getURI calls this function if more than one URI is specified and async is TRUE, the default in this case.
One can also download the contents of the multiple URIs serially, i.e. one after the other, by calling getURI with a value of FALSE for async.
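For example, a minimal sketch contrasting the two modes (using the omegahat.org pages from the examples below):

library(RCurl)

uris = c("http://www.omegahat.org/RCurl/index.html",
         "http://www.omegahat.org/RCurl/philosophy.xml")

# Asynchronous: getURI dispatches to getURIAsynchronous and the
# downloads are interleaved within a single call.
txt = getURI(uris, async = TRUE)

# Serial: each URI is fetched to completion before the next begins.
txt = getURI(uris, async = FALSE)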
Usage

getURIAsynchronous(url, ..., .opts = list(), write = NULL,
curl = getCurlHandle(),
multiHandle = getCurlMultiHandle(), perform = Inf,
.encoding = integer(), binary = rep(NA, length(url)))
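As an illustration of supplying curl options, here is a minimal sketch; followlocation and verbose are standard libcurl option names, chosen purely for illustration. Values in .opts are merged with named values in ..., as described under Arguments below:

library(RCurl)

uris = c("http://www.omegahat.org/RCurl/index.html",
         "http://www.omegahat.org/RCurl/philosophy.xml")

# Options in .opts are merged with named options in ... to form the
# final option set for each request's curl handle.
txt = getURIAsynchronous(uris,
                         .opts = list(followlocation = TRUE),
                         verbose = FALSE)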
Arguments

...
named arguments that are passed to curlSetOpt when creating each of the different curlHandle objects.

.opts
a CURLOptions object identifying the curl options for the handle. This is merged with the values of ... to create the actual options for the curl handle in the request.

perform
the number of calls to curlMultiPerform that are to be made in this function call. This is typically either 0 for no calls or Inf, meaning process the requests until they complete.

Value

If the requests are not performed or completed (i.e. perform is zero or too small a value to process all the chunks), a list with 2 elements is returned. These elements are:
multiHandle
an object of class MultiCURLHandle-class. This can be used in further calls to curlMultiPerform.

write
the value of the write argument (after it was potentially expanded to a list). This can then be used to fetch the results of the requests when the requests are completed in the future.
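A minimal sketch of this deferred form, under two assumptions: that the default write functions are used, so each element of write is a text gatherer whose value() function returns the accumulated content, and that RCurl's complete() (equivalently, repeated calls to curlMultiPerform) drives the transfers:

library(RCurl)

uris = c("http://www.omegahat.org/RCurl/index.html",
         "http://www.omegahat.org/RCurl/philosophy.xml")

# perform = 0 sets up the requests but does not process them.
z = getURIAsynchronous(uris, perform = 0)

# Process all pending requests on the multi handle to completion.
complete(z$multiHandle)

# Retrieve the accumulated text from each write function.
txt = sapply(z$write, function(w) w$value())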
See Also

curlMultiPerform and the multi/asynchronous interface for libcurl.

getURL, getCurlMultiHandle, curlMultiPerform
Examples

uris = c("http://www.omegahat.org/RCurl/index.html",
         "http://www.omegahat.org/RCurl/philosophy.xml")

# Fetch both documents concurrently, one element of text per URI.
txt = getURIAsynchronous(uris)
names(txt)
nchar(txt)