Downloading documents asynchronously involves some trade-offs. Switching between the different streams and detecting when input is available on any of them involves a little more processing, and so increases the consumption of CPU cycles. On the other hand, there is a potentially large saving in the total time to download. See http://www.omegahat.net/RCurl/concurrent.xml for more details. This is a common trade-off in concurrent/parallel/asynchronous computing.
getURI calls this function if more than one URI is specified and async is TRUE, the default in this case.
One can also download the contents of multiple URIs serially, i.e. one after the other, by calling getURI with async set to FALSE.
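The trade-off described above can be observed by timing both modes. The following is a minimal sketch, assuming the RCurl package is installed and the example URIs are reachable; `time_download` is a hypothetical helper introduced only for this illustration, and the timings will vary with network conditions.

```r
library(RCurl)

uris <- c("http://www.omegahat.net/RCurl/index.html",
          "http://www.omegahat.net/RCurl/philosophy.xml")

# Hypothetical helper: elapsed seconds for one download strategy.
time_download <- function(uris, async) {
  unname(system.time(getURI(uris, async = async))["elapsed"])
}

# Only touch the network when the first URI is reachable.
if (url.exists(uris[1])) {
  elapsed <- c(async  = time_download(uris, TRUE),
               serial = time_download(uris, FALSE))
  print(elapsed)
}
```

With only two small documents the difference may be negligible; the saving is expected to grow with the number of URIs and the latency of each server.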
getURIAsynchronous(url, ..., .opts = list(), write = NULL,
                   curl = getCurlHandle(),
                   multiHandle = getCurlMultiHandle(), perform = Inf,
                   .encoding = integer(),
                   binary = rep(NA, length(url)))
...: arguments passed to curlSetOpt when creating each of the different curlHandle objects.
.opts: a list or CURLOptions object identifying the curl options for the handle. This is merged with the values of ... to create the actual options for the curl handle in the request.
perform: the number of calls to curlMultiPerform that are to be made in this function call. This is typically either 0 for no calls, or Inf, meaning process the requests until completion. One may find alternative values useful, such as 1 to ensure that the requests are dispatched.
.encoding: the encoding of the content in the HTTP response, specified symbolically as CE_UTF8 or CE_LATIN1.
Note that, by default, the package attempts to process the header of
the HTTP response to determine the encoding. This argument is used
when such information is erroneous and the caller knows the correct
encoding.
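As an illustration of how options reach each request, here is a minimal sketch, assuming RCurl is installed and the example URIs are reachable. The option names `timeout`, `followlocation` and `useragent` are standard curl options; the values and the `shared_opts` name are chosen purely for this example.

```r
library(RCurl)

uris <- c("http://www.omegahat.net/RCurl/index.html",
          "http://www.omegahat.net/RCurl/philosophy.xml")

# Options collected in a list and supplied via .opts.
shared_opts <- list(timeout = 30, followlocation = TRUE)

if (url.exists(uris[1])) {
  # useragent travels through ... and is merged with shared_opts
  # when the options for each curl handle are set.
  txt <- getURIAsynchronous(uris, useragent = "RCurl-example",
                            .opts = shared_opts)
  print(nchar(txt))
}
```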
If the requests are not completed in this call (i.e. perform is zero or too small a value to process all the chunks), a list with 2 elements is returned. These elements are:
multiHandle: the multi handle, of class MultiCURLHandle-class. This can be used in further calls to curlMultiPerform.
write: the write argument (after it was potentially expanded to a list). This can then be used to fetch the results of the requests when the requests are completed in the future.
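The deferred case might be used as follows. This is a sketch under stated assumptions rather than a definitive recipe: it assumes the two returned elements are named `multiHandle` and `write`, that `complete()` drives a multi handle until all transfers finish, and that each default reader in `write` exposes a `value()` function returning the accumulated content.

```r
library(RCurl)

uris <- c("http://www.omegahat.net/RCurl/index.html",
          "http://www.omegahat.net/RCurl/philosophy.xml")

if (url.exists(uris[1])) {
  # perform = 0: set up the requests without calling curlMultiPerform.
  pending <- getURIAsynchronous(uris, perform = 0)

  # ... other work can be done here ...

  # Assumption: complete() processes the requests until all are done.
  complete(pending$multiHandle)

  # Assumption: each reader provides value() to retrieve its content.
  txt <- sapply(pending$write, function(w) w$value())
  print(nchar(txt))
}
```

If a custom `write` argument was supplied, the way results are retrieved depends on that object rather than on `value()`.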
This function uses curlMultiPerform and the multi/asynchronous interface for libcurl.
See also: getURL, getCurlMultiHandle, curlMultiPerform
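library(RCurl)

# Two documents to fetch concurrently.
uris <- c("http://www.omegahat.net/RCurl/index.html",
          "http://www.omegahat.net/RCurl/philosophy.xml")
txt <- getURIAsynchronous(uris)

names(txt)  # inspect the names of the result
nchar(txt)  # number of characters in each downloaded document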