get_robotstxts: function to get multiple robots.txt files
get_robotstxts(
  domain,
  warn = TRUE,
  force = FALSE,
  user_agent = utils::sessionInfo()$R.version$version.string,
  ssl_verifypeer = c(1, 0),
  use_futures = FALSE,
  verbose = FALSE,
  rt_request_handler = robotstxt::rt_request_handler,
  rt_robotstxt_http_getter = robotstxt::get_robotstxt_http_get,
  on_server_error = on_server_error_default,
  on_client_error = on_client_error_default,
  on_not_found = on_not_found_default,
  on_redirect = on_redirect_default,
  on_domain_change = on_domain_change_default,
  on_file_type_mismatch = on_file_type_mismatch_default,
  on_suspect_content = on_suspect_content_default
)
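A minimal usage sketch (the domain names and the custom user-agent string below are illustrative assumptions, not values from the package):

library(robotstxt)

# illustrative example domains -- replace with the domains you actually need
domains <- c("wikipedia.org", "cran.r-project.org")

# download (or fetch from cache) the robots.txt files for all domains at once
rtxt_list <- get_robotstxts(
  domain     = domains,
  warn       = TRUE,                           # warn if a file cannot be retrieved
  force      = FALSE,                          # reuse cached results where available
  user_agent = "my-crawler (me@example.org)",  # illustrative user-agent string
  verbose    = TRUE                            # print progress information
)

# one result per requested domain
str(rtxt_list, max.level = 1)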
domain: one or more domains from which to download robots.txt files
warn: warn about being unable to download domain/robots.txt, e.g. because of an HTTP response status 404
force: if TRUE, the function will re-download the robots.txt file instead of using possibly cached results
user_agent: HTTP user-agent string to be used to retrieve the robots.txt file from the domain
ssl_verifypeer: either 1 (default) or 0; if 0, SSL peer verification is disabled, which might help with robots.txt file retrieval
use_futures: whether or not future::future_lapply should be used for possible parallel/asynchronous retrieval. Note: see the help pages and vignettes of the future package on how to set up plans for future execution, because the robotstxt package does not do this on its own; a sketch follows after this argument list.
verbose: if TRUE, the function prints out more information
rt_request_handler: handler function that processes the request according to the event handlers specified; see the handler sketch after this argument list
rt_robotstxt_http_getter: function that executes the HTTP request
on_server_error: request state handler for any 5xx HTTP status
on_client_error: request state handler for any 4xx HTTP status that is not 404
on_not_found: request state handler for HTTP status 404
on_redirect: request state handler for any 3xx HTTP status
on_domain_change: request state handler for any 3xx HTTP status where the domain changed as well
on_file_type_mismatch: request state handler for content types other than 'text/plain'
on_suspect_content: request state handler for content that seems to be something other than a robots.txt file (usually JSON, XML, or HTML)
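As noted for use_futures, a plan for parallel execution has to be set up by the user. A minimal sketch using the future package (the worker count and domains are arbitrary choices for illustration):

library(robotstxt)
library(future)

# robotstxt does not set up a plan on its own; do it before the call
plan(multisession, workers = 2)

domains <- c("wikipedia.org", "cran.r-project.org", "r-project.org")

rtxt_list <- get_robotstxts(
  domain      = domains,
  use_futures = TRUE    # retrieval may now run in parallel / asynchronously
)

# switch back to sequential execution when done
plan(sequential)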
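The on_* arguments take handler objects, and the usage above suggests the package ships defaults such as on_not_found_default that can be inspected and passed explicitly. A sketch under that assumption; here the defaults are merely passed through, which should be equivalent to the default behaviour, while a custom handler object would replace them:

library(robotstxt)

# look at one of the shipped default handlers to see which actions it defines
str(on_not_found_default)

# passing the defaults explicitly is equivalent to leaving the arguments unset
rtxt_list <- get_robotstxts(
  domain             = "wikipedia.org",
  on_not_found       = on_not_found_default,
  on_suspect_content = on_suspect_content_default
)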