A helper function for get_robotstxt() that extracts the robots.txt file from the result object of an HTTP request. Furthermore, it informs get_robotstxt() whether the request should be cached and which problems occurred.
rt_request_handler(
  request,
  on_server_error = on_server_error_default,
  on_client_error = on_client_error_default,
  on_not_found = on_not_found_default,
  on_redirect = on_redirect_default,
  on_domain_change = on_domain_change_default,
  on_sub_domain_change = on_sub_domain_change_default,
  on_file_type_mismatch = on_file_type_mismatch_default,
  on_suspect_content = on_suspect_content_default,
  warn = TRUE,
  encoding = "UTF-8"
)

on_server_error_default: an object of class list of length 4
on_client_error_default: an object of class list of length 4
on_not_found_default: an object of class list of length 4
on_redirect_default: an object of class list of length 2
on_domain_change_default: an object of class list of length 3
on_sub_domain_change_default: an object of class list of length 2
on_file_type_mismatch_default: an object of class list of length 4
on_suspect_content_default: an object of class list of length 4

a list with three items following this schema:

list(
  rtxt = "",
  problems = list(
    "redirect" = list(status_code = 301),
    "domain"   = list(from_url = "...", to_url = "...")
  )
)
result of an HTTP request (e.g. httr::GET())
request state handler for any 5xx status
request state handler for any 4xx HTTP status that is not 404
request state handler for HTTP status 404
request state handler for any 3xx HTTP status
request state handler for any 3xx HTTP status where the domain changed as well
request state handler for any 3xx HTTP status where the domain changed, but only to a www subdomain
request state handler for content type other than 'text/plain'
request state handler for content that appears to be something other than a robots.txt file (usually JSON, XML, or HTML)
whether to emit warnings about problems encountered (set to FALSE to suppress them)
The text encoding to assume if no encoding is provided in the headers of the response
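The flow described above can be sketched as follows. This is a minimal usage sketch, assuming the httr and robotstxt packages are installed; rt_request_handler() may not be exported, in which case it has to be reached with the ::: operator. The URL is purely illustrative.

```r
# A minimal sketch: fetch a robots.txt file and hand the raw response
# object to the handler, which extracts the text and collects problems.
library(httr)
library(robotstxt)

# The handler expects the result of an HTTP request, e.g. httr::GET().
response <- httr::GET("https://example.com/robots.txt")

# Run the handler with the default per-status handlers; warn = FALSE
# suppresses warnings about problems encountered.
res <- robotstxt:::rt_request_handler(
  request = response,
  warn    = FALSE
)

# res$rtxt holds the extracted robots.txt text; res$problems records
# any issues (redirects, domain changes, suspect content, ...).
cat(res$rtxt)
str(res$problems)
```

Custom handlers can be passed in place of any of the *_default objects to change how a given request state (server error, redirect, file type mismatch, ...) is treated.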