the full path to the CLF-formatted file you want to read.
has_header
whether or not the file has a header row. Set to FALSE by
default.
Value
a data.frame consisting of seven fields, as described under Details, with
normalised timestamps.
Details
The Common Log Format (CLF) is a standardised format for web request logs. It
consists of the fields:
ip_address: the IP address of the remote host that made the request. The CLF
does not (by default) include the de-facto standard X-Forwarded-For header.
remote_user_ident: the RFC 1413 remote user identifier.
local_user_ident: the identifier the user has authenticated with locally.
timestamp: the timestamp associated with the request, stored as
"[08/Apr/2001:17:39:04 -0800]", where "-0800" represents the time offset (minus
eight hours) of the timestamp from UTC.
request: the actual user request, containing the HTTP method used, the
asset requested, and the HTTP Protocol version used.
status_code: the HTTP status code returned.
bytes_sent: the number of bytes sent.
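
The field layout above can be illustrated with a short base R sketch. This is not how read_clf is implemented; it only shows how a single CLF line breaks down into the seven fields, how the timestamp offset maps to UTC, and what the request field contains. The sample line is made up for illustration (only the timestamp is taken from the description above), and the names line, pattern, parts, ts and request are hypothetical.

```r
# A made-up CLF line; only the timestamp comes from the field description.
line <- '127.0.0.1 - frank [08/Apr/2001:17:39:04 -0800] "GET /index.html HTTP/1.0" 200 2326'

# One capture group per CLF field, in the order the fields are listed.
pattern <- '^([^ ]+) ([^ ]+) ([^ ]+) \\[([^]]+)\\] "([^"]*)" ([^ ]+) ([^ ]+)$'
parts <- regmatches(line, regexec(pattern, line))[[1]][-1]  # drop the full match
names(parts) <- c("ip_address", "remote_user_ident", "local_user_ident",
                  "timestamp", "request", "status_code", "bytes_sent")

# Normalise the timestamp to UTC. The English month abbreviation is swapped
# for its number first, so parsing does not depend on the session locale.
ts_raw <- parts[["timestamp"]]
mon <- substr(ts_raw, 4, 6)
ts_num <- sub(mon, sprintf("%02d", match(mon, month.abb)), ts_raw, fixed = TRUE)
ts <- as.POSIXct(ts_num, format = "%d/%m/%Y:%H:%M:%S %z", tz = "UTC")

# The request field holds the HTTP method, the asset and the protocol version.
request <- strsplit(parts[["request"]], " ", fixed = TRUE)[[1]]
names(request) <- c("method", "asset", "protocol")
```

Here 17:39:04 at an offset of -0800 corresponds to 01:39:04 UTC on the following day. A "-" in a field, as in remote_user_ident in the sample line, conventionally marks a value that is not available.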
While outdated as a standard, systems using the CLF are still around; the Squid caching
system, for example, uses the CLF as one of its default log formats (the other,
the Squid "native" format, can be read with read_squid).
See Also
read_combined for the Combined Log Format, and
split_clf for splitting out the "request" field.