Squid's default log formats are either the Common Log Format (CLF), which you
can read with read_clf, or the "native" Squid format, which is described in more
detail below. read_squid reads the latter.
Usage
read_squid(file, has_header = FALSE)
Arguments
file
the full path to the Squid-formatted file you want to read.
has_header
whether or not the file has a header row. Set to FALSE by
default.
Details
The log format for Squid servers can be custom-set, but by default follows one of two
patterns; it's either the Common Log Format (CLF), which you can read in with
read_clf, or the "native log format", a Squid-specific format handled
by this function. It consists of the fields:
timestamp: the timestamp identifying when the request was received. This is
stored in the file as a count of seconds in UNIX time;
read_squid turns these values into POSIXlt timestamps, assuming UTC as the
origin timezone.
time_elapsed: the amount of time (in milliseconds) that the connection and fulfilment
of the request lasted for.
ip_address: the IP address of the remote host making the request.
status_code: the status code and Squid response code associated with that request,
stored as a single field. This can be split into two distinct fields with split_squid.
bytes_sent: the number of bytes sent.
http_method: the HTTP method (POST, GET, etc.) used.
url: the URL of the requested asset.
remote_user_ident: the RFC 1413 remote
user identifier.
peer_info: the status of how forwarding to a peer server was handled and, if the
request was forwarded, the server it was sent to.
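As a rough illustration of the field layout described above, the following base R sketch parses a single whitespace-delimited line by hand. The sample line is made up and trimmed to the nine fields listed here, and the code is not how read_squid itself is implemented; it only shows how the fields and the UNIX timestamp might map onto R values.

```r
# A made-up log line containing the nine fields listed above (assumption:
# fields are separated by whitespace and none contains embedded spaces).
sample_line <- paste(
  "1286536309.450 921 192.168.0.68 TCP_MISS/200 507 POST",
  "http://example.com/services - DIRECT/192.0.2.10"
)

field_names <- c("timestamp", "time_elapsed", "ip_address", "status_code",
                 "bytes_sent", "http_method", "url", "remote_user_ident",
                 "peer_info")

# Split on runs of whitespace and label each piece.
fields <- strsplit(trimws(sample_line), "[[:space:]]+")[[1]]
names(fields) <- field_names

# Convert the UNIX timestamp (seconds since 1970-01-01 UTC) to POSIXlt.
parsed_time <- as.POSIXlt(as.numeric(fields[["timestamp"]]),
                          origin = "1970-01-01", tz = "UTC")

# time_elapsed is a count of milliseconds, so an integer is a natural fit.
elapsed_ms <- as.integer(fields[["time_elapsed"]])

format(parsed_time, "%Y-%m-%d %H:%M:%S")
```

A "-" in remote_user_ident, as in the sample line, is the usual placeholder for a missing identifier.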
See Also
read_clf for the Common Log Format (also used by Squid), and
split_squid for splitting the "status_code" field into its component parts.
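
To make the idea of splitting the combined "status_code" field concrete, here is a minimal base R sketch. It does not call split_squid; the input values and the output column names (squid_code, http_status) are assumptions chosen for illustration, and the real function's output may be shaped differently.

```r
# Hypothetical combined values of the kind found in the "status_code" field.
status_code <- c("TCP_MISS/200", "TCP_HIT/304", "TCP_DENIED/403")

# Split each value at the "/" into the Squid result code and the HTTP status.
parts <- strsplit(status_code, "/", fixed = TRUE)

split_status <- data.frame(
  squid_code  = vapply(parts, `[`, character(1), 1),
  http_status = as.integer(vapply(parts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)

split_status
```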