
webreadr (version 0.4.0)

read_clf: read CLF-formatted logs

Description

Read a file of request logs stored in the Common Log Format.

Usage

read_clf(file, has_header = FALSE)

Arguments

file
the full path to the CLF-formatted file you want to read.
has_header
whether or not the file has a header row. Set to FALSE by default.

Value

A data.frame containing the seven fields described in Details, with normalised timestamps.

Details

The CLF is a standardised format for web request logs. It consists of the following fields:

  • ip_address: the IP address of the remote host that made the request. By default, the CLF does not include the de facto standard X-Forwarded-For header.
  • remote_user_ident: the RFC 1413 remote user identifier.
  • local_user_ident: the identifier the user has authenticated with locally.
  • timestamp: the timestamp associated with the request, stored as "[08/Apr/2001:17:39:04 -0800]", where "-0800" represents the time offset (minus eight hours) of the timestamp from UTC.
  • request: the actual user request, containing the HTTP method used, the asset requested, and the HTTP protocol version used.
  • status_code: the HTTP status code returned.
  • bytes_sent: the number of bytes sent.
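To make the field layout concrete, here is a minimal base-R sketch that parses CLF lines into the seven fields above and converts each timestamp to UTC using its offset. It is an illustration only, not how read_clf is implemented: the helper name parse_clf_lines, the regular expression, and the choice of a UTC POSIXct column are assumptions. It also assumes well-formed lines with a quoted request, drops lines that do not match, and treats a "-" in the bytes field as missing.

```r
# Hypothetical helper (not part of webreadr): parse CLF lines with base R.
# Assumes each line is well formed; lines that do not match are dropped.
parse_clf_lines <- function(lines) {
  # Month abbreviations such as "Apr" are parsed in the C locale,
  # so temporarily switch LC_TIME and restore it on exit.
  old_locale <- Sys.getlocale("LC_TIME")
  Sys.setlocale("LC_TIME", "C")
  on.exit(Sys.setlocale("LC_TIME", old_locale), add = TRUE)

  # One capture group per field: host, ident, user, [timestamp],
  # "request", status code, bytes sent.
  pattern <- '^(\\S+) (\\S+) (\\S+) \\[([^]]+)\\] "([^"]*)" ([0-9]{3}) (\\S+)$'
  matches <- regmatches(lines, regexec(pattern, lines))
  matches <- matches[lengths(matches) == 8]
  fields <- do.call(rbind, lapply(matches, function(m) m[-1]))

  data.frame(
    ip_address = fields[, 1],
    remote_user_ident = fields[, 2],
    local_user_ident = fields[, 3],
    # "%z" applies the "-0800"-style offset; the result is expressed in UTC.
    timestamp = as.POSIXct(
      strptime(fields[, 4], "%d/%b/%Y:%H:%M:%S %z", tz = "UTC")
    ),
    request = fields[, 5],
    status_code = as.integer(fields[, 6]),
    # A "-" means no body was sent; it becomes NA.
    bytes_sent = suppressWarnings(as.numeric(fields[, 7])),
    stringsAsFactors = FALSE
  )
}
```

For example, parsing the line '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326' yields one row whose timestamp is 2000-10-10 20:55:36 UTC, because the -0700 offset is added back to reach UTC.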

Although the CLF is outdated as a standard, systems that use it are still around: the Squid caching proxy, for example, uses the CLF as one of its default log formats (the other, Squid's "native" format, can be read with read_squid).

See Also

read_combined for the Combined Log Format, and split_clf for splitting out the "request" field.
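The request field packs three values into one string. As a rough illustration of what splitting it involves, a hypothetical base-R helper could split on single spaces, assuming the usual "METHOD asset PROTOCOL" shape. This is not split_clf's implementation; the helper name split_request and its column names are assumptions, and split_clf's actual output may differ.

```r
# Hypothetical helper, not webreadr's split_clf. Assumes requests look like
# "METHOD asset PROTOCOL"; anything else yields a row of NAs.
split_request <- function(request) {
  tokens <- strsplit(request, " ", fixed = TRUE)
  tokens <- lapply(tokens, function(t) {
    if (length(t) == 3) t else rep(NA_character_, 3)
  })
  parts <- do.call(rbind, tokens)
  data.frame(
    method = parts[, 1],
    asset = parts[, 2],
    protocol = parts[, 3],
    stringsAsFactors = FALSE
  )
}
```

Splitting on single spaces keeps the sketch short; requests that do not split into exactly three tokens, such as malformed probes, simply become NA rows rather than raising an error.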

Examples

# Read in an example CLF-formatted file provided with the webreadr package.
data <- read_clf(system.file("extdata/log.clf", package = "webreadr"))
