
webreadr (version 0.4.0)

read_aws: read Amazon CloudFront access logs

Description

Amazon CloudFront writes access logs in a standard format described on the Amazon website. read_aws reads these files in; because of the way Amazon handles header lines, it can automatically detect whether a file lacks common fields and compensate for that. See "Details".

Usage

read_aws(file)

Arguments

file
the full path to the AWS file you want to read.

Details

Amazon CloudFront uses tab-separated files with Amazon-specific fields. Individual CloudFront users can change this format to exclude particular fields, however, and historically the format has contained fewer fields than it does now. Luckily, Amazon's insistence on standardisation in field names means that we can automatically detect whether fields are missing, and compensate for that before reading in the file.
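The detection step can be pictured with a minimal sketch in base R. This is not webreadr's actual implementation. It assumes the log starts with comment lines, one of which begins with "#Fields:" and names the columns present in the file. The helper detect_fields, the sample header lines and the shortened list of expected fields are all hypothetical and exist only for illustration.

```r
# Hypothetical helper, not part of webreadr: return the field names listed
# in the "#Fields:" header line of a CloudFront log.
detect_fields <- function(lines) {
  header <- grep("^#Fields:", lines, value = TRUE)
  if (length(header) == 0) {
    return(character(0))
  }
  # Drop the "#Fields:" prefix, then split the remainder on whitespace.
  fields <- sub("^#Fields:[[:space:]]*", "", header[1])
  strsplit(fields, "[[:space:]]+")[[1]]
}

# Made-up header lines, for illustration only.
example_lines <- c(
  "#Version: 1.0",
  "#Fields: date time x-edge-location sc-bytes c-ip cs-method"
)
present <- detect_fields(example_lines)

# A shortened, assumed list of fields that a complete log would contain.
expected <- c("date", "time", "x-edge-location", "sc-bytes", "c-ip",
              "cs-method", "cs(Host)", "cs-uri-stem")

# Fields that the file lacks and that a reader would need to compensate for.
absent <- setdiff(expected, present)
```

Comparing the names found in the header against the expected names is what makes it possible to decide, before any rows are parsed, which columns to read and which to skip.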

If no fields are missing, the fields returned will be:

  • date: the date and time when the request was completed
  • time_elapsed: the amount of time (in milliseconds) that the connection and fulfilment of the request lasted for.
  • edge_location: the Amazon edge location that served the request, identified by a three-letter code. See the Amazon documentation for more details.
  • bytes_sent: a count of the number of bytes sent by the server to the client, including headers, to fulfil the request.
  • ip_address: the IP address of the client making the request.
  • http_method: the HTTP method (POST, GET, etc.) used.
  • host: the CloudFront host name.
  • path: the path to the requested asset.
  • status_code: the HTTP status code associated with the request.
  • referer: the referer associated with the request.
  • user_agent: the user agent of the client that made the request.
  • query: the query string associated with the request; if there is no query string, this will be a dash.
  • cookie: the cookie header from the request, stored as name-value pairs. When no cookie header is provided, or it is empty, this will be a dash.
  • result_type: the result of the request. This is similar to Squid response codes (see read_squid) but Amazon-specific; their documentation contains details on what each code means.
  • request_id: a hashed unique identifier for each request.
  • host_header: the host header of the requested asset. While host will always be the CloudFront host name, host_header contains alternate domain names (or 'CNAMEs') when the CloudFront distribution is using them.
  • protocol: the protocol used in the request (http/https).
  • bytes_received: client-to-server bytes, including headers.
  • time_elapsed: the time elapsed, in seconds, between the time the request was received and the time the server completed responding to it.
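Because query and cookie use a dash to mean "no value", it can be convenient to turn those placeholders into NA before analysis. The following is a minimal sketch in base R; the data frame logs is made up to stand in for the output of read_aws and shows only three of the columns listed above.

```r
# Made-up rows standing in for read_aws() output; only three columns shown.
logs <- data.frame(
  path = c("/index.html", "/search"),
  query = c("-", "q=cats"),
  cookie = c("-", "-"),
  stringsAsFactors = FALSE
)

# Replace the dash placeholder with NA in the columns documented to use it.
for (col in c("query", "cookie")) {
  logs[[col]][logs[[col]] == "-"] <- NA
}
```

After this step, functions such as is.na() can be used to count or filter requests that carried no query string or cookie header.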

See Also

read_s3 for Amazon S3 files, read_clf for the Common Log Format, read_squid and read_combined.

Examples

#Read in an example CloudFront file provided with the webreadr package.
data <- read_aws(system.file("extdata/log.aws", package = "webreadr"))
