source_data: Load plain-text data and RData from a URL (either http or https)

Description

source_data loads plain-text or RDATA formatted data stored at a URL (both http and https) into R.

Usage

source_data(url, rdata, sha1 = NULL, cache = FALSE, clearCache = FALSE,
  sep = "auto", header = "auto", stringsAsFactors = FALSE,
  envir = parent.frame(), ...)

Value

a data frame

Arguments

url: The data's URL. To distinguish between plain-text and RDATA the url must end in a distinguishing file extension.
rdata: logical. Whether or not the data set is an .RDATA file. If not specified, then source_data will attempt to determine whether the file is an .RDATA file from the URL's extension.
sha1: Character string of the file's SHA-1 hash, generated by source_data. Note if you are using data stored using Git, this is not the file's commit SHA-1 hash.
cache: logical. Whether or not to cache the data so that it is not downloaded every time the function is called.
clearCache: logical. Whether or not to clear the downloaded data from the cache.
sep: The separator method for the plain-text data. For example, to load comma-separated values data (CSV) use sep = ",". To load tab-separated values data (TSV) use sep = "\t". Only relevant for plain-text data.
header: logical. Whether or not the first line of the file is the header (i.e. variable names).
stringsAsFactors: logical. Convert all character columns to factors?
envir: the environment where the data should be loaded.
...: additional arguments passed to fread or load as relevant.
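
As a rough illustration of how the rdata argument could be inferred from the URL's extension, the sketch below uses a hypothetical helper, guess_rdata(). The helper name and the set of extensions it accepts are assumptions made for this example and are not taken from the source_data source code.

```r
# Hypothetical helper (not part of the package): guess whether a URL
# points to an RData file by looking at its file extension.
guess_rdata <- function(url) {
  # file_ext() returns "" when the URL does not end in ".<alphanumerics>"
  ext <- tolower(tools::file_ext(url))
  # Assumption: treat both ".RData" and ".rda" as RData files
  ext %in% c("rdata", "rda")
}

guess_rdata("https://example.com/data/results.RData")
guess_rdata("https://example.com/data/results.csv")
# A shortened URL carries no extension, so nothing can be inferred from it
guess_rdata("http://bit.ly/156oQ7a")
```

When the URL has no distinguishing extension, as with a shortened URL, setting rdata explicitly avoids relying on this kind of guess.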

Details

Loads plain-text data (e.g. CSV, TSV) or RDATA from a URL. Works with both HTTP and HTTPS sites. Note: the URL you give for the url argument must be for the RAW version of the file. The function should be able to download plain-text data from any secure (https) URL, though I have not verified this.

From the source_url documentation: "If a SHA-1 hash is specified with the sha1 argument, then this function will check the SHA-1 hash of the downloaded file to make sure it matches the expected value, and throw an error if it does not match. If the SHA-1 hash is not specified, it will print a message displaying the hash of the downloaded file. The purpose of this is to improve security when running remotely-hosted code; if you have a hash of the file, you can be sure that it has not changed."
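
The check described in the quotation can be sketched on a local file as follows. This is a minimal illustration and not the function's actual implementation: verify_sha1() is a hypothetical helper, and the sketch assumes the digest package is installed.

```r
# Sketch only: reproduces the described SHA-1 check on a local file.
# Assumes the 'digest' package is available; verify_sha1() is hypothetical.
library(digest)

verify_sha1 <- function(path, sha1 = NULL) {
  # Hash the file's contents (file = TRUE treats 'path' as a file name)
  file_sha1 <- digest(path, algo = "sha1", file = TRUE)
  if (is.null(sha1)) {
    # No expected hash given: report the hash so it can be recorded
    message("SHA-1 hash of file is ", file_sha1)
  } else if (!identical(tolower(sha1), file_sha1)) {
    # Expected hash given but different: refuse to continue
    stop("SHA-1 hash of file (", file_sha1,
         ") does not match expected value (", sha1, ")")
  }
  invisible(file_sha1)
}

# Stand-in for a downloaded file
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), tmp)

hash <- verify_sha1(tmp)        # prints the hash as a message
verify_sha1(tmp, sha1 = hash)   # matches, so no error is raised
```

A typical workflow is to call the function once without sha1, record the reported hash, and pass it on later calls so that any change to the remote file raises an error.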

Examples

if (FALSE) {
# Not run: requires network access
library(repmis)

# Download electoral disproportionality data stored on GitHub
# Note: Using shortened URL created by bitly
DisData <- source_data("http://bit.ly/156oQ7a")

# Check to see if SHA-1 hash matches downloaded file
DisDataHash <- source_data("http://bit.ly/Ss6zDO",
   sha1 = "dc8110d6dff32f682bd2f2fdbacb89e37b94f95d")
}