read.csv.sql: Read File Filtered by SQL

Description

Read a file into R filtering it with an sql statement. Only the filtered portion is processed by R so that files larger than R can otherwise handle can be accommodated.

Usage

read.csv.sql(file, sql = "select * from file", header = TRUE, sep = ",", row.names, eol, skip, filter, dbname = tempfile(), drv = "SQLite", ...)
read.csv2.sql(file, sql = "select * from file", header = TRUE, sep = ";", row.names, eol, skip, filter, dbname = tempfile(), drv = "SQLite", ...)

Arguments

file

As in read.csv.

sql

character string holding an SQL statement. The table representing the file should be referred to as file.

header

As in read.csv.

sep

As in read.csv.

row.names

As in read.csv.

eol

Character which ends line.

skip

Skip indicated number of lines in input file.

filter

If specified, this should be a shell/batch command that the input file is piped through. For read.csv2.sql it is by default the following on non-Windows systems: tr , .. This translates all commas in the file to dots. On Window

dbname

As in sqldf except that the default is tempfile().

drv

This argument is ignored. Currently the only database SQLite supported by read.csv.sql and read.csv2.sql is SQLite. Note that the H2 database has a builtin SQL function, CSVREAD, which can be used in place of

...

Passed to sqldf.

Value

If the sql statement is a select statement then a data frame is returned.

Details

Reads the indicated file into an sql database creating the database if it does not already exist. Then it applies the sql statement returning the result as a data frame. If the database did not exist prior to this statement it is removed. Note that it uses facilities of SQLite to read the file which are intended for speed and therefore not as flexible as in R. For example, it does not recognize quoted fields as special but will regard the quotes as part of the field. See the sqldf help for more information. read.csv2.sql is like read.csv.sql except the default sep is ";" and the default filter translates all commas in the file to decimal points (i.e. to dots). On Windows, if the filter argument is used and if Rtools is detected in the registry then the Rtools bin directory is added to the search path facilitating use of those tools without explicitly setting any the path.

Examples

Run this code

write.table(iris, "iris.csv", sep = ",", quote = FALSE, row.names = FALSE)
iris2 <- read.csv.sql("iris.csv", 
	sql = "select * from file where Sepal.Length > 5")

Run the code above in your browser using DataLab