Kmisc (version 0.5.0)

split_file: Split a File by Unique Entries in a Column

Description

This function splits a delimited file by the unique entries in a selected column, writing one output file per entry. The entry being split over is appended to the output file name (before the file extension).

Usage

split_file(file, column, sep = NULL, outDir = file.path(dirname(file), "split"), prepend = "", dots = 1, skip = 0, verbose = TRUE)

Arguments

file
The location of the file we are splitting.
column
The column (by index) to split over.
sep
The file separator. Must be a single character. If '', we guess the delimiter from the first line.
outDir
The directory in which to write the output files.
prepend
A string to prepend to the output file names; typically an identifier for what the column is being split over.
dots
The number of dots used in making up the file extension. If there are no dots in the file name, this argument is ignored.
skip
Integer; number of rows to skip (e.g. to avoid a header).
verbose
Be chatty?
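
To make the dots argument concrete, here is a minimal sketch of how a file name might be divided into a stem and an extension, with the entry inserted in between. This is not the package's own code: split_ext and output_name are hypothetical helpers, and the "_" joining the stem and the entry is an assumption (the documentation only says the entry is appended before the extension).

```r
# Hypothetical helper: treat the last `dots` dot-separated pieces of a
# file name as its extension.
split_ext <- function(name, dots = 1) {
  parts <- strsplit(name, ".", fixed = TRUE)[[1]]
  n <- length(parts)
  # No dots in the name (or dots < 1): there is no extension to split off.
  if (n == 1 || dots < 1) {
    return(list(stem = name, ext = ""))
  }
  k <- min(dots, n - 1)  # always leave at least one piece for the stem
  stem <- paste(parts[seq_len(n - k)], collapse = ".")
  ext <- paste0(".", paste(parts[(n - k + 1):n], collapse = "."))
  list(stem = stem, ext = ext)
}

# Hypothetical helper: place the entry before the extension.
# The "_" separator is an assumption, not documented behaviour.
output_name <- function(name, entry, dots = 1) {
  p <- split_ext(name, dots)
  paste0(p$stem, "_", entry, p$ext)
}

output_name("data.tar.gz", "A", dots = 1)
output_name("data.tar.gz", "A", dots = 2)
```

Under these assumptions, dots = 1 treats only ".gz" as the extension, while dots = 2 keeps ".tar.gz" together.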

Details

This function should help users in the unfortunate case that the data they have attempted to read is too large to fit into RAM. Splitting the file into several smaller files should leave each piece small enough to fit into RAM.

The focus is on efficient splitting of 'well-mannered' files, so if you have comments, quoted delimiters, cell entries that have paragraphs of unicode text, or other wacky things, this is probably not the function for you.
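
The idea can be sketched in base R as follows. This is an illustration only, not how split_file is implemented: split_by_column and its chunk argument are hypothetical, the "_" before the entry is an assumption, and the sketch shares the 'well-mannered' restriction (plain single-character delimiter, no quoting, no comments). It reads the file a block of lines at a time, so the whole file never has to sit in memory.

```r
# Sketch only: split a delimited file by the unique entries of one column.
# `path` is the input file, `column` the column index, `chunk` the number
# of lines read per pass (a hypothetical tuning knob).
split_by_column <- function(path, column, sep = "\t",
                            outDir = file.path(dirname(path), "split"),
                            skip = 0, chunk = 10000L) {
  dir.create(outDir, showWarnings = FALSE, recursive = TRUE)
  ext <- tools::file_ext(path)
  stem <- tools::file_path_sans_ext(basename(path))

  con <- file(path, open = "r")
  on.exit(close(con))
  if (skip > 0) readLines(con, n = skip)  # discard header rows

  written <- character()  # entries that already have an output file
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0) break
    fields <- strsplit(lines, sep, fixed = TRUE)
    keys <- vapply(fields, function(x) x[column], character(1))
    # Drop lines that do not have enough fields (e.g. blank lines).
    ok <- !is.na(keys)
    lines <- lines[ok]
    keys <- keys[ok]
    for (key in unique(keys)) {
      out <- file.path(
        outDir,
        paste0(stem, "_", key, if (nzchar(ext)) paste0(".", ext))
      )
      # Overwrite on first sight of an entry, append afterwards.
      cat(paste0(lines[keys == key], "\n"), file = out, sep = "",
          append = key %in% written)
      written <- union(written, key)
    }
  }
  invisible(written)
}
```

For a hypothetical tab-delimited file data.txt with one header row, the corresponding call according to the Usage section above would be split_file("data.txt", column = 2, sep = "\t", skip = 1).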

See Also

extract_rows_from_file