Takes a sequence of files and combines them by rows, without reading the full files into memory. This is especially useful when dealing with large datasets, where the reading of entire files may be time consuming and require a large amount of memory.
rbindFiles(infiles, outfile, col.sep, header = FALSE, ask = TRUE,
verbose = FALSE, add.file.number = FALSE, blank.lines.skip = FALSE)
There is no useful output; the objective of rbindFiles is to produce outfile.
A character vector of names (and paths) of the files to combine.
A character string giving the name of the modified file. The name of the file is relative to the current working directory, unless the file name contains a definite path.
Specifies the separator used to split the columns in the files. To split at all types of spaces or blank characters, set col.sep = "[[:space:]]" or col.sep = "[[:blank:]]".
A logical variable which indicates whether the first line in each file contains the names of the variables. If "TRUE", outfile will display this header in its first row, assuming the headers for each file are identical. Equals "FALSE" by default, i.e. no headers assumed.
Logical. Default is "TRUE". If set to "FALSE", an already existing outfile will be overwritten without asking.
Logical. Default is "FALSE". If "TRUE", the line number is displayed for each iteration, i.e. each combined line.
A logical variable which equals "FALSE" by default. If "TRUE", an extra first column will be added to the outfile, consisting of the file numbers for each line.
Logical. Default is "FALSE". If "TRUE", blank lines in the input are ignored.
Miriam Gjerdevik,
with Hakon K. Gjessing
Professor of Biostatistics
Division of Epidemiology
Norwegian Institute of Public Health
hakon.gjessing@uib.no
The function rbind combines R objects by rows. However, reading large data files may require a large amount of memory and be extremely time consuming. rbindFiles avoids reading the full files into memory. It reads the files line by line, possibly modifies each line, then writes to outfile.
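The line-by-line approach described above can be illustrated with a minimal sketch in base R. This is not the Haplin implementation: the helper name rbind_lines_sketch, its sep argument, and the "file" label used for the extra header column are all hypothetical, chosen only to show how lines can be streamed from each input to the output without holding a whole file in memory.

```r
# Hypothetical helper (an assumption, not the rbindFiles source):
# streams each input file to the output one line at a time.
rbind_lines_sketch <- function(infiles, outfile, header = FALSE,
                               add.file.number = FALSE,
                               blank.lines.skip = FALSE, sep = " ") {
  out <- file(outfile, open = "w")
  on.exit(close(out), add = TRUE)
  for (i in seq_along(infiles)) {
    con <- file(infiles[i], open = "r")
    first <- TRUE
    repeat {
      line <- readLines(con, n = 1)
      if (length(line) == 0) break                    # end of this file
      if (blank.lines.skip && !nzchar(trimws(line))) next
      is_header <- header && first
      first <- FALSE
      if (is_header && i > 1) next                    # keep only the first header
      if (add.file.number) {
        # "file" as the header label is an assumed, illustrative choice
        prefix <- if (is_header) "file" else as.character(i)
        line <- paste(prefix, line, sep = sep)
      }
      writeLines(line, out)
    }
    close(con)
  }
  invisible(NULL)
}
```

Only one line is held in memory at any time, which is the property that makes this style of row binding practical for very large files.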
If, however, header, verbose, add.file.number and blank.lines.skip are all set to "FALSE" (their default values), the files are appended directly, thus evading line-by-line modifications.
In the case where infiles contains only one file and no output or modifications are requested (verbose, add.file.number and blank.lines.skip equal "FALSE"), an identical copy of this file is made.
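For comparison, direct appending without any per-line processing can be sketched with the base R functions file.copy and file.append. The helper name append_files_sketch is hypothetical, and this is only an assumption about how such a shortcut might look, not the code rbindFiles actually runs. Note that it overwrites an existing outfile without asking, similar to ask = FALSE.

```r
# Hypothetical helper (an assumption, not the rbindFiles source):
# concatenates files as-is, with no line-by-line modification.
append_files_sketch <- function(infiles, outfile) {
  # Start from a copy of the first file ...
  ok <- file.copy(infiles[1], outfile, overwrite = TRUE)
  # ... then append the remaining files in order.
  for (f in infiles[-1]) {
    ok <- ok && file.append(outfile, f)
  }
  invisible(ok)
}
```

With a single input file the loop body never runs, so the result is simply an identical copy of that file, matching the special case described above.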
Web Site: https://haplin.bitbucket.io
cbindFiles, lineByLine
if (FALSE) {
# Combines the three infiles, by rows
rbindFiles(infiles = c("myfile1.txt", "myfile2.txt", "myfile3.txt"),
  outfile = "myfile_combined_by_rows.txt", col.sep = " ", header = TRUE, verbose = TRUE)
}