read.fwf: Read Fixed Width Format Files

Description

Read a table of fixed width formatted data into a data.frame.

Usage

read.fwf(file, widths, header = FALSE, sep = "\t", skip = 0, row.names, col.names, n = -1, buffersize = 2000, fileEncoding = "", ...)

Arguments

file

the name of the file which the data are to be read from.

Alternatively, file can be a connection, which will be opened if necessary, and if so closed at the end of the function call.

widths

integer vector, giving the widths of the fixed-width fields (of one line), or list of integer vectors giving widths for multiline records.

header

a logical value indicating whether the file contains the names of the variables as its first line. If present, the names must be delimited by sep.

sep

character; the separator used internally; should be a character that does not occur in the file (except in the header).

skip

number of initial lines to skip; see read.table.

row.names

see read.table.

col.names

see read.table.

n

the maximum number of records (lines) to be read, defaulting to no limit.

buffersize

the maximum number of lines to read at one time.

fileEncoding

character string: if non-empty, declares the encoding used on a file (not a connection) so that the character data can be re-encoded. See the ‘Encoding’ section of the help for file, the ‘R Data Import/Export Manual’ and ‘Note’.

...

further arguments to be passed to read.table. Useful such arguments include as.is, na.strings, colClasses and strip.white.
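
A minimal sketch with made-up data shows several of these arguments together. skip drops a title line, the tab-delimited header line supplies the variable names, n limits the read to two records, and colClasses and strip.white are forwarded to read.table so that the ID field keeps its leading zeros and the padded name field is trimmed. The file layout and field names are assumptions for illustration only.

```r
ff <- tempfile()
# Hypothetical file: a title line, a tab-delimited header, then fixed-width records
# (5-char ID with leading zeros, 6-char space-padded name, 2-digit age)
cat(file = ff, "Survey 2024", "id\tname\tage",
    "00123Ann   23", "04567Bo    41", "07890Cy    35", sep = "\n")
df <- read.fwf(ff, widths = c(5, 6, 2),
               skip = 1,        # drop the title line before the header
               header = TRUE,   # names come from the tab-delimited line
               n = 2,           # read only the first two records
               colClasses = c("character", "character", "integer"),  # via ...
               strip.white = TRUE)                                    # via ...
unlink(ff)
df
```

Without colClasses, the ID column would be converted to integers and lose its leading zeros.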

Value

A data.frame as produced by read.table, which is called internally.

Details

Multiline records are concatenated to a single line before processing. Fields that are of zero-width or are wholly beyond the end of the line in file are replaced by NA.
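
As a sketch with hypothetical data, a list of two width vectors makes each record span two physical lines. The lines of each record are joined before the fields are cut, so four input lines yield two rows of three columns.

```r
ff <- tempfile()
# Each record spans two lines: a 3-char code plus a 2-digit count, then a 4-char tag
cat(file = ff, "abc12", "wxyz", "def34", "stuv", sep = "\n")
df <- read.fwf(ff, widths = list(c(3, 2), 4),
               col.names = c("code", "count", "tag"))
unlink(ff)
df
```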

Negative-width fields are used to indicate columns to be skipped, e.g., -5 to skip 5 columns. These fields are not seen by read.table and so should not be included in a col.names or colClasses argument (nor in the header line, if present).
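
For example, with made-up ISO-style dates, widths of -1 drop the hyphen separators, and col.names lists only the three fields that are kept (five widths, three names).

```r
ff <- tempfile()
cat(file = ff, "2024-01-15", "2023-12-31", sep = "\n")
# The -1 widths skip the hyphens; they get no entry in col.names
df <- read.fwf(ff, widths = c(4, -1, 2, -1, 2),
               col.names = c("year", "month", "day"))
unlink(ff)
df
```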

Reducing the buffersize argument may reduce memory use when reading large files with long lines. Increasing buffersize may result in faster processing when enough memory is available. Note that read.fwf itself (not read.table) reads the supplied file, so read.table's encoding argument will not be useful; declare the file's encoding with fileEncoding instead.
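
A quick check on generated data that buffersize changes only how many lines are read per chunk, not the result: reading the same 1000-line file with the default buffer and with a 50-line buffer should give identical data frames.

```r
ff <- tempfile()
# 1000 generated records: a 4-digit id followed by a 5-digit value
writeLines(sprintf("%04d%05d", 1:1000, 7L * (1:1000)), ff)
big   <- read.fwf(ff, widths = c(4, 5))                    # default: 2000 lines per chunk
small <- read.fwf(ff, widths = c(4, 5), buffersize = 50)   # 50 lines per chunk
unlink(ff)
identical(big, small)
```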

Examples


ff <- tempfile()
cat(file = ff, "123456", "987654", sep = "\n")
read.fwf(ff, widths = c(1,2,3))    #> 1 23 456 \ 9 87 654
read.fwf(ff, widths = c(1,-2,3))   #> 1 456 \ 9 654
unlink(ff)
cat(file = ff, "123", "987654", sep = "\n")
read.fwf(ff, widths = c(1,0, 2,3))    #> 1 NA 23 NA \ 9 NA 87 654
unlink(ff)
cat(file = ff, "123456", "987654", sep = "\n")
read.fwf(ff, widths = list(c(1,0, 2,3), c(2,2,2))) #> 1 NA 23 456 98 76 54
unlink(ff)
