read.tfl: Loading and Saving Type Frequency Lists (zipfR)

Description

read.tfl loads a type frequency list from a .tfl file.

write.tfl saves a type frequency list object to a .tfl file.

Usage

read.tfl(file, encoding=getOption("encoding"))
write.tfl(tfl, file, encoding=getOption("encoding"))

Arguments

file

character string specifying the pathname of a disk file. Files with extension .gz, .bz2 or .xz will automatically be decompressed when read and compressed when written (see section "Details"). See section "Format" for a description of the required file format.

tfl

a type frequency list, i.e. an object of class tfl

encoding

specifies the character encoding of the disk file to be read or written. See the file manpage for details.

Value

read.tfl returns an object of class tfl (see the tfl manpage for details)

Format

A TAB-delimited text file with column headers but no row names (suitable for reading with read.delim), containing the following columns:

f: type frequencies \(f_k\)
k: optional: the corresponding type IDs \(k\). If missing, increasing non-negative integers are automatically assigned as IDs.
type: optional: type representations (such as word forms or lemmas)

These columns may appear in any order in the text file. Only the f column is mandatory and all unrecognized columns will be silently ignored.
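The layout described above can be illustrated with a small hand-built file. This is only a sketch: the toy data, the column name `note`, and the variable names are hypothetical, and the call to read.tfl is guarded so that the snippet also runs where zipfR is not installed.

```r
# Build a tiny TAB-delimited file by hand (toy data, for illustration only).
# Columns are deliberately out of order, and "note" is not a recognized column.
path <- tempfile(fileext = ".tfl")
toy <- data.frame(
  type = c("the", "of", "and"),
  f    = c(14L, 12L, 31L),
  note = c("x", "y", "z"),
  stringsAsFactors = FALSE
)
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# The file is an ordinary table with a header row, readable with read.delim
raw <- read.delim(path, stringsAsFactors = FALSE)
names(raw)

# With zipfR available, the same file can be loaded as a tfl object;
# according to the description above, "note" is ignored and IDs are assigned
if (requireNamespace("zipfR", quietly = TRUE)) {
  toy.tfl <- zipfR::read.tfl(path)
  print(toy.tfl)
}
```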

Details

If the filename file ends in the extension .gz, .bz2 or .xz, the disk file will automatically be decompressed (read.tfl) or compressed (write.tfl).
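As a rough illustration of what a compressed file of this kind looks like from base R, the following sketch writes a toy table through a gzfile connection and reads it back with read.delim. It uses only base R and makes no claim about how read.tfl and write.tfl handle compression internally; the data and variable names are hypothetical.

```r
# Toy frequency table (hypothetical data)
toy <- data.frame(
  f    = c(31L, 14L, 12L),
  type = c("the", "of", "and"),
  stringsAsFactors = FALSE
)

# Write a gzip-compressed, TAB-delimited file through an explicit connection
path <- tempfile(fileext = ".tfl.gz")
con <- gzfile(path, "w")
write.table(toy, con, sep = "\t", quote = FALSE, row.names = FALSE)
close(con)

# read.delim can read the compressed file directly from its path
back <- read.delim(path, stringsAsFactors = FALSE)
```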

The .tfl file format stores neither the values of N and V nor the range of type frequencies explicitly. Therefore, incomplete type frequency lists cannot be fully reconstructed from disk files (and will not even be recognized as such). An attempt to save such a list will trigger a corresponding warning.
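The consequence for incomplete lists can be made concrete with a little arithmetic. The sketch below assumes the usual definitions, with N as the sum of all type frequencies and V as the number of types, and uses hypothetical toy data and base R only. Once low-frequency types have been dropped, the file alone no longer reveals the original N and V.

```r
# Hypothetical complete frequency list (toy data)
f.full <- c(31, 14, 12, 2, 1, 1, 1)
N.full <- sum(f.full)      # sample size of the complete list
V.full <- length(f.full)   # vocabulary size of the complete list

# Incomplete list: only types with f >= 2 are kept and written to disk
f.part <- f.full[f.full >= 2]
path <- tempfile(fileext = ".tfl")
write.table(data.frame(f = f.part), path,
            sep = "\t", quote = FALSE, row.names = FALSE)

# All that can be recomputed from the file are the totals of the listed rows
f.read <- read.delim(path)$f
N.read <- sum(f.read)      # smaller than N.full
V.read <- length(f.read)   # smaller than V.full
```

Nothing in the file distinguishes this truncated list from a complete list that happens to have N = 59 and V = 4, which is why such a list is not recognized as incomplete when read back.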

Examples

# NOT RUN {
## save type-frequency list for Brown corpus to external file
fname <- tempfile(fileext=".tfl.gz") # automatically compresses file
write.tfl(Brown.tfl, fname)
## file <fname> contains a compressed TAB-delimited table with fields
##   k    ... type ID (usually Zipf rank)
##   f    ... frequency of type
##   type ... the type itself (here a word form)

## read it back in
New.tfl <- read.tfl(fname)

## same as Brown.tfl
summary(New.tfl)
summary(Brown.tfl)
print(New.tfl)
print(Brown.tfl)
head(New.tfl)
head(Brown.tfl)
stopifnot(isTRUE(all.equal(New.tfl, Brown.tfl))) # should be identical

# }
# NOT RUN {
## suppose you have a text file with a frequency list, one f per line, e.g.:
##   f
##   14
##   12
##   31
##   ...

## you can import this with read.tfl
MyData.tfl <- read.tfl("mylist.txt")
summary(MyData.tfl)
print(MyData.tfl) # ids in column k added by zipfR

## from this you can generate a spectrum with tfl2spc
MyData.spc <- tfl2spc(MyData.tfl)
summary(MyData.spc)
# }