SEERaBomb (version 2019.2)

mkSEER: Make R binaries of SEER data.

Description

Converts SEER ASCII text files into large R binaries that include all cancer types and registries combined.

Usage

mkSEER(df, seerHome = "~/data/SEER", outDir = "mrgd", outFile = "cancDef",
       indices = list(c("sex", "race"), c("histo3", "seqnum"), "ICD9"),
       writePops = TRUE, writeRData = TRUE, writeDB = FALSE)

Arguments

df

A data frame output by pickFields(); it determines which fields are transferred. Passing the output of getFields() directly, without first running it through pickFields(), is a common mistake that must be avoided.

seerHome

The directory that contains the SEER population and incidence directories. This should be writable by the user.

outDir

Subdirectory of seerHome to write to. The default, mrgd, holds all registries merged together.

outFile

Base name of the SQLite database and cancer binary. Default = cancDef (Cancer Default).

indices

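Passed to copy_to() in dplyr, which uses it to build indexes on the cancer table in the SQLite database. Each list element defines one index; a character vector with several column names defines a multi-column index.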

writePops

TRUE if you wish to write out the population data frame binaries. Doing so takes about 10 seconds, so setting this to FALSE saves little time.

writeRData

TRUE if you wish to write out the cancer data frame binary. Writing this file accounts for most of the run time.

writeDB

TRUE if you wish to write cancer, popga, popsa, and popsae data frames to SQLite database tables.
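The indices argument only matters for the SQLite output. The sketch below illustrates what a list of that shape means to dplyr's copy_to() (where the argument is named indexes), using a small in-memory SQLite database. It assumes the dplyr, dbplyr, DBI and RSQLite packages are installed; the toy data frame and its values are hypothetical and are not real SEER records, and this is not mkSEER()'s own code.

```r
# Sketch: how a list-valued index specification is handled by dplyr::copy_to().
# Assumes dplyr, dbplyr, DBI and RSQLite are installed.
library(dplyr)

# Toy stand-in for the cancer data frame (hypothetical values).
canc <- data.frame(
  sex    = c(1L, 2L, 1L),
  race   = c(1L, 1L, 2L),
  histo3 = c(9861L, 9863L, 9861L),
  seqnum = c(0L, 1L, 0L),
  ICD9   = c(2050L, 2051L, 2050L)
)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Each list element becomes one index; c("sex", "race") yields a
# two-column (composite) index, "ICD9" a single-column one.
copy_to(con, canc, name = "canc", temporary = FALSE,
        indexes = list(c("sex", "race"), c("histo3", "seqnum"), "ICD9"))

# Ask SQLite which indexes now exist on the table.
idx <- DBI::dbGetQuery(con, "PRAGMA index_list('canc')")
print(idx$name)

DBI::dbDisconnect(con)
```

Composite indexes like c("sex", "race") speed up queries that filter on both columns together, which is presumably why the default groups them this way.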

Value

None. The function is called for its side effect of writing R binary files of the SEER data (and, if writeDB = TRUE, SQLite database tables).
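Because the outputs are ordinary R binary files, they are read back with base R's load(). The sketch below simulates that round trip in a temporary directory instead of ~/data/SEER/mrgd. It assumes the cancer binary is named after outFile with an .RData extension (cancDef.RData by default) and that it holds a data frame called canc; both names are assumptions here, so check the value returned by load() on a real file.

```r
# Sketch: reading an .RData output back with load().
# Simulated in tempdir(); a real run would use file.path("~/data/SEER", "mrgd").
outPath <- file.path(tempdir(), "mrgd")
dir.create(outPath, showWarnings = FALSE)

# Toy stand-in for the cancer data frame; the object name `canc` is an assumption.
canc <- data.frame(sex = c(1L, 2L), yrdx = c(2015L, 2016L))
save(canc, file = file.path(outPath, "cancDef.RData"))
rm(canc)

# Load into a separate environment so workspace objects are not overwritten;
# load() returns the names of the objects it restored.
e <- new.env()
loaded <- load(file.path(outPath, "cancDef.RData"), envir = e)
print(loaded)
str(e$canc)
```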

Details

This function uses the R package LaF to read the fixed-width data files of SEER. LaF is fast, but it requires the widths of all wanted columns as well as the widths of the unwanted stretches between them. getFields() and pickFields() together produce this information, which is passed to mkSEER() via the argument df.
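The idea of "wanted columns plus unwanted stretches" can be illustrated with base R's read.fwf(), where a negative width means "skip this many characters". This is a swapped-in technique for illustration, not LaF and not mkSEER()'s internals. The helper fwfWidths() is hypothetical, and the 20-character record layout is invented; it is not the real SEER layout.

```r
# Sketch of fixed-width reading with skipped gaps, using base R's read.fwf().
# The record layout below is hypothetical, NOT the real SEER layout.

# Hypothetical helper: build a signed width vector from 1-based start positions
# and widths of the wanted fields (positive = read, negative = skip).
# Assumes the wanted fields do not overlap.
fwfWidths <- function(start, width) {
  o <- order(start)
  start <- start[o]
  width <- width[o]
  # Gap before each field = its start minus the end of the previous field.
  gaps <- start - c(1, head(start + width, -1))
  out <- as.vector(rbind(-gaps, width))  # interleave: gap, field, gap, field, ...
  out[out != 0]                          # drop zero-length gaps
}

# Two 20-character records: 8-char case number, 4 unwanted chars,
# 1-char sex code, 3 unwanted chars, 4-char year of diagnosis.
f <- tempfile(fileext = ".txt")
writeLines(c("00000001XXXX1YYY2015",
             "00000002XXXX2YYY2016"), f)

w <- fwfWidths(start = c(1, 13, 17), width = c(8, 1, 4))
print(w)  # 8 -4 1 -3 4

# col.names covers only the positive-width (wanted) fields.
d <- read.fwf(f, widths = w,
              col.names = c("casenum", "sex", "yrdx"),
              colClasses = "integer")
print(d)
```

Only the wanted fields end up in the data frame; the skipped stretches are never converted, which is the same economy LaF exploits on much larger files.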

See Also

SEERaBomb-package, getFields, pickFields

Examples

# NOT RUN {
library(SEERaBomb)
(df=getFields())
(df=pickFields(df))
# the following takes several minutes, but may only need
# to be done roughly once per year, with each new SEER release.
mkSEER(df)
# }