SEERaBomb (version 2015.2)

mkSEER: Make R binaries of SEER data.

Description

Converts SEER ASCII text files into large R binaries that include all cancer types and registries combined.

Usage

mkSEER(df,seerHome="~/data/SEER",outDir="mrgd",outFile="cancDef",
                  indices = list(c("sex","race"), c("histo3","seqnum"),  "ICD9"),
                  writePops=TRUE,writeRData=TRUE,writeDB=TRUE)

Arguments

df
A data frame output by pickFields(); it determines which fields are transferred. Passing the output of getFields() instead is a common mistake and must be avoided.
seerHome
The directory that contains the SEER population and incidence directories. It should be writable by the user, since output is written to a subdirectory of it.
outDir
Subdirectory of seerHome to write to. The default, mrgd, holds all registries merged together.
outFile
Base name of the SQLite database and the cancer binary. Default = cancDef (cancer default).
indices
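Passed to copy_to() in dplyr when writing the SQLite database. Each element of the list is a character vector naming the column(s) of one index.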
writePops
TRUE to write the population data frame binaries. This takes about 10 seconds, so setting it to FALSE saves little time.
writeRData
TRUE to write the cancer data frame binary. Writing files accounts for most of the run time.
writeDB
TRUE to write the cancer, popga, popsa, and popsae data frames to tables of an SQLite database.
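To illustrate the indices argument: each element of the list describes one index, so c("sex", "race") is a single two-column index rather than two separate ones. The sketch below uses a hypothetical helper, indexSQL(), which is not part of SEERaBomb or dplyr; it only spells out the SQLite statements such a list roughly corresponds to, assuming the indices are placed on the cancer table. The index names it builds are illustrative, and dplyr's copy_to() may name indices differently.

```r
# Hypothetical helper, not part of SEERaBomb or dplyr: turns an indices-style
# list into the SQLite CREATE INDEX statements it roughly corresponds to.
indexSQL <- function(table, indices) {
  vapply(indices, function(cols) {
    # One index per list element; several column names give a composite index
    name <- paste(c(table, cols), collapse = "_")   # illustrative naming only
    sprintf("CREATE INDEX %s ON %s (%s)",
            name, table, paste(cols, collapse = ", "))
  }, character(1))
}

# The default value of mkSEER()'s indices argument
indices <- list(c("sex", "race"), c("histo3", "seqnum"), "ICD9")
writeLines(indexSQL("cancer", indices))
```

Composite indices such as (sex, race) speed up queries that filter on both columns together, which is why the default groups some fields rather than indexing each one alone.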

Value

  • None. The function is called for its side effect of writing R binary files of the SEER data (and, if writeDB = TRUE, SQLite database tables).
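Because mkSEER() returns nothing, its results are used by loading the files it writes. A minimal sketch of that round trip with base R's save() and load() follows. The file name cancDef.RData under file.path(seerHome, outDir) and the object name canc are assumptions for illustration, and a toy data frame in a temporary directory stands in for real SEER output. load() returns the names of the objects it restored, which helps when those names are not known in advance.

```r
# Stand-in for "~/data/SEER"; a temporary directory keeps the sketch self-contained
seerHome <- tempfile("SEER")
outDir <- file.path(seerHome, "mrgd")
dir.create(outDir, recursive = TRUE)

# Toy data frame standing in for the cancer binary (name and columns assumed)
canc <- data.frame(sex = c(1L, 2L), race = c(1L, 1L), ICD9 = c(2050L, 2051L))
save(canc, file = file.path(outDir, "cancDef.RData"))

# Simulate a fresh session, then restore the object from the binary
rm(canc)
loaded <- load(file.path(outDir, "cancDef.RData"))
print(loaded)            # [1] "canc"
str(get(loaded[1]))      # inspect whatever object was restored
```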

Details

This function uses the R package LaF to access SEER's fixed-width data files. LaF is fast, but it requires the widths of all wanted columns as well as the widths of the unwanted stretches between them. This information is produced by getFields() and pickFields() together and is passed to mkSEER() as the argument df. mkSEER() works with dplyr objects of class tbl_df.
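The role of the gap widths can be sketched with base R's read.fwf(), which, like LaF, reads fixed-width records and here treats negative widths as columns to skip. This is a swapped-in base R illustration, not how mkSEER() calls LaF. The field layout (names, start columns, widths) is hypothetical rather than the real SEER record layout, and fwfWidths() is a made-up helper that turns the start positions and widths of wanted fields into such a widths vector. pickFields() output carries comparable information for mkSEER(), though not necessarily in this form.

```r
# Hypothetical layout of wanted fields: name, first column, and width
fields <- data.frame(name  = c("sex", "race", "ICD9"),
                     start = c(3, 6, 12),
                     width = c(1, 2, 4))

# Hypothetical helper: interleave negative gap widths (skipped stretches) with
# positive field widths. Assumes fields are sorted by start and do not overlap.
fwfWidths <- function(start, width) {
  prevEnd <- c(0, head(start + width - 1, -1))   # last column of previous field
  gaps <- start - prevEnd - 1                    # unwanted columns before each field
  w <- as.vector(rbind(-gaps, width))            # -gap1, w1, -gap2, w2, ...
  w[w != 0]                                      # drop zero-width gaps
}

# Two toy records; the letters occupy the stretches that should be skipped
lines <- c("xx1yy01zzzz2050",
           "xx2yy02zzzz2051")

w <- fwfWidths(fields$start, fields$width)
print(w)   # [1] -2  1 -2  2 -4  4
dat <- read.fwf(textConnection(lines), widths = w, col.names = fields$name)
print(dat)
```

Only the kept columns are parsed, which mirrors why mkSEER() needs both the wanted widths and the skipped stretches before it can read a record.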

See Also

SEERaBomb-package, getFields, pickFields

Examples

library(SEERaBomb)
(df=getFields())
(df=pickFields(df))
# The following will take several minutes, but may only need
# to be done about once per year, with each new SEER release.
mkSEER(df)