fixCSV(file, skip = 0, overwrite = FALSE)
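file: the name of the CSV file to be fixed.
skip: the number of comment rows preceding the header row. The default is skip=0, implying that the header row is the first row of the CSV file.
overwrite: should the fixed table be written to a new file with FIXED added to the filename (overwrite=FALSE, the default), or should it overwrite the original file (overwrite=TRUE)? If overwrite=TRUE, the original file is copied to a .BAK file before being overwritten.

fixCSV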
tidies up a Comma Separated Value (CSV) file so that it contains a strictly rectangular block of data for input into R (ignoring any preliminary comment rows via the skip= argument).

CSV is a plain text file format for tabular data, in which cell entries in the same row of a table are separated by commas. When such files are exported from other applications such as spreadsheet software, the software has to decide whether any empty cells to the right of, or below, the table or spreadsheet should be represented by trailing commas in the CSV file. Such decisions can result in a ragged table in the CSV file, in which some rows contain fewer commas (short rows) or more commas (long rows) than others, or in which empty rows below the table are included as comma-only rows.
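To see raggedness directly, base R's count.fields() reports the number of comma-separated cells on each line of a file. The sketch below writes a small ragged file (the column names and values are invented for illustration) and counts its cells; any row whose count differs from the header row's is a short or long row.

```r
# Illustration only: a made-up ragged CSV, as a spreadsheet export might produce.
tf <- tempfile(fileext = ".csv")
writeLines(c("id,allele1,allele2",  # header row: 3 cells
             "A,101,103",           # complete row
             "B,99",                # short row: trailing empty cell omitted
             "C,101,105,",          # long row: one extra trailing comma
             ",,,"),                # comma-only row below the table
           tf)

# Number of cells on each line (a trailing comma counts as an extra empty cell)
n_cells <- count.fields(tf, sep = ",")
print(n_cells)

# Lines whose width differs from the header row
which(n_cells != n_cells[1])
```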
While R's read.table
and related functions can
sensibly extend short rows as needed, ragged tables in a CSV file
can still result in errors, unwanted empty rows (below the table)
or unwanted columns (to the right of the table) when the data is
loaded into R.
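The unwanted-rows problem is easy to reproduce. In this sketch (made-up data), read.csv(), whose default is fill = TRUE, pads the short row without complaint, but the comma-only rows that a spreadsheet might export below the table are not blank lines, so they are read in as rows of empty strings and NA values.

```r
# Illustration only: made-up lines from a ragged CSV export.
ragged <- c("id,allele1,allele2",
            "A,101,103",
            "B,99",   # short row: padded with NA because fill = TRUE
            ",,",     # comma-only rows below the table...
            ",,")     # ...are not blank, so they are not skipped

df <- read.csv(text = ragged)
print(df)
nrow(df)  # includes the two spurious rows
```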
fixCSV
reads in a specified CSV file and removes or adds
commas to rows, to ensure that each row in the body of the table
contains the same number of cells as the header row of the table.
Any empty rows below the table are also removed. The resulting
table is then written back to file, either to a new file with
FIXED added to the filename (argument overwrite=FALSE, the default) or overwriting the original file (overwrite=TRUE; the original file is copied to a .BAK file before being overwritten).
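The row-fixing step described above can be sketched in a few lines of base R. This is not the package's implementation: fix_ragged_lines() is a hypothetical helper that works on the lines of the file as a character vector, assumes that no quoted cell contains a comma, and (as a design choice of this sketch) stops rather than silently dropping data if a body row has non-empty cells beyond the header.

```r
# Hypothetical helper, not fixCSV's actual code. Assumes no quoted cell
# contains a comma, so every comma is a cell separator.
fix_ragged_lines <- function(lines, skip = 0) {
  keep_as_is <- seq_along(lines) <= skip
  preamble <- lines[keep_as_is]   # comment rows, copied unchanged
  body <- lines[!keep_as_is]      # header row followed by the data rows

  # Drop empty cells at the right-hand edge of every row
  body <- sub(",+$", "", body)

  # Drop empty (formerly comma-only) rows below the table;
  # empty rows in the interior are kept
  filled <- which(nzchar(body))
  if (length(filled) == 0) stop("no header row found after skipping ", skip, " rows")
  body <- body[seq_len(max(filled))]

  # Cells per row = commas + 1; the header row sets the table width
  n_cells <- nchar(gsub("[^,]", "", body)) + 1L
  width <- n_cells[1]
  too_long <- which(n_cells > width)
  if (length(too_long) > 0) {
    stop("line(s) ", paste(too_long + skip, collapse = ", "),
         " have non-empty cells beyond the header row")
  }

  # Pad short rows with trailing commas up to the header width
  c(preamble, paste0(body, strrep(",", width - n_cells)))
}

ragged <- c("# exported from a spreadsheet",
            "id,allele1,allele2",
            "A,101,103",
            "B,99",
            "C,101,105,",
            ",,,",
            ",,,")
fixed <- fix_ragged_lines(ragged, skip = 1)
writeLines(fixed)
```

Stripping every trailing comma first and then padding back to the header width handles short rows, long rows whose extra cells are empty, and comma-only rows below the table with a single rule.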
Note that:
- Misspecification of the skip= argument (see below) can corrupt the fixed file.
- fixCSV does not remove empty cells, rows or columns within the interior (or on the left side) of the table; it is concerned only with the right and bottom boundaries of the table.
- The skip= argument tells fixCSV to ignore the specified number of comment rows preceding the header row. Such rows are copied into the output file unchanged. The default for this argument is skip=0, so the first row in the data file is assumed to be the header row. As noted above, misspecification of this argument can seriously corrupt the output.
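Because a wrong skip= value silently changes which line is treated as the header row, it can be worth looking at that line before running fixCSV. A minimal sketch with base R's readLines(); header_line() is a hypothetical helper and the file contents are invented.

```r
# Hypothetical helper: return the line that would be treated as the header
# row for a given skip value (line skip + 1 of the file).
header_line <- function(file, skip = 0) {
  readLines(file, n = skip + 1L)[skip + 1L]
}

# Made-up file with two comment rows above the header row
tf <- tempfile(fileext = ".csv")
writeLines(c("# plate 3",
             "# exported from a spreadsheet",
             "id,allele1,allele2",
             "A,101,103"), tf)

header_line(tf, skip = 2)  # the real header row
header_line(tf, skip = 1)  # a comment row: skip is too small
```

Going by the description above, with skip = 1 the comma-free comment row would set the table width to a single cell, and every data row would then be treated as a long row.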
fixCSV can overwrite your data file(s) (via overwrite=TRUE), and although it makes a backup of your original file, you should still make sure that you have a separate backup of your data file in a safe place before using this function! The author of this code takes no responsibility for any data loss or corruption as a result of the use of this routine.
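The same precaution applies to any script that rewrites data files in place: copy the original aside, and refuse to go further if the copy fails. A minimal sketch with base R's file.copy(); backup_then_write() is a hypothetical helper, and naming the backup by replacing .csv with .BAK is an assumption, not necessarily how fixCSV names its backup.

```r
# Hypothetical helper (not part of fixCSV): back up `file`, then overwrite it.
# Assumed naming: "data.csv" -> "data.BAK"; other names get ".BAK" appended.
backup_then_write <- function(file, new_lines) {
  bak <- sub("\\.csv$", ".BAK", file, ignore.case = TRUE)
  if (identical(bak, file)) bak <- paste0(file, ".BAK")

  # file.copy() returns FALSE (rather than overwriting) if `bak` already
  # exists, so an older backup is never clobbered
  if (!file.copy(file, bak, overwrite = FALSE)) {
    stop("could not create backup '", bak, "'; original file left untouched")
  }
  writeLines(new_lines, file)
  invisible(bak)
}

tf <- tempfile(fileext = ".csv")
writeLines(c("id,allele1", "A,101,"), tf)
bak <- backup_then_write(tf, c("id,allele1", "A,101"))
readLines(bak)  # original contents
readLines(tf)   # new contents
```

A second call on the same file stops with an error instead of replacing the first backup.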
## Not run:

## Assuming CSV file 'alleleDataFile.csv' exists in the current
## directory. The following overwrites the CSV file, so make sure
## you have a backup!

fixCSV("alleleDataFile.csv", overwrite = TRUE)

## End(Not run)