inputData reads in an allele dataset from the specified file, then
calls preprocessData to perform a series of data format checks and
preprocessing steps before returning the checked and preprocessed
dataset as an R data frame. The reference information for
preprocessData contains further information on the checks and
preprocessing; it is strongly recommended that you read that
information in addition to the information below.
The use of inputData is optional: you may instead create or load the
allele dataset into R by other means. However, it is then necessary to
call preprocessData on the data frame prior to using any other
analysis functions in this package. Similarly, if you decide to change
or manipulate the data frame contents within R, you should call
preprocessData again on the data frame prior to using any of the
PolyPatEx analysis functions. See the help for preprocessData for
further details.
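
The workflow just described might look roughly like the sketch below. The file name and all characteristic values are hypothetical. The sketch also assumes that the file is the first argument of inputData and the data frame is the first argument of preprocessData, so please check the argument lists in the help for each function. The calls only run if PolyPatEx is installed and the file exists.

```r
# Hypothetical file name, for illustration only.
alleleFile <- "myAlleles.csv"

if (requireNamespace("PolyPatEx", quietly = TRUE) && file.exists(alleleFile)) {
  # Read and preprocess the dataset in one step.
  # The characteristic values below are made-up examples.
  adata <- PolyPatEx::inputData(alleleFile,
                                numLoci = 7,
                                ploidy = 4,
                                dataType = "genotype",
                                dioecious = FALSE,
                                selfCompatible = FALSE)

  # ... any changes you make to adata within R would go here ...

  # After changing the data frame, preprocess it again with the same
  # characteristics before calling other PolyPatEx functions.
  adata <- PolyPatEx::preprocessData(adata,
                                     numLoci = 7,
                                     ploidy = 4,
                                     dataType = "genotype",
                                     dioecious = FALSE,
                                     selfCompatible = FALSE)
}
```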
Note that inputData strips leading and trailing spaces (whitespace)
from each entry in the allele dataset as it is read in. If you load
your data by a means other than inputData, you should ensure that you
perform this step yourself, as preprocessData will not carry out this
necessary step.
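
If you load the data yourself, one way to strip the whitespace is sketched below using base R only. The small data frame and its column names are invented stand-ins for your own allele dataset.

```r
# Stand-in data frame; the column names and values are made up.
raw <- data.frame(id = c(" A01", "A02 "),
                  Locus1 = c(" 101 ", "103"),
                  stringsAsFactors = FALSE)

# Trim leading and trailing whitespace from every character column.
isChar <- vapply(raw, is.character, logical(1))
raw[isChar] <- lapply(raw[isChar], trimws)
```

Alternatively, passing strip.white = TRUE to read.csv removes the whitespace from unquoted fields as the file is read.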
Note also that you should not use spaces in any of your allele codes:
PolyPatEx functions use spaces to separate allele codes as they
process the data, so if allele codes already contain spaces, errors
will occur in this processing. If you need a separator, I recommend
using either . (a period) or _ (an underscore) rather than a space.
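
A quick check for embedded spaces, using base R only, might look like the following sketch. The allele codes are invented examples, and the check is not part of PolyPatEx.

```r
# Made-up allele codes; the second one contains an embedded space.
alleles <- c("101", "10 3", "A.2", "B_7")

# Flag any code that contains a space.
hasSpace <- grepl(" ", alleles, fixed = TRUE)
alleles[hasSpace]

# Replace embedded spaces with underscores. Trim leading and trailing
# whitespace first, otherwise it would also be turned into underscores.
cleaned <- gsub(" ", "_", trimws(alleles), fixed = TRUE)
```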
Neither inputData nor preprocessData will alter the CSV file from
which the data is loaded. They merely return a checked and
preprocessed version of your allele dataset (in the form of an R data
frame) within the R environment, ready for use by other PolyPatEx
functions.
To load the allele dataset into R, inputData calls R's read.csv
function with certain arguments specified. These arguments make
read.csv more stringent about the precise format of the input data
file, requiring in particular that each row of the CSV-formatted data
file contain the correct number of commas. This is not always
guaranteed when the CSV file has been exported from spreadsheet
software. Should you get Error in scan messages complaining about the
number of elements in a line of the input file, consider calling
fixCSV on the data file before calling inputData again. fixCSV
attempts to find and correct such errors in a CSV file; see the help
for this function. Note that if you specify the skip parameter in a
call to fixCSV, you should use the same value for this parameter in
inputData to avoid an error.
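
The kind of error described above can be reproduced with base R alone. The sketch below assumes that fill = FALSE is among the arguments that make read.csv more stringent; the text above does not name the arguments, so treat this as an illustration of the behaviour rather than a copy of what inputData does. The CSV text is invented.

```r
# Made-up CSV text in which the last row is one comma short.
csvText <- "id,Locus1,Locus2\nA01,101,103\nA02,105\n"

# By default read.csv uses fill = TRUE and silently pads the short row.
lenient <- read.csv(text = csvText, colClasses = "character")

# With fill = FALSE the short row makes scan() raise an error, which is
# captured here as a message string instead of stopping the script.
result <- tryCatch(
  read.csv(text = csvText, fill = FALSE, colClasses = "character"),
  error = function(e) conditionMessage(e)
)
print(result)
```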
The various PolyPatEx functions need to know the characteristics of
the dataset being analysed. These are specified in the inputData or
preprocessData calls and are invisibly attached to the allele data
frame that is returned, for use by other PolyPatEx functions. The
required characteristics are:
- numLoci: the number of loci in the dataset.
- ploidy: the ploidy $p$ of the species (currently allowed to be 4, 6,
  or 8; ploidy can also be 2, provided dataType="genotype").
- dataType: whether the data is genotypic (all $p$ alleles at each
  locus are observed) or phenotypic (only the distinct allele states
  at a locus are observed, so alleles that appear more than once in
  the genotype of a locus appear only once in the phenotype).
- dioecious: whether the species is dioecious or monoecious.
- selfCompatible: whether a monoecious species is self-compatible
  (i.e., whether an individual can fertilise itself).
- mothersOnly: whether a dioecious dataset should retain only adult
  females that are mothers of progeny in the dataset. If
  dioecious=TRUE, then mothersOnly must be set to either TRUE or
  FALSE.
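
The constraints stated in the list above can be written as a small check, sketched below. checkCharacteristics is a hypothetical helper for illustration and is not a PolyPatEx function. It encodes only the two rules given above (the ploidy values and the mothersOnly requirement). The string "phenotype" in the usage lines is an assumed value for dataType.

```r
# Hypothetical helper, not part of PolyPatEx: returns a character vector
# describing any violated constraint (empty if all checks pass).
checkCharacteristics <- function(ploidy, dataType, dioecious,
                                 mothersOnly = NULL) {
  problems <- character(0)

  # ploidy may be 4, 6 or 8, or 2 when dataType is "genotype".
  ploidyOK <- ploidy %in% c(4, 6, 8) ||
    (ploidy == 2 && identical(dataType, "genotype"))
  if (!ploidyOK) {
    problems <- c(problems,
                  "ploidy must be 4, 6 or 8 (or 2 with dataType=\"genotype\")")
  }

  # mothersOnly must be TRUE or FALSE when dioecious is TRUE.
  mothersOnlySet <- isTRUE(mothersOnly) || identical(mothersOnly, FALSE)
  if (isTRUE(dioecious) && !mothersOnlySet) {
    problems <- c(problems,
                  "mothersOnly must be TRUE or FALSE when dioecious=TRUE")
  }

  problems
}

# Usage: the first call passes, the other two each report one problem.
checkCharacteristics(4, "genotype", dioecious = FALSE)
checkCharacteristics(2, "phenotype", dioecious = FALSE)  # assumed value
checkCharacteristics(6, "genotype", dioecious = TRUE)
```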