The most common raw data format is a .csv file that contains ecology identifiers, node identifiers, Blau parameters, and memberships, among other variables. Note: unless told otherwise, the blau function automatically assumes that binary columns are memberships and non-binary columns are Blau parameters. Manually specifying memberships or parameters overrides this auto-detection. This makes it easy to work with relatively large datasets without specifying dozens of columns.
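The auto-detection rule can be illustrated with a short Python sketch. This is not Blaunet's own code. It assumes that a column counts as binary when every non-missing value is 0 or 1, and that a manual list overrides only the columns it names while the remaining columns are still auto-detected. The function names detect_roles and is_binary are hypothetical.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def is_binary(values):
    """True if the column has at least one observed value and all are 0 or 1."""
    observed = [v for v in values if not is_missing(v)]
    return len(observed) > 0 and all(v in (0, 1) for v in observed)


def detect_roles(columns, memberships=None, parameters=None):
    """Split columns (name -> list of values) into memberships and parameters.

    Assumption: columns named in a manual list keep that role; every other
    column is auto-detected (binary -> membership, otherwise -> parameter).
    """
    memberships = list(memberships or [])
    parameters = list(parameters or [])
    for name, values in columns.items():
        if name in memberships or name in parameters:
            continue  # manual specification wins over auto-detection
        (memberships if is_binary(values) else parameters).append(name)
    return memberships, parameters
```

For example, with a 0/1 column female and a numeric column age, detect_roles returns female as a membership and age as a parameter; passing parameters=["female"] forces female to be treated as a parameter instead.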
The vast majority of configuration takes place when calling the blau function, so it is essential to understand how the choices made here affect the operation of subsequent functions. The easiest way to get started is to determine which of the four optional arguments are present in your dataset and will be used for analysis: node identifiers (node.ids), ecology identifiers (ecology.ids), weights (weights), and relational data (graph). Specify each one by indicating its location with the corresponding function argument; the blau function then assumes that all other columns are either membership or Blau parameter (demographic) columns. Columns that should be left out of the analysis can be listed with the exclude argument. This setup is appropriate for the vast majority of datasets.
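The column bookkeeping described above can be sketched in Python as follows. This is a hypothetical helper (assign_columns), not part of the package. It assumes column locations are given by name rather than by position, and it leaves out the relational data (graph) for simplicity.

```python
def assign_columns(all_columns, node_ids=None, ecology_ids=None,
                   weights=None, exclude=None):
    """Return (reserved, remaining) for a list of column names.

    reserved maps each optional role that was supplied to its column;
    remaining lists the columns left over for membership/parameter
    auto-detection.
    """
    reserved = {role: col for role, col in (("node.ids", node_ids),
                                            ("ecology.ids", ecology_ids),
                                            ("weights", weights))
                if col is not None}
    # Reserved columns and explicitly excluded columns are both skipped.
    skip = set(reserved.values()) | set(exclude or [])
    remaining = [c for c in all_columns if c not in skip]
    return reserved, remaining
```

With columns person, city, w, age, female, and income, calling assign_columns(cols, node_ids="person", ecology_ids="city", exclude=["w"]) leaves age, female, and income for auto-detection. Dropping exclude=["w"] would leave the weight column w in that list, where it would be classified like any other column.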
Any information incorporated into the blau object through this function is used when calling subsequent functions. For instance, if your analysis does not require weights but the dataset contains a weight column, exclude that column explicitly with the exclude argument; otherwise the auto-detection described above will treat it as a Blau parameter (or, if it is binary, as a membership).
If ecology identifiers are provided, all subsequent analyses automatically proceed ecology by ecology, unless a subsequent function explicitly specifies otherwise.
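By-ecology processing amounts to splitting the rows by ecology identifier and computing each result within its group. The Python sketch below is a hypothetical illustration of that idea, not how Blaunet implements it; rows are assumed to be dictionaries, and statistic is any function that takes a list of rows.

```python
from collections import defaultdict


def by_ecology(rows, ecology_key, statistic):
    """Apply statistic separately to the rows of each ecology."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[ecology_key]].append(row)
    # One result per ecology identifier.
    return {eco: statistic(members) for eco, members in groups.items()}
```

For example, passing a mean-age function returns one mean per city rather than a single pooled value.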
For network information, the most important consideration is that node identifiers are indicated correctly, so that the nodes in the network can be matched to the node identifiers supplied with the node.ids argument. Both adjacency-matrix and edgelist inputs are converted to a network object. The preferred format is a named edgelist: two columns, where each row lists the names of the two nodes joined by an edge.
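To see how the two input formats relate, here is a Python sketch (hypothetical adjacency_to_edgelist, not a Blaunet function) that turns a square 0/1 adjacency matrix with named rows and columns into the preferred named two-column edgelist. It assumes self-ties on the diagonal are ignored and that, for undirected data, the matrix is symmetric, so only the upper triangle needs to be read.

```python
def adjacency_to_edgelist(matrix, names, directed=True):
    """Convert a square adjacency matrix to a list of (name, name) edges."""
    n = len(names)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be n x n, where n = len(names)")
    edges = []
    for i in range(n):
        # Undirected: read only the upper triangle so each tie is listed once.
        first_col = 0 if directed else i + 1
        for j in range(first_col, n):
            # Skip the diagonal (self-ties) and zero cells (no tie).
            if i != j and matrix[i][j]:
                edges.append((names[i], names[j]))
    return edges
```

For the names ann, bob, and cat with ties ann to bob and bob to cat in a symmetric matrix, the undirected call returns [("ann", "bob"), ("bob", "cat")].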
If node names are numeric rather than character, they should still be specified with node.ids, and the network input should use the same identifiers.
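A quick check before building the object is to confirm that every node named in the edgelist also appears among the node identifiers. The Python sketch below (hypothetical unmatched_endpoints, not a package function) compares both sides as strings, so a numeric identifier such as 101 matches the text "101".

```python
def unmatched_endpoints(edgelist, node_ids):
    """Return edge endpoints (as strings) that are not among node_ids.

    Both sides are converted with str() so numeric and character
    identifiers for the same node compare equal.
    """
    known = {str(n) for n in node_ids}
    missing = set()
    for sender, receiver in edgelist:
        for node in (sender, receiver):
            if str(node) not in known:
                missing.add(str(node))
    return sorted(missing)
```

Comparing as strings is a simplifying assumption: a float such as 101.0 becomes "101.0" and would not match "101", so numeric identifiers are best stored as integers. An empty result means every edge can be matched to a node.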
If complete.cases is FALSE (the default), as much information as possible is used to compute niche boundaries. For example, an individual may be missing a value for one Blau parameter; under the default settings, that individual's other demographic information is still used to compute niche boundaries. If complete.cases is TRUE, only observations with no missing values anywhere in the input matrix are used to determine boundaries.
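The difference between the two settings is the difference between pairwise and listwise deletion. The Python sketch below illustrates it under stated assumptions and is not Blaunet's implementation: niche boundaries are taken as the mean plus or minus k sample standard deviations on each dimension, with k a hypothetical parameter, and the "input matrix" is just the dimension columns.

```python
import math
import statistics


def _missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def niche_bounds(rows, dims, complete_cases=False, k=1.5):
    """Return {dim: (lower, upper)} as mean +/- k * sample SD.

    complete_cases=False: each dimension uses every row observed on it.
    complete_cases=True: only rows observed on all dims are kept.
    Each dimension needs at least two observed values for stdev().
    """
    if complete_cases:
        rows = [r for r in rows if not any(_missing(r[d]) for d in dims)]
    bounds = {}
    for d in dims:
        values = [r[d] for r in rows if not _missing(r[d])]
        m, s = statistics.mean(values), statistics.stdev(values)
        bounds[d] = (m - k * s, m + k * s)
    return bounds
```

With complete_cases=False, a row missing only edu still contributes its age value to the age boundaries; with complete_cases=True that row is dropped entirely, which shifts the age boundaries.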