
pcvr (version 1.2.0)

mv_ag: Multi Value Trait Aggregation function

Description

Computing earth mover's distance (EMD) becomes expensive with large datasets. For example, a LemnaTech dataset filtered to images from every 5th day yields 6332^2 = 40,094,224 pairwise EMD values. In long format that is a dataframe of roughly 40 million rows, which is unwieldy. This function reduces the size of a dataset before histograms are compared, so that matrix methods or network analysis remain practical.
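
The size arithmetic above can be checked directly in base R. The sketch below also shows how quickly the pairwise count shrinks once the data are aggregated; the figure of 100 aggregated rows is a hypothetical number chosen only for illustration.

```r
# Number of histograms (images) in the example dataset from the text
n_images <- 6332

# A full pairwise comparison has one EMD value per ordered pair of images
n_pairs <- n_images^2
format(n_pairs, big.mark = ",")

# Assumption for illustration only: aggregation leaves 100 rows
n_aggregated <- 100
n_pairs_aggregated <- n_aggregated^2
format(n_pairs_aggregated, big.mark = ",")
```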

Usage

mv_ag(
  df,
  group,
  mvCols = "frequencies",
  n_per_group = 1,
  outRows = NULL,
  keep = NULL,
  parallel = getOption("mc.cores", 1),
  traitCol = "trait",
  labelCol = "label",
  valueCol = "value",
  id = "image"
)

Value

Returns a dataframe summarized by the specified groups over the multi-value traits.
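
As a rough mental model of "summarized by the specified groups", the base R sketch below averages each histogram bin within each group, which corresponds to the simplest case of one output row per group. This is an illustration only and not mv_ag's implementation, which may sample, combine, or rescale rows differently; the data and the column names `bin_1` and `bin_2` are hypothetical.

```r
# Hypothetical wide data: four images in two groups, two histogram bins
d <- data.frame(
  group = c("a", "a", "b", "b"),
  bin_1 = c(0.2, 0.4, 0.6, 0.8),
  bin_2 = c(0.8, 0.6, 0.4, 0.2)
)

# Average each bin column within each group (one output row per group)
agg <- aggregate(cbind(bin_1, bin_2) ~ group, data = d, FUN = mean)
agg
```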

Arguments

df

A dataframe with multi value traits. This can be in wide or long format; the data is assumed to be long if traitCol, valueCol, and labelCol are all present.
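
To make the two layouts concrete, the sketch below builds a tiny hypothetical dataset in both forms using base R. The column names follow the defaults in the usage section ("trait", "label", "value", "image"). The data values, the "group" column, and the helper `looks_long()` are illustrative assumptions and not necessarily how mv_ag inspects its input.

```r
# Wide format: one row per image, one column per histogram bin
wide <- data.frame(
  image = c("img1", "img2"),
  group = c("a", "a"),
  frequencies_1 = c(0.2, 0.3),
  frequencies_2 = c(0.8, 0.7)
)

# Long format: one row per image and bin
long <- data.frame(
  image = rep(c("img1", "img2"), each = 2),
  group = "a",
  trait = "frequencies",
  label = rep(c(1, 2), times = 2),
  value = c(0.2, 0.8, 0.3, 0.7)
)

# Hypothetical helper: treat data as long if all three columns exist
looks_long <- function(d, cols = c("trait", "label", "value")) {
  all(cols %in% colnames(d))
}
looks_long(wide)
looks_long(long)
```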

group

Vector of column names for the variables that uniquely identify the groups to summarize over. Typically this would be the design variables and a time variable.

mvCols

Either a vector of column names/positions representing multi value traits or a character string that identifies the multi value trait columns as a regex pattern. Defaults to "frequencies".
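
The two ways of specifying columns can be compared with base R. The sketch below uses `grepl()` to show which names a pattern such as the default "frequencies" would match; the column names are hypothetical, and the exact matching function used inside mv_ag is not stated in this page.

```r
# Hypothetical column names for a wide multi value trait dataset
cols <- c("image", "group", "area",
          "frequencies_1", "frequencies_2", "frequencies_3")

# Columns matched by the default pattern "frequencies"
mv_by_pattern <- cols[grepl("frequencies", cols)]

# The same columns given explicitly by position
mv_by_position <- cols[4:6]

identical(mv_by_pattern, mv_by_position)
```

A pattern is convenient when bin columns share a prefix; explicit names or positions are safer when other columns happen to contain the same string.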

n_per_group

Number of rows to return for each group.

outRows

An optional alternative way to specify how many rows to return. The number of rows returned will often not match this value exactly, because every group is given the same number of observations.
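
The reason a row target is rarely met exactly can be seen with a little arithmetic. The sketch below shows one plausible way a total row target could be turned into a per-group count; the rounding rule is an assumption made for illustration and may differ from what mv_ag actually does.

```r
# Hypothetical inputs: 7 groups and a requested total of 50 rows
n_groups <- 7
out_rows <- 50

# Assumed rule: equal rows per group, rounded, with at least one row each
per_group <- max(1, round(out_rows / n_groups))

# Every group gets the same count, so the total may miss the target
actual_rows <- per_group * n_groups
c(requested = out_rows, returned = actual_rows)
```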

keep

A vector of single value traits to also average over groups, if there is a mix of single and multi value traits in your data.

parallel

Optionally the groups can be run in parallel using this number of cores. Defaults to 1 if the "mc.cores" option is not set globally.
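
The default expression `getOption("mc.cores", 1)` from the usage section can be tried on its own in base R. The sketch below shows the fallback to 1 and the effect of setting the option globally; the value 2 is an arbitrary choice for illustration.

```r
# Clear the option for the demonstration, remembering the previous setting
old <- options(mc.cores = NULL)

# With the option unset, the default expression falls back to 1
cores_unset <- getOption("mc.cores", 1)

# After setting the option globally, the same expression picks it up
options(mc.cores = 2)
cores_set <- getOption("mc.cores", 1)

# Restore the previous setting
options(old)
```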

traitCol

Column with phenotype names, defaults to "trait".

labelCol

Column with phenotype labels (units), defaults to "label".

valueCol

Column with phenotype values, defaults to "value".

id

Column that uniquely identifies images if the data is in long format. This is ignored when data is in wide format.

Examples


s1 <- mvSim(
  dists = list(runif = list(min = 15, max = 150)),
  n_samples = 10,
  counts = 1000,
  min_bin = 1,
  max_bin = 180,
  wide = TRUE
)
mv_ag(s1, group = "group", mvCols = "sim_", n_per_group = 2)
