GatherPreprocessing: Gather preprocess data from a formula

Description

Gather the preprocess data from a formula and a model, where the output corresponds to the data structure used by the engine gather_compute; see estimate().

Usage

GatherPreprocessing(
  formula,
  model = c("DyNAM", "REM"),
  subModel = c("choice", "choice_coordination", "rate"),
  preprocessArgs = NULL,
  progress = getOption("progress"),
  envir = new.env()
)

Value

a list object including:

stat_all_events: a matrix. The number of rows can be up to the number of events times the number of actors (square number of actors for the REM). Rigth-censored events are included when the model has an intercept. The number of columns is the number of effects in the model. Every row is the effect statistics at the time of the event for each actor in the choice set or the sender set.
n_candidates: a numeric vector with the number of rows related with an event. The length correspond to the number of events plus right censored events if any.
selected: a numeric vector with the position of the selected actor (choice model), sender actor (rate model), or active dyad (choice-coordination model, REM model). Indexing start at 1 for each event.
sender, receiver: a character vector with the label of the sender/receiver actor. For right-censored events the receiver values is not meaningful.
hasIntercept: a logical value indicating if the model has an intercept.
namesEffects: a character vector with a short name of the effect. It includes the name of the object used to calculate the effects and modifiers of the effect, e.g., the type of effect, weighted effect.
effectDescription: a character matrix with the description of the effects. It includes the name of the object used to calculate the effects and additional information of the effect, e.g., the type of effect, weighted effect, transformation function, window length.

If the model has an intercept and the subModel is rate or model is REM, additional elements are included:

timespan: a numeric vector with the time span between events, including right-censored events.
isDependent: a logical vector indicating if the event is dependent or right-censored.

Arguments

formula

a formula object that defines at the left-hand side the dependent network (see defineDependentEvents()) and at the right-hand side the effects and the variables for which the effects are expected to occur (see vignette("goldfishEffects")).

model

a character string defining the model type. Current options include "DyNAM", "DyNAMi" or "REM"

DyNAM: Dynamic Network Actor Models (Stadtfeld, Hollway and Block, 2017 and Stadtfeld and Block, 2017)

DyNAMi

Dynamic Network Actor Models for interactions (Hoffman et al., 2020)

REM

Relational Event Model (Butts, 2008)

subModel

a character string defining the submodel type. Current options include "choice", "rate" or "choice_coordination"

choice: a multinomial receiver choice model model = "DyNAM" (Stadtfeld and Block, 2017), or the general Relational event model model = "REM" (Butts, 2008). A multinomial group choice model model = "DyNAMi" (Hoffman et al., 2020)

choice_coordination

a multinomial-multinomial model for coordination ties model = "DyNAM" (Stadtfeld, Hollway and Block, 2017)

rate

A individual activity rates model model = "DyNAM" (Stadtfeld and Block, 2017). Two rate models, one for individuals joining groups and one for individuals leaving groups, jointly estimated model = "DyNAMi"(Hoffman et al., 2020)

preprocessArgs

a list containing additional parameters for preprocessing. It may contain:

startTime: a numerical value or a date-time character with the same time-zone formatting as the times in event that indicates the starting time to be considered during estimation. Note: it is only use during preprocessing

endTime

a numerical value or a date-time character with the same time-zone formatting as the times in event that indicates the end time to be considered during estimation. Note: it is only use during preprocessing

opportunitiesList

a list containing for each dependent event the list of available nodes for the choice model, this list should be the same length as the dependent events list (ONLY for choice models).

progress

logical indicating whether should print a minimal output to the console of the progress of the preprocessing and estimation processes.

envir

an environment where formula objects and their linked objects are available.

Details

It differs from the estimate() output when the argument preprocessingOnly is set to TRUE regarding the memory space requirement. The gatherPreprocessing() produces a list where the first element is a matrix that could have up to the number of events times the number of actors rows and the number of effects columns. For medium to large datasets with thousands of events and thousands of actors, the memory RAM requirements are large and, therefore, errors are produced due to a lack of space. The advantage of the data structure is that it can be adapted to estimate the models (or extensions of them) using standard packages for generalized linear models (or any other model) that use tabular data as input.

Examples

Run this code

data("Fisheries_Treaties_6070")
states <- defineNodes(states)
states <- linkEvents(states, sovchanges, attribute = "present")
states <- linkEvents(states, regchanges, attribute = "regime")
states <- linkEvents(states, gdpchanges, attribute = "gdp")

bilatnet <- defineNetwork(bilatnet, nodes = states, directed = FALSE)
bilatnet <- linkEvents(bilatnet, bilatchanges, nodes = states)

createBilat <- defineDependentEvents(
  events = bilatchanges[bilatchanges$increment == 1, ],
  nodes = states, defaultNetwork = bilatnet
)

contignet <- defineNetwork(contignet, nodes = states, directed = FALSE)
contignet <- linkEvents(contignet, contigchanges, nodes = states)

gatheredData <- GatherPreprocessing(
  createBilat ~ inertia(bilatnet) + trans(bilatnet) + tie(contignet)
)

Run the code above in your browser using DataLab