How it Works:
First, n_sim i.i.d. samples are drawn for the root nodes. Children of these nodes are then generated one by one according to the specified relationships and causal coefficients. For example, suppose there are two root nodes, age and sex, generated from a normal distribution and a Bernoulli distribution respectively. The child node height is then generated using both of these variables as parents, according to a linear regression with defined coefficients, intercept and sigma (random error). This works because every DAG has at least one topological ordering, which is a linear ordering of vertices such that for every directed edge \(u \to v\), vertex \(u\) comes before \(v\) in the ordering. Setting sort_dag=TRUE ensures that the nodes are processed in such an ordering.
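The procedure described above can be sketched by hand in base R. This is only an illustration of the idea, not the internal implementation of this function: the helper topo_order, the parents and generators lists, and all coefficient values (intercept 120, slopes 0.3 and 10, error standard deviation 5) are hypothetical and chosen for the example.

```r
set.seed(42)
n_sim <- 1000

# Hypothetical DAG: age -> height <- sex, given as a list of parents per node.
# The child node is deliberately listed first to show that ordering matters.
parents <- list(
  height = c("age", "sex"),
  age = character(0),
  sex = character(0)
)

# Simple topological ordering: repeatedly pick all nodes whose parents
# have already been placed in the ordering.
topo_order <- function(parents) {
  ordered <- character(0)
  remaining <- names(parents)
  while (length(remaining) > 0) {
    is_ready <- vapply(
      remaining,
      function(v) all(parents[[v]] %in% ordered),
      logical(1)
    )
    ready <- remaining[is_ready]
    if (length(ready) == 0) stop("graph contains a cycle")
    ordered <- c(ordered, ready)
    remaining <- setdiff(remaining, ready)
  }
  ordered
}

# One generating function per node (all parameter values are assumptions).
generators <- list(
  age = function(data) rnorm(n_sim, mean = 50, sd = 4),
  sex = function(data) rbinom(n_sim, size = 1, prob = 0.5),
  height = function(data) {
    # linear regression on the parents plus normally distributed error
    120 + 0.3 * data$age + 10 * data$sex + rnorm(n_sim, mean = 0, sd = 5)
  }
)

# Generate the nodes one by one in topological order, so that the
# parents of a node always exist before the node itself is drawn.
data <- list()
for (v in topo_order(parents)) {
  data[[v]] <- generators[[v]](data)
}
sim <- as.data.frame(data)
head(sim)
```

Without the sorting step, height would be processed first and fail because age and sex do not exist yet.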
This procedure is simple in theory, but can get very complex when coded manually. This function offers a simplified workflow by only requiring the user to define the dag object with appropriate information (see documentation of the node function). A sample of size n_sim is then generated from the DAG specified by that object.
Specifying the DAG:
Concrete details on how to specify the needed dag object are given in the documentation page of the node function and in the vignettes of this package.
Can this function create longitudinal data?
Yes and no. It theoretically can, but only if the user-specified dag directly specifies a node for each desired point in time. Using the sim_discrete_time function is better in some cases. A brief discussion about this topic can be found in the vignettes of this package.
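To illustrate what "a node for each desired point in time" means, here is a minimal base R sketch of the resulting data structure. It does not use any function of this package; the variable names bp_t1 to bp_t3 and all coefficients are hypothetical.

```r
set.seed(1)
n_sim <- 500

# Hypothetical example: one variable measured at three points in time.
# Each time point is its own node, with the previous time point as parent.
bp_t1 <- rnorm(n_sim, mean = 120, sd = 10)
bp_t2 <- 12 + 0.9 * bp_t1 + rnorm(n_sim, mean = 0, sd = 5)
bp_t3 <- 12 + 0.9 * bp_t2 + rnorm(n_sim, mean = 0, sd = 5)

# The result is longitudinal data in wide format: one row per individual,
# one column per time point.
wide <- data.frame(id = seq_len(n_sim), bp_t1, bp_t2, bp_t3)
head(wide)
```

The number of nodes grows with the number of time points, which is why this approach quickly becomes unwieldy for longer follow-up periods.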
If time-dependent nodes were added to the dag using node_td calls, this function cannot be used. Only the sim_discrete_time function will work in that case.