- x
an object of class dsm
- subset
Boolean expression or index vector selecting a subset of the rows; the expression can use the variables term and f to access target terms and their marginal frequencies, nnzero for the number of nonzero elements in each row, further optional variables from the row information table, as well as global variables such as the sample size N
- select
Boolean expression or index vector selecting a subset of the columns; the expression can use the variables term and f to access feature terms and their marginal frequencies, nnzero for the number of nonzero elements in each column, further optional variables from the column information table, as well as global variables such as the sample size N
- recursive
if TRUE and both subset and select conditions are specified, the subset is applied repeatedly until the DSM no longer changes. This is typically needed if conditions on nonzero counts or row/column norms are specified, which may be affected by the subsetting procedure.
- drop.zeroes
if TRUE, all rows and columns without any nonzero entries after subsetting are removed from the model (nonzero counts are based on the score matrix \(S\) if available, and on the raw cooccurrence frequencies \(M\) otherwise)
- matrix.only
if TRUE, return only the selected subset of the score matrix \(S\) (if available) or frequency matrix \(M\), not a full DSM object. This may conserve a substantial amount of memory when processing very large DSMs.
- envir
environment in which the subset and select conditions are evaluated. Defaults to the context of the function call, so all variables visible there can be used in the expressions.
- run.gc
whether to run the garbage collector after each iteration of a recursive subset (recursive=TRUE) in order to keep memory overhead as low as possible. This option should only be specified if memory is very tight, since garbage collector runs can be expensive (e.g. when there are many distinct strings in the workspace).
- ...
any further arguments are silently ignored
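
The arguments above might be combined as in the following minimal sketch. It assumes the wordspace package is installed and uses its small example model DSM_TermTerm; the thresholds and the terms "cat" and "dog" are illustrative assumptions, not recommended settings.

```r
library(wordspace)

# DSM_TermTerm: small example model assumed to ship with wordspace
model <- DSM_TermTerm

# Keep rows and columns that each have at least 3 nonzero entries.
# Removing rows can lower the nonzero counts of columns (and vice versa),
# so the conditions are re-applied until the model stops changing.
sub <- subset(model,
              subset = nnzero >= 3,
              select = nnzero >= 3,
              recursive = TRUE,
              drop.zeroes = TRUE)

# Select rows by target term and return only the matrix,
# not a full DSM object (illustrative terms)
M <- subset(model, subset = term %in% c("cat", "dog"), matrix.only = TRUE)
```

Because subset and select are evaluated as expressions, variables such as nnzero, term and f are written unquoted; variables from the calling context are also visible unless envir is changed.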