Test whether a co-occurrence matrix is represented in a canonical DSM format, or convert a matrix to canonical format.
dsm.is.canonical(x, nonneg.check = FALSE)
dsm.canonical.matrix(x, triplet = FALSE, annotate = FALSE, nonneg.check = FALSE)
dsm.is.canonical() returns a data frame containing a single row with the following items:
sparse: whether x is a sparse (TRUE) or dense (FALSE) matrix
canonical: whether x is in canonical format
nonneg: whether all cells of x are non-negative; may be NA if nonneg.check=FALSE
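As a minimal sketch of the test function, the following builds a toy matrix (values invented purely for illustration) in two representations and prints the one-row data frame returned for each. It assumes the wordspace and Matrix packages are installed; the variable names are hypothetical.

```r
library(Matrix)     # sparse matrix classes (dgCMatrix, dgTMatrix)
library(wordspace)  # dsm.is.canonical()

# Toy 2 x 3 co-occurrence matrix as an ordinary dense R matrix
M.dense <- matrix(c(2, 0, 1,
                    0, 3, 0), nrow = 2, byrow = TRUE)

# The same data in sparse triplet representation
M.triplet <- as(Matrix(M.dense, sparse = TRUE), "TsparseMatrix")

# Each call returns a data frame with a single row of flags
flags.dense   <- dsm.is.canonical(M.dense, nonneg.check = TRUE)
flags.triplet <- dsm.is.canonical(M.triplet)

print(flags.dense)
print(flags.triplet)
```

The flags can then be read like ordinary data frame columns, e.g. flags.triplet$canonical.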
dsm.canonical.matrix() returns a matrix in canonical DSM format, i.e.
of class matrix for a dense matrix (even if x is a denseMatrix object);
of class dgCMatrix for a sparse matrix.
If triplet=TRUE and x is sparse, it returns a matrix of class dgTMatrix, which is not a canonical format.
If annotate=TRUE, the returned matrix has attributes sparse and nonneg (possibly NA).
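The return values described above can be sketched as follows. This is an illustrative example on invented toy data, assuming the wordspace and Matrix packages are installed; the variable names are hypothetical.

```r
library(Matrix)
library(wordspace)

# Toy sparse matrix in triplet representation (not a canonical format)
M.dense   <- matrix(c(2, 0, 1,
                      0, 3, 0), nrow = 2, byrow = TRUE)
M.triplet <- as(Matrix(M.dense, sparse = TRUE), "TsparseMatrix")

# Default: convert to the canonical column-compressed sparse format
M.canon <- dsm.canonical.matrix(M.triplet)
is(M.canon, "dgCMatrix")

# triplet=TRUE: request triplet format instead (documented as non-canonical)
M.trip <- dsm.canonical.matrix(M.canon, triplet = TRUE)
is(M.trip, "dgTMatrix")

# annotate=TRUE: attach the sparse and nonneg attributes to the result
M.annot <- dsm.canonical.matrix(M.triplet, annotate = TRUE, nonneg.check = TRUE)
attr(M.annot, "sparse")
attr(M.annot, "nonneg")
```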
x: a dense or sparse DSM co-occurrence matrix
nonneg.check: if TRUE, check whether all elements of the matrix are non-negative
triplet: if TRUE and if x is sparse, return a matrix in triplet format (class dgTMatrix) rather than in column-compressed format (class dgCMatrix). Note that this is not a canonical DSM format.
annotate: if TRUE, annotate x with attributes sparse and nonneg, indicating whether the matrix is in sparse representation and non-negative, respectively. Non-negativity is only checked if nonneg.check=TRUE; otherwise an existing attribute will be passed through without validation.
Stephanie Evert (https://purl.org/stephanie.evert)
Note that conversion into canonical format may result in unnecessary copying of x, especially if annotate=TRUE.
For optimal performance, set annotate=FALSE whenever possible and do not call dsm.canonical.matrix() as a no-op.
Instead of
M <- dsm.canonical.matrix(M, annotate=TRUE, nonneg.check=TRUE)
use
M.flags <- dsm.is.canonical(M, nonneg.check=FALSE)
if (!M.flags$canonical) M <- dsm.canonical.matrix(M)
M.flags <- dsm.is.canonical(M, nonneg.check=TRUE)
If nonneg.check=FALSE and x has an attribute nonneg, its value is accepted without validation.
Checking non-negativity can be expensive and create substantial memory overhead. It is guaranteed to be efficient for a matrix in canonical format.
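The pass-through of an existing nonneg attribute can be illustrated with a small sketch. It assumes the wordspace package is installed and uses an invented toy matrix; the attribute is set by hand here only to show that it is trusted rather than recomputed when nonneg.check=FALSE.

```r
library(wordspace)

# Toy dense matrix with non-negative cells
M <- matrix(c(1, 0, 2, 5), nrow = 2)

# The caller asserts non-negativity; nothing is verified at this point
attr(M, "nonneg") <- TRUE

# Default nonneg.check=FALSE: the existing attribute is accepted as-is
flags.trusted <- dsm.is.canonical(M)

# nonneg.check=TRUE: the cells are actually inspected
flags.checked <- dsm.is.canonical(M, nonneg.check = TRUE)

print(flags.trusted)
print(flags.checked)
```

Because the attribute is not validated, a stale or incorrect value would be reported unchanged, so the explicit check is the safer choice whenever the matrix may have been modified.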