quanteda (version 0.7.2-1)

dfm-class: Virtual class "dfm" for a document-feature matrix

Description

The dfm class is a type of Matrix-class object with additional slots, described below. quanteda uses two subclasses of the dfm class, depending on how the object is stored: dfmSparse if the object can be represented by a sparse matrix, and dfmDense if it is dense. See Details.

Usage

## S4 method for signature 'dfm':
t(x)

## S4 method for signature 'dfmSparse':
colSums(x, na.rm = FALSE, dims = 1L, ...)

## S4 method for signature 'dfmDense':
colSums(x, na.rm = FALSE, dims = 1L, ...)

## S4 method for signature 'dfmSparse':
rowSums(x, na.rm = FALSE, dims = 1L, ...)

## S4 method for signature 'dfmDense':
rowSums(x, na.rm = FALSE, dims = 1L, ...)

## S4 method for signature 'dfm':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmDense,index,index,missing':
[(x, i = NULL, j = NULL, ..., drop = FALSE)

## S4 method for signature 'dfmDense,index,index,logical':
[(x, i = NULL, j = NULL, ..., drop = FALSE)

## S4 method for signature 'dfmDense,index,missing,missing':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmDense,index,missing,logical':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmDense,missing,index,missing':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmDense,missing,index,logical':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmDense,missing,missing,missing':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmDense,missing,missing,logical':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmSparse,index,index,missing':
[(x, i = NULL, j = NULL, ..., drop = FALSE)

## S4 method for signature 'dfmSparse,index,index,logical':
[(x, i = NULL, j = NULL, ..., drop = FALSE)

## S4 method for signature 'dfmSparse,index,missing,missing':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmSparse,index,missing,logical':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmSparse,missing,index,missing':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmSparse,missing,index,logical':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmSparse,missing,missing,missing':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmSparse,missing,missing,logical':
[(x, i, j, ..., drop = FALSE)

## S4 method for signature 'dfmSparse,numeric':
+(e1, e2)

## S4 method for signature 'numeric,dfmSparse':
+(e1, e2)

## S4 method for signature 'dfmDense,numeric':
+(e1, e2)

## S4 method for signature 'numeric,dfmDense':
+(e1, e2)

## S4 method for signature 'dfm':
as.matrix(x)

## S4 method for signature 'dfm':
as.data.frame(x)

Arguments

x
the dfm object
na.rm
if TRUE, omit missing values (including NaN) from the calculations
dims
ignored
...
additional arguments not used here
i
index for documents
j
index for features
drop
always set to FALSE
e1
first quantity in "+" operation for dfm
e2
second quantity in "+" operation for dfm
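
To make the roles of i, j and drop concrete, here is a minimal sketch that uses a plain sparse matrix from the Matrix package as a stand-in for a dfm. The object toy and its document and feature names are invented for illustration; the sketch assumes that dfm objects behave like their Matrix parents for these operations, which this page implies but the code does not verify.

```r
library(Matrix)

# Toy stand-in for a dfm: 3 documents (rows) by 4 features (columns).
# All names and counts here are hypothetical.
toy <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 3, 2, 1, 4),
  x = c(2, 1, 5, 1, 3),
  dims = c(3, 4),
  dimnames = list(c("doc1", "doc2", "doc3"),
                  c("the", "of", "people", "union"))
)

# i selects documents and j selects features. With drop = FALSE the
# result stays two-dimensional even when a single row is selected.
one_doc <- toy["doc2", , drop = FALSE]
dim(one_doc)

two_features <- toy[, c("the", "union"), drop = FALSE]
dim(two_features)

# Marginal totals: counts per document and counts per feature.
doc_totals <- rowSums(toy)
feature_totals <- colSums(toy)
```

Keeping drop = FALSE means a one-document subset is still a matrix with a feature dimension, so later code can treat it like any other document-feature matrix.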

Details

The dfm class is a virtual class with two subclasses that hold the cell counts of a document-feature matrix: dfmSparse and dfmDense.

The dfmSparse class is a sparse matrix version of dfm-class, inheriting dgCMatrix-class from the Matrix package. It is the default object type created when feature counts are the object of interest, as typical text-based feature counts tend to contain many zeroes. As long as subsequent transformations of the dfm preserve cells with zero counts, the dfm should remain sparse.
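
The point about zero-preserving transformations can be sketched with the Matrix package alone. The example below is an illustration on a plain dgCMatrix, not on a dfm; the object counts is hypothetical, and the assumption is that dfmSparse inherits this behaviour from its parent class.

```r
library(Matrix)

# A small sparse count matrix: 2 rows, 5 columns, 3 non-zero cells.
counts <- sparseMatrix(
  i = c(1, 2, 2),
  j = c(1, 2, 4),
  x = c(3, 1, 2),
  dims = c(2, 5)
)
class(counts)

# Multiplying by a constant maps 0 to 0, so the zero cells are preserved.
scaled <- counts * 2
is(scaled, "sparseMatrix")

# Adding a constant turns every zero cell into a non-zero value,
# so nothing is left for a sparse representation to skip.
shifted <- counts + 1
class(shifted)
sum(as.matrix(shifted) == 0)
```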

When the Matrix package implements sparse integer matrices, we will switch the default object class to this object type, as integers are 4 bytes each (compared to the current numeric double type, which requires 8 bytes per cell).
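
The 4-byte versus 8-byte difference can be checked in base R with plain vectors. This is only a rough sketch of per-cell storage cost; it says nothing about how a future sparse integer class would be implemented.

```r
# Compare the memory used by one million integers and one million doubles.
n <- 1e6
int_bytes <- as.numeric(object.size(integer(n)))
dbl_bytes <- as.numeric(object.size(numeric(n)))

# The ratio is close to 2, apart from a small fixed per-object overhead.
dbl_bytes / int_bytes
```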

The dfmDense class is a dense matrix version of dfm-class, inheriting dgeMatrix-class from the Matrix package. dfm objects that are converted through weighting or other transformations into cells without zeroes will be automatically converted to the dfmDense class. This will necessarily be a much larger object than one of dfmSparse class, because each cell is recorded as a numeric (double) type requiring 8 bytes of storage.
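
The size difference between sparse and dense storage can be illustrated as follows. This sketch uses a randomly filled sparse matrix and a base R matrix as a stand-in for dense storage; the dimensions and the number of non-zero cells are arbitrary assumptions, not figures from quanteda.

```r
library(Matrix)

set.seed(1)
n_docs <- 200    # hypothetical number of documents
n_feats <- 5000  # hypothetical number of features
nnz <- 2000      # number of (row, column) positions to fill

# Positions that are drawn more than once have their values summed.
sparse_counts <- sparseMatrix(
  i = sample(n_docs, nnz, replace = TRUE),
  j = sample(n_feats, nnz, replace = TRUE),
  x = rep(1, nnz),
  dims = c(n_docs, n_feats)
)

# A dense copy stores every cell as a double, zero or not.
dense_counts <- as.matrix(sparse_counts)

sparse_bytes <- as.numeric(object.size(sparse_counts))
dense_bytes <- as.numeric(object.size(dense_counts))

# The dense copy needs at least 8 bytes for each of its cells.
dense_bytes >= n_docs * n_feats * 8
dense_bytes / sparse_bytes
```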

See Also

dfm

Examples

# Build the same dfm with sparse (default) and dense storage
dfmSparse <- dfm(inaugTexts, verbose = FALSE)
dfmDense <- dfm(inaugTexts, verbose = FALSE, matrixType = "dense")

# Convert each to a matrix and compare the results
str(as.matrix(dfmSparse))
class(as.matrix(dfmSparse))
str(as.matrix(dfmDense))
class(as.matrix(dfmDense))
identical(as.matrix(dfmSparse), as.matrix(dfmDense))

# Convert each to a data.frame and compare the results
str(as.data.frame(dfmSparse))
class(as.data.frame(dfmSparse))
str(as.data.frame(dfmDense))
class(as.data.frame(dfmDense))
identical(as.data.frame(dfmSparse), as.data.frame(dfmDense))