fsubset is a generic function with methods for vectors, matrices, and data frames (including lists). It improves on base R's subset in both speed and functionality. The function ss is an improved version of [ for subsetting vectors, matrices, and data frames without dropping dimensions; it is significantly faster than [.data.frame. For subsetting columns alone, see selecting and replacing columns.
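A minimal sketch contrasting the two functions, assuming the collapse package is installed and using R's built-in mtcars data set. fsubset takes a logical expression on columns plus unquoted column names, much like subset, while ss takes plain (standard-evaluation) row and column indices and keeps the result a data frame even when only one column is selected.

```r
library(collapse)

# fsubset: rows by a logical expression on the columns, columns by unquoted name
a <- fsubset(mtcars, mpg > 30, mpg, cyl)

# base-R equivalent, for comparison
b <- subset(mtcars, mpg > 30, select = c(mpg, cyl))

# ss: standard-evaluation row/column subsetting; selecting a single column
# still returns a data frame, whereas mtcars[1:3, "mpg"] drops to a vector
s <- ss(mtcars, 1:3, "mpg")
is.data.frame(s)
```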
For ordinary vectors, subset can be an integer or logical vector. Subsetting is done in C and is more efficient than [ for large vectors.
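A short sketch of the vector method, assuming the collapse package is installed. The vector x is made up for illustration; the point is that a logical or an integer subset gives the same result as the corresponding [ call.

```r
library(collapse)

x <- c(5.1, 2.3, 8.7, 4.4, 9.0)

# logical subset: keep the elements greater than 5 (same result as x[x > 5])
fsubset(x, x > 5)

# integer subset: keep the 1st and 3rd elements (same result as x[c(1L, 3L)])
fsubset(x, c(1L, 3L))
```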
For matrices, the implementation is entirely base R but slightly more efficient and more versatile than subset.matrix. In particular, matrix rows can be subset using logical or integer vectors, or character vectors matching the rownames. The drop argument is passed on to the matrix indexing method.
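A sketch of the matrix method, assuming the collapse package is installed and that drop defaults to FALSE (as in the method's usage signature). The small matrix m with rownames r1 to r3 is hypothetical. Rows are selected by rowname or by a logical vector, and a column is selected by its unquoted name.

```r
library(collapse)

m <- matrix(1:6, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("A", "B")))

# rows matched against rownames; column B selected by its unquoted name
r1 <- fsubset(m, c("r1", "r3"), B)   # a 2 x 1 matrix, not a vector

# rows selected by a logical vector; with drop = FALSE a single row stays a matrix
r2 <- fsubset(m, m[, "A"] == 2)
```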
For both matrices and data frames, the ... argument can be used to subset columns and is evaluated in a non-standard way. It therefore supports vectors of column names, indices, or logical vectors, but also multiple comma-separated column names passed without quotes, each of which may be replaced by a sequence of columns, e.g. col1:coln. New column names may also be assigned, e.g. fsubset(data, col1 > 20, newname = col2, col3:col6) (see examples).
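A sketch of column selection through ..., assuming the collapse package is installed and using the built-in mtcars data set. It follows the pattern fsubset(data, col1 > 20, newname = col2, col3:col6) described above: unquoted names, a column range, and renaming can be combined in one call, and a character vector of names or a vector of indices works as well.

```r
library(collapse)

# renaming (MPG = mpg), an unquoted name (cyl) and a column range (hp:drat)
d1 <- fsubset(mtcars, mpg > 30, MPG = mpg, cyl, hp:drat)
names(d1)

# a character vector of column names, or a vector of column indices
d2 <- fsubset(mtcars, mpg > 30, c("mpg", "wt"))
d3 <- fsubset(mtcars, mpg > 30, 1:2)
```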
For data frames, the subset argument is also evaluated in a non-standard way. Besides vectors of row indices or logical vectors, it therefore supports logical expressions of the form col2 > 5 & col2 < col3, etc. (see examples). The data frame method is implemented in C and is hence significantly faster than subset.data.frame. Note that using %==% to compare a single column to a single value can yield significant performance gains on large data. If fast data frame subsetting is required without non-standard evaluation, the function ss is slightly simpler and faster.
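A sketch of the three row-subsetting styles mentioned above, assuming the collapse package is installed and using mtcars. In collapse, x %==% value returns the integer indices of the matching elements, so its result can be passed directly as the subset argument.

```r
library(collapse)

# compound logical expression evaluated on the columns of mtcars
a <- fsubset(mtcars, cyl == 4 & mpg > 30)

# %==% yields integer row indices of the matches, avoiding a full logical vector
b <- fsubset(mtcars, cyl %==% 4)

# ss: no non-standard evaluation, so columns are referenced explicitly
d4 <- ss(mtcars, mtcars$cyl == 4, c("mpg", "cyl"))
```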
Factors may have empty levels after subsetting; unused levels are not removed automatically. See fdroplevels to drop all unused levels from a data frame.
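A small sketch of the empty-level behaviour, assuming the collapse package is installed. The data frame d is hypothetical. After subsetting, the factor keeps its original levels until fdroplevels is applied.

```r
library(collapse)

d <- data.frame(g = factor(c("a", "b", "c")), v = 1:3)

s <- fsubset(d, v > 1)
levels(s$g)     # "a" "b" "c": the now-unused level "a" is kept

s2 <- fdroplevels(s)
levels(s2$g)    # "b" "c"
```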