FSA (version 0.8.11)

Subset: Subsets/filters a data frame and drops the unused levels.

Description

Subsets/filters a data frame and drops the unused levels.

Usage

Subset(x, subset, select, drop = FALSE, resetRownames = TRUE, ...)
filterD(x, ..., except = NULL)

Arguments

x
A data frame.
subset
A logical expression that indicates elements or rows to keep: missing values are taken as false.
select
An expression that indicates which columns to select from a data frame.
drop
Passed on to the [ indexing operator.
resetRownames
A logical that indicates if the rownames should be reset after the subsetting (TRUE; default). Resetting rownames will simply number the rows from 1 to the number of rows in the result.
except
Indices of columns from which NOT to drop levels.
...
Further arguments to be passed to or from other methods.
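
The drop and except arguments can be illustrated with base R alone. The sketch below is not FSA code: it uses base droplevels (whose data frame method also accepts except) as a stand-in, and the toy data frame df and its columns g, h, and v are made up for illustration.

```r
# Toy data frame with two factor columns (hypothetical data, for illustration only)
df <- data.frame(g = factor(c("a", "b", "c", "a")),
                 h = factor(c("x", "y", "z", "x")),
                 v = 1:4)

# Keep only the rows where g is not "a"; both factors still carry all three levels
kept <- df[df$g != "a", ]

# Drop unused levels everywhere except column 2 (h)
dropped <- droplevels(kept, except = 2)
levels(dropped$g)   # unused level "a" is gone
levels(dropped$h)   # all original levels are retained

# drop = FALSE keeps a one-column selection as a data frame rather than a vector
one_col <- kept[, "v", drop = FALSE]
is.data.frame(one_col)
```

Columns are identified by position in except, so reordering the columns of the data frame changes which factor is protected.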

Value

A data frame with the subsetted rows and selected variables.

IFAR Chapter

Basic Data Manipulations.

Details

Students new to R often expect that when a factor variable is subsetted with subset or filtered with filter, any original levels that are no longer used after the subsetting or filtering will be dropped. This, however, is not the case, and it often results in tables with empty cells and figures with empty bars. One remedy is to call drop.levels from gdata immediately after the subset or filter call. Because this quickly becomes a repetitive sequence, Subset and filterD combine the two steps into one function.

Subset is a wrapper to subset with a catch for non-data.frames and a specific call to drop.levels just before the data.frame is returned. I also added an argument to allow resetting the row names. filterD is a wrapper for filter from dplyr followed by drop.levels just before the data.frame is returned. Otherwise, there is no new code here.

These functions are used only for data frames.
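
The sequence that these wrappers bundle can be written out by hand. This is a minimal sketch under stated assumptions, not the FSA source: it uses base droplevels in place of drop.levels from gdata, and the object names step1 and step2 are illustrative only.

```r
# Step 1: ordinary subset; the unused level "setosa" is still attached to the factor
step1 <- subset(iris, Species != "setosa")
levels(step1$Species)
head(rownames(step1))   # original row names are carried over

# Step 2: drop the unused level (base droplevels used here as a stand-in)
step2 <- droplevels(step1)
levels(step2$Species)

# Step 3: renumber the rows from 1 to nrow, as resetRownames=TRUE is described to do
rownames(step2) <- NULL
head(rownames(step2))
```

Writing the steps separately makes it easy to see which one changes the levels and which one changes the row names.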

See Also

See subset and filter from dplyr for similar functionality. See drop.levels in gdata and droplevels for related functionality.

Examples

## The problem -- note the unused level in the final table.
levels(iris$Species)
iris.set1 <- subset(iris,Species=="setosa" | Species=="versicolor")
levels(iris.set1$Species)
xtabs(~Species,data=iris.set1)

## A simpler fix using Subset
iris.set2 <- Subset(iris,Species=="setosa" | Species=="versicolor")
levels(iris.set2$Species)
xtabs(~Species,data=iris.set2)

## A simpler fix using filterD
iris.set3 <- filterD(iris,Species=="setosa" | Species=="versicolor")
levels(iris.set3$Species)
xtabs(~Species,data=iris.set3)