cheapr (version 1.1.0)

sset: Cheaper subset

Description

Cheaper alternative to [ that consistently subsets data frame rows, always returning a data frame. There are explicit methods for enhanced data frames like tibbles, data.tables and sf.
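As a small illustration of the "always returning a data frame" point, the sketch below compares row subsetting of a one-column data frame with base `[` and with sset(). The data frame `df` and the variable names are made up for this example.

```r
library(cheapr)

# Hypothetical one-column data frame used only for illustration
df <- data.frame(a = 1:3)

# Base R drops a single-column result to a plain vector by default
base_rows <- df[1:2, ]

# sset() is documented to keep the data frame structure
sset_rows <- sset(df, 1:2)

is.data.frame(base_rows)
is.data.frame(sset_rows)
```

With base R the same guarantee requires writing `df[1:2, , drop = FALSE]` every time.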

Usage

sset(x, ...)

# S3 method for data.frame
sset(x, i = NULL, j = NULL, ...)

# S3 method for tbl_df
sset(x, i = NULL, j = NULL, ...)

# S3 method for POSIXlt
sset(x, i = NULL, j = NULL, ...)

# S3 method for data.table
sset(x, i = NULL, j = NULL, ...)

# S3 method for sf
sset(x, i = NULL, j = NULL, ...)

Value

A new vector, data frame, list, matrix or other R object.

Arguments

x

Vector or data frame.

...

Further parameters passed to [.

i

A logical vector or a vector of indices.

j

Column indices, names or logical vector.

Details

sset is an S3 generic. You can either write methods for sset or [.
sset will fall back on using [ when no suitable method is found.

In more detail, when sset() is used on a data frame, a new list is always allocated through new_list().

Difference to base R

When i is a logical vector, it is passed directly to which_().
This means that NA values are ignored. It also means that i is not recycled, so it is good practice to make sure the logical vector matches the length of x. To return NA values, use sset(x, NA_integer_).
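The following sketch shows the NA behaviour described above on a small vector. The vectors `x` and `keep` are invented for the example, and the comments restate the documented behaviour rather than output verified against a particular cheapr version.

```r
library(cheapr)

# Hypothetical data: a logical index of the same length as x, with one NA
x <- c(10, 20, 30, 40)
keep <- c(TRUE, NA, FALSE, TRUE)

# Base R keeps a placeholder NA for the NA index
base_result <- x[keep]

# sset() passes the logical vector to which_(), so the NA is ignored
cheapr_result <- sset(x, keep)

# As noted above, an integer NA index is the way to ask for NA values
na_result <- sset(x, NA_integer_)

base_result
cheapr_result
```

Because i is not recycled, `keep` is deliberately built to have the same length as `x`.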

ALTREP range subsetting

When i is an ALTREP compact sequence, sset internally uses a range-based subsetting method, which is faster and doesn't allocate i into memory. Such sequences are commonly created with e.g. 1:10, seq_len, seq_along or seq.int.
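A minimal sketch of the kinds of index this section refers to is shown below. It only checks that the results agree. Which internal path sset takes cannot be seen from R code like this, and the comment on the last call is an assumption based on the description above.

```r
library(cheapr)

x <- letters

# Compact sequences: the forms the text says are handled by range subsetting
by_colon <- sset(x, 1:5)
by_seq_len <- sset(x, seq_len(5))

# An index vector built with c(), assumed here to take the ordinary path
by_vector <- sset(x, c(1L, 2L, 3L, 4L, 5L))

identical(by_colon, by_seq_len)
identical(by_colon, by_vector)
```

The selected elements are the same in all three cases; any difference is in speed and memory use, which is what the bench::mark() comparisons in the Examples measure.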

Examples

library(cheapr)
library(bench)

# Selecting columns
sset(airquality, j = "Temp")
sset(airquality, j = 1:2)

# Selecting rows
sset(iris, 1:5)

# Rows and columns
sset(iris, 1:5, 1:5)
sset(iris, iris$Sepal.Length > 7, c("Species", "Sepal.Length"))

# Comparison against base
x <- rnorm(10^4)

mark(x[1:10^3], sset(x, 1:10^3))
mark(x[x > 0], sset(x, x > 0))

df <- data.frame(x = x)

mark(df[df$x > 0, , drop = FALSE],
     sset(df, df$x > 0),
     check = FALSE) # Row names are different