Learn R Programming

collapse (version 1.8.9)

efficient-programming: Small Functions to Make R Programming More Efficient

Description

A small set of functions to addresses some common inefficiencies in R, such as the creation of logical vectors to compare quantities, unnecessary copies of objects in elementary mathematical or subsetting operations, obtaining information about objects (esp. data frames), or dealing with missing values.

Usage

anyv(x, value)              # Faster than any(x == value). See also kit::panyv()
allv(x, value)              # Faster than all(x == value). See also kit::pallv()
allNA(x)                    # Faster than all(is.na(x)). See also kit::pallNA()
whichv(x, value,            # Faster than which(x == value)
       invert = FALSE)      # or which(x != value). See also Note (3)
whichNA(x, invert = FALSE)  # Faster than which((!)is.na(x))
x %==% value                # Infix for whichv(v, value, FALSE), use e.g. in fsubset()
x %!=% value                # Infix for whichv(v, value, TRUE). See also Note (3)
alloc(value, n)             # Faster than rep_len(value, n)
copyv(X, v, R, ..., invert  # Fast replace(X, v, R), replace(X, X (!/=)= v, R) or
    = FALSE, vind1 = FALSE, # replace(X, (!)v, R[(!)v]). See Details and Note (4).
    xlist = FALSE)          # For multi-replacement see also kit::vswitch()
setv(X, v, R, ..., invert   # Same for X[v] <- r, X[x (!/=)= v] <- r or
    = FALSE, vind1 = FALSE, # x[(!)v] <- r[(!)v]. Modifies X by reference, fastest.
    xlist = FALSE)          # X/R/V can also be lists/DFs. See Details and Examples.
setop(X, op, V, ...,        # Faster than X <- X +\-\*\/ V (modifies by reference)
      rowwise = FALSE)      # optionally can also add v to rows of a matrix or list
X %+=% V                    # Infix for setop(X, "+", V). See also Note (2)
X %-=% V                    # Infix for setop(X, "-", V). See also Note (2)
X %*=% V                    # Infix for setop(X, "*", V). See also Note (2)
X %/=% V                    # Infix for setop(X, "/", V). See also Note (2)
na_rm(x)                    # Fast: if(anyNA(x)) x[!is.na(x)] else x,
                            # also removes NULL / empty elements from list
na_omit(X, cols = NULL,     # Faster na.omit for matrices and data frames,
      na.attr = FALSE, ...) # can use selected columns and attach indices
na_insert(X, prop = 0.1,    # Insert missing values at random
          value = NA)
missing_cases(X,            # The oposite of complete.cases(), faster for
              cols = NULL)  # data frames. See also kit::panyNA() and kit::pallNA()
vlengths(X, use.names=TRUE) # Faster version of lengths() (in C, no method dispatch)
vtypes(X, use.names = TRUE) # Get data storage types (faster vapply(X, typeof, ...))
vgcd(x)                     # Greatest common divisor of positive integers or doubles
frange(x, na.rm = TRUE)     # Much faster base::range, for integer and double objects
fnlevels(x)                 # Faster version of nlevels(x) (for factors)
fnrow(X)                    # Faster nrow for data frames (not faster for matrices)
fncol(X)                    # Faster ncol for data frames (not faster for matrices)
fdim(X)                     # Faster dim for data frames (not faster for matrices)
seq_row(X)                  # Fast integer sequences along rows of X
seq_col(X)                  # Fast integer sequences along columns of X
cinv(x)                     # Choleski (fast) inverse of symmetric PD matrix, e.g. X'X

Arguments

X, V, R

a vector, matrix or data frame.

x, v

a (atomic) vector or matrix (na_rm also supports lists).

value

a single value of any (atomic) vector type. For whichv it can also be a length(x) vector.

invert

logical. TRUE considers elements x != value.

vind1

logical. If length(v) == 1L, setting vind1 = TRUE will interpret v as an index, rather than a value to search and replace.

xlist

logical. If X is a list, the default is to treat it like a data frame and replace rows. Setting xlist = TRUE will treat X and its replacement R like 1-dimensional list vectors.

op

an integer or character string indicating the operation to perform.

Int. String Description
1"+"add V
2"-"subtract V
3"*"multiply by V
4"/"divide by V

rowwise

logical. TRUE performs the operation between V and each row of X.

cols

select columns to check for missing values using column names, indices, a logical vector or a function (e.g. is.numeric). The default is to check all columns, which could be inefficient.

n

integer. The length of the vector to allocate with value.

na.attr

logical. TRUE adds an attribute containing the removed cases. For compatibility reasons this is exactly the same format as na.omit i.e. the attribute is called "na.action" and of class "omit".

prop

double. Specify the proportion of observations randomly replaced with NA.

use.names

logical. Preserve names if X is a list.

na.rm

logical. TRUE omits missing values by skipping them in the computation. FALSE terminates the computation once a missing value is encountered and returns 2 missing values.

...

for na_omit: further arguments passed to [ for vectors and matrices. With indexed data it is also possible to specify the drop.index.levels argument, see indexing. For copyv, setv and setop, the argument is unused, and serves as a placeholder for possible future arguments.

Details

copyv and setv are designed to optimize operations that require replacing data in objects in the broadest sense. The only difference between them is that copyv first deep-copies X before doing replacements whereas setv modifies X in place and returns the result invisibly. There are 3 ways these functions can be used:

  1. To replace a single value, setv(X, v, R) is an efficient alternative to X[X == v] <- R, and copyv(X, v, R) is more efficient than replace(X, X == v, R). This can be inverted using setv(X, v, R, invert = TRUE), equivalent to X[X != v] <- R.

  2. To do standard replacement with integer or logical indices i.e. X[v] <- R is more efficient using setv(X, v, R), and, if v is logical, setv(X, v, R, invert = TRUE) is efficient for X[!v] <- R. To distinguish this from use case (1) when length(v) == 1, the argument vind1 = TRUE can be set to ensure that v is always interpreted as an index.

  3. To copy values from objects of equal size i.e. setv(X, v, R) is faster than X[v] <- R[v], and setv(X, v, R, invert = TRUE) is faster than X[!v] <- R[!v].

Both X and R can be atomic or data frames / lists. If X is a list, the default behavior is to interpret it like a data frame, and apply setv/copyv to each element/column of X. If R is also a list, this is done using mapply. Thus setv/copyv can also be used to replace elements or rows in data frames, or copy rows from equally sized frames. Note that for replacing subsets in data frames set from data.table provides a more convenient interface (and there is also copy if you just want to deep-copy an object without any modifications to it).

If X should not be interpreted like a data frame, setting xlist = TRUE will interpret it like a 1D list-vector analogous to atomic vectors, except that use case (1) is not permitted i.e. no value comparisons on list elements.

See Also

Data Transformations, Small (Helper) Functions, Collapse Overview

Examples

Run this code

## Which value
whichNA(wlddev$PCGDP)                # Same as which(is.na(wlddev$PCGDP))
whichNA(wlddev$PCGDP, invert = TRUE) # Same as which(!is.na(wlddev$PCGDP))
whichv(wlddev$country, "Chad")       # Same as which(wlddev$county == "Chad")
wlddev$country %==% "Chad"           # Same thing
whichv(wlddev$country, "Chad", TRUE) # Same as which(wlddev$county != "Chad")
wlddev$country %!=% "Chad"           # Same thing
lvec <- wlddev$country == "Chad"     # If we already have a logical vector...
whichv(lvec, FALSE)                  # is fastver than which(!lvec)
rm(lvec)

# Using the %==% operator can yield tangible performance gains
fsubset(wlddev, iso3c %==% "DEU") # 3x faster than:
fsubset(wlddev, iso3c == "DEU")

## Math by reference: permissible types of operations
x <- alloc(1.0, 1e5) # Vector
x %+=% 1
x %+=% 1:1e5
xm <- matrix(alloc(1.0, 1e5), ncol = 100) # Matrix
xm %+=% 1
xm %+=% 1:1e3
setop(xm, "+", 1:100, rowwise = TRUE)
xm %+=% xm
xm %+=% 1:1e5
xd <- qDF(replicate(100, alloc(1.0, 1e3), simplify = FALSE)) # Data Frame
xd %+=% 1
xd %+=% 1:1e3
setop(xd, "+", 1:100, rowwise = TRUE)
xd %+=% xd
rm(x, xm, xd)

## setv() and copyv()
x <- rnorm(100)
y <- sample.int(10, 100, replace = TRUE)
setv(y, 5, 0)            # Faster than y[y == 5] <- 0
setv(y, 4, x)            # Faster than y[y == 4] <- x[y == 4]
setv(y, 20:30, y[40:50]) # Faster than y[20:30] <- y[40:50]
setv(y, 20:30, x)        # Faster than y[20:30] <- x[20:30]
rm(x, y)

# Working with data frames, here returning copies of the frame
copyv(mtcars, 20:30, ss(mtcars, 10:20))
copyv(mtcars, 20:30, fscale(mtcars))
ftransform(mtcars, new = copyv(cyl, 4, vs))
# Column-wise:
copyv(mtcars, 2:3, fscale(mtcars), xlist = TRUE)
copyv(mtcars, 2:3, mtcars[4:5], xlist = TRUE)

## Missing values
mtc_na <- na_insert(mtcars, 0.15)    # Set 15% of values missing at random
fnobs(mtc_na)                        # See observation count
missing_cases(mtc_na)                # Fast equivalent to !complete.cases(mtc_na)
missing_cases(mtc_na, cols = 3:4)    # Missing cases on certain columns?
# to see all missing cases use kit::pallNA(mtc_na)
na_omit(mtc_na)                      # 12x faster than na.omit(mtc_na)
na_omit(mtc_na, na.attr = TRUE)      # Adds attribute with removed cases, like na.omit
na_omit(mtc_na, cols = c("vs","am")) # Removes only cases missing vs or am
na_omit(qM(mtc_na))                  # Also works for matrices
na_omit(mtc_na$vs, na.attr = TRUE)   # Also works with vectors
na_rm(mtc_na$vs)                     # For vectors na_rm is faster ...
rm(mtc_na)

Run the code above in your browser using DataLab