Laurae (version 0.0.0.9001)

DTcolsample: data.table column sampling (nearly) without copy

Description

This function attempts to subsample the columns of a data.table without making copies. Alternatively, you could just use DT[, (mycols) := NULL] (for removal) or DT <- DT[, (mycols), with = FALSE] (for selection)...
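
For reference, the two plain data.table alternatives mentioned above can be sketched as follows. The table contents and the mycols vector are made up for illustration; only the data.table package is assumed.

```r
library(data.table)

# Illustrative data: 5 rows, 4 columns (names are assumptions for this sketch)
DT <- data.table(a = 1:5, b = 6:10, c = 11:15, d = 16:20)
mycols <- c("a", "b")

# Selection: returns a new data.table holding only the chosen columns
DT_selected <- DT[, mycols, with = FALSE]

# Removal: deletes the chosen columns by reference.
# copy() is used here only so that DT stays intact for the comparison.
DT_removed <- copy(DT)
DT_removed[, (mycols) := NULL]

names(DT_selected)
names(DT_removed)
```

Selection allocates a new table for the kept columns, while removal with := NULL works in place on the existing one.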

Usage

DTcolsample(DT, kept, remove = FALSE, low_mem = FALSE, collect = 0,
  silent = TRUE)

Arguments

DT
Type: data.table. The data.table to subsample columns from.
kept
Type: vector of integers or vector of characters. The columns to keep.
remove
Type: boolean. Whether kept lists the columns to remove instead (all columns which are not in kept are then retained). Defaults to FALSE.
low_mem
Type: boolean. When set to TRUE, prevents DT from residing (up to) twice in memory by deleting DT (WARNING: empties your DT) to save memory. When set to FALSE, DT may reside (up to) twice in memory, so memory usage increases. Defaults to FALSE.
collect
Type: integer. Forces a garbage collection every collect iterations to clear up memory. Setting this to 1 along with low_mem = TRUE leads to the lowest possible memory usage one can get when subsampling a data.table. It also prints verbose information about the process every time it garbage collects. Setting this to 0 disables garbage collection. Lower values increase the time required to subsample the data.table. Defaults to 0.
silent
Type: boolean. Forces silence during garbage collection iterations at no speed cost. Defaults to TRUE.
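
The interaction between kept and remove described above can be sketched in base R. The helper resolve_kept below is hypothetical and is not part of Laurae; it only illustrates which column names would survive under each setting, assuming the result follows the table's original column order.

```r
# Hypothetical helper (not Laurae's code): resolve which column names survive
resolve_kept <- function(all_names, kept, remove = FALSE) {
  # Accept either integer positions or character names, as kept does
  selected <- if (is.character(kept)) kept else all_names[kept]
  if (remove) {
    # remove = TRUE: keep every column that is NOT listed in kept
    setdiff(all_names, selected)
  } else {
    # remove = FALSE: keep the listed columns, in the table's column order
    intersect(all_names, selected)
  }
}

cols <- paste0("X", 1:10, "xx")
resolve_kept(cols, kept = 1:8)                 # the first 8 names
resolve_kept(cols, kept = 1:6, remove = TRUE)  # the last 4 names
```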

Value

The subsampled data.table.

Details

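Warning: DT is handled as a pointer (by reference), not as a copy, even though you pass the object to this function. Changes made inside the function can therefore affect your original DT. This is how memory efficiency is achieved.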
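
The reference behaviour behind this warning is a general property of data.table and can be seen in a minimal sketch. The function drop_first_column is hypothetical and only stands in for any function that modifies its data.table argument in place.

```r
library(data.table)

# Hypothetical function: drops the first column of its argument by reference
drop_first_column <- function(dt) {
  dt[, (names(dt)[1]) := NULL]
  invisible(dt)
}

DT <- data.table(a = 1:3, b = 4:6)
drop_first_column(DT)

# The caller's DT has changed even though nothing was reassigned
names(DT)
```

If the original table must be preserved, data.table's copy() can be applied before the call, at the cost of the extra memory.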

Examples

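library(data.table)
library(Laurae) # provides DTcolsample
# Build a 50 x 10 data.table (all NA) and suffix its column names with "xx"
DT <- data.frame(matrix(nrow = 50, ncol = 10))
DT <- setDT(DT)
setnames(DT, paste(colnames(DT), "xx", sep = ""))
# Keep the first 8 columns
DT <- DTcolsample(DT, kept = 1:8, remove = FALSE, low_mem = TRUE)
# Remove the first 6 of the remaining columns
DT <- DTcolsample(DT, kept = 1:6, remove = TRUE, low_mem = TRUE)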