
expss (version 0.10.7)

compute: Modify data.frame/modify subset of the data.frame

Description

  • compute evaluates expression expr in the context of data.frame data and returns the original data, possibly modified.

  • calculate evaluates expression expr in the context of data.frame data and returns the value of the evaluated expression. The function use_labels is a shortcut for calculate with the argument use_labels set to TRUE. When use_labels is TRUE, the special shortcut ..data refers to the entire data.frame.

  • do_if modifies only the rows for which cond is TRUE. Other rows remain unchanged. Newly created variables have values only in rows for which cond is TRUE and NA in all other rows. This function tries to mimic the SPSS "DO IF(). ... END IF." statement.
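
As a rough analogy, the difference between compute and calculate resembles the difference between within() and with() in base R. The sketch below uses base R only and is an illustration under that assumption, not the expss implementation; the object names modified and value are hypothetical.

```r
# Base R only; 'dfs' is a reduced version of the data.frame from the Examples.
dfs <- data.frame(test = 1:5, a = rep(10, 5), b_1 = rep(11, 5), b_2 = rep(12, 5))

# compute-like: evaluate the expression in the context of the data
# and return the data with the new variable attached
modified <- within(dfs, {
    b_total <- b_1 + b_2
})

# calculate-like: evaluate the expression in the context of the data
# and return only the value of the expression
value <- with(dfs, b_1 + b_2)
```

Here modified is still a data.frame with an extra b_total column, while value is a plain numeric vector.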

The full-featured %to% is available inside the expressions for addressing a range of variables. The special constant .N equals the number of cases in data and can be used in expressions inside compute/calculate. Inside do_if, .N gives the number of rows that will be affected by the expressions. For parametrization (variable substitution), see .. or the examples.

Sometimes it is useful to create a new empty variable inside compute. Use the .new_var function for this task: it creates a variable of length .N filled with NA. See examples.

modify is an alias for compute, modify_if is an alias for do_if, and calc is an alias for calculate.
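
The row-wise behaviour described for do_if can be mimicked in base R. The following is a minimal sketch of the documented semantics, not the expss code; cond, n_affected, result and flag are hypothetical names, and n_affected plays the role that .N has inside do_if.

```r
# Base R only; a reduced version of the 'dfs' data.frame from the Examples.
dfs <- data.frame(test = 1:5, a = rep(10, 5))

cond <- dfs$test %in% 2:4   # rows that the modification applies to
n_affected <- sum(cond)     # counterpart of .N inside do_if

result <- dfs
# existing variable: only rows where cond is TRUE change
result$a[cond] <- result$a[cond] + 1
# new variable: NA everywhere, then filled only where cond is TRUE
result$flag <- NA_real_
result$flag[cond] <- seq_len(n_affected)
```

Rows 1 and 5 keep their original a and have NA in flag; rows 2 to 4 are modified.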

Usage

compute(data, ...)

modify(data, ...)

do_if(data, cond, ...)

modify_if(data, cond, ...)

calculate(data, expr, use_labels = FALSE)

use_labels(data, expr)

calc(data, expr, use_labels = FALSE)

data %calc% expr

data %use_labels% expr

data %calculate% expr

Arguments

data

data.frame or list of data.frames. If data is a list of data.frames, the expression expr is evaluated inside each data.frame separately.

...

expressions to be evaluated in the context of data.frame data. These can be assignments or arbitrary code in curly brackets. See examples.

cond

logical vector or expression. An expression is evaluated in the context of data.

expr

expression that should be evaluated in the context of data.frame data

use_labels

logical. Experimental feature. If TRUE, variable names are replaced with labels where possible, so many base R functions that show variable names will show labels instead.

Value

compute and do_if return the modified data.frame or, for list input, a list of modified data.frames. calculate returns the value of the evaluated expression or, for list input, a list of values.
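
For list input, the element-wise evaluation described above corresponds roughly to applying the operation to each data.frame in turn. The sketch below shows this with base R lapply(), within() and with(); it is an assumption-based illustration of the return shapes, not the expss implementation, and all object names are hypothetical.

```r
# Base R only; a list of two small data.frames.
dfs_list <- list(
    first  = data.frame(x = 1:3),
    second = data.frame(x = 4:6)
)

# compute-like: each data.frame is modified separately, a list comes back
modified_list <- lapply(dfs_list, function(d) within(d, {
    y <- x * 2
}))

# calculate-like: the expression is evaluated in each data.frame,
# a list of values comes back
values_list <- lapply(dfs_list, function(d) with(d, sum(x)))
```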

Examples

# NOT RUN {
library(expss) # the examples assume the package is attached

dfs = data.frame(
    test = 1:5,
    a = rep(10, 5),
    b_1 = rep(11, 5),
    b_2 = rep(12, 5),
    b_3 = rep(13, 5),
    b_4 = rep(14, 5),
    b_5 = rep(15, 5) 
)


# compute sum of b* variables and attach it to 'dfs'
compute(dfs, {
    b_total = sum_row(b_1 %to% b_5)
    var_lab(b_total) = "Sum of b"
    random_numbers = runif(.N) # .N usage
})

# calculate sum of b* variables and return it
calculate(dfs, sum_row(b_1 %to% b_5))


# set values to existing/new variables
compute(dfs, {
    (b_1 %to% b_5) %into% text_expand('new_b{1:5}')
})

# .new_var usage
compute(dfs, {
    new_var = .new_var()
    new_var[1] = 1 # this is not possible without preliminary variable creation
})

# conditional modification
do_if(dfs, test %in% 2:4, {
    a = a + 1    
    b_total = sum_row(b_1 %to% b_5)
    random_numbers = runif(.N) # .N usage
})


# variable substitution
name1 = "a"
name2 = "new_var"

compute(dfs, {
     ..$name2 = ..$name1*2    
})

compute(dfs, {
     for(name1 in paste0("b_", 1:5)){
         name2 = paste0("new_", name1) 
         ..$name2 = ..$name1*2 
     }
     rm(name1, name2) # we don't need these variables as columns in 'dfs'
})

# square brackets notation
compute(dfs, {
     ..[(name2)] = ..[(name1)]*2  
})

compute(dfs, {
     for(name1 in paste0("b_", 1:5)){
         ..[paste0("new_", name1)] = ..$name1*2 
     }
     rm(name1) # we don't need this variable as column in 'dfs'
})

# '..$' doesn't work for case below so we need to use square brackets form
name1 = paste0("b_", 1:5)
name2 = paste0("new_", name1)
compute(dfs, {
     for(i in 1:5){
         ..[name2[i]] = ..[name1[i]]*3
     }
     rm(i) # we don't need this variable as column in 'dfs'
})

# 'use_labels' examples. Utilization of labels in base R.
data(mtcars)
mtcars = apply_labels(mtcars,
                      mpg = "Miles/(US) gallon",
                      cyl = "Number of cylinders",
                      disp = "Displacement (cu.in.)",
                      hp = "Gross horsepower",
                      drat = "Rear axle ratio",
                      wt = "Weight (lb/1000)",
                      qsec = "1/4 mile time",
                      vs = "Engine",
                      vs = c("V-engine" = 0,
                             "Straight engine" = 1),
                      am = "Transmission",
                      am = c("Automatic" = 0,
                             "Manual"=1),
                      gear = "Number of forward gears",
                      carb = "Number of carburetors"
)

use_labels(mtcars, table(am, vs))

# }
# NOT RUN {
use_labels(mtcars, plot(mpg, hp))
# }
# NOT RUN {
mtcars %>% 
       use_labels(lm(mpg ~ disp + hp + wt)) %>% 
       summary()

# }
