row_sums: Row sums and means for data frames

Description

row_sums() and row_means() compute row sums or means for rows that have at least n valid (non-missing) values. The functions are designed to work well in a pipe workflow and accept select-helpers for selecting variables.

Usage

row_sums(x, ...)
# S3 method for default
row_sums(x, ..., n, var = "rowsums", append = TRUE)
# S3 method for mids
row_sums(x, ..., var = "rowsums", append = TRUE)
row_means(x, ...)
total_mean(x, ...)
# S3 method for default
row_means(x, ..., n, var = "rowmeans", append = TRUE)
# S3 method for mids
row_means(x, ..., var = "rowmeans", append = TRUE)

Arguments

x

A vector or data frame.

...

Optional, unquoted names of variables that should be selected for further processing. Required if x is a data frame (not a vector) and only selected variables from x should be processed. You may also use operators like : or tidyselect's select-helpers. See 'Examples' or the package vignette.

n

May be one of the following:

a numeric value that indicates the minimum number of valid values per row required to calculate the row mean or sum;
a value between 0 and 1, indicating the minimum proportion of valid values per row required to calculate the row mean or sum (see 'Details');
Inf, in which case all values per row must be non-missing to compute the row mean or sum.

If a row's number of valid (i.e. non-NA) values is less than n, NA is returned as the row mean or sum.
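The threshold rule for n can be illustrated with a short base R sketch. This is not the package's own implementation, only an approximation of the documented behaviour; the small data frame and the helper variables n_valid and sums are made up for illustration.

```r
# Base R sketch of the "at least n valid values per row" rule
# (an illustration only, not sjmisc's internal code).
dat <- data.frame(
  c1 = c(1, 2, NA, 4),
  c2 = c(NA, 2, NA, 5),
  c3 = c(NA, 4, NA, NA)
)

n <- 2                               # minimum number of non-missing values per row
n_valid <- rowSums(!is.na(dat))      # count of non-missing values in each row
sums <- rowSums(dat, na.rm = TRUE)   # row sums over the non-missing values
sums[n_valid < n] <- NA              # rows with fewer than n valid values become NA
sums
```

Rows 1 and 3 have fewer than two valid values, so they end up as NA, while rows 2 and 4 keep their sums.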

var

Name of the new variable with the row sums or means.

append

Logical, if TRUE (the default) and x is a data frame, x including the new variables as additional columns is returned; if FALSE, only the new variables are returned.

Value

For row_sums(), a data frame with a new variable: the row sums from x; for row_means(), a data frame with a new variable: the row means from x. If append = FALSE, only the new variable with the row sums or row means is returned. total_mean() returns the mean of all values from all specified columns in a data frame.
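As a rough illustration of what "the mean of all values from all specified columns" means, the following base R sketch pools every cell of a data frame into one vector and averages it. It assumes that missing values are ignored, which this page does not state explicitly; the variable name total is hypothetical.

```r
# Base R sketch of a "total mean" over all cells of a data frame.
# Assumption: missing values are dropped before averaging.
dat <- data.frame(
  c1 = c(1, 2, NA, 4),
  c2 = c(NA, 2, NA, 5),
  c3 = c(NA, 4, NA, NA),
  c4 = c(2, 3, 7, 8),
  c5 = c(1, 7, 5, 3)
)

# unlist() flattens all columns into a single numeric vector
total <- mean(unlist(dat), na.rm = TRUE)
total
```

Note that this differs from averaging the row means or column means, because every non-missing cell gets equal weight.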

Details

n must be a numeric value from 0 to ncol(x). If a row in x has at least n non-missing values, the row mean or sum is returned. If n is a non-integer value between 0 and 1, it is interpreted as the proportion of non-missing values required per row. For example, if n = .75, a row must have at least ncol(x) * n non-missing values for the row mean or sum to be calculated. See 'Examples'.
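The proportion rule can be worked through in base R. The sketch below is only an approximation of the described behaviour, not the package's code; it uses four columns and n = .75 so that ncol(x) * n is exactly 3, which sidesteps the question of how the package rounds fractional thresholds (not covered on this page). The names min_valid, n_valid and means are illustrative.

```r
# Base R sketch of the proportion rule: n = .75 with 4 columns
# requires at least 4 * .75 = 3 non-missing values per row.
dat <- data.frame(
  c1 = c(1, 2, NA, 4),
  c2 = c(NA, 2, NA, 5),
  c3 = c(NA, 4, NA, NA),
  c4 = c(2, 3, 7, 8)
)

n <- 0.75
min_valid <- ncol(dat) * n            # required number of valid values (here: 3)
n_valid <- rowSums(!is.na(dat))       # valid values per row: 2, 4, 1, 3
means <- rowMeans(dat, na.rm = TRUE)  # row means over the non-missing values
means[n_valid < min_valid] <- NA      # rows below the threshold become NA
means
```

Only rows 2 and 4 reach the threshold of three valid values, so only those rows receive a mean.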

Examples

# NOT RUN {
library(sjmisc)
data(efc)
efc %>% row_sums(c82cop1:c90cop9, n = 3, append = FALSE)

library(dplyr)
row_sums(efc, contains("cop"), n = 2, append = FALSE)

dat <- data.frame(
  c1 = c(1,2,NA,4),
  c2 = c(NA,2,NA,5),
  c3 = c(NA,4,NA,NA),
  c4 = c(2,3,7,8),
  c5 = c(1,7,5,3)
)
dat

row_means(dat, n = 4)
row_sums(dat, n = 4)

row_means(dat, c1:c4, n = 4)
# at least 40% non-missing
row_means(dat, c1:c4, n = .4)
row_sums(dat, c1:c4, n = .4)

# total mean of all values in the data frame
total_mean(dat)

# create sum-score of COPE-Index, and append to data
efc %>%
  select(c82cop1:c90cop9) %>%
  row_sums(n = 1)

# if data frame has only one column, this column is returned
row_sums(dat[, 1, drop = FALSE], n = 0)

# }