Converts .omv-files for the statistical spreadsheet 'jamovi' (https://www.jamovi.org) from wide to long format

Description

Converts .omv-files for the statistical spreadsheet 'jamovi' (https://www.jamovi.org) from wide to long format

Usage

wide2long_omv(
  fleInp = "",
  fleOut = "",
  varLst = c(),
  varExc = c(),
  varID = "ID",
  varTme = "cond",
  varSep = "_",
  varSrt = c(),
  usePkg = c("foreign", "haven"),
  selSet = "",
  ...
)

Arguments

fleInp: Name (including the path, if required) of the data file to be read ("FILENAME.omv"; default: ""); can be any supported file type, see Details below
fleOut: Name (including the path, if required) of the data file to be written ("FILENAME.omv"; default: ""); if empty, FILENAME from fleInp is extended with "_long" and the file extension is changed to .omv
varLst: List / set of variables that are to be transformed into single (time-varying) variables in long format (default: c())
varExc: List / set of variables to be excluded from the variable list (default: c())
varID: Name(s) of one or more variables that identify (or are created to identify) the same group / individual (if empty, "ID" is added with row numbers identifying cases; default: "ID")
varTme: Name of the variable that differentiates (or is created to differentiate) multiple records from the same group / individual (default: "cond"; a counter is added for each time-varying part)
varSep: Character that separates the variables in varLst into a time-varying part and a part that forms the variable name in long format ("_" in "VAR_1", "VAR_2"; default: "_")
varSrt: Variable(s) that are used to sort the data frame (see Details; if empty, the order returned from reshape is kept; default: c())
usePkg: Name of the package ("foreign" or "haven") that is used to read SPSS, Stata and SAS files; "foreign" is the default (it comes with base R), but "haven" is newer and more comprehensive
selSet: Name of the data set that is to be selected from the workspace (only applies when reading .RData-files)
...: Additional arguments passed on to methods; see Details below
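
To illustrate what varSep does, the following minimal sketch splits wide-format column names at the separator into the part that becomes the long-format variable name and the time-varying part. It uses only base R; the column names and the helper objects (nmeWde, nmePrt, nmeLng, valTme) are hypothetical, and the sketch is not the package's internal code.

```r
# hypothetical wide-format column names (as they could appear in varLst)
nmeWde <- c("X_1", "X_2", "Y_1", "Y_2")

# split each name at the separator; fixed = TRUE treats "_" as a literal character
nmePrt <- do.call(rbind, strsplit(nmeWde, "_", fixed = TRUE))

# first part: names of the variables in long format
nmeLng <- unique(nmePrt[, 1])
# second part: values of the time-varying part
valTme <- unique(nmePrt[, 2])

print(nmeLng)
print(valTme)
```

A name that contains the separator more than once would not split into exactly two parts, which is one reason to choose a separator that occurs only once in each name.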

Details

If varLst is empty, the function tries to generate it from all variables in the data frame except those defined by varExc and varID. The variable(s) in varID have to be unique identifiers (in the original dataset); those in varExc do not have this requirement. It is generally recommended that the variable names in varExc and varID do not include the variable separator (defined in varSep; default: "_"). For further arguments, see the help for reshape (where varLst ~ varying, varSep ~ sep, varID ~ idvar, varTme ~ timevar).

varSrt is a character vector containing column names that are used to sort the data frame before it is written.

The ellipsis-parameter (...) can be used to pass arguments / parameters to the functions that are used for transforming or reading the data. The transformation uses reshape. When reading the data, the functions are: read_omv (for jamovi-files), read.table (for CSV / TSV files; using similar defaults as read.csv for CSV and read.delim for TSV, both of which are based upon read.table but with adjusted defaults for the respective file types), readRDS (for rds-files), read_sav (needs R-package "haven") or read.spss (needs R-package "foreign") for SPSS-files, read_dta ("haven") / read.dta ("foreign") for Stata-files, read_sas ("haven") for SAS-data-files, and read_xpt ("haven") / read.xport ("foreign") for SAS-transport-files. If you would like to use "haven", you may need to install it manually (i.e., install.packages("haven", dep = TRUE)).
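
The correspondence between the arguments of wide2long_omv and those of reshape (varLst ~ varying, varSep ~ sep, varID ~ idvar, varTme ~ timevar) can be sketched with base R alone. The small data frame and the object names (dtaWde, dtaLng) are hypothetical, and the final sorting step only approximates what varSrt is described to do; this is an illustration of the mapping, not the package's actual implementation.

```r
# hypothetical wide data: 3 participants, 2 repeated measurements of X
dtaWde <- data.frame(ID  = c("1", "2", "3"),
                     X_1 = c(0.5, 1.5, 2.5),
                     X_2 = c(3.5, 4.5, 5.5),
                     stringsAsFactors = FALSE)

# varLst ~ varying, varSep ~ sep, varID ~ idvar, varTme ~ timevar
dtaLng <- stats::reshape(dtaWde,
                         direction = "long",
                         varying   = c("X_1", "X_2"),
                         sep       = "_",
                         idvar     = "ID",
                         timevar   = "cond")

# comparable to varSrt = c("ID", "cond"): sort by ID, then by the time variable
dtaLng <- dtaLng[order(dtaLng$ID, dtaLng$cond), ]
rownames(dtaLng) <- NULL

print(dtaLng)
```

After sorting, the measurements of each participant follow one another; without the sorting step, reshape returns all rows of the first measurement before those of the second.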

Examples

if (FALSE) {
library(jmvReadWrite)
# generate a test dataframe with 100 (imaginary) participants / units of
# observation (ID), and 8 repeated measurements of variable (X_1, X_2, ...)
dtaInp <- cbind(data.frame(ID = as.character(seq_len(100))),
                stats::setNames(
                    as.data.frame(matrix(runif(800, -10, 10), nrow = 100)),
                    paste0("X_", 1:8)))
cat(str(dtaInp))
# 'data.frame':	100 obs. of  9 variables:
#  $ ID : chr  "1" "2" "3" "4" ...
#  $ X_1: num  ...
#  $ X_2: num  ...
#  $ X_3: num  ...
#  $ X_4: num  ...
#  $ X_5: num  ...
#  $ X_6: num  ...
#  $ X_7: num  ...
#  $ X_8: num  ...
# this data set is stored as (temporary) RDS-file and later processed by wide2long_omv
nmeInp <- paste0(tempfile(), ".rds")
nmeOut <- paste0(tempfile(), ".omv")
saveRDS(dtaInp, nmeInp)
wide2long_omv(fleInp = nmeInp, fleOut = nmeOut, varID = "ID", varTme = "measure",
    varLst = setdiff(names(dtaInp), "ID"), varSrt = c("ID", "measure"))
# it is required to give at least the arguments fleInp and varID
# "reshape" then assigns all variables except the variable defined by varID to
# varLst (but throws a warning)
# varSrt enforces sorting the data set after the transformation (sorted, the
# measurements within one person come after another; unsorted all measurements
# for one repetition would come after another)

# check whether the file was created and its size
cat(list.files(dirname(nmeOut), basename(nmeOut)))
# -> "file[...].omv" ([...] contains a random combination of numbers / characters)
cat(file.info(nmeOut)$size)
# -> 6939 (approximate size; size may differ in every run [in dependence of how
#          well the generated random data can be compressed])
cat(str(read_omv(nmeOut, sveAtt = FALSE)))
# the data set is now transformed into long format (and each of the measurements
# is now indicated by the variable "measure")
# 'data.frame':	800 obs. of  3 variables:
#  $ ID     : Factor w/ 100 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 2 2 ...
#   ..- attr(*, "missingValues")= list()
#  $ measure: Factor w/ 8 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 1 2 ...
#   ..- attr(*, "missingValues")= list()
#  $ X      : num  ...
#   ..- attr(*, "missingValues")= list()

unlink(nmeInp)
unlink(nmeOut)
}
