papeR (version 1.0-5)

labels: Extract labels from and set labels for data frames

Description

Labels can be stored as an attribute "variable.label" for each variable in a data set using the assignment function. With the extractor function one can access these labels.

Usage

# S3 method for data.frame
labels(object, which = NULL, abbreviate = FALSE, ...)

## assign labels
labels(data, which = NULL) <- value

## check if data.frame is a special labeled data.frame ('ldf')
is.ldf(object)

## convert object to labeled data.frame ('ldf')
convert.labels(object)
as.ldf(object, ...)

## special plotting function for labeled data.frames ('ldf')
# S3 method for ldf
plot(x, variables = names(x), labels = TRUE, by = NULL, with = NULL,
     regression.line = TRUE, line.col = "red", ...)

Arguments

object

a data.frame.

data

a data.frame.

which

either a number indicating the label to extract or a character string with the variable name for which the label should be extracted. One can also use a vector of numerics or character strings to extract multiple labels. If which is NULL (default), all labels are returned.

value

a vector containing the labels (in the order of the variables). If which is given, only the corresponding subset is labeled. Note that all other labels contain the variable name as label afterwards.

abbreviate

logical (default: FALSE). If TRUE variable labels are abbreviated such that they remain unique. See abbreviate for details. Further arguments to abbreviate can be specified (see below).

...

further options passed to function abbreviate if argument abbreviate = TRUE.

In x[...], ... can be used to specify indices for extraction. See [ for details.

In plot, ... can be used to specify further graphical parameters.

x

a labeled data.frame with class 'ldf'.

variables

character vector or numeric vector defining the variables that should be included in the plot. Per default, all numeric and factor variables of x are used.

labels

labels for the variables. If labels = TRUE (the default), labels(data, which = variables) is used as labels. If labels = NULL, variables is used as label. labels can also be specified as a character vector.

by

a character or numeric value specifying a variable in the data set. This variable can be either a grouping factor or is used as numeric y-variable (see with for details). Per default no grouping is applied. See also ‘Details’ and ‘Examples’.

with

a character or numeric value specifying a numeric variable with which to “correlate” all variables specified in variables. For numeric variables a scatterplot is plotted, for factor variables one gets a grouped boxplot. Per default no variable is given here. Instead of with one can also specify a numeric variable in by with the same results. See also ‘Details’ and ‘Examples’.

regression.line

a logical argument specifying if a regression line should be added to scatter plots (which are plotted if both variables and by are numeric values).

line.col

the color of the regression line.

Value

labels(data) returns a named vector of variable labels, where the names match the variable names and the values represent the labels.

Details

All labels are stored as attributes of the columns of the data frame, i.e., each variable has (up to) one attribute which contains the variable label.
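The storage scheme can be illustrated in base R without papeR. The following is a minimal sketch, assuming only the attribute name "variable.label" given in the Description; the helper get_labels() is hypothetical and merely mimics the documented fallback to the variable name when no label is set.

```r
# Sketch in base R only; get_labels() is a hypothetical helper, not papeR code.
data <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))

# attach a label to column "a" as a column attribute
attr(data$a, "variable.label") <- "Age in years"

# collect one label per column, falling back to the column name
get_labels <- function(df) {
  vapply(names(df), function(nm) {
    lab <- attr(df[[nm]], "variable.label")
    if (is.null(lab)) nm else lab
  }, character(1))
}

get_labels(data)
```

The result is a named character vector with one entry per column, which matches the shape described under Value.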

One can set or extract labels from data.frame objects. If no labels are specified labels(data) returns the column names of the data frame.

Using abbreviate = TRUE, all labels are abbreviated to (at least) 4 characters such that they are unique. Other minimal lengths can be specified by setting minlength (see examples below).
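The effect of minlength can be explored directly with the base R function abbreviate, to which these options are passed. This is a small sketch using the label strings from the Examples section; the variable names long_labels and short_labels are illustrative only.

```r
# Base R abbreviate(); the label strings are taken from the Examples section.
long_labels <- c("This is a long label", "This is another long label",
                 "This also")

# abbreviate to at least 10 characters while keeping the results unique
short_labels <- abbreviate(long_labels, minlength = 10)
short_labels

# the abbreviations are unique, and the originals are kept as names
anyDuplicated(short_labels) == 0
names(short_labels)
```

Labels that are already shorter than minlength are left unchanged.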

Univariate plots can be easily obtained for all numeric and factor variables in a data set data by using plot(data).

Bivariate plots can be obtained by specifying by. In case of a factor variable, grouped boxplots or spineplots are plotted depending on the class of the variable specified in variables. In case of a numeric variable, grouped boxplots or scatter plots are plotted depending on the class of the variable specified in variables. Note that one cannot specify by and with at the same time (as they are internally identical). Note that missing values are excluded plot-wise (also for bivariate plots).
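The rule for choosing a bivariate plot type can be summarised as a small decision function. This is only a sketch of the behaviour described above, not the package's implementation: plot_kind() is a hypothetical helper, and the use of complete.cases to show plot-wise exclusion of missing values is an assumption about how such exclusion could be done.

```r
# Hypothetical helper (not part of papeR): names the plot type implied by
# the classes of the plotted variable and of the `by` variable.
plot_kind <- function(variable, by) {
  if (is.factor(by)) {
    # factor `by`: boxplots for numeric variables, spineplots for factors
    if (is.numeric(variable)) "grouped boxplot" else "spineplot"
  } else {
    # numeric `by`: scatter plots for numeric variables, boxplots for factors
    if (is.numeric(variable)) "scatter plot" else "grouped boxplot"
  }
}

d <- data.frame(a = 1:10, b = 10:1, z = factor(rep(2:3, each = 5)))
plot_kind(d$a, by = d$z)  # numeric variable, factor by
plot_kind(d$z, by = d$z)  # factor variable, factor by
plot_kind(d$a, by = d$b)  # numeric variable, numeric by
plot_kind(d$z, by = d$b)  # factor variable, numeric by

# plot-wise exclusion: only rows complete for the two plotted variables count
d$a[1] <- NA
ok <- complete.cases(d$a, d$b)
sum(ok)
```

Plot-wise exclusion means a row with a missing value in a is dropped only from plots that involve a, not from plots of other variables.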

See Also

read.spss in package foreign

Examples

# NOT RUN {
############################################################
### Basic labels manipulations

data <- data.frame(a = 1:10, b = 10:1, c = rep(1:2, 5))
labels(data)  ## only the variable names
is.ldf(data) ## not yet

## now set labels
labels(data) <- c("my_a", "my_b", "my_c")
## one gets a named character vector of labels
labels(data)
## data is now a ldf:
is.ldf(data)

## Alternatively one could use as.ldf(data) or convert.labels(data);
## this would keep the default labels but set the class
## correctly.

## set labels for a and b only
## Note that which represents the variable names!
labels(data, which = c("a", "b")) <- c("x", "y")
labels(data)

## reset labels (to variable names):
labels(data) <- NULL
labels(data)

## set label for a only and use default for other labels:
labels(data, which = "a") <- "x"
labels(data)

## attach label for new variable:
data2 <- data
data2$z <- as.factor(rep(2:3, each = 5))
labels(data2)  ## no real label for z, only variable name
labels(data2, which = "z") <- "new_label"
labels(data2)


############################################################
### Abbreviate labels

## attach long labels to data
labels(data) <- c("This is a long label", "This is another long label",
                  "This also")
labels(data)
labels(data, abbreviate = TRUE, minlength = 10)


############################################################
### Data manipulations

## reorder dataset:
tmp <- data2[, c(1, 4, 3, 2)]
labels(tmp)
## labels are kept and order is updated

## subsetting to single variables:
labels(tmp[, 2])  ## not working as tmp[, 2] drops to vector
## note that the label still exists but cannot be extracted
## using labels.default()
str(tmp[, 2])

labels(tmp[, 2, drop = FALSE]) ## prevent dropping

## one can also cbind labeled data.frame objects:
labels(cbind(data, tmp[, 2]))
## or better:
labels(cbind(data, tmp[, 2, drop = FALSE]))
## or rbind labeled data.frame objects:
labels(rbind(data, tmp[, -2]))


############################################################
### Plotting data sets

## plot the data auto"magically"; numerics as boxplot, factors as barplots
par(mfrow = c(2,2))
plot(data2)

## a single plot
plot(data2, variables = "a")
## grouped plot
plot(data2, variables = "a", by = "z")
## make "c" a factor and plot "c" vs. "z"
data2$c <- as.factor(data2$c)
plot(data2, variables = "c", by = "z")
## the same
plot(data2, variables = 3, by = 4)

## plot everything against "b"
## (grouped boxplots, stacked barplots or scatterplots)
plot(data2, with = "b")
# }
