df.duplicated: Extract Duplicated or Unique Rows

Description

The function df.duplicated extracts duplicated rows and the function df.unique extracts unique rows from a matrix or data frame.

Usage

df.duplicated(..., data, first = TRUE, keep.all = TRUE, from.last = FALSE,
              keep.row.names = TRUE, check = TRUE)
df.unique(..., data, keep.all = TRUE, from.last = FALSE,
          keep.row.names = TRUE, check = TRUE)

Value

Returns duplicated or unique rows of the data frame in ... or data.

Arguments

...: an expression indicating the variable names in data used to determine duplicated or unique rows, e.g., df.duplicated(x1, x2, data = dat). Note that the operators ., +, -, ~, :, ::, and ! can also be used to select variables; see Details in the documentation of the df.subset function.
data: a data frame.
first: logical: if TRUE (default), the df.duplicated() function will return duplicated rows including the first of identical rows.
keep.all: logical: if TRUE (default), the function will return all variables in data after extracting duplicated or unique rows based on the variables specified in the argument ....
from.last: logical: if TRUE, duplication will be considered from the reversed side, i.e., the last of identical rows would correspond to duplicated = FALSE. Note that in df.duplicated() this argument is only used when first = FALSE.
keep.row.names: logical: if TRUE (default), the row names from data are kept, otherwise they are set to NULL.
check: logical: if TRUE (default), argument specification is checked.
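
The interplay of first and from.last can be illustrated with base R's duplicated() function. The following is a minimal sketch under the assumption that the arguments behave as described above; it uses only base R, and the object names (key, dup.forward, dup.reverse, dup.any) are illustrative and not part of the package.

```r
dat <- data.frame(x2 = c(1, 1, 2, 1, 6),
                  x3 = c(2, 2, 3, 2, 6))

# Variables used to decide which rows are identical
key <- dat[, c("x2", "x3")]

# Forward scan: the first of identical rows is not flagged
dup.forward <- duplicated(key)

# Reversed scan: the last of identical rows is not flagged
dup.reverse <- duplicated(key, fromLast = TRUE)

# Every row that has at least one identical partner, including the first
dup.any <- dup.forward | dup.reverse

dat[dup.any, ]      # rows 1, 2, 4 (assumed analogue of first = TRUE)
dat[dup.forward, ]  # rows 2, 4    (assumed analogue of first = FALSE)
dat[dup.reverse, ]  # rows 1, 2    (assumed analogue of first = FALSE, from.last = TRUE)
```

Combining the forward and the reversed scan is what makes the first of the identical rows appear in the result, which is also why a scan direction only matters once that row is excluded.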

Author

Takuya Yanagida takuya.yanagida@univie.ac.at

Details

Note that df.unique(x) is equivalent to unique(x). The main difference between df.unique() and unique() is that df.unique() provides the ... argument for specifying one or more variables that are used to determine unique rows.
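
For comparison, selecting unique rows based on a subset of variables can be written in base R as follows. This is only a sketch of the idea using unique() and duplicated(); it is not taken from the package source, and the object names (uniq.all, uniq.key, uniq.cols, uniq.reset) are illustrative.

```r
dat <- data.frame(x1 = c(1, 1, 2, 1, 4),
                  x2 = c(1, 1, 2, 1, 6),
                  x3 = c(2, 2, 3, 2, 6),
                  x4 = c(1, 1, 2, 2, 4),
                  x5 = c(1, 1, 4, 4, 3))

# Unique rows based on all variables
uniq.all <- unique(dat)

# Unique rows based on x2 and x3 only, keeping all variables
uniq.key <- dat[!duplicated(dat[, c("x2", "x3")]), ]

# Unique rows based on x2 and x3, returning only these variables
uniq.cols <- unique(dat[, c("x2", "x3")])

# Discard the original row names so that rows are renumbered from 1
uniq.reset <- uniq.key
rownames(uniq.reset) <- NULL
```

With unique() alone, restricting the comparison to x2 and x3 also drops the remaining variables; retaining them requires indexing with duplicated(), which is the step the ... argument is meant to make more convenient.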

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Examples

dat <- data.frame(x1 = c(1, 1, 2, 1, 4),
                  x2 = c(1, 1, 2, 1, 6),
                  x3 = c(2, 2, 3, 2, 6),
                  x4 = c(1, 1, 2, 2, 4),
                  x5 = c(1, 1, 4, 4, 3))

#-------------------------------------------------------------------------------
# df.duplicated() function

# Example 1: Extract duplicated rows based on all variables
df.duplicated(., data = dat)

# Example 2: Extract duplicated rows based on x4
df.duplicated(x4, data = dat)

# Example 3: Extract duplicated rows based on x2 and x3
df.duplicated(x2, x3, data = dat)

# Example 4: Extract duplicated rows based on all variables
# exclude first of identical rows
df.duplicated(., data = dat, first = FALSE)

# Example 5: Extract duplicated rows based on x2 and x3
# do not return all variables
df.duplicated(x2, x3, data = dat, keep.all = FALSE)

# Example 6: Extract duplicated rows based on x4
# consider duplication from the reversed side
df.duplicated(x4, data = dat, first = FALSE, from.last = TRUE)

# Example 7: Extract duplicated rows based on x2 and x3
# set row names to NULL
df.duplicated(x2, x3, data = dat, keep.row.names = FALSE)

#-------------------------------------------------------------------------------
# df.unique() function

# Example 8: Extract unique rows based on all variables
df.unique(., data = dat)

# Example 9: Extract unique rows based on x4
df.unique(x4, data = dat)

# Example 10: Extract unique rows based on x1, x2, and x3
df.unique(x1, x2, x3, data = dat)

# Example 11: Extract unique rows based on x2 and x3
# do not return all variables
df.unique(x2, x3, data = dat, keep.all = FALSE)

# Example 12: Extract unique rows based on x4
# consider duplication from the reversed side
df.unique(x4, data = dat, from.last = TRUE)

# Example 13: Extract unique rows based on x2 and x3
# set row names to NULL
df.unique(x2, x3, data = dat, keep.row.names = FALSE)
