df.duplicated: Extract Duplicated or Unique Rows

Description

This function extracts duplicated or unique rows from a matrix or data frame.

Usage

df.duplicated(x, ..., first = TRUE, keep.all = TRUE,
              from.last = FALSE, keep.row.names = TRUE,
              check = TRUE)
df.unique(x, ..., keep.all = TRUE,
          from.last = FALSE, keep.row.names = TRUE,
          check = TRUE)

Value

Returns duplicated or unique rows of the matrix or data frame in x.

Arguments

x: a matrix or data frame.
...: one or more variables, given as unquoted names (no single or double quotes), used to determine duplicated or unique rows. By default, all variables in x are used.
first: logical: if TRUE, the df.duplicated() function returns all duplicated rows, including the first of each set of identical rows.
keep.all: logical: if TRUE, the function will return all variables in x after extracting duplicated or unique rows based on the variables specified in the argument ....
from.last: logical: if TRUE, duplication is evaluated starting from the last row, i.e., the last of a set of identical rows corresponds to duplicated = FALSE. Note that in df.duplicated() this argument is only used when first = FALSE.
keep.row.names: logical: if TRUE, the row names from x are kept, otherwise they are set to NULL.
check: logical: if TRUE, argument specification is checked.
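
The interplay of first and from.last can be illustrated with base R's duplicated(). The following is a minimal sketch under the assumption that the arguments behave as described above; it is not the package's implementation, and the names dat, key, later, earlier, and any_dup are purely illustrative.

```r
# Base R only; all object names here are illustrative assumptions
dat <- data.frame(x4 = c(1, 1, 2, 2, 4))
key <- dat["x4"]   # one-column data frame holding the key variable

# Flags every occurrence after the first
# (roughly analogous to first = FALSE, from.last = FALSE)
later <- duplicated(key)

# Flags every occurrence before the last
# (roughly analogous to first = FALSE, from.last = TRUE)
earlier <- duplicated(key, fromLast = TRUE)

# Flags every row that has at least one identical partner
# (roughly analogous to first = TRUE)
any_dup <- later | earlier

# Rows 1 to 4 share a value of x4 with another row; row 5 does not
dat[any_dup, , drop = FALSE]
```

Combining both directions with a logical OR is what brings the first of each set of identical rows into the result.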

Author

Takuya Yanagida takuya.yanagida@univie.ac.at

Details

Note that df.unique(x) is equivalent to unique(x). The main difference between df.unique() and unique() is that df.unique() provides the ... argument for specifying the variable or variables used to determine unique rows.
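
For comparison, selecting unique rows by a subset of variables in base R requires indexing with duplicated() by hand. The sketch below uses only base R and is an assumption about what a comparable result looks like, not the package's own code; the object names by_x2_x3, keys_only, and no_names are illustrative.

```r
dat <- data.frame(x1 = c(1, 1, 2, 1, 4),
                  x2 = c(1, 1, 2, 1, 6),
                  x3 = c(2, 2, 3, 2, 6),
                  x4 = c(1, 1, 2, 2, 4),
                  x5 = c(1, 1, 4, 4, 3))

# Unique rows based on all variables
unique(dat)

# Unique rows based on x2 and x3, keeping all variables
by_x2_x3 <- dat[!duplicated(dat[, c("x2", "x3")]), ]

# Unique rows based on x2 and x3, keeping only those variables
keys_only <- unique(dat[, c("x2", "x3")])

# Discard the original row names, so rows are renumbered from 1
no_names <- by_x2_x3
rownames(no_names) <- NULL
```

The ... argument of df.unique() replaces the explicit column indexing, and keep.all and keep.row.names cover the last two steps.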

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Examples

dat <- data.frame(x1 = c(1, 1, 2, 1, 4),
                  x2 = c(1, 1, 2, 1, 6),
                  x3 = c(2, 2, 3, 2, 6),
                  x4 = c(1, 1, 2, 2, 4),
                  x5 = c(1, 1, 4, 4, 3))

#--------------------------------------
# df.duplicated() function

# Extract duplicated rows based on all variables
df.duplicated(dat)

# Extract duplicated rows based on x4
df.duplicated(dat, x4)

# Extract duplicated rows based on x2 and x3
df.duplicated(dat, x2, x3)

# Extract duplicated rows based on all variables
# exclude first of identical rows
df.duplicated(dat, first = FALSE)

# Extract duplicated rows based on x2 and x3
# do not return all variables
df.duplicated(dat, x2, x3, keep.all = FALSE)

# Extract duplicated rows based on x4
# consider duplication from the reversed side
df.duplicated(dat, x4, first = FALSE, from.last = TRUE)

# Extract duplicated rows based on x2 and x3
# set row names to NULL
df.duplicated(dat, x2, x3, keep.row.names = FALSE)

#--------------------------------------
# df.unique() function

# Extract unique rows based on all variables
df.unique(dat)

# Extract unique rows based on x4
df.unique(dat, x4)

# Extract unique rows based on x1, x2, and x3
df.unique(dat, x1, x2, x3)

# Extract unique rows based on x2 and x3
# do not return all variables
df.unique(dat, x2, x3, keep.all = FALSE)

# Extract unique rows based on x4
# consider duplication from the reversed side
df.unique(dat, x4, from.last = TRUE)

# Extract unique rows based on x2 and x3
# set row names to NULL
df.unique(dat, x2, x3, keep.row.names = FALSE)