
stri_duplicated() determines which strings in a character vector are duplicates of other elements.

stri_duplicated_any() determines whether there are any duplicated strings in a character vector.

stri_duplicated(str, fromLast = FALSE, ..., opts_collator = NULL)
stri_duplicated_any(str, fromLast = FALSE, ..., opts_collator = NULL)

opts_collator: a named list with ICU Collator's options, see stri_opts_collator; NULL for default collation options

stri_duplicated() returns a logical vector of the same length as str. Each of its elements indicates whether a canonically equivalent string was already found in str.

stri_duplicated_any() returns a single non-negative integer. A value of 0 indicates that all the elements in str are unique. Otherwise, it gives the index of the first non-unique element.
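
As a brief illustration of the two return types, here is a minimal sketch. It assumes the stringi package is installed; the vector `x` and the names `dup` and `idx` are made up for this example and are not part of the package.

```r
library(stringi)

# Hypothetical input, chosen only for illustration
x <- c("apple", "pear", "apple", "fig", "pear")

# Logical vector with one element per string in x;
# TRUE marks a string that already occurred earlier in x
dup <- stri_duplicated(x)

# Keep only the first occurrence of each string
first_only <- x[!dup]
print(first_only)

# A single integer: 0 if there are no duplicates, otherwise the
# index of the first element that repeats an earlier one
idx <- stri_duplicated_any(x)
if (idx > 0) {
  message("first duplicate at position ", idx, ": ", x[idx])
}
```

Because 0 is never a valid R index, testing `idx > 0` is enough to tell "no duplicates" apart from a real position.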

Unlike duplicated and anyDuplicated, these functions test for canonical equivalence of strings (and not whether the strings are just bytewise equal). Such operations are locale-dependent. Hence, stri_duplicated and stri_duplicated_any are significantly slower (but much better suited for natural language processing) than their base R counterparts.
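
The difference between bytewise and canonical comparison can be sketched with two spellings of the same letter. This is an illustrative example only: it assumes stringi is installed and uses the default collator settings, and the names `nfc` and `nfd` are hypothetical.

```r
library(stringi)

# Two encodings of the same user-perceived character:
nfc <- "\u00e9"    # precomposed LATIN SMALL LETTER E WITH ACUTE
nfd <- "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The underlying byte sequences differ
same_bytes <- identical(charToRaw(nfc), charToRaw(nfd))

# Base R sees two distinct strings
base_result <- anyDuplicated(c(nfc, nfd))

# stringi treats the two strings as canonically equivalent
stri_result <- stri_duplicated_any(c(nfc, nfd))

print(c(same_bytes = same_bytes, base = base_result, stringi = stri_result))
```

If the input is known to be normalized already and only bytewise equality matters, the base R functions are the cheaper choice.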

See also stri_unique for extracting unique elements.
Other locale_sensitive: %s<%, stri_compare, stri_count_boundaries, stri_enc_detect2, stri_extract_all_boundaries, stri_locate_all_boundaries, stri_opts_collator, stri_order, stri_split_boundaries, stri_trans_tolower, stri_unique, stri_wrap, stringi-locale, stringi-search-boundaries, stringi-search-coll

library(stringi)

# In the following examples, there are 3 duplicated values:
# "a" is repeated 2 times and NA 1 time
stri_duplicated(c("a", "b", "a", NA, "a", NA))
stri_duplicated(c("a", "b", "a", NA, "a", NA), fromLast=TRUE)
stri_duplicated_any(c("a", "b", "a", NA, "a", NA))
# compare the results:
stri_duplicated(c("\u0105", stri_trans_nfkd("\u0105")))
duplicated(c("\u0105", stri_trans_nfkd("\u0105")))
stri_duplicated(c("gro\u00df", "GROSS", "Gro\u00df", "Gross"), strength=1)
duplicated(c("gro\u00df", "GROSS", "Gro\u00df", "Gross"))