duplicatedMatching: Searching of duplicated records in a bibliographic database
Description
Searches for duplicated records in a bibliographic data frame and removes them.
Usage
duplicatedMatching(M, Field = "TI", exact = FALSE, tol = 0.95)
Value
The value returned by duplicatedMatching is a data frame with the duplicated records removed.
Arguments
M
is the bibliographic data frame.
Field
is a character object. It indicates one of the field tags used to identify duplicated records. Field can be equal to one of these tags: TI (title), AB (abstract), UT (manuscript ID).
exact
is logical. If exact = TRUE, the function searches for duplicates using exact matching. If exact = FALSE, the function uses the restricted Damerau-Levenshtein distance to find duplicated documents.
tol
is a numeric value giving the minimum relative similarity required to treat two manuscripts as duplicates. Default value is tol = 0.95.
tol takes effect only when the restricted Damerau-Levenshtein distance is used, that is, when exact = FALSE.
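As a rough illustration of what exact matching means, the sketch below drops rows whose TI value repeats, using base R's duplicated(). The toy data frame and its values are hypothetical, and the filter is an assumption about the behaviour described above, not the package's own code.

```r
# Hypothetical toy bibliographic data frame (values invented for illustration).
M <- data.frame(
  TI = c("NETWORK ANALYSIS OF CITATIONS",
         "NETWORK ANALYSIS OF CITATIONS",
         "TOPIC MODELS IN SCIENCE MAPPING"),
  PY = c(2015, 2015, 2018),
  stringsAsFactors = FALSE
)

# Exact matching: keep only the first occurrence of each distinct title.
M_unique <- M[!duplicated(M$TI), ]
nrow(M_unique)
```

A record whose title differs by even a single character survives this filter, which is the case that exact = FALSE and tol are meant to handle.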
Details
A bibliographic data frame is obtained with the conversion function convert2df.
It is a data matrix with cases corresponding to manuscripts and variables corresponding to the field tags of the original SCOPUS or Clarivate Analytics WoS file.
The function identifies duplicated records in a bibliographic data frame and deletes them.
When exact = FALSE, duplicate entries are identified through the restricted Damerau-Levenshtein distance.
Two manuscripts whose relative similarity is greater than the tol argument are stored in the output data frame only once.
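The idea of a relative similarity compared against tol can be sketched in base R. This is a minimal sketch under stated assumptions, not the function's implementation: plain Levenshtein distance from utils::adist stands in for the restricted Damerau-Levenshtein distance, the distance is normalised by the length of the longer string, and the titles are hypothetical.

```r
# Hypothetical titles; the second differs from the first by one character.
titles <- c("BIBLIOMETRIC ANALYSIS OF CITATION NETWORKS",
            "BIBLIOMETRIC ANALYSIS OF CITATION NETWORK",
            "A SURVEY OF TOPIC MODELS")

# Assumption: plain Levenshtein distance (utils::adist) is used here as a
# stand-in for the restricted Damerau-Levenshtein distance.
d <- utils::adist(titles)

# Assumption: normalise each distance by the length of the longer string,
# so that identical strings get similarity 1.
len <- nchar(titles)
max_len <- outer(len, len, pmax)
similarity <- 1 - d / max_len

tol <- 0.95

# Keep a record only if no earlier record has similarity greater than tol.
n <- length(titles)
keep <- rep(TRUE, n)
for (i in seq_len(n)[-1]) {
  keep[i] <- !any(similarity[i, seq_len(i - 1)] > tol)
}

titles[keep]
```

Lowering tol makes the filter more aggressive, because pairs with larger edit distances then count as duplicates.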
See Also
convert2df to import and convert a WoS or SCOPUS export file into a bibliographic data frame.
biblioAnalysis function for bibliometric analysis.