Ecfun (version 0.2-2)

matchName: Match surname and givenName in a table

Description

Use parseName to split a name into surname and givenName, then look for matches in a table.

Usage

matchName(x, data, Names=1:2, 
          nicknames=matrix(character(0), 0, 2), 
          namesNotFound="attr.replacement", ...)
matchName1(x1, data, name=data[, 1],     
          nicknames=matrix(character(0), 0, 2), ...)

Arguments

x

One of the following:

  • A character matrix or data.frame with the same number of rows as data. The best partial match is sought in Names. The algorithm stops when a unique match is found; any remaining columns of x are then ignored. Any nicknames are ignored for the first column but not for subsequent columns.

  • A character vector whose length matches the number of rows of data. This will be replaced by parseName(x).

data

a character matrix or a data.frame. If surname and givenName are character vectors of names, their length must match the number of rows of data.

Names

One of the following in which matches for x will be sought:

  • A character vector or matrix or a data.frame for which NROW(Names) == nrow(data).

  • Something to select columns of data to produce a character vector or matrix or data.frame via data[, Names]. In this case, accents will be stripped using subNonStandardNames.

nicknames

a character matrix with two columns, each row giving a pair of names like "Pete" and "Peter" that should be regarded as equivalent if no exact match(es) is(are) found.
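As an illustration of the two-column layout described above, the following base R sketch builds a small nickname table and looks up equivalent names. The helper equivalentNames is hypothetical and not part of Ecfun; it assumes each pair is treated symmetrically, which the documentation does not state.

```r
# Hypothetical nickname table: one row per pair of equivalent names.
nicknames <- matrix(c("Pete",    "Peter",
                      "William", "Bill"),
                    ncol = 2, byrow = TRUE)

# Hypothetical helper (not part of Ecfun): all names regarded as
# equivalent to `nm`, checking both columns (assumed symmetric).
equivalentNames <- function(nm, nicknames) {
  inCol1 <- nicknames[, 1] == nm
  inCol2 <- nicknames[, 2] == nm
  unique(c(nm, nicknames[inCol1, 2], nicknames[inCol2, 1]))
}

equivalentNames("Bill", nicknames)   # "Bill" plus its partner "William"
equivalentNames("Oscar", nicknames)  # no pair listed, so only "Oscar"
```

The default nicknames=matrix(character(0), 0, 2) is simply an empty table of this shape, so no substitutions are attempted.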

...

optional arguments passed to subNonStandardNames

x1

a character vector of names to match name.

NOTE: matchName calls subNonStandardNames, but matchName1 does not. Thus, x1 is assumed NOT to contain characters outside standard English.

name

A character vector or matrix for which NROW(name) == nrow(data).

NOTE: matchName calls subNonStandardNames, but matchName1 does not. Thus, name is assumed NOT to contain characters outside standard English.

namesNotFound

character vector passed to subNonStandardNames and used to compute any "namesNotFound" attribute of the object returned by parseName.

Value

matchName returns a list of the same length as x, each component of which is an object obtained as a subset of rows of data, or NULL if no acceptable matches are found. The list may have an attribute "namesNotFound", as determined by the argument of that name.

matchName1 returns a list of vectors of integers for subsets of data matching x1.
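Because unmatched names appear as NULL components, it can help to separate found from not-found entries before further processing. The sketch below uses a hand-built list that only mimics the documented shape; the object result is an assumption for illustration, not actual matchName output.

```r
# Hand-built stand-in for a matchName result (illustrative only):
# one component holds a one-row subset of data, the other is NULL.
result <- list(
  "Cardenas, Andre Bruce" = matrix(c("Cardenas", "Andre B.", "123"), nrow = 1),
  "Zebra" = NULL
)

# TRUE where a match was found, FALSE where the component is NULL.
found <- !vapply(result, is.null, logical(1))

matched   <- result[found]    # components with at least one matching row
unmatched <- names(result)[!found]
```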

Details

*** 1. matchName(x, data, Names, nicknames, ...):

1.1. if(length(dim(x))<2) x <- parseName(x, ...)

1.2. x1 <- matchName1(x[, 1], data, Names[1], ...)

1.3. For any component i of x1 with multiple rows, let x1i <- matchName1(x[i, 2], x1[[i]], Names[-1], nicknames=nicknames, ...). If nrow(x1i)>0, x1[[i]] <- x1i; else leave unchanged.

1.4. return x1
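Steps 1.2 and 1.3 above amount to narrowing by surname first and consulting the given name only when the surname alone is ambiguous. The following base R sketch shows that idea for a single name. The helper narrowMatch is hypothetical, uses exact word matching, and skips accent and nickname handling, so it is a simplified illustration rather than the Ecfun implementation.

```r
# Hypothetical data: surname, given name, id.
Data <- matrix(c("Feld",     "Don",      "789",
                 "Cardenas", "Don",      "456",
                 "Cardenas", "Andre B.", "123",
                 "Young",    "Bill",     "369"),
               ncol = 3, byrow = TRUE)

# Hypothetical helper (not Ecfun code): match the surname, then use the
# given name only to break ties among several surname matches.
narrowMatch <- function(surname, givenName, data) {
  rows <- which(data[, 1] == surname)            # compare step 1.2
  if (length(rows) > 1) {                        # compare step 1.3
    givenWords <- strsplit(givenName, " ")[[1]]
    keep <- vapply(strsplit(data[rows, 2], " "),
                   function(w) any(givenWords %in% w), logical(1))
    if (any(keep)) rows <- rows[keep]            # else leave unchanged
  }
  if (length(rows) == 0) return(NULL)            # no acceptable match
  data[rows, , drop = FALSE]
}

narrowMatch("Cardenas", "Andre Bruce", Data)  # given name breaks the tie
narrowMatch("Feld", "Someone", Data)          # unique surname; given name unused
narrowMatch("Zebra", "", Data)                # no surname match
```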

===========

*** 2. matchName1(x1, data, name, nicknames, ...):

2.1. If name indicates a column of data, replace with data[, name].

2.2. xsplit <- strsplit(x1, ' ')

2.3. nx <- length(x1); xlist <- vector(nx, mode='list')

2.4. for(j in 1:nx):

2.5. xj <- xsplit[[j]]

2.6. let jd = the subset of names that match xj or subNonStandardNames(xj) or nicknames of xj; xlist[[j]] <- jd.

2.7. return xlist
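Steps 2.2 through 2.7 above can be sketched in base R as follows. The function matchSketch1 is a hypothetical stand-in, not the package code: it assumes exact word matching and omits the subNonStandardNames and nicknames alternatives of step 2.6.

```r
# Simplified stand-in for matchName1 (hypothetical name: matchSketch1).
# Assumption: a row matches when any word of x1[j] equals any word in
# any column of `name`; accents and nicknames are not handled.
matchSketch1 <- function(x1, name) {
  name <- as.matrix(name)                 # treat a vector as one column
  xsplit <- strsplit(x1, " ")             # step 2.2
  xlist <- vector("list", length(x1))     # step 2.3
  names(xlist) <- x1
  for (j in seq_along(x1)) {              # step 2.4
    xj <- xsplit[[j]]                     # step 2.5
    # step 2.6 (simplified): rows sharing at least one word with xj
    hit <- apply(name, 1, function(row) {
      any(xj %in% unlist(strsplit(row, " ")))
    })
    jd <- which(hit)
    if (length(jd) > 0) xlist[[j]] <- jd  # otherwise the component stays NULL
  }
  xlist                                   # step 2.7
}

# Hypothetical data: surname and given name columns.
Data <- matrix(c("Feld",     "Don",
                 "Cardenas", "Don",
                 "Cardenas", "Andre B.",
                 "Smith",    "George",
                 "Young",    "Bill"),
               ncol = 2, byrow = TRUE)

res <- matchSketch1(c("Cardenas", "Don", "Zebra"), Data)
```

Here res is a named list of integer row indices, with a NULL component for "Zebra", which has no match in either column.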

See Also

parseName, subNonStandardNames

Examples

# NOT RUN {
##
## 1.  Names to match exercising many possible combinations 
##     of surname with 0, 1, >1 matches possibly after 
##     replacing with subNonStandardNames 
##     combined with possibly multiple givenName combinations 
##     with 0, 1, >1 matches possibly requiring replacing with 
##     subNonStandardNames or nicknames 
##
# NOTE:  "_" could also be "e" with an accent;  
#    not included with this documentation, because 
#    non-English characters generate warnings in standard tests.  
Names2mtch <- c("Andr_ Bruce C_rdenas", "Dolores Ella Feinstein",
           "George Homer", "Inez Jane Kappa", "Luke Michael Noel", 
           "Oscar Papa", "Quincy Ra_l Stevens", 
           "Thomas U. Vel_zquez", "William X. Young", 
           "Zebra")
##
## 2.  Data = matrix(..., byrow=TRUE) to exercise 
##     the combinations from 1 
##
Data1 <- matrix(c("Feld", "Don", "789", 
                  "C_rdenas", "Don", "456", 
                  "C_rdenas", "Andre B.", "123", 
                  "Smith", "George", "aaa", 
                  "Young", "Bill", "369"), 
                ncol=3, byrow=TRUE)
Data1. <- subNonStandardNames(Data1)                
##
## 3.  matchName1
##        
parceNm1 <- parseName(Names2mtch)
match1.1 <- matchName1(parceNm1[, 'surname'], Data1.)

# check
match1.1s <- vector('list', 10)
match1.1s[[1]] <- 2:3
match1.1s[[9]] <- 5
names(match1.1s) <- parceNm1[, 'surname'] 
# }
# NOT RUN {
all.equal(match1.1, match1.1s)
# }
# NOT RUN {
##
## 4.  matchName1 with name = multiple columns 
##
match1.2 <- matchName1(c('Cardenas', 'Don'), Data1., 
                       name=Data1.[, 1:2])

# check 
match1.2a <- list(Cardenas=2:3, Don=1:2)
# }
# NOT RUN {
all.equal(match1.2, match1.2a)
# }
# NOT RUN {
##
## 5.  matchName 
##
nickNames <- matrix(c("William", "Bill"), 1, byrow=TRUE)

match1 <- matchName(Names2mtch, Data1, nicknames=nickNames)
                  
# check 
match1a <- list("Cardenas, Andre Bruce"=Data1[3,, drop=FALSE ], 
                "Feinstein, Dolores Ella"=NULL, 
                "Homer, George"=NULL, "Kappa, Inez Jane"=NULL, 
                "Noel, Luke Michael"=NULL, "Papa, Oscar"=NULL, 
                "Stevens, Quincy Raul"=NULL, 
                "Velazquez, Thomas U."=NULL, 
                "Young, William X."=Data1[5,, drop=FALSE], 
                "Zebra"=NULL)
# }
# NOT RUN {
all.equal(match1, match1a)
# }
# NOT RUN {
##
## 6.  namesNotFound 
##
tstNotFound <- matchName('xx_x', Data1)

# check 
tstNF <- list('xx_x'=NULL)
attr(tstNF, 'namesNotFound') <- 'xx_x'
# }
# NOT RUN {
all.equal(tstNotFound, tstNF)
# }
# NOT RUN {
##
## 7.  matchName(NULL) to simplify use 
##
mtchNULL <- matchName(NULL, Data1)
# }
# NOT RUN {
all.equal(mtchNULL, NULL)
# }