Ecfun (version 0.2-0)

matchQuote: Match isolated quotes across records

Description

Look for unmatched quotes in a character vector. If one is found, look for a matching quote at the start of the next character string in the vector, possibly after a blank line. If a match is found, merge the two strings and return the resulting shortened character vector.
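The detection step can be illustrated in base R. The following is only a minimal sketch, not the Ecfun implementation: it assumes that an element has an unmatched quote exactly when it contains an odd number of quote characters, and the helper name countQuotes is hypothetical.

```r
# Hypothetical helper (not part of Ecfun): count how often the quote
# character occurs in each element of a character vector.
countQuotes <- function(x, Quote = '"') {
  # gregexpr() returns -1 for "no match", so count only positive positions
  vapply(gregexpr(Quote, x, fixed = TRUE),
         function(m) sum(m > 0), integer(1))
}

x <- c('abc', 'de"f', ' ', '",', 'g"h', 'matched"quotes"', '')
nQuotes <- countQuotes(x)

# Assumption: an odd count means the element has an unmatched quote
unmatched <- which(nQuotes %% 2 == 1)
unmatched
# 2 4 5
```

Under this assumption, the indices found for this input agree with the unmatchedQuotes attribute shown in the Examples section.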

Usage

matchQuote(x,  Quote='"', sep=' ', maxChars2append=2, ...)

Arguments

x

a character vector to scan for unmatched Quotes.

Quote

the Quote character that should appear in pairs

sep

sep argument passed to paste to combine pairs of successive lines with unmatched quotes.

maxChars2append

maximum number of characters allowed in the following string for it to be concatenated onto the preceding string when both (possibly separated by a blank line) contain unmatched Quotes.

...

optional arguments for gsub

Value

The input character vector, possibly shortened, with the following attributes explaining what was found:

  • unmatchedQuotes indices of the input x with an unmatched Quote.

  • blankLinesDropped indices of the input x that were dropped because they (1) followed an unmatched Quote and (2) contained no non-blank characters.

  • quoteLinesAppended indices of the input x that were concatenated with a preceding line because the two lines contained unmatched Quote characters, and concatenating them produced a line with all Quotes matched.

  • ncharsAppended an integer vector of the same length as quoteLinesAppended giving the number of characters in the second line concatenated onto the previous line.

Details

This function was written to help parse data from the US Department of Health and Human Services on cyber-security breaches affecting 500 or more individuals. As of 2014-06-03 the csv version of these data included commas in quotes that are not sep characters, quotes that are not matched, and lines with zero characters followed by lines with 3 characters being a quote and a comma. This function was written to drop the blank lines and append the quote-comma line to the preceding line so that it contained matching quotes.
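To make the repair concrete, here is a simplified base R sketch of the same idea. It is not the Ecfun source: the function name mergeQuoteLines is hypothetical, it skips at most one blank line, it does not record the attributes that matchQuote returns, and the input lines are invented for illustration rather than taken from the HHS file.

```r
# Hypothetical simplified merge (NOT the Ecfun implementation).
mergeQuoteLines <- function(x, Quote = '"', sep = ' ', maxChars2append = 2) {
  # TRUE if s contains an odd number of quote characters
  odd <- function(s) {
    m <- gregexpr(Quote, s, fixed = TRUE)[[1]]
    sum(m > 0) %% 2 == 1
  }
  out <- character(0)
  n <- length(x)
  i <- 1
  while (i <= n) {
    line <- x[i]
    if (odd(line)) {
      j <- i + 1
      # Assumption: skip at most one blank line after an unmatched quote
      if (j <= n && !nzchar(trimws(x[j]))) j <- j + 1
      # Merge only if the candidate is short and also has an unmatched quote
      if (j <= n && odd(x[j]) && nchar(x[j]) <= maxChars2append) {
        line <- paste(line, x[j], sep = sep)
        i <- j  # the skipped blank line and the appended line are consumed
      }
    }
    out <- c(out, line)
    i <- i + 1
  }
  out
}

# Invented csv-like lines: record 2 is broken across a blank line
csvLines <- c('Name,Notes', 'Acme,"lost laptop', '', '",', 'Beta,"theft"')
fixed <- mergeQuoteLines(csvLines)
fixed
```

In this sketch the blank line is dropped only when a merge actually happens; otherwise every input line is kept unchanged.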

See Also

strsplit1, delimMatch

Examples

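library(Ecfun)

# Input: element 2 has an unmatched quote, element 3 is blank,
# and element 4 holds the matching quote followed by a comma
chvec <- c('abc', 'de"f', ' ', '",', 'g"h', 'matched"quotes"', '')
ch. <- matchQuote(chvec)

# Expected result: elements 2 and 4 merged, blank element 3 dropped
chv. <- c('abc', 'de"f ",', 'g"h', 'matched"quotes"', '')
attr(chv., 'unmatchedQuotes') <- c(2, 4, 5)
attr(chv., 'blankLinesDropped') <- 3
attr(chv., 'quoteLinesAppended') <- 4
attr(chv., 'ncharsAppended') <- 2

# Check that the result matches the expected value
all.equal(ch., chv.)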
