trim: Methods to Remove Unsemantic Text Prior to Diff

Description

diff* methods, in particular diffPrint, modify the text representation of an object before running the diff to reduce spurious mismatches caused by unsemantic differences. For example, they try to remove matrix row indices and atomic vector indices (i.e. the [1,] or [1] strings at the beginning of each display line).

Usage

trimPrint(obj, obj.as.chr)
# S4 method for ANY,character
trimPrint(obj, obj.as.chr)
trimStr(obj, obj.as.chr)
# S4 method for ANY,character
trimStr(obj, obj.as.chr)
trimChr(obj, obj.as.chr)
# S4 method for ANY,character
trimChr(obj, obj.as.chr)
trimDeparse(obj, obj.as.chr)
# S4 method for ANY,character
trimDeparse(obj, obj.as.chr)
trimFile(obj, obj.as.chr)
# S4 method for ANY,character
trimFile(obj, obj.as.chr)

Arguments

obj

the object

obj.as.chr

character, the printed representation of obj

Value

an integer matrix with length(obj.as.chr) rows and 2 columns. For each line, the first column holds the start and the second column the end character position of the substring to run the diff on.
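To make the shape of the return value concrete, the matrix below is written by hand for the printed lines of matrix(10:12) shown under Details; it is an illustration only and is not produced by trimPrint. Base R's substr() then extracts the part of each line that the diff would compare.

```r
# Printed lines of matrix(10:12), copied from the Details section
lines <- c("     [,1]", "[1,]   10", "[2,]   11", "[3,]   12")

# Hand-written trim matrix (illustration only): keep the header line
# whole and skip the 4-character "[i,]" prefix of each data row
pos <- cbind(start = c(1L, 5L, 5L, 5L), end = nchar(lines))

# Each row of `pos` selects the substring of the matching line
substr(lines, pos[, 1], pos[, 2])
# "     [,1]" "   10" "   11" "   12"
```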

Details

Consider:

> matrix(10:12)
     [,1]
[1,]   10
[2,]   11
[3,]   12
> matrix(11:12)
     [,1]
[1,]   11
[2,]   12

In this case, a line-by-line diff would find all rows of the matrix to be mismatched, because where the data match (the rows containing 11 and 12) the indices do not. By trimming out the row indices before the diff, the diff can recognize that rows 2 and 3 of the first matrix should be matched to rows 1 and 2 of the second.
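The effect of trimming on this example can be sketched in base R. trim_row_idx and keep are hypothetical helpers, not diffobj's trimPrint: the sketch only strips "[i,]" prefixes when they run 1, 2, 3, ... in order, and it does not handle wide matrices whose columns wrap onto several blocks.

```r
# Hypothetical helper, not diffobj's trimPrint: returns the same kind of
# start/end matrix for the printed lines of a plain matrix.
trim_row_idx <- function(obj.as.chr) {
  m <- regexpr("^\\[[0-9]+,\\]", obj.as.chr)  # "[1,]", "[2,]", ...
  hit <- as.integer(m) == 1L
  len <- attr(m, "match.length")
  # Only default indices (1, 2, 3, ... in order) are treated as unsemantic
  idx <- as.integer(sub("^\\[([0-9]+),\\].*$", "\\1", obj.as.chr[hit]))
  ok <- length(idx) > 0L && identical(idx, seq_along(idx))
  start <- if (ok) ifelse(hit, len + 1L, 1L) else rep(1L, length(obj.as.chr))
  cbind(start = as.integer(start), end = nchar(obj.as.chr))
}

# Keep only the part of each line selected by the trim matrix
keep <- function(x, pos) substr(x, pos[, 1], pos[, 2])

a <- capture.output(matrix(10:12))
b <- capture.output(matrix(11:12))
ta <- keep(a, trim_row_idx(a))
tb <- keep(b, trim_row_idx(b))

any(b[-1] %in% a)            # FALSE: no untrimmed data row of b is in a
identical(ta[3:4], tb[2:3])  # TRUE: "   11" and "   12" now line up
```

Lines whose indices do not start at 1, such as c("[2,]   11", "[3,]   12"), are left whole by this sketch, mirroring the rule that non-default indices carry meaning.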

These methods follow an interface similar to that of the guide* methods, with one method for each diff* method except diffCsv, which uses diffPrint internally. The unsemantic differences are added back after the diff for display purposes, and are colored grey to indicate that the diff ignored them.

Currently only trimPrint and trimStr do anything meaningful. trimPrint removes row index headers provided that they are of the default unnamed variety. If you add row names, or if the numeric row indices do not ascend from 1, they are not stripped, because they then carry meaning. trimStr removes the ..$, ..-, and ..@ tokens to minimize spurious matches.
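The token removal done by trimStr can be approximated with a regular expression. This is a sketch, not diffobj's implementation: trim_str_tokens is a hypothetical name, and the pattern assumes the usual str() layout, where nesting appears as leading ".." markers followed by $, @ or - and a space. The input lines are what str(list(a = list(b = list(c = 1)))) typically prints.

```r
# Hypothetical helper (not diffobj's trimStr): skip the leading "$",
# "..$", ".. ..$", "-" and "@" tokens that str() uses to show nesting.
trim_str_tokens <- function(obj.as.chr) {
  # optional indent, any number of ".." markers, then $, @ or -, then a space
  m <- regexpr("^ *(\\.\\. ?)*[$@-] ", obj.as.chr)
  hit <- as.integer(m) == 1L
  len <- attr(m, "match.length")
  cbind(start = as.integer(ifelse(hit, len + 1L, 1L)),
        end = nchar(obj.as.chr))
}

x <- c("List of 1",
       " $ a:List of 1",
       "  ..$ b:List of 1",
       "  .. ..$ c: num 1")
pos <- trim_str_tokens(x)
substr(x, pos[, 1], pos[, 2])
# "List of 1" "a:List of 1" "b:List of 1" "c: num 1"
```

With the tokens gone, a component that moves to a different nesting depth can still be matched on its name and value.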

You can modify how text is trimmed by passing your own function to the trim argument of the diff* methods, or by defining trim* methods for your objects. Such functions must return the same kind of value as the methods documented here: an integer matrix with the start and end character positions (columns) of the text that should be kept and used in the diff.
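A custom trim function only needs to accept obj and obj.as.chr and return the start/end matrix. The function below is a hypothetical example that ignores leading blanks; the final call assumes diffobj is installed and is skipped otherwise.

```r
# Hypothetical custom trim function: ignore leading blanks, keeping each
# line from its first non-blank character to its end.
trim_leading_ws <- function(obj, obj.as.chr) {
  start <- as.integer(regexpr("[^ ]", obj.as.chr))
  start[start < 1L] <- 1L  # lines with no non-blank character: keep whole
  cbind(start = start, end = nchar(obj.as.chr))
}

x <- c("  alpha", "beta", "")
pos <- trim_leading_ws(NULL, x)
substr(x, pos[, 1], pos[, 2])  # "alpha" "beta" ""

# Passing it through the trim argument, only if diffobj is installed
if (requireNamespace("diffobj", quietly = TRUE)) {
  print(diffobj::diffChr(c("  alpha", "beta"), c("alpha", "beta"),
                         trim = trim_leading_ws))
}
```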

As with guides, trimming is done on a best-effort basis and may fail with “pathological” display representations. Since the diff still works when trimming fails, this is considered an acceptable compromise. Trimming is more likely to fail with nested recursive structures.
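One way to obtain this kind of graceful degradation is to validate whatever a trim function returns and, on error or malformed output, keep every line whole. safe_trim below is a hypothetical sketch of that idea, not a description of how diffobj handles failed trimming internally; its validity rules (integer matrix, one row per line, positions within each line) are assumptions based on the Value section.

```r
# Hypothetical wrapper (not diffobj's internals): run a trim function and,
# if it errors or returns a malformed matrix, fall back to untrimmed lines.
safe_trim <- function(trim_fun, obj, obj.as.chr) {
  n <- length(obj.as.chr)
  chars <- nchar(obj.as.chr)
  untrimmed <- cbind(start = rep(1L, n), end = chars)
  res <- tryCatch(trim_fun(obj, obj.as.chr), error = function(e) NULL)
  valid <- is.matrix(res) && is.integer(res) &&
    identical(dim(res), c(n, 2L)) && !anyNA(res) &&
    all(res[, 1] >= 1L) && all(res[, 2] <= chars) &&
    all(res[, 2] >= res[, 1] - 1L)  # end = start - 1 means an empty substring
  if (valid) res else untrimmed
}

lines <- c("[1,]   10", "[2,]   11")
bad  <- function(obj, obj.as.chr) stop("pathological display")
good <- function(obj, obj.as.chr) {
  cbind(rep(5L, length(obj.as.chr)), nchar(obj.as.chr))
}
safe_trim(bad, NULL, lines)   # whole lines: start positions 1, 1
safe_trim(good, NULL, lines)  # trimmed: start positions 5, 5
```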