conversion: Lint internal data structures

Description

Lint internal data structures

Usage

empty.find
  parse2find(parse.data)
  find2replace(find.data)
  locate2find(loc)

Arguments

Format

'data.frame': 0 obs. of 4 variables:
 $ line1: int
 $ col1 : int
 $ line2: int
 $ col2 : int

Introduction

lint makes use of several functions from different packages that store data in different formats. The functions documented here are utilities for converting between those formats. The formats are:

parse - from the getParseData function. In parse data each element of an expression has its own row.
find - similar to parse but gives a row for each region or expression of interest.
replace - for use with stringr. Uses a column structure with start and end, organized into a matrix with a row for each line.
locate - results from str_locate from stringr. Same as replace for most purposes, but does not include a string.

parse data structure

The parse data structure originates from the getParseData function, which returns an object with the attribute 'data'. Parse formatted data contains a row for every token, string, and expression. The data frame describes a tree structure in which each row is a node. Each node has a parent unless it is a root node, i.e. parent==0. It has the following columns.

line1 starting line of the expression.
col1 starting column.
line2 ending line of the expression.
col2 ending column.
token the token class number.
id the unique id of the expression.
parent the parent of the expression, 0 if none.
top_level the top level expression with which this expression is associated.
token class name of the token.
terminal is this a terminal node? i.e. has no child nodes.
text the actual text of the expression.

The parse data is formatted with C based (0 based) column indexing, e.g. the first two elements would be listed as col1=0, col2=2. The line number, however, is 1 based, so the first line is 1; there is no line zero.
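The tree structure described above can be inspected directly. The following is a minimal sketch, assuming the utils::getParseData function shipped with R 3.0 or later; the two-line snippet being parsed is invented for illustration, and the exact set of columns and the column indexing convention depend on the R version in use, so they may differ from the description above.

```r
# Parse a small, made-up snippet; keep.source = TRUE is needed so that
# parse data is recorded alongside the expressions.
exprs <- parse(text = "x <- 1\ny <- x + 2", keep.source = TRUE)
pd <- utils::getParseData(exprs)

# Root nodes have parent == 0: one row per top level expression.
roots <- pd[pd$parent == 0, ]

# Terminal nodes are the leaves of the tree (the individual tokens).
leaves <- pd[pd$terminal, ]

print(roots[, c("line1", "col1", "line2", "col2", "id", "parent", "token")])
print(leaves$text)
```

Following the parent column upward from any leaf eventually reaches one of the root rows.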

Find data structure

Find data consists of a single row for each section/region, with the first 4 columns of parse.data: line1, col1, line2, and col2, marking the beginning and end of the section. This is a condensation of the parse data, which would have the same columns as well as additional columns, and a row for each expression in the region. Find formatted data is defined to use R style, 1 based, inclusive indexing, so the first two elements would be col1=1, col2=2. Although both col elements are retained in conversion functions, at this time only col columns are used internally.
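The condensation from parse rows to a single find row can be sketched as follows. This is an illustration only, not the package's parse2find implementation: region_to_find is a hypothetical helper, and it assumes (from the col1=0, col2=2 versus col1=1, col2=2 examples) that parse columns are 0 based with an exclusive end, so only the start column needs shifting.

```r
# Hypothetical helper (not part of lint): collapse the rows of one
# contiguous region of parse formatted data into one find formatted row.
region_to_find <- function(region) {
  # Earliest start position in the region.
  first <- region[order(region$line1, region$col1)[1], ]
  # Latest end position in the region.
  ord_end <- order(region$line2, region$col2)
  last <- region[ord_end[length(ord_end)], ]
  data.frame(
    line1 = first$line1,
    col1  = first$col1 + 1L,  # 0 based start -> 1 based start (assumed convention)
    line2 = last$line2,
    col2  = last$col2         # exclusive 0 based end equals inclusive 1 based end
  )
}

# Two made-up tokens on line 1, occupying columns [0, 2) and [3, 5).
region <- data.frame(
  line1 = c(1L, 1L), col1 = c(0L, 3L),
  line2 = c(1L, 1L), col2 = c(2L, 5L)
)
found <- region_to_find(region)
print(found)
```

A list of such regions could be handled with lapply and do.call(rbind, ...) to obtain one row per region.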

Replace data structure

The data structure for replace data is defined as a data frame with columns suitable for use as arguments to str_sub. That is, it has columns

start
end
and either string or line

where string is preferred, but line is used to match up with line data. find2replace uses the line, since the string is not available in the find data. Replace formatted data also uses R style, 1 based, inclusive indexing.
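As a rough sketch of how find data maps onto replace data, the example below builds a replace style data frame from a single-line find row and uses it to extract text. The source lines and the find row are invented, the conversion is not the package's find2replace code, and base substr is used as a stand-in for str_sub so that the example has no dependencies.

```r
# Made-up source lines and a find row covering "x + 2" on line 2.
lines <- c("x <- 1", "y <- x + 2")
find <- data.frame(line1 = 2L, col1 = 6L, line2 = 2L, col2 = 10L)

# Assumption: the region sits on a single line, so line1 identifies the
# line and col1/col2 become start/end unchanged (both are 1 based, inclusive).
repl <- data.frame(line = find$line1, start = find$col1, end = find$col2)

# Look the string up by line number, then extract the region.
extracted <- substr(lines[repl$line], repl$start, repl$end)
print(extracted)
```

Regions that span several lines would need one replace row per line, which this sketch does not attempt.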

Locate data structure

Locate data is defined as the matrix that comes from str_locate. It has columns

start
end

and has a row for every line.
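The shape of locate data can be illustrated without stringr. The sketch below builds a start/end matrix with one row per line using base regexpr as a stand-in for str_locate; the input lines and the pattern are invented, and lines without a match are given NA in both columns.

```r
# Made-up input: three lines, two of which contain an assignment arrow.
lines <- c("x <- 1", "print(x)", "y <- 2")

# regexpr returns the first match position per line, or -1 for no match.
m <- regexpr("<-", lines, fixed = TRUE)
pos <- as.integer(m)
len <- attr(m, "match.length")

# Build 1 based, inclusive start and end columns; NA marks no match.
start <- ifelse(pos < 0L, NA_integer_, pos)
end <- start + len - 1L
loc <- cbind(start = start, end = end)
print(loc)
```

Because the matrix has exactly one row per input line, row i can be paired with line i to obtain the line column of the replace structure.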

Details

parse2find Deprecated. Due to the changes in R version 3.0 this function is no longer necessary.

Expects either a parse formatted data.frame or a list of data.frames. Each data.frame is a contiguous region that is collapsed into a single row of a find formatted data.frame, giving one row for each region.