seqid: Generate Group-Id from Integer Sequences

Description

seqid can be used to group sequences of integers in a vector, e.g. seqid(c(1:3, 5:7)) becomes c(rep(1,3), rep(2,3)). It also supports increments > 1, unordered sequences, and missing values in the sequence.

Some applications are to facilitate identification of, and grouped operations on, (irregular) time-series and panels.
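The grouping idea described above can be sketched in a few lines of base R. This is only an illustration of the concept, not the package's implementation: seqid_sketch is a hypothetical helper, and it assumes a non-empty integer vector without missing values that is already in the order in which it should be traversed.

```r
# Hypothetical base-R helper (not part of collapse): start a new group
# whenever the step between neighbouring values differs from `del`.
seqid_sketch <- function(x, del = 1L) {
  cumsum(c(TRUE, diff(x) != del))
}

x <- c(1:3, 5:7)
ids <- seqid_sketch(x)
print(ids)  # two runs of consecutive integers, so two group ids
```

The cumulative sum increments exactly at the positions where a sequence breaks, which is why the first element is always flagged with TRUE.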

Usage

seqid(x, o = NULL, del = 1L, start = 1L, na.skip = FALSE,
      skip.seq = FALSE, check.o = TRUE)

Arguments

x

a factor or integer vector. Numeric vectors will be converted to integer, i.e. rounded.

o

an (optional) integer ordering vector specifying the order by which to pass through x.

del

integer. The integer delimiting two consecutive points in a sequence. del = 1 means seqid tracks sequences of the form c(1,2,3,..), del = 2 tracks sequences c(1,3,5,..) etc.

start

integer. The starting value of the resulting sequence id. Default is starting from 1. For C++ programmers, starting from 0 could be a better choice.

na.skip

logical. TRUE skips missing values in the sequence. With na.skip = TRUE, the default behavior is skipping such that seqid(c(1, NA, 2)) is regarded as one sequence and coded as c(1, NA, 1).

skip.seq

logical. If na.skip = TRUE, this changes the behavior such that missing values are viewed as part of the sequence, i.e. seqid(c(1, NA, 3)) is regarded as one sequence and coded as c(1, NA, 1).

check.o

logical. Programmer's option: FALSE prevents checking that each element of o is in the range [1, length(x)]; only the length of o is checked. This gives some extra speed, but will terminate R if any element of o is too large or too small.
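To make the roles of o and start more concrete, here is a hedged base-R sketch. The function seqid_ordered is hypothetical and not part of collapse; it assumes a non-empty integer vector without missing values and a valid ordering vector, and it assumes that ids are returned in the original order of x.

```r
# Hypothetical helper: walk through x in the order given by `o`, number the
# runs of values spaced by `del` starting at `start`, and write the ids back
# to the original positions.
seqid_ordered <- function(x, o = seq_along(x), del = 1L, start = 1L) {
  xo <- x[o]                                        # x in pass-through order
  id <- cumsum(c(TRUE, diff(xo) != del)) + start - 1L
  out <- integer(length(x))
  out[o] <- id                                      # undo the reordering
  out
}

x <- c(5L, 1L, 6L, 2L, 7L, 3L)       # unordered: holds 1:3 and 5:7
ids1 <- seqid_ordered(x, order(x))              # ids counted from 1
ids0 <- seqid_ordered(x, order(x), start = 0L)  # ids counted from 0
print(ids1)
print(ids0)
```

Passing the ordering separately leaves x itself untouched, so the resulting ids line up with the rows of the original data.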

Value

An integer vector of class 'qG'. See qG.

Details

seqid was created primarily to deal with problems of computing lagged values, differences and growth rates on irregularly spaced time-series and panels (#26). flag, fdiff and fgrowth do not natively support such panels because they do not pre-compute an ordering of the data, but compute the ordering directly from the supplied id and time variables and raise errors for gaps and repeated time values. See flag for computational details.

Fortunately, any irregular time-series or panel-series can be expressed as a regular panel-series with a group-id created such that the time-periods within each group are consecutive.

A simple solution to applying existing functionality (flag, fdiff and fgrowth) to irregular time-series and panels is thus to create a group-id that fully identifies the data together with the time variable. seqid makes this very easy: for an irregular panel with some arbitrary gaps or repeated values in the time variable, an appropriate id variable can be generated using settransform(data, newid = seqid(time, radixorder(id, time))). Lags can then be computed using L(data, 1, ~newid, ~time) etc. This way collapse maintains a balance between offering very fast computations on 99% of time series and panels (which may be unbalanced but where observations for each entity are consecutive in time), and flexibility of application.
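The workflow described above can be mimicked in base R to show what the generated id looks like. This is a sketch under assumptions, not the collapse code path: order() stands in for radixorder, the run numbering stands in for seqid, and ave() stands in for L. The value column is made up for illustration.

```r
# Irregular panel: id 2 has a gap in time (periods 1, 2, 4, 5)
data <- data.frame(id    = rep(1:3, each = 4),
                   time  = c(1:4, 1:2, 4:5, 1:4),
                   value = seq(10, 120, by = 10))

# Number the runs of consecutive time periods after sorting by id and time
o   <- order(data$id, data$time)
run <- cumsum(c(TRUE, diff(data$time[o]) != 1L))
data$newid <- integer(nrow(data))
data$newid[o] <- run

# First lag within each run of consecutive periods (rows are already sorted)
data$lag1 <- ave(data$value, data$newid,
                 FUN = function(v) c(NA, v[-length(v)]))
print(data)
```

The gap in id 2 splits that entity into two groups, so the lag never reaches across the missing period.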

In general, for any regularly spaced panel the identity given by identical(groupid(id, order(id, time)), seqid(time, order(id, time))) should hold.

Note that regularly spaced panels with gaps in time (such as a panel-survey) can be handled either by seqid(..., del = gap) or, in most cases, simply by converting the time variable to a factor using qF, which will make observations consecutive.
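The two options for regularly spaced gaps can be compared with a small base-R sketch. It is illustrative only: the biennial time vector is made up, the run numbering stands in for seqid with different del values, and base factor() stands in for qF.

```r
# Hypothetical biennial survey: observations every second year
time <- c(2000L, 2002L, 2004L, 2006L)

# With a step of 1 every observation starts a new sequence
step_one <- cumsum(c(TRUE, diff(time) != 1L))

# Option 1: treat a step of 2 as "consecutive" (the role of del = 2)
step_two <- cumsum(c(TRUE, diff(time) != 2L))

# Option 2: recode time to consecutive integer codes via a factor
time_code  <- as.integer(factor(time))
via_factor <- cumsum(c(TRUE, diff(time_code) != 1L))

print(step_one)
print(step_two)
print(via_factor)
```

Both options yield a single sequence here. The factor route does not require knowing the gap size, but it would also hide any irregular gaps in the time variable.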

There are potentially other more analytical applications for seqid...

For the opposite operation of creating a new time-variable that is consecutive in each group, see data.table::rowid.

Examples

# NOT RUN {
## This creates an irregularly spaced panel, with a gap in time for id = 2
data <- data.frame(id = rep(1:3, each = 4),
                   time = c(1:4, 1:2, 4:5, 1:4),
                   value = rnorm(12))
data
# }
# NOT RUN {
## Gaps in time error
L(data, 1, value ~ id, ~time)
# }
# NOT RUN {
## Generating new id variable (here seqid(time) would suffice as data is sorted)
settransform(data, newid = seqid(time, order(id, time)))
data

## Lag the panel
L(data, 1, value ~ newid, ~time)

## A different solution: Simply creating a consecutive time variable
settransform(data, newtime = data.table::rowid(id))
data
L(data, 1, value ~ id, ~newtime)

## With sorted data we could of course also omit the time variable altogether...
L(data, 1, value ~ id)

# }
