DataCombine

Christopher Gandrud

Version 0.2.21

Please report any bugs or suggestions at: https://github.com/christophergandrud/DataCombine/issues.

Motivation and Functions

DataCombine is a set of miscellaneous tools intended to make combining data sets, especially time-series cross-sectional data, easier. The package is continually being developed as I turn lines of code that I frequently use into single functions. It currently includes the following functions:

  • CasesTable: reports cases after listwise deletion of missing values for time-series cross-sectional data.
  • change: calculates the absolute, percentage, and proportion change from a specified lag, including within groups.
  • CountSpell: returns a variable counting the spell number for an observation. Works with grouped data.
  • dMerge: merges two data frames and reports, drops, or keeps only duplicates.
  • DropNA: drops rows from a data frame when they have missing (NA) values on a given variable(s).
  • FillDown: fills in missing (NA) values with the previous non-missing value.
  • FillIn: fills in missing values of a variable from one data frame with the values from another variable.
  • FindDups: finds duplicated values in a data frame and subsets it to either include or not include them.
  • FindReplace: replaces multiple patterns found in a character string column of a data frame.
  • grepl.sub: subsets a data frame if a specified pattern is found in a character string.
  • InsertRow: allows the user to insert a row into a data frame. Largely implements Ari B. Friedman's function.
  • MoveFront: moves variables to the front of a data frame. This can be useful if you have a data frame with many variables and want to move a variable or variables to the front.
  • NaVar: creates new variable(s) indicating if there are missing values in other variable(s).
  • shift: creates lag and lead variables, including for time-series cross-sectional data. The shifted variable is returned as a new vector. This function is largely based on TszKin Julian's shift function.
  • slide: creates lag and lead variables, including for time-series cross-sectional data. The slid variables are added to the original data frame. This expands the capabilities of shift.
  • slideMA: creates a moving average for a period before or after each time point for a given variable.
  • SpreadDummy: spreads a dummy variable (1's and 0's) over a specified time period and for specified groups.
  • StartEnd: finds the starting and ending time points of a spell, including for time-series cross-sectional data.
  • rmExcept: removes all objects from a workspace except those specified by the user.
  • TimeExpand: expands a data set so that it includes an observation for each time point in a sequence. Works with grouped data.
  • TimeFill: creates a continuous Unit-Time-Dummy data frame from a data frame with Unit-Start-End times.
  • VarDrop: drops one or more variables from a data frame.
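
To make the lag and change descriptions above concrete, here is a minimal base R sketch of what lagging a variable within groups and computing the change from that lag involves. It is an illustration of the idea only, not the package's implementation: the data frame `panel`, its columns, and the helper `lag_one` are all made up for this example. See `?slide` and `?change` for the package's actual arguments.

```r
# Toy time-series cross-sectional data: two units observed over three years.
# All names here (panel, unit, year, gdp) are hypothetical.
panel <- data.frame(
  unit = c("A", "A", "A", "B", "B", "B"),
  year = c(2001, 2002, 2003, 2001, 2002, 2003),
  gdp  = c(10, 12, 15, 100, 90, 99)
)

# Sort by unit and time so that "previous row" means "previous period".
panel <- panel[order(panel$unit, panel$year), ]

# Lag by one period. The first element has no previous value, so it is NA.
lag_one <- function(x) c(NA, x[-length(x)])

# Apply the lag separately within each unit, so unit B's first year does not
# borrow unit A's last value.
panel$gdp_lag1 <- ave(panel$gdp, panel$unit, FUN = lag_one)

# Absolute and percentage change from the lagged value.
panel$gdp_abs_change  <- panel$gdp - panel$gdp_lag1
panel$gdp_perc_change <- 100 * panel$gdp_abs_change / panel$gdp_lag1

print(panel)
```

The important step is the grouping: lagging the whole column at once would leak values across units, which is the kind of mistake a grouped lag function is meant to prevent.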

Updates

I will continue to add to the package as I build data sets and run across other pesky tasks I do repeatedly that would be simpler if they were completed by a single function.

Installation

DataCombine is on CRAN:

install.packages('DataCombine')

You can also install the most recent stable version with install_github from the devtools package:

devtools::install_github('christophergandrud/DataCombine')

License: GPL (>= 3)

Last published: April 13th, 2016

Functions in DataCombine (0.2.21)

  • grepl.sub: Subset a data frame if a specified pattern is found in a character string.
  • PercChange: Calculate the percentage change from a specified lag, including within groups.
  • CasesTable: Report cases after listwise deletion of missing values for time-series cross-sectional data.
  • InsertRow: Inserts a new row into a data frame.
  • shiftMA: Internal function for slideMA.
  • FindReplace: Replace multiple patterns found in a character string column of a data frame.
  • CountSpell: Count spells, including for grouped data.
  • MoveFront: Move variables to the front of a data frame.
  • slideMA: Create a moving average for a period before or after each time point for a given variable.
  • StartEnd: Find the starting and ending time points of a spell.
  • slide: A function for creating lag and lead variables, including for time-series cross-sectional data.
  • SpreadDummy: Spread a dummy variable (1's and 0's) over a specified time period and for specified groups.
  • change: Calculate the changes (absolute, percent, and proportion) from a specified lag, including within groups.
  • VarDrop: Drop one or more variables from a data frame.
  • TimeFill: Creates a continuous Unit-Time-Dummy data frame from a data frame with Unit-Start-End times.
  • TimeExpand: Expands a data set so that it includes an observation for each time point in a sequence. Works with grouped data.
  • FindDups: Find duplicated values in a data frame and subset it to either include or not include them.
  • rmExcept: Remove all objects from a workspace except those specified by the user.
  • NaVar: Create new variable(s) indicating if there are missing values in other variable(s).
  • shift: A function for creating lag and lead variables.
  • FillIn: A function for filling in missing values of a variable from one data frame with the values from another variable.
  • FillDown: Fills in missing (NA) values with the previous non-missing value.
  • dMerge: Merges two data frames and reports, drops, or keeps only duplicates.
  • DropNA: Drop rows from a data frame with missing values on a given variable(s).
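
The missing-value helpers described above (NaVar, DropNA, FillDown) each wrap a short but easy-to-mistype idiom. The following base R sketch shows roughly what those idioms look like. It is an approximation under stated assumptions, not the package's code: the data frame `df`, its columns, and the helper `fill_down` are hypothetical names chosen for this example.

```r
# Toy data with missing values; all names here are hypothetical.
df <- data.frame(
  id    = 1:6,
  score = c(5, NA, NA, 8, NA, 2)
)

# Indicator variable: 1 where score is missing, 0 otherwise
# (the idea behind NaVar).
df$score_missing <- as.integer(is.na(df$score))

# Keep only rows where score is observed (the idea behind DropNA).
complete <- df[!is.na(df$score), ]

# Carry the last non-missing value forward (the idea behind FillDown).
fill_down <- function(x) {
  # For each position, count how many non-missing values have been seen so
  # far; 0 means no value has been seen yet.
  idx <- cumsum(!is.na(x))
  # Prepend NA so that positions before the first observed value stay NA.
  c(NA, x[!is.na(x)])[idx + 1]
}
df$score_filled <- fill_down(df$score)

print(df)
print(complete)
```

Note that filling down only makes sense when the rows are already sorted in the order the values should be carried, and with grouped data the fill would need to be applied within each group.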