MergeGUI (version 0.2-1)

A GUI for Merging Datasets in R

Description

A GUI for merging datasets in R using gWidgets.

Install

install.packages('MergeGUI')

Monthly Downloads

19

Version

0.2-1

License

GPL (>= 2.0)

Maintainer

Xiaoyue Cheng

Last Published

January 27th, 2014

Functions in MergeGUI (0.2-1)

Shorten the names from a template. The merging GUI is designed to merge data from different files, but sometimes the file names are too long to be displayed in the GUI. This function shortens the basenames by removing the leading characters that all the names share. The output is a character vector whose elements do not all start with the same character.
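The prefix-stripping idea described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: strip_common_prefix is a hypothetical helper written for this example, not the function shipped in MergeGUI, and the package's handling of edge cases may differ.

```r
# Hypothetical helper for illustration; not the package's own function.
strip_common_prefix <- function(paths) {
  nms <- basename(paths)              # drop directories, keep file names
  if (length(nms) < 2) return(nms)    # nothing to compare against
  chars <- strsplit(nms, "")          # one character vector per name
  shortest <- min(nchar(nms))
  k <- 0
  # Advance while every name has the same character at position k + 1.
  while (k < shortest &&
         length(unique(vapply(chars, function(x) x[k + 1], character(1)))) == 1) {
    k <- k + 1
  }
  substring(nms, k + 1)               # keep what follows the shared prefix
}

short <- strip_common_prefix(c("data/survey_2010.csv",
                               "data/survey_2011.csv",
                               "data/survey_2012.csv"))
print(short)  # "0.csv" "1.csv" "2.csv"
```

The shared prefix here is "survey_201", so only the part that differs between the names remains.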

Obtain the intersection of a list of vectors. Function "intersect" in the base package can only intersect two vectors. The function "intersect2" is designed to obtain the intersection and the differences for more than two vectors. The input should be a list whose elements are the vectors, and the outputs include the intersection of all vectors and a list whose elements are the input vectors minus the intersection. In addition, intersect2 accepts labels for the vectors. If a list of labels is given in the input, then the outputs also include a matrix of the labels that match the intersection for each vector, and a list of the labels that match the remaining part of each vector.
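The core of this idea, without the label handling, can be sketched with base R alone. The helper name intersect_all and the element names common and rest are assumptions made for this example; the arguments and return structure of intersect2 itself are not reproduced here.

```r
# Hypothetical helper; intersect2's real arguments and return names may differ.
intersect_all <- function(vecs) {
  # Fold the two-vector base intersect() over the whole list.
  common <- Reduce(intersect, vecs)
  # For each input vector, keep only what is not in the common part.
  rest <- lapply(vecs, setdiff, y = common)
  list(common = common, rest = rest)
}

res <- intersect_all(list(a = c("id", "age", "sex"),
                          b = c("id", "sex", "bmi"),
                          c = c("sex", "id", "wt")))
print(res$common)  # "id" "sex"
print(res$rest)    # a: "age", b: "bmi", c: "wt"
```

Reduce keeps the element order of the first vector, which is why the common part comes back as "id", "sex".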

Compute the p-values of the Kolmogorov-Smirnov tests between different sources for each variable. This function is used to detect whether the matched variables from different files have different distributions. For each variable, it computes the pairwise KS-test p-values among the sources, then reports the lowest p-value as the index for this variable.
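For a single variable, the "lowest pairwise p-value" summary might look like the sketch below. It assumes the merged data are available as one numeric vector plus a vector of source labels; min_pairwise_ks is a hypothetical name, and the package function's own interface is not shown here.

```r
# Hypothetical helper; assumes one numeric variable and a vector of source labels.
min_pairwise_ks <- function(values, source) {
  groups <- split(values, source)                    # one numeric vector per source
  pairs <- combn(names(groups), 2, simplify = FALSE) # all unordered pairs of sources
  pvals <- vapply(pairs, function(p) {
    # ks.test() drops NAs itself; warnings about ties are silenced here.
    suppressWarnings(ks.test(groups[[p[1]]], groups[[p[2]]])$p.value)
  }, numeric(1))
  min(pvals)                                         # report the most extreme pair
}

set.seed(1)
values <- c(rnorm(50), rnorm(50), rnorm(50, mean = 3))  # third source is shifted
source <- rep(c("file1", "file2", "file3"), each = 50)
p_min <- min_pairwise_ks(values, source)
print(p_min)
```

Because the minimum is taken over several tests, a small value flags a variable for inspection rather than serving as a formal significance level.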

Detect the classes of the variables.

The merging GUI. This function opens a starting interface that allows the user to 1) select several data files; 2) run the next command on more than one file. Two commands can be selected: match the variables, or match the cases by a key variable. In the matching-variable interface the user can 1) check how the variables are matched among files and switch the variable names if they are wrongly matched; 2) look at the numerical and graphical summaries for the selected variables, or the dictionary for selected factor variables; 3) inspect the misclassification rate, KS-test p-values and Chi-square test p-values for each variable, which help to determine whether any transformation is needed for the variable; 4) change the name or class of any variable; 5) export the merged dataset and its summary. The three diagnostics in 3) answer different questions. The misclassification rate, calculated through a tree model, shows whether a variable could distinguish the sources correctly. The KS-test checks whether a variable has different distributions for different sources. The Chi-square test is useful when the user is interested in the pattern of missing values among the sources. In the matching-case interface the user can determine a primary key for each data file and then merge the cases by the key.

Compute the misclassification rate for each variable. When merging data from several datasets, it is useful to detect whether the matched variables from different files have different centers. The function computes the misclassification rate variable by variable using a classification tree (the rpart package). It first merges the datasets according to the given nametable.class, then for each variable uses rpart, with no other covariates, to separate the data by source and computes the misclassification rate.
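A rough sketch of the per-variable computation follows. It assumes the tree predicts the source label from the single variable and that the rate is measured on the same data used for fitting; both are readings of the description above, not confirmed details of the package. The helper name misclass_rate is hypothetical, and the merging step driven by nametable.class is left out.

```r
library(rpart)  # recommended package distributed with R

# Hypothetical helper; assumes the source label is the response and the
# variable under inspection is the only predictor.
misclass_rate <- function(values, source) {
  d <- data.frame(source = factor(source), x = values)
  fit <- rpart(source ~ x, data = d, method = "class")  # classification tree
  pred <- predict(fit, newdata = d, type = "class")     # in-sample predictions
  mean(pred != d$source)                                # share of wrong labels
}

set.seed(2)
source <- rep(c("file1", "file2"), each = 100)
shifted <- c(rnorm(100), rnorm(100, mean = 10))  # sources clearly differ
similar <- rnorm(200)                            # sources look alike
rate_shifted <- misclass_rate(shifted, source)
rate_similar <- misclass_rate(similar, source)
print(c(shifted = rate_shifted, similar = rate_similar))
```

A rate near zero means the variable alone identifies the source file, which suggests the files differ on that variable. With two equally sized sources, a rate near 0.5 means the variable carries little information about the source.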

Chi-square tests for the counts of missing and non-missing values. This function is used to detect whether the matched variables from different files have different missing patterns. For each variable, it first counts the missing and non-missing values in each source and forms a contingency table. The p-value of the Chi-square test is then computed from the contingency table and reported for the variable.
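The contingency-table step for one variable can be illustrated with base R as below. The helper name missing_pattern_pvalue and its two-vector interface are assumptions made for this sketch, and the package function may treat degenerate tables differently.

```r
# Hypothetical helper; assumes one variable and a vector of source labels.
missing_pattern_pvalue <- function(values, source) {
  # Fix both levels so the table always has a "missing" column.
  status <- factor(is.na(values), levels = c(FALSE, TRUE),
                   labels = c("non-missing", "missing"))
  tab <- table(source, status)               # sources by missing status
  suppressWarnings(chisq.test(tab)$p.value)  # warnings arise for small counts
}

# file1 has 5 missing values out of 100, file2 has 50 out of 100.
values <- c(rep(1, 95), rep(NA, 5), rep(1, 50), rep(NA, 50))
source <- rep(c("file1", "file2"), each = 100)
p_missing <- missing_pattern_pvalue(values, source)
print(p_missing)
```

If a variable has no missing values in any source, the expected counts in the "missing" column are zero and chisq.test returns NaN, so such variables would need separate handling.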