UPMASKdata: Run UPMASK in a data frame

Description

UPMASKdata executes the UPMASK method on a data frame and returns a data frame with the membership analysis results appended as additional columns.

UPMASK is a method for performing membership assignment in stellar clusters. The distributed code is prepared to use photometry and spatial positions, but it can also take other types of data into account. The method can handle arbitrary error models (the user must rewrite the takeErrorsIntoAccount function), and it is unsupervised, data-driven, physical-model-free and relies on as few assumptions as possible. Membership is assessed through an iterative process that combines dimensionality reduction, a clustering algorithm and kernel density estimation.
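The kernel density estimation step decides whether a group of stars found by the clustering algorithm is concentrated in position space. The sketch below illustrates that idea and is not the package's implementation: the helper names are hypothetical, and it assumes the group's peak 2D KDE density is compared against uniform random fields with the same number of stars, using a criterion of the random-field mean plus `threshold` standard deviations. The exact criterion used by UPMASK may differ.

```r
library(MASS)  # kde2d() ships with R's recommended MASS package

# Hypothetical helper: peak of a 2D kernel density estimate on a fixed grid
peakDensity <- function(x, y, lims, gridSize = 50) {
  max(kde2d(x, y, n = gridSize, lims = lims)$z)
}

# Hypothetical helper: TRUE if the stars at (x, y) are more concentrated than
# uniform random fields with the same number of stars over the same area
isSpatiallyConcentrated <- function(x, y, lims, threshold = 1, nFields = 50) {
  n <- length(x)
  observed <- peakDensity(x, y, lims)
  # Peak densities of nFields uniform random fields covering the same rectangle
  randomPeaks <- replicate(nFields, peakDensity(runif(n, lims[1], lims[2]),
                                                runif(n, lims[3], lims[4]),
                                                lims))
  # Assumed criterion: exceed the random-field mean by 'threshold' standard deviations
  observed > mean(randomPeaks) + threshold * sd(randomPeaks)
}
```

Here `lims` follows the `kde2d()` convention `c(xmin, xmax, ymin, ymax)`, so the whole field can be passed as `c(range(x), range(y))` computed over all stars. Very small groups should be skipped before calling it, because the default bandwidth needs a non-zero spread.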

Usage

UPMASKdata(dataTable, positionDataIndexes=c(1,2),
photometricDataIndexes=c(3,5,7,9,11,19,21,23,25,27),
photometricErrorDataIndexes=c(4,6,8,10,12,20,22,24,26,28), threshold=1, 
classAlgol="kmeans", maxIter=25, starsPerClust_kmeans=25, nstarts_kmeans=50, 
nRuns=8, runInParallel=FALSE, paralelization="multicore", independent=TRUE, 
verbose=FALSE, autoCalibrated=FALSE, considerErrors=FALSE, 
finalXYCut=FALSE, nDimsToKeep=4, dimRed="PCA", scale=TRUE)

Arguments

dataTable

a data frame with the data to perform the analysis

positionDataIndexes

an array of integers indicating the columns of the data frame containing the spatial position measurements

photometricDataIndexes

an array of integers with the column numbers containing photometric measurements (or any other measurement to go into the PCA step)

photometricErrorDataIndexes

an array of integers with the column numbers containing the errors of the photometric measurements

threshold

a double indicating the thresholding level for the random field analysis

classAlgol

a string indicating the type of clustering algorithm to consider. Only k-means is implemented at this moment (defaults to kmeans)

maxIter

an integer giving the maximum number of iterations of the outer loop before giving up on convergence (usually it is not necessary to modify this)

starsPerClust_kmeans

an integer with the average number of stars per k-means cluster

nstarts_kmeans

an integer giving the number of random re-initializations of the k-means clustering method (usually it is not necessary to modify this)

nRuns

an integer giving the total number of individual (outer-loop) runs to execute

runInParallel

a boolean indicating if the code should run in parallel

paralelization

a string with the type of parallelization to use: "multicore" or "MPIcluster". At the moment only "multicore" is implemented (defaults to multicore).

independent

a boolean indicating if non-parallel runs should be completely independent

verbose

a boolean indicating if the output to screen should be verbose

autoCalibrated

a boolean indicating if the number of random field realizations for the clustering check in the position space should be autocalibrated (experimental code, defaults to FALSE).

considerErrors

a boolean indicating if the errors should be taken into account

finalXYCut

a boolean indicating if a final cut in the XY space should be performed (defaults to FALSE)

nDimsToKeep

an integer with the number of dimensions to keep after the dimensionality reduction (defaults to 4)

dimRed

a string with the dimensionality reduction method to use (defaults to "PCA"; the only other options are "LaplacianEigenmaps" and "None")

scale

a boolean indicating if the data should be scaled and centered
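Several of these arguments control the photometric clustering step. The following sketch shows how they could fit together when dimRed="PCA"; it is an illustration, not the package code. The function `clusterPhotometry` and its `photErr` argument are hypothetical, the error resampling assumes Gaussian 1-sigma errors (a stand-in for what considerErrors and takeErrorsIntoAccount do), and the number of k-means clusters is assumed to be the number of stars divided by starsPerClust_kmeans.

```r
# Hypothetical sketch of the photometric clustering step, not the package code.
# phot:    numeric matrix of photometric columns, one row per star
# photErr: optional matrix of 1-sigma errors (Gaussian errors assumed)
clusterPhotometry <- function(phot, photErr = NULL, nDimsToKeep = 4,
                              starsPerClust_kmeans = 25, nstarts_kmeans = 50,
                              scale = TRUE) {
  phot <- as.matrix(phot)
  # Error-aware variant: resample each measurement within its error
  if (!is.null(photErr)) {
    phot <- phot + rnorm(length(phot), mean = 0, sd = as.vector(photErr))
  }
  # Center and scale each column before the PCA
  if (scale) phot <- base::scale(phot)
  pcs <- prcomp(phot)$x
  # Keep only the first nDimsToKeep principal components
  reduced <- pcs[, seq_len(min(nDimsToKeep, ncol(pcs))), drop = FALSE]
  # Pick k so that each k-means cluster holds about starsPerClust_kmeans stars
  k <- max(1, floor(nrow(phot) / starsPerClust_kmeans))
  kmeans(reduced, centers = k, nstart = nstarts_kmeans)$cluster
}
```

The function returns one cluster label per star; each resulting group would then be tested for spatial concentration before the next iteration.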

Value

A data frame with the original data used to run the method and additional columns indicating the classification at each run, as well as a membership probability in the frequentist sense.
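A frequentist membership probability can be read as the fraction of runs in which a star was classified as a cluster member. The toy sketch below assumes each per-run column codes a member as 1 and a non-member as 0; the column names and coding of the actual UPMASKdata output may differ.

```r
# Hypothetical per-run results for four stars: 1 = kept as member, 0 = rejected
runs <- data.frame(run1 = c(1, 1, 0, 0), run2 = c(1, 0, 0, 1),
                   run3 = c(1, 1, 0, 0), run4 = c(1, 1, 1, 0))
# Frequentist membership probability: fraction of runs flagging each star
membershipProb <- rowMeans(runs)
print(membershipProb)
```

With four runs the probabilities come out as 1, 0.75, 0.25 and 0.25, which is why larger values of nRuns give a finer-grained probability.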

References

Krone-Martins, A. & Moitinho, A., A&A, v.561, p.A57, 2014

Examples

# NOT RUN {
# Analyse a simulated open cluster using spatial and photometric data 
# Load the data into a data frame
fileNameI <- "oc_12_500_1000_1.0_p019_0880_1_25km_120nR_withcolors.dat"
inputFileName <- system.file("extdata", fileNameI, package="UPMASK")
ocData <- read.table(inputFileName, header=TRUE)

# Example of how to run UPMASK using data from a data frame
# (a serious analysis requires at least a larger nRuns)
posIdx <- c(1,2)
photIdx <- c(3,5,7,9,11,19,21,23,25,27)
photErrIdx <- c(4,6,8,10,12,20,22,24,26,28)

upmaskRes <- UPMASKdata(ocData, posIdx, photIdx, photErrIdx, nRuns=2, 
                        starsPerClust_kmeans=25, verbose=TRUE)

# Create a simple raw plot to see the results: the last column is taken as
# the membership probability and used as the point opacity
pCols <- upmaskRes[,ncol(upmaskRes)]/max(upmaskRes[,ncol(upmaskRes)])
plot(upmaskRes[,1], upmaskRes[,2], col=rgb(0,0,0,pCols), cex=0.5, pch=19)

# Clean the environment
rm(list=c("inputFileName", "ocData", "posIdx", "photIdx", "photErrIdx", 
          "upmaskRes", "pCols"))
# }