UPMASK (version 1.2)

UPMASKfile: Run UPMASK on a file

Description

UPMASKfile runs the UPMASK method on an input file and writes the results to an output file. It is a thin wrapper: it reads the input file into an R data frame, calls the UPMASKdata function with that data frame and the parameters passed by the user, and writes the output to another file.
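The read, analyse, write pattern described above can be sketched in a few lines of base R. This is only an illustration of the wrapper idea, not the package source: `analyseData` and `runOnFile` are hypothetical names, and `analyseData` is a trivial stand-in for UPMASKdata so that the sketch runs without the UPMASK package installed.

```r
# Hypothetical stand-in for the analysis step; this is NOT UPMASKdata.
# It only appends a constant column so the sketch is runnable on its own.
analyseData <- function(dataTable) {
  dataTable$result <- 1
  dataTable
}

# Sketch of a file-to-file wrapper: read the file, analyse, write the result.
runOnFile <- function(inputFile, outputFile, fileWithHeader = FALSE) {
  dataTable <- read.table(inputFile, header = fileWithHeader)
  resultTable <- analyseData(dataTable)
  write.table(resultTable, outputFile, row.names = FALSE, quote = FALSE)
  invisible(resultTable)
}

# Usage with a small temporary input file
inputFile <- file.path(tempdir(), "wrapper-sketch-input.dat")
outputFile <- file.path(tempdir(), "wrapper-sketch-output.dat")
write.table(data.frame(x = c(1, 2, 3), y = c(4, 5, 6)), inputFile,
            row.names = FALSE, quote = FALSE)
runOnFile(inputFile, outputFile, fileWithHeader = TRUE)
written <- read.table(outputFile, header = TRUE)
```

Keeping the file handling in a wrapper like this leaves the analysis function free to work purely on data frames, which is the split the description draws between UPMASKfile and UPMASKdata.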

Usage

UPMASKfile(filenameWithPathInput, filenameWithPathOuput, 
positionDataIndexes=c(1,2), photometricDataIndexes=c(3,5,7,9,11,19,21,23,25,27),
photometricErrorDataIndexes=c(4,6,8,10,12,20,22,24,26,28), threshold=1, 
maxIter=20, starsPerClust_kmeans=50, nstarts_kmeans=50, nRuns=5, 
runInParallel=FALSE, paralelization="multicore", independent=TRUE, verbose=FALSE, 
autoCalibrated=FALSE, considerErrors=FALSE, finalXYCut=FALSE, 
fileWithHeader=FALSE, nDimsToKeep=4, dimRed="PCA", scale=TRUE)

Arguments

filenameWithPathInput

a string indicating the file containing the data to run UPMASK on (with full path)

filenameWithPathOuput

a string indicating the file where the output shall be written (with full path)

positionDataIndexes

an array of integers indicating the columns of the file containing the spatial position measurements

photometricDataIndexes

an array of integers with the column numbers containing photometric measurements (or any other measurement to go into the PCA step)

photometricErrorDataIndexes

an array of integers with the column numbers containing the errors of the photometric measurements

threshold

a double indicating the thresholding level for the random field analysis

maxIter

an integer with the maximum number of iterations of the outer loop before giving up on convergence (usually it is not necessary to modify this)

starsPerClust_kmeans

an integer with the average number of stars per k-means cluster

nstarts_kmeans

an integer with the number of random re-initializations of the k-means clustering method (usually it is not necessary to modify this)

nRuns

an integer with the total number of individual outer loop runs to execute

runInParallel

a boolean indicating if the code should run in parallel

paralelization

a string with the type of parallelization to use, either "multicore" or "MPIcluster". At the moment only "multicore" is implemented (defaults to "multicore").

independent

a boolean indicating if non-parallel runs should be completely independent

verbose

a boolean indicating if the output to screen should be verbose

autoCalibrated

a boolean indicating if the number of random field realizations for the clustering check in the position space should be autocalibrated (experimental code, defaults to FALSE).

considerErrors

a boolean indicating if the errors should be taken into account

finalXYCut

a boolean indicating if a final cut in the XY space should be performed (defaults to FALSE)

fileWithHeader

a boolean indicating if the input file has a text header

nDimsToKeep

an integer with the number of dimensions to consider (defaults to 4)

dimRed

a string with the dimensionality reduction method to use (defaults to PCA; the only other options are LaplacianEigenmaps or None)

scale

a boolean indicating if the data should be scaled and centered
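To make the roles of photometricDataIndexes, scale, dimRed, nDimsToKeep, starsPerClust_kmeans and nstarts_kmeans more concrete, the following base R sketch selects columns by index, runs a centered and scaled PCA, keeps the first few components and clusters them with k-means. It is an assumption-laden illustration, not the package's implementation: the data are synthetic random numbers, and both the use of `prcomp` and the rule for deriving the number of clusters from the star count are guesses made for this example.

```r
set.seed(42)

# Synthetic table with 28 columns, matching the layout of the default indexes
nStars <- 200
dataTable <- as.data.frame(matrix(rnorm(nStars * 28), nrow = nStars))

# Values mirroring the argument names and defaults documented above
photometricDataIndexes <- c(3, 5, 7, 9, 11, 19, 21, 23, 25, 27)
nDimsToKeep <- 4
starsPerClust_kmeans <- 50
nstarts_kmeans <- 50

# Select the measurement columns by their column numbers
photometry <- dataTable[, photometricDataIndexes]

# PCA on centered and scaled columns (assumed analogue of scale=TRUE, dimRed="PCA")
pca <- prcomp(photometry, center = TRUE, scale. = TRUE)
reduced <- pca$x[, 1:nDimsToKeep]

# Assumed rule: number of clusters = stars / average stars per cluster
nClusters <- max(1, round(nStars / starsPerClust_kmeans))
clustering <- kmeans(reduced, centers = nClusters, nstart = nstarts_kmeans)
```

With 200 synthetic stars and 50 stars per cluster this gives four k-means groups. A smaller starsPerClust_kmeans yields more, smaller groups, which is why the example in this page lowers it to 25 for its small simulated cluster.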

References

Krone-Martins, A. & Moitinho, A., A&A, v.561, p.A57, 2014

Examples

# NOT RUN {
# Analyse a simulated open cluster using spatial and photometric data 
# Create strings with filenames
fileNameI <- "oc_12_500_1000_1.0_p019_0880_1_25km_120nR_withcolors.dat"
inputFileName <- system.file("extdata", fileNameI, package="UPMASK")
outputFileName <- file.path(tempdir(), "up-RESULTS.dat")

# Example of how to run UPMASK using data from a file
# (a serious analysis requires a larger nRuns)
posIdx <- c(1,2)
photIdx <- c(3,5,7,9,11,19,21,23,25,27)
photErrIdx <- c(4,6,8,10,12,20,22,24,26,28)
UPMASKfile(inputFileName, outputFileName, posIdx, photIdx, photErrIdx, nRuns=5, 
           starsPerClust_kmeans=25, verbose=TRUE, fileWithHeader=TRUE)

# Open the resulting file to inspect the results
tempResults <- read.table(outputFileName, header=TRUE)

# Create a simple raw plot to see the results
# (point opacity is the last column of the output divided by its maximum)
lastCol <- tempResults[, ncol(tempResults)]
pCols <- lastCol / max(lastCol)
plot(tempResults[,1], tempResults[,2], col=rgb(0,0,0,pCols), cex=0.5, pch=19)

# Clean the environment
rm(list=c("tempResults", "inputFileName", "outputFileName", "pCols", "fileNameI",
          "lastCol", "posIdx", "photIdx", "photErrIdx"))
# }