OriGen (version 1.4.3)

FindRhoParameterCrossValidation: Finds the appropriate value of the Rho parameter via cross-validation.

Description

This function finds the appropriate value of the tuning constant, RhoParameter, via leave-one-sample-site-out cross-validation.
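
The general idea of leave-one-sample-site-out cross-validation can be sketched with a toy example. This is only an illustration under stated assumptions, not OriGen's actual model: the sites, the Gaussian kernel smoother, the binomial scoring, and every name below (sites, predict_freq, cv_score, rho_grid) are hypothetical. Each site is held out in turn, its allele frequency is predicted from the remaining sites, and the candidate smoothing value with the lowest total score is kept.

```r
# Toy leave-one-site-out cross-validation. NOT OriGen's model: the kernel
# smoother and all names here are assumptions made for illustration.
set.seed(1)

# One row per sample site: a 1-D coordinate and the number of allele copies sampled.
sites <- data.frame(x = c(0, 1, 2, 3, 4), n = rep(40, 5))
true_freq <- c(0.10, 0.20, 0.35, 0.55, 0.70)          # simulated truth
sites$k <- rbinom(nrow(sites), sites$n, true_freq)    # observed allele counts

# Predict the allele frequency at x0 from the training sites.
# A larger rho gives a wider kernel and therefore a smoother surface.
predict_freq <- function(train, x0, rho) {
  w <- exp(-(train$x - x0)^2 / (2 * rho^2))
  p <- sum(w * train$k) / sum(w * train$n)
  min(max(p, 1e-6), 1 - 1e-6)                         # keep the log-likelihood finite
}

# Sum of held-out negative log-likelihoods over all sites (lower is better).
cv_score <- function(rho, sites) {
  total <- 0
  for (i in seq_len(nrow(sites))) {
    train <- sites[-i, ]                              # leave site i out
    held  <- sites[i, ]
    p <- predict_freq(train, held$x, rho)
    total <- total - dbinom(held$k, held$n, p, log = TRUE)
  }
  total
}

rho_grid <- c(0.25, 0.5, 1, 2, 4)                     # candidate tuning values
scores   <- vapply(rho_grid, cv_score, numeric(1), sites = sites)
best_rho <- rho_grid[which.min(scores)]
```

Holding out whole sites rather than single individuals means the score reflects how well the surface predicts frequencies at locations that contributed no data to the fit.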

Usage

FindRhoParameterCrossValidation(PlinkFileName,LocationFileName,MaxIts=6,MaxGridLength=20)

Arguments

PlinkFileName
Base name of Plink PED file (i.e. without ".ped" or ".map")
LocationFileName
Space- or tab-delimited text file with longitude and latitude coordinates for each individual in the 4th and 5th columns, respectively. Rows should correspond to the individuals in the Plink file, and the file should have a header row.
MaxIts
An integer giving the number of iterations before selecting the rho parameter. Note that this is a long process so it is best to start small.
MaxGridLength
An integer giving the maximum number of boxes to fill the longer side of the region. Note that computation time increases quadratically as this number increases, but this number should also be high enough to separate different sample sites; otherwise they will be binned together as a single site.
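
Before starting a long cross-validation run, it may help to confirm that the location file has the layout described for LocationFileName. The following is a minimal sketch using base R only. The helper check_location_file, the example rows, the names of the first three columns, and the -180 to 180 longitude range are assumptions for illustration. The documentation only states that longitude and latitude are in columns 4 and 5 and that a header row is present.

```r
# Write a small hypothetical location file. The first three column names and
# all row values are made up; only the header row and the position of
# longitude (column 4) and latitude (column 5) come from the documentation.
loc_path <- tempfile(fileext = ".txt")
writeLines(c(
  "FamilyID IndividualID Site Longitude Latitude",
  "F1 I1 SiteA -3.70 40.42",
  "F2 I2 SiteA -3.70 40.42",
  "F3 I3 SiteB 2.35 48.86"
), loc_path)

# Hypothetical helper: basic layout checks on a location file.
check_location_file <- function(path, n_individuals = NULL) {
  # The default sep = "" splits on any whitespace, so spaces and tabs both work.
  loc <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  stopifnot(ncol(loc) >= 5)
  lon <- loc[[4]]
  lat <- loc[[5]]
  stopifnot(is.numeric(lon), is.numeric(lat))
  # Assumed coordinate convention: longitude in [-180, 180], latitude in [-90, 90].
  stopifnot(all(lon >= -180 & lon <= 180), all(lat >= -90 & lat <= 90))
  # Rows should match the individuals in the PED file, if that count is known.
  if (!is.null(n_individuals)) stopifnot(nrow(loc) == n_individuals)
  invisible(loc)
}

loc <- check_location_file(loc_path, n_individuals = 3)
```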

Value

List with the following components:

  • PlinkFileName: This shows the inputted PlinkFileName with ".ped" attached.
  • LocationFile: This shows the inputted LocationFileName.
  • NumberSNPs: This shows the integer number of SNPs found.
  • MaxIts: An integer giving the number of iterations before selecting the rho parameter. Note that this is a long process so it is best to start small. This number is inputted into the function.
  • MaxGridLength: An integer giving the maximum number of boxes to fill the longer side of the region. Note that computation time increases quadratically as this number increases, but this number also should be high enough to separate different sample sites otherwise they will be binned together as a single site. This number was part of the inputs.
  • RhoVector: An array giving the tested values of RhoParameter along with the resulting cross validation results where lower is better.
  • GridLength: An array giving the number of longitudinal and latitudinal divisions. The dimension of this array is [2], where the first number is longitude and the second is latitude.
  • RhoParameter: A real value showing the best RhoParameter value found.
  • SampleSites: This shows the integer number of sample sites found.
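
As a rough illustration of how the returned components might be inspected, the sketch below uses a mock list in place of a real result. The layout of RhoVector (row 1 holding the tested values, row 2 holding the cross-validation scores) is an assumption and should be checked with dim() on an actual result. All numbers are made up.

```r
# Mock list standing in for the function's return value; values are invented.
# ASSUMPTION: RhoVector has one column per tested value, with the tested
# RhoParameter in row 1 and its cross-validation score in row 2.
mock_result <- list(
  RhoVector   = rbind(c(0.1, 1, 10, 100), c(5.2, 4.1, 3.8, 4.6)),
  GridLength  = c(20, 12),   # longitude divisions, then latitude divisions
  SampleSites = 3
)

rho_values <- mock_result$RhoVector[1, ]
cv_scores  <- mock_result$RhoVector[2, ]

# Lower cross-validation scores are better, so take the minimum.
best <- rho_values[which.min(cv_scores)]
best   # 10 for this mock data
```

On a real result, the value selected this way would be expected to agree with the RhoParameter component.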

References

Ranola J, Novembre J, Lange K (2014) Fast Spatial Ancestry via Flexible Allele Frequency Surfaces. Bioinformatics, in press.

See Also

ConvertPEDData for converting Plink PED files into a format appropriate for analysis,

FitOriGenModel for fitting allele surfaces to the converted data,

PlotAlleleFrequencySurface for a quick way to plot the resulting allele frequency surfaces from FitOriGenModel,

ConvertUnknownPEDData for converting two Plink PED files (known and unknown) into a format appropriate for analysis,

FitOriGenModelFindUnknowns for fitting allele surfaces to the converted data and finding the locations of the given unknown individuals,

PlotUnknownHeatMap for a quick way to plot the resulting unknown heat map surfaces from FitOriGenModelFindUnknowns,

FitAdmixedModelFindUnknowns for fitting allele surfaces to the converted data and finding the locations of the given unknown individuals who may be admixed,

PlotAdmixedSurface for a quick way to plot the resulting admixture surfaces from FitAdmixedModelFindUnknowns,

RankSNPsLRT for reducing the number of SNPs using a likelihood ratio test criterion or informativeness for assignment.

Examples

#Note that Plink files "10SNPs.ped", "10SNPs.map" and also "Locations.txt" 
#are included in the data folder of the OriGen package.  
#Please navigate to the appropriate location before testing 
#the following commands.

trials5=FindRhoParameterCrossValidation("10SNPs","Locations.txt",
	MaxIts=4,MaxGridLength=20)
trials5
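
One way to locate the bundled example files before running the commands above is sketched below with base R only. It assumes the files live in a folder named "data" inside the installed package, as the comments above state. The helper find_origen_data_dir is hypothetical.

```r
# Hypothetical helper: path to the installed OriGen package's data folder.
# system.file() returns "" when the package or the folder cannot be found.
find_origen_data_dir <- function() {
  system.file("data", package = "OriGen")
}

data_dir <- find_origen_data_dir()

if (nzchar(data_dir)) {
  needed <- c("10SNPs.ped", "10SNPs.map", "Locations.txt")
  # Report which of the example files are present in that folder.
  print(file.exists(file.path(data_dir, needed)))
  # setwd(data_dir)  # uncomment to run the example from this folder
} else {
  message("OriGen data folder not found; is the package installed?")
}
```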