Learn R Programming

⚠️There's a newer version (0.1.35) of this package.Take me there.

RAINBOWR

Reliable Association INference By Optimizing Weights with R

Author : Kosuke Hamazaki (hamazaki@ut-biomet.org)

Date : 2019/03/25 (Last update: 2022/01/31)

NOTE!!!!

The paper for RAINBOWR has been published in PLOS Computational Biology (https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1007663). If you use this RAINBOWR in your paper, please cite RAINBOWR as follows:

Hamazaki, K. and Iwata, H. (2020) RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method. PLOS Computational Biology, 16(2): e1007663.

The stable version for RAINBOWR package is now available at the CRAN (Comprehensive R Archive Network).

Please check the change in RAINBOWR with the version update from NEWS.md.

The older version of RAINBOWR is RAINBOW, which is available at https://github.com/KosukeHamazaki/RAINBOW.

We changed the package name from RAINBOW to RAINBOWR because the original package name RAINBOW conflicted with the package rainbow (https://cran.r-project.org/package=rainbow) when we submitted our package to CRAN (https://cran.r-project.org/).

In this repository, the R package RAINBOWR is available. Here, we describe how to install and how to use RAINBOWR.


What is RAINBOWR

RAINBOWR(Reliable Association INference By Optimizing Weights with R) is a package to perform several types of GWAS as follows.

  • Single-SNP GWAS by RGWAS.normal function
  • SNP-set (, haplotype-block, or gene-set) GWAS by RGWAS.multisnp function (which tests multiple SNPs at the same time)
  • Check epistatic (SNP-set x SNP-set interaction) effects by RGWAS.epistasis (very slow and less reliable)
  • Single-SNP GWAS including tests for interaction with genetic background by RGWAS.normal function
  • SNP-set (, haplotype-block, or gene-set) GWAS including tests for interaction with genetic background (or epistatic effects with polygenes) by RGWAS.multisnp function (which tests multiple SNPs at the same time)

RAINBOWR also offers some functions to solve the linear mixed effects model.

  • Solve multi-kernel linear mixed effects model with EM3.general function (using gaston, MM4LMM, or RAINBOWR packages; fast for gaston and MM4LMM)
  • Solve one-kernel linear mixed effects model with EMM.cpp function
  • Solve multi-kernel linear mixed effects model with EM3.cpp function (for the general kernel, not so fast)
  • Solve multi-kernel linear mixed effects model with EM3.linker.cpp function (for the linear kernel, fast)

By utilizing these functions, you can estimate the genomic heritability and perform genomic prediction (GP).

Finally, RAINBOWR offers other useful functions.

  • qq and manhattan function to draw Q-Q plot and Manhattan plot
  • modify.data function to match phenotype and marker genotype data
  • CalcThreshold function to calculate thresholds for GWAS results
  • See function to see a brief view of data (like head function, but more useful)
  • genetrait function to generate pseudo phenotypic values from marker genotype
  • SS_GWAS function to summarize GWAS results (only for simulation study)
  • estPhylo and estNetwork functions to estimate phylogenetic tree or haplotype network and haplotype effects with non-linear kernels for haplotype blocks of interest.
  • convertBlockList function to convert haplotype block list estimated by PLINK to the format which can be inputted as a gene.set argument in RGWAS.multisnp, RGWAS.multisnp.interaction, and RGWAS.epistasis functions.

Installation

The stable version of RAINBOWR is now available at the CRAN (Comprehensive R Archive Network). The latest version of RAINBOWR is also available at the KosukeHamazaki/RAINBOWR repository in the GitHub, so please run the following code in the R console.

#### Stable version of RAINBOWR ####
install.packages("RAINBOWR")  


#### Latest version of RAINBOWR ####
### If you have not installed yet, ...
install.packages("devtools")  

### Install RAINBOWR from GitHub
devtools::install_github("KosukeHamazaki/RAINBOWR")

If you get some errors via installation, please check if the following packages are correctly installed. (We removed a dependency on rgl package!)

Rcpp,      # also install `Rtools` for Windows user
plotly,    # Suggests
Matrix,
cluster,
MASS,
pbmcapply,
optimx,
methods,
ape,
stringr,
pegas,
ggplot2,     # Suggests
ggtree,      # Suggests, install from Bioconducter with `BiocManager::install("ggtree")`
scatterpie,  # Suggests
phylobase,   # Suggests
haplotypes,  # Suggests
rrBLUP,
expm,
here,        # Suggests
htmlwidgets,
Rfast,
adegenet,    # Suggests
gaston,
MM4LMM,
furrr,       # Suggests
future,      # Suggests
progressr,   # Suggests
foreach,     # Suggests, but stongly recommend the installation for Windows user to parallel computation
doParallel,  # Suggests
data.table   # Suggests

In RAINBOWR, since part of the code is written in Rcpp (C++ in R), please check if you can use C++ in R. For Windows users, you should install Rtools.

If you have some questions about installation, please contact us by e-mail (hamazaki@ut-biomet.org).

Usage

First, import RAINBOWR package and load example datasets. These example datasets consist of marker genotype (scored with {-1, 0, 1}, 1,536 SNP chip (Zhao et al., 2010; PLoS One 5(5): e10780)), map with physical position, and phenotypic data (Zhao et al., 2011; Nature Communications 2:467). Both datasets can be downloaded from Rice Diversity homepage (http://www.ricediversity.org/data/). Also, the dataset includes a list of haplotype blocks from the version 0.1.30. This list was estimated by the PLINK 1.9 (Taliun et al., 2014; BMC Bioinformatics, 15).

### Import RAINBOWR
require(RAINBOWR)

### Load example datasets
data("Rice_Zhao_etal")
Rice_geno_score <- Rice_Zhao_etal$genoScore
Rice_geno_map <- Rice_Zhao_etal$genoMap
Rice_pheno <- Rice_Zhao_etal$pheno
Rice_haplo_block <- Rice_Zhao_etal$haploBlock

### View each dataset
See(Rice_geno_score)
See(Rice_geno_map)
See(Rice_pheno)
See(Rice_haplo_block)

You can check the original data format by See function. Then, select one trait (here, Flowering.time.at.Arkansas) for example.

### Select one trait for example
trait.name <- "Flowering.time.at.Arkansas"
y <- Rice_pheno[, trait.name, drop = FALSE]

For GWAS, first you can remove SNPs whose MAF <= 0.05 by MAF.cut function.

### Remove SNPs whose MAF <= 0.05
x.0 <- t(Rice_geno_score)
MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
x <- MAF.cut.res$x
map <- MAF.cut.res$map

Next, we estimate additive genomic relationship matrix (GRM) by using calcGRM function.

### Estimate genomic relationship matrix (GRM) 
K.A <- calcGRM(genoMat = x)

Next, we modify these data into the GWAS format of RAINBOWR by modify.data function.

### Modify data
modify.data.res <- modify.data(pheno.mat = y, geno.mat = x, map = map,
                               return.ZETA = TRUE, return.GWAS.format = TRUE)
pheno.GWAS <- modify.data.res$pheno.GWAS
geno.GWAS <- modify.data.res$geno.GWAS
ZETA <- modify.data.res$ZETA

### View each data for RAINBOWR
See(pheno.GWAS)
See(geno.GWAS)
str(ZETA)

ZETA is a list of genomic relationship matrix (GRM) and its design matrix.

Finally, we can perform GWAS using these data. First, we perform single-SNP GWAS by RGWAS.normal function as follows.

### Perform single-SNP GWAS
normal.res <- RGWAS.normal(pheno = pheno.GWAS, geno = geno.GWAS,
                           ZETA = ZETA, n.PC = 4, skip.check = TRUE, P3D = TRUE)
See(normal.res$D)  ### Column 4 contains -log10(p) values for markers
### Automatically draw Q-Q plot and Manhattan by default.

Next, we perform SNP-set GWAS by RGWAS.multisnp function.

### Perform SNP-set GWAS (by regarding 11 SNPs as one SNP-set)
SNP_set.res <- RGWAS.multisnp(pheno = pheno.GWAS, geno = geno.GWAS, ZETA = ZETA, 
                              n.PC = 4, test.method = "LR", kernel.method = "linear", 
                              gene.set = NULL, skip.check = TRUE, 
                              test.effect = "additive", window.size.half = 5, window.slide = 11)

See(SNP_set.res$D)  ### Column 4 contains -log10(p) values for markers

You can perform SNP-set GWAS with sliding window by setting window.slide = 1. And you can also perform gene-set (or haplotype-block based) GWAS by assigning the following data set to gene.set argument. (You can check the example also by See(Rice_haplo_block) in R.)

ex.)

gene (or haplotype block)marker
haploblock_1id1005261
haploblock_1id1005263
haploblock_2id1009557
haploblock_2id1009616
haploblock_3id1020154
......

For example, when using the list of haplotype blocks estimated by PLINK,

### Perform haplotype-block based GWAS (by using hapltype blocks estimated by PLINK)
haplo_block.res <- RGWAS.multisnp(pheno = pheno.GWAS, geno = geno.GWAS, ZETA = ZETA, 
                              n.PC = 4, test.method = "LR", kernel.method = "linear", 
                              gene.set = Rice_haplo_block, skip.check = TRUE, 
                              test.effect = "additive")

See(haplo_block.res$D)  ### Column 4 contains -log10(p) values for markers

There is no significant block for this dataset because the number of markers and blocks is too small for this dataset. However, when whole-genome sequencing data is available, the impact of using SNP-set/gene-set/haplotype-block methods becomes larger and we strongly recommend you use these methods. Please see Hamazaki and Iwata (2020, PLOS Comp Biol) for more details of the features of these methods.

Help

If you want some help before performing GWAS with RAINBOWR, please see the help for each function by ?function_name.

References

Kennedy, B.W., Quinton, M. and van Arendonk, J.A. (1992) Estimation of effects of single genes on quantitative traits. J Anim Sci. 70(7): 2000-2012.

Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci. 100(16): 9440-9445.

Yu, J. et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 38(2): 203-208.

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Kang, H.M. et al. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 42(4): 348-354.

Zhang, Z. et al. (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 42(4): 355-360.

Endelman, J.B. (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome J. 4(3): 250.

Endelman, J.B. and Jannink, J.L. (2012) Shrinkage Estimation of the Realized Relationship Matrix. G3 Genes, Genomes, Genet. 2(11): 1405-1413.

Su, G. et al. (2012) Estimating Additive and Non-Additive Genetic Variances and Predicting Genetic Merits Using Genome-Wide Dense Single Nucleotide Polymorphism Markers. PLoS One. 7(9): 1-7.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Listgarten, J. et al. (2013) A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 29(12): 1526-1533.

Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.

Jiang, Y. and Reif, J.C. (2015) Modeling epistasis in genomic selection. Genetics. 201(2): 759-768.

Hamazaki, K. and Iwata, H. (2020) RAINBOW: Haplotype-based genome-wide association study using a novel SNP-set method. PLOS Computational Biology, 16(2): e1007663.

Copy Link

Version

Install

install.packages('RAINBOWR')

Monthly Downloads

328

Version

0.1.33

License

MIT + file LICENSE

Maintainer

Kosuke Hamazaki

Last Published

September 12th, 2023

Functions in RAINBOWR (0.1.33)

EM3.general

Equation of mixed model for multi-kernel including using other packages (with other packages, much faster than EM3.cpp)
EM3.op

Equation of mixed model for multi-kernel using other packages (much faster than EM3.cpp)
EMM.cpp

Equation of mixed model for one kernel, a wrapper of two methods
MAF.cut

Function to remove the minor alleles
EMM1.cpp

Equation of mixed model for one kernel, GEMMA-based method (inplemented by Rcpp)
EM3.linker.cpp

Equation of mixed model for multi-kernel (fast, for limited cases)
RAINBOWR

RAINBOWR: Perform Genome-Wide Asscoiation Study (GWAS) By Kernel-Based Methods
CalcThreshold

Function to calculate threshold for GWAS
EM3.cpp

Equation of mixed model for multi-kernel (slow, general version)
RGWAS.epistasis

Check epistatic effects by kernel-based GWAS (genome-wide association studies)
EMM2.cpp

Equation of mixed model for one kernel, EMMA-based method (inplemented by Rcpp)
Rice_geno_score

Marker genotype of rice genome
RGWAS.twostep

Perform normal GWAS (genome-wide association studies) first, then perform SNP-set GWAS for relatively significant markers
RGWAS.menu

Print the R code which you should perform for RAINBOWR GWAS
RGWAS.normal

Perform normal GWAS (test each single SNP)
Rice_haplo_block

Physical map of rice genome
Rice_Zhao_etal

Rice_Zhao_etal:
RGWAS.multisnp

Testing multiple SNPs simultaneously for GWAS
RGWAS.multisnp.interaction

Testing multiple SNPs and their interaction with some kernel simultaneously for GWAS
Rice_geno_map

Physical map of rice genome
See

Function to view the first part of data (like head(), tail())
Rice_pheno

Phenotype data of rice field trial
SS_gwas

Calculate some summary statistics of GWAS (genome-wide association studies) for simulation study
adjustGRM

Function to adjust genomic relationship matrix (GRM) with subpopulations
is.diag

Function to judge the square matrix whether it is diagonal matrix or not
RGWAS.normal.interaction

Perform normal GWAS including interaction (test each single SNP)
calcGRM

Function to calculate genomic relationship matrix (GRM)
plotHaploNetwork

Function to plot haplotype network from the estimated results
RGWAS.twostep.epi

Perform normal GWAS (genome-wide association studies) first, then check epistatic effects for relatively significant markers
plotPhyloTree

Function to plot phylogenetic tree from the estimated results
score.calc.score

Calculate -log10(p) of each SNP-set by the score test
make.full

Change a matrix to full-rank matrix
estNetwork

Function to estimate & plot haplotype network
manhattan2

Draw manhattan plot (another method)
cumsumPos

Function to calculate cumulative position (beyond chromosome)
manhattan3

Draw the effects of epistasis (3d plot and 2d plot)
estPhylo

Function to estimate & plot phylogenetic tree
score.calc

Calculate -log10(p) for single-SNP GWAS
qq

Draw qq plot
score.calc.epistasis.LR.MC

Calculate -log10(p) of epistatic effects by LR test (multi-cores)
score.cpp

Calculte -log10(p) by score test (slow, for general cases)
score.calc.LR.MC

Calculate -log10(p) of each SNP-set by the LR test (multi-cores)
modify.data

Function to modify genotype and phenotype data to match
score.calc.epistasis.score

Calculate -log10(p) of epistatic effects with score test
score.calc.int.MC

Calculate -log10(p) for single-SNP GWAS with interaction (multi-cores)
score.linker.cpp

Calculte -log10(p) by score test (fast, for limited cases)
convertBlockList

Function to convert haplotype block list from PLINK to RAINBOWR format
design.Z

Function to generate design matrix (Z)
manhattan

Draw manhattan plot
genesetmap

Function to generate map for gene set
spectralG.cpp

Perform spectral decomposition (inplemented by Rcpp)
genetrait

Generate pseudo phenotypic values
manhattan.plus

Add points of -log10(p) corrected by kernel methods to manhattan plot
score.calc.LR.int

Calculate -log10(p) of each SNP-set and its interaction with kernels by the LR test
score.calc.LR

Calculate -log10(p) of each SNP-set by the LR test
score.calc.LR.int.MC

Calculate -log10(p) of each SNP-set and its interaction with kernels by the LR test (multi-cores)
parallel.compute

Function to parallelize computation with various methods
score.calc.int

Calculate -log10(p) for single-SNP GWAS with interaction
score.calc.MC

Calculate -log10(p) for single-SNP GWAS (multi-cores)
score.calc.score.MC

Calculate -log10(p) of each SNP-set by the score test (multi-cores)
score.calc.epistasis.LR

Calculate -log10(p) of epistatic effects by LR test
welcome_to_RGWAS

Function to greet to users
score.calc.epistasis.score.MC

Calculate -log10(p) of epistatic effects with score test (multi-cores)