genio

The genio (GENetics I/O) package provides easy-to-use and efficient readers and writers for formats used in genetics research. It currently targets Plink, Eigenstrat, and GCTA formats (more to come); Plink BED/BIM/FAM and GCTA GRM formats are fully supported. The lightning-fast read_bed and write_bed functions (written in Rcpp) read and write genotypes between native R matrices and the Plink BED format. The make_* functions create default FAM and BIM tables to go with simulated genotype data. Otherwise, the package consists of wrappers for readr functions that add missing extensions and column names (often absent in these files).

Installation

You can install the released version of genio from CRAN with:

install.packages("genio")

Install the latest development version from GitHub:

install.packages("devtools") # if needed
library(devtools)
install_github("OchoaLab/genio", build_vignettes = TRUE)

You can see the package vignette, which has more detailed documentation, by typing this into your R session:

vignette('genio')

Example

Load library:

library(genio)

Make a BED/BIM/FAM file set for simulated data

Note that write_plink writes all three BED/BIM/FAM files together, while each write_{bed,bim,fam} function creates a single file.

# genotype matrix stored as a native R matrix
# (here we create a small example with random data)
# create 10 random genotypes
X <- rbinom(10, 2, 0.5)
# replace 3 random genotypes with missing values
X[sample(10, 3)] <- NA
# turn into a 5x2 matrix: 5 loci (rows) by 2 individuals (columns)
X <- matrix(X, nrow = 5, ncol = 2)

# also create a simulated phenotype vector
pheno <- rnorm(2) # two individuals as above

# write simulated data to all BED/BIM/FAM files in one handy command
# missing BIM and FAM columns are automatically generated
# data dimensions are validated for provided data
write_plink('random', X, pheno = pheno)

### same thing in separate steps:

# create default tables to go with simulated genotype data
fam <- make_fam(n = 2)
bim <- make_bim(n = 5)
# overwrite with simulated phenotype
fam$pheno <- pheno

# write simulated data to BED/BIM/FAM separately (one command each)
# extension can be omitted and it still works!
write_bed('random', X)
write_fam('random', fam)
write_bim('random', bim)
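
As a quick check that files written this way can be read back, the sketch below does a round trip through a temporary path. It assumes genio is installed; the variable names `base` and `same` and the use of `tempfile()` are illustrative choices, not part of the original example.

```r
library(genio)

# small random genotype matrix: 5 loci (rows) x 2 individuals (columns)
X <- matrix(rbinom(10, 2, 0.5), nrow = 5, ncol = 2)
X[sample(10, 3)] <- NA

# hypothetical base path (no extension) inside the session's temp directory
base <- tempfile('random')

# write BED/BIM/FAM together, then read all three back
write_plink(base, X)
data <- read_plink(base)

# the matrix read back may carry locus/individual IDs as dimnames,
# so strip names and compare values only
same <- isTRUE(all.equal(unname(data$X), X, check.attributes = FALSE))

# remove the three temporary files
delete_files_plink(base)
```

If the round trip worked, `same` should be TRUE, with missing genotypes preserved as NA.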

Reading and writing existing data

# read individual and locus data into "tibbles"

# read plink data all at once
data <- read_plink('sample')
# extract genotypes and annotation tables
X   <- data$X
bim <- data$bim
fam <- data$fam

# Plink files read individually
bim <- read_bim('sample.bim')
fam <- read_fam('sample.fam')
# pass the dimensions by name: number of loci, then number of individuals
X   <- read_bed('sample.bed', m_loci = nrow(bim), n_ind = nrow(fam))

# Eigenstrat formats
snp <- read_snp('sample.snp')
ind <- read_ind('sample.ind')

# in all cases extension can be omitted and it still works!
bim <- read_bim('sample')
fam <- read_fam('sample')
snp <- read_snp('sample')
ind <- read_ind('sample')

# write these data to other files
# here extensions are also added automatically
# write all plink files together, ensuring consistency
write_plink('new', X, bim, fam)
# write plink files individually
write_fam('new', fam)
write_bim('new', bim)
write_bed('new', X)
# Eigenstrat files
write_ind('new', ind)
write_snp('new', snp)
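
The 'sample' paths above stand in for your own files. For a version that runs as is, the sketch below reads the example data that is assumed to ship in the package's extdata folder (the `system.file()` path is that assumption) and checks that the three tables line up. The names `file_bed`, `base`, and `dims_ok` are illustrative.

```r
library(genio)

# locate the example BED file assumed to ship with the installed package
file_bed <- system.file('extdata', 'sample.bed', package = 'genio', mustWork = TRUE)
# drop the extension to get the base path shared by BED/BIM/FAM
base <- sub('\\.bed$', '', file_bed)

# fail early if any of the BED/BIM/FAM files is missing
require_files_plink(base)

# read genotypes and annotation tables together
data <- read_plink(base)

# loci are rows and individuals are columns, so the tables should agree
dims_ok <- nrow(data$X) == nrow(data$bim) && ncol(data$X) == nrow(data$fam)
```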

Reading and writing GCTA GRM files

# read data from GRM files:
# - sample.grm.bin (kinship matrix),
# - sample.grm.N.bin (sample sizes matrix), and
# - sample.grm.id (family and ID table for individuals in this data)
obj <- read_grm( 'sample' )
# the kinship matrix
kinship <- obj$kinship
# the pair sample sizes matrix
M <- obj$M
# the fam and ID tibble
fam <- obj$fam

# write data into new GRM files
# writes: new.grm.bin, new.grm.N.bin, new.grm.id
write_grm( 'new', kinship, M = M, fam = fam )
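
A self-contained round trip can be sketched with a toy kinship matrix. All values below are made up for illustration, the temporary path and the names `base` and `same` are assumptions, and the comparison uses a tolerance because the binary format may store values at reduced precision.

```r
library(genio)

# toy symmetric kinship matrix for 3 individuals (made-up values)
kinship <- matrix(
  c(1.00, 0.25, 0.00,
    0.25, 1.00, 0.50,
    0.00, 0.50, 1.00),
  nrow = 3, byrow = TRUE
)
# made-up pair sample sizes
M <- matrix(100, nrow = 3, ncol = 3)
# default annotation table for 3 individuals
fam <- make_fam(n = 3)

# hypothetical base path (no extension) inside the session's temp directory
base <- tempfile('toy')

# write the GRM files, then read them back
write_grm(base, kinship, M = M, fam = fam)
obj <- read_grm(base)

# compare values only, allowing for reduced storage precision
same <- isTRUE(all.equal(unname(obj$kinship), kinship,
                         tolerance = 1e-6, check.attributes = FALSE))

# remove the temporary GRM files
delete_files_grm(base)
```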


Monthly Downloads: 748
Version: 1.1.2
License: GPL-3

Maintainer: Alejandro Ochoa
Last Published: January 6th, 2023

Functions in genio (1.1.2)

read_fam: Read Plink *.fam files
read_matrix: Read a numerical matrix file into an R matrix
read_ind: Read Eigenstrat *.ind files
read_bim: Read Plink *.bim files
read_eigenvec: Read Plink eigenvec file
read_snp: Read Eigenstrat *.snp files
geno_to_char: Convert a genotype matrix from numeric to character codes
require_files_grm: Require that GCTA binary GRM files are present
require_files_phen: Require that PHEN file is present
require_files_plink: Require that Plink binary files are present
sex_to_int: Convert character sex codes to integer codes
sex_to_char: Convert integer sex codes to character codes
tidy_kinship: Create a tidy version of a kinship matrix
write_bed: Write a genotype matrix into Plink BED format
write_eigenvec: Write eigenvectors table into a Plink-format file
write_bim: Write Plink *.bim files
write_plink: Write genotype and sample data into a Plink BED/BIM/FAM file set.
write_phen: Write *.phen files
write_ind: Write Eigenstrat *.ind files
write_fam: Write Plink *.fam files
write_grm: Write GCTA GRM and related plink2 binary files
read_grm: Read GCTA GRM and related plink2 binary files
write_matrix: Write a matrix to a file without row or column names
read_phen: Read *.phen files
write_snp: Write Eigenstrat *.snp files
read_plink: Read genotype and sample data in a Plink BED/BIM/FAM file set.
delete_files_phen: Delete PHEN files
genio: genio (GENetics I/O): A package for reading and writing genetics data
delete_files_plink: Delete all Plink binary files
ind_to_fam: Convert an Eigenstrat IND tibble into a Plink FAM tibble
make_bim: Create a Plink BIM tibble
make_fam: Create a Plink FAM tibble
count_lines: Count the number of lines of a file
delete_files_grm: Delete all GCTA binary GRM files
read_bed: Read a genotype matrix in Plink BED format