Learn R Programming

DNAtools

There are two main features of this package:

  • Computation of the distribution of the numbers of alleles in DNA mixtures.
  • Empirical testing of DNA match probabilities.

Each is described in a separate vignette, and a small example given below under “Getting started”. The documentation (vignettes and manual) is both included in package and available for reading online at https://mikldk.github.io/DNAtools/.

Install

With internet access

To build and install from Github using R 3.3.0 (or later) and the R devtools package 1.11.0 (or later) run this command from within R:

devtools::install_github("mikldk/DNAtools", 
                         build_opts = c("--no-resave-data", "--no-manual"))

You can also install the package without vignettes if needed as follows:

devtools::install_github("mikldk/DNAtools")

Without internet access

To install on a computer without internet access:

  1. Download DNAtools as a .tar.gz archive from GitHub, transfer to the destination computer, e.g. using removable media
  2. Install devtools and DNAtools pre-requisites (multicool, Rcpp, RcppParallel, RcppProgress, Rsolnp)
  3. Install DNAtools in R using the devtools::install_local() function

Contribute, issues, and support

Please use the issue tracker at https://github.com/mikldk/DNAtools/issues if you want to notify us of an issue or need support. If you want to contribute, please either create an issue or make a pull request.

Getting started

Please read the vignettes for more elaborate explanations than those given below. The below example is meant to illustrate some of the functionality the package provides in a compact fashion.

Say that we have a reference database:

data(dbExample, package = "DNAtools")
head(dbExample)[, 2:7]
#>   D16S539.1 D16S539.2 D18S51.1 D18S51.2 D19S433.1 D19S433.2
#> 1        11        11       15       21        14        14
#> 2        13        12       15       14        16        16
#> 3         9         9       13       17        14        14
#> 4        11        12       14       15        15        13
#> 5        12        12       17       12      15.2        13
#> 6         9        13       17       14        13        14
dim(dbExample)
#> [1] 1000   21

We now find the allele frequencies:

allele_freqs <- lapply(1:10, function(x){
  al_freq <- table(c(dbExample[[x*2]], dbExample[[1+x*2]]))/(2*nrow(dbExample))
  al_freq[sort.list(as.numeric(names(al_freq)))]
})
names(allele_freqs) <- sub("\\.1", "", names(dbExample)[(1:10)*2])

Number of alleles

One could ask: What is the distribution of the number of alleles observed in a three person mixture?

The distribution of the number of alleles in a three person mixture can be calculated by this package. We focus on the D16S539 locus:

allele_freqs$D16S539
#> 
#>      8      9     10     11     12     13     14 
#> 0.0005 0.1910 0.0195 0.2755 0.2860 0.2255 0.0020
noa <- Pnm_locus(m = 3, theta = 0, alleleProbs = allele_freqs$D16S539)
names(noa) <- seq_along(noa)
noa
#>           1           2           3           4           5           6 
#> 0.001164550 0.089551483 0.492098110 0.389529448 0.027534048 0.000122361

This can be illustrated by a barchart:

 Number of alleles Frequency                                        
 1                                                                  
 2                 |||||||||                                        
 3                 |||||||||||||||||||||||||||||||||||||||||||||||||
 4                 |||||||||||||||||||||||||||||||||||||||          
 5                 |||                                              
 6                                                                  

So it is most likely that a three person mixture on D16S539 has 3 alleles.

This can be done for all loci at once:

noa <- Pnm_all(m = 3, theta = 0, probs = allele_freqs, locuswise = TRUE)
noa
#>                    1           2         3         4          5            6
#> D16S539 0.0011645502 0.089551483 0.4920981 0.3895294 0.02753405 1.223610e-04
#> D18S51  0.0002318216 0.017959845 0.1779391 0.4378291 0.31153235 5.450770e-02
#> D19S433 0.0035865859 0.089632027 0.3625087 0.3976107 0.13518050 1.148149e-02
#> D21S11  0.0038709572 0.096894566 0.3687696 0.3853717 0.13233905 1.275409e-02
#> D2S1338 0.0000431618 0.006746923 0.1068460 0.3899646 0.39812735 9.827197e-02
#> D3S1358 0.0016039659 0.078199562 0.3939623 0.4258141 0.09768694 2.733108e-03
#> D8S1179 0.0007349290 0.039905625 0.2705804 0.4539819 0.21275810 2.203902e-02
#> FGA     0.0000742453 0.010955567 0.1455096 0.4287449 0.34698332 6.773235e-02
#> TH01    0.0025373680 0.111902320 0.4515490 0.3761236 0.05783065 5.706482e-05
#> vWA     0.0008047420 0.054208046 0.3452015 0.4542519 0.13852872 7.005098e-03

We can also find the convolution and thereby the total number of distinct alleles:

noa <- Pnm_all(m = 3, theta = 0, probs = allele_freqs)
noa
#>            1            2            3            4            5            6 
#> 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 
#>            7            8            9           10           11           12 
#> 0.000000e+00 0.000000e+00 0.000000e+00 2.891086e-32 2.089630e-29 6.726379e-27 
#>           13           14           15           16           17           18 
#> 1.282439e-24 1.625439e-22 1.457361e-20 9.605595e-19 4.777072e-17 1.827088e-15 
#>           19           20           21           22           23           24 
#> 5.455402e-14 1.287742e-12 2.429902e-11 3.702434e-10 4.597777e-09 4.693091e-08 
#>           25           26           27           28           29           30 
#> 3.968035e-07 2.798451e-06 1.656443e-05 8.274188e-05 3.504602e-04 1.263902e-03 
#>           31           32           33           34           35           36 
#> 3.894858e-03 1.028680e-02 2.334381e-02 4.560959e-02 7.684831e-02 1.117952e-01 
#>           37           38           39           40           41           42 
#> 1.405269e-01 1.526853e-01 1.433854e-01 1.163205e-01 8.143643e-02 4.912883e-02 
#>           43           44           45           46           47           48 
#> 2.548638e-02 1.133857e-02 4.311188e-03 1.394979e-03 3.821005e-04 8.802401e-05 
#>           49           50           51           52           53           54 
#> 1.691803e-05 2.685887e-06 3.478319e-07 3.616152e-08 2.955716e-09 1.846961e-10 
#>           55           56           57           58           59           60 
#> 8.484368e-12 2.703293e-13 5.435722e-15 5.774600e-17 2.098567e-19 1.565331e-22

This can be illustrated by a barchart:

 Number of alleles Frequency      
 1                                
 2                                
 3                                
 4                                
 5                                
 6                                
 7                                
 8                                
 9                                
 10                               
 11                               
 12                               
 13                               
 14                               
 15                               
 16                               
 17                               
 18                               
 19                               
 20                               
 21                               
 22                               
 23                               
 24                               
 25                               
 26                               
 27                               
 28                               
 29                               
 30                               
 31                               
 32                |              
 33                ||             
 34                |||||          
 35                ||||||||       
 36                |||||||||||    
 37                |||||||||||||| 
 38                |||||||||||||||
 39                |||||||||||||| 
 40                ||||||||||||   
 41                ||||||||       
 42                |||||          
 43                |||            
 44                |              
 45                               
 46                               
 47                               
 48                               
 49                               
 50                               
 51                               
 52                               
 53                               
 54                               
 55                               
 56                               
 57                               
 58                               
 59                               
 60                               

So it is most likely that a three person mixture has 38 distinct alleles on all loci combined.

Empirical testing of DNA match probabilities

Another relevant questions is how many matches and near-matches there are. This can be calculated as follows:

db_summary <- dbCompare(dbExample, hit = 6, trace = FALSE)
db_summary
#> Summary matrix
#>      partial
#> match     0     1     2     3     4     5     6     7     8     9    10
#>    0    102  1368  7122 21878 44189 59463 54601 34203 13571  3281   353
#>    1    206  2114 10013 26084 43656 47418 34320 15463  4145   472      
#>    2    165  1477  5710 12566 17049 14642  7570  2220   310            
#>    3     72   556  1821  3250  3361  2135   719   116                  
#>    4     22   149   360   493   379   156    34                        
#>    5      6    19    44    41    26     5                              
#>    6      0     2     3     0     0                                    
#>    7      0     0     0     0                                          
#>    8      0     0     0                                                
#>    9      0     0                                                      
#>    10     0                                                            
#> 
#> Profiles with at least 6 matching loci
#>   id1 id2 match partial
#> 1 153 687     6       2
#> 2 625 641     6       2
#> 3 694 855     6       2
#> 4 379 560     6       1
#> 5 422 881     6       1

The hit argument returns pairs of profiles that fully match at hit (here 6) or more loci.

The summary matrix gives the number of pairs mathcing/partially-matching at ((i,j)) loci. For example the row

     partial
match     0     1     2     3     4     5     6     7     8     9    10
   5      6    19    44    41    26     5                              

means that there are 6+19+44+41+26+5 = 141 pairs of profiles matching exactly at 5 loci. Conditional on those 5 matches, there are 6 pairs not matching on the remaining 5 loci, 19 pairs partial matching on 1 locus and not matching on the remaining 4 loci, and so on.

Copy Link

Version

Install

install.packages('DNAtools')

Monthly Downloads

358

Version

0.2-4

License

GPL (>= 2) | file LICENSE

Maintainer

Last Published

March 17th, 2022

Functions in DNAtools (0.2-4)

dbExample

Simulated database with 1,000 individuals
dbCollapse

Collapse m/p output to vector
pContrib_locus

Compute the posterior probabilities for \(\Pr(m|n_0)\) for a given prior \(\Pr(m)\).
optim.relatedness

Estimate theta and the fraction of comparisons between close relatives
pContrib

Compute the posterior probabilities for P(m|n0) for a given prior P(m) and observed vector n0 of locus counts
Pnm_all

The exact distribution of the number of alleles in a m-person DNA mixture
print.dbcompare

Prints the summary matrix
simAlleleFreqs

Simulate Allele Frequencies
plot.dbcompare

Plots the summary matrix
print.dbOptim

Prints the results from optim.relatedness()
genRypeRec

Generates DNA profiles of n individuals.
genTypeRec

Generates DNA profiles of n unrelated individuals for a locus
plot.dbOptim

Plots the fitted object function for estimated familial relationships in the database and theta.
dbCompare

Compare DNA profiles
DNAtools-package

Tools for analysing forensic genetic DNA databases
dbSimulate

Simulate a DNA database
dbVariance

Covariance matrix of cell counts in DNA database comparison
dbExpect

Expected value of cell counts in DNA database comparison
estimatePD

Estimate the drop-out probability based on number of alleles
freqEst

Simple allele frequency estimation