seqinr (version 3.1-2)

s2n: simple numerical encoding of a DNA sequence.

Description

By default, if no levels argument is provided, this function codes your DNA sequence as integer values following lexical order (a < c < g < t): 0 for "a", 1 for "c", 2 for "g", 3 for "t", and NA for ambiguous bases.
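
The default mapping can be reproduced in base R with factor and unclass, the two functions listed under See Also. This is a minimal sketch of the idea, not necessarily how seqinr implements s2n internally; the variable names are illustrative.

```r
# Illustrative base-R sketch; not seqinr's actual source.
bases <- c("a", "c", "g", "t", "n", "g")   # "n" is not an allowed level

# factor() assigns codes 1..4 in the order of `levels`; other values become NA
codes1 <- unclass(factor(bases, levels = c("a", "c", "g", "t")))

# Shift to 0-based codes, matching the documented default
codes0 <- as.integer(codes1) - 1L
codes0
```

Because the codes come from the position of each character in the levels vector, reordering or replacing that vector changes the encoding.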

Usage

s2n(seq, levels = s2c("acgt"), base4 = TRUE, forceToLower = TRUE)

Arguments

seq
the sequence as a vector of single chars
levels
allowed char values, by default a, c, g and t
base4
if TRUE the numerical encoding will start at 0, if FALSE at 1
forceToLower
if TRUE the sequence is forced to lower case characters
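
To see how the arguments interact, here is a minimal base-R sketch of a function with the same signature. The name encode_seq is hypothetical, and its body is an assumption based only on the argument descriptions above, not a copy of seqinr's code.

```r
# Hypothetical stand-in for s2n, written from the argument descriptions only.
encode_seq <- function(seq, levels = c("a", "c", "g", "t"),
                       base4 = TRUE, forceToLower = TRUE) {
  if (forceToLower) seq <- tolower(seq)              # "A" and "a" get the same code
  codes <- as.integer(factor(seq, levels = levels))  # 1-based; NA if not in levels
  if (base4) codes - 1L else codes                   # 0-based when base4 = TRUE
}

x <- c("A", "c", "G", "t")
encode_seq(x)                          # lower-cased first, then coded from 0
encode_seq(x, base4 = FALSE)           # same sequence, coded from 1
encode_seq(x, forceToLower = FALSE)    # upper-case letters match no level
encode_seq(c("a", "u"), levels = c("a", "c", "g", "u"))  # RNA alphabet
```

With forceToLower = FALSE, any upper-case input is compared to the lower-case levels as-is, so it ends up as NA unless levels is changed to match.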

Value

  • a vector of integers

References

citation("seqinr")

See Also

n2s, factor, unclass

Examples

##
## Example of default behaviour:
##
urndna <- s2c("acgt")
seq <- sample( urndna, 100, replace = TRUE ) ; seq
s2n(seq)
##
## How to deal with RNA:
##
urnrna <- s2c("acgu")
seq <- sample( urnrna, 100, replace = TRUE ) ; seq
s2n(seq, levels = urnrna)
##
## What happens with unknown characters:
##
urnmess <- c(urndna,"n")
seq <- sample( urnmess, 100, replace = TRUE ) ; seq
s2n(seq)
##
## How to change the encoding for unknown characters:
##
tmp <- s2n(seq) ; tmp[is.na(tmp)] <- -1; tmp
##
## Simple sanity check:
##
stopifnot(all(s2n(s2c("acgt")) == 0:3))