Learn R Programming

Biostrings (version 2.40.2)

misc: Some miscellaneous stuff

Description

Some miscellaneous stuff.

Usage

N50(csizes)

Arguments

csizes
A vector containing the contig sizes.

Value

N50: The N50 value as an integer.

The N50 contig size

Definition The N50 contig size of an assembly (aka the N50 value) is the size of the largest contig such that the contigs larger than that have at least 50% the bases of the assembly. How is it calculated? It is calculated by adding the sizes of the biggest contigs until you reach half the total size of the contigs. The N50 value is then the size of the contig that was added last (i.e. the smallest of the big contigs covering 50% of the genome). What for? The N50 value is a standard measure of the quality of a de novo assembly.

See Also

XStringSet-class

Examples

Run this code
  # Generate 10 random contigs of sizes comprised between 100 and 10000:
  my.contig <- DNAStringSet(
                 sapply(
                   sample(c(100:10000), 10),
                   function(size)
                       paste(sample(DNA_BASES, size, replace=TRUE), collapse="")
                 )
               )

  # Get their sizes:
  my.size <- width(my.contig)

  # Calculate the N50 value of this set of contigs:
  my.contig.N50 <- N50(my.size)

Run the code above in your browser using DataLab