genomes (version 2.16.0)

proks: Prokaryotic genomes at NCBI

Description

Prokaryotic genome sequencing projects at NCBI.

Usage

data(proks)

Format

A genomes data frame with observations on the following 23 variables.
pid: BioProject ID
name: Organism name
status: Sequencing status
released: First public sequence release
taxid: Taxonomy ID
acc: BioProject accession number
group: Phylum
subgroup: Class level
size: Total length of DNA (Mb)
gc: Percent GC (guanine or cytosine)
refseq: RefSeq chromosome sequence accessions
insdc: GenBank chromosome sequence accessions
plasmid.refseq: RefSeq plasmid sequence accessions
plasmid.insdc: GenBank plasmid sequence accessions
wgs: Four-letter WGS accession prefix followed by version
scaffolds: Number of scaffolds/contigs
genes: Number of genes
proteins: Number of proteins
modified: Last modification date
center: Sequencing center
biosample: BioSample accession number
assembly: Assembly accession number
reference: Reference or representative genome

Details

BioProject IDs are no longer unique. The table was modified on Nov 1, 2013 to include BioSample and Assembly accessions. See the NCBI announcement email on bacterial strain-level TaxID management for details.
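
Because pid can repeat, it may help to check which BioProject IDs are shared by several rows before using pid as a key. The following is a minimal sketch in base R on a made-up toy data frame (the values and the name toy are hypothetical, not taken from proks); the same expressions should apply to the pid and assembly columns described above.

```r
# Toy stand-in for three proks columns; all values are invented for illustration.
toy <- data.frame(
  pid      = c(101, 101, 202, 303),
  name     = c("Organism A strain 1", "Organism A strain 2",
               "Organism B", "Organism C"),
  assembly = c("ASM_0001", "ASM_0002", "ASM_0003", "ASM_0004"),
  stringsAsFactors = FALSE
)

# BioProject IDs that occur on more than one row
shared_pids <- unique(toy$pid[duplicated(toy$pid)])

# Rows that belong to a shared BioProject
shared_rows <- toy[toy$pid %in% shared_pids, ]
print(shared_rows)

# In this toy table the assembly accession still distinguishes every row
assembly_is_unique <- anyDuplicated(toy$assembly) == 0
print(assembly_is_unique)
```

Whether assembly (or biosample) is unique in the real table is an assumption to verify with the same anyDuplicated() check rather than something this sketch guarantees.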

Examples

data(proks)
proks
#single row 
t(proks[1,])
class(proks)
attributes(proks)[c("date","url")] 
summary(proks)
## cross-tabulate sequencing status against presence of a WGS accession
table2(proks$status, !is.na(proks$wgs), dnn=list("Status", "Has WGS acc?"))
plot(proks)
plotby(proks, log='y', las=1, top=2)
## genome size distribution for genomes under 15 Mb
hist(proks$size[proks$size<15], breaks=50, main="", col="blue", xlab="Size (Mb)")

## download recent table from NCBI
## Not run: update(proks) 