proks: Prokaryotic genomes at NCBI

Description

Prokaryotic genome sequencing projects at NCBI.

Usage

data(proks)

Format

A genomes data frame with observations on the following 23 variables.

pid: BioProject id
name: Organism name
status: Sequencing status
released: First public sequence release
taxid: Taxonomy id
acc: BioProject Accession number
group: Phylum
subgroup: Class level
size: Total length of DNA (Mb)
gc: Percent GC (guanine or cytosine)
refseq: Refseq chromosome sequence accessions
insdc: GenBank chromosome sequence accessions
plasmid.refseq: Refseq plasmid sequence accessions
plasmid.insdc: GenBank plasmid sequence accessions
wgs: Four-letter WGS Accession prefix followed by version
scaffolds: Number of scaffolds/contigs
genes: Number of genes
proteins: Number of proteins
modified: Last modification date
center: Sequencing center
biosample: BioSample Accession number
assembly: Assembly Accession number
reference: Reference or representative genome

Source

Downloaded from ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
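For readers who want to inspect the raw report outside the package, the following is a minimal sketch of reading a tab-delimited table into a plain data frame. The helper name `read_genome_report` is hypothetical and not part of the genomes package, and treating "-" as a missing value is an assumption about the report format. The offline rows are made up for illustration.

```r
# Hypothetical helper, not part of the genomes package.
read_genome_report <- function(file) {
  # Assumes a tab-delimited table with a single header line.
  # Treating "-" as missing is an assumption about the report format.
  read.delim(file, na.strings = "-", quote = "", stringsAsFactors = FALSE)
}

# Not run (needs network access; the ftp:// scheme is assumed):
# x <- read_genome_report(
#   "ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt")

# Offline check with a small made-up table
tmp <- tempfile(fileext = ".txt")
writeLines(c("name\tsize\twgs",
             "Organism A\t4.6\t-",
             "Organism B\t1.8\tABCD01"), tmp)
toy <- read_genome_report(tmp)
str(toy)
```

The result is an ordinary data frame without the genomes class or the date and url attributes that the packaged `proks` object carries.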

Details

BioProject IDs are no longer unique, and the table was modified on Nov 1, 2013 to include BioSample and Assembly accessions. For details, see the NCBI announcement email regarding bacterial strain-level TaxID management.
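Because the BioProject id can repeat, it may help to check which rows share one before using it as a key. The sketch below uses a made-up three-row data frame with the same column names (pid, biosample, assembly); the values are placeholders, not real NCBI records. Whether assembly accessions are unique in the real table is an assumption to verify.

```r
# Made-up rows for illustration only; not real NCBI records.
toy <- data.frame(
  pid       = c(101, 101, 202),
  biosample = c("SAMN_A", "SAMN_B", "SAMN_C"),
  assembly  = c("GCA_A", "GCA_B", "GCA_C"),
  stringsAsFactors = FALSE
)

# TRUE if any BioProject id occurs more than once
pid_repeated <- anyDuplicated(toy$pid) > 0

# All rows that share a BioProject id with at least one other row
shared <- toy[toy$pid %in% toy$pid[duplicated(toy$pid)], ]

# Check whether the assembly accession could serve as a row key instead
assembly_unique <- anyDuplicated(toy$assembly) == 0
```

With the packaged data, the same checks can be run by substituting `proks` for `toy`.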

Examples

data(proks)
proks
## view a single row, transposed for readability
t(proks[1,])
class(proks)
attributes(proks)[c("date","url")] 
summary(proks)
## cross-tabulate sequencing status against presence of a WGS accession
table2(proks$status,!is.na(proks$wgs), dnn=list("Status", "Has WGS acc?"))
plot(proks)
plotby(proks, log='y', las=1, top=2)
hist(proks$size[proks$size < 15], breaks=50, main="", col="blue", xlab="Size (Mb)")

## download recent table from NCBI
## Not run: update(proks)