read.pdb: Read PDB File

Description

Read a Protein Data Bank (PDB) coordinate file.

Usage

read.pdb(file, maxlines = -1, multi = FALSE, rm.insert = FALSE,
         rm.alt = TRUE, verbose = TRUE)
## S3 method for class 'pdb'
print(x, ...)
## S3 method for class 'pdb'
summary(object, printseq = FALSE, ...)

Arguments

file

a single element character vector containing the name of the PDB file to be read, or the four letter PDB identifier for online file access.

maxlines

the maximum number of lines to read before giving up with large files. By default it will read up to the end of input on the connection.

multi

logical, if TRUE ATOM records are read for all models in multi-model files and their coordinates are returned.

rm.insert

logical, if TRUE PDB insert records are ignored.

rm.alt

logical, if TRUE PDB alternate records are ignored.

verbose

logical, if TRUE details of the reading process are printed.

x

a PDB structure object obtained from read.pdb.

object

a PDB structure object obtained from read.pdb.

printseq

logical, if TRUE the PDB ATOM sequence will be printed to the screen. See also pdbseq.

...

additional arguments to print.
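
The effect of rm.alt and rm.insert can be pictured with a small base R sketch. It does not call read.pdb; it works on a made-up atom table and assumes that "ignored" means keeping records with a blank alternate location indicator plus the first indicator in sort order, and dropping records that carry an insertion code. The actual filtering inside read.pdb may differ in detail.

```r
## Toy atom table (made-up values); column names mimic the 'atom' component
atom <- data.frame(
  eleno  = 1:5,
  elety  = c("N", "CA", "CA", "CA", "C"),
  alt    = c("", "A", "B", "", ""),
  resno  = c(10L, 10L, 10L, 10L, 11L),
  insert = c("", "", "", "A", ""),
  stringsAsFactors = FALSE
)

## Assumed rm.alt-style filter: keep blank 'alt' plus the first alt indicator
first.alt <- sort(unique(atom$alt[atom$alt != ""]))[1]
keep.alt  <- atom$alt %in% c("", first.alt)

## Assumed rm.insert-style filter: drop rows that carry an insertion code
keep.ins <- atom$insert == ""

## Rows that survive both filters
atom[keep.alt & keep.ins, ]
```

In this toy table the "B" alternate of the C-alpha atom and the atom with insertion code "A" are dropped, leaving one record per atom position.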

Value

Returns a list of class "pdb" with the following components:

atom

a data.frame containing all atomic coordinate ATOM and HETATM data, with a row per ATOM/HETATM and a column per record type. See below for details of the record type naming convention (useful for accessing columns).

helix

start, end and length of H type sse, where start and end are residue numbers resno.

sheet

start, end and length of E type sse, where start and end are residue numbers resno.

seqres

sequence from SEQRES field.

xyz

a numeric vector (or matrix for multi-model PDB files) of ATOM and HETATM coordinate data.

calpha

logical vector with length equal to nrow(atom) with TRUE values indicating a C-alpha elety.

call

the matched call.
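
As a rough illustration of how the atom, calpha and xyz components relate, the base R sketch below parses a few made-up ATOM lines by their fixed column positions in the PDB format. This is not the parser used by read.pdb; the helper `field` is hypothetical, the coordinates are invented, and only a subset of columns is extracted. Unlike read.pdb output, blank fields are left as empty strings here.

```r
## Build three made-up ATOM lines at the fixed PDB column positions
## (values are illustrative, not taken from any real entry)
fmt <- "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f"
lines <- sprintf(fmt, "ATOM", 1:3, c(" N", " CA", " C"), " ", "LYS", "A",
                 1L, " ", c(1.5, 2.25, 3.75), c(0.5, 1.25, 2.5),
                 c(-1.0, 0.0, 1.0), 1.0, 10.0)

## Hypothetical helper: cut a fixed-width field and strip padding
field <- function(x, from, to) trimws(substring(x, from, to))

## One row per record, one column per field
atom <- data.frame(
  type   = field(lines, 1, 6),
  eleno  = as.integer(field(lines, 7, 11)),
  elety  = field(lines, 13, 16),
  alt    = field(lines, 17, 17),
  resid  = field(lines, 18, 20),
  chain  = field(lines, 22, 22),
  resno  = as.integer(field(lines, 23, 26)),
  insert = field(lines, 27, 27),
  x      = as.numeric(field(lines, 31, 38)),
  y      = as.numeric(field(lines, 39, 46)),
  z      = as.numeric(field(lines, 47, 54)),
  stringsAsFactors = FALSE
)

## Logical C-alpha flag, one element per row of 'atom'
calpha <- atom$elety == "CA"

## Coordinates flattened as x1, y1, z1, x2, y2, z2, ...
xyz <- as.vector(t(as.matrix(atom[, c("x", "y", "z")])))

## Atom i occupies positions 3i-2, 3i-1 and 3i of 'xyz'
ca.xyz <- xyz[rep(3 * which(calpha), each = 3) - 2:0]
```

The last line shows why atom indices and xyz indices are kept separately: selecting one atom row corresponds to selecting three consecutive elements of the coordinate vector.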

Details

maxlines may be set so as to restrict the reading to a portion of input files. Note that the preferred means of reading large multi-model files is via binary DCD or NetCDF format trajectory files (see the read.dcd and read.ncdf functions).
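
The following base R sketch shows the kind of truncation a line limit produces. It assumes that maxlines behaves like the n argument of readLines(), where a negative value means reading to the end of input; that correspondence is suggested by the wording of the argument description but is an assumption here. The file contents are made up.

```r
## Write a small made-up PDB-style file: 2 header lines, then 5 ATOM lines
tmp <- tempfile(fileext = ".pdb")
writeLines(c("HEADER    TOY EXAMPLE",
             "REMARK   1 MADE-UP DATA",
             sprintf("ATOM  %5d", 1:5)), tmp)

## Read at most 4 lines, then the whole file (n = -1 reads to end of input)
partial <- readLines(tmp, n = 4)
full    <- readLines(tmp, n = -1)

## Number of ATOM records seen in each case
n.partial <- sum(startsWith(partial, "ATOM"))
n.full    <- sum(startsWith(full, "ATOM"))

## The analogous read.pdb call would take the form
#pdb <- read.pdb(tmp, maxlines = 4)
```

Because header lines count toward the limit, a small limit can cut off some or all of the coordinate records, so the value should be chosen with the size of the header section in mind.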

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695-2696. For a description of PDB format (version 3.3) see: http://www.wwpdb.org/documentation/format33/v3.3.html.

Examples

## Read a PDB file from the RCSB online database
#pdb <- read.pdb("4q21")

## Read a PDB file from those included with the package
pdb <- read.pdb( system.file("examples/1hel.pdb", package="bio3d") )

## Print a brief composition summary
pdb

## Examine the storage format (or internal *str*ucture)
str(pdb)

## Print data for the first four atoms
pdb$atom[1:4,]

## Print some coordinate data
head(pdb$atom[, c("x","y","z")])

## Or coordinates as a numeric vector
head(pdb$xyz)

## Print C-alpha coordinates (can also use 'atom.select' function)
head(pdb$atom[pdb$calpha, c("resid","elety","x","y","z")])
inds <- atom.select(pdb, elety="CA")
head( pdb$atom[inds$atom, ] )

## The atom.select() function returns 'indices' (row numbers)
## that can be used for accessing subsets of PDB objects, e.g.
inds <- atom.select(pdb,"ligand")
pdb$atom[inds$atom,]
pdb$xyz[inds$xyz]

## See the help page for atom.select() function for more details.


## Print SSE data for helix and sheet,
##  see also dssp() and stride() functions
print.sse(pdb)
pdb$helix
pdb$sheet$start
  
## Print SEQRES data
pdb$seqres

## SEQRES as one letter code
aa321(pdb$seqres)

## Where is the P-loop motif in the ATOM sequence
inds.seq <- motif.find("G....GKT", pdbseq(pdb))
pdbseq(pdb)[inds.seq]

## Where is it in the structure
inds.pdb <- atom.select(pdb,resno=inds.seq, elety="CA")
pdb$atom[inds.pdb$atom,]
pdb$xyz[inds.pdb$xyz]

## View in interactive 3D mode
#view(pdb)
