core.find: Identification of Invariant Core Positions

Description

Perform iterated rounds of structural superposition to identify the most invariant region in an aligned set of protein structures.

Usage

core.find(...)
"core.find"(pdbs, shortcut = FALSE, rm.island = FALSE, verbose = TRUE, stop.at = 15, stop.vol = 0.5, write.pdbs = FALSE, outpath="core_pruned", ncore = 1, nseg.scale = 1, ...)
"core.find"(xyz, ...)
"core.find"(pdb, verbose=TRUE, ...)

Arguments

pdbs

a numeric matrix of aligned C-alpha xyz Cartesian coordinates. For example an alignment data structure obtained with read.fasta.pdb or pdbaln.

shortcut

if TRUE, remove more than one position at a time.

rm.island

remove isolated fragments of less than three residues.

verbose

logical, if TRUE a “core\_pruned” directory containing ‘core structures’ for each iteraction is written to the current directory.

stop.at

minimal core size at which iterations should be stopped.

stop.vol

minimal core volume at which iterations should be stopped.

write.pdbs

logical, if TRUE core coordinate files, containing only core positions for each iteration, are written to a location specified by outpath.

outpath

character string specifying the output directory when write.pdbs is TRUE.

ncore

number of CPU cores used to do the calculation. ncore>1 requires package ‘parallel’ installed.

nseg.scale

split input data into specified number of segments prior to running multiple core calculation. See fit.xyz.

xyz

a numeric matrix of xyz Cartesian coordinates, e.g. obtained from read.dcd or read.ncdf.

pdb

an object of type pdb as obtained from function read.pdb with multiple frames (>=4) stored in its xyz component. Note that the function will attempt to identify C-alpha and phosphate atoms (for protein and nucleic acids, respectively) in which the calculation should be based.

...

arguments passed to and from functions.

Value

volume: total core volume at each fitting iteration/round.
length: core length at each round.
resno: residue number of core residues at each round (taken from the first aligned structure) or, alternatively, the numeric index of core residues at each round.
step.inds: atom indices of core atoms at each round.
atom: atom indices of core positions in the last round.
xyz: xyz indices of core positions in the last round.
c1A.atom: atom indices of core positions with a total volume under 1 Angstrom\^3.
c1A.xyz: xyz indices of core positions with a total volume under 1 Angstrom\^3.
c1A.resno: residue numbers of core positions with a total volume under 1 Angstrom\^3.
c0.5A.atom: atom indices of core positions with a total volume under 0.5 Angstrom\^3.
c0.5A.xyz: xyz indices of core positions with a total volume under 0.5 Angstrom\^3.
c0.5A.resno: residue numbers of core positions with a total volume under 0.5 Angstrom\^3.

Details

This function attempts to iteratively refine an initial structural superposition determined from a multiple alignment. This involves iterated rounds of superposition, where at each round the position(s) displaying the largest differences is(are) excluded from the dataset. The spatial variation at each aligned position is determined from the eigenvalues of their Cartesian coordinates (i.e. the variance of the distribution along its three principal directions). Inspired by the work of Gerstein et al. (1991, 1995), an ellipsoid of variance is determined from the eigenvalues, and its volume is taken as a measure of structural variation at a given position.

Optional “core PDB files” containing core positions, upon which superposition is based, can be written to a location specified by outpath by setting write.pdbs=TRUE. These files are useful for examining the core filtering process by visualising them in a graphics program.

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695--2696.

Gerstein and Altman (1995) J. Mol. Biol. 251, 161--175.

Gerstein and Chothia (1991) J. Mol. Biol. 220, 133--149.

Examples

Run this code

## Not run: 
# ##-- Generate a small kinesin alignment and read corresponding structures
# pdbfiles <- get.pdb(c("1bg2","2ncd","1i6i","1i5s"), URLonly=TRUE)
# pdbs <- pdbaln(pdbfiles)
# 
# ##-- Find 'core' positions
# core <- core.find(pdbs)
# plot(core)
# 
# ##-- Fit on these relatively invarient subset of positions 
# #core.inds <- print(core, vol=1)
# core.inds <- print(core, vol=0.5)
# xyz <- pdbfit(pdbs, core.inds, outpath="corefit_structures")
# 
# ##-- Compare to fitting on all equivalent positions
# xyz2 <- pdbfit(pdbs)
# 
# ## Note that overall RMSD will be higher but RMSF will
# ##  be lower in core regions, which may equate to a
# ##  'better fit' for certain applications
# gaps <- gap.inspect(pdbs$xyz)
# rmsd(xyz[,gaps$f.inds])
# rmsd(xyz2[,gaps$f.inds])
# 
# plot(rmsf(xyz[,gaps$f.inds]), typ="l", col="blue", ylim=c(0,9))
# points(rmsf(xyz2[,gaps$f.inds]), typ="l", col="red")
# ## End(Not run)

## Not run: 
# ##-- Run core.find() on a multimodel PDB file
# pdb <- read.pdb('1d1d', multi=TRUE)
# core <- core.find(pdb)
# 
# ##-- Run core.find() on a trajectory
# trtfile <- system.file("examples/hivp.dcd", package="bio3d")
# trj <- read.dcd(trtfile)
# 
# ## Read the starting PDB file to determine atom correspondence
# pdbfile <- system.file("examples/hivp.pdb", package="bio3d")
# pdb <- read.pdb(pdbfile)
# 
# ## select calpha coords from a manageable number of frames
# ca.ind <- atom.select(pdb, "calpha")$xyz
# frames <- seq(1, nrow(trj), by=10)
# 
# core <- core.find( trj[frames, ca.ind], write.pdbs=TRUE )
# 
# ## have a look at the various cores "vmd -m core_pruned/*.pdb"
# 
# ## Lets use a 6A^3 core cutoff
# inds <- print(core, vol=6)
# write.pdb(xyz=pdb$xyz[inds$xyz],resno=pdb$atom[inds$atom,"resno"], file="core.pdb")
# 
# 
# ##- Fit trj onto starting structure based on core indices
# xyz <- fit.xyz( fixed = pdb$xyz,
#                mobile = trj,
#                fixed.inds  = inds$xyz,
#                mobile.inds = inds$xyz)
# 
# ##write.pdb(pdb=pdb, xyz=xyz, file="new_trj.pdb")
# ##write.ncdf(xyz, "new_trj.nc")
# 
# ## End(Not run)

Run the code above in your browser using DataLab