pdb2aln: Align a PDB structure to an existing alignment

Description

Extract the sequence from a PDB file and align it to an existing multiple sequence alignment that you wish to keep intact.

Usage

pdb2aln(aln, pdb, id="seq.pdb", aln.id=NULL, exefile="muscle", file="pdb2aln.fa")

Arguments

aln

an alignment list object with id and ali components, similar to that generated by read.fasta, read.fasta.pdb, and

pdb

the PDB object.

id

name for the PDB sequence in the new alignment.

aln.id

id of the sequence in the original alignment that is closest to the sequence of the PDB structure.

exefile

file path to the MUSCLE program on your system (i.e. how MUSCLE is invoked).

file

file name for outputting the new alignment.

Value

Returns a list object with three components:

id

sequence names as identifiers.

ali

an alignment character matrix with a row per sequence and a column per equivalent amino acid/nucleotide.

ref

an integer matrix whose first row ("ali.pos") holds the indices of the original alignment and whose second row ("ca.inds") holds the CA atom indices of the PDB structure.
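
As a rough illustration of how the ref component can be read, the sketch below builds a toy matrix with the same two row names used in the Examples section. The values are invented, and the use of NA in the "ali.pos" row to mark columns that are new relative to the original alignment is an assumption of this sketch; the Examples only show NA being filtered from the "ca.inds" row.

```r
# Toy stand-in for pdb2aln()$ref; all values are invented for illustration.
# Each column is one position in the new alignment.
ref <- rbind(ali.pos = c(1L, 2L, NA, 3L, 4L),
             ca.inds = c(NA, 5L, 13L, 21L, 29L))

# Assumed convention: NA in "ali.pos" marks a column absent from the
# original alignment; NA in "ca.inds" marks a gap in the PDB sequence.
both <- !is.na(ref["ali.pos", ]) & !is.na(ref["ca.inds", ])

# Columns where an original alignment position has a matching CA atom
mapped <- ref[, both, drop = FALSE]
print(mapped)
```

Only the columns kept in mapped pair an original alignment position with a CA atom, which is the pairing needed for fitting.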

Details

This function aligns a PDB sequence to an alignment and stores the mappings between the new and existing alignments, as well as the mappings between new alignment and the PDB atomic indices.

The function can be used to perform the routine procedure of finding the indices of CA atoms in the PDB structure whose residues are equivalent to predefined positions in the existing alignment. For example, when we project an MD simulation trajectory onto the low-dimensional subspace derived from the PCA of crystallographic structures, we first need to align the sequence of the simulated protein to the original alignment of crystal structures (or find the identical sequence in the alignment if the simulation started from one of the crystal structures). Then residues of the simulation system equivalent to those used for fitting crystal structures and performing PCA can be identified. The corresponding CA atoms to be used for fitting and projecting the trajectory are then obtained by mapping the equivalent residues onto the topology of the trajectory.
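
The mapping step described above can be sketched in base R with toy data. Here ref mimics the ref component of the return value and core.pos stands in for the predefined (non-gap) alignment positions; both are invented for illustration. The helper atom_to_xyz is a hypothetical base R stand-in for the atom-to-coordinate index conversion that the Examples perform with atom2xyz.

```r
# Toy inputs (invented): `ref` mimics pdb2aln()$ref, `core.pos` mimics the
# predefined positions in the original alignment used for fitting and PCA.
ref <- rbind(ali.pos = c(1L, 2L, 3L, NA, 4L, 5L),
             ca.inds = c(2L, 10L, NA, 18L, 26L, 34L))
core.pos <- c(2L, 3L, 5L)

# Columns of the new alignment that correspond to the predefined positions
cols <- which(ref["ali.pos", ] %in% core.pos)
ca <- ref["ca.inds", cols]

# Drop positions where the PDB sequence has a gap (NA), on both sides,
# so the two index sets stay paired one-to-one
keep <- !is.na(ca)
core.pos.kept <- core.pos[keep]
ca.kept <- ca[keep]

# Hypothetical helper: atom index i -> x, y, z positions 3i-2, 3i-1, 3i
atom_to_xyz <- function(i) as.vector(rbind(3 * i - 2, 3 * i - 1, 3 * i))
xyz.inds <- atom_to_xyz(ca.kept)
print(xyz.inds)
```

The sketch relies on core.pos being sorted and on the "ali.pos" row increasing from left to right, so that the order of cols matches the order of core.pos.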

When aln.id is provided, the function performs a pairwise alignment between the PDB sequence and the sequence in aln whose id contains aln.id. This is the best way to use the function if the simulated protein has an identical or very similar sequence to one of the sequences in the alignment aln.
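
Before passing aln.id, it can help to check that it selects exactly one sequence. The sketch below uses invented ids and a literal substring match; whether pdb2aln itself treats aln.id as a literal string or as a pattern is not stated here, so fixed = TRUE is a choice made for this illustration only.

```r
# Invented sequence ids standing in for aln$id
ids <- c("d1bg2__", "d2kin.1", "d1i6ia_")
aln.id <- "2kin"

# Literal substring match (an assumption of this sketch, see above)
hit <- grep(aln.id, ids, fixed = TRUE)

# A unique hit identifies the reference sequence for the pairwise alignment
if (length(hit) != 1L) {
  stop("aln.id should match exactly one sequence id")
}
print(ids[hit])
```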

References

Grant, B.J. et al. (2006) Bioinformatics 22, 2695--2696.

Examples


##--- Read aligned PDB coordinates (CA only)
aln  <- read.fasta(system.file("examples/kif1a.fa",package="bio3d"))
pdbs <- read.fasta.pdb(aln)

##--- Read PDB coordinates for a new structure (all atoms)
id <- get.pdb("2kin", URLonly=TRUE)
pdb <- read.pdb(id)

# map the non-gap positions
gap.inds <- gap.inspect(pdbs$resno)
naln <- pdb2aln(aln=pdbs, pdb=pdb, id=id)
ninds <- which(naln$ref["ali.pos", ] %in% gap.inds$f.inds)
npc.inds <- naln$ref["ca.inds", ninds]

# If gaps are found in PDB sequence with the predefined indices,
# redefine the non-gap positions
ngap.f.inds <- gap.inds$f.inds[!is.na(npc.inds)]
npc.inds <- npc.inds[!is.na(npc.inds)]

##--- fit the atomic coordinates to the aligned X-ray structure
xyz <- fit.xyz(pdbs$xyz[1,], pdb$xyz, atom2xyz(ngap.f.inds), atom2xyz(npc.inds))

## seq2aln(pdbseq(pdb), aln, id = id)
## do we get the same result
