rPDBapi: A Comprehensive R Package Interface for Accessing the Protein Data Bank

Introduction

rPDBapi is an R package that provides access to the RCSB Protein Data Bank (PDB) from within R. It simplifies retrieving and analyzing 3D structural data for large biological molecules, which is essential for bioinformatics and structural biology research. The package talks to the RCSB PDB web services by building JSON search queries and GraphQL data requests, and it supports custom queries, data retrieval, and advanced searches.

Features

  • User-Friendly Interface: Simplifies access to PDB data for the R community.
  • Custom Queries: Streamlines the process of crafting custom queries for efficient data retrieval.
  • Advanced Search Capabilities: Includes specialized search functions for PubMed IDs, organisms, experimental methods, protein structure similarities, and more.
  • Data Retrieval: Facilitates downloading of PDB files in various formats and extraction of FASTA sequences.
  • Integration with R: Provides functions for data manipulation and analysis directly within R, enhancing research workflows.

Installation

You can install the stable version of rPDBapi from CRAN:

install.packages("rPDBapi", repos = "http://cran.us.r-project.org")

To install the development version from GitHub:

devtools::install_github("selcukorkmaz/rPDBapi")
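
In scripts that may run on machines where the package is not yet installed, it can help to install only when needed. The following is a minimal sketch; `ensure_package` is a hypothetical helper written for illustration and is not part of rPDBapi.

```r
# Hypothetical helper (not part of rPDBapi): install a package only if it is missing.
# `installer` defaults to utils::install.packages and can be swapped out,
# for example with a function that installs from GitHub.
ensure_package <- function(pkg, installer = utils::install.packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    installer(pkg)
  }
  # Report (invisibly) whether the package can be loaded now.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Only run the install-and-load step in an interactive session.
if (interactive()) {
  ensure_package("rPDBapi")
  library(rPDBapi)
}
```

Passing a different `installer` keeps the check reusable for the development version as well.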

Usage

Loading the Package

library(rPDBapi)

Retrieving PDB IDs

Retrieve PDB IDs related to a specific term, such as "hemoglobin":

pdbs <- query_search(search_term = "hemoglobin")
head(pdbs)

Advanced Searches

Search by PubMed ID:

pdbs <- query_search(search_term = 32453425, query_type = "PubmedIdQuery")
pdbs

Search by source organism, given as an NCBI Taxonomy ID (7227 is Drosophila melanogaster):

pdbs <- query_search(search_term = '7227', query_type = 'TreeEntityQuery')
head(pdbs)

Search by experimental method:

pdbs <- query_search(search_term = 'SOLID-STATE NMR', query_type = 'ExpTypeQuery')
head(pdbs)
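
The searches above each return their own set of IDs. To find entries that satisfy several criteria at once, the results can be combined with ordinary set operations. This is a minimal sketch that assumes `query_search` returns a vector (or list) of ID strings; `combine_hits` is a hypothetical helper, not an rPDBapi function.

```r
# Hypothetical helper (not part of rPDBapi): combine several ID sets.
# mode = "all" keeps IDs present in every set (intersection);
# mode = "any" keeps IDs present in at least one set (union).
combine_hits <- function(..., mode = c("all", "any")) {
  mode <- match.arg(mode)
  sets <- lapply(list(...), function(ids) unique(as.character(unlist(ids))))
  if (length(sets) == 0) {
    return(character(0))
  }
  op <- if (mode == "all") intersect else union
  Reduce(op, sets)
}

# Network-dependent usage, reusing the two queries shown above.
if (interactive()) {
  fly_ids   <- rPDBapi::query_search(search_term = "7227", query_type = "TreeEntityQuery")
  ssnmr_ids <- rPDBapi::query_search(search_term = "SOLID-STATE NMR", query_type = "ExpTypeQuery")
  combine_hits(fly_ids, ssnmr_ids)
}
```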

Data Retrieval

Fetch data based on user-defined IDs and properties:

properties <- list(
  rcsb_entry_info = c("molecular_weight"),
  exptl = "method",
  rcsb_accession_info = "deposit_date"
)
ids <- query_search("CRISPR")
df <- data_fetcher(id = ids, data_type = "ENTRY", properties = properties, return_as_dataframe = TRUE)
df
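
A broad term such as "CRISPR" can match many entries. One way to keep individual requests small is to fetch the properties in batches and stack the results. The sketch below makes several assumptions: `chunk_ids` and `fetch_in_batches` are hypothetical helpers, the batch size of 50 is arbitrary, and `data_fetcher` is assumed to return a data frame with the same columns for every batch.

```r
# Hypothetical helper: split a vector of IDs into consecutive batches
# of at most `size` elements.
chunk_ids <- function(ids, size = 50) {
  stopifnot(size >= 1)
  if (length(ids) == 0) {
    return(list())
  }
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

# Hypothetical helper: call data_fetcher() once per batch and row-bind the results.
# Assumes every batch yields a data frame with identical columns.
fetch_in_batches <- function(ids, properties, size = 50) {
  batches <- chunk_ids(ids, size)
  results <- lapply(batches, function(batch) {
    rPDBapi::data_fetcher(
      id = batch,
      data_type = "ENTRY",
      properties = properties,
      return_as_dataframe = TRUE
    )
  })
  do.call(rbind, results)
}

# Network-dependent usage with the same properties as in the example above.
if (interactive()) {
  properties <- list(rcsb_entry_info = "molecular_weight", exptl = "method")
  ids <- rPDBapi::query_search("CRISPR")
  df <- fetch_in_batches(ids, properties, size = 50)
  head(df)
}
```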

Describing Chemical Compounds

Retrieve comprehensive descriptions of chemical compounds:

chem_desc <- describe_chemical('ATP')
chem_desc$rcsb_chem_comp_descriptor$smiles

Retrieving PDB Files

Download PDB files in various formats:

pdb_file <- get_pdb_file(pdb_id = "4HHB", filetype = "cif")
head(pdb_file$atom)
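
Once the atom table is available, simple summaries can be computed with base R. The sketch below counts atoms per chain. It assumes that `pdb_file$atom` is a data frame with a `chain` column, which is not confirmed by the text above; `count_atoms_by_chain` is a hypothetical helper and checks for that column before using it.

```r
# Hypothetical helper (not part of rPDBapi): count atoms per chain.
# Assumes `atom` is a data frame with a `chain` column.
count_atoms_by_chain <- function(atom) {
  stopifnot(is.data.frame(atom), "chain" %in% names(atom))
  counts <- table(atom$chain)
  data.frame(
    chain = names(counts),
    n_atoms = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Network-dependent usage with the entry downloaded above.
if (interactive()) {
  pdb_file <- rPDBapi::get_pdb_file(pdb_id = "4HHB", filetype = "cif")
  count_atoms_by_chain(pdb_file$atom)
}
```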

Additional Functions

  • get_info: Retrieve detailed information about a specific PDB entry.
  • get_fasta_from_rcsb_entry: Fetch FASTA sequences for specified PDB entry IDs.
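
These two functions have no example above, so here is a hedged sketch of how they might be used together. It assumes that both accept the entry ID as their first argument and passes it positionally for that reason; `fetch_entry_summary` is a hypothetical wrapper, and the structure of the returned objects is not described here, so check the package documentation before relying on specific fields.

```r
# Hypothetical wrapper (not part of rPDBapi): collect entry-level information
# and FASTA sequences for one PDB entry.
# Assumption: both functions take the entry ID as their first argument.
fetch_entry_summary <- function(pdb_id) {
  list(
    info  = rPDBapi::get_info(pdb_id),
    fasta = rPDBapi::get_fasta_from_rcsb_entry(pdb_id)
  )
}

# Network-dependent usage.
if (interactive()) {
  entry <- fetch_entry_summary("4HHB")
  str(entry$info, max.level = 1)  # inspect the top-level fields
  entry$fasta
}
```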

Documentation

For more detailed examples and usage, please refer to the package documentation.

Authors

  • Selcuk Korkmaz - Trakya University, Department of Biostatistics
  • Bilge Eren Yamasan - Trakya University, Department of Biophysics

License

This package is licensed under the MIT License.

Package Details

  • Monthly Downloads: 207
  • Version: 2.1.1
  • License: GPL (>= 2)
  • Maintainer: Selcuk Korkmaz
  • Last Published: October 19th, 2024

Functions in rPDBapi (2.1.1)

  • SequenceOperator: Create a Sequence Operator for Sequence-Based Searches
  • describe_chemical: Describe Chemical Compound from RCSB PDB
  • StructureOperator: Create a Structure Operator for Structure-Based Searches
  • handle_api_errors: Handle API Errors
  • return_data_as_dataframe: Convert RCSB PDB Response Data into a Dataframe
  • rPDBapi-package: rPDBapi: A Comprehensive Interface for Accessing the Protein Data Bank
  • infer_search_service: Infer the Appropriate Search Service for RCSB PDB Queries
  • get_pdb_api_url: Generate a PDB API URL
  • get_pdb_file: Download and Process PDB Files from the RCSB Database
  • parse_fasta_text_to_list: Helper Function: Parse FASTA Text to List Grouped by Header
  • parse_response: Parse API Response
  • SeqMotifOperator: Create a Sequence Motif Operator for RCSB PDB Searches
  • find_results: Retrieve Specific Fields for Search Results from RCSB PDB
  • generate_json_query: Generate a JSON Query for RCSB PDB Data Retrieval
  • get_info: Retrieve Information for a Given PDB ID
  • get_fasta_from_rcsb_entry: Retrieve FASTA Sequence from PDB Entry or Specific Chain
  • fetch_data: Fetch Data from RCSB PDB Using a JSON Query
  • find_papers: Search for and Retrieve Paper Titles from PDB
  • walk_nested_dict: Recursively Walk Through a Nested Dictionary
  • send_api_request: Send API Request to a Specified URL
  • search_graphql: Perform a GraphQL Query to RCSB PDB
  • perform_search: Perform a Search in the RCSB PDB
  • query_search: Search Query Function
  • DefaultOperator: Create a Default Search Operator
  • QueryGroup: Create a Grouped Query Object for RCSB PDB Searches
  • ContainsWordsOperator: Create a Contains Words Search Operator
  • ExistsOperator: Create an Existence Search Operator
  • ComparisonOperator: Create a Comparison Search Operator
  • QueryNode: Create a Query Node for RCSB PDB Searches
  • InOperator: Create an Inclusion Search Operator
  • ContainsPhraseOperator: Create a Contains Phrase Search Operator
  • ChemicalOperator: Create a Chemical Search Operator for SMILES/InChI Descriptors
  • ExactMatchOperator: Create an Exact Match Search Operator
  • ScoredResult: Create a Scored Result Object for PDB Searches
  • data_fetcher: Fetch RCSB PDB Data Based on Specified Criteria
  • RangeOperator: Create a Range Search Operator
  • RequestOptions: Define Request Options for RCSB PDB Search Queries
  • add_property: Add or Merge Properties for RCSB PDB Data Fetching
  • autoresolve_sequence_type: Automatically Determine the Sequence Type