- peptide_data
a data frame that contains the input columns to this function. If structure
or prediction files should be fetched automatically, please provide column names to the following
arguments: uniprot_id, pdb_id, chain, auth_seq_id,
map_value. If no PDB structure is available for a protein, the pdb_id
and chain
columns should contain NA at these positions. If a structure or prediction file is provided in the
structure_file
argument, this data frame should only contain information associated with
the provided structure. For a user-provided structure, column names should be provided to
the following arguments: uniprot_id, chain, auth_seq_id, map_value.
- uniprot_id
a character column in the peptide_data
data frame that contains UniProt
identifiers for a corresponding peptide, protein region or amino acid.
- pdb_id
a character column in the peptide_data
data frame that contains PDB
identifiers for structures in which a corresponding peptide, protein region or amino acid is found.
If a protein prediction should be fetched from AlphaFold, this column should contain NA. This
column is not required if a structure or prediction file is provided in the structure_file
argument.
- chain
a character column in the peptide_data
data frame that contains the name of
the chain from the PDB structure in which the peptide, protein region or amino acid is found.
If a protein prediction should be fetched from AlphaFold, this column should contain NA. If an
AlphaFold prediction is provided to the structure_file
argument the chain should be
provided as usual (all AlphaFold predictions have only chain A). Important: please provide
the author defined chain definitions for both ".cif" and ".pdb" files. When the output of the
find_peptide_in_structure
function is used as the input for this function, this
corresponds to the auth_asym_id
column.
- auth_seq_id
optional, a character (or numeric) column in the peptide_data
data frame
that contains semicolon separated positions of peptides, protein regions or amino acids in the
corresponding PDB structure or AlphaFold prediction. This information can be obtained from the
find_peptide_in_structure
function. The corresponding column in the output is called
auth_seq_id.
For AlphaFold predictions, UniProt positions should be used. If
single positions and not stretches of amino acids are provided, the column can be numeric and
does not need to contain the semicolon separator.
- map_value
a numeric column in the peptide_data
data frame that contains a value
associated with each peptide, protein region or amino acid. If one start to end position pair
has multiple different map values, the maximum will be used. This value will be displayed as a
colour gradient when mapped onto the structure. The value can for example be the fold change,
p-value or score associated with each peptide, protein region or amino acid (selection). If
the selections should be displayed with just one colour, the value in this column should be
the same for every selection. For the mapping, values are scaled between 50 and 100. Regions
in the structure that are not covered by any selection receive a value of 0. If an amino acid position
is associated with multiple mapped values, e.g. from different peptides, the maximum mapped
value will be displayed.
- file_format
a character vector containing the file format of the structure that will be
fetched from the database for the PDB identifiers provided in the pdb_id
column. This
can be either ".cif" or ".pdb". The default is ".cif".
We recommend using ".cif" files
since every structure is available as a ".cif" file but not every structure is available as a ".pdb" file.
Note that fetching and mapping onto ".cif" files takes longer than for ".pdb" files. If a structure file
is provided in the structure_file
argument, the file format is detected automatically
and does not need to be provided.
- scale_per_structure
a logical value that specifies if scaling should be performed for
each structure independently (TRUE) or over the whole data set (FALSE). The default is TRUE,
which scales the scores of each structure independently so that each structure has a score
range from 50 to 100.
- export_location
optional, a character argument specifying the path to the location in
which the fetched and altered structure files should be saved. If left empty, they will be
saved in the current working directory. The location should be provided in the following
format "folderA/folderB".
- structure_file
optional, a character argument specifying the path to the location and
name of a structure file in ".cif" or ".pdb" format. If a structure is provided the peptide_data
data frame should only contain mapping information for this structure.
- show_progress
a logical value that specifies whether a progress bar will be shown
(default is TRUE).
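
The input layout and the value handling described for auth_seq_id, map_value and scale_per_structure can be illustrated with a small base R sketch. This is not the function's own implementation. The identifiers, positions and values are made up for illustration. The helper rescale_50_100 is hypothetical, a linear min-max rescaling is assumed, the handling of constant input (all values set to 100) is an assumption, and grouping by uniprot_id stands in for "per structure".

```r
# Illustrative input in the layout described above (all values are made up).
peptide_data <- data.frame(
  uniprot_id  = c("P0A8T7", "P0A8T7", "P60906"),
  pdb_id      = c("6UU2", "6UU2", NA),   # NA: no PDB structure, use a prediction
  chain       = c("A", "A", NA),
  auth_seq_id = c("1160;1161;1162", "1162;1163", "10;11;12"),
  map_value   = c(2.5, 8.0, 4.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: linear rescaling to the range 50 to 100.
# Assumption: constant input is set to 100.
rescale_50_100 <- function(x) {
  rng <- range(x)
  if (rng[1] == rng[2]) {
    return(rep(100, length(x)))
  }
  (x - rng[1]) / (rng[2] - rng[1]) * 50 + 50
}

# Split the semicolon-separated positions into one row per residue.
positions <- strsplit(as.character(peptide_data$auth_seq_id), ";", fixed = TRUE)
per_residue <- data.frame(
  uniprot_id = rep(peptide_data$uniprot_id, lengths(positions)),
  position   = as.integer(unlist(positions)),
  map_value  = rep(peptide_data$map_value, lengths(positions)),
  stringsAsFactors = FALSE
)

# A residue covered by several peptides keeps the maximum value.
per_residue <- aggregate(map_value ~ uniprot_id + position,
                         data = per_residue, FUN = max)

# Scaling within each group (analogous to scale_per_structure = TRUE) ...
per_residue$scaled_per_structure <- ave(per_residue$map_value,
                                        per_residue$uniprot_id,
                                        FUN = rescale_50_100)
# ... versus scaling over the whole data set (analogous to FALSE).
per_residue$scaled_overall <- rescale_50_100(per_residue$map_value)

print(per_residue)
```

In this sketch, position 1162 is covered by two peptides and therefore keeps the larger of the two values. With per-group scaling each group spans the full 50 to 100 range on its own, whereas scaling over the whole data set keeps values comparable between groups.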