LFM (version 0.3.0)

protein: Protein Secondary Structure Data

Description

This dataset contains protein sequences and their corresponding secondary structures, including beta-sheets (E), helices (H), and coils (_).

Usage

protein

Format

A data frame of protein sequences and their secondary structures, with the following columns:

  • Sequence: Amino acid sequence (using 3-letter codes).

  • Structure: Secondary structure of the protein (E for beta-sheet, H for helix, _ for coil).

  • Parameters: Additional parameters for neural networks (to be ignored).

  • Biophysical_Constants: Biophysical constants (to be ignored).
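As a minimal sketch of working with the columns listed above, the following keeps only Sequence and Structure and tabulates the structure labels. It runs on a small made-up data frame (`toy`, a hypothetical stand-in) rather than on `protein` itself, because the actual row layout and column types of the dataset are not documented here; only the column names are taken from this page.

```r
# Toy stand-in using the documented column names; all values are invented.
toy <- data.frame(
  Sequence = c("Gly", "Val", "Ala", "Leu"),
  Structure = c("_", "E", "H", "H"),
  Parameters = c(1, 2, 3, 4),
  Biophysical_Constants = c(0.1, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)

# Keep only the two columns that are not marked "to be ignored".
keep <- c("Sequence", "Structure")
toy_clean <- toy[, intersect(keep, names(toy)), drop = FALSE]

# Count how often each secondary-structure label occurs (E, H, _).
structure_counts <- table(factor(toy_clean$Structure, levels = c("E", "H", "_")))
print(structure_counts)
```

If the real dataset uses these column names, replacing `toy` with `protein` should give the same kind of summary; check `str(protein)` first.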

Details

The dataset is intended for predicting protein secondary structure from amino acid sequences. The first few numbers in each sequence are neural-network parameters and should be ignored. The '<' symbol separates proteins and marks the beginning and end of each sequence.
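The role of the '<' spacer can be illustrated with a short sketch. The vectors `residue` and `label` below are hypothetical and hand-written for illustration; they assume one residue and one structure label per element, with '<' appearing in both at protein boundaries. The real dataset may be laid out differently, so treat this as one possible approach, not the package's own method.

```r
# Hypothetical raw records: "<" marks the boundaries between proteins.
residue <- c("<", "Gly", "Val", "Ala", "<", "Leu", "Ser", "<")
label   <- c("<", "_",   "E",   "H",   "<", "H",   "_",   "<")

is_spacer <- residue == "<"

# Number the proteins by counting how many spacers have been seen so far.
protein_id <- cumsum(is_spacer)

# Drop the spacer elements, then group residues and labels by protein.
sequences  <- split(residue[!is_spacer], protein_id[!is_spacer])
structures <- split(label[!is_spacer], protein_id[!is_spacer])

# Collapse each protein's labels into one string for a quick look.
structure_strings <- vapply(structures, paste, character(1), collapse = "")
print(structure_strings)
```

Using `cumsum()` on the spacer flags avoids an explicit loop: every element between two spacers receives the same group number, which `split()` then uses as the grouping key.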

Examples

# Load the dataset
data(protein)

# Print the first few rows of the dataset
print(head(protein))
