Structure: Secondary structure of the protein (E for beta-sheet, H for helix, _ for coil).
Parameters: Additional parameters for neural networks (to be ignored).
Biophysical_Constants: Biophysical constants (to be ignored).
Details
The dataset is used to predict protein secondary structure from amino acid sequences. The first few numbers in each sequence are neural network parameters and should be ignored. The '<' symbol serves as a spacer between proteins and also marks the beginning and end of each sequence.
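The conventions above can be illustrated with a minimal Python sketch. It is not an official loader: the helper names (`split_proteins`, `drop_leading_numbers`, `decode_structure`) are hypothetical, and it assumes that the leading parameters are whitespace-separated numeric tokens and that structure labels arrive as a plain string of E, H and _ characters. Adjust it to the actual layout of the data you load.

```python
# Label meanings taken from the Structure field description.
STRUCTURE_LABELS = {"E": "beta-sheet", "H": "helix", "_": "coil"}


def split_proteins(raw):
    """Split raw text on the '<' spacer and drop empty pieces."""
    return [chunk.strip() for chunk in raw.split("<") if chunk.strip()]


def _is_number(token):
    """Return True if the token parses as a number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def drop_leading_numbers(tokens):
    """Skip the leading numeric tokens (assumed to be the neural network
    parameters that should be ignored) and return the rest."""
    i = 0
    while i < len(tokens) and _is_number(tokens[i]):
        i += 1
    return tokens[i:]


def decode_structure(structure):
    """Map each structure character to its readable label."""
    return [STRUCTURE_LABELS[s] for s in structure]
```

For example, `split_proteins("<HHE_<__EE<")` yields two chunks, and `decode_structure("HE_")` turns one label string into readable names. A character outside E, H and _ raises a `KeyError`, which is a deliberate choice here so that unexpected labels are noticed rather than silently skipped.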