On predicting foldability of a protein from its sequence
Abstract:
Several properties of amino acid sequences corresponding to proteins
that are known to fold are compared to those of randomly generated
sequences and to sequences of intrinsically disordered proteins
in order to find properties that distinguish folding sequences
from the rest.
The properties studied include helix and sheet propensities
from secondary structure prediction, adjacency correlations,
directionality correlations, as well as propensities of
all possible triplets and quadruplets.
Small differences between known folded and random sequences
were observed in the adjacency and directionality correlations,
and significant differences were seen in the triplet and
especially the quadruplet propensities.
Based on the differences in the adjacency, triplet, or quadruplet
propensities, folding scores were defined and used to test
the accuracy of foldability prediction based on these statistics.
The best predictions were obtained from the quadruplet propensities.
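
As an illustration only, a propensity-based folding score of this
kind might be sketched as follows. The abstract does not give the
exact definition, so everything here is an assumption: quadruplets
are taken to be contiguous four-residue windows, the score is taken
to be the mean log-odds of a sequence's quadruplets in a folded set
versus a reference set, and the function names and the pseudocount
are hypothetical.

```python
from collections import Counter
import math


def kmer_counts(sequences, k):
    """Count every contiguous k-residue window across the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def log_odds_table(fold_seqs, ref_seqs, k=4, pseudocount=1.0):
    """Return a function mapping a k-mer to log(P_fold / P_ref).

    Assumption: a flat pseudocount over all 20**k possible k-mers
    keeps unseen k-mers from producing log(0).
    """
    fold = kmer_counts(fold_seqs, k)
    ref = kmer_counts(ref_seqs, k)
    n_types = 20 ** k
    fold_total = sum(fold.values()) + pseudocount * n_types
    ref_total = sum(ref.values()) + pseudocount * n_types

    def log_odds(kmer):
        p_fold = (fold[kmer] + pseudocount) / fold_total
        p_ref = (ref[kmer] + pseudocount) / ref_total
        return math.log(p_fold / p_ref)

    return log_odds


def folding_score(seq, log_odds, k=4):
    """Mean log-odds over all k-mers of seq (0.0 if seq is too short)."""
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    return sum(log_odds(seq[i:i + k]) for i in range(n)) / n


# Toy usage with made-up sequences (not data from the study).
score = log_odds_table(["ACDEACDEACDE"], ["WWWWWWWWWWWW"], k=4)
print(folding_score("ACDEACDE", score) > 0)  # resembles the "folded" set
print(folding_score("WWWWWWWW", score) < 0)  # resembles the reference set
```

Under these assumptions, a positive score means that the
quadruplets of a sequence are, on average, more frequent in the
folded set than in the reference set; the same sketch with k=3
would give a triplet-based score.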