On predicting foldability of a protein from its sequence
Abstract: Several properties of amino acid sequences of proteins that are known to fold are compared with those of randomly generated sequences and of intrinsically disordered proteins, in order to find properties that distinguish folding sequences from the rest. The properties studied include helix and sheet propensities from secondary structure prediction, adjacency correlations, directionality correlations, and the propensities of all possible triplets and quadruplets. Small differences between known folded and random sequences were observed for the adjacency and directionality correlations, and significant differences were seen in the triplet and especially the quadruplet propensities. Folding scores were defined from the differences in the adjacency, triplet, or quadruplet propensities and used to test the accuracy of foldability prediction based on these statistics. The best predictions were obtained from the quadruplet propensities.
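As an illustration of how a propensity-based folding score of this kind might be computed, the following minimal sketch counts overlapping k-residue windows (k = 3 for triplets, k = 4 for quadruplets) in a set of folded sequences and in a reference set, and scores a query sequence by its mean log-odds. The abstract does not give the exact score definition, so the log-odds form, the pseudocount, and the function names `kmer_counts`, `kmer_log_odds`, and `folding_score` are all assumptions made for this example, not the authors' method.

```python
from collections import Counter
from math import log


def kmer_counts(sequences, k):
    """Count every overlapping window of length k across the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_log_odds(folded, reference, k, pseudocount=1.0, alphabet_size=20):
    """Return a function mapping a k-mer to log(P_folded / P_reference).

    Assumption: a pseudocount is added to every possible k-mer so that
    k-mers unseen in one set do not produce a division by zero.
    """
    f = kmer_counts(folded, k)
    r = kmer_counts(reference, k)
    n_types = alphabet_size ** k
    f_total = sum(f.values()) + pseudocount * n_types
    r_total = sum(r.values()) + pseudocount * n_types

    def score(kmer):
        p_folded = (f[kmer] + pseudocount) / f_total
        p_reference = (r[kmer] + pseudocount) / r_total
        return log(p_folded / p_reference)

    return score


def folding_score(seq, score, k):
    """Mean log-odds over all windows of length k in seq (0.0 if too short)."""
    n = len(seq) - k + 1
    if n <= 0:
        return 0.0
    return sum(score(seq[i:i + k]) for i in range(n)) / n


if __name__ == "__main__":
    # Toy data only; real use would need large folded and reference sets.
    folded_set = ["ACDEACDEACDE"]
    reference_set = ["AAAAAAAAAAAA"]
    quad = kmer_log_odds(folded_set, reference_set, k=4)
    print(folding_score("ACDEACDE", quad, k=4))
    print(folding_score("AAAAAAAA", quad, k=4))
```

Under this assumed definition, a positive score means the sequence's quadruplets are more typical of the folded set than of the reference set, and a negative score means the opposite. With 20^4 = 160,000 possible quadruplets, the counts are sparse unless the training sets are large, which is why the sketch smooths with a pseudocount.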