Predicting the Secondary Structure of Your Protein
Expression provides an interface to a wide range of sophisticated secondary structure
prediction algorithms. This section introduces
the interface and gives a brief overview of the different algorithms.
Performing a Prediction
Open your target protein sequence, then select the desired algorithm under Protein Structure Prediction in the Internet menu. This opens the appropriate form in the Internet Tool Window. To launch the algorithm, click the Submit button. The time it takes for your results to be returned depends on the speed of your internet connection, the length of your target sequence, and the complexity of the selected algorithm.
Secondary Structure Prediction Algorithms
Expression presently supports nine different secondary structure prediction algorithms. All computations are performed via the Network Protein Sequence Analysis (NPS@) server (PBIL, France).
The DPM (Double Prediction Method) algorithm uses two approaches to produce the final result: it first predicts the structural class of the protein and then the secondary structure of the sequence (Deleage and Roux, 1987). The DPM method can be divided into four steps:
- Prediction of the structural class of a protein from its amino acid composition (Nakashima et al., 1986).
- Preliminary secondary structure estimation from a simple algorithm.
- Comparison between the two independent predictions.
- Optimisation of parameters and determination of secondary structure.
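The first step above starts from the amino acid composition of the sequence. As a minimal sketch, assuming that composition means the fraction of each of the 20 standard residues, it could be computed as follows. The function name aa_composition and the example sequence are hypothetical, and the sketch does not reproduce the structural class predictor of Nakashima et al. (1986) or any part of the DPM implementation.

```python
from collections import Counter

# The 20 standard amino acids as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in the sequence.

    Hypothetical helper for illustration only; non-standard characters
    (gaps, X, etc.) are ignored.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Example with a made-up 10-residue sequence.
composition = aa_composition("MKTAYIAKQR")
print(composition["A"], composition["K"])
```

The resulting 20-value vector is the kind of input a composition-based class predictor would work from.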
DSC (Discrimination of protein Secondary structure Class) is based on dividing secondary structure prediction into its basic concepts and then using simple, linear statistical methods to combine these concepts for prediction (King and Sternberg, 1996). This makes the prediction method comprehensible and allows the relative importance of the different sources of information to be measured.
At NPS@, a BLASTP search of your sequence is performed against the SWISS-PROT database. These results are filtered and then aligned by CLUSTALW. The resulting alignment is the input for DSC.
GOR IV is the fourth version of the GOR secondary structure prediction method, which is based on information theory (Garnier et al., 1996). There is no defined decision constant. GOR IV uses all possible pair frequencies within a window of 17 amino acid residues. After cross-validation on a database of 267 proteins, GOR IV has a mean accuracy of 64.4% for a three-state prediction (Q3).
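To make the two quantities in this paragraph concrete, the sketch below shows how a 17-residue window and a Q3 score can be computed. It assumes that the three states are written as H (helix), E (strand) and C (coil), and that the window is centred on the residue being predicted. The function names are hypothetical, and only full windows are produced, so the handling of sequence ends here is a simplification and not a description of GOR IV itself.

```python
def windows(sequence, size=17):
    """Yield (centre_index, window) for every full window of `size` residues.

    Residues closer than size // 2 to either end are skipped in this sketch.
    """
    half = size // 2
    for centre in range(half, len(sequence) - half):
        yield centre, sequence[centre - half:centre + half + 1]


def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Made-up example: 7 of 8 residues agree.
print(q3_accuracy("HHHHCCEE", "HHHCCCEE"))
```

A Q3 of 64.4% therefore means that, on average, roughly two residues in three receive the correct state.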
The HNN (Hierarchical Neural Network) prediction method can be seen as an improvement on the famous classifier developed by Qian and Sejnowski, which was itself derived from the NETtalk system (Guermeur). Like its predecessor, it is made up of two networks: a sequence-to-structure network and a structure-to-structure network. The prediction is thus based only on local information. The improvements mainly concern two points:
- Technical tricks (recurrent connections, shared weights etc.) have been used to increase the context on which the prediction is made and, at the same time, to reduce the number of parameters (weights) by two orders of magnitude.
- Physico-chemical data have been explicitly incorporated in the predictors used by the structure-to-structure network.
These modifications have significantly reduced the generalization error.
MLRC (Multivariate Linear Regression Combination) is a secondary structure prediction method that combines GOR IV, SIMPA96 and SOPMA (Guermeur et al., 1999). It post-processes the outputs of these protein secondary structure prediction methods and generates class posterior probability estimates. Experimental results establish that it can increase the recognition rate of methods that provide inhomogeneous scores, even if their individual prediction success rates differ widely.
Note: The MLRC algorithm may take several minutes to compute larger sequences (>500 amino acids).
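The general idea of combining per-residue scores from several methods can be illustrated with a simple weighted sum. This is only a sketch under stated assumptions: each method is assumed to return one (helix, strand, coil) score triple per residue, the weights are invented for the example, and a single weight per method is used. It is not the published MLRC model, which fits its coefficients by multivariate linear regression and produces posterior probability estimates.

```python
STATES = "HEC"  # assumed state order: helix, strand, coil


def combine_scores(method_scores, weights):
    """Combine per-residue (H, E, C) scores from several methods.

    method_scores: one list per method, each holding one 3-tuple per residue.
    weights: one hypothetical weight per method.
    Returns the state with the highest combined score for each residue.
    """
    if len(method_scores) != len(weights):
        raise ValueError("need exactly one weight per method")
    n_residues = len(method_scores[0])
    if any(len(scores) != n_residues for scores in method_scores):
        raise ValueError("all methods must score the same number of residues")
    prediction = []
    for i in range(n_residues):
        combined = [
            sum(w * scores[i][k] for w, scores in zip(weights, method_scores))
            for k in range(len(STATES))
        ]
        prediction.append(STATES[combined.index(max(combined))])
    return "".join(prediction)


# Made-up scores for a two-residue sequence from three methods.
gor = [(0.6, 0.1, 0.3), (0.2, 0.5, 0.3)]
simpa = [(0.5, 0.2, 0.3), (0.3, 0.3, 0.4)]
sopma = [(0.7, 0.1, 0.2), (0.1, 0.6, 0.3)]
print(combine_scores([gor, simpa, sopma], [0.3, 0.3, 0.4]))
```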
PHD is a set of neural network systems (each with a sequence-to-structure level and a structure-to-structure level) that predict secondary structure (PHDsec), relative solvent accessibility (PHDacc) and transmembrane helices (PHDhtm) (Rost and Sander, 1993). The NPS@ server only uses PHDsec. PHDsec focuses on predicting hydrogen bonds. The procedure essentially involves running a BLASTP search of your sequence, filtering the results and aligning them with CLUSTALW, then using the multiple alignment as the input to the neural network. A PHD prediction run through NPS@ is more accurate than a PHD prediction made on the single sequence alone. However, it is not identical to the PredictProtein prediction and may be slightly less accurate.
Note: The PHD algorithm may take several minutes to compute larger sequences (>500 amino acids).
PREDATOR is a secondary structure prediction method based on recognition of potentially hydrogen-bonded residues in a single amino acid sequence (Frishman and Argos, 1996).
SIMPA96 is a nearest neighbor secondary structure prediction method (Levin, 1997). It is based on the homologue method described by Levin et al. (1986).
SOPMA (Self-Optimized Prediction Method with Alignment) is also based on the homologue method of Levin et al. (1986). Its improvement is that it takes into account information from an alignment of sequences belonging to the same family (Geourjon and Deleage, 1995).
Note: The SOPMA algorithm may take several minutes to compute larger sequences (>500 amino acids).
References
Deleage G, Roux B (1987). An algorithm for protein secondary structure prediction based on class prediction. Protein Eng, 1(4):289-294.
Frishman D, Argos P (1996). Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng, 9(2):133-142.
Garnier J, Gibrat J-F, Robson B (1996). GOR secondary structure prediction method version IV. Meth Enzymol, 266:540-553.
Geourjon C, Deleage G (1995). SOPMA: Significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Comput Appl Biosci, 11(6):681-684.
Guermeur Y. Combinaison de classifieurs statistiques, Application a la prediction de structure secondaire des proteines. PhD Thesis.
Guermeur Y, Geourjon C, Gallinari P, Deleage G (1999). Improved Performance in Protein Secondary Structure Prediction by Inhomogeneous Score Combination. Bioinformatics, 15(5):413-421.
King RD, Sternberg MJ (1996). Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Sci, 5(11):2298-2310.
Levin JM, Robson B, Garnier J (1986). An algorithm for secondary structure determination in proteins based on sequence similarity. FEBS Lett, 205(2):303-308.
Levin JM (1997). Exploring the limits of nearest neighbour secondary structure prediction. Protein Eng, 10(7):771-776.
Nakashima et al. (1986). J Biochem (Tokyo), 99:153-162.
Rost B, Sander C (1993). Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol, 232(2):584-599.