Predicting the Secondary Structure of Your Protein

Expression provides an interface to a large range of sophisticated secondary structure prediction algorithms. This section provides an introduction to the interface and a brief overview of the different algorithms.

Performing a Prediction

Open your target protein sequence, then select the desired algorithm under Protein Structure Prediction in the Internet menu. This opens the appropriate form in the Internet Tool Window. To launch the algorithm, click the Submit button. How long the results take to return depends on the speed of your internet connection, the length of your target sequence, and the complexity of the algorithm.

Secondary Structure Prediction Algorithms

Expression currently supports nine secondary structure prediction algorithms. All computations are performed on the Network Protein Sequence Analysis (NPS@) server (PBIL, France).

DPM

The DPM (Double Prediction Method) algorithm combines two predictions to produce the final result: it first predicts the structural class of the protein and then the secondary structure of the sequence (Deleage and Roux, 1987). The method can be divided into four steps:

Prediction of the structural class of the protein from its amino acid composition (Nakashima et al., 1986).
Preliminary secondary structure estimation from a simple algorithm.
Comparison between the two independent predictions.
Optimisation of parameters and determination of secondary structure.
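
The first step starts from the amino acid composition of the sequence. As a minimal sketch of that input only (not the DPM implementation, and with no class prediction attempted), the composition can be computed as the fraction of each of the 20 standard residues. The function name aa_composition and the example sequence are hypothetical.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard symbols (for example X or gap characters) are ignored.
    This only illustrates the kind of input used in step 1; it does not
    predict a structural class.
    """
    residues = [aa for aa in sequence.upper() if aa in STANDARD_AA]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    total = len(residues)
    # Counter returns 0 for residues that never occur.
    return {aa: counts[aa] / total for aa in STANDARD_AA}


# Hypothetical example sequence, chosen only for illustration.
composition = aa_composition("MKTAYIAKQR")
```

The result is a dictionary with one entry per standard residue whose values sum to 1.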

DSC

DSC (Discrimination of protein Secondary structure Class) divides secondary structure prediction into its basic concepts and then uses simple, linear statistical methods to combine these concepts for prediction (King and Sternberg, 1996). This makes the prediction method comprehensible and allows the relative importance of the different sources of information to be measured.

At NPS@, a BLASTP search of your sequence is performed against the SWISS-PROT database. These results are filtered and then aligned by CLUSTALW. The resulting alignment is the input for DSC.

GORIV

GOR IV is the fourth version of the GOR secondary structure prediction method, which is based on information theory (Garnier et al., 1996). There is no defined decision constant. GOR IV uses all possible pair frequencies within a window of 17 amino acid residues. After cross-validation on a database of 267 proteins, GOR IV has a mean accuracy of 64.4% for a three-state prediction (Q3).
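
To make the Q3 figure and the 17-residue window concrete, here is a small sketch. Q3 is the percentage of residues whose three-state label is predicted correctly. The label letters (H, E, C), the function names, and the padding of windows at the sequence ends with '-' are assumptions made for illustration; this is not GOR IV itself.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label matches.

    Both arguments are strings of equal length using one letter per
    residue (assumed here: H = helix, E = strand, C = coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty structure strings")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def residue_windows(sequence, width=17):
    """Yield (index, window) pairs, one window centred on each residue.

    Ends are padded with '-' so every window has `width` characters;
    the padding is an assumption of this sketch.
    """
    half = width // 2
    padded = "-" * half + sequence + "-" * half
    for i in range(len(sequence)):
        yield i, padded[i:i + width]


# Hypothetical strings: 8 of the 10 labels agree.
q3 = q3_accuracy("HHHHCCEEEC", "HHHCCCEEEE")
windows = list(residue_windows("ACDEF", width=3))
```

A published Q3 such as the 64.4% above is the mean of this kind of per-residue agreement over a whole test set, not the score for a single sequence.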

HNN

The HNN (Hierarchical Neural Network) prediction method can be seen as an improvement on the well-known classifier developed by Qian and Sejnowski, which was derived from the NETtalk system (Guermeur). Like its predecessor, it is made up of two networks: a sequence-to-structure network and a structure-to-structure network. The prediction is therefore based on local information only. The improvements mainly concern two points:

Technical tricks (recurrent connections, shared weights etc.) have been used to increase the context on which the prediction is made and concomitantly decrease by two orders of magnitude the number of parameters (weights).
Physico-chemical data have been explicitly incorporated in the predictors used by the structure-to-structure network.

These modifications have significantly reduced the generalization error.

MLRC

MLRC (Multivariate Linear Regression Combination) is a secondary structure prediction method that combines GOR IV, SIMPA96 and SOPMA (Guermeur et al., 1999). It post-processes the outputs of these secondary structure prediction methods and generates class posterior probability estimates. Experimental results establish that it can increase the recognition rate of methods that provide inhomogeneous scores, even if their individual prediction accuracies differ widely.
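
The idea of combining per-class scores from several methods can be sketched as a weighted linear combination. This is only an illustration: the weights below are hypothetical fixed values, whereas MLRC estimates its parameters by multivariate linear regression, and the clipping and renormalisation used here to obtain values that sum to 1 are assumptions of this sketch. The label letters (H, E, C) and all names are likewise illustrative.

```python
CLASSES = ("H", "E", "C")  # assumed labels: helix, strand, coil

# Hypothetical weights, one per class and per method, in the order
# (GOR IV, SIMPA96, SOPMA). A real combiner would fit these to data.
WEIGHTS = {
    "H": (0.3, 0.3, 0.4),
    "E": (0.2, 0.4, 0.4),
    "C": (0.3, 0.3, 0.4),
}


def combine_scores(method_scores, weights=WEIGHTS):
    """Combine the per-class scores of three methods for one residue.

    `method_scores` is a sequence of three dicts mapping class label to
    score. Returns a dict of non-negative values that sum to 1.
    """
    raw = {
        cls: sum(w * scores[cls] for w, scores in zip(weights[cls], method_scores))
        for cls in CLASSES
    }
    clipped = {cls: max(value, 0.0) for cls, value in raw.items()}
    total = sum(clipped.values())
    if total == 0.0:
        # No evidence for any class: fall back to a uniform estimate.
        return {cls: 1.0 / len(CLASSES) for cls in CLASSES}
    return {cls: value / total for cls, value in clipped.items()}


# Hypothetical scores for a single residue from the three methods.
gor = {"H": 0.6, "E": 0.1, "C": 0.3}
simpa = {"H": 0.5, "E": 0.2, "C": 0.3}
sopma = {"H": 0.7, "E": 0.1, "C": 0.2}

estimate = combine_scores([gor, simpa, sopma])
best_class = max(estimate, key=estimate.get)
```

Taking the class with the largest combined value gives the three-state call for that residue.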

Note: The MLRC algorithm may take several minutes to compute larger sequences (>500 amino acids).

PHD

PHD is a set of neural network systems, each with a sequence-to-structure level and a structure-to-structure level, that predict secondary structure (PHDsec), relative solvent accessibility (PHDacc) and transmembrane helices (PHDhtm) (Rost and Sander, 1993). The NPS@ server uses only PHDsec, which focuses on predicting hydrogen bonds. The procedure runs a BLASTP search with your sequence, filters the results, aligns them with CLUSTALW, and uses the resulting multiple alignment as the input to the neural network. A PHD prediction made this way at NPS@ is more accurate than a PHD prediction from the single sequence alone. However, the procedure is not identical to the one used by PredictProtein, so the result may be slightly less accurate than a PredictProtein prediction.

Note: The PHD algorithm may take several minutes to compute larger sequences (>500 amino acids).

Predator

PREDATOR is a secondary structure prediction method based on recognition of potentially hydrogen-bonded residues in a single amino acid sequence (Frishman and Argos, 1996).

SIMPA96

SIMPA96 is a nearest-neighbor secondary structure prediction method (Levin, 1997). It is based on the homologue method described by Levin et al. (1986).

SOPMA

SOPMA (Self-Optimized Prediction Method with Alignment) is based on the homologue method of Levin et al. (1986). Its improvement over that method is that it takes into account information from an alignment of sequences belonging to the same family (Geourjon and Deleage, 1995).

Note: The SOPMA algorithm may take several minutes to compute larger sequences (>500 amino acids).

References

Deleage G, Roux B (1987). An algorithm for protein secondary structure prediction based on class prediction. Protein Eng, 1(4):289-294.

Frishman D, Argos P (1996). Incorporation of non-local interactions in protein secondary structure prediction from the amino acid sequence. Protein Eng, 9(2):133-142.

Garnier J, Gibrat J-F, Robson B (1996). GOR secondary structure prediction method version IV. Meth Enzymol, 266:540-553.

Guermeur Y. Combinaison de classifieurs statistiques, Application a la prediction de structure secondaire des proteines. PhD Thesis.

King RD, Sternberg MJ (1996). Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Sci, 5(11):2298-2310.

Geourjon C, Deleage G (1995). SOPMA: Significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments. Comput Appl Biosci, 11(6):681-684.

Guermeur Y, Geourjon C, Gallinari P, Deleage G (1999). Improved Performance in Protein Secondary Structure Prediction by Inhomogeneous Score Combination. Bioinformatics, 15(5):413-421.

Levin JM, Robson B, Garnier J (1986). An algorithm for secondary structure determination in proteins based on sequence similarity. FEBS Lett, 205(2):303-308.

Levin JM (1997). Exploring the limits of nearest neighbour secondary structure prediction. Protein Eng, 10(7):771-776.

Nakashima et al. (1986). J Biochem (Tokyo), 99:153-162.

Rost B, Sander C (1993). Prediction of protein secondary structure at better than 70% accuracy. J Mol Biol, 232(2):584-599.
