Finding Patterns and Motifs in DNA or Protein Sequences
Using the Pattern Finder tool, you can rapidly find not only sequences identical to your query, but also near matches. The Pattern Finder is flexible enough that you can even use it to identify similar genes in a genome or genome section. The tool can also automatically search all six reading frames of a DNA sequence for an amino acid query.
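As an illustration of what a six-frame search involves, the following Python sketch translates a DNA sequence in three frames on each strand and looks for an exact amino acid query in each translation. It is not the Pattern Finder's own code: the function names are hypothetical, the standard genetic code is assumed, and only exact peptide matches are reported.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; codons not in the table become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_search(dna, peptide):
    """Return (strand, frame, protein_position) for each exact peptide hit.

    Hypothetical helper: frame is 1, 2 or 3, and protein_position is the
    0-based index of the hit within that frame's translation.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            protein = translate(seq[offset:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, offset + 1, pos))
                pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    print(six_frame_search("CCATGGCCAAATAG", "MAK"))
```

Here the query MAK is found only in the third forward frame, because the coding sequence ATGGCCAAA starts two bases into the target.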
Using the Pattern Finder
To search for a pattern, open the Pattern Finder tool and enter your query into the box. If the target sequence is short enough for the speed of your computer (less than 50,000 bp on a Pentium III 450), the Find button is pushed in automatically, and the search results are calculated in real time and displayed instantly. For longer target sequences (or slower computers), you must push the Find button to execute the search. The match slider sets the threshold percentage match, i.e. the percentage of the query sequence that must match the target. When searching for an amino acid sequence, a similarity matrix (BLOSUM62) is used to take into account the degree of chemical similarity between the different amino acids.
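The idea of a threshold percentage match can be sketched in a few lines of Python. This is a simplified illustration, not the Pattern Finder's algorithm: it assumes an ungapped sliding-window comparison that counts identical positions only, and it does not apply BLOSUM62 scoring. The function name and the default threshold are hypothetical.

```python
def find_near_matches(query, target, threshold=80.0):
    """Return (start, matched_text, percent) for windows at or above threshold.

    Assumption: percentage match = identical positions / query length * 100,
    with no gaps. Results are sorted by percentage match, highest first.
    """
    query, target = query.upper(), target.upper()
    n = len(query)
    if n == 0:
        return []
    hits = []
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        identical = sum(q == t for q, t in zip(query, window))
        percent = 100.0 * identical / n
        if percent >= threshold:
            hits.append((start, window, percent))
    # Stable sort: hits with equal percentages stay in positional order.
    hits.sort(key=lambda hit: -hit[2])
    return hits


if __name__ == "__main__":
    for start, text, percent in find_near_matches(
        "GATTACA", "TTGATTACATTGATAACA", threshold=80.0
    ):
        print(f"{start:>3}  {text}  {percent:.1f}%")
```

With an 80% threshold, the example reports the exact match at position 2 (100%) and the single-mismatch match GATAACA at position 11 (6 of 7 positions, about 85.7%). Lowering the threshold admits more distant matches, which is the effect of moving the match slider.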
By default, the search results are sorted by percentage match. As with the other tools in Expression, you can sort the results by any of the other fields by clicking on the column heading. Clicking on a record in the search results automatically highlights the corresponding region in the Sequence Editor and the Sequence Map, and aligns the sequence directly under your query.
Note that the Pattern Finder does not provide any specific DNA or protein motifs or signatures of its own. To examine a protein sequence for known protein signatures, use the ProScan tool on the Internet menu. See the Using ProScan to Identify Protein Signatures tutorial for further details.
Searching Sequences with Ambiguous Characters
The Pattern Finder tool can analyse sequences that contain ambiguous characters. The algorithm records an exact match only where the query and target are identical. For example, the query sequence GTATC will not match GTRTC (where R = A or G) in the target sequence. However, the query GTRTC will match GTRTC, GTATC, and GTGTC. For more details about how Expression handles ambiguity in sequences, see the Reverse Translation and Handling Ambiguous Characters tutorial.
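One rule that reproduces the examples above is that a target character matches when every base it stands for is also allowed by the query character at that position. The Python sketch below uses that rule with the standard IUPAC nucleotide codes. The rule itself is an assumption inferred from the GTATC and GTRTC examples, not a description of how Expression is implemented, and the function names are hypothetical.

```python
# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def symbol_matches(query_char, target_char):
    """Assumed rule: the target's bases must all be allowed by the query."""
    return set(IUPAC[target_char]) <= set(IUPAC[query_char])


def matches_at(query, target, start=0):
    """True if query matches target at start under the assumed rule."""
    window = target[start:start + len(query)]
    return len(window) == len(query) and all(
        symbol_matches(q, t) for q, t in zip(query, window)
    )


if __name__ == "__main__":
    print(matches_at("GTATC", "GTRTC"))   # query A does not cover target R
    for target in ("GTRTC", "GTATC", "GTGTC"):
        print(target, matches_at("GTRTC", target))
```

Under this rule the comparison is deliberately one-directional: an ambiguous query broadens the search, while an ambiguous target is only matched by a query that is at least as ambiguous at that position.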
Related Articles
Reverse Translation and Handling Ambiguous Characters
Using ProScan to Identify Protein Signatures