Annotating Your Sequences
The Sequence Editor in Expression allows you to fully annotate your sequences with extra information. This section guides you through the key aspects of the Sequence Editor and how to get the most out of sequence annotation.
Why Annotate?
Annotating your sequences has a number of advantages:
- You can mark out important regions in your sequences, such as mutations, cut sites, start and stop signals, transcription factor binding sites, and probe or primer binding sites.
- A graphical map of your annotated sequence is generated on the fly, giving you an instant overview of the important aspects of your sequence.
- An annotated sequence can be navigated using the Sequence Map, allowing you to instantly jump to the regions you're interested in.
- You can do away with cumbersome line numbers in your sequences, and with counting out residues to find a particular spot.
- Expression understands a number of special tags in annotated sequences, which allow the software to perform tasks for you automatically, such as translating coding regions and assembling exons into complete genes!
How to Annotate
Sequences can be annotated in Expression in three primary ways:
- Importing sequence data from GenBank using the NCBI Query tool from the Internet menu.
- Selecting Annotate while using the Primer Designer, Enzyme Digestor or ORF Mapper tools.
- Entering annotations yourself in the Sequence Editor.
Annotation Notation
Annotating your sequences is really simple and just requires knowing a little notation. The annotations, or tags as we will refer to them, are very similar to those used in HTML (the language used to format web pages). If you're already familiar with HTML, you will find sequence tagging very natural.
Tags are distinguished from your sequence by < and > symbols, and the text between the symbols is referred to as the tag name. Tagged regions are automatically coloured blue. eg
GATCGAT<RBS>CAGCTACATGCGATCGATCGATGCTAGCTAGCTAGCATATGCTAGCATTGATAGCATGCATTGAGCACGATAGCTAGCTGAGCTAGTCGAAT
To define a region, you simply repeat the same tag at the end of the region with a / character in front of the tag name. We refer to the tags around a region as the start tag and the end tag. eg
GATCGAT<RBS>GAGGTG</RBS>CAGCTACATGCGATCGATCGATGCTAGCTAGCTAGCATATGCTAGCATTGATAGCATGCATTGAGCACGATAGCTAGCTGAGCTAGTCGAAT
As soon as a tag is made, it is drawn in real time on the Sequence Map, so the Sequence Editor and the graphical map always correspond directly. Single tags are represented on the map as a point, and a pair of tags flanking a region is shown as a purple bar. What's more, clicking on a label on the Sequence Map automatically takes you to that region in the Sequence Editor, which is very useful for finding your way around long sequences.
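To make the start/end notation concrete, here is a minimal Python sketch of how a tagged sequence could be split into plain residues plus a list of features, each either a point (a single tag) or a region (a start/end pair), mirroring how the Sequence Map draws them. It is not Expression's own code: the `Feature` class and `parse_tags` function are hypothetical, and the sketch assumes tag names contain no spaces, comments or + / - prefixes.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical tag pattern: '<', an optional '/', a name without spaces, '>'.
TAG_RE = re.compile(r"<(/?)([^<>/\s]+)>")


@dataclass
class Feature:
    name: str
    start: int                 # 0-based residue position where the tag sits
    end: Optional[int] = None  # exclusive end position; None for a point tag


def parse_tags(tagged: str) -> tuple[str, list[Feature]]:
    """Strip tags from a tagged sequence, returning (residues, features)."""
    pieces: list[str] = []
    features: list[Feature] = []
    open_tags: dict[str, Feature] = {}
    last = 0  # index in `tagged` just after the previous tag
    pos = 0   # number of residues seen so far
    for m in TAG_RE.finditer(tagged):
        chunk = tagged[last:m.start()]
        pieces.append(chunk)
        pos += len(chunk)
        last = m.end()
        is_end, name = m.group(1) == "/", m.group(2)
        if is_end:
            # Close the most recent start tag with the same name.
            if name in open_tags:
                open_tags.pop(name).end = pos
        else:
            feature = Feature(name, pos)
            features.append(feature)
            open_tags[name] = feature
    pieces.append(tagged[last:])
    return "".join(pieces), features


residues, features = parse_tags("GATCGAT<RBS>GAGGTG</RBS>CAGCTAC")
for f in features:
    kind = "point" if f.end is None else f"region {residues[f.start:f.end]}"
    print(f.name, f.start, kind)  # RBS 7 region GAGGTG
```

A start tag that is never closed simply stays a point, which matches the way a single tag appears on the map.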
For longer descriptions, you can also add comments to your tags. A comment can go in either the start tag or the end tag; you don't need it in both. Comments are added in quote marks after the tag name. eg
GATCGAT<RBS >GAGGTG</RBS>CAGCTACATGCGATCGATCGATGCTAGCTAGCTAGCATATGCTAGCATTGATAGCATGCATTGAGCACGATAGCTAGCTGAGCTAGTCGAAT
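The comment syntax can be handled by extending a tag pattern with an optional quoted string. The Python sketch below assumes comments are wrapped in single quotes, as listed for the Comment Marker in the summary at the end of this section; `read_tag`, `tag_comments` and the example comment text are hypothetical.

```python
import re
from typing import Optional

# Hypothetical pattern: '<', optional '/', a tag name, then an optional
# comment in single quotes, then '>'. Whitespace around the comment is allowed.
TAG_RE = re.compile(r"<(/?)([^<>\s'/]+)\s*(?:'([^']*)')?\s*>")


def read_tag(tag: str) -> tuple[bool, str, Optional[str]]:
    """Split one tag into (is_end_tag, name, comment or None)."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        raise ValueError(f"not a tag: {tag!r}")
    return m.group(1) == "/", m.group(2), m.group(3)


def tag_comments(tagged: str) -> dict[str, str]:
    """Map each tag name to its comment, taken from its start or end tag."""
    comments: dict[str, str] = {}
    for m in TAG_RE.finditer(tagged):
        if m.group(3) is not None:
            comments[m.group(2)] = m.group(3)
    return comments


print(read_tag("<RBS 'ribosome binding site'>"))
print(tag_comments("GATCGAT<RBS>GAGGTG</RBS 'ribosome binding site'>CAGC"))
```

Because the comment may sit in either tag, `tag_comments` keys the result by tag name rather than by which tag carried it.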
Pretty simple so far, right? Where Expression gets really clever is in its use of some other special tags, which help the program understand the meaning of your sequence. The first of these are the + and - prefixes, which tell Expression which regions are protein-coding and whether they are on the sense (+) or antisense (-) strand. eg
GATCGAT<RBS >GAGGTG</RBS>CAGCTAC<+CDS>ATGCGATCGATCGATGCTAGCTAGCTAGCATATGCTAGCATTGATAGCATGCATTGA</CDS>GCACGATAGCTAGCTGAGCTAGTCGAAT
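A strand prefix changes how a region should be read. The Python sketch below (hypothetical names `coding_regions` and `coding_sequence`, not Expression's implementation) collects regions whose start tag carries a + or - prefix, matches them to end tags such as `</CDS>` by name, and returns a - region as the reverse complement of the bases written in the sequence. It assumes the tagged text is the sense strand.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: optional '/', optional '+'/'-' prefix, a tag name,
# and an optional single-quoted comment (ignored here).
TAG_RE = re.compile(r"<(/?)([+-]?)([^<>\s'/+-][^<>\s'/]*)\s*(?:'[^']*')?\s*>")
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass
class CodingRegion:
    name: str
    strand: str  # '+' (sense) or '-' (antisense)
    start: int   # 0-based start in the untagged sequence
    end: int     # exclusive end


def coding_regions(tagged: str) -> tuple[str, list[CodingRegion]]:
    """Strip tags and return (residues, regions opened with a + or - prefix)."""
    pieces: list[str] = []
    regions: list[CodingRegion] = []
    open_tags: dict[str, tuple[str, int]] = {}  # name -> (strand, start)
    last = pos = 0
    for m in TAG_RE.finditer(tagged):
        chunk = tagged[last:m.start()]
        pieces.append(chunk)
        pos += len(chunk)
        last = m.end()
        is_end, strand, name = m.group(1) == "/", m.group(2), m.group(3)
        if is_end and name in open_tags:
            # End tags such as </CDS> carry no prefix; match them by name.
            start_strand, start = open_tags.pop(name)
            regions.append(CodingRegion(name, start_strand, start, pos))
        elif not is_end and strand:
            open_tags[name] = (strand, pos)
    pieces.append(tagged[last:])
    return "".join(pieces), regions


def coding_sequence(residues: str, region: CodingRegion) -> str:
    """Return the region read 5' to 3' on its own strand."""
    bases = residues[region.start:region.end]
    if region.strand == "-":
        # Antisense: reverse complement of the tagged (sense) strand.
        bases = bases.translate(COMPLEMENT)[::-1]
    return bases


residues, regions = coding_regions("CAGCTAC<+CDS>ATGGCCTAA</CDS>GC")
print(coding_sequence(residues, regions[0]))  # ATGGCCTAA
residues, regions = coding_regions("AA<-CDS>TTAGGCCAT</CDS>TT")
print(coding_sequence(residues, regions[0]))  # ATGGCCTAA
```

Tags without a prefix, like `<RBS>`, are skipped here because they mark regions but say nothing about coding.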
Coding regions are represented on the Sequence Map as arrows, with their direction showing which way the sequence would be transcribed. When you click on the label of a coding region on the Sequence Map, the corresponding sequence is automatically translated, and the resulting amino acid sequence is shown in a panel at the bottom of the Sequence Editor.
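Translating a coding region amounts to reading it three bases at a time through the genetic code. The Python sketch below uses the standard genetic code (NCBI translation table 1) and assumes the region has already been extracted 5' to 3' on its own strand; whether Expression offers other genetic codes or stops at the first stop codon is not covered here, and `translate` is a hypothetical helper.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; '*' marks a stop codon."""
    cds = cds.upper()
    codons = (cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    # Codons containing anything other than A, C, G, T become 'X';
    # a trailing partial codon is ignored.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("ATGGCCTAA"))  # MA*
```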
The other special tag recognised by Expression is exon. When you select a label associated with an exon tag on the Sequence Map, Expression intelligently finds the other associated exons and provides the complete translation for the whole gene. eg
ACGTGACTAGCTAGC<+exon1>ATGCATGCATGCAGCATCGATCGATCGATCGATGCATCGAGCATGCAGCTAG</exon1>CTAGCTAGTCAGCTAGT<+exon2>CAGTAGCTAGCTAGCTAGCTAGAGCATCACACGACGACTAGCACACGATCGACGACTATCGACGACGACAGCATCGACATCAGCAGCAGACGACGATCGCACGACGATCGACAGCTAGC</exon2>TAGCATC
As explained above, the + prefix in this example denotes that the exons are on the sense strand. If you're examining a sequence with multiple genes, each consisting of multiple exons, the different groups of exons are distinguished by a letter following the exon number. For example, the tag names exon1a, exon2a, exon3a, exon1b and exon2b would be interpreted as two genes: the first with three exons and the second with two.
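The grouping rule can be sketched in Python as follows. This is a hypothetical illustration rather than Expression's own logic: `splice_genes` assumes each exon's tag name, strand and bases have already been read from the tagged sequence, and that each minus-strand exon is read as its reverse complement before the exons are joined in exon-number order.

```python
import re
from collections import defaultdict

# Hypothetical exon tag name: 'exon', the exon number, an optional gene letter.
EXON_RE = re.compile(r"exon(\d+)([a-z]?)", re.IGNORECASE)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def splice_genes(exons: list[tuple[str, str, str]]) -> dict[str, str]:
    """Group exon regions into genes and join each gene's exons in order.

    `exons` holds (tag_name, strand, bases) for every exon-tagged region,
    with bases as written in the sequence (the sense strand).
    Returns {gene_letter: spliced coding sequence}; '' is used when the
    exon names carry no letter.
    """
    genes: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for name, strand, bases in exons:
        m = EXON_RE.fullmatch(name)
        if m is None:
            continue  # not an exon tag
        number, letter = int(m.group(1)), m.group(2).lower()
        if strand == "-":
            # Antisense exon: read it as the reverse complement.
            bases = bases.translate(COMPLEMENT)[::-1]
        genes[letter].append((number, bases))
    # Sorting by exon number puts exon1 first, whichever strand it lies on.
    return {
        letter: "".join(bases for _, bases in sorted(parts))
        for letter, parts in genes.items()
    }


genes = splice_genes([
    ("exon2a", "+", "GCCTAA"),
    ("exon1a", "+", "ATG"),
    ("exon1b", "+", "ATGTGG"),
    ("exon2b", "+", "TGA"),
])
print(genes)  # {'a': 'ATGGCCTAA', 'b': 'ATGTGGTGA'}
```

Each spliced sequence could then be translated codon by codon, just as for a single coding region.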
Summary
The following is a brief summary of the notation used by Expression to annotate your sequences:
Symbol | Name | Function |
< > | Tag Marker | Encapsulates tag names and comments. |
/ | End Prefix | Used to mark the end of a sequence region. |
' ' | Comment Marker | Encapsulates additional comments within your tag. |
+ | Plus Coding Prefix | Denotes that the region is protein-coding on the plus (sense) strand. |
- | Minus Coding Prefix | Denotes that the region is protein-coding on the minus (anti-sense) strand. |
exonN | Exon Tag | Denotes that the region is the Nth exon in a series of exons; an optional letter after N (eg exon1a) identifies which gene the exon belongs to. Requires a + or - prefix. |
Related Articles
Using the Sequence Map
Importing sequences from GenBank and other sources