"MISMATCH.DOC" follows

version 2 April 1991

This file, MISMATCH.DOC, contains user information for the BASIC program 
MISMATCH.BAS.

BACKGROUND.  This program automates the design of "Directed Mismatch" 
experiments.  In this technique (probably first used by Haliassos et al. 
(1989) Nucleic Acids Res. 17:3606) the user wants to distinguish the wild 
type allele from a common mutant allele of a gene by restriction enzyme 
digestion, even though the two sequences do not themselves differ by the 
presence vs. absence of a restriction site.  The user designs a PCR primer 
containing a single mismatch at a position near the mutation, such that the 
PCR products of the wild type and mutant genes differ by a restriction 
site.  Since the mismatched primer functions poorly in the first few 
cycles, the PCR reaction is usually run a few cycles longer than a reaction 
with correctly matched primers.  Many diagnostic laboratories prefer the 
Directed Mismatch technique to the Allele Specific Oligonucleotide 
hybridization method for allelotyping cystic fibrosis genes (A. Beaudet, 
R. Fenwick et al., personal communication).  Many published uses of the 
Directed Mismatch technique involve allelotyping at the various RAS genes.
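
The search that the technique calls for can be illustrated with a short 
sketch.  The Python below is not part of MISMATCH.BAS; it is a minimal 
illustration under simplifying assumptions: the recognition site contains 
only plain A, C, G, T (no ambiguity codes), the wild type and mutant bases 
differ, and the function name and example sequence are hypothetical.  For 
each way of laying the site over the mutation position, it requires the 
site base over the mutation to equal the wild type base, and it accepts 
the alignment if at most one position disagrees with the gene sequence.

```python
def find_alignments(seq, mutpos, wt, site):
    """Return a list of (start, mismatch_positions) for one recognition site.

    seq     wild type sequence (upper case, no blanks)
    mutpos  1-based position of the mutation in seq
    wt      wild type base at mutpos
    site    recognition site, plain A/C/G/T only (assumption of this sketch)
    All returned positions are 1-based positions in seq.
    """
    hits = []
    for insite in range(1, len(site) + 1):
        # The site base lying over the mutation must match the wild type
        # base (and therefore not the mutant base).
        if site[insite - 1] != wt:
            continue
        start = mutpos - insite + 1
        # Skip alignments that run off either end of the sequence.
        if start < 1 or start + len(site) - 1 > len(seq):
            continue
        window = seq[start - 1:start - 1 + len(site)]
        misses = [start + i for i, (s, g) in enumerate(zip(site, window))
                  if s != g]
        # Keep alignments needing zero or one primer mismatch.
        if len(misses) <= 1:
            hits.append((start, misses))
    return hits


# Hypothetical example: wild type C at position 10, tested against GAATTC.
print(find_alignments("ACGTGACTTCAGCA", 10, "C", "GAATTC"))
```

In this made-up example the only hit starts at position 5 with one 
disagreement at position 7: a primer that changes the C at position 7 to A 
would give the wild type product the site GAATTC, while a mutant with T at 
position 10 would give GAATTT and would not be cut.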

COMPUTER REQUIREMENTS.  Any computer capable of running BASIC should be 
able to run this program since it does not use unusual instructions or 
features. It has been run with IBM-BASIC and GW-BASIC on IBM-PCs and 
clones.

USER SUPPLIES.  The user must supply the wild type sequence for the region 
of interest in a file, as a single line consisting of all upper case 
letters with no intervening blank spaces.  A 50 character line containing 25 
bases before and after the mutant site is large enough to allow the program 
to test all the known restriction sites in the current default file 
APR91.ENZ.  The user must either supply a restriction site file or else use 
the default file.  Each line of the enzyme file contains the name of an 
enzyme in columns 1-13 and the recognition site beginning in column 14.  
The site is written in all upper case letters with the same ambiguity base 
assignments as New England Biolabs.  (Sorry, there are two different 
systems in use and I mixed them in the March 1991 version of the 
MISMATCH.BAS program.)  The user is asked for the position of the mutation 
in the sequence file, counted in bases from the start of the line, and for 
the identities of the wild type base and the mutant base.  The user is also 
asked for the name of an output file in which the program lists the 
possibly useful single mismatch sites and the alignment of each recognition 
site with the gene sequence.  Output is also sent to the screen.
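
As an illustration of the enzyme file layout, the following Python sketch 
splits a line into the name (columns 1-13) and the recognition site 
(column 14 onward), and tests a site against a stretch of sequence.  It is 
not part of the distributed program, and the names AMBIGUITY, 
parse_enzyme_line and site_matches are hypothetical.  The code table is 
copied from the MATRULE$ assignments in MISMATCH.BAS, which treat T and U 
as equivalent and any unrecognized character as N.  Stripping trailing 
blanks from the site is an assumption of this sketch; the BASIC program 
does not strip them.

```python
# Ambiguity codes as assigned in MISMATCH.BAS (T and U are interchangeable).
AMBIGUITY = {
    "A": "A", "B": "CGTU", "C": "C", "D": "AGTU", "G": "G", "H": "ACTU",
    "K": "GTU", "M": "AC", "N": "ACGTU", "R": "AG", "S": "CG", "T": "TU",
    "U": "TU", "V": "ACG", "W": "ATU", "Y": "CTU",
}


def parse_enzyme_line(line):
    """Split one enzyme file line into (name, site)."""
    line = line.rstrip("\n")
    name = line[:13].strip()   # columns 1-13
    site = line[13:].strip()   # column 14 onward; blanks stripped (assumption)
    return name, site


def site_matches(site, target):
    """True if every base of target is allowed by the code at that position."""
    if len(site) != len(target):
        return False
    # Unknown codes fall back to "any base", as in the BASIC program.
    return all(base in AMBIGUITY.get(code, "ACGTU")
               for code, base in zip(site, target))


name, site = parse_enzyme_line("AvaI".ljust(13) + "CYCGRG")
print(name, site, site_matches(site, "CTCGAG"), site_matches(site, "CACGAG"))
```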

RUNNING THE PROGRAM.  An inexperienced user should follow these 
directions.  Terminate each of your responses to the program's prompts 
with <enter> (carriage return).
  
1.  Change the default disk and directory on your PC to the 
directory containing your BASIC interpreter (BASIC.COM, BASICA.COM, 
GWBASIC.COM, etc.), for example: cd c:\basic. Copy the program MISMATCH.BAS, 
the restriction file APR91.ENZ and your one line gene sequence file (I am 
assuming this last file is called myfile) to that directory as well.

2.  At the prompt C> issue the command BASIC MISMATCH (substitute BASICA or 
GWBASIC for BASIC if that is the name of your interpreter).

3.  When the program asks for your sequence file, type in its name (myfile 
in this example).

4.  When the program asks for the enzyme file, just hit <enter> to 
accept the default restriction file.

5.  When the program asks for the name of an output file, invent a name such 
as mygene.mis, which you can later print with the DOS print command or edit 
with your word processor.  If you just hit <enter>, no output file is 
written and the results appear only on the screen.

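6.  Answer the program's questions about the position of the mutation and the 
identities of the wild type and mutant bases.  Type the bases in upper case.  
The program does not check that the wild type base you type agrees with the 
base at that position in your sequence file.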

7.  Sit back while the program runs.  When it is finished, you can either 
run it again (choosing a different enzyme file or a different gene
sequence file) with the command RUN, or you can leave BASIC with the command
SYSTEM. 
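
Step 1 assumes that the one line sequence file already exists.  If you are 
starting from a longer sequence, a helper along the following lines could 
cut out the region around the mutation.  This Python sketch is only an 
illustration and is not distributed with MISMATCH.BAS; the function name 
make_sequence_line and its flank parameter are hypothetical.  It removes 
blanks and line breaks, converts to upper case, and returns the line to 
save together with the position of the mutation within that line, which 
is the number to give the program in step 6.  The position passed in must 
be counted on the sequence after blanks have been removed.

```python
def make_sequence_line(raw, mutation_position, flank=25):
    """Return (line, position_in_line) for the region around a mutation.

    raw                sequence text, any case, may contain blanks/newlines
    mutation_position  1-based position in the blank-free sequence
    flank              bases to keep on each side (hypothetical default)
    """
    seq = "".join(raw.split()).upper()
    if not 1 <= mutation_position <= len(seq):
        raise ValueError("mutation position lies outside the sequence")
    start = max(1, mutation_position - flank)
    end = min(len(seq), mutation_position + flank)
    line = seq[start - 1:end]
    return line, mutation_position - start + 1


line, pos = make_sequence_line("acgt acgt\nacgt", 6, flank=3)
print(line, pos, line[pos - 1])
```

With flank=25 the line holds up to 51 characters: the mutant base plus 25 
bases on each side.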


A compiled version runs considerably faster than interpreted BASIC.


"MISMATCH.BAS" follows:
10 REM This program, "MISMATCH", by LANCE DAVIDOW (COLLABORATIVE RESEARCH, Inc.,
20 REM  Two Oak Park, Bedford MA 01730. Phone (617) 275-0004 ext.115)
30 REM    finds directed mismatch primers to allow allelotyping tests by a
40 REM    restriction digest analysis following PCR.    version 11 april 1991
50 REM   ref: Haliassos et al. (1989). Nuc Acids Res 17:3606
60 DEFINT A-Z
70 REM ALLBASES$ IS THE STRING OF ALL POSSIBLE LEGAL BASES
80 ALLBASES$="ABCDGHKMNRSTUVWY"
90 REM matrule$ array has NEBiolabs ambiguous base matching rules
100 DIM MATRULE$(16)
110 REM blanks or illegal characters in the recognition site treated as "N"
120 MATRULE$(0)="ACGTU"
130 MATRULE$(1)="A"
140 MATRULE$(2)="CGTU"
150 MATRULE$(3)="C"
160 MATRULE$(4)="AGTU"
170 MATRULE$(5)="G"
180 MATRULE$(6)="ACTU"
190 MATRULE$(7)="GTU"
200 MATRULE$(8)="AC"
210 MATRULE$(9)="ACGTU"
220 MATRULE$(10)="AG"
230 MATRULE$(11)="CG"
240 MATRULE$(12)="TU"
250 MATRULE$(13)="TU"
260 MATRULE$(14)="ACG"
270 MATRULE$(15)="ATU"
280 MATRULE$(16)="CTU"
290 INPUT "FILE NAME WITH WILD TYPE SEQUENCE REGION ON 1 LINE";GENEFILE$
300 INPUT "RESTRICTION ENZYME FILE [APR91.ENZ]";ENZFILE$
310 IF ENZFILE$="" THEN ENZFILE$="APR91.ENZ"
320 INPUT "OUTPUT FILE NAME [NUL=NO OUTPUT FILE]";OUTFILE$
330 IF OUTFILE$="" THEN OUTFILE$="NUL"
340 OPEN GENEFILE$ FOR INPUT AS #1
350 LINE INPUT #1, GENESEQ$
360 CLOSE #1
370 OPEN OUTFILE$ FOR OUTPUT AS #3
380 PRINT#3,"GENE FILE=";GENEFILE$,"ENZYME FILE=";ENZFILE$,"OUTPUT=";OUTFILE$
390 INPUT "MUTANT POSITION in bp from start--e.g. 25";MUTPOS
400 INPUT "WILD TYPE BASE--UPPER CASE ONLY!!!--e.g. C";WTBASE$
410 REM program does not verify that position and base agree
420 INPUT "MUTANT BASE--UPPER CASE ONLY!!!" ; MUTBASE$
430 PRINT#3,
440 PRINT#3,"AN '*' INDICATES MUTANT POSITION. A '-' DENOTES MISMATCH POSITION."
450 PRINT#3, SPC(MUTPOS-1) "*"
460 PRINT#3, GENESEQ$
470 PRINT#3,"MUTANT POSITION=";MUTPOS,"WT base=";WTBASE$,"Mutant base=";MUTBASE$
480 PRINT "First Pass--Sites Present in WT but not in Mutant"
490 PRINT#3,"First Pass--Sites Present in WT but not in Mutant"
500 FOR PASS=1 TO 2
510   IF PASS=2 THEN MID$(GENESEQ$,MUTPOS)=MUTBASE$:SWAP MUTBASE$,WTBASE$: PRINT "--PASS#2--Site in MUT": PRINT#3,"--PASS#2--Site in MUT"
520   OPEN ENZFILE$ FOR INPUT AS #2
530   WHILE NOT EOF(2)
540      LINE INPUT #2,NEXTENZ$
550      ENZNAME$=MID$(NEXTENZ$,1,13)
560      REM ENZYME NAMES IN COLS 1 TO 13. RECOGNITION SITE FROM 14 TO END
570      RECOG$=MID$(NEXTENZ$,14,65)
580      SITESIZE=LEN(RECOG$)
590      FOR INSITE=1 TO SITESIZE
600        REM DOES THIS BASE MATCH WT BUT NOT MUTANT?
610        LOOKUP=INSTR(ALLBASES$,MID$(RECOG$,INSITE,1))
620        A$=MATRULE$(LOOKUP)
630        REM call subroutine if this base does distinguish wt and mut
640        IF INSTR(A$,WTBASE$)<>0 AND INSTR(A$,MUTBASE$)=0 THEN GOSUB 740
650      NEXT INSITE
660   WEND
670   CLOSE #2
680   PRINT
690   PRINT#3,
700 NEXT PASS
710 CLOSE #3
720 END
730 REM
740 REM Subroutine to count up mismatches between enzyme and sequence
750 REM then call up another subroutine to output any alignments with
760 REM one or 0 mismatches
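770 REM align recognition site with sequence. INSITE base over MUTPOS base
772 REM skip alignments that would run off either end of the sequence
775 IF MUTPOS-INSITE+1<1 OR MUTPOS-INSITE+SITESIZE>LEN(GENESEQ$) THEN RETURN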
780 MISS=0
790 MISSPT=0
800 FOR TEST=1 TO SITESIZE
810     LOOKUP=INSTR(ALLBASES$,(MID$(RECOG$,TEST,1)))
820     A$=MATRULE$(LOOKUP)
830     IF INSTR(A$,MID$(GENESEQ$,(MUTPOS-INSITE+TEST),1))=0 THEN MISS=MISS+1: MISSPT=TEST
840 NEXT TEST
850 IF MISS<=1 THEN GOSUB 880
860 RETURN
870 REM
880 REM Subroutine to output a useful restriction site and alignment
890 REM print a "*" at the mutation site and a "-" at the mismatch base
900 PRINT#3,
910 PRINT
920 OUTLINE$=SPACE$(20)
930 IF MISSPT<>0 THEN MID$(OUTLINE$,MISSPT)="-"
940 MID$(OUTLINE$,INSITE)="*"
950 PRINT , OUTLINE$
960 PRINT#3, , OUTLINE$
970 PRINT#3, ENZNAME$, RECOG$, MISS;"  MISMATCHES"
980 PRINT ENZNAME$, RECOG$, MISS;"  mismatches"
990 PRINT#3, "TARGET DNA  ", MID$(GENESEQ$,(MUTPOS-INSITE+1),SITESIZE)
1000 PRINT "target dna  ",MID$(GENESEQ$,(MUTPOS-INSITE+1),SITESIZE)
1010 RETURN

"APR91.ENZ"follows:
AatII        GACGTC
AccI         GTMKAC
AciI         CCGC
AflII        CTTAAG
AflIII       ACRYGT
AgeI         ACCGGT
AhaII        GRCGYC
AluI         AGCT
AlwI         GGATC
AlwI         GATCC
AlwNI        CAGNNNCTG
ApaI         GGGCCC
ApaLI        GTGCAC
AseI         ATTAAT
AvaI         CYCGRG
AvaII        GGWCC
AvrII        CCTAGG
BamHI        GGATCC
BanI         GGYRCC
BanII        GRGCYC
BbsI         GAAGAC
BbsI         GTCTTC
BbvI         GCAGC
BbvI         GCTGC
BcgI         CGANNNNNNTGC
BcgI         GCANNNNNNTCG
BclI         TGATCA
BglI         GCCNNNNNGGC
BglII        AGATCT
BsaI         GGTCTC
BsaI         GAGACC
BsaAI        YACGTR
BsaBI        GATNNNNATC
BsaJI        CCNNGG
BsaHI        GRCGYC
BsiWI        CGTACG
BslI         CCNNNNNNNGG
BsmI         GAATGC
BsmI         GCATTC
BsmAI        GTCTC
BsmAI        GAGAC
Bsp1286I     GDGCHC
BspDI        ATCGAT
BspEI        TCCGGA
BspHI        TCATGA
BspMI        ACCTGC
BspMI        GCAGGT
BsrI         ACTGG
BsrI         CCAGT
BssHII       GCGCGC
BstBI        TTCGAA
BstEII       GGTNACC
BstNI        CCWGG
BstUI        CGCG
BstXI        CCANNNNNNTGG
BstYI        RGATCY
Bsu36I       CCTNAGG
Cfr10I       RCCGGY
ClaI         ATCGAT
DdeI         CTNAG
DpnI         GATC
DpnII        GATC
DraI         TTTAAA
DraIII       CACNNNGTG
DrdI         GACNNNNNNGTC
EaeI         YGGCCR
EagI         CGGCCG
EarI         CTCTTC
EarI         GAAGAG
EcoNI        CCTNNNNNAGG
EcoO109I     RGGNCCY
EcoRI        GAATTC
EcoRV        GATATC
Eco47III     AGCGCT
EspI         GCTNAGC
Fnu4HI       GCNGC
FokI         GGATG
FokI         CATCC
FspI         TGCGCA
GdiII        YGGCCG
GsuI         CTCCAG
GsuI         CTGGAG
HaeI         WGGCCW
HaeII        RGCGCY
HaeIII       GGCC
HgaI         GACGC
HgaI         GCGTC
HgiAI        GWGCWC
HgiEII       ACCNNNNNNGGT
HhaI         GCGC
HinCII       GTYRAC
HinDIII      AAGCTT
HinFI        GANTC
HinPI        GCGC
HpaI         GTTAAC
HphI         GGTGA
HphI         TCACC
KasI         GGCGCC
KpnI         GGTACC
MaeII        ACGT
MaeIII       GTNAC
MboI         GATC
MboII        GAAGA
MboII        TCTTC
MluI         ACGCGT
MnlI         CCTC
MnlI         GAGG
MscI         TGGCCA
MseI         TTAA
MspI         CCGG
NaeI         GCCGGC
NarI         GGCGCC
NciI         CCSGG
NcoI         CCATGG
NdeI         CATATG
NheI         GCTAGC
NlaIII       CATG
NlaIV        GGNNCC
NotI         GCGGCCGC
NruI         TCGCGA
NsiI         ATGCAT
NspBII       CMGCKG
NspHI        RCATGY
PacI         TTAATTAA
PaeR7I       CTCGAG
PflMI        CCANNNNNTGG
PleI         GAGTC
PleI         GACTC
PmlI         CACGTG
PpuMI        RGGWCCY
PstI         CTGCAG
PvuI         CGATCG
PvuII        CAGCTG
RmaI         CTAG
RsaI         GTAC
RsrII        CGGWCCG
SacI         GAGCTC
SacII        CCGCGG
SalI         GTCGAC
Sau96I       GGNCC
ScaI         AGTACT
ScrFI        CCNGG
SfaNI        GCATC
SfaNI        GATGC
SfiI         GGCCNNNNNGGCC
SmaI         CCCGGG
SnaI         GTATAC
SnaBI        TACGTA
SpeI         ACTAGT
SphI         GCATGC
SplI         CGTACG
SspI         AATATT
StuI         AGGCCT
StyI         CCWWGG
TaqI         TCGA
TfiI         GAWTC
Tth111I      GACNNNGTC
Tth111II     CAARCA
Tth111II     TGYTTG
XbaI         TCTAGA
XcmI         CCANNNNNNNNNTGG
XhoI         CTCGAG
XmaI         CCCGGG
XmnI         GAANNNNTTC
