"MISMATCH.DOC" follows:

version 2 April 1991

This file, MISMATCH.DOC, contains user information for the BASIC program MISMATCH.BAS.

BACKGROUND. This program automates the design of "Directed Mismatch" experiments. In this technique (probably first used by Haliassos et al. (1989) Nucleic Acids Res. 17:3606) the user wants to use restriction enzyme digestion to distinguish between the wild type allele and a common mutant allele of a gene of interest, even though the two sequences do not directly differ by the presence vs. absence of a restriction site. The user designs a PCR primer containing a single mismatch at a site near the mutation such that the PCR products of the wild type and mutant genes differ by a restriction site. Since the mismatched primer functions poorly in the first few cycles, the PCR reaction is usually run a few cycles longer than a reaction with correctly matched primers. Many diagnostic laboratories prefer the Directed Mismatch technique to the Allele Specific Oligonucleotide hybridization method for allelotyping cystic fibrosis genes (A. Beaudet, R. Fenwick et al., personal communication). Many published uses of the Directed Mismatch technique involve allelotyping at the various RAS genes.

COMPUTER REQUIREMENTS. Any computer capable of running BASIC should be able to run this program since it does not use unusual instructions or features. It has been run with IBM-BASIC and GW-BASIC on IBM-PCs and clones.

USER SUPPLIES. The user must supply the wild type sequence for the region of his/her gene in a file as a single line consisting of all upper case letters and no intervening blank spaces. A 50 character line containing 25 bases before and after the mutant site is large enough to allow the program to test all the known restriction sites in the current default file APR91.ENZ. The user must either supply a restriction site file or else use the default file.
The enzyme file contains the name of an enzyme in columns 1-13 and the recognition site beginning in column 14. The recognition site uses all upper case letters and the same ambiguity base assignments as New England Biolabs. (Sorry, there are two different systems in use and I mixed them in the March 1991 version of the MISMATCH.BAS program.) The user is asked for the position in his/her sequence file of the mutation in question, the identity of the wild type base and the identity of the mutant base. The user is also asked for the name of an output file in which the program lists the possibly useful single mismatched sites and the alignment of the recognition sites with the gene sequence. Output is also sent to the screen.

RUNNING THE PROGRAM. An inexperienced user should follow these directions. All of your responses to the computer's requests are terminated by pressing the RETURN or ENTER key.

1. Change the default disk and directory on your PC to the directory containing your BASIC program (BASIC.COM, BASICA.COM, GWBASIC.COM, etc.), for example: cd c:\basic. Copy the program MISMATCH.BAS, the restriction file APR91.ENZ and your one line gene sequence file (I am assuming this last file is called myfile) to that directory as well.
2. At the prompt C> issue the command BASIC MISMATCH (or BASICA MISMATCH or GWBASIC MISMATCH, according to the BASIC you have).
3. When the program asks for your sequence file, type in its file name (myfile in this example).
4. When the program asks for the enzyme file, just press RETURN to accept the default restriction file.
5. When the program asks for the name of an output file, invent a name such as mygene.mis which you can later print with the DOS print command or edit with your word processor.
6. Answer the program's questions about the position of the mutation and the identities of the wild type and mutant bases.
7. Sit back while the program runs. When it is finished, you can either run it again (choosing a different enzyme file or a different gene sequence file) with the command RUN, or you can leave BASIC with the command SYSTEM.

A compiled version runs considerably faster than interpreted BASIC.
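
The one line, upper case sequence file described under USER SUPPLIES can be made with any text editor. As an alternative, the following is a minimal sketch in BASIC, not part of MISMATCH.BAS, that copies a sequence from a file containing several lines, lower case letters, blanks or position numbers into a single upper case line. The file names RAW.SEQ and MYFILE are assumptions made for this example only, and the cleaned sequence must fit in one BASIC string (255 characters at most).

```basic
10 REM Sketch only: make a one-line, upper case sequence file for MISMATCH.BAS
20 REM RAW.SEQ and MYFILE are hypothetical file names; change them as needed
30 REM The cleaned sequence must fit in one BASIC string (255 characters)
40 OPEN "RAW.SEQ" FOR INPUT AS #1
50 CLEAN$=""
60 WHILE NOT EOF(1)
70 LINE INPUT #1, L$
80 FOR I=1 TO LEN(L$)
90 C$=MID$(L$,I,1)
100 REM fold lower case a-z to upper case
110 IF C$>="a" AND C$<="z" THEN C$=CHR$(ASC(C$)-32)
120 REM keep letters only; blanks, digits and punctuation are dropped
130 IF C$>="A" AND C$<="Z" THEN CLEAN$=CLEAN$+C$
140 NEXT I
150 WEND
160 CLOSE #1
170 OPEN "MYFILE" FOR OUTPUT AS #2
180 PRINT #2, CLEAN$
190 CLOSE #2
200 PRINT "WROTE";LEN(CLEAN$);"BASES TO MYFILE"
210 END
```

The sketch keeps every letter it finds, so it does not check that the letters are legal bases. Dropping characters also shifts the numbering, so count the mutant position in the cleaned file (MYFILE), not in the raw file.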

"MISMATCH.BAS" follows:

10 REM This program, "MISMATCH", by LANCE DAVIDOW (COLLABORATIVE RESEARCH,Inc.
20 REM (Two Oak Park, Bedford MA 01730. Phone (617) 275-0004 ext.115)
30 REM finds directed mismatch primers to allow allelotyping tests by a
40 REM restriction digest analysis following PCR. version 11 april 1991
50 REM ref: Haliassos et al. (1989). Nuc Acids Res 17:3606
60 DEFINT A-Z
70 REM ALLBASES$ IS THE STRING OF ALL POSSIBLE LEGAL BASES
80 ALLBASES$="ABCDGHKMNRSTUVWY"
90 REM matrule$ array has NEBiolabs ambiguous base matching rules
100 DIM MATRULE$(16)
110 REM blanks or illegal characters in the recognition site treated as "N"
120 MATRULE$(0)="ACGTU"
130 MATRULE$(1)="A"
140 MATRULE$(2)="CGTU"
150 MATRULE$(3)="C"
160 MATRULE$(4)="AGTU"
170 MATRULE$(5)="G"
180 MATRULE$(6)="ACTU"
190 MATRULE$(7)="GTU"
200 MATRULE$(8)="AC"
210 MATRULE$(9)="ACGTU"
220 MATRULE$(10)="AG"
230 MATRULE$(11)="CG"
240 MATRULE$(12)="TU"
250 MATRULE$(13)="TU"
260 MATRULE$(14)="ACG"
270 MATRULE$(15)="ATU"
280 MATRULE$(16)="CTU"
290 INPUT "FILE NAME WITH WILD TYPE SEQUENCE REGION ON 1 LINE";GENEFILE$
300 INPUT "RESTRICTION ENZYME FILE [APR91.ENZ]";ENZFILE$
310 IF ENZFILE$="" THEN ENZFILE$="APR91.ENZ"
320 INPUT "OUTPUT FILE NAME [NUL=NO OUTPUT FILE]";OUTFILE$
330 IF OUTFILE$="" THEN OUTFILE$="NUL"
340 OPEN GENEFILE$ FOR INPUT AS #1
350 LINE INPUT #1, GENESEQ$
360 CLOSE #1
370 OPEN OUTFILE$ FOR OUTPUT AS #3
380 PRINT#3,"GENE FILE=";GENEFILE$,"ENZYME FILE=";ENZFILE$,"OUTPUT=";OUTFILE$
390 INPUT "MUTANT POSITION in bp from start--e.g. 25";MUTPOS
400 INPUT "WILD TYPE BASE--UPPER CASE ONLY!!!--e.g. C";WTBASE$
410 REM program does not verify that position and base agree
420 INPUT "MUTANT BASE--UPPER CASE ONLY!!!" ; MUTBASE$
430 PRINT#3,
440 PRINT#3,"AN '*' INDICATES MUTANT POSITION. A '-' DENOTES MISMATCH POSITION."
450 PRINT#3, SPC(MUTPOS-1) "*"
460 PRINT#3, GENESEQ$
470 PRINT#3,"MUTANT POSITION=";MUTPOS,"WT base=";WTBASE$,"Mutant base=";MUTBASE$
480 PRINT "First Pass--Sites Present in WT but not in Mutant"
490 PRINT#3,"First Pass--Sites Present in WT but not in Mutant"
500 FOR PASS=1 TO 2
510 IF PASS=2 THEN MID$(GENESEQ$,MUTPOS)=MUTBASE$:SWAP MUTBASE$,WTBASE$: PRINT "--PASS#2--Site in MUT": PRINT#3,"--PASS#2--Site in MUT"
520 OPEN ENZFILE$ FOR INPUT AS #2
530 WHILE NOT EOF(2)
540 LINE INPUT #2,NEXTENZ$
550 ENZNAME$=MID$(NEXTENZ$,1,13)
560 REM ENZYME NAMES IN COLS 1 TO 13. RECOGNITION SITE FROM 14 TO END
570 RECOG$=MID$(NEXTENZ$,14,65)
580 SITESIZE=LEN(RECOG$)
590 FOR INSITE=1 TO SITESIZE
600 REM DOES THIS BASE MATCH WT BUT NOT MUTANT?
610 LOOKUP=INSTR(ALLBASES$,MID$(RECOG$,INSITE,1))
620 A$=MATRULE$(LOOKUP)
630 REM call subroutine if this base does distinguish wt and mut
640 IF INSTR(A$,WTBASE$)<>0 AND INSTR(A$,MUTBASE$)=0 THEN GOSUB 740
650 NEXT INSITE
660 WEND
670 CLOSE #2
680 PRINT
690 PRINT#3,
700 NEXT PASS
710 CLOSE #3
720 END
730 REM
740 REM Subroutine to count up mismatches between enzyme and sequence
750 REM then call up another subroutine to output any alignments with
760 REM one or 0 mismatches
770 REM align recognition site with sequence. INSITE base over MUTPOS base
780 MISS=0
790 MISSPT=0
800 FOR TEST=1 TO SITESIZE
810 LOOKUP=INSTR(ALLBASES$,(MID$(RECOG$,TEST,1)))
820 A$=MATRULE$(LOOKUP)
830 IF INSTR(A$,MID$(GENESEQ$,(MUTPOS-INSITE+TEST),1))=0 THEN MISS=MISS+1: MISSPT=TEST
840 NEXT TEST
850 IF MISS<=1 THEN GOSUB 880
860 RETURN
870 REM
880 REM Subroutine to output a useful restriction site and alignment
890 REM print a "*" at the mutation site and a "-" at the mismatch base
900 PRINT#3,
910 PRINT
920 OUTLINE$=SPACE$(20)
930 IF MISSPT<>0 THEN MID$(OUTLINE$,MISSPT)="-"
940 MID$(OUTLINE$,INSITE)="*"
950 PRINT , OUTLINE$
960 PRINT#3, , OUTLINE$
970 PRINT#3, ENZNAME$, RECOG$, MISS;" MISMATCHES"
980 PRINT ENZNAME$, RECOG$, MISS;" mismatches"
990 PRINT#3, "TARGET DNA ", MID$(GENESEQ$,(MUTPOS-INSITE+1),SITESIZE)
1000 PRINT "target dna ",MID$(GENESEQ$,(MUTPOS-INSITE+1),SITESIZE)
1010 RETURN

"APR91.ENZ" follows:

AatII        GACGTC
AccI         GTMKAC
AciI         CCGC
AflII        CTTAAG
AflIII       ACRYGT
AgeI         ACCGGT
AhaII        GRCGYC
AluI         AGCT
AlwI         GGATC
AlwI         GATCC
AlwNI        CAGNNNCTG
ApaI         GGGCCC
ApaLI        GTGCAC
AseI         ATTAAT
AvaI         CYCGRG
AvaII        GGWCC
AvrII        CCTAGG
BamHI        GGATCC
BanI         GGYRCC
BanII        GRGCYC
BbsI         GAAGAC
BbsI         GTCTTC
BbvI         GCAGC
BbvI         GCTGC
BcgI         CGANNNNNNTGC
BcgI         GCANNNNNNTCG
BclI         TGATCA
BglI         GCCNNNNNGGC
BglII        AGATCT
BsaI         GGTCTC
BsaI         GAGACC
BsaAI        YACGTR
BsaBI        GATNNNNATC
BsaJI        CCNNGG
BsaHI        GRCGYC
BsiWI        CGTACG
BslI         CCNNNNNNNGG
BsmI         GAATGC
BsmI         GCATTC
BsmAI        GTCTC
BsmAI        GAGAC
Bspl286      GDGCHC
BspDI        ATCGAT
BspEI        TCCGGA
BspHI        TCATGA
BspMI        ACCTGC
BspMI        GCAGGT
BsrI         ACTGG
BsrI         CCAGT
BssHII       GCGCGC
BstBI        TTCGAA
BstEII       GGTNACC
BstNI        CCWGG
BstUI        CGCG
BstXI        CCANNNNNNTGG
BstYI        RGATCY
Bsu36I       CCTNAGG
Cfr10I       RCCGGY
ClaI         ATCGAT
DdeI         CTNAG
DpnI         GATC
DpnII        GATC
DraI         TTTAAA
DraIII       CACNNNGTG
DrdI         GACNNNNNNGTC
EaeI         YGGCCR
EagI         CGGCCG
EarI         CTCTTC
EarI         GAAGAG
EcoNI        CCTNNNNNAGG
EcoO109I     RGGNCCY
EcoRI        GAATTC
EcoRV        GATATC
Eco47III     AGCGCT
EspI         GCTNAGC
Fnu4HI       GCNGC
FokI         GGATG
FokI         CATCC
FspI         TGCGCA
GdiII        YGGCCG
GsuI         CTCCAG
GsuI         CTGGAG
HaeI         WGGCCW
HaeII        RGCGCY
HaeIII       GGCC
HgaI         GACGC
HgaI         GCGTC
HgiAI        GWGCWC
HgiEII       ACCNNNNNNGGT
HhaI         GCGC
HinCII       GTYRAC
HinDIII      AAGCTT
HinFI        GANTC
HinPI        GCGC
HpaI         GTTAAC
HphI         GGTGA
HphI         TCACC
KasI         GGCGCC
KpnI         GGTACC
MaeII        ACGT
MaeIII       GTNAC
MboI         GATC
MboII        GAAGA
MboII        TCTTC
MluI         ACGCGT
MnlI         CCTC
MnlI         GAGG
MscI         TGGCCA
MseI         TTAA
MspI         CCGG
NaeI         GCCGGC
NarI         GGCGCC
NciI         CCSGG
NcoI         CCATGG
NdeI         CATATG
NheI         GCTAGC
NlaIII       CATG
NlaIV        GGNNCC
NotI         GCGGCCGC
NruI         TCGCGA
NsiI         ATGCAT
NspBII       CMGCKG
NspHI        RCATGY
PacI         TTAATTAA
PaeR7I       CTCGAG
PflMI        CCANNNNNTGG
PleI         GAGTC
PleI         GACTC
PmlI         CACGTG
PpuMI        RGGWCCY
PstI         CTGCAG
PvuI         CGATCG
PvuII        CAGCTG
RmaI         CTAG
RsaI         GTAC
RsrII        CGGWCCG
SacI         GAGCTC
SacII        CCGCGG
SalI         GTCGAC
Sau96I       GGNCC
ScaI         AGTACT
ScrFI        CCNGG
SfaNI        GCATC
SfaNI        GATGC
Sfi I        GGCCNNNNNGGCC
SmaI         CCCGGG
SnaI         GTATAC
SnaBI        TACGTA
SpeI         ACTAGT
SphI         GCATGC
SplI         CGTACG
SspI         AATATT
StuI         AGGCCT
StyI         CCWWGG
TaqI         TCGA
TfiI         GAWTC
Tth111I      GACNNNGTC
Tth111II     CAARCA
Tth111II     TGYTTG
XbaI         TCTAGA
XcmI         CCANNNNNNNNNTGG
XhoI         CTCGAG
XmaI         CCCGGG
XmnI         GAANNNNTTC
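
A user who makes his/her own restriction site file in the layout of the listing above (enzyme name in columns 1-13, recognition site from column 14 on) may want to check it before running MISMATCH.BAS. The following is a short sketch in BASIC, not part of the distributed program, that reports lines with no site in column 14 and site characters that are not in the program's list of legal base codes. The file name MY.ENZ is an assumption made for this example only.

```basic
10 REM Sketch only: check a user-made enzyme file before running MISMATCH.BAS
20 REM MY.ENZ is a hypothetical file name; change it as needed
30 LEGAL$="ABCDGHKMNRSTUVWY"
40 N=0: BAD=0
50 OPEN "MY.ENZ" FOR INPUT AS #1
60 WHILE NOT EOF(1)
70 LINE INPUT #1, E$
80 N=N+1
90 REM the recognition site must begin in column 14
100 S$=MID$(E$,14)
110 IF S$="" THEN PRINT "LINE";N;"HAS NO SITE IN COLUMN 14": BAD=BAD+1
120 FOR I=1 TO LEN(S$)
130 REM MISMATCH.BAS treats any other character, blanks included, as N
140 IF INSTR(LEGAL$,MID$(S$,I,1))=0 THEN PRINT "LINE";N;"COLUMN";13+I;"IS NOT A BASE CODE": BAD=BAD+1
150 NEXT I
160 WEND
170 CLOSE #1
180 PRINT N;"LINES READ,";BAD;"PROBLEMS FOUND"
190 END
```

Trailing blanks after a site are reported too, since MISMATCH.BAS would count each of them as an extra N at the end of the recognition site.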