roy@phri.UUCP (Roy Smith) (03/18/88)
David Kristofferson was kind enough to send me Rich Roberts'
official enzyme list. Looking the list over, I'm confused about what the
non-standard bases mean. For example, I see:
AccI GT^JKAC
AeuI (EcoRII) CC^LGG
I've never seen the J or L before. I would guess that J is [AC]
but I've always used M for that, which I thought was the IUPAC standard.
Here's an extract from an include file I always use:
# define BASE_A 1 /* Adenine */
# define BASE_C 2 /* Cytosine */
# define BASE_G 3 /* Guanine */
# define BASE_T 4 /* Thymine */
# define BASE_U 5 /* Uracil */
# define BASE_R 6 /* A or G (puRine) */
# define BASE_Y 7 /* C or T (pYrimidine) */
# define BASE_M 8 /* A or C */
# define BASE_W 9 /* A or T */
# define BASE_S 10 /* C or G */
# define BASE_K 11 /* G or T */
# define BASE_B 12 /* C, G, or T (not A) */
# define BASE_D 13 /* A, G, or T (not C) */
# define BASE_H 14 /* A, C, or T (not G) */
# define BASE_V 15 /* A, C, or G (not T) */
# define BASE_N 16 /* A, C, G, or T (anything) */
# define BASE_BLK 17 /* Blank, place holder for insertions */
# define BASE_ERR 18 /* Error, (illegal character on input) */
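As an illustration of how codes like these might be assigned on input, here is a minimal sketch. The function name base_from_char is hypothetical and is not part of the include file above; the sketch relies on the first sixteen codes being numbered consecutively from 1, and it assumes '-' is the character used for a blank.

```c
#include <ctype.h>
#include <string.h>

#define BASE_BLK 17 /* Blank, place holder for insertions */
#define BASE_ERR 18 /* Error, (illegal character on input) */

/*
 * Map one input character to its BASE_* code (hypothetical helper).
 * The letters in order[] appear in the same sequence as the defines,
 * so the code is simply the position in the string plus one.
 */
int base_from_char(int ch)
{
    static const char order[] = "ACGTURYMWSKBDHVN";
    const char *p;

    if (ch == '-')          /* assumption: '-' marks a blank */
        return BASE_BLK;
    if (ch == '\0')         /* strchr would otherwise match the terminator */
        return BASE_ERR;
    p = strchr(order, toupper((unsigned char)ch));
    return p ? (int)(p - order) + 1 : BASE_ERR;
}
```

With this mapping the non-IUPAC letters J and L from the enzyme list come back as BASE_ERR.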
Did the standard change, or was I misled, or is Rich Roberts using
his own notation, or what? Come to think of it, if I had my way, I think I
might vote for dropping the special multi-base abbreviations altogether
and forcing people who cared about such things to learn about regular
expressions; gt[ac][gt]ac makes a lot more sense to me than either gtmkac
or gtjkac. The notational convenience of one-base, one-position often
doesn't seem worth the effort of having to remember all those non-mnemonic
abbreviations (not to mention the fact that everybody seems to have their
own idea of what those abbreviations should be).
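To make the regular-expression idea concrete, here is a minimal sketch that expands a site written in the one-letter codes from the include file into a bracket expression, so that gtmkac becomes gt[ac][gt]ac. The function names iupac_bases and iupac_to_regex are made up for this example. Letters outside that table (such as the J and L in the enzyme list) and the ^ cut marker are rejected rather than guessed at.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Bases matched by each one-letter code; NULL for anything unknown. */
static const char *iupac_bases(char code)
{
    switch (tolower((unsigned char)code)) {
    case 'a': return "a";
    case 'c': return "c";
    case 'g': return "g";
    case 't': return "t";
    case 'u': return "u";
    case 'r': return "ag";
    case 'y': return "ct";
    case 'm': return "ac";
    case 'w': return "at";
    case 's': return "cg";
    case 'k': return "gt";
    case 'b': return "cgt";
    case 'd': return "agt";
    case 'h': return "act";
    case 'v': return "acg";
    case 'n': return "acgt";
    default:  return NULL;
    }
}

/*
 * Expand a recognition site into a regular expression in out.
 * Returns 0 on success, -1 on an unknown code or if out is too small.
 */
int iupac_to_regex(const char *site, char *out, size_t outsize)
{
    size_t used = 0;

    assert(site != NULL && out != NULL);
    if (outsize == 0)
        return -1;
    for (; *site != '\0'; site++) {
        const char *bases = iupac_bases(*site);
        size_t n, need;

        if (bases == NULL)
            return -1;
        n = strlen(bases);
        need = (n == 1) ? 1 : n + 2;    /* brackets around multi-base sets */
        if (used + need + 1 > outsize)  /* +1 keeps room for the '\0' */
            return -1;
        if (n == 1) {
            out[used++] = bases[0];
        } else {
            out[used++] = '[';
            memcpy(out + used, bases, n);
            used += n;
            out[used++] = ']';
        }
    }
    out[used] = '\0';
    return 0;
}
```

The output is always lower case, so a caller matching upper-case sequence data would need a case-insensitive match or a different table.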
--
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
dd@beta.UUCP (Dan Davison) (03/18/88)
In article <3193@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:
>
> David Kristofferson was kind enough to send me Rich Roberts'
> official enzyme list. Looking the list over, I'm confused about what the
> non-standard bases mean. For example, I see:
>
> AccI GT^JKAC
> AeuI (EcoRII) CC^LGG
>
> I've never seen the J or L before. I would guess that J is [AC]
>
> Did the standard change, or was I misled, or is Rich Roberts using
> his own notation, or what? Come to think of it, if I had my way, I think I
> Roy Smith, {allegra,cmcl2,philabs}!phri!roy

It's the ambiguous base code developed by the MOLGEN project at SU
SUMEX-AIM.STANFORD.EDU back in the dawn of time, 1979-1980. It bears no
resemblance to the Staden or IUPAC codes.

[INEWS FODDER]

dan davison
theoretical biology
los alamos national laboratory
t-10 ms k710
los alamos, nm 87544
dd@lanl.gov, dd@lanl.UUCP, ...cmcl2!lanl!dd
--
dan davison/theoretical biology/t-10 ms k710/los alamos national laboratory
los alamos, nm 875545/dd@lanl.gov (arpa)/dd@lanl.uucp(new)/..cmcl2!lanl!dd
"I refuse to be intimidated by reality any more"
"What is reality anyway? Nuthin' but a collective hunch!"
--Jane Wagner,via Lily Tomlin
roy@phri.UUCP (Roy Smith) (03/19/88)
In response to a query of mine about Rich Roberts's ambiguous base
notation, dd@beta.UUCP (Dan Davison) writes:
> It's the ambiguous base code developed by the MOLGEN project at SU
> SUMEX-AIM.STANFORD.EDU back in the dawn of time, 1979-1980. It bears no
> resemblance to the Staden or IUPAC codes.

I did a bit more research on this topic and came up with the following
paper:

%A Athel Cornish-Bowden
%T Nomenclature for incompletely specified bases in nucleic acid sequences: recommendations 1984
%J Nucleic Acids Research
%D 1985
%V 13
%P 3021-3030

This paper includes a longish list of references to other attempts at
standardizing the code, and provides some arguments as to why the scheme
he presents (the IUPAC scheme) is more mnemonic than any other. For
example, W={A,T} and S={C,G} because A-T pairs are Weak and C-G pairs are
Strong; M={A,C} and K={G,T} because A and C have aMino groups in chemically
similar positions while G and T have Keto groups in those positions.

I'm fully aware how hard it is to change over from one standard to
another, especially after using the old one for so many years. On the
other hand, I think it's pretty much agreed that IUPAC is the final
authority when it comes to chemical nomenclature; to insist on using some
other naming system just doesn't make sense.
--
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016