roy@phri.UUCP (Roy Smith) (03/18/88)
David Kristofferson was kind enough to send me Rich Roberts' official enzyme list. Looking the list over, I'm confused about what the non-standard bases mean. For example, I see:

    AccI             GT^JKAC
    AeuI (EcoRII)    CC^LGG

I've never seen the J or L before. I would guess that J is [AC], but I've always used M for that, which I thought was the IUPAC standard. Here's an extract from an include file I always use:

    #define BASE_A     1   /* Adenine */
    #define BASE_C     2   /* Cytosine */
    #define BASE_G     3   /* Guanine */
    #define BASE_T     4   /* Thymine */
    #define BASE_U     5   /* Uracil */
    #define BASE_R     6   /* A or G (puRine) */
    #define BASE_Y     7   /* C or T (pYrimidine) */
    #define BASE_M     8   /* A or C */
    #define BASE_W     9   /* A or T */
    #define BASE_S    10   /* C or G */
    #define BASE_K    11   /* G or T */
    #define BASE_B    12   /* C, G, or T (not A) */
    #define BASE_D    13   /* A, G, or T (not C) */
    #define BASE_H    14   /* A, C, or T (not G) */
    #define BASE_V    15   /* A, C, or G (not T) */
    #define BASE_N    16   /* A, C, G, or T (anything) */
    #define BASE_BLK  17   /* Blank, place holder for insertions */
    #define BASE_ERR  18   /* Error (illegal character on input) */

Did the standard change, or was I misled, or is Rich Roberts using his own notation, or what?

Come to think of it, if I had my way, I might vote for dropping the special multi-base abbreviations altogether and forcing people who care about such things to learn about regular expressions; gt[ac][gt]ac makes a lot more sense to me than either gtmkac or gtjkac. The notational convenience of one base per position often doesn't seem worth the effort of remembering all those non-mnemonic abbreviations (not to mention the fact that everybody seems to have their own idea of what those abbreviations should be).

-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
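To make the comparison between the one-letter codes and regular-expression bracket notation concrete, here is a minimal C sketch. It assumes the IUPAC assignments listed in the include file above (M = A or C, K = G or T, W = A or T, and so on) and deliberately treats J and L as unknown, since their meaning in Roberts' list is exactly the open question. The function names iupac_bases, iupac_to_pattern and site_matches_at are hypothetical, invented for illustration; the '^' cut-site marker is simply skipped.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Map one IUPAC nucleotide code to the set of bases it stands for,
   using the assignments from the include file above.
   Returns NULL for anything else (including the J and L of the
   enzyme list, whose meaning is not assumed here). */
static const char *iupac_bases(char code)
{
    switch (toupper((unsigned char)code)) {
    case 'A': return "A";
    case 'C': return "C";
    case 'G': return "G";
    case 'T': return "T";
    case 'U': return "U";
    case 'R': return "AG";
    case 'Y': return "CT";
    case 'M': return "AC";
    case 'W': return "AT";
    case 'S': return "CG";
    case 'K': return "GT";
    case 'B': return "CGT";
    case 'D': return "AGT";
    case 'H': return "ACT";
    case 'V': return "ACG";
    case 'N': return "ACGT";
    default:  return NULL;
    }
}

/* Rewrite a site such as "GT^MKAC" as the bracket-expression form
   "GT[AC][GT]AC". The '^' cut marker is dropped. Returns 0 on
   success, -1 on an unknown code or if out is too small. */
static int iupac_to_pattern(const char *site, char *out, size_t outlen)
{
    size_t n = 0;

    for (; *site != '\0'; site++) {
        const char *set;
        size_t len, need;

        if (*site == '^')
            continue;                       /* cut position, not a base */
        set = iupac_bases(*site);
        if (set == NULL)
            return -1;                      /* unknown code, e.g. J or L */
        len = strlen(set);
        need = (len == 1) ? 1 : len + 2;    /* ambiguous codes get [...] */
        if (n + need + 1 > outlen)
            return -1;                      /* leave room for the NUL */
        if (len == 1) {
            out[n++] = set[0];
        } else {
            out[n++] = '[';
            memcpy(out + n, set, len);
            n += len;
            out[n++] = ']';
        }
    }
    out[n] = '\0';
    return 0;
}

/* Return 1 if site (IUPAC codes, '^' ignored) matches seq starting at
   offset pos, 0 otherwise. The caller must ensure pos <= strlen(seq). */
static int site_matches_at(const char *seq, size_t pos, const char *site)
{
    const char *s = seq + pos;

    for (; *site != '\0'; site++) {
        const char *set;

        if (*site == '^')
            continue;
        set = iupac_bases(*site);
        if (set == NULL || *s == '\0')
            return 0;                       /* bad code or ran off the end */
        if (strchr(set, toupper((unsigned char)*s)) == NULL)
            return 0;                       /* base not allowed here */
        s++;
    }
    return 1;
}
```

For example, iupac_to_pattern("GT^MKAC", buf, sizeof buf) yields "GT[AC][GT]AC", while "GT^JKAC" is rejected because J is not in the table. Both functions read the same lookup table, so the one-letter form and the bracket form can never drift apart; the price is that somebody still has to agree on what each letter means.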