roy@phri.UUCP (Roy Smith) (03/18/88)
David Kristofferson was kind enough to send me Rich Roberts'
official enzyme list. Looking the list over, I'm confused about what the
non-standard bases mean. For example, I see:
AccI GT^JKAC
AeuI (EcoRII) CC^LGG
I've never seen the J or L before. I would guess that J is [AC]
but I've always used M for that, which I thought was the IUPAC standard.
Here's an extract from an include file I always use:
# define BASE_A 1 /* Adenine */
# define BASE_C 2 /* Cytosine */
# define BASE_G 3 /* Guanine */
# define BASE_T 4 /* Thymine */
# define BASE_U 5 /* Uracil */
# define BASE_R 6 /* A or G (puRine) */
# define BASE_Y 7 /* C or T (pYrimidine) */
# define BASE_M 8 /* A or C */
# define BASE_W 9 /* A or T */
# define BASE_S 10 /* C or G */
# define BASE_K 11 /* G or T */
# define BASE_B 12 /* C, G, or T (not A) */
# define BASE_D 13 /* A, G, or T (not C) */
# define BASE_H 14 /* A, C, or T (not G) */
# define BASE_V 15 /* A, C, or G (not T) */
# define BASE_N 16 /* A, C, G, or T (anything) */
# define BASE_BLK 17 /* Blank, place holder for insertions */
# define BASE_ERR 18 /* Error, (illegal character on input) */
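For what it's worth, here is a minimal sketch of how an input routine might turn one sequence character into the codes above. The name base_from_char and the use of '-' for a blank are assumptions made for illustration, not part of the include file; the numbering just follows the defines (A=1 through N=16), with 17 and 18 written out as literals so the fragment stands alone.

```c
#include <ctype.h>
#include <string.h>

/* Letters in the same order as the defines above: A=1 ... N=16. */
static const char base_letters[] = "ACGTURYMWSKBDHVN";

/* Hypothetical helper: map one input character to a BASE_* value.
 * Assumes '-' marks a blank; any other unknown character is an error. */
int base_from_char(int c)
{
    const char *p;

    if (c == '-')
        return 17;                      /* BASE_BLK */
    if (c == '\0')
        return 18;                      /* BASE_ERR: strchr would match the terminator */

    p = strchr(base_letters, toupper((unsigned char)c));
    if (p == NULL)
        return 18;                      /* BASE_ERR */

    return (int)(p - base_letters) + 1; /* BASE_A .. BASE_N */
}
```

With this table, J and L both come back as BASE_ERR, which is exactly the problem I ran into with the list.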
Did the standard change, or was I misled, or is Rich Roberts using
his own notation, or what? Come to think of it, if I had my way, I think I
might vote for dropping the special multi-base abbreviations altogether
and forcing people who cared about such things to learn about regular
expressions; gt[ac][gt]ac makes a lot more sense to me than either gtmkac
or gtjkac. The notational convenience of one-base, one-position often
doesn't seem worth the effort of having to remember all those non-mnemonic
abbreviations (not to mention the fact that everybody seems to have their
own idea of what those abbreviations should be).
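To make the regular-expression idea concrete, here is a rough sketch that rewrites a recognition site from the one-letter codes into bracket notation. It assumes the code table from my include file, treats '^' as a cut-site marker to be skipped, and uses a made-up function name (site_to_regex); none of this comes from the enzyme list itself.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand ambiguity codes in `site` into a bracket
 * expression written to `out` (capacity `outsz` bytes).
 * Returns 0 on success, -1 on an unknown letter or a too-small buffer. */
int site_to_regex(const char *site, char *out, size_t outsz)
{
    static const char codes[] = "RYMWSKBDHVN";
    static const char *const sets[] = {
        "[ag]", "[ct]", "[ac]", "[at]", "[cg]", "[gt]",
        "[cgt]", "[agt]", "[act]", "[acg]", "[acgt]"
    };
    size_t used = 0;

    if (outsz == 0)
        return -1;

    for (; *site != '\0'; site++) {
        int c = toupper((unsigned char)*site);
        char single[2];
        const char *piece;
        const char *p;
        size_t len;

        if (c == '^')
            continue;                   /* assumed cut-site marker, not a base */

        if (strchr("ACGTU", c) != NULL) {
            single[0] = (char)tolower(c);   /* plain base: copy as-is */
            single[1] = '\0';
            piece = single;
        } else if ((p = strchr(codes, c)) != NULL) {
            piece = sets[p - codes];        /* ambiguity code: bracket set */
        } else {
            return -1;                      /* e.g. the J and L in the list */
        }

        len = strlen(piece);
        if (used + len + 1 > outsz)
            return -1;                      /* no room for piece plus NUL */
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Called as site_to_regex("GT^MKAC", buf, sizeof buf), this leaves gt[ac][gt]ac in buf; given GT^JKAC it returns -1 rather than guessing what J means.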
--
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016