[bionet.software] Sequence representations

BIOCUKM@osucc.bitnet (09/11/90)

Folks,
	The recent chatter on alternative ways of presenting
nucleotide sequences underscores a common dissatisfaction with
the use of the first letters (Roman alphabet) of the English
names of the bases.  That dissatisfaction is justified since
the use of ACTG takes up much more space than needed, since
there is a potential ambiguity between C and G on poor quality
copies of the sequence, and since computers are required to
spot patterns.

	If we can all agree that ACTG is a poor choice, we should
switch to a better one.  The chatter mentions a number of good
alternatives.  I suspect that a change will require one or more
courageous journal editors.  Are there any?

				Peace,
				Ulrich Melcher

kristoff@genbank.bio.net (David Kristofferson) (09/11/90)

> 	If we can all agree that ACTG is a poor choice, we should
> switch to a better one.  The chatter mentions a number of good
> alternatives.  I suspect that a change will require one or more
> courageous journal editors.  Are there any?

I think that it will take a bit more than one courageous editor ... 8-) 
Considering the minor uproar over some recent features table changes,
just think of all of the complaints from people whose software would
be rendered obsolete unless they were provided with a conversion
program back to the old format.  Enough said.
-- 
				Sincerely,

				Dave Kristofferson
				GenBank Manager

				kristoff@genbank.bio.net

triplett@CALSHP.CALS.WISC.EDU (09/12/90)

To:  Ulrich Melcher
I do not agree that ACTG is a poor choice.  In my experience, there
has been no ambiguity in this matter.  This is a long established 
convention that would make all previous papers unnecessarily obsolete.
I would strongly encourage journal editors not to consider such
an unnecessary change.  Poor quality printers can distort practically
any message.  I suggest that those who have problems with ACTG should
improve the quality of their print output, which would be beneficial
to all other printing.     Eric Triplett, University of Wisconsin

chh9@quads.uchicago.edu (Conrad Halton Halling) (09/12/90)

I find _lowercase_ acgt much easier to read than ACGT; hence, I prefer the
GenBank database entries over their EMBL equivalents.

Perhaps authors should use lowercase letters in their sequence figures to
improve readability.  (I think that within a year or two sequence figures
will largely disappear, anyway, to be replaced by the accession number and
a map of the sequenced region.)


--
Conrad Halling
chh9@midway.uchicago.edu

jej@chinet.chi.il.us (joe jesson) (09/15/90)

In order to reduce the ambiguity, the code should maximize the Hamming
distance with either check bits or a simplified code. I've got many ideas
for the designation...
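
As an illustration of the check-bit idea only (the post does not spell out
a concrete scheme, so this is not Jesson's proposal): the sketch below
assumes a hypothetical 2-bit assignment for the four bases and appends one
even-parity bit.  The names BASE_BITS, with_parity and min_distance are
invented for this example.  Under that assumption, the smallest Hamming
distance between any two codewords rises from 1 to 2, so a single misread
bit yields an invalid codeword instead of a different base.

```python
from itertools import combinations

# Hypothetical 2-bit assignment for the four bases (an assumption for
# illustration; the thread does not propose specific bit patterns).
BASE_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def with_parity(bits):
    """Append one even-parity check bit to a bit string."""
    parity = str(bits.count("1") % 2)
    return bits + parity


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(code):
    """Smallest Hamming distance between any two codewords in a code."""
    return min(hamming(a, b) for a, b in combinations(code.values(), 2))


PLAIN = BASE_BITS
CHECKED = {base: with_parity(bits) for base, bits in BASE_BITS.items()}

if __name__ == "__main__":
    print("plain code:  ", PLAIN, "min distance", min_distance(PLAIN))
    print("with parity: ", CHECKED, "min distance", min_distance(CHECKED))
```

A minimum distance of 2 detects a single-bit error but cannot correct it;
correction would need more check bits than this sketch uses.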

-- 
---------------------------------------------------------------------------
Joseph Jesson   jej@chinet.uucp  Day (312) 856-3645 Eve.  (708) 356-6817 
                      21414 W. Honey Lane, Lake Villa, IL, 60046
---------------------------------------------------------------------------

BIOCUKM@osucc.bitnet (09/18/90)

Responses to my posting on adopting sequence representations
other than ACTG require two clarifications:
1.  Ambiguity problem
	Some journals still do not require submission of sequence
data to the banks.  Even when submission is required, availability
in the data banks sometimes lags behind publication.  In these cases,
I run to the library and make a photocopy of the sequence.
It is on those photocopies that I have trouble telling C's from
G's.  Only those with superior eyesight, superior library copy
machines or personal subscriptions to all the journals can breeze
through such sequences error free.

2.  Software conversion--no problem
	I am not suggesting that databases or software be changed.
ASCII codes for ACTG would still be the form that sequences are
stored and manipulated by computers.  I am only suggesting that
we change how those codes are depicted on the printed page.
The only software that would be affected is software that can
read a sequence by scanning a printed page.  Such software, if it
exists, does not seem to be widely used.
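
The separation described in point 2 can be sketched roughly as follows:
the stored sequence stays plain ASCII, and only a final display step maps
it to other glyphs.  This is a minimal sketch under stated assumptions.
The function name render and both glyph tables are hypothetical; the
lowercase table reflects the lowercase suggestion made earlier in the
thread, and the symbol table is an arbitrary placeholder, not a set of
symbols anyone here has proposed.

```python
# Display tables: keys are the stored ASCII bases, values are what gets
# printed.  Both tables are hypothetical examples.
LOWERCASE = str.maketrans("ACGT", "acgt")
SYMBOLS = str.maketrans("ACGT", "^(-|")  # arbitrary placeholder glyphs


def render(stored, table, group=10):
    """Return a printable form of a stored sequence.

    The stored string is not modified; only the returned display string
    uses the alternative glyphs, split into blocks of `group` characters.
    """
    shown = stored.translate(table)
    return " ".join(shown[i:i + group] for i in range(0, len(shown), group))


if __name__ == "__main__":
    stored = "GATTACAGCCGATTACA"    # what databases and software keep
    print(render(stored, LOWERCASE))
    print(render(stored, SYMBOLS))
    print(stored)                   # unchanged ASCII representation
```

Because the mapping is applied only when printing, existing databases and
analysis programs would keep reading the same ACGT characters.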

			Peace,
			Ulrich Melcher
			Oklahoma State University

roy@phri.nyu.edu (Roy Smith) (09/18/90)

Ulrich Melcher writes:
> Some journals still do not require submission of sequence data to the
> banks [...] Only those with superior eyesight, superior library copy
> machines or personal subscriptions to all the journals can breeze through
> [hard-copy] sequences error free.

	Ulrich, I suppose we simply have differing points of view, but I
think the answer to the problem of hard-to-read printed sequences is not to
make the printed sequences easier to read, but to get rid of them!  Rather
than bug the journal editors to change the way they present sequences in
print, bug them to insist on timely submission to the appropriate database
as a prerequisite to publication.
--
Roy Smith, Public Health Research Institute
455 First Avenue, New York, NY 10016
roy@alanine.phri.nyu.edu -OR- {att,cmcl2,rutgers,hombre}!phri!roy
"Arcane?  Did you say arcane?  It wouldn't be Unix if it wasn't arcane!"

NUM208JN@NRCCAD.NRC.CA (JOHN NASH) (09/18/90)

In response to the comment by Ulrich Melcher on sequence changes:

UM>     I am not suggesting that databases or software be changed.
UM>ASCII codes for ACTG would still be the form that sequences are
UM>stored and manipulated by computers.  I am only suggesting that
UM>we change how those codes are depicted on the printed page.

I'm not too fussed about how these codes are depicted on a page.  However, when
I'm transcribing sequence by hand, I often use lower case "g" instead of G.
(Unfortunately, this could lead to confusion because g is often used for
"probably G".)

Just a thought,


     cheers,
     John,

--------------------------------------------------------------
     John H.E. Nash <Bitnet: NUM208JN@NRCCAD.NRC.CA >
     Institute for Biological Sciences,
     National Research Council of Canada,
     Ottawa, Canada  K1A 0R6.

     Phone:  (613) 990-0990     Fax:    (613) 952-9092.
==============================================================