[comp.archives] [bionet.molbio.evolution] Re: WHAT IS NEEDED FOR PHYLIP

gilbertd@ogre.cica.indiana.edu (Don Gilbert) (11/24/90)

Archive-name: phylip/22-Nov-90
Original-posting-by: gilbertd@ogre.cica.indiana.edu (Don Gilbert)
Original-subject: Re: WHAT IS NEEDED FOR PHYLIP
Archive-site: iubio.bio.indiana.edu [129.79.1.101]
Reposted-by: emv@ox.com (Edward Vielmetti)

In article <9011221446.AA05075@asmus1.genetics.uga.edu> arnold%gandal.dnet@ASMUS1.GENETICS.UGA.EDU writes:
>4) conversion routines for output from GCG to PHYLIP data format, i.e
>TOPHYLIP and TOGCG.

You can use ReadSeq now to convert from GCG to Phylip format (it doesn't
yet do the reverse).  Obtain it via anonymous ftp to iubio.bio.indiana.edu:
cd [archive.molbio.readseq], then mget *.c, mget *.h and mget *.doc to get
the C source and documents, or switch to binary mode and get readseq.arc
for an archive file that includes a VMS shell.

I would much prefer to see Phylip read (and if needed, write) sequence data
using the IntelliGenetics format.  This format is widely used now, and some
multiple aligners already produce this output.  The current Phylip 3.3
sequence input format, with its interleaving of species, is a pain to
translate to/from.  While many multi-aligners produce some sort of interleaved
output for _display_, all of these require extensive hand editing to
fit into Phylip format.  I am working with output from another program now
that uses an interleaved format: about 150 species with 2000+ bases each.
Programs normally read one sequence at a time.  With an interleaved file
that means one pass per species: 150 passes thru a 600 kilobyte file ... it
takes a while.  A format with one sequence after another can be read in one
gulp.
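
To illustrate the difference, here is a minimal Python sketch (not taken from
ReadSeq or Phylip) that gathers an interleaved alignment into one sequence
per species in a single pass.  It assumes a simplified layout: blocks
separated by blank lines, species names only in the first block, and the
same species order in every block.  The real Phylip input format has details
this ignores, and the name deinterleave is made up for the example.

```python
def deinterleave(text):
    """Return [(name, sequence), ...] from a simplified interleaved alignment.

    Assumed layout (hypothetical, not the exact Phylip 3.3 format):
    blank-line-separated blocks; the first block has "name bases..." lines,
    later blocks have bases only, in the same species order.
    """
    names = []
    chunks = []          # chunks[i] holds the pieces of species i, in order
    row = 0              # index of the species within the current block
    first_block = True
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            # A blank line ends the current block (once one has started).
            if names:
                first_block = False
            row = 0
            continue
        if first_block:
            names.append(parts[0])
            chunks.append(["".join(parts[1:])])
        else:
            chunks[row].append("".join(parts))
            row += 1
    # Every line was visited exactly once; join the pieces per species.
    return [(name, "".join(pieces)) for name, pieces in zip(names, chunks)]


sample = """\
human ACGT ACGT
chimp ACGA ACGT

TTGA CC
TTGA CA
"""

for name, seq in deinterleave(sample):
    print(name, seq)
```

The file is scanned once no matter how many species it holds, at the cost of
keeping all the pieces in memory until the end.  A reader that extracts one
species per call would instead rescan the file once per species.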

-- Don