[bionet.software] Sequence Reading

gribskov@FCRFV1.NCIFCRF.GOV ("Gribskov, Michael") (03/29/91)

About a year ago there was some discussion (I think it was here)
about a joint project to develop C modules to read the various
sequence formats.  My impression was that the modules would then be
made publicly available.  Did anything come of this project?

Thanks for any information.

Michael Gribskov
gribskov@ncifcrf.gov

BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) (03/29/91)

Regarding the question about reading sequences in various formats:
the following was posted earlier by Don Gilbert and may be relevant.

---------------------------  cut here  -----------------------------------
I've updated the sequence reformatter of mine called ReadSeq.  This
program comes as C source code that is suitable for Unix, VMS, MS-DOS, or
other command-line systems.
Readseq reads and writes nucleic/protein sequences in these formats:
    Stanford/IG, GenBank, NBRF, EMBL, UWGCG, DNA Strider, Fitch,
    Pearson, Zuker, Olsen, Phylip v3.2, Phylip v3.3, and Plain text
Data files may have multiple sequences.  Software developers are
encouraged to use these routines rather than devise their own obscure
formats.  The Pascal version of readseq is now out-of-date.

You can get the full set of readseq source and document files as an ARC
archive file thru anonymous ftp to Iubio.bio.indiana.edu.  See directory
[archive.molbio.readseq].  Use binary ftp for getting the readseq.arc file.
                                                              -- Don

Don.Gilbert@iubio.bio.indiana.edu
biology dept., indiana univ.,  bloomington, in 47405, usa
---------------------------------------------------------------------------

Cheers,
--bruce roe
--broe@aardvark.ucs.uoknor.edu

SBAIRD%UOTTAWA@ACADVM1.UOTTAWA.CA (Stephen Baird) (04/05/91)

>From: Bruce Roe <BROE@aardvark.ucs.uoknor.edu>

>Regarding the question about reading sequences in various formats:
>---------------------------  cut here  -----------------------------------
>I've updated the sequence reformatter of mine called ReadSeq.  This
>program comes as C source code that is suitable for Unix, VMS, MS-DOS, or
>other command-line systems.
>Readseq reads and writes nucleic/protein sequences in these formats:
>    Stanford/IG, GenBank, NBRF, EMBL, UWGCG, DNA Strider, Fitch,
>    Pearson, Zuker, Olsen, Phylip v3.2, Phylip v3.3, and Plain text
>Data files may have multiple sequences.  Software developers are
>encouraged to use these routines rather than devise their own obscure
>formats.  The Pascal version of readseq is now out-of-date.

I'd prefer not to reformat sequences as I use them.  I'd rather have some
program translate the sequence just for the program doing the analysis and then
spit back the resulting sequence in the form I started with.  If one uses
several different programs which use different formats, the result can be a
hodgepodge collection of different files of the same sequence, each
with various changes or additions to it.  A modular program (like readseq,
I believe) which would filter the format and leave the comments intact
(I think) would be useful.  Modules could be added for the different programs,
to match the different ways they open a sequence file.  Is this asking too much?
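
The round-trip filter described above might look roughly like the following
Python sketch.  It is not readseq's code or interface; the names Record,
read_record, write_record and run_through are hypothetical, and only two
formats are handled as an assumption: Pearson (a single '>' description line
followed by sequence lines) and plain text (sequence lines only).  The
description line is carried through untouched, the analysis step sees only
the bare residues, and the result is written back in the format it came from.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    fmt: str             # format the record was read from ("pearson" or "plain")
    comments: List[str]  # header/comment lines, kept verbatim
    residues: str        # bare sequence with all whitespace removed


def read_record(text: str) -> Record:
    """Split one record into its comment lines and bare residues."""
    lines = text.splitlines()
    if lines and lines[0].startswith(">"):
        # Pearson: the first line is a '>' description line.
        fmt, comments, body = "pearson", [lines[0]], lines[1:]
    else:
        # Assumption: anything else is plain text with no comment lines.
        fmt, comments, body = "plain", [], lines
    residues = "".join("".join(line.split()) for line in body)
    return Record(fmt, comments, residues)


def write_record(rec: Record, width: int = 60) -> str:
    """Write the record back out, comments first, residues wrapped to `width`."""
    wrapped = [rec.residues[i:i + width]
               for i in range(0, len(rec.residues), width)]
    return "\n".join(rec.comments + wrapped) + "\n"


def run_through(text: str, analysis: Callable[[str], str]) -> str:
    """Hand only the bare residues to `analysis`, then restore the original form."""
    rec = read_record(text)
    rec.residues = analysis(rec.residues)
    return write_record(rec)


if __name__ == "__main__":
    original = ">seq1 test sequence\nacgt\nttaa\n"
    # str.upper stands in for a real analysis program.
    print(run_through(original, str.upper), end="")
```

Note that this sketch keeps the description line but not the original line
width; a fuller version would also have to remember layout details and handle
files with multiple sequences.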


Stephen Baird
Molecular Genetics
Children's Hospital of Eastern Ontario
sbaird@acadvm1.uottawa.ca