gribskov@FCRFV1.NCIFCRF.GOV ("Gribskov, Michael") (03/29/91)
About a year ago there was some discussion (I think it was here) about a joint project to develop C modules to read the various sequence formats. My impression was that the modules would then be made publicly available. Did anything come of this project?

Thanks for any information.

Michael Gribskov
gribskov@ncifcrf.gov
BROE@AARDVARK.UCS.UOKNOR.EDU (Bruce Roe) (03/29/91)
Regarding the question about reading sequences in various formats: the following was posted earlier by Don Gilbert and may be relevant.

--------------------------- cut here -----------------------------------
I've updated my sequence reformatter, ReadSeq. This program comes as C source code that is suitable for Unix, VMS, MS-DOS, or other command-line systems.

Readseq reads and writes nucleic/protein sequences in these formats:
  Stanford/IG, Genbank, NBRF, EMBL, UWGCG, DNA Strider, Fitch,
  Pearson, Zuker, Olsen, Phylip v3.2, Phylip v3.3, and Plain text

Data files may have multiple sequences. Software developers are encouraged to use these routines rather than devise their own obscure formats. The Pascal version of readseq is now out of date.

You can get the full set of readseq source and document files as an ARC archive file through anonymous ftp to Iubio.bio.indiana.edu. See directory [archive.molbio.readseq]. Use binary ftp for getting the readseq.arc file.

-- Don
Don.Gilbert@iubio.bio.indiana.edu
biology dept., indiana univ., bloomington, in 47405, usa
---------------------------------------------------------------------------

Cheers,
--bruce roe
--broe@aardvark.ucs.uoknor.edu
SBAIRD%UOTTAWA@ACADVM1.UOTTAWA.CA (Stephen Baird) (04/05/91)
>From: Bruce Roe <BROE@aardvark.ucs.uoknor.edu>
>Regarding the question about reading sequences in various formats.
>--------------------------- cut here -----------------------------------
>I've updated the sequence reformatter of mine called ReadSeq. This
>program comes as C source code that is suitable for Unix, VMS, MS-DOS, or
>other command-line systems.
>Readseq reads and writes nucleic/protein sequence in these formats:
>  Stanford/IG, Genbank, NBRF, EMBL, UWGCG, DNA Strider, Fitch,
>  Pearson, Zuker, Olsen, Phylip v3.2, Phylip v3.3, and Plain text
>Data files may have multiple sequences. Software developers are
>encouraged to use these routines rather than devise their own obscure
>formats. The pascal version of readseq is now out-of-date.

I'd prefer not to reformat sequences as I use them. I'd rather have some program translate the sequence just for the program doing the analysis, and then hand back the resulting sequence in the format I started with. If one uses several programs that expect different formats, the result can be a hodgepodge of files holding the same sequence, each with various changes or additions.

A modular program (like readseq, I believe) that would filter the format and leave the comments intact (I think) would be useful. Modules could be added for the different programs, to handle the different ways they open a sequence file.

Is this asking too much?

Stephen Baird
Molecular Genetics
Children's Hospital of Eastern Ontario
sbaird@acadvm1.uottawa.ca
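
To make the round-trip filter idea above concrete, here is a minimal sketch in Python. It is not readseq and does not use any of its routines. It assumes a single record in Pearson style, where comment lines start with ">" or ";", and the names split_pearson, join_pearson and run_filtered are hypothetical. The analysis program is stood in for by any function that takes a bare sequence string and returns one.

```python
def split_pearson(text):
    """Separate comment lines from sequence residues in one record.

    Assumption: lines starting with '>' or ';' are comments,
    every other non-blank line is sequence data.
    """
    comments, residues = [], []
    for line in text.splitlines():
        if line.startswith(">") or line.startswith(";"):
            comments.append(line)
        elif line.strip():
            residues.append(line.strip())
    return comments, "".join(residues)


def join_pearson(comments, sequence, width=60):
    """Rebuild a record: original comment lines, then wrapped sequence."""
    lines = list(comments)
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


def run_filtered(text, analysis):
    """Hand only the bare sequence to `analysis`, then restore the comments."""
    comments, sequence = split_pearson(text)
    result = analysis(sequence)
    return join_pearson(comments, result)


if __name__ == "__main__":
    record = ">seq1 test comment\nacgt\nacgt\n"
    # str.upper stands in for a real analysis program.
    print(run_filtered(record, str.upper), end="")
```

The comment lines come back unchanged, but the sequence is rewrapped at a fixed width, so the original line lengths are not preserved. Support for other formats would mean adding one split/join pair per format, which is roughly the "modules" arrangement described above.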