[bionet.molbio.genbank] Formats for sequence distribution

Graham.Cameron@embl.bitnet ("Graham Cameron, ext. 257") (01/30/91)

Dear Colleagues,

The issue of what format(s) should be made available for ftp, dailyish  updates,
on  CD-ROM's,  magtapes  or  carved in granite is of great interest to EMBL.  We
have been giving a lot of thought to the issues raised in the  recent  round  of
banter on this topic.

     1.  The EMBL format as delivered is actually supposed to be  of  some  use:
         we  didn't  design it in an attempt to make life difficult for software
         producers.  We'd be much happier if software used the data as they  get
         them  rather  than  changing  them  into  their  own thing.  It's clear
         however that the consensus is that the  data  need  to  be  reformatted
         before  use.   Some reasons are clear:  e.g., compression, avoidance of
         having to wade through annotation to get  to  the  real  business  etc.
         We'll  certainly  try  to  be  responsive  to suggestions as to how the
         distribution format might be modified to make it more usable.

     2.  The streets are not paved with gold at EMBL.  It'd cost us just as much
         as anyone else to keep umpteen copies of the data in different formats.
         However, it is our job to supply the data as usefully as possible.

     3.  We cannot be partisan -  simply  to  supply  the  data  in  the  format
         suitable  for  our  favourite  commercial package would invite wrath
         from people who disagree with us and from other software producers.

     4.  An approach we have discussed in the  past  is  to  give  any  software
         producer the chance to give us a filter through which an EMBL entry can
         be pushed to produce what they need.  It could then  be  an  option  in
         file-server  requests  to  ask  for  the entries to come in "Nifty-SEQ"
         format or whatever is available.

     5.  Production of releases in other formats could perhaps go the same  way,
         but  poses  a  few  problems.   We'd  be  unlikely  to  want  to do the
         conversion on the fly.  Producing a couple of hundred tapes is
         time-consuming enough without adding anything else to it, so we'd probably
         have  to  store  all  the  formats  distributed.   Also  specific   s/w
         producers'  customisations  might  render our documentation invalid but
         we'd be reluctant to shuffle lots of bits of paper for  them.   Clearly
         the  CD mastering is another matter - we can't support lots of formats,
         but even here we could envisage a system whereby s/w producers  provide
         the tools to produce a CD in their own format (and pick up the bill for
         mastering).
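
To make the filter idea in point 4 concrete, here is a minimal sketch of what
such a filter might look like. It is an illustration only, not anything EMBL
has specified: the function names (embl_to_records, format_entry, run_filter)
and the output layout (a ">name" header line followed by wrapped sequence
lines) are assumptions chosen for the example. The only things it relies on
from the EMBL flat-file format are the two-letter line codes: "ID" opens an
entry, "SQ" introduces the sequence lines, and "//" ends the entry.

```python
import sys


def embl_to_records(lines):
    """Yield (entry_name, sequence) pairs from EMBL flat-file lines.

    Annotation lines between ID and SQ are skipped; only the entry name
    and the sequence itself are kept.
    """
    name, chunks, in_seq = None, [], False
    for line in lines:
        code = line[:2]
        if code == "ID":
            # The entry name is the first token after the line code.
            parts = line[2:].split()
            name = parts[0].rstrip(";") if parts else "unknown"
            chunks, in_seq = [], False
        elif code == "SQ":
            in_seq = True
        elif code == "//":
            if name is not None:
                yield name, "".join(chunks)
            name, chunks, in_seq = None, [], False
        elif in_seq:
            # Sequence lines hold blocks of residues plus a position
            # counter; keep the letters and drop everything else.
            chunks.append("".join(ch for ch in line if ch.isalpha()))


def format_entry(name, seq, width=60):
    """Render one entry in the hypothetical target layout."""
    out = [">" + name]
    out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


def run_filter(instream, outstream):
    """Push every entry read from instream through the filter."""
    for name, seq in embl_to_records(instream):
        outstream.write(format_entry(name, seq))
```

A software producer would wire this up as a conventional stdin-to-stdout
program by calling run_filter(sys.stdin, sys.stdout) from the script's entry
point, so that the file server could pipe any entry through it on request.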

In summary:

      -  We'd like our format to be usable - tell us why it's not.

      -  If s/w producers could provide the filters, offering different formats
         as a file-server option would not be difficult.

      -  Production of magtapes in various formats could be done  similarly  but
         it'd cost us more.

      -  CD's are in some senses the most difficult and the s/w producers  would
         have to pick up the tab.




Graham.




Graham Cameron                            Phone      +49 (06221) 387257
Group Leader                              Telex      461613 (embl d)
The EMBL Data Library                     Telefax    +49 (6221) 387306
European Molecular Biology Laboratory
Postfach 10.2209   Meyerhofstrasse 1      Network(reply to) cameron@embl.bitnet
6900 Heidelberg                           General enquiries  datalib@embl.bitnet
Germany                                   Data submissions  datasubs@embl.bitnet