[bionet.molbio.genome-program] NIH/DOE DNA SEQUENCING WORKING GROUP REPORT

JP2@CU.NIH.GOV (08/09/90)

                NIH/DOE DNA SEQUENCING WORKING GROUP
                              SUMMARY
                            MAY 10, 1990


The joint NIH/DOE DNA Sequencing Working Group held its first
meeting on May 10, 1990.  A summary of its discussions follows.

ROLE OF THE WORKING GROUP

The role of the Working Group will be to advise the NCHGR and the
DOE on all aspects of DNA sequencing in order to allow the
agencies to formulate programs to achieve the goals of the Human
Genome Program.  The group recognized that sequencing programs
are, by necessity, multi-disciplinary, and therefore the mission
of this working group will overlap those of the working groups
for mapping and informatics.  There was also agreement that,
because technology is evolving so rapidly in this area, it will
be the role of this committee and the staff of the agencies to
keep the rest of the scientific community abreast of the current
state of the technology in the field.

OVERVIEW OF CURRENT STATE OF THE FEDERAL PROGRAMS

The NIH program currently funds approximately $1 million per year
in sequencing projects and $1.5 million per year in DNA sequencing
technology development.  The DOE funds only sequencing technology
development at present and estimates that it spends about $5.7
million per year on this effort.  The DOE is considering a
sequencing program in the National Laboratories that would
concentrate on sequencing cDNAs to support the physical mapping
effort.

STATUS OF CURRENT LARGE SEQUENCING PROJECTS

There are presently several efforts to sequence bacterial DNA
that have been ongoing for three or more years.  These efforts
have proceeded more slowly than anticipated because of
difficulties encountered in reading autoradiographs and managing
the large amount of data accurately.  Two efforts to sequence E.
coli have each completed nearly 100 kb of finished sequence to
date and expect to generate data at a considerably greater rate
this summer.

Automated sequencers are performing well in several laboratories.
These labs can produce 7,000-8,000 bp of raw DNA sequence per day
per machine with an accuracy of greater than 99%.  Several
laboratories have recently generated 100 kb of finished sequence
within several months and expect to complete 1 Mb of finished
sequence in the next year.  The largest cost in these projects is
personnel, so most of the laboratories attempting large-scale
sequencing are trying to automate as much of the sample
preparation and data handling as possible.
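
As a rough illustration of the throughput figures above, the
following Python sketch estimates how many machine-days it would
take to read enough raw sequence for 1 Mb of finished sequence,
and bounds the expected number of miscalled bases in raw reads.
The redundancy (raw bases read per finished base) is not given in
this summary; the value of 8 used here is purely a hypothetical
assumption, as are the function names.

```python
def machine_days(finished_bp, raw_bp_per_day, redundancy):
    """Machine-days needed to read enough raw sequence for finished_bp.

    redundancy = raw bases read per finished base (ASSUMED; the
    summary does not give this figure).
    """
    raw_needed = finished_bp * redundancy
    # Ceiling division on integers: a partial day still occupies the machine.
    return -(-raw_needed // raw_bp_per_day)


def max_expected_raw_errors(raw_bp, min_accuracy):
    """Upper bound on expected miscalled bases in raw_bp of raw reads."""
    return raw_bp * (1.0 - min_accuracy)


if __name__ == "__main__":
    ASSUMED_REDUNDANCY = 8  # hypothetical value, not from the report
    for rate in (7000, 8000):  # raw bp per day per machine, from the text
        days = machine_days(1_000_000, rate, ASSUMED_REDUNDANCY)
        print(f"{rate} bp/day: {days} machine-days for 1 Mb finished")
    errors = max_expected_raw_errors(100_000, 0.99)
    print(f"at most about {errors:.0f} errors per 100 kb of raw sequence")
```

Under the assumed redundancy, 1 Mb of finished sequence works out
to roughly 1,000 to 1,150 machine-days, so a one-year target
implies several machines running in parallel.  A different
redundancy scales the estimate proportionally.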

RECOMMENDATIONS

THE NIH-DOE PROGRAMS SHOULD SUPPORT DEVELOPMENT OF NEW TECHNOLOGY
AS WELL AS SEVERAL PROJECTS THAT ATTEMPT MEGABASE LEVEL
SEQUENCING USING ADVANCED STATE-OF-THE-ART TECHNOLOGY.  The
five-year goals of the genome project call for support of DNA
sequencing technology in order to reduce the cost of sequencing
to $0.50 per base pair and to produce 10 megabase pairs of human
and 20 megabase pairs of model organism finished DNA sequence by
the end of 1995.  In order to achieve these goals, projects that
test the feasibility of scaling up and extending state-of-the-art
technology must be initiated now.  These projects will, by
attempting to sequence at levels and rates never achieved before,
push the boundaries of sequencing technology while reducing the
costs.  Reasonable goals for such projects would be to sequence
at least three megabase pairs per year, using large contiguous
segments of DNA, at a cost of less than $0.75 per base pair
(total costs) by the end of the third year, and to attain a rate
of two to four megabase pairs per year at a cost of $0.50 per
base pair by the end of the fifth year.  Careful attention to
error rates will be necessary to maximize the value of the
sequence obtained.  Such programs must involve only biologically
interesting regions of DNA, so that the information gained will
be of maximum interest to the scientific community.
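
To make the cost targets in this recommendation concrete, the
following Python sketch multiplies the stated sequencing rates by
the stated per-base-pair costs.  The function name is
hypothetical, and the last figure assumes that the entire 30 Mb
of the 1995 goal is sequenced at the final target cost of $0.50
per base pair, which is a simplifying assumption rather than
something the report states.

```python
def sequencing_cost_dollars(base_pairs, cents_per_bp):
    """Total cost in dollars of sequencing base_pairs at cents_per_bp.

    The per-base cost is given in whole cents so that the
    multiplication is done on integers.
    """
    return base_pairs * cents_per_bp / 100


if __name__ == "__main__":
    MB = 1_000_000  # base pairs per megabase

    # Third-year goal: at least 3 Mb per year at under $0.75 per bp.
    print("year 3, 3 Mb at $0.75:", sequencing_cost_dollars(3 * MB, 75))

    # Fifth-year goal: 2 to 4 Mb per year at $0.50 per bp.
    for rate_mb in (2, 4):
        cost = sequencing_cost_dollars(rate_mb * MB, 50)
        print(f"year 5, {rate_mb} Mb at $0.50:", cost)

    # 1995 goal: 10 Mb human + 20 Mb model organism, ASSUMING every
    # base is sequenced at the $0.50 target (not stated in the report).
    print("30 Mb at $0.50:", sequencing_cost_dollars((10 + 20) * MB, 50))
```

On these figures, a single project meeting the third-year goal
would cost about $2.25 million per year at the $0.75 ceiling, and
one meeting the fifth-year goal between $1 million and $2 million
per year.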

THE NIH-DOE PROGRAMS SHOULD NOT, AT THIS TIME, SET UP A SEQUENCING
RESOURCE FOR SEQUENCING LIMITED REGIONS OF GENOMIC DNA.  While
this strategy might represent a cost-effective way of providing
sequencing for many laboratories that want to sequence a single
locus of interest, it was agreed that even with NIH and DOE
subsidizing the effort, the total cost of sequencing would be
very high at present, and it would be better to invest in
improving new technology.  In any case, it is expected that as
the cost of sequencing technology decreases, more companies
will be set up to provide this service.  There are already several
such service companies.  The focus of the genome project at this
stage should be to encourage technology development to make
sequencing cost effective for all laboratories.

THE NIH-DOE PROGRAMS SHOULD SUPPORT SEQUENCING AT THE MID-RANGE
(<500KB PAIRS) ONLY WHEN THE REGION IS OF EXTREME BIOLOGICAL
INTEREST.  Because the cost of sequencing on a limited scale is
very high at present, the programs should not support mid-range
sequencing projects until sequencing is more cost effective or
unless there is overwhelming biological interest in a particular
site.  Sequencing of small or mid-range regions should be
encouraged and supported by other institutes.

THE COMMITTEE BELIEVES THAT IT IS POSSIBLE TO MAKE DATA GENERATED
FROM SEQUENCING PROJECTS AVAILABLE TO THE COMMUNITY WITHIN 3-6
MONTHS OF THE SEQUENCE BEING FINISHED AND IN NO CASE SHOULD
SEQUENCE DATA BE HELD LONGER THAN ONE YEAR.  Although it is
reasonable for a laboratory that has done the sequencing to be
allowed first chance at its analysis, it is also important to
make the data available as quickly as possible.  The laboratory
that generated the data already has an inherent strategic
advantage over any other laboratory wishing to begin analysis of
the sequence.  Thus, it was agreed that an aggressive policy to
ensure timely release of data is needed.