JP2@CU.NIH.GOV (08/09/90)
NIH/DOE DNA SEQUENCING WORKING GROUP SUMMARY
MAY 10, 1990

The joint NIH/DOE DNA Sequencing Working Group conducted its first meeting on May 10, 1990. Following is a summary of its discussions.

ROLE OF THE WORKING GROUP

The role of the Working Group will be to advise the NCHGR and the DOE on all aspects of DNA sequencing in order to allow the agencies to formulate programs to achieve the goals of the Human Genome Program. The group recognized that sequencing programs are, by necessity, multi-disciplinary, and therefore the mission of this working group will overlap those of the working groups for mapping and informatics. There was also agreement that, because technology is evolving so rapidly in this area, it will be the role of this committee and the staff of the agencies to keep the rest of the scientific community abreast of the current state of the technology in the field.

OVERVIEW OF CURRENT STATE OF THE FEDERAL PROGRAMS

The NIH program currently funds annually approximately $1 million in sequencing projects and $1.5 million in DNA sequence technology development. The DOE funds only sequence technology development at present and estimates that about $5.7 million is spent annually on this effort. The DOE is considering a sequencing program in the National Laboratories that would concentrate on sequencing cDNAs to support the physical mapping effort.

STATUS OF CURRENT LARGE SEQUENCING PROJECTS

There are presently several efforts to sequence bacterial DNA that have been ongoing for three or more years. These efforts have proceeded more slowly than anticipated because of difficulties encountered in reading autoradiographs and managing the large amount of data accurately. Two efforts to sequence E. coli have each completed nearly 100 kilobase pairs of finished sequence to date and expect to be generating data at a considerably greater rate this summer.

Automated sequencers are performing well in several laboratories.
These labs can produce 7,000-8,000 bp of raw DNA sequence per day per machine with an accuracy of greater than 99%. Several laboratories have recently generated 100 kilobase pairs of finished sequence in a matter of several months and expect to complete 1 megabase pair of finished sequence in the next year. The largest cost in these projects is personnel, so most of the laboratories attempting large-scale sequencing are trying to automate as much of the sample preparation and data handling as possible.

RECOMMENDATIONS

THE NIH-DOE PROGRAMS SHOULD SUPPORT DEVELOPMENT OF NEW TECHNOLOGY AS WELL AS SEVERAL PROJECTS THAT ATTEMPT MEGABASE LEVEL SEQUENCING USING ADVANCED STATE-OF-THE-ART TECHNOLOGY.

The five-year goals of the genome project call for support of DNA sequencing technology in order to reduce the cost of sequencing to $0.50 per base pair and produce 10 megabase pairs of human and 20 megabase pairs of model organism finished DNA sequence by the end of 1995. In order to achieve these goals, projects that test the feasibility of scaling up and extending state-of-the-art technology must be initiated now. These projects will, by attempting to sequence at levels and rates never achieved before, push the boundaries of sequencing technology while reducing the costs. Reasonable goals for such projects would be to sequence at least three megabase pairs per year, using large contiguous segments of DNA, at a total cost of less than $0.75 per base pair by the end of the third year, and to reach a rate of two to four megabase pairs per year at a cost of $0.50 per base pair by the end of the fifth year. Careful attention to error rates will be necessary to maximize the value of the sequence obtained. Such programs must involve only biologically interesting regions of DNA, so that the information gained will be of maximum interest to the scientific community.
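
The cost and throughput figures quoted above can be checked with simple arithmetic. The sketch below is illustrative only and is not part of the Working Group's summary. The per-base prices, the 10 and 20 megabase targets, and the 7,000-8,000 bp per machine per day figure come from the text. The redundancy value (raw bases read per finished base) is a hypothetical assumption introduced here, since the summary gives none.

```python
MB = 1_000_000  # base pairs in one megabase


def total_cost(finished_bp, dollars_per_bp):
    """Cost in dollars of producing finished_bp at a flat per-base price."""
    return finished_bp * dollars_per_bp


def machine_days(finished_bp, raw_bp_per_day, redundancy):
    """Machine-days needed if each finished base is read `redundancy` times."""
    return finished_bp * redundancy / raw_bp_per_day


# Figures quoted in the summary.
five_year_target_bp = 10 * MB + 20 * MB      # human + model organism sequence
goal_cost = total_cost(five_year_target_bp, 0.50)
interim_cost_per_mb = total_cost(MB, 0.75)   # one megabase at the third-year price

# ASSUMPTION: a redundancy of 8 is a hypothetical value, not from the summary.
ASSUMED_REDUNDANCY = 8
days_fast = machine_days(MB, 8000, ASSUMED_REDUNDANCY)
days_slow = machine_days(MB, 7000, ASSUMED_REDUNDANCY)

print(f"30 Mb at $0.50/bp: ${goal_cost:,.0f}")
print(f"1 Mb at $0.75/bp: ${interim_cost_per_mb:,.0f}")
print(f"Machine-days per finished Mb: {days_fast:.0f} to {days_slow:.0f}")
```

At the $0.50 goal, the combined 30 megabase target comes to $15 million in sequencing cost. The machine-day estimate scales directly with the assumed redundancy, so it should be read as an order-of-magnitude figure only.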

THE NIH-DOE SHOULD NOT, AT THIS TIME, SET UP A SEQUENCING RESOURCE FOR SEQUENCING LIMITED REGIONS OF GENOMIC DNA.

While this strategy might represent a cost effective way of providing sequencing for many laboratories that want to sequence a single locus of interest, it was agreed that even with NIH and DOE subsidizing the effort, the total cost of sequencing would be very high at present, and it would be better to invest in improving new technology. In any case, it is expected that as the cost of sequencing technology is decreased, more companies will set up to provide this service. There are already several such service companies. The focus of the genome project at this stage should be to encourage technology development to make sequencing cost effective for all laboratories.

THE NIH-DOE PROGRAMS SHOULD SUPPORT SEQUENCING AT THE MID-RANGE (<500KB PAIRS) ONLY WHEN THE REGION IS OF EXTREME BIOLOGICAL INTEREST.

Because the cost of sequencing on a limited scale is very high at present, the programs should not support mid-range sequencing projects until sequencing is more cost effective or unless there is overwhelming biological interest in a particular site. Sequencing of small or mid-range regions should be encouraged and supported by other institutes.

THE COMMITTEE BELIEVES THAT IT IS POSSIBLE TO MAKE DATA GENERATED FROM SEQUENCING PROJECTS AVAILABLE TO THE COMMUNITY WITHIN 3-6 MONTHS OF THE SEQUENCE BEING FINISHED AND IN NO CASE SHOULD SEQUENCE DATA BE HELD LONGER THAN ONE YEAR.

Although it is reasonable for a laboratory that has done the sequencing to be allowed first chance at its analysis, it is also important to make the data available as quickly as possible. The laboratory that generated the data already has an inherent strategic advantage over any other laboratory wishing to begin analysis of the sequence. Thus, it was agreed that an aggressive policy to ensure timely release of data is needed.