Sir

I was contacted recently by the Los Angeles Times, which had heard that the concept of expressed sequence tags (ESTs), essential to the human genome project, was developed in our laboratory. It is not generally known that we had described this development in 1983 in Nature (ref. 1), preceding by many years the adaptation of the technique to the genome project (refs 2, 3).

The genome project has important commercial applications. Still, I think it is essential for the community at large to appreciate that the development of the concept of ESTs and shotgun sequencing had nothing to do with commercial interests, but rather was motivated by our intellectual curiosity. The generosity and open-mindedness of the National Institutes of Health and the Muscular Dystrophy Foundation enabled us to tackle the question in the spirit of ‘let's just do it and see what happens’.

In 1982, my laboratory was in the Department of Biology at the Massachusetts Institute of Technology. DNA sequencing was a relatively new method developed along two different and independent lines by Walter Gilbert at Harvard and Fred Sanger at the MRC Laboratory in Cambridge, UK. We had started with the Maxam–Gilbert chemical method in the late 1970s, but in the 1980s we gradually incorporated Sanger's powerful dideoxy approach, which is now used almost exclusively in automated instruments.

Most people then were interested in obtaining sequences of individual genes or of a complementary (c)DNA copy of a single messenger (m)RNA. This was viewed as an enormous undertaking, and typically involved determining a partial sequence of the desired gene product (a protein), then making a degenerate oligonucleotide for hybridization and primer extension or reverse transcription. Eventually, a gene would be cloned and sequenced. Often, however, the degenerate primers did not yield anything useful because they cross-hybridized to spurious sequences.

The question we asked was as follows: what if we simply did random, shotgun sequencing of all expressed genes associated with a specific tissue? If we then translated these sequences into all six reading frames, we should pick up fragmentary amino-acid sequences associated with the proteins of that specific tissue. These would include known and unknown proteins. We would have to use a single primer that bound to all expressed sequences, that is, the mRNAs. Because eukaryotic mRNAs have ‘tails’ of polyadenylic acid (poly A) at their 3′ ends, a single primer of oligo deoxythymidylate (oligo dT) could be used. With this primer, we could reverse-transcribe the mRNA pool and thereby pick up sequences near the 3′ ends of those mRNAs. These were the expressed sequence tags (although we did not call them that then).
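For readers unfamiliar with six-frame translation, the step can be sketched in a few lines of Python. This is only a modern illustration under stated assumptions, not the procedure used in 1982: the function names (`reverse_complement`, `translate`, `six_frame_translation`) and the example sequence are hypothetical, the standard genetic code is assumed, and any codon containing a base other than A, C, G or T is written as `X`.

```python
# Illustrative sketch only: six-frame translation of a DNA sequence
# using the standard genetic code. All names here are hypothetical.

BASES = "TCAG"
# Amino acids in the conventional TCAG codon order ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict:
    """Translate three forward and three reverse-strand reading frames."""
    seq = seq.upper()
    rev = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    # Made-up nine-base example, not a sequence from the original work.
    for frame, peptide in six_frame_translation("ATGGCCTAA").items():
        print(frame, peptide)
```

Each of the six peptide strings can then be compared with known protein sequences; a match in any one frame identifies the clone, whichever strand and register happened to be sequenced.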

At the time this idea seemed a bit far-fetched. The only way we could know whether we were on the right track was if among the sequences we obtained were some that clearly coded for the proteins specific to the tissue we had used for the preparation of the cDNA. If we could see such sequences in our collection, then we would know that the many unassigned sequences were also relevant and coded for proteins that had yet to be discovered.

The approach succeeded beyond our wildest expectations. We prepared a cDNA library from rabbit muscle and did shotgun sequencing of randomly isolated clones. Altogether about 20,000 nucleotides were manually determined (a large number in that era). Among the clones we identified were coding sequences for 13 different muscle proteins, in addition to a new isotype of one protein (troponin T). This result gave us some confidence that, though some of the unassigned sequences might have arisen from artefacts, many of them undoubtedly encoded new proteins.

For many years we received requests for specific clones that people could use for their own purposes. (I had agreed with the editor of Nature before publication of the paper that we would freely distribute these clones.) Subsequently, the concept of ESTs and the strategy of shotgun sequencing were developed and expanded in the genome project (refs 2, 3), including its commercial applications.

I have long been a proponent of and participant in the biotechnology industry, although not with respect to the work described here. It should not be forgotten, however, that the origins of success for many biotechnology enterprises are in basic research sponsored by a government or charitable foundation. I believe that this is as it should be.

But my fear is that the public at large may believe that private enterprise per se is sufficient to generate the discoveries and advances needed for the future. This is simply not true. Only the government and private charities have the capacity and vision to let an investigator pursue questions that come from his or her own natural curiosity. Eventually, the pursuit of pure curiosity bears fruit for society when new knowledge and concepts — such as ESTs and shotgun sequencing — are taken up by industry, where true applications can be financed.