More than half of all published studies of human mitochondrial DNA (mtDNA) sequences contain mistakes, according to a geneticist at the University of Cambridge.

To the occasional chagrin of his peers, Peter Forster has repeatedly pointed out errors in published sequences of mtDNA, the genetic material found in cells' mitochondria, which is inherited from the mother. But his commentary in the latest issue of Annals of Human Genetics [1] argues that the problem is far bigger than researchers had imagined.

The mistakes may be so extensive that geneticists could be drawing incorrect conclusions in studies of human populations and evolution, says Forster. They may also confuse forensic analyses that rely on the published sequences, he adds.

“I was surprised by the number of errors,” says Eric Shoubridge, a geneticist at McGill University's Montreal Neurological Institute in Canada, who investigates human diseases that result from problems with mtDNA. “What concerns me most is that these errors could be compounded in the databases.”

Published mtDNA sequences are popular tools for investigating the evolution and demography of human populations. Forster has been compiling a database of corrected mitochondrial sequences published since 1981, when the complete sequence of human mtDNA, known as the 'Cambridge reference sequence', was published [2]. His colleagues' responses when he informs them of errors are varied. “Antagonism would be an understatement in some cases,” he says. But he adds that authors are often forthcoming in resolving discrepancies in sequences.

Neil Howell, vice-president for research at MitoKor, a San Diego-based biotech company that specializes in mitochondrial disease, says that Forster's error-detection method, which involves constructing evolutionary trees based on how sequences change, may even underestimate the extent of the errors. Howell jointly led the team that resolved minor errors in the Cambridge reference sequence [3].

The extent to which the field has been affected by mtDNA sequence errors is an open question. Forster's recent commentary was commissioned to accompany a controversial paper that re-appraises mtDNA data on European populations. The paper claims to refute earlier work that says Icelanders are less genetically heterogeneous than other European groups [4]. How far erroneous sequence data bear on the Icelandic question remains unresolved.

Known errors have led to a retraction of conclusions in at least one high-profile case. In 1999, a team led by scientists at the University of Cambridge seemed to have overturned dogma when they claimed to have shown mixing of maternal and paternal mtDNA by genetic recombination [5]. But their results were subsequently found to be based on mistakes in the sequence.

Some scientists question the impact that most of the sequence errors identified by Forster will have on published conclusions. “I would say that it is only a small number of cases where sequencing errors would affect the results,” says Vincent Macaulay, a statistician at the University of Oxford.

Perhaps the gravest concern surrounds forensic investigations. Because large numbers of mitochondria are present in cells, mtDNA is often used to identify degraded samples from which nuclear DNA cannot be obtained. But the region of mtDNA typically used in forensics, the 'control region', is highly variable, says geneticist Douglas Wallace of the University of California, Irvine. “People don't appreciate the fact that the control region can undergo different mutations in different cells,” he says. For instance, there might be differences between mtDNA from someone's blood and from the same person's hair follicle. “This erodes the reliability of forensic assays,” he says.

Most errors in published mtDNA sequences are the result of poor documentation, Forster claims. “Mistakes occur between reading the sequencing output and publishing the results,” he says. Journals are as much to blame as scientists, he adds, saying that editors should be more vigilant.

Forster notes that nuclear DNA sequences in public databases are also plagued by errors, and that this may be an even bigger problem, as such mistakes are more difficult to detect.