Main

AIDS is arguably the most serious infectious disease to have affected humankind. Not only are an estimated 42 million people carrying the virus at present1, but its case fatality rate is close to 100%, making it an infection of devastating ferocity. In 2002 alone, 5 million people became infected with the causative agent — the human immunodeficiency virus (HIV) — and, of these, 70% live in sub-Saharan Africa. Although a succession of antiviral agents has made HIV/AIDS a more manageable disease in some industrialized nations, and several vaccines are about to enter Phase III clinical trials, HIV will doubtless continue to impose a terrible burden of morbidity and mortality (Box 1).

Although the development of anti-HIV drugs and vaccines has often been a frustrating exercise, and many aspects of HIV pathogenesis remain unclear, research into the origins and evolution of the virus has proven more fruitful. Ironically, one of the main reasons that successful treatment of HIV infection is so difficult — its rapid rate of evolutionary change — has allowed us to reconstruct the evolutionary history of HIV with great precision. More importantly, establishing the ground rules that underpin the evolution of HIV will lead to better vaccines and antiviral agents. Here, we review the current evidence documenting the origin of HIV, discuss how the virus evolves in relation to the human immune system and argue that evolutionary ideas are essential for the successful control of the virus. Basic aspects of the biology of HIV are illustrated in Fig. 1.

Figure 1: Key aspects of the HIV life cycle.

Although the human immunodeficiency virus (HIV) is able to infect a variety of cell types, AIDS results from the depletion of CD4+ T-HELPER LYMPHOCYTE CELLS, a key component of the human immune system. The env (envelope) gene encodes the proteins of the outer envelope of the virus, the gag (group-specific antigen) gene encodes the components of the inner capsid protein, and the pol (polymerase) gene codes for the enzymes (such as REVERSE TRANSCRIPTASE) that are used in viral replication.

The origins of HIV

The key to understanding the origin of HIV was the discovery that closely related viruses — the simian immunodeficiency viruses (SIVs) — were present in a wide variety of African primates2. Collectively, HIV and SIV comprise the primate lentiviruses, and SIVs have been isolated in more than 20 African primate species. Importantly, in no case (other than laboratory-associated infections of Asian macaque monkeys) has it been shown that the SIVs cause disease in their hosts, although only a few studies of their natural history in wild populations have been undertaken.

Molecular phylogenies of HIV and SIV. The evolutionary history of HIV-1 and HIV-2 has been reconstructed in great detail by inferring phylogenetic trees of the primate lentiviruses. It was soon discovered that the two human viruses are related to different SIVs and therefore have different evolutionary origins (Fig. 2). Specifically, HIV-1 is most closely related to SIVcpz, which is found in some sub-species of chimpanzee (Pan troglodytes troglodytes and Pan troglodytes schweinfurthii) that inhabit parts of equatorial Western and Central Africa, respectively3,4. SIVcpz from P. t. troglodytes is of greatest interest because it shares the closest relationship with the abundant HIV-1 M group. The geographical range of P. t. troglodytes also encompasses the region in Africa that has the greatest genetic diversity of HIV-1, containing groups M, N and O; such a distribution is expected if this was where HIV-1 first emerged. By contrast, HIV-2 is most closely related to SIVsm5, which is found at high prevalence in sooty mangabey monkeys (Cercocebus atys). As with chimpanzees and HIV-1, sooty mangabeys are most abundant in the regions of West Africa where HIV-2 is likely to have emerged.

Figure 2: Evolutionary history of the primate lentiviruses.

Because both the human immunodeficiency virus type 1 (HIV-1) and HIV-2 lineages (red branches) fall within the simian immunodeficiency viruses (SIVs) that are isolated from other primates, they represent independent cross-species transmission events. The tree and other evidence also indicate that HIV-1 groups M, N and O represent separate transfers from chimpanzees (SIVcpz), again because there is a mixing of the HIV-1 and SIV lineages. Similarly, HIV-2 seems to have been transferred from sooty mangabey monkeys (SIVsm) on many occasions, although this is best documented in other phylogenetic analyses2. The tree was reconstructed using a MAXIMUM LIKELIHOOD METHOD on an alignment of 34 published nucleotide sequences of the viral polymerase (pol) gene (Fig. 1), excluding third codon positions (full details from the authors on request). Other abbreviations for viruses and their primate hosts are as follows: SIVcol, black and white colobus; SIVdrl, drill; SIVgsn, greater spot-nosed monkey; SIVlhoest, L'Hoest monkey; SIVmac, macaque; SIVmnd, mandrill; SIVmon, Campbell's mona monkey; SIVrcm, red-capped monkey; SIVsab, Sabaeus monkey; SIVsun, sun-tailed monkey; SIVsyk, Sykes' monkey; SIVtan, tantalus monkey; SIVver, vervet monkey. For clarity, only some subtypes of HIV-1 and HIV-2 are shown. All gene sequences were taken from GenBank (see online links box).

Molecular phylogenies also show that there have been many cross-species transmissions to humans, because there is a mixing of HIV and SIV lineages (Fig. 2). Although the vagaries of sampling make it difficult to determine exactly how many cross-species transfers have occurred, for HIV-2 this number might be at least four2, whereas three jumps from chimpanzees to humans are thought to explain the current diversity of HIV-1, such that groups M, N and O each have an independent origin3,6. However, inter-specific recombination, which might be common among the primate lentiviruses, greatly complicates this analysis7. For example, HIV-1 group N seems to be a recombinant between a SIVcpz strain and a virus related to the ancestor of group M (Ref. 3), but this event occurred before the establishment of groups M and N in humans. SIVcpz is itself a composite of viruses, the descendants of which are now found in red-capped mangabeys (Cercocebus torquatus, SIVrcm) and greater spot-nosed monkeys (Cercopithecus nictitans, SIVgsn)8.

Dating the evolution of primate lentiviruses. To fully understand the evolutionary history of HIV, it is also necessary to determine its timescale. At face value, the wide host-species range of the SIVs, as well as their low VIRULENCE and strong association with specific species9, argues for an ancient evolutionary history, perhaps representing virus–host co-divergence over several million years. Indeed, in some cases the phylogeny of the SIVs matches that of their primate hosts, as expected under a co-divergence model10. However, there are an increasing number of instances in which host and virus phylogenies are mismatched, implying more recent cross-species transfer in the SIVs8,11,12. If cross-species transfer is in fact the main mode of evolution, then any resemblance between host and virus trees might be because related primate species are often found in adjacent geographical ranges, or because host switching is most likely to occur between closely related host species13.

The timescale of evolution inferred from viral MOLECULAR CLOCKS also seems incompatible with long-term co-divergence. If co-divergence were true, then the divergence times of SIVs should broadly match those of their hosts, going back millions of years. However, all molecular clock estimates of primate lentivirus evolution are orders of magnitude more recent than this14, and the rates of mutation and replication are similar among these viruses15,16. Therefore, if molecular clocks are accurate, then the evolutionary timescale for each epidemic of HIV-1 and HIV-2 is measured only in decades. Several methods are available to measure substitution rates, and therefore divergence times, in RNA viruses, although the most reliable estimates come from analysing the temporal distribution of nodes on trees (Box 2). Application of these (and similar) methods has led to suggestions that the M group of HIV-1 originated in the 1930s, with a range of 10 years on either side17,18,19 (but see Box 2). A broadly similar evolutionary timescale has been proposed for HIV-2 (Ref. 20).
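
The logic of dating from serially sampled sequences can be illustrated with a root-to-tip regression, a deliberately simple stand-in for the tree-based methods cited above and not the procedure used in those studies. The sketch assumes a strict clock and regresses genetic distance from the root on sampling year: the slope estimates the substitution rate and the x-intercept estimates the date of the common ancestor. The function name and the data are hypothetical; the distances are synthetic values generated from an assumed rate and root date, not measurements.

```python
def root_to_tip_regression(years, distances):
    """Least-squares fit of distance = rate * (year - t_root).

    Returns (rate, t_root): the slope and the x-intercept of the fitted line.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    rate = sxy / sxx                    # substitutions per site per year
    intercept = mean_y - rate * mean_x  # distance at year 0 (no biological meaning)
    t_root = -intercept / rate          # year at which the fitted distance is zero
    return rate, t_root


# Synthetic data (assumption): a strict clock of 0.002 substitutions per site
# per year and a root in 1930, sampled once per decade.
years = [1960, 1970, 1980, 1990, 2000]
distances = [0.002 * (y - 1930) for y in years]

rate, t_root = root_to_tip_regression(years, distances)
print(f"rate = {rate:.4f} per site per year, root year = {t_root:.0f}")
```

With real sequences the points scatter around the line, and recombination or rate variation among lineages can bias both the slope and the intercept, which is why such estimates are best treated as approximate.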

Although most estimates for the time of origin of HIV-1 are consistent, all can be subject to the same systematic bias. Once again, recombination might contribute to this error. Recombination has complex effects on the estimation of divergence times, by increasing apparent variation in rates among nucleotide sites and reducing genetic distances between sequences21,22. In these circumstances, perhaps the only reliable indicators of the timescale of HIV evolution are archival viral samples. The earliest HIV-1 M group sequence that is available was sampled in the Democratic Republic of Congo in 1959 (Ref. 23). That this sequence falls some distance from the root of the M group tree is strong evidence that the diversification of these viruses occurred before this time (Box 2). Accurately dating HIV evolution will require the analysis of more such 'fossil' viruses.

Emergence of the human diseases. By what mechanisms did HIV jump to humans? Although several theories have been proposed, there is little evidence to suggest that anything other than entirely 'natural' processes are responsible for its emergence2. In particular, given the frequency with which primate bushmeat that is sold in African markets is infected with SIV24, it is easy to envisage how individuals that are involved in the slaughter of animals or the preparation of food could become infected. Indeed, it is likely that SIV jumped into humans many times before the transmission pathways leading to the current AIDS epidemics were established, and that these incipient infections occurred in isolated rural communities and soon burnt out because of a lack of susceptible hosts.

It is also important to explain why HIV-1 forms subtypes, such that virus sequences tend to fall into distinct clusters with approximately equal genetic distances between them (10–30%, depending on the genes compared). These clusters are most likely to be produced by a combination of FOUNDER EFFECTS and incomplete sampling. In particular, intensive viral collection from West and Central Africa has now uncovered strains that fall between the previously described subtypes25. This indicates that these parts of Africa were the source of the strains that ignited successful epidemics in other localities, in Africa and beyond, and that the subtype structure of the HIV-1 tree to a large extent reflects sampling bias26. For example, most HIV-1 strains that were isolated in North America and Europe fall into subtype B, and their relative similarity reflects their recent common origin from a founder in, or from, Africa. Under the evolutionary timescale proposed for group M, the individual subtypes would have diversified in the last 40 years or so17,27, although there is debate as to whether subtype B originated in the 1960s or the 1970s28.

The phylogenesis of HIV-1 is therefore a dynamic process, such that subtypes will disappear and new epidemics arise (Fig. 3). More importantly, the current recommendations of what constitutes a subtype will become meaningless as phylogenetic complexity increases through the continued spread of the epidemic, with recombination among currently recognized types and more widespread sampling. The mean genetic divergences of subtype A, B, C and D genomes isolated in 1999 (9.8%, 10.0%, 7.9% and 8.5%, respectively) are each comparable to that of the whole group M epidemic in 1985 (8.8%). The genetic divergence of the entire group M has increased to 14.9% in the same period (data from the HIV Database; see online links box). Although some of this increase can be ascribed to increased sampling, the substitution rate that this implies (0.002 substitutions per site, per year) lies within the range calculated previously for the viral env (envelope) and gag (group-specific antigen) genes (0.0028 and 0.0019, respectively)17.
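
The implied rate can be checked with a short calculation. The sketch below assumes that mean pairwise divergence grows linearly and that each pair of sequences accumulates changes along two lineages at once, hence the division by two; the text does not spell out the exact calculation used, so this is one plausible reading rather than a reconstruction of it. The function name is hypothetical.

```python
def rate_from_divergence(div_start, div_end, year_start, year_end):
    """Per-lineage substitution rate implied by a rise in mean pairwise divergence.

    Divergences are proportions (0.088 for 8.8%). A pair of sequences diverges
    along two branches simultaneously, so the pairwise increase is halved.
    """
    return (div_end - div_start) / (2 * (year_end - year_start))


# Group M divergence: 8.8% in 1985 and 14.9% in 1999 (values from the text).
rate = rate_from_divergence(0.088, 0.149, 1985, 1999)
print(f"{rate:.4f} substitutions per site per year")
```

Under these assumptions the result is about 0.0022 substitutions per site per year, in line with the rounded figure quoted above and between the earlier estimates for gag and env.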

Figure 3: The phylogenesis of HIV.

The current global genetic diversity of HIV-1 group M is the result of several historical events in which geographically isolated epidemics (lower coloured triangles) were founded from strains that were present in a source population (large base triangle), most likely in the west of Central Africa25. Within each of these epidemics, frequent mixing of strains results in a complex recombinant structure (arrows within triangles). Subsequently, owing to global travel, the geographical ranges of these epidemics have increasingly overlapped, resulting in inter-subtype circulating recombinant forms (arrow between triangles), the number and mosaic complexity of which has steadily been increasing.

Evolutionary processes

Within-host evolution. One of the earliest and most striking observations made about HIV is the extensive genetic variation that the virus has within individual hosts, particularly in the hypervariable regions of the env gene29; this variation makes HIV one of the fastest evolving of all organisms. Such rapid evolution is the result of an explosive combination of factors. First, the virus experiences a high rate of mutation, with reverse transcriptase making 0.2 errors per genome during each replication cycle30, and further errors occurring during transcription from DNA by RNA polymerase II. Second, HIV has remarkable replicatory dynamics: it has a viral generation time of 2.5 days and produces 10^10–10^12 new VIRIONS each day31. Finally, frequent recombination and natural selection further elevate its rate of evolutionary change.
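
A back-of-the-envelope calculation shows why this combination is so potent. The sketch below multiplies the error rate by the daily virion output and compares the result with the number of distinct single-nucleotide changes available to the genome. It rests on strong simplifying assumptions that are not in the text: a genome length of roughly 9,700 nucleotides, one independent replication cycle per virion, and mutations spread evenly over all sites. Because many virions are non-infectious or descend from the same provirus, the output is best read as a crude upper bound. The function name is hypothetical.

```python
def daily_mutation_supply(virions_per_day, errors_per_genome=0.2, genome_length=9700):
    """Rough daily supply of new mutations within one host.

    genome_length is an assumed approximate value for HIV-1.
    Returns (new mutations per day, mean occurrences per day of each
    possible single-nucleotide substitution).
    """
    new_mutations = errors_per_genome * virions_per_day
    possible_substitutions = 3 * genome_length  # three alternative bases per site
    return new_mutations, new_mutations / possible_substitutions


for virions in (1e10, 1e12):
    total, per_change = daily_mutation_supply(virions)
    print(f"{virions:.0e} virions/day: {total:.1e} mutations/day, "
          f"each substitution arising ~{per_change:.1e} times/day")
```

Even at the lower bound of virion production, every possible point mutation would be expected to arise many times a day under these assumptions, so the supply of variation is unlikely to limit adaptation.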

Two key evolutionary questions arise when considering within-host genetic variation: how much of this diversity is shaped by IMMUNE SELECTION, and what is the relationship between genetic diversity and clinical outcome? Although it is possible that stochastic forces (GENETIC DRIFT) are involved in intra-host evolution, especially when the viral EFFECTIVE POPULATION SIZE is small (perhaps owing to population subdivision32), and when advantageous mutations are at low frequency, there is strong evidence that natural (positive) selection is the driving force of intra-host evolution. Thus, HIV successively FIXES mutations that allow it to evade immune responses, especially in the env gene. Host immune-selection pressure could be generated by neutralizing antibodies33 (for example, in the guise of an evolving GLYCAN SHIELD34), by T-helper cells35 or by CYTOTOXIC T LYMPHOCYTES (CTLs)36,37,38,39. CTL escape has been particularly well characterized, especially in SIV models40,41, where it can have a large impact on disease outcome42. Positive selection is also easily detectable in computational analyses of HIV gene sequences, in which the per site rate of NONSYNONYMOUS SUBSTITUTION (dN) exceeds that of SYNONYMOUS SUBSTITUTION (dS)35,43,44, and in the structure of intra-host phylogenies (see below). The remarkable strength of immune selection was revealed in a recent analysis of evolutionary dynamics in 50 HIV-1 patients45. Most fixed env amino-acid changes in these patients confer a selective advantage, with an average of one adaptive fixation event every 2.5 months. Under these criteria, HIV shows stronger positive selection than any other organism studied so far.
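
The comparison of nonsynonymous and synonymous change can be made concrete with a simplified counting scheme in the spirit of the Nei and Gojobori approach. This is a sketch, not the method used in the studies cited: it returns uncorrected proportions (pN and pS) rather than dN and dS, ignores codon pairs that differ at more than one position, and treats changes to stop codons as nonsynonymous. The two short sequences are invented for illustration, and all function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Expected number of synonymous sites in a codon (between 0 and 3)."""
    silent_changes = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                    silent_changes += 1
    return silent_changes / 3  # each site has three possible changes


def pn_ps(seq1, seq2):
    """Proportions of nonsynonymous (pN) and synonymous (pS) differences.

    Simplification: codon pairs with two or three differences are skipped.
    Raises ZeroDivisionError if no synonymous or nonsynonymous sites remain.
    """
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, len(seq1) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff > 1:
            continue
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if n_diff == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    return nonsyn_diffs / nonsyn_sites, syn_diffs / syn_sites


# Invented aligned coding sequences: one silent change (GCT > GCC)
# and one amino-acid change (AAA > AGA, Lys > Arg).
pn, ps = pn_ps("ATGGCTAAAGGT", "ATGGCCAGAGGT")
print(f"pN = {pn:.3f}, pS = {ps:.3f}, pN/pS = {pn / ps:.3f}")
```

A ratio above one is the signature of positive selection described in the text; in this toy example the ratio is below one because synonymous sites are far fewer than nonsynonymous sites, so a single silent change weighs more heavily.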

AIDS pathogenesis. Less clear is the role, if any, played by viral evolution in the development of AIDS in HIV-infected individuals. One early hypothesis was that there was a direct link between genetic (or, more specifically, antigenic) variation and pathogenesis, such that the immune system was unable to suppress each replicating variant and the patient succumbed to AIDS46. However, studies on larger numbers of patients have painted a far more complex picture of the relationship between genetic diversity and disease status. Although there is some evidence that rates of nonsynonymous substitution vary according to disease status, the direction of the correlation and its underlying causes are more uncertain47,48,49. This could, in part, be because analyses of intra-host sequence variation have failed to distinguish between total genetic variation and selectively advantageous changes, the latter being key to understanding the interaction between host and virus. Moreover, recombination again complicates the analysis, as it might produce false-positive evidence for natural selection in measures of dN/dS50,51. How to tease apart the respective roles of selection and recombination in shaping genetic diversity is a key area for future research in evolutionary genetics.

It is also important to determine whether the evolution of virulent viruses within each host tips the balance towards AIDS. The first indication that this might be the case was that AIDS patients more frequently harboured viruses that formed SYNCYTIA in vitro (SI strains) than those who were asymptomatic52. More recently, the development of SI strains has been associated with a broadening of co-receptor usage. Although HIV preferentially infects CD4+ cells, it also utilizes CHEMOKINE co-receptors, most notably CCR5 and CXCR4. Viral strains that infect macrophages use the CCR5 receptor (known as R5 strains), dominate during the early years of HIV infection and do not induce syncytia in vitro (NSI strains)53. A broadening of co-receptor usage — so that strains also using the CXCR4 receptor emerge (known as X4 strains) — generally occurs later in infection48. These T-cell tropic SI strains are associated with a higher rate of CD4+ cell decline and therefore a more rapid progression to AIDS54. However, many HIV-infected patients progress to AIDS without the appearance of X4 viruses. Moreover, HIV-1 subtype C, now the most common subtype in sub-Saharan Africa, is mainly composed of R5 virus strains55, but causes AIDS with the same frequency as other subtypes. Consequently, although viruses of higher virulence often appear during intra-host viral evolution, this alone is not responsible for the development of AIDS.

Taken together, most studies of intra-host HIV evolution indicate that the extent and structure of viral genetic diversity is more a marker of the arms race between host and virus than the cause of AIDS itself. In particular, a strong host immune response generates a strong selective reply from the virus, as measured in dN/dS. Therefore, the greatest positive selection in HIV is generally seen in patients with longer asymptomatic phases35,45.

Evolution within and among hosts. The persistent nature of HIV infection means that evolution occurs both within and among hosts. Strikingly, intra-host and inter-host evolution seem to be very different processes, with positive selection dominating in the former but not the latter. This is evident in the structure of phylogenetic trees (Fig. 4). Intra-host HIV phylogenies have a strong temporal structure, reflecting the successive fixation of advantageous mutations and the extinction of unfavourable lineages. By contrast, those trees that track viral evolution among hosts show little evidence for continual positive selection. Rather, they depict the (neutral) spatial and temporal diffusion of the virus, with viral lineages co-existing for extended time periods. Indeed, there is little evidence that fitness differences determine subtype structure and distribution. For example, experimental studies have revealed that subtype C viruses consistently have lower in vitro fitness than those assigned to subtype B (Ref. 56). Although caution should be shown when extrapolating from the laboratory to nature, this indicates that the high prevalence of subtype C in sub-Saharan Africa is the result of its chance entry into populations with high rates of partner exchange. However, it is unclear whether the success of HIV-1 group M, relative to groups N and O, is the result of some intrinsic property of the virus that enhances transmissibility, or because the founding virus from group M was fortunate enough to find itself in populations in which the epidemiological conditions were ideal for transmission.

Figure 4: Contrasting patterns of intra- and inter-host evolution of HIV.

The tree was constructed using the NEIGHBOUR-JOINING METHOD on envelope gene-sequence data that were taken from nine HIV-infected patients48 (a total of 1,195 sequences, 822 base pairs in length), with those viruses sampled from each patient depicted by a different colour. In each case, intra-host HIV evolution is characterized by continual immune-driven selection, such that there is a successive selective replacement of strains through time, with relatively little genetic diversity at any time point. By contrast, there is little evidence for positive selection at the population level (bold lines connecting patients), so that multiple lineages are able to coexist at any time point. A major BOTTLENECK is also likely to occur when the virus is transmitted to new hosts.

Why is natural selection a less potent force among hosts than within them? The first factor is the bottleneck that accompanies inter-host transmission, which greatly reduces genetic diversity. Evidence for a strong bottleneck at transmission is the homogeneity of the virus during primary infection57,58, although this could depend on the mode of transmission59. The second important factor concerns the behavioural aspects of HIV transmission. HIV is predominantly a sexually transmitted disease, and so the extensive variation in rates of partner exchange will, in combination with the transmission bottleneck, generate strong genetic drift. As a result, strains with advantageous mutations could, by chance, find themselves in individuals with low rates of partner exchange and so will not be transmitted far in the population. Of more debate is whether a bottleneck has a selective component, so that strains that are better adapted to new hosts (such as R5 strains) competitively establish themselves in primary infection60, or whether it is entirely neutral61 and thereby only magnifies the effects of genetic drift.
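
The effect of a purely neutral bottleneck can be quantified with elementary probability. If a variant is present at a given frequency in the donor and a new infection is founded by a small number of virions drawn at random, the chance that the variant is absent from the founders is one minus its frequency, raised to the power of the number of founders. The sketch below assumes independent random sampling with no selective component; the frequencies and founder numbers are hypothetical values chosen for illustration, and the function name is likewise hypothetical.

```python
def prob_variant_lost(frequency, founders):
    """Probability that a variant at the given donor frequency is missing
    from a random sample of `founders` transmitted virions (neutral model)."""
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    if founders < 1:
        raise ValueError("at least one founder virion is required")
    return (1.0 - frequency) ** founders


# Hypothetical scenario: an advantageous mutant at 10% frequency in the donor.
for founders in (1, 5, 10):
    lost = prob_variant_lost(0.10, founders)
    print(f"{founders:2d} founder(s): lost with probability {lost:.2f}")
```

With a single founder the mutant is lost nine times out of ten regardless of any advantage it holds within the donor, which illustrates how transmission can blunt selection at the population level.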

Finally, some advantageous mutations, such as those conferring CTL escape, might not appear until relatively late in infection62. If these late-escape mutants do not arise until after most individuals have transmitted the virus, natural selection will be less effective at the population level. As a consequence, HIV strains might not readily adapt to the HLA HAPLOTYPE distributions of their local populations63, because some CTL-escape mutants have little opportunity for further transmission. The data presented to support the adaptation of HIV to HLA haplotypes at the population level only considered within-host evolution, albeit in a large number of patients, and did not measure the effect of transmission. Indeed, the fact that repeated individual adaptation was observed in these patients indicates that the HIV population as a whole was not adapted to the host HLA distribution. Moreover, although certain CTL-escape mutants can be transmitted through the population64, it is possible that CTL-escape mutations that are passed to individuals with the 'wrong' HLA background will sometimes be deleterious and removed by purifying selection. In summary, inter-host HIV evolution is not merely intra-host evolution played out over a longer timescale, and the evolutionary process that occurs within hosts will not select for viruses with enhanced transmissibility.

Recombination and HIV diversity. Genetic recombination is an integral part of the HIV life cycle, occurring when reverse transcriptase switches between alternative genomic templates during replication. As already mentioned, the recombination rate of HIV is one of the highest of all organisms, with an estimated three recombination events occurring per genome per replication cycle65, thereby exceeding the mutation rate per replication. The discovery that most infected cells harbour two or more different proviruses66, and the evidence for dual infection67,68, set the stage for recombination to have a central role in generating HIV diversity. Indeed, recombination has now been detected at all phylogenetic levels: among primate lentiviruses7,8, among HIV-1 groups69, among subtypes70 and within subtypes71. Prevalent inter-subtype recombinants are denoted 'circulating recombinant forms' (CRFs). There are 15 currently recognized CRFs that show a broad range of complexity and are widely distributed. In some geographical regions, CRFs account for at least 25% of all HIV infections72. Probably because it is more difficult to detect, the role of intra-subtype recombination has traditionally been downplayed. However, recent population-genetic studies indicate that recombination is also a pervasive force within subtypes71,73.

As hinted at earlier, recombination has important implications for understanding the HIV epidemic. In particular, many evolutionary inferences about HIV are made after the reconstruction of phylogenies, which can be greatly affected by recombination. Therefore, analyses of phylogenetic relationships, the timing of events, demographic processes or natural selection, are all potentially affected by recombination21,73,74. For example, although the data set used for dating the origin of HIV-1 M group to the 1930s was 'cleaned' for known recombinants before analysis17, the presence of recombination still seems evident, raising concerns about the accuracy of this estimate75. Because recombination is so frequent, it cannot be factored out by simply identifying recombinants and excluding those from the analysis. Indeed, today's subtypes might comprise old and successful recombinant lineages that trace back to a shared ancestral population (Fig. 3). HIV should therefore be studied with methods that are robust to the occurrence of recombination, or that explicitly take recombination into account. The development of such methods will doubtless prove difficult, but is necessary to make reliable inferences on many aspects of HIV-1 evolution. A simple way to start might be through the use of network approaches for phylogenetic inference, in which individual sequences are allowed to have many ancestors, and which provide a good alternative to traditional trees76,77.

Within individual hosts, recombination interacts with selection and drift to produce complex population dynamics, and perhaps provides an efficient mechanism for the virus to escape from the accumulation of deleterious mutations or to jump between ADAPTIVE PEAKS. Specifically, recombination might accelerate progression to AIDS78 and provide an effective mechanism (coupled with mutation) to evade drug therapy, vaccine treatment79 or immune pressure80,81. For example, vaccines without STERILIZING activity (such as CTL-inducing vaccines) or drug therapies with inconsistent application might offer the chance for SUPERINFECTION, in turn allowing recombination to produce viruses that carry many vaccine or drug escape mutations. Clearly, more studies are needed to quantify in vivo the role of recombination in generating important HIV diversity.

Consequences of HIV evolution

The evolution of drug resistance. The emergence of drug resistance in HIV has perhaps been the single largest setback in the treatment of AIDS. Highly active antiretroviral therapy (HAART), which involves combinations of drugs that act against different aspects of the viral life cycle, was once touted as a cure for HIV infection82, but there is now less optimism that it can rid the body of virus altogether. Mathematical models that predicted the eradication of virus from a patient within 2–3 years83 failed to adequately take into account two key aspects of HIV biology: viral reservoirs and evolution. No doubt, the development of HAART has greatly extended life expectancy and improved quality of life for those suffering from AIDS who can access these expensive drugs. Nevertheless, the failure of HAART as a cure has taught many important lessons about HIV biology84.

The discovery of latent reservoirs of HIV-1 in patients on HAART85 was one of the key components in understanding the ability of the virus to persist long after the initiation of therapy86. These reservoirs can serve to replenish the main pool of replicating virus and are now known from a variety of cell types, including CD4+ T lymphocytes, follicular dendritic cells87 and macrophages88, and are housed in various tissues throughout the body89. However, key questions remain regarding when and how these reservoirs become stocked with virus, whether or not replication is ongoing in or near the reservoirs, and whether the reservoirs are restocked with virus from subsequent replication. Although drug therapy is efficient at controlling viral levels in the plasma, the viruses in reservoirs are protected from interaction with drugs and can be extremely long-lived. It is therefore crucial to gain a clearer picture of the role and function of reservoirs in HIV infection.

Although the importance of viral evolution in immune escape is well understood, the evolutionary potential of HIV was severely underestimated by those proposing HAART as a cure for AIDS. This occurred for two reasons. First, whereas the evolution of drug resistance in single drug therapy (such as AZT) had been documented, conventional wisdom was that the virus would be unable to evolve resistance to the new triple-drug combination therapy, because mutations were believed to occur singly and to accumulate in an ordered fashion90. However, this ignored recombination, which allows the virus to accumulate and exchange drug-resistant mutations in a nonlinear fashion, leading to rapid evolution of drug-resistant mutants, even between different reservoirs81 (Fig. 5). Second, early studies from patients on HAART concluded that the virus was not undergoing evolution in those with 'undetectable' levels of viral load85,91. In fact, the data from these studies clearly indicated that the virus was evolving; sequences from different time points were separated by measurable branch lengths, indicating that mutational changes had accumulated. Evidently, it was only a matter of time before the virus hit the mutations that would confer drug resistance92,93. Thankfully, through computational and mathematical approaches, we are now gaining a better understanding of the evolutionary dynamics of HIV-1 and its reaction to drug therapy94.

Figure 5: Multiple drug resistance induced by recombination.

Hypothetical example showing how recombination will be an important mechanism to generate drug resistance in HIV. In this figure, two different HIV strains that are resistant to drug A (in red) and drug B (in blue) recombine to produce a new strain that is resistant to both drugs.
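
The hypothetical example can also be expressed as a toy model of template switching. In the sketch below, two aligned genomes each carry one resistance mutation, and a single switch between templates during copying yields a genome that carries both. The sequences, marker positions, breakpoint and function names are all invented for illustration and do not correspond to real resistance mutations.

```python
def recombine(template_a, template_b, breakpoint):
    """Copy template_a up to the breakpoint, then switch to template_b."""
    if len(template_a) != len(template_b):
        raise ValueError("templates must be aligned to the same length")
    return template_a[:breakpoint] + template_b[breakpoint:]


def resistance_profile(genome, markers):
    """Set of drugs resisted, given {drug: (position, resistant_base)}."""
    return {drug for drug, (pos, base) in markers.items() if genome[pos] == base}


# Hypothetical marker mutations: drug A at position 2, drug B at position 9.
markers = {"A": (2, "T"), "B": (9, "A")}
wild_type = "ACGTACGTACGT"
strain_a = wild_type[:2] + "T" + wild_type[3:]   # resistant to drug A only
strain_b = wild_type[:9] + "A" + wild_type[10:]  # resistant to drug B only

# One template switch between the two marker positions.
recombinant = recombine(strain_a, strain_b, breakpoint=6)
reciprocal = recombine(strain_b, strain_a, breakpoint=6)

print(sorted(resistance_profile(recombinant, markers)))
print(sorted(resistance_profile(reciprocal, markers)))
```

The reciprocal product carries neither mutation, so recombination redistributes existing mutations rather than creating them; under combination therapy, however, only the doubly resistant product would be expected to replicate.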

There is now a growing database of HIV mutations that confer drug resistance (see HIV drug resistance database in online links box) and their implications for drug resistance are, of course, enormous. More worryingly, there is evidence that some drug-resistant mutants show a greater infectivity, and in some cases a higher replication rate, compared with viruses without drug-resistant mutations95. Drug-resistant mutations are now being seen in drug-naive patients96, perhaps indicative of adaptation of the virus to a changed environment in parts of the world where drug treatment is widespread. However, it is unclear how many of these cases are simply the result of contact with a drug-treated individual, as in most cases drug-resistance mutations have an overall fitness cost in the absence of drugs97.

Viral genetic diversity and vaccine design. The development of an HIV vaccine has been frustratingly slow. Most attention has been directed towards the viral envelope region and comprises a variety of approaches, including recombinant proteins, synthetic peptides, recombinant viral vectors, recombinant bacterial vectors, recombinant particles, and whole-killed and live-attenuated HIV. The latter two have not progressed to clinical trials owing to an unfavourable benefit/risk ratio, which is further supported by experimental evidence from humans and simian models98,99.

Not least of the hurdles in the development of an effective HIV vaccine is the clinical trial process to show efficacy against the virus. Phase I and II vaccine trials are performed on small numbers of volunteers at relatively low risk for HIV-1 infection and attempt to establish a record of safety and provide valuable immunogenicity data. Phase III trials, on the other hand, are large-scale natural experiments on populations with high incidence and high risk for HIV-1 infection. Although more than 60 Phase I and II trials have taken place for more than 30 candidate vaccines, only two, developed by VaxGen (see online links box), have progressed to Phase III trials. The results from the first US-based Phase III trial showed that the vaccine had no efficacy for the target population experiencing subtype B infection. However, there was evidence of some efficacy in certain minority subpopulations, although the sample size in these categories was limited100. A similar (no efficacy) result was recently announced from their Phase III trial in Thailand.

The other intriguing result from the US trial was that the sampling greatly increased the estimates of HIV genetic diversity across that country100. As such, these data make it clear that, despite attempts to characterize genetic diversity in the USA for HIV vaccine development101, our database of HIV variation is still woefully inadequate, even for understanding diversity in a single subtype. We have relied on the published HIV database to serve as the basis for population-genetic and diversity studies, even though these data have largely been collected for medical studies. Recently, it has been suggested that reconstructions of ancestral HIV sequences or simple consensus sequences could be used to develop vaccine strains101,102. The intent of these techniques is to obtain a strain that is as similar as possible to all the strains of a given subtype and that will therefore provide the maximum possible coverage. Although these approaches are based on evolutionary reasoning, the long development and testing cycle of a new vaccine will mean that the genetic diversity of HIV might have changed considerably in the intervening time. It is therefore crucial, for the development of successful vaccines, to better understand the impact of drug resistance in HIV, to accurately document the overall evolution of this virus and, especially, to continually monitor the genetic diversity of HIV on a global scale. The use of carefully designed sampling strategies, concomitant with our improving understanding and modelling of the molecular evolution and epidemiology of this virus, will allow us to begin to predict how this diversity will change.
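
The simple consensus approach mentioned above amounts to taking the most common residue at each position of an alignment. The sketch below shows this for a toy alignment; it assumes the sequences are already aligned and of equal length, breaks ties alphabetically (an arbitrary choice made only so that the output is deterministic), and uses invented sequences. It does not attempt ancestral reconstruction, which requires a phylogeny and a substitution model. The function name is hypothetical.

```python
from collections import Counter


def consensus(aligned_seqs):
    """Majority-rule consensus of equal-length aligned sequences."""
    if not aligned_seqs:
        raise ValueError("at least one sequence is required")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned_seqs):
        counts = Counter(column)
        top = max(counts.values())
        # Ties are broken alphabetically so the result is reproducible.
        result.append(min(base for base, n in counts.items() if n == top))
    return "".join(result)


# Invented toy alignment of three strains from one subtype.
strains = ["ACGT", "ACGA", "TCGA"]
print(consensus(strains))
```

Because the consensus depends entirely on which strains were sampled, it inherits any sampling bias in the underlying database, which is one reason why continued monitoring of diversity matters for this approach.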

Conclusions

As this review has shown, great progress has been made in our understanding of the origins and evolution of HIV. However, it is clear that there are a number of unanswered questions and areas for which future research will be highly beneficial. Perhaps the issue of greatest importance is to fully determine the extent of the difference between viral evolution within and among hosts. With this knowledge, we will be better able to predict the long-term spread of drug resistance and CTL-escape mutations, as well as the likely impact of vaccination.