Cytology versus HPV testing for cervical cancer screening in the general population

This is not the most recent version

Abstract

This is a protocol for a Cochrane Review (Diagnostic test accuracy). The objectives are as follows:

To determine the diagnostic accuracy of HPV testing for detecting histologically confirmed CIN 2 or worse (CIN 2+), including adenocarcinoma in situ, in women participating in primary cervical cancer screening; and how it compares to the accuracy of cytological testing (liquid‐based and conventional) at various thresholds.

Background

Screening for cervical cancer meets many of the prerequisites that the World Health Organization (WHO) dictates as necessary for a useful mass screening programme (Wilson 1968). The disease is common enough to justify mass screening, it is associated with significant mortality, effective treatment is available for pre‐invasive or early invasive disease and, finally, detection and treatment of a presymptomatic state results in benefits beyond those obtained through treatment of symptomatic disease. An effective mass screening test, the Pap test, was introduced in the 1940s by George Papanicolaou and is based on the cytological morphology assessment of exfoliated cervical cells (Papanicolaou 1941). Organised screening programmes based on the Pap test have been successful in reducing the incidence and mortality from the disease, although cancer still does occur in women who regularly attend for screening (Laara 1987). In the last two decades it has been established that cervical cancer has a strong causal relationship with persistent infection with high‐risk human papillomavirus (HPV) types (IARC 2007). Since then, research efforts have focused on the evaluation of a test for the detection of HPV deoxyribonucleic acid (DNA) as an alternative method of screening for cervical cancer precursors.

Target condition being diagnosed

Worldwide, there are approximately half a million cases of cervical cancer annually and 85% of cases occur in the developing world (Ferlay 1992; Ferlay 2004). Cervical cancer accounts for 10% of all female cancers, making it the second leading cause of cancer death in women. It is the third most common gynaecological cancer in the UK, after ovarian and endometrial cancer, although before the introduction of the screening programme it was the most common (Quinn 1999). In the developed world the incidence of and mortality from cervical cancer appear to be falling, particularly in countries with systematic screening programmes (Arbyn 2009). Despite this trend, in developed countries cervical cancer remains the second most common cancer in women less than 45 years of age (Ferlay 1992; Ferlay 2004).

Infection of the uterine cervix with the high‐risk types of HPV is necessary for the development of cervical cancer, although HPV infection alone is usually not sufficient to cause cancer; additional co‐factors are required (Bosch 2002; IARC 2007). Most high‐risk HPV infections clear spontaneously but in a small proportion of women the infection persists. It is these women who are at risk of developing high‐grade cervical intraepithelial neoplasia (CIN) grades 2 or 3 and adenocarcinoma in situ, which are cancer precursors (Schiffman 2007). CIN 2 and 3 can be effectively treated by excision or ablation of the lesion. Over a period of 30 years, untreated CIN 3 progresses to invasive disease in approximately 25% to 30% of cases (McCredie 2008; McIndoe 1984).

Index test(s)

Pap test

Currently in the developed world screening for cervical cancer is carried out by means of cytological examination of a cervical smear (the Pap test). After visualization of the cervix with a speculum, the specimen is obtained with a sampling device, usually a spatula and in some instances a brush, which is rotated on the cervix. The collected material is applied to a glass slide (for conventional cytology) or the sampling device is rinsed in a preservative solution (for liquid‐based cytology).

Cytologists reading the Pap tests usually follow the Bethesda classification system for reporting cervical cytologic diagnoses (Solomon 2002). In this system the smears are reported as negative for intraepithelial lesion or malignancy; atypical squamous cells of undetermined significance (ASC‐US); atypical squamous cells, cannot exclude high‐grade lesion (ASC‐H); low‐grade squamous intraepithelial lesion (LSIL); high‐grade squamous intraepithelial lesion (HSIL); squamous cell carcinoma; atypical glandular cells (AGC); adenocarcinoma in situ (AIS); or adenocarcinoma. Women with an abnormal Pap test should be referred for further investigation, which includes either repetition of the cytology, HPV triage or colposcopy (Jordan 2008; Wright 2006). Cervical smears in the UK are reported using the British Society of Cervical Cytopathology (BSCC) terminology, which includes the categories of negative, inadequate, mild dyskaryosis, moderate dyskaryosis, severe dyskaryosis, possible invasive cancer, glandular neoplasia, and borderline changes. Women in the UK are referred for colposcopy if three consecutive smears are reported as inadequate; two consecutive smears as borderline; or any smear is reported as mild, moderate or severe dyskaryosis, possible invasive cancer or glandular neoplasia (NHSCSP 2004).
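The UK referral rule described above can be expressed as a small decision function. The following is a minimal sketch only: the category strings are illustrative spellings of the BSCC terms, the function names are hypothetical, and the rule is assumed to be evaluated each time a new smear result is added to a woman's history.

```python
from typing import Sequence

# Illustrative labels for the BSCC categories that trigger referral on a single smear.
IMMEDIATE_REFERRAL = {
    "mild dyskaryosis",
    "moderate dyskaryosis",
    "severe dyskaryosis",
    "possible invasive cancer",
    "glandular neoplasia",
}


def trailing_run(history: Sequence[str], category: str) -> int:
    """Count how many of the most recent smears in a row have the given category."""
    run = 0
    for result in reversed(history):
        if result != category:
            break
        run += 1
    return run


def needs_colposcopy(history: Sequence[str]) -> bool:
    """Apply the referral rule to a chronological list of smear results."""
    if not history:
        return False
    # Assumption: the rule is checked after every smear, so only the latest
    # result needs testing against the single-smear categories.
    if history[-1] in IMMEDIATE_REFERRAL:
        return True
    return (
        trailing_run(history, "inadequate") >= 3
        or trailing_run(history, "borderline") >= 2
    )
```

For example, a history of two consecutive borderline smears returns True, whereas borderline, negative, borderline does not, because the borderline results are not consecutive.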

The European executive policy is that women between the ages of 25 and 65 years are invited to have a cervical smear test every three to five years (Jordan 2008). The establishment of a population‐based screening programme with the ideal screening interval involves considerable infrastructure, workforce and equipment costs, which can be a barrier for implementation in developing countries.

HPV test

Because HPV cannot be grown in conventional cell cultures, and serological assays have only limited sensitivity (Dilner 1999), the diagnosis of HPV infection requires the detection of its genome in cellular samples collected from the site under investigation. In the case of the uterine cervix the test is performed by collecting exfoliated endocervical and ectocervical cells, similar to the Pap test. Specimens can be collected either by a healthcare provider during a pelvic examination, or through self‐sampling in the convenience of the woman’s home. Molecular technologies for the detection of HPV DNA can be broadly divided into amplified and non‐amplified. The tests mainly used in clinical research use amplification methods, which are further divided into signal amplified and target amplified. The main representative techniques of each category are the hybrid capture II (HCII, Digene Corporation, Gaithersburg, MD, USA) assay and polymerase chain reaction (PCR), respectively.

HCII is a Food and Drug Administration (FDA) approved test for HPV detection. It can detect infection from any of 13 high‐risk types (16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59 and 68) or 5 low‐risk types (6, 11, 42, 43, 44) but exact typing is not routinely possible. The number of viral copies that have to be present per sample in order to obtain a positive result is 5000. HCII succeeded an earlier test, the hybrid capture tube, which detected four fewer high‐risk types and had a higher threshold for positivity (50,000 viral copies per sample). That is, it had lower sensitivity than HCII and is therefore not currently used.

PCR is a chemical reaction resulting in the synthesis of a large number of target HPV DNA strands. It allows testing on scanty cell samples, small amounts of DNA, or few viral copies and consists of two main steps. The first step is the amplification of the target DNA. This is performed with a thermocycling process (heating and cooling) and the use of oligonucleotide primers. The primers are usually consensus or general, meaning that they can be used to amplify a broad spectrum of HPV genotypes. They are aimed mainly at the L1 region of the HPV genome. Type‐specific primers that amplify a particular HPV genotype can also be used, though rarely. There are various designs of general primers currently available. They differ in the size of the DNA region they amplify and in the measures taken to compensate for intertypic sequence variation at the target DNA sites. The GP5+/6+ primers amplify a 150 bp fragment and have to be used at a low annealing temperature in order to compensate for the mismatches with different genotypes. The MY09/11 primers amplify a 450 bp fragment and consist of a complex mixture of oligonucleotides in order to make up for intertypic variation. The PGMY primers amplify the same region of DNA as MY09/11 but contain inosine, which matches any nucleotide. The SPF10 system is another example of inosine‐containing primers and targets a 65 bp region. Finally, the CPI/II primers amplify a 188 bp region of the E1 gene.

The second step of the PCR process is the detection and analysis of the PCR products. The amplified DNA sequence can be detected by agarose gel electrophoresis. However, type‐specific analysis is also possible and can be achieved by a variety of methods such as restriction fragment length polymorphism, Southern blotting, microtiter plate hybridization, direct sequence analysis and reverse hybridization. One commercially available PCR system for HPV detection is the Amplicor HPV test (Roche Molecular Systems, Branchburg, USA). It is designed to detect 13 high‐risk types (16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, 59 and 68). It utilizes primers targeting a 165 bp sequence on the L1 region. For detection and analysis after amplification, the amplified DNA is hybridized with high‐risk HPV probes on separate wells of microtiter plates.

Even though the Amplicor HPV test does not provide genotyping, there are at least two PCR‐based commercially available genotyping tests. These are the Linear Array HPV Genotyping Test (Roche Molecular Diagnostics, Pleasanton, CA, USA), which can distinguish between 37 different types, and the Inno‐LiPa HPV Genotyping Extra CE (Innogenetics, Gent, Belgium), which allows determination of 28 different types. HPV genotyping is useful mainly for research and epidemiological purposes. The established clinical uses of HPV genotyping are limited, at least at present (Koliopoulos 2009).

The basic disadvantage of HPV DNA detection methods in clinical practice is their low specificity. This is because HPV infections are usually transient and most of them do not cause any serious consequences. Only a small proportion of HPV infections initiate an oncogenic process that will eventually lead to the development of CIN and invasive cancer. Women with active HPV infection will express E6/E7 oncogenes. These are required for malignant transformation because they inhibit the tumor suppressors p53 and RB. The E6/E7 mRNA transcripts are detected by mRNA‐based molecular techniques and may therefore be of higher prognostic value, improving the specificity and positive prognostic value compared with the HPV DNA testing used in screening. The most widely used mRNA test, the PreTect HPV Proofer assay (NorChip AS, Klokkarstua, Norway), detects only five high‐risk HPV types (16, 18, 31, 33 and 45). This test uses real‐time multiplex nucleic acid sequence‐based amplification (NASBA), which is a method that amplifies single‐stranded nucleic acids (Chan 1999).

Other molecular markers of HPV infection such as P16 and L1 immunostaining will not be examined by this review.

Rationale

It is proven that 80% of cervical cancer can be prevented by well‐organised, high quality screening programmes using Pap smears with three‐ to five‐year screening intervals (IARC 2005). With well‐organised programmes, mortality from the disease can be reduced by approximately 90% (Hristova 1997). Some of the Nordic countries are good examples in this respect (Sigurdsson 1999). On the other hand, in several countries a decrease in cervical cancer incidence of only 40% to 65% has been documented. There are still countries with very high death and morbidity rates from this disease and with no historical decrease in the rates (IARC 2005).

Various shortcomings of cervical cytology screening have been suggested as the source of this observation. One of them is the relatively low sensitivity of a single Pap test, even though the longitudinal sensitivity of repeated cytology is higher. In cancer screening a high rate of false negative results is a serious weakness. Therefore a more sensitive screening test is desirable. A systematic review of cervical screening failures in countries with organised screening programmes showed that among the women who developed cervical cancer, 20% to 55% had false negative smears 0 to 6 years prior to the diagnosis (Spence 2007). However, this result should be interpreted cautiously as the percentage of cancers that are cytologically negative is in direct proportion to screening coverage. With expanding screening coverage the number of cancers detected between screening intervals will increase in relation to the number of cancers detected in the unscreened population (Herbert 2009).

Apart from the issue of low sensitivity, there are other concerns about the Pap smear test. There is considerable variation in the organisation and implementation of cervical cancer screening programmes within European countries (Anttila 2004). Infrastructure and resources in health care are not sufficient in many areas to build up an effective programme based on conventional cytology. Even in several rich countries a large proportion of the target women remain totally unscreened (Breitenecker 2004), forming a high‐risk group for cervical cancer. Moreover, very frequent screening of young women may be associated with growing anxiety, over‐treatment and unnecessary costs. Finally, there are concerns about the reproducibility of the Pap test.

Given that HPV is the cause of cervical cancer and that HPV DNA is detected in virtually all cervical cancers (Walboomers 1999), new screening techniques based on HPV DNA testing have raised hopes and expectations for better prevention of the disease. Testing for HPV DNA is one of the most intensively studied alternatives to cervical cytology screening. The role of HPV testing has already been established and its use has gained wide acceptance in certain areas such as the triage of Pap smears with atypical squamous cell changes (ASCUS smears) and follow up after treatment (Arbyn 2004; Arbyn 2006). Its role in general population screening is not yet well defined.

Objectives

To determine the diagnostic accuracy of HPV testing for detecting histologically confirmed CIN 2 or worse (CIN 2+), including adenocarcinoma in situ, in women participating in primary cervical cancer screening; and how it compares to the accuracy of cytological testing (liquid‐based and conventional) at various thresholds.

Secondary objectives

To determine the diagnostic accuracy of the combination of HPV testing and cytological testing and to compare it with the accuracy of each test separately, where a positive combined test result is defined as at least one test that is positive and a negative combined result is where both tests are negative.
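The combined-test rule in this objective (positive if at least one test is positive, negative only if both are negative) can be illustrated with a short sketch. The record layout and function names below are hypothetical and are used only to show how a 2 x 2 table for the combined test would be tallied from paired results.

```python
from typing import Dict, Iterable, Tuple


def combined_positive(hpv_positive: bool, cytology_positive: bool) -> bool:
    """Combined result: positive if either test is positive."""
    return hpv_positive or cytology_positive


def combined_two_by_two(
    records: Iterable[Tuple[bool, bool, bool]]
) -> Dict[str, int]:
    """Tally a 2 x 2 table from (hpv_positive, cytology_positive, diseased) records."""
    table = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for hpv, cytology, diseased in records:
        positive = combined_positive(hpv, cytology)
        if positive and diseased:
            table["TP"] += 1
        elif positive and not diseased:
            table["FP"] += 1
        elif not positive and diseased:
            table["FN"] += 1
        else:
            table["TN"] += 1
    return table
```

Under this rule the combined test can never be less sensitive than either component test, but it can only be as specific as, or less specific than, each of them.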

Investigation of sources of heterogeneity

Sources of heterogeneity will be addressed with the study of certain subgroups such as the:

  • type of cytology used (liquid‐based or conventional),

  • type of HPV test used,

  • number of HPV types detected by the HPV test,

  • positivity threshold for cytology (atypical squamous cell changes (ASCUS+), low grade squamous intraepithelial lesion (LSIL+)) and HPV test,

  • procedures used for reference standard verification,

  • geographical location of the study.

Methods

Criteria for considering studies for this review

Types of studies

Comparative test accuracy studies where all participants have received both HPV testing and cervical cytology followed by verification of the disease status with the reference standard. Studies where participants were randomized to receive either only the index test or only the comparator test will not be included.

Participants

Women participating in a cervical cancer screening programme who are not being followed up for previous cytological abnormalities. The study population should not be part of a case‐control design (with a predetermined proportion of known disease positives to known disease negatives). Rather, women should form a consecutive series; they should be recruited as a single group with their disease status being unknown at the time of recruitment. The women should be close to or within the age range suitable for cervical screening according to international guidelines (20 to 70 years).

Index tests

Only HPV tests that are still currently used in clinical research practice will be considered. These are:

  • HCII or newer improved signal amplification methods,

  • PCR using the following primers GP5+/GP6+, MY09/11, SPF10, or CPI/II,

  • newer techniques that might be identified during the search process (e.g. mRNA testing for the E6 and E7 genes).

We will consider the following thresholds for the definition of a positive result: 1 pg/ml for the hybrid capture II method; and for PCR the threshold used by the researchers.

Comparator tests

For conventional cytology or liquid‐based cytology we will consider two thresholds that define an abnormal Pap smear: ASCUS or worse, and LSIL or worse. In studies where the cytology is reported in other systems (that is the BSCC terminology or the Second Munich Cytological Classification) the results will be converted to the nearest equivalent in the Bethesda system. We will consider the borderline category of the BSCC and the Pap IIw category of the Munich classification as equivalent to the ASCUS category. We will consider the mild dyskaryosis category of the BSCC and the Pap IIID category of the Munich classification as equivalent to the LSIL category.
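The conversion described above amounts to a lookup from each reporting system to its nearest Bethesda equivalent. A minimal sketch follows; it encodes only the four equivalences stated in this section, and the system labels, dictionary and function names are hypothetical.

```python
# Only the equivalences stated in the protocol are encoded here; any other
# category deliberately raises KeyError rather than being guessed.
BETHESDA_EQUIVALENT = {
    ("BSCC", "borderline"): "ASCUS",
    ("BSCC", "mild dyskaryosis"): "LSIL",
    ("Munich II", "Pap IIw"): "ASCUS",
    ("Munich II", "Pap IIID"): "LSIL",
}

# Ordering of the two positivity thresholds considered in the review.
BETHESDA_RANK = {"ASCUS": 1, "LSIL": 2}


def to_bethesda(system: str, category: str) -> str:
    """Return the nearest Bethesda category for a BSCC or Munich II result."""
    return BETHESDA_EQUIVALENT[(system, category)]


def meets_threshold(system: str, category: str, threshold: str) -> bool:
    """True if the converted result is at or above the threshold ('ASCUS' or 'LSIL')."""
    return BETHESDA_RANK[to_bethesda(system, category)] >= BETHESDA_RANK[threshold]
```

A borderline BSCC smear, for instance, counts as positive at the ASCUS or worse threshold but negative at the LSIL or worse threshold.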

Target conditions

The target condition is high‐grade CIN, defined as CIN 2 or worse. Some studies might have used the threshold of CIN 3. These will be included in the review but will be analysed separately.

Reference standards

As a reference standard, we will use the combination of colposcopy and histology. If colposcopy is normal, a histologic result will not be required for the definition of absence of disease. If colposcopy is abnormal, then the histologic result will be used as the reference standard. We will assume that the histologic examination of material obtained by colposcopy directed biopsy, loop excision or endocervical curettage provides complete assessment of the considered disease status.
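The reference standard logic in the paragraph above can be sketched as a single function. The numeric coding of histology (0 for no CIN, 1 to 3 for the CIN grade, 4 for invasive cancer) is a hypothetical convention introduced for illustration; adenocarcinoma in situ would need its own code in a real extraction form.

```python
from typing import Optional


def disease_status(
    colposcopy_abnormal: bool,
    histology_grade: Optional[int],
    threshold: int = 2,
) -> bool:
    """Classify a woman as diseased (True) or not (False) at the given CIN threshold.

    histology_grade uses a hypothetical coding: 0 = no CIN, 1-3 = CIN grade,
    4 = invasive cancer; None means no biopsy was taken.
    """
    if not colposcopy_abnormal:
        # Normal colposcopy: absence of disease is accepted without histology.
        return False
    if histology_grade is None:
        raise ValueError("abnormal colposcopy requires a histology result")
    return histology_grade >= threshold
```

Calling the function with threshold=3 gives the CIN 3 or worse classification used by the studies that are analysed separately.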

Colposcopy as a reference standard has serious shortcomings even with directed biopsies. It is a subjective examination and has low sensitivity for the detection of small CIN 3 lesions (Jeronimo 2006). The ideal reference standard for the evaluation of a cervical screening test would be the excision of the whole transformation zone and its subsequent histopathological examination. Given that such a procedure in healthy women is ethically unjustifiable, due to its morbidity, studies have to rely on colposcopy with directed biopsies even with its limitations.

In this review we will include studies where the reference standard was used in one of three ways:

  1. applied to all women,

  2. applied to all women with a positive screening test and to a random sample of screen‐negative women in order to correct for verification bias,

  3. restricted to those with a positive screening test.

This last category of studies is prone to verification bias if the double test negatives are considered to be true negatives. However, verification bias will be limited when one of the screen tests is very sensitive. These studies can produce unbiased estimates of relative sensitivity and relative false positive rates.
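The claim that such studies still give unbiased relative estimates can be seen from a short worked derivation. The symbols are introduced here for illustration and assume that every woman positive on at least one test is verified: among women with disease, let a be the number positive on both tests, b positive on HPV only, c positive on cytology only, and d negative on both (d is unobserved because these women are not verified).

```latex
\[
\mathrm{Sens}_{\mathrm{HPV}} = \frac{a+b}{a+b+c+d},
\qquad
\mathrm{Sens}_{\mathrm{cyt}} = \frac{a+c}{a+b+c+d}
\]
\[
\frac{\mathrm{Sens}_{\mathrm{HPV}}}{\mathrm{Sens}_{\mathrm{cyt}}}
  = \frac{(a+b)/(a+b+c+d)}{(a+c)/(a+b+c+d)}
  = \frac{a+b}{a+c}
\]
```

The unknown d cancels, so the ratio depends only on verified women. The same argument applies to the relative false positive rate, using the corresponding counts among verified women without disease. The absolute sensitivities, by contrast, still depend on d.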

Search methods for identification of studies

Electronic searches

We will perform a systematic literature search of articles (1992 to present day) that contain quantitative data. We will start our search from 1992 because HPV testing for clinical use was not introduced until a few years later.

Articles will be retrieved from the electronic bibliographic databases:

  • Cochrane Register of Diagnostic Test Accuracy Studies,

  • MEDLINE, through PubMed (January 1992 to current issue),

  • EMBASE (January 1992 to current issue).

The search strategy for MEDLINE is given in Appendix 1. A similarly structured search strategy will be designed to run in EMBASE and to search the Cochrane Register of Diagnostic Test Accuracy Studies. The service provider that will be used to access EMBASE is www.embase.com. Studies identified as relevant will be used as seeds in Scopus to identify articles citing them. The 'related articles' feature in PubMed will be used to retrieve articles that are similar to the included studies in terms of keywords and database subject headings.

The search will not be restricted to articles written in the English language.

Searching other resources

The reference lists of articles identified as relevant will be checked for additional relevant articles and the reference lists of these will in turn be checked for relevance.

Authors of relevant articles will be contacted in order to obtain missing data.

Data collection and analysis

Selection of studies

One review author (GK) will assess the titles and abstracts from the literature search to determine whether they meet the eligibility criteria. If there is any doubt the full text of the article will be retrieved. Another review author (MK) will then review the search results and the articles detected by the first review author in order to increase the specificity of the search. If there are disagreements the third review author (MA) will be consulted. The selection process will not be blind (that is the names of the authors and institutions will not be concealed).

A list of the non‐relevant studies will be kept to show that these studies were considered.

Data extraction and management

An electronic data collection form will be used by one review author (GK) to collect the data from each study on:

  • study design,

  • number of participants,

  • age range of participants,

  • threshold for the definition of a positive screening result,

  • index and comparator tests,

  • method used as reference standard,

  • threshold used for the definition of disease (e.g. CIN 2+ or CIN 3+),

  • the number of true positives, false positives, true negatives, and false negatives in a 2 x 2 table completed for each screening test used in each study.

The electronic data collection form will be double checked by a second review author (MK). The data collection form will be piloted with six to eight studies to identify any necessary changes to the form. Different forms might be required for different study designs.

Assessment of methodological quality

The QUADAS tool will be used for the assessment of methodological quality of the included studies (Whiting 2003). The tool will be used by two review authors (GK, PMH). The results for each study will be presented in table form. The use of QUADAS will show how study quality affects the results (Appendix 2).

Statistical analysis and data synthesis

The numbers of true positives, false negatives, false positives and true negatives defined at the considered thresholds will be extracted from each study, and test sensitivity and specificity will be calculated considering CIN 2+ and CIN 3+ as disease outcomes.
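As an illustration of the calculation described above, the following minimal sketch holds one 2 x 2 table and derives sensitivity and specificity from it. The class name is hypothetical and any counts passed to it in an example are made up, not taken from a study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one screening test against the reference standard."""

    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent

    @property
    def sensitivity(self) -> float:
        """Proportion of diseased women who test positive."""
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        """Proportion of non-diseased women who test negative."""
        return self.tn / (self.tn + self.fp)
```

One table would be built per test, per threshold and per disease definition (CIN 2+ or CIN 3+), so a single study can contribute several tables.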

To assess differences in accuracy in the studies where only test positives are verified (Reference standards category 3), we will calculate the ratio of sensitivity (or specificity) of HPV testing to that of cytology and then pool the individual ratios. We will provide the pooled estimates of the sensitivity and specificity of the index and the comparator tests for the main positivity thresholds, although these are prone to verification bias. We will also calculate the accuracy of the combination of tests, defining a positive result as the positivity of either test and a negative result as the negativity of both tests.
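The ratio-then-pool step can be sketched as follows. The protocol does not state the pooling method, so this example assumes a fixed-effect inverse-variance pool on the log scale, with a delta-method variance for the paired ratio; both are assumptions made for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def log_relative_sensitivity(
    both: int, hpv_only: int, cyt_only: int
) -> Tuple[float, float]:
    """Log ratio of HPV to cytology sensitivity from paired counts in diseased women.

    Returns (log ratio, approximate variance). The variance is a delta-method
    approximation for paired proportions and is undefined when there are no
    discordant results (hpv_only + cyt_only == 0).
    """
    hpv_pos = both + hpv_only
    cyt_pos = both + cyt_only
    variance = (hpv_only + cyt_only) / (hpv_pos * cyt_pos)
    return math.log(hpv_pos / cyt_pos), variance


def pool_fixed_effect(
    estimates: List[Tuple[float, float]]
) -> Tuple[float, float, float]:
    """Inverse-variance pool of log ratios; returns (ratio, lower 95%, upper 95%)."""
    weights = [1.0 / variance for _, variance in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return (
        math.exp(pooled),
        math.exp(pooled - 1.96 * se),
        math.exp(pooled + 1.96 * se),
    )
```

A random-effects model may be more appropriate if the ratios vary between studies; the sketch only shows the mechanics of pooling on the log scale and back-transforming.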

For the rest of the studies a hierarchical summary receiver operating characteristic (ROC) analysis will be used as has been described by Rutter and Gatsonis (Rutter 2001). The hierarchical summary receiver operating characteristics (HSROC) model provides a general framework for the meta‐analysis of diagnostic studies and allows the calculation of the summary receiver operating characteristics (SROC) as well as the expected operating point on the curve and summary estimates of sensitivity and specificity (and hence likelihood ratios). It allows the meta‐analyst to investigate heterogeneity between studies while taking into account both within‐ and between‐study variability and thus avoids the need for separate meta‐analyses using a range of different methods often applied to subsets of the data. Given that heterogeneity is likely to be present in many meta‐analyses, we consider that a mixed model that uses all of the available data seems preferable to conducting multiple analyses on subsets of the data using a range of statistical methods. In particular, in the studies where a random sample of test negatives is verified (Reference standards category 2) we will not put the 2 x 2 data directly into RevMan 5 but will first calculate the adjusted number of screening test false negatives given the proportion of the verified population.
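The adjustment for studies that verify only a random sample of screen-negative women can be sketched as a simple scaling by the inverse of the verified fraction. This is an illustrative reading of the step, assuming the verified sample is a simple random sample of all screen-negatives; the function and parameter names are hypothetical.

```python
from typing import Tuple


def adjust_for_partial_verification(
    fn_verified: int,
    tn_verified: int,
    n_screen_negative: int,
) -> Tuple[float, float]:
    """Scale verified screen-negative counts up to all screen-negative women.

    fn_verified / tn_verified: diseased / non-diseased women found in the
    verified random sample of screen-negatives.
    n_screen_negative: total number of screen-negative women in the study.
    Returns (adjusted false negatives, adjusted true negatives).
    """
    n_verified = fn_verified + tn_verified
    if n_verified == 0:
        raise ValueError("no screen-negative women were verified")
    # Each verified woman stands in for n_screen_negative / n_verified women.
    adjusted_fn = fn_verified * n_screen_negative / n_verified
    adjusted_tn = tn_verified * n_screen_negative / n_verified
    return adjusted_fn, adjusted_tn
```

The adjusted counts are generally not whole numbers and carry more uncertainty than directly observed counts, which a full analysis would need to reflect.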

Investigations of heterogeneity

For investigation of the sources of heterogeneity, subgroup analyses and multivariate SROC regression will be performed. The variables that we will consider for such analyses are the geographical location where the study was conducted (for example Africa, Europe, North America) and quality‐related variables. The quality‐related variables that could affect heterogeneity are the use of blinding, the sample size, the type of cytology (liquid‐based or conventional), the type of HPV testing (HCII or PCR), the number of HPV types that the HPV test detects, and risk of verification bias.

Sensitivity analyses

The following sensitivity analyses will be performed.

  1. Analysis only of studies where the HPV test used detects all 13 high‐risk types, for maximum sensitivity. Studies where the test detects fewer types will be excluded. This analysis will show how the maximum sensitivity HPV test compares to cytology.

  2. Analysis only of studies where the reference standard was used on all women. This will minimise the effect of verification bias.

  3. A separate analysis on the accuracy of HPV testing in women over 30 years of age will be performed. Studies where the population is strictly over 30 years of age will be included in this analysis. This age group was selected as several proposals have been made that HPV testing is more useful among older women because it provides a higher positive predictive value for clinically significant disease. This view suggests that positive HPV tests among older women are more frequently associated with persistent infections and progression to CIN 3, whereas many HPV tests reflect transient infections among young women.
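The three sensitivity analyses above are, in effect, filters on the set of included studies. A minimal sketch follows; the Study fields, the verification labels and the reading of "strictly over 30" as a minimum recruitment age of at least 30 are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Study:
    name: str
    high_risk_types_detected: int
    verification: str  # "all", "positives_plus_random_negatives" or "positives_only"
    min_age: int       # youngest age recruited


# One predicate per sensitivity analysis (labels are hypothetical).
SENSITIVITY_FILTERS: Dict[str, Callable[[Study], bool]] = {
    "all_13_high_risk_types": lambda s: s.high_risk_types_detected >= 13,
    "reference_standard_on_all_women": lambda s: s.verification == "all",
    # Assumption: a minimum recruitment age of 30 or more qualifies.
    "women_over_30": lambda s: s.min_age >= 30,
}


def select_studies(studies: List[Study], analysis: str) -> List[Study]:
    """Return the subset of studies that enters the named sensitivity analysis."""
    keep = SENSITIVITY_FILTERS[analysis]
    return [study for study in studies if keep(study)]
```

Each filtered subset would then be passed through the same meta-analytic model as the main analysis, so that the results can be compared directly.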

Assessment of reporting bias

The effective sample size funnel plot and associated regression test of asymmetry will be used to detect publication bias (Deeks 2005).
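The quantities behind this plot can be sketched as follows: each study contributes its log diagnostic odds ratio and the inverse square root of its effective sample size (ESS), and a regression weighted by ESS estimates the asymmetry. The 0.5 continuity correction for zero cells and the function names are assumptions of this sketch; the significance test on the slope is omitted.

```python
import math
from typing import List, Tuple


def deeks_point(tp: float, fp: float, fn: float, tn: float) -> Tuple[float, float, float]:
    """Return (ln DOR, 1/sqrt(ESS), ESS) for one study's 2 x 2 table."""
    n_diseased = tp + fn
    n_non_diseased = fp + tn
    ess = 4 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)
    if 0 in (tp, fp, fn, tn):
        # Assumed continuity correction so the odds ratio is defined.
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    ln_dor = math.log((tp * tn) / (fp * fn))
    return ln_dor, 1 / math.sqrt(ess), ess


def weighted_line(points: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Weighted least squares of y on x for (y, x, weight) points.

    Returns (slope, intercept); needs at least two distinct x values.
    """
    total_w = sum(w for _, _, w in points)
    x_bar = sum(w * x for _, x, w in points) / total_w
    y_bar = sum(w * y for y, _, w in points) / total_w
    s_xx = sum(w * (x - x_bar) ** 2 for _, x, w in points)
    s_xy = sum(w * (x - x_bar) * (y - y_bar) for y, x, w in points)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar
```

Passing the list of deeks_point results for all studies to weighted_line gives the slope of the funnel plot regression; a slope clearly different from zero suggests that smaller studies report systematically different accuracy.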