Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE) for the detection of dementia within a secondary care setting

Summary of findings 1. Summary of findings

Study ID	Country	Subjects (n)	Mean Age (yrs)	IQCODE version	Language	Dementia diagnosis	Dementia prevalence N (%)	Other assessments
Flicker 1997	Australia	377 (299 from MC)	73.4 MC 79.7 ACAT	26 item	English	DSM‐III‐R	n = 248 (65.8)	AMT; MMSE
Garcia 2002	Spain	113	78	16 item	Spanish	DSM‐III‐R	n = 90 (87.4)	MMSE
Goncalves 2011	Australia	204	76.9	16 item	English	DSM‐IV‐TR	n = 152 (74.5)	RUDAS; SMMSE
Hancock 2009	UK	144	67 (median)	26 item	English	DSM‐IV	n = 85 (59.0)	ACE‐R; MMSE
Harwood 1997	UK	177	76	16 item	English	DSM‐III‐R	n = 21 (10.5)	AMT
Jorm 1991	Australia	69	80	26 item	English	DSM‐III‐R; ICD‐10	n = 24 (34.8)	MMSE
Knaefelc 2003	Australia	323	74.7	16 item	English	DSM‐IV	n = 229 (70.9)	CAMDEX
Mackinnon 1998	Switzerland	106	80.3	16 item	French	DSM‐IV	n = 58 (54.7)	MMSE
Mulligan 1996	Switzerland	76	81.8	26 item	French	DSM‐III‐R	n = 33 (43.4)	AEMT; MMSE (French)
Narasimhalu 2008	Singapore	576	65 ‐ 70 (mean by diagnosis)	16 item	Cantonese	DSM‐IV	n = 169 (29.3)	MMSE (Singapore)
Sikkes 2010	The Netherlands	328 (59 known MCI)	68.4	16 item	Dutch	NINCDS‐ADRDA	n = 180 (54.9)	MMSE
Siri 2006	Thailand	200	72.9	32 item & 26 item	Thai	DSM‐IV	n = 100 (50.0)	BOMC; CDT; MIS; MMSE
Tang 2003	China	189	74.2	26 item	Chinese	DSM‐IV	n = 24 (12.7)	CDR; MMSE
See Characteristics of included studies for more detailed study descriptors Abbreviations: ACAT ‐ Aged Care Assessment Team Group; ACE‐R ‐ Addenbrooke's Cognitive Examination‐Revised; AEMT ‐ Antisaccadic Eye Movement Test; AMT ‐ Abbreviated Mental Test; BOMC ‐ Blessed Orientation Memory; CAMDEX ‐ Cambridge Mental Disorders of the Elderly Examination; CDR ‐ Clinical Dementia Rating Scale; CDT ‐ Clock Drawing Test; DSM ‐ American Psychiatric Association Diagnostic and Statistical Manual of Mental Disorders; MC ‐ Memory Clinic Group; MCI ‐ Mild Cognitive Impairment; MIS ‐ Memory Impairment Screen; MMSE ‐ Mini Mental State Examination; NINCDS‐ADRDA ‐ National Institute of Neurological and Communicative Disorders and Stroke and Alzheimer's Disease and Related Disorders Association; NINDS‐AIREN ‐ National Institute of Neurological Disorders and Stroke and the Association Internationale pour la Recherche et l'Enseignement en Neurosciences; RUDAS ‐ Rowland Universal Dementia Assessment Scale; SMMSE ‐ Standardized Mini Mental State Examination

Summary of findings 2. New Summary of findings table

Test	Summary accuracy (95% CI)	No. of participants (studies)	Dementia prevalence	Implications, Quality and Comments
What is the accuracy of the Informant Questionnaire for Cognitive Decline in the Elderly (IQCODE) test for detection of dementia when differing thresholds are used to define IQCODE positive cases
Population	Adults attending secondary‐care services, with no restrictions on the case mix of recruited participants
Setting	Our primary setting of interest was secondary care; within this rubric we included inpatient wards and hospital outpatient clinics
Index test	Informant Questionnaire for Cognitive Decline in the Elderly (IQCODE) administered to a relevant informant. We restricted analyses to the traditional 26‐item IQCODE and the commonly‐used short form IQCODE with 16 items
Reference Standard	Clinical diagnosis of dementia made using any recognised classification system
Studies	We included cross‐sectional studies but not case‐control studies

IQCODE cut‐off 3.3 or nearest	sens: 0.91 (95% CI 0.86 to 0.94); spec: 0.66 (95% CI 0.56 to 0.75) +ve LR: 2.7 (95% CI 2.0 to 3.6) ‐ve LR: 0.14 (95% CI 0.09 to 0.22)	n = 2745 (13 studies)	n = 1413 (51%)	Within the range of commonly used cut‐offs for defining IQCODE positivity, there is no clearly optimal value for use in secondary care settings. The sensitivity falls as the diagnostic threshold increases from 3.3‐3.6, with a relative increase in the specificity. The preferred balance between sensitivity and specificity is debatable. Both false positive (person diagnosed with possible dementia and referred for further assessment) and false negative (person with dementia has diagnosis missed and is not referred to specialist services) are associated with potential harms. The dementia prevalence was highly varied in included studies (10.5% to 87.4%) reflecting the heterogeneity of included participants within a "hospital" setting. This heterogeneity and associated "modelling" of real world implications of the test accuracy data presented are described in the next summary of findings table.
IQCODE cut‐off 3.3	sens: 0.96 (95% CI 0.94 to 0.98) spec: 0.66 (95% CI 0.41 to 0.84) +ve LR: 2.8 (95% CI 1.5 to 5.5) ‐ve LR: 0.1 (95% CI 0.03 to 0.1)	n = 722 (4 studies)	n = 334 (46%)
IQCODE cut‐off 3.4	sens: 0.94 (95% CI 0.84 to 0.98) spec: 0.73 (95% CI 0.59 to 0.85) +ve LR: 3.5 (95% CI 2.1 to 5.8) ‐ve LR 0.1 (95% CI 0.03 to 0.2)	n = 1211 (4 studies)	n = 394 (33%)
IQCODE cut‐off 3.5	sens: 0.92 spec: 0.63	n = 269 (1 study)	n = 152 (57%)
IQCODE cut‐off 3.6	sens: 0.89 (95% CI 0.85 to 0.92) spec: 0.68 (95% CI 0.56 to 0.79) +ve LR: 2.8 (95% CI 1.9 to 4.0) ‐ve LR: 0.2 (95% CI 0.1 to 0.2)	n = 1576 (9 studies)	n = 968 (61%)
CAUTION: The results in this table should not be interpreted in isolation from the results of the individual included studies contributing to each summary test accuracy measure. These are reported in the main body of the text of the review. **: quantitative synthesis not performed as only one study reported data at cut‐off of 3.5
Abbreviations: sens ‐ sensitivity; spec ‐ specificity; +ve LR ‐ positive likelihood ratio; ‐ve LR ‐ negative likelihood ratio

Summary of findings 3. New Summary of findings table

What is the accuracy of the Informant Questionnaire for Cognitive Decline in the Elderly (IQCODE) test for detection of dementia using different versions of IQCODE and using different languages of administration
Population	Adults attending secondary care services, with no restrictions on the case‐mix of recruited participants
Setting	Our primary setting of interest was secondary care, within this rubric we included inpatient wards and hospital outpatient clinics. Secondary care settings can be considered as two groups: (1) Studies conducted in a specialist memory/psychogeriatrics setting where participants were referred due to cognitive symptoms (2) Non‐memory focused hospital services. These included unselected admissions of older adults, those referred to specialist older people's assessment teams, outpatient attenders and inpatients under the care of geriatricians
Index test	Informant Questionnaire for Cognitive Decline in the Elderly (IQCODE) administered to a relevant informant. We restricted analyses to the traditional 26‐item IQCODE and the commonly used short form IQCODE with 16 items
Reference Standard	Clinical diagnosis of dementia made using any recognised classification system
Studies	Cross‐sectional studies were included, we did not include case‐control studies
Comparative analyses
Test	No. of participants (studies)	Dementia prevalence total across studies	Findings	Implications
26‐item versus 16‐item IQCODE	Total: n = 2745 (13) 26 item n = 977 (6)	Total n = 1413 (51%) 26 item n = 514 (53%) 16 item n = 899 (51%)	No significant difference in test accuracy Relative sensitivity of 26‐item versus 16‐item IQCODE: 0.98 (95% CI 0.89 to 1.07) Relative specificity of 26‐item versus 16‐item IQCODE: 0.99 (95% CI 0.75 to 1.33)	There was no difference in accuracy between IQCODE versions so it may be justifiable to advocate use of the short form to minimise responses required
English language versus Non‐English	Total: n = 2745 (13) English language n = 1216 (6)	Total n = 1413 (51%) English language n = 759 (62%) Non‐English language n = 654 (43%)	No significant difference in test accuracy Relative sensitivity of English language versus non‐English language IQCODE: 1.07 (95% CI 0.98 to 1.17) Relative specificity of English language versus non‐English language IQCODE: 1.10 (95% CI 0.83 to 1.47)	The language of administration does not significantly influence the diagnostic accuracy of IQCODE
Non‐memory setting versus memory	Total: n = 1918 (9)* memory setting n = 1352 (6)	Total n = 1129 (59%) memory setting n = 984 (73%) non‐memory setting n = 145 (26%)	Significant difference in test accuracy between settings (P = 0.019), due to higher specificity in non‐memory settings Relative sensitivity of non‐memory versus memory IQCODE: 1.06 (95% CI 0.99 to 1.15) Relative specificity of non‐memory versus memory IQCODE: 1.49 (1.22 to 1.83)	The lower level of specificity in the specialist memory services is of limited clinical concern as other tests will be used in this setting and incorrectly diagnosing someone with dementia based on IQCODE alone would be unlikely. In the non‐memory setting it is likely a positive IQCODE would prompt referral to specialist services, and this may be associated with psychological harm and unnecessary expense. Applying our non‐memory findings to the UK; there are around 2 million unscheduled admissions annually in over‐65s (Imison 2012) and a dementia prevalence of 42.4% in this group (Sampson 2009). Using the IQCODE alone to screen for dementia would result in 42,400 people with dementia not being identified and 218,880 dementia‐free people being referred inappropriately for specialist assessment.
CAUTION: The results in this table should not be interpreted in isolation from the results of the individual included studies contributing to each summary test accuracy measure. These are reported in the main body of the text of the review. *: Four studies included participants recruited in both specialist memory and non‐memory settings, without reporting outcome data stratified by recruitment setting and are thus not included in the quantitative synthesis

Background

Dementia is a chronic, progressive, neurodegenerative syndrome that is a substantial and growing public health concern (Hebert 2003; Hebert 2013; Prince 2013). Depending on the case definition employed, contemporary estimates of dementia prevalence in the United States are in the range 2.5 to 4.5 million individuals. Dementia is predominantly a disease of older adults, with a 10% prevalence in adults aged over 65, increasing to around 30% in adults aged over 85 (Ferri 2005). Changes in population demographics will be accompanied by increases in dementia incidence and prevalence. Consensus opinion based on current epidemiological trends is of a doubling in dementia prevalence every 20 years, with a global prevalence of around 81 million cases by 2040. Dementia is not limited to 'Western' nations and an increasing prevalence is particularly marked in countries such as China and India (Ferri 2005). Recent population follow‐up studies have cast doubt upon earlier estimates of increasing dementia incidence (Matthews 2013); however even with these lower predictions of incidence, the absolute number of individuals with dementia in society will be substantial, and accurate diagnosis remains a public health priority.

A key element of effective management in dementia is a firm diagnosis. Recent guidelines place emphasis on early diagnosis to facilitate improved management and to allow informed discussions and planning with patients and carers. Given the projected global increase in dementia prevalence, there is a potential tension between the clinical requirements for robust diagnosis at the individual patient level and the need for equitable, easy access to diagnosis at a population level. The ideal would be expert, multidisciplinary assessment informed by various supplementary investigations. Such an approach may be possible for assessment of challenging cases in 'specialist' settings, but is not practical or feasible for all people with possible cognitive decline.

In practice a two‐stage process is often employed, with initial screening or 'triage' assessments, suitable for use by non‐specialists, used to select those people who require further detailed assessment (Boustani 2003). Various tools for initial cognitive screening have been described (Brodaty 2002; Folstein 1975; Galvin 2005). Regardless of the methods employed, there is scope for improvement, with observational work suggesting that many people with dementia are not diagnosed (Chodosh 2004; Valcour 2000). UK national dementia strategies have focused on secondary (hospital) care, particularly unscheduled admissions, as a setting where there may be scope for opportunistic dementia screening (Shenkin 2014).

Screening assessment often takes the form of brief, direct cognitive testing. Such an approach will only provide a 'snapshot' of cognitive function. However, a defining feature of dementia is cognitive or neuropsychological change over time. Patients themselves may struggle to make an objective assessment of personal change, and so an attractive approach is to question collateral sources with sufficient knowledge of the patient. Informant‐based interviews have been described that aim to retrospectively assess change in function. An instrument prevalent in research and clinical practice is the Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE) (Jorm 1988) and this is the focus of our review.

A number of properties can be described for a clinical assessment (reliability, responsiveness, feasibility); for our purposes the test property of greatest interest is diagnostic test accuracy (DTA) (Cordell 2013).

Target condition being diagnosed

The target condition for this diagnostic test accuracy review is all‐cause dementia (clinical diagnosis).

Dementia is a syndrome characterised by cognitive or neuropsychological decline sufficient to interfere with usual functioning. The neurodegeneration and clinical manifestations of dementia are progressive and at present there is no 'cure', although numerous interventions to slow or arrest cognitive decline have been described, for example, pharmacotherapy such as acetylcholinesterase inhibitors; memantine; or cognitive rehabilitation therapies (Bahar‐Fuchs 2013; Birks 2006; McShane 2006).

Dementia remains a clinical diagnosis, based on history from the patient and suitable collateral sources, and direct examination including cognitive assessment. We have chosen expert clinical diagnosis as our 'gold standard' (reference standard) for describing IQCODE properties, as we believe this is most in keeping with current diagnostic criteria and best practice. We recognise that there is no universally accepted, ante‐mortem, gold standard diagnostic strategy. Although some would argue that the true gold standard would be neuropathological data, for the purposes of testing diagnostic accuracy in secondary care, limiting analysis to those studies with neuropathologically confirmed diagnosis is likely to yield limited and highly selected data. Furthermore, recent studies have suggested only a modest correlation between neuropathological changes and clinical cognitive phenotype in older age. There are several studies that have described cognitive impairment in 'normal brains' and multiple pathological changes with preserved cognition (Matthews 2009; Wharton 2015). We also recognise that clinical‐neuropathological correlations are less apparent in mixed dementia and older people, who form the majority with dementia in the hospital setting (Savva 2009).

Criteria for diagnosis of dementia are evolving in line with improvements in our understanding of the underlying pathophysiological processes. Various biomarkers based on biological fluid assays or functional/quantitative neuroimaging have shown promise but to date are not accepted or validated as independent diagnostic tests (McKhann 2011). Here a distinction must be made between dementia diagnosis in clinical practice and dementia diagnosis for clinical research. These novel biomarker and imaging techniques may be increasingly used in secondary‐care settings and may be stipulated in research diagnostic criteria but are not absolutely required for clinical diagnosis.

The label of dementia encompasses varying pathologies, of which Alzheimer’s disease is the most common (Savva 2009). For our reference standard of clinical diagnosis, we accepted a dementia diagnosis made according to any of the internationally accepted diagnostic criteria, with exemplars being the various iterations of the World Health Organization, International Classification of Diseases (ICD) and American Psychiatric Association Diagnostic and Statistical Manual of Mental Disorders (DSM) for all‐cause dementia and subtypes (Appendix 1) and the various diagnostic criteria available for specific dementia subtypes, i.e. NINCDS‐ADRDA (National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association) criteria for Alzheimer’s dementia (McKhann 1984); McKeith criteria for Lewy Body dementia (McKeith 2005); Lund criteria for frontotemporal dementias (McKhann 2001); and the NINDS‐AIREN (National Institute of Neurological Disorders and Stroke and the Association Internationale pour la Recherche et l'Enseignement en Neurosciences) criteria for vascular dementia (Erkinjuntti 2000; Roman 1993). We considered all‐cause dementia as the target condition for our primary analysis of diagnostic test accuracy, recognising that in a selected cohort referred to hospital services there may be a greater spectrum of differing dementia pathologies than is seen in unselected community cohorts. We have not defined preferred diagnostic criteria for rarer forms of dementia (e.g. alcohol‐related; HIV‐related; prion disease‐related), which were considered under our rubric of 'all‐cause dementia' and were not considered separately.

The label 'dementia' can also span a range of disease severities, from mild to end‐stage disease. We recognise that the diagnostic properties of a tool such as IQCODE vary depending on disease stage; for example, a patient is more likely to screen positive when disease is advanced and diagnosis is clear. For our primary analysis we included any dementia diagnosis at any stage of disease. Definitions pertinent to various stages of the dementia 'journey' are also described: a preclinical stage occurring years before disease is manifest, which may be characterised by changes in one or more disease biomarkers (Sperling 2011); a stage of mild cognitive impairment (MCI) where problems with cognition are noticed by the patient or others but the disease is not sufficiently advanced to warrant a diagnostic label of dementia (Albert 2011); and finally established dementia as defined above (McKhann 2011). We have not included diagnoses of preclinical and MCI states in this review.

Index test(s)

Our index test was the Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE).

The IQCODE was originally described as a 26‐item informant questionnaire that seeks to retrospectively ascertain change in cognitive and functional performance over a 10‐year time period (Jorm 1988). IQCODE is designed as a brief screen for potential dementia, usually administered as a questionnaire given to the relevant proxy. For each item the chosen proxy scores change on a five‐point ordinal hierarchical scale, with responses ranging from 1: 'has become much better' to 5: 'has become much worse'. This gives a sum‐score of 26 to 130 that can be averaged by the total number of completed items to give a final score of 1.0 to 5.0, where higher scores indicate greater decline.

First described in 1989, use of IQCODE is prevalent in both clinical practice and research (Holsinger 2007). A literature describing the properties of IQCODE is available including studies of non‐English IQCODE translations; studies in specific patient populations; and modifications to the original 26‐item direct informant interview (Isella 2002; Jorm 1989; Jorm 2004). Versions of the IQCODE have been produced in other languages, including Chinese, Dutch, Finnish, French, Canadian French, German, Italian, Japanese, Korean, Norwegian, Polish, Spanish and Thai (www.anu.edu.au/iqcode/). A shortened 16‐item version is also available (Jorm 1994); this modified IQCODE is common in clinical practice and has been recommended as the preferred IQCODE format (Jorm 2004).

For this review the term 'IQCODE' refers to the original 26‐item questionnaire as described by Jorm 1988. Other versions of IQCODE were described according to the number of items and administration language (e.g. a 16‐item IQCODE for Spanish speakers is described as 'IQCODE‐16 Spanish'). Other authors have also shortened the timeframe for assessment with a two‐year version of the IQCODE having been described (Ehrensperger 2010).

Although we describe the utility of IQCODE for dementia diagnosis, IQCODE used in isolation is not suitable for establishing a clinical diagnosis. The value of IQCODE is in selecting people who require more definitive assessment. Use of IQCODE in hospital settings is valid, as new diagnostic criteria for dementia make explicit reference to documenting decline and involving collateral informants, emphasising the potential utility of an informant interview tool such as IQCODE.

The full 26‐ and 16‐item versions of ICQODE with scoring rules are available in Appendix 2 and Appendix 3.

The purpose of this review is to describe the diagnostic test accuracy of IQCODE. Other important psychometric properties for a tool that is to be used in clinical practice include reliability, responsiveness and acceptability. Contemporary reviews of the 26‐ and 16‐item IQCODE suggest good inter‐rater reliability with retest kappa 0.96 at three days and 0.75 at one year (Jorm 2004; Tang 2003). Internal consistency is uniformly high with Cronbach’s alpha in the range 0.93 to 0.97 (Jorm 2004). Validation work has included validation against measures of cognitive change, neuropathology, neuroimaging, and neuropsychological assessment (Cordoliani‐Mackowiak 2003; Jorm 2000; Jorm 2004; Rockwood 1998). Factor analysis suggests that the scale measures a common factor of cognitive decline (http://onlinelibrary.wiley.com/doi/10.1002/14651858.CD010079/full#CD010079‐bbs2‐0025#CD010079‐bbs2‐0025). There are fewer published data on the psychometric properties of other 'short' forms of IQCODE.

IQCODE cut‐off scores suggestive of a potential dementia diagnosis will vary with the demographics of the population tested. In the original development and validation work, normative data were described, with a total score above 93 or an average score above 3.31 indicative of cognitive impairment (Jorm 2004). These data were based on community samples and the thresholds with greatest utility in a selected secondary‐care cohort may differ. There is no consensus on the optimal threshold and various authors have described improved diagnostic accuracy with other cut‐offs. In setting thresholds for any diagnostic test there is a trade‐off between sensitivity and specificity, with the preferred values partly determined by the purpose of the test. For specialist memory services, a more sensitive test may be preferred as a degree of filtering of non‐dementia diagnoses will have already occurred. For general hospital services, where there may be confounders of delirium or disability, a more specific test may be preferred.

IQCODE has a number of features that make it attractive for clinical and research use, particularly in a secondary‐care setting. The questions have an immediacy and relevance that is likely to appeal to users. Assessment and (informant) scoring take around five to seven minutes and as the scale is not typically interviewer‐administered it requires minimal training in application and scoring (Holsinger 2007). There are data to suggest that, compared to standard direct assessments, IQCODE may be less prone to bias from cultural norms and previous level of education (Jorm 2004).

Clinical pathway

Dementia develops over a trajectory of several years and screening tests may be performed at different stages in the dementia pathway. This review focuses on the secondary‐care setting. This is effectively two related patient populations.

In 'general' secondary‐care settings, people will have been referred for expert input but not exclusively due to memory complaints; there may have already been a degree of cognitive screening by the referrer. Opportunistic screening of adults presenting as unscheduled admissions to hospitals would be another secondary‐care pathway.

The rubric of secondary care also includes those people referred to dementia/memory‐specific services. This population will have a high prevalence of cognitive disorders and other physical and psychological health conditions; patients would be expected to have had a degree of cognitive assessment prior to referral. However, we recognise there will be no standardised approach to this pre‐referral assessment and real‐world studies have indicated a low level of pre‐referral cognitive testing (Fisher 2007).

Alternative test(s)

Several other dementia screening and assessment tools have been described. Instruments commonly used in secondary‐care settings include Folstein’s mini‐mental state examination (MMSE) (Folstein 1975); Montreal cognitive assessment (MoCA) (Nasreddine 2005); and the MiniCog (Borson 2000). These performance‐based measures for cognitive screening all rely on comparing single or multi‐domain cognitive testing against population‐specific normative data. Copyright issues may preclude widespread use of certain tools.

Other informant interviews are also available. For example, the AD‐8 is an eight‐question tool, requiring dichotomous responses (yes or no) and testing for perceived change in memory, problem‐solving, orientation and daily activities (Galvin 2005).

For this review we focused on papers that describe IQCODE diagnostic properties, and did not consider other cognitive screening/assessment tools. Where a paper describes IQCODE with an in‐study comparison against another screening tool, we included the IQCODE data only. Where IQCODE is used in combination with another cognitive screening tool, we included the IQCODE data only.

Rationale

There is no consensus on the optimal screening test for dementia and the choice is currently dictated by experience with a particular instrument, time constraints and training. A better understanding of the diagnostic properties of various strategies would allow for an informed approach to testing. Critical evaluation of the evidence base for screening tests or other diagnostic markers is of major importance. Without a robust synthesis of the available information there is the risk that future research, clinical practice and policy will be built on erroneous assumptions about diagnostic validity. This is particularly pertinent to secondary care as healthcare systems increasingly see hospital admission as a window for opportunistic cognitive screening.

IQCODE is commonly used in practice and research; it is used internationally and is one of only a few validated informant‐based screening/diagnostic tools. A literature describing the test accuracy of IQCODE in different settings is available, although some of these studies have been modest in size. Thus systematic review and, if possible, meta‐analysis of the diagnostic properties of IQCODE is warranted.

Although we use the term 'diagnosis' in this review, we recognise that in practice IQCODE alone is not sufficient to make a diagnosis. Rather, IQCODE can be used to 'triage' people presenting with memory problems for further assessment or to inform a diagnosis in conjunction with direct patient assessment and investigations.

This review forms part of a body of work describing the diagnostic properties of commonly‐used dementia tools. The Cochrane Dementia and Cognitive Improvement Group have reviews planned or underway for other commonly‐employed dementia assessment scales (Appendix 4) and other IQCODE reviews are completed (Harrison 2014; Quinn 2014). At present we are conducting single‐test review and meta‐analysis. The intention, however, is then to collate these data, performing an overview allowing comparison of various test strategies.

Objectives

To determine the accuracy of the informant‐based questionnaire IQCODE, for detection of dementia in a secondary care setting.

Secondary objectives

Where data were available we planned to describe the following:

The diagnostic accuracy of IQCODE at various prespecified thresholds. We recognise that various thresholds or cut‐off scores have been used to define IQCODE screen‐positive states. We described the properties of IQCODE for the following cut‐off scores (rounded where necessary): 3.6; 3.5; 3.4; 3.3. These thresholds have been chosen to represent the range of cut‐offs that are commonly used in practice and research; we have been inclusive in our choice of cut‐off to maximise available data for review.
Accuracy of IQCODE for diagnosis of the commonest specific dementia subtype ‐ Alzheimer’s dementia.
Effects of heterogeneity on the reported diagnostic accuracy of IQCODE. Potential sources of heterogeneity that we aimed to explore included: age of cohort; case mix of cohort; reason for hospital consultation (dichotomised as 'memory' or 'non‐memory' services); technical features of IQCODE; method of dementia diagnosis.

Methods

Criteria for considering studies for this review

Types of studies

This review forms part of a suite of reviews describing IQCODE accuracy in various healthcare settings. We created a generic strategy for searching; selection; data extraction and analysis that would be applicable to all the proposed IQCODE reviews. For consistency with the other reviews we have used the same text descriptor in each, except where the methodology is specific to the setting of interest.

We included those studies concerned with secondary‐care assessment that described the properties of IQCODE for diagnosis at a single time point in a population robustly and independently assessed for presence of dementia. This implies that the index and reference are performed contemporaneously.

An alternative approach is to perform the index test and then prospectively follow people for development of the condition of interest defined using a reference standard. This 'delayed verification' of dementia methodology is best suited to studies describing progression of mild cognitive impairment (MCI) to dementia and was not considered in this review.

Case‐control studies are known to potentially overestimate properties of a test and we did not include such studies. Similarly we excluded case studies and samples with small numbers (for the purposes of this review, we defined 'small numbers' as fewer than 10 participants). Small samples were excluded due to the potential for bias in selection and lack of representativeness.

Where settings were mixed, for example, a population study 'enriched' with additional non‐secondary‐care cases, we did not consider such studies unless separate data were presented for participants from each setting. This design can suffer from similar biases to a case‐control design.

Participants

All adults (aged over 18 years) presenting to secondary care were eligible.

Our definition of a secondary‐care‐based study setting was one where participants were referred to a hospital or outpatient specialist service, either due to perceived memory problems or due to another medical complaint; they may have had previous cognitive testing. There were no predefined exclusion criteria relating to the case mix of the population studied, but this aspect of the study was considered as part of our assessment of heterogeneity. Where there were concerns that the participants were not representative of a secondary‐care sample we explored this at study level using our 'Risk of bias' assessment framework. Where studies focused on a specific population, for example, stroke survivors, we described these separately. Recognising that people referred to hospital for specific memory assessment may differ from those referred to hospital for other complaints, we presented these two settings separately.

Index tests

Studies had to include (not necessarily exclusively) IQCODE used as an informant questionnaire.

IQCODE has been translated into various languages to allow international administration (Isella 2002). The properties of a translated IQCODE in a cohort of non‐English speakers may differ from properties of the original English‐language questionnaire. We collected data on the principal language used for IQCODE assessment in studies to allow for assessment of heterogeneity in relation to language.

Since its original description, modifications to the administration of IQCODE have been described (Jorm 2004). Shorter forms of informant questionnaires that test fewer domains are available and properties may differ from the original 26‐item IQCODE tool. We included all such versions of IQCODE, but present separate analysis limited to the commonest 26‐ and 16‐item versions. A modified IQCODE for self assessment has been described (Cullen 2007). As our interest was informant interviews, we have not included self‐assessment IQCODE in the review.

Target conditions

Papers reporting any clinical diagnosis of all‐cause (unspecified) dementia were potentially eligible for inclusion. Defining a particular dementia subtype was not required, although where available these data were recorded.

Reference standards

Our reference standard was clinical diagnosis of dementia. We recognise that clinical diagnosis itself has a degree of variability but this is not unique to dementia studies and does not invalidate the basic diagnostic test accuracy approach. Clinical diagnosis included all‐cause (unspecified) dementia, using any recognised diagnostic criteria (for example, International Classification of Diseases Edition 10 (ICD‐10); Diagnostic and Statistical Manual of Mental Disorders Edition 4 (DSM‐IV)). Dementia diagnosis may specify a pathological subtype and all dementia subtypes were included. Clinicians may use imaging, pathology, or other data to aid diagnosis; however, we did not include diagnosis based only on these data without corresponding clinical assessment. We recognise that different iterations of diagnostic criteria may not be directly comparable and that diagnosis may vary with the degree or manner in which the criteria have been operationalised (e.g. individual clinician versus algorithm versus consensus determination). We set no criteria relating to severity or stage of dementia diagnosis; instead we classified any clinical diagnosis of dementia (not mild cognitive impairment or its equivalents). We planned to explore stage/severity of dementia as a potential source of heterogeneity.

Search methods for identification of studies

We used a variety of information sources to ensure that we included all relevant studies. We devised terms for electronic database searching in conjunction with the Trials Search Co‐ordinator at the Cochrane Dementia and Cognitive Improvement Group. As part of a body of work looking at cognitive assessment tools, we created a sensitive search strategy designed to capture dementia test accuracy papers. We then assessed the output of the searches to select those papers that could be pertinent to IQCODE, with further selection for directly relevant papers and those papers with a secondary‐care focus.

Electronic searches

We searched ALOIS, the specialised register of the Cochrane Dementia and Cognitive Improvement Group (which includes both intervention and diagnostic accuracy studies), MEDLINE (Ovid SP), EMBASE (Ovid SP), PsycINFO (Ovid SP), BIOSIS Previews (Thomson Reuters Web of Science), Web of Science Core Collection (includes Conference Proceedings Citation Index) (Thomson Reuters Web of Science), CINAHL (EBSCOhost) and LILACS (Bireme). See Appendix 5 and Appendix 6 for the search strategies run. The final search date was 28 January 2013.

We also searched sources specific to diagnostic accuracy and healthcare research assessment:

MEDION database (Meta‐analyses van Diagnostisch Onderzoek: www.mediondatabase.nl);
DARE (Database of Abstracts of Reviews of Effects via the Cochrane Library);
HTA Database (Health Technology Assessment Database via the Cochrane Library);
ARIF database (Aggressive Research Intelligence Facility: www.arif.bham.ac.uk).

We applied no language or date restrictions to the electronic searches, and used translation services as necessary.

A single researcher (ANS), with extensive experience of systematic reviews from the Cochrane Dementia and Cognitive Impairment Group, performed the initial screening of the search results. All subsequent searches of titles/abstracts/papers were performed by independent paired assessors (TJQ, PF).

Searching other resources

Grey literature: We identified 'grey' literature through searching of conference proceedings, theses or PhD abstracts in EMBASE, the Web of Science Core Collection and other databases already specified.

Handsearching: We did not perform handsearching. The evidence for the benefits of handsearching are not well defined, and we note that a study specific to diagnostic accuracy studies suggested little additional benefit of handsearching above a robust initial search strategy (Glanville 2010).

Reference lists: We checked the reference lists of all relevant studies and reviews in the field for further possible titles and repeated the process until we found no new titles (Greenhalgh 2005).

Correspondence: We contacted research groups who have published or are conducting work on IQCODE for dementia diagnosis, informed by the results of the initial search.

We searched for relevant studies in PubMed, using the 'related article' feature. We examined key studies in the citation databases of Science Citation Index and Scopus to ascertain any further relevant studies.

Data collection and analysis

Selection of studies

One review author (ANS) screened all titles generated by initial electronic database searches for relevance. The initial search was a sensitive, generic search, designed to include all potential dementia screening tools. Two review authors (ANS, TJQ) selected titles potentially relevant to IQCODE. Two review authors (TJQ, PF) independently conducted all further review and selection. We reviewed potential IQCODE‐related titles, assessing all eligible studies as abstracts, and potentially relevant studies as full manuscripts against the inclusion criteria. We resolved disagreement by discussion, with the potential to involve a third review author (DJS) as arbiter if necessary.

We adopted a hierarchical approach to exclusion, first excluding on the basis of index test and reference standard, and then on the basis of sample size and study data. Finally we assessed all IQCODE papers with regard to setting.

Where a study may have included useable data but these were not presented in the published manuscript, or the data presented could not be extracted to a standard two‐by‐two table, we contacted the authors directly to request further information or source data. If authors did not respond or if the data were not available we did not include the study (labelled as 'data not suitable for analysis' on the study flowchart). If the same dataset was presented in more than one paper we included the primary paper.

We detailed the study selection process in a PRISMA flow diagram.

Data extraction and management

We extracted data to a study‐specific pro forma that included clinical/demographic details of the participants (including details of reason for hospital referral – 'memory' or 'non‐memory'), details of IQCODE administration, and details of the dementia diagnosis process. We extracted data for all IQCODE studies, before dividing them by setting (community, primary or secondary). We piloted the pro forma against two of the included papers before use.

Where IQCODE data were given for a number of cut‐off points, we extracted data for each IQCODE threshold. Where thresholds were described to two decimal places, we chose the cutpoint closest to the point of interest (i.e. all scores less than 3.35 would be scored as 3.3, all scores 3.35 or greater would be scored as 3.4). We extracted data to a standard two‐by‐two table.

Two review authors (TJQ, PF) extracted data independently. The review authors were based in different centres and were blinded to each other's data until extraction was complete. We then compared and discussed data pro formas with reference to the original papers, resolving disagreements in data extraction by discussion, with the potential to involve a third review author (DJS) as arbiter if necessary.

For each included paper, we detailed the flow of participants (numbers recruited, included, assessed) in a flow diagram.

Assessment of methodological quality

As well as describing test accuracy, an important goal of the DTA (diagnostic test accuracy) process is to improve study design and reporting in dementia diagnostic studies. For this reason, we assessed both methodological and reporting quality.

We assessed the quality of study reporting using the Standards for the Reporting of Diagnostic Accuracy studies (STARD) checklist (Bossuyt 2003) (Appendix 7). We followed the guidance and principles outlined in the dementia‐specific STARDdem extension to STARD grading. We present our results under the descriptor STARD, as at time of writing STARDdem guidance is not yet published and in the public domain. We advocate use of STARDdem (Noel‐Storr 2014) for the assessment of diagnostic accuracy studies in dementia henceforth.

We assessed the methodological quality of each study using the Quality Assessment tool for Diagnostic Accuracy Studies (QUADAS‐2) tool (www.bris.ac.uk/quadas/quadas-2). This tool incorporates domains specific to patient selection; index test; reference standard; and participant flow. Each domain is assessed for risk of bias and the first three domains are also assessed for applicability. Operational definitions describing the use of QUADAS‐2 are detailed in Appendix 8. To create QUADAS‐2 anchoring statements specific to studies of dementia test accuracy, we convened a multidisciplinary review of various test accuracy studies with a dementia reference standard (Davis 2013) (Appendix 9).

Paired, independent raters (TJQ and PF or TJQ and JKH), blinded to each other’s scores, performed both assessments. We resolved disagreements by further review and discussion, with the potential to involve a third review author (DJS) as arbiter if necessary.

We did not use QUADAS‐2 data to form a summary quality score, but rather we chose to present a narrative summary describing studies that found high/low/unclear risk of bias/concerns regarding applicability with corresponding graphical displays.

Statistical analysis and data synthesis

We were principally interested in the test accuracy of IQCODE for the dichotomous variable 'dementia/no dementia'. Thus, we applied the current DTA framework for analysis of a single test and fitted the data extracted to a standard two‐by‐two data table showing binary test results cross‐classified with a binary reference standard. We repeated this process for each IQCODE threshold score described.

We used Review Manager 5 (RevMan 2014) to calculate sensitivity, specificity and their 95% confidence intervals (CIs) from the two‐by‐two tables abstracted from the included studies. We present these data graphically in forest plots to allow basic visual inspection of individual studies only. Standard forest plots with graphical representation of summary estimates are not suited to quantitative synthesis of DTA data. Using software additional to Review Manager 5 (SAS release 9.1) we used the bivariate method to calculate summary values within each prespecified cut‐off. The bivariate methods (Reitsma 2005) enabled us to calculate summary estimates of sensitivity and specificity while correctly dealing with the different sources of variation: (1) imprecision, by which sensitivity and specificity have been measured within each study; (2) variation beyond chance in sensitivity and specificity between studies; (3) any correlation that might exist between sensitivity and specificity. We describe the results for each chosen threshold as sensitivity and specificity and we estimate all accuracy measures with their 95% CI. Where data allowed, we chose to present individual study results graphically by plotting estimates of sensitivities and specificities in the receiver operating characteristic (ROC) space. We present the summary sensitivity and specificity points with a 95% confidence region. We have not fitted a ROC curve as we chose the bivariate model for the analysis rather than the hierarchical summary receiver‐operator curve (HSROC) method. We also describe metrics of pooled positive and negative likelihood ratios. To allow an overview of IQCODE test accuracy, we performed a further analysis: pooling data at a common threshold (3.3 or closest), chosen to maximise the data available for inclusion.

Investigations of heterogeneity

Heterogeneity is to be expected in diagnostic test accuracy reviews and we did not perform formal analysis to quantify it.

The properties of a tool describe behaviour of the instrument under particular circumstances. Thus, for our assessment of potential sources of heterogeneity (where data allowed) we collected data to inform our prespecified areas of interest:

a) clinical criteria used to reach dementia diagnosis (for example, ICD‐10; DSM‐IV) and the methodology used to reach dementia diagnosis (for example, individual assessment; group (consensus) assessment);

b) technical features of the testing strategy. (version of IQCODE (language); number of items, for example traditional IQCODE; 16‐item 'short' form etc).

c) reason for secondary‐care consultation. We dichotomised this as attending for 'memory problem' or attending for 'other medical problem'.

Where data allowed we performed pooled analysis with these factors as covariates, and compared results of subgroups. We prespecified that we would present data from the specialised memory setting (memory) and general secondary‐care setting (non‐memory) separately, that we would present data from the traditional (26 questions) and short‐form (16 questions) IQCODE separately, and that we would present data from studies using English language IQCODE against those using non‐English‐language versions.

Sensitivity analyses

Where appropriate (i.e. if not already explored in our analyses of heterogeneity) and as data allowed, we planned to explore the sensitivity of any summary accuracy estimates to aspects of study quality, guided by the anchoring statements developed in our QUADAS‐2 exercise. We prespecified sensitivity analysis planned to exclude studies of low quality (high likelihood of bias) to determine if the results are influenced by inclusion of the lower‐quality studies; and sensitivity analysis excluding studies that may have unrepresentative populations.

Results

Results of the search

Our search resulted in 16,144 citations, from which we identified 73 full‐text papers for eligibility.

We excluded 60 papers (Figure 1). Reasons for exclusion were: population not from a secondary‐care setting; no IQCODE data or unsuitable IQCODE data; small numbers (< 10) of included participants; no clinical diagnosis of dementia; repeat datasets; data not suitable for analysis (described in more detail in Selection of studies) and case‐control design (see Characteristics of excluded studies).

Figure 1

Study flow diagram.

Eight studies which we identified required translation. We contacted 16 authors to provide useable data, of whom 13 responded (Acknowledgements).

This review includes 13 studies, n = 2882 participants (summary of findings Table 1).

Methodological quality of included studies

We described the risk of bias using the QUADAS‐2 methodology (Appendix 8); our anchoring statements for the IQCODE are summarised in Appendix 9. We did not rate any study at low risk of bias for all the categories of QUADAS‐2 (Figure 2). Areas of particular concern for bias were around: participant sampling procedures (five papers graded low risk, with high rates of unclear or inappropriate sampling frames and inappropriate exclusions) and application of index test (two papers graded low risk of bias, with most papers failing to prespecify their cut‐off for test positivity). There were also concerns around applicability, particularly concerning patient selection and index test. Only six papers recruited a sample of representative secondary‐care attenders, either to a memory or a non‐memory setting, and only five studies provided sufficient detail for their procedure for conducting the IQCODE for it to be considered consistent with the original methodology for use in clinical practice.

Figure 2

Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study

We described reporting quality using the STARD guidance (Appendix 7). There were limitations in reporting across all papers (Appendix 10). No paper included all the details recommended in the STARD statement; particular areas of study reporting that could be improved were: distribution of severity of disease (three papers reported on severity of dementia); handling of missing results (four papers explained, for example, how incomplete IQCODE questionnaires were scored) and estimates of variability of diagnostic accuracy (three papers considered variability between assessors or subgroups of participants).

Findings

We have described the individual included studies in Characteristics of included studies and Additional Table 1. We have also presented tabulated data for test accuracy by IQCODE threshold (summary of findings Table 2) and by covariate (summary of findings Table 3).

Table 1. Summary of test accuracy at study level

Study ID	Participants (n)	Primary threshold	Sensitivity (%)	Specificity (%)
Flicker 1997	299*	3.6	87	58
Garcia 2002	103	3.6	92	81
Goncalves 2011	204	4.1	72	67
Hancock 2009	144	3.6	86	39
Harwood 1997	177	3.3	100	78
Jorm 1991	69	3.6	71	80
Knaefelc 2003	323	3.6	94	47
Mackinnon 1998	106	3.6	90	65
Mulligan 1996	76	3.3	100	42
Narasimhalu 2008	576	3.4	86	78
Sikkes 2010	269*	3.3	96	42
Siri 2006	200	3.3	94	88
Tang 2003	189	3.4	88	75

Where multiple thresholds were reported, we used the value closest to 3.3 to populate this table

*Total number of participants adjusted to reflect numbers included in quantitative synthesis

The total number of participants across the studies was 2882 (range: 69 to 576), of whom 1413 (49%) had a clinical dementia diagnosis. We performed quantitative synthesis for 2745 participants, of whom 1413 (51%) had a clinical dementia diagnosis. This excludes 59 participants with mild cognitive impairment included by Sikkes 2010, who was assessing the ability of IQCODE to diagnose mild cognitive impairment and presents data depending on diagnostic group. It also excludes 78 participants from the study by Flicker 1997 who were not assessed in the specialist memory clinic setting, as this paper presented differing test accuracy data with respect to assessment location.

The included studies are international, including datasets from eight countries (Australia, China, Singapore, Spain, Switzerland, Thailand, The Netherlands and the UK).

Nine different versions of IQCODE were used in the included studies and 10 different diagnostic thresholds (3.3, 3.4, 3.5, 3.6, 3.7, 3.8. 3.9, 4.0, 4.1, 4.2) were used to define a positive IQCODE. We limited our analysis to the validated forms of IQCODE that are in common clinical use, i.e. the 26‐ and 16‐item questionnaires. Although Siri 2006 only reported data relating to their 32‐item modified IQCODE at an optimal cut‐off of 3.4, the authors supplied data for use of the 26‐item IQCODE to facilitate inclusion in the quantitative synthesis.

Within the prespecified thresholds chosen for analysis there was a spread of sensitivity and specificity (sensitivity range: 71% to 100%; specificity range: 39% to 88%). Additional Table 1 provides a summary of test accuracy for each study, using the value closest to 3.3.

Overview analysis ‐ IQCODE using a 3.3 threshold or closest

From the 13 studies, 2745 participants are included in quantitative synthesis. Sensitivity was 0.91 (95% confidence interval (CI) 0.86 to 0.94); specificity 0.66 (95% CI 0.56 to 0.75). The overall positive likelihood ratio was 2.7 (95% CI 2.0 to 3.6) and the negative likelihood ratio was 0.14 (95% CI 0.09 to 0.22).

The summary receiver operating characteristic (ROC) curve describing test accuracy across the included studies is presented in Figure 3.

Figure 3

Summary ROC Plot, IQCODE using a 3.3 threshold score or nearest. The dark point is a summary point, the broken line represents 95% confidence region

IQCODE 3.3 threshold or closest ‐ comparing 26‐ and 16‐item IQCODE

We used the overview dataset to examine the effect of heterogeneity relating to IQCODE format (traditional 26‐item or short‐form 16‐item).

Analysis of the studies using the 26‐item IQCODE (six datasets) gave sensitivity of 0.89 (95% CI 0.82 to 0.94); specificity 0.66 (95% CI 0.49 to 0.80). The overall positive likelihood ratio was 2.6 (95% CI 1.6 to 4.3) and the negative likelihood ratio was 0.2 (95% CI 0.1 to 0.3).

Analysis of the studies using the 16‐item IQCODE (seven datasets) gave sensitivity of 0.92 (95% CI 0.85 to 0.96); specificity 0.66 (95% CI 0.54 to 0.77). The overall positive likelihood ratio was 2.7 (95% CI 1.9 to 3.8) and the negative likelihood ratio was 0.1. (95% CI 0.1 to 0.2).

Comparing the two, there were no differences in accuracy with a relative sensitivity of the 26‐item versus 16‐item of 0.98 (95% CI 0.89 to 1.07) and relative specificity of 0.99 (95% CI 0.75 to 1.33). (Figure 4)

Figure 4

Summary ROC Plot of IQCODE 3.3 threshold or nearest, comparing short form (16 item) and traditional IQCODE. The dark point is a summary point, the broken line represents 95% confidence region

As there was no difference we presented further data as the combined (26‐ and 16‐item IQCODE together) test accuracy.

IQCODE 3.3 threshold or closest ‐ comparing English and non‐English language IQCODE

We coded the language of IQCODE administration as a covariate. Study numbers did not allow analysis by individual languages and so we compared the IQCODE in the original wording (English language) with all translated IQCODE forms (non‐English language).

Analysis of studies using English language IQCODE (six datasets) gave sensitivity of 0.87 (95% CI 0.78 to 0.92); specificity 0.63 (95% CI 0.48 to 0.76). The overall positive likelihood ratio was 2.3 (95% CI 1.6 to 3.4) and the negative likelihood ratio was 0.2 (95% CI 0.1 to 0.3).

Analysis of studies using non‐English language IQCODE (seven datasets) gave sensitivity of 0.93 (95% CI 0.88 to 0.96); specificity 0.69 (95% CI 0.56 to 0.80). The overall positive likelihood ratio was 3.0 (95% CI 2.1 to 4.5) and the negative likelihood ratio was 0.1 (95% CI 0.1 to 0.2).

Comparing the two, there were no differences in accuracy with a relative sensitivity of the non‐English Language versus English language of 1.07 (95% CI 0.98 to 1.17) and relative specificity of 1.10 (95% CI 0.83 to 1.47). (Figure 5)

Figure 5

Summary ROC Plot of pooled IQCODE data at a 3.3 threshold (or nearest value), with language as covariate. The dark point is a summary point, the broken line represents 95% confidence region

As there was no difference we presented further data as the combined (English language and non‐English language IQCODE together) test accuracy.

IQCODE test accuracy at differing diagnostic thresholds

We calculated test accuracy at our prespecified IQCODE thresholds. We chose to present a summary ROC curve for those analyses with more than three included studies:

IQCODE 3.3 threshold: there were four datasets* (n = 722) that contained relevant data. The sensitivity was 0.96 (95% CI 0.94 to 0.98); specificity 0.66 (95% CI 0.41 to 0.84). The overall positive likelihood ratio was 2.8 (95% CI 1.5 to 5.5) and the negative likelihood ratio was 0.1 (95% CI 0.03 to 0.1).

IQCODE 3.4 threshold: there were four datasets* (n = 1211) that contained relevant data. The sensitivity was 0.94 (95% CI 0.84 to 0.98); specificity 0.73 (95% CI 0.59 to 0.85). The overall positive likelihood ratio was 3.5 (95% CI 2.1 to 5.8) and the negative likelihood ratio was 0.1 (95% CI 0.03 to 0.2).

IQCODE 3.5 threshold: there was only one dataset (n = 269) that contained relevant data. The sensitivity was 0.92 and specificity was 0.63; we did not perform quantitative synthesis.

IQCODE 3.6 threshold: there were nine datasets* (n = 1576) that contained relevant data. The sensitivity was 0.89 (95% CI 0.85 to 0.92); specificity 0.68 (95% CI 0.56 to 0.79). The overall positive likelihood ratio was 2.8 (95% CI 1.9 to 4.0) and the negative likelihood ratio was 0.2 (95% CI 0.1 to 0.2).

*Certain papers included more than one dataset

Heterogeneity relating to setting

Specialist memory setting: there were six datasets (n = 1352) that contained relevant data. The sensitivity was 0.90 (95% CI 0.83 to 0.94); specificity 0.54 (95% CI 0.44 to 0.64). The overall positive likelihood ratio was 1.9 (95% CI 1.6 to 2.4) and the negative likelihood ratio was 0.2 (95% CI 0.1 to 0.3). The dementia prevalence ranged from 55% to 87%.

Non‐memory setting: there were three datasets (n = 566) that contained relevant data. The sensitivity was 0.95 (95% CI 0.88 to 0.98); specificity 0.81 (95% CI 0.71 to 0.88). The overall positive likelihood ratio was 4.9 (95% CI 3.3 to 7.4) and the negative likelihood ratio was 0.06 (95% CI 0.02 to 0.2). The dementia prevalence ranged from 11% to 50%.

Comparing the two, there is a significant difference in accuracy in the non‐memory versus memory settings ( P = 0.019) which is attributable to the higher specificity of the IQCODE in the non‐memory setting. The relative sensitivity of non‐memory versus memory setting is 1.06 (95% CI 0.99 to 1.15) and the relative specificity is 1.49 (95% CI 1.22 to 1.83). (Figure 6)

Figure 6

Summary ROC Plot of pooled IQCODE data at a 3.3 threshold (or nearest value), with setting as covariate. The dark point is a summary point, the broken line represents 95% confidence region

In four studies (Jorm 1991; Mackinnon 1998; Mulligan 1996; Narasimhalu 2008) participants were recruited both in specialist memory and in non‐memory secondary‐care settings and data were not available stratified by setting, so we could not include them in the quantitative synthesis.

Other sources of heterogeneity and sensitivity analyses

Our objective was to assess the diagnostic accuracy of IQCODE across the cut‐off points commonly used in practice (3.3, 3.4, 3.5, 3.6). However, one study (Goncalves 2011) only reported data at an IQCODE cut‐off of 4.1. We conducted a sensitivity analysis removing this study, which demonstrated a similar test accuracy (sensitivity was 0.92, 95% CI 0.88 to 0.94; specificity 0.66, 95% CI 0.55 to 0.76).

We performed a sensitivity analysis removing those studies which included participants with a low mean or median age (< 70 years) (Hancock 2009; Narasimhalu 2008; Sikkes 2010). Test accuracy was similar after exclusion of these studies, with an improvement in the specificity of IQCODE (sensitivity was 0.92, 95% CI 0.84 to 0.95; specificity 0.70, 95% CI 0.60 to 0.78 at a threshold of 3.3 or closest).

A quantitative analysis of the effect of dementia diagnosis criteria (reference standard) was not possible. Twelve studies used the American Psychiatric Association Diagnostic and Statistical Manual of Mental Disorders (DSM), one used the World Health Organization International Statistical Classification of Diseases and Related Health Problems (ICD) for diagnosis, and one used the National Institute of Neurological and Communicative Disorders and Stroke and Alzheimer's Disease and Related Disorders Association (NINCDS‐ADRDA) criteria. Only one study (Jorm 1991) used two diagnostic criteria. Jorm 1991 reported that using DSM‐III‐R at an IQCODE cut‐off of 3.6, they obtained sensitivity of 69% and specificity of 80%, compared with using ICD‐10 criteria which resulted in a sensitivity of 80% and specificity of 82%. As the DSM criteria were those most commonly used in other studies, we included these data for reporting of Jorm 1991.

A further original aim was to describe the accuracy of the IQCODE for diagnosis of Alzheimer's disease dementia. Although three studies reported assessing the IQCODE specifically in people with Alzheimer's disease (Goncalves 2011; Narasimhalu 2008; Sikkes 2010), suitable data were only available for two of the three studies (Narasimhalu 2008; Sikkes 2010) and thus we felt that quantitative synthesis would be inappropriate.

Two studies specifically assessed the IQCODE in a stroke population (Narasimhalu 2008; Tang 2003). Tang 2003 only recruited people who had experienced a stroke, while in Narasimhalu 2008 they were a subgroup of the total study population. However, data are not presented on IQCODE properties specific to the stroke population (Narasimhalu 2008) and the small number of studies would make quantitative synthesis inappropriate.

We considered an investigation of heterogeneity assessing the impact of a prespecified IQCODE threshold on test accuracy, from our QUADAS‐2 assessment. However, only two studies (Garcia 2002; Goncalves 2011) were eligible for inclusion, and so quantitative synthesis was not appropriate.

Discussion

Summary of main results

We present a review of the available evidence around test accuracy of the Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE) for dementia diagnosis in hospital/secondary‐care settings. Our quantitative synthesis demonstrates summary sensitivity of 0.91 and specificity of 0.66 when IQCODE is used across all (undifferentiated) secondary‐care settings for the diagnosis of dementia. The positive likelihood ratio was 2.7 and the negative likelihood ratio was 0.14; indicating that the IQCODE can be used as a 'rule‐out' test of dementia in a secondary‐care setting. These results represent a large dataset, comprising data from 13 international studies with over 2745 participants. We limited our review to studies concerning hospital‐based healthcare systems; however even within this focused setting there was substantial heterogeneity and we must be cautious in our interpretation of the pooled data. Across the included papers there was substantial potential for bias and issues with limited generalisabilty and suboptimal reporting.

The prevalence of dementia in the included settings was highly varied, ranging from 10.5% to 87.4%. This marked difference in patient populations reflects in part the differing case mix that potentially can be included under the 'hospital' setting label. We explored this aspect of heterogeneity with prespecified subgroup and sensitivity analyses and found a significant difference in test accuracy depending on whether the hospital setting described was a specialist memory service (for example, old‐age psychiatry ward; memory clinic) or a non‐memory‐specific hospital setting (for example, an acute admissions ward or general outpatient clinic). The clinical interpretation of such comparisons is challenging. One interpretation of these data is that IQCODE as a diagnostic tool may be more suited to general hospital settings rather than services with a cognitive focus. The pictorial summary analysis (Figure 6) illustrates that memory and non‐memory groups seem to behave differently and perhaps should be treated as such in future analyses of cognitive test accuracy. These data come with several caveats (a modest number of included studies; heterogeneity within the memory/non‐memory groups; issues with potential for bias) but our interpretation has a clinical validity as the case mix in a specialist service designed for those with suspected dementias is likely to be very different to the population presenting for assessment in an unscheduled acute admissions or medicine for the older adult ward. The difference between the groups was most apparent in specificity, with the data suggesting better specificity when IQCODE is used in non‐memory settings. We can speculate on potential reasons for this difference: in the specialist memory service setting, the high prevalence of depression, either solely or co‐existing with dementia (Knapskog 2014) may be an important consideration, as many of the IQCODE parameters are task‐orientated and thus may be impacted by depressive symptoms giving false positive IQCODE results. In separating the study settings of memory and non‐memory we recognise that differences between these populations operate at many levels including potential availability of an informant to complete IQCODE scoring.

There is no universal value of sensitivity and specificity that is considered 'good' or 'poor'; the values that clinicians will accept as suitable for clinical use will vary with the implications of a false positive or false negative result. In the non‐memory (often acute hospital) setting, delirium is prevalent (Ryan 2013), either alone or in association with cognitive impairment or dementia, but opportunities for in‐depth patient‐dependent cognitive testing may be limited and the more favourable test accuracy metrics of IQCODE in this setting are reassuring. It could be argued that the lower specificity for the instrument in a specialist memory service is less problematic than in other healthcare settings, as patients will receive additional assessments as determined by the specialist clinician and are unlikely to be misdiagnosed on the basis of an IQCODE result alone.

Applying summary test accuracy data to real‐world settings can be illustrative of the potential strengths and limitations of a test in practice. Applying our non‐memory summary data to the acute hospital admission setting, current UK data estimate around two million unscheduled admissions annually in the over‐65s (Imison 2012) and a dementia prevalence of 42.4% in this group (Sampson 2009). Using the IQCODE alone to screen for dementia would result in 42,400 people with dementia not being identified and 218,880 people without dementia being referred inappropriately for specialist assessment. Both false positive and false negative results have potential for harm. It is not certain that those whose dementia is missed with IQCODE will eventually receive a diagnosis and opportunities for early intervention may be lost, while inappropriately labelling a person as having cognitive decline based on IQCODE will also be associated with potential psychological harm and economic implications of need for further investigation. We acknowledge that the UK NHS‐based figures we quote may not be applicable to other countries or healthcare systems. We have presented UK data in this review as we have access to reasonably robust input data and their inclusion illustrates important points about the real‐world implications of our summary test accuracy data

Our results do not indicate an optimal cut‐off for the IQCODE in a hospital setting. A range of diagnostic thresholds were reported with significant overlap between the included studies, with the commonest diagnostic threshold being 3.6; this is higher than the IQCODE cut‐offs employed when the tool is used in community settings. Only one study reported data using a cut‐off outside our prespecified range of 3.3 to 3.6 (Goncalves 2011; cut‐off 4.1) To allow us to use the maximum available data we included these data in our summary analysis, with a sensitivity analysis demonstrating no significant effect of excluding them.

We recognise that IQCODE can be applied using various methods and we prespecified analyses to try and describe the effects on test accuracy. Our finding of no difference in the diagnostic accuracy between assessments conducted in the English language and those conducted in six other languages (grouped together as 'non‐English language' to allow analysis) is reassuring, and supports the cross‐cultural use of IQCODE. Similarly, the length of instrument (26‐item versus 16‐item) had no significant effect on test accuracy, a result in keeping with previous narrative review findings in Jorm 2004 and with previous review of IQCODE properties when used in a community setting (Quinn 2014).

Since IQCODE was originally designed for use in the older adult population, we felt it was important to ensure that a tool used to aid the diagnosis of dementia would be robust to the difficulties of assessment in younger age groups with potential early‐onset dementia (Vieira 2013).To explore age effects, we performed a sensitivity analysis removing studies with a low average age of included participants, and found test accuracy to be broadly similar.

We prespecified two other analyses based on diagnostic features. We accepted any validated clinical assessment system for our reference standard of dementia diagnosis but recognised that differing classifications operationalise dementia in slightly different ways. One included study used two diagnostic criteria in direct comparison (Jorm 1991). Using the same cut‐off of 3.6, the DSM III‐R resulted in a sensitivity of 0.69 and a specificity of 0.80 compared with ICD‐10 which produced a sensitivity of 0.80 and specificity of 0.82. As the majority of the other included studies used DSM criteria only, it was not possible to further describe any potential effect of diagnostic criteria on IQCODE accuracy. As a recognition of the different effects that subtypes of dementia have on the individual (Gure 2010), we felt it was reasonable to analyse the diagnostic properties of IQCODE with respect to specific subtypes of dementia. It had previously been demonstrated that the IQCODE performs differently in people with Alzheimer's disease dementia and those with frontotemporal dementia (Larner 2010). However, there was a lack of data available on dementia pathology in the included studies in our review and we were unable to offer subgroup analysis by dementia subtype.

Our 'Risk of bias' assessment using the QUADAS‐2 tool identified significant potential for bias in the included studies as described below. Given the modest number of included studies, we did not perform subgroup or sensitivity analyses to quantitatively explore these effects for each QUADAS‐2 domain.

Strengths and weaknesses of the review

Strengths and weaknesses of included studies

Our QUADAS‐2 and STARD assessments suggested potential problems of bias, poor generalisability and suboptimal reporting across the included studies. Areas of particular concern are highlighted in the text of the Characteristics of included studies table and summarised in Figure 2.

A key aspect of our QUADAS‐2 assessments was establishing whether authors prespecified the cut‐off used to define IQCODE positivity. Where authors calculate test accuracy across the range of potential IQCODE thresholds, they are not reflecting clinical practice, and test accuracy may be inflated if only the best‐performing cutpoints are reported. Thus, where cutpoints were not prespecified, we classified the paper as being at high risk of bias for the conduct of the index test. Only two of our included studies (Garcia 2002; Goncalves 2011) were deemed to be at low risk of bias for this domain.

In order that the findings of our analysis are applicable in practice it is essential that the recruited participants to the included studies are representative. Details about sampling procedures, particularly non‐consecutive or non‐random samples being recruited and studies inappropriately excluding those with relevant co‐morbidities, were an area of concern in our 'Risk of bias' assessment. A further concern about using the IQCODE is that it relies on the assessment of an informant, and not all patients have someone who can fulfil this role. Four of the included studies only recruited participants where an informant was present at the consultation (Garcia 2002; Goncalves 2011; Hancock 2009; Sikkes 2010). The significance of attending the memory clinic unaccompanied has previously been demonstrated to be a specific predictor of the individual not having a clinical diagnosis of dementia (Larner 2009). Nonetheless, reliance on an informant is a relevant factor when considering test accuracy as a screening tool, as those studies did not recruit participants who attended unaccompanied. One author adopted a broader approach and permitted the completion of the IQCODE by post or telephone (Harwood 1997).

We wanted to ensure that case‐control methodologies were not included in this review, given the propensity to falsely estimate test accuracy, as the prevalence has been artificially fixed. One of the included studies (Siri 2006) reported exactly equivalent numbers of those with dementia and those without (n = 100). The methodology described does not suggest that a case‐control design was used, but there is a lack of detail as to how the final sample was obtained.

Reporting quality impacts on the 'Risk of bias' assessment as, where procedures are not fully described, this limits the potential for judging the rigor of the methodology. The STARD assessments revealed a lack of reporting around disease severity and the handling of indeterminate results. Both of these have implications for the use of the tool in clinical practice. None of the included studies reported properties of IQCODE in relation to disease severity or stage. Intuitively test properties will differ comparing subtle, early dementia with later stage advanced disease; the optimal cut‐off may also change as the disease progresses.

Strengths and weaknesses of review process

In common with the other reviews in the suite (Harrison 2014; Quinn 2014), the review benefits from a structured and thorough search strategy created and conducted by an experienced Trials Search Co‐ordinator. We adopted an inclusive approach and identified relevant studies in a formal and standardised manner. We recognise that our search was performed in January 2013 and this may have led to the potential exclusion of relevant studies published more recently. Quality assessment was guided by our dementia‐specific QUADAS‐2 anchoring statements which were devised for use in diagnostic test accuracy studies that compare a cognitive index test and clinical reference standard (Davis 2013). In addition, our quality assessment was complemented by formal assessment of reporting quality using the STARD methodology (Appendix 7), an approach which has been shown to add rigor in test accuracy evaluation (Oliveira 2011). Had it been available at the time of analysis, the dementia‐specific STARDdem guidance on reporting may have better described the challenges inherent in reporting research around dementia tests (Noel‐Storr 2014).

We were inclusive in our initial search of the literature and assessed study reports which were not available in English, making use of translation services to facilitate study selection and data extraction. Although only one paper written in Spanish met the final inclusion criteria, this approach meant studies were not inappropriately excluded due to their language of presentation.

Contacting study authors was highly productive, allowing for clarification of methodology, for example, to ensure case‐control designs were not included; updating citations identified as abstracts to allow for the subsequent full‐text publications to be cited; and provision of data in a format suitable for inclusion in the quantitative synthesis.

Our review question was focused to facilitate the assessment of the test properties of IQCODE in a secondary‐care setting. Where a study included a non‐secondary‐care setting, we excluded the data from this review but considered them for reviews of IQCODE in other healthcare settings. We excluded studies concerned with the diagnosis of mild cognitive impairment from our quantitative synthesis, as our objective was to assess the diagnostic accuracy of IQCODE for the diagnosis of dementia. We prespecified a series of subgroup and sensitivity analyses to look at hospital settings, IQCODE application and dementia diagnosis. Not all of these analyses were possible due to limited data; we were mindful of not over‐analysing what was a modest dataset and did not perform post hoc analyses or analyses relating to QUADAS‐2 domains.

Comparisons with previous research

Our findings are in keeping with reviews assessing the test accuracy of the IQCODE in other healthcare settings. In the review describing IQCODE as used in the community setting, summary sensitivity was 0.80 (95% confidence interval (CI) 0.75 to 0.85) and specificity was 0.84 (95% CI 0.78 to 0.90). In the community review, the form of IQCODE (26 versus 16 items) similarly had no effect on accuracy and there was no obvious optimal cut‐off for IQCODE across the range 3.3 to 3.6 (Quinn 2014). The third in the suite of IQCODE reviews, assessing accuracy in primary care, had no quantitative synthesis as we found only one relevant study (Harrison 2014). The IQCODE has been assessed in comparison to other informant or self‐completed instruments, although without presenting quantitative synthesis (Cherbuin 2008). Other authors have concluded that a combined approach of tools, often with direct patient assessment and informant assessment, is required in view of the complexity of diagnosis and disease subtypes (Stephan 2010; Cullen 2007).

Applicability of findings to the review question

Our focused review question concerned the accuracy of IQCODE for dementia diagnosis in a secondary‐care/hospital setting. We believe our robust search and clear operationalisation of the hospital setting have allowed us to comprehensively collate all available evidence on this question. Although the number of included studies was modest with substantial heterogeneity, we were still able to offer quantitative summary analyses of IQCODE test accuracy.

Figure 1

Study flow diagram.

Figure 2

Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study

Figure 3

Summary ROC Plot, IQCODE using a 3.3 threshold score or nearest. The dark point is a summary point, the broken line represents 95% confidence region

Figure 4

Summary ROC Plot of IQCODE 3.3 threshold or nearest, comparing short form (16 item) and traditional IQCODE. The dark point is a summary point, the broken line represents 95% confidence region

Figure 5

Summary ROC Plot of pooled IQCODE data at a 3.3 threshold (or nearest value), with language as covariate. The dark point is a summary point, the broken line represents 95% confidence region

Figure 6

Summary ROC Plot of pooled IQCODE data at a 3.3 threshold (or nearest value), with setting as covariate. The dark point is a summary point, the broken line represents 95% confidence region

Test 1

All studies IQCODE 3.3 or closest

Test 2

All 16‐item IQCODE

Test 3

All 26‐item IQCODE

Test 4

IQCODE 3.3 Threshold

Test 5

IQCODE 3.4 Threshold

Test 6

IQCODE 3.5 Threshold

Test 7

IQCODE 3.6 Threshold

Test 8

IQCODE >3.6 Threshold

Test 9

16‐item IQCODE 3.3 Threshold

Test 10

16‐item IQCODE 3.4 Threshold

Test 11

16‐item IQCODE 3.5 Threshold

Test 12

16‐item IQCODE 3.6 Threshold

Test 13

26‐item IQCODE 3.3 Threshold

Test 14

26‐item IQCODE 3.4 Threshold

Test 15

26‐item IQCODE 3.6 Threshold

Test 16

Sensitivity analysis removing Goncalves

Test 17

Sensitivity analysis removing low average age

Summary of findings 1. Summary of findings

Study ID	Country	Subjects (n)	Mean Age (yrs)	IQCODE version	Language	Dementia diagnosis	Dementia prevalence N (%)	Other assessments
Flicker 1997	Australia	377 (299 from MC)	73.4 MC 79.7 ACAT	26 item	English	DSM‐III‐R	n = 248 (65.8)	AMT; MMSE
Garcia 2002	Spain	113	78	16 item	Spanish	DSM‐III‐R	n = 90 (87.4)	MMSE
Goncalves 2011	Australia	204	76.9	16 item	English	DSM‐IV‐TR	n = 152 (74.5)	RUDAS; SMMSE
Hancock 2009	UK	144	67 (median)	26 item	English	DSM‐IV	n = 85 (59.0)	ACE‐R; MMSE
Harwood 1997	UK	177	76	16 item	English	DSM‐III‐R	n = 21 (10.5)	AMT
Jorm 1991	Australia	69	80	26 item	English	DSM‐III‐R; ICD‐10	n = 24 (34.8)	MMSE
Knaefelc 2003	Australia	323	74.7	16 item	English	DSM‐IV	n = 229 (70.9)	CAMDEX
Mackinnon 1998	Switzerland	106	80.3	16 item	French	DSM‐IV	n = 58 (54.7)	MMSE
Mulligan 1996	Switzerland	76	81.8	26 item	French	DSM‐III‐R	n = 33 (43.4)	AEMT; MMSE (French)
Narasimhalu 2008	Singapore	576	65 ‐ 70 (mean by diagnosis)	16 item	Cantonese	DSM‐IV	n = 169 (29.3)	MMSE (Singapore)
Sikkes 2010	The Netherlands	328 (59 known MCI)	68.4	16 item	Dutch	NINCDS‐ADRDA	n = 180 (54.9)	MMSE
Siri 2006	Thailand	200	72.9	32 item & 26 item	Thai	DSM‐IV	n = 100 (50.0)	BOMC; CDT; MIS; MMSE
Tang 2003	China	189	74.2	26 item	Chinese	DSM‐IV	n = 24 (12.7)	CDR; MMSE
See Characteristics of included studies for more detailed study descriptors Abbreviations: ACAT ‐ Aged Care Assessment Team Group; ACE‐R ‐ Addenbrooke's Cognitive Examination‐Revised; AEMT ‐ Antisaccadic Eye Movement Test; AMT ‐ Abbreviated Mental Test; BOMC ‐ Blessed Orientation Memory; CAMDEX ‐ Cambridge Mental Disorders of the Elderly Examination; CDR ‐ Clinical Dementia Rating Scale; CDT ‐ Clock Drawing Test; DSM ‐ American Psychiatric Association Diagnostic and Statistical Manual of Mental Disorders; MC ‐ Memory Clinic Group; MCI ‐ Mild Cognitive Impairment; MIS ‐ Memory Impairment Screen; MMSE ‐ Mini Mental State Examination; NINCDS‐ADRDA ‐ National Institute of Neurological and Communicative Disorders and Stroke and Alzheimer's Disease and Related Disorders Association; NINDS‐AIREN ‐ National Institute of Neurological Disorders and Stroke and the Association Internationale pour la Recherche et l'Enseignement en Neurosciences; RUDAS ‐ Rowland Universal Dementia Assessment Scale; SMMSE ‐ Standardized Mini Mental State Examination

Summary of findings 1. Summary of findings

Summary of findings 2. New Summary of findings table

Test	Summary accuracy (95% CI)	No. of participants (studies)	Dementia prevalence	Implications, Quality and Comments
What is the accuracy of the Informant Questionnaire for Cognitive Decline in the Elderly (IQCODE) test for detection of dementia when differing thresholds are used to define IQCODE positive cases
Population	Adults attending secondary‐care services, with no restrictions on the case mix of recruited participants
Setting	Our primary setting of interest was secondary care; within this rubric we included inpatient wards and hospital outpatient clinics
Index test	Informant Questionnaire for Cognitive Decline in the Elderly (IQCODE) administered to a relevant informant. We restricted analyses to the traditional 26‐item IQCODE and the commonly‐used short form IQCODE with 16 items
Reference Standard	Clinical diagnosis of dementia made using any recognised classification system
Studies	We included cross‐sectional studies but not case‐control studies

IQCODE cut‐off 3.3 or nearest	sens: 0.91 (95% CI 0.86 to 0.94); spec: 0.66 (95% CI 0.56 to 0.75) +ve LR: 2.7 (95% CI 2.0 to 3.6) ‐ve LR: 0.14 (95% CI 0.09 to 0.22)	n = 2745 (13 studies)	n = 1413 (51%)	Within the range of commonly used cut‐offs for defining IQCODE positivity, there is no clearly optimal value for use in secondary care settings. The sensitivity falls as the diagnostic threshold increases from 3.3‐3.6, with a relative increase in the specificity. The preferred balance between sensitivity and specificity is debatable. Both false positive (person diagnosed with possible dementia and referred for further assessment) and false negative (person with dementia has diagnosis missed and is not referred to specialist services) are associated with potential harms. The dementia prevalence was highly varied in included studies (10.5% to 87.4%) reflecting the heterogeneity of included participants within a "hospital" setting. This heterogeneity and associated "modelling" of real world implications of the test accuracy data presented are described in the next summary of findings table.
IQCODE cut‐off 3.3	sens: 0.96 (95% CI 0.94 to 0.98) spec: 0.66 (95% CI 0.41 to 0.84) +ve LR: 2.8 (95% CI 1.5 to 5.5) ‐ve LR: 0.1 (95% CI 0.03 to 0.1)	n = 722 (4 studies)	n = 334 (46%)
IQCODE cut‐off 3.4	sens: 0.94 (95% CI 0.84 to 0.98) spec: 0.73 (95% CI 0.59 to 0.85) +ve LR: 3.5 (95% CI 2.1 to 5.8) ‐ve LR 0.1 (95% CI 0.03 to 0.2)	n = 1211 (4 studies)	n = 394 (33%)
IQCODE cut‐off 3.5	sens: 0.92 spec: 0.63	n = 269 (1 study)	n = 152 (57%)
IQCODE cut‐off 3.6	sens: 0.89 (95% CI 0.85 to 0.92) spec: 0.68 (95% CI 0.56 to 0.79) +ve LR: 2.8 (95% CI 1.9 to 4.0) ‐ve LR: 0.2 (95% CI 0.1 to 0.2)	n = 1576 (9 studies)	n = 968 (61%)
CAUTION: The results in this table should not be interpreted in isolation from the results of the individual included studies contributing to each summary test accuracy measure. These are reported in the main body of the text of the review. **: quantitative synthesis not performed as only one study reported data at cut‐off of 3.5
Abbreviations: sens ‐ sensitivity; spec ‐ specificity; +ve LR ‐ positive likelihood ratio; ‐ve LR ‐ negative likelihood ratio

Summary of findings 2. New Summary of findings table

Summary of findings 3. New Summary of findings table

What is the accuracy of the Informant Questionnaire for Cognitive Decline in the Elderly (IQCODE) test for detection of dementia using different versions of IQCODE and using different languages of administration
Population	Adults attending secondary care services, with no restrictions on the case‐mix of recruited participants
Setting	Our primary setting of interest was secondary care, within this rubric we included inpatient wards and hospital outpatient clinics. Secondary care settings can be considered as two groups: (1) Studies conducted in a specialist memory/psychogeriatrics setting where participants were referred due to cognitive symptoms (2) Non‐memory focused hospital services. These included unselected admissions of older adults, those referred to specialist older people's assessment teams, outpatient attenders and inpatients under the care of geriatricians
Index test	Informant Questionnaire for Cognitive Decline in the Elderly (IQCODE) administered to a relevant informant. We restricted analyses to the traditional 26‐item IQCODE and the commonly used short form IQCODE with 16 items
Reference Standard	Clinical diagnosis of dementia made using any recognised classification system
Studies	Cross‐sectional studies were included, we did not include case‐control studies
Comparative analyses
Test	No. of participants (studies)	Dementia prevalence total across studies	Findings	Implications
26‐item versus 16‐item IQCODE	Total: n = 2745 (13) 26 item n = 977 (6)	Total n = 1413 (51%) 26 item n = 514 (53%) 16 item n = 899 (51%)	No significant difference in test accuracy Relative sensitivity of 26‐item versus 16‐item IQCODE: 0.98 (95% CI 0.89 to 1.07) Relative specificity of 26‐item versus 16‐item IQCODE: 0.99 (95% CI 0.75 to 1.33)	There was no difference in accuracy between IQCODE versions so it may be justifiable to advocate use of the short form to minimise responses required
English language versus Non‐English	Total: n = 2745 (13) English language n = 1216 (6)	Total n = 1413 (51%) English language n = 759 (62%) Non‐English language n = 654 (43%)	No significant difference in test accuracy Relative sensitivity of English language versus non‐English language IQCODE: 1.07 (95% CI 0.98 to 1.17) Relative specificity of English language versus non‐English language IQCODE: 1.10 (95% CI 0.83 to 1.47)	The language of administration does not significantly influence the diagnostic accuracy of IQCODE
Non‐memory setting versus memory	Total: n = 1918 (9)* memory setting n = 1352 (6)	Total n = 1129 (59%) memory setting n = 984 (73%) non‐memory setting n = 145 (26%)	Significant difference in test accuracy between settings (P = 0.019), due to higher specificity in non‐memory settings Relative sensitivity of non‐memory versus memory IQCODE: 1.06 (95% CI 0.99 to 1.15) Relative specificity of non‐memory versus memory IQCODE: 1.49 (1.22 to 1.83)	The lower level of specificity in the specialist memory services is of limited clinical concern as other tests will be used in this setting and incorrectly diagnosing someone with dementia based on IQCODE alone would be unlikely. In the non‐memory setting it is likely a positive IQCODE would prompt referral to specialist services, and this may be associated with psychological harm and unnecessary expense. Applying our non‐memory findings to the UK; there are around 2 million unscheduled admissions annually in over‐65s (Imison 2012) and a dementia prevalence of 42.4% in this group (Sampson 2009). Using the IQCODE alone to screen for dementia would result in 42,400 people with dementia not being identified and 218,880 dementia‐free people being referred inappropriately for specialist assessment.
CAUTION: The results in this table should not be interpreted in isolation from the results of the individual included studies contributing to each summary test accuracy measure. These are reported in the main body of the text of the review. *: Four studies included participants recruited in both specialist memory and non‐memory settings, without reporting outcome data stratified by recruitment setting and are thus not included in the quantitative synthesis

Summary of findings 3. New Summary of findings table

Table 1. Summary of test accuracy at study level

Study ID	Participants (n)	Primary threshold	Sensitivity (%)	Specificity (%)
Flicker 1997	299*	3.6	87	58
Garcia 2002	103	3.6	92	81
Goncalves 2011	204	4.1	72	67
Hancock 2009	144	3.6	86	39
Harwood 1997	177	3.3	100	78
Jorm 1991	69	3.6	71	80
Knaefelc 2003	323	3.6	94	47
Mackinnon 1998	106	3.6	90	65
Mulligan 1996	76	3.3	100	42
Narasimhalu 2008	576	3.4	86	78
Sikkes 2010	269*	3.3	96	42
Siri 2006	200	3.3	94	88
Tang 2003	189	3.4	88	75
Where multiple thresholds were reported, we used the value closest to 3.3 to populate this table *Total number of participants adjusted to reflect numbers included in quantitative synthesis

Table 1. Summary of test accuracy at study level

Table Tests. Data tables by test

Test	No. of studies	No. of participants
1 All studies IQCODE 3.3 or closest Show forest plot	13	2745

2 All 16‐item IQCODE Show forest plot	7	1768

3 All 26‐item IQCODE Show forest plot	6	977

4 IQCODE 3.3 Threshold Show forest plot	4	722

5 IQCODE 3.4 Threshold Show forest plot	4	1211

6 IQCODE 3.5 Threshold Show forest plot	1	269

7 IQCODE 3.6 Threshold Show forest plot	9	1576

8 IQCODE >3.6 Threshold Show forest plot	3	772

9 16‐item IQCODE 3.3 Threshold Show forest plot	2	446

10 16‐item IQCODE 3.4 Threshold Show forest plot	3	1022

11 16‐item IQCODE 3.5 Threshold Show forest plot	1	269

12 16‐item IQCODE 3.6 Threshold Show forest plot	5	988

13 26‐item IQCODE 3.3 Threshold Show forest plot	2	276

14 26‐item IQCODE 3.4 Threshold Show forest plot	1	189

15 26‐item IQCODE 3.6 Threshold Show forest plot	4	588

16 Sensitivity analysis removing Goncalves Show forest plot	12	2541

17 Sensitivity analysis removing low average age Show forest plot	10	1756

Table Tests. Data tables by test