The effectiveness and cost-effectiveness of donepezil, galantamine, rivastigmine and memantine for the treatment of Alzheimer's disease (review of TA111): a systematic review and economic model

Produced by: Peninsula Technology Assessment Group (PenTAG), Peninsula Medical School, University of Exeter

APPENDICES

APPENDIX 1: OUTCOME MEASURES
APPENDIX 2: LITERATURE SEARCH STRATEGIES
APPENDIX 3: DATA EXTRACTION FORMS
APPENDIX 4: FUNNEL PLOTS FROM THE SYNTHESIS WITH EXISTING EVIDENCE
APPENDIX 5: COMBINED DOSE AND DOSE-SPECIFIC META-ANALYSES
APPENDIX 6: DATA SETS USED IN META-ANALYSIS OF POOLED MULTIPLE OUTCOME MEASURES
APPENDIX 7: META-REGRESSION FIGURES
APPENDIX 8: WINBUGS CODE FOR MIXED TREATMENT COMPARISONS
APPENDIX 9: MIXED TREATMENT COMPARISONS PERFORMED IN SPECIFIED MEASUREMENT POPULATIONS
APPENDIX 10: STUDIES INCLUDED BY INDUSTRY BUT EXCLUDED FROM THE PENTAG CLINICAL EFFECTIVENESS SYSTEMATIC REVIEW
APPENDIX 11: ONGOING TRIALS
APPENDIX 12: PRISMA STATEMENT CHECKLIST
APPENDIX 13: SUMMARY TABLES OF RESULTS FROM THE INSTITUTE OF QUALITY AND EFFICIENCY IN HEALTH CARE
APPENDIX 14: MEMANTINE + ACHEI V. PLACEBO + ACHEI
APPENDIX 15: UPDATE ON EVIDENCE ABOUT THE CARE COST OF ALZHEIMER'S DISEASE IN THE UK
APPENDIX 16: CONSIDERATION OF A TWO-DIMENSIONAL MARKOV MODEL FOR ALZHEIMER'S DISEASE
APPENDIX 17: PREVIOUS CRITICISMS OF THE SHTAC ALZHEIMER'S DISEASE MODEL
APPENDIX 18: PUBLISHED UTILITY VALUES FOR ALZHEIMER'S DISEASE
APPENDIX 19: FIGURES FROM THE STATISTICAL ANALYSIS OF IPD FROM WOLSTENHOLME AND COLLEAGUES
APPENDIX 20: GRAPHICAL PRESENTATION OF DISTRIBUTIONS FOR PSA
APPENDIX 21: TORNADO PLOTS FOR ACHEI VERSUS BEST SUPPORTIVE CARE
REFERENCES TO APPENDICES

AChEIs & memantine for Alzheimer's Appendices
Confidential material removed PenTAG 2010

Appendix 1: Outcome measures

These Tables of outcome measures have been copied from the previous TAR, TA 111, Appendix 6.1

Global outcome measures

Type: Clinical Dementia Rating (CDR) and Clinical Dementia Rating Sum of Boxes (CDR-SB)
Construct measure and scoring: Cognitive impairment in memory, orientation, judgement/problem-solving, community affairs, home/hobbies, and personal care. 0=none, 0.5=questionable, 1=mild, 2=moderate, 3=severe. CDR-SB is a modified form which sums the ratings in the six performance categories to give a global dementia ranking.
Critical appraisal: Provides physicians with a global rating that encompasses a broad range of patient characteristics and can be used by neurologists, psychiatrists, and psychologists. It focuses on cognition, not on items that may be related to other medical, emotional or social conditions. Good inter-rater reliability and fair to good concurrent validity. Although no work has been done on test-retest reliability, nothing so far suggests that researchers should avoid this scale when trying to stage AD. The CDR can be used as an eligibility criterion for trial participation or as an outcome measure.

Type: Global Deterioration Scale (GDS)
Construct measure and scoring: Progressive stages of cognitive impairment. 1 (no cognitive decline) to 7 (very severe cognitive decline).
Critical appraisal: Most frequently used but ratings can misstate a patient's severity. Problems might arise when the GDS is used as an inclusion criterion for participation in an RCT. The ability to enrol desired patients could be threatened if the GDS misidentifies the stages of dementia. The GDS should not be used to stage dementia in Alzheimer's Disease drug trials.

Type: Clinical Global Impression of Change scale (CGIC) and the global improvement index with interviewing of patients, Clinician Interview-Based Impression of Change (CIBIC), and with caregiver input (CIBIC-M or CIBIC-Plus)
Construct measure and scoring: Overall improvement in patient health status assessed by clinician (with or without the caregiver). 1 (very much improved) to 7 (very much worse). A number of different variations are available. Scale is nonparametric and of a non-interval nature.
Critical appraisal: Fair to good test-retest and inter-rater reliability and concurrent validity. Differing results may arise from the fact that groups providing global assessments do not base their ratings on the same domains: physicians take clinical psychopathology as the basis of determining global improvement, whereas nurses believe the amount of work needed to care for patients is important. Because this instrument also includes a caregiver opinion, results may differ depending on whether the rater first interviews the patient or the caregiver. The number of different variations may have reduced the validity.

Type: Gottfries-Bråne-Steen (GBS)
Construct measure and scoring: Motor function, intellectual function, emotional function and symptoms common to demented patients. 0 (normal function or absence of symptoms) to 6 (maximal disturbance or presence of symptoms).
Critical appraisal: Psychometric properties range from fair to good. The scale is a useful means of quantifying dementia in drug trials. GBS should not be used as a diagnostic tool.

Type: Mental Function Impairment Scale (MENFIS)
Construct measure and scoring: A modification of the GBS prepared by the study authors for a previous study. Scores range from 0 to 78, with a higher score indicating a greater degree of deficit.
Critical appraisal: Unable to source data on reliability and validity.

Type: Patient Global Assessment (PGA)
Construct measure and scoring: 7-point Likert scale ranging from 1 (very much improved) through 4 (no change) to 7 (very much worse).
Critical appraisal: Unable to source data on reliability and validity.

Cognitive outcome measurement scales

Type: Alzheimer's Disease Assessment Scale-cognitive (ADAS-cog)
Construct measure and scoring: Orientation, memory, language and praxis. 0-70, with higher scores indicating greater impairment.
Critical appraisal: Limited in its ability to detect change at one end or the other of the severity continuum. For many subtests, detection of improvement appears only possible for a restricted range of severity levels. Limitations should be considered when used as a drug efficacy measure. The rate of decline of AD using ADAS-cog suggests that the decline is non-linear and not a constant but is dependent on the stage of the disease. Content and ecological validity are lacking.

Type: Benton Visual Retention Test (BVRT)
Construct measure and scoring: Assesses visual perception, visual memory, and visuoconstructive abilities. The test has three alternate forms, each consisting of ten designs. In addition, there are four possible modes of administration. Scoring is based on an assessment of the number and types of errors made compared with the expected scores found in the norm tables. The wider the discrepancy in favour of the expected score, the more probable it is that the participant has suffered neurological impairment.
Critical appraisal: The interscorer agreement for total error score is high and for major categories of errors reliability is moderate to high. A correlation of 0.42 was found between the Benton and the Digit Span WAIS subtest. This low correlation indicates discriminant validity since the Benton was created to supplement the Digit Span test. Educational level may influence a participant's score on the test. Participants with higher educational levels tend to use a more exhaustive exploration strategy during the recognition phase of the test, allowing them to perform better than participants with lower educational levels. The executive working memory component is more efficient in participants with higher educational levels.

Type: Computerised Memory Battery (CMBT)
Construct measure and scoring: A computerised version of the Memory Assessment Clinical Battery (MAC) designed to simulate critical cognitive tasks: Name-Face Association (delayed recall and total acquisition); First and Last Names (total acquisition); Facial Recognition (first miss and total correct); Telephone Number Recall (7-digit and 10-digit number correct); House and Object Placement Task (total acquisition and first trial).
Critical appraisal: The MAC-Q questionnaire demonstrates internal consistency and test-retest reliability.

Type: Clinical Global Impression-item 2 (CGI-2)
Construct measure and scoring: This rating instrument expresses the global change in observable cognitive functioning directly on a transitional scale ranging from 1 (very much improved) to 7 (very much deteriorated) as rated by a clinician.
Critical appraisal: This is a sub-test of the CGI. It is easy and quick to administer and is widely used in clinical and trial settings.

Type: Digit symbol substitution subtest (DSST) of the Wechsler Adult Intelligence Scale-Revised
Construct measure and scoring: Participants fill in a grid of 100 blank squares, each paired with a randomly assigned number from 1 to 9, using a key that pairs each number with a different symbol. The score is the number of correct answers after 90 seconds.
Critical appraisal: Performance on this test is affected by many different components, so the test lacks specificity. Participants with impaired vision or visuomotor coordination, pronounced motor slowing or low education levels are at a disadvantage.

Type: Fuld object-memory evaluation (FOME)
Construct measure and scoring: Ten-item assessment in which ten common objects in a bag are presented "to determine whether the patient can identify objects by touch" (stereognosis). The test was developed while testing large samples of aged adults, nursing home residents and community active people, for whom norms are provided.
Critical appraisal: Unable to source data on reliability and validity.

Type: Mini-Mental State Examination (MMSE)
Construct measure and scoring: 11 questions on orientation, memory, concentration, language and praxis. Scale ranges from 0-30. Higher score indicates less impairment. There is no range of scores that can be rigidly and universally applied to indicate dementia severity, i.e. as a marker of mild, moderate and severe dementia. In clinical trials a score of 21-26 is often associated with mild AD, moderate AD is associated with an MMSE of 10 to 20 and severe AD is usually associated with an MMSE of less than 10. This may be less suitable within routine daily practice.
Critical appraisal: Good reliability and validity for its original purpose of screening for dementia, but short screening scales are not designed to measure more subtle aspects of cognition. Short scales such as the MMSE may indicate little or no change over time in subjects who would otherwise be shown to have declined substantially if another scale had been used to measure change in status. Not an ideal outcome measure for AD drug trials, especially if the expected benefits are not large. It depends on intact language ability and there are no available validated versions in languages suitable for use with ethnic minorities. It cannot be used effectively in people with low IQs or learning disabilities.

Type: Severe Impairment Battery (SIB)
Construct measure and scoring: A measure of cognition that was developed to assess a range of cognitive functioning in individuals who are too impaired to complete standard neuropsychological tests and takes into account specific behavioural and cognitive deficits associated with severe dementia. It is composed of 40 simple one-step commands which are scored on a three-point scale and are presented in conjunction with gestural cues. The SIB also allows for non-verbal and partially correct responses. The six major subscales are attention, orientation, language, memory, visuo-spatial ability, and construction. Overall scores range from 0-100, with positive score changes indicating clinical improvement.
Critical appraisal: The SIB has been shown to be psychometrically reliable and clinical norms are available. No further details of reliability and validity have been sourced.

Type: Syndrom Kurz Test (SKT)
Construct measure and scoring: A psychometric test battery for the assessment of memory and attention. The SKT consists of nine 1-minute subtests that are partly speed oriented and partly span oriented: scaled subtest scores are aggregated to an SKT total status score ranging from 1 (very good) to 27 (very poor).
Critical appraisal: This test has shown good test-retest reliability. Correlations with other cognitive measures support its validity as a cognitive outcome measure for AD.

Type: Ten Point Clock Drawing Test
Construct measure and scoring: This is a screening test for dementia, in particular for assessing visuospatial and executive functions. Patients have to draw in the digits within a pre-drawn circle.
Critical appraisal: This test has been shown to be both reliable and valid and is simple and easy to administer with good sensitivity and specificity.

Type: Trail Making Test (TMT)
Construct measure and scoring: Assesses speed of visual search, attention, mental flexibility and motor function. The test has two parts: A) drawing a line linking numbers in sequence and B) drawing a line linking numbers and letters in alternating sequence. The reviewer calls any mistakes to the attention of the participant, and these must be corrected before progressing. The score is the time taken to successfully complete a test.
Critical appraisal: Reliability is reported to be higher for part A than for part B, which requires more information-processing ability and is more sensitive to brain damage. Reliability is restricted due to the use of time scores rather than both error counts and time scores, since error correction may take longer in some participants than others. Scores are strongly affected by the participant's education level.

Type: Wechsler logical memory test
Construct measure and scoring: This test is one of 13 subtests of the Wechsler Memory Scale-Revised. The first subtest is for screening purposes, and the other 12 are grouped into five separate memory areas. The test manual provides guidelines for scoring and weighting, and provides norms for individuals aged 16-74 with information about significant differences between any two scores.
Critical appraisal: Test-retest reliability and concurrent validity with a verbal learning test are adequate for the whole WMS-R test. Level of education affects a participant's score. Normative data for those aged 75 and over are lacking. The score is more heavily influenced by verbal memory performance than by other memory components.

Functional and quality of life outcome measurement scales

Type: Alzheimer's Disease Cooperative Study-Activities of Daily Living (ADCS-ADL)
Construct measure and scoring: This rating scale is a 23-item assessment of ADLs that is scored from 0 (greatest impairment) to 78. It evaluates activities of daily living.
Critical appraisal: The ADCS-ADL is a structured questionnaire originally created to assess functional capacity over a broad range of severity of dementia. The ADCS-ADL19 is a subset of the original ADCS-ADL inventory and focuses on items appropriate for the assessment of later stages of dementia. The sensitivity and reliability of this modification have been established.

Type: Alzheimer's Disease Functional Assessment and Change Scale (ADFACS)
Construct measure and scoring: Scale consists of 10 items for instrumental ADL: ability to use the telephone, performing household tasks, using household appliances, handling money, shopping, preparing food, ability to get around both inside and outside the home, pursuing hobbies and leisure activities, handling personal mail, grasping situations or explanations. Scale has a range of 0 to 54 where lower scores correspond to better function. Test takes approximately 20 minutes to complete.
Critical appraisal: Full assessment of psychometric properties not yet published. Has face validity for those with mild-moderate AD. The ADL items chosen for this scale have been demonstrated to be sensitive to change over 12 months, correlate well with MMSE scores, and have good test-retest reliability (although several questions have been modified in the scale).

Type: Behavioural Rating Scale for Geriatric Patients (BGP)
Construct measure and scoring: Consists of 35 items (scored 0, 1, or 2) assessing observable aspects of cognition, function and behaviour. A high score indicates worse function.
Critical appraisal: Unable to source data on reliability and validity.

Type: Bristol Activities of Daily Living scale (BADL)
Construct measure and scoring: Caregiver assessment of 20 ADLs. Categories included are food, eating, drinks, drinking, dressing, hygiene, teeth, bath, toilet, transferring, mobility, orientation to time and space, communication, telephone, housework/gardening, shopping, finances, hobbies, and transport. Scores range from 0-60 with higher scores indicating better function.
Critical appraisal: Designed specifically for use with patients with dementia. Face validity was measured by asking carers whether items were important, and construct validity was confirmed by principal components analysis. Concurrent validity was assessed by observed performance, the test has good content validity, and there is good test-retest reliability. The test is shown to correlate well with performance ADLs and tests of cognitive function.
Type: Caregiver-rated Modified Crichton Scale (CMCS)
Construct measure and scoring: A modified Crichton Geriatric Rating Scale (CGRS). This is a seven-item scale using a Likert-type scoring method. Questions include comprehension of time and place, carrying out conversation, cooperation, restlessness, dressing, social activities and leisure. Negative change relates to clinical improvement.
Critical appraisal: Reliability demonstrated. Unable to source data on validity.

Type: Disability Assessment for Dementia (DAD)
Construct measure and scoring: This rating scale is a 46-item structured interview or questionnaire for the caregiver that is scored from 0 to 100 (least impairment). It evaluates ADLs and takes approximately 20 minutes to complete. It is based on a recognised conceptual definition of disability from the WHO.
Critical appraisal: The DAD scale demonstrates a high degree of internal consistency and excellent interrater and test-retest reliability. Full details of concurrent and construct validity not yet published.

Type: Functional Assessment Staging scale (FAST)
Construct measure and scoring: Assesses the magnitude of progressive functional deterioration in patients with dementia by identifying characteristic progressive disabilities. Seven major stages range from normal (stage 1) to severe dementia (stage 7).
Critical appraisal: FAST has been shown to be a reliable and valid assessment technique for evaluating functional deterioration in AD patients throughout the entire course of the illness. Because the elements of functional capacity incorporated in FAST are relatively universal and readily ascertainable, as well as characteristic of the course of AD, FAST can serve as a strong diagnostic and differential diagnostic aid for clinicians.
Type: General Health Questionnaire (GHQ-30)
Construct measure and scoring: The GHQ-30 is a self-report psychiatric screening test, and items include questions on: depression and unhappiness, anxiety and felt psychological disturbance, social impairment, and hypochondriasis. Participants rate themselves on a four-point severity scale, according to how they have recently experienced each GHQ item: better than usual, same as usual, worse than usual, or much worse than usual. Normally each item is scored either 0 or 1, depending on which severity choice is selected. Individual items are summed to give the total score.
Critical appraisal: GHQ-30 is based on the Medical Outcomes Study Short Form-36, which is extensively validated.

Type: Instrumental Activities of Daily Living (IADL)
Construct measure and scoring: For women, the set of behaviours assessed includes telephoning, shopping, food preparation, housekeeping, laundering, use of transport, use of medicine and ability to handle money. For men, the areas of food preparation, housekeeping and laundering are excluded. Each of the behavioural areas is given a score of 0 or 1, leading to an overall score that ranges from 0 to 8 for women and from 0 to 5 for men.
Critical appraisal: The IADL is a very frequently used and often cited instrument for assessing the instrumental competence of elderly patients. The scale is well anchored from a theoretical point of view and the behaviours that are included are likely to be affected in the first stages of dementia.

Type: The Interview for Deterioration in Daily Living in Dementia (IDDD)
Construct measure and scoring: The IDDD measures functional disability in self-care (16 items such as washing, dressing and eating) and complex activities (17 items such as shopping, writing, and answering the telephone). Severity of impairment is rated on a 7-point scale, where 1-2=no or slight impairment, 3-4=mild impairment, 5-6=moderate impairment, 7=severe impairment, giving a total range score of 22-231.
Critical appraisal: This scale appears to be appropriate to assess community-living patients with mild and moderate levels of dementia. It assesses a substantial proportion of complex activities likely to be affected during the first stages of AD. The number of non-redundant items in the scale is viewed positively since it may increase the sensitivity of the tool. Empirical information on the testing of the IDDD and its measurement properties is seriously lacking.

Type: Physical Self-Maintenance Scale (PSMS)
Construct measure and scoring: Measured through competence in 6 behaviours: toileting, feeding, dressing, grooming, locomotion and bathing. It can be completed by untrained staff based on information from subjects, caregivers, friends etc. Each behavioural area is given a score of 1 or 0, with the overall score ranging from 0 to 6. Using Guttman scaling, each scale point has 5 descriptive scale points.
Critical appraisal: Brief assessment of activities of daily living. Theoretically well grounded, it has been proven useful for evaluation of institutionalised elderly but has a ceiling effect for those living in the community. Testing of psychometric properties is incomplete.

Type: The Progressive Deterioration Scale (PDS)
Construct measure and scoring: PDS examines activities of daily living and instrumental activities of daily living. Examples are: extent to which a patient can leave the immediate neighbourhood, use of familiar household implements, involvement in family finances, budgeting. Each question is scored by measuring the distance along the line on a scale from 0 to 100, with higher scores reflecting better functionality. A composite score is derived from averaging across the items for a maximal score of 100. The scale is sometimes classified as a measure of quality of life.
Critical appraisal: This scale has been shown to be sensitive to three severity stages of dementia, although some debate whether the content is adequate to assess those with moderately severe AD. The scale was systematically developed and tested on a fairly large sample of AD patients (although the mean age of the final test group was only 69.5 years). Test-retest reliability was determined in 123 patients, giving stage correlations (rs) of 0.889 for early AD (14 participants), 0.775 for 44 middle stage participants and 0.775 for 65 late stage participants. A moderate degree of correlation has been demonstrated between PDS and ADAS-cog scores (rp = -0.57 to -0.64). There is considerable reduplication within the scale: 4 questions relate to handling finances but there are no items pertaining to basic activities such as washing, dressing and toileting. The scale is therefore not thought to have adequate content to assess people with moderately severe AD as it does not assess the wide range of daily living skills affected at different stages of the disease. There are high levels of between- and within-patient variability (in the order of 12 points) which may make it less suited to detect differences over short time periods.

Type: QOL (patient and caregiver scales)
Construct measure and scoring: This assessment was a 7-item patient-rated scale evaluating the patient's perceptions of their well-being in terms of relationships, eating and sleeping, and social and leisure activities. The test is conducted by interview. Scored on an analogue scale from 0 (worst quality) to 50 (best quality).
Critical appraisal: This instrument has not been validated in patients with Alzheimer's disease but was selected because no QOL instrument has been validated in this population.
Type: Unified Activities of Daily Living Form (Unified ADL)
Construct measure and scoring: All self-care and mobility variables commonly used to assess a patient's functional status. A 20-item scale was produced. The need for assistance is scored for every item, on a 10-point scale.
Critical appraisal: The psychometric properties of this scale, resulting from the combination of existing evaluations, have not been published.

Behaviour and mood outcome measurement scales

Type: Behavioural Pathology in Alzheimer's Disease rating scale (BEHAVE-AD)
Construct measure and scoring: A measure of the severity of behavioural symptoms in AD. It consists of 25 symptoms grouped into seven categories. Each symptom is scored on the basis of severity on a four-point scale.
Critical appraisal: The BEHAVE-AD has been shown to be reliable and valid.

Type: Behavioural Rating Scale for Geriatric Patients (BGP)
Construct measure and scoring: A 35-item rating scale more commonly used in European trials.
Critical appraisal: No information about the reliability or validity of this scale was found.

Type: NOSGER - Nurses Observation Scale for Geriatric Patients
Construct measure and scoring: Contains 30 items of behaviour, each rated on a 5-point scale according to frequency of occurrence. Item scores are summarised into 6 dimension scores (memory, instrumental activities of daily life, self-care, mood, social behaviour, and disturbing behaviour).
Critical appraisal: This scale has been validated, and has high inter-rater and test-retest reliability. The test correlates well with clinician's global rating of change.

Type: Neuro-psychiatric Inventory (NPI)
Construct measure and scoring: Currently evaluates 12 items: delusions, hallucinations, dysphoria, anxiety, agitation, euphoria, apathy, irritability, disinhibition, aberrant motor behaviour, night-time behaviour and changes in appetite/eating behaviour. Psychometric properties were established on the first 10 items. The score for each domain is calculated by multiplying the frequency rating by the severity rating, and the domain scores are added to get a total score. Higher scores represent more problems. Maximum score is 12 per domain, with either 10 or 12 domains assessed.
Critical appraisal: Content validity has been established, reliability and validity are satisfactory. Limitations included: poor description of appraisal period for behavioural symptoms; no justification for scoring system; and inter-rater reliability was poorly described.
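The NPI scoring rule given in the table (domain score = frequency rating x severity rating, with domain scores summed to a total) can be illustrated with a short Python sketch. This is an illustration only, not part of the original appendix: the function name `npi_total` and the input layout are hypothetical, and the rating ranges (frequency up to 4, severity up to 3, with 0 for an absent symptom) are assumptions chosen to be consistent with the stated maximum of 12 per domain.

```python
from typing import Dict, Tuple

# Assumed rating ranges (not stated in the table): frequency up to 4 and
# severity up to 3, consistent with the stated maximum of 12 per domain.
MAX_FREQUENCY = 4
MAX_SEVERITY = 3


def npi_total(ratings: Dict[str, Tuple[int, int]]) -> int:
    """Sum frequency x severity over the assessed domains.

    `ratings` maps a domain name to (frequency, severity). A domain with no
    symptoms is entered as (0, 0) and contributes nothing to the total.
    """
    total = 0
    for domain, (frequency, severity) in ratings.items():
        if not 0 <= frequency <= MAX_FREQUENCY:
            raise ValueError(f"{domain}: frequency out of range")
        if not 0 <= severity <= MAX_SEVERITY:
            raise ValueError(f"{domain}: severity out of range")
        total += frequency * severity  # domain score, at most 12
    return total


# Hypothetical ratings for three of the 10 or 12 domains.
example = {"delusions": (2, 1), "apathy": (4, 3), "anxiety": (0, 0)}
print(npi_total(example))  # 2*1 + 4*3 + 0*0 = 14
```

Under these assumptions the total ranges from 0 to 120 when 10 domains are assessed and from 0 to 144 when 12 are assessed.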