Espelandetal.BMCMedicalImaging2013,13:4 http://www.biomedcentral.com/1471-2342/13/4 RESEARCH ARTICLE Open Access Are two readers more reliable than one? A study of upper neck ligament scoring on magnetic resonance images Ansgar Espeland1,2*, Nils Vetti1,2 and Jostein Kråkenes1,2 Abstract Background: Magnetic resonance imaging (MRI) studies typically employ either a single expert or multiple readers incollaborationto evaluate (read) theimage results. However, no study has examinedwhether evaluations from multiple readers provide more reliable results than a single reader. We examined whether consistency inimage interpretation by a single expert might be equal to the consistency of combined readings, defined as independent interpretationsby two readers, where cases of disagreement were reconciled by consensus. Methods: One expertneuroradiologist and one trained radiology resident independently evaluated 102MRIs ofthe upper neck. The signal intensities of the alar and transverse ligaments were scored 0, 1, 2, or 3. Disagreementswere resolved by consensus. They repeated thegrading process after 3–8 months (second evaluation). We used kappa statistics and intraclass correlationcoefficients (ICCs) to assess agreement between theinitial and second evaluations for each radiologist and for combined determinations.Disagreementson score prevalence were evaluated withMcNemar’stest. Results: Higher consistency between the initial and second evaluations was obtained with the combined readings than withindividual readings for signal intensityscores of ligaments onboth theright and leftsides of thespine. The weighted kappa ranges were 0.65-0.71 vs. 0.48-0.62 for combined vs. individual scoring, respectively. The combined scores also showed better agreement between evaluations than individual scores for thepresence of grade 2–3signal intensities on any side in a givensubject(unweightedkappa 0.69-0.74 vs. 0.52-0.63, respectively). Disagreement between theinitial and second evaluations ontheprevalence of grades 2–3 was less marked for combined scores thanfor individual scores (P ≥ 0.039 vs.P ≤ 0.004, respectively).ICCs indicated a more reliable sumscore perpatientfor combined scores (0.74) and both readers’average scores (0.78) than for individual scores (0.55-0.69). Conclusions: This study was thefirst to provideempirical support for the principle thatan additional reader can improve thereproducibility of MRI interpretationscompared to one expert alone. Furthermore, even a moderately experienced second reader improved thereliability comparedto a single expert reader. The implications of this for clinical work require further study. *Correspondence:[email protected] 1DepartmentofRadiology,HaukelandUniversityHospital,JonasLiesvei65, 5021,Bergen,Norway 2DepartmentofSurgicalSciences,UniversityofBergen,Bergen,Norway ©2013Espelandetal.;licenseeBioMedCentralLtd.ThisisanOpenAccessarticledistributedunderthetermsoftheCreative CommonsAttributionLicense(http://creativecommons.org/licenses/by/2.0),whichpermitsunrestricteduse,distribution,and reproductioninanymedium,providedtheoriginalworkisproperlycited. Espelandetal.BMCMedicalImaging2013,13:4 Page2of7 http://www.biomedcentral.com/1471-2342/13/4 Background polarized, receive-only, head coil, with a 1.5 Tesla scan- A key feature of any imaging test is its reliability [1]. A ner (Symphony Mastroclass, Siemens Medical System, conclusive image reading should be reliable, regardless Erlangen, Germany). We used an established protocol of the number of readers involved. Higher reliability for MRI of upper neck ligaments [14]. This protocol might be expected when multiple readers interpret the includedproton-density-weightedfast-spinechosequences images and their readings are combined than when only of the upper neck in the axial, coronal, and sagittal one reader interprets the images. However, we have planes with the following parameters: repetition time: found no empirical data to support or refute this as- 2150–2660 ms, echo time: 15 ms, slice thickness: 1.5 sumption. A majority or consensus view is not necessar- mm, interslicegap: 0.0 mm or 0.3 mm (sagittal), field of ily more reliable than the view of one expert alone. Two view: 175 mm × 200 mm or 200 mm × 200 mm (cor- readers combined could, in theory, be less consistent onal),voxelsize:0.6-0.7×0.4×1.5mm3,andechotrain than one reader (as exemplified by the hypothetical data length:13. shownin Additional file 1). Double, or repeated readings The alar and transverse ligaments were scored on a can also affect validity; this approach can prevent errors scale of 0, 1, 2, or 3 based on the ratio of the largest [2,3], butitcan alsoincrease falsepositive rates[4]. cross-sectional area of a high intensity signal (observed Radiologistsprovidea largenumber ofexpert opinions in at least two imaging planes) to the total cross- in their daily work. This work could be significantly sectional area of the ligament [15,16]. A high intensity impacted by data that showed a single expert opinion signal in 1/3 or less of the total cross sectional area was was insufficient or that a second opinion provided add- scored 1; a high intensity signal in 1/3 to 2/3 of the total itional benefit. In researchsettings,itis more feasible for cross sectional area was scored 2; and a high intensity a single expert to study large numbers of images, rather signal in 2/3 or more of the total cross sectional area than multiple readers. Indeed, many studies have was scored 3. Homogenous grey ligaments were scored reported conclusive image findings based on the deter- 2.Ligamentswith nohigh intensity signal were scored 0. mination of only one expert reader [5-7]. In other stud- The right and left sides of the spine were scored separ- ies, multiple readers were used to determine the final ately; alar ligaments were scored on sagittal sections, image results [8-11]. We compared these two approaches and transverse ligaments were scored on sagittal or cor- forscoringthesignalintensitiesofthealarandtransverse onalsections,dependingonligament orientation. ligaments on upper neck magnetic resonance images One neuroradiologist (reader A) with 26 years experi- (MRIs).Consistentimagereadingsarerequiredinresearch ence and one radiology resident (reader B) with 6 years toassessthepresenceandclinicalrelevanceof highinten- experience independently scored the signal intensities of sity signals [11-13]. Our aim was to determine whether the ligaments. Then, all disagreements were resolved by consistency in image interpretation by a single expert consensus. This process resulted in individual scores for might be equal to the consistency of combined readings, eachreaderandcombinedscoresforbothreaders(based defined as independent interpretations by two readers, on independent readings followed by consensus reading wherecasesofdisagreementwerereconciledbyconsensus. in cases of disagreement). Prior to this study, both read- ers were trained in the scoring system used and had dis- cussed scores in joint meetings. Reader A had previously Methods scored several thousand ligaments and reader B had This study included 102 prospectively recruited subjects scored aboutone thousand ligaments. (49 men and 53 women; mean age 47.2 years) that com- The images were de-identified, presented in a random prised 68 healthy volunteers, 18 patients with rheumatoid order (according to a computer generated list of random arthritis, and 16 patients with chronic neck pain. These numbers), and interspersed among similar images that subjects represented random subsamples of participants were not used in this study. After 3–8 months, the same included in a larger project on MRIs of upper neck liga- images were presented in a new random order (accord- ments. Based on a computer generated list of random ing to a new list of random numbers), and again, inter- numbers, the present study included the same relative spersed among similar images. The readers were not numbersof healthyindividuals,patientswitharthritis,and told thattheyhadassessedtheimagespreviously.Readers patientswithneckpainaswereincludedinthelargerpro- A and B independently re-scored the signal intensities of ject.All subjects gave written informed consent to partici- the ligaments and resolved any disagreements by consen- pate. The study was in compliance with the Helsinki sus.Thus,theyrepeatedtheentireprocessfollowedinthe DeclarationandwasapprovedbyTheRegionalCommittee firstevaluations. forMedicalResearchEthics,Western-Norway. Agreementsbetween theinitialandsecondevaluations All subjects were imaged with their head and neck in were analyzed for each reader and for the combined a neutral position in a standard, one-channel, circular, determinations. We analyzed evaluations of each of the Espelandetal.BMCMedicalImaging2013,13:4 Page3of7 http://www.biomedcentral.com/1471-2342/13/4 four ligament parts separately (left and right alar liga- individual readings (Table 1, Figure 1). This applied to ments, left and right parts of the transverse ligament) either ligament side (weighted kappa range was 0.65-0.71 and all the ligament parts combined. We calculated for combined readings vs. 0.48-0.62 for individual read- linearly weighted kappa values to assess agreement on ings) and to the presence of grade 2–3 signal intensities scores 0–3 for each side. We used unweighted kappa onanyligamentsideinagivensubject(unweightedkappa values to assess agreement on scores 2–3 vs. scores 0–1 range was 0.69-0.74 for combined readings vs. 0.52-0.63 per subject on any side (right and/or left). Kappa values for individual readings) (Table 1). For all ligament parts are expressed with 95% confidence intervals (CIs) based combined (n = 408), the weighted kappas for agreement on SEs (non-zero) and were interpreted as follows: k ≤ between the initial and the second evaluations were 0.56 0.20, poor; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, (95%CI:0.50,0.62)forreaderA,0.55(95%CI:0.48,0.62) good; and 0.81-1.00, very good agreement beyond for reader B, and 0.68 (95% CI: 0.63, 0.74) for the com- chance [17]. Disagreements on the prevalence of scores binedreading. 2–3 were assessed with McNemar’s test. P < 0.05 was Ahigherprevalenceofsignalintensityscoresof2–3per taken toindicatestatisticalsignificance. subject was reported in the second evaluations compared We compared the sum of the scores for all four liga- to the initial evaluations (Table 2). The P values for the ment parts (MRI sum score, 0–12) between reader Avs. difference betweenevaluationsweresmallerfor individual reader B vs. their combined scoring vs. their average reader scoring than for combined scoring (P ≤ 0.004 vs. scoring regarding a) intra-reader reliability using intra- P≥0.039forindividualvs.combined differences,respect- class correlation coefficients (ICCs, two-way model, as- ively)(Table2). suming normality), b) smallest detectable change (SDC), In the initial evaluation, the combined scores agreed andc)difference betweenthesecondandfirstevaluation with reader A’s scores in 86.0% of cases and with reader using Bland Altman plots with 95% limits of agreement. B’s scores in 80.1% of cases. In the second evaluation, the ICC ≥ 0.70 suggested adequate reliability [18]. SDC was combinedscoresagreedwithreaderA’sscoresin83.3%of definedas1.96×√2SEM(standarderrorofmeasurement) cases and with reader B’s scores in 77.7% of cases. and indicated the smallest change in MRI sum score Weighted kappa values for agreement between A and B that, with P < 0.05, could be interpreted as a “real was 0.49 (95% CI: 0.41, 0.57) in the initial evaluation and change”above measurementerrorinoneindividual[18]. 0.56(95%CI:0.49,0.62)inthesecondevaluation(allliga- Data were analyzed using WINPEPI 10.0 (http://www. mentpartscombined,n=408). brixtonhealth.com/pepi4windows.html). The MRI sum score for all ligament parts had higher A statistical power assessment indicated that, with a intra-reader reliability with combined scoring (ICC 0.74, true, unweighted kappa of 0.70 and a prevalence of 30% 95% CI: 0.64, 0.82) and both readers’ average scoring for the relevant MRI finding (e.g., intensity signal scores (ICC 0.78, 95% CI: 0.68, 0.84) than with individual reader of 2–3), 85 paired observations would provide 80% scoring(A: ICC 0.55,95%CI:0.40,0.67;B:ICC 0.69,95% power to give a significant result at the 5% level in a CI:0.58,0.78).SDCinMRIsumscorewaslowestforaver- two-sided test of k = 0.40 [19]. With a true, unweighted age scoring (2.9) followed by combined (3.3) and individ- kappa of0.60and aprevalenceof30%, 191paired obser- ual scoring (A: 4.2, B: 4.4). Similarly, MRI sum score vations would provide 80% power to give a significant differed less between the two evaluations when average result. This study included 102 paired observations, or scoringorcombinedscoringwasused(Figure2). 408paired observations,includingallligament parts. Discussion Results In this study on the reliability of two readers compared to Better agreement between the initial and second evalua- one, the combined reading was more reproducible than a tionsofalarandtransverseligament signalintensitieswas single expert’s reading. Therefore, research designs should obtained with the combined readings than with the preferentiallyuseacombinedreadingforconclusiveresults. Table1Kappavaluesforagreementbetweeninitialandsecondevaluations Alarligamentscores0-3 Transverseligamentscores0-3 Scoredby Rightside Leftside Anyside, Rightside Leftside Anyside, scores2-3 scores2-3 ReaderA 0.59(0.47,0.70) 0.62(0.51,0.73) 0.63(0.48,0.78) 0.50(0.35,0.64) 0.51(0.38,0.64) 0.59(0.43,0.75) ReaderB 0.51(0.39,0.63) 0.57(0.44,0.70) 0.52(0.35,0.68) 0.48(0.32,0.64) 0.58(0.45,0.72) 0.53(0.38,0.68) AandBcombined 0.68(0.56,0.79) 0.71(0.61,0.81) 0.74(0.61,0.88) 0.66(0.53,0.79) 0.65(0.54,0.77) 0.69(0.54,0.84) Valuesrepresentlinearlyweightedkappavaluesforscores0,1,2,or3oneachsideofthespine,andunweightedkappavaluesforscores2–3vs.scores0–1per subjectonanyside(rightand/orleft),with95%confidenceintervalsinparenthesis,basedonmagneticresonanceimagingin102subjects. Espelandetal.BMCMedicalImaging2013,13:4 Page4of7 http://www.biomedcentral.com/1471-2342/13/4 Figure1ScoringhighsignalintensitiesofalarandtransverseligamentsonupperneckMRIs.Proton-density-weighted,fast-spinecho,1.5 TeslaMRIsectionswereperformedin(A,D)coronal,(B,E)sagittal,and(C,F)axialdirections.MRIswerefromtwohealthywomen,aged(A-C) 44yearsold,and(D-F)60yearsold.Brokenlinesmarkthesagittalplane.(A-C)Thetransverseligamentisindicatedwitharrowheads.Thehigh intensitysignalwasscored2byreaderA,1byreaderB,and2byconsensus;inthesecondevaluation,thesamesignalwasscored2byboth readersindependently.Thealarligamentisindicatedwitharrows.(A,B)Thehighintensitysignalwasgraded2bybothreadersindependently; inthesecondevaluation,thesamesignalwasscored2byreaderA,3byreaderB,and2byconsensus.(D-F)Thetransverseligament(arrow heads)andalarligament(arrows)werescored0bybothreadersindependentlyinbothevaluations. Whenreportinga sum scorefor several MRI findings, two expert might have provided even more improvement in readers’ average score can be used. Importantly, the two the reliability. Third, the consensus reading in cases of readers in our study first interpreted all images independ- agreement may be useful, because consensus discussions ently; then, they solved all disagreements in consensus. can improve agreementbetweenreaders[2,24]. They did not perform a consensus reading without prior Theprevalenceofahighsignalintensityscoreincreased separate readings. This approach makes it impossible to from the first to the second reading (Table 2). This was assess observer variation, and it is not advised in research probably due to uncertainty in interpretation or due to a settings[20]. response bias (i.e., the readers’ tendency to prefer scoring Three other points should be noted. First, the com- high or low, particularly when in doubt, independently of bined reading improved the reliability of results, despite the signal characteristics [25]). Interestingly, the preva- the fact that the expert alone achieved moderate to good lence of a high signal intensity score increased between reliability. Because this level of reliability is common in evaluations less when based on both readers’ combined diagnostic imaging [21-23], our findings may be general- reading, probably because ambiguous cases were more ized to many types of imaging examinations. Second, the likely to be discussed in consensus and scored consist- additional reader had moderate experience. A second ently. In a prior study [16], each of two readers evaluated Table2Prevalenceofscores2–3oninitialandsecondevaluations Alarligamentscores2-3* Transverseligamentscores2-3* Scoredby Initial% Second% Pvalue§ Initial% Second% Pvalue§ ReaderA 29.4 43.1 0.001 27.5 40.2 0.004 ReaderB 22.5 40.2 <0.001 25.5 46.1 <0.001 AandBcombined 31.4 39.2 0.039 28.4 36.3 0.057 Thedataarebasedonmagneticresonanceimagingin102subjects. *Highestassignedscorewhendifferentintensitieswerenotedontherightandleftsidesofthespine. §Pvaluefordifferenceinprevalence(%),basedonMcNemar’stest. Espelandetal.BMCMedicalImaging2013,13:4 Page5of7 http://www.biomedcentral.com/1471-2342/13/4 Reader A Reader B Combined reading Average reading 1 e r o c score 2 minus sum s---3 -1 0 1 2 3 4 55 4 2 - -5 --2 1 0 1 2 3 4 54 -3 - -5 4 -3 2 ---1 0 1 2 3 4 5 -5 -4 -3 -2 -1 0 1 2 3 4 5 m u S 0 2 4 6 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 Mean of sum score 1 and sum score 2 Figure2BlandAltmanplotsofdifferenceinMRIsumscorebetweensecondandfirstevaluations.On102upperneckMRIs,tworeaders AandBindependentlygradedthesignalintensitiesofthealarandtransverseligamentsonbothsidesofthespine(i.e.fourligamentparts)ona scaleof0,1,2or3andthenresolveddisagreementsbyconsensus(combinedreading).Theyalsorepeatedthegradingprocess(second evaluation).Thesumofthescoresforallligamentparts(MRIsumscore,possiblevalues0–12)wascalculated.Ineachplot,thedifference betweensumscore2(secondevaluation)andsumscore1(firstevaluation)isplottedagainstthemeanofthetwosumscores.Dottedlines representmeandifferenceand95%limitsofagreement.Theplotsshowgenerallysmallerdifferencesforbothreaders’combinedreadingandfor theaverageofbothreaders’scores(averagereading)thanforindividualreaderscores.Meandifferenceinsumscore(with95%limitsof agreement)wasforreaderA−0.4(−4.4,3.7),readerB1.2(−2.7,5.1),combinedreading0.0(−3.4,3.4)andaveragereading0.4(−2.5,3.3). the signal intensity of the alar ligament, and assigned potential relationship to clinical features, outcome, or lowerscoresinthesecondreadingthaninthefirst. treatment effects [33,34]. InsomepreviousMRIstudiesofthealarandtransverse Our findings support the use of two or more readers ligaments, conclusive ligament interpretations were based for determining conclusive image readings in research, on one reader’s evaluation [26,27], two readers’consensus particularly for images with some ambiguity. In cases evaluation (without prior separate readings) [28], or two where a reliable result was previously documented, one readers’combined reading(i.e.,separatereadings followed expert’s reading might be sufficient for a conclusive byconsensusreadingincasesofdisagreement)[13,29,30]. reading. The use of two readers must be weighted Many factors affected the quality of these studies in against the additional effort required to solve disagree- addition tothe reliabilityofthe conclusive ligament inter- ments in consensus or to employ additional readers. pretation. Nevertheless, our data indicated that the con- MRIs of upper neck ligaments yield limited clinical clusive interpretation was more reproducible when based information, and they are not recommended for routine on the combined reading, compared to a single expert clinical use [12,13,31,32,35,36]. It has been speculated reading. that the ligament high signal intensities may represent Combinedreadings wereperformedinourlargerstud- normal morphological ligament variants with loose con- ies on high signal intensities of alar and transverse liga- nective tissue and/or fat [13,27,31]. Nevertheless, the ments. In those studies, the same two readers that were present study suggested that more than one reader used in the present study independently scored all liga- would provide benefit, e.g., on MRI findings in a whip- ments, solved all disagreements in consensus, and lash patient. Further studies of clinically important reported a conclusive combined score [11,12,31,32]. image findings are required to confirm the higher reli- Based on that conclusive score, the high intensity signal ability oftwo readers compared toone. differed little between healthy volunteers and patients An important unresolved question is whether a “two with rheumatoid arthritis, chronic neck pain, or acute readersapproach” provides better agreement with aclin- whiplash; furthermore, the conclusive score did not ical “gold” standard than readout with one single expert, affect outcome after acute whiplash [11,12,31,32]. The or has better predictive utility. No “gold” standard exists process of conclusive image reading had been opti- and no predictive utility has been documented for the mized to improve the reliability, which is essential MRI findings evaluated in this study. It is also not clear before assessing validity [1]. Neglecting this optimization whether a “multiple readers approach” with use of more might lead to underestimations of the finding’s than two independent readers’ majority score may be Espelandetal.BMCMedicalImaging2013,13:4 Page6of7 http://www.biomedcentral.com/1471-2342/13/4 more reliable, accurateor clinically useful,or whether an for clinical work should be assessed in further studies of assessmentofdiscrepantreadsper semayprovide clinic- more clinically relevantimaging findings. ally more relevant information than consensus reading incasesofdisagreement. Additional file A major strength of this study was that the readers were blinded to the study design. The study had a mod- Additionalfile1:Hypotheticalexampleofacaseinwhichtwo erate sample size and power. However, all differences in readerscombinedprovidedlessconsistentscoresthanthat kappa values, in P values for disagreement on preva- providedbyeitherreaderindividually. lence, in ICCs, in SDC, and on Bland Altman plots were Competinginterests in the same direction; this indicated higher reliability Theauthorsdeclarethattheyhavenocompetinginterests. with two readers than with one. The kappa for all ligament parts together indicated significantly higher Authors’contributions AEconceived,designed,andcoordinatedthestudy,performedstatistical reliability based on non-overlapping CIs. These CIs analyses,anddraftedthemanuscript.NVinterpretedtheMRIexaminations, assumed independent scoring of the four ligament parts; performedstatisticalanalyses,andhelpeddraftthemanuscript.JK however, all four parts were visible on the same image. interpretedtheMRIexaminations,participatedinthedataanalysis,and reviewedthemanuscript.Allauthorsreadandapprovedthefinal Thus, some dependency in the scoring probably existed manuscript. and may have narrowed the CIs. The kappa value is affected by the prevalence of the evaluated finding, and Acknowledgements it is difficult to compare between groups that differ in ThisstudyreceivedfundingfromtheGriegFoundationandtheNorwegian ExtraFoundationforHealthandRehabilitation.WethankØivindSalvesenfor prevalence [19]. However, this effect on kappa is largest helpwithBlandAltmanplots. for prevalencebelow 10%andabove90%andsmaller for the prevalence reported in our study (22.5% - 46.1%, Received:2November2011Accepted:16January2013 Published:17January2013 Table2).Normalityplotssuggestedonlysmalldeviations from the assumed normal distribution and the ICCs References were also higher with two readers than with one based 1. ThornburyJR:EugeneW.CaldwellLecture.Clinicalefficacyofdiagnostic imaging:loveitorleaveit.AJRAmJRoentgenol1994,162:1–8. on log transformed data. Our study included more 2. D'agostinoMA,AegerterP,Jousse-JoulinS,Chary-ValckenaereI,LecoqB, healthy subjects than patients. Images from a sample GaudinP,BraultI,ScmitzJ,DehautF,LeParcJ,BrebanM,LandaisP:How with a higher proportion of patients would be likely to toevaluateandimprovethereliabilityofpowerDoppler ultrasonographyforassessingenthesitisinspondylarthritis. show a similar prevalence of high signal intensities in ArthritisRheum2009,61:61–69. the ligaments (based on findings in our main project). 3. YoonLS,HaimsAH,BrinkJA,RabinoviciR,FormanHP:Evaluationofan However, those images might have been more difficult emergencyradiologyqualityassuranceprogramatalevelItrauma center:abdominalandpelvicCTstudies.Radiology2002,224:42–46. to interpret, which might have led to lower reliability, 4. CanonCL,SmithJK,MorganDE,JonesBC,FellSC,KenneyPJ,FerranteD, and ultimately, a larger improvement in reliability with LockhartME,WestfallAO,KoehlerRE:Doublereadingofbariumenemas: the inclusion of a second reader. Therefore, this limita- isitnecessary?AJRAmJRoentgenol2003,181:1607–1610. 5. CheungKM,SamartzisD,KarppinenJ,MokFP,HoDW,FongDY,LukKD: tion tendstostrengthen ourfindings. Intervertebraldiscdegeneration:newinsightsbasedon"skipped"level discpathology.ArthritisRheum2010,62:2392–2400. 6. KjaerP,Leboeuf-YdeC,KorsholmL,SorensenJS,BendixT:Magnetic resonanceimagingandlowbackpaininadults:adiagnosticimaging Conclusions studyof40-year-oldmenandwomen.Spine(PhilaPa1976)2005, To our knowledge, this study was the first to provide 30:1173–1180. empiricaldata on the reliability of two readers compared 7. EspositoL,SaamT,HeiderP,BockelbrinkA,PelisekJ,SeppD,FeurerR, WinklerC,LiebigT,HolzerK,PaulyO,SadikovicS,HemmerB,PoppertH: to one. For scoring the signal intensities of ligaments on MRIplaqueimagingrevealshigh-riskcarotidplaquesespeciallyin upper neck MRIs, image reading by a single expert was diabeticpatientsirrespectiveofthedegreeofstenosis.BMCMedImaging less consistent than combined reading by the same 2010,10:27. 8. JagadeesanBD,GadoAlmandozJE,MoranCJ,BenzingerTL:Accuracyof expert in collaboration with a second reader. The latter susceptibility-weightedimagingforthedetectionofarteriovenous approach implied independent readingsfollowed bycon- shuntinginvascularmalformationsofthebrain.Stroke2011,42:87–92. sensus reading in cases of disagreement. This approach 9. JenschS,deVriesAH,PeringaJ,BipatS,DekkerE,BaakLC,BartelsmanJF, HeutinckA,MontaubanvanSwijndregtAD,StokerJ:CTcolonographywith was used to determine conclusive interpretations of high limitedbowelpreparation:performancecharacteristicsinanincreased- intensity signals in neck ligament studies that have pre- riskpopulation.Radiology2008,247:122–132. viously shown that high signal intensities had limited 10. PetersonCK,SaupeN,BuckF,PfirrmannCW,ZanettiM,HodlerJ:CT-guided sternoclavicularjointinjections:descriptionoftheprocedure,reliability clinical relevance [11-13,31,32,36]. Two or more readers ofimagingdiagnosis,andshort-termpatientresponses.AJRAmJ may be needed to provide reliable conclusive image Roentgenol2010,195:W435–W439. reading results in research. In this study, a moderately 11. VettiN,KrakenesJ,DamsgaardE,RorvikJ,GilhusNE,EspelandA:MRIofthe alarandtransverseligamentsinacutewhiplash-associateddisorders experienced second reader improved the reliability com- 1–2-across-sectionalcontrolledstudy.Spine(PhilaPa1976)2011, pared to a single expert reader. The implications of this 36:E434–E440. Espelandetal.BMCMedicalImaging2013,13:4 Page7of7 http://www.biomedcentral.com/1471-2342/13/4 12. VettiN,AlsingR,KrakenesJ,RorvikJ,GilhusNE,BrunJG,EspelandA:MRIof 35. LummelN,ZeifC,KloetzerA,LinnJ,BruckmannH,BitterlingH:Variability thetransverseandalarligamentsinrheumatoidarthritis:feasibilityand ofmorphologyandsignalintensityofalarligamentsinhealthy relationstoatlantoaxialsubluxationanddiseaseactivity.Neuroradiology volunteersusingMRimaging.AJNRAmJNeuroradiol2011,32:125–130. 2010,52:215–223. 36. MyranR,ZwartJA,KvistadKA,FolvikM,LydersenS,RoM,WoodhouseA, 13. MyranR,KvistadKA,NygaardOP,AndresenH,FolvikM,ZwartJA:Magnetic NygaardOP:Clinicalcharacteristics,pain,anddisabilityinrelationtoalar resonanceimagingassessmentofthealarligamentsinwhiplashinjuries: ligamentMRIfindings.Spine(PhilaPa1976)2011,36:E862–E867. acase–controlstudy.Spine(PhilaPa1976)2008,33:2012–2016. 14. VettiN,KrakenesJ,EideGE,RorvikJ,GilhusNE,EspelandA:MRIofthealar doi:10.1186/1471-2342-13-4 andtransverseligamentsinwhiplash-associateddisorders(WAD)grades Citethisarticleas:Espelandetal.:Aretworeadersmorereliablethan 1–2:high-signalchangesbyage,gender,eventandtimesincetrauma. one?Astudyofupperneckligamentscoringonmagneticresonance Neuroradiology2009,51:227–235. images.BMCMedicalImaging201313:4. 15. KrakenesJ,KaaleBR:Magneticresonanceimagingassessmentof craniovertebralligamentsandmembranesafterwhiplashtrauma. Spine(PhilaPa1976)2006,31:2820–2826. 16. KrakenesJ,KaaleBR,MoenG,NordliH,GilhusNE,RorvikJ:MRIassessment ofthealarligamentsinthelatestageofwhiplashinjury–astudyof structuralabnormalitiesandobserveragreement.Neuroradiology2002, 44:617–624. 17. AltmanDG:Practicalstatisticsformedicalresearch.1stedition.London: Chapman&Hall;1991. 18. TerweeCB,BotSDM,deBoerMR,vanderWindtDAWM,KnolDL,DekkerJ, BouterLM,deVetHCW:Qualitycriteriawereproposedformeasurement propertiesofhealthstatusquestionnaires.JClinEpidemiol2007,60:34–42. 19. SimJ,WrightCC:Thekappastatisticinreliabilitystudies:use, interpretation,andsamplesizerequirements.PhysTher2005,85:257–268. 20. BankierAA,LevineD,HalpernEF,KresselHY:Consensusinterpretationin imagingresearch:isthereabetterway?Radiology2010,257:14–17. 21. UmansH,WimpfheimerO,HaramatiN,ApplbaumYH,AdlerM,BoscoJ: Diagnosisofpartialtearsoftheanteriorcruciateligamentoftheknee: valueofMRimaging.AJRAmJRoentgenol1995,165:893–897. 22. vanRijnJC,KlemetsoN,ReitsmaJB,MajoieCB,HulsmansFJ,PeulWC,Stam J,BossuytPM,denHeetenGJ:ObservervariationinMRIevaluationof patientssuspectedoflumbardiskherniation.AJRAmJRoentgenol2005, 184:299–303. 23. JohnsonJ,KlineJA:Intraobserverandinterobserveragreementofthe interpretationofpediatricchestradiographs.EmergRadiol2010, 17:285–290. 24. PirisiM,LeutnerM,PinatoDJ,AvelliniC,CarsanaL,ToniuttoP,FabrisC, BoldoriniR:Reliabilityandreproducibilityoftheedmondsongradingof hepatocellularcarcinomausingpairedcorebiopsyandsurgical resectionspecimens.ArchPatholLabMed2010,134:1818–1822. 25. KerM:Issuesintheuseofkappa.InvestRadiol1991,26:78–83. 26. KaaleBR,KrakenesJ,AlbrektsenG,WesterK:Whiplash-associateddisorders impairmentrating:neckdisabilityindexscoreaccordingtoseverityof MRIfindingsofligamentsandmembranesintheuppercervicalspine. JNeurotrauma2005,22:466–475. 27. DullerudR,GjertsenO,ServerA:Magneticresonanceimagingof ligamentsandmembranesinthecraniocervicaljunctionin whiplash-associatedinjuryandinhealthycontrolsubjects. ActaRadiol2010,51:207–212. 28. KimHJ,JunBY,KimWH,ChoYK,LimMK,SuhCH:MRimagingofthealar ligament:morphologicchangesduringaxialrotationoftheheadin asymptomaticyoungadults.SkeletalRadiol2002,31:637–642. 29. RoyS,HolPK,LaerumLT,TillungT:Pitfallsofmagneticresonance imagingofalarligament.Neuroradiology2004,46:392–398. 30. PfirrmannCW,BinkertCA,ZanettiM,BoosN,HodlerJ:MRmorphologyof alarligamentsandoccipitoatlantoaxialjoints:studyin50asymptomatic subjects.Radiology2001,218:133–137. Submit your next manuscript to BioMed Central 31. VettiN,KrakenesJ,EideGE,RorvikJ,GilhusNE,EspelandA:AreMRIhigh- and take full advantage of: signalchangesofalarandtransverseligamentsinacutewhiplashinjury relatedtooutcome?BMCMusculoskeletDisord2010,11:260. • Convenient online submission 32. VettiN,KrakenesJ,AskT,ErdalKA,TorkildsenMD,RorvikJ,GilhusNE, EspelandA:Follow-UpMRImagingoftheAlarandTransverseLigaments • Thorough peer review afterWhiplashInjury:AProspectiveControlledStudy.AJNRAmJ • No space constraints or color figure charges Neuroradiol,inpress. • Immediate publication on acceptance 33. JarvikJG,DeyoRA:Moderateversusmediocre:thereliabilityofspineMR datainterpretations.Radiology2009,250:15–17. • Inclusion in PubMed, CAS, Scopus and Google Scholar 34. FeinsteinAR:Anadditionalbasicscienceforclinicalmedicine:IV.The • Research which is freely available for redistribution developmentofclinimetrics.AnnInternMed1983,99:843–848. Submit your manuscript at www.biomedcentral.com/submit