INSTITUTEOFPHYSICSPUBLISHING JOURNALOFPHYSICSD:APPLIEDPHYSICS J.Phys.D:Appl.Phys.36(2003)R97–R126 PII:S0022-3727(03)34952-6 TOPICAL REVIEW Elemental thin film depth profiles by ion beam analysis using simulated annealing—a new tool CJeynes1,NPBarradas2,6,PKMarriott3,GBoudreault1,MJenkin4, EWendler5 andRPWebb1 1UniversityofSurreyIonBeamCentre,Guildford,GU27XH,UK 2InstitutoTecnolo´gicoeNuclear,E.N.10,Sacave´m,Portugal 3DepartmentofStatistics,NationalUniversityofSingapore,Singapore 4SchoolofElectronicsComputingandMathematics,UniversityofSurrey,Guildford,UK 5Friedrich-Schiller-Universita¨tJena,Institutfu¨rFestko¨rperphysik,Jena,Germany Received18March2002 Published19March2003 Onlineatstacks.iop.org/JPhysD/36/R97 Abstract Rutherfordbackscatteringspectrometry(RBS)andrelatedtechniqueshavelongbeen usedtodeterminetheelementaldepthprofilesinfilmsafewnanometrestoafew micronsthick. However,althoughobtainingspectraisveryeasy,solvingtheinverse problemofextractingthedepthprofilesfromthespectraisnotpossibleanalytically exceptforspecialcases. Itisbecausethesespecialcasesincludeimportantclasses ofsamples,andbecauseskilledanalystsareadeptatextractingusefulqualitative informationfromthedata,thationbeamanalysisisstillanimportanttechnique. Wehaverecentlysolvedthisinverseproblemusingthesimulatedannealing algorithm. Wehaveimplementedthesolutioninthe‘IBADataFurnace’code,which hasbeendevelopedintoaveryversatileandgeneralnewsoftwaretoolthatanalysts cannowusetorapidlyextractquantitativeaccuratedepthprofilesfromrealsamples onanindustrialscale. Wereviewthefeatures,applicabilityandvalidationofthis newcodetogetherwithotherapproachestohandlingIBA(ionbeamanalysis)data, withparticularattentionbeinggiventodeterminingboththeabsoluteaccuracyof thedepthprofilesandstatisticallyaccurateerrorestimates. WeincludeexamplesofanalysesusingRBS,non-Rutherfordelasticscattering, elasticrecoildetectionandnon-resonantnuclearreactions. Highdepthresolution andtheuseofmultipletechniquessimultaneouslyarebothdiscussed. Thereis usuallysystematicambiguityinIBAdataandButler’sexampleofambiguity (1990Nucl.Instrum.MethodsB45160–5)isreanalysed. Analysesareshown:of evaporated,sputtered,oxidized,ionimplanted,ionbeammixedandannealed materials;ofsemiconductors,opticalandmagneticmultilayers,superconductors, tribologicalfilmsandmetals;andofoxidesonSi,mixedmetalsilicides,boron nitride,GaN,SiC,mixedmetaloxides,YBCOandpolymers. 1. Overview Glossaryinsection2foranexplanationoftheacronyms)inthe contextoftheworkofthecommunity;discussingthescientific We will review our recent software developments in thin implications of these developments. We hope to show that film depth profiling with ion beam analysis (IBA, see the these developments improve the usability of IBA to such an extentastoeffectivelyestablishanewdepthprofilingtool. 6 CentredeF´ısicaNucleardaUniversidadedeLisboa,AvenidaProg.Gama We start (section 2) with a simplified overview of Pinto2,1699LisboaCodex,Portugal IBA since we believe that our new software tool (the IBA 0022-3727/03/070097+30$30.00 ©2003IOPPublishingLtd PrintedintheUK R97 TopicalReview DataFurnace)willmakeIBAattractivetomanyusersforthe with magnetic tunnel junctions (used for magnetic random firsttime. Theessenceofthisnewtoolisitssolutionofthe access memory applications) fabricated with sophisticated ‘inverse problem’ of automatically extracting depth profiles metallic (and semi-metallic) multilayers and analysed with from Rutherford backscattering (RBS) data: we therefore highresolutionRBS.WithRossendorftheyhaveprofiledMBE describethisinverseproblem(section3)andcontinuewithan GaN on Si (Alves et al 2002). The Rossendorf group has introductiontothesimulatedannealingalgorithmonwhichthe investigatednitridingofAl(Telbizovaetal2001a,b,Fitzand inverseproblem’ssolutionisbuilt(section4). Thisalgorithm Mo¨ller2002). was first applied to RBS in 1997 by Barradas et al, and the ThegroupatSurreyhasalso:measuredlateralstressinim- mathematical analogy of ‘annealing’ is why we have called plantedgoldfilmsusingathreecrystalquartzresonatormethod thecode‘DataFurnace’. andadetailedself-consistentRBSanalysis(Wayetal1999); After some RBS examples exemplifying the scope of investigatedsurfactantbehaviourintheformationoflatexfilms the tool (section 5) we give details of the physics (the (Tzitzinouetal 1999); characterizedCVDdepositedSiCon ‘forward model’) and the algorithm used (section 6). We Si (Toal et al 1999) and sol–gel deposited Ta2O5 and TiO2 havealsoincludedheresomerecentdevelopmentsextending dielectricfilmsonsilicon(Cappellanietal1999);usedcom- plementaryRBSandXPStoinvestigatetribologicalcoatings the code to moderately rough surfaces. Another problem is (TiN,TiAlNandMoS :Ti)onsteel(Bakeretal2000);inves- thatweareforcedbyconsiderationofcomputationtimetouse 2 tigated low temperature growth of GaN (Young et al 2000); asimpleforwardmodelinvolvingonlysinglescattering,and characterized amorphous iron dicilicide (Milosavljevic´ et al therefore multiple scattering effects cannot be fitted directly. 2001,2002);andmeasuredHandNcontentofa-CbyHi-ERD However, we have demonstrated an effective method for (Careyetal (2000,2001):thisworkwasdoneatRossendorf accountingforthisandrelatedeffects(section7). andcitedBarradasetal(2000a,b)). We describe and discuss the intrinsic ambiguity of IBA TheCambridgegrouphascharacterizedsiliconanodically data and the systematic approaches to valid analysis in the oxidized using a wave resonance plasma (WARP) source by presenceofthisambiguity(section8). Wespecificallyrevisit RBS (Uchikoga et al 1999), and also characterized WARP Butler’s (1990) interesting example of ambiguity and show deposited a-C:N:H films by He ERD/RBS (Rodil et al howDataFurnaceisanexcellenttooltoexplorevalidinterpre- 2000). The Sussex group has profiled metal nanoparticles tationsofthedataunderavarietyofconstraints. Wealsoshow formed in glass by high dose implantation (Stepanov et al anumberofexamplesusingvariousforwardrecoiltechniques. 1999, 2001a,b). Other British RBS applications include It is one of the major benefits of this new tool that the the characterization of gate oxides on SiGe MOSFETs determinationoftheconfidencelimitsonthecalculateddepth (Riley et al (1999): Liverpool/London/Southampton); and profilescanbedoneasanaturalextensionofthealgorithm. We the investigation of polycarbonate/polymethylmethacrylate discusstheprecisionthatisavailableinthisway(section9). interfacechemistry(Hutchingsetal2001). We also discuss the use of accurate calculations of energy The Go¨ttingen group has followed ion beam mixing resolutionasafunctionofdepthtoenableustodoveryhigh of Si/C multilayers (Harbsmeier et al 2000) and Ta/Si depthresolutionanalysis(section10). Bothofthesefeatures bilayers (Bibic et al 2000, Dhar et al 2002) with RBS areusedintheanalysisofbothSiGemultilayersanalysedwith and other methods. They have also characterized cubic RBSanddeuteratedpolymermultilayersanalysedwithNRA boron nitride (Zinin et al 2002), fluorinated amorphous (nuclearreactionanalysis). carbon(Ronningetal2001)andbeta-irondisilicide(Wagner Theaccuracyofananalysis,discussedinsection11,isin et al 2002). The Darmstadt group has investigated heavy theendthemostimportantissue. Theestablishmentofanew ion induced metal–ceramic interface diffusion with RBS tooldependsfirstofallonbeingabletodemonstratethatthe (Nagel and Balogh 1999, Nagel et al 1999). The Chinese answers are correct. At last a routine analysis of non-trivial group have followed the interdiffusion of superconducting samplesisavailablewhichisbothcompleteandatstate-of-the- heterostructures(PZT/YBCO)(Xieetal1999). artaccuracy. Accuracyofanalysisdependsultimatelyonthe accuracyoftheforwardmodel:weendwithasubtlediscussion 2. IntroductiontoIBA oflimitationsofthecurrentforwardmodelinthecontextofan extraordinarilypreciseanalysisofanopticalmultilayersample IBAisaclusteroftechniquesinvolvingmaterialsanalysisby (section12). MeVionbeams. Whenanenergeticionstrikesatargetthere Inconclusion,wepointoutpossiblefuturedevelopments areavarietyofenergylossmechanisms,any(orall)ofwhich (section13). Thereviewisillustratedwithanumberofstriking can be used (together or separately) to obtain information examplesshowingthewiderangeofsamplesthathavebeen about the target. Table 1 summarizes these techniques, and analysed using the whole range of the IBA depth profiling indicatesthoseforwhichtheIBADataFurnacecanbeusedto techniques. It is worth mentioning that DataFurnace is in obtain elemental depth profiles. We emphasise that the IBA increasingusebyanalystsforavarietyofmaterialsanalyses DataFurnace software we describe applies only to thin film andwebrieflyreviewthisworkhere. compositiondepthprofilingandnottothevariousmicrobeam The Lisbon group have used high resolution RBS to imagingorchannellingapplications. monitor the fabrication of MnIr spin valves with nano- IBAisnowamaturescientifictechnique,verywidelyused oxide layers formed from plasma oxidation of CoFe inelectronicmaterialsresearch,andthroughoutthinfilmsci- layers (Veloso et al 2000). This process is to enhance ence:themostusefulsinglevolumeoverviewsareintheHand- the magnetoresistance for magnetic memory application. books(MayerandRimini1977,TesmerandNastasi1995). It Zhang et al (2001) have investigated processing problems is a very large and diverse field which we do not intend to R98 TopicalReview Table1. GlossaryofIBAtechniques. Technique Explanation DataFurnace? RBS Rutherfordbackscattering NuclearCoulombscattering, Yes incidentionisdetected(Q=0) EBS Elastic(non-Rutherford) NuclearscatteringasRBS,butwherethe Yes backscattering Coulombbarrierisexceeded(Q=0) ERD Elasticrecoildetection SynonymforFRS(Q=0) Yes FRS Forwardrecoilspectrometry SynonymforERD.AsRBS(orEBS)but Yes whererecoiledtargetatomisdetected. Thishastobeinaforwardscattering directionforkinematicalreasons NRA Nuclearreactionanalysis Non-elasticinteraction(Q(cid:1)=0). Yes Resultantreactionproductdetected (usuallyproton,alphaordeuterium) PIXE Particleinduced X-raysfrominelasticcollisionsofthe No x-rayemission incidentionwithinnercoreelectrons areeasytodetect:similarspectratoEDAX PIGE Particleinduced SpecialcaseofNRA,whereaphotonisdetected No gammarayemission Channelling UsedwithanyIBAtechniqueforquantifying No andprofilingdamageinsinglecrystalsamples Microbeam Theionbeamcanbefocussedandscannedand Yes usedasascanningionmicroscope. RBSand otherspectracanbecollectedfromspecificregions ofthesample. UsuallyusedwithPIXE IL Ionoluminescence AsPIXE,butlowerenergyphotonsdetected No IBIC Ionbeaminducedcurrent Microbeamtechniqueforinvestigatingsemiconductor No devices,whichrespondtosingleionimpacts STIM Scanningtransmission Detectsenergylossofsingleionspenetrating No ionmicroscopy (relatively)thicktargets review here, only making a few general comments that may An ion beam striking a solid target has many types of be useful for readers not analysts themselves. These read- interaction,anyofwhichcanbeusedforanalysis. Weshould ers should note that practitioners will probably quibble with pointoutthatnothingpreventstheanalystinstallingmultiple almostallofourgeneralizationsherewhichareintendedonly detectors in the target chamber to detect any or all of these toallownewcomerstograspIBAinaconcreteway. interactionssimultaneously. The first of the international biennial IBA Conference serieswasheldin1973(MayerandZiegler1973):theseare 2.1. Photonemission nowverylargemeetings,IBA-14(Mo¨lleretal2000)had362 Ionoluminescence has the highest cross-section of all the participantsandissupplementedbyseveralotherinternational interactions: this is a very young field (see, e.g. Bettiol et al conference series: both PIXE (see Malmqvist (1999)) and (1994)andYangetal(1994)). PIXE(JohanssenandCampbell the microbeam (see Prozesky et al (1999)) have their own 1988) typically uses a 3MeV proton beam, although helium large conferences and there are many regional meetings. canalsobeused. Itisdirectlycomparablewithelectronprobe IBA is ubiquitous in the scientific thin film community: for microanalysis(EPMA)withverysimilarspectra,exceptthat example, IBA was used in 38% of the papers published at there is negligible primary bremsstrahlung background due a recent IBMM (conference on the ion beam modification to the much higher particle mass. PIXE therefore has much of materials: Vredenberg et al (1999)). And not only the highersensitivitythanEPMA.Cross-sectionscanbehundreds materials community: for example, the Louvre Museum in of barns (1barn = 10−24cm2). However, it is hard to get Parishasitsowndedicatedaccelerator(Amseletal1990),and depthprofilesfromPIXEspectra,andthistechniqueisnotyet thereisextensiveinternationalapplicationofIBAtechniques implementedinDataFurnacealthoughitisoftenaveryuseful to cultural and archaeological artefacts (see Respaldiza and complementarytechnique(see,e.g.Lohetal(1993)). Go´mez-Camacho(1997)). IBA uses incident beam energies ranging perhaps from 2.2. Elasticscattering 100keV to 200MeV. At the low energy end the detectors become very expensive, as do the accelerators at the high RBS(Chuetal 1978)typicallyusesa2MeVHebeamwith end: we will not attempt to summarize all the detectors and typicalcross-sectionsofabarn:butforasimilarenergyproton accelerators in use today. IBA really started when nuclear beam the Coulomb barrier of light target nuclei (up to P) physics groups looked for new uses for their obsolete little is exceeded and the cross-sections are no longer Rutherford. 2MVVandeGraaffacceleratorsinthe1960sand1970s,and Where the interactions remain elastic we call the method there are still some of these machines around. The typical EBS. EBS cross-sections can exceed RBS ones by orders of machine being installed today is a 2MV tandem machine magnitude. Of course, two body interactions between the which can deliver multiply charged beams up to perhaps incident ion and the target nucleus will both scatter the ion 10MeV. and recoil the target nucleus into forward directions. The R99 TopicalReview recoilscarrymuchinformationaboutthesampleandcanalso 3.2. Elementaldepthprofiles bedetected. ThisisknownvariouslyasFRSorERD(Tirara Itturnsoutthatsimplesilicondiodesveryconvenientlystop etal1996). alphasofMeVenergies,convertingtheirenergyintoacloud ofelectron–holepairswhichcanbeseparatedbytheelectric 2.3. Nuclearreactions fieldifthediodeisreversebiased. Highqualityelectronicsto determinethenumberofelectronsarrivingattheanode,and Where the interaction energy is sufficiently high the nuclear hencetheenergyoftheparticle,havebeenavailableforover structure becomes involved, the interaction may become threedecades. Thesedetectorsareeffectively100%efficient, inelastic, and nuclear reactions can occur: NRA is often detecting every particle that strikes them. Thus, an alpha very useful since it is isotope specific and there is often no scattering experiment can determine the mass of the target backgroundtothesignal, butcross-sectionstypicallyfallby nucleus (from the energy of the scattered alpha, using the ordersofmagnitude(Vizkelethy1995,Hirvonen1995). kinematics:conservationofenergyandmomentum),thedepth of the target nucleus (also from the energy of the scattered 2.4. Imagingandchannelling alpha, this time using the energy loss as a function of depth in the target material), and the concentration of the target Turning to beam and sample manipulation, microbeams as nucleus(fromtheabsolutenumberofscatteredalphas,using smallas50nmhavebeenformed,and1µmmicrobeamsare thecross-sectionsthatRutherfordcalculated). in widespread routine use: thus lateral maps of the major, minorandtraceelementsofthesamplearereadilyavailable byPIXE(seeWattandGrime(1987)). NotethatforPIXEthe 3.3. Mass-depthambiguity excitationvolumewherethex-raysaregeneratedisessentially RBS spectrometry can therefore, in principle, determine the determined by the beam spot size since protons are not completeelementaldepthprofileofasamplefromtheenergy deflectedmuchbyelectrons. Incontrast,theexcitationvolume spectrumofthescatteredparticles. However,thereaderwill for x-ray analysis in the SEM is determined by the electron havenoticedthatthescatteredparticleenergyisafunctionboth beamenergy,andisoftenlargerthan5µmacross. Therefore, ofthemassofthetargetnucleus,andofthedepthofthetarget PIXEx-raymapsaretypicallyofmuchhigherlateralresolution nucleusinthesample. Thisambiguityintheinterpretationof thanEPMAones. Ofcourse,secondaryelectronmapsareof thedetectedparticleenergyistheoriginoftheinverseproblem muchlowerresolutionforPIXEthanfortheSEM. thatisattheheartofthispaper. Highenergyionbeamscanbeverywellcollimatedand Brice (1973) has shown that the elastic backscattering aligned with symmetry directions in single crystal samples, spectrumhastheformofatripleintegral: makinglatticesiteanddamageinformationavailable. Known (cid:1) (cid:1) (cid:1) ∂ ∞ t E σ(E ) as channelling, this is a very large field in its own right (see (cid:1) (E,E )=A dx(cid:3) N dx dE P P 1 Feldmanetal(1982),Go¨tzandGa¨rtner(1988)). e 3 ∂E3 0 0 e E(cid:3) 1 in outS(E1) Other interesting emerging fields include in situ obser- (1) vation of the electrical behaviour of working semiconductor where (cid:1) is the number of incident particles energy E e devicesbytheionbeaminducedcurrent(IBIC:see,e.g.Breese backscatteredwithenergyE intothedetectorfromelemente 3 et al (1998a)), and the characterization of the defect struc- in the target. N (t) is the number of scattering centres of e ture of thick crystalline samples using a channelled beam in element e at depth t in the target. For target atoms at the transmission(STIM,see,e.g.Breeseetal(1998b)). surface E is given by the kinematical factor k : E = k E, 3 e 3 e and in general E(cid:3)(E ) = E /k . E is the beam energy 3 3 e 1 before scattering for atom e at depth t in the sample. σ is 3. TheIBAinverseproblem the differential cross-section and S = dE/dR where dR is theaveragedistancetravelledbyanionenergyEwhilelosing 3.1. Rutherfordscattering energydE. Thepathlengthintothetargetbeforescatteringis In1911,ErnestRutherfordexplainedthestructureoftheatom givenbyx andthepathlengthafterscatteringbyx(cid:3). P and in asapositivelychargednucleussurroundedbyelectrons,where P are respectively the distributions in the projected range out thenucleuswastinywithrespecttothesizeoftheatom(already alongtheincomingandoutgoingdirections. known then to be around 1Å). He used a simple Coulomb Ofcourse,themeasuredspectrumisthesumofthe(cid:1)efor potentialandcontradictedJ.J.Thompson’s‘pudding’model. alltheconstituentelementsofthetarget. Thereisusuallyno RutherfordbasedhismodelonGeigerandMarsden’s(1909) waytomeasureindividualpartialspectra(cid:1)e. data on the ‘diffuse scattering’ (in backward directions) of alpha particles by metal foils, and Geiger’s (1910) data on 3.4. Fittingbysimulation themostprobableangleofscatteringoftransmittedparticles. Although the most probable scattering angle is very low, Clearly,tocalculatetheinverseN((cid:1))isnoteasyanalytically indicating a distributed charge as in Thompson’s model, and we describe a new approach to this inverse problem backscatteringeventsimplyaconcentratedchargeatthecentre below. Themostwidespreadapproachcurrentlyisapragmatic oftheatom(requiringtheconceptofthenucleus). Geigerand application of trial and error. The standard treatment is to Marsden(1913)subsequentlyverifiedRutherford’scalculation assume that the target has a certain structure, to calculate ofthescatteringprobability,orcross-section. the spectrum that would be obtained from such a target, to R100 TopicalReview comparewiththecollectedspectrumandthentoiterateuntil 4. Introductiontosimulatedannealing a reasonable match is obtained. The skilled analyst, from long experience and cunning, recognizes certain features of In this section, we describe the various elements of the the spectrum and will, usually quite quickly, converge to an simulated annealing (SA) algorithm that lies at the heart of acceptablesolution. theIBADataFurnace. Thisdiscussioniscentraltothispaper. Thisapproachhasanumberofmajordrawbacks. First, TheessentialelementsoftheSAalgorithmwereinvented it is entirely manual, requiring a skilled analyst to give each 50yearsagobyMetropolisetal(1953)withagroupincluding spectrumatreatmentthatcaneasilytakehours. (Itshouldbe Edward Teller, in the context of the discussion of the use notedthatmanytypesofspectracanbetreatedfaster,orrequire of ‘fast computing machines’ and Monte Carlo methods only limited data to be extracted. And of course, analysts to calculate the equilibrium value of any (thermodynamic) havebecomeadeptatthis.) Second,largenumbersofsamples quantityusingthecanonicalensembleofstatisticalmechanics. cannotbehandledwhereafulltreatmentisneeded,sincethe They successfully calculated the equation of state for a two- analyst simply runs out of patience. Third, because of the dimensional ensemble with these methods using the Los difficultyofdataanalysisthetemptationfortheexperimentalist AlamosMANIACcomputer. is to make samples especially for the analysis, instead of The fundamental SA theorem establishes that provided analysingthesamplesthatresultfromtheexperiment. Fourth, certainformalcriteriaareobserved,thealgorithmisguaranteed incaseswheretheexperimenterhasfailedtokeepthesamples to find the absolute minimum of any piecewise continuous simple, the spectra can be complex so that the analyst has function. The implementation of the algorithm is always difficulty extracting definite depth profiles from the data: in computationallyrelativelyexpensive,andarevivalofinterest these cases the analyst may remain uncertain of the validity in it followed the explosion of computing power in the oftheresult. Fifth,trainingtheseskilledanalystsisdifficult: 1980s. We have found the summary of Aarts and Korst theyarealmostinvariablyatleastpostgraduatestudentlevel. (1989)useful. Ithasbeenusedtosolvepreviouslyintractable Sixth,wehavedevelopedthe‘ruleofthumb’(roughestimate), problems,knownas‘NP-complete’,thatis‘non-deterministic over many years of analysis, that to analyse reasonably polynomialtimecomplete’wherethecomputationtimeisnot simple data collected over one day requires three further boundedbyanypowerofthenumberofindependentvariables days subsequent concentrated work by the analyst. Difficult oftheproblem. Examplesoftheseare:thetravellingsalesman data (or the inexperienced analyst) may take much longer! problem (see Press et al (1992)), and the construction of Clearly, to fully staff an accelerator will be very expensive. automatic sentence parsers for natural language processing Thereare,currently,fewlabsofferingafullcommercialRBS (see Wilks et al (1996)). It was actually in the context of service. thelastapplication,serendipitously,thatwecameacrossSA. In fact, the NP-complete class of problems has provided an importantstimulusforthedevelopmentofSA,asKirkpatrick 3.5. Fittingbysimulatedannealing etal(1983)pointoutinaninterestingarticlethatusesSAto We have taken a completely different approach. At its solvethesemiconductorchiplayoutdesignproblem(aswell heartisthesimulatedannealingalgorithm,whichisaglobal asthetravellingsalesmanproblem). minimizationalgorithmusingcombinatorialoptimization(see section 4). We have applied this algorithm to IBA data, 4.1. Theforwardmodel yieldinganalgorithmicsolutionoftheIBAinverseproblem. Notethatthemanualproceduredescribedaboveisnotatrue Wewillrepresenttheelementaldepthprofileofathinfilmas algorithmsinceitcannotbehandledbyamachineinthegeneral a sequence of layers of specified thickness, each containing case. Thefunctionoftheanalystisnowdifferent:insteadof certainelementsinagivenproportion. Thereareotherways trying to guess a rough structure that may be approximately of representing profiles, but this is quite general and is the consistentwiththedata,theanalystconcentratesonspecifying naturalrepresentationforthecalculationoftheexpectedIBA themachinecalibrations,onspecifyinganypriorknowledgeof energy spectra. Consider an arbitrary depth profile x which thesample,andonanappropriatemetricforthevalidityofthe is a structure {l,λ }, where the vector l is the list of layer i ij result. Foranygivensetofpriors(theseincludeinstrumental thicknesses,andthematrixλisthelistofr(cid:2)elativeproportions parameters)thealgorithm,nottheanalyst,willdeterminethe oftheelementsforeachlayer(andhence λ =1). Then j ij depth profile. We should point out that the algorithm itself, theforwardmodelisencapsulatedinthefunctionF(x)which aswehaveimplementedit,hasalargenumberofparameters calculatestheenergyspectrumY(E)(cfequation(1))expected which could be chosen by the analyst: our implementation forthesamplewithdepthprofilex. has value largely because we have shown that we have foundvaluesofgeneralvalidityformostoftheseparameters, Y(E)=F(x) (cid:3) and we give the analyst clear rules for the choice of the (2a) F(x)= f (x) remainder. e e We believe that the advent of this new algorithm will completelyrevolutionizetheapplicabilityofIBAtechniquesto wheretheforwardmodelcalculatesapartialspectrumf for e thinfilmanalysissinceitaddresseseachofthesixdrawbacks eachchemicalelementinthesample,andthetotalspectrumis enumeratedaboveatafundamentallevel. Theimprovements justthesumofthepartials. aresodramaticthatweeffectivelyhaveanewtoolforthinfilm This forward model, which is discussed in detail later, compositionanalysis. is the essence of the calculation: without a forward model R101 TopicalReview the spectra cannot be interpreted. All existing simulation ForSAtheMarkovchainsareconstructedinaparticular programsareessentiallysimplyimplementationsofaforward way. The state space S is the space of all possible depth model. Weconsiderlimitationsoftheexistingforwardmodel profiles x. This is the same as the space of all conceivable attheappropriateplaces. samples. The Markov chains are then sequences of profiles {x ,x ,...,x }. Thegenerationfunctionforconstructingx 1 2 m i+1 4.2. Theobjectivefunction fromxi canbeanymethodofselectinganewprofilexfrom state space. It is the acceptance criterion that gives SA its We have to have a way of comparing some profile x with remarkableproperties. the measured spectrum Y. There are many possibilities, some of which we will discuss later. We could use the very 4.4. Theacceptancecriterion simple and general χ2 function to compare the measured n-channelspectrumY = {y ,y ,...,y }withthecalculated Aproposedstatex isaccepteddependingonhowtheobjective 1 2 n p oneF(x)={ψ ,ψ ,...,ψ },givenby functionhaschangedfromitsvalueforthelastelementx of 1 2 n i (cid:3) thecurrentMarkovchainunderconstruction. Weevaluatethe χ2(Y,x)= (yi−ψi)2. (2b) change (cid:8) = O(Y,xp)−O(Y,xi) in the objective function i impliedbymovingfromstatex tox ,andifitisimproving i p It is not necessary to use the χ2 function, but some such ((cid:8) < 0) we accept the state: xi+1 = xp. However, if it is functionmustbespecifiedandthisisknownastheobjective gettingworse((cid:8)>0)wemayalsoacceptitaccordingtothe function O(Y,x). To solve the inverse RBS problem the probabilityP givenbytheMetropoliscriterion: (cid:4) (cid:6) objective function has to be minimized. When O(Y,x) is (cid:8) minimized an optimal solution x has been found: O = P ∝exp− (2e) o min T O(Y,x ). Note that there may be many indistinguishable o solutions x . Normally we are content with any particular where T is a control parameter characteristic of the Markov o solution x that is near an optimal solution x , but we show chain under construction. It is not hard to recognize the o later how to find both a most probable solution and also an similarity of this probability to the Boltzmann factor, with estimateoftheerrorofthissolution. T analogous to temperature. Thus, for the construction of the Markov chains used in SA the Metropolis criterion (or somegeneralizationofit)isusedfortheacceptancecriterion: 4.3. Markovchains this gives the Markov chains a natural thermodynamic SA depends on the mathematics of Markov chains (see interpretation. IftheMarkovchainsarelongenoughtheywill Gilks and Richardson (1996)). These are simply sequences reacha‘thermodynamicequilibrium’,thatis,theexpectation M of states s selected from a state space S such that value of certain statistics will stabilize. The introduction of s i M ={s ,s ,...s,...s }wherethes thentryinthechain aprobabilityintothetransitionfunctionmeansthatitcanbe s 1 2 i m i+1 dependsonlyonthesthentryandnotonanypreviousones. viewed as a Monte Carlo process, and these Markov chain i Moreproperly, MonteCarlo(MCMC)techniquesarecurrentlyofenormous mathematicalinterest(see,e.g.GilksandRichardson(1996)). P(s |s ,s ,...,s)=P(s |s) (2c) i+1 1 2 i i+1 i that is, the probability of si+1 given si is independent of all 4.5. Thecoolingschedule statesbeforei. ThefinalcomponentinSAisthecoolingschedule. Asequence Then the Markov chain is constructed by the transition distribution T(s) = s . This distribution is composed of of Markov chains is constructed with steadily reducing i i+1 ‘temperature’. ThentheSAtheoremstatesthattheobjective a generation distribution from which a proposed new state functionwillbeminimizedprovidedthateachMarkovchain is generated, and an acceptance criterion which determines islongenoughtohavereachedequilibrium,andprovidedthat whetherornottoacceptthisproposedstateasthenextentry thecoolingscheduleissufficientlyslow. intheMarkovchain. Barradas et al (1997) have pointed out that although S is extremely large, it is finite in principle because there are 4.6. Summary only 92 elements, and both sensitivity and energy resolution SA is an algorithm for finding the global minimum of an arefinite. Asanestimate, foranygivenbeamanddetection objective function. The entire state space of this function is system (cid:4)(cid:5) (cid:6) explored. Asequenceofstates(aMarkovchain)isconstructed e 1 log|S|≈ i=1 logn (2d) in which succeeding states have an objective function that φi is either reducing or has a Boltzmann-like probability of where |S| is the number of states s in the space S, there are increasingaccordingtoaparameteranalogoustotemperature. e elements in the sample, φ is the relative sensitivity for Hencetheideaof‘annealing’. AsequenceofMarkovchains i element i and n is the number of separable layers. Note isthenconstructedwithreducingtemperature; theendpoint that typically no more than 50 layers can be distinguished being an optimal solution. For IBA, the objective function (n < 50) and in each layer we can probably not determine foraproposeddepthprofileisconstructedfromthedifference the concentration of an element at better than 1% precision between the spectrum being fitted and that calculated with a (1/φ < 100). Therefore, theproblemmaybetreatedeither forwardmodel(astandardsimulationcode)fromtheproposed i asadiscreteor(sinceSissolarge)asacontinuousproblem, depthprofile. Thenthestatespaceexploredisthespaceofall dependingwhatismathematicallymoreconvenient. possibledepthprofiles. R102 TopicalReview 4.7. Algorithmicissues The SA theorem proves that in the general case a global minimum in the objective function is reached in logarithmic time,thatis,forMarkovchainkthecoolingscheduleisgiven by a temperature sequence T = T /log(10+k), provided k 0 thattheinitialtemperatureT issufficientlylarge(Gemanand 0 Geman 1984). Unfortunately, this implies ridiculously long computation times. In practical cases, geometrical cooling schedulesarealwaysused,thatisT =akT ,wheretypically k 0 inourcasea ∼ 1. ThisimpliesthatSAisefficientforsome 2 classesofproblem,butmaybeveryinefficientforothers. An interesting paper by Sorkin (1991) has given an indicationofthecircumstancesinwhichSAcanbeexpected to be efficient. He proves that for a state space with certain (rather artificial) fractal properties, a value of the objective functionclosetoaglobalminimumcanbefoundingeometrical time. Butheshowsthatitislikelythatamoregeneraltheorem forrealisticfractalstatespacesisalsotrue. Inapreliminary analysisofthestatespaceforarepresentativeRBSspectrum wehavefoundthat,inthevicinityoftheglobalminimumand in certain representations (not the χ2 one described above), the objective function has only one minimum (the global minimum). WethereforesuspectthatIBAspectraarelikely tohavequitewellbehavedobjectivefunctionswhichSAcan solveeasily. Wewillreportonthisinanotherplace. Depth(1015 atoms/cm2) To demonstrate that our SA algorithm is well behaved Figure1. 1.5MeV3HeRBSof200keV2×1017Fecm−2 we need to show that it converges properly. Barradas et al implantedintoSi. (a)RBSspectrumwithfit(linethroughdata (1998e) have shown the variation of the average Si content points)andtheFe,SiandOpartialfittedspectra(otherlines). of the last 50 elements (states x) of each Markov chain in (b)fitteddepthprofileforFe(afterBelsenetal(1999)). Probably the cooling schedule at various depths for an annealed and theFedoesnotpenetratesodeeplyintothetarget(thereisstill apparently‘significant’Feat24×1017atomscm−2):thisisan oxidizedironsilicide,plottedfortheentirecoolingschedule. ambiguityduetouncertaintyinthechargecollection. Thereis Inthesubstrate,thesiliconcontentreachesitsfinalvalueata noticeablesurfaceoxidationinthissample. Noteon‘thinfilmunits’ hightemperature,butnearthesurface,wheretherearemany (cfsection6,equation(3)):forsiliconofdensity2.32gcc−1or more possible states due to the presence of lots of oxygen 5×1022atomscc−1,onethinfilmunit(1015atomscm−2)is0.2nm. (with its small signal and consequent small influence on the χ2),thefinalvalueisnotreacheduntiltheendofthecooling. Marriottetal(1999)havealsodemonstratedthatourMCMC toextractfromspectralikethiseventhoughinthiscaseonly algorithmiswellbehaved,inparticularthatitconvergesand twoelements(SiandFe)areinvolved. TheSQUEAKIEcode ‘mixes’ properly. Mixing means that the sampling process of Børgesen et al (1982) is a matrix inversion type of code exploresallpossiblesolutionsinanefficientmanner. which reconstructs the depth profile by a closed calculation fromtheseparatedpartialspectra,alsoshowninfigure1. In thiscase, thepartialspectraoverlapandSQUEAKIEcanbe 5. Mixedmetalsilicides used iteratively, starting from the surface. Alkemade et al (1990) use the same type of iterative procedure. (There are Our first set of examples is in the silicide system. One other more general analytical approaches which we discuss form of iron disilicide is a semiconductor, and in principle below.) it should be possible to fabricate a heterojunction laser in Figure2isaspectrumfromamixedFeandCosilicide, silicon. On the other hand, cobalt disilicide is a metal, and in this case with three elements we believe that no andinterestingthree-dimensionalmetallizationschemesseem iterative procedure for SQUEAKIE exists. The kinematical feasible. A successful fabrication method for thin films of gap between Fe and Co for these conditions is 18keV, only eithermaterialinsiliconwaferscouldhaveadramaticimpact slightly higher than the system energy resolution of about onsiliconprocessingformicroelectronics. Ionbeamsynthesis 15keV(equivalenttoabout30nmofSi). Therefore,onewould ofβFeSi2 (usinghighdoseionimplantation)hasresultedin notexpecttheretobemuchdiscriminationbetweenFeandCo. light emitting diode structures (Leong et al 1997), and the The manual method using spectrum simulation described in examplesinthissectionaretakenfromrelatedwork. section3aboveistrickytousesinceitisnotobvioushowthe contrast in the metal signal arises. Because of the difficulty anduncertaintyofextractingthedepthprofilefromtheRBS 5.1. Analyticalsolutions spectrum, cross-sectional transmission electron microscopy, Figure1showsahighdose,highenergyFeimplantintoSisuch x-ray photoelectron spectroscopy and other techniques were thattheFeandSisignalsoverlap. Depthprofilesarenoteasy deployedtodeterminethedepthprofile(Harryetal1996). R103 TopicalReview (a) stronglyconstrainsthepossibleaverageZ(atomicnumber)of thesample. ThesameappliestoalltheIBAtechniquessince theyallhavestronglyZ-dependentcross-sections. With the manual method of spectrum simulation the charge is often treated as a free parameter (the analyst may simply ‘normalize’ the spectrum at an appropriate place). However, with the algorithmic approach of SA no tacit assumptions are allowed, and the explicit value given to the charge·solid-angle product is a very important constraint of thefittingprocess. (b) 5.4. Batchanalysisbysimulatedannealing In Barradas et al (1998b) examples are shown of data from asetofover50samplesinvestigatingthekineticsoftheion beammixingandannealingbehaviourofsputteredmetalfilms onsilicon(figure3). Thesespectraarealsohardtoanalyse, especiallysinceunwantedsevereoxidationwasexperiencedin thefurnaceannealing. ByusingSA,reliablequantitativedata fromthislargesetofhardspectracouldberapidlyobtained, allowingusefulconclusionstobedrawneveninthepresence of the oxidation (see also Milosavljevic´ et al (2001, 2002)). With manual methods sufficiently precise results would not havebeenobtainableandtheexperimentswouldhavehadto Figure2. 1.5MeV4HeRBSof200keV25×1016cm−2double berepeated. implantsofCofollowedbyFeintoSi. Implantationtemperature Figure 3 shows a set of spectra (left-hand side) with wasabout350˚C.(a)RBSspectrumwithfit(——)andCoandFe partialfittedspectra(----). (b)Fitteddepthprofile(figure3of the corresponding extracted depth profiles (right-hand side). Barradasetal(1997).) Thesurfaceissignificantlyoxidised,butthe The spectral plots show the data and the calculated fits Oprofileisomittedforclarity. ThedepressionintheSi (which go through almost all of the data points), and also concentrationbetweenabout10and15×1017cm−2correspondsto the calculated partial spectra. The profiles (right-hand side) acubicsilicide,whilebelowabout10×1017cm−2thelayerhasa wereobtainedusingamodifiedversionofthematrixinversion stoichiometrysimilartotheSi-richα-FeSi phase. 2 code (SQUEAKIE) of Børgesen et al (1982) which can be applieddirectlysincethepartialspectraareavailableoncethe 5.2. Simulatedannealingsolution SAcalculationiscomplete. Thelayerstructuredeterminedby SA is shown by the histogram lines in the top depth profile It is because the spectrum of figure 2 is very contrasty, with (figure3(a),right-handside). many strong edges and peaks, that it is ideally suited to SA: even a rapidly prototyped SA code was able to find a solution very rapidly, taking only minutes on a 100MHz 5.5. Discontinuityofsimulatedannealingsolutions 486 PC (Barradas et al 1997). This example was the first Figure3raisestheinterestingquestionofwhichrepresentation demonstrationthatSAwasapowerfulalgorithmforsolving is more fundamental, the discontinuous layer structure theinverseRBSproblem. determined by SA, or the continuous profile determined by subsequently displaying the partial spectra calculated by SA 5.3. Sensitivitytocollectedcharge asconcentrationversusdepth,usingSQUEAKIE?Clearly,the realdepthprofileiscontinuous,soexperimentalistsmayprefer In Barradas et al (1998a) the data of Harry et al (1996) to see the inverted partial spectra rather than the raw layer are re-analysed by SA, this time each fit taking about 2min structure. However,althoughSQUEAKIEhastheadvantage on a 200MHz Pentium PC. The optimization of the cooling thattheextracteddepthprofileusesallthespectraldatainan scheduleisdiscussed,togetherwithademonstrationthatinthis (nominally)internallyconsistentway,thematrixinversionstill case,withcontrastfrombarelyseparatedmasses,theFe:Co hasanumberofproblems. ratioisextremelysensitivetoerrorsinthechargecollection. First, the partial spectra are completely dependent on A1%errorinthechargegivesa20%errorintheFecontent. SA. Unavoidable statistical error in the estimation of the Doubtless a careful analytical procedure might increase the layer structure will cascade into the partial spectra and the precisionofthistypeofmeasurement,butno-onehasyetgiven profiles calculated from them. Moreover, the energy loss acriticaldemonstrationofchargecollectionbetterthan1%. variation with depth used by the matrix inversion code is a We will return repeatedly to the issue of instrumental function of the composition, where the layer structure is parameters and the determination of analytical errors, here determinedbySA.Therefore,theenergylossisanunavoidably weemphasisethatthecollectedchargeisacriticalparameter discontinuousfunctionofdepth. Second,aniterativeprocess in the interpretation of IBA data. This is because for RBS ofusingtheinvertedprofilesasthebasisforanewproposed spectra the cross-section is proportional to the square of the layer structure is not simple (and may be not possible) to atomicnumber,sotheabsolutenumberofcountscollectedvery implementsinceasthecodestandstheinvertedprofilestake R104 TopicalReview (a) (b) (c) (d) (e) Figure3. RBSdatawithsimulatedannealingfits(left),andcorrespondingderiveddepthprofiles(right),ofsputterdeposited120nmFe layersonSiionbeammixedwitheitherAsorXeandannealedat900˚Cfor150min. Intheleftcolumn,thesolidlinesarethefittedspectra, andthepartialO,SiandFespectraareshownasdashedlines. Intheright-handcolumnthepartialspectraextractedwithDataFurnaceare replottedas‘depthprofiles’usingSQUEAKIE(Børgesenetal1982). Forclarity,onlyonefitteddepthprofileisshownasahistogram(top spectrum,right-handsidecolumn). Noticethedifferentdepthscaleintheright-handsidecolumnofe). Figure2ofBarradasetal(1998b). noaccountofthedetectorresolutionorthevariationofenergy Tosummarize:wecanreplotthespectrumpointbypoint resolution with depth (energy straggling). In fact, Børgesen onanelementalconcentrationversusdepthscale,butthispro- etalspecificallypointoutthattheiralgorithmwillnothandle cedureiscarriedoutoneachpartialspectrumindependently energy straggling (or the surface signals) correctly. Third, withoutattemptingtoforceself-consistencybetweenthevar- theinversionnecessarilyassumestheaccuracyoftheforward iousreplottedpartialspectra. ItisdoneonlyafterSAdeter- model,buterrorsinthestoppingpowerdatabase(whichcan minesavalidstructure(depthprofile)forthesample,andusing beatthe10%levelasdiscussedindetailbelow)meanthatthe thedepthdependentenergylosscalculatedfromthisstructure. depthsofinterfacesmaybeinconsistentlyestimatedbetween The plots obtained take no account of energy resolution and thedifferentelementsinthesample. Wehaveseenverylarge straggling,buttheyareveryusefulinsomecircumstances. artefactsdemonstrablyduetothiseffect. We have therefore concluded that the layer structure 5.6. Occam’sRazor determinedbySAisthemorefundamental,andfortheseand otherreasonswenolongerusetheBørgesenmatrixinversion We have up to now invoked the principle of Occam’s Razor code (SQUEAKIE). As we mentioned above, the apparent (non sunt multiplicanda entia praeter necessitatem, that is, advantage of SQUEAKIE is that it nominally imposes self- entitiesarenottobemultipliedexceptofnecessity; inother consistencyonthedepthprofilesobtained. Butsincewecan words,‘minimizeyourassumptions’)instatingthatwechoose demonstrate that actually these profiles are not internally the layer structure with the fewest layers to represent the consistentingeneralwearebetteroffusingasimplerinversion sample. See Garrett (1991) for an interesting application of algorithm that does not claim self-consistency, treating each Occam’s Razor to Bayesian probability. Thus, we make the partial spectrum independently, while using a correct depth layer structure as coarse as possible consistent with the data scalecalculatedfromthefittedstructure. (thecalculationmethodisdescribedinBarradasetal(1999b)). R105 TopicalReview Notethatthequalityoffitobtainedisoutstandinglygoodeven Børgesen et al’s matrix inversion code (‘SQUEAKIE’, with these very discontinuous solutions. From an analytical 1982)isamathematicalattempttosolvetheinverseproblem point of view it serves to emphasise that we find an optimal which we have already described (section 5) and which has solution;thesolutionisconsistentwiththedata,leavingopen severelimitations. Asimilarbutmuchmoregeneralapproach the question of what other different structures may also be hasbeentakenbyKoganetal(1994)basedonthereduction consistentwiththedata,andhowdifferenttheymightbe. We method for ill-posed problems developed by Pyt’ev (1983). returntothesequestionslater,butwenoteherethatitisclear Pyt’ev’smethodisforlinearcasesbutKoganetalhavegen- thatsomesortofcontinuityconditionoughttobeappliedto eralizedtheirtreatmentforthisnon-linearproblem. Kogan’s thistypeofdataandwehavenotyetsatisfactorilydetermined ‘BeamExpert’doesappeartobetrulygeneralpurpose,thatis, howsuchaconditioncouldbespecified. itiscapableofinvertingspectrawherethepartialspectraover- laporwherethereisnouniquesolution. However, itisalso notclearhowrobustitistoproblemsoftheforwardmodel,in 5.7. RBSbytheunskilled particularthoseduetoerrorsintheenergylossdatabase. Nor Finally,returningtofigure1,thisspectrumwasfromoneofa isitclearwhetherornotitisrobustinuse,orwhetherithas setoftwelvesamplescreatedinanexperimentalset-upasan instabilitiessimilartothoseexhibitedbySerruys’code. exerciseforagroupofschool-leavers(18-year-olds)aspartof Difficulties and shortcomings of this type of inversion anationalschemetogivethisagegroupatasteforuniversity code have been comprehensively and elegantly described research. Theseyoungpeoplehadapproximatelyfourhours by Cumpson (1995) for the (related) technique of angle- traininginRBSdatacollectionanddatareductiontechniques. resolved XPS. Cumpson also compares the properties of Thedatawerecollectedforthemandtheanalysiswasset-up, inversionandmaximumentropyalgorithms(wedescribethe andthenthey,essentiallyunaided,usedtheDataFurnacesoft- MaxEnt code of the Garching group below, section 9), and waretoextractdepthprofilesfromthewholedataset,writing he comments on the usefulness of the MaxEnt approach, up the results (including a full presentation of the methods, especially when uncertainty estimates are required. Our aimsandcomplementaryopticalresults)inasingleday. This approachismathematicallycloselyrelatedtotheMaxEntone. expressesexactlythebenefitsofusingthisnewsoftwaretool forsolvingRBSspectra. Thedataturnedouttobeveryinter- 6.3. TheRBSforwardmodel estingandweresubsequentlypublished(Belsonetal1999). The RBS forward model is straightforward to calculate numerically. The sample is divided into thin layers and the 6. Theforwardmodel scattering of the incident beam (of energy E ) from each 0 layer is considered. Then the energy of the beam E before q In this section, we explicitly state the forward model that scatteringaftercrossinglayerq isgivenbyanintegrationof we use, and also describe the code structure of the IBA theenergylossesontheinwardpathtothelayer: DataFurnace,andanextensiontomoderatelyroughsurfaces. Wealsobrieflyreviewothersimulationandfittingcodes. (cid:3)q−1 E =E −secθ tε(p,E) (3) q 0 1 i i i i=0 6.1. Simulation where θ is the incident beam angle from normal, t is the 1 i All IBA labs have simulation codes to calculate spectra to thickness of the ith layer (in thin film units, that is, in comparewithcollecteddata:theseareequivalenttowhatwe µgcm−2 or equivalent units), and E is the incident beam 0 have called the forward model and have long been available energy. ε(p,E) is the energy loss (per unit depth, in thin (Brice1973). Thereareanumberofcodes,withvarioususeful i i film units) in the ith layer which(cid:2)has a composition given userinterfacefeatures,whichhavebeenwellreviewedbyKo´tai by p = {p ,...,p } where e p = 1 and there (1994). TheDataFurnaceisnotasimulationcode;itisafitting i 1,i e,i j=1 j,i are e elements. The number of scattering centres C of e,i code,incorporatingasimulationcodeinitscore. Itreplaces element e in layer i is then given by C = tp. Note that all existing single scattering simulation codes (we discuss this structure {t,p } has the same cheimicali eilements but i ij multiple scattering codes below in section 7). Because the a different (larger) number of layers from the depth profile forwardmodelisexecutedrepeatedlyforeachfititisoptimized x={l,e }definedpreviously. Thelayerthicknessest must i ij forspeed. However,someimprovementofthealgorithmmay bechosensmallenoughthatε(Ei−1) ∼= ε(Ei)otherwisethe beobtainableusingSerruys’retrogrademethod(1991). numericalintegrationwillcumulateerrors. The energy loss ε is calculated from a database of the 6.2. Fitting energylossofionsinmatter(Ziegleretal1985)thatextends acrosstheentireperiodictableforbothbeamandtarget,andfor Ko´taidoesnotmentionSerruysetal’sinteresting‘PERM’code allbeamvelocitiesaboveaboutthatof50keVH.Theenergy (1993)whichisatruefittingcodebasedonanovelsimulation lossinanytargetcanbecalculatedbylinearsuperpositionof algorithm (Serruys 1991) and a novel and efficient spectrum elementalenergylosses. Thus smoothingalgorithm(Serruys1990). However,Serruys’code isnotatruealgorithm,relyingasitdoesbothonareasonable (cid:3)e initialguessbytheuserandalsoonsomeguidancebytheuser ε(p,E)= p ε(j,E) (4) i i ij i duringexecution. j=1 R106
Description: