REPORT DOCUMENTATION PAGE — Form Approved OMB No. 0704-0188

The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to any penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. PLEASE DO NOT RETURN YOUR FORM TO THE ABOVE ADDRESS.

1. REPORT DATE (DD-MM-YYYY):
2. REPORT TYPE: New Reprint
3. DATES COVERED (From - To):
4. TITLE AND SUBTITLE: Tree approximation of the long wave radiation parameterization in the NCAR CAM global climate model
5a. CONTRACT NUMBER: W911NF-07-1-0185
5b. GRANT NUMBER:
5c. PROGRAM ELEMENT NUMBER: 611103
5d. PROJECT NUMBER:
5e. TASK NUMBER:
5f. WORK UNIT NUMBER:
6. AUTHORS: Alexei Belochitski, Peter Binev, Ronald DeVore, Michael Fox-Rabinovitz, Vladimir Krasnopolsky, Philipp Lamby
7. PERFORMING ORGANIZATION NAME AND ADDRESS: University of South Carolina Research Foundation, 901 Sumter St., 5th Floor Byrnes, Columbia, SC 29201-3961
8. PERFORMING ORGANIZATION REPORT NUMBER:
9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): U.S. Army Research Office, P.O. Box 12211, Research Triangle Park, NC 27709-2211
10. SPONSOR/MONITOR'S ACRONYM(S): ARO
11. SPONSOR/MONITOR'S REPORT NUMBER(S): 52559-MA-MUR.22
12. DISTRIBUTION AVAILABILITY STATEMENT: Approved for public release; distribution is unlimited.
13. SUPPLEMENTARY NOTES: The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of the Army position, policy or decision, unless so designated by other documentation.
14. ABSTRACT: The computation of Global Climate Models (GCMs) presents significant numerical challenges. This paper presents new algorithms based on sparse occupancy trees for learning and emulating the long wave radiation parameterization in the NCAR CAM climate model. This emulation occupies by far the most significant portion of the computational time in the implementation of the model. From the mathematical point of view this parameterization can be considered as a mapping which is to be learned from scattered data samples (x^i, y^i).
15. SUBJECT TERMS: Climate and weather prediction; Numerical modeling; Non-parametric regression; High-dimensional approximation; Neural network; Sparse occupancy tree
16. SECURITY CLASSIFICATION OF: a. REPORT: UU; b. ABSTRACT: UU; c. THIS PAGE: UU
17. LIMITATION OF ABSTRACT: UU
18. NUMBER OF PAGES:
19a. NAME OF RESPONSIBLE PERSON: Andrew Kurdila
19b. TELEPHONE NUMBER: 540-231-8028
Standard Form 298 (Rev 8/98), Prescribed by ANSI Std. Z39.18

Report Title: Tree approximation of the long wave radiation parameterization in the NCAR CAM global climate model

ABSTRACT: The computation of Global Climate Models (GCMs) presents significant numerical challenges. This paper presents new algorithms based on sparse occupancy trees for learning and emulating the long wave radiation parameterization in the NCAR CAM climate model.
This emulation occupies by far the most significant portion of the computational time in the implementation of the model. From the mathematical point of view this parameterization can be considered as a mapping which is to be learned from scattered data samples (x^i, y^i), i = 1,...,N. Hence, the problem represents a typical application of high-dimensional statistical learning. The goal is to develop learning schemes that are not only accurate and reliable but also computationally efficient and capable of adapting to time-varying environmental states. The algorithms developed in this paper are compared with other approaches such as neural networks, nearest neighbor methods, and regression trees as to how these various goals are met.

REPORT DOCUMENTATION PAGE (SF298) (Continuation Sheet)

Continuation for Block 13, ARO Report Number 52559.22-MA-MUR: Tree approximation of the long wave radiation parameterization in the NCAR CAM global climate model

Block 13: Supplementary Note
© 2011. Published in Journal of Computational and Applied Mathematics, Vol. 236 (2011), 447-460. DoD Components reserve a royalty-free, nonexclusive and irrevocable right to reproduce, publish, or otherwise use the work for Federal purposes, and to authorize others to do so (DODGARS §32.36). The views, opinions and/or findings contained in this report are those of the author(s) and should not be construed as an official Department of the Army position, policy or decision, unless so designated by other documentation.

Approved for public release; distribution is unlimited.

Journal of Computational and Applied Mathematics 236 (2011) 447-460
doi:10.1016/j.cam.2011.07.013

Tree approximation of the long wave radiation parameterization in the NCAR CAM global climate model

Alexei Belochitski (a), Peter Binev (b,*), Ronald DeVore (d), Michael Fox-Rabinovitz (a), Vladimir Krasnopolsky (c), Philipp Lamby (b)

(a) Environmental System Science Interdisciplinary Center, University of Maryland, MD, USA
(b) Interdisciplinary Mathematics Institute, University of South Carolina, Columbia, SC, USA
(c) Environmental Modeling Center, National Centers for Environmental Prediction, National Oceanic and Atmospheric Administration, MD, USA
(d) Department of Mathematics, Texas A&M University, College Station, TX, USA

* Corresponding author. Tel.: +1 803 576 6269. E-mail addresses: [email protected] (A. Belochitski), [email protected] (P. Binev), [email protected] (R. DeVore), [email protected] (M. Fox-Rabinovitz), [email protected] (V. Krasnopolsky), [email protected] (P. Lamby).

Keywords: Climate and weather prediction; Numerical modeling; Non-parametric regression; High-dimensional approximation; Neural network; Sparse occupancy tree

Abstract: The computation of Global Climate Models (GCMs) presents significant numerical challenges. This paper presents new algorithms based on sparse occupancy trees for learning and emulating the long wave radiation parameterization in the NCAR CAM climate model. This emulation occupies by far the most significant portion of the computational time in the implementation of the model. From the mathematical point of view this parameterization can be considered as a mapping R^220 → R^33 which is to be learned from scattered data samples (x^i, y^i), i = 1,...,N. Hence, the problem represents a typical application of high-dimensional statistical learning. The goal is to develop learning schemes that are not only accurate and reliable but also computationally efficient and capable of adapting to time-varying environmental states. The algorithms developed in this paper are compared with other approaches such as neural networks, nearest neighbor methods, and regression trees as to how these various goals are met.

© 2011 Elsevier B.V. All rights reserved.
1. Introduction

The main computational burden in general circulation models (GCMs) for high-quality weather prediction and climate simulation is caused by the complexity of the physical processes that include, in particular, atmospheric radiation, turbulence, convection, clouds, large scale precipitation, constituency transport and chemical reactions. These processes are modeled by parameterizations, which are complex 1D subgrid-scale schemes formulated using relevant first principles and observational data. The evaluation of these parameterizations typically consumes 70%-90% of the overall CPU time in the GCM we consider in this work, the National Center for Atmospheric Research (NCAR) Community Atmospheric Model (CAM).

To overcome this computational "bottleneck", it has been proposed to employ statistical learning methods like neural networks in order to accelerate the evaluation of the parameterizations. This idea originates from [1], where a battery of neural networks has been used to approximate certain partial fluxes within the long wave radiation (LWR) parameterization of the European Centre for Medium-Range Weather Forecasts, explicitly exploiting the structure of that particular model. Another, more direct and more general approach is pursued in [2], where a single neural network emulation is used to entirely replace the LWR parameterization in the NCAR CAM as a whole. This parameterization takes a surface characteristic and the discretizations of ten vertical profiles of local physical properties and gas concentrations as input and returns a vertical profile of heating rates and seven heat fluxes as output. Mathematically this parameterization can be considered as a mapping f: R^220 → R^33, see Section 2.1. The basic idea of an emulation is, for any given input, to get an approximation of the output variables by evaluating a learned approximation (a fast process) instead of evaluating the original parameterization (which is a rather complex numerical simulation in its own right). The approximating function is learned from a set of training data points (x^i, y^i) for which the outputs y^i = f(x^i) have been computed by solving the original physical parameterization. The approximation should be both accurate and easy to evaluate in order to provide a significant speed-up of the GCM without significantly changing the prediction of a long-term climate simulation.

While artificial neural networks can be considered as the current state-of-the-art black box methodology for a wide range of high-dimensional approximation problems, and justifiably so, they may not necessarily be the best solution for this particular application. The accuracy of neural network emulations depends on the number of layers and hidden neurons employed. While it is known that neural networks are universal approximators, i.e., they can approximate any continuous function to any predetermined accuracy (see for instance [3,4]), this is achieved only by allowing the number of neurons to increase arbitrarily. However, the learning of the network parameters (weights) requires the solution of a large, non-linear optimization problem, which is very time-consuming, prone to deliver sub-optimal solutions and, thus, severely limits the complexity of the network that one can afford to train.
This becomes a practical issue considering the following task: the approximation is trained with a data set that consists of evaluations of the original parameterization gained during a reference run of the climate model. The inputs of this training data set, therefore, cover the physical states observed during a certain time period of climate history. However, the domain in which the parameterization is to be evaluated may change with time, as in the case of climate change. In such situations the approximation may be forced to extrapolate beyond its generalization ability, which may lead to large errors. In this case it could become necessary to re-train the emulation in order to adapt it to the new environment.

It would therefore be advantageous to have an alternative to neural networks that would offer an easier training process and that would perhaps even be capable of incremental learning. Additionally, if one could estimate the error of the approximation for a certain input, one could use the original parameterization as a fall-back option during a run of the GCM and immediately incorporate the new data into the emulation. Actually, [5] addresses the problem of error estimation for a neural network emulation, but the question of how to dynamically adapt the approximation is left open.

In the present paper we search for an alternative to neural networks within the class of non-parametric approximation methods. We cannot offer a full realization of the program outlined above, but we restrict ourselves to basic design decisions and to testing whether such a program has any chance to be successfully implemented. In particular, we discuss the features of two common statistical learning paradigms, (approximate) nearest neighbors and regression trees, and present a new algorithm based on what we call sparse occupancy trees, which aims to provide a very efficient nearest neighbor type algorithm capable of incremental learning. The development of the latter concept was originally motivated by the present application and is comprehensively described in [6].

In order to motivate why we were actually trying to design new algorithms instead of just using an off-the-shelf algorithm, we briefly describe the aforementioned techniques. More information can be found in Section 3 and in standard textbooks like [7]. Non-parametric learning methods typically try to partition the input space and then use simple local models like piecewise constants to approximate the data. In the case of nearest neighbor methods, the input space is implicitly partitioned by the way the training data is distributed: the approximation is constant for query points that have the same set of nearest neighbors. Unfortunately, in high dimensions there are no fast algorithms which could answer the question "what are the nearest neighbors to a given query point x?". Therefore one must be content with approximate answers to this question, which can be realized using so-called kd- or bd-trees. Here, assuming that all the training data is available beforehand, the input domain is recursively partitioned depending on the distribution of the input points.

Regression trees follow a more adaptive approach and also use the y-values in order to define the domain partition. Here, starting with the entire input domain, the cells in the partition are recursively subdivided such that the residual of the resulting approximation is minimized in each step. Obviously, due to their recursive definition, none of these techniques is available for incremental learning without modification: a new data point could theoretically change the decision of how to perform the first split in the tree, which would require relearning the tree from the very beginning. Sparse occupancy trees, on the other hand, encode the data in a format that is independent of the distribution of the incoming data.

It must be noted here that no partitioning scheme can be expected to be successful for arbitrary high-dimensional data.
(The same could be said about neural networks, although for other reasons.) For instance, if the data points were uniformly distributed in a very high-dimensional space, the attempt to generate local approximations like those described above would be doomed, because the average distance between a query point and the best fitting data point might become large even for huge training data sets. This is often referred to as the curse of dimensionality. One usually makes the assumption that the data is actually distributed over some lower-dimensional submanifold or is concentrated in a subset of small measure within the whole input space. In our special case this assumption is justified because the input data is group-wise strongly correlated. One purpose of this work is to quantify this effect, and in Section 4.3 we give an estimate of the intrinsic dimensions of the data set which shows that non-parametric approximation of the long wave radiation data should indeed be feasible.

The main obstacle for the application of tree-based approximation schemes seems to be implementing them in a highly parallel computer system, which is unavoidable in the simulation of huge, complex systems like the global climate. Non-parametric approximation methods are memory based, i.e., they need to store all the training data permanently. This limits their practical use to shared memory systems, because on distributed memory systems each core or processor can address only a limited amount of data. The current implementation of the NCAR-CAM system, however, seems to be optimized for such distributed memory architectures and requires the emulation to be stored on each individual processor. Nevertheless, as proof of concept, we performed a 10-year climate simulation where the original parameterization was replaced by a regression tree. We report on this numerical experiment in Section 5. The tree was chosen as a compromise between accuracy and storage considerations and trained with a rather limited amount of data. That means that one of the main advantages of non-parametric learning, namely its ability to cope with large amounts of data, was not really exploited. Despite that, we were able to achieve a rather promising result, which indicates the potential usefulness of this approach in future projects.

2. General considerations

In this section we provide the mathematical formulation of the approximation problem and introduce some basic notation used throughout the paper. Furthermore, we make some very general remarks concerning the design of approximation schemes for heterogeneous vector-valued functions.

2.1. Structure of the LWR parameterization

The input vectors for the LWR parameterization include one surface characteristic and ten profiles: atmospheric temperature; humidity; the concentrations of ozone, CO2, N2O, and CH4; two chlorofluorocarbon mixing ratios; pressure; and cloudiness. The profiles are discretized and represented by the values in 26 vertical layers of the atmosphere. Some of the quantities are constant in certain layers and have therefore been filtered out of the input vectors, effectively leaving 220 input parameters. The output vectors consist of a profile of heating rates and seven radiation fluxes, altogether 33 values. Hence, the parameterization can be considered as a function

    f: R^220 → R^33,  x = (x_j)_{j=1,...,220} ↦ y = (y_l)_{l=1,...,33}.

In what follows, we denote the components of a vector with subscripts, while we use superscripts to indicate different data points in the set.
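For orientation, the following minimal sketch (Python/NumPy; the array layout and function name are our own illustration and are not taken from the CAM code) shows how a 33-component output vector splits into the heating-rate profile and the flux values described above:

    import numpy as np

    # Illustrative constants taken from the description above: 26 vertical layers,
    # 220 filtered input parameters, 33 outputs (26 heating rates + 7 fluxes).
    N_LAYERS, N_INPUTS, N_OUTPUTS = 26, 220, 33

    def split_output(y):
        # Split one output vector y in R^33 into its two physically distinct parts.
        y = np.asarray(y)
        heating_rates = y[:N_LAYERS]   # vertical profile of heating rates
        fluxes = y[N_LAYERS:]          # seven radiation fluxes
        return heating_rates, fluxes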
2.2. Error statistics

The goal is to find an approximation that closely matches the original parameterization when used in the GCM. Of course, one cannot afford to test this in the development stage of an approximation, but one can test the quality of the approximation using statistical means. The procedure is usually as follows: we generate two data sets. The training data set X = {(x^i, y^i), i = 1,...,N} is used to train the parameters of the approximation, for instance the weights of the neural net or the underlying partition of the tree algorithm. Then the test data set is used to measure the bias and residual mean square errors to achieve a statistical evaluation of the approximation accuracy. We denote the test data set with X̂ = {(x̂^i, ŷ^i), i = 1,...,M}. Furthermore, we indicate an emulation of the original parameterization with f̃ and its output f̃(x̂^i) by ỹ^i. Using these conventions, the bias or systematic error of the approximation for the l-th layer is defined by

    B_l = (1/M) Σ_{i=1}^{M} (ŷ_l^i − ỹ_l^i)    (1)

and the total bias is

    B = (1/D) Σ_{l=1}^{D} B_l    (2)

where D is the number of output components. Since the heating rates and the heat fluxes in the output vector are measured in different units, it hardly makes sense to compute error measures for all 33 output components simultaneously; instead we restrict ourselves to the heating rates, i.e., the first 26 components. In this case D can be considered as the number of layers. The bias is actually not a measurement of accuracy (it could be zero even for an arbitrarily inaccurate approximation), but a good approximation with a large bias would be unacceptable, because one has to fear that systematic deviations could accumulate in the GCM.

We measure accuracy using the root mean square error (RMSE). The RMSE of the l-th output component is defined as

    RMSE_l = √( (1/M) Σ_{i=1}^{M} (ŷ_l^i − ỹ_l^i)^2 )    (3)

and the total RMSE is

    RMSE = √( (1/D) Σ_{l=1}^{D} RMSE_l^2 ).    (4)

[Fig. 1. Layer bias. Scatter plots of the neural network approximation for the layers l = 26 and l = 9. In each diagram the values of the original parameterization y_l are plotted along the horizontal axis, the approximated values ỹ_l along the vertical axis. The green line shows the diagonal y_l = ỹ_l.]

2.3. Neural network approximations

A natural benchmark for comparison with the present methods is the neural network emulation described in [2]. This is a standard feed-forward neural net with one hidden layer of neurons of the form

    f̃(x) = α_0 + Σ_{i=1}^{m} α_i σ(β_i · x + b_i)    (5)

where σ is the sigmoidal function σ(x) = tanh(x), and the directions β_i ∈ R^d, weights α_i ∈ R^{d'} (d' being the dimension of the output space) and intercepts b_i ∈ R are learned from the training data by least squares fitting. In particular, we use as reference a network with m = 80 hidden neurons which has been trained with 196,608 selected evaluations of the original parameterization from a reference run of the NCAR-CAM simulating the climate for the years 1961-1962.
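To make the benchmark and the error measures concrete, here is a small sketch (Python/NumPy; the parameter names alpha0, alpha, beta and b are ours, and the reference implementation is not reproduced here) that evaluates a network of the form (5) and computes the layer bias (1)-(2) and the RMSE (3)-(4) over a test set restricted to the 26 heating-rate components:

    import numpy as np

    def emulate(x, alpha0, alpha, beta, b):
        # f~(x) = alpha_0 + sum_i alpha_i * tanh(beta_i . x + b_i), cf. Eq. (5).
        # Shapes: alpha0 (33,), alpha (m, 33), beta (m, 220), b (m,).
        return alpha0 + np.tanh(beta @ x + b) @ alpha

    def layer_errors(Y_test, Y_emul, D=26):
        # Y_test, Y_emul: (M, 33) arrays of original and emulated outputs.
        diff = Y_test[:, :D] - Y_emul[:, :D]
        bias_l = diff.mean(axis=0)                  # Eq. (1), one value per layer
        bias = bias_l.mean()                        # Eq. (2), total bias
        rmse_l = np.sqrt((diff**2).mean(axis=0))    # Eq. (3), per-layer RMSE
        rmse = np.sqrt((rmse_l**2).mean())          # Eq. (4), total RMSE
        return bias_l, bias, rmse_l, rmse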
2.4. Monolithic schemes vs. batteries

Note that the above representation can be described as a monolithic, vector-valued approach. Another possible network design could have been

    ỹ_j = α_{0,j} + Σ_{i=1}^{m} α_{i,j} σ(β_{i,j} · x + b_{i,j})    (6)

with β_{i,j} ∈ R^d and b_{i,j} ∈ R as above, but α_{i,j} ∈ R. That is, one could have used a battery of neural networks approximating each output component individually. This would, of course, increase the training and evaluation costs, but would also be more accurate. Actually, the vector-valued design can potentially become a source of bias. In the training of the neural network, the total RMSE is used as the optimization criterion. However, the variance of the output data is higher for the layers nearer to the surface of the earth; hence the error is dominated by these layers and the neural net is adapted mainly to them. In other layers, the errors might be relatively larger. This is demonstrated in Fig. 1, which shows scatter plots of the reference neural net for the layers 9 and 26. One can clearly see that the approximation in layer 9 is biased. One has to note, however, that the scales in the two diagrams differ by almost a factor of 30, i.e., the absolute error introduced by this bias is still very small. A possible remedy for this is to normalize the outputs in the learning process according to their variances. Then, all output components will have the same relative accuracy, possibly at the cost of the total accuracy. This technique could also be applied to the tree-based methods considered in the current paper.

Since we are interested in examining the potential of possible alternatives to the current emulation design, we consider mainly, but not exclusively, component-wise approximation in the following. The disadvantages of component-wise approximations are that they might become computationally more expensive and can potentially return unrealistic profiles, because they may not properly represent the correlations between components of the profile vector. These correlations are naturally represented by vector-wise approximation.

3. Description of algorithms

In order to keep this paper self-contained, we give a concise description of the non-parametric algorithms that we will consider in the following numerical experiments. Thereby, we discuss the nearest neighbor and regression trees only very briefly, because they are well established and comprehensively discussed in the literature. We give a more comprehensive account of the sparse occupancy trees, because, as explained in the introduction, they are new and have been developed specifically for this application.

3.1. (Approximate) Nearest neighbors

3.1.1. Basic concepts

The k-nearest neighbor method works as follows: one defines a metric ‖·‖ on the input space and, given a query point x, finds a permutation n → i_n of the training data such that

    ‖x^{i_1} − x‖ ≤ ‖x^{i_2} − x‖ ≤ ··· ≤ ‖x^{i_k} − x‖ ≤ ‖x^{i_p} − x‖

for all p > k. Then, one averages the function values corresponding to these nearest neighbors,

    f̃(x) = (1/k) Σ_{n=1}^{k} y^{i_n},

to define an approximation of f(x). Unfortunately, it is well known that in very high dimensions it is not possible to design fast algorithms that provide the permutation sought. Instead one relaxes the search and is content with an algorithm that returns a permutation n → j_n such that

    ‖x^{j_n} − x‖ < (1 + ε) ‖x^{i_n} − x‖

for all n. There are algorithms based on kd-trees or bd-trees that provide a fast answer to this relaxed problem, if ε is chosen large enough. An introduction to this topic can be found in [8].
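As an illustration of the relaxed nearest-neighbor query, the sketch below uses the kd-tree from SciPy, whose eps argument plays the role of the tolerance ε above; this only demonstrates the mechanics and is not the implementation used for the experiments in this paper:

    import numpy as np
    from scipy.spatial import cKDTree

    def build_knn(X_train, Y_train):
        # X_train: (N, 220) inputs (already scaled), Y_train: (N, 33) outputs.
        return cKDTree(X_train), np.asarray(Y_train)

    def query_knn(tree, Y_train, x, k=10, eps=1.0):
        # Approximate k nearest neighbors: each returned neighbor is within a
        # factor (1 + eps) of the distance to the true corresponding neighbor.
        _, idx = tree.query(x, k=k, eps=eps)
        return Y_train[idx].mean(axis=0)   # average the k neighbor outputs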
3.1.2. Data scaling

The central point in the above description is, of course, how to define the metric ‖·‖ on the input space. This is a non-trivial task because the input vectors include several physical quantities measured in different units and varying over several orders of magnitude. A trivial method to equilibrate the various input quantities is to compute the maximum and minimum of each parameter in the training data set and to then scale each component of the input vector individually to the interval [0,1]. Then, one uses the standard Euclidean norm on [0,1]^d to measure the distances. Another self-evident idea is to scale the variables belonging to the same profile with the same factors. Numerical experiments showed that the second type of scaling yields better results. Therefore, we use this scaling in the following experiments. Adaptive nearest neighbor methods try to learn a problem dependent metric from the data, but we have not pursued this approach any further, because the data seems to be too sparse to define local metrics reliably for this application.

3.2. Regression trees

3.2.1. CART

The most basic algorithm for the generation of regression trees is the Classification and Regression Tree (CART) algorithm, intensively analyzed in [9]. It can be summarized as follows: we initialize the partition P = {Ω}, where Ω = ∏_{i=1}^{d} [a_i, b_i] is a hyper-rectangle that contains all the training points and d is the dimension of the input space. Then, each hyper-rectangle in the partition that contains more than a given number m of data points is recursively subdivided along a hyperplane x_i = c, where i ∈ {1,...,d} and c ∈ [a_i, b_i] is chosen such that the RMSE of the best piecewise constant approximation on the refined partition is minimized. That is, the regression function assumes the average value of all the points in a given hyper-rectangle of the partition. This is a reasonably efficient algorithm that can be used for a wide range of classification and regression problems. It also has the advantage that it is independent of the scaling of the data.

3.2.2. Random Forests

The Random Forests algorithm in [10] averages the response of a given number T of CART trees. Hereby, before each subdivision step in the generation of the CART trees, the algorithm chooses a random subset of size P of the input parameters (typically about one third of all input parameters) along which the cell is allowed to be subdivided. This ensures that each single CART tree generates different partitions of the domain. Usually the single CART trees in a Random Forest are not pruned, i.e., one takes m = 1 in the CART algorithm and deliberately overfits the data. Nevertheless, Random Forest approximations are relatively smooth due to the averaging process. Random Forests are generally considered to be one of the best available black-box non-parametric regression methods.
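For comparison, a component-wise Random Forest baseline of the kind just described can be set up with standard tools. The sketch below (scikit-learn, with illustrative parameter values; not the authors' implementation) fits one forest per output component, using roughly one third of the input parameters as split candidates and unpruned trees:

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def fit_forests(X_train, Y_train, n_trees=50):
        # One Random Forest per output component (battery-style approximation).
        forests = []
        for l in range(Y_train.shape[1]):
            rf = RandomForestRegressor(n_estimators=n_trees,
                                       max_features=1/3,     # random subset of inputs per split
                                       min_samples_leaf=1)   # unpruned trees, m = 1
            rf.fit(X_train, Y_train[:, l])
            forests.append(rf)
        return forests

    def predict_forests(forests, X):
        return np.column_stack([rf.predict(X) for rf in forests])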
3.3. Sparse occupancy trees

Sparse occupancy trees have been developed as an alternative to approximate nearest neighbor methods (see [6]). They are designed to scale well in high dimensions and to be readily available for online learning.

3.3.1. Setting

The sparse occupancy algorithms try to exploit basic ideas from multiresolution analysis for high spatial dimension. The main underlying data structure of the multilevel construction is the subdivision tree, which is linked to a hierarchy of nested partitions. That is, we assume that for each level l ≥ 0 the sets P_l = {Ω_{l,k}, k ∈ I_l} are partitions of the input space Ω and each cell Ω_{l,k} ∈ P_l is the disjoint union of cells on the next finer level l+1:

    Ω_{l,k} = ∪_{r ∈ I_{l,k}} Ω_{l+1,r}.

The hierarchy of partitions induces an infinite master tree T*, whose root is Ω and whose other nodes are the cells Ω_{l,k}. Each node Ω_{l,k} of this tree is connected by an edge to its children Ω_{l+1,r}, r ∈ I_{l,k}. Now let us assume that a training data set X = {(x^i, y^i), i = 1,...,N} is given. An occupancy tree T(X) is a finite subtree of T* that contains only the nodes that are occupied by at least one sample x^i from the data set X. In high dimensions occupancy trees are typically sparse, because there are many more cells than data points. Furthermore, the points can be stored in a compact way. In the following two subsections, we present two algorithms that make use of this data structure.

3.3.2. Piecewise constant approximation on cube subdivisions

A piecewise constant approximation based on an occupancy tree can be defined as follows: given a query point x, we average the values of points in the finest cell of the occupancy tree containing the query point. This general idea, which works for any subdivision geometry, is most easily realized for dyadic cube subdivision, because it can be very easily encoded. Assuming that all data are scaled such that they fit into the unit hypercube [0,1]^d, the partitions are given by

    P_l = { ∏_{i=1}^{d} [k_i 2^{-l}, (k_i+1) 2^{-l}], k_i ∈ {0,...,2^l − 1} }.

For any point x = (x_1,...,x_d) ∈ [0,1]^d we can write its components in binary representation as

    x_i = Σ_{k=1}^{∞} b_{ik} 2^{-k}.

Note that the finite sequence b_{i1}, b_{i2},...,b_{il} is a binary representation of the integer k_i of the level-l cell to which the data point belongs. Assuming that a maximum refinement level L is chosen, we compute the bitstream

    (b_{11}, b_{21},...,b_{d1}, b_{12}, b_{22},...,b_{d2},...,b_{1L}, b_{2L},...,b_{dL})

for each training and query point. The subsequences of d bits b_{1l},...,b_{dl} are called characters. The bitstreams of the training points are sorted lexicographically. Then, in order to determine which points one has to average to answer a query, one finds the points whose bitstreams have the largest number of leading characters in common with the bitstream corresponding to the query point. This is essentially a binary search.

This algorithm is very fast and storage efficient, because it replaces the input data with bitstreams. Furthermore, it is inherently suited to incremental learning: if one gets a new data point, one computes its bitstream, sorts it into the list of already-existing bitstreams, and then the next query can immediately be accepted. Conceptually this code is related to nearest neighbor approximation because it only considers proximity of the points in the input space, using the similarity of the bitstreams as metric. Unfortunately, this recovery scheme is by far not as accurate as the nearest neighbor scheme because of the tree structure. That is, for any two points x, x' ∈ X the tree distance dist_T(x, x') is the shortest path in the tree T(X) connecting the nodes Ω_{L,k(x)} ∋ x and Ω_{L,k'} ∋ x'. Of course, whereas ‖x − x'‖ may be arbitrarily small for any fixed norm ‖·‖ on R^d, the tree distance dist_T(x, x') could be 2L. The above recovery scheme takes local averages of function values whose tree distance is small, possibly omitting values for arguments that are geometrically very close. There are several possible remedies. Since the recovery scheme is very fast, perhaps the simplest one is to perform several different recoveries with respect to different shifts of the coordinate system (chosen randomly) and then take the average of the outputs. For example, in our implementation we scale the data to the interval [0.3, 0.7] and then shift the data with random vectors in [−0.3, 0.3]^d. Let f̃_ρ(x) denote the result of a query at x with the data shifted by the vector ρ and X_ρ(x) be the corresponding set of training points in the leaf of the sparse occupancy tree containing x. Furthermore, let R(x) be the set of shifts ρ for which the level of the evaluation is maximal. Then

    f̃(x) = (1/#R(x)) Σ_{ρ ∈ R(x)} f̃_ρ(x).    (7)

With a moderate number of random shifts, one can usually achieve an accuracy similar to that of the nearest neighbor method.
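A minimal sketch of the bitstream encoding follows (Python; it assumes the data have already been scaled into the unit hypercube, and the function names are ours):

    import numpy as np

    def bitstream(x, L):
        # Interleave the first L binary digits of each coordinate:
        # (b_11, b_21, ..., b_d1, b_12, ..., b_dL); every block of d bits is one character.
        x = np.asarray(x, dtype=float)
        bits = []
        for _ in range(L):
            x = 2.0 * x
            digits = np.clip(np.floor(x).astype(int), 0, 1)
            bits.extend(digits.tolist())
            x = x - digits
        return tuple(bits)

    def common_characters(s, t, d):
        # Number of leading characters (blocks of d bits) shared by two bitstreams,
        # i.e. the finest level of a dyadic cube containing both points.
        level = 0
        for i in range(0, min(len(s), len(t)), d):
            if s[i:i + d] != t[i:i + d]:
                break
            level += 1
        return level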
3.3.3. Sparse occupancy trees using simplices

Overcoming the tree-distance problem motivates another approach that constructs piecewise linear approximations on simplex subdivisions. In the following description we leave out the technical details, which can be found in [6]. We base the approximation on a hierarchy of partitions generated by dyadic simplex subdivision. That is, each simplex S in the partition P_l on level l is subdivided into 2^d simplices (its children) on level l+1.

For this purpose we use a scheme described in [11] which is based on recursive binary subdivision. In each binary subdivision step one edge of the current simplex is chosen and subdivided at its midpoint. To be more precise, let us denote with (x_0, x_1,...,x_d)_g a simplex that arises from a simplex in the dyadic tree by g binary subdivisions. Then this simplex is divided into the two subsimplices

    ( x_0, (x_0 + x_d)/2, x_1,...,x_g, x_{g+1},...,x_{d−1} )_{g+1}

and

    ( x_d, (x_0 + x_d)/2, x_1,...,x_g, x_{d−1},...,x_{g+1} )_{g+1},

where the sequences (x_{g+1},...,x_{d−1}) and (x_1,...,x_g) should be read as void for g = d − 1 and g = 0, respectively, and the sequence x_{d−1},...,x_{g+1} features decreasing indices.

Given x, we denote by S_l(x) the simplex at level l which contains x, and given any simplex S, we denote its set of vertices by V(S). If v is a vertex of a level-l simplex in the master tree, then S_l(v) denotes the set of all level-l simplices that share v as a corner point. We also define V_l to be the set of all vertices at level l of occupied cells.

In the training stage we compute the values

    y_l(v) = Average{ y^i : x^i ∈ S_l(v) }

for each vertex of a simplex on level l in the occupancy tree. In the evaluation stage, if we are given an input x, we first determine the maximum level l such that V(S_l(x)) ∩ V_l ≠ ∅ and then we compute

    f̃(x) = [ Σ_{v ∈ V(S_l(x)) ∩ V_l} τ(S_l(x), v, x) y_l(v) ] / [ Σ_{v ∈ V(S_l(x)) ∩ V_l} τ(S_l(x), v, x) ]    (8)

where τ(S, v, x) is the barycentric weight of the point x with respect to the vertex v of the simplex S.

To summarize: the value of the query point is found by interpolating vertex values of the simplex containing it. Hence, the query response becomes an average of all training points in the neighborhood of the query coordinates, even including the points in simplices that are far away in tree distance. Again, note that this algorithm is suited for incremental learning: if one gets a new sample, one just computes its place in the occupancy tree and adds its value to all adjacent vertices. Any query after that can immediately use the new information.
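The evaluation step (8) amounts to barycentric interpolation restricted to the occupied vertices. A small sketch (Python/NumPy; the function names are ours) of the weights τ(S, v, x) and the weighted average:

    import numpy as np

    def barycentric_weights(vertices, x):
        # vertices: (d+1, d) array of simplex corners, x: (d,) query point.
        # Solve sum_j tau_j * v_j = x together with sum_j tau_j = 1.
        V = np.asarray(vertices, dtype=float)
        A = np.vstack([V.T, np.ones(len(V))])          # (d+1, d+1) system matrix
        rhs = np.append(np.asarray(x, dtype=float), 1.0)
        return np.linalg.solve(A, rhs)                 # tau(S, v_j, x), j = 0,...,d

    def interpolate(vertices, vertex_values, occupied, x):
        # vertex_values: averages y_l(v_j); occupied: boolean mask for V(S_l(x)) ∩ V_l.
        tau = np.where(occupied, barycentric_weights(vertices, x), 0.0)
        return tau @ np.asarray(vertex_values) / tau.sum()   # Eq. (8)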
4. Numerical experiments

In this section we present the results of three groups of numerical experiments. In the first two subsections, the sets of training data and test data each contain 196,608 data samples collected during a reference climate simulation for the years 1961-1962 using the original parameterization. In the first subsection we compare the regression trees with the benchmark neural network approximation. Note that this comparison tends to overestimate the advantages of the neural network. First of all, it does not reflect the training time, which is about a week for the neural network, but only between a few seconds and less than a few hours for the tree-based methods. Second, whereas the neural network would profit only slightly from taking more training data (its accuracy is basically limited by the number of neurons), the non-parametric methods benefit significantly from allowing more data, and limiting the training data size is artificial and unnecessary. Nevertheless, we perform the comparison in this form, because it is the same training data we will use for the experiment in Section 5, where we have to comply with memory limitations. As it turns out, nearest neighbor methods and sparse occupancy trees do not deliver competitive accuracy if applied naïvely, but their performance can be enhanced by dimension reduction. We demonstrate this in some experiments presented in Section 4.2. Finally, the last experiment presented in Section 4.3 gives some insight into the internal structure of the data. Here, we try to obtain an estimate of the intrinsic dimension of the input data.