ebook img

NASA Technical Reports Server (NTRS) 20150005715: Advances in Parallelization for Large Scale Oct-Tree Mesh Generation PDF

0.76 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview NASA Technical Reports Server (NTRS) 20150005715: Advances in Parallelization for Large Scale Oct-Tree Mesh Generation

Advances in Parallelization For Large Scale Oct-Tree Mesh Generation ∗ MatthewO’Connell NASALangleyResearchCenter,Hampton,Virginia,23681 † SteveL.Karman UniversityofTennesseeatChattanooga: SimCenter,Chattanooga,Tennessee,37403 Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires usingspecial“gridgeneration”computemachinesthatcanhavemorethantentimesthememoryofaverage machines.Whilesomeparallelmeshgenerationtechniqueshavebeenproposed,generatingverylargemeshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generationofverylargescaleoff-bodyhierarchicalmeshesispresentedhere. Thisworkenableslargescale parallelgenerationofoff-bodymeshesbyusinganovelcombinationofparallelgridgenerationtechniquesand ahybrid“topdown”and“bottomup”oct-treemethod.Meshesaregeneratedusinghardwarecommonlyfound inparallelcomputeclusters. Thecapabilitytogenerateverylargemeshesisdemonstratedbythegeneration ofoff-bodymeshessurroundingcomplexaerospacegeometries. Resultsareshownincludingaonebillioncell meshgeneratedaroundaPredatorUnmannedAerialVehiclegeometry,whichwasgeneratedon64processors inunder45minutes. I. Introduction AutomatedgridgenerationtechniquesforComputationalFluidDynamicssimulationhaveseenariseinpopularity in recent years. Generating large scale meshes with these methods has, up to this point, largely been relegated to generatingthemeshona“gridgeneration”machine. Tohandlelargemeshes,the“gridgeneration”machinescanbe equippedwithmorethan10timesthememoryofaveragemachines. Largemeshesoftentakehoursorevenseveral days to generate. The power of next generation compute clusters will enable simulations using exceptionally large grids. Therehasbeenagrowinginterestinparallelizingthemeshgenerationprocesstoharnessthepowerofthenext generationofcomputinghardware. Hexagrid1hasseenincreasedperformancewiththeadditionofsharedanddistributedmemoryparallelismthrough useofthebuildingcubemethod.2DistributedmemoryparallelizationhasalsobeenimplementedinBoxerMesh3and PHUGG.4 These methods have been used to generate meshes on the order of tens to hundreds of millions of cells. In addition to these recent advancements, higher grid resolutions are needed to support high fidelity problems such as Large Eddy Simulations (LES) and aeroacoustic simulations. In these problem domains, meshes approaching or exceedingbillionsofcontrolvolumeswillbeneededforcomplexgeometries. Manycommonautomaticmeshgenerationtechniquesarebasedonhierarchicaltreesforoff-bodygridgeneration combinedwithanearbodygriddingtechnique.Sometechniquesincludecut-cellmethodssuchasinCart3D,5SPLIT- FLOW,6andPHUGG.Othertechniques,suchasHexagrid,projectnodesfromtheoff-bodymeshontothegeometry.7 FASTAR8 was developed using a type of binary tree called a split-tree. It uses tetrahedral meshing to combine a hierarchicaltechniqueforoff-bodygridstoabodyconformingviscousmeshgeneratedusingextrusion. Hierarchicalmeshgenerationmethodsaregenerallyimplementedaseithertopdownorbottomupmethods.Intop downmethods,themeshisgeneratedandthetreeisproduced,whiletraversingdownthetreestructurelevelbylevel. Inabottomupmeshgenerator,dataatthelowestlevelofthetreeisgeneratedfirst. Theremainingmeshisgenerated asthetreedatastructureismadelevelbylevelprogressingupthetree. Originally,topdownmethodswerecommon, thoughrecentlybottomupmethodshavebeenpopularizedbyDawes.3 Ratherthanuseastrictlytopdownorbottomupmethod, Betrointroducedahybridmethodthatbeginsasatop downapproachandfinishesasbottomup.8 Thatworkwasdoneusingaspecializedbinarytreecalledasplittree. In ∗InternEmploymentProgramResearchTrainee,ComputationalAeroSciencesBranch,MailStop128,AIAAStudentMember. †Professor,ComputationalEngineeringDepartment,AIAAAssociateFellow. 1of13 AmericanInstituteofAeronauticsandAstronautics recentyearsanumberofresearchershavedevelopeddynamicallyloadbalancedparalleloct-treemethods. Octor9and p4est10arebothexamplesofparalleloct-treecodes,whichusespace-fillingcurvestorepartitionanddynamicallyload balanceduringthemeshgenerationprocess. Thisworkpresentsaparallel,hybridtopdownandbottomupapproachforuseinoff-bodymeshgenerationusing oct-trees. Thisisincontrasttootherparallelgridgenerationtechniques,whichareeitherstrictlytopdownorbottom up. Thisworkdifferfromthosemethodsbyusingbothtopdownandbottomuptreetraversalsduringthegeneration process. Thetechniquebeginsbygeneratingacoarsemeshbysubdividingelementsinanoct-treebasedonaninput geometry. The oct-tree is only refined until it reaches an intermediate depth creating a coarse mesh. The coarse meshisthenpartitionedbeforerefinementcontinuestopdown,inparallel,tocreatefullresolutionspacingaroundthe geometry. Finally, a smoothvariation incellsize isachieved byenforcingquality rulesin parallelbytraversing the oct-treelevelbylevelbeginningatthebottom. Thispaperfocusesonparallelizationofanoff-bodymeshgenerationtechnique. Off-bodymeshgenerationmust be combined with a near body technique to provide a complete solution for high fidelity CFD problems. There are manyoptionsforanearbodytechniqueincludingcut-cell,immersedboundary,andcreatingabodyconformingmesh throughprojection. Thenearwallapproachwillbethesubjectofafuturepaper. II. HierarchicalMeshGeneration Hierarchical meshing techniques use spatial trees whose elements are voxels. The term voxel, which stands for volumetric pixel, comes from the field of computer graphics. Each voxel contains only a Cartesian bounding box, referredtoasanextentbox,theindexoftheparentvoxel,andtheindicesofanyofthevoxel’schildren. Nounique identifiersarestoredforthenodesofthemeshastheyarenotneededbythemeshgenerationprocess,buttheycanbe generatedwhenthemeshisexportedtocomplywithfileformatstandards. Thevoxelsareimplementedinthiswork usingaverysimpledatastructure,whichconsumesonly84bytesofmemory. Therootelementofanoct-treeencompassestheentirespatialdomainspannedbytheoct-tree. Childrenofavoxel areobtainedbydividingaparentvoxelintoeightequaloctants. Thechildofavoxelhasthesameaspectratioasits parent. Theworkpresentedhererequireseachvoxeltobeisotropic. Therootvoxelisinitializedasisotropicmaking all voxels isotropic. Requiring all voxels to be isotropic allows for the size of a voxel to be measured as simply the lengthofoneofitsedges. Thislengthisreferredtoasthecharacteristiclengthofthevoxel. Thecharacteristiclength ofavoxeliscomparedtouserdefinedspacingtodeterminewhetherornotavoxelshouldsubdivide. Voxelsthatdo nothavechildrenareleavesinthetreeandbecomecellsinthemesh. Thoughisotropicelementshaveidealquality,theyarenotsuitedforallregionsofthesimulation. Mostnotably, isotropicelementsarenotsuitedinboundarylayersofhighReynoldsnumberflow.Usingisotropicelementstoresolve aboundarylayerwouldbeverycostly. Hierarchicalgridgenerationisusuallyusedonlyintheoff-bodyregionwhere isotropicelementsarebettersuited The oct-tree based methods lack precise control over grid spacing. Instead, these tools favor automation and speed over precision. However, it can be ensured that the spacing in the mesh will be at least as small as the user requested. Figure1demonstratesthisin2Dusingaquad-tree. Inthisexample,smallspacingisrequestedinsidethe Cartesianalignedregionoutlinedasthedashedblueline. RegionsthatareCartesianalignedcanbedescribedusing justtwocoordinatesspecifyingthelocationoftwooppositecornersofabox. Theywillbereferredtoasextentboxes. Subdivisionoccurredinsidethemarkedextentbox,butthissubdivisionalsodecreasedspacingoutsidetherequested region. Itisalsonotpossibletoachievepreciselysizedspacing. ThisisillustratedinFig. 2. Forexample,arequested spacingof0.33isincompatiblewithameshwheretherootvoxelhadacharacteristiclengthof1. Subdividingtheroot voxelwouldyieldspacingsof0.5,0.25,0.125,andsoon. 2of13 AmericanInstituteofAeronauticsandAstronautics Figure2. Arootvoxelofunitlengthcannotproducetheuser Figure1. Extentboxusedtodefinearegionofrefinementin definedspacingof0.33representedbythebluesquare. The themesh. Noticethatduetothequad-treedatastructure,ad- meshissubdivideduntilthespacingisatleastassmallasthe ditionalvoxelsoutsidetheextentboxwerealsorefined. userrequested.Inthiscasethemeshsubdividedcreatingvox- elsoflength0.25. A. GeometryInput Themeshgenerationprocessbeginsbyreadinginastereolithographyfile. Stereolithographyfilesfollowasimple formattostoreacollectionoftrianglescalledfacets. Thefilecontainsnoadjacencyinformationandactsasalowest common denominator format for faceted surface information. Stereo lithography files make no guarantee that the geometrysurfaceismanifold. Infact,thesefilesoftencontaingapsoroverlappingtriangles. Gapscanbehandledso long as they are smaller than the local mesh spacing. Any gaps larger than the local mesh spacing are identified as geometry features. The mesh generator must remove elements of the mesh, which are not in the flow region. With externalflowcasesthismeansremovingelementsthatareinsidethegeometry. Itispossiblethatoverlappingtriangles could make it difficult to categorize regions of the mesh as inside or outside the geometry. In practice this problem hasnotbeenencountered. Furthermore,geometryfacetsdonotneedtohaveaspecifiedorientationorevenuniform orientation. B. MeshGeneration Themeshgenerationprocessbeginswithinitializingtherootvoxeltobelargeenoughtoencompasstheentiremesh domain. Once the root voxel is initialized, geometry facets are then used to recursively subdivide voxels until they reachaminimumsizethreshold. Eachfacetisprocessedbeginningattherootvoxel. Theextentboxofthevoxelis comparedtothefacet. Ifanypartofthefacetisinsidethevoxel,itisconsideredascrossingthatvoxel. Ifavoxel’s spacingislargerthanthespacingsetbytheuser, itsubdividescreating8newchildvoxels. Thissubdivisionoccurs onlyoncepervoxel. Ifanotherfacetlaterisdeterminedtobeinsidethevoxel,itdoesnottriggerasecondsubdivision. Afternewchildvoxelsarecreated,thosechildrenarecomparedtothefacetandthealgorithmrepeatsrecursivelydown thetree. Thesesimplerulesareappliedeachtimeavoxeliscomparedtoafacet. Eachrecursivecallendswhenthe facetiscomparedtoavoxel,whichhasacharacteristiclengththesamesizeorsmallerthantheuserrequestedspacing. Theprocessthenrepeatsforeachgeometryfacetcreatinganoct-treewithdifferentsizedvoxelsforeachdepthinthe tree. Cellsthatarecompletelycontainedwithinageometryareoutsidethecomputationaldomainandaredeleted,but firstthesecellsmustbeidentified. Anycellthatcontainsageometryfacetislabeledascrossingthegeometryduring theinitialsubdivisionphase. Alltheremainingcellsarenotlabeled. Thesecellsareeitherinsidethegeometryand will be deletedor they are outside the geometryand will be part of the final mesh. Thecells tagged as crossing the geometryseparatecellsinsidethegeometryfromthoseoutside.Thecollectionofcellstaggedascrossingthegeometry willdefinedaclosedregioninthemeshaslongastheinputgeometrydoesnotcontainlargegaps. Afloodfillalgorithmisusedtomarkallcellsthatareoutsidethegeometry. Thefloodfillalgorithmbeginswitha cellthatisknowntobeoutsidethegeometry. Thiscellischosenbyselectingavoxelinthemeshthatdoesnotcontain geometryasfollows. Sixraysarecastfromthecentroidofthevoxel,oneineachofthecardinaldirections. Ifallthe rayspassthroughanevennumberofgeometryfacets,thenthecellisoutsidethegeometry. Iftheallrayspassthrough anoddnumberofgeometryfacets,thenthecellisinsidethegeometryandanothercellmustbeselected. Overlapping triangles or gaps in the input geometry may cause rays to not agree on inside or outside status. If this happens the cell is discarded and another is selected. The method continues testing voxels until it finds a cell, which is outside 3of13 AmericanInstituteofAeronauticsandAstronautics thegeometry. Onceacellisfound,whichisoutsidethegeometry,thefloodfillprogressesrecursivelytoneighboring cells.Thefloodfilldoesnotprogresstocells,whicharemarkedascontaininggeometry.Afterthefloodfillterminates, each cell is either marked as containing geometry, marked as outside the geometry, or left unmarked. The user can selecttogenerateaninternalorexternalgrid. Forexternalcasesallunmarkedcellsareimplicitlyidentifiedasinternal tothegeometry,andthesecellsaredeleted. Foraninternalgrid,allexternalandoutsidecellsaredeleted. However, whengeneratinginternalcases,eachcellmustbedeterminedeitherinsideoroutsidethegeometry. Thenoutsidecells mustbedeleted. Theresultingmeshisreferredtoasthebasemesh. C. GradationEnforcement AnexampleofabasemeshcanbeseeninFig. 3. Noticethatthebasemeshhaslargecellsadjacenttomuchsmaller cells. Thenodesthatmakeupcornersofsmallcellsaresometimesalsocornernodesoflargercells. Therearealso cornernodesofsmallcells, thatareinsteadonasideofalargercellandnotitscorner. Thesenodesarecommonly calledhangingnodes. Themorehangingnodesasidehasthelargerthecellsizejumpacrosstheface. Thisquickcell gradation creates problems for many simulation techniques and so, for the final mesh, restrictions are placed on the growthofcellsizes. Thesequalitiesarenotenforcedonthebasemesh;theyareonlyenforcedontheparallelmesh. Figure3.AbasemeshforaPredatorgeometry.Manylargecellshaveseveralhangingnodes. Smoothcellsizegradationisenforcedwithtwoqualitycharacteristics. Thefirstcharacteristicisthatnocellhas morethanonehangingnodeperedge. Thisrestrictioniscommonamonghierarchicalmeshgeneratorsandisreferred toasa4:1neighborgradation(2:1in2D),orsometimessimplythebalancingrule. Aviolationofthebalancingrule isillustratedinFig. 4. ThelargevoxelontherightinFig4ahasthreeneighboringvoxelsonthesameside. Thelarge voxel’sleftsidehastwohangingnodes. Thebalancingrulerequiresthateachvoxelsidehasatmostonlyonehanging node. Figure4bshowsthesamesetofvoxelsafterasubdivisionrefinementhasbeenperformedonthelargestvoxel. Noweachvoxelhasatmostonehangingnodeonanygivensideandthebalancingruleisnowbeingsatisfied. Thesecondqualitycharacteristicthatisenforcedisaminimumthicknesslayer. Aregionofameshwiththesame sizecellsisreferredtoasalayer.Everycellofagivensizeneedstobeadjacenttoothercellsofthesamesizesuchthat thereareatleastncellsofaparticularsizebeforecellsizechanges. Theparameternissetbytheuseranddescribes the minimum thickness of a layer. Typical values of the thickness parameter are between 4 and 10. Similar to the minimumspacingparameter,aspecificthicknessofalayercannotbeguaranteed. Layersareguaranteedtobeatleast 4of13 AmericanInstituteofAeronauticsandAstronautics ncellsthick. However,becauseofhowvoxelscansubdivide,layersmaybethickerinsomeregions. Anexampleof layerthicknessisillustratedinFig. 5. (a)Aviolationofthebalancingrule. (b)Thebalancingruleenforcedbysubdivision. Figure4. Thebalancingrulerequiresthatnocellshavemorethanonehangingnode. Theleftimageshowsalargecellwithtwohanging nodesviolatingthebalancingrule.Ontheright,thebalancingruleisfollowedaftersubdividingthelargecell. Figure5.Thicknessparameterofthree.Thelayerbetweensmallandlargecellsizeisfourcellsthickduetotheoct-treedatastructure. Theenforcementofboththebalancingruleandlayerthicknessoccurssimultaneouslyandishandledinabottom upsweep. Thetechniquebeginsatthedeepestleveloftheoct-treecorrespondingtocellsofthesmallestspacing. An extentboxiscreatedaroundeachofthesecells.Eachextentboxisenlargedusingthegradationparameterndescribed above. Onceenlarged,allextentboxesareusedforrefinementuntilallcellsinsidetheextentboxesmatchthesmallest spacing. Thisstepcreatesalayerofthesmallestcells. Afterallthecellsinthesmallestlayerhavebeencreated,thebalancingruleisenforcedonallvoxelsadjacentto thenewlycreatedlayer.Thisisdonebyenlargingextentboxesforallcellsatthesmallestspacing.Whenenforcingthe balancingrule,extentboxesareonlyenlargedenoughtocaptureneighboringcells. Neighboringcellsaresubdivided untiltheysatisfythe4:1requirement.Now,therearenocellsatthesmallestspacingthatareadjacenttoanycellthatis greaterthan8timesitssize,andeachlayerofcellsofthesamesizehastheproperthickness. Theprocessisrepeated ateachdepthofthetreebeginningatthedeepestlevelofthetreeandproceedingtothehighest. III. ParallelImplementation Parallel mesh generation is the primary goal of this work. This implementation uses MPI to run on distributed memorymachines. Unlikecommonserialmeshgenerationmethods,theparallelimplementationrequiresnospecial hardwarebeyondcommonclustercomputenodes.Thereisnoneedfora“gridgeneration”machinewithexceptionally largememory. Theparallelgridgenerationprocessisdescribedbelow. Everyprocessorreadsinthegeometryandgeneratesthesamebasemeshinparallel. Theminimumcellspacing for the base mesh is determined by a user defined maximum depth. The maximum depth of a tree is the maximum numberoftimestherootvoxelissubdivided.Thebasemeshisgenerateddowntothismaximumdepth.Fortheresults presentedinthenextsection,atypicaldepthof7wasused. Afterthebasemeshisgenerateddowntothemaximum depth,cellsidentifiedasinternaltothegeometryareremovedtosavememory. Theleafvoxelsinthebasemeshare thenassignedtopartitionsusingMETIS.11 Sinceeachprocessorgeneratesthesamebasemesh,nocommunicationis requiredtopartitionthemesh. EachprocessorreceivesthesamepartitioningfromMETIS. 5of13 AmericanInstituteofAeronauticsandAstronautics Other parallel meshing techniques also generate base meshes down to a user desired maximum depth; however, thesetechniquesenforcequalityrulesinthebasemesh.4,8 Thesetechniquescanbedescribedastopdownmethods sincetheybeginatthetopofthetreeandgeneratethemeshastheytraversedown.Thetechniqueproposedherediffers fromthosemethodsbygeneratingonlythesmallestpossibletreestructurethatcontainspiecesofthegeometry.Notice inFig. 3thecellsinthemeshwithseveralhangingnodes. Inthebasemesh,thesehangingnodesareallowedsincethe basemeshisnotusedasthecomputationalmeshfortheCFDsolver. Itisonlyafterthegeometryhasbeenresolved byatopdowngenerationofthebasemeshthatthequalityrulesareenforced,thistimebytravellingupthetree. This techniquecanbedescribedasahybridmethodthatbeginswithafast“topdown”sweeptobuildtheskeletonofthe treefollowedbya“bottomup”sweeptofillthetree. The base mesh serves two purposes. One copy of the base mesh is used as a map describing what processor is responsible for each region of the domain and a second copy of the base mesh is used as a starting point for parallel mesh generation. Each processor retains a copy of the base mesh and the partitioning information. Using thisinformation,processorscandeterminewhatotherpartitionsneedtorefineevenwithlargeextentboxesthatspan severalpartitions. Each processor must maintain the tree structure that supports the region of space it is assigned, but any other sectionsofthemeshcanberemoved. Figures6–8representaonedimensionalbasemeshgeneratedwithabinary treeandthestepstakentopartitionthismeshontwoprocessors. Themeshcontains10cellscorrespondingtothe10 leavesinthetreeinFig. 7. Bypartitioningtheleafvoxelsofthebasemesh,eachprocessorisassignedspatialregions ofthecomputationaldomain. Inthiscase,thespaceoccupiedbytheleft5voxelsisidentifiedasbeingownedbyone partition, the space occupied by the right 5 voxels to another. In Fig. 7, voxels in the tree that are marked as blue willbestoredstrictlyontheprocessorresponsiblefortheblueleafvoxelswhileredvoxelswillneedtobestoredonly ontheotherprocessor. Voxelscoloredpurplehavebothredandbluedescendantsandwillneedtobestoredonboth processors. Anypartofthetreethatisnotneededbythehostprocessorisdeleted. ThisisillustratedinFig. 8. Figure6.Aonedimensionalbasemesh.Cellscoloredblueareassignedtoonepartition,redcellsareassignedtoanother. Figure7.Arepresentationofthetreeforthebasemeshpartitionedontotwoprocessors.Bluevoxelsonlyneedtobestoredontheprocessor responsibleforbluecells.Redvoxelsonlyneedtobestoredontheprocessorresponsibleforredcells.Purplevoxelshavebothredandblue descendantsandmustbestoredonbothprocessors. Afterremovingtheunnecessaryvoxels,everyprocessorhasacoarsemeshcoveringthedomainthattheprocessor isassigned. Thecoarsemesheswillberefinedtocreatethepartitionedcomponentsofthefinalmesh. Everyprocessor again uses geometry facets to refine its mesh. During base mesh generation, cells containing geometry are refined toamaximumdepth. Afterpartitioning,cellscontaininggeometryarerefineduntiltheymatchtheglobalminimum spacingrequestedbytheuser. Cellsinternaltothegeometrymustbedeleted. Afloodfillalgorithmisagainusedto marktheseinternalcells. No communication is used for the flood fill; instead it is performed by each processor, The flood fill does not progresstoneighboringpartitions. Eachprocessormaycontainmultipledisjointregionsofthedomain;bothinternal andexternalcellsmustbeidentified. Whenallcellshavebeenidentifiedaseitherinsideof,outsideof,orcrossingthe geometry,thecellsoutsidetheflowdomainaredeleted. 6of13 AmericanInstituteofAeronauticsandAstronautics Figure8.Afterpartitioning,eachprocessorremovesunusedvoxelsfromtheirlocaltree. Formanynearbodytechniques,suchasthosebasedoncut-cellsorimmersedboundary,onlystrictlyinternalcells need to be deleted. However, for other techniques, such as those that project off body elements, cells that cross the geometry also must be deleted. If the layer parameter is less than 3, deleting crossing cells may create cells in the off-body mesh that do not meet the user specified minimum spacing. To ensure that cells of the minimum size are exposedtothegeometry,anextralayerofminimumspacingcellsiscreated. Extentboxesarecreatedfromallcells thatcontainthegeometry.Theextralayerofminimumspacingcellsonlyneedstobeonecellwidththick,sotheextent boxes are enlarged enough to cross into neighboring cells. Cells inside these extent boxes are subdivided until they havecharacteristiclengthsthatmeettheminimumspacing. Someoftheextentboxeswillcrossintootherpartitions. Processors can identify if any extent boxes cross into regions of the domain owned by other processors by using a savedcopyofthebasemeshandthepartitioninginformation. Anyextentboxthatcrossesintothedomainownedby anotherprocessorissenttothatprocessoralongwiththerequestedspacing. Thereceivingprocessorrefinesthemesh makingallcellswithintheextentboxmatchtherequiredspacing. Thelaststepinthemeshgenerationprocessistoenforcethebalancingruleandtocreatetheproperthicknessin spacinglayers. Enforcementofthesegradationqualitiesbeginsatthemaximumdepthacrossallprocessors. Spacing layersarecreatedbyusingextentboxesfromcellsatthesmallestspacing. Theseextentboxesareenlargedandused to subdivide all the cells inside the extent box down to the smallest spacing. If an extent box crosses into domains ownedbyotherprocessors,theextentboxandtherequestedspacingiscommunicatedtothoseprocessors. Onceeach processorcompletestherefinementforthedeepestlevel,theprocessrepeatsforthenextdeepestandcontinuesuntil thetopofthetreeisreached. Figures9and10illustratethegradationphase. Figure9showsthatthelayerofsmallest spacingcellshasbeencreated. Figure10isthefinalmesh. Allthecelllayershaveaminimumthicknessof4cells. Figure9.Thesmallestspacingcellshavethepropersizedlayer Figure10.Thefinaloff-bodymeshwithallthequalityenforce- basedontheuserrequestedsize. ments. 7of13 AmericanInstituteofAeronauticsandAstronautics Figure11.A61millioncellmeshforafighteraircraftgeneratedon8processorswithpartitionscolored. IV. GridGenerationExamples This section presents some meshing examples generated using this parallel mesh generation technique. The ex- ample meshes are generated using either an F-22 fighter aircraft geometry, a Predator Unmanned Aerial Vehicle (UAV) geometry, or a simple pipe geometry. Both the F-22 and the UAV geometry files can be obtained from www.GrabCAD.com. A. FighterAircraft ThefirstexamplemeshwasgeneratedaroundanF-22fighteraircraftgeometry. Themeshisrelativelysmall,consist- ingofonly61millioncells. Thisexamplewasgeneratedusing8IntelX5355CPUseachwith4GBofmemory. Each CPU ran one MPI process. An overview image of the mesh and input geometry is shown in Fig. 11. The off-body grid generator was configured to delete crossing cells and create a small gap region between the geometry and the off-bodymesh. Therearejustunderonemillionfacets,whichmakeupthisgeometryandtheoff-bodytechniquehas noproblemswiththisscaleofcomplexity. AdditionalslicesareshowninFigs. 12and13,includingcloseupimages ofthelandinggear. Noticethatthegapregiondoesnothaveauniformsize. Thesurfaceisdiscreteandcanhavelarge gapsnearconcaveregionsascanbeseenforthecaseinFigure13. Figure12.F-22geometrywithoff-bodymeshgeneratedon8processors.Eachpartitionedmeshdomainisassignedadifferentcolor. B. UnmannedAerialVehicle ThetechniquewasalsousedtogenerateaonebillioncellmeshforaPredatorUAVgeometry. Thecasecompletedin under 45 minutes using 64 Intel X5355 CPUs each with 4GB of memory. Each CPU ran one MPI process. A slice of one domain is shown in Fig. 14. This case illustrates some of the problems with very large meshes, it was quite challengingtovisualizethemesh.Thefullmeshcouldnotbeloadedontoastandardworkstationcomputer.Itrequires over 20GB of memory to store just the coordinate locations of the points in the mesh. To overcome this limitation, 8of13 AmericanInstituteofAeronauticsandAstronautics Figure13.Themeshnearthelandinggear. eachprocessorwroteoutaseparatevisualizationfileforthemeshinitspartition. Thesefilescouldthenbeloadedone atatimeintoparaview. Figure14isasliceofoneofthosepartitions. A1.6billioncellmeshwasgeneratedforthisgeometrybyincreasingthethicknessofeachspacinglayer. Onthe samehardware,thismeshtook2hoursand12minutestocomplete. Figure14.AsliceofabillioncellmeshgeneratedforaPredatorUAVgeometry. C. InternalCurvedPipe Internal grid cases are also supported by removing cells marked as external rather than internal during the flood fill step. Figures15and16showaninternaloffbodymeshofacurvedpipewheretheexternalandcrossingcellshave beenremoved. Figure15showsjusttheinteriormesh,and16showsacutawayofthemeshalongwiththegeometry showninblue. Noticethattheinternalgridhassimilarqualitiesastheexternalgrid. Thegapregionbetweenthemesh andthegeometryisjaggedasthediscretemeshapproximatesthesmoothcurveofthepipegeometry. 9of13 AmericanInstituteofAeronauticsandAstronautics Figure15.Ameshinsideacurvedpipe.Theexternalcellsand Figure16. Acutawayofacurvedpipegeometry(blue)and cellsthatcrossthepipehavebeenremoved. theinteriormeshgeneratedinsideofit. V. PerformanceResults Oneofthemostimportantperformancemetricsishowthemethodperformswithincreasingcellcount. Thereare twotunableparameterstoincreasethecellcount. Thefirstparameteristheminimumcellsizerequestedbytheuser definingthespacingofcellsnearthegeometry. Thesecondparameterthatwillincreasecellcountisthenumberof cells in a refinement layer. Figure 17 shows the runtime required when increasing the cell count by decreasing the minimumspacing. Figure18showsruntimerequiredwhenincreasingthecellcountbyincreasingthenumberofcells requestedinarefinementlayer.Themethodscaleslinearlywhenincreasingcellcountbydecreasingtheminimumcell size,butthemethoddoesnotscalelinearlywhenincreasingthethicknessofrefinementlayers. Decreasingminimum cellsizeisalocaloperationrequiringonlylocalsubdivision. However,increasingthethicknessofrefinementlayers isanon-localoperation. Thereisadditionalworkforeachcellinarefinementlayer. Eachcellatagivendepthcreates anextentboxwherespacingofthemeshmustbeenforced. Figure17.Runtimescalabilityforincreasingcellcountbyde- Figure18. Runtimescalabilityforincreasingcellcountbyin- creasingtheminimumcellsize. creasingthethicknessofrefinementlayers. Anotherimportantaspectofperformanceistheparallelscalability. Itisdesirableforparallelalgorithmstohave scalableperformancetomakeefficientuseofcomputeclusters. Figure19showsparallelspeedupforastrongscaling caseasthenumberofCPUsisincreased. Themeshgeneratedcontained50millioncellsaroundtheUAVgeometry. This 50 million cell mesh is representative of the largest mesh that could be achieved on one processor using the hardwarementionedabove. Ittook22minutestogeneratethismeshonasingleprocessor. Whileruntimedecreased to117secondsusing32processors,theparallelperformanceisnotideal. Factorslimitingperformancearediscussed laterinthesection. Figure20showsscalabilityfora100millioncellmeshgeneratedwithbetween2and64processors. Speedupwas computed by comparing the parallel run times to the serial mesh generation. While running in serial, the algorithm doesnotneedtoperformpartitioningofthebasemesh. Partitioningthebasemeshrequiresbuildingupconnectivities needed to call METIS, calling METIS to generate the partition and then deleting voxels from the local tree that are owned by non-local processors. This series of steps accounts for about 16% of the total run time while running in parallel. Parallelperformanceisalsohinderedbypoorloadbalancingsincetheinitialbasemeshiscoarse. Allcellsinthe 10of13 AmericanInstituteofAeronauticsandAstronautics

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.