Author’s Accepted Manuscript

Parallel implementation of simulated annealing to reproduce multiple-point statistics

A. Oscar Peredo, Julián M. Ortiz

PII: S0098-3004(10)00340-7
DOI: doi:10.1016/j.cageo.2010.10.015
Reference: CAGEO 2485
To appear in: Computers & Geosciences
www.elsevier.com/locate/cageo
Received date: 12 March 2010
Revised date: 16 October 2010
Accepted date: 20 October 2010

Cite this article as: A. Oscar Peredo and Julián M. Ortiz, Parallel implementation of simulated annealing to reproduce multiple-point statistics, Computers & Geosciences, doi:10.1016/j.cageo.2010.10.015

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Oscar Peredo A. (a), Julián M. Ortiz (b)
(a) Department of Computer Sciences, Universidad de Chile, Chile
(b) ALGES Lab, Advanced Center for Mining Technology, University of Chile, Chile; Department of Mining Engineering, University of Chile, Chile

Abstract

This paper shows an innovative implementation of simulated annealing in the context of parallel computing. Details regarding the use of parallel computing through a cluster of processors, as well as the implementation decisions, are provided. Simulated annealing is presented aiming at the generation of stochastic realizations of categorical variables reproducing multiple-point statistics.

The procedure starts with the use of a training image to determine the frequencies of occurrence of particular configurations of nodes and values.
These frequencies are used as target statistics that must be matched by the stochastic images generated with the algorithm. The simulation process considers an initial random image of the spatial distribution of the categories. Nodes are perturbed randomly and after each perturbation the mismatch between the target statistics and the current statistics of the image is calculated. The perturbation is accepted if the statistics are closer to the target, or conditionally rejected if not, based on the annealing schedule.

The simulation was implemented using parallel processes with C++ and MPI. The message passing scheme was implemented using a speculative computation framework, by which, prior to making the decision of acceptance or rejection of a proposed perturbation, processes already start calculating the next possible perturbation at a second level: one process as if the perturbation on level one is accepted, and another process as if the proposed perturbation is rejected. Additional levels can start their calculation as well, conditional on the second level processes. Once a process reaches a decision as to whether to accept or reject the suggested perturbation, all processes within the branch incompatible with that decision are dropped. This allows a speed up of up to $\log_n(p + 1)$, where $n$ is the number of categories and $p$ the number of processes simultaneously active.

Examples are provided to demonstrate improvements and speed ups that can be achieved.

Key words: Geostatistics, Stochastic Simulation, Multipoint, Speculative Computation, Parallel Computing

Email addresses: [email protected] (Oscar Peredo A.), [email protected] (Julián M. Ortiz)

Preprint submitted to Computers & Geosciences, November 8, 2010

1. Introduction

Numerical modeling with geostatistical techniques aims at characterizing natural phenomena by summarizing and using the spatial correlation for the statistical inference of parameters of the distribution of uncertainty at unsampled locations in space. In simulation techniques, this spatial correlation is imposed into a model commonly constructed on a regular lattice. The models must reproduce the statistical (histogram) and spatial distribution (variogram or other spatial statistics) and their quality is often judged in terms of the reproduction of geological features (Ortiz and Peredo, 2009).

Conventional techniques in geostatistics address the modeling using statistical measures of spatial correlation that quantify the expected dissimilarity (transition to a different category) between locations separated by a given vector distance, in reference to a given attribute, such as the facies, rock type, porosity, grade of an element of interest, etc. This is done using the variogram. Limitations of these techniques have been pointed out in that they only account for two locations at a time when defining the spatial structure. Much richer features can be captured by considering multiple-point statistics that consider the simultaneous arrangement of the attribute of interest at several locations, providing the possibility to account for complex features, such as hierarchy between facies, delay effects, superposition, curvilinearity, etc.

There are several approaches to simulate accounting for multiple-point statistics. Modifying conventional methods to impose local directions of continuity using the variogram is a simple approach to impose some of the complex geological features (Xu, 1996; Zanon, 2004).
Object-based methods and methods inspired by the genetic rules and physics of the deposition of sediments in different environments also seek to overcome the limitations of conventional categorical simulation techniques, with significant progress (Deutsch and Wang, 1996; Tjelmeland, 1996; Pyrcz and Strebelle, 2008).

Presently, the most popular method is a sequential approach based on Bayes’ postulate to infer the conditional distribution from the frequencies of multiple-point arrangements obtained from a training image. This method, originally proposed by Guardiano and Srivastava (1993), and later efficiently implemented by Strebelle and Journel (2000), is called single normal equation simulation (snesim) (see also Strebelle, 2002). This method has been the foundation for many variants such as simulating directly full patterns (Arpat and Caers, 2007; Eskandari and Srinivasan, 2007) and using filters to approximate the patterns (Zhang, Switzer and Journel, 2006). The use of a Gibbs Sampling algorithm to account directly for patterns has also been proposed (Boisvert, Lyster and Deutsch, 2007; Lyster and Deutsch, 2008). A sequential method using a fixed search pattern and a ‘unilateral path’ also provides good results (Daly, 2005; Daly and Knudby, 2007; Parra and Ortiz, 2009). Other approaches available consider the use of neural networks (Caers and Journel, 1998; Caers and Ma, 2002), updating conditional distributions with multiple-point statistics as auxiliary information (Ortiz, 2003; Ortiz and Deutsch, 2004; Ortiz and Emery, 2005) or secondary variable (Hong, Ortiz and Deutsch, 2008), and simulated annealing (Deutsch, 1992).

Simulated annealing provides a very powerful framework to integrate different types of statistics, and potentially, generate models subject to constraints that cannot be handled by other methods.
This motivates studying approaches to speed up the iterative process involved in annealing simulation. The recent increase in availability of powerful multiple-processor computers and multiple-core central processing units (CPU), as well as the use of graphics processing units (GPU) for parallelizing the calculations required for some heavy computing tasks, motivates researching new applications in this framework and lets us revisit algorithms that were too demanding for the technology existing a few years ago. In this paper, we explore a known paradigm in Computer Science and implement simulated annealing to impose multiple-point statistics. The approach is known as speculative computing, and works in a parallel computing setting, aiming at reducing the time of computation for complex problems. Performance is assessed regarding speedup in computation time and statistical reproduction of multiple-point frequencies in the simulated realizations.

2. Simulated Annealing

Simulated annealing is a general optimization algorithm that can generate numerical models by reducing an objective function (usually minimizing a weighted sum of mismatch terms with respect to reference values), reproducing different spatial statistics and respecting constraints imposed in that objective function (Besag, 1986; Farmer, 1992; Geman and Geman, 1984; Kirkpatrick et al., 1983; Rothman, 1985).

In a spatial context, the algorithm can be used to generate models in a lattice where each node of the grid has a value for a particular property being simulated and the objective function considers some statistical parameter that relates those nodes spatially.
Typical applications have been the simulation of categorical variables such as facies or rock types, or the reproduction of continuous variables, such as petrophysical properties (porosity, permeability), conductivity or concentrations of elements (Goovaerts, 1996; Fang and Wang, 1997).

In essence, the algorithm works by perturbing one or more nodes at a time of an initial model, which usually is a model with a random spatial distribution of values. At every step of the process, the mismatch between the current statistics of the model and those required (target statistics) is quantified. If a perturbation reduces the mismatch, the simulated model has statistics closer to the target ones, hence the perturbation is kept. If a perturbation increases the mismatch, the model has statistics that differ more from the target ones, hence the perturbation should be rejected. However, in simulated annealing some of the unfavourable perturbations are kept, in order to allow the model to move away from local minima and reach a lower mismatch later on. If a local minimum is reached with a given perturbation and all unfavourable changes are rejected, then it will be impossible to move out of that local minimum to get closer to the global minimum. Depending on the topology of the solution space, unfavourable changes should be rejected with a higher probability. The rate of acceptance of unfavourable perturbations is controlled by the “annealing schedule”, which defines the probability of acceptance of bad changes and also controls the way this probability changes as the simulation progresses.

The annealing schedule is defined by an initial temperature and a procedure to lower that temperature as simulation progresses. In addition to these parameters, a perturbation mechanism is required, e.g.
swapping nodes, randomly perturbing one or more nodes, etc., and one or more stopping criteria.

The general formulation of the algorithm considers an objective function of the following type:

$$O = \sum_{i=1}^{N_c} w_i O_i \qquad (1)$$

where $N_c$ is the number of components in the objective function, $w_i$ are the weights assigned to each one of the components, and $O_i$ is the mismatch value for component $i$. For example, this function could be composed of the mismatch in histogram reproduction, defined as the squared difference in the cumulative frequencies measured at some quantiles for the model simulated versus the target histogram, a mismatch in variogram reproduction, composed of squared differences between the target variogram model and the variogram calculated from the realization being perturbed, for a number of lag distances, and a mismatch in the reproduction of multiple-point statistics. In general any constraint can be ensured in a similar fashion. Conditioning data can be imposed simply by not allowing the nodes to be perturbed at sample locations. An important requirement to achieve a low mismatch between model and target statistics is that all the statistics and constraints must be consistent. Since imposing the multiple-point statistics implicitly defines lower-order statistics, if statistics are inferred from multiple sources, some inconsistencies are expected. Most often, we do not have control over the consistency of the statistics we try to impose, as these may come from different sources. The use of training images as the basis for inferring the multiple-point statistics can generate problems when its lower order statistics (histogram and variogram) are different from those of the sample data (Ortiz, Lyster and Deutsch, 2007).
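The ingredients described so far in this section (a weighted objective as in Eq. (1), a perturbation mechanism, conditional acceptance of unfavourable changes, a temperature that is lowered over time, and conditioning nodes that are never perturbed) can be illustrated with a minimal serial sketch. It is written in Python for brevity and is not the C++/MPI implementation discussed in this paper. Several pieces are assumptions made only for illustration: the Metropolis-type acceptance probability exp(-ΔO/T), the geometric cooling rule, the one-dimensional image, and the two toy mismatch components (category proportions and the rate of transitions between neighbouring nodes). All function and parameter names are hypothetical.

```python
import math
import random


def weighted_objective(weights, mismatches):
    """Eq. (1): O = sum over i of w_i * O_i."""
    if len(weights) != len(mismatches):
        raise ValueError("one weight per component is required")
    return sum(w * o for w, o in zip(weights, mismatches))


def histogram_mismatch(image, target_props):
    """Toy component: squared differences in category proportions."""
    n = len(image)
    return sum((image.count(c) / n - p) ** 2 for c, p in target_props.items())


def transition_mismatch(image, target_rate):
    """Toy two-point component: squared difference in the fraction of
    neighbouring pairs that hold different categories."""
    changes = sum(a != b for a, b in zip(image, image[1:]))
    return (changes / (len(image) - 1) - target_rate) ** 2


def accept(delta, temperature, rng):
    """Assumed Metropolis-type rule: always keep improvements, keep an
    unfavourable change with probability exp(-delta / temperature)."""
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / temperature)


def anneal(image, frozen, target_props, target_rate,
           weights=(1.0, 1.0), t0=1.0, cooling=0.95, sweeps=200, seed=0):
    """Perturb one free node at a time; `frozen` holds the indices of
    conditioning nodes, which are never perturbed."""
    rng = random.Random(seed)
    image = list(image)
    categories = sorted(target_props)
    free = [i for i in range(len(image)) if i not in frozen]

    def objective(img):
        return weighted_objective(
            weights,
            [histogram_mismatch(img, target_props),
             transition_mismatch(img, target_rate)])

    current = objective(image)
    temperature = t0
    for _ in range(sweeps):
        for _ in range(len(free)):
            i = rng.choice(free)
            old = image[i]
            image[i] = rng.choice(categories)   # random perturbation
            proposed = objective(image)
            if accept(proposed - current, temperature, rng):
                current = proposed              # keep the perturbation
            else:
                image[i] = old                  # undo the perturbation
        temperature *= cooling                  # assumed geometric cooling
    return image, current
```

For example, `anneal([0, 1] * 10, frozen={0, 19}, target_props={0: 0.7, 1: 0.3}, target_rate=0.2)` returns a perturbed image together with its final objective value, while nodes 0 and 19 keep their initial values. Recomputing the full objective after every perturbation keeps the sketch short; a practical implementation would update the statistics incrementally.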
In general, the reproduction of a variogram map, indicator variograms, a histogram of multiple-point statistics for some pattern sizes and the requirement of honouring conditional information can be imposed through elements of the objective function.

The typical procedure for implementing the algorithm is:

1. Start with a spatially random distribution of values over all the nodes on the lattice to be simulated, but ensuring that conditioning values at
