
Anurag Verma Bayesian Population Projection PDF

21 Pages · 2017 · 1.5 MB · English


Probabilistic Approach of Population Projection for India and States

Anurag Verma1*, Abhinav Singh2, P.S. Pundir2
1. Dept. of Community Medicine, IMS, BHU, Varanasi, India
2. Dept. of Statistics, University of Allahabad, Allahabad, India
*Presenting author, e-mail: [email protected]

Abstract

Projections of a country's future population by age and sex are widely used for policy development, planning and research. They are mostly done deterministically, but there is a widespread need for probabilistic projections. In this paper we propose a Bayesian method for probabilistic population projections for India. The total fertility rate (TFR) and life expectancies at birth are projected probabilistically using Gompertz and logistic growth models, respectively, under the Bayesian paradigm. The estimates obtained from the two models are combined using the cohort-component method to obtain age-specific projections of the population by sex. The analysis uses the Markov Chain Monte Carlo (MCMC) technique with the software OpenBUGS. The convergence diagnostics available in OpenBUGS have been applied to check the convergence of the chains, which is necessary for implementing MCMC. The method is illustrated by making a 40-year projection using Indian data for the period 1971-2011. The study provides probabilistic point estimates of the parameters and of the projections, along with highest posterior density (HPD) intervals, for population numbers and vital events, including age-specific death rates, life expectancies, age-specific birth rates, total fertility rates and the dependency ratio.

Keywords: Demography, Probabilistic Population Projection, Bayesian Approach, Posterior Distribution

Introduction

Population growth has become one of the most important problems in the world [1]. An idea of the future population is obtained with the help of projection.
Government personnel and actuaries always require precise projected figures for social and economic planning. The size and growth of a country's population directly affect its economy, politics, culture, education and environment, and determine the cost of exploring natural resources. No one wants to wait until these resources are depleted by a population explosion, so the study of population projection began early [2-4].

Currently, there are two main approaches to data analysis in statistics: the frequentist and the Bayesian [5]. The use of the Bayesian approach in data analysis is relatively new and has gained wide support over the past two decades among researchers from various disciplines. Probably the main reason for this growing support is its flexibility and generality, which allow it to deal with complex situations. In addition, the Bayesian method is typically preferred to the classical method for estimating parameters when the likelihood function has an intractable form [5]. Difficult situations can be handled with the BUGS (Bayesian inference Using Gibbs Sampling) software because of its flexibility and general approach [6].

This study is based on a Bayesian approach to data analysis. In the Bayesian method, uncertainty in model choice is incorporated through averaging techniques. The resulting predictive distributions from Bayesian forecasting models have two main advantages over those obtained using more traditional stochastic models. First, uncertainties in the data, the model parameters and the model choice are explicitly represented using probability distributions. As a result, more realistic probabilistic population forecasts are obtained. Second, Bayesian models formally allow expert opinion, including its uncertainty, to be incorporated into the forecasts [7, 8].
Most population projections are currently done deterministically, using the cohort component method [9, 10]. This is an age- and sex-structured version of the basic demographic identity: the population of a country at the next time point is equal to the population at the current time point, plus the number of births minus the number of deaths, plus the number of immigrants minus the number of emigrants. It was formulated in matrix form by Leslie [11]. Population projections are currently produced by many organizations, including national and local governments and private companies. The main organization producing population projections for India and all its states is the Registrar General of India. India's current method (RGI, 2006) [12] does not yield an assessment of uncertainty about the future population.

Standard population projection methods are deterministic, meaning that they yield a single projected value for each quantity of interest. However, probabilistic projections, which give a probability distribution for each quantity of interest and hence convey the uncertainty about the projections, are widely desired [13-14]. In the recent past, researchers have developed alternative methods that allow probabilistic population projection, aiming at a probabilistic interpretation of each demographic factor of interest. Alho and Spencer (1985), Alho (1990), Cohen (1986, 1988), Pflaumer (1988), Lee (1992), Lee and Tuljapurkar (1994), Alho (1999) and Keilman (2002) present probabilistic population projections. Comparisons of deterministic and probabilistic methods can be found in Lee (1998), Alho and Spencer (2006) and Stillwell and Clarke (2011). This study expands on the work of Rahul (2007), following his suggestion to integrate statistical and demographic methodology in performing age-specific population projection.
The aim of the study is to improve current population projection methodology by making probabilistic population projections for India and its states using the cohort component method under the Bayesian approach. The total fertility rate and the female and male life expectancies at birth are projected probabilistically using Bayesian models estimated via Markov Chain Monte Carlo in the WinBUGS software, using Indian population data. These are then converted to age-specific rates and combined with a cohort component projection model. This yields probabilistic projections of any population quantity of interest. The method is illustrated for four Indian states at different demographic stages and of different regions and sizes.

India's current projection method does not yield an assessment of uncertainty about future population quantities. It is somewhat subjective, because the models used have been selected by the analyst from a small number of predetermined possibilities rather than estimated from the data. It is also somewhat rigid, in that the set of models used is small and may not cover the full range of realistic future possibilities. To address these issues, we develop a Bayesian probabilistic population projection method. This involves building Bayesian models to project the fertility and mortality rates, each of which produces a large number of possible future trajectories from the posterior predictive distribution. These trajectories are then input to the cohort component projection method to provide a posterior predictive distribution of any future population quantity of interest.

Methodology

Cohort Component Projection Method

The procedure for making cohort-component population projections was developed by Whelpton in the 1930s. It uses the components of demographic change to project population growth. The technique projects the population by age group, in addition to other demographic attributes such as sex and ethnicity.
This projection method is based on the components of demographic change: births, deaths and migration. It can be thought of as an elaboration of the ideas encapsulated in the demographic balancing equation:

$$\text{Population}_{t+n} = \text{Population}_{t} + \text{Births}_{t} - \text{Deaths}_{t} + \text{Immigration}_{t} - \text{Emigration}_{t} \qquad (1)$$

where $\text{Population}_{t}$ is the population at time $t$, $\text{Births}_{t}$ and $\text{Deaths}_{t}$ are the numbers of births and deaths occurring between $t$ and $t+n$, and $\text{Immigration}_{t}$ and $\text{Emigration}_{t}$ are the numbers of immigrants to and emigrants from the country during the period $t$ to $t+n$. This equation reminds us that there are only two possible ways of joining a population: one can be born into it or one can migrate into it. Similarly, the only ways to leave a population are to emigrate or to die.

Cohort-component projections extend this concept to individual age cohorts. They make use of the fact that with every year that passes, every member of a population becomes a year older. Thus, after 5 years the survivors of the cohort aged 0-4 years at some baseline date will be aged 5-9 years, 5 years after that they will be aged 10-14 years, and so on.

Age-specific fertility rates (fertility being the ability of an individual to give a live birth) are required to project the number of births. Future fertility projections are made by simulating a large number of trajectories of future values of the total fertility rate (TFR) and converting them to age-specific fertility rates using model fertility schedules. Next, age- and sex-specific death rates are required to project the total number of deaths. Future mortality projections are made by simulating an equal number of trajectories of life expectancy at birth for females and males and converting them to age-specific mortality rates using a model life table.
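The bookkeeping described above can be sketched for one sex and one projection step. This is a minimal illustration for a closed population, not the authors' implementation: the function names, the three toy age groups, the survival proportions, the fertility rates and the 0.5 female share of births are all hypothetical, and births are approximated by applying annual rates to the average number of women at the start and end of the step.

```python
def age_forward(pop, survival, newborns):
    """Advance one sex's population by one age-group-width step.

    pop[k]      : persons in age group k at time t (last group is open-ended)
    survival[k] : proportion of group k still alive one step later
    newborns    : surviving births that enter the youngest group
    """
    last = len(pop) - 1
    new_pop = [0.0] * len(pop)
    new_pop[0] = newborns
    for k in range(last):
        # Survivors of group k move up into group k + 1.
        new_pop[k + 1] += pop[k] * survival[k]
    # Survivors of the open-ended group stay in it.
    new_pop[last] += pop[last] * survival[last]
    return new_pop


def births_in_step(females_start, females_end, asfr, step_years=5):
    """Births over the step: annual rate x average women exposed x years."""
    return step_years * sum(
        rate * 0.5 * (start + end)
        for rate, start, end in zip(asfr, females_start, females_end)
    )


# Toy inputs (hypothetical): three age groups, the last one open-ended.
females = [100.0, 100.0, 100.0]
survival = [1.0, 1.0, 0.5]
asfr = [0.0, 0.1, 0.0]   # annual births per woman in each age group
female_share = 0.5       # toy share of births that are girls

aged = age_forward(females, survival, newborns=0.0)
births = births_in_step(females, aged, asfr)
projected = age_forward(females, survival, newborns=births * female_share)
deaths = sum(p * (1.0 - s) for p, s in zip(females, survival))
```

With no migration, the projected total equals the starting total plus female births minus deaths, which is equation (1) with the migration terms set to zero. Repeating the step once per simulated fertility and mortality trajectory gives one population path per trajectory.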
In this study, future values of the TFR are simulated using a Gompertz model, and future values of life expectancy at birth are simulated separately for each sex using a logistic model, both under the Bayesian approach.

Bayesian Inference

In Bayesian inference, uncertainty about the parameters $\theta$ of a statistical model is described by their posterior probability distribution given the observed data $\mathbf{x}_i = (x_1, x_2, \ldots, x_i)$. The posterior density is obtained using Bayes' theorem:

$$f(\theta \mid \mathbf{x}_i) = \frac{f(\mathbf{x}_i \mid \theta)\, f(\theta)}{f(\mathbf{x}_i)}, \qquad (a)$$

where $f(\mathbf{x}_i \mid \theta)$ is the likelihood function and is defined by the model, $f(\theta)$ is the prior distribution for $\theta$ and $f(\mathbf{x}_i)$ is a normalizing constant. The prior distribution $f(\theta)$ specifies the uncertainty about $\theta$ prior to observing any data.

Forecasting or prediction is particularly natural in a Bayesian framework. Uncertainty about the next $N$ future values $x_n$ (for $n = i+1, \ldots, i+N$) is described by the joint predictive probability distribution

$$f(x_{i+1}, \ldots, x_{i+N} \mid \mathbf{x}_i) = \int f(\theta \mid \mathbf{x}_i) \prod_{j=1}^{N} f(x_{i+j} \mid \mathbf{x}_{i+j-1}, \theta)\, d\theta. \qquad (b)$$

Note that the product term represents the joint predictive distribution in the case that the parameter $\theta$ is known. The Bayesian predictive distribution simply averages, or integrates, this with respect to the posterior probability distribution for $\theta$. Hence, uncertainty about $\theta$ in light of the observed data is fully integrated.

In a Bayesian analysis we obtain forecasts and associated measures of uncertainty by calculating marginal probability distributions for quantities of interest, integrating the posterior distribution in (a) or the predictive distribution in (b). Performing these integrations analytically is typically not possible for realistically complex models such as those used in this study. Historically, this has prevented demographers and others from taking advantage of Bayesian methods for statistical inference.
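The averaging over $\theta$ in (b) can be illustrated by simulation on a toy problem where the posterior is known in closed form. The sketch below is not the model used in this paper. It assumes normally distributed observations with a known standard deviation and a normal prior on the mean, and the data values, the prior and the function names are made up for illustration. Each predictive draw first samples $\theta$ from its posterior and then samples a future observation given that $\theta$.

```python
import random
import statistics


def posterior_normal_mean(data, sigma, prior_mean=0.0, prior_sd=100.0):
    """Posterior mean and variance of theta for x ~ N(theta, sigma^2)."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(data) / sigma ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + sum(data) / sigma ** 2)
    return post_mean, post_var


def predictive_draws(data, sigma, n_draws, rng):
    """Sample one future value per posterior draw of theta."""
    post_mean, post_var = posterior_normal_mean(data, sigma)
    draws = []
    for _ in range(n_draws):
        theta = rng.gauss(post_mean, post_var ** 0.5)  # parameter uncertainty
        draws.append(rng.gauss(theta, sigma))          # sampling variability
    return draws


# Hypothetical observations and a known observation standard deviation.
data = [2.1, 1.9, 2.4, 2.0, 1.6]
sigma = 0.5
draws = predictive_draws(data, sigma, n_draws=20000, rng=random.Random(1))
predictive_mean = statistics.mean(draws)
predictive_var = statistics.variance(draws)
```

In this toy case the predictive variance is approximately the posterior variance of $\theta$ plus $\sigma^2$, so it is wider than a forecast that plugs in a single estimate of $\theta$. That extra spread is the parameter uncertainty that (b) integrates over.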
Recent developments in Bayesian computation have focused on Markov chain Monte Carlo (MCMC) generation of samples from distributions such as (a) and (b); see Gelman et al. (2004) for details. Once a sample has been obtained from a joint distribution, a sample from the distribution of any component, or of any function of the components, is readily available. To generate samples from the posterior and predictive distributions in this study, we used an MCMC sampling approach implemented in the WinBUGS software.

Probabilistic Projection of Fertility

Let $Y_i$ denote the TFR of India in year $t_i$ ($i = 1, \ldots, n$), where $i$ indexes the TFR observation for the $i$th year and its initial value is taken to be 1. The four-parameter Gompertz model used for the projection of the total fertility rate may be described as follows. Assume the general regression equation

$$Y_i = \mu_i + \varepsilon_i,$$

where $Y_i$, the TFR in year $t_i$, is assumed to follow a normal distribution with mean $\mu_i$ and common precision $\tau$ (= 1/variance). Here $\mu_i$ is the deterministic part and $\varepsilon_i$ is the disturbance part, so that $\varepsilon_i \sim \text{iid } N(0, \tau)$. For the four-parameter Gompertz model, we take the deterministic part to be

$$\mu_i = \theta_1 + \theta_2 \exp\!\left(-\exp\!\left(-\theta_3 - \theta_4\, \frac{t_i - \bar{t}}{\sigma_t}\right)\right),$$

where $\theta_1$ is the lower asymptote, which in our study is fixed at 1.8, $\theta_2$ is the upper asymptote, $\theta_3$ is the parameter that determines the shape of the Gompertz curve and $\theta_4$ is the rate at which fertility increases.

To adopt a Bayesian analysis, we need to provide prior distributions for all the parameters present in the model, $\theta_2$, $\theta_3$, $\theta_4$ and $\tau$. We assign non-informative priors to all of them: a N(0, 0.01) prior (variance = 1/0.001) to each of $\theta_2$, $\theta_3$, $\theta_4$, and a Gamma(0.00001, 0.00001) prior to $\tau$.
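To make the fertility model concrete, the sketch below evaluates the Gompertz mean and turns a set of parameter draws into simulated TFR paths. It is an illustration under stated assumptions, not the WinBUGS code from the appendix. It assumes that time enters the curve as a standardized year, and the parameter draws, the centring year 1991, the scale 12 and the function names are hypothetical stand-ins for real posterior output.

```python
import math
import random


def gompertz_mean(year, theta1, theta2, theta3, theta4, year_mean, year_sd):
    """Deterministic part mu of the four-parameter Gompertz curve."""
    z = (year - year_mean) / year_sd  # standardized time (assumed form)
    return theta1 + theta2 * math.exp(-math.exp(-theta3 - theta4 * z))


def simulate_tfr_paths(draws, years, theta1, year_mean, year_sd, rng):
    """One simulated TFR path per draw of (theta2, theta3, theta4, tau)."""
    paths = []
    for theta2, theta3, theta4, tau in draws:
        sd = 1.0 / math.sqrt(tau)  # tau is a precision, so sd = 1/sqrt(tau)
        paths.append([
            gompertz_mean(y, theta1, theta2, theta3, theta4, year_mean, year_sd)
            + rng.gauss(0.0, sd)
            for y in years
        ])
    return paths


# Hypothetical stand-ins for posterior draws of (theta2, theta3, theta4, tau).
draws = [(4.0, 0.0, -1.0, 400.0), (4.2, 0.1, -0.9, 350.0)]
years = list(range(2016, 2052, 5))
paths = simulate_tfr_paths(draws, years, theta1=1.8, year_mean=1991.0,
                           year_sd=12.0, rng=random.Random(0))
```

With a negative $\theta_4$, as in these made-up draws, the curve falls over time from about $\theta_1 + \theta_2$ toward the fixed lower asymptote $\theta_1 = 1.8$. Feeding every retained MCMC draw through `simulate_tfr_paths` would give the set of trajectories from which projection intervals are read off.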
Probabilistic Projection of Mortality

Let $Y_{ij}$ denote the life expectancy at birth of a country in year $t_i$ ($i = 1, 2, \ldots, n$), where $i$ represents time and $j$ represents sex; again the initial value of $i$ is taken as 1. The four-parameter logistic model is used for the projection of life expectancy at birth. Assume the general regression equation

$$Y_{ij} = \mu_{ij} + \epsilon_{ij},$$

where $\mu_{ij}$ is the deterministic part and $\epsilon_{ij}$ is the error term, assumed to follow $\text{iid } N(0, \tau_j)$, where $\tau_j$ is the precision. The deterministic part of our logistic model is

$$\mu_{ij} = \theta_{1j} + \frac{\theta_{2j}}{1 + \exp\!\left(\theta_{3j}\, \frac{t_i - \bar{t}}{\sigma_t}\right)} + \theta_{4j}, \qquad j = 1, 2,$$

where $\theta_{1j}$ is the lower asymptote and $\theta_{2j}$, $\theta_{3j}$, $\theta_{4j}$ is the ….

In this model, the life expectancy at birth $Y_{ij}$ in year $t_i$ is assumed to follow a normal distribution with mean $\mu_{ij}$ and common precision $\tau_j$. We need to assign priors to the parameters, and we assign non-informative priors to all of them: a Normal(0, 0.01) prior (variance = 1/0.001) to the parameters $\theta_{1j}$, $\theta_{2j}$, $\theta_{3j}$, $\theta_{4j}$, and a Gamma(0.00001, 0.00001) prior to the parameter $\tau_j$.

Future migration is more difficult to project than fertility or mortality. Migration can be volatile, since short-term changes in economic, social or political factors often play an important role. In this study, we assume that the population is closed, i.e. no migration takes place or, even if it does, its net effect is zero. As for the sex ratio at birth, which divides the future number of newborns into males and females, the female-to-male ratio is set to 100:105, based on the biological literature, and is held constant from 2011 onward.
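The sex ratio assumption translates into a fixed split of each projected birth total. A minimal sketch follows; the function name and the birth count are hypothetical.

```python
def split_births_by_sex(total_births, males_per_100_females=105.0):
    """Divide total births into (female, male) using a fixed sex ratio at birth."""
    male_share = males_per_100_females / (100.0 + males_per_100_females)
    male_births = total_births * male_share
    return total_births - male_births, male_births


# Hypothetical birth total for one projection step.
female_births, male_births = split_births_by_sex(205000.0)
```

A 100:105 ratio means that 105/205, or about 51.2%, of births are male. The two counts then enter the youngest age group of the male and female projections respectively.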
Data

In this section, we illustrate the projection method with data for India. These data represent a case in which the counts of all population components by five-year age group and sex are available. The data used to produce the projections are three-year moving averages for the period 1971-2013. The data on mortality and fertility rates were obtained from the published statistical reports of the Sample Registration System (SRS). Population counts for India from the 2001 and 2011 Censuses, with the 2011 Census used as the baseline for prediction, were obtained from the Office of the Registrar General of India.

To test and implement the proposed model, we selected four states, namely Gujarat, Kerala, Orissa and Uttar Pradesh, besides the country as a whole. These states were selected for their geographical and demographic diversity. Uttar Pradesh is the most populous state in India, while Gujarat is an economically developed state that lags far behind in terms of human development indicators. Kerala ranks 1, Gujarat 7, Uttar Pradesh 15 and Orissa 17 among the states of India according to the report "Inequality-adjusted Human Development Index for India's States 2011" published by UNDP India. In addition to this diversity, the selected states are situated in four different corners of the country.

Results

A WinBUGS program was developed to carry out a Bayesian analysis of the data and to provide projections of the TFR and life expectancy at birth for India and the states. The WinBUGS code for the models is given in the appendix. Two chains were run for each program.

Projection of Fertility

For the fertility model, we monitored four nodes, $\theta_2$, $\theta_3$, $\theta_4$ and $\sigma = 1/\sqrt{\tau}$, separately for each dataset. Here we present the various diagnostics for the Indian TFR data, based on 45000 iterations after discarding 5000 initial iterations as burn-in.
The history plots of the sampled values of the four nodes $\theta_2$, $\theta_3$, $\theta_4$ and $\sigma$ against iterations for the two chains of the model are shown in Fig 1(a-d). The mixing of the chains (in different colors) for all nodes looks quite good, giving us confidence in the convergence of the chains. Fig 2(a-d) presents the autocorrelations at different lags for all four nodes, which show a declining trend as the lag increases. The value of R in the bgr diagnostic for all nodes is also close to one, as shown in Fig 3(a-d): the traces of the blue and green lines are stable, and the red one has converged to one for all four parameters monitored. Fig 4(a-d) shows smoothed curves of the posterior densities of the nodes. All the curves appear bell shaped, indicating asymptotic normality.

[Fig 1 (a-d): History plots for parameters from two chains. Panels: (a) theta[2], (b) theta[3], (c) theta[4], (d) sigma; x-axis: iteration.]

[Fig 2 (a-d): Autocorrelation plots for parameters from two chains. Panels: (a) theta[2], (b) theta[3], (c) theta[4], (d) sigma; x-axis: lag.]

[Fig 3 (a-d): bgr plots for parameters from two chains. Panels: (a) theta[2], (b) theta[3], (c) theta[4], (d) sigma; x-axis: start-iteration.]
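The two numerical checks discussed above, lag autocorrelation and a between-chain versus within-chain comparison, can be sketched outside WinBUGS. The code below computes the classic variance-based Gelman-Rubin statistic, which is related to but not the same calculation as the interval-width ratio R shown in the WinBUGS bgr plot. The chains are simulated independent normal draws standing in for real MCMC output, and the function names are hypothetical.

```python
import random
import statistics


def gelman_rubin(chains):
    """Variance-based potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    within = statistics.mean([statistics.variance(c) for c in chains])
    # Sample variance of the chain means equals B / n in the usual notation.
    between_over_n = statistics.variance([statistics.mean(c) for c in chains])
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


def autocorrelation(chain, lag):
    """Sample autocorrelation of one chain at the given lag."""
    mean = statistics.mean(chain)
    dev = [x - mean for x in chain]
    numerator = sum(dev[t] * dev[t + lag] for t in range(len(dev) - lag))
    denominator = sum(d * d for d in dev)
    return numerator / denominator


# Simulated stand-ins for two chains of one monitored node.
rng = random.Random(42)
chain_a = [rng.gauss(0.0, 1.0) for _ in range(5000)]
chain_b = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [x + 3.0 for x in chain_b]  # a chain stuck in a different region

r_mixed = gelman_rubin([chain_a, chain_b])
r_stuck = gelman_rubin([chain_a, shifted])
```

Chains sampling the same distribution give a statistic close to one, while the shifted chain pushes it well above one. That is the same qualitative reading as a bgr ratio that has or has not settled at one.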
