ebook img

Logistic mixture of multivariate regressions for analysis of water quality impacted by agrochemicals PDF

16 Pages·2006·0.26 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Logistic mixture of multivariate regressions for analysis of water quality impacted by agrochemicals

ENVIRONMETRICS Environmetrics(inpress) PublishedonlineinWileyInterScience (www.interscience.wiley.com)DOI:10.1002/env.820 Logistic mixture of multivariate regressions for analysis of water quality impacted by agrochemicals Yongsung Joo1*,y, Keunbaik Lee2, Joong-Hyuk Min3, Seong-Taek Yun4 and Trevor Park2 1Biostatistics,CollegeofPublicHealthandHealthProfessions,UniversityofFlorida,Gainesville,FL32611,U.S.A. 2DepartmentofStatistics,UniversityofFlorida,Gainesville,FL32611,U.S.A. 3DepartmentofEnvironmentalEngineeringSciences,UniversityofFlorida,Gainesville,FL32611,U.S.A. 4DepartmentofEarthandEnvironmentalSciences,KoreaUniversity,Seoul136-701,Korea SUMMARY Inthispaper,westudytheimpactsoftworepresentativeagriculturalactivities,fertilizersandlimeapplication,on waterquality.Becauseofheavyusageofnitrogenfertilizers,nitrate(NO(cid:1))concentrationinwaterisconsideredas 3 oneofthebestindicatorsforagriculturalpollution.Themixtureofnormaldistributionshasbeenwidelyapplied with(NO(cid:1))concentrationstoclusterwatersamplesintotwoenvironmentallyinterestedgroups(waterimpactedby 3 agrochemicals and natural background water groups). However, this method fails to yield satisfying results becauseitcannotdistinguishlow-levelfertilizerimpactandnaturalbackgroundnoise.Toimproveperformanceof clusteranalysis,weintroducethelogisticmixtureofmultivariateregressionsmodel(LMMR).Inthisapproach, watersamplesareclusteredbasedontherelationshipsbetweenmajorelementconcentrationsandphysicochemical variables,whicharedifferentinimpactedwaterandnaturalbackgroundwater.Copyright#2006JohnWiley& Sons, Ltd. key words: mixture of regressions; model-based clustering; mixture of normal distributions; ECM; water quality;agricultural pollution 1. INTRODUCTION Ground and surface water quality has been continuously threatened by the direct input of agrochemicals, such as fertilizers and pesticides, and the associated biogeochemical reactions (HamiltonandHelsel,1995;Kraftetal.,1999).Tounderstandtheimpactofagriculturalactivitieson waterquality,investigationsonthemajorchemicalcompositionsinalluvialandbedrockgroundwater andsurfacewaterwereconductedinalluvialareasofNakdongRiverwatershed,whichisthesecond largestinSouthKorea(Figure1).Year-roundvigorousagriculturalactivitiesarethetypicalland-use pattern in these areas. Large amounts of synthetic nitrogen-fertilizers and composite fertilizers (N-P-K¼21-17-17) are applied to the farmland. Also, farmers commonly apply lime (CaO) to neutralize soils that are acidified through heavy agricultural activities. *Correspondenceto:Y.Joo,Biostatistics,CollegeofPublicHealthandHealthProfessions,UniversityofFlorida,Gainesville,FL 32611,U.S.A. yE-mail:[email protected]fl.edu Received2March2006 Copyright#2006JohnWiley&Sons,Ltd. Accepted19September2006 Y. JOOETAL. Figure1. NakdongRiverwatershedandsamplinglocations This paper has two objectives: (1) clustering waters (water samples) of the Nakdong River watershedintotwoenvironmentallymeaningfulgroups(waterimpactedbyagrochemicalsvs.natural background water), and (2) estimating the relationship between agrochemical-originated element concentrations and physicochemical variables. To cluster waters based on major element concentrations that can be originated from the agrochemicals, many statisticians (Xue et al., 2005) and environmental scientists (Chen et al., 2002; Park et al., 2005) applied the mixture of univariate normal distributions (MUN): XJ (cid:2) (cid:1) (cid:3) fðyÞ¼ pf y(cid:1)m;s2 (1) i j i(cid:1) j j j¼1 Copyright#2006JohnWiley&Sons,Ltd. Environmetrics(inpress) DOI:10.1002/env LOGISTICMIXTUREOFMULTIVARIATEREGRESSIONS wherey isamajorelementconcentrationinalogscale,p isthemixingprobabilityforjthgroup,and (cid:1) i j fðy(cid:1)m;s2Þistheprobabilitydensityofnormaldistributionwithmeanm andvariances2.However, i(cid:1) j j j j thismethoddoesnotutilizethefactthatconcentrationsofagrochemical-originatedelementschange significantly with physicochemistry of water, which can be explained by well-known biochemical equations. Our proposed logistic mixture of multivariate regressions (LMMR) model achieves these objectives simultaneously. SincetheEMalgorithmwasdevelopedbyDempsteretal.(1977),ithasbeenregardedasoneofthe most popular approaches in estimating mixture models and has accelerated usage of these models. Everitt(1984)comparedestimationalgorithmsforthemixtureoftwounivariatenormaldensitiesand concludedthatNewtonandEMalgorithmsprovideverysatisfactoryresults.Often,randomvariables of interest are correlated with each other. To incorporate these correlations into the mixture model, Basford and McLachlan (1985) proposed the mixture of multivariate normals. Xue et al. (2005) exploited Bayesian mixtures of univariate normalsto cluster waters in the YangtzeRiverwatershed, their goals being similar to ours. Naturally, these models evolved into a regression framework to cluster observations based on the relationshipsbetweenaresponsevariableandpredictors.Turner(2000)studiedthemixtureofunivariate regressionstoclusterobservationsbasedontherelationshipbetweenaresponsevariableandpredictors. JeffriesandPfeiffer(2000)suggestedamixtureofunivariatenormalsmodelthathasalogisticregression structureformixingprobabilities.Thus,themixingprobabilitycanvarywiththecovariates.Thispaper extendstheirframeworkstotheLMMR.Asthenameimplies,thismodelincludesalltheseconventional mixture models as special cases and contains advantages of each method: 1. Itexplainstherelationshipbetweenaresponsevariableandpredictorsasinthemixtureofunivariate regressions (Turner, 2000; Razzaghi, 2002). 2. It uses the covariance structure of response variables as in the mixture of multivariate normals (Basford and McLachlan, 1985). 3. Itallowsthemixingprobabilitiestodependonpredictorsasinthelogisticmixturemodel(Jeffries and Pfeiffer, 2000). 4. Itconductsclusteringandestimationofparameterssimultaneouslyasinanyothermixturemodel. Eventhoughtheproposedmodelhasthepossibilityofoverparameterizationasadisadvantage,this is not observed in our case study with Nakdong River watershed data. In Section 2, Nakdong River watershed data and water chemistry related to agrochemicals are described.InSection3,theLMMRsandtheestimation(ECM)algorithmareintroduced.InSection4, NakdongRiverwatersheddataareanalyzedwiththemixturemodels.Section5concludesthepaper. 2. NAKDONG RIVERWATERSHED DATA AND WATER CHEMISTRY It is well known by water chemists that concentrations of major agrochemical-originated elements and physicochemical variables of waters have different relationships in impacted and natural background waters. The main idea of our proposed model is to use these relationships in clustering waters into two groups. 2.1. Data collection Water samples were collected at 93 locations from October 1999 to August 2000. The sampling locations are widely scattered to catch the impact of agrochemicals on the entire study area. These Copyright#2006JohnWiley&Sons,Ltd. Environmetrics(inpress) DOI:10.1002/env Y. JOOETAL. consist of 72 shallow alluvial groundwater (61 irrigation wells for agricultural activities and 11 domestics wells), 15 deep alluvial groundwater, and 6 surface water sampling locations at three small-scale rural watersheds (Wolha, Daesan, and Yongdang) in the middle to lower reaches of theNakdongRiver(Figure1).Shallowirrigationanddomesticwellwatersarecollectedmostlyinthe vicinityofthewatertablewithmediandepth8m(4–37m),whiledeepgroundwatersaresampledat mediandepth120m(80–198m).Thesesamplescovervarioustypesofwatersfromsurfacewatersto deep groundwaters, and from unpolluted waters to waters highly polluted by agricultural activity. Weassumethatmajorelementconcentrationsatthe93samplinglocationsareindependentforthe followingreasons.First,cropfieldsinthestudyareasoftheNakdong Riverwatershedhadtypically evenandrandomapplicationsofagrochemicalsbecausesimilarcropsweregrown.Thisstudyareadid nothaveanylocallymassivepollution,whichmayspreadoutandcausedependencyamongsampling locations. Second, concentrations of agrochemical-related major elements change randomly with variousfactorsintheundergroundenvironment.Ingeneral,physicochemicalpropertiesofsoilporous media(i.e.,porosity,hydraulicconductivity,temperature,redox,organicmattercontent,andsoon)and underground geological structures (i.e., joints and fractures) vary spatially, controlling groundwater flow path, flow rate, and water chemistry. Onsoy et al. (2005) discusses heterogeneity of alluvial groundwater system with agrochemical-induced pollutants. These heterogeneous characteristics of subsurface systemsmakethe samples independent,evenbetween horizontallyadjacent groundwater samples. Annualaveragesofagrochemical-relatedmajorelementconcentrations(NO(cid:1),Ca2þ,Kþ,andCl(cid:1)) 3 andphysicochemistyofwatersamples(alkalinityandpH)aremeasuredateachlocation.Inparticular, NO(cid:1)andCa2þaretwomajorelementsinfertilizerandlime.Inthisstudy,thealkalinitycorrespondsto 3 HCO(cid:1)concentrationofthewatersamplesbecausenaturalwatersintheNakdongRiverwatershedare 3 within neutral pH range (Drever, 1997). See Min et al. (2002) for details. 2.2. Water chemistry Numerous environmental studies have shown that deep groundwaters in rural watersheds are less susceptibletopollutionbyagriculturalactivitiesthanshallowgroundwaters,becausetheyarefarther from the ground surface where agrochemicals are applied (Hendry et al., 1983). Less permeable geologic units prevent the infiltration of impacted shallow groundwater into the deeper zone. In addition, pollutantsfrom agrochemicals tend to get retardedor degraded astheymoveto the deeper zone(HallbergandKeeney,1993;SpaldingandExner,1993;StitesandKraft,2001;KraftandStites, 2003).Allirrigationwellsinthisstudyareshallowandlocatedwithincropfields.Therefore,wecan assumethatgroundwatersinirrigationwellsareimpacted,whiledeepgroundwatersarenotimpacted andhavecharacteristicsofnaturalbackgroundwaters.Withregardtodomesticwellandsurfacewaters, noassumptioncanbemadebecausetheirimpactstatusdependsonadjacencytofarmlandandunknown underground geology. There has been much research in water chemistry on the relationships between major agrochemical-originated elements and physicochemical variables (Min et al., 2002; Chae et al., 2004). In general, the natural background water chemistry is primarily controlled by water–rock interactions.Inparticular,thedissolutionofcarbonatemineral(CaCO )playsamajorroleinregulating 3 the water chemistry due to its high solubility, which is described by CaCO þCO þH O,Ca2þþ2HCO(cid:1) (2) 3 2 2 3 Copyright#2006JohnWiley&Sons,Ltd. Environmetrics(inpress) DOI:10.1002/env LOGISTICMIXTUREOFMULTIVARIATEREGRESSIONS In other words, even though log concentrations of Ca2þ and alkalinity (HCO(cid:1)) may vary from 3 location to location depending on severity of weathering processes, they will have a positive relationship. For example, Chen et al. (2002) report that there is a very strong positive relationship betweenlog(Ca2þ)andlog(alkalinity)inYangtzeriverwater.However,ourNakdongRiverwatershed data has morecomplicated trendsthan theYangtzeRiverwatershed. When only naturalbackground waters(deepgroundwater,whichisdenotedwithDinFigure3(a))areconsidered,apositivetrendis observed. When only impacted waters (irrigation well groundwater, which is denoted with * in Figure3(a))areconsidered,anegativetrendisobserved.Thiscanbeexplainedbythefollowingtwo processes associated with agricultural activities. First, nitrification of nitrogen fertilizers (NHþ), which is described by Equation (3), raises NO(cid:1) 4 3 concentration and lowers pH. NHþþ2O ,NO(cid:1)þ2HþþH O (3) 4 2 3 2 Negative relationships between logðNO(cid:1)Þ and pH appear in Figure 3(c). As a consequence of 3 nitrification,lowerpHreducesalkalinityofwater(asmeasuredbylevelofHCO(cid:1),Minetal.,2002).In 3 otherwords,thenitrificationprocessmakesanegativetrendbetweenlogðNO(cid:1)Þandlog(alkalinity)in 3 impacted water (* in Figure 3(b)). This process is absent or verymild in natural background water (DinFigure3(b)).Second,lime(CaO)isappliedtoneutralizefarmlandsoilswherenitrogenfertilizers are used. This makes logðCa2þÞ and logðNO(cid:1)Þ have a positive relationship only among impacted 3 waters(*inFigure3(d)).Naturalbackgroundwatersdonothaveanystrongtrend(DinFigure3(d)). ThesetwoprocessesmakelogðCa2þÞandlog(alkalinity)haveanegativetrendinimpactedwater(*in Figure 3(a)). 3. LOGISTIC MIXTURE OF MULTIVARIATE REGRESSIONS MODEL AND ESTIMATION Inthissection,weproposetheLMMRsmodelandanECM(MengandRubin,1993)algorithmtofind the MLE of this model. 3.1. Proposed model SupposethatweareinterestedinthemixtureofJgroups.Also,lety beacolumnvectorofMresponse i variables for the ith observation (annual average of measurements at ith sampling location), x be a i predictorvectorforthemeanfunctionsmðxÞwherej¼1;...;J,andz beapredictorvectorforthe j i i logistic functions for mixing probability pðzÞ. The proposed model is given by j i J X (cid:1) fðyijxi;ziÞ¼ pjðziÞ(cid:2)fðyi(cid:1)mjðxiÞ;SjÞ (4) j¼1 (cid:1) where fðyi(cid:1)mjðxiÞ;SjÞ is a multivariate normal probability density function of yi with mean vector mðxÞ¼Xb and covariance matrix S, j i i j j 0yi1 1 0mj1ðxiÞ1 0bj11 . . . yi ¼B@ .. CA; mjðxiÞ¼B@ .. CA; Xi ¼IM(cid:3)M(cid:4)xi bj ¼B@ .. CA y m ðxÞ b iM jM i jM Copyright#2006JohnWiley&Sons,Ltd. Environmetrics(inpress) DOI:10.1002/env Y. JOOETAL. x ¼ð1;x ;(cid:2)(cid:2)(cid:2);x Þ, b ¼ðb ;(cid:2)(cid:2)(cid:2);b ÞT for m¼1;(cid:2)(cid:2)(cid:2);M; (cid:4) is the Kro-necker product, i i1 is(cid:1)1 jm jm0 jms(cid:1)1 8 >< PexJp(cid:1)ð1zigjÞ ; if j¼1;(cid:2)(cid:2)(cid:2);J(cid:1)1 pjðziÞ¼>:1þPlJ¼(cid:1)111expðziglÞ; If j¼J 1þ l¼1expðziglÞ z ¼ð1; z ;(cid:2)(cid:2)(cid:2);z Þ, gT ¼ðg ;(cid:2)(cid:2)(cid:2);g Þ, and gT ¼ðg ;(cid:2)(cid:2)(cid:2);g Þ. i i1 it(cid:1)1 1 J(cid:1)1 j j0 jt(cid:1)1 Notethatweuseonecommonpredictorvectorx fortheJ(cid:3)Mmeanfunctions,butdifferentvectors i can be used if need. For notational simplicity, x and z will be used instead of x and z unless it is i i necessary to specifically indicate observation i. 3.2. Parameter estimation Letw ¼ðw ;(cid:2)(cid:2)(cid:2);w Þbeanindicatorvectorforgroupmemberships:Ify belongstogroupj,w ¼1 i i1 iJ i ij andw ¼0forallotherj0 6¼j.Also,letu¼ðb ;(cid:2)(cid:2)(cid:2);b ;S ;(cid:2)(cid:2)(cid:2);S ;g ;(cid:2)(cid:2)(cid:2);g Þ.Parametersinthe ij0 1 J 1 J 1 J(cid:1)1 LMMRs model are estimated using the ECM algorithm (Meng and Rubin, 1993) as follows (see Appendix for details): Step 1. Provide initial values, wð0Þ ¼ðwð0Þ;(cid:2)(cid:2)(cid:2);wð0ÞÞT and 1 n (cid:2) (cid:3) uð0Þ ¼ bð0Þ;(cid:2)(cid:2)(cid:2);bð0Þ;Sð0Þ;(cid:2)(cid:2)(cid:2);Sð0Þ;gð0Þ;(cid:2)(cid:2)(cid:2);gð0Þ 1 J 1 J 1 J(cid:1)1 Step 2. Let (cid:2) (cid:1) (cid:3) pðk3ÞðzÞf y(cid:1)mðk1ÞðxÞSðk2Þ Piðjk1;k2;k3Þ ¼PJ jpðk3ÞiðzÞf(cid:2)i(cid:1)y(cid:1)(cid:1)jmðk1ÞiðxÞj;Sðk2Þ(cid:3) (5Þ l¼1 l i i(cid:1) l i l and calculate Pðk(cid:1)1;k(cid:1)1;k(cid:1)1Þ. Note that, after the algorithm converges, this probability can be ij interpreted as Prob½w ¼1andw ¼0forallj6¼j0jy;x;z(cid:5) ij ij0 . Step 3. Calculate ( N )(cid:1)1( N ) bðkÞ ¼ XXTSðk(cid:1)1Þ(cid:1)1XPðk(cid:1)1;k(cid:1)1;k(cid:1)1Þ XXTSðk(cid:1)1Þ(cid:1)1yPðk(cid:1)1;k(cid:1)1;k(cid:1)1Þ j i j i ij i j i ij i¼1 i¼1 and update Pðk;k(cid:1)1;k(cid:1)1Þ with bðkÞ. ij j Copyright#2006JohnWiley&Sons,Ltd. Environmetrics(inpress) DOI:10.1002/env LOGISTICMIXTUREOFMULTIVARIATEREGRESSIONS Step 4. Calculate SðkÞ ¼ 1 XN fy (cid:1)mðkÞðxÞgfy (cid:1)mðkÞðxÞgTPðk;k(cid:1)1;k(cid:1)1Þ j PN Pðk;k(cid:1)1;k(cid:1)1Þ i i i i ij i¼1 ij i¼1 and update Pðk;k;k(cid:1)1Þ with SðkÞ. ij j Step 5. Set l¼1. Also, let lð0Þ ¼gðk(cid:1)1Þ and (cid:4)@hðlðl(cid:1)1ÞÞ(cid:5)(cid:1)1 lðlÞ ¼lðl(cid:1)1Þ(cid:1) hðlðl(cid:1)1ÞÞ (6Þ @lðl(cid:1)1Þ where hðlÞ¼ðh ðlÞ;(cid:2)(cid:2)(cid:2);h ðlÞÞT 1 J(cid:1)1 h ðlÞ¼XN n(cid:2)Pðk;k;k(cid:1)1Þ(cid:1)PðkÞðzÞ(cid:3)zo a ia a i i i¼1 (cid:4) (cid:5) @hðlÞ @h ðlÞ a ¼ @l @l b ab @haðlÞ¼((cid:1)PNi¼1paðkÞðziÞ(cid:2)1(cid:1)paðkÞðziÞ(cid:3)zizTi; if a¼b @lb PN pðkÞðzÞpðkÞðzÞzzT; if a6¼b i¼1 b i a i i i for a;b¼1;(cid:2)(cid:2)(cid:2);J(cid:1)1. Iterate Equation (6) until lðlÞ converges. Then set lðkÞ ¼lðlÞ. Step 6. Set k¼kþ1 and repeat Steps 2–5 until convergence. 3.3. Standard errors of estimates Awell-known disadvantage of the EM algorithm is the difficulty in calculating standard errors for MLEs. There have been many studies, including Louis (1982), Meng and Rubin (1991), and Oaks (1999). However, their methods are difficult to implement for the LMMR because the model does not have an exponential family density, and analytic calculations of second derivatives are very difficult. Basford et al. (1997) proposed using two simpler alternatives for mixture of normals: an information-basedmethodandabootstrapmethod.Althoughthebootstrapmethodiscomputationally intensive,itprovidesmorereliableestimateswhenthesamplesizeisrelativelysmallormeansinthe mixturemodelarenotwell-separated(Basfordetal.,1997).Inthispaperweusethebootstrapmethod to find the standard errors of the MLE because our ECM algorithm converges fast enough. 4. ANALYSIS OF WATER CHEMISTRY DATA IN NAKDONG RIVER WATERSHED, KOREA Hamilton andHelsel (1995) and Min et al. (2002) reported that NO(cid:1) and Ca2þ concentrations were 3 significantly elevated in groundwater beneath agricultural areas. In particular, NO(cid:1) concentration in 3 Copyright#2006JohnWiley&Sons,Ltd. Environmetrics(inpress) DOI:10.1002/env Y. JOOETAL. water has been considered as the most important variable to describe the environmental impacts of agriculturalactivitiesonwaterquality(Bo¨hlkeandDenver,1995;Canter,1997;Minetal.,2003;Chae et al., 2004). Whilemanyothertypesofclusteringmethods,suchask-meansandhierarchicalclustering(Johnson and Wichern, 2002), provide actual assignment of each observation to a group, the mixture models calculate classification probabilities: (cid:1) (cid:1) ^pjðziÞfðyi(cid:1)(cid:1)m^jðxiÞ;S^jÞ Pdrob½wij ¼1(cid:5)andwij0 ¼0forallj0 6¼j(cid:1)(cid:1)y;x;z(cid:5)¼PJ ^pðzÞfðy(cid:1)(cid:1)m^ ðxÞ;S^ Þ (7) l¼1 l i i(cid:1) l i l Thentheithwaterisassignedtothejthgroup(normaldistributioncomponent)thathasthebiggest classification probability. This approach helps analysts understand the relationship between an observationandgroups(clusters)intermsoftheclassificationprobability.Inthissection,weapplythe MUNandtheLMMRtoclusterthesameNakdongRiverwatersheddata(93watersamplinglocations). All measurements except pH are used in log scales. 4.1. Mixture of univariate normal distributions The MUN approach (Eq. 1) has been applied very widely in environmental sciences (Jeffries and Pfeiffer, 2000; Borgmann et al., 2005; Park et al., 2005). In this subsection, the MUN is applied to clusterthewholesetofNakdongRiverwatershedobservations(93waters)intotwogroupsbasedon each of four elements (NO(cid:1), Cl(cid:1), Kþ, and Ca2þ). 3 Whenweassumethathigherconcentrationsoftheseelementsindicatehigheragriculturalimpact, theleftsidegroup(normalcurve)inFigure2isfornaturalbackgroundwatersandtherightsidegroup isforimpactedwaters.ConcentrationsofNO(cid:1)inFigure2(a)haveaverylargegroupontherightside 3 andaverysmalloneontheleftside,whichshowsthatthemajorportionofthewatersareimpactedby agriculturalactivity.However,Ca2þ,Kþ,andCl(cid:1)concentrationsinFigure2(b),(c),and(d)showthat most waters are not impacted. Eventhough concentrations of the four major elements are measured fromthesamewatersamples,resultsoftheseclusteringanalysescontradicteachother.Thisimplies that the MUN is not a proper clustering method for these waters. AsdiscussedinSection2.2,wemayassumethat,amongthe93waters,the61irrigationwellwaters are impacted and the 15 deep groundwaters are from natural background. The performance of a clusteringmethodcanbeevaluatedbycheckinghowoftenitfindstherightgroupmembershipsofthe 76(¼61þ15)impactedornaturalbackgroundwaters.AftertheMUNsareappliedtothewholedata (93 observations), classification probabilities (Eq. 7) are calculated. Each water is classified into a groupthathasclassificationprobabilitygreaterthan1/2.Thentwosuccessfulclassificationratesand theiraveragearecalculated(Table1):theproportionofirrigationwellgroundwatersthatareclassified totheimpactedwatergroup,andtheproportionofdeepgroundwatersthatareclassifiedtothenatural backgroundwatergroup.Notethatthe17domesticorsurfacewatersareusedformodelestimation,but are not considered in calculating successful classification rates because their group memberships cannotbeassumed.Whenagoodclusteringmethodisused,allormostoftheirrigationwellwaters should be classified to the impacted water group that has higher concentrations of agrochemica- l-originatedelementsandallormostofthedeepgroundwaterstothenaturalbackgroundwatergroup thathaslowerconcentrations.Thismakesbothsuccessfulclassificationrateshigh.Whenaclustering model tends to assign too many observations (waters) to one of the groups, one of the successful Copyright#2006JohnWiley&Sons,Ltd. Environmetrics(inpress) DOI:10.1002/env LOGISTICMIXTUREOFMULTIVARIATEREGRESSIONS Figure2. Histograms of logðNO(cid:1)Þ, log(Ca2þ), log(Kþ), and log(Cl(cid:1)) in waters of the Nakdong River watershed and the 3 estimatedmixtureofnormaldistributionsinsolidlines classificationrateswillbeveryhighandtheotherwillbeverylow.Iftherandomvariableofthemodel contains poor information to identify targeting clusters, both rates will be low. A better clustering methodwillhavebothrateshigh,andthereforetheaveragewillbehigh.Inthispaper,theaverageof rates is used as one of methods in comparing clustering models. Becauseofheavyusageofnitrogenfertilizers,theconcentrationofNO(cid:1)isusuallyregardedasthe 3 bestindicatorforagriculturalpollutants.However,itdoesnotperformasexpectedwiththeNakdong River watershed data. The MUN with NO(cid:1) assigns many natural background waters (10 out of 15 3 naturalbackgroundwaters)totheimpactedwatergroup,inadditiontoassigningallimpactedwaters (61 out of 61 impacted waters) to the impacted water group. In other words, too many waters are assignedtotheimpactedwatergroup(Figure2(a)).Asaresult,theimpactedwatershavesuccessful classificationrate1.000(¼61/61)andnaturalbackgroundwatersdoonly0.333(¼5/15).Theaverage ofthesetworatesis0.667.WhentheMUNsareusedwithKþandCl(cid:1)concentrations,thesuccessful classificationrateshavesimilartendencies,butintheoppositedirection.TheMUNswithKþandCl(cid:1) have rates 0.148 and 0.066 for impacted waters and 0.933 and 0.800 for natural background waters. Averagesoftheseratesare0.541and0.433.PerformanceoftheclusteringmodelbasedonCa2þiseven worse.Impactedwatershave0.164,naturalbackgroundwatershave0.267,andtheaverageoftherates Copyright#2006JohnWiley&Sons,Ltd. Environmetrics(inpress) DOI:10.1002/env Y. JOOETAL. Figure3. Scatter plots of log(Ca2þ), logðNO(cid:1)Þ, log(alkalinity), and pH in Nakdong River watershed: Triangles are deep 3 groundwater(naturalbackgroundwatergroup),circlesareshallowirrigationwater(impactedwatergroup),andsoliddotsare surfacewaterordomesticwellgroundwaters(unknowngroupmembership) Table 1. SuccessfulclassificationrateusingtheMUNs(waterimpactedbyagrochemicalsvs.naturalbackground water) Agrochemical elements(cid:6) NO(cid:1) Ca2þ Kþ Cl(cid:1) 3 Impacted naturalaverage 1.000 0.164 0.148 0.066 Natural 0.333 0.267 0.933 0.800 Average 0.667 0.216 0.541 0.433 (cid:6)Usedinlogscales. Copyright#2006JohnWiley&Sons,Ltd. Environmetrics(inpress) DOI:10.1002/env

Description:
water quality, investigations on the major chemical compositions in alluvial and bedrock (www.interscience.wiley.com) DOI: 10.1002/env.820 . nitrification, lower pH reduces alkalinity of water (as measured by level of HCOА.
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.