The Canadian Mineralogist, Vol. 60, pp. 731-758 (2022)
DOI: 10.3749/canmin.2200010

THE TAXONOMY OF MINERAL OCCURRENCE RARITY AND ENDEMICITY

LIUBOMYR GAVRYLIV§
Comenius University, Faculty of Natural Sciences, Department of Mineralogy, Petrology and Economic Geology, Ilkovičova 6, 84104, Bratislava, Slovakia

VITALII PONOMAR
University of Oulu, Faculty of Technology, Fiber and Particle Engineering Research Unit, Pentti Kaiteran katu 1, 90014 Oulu, Finland

MARKO BERMANEC
University of Bern, Institute of Geological Sciences, Baltzerstrasse 1+3, CH-3012 Bern, Switzerland

MARIÁN PUTIŠ
Comenius University, Faculty of Natural Sciences, Department of Mineralogy, Petrology and Economic Geology, Ilkovičova 6, 84104, Bratislava, Slovakia

§ Corresponding author e-mail address: [email protected]

Downloaded from http://pubs.geoscienceworld.org/canmin/article-pdf/60/5/731/5721758/i1499-1276-60-5-731.pdf by Mineralogical Association of Canada user on 17 November 2022

ABSTRACT

Nearly half of known IMA-approved minerals (as of November 2021) are reported from four localities or fewer and so may be considered rare mineral species. These minerals form a continuum with more common species (e.g., rock-forming minerals), all of which are important constituents of Earth and contributors to its dynamics. To better understand the taxonomy of mineral rarity, evaluations have been made on the basis of k-means clustering and kernel density estimation of one-dimensional data on mineral occurrence metrics. Results from second- and third-degree polynomial regression analyses indicate the presence of a divergence between the observed number of endemic minerals discovered since 2000 and those that are likely to represent "true" endemic species. The symmetry index, calculated using the approach of Urusov for each rarity cluster, reveals a gradual decrease from ubiquitous to endemic from 0.64 to 0.47. A network analysis of element co-occurrences within each rarity cluster suggests the existence of at least three different communities having similar geochemical affinities; the latter may reflect the relative abundance of minerals their elements tend to form. The analysis of element co-occurrence matrices within each group indicates that crustal abundance is not the only factor controlling the total number of minerals each element tends to form. Other significant factors include: (1) the geochemical affinity to the principal element in the group (i.e., sulfur for chalcophile and oxygen for lithophile elements) and (2) dispersion of the principal element through geochemical processes. There is a positive correlation between the lithophile element group's abundance in the Earth's crust and the number of common minerals they tend to form, but a negative correlation with the number of rare species.

Keywords: rarity, endemic minerals, data analysis, symmetry index.

INTRODUCTION

From the biological point of view, rarity may be considered evidence that a species has characteristics that differ from a more common taxon (Kunin & Gaston 1993). In bioecology, rarity is a population characteristic associated with a high risk of extinction, implying that once a biological species becomes extinct, it will not re-emerge (Vermeij & Grosberg 2018). In mineral ecology, most of what are considered endemic minerals are not likely to disappear, except those falling into category 3 of Hazen & Ausubel (2016). Further, metamict minerals and those formed in the mantle may be considered exceptions. However, the question of our current ability to recognize "mineral fossils" should be raised apart from these minerals. Possibly, the general rarity of the bulk of endemic minerals will be diminished through the application of new analytical methods to identify such minerals and the study of hard-to-reach locations, including those of extraterrestrial origin. For instance, ferroskutterudite, (Fe,Co)As₃, was first reported from dolomite-calcite veins of the Noril'sk ore field (Russian Federation; Spiridonov et al. 2007) and was considered endemic until it was described from a second locality, the Puy-les-Vignes breccia pipe, Massif Central, France (Staude et al. 2012). It has been subsequently discovered at Wittichen, Schwarzwald, SW Germany (Harlaux et al. 2015), and at Pira Inferida Yard, Fenugu Sibiri Mine (Sardinia, Italy). By contrast, harstigite, Ca₆MnBe₄(SiO₄)₂(Si₂O₇)₂(OH)₂, known only from its type locality at Harstigen Mine, Persberg ore district (Sweden), has been an endemic mineral for 136 years (Flink 1886).

Nearly 21% (1232) of all IMA-approved minerals (5762 as of November 23, 2021) are known from a single locality, thus making them endemic mineral species. Another 28% (1616) occur at between 2 and 5 localities, meaning that almost half of all known minerals are rare according to the systematics of Hazen & Ausubel (2016), the other half being considered common or widespread. Those that are considered widespread are typically rock-forming minerals, which play critical roles in reflecting the Earth's composition and dynamics. The crust comprises almost 2.5% of the Earth's volume, with fewer than 100 minerals making up 99% of it (Hazen & Ausubel 2016). Of these, 90% are silicates and 75% are tectosilicates (Deer et al. 2004). Despite being widespread in the crust, 22% of the known tectosilicates (i.e., 40 species) are recorded at only one locality on the Earth's surface, indicating the importance of discriminating between the rarity taxonomy of those minerals directly accessible to us and those that are not.

Another example is bridgmanite, which has been estimated to constitute up to 93% of the lower mantle (Tschauner et al. 2014). However, it has been recorded at only one locality at the surface of the Earth, probably owing to its instability under ambient conditions (category 3 according to Hazen & Ausubel 2016). As such, bridgmanite may be considered an endemic mineral sensu stricto, but a widespread mineral sensu lato. Here, mineral rarity is defined based on the distribution of a mineral within the available geospatial data, the latter representing a metric based on the number of localities where minerals have been recorded close to, or at, the surface of the Earth.

Hazen & Ausubel (2016) suggested a prototype of a taxonomy of mineralogical rarity based primarily on the taxonomy of biological rarity. According to this, a mineral may be considered rare if it is known from five or fewer localities. Furthermore, it was proposed that rare species generally conform to one of four distinct categories (with slight overlaps) based on the reason for their rarity: (1) a restricted P-T-X range, (2) rare combinations of essential elements, (3) instability under prolonged exposure to ambient conditions, and (4) difficulties in the recognition of the mineral or challenges associated with its discovery.

As large-number-of-rare-event (LNRE) models depend primarily on the distribution of rarer species (Hazen et al. 2016, 2015a, Hystad et al. 2019b, Grew et al. 2016, 2017, Liu et al. 2018, Hazen et al. 2017, Morrison et al. 2017), it is noted that rare minerals are vital to developing a more complete understanding of overall mineral diversity and to predicting the total number of mineral species yet to be discovered. Based on predictive population models, the total number of "missing" minerals in several different subsets has been evaluated for a range of elements, including: C (Morrison et al. 2020, Hazen et al. 2016), Co (Hazen et al. 2017), Cr (Liu et al. 2017), B (Grew et al. 2016), Li (Grew et al. 2019), and V (Liu et al. 2018) minerals. Since the predicted number of undiscovered minerals is believed to have been underestimated, owing to the acceleration in the rate of mineral discovery over the last decade (Hazen et al. 2015b), Hystad et al. (2019a) employed Bayesian methods in an effort to estimate the total number of mineral species that may exist within the Earth's crust, obtaining a number of ~3700 mineral species which have not yet been discovered. Hazen & Ausubel (2016) emphasized that the precision of mineral ecology studies, which employ statistical methods, is primarily dominated by those mineral species found at one or two localities. Similar to biological ecosystems, where only a few species are widespread, but most species are rare, it was posited that the Earth's minerals might also follow a similar distribution model.

Furthermore, Hazen et al. (2015a) note that up to 15,000 mineral species could plausibly be discovered on Earth-like planetary objects. Of course, this depends on the possibility that these bodies possess environments with P-T-X conditions that are either not present, or remain to be discovered, on Earth. However, even if these bodies began as did Earth, it might be expected that there would be a difference in the number of rare minerals between those found on them and those found on Earth. The latter implies that mineral rarity and abundance cannot yet be explained by a completely deterministic model, at least not without the introduction of additional parameters which are not yet known or sufficiently understood.

The issue of mineral rarity was also highlighted from the crystal symmetry point of view in the works of Urusov (2002, 2007), where the concept of "evolutionary desymmetrization of mineral matter" was proposed. This underpins the gradual increase in the percentage of low-symmetry (particularly triclinic) mineral species as compared to those of higher-symmetry classes. In this context, Urusov (2007) mentions several mechanisms by which secondary, lower-symmetry minerals form at the expense of higher-symmetry species. The general process here is one that follows a passive mineral and symmetry evolutionary model, similar to that proposed by Krivovichev et al. (2018): it implies that while new mineral species of lower symmetry are formed, the precursor, older mineral species continue to form, i.e., they do not disappear entirely. This phenomenon has led to a relative increase in the percentage of rare minerals (mostly lower symmetry) discovered (often within a single deposit and in very low abundance) over the last five decades, primarily through the application of high-resolution analytical techniques. This suggests that a hypothesis of a "natural selection" algorithm may be applied to the mineral kingdom, under which every mineral tends to minimize desymmetrization to ensure stability. The latter is the principle of minimum desymmetrization as proposed by Urusov (2002). Accordingly, large-scale global processes in the Earth's crust favor desymmetrization in the minerals that develop. However, this tendency could be reduced in specific environments with extreme geochemical conditions, precisely those where most rare species have formed.

Rarity, in the context of the discussion presented, is considered more as a relative concept than an absolute one. Mineral species are considered relatively rare compared with the abundance of other species, or based on their sharing unique crystal-chemical properties and restricted P-T-X stability ranges. Therefore, the demarcation point at which a mineral species is defined as being rare becomes subjective. The fact that a mineral species has been discovered at many geographic localities cannot be considered a valid indicator as to whether a mineral may be considered endemic or not, and thus is insufficient to completely address the taxonomy issue. In this context, a new, alternative point of view on the issue is provided, including the presentation of a taxonomy proposal based on the study of mineral rarity as an absolute measure and deciphering the similarities within the rarity classes using machine-learning (ML) techniques.

The present paper: (1) provides a thorough and reliable taxonomy of mineral rarity and classification of IMA-approved minerals into rarity groups based on data analysis; (2) compares these categories in the light of their discovery history and composition in order to reduce the dimensionality, increase interpretability of the taxonomy, and minimize the information loss; and (3) assesses the evolution of mineral rarity.

MATERIALS AND METHODS

The basic data used in this study comes from the list of IMA-approved mineral species, including their mineral formulae, crystal systems, locality counts, discovery years, and classification within the Nickel-Strunz classification scheme. These data were supplemented through subsequent data searches and extraction from other resources. Some of this data is provided by web resources and some is accessible through hard-copy publications. The bulk of information on the crystal chemistry of minerals, their occurrence, associated minerals, and discovery localities was accessed through digital versions of The American Mineralogist, The Canadian Mineralogist, and Mineralogical Magazine journals and reviewed afterward. The IMA list served as a primary skeleton of information and was supplemented with data coming from the original literature.

Materials

The IMA List of Mineral Species. The official list of IMA-approved mineral species (http://cnmnc.main.jp/imalist.htm) is also accessible via the RRUFF Project (https://RRUFF.info/IMA) (Lafuente et al. 2016). The RRUFF Project IMA list allows users to search over 5740 (as of November 12, 2021) species by mineral name, composition, crystal system, space group, point group, unit-cell parameters, origin, paragenetic mode, IMA status, and even water solubility. Additionally, the RRUFF resource provides data on crystal structure groups, oldest known age, Raman spectra, and other physical properties.

The Mineral Evolution Database (MED). The Mineral Evolution Database (Golden et al. 2016), designed primarily for supporting mineral evolution and ecology studies, is another resource freely accessible through the RRUFF Project (https://RRUFF.info/Evolution). The MED contains mineral locality data extracted from mindat.org. As of November 14, 2021, MED data on 810,907 mineral-locality pairs is available. Additionally, the database allows retrieval of subsets of mineral-locality pairs for years between 2016 and 2020. The latter is extremely helpful in assessing the evolution of rarity and endemicity with time.

Athena. The Athena mineral database was the first complete mineral database on the web, created during 1986–1987 at the Department of Mineralogy at the Geneva Natural History Museum. In 1994, the database was published online; since then, it has become accessible to the geoscientific community. The Nickel-Strunz classification used in the current research was kindly provided by Pierre Perroud, the founder of Athena (https://athena.unige.ch/athena/), with his written permission.

Handbook of Mineralogy. Handbook of Mineralogy (http://www.handbookofmineralogy.org) is another web resource for accessing data on 4988 IMA-approved minerals (as of December 2, 2021), maintained by the Mineralogical Society of America (MSA) since 2001. A variety of data is stored on the website: crystal data, physical properties, optical properties, unit-cell data, occurrence, association, distribution, name origins, and references.

Mindat. In rare cases, the number of occurrences and mineral formulae of some rare minerals were cross-checked with the online mineral database Mindat (www.mindat.org) in November 2021. The localities returned were then screened visually and analyzed through external resources to decide whether these localities constitute different geological units or not.

All web resources cited above, other sources used to a minor extent in the research, and online and hard-copy publications used to develop the core analytics of the research, were accessed under "fair use" conditions governed by The Copyright Act of 1976, a United States copyright law (17 USC § 107), and what is known as "fair dealing" in other countries (Canada, Australia, UK, EU, and its Member States). All the data used in this research resides in the public domain and is freely accessible through web interfaces.

All the code developed during the research is provided under MIT license in the public GitHub repository "mineral-rarity", accessible through an open-source geoscientific computing organization called "mineralogy-rocks" (https://github.com/mineralogy-rocks/mineral-rarity).

Methodology

Python 3.10.1 was used along with shared scientific libraries (NumPy, Pandas, scikit-learn, NetworkX, matplotlib) to build up the analytics warehouse and compute the dependencies within the collected data. Several data analysis algorithms were employed for parsing text data like mineral formulae, encoding categorical data into binary ("one-hot encoding" or OHE), and calculating the co-occurrence matrices within the defined rarity groups. They are: (1) data cleaning, (2) data transformation, and (3) data parsing and normalization to at least a second normal form (2NF) where appropriate. In addition, two unsupervised machine learning (ML) techniques were used for testing the hypotheses in regard to data distribution: (1) k-means clustering and kernel density estimation (KDE) of unlabeled one-dimensional data for data classification and (2) ordinary least-squares linear regression models with second- and third-degree polynomial feature transforms for predicting the "true" rate of mineral discoveries. Furthermore, a network analysis of element co-occurrences was employed to encode the interactions between mineral associations of several rarity groups.

Network community structures were computed and measured through divisive and agglomerative community detection algorithms, namely the Girvan-Newman algorithm and greedy modularity maximization based on the Clauset-Newman-Moore algorithm (Clauset et al. 2004), respectively. Greedy modularity maximization focuses on identifying those communities that produce the greatest increase in modularity, beginning with each node in its own community. On the other hand, the Girvan-Newman algorithm detects communities by steadily removing edges from the original graph. At each step, the algorithm removes the "most valuable" edge of the network, traditionally the edge with the highest betweenness centrality. The number of element co-occurrences was used as the weight of each edge.

The work follows PEP-8 rules of Rossum et al. (2001) as much as possible, and the authors recommend that other developers in geoscientific areas follow generally accepted code quality guides. A standardized way of building the data warehouses and analytics pipelines is not only a suggestion, but rather a demand dictated by programming ethics. Moreover, storing the code and history of commits in version-control systems (e.g., GitHub, GitLab, or Bitbucket) under open-access conditions, to track and to manage changes to code developed within the scope of research, is strongly encouraged. The code repository provided here could be treated as a guide to setting up the local development environment, analyzing mineralogical data using modern tools, and writing a descriptive and meaningful commit history.
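The one-hot encoding and element co-occurrence counting described in this section can be illustrated with a minimal sketch. It is not the authors' code: the function names `one_hot` and `cooccurrence` and the three toy element sets are hypothetical, and only the Python standard library is used here, whereas the paper lists Pandas and NetworkX among its tools.

```python
from itertools import combinations


def one_hot(element_sets):
    """Encode each mineral's element set as a binary row (one-hot encoding).

    Returns the sorted element vocabulary and one 0/1 row per mineral.
    """
    vocabulary = sorted(set().union(*element_sets))
    rows = [[1 if el in elements else 0 for el in vocabulary]
            for elements in element_sets]
    return vocabulary, rows


def cooccurrence(vocabulary, rows):
    """Count, for every pair of elements, the minerals that contain both.

    This equals the off-diagonal part of X^T X for a binary matrix X.
    """
    counts = {}
    for i, j in combinations(range(len(vocabulary)), 2):
        shared = sum(row[i] * row[j] for row in rows)
        if shared:
            counts[(vocabulary[i], vocabulary[j])] = shared
    return counts


# Hypothetical toy input: essential elements of three minerals.
toy_sets = [{"Si", "O"}, {"Ca", "C", "O"}, {"Fe", "S"}]
vocab, encoded = one_hot(toy_sets)
edges = cooccurrence(vocab, encoded)
```

Each nonzero pair count in `edges` could then serve as the weight of an edge between two elements when a co-occurrence network is built with a graph library.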
Key definitions

A few key definitions are provided in this section to emphasize the importance of understanding the terminology used in this paper.

Rarity. While rarity is considered as an absolute, not relative, metric, an idea is put forward that once a species' rarity status is assigned, it can shift toward a more common category (case 1) and, in very rare cases, toward a rarer one (case 2) with time. For case 1, the rarity of a mineral follows a "normal rarity flow", which is typical behavior of the majority of species. When a mineral is discovered at n localities, it may subsequently be discovered at other localities (n+1), thus shifting its rarity index toward that of a more common mineral. However, in exceptional cases, a mineral recorded at n localities may have been erroneously reported from one locality, meaning that the absolute number of localities from which the mineral is recorded is n–1. For instance, in 2011, jacutingaite was erroneously reported from Libštát copper mines (Košťálov, Semily District, Liberec Region, Czech Republic), owing to an error introduced during thin-section polishing (Malec et al. 2012).

Another example is samarskite-(Yb), an endemic mineral that was erroneously reported as occurring at a secondary locality in the Pryor Mountains (Big Horn Co., Montana, USA). Upon detailed examination, it was demonstrated that the chemical composition did not in fact match that of samarskite-(Yb). As such, the mineral rarity index would necessarily shift toward that of a rarer one, i.e., following an "inverse rarity flow".

Locality. Particular attention should be drawn to what defines a mineral locality, especially as the term has not yet been standardized. In some instances, this term can have a minimal meaning, such as a locality of a mineral found in a thin section from a sample collected at a set of exact coordinates. Other times, this term has a much broader and less defined meaning, for example, an outcrop, a mine, or a geologic formation. The latter introduces bias in the data set, which prefers microlocalities, these often corresponding to different outcrops of the same rock formation or to a different level within an underground mine.

The data modeling presented in this paper is dependent on the correct number of the localities for a given mineral species and other data used in conjunction with the locality counts. While each locality used in this research constitutes a data point (geographical location of mineral occurrence), it also introduces an immeasurable bias into the results for the reasons mentioned above. Additional caveats include: (1) the number of occurrences does not necessarily reflect the abundance of a given mineral in the Earth's crust; (2) the abundance of a single phase at the locality is not considered; and (3) frequently, a few localities represent different portions of a given quarry, open pit, or mine, all related to the same ore body, geological formation, or stratigraphic unit. In the first case, when considering two different rare species recorded at fewer than five localities, they are very similar in terms of rarity. For instance, charoite is recorded from only three localities, and abramovite occurs only at Kudriavy volcano, Kuril Islands, Russian Federation. Both are considered rare species and are therefore similar from a taxonomical point of view.

However, charoite's abundance at the recorded localities is so high that the name "charoitite" has been applied to those rocks consisting of more than 50% charoite, along with variable proportions of quartz, aegirine, and K-feldspar. In contrast, abramovite occurs as tiny elongated lamellar-shaped crystals, up to 1 × 0.2 mm in size, that are barely visible to the naked eye. Therefore, while charoite and abramovite are similar in terms of the locality-counts metric, they are different in terms of their relative abundances in the upper crust. A rarity taxonomy must consider the overall abundance of a mineral in addition to its spatial distribution to be as robust and complete as possible. However, an abundance parameter could not be introduced into the research due to a lack of data relating to the relative size or morphology of all the minerals considered. Of course, these discrepancies have a greater impact on relatively rare mineral species than on those that are widespread.

Discovery year. In order to assess the rate at which minerals have been discovered, which can then be employed to develop regression models, the year in which a mineral was first discovered was considered as a parameter. Importantly, the discovery year is defined as the year of the first publication or report about the mineral, not the IMA approval or redefinition year.

Mineral abundance. When comparing the abundance of elements in the Earth's crust to the number of minerals where a given element is present in essential quantities, a simple rule was followed: if a chemical element is present in a stoichiometric formula, it is considered an essential constituent. This metric is called a "mineral clarke" (Krivovichev et al. 2018), a proportion of minerals containing a specific element relative to all those minerals present in the data set. The Krivovichev et al. (2018) approach was followed, and this parameter was calculated as a proportion of minerals containing the element but only within the members of a specific rarity group, called "mineral abundance" for clarity (i.e., the mineral abundance of Si is the proportion of minerals in which Si is present among all minerals within the rarity group under consideration). Technically, the mineral abundance of each element was calculated using a regular expression extraction of unique elements from every IMA-approved mineral formula.
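As a hedged illustration of the regular-expression step and of the "mineral abundance" definition above, the sketch below extracts unique element symbols from formula strings and computes the per-group proportion. The pattern, the function names `unique_elements` and `mineral_abundance`, and the four-mineral group are assumptions made for this example; the authors' actual expression is not given in the text.

```python
import re
from collections import Counter

# Element symbol: one capital letter optionally followed by one lowercase letter.
# Assumption: formulae contain only element symbols, digits and punctuation;
# placeholders such as "REE" or a vacancy symbol would need extra handling.
ELEMENT = re.compile(r"[A-Z][a-z]?")


def unique_elements(formula):
    """Return the set of element symbols that appear in a formula string."""
    return set(ELEMENT.findall(formula))


def mineral_abundance(formulae):
    """Proportion of minerals in a group whose formula contains each element."""
    counts = Counter()
    for formula in formulae:
        counts.update(unique_elements(formula))
    return {element: n / len(formulae) for element, n in counts.items()}


# Hypothetical rarity group of four minerals, formulae as plain ASCII strings.
group = ["SiO2", "CaCO3", "FeS2", "Ca6MnBe4(SiO4)2(Si2O7)2(OH)2"]
abundance = mineral_abundance(group)
```

Because each formula contributes a set, an element that appears several times in one formula (Si and O in the last entry) is counted once per mineral, which matches the "present in a stoichiometric formula" rule.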
This data, grouped by mineral species, was used to calculate co-occurrence matrices of elements within each rarity group and to compare the crustal abundance of the elements with their observed distribution within minerals.

Data clustering and classification

The raw localities data is hard to visualize and interpret because of its complexity and vast spread (Fig. 1): some minerals are recorded at only one locality (gaotaiite) while others are recorded at >60,000 localities (quartz). A common technique to delineate raw one-dimensional data into categories is to calculate the frequency distribution over the 25%, 50%, and 75% quantiles. In the case of mineral-locality pairs, 25% corresponds to one locality, 50% to 4, and 75% to 18 localities. The latter appears reasonable enough and suggests a preliminary skeleton for the taxonomy as follows:

(1) minerals occurring at 1–4 localities make up 46% of the data volume. All of these species are rare, while those found at exactly one locality are endemic;
(2) minerals found at 5–18 localities constitute nearly 26% of data volume and could be regarded as transitional between rare and common minerals;
(3) minerals recorded at >18 localities make up 28% of data volume and are relatively common.

FIG. 1. Mineral-locality pairs using: (a) raw data and (b) log-transform (base 10).

Based on the preliminary examination of the quartiles, a set of clustering techniques for unlabeled data was employed to separate samples into n groups of equal variance, to identify similar rarity groups within this type of data, and to further interpret the results in light of the classification obtained.

The mineral versus locality counts pairs were used as entry data for the analysis, since the goal during this step was to find the similarities within localities data rather than test hypotheses. Data preparation involved data cleaning and transformation, including feature scaling, a common requirement for many machine learning estimators, so that individual features look like standard normally distributed data (e.g., Gaussian with zero mean and unit variance).

Two ways to place the data attributes on the same scale were considered: so-called min-max scaling (normalization) and standardization. Considering that mineral-locality count pairs are right-skewed (owing to a large group of minerals occurring at only a few localities), log-transformation (base 10), followed by standardization, was employed to achieve unit variance and to reduce the impact of outliers on the clustering. The standardization component included removing the mean and scaling to unit variance.

K-means is an unsupervised machine learning technique commonly used to organize a sizable multivariate data set into classes and to recognize typical characteristics. For the k-means algorithm, often referred to as Lloyd's algorithm (Lloyd 1982), there are three principal hyperparameters to set up to define the model's configuration: (1) initial values of clusters, (2) distance measures, and (3) the number of clusters (k value).

FIG. 2. Visual representation of the Elbow method: within-cluster sum of squares (WCSS) and the number of clusters.

FIG. 3. KDE plots calculated using the Epanechnikov (1969) method on one-dimensional standardized and scaled locality counts with various bandwidths.
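The scaling and clustering steps described above can be sketched in a few lines. This is an illustrative assumption-laden example, not the authors' pipeline: it uses only the standard library instead of scikit-learn, the helper names `log_standardize` and `kmeans_1d` are hypothetical, the eight locality counts are invented, and the starting centers are placed deterministically rather than seeded at random.

```python
import math
import statistics


def log_standardize(counts):
    """Base-10 log transform followed by standardization (zero mean, unit variance)."""
    logs = [math.log10(c) for c in counts]
    mean = statistics.fmean(logs)
    std = statistics.pstdev(logs)  # population standard deviation
    return [(x - mean) / std for x in logs]


def kmeans_1d(values, k, max_iter=300):
    """Lloyd's algorithm on one-dimensional data.

    Assumption: centers start at evenly spaced positions in the sorted data,
    a deterministic stand-in for randomized seeding.
    Returns (labels, centers, wcss).
    """
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest center by absolute distance.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(statistics.fmean(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    # Within-cluster sum of squares, the quantity inspected in an elbow plot.
    wcss = sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, centers, wcss


# Hypothetical locality counts for eight minerals.
counts = [1, 1, 1, 2, 3, 20, 40, 60000]
scaled = log_standardize(counts)
labels, centers, wcss = kmeans_1d(scaled, k=3)
```

Running `kmeans_1d` for a range of k values and comparing the returned `wcss` is one simple way to look for a bend in the curve.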
localitycountswithvariousbandwidths. thenumberoftimesthek-meansalgorithmwillberun in the analysis. Results coming from this choice withdifferentcentroidseeds,and300asthemaximum suggest the existence of the following clusters within number of iterations of the k-means algorithm for a thedata: single run. In addition, the k-meansþþ allows the initialization of the centroids distant from each other. (1) highly dense cluster with minerals occurring at 1 The number of clusters (k) is the most critical locality; hyperparameter in k-means clustering. In order to (2) 2–4localities; calculate the optimal k, both elbow and silhouette (3) 5–16localities; (Rousseeuw 1987) methods wereused. (4) 17–70localities; Severalk-meanswererun,incrementedkwitheach (5) mineralsoccurring at .70localities. iteration, and recorded the sum of the squared distances(within-clustersumofthesquare)witheach Another powerful method to identify intervals run to perform the empirical elbow method (Fig. 2). within a one-dimensional array of data is KDE, often The‘‘elbowpoint’’iswherethewithin-clustersumof referred to as the Parzen-Rosenblatt window method square (WCSS) curve starts to bend. The x-value of (Rosenblatt1956,Parzen1962),anon-parametricway thispointfortheuseddatasetisbetweenk¼4andk¼ to estimate the probability density function of a 7,meaningthattheoptimalnumberofclusterswould random variable. The KDE method is also regarded be within thisrange. as an unsupervised clustering approach and is Thesilhouettecoefficientisanothermethodtofind instrumental in estimating the class-conditional mar- the optimal number of clusters and measures cluster ginal densities of data. During the study, Gaussian, cohesion and separation. The technique allows the Epanechnikov,andtophat(rectangular)methodswere identification of how well each object has been used to calculate KDE over various bandwidth classified based on two factors: (1) how similar the parameters from 0.1 to 0.9 at 0.05 step. 
The dataobjectistootherobjectsinitscluster(cohesion) Epanechnikov (1969) method revealed the highest and(2)howdifferentthedataobjectisfromobjectsin performance and data fit (Fig. 3). The bandwidth other nearestcluster (separation). affects how ‘‘smooth’’ the resulting KDE curve is, Despitealloftheclustersbeingabovetheaverage controlling the tradeoff between bias and variance in silhouettescores,thesilhouetteanalysisshowsthatthe the result. This approach was taken to determine the cluster values of 2–4 are a poor choice for the given optimal number of clusters by first finding dense data,duetowidefluctuationsinthesizeoftheclusters regions in data and focusing on features described by that result. Silhouette analysis is more ambivalent in localmaxima(modes)andlocalminima(anti-modes) decidingbetween5and7.Whentheclusternumberis (Bugrien et al. 2014). Accordingly, KDE using the equal to 5, all the clusters are more or less of similar Epanechnikovmethodimpliestheexistenceofatleast thickness. For this study, and to better differentiate 6 margins within the data: 1, 2, 4, 5, and between 12 mineralsoccurringatpreciselyonelocality,akvalue and 20 localities, depending on the bandwidth ¼5wasselectedfortheinitialnumberofclustersused parameter. Downloaded from http://pubs.geoscienceworld.org/canmin/article-pdf/60/5/731/5721758/i1499-1276-60-5-731.pdf by Mineralogical Association of Canada user on 17 November 2022 738 THECANADIANMINERALOGIST TABLE1.THETAXONOMYOFRARITYGROUPS ofmineraldiscoveryandthediscoveryyear.Thelatter iscovered inthefollowing subsection ofthepaper. Rarity Rarity Numberof Numberof group subgroup Index localities minerals b) R:rare subgroup Rare Endemic RE 1 1232 These minerals are not endemic anymore, but are Rare RR 2–4 1473 stillvery rare,taxonomically. Transitional Rare TR 5–16 1313 Ubiquitous TU 17–70 840 2. 
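The kernel density step and the grouping summarized in Table 1 can be illustrated with a small standard-library sketch. The function names (`epanechnikov_kde`, `antimodes`, `rarity_index`) and the grid-based search for local minima are assumptions made for this example; the paper reports using scikit-learn, and its exact procedure for locating anti-modes is not spelled out in the text.

```python
def epanechnikov_kde(samples, x, bandwidth):
    """Density estimate at x with the Epanechnikov kernel K(u) = 0.75 * (1 - u^2), |u| <= 1."""
    total = 0.0
    for s in samples:
        u = (x - s) / bandwidth
        if abs(u) <= 1.0:
            total += 0.75 * (1.0 - u * u)
    return total / (len(samples) * bandwidth)


def antimodes(samples, bandwidth, grid_size=200):
    """Grid positions where the estimated density has a strict local minimum."""
    lo, hi = min(samples), max(samples)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]
    dens = [epanechnikov_kde(samples, g, bandwidth) for g in grid]
    return [grid[i] for i in range(1, grid_size - 1)
            if dens[i] < dens[i - 1] and dens[i] < dens[i + 1]]


def rarity_index(n_localities):
    """Map a locality count to the Table 1 index (RE, RR, TR, TU or U)."""
    if n_localities < 1:
        raise ValueError("a recorded mineral has at least one locality")
    if n_localities == 1:
        return "RE"
    if n_localities <= 4:
        return "RR"
    if n_localities <= 16:
        return "TR"
    if n_localities <= 70:
        return "TU"
    return "U"
```

A strict-minimum test on a grid ignores flat stretches of zero density, so widely separated samples with a small bandwidth would need a different rule; that limitation is a property of this sketch, not of the method in the paper.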
T: transitional group – minerals occurring at 5–70 localities (2153 minerals): Ubiquitous U .70 701 The term ‘‘transitional species’’ designates those speciesthatarenolongerrarebutarenotyetcommon THE TAXONOMY OF MINERAL RARITY enough at the same time, i.e., they are transitional betweenthesetwocategories.Generally,thesespecies It must be noted that the categorization based on can relatively quickly transform through a ‘‘normal the k-means clustering results, the KDE implications, rarityflow’’,duetotheirdiscoveryatnewlocationsor and quantile assessment broadly intersects, especially the intensive application of new analytical methods. intherangeofrarerspecies,sincemorethanahalfof For instance, more than 4500 IMA-approved species thedatavolumeisconcentratedwithintherangeof1– were discovered during the period 1950–2021. Once 4 localities. Therefore, all the results were taken into discovered, almost every mineral was regarded as account, and the locality counts data were classified endemic or rare. However, some of them were later into the following groups (Table 1), each of which discovered at enough additional localities to shift the represents therarity taxonomy ofa mineralspecies: rarity status of these species toward a more common state, into a ‘‘transitional state’’, in this case. The 1. R:raregroup–mineralsoccurringat,5localities transitionalspeciesaredividedintotwosubcategories, (2705 minerals) delineatingthetypeof thetransition: This group is well-described byHazen & Ausubel a) R:rare subgroup (2016),whodefined‘‘rare’’mineralsasthoserecorded fromfiveorfewerlocalities.Whiletheywerethefirst Themineralsofthissubgroupshouldbeconsidered to make an in-depth analysis of mineral rarity and to rarewithinabroaderextent.However,theyarehighly likely to become common and transfer into the next pose the question in regard to taxonomy, this paper stage withthecontinuation ofmineral exploration. 
recommends limiting this range to 1–4, since all algorithms, especially the KDE, detect an extremum b) U: ubiquitoussubgroup between4and5localities.Additionally,theraregroup is further split into two distinct and dense subcatego- In contrast, the minerals of this subgroup are no ries: longerconsideredrare.Theyarenotlikelytobecome rareby‘‘inverserarityflow’’,andtheytendtobecome a) E:endemicsubgroup ubiquitous. Once a new mineral is discovered, it is often 3. U: ubiquitous group – minerals occurring at .70 reported as endemic unintentionally. For instance, of localities (701 intotal). 65 minerals approved by IMA in 2021, 57 were reported from only one locality (as of November 11, Thesemaybeconsideredasrock-formingminerals thatare relatively widespread in distribution. 2021) and thus constitute endemic species. However, nitscheite, a new endemic mineral with a uranyl- ‘‘True’’ endemicity sulfatesheet,wasfirstreportedfromtheGreenLizard Mine(Utah,USA)andsubsequentlydiscoveredatthe Itisnotedthattheproportionofobservedendemic Krunkelbach Valley Uranium deposit (Freiburg Re- minerals has increased steadily since 1800 (Fig. 4), gion, Germany), shortly after the discovery of the increasingfrom12.5%in1839to87.6%in2021,with mineral at the type locality. The latter indicates the only slight deviations from the general trend prior to importance of finding the criteria to discriminate 2011.Asteepincreaseisobservedstartingfrom2012. betweennewlydiscoveredspeciesthatare‘‘endemic’’ Thelattercouldbeattributedtoageneralclimbinrate conventionally (RE) and ‘‘true’’ endemic minerals of mineral discovery during this period. However, (tRE). 
Accordingly, two more factors should be uponcloserexamination,itcanbeseenthatdespitea consideredwhenassessing‘‘trueendemicity’’:therate similarrateofmineraldiscoveryovertheperiod2009– Downloaded from http://pubs.geoscienceworld.org/canmin/article-pdf/60/5/731/5721758/i1499-1276-60-5-731.pdf by Mineralogical Association of Canada user on 17 November 2022 THETAXONOMYOFMINERALOCCURRENCERARITYANDENDEMICITY 739 2011, the proportion of endemic minerals was relatively lower compared to that over the period 2012–2021. Considering the principle of ‘‘normal rarity flow’’, a hypothesis can be made that a portion oftheobservedREmineralsdiscoveredduringthelast decade will become RR over the next few years. To decipher the discovery rate of ‘‘true’’ endemicity and determine the optimal features required for training, the ML model was trained, and several linear regression models were tested, using four different spans of years: (1) 1900–1990, (2) 1900–2000, (3) 1900–2011, and (4) 1900–2021. The presumption is that the discovery year and the number of minerals discovered during a specific year are independent features during this analysis, while the number of FIG.4.Discoveryrateofendemicminerals. endemicmineralsdiscovereddependsonthem.Before trainingthemodels,thenonlinearfeatureoftheinput datawasconsidered,and asecond-degreepolynomial feature transformation was applied. After the models were trained, the results were compared with those trained with third-degree polynomial transformed features (Fig. 5). FIG.5.Polynomialregressionmodelpredictionswith95%confidenceintervalsofendemicmineraldiscoveryratesusing2nd and3rddegreepolynomialtransform. 
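The comparison of second- and third-degree polynomial transforms described above can be sketched as follows. This is a minimal illustration, not the authors' code: the yearly discovery counts and the endemic counts are synthetic, the function names (`poly_features`, `fit`, `predict`, `r2_rmse`) are hypothetical, and ordinary least squares is solved through the normal equations as a stand-in for whatever regression library was actually used. Only the general workflow follows the text: two input features (discovery year and number of minerals discovered that year), a polynomial feature expansion, a shuffled 90%/10% train/test split, and evaluation by R2 and RMSE. Explained variance and confidence intervals are not computed here.

```python
import random
from itertools import combinations_with_replacement
from math import prod, sqrt

def poly_features(row, degree):
    """All monomials of the inputs up to `degree`, bias term first."""
    return [prod(row[i] for i in combo)
            for d in range(degree + 1)
            for combo in combinations_with_replacement(range(len(row)), d)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit(rows, y, degree):
    """Least-squares coefficients via the normal equations (X^T X) w = X^T y."""
    X = [poly_features(r, degree) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, rows, degree):
    return [sum(c * f for c, f in zip(coef, poly_features(r, degree))) for r in rows]

def r2_rmse(y, yhat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot, sqrt(ss_res / len(y))

# Synthetic stand-in data (NOT the paper's data): one row per year, 1900-2000.
years = list(range(1900, 2001))
discovered = [10 + (7 * y) % 13 for y in years]            # made-up yearly counts
rows = [((y - 1950) / 50, (n - 16) / 6) for y, n in zip(years, discovered)]  # scaled
endemic = [3 + 2 * t + 1.5 * n + 4 * t * t for t, n in rows]  # made-up quadratic target

# Shuffle, then split 90% / 10% into training and testing sets.
idx = list(range(len(rows)))
random.Random(0).shuffle(idx)
cut = int(0.9 * len(idx))
train, test = idx[:cut], idx[cut:]

results = {}
for degree in (2, 3):
    coef = fit([rows[i] for i in train], [endemic[i] for i in train], degree)
    pred = predict(coef, [rows[i] for i in test], degree)
    results[degree] = r2_rmse([endemic[i] for i in test], pred)
    print(f"degree {degree}: R2 = {results[degree][0]:.4f}, RMSE = {results[degree][1]:.2e}")
```

Because the synthetic target is exactly quadratic and noise-free, both degrees fit it almost perfectly here; with real, noisy counts the third-degree model has more freedom to follow the noise, which is the overfitting risk the comparison is meant to expose.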
TABLE 2. EFFICIENCY OF POLYNOMIAL REGRESSION MODELS WITH DIFFERENT TRAINING SETS

Model estimates

               2nd-degree polynomial transform          3rd-degree polynomial transform
Training set   R2       RMSE      Explained variance    R2       RMSE      Explained variance
1900–1990      0.6444   1.1685    0.6444                0.7519   0.9759    0.7659
1900–2000      0.9432   0.7781    0.9432                0.9360   0.8259    0.9360
1900–2011      0.6889   2.7114    0.6900                0.6805   2.7477    0.6824
1900–2021      0.7527   12.4175   0.7556                0.7940   11.3352   0.7977

In some cases, models trained on third-degree polynomial transformed features lead to "overfitting", specifically when training is performed using data over the range of 1900–2021. Additionally, modeling is somewhat dependent on the set of initial features that were used. Therefore, the initial data for each run was split into a training set (90% of data) and a testing set (10%), which was followed by features shuffling, model training, running the predictions, and finally, evaluating the efficiency of the model (Table 2).

Taking into account the results of model fits and a general presumption that the number of endemic minerals observed during the last decade has been overestimated, the features set for the period 1900–2000 transformed to a second-degree polynomial was chosen for estimating the preliminary gap between the values of "true" endemicity and those observed. The predicted target values for the period of 2000–2021 are 17–33 for "true" endemic minerals, with ~550 total endemic species. This number is only approximate, but the approach and the results obtained generally imply that: (1) the "true" endemicity rate follows the same general trend as rate of discovery, but with a tendency to increase at a rate lower in magnitude, and (2) the ~380 minerals discovered after 2000 are more likely to be classified as RR rather than RE, taxonomically. The latter indicates that the total number of tRE minerals might be lower than that of RE (~850 versus 1232), while the number of RR could be higher (~1750 versus 1473). However, these observations do not influence the proposed classifications, since any deviations only relate to rare group species, specifically RE and tRE. Taking into account that the "true" discovery rate of endemic mineral species clearly deviates from the observed rate of discovery, separate data analytics are provided for all endemic minerals (RE), along with data for those discovered before 2000 (tRE), where appropriate.

COMPOSITION OF RARITY GROUPS

Rare group R(RE, RR, TR)

Over the period 1800–1950, 323 rare (20 RE, 75 RR, and 228 TR) minerals were discovered, and another 3639 (1165 RE including 276 tRE, 1306 RR, and 1168 TR) were discovered after 1950. The majority of rare group minerals are silicates (RE: 317 including 78 tRE, RR: 379, TR: 332), along with phosphates, arsenates, and vanadates (RE: 286 including 72 tRE, RR: 264, TR: 212). The most uncommon are so-called organic minerals (RE: 24 including 12 tRE; RR: 16, TR: 12) and those containing native elements (RE: 29 including 5 tRE, RR: 37, TR: 34) (Fig. 6). Each group of RE, RR, and TR exhibits similar distributions of those minerals representing different crystal-chemical groups, thus suggesting a general homogeneity within values for (RE, RR, TR).

Nearly 70% of RE, RR, and TR mineral species exhibit low symmetry, i.e., belonging to the triclinic, monoclinic, and orthorhombic crystal systems, with only 7.3% of those minerals in the RE and RR groups being cubic, rising slightly to 8.9% in the TR group (Fig. 7). As in the case of chemical groups, the distribution patterns in the RE, RR, and TR groups follow similar patterns when the proportions of those minerals with different crystal systems are compared. Following the approach used by Urusov (2002), the symmetry index is defined as (H+M)/L, where H is the percentage of species with the highest symmetry category and M and L are the percentages of the intermediate and lowest categories. Applying this, the symmetry indices vary between 0.47 and 0.48 for RE and RR groups, respectively, while TR exhibits a slightly higher value of 0.54. The latter is still below the index of 0.55 calculated by Urusov (2007) for 3314 rare species.

Taking into account the chemical similarities among values for R(RE, RR, TR) and the difference in symmetry indices between R(RE, RR) and TR, further
