Thesis submitted in fulfilment of the requirements for Degree of Doctor of Philosophy

Antarctic biodiversity surveys using high throughput sequencing: understanding landscape and communities of the Prince Charles Mountains

Paul Czechowski
December 2015
School of Biological Sciences

Difficulties are just things to overcome, after all - Ernest Shackleton

Contents

Abstract
Publications, Presentations, Awards
Thesis Declaration
Acknowledgments
Chapter Format

1. Antarctic terrestrial biodiversity and environmental metagenetics
  1.1. Introduction
  1.2. Technical considerations
    1.2.1. Extraction of environmental samples
    1.2.2. High throughput platforms
    1.2.3. Marker choice
    1.2.4. Library generation
    1.2.5. Amplification
    1.2.6. Sequence analysis
    1.2.7. Recent improvements of metagenetic HTS approaches
  1.3. The potential of metagenetics for Antarctic biology
    1.3.1. Community structures
    1.3.2. Geographic distribution of Antarctic biota
    1.3.3. Supporting conservation efforts
  1.4. Summary and conclusions
  1.5. Acknowledgements
  1.6. Authors contributions

2. Eukaryotic soil diversity of the Prince Charles Mountains
  2.1. Introduction
  2.2. Methods and materials
    2.2.1. Fieldwork, soil storage and DNA extraction
    2.2.2. Amplification and library generation
    2.2.3. Read processing
    2.2.4. Eukaryotic α and β diversity comparison
    2.2.5. Distribution of phylotypes across sites
    2.2.6. Species-level assignment of phylotypes
  2.3. Results
    2.3.1. Read processing
    2.3.2. Eukaryotic α and β diversity comparison
    2.3.3. Distribution of phylotypes across sites
    2.3.4. Species-level assignment of phylotypes
  2.4. Discussion
    2.4.1. Technical considerations
    2.4.2. Differences in eukaryotic diversity among three locations
    2.4.3. Distribution of highly abundant phylotypes
    2.4.4. Validity of species-level taxonomic assignments
  2.5. Summary and conclusions
  2.6. Acknowledgements
  2.7. Supplemental data
  2.8. Supplemental information
    2.8.1. Methods and Materials
    2.8.2. Results and comments

3. Phylotypes and morphotypes of Antarctic invertebrates
  3.1. Introduction
  3.2. Methods
    3.2.1. Samples
    3.2.2. DNA extractions
    3.2.3. Primers
    3.2.4. Amplification and sequencing
    3.2.5. Reference data for taxonomic assignments
    3.2.6. Generation of phylotype observations
    3.2.7. Selection of processing parameters for 18S and COI phylotypes
    3.2.8. Concordance between phylotypes and morphotypes
  3.3. Results
    3.3.1. Selection of analysis parameters
    3.3.2. Concordance between morphotypes and phylotypes
  3.4. Discussion
    3.4.1. Analysis parameters
    3.4.2. Detecting cryptic invertebrates
    3.4.3. Metagenetic marker choice for Antarctic invertebrates
  3.5. Conclusions
  3.6. Acknowledgements
  3.7. Supporting information
  3.8. Supplemental information
    3.8.1. Methods and Materials
    3.8.2. Tables and Figures
    3.8.3. Analysis code

4. Salinity gradients determine invertebrate distribution
  4.1. Introduction
  4.2. Methods
    4.2.1. Fieldwork
    4.2.2. Soil geochemical and mineral analysis
    4.2.3. Preparation and analysis of environmental observations
    4.2.4. Preparation and analysis of biological observations
    4.2.5. Constrained ordination
  4.3. Results
    4.3.1. Environmental data
    4.3.2. Biological data
    4.3.3. Biological data in relation to environment
  4.4. Discussion
  4.5. Conclusion
  4.6. Data accessibility
  4.7. Authors contributions
  4.8. Acknowledgements
  4.9. Funding
  4.10. Supplemental information
    4.10.1. Phylotype data generation for 18S and COI
    4.10.2. Sequence tag selection and amplicon orientations
    4.10.3. Intermediate results of environmental data processing
    4.10.4. Intermediate results of biological data processing
    4.10.5. Intermediate results of biological data in relation to environment
    4.10.6. Data and analysis scripts, additional figures and tables

5. Synthesis
  5.1. Summary
    5.1.1. Technical and computational methods
    5.1.2. Biodiversity information from the Prince Charles Mountains
  5.2. High throughput sequencing for Antarctica
  5.3. Implications and future improvements
  5.4. Conclusion
  5.5. Acknowledgements

A. Phylotype information chapter 2
B. Analysis code chapter 3
C. Analysis code chapter 4
D. Molecular tagging of amplicons

Bibliography

Abstract

Antarctic soils are home to small, inconspicuous organisms including bacteria, unicellular eukaryotes, fungi, lichen, cryptogamic plants and invertebrates. Antarctic soil communities are distinct from other soil biota as a consequence of long-term persistence under harsh environmental conditions; furthermore their long history of isolation is responsible for a high degree of endemism. Of major concern is the establishment of non-indigenous species facilitated by human-mediated climate change and increased human activity, threatening the highly specialised endemic species. A lack of baseline information on terrestrial Antarctic biodiversity currently impairs efforts to conserve the unique but still largely unknown Antarctic biota.

In this thesis I apply metagenetic high throughput sequencing (MHTS) methods to address the deficiency of biological information from remote regions of continental Antarctica, and use the data generated to explore environmental constraints on Antarctic biodiversity.

In Chapter 1, I introduce current issues impeding the generation of baseline Antarctic biodiversity data and evaluate the application of using MHTS techniques. This review highlights the potential of using MHTS approaches using amplicon sequencing to retrieve Eukaryotic biodiversity information from terrestrial Antarctica. In Chapter 2, the eukaryotic diversity of three biologically unsurveyed regions in the Prince Charles Mountains, East Antarctica (PCMs) is explored.
Total eukaryote biodiversity in the PCMs appears to follow an altitudinal or latitudinal trend, which is less obvious for terrestrial invertebrates. In order to apply MHTS to the study of Antarctic invertebrates, the comparative taxonomic assignment fidelities of metagenetic markers and morphological approaches are explored in Chapter 3. Fidelities of taxonomic assignments to four Antarctic invertebrate phyla differed depending on metagenetic marker, and only application of non-arbitrary sequence processing parameters resulted in these findings. In Chapter 4, I use MHTS-derived biodiversity information to explore the relationship between soil properties and invertebrate biodiversity in the PCMs. Across large spatial scales distribution of phyla Tardigrada and Arachnida and classes Enoplea (Nematoda) and Bdelloidea (Rotifera) in inland areas are constrained by terrain-age-related accumulation of salts, while other Classes (Chromadorea, Nematoda and Monogonata, Bdelloidea) are better able to tolerate high salinity. In moister, nutrient-richer and more coastal areas, this effect was less pronounced and a higher invertebrate diversity was found.

The methods applied and developed in this thesis are a valuable starting point to advance the collection of biodiversity information across terrestrial Antarctica and other remote habitats. The work presented here provides examples for generation and usage of MHTS information from remote Antarctic habitats, demonstrates how biodiversity information retrieved using different metagenetic markers can be combined, developed methods for assessing the quality of MHTS markers and finally demonstrated the application of MHTS data to investigate the environmental determinants of invertebrate diversity in remote ice-free habitats. Future MHTS biodiversity studies of Antarctic terrestrial habitats should incorporate large sample numbers and use combined data from multiple genetic markers.

Publications, presentations and awards

Publications

2013  Laurence J. Clarke, Paul Czechowski, Julien Soubrier, Mark I. Stevens, Alan Cooper (2014). Modular tagging of amplicons using a single PCR for high-throughput sequencing. Molecular Ecology Resources. 14, 117–121.¹

¹ See Appendix D

Conference presentations

2014  2014 SCAR Open Science Conference (SCAR, SkyCity, Auckland, New Zealand): Paul Czechowski, Duanne White, Laurence J. Clarke, Alan Cooper, Mark I. Stevens (April): High-throughput DNA sequencing reveals Antarctic soil biodiversity.

2014  Understanding Biodiversity Dynamics using diverse Data Sources (CBA, Australian National University, Canberra, Australia): Paul Czechowski, Duanne White, Laurence J. Clarke, Alan Cooper, Mark I. Stevens (April): High-throughput DNA sequencing reveals Antarctic soil biodiversity.

2013  Biodiversity Genomics Conference (CBA, Australian National University, Canberra, Australia): Paul Czechowski, Laurence J. Clarke, Alan Cooper, Mark I. Stevens (April): Exploring Distribution and Evolution of Antarctic Invertebrates using High-Throughput Sequencing.

Awards

2014  Australian Government's National Taxonomy Research Grant Program - Student Travel Grant, Australian Biological Resources Study, Government of Australia, $1650, travel support for conference attendance.

2013  MiSeq Pilot the Possibilities Grant Program, Illumina Australia, one library reagents kit for experimental libraries.

2012  Small Research Grants Scheme, The Royal Society of South Australia, $1500, for laboratory work.

2012  International Post-Graduate Research Scholarship, The University of Adelaide, $25849 p.a., livelihood.