Current Biology, Volume 24
Supplemental Information
Lactase Persistence Alleles Reveal Partial East African Ancestry of Southern African Khoe Pastoralists
Gwenna Breton, Carina M. Schlebusch, Marlize Lombard, Per Sjödin, Himla Soodyall, and Mattias Jakobsson

Inventory of Supplemental Information
Figures S1a-c. Haplotype plots. Related to Figure 2
Figure S2. Genome local ancestry inference. Related to Figure 2
Figure S3. ADMIXTURE runs. Related to Figure 3
Figure S4. TREEMIX analysis. Related to Figure 3
Figure S5. Principal Components Analysis. Related to Figure 3
Figure S6. Selection scan results. Related to Results and Discussion
Figure S7. Selection coefficient estimation. Related to Results and Discussion
Table S1. Population information. Related to Table 1
Table S2. Polymorphism frequencies. Related to Table 1
Table S3. Formal tests of admixture and dating of admixture times. Related to Results and Discussion
Table S4. Population information for admixture analyses. Related to Figure 3
Supplemental Introduction
Supplemental Experimental Procedures
Supplemental Discussion
Supplemental References

Supplementary Figures:

Figure S1a: Haplotype plots over a 54.6 kb region of chromosome 2, showing the haplotype block surrounding the East African LP-variant in the merged dataset. All individuals in the southern African dataset carrying the G variant at position 136,608,746 were extracted, and only haplotypes carrying the G variant at this position (as identified in the southern African-only dataset) were visualized, together with all MKK haplotypes (without any filtering). The consensus sequence, showing the major allele in the Nama population, is presented at the top of the figure. Haplotypes of individuals are shown below the consensus sequence; positions that differ from the consensus sequence are colored red, while positions identical to the consensus sequence are shown in blue.
The Y-axis indicates the population group to which each haplotype belongs, and the X-axis gives the base-pair position of each SNP on chromosome 2.

Figure S1b: Haplotype plot for southern African individuals over a 54.6 kb region of chromosome 2, showing the haplotype block surrounding the European LP-variant. Individuals carrying the European variant, 13910T, at position 136,608,646 on chromosome 2 (homozygous and heterozygous) were extracted. The haplotypes of these individuals were then sorted according to the variant they carry at position 136,608,646 and visualized here. The consensus sequence, showing the major allele in the Coloured Wellington population, is presented at the top of the figure. Haplotypes of individuals are shown below the consensus sequence; positions that differ from the consensus sequence are colored red, while positions identical to the consensus sequence are shown in blue. The Y-axis indicates the population group to which each haplotype belongs, and the X-axis gives the base-pair position of each SNP on chromosome 2. The position of the European LP-variant is highlighted in green on the X-axis. A green block outlines the common haplotype background (haplotype block) associated with the European LP-variant.

Figure S1c: Haplotype plots over a 54.6 kb region of chromosome 2, showing the haplotype block surrounding the European LP-variant in the merged dataset. All individuals in the southern African dataset carrying the T variant at position 136,608,646 were extracted, and their haplotypes were sorted according to the variant at position 136,608,646 before being visualized together with all CEU haplotypes (without any filtering). The consensus sequence, showing the major allele in the Coloured Wellington population, is presented at the top of the figure.
Haplotypes of individuals are shown below the consensus sequence; positions that differ from the consensus sequence are colored red, while positions identical to the consensus sequence are shown in blue. The Y-axis indicates the population group to which each haplotype belongs, and the X-axis gives the base-pair position of each SNP on chromosome 2. The position of the European LP-variant is highlighted in green on the X-axis.

Figure S2: Genome local ancestry inference in the region surrounding the LCT and MCM6 genes (positions highlighted in red on the X-axis), using Ju’/hoansi San (red), East African Maasai (blue), West African Yoruba (green) and European HapMap CEU (yellow) as parental populations. The X-axis shows the position along chromosome 2, and the bars along the Y-axis are the 14 Nama chromosomes; for each individual, the Y-axis shows the number of assignments to each of the four parental populations. Symbols on the right indicate whether the 14010C mutation was present in the particular individual (each individual is represented by two horizontal bars, one per chromosome): $ for homozygous 14010C; # for heterozygous 14010C/G; + for homozygous 14010G (these variants were typed separately by sequencing the LCT control region).

Figure S3: Full ADMIXTURE results. Genetic clustering analysis of 766 individuals from 43 populations (233,363 SNPs), assuming 2 to 11 clusters (K = 2-11).

Figure S4: TREEMIX results. A) Tree of Khoe-San populations with comparative East and West African populations, assuming 3 migration edges. B) Tree of Khoe-San populations with comparative European, East and West African populations, assuming 6 migration edges.

Figure S5: PCA results. A) First four principal components (PCs), including all Khoe-San populations together with East African, West African and European comparative populations. B) First two PCs of only the Nama together with East African, West African and European comparative populations.

Figure S6: Selection scans.
|iHS| values in the region of chromosome 2 from 135.5 to 137.5 Mb, including the LCT and MCM6 genes. Black dots are individual |iHS| values, while the gray line is a sliding-window average over 30 SNPs with a step length of 1 SNP. The orange horizontal bars give the genome-wide |iHS| average for each population, and the vertical lines indicate the standard deviation. The locations of genes in the region are shown by blue rectangles, with gene names given above.
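The gray curve in Figure S6 is described as an average of |iHS| over 30 SNPs with a 1-SNP step, and the orange bars as a genome-wide mean with its standard deviation. The Python sketch below illustrates that smoothing under stated assumptions: per-SNP iHS scores are taken as already computed and standardized elsewhere; the function names `sliding_abs_ihs` and `genome_wide_abs_ihs` are hypothetical; placing each window at the midpoint of its first and last SNP and using the sample standard deviation are illustrative choices, not necessarily what was done for this figure.

```python
from statistics import mean, stdev


def sliding_abs_ihs(positions, ihs_values, window=30, step=1):
    """Mean |iHS| over consecutive windows of `window` SNPs, moving `step` SNPs at a time.

    `positions` and `ihs_values` are parallel lists ordered along the chromosome.
    Returns (window_center_bp, mean_abs_ihs) tuples, one per complete window.
    ASSUMPTION: the window center is the midpoint of its first and last SNP.
    """
    if len(positions) != len(ihs_values):
        raise ValueError("positions and ihs_values must have the same length")
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    abs_vals = [abs(v) for v in ihs_values]
    result = []
    # Advance the window `step` SNPs at a time; an incomplete tail window is dropped.
    for start in range(0, len(abs_vals) - window + 1, step):
        end = start + window
        center = (positions[start] + positions[end - 1]) / 2
        result.append((center, mean(abs_vals[start:end])))
    return result


def genome_wide_abs_ihs(ihs_values):
    """Genome-wide mean and sample standard deviation of |iHS| (needs >= 2 values).

    ASSUMPTION: sample (n - 1) standard deviation; the figure does not specify.
    """
    abs_vals = [abs(v) for v in ihs_values]
    return mean(abs_vals), stdev(abs_vals)


if __name__ == "__main__":
    # Toy data for illustration only; these are not values from this study.
    toy_positions = [135_500_000 + 20_000 * i for i in range(100)]
    toy_ihs = [((-1) ** i) * (i % 7) / 3 for i in range(100)]
    windows = sliding_abs_ihs(toy_positions, toy_ihs)
    print(len(windows))  # 71 complete 30-SNP windows from 100 SNPs
    print(genome_wide_abs_ihs(toy_ihs))
```

Because the step is a single SNP, n SNPs yield n - 30 + 1 overlapping windows. The windows are defined by SNP count rather than physical distance, so their width in base pairs varies with local SNP density.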
Description: