G C A T genes T A C G G C A T Article MHC-Dependent Mate Selection within 872 Spousal Pairs of European Ancestry from the Health and Retirement Study ZhenQiao1,JosephE.Powell2andDavidM.Evans1,3,* 1 UniversityofQueenslandDiamantinaInstitute,TranslationalResearchInstitute,Brisbane, Queensland4102,Australia;[email protected] 2 InstituteforMolecularBiosciences,UniversityofQueensland,Brisbane,Queensland4072,Australia; [email protected] 3 MedicalResearchCouncil(MRC)IntegrativeEpidemiologyUnit,SchoolofSocial&CommunityMedicine, UniversityofBristol,BristolBS81TH,UK * Correspondence:[email protected]@bristol.ac.uk;Tel.:+61-7-344-37051 Received:15November2017;Accepted:15January2018;Published:22January2018 Abstract: Disassortative mating refers to the phenomenon in which individuals with dissimilar genotypesand/orphenotypesmatewithoneanothermorefrequentlythanwouldbeexpectedby chance. Although the existence of disassortative mating is well established in plant and animal species,theonlydocumentedexampleofnegativeassortmentinhumansinvolvesdissimilarityat themajorhistocompatibilitycomplex(MHC)locus. Previousstudiesinvestigatingmatingpatternsat theMHChavebeenhamperedbylimitedsamplesizeandcontradictoryfindings. Inspiredbythe sparseandconflictingevidence,weinvestigatedtherolethattheMHCregionplayedinhumanmate selectionusinggenome-wideassociationdatafrom872EuropeanAmericanspousesfromtheHealth andRetirementStudy(HRS).First,wetreatedtheMHCregionasawhole,andinvestigatedgenomic similaritybetweenspousesusingthreelevelsofgenomicvariation: single-nucleotidepolymorphisms (SNPs),classicalhumanleukocyteantigen(HLA)alleles(bothfour-digitandtwo-digitclassifications), andaminoacidpolymorphisms. TheextentofMHCdissimilaritybetweenspouseswasassessed usingapermutationapproach. Second, weinvestigatedfinescalematingpatternsbytestingfor deviationsfromrandommatingatindividualSNPs,HLAgenes,andaminoacidsinHLAmolecules. Third,weassessedhowextremethespousalrelatednessattheMHCregionwascomparedtothe restofthegenome,todistinguishtheMHC-specificeffectsfromgenome-wideeffects. Weshowthat neithertheMHCregion,noranysingleSNPs,classicHLAalleles,oraminoacidpolymorphisms withintheMHCregion,weresignificantlydissimilarbetweenspousesrelativetonon-spousepairs. However, dissimilarity in the MHC region was extreme relative to the rest of genome for both spousalandnon-spousepairs. Despitethelong-standingcontroversy,ouranalysesdidnotsupport asignificantroleofMHCdissimilarityinhumanmatechoice. Keywords: disassortativemating;non-randommating;majorhistocompatibilitycomplex;human leukocyteantigen;mateselection 1. Introduction Assortative mating occurs when individuals tend to reproduce with mates who are either similar(knownaspositiveassortmentorhomogamy)ordissimilar(knownasnegativeassortment ordisassortativemating)tothemselves. Inhumanpopulations,large-scalestudieshaveshownthat theoverwhelmingmajorityofassortmentispositive[1–4]. Spousalcorrelationsindicatehighpositive assortmentontraitssuchasreligionandsocialattitudes,moderateassortmentonintelligence,and Genes 2018,9,53;doi:10.3390/genes9010053 www.mdpi.com/journal/genes Genes 2018,9,53 2of13 lowassortmentformanyphysicaltraitsincludingheight[5]. Theonlywell-documentedexample ofnegativeassortmentinhumansisreportedtoinvolvespousaldissimilarityathumanleukocyte antigen(HLA)alleleswithinthemajorhistocompatibilitycomplex(MHC)region,althoughthisis controversialandtheevidencehasbeenconflicting[6–10]. The MHC is a highly polymorphic region of the human genome located on the short arm of chromosomesixthatextendsapproximately4.97Mb. TheMHCcontainsmanygenesthatencodecell surfacemoleculesinvolvedinimmunerecognitionandresponsetospecificpathogens,includingthe humanleukocyteantigen(HLA)genesthatcodeforcellsurfacereceptorsthatpresentforeignand self-peptidestotheadaptivearmoftheimmunesystem[6,11]. Thehighpolymorphismandgreat diversitywithintheMHCinhumansandotherspecieshasbeenhypothesizedtobearesultofseveral evolutionaryforcesincludingbalancingselectionandpotentiallysexualselectionanddisassortative mating[7,12]. NegativeassortmentatMHClocihasbeenobservedinanumberofvertebrates[13], suchas fish[14,15],reptiles[16],birds[17,18],rodents[19–21],andotherspecies[22]. Onehypothesisisthat sincenegativeassortmentincreasesheterozygosityatHLAloci,thisgreaterdiversitywillimprove offspring’s ability to respond to a greater range of pathogens, and consequently confer a survival advantage. Althoughtheexistenceofdisassortativematingiswell-establishedinanimals,evidence forMHC-basedmatechoiceinhumansishighlycontroversial[6,8,9]andwasbasedatleastinitially uponindirectevidenceobtainedfromsocalled“sweatyT-shirt”studiesthatshowedwomenwere foundtopreferT-shirtswornbyMHC-dissimilarmen[23]. Morerecently,genomicstudieshavebeenconductedtodirectlymeasureHLAsimilarityamong real-worldcouples. Theresultsofthesestudieshavebeenconflicting. Manyhavefoundnostrong evidence of disassortative effects of the MHC on mate choice [24–29], whilst others have shown dissimilarity[30]orevenexcessMHCsimilarityamongspouses[31]. Theavailabilityofgenome-wide data has also allowed researchers to investigate MHC-dependent mate choice at better resolution, as well as to contrast genetic similarity between spouses across the MHC region from similarity acrosstherestofthegenome. In2008,Chaixetal. foundthat28EuropeanAmericancouplesinthe HapMap2datasetweresignificantlymoreMHCdissimilarthanrandompairingsofindividualsin the same dataset, suggesting an important role of the MHC in mate selection (although the same authorsfoundnoevidenceofexcessMHCdissimilaritybetweenAfricanspouses)[6]. Shortlyafter, Dertietal. failedtoreplicatethisanalysisintheHapMapphase2dataset,ortovalidatetheresultsin thelargerHapMapphase3sample[8]. However,bothofthesestudiesutilizedsmallsamples,which maynotbelargeenoughtodemonstratenegativeassortmentreliablyandarebasedonacultural isolate(i.e.,MormonsinUtah),whichmaynotbegeneralizabletootherpopulations. Incontrast,in alargersampleof930coupleswithchildrenafflictedwithmultiplesclerosis(MS),Khankhanianet al. reportedexcessdissimilarityintheMHCIIregion,butexcesssimilarityintheMHCIregion[32]. However, this dataset may not be representative of the general population and may be subject to spuriouscorrelationsresultingfromtheascertainmentonaffectedoffspringwithmultiplesclerosis. AveryrecentreviewofMHC-dependentmateselectioninhumanandnon-humanprimatesfound some evidence for preferential selection of MHC diverse mates (i.e., individuals with high MHC heterozygosityweremostdesirable)butnooveralleffectofMHCdissimilarityonmatingdecisionsin humans[33]. Drivenbytheseapparentlycontradictoryfindings,weinvestigatedMHC-dependentmatechoice inalarge,independentsampleofindividualsfromtheHealthandRetirementStudy[34]. Inaddition to investigating overall genomic similarity across the entire MHC region, we also tested couples forincreased/decreasedsimilarityatindividualsingle-nucleotidepolymorphisms(SNPs),classical HLAalleles,andHLAaminoacids. Ourrationaleforexaminingfinescalepatternsofvariationwas thatdifferentregionsoftheMHCmayexhibitvaryinglevelsofsimilarity/differentiationandthat examiningallvariantsintheMHCregionincombinationmaynotaccuratelyrepresentthebiological significanceofindividualloci. Genes 2018,9,53 3of13 2. MaterialsandMethods 2.1. Data PubliclyavailabledatafromtheHealthandRetirementStudy(HRS;dbGaPStudyAccession phs000428.v1.p1)wasusedtoinvestigategeneticsimilaritybetweenspousepairs. TheHRSconsists of 12,507 individuals genotyped on the Illumina HumanOmni2.5-4v1 array (Illumina, San Diego, CA,USA)[34]. Accordingtothedocumentationsuppliedwiththedataset,8652individualswereof Europeanancestry,andthiswasconfirmedbyprincipalcomponentsanalysisappliedtogenome-wide SNPdata(FigureS1). Weremoved1000ofthe8652individualswithestimatedgenome-wideSNP relatedness greater than 0.025 (roughly equal to 5th degree relatives) as calculated by the GCTA software [35–37]. Of the remaining individuals, there were 872 opposite-sex spousal pairs (where cohabitationwastreatedthesameasmarriage),whichformthebasisofallanalysespresentedhere. Allcoupleswerebornbetween1910and1960,andmorethan90%wereborninthe1930sand1940s. Strictqualitycontrol(QC)procedureswereperformedtoensureonlycommon,highqualitygenotypes wereusedintheanalysis: noSNPsfailedHardy–Weinbergequilibriumtest(HWEp-value<10−6)and 75,717SNPswereremovedduetoamissingrategreaterthan0.01. AfterHLA-imputation(seebelow), afurther937,969SNPswereremovedfromanalysesbecausetheyhadaminorallelefrequency(MAF) lessthan0.05. AfterindividualandmarkerQC,1,138,428autosomalSNPsremained. Inthisdataset, 4678SNPswerelocatedintheMHCregion(definedaspositions28,477,797–33,448,354onchromosome 6,inGRCh37/hg19genomebuilds)[38]. TheclassicalHLAalleles,andHLAaminoacidpolymorphismsinHLAgenes,wereimputed basedontheSNPsintheMHCregionusingthesoftwarepackageSNP2HLA[39]. EightHLAgenes (threeclassIgenes: HLA-A,-B,-C,fiveclassIIgenes: HLA-DPA1,-DPB1,-DQA1,-DQB1,and-DRB1) wereimputed. TheHLAalleleswereimputedforbothtwo-andfour-digitclassifications,andthe identity of 399 amino acid sequences/residues located within the HLA region was inferred using thefour-digitHLAalleleinformation. WethenchoseHLAtypesforgenesandaminoacidsforeach individualbasedontheimputedposteriorprobabilities. Iftheimputedprobabilitiesforanindividual implicatedmorethantwopossiblealleles/aminoacids(i.e.,multiallelicsites),thenthegenotypewas settomissing. ThemissingrateforindividualHLAgenesandaminoacidsreferstothepercentage of individuals having at least one missing allele at this locus (Tables S1–S3). Twenty three out of 399aminoacidresidueswithamissingrategreaterthan1%wereremovedfromthedownstream analyses.SNP2HLAhastheadvantageofenablingresearcherstopotentiallyidentifyfunctionalcoding variantsthatmightbecausalforcomplexdiseases[39]. 2.2. StatisticalAnalysis 2.2.1. MajorHistocompatibilityComplexRegionasaWhole WefirsttreatedtheMHCregionasawhole,andinvestigatedmateselectionpatternsusingthree levelsofgenomicvariationseparately(i.e.,threedifferentanalyses): SNPs,classicalHLAalleles,and aminoacidpolymorphisms. Wethentestedforanydeviationsfromrandommatingatindividual SNPs,HLAgenes,andHLAaminoacidsinordertoinvestigatewhethertherewasanyevidencefor finescalemateselectionpatterns. Followingpreviousstudies[6,8],theproportionofidenticalvariantsbetweenapairofindividuals wasquantifiedusingtheidentitycoefficient(Q).ForagivenSNP,Qwas0ifbothindividualswere homozygous and carrying a different allele (e.g., AA and GG); Q was 1 if both individuals were homozygousandcarryingthesameallele(e.g.,AAandAA)orheterozygousandcarryingthesame alleles(e.g.,AGandAG);andQwas0.5inallothercases. InthecaseofagivenclassicalHLAgeneor aminoacidresidue,wedefinedtheidentitycoefficientas0ifthetwoindividualsshared0HLAalleles oraminoacidresiduesincommon,0.5iftheysharedonlyonealleleoraminoacidresidueincommon, and1inallothercases. Genes 2018,9,53 4of13 TherelatednesscoefficientRwasdefinedasR =(Q −Q )/(1−Q )whereQ istheidentity a,b a,b m m a,b coefficientforagivenpairofindividuals(a,b),andQ istheidentitycoefficientaveragedoverall m possibleopposite-sexpairsofindividualsinthestudysample(wenotethatthisdiffersslightlyfrom the definition of Q as defined in Chaix et al. [6] who calculate Q over all individuals, not just m m oppositesexpairs). PositivevaluesofR indicategeneticsimilaritybetweenindividualsaandb, a,b whereasnegativevaluesimplygeneticdissimilaritybetweenthemrelativetorandomopposite-sex pairsofindividualsinthesample. Themean(R )andmedian(R )valuesofRwere mean(spouse) median(spouse) derivedforallspousalpairsintheHRS. ThegeneticrelatednesscoefficientRwascalculatedacrosstheMHCregionforeachofthreelevels ofgenomicvariation: SNPs,classicalHLAgenes(bothfour-digitandtwo-digitclassifications),and aminoacids. Namely,identitycoefficientsforeachcouple(Q )werefirstcalculatedacross4678SNPs, a,b 8HLAallelictypes,or376HLAaminoacidsresidues,andthecorrespondingQ wasobtainedby m averagingoverallpossibleopposite-sexpairings. Then,therelatednesscoefficientbetweencouples (R )atdifferentlevelscouldbecomputedusingthecorrespondingQ andQ . NotethatRwas a,b a,b m amultilocusrelatednesscoefficientwhenusingclassicalHLAallelesoraminoaciddata. TheextentofMHCdissimilaritybetweenspouseswasassessedusingapermutationapproach. In each permutation, females were randomly paired with males and relatedness coefficients were calculated for each pairing. The mean and median relatedness coefficients (R or mean(permutations) R )werethencalculatedacrossallcouplesforeachpermutationandatwo-sidedp-value median(permutations) wascalculated. Thetwo-sidedp-value(p)wasequaltotheproportionofpermutationsinwhichthe mean/median of the permutated couples was the same or more extreme than the mean/median relatedness across the actual couples; 100,000 permutations were performed. Real couples were allowedinpermutations. 2.2.2. GenomicSimilarityacrosstheMHCComparedtoGenomicSimilarityacrosstheGenome The permutation approach described above was used to assess whether MHC relatedness amongcoupleswasmoredissimilarthanMHCrelatednessamongstrandompairingsofoppositesex individuals. However,wewerealsointerestedinhowextremespousalrelatednessattheMHCregion wascomparedtotherestofthegenome. Following previous studies [6,8], we calculated the recombination rates and relatedness coefficientsbetweenspousalpairsforslidingwindowsof4.97Mb(i.e., thesamesizeastheMHC region) across the genome. Recombination rates were obtained from phase 2 of the International HapMapProject,whichwereliftedovertoHg19[40],andthecentromerebasepositionswereobtained from the UCSC Genome Browser web site (http://genome.ucsc.edu). The detailed steps for this analysis were as follows: first, the genome (except chromosomes 6, X and Y) was divided into segments of 4.97 Mb in overlapping increments of 300 Kb. Segments with less than 1000 SNPs or thoseoverlappingacentromerewereexcluded. Therationaleforremovingchromosome6wasto ensure that the ‘background’ genomic windows included neither the MHC region nor a region in linkagedisequilibrium(LD)withtheMHC.Second,therecombinationrate(cM/Mb)wascalculated by dividing the difference in genetic distances (which reflects recombination frequency) between thetwopositionsateachendofthegenomicwindowby4.97Mb. Toaccountforthelowlinkage disequilibriumacrosstheMHC,weonlyutilizedsegmentswithrecombinationratesthesameorlower thantheMHCregion. Werandomlysampled10,000genomicwindowsfromtheremainingwindows. Wethencomparedthemeanandmedianrelatednesscoefficientsbetweenspousalpairsacrossthe MHCregionagainstsimilar-sizedwindowsacrossthegenomethatmettheabovecriteria. Wealso examinedgenomicsimilarityinthecaseofopposite-sexnon-spousalpairs. 2.2.3. IndividualSNPs,HLAAlleles,andAminoAcids WefurtherinvestigatedMHCdissimilarityatfinerscales. First,inordertodetectevidenceof significantlydissimilarSNPswithintheMHCregion,theSpearman’srankcorrelationcoefficientρ Genes 2018,9,53 5of13 wasusedtomeasurerelatednessbetweenspousalpairsbasedontheirSNPgenotypes(codedas0,1, and2)atindividualSNPs. ForHLAgenesandaminoacidresidues,relatednesswasmeasuredusing asimilarityscoreSC,whichwascalculatedbycountinghowmanycopiesofHLAallelesoramino acidsweresharedbetweeneachspousepair(i.e.,0,1,or2)andthensummedacrossallspouses[32]. Thus, a SC could be obtained for each of the 8 classical HLA alleles and 376 amino acid residues. Significancewasassessedusinganormaldistribution,withmeanandstandarddeviationestimated from100,000randompermutations(wherethespouseswerereordered);allp-valuesweretwo-sided. 3. Results Inthisstudyweinvestigatedgeneticsignaturesofnon-randommatingpatternsusingseveral complementaryapproaches.First,weusedtherelatednesscoefficientRtoinvestigategeneticsimilarity between spouses compared to random combinations of opposite sex individuals across the MHC region. WealsocomparedthedegreeofsimilaritybetweenspousesintheMHCregiontotherestof thegenome,tocharacterizewhethertheMHCappearedunusualinthisregard. Finally,weusedthe Spearman’srankcorrelationcoefficientρandsimilarityscoresSCtoinvestigatefinescaleevidenceof non-randommatingatindividualSNPs,classicalHLAgenes,andaminoacids. 3.1. WholeMHCRegion FollowingconfirmationthatindividualsintheHRSdatasetwereofEuropeanancestry(FigureS1), weplottedRacrosstheMHCregionagainstgenome-wideSNPrelatedness(excludingchromosome6) ascalculatedbytheGCTAsoftwareforspousalandnon-spousepairs(FigureS2). Wefoundlittle evidencetosuggestthatspousalpairswhoweremostgeneticallysimilarattheMHCwerealsomost geneticallysimilaracrossthegenome. RegressingspousalsimilarityattheMHCongenome-wide similarity was not significant (p > 0.05). In addition, there were no outlying pairs that showed extremegenome-widesimilarity/dissimilarity. Theseresultssuggestthatspousalsimilarityinduced by population stratification is unlikely to mask any MHC dissimilarity induced by disassortative matinginourdataset. ThedistributionofRbetweenspousesatthreedifferentlevelsofgenomicvariationisdisplayed inFigure1. UsingSNPdata,theRcoefficientshowedaslightpositive-skeweddistribution,suggesting that the median might be a better measure of central tendency than the mean for this measure. The distribution of R across spouse pairs was slightly positively skewed in the case of four- and two-digitclassicalHLAallelesalso. Incontrast,theRcoefficientapproachedanormaldistributionin thecaseofaminoacids. TheresultsofrelatednessanalysesatthreelevelsofgenomicvariationaresummarizedinTable1. Spouses display slight dissimilarity at all of the three levels of genomic variation, but these do notreachstatisticalsignificance. Usingmolecularmarkers(averagerelatednesscoefficientsacross 4678SNPs),weobservedthatthemedianMHCrelatednessbetweenspousesintheHRSdatasetatthe MHClociwascomparabletorandomlyselectedopposite-sexpairs(R =−0.012,two-tailed median(spouse) p=0.829). Second,basedonHLAtypes(averagerelatednesscoefficientsacrosseightclassicalHLA genes),nosignificantMHC-dissimilaritywasobservedamongspousesateitherfour-ortwo-digit classifications. Lastly,inthecaseofthemultilocusrelatednesscoefficientsbasedontheaminoacid data,theactualcouplesweresignificantlydissimilarrelativetorandompairs(R =−0.014, mean(spouse) two-tailedp-value=0.040); however, thesignificancedisappearedwhenwelookedatthemedian MHCrelatednessbetweenspouses. Genes 2018,9,53 6of13 Genes 2018, 9, x FOR PEER REVIEW 6 of 13 FigureF1ig.uDries t1ri. bDutisiotrnibouftiroenl aotef drneelastsedcnoeesfsfi ccioeenftfsici(exnatsx i(sx) aacxrios)s satchroesMs athjoe rMHaisjotro cHomistpocaotimbpilaittyibiCliotym plex (MHC)Croemgpiolenx a(MmHoCn)g rsepgioouns aamlopnagir ssp.o(uAsa)la pllaisrisn. g(Ale) -anlul scilnegoltei-dneucpleooltyidme oproplyhmisomrpsh(iSsmNsP (sS)N,(PBs)), a(Bll) eight classicaalll heuigmhta nclalessuikcaol chyutemaann tilgeuekno(cHytLe Aan)tgigeenne s(H(aLtAf)o uger-ndesig (iattr efosuorlu-dtiigoint )r,e(sCol)uatilolne)i,g (hCt) calalls seiigcahlt HLA classical HLA genes (at two-digit resolution), and (D) all 376 amino acid polymorphisms. genes(attwo-digitresolution),and(D)all376aminoacidpolymorphisms. Table 1. Summary of relatedness analyses across the entire MHC region calculated across three levels of genomic variation (SNPs, classical HLA genes, and amino acid polymorphisms) in the Table1.SummaryofrelatednessanalysesacrosstheentireMHCregioncalculatedacrossthreelevels Health and Retirement Study (HRS) individuals of European ancestry. ofgenomicvariation(SNPs,classicalHLAgenes,andaminoacidpolymorphisms)intheHealthand RetirementStudy(HRS)iDnadtiavsiedtus alsofEuropeanancesPtaryr.ameters Relatedness 1 p-Value 2 Mean 3 −0.001 ± 0.137 0.829 SNPs Datasets Parameters MRedeliaante 4d ness−01.012 ± 0.124 p-V0a.l2u1e8 2 Mean −0.007 ± 0.184 0.258 Classical HLA genes (four-dMigieta rnes3olution) −0.001±0.137 0.829 SNPs Median −0.037 ± 0.128 0.500 Median4 −0.012±0.124 0.218 Mean −0.008 ± 0.209 0.211 ClassCiclaaslsHicLalA HgLeAn egsenes (two-digMite raensolution) M−e0d.0ia0n7 ±0.1−804.062 ± 0.143 00.2.45189 (four-digitresolution) Median −0.037±0.128 0.500 Mean −0.014 ± 0.226 0.040 Amino acids ClassicalHLAgenes Mean M−e0d.0ia0n8 ±0.2−009.013 ± 0.214 00.2.21315 (tw1o -Ddeisgcirtiprteisvoel usttaitoisnt)ics of the relMateeddniaenss coefficients be−tw0e.e0n6 2re±al 0c.o1u4p3les (N = 872); 2 t0w.4o1-s9ided p-value, derived from the 100,000 Mpeeramnutations; 3 the me−an0 .r0e1la4te±dn0e.s2s2 c6oefficient (±1 SD) 0a.c0ro4s0s all reAalm cionuoplaecsi; dSsD: standard deviaMtioend; i4a nthe median relate−d0n.e0s1s3 co±ef0fi.c2i1en4t (±1 MAD) acros0s. 2a3ll5 real couples; MAD: median absolute deviation, MAD = (median|x-median(x)|)/0.6745. 1Descriptivestatisticsoftherelatednesscoefficientsbetweenrealcouples(N=872);2two-sidedp-value,derived fromthe100,000permutations;3themeanrelatednesscoefficient(±1SD)acrossallrealcouples;SD:standard deviatioFno;r4 tthheem geedniaonmriecla twedinndesoswcose fafincaielnyts(e±s,1 oMnAlyD )0a.8cr1o%ss aolfl rtehael c1o0u,p0l0e0s; MraAnDdo:mmelydi asnamabpsoleludt egdeenvoiamtioicn , MwADin=do(mwesd ieanx|hxib-miteeddi amn(xo)r|e) /e0x.6t7r4e5m.e dissimilarity than the MHC region among spousal pairs (Figure 2A). However, this pattern was repeated and even more extreme among opposite-sex Fonrotnh-sepgoeunsoe mpaiicrsw (Finigduorwe 2sBa)n. Iatl syhsoeusl,do bnely n0o.t8ed1% thoatf stihnece1 t0h,0e 0n0urmabnedro omf slpyosuasmesp (l8e7d2 gpaeinrso)m isi cmwucihn dows smaller than the number of non-spousal pairs (759,512 pairs), there is a larger standard error of the exhibitedmoreextremedissimilaritythantheMHCregionamongspousalpairs(Figure2A).However, median relatedness of the sample genomic windows between spouses and consequently much thispatternwasrepeatedandevenmoreextremeamongopposite-sexnon-spousepairs(Figure2B). It should be noted that since the number of spouses (872 pairs) is much smaller than the number of non-spousal pairs (759,512 pairs), there is a larger standard error of the median relatedness of the sample genomic windows between spouses and consequently much tighter clustering of the relatedcoefficientsaroundthenullamongthenon-spousepairs. Ourresultstronglysuggeststhatthe Genes 2018,9,53 7of13 Genes 2018, 9, x FOR PEER REVIEW 7 of 13 relatedntiegshstepr actlutesrtenriantgt hofe tMhe HreClatleedv ecoleisffiecxietnretsm aerowunhde nthceo nmulpl aarmeodntgo ththee nroens-tspoofutshee pgaeirnso. Omuer, raensudltt hatthe MHCdsitsrosnimgliyl asruigtygeisnts stphoatu tshees rreelalatetdivneestso ptahtteerrne satt othfet hMeHgCe nleovmel eisi sexatsreemxetr wemheen acsomitpisariend otop pthoes ite-sex rest of the genome, and that the MHC dissimilarity in spouses relative to the rest of the genome is as non-spousepairs. extreme as it is in opposite-sex non-spouse pairs. Figure 2. Median relatedness coefficients for the sampled 4.97 Mb genomic regions (N = 10,000) Figure2.Medianrelatednesscoefficientsforthesampled4.97Mbgenomicregions(N=10,000)across across the genome between (A) spouses (B) opposite-sex non-spouse pairs. Each grey dot represents thegenomebetween(A)spouses(B)opposite-sexnon-spousepairs. Eachgreydotrepresentsone one sampled genomic region. The MHC region for spouses and non-spouse pairs were plotted in sampledgenomicregion.TheMHCregionforspousesandnon-spousepairswereplottedinyellow yellow (in A) and blue (in B), respectively. The outlying points in the right hand side of panel B (inA)arnepdrebsleunet o(ivnerBla)p,priensgp ewcitnidvoewlys. fTrohme otuhet lysainmge pgoeninomtsicin rethgieonr igofh tchhraonmdossoimdee o3fp-p aann ealreBa roefp resent overlapppairntigcuwlarinlyd hoigwhs lifnrkoamget dhiesesqaumiliebrgiuemno amndic lorewg rieocnomofbicnhartoiomn.o some3p-anareaofparticularlyhigh linkagedisequilibriumandlowrecombination. 3.2. Individual SNPs, HLA Alleles, and Amino Acids 3.2. IndividuFaulrtShNerP asn,aHlyLsAes Awlelreel ecso,nadnudctAedm tion oidAenctiidfys potential individual SNPs, HLA genes, and amino acid residues within the MHC region exhibiting non-random mating patterns. FurtheranalyseswereconductedtoidentifypotentialindividualSNPs,HLAgenes,andamino Using Spearman’s rank correlation coefficient ρ, no individual SNP remained significant after acidresmiduulteipslew ciotrhrienctitohne (MFigHuCre r3e; gTiaobnle eSx4h). iTbhitei ntogp nMoHnC-r aSnNdPo rms30m94a0t9i8n (gρ p= a0t.t1e2r3n, tsw.o-sided p-value = Us0i.n00g03S)p ise alorcmataedn ’isn rthaen DkEcAoHrr-eBloaxt iHoneliccaoseef fi16c i(eDnHtXρ1,6n) goeinned ainvdi dshuoawlsS sNimPilraerimtya bientwedeesni gspnoifiusceasn tafter multipl[e41c].o DrrHeXct1i6o nen(cFoidgeus rDeE3A;DTa bbolxe pSr4o)t.eiTnhs ewthoicphM arHe pCuStaNtivPe rRsN30A9 4h0e9lic8a(sρes= an0d.1 a2r3e, itmwpoli-csaitdeded inp -value cell cycle progression [42]. Single-nucleotide polymorphism rs115840813 was the SNP showing the = 0.0003) is located in the DEAH-Box Helicase 16 (DHX16) gene and shows similarity between most amount of dissimilarity (ρ = −0.115, two-sided p-value = 0.0007). It is a non-coding transcript spouses[41].DHX16encodesDEADboxproteinswhichareputativeRNAhelicasesandareimplicated exon variant, located in a processed pseudogene, USP8P1. incellcycleprogression[42]. Single-nucleotidepolymorphismrs115840813wastheSNPshowingthe mostamountofdissimilarity(ρ=−0.115,two-sidedp-value=0.0007). Itisanon-codingtranscript exonvariant,locatedinaprocessedpseudogene,USP8P1. Genes 2018,9,53 8of13 Genes 2018, 9, x FOR PEER REVIEW 8 of 13 FiguFriegu3r.eM 3.H MCHcCo rcroerlaretliaotniopnl optl.otT. hTeheS pSepaeramrmaann’s’s rraannkk ccoorrrreellaattiioonn ccooeefffificcieiennt t(ρ(ρ) )bbetewtweeene nspsopuosuess east at eacheaScNh PSNisPp ilso pttleodtteadg aaignasitnsitts itps hpyhsyisciaclacl hchrorommoossoommaallp poossiittiioonn.. AAllll nnoommininaalllyly sisgigninfiicfiacnatn StNSPNsP (sba(bseadse d ontowno t-wsiod-esdidepd-v pa-lvuaelsueasn adnbde bfoefroerec ocrorrercetcitoionnf oforrm muultltiippllee ccoommppaarriissoonnss)) aarree pplolotttetded ini ncoclooluoru. rT.hTeh e positions of the classical HLA genes are from UCSC (GRCh37/hg19) and superimposed in the plot. positionsoftheclassicalHLAgenesarefromUCSC(GRCh37/hg19)andsuperimposedintheplot. In the case of classical HLA genes, we conducted analyses for both four- and two-digit In the case of classical HLA genes, we conducted analyses for both four- and two-digit classification and found similar results; that is, although several HLA genes exhibited below average classsiimficialatrioitny, annod sifgonuinfidcasnitm niolamrirneaslu pl-tvsa;ltuhea twias,s aolbthseoruvgehd (sTeavbelrea 2l;H TLabAleg Se5n)e. s exhibitedbelowaverage similarity,nosignificantnominalp-valuewasobserved(Table2;TableS5). Table 2. Summary of similarity scores (SC) for each HLA gene (at four-digit resolution) in HRS TablEeur2o.pSeaunm Ammaerryicaonfss. imilarity scores (SC) for each HLA gene (at four-digit resolution) in HRS EuropeanAmericans. Similarity Score (SC) 2 HLA Gene Similarity Score (SC) 1 Two-Sided p-Value SiMmielaarnityScore(SC)S2D HLAGene SimilarityScore(SC)1 Two-Sidedp-Value A 411 421.90 14.37 0.448 Mean SD C 260 285.20 13.98 0.071 A 411 421.90 14.37 0.448 CB 260191 2852.0207.78 131.298.31 0.00.71173 DRBB1 191259 2072.6785.61 121.331.49 0.01.76324 DRB1 259 265.61 13.49 0.624 DQA1 488 497.30 16.47 0.572 DQA1 488 497.30 16.47 0.572 DDQQBB11 324324 3313.3611.61 141.485.85 0.06.06808 DPA1 1275 1267.73 11.95 0.543 DPA1 1275 1267.73 11.95 0.543 DPB1 585 586.24 13.72 0.928 DPB1 585 586.24 13.72 0.928 1Similarityscoresbetweenspouses;2similarityscoressummarizedfromnormaldistribution. 1 Similarity scores between spouses; 2 similarity scores summarized from normal distribution. InthInis tahnisa laynsaisly,wsise, tweset etdesbteodth baomthi naomaicniod sawciditsh iwnitthhienp trhoet epinrostteriunc tsutrruecatunrde inantdh einsi gthnea lspigenpatli de (i.e.,ptehpetidshe o(rit.ep., etphtei dsehoprrte speenpttiadte thpreeNse-ntte ramt itnheu sNo-ftetrhmeinnuass coefn tthHe LnAaspcernetc uHrsLoAr pprreoctueirnsodr epnrootteedin by negadteinvoeteadm binyo naecgiadtipvoe saitmioinnos )a.cUids ipnogstihtieonssim). iUlasrinitgy tshceo rseimSiCla,r2it1y asmcoirneo SaCc,i d21r easmidinuoe safcriodm retshideuHesL A from the HLA region showed nominal dissimilarity among spouses (Figure 4); however, no regionshowednominaldissimilarityamongspouses(Figure4);however,noindividualaminoacid individual amino acid polymorphism remained significant after correction for multiple testing polymorphismremainedsignificantaftercorrectionformultipletesting(TableS6). Thetopsignalwas (Table S6). The top signal was found for amino acid position 40 and 51 in HLA-DQA1 (two-sided foundforaminoacidposition40and51inHLA-DQA1(two-sidedp-value=0.006). p-value = 0.006). Genes 2018,9,53 9of13 Genes 2018, 9, x FOR PEER REVIEW 9 of 13 2.4 AA_DQA1_40 AA_DQA1_51 2.2 AA_C_304 2 AA_C_80 AA_B_178 1.8 AA_C_77 AA_A_70 AA_B_131 associated.HLA.gene AA_C_152 1.6 AA_B_109 AA_DQB1_70 HLA-A AA_B_177 AA_A_149 AA_C_285 AA_B_180 HLA-C P)1.4 AA_DQA1_50 HLA-B -log(101.2 AA_A_245 AA_C_295 AA_B_163 AA_DQA1_-16 AA_DQA1_53 HHLLAA--DDRQBA11 HLA-DQB1 1 HLA-DPA1 HLA-DPB1 0.8 0.6 0.4 0.2 0 Amino Acid Polymorphisms FiguFriegu4r.eM 4.a Mnhaanthtaatntatnw tow-soi-dsieddedp -pv-avlaulueef oforre eaacchha ammiinnoo aacciidd ppoollyymmoorrpphhisimsm tetsetsetde.d H.HLAL AAmAimnoin aociadc id polymorphisms are displayed along the X-axis, with minus log ten two-sided p-value displayed on polymorphismsaredisplayedalongtheX-axis,withminuslogtentwo-sidedp-valuedisplayedonthe the Y-axis. Each data point in the plot signifies an amino acid residue. The color of the data point Y-axis.Eachdatapointintheplotsignifiesanaminoacidresidue.Thecolorofthedatapointindicates indicates the different HLA genes. thedifferentHLAgenes. 4. Discussion 4. Discussion In this study we performed a set of analyses to investigate genetic signatures of non-random Inthisstudyweperformedasetofanalysestoinvestigategeneticsignaturesofnon-random mating patterns in 872 European American spousal pairs from the HRS dataset. We found no strong mateinvigdpenacttee tron ssuipnp8o7r2t aE ruorloep foear nthAe mMeHrCic arengsiopno uins ahlupmaairns mfraotme stehleectHioRnS. Ndeaittahseert t.hWe eMfHouCn rdegnioons,t oror ng evidaennyc seintoglseu SpNpPosr,t calarsosilce HfoLrAth aelleMleHs, Corr eagmioinno iancihdu pmoalynmmoraptehissemlesc wtioitnhi.nN theeit hMeHrtCh ereMgioHnC, wreegreio n, orasniygnsiifnicgalnetlSyN dPisss,icmlaislasri cbHetLwAeeanl lsepleosu,soers armelaintiovea ctoid npoonl-yspmoourspeh pisamirss. wWihthilisnt tgheeneMtiHc dCisrseigmioilnar,iwtye re signbiefitcwaenetnly spdoisussiems ialat rthbee tMwHeeCn wspaso uexstersemreel actoivmeptaorendo tno- stpheo uresset poaf itrhse. gWenhoilmsteg, ethnies tsicamdeis spiamttielranr ity betwweaesn aslspoo oubsseesrvaetdth ien MraHndComw apsaierxintrgesm oef ocpopmopsiatree-sdexto inthdeivriedsutaolsf. thegenome,thissamepatternwas alsoobservedinrandompairingsofopposite-sexindividuals. 4.1. Dissimilarity Between Spouses across the MHC Region as a Whole 4.1. DissimilaritybetweenSpousesacrosstheMHCRegionasaWhole There has been considerable debate in the literature regarding the existence of MTHhCer-edehpaesnbdeeennt mcoantsei dseelreacbtiloend [e6b,8a–t1e0i,n43t]h. eUlsiitnegr aat usirmeirleagr adredfiinngititohne oefx tihstee rnecleatoedfnMeHss Cco-defefpiceinendte nt R as used in this work, Chaix et al. measured genetic similarity across 9010 SNPs spread throughout mateselection[6,8–10,43]. UsingasimilardefinitionoftherelatednesscoefficientRasusedinthis the MHC region for 28 European American couples in the HapMap phase 2 dataset, and found that work,Chaixetal. measuredgeneticsimilarityacross9010SNPsspreadthroughouttheMHCregion mating pairs were significantly more MHC-dissimilar than random opposite-sex pairs of for 28 European American couples in the HapMap phase 2 dataset, and found that mating pairs individuals [6]. However, Derti et al. did not find any evidence for disassortative mating in the weresignificantlymoreMHC-dissimilarthanrandomopposite-sexpairsofindividuals[6]. However, HapMap phase 3 dataset, and noted that minor changes to the methods of Chaix et al. would Dertietal. didnotfindanyevidencefordisassortativematingintheHapMapphase3dataset,and decrease the significance, or have rendered the original results from the HapMap 2 dataset noted that minor changes to the methods of Chaix et al. would decrease the significance, or have non-significant [8]. Such modifications included increasing the number of permutations, removing rendered the original results from the HapMap 2 dataset non-significant [8]. Such modifications the couple displaying the highest MHC dissimilarity, or using the median rather than mean inclruedlaetdedinnecsrse atosi nqguatnhteifny uthmeb MerHoCf pdeisrsmimuitlaatriiotyn sa,mreomngo vspinogustehse, ceotcu. pWlehednis Dplearytii negt atlh. eexhaimghineesdt M24H C dissciomuiplalersit yw,ohro uwsienrge tohnelym epdreiasennrt atinh erthteh aHnampMeaanp rePlhaatesed n3e sdsattoasqeut; anvitriftuyatlhlye MnoH eCviddiesnscime ifloarr ity amodnisgassspoorutasteivse, emtca.tiWngh eancroDsesr tthiee tMalH.Cex wamasi nfoeudn2d4 (cRomueapn(slpeoussesw) =h o−0w.0e1r4e, townoly-spidreeds epn-tvainluteh e= H0.3a5p1M). ap PhaSsein3ced athtaes ceot;uvpilretsu saelllyecnteode fvriodmen Pcheafsoer 2d iasnadss 3o rotfa ttihvee HmaaptMingapa cwroerses tfhroemM tHheC swamase fpooupnudla(Rtion (and mean(spouses) =−0.014,two-sidedp-value=0.351). SincethecouplesselectedfromPhase2and3oftheHapMap Genes 2018,9,53 10of13 werefromthesamepopulation(andthereforefreeofpopulationstratification)andhighlyconcordant intermsofdataquality, theauthorsconcludedthatthefindingsinthepreviousstudywerelikely drivenbysporadicoutliers. SimilartoDertietal.,bytreatingtheentireMHClocusasasingleentity,wedidnotfindany strongevidencetosupportarolefortheMHCinmateselection. Althoughtherewassomenominal evidenceofdissimilaritybetweencouplesatclassicalHLAallelesbasedontwo-digitresolution,the effectwasmarginalanddisappearedwhenweexaminedmedianratherthanmeanlevelsofallele sharing. Weconsiderourresultsrobustforthefollowingreasons. First, ourresultsdonotappear tobedrivenbyspousalpairswithextremegenetic(dis)similarityattheMHC(Figure1;FigureS3). Second,wetookstepstoensurethatourresultswerenotdrivenbylatentpopulationstratification. All individuals were broadly of European ancestry (Figure S1). Third, we showed that genomic relatednessbetweenspousalpairsacrosstheMHCwasnotcorrelatedwithgeneticrelatednessacross therestofthegenome(FigureS2). Iffine-scalepopulationsubstructureweremaskingdisassortmentat theMHC,wewouldexpectgeneticsharingbetweenspousesattheMHCtobecorrelatedwithgenetic similaritybetweenspousesacrossthegenome. Thisresultsuggeststhatpopulationstratificationis unlikelytobeamajorcontributortospousalsimilarityattheMHCinourdataset. Fourth,wevastly increasedthenumberofspousalpairs(~900pairs)comparedtopreviousstudiesofdisassortative matingintheMHC.Finally,weincreasedthenumberofpermutationsruntoevaluatesignificance comparedtopreviousstudiesto100,000,whichreducedthemarginoferroraroundourestimatesof thepermutedp-value[44]. 4.2. DissimilaritybetweenSpousesatIndividualSNPs,HLAAlleles,andAminoAcids Given the above results across the entirety of the MHC, we expected the likelihood that any individualeffectsatfinerscaleswouldbesubtleinoursample,andthereforewewerenotsurprised tofindnoindividualSNPs, classicalHLAgenes, oraminoacidsweresignificantafteradjustment for multiple comparisons. Similar to our work, Laurent et al. performed a genome-wide scan to detectgenespotentiallyinvolvedinmatingchoices,andfoundnoneoftheclassicalHLAgeneswere significant in HapMap phase 3 European American couples [45]. In contrast, Khankhanian et al. reportedsignificantsimilarityatMHCclassIregion,aswellassignificantdissimilarityattwoclassical HLAgeneswithintheMHCclassIIregion(i.e.,HLA-DQA1andHLA-DQB1),among930spousesin aEuropeanandEuropeanAmericansample[32]. However,thisdatasetconsistedofcoupleshaving childrenafflictedwithmultiplesclerosis,adiseaseshowingstrongassociationwiththeMHCregion. Itislikelythatascertainingspouseswhohaveachildwithmultiplesclerosismayleadtoaspurious correlationbetweenparentalriskallelesattheMHC(i.e.,colliderbias),andsofindingsfromthisstudy shouldbetreatedwithcaution. 4.3. GeneralDiscussion ReportsofnegativeassortmentatMHClociintheliteraturehavebeeninconsistent,although there is some evidence to suggest that MHC-dependent mate selection in humans may be population-specific [6,25,27–30]. For example, two previous studies supporting the existence of MHC-relateddisassortativematingwerebothbasedonculturalisolates: theHutterites[30]andthe Mormons[6]. Thishasledtospeculationthatinpopulationisolates,selectionontheMHCmightplay aroleinavoidanceofinbreeding,ratherthandisassortativematingperse. However,effectssuchas these,iftheyexist,arelikelytobesmall. ThereisnoevidencefornegativeassortmentintheMHCin populationgroupsthatexperiencehighpathogenpressure,suchasAmerindiantribes[28],inwhich, presumably,thepreferenceformatingwithanindividualcarryingparticularpathogen-resistantalleles isfarmoreimportantthanmatingwithMHC-dissimilarindividuals. Additionalfactorsthatmight explainsomeofthediscrepanciesinresultsbetweenstudiesinvolvedifferencesingenerationaland socialfactorsbetweenthedifferentcohorts. Forexample,theindividualsinthepresentstudy(mainly borninthe1930sand1940s)wouldhavechosenspousalpartnersatatimewhensuchchoicesmay
Description: