ebook img

Playing the role of weak clique property in link prediction: A friend recommendation model PDF

1.6 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Playing the role of weak clique property in link prediction: A friend recommendation model

Playingtheroleofweak cliqueproperty inlink prediction: A friend recommendation model Chuang Ma,1 Tao Zhou,2 and Hai-Feng Zhang1,3,4,∗ 1School of Mathematical Science, Anhui University, Hefei 230601, China 2WebSciencesCenter,UniversityofElectronicScienceandTechnologyofChina, Chengdu610054, China 3KeyLaboratoryofComputerNetworkandInformationIntegration(SoutheastUniversity),MinistryofEducation,211189,P.R.China 4 CenterofInformationSupport &Assurance Technology, Anhui University, Hefei230601, China (Dated:January21,2016) Animportantfactinstudyingthelinkpredictionisthatthestructuralpropertiesofnetworkshavesignificant impactsontheperformanceofalgorithms. Therefore,howtoimprovetheperformanceoflinkpredictionwith 6 the aid of structural properties of networks is an essential problem. By analyzing many real networks, we 1 findacommonstructureproperty: nodesarepreferentiallylinkedtothenodeswiththeweakcliquestructure 0 (abbreviatedasPWCStosimplifydescriptions). BasedonthisPWCSphenomenon,weproposealocalfriend 2 recommendation (FR) index to facilitate link prediction. Our experiments show that the performance of FR n index is generally better than some famous local similarity indices, such as Common Neighbor (CN) index, a Adamic-Adar(AA)indexandResourceAllocation(RA)index. WethenexplainwhyPWCScangiveriseto J thebetterperformanceofFRindexinlinkprediction. Finally,amixedfriendrecommendationindex(labelled 0 MFR)isproposedbyutilizingthePWCSphenomenon,whichfurtherimprovestheaccuracyoflinkprediction. 2 PACSnumbers: 89.75.Hc,89.20.Hh ] h p I. INTRODUCTION sparse and very huge, so Liu et al. presented a local ran- - c dom walk method to solve the problem of missing link pre- o diction,andwhichcangivecompetitivelygoodpredictionor s Theresearchoflinkpredictionmainlyfocusesonforecast- s. ing potentialrelationsbetweennonadjacentnodes, including evenbetterpredictionthanotherrandom-walk-basedmethods whilehasalowercomputationalcomplexity[14]. Inviewof c the predictionofthe unknownlinks or the furthernodes[1]. i the local communityfeaturesin many networks, Cannistraci s Sincethewiderangeofapplicationsoflinkprediction,suchas et al. proposed an efficient computationalframework called y recommendingfriendsin online social networks[2], explor- h localcommunityparadigmtocalculatethelinksimilaritybe- ing protein-to-protein interactions [3], reconstructing airline p tweenpairsofnodes[3]. Liuetal. designedaparameter-free network [4], and boostinge-commercescales, which has at- [ localblockingpredictorto detectmissing linksin givennet- tractedmuchattentionrecently[5–7].Theprobabilisticmodel worksvialocallinkdensitycalculations,whichperformsbet- 1 and machine learning were mainly introduced in link pre- v diction. The notion of probabilistic link prediction and path terthanthetraditionallocalindiceswiththesametimecom- 6 plexity[15]. analysisusingMarkovchainsmethodwerefirstproposedand 4 evaluatedinRef.[8],andthenMarkovchainsmethodwasfur- 1 therstudied inadaptivewebsites [9]; InRef. [10], Popescul 5 0 etal.studiedtheapplicationofstatisticalrelationallearningto . linkpredictioninthedomainofscientificliteraturecitations. Sincethestructuralpropertiesofnetworkshavesignificant 1 0 However, the mentionedmethodsfor link predictionwere effectson the performanceof algorithmsin link predictions, 6 mainly based on attributes of nodes. It is known that the there are some literatures have proposed some methods by 1 structure of the network is easier to be obtained than the at- making use of the structural properties of networks. Such v: tributesofnodes,asaresult,thenetwork-structure-basedlink as the algorithms by playing the roles of hierarchical struc- i predictionhasattractedincreasingattention. Alongthisline, ture[12], clustering[16], weakties[5]andlocalcommunity X Liben-Nowelletal. developedapproachesto link prediction paradigm[3]. However,currentadvancesinincludingstruc- r basedonmeasuresforanalyzingthe“proximity”ofnodesin turalpropertiesintolinkpredictionarestillnotenough.Inthis a a network [11]. Since hierarchical structure commonly ex- paper,byinvestigatingthelocalstructuralpropertiesinmany istsinthefoodwebs, biochemicalnetworks,socialnetworks real networks, we find a common phenomenon: nodes are and so forth, a link prediction method based on the knowl- preferentiallylinked to the nodeswith weak clique structure edge of hierarchical structure was investigated in Ref. [12], (PWCS). Then based on the observedphenomenon,a friend and they found that such a method can provide an accurate recommendation (FR) index is proposed. In this method, performance. Zhouetal. proposedalocalsimilarityindex— whenanodejintroducesoneofhisfriendstoanodei,hewill ResourceAllocation(RA)indextopredictthemissinglinks, notintroducethenodeswhoarealsotheneighborsofnodei. andtheirfindingsindicatethatRA indexhasthebestperfor- OurresultsshowthattheperformanceofFRindexissignifi- manceoflinkprediction[13]. Giventhatmanynetworksare cantlybetterthanCN,AAandRAindicessinceFRindexcan makegooduseofPWCSinnetworks. Atlast,tofurtherplay theroleofPWCS,wedefineamixedfriendrecommendation (MFR)method,whichcanbetterimprovetheaccuracyinlink ∗Electronicaddress:[email protected] prediction. 2 II. PRELIMINARIES CNindex SCN =|Γ(i)∩Γ(j)|, (3) Considering an undirected and unweighed network ij G(V,E),whereV isthesetofnodesandEisthesetoflinks. AAindex Themultiplelinksandself-connectionsarenotallowed. For anetworkwithsizeN,theuniversalsetofallpossiblelinks, pdaeinrootefdnobdyeUs,,xc,oynt∈ainVin,gwoefaNss(iNg2n−1a)spcaoirres,oSfli,nkacs.coFrodrinegactho SiAjA = X lg(k1(l)), (4) xy Γ(i)∩Γ(j) adefinedsimilaritymeasure.Higherscoremeanshighersim- ilaritybetweenxandy,andviceversa.SinceGisundirected, RAindex thescoreissupposedtobesymmetry,thatisS = S . All xy yx thenonexistentlinksaresortedinadescendingorderaccord- 1 ing to their scores, and the links at the top are most likely SRA = , (5) ij X k(l) to exist [13, 14]. To test the predictionaccuracy of each in- Γ(i)∩Γ(j) dex, we adopt the approach used in Ref. [13]. The link set E is randomly divided into two sets E = ET ∪ EP with respectively. ET ∩EP = ∅. WheresetET isthetrainingsetandissup- posedtobeknowninformation,andEP isthetestingsetfor thepurposeoftestingandnoinformationthereinisallowedto III. DATASET beusedforprediction. Asinpreviousliteratures,thetraining setET alwayscontains90%oflinksinthiswork,andthere- In this paper, we choose twelve representative networks maining10%oflinksconstitutethetestingset. Twostandard drawn from disparate fields: including: (1) C. elegans-The metricsare used to quantifythe accuracyof predictionalgo- neural network of the nematode worm C. elegans [19]; (2) rithms: areaunderthereceiveroperatingcharacteristiccurve NS-A coauthorship network of scientists working on net- (AUC)andPrecision[5]. work theory and experiment [20]; (3) FWEW-A 66 compo- Area under curve (AUC) can be interpreted as the proba- nent budget of the carbon exchanges occurring during the bility that a randomly chosen missing link (a link in EP) is wet and dry seasons in the graminoid ecosystem of South givenahigherscorethanarandomlychosennonexistentlink Florid [21]; (4) FWFW-A food web in Florida Bay during (alinkinU −EP). Whenimplementing,amongnindepen- the rainy season [21]; (5) USAir-The US Air transportation dentcomparisons,iftherearen′ timesthemissinglinkhasa system [22]; (6) Jazz-A collaboration network of jazz mu- higherscoreandn′′ timestheyarethesamescore,AUCcan sicians [23]; (7) TAP-yeast protein-protein binding network bereadasfollow[5]: generated by tandem affinity purification experiments [24]; (8)Power-AnelectricalpowergridofthewesternUS[19];(9) n′+0.5n′′ Metabolic-AmetabolicnetworkofC.elegans[25];(10)Yeast- AUC = . (1) n A protein-proteininteraction network in budding yeast [26]; (11) Router-A symmetrized snapshot of the structure of the If all the scores generated from independentand identical Internetat the levelof autonomoussystems [27]; (12)PB-A distribution,theaccuracyshouldbeabout0.5. Therefore,the networkof the US politicalblogs[28]. Topologicalfeatures degreetowhichtheaccuracyexceeds0.5indicateshowmuch ofthesenetworksaresummarizedinTab.I. thealgorithmperformsbetterthanpurechance. Precision is the ratio of the number of missing links pre- dicted correctly within those top-L ranked links to L, and IV. UNIVERSALITYOFPWCSPHENOMENON L = 100 in this paper. If m links are correctly predicted, thenPrecisioncanbecalculatedas[5]: TocheckwhetherthePWCSphenomenoncommonlyexists m in real networks, we divide all links into common links or Percision= . (2) strong-tie links by judging whether the number of common L neighborsbetweenthetwoendpointsislargerthanathreshold We mainly compare three local similarity indices for link β. Taking Fig. 1 as an example, when we choose β = 3, prediction, including (1) Common Neighbors(CN) [17]; (2) thelinks{A,B}and{A,C}inFigs.1(a),(b)and(c)canbe Adamic-Adar(AA)index[18];(3)ResourceAllocation(RA) correspondinglydegeneratedtothesketchesinFig1.(d)-(f), index [13]. Among which, CN index is the simplest index. wherecommonlinks andstrong-tielinksare markedby fine AAindexandRAindexhavethesimilarform,andtheyboth linksandboldlinks,respectively. depress the contribution of the high-degree common neigh- Inthispaper,thethresholdβ ischosensuchthatthenum- bors, however,Zhouetal. have shownthatthe performance ber of common links and the number of strong-tie links are ofRAindexisgenerallybetterthanAAindex. approximately equal in each network. Once the value of β LetΓ(i)betheneighborsetofnodei,|.|bethecardinality is fixed, there are seven possible configurationsfor the con- oftheset, andk(i)bethedegreeofnodei. ThenCNindex, nectedsubgraphswith3nodes(i.e.,triples[29]),alltheseven AAindexandRAindexaredefinedas configurationsareplottedinFig.2, wheretheboldlinksand 3 TABLE I: The basic topological features of twelve example networks.N andM arethetotalnumbersofnodesandlinks,respec- tively. C andrareclusteringcoefficientandassortativecoefficient, respectively. H isthedegreeheterogeneity, definedasH = hk2i, hki2 wherehkidenotestheaveragedegree[29]. Network N M C r H C.elegans 297 2148 0.308 -0.163 1.801 NS 1589 2742 0.791 0.462 2.011 FWEW 69 880 0.552 -0.298 1.275 FWFW 128 2075 0.335 -0.112 1.237 USAir 332 2126 0.749 -0.208 3.464 Jazz 198 2742 0.633 0.02 1.395 Tap 1373 6833 0.557 0.579 1.644 Power 4941 6594 0.107 0.003 1.45 Metabolic 453 2025 0.655 -0.226 4.485 Yeast 2375 11693 0.388 0.454 3,476 FIG. 1: (Color online) Degenerating the upper sketches into the Router 5022 6258 0.033 -0.138 5.503 lower cases by judging whether two links {B,A} and {A,C} are PB 1222 16724 0.36 -0.221 2.971 strong-tielinkorcommonlink. Hereweassumethatifthenumber ofcommonneighborsbetweenAandB(orAandC)islargerthan β=3,thenthelinkisstrong-tielink;otherwise,thelinkiscommon linkintheoppositecase. Finelinesandtheboldlinesin(d)-(f)are fine lines denote strong-tie links and common links, respec- thecommonlinksandthestrong-tielinks,respectively. tively. Let N ,i = 1,··· ,7 be the number of CS ,i = i i 1,··· ,7(eachCSrepresentsaconfigurationinFig.2)innet- works. If both of {A,B} and {A,C} are strong-tie links, thentheprobabilityofnodeB connectingnodeC isdefined as[30]: 3N +N P = 4 6 . (6) 1 N +3N +N 1 4 6 Ifonlyoneoflinks{A,B}or{A,C}isstrong-tielink,then theprobabilityofnodeB connectingnodeC isdefinedas: 2N +2N P = 6 7 . (7) 2 2N +2N +N 6 7 3 FIG. 2: (Color online) Seven possible configurations of connected subgraphs with three nodes. Fine lines and the bold lines are the If neither of them is strong-tie link, then the probability of commonlinksandthestrong-tielinks,respectively. nodeB connectingnodeC is: 3N +N P = 5 7 . (8) 3 3N +N +N 5 7 2 RouterandPBnetworks,whereP > P > P . Asaresult, 1 2 3 First,wedefineasubgraphwithnnodesbeaweakclique we can state that PWCS phenomenon is more significant in thenumberoflinksamongthennodesisratherdense,which thesesixnetworks. isanextendeddefinitionofn-cliquewhereallpairsofnodes are connected. Next, by calculating the probability of node B connectingC, we canjudgewhetherthe phenomenonthat V. FRIENDRECOMMENDATIONMODEL nodesarepreferentiallylinkedtothenodeswithweakclique structure(i.e.,PWCSphenomenon)commonlyexistsinanet- Given that PWCS phenomenon commonly exists in real work. We say thatthe PWCS phenomenonexistsin the net- networks,whethercanwedesignaneffectivelinkprediction work if P > P and P > P . Moreover, we say that the methodbasedonthisphenomenon. Consideringthecasesin 1 2 1 3 PWCS phenomenonis significant if P > P > P , other- Fig. 3, where node 3 ask its neighbor node 2 to introduce a 1 2 3 wise,thePWCSphenomenonisweakasP >P ≥P . friendtoit. Sincethenumberofcommonneighborsbetween 1 3 2 Table IIreportsthevaluesofP , P andP inthetwelve node2andnode3inFig.3(c)islargerthanthatofinFig.3(b) 1 2 3 real networks (labelled as RN) and the values on the corre- andisfurtherlargerthanthatofin Fig. 3(a), inotherwords, spondingnullnetworks(labelledNN)arealsocomparatively the strength of link {2,3} in Fig. 3(c) is the strongest. Ac- shown. One can find that P > P and P > P in eleven cording to PWCS phenomenon, the probability (labelled by 1 2 1 3 networks except for Metabolic network (P < P , marked f )ofnode1(callnominee,greencolor)beingintroduced 1 3 123 by red color). However,in the correspondingnullnetworks, tonode3(callacceptor,redcolor)bynode2(callintroducer, P ≈P ≈P . Also,forC.celegans,FWEW,FWFW,Power, bluecolor)inFig.3(c)shouldbelargerthanthatofinFig.3 1 2 3 4 TABLEII: Thevalues of P1, P2 andP3 in12real networks (RN) andthecorresponding nullnetworks(NN)arereported. Resultsin NNaremarkedinItalic.ResultsinnetworkswithsignificantPWCS, i.e.,P1 >P2 >P3areshowninbluecolor,andresultsinMetabolic aremarkedbyredcolorduetoitsspecificity. Network Network P1 P2 P3 FIG.3:(Coloronline)TheroleofPWCSontheprobabilityoff123. RN 0.2351 0.1654 0.1519 Node2(bluecolor,callintroducer)wanttointroducenode1(green C.elegans NN 0.1011 0.0970 0.0953 color,callnominee)tonode3(redcolor,callacceptor).Thenumber RN 0.9292 0.2392 0.5970 commonneighborbetweennode2andnode3in(a),(b)and(c)is0, NS NN 0 0.0037 0.0045 1and2,respectively. AccordingtoEq.(9),onehas(a)f123 =1/3; RN 0.5998 0.4832 0.2504 (b) f123 = 1/2; (c) f123 = 1. Namely, the probability of node FWEW NN 0.7421 0.7627 0.7647 1beingintroduced tonode 3in(c)islargerthan(b)andisfurther largerthanin(a). RN 0.4191 0.3532 0.1230 FWFW NN 0.5220 0.5259 0.5207 RN 0.7008 0.1519 0.2355 USAir NN 0.0752 0.0765 0.0797 RN 0.6902 0.3968 0.4503 neighbors. Jazz NN 0.2734 0.2790 0.2804 Withtheabovepreparations,thesimilarityindexSFRfora ij RN 0.7862 0.2969 0.3673 pairofnodesiandj isdefinedas Tap NN 0.0141 0.0136 0.0146 RN 0.2781 0.0854 0.0686 f +f Power SFR = ij ji, (11) NN 0 0 0 ij 2 RN 0.1630 0.0760 0.1643 Metabolic NN 0.0409 0.0374 0.0389 whichguaranteesSFR =SFR. ij ji RN 0.5945 0.1498 0.1530 The sketches in Fig. 4 is given to show how to calculate Yeast NN 0.0089 0.0090 0.0086 the similarity between node 1 and node 2 based on the FR RN 0.1992 0.0254 0.0022 Router index. Also,thered,blueandgreennodesdenotetheaccep- NN 0 0 0 tors, introducersand nominees, respectively. Node 2 can be RN 0.3998 0.1247 0.0855 PB introducedtonode1bynode3(seeFig.4(a))ornode4(see NN 0.0457 0.0454 0.0441 Fig.4(b)). Whennode3isanintroducer(seeFig.4(a)),who will introducenodes2, 5 and 7 (greencolor) to node 1 with (b),andthenfurtherlargerthanthatofinFig.3(a). Toreflect equalprobability,butexcludesnode4,i.e.,f231 =1/3. Sim- thementionedfact,wedefinef betheprobabilityofibeing ilarly, whennode4 is anintroducer(see Fig. 1(b)),whojust ilj introducedtojbytheircommonneighborl,whichisgivenas: introducesnodes 2 and 6 (green color) to node 1 with equal probability, but excludes node 3, i.e., f = 1/2. There- 241 1 fore, the probabilityf = 1/3+1/2 = 5/6. Likely, from f = . (9) 21 ilj k(l)−1−|Γ(l)∩Γ(j)| Figs.5(c)and(d),thevalueoff =1/2+1/2=1. There- 12 fore,theFRsimilarityindexisSFR =SFR =11/12. Based on the definition in Eq. (9), the values of f in 12 21 123 CombingEqs.(9),(10)and(11),theadvantagesofFRin- Fig.3(a),(b)and(c)are1/3,1/2and1, respectively. Thatis dexcanbesummarizedas:1)similartomanylocalsimilarity tosay,theprobabilityf canreflectthePWCSphenomenon ilj indices,thesimilaritybetweenapairofnodesincreaseswith inrealnetworks. thenumberofcommonneighbors;2)likeAAindexandRA More importantly, Eq. (9) addresses two important facts: index,FRindexdepressesthecontributionofthehigh-degree first, since node l will not introducenode j to j, as a result, commonneighbors;3)mostimportantly,FRindexcanmake 1 is subtracted in denominator of Eq. (9); second, in social useofthePWCSphenomenoninmanyrealnetworks;4)FR communication,whenafriendintroduceoneofhisfriendsto indexhashigherresolutionthanotherlocalsimilarityindices. me,heshouldintroducehisfriendsbutexcludingthecommon Forinstance,thesimilaritiesSCN,SAAandSRAarethesame friends. Therefore,thecommonneighborssetbetweenj and 13 13 13 inFigs.3(a),(b)and(c). Yet,thevalueofSFR inFig.3(c)is l (i.e., Γ(l)∩Γ(j)) should be subtracted in denominator of 13 largerthanFig.3(b),andisfurtherlargerthanFig.3(a). Eq. (9). For instance, in Fig. 3(c), node2 will notintroduce node3tonode3,andnodes4and5shouldnotbeintroduced tonode3. VI. RESULTS Let f be the weight of node i being introduced to node ij j (weuseweightratherthanprobabilitysincef maylarger ij than1),whichiswrittenas: In this section, the comparisonof FR index with CN, AA andRAindicesintwelvenetworksissummarizedinTab. III. 1 As shown in Tab. III, FR index in general outperforms the f = f = . (10) ij X ilj X k(l)−1−SCN other three indices in link prediction, regardless of AUC or l∈Γ(i)∩Γ(j) l∈Γ(i)∩Γ(j) jl Precision. Thehighestaccuracyineachlineisemphasizedin Here the value of f increases with the numberof common bold. ij 5 Moreover,thecorrelationofrankingvaluesbetweenFRin- dex and RA index is given in Fig. 5, where the percentage values in x or y axis is the top percentage of ranking val- uesbasedonPrecision. Asaresult,asmallpercentagevalue meansahigherrankingvalue.Fig.5describesthatahighRA rankingvalue of links givesrise to a highFR rankingvalue. However,ahighFRrankingvalueoflinksmayinducealow RA ranking value of links. Take Tap and Yeast networks as examples, based on FR index, some links have higher rank- ingvalues,howevertheircorrespondingrankingvaluesbased onRAindexmayverysmall(seetheregionsmarkedbypink dashboundaryinFigs.5(g)and(j)). By analyzing a typical case in the Yeast network (see Fig. 6), where two nodes A and B are the neighbors of in- troducerC(infact,therehasalinkconnectingAandBinthe FIG. 4: (Color online) Calculation of the similarity S1F2R between Yeast network). Since links {A,C} and {B,C} are strong- node1andnode2.Nodes1and2canbeintroducedbytheircommon tie links. WhenusingFR index,thesimilaritySFR is rather AB neighbors3and4.(a)node3introduceshisfriendstonode1.Only large, whichcanpredictthe existenceof link{A,B}. How- neighbornodes2,5,7canbeintroducedtonode1butexcludesnode ever,forRAindex,sincethelargedegreevalueofintroducer 4, sincenode 4has been afriend of node 1. Thus, theprobability C, the similarity SRA is very small, such an existing link of node 3 introducing node 2 to node 1 is: f231 = 1/3; (b) node {A,B}cannotbeacAcBuratelypredictedbyRAindex. 2 is introduced to node 1 by node 4, here only nodes 2 and 6 can be introduced to node 1. As a result, the probability f241 = 1/2; (c)node1isintroducedtonode2bynode3,hereonlynodes1and 5canbeintroduced tonode 1. Asaresult, theprobabilityf132 = VII. ROLEOFPWCS 1/2; (d)node1isintroducedtonode2bynode4,hereonlynodes 1 and 6 can be introduced to node 1. As a result, the probability WehavevalidatedthattheFRindexbasedonPWCSphe- f141 =1/2.Wehavef21 =1/3+1/2bycombing(a)and(b),and nomenoncanimprovetheperformanceoflinkprediction,and f12 =1/2+1/2bycombing(c)and(d).SotheFRsimilarityindex the reasonswere also analyzed. Here we wantto knowhow isS1F2R =S2F1R =11/12. the strength of PWCS affects the performance of RA index and FR index. For this purpose, we propose a generalized friendrecommendation(GFR)index,whichisgivenas: TABLE III: Comparison of SFR with SCN, SAA and SRA in 12 networks, includingAUCandPrecision. Thehighestvalueineach 1 1 1 rowismarkedinbold. SiGjFR = 2 X (cid:0)k(l)−αSCN + k(l)−αSCN(cid:1)(,12) Network Metric CN AA RA FR l∈Γ(i)∩Γ(j) jl il AUC 0.8501 0.8663 0.8701 0.8756 C.elegans Precision 0.1306 0.1374 0.1315 0.1504 where parameter 0 ≤ α ≤ 1 is used to uncover the role of AUC 0.9913 0.9916 0.9917 0.9916 PWCS. As α = 0, Eq. (12) returns to RA index, that is, NS Precision 0.8707 0.9731 0.9712 0.9832 SGFR = SRA. When α = 1, the difference between FR ij ij AUC 0.6868 0.6939 0.7017 0.7595 methodand GFR method is the absence of 1 in the denomi- FWEW Precision 0.1415 0.1551 0.1664 0.2763 natorsofEq.(12),therefore,wecansimplyviewGFRindex AUC 0.6074 0.6097 0.6142 0.6623 FWFW is the same as FR index when α = 1. As a result, with the Precision 0.0837 0.0853 0.082 0.1798 increasing of α from zero to one, SSFR index can compre- AUC 0.9558 0.9676 0.9736 0.9752 USAir hensivelyinvestigatetheroleofPWCS ontheRA indexand Precision 0.606 0.6218 0.6337 0.6586 FRindex. AUC 0.9563 0.963 0.9717 0.9714 Jazz The effect of α on the Precision in all twelve networks is Precision 0.8247 0.8401 0.8192 0.8406 plotted in Fig. 7. As illustrated in Fig. 7, several interesting AUC 0.9538 0.9545 0.9548 0.955 Tap Precision 0.7594 0.78 0.7818 0.8659 phenomenaandmeaningfulconclusionscanbesummarized: AUC 0.6249 0.6251 0.6245 0.6248 First,exceptforMetabolicnetwork,thePrecisionforthecase Power Precision 0.1215 0.0952 0.0801 0.1275 ofα > 0isfarlargerthanthecaseofα = 0(i.e.,RAindex) AUC 0.9248 0.9565 0.9612 0.9623 inallother11networks.SinceP >P andP >P inthese Metabolic 1 2 1 3 Precision 0.2026 0.2579 0.3219 0.3302 11networks,whichindicatesthatPWCSphenomenoninnet- AUC 0.9158 0.9161 0.9167 0.9172 workscanensurethehigheraccuracyofFRindex(i.e.,α=1) Yeast Precision 0.6821 0.6958 0.4988 0.8041 inlinkprediction;Second,Metabolicnetworkhasnon-PWCS AUC 0.6519 0.6523 0.652 0.6519 Router phenomenonsinceP1 > P2 andP2 < P3,andFig.7(i)sug- Precision 0.1144 0.1104 0.0881 0.0592 gests that Precision decreases with the value of α. In other AUC 0.9239 0.9275 0.9286 0.9309 PB words, FR index is invalid in network with non-PWCS phe- Precision 0.4205 0.3782 0.2509 0.3454 nomenon,whichagainemphasizesthe importanceofPWCS 6 FIG.5:(Coloronline)ThecorrelationofrankingvaluesbetweenFRindexandRAindexbasedonPrecision.Thepercentagevaluesinx-axis and y-axis are the top percentage ranking values of FR index and RA index, respectively. The regions marked by pink dash boundary in subfigures(g)and(j)correspondtothecasesinwhichsomelinkshavehigherFRrankingvaluesbuthavelowRArankingvalues. in link prediction; At last, by systematically comparing the issignificant. Unfortunately,themaximalvalueαinEq.(12) subfiguresinFig.7,onecanseethat,whenthenetworkswith is one. the denominator may be negative if α > 1. So we weakPWCSP >P ≥P (i.e.,theinsetsarelightredback- design a new index to further explore the role of significant 1 3 2 ground,seeFig.7(b),(e),(f),(g)and(j)),Precisionincreases PWCS. withαatfirstandthendecreaseswhenαisfurtherincreased SinceEq.(12)canberewrittenas (except Fig. 7(e)). However, when P > P > P (i.e., 1 2 3 networks with significant PWCS, the insets are white back- 1 2k(l)−SCN −SCN SGFR = il jl (13) ground,seeFigs.7(a),(c),(d),(h),(k)and(l)),Precisional- ij 2 X (k(l)−SCN)(k(l)−αSCN) waysincreaseswiththevalueofαevenwhenα=1.0. l∈Γ(i)∩Γ(j) jl il In view of this observation, we can conjecture the role of whenα=1. TofurtherplaytheroleofPWCS,anothersimi- PWCScanbefurtherexploredwhenthePWCSphenomenon larityindex,calledstrongfriendrecommendation(labelledas 7 ofP ,P andP indifferentnetworks.Tothisend,wedesign 1 2 3 amixedfriendrecommendation(labelledMFR)index: SSFR, P >P >P ; ij 1 2 3  SiMjFR =  SiFjR, P1 >P2,P1 >P3,P2 <P3; SRA, otherwise. (15)  ij Table 5 lists the results of MFR index and FR index in 7 networks (since MFR index is the same to FR index when P > P , P > P and P < P , in this case, it is unnec- 1 2 1 3 2 3 essary to compare the two indices). The results in Tab. V indicate that, compared with FR index, MFR index can fur- therimprovetheaccuracyoflinkprediction. VIII. CONCLUSIONS In summary, by analyzing the structural propertiesin real FIG.6:(Coloronline)AtypicalcaseintheYeastnetworkisconsid- networks, we have found that there exists a common phe- eredtoemphasize thedifference between FRindex andRA index, nomenon: nodes are preferentially linked to the nodes with wherenodesA,BandCarethenode1175,421and205intheYeast weak clique structure. Then we have proposeda friend rec- network.Twolinks{A,C}and{B,C}shareacommonendpointC, andbothofthemarestrong-tielinks. Therefore,thesimilaritySAFBR ommendationmodeltobetterpredictthemissinglinksbased isratherlarge. However,whenusingRAindex,therankingnumber onthesignificantphenomenon.Throughthedetailedanalysis ofSARBAisverylowowingtothelargedegreevalueofnodeC,caus- and experimental results, we have shown that FR index has ingthefailureofRAindexinpredictingsuchanexistinglink. Red severaltypicalcharacteristics: First,FRindexisbasedonthe nodes,greennodesandbluenodesaretheneighborsofA,BandC informationofcommonneighbors,whichisalocalsimilarity (includingthemselves), respectively. Purplenodesarethecommon index. Thus, the algorithm is simple and has low complex- neighborsofAandC;whitenodesarethecommonneighborsofA, ity; Second, the common neighbors with small degrees has BandC. greatercontributionsthanthecommonneighborswithlarger degrees;Third,FRindexcantakefulladvantageofthePWCS phenomenon,andsoforth. SFR)index,isgiveninfollowing Furthermore,wealsoproposedanSFRindextofurtherim- provetheaccuracyoflinkpredictionwhennetworkshavesig- 1 2k(l) nificantPWCSphenomenon. Atlast,byjudgingwhetherthe SSFG = . (14) ij 2 X (k(l)−SCN)(k(l)−αSCN) networkshavesignificantPWCS,weakPWCSornon-PWCS l∈Γ(i)∩Γ(j) jl il phenomenon, we have also proposed a mixed friend recom- mendationindexwhichcanincreasetheaccuracyoflinkpre- Combing Eq. (13) with Eq. (14), we can find that two sub- dictionindifferentnetworks.Inthiswork,wemainlyapplied trahendsSCN and SCN in the numeratorofEq. (13) are re- il jl FRindextounweighedandundirectednetworks,andhowto moved.SoEq.(14)canbetterplaytheroleofPWCS. generalizeourFRindextoweighted[31,32]ordirectednet- WeconjecturethattheperformanceofSFRindexisbetter works[33]isourfurtherpurpose. thanGFRindexwhenP >P >P (i.e.,significantPWCS), 1 2 3 andworsethanthatofGFRindexwhenP <P andP <P 1 2 1 3 (i.e.,non-PWCS).However,itisdifficulttodistinguishwhich one has better performancewhen P > P ≥ P (i.e., weak Acknowledgments 1 3 2 PWCS). As presented in Tab. IV, Precision in 12 networks validatesourconjecture. This work is supported by the National Natural Science Synthesizingtheaboveresults,wecanfindthattheranking Foundation of China (Grant Nos. 61473001), and par- of P , P and P hasdeterminanteffecton the performance tiallysupportedbyopenfundofKeyLaboratoryofComputer 1 2 3 of the proposedindex. Inspiredby thisclue, we may design NetworkandInformationIntegration(SoutheastUniversity), auniversalindicatortodolinkpredictionbasedonthevalues MinistryofEducation(No.K93-9-2015-03B). [1] L. Getoor and C. P. Diehl, ACM SIGKDD Explorations [2] S.Scellato,A.Noulas, andC.Mascolo, inProceedingsofthe Newsletter7,3(2005). 17th ACM SIGKDD international conference on Knowledge 8 TABLEIV:ThecomparisonofPrecisionbetweenSFRindexandGFRindex(α=1)in12networks.Theresultssuggestthattheaccuracyof linkpredictioncanbefurtherimprovedbySFRindexwhenthenetworkshavesignificantPWCS.Thehighestvalueineachcaseismarkedas bold. P1 >P2 >P3 Index C.elegans FWEW FWFW Power Router PB GFR(α=1) 0.1511 0.2676 0.172 0.1354 0.0982 0.3595 SFR(α=1) 0.1577 0.2912 0.2057 0.1658 0.112 0.4353 P1 >P3 ≥P2 P1 <P3,P2 <P3 Index NS USAir Jazz Tap Yeast Metabolic GFR(α=1) 0.9804 0.6807 0.8532 0.8568 0.8178 0.3064 SFR(α=1) 0.9744 0.6866 0.8739 0.8485 0.8587 0.2912 FIG.7: (Coloronline)EffectsofαinEq.(13)onPrecisionareplottedin12networks. InsetineachsubfigureistoshowthevaluesofP1, P2 andP3. ThebackgroundofinsetiswhitecolorwhenP1 > P2 > P3;thebackground ofinsetislightredcolorwhenP1 > P3 ≥ P2. Otherwise,thebackgroundofinsetforMetabolicisgraycolor. discoveryanddatamining(ACM,2011),pp.1046–1054. inginanImperfectWorld(Springer,2002),pp.60–73. [3] C.V. Cannistraci, G. Alanis-Lobato, andT. Ravasi, Scientific [10] A. Popescul and L. H. Ungar, in IJCAI workshop on learn- Reports3,1613(2013). ingstatisticalmodelsfromrelationaldata(Citeseer,2003),vol. [4] R. Guimera` and M. Sales-Pardo, Proceedings of the National 2003. AcademyofSciences106,22073(2009). [11] D.Liben-NowellandJ.Kleinberg,JournaloftheAmericanSo- [5] L. Lu¨ and T. Zhou, Physica A: Statistical Mechanics and its cietyforInformationScienceandTechnology58,1019(2007). Applications390,1150(2011). [12] A. Clauset, C. Moore, and M. E. Newman, Nature 453, 98 [6] L.Lu¨,L.Pan,T.Zhou,Y.-C.Zhang,andH.E.Stanley,Proceed- (2008). ingsoftheNationalAcademyofSciences112,2325(2015). [13] T.Zhou,L.Lu¨,andY.-C.Zhang,TheEuropeanPhysicalJour- [7] L.Lu¨,C.-H.Jin, andT.Zhou, PhysicalReviewE80, 046122 nalB71,623(2009). (2009). [14] W.LiuandL.Lu¨,EPL(EurophysicsLetters)89,58007(2010). [8] R.R.Sarukkai,ComputerNetworks33,377(2000). [15] Z.Liu,W.Dong,andY.Fu,Chaos: AnInterdisciplinaryJour- [9] J.Zhu,J.Hong,andJ.G.Hughes,inSoft-Ware2002:Comput- nalofNonlinearScience25,013115(2015). 9 TABLEV:ComparisonofPrecisionbetweenFRindexandWFRindexin7networks.Thehighestvalueineachcaseisgiveninbold. Metric Index C.elegans FWEW FWFW Power Router PB Metabolic FR 0.8756 0.7595 0.6623 0.6248 0.6519 0.9309 0.9623 AUC WFR 0.8771 0.7771 0.6878 0.6247 0.6516 0.9314 0.9612 FR 0.1504 0.2763 0.1798 0.1275 0.0592 0.3454 0.3302 Precision WFR 0.1577 0.2912 0.2057 0.1658 0.112 0.4353 0.3219 [16] X.Feng,J.Zhao,andK.Xu,TheEuropeanPhysicalJournalB L. Ling, N. Zhang, et al., Nucleic Acids Research 31, 2443 85,1(2012). (2003). [17] M.E.Newman,PhysicalReviewE64,025102(2001). [27] N. Spring, R. Mahajan, D. Wetherall, and T. Anderson, Net- [18] L.A.AdamicandE.Adar,SocialNetworks25,211(2003). working,IEEE/ACMTransactionson12,2(2004). [19] D.J.WattsandS.H.Strogatz,Nature393,440(1998). [28] S.D.Reese,L.Rutigliano,K.Hyun,andJ.Jeong,Journalism [20] C.VonMering, R.Krause, B.Snel, M.Cornell, S.G.Oliver, 8,235(2007). S.Fields,andP.Bork,Nature417,399(2002). [29] M.E.J.Newman,Networks: anintroduction(OxfordUniver- [21] R.Ulanowicz,C.Bondavalli,andM.Egnotovich,AnnualRe- sityPress,2010). port to the United States Geological Service Biological Re- [30] L. Lu¨ and T. Zhou, EPL (Europhysics Letters) 89, 18001 sourcesDivisionRef.No.[UMCES]CBLpp.98–123(1998). (2010). [22] http://vlado.fmf.uni-lj.si/pub/networks/data/. [31] J. Zhao, L. Miao, J. Yang, H. Fang, Q.-M. Zhang, M. Nie, [23] P.M.GleiserandL.Danon,AdvancesinComplexSystems6, P.Holme,andT.Zhou,ScientificReports5,12261(2015). 565(2003). [32] C. Aicher, A. Z. Jacobs, and A. Clauset, Journal of Complex [24] A.-C.Gavin, M. Bo¨sche, R.Krause, P.Grandi, M.Marzioch, Networks3,221(2015). A.Bauer,J.Schultz,J.M.Rick,A.-M.Michon,C.-M.Cruciat, [33] F.Guo,Z.Yang,andT.Zhou,PhysicaA:StatisticalMechanics etal.,Nature415,141(2002). anditsApplications392,3402(2013). [25] J.DuchandA.Arenas,PhysicalReviewE72,027104(2005). [26] D.Bu,Y.Zhao,L.Cai,H.Xue,X.Zhu,H.Lu,J.Zhang,S.Sun,

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.