Log-Normal Matrix Completion for Large Scale Link Prediction
Brian Mohtashemi, Thomas Ketseoglou
arXiv:1601.07714v1 [cs.SI] 28 Jan 2016

Abstract—The ubiquitous proliferation of online social networks has led to the widescale emergence of relational graphs expressing unique patterns in link formation and descriptive user node features. Matrix Factorization and Completion have become popular methods for Link Prediction for two reasons. Mutual node friendship information is low rank, and parallel computer architectures allow rapid matrix processing. Current Link Prediction literature has demonstrated vast performance improvement from using sparsity in addition to the low rank matrix assumption. However, the majority of research has introduced sparsity through the limited L1 or Frobenius norms. It has not considered the more detailed distributions which led to the graph formation and relationship evolution. In particular, social networks have been found to express either Pareto or, more recently discovered, Log-Normal degree distributions. Employing the convexity-inducing Lovasz Extension, we demonstrate how incorporating specific degree distribution information can lead to large scale improvements in Matrix Completion based Link Prediction. We introduce Log-Normal Matrix Completion (LNMC), and solve the complex optimization problem by employing the Alternating Direction Method of Multipliers. Using data from three popular social networks, our experiments yield up to 5% AUC increase over top-performing non-structured sparsity based methods.

I. INTRODUCTION

As a result of widespread research on large scale relational data, the matrix completion problem has emerged as a topic of interest in the collaborative filtering, link prediction [1]–[16], and machine learning communities. Relationships between products, people, and organizations have been found to generate low rank sparse matrices, with a broad distribution of rank and sparsity patterns. More specifically, the node degrees in these networks exhibit well known Probability Mass Functions (PMFs), whose parameters can be determined via Maximum Likelihood Estimation. In collaborative filtering or link prediction applications, row and column degrees may be characterized by differing PMFs, which may be harnessed to improve estimation accuracy. Directed networks have distinct in-degree and out-degree distributions. Undirected networks are symmetric and thus exhibit the same row-wise and column-wise degree distributions. Social networks were originally thought to follow strict Power Law distributions, but modern social networks have been found to exhibit Log-Normal degree patterns in link formation [17].

In this work, we propose Log-Normal Matrix Completion (LNMC) as an alternative to the typical L1 or Frobenius norm constrained matrix completion for Link Prediction. Incorporating the degree distribution prior generally leads to a non-convex optimization problem. However, by applying the Lovasz Extension to the resulting objective, we reduce the problem to a convex minimization over the Lagrangian. We then solve it with Proximal Descent and the Alternating Direction Method of Multipliers (ADMM). Through experiments on the Google Plus, Flickr, and Blog Catalog social networks, we demonstrate the advantage of incorporating structured sparsity information in the resulting optimization problem.

II. RELATED WORK

Link prediction has been thoroughly researched in the field of social network analysis. It is an essential element in forecasting future relationships, estimating unknown acquaintances, and deriving shared attributes. In particular, [18] introduces the concept of the Social Attribute Network, and uses it to predict the formation and dissolution of links. Their method combines features from matrix factorization, Adamic Adar, and Random Walk with Restart, and uses logistic regression to produce link probabilities. However, computing such inputs may be time-intensive, and shared attributes may be unlikely, leading to non-descriptive feature vectors.

Matrix Completion for Link Prediction has previously been investigated within the Positive Unlabeled (PU) Learning Framework, where the nuclear norm regularizes a weighted value-specific objective function [19]. Although the weighted objective improves the prediction results, the subsequent optimization is non-convex and thus subject to instability. Binary Matrix Completion employing proximal gradient descent is studied in [20]. However, sparsity is not considered, and Link Prediction is not included in the experiment section. The structural constraints that must be satisfied for provably exact completion are described in [21]. In this technical report, the required cardinality of uniformly selected elements is bounded based on the rank of the matrix. Unique rank bounds for matrix completion are considered in [22], where the Schatten p-norm is applied to the singular values of the matrix. Matrix Completion for Power Law distributed samples is studied in [23]. That work compares several models, including the Random Graph, Chung Lu-Vu, Preferential Attachment, and Forest Fire models. However, link prediction is not considered, and the resulting optimization problem is non-convex.

The concept of simultaneously sparse and low rank matrices was introduced in [24]. There, Incremental Proximal Descent sequentially minimizes the objective and thresholds the singular values and matrix entries. Because the optimization is sequential, the memory footprint is reduced. However, the objective is non-convex and may converge to a local minimum. Also, the methods tested in simulation are elementary, and more advanced techniques are well known in the link prediction community. Simultaneous row and column-wise sparsity is discussed in [23], where a Laplacian based norm is applied to rows and a Dirichlet semi-norm to columns. A comparison between nuclear and graph based norms is also provided. In [25], Kim et al. present a matrix factorization method which uses group-wise sparsity to enable specifically targeted regularization. However, the datasets we use do not identify group membership, so we do not consider affiliation in our prediction models.

Structured sparsity was thoroughly investigated in [26] and applied to Graphical Model Learning. However, that paper focuses solely on the Pareto Distribution, which characterizes scale-free networks. It does not cover the Log-Normal methods presented in this paper. Also, Link Prediction is not considered in its experimental section. Node specific degree priors are introduced in [27], which also employs the Lovasz Extension to learn scale free networks commonly formed by Gaussian Models. However, the stability of the edge rank updating is not proven, and Log-Normally distributed networks are not considered.

The Lovasz Extension and its background theory are presented in [28], where Bach provides an overview of submodular functions and their minimization.

III. PROPOSED APPROACH

A. Link Prediction

In this paper, we consider social network graphs, since they have been shown to follow Pareto and, more recently discovered, Log-Normal degree distributions. The Social Network Link Prediction problem involves estimating the link status, $X_{i,j}$, between node $i$ and node $j$, where $X_{i,j}$ is limited to binary outcomes. Together, the set of all nodes, $V$, and links, $E$, form the graph $G = (V, E)$, where $E$ is only partially known. A link status may be unknown either because the relationship between $i$ and $j$ is non-public, or because the observation is considered unreliable over several crawls of the social network. Combined, the observations can be expressed as a partial adjacency matrix, $A_\Omega$, which contains all known values in the set of observed pairs, $\Omega$. Unmeasured states between two nodes are set to 0 in $A_\Omega$. This matrix can be stored in sparse format to conserve memory and reduce operation complexity.

B. Structured Sparsity based Matrix Completion for Link Prediction

As demonstrated in [19], [20], [24], Matrix Completion solves for unknown matrix entries by combining the low-rank assumption with other side information about how the matrix formed and evolved. Traditionally, matrix completion problems are expressed as

$$\hat{X} = \arg\min_X \|A_\Omega - X_\Omega\|_F^2 + \lambda\|X\|_*, \tag{1}$$

where

$$(X_\Omega)_{i,j} = \begin{cases} X_{i,j}, & \text{if } (i,j) \in \Omega, \\ 0, & \text{otherwise.} \end{cases}$$

Here $\|\cdot\|_F$ is the Frobenius norm, and $\|\cdot\|_*$ is the nuclear norm (the Schatten p-norm with $p = 1$). The nuclear norm can be defined as

$$\|X\|_* = \sum_{i=1}^{\min\{m,n\}} \sigma_i, \tag{2}$$

where $\sigma_i$ is the $i$-th singular value, when arranged in decreasing order. $m$ and $n$ are the row count and column count, respectively. In this paper, $m$ is assumed equal to $n$. $\hat{X}$ is the estimated complete matrix after convergence is attained. These problems are generally solved using proximal gradient descent, which applies singular value thresholding at each iteration [29]. However, this formulation does not incorporate prior sparsity information encoded in the matrix. We therefore augment the problem as

$$\hat{X} = \arg\min_X \|A_\Omega - X_\Omega\|_F^2 + \lambda_1\|X\|_* + G(X), \tag{3}$$

where $G$ is defined as follows:

$$G(X) = \lambda_2\,\Gamma_{i,\alpha}(X) + \lambda_3\,\Gamma_{j,\beta}(X). \tag{4}$$

Here, $\Gamma_{i,\alpha}(X)$ is a sparsity inducing term. The subscript $i$ indicates that the sparsity is applied to matrix rows, and $j$ that it is applied to matrix columns. $\alpha$ is the prior in-degree distribution, and $\beta$ is the out-degree distribution. For the rest of this paper, we consider the case of symmetric adjacency matrices, and thus set $\lambda_3$ to 0.

C. Log-Normal Degree Prior

As demonstrated in [17], many social networks, including Google+, tend to exhibit the Log-Normal degree distribution

$$p(d) = \frac{1}{d\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln d - \mu)^2}{2\sigma^2}\right). \tag{5}$$

We therefore define $\Gamma(X)$ as the negative log-likelihood of the row degrees,

$$\Gamma(X) = -\ln\prod_i p(d_{X_i}), \tag{6}$$

where $d_{X_i}$ is the degree of the $i$-th row of $X$. This simplifies to

$$\Gamma(X) = \sum_i\left[\ln\left(d_{X_i}\sigma\sqrt{2\pi}\right) + \frac{(\ln d_{X_i} - \mu)^2}{2\sigma^2}\right]. \tag{7}$$

This is equivalent to a sum of logarithmic (Pareto-type) degree penalties with shape parameter 1, plus additional squared-log terms. Thus the final optimization problem becomes

$$\hat{X} = \arg\min_X \|A_\Omega - X_\Omega\|_F^2 + \lambda_1\|X\|_* + \lambda_2\sum_i\left[\ln\left(d_{X_i}\sigma\sqrt{2\pi}\right) + \frac{(\ln d_{X_i} - \mu)^2}{2\sigma^2}\right]. \tag{8}$$

Because of the log terms, the objective is non-convex, so convex methods cannot be applied to it directly. Optimizing this problem is a multi-part minimization, which can be solved using the Alternating Direction Method of Multipliers (ADMM).

D. Optimization

ADMM allows the optimization problem to be split into less complex sub-problems, which can be solved using convex minimization techniques. To decouple (8) into smaller subproblems, we introduce the additional variable $Y$:

$$\arg\min_X \|A_\Omega - X_\Omega\|_F^2 + \lambda_1\|X\|_* + \lambda_2\Gamma(Y) \quad \text{s.t. } X = Y.$$

Expressed in ADMM update form, the sequential optimization becomes

$$X^{k+1} = \arg\min_X\left\{\|A_\Omega - X_\Omega\|_F^2 + \lambda_1\|X\|_* + \frac{\mu}{2}\|X - Y^k + V^k\|_F^2\right\}, \tag{9}$$

$$Y^{k+1} = \arg\min_Y \lambda_2\Gamma(Y) + \frac{\mu}{2}\|X^{k+1} - Y + V^k\|_F^2, \tag{10}$$

$$V^{k+1} = V^k + X^{k+1} - Y^{k+1}. \tag{11}$$

Here $\mu$ is the ADMM penalty (step size) parameter; it is unrelated to the log-normal mean $\mu$ in (5). In practice, step sizes $\mu$ in the range $[0.01, 0.1]$ have been found to work well. Convergence is assumed, and the sequence is terminated once $\|X^{k+1} - X^k\|_F^2 < \delta$. The initial values $X^0$, $Y^0$, and $V^0$ are set to zero matrices. Although ADMM converges slowly, a relatively accurate solution can be attained in a few iterations. Because subproblem (9) is convex, it is minimized with proximal gradient descent. The proximal gradient method minimizes problems of the form

$$\text{minimize } g(X) + h(X), \tag{12}$$

using the gradient and the proximal operator as

$$X^{k,l+1} = \operatorname{prox}_{\psi_l h}\left(X^{k,l} - \psi_l\nabla g(X^{k,l})\right), \tag{13}$$

where $\psi_{l+1} = \phi\psi_l$, and $\phi$ is a multiplier applied in each gradient descent round. A value of $\phi = 0.5$ is typically sufficient and leads to convergence in about 10 rounds. A value below 0.5 would result in slower but more accurate minimization. The optimal value of $\psi_0$ is determined through experimentation. For Log-Normal Matrix Completion, $g(X) = \|A_\Omega - X_\Omega\|_F^2 + \frac{\mu}{2}\|X - Y^k + V^k\|_F^2$ and $h(X) = \lambda_1\|X\|_*$. The proximal operator of $h(X)$ becomes a thresholding of the eigenvalues, $\sigma_i$, of the argument in (13):

$$\operatorname{prox}_{\psi_l h} = Q\,\operatorname{diag}\left((\sigma_i - \psi_l\lambda_1)_+\right)Q^T, \tag{14}$$

where $Q$ is the matrix of eigenvectors. The subproblem reaches convergence when $\|X^{k,l+1} - X^{k,l}\|_F^2 < \kappa$. Sequential thresholding reduces the noise in the matrix, leaving only the strongest components of the low rank matrix. This algorithm is advantageous because it converges rapidly and selects the rank automatically. The method is known as the Iterative Soft Thresholding Algorithm (ISTA). It can be parallelized for the gradient calculation, with the results recombined for the eigenvalue decomposition. The interim result of each round of minimization is generally not sparse. However, matrix entries with values below a given threshold can be forced to 0, so that a sparse eigenvalue decomposition (such as eigs in Matlab) can be performed with minimal error.

E. Lovasz Extension

(10) is a non-convex optimization problem because it takes the log of the set cardinality function. However, the problem can be transformed into a convex form by applying the Lovasz Extension to the submodular set function. As described in [28], the Lovasz Extension takes the following form:

$$f(w) = \sum_{j=1}^{n} w_{z_j}\left[F(\{z_1,\ldots,z_j\}) - F(\{z_1,\ldots,z_{j-1}\})\right]. \tag{15}$$

Here, $z$ is a permutation of the indices $\{1,\ldots,n\}$ which orders the components of $w$ decreasingly, $w_{z_1} \ge w_{z_2} \ge \cdots \ge w_{z_n}$. $F$ is a submodular set function. The Lovasz Extension is always convex when $F$ is submodular, so convex optimization techniques can be used on the transformed problem.

To transform each row of sampled relationship information into a set, $S$, we use the support function $S_i = \operatorname{Supp}(X_i)$. As a result, $S_i \in \{0,1\}^n$, where $n$ is the number of columns in $X$. A submodular set function must obey the relationship

$$F(A \cup \{p\}) - F(A) \ge F(B \cup \{p\}) - F(B), \tag{16}$$

where $A \subseteq B$ and $p \notin B$ is an additional set element. In this paper, $F$ is a log-normal transformation of the degree $d$. The degree, $d_i = \sum_j S_{i,j}$, is modular, and thus satisfies (16) with equality. Therefore, for $F$ to be submodular, the transformation applied to the degree must itself be submodular (concave in the degree). Applying the Lovasz Extension to (7) gives

$$\Gamma(X) = \sum_{i=1}^{m}\sum_{j=1}^{n}\left[\frac{\ln^2(j+1) - \ln^2(j)}{2\sigma^2} + \frac{(\sigma^2 - \mu)\left(\ln(j+1) - \ln(j)\right)}{\sigma^2}\right]|X_{i,(j)}|, \tag{17}$$

where $|X_{i,(j)}|$ denotes the $j$-th largest magnitude in row $i$. The absolute value $|X|$ is used to maintain the positivity required for the Lovasz Extension to remain convex. Further details on the optimization of this problem are given in the Appendix.

F. Considerations

For (17) to be used, the per-row penalty in (7) must remain a submodular function of the support, i.e., a nondecreasing concave function of the degree. Thus, the first derivative of the function must remain positive and the second derivative must remain non-positive. This yields the following constraint:

$$\ln(d + \tau) \ge 1 + \mu - \sigma^2. \tag{18}$$

$\tau$ is introduced to prevent the left side of the inequality from approaching $-\infty$. In practice, a small constant is also added to or subtracted from the resulting set function to ensure that $F(\emptyset) = 0$. These small coefficients are determined during the Cross Validation phase, after obtaining the optimal $\sigma$ and $\mu$ values which satisfy the given constraints.

IV. EXPERIMENT

To compare the performance of the LNMC method with other popular Link Prediction methods, we performed an experiment using several datasets from existing literature:

1) Google+ - The Google+ dataset [18] contains 5,200 nodes and 24,690 links, captured in August 2011. The data contains both graph topology and node attribute information. However, the side features are removed, since our method requires edge status only.
2) Flickr - Flickr is a social network based on image hosting, where users form communities and friendships based on common interests. The Flickr dataset [30] contains 80,513 nodes, 5,899,882 links, and 195 groups. Group affiliation was discarded because it is irrelevant to the LNMC method.
3) Blog Catalog - Blog Catalog [30] is a blogging site where users can form friendships and acquire group membership. The dataset used contains 10,312 nodes, 333,983 links, and 39 groups. Again, the group information was removed for the purposes of this paper.

[Fig. 1: Empirical Node Degree Data and Fitted Log-Normal Probability Distribution Functions. Panels: (a) Google+, (b) Flickr, (c) Blog Cat. Each panel plots Probability against Degree on log-log axes, showing the empirical data and the log-normal fit.]

As seen in Fig. 1, all datasets follow a roughly Log-Normal distribution, with varying amounts of degree sparsity and variance. The Google+ dataset has a high number of low degree nodes, so all of its points appear concentrated at the left of the plot. However, as we will illustrate, the Log-Normal Distribution is still superior to the Pareto Distribution for link prediction. During the training phase, 10% of the data was held out for later prediction. For the purposes of demonstration, only the 1,000 highest-degree nodes are kept when forming the adjacency matrix.

V. RESULTS

A. Baseline Methods and Performance Metrics

To understand the advantage of LNMC, we compare its results against the following methods:

1) Matrix Completion with Pareto Sparsity (MCPS) - MCPS [26] uses the same algorithm outlined in this paper, except for the prior. MCPS employs the Pareto Distribution $f(d) = (\delta/d)^{\chi}$.
2) Matrix Completion with L1 Sparsity (MCLS) - MCLS is used by Richard et al. [24], and represents one of the first attempts to incorporate L1 sparsity with the low rank assumption.
3) Logistic Regression (MF + RwR + AA) - In their paper on Social Attribute Networks, Gong et al. [18] provide a method which combines features from Matrix Factorization, Random Walk with Restart, and Adamic Adar. This method solves the link prediction problem with high accuracy. In this paper, the attributes are removed from the network to allow an equal comparison with our method.

To provide a fair basis for judging performance, we use the Area Under the Curve (AUC) for comparison. Using AUC as the performance metric avoids the need for data balancing, a process which frequently undersamples the negative samples. Thus, all methods can benefit from the additional training data. The results are obtained via 10-fold Cross Validation, using random sampling for hyper-parameter selection. The rounds are averaged to produce the results shown in Table I.

B. Performance Comparison

As demonstrated in Fig. 2, LNMC outperforms MCPS, MCLS, and LR on the Google Plus dataset. The data set is highly Log-Normal [17], so LNMC's fine-tuned degree-specific prior captures the degree distribution together with the low rank features of the data, leading to high AUC values. The high number of true positives relative to the false positive rate produces a jagged ROC curve. In Fig. 3, it is clear that matrix completion with Pareto Sparsity produces low AUC values because it represents the distribution inaccurately. Similarly, the LR method fails to capture accurate low rank information, because the low rank matrix factorization is done before the gradient descent training for Logistic Regression. Because the Flickr dataset is Pareto in nature, the LNMC and MCPS methods perform the same on it. As can be seen in (17), LNMC can adapt to Scale Free Networks when the first term is small compared to the second term. Logistic Regression performs poorly because its features are fixed, whereas Matrix Completion methods automatically select the number of latent parameters to use.

[Fig. 2: Receiver Operating Characteristic for Google Plus Data. True Positive Rate vs. False Positive Rate for Matrix Completion with Log Normal Sparsity, Matrix Completion with Pareto Sparsity, Matrix Completion with L1 Sparsity, and Logistic Regression (MF + RwR + AA).]

[Fig. 3: Receiver Operating Characteristic for Flickr Data. Same axes and methods as Fig. 2.]

As seen in Fig. 4, LNMC outperforms Pareto Sparsity based matrix completion because it includes the squared log terms. The L1 sparsity used in the MCLS method is not descriptive enough for accurate matrix estimation. Logistic Regression, which incorporates more descriptive features, therefore outperforms the MCLS method. For comparison, the AUC values for each method and dataset are listed in Table I. As the table shows, LNMC achieves the highest AUC on all datasets, tying MCPS on Flickr.

[Fig. 4: Receiver Operating Characteristic for Blog Catalog Data. Same axes and methods as Fig. 2.]

TABLE I: AUC Performance Comparison

Dataset       LNMC    MCPS    MCLS    LR (MF+RwR+AA)
Google+       .8541   .8439   .8113   .8434
Flickr        .9052   .9052   .8504   .8972
BlogCatalog   .7918   .7846   .7150   .7727

VI. CONCLUSION

As demonstrated both theoretically and experimentally, LNMC captures the advantages of Pareto Sparsity in addition to those of Log-Normal Sparsity. As previously described by Gong et al. in [17], many modern social networks with undirected graph topologies exhibit Log-Normal degree distributions. By incorporating the degree-specific prior, the optimization encourages convergence to a Log-Normal degree distribution. Jointly solving for the low rank and the structured sparsity inducing prior is non-convex, so the Lovasz Extension is introduced to solve the problem efficiently. On three datasets, compared against three top-performing baselines, LNMC matches or exceeds the best baseline on every dataset. These results reveal the fundamental value of prior degree information in Link Prediction. They can also provide insight into the complex dynamics which cause links to form in similar ways across different networks.

In future research we plan to investigate incorporating side information into the objective. Node attributes introduce additional challenges, including missing features and additional training complexity.

REFERENCES

[1] H. R. Sa and R. B. Prudencio, "Supervised Learning for Link Prediction in Weighted Networks," Center of Informatics, Federal University of Pernambuco, Tech. Rep.
[2] Y. Sun, R. Barber, M. Gupta, C. C. Aggarwal, and J. Han, "Co-Author Relationship Prediction in Heterogeneous Bibliographic Networks," University of Illinois at Urbana-Champaign, Tech. Rep.
[3] G.-J. Qi, C. C. Aggarwal, and T. Huang, "Link Prediction across Networks by Biased Cross-Network Sampling," University of Illinois at Urbana-Champaign, Tech. Rep.
[4] P. Sarkar, D. Chakrabarti, and M. I. Jordan, "Nonparametric Link Prediction in Dynamic Networks," University of California Berkeley, Tech. Rep.
[5] Z. Lu, B. Savas, W. Tang, and I. Dhillon, "Supervised Link Prediction Using Multiple Sources," in IEEE 10th International Conference on Data Mining, 2010, pp. 923–928.
[6] J. Zhu, "Max-Margin Nonparametric Latent Feature Models for Link Prediction," in Proceedings of the 29th International Conference on Machine Learning, 2012.
[7] K. T. Miller, T. L. Griffiths, and M. I. Jordan, "Nonparametric Latent Feature Models for Link Prediction," in Proceedings on Neural Information Processing Systems, 2009.
[8] L. Lu and T. Zhou, "Link Prediction in Complex Networks: A Survey," University of Fribourg, Chemin du Musee, Fribourg, Switzerland, Tech. Rep.
[9] J. Leskovec, D. Huttenlocher, and J. Kleinberg, "Predicting Positive and Negative Links in Online Social Networks," in International World Wide Web Conference, 2010.
[10] Y. Dong, J. Tang, S. Wu, and J. Tian, "Link Prediction and Recommendation across Heterogeneous Social Networks," in IEEE 12th International Conference on Data Mining, 2012.
[11] P. S. Yu, J. Han, and C. Faloutsos, Link Mining: Models, Algorithms, and Applications. New York, NY: Springer, 2010.
[12] D. Li, Z. Xu, S. Li, and X. Sun, "Link Prediction in Social Networks Based on Hypergraph," in International World Wide Web Conference, 2013.
[13] H.-H. Chen, L. Gou, X. Zhang, and C. L. Giles, "Capturing Missing Edges in Social Networks Using Vertex Similarity," in K-CAP-11, 2011.
[14] E. Perez-Cervantes, J. M. Chalco, M. Oliveira, and R. Cesar, "Using Link Prediction to Estimate the Collaborative Influence of Researchers," in IEEE 9th International Conference on eScience, 2013, pp. 293–300.
[15] Z. Yin, M. Gupta, T. Weninger, and J. Han, "A Unified Framework for Link Prediction Using Random Walks," in International Conference on Advances in Social Networks Analysis and Mining, 2010, pp. 152–159.
[16] P. Symeonidis, E. Tiakas, and Y. Manolopoulos, "Transitive Node Similarity for Link Prediction in Social Networks with Positive and Negative Links," in RecSys 2010, 2010.
[17] N. Z. Gong, W. Xu, and L. Huang, "Evolution of Social-Attribute Networks: Measurements, Modeling, and Implications using Google+," in IMC, 2012, pp. 1–14.
[18] N. Z. Gong, A. Talwalkar, and L. Mackey, "Joint link prediction and attribute inference using a social-attribute network," ACM Transactions on Intelligent Systems and Technology, vol. 5, pp. 1–14, 2014.
[19] C.-J. Hsieh, N. Natarajan, and I. Dhillon, "PU Learning for Matrix Completion," in International Conference on Machine Learning 32, 2015, pp. 1–10.
[20] M. A. Davenport, Y. Plan, E. van den Berg, and M. Wootters, "1-Bit Matrix Completion," Georgia Institute of Technology, Tech. Rep., 2014.
[21] Y. Chen, S. Bhojanapalli, S. Sanghavi, and R. Ward, "Completing Any Low Rank Matrix Provably," University of California Berkeley, Tech. Rep., 2014.
[22] F. Nie, H. Wang, X. Cai, H. Huang, and C. Ding, "Robust Matrix Completion via Joint Schatten p-Norm and lp-Norm Minimization," in IEEE International Conference on Data Mining, 2012, pp. 1–9.
[23] R. Meka, P. Jain, and I. S. Dhillon, "Matrix Completion from Power-Law Distributed Samples," in Neural Information Processing Systems, 2015, pp. 1–9.
[24] E. Richard, P.-A. Savalle, and N. Vayatis, "Estimation of Simultaneously Sparse and Low Rank Matrices," in Proceedings of the 29th International Conference on Machine Learning, 2012, pp. 1–8.
[25] J. Kim, R. Monteiro, and H. Park, "Group Sparsity in Nonnegative Matrix Completion," in Proceedings of the SIAM International Conference on Data, 2012, pp. 1–12.
[26] A. Defazio and T. S. Caetano, "A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation," in Advances in Neural Information Processing Systems 25, 2015, pp. 1–9.
[27] Q. Tang, S. Sun, C. Yang, and J. Xu, "Learning Scale Free Network by Node Specific Degree Prior," Toyota Technical Institute, Tech. Rep., 2015.
[28] F. Bach, "Learning with Submodular Functions: A Convex Optimization Perspective," Ecole Normale Superieure, Tech. Rep., 2013.
[29] S. Gu, L. Zhang, W. Zuo, and X. Feng, "Weighted Nuclear Norm Minimization with Application to Image Denoising," in Computer Vision and Pattern Recognition, 2014.
[30] L. Tang and H. Liu, "Scalable Learning of Collective Behavior based on Sparse Social Dimensions," in Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009, pp. 1–10.

APPENDIX

As in [26], the optimization of (10) is performed by first imposing the symmetry constraint on $Y$:

$$\arg\min_Y \lambda_2\Gamma(Y) + \frac{\mu}{2}\|X^{k+1} - Y + V^k\|_2^2 \quad \text{s.t. } Y = Y^T.$$

This minimization leads to the following algorithms.

Algorithm 1: Optimization with Symmetry Constraint
  Data: X^{k+1}, V^k, μ, Y_init = X^{k+1} + V^k
  Data: γ, U = 0_N, ω
  Result: Y
  initialization;
  repeat
    for r = 0 → N − 1 do
      Y_{r,*} = LovaszOptimize(Y_init_{r,*}, U_{r,*})
    end
    U = U + γ(Y − Y^T)
  until ‖Y − Y^T‖_2 < ω
  Y = ½(Y + Y^T)
  return Y

Algorithm 2: LovaszOptimize
  Data: y_init, u, M
  Data: d = y_init − u, p = 0_M
  Data: set membership function ζ
  Data: θ, a transformation which translates sorted position index to original index
  Result: y
  initialization;
  for l = 0 → M − 1 do
    q = θ(l)
    p_q = |d_q| − (λ_2/μ) c_{l+1}, where c_j is the bracketed coefficient in (17)
    ζ(q).value = p_q
    r = l
    while r > 0 and ζ(θ(r)).value ≥ ζ(θ(r − 1)).value do
      join the sets containing θ(r) and θ(r − 1)
      ζ(θ(r)).value = (1/|ζ(θ(r))|) Σ_{i ∈ ζ(θ(r))} p_i
      set r to the first element of ζ(θ(r)) by sort ordering
    end
  end
  for j = 0 → M − 1 do
    y_j = ζ(j).value
    if y_j < 0 then y_j = 0 end
    if d_j < 0 then y_j = −y_j end
  end
  return y
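The row-wise step of Algorithm 2 can be read as the proximal operator of a sorted (ordered) weighted L1 penalty. The weights are the bracketed coefficients of (17). Shift by the threshold, pool adjacent violators until the sorted values are non-increasing, clip at zero, then restore order and signs. The NumPy sketch below is one possible implementation under stated assumptions, not the authors' code. The names `lognormal_marginals`, `lovasz_prox`, and `y_update` are hypothetical. `mu_admm` is the ADMM parameter $\mu$ of (9)–(11), and `mu_ln`/`sigma` are the log-normal parameters of (5). The symmetry loop of Algorithm 1 is omitted.

```python
import numpy as np


def lognormal_marginals(n: int, mu_ln: float, sigma: float) -> np.ndarray:
    """Weights c_1..c_n: the bracketed coefficients of (17).

    c_j = (ln^2(j+1) - ln^2(j)) / (2 sigma^2)
          + (sigma^2 - mu_ln) * (ln(j+1) - ln(j)) / sigma^2
    """
    j = np.arange(1, n + 1, dtype=float)
    ln_next, ln_cur = np.log(j + 1.0), np.log(j)
    return ((ln_next**2 - ln_cur**2) / (2.0 * sigma**2)
            + (sigma**2 - mu_ln) * (ln_next - ln_cur) / sigma**2)


def lovasz_prox(d: np.ndarray, weights: np.ndarray, t: float) -> np.ndarray:
    """argmin_y 0.5*||y - d||^2 + t * sum_j weights[j] * |y|_(j).

    |y|_(j) is the j-th largest magnitude. The weights must be non-negative
    and non-increasing, which is what makes this penalty convex.
    """
    sign = np.sign(d)
    order = np.argsort(-np.abs(d))           # theta: sorted position -> index
    p = np.abs(d)[order] - t * weights       # shifted values, in sorted order
    # Pool adjacent violators: merge neighbouring blocks until block means
    # are non-increasing (the set-joining loop of Algorithm 2).
    blocks = []                               # each entry: [sum, count]
    for value in p:
        blocks.append([value, 1])
        while (len(blocks) > 1 and
               blocks[-1][0] / blocks[-1][1] >= blocks[-2][0] / blocks[-2][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    pooled = np.concatenate([np.full(c, s / c) for s, c in blocks])
    pooled = np.maximum(pooled, 0.0)          # negative values -> 0
    y = np.empty_like(pooled)
    y[order] = pooled                         # undo the sort
    return sign * y                           # restore the signs of d


def y_update(Z: np.ndarray, lam2: float, mu_admm: float,
             mu_ln: float, sigma: float) -> np.ndarray:
    """Row-wise Y-update for (10), with Z = X^{k+1} + V^k (no symmetry step)."""
    weights = lognormal_marginals(Z.shape[1], mu_ln, sigma)
    if np.any(weights < 0) or np.any(np.diff(weights) > 0):
        raise ValueError("weights not non-negative/non-increasing; see (18)")
    t = lam2 / mu_admm
    return np.vstack([lovasz_prox(row, weights, t) for row in Z])
```

The explicit weight check mirrors the role of constraint (18). If the log-normal parameters make the penalty non-concave in the degree, the sorted weights stop decreasing, and the pooling step no longer computes a convex proximal operator.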

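The X-update (9) is solved by the proximal gradient iteration (13), using the eigenvalue soft-thresholding operator (14) and the geometric step schedule $\psi_{l+1} = \phi\psi_l$. The following is a minimal NumPy sketch of that subproblem, assuming a symmetric adjacency matrix. The function names, the 0/1 `mask` encoding of $\Omega$, the optional warm start `X0`, and the default values of `psi0`, `kappa`, and `max_iter` are illustrative assumptions, not values from the paper. Note that the shrinking step schedule follows the text and differs from the constant-step ISTA usually analyzed in the literature.

```python
import numpy as np


def prox_nuclear_sym(Z: np.ndarray, thresh: float) -> np.ndarray:
    """Eigenvalue soft-thresholding, as in (14), for a (near-)symmetric Z.

    Eigenvalues below `thresh` are set to zero, so negative eigenvalues are
    removed as well and the result is positive semidefinite.
    """
    Zs = 0.5 * (Z + Z.T)                      # remove round-off asymmetry
    evals, Q = np.linalg.eigh(Zs)             # Zs = Q diag(evals) Q^T
    shrunk = np.maximum(evals - thresh, 0.0)  # (sigma_i - thresh)_+
    return (Q * shrunk) @ Q.T                 # Q diag(shrunk) Q^T


def x_update(A, mask, Y, V, lam1, mu, X0=None, psi0=1.0, phi=0.5,
             kappa=1e-6, max_iter=100):
    """ISTA for subproblem (9) with step schedule psi_{l+1} = phi * psi_l.

    A, mask : observed adjacency matrix A_Omega and 0/1 indicator of Omega
    Y, V    : current ADMM variables Y^k and V^k
    """
    X = np.zeros_like(A) if X0 is None else X0.copy()
    psi = psi0
    for _ in range(max_iter):
        # gradient of g(X) = ||A_Omega - X_Omega||_F^2 + (mu/2)||X - Y + V||_F^2
        grad = 2.0 * mask * (X - A) + mu * (X - Y + V)
        X_new = prox_nuclear_sym(X - psi * grad, psi * lam1)
        converged = np.linalg.norm(X_new - X, "fro") ** 2 < kappa
        X = X_new
        if converged:                         # ||X^{k,l+1} - X^{k,l}||_F^2 < kappa
            break
        psi *= phi
    return X
```

A full ADMM round would call `x_update` for (9), then a row-wise Y-update for (10), then apply the dual step (11), `V = V + X - Y`. It would stop once the squared Frobenius change in X falls below $\delta$.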

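The derivative conditions in Section III-F and the coefficients in (17) can be checked directly from one summand of (7). Below, $h(d)$ denotes that summand with the additive constant $\ln(\sigma\sqrt{2\pi})$ dropped. Reading the indices of (17) as degrees shifted by $\tau = 1$ is our interpretation, not a statement from the original text.

```latex
h(d) = \ln d + \frac{(\ln d - \mu)^2}{2\sigma^2}
     = \frac{\ln^2 d}{2\sigma^2} + \frac{(\sigma^2 - \mu)\ln d}{\sigma^2} + \frac{\mu^2}{2\sigma^2}

h'(d) = \frac{\sigma^2 - \mu + \ln d}{\sigma^2 d},
\qquad
h''(d) = \frac{1 + \mu - \sigma^2 - \ln d}{\sigma^2 d^2}

h''(d) \le 0 \iff \ln d \ge 1 + \mu - \sigma^2

h(j+1) - h(j) = \frac{\ln^2(j+1) - \ln^2 j}{2\sigma^2}
              + \frac{(\sigma^2 - \mu)\bigl(\ln(j+1) - \ln j\bigr)}{\sigma^2}
```

Replacing $d$ by $d + \tau$ in the concavity condition gives (18). The last line is the bracketed coefficient of (17). Whenever $h'' \le 0$ holds, $h' > 0$ holds as well, since $\ln d \ge 1 + \mu - \sigma^2 > \mu - \sigma^2$.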