Greed is Good: Optimistic Algorithms for Bipartite-Graph Partial Coloring on Multicore Architectures

Mustafa Kemal Taş
Computer Science and Engineering, Sabancı University, Istanbul, Turkey
[email protected]

Kamer Kaya
Computer Science and Engineering, Sabancı University, Istanbul, Turkey
Dept. Biomedical Informatics, The Ohio State University, Columbus, USA
[email protected]

Erik Saule
Computer Science, The University of North Carolina at Charlotte, Charlotte, NC, USA
[email protected]

arXiv:1701.02628v1 [cs.DC] 10 Jan 2017

Abstract: In parallel computing, a valid graph coloring yields a lock-free processing of the colored tasks, data points, etc., without expensive synchronization mechanisms. However, a coloring is not free and the overhead can be significant. In particular, for the bipartite-graph partial coloring (BGPC) and distance-2 graph coloring (D2GC) problems, which have various use-cases within the scientific computing and numerical optimization domains, the coloring overhead can be in the order of minutes with a single thread for many real-life graphs. In this work, we propose parallel algorithms for bipartite-graph partial coloring on shared-memory architectures. Compared to the existing shared-memory BGPC algorithms, the proposed ones employ greedier and more optimistic techniques that yield a better parallel coloring performance. In particular, on 16 cores, the proposed algorithms perform more than 4× faster than their counterparts in the ColPack library which is, to the best of our knowledge, the only publicly-available coloring library for multicore architectures. In addition to BGPC, the proposed techniques are employed to devise parallel distance-2 graph coloring algorithms and similar performance improvements have been observed. Finally, we propose two costless balancing heuristics for BGPC that can reduce the skewness and imbalance on the cardinality of color sets (almost) for free. The heuristics can also be used for the D2GC problem and in general, they will probably yield a better color-based parallelization performance especially on many-core architectures.

Keywords: Greedy graph coloring; bipartite-graph coloring; distance-2 coloring; shared-memory parallel algorithms.

I. INTRODUCTION

A coloring on a graph G = (V,E) explicitly partitions the vertices in V into a number of disjoint subsets such that two vertices u,v ∈ V that are in the same color set are independent from each other, i.e., (u,v) ∉ E. Graphs have been frequently used to model data, e.g., matrices and tensors, as well as computations. In these models, two neighbor vertices usually imply a potential race-condition in a parallel execution. On the other hand, given a valid coloring on V, each color set, formed by independent vertices, can be simultaneously processed in a lock-free manner and without a synchronization overhead. Moreover, in practice, a good coloring with a small number of colors will probably yield a better performance compared to a bad coloring with a large number of colors since the number of barriers (between the color sets), and hence the parallelization overhead, will be less. Unfortunately, the distance-1 graph coloring problem (D1GC), i.e., coloring a graph with the minimum number of colors such that all adjacent vertices have different colors, is NP-Complete and hard to approximate [1], [2].

The naive, adjacency-based neighborhood is not suitable for numerous applications such as numerical optimization and efficient computation of Hessians and Jacobians. Instead, the problem can be modeled as a bipartite graph partial-coloring (BGPC) problem. In BGPC, given a bipartite graph G = (V_A ∪ V_B, E), one wants to color the vertices in V_A with the minimum number of colors, such that all vertex pairs that are adjacent to at least one common V_B vertex have different colors. A similar problem is distance-2 graph coloring (D2GC), where a graph is colored in a way that the color of each vertex is different from the colors of the vertices in its distance-2 neighborhood. For more details on the applications of BGPC and D2GC and parallel algorithms to solve these problems on shared-memory and distributed-memory architectures, we refer the reader to [3], [4], [5], [6], [7], [8].

From the parallel computing perspective, another desirable property of a good coloring is the balance on the color set cardinalities [9], [10], [11], [12]; a more balanced coloring can improve the convergence speed and the value of the final objective function for some iterative algorithms. However, a tight balance is not required if shared-memory parallelism is the only concern; if all the color set cardinalities are above a certain threshold, which depends on the number of processors/cores available and the task heterogeneities, the parallel performance will not be disrupted by the remaining imbalance since there will be enough work to feed all the available cores/processors.

Good colorings are not free and their generation adds an overhead for parallelization. Furthermore, the impact of this overhead increases if the coloring is performed sequentially and the actual job is executed on a large number of cores. This is why the parallelization of graph coloring algorithms has been extensively studied for all the problems above, e.g., [3], [6], [13], [14]. The results in the literature show that the execution time of a sequential D1GC algorithm is less than a second for many real-life graphs. However, for D2GC and BGPC, the overhead can be in the order of minutes.

The contributions of this paper are three-fold:

1) We propose parallel BGPC algorithms on multicore architectures that employ greedier and more optimistic techniques compared to the existing algorithms. We compared the performance of the proposed algorithms with the one in the ColPack library which, to the best of our knowledge, is the only publicly-available coloring library with a parallel BGPC implementation. On the average for eight UFL matrices and with 16 threads/cores, 1.47× speedup is obtained via basic optimizations, and another 2.81× speedup is obtained by employing faster and more optimistic techniques without a significant increase on the number of colors. Overall, the proposed algorithm is 4.71× faster than the parallel ColPack implementation on 16 threads and uses only 8% more colors.

2) We applied the same techniques for the D2GC problem and observed similar speedups on the five of eight matrices in our test-bed that are square and structurally symmetric.

3) We integrated two online heuristics to the proposed BGPC algorithms that aim to balance the color set cardinalities during the course of the coloring without a significant computational overhead: The first heuristic tries not to increase the number of colors, whereas the second one aggressively improves the balance by using more colors (only 11% more on average for the eight graphs in our experiments).

The rest of the paper is organized as follows: Section II introduces the notation and background on parallel coloring algorithms. The proposed BGPC algorithms are described in detail in Section III and their adaptation for D2GC is presented in Section IV. The balancing heuristics are described in Section V. Section VI presents the experimental results and Section VII briefly surveys the related coloring literature. Section VIII concludes the paper.

II. BACKGROUND AND NOTATION

Most of the recent coloring algorithms use a speculative, iterative approach which first colors the vertices optimistically in parallel hoping that a valid coloring will be generated, e.g., [3], [13], [14], [15]. The validity of the coloring is then verified in a conflict removal step; if a conflict, i.e., a pair of neighbor vertices with the same color, is detected, one of the vertices is tagged to be colored in the next iteration. Let G = (V,E) be a graph and let V_color ⊆ V be the vertices that need to be colored. Let nbor(v) ⊂ V_color define the neighborhood structure of the vertices to be colored. Throughout the text, non-negative integers will be used for coloring and -1 is used for an uncolored vertex. A pseudocode of the greedy optimistic graph coloring approach is given in Algorithms 1, 2 and 3.

Algorithm 1 GREEDYGRAPHCOLORING
Input: G = (V,E), V_color ⊆ V: vertices to be colored, nbor(.): the neighborhood function for the vertices in V_color.
Output: c[.]: a valid coloring array for V_color
1: W ← V_color
2: c[v] ← −1, ∀v ∈ V_color
3: while W is not empty do
4:   c ← COLORWORKQUEUE(G, W, c)
5:   W ← REMOVECONFLICTS(G, W, c)

Algorithm 2 COLORWORKQUEUE
Input: G = (V,E), W: vertices to color, nbor(.): the neighborhood function, c[.]: an incomplete coloring with no conflicts.
Output: c[.]: an optimistic coloring.
1: for each w ∈ W in parallel do
2:   F ← ∅   ▷ thread private forbidden color set for w
3:   for each u ∈ nbor(w) do
4:     if c[u] ≠ −1 then
5:       F ← F ∪ {c[u]}
6:   col ← 0   ▷ first-fit coloring policy
7:   while col ∈ F do
8:     col ← col + 1
9:   c[w] ← col

Algorithm 3 REMOVECONFLICTS
Input: G = (V,E): the graph to color, W: vertices to color, nbor(.): the neighborhood function, c[.]: an optimistic coloring.
Output: W_next: the work queue for next iteration, c[.]: a (probably incomplete) coloring with no conflicts.
1: W_next ← ∅   ▷ a shared queue for the next iter.
2: for each w ∈ W in parallel do
3:   for each u ∈ nbor(w) do
4:     if c[u] = c[w] and w > u then
5:       W_next ← W_next ∪ {w}   ▷ atomic
6:       break

As the algorithms show, at each iteration, a set of vertices in W are optimistically colored. A conflict removal phase is then performed to check if they are conflicting with the other vertices in V_color. When conflicts are detected, the conflicting vertices are added to the next iteration's vertex queue and the procedure is repeated. This greedy and optimistic approach can be used for almost all the coloring variants; only the definitions of V_color and nbor(.) change with respect to the problem. For the BGPC problem on a bipartite graph G = (V,E) where V = V_A ∪ V_B has two parts, V_color = V_A and for each u ∈ V_A, nbor(u) is defined as {v ∈ V_A \ {u} : ∃w ∈ V_B s.t. (u,w) ∈ E and (v,w) ∈ E}. For D2GC, V_color = V and nbor(u) is the set of other vertices in V whose shortest-path distances to u are less than or equal to two.

The BGPC problem can also be considered as a hypergraph coloring problem [6] where the elements of V_A correspond to the pins to be colored, and the ones in V_B correspond to the nets in the hypergraph which define the neighborhood. Based on this analogy, for clarity, while describing our BGPC algorithms we will use the terms vertex and net to denote a V_A and V_B vertex, respectively, in the bipartite graph. Similarly, for a vertex u ∈ V_A (v ∈ V_B), nets(u) (vtxs(v)) will denote the set of V_B (V_A) vertices adjacent to u (v).
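The speculative loop of Algorithms 1, 2 and 3 can be sketched sequentially as follows. This is only an illustrative Python sketch, not the paper's OpenMP implementation: the parallel "optimism" is imitated by letting every vertex of a round read a stale snapshot of the colors, and the function names, the list-of-lists `nbor` layout, and the explicit reset of re-queued vertices to -1 are assumptions made for the example.

```python
def color_work_queue(nbor, W, c):
    """Optimistic first-fit coloring of W (in the spirit of Algorithm 2).

    All vertices in W read the same stale snapshot, mimicking threads that
    do not see the colors being chosen concurrently in this round.
    """
    snapshot = list(c)
    for w in W:
        forbidden = {snapshot[u] for u in nbor[w] if snapshot[u] != -1}
        col = 0
        while col in forbidden:  # first-fit policy
            col += 1
        c[w] = col
    return c


def remove_conflicts(nbor, W, c):
    """Return the vertices that must be recolored (in the spirit of Algorithm 3).

    For a conflicting pair, only the vertex with the larger id is re-queued.
    """
    w_next = [w for w in W
              if any(c[u] == c[w] and w > u for u in nbor[w])]
    for w in w_next:  # assumption of this sketch: mark re-queued vertices uncolored
        c[w] = -1
    return w_next


def greedy_graph_coloring(nbor, n):
    """Iterate coloring and conflict removal until the work queue is empty."""
    W = list(range(n))
    c = [-1] * n
    while W:
        c = color_work_queue(nbor, W, c)
        W = remove_conflicts(nbor, W, c)
    return c


# A triangle: every round fixes the smallest-id vertex of the remaining queue.
triangle = [[1, 2], [0, 2], [0, 1]]
colors = greedy_graph_coloring(triangle, 3)
```

In this simulation the smallest-id vertex of each work queue always keeps its color, so the loop terminates; on the triangle it needs three rounds.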
III. ALGORITHMS FOR BIPARTITE-GRAPH COLORING

In BGPC, both of the coloring and conflict removal phases can be performed in two ways: vertex-based and net-based. The existing literature on shared-memory bipartite-graph partial coloring algorithms follows the former approach. However, the net-based approach can be more efficient since the neighborhood single-handedly defines the validity of the coloring; how much it helps depends on the iteration number and the size of the current work queue W, i.e., the number of remaining vertices to be colored.

The vertex-based BGPC approach, which is employed by the ColPack library, traverses the neighborhood starting from the vertices to be colored both for COLORWORKQUEUE and REMOVECONFLICTS as shown in Algorithms 4 and 5, respectively.

Algorithm 4 BGPC-COLORWORKQUEUE-VERTEX
Input: G = (V_A ∪ V_B, E): a bipartite graph, W: vertices to color, c[.]: an incomplete coloring with no conflicts.
Output: c[.]: an optimistic coloring.
1: for each w ∈ W in parallel do
2:   F ← ∅   ▷ thread private forbidden color set for w
3:   for each v ∈ nets(w) do
4:     for each u ∈ vtxs(v) \ {w} do
5:       if c[u] ≠ −1 then
6:         F ← F ∪ {c[u]}
7:   ...   ▷ first-fit coloring (lines 6-9 in Alg. 2)

Algorithm 5 BGPC-REMOVECONFLICTS-VERTEX
Input: G = (V_A ∪ V_B, E), W: vertices to color, nbor(.): the neighborhood function, c[.]: an optimistic coloring.
Output: W_next: the work queue for next iteration, c[.]: a (probably incomplete) coloring with no conflicts.
1: W_next ← ∅   ▷ a shared queue for the next iter.
2: for each w ∈ W in parallel do
3:   for each v ∈ nets(w) do
4:     for each u ∈ vtxs(v) \ {w} do
5:       ...   ▷ detect conflicts (lines 4-6 in Alg. 3)

For BGPC-COLORWORKQUEUE-VERTEX, the vertex-based approach needs to go over all the vertices in V_color in the first iteration. That is, each net v ∈ V_B will be visited |vtxs(v)| times and for each visit, all |vtxs(v)| edges will be processed; hence, the complexity of the neighborhood traversal in the first iteration is $\Theta\left(\sum_{v \in V_B} |vtxs(v)|^2\right)$. The complexity of the conflict removal phase for the first iteration is also $O\left(\sum_{v \in V_B} |vtxs(v)|^2\right)$. Although there can be early terminations (line 6 of Alg. 3), this worst-case bound is tight; if the optimistic coloring is valid, the neighborhood needs to be traversed for each vertex in V_color = V_A in the first conflict removal phase. Unfortunately, for many BGPC use cases, such as numerical optimization, there can be V_B nets having tens of thousands of adjacent vertices. These nets will be problematic while coloring a bipartite graph, especially for the first iteration, which dominates the overall execution time according to our experience.

In net-based coloring, the vertices are colored by observing the neighborhood from the nets' side; in BGPC, a conflict is created when "two vertices in the same vtxs set are colored with the same color". Hence, the net-based approach sounds more natural for coloring. The coloring and conflict removal phases of the most straightforward and the most optimistic net-based BGPC are given in Algorithms 6 and 7, respectively.

Algorithm 6 BGPC-COLORWORKQUEUE-NET-V1
Input: G = (V_A ∪ V_B, E): a bipartite graph.
Output: c[.]: the (most) optimistic coloring array.
1: for each v ∈ V_B in parallel do
2:   F ← ∅   ▷ thread private forbidden color set for v
3:   col ← 0   ▷ thread private
4:   for each u ∈ vtxs(v) do
5:     if c[u] = −1 or c[u] ∈ F then
6:       while col ∈ F do
7:         col ← col + 1
8:       c[u] ← col
9:     F ← F ∪ {c[u]}

Algorithm 7 BGPC-REMOVECONFLICTS-NET
Input: G = (V_A ∪ V_B, E): a bipartite graph to color, c[.]: an optimistic coloring.
Output: c[.]: an incomplete coloring.
1: for each v ∈ V_B in parallel do
2:   F ← ∅   ▷ thread private forbidden color set for v
3:   for each u ∈ vtxs(v) do
4:     if c[u] ≠ −1 then
5:       if c[u] ∈ F then
6:         c[u] ← −1
7:       else
8:         F ← F ∪ {c[u]}

The net-based coloring in Algorithm 6 processes the nets in parallel to color the vertices in the adjacency lists. The complexity of each iteration is linear in terms of the size of the graph (|V_A ∪ V_B| + |E|). However, while coloring, each thread only checks the local conflicts within the neighborhood of the current net's adjacency; this is the optimism. When a vertex u is visited (line 4), the thread first checks the value of c[u]. If c[u] is not set yet (or set to -1 in the previous conflict removal phase), or if c[u] has been used for the current net before, u is recolored (line 8). While doing that, Algorithm 6 imitates a net-level first-fit coloring (lines 6-8) for the visited vertices. This is the most optimistic net-based coloring since each thread "hopes" that a color it uses for an earlier position of the adjacency list will not appear at a later position. Unfortunately, our preliminary experiments show that this level of optimism is harmful due to the large number of conflicts it incurs.

Although the net-based approach is not straightforward to employ for the coloring phase, it suits the conflict removal phase much better; a net-based traversal given in Algorithm 7 is sufficient to detect all the existing conflicts. Moreover, unlike its vertex-based variant, the complexity of an iteration is linear in terms of the graph size. One drawback is that it may remove more colorings than required compared to the vertex-based approach. However, we did not observe a significant performance reduction due to this optimism of net-based conflict detection.
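To make the gap between the quadratic vertex-based traversal and the linear net-based one concrete, the following Python sketch counts the edge visits of both strategies and runs a sequential version of the net-based conflict removal of Algorithm 7. The function names and the representation of the bipartite graph as a list `vtxs` of adjacency lists (one per net) are assumptions of this example, and no parallelism is modeled.

```python
def vertex_based_work(vtxs):
    """Edge visits of a vertex-based pass over all vertices: sum of |vtxs(v)|^2."""
    return sum(len(net) ** 2 for net in vtxs)


def net_based_work(vtxs):
    """Edge visits of a net-based pass: each net's list is scanned once."""
    return sum(len(net) for net in vtxs)


def remove_conflicts_net(vtxs, c):
    """Sequential sketch of Algorithm 7: uncolor repeated colors within a net."""
    for net in vtxs:
        forbidden = set()  # colors already seen in this net
        for u in net:
            if c[u] != -1:
                if c[u] in forbidden:
                    c[u] = -1  # conflict: u will be recolored later
                else:
                    forbidden.add(c[u])
    return c


# Two nets sharing vertex 2; vertices 0/1 and 2/3 collide.
vtxs = [[0, 1, 2], [2, 3]]
cleaned = remove_conflicts_net(vtxs, [0, 0, 1, 1])
```

For a single net with 10,000 vertices, the two counts differ by a factor of 10,000, which is why the large nets mentioned above dominate the first iteration of the vertex-based variant.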
coPapersDBLP 540,486 409,621 303,152 133,874 Tokeepthecoloringprocessintherighttrackbyreducing TableI thenumberofconflicts,weproposealessoptimisticversion THENUMBEROFUNCOLORED(REMAINING)VERTICESAFTERTHE FIRSTITERATIONFORTWOGRAPHS,OBTAINEDFROMMATRICES ofBGPC-COLORWORKQUEUE-NET-V1asinAlgorithm8. BONE010ANDCOPAPERSDBLP,WHENALGORITHMS6AND8ARE Therearetwomainmodifications:first,toreducethenumber USEDON16THREADS. of re-colorings within the adjacency list of a single net, the algorithm first performs a pass on the adjacency list and thecoloringphase,itisnoteasytorestricttheneighborhood marks the forbidden colors (the for loop at line 4). While thatneedstobetraversedtoidentifyalltheconflicts.Hence, doing that, it also stores the vertices that need to be colored net-based conflict removal can be much faster than the inathreadprivatequeueWlocal (line8).Afterthefirstpass, vertex-based variant for the first few iterations. Although it the vertices in Wlocal are visited and colored one-by-one. can make the performance even worse for later iterations, The second modification is applied while coloring these in our experiments, 78% of the runtime is observed to be vertices; instead of using a first-fit policy that uses the used on the first iteration. That number goes up to 89% for smallest possible color for a vertex, we employ a reverse the first two iterations on average for eight graphs we used. first-fit policy (lines 9–14) that uses the largest possible Thus, attacking these first iterations would be enough. color smaller than |vtxs(v)| while coloring the vertices in Figure 1 shows the execution times of each iteration of W . This policy never uses a negative color since there local different algorithms while coloring coPapersDBLP with 16 are at most |vtxs(v)| vertices in W and |vtxs(v)| local threads. In the figure, an algorithm X-Y applies X-based colors can still be used for them. 
Besides, since |vtxs(v)| is a lower-bound on the number of colors used, we do not expect a large increase on the number of colors used. 1000 Moreover, reverse first-fit is expected to produce less Conf. Removal Coloring number of conflicts compared to the first-fit in Algorithm 6, ec) since it does not use always use the same small colors but e (ms100 prioritize different colors for each net. To understand the &m n benefits of these modifications better, we refer the reader to o & u 10 Table I where the number of colored (remaining) vertices ec Ex after the first iteration is presented for two graphs when Algorithms 6 and 8 are employed. 1 A drawback of the net-based conflict detection is the 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 need of traversing all the nets for all iterations. For the V-V-64D V-N∞ V-N1 V-N2 N1-N2 N2-N2 Algorithm - Rounds vertex-based approach, it is sufficient to visit only the neighborhoodoftheverticescoloredatthecurrentiteration. Figure 1. The execution times (in msec.) of each iteration for various However, without an intelligent net-marking technique in algorithmswhilecoloringcoPapersDBLPwith16threads. coloring and Y-based conflict removal where the letter V Algorithm 9 D2GC-COLORWORKQUEUE-NET and N denote vertex- and net-based, respectively. A number Input: G=(V,E): a graph, c[.]: an incomplete coloring. nadjacenttotheletterNdenotesthatthealgorithmperforms Output: c[.]: the (most) optimistic coloring array. net-basedapproachforthefirstniterationsandswitchtothe 1: for each v ∈V in parallel do vertex-based approach. A more detailed explanation of the 2: F ←∅ : thread private forbidden color set for v algorithms is given in Section VI. 
The figure tells that: 1) 3: Wlocal ←∅ : thread private vertices to be colored mostofthetimeisspentforthecoloring,2)mostofthetime 4: if c[v](cid:54)=−1 then 5: F ←F ∪{c[v]} is spent in the first iterations, 3) using net-based conflict re- 6: else movalateveryiterationcanmaketheperformanceworse(V- 7: Wlocal ←Wlocal∪{v} N∞), 4) using net-based coloring is a performance-wise 8: for each u∈ nbor(v) do good idea for the first iteration (N1-N2), 5) using an ad- 9: if c[u](cid:54)=−1 and c[u]∈/ F then ditional net-based coloring at the second iteration is not 10: F ←F ∪{c[u]} useful (N1-N2). The last observation can be obsolete if a 11: else better net-based (or a hybrid) coloring approach is found. 12: Wlocal ←Wlocal∪{u} Implementation details: For all the algorithms described 13: col←|nbor(v)| (cid:46) reverse first-fit coloring above,thememoriesfortheforbiddencolorsetF andthelo- 14: for each u∈Wlocal do 15: while col∈F do calvertexqueuesW areallocatedonlyonceandsimple local 16: col←col−1 arraysareusedtorealizethem.Furthermore,thesestructures 17: c[u]←col are never actually emptied or reset. For each thread, F 18: col←col−1 is repetitively used for different nets/vertices via different markers without any reset operation. Similarly, the local queueW isemptiedbyonlysettingalocalpointerto0. Algorithm 10 D2GC-REMOVECONFLICTS-NET local Input: G = (V,E): a graph to color, c[.]: an optimistic coloring. Output: c[.]: an incomplete coloring. IV. ALGORITHMSFORDISTANCE-2GRAPHCOLORING 1: for each v ∈V in parallel do 2: F ←∅ : thread private forbidden color set for v The net-based approach can also be used for the D2GC 3: if c[v](cid:54)=−1 then problem. 
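The reverse first-fit net-based coloring of Algorithm 8 and the reset-free realization of F and W_local mentioned in the implementation details can be sketched together as below. This is a sequential Python illustration under stated assumptions: the names `mark`, `w_local` and `tail`, the use of the net id as the marker value, and the defensive `col >= 0` guard are choices of this sketch, not details taken from the ColPack-based implementation.

```python
def color_nets_reverse_first_fit(vtxs, c):
    """Sequential sketch of Algorithm 8 with a marker array instead of a set F.

    mark[col] == v means "col is forbidden for net v", so the array never
    has to be cleared between nets; the local queue is "emptied" by
    resetting its tail pointer to 0.
    """
    max_net = max((len(net) for net in vtxs), default=0)
    num_colors = max(max_net, max(c, default=-1) + 1)
    mark = [-1] * num_colors      # allocated once, never reset
    w_local = [0] * max_net       # allocated once, reused for every net
    for v, net in enumerate(vtxs):
        tail = 0                  # empty the local queue
        for u in net:             # first pass: mark forbidden colors
            if c[u] != -1 and mark[c[u]] != v:
                mark[c[u]] = v
            else:
                w_local[tail] = u
                tail += 1
        col = len(net) - 1        # reverse first-fit starts at |vtxs(v)| - 1
        for i in range(tail):
            # Guard is defensive: the counting argument in the text implies
            # col stays non-negative, but Python would wrap a negative index.
            while col >= 0 and mark[col] == v:
                col -= 1
            c[w_local[i]] = col
            col -= 1
    return c


nets = [[0, 1, 2], [2, 3]]
result = color_nets_reverse_first_fit(nets, [-1, -1, -1, -1])
```

Run sequentially, the sketch produces no conflicts; in a parallel run, two nets sharing a vertex could still overwrite each other's choice, which is what the conflict removal phase is for.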
IV. ALGORITHMS FOR DISTANCE-2 GRAPH COLORING

The net-based approach can also be used for the D2GC problem. Due to the similarity between the problem definitions of BGPC and D2GC, the corresponding vertex- and net-based algorithms can be implemented along the lines of the bipartite graph partial coloring algorithms given above with a single difference: distance-1 neighbors must also be considered in the neighborhood. Here for completeness, we present the pseudo-codes for net-based D2GC coloring and conflict removal phases in Algorithms 9 and 10, respectively, but skip the vertex-based versions due to the space limitation. Since the input graph G = (V,E) is unipartite rather than bipartite, instead of nets(u) and vtxs(u), the notation nbor(u) will be used to denote the adjacency list of a vertex u ∈ V. However, for consistency, we will keep naming these greedier versions as net-based.

Algorithm 9 D2GC-COLORWORKQUEUE-NET
Input: G = (V,E): a graph, c[.]: an incomplete coloring.
Output: c[.]: the (most) optimistic coloring array.
1: for each v ∈ V in parallel do
2:   F ← ∅   ▷ thread private forbidden color set for v
3:   W_local ← ∅   ▷ thread private vertices to be colored
4:   if c[v] ≠ −1 then
5:     F ← F ∪ {c[v]}
6:   else
7:     W_local ← W_local ∪ {v}
8:   for each u ∈ nbor(v) do
9:     if c[u] ≠ −1 and c[u] ∉ F then
10:      F ← F ∪ {c[u]}
11:    else
12:      W_local ← W_local ∪ {u}
13:  col ← |nbor(v)|   ▷ reverse first-fit coloring
14:  for each u ∈ W_local do
15:    while col ∈ F do
16:      col ← col − 1
17:    c[u] ← col
18:    col ← col − 1

Algorithm 10 D2GC-REMOVECONFLICTS-NET
Input: G = (V,E): a graph to color, c[.]: an optimistic coloring.
Output: c[.]: an incomplete coloring.
1: for each v ∈ V in parallel do
2:   F ← ∅   ▷ thread private forbidden color set for v
3:   if c[v] ≠ −1 then
4:     F ← F ∪ {c[v]}
5:   for each u ∈ nbor(v) do
6:     if c[u] ≠ −1 then
7:       if c[u] ∈ F then
8:         c[u] ← −1
9:       else
10:        F ← F ∪ {c[u]}

Unlike the BGPC algorithms, for D2GC, the threads visit actual vertices to be colored; this is why both of the D2GC coloring and conflict removal algorithms first process the color of the visited vertices (lines 4-7 of Algorithm 9 and lines 3-4 of Algorithm 10). This is necessary to handle the distance-1 neighbors, which is the additional requirement for D2GC compared to BGPC. The same reverse first-fit policy is applied while coloring the vertices; the only difference is that the candidate color is initialized with |nbor(v)| instead of |vtxs(v)| − 1 (as in BGPC), since the vertex assigned to a thread will also be colored by the thread, requiring at least |nbor(v)| + 1 available colors (including color 0).

V. BALANCING COLOR SET CARDINALITIES

As mentioned before, graph coloring has been frequently used to parallelize a large task with many sub-tasks. In our preliminary experiments, the (reverse) first-fit policy generated a few large color sets (of small colors) and thousands of color sets with less than 2 elements for a real-life optimization problem. This result is in concordance with a comprehensive recent study focusing solely on balancing, parallel balancing heuristics, and their practical impacts on parallel computing [9]. In fact, on a single multicore CPU, the performance reduction (in FLOPS) may not hurt too much since most of the vertices, with small colors, can still be processed in parallel. However, the impact of the imbalance increases with the number of processors/cores. Furthermore, in most of the iterative algorithms, processing only a few vertices and updating the current solution can be harmful from the optimization perspective since this restricts the dimensions of the moves in the search space performed to reach a better solution.

In this work, we experimented on cost-free and unsupervised balancing heuristics within the BGPC and D2GC algorithms proposed above. The straightforward choice would be keeping color set cardinalities dynamically throughout the execution, but this is expensive especially for a large number of cores. Instead, the first proposed heuristic tries to keep the number of colors the same as much as possible and the second one aggressively applies balancing and hence increases the number of colors (only around 10% on average). The heuristics are given in Algorithms 11 and 12 for the vertex-based approach. The net-based variants are also similar.
Algorithm 11 COLORWORKQUEUE-B1
Input: G = (V,E), W: vertices to color, nbor(.): the neighborhood, c[.]: an incomplete coloring with no conflicts.
Output: c[.]: an optimistic coloring.
1: col_max ← 0   ▷ thread private
2: for each w ∈ W in parallel do
3:   ...   ▷ lines 2-6 of Alg. 2
4:   if w mod 2 = 0 then
5:     col ← col_max
6:     while col ∈ F do
7:       col ← col − 1
8:     if col = −1 then
9:       col ← col_max + 1
10:      while col ∈ F do
11:        col ← col + 1
12:  else
13:    col ← 0
14:    while col ∈ F do
15:      col ← col + 1
16:  c[w] ← col
17:  col_max ← max(col_max, col)

Algorithm 12 COLORWORKQUEUE-B2
Input: G = (V,E), W: vertices to color, nbor(.): the neighborhood, c[.]: an incomplete coloring with no conflicts.
Output: c[.]: an optimistic coloring.
1: col_max ← 0   ▷ thread private
2: col_next ← 0   ▷ thread private
3: for each w ∈ W in parallel do
4:   ...   ▷ lines 2-6 of Alg. 2
5:   col ← col_next
6:   while col ∈ F do
7:     col ← col + 1
8:   if col > col_max then
9:     col ← 0
10:    while col ∈ F do
11:      col ← col + 1
12:  c[w] ← col
13:  col_max ← max(col_max, col)
14:  col_next ← min(col + 1, col_max/3 + 1)

In the first balancing heuristic B1, each thread keeps track of the maximum color it uses (col_max at line 1). The threads employ the first-fit policy for the odd-numbered vertices (or nets) and otherwise, they employ the reverse first-fit policy starting from col_max. Unlike the original BGPC and D2GC algorithms, starting from col_max instead of |nbor(w)| − 1 necessitates a safety check (line 8) for the case where the downward search finds no available color. If this is the case, the heuristic initiates a first-fit starting from col_max + 1. By performing alternating policies w.r.t. the vertex (or net) id, B1 hopes to distribute the colors evenly in the interval [0, col_max]. If there is no available color in this interval, it extends the size of the interval.

The second heuristic B2, given in Algorithm 12, keeps a variable col_next in addition to col_max as the color to start the search from. The idea is the same: the heuristic wants to distribute the colors in the interval [0, col_max] but increments the color to start by one for each vertex/net. To aggressively favor large color numbers and focus on the later colors in the interval more, the minimum color to start is set to col_max/3 + 1 (the last line of Alg. 12). However, filling these color sets with more vertices increases the probability of them being in a forbidden-color array. Thus, more colors are expected to appear during the course of execution due to the conflicting nature of balancing and using fewer colors.

VI. EXPERIMENTS

All the experiments in the paper are performed on a single machine running 64-bit CentOS 6.5, equipped with 64GB RAM and a dual-socket Intel Xeon E7-4870 v2 clocked at 2.30 GHz where each socket has 15 cores (30 in total). For the multicore implementations, we used OpenMP and all the codes are compiled with gcc 4.9.2 with the -O3 optimization flag enabled. For each problem, we experimented on eight different algorithms which are combinations of the heuristics given in Sections III and IV. For fairness, all the algorithms are implemented within the ColPack environment using the same data structures as much as possible. All the algorithms are summarized below:

• V-V: Vertex-based coloring and conflict removal with first-fit policy. This is the default implementation of ColPack for BGPC. For D2GC, ColPack does not have a parallel implementation but a sequential one exists. We implemented the parallel version based on the BGPC algorithm by adding the corresponding statements for distance-1 neighbors.
• V-V-64: Same as V-V but the chunk-size for dynamic scheduling of OpenMP threads is set to 64.
• V-V-64D: In ColPack's conflict removal, a conflicting vertex is immediately added to the shared work queue of the next iteration. Unlike V-V-64, this algorithm performs a lazy construction by using private queues for each thread that are combined at the end of each iteration.
• V-N∞, V-N1 and V-N2: Vertex-based coloring (64D) with net-based conflict removal in all, the first, and the first two iterations, respectively. After that, the algorithms switch to vertex-based (64D) conflict removal.
• N1-N2 and N2-N2: Similarly, these two algorithms use net-based coloring in the first iteration and in the first two iterations, respectively, and both use net-based conflict removal only in the first two iterations. The algorithms switch to the vertex-based (64D) variants after that.

The experiments are performed on the eight graphs given in Table II, which are generated from their corresponding UFL matrices [18]. Seven of the eight graphs have been taken from the coloring and related parallel computing literature [9], [16], [17]. We also included a matrix from the MovieLens dataset [19], 20M movielens, since matrix decomposition, together with our preliminary experiments on such matrices, is the application that motivated this study. For BGPC, we colored the columns of these matrices, where the rows are considered as the nets. For D2GC, we used the 5 of the 8 matrices that are structurally symmetric. This is denoted in the last column of the table.

A. Experiments for bipartite graph partial coloring

The execution times of the BGPC algorithms for each matrix, as well as the number of distinct colors, are given in Figure 2, and the results are summarized in Table III. When the natural vertex order is used, compared to the sequential ColPack implementation of BGPC, i.e., V-V, one can obtain a 6.01× speedup on 16 threads with a 1% increase in the number of colors (V-N2). When the net-based coloring is employed for one iteration (N1-N2), the speedup increases to 11.38× with a small, 8% increase in the number of colors. These algorithms are 2.17× and 4.12× faster, respectively, than the parallel BGPC in ColPack on 16 threads.

We also compared the results when the smallest-last order in ColPack is employed. As Table II shows, this ordering indeed reduces the number of colors in most of the cases. The results of these experiments are summarized in Table IV. Since the sequential ColPack execution for this ordering is slower than that of the natural ordering, the speedups increase: compared to sequential V-V, the algorithms V-N2 and N1-N2 are 10.09× and 16.76× faster, respectively, with 16 threads. Compared to parallel V-V on 16 threads, N1-N2 is 4.43× faster with a 9% increase in the number of colors used.

Table III. The average speedups over sequential and parallel V-V on 16 threads and the increase in the number of colors when the natural ordering of the columns is used. The numbers are the geometric means of the individual results for each matrix.
Columns: algorithm; average #colors normalized w.r.t. V-V; speedup over sequential V-V for t = 2, 4, 8, 16; speedup over V-V for t = 16.

Algorithm   #colors   t=2    t=4    t=8    t=16    over V-V (t=16)
V-V         1.00      0.74   1.24   1.88    2.76   1.00
V-V-64      1.01      0.81   1.40   2.36    4.00   1.45
V-V-64D     1.01      0.85   1.46   2.41    4.05   1.47
V-N∞        1.01      1.47   2.34   3.65    5.84   2.11
V-N1        1.01      1.48   2.35   3.64    5.85   2.11
V-N2        1.01      1.49   2.37   3.71    6.01   2.17
N1-N2       1.08      2.39   4.24   7.17   11.38   4.12
N2-N2       1.07      1.44   2.63   4.57    7.50   2.71

Table IV. The average speedups over sequential and parallel V-V on 16 threads and the increase in the number of colors when the smallest-last order of the columns is used. The numbers are the geometric means of the individual results for each matrix.
Columns: as in Table III.

Algorithm   #colors   t=2    t=4    t=8    t=16    over V-V (t=16)
V-V         1.00      0.93   1.65   2.81    3.78   1.00
V-V-64      1.01      0.99   1.89   3.55    6.41   1.70
V-V-64D     0.99      1.04   1.99   3.75    6.86   1.81
V-N∞        1.00      1.62   3.01   5.41    9.20   2.43
V-N1        1.01      1.71   3.19   5.83   10.07   2.66
V-N2        0.99      1.72   3.21   5.87   10.09   2.67
N1-N2       1.09      3.47   6.26  10.82   16.76   4.43
N2-N2       1.10      2.24   4.04   6.94   11.19   2.96

B. Experiments for distance-2 graph coloring

For D2GC, we experimented on five of the eight matrices in our dataset, as explained above. We selected four algorithms which obtained promising results in the BGPC experiments. The results are presented in Table V. Similar to BGPC, 16-thread V-N1 and N1-N2 are 8.97× and 13.2× faster than sequential V-V with only 4% and 9% increases in color counts, respectively. V-V-64D is used to normalize the 16-thread speedups since all the algorithms employ the 64D option. When the improvements due to the chunk size and the lazy work-queue construction are factored out, the optimism in N1-N2 alone yields a 2× speedup on 16 threads with only around a 5% increase in the number of distinct colors.

Table V. The average speedups over sequential V-V and parallel V-V-64D on 16 threads and the increase in the number of colors (over V-V) when the natural ordering of the columns is used. The numbers are the geometric means of the individual results for each matrix. The results are the averages of 10 experiments for each matrix-algorithm-thread triplet.
Columns: algorithm; colors w.r.t. V-V; speedup over sequential V-V for t = 2, 4, 8, 16; speedup over V-V-64D for t = 16.

Algorithm   Colors   t=2    t=4    t=8    t=16    over V-V-64D (t=16)
V-V-64D     1.04     1.38   2.18   3.46    6.11   1.00
V-N1        1.04     2.32   3.38   5.22    8.97   1.39
V-N2        1.04     2.27   3.37   5.24    8.87   1.37
N1-N2       1.09     2.49   4.44   7.85   13.20   2.00

C. Experiments on balancing

The impact of the balancing heuristics B1 and B2 is presented in Table VI for the BGPC experiments. The heuristics are applied to V-N2 and N1-N2, and the results are compared with their original implementations. The experimental results show that applying these heuristics is free, i.e., as expected, there is no computational overhead. For B1, the standard deviation of the color set cardinalities decreases to 0.69× and 0.84× of its original value when applied to V-N2 and N1-N2, respectively, at the expense of a 4% increase in the number of colors. For B2, which aggressively tries to reduce the number of colors, the standard deviation decreases to 0.25× and 0.62× with around 13% and 9% increases in the number of colors for V-N2 and N1-N2, respectively.
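The quantities reported for the balancing heuristics (number of color sets, average cardinality, and standard deviation of the cardinalities, each normalized by the unbalanced run) can be computed from a color assignment roughly as in the following minimal Python sketch. The function names and the toy colorings are hypothetical and are not data from the experiments; the use of the population standard deviation is also an assumption, since the text does not state which variant is used.

```python
from collections import Counter
from statistics import pstdev


def color_set_stats(colors):
    """Return (#color sets, average cardinality, std. dev. of cardinalities).

    `colors` is a sequence holding the color assigned to each vertex.
    """
    cardinalities = list(Counter(colors).values())
    num_sets = len(cardinalities)
    average = sum(cardinalities) / num_sets
    # Assumption: population standard deviation over the color sets.
    return num_sets, average, pstdev(cardinalities)


def normalize(stats, baseline):
    """Divide each statistic by the corresponding unbalanced (baseline) value."""
    return tuple(s / b for s, b in zip(stats, baseline))


# Hypothetical toy colorings of 12 vertices (illustration only).
unbalanced = [0] * 7 + [1] * 3 + [2] * 2
balanced = [0] * 4 + [1] * 3 + [2] * 3 + [3] * 2

base = color_set_stats(unbalanced)
bal = color_set_stats(balanced)
ratios = normalize(bal, base)
print("normalized (#sets, avg. card., std. dev.):", ratios)
```

Because every vertex belongs to exactly one color set, the average cardinality equals the vertex count divided by the number of color sets, so the normalized average is the reciprocal of the normalized number of sets. This agrees with the 1.04 and 0.96 entries reported for B1.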
To better visualize the impact of these balancing heuristics, Figure 3 shows the distribution of color set cardinalities for the original and balanced executions of V-N2 and N1-N2 on coPapersDBLP.

Table VI. Impact of balancing heuristics, B1 and B2, on the color set cardinalities and the number of color sets for the parallel BGPC algorithms V-N2 and N1-N2 on 16 threads. Results are normalized with the original unbalanced algorithms denoted with -U.
All values are normalized w.r.t. X-N2.

Algorithm   Coloring time   #Color sets   Average card.   Std. dev.
V-N2-U      1.00            1.00          1.00            1.00
V-N2-B1     0.95            1.04          0.96            0.69
V-N2-B2     0.95            1.13          0.89            0.25
N1-N2-U     1.00            1.00          1.00            1.00
N1-N2-B1    0.99            1.04          0.96            0.84
N1-N2-B2    0.99            1.09          0.91            0.62

[Figure 2 panels: (a) 20M movielens, (b) af shell, (c) bone010, (d) channel, (e) coPapersDBLP, (f) HV15R, (g) nlpkkt120, (h) uk 2002. Horizontal axis: algorithm (V-V, V-V-64, V-V-64D, V-N∞, V-N1, V-N2, N1-N2, N2-N2); left axis: execution time (sec); right axis: #colors.]
Figure 2. The execution times (left axis) on 2, 4, 8, 16 threads, respectively, and the number of colors (right axis) for all the matrices and algorithms.

Table II. Graphs/matrices used in the experiments: columns 2-4 are the numbers of rows, columns, and nonzeros, respectively. The next two columns are the maximum number of nonzeros in a column and the standard deviation of the nonzero distribution. Columns 7-8 show the execution time of the sequential BGPC algorithm and the average number of colors when the natural row order is employed. The next columns do the same for the smallest-last order implemented in ColPack to reduce the number of distinct colors. The last column shows if the matrix is used in the BGPC and D2GC experiments, respectively. The ordering time is not included in the table. Moreover, since the executions are sequential, a conflict detection phase is not performed.

                                                         Column deg.          Natural                Smallest-last          Used
Matrix              #rows        #cols        #nnz          max.     Std. dev.   Exec. time   #colors   Exec. time   #colors   BGPC  D2GC
20M movielens       26,744       138,493      20,000,263    67,310   3,085.81    587.15       70,815    1,236.33     68,077    ✓     ×
af shell [16]       1,508,065    1,508,065    27,090,195    35       1.00        3.39         50        4.13         45        ✓     ✓
bone010 [16]        986,703      986,703      36,326,514    63       7.61        4.28         132       6.86         110       ✓     ✓
channel [9]         4,802,000    4,802,000    42,681,372    18       1.00        2.57         39        4.75         36        ✓     ✓
coPapersDBLP [9]    540,486      540,486      15,245,729    3,299    66.23       6.73         3,321     9.68         3,300     ✓     ✓
HV15R [17]          2,017,169    2,017,169    283,073,458   484      53.95       66.94        508       87.01        484       ✓     ×
nlpkkt120 [16]      3,542,400    3,542,400    50,194,096    28       3.00        4.22         59        7.88         49        ✓     ✓
uk-2002 [9]         18,520,486   18,520,486   298,113,762   2,450    27.51       32.66        2,450     41.23        2,450     ✓     ×

VII. RELATED WORK

Coloring has mostly been investigated for distance-1 coloring, but most ideas can be ported to other variants. Since graph coloring is NP-Complete [1] and hard to approximate [2] in most of its variants, the vertices are greedily colored one after another, and the lowest available color for a vertex is selected. Such an algorithm produces a coloring with at most 1+∆ colors in distance-1 coloring, where ∆ is the maximum vertex degree. To avoid the worst case, however, it is common to carefully choose the order in which the vertices are processed [7], using either a static ordering [20], [21] or a dynamic ordering [22].

Earlier parallel coloring algorithms [23], [24], [25] are based on generating maximal independent sets in parallel via algorithms such as [26]. Recent techniques, in contrast, optimistically color the vertices in parallel, assuming that a valid coloring will be generated, and then verify the validity of the coloring. Whenever two neighboring vertices have the same color, one of them is tagged to be recolored in the next iteration of the algorithm. This technique was successfully applied on distributed-memory machines [27], [28], [29], [30], including for BGPC and D2GC [5], [6]. The algorithm was also investigated on shared-memory, multicore, and manycore architectures [13], [31], [16], [8], [14] and on hybrid MPI + OpenMP systems [32]. One common point of [5], [6] and the proposed work is that the conflict removal phase of D2GC is performed around middle vertices, which is similar to the net-based conflict removal. Nevertheless, the authors studied D2GC in the distributed setting and applied the approach in all iterations.

The balanced graph coloring problem has been studied in the literature from different aspects. From a theoretical perspective, the term "equitable" is used for colorings in which the color set cardinalities differ by at most one [10], [11]. The most comprehensive study from the parallel computing perspective was recently introduced by Lu et al. [9]. In this work, we follow a similar approach but mostly aim to devise costless and online balancing heuristics that can be applied to the parallel greedy graph coloring algorithms.

VIII. CONCLUSION AND FUTURE WORK

In this work, we proposed novel, greedier, and more optimistic parallel algorithms for BGPC and D2GC. We also proposed two costless balancing heuristics that can be applied to both BGPC and D2GC, as well as to other coloring variants, to balance the color set cardinalities and improve the impact of the coloring on the real application to be parallelized. The results show that the proposed techniques are useful in practice and improve both the performance and the quality of the coloring.

The proposed techniques are suitable for GPUs and Intel Xeon Phi architectures, which can be considered as future work. In fact, the task sizes in the vertex-based approach, i.e., the neighborhood sizes, deviate much more than those of the net-based approach, i.e., the number of vertices adjacent to a net, which can be an advantage while parallelizing the coloring algorithms on manycore architectures. We also believe that a better net-based coloring and a better cost-free, self-balancing heuristic are worth investigating, since their impact will be significant, as the experimental results imply.

[Figure 3 panels: (a) Balancing coPapersDBLP for V-N2 (series V-N2-U, V-N2-B1, V-N2-B2); (b) Balancing coPapersDBLP for N1-N2 (series N1-N2-U, N1-N2-B1, N1-N2-B2). Horizontal axis: color set (sorted w.r.t. cardinality); vertical axis: #vertices in the color set (log scale).]
Figure 3.
Impact of balancing heuristics, B1 and B2, on the color set cardinalities and the number of color sets for the parallel BGPC algorithms V-N2 (left) and N1-N2 (right) on 16 threads for coPapersDBLP.

ACKNOWLEDGMENTS

Kamer Kaya was supported by the TÜBİTAK BIDEB 2232 program under grant number 115C018.

REFERENCES

[1] D. W. Matula, "A min-max theorem for graphs with application to graph coloring," SIAM Review, vol. 10, pp. 481–482, 1968.
[2] D. Zuckerman, "Linear degree extractors and the inapproximability of max clique and chromatic number," Theory of Computing, vol. 3, pp. 103–128, 2007.
[3] A. H. Gebremedhin, D. Nguyen, M. M. A. Patwary, and A. Pothen, "ColPack: Software for graph coloring and related problems in scientific computing," ACM Trans. Math. Softw., vol. 40, no. 1, pp. 1:1–1:31, Oct. 2013.
[4] T. F. Coleman and J. J. Moré, "Estimation of sparse Jacobian matrices and graph coloring problems," SIAM Journal on Numerical Analysis, vol. 20, no. 1, pp. 187–209, 1983.
[5] D. Bozdağ, Ü. Çatalyürek, A. Gebremedhin, F. Manne, E. Boman, and F. Özgüner, "A parallel distance-2 graph coloring algorithm for distributed memory computers," in Proc. of 1st Int'l. Conf. on High Performance Computing and Communications, ser. Lecture Notes in Computer Science, vol. 3726. Sorrento, Italy: Springer, Sep 2005, pp. 796–806.
[6] D. Bozdağ, Ü. Çatalyürek, A. Gebremedhin, F. Manne, E. Boman, and F. Özgüner, "Distributed-memory parallel algorithms for distance-2 coloring and related problems in derivative computation," SIAM Journal of Scientific Computing, vol. 32, no. 4, pp. 2418–2446, 2010.
[7] A. H. Gebremedhin, F. Manne, and A. Pothen, "What color is your Jacobian? Graph coloring for computing derivatives," SIAM Review, vol. 47, no. 4, pp. 629–705, 2005.
[8] A. H. Gebremedhin, F. Manne, and A. Pothen, "Parallel distance-k coloring algorithms for numerical optimization," in Euro-Par 2002 Parallel Processing - 8th International Conference, 2002, pp. 912–921.
[9] H. Lu, M. Halappanavar, D. Chavarría-Miranda, A. Gebremedhin, and A. Kalyanaraman, "Balanced coloring for parallel computing applications," in Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE, May 2015, pp. 7–16.
[10] W. Meyer, "Equitable coloring," Amer. Math. Monthly, vol. 80, pp. 920–922, 1973.
[11] A. Hajnal and E. Szemerédi, "Proof of a conjecture of P. Erdős," London: North-Holland, pp. 601–623, 1970.
[12] R. K. Gjertsen Jr., M. T. Jones, and P. E. Plassmann, "Parallel heuristics for improved, balanced graph colorings," Journal of Parallel and Distributed Computing, vol. 37, no. 2, pp. 171–186, 1996.
[13] Ü. V. Çatalyürek, J. Feo, A. H. Gebremedhin, M. Halappanavar, and A. Pothen, "Graph coloring algorithms for multi-core and massively multithreaded architectures," Parallel Computing, vol. 38, no. 10-11, pp. 576–594, Oct-Nov 2012.
[14] M. Deveci, E. G. Boman, K. D. Devine, and S. Rajamanickam, "Parallel graph coloring for manycore architectures," in 2016 IEEE Parallel and Distributed Processing Symposium (IPDPS), May 2016, pp. 892–901.
[15] A. E. Sariyuce, E. Saule, and U. V. Catalyurek, "Scalable hybrid implementation of graph coloring using MPI and OpenMP," in Proc. 2012 IPDPSW & PhD Forum, ser. IPDPSW '12. Washington, DC, USA: IEEE Computer Society, 2012, pp. 1744–1753. [Online]. Available: http://dx.doi.org/10.1109/IPDPSW.2012.216
[16] M. Patwary, A. Gebremedhin, and A. Pothen, "New multithreaded ordering and coloring algorithms for multicore architectures," in Euro-Par 2011 Parallel Processing - 17th International Conference, E. Jeannot, R. Namyst, and J. Roman, Eds. Springer Berlin/Heidelberg, 2011, pp. 250–262.
[17] T. Tessem, "Improving parallel sparse matrix-vector multiplication," Master's thesis, Univ. of Bergen, Fac. of Math. and Natural Sciences, Dept. of Informatics, December 2013.
[18] T. A. Davis and Y. Hu, "The University of Florida sparse matrix collection," ACM Trans. Math. Softw., vol. 38, no. 1, pp. 1:1–1:25, Dec. 2011. [Online]. Available: http://doi.acm.org/10.1145/2049662.2049663
[19] "GroupLens Research: MovieLens Dataset," http://grouplens.org/datasets/movielens/, 2016, [Online; accessed Oct-1-2016].
[20] D. W. Matula and L. L. Beck, "Smallest-last ordering and clustering and graph coloring algorithms," J. ACM, vol. 30, pp. 417–427, July 1983.
[21] D. J. A. Welsh and M. B. Powell, "An upper bound for the chromatic number of a graph and its application to timetabling problems," The Comp. Journal, vol. 10, pp. 85–86, 1967.
[22] D. Brélaz, "New methods to color the vertices of a graph," Commun. ACM, vol. 22, pp. 251–256, April 1979.
[23] J. Allwright, R. Bordawekar, P. D. Coddington, K. Dincer, and C. Martin, "A comparison of parallel graph coloring algorithms," Northeast Parallel Architectures Center at Syracuse University (NPAC), Tech. Rep. SCCS-666, 1994.
[24] M. Jones and P. Plassmann, "A parallel graph coloring heuristic," SIAM Journal on Scientific Computing, vol. 14, no. 3, pp. 654–669, 1993.
[25] R. K. Gjertsen Jr., M. T. Jones, and P. Plassmann, "Parallel heuristics for improved, balanced graph colorings," Journal on Parallel and Dist. Computing, vol. 37, pp. 171–186, 1996.
[26] M. Luby, "A simple parallel algorithm for the maximal independent set problem," SIAM Journal on Computing, vol. 15, no. 4, pp. 1036–1053, 1986.
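As a concrete companion to the greedy and optimistic schemes summarized in Section VII, the following minimal Python sketch shows first-fit greedy coloring and a round-based, single-threaded simulation of the optimistic "color, then detect conflicts" loop for distance-1 coloring. It is not the BGPC or D2GC algorithms of this paper: all function names are hypothetical, concurrency is only imitated by letting the vertices of a chunk read one stale snapshot, and recoloring the endpoint with the larger id is just one possible tie-breaking assumption.

```python
def first_fit(v, adj, colors):
    """Lowest color not used by any neighbor of v that appears in `colors`."""
    forbidden = {colors[u] for u in adj[v] if u in colors}
    c = 0
    while c in forbidden:
        c += 1
    return c


def greedy_color(adj, order=None):
    """Sequential greedy coloring: process vertices one after another."""
    colors = {}
    for v in (sorted(adj) if order is None else order):
        colors[v] = first_fit(v, adj, colors)
    return colors


def find_conflicts(adj, colors, candidates):
    """Tag the larger endpoint of every same-colored edge for recoloring."""
    return sorted(v for v in candidates
                  if any(u < v and colors[u] == colors[v] for u in adj[v]))


def speculative_color(adj, chunk=2):
    """Simulated optimistic coloring with iterative conflict removal.

    Vertices in the same chunk pick colors from one stale snapshot, which
    mimics threads that do not observe each other's concurrent writes.
    """
    colors, work = {}, sorted(adj)
    while work:
        for start in range(0, len(work), chunk):
            snapshot = dict(colors)
            for v in work[start:start + chunk]:
                colors[v] = first_fit(v, adj, snapshot)
        work = find_conflicts(adj, colors, work)
    return colors


def is_valid(adj, colors):
    """True if no edge joins two vertices of the same color."""
    return all(colors[u] != colors[v] for v in adj for u in adj[v])


# Hypothetical undirected toy graph given as symmetric adjacency sets.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2, 4}, 4: {3}}
max_degree = max(len(nbrs) for nbrs in adj.values())
sequential = greedy_color(adj)
optimistic = speculative_color(adj, chunk=2)
print(is_valid(adj, sequential), is_valid(adj, optimistic))
```

In this sketch a vertex never receives a color larger than its degree, so both routines stay within the 1+∆ bound mentioned in Section VII. The smallest vertex of each chunk is never tagged, so the work list shrinks in every round and the loop terminates.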
