
TEXAS TECH UNIVERSITY SYSTEM AKIN: A Streaming Graph Partitioning Algorithm for ... PDF

23 Pages·2017·7.03 MB·English


AKIN: A Streaming Graph Partitioning Algorithm for Distributed Graph Storage Systems
Wei Zhang*, Yong Chen, Dong Dai
Department of Computer Science, Texas Tech University
May 2nd, 2018
Data-Intensive Scalable Computing Laboratory
CCGrid 2018, Washington D.C.

Introduction: Meet Graph Stream

Introduction: Dynamic Growing Graphs in Real-life Applications
- Social Networks
- Civil Engineering
  - Traffic Network
- Mass-Communication
  - World Wide Web
- Marketing
  - Co-purchase Network
- Metadata Management
  - GraphMeta
- Bioinformatics
  - Gene Sequencing (Nucleotides)
  - Protein-Protein Interaction
- Other Instant Graph Analysis

Introduction: Graph Stream on Distributed Graph Storage
- Graph Stream
  - Stream of vertices/edges data generated by a series of graph-related events
- Vertex Stream
  - Generated by vertex creation events
- Edge Stream
  - Generated by edge creation events
[Figure: a vertex stream and an edge stream, with vertices numbered 1 to 8, flowing into Distributed Graph Storage.]

Background: Graph Partitioning: Problem and Algorithms

Background: Definition: k-way Graph Partitioning
G. Karypis and V. Kumar, “A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs,” SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 359–392, 1999.
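The body of the "Definition: k-way Graph Partitioning" slide is not preserved in this preview. As a hedged reconstruction, the problem as it is commonly stated, following the cited Karypis and Kumar paper, can be written as below. The symbols G, V, E, k, and V_i are notation chosen here for illustration and may differ from what the slide used.

```latex
% k-way graph partitioning (common formulation; notation is an assumption)
Given a graph $G = (V, E)$ with $|V| = n$, partition $V$ into $k$ subsets
$V_1, V_2, \ldots, V_k$ such that
\begin{align*}
  V_i \cap V_j &= \emptyset \quad \text{for } i \neq j, \\
  \bigcup_{i=1}^{k} V_i &= V, \\
  |V_i| &\approx \frac{n}{k} \quad \text{for all } i,
\end{align*}
and the number of edges $(u, v) \in E$ whose endpoints lie in different
subsets is minimized.
```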
Background: Graph Partitioning – Two Major Pursuits
- Maintaining Partition Balance
  - Each partition contains roughly equal number of vertices
- Minimizing Edge-Cut Ratio
  - Edge-cut = Total number of cross-partition links
  - Edge-cut Ratio = (Total number of cross-partition links) / (Total number of all edges in a graph)
  - Smaller edge-cut ratio means less communication overhead between distributed storage nodes.

Background: Existing Graph Partitioning Algorithms

Offline Approach
- METIS/Chaco/SBV-cut/...
- Load Entire Graph
- Multiple Iterations
- Intensive Computation
- Optimal Partitioning Result
- Optimal Edge-cut Ratio
- Balanced Partitions
- Incapable on Streaming Workload

Online Approach (Semi-Streaming)
- Prefer Big/Avoid Big/...
- Graph Stream Buffer
- Buffer-flushing Actions
- Moderate Computation
- Moderate Partitioning Result
- Moderate Edge-cut Ratio
- Balanced Partitions
- Incapable on Streaming Workload

Online Approach (Streaming)
- DH/LDG/FENNEL/...
- Not Buffering Graph Stream
- Compute on Each Vertex
- Low Computation Overhead
- Poor Partitioning Result
- High Edge-cut Ratio
- Balanced Partitions
- Capable on Streaming Workload

Design Principles: Rethink Graph Partitioning in Streaming Setting

Design Principles: Graph Partitioning vs. Streaming Setting
[Figure: Graph Partitioning (GP) set against Streaming Setting (SS) on Distributed Graph Storage. Labels on the Graph Partitioning side: Minimized Edge-cut, Balanced Partitions, More Graph Data. Labels on the Streaming Setting side: Non-Optimal Partitioning Result, Limited Data per Streamed Event.]
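The two pursuits named in the Background slides, partition balance and edge-cut ratio, can be measured for any vertex-to-partition assignment. The following is a minimal sketch, not code from the AKIN authors: the function names, the dictionary-based assignment, and the max-over-ideal balance measure are all assumptions chosen for illustration.

```python
from collections import Counter


def edge_cut_ratio(edges, assignment):
    """Fraction of edges whose two endpoints sit in different partitions.

    edges: iterable of (u, v) pairs.
    assignment: dict mapping each vertex to a partition id (hypothetical layout).
    """
    edges = list(edges)
    if not edges:
        return 0.0
    # Edge-cut = total number of cross-partition links.
    cut = sum(1 for u, v in edges if assignment[u] != assignment[v])
    # Edge-cut ratio = edge-cut / total number of edges in the graph.
    return cut / len(edges)


def partition_sizes(assignment, k):
    """Number of vertices held by each of the k partitions."""
    counts = Counter(assignment.values())
    return [counts.get(p, 0) for p in range(k)]


def imbalance(assignment, k):
    """Largest partition size divided by the ideal size |V| / k.

    1.0 means perfectly balanced; this particular measure is an assumption,
    one of several used in practice.
    """
    sizes = partition_sizes(assignment, k)
    ideal = len(assignment) / k
    return max(sizes) / ideal


# A 4-vertex example split across 2 partitions.
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
assignment = {1: 0, 2: 0, 3: 1, 4: 1}

print(edge_cut_ratio(edges, assignment))  # 3 of 5 edges cross partitions
print(partition_sizes(assignment, 2))
print(imbalance(assignment, 2))
```

In this example the edges (2, 3), (4, 1), and (1, 3) cross partitions, so the ratio is 3/5, while both partitions hold two vertices each and the assignment is balanced.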

Description:
AKIN: A Streaming Graph Partitioning Algorithm for Distributed Graph Storage Systems. Wei Zhang*, Yong Chen, Dong Dai. Department of Computer
