High Performance Computing for
Big Data
METHODOLOGIES AND APPLICATIONS
Chapman & Hall/CRC
Big Data Series
SERIES EDITOR
Sanjay Ranka
AIMS AND SCOPE
This series aims to present new research and applications in Big Data, along with the computa-
tional tools and techniques currently in development. The inclusion of concrete examples and
applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the
areas of social networks, sensor networks, data-centric computing, astronomy, genomics, medical
data analytics, large-scale e-commerce, and other relevant topics that may be proposed by poten-
tial contributors.
PUBLISHED TITLES
HIGH PERFORMANCE COMPUTING FOR BIG DATA
Chao Wang
FRONTIERS IN DATA SCIENCE
Matthias Dehmer and Frank Emmert-Streib
BIG DATA MANAGEMENT AND PROCESSING
Kuan-Ching Li, Hai Jiang, and Albert Y. Zomaya
BIG DATA COMPUTING: A GUIDE FOR BUSINESS AND TECHNOLOGY
MANAGERS
Vivek Kale
BIG DATA IN COMPLEX AND SOCIAL NETWORKS
My T. Thai, Weili Wu, and Hui Xiong
BIG DATA OF COMPLEX NETWORKS
Matthias Dehmer, Frank Emmert-Streib, Stefan Pickl, and Andreas Holzinger
BIG DATA: ALGORITHMS, ANALYTICS, AND APPLICATIONS
Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea
High Performance Computing for
Big Data
METHODOLOGIES AND APPLICATIONS
EDITED BY CHAO WANG
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2018 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper
International Standard Book Number-13: 978-1-4987-8399-6 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copy-
right holders of all material reproduced in this publication and apologize to copyright holders if permission to publish
in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know
so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including pho-
tocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface
Acknowledgments
Editor
Contributors
Section I Big Data Architectures
Chapter 1 ◾ Dataflow Model for Cloud Computing Frameworks in Big Data
Dong Dai, Yong Chen, and Gangyong Jia
Chapter 2 ◾ Design of a Processor Core Customized for Stencil Computation
Youyang Zhang, Yanhua Li, and Youhui Zhang
Chapter 3 ◾ Electromigration Alleviation Techniques for 3D Integrated Circuits
Yuanqing Cheng, Aida Todri-Sanial, Alberto Bosio, Luigi Dilillo, Patrick Girard, Arnaud Virazel, Pascal Vivet, and Marc Belleville
Chapter 4 ◾ A 3D Hybrid Cache Design for CMP Architecture for Data-Intensive Applications
Ing-Chao Lin, Jeng-Nian Chiou, and Yun-Kae Law
Section II Emerging Big Data Applications
Chapter 5 ◾ Matrix Factorization for Drug–Target Interaction Prediction
Yong Liu, Min Wu, Xiao-Li Li, and Peilin Zhao
Chapter 6 ◾ Overview of Neural Network Accelerators
Yuntao Lu, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 7 ◾ Acceleration for Recommendation Algorithms in Data Mining
Chongchong Xu, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 8 ◾ Deep Learning Accelerators
Yangyang Zhao, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 9 ◾ Recent Advances for Neural Networks Accelerators and Optimizations
Fan Sun, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 10 ◾ Accelerators for Clustering Applications in Machine Learning
Yiwei Zhang, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 11 ◾ Accelerators for Classification Algorithms in Machine Learning
Shiming Lei, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 12 ◾ Accelerators for Big Data Genome Sequencing
Haijie Fang, Chao Wang, Shiming Lei, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Index
Preface
As scientific applications have become more data intensive, the management
of data resources and dataflow between the storage and computing resources is
becoming a bottleneck. Analyzing, visualizing, and managing these large data sets is
posing significant challenges to the research community. Conventional parallel architectures, systems, and software cannot keep pace with data at this scale. At present, researchers are increasingly seeking a high level of parallelism
at the data level and task level using novel methodologies for emerging applications. A
significant amount of state-of-the-art research work on big data has been carried out in the past few years.
This book presents the contributions of leading experts in their respective fields. It
covers fundamental issues about Big Data, including emerging high-performance archi-
tectures for data-intensive applications, novel efficient analytical strategies to boost data
processing, and cutting-edge applications in diverse fields, such as machine learning, life
science, neural networks, and neuromorphic engineering. The book is organized into two
main sections:
1. “Big Data Architectures” considers the research issues related to the state-of-the-art
architectures of big data, including cloud computing systems and heterogeneous
accelerators. It also covers emerging 3D integrated circuit design principles for mem-
ory architectures and devices.
2. “Emerging Big Data Applications” illustrates practical applications of big data
across several domains, including bioinformatics, deep learning, and neuromorphic
engineering.
Overall, the book reports on state-of-the-art studies and achievements in methodolo-
gies and applications of high-performance computing for big data applications.
The first part includes four interesting works on big data architectures. The contribution
of each of these chapters is introduced in the following.
In the first chapter, entitled “Dataflow Model for Cloud Computing Frameworks in Big Data,” the authors present a survey of various cloud computing frameworks.
This chapter proposes a new “controllable dataflow” model to uniformly describe and com-
pare them. The fundamental idea of utilizing a controllable dataflow model is that it can
effectively isolate the application logic from execution. In this way, different computing
frameworks can be considered as the same algorithm with different control statements to
support the various needs of applications. This simple model can help developers better
understand a broad range of computing models including batch, incremental, streaming,
etc., and is a promising candidate for a uniform programming model for future cloud computing
frameworks.
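To make the controllable dataflow idea concrete, here is a minimal illustrative sketch (all names and structures are ours, not the chapter's): the same operator pipeline runs under two different control policies, batch and streaming, and produces the same result.

```python
# Minimal sketch of a "controllable dataflow" graph: one operator
# pipeline executed under different control policies (batch vs. streaming).
# All names here are illustrative, not taken from the chapter.

class Stage:
    def __init__(self, fn):
        self.fn = fn

def run(stages, records, policy):
    """Apply each stage; 'policy' controls how results flow downstream."""
    if policy == "batch":            # a stage runs over the whole input at once
        for stage in stages:
            records = [stage.fn(r) for r in records]
        return records
    elif policy == "streaming":      # each record flows through all stages
        out = []
        for r in records:
            for stage in stages:
                r = stage.fn(r)
            out.append(r)
        return out
    raise ValueError(policy)

pipeline = [Stage(lambda x: x + 1), Stage(lambda x: x * 2)]
print(run(pipeline, [1, 2, 3], "batch"))      # [4, 6, 8]
print(run(pipeline, [1, 2, 3], "streaming"))  # [4, 6, 8]
```

The identical results illustrate the chapter's point: the application logic (the stages) is isolated from execution, and only the control policy differs between computing models.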
In the second chapter, entitled “Design of a Processor Core Customized for Stencil
Computation,” the authors propose a systematic approach to customizing a simple core with conventional architectural features, including array padding, loop tiling, data prefetching, on-chip memory for temporary storage, online adjustment of the cache strategy to reduce memory traffic, and Memory In-and-Out and Direct Memory Access for the overlap of computation and memory access (instruction-level parallelism). For stencil computation, the authors employ all of these customization strategies and evaluate each of them in terms of core performance, energy consumption, chip area, and so on, to construct a comprehensive assessment.
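As an illustration of one of the strategies listed above, here is a hedged software sketch (not the chapter's hardware design) of loop tiling applied to a 1D 3-point stencil; tiling processes the grid in cache-sized blocks to cut memory traffic.

```python
# Illustrative loop-tiling sketch for a 1D 3-point averaging stencil.
# The tile size stands in for a cache-sized block; values are arbitrary.

def stencil_tiled(a, tile=4):
    n = len(a)
    out = a[:]                       # boundary cells are copied unchanged
    for start in range(1, n - 1, tile):
        end = min(start + tile, n - 1)
        for i in range(start, end):  # work on one tile at a time
            out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return out

# A linear ramp is a fixed point of the 3-point average.
print(stencil_tiled([0.0, 3.0, 6.0, 9.0, 12.0]))  # [0.0, 3.0, 6.0, 9.0, 12.0]
```

The tile size changes only the traversal order, not the result, which is what makes tiling a safe customization for reducing memory traffic.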
In the third chapter, entitled “Electromigration Alleviation Techniques for 3D Integrated
Circuits,” the authors propose a novel method called TSV-SAFE to mitigate the electromigration (EM) effect of defective through-silicon vias (TSVs). First, they analyze various possible TSV defects and demonstrate that they can aggravate EM dramatically. Based on the
observation that the EM effect can be alleviated significantly by balancing the direction of
current flow within TSV, the authors design an online self-healing circuit to protect defec-
tive TSVs, which can be detected during the test procedure, from EM without degrading
performance. To make sure that all defective TSVs are protected with low hardware over-
head, the authors also propose a switch network-based sharing structure such that the EM
protection modules can be shared among TSV groups in the neighborhood. Experimental
results show that the proposed method can achieve over 10 times improvement in mean time to failure compared to a design without such protection, with negligible hardware overhead and power consumption.
In the fourth chapter, entitled “A 3D Hybrid Cache Design for CMP Architecture for
Data-Intensive Applications,” the authors propose a 3D stacked hybrid cache architecture
that contains three types of cache banks: SRAM, STT-RAM, and STT-RAM/SRAM hybrid banks, for a chip multiprocessor architecture, to reduce power consumption
and wire delay. Based on the proposed 3D hybrid cache with hybrid local banks, the
authors propose an access-aware technique and a dynamic partitioning algorithm to miti-
gate the average access latency and reduce energy consumption. The experimental results
show that the proposed 3D hybrid cache with hybrid local banks can reduce energy by 60.4% and 18.9% compared to a 3D pure SRAM cache and a 3D hybrid cache with SRAM local banks, respectively. With the proposed dynamic partitioning algorithm and access-aware
technique, our proposed 3D hybrid cache reduces the miss rate by 7.7%, access latency by
18.2%, and energy delay product by 18.9% on average.
The second part includes eight chapters on big data applications. The contribution of
each of these chapters is introduced in the following.
In the fifth chapter, entitled “Matrix Factorization for Drug–Target Interaction
Prediction,” the authors first review existing methods developed for drug–target interaction
prediction. Then, they introduce neighborhood regularized logistic matrix factorization,
which integrates logistic matrix factorization with neighborhood regularization for
accurate drug–target interaction prediction.
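To ground the approach, here is a hedged sketch of plain logistic matrix factorization, without the chapter's neighborhood regularization term: the probability of a drug–target interaction is modeled as the sigmoid of a dot product of latent vectors, trained by gradient ascent. All variable names and values are illustrative.

```python
# Sketch of plain logistic matrix factorization (no neighborhood
# regularization): P(interaction) = sigmoid(u_drug . v_target).
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_lmf(Y, k=2, lr=0.1, reg=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    nd, nt = len(Y), len(Y[0])
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(nd)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(nt)]
    for _ in range(epochs):
        for i in range(nd):
            for j in range(nt):
                p = sigmoid(sum(U[i][f] * V[j][f] for f in range(k)))
                err = Y[i][j] - p          # gradient of the log-likelihood
                for f in range(k):
                    ui, vj = U[i][f], V[j][f]
                    U[i][f] += lr * (err * vj - reg * ui)
                    V[j][f] += lr * (err * ui - reg * vj)
    return U, V

# Toy 3-drug x 3-target matrix (1 = known interaction).
Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
U, V = train_lmf(Y)
p = sigmoid(sum(U[0][f] * V[0][f] for f in range(2)))
print(round(p, 2))  # high probability for the observed pair (0, 0)
```

The chapter's method additionally pulls the latent vectors of similar drugs (and similar targets) toward each other, which this bare sketch omits.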
In the sixth chapter, entitled “Overview of Neural Network Accelerators,” the authors
introduce the different accelerating methods of neural networks, including ASICs, GPUs,
FPGAs, and modern storage, as well as the open-source framework for neural networks.
With the emerging applications of artificial intelligence, computer vision, speech recogni-
tion, and machine learning, neural networks have become the most widely used solution. Due to the low efficiency of neural network implementations on general-purpose processors, various dedicated heterogeneous neural network accelerators have been proposed.
In the seventh chapter, entitled “Acceleration for Recommendation Algorithms in Data
Mining,” the authors propose a dedicated hardware structure to implement the train-
ing accelerator and prediction accelerator. The training accelerator supports five kinds of
similarity metrics, which can be used in the user-based collaborative filtering (CF) and
item-based CF training stages and the difference calculation of SlopeOne’s training stage.
The prediction accelerator, which supports these three algorithms, involves an accumulation operation and a weighted-average operation during their prediction stages. In addition, this
chapter also designs the bus and interconnection between the host CPU, memory, hardware
accelerator, and some peripherals such as DMA. For the convenience of users, we create
and encapsulate the user layer function call interfaces of these hardware accelerators and
DMA under the Linux operating system environment. Additionally, we utilize the FPGA
platform to implement a prototype for this hardware acceleration system, which is based on
the ZedBoard Zynq development board. Experimental results show that this prototype achieves a good acceleration effect with low power and energy consumption at runtime.
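For reference, here is a software sketch of the kinds of computation the accelerator implements in hardware (function names and data layout are ours): one common CF similarity metric, and the per-item-pair rating difference computed in SlopeOne's training stage.

```python
# Software reference for computations the chapter accelerates in hardware.
# Names and the ratings layout are illustrative, not from the chapter.
import math

def cosine_similarity(a, b):
    """One similarity metric usable in user- or item-based CF."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def slopeone_diff(ratings, i, j):
    """Average rating difference (item i - item j) over co-rating users."""
    diffs = [r[i] - r[j] for r in ratings if i in r and j in r]
    return sum(diffs) / len(diffs) if diffs else 0.0

users = [{0: 5, 1: 3}, {0: 4, 1: 2}, {0: 1}]   # user -> {item: rating}
print(cosine_similarity([5, 3], [4, 2]))
print(slopeone_diff(users, 0, 1))  # 2.0
```

Both kernels reduce to dot products, accumulations, and averages, which is why a single hardware datapath can serve user-based CF, item-based CF, and SlopeOne.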
In the eighth chapter, entitled “Deep Learning Accelerators,” the authors introduce
the basic theory of deep learning and FPGA-based acceleration methods. They start from
the inference process of fully connected networks and propose FPGA-based accelerating
systems to study how to improve the computing performance of fully connected neural
networks on hardware accelerators.
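As a software reference point for what such an accelerator computes, here is a hedged sketch (shapes and values purely illustrative) of a single fully connected layer's inference step: a matrix–vector product plus bias, followed by an activation function.

```python
# Inference step of one fully connected layer: y = relu(W x + b).
# Weights, bias, and input below are arbitrary illustrative values.

def relu(x):
    return [v if v > 0 else 0.0 for v in x]

def fc_layer(x, W, b):
    """One fully connected layer: matrix-vector product, bias, activation."""
    return relu([sum(w * v for w, v in zip(row, x)) + bi
                 for row, bi in zip(W, b)])

x = [1.0, 2.0]
W = [[0.5, -1.0], [2.0, 0.25]]   # 2x2 weight matrix
b = [0.5, -1.0]
print(fc_layer(x, W, b))  # [0.0, 1.5]
```

The multiply-accumulate loops here are exactly the operations an FPGA accelerator parallelizes across many processing elements.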
In the ninth chapter, entitled “Recent Advances for Neural Networks Accelerators and
Optimizations,” the authors introduce the recent highlights for neural network accelerators
that have played an important role in computer vision, artificial intelligence, and computer
architecture. Recently, this role has been extended to the field of electronic design automa-
tion (EDA). In this chapter, the authors integrate and summarize the recent highlights
and novelty of neural network papers from the 2016 EDA conferences (DAC, ICCAD, and
DATE), then classify and analyze the key technology in each paper. Finally, they give some
new hot spots and research trends for neural networks.
In the tenth chapter, entitled “Accelerators for Clustering Applications in Machine
Learning,” the authors propose an FPGA-based hardware accelerator platform that combines hardware and software. The hardware accelerator accommodates four
clustering algorithms, namely the k-means algorithm, PAM algorithm, SLINK algorithm,
and DBSCAN algorithm. Each algorithm supports two similarity metrics, Manhattan and Euclidean distance. Through locality analysis, the hardware accelerator provides a solution to off-chip memory access and then balances the relationship between