High Performance Computing for
Big Data
METHODOLOGIES AND APPLICATIONS
Chapman & Hall/CRC
Big Data Series
SERIES EDITOR
Sanjay Ranka
AIMS AND SCOPE
This series aims to present new research and applications in Big Data, along with the computa-
tional tools and techniques currently in development. The inclusion of concrete examples and
applications is highly encouraged. The scope of the series includes, but is not limited to, titles in the
areas of social networks, sensor networks, data-centric computing, astronomy, genomics, medical
data analytics, large-scale e-commerce, and other relevant topics that may be proposed by poten-
tial contributors.
PUBLISHED TITLES
HIGH PERFORMANCE COMPUTING FOR BIG DATA
Chao Wang
FRONTIERS IN DATA SCIENCE
Matthias Dehmer and Frank Emmert-Streib
BIG DATA MANAGEMENT AND PROCESSING
Kuan-Ching Li, Hai Jiang, and Albert Y. Zomaya
BIG DATA COMPUTING: A GUIDE FOR BUSINESS AND TECHNOLOGY
MANAGERS
Vivek Kale
BIG DATA IN COMPLEX AND SOCIAL NETWORKS
My T. Thai, Weili Wu, and Hui Xiong
BIG DATA OF COMPLEX NETWORKS
Matthias Dehmer, Frank Emmert-Streib, Stefan Pickl, and Andreas Holzinger
BIG DATA: ALGORITHMS, ANALYTICS, AND APPLICATIONS
Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea
High Performance Computing for
Big Data
METHODOLOGIES AND APPLICATIONS
EDITED BY CHAO WANG
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2018 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed on acid-free paper
International Standard Book Number-13: 978-1-4987-8399-6 (Hardback)
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copy-
right holders of all material reproduced in this publication and apologize to copyright holders if permission to publish
in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know
so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including pho-
tocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users.
For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been
arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface
Acknowledgments
Editor
Contributors
Section I Big Data Architectures
Chapter 1 ◾ Dataflow Model for Cloud Computing Frameworks in Big Data
Dong Dai, Yong Chen, and Gangyong Jia
Chapter 2 ◾ Design of a Processor Core Customized for Stencil Computation
Youyang Zhang, Yanhua Li, and Youhui Zhang
Chapter 3 ◾ Electromigration Alleviation Techniques for 3D Integrated Circuits
Yuanqing Cheng, Aida Todri-Sanial, Alberto Bosio, Luigi Dilillo, Patrick Girard, Arnaud Virazel, Pascal Vivet, and Marc Belleville
Chapter 4 ◾ A 3D Hybrid Cache Design for CMP Architecture for Data-Intensive Applications
Ing-Chao Lin, Jeng-Nian Chiou, and Yun-Kae Law
Section II Emerging Big Data Applications
Chapter 5 ◾ Matrix Factorization for Drug–Target Interaction Prediction
Yong Liu, Min Wu, Xiao-Li Li, and Peilin Zhao
Chapter 6 ◾ Overview of Neural Network Accelerators
Yuntao Lu, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 7 ◾ Acceleration for Recommendation Algorithms in Data Mining
Chongchong Xu, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 8 ◾ Deep Learning Accelerators
Yangyang Zhao, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 9 ◾ Recent Advances for Neural Networks Accelerators and Optimizations
Fan Sun, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 10 ◾ Accelerators for Clustering Applications in Machine Learning
Yiwei Zhang, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 11 ◾ Accelerators for Classification Algorithms in Machine Learning
Shiming Lei, Chao Wang, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Chapter 12 ◾ Accelerators for Big Data Genome Sequencing
Haijie Fang, Chao Wang, Shiming Lei, Lei Gong, Xi Li, Aili Wang, and Xuehai Zhou
Index
Preface
As scientific applications have become more data intensive, the management
of data resources and dataflow between the storage and computing resources is
becoming a bottleneck. Analyzing, visualizing, and managing these large data sets is
posing significant challenges to the research community. Conventional parallel architectures, systems, and software cannot keep pace with data at this scale. At present, researchers are increasingly seeking a high level of parallelism
at the data level and task level using novel methodologies for emerging applications. A
significant amount of state-of-the-art research work on big data has been carried out in the past few years.
This book presents the contributions of leading experts in their respective fields. It
covers fundamental issues about Big Data, including emerging high-performance archi-
tectures for data-intensive applications, novel efficient analytical strategies to boost data
processing, and cutting-edge applications in diverse fields, such as machine learning, life
science, neural networks, and neuromorphic engineering. The book is organized into two
main sections:
1. “Big Data Architectures” considers the research issues related to the state-of-the-art
architectures of big data, including cloud computing systems and heterogeneous
accelerators. It also covers emerging 3D integrated circuit design principles for mem-
ory architectures and devices.
2. “Emerging Big Data Applications” illustrates practical applications of big data
across several domains, including bioinformatics, deep learning, and neuromorphic
engineering.
Overall, the book reports on state-of-the-art studies and achievements in methodolo-
gies and applications of high-performance computing for big data applications.
The first part includes four interesting works on big data architectures. The contribution
of each of these chapters is introduced in the following.
In the first chapter, entitled “Dataflow Model for Cloud Computing Frameworks in Big Data,” the authors present a survey of various cloud computing frameworks.
This chapter proposes a new “controllable dataflow” model to uniformly describe and com-
pare them. The fundamental idea of utilizing a controllable dataflow model is that it can
effectively isolate the application logic from execution. In this way, different computing
frameworks can be considered as the same algorithm with different control statements to
support the various needs of applications. This simple model can help developers better
understand a broad range of computing models including batch, incremental, streaming,
etc., and is a promising candidate for a uniform programming model for future cloud computing
frameworks.
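To make the controllable dataflow idea concrete, here is a minimal illustrative sketch (all names and structures are ours, not the chapter's): the same operator pipeline runs under two different control policies, batch and streaming, and produces the same result.

```python
# Minimal sketch of a "controllable dataflow" graph: one operator
# pipeline executed under different control policies (batch vs. streaming).
# All names here are illustrative, not taken from the chapter.

class Stage:
    def __init__(self, fn):
        self.fn = fn

def run(stages, records, policy):
    """Apply each stage; 'policy' controls how results flow downstream."""
    if policy == "batch":            # a stage runs over the whole input at once
        for stage in stages:
            records = [stage.fn(r) for r in records]
        return records
    elif policy == "streaming":      # each record flows through all stages
        out = []
        for r in records:
            for stage in stages:
                r = stage.fn(r)
            out.append(r)
        return out
    raise ValueError(policy)

pipeline = [Stage(lambda x: x + 1), Stage(lambda x: x * 2)]
print(run(pipeline, [1, 2, 3], "batch"))      # [4, 6, 8]
print(run(pipeline, [1, 2, 3], "streaming"))  # [4, 6, 8]
```

The identical results illustrate the chapter's point: the application logic (the stages) is isolated from execution, and only the control policy differs between computing models.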
In the second chapter, entitled “Design of a Processor Core Customized for Stencil
Computation,” the authors propose a systematic approach to customizing a simple core with conventional architectural features, including array padding, loop tiling, data prefetching, on-chip memory for temporary storage, online adjustment of the cache strategy to reduce memory traffic, and Memory In-and-Out and Direct Memory Access for the overlap of computation and memory access (instruction-level parallelism). For stencil computation, the authors employ all of these customization strategies and evaluate each of them in terms of core performance, energy consumption, chip area, and so on, to construct a comprehensive assessment.
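As an illustration of one of the strategies listed above, here is a hedged software sketch (not the chapter's hardware design) of loop tiling applied to a 1D 3-point stencil; tiling processes the grid in cache-sized blocks to cut memory traffic.

```python
# Illustrative loop-tiling sketch for a 1D 3-point averaging stencil.
# The tile size stands in for a cache-sized block; values are arbitrary.

def stencil_tiled(a, tile=4):
    n = len(a)
    out = a[:]                       # boundary cells are copied unchanged
    for start in range(1, n - 1, tile):
        end = min(start + tile, n - 1)
        for i in range(start, end):  # work on one tile at a time
            out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return out

# A linear ramp is a fixed point of the 3-point average.
print(stencil_tiled([0.0, 3.0, 6.0, 9.0, 12.0]))  # [0.0, 3.0, 6.0, 9.0, 12.0]
```

The tile size changes only the traversal order, not the result, which is what makes tiling a safe customization for reducing memory traffic.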
In the third chapter, entitled “Electromigration Alleviation Techniques for 3D Integrated
Circuits,” the authors propose a novel method called TSV-SAFE to mitigate the electromigration (EM) effect of defective through-silicon vias (TSVs). First, they analyze various possible TSV defects and demonstrate that they can aggravate EM dramatically. Based on the
observation that the EM effect can be alleviated significantly by balancing the direction of
current flow within TSV, the authors design an online self-healing circuit to protect defec-
tive TSVs, which can be detected during the test procedure, from EM without degrading
performance. To make sure that all defective TSVs are protected with low hardware over-
head, the authors also propose a switch network-based sharing structure such that the EM
protection modules can be shared among TSV groups in the neighborhood. Experimental
results show that the proposed method can achieve over 10 times improvement in mean time to failure compared to a design without such protection, with negligible hardware overhead and power consumption.
In the fourth chapter, entitled “A 3D Hybrid Cache Design for CMP Architecture for
Data-Intensive Applications,” the authors propose a 3D stacked hybrid cache architecture
that contains three types of cache banks: SRAM, STT-RAM, and STT-RAM/SRAM hybrid banks, for a chip multiprocessor architecture, to reduce power consumption
and wire delay. Based on the proposed 3D hybrid cache with hybrid local banks, the
authors propose an access-aware technique and a dynamic partitioning algorithm to miti-
gate the average access latency and reduce energy consumption. The experimental results
show that the proposed 3D hybrid cache with hybrid local banks can reduce energy by 60.4% and 18.9% compared to a 3D pure SRAM cache and a 3D hybrid cache with SRAM local banks, respectively. With the proposed dynamic partitioning algorithm and access-aware
technique, our proposed 3D hybrid cache reduces the miss rate by 7.7%, access latency by
18.2%, and energy delay product by 18.9% on average.
The second part includes eight chapters on big data applications. The contribution of
each of these chapters is introduced in the following.
In the fifth chapter, entitled “Matrix Factorization for Drug–Target Interaction
Prediction,” the authors first review existing methods developed for drug–target interaction
prediction. Then, they introduce neighborhood regularized logistic matrix factorization,
which integrates logistic matrix factorization with neighborhood regularization for
accurate drug–target interaction prediction.
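To ground the approach, here is a hedged sketch of plain logistic matrix factorization, without the chapter's neighborhood regularization term: the probability of a drug–target interaction is modeled as the sigmoid of a dot product of latent vectors, trained by gradient ascent. All variable names and values are illustrative.

```python
# Sketch of plain logistic matrix factorization (no neighborhood
# regularization): P(interaction) = sigmoid(u_drug . v_target).
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_lmf(Y, k=2, lr=0.1, reg=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    nd, nt = len(Y), len(Y[0])
    U = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(nd)]
    V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(nt)]
    for _ in range(epochs):
        for i in range(nd):
            for j in range(nt):
                p = sigmoid(sum(U[i][f] * V[j][f] for f in range(k)))
                err = Y[i][j] - p          # gradient of the log-likelihood
                for f in range(k):
                    ui, vj = U[i][f], V[j][f]
                    U[i][f] += lr * (err * vj - reg * ui)
                    V[j][f] += lr * (err * ui - reg * vj)
    return U, V

# Toy 3-drug x 3-target matrix (1 = known interaction).
Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
U, V = train_lmf(Y)
p = sigmoid(sum(U[0][f] * V[0][f] for f in range(2)))
print(round(p, 2))  # high probability for the observed pair (0, 0)
```

The chapter's method additionally pulls the latent vectors of similar drugs (and similar targets) toward each other, which this bare sketch omits.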
In the sixth chapter, entitled “Overview of Neural Network Accelerators,” the authors
introduce the different accelerating methods of neural networks, including ASICs, GPUs,
FPGAs, and modern storage, as well as the open-source framework for neural networks.
With the emerging applications of artificial intelligence, computer vision, speech recogni-
tion, and machine learning, neural networks have become the most widely used solution. Due to the low efficiency of neural network implementations on general-purpose processors, various dedicated heterogeneous neural network accelerators have been proposed.
In the seventh chapter, entitled “Acceleration for Recommendation Algorithms in Data
Mining,” the authors propose a dedicated hardware structure to implement the train-
ing accelerator and prediction accelerator. The training accelerator supports five kinds of
similarity metrics, which can be used in the user-based collaborative filtering (CF) and
item-based CF training stages and the difference calculation of SlopeOne’s training stage.
The prediction accelerator, which supports these three algorithms, involves an accumulation operation and a weighted-average operation during their prediction stages. In addition, this
chapter also designs the bus and interconnection between the host CPU, memory, hardware
accelerator, and some peripherals such as DMA. For the convenience of users, we create
and encapsulate the user layer function call interfaces of these hardware accelerators and
DMA under the Linux operating system environment. Additionally, we utilize the FPGA
platform to implement a prototype for this hardware acceleration system, which is based on
the ZedBoard Zynq development board. Experimental results show that this prototype achieves a good acceleration effect with low power and energy consumption at runtime.
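For reference, here is a software sketch of the kinds of computation the accelerator implements in hardware (function names and data layout are ours): one common CF similarity metric, and the per-item-pair rating difference computed in SlopeOne's training stage.

```python
# Software reference for computations the chapter accelerates in hardware.
# Names and the ratings layout are illustrative, not from the chapter.
import math

def cosine_similarity(a, b):
    """One similarity metric usable in user- or item-based CF."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def slopeone_diff(ratings, i, j):
    """Average rating difference (item i - item j) over co-rating users."""
    diffs = [r[i] - r[j] for r in ratings if i in r and j in r]
    return sum(diffs) / len(diffs) if diffs else 0.0

users = [{0: 5, 1: 3}, {0: 4, 1: 2}, {0: 1}]   # user -> {item: rating}
print(cosine_similarity([5, 3], [4, 2]))
print(slopeone_diff(users, 0, 1))  # 2.0
```

Both kernels reduce to dot products, accumulations, and averages, which is why a single hardware datapath can serve user-based CF, item-based CF, and SlopeOne.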
In the eighth chapter, entitled “Deep Learning Accelerators,” the authors introduce
the basic theory of deep learning and FPGA-based acceleration methods. They start from
the inference process of fully connected networks and propose FPGA-based accelerating
systems to study how to improve the computing performance of fully connected neural
networks on hardware accelerators.
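As a software reference point for what such an accelerator computes, here is a hedged sketch (shapes and values purely illustrative) of a single fully connected layer's inference step: a matrix–vector product plus bias, followed by an activation function.

```python
# Inference step of one fully connected layer: y = relu(W x + b).
# Weights, bias, and input below are arbitrary illustrative values.

def relu(x):
    return [v if v > 0 else 0.0 for v in x]

def fc_layer(x, W, b):
    """One fully connected layer: matrix-vector product, bias, activation."""
    return relu([sum(w * v for w, v in zip(row, x)) + bi
                 for row, bi in zip(W, b)])

x = [1.0, 2.0]
W = [[0.5, -1.0], [2.0, 0.25]]   # 2x2 weight matrix
b = [0.5, -1.0]
print(fc_layer(x, W, b))  # [0.0, 1.5]
```

The multiply-accumulate loops here are exactly the operations an FPGA accelerator parallelizes across many processing elements.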
In the ninth chapter, entitled “Recent Advances for Neural Networks Accelerators and
Optimizations,” the authors introduce the recent highlights for neural network accelerators
that have played an important role in computer vision, artificial intelligence, and computer
architecture. Recently, this role has been extended to the field of electronic design automa-
tion (EDA). In this chapter, the authors integrate and summarize the recent highlights
and novelty of neural network papers from the 2016 EDA conferences (DAC, ICCAD, and
DATE), then classify and analyze the key technology in each paper. Finally, they give some
new hot spots and research trends for neural networks.
In the tenth chapter, entitled “Accelerators for Clustering Applications in Machine
Learning,” the authors propose an FPGA-based hardware accelerator platform that combines hardware and software. The hardware accelerator accommodates four
clustering algorithms, namely the k-means algorithm, PAM algorithm, SLINK algorithm,
and DBSCAN algorithm. Each algorithm supports two similarity metrics, Manhattan and Euclidean distance. Through locality analysis, the hardware accelerator provides a solution to off-chip memory access and then balances the relationship between