Mastering Gephi Network
Visualization
Produce advanced network graphs in Gephi and
gain valuable insights into your network datasets
Ken Cherven
BIRMINGHAM - MUMBAI
Mastering Gephi Network Visualization
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, without the prior written
permission of the publisher, except in the case of brief quotations embedded in
critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy
of the information presented. However, the information contained in this book is
sold without warranty, either express or implied. Neither the author, nor Packt
Publishing, and its dealers and distributors will be held liable for any damages
caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the
companies and products mentioned in this book by the appropriate use of capitals.
However, Packt Publishing cannot guarantee the accuracy of this information.
First published: January 2015
Production reference: 1220115
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78398-734-4
www.packtpub.com
Credits
Author: Ken Cherven
Reviewers: Ladan Doroud, Miro Marchi, David Edward Polley, Mollie Taylor, George G. Vega Yon
Commissioning Editor: Ashwin Nair
Acquisition Editor: Sam Wood
Content Development Editor: Amey Varangaonkar
Technical Editors: Shruti Rawool, Shali Sasidharan
Copy Editors: Rashmi Sawant, Stuti Srivastava, Neha Vyas
Project Coordinator: Leena Purkait
Proofreaders: Cathy Cumberlidge, Paul Hindle, Samantha Lyon
Indexer: Monica Ajmera Mehta
Graphics: Abhinash Sahu
Production Coordinator: Conidon Miranda
Cover Work: Conidon Miranda
About the Author
Ken Cherven is a Detroit-based data visualization and open source enthusiast,
with 20 years of experience working with data and visualization tools. In addition
to Gephi, he has worked with a variety of open source tools, including MySQL,
SpagoBI, JasperServer, D3, Protovis, Omeka, QGIS, Leaflet, and Exhibit. He also
has considerable experience using corporate software tools from Microsoft,
Cognos, Tableau, and Oracle.
An automotive analyst and visualizer by day, he spends much of his personal time
turning baseball data into web-based visualizations housed on his website,
http://visual-baseball.com. He has previously authored Network Graph Analysis and
Visualization with Gephi, Packt Publishing, as well as a self-published book, MLB
Pennant Races, 1901-1968: A Visual Analysis of Baseball's Pennant Races, Visual-Baseball
Press. His current areas of interest include visual dashboards, interactive networks,
and anything involving geographic information.
Acknowledgments
I would like to thank the members of my family for their patience and understanding
over the course of several months spent working on this book. This always starts
with my wife, Karen, and extends to my children, Kellen, Kristopher, and Katie, as
well as my always helpful mother-in-law, Carole Young.
This book would not have been possible without the considerable efforts of a group
of thorough technical and content editors. I would like to sincerely thank Mollie
Taylor, Ladan Doroud, Miro Marchi, Ted Polley, George Vega Yon, Marta Castellani,
and Manasi Pandire for their considerable efforts to make this the best possible book.
All of your input has been noted, and many improvements have been incorporated.
A special thanks also to Amey Varangaonkar at Packt Publishing for managing
the entire process while also making recommendations that will result in a more
enjoyable reading experience. Thanks also to others who helped in the early stages
by providing useful feedback to get the book started. This list includes Joanne
Fitzpatrick and Richard Gall at Packt Publishing, plus Gephi community members,
Randy Novak, Mike Hughes, Matthieu Totet, Marco Valli, Gerry Wilson, and Carlos
Benito Amat.
Finally, I would like to thank the creators and maintainers of Gephi for providing
such a powerful tool that allows users to explore the fascinating world of network
science. Thanks also to the growing community of enthusiasts who use Gephi
to create some remarkable visualizations. My hope is that this book will make it
easier for you to tap into the power of Gephi and, perhaps, even provide a few new
approaches to leverage this powerful tool.
About the Reviewers
Ladan Doroud is a PhD candidate at the University of California, Davis. She
received her master's degree in computer science from the same university in 2013.
She is currently working on her PhD in computer science in Prof. Eisen's lab as a
computational biologist and data scientist. Her research interests lie mainly in
large-scale network analysis, clustering, and data mining, with a special focus
on community detection and function prediction of protein sequences in large-scale
biological networks.
She has an extensive background in learner-centered education, including her
collaboration with Udacity, Inc. in 2014 as a course manager on the data science track,
as well as her collaboration with the California State Summer School for Mathematics
and Science (COSMOS) in 2011. She can be reached at ldoroud@ucdavis.edu.
Miro Marchi is a PhD candidate at the University of Verona, Italy. He received his
master's degree in cultural anthropology, ethnology, and ethnolinguistics from Ca'
Foscari University of Venice in 2010.
He is the author of Self-Governance Lessons from Bali and Stephen Lansing, in Cangiani M.
(ed.), Alternative Approaches to Development, Cleup, 2012, in which he reviewed the
research of the interdisciplinary team coordinated by the anthropologist Stephen J.
Lansing on farmers' cooperation networks for rice cultivation in Bali.
His current research focuses on finding practical ways to foster the emergence
of self-organization in social-economic networks. He applies ethnographic
methods coupled with a community-based online network visualization, which is
built with Drupal and D3 and available at www.retebuonvivere.org/rete. He
is also interested in the use of complexity theory for sustainability and the commons. He
can be reached at miro.marchi@gmail.com.
David Edward Polley is a social sciences librarian at Indiana University-Purdue
University Indianapolis (IUPUI). Prior to joining IUPUI, he worked as a researcher at
the Cyberinfrastructure for Network Science Center in the Indiana University School
of Informatics and Computing, Bloomington. He is interested in the various ways
people use data generated in social science research. He is the coauthor, with
Dr. Katy Börner, of a book on data visualization titled Visual Insights: A Practical
Guide to Making Sense of Data.
Mollie Taylor is the President of Proximity Viz LLC, located in Atlanta, Georgia,
USA, which provides data visualization and mapping services to a wide range of
clients. She holds degrees in economics and international affairs from the Georgia
Institute of Technology. Her blog on programming for data analysis can be found
at http://blog.mollietaylor.com/.
George G. Vega Yon is currently a PhD student at the California Institute of
Technology. He holds a BA degree in business administration and an MA degree
in economics and public policy from Adolfo Ibáñez School of Government (Chile).
He is the author of several R and Stata modules, including ABCoptim: Implementation
of Artificial Bee Colony (ABC) Optimization, rgexf: an R package to work with GEXF graph
files, and Introducing PARALLEL: Stata module for parallel computing. He has shown a
deep interest in statistical computing and data visualization; furthermore, he is the
founder of the Chilean R-Users Group (useR).
He is the cofounder of NodosChile.org Social Network Analysis, one of the first
companies in Chile to focus on applied social network analysis.
George's scholarly interests are focused on policy analysis, complexity, and statistical
computing. His work has been recognized by the community: he has served as a reviewer
for the Journal of Computational Economics.
www.PacktPub.com
Support files, eBooks, discount offers,
and more
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF
and ePub files available? You can upgrade to the eBook version at www.PacktPub.com,
and as a print book customer, you are entitled to a discount on the eBook copy.
Get in touch with us at service@packtpub.com for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign
up for a range of free newsletters, and receive exclusive discounts and offers on Packt
books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital
book library. Here, you can search, access, and read Packt's entire library of books.
Why subscribe?
• Fully searchable across every book published by Packt
• Copy and paste, print, and bookmark content
• On demand and accessible via a web browser
Free access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access
PacktLib today and view 9 entirely free books. Simply use your login credentials for
immediate access.
Table of Contents
Preface
Chapter 1: Fundamentals of Complex Networks and Gephi
Graph applications
Collaboration graphs
Who-talks-to-whom graphs
Information linkages
Technological networks
Natural-world networks
A network graph analysis primer
Paths and connectivity
Paths
Cycles
Connectivity
Network structure
Centrality
Components
Giant components and clustering
Homophily
Density
Network behaviors
Contagion and diffusion
Network growth
Overviewing Gephi
Primary windows
Data laboratory
Manual entry
CSV import
Excel import
MySQL import
Graph file import