Building Probabilistic Graphical Models with Python

Solve machine learning problems using probabilistic graphical models implemented in Python with real-world applications

Kiran R Karkera

BIRMINGHAM - MUMBAI

Copyright © 2014 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: June 2014

Production reference: 1190614

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-900-4

www.packtpub.com

Cover image by Manju Mohanadas ([email protected])

Credits

Author: Kiran R Karkera
Reviewers: Mohit Goenka, Shangpu Jiang, Jing (Dave) Tian, Xiao Xiao
Commissioning Editor: Kartikey Pandey
Acquisition Editor: Nikhil Chinnari
Content Development Editor: Madhuja Chaudhari
Technical Editor: Krishnaveni Haridas
Copy Editors: Alisha Aranha, Roshni Banerjee, Mradula Hegde
Project Coordinator: Melita Lobo
Proofreaders: Maria Gould, Joanna McMahon
Indexers: Mariammal Chettiyar, Hemangini Bari
Graphics: Disha Haria, Yuvraj Mannari, Abhinash Sahu
Production Coordinator: Alwin Roy
Cover Work: Alwin Roy

About the Author

Kiran R Karkera is a telecom engineer with a keen interest in machine learning. He has been programming professionally in Python, Java, and Clojure for more than 10 years. In his free time, he can be found attempting machine learning competitions at Kaggle and playing the flute.

I would like to thank the maintainers of the Libpgm and OpenGM libraries, Charles Cabot and Thorsten Beier, for their help with the code reviews.

About the Reviewers

Mohit Goenka graduated from the University of Southern California (USC) with a Master's degree in Computer Science. His thesis focused on game theory and human behavior concepts as applied in real-world security games. He also received an award for academic excellence from the Office of International Services at the University of Southern California. He has worked in various areas of computing, including artificial intelligence, machine learning, path planning, multiagent systems, neural networks, computer vision, computer networks, and operating systems. During his tenure as a student, Mohit won multiple code-cracking competitions and presented his work on Detection of Untouched UFOs to a wide audience. Not only is he a software developer by profession, but coding is also his hobby.
He spends most of his free time learning about new technology and honing his skills. Another feather in Mohit's cap is his poetry. Some of his works are archived in the University of Southern California libraries as part of the Lewis Carroll Collection. In addition to this, he has made significant contributions by volunteering to serve the community.

Shangpu Jiang is doing his PhD in Computer Science at the University of Oregon. He is interested in machine learning and data mining and has been working in this area for more than six years. He received his Bachelor's and Master's degrees in China.

Jing (Dave) Tian is now a graduate researcher and is doing his PhD in Computer Science at the University of Oregon. He is a member of the OSIRIS lab. His research involves system security, embedded system security, trusted computing, and static analysis for security and virtualization. He is interested in Linux kernel hacking and compilers. He also spent a year working on AI and machine learning and taught the classes Intro to Problem Solving using Python and Operating Systems in the Computer Science department. Before that, he worked as a software developer in the Linux Control Platform (LCP) group at the Alcatel-Lucent (formerly Lucent Technologies) R&D department for around four years. He received his Bachelor's and Master's degrees in EE in China.

Thanks to the author of this book, who has done a good job with both Python and PGM; thanks to the editors of this book, who have made this book perfect and given me the opportunity to review such a nice book.

Xiao Xiao is a PhD student studying Computer Science at the University of Oregon. Her research interests lie in machine learning, especially probabilistic graphical models. Her previous project compared the performance of two inference algorithms on a graphical model (relational dependency network).
www.PacktPub.com

Support files, eBooks, discount offers and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?
• Fully searchable across every book published by Packt
• Copy and paste, print and bookmark content
• On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Table of Contents

Preface
Chapter 1: Probability
  The theory of probability
  Goals of probabilistic inference
  Conditional probability
  The chain rule
  The Bayes rule
  Interpretations of probability
  Random variables
  Marginal distribution
  Joint distribution
  Independence
  Conditional independence
  Types of queries
  Probability queries
  MAP queries
  Summary
Chapter 2: Directed Graphical Models
  Graph terminology
  Python digression
  Independence and independent parameters
  The Bayes network
  The chain rule
  Reasoning patterns
  Causal reasoning
  Evidential reasoning
  Inter-causal reasoning