University of Kentucky
UKnowledge
University of Kentucky Master's Theses Graduate School
2006
CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A
STRUCTURED CFD CODE - GHOST
Anand B. Palki
University of Kentucky, apalki@gmail.com
Recommended Citation
Palki, Anand B., "CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED CFD
CODE - GHOST" (2006). University of Kentucky Master's Theses. 363.
https://uknowledge.uky.edu/gradschool_theses/363
This Thesis is brought to you for free and open access by the Graduate School at UKnowledge. It has been
accepted for inclusion in University of Kentucky Master's Theses by an authorized administrator of UKnowledge.
For more information, please contact UKnowledge@lsv.uky.edu.
ABSTRACT OF THESIS
CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED
CFD CODE - GHOST
This research focuses on evaluating and enhancing the performance of GHOST, an in-house,
structured, 2D CFD code, on modern commodity clusters. The basic philosophy of this work is to
optimize the cache performance of the code by splitting the grid into smaller blocks and
carrying out the required calculations on these smaller blocks, which in turn improves
code performance on commodity clusters. Accordingly, this work discusses and describes in
detail two data access optimization techniques: external blocking and internal blocking.
These techniques have been tested on steady, unsteady, laminar, and turbulent test cases,
and the results are presented. The critical hardware parameters that influence the code
performance were identified, and a detailed study of the effect of these parameters on the
code performance was conducted; the results of this study are presented. The modified version
of the code was also ported to current state-of-the-art architectures with successful results.
KEYWORDS: Cache Optimization, External blocking, Internal blocking, Structured CFD
Code Optimization, Commodity Clusters
Anand B. Palki
12/15/2006
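The blocking idea summarized in the abstract can be illustrated with a generic loop-tiling sketch. This is not GHOST's code: the 5-point Jacobi-style update, the function names, and the tile sizes `bi` and `bj` are hypothetical, and Python is used only to show the traversal order. Because each sweep reads only the old array, visiting the grid tile by tile gives exactly the same result as a full-grid sweep; in a compiled code, the expected benefit is that a suitably sized tile's data can remain in cache while that tile is being processed.

```python
def update(u, i, j):
    """Hypothetical 5-point stencil: average of the four neighbours of cell (i, j)."""
    return 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])


def sweep_full(u):
    """One Jacobi sweep over the interior, visiting the whole grid row by row."""
    ni, nj = len(u), len(u[0])
    new = [row[:] for row in u]  # boundary cells are copied unchanged
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            new[i][j] = update(u, i, j)
    return new


def sweep_blocked(u, bi, bj):
    """Same sweep, but the interior is split into bi x bj tiles and each
    tile is finished before the next one is started."""
    ni, nj = len(u), len(u[0])
    new = [row[:] for row in u]  # boundary cells are copied unchanged
    for i0 in range(1, ni - 1, bi):  # loop over tiles
        for j0 in range(1, nj - 1, bj):
            for i in range(i0, min(i0 + bi, ni - 1)):  # loop within one tile
                for j in range(j0, min(j0 + bj, nj - 1)):
                    new[i][j] = update(u, i, j)
    return new


if __name__ == "__main__":
    grid = [[float((3 * i + 5 * j) % 7) for j in range(10)] for i in range(8)]
    print(sweep_blocked(grid, 3, 4) == sweep_full(grid))  # prints True
```

The external and internal blocking schemes studied in this thesis may differ from this generic tiling in how blocks are formed and how block boundaries are treated; the sketch only illustrates the shared principle of working on one small block of the grid at a time.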
Copyright © Anand B Palki, 2006
CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED
CFD CODE - GHOST
By
Anand B. Palki
Dr. Raymond P. LeBeau
Director of Thesis
Dr. L. S. Stephens
Director of Graduate Studies
12/15/2005
RULES FOR THE USE OF THESIS
Unpublished theses submitted for the Master’s degree and deposited in the University of
Kentucky Library are as a rule open for inspection, but are to be used only with due
regard to the rights of the authors. Bibliographical references may be noted, but
quotations or summaries of parts may be published only with the permission of the
author, and with the usual scholarly acknowledgements.
Extensive copying or publication of the thesis in whole or in part also requires the
consent of the Dean of the Graduate School of the University of Kentucky.
A library that borrows this thesis for use by its patrons is expected to secure the signature of
each user.
Name Date
THESIS
Anand B Palki
The Graduate School
University of Kentucky
2006
CACHE OPTIMIZATION AND PERFORMANCE EVALUATION OF A STRUCTURED
CFD CODE - GHOST
THESIS
A thesis submitted in partial fulfillment of the
requirements for the degree of Master of Science in Mechanical Engineering
in the College of Engineering at the University of Kentucky
By
Anand B. Palki
Lexington, Kentucky
Director: Dr. Raymond P. LeBeau
Assistant Professor of Mechanical Engineering
University of Kentucky Lexington, Kentucky
2006
To My Parents
ACKNOWLEDGEMENTS
The satisfaction and euphoria that accompany the successful completion of any task would
not be complete without the mention of the people who made it possible, whose constant
encouragement and guidance crowned the effort with success.
I would like to express my sincere gratitude to my academic advisor, Dr. Raymond
LeBeau, for his continuous support and encouragement throughout my work. This work would
not have progressed without his guidance and numerous ingenious suggestions. I
would like to thank Dr. P.G. Huang for helping me understand how a CFD code works. I
would also like to thank my defense committee members, Dr. Jamey Jacob and Dr. M. Seigler,
for their valuable time in serving on my committee and evaluating my work.
I am grateful to my student colleagues, who made my stay at the University of Kentucky
a memorable one. I am especially grateful to Abhishek T, Ajay Babu, Aditya C, Chaitanya
Penugonda, Chetan Babu, Daniel R, Jacky Rhinehart, Karthik M, Narendra BK, Phanindra C,
Radhika K, Sandeep B, Snehal P and Vijay N for their emotional support, entertainment,
caring, and most importantly for being a surrogate family over the past two years. Finally, I
would like to thank my parents for their continuous encouragement and love, without which
none of this would have been possible.
Table of Contents
Acknowledgements
List of Tables
List of Figures
List of Files
1. Introduction
1.1 Overview
1.2 Background - Memory Hierarchy
1.3 Introduction to Problem
1.4 Goals of Optimizing the Code
1.5 Previous Work
1.5.1 General Cache Optimization Techniques
1.5.1.1 Techniques for Reducing Capacity Misses
1.5.1.2 Techniques for Reducing Conflict Misses
1.5.1.3 Techniques to Hide Effects of Cache Misses
1.5.1.4 Techniques to Improve the Replacement Decisions by Cache
1.5.2 Optimizations to CFD Codes
1.5.2.1 Techniques to Improve Parallel Performance
1.5.2.2 Techniques to Improve Single Node Performance
1.6 External & Internal Blocking
2. Computational tools
2.1 General Description of GHOST
2.1.1 GHOST Flowchart
2.1.2 Governing equations
2.1.3 Calculation at artificial boundaries
2.2 Grid File Data
2.2.1 Finite Volume Method
2.2.2 Generalized coordinates
2.2.3 Description of G.F90 Output
2.2.4 Description of Input File
2.3 Compilers & MPI Environment
2.4 Valgrind [79]
2.5 Kentucky Fluid Clusters
2.6 Method used to measure performance
2.7 Summary
3. External Blocking Results
3.1 Terminology
3.1.1 Terms related to Code Versions
3.1.2 Terms related to performance study test results description
3.1.3 Terms related to cache behavior study test results
3.2 Test Case
3.3 Types of Tests
3.4 External Blocking
3.5 Performance Test Results
3.5.1 KFC4 Results
3.5.2 KFC5 Results
3.5.3 Rectangular Blocks
3.5.4 Effects of Compiler Optimization Levels on Performance
3.5.5 Effects of Different Compilers on Performance
3.5.5 Effect of different hardware on performance
3.5.6 Steady Turbulent Case Performance Results
3.6 Valgrind Results
3.6.1 KFC4 Results
3.6.2 Comparison between G95 & IFC
3.6.3 Effect of Cache Thrashing
3.6.4 Valgrind Results for Turbulent Case
3.7 Accuracy Test Results
3.8 Summary
4. Internal Blocking Results
4.1 Basic Principle
4.2 Implementation of Internal Blocking in GHOST
4.3 Primary Tests
4.4 Performance Test Results
4.4.1 Comparison of Performance Test Results between External and Internal Blocking
4.5 Valgrind Results
4.5.1 Comparison of Valgrind results between Internal and External Blocking
4.6 Accuracy Test Results
4.7 Summary
5. Unsteady Test Case Results
5.1 Laminar Unsteady Test Case
5.2 Performance Test Results
5.3 Accuracy Test Results
5.4 Summary
6. Conclusions And Future Work
6.1 Summary and Conclusions
6.2 Future Work
Appendix
A.1 Steps to implement internal blocking to GHOST
References
Vita