Today’s Agenda

- 08:30 Welcome and broader context (Saman Amarasinghe)
- 08:40 Introduction to OpenTuner (Jason Ansel)
- 09:10 Search techniques (Kalyan Veeramachaneni)
- 09:35 In-depth example (Jeffrey Bosboom)
- 10:00 Break
- 10:15 Applications
  - Halide (Jonathan Ragan-Kelley)
  - SEJITS (Chick Markley)
  - JVM optimization (Tharindu Rusira)
- 11:00 Hands-on session (Shoaib Kamil)
- 11:45 Discussion

Introduction to OpenTuner
Jason Ansel
MIT-CSAIL
February 8, 2015

Raytracer Example

An example ray tracer program: raytracer.cpp

$ g++ -O3 -o raytracer_a raytracer.cpp
$ time ./raytracer_a
./raytracer_a 0.17s user 0.00s system 99% cpu 0.175 total

1.47x speedup with:

$ g++ -O3 -o raytracer_b apps/raytracer.cpp -funsafe-math-optimizations -fwrapv \
    -fno-expensive-optimizations --param=max-peel-branches=115 -fweb \
    -fno-cx-fortran-rules --param=max-inline-recursive-depth=25 \
    -fno-btr-bb-exclusive -fno-tree-ch --param=iv-max-considered-uses=69 \
    -fgcse-las -ftree-loop-distribution --param=max-goto-duplication-insns=11 \
    --param=max-hoist-depth=44 -fsched-stalled-insns-dep \
    --param=max-once-peeled-insns=165 --param=max-pipeline-region-insns=316 \
    --param=iv-consider-all-candidates-bound=75
$ time ./raytracer_b
./raytracer_b 0.12s user 0.00s system 99% cpu 0.119 total
iv-consider-all-candidates-bound what???

This command is brittle and confusing:

$ g++ -O3 -o raytracer_b apps/raytracer.cpp -funsafe-math-optimizations -fwrapv \
    -fno-expensive-optimizations --param=max-peel-branches=115 -fweb \
    -fno-cx-fortran-rules --param=max-inline-recursive-depth=25 \
    -fno-btr-bb-exclusive -fno-tree-ch --param=iv-max-considered-uses=69 \
    -fgcse-las -ftree-loop-distribution --param=max-goto-duplication-insns=11 \
    --param=max-hoist-depth=44 -fsched-stalled-insns-dep \
    --param=max-once-peeled-insns=165 --param=max-pipeline-region-insns=316 \
    --param=iv-consider-all-candidates-bound=75

- Specific to:
  - raytracer.cpp
  - Same flags are 1.42x slower than -O1 for fft.c
  - GCC 4.8.2-19ubuntu1
  - Intel Core i7-4770S
- Autotuners can help!

How to Autotune a Program

[Diagram: boxes labeled Program, Search Space Definition, and Run Method; an arrow labeled "Executes"]
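The two pieces named on the slide above, a search space definition and a run method, can be sketched in a few lines of plain Python. This is not OpenTuner's API. It is a toy stand-in that uses plain random search, and the names `FLAGS`, `PARAMS`, `sample_config`, `build_command`, `run`, and `random_search` are all hypothetical. The flags and `--param` names are taken from the tuned command above; the value ranges and the `./tmp.bin` output path are assumptions made for illustration.

```python
import random
import subprocess
import time

# Hypothetical search space: a few flags and --param names from the tuned
# command above. The integer ranges are illustrative guesses, not GCC limits.
FLAGS = ["-funsafe-math-optimizations", "-fwrapv", "-fweb", "-fgcse-las",
         "-ftree-loop-distribution"]
PARAMS = {
    "max-peel-branches": (1, 200),
    "max-inline-recursive-depth": (1, 50),
    "max-hoist-depth": (1, 100),
}


def sample_config(rng):
    """Search space definition: pick one random point (flag on/off, param value)."""
    config = {flag: rng.choice([True, False]) for flag in FLAGS}
    for name, (low, high) in PARAMS.items():
        config[name] = rng.randint(low, high)
    return config


def build_command(config, source="raytracer.cpp", output="./tmp.bin"):
    """Turn a configuration into a g++ argument list."""
    cmd = ["g++", "-O3", "-o", output, source]
    cmd += [flag for flag in FLAGS if config[flag]]
    cmd += ["--param={}={}".format(name, config[name]) for name in PARAMS]
    return cmd


def run(config):
    """Run method: compile with the configuration, then time one execution."""
    try:
        subprocess.check_call(build_command(config))
    except subprocess.CalledProcessError:
        return float("inf")  # treat a failed compile as infinitely slow
    start = time.perf_counter()
    subprocess.check_call(["./tmp.bin"])
    return time.perf_counter() - start


def random_search(trials, seed=0):
    """Simplest possible search: try random configurations, keep the fastest."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        config = sample_config(rng)
        elapsed = run(config)
        if best is None or elapsed < best[0]:
            best = (elapsed, config)
    return best
```

Calling `build_command(sample_config(random.Random(0)))` yields one candidate command line without compiling anything; `random_search(20)` would compile and time twenty candidates, assuming `g++` and `raytracer.cpp` are available in the working directory. A single timed run is noisy, so a real tuner would likely repeat measurements.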
Description: