Java Microarchitectures PDF

260 Pages·2002·9.83 MB·English
Preview Java Microarchitectures

JAVA MICROARCHITECTURES

THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE

Edited by
Vijaykrishnan Narayanan, Pennsylvania State University
Mario I. Wolczko, Sun Microsystems, Inc.

SPRINGER SCIENCE+BUSINESS MEDIA, LLC

ISBN 978-1-4613-5341-6
ISBN 978-1-4615-0993-6 (eBook)
DOI 10.1007/978-1-4615-0993-6

Library of Congress Cataloging-in-Publication Data
A C.I.P. Catalogue record for this book is available from the Library of Congress.

Copyright © 2002 Springer Science+Business Media New York
Originally published by Kluwer Academic Publishers in 2002
Softcover reprint of the hardcover 1st edition 2002

All rights reserved. No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work.

Printed on acid-free paper.

Contents

List of Figures
List of Tables
Preface

1 Benchmarking the Java Virtual Architecture
  David Gregg, James Power and John Waldron
2 A Study of Memory Behavior of Java Workloads
  Yefim Shuf, Mauricio J. Serrano, Manish Gupta and Jaswinder Pal Singh
3 An Efficient Hardware Implementation of Java Bytecodes, Threads, and Processes for Embedded and Real-Time Applications
  David S. Hardin, Allen P. Mass, Michael H. Masters and Nick M. Mykris
4 Stack Dependency Resolution for Java Processors based on Hardware Folding and Translation: A Bytecode Processing Analysis
  M. Watheq El-Kharashi, Fayez Gebali and Kin F. Li
5 Improving Java Performance in Embedded and General-Purpose Processors
  Ramesh Radhakrishnan, Lizy K. John, Ravi Bhargava and Deepu Talla
6 The Delft-Java Engine
  John Glossner and Stamatis Vassiliadis
7 Quicksilver: A Quasi-static Java Compiler for Embedded Systems
  Samuel P. Midkiff, Pramod G. Joisha, Mauricio Serrano, Manish Gupta, Anthony Bolmarcich and Peng Wu
8 Concurrent Garbage Collection Using Hardware-Assisted Profiling
  Timothy Heil and James E. Smith
9 Space-Time Dimensional Computing for Java Programs on the MAJC Architecture
  Shailender Chaudhry and Marc Tremblay
10 Java Machine and Integrated Circuit Architecture (JAMAICA)
  Ahmed El-Mahdy, Ian Watson and Greg Wright
11 Dynamic Java Threads on the JAMAICA Single-Chip Multiprocessor
  Greg Wright, Ahmed El-Mahdy and Ian Watson

References
Index

List of Figures

1.1 Average dynamic bytecode percentages for the top 10 methods in terms of bytecodes executed
1.2 A summary of dynamic percentages of category usage by the applications in the SPEC JVM98 suite
2.1 Characterization of heap accesses and accesses to object fields
2.2 Characterization of hot spots
2.3 Simulation results
2.4 Classification of data related misses
2.5 Assessment of opportunities for prefetching
3.1 JEMCore Java Processor Core Architecture
3.2 Java Grande Forum Synchronization Benchmark Results
3.3 Multiple Java Virtual Machine Data Structures
3.4 The JEMBuilder Application Builder
3.5 Timer interrupt handler code in Java
3.6 Timer interrupt notification thread
3.7 aJ-100 Block Diagram
3.8 aJ-100 Package (larger than actual size)
4.1 Proposed Java processor architecture
4.2 Dual-architecture Java processor pipeline compared with a pure mc processor pipeline and a RISC pipeline
4.3 Percentages of eliminated instructions relative to all instructions and relative to stack instructions (producers and non-anchor consumers) only
4.4 Speedup of folding
4.5 Percentages of occurrence of different folding cases recognized by the folding information generation (FIG) unit
4.6 Percentages of occurrence of different folding operations performed by the bytecode queue manager (BQM)
4.7 Percentages of occurrence of different folding patterns at the output of the folding translator unit (FT)
4.8 Percentages of occurrence of different operations performed by the local variable file (LVF)
4.9 Percentages of occurrence of different folding patterns processed by the load/store unit (LS)
4.10 Percentages of usage of different execution units (EXs)
5.1 Block diagram of the picoJava-II microprocessor core
5.2 Basic pipeline of the picoJava-II core
5.3 Increasing decode bandwidth using a fill unit and DB-Cache
5.4 Trends in decode rate and hit rate for different DB-Cache sizes
5.5 Performance improvement when adding a fill unit, DB-Cache (64-16K entries) and instruction execute width of two to a picoJava-II processor
5.6 Relative performance of picoJava-II using the fill unit, DB-Cache (64-16K entries), execution width of two and stack disambiguation
5.7 Available ILP in Java workloads
5.8 The Hardware Interpreter (Hard-Int) architecture
5.9 Translating bytecodes in the Hard-Int architecture
5.10 Execution cycles for different execution modes on a 4-way machine
5.11 Execution cycles for different execution modes on a 16-way machine
5.12 Cycles executed per bytecode on a 4-way machine
6.1 DELFT-JAVA concurrent multi-threaded processor organization showing multiple thread units, local and global processor units, thread register files, cache memory, control unit, and Link Translation Buffer (LTB)
6.2 Indirect register access mechanism showing indirect memory locations (idx), update adders, underflow/overflow signal, and resolved register address multiplexor
6.3 Indirect register mapping showing how a resolved register address is mapped to main memory
6.4 Performance results of a vector-multiply routine for various processor models showing speedup normalized to an implementable pipelined stack model
7.1 The indirection scheme for quasi-static compilation
7.2 Pseudo-code showing explicit checks for reference resolution
7.3 Timing measurements for an input size of 100
7.4 Timing measurements for an input size of 10
7.5 Comparing indirection table update strategies
8.1 Example concurrent reference mutation
8.2 Concurrent GC RPA query
8.3 The relational profiling architecture contains the profile control table (PCT) and the query engine
8.4 Generational write-barrier pseudo-code
8.5 Time line for the second GC in the Strata benchmark
8.6 System-on-a-chip design
9.1 An illustration of the Java Stack for a Java Thread
9.2 Java Object Structure
9.3 Block diagram for a MAJC implementation
9.4 Efficiency of the Speculative Thread for various Overheads and Savings
10.1 Dynamic bytecode execution frequencies for various bytecode classes
10.2 Normalized dynamic instruction execution counts for various execution models
10.3 Cumulative distribution of local variable access for selected SPEC JVM98 programs
10.4 Method call depth distribution for selected SPEC JVM98 programs
10.5 Register-windows miss ratios versus the number of register-windows, for selected SPEC JVM98 programs
10.6 Per-procedure visible registers and argument-passing operation
10.7 Normalized static instruction counts, broken down into various bytecode-mapping overheads, for selected SPEC JVM98 kernels
10.8 Active temporary variables distribution for selected SPEC JVM98 kernels
10.9 Distribution of active temporary variables that need saving across method calls, for selected SPEC JVM98 kernels
10.10 The effect of the proposed optimizations on static instruction counts, for selected SPEC JVM98 kernels
11.1 Token/thread life-cycle
11.2 Serial & parallel executions
11.3 Speedup of nfib in the current configuration
11.4 Speedup of nfib in the future configuration
11.5 Speedup of nfib, current configuration, token passing vs. oracle
11.6 Speedup of jnfib, using light RTS
11.7 Speedup of jnfib, using medium RTS
11.8 Speedup of jnfib, current configuration, light RTS, P=32, T=2
11.9 The Empty program
11.10 Speedup vs. outer loop iterations for the Empty program
11.11 Load balance of Empty program, LN = 219, current configuration
11.12 mpeg2encode results
11.13 jmpeg2decode results