ebook img

Analytical Studies of Strategies for Utilization of Cache Memory in Computers PDF

4 Pages·0.12 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Analytical Studies of Strategies for Utilization of Cache Memory in Computers

Analytical Studies of Strategies for Utilization of Cache Memory in Computers Satya N. Majumdar and Jaikumar Radhakrishnan Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai-400005, India We analyze quantitatively several strategies for better utilization of thecache or thefast access 0 memory in computers. We define a performance factor α that denotes the fraction of the cache 0 area utilized when the main memory is accessed at random. We calculate α exactly for different 0 competing strategies, including the hash-rehash and the skewed-associative strategies which were 2 earlier analyzed via simulations. n a PACS: 07.05.Bx, 07.05.Tp, 02.50.-r J 9 The memory of a computer is organized in pages. A point (i) as no time is wasted in searching. However it ] single page consists of several words. As the computer performs poorly on point (ii) as a page can be sent out h performs its computations, it reads from and writes into even when most of the cache is unused. c e memory. When a word of main memory is accessed, the Thesecondstrategy,Strategy2,usuallycalledassocia- m entire page containing the word is fetched and stored in tivemappingstrategy,allowsP toresideanywhereinthe a - a more accessible partofthe computer’s hardwarecalled cache; if all the pages in the cache are already occupied, at the cache, whichadmits fastaccess. Subsequentaccesses one of them, chosen according to some rule (e.g. least t are likely to be for words in this page and hence the av- recently used), will be sent back to the main memory to s . erage memory access time is considerably reduced [3]. make space for the P . In this strategy, when a page t a a Sincefastmemoryisexpensive,thecacheusuallyholds P is accessed,determining if it is alreadypresent in the a m farfewerpagesthanthemainmemory. Whenaprogram cache can be expensive, both from the point of view of - requires a certain page that is not already in the cache, time taken and the hardware needed to implement the d thecomputerhastofetchitfromthemainmemory. But search. It has, however, its advantage in the utilization n o where in the cache should this new page be placed? The ofcachearea,sincenopageinthecacheissentoutunless c placementsinthe cacheoftheseincomingpagesmustbe the cache is full. [ organized so that (i) very little time is spent in locating Thus Strategy 1, though preferable from the point of 1 the page where the word is stored in the cache and (ii) view of design, is likely to be inferior to Strategy 2 in v the cache area is maximally utilized, i.e., ideally pages utilization of the memory available in the cache. In or- 0 shouldnotbesentbacktothemainmemory,whenthere der to improve the performance of caches, several other 9 is space left in the cache. strategies, which try to to combine the advantages of 0 1 To fix our notations, we assume that the main mem- Strategies 1 and 2, have been proposed. In this letter, 0 ory has M =2m pages and the cache has N =2n pages, wewillprimarilybeconcernedwiththreesuchstrategies 0 where m≫n. We use m-bit andn-bit 0-1strings as ad- A, B and C mentioned below. These strategies perform 0 dresses for pages in the main memory and the cache re- considerablybetter thanStrategy2onpoint(i), andare / t spectively. The page of the main memory corresponding found to be better than Strategy 1 on point (ii). Our a m to the address a ∈ {0,1}m will be denoted by Pa. Simi- goal in this Letter would be to compare quantitatively larly Q will denote the page in the cache corresponding the performances of these various strategies. - b d to the address b∈{0,1}n. When an accessto pageP is Strategy A: This is knownas hash-rehash Strategy [1]. a n made in the mainmemory,it has to be broughtto a cer- Inthisstrategy,eachpageofthemainmemoryisallowed o tain pageQ in the cache. So the question is what is the to reside in two locations Q and Q in the cache that c b b1 b2 : best strategy for choosing Qb (i.e., cache organization) are determined as follows: b1 is the string of the first v such that the cache area is maximally utilized. n bits of a and b is the string of the next n bits of a. i 2 X There are two extreme strategies for cache organiza- When a page Pa from the main memory is brought to r tion. Strategy 1, usually called direct mapping strategy, the cache, it is first put in Qb1 provided Qb1 is empty; a assigns a fixed location Qb in the cache for eachpage Pa if Qb1 is occupied we place Pa in Qb2. (If there is some ofthe mainmemorywherebisthefirstnbits ofa. That pagePa′ residingat Qb2, then it is replacedby Pa andis is,eachtimeP isfetchedintothecache,itwillbeplaced sent back to the main memory.) a inpageQ ;ifthereisalreadyapageofthemainmemory Strategy B: This is known as two-way skewed- b residing at Q , then that page will be sent back to the associative strategy [7,2,5,6]. In this case, we divide the b main memory (the main memory will be updated) and cacheintotwobanksQandQ′,eachwithcapacity2n−1. ′ P will replace it in the cache. In this strategy when an The pages of the two banks are denoted by Q and Q a b b access to a page P is made, we know exactly where to respectively where b∈{0,1}n−1. With each page P are a a ′ finditinthecache,namelyQ . Thusitperformswellon associated two pages Q and Q , one from each bank. b b1 b2 1 1 When a page P is brought to the cache, we first try to k a x (k)=1− 1− . (1) 1 store P in the first bank at location Q , and if that N a b1 (cid:0) (cid:1) ′ fails, we store it in the second bank at location Q . b2 The performance factor is then given by, Strategy C: This is known as two-way set-associative 1 strategy [1,3]. Here the cache is again divided into two α = lim x (n)=1− =0.63212.... (2) 1 n→∞ 1 e banks. TheonlydifferenceisthatweassociatepagesQ b1 ′ andQb1 withpagePa. Thatis,ifQb1 isalreadytaken,we Similar arguments can be used for Strategy C also. try to store the page in the second bank but at location Here there are two banks and each bank has N′ = N/2 ′ ′ Q (and not Q as in Strategy B). bins. For every bin in the first bank, there is an associ- b1 b2 ated bin in the second bank. A bin from the first bank Clearly,StrategiesA,B andC arevariantsofStrategy is chosen at random and a ball is thrown into that bin. 1. Recently, based on data obtained from simulations, If the bin was already occupied then the ball is thrown Seznec and Bodin [7,2,5,6] have strongly advocated the into the associated bin in the second bank. If the sec- use of skewed-associative caches (Strategy B). However ond bin was also occupied, then the ball is rejected. Let no attempt appears to have been made to compare the x (k) and x (k) denote the fraction of occupied bins in performance of this strategy with older strategies within 1 2 the first and the second bank respectively after k tri- a theoretical framework. In this letter, we take the first als. Then clearly x (k) = 1−(1−1/N′)k as given by steps in this direction. In particular, we calculate ana- 1 Eq. (1). We also note that the probability that a bin lytically a quantity called performance factor α (defined in the second bank remains unoccupied given that its below) for various strategies mentioned above. partner in the first bank is occupied is simply given by The cache has place to store N = 2n pages. If N x (k)−x (k) = k (1−1/N′)k−1. This is because out 1 2 N′ pages are accessed at random, a perfect strategy would of k trials, only one (which can be 1-st, 2-nd,..., or the accomodate them all in the cache, without sending any k-th trial) should be succesful in choosing the given bin page back to the main memory. But for most practi- in the first bank and the others should be unsuccessful. cal strategies some pages will be sent back to the main So the performance factor α is given by, C memory. We define the performance factor α of a strat- egy to be the expected fraction of the randomly ac- α = lim x1(N)+x2(N) =1−2e−2 =0.72932.... (3) cessed N pages (in the N → ∞ limit) that get acco- C N→∞ N modated in the cache. A perfect strategy has α = 1. Itturnsoutthatsuchelementaryarguments,however, For all other strategies, α ≤ 1. We show below that α =1−e−1 =0.63212...forStrategy1butitincreases donotgiveustheexpressionsforthefractionofoccupied 1 binsforStrategiesAandB andonehastounfortuantely considerably to α = (e2 − 1)/(e2 + 1) = 0.76159..., A α = 1 − 1 (1 + e1−e−2) = 0.77167... and α = carryoutmoredetailedcomputationsforthosetwocases B 2e2 C which we outline below. 1−2e−2 = 0.72932... for variants A, B and C respec- WefirstconsiderStrategyA. Inthiscasetherearetwo tively. fixed locations Q and Q in the cache corresponding b1 b2 Itturnsoutthatα andα canbecomputedquiteeas- to a page P in the mainmemory. If a is anm bit string 1 C a ily using elementary probability arguments. Let us first then b consists of the first n bits of a and b the next n 1 2 compute α for Strategy 1 in its original form. Suppose bits of a (assuming m ≫ 2n). Suppose that an address 1 apageP ofthemainmemoryischosenatrandom. The a is accessed randomly by the computer. As a varies a first n bits of a will be uniformly distributed in {0,1}n. uniformly over {0,1}m, the two correspondingstrings b 1 Thus, each access corresponds to the choice of one of and b also vary uniformly over {0,1}n and their distri- 2 the N = 2n pages of the cache, and we want to know butions are independent. Thus, in the language of balls how many distinct pages of the cache are expected to be and bins, we have the following process. There are N chosen after N random pages of the main memory are balls and N bins. The balls are placed one after another accessed. In other words, we have N bins and N balls using the following strategy. For a given ball, a bin is and each ball is thrown at random into one of the bins; chosenatrandomandthe ballis attempted to be placed if the bin is already occupied then the new ball replaces there. Ifthebinisemptythe balloccupiesthebin; ifthe the existing ball. (Clearly, it makes no difference to the binisoccupied,anotherbinispickedatrandom(noteb 2 calculations if we think that the old ball stays and it is is independent of b unlike in Strategy C). If this second 1 the new ball that is discarded.) What is the expected bin is empty, the ball occupies it. However, if this bin is number of bins that will be occupied after all N balls also occupied, then the ball is discarded. have been tried? The probability that any fixed bin re- We define P (r,k) to be the probability that r bins A mains empty after k balls have been tried is (1−1/N)k. areoccupiedafter“time”k,i.e.,afterk trials. Notethat Therefore, the expected fraction x (k) of the occupied in this case each ball is tried at the most twice. The 1 bins after k trials is given by, evolution equation for P (r,k) is then given by, A 2 r r−1 2 2 Putting k = N in Eq. 9 and taking N → ∞ limit, we P (r,k+1)= P (r,k)+ 1− A A N N get the performance factor, (cid:0) (cid:1) (cid:2) (cid:0) (cid:1) (cid:3) ×P (r−1,k). (4) A e2−1 with the “boundary” condition, PA(0,k) = δk,0 and the αA = e2+1 =0.76159.... (10) “initial” condition, P (r,0) = δ . This equation can A r,0 be written in a compact matrix form, [PA(r,k +1)] = Next, we consider Strategy B. In this case, the cache Wˆ[PA(r,k)] where Wˆ is the (n+1)×(n+1) evolution areaisdividedequallyintotwobanks. Correspondingto matrixwhoseelementscanbeeasilyreadoffEq. 4. This every page P in the main memory, there are two fixed a equation can be solved using the standard techniques of locationsQ inthefirstcompartmentandQ inthesec- b1 b2 statistical physics which we outline below. The general ond. Again b and b are uniformly and independently 1 2 solution can be written as distributed over {0,1}n−1. The operation on the cache thencorrespondstothefollowingproblemwithballsand PA(r,k)= aλQλ(r)λk (5) bins. There are N = 2n bins, half of which, N′ = 2n−1, Xλ ′ belong to the first compartment and the remaining N where[Q (r)]istherighteigenvector(witheigenvalueλ) belong to the second. There are N balls and each is λ ofthematrixWˆ anda ’sareconstantstobedetermined placed in the bins according to the following rules. One λ fromtheinitialcondition. TheeigenvaluesofWˆ aresim- of the first N′ bins is chosen at random. If the bin is ply, λ = l2/N2, where l = 0,1,2,...,N. The l-th right occupied one of the N′ bins in the second half is chosen. eigenvector satisfies the equation, If even this bin is occupied the ball is discarded. Let P (r ,r ,k) denote the probability that after r2 (r−1)2 l2 B 1 2 N2Ql(r)+ 1− N2 Ql(r−1)= N2Ql(r), (6) “time’k,thefirstcompartmenthasr1 occupiedbinsand (cid:2) (cid:3) the second r2 occupied bins. Now the second compart- for all r ≥ 1. The generating function Q˜ (z) = ment is tried only if the first compartment fails to acco- l NQ (r)zr satisfies the differential equation, modate a givenball. If there arer1 occupied balls in the 0 l first compartment at time k, clearly the scond compart- P z2(1−z)d2Q˜l +z(1−z)dQ˜l −(l2−N2z)Q˜ =0. (7) ment has been tried only (k −r1) times, out of which dz2 dz l r2 attempts have been successful. Noting that for each compartment separately, the local strategy is exactly as With a little algebra, it is not difficult to show that the in Strategy 1 defined earlier, it is easy to see that, wellbehavedsolutionofthisdifferentialequationisgiven by, Q˜ (z) = zl(1 − z)P1,2l (2z − 1) for l < N and l N−l−1 PB(r1,r2,k)=P1(r1,k)P1(r2,k−r1) (11) Q˜ (z)=zN where Pa,b(x)’s are Jacobi polynomials de- N n fined as, where P (r,k) is the probability of having r occupied 1 balls in the first compartment after k trials and is the n Pa,b = 1 n+a n+b x−1 n−m x+1 m. same as in Strategy 1. P1(r,k) can be computed exactly n 2n mX=0(cid:18) m (cid:19)(cid:18)n−m(cid:19)(cid:0) (cid:1) (cid:0) (cid:1) by following the same steps as used for Strategy A. Us- ingthisexactresultinEq. 11andaftersomealgebra,we Thefinalsolutionforthegeneratingfunction,P˜A(z,k)= finallyfindtheexpectedfractionofoccupiedbins(taking NP (r,k)zr is given by, into account both the banks) after k trials, 0 A P P˜A(z,k)=zN +N−1alzl(1−z)PN1,−2ll−1(2z−1) Nl 2k xB(k)=1− 21 1− N1′ kS(N′,k) (12) Xl=0 (cid:0) (cid:1) (cid:0) (cid:1) ′ where S(N ,k) is the sum, wherethecoefficientsa ’saredeterminedfromtheinitial l condition P˜ (z,0) = 1. After some amount of algebra, we get, A S(N′,k)= N′(−1)N′−l N′ N′l m k. (cid:18) l (cid:19)(N′−1)N′ N′ n(2−δ ) Xl=0 (cid:0) (cid:1) a =(−1)l+N−1 m,0 (8) l n−m Putting k = N and taking the limit N → ∞, we finally forl=0,1,2,...N−1. Thustheexpectedfractionofoc- get the performance factor αB for Strategy B, cupied bins after time k,x (k)= 1 ∂P˜A(z,k)| is given by, A N ∂z z=1 α =1− 1 1+e1−e−2 =0.77167.... (13) B 2e2 (cid:0) (cid:1) N−1 x (k)=1+2 (−1)l+N l 2k. (9) The method illustrated above can not only calculate A Xl=1 (cid:0)N(cid:1) the final performance factor but it also gives exactly the 3 full probability distribution of the number of occupied Another generalization of Strategy B would be to con- bins at any arbitrary “time” k and for any cache size sider2banksonlybutofunequalsizesN andγN respec- N. It turns out that there is an easier method (to be tively. Then a similar calculation shows that the perfor- describedbelow)tocompute justthe performancefactor mance factor, which is now a function of γ, is given by, and is valid only in the N → ∞ limit. It not only cor- e−(1+γ) γ rectlyreproducestheexactresultsforα’sasfoundabove α (γ)=1− − e−exp[−(1+γ)/γ]. (18) B 1+γ e(1+γ) butcanfurtherbe usedtocompute theperformancefac- tors for more general and complicated strategies. This function has a single maximum at γ = 0.68932... Consider, for example, a generalized version of Strat- where αB(max) = 0.775862..., clearly better than egy A where instead of 2 fixed locations per page in the αB(1)=0.77167..., the result for two equal banks. cache, one now has p fixed locations. In the “ball and In summary, we have studied analytically the perfor- bin” language, each ball is tried at most p times before manceofseveralstrategiesforcacheutilization. Wehave being discarded. Let r(k) be the random variable that derivedexactvaluesforcacheutilizationfactorsforthese denotesthenumberofoccupiedbinsafterdiscrete“time” strategies under the assumption that pages in the main k. Clearly, the increment ∆r(k)=r(k+1)−r(k) is also memory are accessed at random. arandomvariablethattakesthevalue0withprobability Cacheutilizationis justoneofseveralissuesinthede- (r/N)p (whenallptrialsforagivenballareunsuccessful) sign of caches. Indeed, it has been reported [1,5] that and 1 with probability 1−(r/N)p. Taking expectations, two-way set associative caches require smaller execution we get hr(k+1)i−hr(k)i = 1−h(r/N)pi. We then de- timesthanhash-rehashcaches,although,ouranalysisin- fine x = r/N and t = k/N and note that they become dicates that the latter utilize the cache better. continuous in the N →∞ limit and hxi evolves as Togetperformancevaluesthatwillbeusefultodesign- ers of cache, one should take into accountthe amount of dhxi =1−hxpi. (14) time spent in decoding addresses, updating tables and dt the cost of hardware. Also, instead of assuming that the pages of the main memory are accessed at random, Using the method of bounded differences [4], it can be a better understanding of the access patterns in typical shown that for each t∈[0,1] and ǫ>0 applications might be needed. Pr[|x(t)−hx(t)i|≥ǫ] ≤2exp(−2ǫ2N/p). We are grateful to Abhiram Ranade for bringing the question of analysis of skewed associative caches to our Fromthis,itfollowsthat|hxpi−hxip|≤O(N−21). Thus, attention; in particular, the idea of using unequal banks one can neglect fluctuations in the N → ∞ limit. Us- in this strategy is due to him. We also thank D. Dhar ing this in Eq. 14, one gets a closed equation for hxi(t) for useful discussions. whichcanthenbeintegratedtogivetheperformancefac- tor,α =hxi(t=1). Forgeneralp,wegetα assolution p p of the equation, αp dy =1. (15) Z 1−yp 0 [1] A.Agarwal,AnalysisofCachePerformanceforOperating For example, for p = 1 and p = 2, we reproduce respec- sysems and Multiprogramming, Kluwer Academic Pub- tively, α =1−e−1 and α =(e2−1)/(e2+1)fromEq. lishers, 1989. 1 A 15. For large p, we find from Eq. 15, [2] F. Bodin and A. Seznec, Skewed-associativity enhances performance predictability, Proc. of the 22nd Intl. Symp. log2 1 on Computer Architecture,June 1995. α =1− +O . (16) p p p2 [3] K. Hwang and F. A. Briggs, Computer Architecture and (cid:0) (cid:1) Parallel Processing, McGraw-Hill, 1984. Thusaspincreases,thedeviationfromperfectbehaviour [4] C. J. H. McDiarmid, On the method of bounded dif- (1−αp) decays as a power law ∼1/p. ferences. In Surveys in Combinatorics: Invited Papers Inasimilarfashion,onecanconsiderageneralizedver- at the 12th British Combinatorial Conference (Editor: sionofStrategyBwhereinsteadof2equalcompartments J.Siemons),number141inLondonMathematicalSociety in the cache, one now considers p equal compartments. Lecture Notes Series, pages 148-188. Cambridge Univer- sity Press, 1989. In this case, once again the performance factor for gen- [5] A. Seznec, A case for two-way skewed associative caches, eral p can be easily computed using the continuous time Proc. of the 20th Inl. Symposium on Computer Architec- method. Apartfromreproducingtheresultforp=2,we ture, pp.169-178, 1993. find that for large p, [6] A. Seznec, A new case for skewed-associativity, Publica- 1 1 tion Interne,No. 1114, IRISA,France. αB(p)=1− (e−1)p +O p2 . (17) [7] A.SeznecandF.Bodin,Skewed-associativecaches, Proc. (cid:0) (cid:1) of PARLE’93, pp.305-316, June1993. 4

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.