
USENIX 1997 Annual Technical Conference Proceedings

330 Pages·37.7 MB·English


For additional copies of these proceedings contact:
USENIX Association
2560 Ninth Street, Suite 215
Berkeley, CA 94710 USA
Phone: 510 528 8649
FAX: 510 548 5738
Email: office@usenix.org
URL: http://www.usenix.org

The price is $32 for members and $40 for nonmembers. Outside the U.S.A. and Canada, please add $18 per copy for postage (via air printed matter).

Past USENIX Technical Conferences
1996 San Diego
1995 New Orleans
1994 Summer Boston
1994 Winter San Francisco
1993 Summer Cincinnati
1993 Winter San Diego
1992 Summer San Antonio
1992 Winter San Francisco
1991 Summer Nashville
1991 Winter Dallas
1990 Summer Anaheim
1990 Winter Washington, DC
1989 Summer Baltimore
1989 Winter San Diego
1988 Summer San Francisco
1988 Winter Dallas
1987 Summer Phoenix
1987 Winter Washington, DC
1986 Summer Atlanta
1986 Winter Denver
1985 Summer Portland
1985 Winter Dallas
1984 Summer Salt Lake City
1984 Winter Washington, DC
1983 Summer Toronto
1983 Winter San Diego

© Copyright 1997 by The USENIX Association. All Rights Reserved.

This volume is published as a collective work. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks herein.

ISBN 1-880446-84-7

Printed in the United States of America on 50% recycled paper, 10-15% post-consumer waste.
The USENIX Association
Proceedings of the USENIX 1997 Annual Technical Conference
January 6-10, 1997
Anaheim, California, USA

ACKNOWLEDGMENTS

Program Chair
John Kohl, Pure Atria Corporation

Invited Talks and Guru-Is-In Coordinators
Mary Baker, Stanford University
Berry Kercheval, Xerox PARC

Program Committee
Matt Blaze, AT&T Labs-Research
Bill Bolosky, Microsoft Research
Nathaniel Borenstein, First Virtual Holdings
Charlie Briggs, Digital Equipment Corp.
Clem Cole, Digital Equipment Corp.
Fred Douglis, AT&T Labs-Research
Rob Gingell, Sun Microsystems
Mike Karels, Berkeley Software Design, Inc.
John Schimmel, Silicon Graphics
Carl Staelin, Hewlett-Packard Labs

Works-in-Progress Coordinator
John Schimmel, Silicon Graphics

USENIX Board Liaison
Daniel E. Geer, Jr., Open Market, Inc.

Terminal Room
Gretchen Phillips, University at Buffalo

Administration: USENIX Association Staff
Ellie Young, Executive Director
Judy DesHarnais, Meeting Planner
Daniel V. Klein, Tutorial Director
Zanna Knight, Marketing
Cynthia Deno, Vendor Exhibition

USENIX Proceedings Production
Pennfield Jensen, Publications Manager
Data Reproductions

USENIX Support Staff
Colleen Biddle, Eileen Curtis, Diane DeMartini, Julie Keiser, Toni Veglia

External Reviewers
Howard Alt, Remzi Arpaci, Gaurav Banga, Trevor Blackwell, Aaron Brown, Larry Cable, Karen A. Casella, Mike Dahlin, Jeff Deutch, Andrea Dusseau, John Dustin, Yasuhiro Endo, Billy Fuller, John L. Furlani, Kanth Ghatraju, Timothy Gibson, John Heidemann, Will Hill, Larry Huston, John Ioannidis, Chris Jackson, Deepak Kakadia, Dinesh Katiyar, Dave Korn, Christopher Krebs, Orran Krieger, Tom LaStrange, Diane Lebel, Paula Long, John LoVerso, Susan LoVerso, Stephen Manley, Heidi McClure, Kirk McKusick, Larry McVoy, Ethan Miller, Greg Minshall, Robert Morris, Adam Moskowitz, Bill Nesheim, George Neville-Neil, Kelly O'Hair, John Ousterhout, Kent Peacock, JayaAr.Ra emdid y, Jim Rees, John Roach, Glenn Scott, Steve Senator, Chris Small, Keith Smith, Ramesh Subrahmaniam, Eric Sultan, Win Treese, Amin Vahdat, Linda Wang, Jim Woodward, Cliff Young

CONTENTS

Preface ... v
Author Index ... vi

Wednesday, January 8

Performance I
Session Chair: Carl Staelin, Hewlett-Packard Laboratories

Embedded Inodes and Explicit Grouping: Exploiting Disk Bandwidth for Small Files ... 1
Gregory R. Ganger, M. Frans Kaashoek, Massachusetts Institute of Technology

Observing the Effects of Multi-Zone Disks ... 19
Rodney Van Meter, Information Sciences Institute, University of Southern California

A Revisitation of Kernel Synchronization Schemes ... 31
Christopher Small, Stephen Manley, Harvard University

Interface Tricks
Session Chair: Rob Gingell, Sun Microsystems

Porting UNIX to Windows NT ... 43
David G. Korn, AT&T Labs-Research

Protected Shared Libraries - A New Approach to Modularity and Sharing ... 59
Arindam Banerji, Hewlett-Packard Laboratories; John M. Tracey, T.J. Watson Research Center; David L. Cohn, University of Notre Dame

Extending the Operating System at the User Level: the Ufo Global File System ... 77
Albert D. Alexandrov, Maximilian Ibel, Klaus E. Schauser, Chris J. Scheiman, University of California, Santa Barbara

Client Tricks
Session Chair: Fred Douglis, AT&T Labs-Research

Network-aware Mobile Programs ... 91
Mudumbai Ranganathan, Anurag Acharya, Shamik Sharma, Joel Saltz, University of Maryland

Using Smart Clients to Build Scalable Services ... 105
Chad Yoshikawa, Brent Chun, Paul Eastham, Amin Vahdat, Thomas Anderson, David Culler, University of California, Berkeley

Thursday, January 9

Clustering
Session Chair: Clem Cole, Digital Equipment Corporation

Building Distributed Process Management on an Object-Oriented Framework ... 119
Ken Shirriff, Sun Microsystems Laboratories

Adaptive and Reliable Parallel Computing on Networks of Workstations ... 133
Robert D. Blumofe, University of Texas, Austin; Philip A. Lisiecki, Massachusetts Institute of Technology

A Distributed Shared Memory Facility for FreeBSD ... 149
Pedro A. Souto, Eugene W. Stark, State University of New York, Stony Brook

Tools
Session Chair: Matt Blaze, AT&T Labs-Research

Cdt: A General and Efficient Container Data Type Library ... 163
Kiem-Phong Vo, AT&T Labs-Research

A Simple and Extensible Graphical Debugger ... 173
David R. Hanson, Jeffrey L. Korn, Princeton University

Cget, Cput, and Stage - Safe File Transport Tools for the Internet ... 185
Bill Cheswick, Bell Laboratories

Friday, January 10

User Tools
Session Chair: Charlie Briggs, Digital Equipment Corporation

WebGlimpse - Combining Browsing and Searching ... 195
Udi Manber, Michael Smith, Burra Gopal, University of Arizona

Mailing List Archive Tools ... 207
Sam Leffler, Silicon Graphics; Melange Tortuba, Tortuba Consulting

Experiences with GroupLens: Making Usenet Useful Again ... 219
Bradley N. Miller, John T. Riedl, Joseph A. Konstan, University of Minnesota

Performance II
Session Chair: Mike Karels, Berkeley Software Design

Overcoming Workstation Scheduling Problems in a Real-Time Audio Tool ... 235
Isidor Kouvelas, Vicky Hardman, University College London

On Designing Lightweight Threads for Substrate Software ... 243
Matthew Haines, University of Wyoming

High-Performance Local Area Communication With Fast Sockets ... 257
Steven H. Rodrigues, Thomas E. Anderson, David E. Culler, University of California, Berkeley

Caching and Staging
Session Chair: Bill Bolosky, Microsoft Research

An Analytical Approach to File Prefetching ... 275
Hui Lei, Dan Duchamp, Columbia University

Optimistic Deltas for WWW Latency Reduction ... 289
Gaurav Banga, Rice University; Fred Douglis, Michael Rabinovich, AT&T Labs-Research

A Toolkit Approach to Partially Connected Operation ... 305
Dan Duchamp, Columbia University

Preface

Welcome to Anaheim and to the 1997 USENIX Conference! On behalf of the USENIX Association, thank you for coming to the conference. We have a lot going on here this year: the refereed paper tracks, invited talks, works-in-progress and guru-is-in sessions, and two full days of timely tutorials. New at the 1997 conference is the co-location of and joint registration with the USELINUX conference. We think you'll find plenty of interesting sessions here to keep you busy the whole week!

We have chosen 23 papers for the refereed track this year. They cover topics such as filesystems, networking, networked systems, programming tools, user tools, and performance.
Please extend my thanks to the 155 authors from 37 universities, 13 companies, and 11 countries for their 74 paper submissions for the technical track. Without their papers describing their recent efforts, we would not have a refereed track to present. Some of the papers not accepted for the refereed track will appear informally in the works-in-progress session or maybe even in a birds-of-a-feather gathering.

There are many people involved in preparing a USENIX conference, many more than space here can mention. Those of special note include the wonderful USENIX staff I worked with: Ellie Young, Judy DesHarnais, Zanna Knight, and Pennfield Jensen; they were always glad to offer me advice, direction, and able assistance. Dan Geer, my USENIX board liaison, kept a watchful eye and was always ready to listen. Special thanks go to Mary Baker and Berry Kercheval for arranging the invited talks sessions; to Dan Klein, who organized the fantastic tutorial selection; to Margo Seltzer and Vera Gropper at Harvard University for hosting the program committee meeting; to Keith Smith, who served as the program committee's scribe; and to my employer, Pure Atria, for supporting my work as chair.

Finally, I would like to thank my program committee (10 hardy souls) and the external reviewers (listed separately) for their long hours reading papers and writing detailed reviews last summer. The program committee provided feedback to the authors of each submission, drawing on the five or more reviews of each paper. We've got a fine technical track this year thanks to the hard work and hard decisions of these volunteers.

John Kohl, Program Chair

Author Index
Anurag Acharya ... 91
Albert D. Alexandrov ... 77
Thomas E. Anderson ... 105, 257
Gaurav Banga ... 289
Arindam Banerji ... 59
Robert D. Blumofe ... 133
Bill Cheswick ... 185
Brent Chun ... 105
David L. Cohn ... 59
David E. Culler ... 105, 257
Fred Douglis ... 289
Dan Duchamp ... 275, 305
Paul Eastham ... 105
Gregory R. Ganger ... 1
Burra Gopal ... 195
Matthew Haines ... 243
David R. Hanson ... 173
Vicky Hardman ... 235
Maximilian Ibel ... 77
M. Frans Kaashoek ... 1
Joseph A. Konstan ... 219
David G. Korn ... 43
Jeffrey L. Korn ... 173
Isidor Kouvelas ... 235
Hui Lei ... 275
Sam Leffler ... 207
Philip A. Lisiecki ... 133
Udi Manber ... 195
Stephen Manley ... 31
Bradley N. Miller ... 219
Michael Rabinovich ... 289
Mudumbai Ranganathan ... 91
John T. Riedl ... 219
Steven H. Rodrigues ... 257
Joel Saltz ... 91
Klaus E. Schauser ... 77
Chris J. Scheiman ... 77
Shamik Sharma ... 91
Ken Shirriff ... 119
Christopher Small ... 31
Michael Smith ... 195
Pedro A. Souto ... 149
Eugene W. Stark ... 149
Melange Tortuba ... 207
John M. Tracey ... 59
Amin Vahdat ... 105
Rodney Van Meter ... 19
Kiem-Phong Vo ... 163
Chad Yoshikawa ... 105

Embedded Inodes and Explicit Grouping: Exploiting Disk Bandwidth for Small Files

Gregory R. Ganger and M. Frans Kaashoek
M.I.T. Laboratory for Computer Science
Cambridge MA 02139, USA
{ganger,kaashoek}@lcs.mit.edu
http://www.pdos.lcs.mit.edu/

Abstract

Small file performance in most file systems is limited by slowly improving disk access times, even though current file systems improve on-disk locality by allocating related data objects in the same general region. The key insight for why current file systems perform poorly is that locality is insufficient; exploiting disk bandwidth for small data objects requires that they be placed adjacently. We describe C-FFS (Co-locating Fast File System), which introduces two techniques, embedded inodes and explicit grouping, for exploiting what disks do well (bulk data movement) to avoid what they do poorly (reposition to new locations). With embedded inodes, the inodes for most files are stored in the directory with the corresponding name, removing a physical level of indirection without sacrificing the logical level of indirection. With explicit grouping, the data blocks of multiple small files named by a given directory are allocated adjacently and moved to and from the disk as a unit in most cases. Measurements of our C-FFS implementation show that embedded inodes and explicit grouping have the potential to increase small file throughput (for both reads and writes) by a factor of 5-7 compared to the same file system without these techniques. The improvement comes directly from reducing the number of disk accesses required by an order of magnitude. Preliminary experience with software-development applications shows performance improvements ranging from 10-300 percent.

1 Introduction

It is frequently reported that disk access times have not kept pace with performance improvements in other system components. However, while the time required to fetch the first byte of data is high (i.e., measured in milliseconds), the subsequent data bandwidth is reasonable (> 10 MB/second). Unfortunately, although file systems have been very successful at exploiting this bandwidth for large files [Peacock88, McVoy91, Sweeney96], they have failed to do so for small file activity (and the corresponding metadata activity). Because most files are small (e.g., we observe that 79% of all files on our file servers are less than 8 KB in size) and most files accessed are small (e.g., [Baker91] reports that 80% of file accesses are to files of less than 10 KB), file system performance is often limited by disk access times rather than disk bandwidth.

One approach often used in file systems like the fast file system (FFS) [McKusick84] is to place related data objects (e.g., an inode and the data blocks it points to) near each other on disk (e.g., in the same cylinder group) in order to reduce disk access times. This approach can successfully reduce the seek time to just a fraction of that for a random access pattern. Unfortunately, it has some fundamental limitations. First, it affects only the seek time component of the access time¹, which generally comprises only about half of the access time even for random access patterns. Rotational latency, command processing, and data movement, which are not reduced by simply placing related data blocks in the same general area, comprise the other half. Second, seek times do not drop linearly with seek distance for small distances. Seeking a single cylinder (or just switching between tracks) generally costs a full millisecond, and this cost rises quickly for slightly longer seek distances [Worthington95]. Third, it is successful only when no other activity moves the disk arm between related requests. As a result, this approach is generally limited to providing less than a factor of two improvement in performance (and often much less).

¹ We use the terms access time and service time interchangeably to refer to the time from when the device driver initiates a read or write request to when the request completion interrupt occurs.

Another approach, the log-structured file system (LFS), exploits disk bandwidth for all file system data, including large files, small files, and metadata. The idea is to delay, remap, and cluster all modified blocks, only writing large chunks to the disk [Rosenblum92]. Assuming that free extents of disk blocks are always available, LFS works extremely well for write activity. However, the design is based on the assumption that file caches will absorb all read activity and does not help in improving read performance. Unfortunately, anecdotal evidence, measurements of real systems (e.g., [Baker91]), and simulation studies (e.g., [Dahlin94]) all indicate that main memory caches have not eliminated read traffic.

This paper describes the co-locating fast file system (C-FFS), which introduces two techniques for exploiting disk bandwidth for small files and metadata: embedded inodes and explicit grouping. Embedding inodes in the directory that names them (unless multiple directories do so), rather than storing them in separate inode blocks, removes a physical on-disk level of indirection without sacrificing the logical level of indirection. This technique offers many advantages: it halves the number of blocks that must be accessed to open a file; it allows the inodes for all names in a directory to be accessed without requesting additional blocks; it eliminates one of the ordering constraints required for integrity during file creation and deletion; it eliminates the need for static (over-)allocation of inodes, increasing the usable disk capacity [Forin94]; and it simplifies the implementation and increases the efficiency of explicit grouping (there is a synergy between these two techniques).

Explicit grouping places the data blocks of multiple files at adjacent disk locations and accesses them as a single unit most of the time. To decide which small files to co-locate, C-FFS exploits the inter-file relationships indicated by the name space. Specifically, C-FFS groups files whose inodes are embedded in the same directory. The characteristics of disk drives have reached the point that accessing several blocks rather than just one involves a fairly small additional cost. For example, even assuming minimal seek distances, accessing 16 KB takes only 10% longer than accessing 8 KB, and accessing 64 KB takes less than twice as long as accessing a single 512-byte sector. Further, the relative cost of accessing more data has been dropping over the past several years and should continue to do so. As a result, explicit grouping has the potential to improve small file performance by an order of magnitude over conventional file system implementations. Because the incremental cost is so low, grouping will improve performance even when only a fraction of the blocks in a group are needed.

Figure 1 illustrates the state of the art and the improvements made by our techniques. Figure 1A shows the ideal layout of data and metadata for five single-block files, which might be obtained if one uses a fresh FFS partition. In this case, the inodes for all of the files are located in the same inode block, and the directory block and the five file blocks are stored adjacently. With this layout, the prefetching performed by most disks will exploit the disk's bandwidth for reads, and scatter/gather I/O from the file cache can do so for writes. Unfortunately, a more realistic layout of these files for an FFS file system that has been in use for a while is more like that shown in Figure 1B. Reading or writing the same set of files will now require several disk accesses, most of which will require repositioning (albeit with limited seek distances, since the picture shows only part of a single cylinder group). With embedded inodes, one gets the layout shown in Figure 1C, wherein the indirection between on-disk directory entries and on-disk inodes is eliminated. Finally, with both embedded inodes and ex-

[Figure 1 panels: A. Ideal Layout; B. Reality after usage; C. Embedded Inodes; D. Explicit Grouping]
Figure 1: Organization and layout of file data on disk. This figure shows the on-disk locations of directory blocks (marked 'D'), inode blocks ('I') and the data for five single-block files ('F1'-'F5') in four different scenarios: (A) the ideal conventional layout, (B) a more realistic conventional layout, (C) with the addition of embedded inodes, and (D) with both embedded inodes and explicit grouping (with a maximum group size of four blocks).
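The argument for explicit grouping rests on a simple service-time model: a request pays a fixed positioning cost (seek, rotational latency, command processing) plus a transfer cost proportional to its size, and for small requests the fixed part dominates. The sketch below makes that arithmetic concrete. It is a back-of-the-envelope illustration under assumed drive parameters (5400 RPM, 0.5 ms command overhead, 10 MB/second media rate); only the roughly 1 ms minimal seek comes from the text, and the other numbers are hypothetical, not measurements from the paper. It also assumes, for simplicity, that every separate read pays the same minimal seek and average rotational latency.

```python
"""Back-of-the-envelope disk service-time model (illustrative only).

All drive parameters are assumptions, except the ~1 ms minimal seek,
which the text cites for single-cylinder seeks and track switches.
"""

KB = 1024

# Hypothetical drive parameters.
SEEK_MS = 1.0            # minimal seek / track switch, about 1 ms per the text
RPM = 5400               # assumed spindle speed
OVERHEAD_MS = 0.5        # assumed command-processing overhead
MEDIA_RATE_BPS = 10e6    # assumed sustained media rate: 10 MB/second


def service_time_ms(nbytes: int) -> float:
    """Seek + average rotational latency + command overhead + transfer."""
    rotational_ms = 0.5 * 60_000 / RPM          # on average, wait half a revolution
    transfer_ms = nbytes / MEDIA_RATE_BPS * 1000
    return SEEK_MS + rotational_ms + OVERHEAD_MS + transfer_ms


t_512 = service_time_ms(512)
t_8k = service_time_ms(8 * KB)
t_16k = service_time_ms(16 * KB)
t_64k = service_time_ms(64 * KB)

print(f"16 KB vs 8 KB:  {t_16k / t_8k:.2f}x")
print(f"64 KB vs 512 B: {t_64k / t_512:.2f}x")

# Eight single-block (8 KB) files: one positioned access per file versus a
# single 64 KB access for the whole (hypothetical) group of adjacent blocks.
separate_ms = 8 * t_8k
grouped_ms = t_64k
print(f"8 separate reads: {separate_ms:.1f} ms, 1 grouped read: {grouped_ms:.1f} ms")
```

With these assumed numbers the fixed positioning cost is roughly 7 ms per request, while transferring 8 KB adds only about 0.8 ms. Enlarging a request therefore adds little time, which is why fetching a directory's small files in one adjacent run beats positioning once per file. A faster media rate shrinks the transfer term further, in line with the observation that the relative cost of accessing more data keeps dropping.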


