ebook img

DTIC ADA455550: Wide-Area Computing: Resource Sharing on a Large Scale PDF

10 Pages·0.29 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview DTIC ADA455550: Wide-Area Computing: Resource Sharing on a Large Scale

Computing Practices s Wide-Area e c i t c a Computing: r P g n i t u Resource Sharing p m o C on a Large Scale Computing over wide-area networks has been largely ad hoc, but as needs increase, piecemeal solutions no longer make sense. Legion, a network-level operating system, was designed from scratch to target wide-area computing demands. C Andrew onsider almost any computing resource Grimshaw today—whether hardware, software, or data—and it will invariably be net- Adam Ferrari worked. Networking, especially wide- Frederick area networking, has created dramatic Knabe new possibilities for resource sharing. Cooperating contractors want selected access to each other’s enter- Marty prise systems. Researchers in geographically distant Humphrey universities need to pool and analyze data from multi- University of site experiments. Legacy codes on different comput- Virginia ing platforms must exchange information to support data mining and other integrated applications. These new possibilities depend on the ability to manage shared resources. But the sheer complexity of networked environments can turn this management problem into a nightmare. How do you share and manage resources yet maintain the autonomy of mul- provide a high-level way to share and manage them tiple administrative domains, hide the differences over the network. To be effective, such a system must between incompatible computer architectures, com- address the challenges posed by real end-user appli- municate consistently as machines and network con- cations (see the sidebar “Challenges for a Wide-Area nections are lost, and respect overlapping security Operating System”). Scalability, security, and fault tol- policies? The usual approach to these problems has erance are just a few of the characteristics a viable solu- been to deal with each situation individually. Piecemeal tion must have. solutions are cobbled together from scripts, sockets, Five years ago, we set out to design and build a and various networking tools. If all goes well, a sophis- wide-area operating system that would encompass all ticated programmer can build and maintain the appli- these challenges, allowing multiple organizations with cation, but even then the implementation tends to be diverse platforms to share and combine their resources. brittle and limited. Our system, Legion (http://legion.virginia.edu), is now Resource management is traditionally an operating operational on hundreds of hosts across nine US sites, system problem, but large-scale collections of including the two NSF supercomputer centers (San resources transcend classic operating system bound- Diego Supercomputer Center and National Center for aries. What is needed is a wide-area operating system Supercomputing Applications), two DoD supercom- that can abstract over a complex set of resources and puter centers (Naval Oceanographic Office and Army 0018-9162/99/$10.00 © 1999 IEEE May 1999 29 Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington VA 22202-4302. Respondents should be aware that notwithstanding any other provision of law, no person shall be subject to a penalty for failing to comply with a collection of information if it does not display a currently valid OMB control number. 1. REPORT DATE 3. DATES COVERED 1999 2. REPORT TYPE 00-00-1999 to 00-00-1999 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER Wide-Area Computing: Resource Sharing on a Large Scale 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e. TASK NUMBER 5f. WORK UNIT NUMBER 7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES) 8. PERFORMING ORGANIZATION University of Virginia,Department of Computer Science,151 Engineer’s REPORT NUMBER Way,Cahrlottesville,VA,22094-4740 9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES) 10. SPONSOR/MONITOR’S ACRONYM(S) 11. SPONSOR/MONITOR’S REPORT NUMBER(S) 12. DISTRIBUTION/AVAILABILITY STATEMENT Approved for public release; distribution unlimited 13. SUPPLEMENTARY NOTES The original document contains color images. 14. ABSTRACT 15. SUBJECT TERMS 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 18. NUMBER 19a. NAME OF ABSTRACT OF PAGES RESPONSIBLE PERSON a. REPORT b. ABSTRACT c. THIS PAGE 9 unclassified unclassified unclassified Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-18 Legion Application Schedule class Application Application instances (simulation) class class (data mining) (visualization) Report Scheduler faults Host Host Vault Host Class Class Class Class Manager Manager Manager Manager Gather resource Detect info faults Host Host Host Host Vault Vault Host Host Load Glunix HPSS Unix FS Fork/exec leveler HPSS Cray T90 SP2 NOW SP2 Centurion Berkeley SDSC UVa U Michigan Figure 1. How the Legion wide-area operating system works. Legion Host and Vault proxy objects provide a uniform interface to heterogeneous collections of processing and storage resources. These resource proxy objects are managed by Class Manager objects. Class Managers detect and report resource faults, for example. Scheduler objects gather information about the system state; Class Managers use these Schedulers to select and access required resources. All these system components—resource objects, managers, schedulers, and application objects—are addressable in a single, systemwide, uniform object space. Research Laboratory), NASA’s Aeronautical Research approach, Legion is able to reuse local services, and Center, and several universities. Users have ported a sites do not have to change familiar local software range of scientific applications to Legion in areas such interfaces. as molecular biology, materials science, ocean and Legion is a component-based system: Distributed atmospheric science, electrical engineering, and com- application components are represented as indepen- puter science. dent, active objects. This approach greatly simplifies Legion is essentially a conduit between the end user the development of distributed applications and tools. and widely distributed collections of resources. Like a Instead of facing the complexity of a wide range of traditional operating system, it supports services such distributed resources and service interfaces, the pro- as resource management and a distributed file system. grammer works with the simple, uniform abstraction This operating system-style interface leverages appli- of distributed objects. Legion also supports a high level cation programmer experience and simplifies porting of site autonomy. Local sites can select and configure legacy applications to the Legion platform. However, the components that represent their resources and ser- unlike typical operating systems, Legion is layered on vices in any way they see fit, retaining complete con- top of existing software services. It uses the existing trol over local access control policies, resource quota operating systems, resource management tools, and mechanisms, and so on. Legion’s inherent flexibility security mechanisms at host sites to implement higher is its greatest strength, and its most important defin- level system-wide services. Because of this middleware ing characteristic. 30 Computer HOW LEGION WORKS vide the uniform interface for allocating persistent With components that must interoperate in wide- storage. These interfaces provide a consistent view area heterogeneous environments, Legion’s funda- of system resources, even though local resource mental object model resembles the Common Object interfaces differ significantly in practice. Request Broker Architecture (CORBA). Programmers • The resource object model provides a tremendous describe object interfaces in an interface description degree of site autonomy. Applications (acting as language (IDL) and then compile and link them to resource clients) use the generic object interfaces implementations in programming languages such as for the resources they require. Resource providers C++, Java, or Fortran. All system elements are objects are free to employ any desired implementation of and can communicate with one another regardless of the resource objects. location, heterogeneity, or implementation details. Within this object-based framework, Legion provides The second benefit is particularly significant. For the services of a distributed operating system. Figure example, if system administrators at a site want to 1 outlines the structure of a sample Legion system. enforce a specialized access control policy for their The easiest way to understand how Legion works local hosts, they can extend or replace the basic Host is to consider how it handles classic operating system implementation to enforce that policy. Similarly, some tasks, which we consider in turn. of the hosts in Legion systems may require access through a local queue management system such as Representing and managing resources Genias Software’s Codine (http://www.genias.de) or As Figure 1 shows, local sites use Host and Vault IBM’s LoadLeveler (http://www.rs6000.ibm.com/ objects to represent processors and storage, respec- software/sp_products/loadlev.html). In these cases, tively.Using objects to represent resources has two resource providers simply use extended, queue-aware primary benefits: Host objects. Likewise, if a resource provider makes storage in a local file system available to Legion, yet • Objects define a simple, consistent interface to wants to continue using local Unix-based accounting Legion’s resources. Hosts provide the uniform and quota tools, he can use a Vault object implemen- interface for creating objects (tasks); Vaults pro- tation that allocates storage under the appropriate Challenges for a Wide-Area ner institutions and to make a database of model with a regional, mesoscale weather Operating System image-processed results available to the model. The coupled models would feed At Boeing Company, designers use sim- partners. As a first step, they want a tool data to each other, creating more accurate ulation to make ever more complex air- that can automatically identify pertinent and detailed combined results. The exist- frames at a manageable cost. Pratt & Whit- MRI scans at partner hospitals, securely ing regional model runs only on a Cray ney, which designs and supplies jet engines move those scans over the Internet to Har- T90, while the global model runs on a to Boeing, also relies heavily on simulation. vard, and then process them. The partners Cray T3E and is being migrated to the When Boeing’s engineers simulate an air- will provide very little administrative sup- IBM SP. The applications need a way to frame’s behavior, they need to know how port for the tool. coordinate and exchange data with one the engine coupled to that airframe will In another medical setting, seven com- another at runtime, be scheduled to run perform under various conditions. How- peting Dayton, Ohio, hospitals are work- simultaneously on separate supercomput- ever, Pratt & Whitney cannot release its ing together to reduce costs. By sharing ers, and be easily controlled by a researcher proprietary engine simulations because of patient records and making them elec- at a single workstation. the significant intellectual property they tronically available to emergency room These applications characterize the encode. In an unwieldy information ex- physicians, they avoid expensive and time- spirit of wide-area computing. Some of the change process, Boeing engineers must ask consuming tests and can provide better requirements are unique, while others Pratt & Whitney engineers to run their sim- care more quickly. Each hospital has its overlap. The applications also illustrate the ulation at specified data points and then own legacy medical records system, IS per- following significant challenges, from send them results by tape. Boeing engineers sonnel, and procedures that must some- managing complexity to implementing then combine the information with their how be merged. However, each also has flexible, robust security. own simulation data and modify it accord- databases and programs that cannot be ingly. The process iterates. shared. Provide a high-level programming model In a completely different domain, Har- Finally, climate modeling groups at San Complexity is the programmer’s nemesis: vard Medical School researches the causes Diego Supercomputer Center, UCLA, and A large-scale system can comprise several dif- and symptoms of multiple sclerosis. They Lawrence Berkeley Laboratory want to ferent architectures, tens of sites, hundreds need to get MRI scans from multiple part- couple a global atmospheric circulation of applications, and potentially thousands May 1999 31 local Unix user-id for each Legion client. For example, an object’s Class Manager deter- Legion provides configurable default implementa- mines which resources the object may use, and tions of the basic resource objects, so resource might enforce a policy that lets instances run only providers generally need not write any code to make on a known set of trusted hosts. their resources available. However, through object • They actively monitor their instances. Class extension and replacement, Legion is flexible enough Managers query the status of their instances to to support new local resource interfaces and policies detect failures and coordinate failure response (see as they arise. Figure 1). In this role, Class Managers act as a dis- tributed, agent-based fault-detection and response Managing tasks and objects mechanism within Legion. Traditional operating systems must provide inter- • They support persistence.All Legion objects can faces for starting new tasks and controlling their exe- be persistent, existing arbitrarily beyond the life cution (suspend, resume, terminate, and so on). In of their creating program. When an object is not Legion, the notion of a task or process corresponds in use, it can be deactivated: Its state is saved to closely to the Legion object: Objects are the active com- stable storage, and its containing process is de- putational entities within the system. Legion encapsu- allocated (to conserve resources). This notion of lates object management functions in the Class object activation/deactivation is similar to tradi- Manager object type. Class Managers have three main tional operating systems temporarily swapping functions: out a job. To make object deactivation transpar- ent to clients, the Class Manager acts as an auto- • They support a consistent interface for object matic reactivation agent. If a client attempts to management. The Class Manager interface in- invoke a method on an inactive object, the Class cludes a natural set of object (or task) manage- Manager automatically reactivates it. Reactiva- ment operations, such as methods to create and tion is thus as transparent as resuming swapped- destroy objects. Each Class Manager is responsi- out processes in traditional systems. ble for a set of instances, which clients control through the Class Manager interface. Class Decomposing object management responsibilities Managers act as policymakers for their instances. into an arbitrary number of Class Managers provides of hosts. Reducing and managing complex- others, the ability to run psand get a list of trol policies, and computational cultures. ity is therefore critical. The object-oriented all processes throughout the system. We For example, a site might insist that users paradigm and object-based programming define a single system image as a universal authenticate via Kerberos before using its provide programmers and application name space and management infrastruc- resources, or that users sign an “accept- designers with encapsulation features and ture for all objects of interest to the system able use policy” statement, or that each tools for abstraction that reduce and com- and its users: files, processes, processors day from 1:00 p.m. to 6:00 p.m. no appli- partmentalize complexity. We firmly believe (hosts), storage, users, services, and so on. cations can be run that consume more than that object-based techniques are key to con- The names should be location independent five CPU minutes. Extensibility and flexi- structing robust, wide-area systems. (not contain any location information) and bility thus become essential—users must These techniques are not enough, how- should be usable from anywhere in the sys- be able to readily extend and configure the ever. Composable, high-level services must tem. Further, as programmers use resources system to satisfy local requirements. replace low-level interfaces such as rshand to create their own objects, they should not sockets in the programmer’s toolbox. be forced to explicitly place objects on a Manage heterogeneous resources Without such services, the complexity of particular host or disk—the system should Resource heterogeneity is a natural part distributed programming goes up dramat- handle that. Thus, the programmer or user of the distributed environment. Types of ically, increasing both the skill set required can specify or know an object’s location heterogeneity include processor, data for- to build applications and the fragility of when necessary, but if this information is mat, configuration (how much memory the resulting software. not relevant to his task, he can ignore it. and disk? which libraries are available on a host?), and operating system. If hetero- Offer a single system image Accommodate diverse geneity is not managed, individual users To combat the daunting number of dis- administrative policies and programmers must deal with the com- tinct hosts and file systems, programmers Most wide-area computing requires plexity induced by all the possible permu- need a single system image—the abstrac- joiningmultiple organizations and admin- tations of hardware, operating system, and tion of a single machine and associated istrative domains. To make this bridging resources, a task that can rapidly over- storage. For some, a “single system image” easy, the system must accommodate a whelm even the best programmers. means a single shared address space; for diverse set of local-use policies, access con- 32 Computer a natural distribution of the system’s object manage- address this deficiency, Legion supports a hierarchical ment activities. Also, because Class Managers are directory service, context space, which lets users assign extensible, replaceable objects, it is easy to customize arbitrary Unix-like string paths to objects. the system’s object management mechanisms. For The Legion naming mechanism reduces the com- example, to enable certain forms of failure resilience, plexity of designing distributed applications because some Legion classes use replication. The specialized it provides a single global name space for all system Class Managers used for these object classes create entities. A typical distributed environment supports and manage replicas transparently to clients. separate name spaces for files, hosts, and processes; Legion, in contrast, supports the same global name Naming space for all these as well as additional entities. The Naming is a basic interface issue in operating sys- interface to this global name space is very easy to use; tem design. For example, operating systems typically at the highest level (context space) the user manipu- define a name space for identifying processes (such as lates names in the familiar form of Unix-style paths. Unix PIDs), as well as a file system name space for Furthermore, Legion’s scalable, replicated binding ser- identifying files and directories. Legionrepresents all vices make name translation automatic and efficient.1 entities—files, processors, storage devices, networks, users, and so on—as objects. These objects are iden- Providing an extensible file system tified by a three-level naming scheme. At the lowest Traditional operating systems typically rely on a file level, each object is assigned an object address—a list system to manage and represent persistent storage. of network addresses for the object. An object address However, Legion’s global name space and persistent might contain an IP address and port number, for object model make a separate file system unneces- example. Because Legion objects can migrate, object sary—in practice, the generalized persistent object addresses change over time. Legion thus defines an space defined by Legion serves all the purposes of con- intermediate layer of location-independent names ventional file systems. In Legion’s “file system,” users called Legion object identifiers. LOIDs are globally see familiar elements such as paths, directories, and unique identifiers that are assigned to objects when universally accessible files, but they also see arbitrary they are created. Because they are binary, system- object types such as Hosts, Class Managers, and appli- assigned names, they are not convenient for users. To cation tasks. Grow without limits failure, application availability is the prod- must be a mechanism for supporting legacy The system must be able to add new uct of component availability. In today’s code without modification, and it must be hosts and resources over time. If the past business climate, an unavailable applica- able to support a variety of programming has shown us anything, it is that the num- tion can easily cost thousands of dollars languages. A wide-area computing envi- ber of interconnected computational per minute. A wide-area system must ronment must be language-neutral. resources will only increase. Users and therefore be resilient to failure and pro- organizations do not want arbitrary lim- vide a failure and recovery model and Implement flexible, robust security its on system size and capacity. System associated services to applications devel- Security includes a range of topics, architectures must therefore be scalable opers, so they can write robust applica- including authentication (how do I know and conform to the distributed systems tions. The model must include notions of who you are?), access control (who can do principle that “the amount of service fault detection, fault propagation, and a what to each resource?), and data integrity required of any single component of the set of useful failure mode assumptions. (how can I make sure no one can read or system must not grow as the system modify my data in memory, on disk, or on grows.” If an architecture does not con- Handle multilanguage and the network?). Each of these issues is in form, a component whose load (requests legacy applications the Boeing/Pratt & Whitney example. per second, for example) increases as the “I don’t know what computer language Clearly we must be able to provide high system expands will at some point become they’ll be using in a hundred years, but it levels of security, but there is more to the saturated, and performance will suffer. will be called Fortran” was a popular problem. Security can be costly in perfor- refrain in the 1980s. Hundreds of millions mance, capability restriction, and other Tolerate faults of lines of legacy code today are written in dimensions. Moreover, different users and Several years ago Leslie Lamport languages as varied as Lisp, RPG, Cobol, organizations want to enforce very differ- quipped, “A distributed system is one in assembler, C/C++, Java, and (of course) For- ent policies. The challenge is to provide which I cannot get something done be- tran. One thing is certain: Those codes will each user and organization with just the cause a machine I’ve never heard of is not be replaced overnight, and we will still right mechanism and policy rules but still down.” This indictment is driven by a sim- want to be able to run them in distributed to allow different users and organizations ple fact: Without mechanisms to deal with environments. The implication is that there to interact. May 1999 33 Because of this generality, Legion’s object space is results directly to data-dependent receivers. For exam- more flexible than conventional file systems. For ple, if the caller does not directly use the result of a example, users can customize individual files to better remote method, but needs it only as a parameter for suit application-specific behaviors such as specialized future invocations, the caller will never receive the file access patterns. Consider a file that contains a two- result. The macrodataflow protocol avoids the unnec- dimensional grid of data items. In a traditional file essary act of communicating the result back to the interface, accessing a single grid row or column might caller, and instead forwards the message directly to require multiple file operations. In Legion, users can the objects where it is needed. define an extended file type to represent the 2D file Legion fully automates the macrodataflow protocol. object, with additional methods to permit row and Clients can specify and execute program graphs of inter- column access. dependent remote method invocations using macro- dataflow library interfaces, or via Legion-aware com- Enabling interprocess communication pilers such as the Mentat Programming Language At the lowest level, Legion objects communicate via Compiler.2 Similarly, object developers need not be message passing to transmit method parameters and aware of macrodataflow; Legion automatically matches results. However, applications for wide-area systems incoming method parameters from multiple sources need tools to reduce communication and to tolerate into complete method invocations, and forwards out- high latencies. To address these requirements, Legion going results directly to data-dependent recipients. supports macrodataflow, a variation of the traditional remote method invocation model. Protecting resources and applications Like other asynchronous remote method mecha- Wide-area operating systems must protect the secu- nisms, macrodataflow permits multiple concurrent rity of both local resource providers and application invocations and lets users overlap remote methods and users. Resource providers require that the wide-area local computation. However, unlike other remote operating system manage local resources in accor- method protocols, macrodataflow forwards method dance with local policies. Application programmers How Legion Differs from ... age and reason about resources. The goal models of the two systems differ in many was to reconstruct a coherent computing respects. Globe objects are passive and are Common Object Request environment with core operating system physically distributed over potentially Broker Architecture capabilities over a complex, heterogeneous many resources, whereas Legion objects CORBA 3.0 defines communication environment. Thus, Legion can be used are active, independent entities. Because of protocols, naming and binding mecha- simply for its high-level operating system this difference, Legion provides a more nisms, invocation methods, persistence, services to run, schedule, and manage unified view of system components. and many other features and services legacy applications in a network, but it can Whereas in Globe there is a dichotomy essential for an object-based architecture.1 mimic the CORBA standard for integrat- between objects and processes, in Legion, As such, its feature set and Legion’s over- ing applications. These two aspects com- objects are themselves the units of compu- lap in many areas. bined give Legion its real power. tation, providing the basis for distribution, The two architectures differ in their As CORBA evolves, some operating sys- scheduling, and resource management. underlying emphasis, however. CORBA tem-type services are starting to be defined Globe and Legion both provide a plat- was initially a reaction to the software inte- for it. Scalability and other wide-area con- form for constructing applications based gration problem. Differences between soft- cerns are becoming more important. It on interoperable components. But Legion ware components in location, vendor, remains to be seen how well its architec- differs significantly in also providing an implementation language, or execution ture will accommodate these changes. integrated infrastructure for resource man- platform made building integrated applica- agement. This hallmark of a wide-area tions difficult if not impossible. CORBA Globe operating system is essential for large-scale developers focused on enabling interoper- The Globe project2at Vrije University resource sharing. ability, and the architecture provides a com- also shares many goals and attributes with mon, object-based playing field where Legion. Both occupy middleware roles Globus components can communicate and interact. (running on top of existing host operating The Globus project3at Argonne National In contrast, Legion began with funda- systems and networks), both support Laboratory and the University of Southern mental computing resources on a wide- implementation flexibility, both have a sin- California has the same base of target envi- area network—CPU, disk, data, and so gle uniform object model and architecture, ronments, technical objectives, and target on—and built an overarching framework and both use objects to abstract imple- end users as Legion, and shares some of its for them. It emphasizes the ability to man- mentation details. However, the object design features. However, Globus and 34 Computer must satisfy the security requirements of their appli- control decisions on the basis of credentials passed cations. along with method parameters. Credentials consist of Legion’s security mechanisms are an integral part a free-form set of rights signed by a responsible client. of its object architecture. The basic Legion security The default MayI implementation is based on user- service is user-selectable data privacy and integrity configurable access control lists, including the notion within the Legion message-passing layer. Legion lets of groups. messages be fully encrypted for privacy, digested and In addition to access control mechanisms, operating signed for integrity checking, or sent in the clear if low systems must define mechanisms for user identity and performance overhead is an application priority. authentication. Users (like all other Legion entities) Cryptographic services in Legion are based on the RSA are represented by objects, which are assigned unique public key system (http://www.rsa.com). To protect LOIDs. The user’s LOID contains his public key, but against certain kinds of public key tampering, objects the user keeps his private key safe through arbitrary encode their RSA public keys directly into their local means, such as a smart card. Trusted Legion pro- LOIDs. Simply by knowing an object’s LOID, a client grams executed by the user (the Legion login shell, for can communicate securely with that object. example) rely on the user’s private key to sign appro- In any operating system, access control and resource priate credentials for outgoing methods. These cre- protection are central issues. In Legion, all resources dentials form the basis for authenticating the user and are represented by objects, so access control and are typically used in conjunction with per-object access resource protection are specified entirely at the object control lists to enforce user access control. level. Invoked objects autonomously enforce access control invocation by invocation, using a mandatory APPLICATIONS OF LEGION internal method called MayI. When a method invo- Legion’s services can accommodate a variety of cation arrives at an object, it is first processed by the domains and platforms. Two current applications object’s MayI method, which can enforce an arbitrary illustrate its flexibility in supporting distributed enter- access control policy. Typically, MayI makes access prise computing. Legion have fundamentally different high- basing a system on a cohesive, compre- cessing distributed applications. As such, level objectives. Globus provides a basic set hensive, and extensible design outweigh the the Web is the perfect front end, or inter- of services that lets users write applications short-term advantages of evolutionary face, to applications running in wide-area for a wide-area environment. Working composition of existing services. operating systems such as Legion. Ap- components become part of a composite plication interfaces can be written in Java distributed computing toolkit. Legion, in The Web or they may use HTML and the Common contrast, strives to reduce complexity and The Web is not a single entity whose Gateway Interface (CGI). They can com- to provide the programmer with a single characteristics can be isolated and ana- municate with back-end applications using view of the underlying resources, so it lyzed. Rather, it is a broad collection of either native socket protocols, HTTP, or builds higher level system functionality on applications, protocols, and libraries higher level interfaces provided by the top of a single unified object model. focused on content delivery to end users wide-area operating system. Viewed this The Globus approach has several strong running browsers. Advances in Web way, the Web and wide-area operating sys- points. One is that it takes great advantage browser interfaces and functionality have tems such as Legion are complementary. of code reuse and builds on user knowledge driven the Web revolution, transforming For many users, the Web provides the most of familiar tools and work environments. it from an elitist tool to an omnipresent natural window into the Legion universe. This approach also has several drawbacks. phenomenon. Given that the Web is most As the number of services grows, the lack users’ primary experience with distributed References of a common programming interface and computing, it is important to define its role 1. “The CORBA Connection,” Comm. model becomes a significant burden. By in wide-area computing. ACM, Nov. 1998, special issue on providing a common object programming The Web in its current form clearly does CORBA, K. Seetharaman, ed. model for all services, Legion permits users not constitute a wide-area operating sys- 2. M. Van Steen, P. Homburg, and A. Tanen- and tool builders to combine the many ser- tem. Basic operating system issues, such as baum, “Globe: A Wide-Area Distributed vices available in the wide-area operating resource management and task scheduling, System,” IEEE Concurrency, Jan. 1999, system: schedulers, I/O services, applica- are simply not part of the Web’s structure. pp. 70-78. tion components, and so on. For example, This is not an indictment of the Web, but a 3. I. Foster and C. Kesselman, “Globus: A users can run the same access control tools recognition of its true strength as a remote Metacomputing Infrastructure Toolkit,” to configure security for files and for hosts. access medium for distributed content and Int’l J. Supercomputer Applications, Vol. We believe the long-term advantages of a ubiquitous interface technology for ac- 11, No. 2, 1997, pp. 115-128. May 1999 35 MRI Harvard MRI partner partner (exploded view) Monitor partner Partner host objects Partner host host Encrypted manager object MRI images Image Scanner MRI processing Download partner Write object binary and spawn Poll for Start and new hosts monitor MRI Read collection MRI objects MRI Dump collection collection MRI directory manager object partner Figure 2. The MRI data collection sys- MRI data collection Class Manager goes down, its own higher-order Class tem in development The MRI data collection system in development for Manager will detect the loss and restart it using the for Harvard Medical Harvard Medical School (see first example in the side- replica. This detection and restart behavior recurses School. The compo- bar “Challenges for a Wide-Area Operating System”) up a tree of metamanagers (typically only one or two nents of the MRI data is a good illustration of an application structure that levels) to the root Legion manager object, which has collection applica- fits well with Legion’s services.The components of the a hot spare. tion run on central MRI data collection application run on central servers The Class Manager, Host, and other objects in the servers at Harvard at Harvard and on front-end computers at the MRI system are all configured with strict access control. and on front-end centers. Figure 2 shows the architecture. Each leaf Calls to various objects must present credentials to computers at the MRI node has an MRI collection object (blue) that scans gain authorization. The MRI collection application centers. the local disk for specially tagged MRI images that the and its Legion infrastructure are owned and accessi- scanner has dumped. The MRI collection object copies ble only by a small set of Legion users at Harvard. these images into its persistent data space so that they These users can centrally monitor and configure the will not be deleted when the scanner’s “dumping direc- system using Legion tools that provide views of all the tory” is automatically cleaned up. Periodically the hosts, objects, and so forth that are running or down. MRI collection object calls the image processing object at Harvard (red) to upload the data in encrypted form, Climate modeling authenticating itself by including appropriately signed Climate modeling has progressed beyond basic certificates in the method invocations. When it receives atmospheric simulations to include multiple aspects a complete batch of scans, the image processing object of the earth system, such as full-depth ocean models, starts an image-processing pipeline, which consists of high-resolution land-surface models, sea ice models, objects automatically scheduled onto local compute and chemistry models. Typically, these models come servers. The final stage of the processing pipeline from different research groups at a variety of institu- inserts the results in the project’s image database. tions, are written in different languages, and require When a leaf node is rebooted, the node’s Host object different resources. As described in the fourth exam- (yellow) starts automatically and registers with its ple in the sidebar “Challenges for a Wide-Area manager (green) in the larger Legion net. The Class Operating System,” coupled applications composed Manager object (pink) for the MRI collection com- from existing models require the ability to coordinate ponent detects, via polling of the green Host object existing components and to manage combined Class Manager, that the node is up and requests a resources. restart of the blue MRI collection object for that node. Legion’s ability to combine and add value to exist- The yellow Host object on the node handles the ing components to create more complex applications request, detecting simultaneously if the MRI collec- fits nicely with this application. To construct the cou- tion object has been upgraded and, if so, download- pled climate model system, developers use the existing ing the new executable automatically. As it comes up, simulations as implementations for two new Legion the MRI collection object recovers its state, which may object types: Global Model and Mesoscale Model. In include as-yet-untransmitted MRI scans. doing so, they modify the simulations to enable link- Both the Host object and MRI collection object age to a Legion object interface (described in IDL), Class Managers have replicated persistent state. If the and modify the I/O calls in the models to use Legion 36 Computer file objects in place of the local file system. Each new This work was partially supported by DARPA model object supports a method to request the exe- (Navy) contract #N66001-96-C-8527, DOE grant cution of a simulation time interval. Coordination and DE-FD02-96ER25290, DOE contract Sandia LD- coupling of the model objects is accomplished through 9391, Northrup-Grumman (for the DoD HPCMOD/ the use of a Legion Coupler object. This object also PET program), and DOE D459000-16-3C. transforms data from each model into the format required by the other (for example, the models employ geographic grids that differ by an order of magnitude in resolution). References Legion also satisfies this application’s requirements 1. A. Grimshaw et al., “Architectural Support for Extensi- for managing resources. For example, application bility and Autonomy in Wide-Area Distributed Object developers can configure the Class Manager for the Systems,” Tech. ReportCS-98-12, Computer Science Global Model object to know that a Cray T3E is Dept., Univ. of Virginia, Charlottesville, Va., 1998. required for this object type. When the model becomes 2. A. Grimshaw, “Easy-to-Use Object-Oriented Parallel Pro- available on the IBM SP, they can reconfigure the Class cessing with Mentat,” Computer, May 1993, pp. 39-51. Manager with a single command to account for this new resource selection possibility. When a user wants to run the complete coupled simulation, a standard Legion component—the Scheduler object—coordi- Andrew Grimshawis an associate professor of com- nates the acquisition of all needed resources (such as puter science at the University of Virginia, where his a T3E or SP to run the global model, a T90 to run the research interests include metasystems, high-perfor- mesoscale model, and a workstation to host the mance parallel computing, heterogeneous parallel Coupler object). Regardless of the resources selected, computing, compilers for parallel systems, and oper- Legion automatically takes care of installing the ating systems. He is the chief designer and architect needed application components at the target sites, and of Mentat and Legion. He received a PhD in computer it uses the appropriate interfaces for the local site’s science from the University of Illinois at Urbana- task and storage allocation. Champaign. Adam Ferrariis a research scientist with the Legion We are continuing to develop higher level ser- project in the Department of Computer Science at the vices in Legion as we acquire more infor- University of Virginia. His research interests include mation from applications. For example, high-performance distributed computing, operating broad classes of applications can profit from similar systems, metacomputing, and computer security. He fault-response techniques. To address this need, we received an MS from Emory University and a PhD are designing drop-in fault tolerance modules based from the University of Virginia, both in computer sci- on the existing detection and reporting infrastructure. ence. He is a member of the IEEE Computer Society We also plan to develop new application tools, such as and the ACM. an integrated Legion debugger, and to port applica- tion toolkits such as Netsolve (http://www.cs.utk.edu/ Frederick Knabeis a senior research scientist in the netsolve). These efforts are guided by our close col- Department of Computer Science at the University of laborations with an expanding set of applications Virginia. His research interests include wide-area com- groups, such as the Harvard Medical School and the puting, computer security, and software risks. He climate modeling groups mentioned earlier. Finally, received a PhD in computer science from Carnegie we are actively engaged in commercializing the Legion Mellon University. platform for use in Internet and enterprise settings. For more information, visit the Legion site (http:// Marty Humphreyis a research assistant professor of legion.virginia.edu). v computer science at the University of Virginia. His research interests include real-time operating systems, real-time scheduling, distributed computing, and metacomputing. He received a PhD in computer sci- Acknowledgments ence from the University of Massachusetts and is a We thank Charles Guttmann of the Department of member of the IEEE. Radiology, Harvard Medical School, for the MRI example, and Greg Follen of NASA’s Lewis Research Center for briefing us on Boeing and Pratt & Whitney. Contact the authors at {grimshaw, ferrari, knabe, We also thank Sarah Wells for her assistance. humphrey}@virginia.edu. May 1999 37

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.