Storage Needs in Future Supercomputer Environments

Notes for the presentation by:
Sam Coleman
Lawrence Livermore National Laboratory

July 25, 1991
at the NASA Goddard "Mass Storage Workshop"

Introduction

The Lawrence Livermore National Laboratory (LLNL) is a Department of Energy contractor, managed by the University of California since 1952. Major projects at the Laboratory include the Strategic Defense Initiative, nuclear weapon design, magnetic and laser fusion, laser isotope separation and weather modeling. The Laboratory employs about 8,000 people. There are two major computer centers: the Livermore Computer Center and the National Energy Research Supercomputer Center.

As we increase the computing capacity of LLNL systems and develop new applications, the need for archival capacity will increase. Rather than quantify that increase, I will discuss the hardware and software architectures that we will need to support advanced applications.

Storage Architectures

The architecture of traditional supercomputer centers, like those at Livermore, includes host machines and storage systems linked by a network. Storage nodes consist of storage devices connected to computers that manage those devices. These computers, usually large Amdahl or IBM mainframes, are expensive because they include many I/O channels for high aggregate performance. However, these channels and the devices currently attached to them are individually slow; storage systems based on this architecture will become bottlenecks on HIPPI and other high-performance networks. Computers with the I/O-channel performance to match these networks will be even more expensive than the current machines.

The need for higher-performance storage systems is being driven by the remarkable advances in processor and memory technology available on relatively inexpensive workstations; the same technology is making high-performance networks possible. These advances will encourage scientific-visualization projects and other applications capable of generating and absorbing quantities of data that can only be imagined today.

To provide cost-effective, high-performance storage, we need an architecture like that shown in Figure 1. In this example, striped storage devices, connected to a HIPPI network through device controllers, transmit large blocks of data at high speed. Storage system clients send requests over a lower-performance network, like an Ethernet, to a workstation-class machine controlling the storage system. This machine directs the device controllers, also over a lower-performance path, to send data to or from the HIPPI network. Control messages could also be directed over the HIPPI network, but these small messages would decrease the efficiency of moving large data blocks; since control messages are small, sending them over a slower network will not degrade the overall performance of the system when large data blocks are accessed (this architecture will not be efficient for applications, like NFS, that transmit small data blocks).

Figure 1. A High-Performance Storage Architecture (workstation clients, device controller, striped devices, workstation-class control machine, lower-performance control network)
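As a rough illustration of the control and data separation just described, the sketch below models how the workstation-class control machine might handle a client request: the request and the directive to the device controller travel over the slow control network, while the bulk data moves directly between the striped devices and the HIPPI network. This is a minimal sketch under assumed message formats; the structure names, field names, and the send_control and locate_extent routines are hypothetical, not part of any LLNL or HIPPI specification.

    /* Hypothetical sketch of the Figure 1 control path: the storage-control
     * workstation receives a client request over the Ethernet and directs a
     * device controller to move the data itself over the HIPPI network.
     * All names and message formats here are illustrative only.            */

    #include <stddef.h>
    #include <stdint.h>

    struct client_request {        /* arrives over the lower-performance network */
        uint32_t bitfile_id;       /* which archived file the client wants       */
        uint64_t offset, length;   /* byte range to transfer                     */
        uint32_t hippi_dest;       /* client's HIPPI address for the data        */
        int      write;            /* 0 = read from archive, 1 = write to it     */
    };

    struct controller_directive {  /* sent to the device controller, also over   */
        uint32_t stripe_group;     /* a lower-performance control path           */
        uint64_t media_offset, length;
        uint32_t hippi_dest;
        int      write;
    };

    /* Stubs: send a small control message, and map a bitfile byte range
     * onto a stripe group and media offset.  Definitions are omitted.   */
    extern int send_control(int ctrl_sock, const void *msg, size_t len);
    extern int locate_extent(uint32_t bitfile_id, uint64_t offset,
                             uint32_t *stripe_group, uint64_t *media_offset);

    /* The workstation-class machine never touches the data; it only turns
     * a client request into a directive on the control path.             */
    int handle_request(int ctrl_sock, const struct client_request *req)
    {
        struct controller_directive dir;

        if (locate_extent(req->bitfile_id, req->offset,
                          &dir.stripe_group, &dir.media_offset) != 0)
            return -1;

        dir.length     = req->length;
        dir.hippi_dest = req->hippi_dest;  /* data moves device-to-client directly */
        dir.write      = req->write;

        return send_control(ctrl_sock, &dir, sizeof dir);
    }

Because the directive carries the client's HIPPI address, the large data blocks flow directly between the striped devices and the HIPPI network; only the small request and directive messages cross the slower control network.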
To make the architecture in Figure 1 efficient, we will need the following components:

•  Programmable device controllers embedding relatively high-level data-transfer protocols;

•  High-performance, possibly striped, archival storage devices to match the performance of the HIPPI network. These devices should be faster than the D1 and D2 magnetic tapes being developed today;

•  High-capacity media, with at least the capacity of the largest D2 tape cartridges;

•  Robotics to mount volumes quickly;

•  Devices and systems that are more reliable than the 1-in-10^12 error rates quoted today; and

•  Devices that are less expensive than the current high-performance devices.

In short, we need reliable, automated archival devices with the capacity of Creo optical tapes (one terabyte per reel), the performance of Maximum Strategy disks (tens of megabytes per second), and the cost of 8mm tape cartridge systems (less than $100,000).

As a step toward the Figure 1 architecture, we are investigating the architecture shown in Figure 2; we will connect existing storage devices to our Network Systems Corp. HYPERchannel, controlled by a workstation-based UniTree system. Even though the hardware connections are available today, the necessary software is not. In particular, there is no high-level file-transport software in the NSC DX HYPERchannel adapter. As an interim solution, we will put IEEE movers on our host machines, allowing direct file transport to and from the storage devices over the HYPERchannel. The UniTree workstation will provide service to client workstations and other network machines. This is acceptable, in the near term, because most of the archival load comes from the larger host machines. This architecture will replace the Amdahl mainframes that we use to control the current archive.

Figure 2. An Interim Storage Architecture at LLNL (Crays and Cray disks, Ethernets, HYPERchannel, movers, UniTree DX workstation, STK 3480 tapes, Amdahl 6380 disks; control and data paths)

Software Needs

To implement high-performance storage architectures, we need file-transport software that supports the network-attached devices in Figure 1. Whether or not the TCP/IP and OSI protocols can transmit data at high speeds is subject to debate; if not, we will have to develop new protocols.
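The sketch below suggests the general shape of such file-transport software: a mover-style loop that copies large blocks between two channels, one attached to a storage device and one to a network. It is a sketch under assumed interfaces only; the channel type, chan_read, chan_write, and the block size are hypothetical and are not the IEEE mover protocol, the UniTree interface, or any vendor product.

    /* A hypothetical mover-style copy loop: transfer a file between two
     * "channels" (for example, a storage device and a network connection)
     * in large blocks.  The channel type and operations are illustrative
     * stubs whose definitions are omitted.                                */

    #include <stddef.h>
    #include <sys/types.h>

    #define BLOCK_SIZE (4 * 1024 * 1024)  /* large blocks amortize per-request cost */

    struct channel;                       /* device, host memory, or network endpoint */
    extern ssize_t chan_read (struct channel *c, void *buf, size_t len);
    extern ssize_t chan_write(struct channel *c, const void *buf, size_t len);

    /* Copy up to 'length' bytes from src to dst; returns bytes moved, or -1
     * on error.  A production mover would add third-party transfers,
     * checkpointing, and out-of-band control messages.                     */
    long long move_data(struct channel *src, struct channel *dst, long long length)
    {
        static char buf[BLOCK_SIZE];
        long long moved = 0;

        while (moved < length) {
            size_t want = (length - moved < BLOCK_SIZE)
                          ? (size_t)(length - moved) : (size_t)BLOCK_SIZE;
            ssize_t got = chan_read(src, buf, want);
            if (got <= 0)
                return got < 0 ? -1 : moved;       /* error, or early end of data */

            for (ssize_t done = 0; done < got; ) { /* network writes may be partial */
                ssize_t put = chan_write(dst, buf + done, (size_t)(got - done));
                if (put <= 0)
                    return -1;
                done += put;
            }
            moved += got;
        }
        return moved;
    }

The design point worth noting is the block size: when each request carries substantial fixed overhead, the sustained rate is set by how much data moves per request, which is why the Figure 1 architecture reserves the fast network for large blocks and keeps small messages on the control path.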
From the human client's point of view, we need software systems that provide transparent access to storage. Several transparencies are described in the IEEE Mass Storage System Reference Model document [1]:

Access
    Clients do not know if objects or services are local or remote.

Concurrency
    Clients are not aware that other clients are using services concurrently.

Data representation
    Clients are not aware that different data representations are used in different parts of the system.

Execution
    Programs can execute in any location without being changed.

Fault
    Clients are not aware that certain faults have occurred.

Identity
    Services do not make use of the identity of their clients.

Location
    Clients do not know where objects or services are located.

Migration
    Clients are not aware that services have moved.

Naming
    Objects have globally unique names which are independent of resource and accessor location.

Performance
    Clients see the same performance regardless of the location of objects and services (this is not always achievable unless the user is willing to slow down local performance).

Replication
    Clients do not know if objects or services are replicated, and services do not know if clients are replicated.

Semantic
    The behavior of operations is independent of the location of operands and the type of failures that occur.

Syntactic
    Clients use the same operations and parameters to access local and remote objects and services.

The IEEE Reference Model

One way to achieve transparency is to develop distributed storage systems that span client environments. In homogeneous environments, like clusters of Digital Equipment Corp. machines, transparency can be achieved using proprietary software. In more heterogeneous supercomputer centers, standard software, running on a variety of machines, is needed. The IEEE Storage System Standards Working Group is developing standards (project 1244) on which transparent software can be built. These standards will be based on the reference model shown in Figure 3. The modules in the model are:

Application
    Normal client application codes.

Bitfile Client
    This module represents the library routines or the system calls that interface the application to the Bitfile Server, the Name Server, and the Mover.

Bitfile Server
    The Bitfile Server manages abstract objects called bitfiles that represent uninterpreted strings of bits.

Storage Server
    The module that manages the actual storage of bitfiles, allocating media extents, scheduling drives, requesting volume mounts, and initiating data transfers.

Physical Volume Repository
    The PVR manages physical volumes (removable disks, magnetic tapes, etc.) and mounts them on drives, robotically or manually, upon request.

Mover
    The Mover transmits data between two channels. The channels can be connected to storage devices, host memories, or networks.

Site Manager
    This module provides the administration interface to all of the other modules of the model.

Figure 3. The IEEE Mass Storage System Reference Model (Applications, Bitfile Client, Bitfile Server, Storage Server, Movers, Name Server, Physical Volume Repository; control and data paths; physical volumes)

The key ideas that will allow standards based on the reference model to support transparency are (a sketch illustrating them follows this list):

•  The modularity of the Bitfile Client, Bitfile Server, Storage Server, and Physical Volume Repository allows support for different devices and client semantics with a minimum of device- or environment-specific software.

•  The Name Server isolates the mapping of human-oriented names to machine-oriented bitfile identifiers, allowing the other modules in the model to support a variety of different naming environments.

•  The Mover separates the data path from the control path, allowing the controller-to-network path shown in Figure 1.
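To make the modularity and the control/data separation concrete, here is a minimal sketch of how the layers might present themselves to a client, assuming C-style interfaces of my own invention; the type names and signatures are hypothetical and are not the interfaces being defined by the P1244 working group.

    /* Illustrative C-style interfaces for the reference model layers.  The
     * types and signatures are hypothetical, not the IEEE P1244 standard. */

    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t bitfile_id;     /* machine-oriented bitfile identifier */

    /* Name Server: maps human-oriented names to bitfile identifiers, so no
     * other module needs to understand a site's naming conventions.        */
    extern int ns_lookup(const char *human_name, bitfile_id *id);

    /* Bitfile Server: manages bitfiles as uninterpreted strings of bits.   */
    extern int bfs_read (bitfile_id id, uint64_t offset, uint64_t length, int mover_session);
    extern int bfs_write(bitfile_id id, uint64_t offset, uint64_t length, int mover_session);

    /* Storage Server: allocates media extents, schedules drives, and asks
     * the Physical Volume Repository for mounts.                           */
    struct extent { uint32_t volume; uint64_t start, length; };
    extern int ss_allocate(uint64_t length, struct extent *out);

    /* Physical Volume Repository: mounts removable volumes on drives,
     * robotically or manually, on request.                                 */
    extern int pvr_mount(uint32_t volume, uint32_t *drive);

    /* Mover: moves data between two channels (device, memory, or network),
     * keeping the bulk data path separate from the control calls above.    */
    extern int mover_open(const char *src_channel, const char *dst_channel);

    /* Example client-side flow: resolve a name, open a mover session to the
     * client's own channel, and ask the Bitfile Server for the whole file.  */
    int fetch(const char *name, const char *my_data_channel)
    {
        bitfile_id id;
        if (ns_lookup(name, &id) != 0)
            return -1;

        int session = mover_open("storage", my_data_channel);
        if (session < 0)
            return -1;

        /* UINT64_MAX as length is this sketch's "to end of bitfile" convention. */
        return bfs_read(id, 0, UINT64_MAX, session);
    }

Because the client sees only names and bitfile identifiers, a site could swap Storage Server and Physical Volume Repository implementations for new devices and robots without changing the Bitfile Server interface or the applications above it.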
I would like to encourage people attending the Goddard conference to support the IEEE standards effort by participating in the Storage System Standards Working Group. For more information, contact me at:

    Sam Coleman
    Lawrence Livermore National Laboratory
    Mail Stop L-60
    P. O. Box 808
    Livermore, CA 94550
    (415) 422-4323
    [email protected]

Until standard software systems are available, there are steps that the storage industry can take toward more transparent products. The Sun Microsystems Network File System and the CMU Andrew File System provide a degree of transparency. Work on these systems to improve their security and performance, and to provide links to hierarchical, archival systems, will improve their transparency.

I would suggest that software vendors strive to provide operating-system access to archival storage systems, possibly through mechanisms like the AT&T File System Switch.

To learn more about all of the storage issues that I have mentioned, I would encourage you to attend the 11th IEEE Mass Storage Symposium in Monterey, California, October 7-10, 1991. For details, contact:

    Bernie O'Lear
    National Center for Atmospheric Research
    P. O. Box 3000
    Boulder, Colorado 80307

Reference

1. Coleman, S. and Miller, S., editors, A Reference Model for Mass Storage Systems, IEEE Technical Committee on Mass Storage Systems and Technology, May 1990.