ebook img

Arrows for Secure Information Flow - the Department of Computer PDF

38 Pages·2009·0.27 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Arrows for Secure Information Flow - the Department of Computer

Arrows for Secure Information Flow Peng Li Steve Zdancewic University of Pennsylvania, Department of Computer and Information Science, 3330 Walnut Street, Philadelphia, Pennsylvania, 19104-6389, USA Abstract This paper presents an embedded security sublanguage for enforcing information- flow policies in the standard Haskell programming language. The sublanguage pro- vides useful information-flow control mechanisms including dynamic security lat- tices, run-time code privileges and declassification all without modifying the base language. This design avoids the redundant work of producing new languages, low- ers the threshold for adopting security-typed languages, and also provides great flexibility and modularity for using security-policy frameworks. The embedded security sublanguage is designed using a standard combinator in- terface called arrows. Computations constructed in thesublanguage have static and explicit control-flow components, making it possible to implement information-flow control using static-analysis techniques at run time, while providing strong secu- rity guarantees. This paper presents a formal proof that our embeddedsublanguage provides noninterference, a concrete Haskell implementation and an example appli- 1 cation demonstrating the proposed techniques. Key words: Information flow, security, Haskell, arrows, type systems, combinators 1 Introduction Language-based information-flow security (Sabelfeld and Myers, 2003) has a long, rich history with many (mostly theoretical) results. This prior work has focused mainly on the problems of static program analysis for a wide variety of computation models and policy features. Often these analyses are presented as type systems whose soundness is justified by some form of noninterference 1 This paper is an expanded version of an earlier paper that appeared in IEEE CSFW (Li and Zdancewic, 2006). Email addresses: [email protected] (Peng Li), [email protected] (Steve Zdancewic). Preprint submitted to Elsevier 18 December 2009 result. The approach is compelling because programming-language techniques can be used to specify and enforce security policies that cannot be achieved by conventional mechanisms such as access control and encryption. Two full- scale language implementations have been developed: Jif (Myers, 1999; Myers et al., 2001) is a variant of Java, and Flow Caml (Simonet, 2003; Pottier and Simonet, 2002) is an extension of Caml. However, despite this rather large (and growing!) body of work on language- based information-flow security, there has been relatively little adoption of the proposed techniques—two success stories are the “taint-checking mode” available in the Perl language and the use of information-flow analysis to separate lower integrity components from higher integrity components built in the SparkAda (Chapman and Hilton, 2004) language. One important reason why these domain-specific security-typed languages have not been widely applied is that, because the information-flow policies are intended toapply inanend-to-endfashion, thewhole system hasto bewritten in the new language. However, it is expensive to build large software systems in a new language. Doing so can be justified only if the benefit of using the new language outweighs the cost of migrating to the new language—including the costs of retraining programmers and the time and expense necessary to port existing libraries and other code bases. Moreover, in practice, it may well be the case that only a small part of the system (maybe only a few variables in a large program) has information-flow security requirements. Although the system may be large and complex, the secret information flow in the system may not be completely unmanageable. In such cases, it is probably moreconvenient to use the programming language that best fits the primary functionality of the system rather than its security requirements, and manage security issues by traditional means such as code auditing and careful software engineering practices. Such practical solutions are often a compromise, because there is no provable security indeed. In re- ality, there is a language adoption threshold based on the ratio of security requirements to functionality requirements, and this threshold is very high. 1.1 Background on security-typed languages In security-typed languages like Jif (Myers, 1999; Myers et al., 2001) and Flow Caml (Simonet, 2003; Pottier and Simonet, 2002), variables can have security annotations in their type declarations. Each variable has a security level, represented syntactically by a label in the program. The security levels are usually partially ordered and form a lattice. For example, the simplest security lattice {L,H},L ⊑ H defines two labels L and H and specifies an or- 2 dering between the two security levels represented by these labels. At compile time, security-typed programs are checked to guarantee that there is no infor- mation flow from higher security levels to lower security levels; violations of this information-flow policy result in type errors. Security-typedlanguagesoftenfavorstatictypechecking,becauseinformation- flow analysis requires the view of the entire control-flow graph in order to examine implicit information flows caused by conditional branches. For ex- ample, the C program “if (h==0) l=0; else l=1;” has an implicit flow from h to l and, although there is no explicit assignment “l=(h==0);”, these code fragments are effectively equivalent. In a dynamically-typed language, such implicit flows are difficult to capture. For example, the taint-checking mode of Perl does not capture the implicit flow in the above code: even if h is tainted and there is information flow from h to l, l is still not tainted. This is fine because the taint-checking mode is used only to provide a modest level of integrity guarantees. However, for confidentiality purposes, implicit flows are often unacceptable, especially when the program is not trusted. To make security-typed languages practical, a feature called declassification is necessary: sometimes, we do need information flow from higher levels to lower levels, but only in permitted ways. For example, secret information can be sent to public places after it is encrypted. One popular solution is to make declassification an explicit (and unsafe) type cast; it is thus the programmers’ responsibility to use declassification safely. When the code is not trusted, declassification can be dangerous. The decentralized label model (Myers and Liskov, 2000) solves this problem by assigning code privileges to program modules, so each module can only declassify information of certain security levels that are determined by the code privilege. 1.2 Embedded security-typed sublanguages Thispaper presents adifferent approachtoenforcinginformation-flowsecurity policies. Rather than producing a new language from scratch, we show how to encode traditional information-flow type systems using general features of an existing, modern programming language. In particular, we show how the abstract data type and the type class features found in Haskell (Peyton Jones et al., 2002) can be used to build a module that effectively provides a security-typed sublanguage embedded in Haskell itself. This sublanguage can interoperate smoothly with existing Haskell code while still providing strong information-flow security guarantees. Importantly, we do not need to modify the design or implementation of Haskell itself—we use features in its standard (but advanced) type system. 3 Our approach reduces the adoption threshold for systems implemented in Haskell: such systems can be made more secure without completely rewriting them in a new language. The implementation can be a fine-grained mixture of normal code and security-hardened code (variables, data and computations over secure data). The programmer needs to protect only sensitive data and computationusing asoftwarelibrary,whichenforcestheinformation-flowpoli- cies throughout the entire system and provides end-to-end security goals like noninterference. Another benefit of our approach is flexibility. A specialized language like Jif must pickafixedpolicyframeworkinwhichthesecuritypoliciesareexpressed. Considering the plethora of features present in the literature expressing the label lattice, declassification options (Sabelfeld and Sands, 2005), dynamic policy information (Tse and Zdancewic, 2004a; Zheng and Myers, 2004), etc., it is unlikely that any particular choice of policy language will be suitable for all programs with security concerns. By contrast, since it is much easier to build a library module than to build a new language, it is conceivable that different programs would choose to implement entirely different policy frameworks. Our embedded sublanguage approach is modular in the sense that it provides an interface through which the programmer can choose which policy framework and type system to use for specific security goals. In this paper we sketch one possible policy framework that illustrates one particular choice of label lattice, declassification mechanism, and support for dynamic policies, but others could readily be implemented instead. Although we use Haskell’s advanced type system and helpful features like the ability to overload syntax, studying how to encode information-flow policies in the context of Haskell can point to how similar efforts might be undertaken in more mainstream languages like Java. Also, since the features we use are intended to be “general purpose,” they are more likely to find a home in a mainstream language than the less widely applicable security types. Evidence of this can be found, for example, in Sun’s recent addition of parametric polymorphism, a key component of our approach, to Java. 1.3 Overview of technical development There are two key technical challenges in embedding a useful security-typed sublanguage in Haskell. The first problem is that enforcing information-flow policies requires static analysis of the control flow graph of the embedded programs—purely dynamic enforcement mechanisms are generally too conser- vative in practice. The second problem is representing the policy information itself—depending on the desired model, the policy information might be quite complex, perhaps depending on information available only at run time. 4 Our solution to the first problem is to use arrows (Hughes, 2000). Intuitively, arrows provide an abstract interface for defining embedded sublanguages that support standard programming constructs familiar to programmers: sequen- tial composition, conditional branches, and loops. Haskell provides convenient syntactic sugar for writing programs whose semantics are given by an arrow implementation. To address the second problem, we use Haskell’s type class mechanism to give an interface for security lattices. Programs written in the embedded language can be parameterized with respect to this interface. Moreover, the embedded languagecaneasily begiven security-specific featuressuch asa declassification operation or run-time representation of privileges for access-control checks. In both cases, we make use of Haskell’s strong type system to guarantee that the abstractions enforcing the security policies are not violated. This encap- sulation means that it is not possible to use the full power of the Haskell language to circumvent the information-flow checks performed by the embed- ded language, for example. The rest of the paper is organized as follows. Section 2 presents a brief tu- torial of the arrow interface. Section 3 presents the detailed implementation of an arrow-based sublanguage for information-flow control. Section 4 gives some example programs that illustrate how the secure embedded language and the Haskell program can be smoothly integrated and considers the issues with enforcing the desired security properties. Section 5 formalizes and proves the security guarantee of our security sublanguage. Section 6 discusses some limitations and caveats with this approach and describes some future work. Section 7 concludes. 2 The arrows interface The concept of arrows was proposed by Hughes (2000) as a generalization of monads (Wadler, 1992). Both monads and arrows are generic interfaces for constructing programs using combinators. This section presents an informal 2 and brief tutorial of arrows and shows how arrows can be used to construct computations with explicit control flow structures. 2 The related references (Hughes, 2000; Paterson, 2003, 2001) in the bibliography can be used for more detailed studies of arrows. 5 2.1 Definition of arrows ThefollowingcodespecifiesthesimplestArrowtypeclass.Here,aisanabstract type of arrow with input type b and output type c. This arrow type class supports only three operations: pure, (>>>), and first. (cid:7) (cid:4) class Arrow a where pure :: (b -> c) -> a b c (>>>) :: a b c -> a c d -> a b d first :: a b c -> a (b, d) (c, d) (cid:6) (cid:5) Basic blocks and compositions The pure operation lifts a Haskell function of type b->c into the arrow; such lifted functions serve as the “basic blocks” of the control-flow graphs con- structed via arrow combinators. The infix operation (>>>) provides sequential (horizontal) composition of computations, and first provides parallel (verti- cal) composition of computations. An instance of the Arrow type class is required to satisfy a set of axioms that specify coherence properties between the operations. For example, (>>>) is required to be associative. We omit the complete description of the arrow axioms here; for our purposes, there is only one interesting case to consider, and it is discussed in Section 6. The simplest instance of Arrow is Haskell’s function arrow constructor (->) itself: every function of type b -> c is also an arrow computation (->) b c. Using the concept of monad transformers (Liang et al., 1995), one can also build arrow transformers, which correspond closely to the categorical notion of functors. Monad and arrow transformers allow program functionality be composed in a modular, layered fashion. Representing conditionals and loops The basic Arrow interface does not provide the ability to construct condi- tional computations—it can construct only control-flow graphs that represent straight-line code with no branches. Two other type classes refine arrows by permitting conditional branches and loops. The ArrowChoice type class provides an operation called left that extends the Arrow interface with the ability to perform a one-sided branch computa- tion depending on the arrow’s input value. In Haskell, the type Either b d describes a value that is a tagged union (or option) type that carries either a value of type b or one of type d. 6 (cid:7) (cid:4) class Arrow a => ArrowChoice a where left :: a b c -> a (Either b d) (Either c d) (cid:6) (cid:5) Using left and the other arrow primitives, the following operations can be implemented to construct different kinds of conditional computations: (cid:7) (cid:4) right :: a b c -> a (Either d b) (Either d c) (+++) :: a b c -> a b’ c’->a (Either b b’) (Either c c’) (|||) :: a b d -> a c d -> a (Either b c) d (cid:6) (cid:5) ArrowLoop provides the embedded language with a loop construct sufficient for encoding while and for loops. Intuitively, the loop operator feeds the d output of the arrow back into the d input of the arrow, introducing a cycle in the control-flow graph: (cid:7) (cid:4) class Arrow a => ArrowLoop a where loop :: a (b, d) (c, d) -> a b c (cid:6) (cid:5) The benefit of having this operation is that recursive computations can be constructed as a finite combination of arrow components. We will use this property in Section 3.6. Translating the do-syntax Programming directly with the arrow operations is sometimes cumbersome, because arrows require a point-free programming style. The do-syntax for ar- rows (Paterson, 2001) provides syntactic sugar for arrow programming, such as arrow abstraction, arrow application, sequential composition, conditional 3 branching and recursion. Internally, the Haskell compiler translates the do- syntax used in the embedded sublanguage into the basic arrow operations. For example, conditional statements are translated to pure, >>> and ||| opera- tors, using the following rule: [[ proc p -> if e then c1 else c2 ]] = pure (\p -> if e then Left p else Right p) >>> [[ proc p -> c1 ]] ||| [[ proc p -> c2 ]] This rule translates the if command in the sublanguage. The if construct in the translated code is the conditional expression in the base Haskell language. The sublanguage syntax “proc p->” provides an arrow abstraction that binds the arrow input to the variable p. The compiler is able to resolve this syntax overloading by using type infor- mation. For example, the type Protected mentioned in our implementation described below informs the compiler to use our definition for |||, which is 3 We use the Glasgow Haskell Compiler (http://www.haskell.org/GHC). 7 given by our FlowArrow instance of the ArrowChoice type class. This feature allows programmer-defined embedded sublanguages to have convenient syn- tax. 2.2 Control flow in arrow sublanguages The Haskell programming language itself provides branching, looping, and other control-flow constructs, so one might wonder why it is necessary to re- implement all of these features in the embedded sublanguage. Compared to Haskell’s full control-flow mechanisms (which also include function calls and exceptions, for example), the arrow type classes are actually quite impover- ished. The arrow interfaces isolate the base language (Haskell) and the sub- language (arrows): by design, the control flow constructs in the base language cannot be directly used to represent the control flow of the sublanguage. This separation property is crucial for the security analysis of arrow-based sublanguages. If an arrow implements only the operations in the ArrowChoice type class, conditional branches on arrow computations can only be imple- mented using the given arrow operations left, right, (+++), and (|||). By keeping the arrow implementation abstract, the programmer is forced to use these arrow operations for writing conditional branches, because there is no other way to manipulate the interface. Therefore, by designing the arrow interface with limited control-flow primi- tives, the control-flow graph of an arrow computation is determined by the composition of primitive arrow operations. In other words, arrows can force computations to be constructed with static and explicit control-flow struc- tures. This makes it possible to completely analyze the information flow in an arrow-based sublanguage before running the computation. A more permissive interface to the sublanguage (such as provided by monads for example) would allow base language branches to leak information about supposedly protected data. 3 An embedded security-typed language This section presents thedesign ofour secure embedded sublanguageusing the arrows interface. Our design uses the structure of arrow transformers, which allows arrow sublanguages to be composed in a modular, layered fashion. 8 3.1 Encoding the security lattice In our embedded language, the security labels are encoded using term-level values. We start by defining a generic Haskell interface for security labels and lattices: the type class Lattice provides a set of operations common for all security lattices. (cid:7) (cid:4) class (Eq a) => Lattice a where label_top :: a label_bottom :: a label_join :: a -> a -> a label_meet :: a -> a -> a label_leq :: a -> a -> Bool (cid:6) (cid:5) The programmer has the freedom to choose the implementation of the actual security lattice. Because we are encoding labels using terms, there is no lim- itation on the expressiveness of security policies: any security lattice can be encoded as long as its labels and the operations on them can be represented using Haskell. For simplicity and ease of presentation, we use the following three-point lattice throughout the rest of the paper. (cid:7) (cid:4) data TriLabel = LOW | MEDIUM | HIGH deriving (Eq, Show) instance Lattice TriLabel where label_top = HIGH label_bottom = LOW label_join x y=if x ‘label_leq‘ y then y else x label_meet x y=if x ‘label_leq‘ y then x else y label_leq LOW _ = True label_leq MEDIUM LOW = False label_leq MEDIUM _ = True label_leq HIGH HIGH = True label_leq HIGH _ = False (cid:6) (cid:5) 3.2 Encoding flow types and constraints Thispaperemploysasimpleinformation-flowtypesystemforpurely-functional arrow computations. An arrow computation has an input security label l1 and an output security label l2. Each typing judgment has the form Φ ⊢ c : l1 → l2 where c is a purely functional computation, l1 → l2 is the flow type assigned to c, and Φ is a list of label constraints. The type system is presented in Figure 1. There is also a certification judgement in the type system. We defer its discussion until Section 3.5. 9 The sublanguage types appearing in the typing judgments are encoded using the Haskell data type: (cid:7) (cid:4) data Flow l = Trans l l | Flat (cid:6) (cid:5) (1) Trans l1 l2 specifies a security type l1 → l2. (2) Flat means the input and output can be given the same arbitrary label. It specifies a security type l → l, where the label l can be determined by constraints in the context. The label constraints are encoded using the Constraint data type: (cid:7) (cid:4) data Constraint l = LEQ l l | USERGEQ l (cid:6) (cid:5) (1) LEQ l1 l2 represents a direct ordering between two labels: l1 ⊑ l2. (2) USERGEQl represents the constraint l ⊑ user.It requires that the run-time code privilege, which is represented as a label, be at least l. This will be used in Section 3.5 when we implement declassification. The purpose of the constraint set Φ is to implement late binding of the secu- rity lattice. The type system collects the label constraints when secure com- putations are constructed from individual components. Such constraints are checked when the secure computation is accessed through the policy enforce- ment mechanism, namely, the cert operation. This design makes it possible to use dynamic security lattices and also helps when implementing declassifi- cation. 3.3 Encoding typing judgments and rules The abstract datatype FlowArrow defines our secure embedded language by implementing the arrow interfaces described above: (cid:7) (cid:4) data FlowArrow l a b c = FA { computation :: a b c , flow :: Flow l , constraints :: [Constraint l] } (cid:6) (cid:5) AvalueoftypeFlowArrow l a b cisarecordwiththreefields.Thecomputation fieldencapsulatesanarrowoftypea b cthatistheunderlyingcomputationto be protected. The flow field specifies the security levels for the input and out- putofthecomputation.Theconstraintsfieldstoresthelistofflowconstraints Φ when the arrow computation is constructed from smaller components. FlowArrowencodesaninformation-flowtypingjudgmentforapurelyfunctional arrow computation, using the encoding of flow types and constraints we just 10

Description:
Dec 18, 2009 tices, run-time code privileges and declassification all without et al., 2001) is a variant of Java, and Flow Caml (Simonet, 2003; Pottier and Perl does not capture the implicit flow in the above code: even if h is tainted.
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.