ebook img

JSForce: A Forced Execution Engine for Malicious JavaScript Detection PDF

0.65 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview JSForce: A Forced Execution Engine for Malicious JavaScript Detection

JSForce: A Forced Execution Engine for Malicious JavaScript Detection Xunchao Hu, Yao Cheng, Andrew Henderson Yue Duan, Heng Yin Department of EECS, Department of CSE, Syracuse University, USA University of California, Riverside Email: {xhu31, ycheng, anhender}@syr.edu Email: {yduan005, heng}@cs.ucr.edu Abstract—The drastic increase of JavaScript exploitation to execute within a particular environment, since they 7 attackshasledtoastronginterestindevelopingtechniquesto aim to exploit specific vulnerabilities, as opposed to be- 1 enable malicious JavaScript analysis. Existing analysis tech- nign JavaScript, which will run in a more environment- 0 niques fall into two general categories: static analysis and 2 dynamic analysis. Static analysis tends to produce inaccurate independent fashion. Fingerprinting techniques [42] are n results(bothfalsepositiveandfalsenegative)andisvulnerable widely adopted by JavaScript malware to examine the run- a to a wide series of obfuscation techniques. Thus, dynamic time environment. A dynamic analysis system may fail to J analysis is constantly gaining popularity for exposing the observe some malicious behaviors if the runtime environ- typical features of malicious JavaScript. However, existing 6 ment is not configured as expected. Such configuration is dynamicanalysistechniquespossesslimitationssuchas limited 2 quite challenging because of the numerous possible runtime code coverage and incomplete environment setup, leaving a broad attack surface for evading the detection. To overcome environment settings. Hence, existing dynamic analysis sys- ] R theselimitations,wepresentthedesignandimplementationof tems usually share the limitations of limited code coverage a novel JavaScript forced execution engine named JSForce and incomplete runtime environment setup, which leave C which drives an arbitrary JavaScript snippet to execute along attackers with a broad attack surface to evade the analysis. s. different paths without any input or environment setup. We To solve those limitations, Rozzle [26] explores multiple c evaluateJSForceusing220,587HTMLand23,509PDFreal- [ worldsamples.Experimentalresultsshowthatbyadoptingour environment related paths within a single execution. But it forcedexecutionengine,themaliciousJavaScriptdetectionrate requiresapredefinedenvironment-relatedprofileforpathex- 1 can be substantially boosted by 206.29% using same detection ploration.Constructionofacompleteprofileisachallenging v policy without any noticeable false positive increase. We also taskbecauseofthenumerousdifferentbrowsersandplugins, 0 make JSForce publicly available as an online service and 6 especially for recent fingerprinting techniques [31], [32]. will release the source code to the security community upon 8 the acceptance for publication. Thesefingerprintingtechniquesdonotrelyuponanyspecific 7 APIs,andthusRozzlecanbeevadedbecausethepredefined 0 profile cannot include those fingerprinting techniques. Also, 1. I. INTRODUCTION Rozzle may introduce runtime errors because it executes 0 MaliciousJavaScripthasbecomeanimportantattackvec- infeasible paths which may stop the analysis before the 7 torforsoftwareexploitationattacks.Attacksinbrowsers,as malicious code is executed. Revolver [25] employs a ma- 1 : wellasPDFfilescontainingmaliciousembeddedJavaScript, chine learning-based detection algorithm to identify evasive v are typical examples of how attackers launch attacks us- JavaScriptmalware.However,itisdependentuponaknown i X ing JavaScript. According to a recent report from Syman- samplesetandisunabletodetect0-dayJavaScriptmalware. r tec [8], there are millions of victims attacked by malicious Although symbolic execution of JavaScript [37] can be a JavaScript on the Internet each day. applied to explore all of the possible execution paths, the In recent years, a number of techniques [17], [29], [22], performance overhead of a symbolic string solver [41] and [21],[26],[35],[18],[25],[16]havebeenproposedtodetect the dynamic features of JavaScript make it infeasible for malicious JavaScript code. Due to the dynamic features of practical use. the JavaScript language, static analysis [20], [27], [38], [18] In this paper, we propose JSForce, a forced execution can be easily evaded using obfuscation techniques [46]. engine for JavaScript, which drives an arbitrary JavaScript Consequently, researchers rely upon dynamic analysis [17], snippet to execute along different paths without any in- [29], [22] to expose the typical features of malicious put or environment setup. While increasing code cover- JavaScript. More specifically, these approaches rely upon age, JSForce can tolerate invalid object accesses while visiting websites or opening PDF files with a full-fledged introducing no runtime errors during execution. This over- or emulated browser/PDF reader and then monitoring the comesthelimitationsofcurrentJavaScriptdynamicanalysis different features (eval strings [22], heap health [35], etc.) techniques. Note that, as an amplifier technique, JSForce for detection. does not rely on any predefined profile information or full- However, the typical JavaScript malware is designed fledgedhostingprogramslikebrowsersorPDFviewers,and it can examine partial JavaScript snippets collected during 1 if (( navigator .appName.indexOf(”Microsoft an attack. As demonstrated in Section V, JSForce can be Inte” + ”rnet Explorer”) == 1) && ( navigator .userAgent.indexOf(”Windows N” + leveraged to improve the detection rate of other dynamic ”T 5.1”) == 1) && (navigator .userAgent. analysis systems without modification of their detection indexOf(”MSI” + ”E 8.0”) == 1)) { policies. While the high-level concept of forced execution att = btt + 1; 2 has been introduced in binary code analysis (X-Force [33], 3 } if (att == 0) { iRiS [19]), we face unique challenges in realizing this 4 try { concept in JavaScript analysis, given that JavaScript and 5 new ActiveXObject(”UM0QS4dD”); 6 native code are very different languages by nature. } catch (e) { 7 We implement JSForce on top of the V8 JavaScript var tlMoOul8 = ’\x25’ + ’u9’ + ’\x30’ 8 engine [11] and evaluate the correctness, effectiveness, and + ’\x39’ + YYGRl6; runtime performance of JSForce with 220,587 HTML 9 tlMoOul8 += tlMoOul8; var CBmH8 = ”%u”; files and 23,509 PDF samples. Our experimental results 10 var vBYG0 = unescape; 11 demonstrate that adopting JSForce can greatly improve var EuhV2 = ”BODY”; 12 the JavaScript analysis results by 206.29% without any ... 13 noticeable increase in false positives and with reasonable 14 } } performance overhead. 15 setTimeout(”redir ()”, 3000); Our main contributions are summarized as follows: 16 1) WeproposeforcedexecutionofJavaScript,atechnique that forces a JavaScript snippet to execute along differ- ent paths while requiring no inputs or any environment Figure 1: The Malicious JavaScript Sample setup, to overcome the current limitations of existing JavaScript dynamic analysis techniques: limited code coverage and incomplete runtime environment setup. Figure1showsalistingofJavaScriptcodeusedforadrive- 2) To enable forced execution of JavaScript, we develop by-download attack against the Internet Explorer browser. a type inference model to detect and properly re- Line 1 employs precise fingerprinting to deliver only se- cover from exceptions. We have also developed path lectedexploitsthataremostlikelytosuccessfullyattackthe exploration algorithms for malicious JavaScript code browser.Lines5-7containevasivecodetobypassemulation- analysis. based detection systems. More precisely, the code attempts 3) We implement the technique with a prototype sys- to load a non-existant ActiveX control, named UM0QS4dD tem, named JSForce, and evaluate its correctness, (line 6). When executed within a regular browser, this effectiveness, and runtime performance using 220,587 operationfails,triggeringtheexecutionofthecatchblock HTML and 23,509 PDF real-world samples. Experi- that contains the exploitation code (lines 7-14). mental results show that by adopting JSForce, the However,anemulation-baseddetectionsystemmustemu- malicious JavaScript detection rate is substantially in- latetheActiveXAPIbysimulatingtheloadingandpresence creasedby206.29%whilestillusingthesamedetection of any ActiveX control. In these systems, the loading of the policy. This increase comes without any noticeable in- ActiveX control will not raise this exception. As a result, crease in false positives and with runtime performance the execution of the exploit never occurs and no malicious that is very suitable for large-scale analysis. activity is observed. Instead, the victim is redirected to a 4) We create an online service and make JSForce pub- benign page (line 16) if the fingerprinting or evasion stage licly available at [9]. Upon the acceptance for publi- fails. Attackers can also abuse the function setTimeout cation,wewillreleasethesourcecodeofJSForceto to create a time bomb [15] to evade detection. Detection the security community. systems can not afford to wait for long periods of time II. BACKGROUNDANDOVERVIEW during the analysis of each sample in an attempt to capture To provide the reader with a better understanding of the randomly triggered exploits. motivationforoursystemandtheproblemsthatitaddresses, Challenges and Existing Techniques: Static analysis webeginwithadiscussionofthemaliciousJavaScriptcode is a powerful technique that explores all paths of execu- used in drive-by-download attacks. tion. But, one particular issue that plagues static analysis Malicious JavaScript code: Malicious JavaScript code of malicious JavaScript is that not all of the code can is typically obfuscated and will attempt to fingerprint the be statically observed. For example, static analysis cannot version of the victim’s software (browser, PDF reader, etc.), observe malicious code hidden within eval strings, which identify vulnerabilities within that software or the plugins arefrequentlyexploitedbyattackerstoobfuscatetheircode. thatthatsoftwareuses,andthenlaunchoneormoreexploits. Therefore, current detection approaches [17], [29], [22] 2 rely upon dynamic analysis to expose features typically stop the analysis before any malicious code is executed. In seen within malicious JavaScript. More specifically, these contrast,JSForcedoesnotrelyuponpredefinedprofilefor approaches rely upon visiting websites or opening PDF pathexplorationandhandlesruntimeerrorsusingtheforced files with an instrumented browser or PDF reader, and execution model presented in Section III-A. By overcoming then monitoring different features (eval strings [22], heap thoselimitationsofRozzle,JSForceachievesgreatercode health [35], etc.) for detection. coverage. However, dynamic analysis techniques suffer from two Revolver [25] employs a machine learning-based de- fundamental limitations. The first limitation is limited code tection algorithm to identify evasive JavaScript malware. coverage. This becomes a much more severe limitation However, it requires that the malicious sample is present within the context of analyzing malicious JavaScript. At- within a known sample set so that its evasive version can tackersfrequentlyemployatechniquecalledcloaking [43], be determined based upon the classification difference. By which works by fingerprinting the victim’s web browser design, it can not be used for 0-day malware detection. and only revealing the malicious content when the victim Symbolic execution has also been applied to the task of is using a specific version of the browser with a vulnerable exposing malware [15]. This technique, while improving plugin. Cloaking makes dynamic analysis much harder be- code coverage over dynamic analysis, suffers from scala- cause the sample must be run within every combination of bility challenges and is, in many ways, unnecessarily pre- web browser and plugin to ensure complete code coverage. cise[26].WithinthecontextofJavaScriptanalysis,symbolic The widely-used event callback feature of JavaScript also execution becomes more challenging [37]. JavaScript appli- makes it challenging for dynamic analysis to automatically cationsacceptmanydifferentkindsofinput,andthoseinputs trigger code. For example, attackers can load the attack are structured as strings. For example, a typical application code only when a specific mouse click event is captured, might take user input from form fields, messages from a andautomaticallydeterminingandgeneratingsuchatrigger serverviaXMLHttpRequest,anddatafromcoderunning event is difficult. concurrently within other browser windows. It is extremely The second limitation is the complexity of the JavaScript difficult for a symbolic string solver [41] to effectively runtime environment. JavaScript is used within many ap- supply values for all of these different kinds of inputs and plications, and it can call the functionality of any plugin reasonabouthowthoseinputsareparsedandvalidated.The extensions supported by these applications. For dynamic rapidly evolving JavaScript language and its host programs analysis, any pre-defined browser setup handles a known (browsers, PDF readers, etc.) make the modeling of the set of browsers and plugins. Thus, there is no guarantee JavaScript API tedious work. Furthermore, the dynamic that this setup will detect vulnerabilities only present in less features (such as the eval function) of JavaScript make popular plugins. While it is possible to deploy a cluster of symbolic execution infeasible for many analysis efforts. machinesrunningmanydifferentoperatingsystems,browser Overview: JSForce, our proposed forced-execution applications,andbrowserplugins,theexponentialgrowthof engine for JavaScript, is an enhancement technology de- possible combinations rapidly causes scalability issues and signedtobetterexposethebehaviorsofmaliciousJavaScript makes this approach infeasible. at runtime. Different detection policies can be applied to Rozzle [26] attempts to address this code coverage prob- examine malicious JavaScript. While the forced execution lem by exploring environment-related paths within a single concept is first introduced for binary code analysis (X- execution. For instance, because att in Figure 1 depends Force [33]), we face unique challenges, such as type in- upon the environment-related API’s output, Rozzle will ference and invalid object access recovery, in enabling the executelines5-15andrevealthemaliciousbehaviorshidden forced execution concept for JavaScript. inlines8-14byexecutingboththetry andcatchblocks. We now illustrate how the forced execution of JavaScript But, it requires a predefined environment-related profile for code works. Consider the snippet shown in Figure 1. path exploration. Construction of a complete profile is a JSForce forces the execution through the different code challengingtaskbecauseofthenumerousdifferentbrowsers paths of the snippet. So, the exploitation code within the and plugins, especially for newer proposed fingerprinting catch block (lines 7-14) will be executed, no matter how techniques [31], [32], [42]. These new techniques do not the ActiveX API is simulated by the emulation-based anal- rely upon any specific APIs. For instance, the JavaScript ysis system. Moreover, JSForce will immediately invoke engine fingerprinting technique [32] relies upon JavaScript thecallbackfunctionpassedtosetTimeouttotriggerthe conformance tests such as the Sputnik [10] test suite to time bomb malware. determine a specific browser and major version number. JSForce’s path exploration forces line 2 to be exe- TherearenospecificAPIsusedforthefingerprinting.Thus, cuted, regardless of the result of the fingerprinting state- Rozzle cannot include it within the predefined profile and ment (line 1). Since btt is not defined within the code explore the environment-related paths. Rozzle also intro- snippet under analysis, which is a common scenario be- duces runtime errors into the analysis engine, which may cause collected JavaScript code may be incomplete due to 3 multi-stages of the attack, the execution of line 2 raises Types: a ReferenceError exception when running within a τ ∶∶= ∑ ϕ i∈T,T⊆{(cid:150),u,b,s,n,o} i normal JavaScript engine. When the exception is captured, Rows: JSForce creates a FakedObject named btt, which is (cid:37) ∶∶= str:τ,(cid:37) ∣ (cid:37) fed to the JavaScript engine to recover from the invalid τ Type environments: object access. However, the type of btt is unknown at the Γ ∶∶= Γ(x∶τ) timeofFakedObject’screation.JSForceinfersthetype ∣ ∅ baseduponhowtheFakedObjectisused.Forexample,if Type summands this FakedObject is added to an integer, JSForce will and indices: ϕ ∶∶= Undef thenchangeitstypefromFakedObjecttoInteger.We (cid:150) ϕ ∶∶= Null u call this faked object retyping. ϕ ∶∶= Bool(ξ ) b b ξ ∶∶= false∣true∣⊺ III. JAVASCRIPTFORCEDEXECUTION b ϕ ∶∶= String(ξ ) s s This section explains the basics of how a single forced ξ ∶∶= str∣⊺ s execution proceeds. The goal is to have a non-crashable ϕn ∶∶= Number(ξn) ξ ∶∶= num∣⊺ execution.WefirstpresenttheJavaScriptlanguagesemantics n ϕ ∶∶= Function(this∶τ;(cid:37)→τ) and then focus on how to detect and recover from invalid f ϕ ∶∶= Obj(∑ ϕ )((cid:37)) o i∈T,T⊆{b,s,n,f,(cid:150)} i object accesses. We then discuss how path exploration ϕ ∶∶= FObj fo occurs during forced execution. ϕ ∶∶= FFun ff A. Forced Execution Semantics Figure 3: Syntax of JavaScript Types ⟨EXPRESSIONS⟩ ::= c CONSTANT | x VARIABLE | x.f FIELD ACCESS The basic idea behind forced execution is that, whenever | x.prot PROTO ACCESS a reference error is discovered, a FakedObject is cre- | e op e BINARY OP ated and returned as the pointer of the property. During | this THIS | {f ∶e ,...,f ∶e } OBJECT LITERAL the execution of the program, the expected type of the 1 1 n n | {function(p ,...,p ){S}} FUNCTION DEF FakedObject is indicated by the involved operation. For 1 n | f(a1,...,an) FUNCTION CALL instance, adding a number object to a FakedObject | new f(a1,...,an) NEW indicates that the FakedObject’s type is number. When ⟨STATEMENTS⟩ ::= skip SKIP thetypeofaFakedObjectcanbedetermined,weupdate | S ∶S SEQ it to the corresponding type. 1 2 | varx VAR DECL Potentially, we could assign FakedObject with the | x∶=e ASSIGN type Object and reuse the dynamic typing rules of the | x.f ∶=e ASSIGN | if e then S else S CONDITIONAL JavaScript engine to coerce the FakedObject to an ex- 1 2 | while e do S WHILE pected type. Nevertheless, the dynamic typing rules of the | try{S}catch{S}finally{S} TRY CATCH JavaScript engine are designed to maintain the correctness | return e RETURN of JavaScript semantics and do not suffice to meet our Figure 2: Core JavaScript analysisgoalofachievingmaximizedexecution.Thiscanbe attributed to two reasons. First, while the JavaScript engine can cast the FakedObject:Object to proper primi- The JavaScript Language: JavaScript is a high-level, tive values, it cannot cast the FakedObject:Object to dynamic, untyped, and interpreted programming language. proper object types. For instance, when a FakedObject Figure 2 summarizes the syntax of the core JavaScript, with the type Object is used as a function object, the which captures the essence of JavaScript. At runtime, the JavaScript engine will raise the TypeError exception JavaScript engine dynamically interprets JavaScript code accordingtoECMAspecification[1].Second,thecastingof to 1) load/allocate objects, 2) determine the types of ob- FakedObjecttoprimitivevaluesbytheJavaScriptengine jects, and 3) execute the corresponding semantics. Given can lead to unnecessary loss of precision. To understand an arbitrary JavaScript snippet, execution may fail because why, consider the following loop: of undefined/uninitialized objects or incorrect object types. For instance, the execution of line 2 in Figure 1 raises a c = a/2; ReferenceError exception because btt is not defined. 1 for(i= c;i<10000;i++) 2 To tolerate such invalid object accesses, forced execution { 3 must handle such failures. memory[i] = nop + nop + shellcode; 4 4 var a = null; return x incomplete or some portion of the JavaScript code is 1 9 2 var b = c + 1; 10 }; missing. For example, a browser plugin referenced by var d = a.length; } 3 11 the JavaScript is not installed, or only a portion of the var func = null; d = func(6); 4 12 JavaScript code is captured during the attack. a = ”Hello World”; var f = Math.abs(d); 5 13 To handle this error, JSForce intercepts the lookup var e = new abc() ; array[5] = f; 6 14 if (b < 5) { process and a FakedObject named as the lookup key 7 8 func = function( is created whenever a failed lookup is captured. The cor- x) { responding parent object’s property is also updated to the Figure 4: JavaScript Sample FakedObject. Line 2 in Figure 4 presents such an example. The JavaScript engine searches the current code scopeforthedefinitionofc,whichisnotdefined.JSForce 5 } returns the FakedObject as the temporary value of c so that the execution can continue. The second case (ER 2) occurs when the object is Since a is not defined, a FakedObject will be created. initialized to the value null or undefined, but later has Withthebuilt-intypingruleoftheJavaScriptengine,cwill itspropertiesaccessed.JSForcemodifiestheinitialization be assigned the value NaN. The loop condition i < 10000 process to replace the null to a FakedObject if an will always evaluate to false. Thus, the loop body, which object is initialized as value null or undefined. For contains the heap spray code, will never be executed. Al- example, the variable a defined on line 1 in Figure 4 is though the path exploration of JSForce will guarantee assigned the value FakedObject instead of null under that the loop body will be executed once, without executing the forced execution engine. The variable a may later be the loop 10,000 times, it will likely be missed by heap updated to another value during execution, but this does not spraydetectiontoolsbecauseofthesmallchunkofmemory sabotage the execution of JavaScript code. allocated on the heap. Faked Object Retyping: When a FakedObject is Therefore, to overcome the above two issues, JSForce usedwithinanexpression,itmustberetypedtotheexpected introduces two new types, FObj and FFun, to the type. Otherwise, incorrect typing raises a TypeError JavaScript type system. The JavaScript type system de- exception and stops the execution. JSForce infers the ex- fined in [40] is extended to support these two new types. pectedtypeofFakedObjectbyhowthe FakedObject Figure 3 summarizes the new syntax of these JavaScript is used. Figure 5 summarizes the dynamic typing rules types. Type FObj is for FakedObject. At the mo- introduced by JSForce. The rules are divided into the ment FakedObject is created, we assign type FObj following five categories: as the temporary type of FakedObject. It can be 1) R-ASSIGN.Thisruledealswithassignmentstatements. subtyped to any types within the JavaScript type sys- When a FakedObject e is assigned to a new value 0 tem. When FakedObject is used as a function object, e , e is updated to the new value e1 with the type 1 0 FakedObject is casted to FakedFunction with type τ. The JavaScript engine handles this naturally, so FFun. The FakedFunction with type FFun can take no interference is required. For example, variable a arbitrary input and always returns FakedObject:FObj. in Figure 4 is assigned FakedObject at line 1 by FollowingJSForce’sdynamictypingrules,aintheabove JSForce. At line 4, the variable a is retyped as a loop sample will be typed to Number because it is used string object. as a dividend. c is then assigned to Number and the 2) R-CALL1 and R-NEW. These two rules describe loop body is executed repeatedly until the loop condition the typing rule for the scenario when a i<10000 is evaluated to false. By introducing these two FakedObject:FObj is used as a function call new types and their typing rules, JSForce solves the two or by the new expression. Function calls and the issues mentioned in the above paragraph. In the following new expression both expect their first operand to paragraphs,wedetailtheJavaScriptforcedexecutionmodel. evaluate to a function. So, JSForce updates the Reference Error Recovery: To avoid raising FakedObject:FObj to FakedFunction:FFun ReferenceError exceptions, we introduce the for this situation. The FakedFunction is a special FakedObject and recover the error by creating the function object which is configured to accept arbitrary FakedObject whenever necessary. There are two cases parameters. The return value of the function is set to that lead to reference errors. The first case (ER 1) is a a FakedObject:FObj so that it can be retyped failed object lookup. Every field access or prototype access whenever necessary. triggers a dynamic lookup using the field or prototype’s 3) R-CALL2.Thisruledescribesthecasewherethecallee name as the key. If no object is found, the lookup fails. is a known function, but a FakedObject:FObj is Such failures happen when the running environment is passed as a function parameter. JSForce types the 5 R-CALL1 R-ASSIGN τ0⊵Obj(Function(this∶τ′;⌈0⌉∶τ1,...,⌈n–1⌉∶τn,(cid:37)→τ))((cid:37)′) Γ⊢ e ∶ϕ Γ⊢e ∶τ Γ⊢ e ∶ϕ /τ lhs 0 fo 1 ref 0 fo Γ⊢ref e0=e1∶τ ⊢upde0∶ϕfo,(cid:37)′@τ ↤ϕff,Γ⊢ref e0(e1,....,en)∶ϕfo/⊥ R-CALL2 τ ⊵Obj(Function(this∶τ′;⌈0⌉∶τ ,...,⌈n–1⌉∶τ ,(cid:37)→τ))((cid:37)′) 0 1 n Γ⊢ e ∶τ /τ′ Γ⊢e ∶τ ...Γ⊢e ∶τ Γ⊢e ∶ϕ Γ⊢e ∶τ ... Γ⊢e ∶τ ref 0 0 1 1 (i–1) (i–1) i fo (i+1) (i+1) n n ⊢ e ∶ϕ @τ ↤τ ,Γ⊢ e (e ,....,e )∶τ/⊥ upd i fo i ref 0 1 n R-NEW τ0⊵Obj(Function(this∶τ′;⌈0⌉∶τ1,...,⌈n–1⌉∶τn,(cid:37)→τ))((cid:37)′) R-BINOPERATOR1 Γ⊢ref e0∶ϕfo/τ Γ⊢e1∶ϕfo Γ⊢e2∶τ′ ¬(e2 is ϕfo) ⊢ e ∶ϕ ,(cid:37)′@τ ↤ϕ ,Γ⊢ new e (e ,....,e )∶ϕ /⊥ ⊢ e ∶ϕ @τ ↤τ′,Γ⊢e op e ∶τ′ upd 0 fo ff ref 0 1 n fo upd 1 fo 1 2 R-BINOPERATOR2 R-INDEX1 Γ⊢e ∶ϕ Γ⊢e ∶ϕ Γ⊢e ∶ϕ τ ⊵Obj(ϕ )((cid:37) ) Γ⊢e ∶ϕ 1 fo 2 fo 1 fo 1 1 1 2 n ⊢ e ∶ϕ @τ ↤ϕ ,⊢ e ∶ϕ @τ ↤ϕ ,Γ⊢e op e ∶τ ⊢ e ∶ϕ @τ ↤τ ,Γ⊢ e [e ]∶ϕ upd 1 fo n upd 2 fo n 1 2 upd 1 fo 1 lhs 1 2 fo R UNARYOPERATOR R-INDEX2 Γ⊢e1∶ϕfo Γ⊢e1∶τ1 τ1⊵Obj(ϕ1)((cid:37)1) Γ⊢e2∶ϕfo ⊢upd(cid:37)1@ϕn↦τ′ ⊢upde2∶ϕfo@τ ↤ϕn,Γ⊢op e1∶τ ⊢upde2∶ϕfo@τ ↤ϕn,Γ⊢lhse1[e2]∶τ′ Figure 5: Typing rules FakedObject:FObj to the required type of the type and the operator’s semantics. callee’s arguments. The JavaScript language has many 5) R-INDEX1 and R-INDEX2. These two rules describe standard built-in libraries such as Math and Date. how to update the type when there are indexing op- WhenaFakedObject:FObjisusedbythestandard erations. A FakedObject:FObj is updated to an library function, we update the type based upon the ArrayObject ∶ φ whenever a key is used as an o specification of the library function [1]. Currently, array index to access elements of the FakedObject. JSForce implements retyping for several common JSForce creates an ArrayObject and initializes the libraries (e.g., Math, Number, Date). elements to FakedObject:FObj. The length of 4) R-BINOPERATOR1/2 and R-UNARYOPERATOR. the ArrayObject is set to 2*CurrentIndex. If an These three rules describe how to update the type Out-Of-Boundary access is found, JSForce doubles if the FakedObject:FObj is involved in an the length of ArrayObject. If the array index is expression with an operator. JSForce updates FakedObject, JSForce types it to number and the FakedObject:FObj’s type based upon the initializes it as 0, which avoids Out-Of-Boundary ex- semantics of the operator. For unary operators, it ceptions. If both the index object and base object are is straightforward to determine the type from the FakedObject:FObj, the R-INDEX2 rule is first operator’s semantics. For instance, the postfix operator appliedtoupdatetheindexobjecttonumber,thenthe indicates the type as number. For binary operators, R-INDEX1 rule is applied to update the base object to thetypingbecomesmorecomplicated.Ifbothoperands ArrayObject. are FakedObject:FObj and the operator does not Example: Table I presents a forced execution of the reveal the type of the operands, JSForce types them sample shown in Figure 4. In the execution, the branch to number. This is because the number type can be in lines 8-11 is not taken. At line 1, JSForce as- converted to most types naturally by the JavaScript signs a FakedObject:Fobj to a, instead of null. engine. For example, the number type in JavaScript This is because at line 3 the access to property length can be converted to the string type, but it may raises an exception if a is null. On line 2, we fail to convert a string to a number. Later during can see a FakedObject:FObj is first assigned to execution, if the types can be determined, JSForce c. Once c is added to 1, JSForce updates the will update the type to the correct type. If only one of value of c to a random number. Lines 6 and 7 show thetwooperandsisFakedObject:FObj,JSForce that if a FakedObject:FObj is used in the func- determines the type based upon the other operand’s tion call or new expression, JSForce updates it to 6 Statement Action Rule Algorithm 1 Path Exploration Algorithm 1:vara=null; a↤FakedObject ER 2 Definitions:switches-thesetofswitchedpredicatesinaforced c↤FakedObject ER 1 2:varb=c+1; R BINOPE execution,denotedbyasequenceofpredicateoffsetsinthesource c↤RanNumber RATOR1 file(SrcName:offset). For example, t.js∶15⋅t.js∶83⋅t.js∶100 3:vard=a.length; a.length↤FakedObject ER 1 means the branch in source file t.js with the offset 15, 83, 100 is 4:varfunc=null; func↤FakedObject ER 2 switched. EX, WL - a set of forced executions, each denoted by 5:a=”HelloWorld”; a↤”HelloWorld” R ASSIGN a sequence of switched predicates. preds ∶ Predicate×boolean abc↤FakedObject ER 1 - the sequence of executed predicates. 6:vare=newabc(); abc↤fakedFunction R NEW Input: The tested JS 7:if(b<5) NOACTION NONE Output: FULL EX func↤fakedFunction R CALL1 12:d=func(6) d↤FakedObject R ASSIGN 1: FULL EX←∅ 13:varf=Math.abs(d) d↤RanNumber R CALL2 2: SRC←{JS} array↤FakedObject ER 1 3: while SRC do 14:array[5]=f; array↤arrayObject R INDEX1 4: WL←{∅} array[5]↤f R ASSIGN 5: EX←∅ 6: js←SRC.pop() Table I: Forced execution of sample in Figure 4 7: while WL do 8: switches←WL.pop() 9: EX←EX∪switches FakedFunction:FFun. The return value of the faked 10: (preds,newJS)← EXECUTECODE(js,switches) function is still configured to FakedObject:FObj, so 11: SRC←SRC∪newJS 12: t←len(switches) that at line 13, d is updated to hold a random number. 13: preds←remove the first t elements in preds JSForce also automatically recovers from other ex- 14: for all (p,b)∈preds do ceptions by intercepting those exceptions to eliminate the 15: if !covered(p,¬b) then exception condition. For example, JSForce will update a 16: WL←WL∪switches⋅(p,b) divisor to a non-zero value if a division-by-zero exception 17: end if 18: end for is raised. 19: end while B. Path Exploration in JSForce 20: FULL EX←FULL EX∪{EX∶js} 21: end while OneimportantfunctionalityofJSForceisthecapability 22: procedure EXECUTECODE(JS,switches) of exploring different execution paths of a given JavaScript 23: preds←switches snippettoexposeitsbehaviorandacquirecompleteanalysis 24: CBQ←∅ 25: newJS←∅ results. In this subsection, we explain the path exploration 26: for all stmt∈JS do algorithm and strategies. 27: if isNoneEvalFunctionCallStmt(stmt) then In practice, attackers constantly adopt the dynamic fea- 28: if CalleeTakesStrings(stmt) then tures of JavaScript to aid in evading detection. This results 29: newJS ← newJS ∪ inincompletepathexplorationundertwocircumstances.The GetJSFromString(stmt) 30: end if firstiswhenstringsaredynamicallygenerated.Forinstance, 31: if CalleeRegisterCallback(stmt) then document.write is often abused to inject dynamically 32: CBQ←CBQ∪ExtractCBFunc(stmt) decoded maliciousJavaScript codeinto thepage atruntime. 33: end if The second is when event callbacks are used. As discussed 34: else if isBranchStmt(stmt) then in Section II, attackers can abuse event callbacks to stop 35: if GetSwitch(stmt)∈switches then 36: Execute according to switches the execution of malicious code. JSForce solves this by 37: else employing specific path exploration strategies. Within the 38: preds←preds⋅GetPredicate(stmt) execution,iffakedfunctionstakestringsasinput,JSForce 39: end if examines the strings and executes the code if they contain 40: end if JavaScript. This strategy is only applied on faked functions 41: end for 42: for all cb∈CBQ do since original functions (eval) can handle the strings as 43: (preds′,newJS′)← EXECUTECODE(cb,∅) defined. JSForce also detects the callback registration 44: newJS←newJS∪newJS′ functionandinvokesthecallbackfunctionimmediatelyafter 45: preds←preds⋅preds′ the current execution terminates. 46: end for JSForce treats try-catch statements as if-else return (preds,newJS) 47: end procedure statements,ie.,itexecuteseachtryblockandcatchblock separately. Ternary operators are also treated as if-else statements: both values are evaluated. There are several different path exploration algorithms: The goal of path exploration in JSForce is to maximize linear search, quadratic search, and exponential search [33]. the code coverage to improve the detection rate of mali- 7 cious payload with an acceptable performance overhead. changing the V8 property access failure handling functions Quadratic and exponential searches are too expensive, so likeRuntime LoadIC Miss.Forfakedobjectretyping, JSForce employs the linear search only. JSForce inserts additional code into runtime methods like Algorithm 1 describes the path exploration algorithm, Runtime BinaryOpIC Miss that is executed prior to which generates a pool of forced executions that achieve the exception being raised. This additional code follows the maximized code coverage. The complexity is O(n), where rules described in Section III-A to conduct the retyping n is the number of JavaScript statements. n may change processiftheinvolvedoperationcontainsaFakedObject. at runtime because JavaScript code can be dynamically Predicates Flip: We have two approaches available to generated.Initially,JSForceexecutestheprogramwithout flipthepredicates.Thefirstapproachistoflipthepredicates switching any predicates since switches is initialized as within the Just-in-Time code. The Just-in-Time code can ∅(line8)forthefirsttime.JSForceexecutestheprogram be optimized (inline caching, etc.) by V8 in accordance according to the switches at line 10 and returns preds with the execution profile. To enable predicates flipping, and dynamically generated code newJS. In lines 12-17, we a runtime function must be inserted before every branch determine if it would be of interest to further switch more so that JSForce can manipulate the predicate value. This predicate instances. Lines 11-13 compute the sequence of approach may affect the optimization process of Just-in- predicateinstanceseligibleforswitching.Notethatitcannot Time code. be a predicate before the last switched predicate specified JSForce takes the second approach: if the branch A in switches. Switching such a predicate may change of a predicate needs to be taken, JSForce replaces the the control flow such that the specification in switches other branch with this branch A. At runtime, no mat- becomes invalid. Specifically, line 16 switches the predicate ter which branch is taken, the branch A is executed. iftheotherbranchhasnotbeencovered.Ineachnewforced For instance, we want to take the {A} branch of the execution, we essentially switch one more predicate. statement if(e){A}else{B}. JSForce changes it to The procedure ExecuteCode (lines 22-47) describes if(e){A}else{A}, so that {A} is executed at runtime. the execution process. It collects dynamically generated Loops and Recursions: Sometimes, JSForce may JavaScript code (lines 28-30) and the executed predicates causealooptoexecuteforaverylongtime,duetotheintro- (lines 34-38). The new generated JavaScript code, newJS, duction of faked objects. To solve this problem, JSForce willbeexecutedafterthepathexplorationofthecurrentjs inserts a time counter for every loop statement (for...in finishes. The registered callback functions (lines 31-33) are andfor...ofareexcluded,astheywillalwaysterminate), also queued and invoked after the current execution finishes anditwillterminatetheloopiftheexecutiontimeexceedsa (lines 42-46). As an example, recall the callback function limit. Similarly, if JSForce forces a predicate that guards redir()usedinline16ofFigure1.Insteadofwaitingfor the termination of a recursive function call, a very deep the timeout, JSForce will trigger the redir() function recursion may result. To address deep recursion, JSForce immediately after the current execution finishes. monitorsthestackdepth.Oncethemaximumcallstacksize (definedbyV8)isreached,callstothatfunctionareomitted IV. IMPLEMENTATION by JSForce. JSForceisimplementedbyextendingtheV8JavaScript engine [11] on the X86-64 platform. It is comprised of V. EVALUATION approximately 4,600 lines of C/C++ code and 1,500 lines In this section, we present details on the evaluation of Python code. We address some prominent challenges of of correctness, effectiveness and runtime performance of its implementation in this section. JSForce using a large number of real-world samples. Reference Error Recovery & Faked Object Retyping: In V8, an abstract syntax tree (AST) is generated for every A. Dataset & Experiment Setup function, which is then compiled into native code (known as Just-In-Time code). V8 adopts an inline caching tech- Dataset: The complete dataset used for our evaluation nique [23] to accelerate property accesses. If the property consists of two sample sets: a malicious sample set and a access fails, the execution jumps to the V8 runtime system benignsampleset.Forthemaliciousset,wecollectedasam- which handles any inline cache misses. If the runtime plesetwith172,995HTMLfilesand23,509PDFfilesfrom system is unable to handle an inline cache miss, either various databases including VirusTotal [12], Contagio [2], due to reference error or type check error, it raises the MalTrafficAnalysis [3], and Threatglass [4]. Among those, corresponding exception and stops the execution. all samples from VirusTotal were new samples evaluated We modify the inline cache miss handling process to within a month of being submitted, with the samples pro- enable reference error recovery and faked object retyp- videdfromothersourcesbeingrelativelyold.Forthebenign ing. For reference error recovery, JSForce creates and sample set, we crawled the Alexa top 100 websites [5] and returns the FakedObject for failed object lookup by collected 47,592 HTML files. 8 Category Total Detected by JSForce Percentage are indeed malicious,proving that our tool canachieve very TruePositive 389 389 100% high true positive rate with accurate analysis details. FalsePositive 47,592 9 0.019% False positive: For our second goal, we analyzed Table II: Correctness Results. our benign sample set using JSForce and then observed whether any of the samples could be incorrectly labeled as malicious. As shown in the second row of Table II, the Experiment Setup: For JavaScript code analysis, we JSForce tags 9 out of 47,592 samples as malicious. We leverage the jsunpack [22] tool. Jsunpack is a widely used first manually confirmed that all 9 samples are clean and malicious JavaScript code analysis tool that utilizes the thereupon study why the false positives happen. It has been SpiderMonkey [6] JavaScript engine for code execution. verified by manual inspection that all of the false positives Six distinct configurations are predefined within jsunpack are caused by the inaccurate detection policy, to be more to maximize the exploration of JavaScript code by trying specific,theover-relaxationoftheshellcodestringmatching differentbrowsersandlanguagesettings.Forthesakeofour policyenforcedbyjsunpack.Thereasonwhyourtoolcould evaluation, we replaced the SpiderMonkey from jsunpack detect them as malicious is that it explores JavaScript code with JSForce and relied upon the detection policies in in a more complete fashion in consequence of our forced jsunpack for malicious code detection. Most of our experi- execution technique. Therefore, based upon the above ex- ments are based upon the comparison between the original perimental results, we argue that using JSForce will keep jsunpack and the JSForce-extended jsunpack. Note that a very low false positive rate for JavaScript code analysis, the experiments performed within this paper are only in- andisabletoassistinaccomplishingmorethoroughresults. tendedtoshowtheimprovementofdetectionresultsoverthe Theoretically, JSForce can generate higher code coverage originaloneswhenadoptingJSForce.Thedetectionpolicy than jsunpack and lead to better analysis results. But, the itselfisanotherimportantresearchtopicwhichisorthogonal question is by how much. With that, we conducted another to the focus of this paper. We conducted our experiments set of experiments to show the effectiveness of JSForce. on a test machine equipped with Intel(R) Xeon(R) E5-2650 CPU (20M Cache, 2GHz) and 128GB of physical memory. C. Effectiveness The operating system was Ubuntu 12.04.3 (64bit). For the evaluation of effectiveness, we would like to demonstrate that JSForce can indeed help the malicious B. Correctness JavaScript code analysis by performing efficient forced ex- Inthissection,weevaluatethecorrectnessoftheanalysis ecution. In order to achieve that, we utilize our malicious result for JSForce. The goals of this evaluation are two- HTMLandPDFsamplesetsandrunthesamplesetsagainst fold. First, we wish to know the true positive rate of our jsunpack both with or without JSForce for the evaluation. analysis results, meaning that we wish to verify whether a In the interest of showing how useful our faked object JavaScript program is undoubtedly malicious if it is tagged retypingis,wealsoconductanotherexperimentthatdisables as one by the analysis tools. Second, we wish to understand the retyping and only keeps the reference error recovery any false positives in the results so as to determine whether component and path exploration component. any benign JavaScript code can be mistakenly labeled as Experimental Results: Table III illustrates the ex- malicious. perimental results for effectiveness. It demonstrates that True positive: With our first goal in mind, we queried JSForce could greatly improve the detection rate for VirusTotal [12] for malicious HTML files and collected JavaScriptanalysis.Wecanseedetectionrateimprovements 389 samples which are precisely labeled with specific of 759.84% and 4.84% for HTML and PDF samples, re- CVE(CommonVulnerabilitiesandExposures)numbersthat spectively,whenusingJSForce-extendedjsunpackinstead match CVEs listed in jsunpack. Furthermore, we manually of the original version for analysis. And all the samples reviewedeachofthesamplesandconfirmedtheexistenceof detectedbyoriginaljsunpackarealsoflaggedbyJSForce- shellcode or malicious signatures. This step is to guarantee extendedjsunpack.Wefurtherbreakdownthenumbersinto all the samples we tested are real malicious samples that old and new sample sets and perceive that the extended should be detected by our tool. Then, we analyzed the versioncouldperformmuchbetterthanoriginaljsunpackin samples using jsunpack with JSForce. The experimental analyzing new samples. For new HTML samples, jsunpack result is listed in the first row of Table II as “true positive”. withJSForceisabletodetect817.3%moresampleswhile It shows that JSForce could successfully detect all of the for old samples, the number is 84.97%. Similar results are samples, resulting in a 100% true positive rate. To better also observed for PDF samples. After manual inspection, understand these results, we further inspected the detailed we confirmed that this is because many of the old samples analysis results to see why our tool tagged samples as havebeenanalyzedforquitesometimeandjsunpackalready malicious. Our inspection results revealed that all of the has the signatures stored in its database, leaving only a payloadandmalicioussignaturesextractedbytheJSForce small margin for JSForce to improve upon. For the faked 9 without with Detected Missed SampleSet Total Improvement JSForce JSForce ByBoth WithJSForce OldHTML 66,325 193 357 84.9% 193 0 NewHTML 106,018 2,250 20,649 817.3% 2250 0 HTMLTotal 172,995 2,443 21,006 759.8% 2443 0 OldPDF 22,081 6,306 6,475 2.7% 6306 0 NewPDF 1,428 32 170 431.2% 32 0 PDFTotal 23,509 6,338 6,645 4.8% 6338 0 Table III: Effectiveness Results. refer to the Section 6 Case Study for more details on this 100 topic.Asforanyundetectedsamples,JSForcewillexplore undetected samples 90 detected samples theentirecodespaceduringanalysis,whichrequiresalarger 80 amount of path exploration and longer analysis runtime. eg 70 atn D. Runtime Performance e 60 c re In this section, we evaluate the runtime performance p ev 50 of JSForce by using our malicious and benign datasets italu 40 with a comparison between the original jsunpack and the m uC 30 JSForce-extended version. 20 Runtime for Detected Samples: In this section, we comparetheruntimeperformanceusingtheHTMLandPDF 10 samples that can be detected by jsunpack both with and 0 0 20 40 60 80 100 120 140 160 180 200 without JSForce. The reason why we chose this sample Number of paths set is that we wished to observe whether the JSForce- Figure 6: Num of Path Exploration during Analysis. extended version can achieve efficiency comparable to the originaljsunpackwhenusingadetectablemalicioussample. The results are displayed in Figures 7 and 9. The results object retyping evaluation, we reran the test using 106,018 concludethatJSForce-extendedversionhasbetterruntime new HTML malicious samples with retyping component performance than jsunpack for over 90.9% of HTML and disabled. The result shows that only 8,677 samples can be 83.6% of PDF samples. This conclusion is quite surprising detected by JSForce in contrast to 20,649 with retyping astheJSForce-extendedversiontendstoexploremultiple enabled. This result reveals the usefulness of our faked paths while jsunpack only probes for one. object retyping component during analysis. Nevertheless, In theory, jsunpack should have better runtime perfor- throughourexperiments,weareabletodrawtheconclusion mance. However, after investigation, we found that many of that JSForce is quite effective for boosting the effective- theJavaScriptsamplesrequirespecificsystemconfigurations ness of JavaScript analysis. (suchasspecificbrowserkernelversion)torun.Asaresult, Number of Paths Explored: Potentially, there may be a when jsunpack performs analysis, it will run the JavaScript largenumberofpathsthatexistinsideofasingleJavaScript programs under multiple settings. This results in multiple program. The effectiveness and efficiency of JSForce are executions, which take additional time to complete. In con- closely related to the number of paths explored during trast,theJSForce-extendedversionhandledthisissuewith analysis. Hence, we would like to show some statistics on forced execution, resulting in better runtime performance in thenumberofpathsthatJSForceexploredduringanalysis. practice. The result depicted in Figure 6 shows that JSForce is Runtime for Undetected Samples: Figures 8 and 10 able to detect the maliciousness of samples with a limited show the runtime performance of JSForce for undetected number of path explorations. An interesting observation is samples.Weempiricallysetthetimelimittobe300seconds that over 96% of the samples were detected by exploring in consequence of the fact that experiment shows almost all only a single path. Even though most of the analysis for (99.6%) HTML and PDF samples can be analyzed within detectedsamplescanbefinishedbyexploringjustonepath, 300 seconds. As demonstrated in the figures, the average thepathexplorationofJSForceisstillessential.Notethat analysis runtime for HTML and PDF samples are 12.02 98% of the samples missed by the default jsunpack, but and 8.15 seconds, while the analysis for a majority (80%) detectedbytheJSForce-extendedversion,exploreatleast of HTML samples and PDF samples are finished within two paths. So, the analysis could still receive an enormous 8.54and7.4seconds,respectively.Whencomparedwiththe benefit from JSForce in terms of path exploration. Please original jsunpack, the JSForce-extended version achieves 10

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.