Using AOP to Bring a Project Back in Shape: The OurGrid Case AylaDantas† WalfredoCirne† KatiaSaikoski⋆ †Laborato´riodeSistemasDistribu´ıdos DepartamentodeSistemaseComputac¸a˜o UniversidadeFederaldeCampinaGrande,58109-970,CampinaGrande,PB,Brazil ayla,[email protected] ⋆ComputingLab-ResearchandDevelopment HPBrazil [email protected] Abstract log in [9]. This solution has grown in several ways and The design and development of distributed software has become widely used. Several research results (e.g., is a complex task. This was not different in OurGrid, fault management [25], resource sharing [2,3], applica- a project whose objective was to develop a free-to-join tionscheduling[12],etc)havebeenincorporatedintothe grid. After two years of development, it was necessary originalprototype. Thearchitectureofthesystemhadto toredesignOurGridinordertocopewiththeintegration be modified to accommodate such new features. If on problems that emerged. This paper reports our experi- theonehandtheinclusionofnewcapabilitieswasagood enceinusingAspect-OrientedProgramming(AOP)inthe evolution for the original idea; on the other hand, it in- processofredesigningtheOurGridmiddleware. Thees- troduced chaos to the project. Most of the new features sential direction of our approach was to get the project have been developed by independent teams. In order to (and the software) back in shape. We discuss how the helptheintegrationprocess,automatictestsneededtobe lack of separation of concerns created difficulties in the providedbyeachintegrator,andallthetestsneededtobe project design and development and how AOP has been executedbeforetheintegrationwascompleted. However, introducedtoovercometheseproblems. Inparticular,we providinghigh-qualityautomatictestsforagridsolution present the event-based pattern designed to better iso- isacomplextask. Besidestheusualdifficultiesindevel- late the middleware concerns and the threads. Besides, opingandtestingmulti-threadedanddistributedsoftware, we also present the aspects designed for managing the grids add new challenges due to their wide-dispersion, threadsandforaidingthetestingofmultithreadedcode. loose coupling, and presence of multiple administrative We also highlight the lessons learned in the process of domains. Infact,bothgridinfrastructureandapplications regainingcontrolofthesoftware. are currently very brittle [27]. After the integration pro- Keywords:Separation of Concerns, AspectJ, Grid cess, OurGrid 2.0 was released and used. At this point, Computing,SoftwareReengineering,SoftwareArchitec- the bugs begun to appear and we started to identify im- ture,Tests provements that needed to be performed. Nevertheless, changing the code in a secure way had become a chal- lengeandaveryerror-pronetask. Theseveralapplication concernswerenotwellisolated,andtheapplicationtests 1. Introduction werenotreliablesincetheycouldfaileitherduetoanin- OurGrid is an open, free-to-join, cooperative grid sufficienttimetowaitbeforeanassertionorduetoabug. in which institutions donate their idle computational re- At that point, we have decided to experiment with AOP sources in exchange for accessing someone else’s idle techniques to identify the problems associated with the resources when needed [8,35]. OurGrid leverages the software. ThefirststepwastouseAspectJtodebugOur- fact that people do not use their computers all the time. Gridandgetabetterideaandcontrolofitsthreads,espe- OurGridstartedbyprovidingasimpleandcompleteway ciallyduringtests. Bydoingso, wecouldgetaworking foruserstorunapplicationsoverallresourcestheycould AylaDantas,WalfredoCirne,KatiaSaikoski UsingAOPtoBringaProjectBackinShape: TheOurGridCase versionofOurGridwithareducednumberofbugs(Our- An example of an aspect language is AspectJ [20], Grid2.1.3). Anotherconcernwasthesoftwareevolution. which is a general-purpose aspect-oriented extension for This working version was very hard to evolve. Then we Java. It supports the concept of join points, which startedanefforttoisolateconcernsoraspectsoftheappli- are well-defined points in the execution flow of a pro- cation,redesigningitusinganarchitecturewherethreads gram[34]. Italsohasawayofidentifyingparticularjoin couldbewellmanagedandwherethetestscouldbemore points(namedpointcuts)andamechanismtochangethe deterministicandeasiertoimplement. applicationbehavioratjoinpoints(namedadvice). In this paper we report our experience in using AOP Pointcutdesignatorsidentifyparticularjoinpointsby during the OurGrid development, especially in a critical pickingoutasubsetofallthejoinpointsintheprogram phaseoftheproject, andwepresenttheredesignedOur- flow[20]andthecorrespondingvaluesofobjectsatthose Grid architecture. The main contribution was a better points. Apointcutexampleisshowninthefollowing: separation of concerns, which makes easier the integra- pointcut startingApplication(): tion of new features to the software as well as improves execution (public static void OurGrid’susability. Anothercontributionisthedevelop- main(String [])) ; ment of a general package for thread management using Thispointcutcapturestheexecutionofanypublicand AspectJ. This package aids in testing and debugging of staticmethodcalledmainthathasaString[]param- multithreadedcode. eter and has void as its return type. This is just one This work is organized as follows. Section 2 briefly exampleoftheseveralkindsofpointcutsprovidedbyAs- presents AOP and the AspectJ language. Section 3 pectJ. presents the problems we faced during the OurGrid de- Advicedeclarationsareusedtodefinecodethatruns velopment, inspecial, fornotdealing correctlywithdis- when a pointcut is reached. For example, we can define tinctaspectsofthesoftware. Section4presentshowwe code to run before a pointcut as shown in the following usedAspectJtoidentifytheOurGridproblemsandtosup- example: portthetestingprocess.Section5presentstheredesigned OurGridaiminganisolationofconcernsandthreadsman- before(): startingApplication(){ agement using an event-based architecture. Section 6 is System.out.println( "The system will start"); theevaluationsection,whereweanalyzetheOurGridre- } designandwepresentthelessonslearnedduringthepro- cess.Section7discussessomerelatedwork.Finally,Sec- Withthis advice, amessage isdisplayed on the stan- tion8presentsourconclusionsandsuggestsdirectionsfor dard output before the execution of any main method futureresearch. identified by the startingApplication pointcut. Besides the before advice, AspectJ also provides after and around advice. The former runs after the computationunderthejoinpointfinishes,whilethelatter 2. AOP Overview runswhenthejoinpointisreached,andhasexplicitcon- Separationofconcernsisanimportantmatterforsoft- troloverwhetherthecomputationunderthejoinpointis ware development. In its most general form, separation allowedtorunatall[34]. of concerns refers to the ability to identify, encapsulate, AspectJ also has a way of statically affecting a pro- and manipulate the parts of a software that are relevant gram. With inter-type declarations, the static structure to a particular concept, goal, task, or purpose [33]. By of a program can be changed. For example, we can analyzing the application concerns, we can organize and change the members of a class and the relationship be- decompose software into smaller, more manageable and tweenclasses[34]. comprehensiblepartsthataddressoneormoreconcerns. Finally,AspectJhastheconceptofanaspect,whichis Programmingparadigmsaddressthisissueindistinct amodularunitofcrosscuttingimplementation. Anaspect forms(e.g.,proceduresandfunctions,modules,objects). is defined similar to a class definition, and it can have However, not all aspects that need to be addressed in methods, fields, constructors, initializers, named point- a program can be encapsulated in traditional program- cuts,adviceandinter-typedeclarations. Inshort,aspects ming ways to separate concerns (e.g. message logging grouppointcuts,advice,andinter-typedeclarations. The orerrorhandling). Thisisbecausetheseaspectsareusu- aspect code is combined with the primary program code allyscatteredacrossthecode,whichmakesmaintenance byanaspectweaver[40]. and evolution complicated. Aspect-Oriented Program- ming(AOP)[21]isaprogrammingparadigmthataimsto clearlyaddresstheaspectsthatcrosscuttraditionalmod- 3. Getting into Trouble during the Develop- ules. AOP proposes that those aspects (e.g., transaction, message logging, error handling, failure handling, and mentofOurGrid output formatting) should be written separately from the In this section we give an overview of OurGrid and functionalcode. grid computing. Then, we discuss the problems we had 22 AylaDantas,WalfredoCirne,KatiaSaikoski UsingAOPtoBringaProjectBackinShape: TheOurGridCase intheprocessusedtobuildthisgridmiddleware. Finally, 3.2. Problemsinourdevelopmentprocess wedescribetheprojectstatusbeforetheredesign. TheprocessusedtobuildOurGridwasXP-based[5]. However, during the development of the initial Our- Gridversions,practicessuchas“ContinuousIntegration”, 3.1. OurGridoverview “PairProgramming”,“CollectiveOwnership”and“Small OurGrid is an open, free-to-join, cooperative grid in Releases”werenotused. which participants donate their idle computational re- sourcesinexchangeforaccessingotherparticipants’idle Incontrastwiththeweaknessofomittingsomeprac- resources when needed [8]. Itaims to enhance the com- tices,therewasastrongpointintheprocessbytheuseof puting capabilities of research labs around the world, by “automatictests”. However, eventhetestshadproblems allowingthemtotraderesourcesthatwouldotherwisebe (lowcoverageandnondeterminism)andthetest-firstap- wasted. OurGridwasdesignedtobescalable,bothinthe proach(implementthetestsbeforethefunctionalitybeing sense that it supports thousands of labs, and that joining tested)wasnotusedduetothesystemcomplexityandthe the system is straightforward. In fact, anyone can just strictdeadlines. downloadtheOurGridsoftwareandjointhegrid. There Once the initial version of OurGrid was developed, isnoneedforpaperworkorhumannegotiation,asitisthe several research efforts evolved in parallel. The evolu- caseforothergrids[8]. tion of the OurGrid code and the external features were The current design of OurGrid assumes applications managed separately (e.g., different branches in a reposi- tobeBag-of-Tasks(BoT).ABag-of-Tasksapplicationis tory, bug fixing, etc.). Then, after one year, each branch aparallelapplicationcomposedofindependenttasksthat neededtobeintegratedintotherepositorymainbranchin canbeexecutedinanyorder. AtypicalexampleofaBoT ordertoreleaseversion2.0withseveralnewfeatures. applicationisasetoftaskscomposingaparameter-sweep The integration of each new feature (each branch in simulation. BoTapplicationsarebothrelevantandsuited the repository tree) was a hard task. First, the auto- for execution in grids. OurGrid is open-source and it is matic tests took a long time to be performed (more than availablefordownloadingathttp://www.ourgrid. 3 hours). Second, sometimes the tests failed because of org. The current status of the OurGrid community is timingproblems,thatis,thetimethetestwouldwaitbe- available at http://status.ourgrid.org. The fore assertion was not enough because the functionality OurGrid basic architecture shown in Figure 1 comprises tobetestedhadnotfinishedyet. Thishappenedbecause thefollowingelements: theGridMachines(GuMs),the thetestthreadusuallycreatedotherthreadsthatwouldac- GridMachineProviders(GuMPs)andtheScheduler. tuallymaketheassertionscorrect. Therefore,sometimes Wemaysummarizethecommunicationbetweenthese it was necessary to include sleep calls on the tests to elementsasfollows. TheusersubmitsjobstotheSched- makethemwaitbeforetheseassertions. Inseveralcases uler. The Scheduler then requests machines (GuMs) to the code was committed with bugs because application the providers (GuMPs) and allocates the received GuMs developers interpreted failures as an insufficient “sleep” toexecutethetasksofthesubmittedjobs. AGuMPcon- timeinthethreadcontrolofthetest. Finally,someofthe trolsmachinesinagivenadministrativedomain. GuMPs failures were really bugs, especially those related to the cantrademachinesamongthemselvesusinganincentive- orderofexecutionofthreadsthatwerenotforeseen.Such compatiblepeer-to-peerprotocol[2,3]. bugsmadethetestspasssometimesandfailinothers. In the 1.0 version of OurGrid1, users could submit The result of this integration approach was a reposi- their jobs directly through the Scheduler remote ob- torywithnon-deterministictestswhosefailurescouldnot jectorusingLinuxshellscripts. Themachinesavailable be well diagnosed. This was due to the increase in the for the jobs in the initial versions were just those previ- numberofthreadsandsynchronizedblocksintheapplica- ouslyconfiguredinalocalgridmachineprovider. tion(forexample,therewere174synchronizedblocksat OurGrid 2.0 brought a host of new features com- theendofthissetofintegrations).Theworstscenariowas paredwiththe1.0version,suchasnewschedulingheuris- thateachthreadcouldfreely“walk”throughoutapoorly tics[12,30]andanewwaytoobtainGuMsusingapeer- modularized code, making deadlocks easy to be created to-peer community [3]. However, the process of incor- and difficult to be detected. To make things worse, be- porating these functionalities faced some problems. As causetestswouldtakealongtime,developerswouldfre- withothergridsystems[27],theresultwasabrittlecode. quentlyinterruptatestandexecuteitlaterwithoutnotic- Besides the classic difficulties in developing and testing ing that the delay could be the result of a deadlock that multi-threaded and distributed software, grids add new onlyhappenedinacertainthreadsconfiguration. Besides challenges due to their wide-dispersion, loose coupling, the testing problems, the code was not well understood andpresenceofmultipleadministrativedomains. Inpar- bythewholedevelopmentteamandevolution(including ticular, wehad very seriousproblems intestingthesoft- bug fixing) had become a hard problem, especially be- ware. causeseparationofcrosscuttingconcernsandevenofthe concernsthatwerenotsocrosscuttinghasnotbeencon- 1Version1.0ofOurGridwasoriginallycalledMyGridanditwasre- sideredduringtheintegration. leasedinthebeginningof2003 23 AylaDantas,WalfredoCirne,KatiaSaikoski UsingAOPtoBringaProjectBackinShape: TheOurGridCase Figure1.OurGridArchitectureBasics 4. Using Aspects to Diagnose the Problems andTestOurGrid Because the code of OurGrid was handled by sev- eralpeople(around20)withoutusingpairprogramming and not focusing on the collective code ownership prac- tice[5],itwasdifficulttounderstandstrangebehaviorsof thecompleteapplication. Anevenmoredifficulttaskwas tocorrectthebugs. Bugfixingbeguntotakemuchmore time than expected after the release of versions 2.0 and 2.1. Asaconsequence, westartedanefforttobetterun- derstandthecodeinordertomakeitsevolutionpossible. Figure2.ThreadServicesclassoperations Atthisphase,aspects[21]wereintroduced. 4.1. Management of application threads during de- whichcanbereusedinotherprojects. buggingandtesting The OurGrid tests use the JUnit framework [24], Themainproblemwastounderstandthebehaviorof which is based on assertions. Each test invokes the ap- the application threads. In order to do that using pure plication and then it makes an assertion to verify if a object-oriented programming, we would need to insert certain state has been reached. Before these assertions, code in several parts of the application (e.g., in each we usually used sleep calls with a certain amount of threadcreation, sleepandwaitcall, etc). So, wede- time to wait until a certain condition to be tested was cidedtouseAOPtohelpinthisprocess. Intheend,AOP achieved. This happened because the test thread usually wasnotonlyusedtodebugthecode,butalsotoimprove createdotherthreadsthatactuallychangedtheapplication thetestingprocessbycontrollingtheapplicationthreads statetotheoneexpectedbythetestassertion. Neverthe- beforeperformingassertions. less, thesleeptimewasnotagoodsolutionbecauseitis However, there were problems regarding the use of dependent on the machine being used in the test and on AOP by the development team. First, only one person theloaditfaces. was familiar with AOP and second, the team was very The org.ourgrid.threadservices pack- heterogeneousandchangedfrequently. Thesolutionwas age solves this problem. It was accessed through the todeviseatransparentwaytointroduceAOP.Thiswould ThreadServicesclass,whichisacommonclasswith allow anyone to use it, even without AOP knowledge. a set of static operations (illustrated in Figure 2). From The programmers simply needed to compile their pro- the methods available, the most used for tests was the grams with a different command and to know the meth- waitUntilWorkIsDonemethod.Thismethodmakes ods of a class that provided services that made the tests the caller thread wait until all threads started by the test threadswaitfortheotherapplicationthreadsbeforepro- have finished or were waiting on a monitor. The other ceeding. For the debugging process, we did not focus methods used to make the test thread to wait (methods on much transparency because our intention was only to wait*) were also widely used and they replaced most understand the problems in application threads. More of the sleep calls from the tests, making them faster specifically, we wanted to identify the following points: and more deterministic. For instance, if we wanted to whenthreadsstartedandfinishedtorun,whenthethreads test the Scheduler, the test would submit a job to it and waited on a monitor or when these waiting threads were then verify if the job had been successfully executed notified. This debugging code led to a general package assertingthatthejobstatewasfinished. Beforeasserting fortestingcalledorg.ourgrid.threadservices, iftheexecutionhadsuccessfullyfinished, wewouldcall 24 AylaDantas,WalfredoCirne,KatiaSaikoski UsingAOPtoBringaProjectBackinShape: TheOurGridCase the ThreadServices.waitUntilWorkIsDone The code inside each advice will include the threads method and then, the test thread would wait until the captured by the pointcuts in different collections of schedulerthreadandtheotherthreadsstartedbythetest threadsaccordingtotheirstate(started, runningorwait- hadfinishedrunningorwerewaitingonamonitor. ing).AstheThreadListsclassisresponsibleforthese In order to use the services shown in Figure 2, be- collectionsofthreads,theinclusionisdonethroughanin- sidesinvokingtheThreadServicesclassonthetests, stanceofthisclass,namedtLists. Notethatinthelast it was necessary to replace the command used to com- advice we take the caution of excluding the wait calls pile the application before the tests. In more practical thatwereinsidetheThreadListsclassitself. terms, application developers need to invoke the ant as- We have also included an after advice to remove pectscommand. Thisdidnotcauseanyimpactinthede- a thread from the collection of running threads after the velopmentprocesssincedevelopersalreadyusedAnt[18] runmethodexecutionhasfinished: fortestingandcompiling. after(): runnableRunExecutions(){ InordertoimplementtheThreadServicesmeth- tLists.removeRunnableThread(); ods without directly changing any part of the code, we } haveimplementedtheRunningThreadsMonitoras- pect. The RunningThreadsMonitor aspect man- Another two before advices have also been imple- agestheapplicationthreadswiththehelpofaJavaclass mented, as illustrated below. They are invoked when called ThreadLists, which manages thread collec- notifyandnotifyAllmethodsarecalledduringthe tions. SomeoftheRunningThreadsMonitorpoint- execution, but not within ThreadLists class. They cutdefinitionsarethefollowing. are responsible for notifying the tLists object about threads that may stop to wait, and that could, therefore, pointcut threadStartCalls(Thread t): havechangedtheirstate. call(public void start())&& target(t); pointcut waitCalls(Object o): before(Object o):( call(public void wait())&& target(o); call(public void notifyAll())) pointcut runnableRunExecutions(): && target(o) &&!within(ThreadLists){ execution(public void Runnable+.run()); tLists.notifyAllWaitingThreads(o); } The threadStartCalls pointcut illustrated before(Object o):( above collects every call to the start method on a call(public void notify())) && target(o) &&!within(ThreadLists){ Thread object. The waitCalls pointcut collects tLists.notifyOneWaitingThread(o); every call to the wait method on any 0bject. This } will indicate that one of the threads will be waiting on a given object. The runnableRunExecutions As could be seen, ThreadLists is responsi- captures the moment a thread is actually running. This ble for managing the state of application threads. corresponds to the execution of every run method from It implements the functionality that waits until a aRunnableobjectorfromaclassthatimplementsthis given configuration of threads is achieved or prints interface, such as the Thread class. These execution threads in a given state, which is provided by the points wereimportanttocapture, because inmostof the ThreadServicesclass(seeFigure2). Inordertodo teststhethreadsareonlystarted(i.e.,thestartmethod this, the RunningThreadsMonitor aspect replaces iscalled). However, whenthestartmethodreturns,it the ThreadServices static methods implementation doesnotmeanthattherunmethodhasstarted. using the AspectJ around advice. Developers would In the following, we show a number of advices that use static methods from ThreadServices class, but presentthecodetoberunimmediatelybeforethepoint- wouldbeinfactusingarealinstanceofThreadLists cutsdescribedabovearereached. instantiated by the RunningThreadsMonitor as- pect. This little “trick” was essential to make the use of before(Thread t): threadStartCalls(t){ aspectstransparenttomostprogrammers. tLists.includeInStartedThreads(t); Oneoftheseadvicesisshownbelow. Theothersfol- } lowthesameidea. before(): runnableRunExecutions(){ void around(): execution(public static void tLists.includeInRunningThreads( org.ourgrid.threadServices.ThreadServices. Thread.currentThread()); waitUntilWorkIsDone()){ } tLists.waitUntilWorkIsDoneNotifying(); } before(Object o): waitCalls(o) && !within(ThreadLists){ tLists.addWaiting(o); This advice defines that instead of exe- } cuting what is defined in the body of the 25 AylaDantas,WalfredoCirne,KatiaSaikoski UsingAOPtoBringaProjectBackinShape: TheOurGridCase ThreadServices.waitUntilWorkIsDone was kept easy as it was introduced in a transparent way method, the waitUntilWorkIsDoneNotifying throughadifferentcalltoatoolalreadyknowbythede- method is called on the ThreadLists instance owned velopmentteam(ApacheAnttool). bytheaspect. 4.2. Findingdeadlocksthroughexistingtests 5. TheRedesignofOurGrid After we have solved the problems of non- According to the analysis based on the aspects in- deterministicteststhatfailedbecauseoftimingproblems, cluded in the code, we discovered that a reengineering westillhadthechallengeofdiscoveringapplicationdead- process was necessary or we would lose the control of locks. thecodecompletely. Themaingoalofthisprocesswasto We randomly called the sleep method (with a ran- wellisolateinternalconcerns(aspects)ofthegridmiddle- dom time at a given interval) on running threads so that wareandtobettercontroltheapplication threads. Some applicationthreadswouldrunindifferentorders.Aspects classes could be reused, but others had to be completely aidedinthistaskandprovidedasolutionmoreappropri- rewrittenorcreated. ated than those based on pure object-oriented approach where several parts of the code needed to be changed. In this reengineering process, we have tried to iden- Testing would then be run over and over seeking for a tify patterns in the concerns implementations that would deadlock. make evolution easier. Besides that, identified patterns A single advice in a new aspect (the could also be used for future concerns to be included on ThreadSleeperAspect) was necessary to per- themiddleware. Forsimplification,wewilljustconsider formthistask: theOurGridbroker(calledMyGrid)redesigntoillustrate ourexperienceandsharesomeideasfromthisprocess. before(): call (* *(..)) Thefirstconcerntoisolatewastheuserinterface. In && withincode (* *..*.run()) { the early versions of OurGrid, users had to directly ac- makeThreadSleepIfItIsHerTurn(); cess remote objects through RMI [17] or Linux scripts, } and they were always forced to change their grid appli- This advice invokes the cations when any interface (or script) changed or when makeThreadSleepIfItIsHerTurn method a new interface was added. Besides, if the communi- before the execution of every call to any method inside cation infrastructure changed, users had to make several a run method execution. The method invoked inside changesintheirapplicationifdirectlyaccessingthecode the advice verifies if the sleep should be called or not, (which was the most common use). Moreover, it was according to a random choice, and then invokes sleep notclearfortheuserswhichmethodsexposedbythere- on the currently executing thread choosing a random mote interfaces should be used. Therefore, we have de- interval. finedtheorg.ourgrid.mygrid.ui.UIServices Althoughthisaspectwasreallyusefulforusinorder interface to offer all MyGrid services. This interface is to find deadlocks, random sleep calls considerably in- illustrated in Figure 3 and it can be accessed via Java or creased the execution time of tests. Therefore, the tests usingscriptwrappersfromtheOurGriddistribution. My- should not be executed every time with these sleep Gridalsooffersagraphicaluserinterfacethataccessesthe calls. AswewereusingAspectJ,inordertoperformthis, UIServices services. More details about the UI ser- wejustexcludetheThreadSleeperAspectfromthe vicescanbefoundintheOurGridmanual[35]. Besides weaving process. To do that, the developers simply isolatingtheuserinterface, weneededtoisolateinternal neededtouseadifferentAnttasktocompile. concerns of MyGrid. In this isolation, we needed to or- In order to find a deadlock through a test that some- ganizetheapplicationthreadsandminimizetheprobabil- timesgetsblockedandsometimespasses,thedevelopers ity of introducing deadlocks in the evolution of the soft- mustincludetheaspectandexecutethetestmanytimes. ware. Inordertodothat,wehavemodularizedthesolu- Themoreexecutions,themorelikelyaproblemisfound. tion and employed an event-based architecture for com- Ifthereisnodeadlocksuspicion,developerscannormally munication [7]. The salient feature of one event-based execute their tests without this aspect. However, auto- designisthatathreadinamoduleneverwandersintoan- matictools,suchasLinuxcrontabcommand,mustbe othermodule.Thefirststepofthisprocesswastoidentify usedtoinvoketheexecutionofthetestsforseveraltimes theconcernsthatneededtobemodularizedinthebroker. using this aspect to assure all tests are passing and that Then,weimplementedeachconcernwithabettercontrol theyarenotgettingblocked. ofitsthreadsandmadecommunicationbetweenthemuse Theuseofaspectsforthreadsmanagementduringde- solelyevents. Thefollowingconcernshadtobeisolated bugging and testing aided tremendously in the project inMyGrid: scheduling,localgridmachinesprovisioning critic phase. Besides that, they were not very difficult (thelocalGuMP),andtheexecutionofreplicas. to be developed since there was someone already famil- Inourimplementationoftheevent-basedarchitecture, iar with AOP in the team. Using the developed aspects wehavenoticedapatternthatwasrepeatedinmanyappli- 26 AylaDantas,WalfredoCirne,KatiaSaikoski UsingAOPtoBringaProjectBackinShape: TheOurGridCase receivemachinesfromthelocalGuMP,usersmustdefine their grid machines and the way to access them. This is performedbyinvokingtheLocalGridMachineProvider module. We will consider this operation to demonstrate the dynamics of the event-based pattern used in the My- Grid implementation that has proved to ease the evolu- tion of the software. To do this, we will present the dy- namics of a setLocalGuMs call, which is a method from the UIServices interface used to configure the local grid machines of the user. In order to do that, the implementer of this interface contacts a remote object, a GuMManager. Figure 4 illustrates this interaction. Every event to be processed by an EventProcessor implements the ActionEvent interface and therefore presents a process() method. In the creation of each event, such as the SetGuMsRequestEvent, il- lustrated by Figure 4, there must be an argument passed Figure3.UserInterfaceservicesofferedviaMyGrid to the object that will actually perform the action rep- resented by the event. In this figure, this object is the cationmodulesandthatcanprobablybeappliedtoother RequestManager. projects. It can be summarized as follows: the services Withtheevent-basedarchitecturedividedinmodules provided by a module are offered through a Fac¸ade [14] andfollowingapatternthatwasrepeatedinmanyplaces, class. This fac¸ade can be accessed by remote objects, maintenancehadbecomeeasierandthreadsmanagement by other fac¸ades, or by any element of the fac¸ade mod- too. TherewasathreadineachEventProcessorofa ule. Each fac¸ade operation is converted into an event to module and when other internal threads fromeach mod- a module, which is abstracted by the fac¸ade to its user. ule were needed, they only changed the internal state of There is a contract that allows only one event in a mod- themoduleviathemodulefac¸ade(viaanevent). Besides ule to be processed at a time, making the application isolating concerns, we had therefore isolated the threads threadsmorecontrollable. Theseeventsareprocessedby and decreased the number of synchronized blocks (from EventProcessorsclassesandtheyusespecificman- 174 in version 2.1.3 to 110 in version 2.2). Another im- agers from each module, which are classes invoked by portant observation is that many elements of the pattern the events when they are processed. The event process- implementation can be automatically generated, such as ingisperformedusingtheCommanddesignpattern[14] the EventProcessor and the basic structure of each to make the EventProcessor a general entity. The Eventusedonamodule. basic elements of our pattern are therefore: the fac¸ade, AlthoughwehavefocusedontheMyGridpartofthe theeventprocessor,theeventsandthemanagers. Itsdy- OurGrid solution, the separation of concerns principle namicsisexplainednextwithaconcreteexamplethatis was also applied in other parts of OurGrid, making the illustratedbyFigure4. system evolution possible and less stressing than in the In MyGrid, three modules were defined; all of them past. Wecouldhaveavoidedredesignandmadetheisola- implementthepatterndescribedabove: tionmostlyusingAspectJ,forexample,withthecodewe hadinthepast. However,withthissolution,wewouldbe • Scheduler avoiding refactoring, which is necessary in several mo- • LocalGridMachineProvider mentsandcannotbereplacedbyAOPbutaidedbyit. Besides the redesign, another aspect that also made • ReplicaExecutor the software evolution better was the stronger focus on importantXPpracticesthatwerenotbeingfolloweddur- Bywellisolatingthesemodules,wehaveseparatedthree ing our development process, as we have discussed in different aspects of the middleware: scheduling, local Section3.2. gums provisioning, and management of replicas’ execu- tion. TheSchedulermoduleisresponsibleforreceivingthe 6. Evaluation requestsfortheexecution ofjobsandallocatingreplicas ofthetasksdefinedinthesejobsformachines. Thema- InthissectionweevaluatetheresultofapplyingAOP chinescanbeprovidedbythelocalGuMP(theLocalGrid in the redesign of OurGrid. We first present a compari- Machine Provider). After this allocation, the scheduler son between the original and the redesigned versions of invokestheReplicaExecutormoduletomanagetheexe- OurGrid. Then we present the lessons learned from the cutionofachosenreplicaatagivenmachine. Inorderto redesignprocess,evaluatingthebenefitsitbroughttothe 27 AylaDantas,WalfredoCirne,KatiaSaikoski UsingAOPtoBringaProjectBackinShape: TheOurGridCase Figure4.OurGridevent-basedpatterndynamics softwareandtheteam. anarchitecturebasedontheprocessingofevents. Before redesign,itwasreallyhardtoknowwhyadeadlockwas happening. Wehavedesignedtwoaspectstohelpiniden- 6.1. Comparativeanalysis tifying these problems. The ThreadSleeper aspect Thefirstanalysiswepresentisacomparisonbetween and the RunningThreadsMonitor aspect helped us thestructureoftheOurGridcodebeforetheredesignand improvethequalityofourtestswithmultithreadedcode. afterseparatingthecrosscuttingconcerns. However, correcting an identified deadlock was really The aspect that triggered the rework was the thread hard, because we had many synchronization blocks and management problem. Because the tasks of provision- differentobjectsaslocksfortheseblocks. ingmachines,schedulingandmanagementofexecutions Aswecanobserve,areengineeringprocesswasnec- werescatteredacrossthecode, itwasdifficulttocontrol threadsrelatedtoeachtask.Theimplicationwasthecom- essaryinordertobetterisolateOurGridaspects. InAOP plexity in finding deadlocks and correcting them since methodology, the crosscutting concerns are modularized by identifying a clear role for each one in the system, thereweremanysynchronizationblocksanddifferentob- jectsaslocksfortheseblocks. implementing each role in its own module, and loosely coupling each module to only a limited number of other Before redesign, we can summarize the architecture modules[22]. usingFigure5. Ascanbeseen,RemoteObjects(objects that implement the java.rmi.Remote interface) re- After redesign, we created a structure to be followed ceived remote method invocations possibly from differ- by developers, clearly defining the application modules ent threads. The threads that came from method calls and the way these modules should interact. In order to these objects could freely walk along the application to better isolate thread management, which was spread packages. For each thread execution, several synchro- throughoutthecodeusingsynchronizedblocks,wehave nizedblockswerevisited. Inordertoillustratethecom- reduced these blocks and we have used an event-based plexity in extending and debugging OurGrid, Figure 6 architecture. For each module, every invocation per- shows a sequence diagram representing some internal formed to a remote object by a different thread was method calls that were invoked when the user wanted redirected to the correspondent module fac¸ade. The to add a new grid machine to his personal grid. In fac¸ade then created an event, depending on the method thissequenceofcalls,manysynchronizationactionshap- called on it, which would be processed later by the pened and different objects were used as locks. Ini- EventProcessor of that module. The necessary tially,thereisasynchronizationactionontheprocessors classes from the module would be invoked in a secure list managed by the LocalGMProviderImpl class. way since only one event per module would be invoked Then, there are calls to getInstance and isAlive atatime, changingthemodulestateinaconsistentway. methods from GuMStateOracle, which are synchro- Figure8summarizesthearchitectureafterredesigncon- nized. SynchronizedmethodsarealsocalledonthePro- sidering threads execution in a general way. An instan- cessor class (getGridMachine and active). Be- tiation of a thread execution in the redesigned software sides that, there is also a newProcessor call, from wasshowninFigure4. Althoughwecouldobtainabet- theRequestResponderclass,whichisalsosynchro- ter separation of concerns and a better management of nized. application threads, we had some impacts on some met- rics,whichwerecalculatedusingtheEclipseMetricsplu- In order to understand the problem, we made several gin[1]. TheyaresummarizedinTable1. drawingsusingdifferentcolorstoidentifythreadsbehav- ior. In Figure 7, we illustrate one of the drawings that One of the measures considered was the number of helped us understand all synchronized blocks (including classes. It has grown from 277 to 427, representing an methods) of OurGrid and the objects that were used as increaseof54.15%. Thishappenedbecauseweneededa locksintheseblocks. Inordertoisolatethreadsmanage- classforeacheventtobeprocessedinamoduleinsteadof mentintheredesignedversion,developershadtofollow asinglemethodusedinthepreviousversions. However, 28 AylaDantas,WalfredoCirne,KatiaSaikoski UsingAOPtoBringaProjectBackinShape: TheOurGridCase Figure5.OurGridbeforeredesign Figure6.Addingagridmachinebeforeredesign 29 AylaDantas,WalfredoCirne,KatiaSaikoski UsingAOPtoBringaProjectBackinShape: TheOurGridCase Figure7.AttempttounderstandOurGridthreadsbeforeredesign Figure8.OurGridafterredesign 30
Description: