Linköping University Post Print

Message classification as a basis for studying command and control communication: an evaluation of machine learning approaches

Ola Leifler and Henrik Eriksson

N.B.: When citing this work, cite the original article.

The original publication is available at www.springerlink.com:
Ola Leifler and Henrik Eriksson, Message classification as a basis for studying command and control communication: an evaluation of machine learning approaches, 2011, Journal of Intelligent Information Systems.
http://dx.doi.org/10.1007/s10844-011-0156-5
Copyright: Springer Science Business Media http://www.springerlink.com/

Postprint available at: Linköping University Electronic Press
http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-67227

Abstract

In military command and control, success relies on being able to perform key functions such as communicating intent. Most staff functions are carried out using standard means of text communication. Exactly how members of staff perform their duties, who they communicate with and how, and how they could perform better, is an area of active research. In command and control research, there is not yet a single model which explains all actions undertaken by members of staff well enough to prescribe a set of procedures for how to perform functions in command and control. In this context, we have studied whether automated classification approaches can be applied to textual communication to assist researchers who study command teams and analyze their actions.
Specifically, we report the results from evaluating machine learning with respect to two metrics of classification performance: (1) the precision of finding a known transition between two activities in a work process, and (2) the precision of classifying messages similarly to human researchers who search for critical episodes in a workflow.

The results indicate that classification based on text only provides higher precision results with respect to both metrics when compared to other machine learning approaches, and that the precision of classifying messages using text-based classification in already classified datasets was approximately 50%. We present the implications that these results have for the design of support systems based on machine learning, and outline how to practically use text classification for analyzing team communications by demonstrating a specific prototype support tool for workflow analysis.

Keywords Command and control; classification; exploratory sequential data analysis; workflow mining; random indexing; text clustering

O. Leifler · H. Eriksson
Dept. of Computer and Information Science
Linköping University
SE-58183 Linköping, Sweden
Tel. +46 13 28 10 00
E-mail: {ola.leifler,henrik.eriksson}@liu.se

1 Introduction

Although successful command and control is essential to the success of crisis management and military operations, our understanding of how command and control is performed is still limited (Brehmer 2007). Studying command teams presents commanders and researchers with great challenges. First, commanders need to accommodate shifting circumstances and uncertain information about the environment in their work process, which makes the work process inherently dynamic (Klein et al. 1993). Second, the use of electronic communications and new media in command teams yields large amounts of data (e.g. text communications, audio, video, computer logs) that are difficult for researchers to process.
An important aspect of analyzing command and control is to find critical episodes in the workflow that warrant further study. Currently, most analyses of electronic communication in both situated and distributed teamwork are conducted manually through the use of classification schemes (Silverman 2006). A consequence of the significant effort required by manually classifying communications is that only a part of teams' communication patterns can be explored. The prospect of using automatic support for finding relations in command and control communications is therefore appealing.

This paper presents an evaluation of automated approaches for classifying text messages in the workflows of command and control teams by comparing a selection of classifiers with respect to their precision of classifying messages similarly to human experts. Our selection of classification approaches to compare was justified by the requirements of a widely used method for studying command and control, Exploratory Sequential Data Analysis (ESDA) (Sanderson and Fisher 1994). The results from our evaluation are twofold: first, we identify a classification approach which is suitable for use in an ESDA application, and second, based on the precision results attained, we outline how the classification approach could be used to support the study of command and control workflows.

In Section 2, we describe command and control research and the rationale for investigating machine learning approaches for supporting it. Section 3 presents research on extracting patterns related to workflows and related concepts from texts. In Section 4 we present the specific datasets to which we have applied our classification approaches. We present our classification approaches in Section 5 and the results of classifying messages in Section 6. Based on these results, we discuss their implications for the design of support tools for analyzing command and control communications and present an implementation that uses automatic classification of text messages in Section 7, and Section 8 concludes this paper.
2 Background

Command and control researchers investigate how groups and group members perform their tasks, identify performance measures for the group and study how they could improve their performance (Brehmer 2007). There are several frameworks for understanding teams and teamwork (e.g. Argyle 1972; Salas et al. 2008). A common representation of team workflows is to use graphs, where nodes represent tasks and arcs denote transitions between tasks. Such graph-based workflow models have been suggested for the analysis and support of the coordination of work in various professional settings (Medina-Mora et al. 1992; van der Aalst and van Hee 2002).

One example of a workflow model that aims to describe how members of command teams perform their tasks is the Dynamic Observe-Orient-Decide-Act (DOODA) model in Figure 1.

Fig. 1 The Dynamic Observe-Orient-Decide-Act loop by Brehmer as an abstract model of a workflow in command and control with tasks and transitions between them (Brehmer 2005).

DOODA describes a set of tasks with transitions from one task to another. These tasks can be overlapping or iterating, such as the tasks of sensemaking (Weick 1995) and data collection in DOODA. At one point, however, there is a transition from sensemaking to planning, when the commander's intent is formulated and communicated to subordinate units. Irrespective of whether this model accurately describes command and control at a sufficient level of detail for correlating the activities in the model to the observable activities in a command staff, the model could be used as a hypothesis for analyzing staff work. If we believe, according to a model such as DOODA, that the staff should begin with data collection, and we know that messages of certain types denote a transition to the sensemaking step in the DOODA process, then checking for the absence or presence of such types of messages would be part of a researcher's work of establishing performance measures for a command staff.
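The graph representation described in this section can be illustrated with a small sketch. It is not taken from the paper: the graph below contains only the three tasks named in the text (data collection, sensemaking, planning), the edge set is an assumption read off the prose, and the function name find_transitions is hypothetical. It shows how a sequence of task-labelled messages could be scanned for transitions and checked against the hypothesized workflow.

```python
# Hypothetical workflow graph: nodes are tasks, arcs are allowed transitions.
# Only tasks named in the text are included; the real DOODA model has more.
WORKFLOW = {
    "data collection": {"sensemaking"},
    "sensemaking": {"data collection", "planning"},
    "planning": set(),
}


def find_transitions(task_sequence):
    """Return (index, from_task, to_task, allowed) for every change of task.

    `allowed` is True when the workflow graph contains the arc.
    """
    transitions = []
    for i in range(1, len(task_sequence)):
        previous, current = task_sequence[i - 1], task_sequence[i]
        if previous != current:
            allowed = current in WORKFLOW.get(previous, set())
            transitions.append((i, previous, current, allowed))
    return transitions


if __name__ == "__main__":
    # Illustrative sequence of task labels, one per message.
    sequence = ["data collection", "data collection", "sensemaking", "planning"]
    for transition in find_transitions(sequence):
        print(transition)
```

A transition flagged as not allowed (for example, data collection directly to planning) would be a candidate episode for closer manual study rather than evidence of an error.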
In general, we can interpret the task of understanding command and control as three separate tasks. First, understanding how command teams and team members perform their tasks means constructing a general workflow model such as DOODA from command and control scenarios. Second, establishing direct performance measures is synonymous with relating the workflow model to the estimated outcome of scenarios as defined by indirect measurements of scenario outcome (for example, performance scores in computer simulations (Johansson et al. 2003) or evaluations by human experts of team performance in role-playing exercises (Jensen 2009)). Third, improving performance is equal to, in each particular scenario, using those performance measures to relate staff actions to the proposed workflow. The process by which researchers establish a workflow and relate staff actions to it from recorded scenario data is based on two principal activities: (1) labeling communication acts with a categorization scheme (Thorstensson et al. 2001), and (2) looking for higher level patterns of episodes (tasks) with the labelled communication acts to focus the search for critical points that have affected the outcome of the scenario (Sanderson and Fisher 1994; Albinsson et al. 2004).

The work of labeling messages according to message categories is the most time-consuming step, with vast quantities of communication data to sift through iteratively, first searching for commonalities that can lead to classification schemes, and later applying classification schemes to all utterances and reducing the amount of data to a set of episodes based on the classification. This is also the activity for which we evaluate the use of machine learning techniques.

3 Related work

The problem of inferring activities from text-based communications has been studied previously by Kushmerick and Lau (2005). Their approach was based on searching for specific syntactic patterns originating from the use of computer software (e-commerce systems). Those patterns were in turn used to organize messages into workflows.
Patterns originating from the use of computer systems have also been studied by the workflow management community (van der Aalst et al. 2003), where workflows have been elicited from interactions with workflow management systems or other software systems. Both these approaches concern the mining of machine-generated patterns, not patterns originating from human activities.

Regarding the recognition of human activities from text, Scerri et al. (2008) have proposed a model for human workflow management in a semantic desktop environment that relies on the detection or tagging of speech acts in e-mail. Their approach is based on speech act recognition performed by a speech act extraction web service which uses grammar patterns for detecting speech acts. Their stated application is to support individuals by monitoring unresolved issues in e-mail conversations such as unanswered questions. Other researchers have described an approach to workflow mining from unstructured data which relies on the existence of a fixed, known number of activity types or named entities in messages for determining which activity a message pertains to (Wen et al. 2009; Geng et al. 2009). Mainly, however, the problem of extracting patterns from e-mail has been studied for the purpose of filtering spam (Sahami et al. 1998), which is essentially equivalent to considering whether a message is at all related to any kind of activity the user is engaged in.

Several projects have attempted to elicit patterns of a domain-specific discourse, mainly from questions and responses sent between customers and company support lines for the purpose of helping customer support identify previous, relevant answers to new questions (e.g. Larsson and Jönsson 2009; Chalamalla et al. 2008).

In document management, researchers have studied approaches to relate specific domain knowledge in the form of concepts, objects and relations to textual documents (McDowell and Cafarella 2006; Eriksson 2007), and based on such semantic documents, some projects have studied how to create support for information management in team workflows by using domain-specific document features (Franz et al. 2007; Leifler and Eriksson 2009).
4 Material

We used three datasets to establish how well machine learning approaches would classify messages compared to human classification. Our datasets came from three command and control scenarios (labeled ALFA-05, C3Fire-05 and LKS after the projects from which they originate) in which crisis management teams had used free text-based means of communication for coordinating their work (fending off forest fires in ALFA-05 and C3Fire-05, and defending against information warfare in LKS). In all settings, the participants engaged in activities they were likely to encounter in their profession, and the settings used had authentic chains of command and scenario descriptions. The tasks in each scenario were conducted as simulated exercises where the participants collaborated in teams to solve a task. Their performance had been assessed by the staff leading the exercises, which in all cases consisted of researchers studying team performances.

In Table 1 we list the attributes made available to the classifiers we studied. Some of the attributes were derived from other attributes and reflected what we believed was relevant for human classification of the messages in each dataset. The Message direction attribute is calculated using the algorithm in Figure 2, which follows the convention of the compareTo method available in Java and other programming languages.

Table 1 Non-text attributes used for message classification.
    Attribute                 Description                                        Available in (ALFA-05, C3Fire-05, LKS)
    Sender                    Text                                               all three datasets
    Recipient                 Text                                               all three datasets
    Sender level¹             {0, ..., 4}, low values represent high rank        two datasets
                              in the organization
    Recipient level¹          {0, ..., 4}, low values represent high rank        two datasets
                              in the organization
    Question marks present¹   {true, false}                                      two datasets
    Message direction¹        {-1, 0, 1}, calculated as the normalized           one dataset
                              difference between the sender level and
                              recipient level
    Message text              Text                                               all three datasets
    Message time              Date                                               all three datasets
    Message type              Nominal decision attribute                         all three datasets

    def get_message_direction(instance):
        # Sign of the level difference between sender and recipient.
        # Low level values represent high rank (see Table 1).
        direction = instance.sender_level - instance.recipient_level
        if direction > 0:
            return 1
        elif direction < 0:
            return -1
        else:
            return 0

Fig. 2 Algorithm for calculating the Message direction dataset attribute.

4.1 ALFA-05

The ALFA-05 dataset consisted of 849 text messages exchanged between seven commanders in a simulated crisis response scenario (Trnka et al. 2006). During the scenario, commanders operated at three levels of command, in two administrative areas (approximately county-sized areas) and played a role-playing simulation exercise (Trnka and Jenvald 2006) in which there was initially a forest fire but subsequently also an evacuation from a zoo as well as a search and rescue operation. Participants communicated with one another through a text-based messaging system designed for use in micro-world simulations with the C3Fire simulation environment (Johansson et al. 2003). The system shared the basic features of an e-mail messaging system, but without subject lines or other auxiliary e-mail headers. The scenario was played over the course of one day.
Each message had been assigned one of 19 different classes by hand. These classes fall into four speech-act-related categories tied to the functions of command and control (Trnka et al. 2006). The four categories were questions, information, commands and other messages (the Message type in Table 1). When researchers had looked for patterns in the ALFA-05 dataset, they had studied both the general proportions of messages of each class sent to and from the participants in the scenario and specific sequences of speech acts, such as whether a set of information and question-labelled message exchanges had preceded a command.

4.2 C3Fire-05

The C3Fire-05 dataset was similar to ALFA-05 with regard to the scenario played and the categorization used. It consisted of 619 messages. One of the main differences was that it was categorized by two independent researchers with a 77.86% agreement between the two on which category to assign each message (the agreement was 87.02% when considering only the four main categories described in the section above). Only those messages which had been classified similarly by the two researchers were selected for classifier comparison. The other main difference compared to ALFA-05 was the participants of the study, who were domain experts in the ALFA-05 scenario and students in the C3Fire-05 scenario.

4.3 LKS

The LKS dataset consisted primarily of 113 e-mail messages exchanged during a training exercise concerning information warfare at the Swedish Defense Research Institute. All participants were experts in the domain and the exercise served the dual purpose of being an exercise for them as well as a study of performance indicators in command and control. The scenario was role-played over the course of two days. The participants received instructions from their higher command to engage in intelligence operations during the first day to find information about, locate, and monitor potential terrorists, and to repel threats during an evacuation of a VIP during the second day of their operation.
Due to these instructions, we categorized the e-mail exchanges pertaining to the first day as intelligence and those from the second as evacuation, which was consistent with the expected outcome of the exercise. The manual classifications of both datasets were used as validation of the automatic classification approaches we report in this paper.

5 Method

To verify that the information in messages could be used for distinguishing contextually significant classes of messages from one another¹ consistent with how command and control researchers would classify messages, we added meta-data to our datasets that we believed to be relevant to classification. With these datasets, we conducted a comparison between several classification approaches by using standard methods for evaluating machine learning algorithms.

¹ Such as identifying the two tasks in the LKS dataset or the message classes related to speech acts in the ALFA-05 and C3Fire-05 datasets.

Messages in a military command and control workflow usually contain domain-specific attributes such as the rank and role of participants. Also, researchers may classify according to the appearance of question marks and the grammatical structure of messages. To understand how these attributes affect automated classification, we compared the impact on classification results of encoding these attributes as part of the message instances. The appearance of question marks became a binary attribute available to non-text classifiers while the grammatical structure was made available to a String Subsequence Kernel-based classifier (see Section 5.2). We also evaluated the relative significance of non-text attributes in relation to the text by using a combined classifier that would use a text-based classifier and a non-text classifier in combination for classification. The combined classifier would also provide information on the relative contributions of a non-text classifier compared to a text-based one.

Table 2 Frequency of messages in each of the four message categories of the ALFA-05 dataset.

    Category          Proportion
    Questions         23%
    Information       39%
    Orders            17%
    Other messages    24%
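As an illustration of how derived attributes such as Question marks present and Message direction (Table 1, Figure 2) could be encoded as part of a message instance, the following is a minimal sketch. The Message class, its field names and the derive_attributes function are hypothetical and only mirror the attribute names used in the text; the paper does not describe its own data structures.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical message record; field names follow Table 1."""
    sender: str
    recipient: str
    sender_level: int      # 0..4, low values represent high rank
    recipient_level: int   # 0..4, low values represent high rank
    text: str


def derive_attributes(message):
    """Compute the two derived non-text attributes for one message."""
    difference = message.sender_level - message.recipient_level
    return {
        # Binary attribute made available to the non-text classifiers.
        "question_marks_present": "?" in message.text,
        # Sign of the level difference, as in Figure 2: -1, 0 or 1.
        "message_direction": (difference > 0) - (difference < 0),
    }


if __name__ == "__main__":
    # Illustrative message, not taken from any of the datasets.
    example = Message("LKC E-ln", "RL Nkpng", 1, 3, "Where is the fire?")
    print(derive_attributes(example))
```

Keeping the derived values in a separate dictionary leaves the raw message untouched, so the same record can be passed to a text-based classifier and a non-text classifier alike.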
Apart from domain-specific message attributes which are likely to influence human classifications of messages, we considered the influence of a numerical attribute with statistically significant differences of attribute values across the categories of messages: message length. To establish whether a significant difference in message lengths would be used by a classifier when building a classifier model, we studied whether a standard discretization approach (Fayyad and Irani 1992) (as required by the classifiers we evaluated) would generate meaningful nominal interval values and if so, what precision results the classifiers would attain.

We also considered the precision of a random classifier and used that as a baseline for comparing the results of using our selected classification algorithms. If our classifier would not find a meaningful distance measure for the purpose of classifying with respect to message categories (in the ALFA-05 dataset) or belonging to different stages in the scenario workflow (in the LKS dataset), the classifier would basically choose a class at random. The precision it could attain for each decision class could then be described as a function of the proportion of instances of each decision class in the training data.

The precision of the algorithm is expressed as the number of times the algorithm answers correctly, divided by the total number of questions asked. Thus, it is the sum of the number of correct classifications with respect to each of the classes, divided by the total number of messages. A completely random classifier chooses class $c_i$ in proportion to its frequency and is then correct with that same probability. Given a dataset $U$ and a function $d$ for mapping messages to the domain of decision classes $\{c_1, c_2, \ldots, c_l\}$, where the size of each class is $|c_i| = |\{x \in U : d(x) = c_i\}|$, such a classifier would attain the precision given by Equation 1.

\[ \sum_{i=1}^{l} \left( \frac{|c_i|}{|U|} \right)^2 \qquad (1) \]

The LKS dataset consisted of two classes, roughly evenly distributed with 61 messages from day one and 52 from day two.
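The random baseline of Equation 1 can be checked numerically with a few lines of code. This is a minimal sketch: the function name random_precision is hypothetical, the LKS class counts are the ones stated in the text, and the ALFA-05 proportions are the values printed in Table 2.

```python
def random_precision(class_sizes):
    """Equation 1: sum over all classes of (|c_i| / |U|) squared."""
    total = sum(class_sizes)
    return sum((size / total) ** 2 for size in class_sizes)


# LKS: 61 messages from day one, 52 from day two.
lks_baseline = random_precision([61, 52])

# ALFA-05: proportions as printed in Table 2, squared and summed directly.
# (As printed, the four proportions add up to 1.03.)
alfa_baseline = sum(p ** 2 for p in (0.23, 0.39, 0.17, 0.24))

if __name__ == "__main__":
    print(f"LKS random precision:     {lks_baseline:.4f}")
    print(f"ALFA-05 random precision: {alfa_baseline:.2f}")
```

Any classifier whose precision stays close to these values on the respective dataset is doing no better than guessing from the class distribution.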
The random precision would be $(61/113)^2 + (52/113)^2 = 0.5032$, close to 50%. Given the distribution of decision classes in Table 2, the random precision attainable in the ALFA-05 dataset was $0.23^2 + 0.39^2 + 0.17^2 + 0.24^2 = 0.29$. Classification results of approximately 29% in ALFA-05 would therefore be attributed to the distribution of messages and not to the message contents. For the C3Fire-05 dataset, the distribution of classes was more even for both sets of classifications from the two researchers, resulting in random precision of 24.04% and 25.07% respectively.

When evaluating the different approaches to classify messages, we used a stratified cross-validation (Witten and Frank 2005) on each dataset. To accommodate the execution times of text-based classification, we decided to use a 3-times 3-fold stratified cross-validation on our datasets for evaluation. The results were stable when confirmed with a train-and-test procedure on each dataset.

Table 3 Message lengths in all categories

              Information    Commands       Questions      Other
    Mean      110 ± 78.94    76 ± 93.18     90 ± 68.56     55 ± 60.18
    Median    92             58             68             32

5.1 Message lengths

The messages from the four main categories of the ALFA-05 dataset were compared with one another with respect to the lengths of the messages in each category. Since the different message categories contained a different number of messages and the message lengths could not be assumed to be normally distributed, we compared the differences with a non-parametric Mann-Whitney U-test. All categories of messages were compared to one another pairwise. All pairs of categories displayed significant differences in message lengths (p < 0.002) and the mean and median values differed as outlined in Table 3, along with standard deviations from the means.

5.2 Classifier selection

The classification schemes we used for both text classification and non-text classification on our datasets were selected based on two primary criteria:

1. the models built as part of learning patterns in data should be accessible to human inspection, and
2. they should be computationally tractable for interactive use in both scenarios.
The first criterion, accessibility, was considered important because of the prospect of using the resulting classifier model as a basis for a support tool for command and control researchers. In ESDA, exploration means using various data sources in combination to detect patterns of team activity. For a computer-based support tool in this process, establishing trust is critical, and understanding the basis for making classifications could even be more important than high precision for classification, depending on the role of a classifier. The second criterion, computational tractability for interactive use, was considered important for the practical use of automatic classification. In data exploration tools such as MacSHAPA (Sanderson et al. 1994) and MIND (Thorstensson et al. 2001), researchers navigate scenario data looking for critical episodes by scanning a timeline according to which all scenario data is logged to find incidents that are important for further study. When using such tools, researchers expect interaction with data to be smooth and allow fast manipulations due to the labor-intensive task of finding critical episodes. For an automatic classifier to contribute in such exploration, it would have to build a classifier model fast enough not to interrupt the closer study of data.

Based on these criteria, we selected a text-based classification scheme that would connect important terms as well as the relationships between terms during the process of classification, with the intention of using those terms as part of a workflow analysis tool. Also, it would have to handle the datasets we had with little computational overhead. We therefore decided to use the Random Indexing (RI) (Kanerva et al. 2000) vector space model as the primary method of text classification.
RI assigns random vectors of a fixed dimensionality to words and texts to create the vector model for measuring similarity between texts (Kanerva et al. 2000). Prior to building the RI model, we filtered the messages so that commonly used, domain-independent words (stop words) would not taint our results. In addition to the RI-based text classification method, a String Subsequence Kernel was also used for analyzing the grammatical structure of messages (see Section 5.4) and for comparison of the RI text classification results. For non-text classification, we used four different classifiers, representing four classes of inference mechanisms:

1. J48, a classifier based on decision trees (Quinlan 1993)
2. a Decision Table classifier (Kohavi 1995)
3. PART, a rule-based classifier (Frank and Witten 1998)
4. a Naive Bayes classifier (John and Langley 1995)

    Questionmark present = true: 1
    Questionmark present = false
    |   Sender = RL Nkpng: 2
    |   Sender = LKC E-ln
    |   |   Recipient_level <= 3
    |   |   |   Time <= 1133433043000
    |   |   |   |   Time <= 1133432780000: 2
    |   |   |   |   Time > 1133432780000: 3
    |   |   |   Time > 1133433043000
    |   |   |   |   Recipient = Ambu E-ln: 2
    |   |   |   |   Recipient = Patruller E-ln: 2
    |   |   |   |   Recipient = RL Nkpng: 2
    |   |   |   |   Recipient = LKC D-ln
    |   |   |   |   |   Time <= 1133436580000: 4
    |   |   |   |   |   Time > 1133436580000: 2

Fig. 3 Part of the decision tree generated by the J48 classifier on the ALFA-05 dataset

The first three classifiers were selected based on the accessibility of the models they construct, and the fourth, the Bayesian classifier, was selected due to previously reported results on classifying messages with respect to workflow-related activity types (Geng et al. 2009) with a Bayesian classifier. We conducted the evaluation within the WEKA knowledge analysis framework (Hall et al. 2009), within which we also implemented an RI-based text classifier.
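To make the idea behind an RI-based text classifier more concrete, the following is a simplified sketch and not the authors' WEKA implementation. Several things are assumptions made for illustration: the dimensionality and sparsity constants, the tiny stop word list, representing a text as the plain sum of its words' random index vectors (full Random Indexing also accumulates context vectors from co-occurring words), and classifying by cosine similarity to per-class centroid vectors.

```python
import math
import random
from collections import defaultdict

DIM = 512      # assumed vector dimensionality
NONZERO = 8    # assumed number of +1/-1 entries per index vector
STOPWORDS = {"the", "a", "to", "of", "and", "is", "in", "at"}  # illustrative only


def index_vector(word):
    """Sparse random vector for a word, reproducible via a word-derived seed."""
    rng = random.Random("ri:" + word)
    vector = [0] * DIM
    for position in rng.sample(range(DIM), NONZERO):
        vector[position] = rng.choice((-1, 1))
    return vector


def text_vector(text):
    """Sum of the index vectors of all non-stop words in the text."""
    vector = [0] * DIM
    for word in text.lower().split():
        if word in STOPWORDS:
            continue
        for i, value in enumerate(index_vector(word)):
            vector[i] += value
    return vector


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def train(labelled_texts):
    """Build one centroid vector per class from (text, label) pairs."""
    centroids = defaultdict(lambda: [0] * DIM)
    for text, label in labelled_texts:
        for i, value in enumerate(text_vector(text)):
            centroids[label][i] += value
    return dict(centroids)


def classify(centroids, text):
    """Return the label whose centroid is most similar to the text."""
    vector = text_vector(text)
    return max(centroids, key=lambda label: cosine(centroids[label], vector))
```

Because each class is summarized by a single vector, building the model is a single pass over the messages, which fits the requirement of interactive use; the terms contributing most to a centroid can also be listed for inspection.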
It deserves to be noted that, due to the relatively small datasets available for training the classifiers, we assumed that there would be differences in precision which could be attributed to classifier selection, apart from the differences in what models they build. For larger datasets, it has been argued that classifier selection may become less important (Banko and Brill 2001), which is why the issue of evaluating classifiers may make more sense with smaller datasets.

When studying how accessible the models produced by the classifiers were, we tried to elicit the heuristics that the classifier models expressed in order to establish whether they were sound compared to how human experts would reason. The decision tree in Figure 3 shows, in ASCII format, a number of decision branches where the shortest branch indicates that the presence of a question mark should classify the message as being a question (Message type 1 in the ALFA-05 dataset). Then, there are a number of conditions in the tree which correspond to combinations of sender, the organizational level of the recipient and time, which can be explained by the change in interactions between the command center and the field units during the scenario. Early in the scenario, most exchanges concerned information exchanges (Message type 2), whereas later exchanges, initiated by higher command, concerned orders (Message type 3).
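One way to see why such a tree is accessible to human inspection is to write the excerpt in Figure 3 out as nested conditionals. The sketch below transcribes only the branches visible in the figure; the function name and parameter names are hypothetical, and messages that fall outside the excerpt return None because the rest of the tree is not shown.

```python
def classify_by_figure3(question_mark_present, sender, recipient,
                        recipient_level, time_ms):
    """Message type (1-4) according to the Figure 3 excerpt, else None."""
    if question_mark_present:
        return 1  # shortest branch: a question mark means a question
    if sender == "RL Nkpng":
        return 2
    if sender == "LKC E-ln" and recipient_level <= 3:
        if time_ms <= 1133433043000:
            # Early exchanges: information first, then orders.
            return 2 if time_ms <= 1133432780000 else 3
        if recipient in ("Ambu E-ln", "Patruller E-ln", "RL Nkpng"):
            return 2
        if recipient == "LKC D-ln":
            return 4 if time_ms <= 1133436580000 else 2
    return None  # branch not shown in the excerpt


if __name__ == "__main__":
    # A message containing a question mark is a question regardless of sender.
    print(classify_by_figure3(True, "LKC E-ln", "RL Nkpng", 2, 1133432000000))
```

Read this way, each return statement corresponds to one leaf of the tree, which makes it straightforward to compare individual rules with how a human analyst would reason.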