Word Storms: Multiples of Word Clouds for Visual Comparison of Documents

Quim Castella                          Charles Sutton
School of Informatics                  School of Informatics
University of Edinburgh                University of Edinburgh
[email protected]            [email protected]

ABSTRACT
Word clouds are a popular tool for visualizing documents, but they are not a good tool for comparing documents, because identical words are not presented consistently across different clouds. We introduce the concept of word storms, a visualization tool for analysing corpora of documents. A word storm is a group of word clouds, in which each cloud represents a single document, juxtaposed to allow the viewer to compare and contrast the documents. We present a novel algorithm that creates a coordinated word storm, in which words that appear in multiple documents are placed in the same location, using the same color and orientation, in all of the corresponding clouds. In this way, similar documents are represented by similar-looking word clouds, making them easier to compare and contrast visually. We evaluate the algorithm in two ways: first, an automatic evaluation based on document classification; and second, a user study. The results confirm that, unlike standard word clouds, a coordinated word storm better allows for visual comparison of documents.

Categories and Subject Descriptors
H.5 [Information Search and Retrieval]: Information Interfaces and Presentation

1. INTRODUCTION
Because of the vast number of text documents on the Web, there is a demand for ways to allow people to scan large numbers of documents quickly. A natural approach is visualization, under the hope that visually scanning a picture may be easier for people than reading text. One of the most popular visualization methods for text documents is the word cloud. A word cloud is a graphical presentation of a document, usually generated by plotting the document's most common words in two-dimensional space, with each word's frequency indicated by its font size. One of the most popular cloud generators, Wordle, has generated over 1.4 million clouds that have been publicly posted [6].

Despite their popularity for visualizing single documents, word clouds are not useful for navigating groups of documents, such as blogs or Web sites. The key problem is that word clouds are difficult to compare visually. For example, say that we want to compare two documents, so we build a word cloud separately for each document. Even if the two documents are topically similar, the resulting clouds can be very different visually, because the shared words between the documents are usually scrambled, appearing in different locations in each of the two clouds. The effect is that it is difficult to determine which words are shared between the documents.

In this paper, we introduce the concept of word storms to afford visual comparison of groups of documents. Just as a storm is a group of clouds, a word storm is a group of word clouds. Each cloud in the storm represents a subset of the corpus. For example, a storm might contain one cloud per document, or alternatively one cloud to represent all the documents written in each year, or one cloud to represent each track of an academic conference, etc. Effective storms make it easy to compare and contrast documents visually. We propose several principles behind effective storms, the most important of which is that similar documents should be represented by visually similar clouds. To achieve this, algorithms for generating storms must perform layout of the clouds in a coordinated manner.

We present a novel algorithm for generating coordinated word storms. Its goal is to generate a set of visually appealing clouds, under the constraint that if the same word appears in more than one cloud in the storm, it appears in a similar location. Interestingly, this also allows a user to see when a word is not in a cloud: simply find the desired word in one cloud and check the corresponding locations in all the other clouds.
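The coordination constraint described above can be made concrete with a small data model. This is a minimal illustrative sketch, not the authors' implementation; all type and function names here are our own invention:

```python
from dataclasses import dataclass

@dataclass
class WordGlyph:
    text: str
    x: float          # position within the cloud's frame
    y: float
    size: float       # font size, proportional to term weight
    color: str
    horizontal: bool  # orientation

# A cloud is a dict mapping word -> WordGlyph; a storm is a list of clouds.

def shared_words(storm):
    """Words appearing in more than one cloud of the storm."""
    counts = {}
    for cloud in storm:
        for w in cloud:
            counts[w] = counts.get(w, 0) + 1
    return {w for w, c in counts.items() if c > 1}

def is_coordinated(storm, tol=1e-6):
    """True if every shared word occupies the same position in all
    clouds that contain it (the coordination constraint)."""
    for w in shared_words(storm):
        spots = [(c[w].x, c[w].y) for c in storm if w in c]
        x0, y0 = spots[0]
        if any(abs(x - x0) > tol or abs(y - y0) > tol for x, y in spots):
            return False
    return True
```

A viewer exploiting this property can find a word in one cloud and check the same coordinates in every other cloud; `is_coordinated` simply expresses that guarantee as a predicate.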
Word clouds have the advantages that they are easy for naive users to interpret and that they can be aesthetically surprising and pleasing; our algorithm aims to preserve these qualities. At a technical level, it combines the greedy randomized layout strategy of Wordle, which generates aesthetically pleasing layouts, with an optimization-based approach that maintains coordination between the clouds. The objective function in the optimization measures the amount of coordination in the storm and is inspired by the theory of multidimensional scaling.

We apply this algorithm to a variety of text corpora, including academic papers and research grant proposals. We evaluate the algorithm in two ways. First, we present a novel automatic evaluation method for word storms based on how well the clouds, represented as vectors of pixels, serve as features for document classification. The automatic evaluation allows us to rapidly compare different layout algorithms; it is not specific to word clouds and may be of independent interest. Second, we present a user study in which users are asked to examine and compare the clouds in a storm. Both experiments demonstrate that a coordinated word storm is dramatically better than independent word clouds at allowing users to visually compare and contrast documents.

In submission. Last modified 3 Jan 2013.

2. DESIGN PRINCIPLES
In this section we introduce the concept of a word storm, describe different types of storms, and present design principles for effective storms.

A word storm is a group of word clouds constructed for the purpose of visualizing a corpus of documents. In the simplest type of storm, each cloud represents a single document by creating a summary of its content; hence, by looking at the clouds, a user can form a quick impression of the corpus's content and analyse the relations among the different documents.

We build on word clouds in our work because they are a popular way of visualising single documents. They are very easy to understand and they have been widely used to create appealing figures. By building a storm based on word clouds, we create an accessible tool that can be understood easily and used without requiring a background in statistics. The aim of a word storm is to extend the capabilities of a word cloud: instead of visualizing just one document, it is used to visualize an entire corpus.

There are two high-level design motivations behind the concept of word storms. The first is to visualize high-dimensional data in a high-dimensional space. Many classical visualization techniques are based on dimensionality reduction, i.e., mapping high-dimensional data into a low-dimensional space. Word storms take an alternative strategy, mapping high-dimensional data into a different high-dimensional space, but one which is tailored for human visual processing. This is a similar strategy to approaches like Chernoff faces [2]. The second design motivation is the principle of small multiples [12, 11], in which similar visualizations are presented together in a table so that the eye is drawn to the similarities and differences between them. A word storm is a small multiple of word clouds. This motivation strongly influences the design of effective clouds, as described in Section 2.3.

2.1 Types of Storms
Different types of storms can be constructed for different data analysis tasks. In general, the individual clouds in a storm can represent a group of documents rather than a single document. For example, a cloud could represent all the documents written in a particular month, or that appear in a particular section of a website. It would be typical to do this by simply merging all of the documents in each group, and then generating the storm with one cloud per merged document. This makes the storm a flexible tool that can be used for different types of analysis, and it is possible to create different storms from the same corpus and obtain different insights on it. Here are some example scenarios:

1. Comparing Individual Documents. If the goal is to compare and contrast individual documents in a corpus, then we can build a storm in which each word cloud represents a single document.

2. Temporal Evolution of Documents. If we have a set of documents that have been written over a long period, such as news articles, blog posts, or scientific documents, we may want to analyze how trends in the documents have changed over time. This is achieved using a word storm in which each cloud represents a time period, e.g., one cloud per week or per month. By looking at the clouds sequentially, the user can see the appearance and disappearance of words and how their importance changes over time.

3. Hierarchies of Documents. If the corpus is arranged in a hierarchy of categories, we can create a storm which contains one cloud for each of the categories and subcategories. This allows for hierarchical interaction, in which for every category of the topic hierarchy, we have a storm that contains one cloud for each subcategory. For instance, this structure can be useful in a corpus of scientific papers. At the top level, we would first have a storm that contains one cloud for each scientific field (e.g., chemistry, physics, engineering); then for each field, we also have a separate storm that includes one cloud for each subfield (such as organic chemistry, inorganic chemistry), and so on until arriving at the articles. An example of this type of storm is shown in Figures 2 and 3.

To keep the explanations simple, when describing the algorithms later on, we will assume that each cloud in the storm represents a single document, with the understanding that the "document" in this context may have been created by concatenating a group of documents, as in the storms of types 2 and 3 above.

2.2 Levels of Analysis of Storms
A single word storm allows the user to analyse the corpus at a variety of different levels, depending on what type of information is of most interest, such as:

1. Overall Impression. By scanning the largest terms across all the clouds, the user can form a quick impression of the topics of the whole corpus.

2. Comparison of Documents. As the storm displays the clouds together, the user can easily compare them and look for similarities and differences among the clouds. For example, the user can look for words that are much more common in one document than another. Also, the user can compare whether two clouds have similar shape, to gauge the overall similarity of the corresponding documents.

3. Analysis of Single Documents. Finally, the clouds in the storm have meaning in themselves. Just as with a single word cloud, the user can analyze an individual cloud to get an impression of a single document.

2.3 Principles of Effective Word Storms
Because they support additional types of analysis, principles for effective word storms are different than those for individual clouds. This section describes some desirable properties of effective word storms.

First of all, each cloud should be a good representation of its document. That is, each cloud ought to emphasize the most important words so that the information that it transmits is faithful to its content. Each cloud in a storm should be an effective visualization in its own right.

Further principles follow from the fact that the clouds should also be built taking into account the roles they will play in the complete storm. In particular, clouds should be designed so that they are effective as small multiples [11, 12], that is, they should be easy to compare and contrast. This has several implications. First, clouds should be similar so that they look like multiples of the same thing, making the storm a whole unit. Because the same structure is maintained across the different clouds, they are easier to compare, so that the viewer's attention is focused on the differences among them. A related implication is that the clouds ought to be small enough that viewers can analyze multiple clouds at the same time without undue effort.

The way the clouds are arranged and organised on the canvas can also play an important role, because clouds are probably more easily compared to their neighbours than to more distant clouds. This suggests a principle that clouds in a storm should be arranged to facilitate the most important comparisons. In the current paper, we take a simple approach to this issue, simply arranging the clouds in a grid, but in future work it might be a good option to place similar clouds closer together so that they can be more easily compared.

A final, and perhaps the most important, principle is one that we will call the coordination of similarity principle. In an effective storm, visual comparisons between clouds should reflect the underlying relationships between documents: similar documents should have similar clouds, and dissimilar documents should have visually distinct clouds. This principle has particularly strong implications. For instance, to follow it, words should appear in a similar font and similar colours when they appear in multiple clouds. More ambitiously, words should also have approximately the same position across all of the clouds in which they appear.

Following the coordination of similarity principle can significantly enhance the usefulness of the storm. For example, a common operation when comparing word clouds is finding and comparing words between the clouds, e.g., once a word is spotted in one cloud, checking whether it also appears in other clouds. By displaying shared words in the same color and position across clouds, it is much easier for a viewer to determine which words are shared across clouds, and which words appear in one cloud but not in another. Furthermore, making common words look the same tends to cause the overall shapes of the clouds of similar documents to appear visually similar, allowing the viewer to assess the degree of similarity of two documents without needing to fully scan the clouds.

Following these principles presents a challenge for algorithms that build word storms. Existing algorithms for building single word clouds do not take into account relationships between multiple clouds in a storm. In the next sections we propose new algorithms for building effective storms.

Figure 1: Papers from the ICML 2012 conference. These 8 clouds, (a)-(h), represent the papers in the Optimization Algorithms track.

3. CREATING A SINGLE CLOUD
In this section, we describe the layout algorithm for single clouds that we will extend when we present our new algorithm for word storms. The method is based closely on that of Wordle [6], because it tends to produce aesthetically pleasing clouds.

Formally, we define a word cloud as a set of words W = {w_1, ..., w_M}, where each word w ∈ W is assigned a position p_w = (x_w, y_w) and visual attributes that include its font size s_w, color c_w and orientation o_w (horizontal or vertical).

To select the words in a cloud, we choose the top M words from the document by term frequency, after removing stop words. A more general measure of the weight of each term, such as tf*idf, could be used instead; for this reason we use "term weight" to refer to whatever measure we have selected for the importance of each term. The font size is set proportional to the term's frequency, and the color and orientation are selected randomly.

Choosing the word positions is more complex, because the words must not overlap on the canvas. We use the layout algorithm from Wordle [6], which we will refer to as the Spiral Algorithm. This algorithm is greedy and incremental; it sets the location of one word at a time in order of weight. In other words, at the beginning of the i-th step, the algorithm has generated a partial word cloud containing the i-1 words of largest weight. To add a word w to the cloud, the algorithm places it at an initial desired position p_w (e.g., chosen randomly). If at that position w does not intersect any previous words and is entirely within the frame, we go on to the next word. Otherwise, w is moved one step outwards along a spiral path. The algorithm keeps moving the word along the spiral until it finds a valid position, that is, one where it does not overlap and is inside the frame. Then it moves on to the next word. This procedure is shown in Algorithm 1.

We set the desired word positions randomly by sampling a 2D Gaussian distribution whose mean is at the center of the word cloud frame. The variance is adjusted depending on the width and the height of the frame. If the desired position is sampled outside the frame or intersects its boundary, it is resampled until it lies inside.

Note that the algorithm assumes that the size of the frame is given. To choose the size of the frame, we estimate the width and height necessary to fit M words. This choice will affect the compactness of the resulting word cloud. If the frame is too big, the words will find valid locations quickly, but the resulting cloud will contain a lot of whitespace. If the frame is too small, it will be difficult or impossible to fit all the words. A maximum number of iterations is set to prevent words from looping forever. If one word reaches the maximum number of iterations, we assume that the word cannot fit in the current configuration; in that case, the algorithm is restarted with a larger frame.

Algorithm 1 Spiral Algorithm
Require: Words W, optionally positions p = {p_w}_{w∈W}
Ensure: Final positions p = {p_w}_{w∈W}
 1: for all words w ∈ {w_1, ..., w_M} do
 2:   if initial position p_w unsupplied, sample it from a Gaussian
 3:   count ← 0
 4:   while p_w not valid ∧ count < MaxIterations do
 5:     move p_w one step along a spiral path
 6:     count ← count + 1
 7:   end while
 8:   if p_w not valid then
 9:     restart with a larger frame
10:   end if
11: end for

In order to decide whether two words intersect, we check them at the glyph level, instead of only considering a bounding box around the whole word. This ensures a more compact result. However, checking the intersection of two glyphs can be expensive, so instead we use a tree of rectangular bounding boxes that closely follows the shape of the glyph, as in [6]. We use the implementation of this approach in the open-source library WordCram.¹

¹http://wordcram.org

Figure 2: Clouds representing 6 EPSRC Scientific Programmes: (a) Chemistry, (b) Engineering, (c) Information and Communication Technology, (d) Physical Sciences, (e) Complexity, (f) Mathematical Sciences. Each programme's cloud is obtained by concatenating all of its grant abstracts.

Figure 3: A word storm containing six randomly sampled grants from the Complexity programme (cloud (e) in Figure 2). The word "complex", which appeared in only one cloud in Figure 2, appears in all clouds in this figure. As this word conveys more information in Figure 2 than here, it is colored more transparently here.

4. CREATING A STORM
In this section, we present novel algorithms to build a storm. The simplest method would of course be to run the single-cloud algorithm of Section 3 independently for each document, but the resulting storms would typically violate the principle of coordination of similarity (Section 2.3), because words will tend to have different colors, orientations, and layouts even when they are shared between documents. Instead, our algorithms coordinate the layout of the different clouds, so that when words appear in more than one cloud, they have the same color, orientation, and position. In this way, if the viewer finds a word in one of the clouds, it is easy to check whether it appears in any other cloud.

We represent each document as a vector u_i, where u_iw is the count of word w in document i. A word cloud v_i is a tuple v_i = (W_i, {p_iw}, {c_iw}, {s_iw}), where W_i is the set of words that are to be displayed in cloud i, and for any word w ∈ W_i, we define p_iw = (x_iw, y_iw) as the position of w in the cloud v_i, c_iw its color, and s_iw its font size. We write p_i = {p_iw | w ∈ W_i} for the set of all word locations in v_i.

Our algorithms focus on coordinating the locations and attributes of words that are shared by multiple clouds in a storm. However, it is also possible to select the words that are displayed in each cloud in a coordinated way that considers the entire corpus. For example, instead of selecting words by their frequency in the current document, we could use global measures, such as tf*idf, that could emphasize the differences among clouds. We tried a few preliminary experiments with this but subjectively preferred storms produced using tf.

4.1 Coordinated Attribute Selection
A simple way to improve the coordination of the clouds in a storm is to ensure that words that appear in more than one cloud are displayed with the same color and orientation across clouds. We can go a bit farther than this, however, by encoding information in the words' color and orientation. In our case, we decided to use color as an additional way of encoding the relevance of a term in the document. Rather than encoding this information in the hue, which would require a model of color saliency, we instead control the color's transparency. We choose the alpha channel of the color to correspond to the inverse document frequency (idf) of the word in the corpus. In this way, words that appear in a small number of documents have opaque colors, while words that occur in many documents are more transparent. The color choice thus emphasizes differences among the documents, by making more informative words more noticeable.
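Before turning to coordinated layout, the Spiral Algorithm of Section 3 can be sketched concretely. This is a simplified illustration, not the authors' implementation: it uses single axis-aligned bounding boxes rather than glyph-level bounding-box trees, and the frame dimensions, spiral step sizes, and function names are our own choices:

```python
import math
import random

def overlaps(a, b):
    """a, b: (x, y, w, h) boxes centered at (x, y)."""
    return (abs(a[0] - b[0]) * 2 < a[2] + b[2] and
            abs(a[1] - b[1]) * 2 < a[3] + b[3])

def inside(box, frame_w, frame_h):
    """True if the box lies entirely within a frame centered at the origin."""
    x, y, w, h = box
    return abs(x) + w / 2 <= frame_w / 2 and abs(y) + h / 2 <= frame_h / 2

def spiral_layout(words, frame_w=400.0, frame_h=300.0,
                  desired=None, max_iter=20000, seed=0):
    """words: list of (word, box_w, box_h) sorted by decreasing weight.
    desired: optional dict word -> (x, y) of desired positions (used by the
    coordinated algorithms); otherwise positions are sampled from a Gaussian
    centered on the frame.  Returns dict word -> (x, y), or None when a word
    cannot be placed, in which case the caller restarts with a larger frame."""
    rng = random.Random(seed)
    placed = {}  # word -> (x, y, w, h)
    for word, w, h in words:
        if desired and word in desired:
            x0, y0 = desired[word]
        else:
            x0, y0 = rng.gauss(0, frame_w / 8), rng.gauss(0, frame_h / 8)
        pos, theta = None, 0.0
        for _ in range(max_iter):
            r = 1.5 * theta  # Archimedean spiral radius
            cand = (x0 + r * math.cos(theta), y0 + r * math.sin(theta), w, h)
            if inside(cand, frame_w, frame_h) and \
               not any(overlaps(cand, b) for b in placed.values()):
                pos = cand
                break
            theta += 0.2  # one step outwards along the spiral
        if pos is None:
            return None  # word cannot fit: restart with a larger frame
        placed[word] = pos
    return {wd: (b[0], b[1]) for wd, b in placed.items()}
```

The optional `desired` argument mirrors Algorithm 1's optional input positions: the single-cloud case samples them randomly, while the coordinated algorithms below supply them explicitly.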
4.2 Coordinated Layout: Iterative Algorithm
Coordinating the positions of shared words is much more difficult than coordinating the visual attributes. In this section we present the first of three algorithms for coordinating word positions. In the same manner that we have set the color and the orientation, we want to set the positions so that p_iw = p_jw for all v_i, v_j ∈ V_w, where V_w is the set of clouds that contain word w. This task is more challenging because it adds an additional constraint to the layout algorithm: instead of only avoiding overlaps, we now also have the constraint of placing the words in the same position across the clouds.

In order to do so, we present a layout algorithm that iteratively generates valid word clouds, changing the locations of the shared words to make them converge to the same position in all clouds. We will refer to this procedure as the iterative layout algorithm; it is shown in Algorithm 2. In particular, the iterative layout algorithm works by repeatedly calling the Spiral Algorithm (Section 3) with different desired locations for the shared words. At the first iteration, the desired locations are set randomly, in the same way as for a single cloud. Subsequently, the new desired locations are chosen by averaging the previous final locations of the word in the different clouds. That is, the new desired location for word w is p'_w = |V_w|^{-1} Σ_{v_j∈V_w} p_jw. Thus the new desired locations are the same for all clouds that contain w. Changing the locations of shared words might introduce new overlaps, so we run the Spiral Algorithm again to remove any overlaps.

Algorithm 2 Iterative Layout Algorithm
Require: Storm v_i = (W_i, {c_iw}, {s_iw}), i = 1, ..., N, without positions
Ensure: Word storm {v_1, ..., v_N} with positions
 1: for i ∈ {1, ..., N} do
 2:   p_i ← SpiralAlgorithm(W_i)
 3: end for
 4: while not converged ∧ count < MaxIterations do
 5:   for i ∈ {1, ..., N} do
 6:     p'_iw ← (1/|V_w|) Σ_{v_j∈V_w} p_jw,  ∀w ∈ W_i
 7:     p_i ← SpiralAlgorithm(W_i, p'_i)
 8:   end for
 9:   count ← count + 1
10: end while

In principle, this process would be repeated until the final locations are the same as the desired ones, that is, until the Spiral Algorithm does not modify the given positions. At that point all shared words would be in precisely identical positions across the clouds. However, this process does not always converge, so in practice we stop after a fixed number of iterations.

In practice, however, we find a serious problem with the iterative algorithm: it tends to move words far away from the center, because this makes it easier to place shared words in the same position across clouds. This results in sparse layouts with excessive whitespace that are visually unappealing.

4.3 Coordinated Layout: Gradient Approach
In this section, we present a new method to build a storm by solving an optimization problem. This provides additional flexibility to incorporate aesthetic constraints into storm construction, because we can include them as additional terms in the objective function. In particular, this allows us to avoid the unsightly sparse layouts which are sometimes produced by the iterative algorithm.

We call the objective function the Discrepancy Between Similarities (DBS). The DBS is a function of the set of clouds {v_1, ..., v_N} and the set of documents {u_1, ..., u_N}, and measures how well the storm fits the document corpus. It is:

  f_{u_1,...,u_N}(v_1, ..., v_N) = Σ_{1≤i<j≤N} (d_u(u_i, u_j) − d_v(v_i, v_j))² + Σ_{1≤i≤N} c(u_i, v_i),    (1)

where d_u is a distance metric between documents and d_v a metric between clouds. The DBS is to be minimized as a function of {v_i}. The first summand, which we call the stress, formalizes the idea that similar documents should have similar clouds and different documents, different clouds. The second summand uses a function that we call the correspondence function c(·,·), which should be chosen to ensure that each cloud v_i is a good representation of its document u_i.

The stress part of the objective function is inspired by multidimensional scaling (MDS), a method for dimensionality reduction of high-dimensional data [1]. Our use of the stress function is slightly different than is common, because instead of projecting the documents onto a low-dimensional space, such as R², we are mapping documents to the space of word clouds. The space of word clouds is itself high-dimensional, and indeed might have greater dimension than the original space. Additionally, the space of word clouds is not Euclidean because of the non-overlapping constraints.

For the metric d_u among documents, we use Euclidean distance. For the dissimilarity function d_v between clouds, we use

  d_v(v_i, v_j) = Σ_w (s_iw − s_jw)² + κ Σ_{w∈W_i∩W_j} [(x_iw − x_jw)² + (y_iw − y_jw)²],

where κ ≥ 0 is a parameter that determines the strength of each part. Note that the first summand considers all words in either cloud, and the second only the words that appear in both clouds. (If a word does not appear in a cloud, we treat its size as zero.) The intuition is that clouds are similar if their words have similar sizes and locations. Also note that, in contrast to the previous layout algorithm, by optimizing this function we also determine the words' sizes.

The difference between the objective functions for MDS and DBS is that the DBS adds the correspondence function c(u_i, v_i). In MDS, the position of a data point in the target space is not interpretable on its own, but only relative to the other points. In contrast, in our case each word cloud must accurately represent its document; ensuring this is the role of the correspondence function. In this work we use

  c(u_i, v_i) = Σ_{w∈W_i} (u_iw − s_iw)²,    (2)

where recall that u_iw is the tf of word w.

We also need to add further terms to ensure that words do not overlap, and to favor compact configurations. We introduce these constraints as two penalty terms. When two words overlap, we add a penalty proportional to the square of the minimum distance required to separate them; call this distance O_{i;w,w'}. We favor compactness by adding a penalty proportional to the squared distance from each word to the center; by convention we define the origin as the center, so this is simply the squared norm of the word's position. Therefore, the final objective function that we use to lay out word storms in the gradient-based method is

  g_λ(v_1, ..., v_N) = f_{u_1,...,u_N}(v_1, ..., v_N) + λ Σ_{i=1}^N Σ_{w,w'∈W_i} O²_{i;w,w'} + µ Σ_{i=1}^N Σ_{w∈W_i} ||p_iw||²,    (3)

where λ and µ are parameters that determine the strength of the overlap and compactness penalties, respectively.

We optimize (3) by solving a sequence of optimization problems for increasing values λ_0 < λ_1 < λ_2 < ... of the overlap penalty. We increase λ exponentially until no words overlap in the final solution. Each subproblem is minimized using gradient descent, initialized from the solution of the previous subproblem.

4.4 Coordinated Layout: Combined Algorithm
The iterative and gradient algorithms have complementary strengths. The iterative algorithm is fast, but as it does not enforce the compactness of the clouds, the words drift away from the center. On the other hand, the gradient method is able to create compact clouds, but it requires many iterations to converge and the layout depends strongly on the initialization. Therefore we combine the two methods, using the final result of the iterative algorithm as the starting point for the gradient method. From this initialization, the gradient method converges much faster, because it starts off without overlapping words. The gradient method tends to improve the initial layout significantly, because it pulls words closer to the center, creating a more compact layout. It also tends to pull together the locations of shared words for which the iterative method was not able to converge to a single position.

5. EVALUATION
The evaluation is divided into three parts: a qualitative analysis, an automatic evaluation and a user study. We use two different data sets. First, we use the scientific papers presented at the ICML 2012 conference, where we deployed a storm on the main conference Web site to compare the presented papers and help people decide among sessions.²

Second, we use a data set provided by the Research Perspectives project³ [8], a project that aims to offer a visualization of the research portfolios of funding agencies. This data set contains the abstracts of the proposals for funded research grants from various funding agencies. We use a corpus of 2358 abstracts from the UK's Engineering and Physical Sciences Research Council (EPSRC). Each grant belongs to exactly one of the following programmes: Information and Communications Technology (626 grants), Physical Sciences (533), Mathematical Sciences (331), Engineering (317), User-Led Research (291) and Materials, Mechanical and Medical Engineering (264). Each of these top-level programmes has several subprogrammes that correspond to more specific research areas.

²http://icml.cc/2012/whatson/
³Also see http://www.researchperspectives.org

5.1 Qualitative Analysis
In this section, we discuss the presented storms qualitatively, focusing on the additional information that is apparent from coordinated storms compared to independently built clouds.

First, we consider a storm that displays six research programmes from the EPSRC corpus, five of which are different subprogrammes of materials science and the sixth of which is the Mathematical Sciences programme. For this data set we present both a set of independent clouds (Figure 4) and a storm generated by the combined algorithm (Figure 5).

Figure 4: Independent clouds visualizing six EPSRC Scientific Programmes: (a) Electronic Materials, (b) Metals and Alloys, (c) Photonic Materials, (d) Structural Ceramics and Inorganics, (e) Structural Polymers and Composites, (f) Mathematical Sciences. These programmes are also represented in Figure 5.

Figure 5: Coordinated storm visualizing six EPSRC Scientific Programmes. These programmes are also represented as independent clouds in Figure 4. Compared to that figure, it is much easier to see the differences between clouds.

From either set of clouds, we can get a superficial idea of the corpus. We can see the most important words, such as "materials", which appears in the first five clouds, and some other words like "alloys", "polymer" and "mathematical". However, it is hard to get more information than this from the independent clouds.

On the other hand, by looking at the coordinated storm we can detect more properties of the corpus. First, it is instantly clear that the first five documents are similar and that the sixth one is the most different from all the others. This is because the storm reveals the shared structure in the documents, formed by shared words such as "materials", "properties" and "applications". Second, we can easily tell the presence or absence of words across clouds because of the consistent attributes and locations. For example, we can quickly see that "properties" does not appear in the sixth cloud, or that "coatings" only occurs in two of the six. Finally, the transparency of the words allows us to spot the informative terms quickly, such as "electron" (a), "metal" (b), "light" (c), "crack" (d), "composite" (e) and "problems" (f). All of these terms are informative of the document content but are difficult to spot in the independent clouds of Figure 4. Overall, the coordinated storm seems to offer a richer and more comfortable representation that allows deeper analysis than the independently generated clouds.

Similarly, from the ICML 2012 data set, Figure 1 shows a storm containing all the papers from a single conference session. It is immediately apparent from the clouds that the session discusses optimization algorithms. It is also clear that papers (c) and (d) are closely related, since they share a lot of words such as "sgd", "stochastic" and "convex", which results in similar layouts. The fact that shared words take similar positions can also force unique words into similar positions, which can make it easy to find terms that differentiate the clouds. For example, we can see how "herding" (f), "coordinated" (g) and "similarity" (h) are in the same location, or how "semidefinite" (a), "quasi-newton" (b) and "nonsmooth" (d) are in the same location.

Finally, Figures 2 and 3 show an example of a hierarchical set of storms generated from the EPSRC grant abstracts. Figure 2 presents a storm created by grouping all abstracts by their top-level scientific programme. There we can see two pairs of similar programmes: Chemistry and Physical Sciences; and Engineering and Information and Communication Technology. In Figure 3, we show a second storm composed of six individual grants from the Complexity programme (cloud (e) in Figure 2). It is interesting to see how big words at the top level, such as "complex", "systems", "network" and "models", appear with different weights at the grant level. In particular, the term "complex", which is rare when looking at the top level, appears everywhere inside the Complexity programme. Because of our use of transparency, this term is therefore prominent in the top-level storm but less noticeable in the lower-level storm.

5.2 Automatic Evaluation
Apart from evaluating the resulting storms qualitatively, we propose a method to evaluate word storm algorithms automatically. The objective is to assess how well the relations among documents are represented in the clouds. The motivation is similar in spirit to that of the celebrated BLEU measure in machine translation [9]: by evaluating layout algorithms with an automatic process rather than conducting a user study, the process can be faster and less expensive, allowing rapid comparison of algorithms.

We also measure the compactness of the clouds. We compute the compactness by taking the minimum bounding box of the cloud and calculating the percentage of non-background pixels. We use this measure because informally we noticed that more compact clouds tend to be more visually appealing.

Table 1: Comparison of the results given by different algorithms using the automatic evaluation.

                                 Time (s)   Compactness (%)   Accuracy (%)
Lower Bound                          -             -              26.5
Independent Clouds                 143.3         35.12            23.4
Coordinated Storm (Iterative)      250.9         20.39            54.7
Coordinated Storm (Combined)      2658.5         33.71            54.2
Upper Bound                          -             -              67.9
with an automatic process rather than conducting a user The results are shown in Table 1. Creating the clouds study, the process can be faster and inexpensive, allowing independentlyisfasterthananycoordinatedalgorithmand rapid comparison of algorithms. also produces very compact clouds. However, for classifica- Our automatic evaluation requires a corpus of labelled tion,thismethodisnobetterthanrandom. Thealgorithms documents, e.g., with a class label that indicates their top- tocreatecoordinatedclouds,theiterativeandthecombined ics. The main idea is: If the visualization is faithful to the algorithm, achieve a 54% classification accuracy, which is documents, then it should be possible to classify the docu- significantly higher than the lower bound. This confirms ments using the pixels in the visualization rather than the the intuition that by coordinating the clouds, the relations words in the documents. So we use classification accuracy among documents are better represented. as a proxy measure for visualization fidelity. The differences between the coordinated methods can be In the context of word storms, the automatic evaluation seenintherunningtimeandinthecompactness. Although consists of: (a) generating a storm from a labelled corpus the iterative algorithm achieves much better classification withone cloudper cloud, (b)trainingadocumentclassifier accuracy than the baseline, this is at the cost of producing using the pixels of the clouds as attributes and (c) testing muchlesscompactclouds. Thecombinedalgorithm,onthe the classifier on a held out set to obtain the classification other hand, is able to match the compactness of indepen- accuracy. More faithful visualizations are expected to have dently built clouds (33.71% combined and 35.12% indepen- better classification accuracy. dent) and the classification accuracy of the iterative algo- We use the Research Perspectives EPSRC data set with rithm. 
Thecombinedalgorithmissignificantlymoreexpen- the research programme as class label. Thus, we have a sive in computation time, although it should be noted that single-label classification problem with 6 classes. The data even the combined algorithm uses only 1.1s for each of the was randomly split into a training and test set using an 2358cloudsinthestorm. Therefore,althoughthecombined 80/20 split. We use the word storm algorithms to create algorithm requires more time, it seems the best option, be- one cloud per abstract, so there are 2358 clouds in total. causetheresultingstormoffersgoodclassificationaccuracy Wecomparethreelayoutalgorithms: (a)creatingtheclouds without losing compactness. independentlyusingtheSpiralAlgorithm,whichisourbase- A potential pitfall with automatic evaluations is that it line;(b)theiterativealgorithmwith5iterationsand(c)the is possible for algorithms to game the system, producing combinedalgorithm,using5iterationsoftheiterativealgo- visualizations that score better but look worse. This has rithm to initialize the gradient method. arguably happened in machine translation, in which BLEU We represent each cloud by a vector of the RGB values has been implicitly optimized, and possibly overfit, by the of its pixels. To reduce the size of this representation, we research community for many years. For this reason it is perform feature selection, discarding features with zero in- importanttocombineFurthermore,inourcase,noneofour formation gain. We classify the clouds by using support the algorithms optimize the classification accuracy directly vector machines with normalized polynomial kernel4. but instead follow very different considerations. 
But the Inordertoputtheclassificationaccuracyintocontext,we concern of“research community overfitting”is one to take present a lower bound obtain if all instances are classified seriously if automated evaluation of visualization is more as the largest class (ICT), which produces an accuracy of widely adopted. 26.5%. To obtain an upper bound, we classifying the doc- uments directly using bag-of-words features from the text, 5.3 UserStudy whichshouldperformbetterthantransformingthetextinto a visualization. Using a support vector machine, this yields Inordertoconfirmourresultsusingtheautomaticevalu- an accuracy of 67.9%. ation, we conducted a pilot user study comparing the stan- Apartfromtheclassificationaccuracy,wealsoreportthe dardindependentwordcloudswithcoordinatedstormscre- running time of the layout algorithm (in seconds)5, and, as ated by the combined algorithm. The study consisted of 5 a simple aesthetic measure, the compactness of the word multiple choice questions. In each of them, the users were presented with six clouds and were asked to perform a sim- 4The classification is performed by using the SMO imple- ple task. The tasks were of two kinds: checking the pres- mentation of Weka ence of words and comparing documents. The clouds for 5Allexperimentswererunona3.1GHzIntelCorei5server each question were generated either as independent clouds with 8GB of RAM. oracoordinatedstorm. Ineveryquestion,theuserreceived
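To make the pixel-based evaluation protocol of Section 5.2 concrete, the sketch below runs the same three steps on toy data: each cloud is represented as a vector of pixel values, constant (zero-information) features are discarded, and held-out classification accuracy serves as the fidelity score. A nearest-centroid classifier is used here as a lightweight stand-in for the SVM (Weka's SMO with a normalized polynomial kernel) used in the paper, and the data and function names are illustrative only.

```python
def informative_features(vectors):
    """Indices of features that take more than one value across the
    training clouds -- a crude stand-in for discarding features with
    zero information gain."""
    return [i for i in range(len(vectors[0]))
            if len({v[i] for v in vectors}) > 1]

def evaluate_storm(train, test):
    """train/test: lists of (pixel_vector, label) pairs.
    Returns held-out classification accuracy, the proxy measure for
    visualization fidelity. Uses a nearest-centroid classifier in
    place of the SVM used in the paper."""
    keep = informative_features([x for x, _ in train])
    proj = lambda x: [x[i] for i in keep]

    # One centroid per class, averaged over the training clouds.
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(proj(x))
    centroids = {y: [sum(col) / len(vs) for col in zip(*vs)]
                 for y, vs in groups.items()}

    def predict(x):
        px = proj(x)
        return min(centroids,
                   key=lambda y: sum((a - b) ** 2
                                     for a, b in zip(centroids[y], px)))

    hits = sum(1 for x, y in test if predict(x) == y)
    return hits / len(test)
```

Under this protocol, a storm whose clouds place shared words consistently should yield pixel vectors that cluster by class, and hence higher held-out accuracy than independently laid-out clouds.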
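The compactness measure reported alongside accuracy (the percentage of non-background pixels inside the cloud's minimum bounding box) is simple to state directly; a minimal sketch, assuming a cloud rendered as a 2-D grid in which background cells hold 0:

```python
def compactness(grid, background=0):
    """Compactness of a word cloud: the percentage of non-background
    pixels within the minimum bounding box of the rendered words.
    Higher values indicate a more compact (and, informally, more
    visually appealing) layout."""
    filled = [(r, c) for r, row in enumerate(grid)
                     for c, v in enumerate(row) if v != background]
    if not filled:
        return 0.0
    rows = [r for r, _ in filled]
    cols = [c for _, c in filled]
    box = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return 100.0 * len(filled) / box
```

For example, a solid block of word pixels scores 100%, while a layout whose bounding box is half empty scores 50%, matching the percentages reported in Table 1.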