By chance is not enough: preserving relative density through non uniform sampling

Enrico Bertini, Giuseppe Santucci
Dipartimento di Informatica e Sistemistica - Università di Roma "La Sapienza"
Via Salaria, 113 - 00198 Roma, Italy - {bertini, santucci}@dis.uniroma1.it

arXiv:1701.07110v1 [cs.GR] 24 Jan 2017

Abstract

Dealing with visualizations containing large data sets is a challenging issue and, in the field of Information Visualization, almost every visual technique reveals its drawbacks when visualizing a large number of items. To deal with this problem we introduce a formal environment, modeling in a virtual space the image features we are interested in (e.g., absolute and relative density, clusters, etc.), and we define some metrics able to characterize the image decay. Such metrics drive our automatic techniques (i.e., non uniform sampling), rescuing the image features and making them visible to the user. In this paper we focus on 2D scatter-plots, devising a novel non uniform data sampling strategy able to preserve relative densities effectively.

Keywords: visual clutter, metrics, non-uniform sampling.

1 Introduction

Visualizing large data sets very often results in a cluttered image in which many graphical elements overlap and many pixels become overplotted, hiding the main visual features of the image from the user.

We deal with this problem by providing a formal framework to measure the amount of decay resulting from a given visualization; we then build, upon these measures, an automatic non uniform sampling strategy that aims at reducing such degradation. We focus on a very common visual technique, 2D scatter-plots, analyzing the loss of information caused by overlapping pixels.

In this paper we improve and extend some preliminary results presented in [1], defining a formal model that estimates the amount of overlapping elements in a given area and the remaining free space. These pieces of information give an objective indication of what is eventually visualized on the physical device; exploiting such measures we can estimate the quality of the displayed graphic and devise techniques able to recover the decayed visualization.

To eliminate the sense of clutter, we employ a low grain non uniform sampling technique, dealing with the challenging issue of devising the right amount of sampling in order to preserve the visual characteristics of the underlying data. It is quite evident, in fact, that too strong a sampling is useless and destroys the less dense areas, while too light a sampling does not reduce the image clutter. The formal model we discuss in the paper gives precise indications on the right amount of data sampling needed to produce a representation preserving the most important image characteristics, i.e., relative densities, which are one of the main clues the user can grasp from 2D scatter-plots.

The contribution of this paper is twofold: (1) it presents a formal model that allows for defining and measuring data density both in terms of a virtual space and of a physical space (e.g., a display) and (2) it defines a novel automatic non uniform sampling technique driven by some metrics defined on top of those figures.

The paper is structured as follows: Section 2 analyzes related work, Section 3 describes the model we use to characterize clutter and density, formalizing the problem and introducing the metrics we are interested in, Section 4 describes our non uniform sampling technique, Section 5 discusses the results obtained by applying our technique to a real data set and, finally, Section 6 presents some conclusions, open problems and future work.

2 Related Work

This paper deals with issues concerning metrics for Information Visualization and techniques to address the problem of overlapping pixels and visual clutter in computer displays. In the following we illustrate the research proposals closest to our approach and their relationship with our work.

2.1 Metrics for Information Visualization

As expressed in [6], Information Visualization needs metrics able to give precise indications on how effectively a visualization presents data and to measure its goodness. Some preliminary ideas have been proposed, considering both formal measurements and guidelines to follow.

Tufte proposes in [9] some measures to estimate the quality of 2D representations of static data. Measures like the lie factor, that is, the ratio of the size of an effect as shown graphically to its size in the data, are examples of first attempts to systematically provide indications about the quality of the image displayed. Tufte's proposal, however, applies to paper based 2D visualizations and does not directly apply to interactive computer-based images. Brath in [7], starting from Tufte's proposal, defines new metrics for static digital 3D images. He proposes metrics such as data density (number of data points / number of pixels) that resemble Tufte's approach, together with new ones aiming at measuring the visual image complexity. The occlusion percentage, for example, has connections with our work: it provides a measure of occluded elements in the visual space and suggests reducing such a value as much as possible. These metrics are interesting and are more appropriate for describing digital representations. However, as stated by the author, they are still immature and need refinement.

While the above metrics aim at measuring a general goodness or at comparing different visual systems, our aim is to measure the accuracy of the visualization, that is, how well it represents the characteristics hidden inside the data. Our metrics present some similarities with past metrics but operate at a lower level, dealing with pixels and data points and providing measures that can directly be exploited to drive corrective actions. It is worth noting that, unlike the above proposals, we show how the suggested metrics can be exploited in practice to take quantitative decisions about corrective actions and enhance the current visualization.

2.2 Dealing with overlapping pixels and clutter

The problem of eliminating visual clutter and overlapping pixels to produce intelligible graphics has been addressed by many proposals.

Jittering, as stated in [8], is a widely adopted technique that makes apparent those pixels that naturally map onto the same screen position. The idea is to slightly change the position of overlapping points in order to render them all visible. Similarly, space-filling pixel-based techniques [5] distribute data points along predefined curves to avoid overlapping pixels, shifting them to positions that are as close as possible to the original one.

Transparency is also an interesting technique to overcome occlusion and reduce clutter, both in 3D [12] and 2D [3] visualizations. However, when dealing with pixel-based visualizations it is not possible to convey transparency at the level of single pixels, so it is of no use there.

Constant density visualization [10][11] is an interesting technique to deal with clutter. Exploiting the idea of generalized fisheye views [4], it consists in giving more detail to less dense areas and less detail to denser areas, allowing the screen space to be optimally utilized and reducing clutter. The problems with this approach are that it requires the user to interact with the system, the overall trend of the data is generally lost, and some distortions are introduced.

Sampling is used in [2] to reduce the density of the visual representation. As the authors state, if the sampling is made in a random way the distribution is preserved, and thus it is still possible to grasp some useful information about data correlation and distributions, permitting "to see the overall trends in the visualization but at a reduced density". Even if interesting, this idea is not free of drawbacks. In particular, when the data present particular distributions, i.e., the data set has both very high and very low density areas, choosing the right amount of sampling is a challenging task. Depending on the amount of sampling two problems can arise: 1) if the sampling is too strong, the areas in which the density is under a certain level become completely empty; 2) if the sampling is too weak, the areas with higher densities will still all look the same (i.e., completely saturated) and consequently the density differences among them will not be perceived by the user. A first proposal in this direction is in [1], where an automatic uniform sampling technique is presented, able to compute the optimal sampling ratio w.r.t. some quality metrics.

Our approach differs from the above proposals in three main aspects:

• it provides a sound model for defining, in both a virtual and a physical space, several metrics intended specifically for digital images;

• it provides, on the basis of the above figures, some quantitative information about the image decay;

• it exploits such numerical results for automatically computing where, how, and how much to sample, preserving, as much as possible, a certain visual characteristic.

3 Modeling Visual Density and Clutter

In this section we present the formal framework that aims at modeling the clutter produced by over-plotting data. Some preliminary issues about the matter are in [1]; here we show a refinement of those results.

We consider a 2D space in which we plot elements by associating a pixel to each data element, mapping two data attributes onto the spatial coordinates. As an example, Figure 1 shows about 160,000 mail parcels plotted on the X-Y plane according to their weight (X axis) and volume (Y axis). It is worth noting that, even if the number of plotted items is small, the area close to the origin is very crowded (parcels are usually very light and small), so a great number of collisions is present in that area: the most crowded area contains more than 50,000 items (about 30% of the whole data set) compressed in less than 1% of the whole screen.

Figure 1: Plotting mail parcels

Exploiting well known results from probability theory, we derive a function that estimates the amount of colliding points and, as a consequence, the amount of free available space. More formally, two points are in collision when their projection is on the same physical pixel. In order to derive such a function, we imagine tossing $n$ data points at random onto a fixed area of $p$ pixels. This assumption is quite reasonable if we conduct our analysis on small areas.

To construct such functions we use a probabilistic model based on the parameters just described, which we summarize here for the sake of clarity:

• $n$ is the number of points we want to plot;

• $p$ is the number of available pixels;

• $k$ is the number of collisions;

• $d$ is the number of free pixels.

The probability of having exactly $k$ collisions when plotting $n$ points on an area of $p$ pixels, $\Pr(k,n,p)$, is given by the following function:

$$
\Pr(k,n,p) =
\begin{cases}
\dfrac{\mathrm{PERM}\left[\binom{p}{n-k}\binom{n-k+k-1}{k}\right]}{p^{n}} & \text{if } n \le p \text{ and } k \in [0,\, n-1], \text{ or } n > p \text{ and } k \in [n-p,\, n-1] \\[2ex]
0 & \text{if } n > p \text{ and } k \in [0,\, n-p)
\end{cases}
$$

The function is defined only for $k < n$, because it is impossible to have more collisions than plotted points. Moreover, it is easy to understand that in some cases the probability is equal to zero: if $n > p$, because we are plotting more points than available pixels, we must necessarily have some collisions. For example, if we have an area of 8×8 pixels and we plot 66 points, we must necessarily have at least 2 collisions, so $\Pr(0,66,64) = 0$ and $\Pr(1,66,64) = 0$.

The basic idea of the formula is to calculate, given $p$ pixels and $n$ plotted points, the ratio between the number of possible cases showing exactly $k$ collisions and the total number of possible configurations.

The latter is computed considering all the possible ways in which it is possible to choose $n$ points among $p$ pixels, i.e., selecting $n$ elements from a set of $p$ elements allowing repetitions (dispositions with repetitions: $p^{n}$).

Calculating the number of configurations with exactly $k$ collisions is performed in three steps. First we calculate all the possible ways of selecting $n-k$ non colliding points from $p$ pixels (combinations without repetitions: $\binom{p}{n-k}$). After that, for each of such combinations, we calculate all the possible ways of hitting $k$ times one or more of the $n-k$ non colliding points in order to obtain exactly $k$ collisions, which corresponds to selecting $k$ elements from a set of $n-k$ elements with repetitions (combinations with repetitions: $\binom{n-k+k-1}{k}$). Finally, because we are interested in all the possible dispositions, we need to count the permutations (PERM) of these combinations. Unfortunately, because of the variable number of duplicates (e.g., it is possible to have $k$ collisions by hitting $k+1$ times the same pixel $p_i$, or $k$ times $p_i$ and two times pixel $p_j$, or $k-1$ times $p_i$, two times pixel $p_j$, and two times pixel $p_k$, and so on) we were not able to express such permutations by a closed formula.

From the above expression we derived, through a C program, a series of functions (see Figure 2) showing the behavior of the observed area as the number of plotted points increases. More precisely, we compute the available free space $d$ (Y axis, as a percentage of $p$) and the mean number of colliding elements $k$ (Y axis, as a percentage of $n$) for any given number of plotted points $n$ (X axis, as a percentage of $p$). For example, if we have an area of 64 pixels, the graph tells us that plotting 128 points (200% of $p$) will produce an average of 72.5 collisions (56.7%). On the other hand, if we plot 128 points with 72.5 collisions we can compute the free pixels $d$ as $d = 64 - (128 - 72.5) = 8.5$ (13.3%).

Figure 2: Colliding elements percentage

The behavior of the functions is quite intuitive: as the number of plotted points $n$ increases, the percentage of collisions increases as well, while the free space decreases; roughly speaking, we can say that overplotting the screen four times results in a totally saturated display (1.6% of free space).

Such functions can tell us how much we are saturating the space or, as a more complex possibility, the way in which the display is able to represent relative densities and how much to sample the data to guarantee a prefixed visualization quality. This result is exploited in the next section; here we clarify it through an example. Assume that we are plotting $n$ points on the area $A_1$, turning on $p_1$ pixels, and $2n$ points on the area $A_2$, turning on $p_2$ pixels. In principle, the user should perceive area $A_2$ as containing more (i.e., twice as many) points than area $A_1$. Because of collisions, $p_2 \le 2p_1$; as $n$ increases the user initially loses the information that area $A_2$ contains twice as many points as $A_1$, and for greater values of $n$ the user is not able to grasp any difference between $A_1$ and $A_2$. As a numerical example, if we plot 64 and 128 points on two 8×8 areas, the pixels turned on in the two areas will be 40.55 and 55.43, so the ratio of displayed pixels is only 1.36. In order to preserve the visual impression that area $A_2$ contains twice as many points as $A_1$ while accepting a decay of 20 per cent, we have to sample the data (64 and 128 points) by as much as 50 per cent, resulting in 32 and 64 points that, once plotted, turn on 25.32 and 40.55 pixels, i.e., a ratio of 1.6 (20 per cent of decay).

3.1 Data densities and represented density

The previous results give us a way to control and measure the number of colliding elements. Before introducing our optimization strategy, we need to clarify our scenario and to introduce new figures and definitions.

We assume the image is displayed on a rectangular area (measured in inches) and that small squares of area $A$ divide the space into $m \times n$ sample areas (SA) where density is measured. Given a particular monitor, resolution and size affect the values used in calculations. In the following we assume that we are using a monitor of 1280x1024 pixels and size 13"x10.5". Using these figures we have 1,310,720 pixels and, if we choose SA of side $l = 0.08$ inch, the area is covered by 20,480 (128x160) sample areas whose dimension in pixels is 8×8. We consider small areas because this makes the uniform distribution assumption quite realistic.

For each $SA_{i,j}$, where $1 \le i \le m$ and $1 \le j \le n$, we calculate two different densities: real data density (or, for short, data density) and represented density.

Data density is defined as $d_{i,j} = n_{i,j}/A$, where $n_{i,j}$ is the number of data points that fall into sample area $SA_{i,j}$. For a given visualization, the set of data densities is finite and discrete. In fact, if we plot a number $n$ of data elements into the display, each $SA_{i,j}$ assumes a value $d_{i,j}$ that is within the finite and discrete set of values $0, \frac{1}{A}, \frac{2}{A}, \ldots, \frac{n}{A}$. In general, for any given visualization, only a subset of these values will actually be assumed by the sample areas. For each value we can compute the number of sample areas in which that value is present, and a histogram showing the distribution of the various data densities can be computed. For example, if we plot 100 data points into an area of 10 sample areas, we could have the following configuration: 3 sample areas with 20 data points, 2 sample areas with 15 data points, 2 sample areas with 5 data points, and the remaining 3 sample areas empty.

Represented density is defined as $rd_{i,j} = p_{i,j}/A$, where $p_{i,j}$ is the number of distinct active pixels that fall into $SA_{i,j}$. The number of different values that a sample area can assume is heavily dependent on the size of the sample areas. If we adopt sample areas of size 8x8 pixels, as described before, the number of different non-null represented densities is 64. Thus, we can represent at most 64 different represented density values. It is quite obvious that, because of collisions, $rd_{i,j} \le d_{i,j}$.

Using the above definitions we devised an effective set of quality metrics whose complete discussion, however, is out of the scope of this paper (see [1] for a practical use of these quality metrics for uniform sampling strategies).

The above metrics, together with the statistical results, give us the means to devise the automatic non uniform sampling technique described in the next section.

4 Non uniform sampling

In [1] a uniform sampling strategy was presented, showing its ability to improve the readability of an image. Applying the same amount of sampling to the whole image is quite straightforward but presents several drawbacks. As an example, it is quite obvious that sampling areas presenting very low data density is useless and potentially dangerous. Moreover, the most important clues a user can grasp from 2D scatter-plots are differences in densities, and our opinion is that a non uniform sampling can preserve such differences more effectively.

The problem of representing relative densities is that of creating an optimal mapping between the set of the actual data densities and the set of available represented densities. Each data density must be associated to one of the 64 (under the hypothesis of 8×8 sample areas) available represented densities. Any given visualization is one particular mapping. Consider the case in which a visualization is obtained by displaying a large data set. It likely corresponds to a mapping in which the higher densities are all mapped onto a few represented densities, the ones in which almost all pixels are active (pane saturation). This is why in those areas relative densities cannot be perceived: a large number of high data densities is mapped onto very close values. Our idea is to investigate how these mappings could be changed in order to present to the user more information about relative densities, accepting, to a certain extent, some distortion.

Figure 3: A screen area made of 100 sample areas: real (a) and represented (c) data densities

In the following we use a simple numeric example to clarify our approach. Assume we are plotting 2264 points (this strange number comes from a random data generation) on a screen composed of 40x40 pixels arranged in 100 sample areas of size 4x4 pixels. In the example we concentrate on the number of data elements or active pixels, neglecting the SA area value (what we called $A$), which is just a constant. Figure 3 (a) displays the data densities (in terms of number of points) corresponding to each sample area.

Figure 3 (b) shows the actual values of data densities (X axis) together with the associated number of sample areas sharing each value (Y axis). As an example, we can see that the maximum data density 49 is shared by just one sample area ($SA_{2,6}$) and the minimum data density 0 is shared by four sample areas ($SA_{5,6}$, $SA_{5,8}$, $SA_{6,4}$, $SA_{10,3}$). Figure 3 (c), obtained by applying the statistical results discussed in Section 3 (see Figure 2), shows the actual represented density (in terms of active pixels) ranging, for each $SA_{i,j}$, between 0 and 12. Looking at Figure 3 (d) it is easy to discover that more than 50% of the visualization pane (54 sample areas out of 100), with data densities ranging between 22 and 49, collapses onto just three different represented densities (10, 11, 12).

In order to improve such a situation we want to produce a new mapping between the given data densities and the 12 available represented densities. This can be done by pursuing the goal of preserving the maximum number of differences, losing, on the other hand, their extent. In other words, we want to present the user with as many differences in density as possible, partially hiding the real amount of such differences.

In order to obtain such a result, starting from Figure 3 (b) and considering only the 96 sample areas with data density > 0, we split the X axis into 12 (i.e., the number of available represented densities) adjacent non uniform intervals, each of them containing 96/12 = 8 sample areas. Obviously, because we are working on discrete values, we cannot guarantee that each interval contains exactly 8 sample areas and we have to choose an approximation minimizing the variance. After that, the data elements belonging to the sample areas associated with the same interval $i$ are sampled in a way that produces a represented density equal to $i$. As an example, the first interval encompasses data densities 1 (shared by 6 sample areas) and 2 (shared by 3 sample areas), and the associated data elements are sampled as much as needed in order to produce a represented density equal to 1. The second interval encompasses data densities 3, 4, and 5 (7 sample areas) and, after the sampling, the resulting represented density is 2, and so on.

Figure 4: Non uniform sampling

The represented densities resulting from this approach are depicted in Figure 4 (a); Figure 4 (b) shows the new, more uniform distribution of such represented densities. We want to point out that in this new representation the 54 collapsed data densities of Figure 3 (b) now range between represented densities 6 and 12, allowing the user to discover more density differences. On the other hand, as an example, the real ratio between data densities 29 and 22 (1.32) is poorly mapped onto represented densities 7 and 6 (1.17).

Roughly speaking, we can think of the whole process as follows. We have at our disposal $p$ different represented densities that are matched against $k$ real data densities where, usually, $k \gg p$; this implies that each represented density is in charge of representing several different data densities, hiding differences from the user. The game is to change, by non uniform sampling, the original data densities, altering their assignment to the $p$ available represented densities in order to preserve the number of density differences.

5 Discussion

In this section we show the effectiveness of our technique by commenting on the images obtained by applying different sampling strategies. We compare the images obtained by visualizing a real data set: the one containing 160,000 mail parcels already mentioned in Section 3.

The images come from a tool specifically developed for our purposes. It is a Java based application that allows inspecting several characteristics of the displayed image, such as the data/represented density of each sample area, some quality metrics, and the number of overlapping pixels. It is also possible to apply uniform and non-uniform sampling, and to filter out sample areas with data/represented density outside a specific range.

Figure 5: Comparison of various sampling methods visualizing the mail parcels data set

Figure 5 shows: (a) the original visualization (no sampling), (b) the one obtained by uniformly sampling the data leaving 80% of the original data set, (c) the one obtained by uniformly sampling the data leaving 20% of the original data set (this value is the best uniform sampling ratio computed by the proposal shown in [1]), (d) the one obtained using non uniform sampling. It is quite evident that too weak a uniform sampling (Fig. 5(b)) does not make density differences apparent in high density areas. Conversely, an optimized (but still too strong) uniform sampling (Fig. 5(c)) makes them apparent but to the detriment of low density areas. In fact, the upper right area originally contained a cluster that is not visible anymore. Figure 5(d) shows the result obtained when applying non-uniform sampling. The features in the low density areas are still visible (as in the case of weak uniform sampling) but, at the same time, in the high density area it makes more evident density differences that were not perceptible in the original image (as in the case of strong uniform sampling). Figure 5(e) makes it clearer. It is obtained by filtering out the sample areas with data density lower than 810 (i.e., SA with fewer than 810 points), therefore showing the densest areas. Comparing it with the other images, it is easy to notice that while in Figure 5(d) that pattern is perfectly clear, in Fig. 5(a) and (b) it is hidden in the saturated areas and in Fig. 5(c) it is only faintly visible. Roughly speaking, we can say that our technique provides at the same time the advantages of both strong and weak sampling.

Another interesting aspect worth mentioning is how this technique can be operated. When applying uniform sampling, the choice of the amount of sampling is critical. If the sampling factor is selected by hand, the user has to try many values until s/he finds the one that best conveys the information. To overcome this, we applied in [1] an algorithm to automatically devise the amount of sampling to apply. Exploiting the metrics presented in Section 3.1 we were able to find the best sampling factor to apply, but the drawbacks of uniform sampling still held. Conversely, with non-uniform sampling there is no need to search a space of solutions, and the algorithm runs autonomously with the idea of assigning the available represented densities as smartly as it can.

The logic behind the algorithm can be better appreciated by looking at Fig. 6, which compares the original and non uniformly sampled visualizations together with their density histograms. The densities are more evenly distributed, allowing the dense areas to exhibit the underlying trends. Moreover, the peaks associated with the highest represented densities (i.e., 62, 63, 64) are not present anymore.

Figure 6: Non uniform sampling

6 Conclusions and future work

In this paper we presented a low grain, non uniform sampling technique that automatically reduces visual clutter in a 2D scatter plot and preserves relative densities. To the best of our knowledge this approach is a quite novel way of sampling visual data. The technique exploits some statistical results and a formal model describing and measuring over-plotting, screen occupation, and both data density and represented density. Such a model allows for computing where, how, and how much to sample while preserving some image characteristics (i.e., relative density).

Several open issues arise from this work:

• users must be involved. Our strategy provides precise figures but we need to map them against user perceptions. As an example, still referring to our approach, if a sample area contains twice as many active pixels as another one, does the user perceive a double density for any total occupation of the areas? On the other hand, how much may two sample areas differ in pixel number while still giving the user the sensation of having the same data density? We are currently designing some perceptual experiments in order to investigate this aspect. The next step will be to incorporate these issues within our algorithms.

• sampling areas. Several choices deserve more attention: it is our intention to analyze the influence of increasing or decreasing the sample area dimension, in terms of image quality and computational aspects.

We are currently extending the prototype functionalities to apply and verify our ideas. We want to implement a data set generator to conduct controlled tests. The data set generator will allow generating artificial distributions with control over specific parameters, which will be used to create specific cases considered critical or interesting.

7 Acknowledgements

We would like to thank Pasquale Di Tucci for his invaluable help in implementing the software prototype.

References

[1] E. Bertini and G. Santucci. Quality metrics for 2D scatterplot graphics: automatically reducing visual clutter. In Proceedings of 4th International Symposium on Smart Graphics, May 2004.

[2] G. Ellis and A. Dix. Density control through random sampling: an architectural perspective. In Proceedings of Conference on Information Visualisation, pages 82–90, July 2002.

[3] Jean-Daniel Fekete and Catherine Plaisant. Interactive information visualization of a million items. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis'02), page 117. IEEE Computer Society, 2002.

[4] G. W. Furnas. Generalized fisheye views. In Proceedings of the SIGCHI conference on Human factors in computing systems, pages 16–23, 1986.

[5] Daniel A. Keim and Annemarie Herrmann. The gridfit algorithm: an efficient and effective approach to visualizing large amounts of spatial data. In Proceedings of the conference on Visualization '98, pages 181–188. IEEE Computer Society Press, 1998.

[6] Nancy Miller, Beth Hetzler, Grant Nakamura, and Paul Whitney. The need for metrics in visual information analysis. In Proceedings of the 1997 workshop on New paradigms in information visualization and manipulation, pages 24–28. ACM Press, 1997.

[7] Richard Brath. Concept demonstration: Metrics for effective information visualization. In Proceedings For IEEE Symposium On Information Visualization, pages 108–111. IEEE Service Center, Phoenix, AZ, 1997.

[8] Marjan Trutschl, Georges Grinstein, and Urska Cvek. Intelligently resolving point occlusion. In Proceedings of the IEEE Symposium on Information Visualization 2003, page 17. IEEE Computer Society, 2003.

[9] Edward R. Tufte. The visual display of quantitative information. Graphics Press, 1986.

[10] Allison Woodruff, James Landay, and Michael Stonebraker. Constant density visualizations of non-uniform distributions of data. In Proceedings of the 11th annual ACM symposium on User interface software and technology, pages 19–28. ACM Press, 1998.

[11] Allison Woodruff, James Landay, and Michael Stonebraker. Vida: (visual information density adjuster). In CHI '99 extended abstracts on Human factors in computing systems, pages 19–20. ACM Press, 1999.

[12] Shumin Zhai, William Buxton, and Paul Milgram. The partial-occlusion effect: utilizing semitransparency in 3D human-computer interaction. ACM Trans. Comput.-Hum. Interact., 3(3):254–284, 1996.
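
As an illustration of the collision model of Section 3, the mean number of active pixels can be approximated with a standard occupancy argument: if n points fall uniformly at random on p pixels, each pixel stays free with probability (1 - 1/p)^n. This closed form is a textbook result assumed here for illustration; it is not the configuration-counting formula or the C program used in the paper, and the function names are hypothetical. Its output is close to, though not identical with, the figures quoted in Section 3.

```python
def expected_active_pixels(n: float, p: int) -> float:
    """Mean number of distinct pixels turned on when n points fall
    uniformly at random on p pixels (standard occupancy result,
    an assumption of this sketch, not the paper's own derivation)."""
    return p * (1.0 - (1.0 - 1.0 / p) ** n)


def expected_collisions(n: float, p: int) -> float:
    """Mean number of points that land on an already active pixel (k)."""
    return n - expected_active_pixels(n, p)


def expected_free_pixels(n: float, p: int) -> float:
    """Mean number of pixels left free (d)."""
    return p - expected_active_pixels(n, p)


if __name__ == "__main__":
    n, p = 128, 64  # the 8x8 area overplotted at 200%, as in Section 3
    print(f"active pixels: {expected_active_pixels(n, p):.2f}")
    print(f"collisions:    {expected_collisions(n, p):.2f} "
          f"({100 * expected_collisions(n, p) / n:.1f}% of n)")
    print(f"free pixels:   {expected_free_pixels(n, p):.2f} "
          f"({100 * expected_free_pixels(n, p) / p:.1f}% of p)")
```

For the 64-pixel example this sketch gives roughly 72.5 collisions and 8.5 free pixels, in line with the values reported in the text.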

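
The numerical example closing Section 3 (two 8×8 areas holding 64 and 128 points, with an accepted decay of 20 per cent) can be sketched as a search for the largest fraction of data to keep such that the ratio of displayed pixels stays above the target. The sketch below again assumes the textbook occupancy expectation rather than the paper's own functions, and `max_sampling_fraction` and `displayed_ratio` are hypothetical names.

```python
def expected_active_pixels(n: float, p: int) -> float:
    """Mean number of distinct active pixels for n random points on p pixels
    (standard occupancy result, assumed for this sketch)."""
    return p * (1.0 - (1.0 - 1.0 / p) ** n)


def displayed_ratio(n1: float, n2: float, p: int, s: float) -> float:
    """Ratio of active pixels between two areas after keeping a fraction s."""
    return expected_active_pixels(s * n2, p) / expected_active_pixels(s * n1, p)


def max_sampling_fraction(n1: float, n2: float, p: int,
                          decay: float, tol: float = 1e-6) -> float:
    """Largest fraction of points to keep so that the displayed ratio is at
    least (n2 / n1) * (1 - decay). Uses bisection, relying on the ratio
    decreasing as more points are kept."""
    target = (n2 / n1) * (1.0 - decay)
    if displayed_ratio(n1, n2, p, 1.0) >= target:
        return 1.0  # no sampling needed
    lo, hi = 0.0, 1.0  # ratio tends to n2/n1 near 0 and is below target at 1
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if displayed_ratio(n1, n2, p, mid) >= target:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    s = max_sampling_fraction(64, 128, 64, decay=0.20)
    print(f"keep about {100 * s:.1f}% of the data")
    print(f"displayed ratio: {displayed_ratio(64, 128, 64, s):.3f}")
```

Under these assumptions the search keeps about half of the data, consistent with the 50 per cent sampling discussed in the text.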

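
The interval-splitting step of Section 4 can be sketched as an equal-frequency binning of the non-zero data densities into the available represented-density levels. The paper only states that the approximation should minimize the variance; the midpoint rule below is a simple stand-in chosen for this sketch, not the authors' algorithm. The keep fraction also ignores collisions among the retained points, which is a further simplification. All names are hypothetical.

```python
from collections import Counter


def assign_levels(densities, levels):
    """Map each non-zero data density (points per sample area) to a
    represented-density level in 1..levels, so that every level covers
    roughly the same number of sample areas.

    Heuristic (an assumption of this sketch): a density value goes to the
    interval containing the midpoint of its run in the sorted sequence."""
    counts = Counter(d for d in densities if d > 0)
    per_level = sum(counts.values()) / levels
    mapping = {}
    seen = 0
    for value in sorted(counts):
        midpoint = seen + counts[value] / 2
        level = min(levels, int(midpoint // per_level) + 1)
        # A sample area cannot show more active pixels than it has points.
        mapping[value] = min(level, value)
        seen += counts[value]
    return mapping


def keep_fractions(mapping):
    """Fraction of points to keep per data density so that the number of
    retained points equals the assigned level (collisions among retained
    points are ignored here)."""
    return {value: level / value for value, level in mapping.items()}


if __name__ == "__main__":
    # Toy input: 96 non-empty sample areas plus 4 empty ones, 12 levels.
    densities = ([1] * 6 + [2] * 3 + [3] * 2 + [4] * 3 + [5] * 2
                 + list(range(6, 86)) + [0] * 4)
    mapping = assign_levels(densities, levels=12)
    for value in (1, 2, 3, 4, 5, 85):
        print(value, "->", mapping[value])
```

On this toy input, densities 1 and 2 share level 1 and densities 3 to 5 share level 2, mirroring the first two intervals of the example in Section 4.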