Table Of ContentSurpassing Humans and Computers with JELLYBEAN:
Crowd-Vision-Hybrid Counting Algorithms
AkashDasSarma AyushJain ArnabNandi
StanfordUniversity UniversityofIllinois TheOhioStateUniversity
akashds@stanford.edu ajain42@illinois.edu arnab@cse.osu.edu
AdityaParameswaran JenniferWidom
UniversityofIllinois StanfordUniversity
adityagp@illinois.edu widom@cs.stanford.edu
Abstract Countingisimportant.Countingobjectsinimagesorvideos
is a ubiquitous problem with many applications. For in-
Counting objects is a fundamental image processing primi-
stance, biologists are often interested in counting the num-
tive, and has many scientific, health, surveillance, security,
berofcellcoloniesinperiodicallycapturedphotographsof
and military applications. Existing supervised computer vi-
sion techniques typically require large quantities of labeled petridishes;countingthenumberofindividualsatconcerts
training data, and even with that, fail to return accurate re- ordemonstrationsisoftenessentialforsurveillanceandse-
sultsinallbutthemoststylizedsettings.Usingvanillacrowd- curity(Liuetal.2005);countingisoftennecessaryinmili-
sourcing, on the other hand, can lead to significant errors, taryapplications;countingnervecellsortumorsisstandard
especially on images with many objects. In this paper, we practice in medical applications (Loukas et al. 2003); and
presentourJellyBeansuiteofalgorithms,thatcombinesthe countingthenumberofanimalsinphotographsofpondsor
best of crowds and computer vision to count objects in im-
wildlife sanctuaries is often essential for animal conserva-
ages,andusesjudiciousdecompositionofimagestogreatly
tion(Russelletal.1996).Inmanyofthesescenarios,making
improveaccuracyatlowcost.Ouralgorithmshaveseveralde-
errorsincountingcanhaveunfavorableconsequences.Fur-
sirableproperties:(i)theyaretheoreticallyoptimalornear-
thermore,countingisaprerequisitetoother,morecomplex
optimal,inthattheyaskasfewquestionsaspossibletohu-
mans (under certain intuitively reasonable assumptions that computer vision problems requiring a deeper, more com-
wejustifyinourpaperexperimentally);(ii)theyoperateun- pleteunderstandingofimages.
derstand-aloneorhybridmodes,inthattheycaneitherwork Countingishardforcomputers. Unfortunately, current su-
independentofcomputervisionalgorithms,orworkincon- pervisedcomputervisiontechniquesaretypicallyverypoor
cert with them, depending on whether the computer vision
atcountingforallbutthemoststylizedsettings,andcannot
techniques are available or useful for the given setting; (iii)
berelieduponformakingstrategicdecisions.Inourconver-
theyperformverywellinpractice,returningaccuratecounts
sationswithmicrobiologists,theyreportthattheydonotrely
onimagesthatnoindividualworkerorcomputervisionalgo-
on automated counts (Carpenter et al. 2006), and will fre-
rithmcancountcorrectly,whilenotincurringahighcost.
quently spend hours painstakingly counting and cataloging
photographs of cell colonies. The computer vision tech-
1 Introduction
niques primarily have problems with occlusion, i.e., iden-
The field of computer vision (Forsyth and Ponce 2003; tifying objects that are partially hidden behind other ob-
Szeliski 2010) concerns itself with the understanding and jects.Asanexample,considerFigure1,depictingtheperfor-
interpretationofthecontentsofimagesorvideos.Manyof manceofarecentpre-trainedfacedetectionalgorithm(Zhu
the fundamental problems in this field are far from solved, andRamanan2012).Thealgorithmperformspoorlyforoc-
witheventhestate-of-the-arttechniquesachievingpoorre- cludedfaces,detectingonly35outof59(59.3%)faces.The
sults on benchmark datasets. For example, the recent tech- average precision for the state-of-the-art person detector is
niques for image categorization achieve average precision only46%(Everinghametal.2014).Furthermore,thesetech-
ranging from 19.5% (for the chair class) to 65% (for the niquesarenotgeneralizable;separatemodelsareneededfor
airplane class) on a canonical benchmark (Everingham eachnewapplication.Forinstance,ifinsteadofwantingto
et al. 2014). Another study (Nguyen, Yosinski, and Clune count the number of faces in a photograph, we needed to
2015)hasshownthateventhestate-of-the-artdeep-network countthenumberofwomen,orthenumberofpeoplewear-
methods are susceptible to high-confidence erroneous la- ing hoodies, we would need to start afresh by training an
bels, recognizing objects within images containing no ob- entirelynewmodel.
jectsvisibletothehumaneye.
Evenhumanshavetroublecounting. While humans are
Counting is one such fundamental image understanding
muchbetteratcountingthanautomatedtechniques,andare
problem, and refers to the task of counting the number of
goodatdetectingoccluded(hidden)objects,andnotmissing
itemsofaparticulartypewithinanimageorvideo.
them while counting, as the number of objects in the im-
Copyright©2015,AssociationfortheAdvancementofArtificial ageincreases,theystartmakingmistakes.Psychologystud-
Intelligence(www.aaai.org).Allrightsreserved. ies have estimated that human beings have a memory span
age.However,itisnotclearwhenorhowweshoulddivide
animage,orwheretofocusourattentionbyassigningmore
workers. For example, we cannot tell a-priori if all the cell
coloniesareconcentratedintheupperleftcorneroftheim-
age.Figuringoutthegranularityofthissub-divisionscheme
is another problem. Once it has been decided that a partic-
ular image region needs to be divided, how many portions
should it be divided into? Another challenge is to divide
Figure1:ChallengingimageforMachineLearning an image while being cognizant of the fact that you may
6 cut across objects during the division. This could result in
5 double-counting some objects across different sub-images.
Error 4 Lastly,giventhattheremaybecomputervisionalgorithms
ge 3 thatcanprovideusagoodstartingpointfordecidingwhere
a
Aver 2 tionffoorcmuastoiounrainttethnetiobnes,tywetaaynpoothsesirbclhea.llengeistoexploitthis
1
0 Adaptivitytotwomodes.Inthespiritofcombiningthebest
0 10 20 30 40 50 60 70 80 of human worker and computer expertise, when available,
Number of Objects
Figure2:WorkerError wedevelopalgorithmsthatarenear-optimalfortwoseparate
regimesormodes:
of 7 chunks of information at a time (Miller 1956). While • First, assuming we have no computer vision assistance
countingobjectsisasimplerproblemthanrememberingob- (i.e.,nopriorcomputervisionalgorithmthatcouldguide
jects,weexpectasimilar(butpossiblyhigher)thresholdfor ustowheretheobjectsareintheimage),wedesignan
counting—thisisbecauseworkersmayendupcountingthe algorithm that will allow us to narrow our focus to the
sameobjectmultipletimesornotcountthematall.Toob- right portions of the image requiring special attention.
serve this behavior experimentally, we had workers count The algorithm, while intuitively simple to describe, is
the number of cell colonies in simulated fluorescence mi- theoretically optimal in that it achieves the best possi-
croscope images with a wide range of counts. We plot the blecompetitiveratio,undercertainassumptions.Atthe
resultsinFigure2,displayingtheaverageerrorincount(on sametime,inpractice,onarealcrowd-countingdataset,
the y-axis) versus the actual count (on the x-axis). As can thecostofouralgorithmiswithin2.75×oftheoptimal
beseeninthefigure,crowdworkersmakefewmistakesun- “oracle” algorithm that has perfect information, while
tilthenumberofcellshit20or25,afterwhichtheaverage stillmaintainingveryhighaccuracy.
error increases. In fact, when the number of cells reaches • Second, if we have primitive or preliminary computer
75,theaverageerrorincountisasmuchas5.(Therearein vision algorithms that provide segmentation and prior
factmanyimageswithevenhighererrors.)Therefore,sim- count information, we design algorithms that can use
ply showing each image to one or more workers and using thisknowledgetoonceagainidentifytheregionsofthe
thosecountsisnotusefulifaccuratecountsaredesired(as image to focus our resources on, by “fast-forwarding”
wewillshowlater). totherightareas.Weformulatetheproblemasagraph
Theneedforahybridapproach. Thus, since both humans binningproblem,knowntobeNP-COMPLETEandpro-
videanefficientarticulation-pointbasedheuristicforthis
and computers have trouble with counting, there is a need
problem. We show that in practice, on a real biological
foranapproachthatbestcombineshumanandcomputerca-
celldataset,ouralgorithmhasaveryhighaccuracy,and
pabilities to count accurately while minimizing cost. These
only incurs 1.3× the cost of the optimal, perfect infor-
techniques would certainly help with the counting problem
mation“oracle”algorithm.
instanceathand—thealternativeofhavingabiology,secu-
We dub our algorithms for these two regimes as the Jelly-
rity, military, medical, or wildlife expert count objects in
Beanalgorithmsuite,asahomagetooneoftheearlyappli-
each image can be error-prone, tedious, and costly. At the
cationsofcrowdcounting1.
sametime,thesetechniqueswouldalsoenablethecollection
Here is the outline for the rest of the paper as well as our
oftrainingdataatscale,andtherebyspurthegenerationof
contributions(WedescriberelatedworkinSection6.)
evenmorecapablecomputervisionalgorithms.Tothebest
of our knowledge, we are the first to articulate and make • Wemodelimagesastreeswithnodesrepresentingimage
segmentsandedgesrepresentingimage-division.Given
concrete steps towards solving this important, fundamental
thismodel,wepresentanovelformulationofthecount-
visionproblem.
ing problem as a search problem over the nodes of the
Keyidea:judiciousdecomposition. Our approach, inspired
tree(Section2).
by Figure 2, is to judiciously decompose an image into
• We present a crowdsourced solution to the problem of
smallerones,focusingworkerattentionontheareasthatre-
countingobjectsoveragivenimage-tree.Weshowthat
quire more careful counting. Since workers have been ob-
served to be more accurate on images with fewer objects, 1Countingorestimatingthenumberofjellybeansinajarhas
thekeyideaistoobtainreliablecountsonsmaller,targeted beenapopularactivityinfairssince1900s,whilealsoservingas
sub-images,andusethemtoinfercountsfortheoriginalim- unfortunatevehiclefordisenfranchisement(NBCNews2005).
G=(V,E) SegmentationTree
V V0 RootNode
0 Image(Vi) Image/SegmentcorrespondingtonodeVi
F Frontier
TrueCount(Vi) ActualnumberofobjectsinImage(Vi)
WorkerCount(Vi) Worker estimate of number of objects in
Image(Vi)
V1 V2 Ci ChildrenofnodeVi
b Fanoutofthesegmentationtree
d∗ Countingthresholdforworkererrors
V V V5 V6 V7 GP =(VP,EP) PartitionGraph
3 4 Table1:NotationTable
Figure3:SegmentationTree
{V ,V }. V , in turn, can be split into segments {V ,V },
1 2 1 3 4
and V into {V ,V ,V }. Intuitively, the root image can be
2 5 6 7
under reasonable assumptions, our solution is provably thought of as physically segmented into the five leaf nodes
optimal(Section3). {V ,V ,V ,V ,V }.
3 4 5 6 7
• Weextendtheabovesolutiontoahybridschemethatcan
We assume that all segments of a node are non-
work in conjunction with computer vision algorithms,
overlapping. That is, given any node V and its imme-
i
leveraging prior information to reduce the cost of the
diate set of children C , we have (1) Image(V ) =
i i
crowdsourcing component of our algorithm, while sig- (cid:83)
Image(V ) and (2) Image(V ) ∩ Image(V ) =
nificantlyimprovingourcountestimates(Section4). Vj∈Ci j j k
φ∀V ,V ∈ C We denote the actual number of objects
• We validate the performance of our algorithms against j k i
inasegment,Image(V ),byTrueCount(V ).Asignificant
credible baselines using experiments on real data from i i
challengeistoensurethateachobjectappearsinexactlyone
twodifferentrepresentativeapplications(Section5).
segmentinC .WewilldescribehowwehandlethisinSec-
i
tion 3.4. Our assumption of non-overlapping splits ensures
2 Preliminaries (cid:80)
thatTrueCount(V )= TrueCount(V ).
i Vj ∈Ci j
Inthissection,wedescribeourdatamodelfortheinputim-
Our first constraint (1) above ensures that each object is
ages and our interaction model for worker responses. We
countedatleastonce,whilethesecondconstraint(2)ensures
providealistofnotationsinTable1foreasyreference.
thatnoneoftheobjectsarecountedmorethanonce.
Oneofthemajorchallengesofthecountingproblemisto
2.1 DataModel
estimatetheseTrueCountvalueswithhighaccuracy,byus-
Givenanimagewithalargenumberof(possiblyheteroge- ing elicited worker responses. Given the segmentation tree
nous) objects, our goal is to estimate, with high accuracy, GforimageV ,wecanaskworkerstocount,possiblymul-
0
the number of objects present. As noted above in Figure 2, tiple times, the number of objects in any of the segments.
humans can accurately count up to a small number of ob- For example, in Figure 3, we can ask workers to count the
jects,butmakesignificanterrorsonimageswithlargernum- number of objects in the segments (V ), (V ), (V ), (V ),
3 4 5 6
bers of objects. To reduce human error, we split the image (V ), (V = V ∪ V ), (V = V ∪ V ∪ V ),(V =
7 1 3 4 2 5 6 7 0
into smaller portions, or segments, and ask workers to esti- V ∪V ∪V ∪V ∪V ). While we can obtain counts for
3 4 5 6 7
matethenumberofobjectsineachsegment.Naturally,there differentnodesinthesegmentationtree,weneedtoconsoli-
aremanywayswemaysplitanimage.Wediscussourpre- datethesecountstoafinalestimateforV .Tohelpwiththis,
0
cise algorithms for splitting an image into segments subse- weintroducetheideaofafrontier,whichiscentraltoallour
quently.Fornow,weassumethatthesegmentationisfixed. algorithms.Intuitively,afrontierF isasetofnodeswhose
We represent a given image and all its segments in the correspondingsegmentsdonotoverlap,andcovertheentire
formofadirectedtreeG = (V,E),calledasegmentation originalimage,Image(V )onmerging.Weformallydefine
0
tree.Theoriginalimageistherootnode,V0,ofthetree.Each thisnotionbelow.
node V ∈ V,i ∈ {0,1,2,...} corresponds to a sub-
i Definition 2.1 (Frontier). Let G = (V,E) be a segmen-
image,denotedbyImage(V ).WecallanodeV asegment
i j tation tree with root node V . A set of k nodes given by
of V if Image(V ) is contained in Image(V ). A directed 0
i j i F = {V ,V ,...,V },whereV ∈ V∀i ∈ {1,...,k}is
edge exists between nodes Vi and Vj : (Vi,Vj) ∈ E, if afrontier1of2sizekifkImage(V )i=(cid:83) Image(V ),and
and only if Vi is the lowest node in the tree, i.e. smallest 0 Vi∈F i
Image(V )∩Image(V )=φ∀V ,V ∈ F
image, such that V is a segment of V . Note that not all i j i j
j i
segments need to be materialized to actual images — the A frontier F is now a set of nodes in the segmentation
treeisconceptualandcanbematerializedondemand. For tree such that taking the sum of TrueCount(·) over these
brevity, we refer to the set of children of node Vi (denoted nodes returns the desired count estimate TrueCount(V0).
as Ci) as the(cid:83)split of Vi. If Ci = {V1,··· ,Vs}, we have Note that the set containing just the root node, {V0} is a
Image(Vi)= j∈{1,···,s}Image(Vj). trivialfrontier.ContinuingwithourexampleinFigure3,we
Forexample,considerthesegmentationtreeinFigure3. have the following five possible frontiers: {V }, {V ,V },
0 1 2
The original image, V , can be split into the two segments {V ,V ,V ,V },{V ,V ,V },and{V ,V ,V ,V ,V }.
0 1 5 6 7 2 3 4 3 4 5 6 7
2.2 WorkerBehaviorModel 3.2 TheFrontierSeekingAlgorithm
Intuitively,workersestimatethenumberofobjectsinanim- Our algorithm is based on the simple idea that to estimate
age correctly if the image has a small number of objects. the number of objects in the root node, we need to find a
Asthenumberofobjectsincreases,itbecomesdifficultfor frontier with all nodes having fewer than d∗ objects. This
humans to keep track of which objects have been counted. isbecausetheelicitedWorkerCountsaretrustworthyonly
As a result, some objects may be counted multiple times, at the nodes that meet this criteria. We call such a frontier
whereasothersmaynotbecountedatall.Basedontheex- a terminating frontier. If we query all nodes in such a ter-
perimental evidence in Figure 2, we hypothesize that there minating frontier, then the sum of the worker estimates on
isathresholdnumberofobjects,abovewhichworkersstart those nodes is in fact the correct number of objects in the
to make errors, and below which their count estimates are rootnode,givenourmodelofworkerbehavior.Notethatter-
accurate. Let this threshold be d∗. So, in our interface, we minating frontiers are not necessarily unique for any given
asktheworkerstocountthenumberofobjectsinthequery imageandsegmentationtree.
image.Iftheirestimate,d,islessthand∗,theyprovidethat Definition 3.1. A frontier F is said to be terminating if
estimate. If not, they simply inform us that the number of ∀V ∈ F,TrueCount(V)≤d∗
objectsisgreaterthand∗.Thisallowsustocaptheamount
If a node V has greater than d∗ objects, then we cannot
of work done by the workers and ensure that we pay them
estimate the number of objects in its parent node, and con-
a minimum wage–the workers can count as far as they are
sequentlytherootnode,withoutqueryingV’schildren.Our
willing to correctly, and if the number of objects is, say, in
Algorithm 1, FrontierSeeking(G), depends on this ob-
the thousands, they may just inform us that this is greater
servation for finding a terminating frontier efficiently, and
than d∗ without expending too much effort. We denote the
correspondinglyobtainingacountfortherootnode,V .The
worker’sestimateofTrueCount(V)byWorkerCount(V). 0
algorithmsimplyqueriesnodesinatop-downexpansionof
(cid:26) TrueCount(V) :TrueCount(V)≤d∗ the segmentation tree, for example, with a breadth-first or
WorkerCount(V)=
>d∗ :TrueCount(V)>d∗ depth-first search. For each node, we query its children if
andonlyifworkersreportitscountasbeinghigherthanthe
Based on Figure 2, the threshold d∗ is 20. We provide fur-
threshold d∗. We continue querying nodes in the tree, only
ther experimental verification for this error model in Ap-
stoppingourexpansionatnodeswhosecountsarereported
pendixB.Whilewecouldchoosetousemorecomplexerror
assmallerthand∗,untilwehavequeriedallnodesinater-
models,wefindthattheabovemodeliseasytoanalyzeand
minatingfrontier.Wereturnthesumofthereportedcounts
experimentallyvalid,andthereforesufficesforourpurposes.
ofnodesinthisterminatingfrontierasourfinalestimate.
3 Crowdsourcing-OnlySolution
Algorithm1FrontierSeeking(G)
In this section, we consider the case when we do not have
acomputervisionalgorithmatourdisposal.Thus,wemust Require: SegmentationTreeG=(V,E)
useonlycrowdsourcingtoestimateimagecounts.Sinceitis Ensure: TerminatingFrontierF,Actualcountcount
oftenhardtotraincomputervisionalgorithmsforeverynew F ←φ
typeofobject,thisisascenariothatoftenoccursinpractice. count←0
AshintedatinSection2,theideabehindouralgorithmsis createaqueueQ
simple:weaskworkerstoestimatethecountatnodesofthe Q.enqueue(V0){AddrootV0toQ}
segmentationtreeinatop-downexpansion,untilwereacha whileQisnotemptydo
frontier such that we have a high confidence in the worker V ←Q.dequeue()
estimatesforallnodesinthatfrontier. Query(V)
ifWorkerCount(V)>d∗then
3.1 ProblemSetup Q.enqueue(children(V))
else
Wearegivenafixedb-arysegmentationtreei.e.atreewith
F ←F ∪{V}
each non-leaf node having exactly b children. We also as-
endif
sume that each object is present in exactly one segment
endwhile
across siblings of a node, and that workers follow the be-
haviormodelfromSection2.2.Someoftheseassumptions
maynotalwaysholdinpractice,andwediscusstheirrelax-
3.3 Guarantees
ationslaterinSection3.4.
Forbrevity,wewillrefertodisplayinganimagesegment WenowprovetheoptimalitythatourAlgorithm1provides
(node in the segmentation tree) and asking a worker to es- underourproposedmodel.Givenanimageanditssegmen-
timate the number of objects, as querying the node. Our tation tree, let F∗ be a terminating frontier of the smallest
problemcanbethereforerestatedasthatoffindingtheexact possiblesize,havingk nodes.Althoughtherearemanyter-
numberofobjectsinanimagebyqueryingasfewnodesof minatingfrontiers,thereexistsauniqueminimalterminating
thesegmentationtreeaspossible.Next,wedescribeoural- frontier with k nodes, such that all other terminating fron-
gorithm on this setting in Section 3.2, and give complexity tiers,F,have|F| > k.Ourgoalistofindtheminimalter-
andoptimalityguaranteesinSection3.3. minatingfrontierwithasfewqueriednodesaspossible.
First, we describe the complexity of a baseline optimal tree where our algorithm A queries fewer nodes, and A
FS
oracle algorithm using the following lemma, whose proof queries at least b k, where k is the size of the minimal
b−1
followstriviallyfromtheobservationthatweneedtoquery terminatingfrontier.
at least one complete terminating frontier to obtain a count
fortherootnodeofthetree. 3.4 PracticalSetup
Lemma3.1. Thenumberofquestionsrequiredtogetatrue Inthissectionwediscusssomeofthepracticaldesignchal-
countofthenumberofobjectsinthegivenimageis≥k. lenges faced by our algorithm and give a brief overview
ofourcurrentmechanismsforaddressingthesechallenges.
Thus k is the minimum number of questions that an algo-
Whileourcurrentmechanismssufficeforderivingaccurate
rithm with prior knowledge of a minimal terminating fron-
results,itremainstobeseeniffurtherexplorationintothese
tier requires to determine the true count, if it were only al-
choices would give greater benefits—this forms an impor-
lowedtoaskquestionsatnodesofthetree.Weusethisasthe
tantdirectionforourfutureworkinthisspace. Furtherde-
baselinetocompareagainstonlinealgorithms,thatis,algo-
tailscanbefoundinSection5.
rithmswhichhavenopriorknowledgeanddiscovercounts
ofsegmentsonlythroughtheprocessofqueryingthem.To Workererror.Sofar,wehaveassumedthathumanworker
measure the performance of an algorithm against this opti- countsareaccuratefornodeswithfewerthand∗objectsper-
mal baseline, we use a competitive ratio analysis (Borodin mitting us to query each node just a single time. However,
1998).LetA (G)denotethesequenceofquestionsasked, this is not always the case in practice. In reality, workers
FS
ornodesqueried,byanonlinedeterministicalgorithmAon may make mistakes while counting images with any num-
segmentationgraphG.Let|A (G)|bethecorresponding berofobjects(andweseethismanifestinourexperiments
FS
numberofquestionsasked.Fortheoptimaloraclealgorithm aswell).So,inouralgorithms,weshoweachimageornode
OPT, |OPT(G)| = k where k is the size of the mini- to multiple (five in our experiments) workers and aggre-
mal terminating frontier of G. For brevity, we also denote gate their answers via median to obtain a count estimate
FrontierSeekingalgorithmbyA . for that node. We observe that although individual work-
FS
ers can make mistakes, our aggregated answers satisfy our
Definition 3.2 (Competitive ratio). Let G be the set of all
assumptions in general (e.g., that the aggregate is always
segmentation trees with fanout b. The competitive ratio of accurate when the count is less than d∗). While we use a
analgorithmAisgivenbyCR(A)=max |A(G)| . primitiveaggregationschemeinthiswork,itremainstobe
G∈G |OPT(G)|
seen if more advanced aggregation schemes, such as those
Intuitively,thecompetitiveratioofanalgorithmisamea-
in (Karger, Oh, and Shah 2011; Parameswaran et al. 2012;
sure of its worst case performance against the optimal or-
Sheng,Provost,andIpeirotis2008)wouldleadtobetterre-
acle algorithm. Note that since our segmentation graphs
sults;weplantoexploretheseschemesinfuturework.
represent real segmentations on images, the true counts
Segmentation tree. So far, we have also assumed that a
on nodes are constrained by TrueCount(parentnode) =
(cid:80) segmentation tree with fanout b is already given to us. In
TrueCount(childrennodes). This means that when
practice,weareoftenonlygiventhewholeimage,andhave
queryingnodesinsegmentationtrees,onlyanswersthatare
to create the segmentation tree ourselves. In our setup, we
consistentwithourworkerbehaviormodeloversomecon-
create a binary segmentation tree (b = 2) where the chil-
sistent assignment of TrueCounts will be observed. We
drenofanynodearecreatedbysplittingtheparentintotwo
now show that our Algorithm 1 has a constant competitive
halves along its longer dimension. As we will see later on,
ratio that depends only on the fanout of the given segmen-
this choice leads to accurate results. While our algorithms
tation tree. The proof, which basically finds a tight upper-
also apply to segmentation trees of any fanout; further in-
bound to the number of nodes queried by our algorithm, is
vestigationisneededtostudytheeffectofbonthecostand
somewhatstraightforward,andcanbefoundinAppendixA.
accuracyoftheresults.
Theorem3.1. LetGbethesetofallsegmentationtreeswith
Segmentboundaries.Wehaveassumedthatobjectsdonot
fanoutb.Wehave,CR(A )= b overG.
FS b−1 cross segmentation boundaries, i.e., each object is present
Thefollowingtheorem(combinedwiththepreviousone) in exactly one leaf node, and cannot be partially present in
statesthatouralgorithmachievesthebestpossiblecompeti- multiple siblings. Our segmentation does not always guar-
tiveratioacrossallonlinedeterministicalgorithms. anteethis.Tohandlethiscornercase,inourexperimentswe
specifythenotionofa“majority”objecttoworkerswiththe
Theorem3.2. LetAbeanyonlinedeterministicalgorithm
help of an example image, and ask them to only count an
that computes the correct count for every given input seg-
objectforanimagesegmentifthemajorityofitispresentin
mentationtreeGwithfanoutb.Then,CR(A)≥( b ).
b−1 thatsegment.Onceagain,wefindthatthisleadstoaccurate
Theproofofthistheoremislessstraightforward,andcan resultsinourpresentexperiments.
befoundinAppendixA.Thehigh-levelideaisthefollow- Notethatthisnotionof“majorityobject”ishardertode-
ing: We first show that “reasonable” algorithms, i.e., those fineforsegmentationtreeswithfanouthigherthan2,asthe
that query a node only if the parent is queried, and do not sameobjectcouldbepresentin1,2,3or4imagesegments.
continuequeryingbelownodesthatarealreadyknowntobe Tocheckfor“majority”,theworkerwouldhavetolookatall
≤ d∗ query the same nodes as our algorithm. For any “un- segments containing a portion of the object simultaneously
reasonable” algorithm A, we can construct a segmentation and gauge which contains the largest chunk. An alternate
that these priors are only approximate estimates, and still
need to be verified by workers. We discuss details of our
partitioningalgorithm,priorcountestimation,andotherim-
plementationdetailslaterinSection4.3.
Weusethesegeneratedpartitionsandpriorcountstode-
fineapartitiongraphasfollows:
Definition4.1(PartitionGraph). Givenanimagesplitinto
thesetofpartitions,V ,wedefineitspartitiongraph,G =
(a) (b) P P
(V ,E ),asfollows.Eachpartition,u ∈ V ,isanodein
Figure4:Biologicalimage(a)beforeand(b)afterpartitioning P P P
the graph and has a weight associated with it equal to the
prior, w(u) = du. Furthermore, an undirected edge exists
way to handle this could be to instead just ask the worker
betweentwonodes,(u,v) ∈ E ,inthegraphifandonlyif
topopulateatableforeachdisplayedimagecontainingtwo P
thecorrespondingpartitions,u,v,areadjacentintheorigi-
columns: (a) Number of boundaries crossed, and (b) Num-
nalimage.
ber of objects crossing these many boundaries. This table
could then be used to avoid the double-counting of objects Notice that while we have used one partitioning scheme
whenaggregatingcountsacrossdifferentimagesegments. and one prior count estimation technique for our example
here,othermachinelearningorvisionalgorithmsforthis,as
As mentioned earlier, while our present design decisions
well as other settings provide similar information that will
areadequateforhighqualityresults,acompleteexploration
allowustogeneratesimilarpartitiongraphs.Thus,theset-
of the effect of aggregations schemes, fanout, and objects
ting where we begin with a partition graph is general, and
crossingsegmentboundariesisleftforfuturework.Were-
appliestootherscenarios.
visitthesedesigndecisionsinSection5.
Now, given a partition graph, one approach to counting
thenumberofobjectsinthecorrespondingimagecouldbe
4 IncorporatingComputerVision
tohaveworkerscounteachpartitionsindividually.Thenum-
Unliketheprevioussection,whereweassumedafixedseg- ber of partitions in a partition graph is, however, typically
mentation tree, here, we use computer vision techniques very large, making this approach impractical. For instance,
(wheneasilyavailable)tohelpbuildthesegmentationtree, mostofthe5–6partitionsclosetothelowerrighthandcor-
andusecrowdstosubsequentlycountsegmentsinthistree. neroftheimageabovehavepreciselyonecell,anditwould
Forcertaintypesofimages,existingmachinelearningtech- bewastefulandexpensivetoaskahumantocounteachone
niquesgivetwothings:(1)apartitioningofthegivenimage individually. Next, we discuss an algorithm to merge these
suchthatnoobjectispresentinmultiplepartitions,and(2) partitionsintoasmallernumber,tominimizethenumberof
apriorcountestimateofthenumberofobjectsineachpar- humantasks.
tition.Whilethesepriorcountsarenotalwaysaccurateand
stillneedtobeverifiedbyhumanworkers,theyallowusto 4.2 MergingPartitions
skipsomenodesintheimplicitsegmentationtreeand“fast- Givenapartitiongraphcorrespondingtoanimage,welever-
forward” to querying lower nodes, thereby requiring fewer agethepriorcountsonpartitionstoavoidthetop-downex-
humantasks.Notethatthisisadeparturefromourtop-down pansion of segmentation trees described in Section 3. In-
searchbasedapproachdiscussedinSection3.Weformulate stead, we infer the count of the image by merging its par-
ourideaoffast-forwardingthroughthesegmentationtreeas titions together into a small number of bins, each of which
a merging problem and discuss it in greater detail in Sec- canbereasonablycountedbyworkers,andaggregatingthe
tion4.2. countsacrossbins.
Mergingproblem.Intuitively,theproblemofmergingpar-
4.1 Partitioning
titionsisequivalenttoidentifyingconnectedcomponents(or
Asarunningexample,weconsidertheapplicationofcount- bins) of the partition graph, with total weight (or count) at
ing cells in biological images. Figure 4a shows one such mostd∗.Sinceworkersareaccurateonimageswithsizeup
image, generated using SIMCEP, a tool for simulating flu- tod∗,wecanthenelicitworkercountsforourmergedcom-
orescencemicroscopyimagesofcellpopulations(Lehmus- ponents and aggregate them to find the count of the whole
sola et al. 2007). SIMCEP is the gold standard for testing image.Overall,wehavethefollowingproblem:
algorithmsinmedicalimaging,providingmanytunablepa-
Problem4.1(MergingPartitions). Givenapartitiongraph
rametersthatcansimulaterealworldconditions.Weimple-
G = (V ,E ) of an image, partition the graph into k
ment one simple partitioning scheme that splits any given P P P
disjointconnectedcomponentsinG ,suchthatthesumof
such cell population image into many small, disjoint par- P
nodeweightsineachcomponentislessthanorequaltod∗,
titions. Applying this partitioning scheme to the image in
andkisassmallaspossible.
Figure4ayieldsFigure4b.Combinedwiththepartitioning
scheme above, we can leverage existing machine learning Inotherwords,givenapartitiongraphG =(V ,E )of
P P P
(ML) techniques to estimate the number of objects in each animage,wewishtofindmdisjointconnectedcomponents
of the partitions. We denote these ML-estimated counts on inG ,suchthatthesumofnodeweightsineachcomponent
P
eachpartition,u,aspriorcountsorsimplypriors,du.Note isclosetod∗.Enforcingdisjointcomponentsensuresthatno
mergingitsmemberpartitions:ifthepriorcountsareaccu-
rate,theseimagesegmentstogethercompriseaminimalter-
minatingfrontierforsomesegmentationtree.Whileinprac-
tice, they need not necessarily form a minimal terminating
frontier, or indeed even a terminating frontier, we observe
thattheyprovideverygoodapproximationsforone.
Relationship between Partition graphs and Segmenta-
Figure5:ArticulationPoint tiontrees.Sofarwehavelookedattheproblemofcounting
givenapartitiongraphasasanindependentmergingprob-
components overlap over a common object, thereby avoid- lem.Inthissection,wediscusshowthisproblemisrelated
ing double-counting. Furthermore, restricting our search to tothatofcountingonasegmentationtreedescribedinSec-
connectedcomponentsensuresthatourdisplayedimagesare tion 3. First, we note that since partitions are atomic seg-
contiguous — this is a desirable property for images dis- ments that we do not split further, they can be thought of
played to workers over most applications, because it pro- as the leaf nodes of a segmentation tree for a given image.
videsuseful,necessarycontexttounderstandtheimage. Observethatfixingthepartitiongraphdoesnotnecessarily
Hardness and reformulation. The solution to the above fix a segmentation tree – there can be many segmentation
problemwouldgiveustherequiredmerging.However,the treeswiththesamesetofleafnodes(partitions).Intuitively,
problemdescribedabovecanbeshowntobeNP-Complete, eachintermediate(non-leaf)nodeinanysuchsegmentation
using a reduction from the NP-Complete problem of parti- tree is the union of a group of adjacent partitions (or leaf
tioningplanarbipartitegraphs(DyerandFrieze1985);our nodes), and its prior count estimate is the sum of the pri-
setting uses arbitrary planar graphs, and so our problem is orsofitsconstituentpartitions.Notethatbyonlytakingthe
moregeneral.Thus,wehave: union of adjacent partitions to form intermediate nodes of
the segmentation tree, we restrict ourselves to trees where
Theorem4.1(Hardness). Problem4.1isNP-COMPLETE.
eachnode,orimage,iscontiguous.
Weconsideranalternativeformulationfortheabovebal-
Ifthepriorcountonanintermediatenodeisgreaterthan
anced partitioning problem based on the idea that given an
d∗,wecannowdirectlyqueryalowernodeinthesegmen-
upperboundonthesizeofeachcomponent,balancingcom-
tation tree, without verifying this image against the crowd.
ponent sizes is intuitively similar to finding the minimum
Intuitively, this is exactly what we do when we display
number of covering components satisfying the size con-
our merged components to workers — we are skipping the
straint.Thatis,giventotalweightonthegraphofsayD,and
nodes for which querying them would anyway yield incor-
upperboundoncomponentweightd∗,theproblemofparti-
rect answers resulting in image splitting. This is equiva-
tioning the graph into roughly even sized components with
lent to jumping ahead to a lower layer of the segmentation
weightsascloseto(butnotexceeding)d∗willyieldroughly
tree,onethatispotentiallyveryclosetoaterminatingfron-
D/d∗ components.Conversely,theproblemofpartitioning
tier. Also note that jumping too far ahead (segments hav-
thesegmentationgraphintothesmallestsetofcomponents,
ing counts (cid:28) d∗), while yielding accurate counts on each
each upper bounded by size d∗, will yield roughly D/d∗
node,willresultinthecountingofalargenumberofnodes,
even sized components. We formalize this problem of par-
whichwillcorrespondinglyincreasethecostofcounting.As
titioningthesegmentationgraphintothenumberofdisjoint
acompromise,wewishtojumpaheadtonodesthatideally
connectedcomponentsboundedbysized∗ below.Notethat
eachhavepriorcountsjustsmallerthanorequaltod∗.This
whilethis problemisstillNP-COMPLETE,asweseebelow,
motivatesourmergingproblem4.2.
itismoreconvenienttodesignheuristicalgorithmsforit.
Given the hardness of this modified merging problem,
Problem 4.2 (Modified Merging Problem). Let d =
max we now discuss good heuristics for it, and provide theo-
max du, u ∈ V be the maximum partition weight in the
u P retical and experimental evidence in support of our algo-
partition graph G = (V ,E ). Split G into the small-
P P P P rithms. Note that while we discuss the complexity of our
estnumberofdisjoint,connectedcomponentssuchthatfor
merging algorithms, in practice their running time is small
eachcomponent,thesumofitspartitionweightsisatmost
compared to the latency of human workers. It is still im-
k×d .
max practicaltorunabruteforcealgorithmtosolvethemerging
Bysettingk ≤d∗/d intheaboveproblem,wecanfind
max problem optimally – fortunately, our second, relatively so-
connected components whose priorcounts areestimated to
phisticatedheuristic,ArticulationAvoidance,described
beatmostd∗.Observethathere,althoughwedonotstartout
below achieves the theoretical optimum for most problem
withasegmentationtree,thepartitionsprovidedbythepar-
instancesandisveryclosetoitfortherest.
titioningalgorithmcanbethoughtofasleafnodesofaseg-
mentationtreeandourmergedcomponentsformparents,or FirstCut Algorithm. One simple, natural approach to
ancestorsoftheleafnodes.Wecommentontherelationship Problem 4.2, motivated by the first fit approximation to
betweenthepartitiongraphs(discussedinthissection)and the Bin-Packing problem (Coffman Jr, Garey, and Johnson
thesegmentationgraphs(discussedinSection3)ingreater 1996), is to start a component with one partition, and in-
detailbelow. crementallyaddneighboringpartitionsone-by-oneuntilno
Each component producedvia a solution of Problem4.2 more partitions can be added without violating the upper
also corresponds to an actual image segment formed by bound,k×d onthesumofvertexweights.Wereferto
max
thisastheFirstCutalgorithm,anddetailitinAlgorithm2. belookedatforthatcomponentagain. Soeachcomponent
takesatmostO(n)timetoform,wherenisthetotalnumber
ofpartitions.Intheworstcase,wheneverycomponentisan
Algorithm2FirstCut(G ) individualnode(forexampleifthepartitiongraphwasastar
P
witheverynodeconnectedtojustonecommonlargecentral
Require: PartitionGraphG =(V ,E )
P P P node), the number of components formed could at most be
Ensure: SetS ofcomponents
P O(n), resulting in an overall complexity of O(n2) for the
V ←V
copy P FirstCutalgorithm.
S ←φ
P
whileVcopy isnotemptydo While the number of components formed could at most
newComponent←φ be O(n), in practice, given an upper bound on component
ChooseV ∈Vcopy sizeofkdmax,weobservethatthisnumberiscloseton/k
newComponent←newComponent∪V resultinginaneffectiveruntimeofO(n2/k).Also,ifwere-
N←Vcopy.neighbors(V) laxtherequirementofonlymergingadjacentpartitions,then
Vcopy.removeNode(V) the worst case running time of our algorithm is O(n2/k)
whileSumofvertexweightsinnewComponent<k× since (as discussed above) every component will have size
dmaxdo lyingbetween(k−1)dmaxandkdmaxresultinginO(n/k)
ChooseV ∈N components.
newComponent←newComponent∪V Next,wedescribeanalgorithmthatbuildsonthebinning
N←N∪Vcopy.neighbors(V) ideas used in Algorithm 2, while also efficiently handling
N.removeNode(V) casesliketheoneshowninFigure5.
V .removeNode(V)
copy ArticulationAvoidanceAlgorithm. Recallthatap-
endwhile
plyingourfirstcutproceduretoFigure5resultsinpoorqual-
S ←S ∪{newComponent}
P P ity components if we merge partitions B...F to A before
endwhile
G.Intuitively,whenaddingB tothecomponentcontaining
A, the partition graph is split into two disconnected com-
Consider running this algorithm on a complete partition ponents:onecontainingG,andanothercontainingC...F.
graph(everypartitiontoucheseveryotherpartition).Byre- Givenourconstraintrequiringconnectedcomponents(con-
peatingthisproceduretotermination,(1)eachofthemerged tiguous images), this means that partition G can never be
components should have between (k − 1) × d and part of a reasonably sized component. This indicates that
max
k × d objects, and (2) the number of components ob- mergingarticulationpartitionslikeB,i.e.,nodesorparti-
max
tainedwillbeatmost k ×OPT,whereOPT isatheo- tionswhoseremovalfromthepartitiongraphsplitsthegraph
k−1
reticallowerboundtotheminimumnumberofcomponents intodisconnectedcomponents,potentiallyresultsinimbal-
possible in Problem 4.2. In practice, however, we find that ancedfinalmergedcomponents:
forseveralgraphs,certainpartitionsandcomponentsgetdis- Definition 4.2 (Articulation Partition). A node v in graph
connectedbytheformationofothercomponentsduringthis G isdefinedtobeanarticulationpartitioniffremovingit
P
process, resulting in a sub-optimal merging. For example, (alongwiththeedgesthroughit)disconnectsthegraphG .
P
considerthepartitioningshowninFigure5.
Since adding articulation partitions early results in the
Suppose partitions A,...,G contain 100 objects each
formation of disconnected components or imbalanced is-
and parameter k = 6. The maximum allowed size for a
lands, we implement our ArticulationAvoidance algo-
merged component is 6 × d ≥ 6 × 100. Supposing
max rithm detailed in Algorithm 3 that tries to merge them to
we start a component with A, and incrementally merge in
growing components as late as possible. We merge parti-
partitions B,...,F, we end up isolating G as an indepen-
tions as before, growing one component at a time up to an
dentmergedcomponent.Thiscausesasignificantdeparture
upperboundsizeofk×d ,butweprioritizetheaddingof
from our first bound on component size described above max
non-articulationpartitionsfirst.Witheachnewpartition,u,
(500≤(k−1)×d ≤componentcontainingG),which
max addedtoagrowingcomponent,weupdatethegraphofthe
inturnwillresultinahigherfinalnumberofmergedcompo-
remainingunmergedpartitionsbyremovingthenodecorre-
nentsthanoptimal.Notethatifwerelaxedourrequirement
spondingtoufromtheremainingpartitiongraph—wealso
ofonlymergingadjacentpartitionsandallowarbitrarynon-
update our list of articulation partitions for the new graph
connectedcomponents,theaboveguaranteeswouldholdfor
andrepeatthisprocessuntilallpartitionshavebeenmerged
ourFirstCutalgorithmoverarbitrarypartitiongraphs.
intoexistingcomponents.
Theorem 4.2 (Complexity). The worst case complexity of Again, we state the worst-case complexity of our algo-
theFirstCutalgorithmisO(n2)wherenisthenumberof rithmbelow,anddiscussitsproofaswellasthealgorithm’s
nodesinthepartitiongraph. complexityinpracticeinourtechnicalreport.
Proof. First, weobserve that foreach growing component, Theorem 4.3 (Complexity). The worst case complexity of
eachpartitionisexaminedatmostonce-ifitisnotaddedto theAritculationAvoidancealgorithmisO(n2(n+m))
thecomponentatthattime(becauseaddingitwillcausethe wherenisthenumberofnodesandmisthenumberofedges
componenttoexceedthethresholdsize),thenitneednever inthepartitiongraph.
Merging Partitions: Performance
Algorithm3ArticulationAvoidance(G )
P 80
FirstCut
Require: SegmentationGraphGP =(VP,EP) ed 70 ArticulationAvoidance
EnVsucroepy: ←SetVSPPofcomponents generat 456000 Min
SC←φ ns 30
whileVcopy isnotemptydo of bi 20
newComponent←φ # 10
Choose V ∈ V with priority to non-articulation 0
copy 0 2 4 6 8 10 12 14 16 18 20
points
k
newComponent←newComponent∪V Figure 6: Performanceofalgorithmsformergingpartitions:Min
N←Vcopy.neighbors(V) correspondstotheabsoluteminimumnumberofsegmentsrequired
V .removeNode(V) suchthateachsegmenthaslessthank×d objects
copy max
UpdatearticulationpointsofV
copy runtimeofO(n2(n+m)/k). Evaluation.Weperformex-
whileSumofvertexweightsinnewComponent<k×
d do tensive evaluation of our algorithms on synthetic and real
max
Choose V ∈ N with priority to non-articulation partition graphs on the metric of number of components
points generated. Since the optimal, minimum number of bins is
newComponent←newComponent∪V not known to us, we compare our algorithms against the
N←N∪V .neighbors(V) theoretical minimum. That is, given a partition graph with
copy
N.removeNode(V) sumofpartitionweightsN,highestpartitionweightdmax,
VUpcdopayte.raertmicouvleatNioondep(oVin)tsofV acnaldlyumppienrimbuomundnuomnbceormopfocnoemnptosnizeentksdpmoasxsi,btlheeistheoNreti-.
copy kdmax
endwhile Notethatthisisalowerboundtothetrueoptimalnumberof
S ←S ∪{newComponent} components. We show one representative plot for the vari-
P P
endwhile ationinnumberofcomponentscreated(y-axis)onthepar-
titioned image shown in Figure 4b, against the maximum
component size determined by k (x-axis) in Figure 6. The
image has 315 objects with d = 5 and the plot is rep-
Theorem 4.4 (Complexity). The worst case complexity of max
resentativeofthefollowinggeneraltrend—weobservethat
theAritculationAvoidancealgorithmisO(n2(n+m))
ArticulationAvoidance performs close to the theoreti-
wherenisthenumberofnodesandmisthenumberofedges
caloptimum;FirstCut,ontheotherhand,oftengetsstuck
inthepartitiongraph.
atarticulationpartitions,unabletomeetthetheoreticalopti-
mum.
Proof. Each time a partition is added to an existing com-
ponent, we recompute the set of articulation points. Since 4.3 PracticalSetup
the maximum size of a component is kd , we update
max
Inthissectionwediscusssomeoftheimplementationdetails
the set of articulation points at most min(n,kd ) times
max
ofandchallengesfacedbyouralgorithmsinpractice.Many
for each component (if each new partition added is of size
ofthechallengesfacedinSection3.4applyhereaswell.
1). We use a standard library to compute the set of artic-
ulation points whose running time is O(n + m) where n Partitioning. The first step of our algorithm is to parti-
is the number of partitions, and m is the number of edges tion the image into small, non-overlapping partitions. To
between partitions in the partition graph. Here, we use the do this, we use the marker-controlled watershed algo-
articulation pointsfunctionofPython’sNetworkXli- rithm (Beucher and Meyer 1992). The foreground markers
brary (Hagberg, Schult, and Swart 2008) which is imple- are obtained by background subtraction using morphologi-
mented using a non-recursive depth-first-search (DFS) that calopening(BeucherandMeyer1992)withacirculardisk.
keepstrackofthehighestlevelthatbackedgesreachinthe Priorcounts.IntheexampleofFigure4,welearnamodel
DFStree. for the biological cells using a simple Support Vector Ma-
Similar to the FirstCut algorithm, for each grow- chine classifier, trained on positive and negative examples
ing component, each partition is examined at most once. extracted from 25 such images. For a test image, every
So each component takes at most O(n) + O((n + 15 × 15 pixel window in the image is classified as ‘cell’
m)min(n,kdmax)),orO(n(n+m))timetoform.Again, or‘notcell’usingthelearnedmodeltoobtainaconfidence
intheworstcase,thenumberofcomponentsformedcould mapoftheoriginalimage.Theconfidencevalueassociated
at most be O(n), resulting in an overall complexity of witheachwindowisameasureofprobabilitythatthewin-
O(n2(n + m)) for the ArticulationAvoidance algo- dowcontainsacell.Simplelocalmaximasearchandthresh-
rithm. olding of this confidence map provides the locations of the
cells.Formoredetailsofthisapproach,wereferthereader
While the worst case complexity is O(n2(n + m)), in to(Nattkemperetal.2002;Sarmaetal.2015).Empirically
practice,weobservethenumberofcomponentsformedus- we observed that our choice of threshold = 0.5 gave clos-
ing this algorithm is closer to n/k resulting in an effective estestimatestothetruecounts.Notethatthisprocedureal-
waysundercounted,thatis,thepriorcountestimateobtained
for any partition was always smaller than the true number
of objects in that partition. Setting a lower threshold value
cancauseovercounting,andwefoundtheseestimatestobe
moreerroneousthantheonesobtainedbyhigherthresholds.
Additionally,ifthemachinelearningalgorithmovercounts,
theneachcomponentwillhaveTrueCount < d∗.Thisim-
pliesthatthecorrespondingnodeswillliebelowtheactual
minimal terminating frontier in the segmentation tree, and
the size of our guess of the frontier will be higher than the
minimalfrontier.Itisthusmorecostly(intermsofnumber
ofnodesqueried)toovercount.
Traversing the Segmentation Tree. While Section 4.2
givesusasetofmergedcomponents,orunifiedimageseg-
mentswithpriorcountestimates,westillneedtoshowthese
imagestohumanworkerstoverifythecounts.Asmentioned
earlierinthesamesection,bysettingk = (cid:98) d∗ (cid:99),wefind
dmax
connectedcomponentswhosepriorcountisestimatedtobe Figure7:Imagesinthecrowddataset
atmostd∗.Sincethepriorcountestimatesareapproximate
andlowerthanthetruecounts,weexpectthesemergedim- pict people in various settings, with the number of people
age segments constructed to have true counts close to, and (counts)rangingfrom41to209.Thisdatasetisrepresenta-
potentiallyhigherd∗.Oneoptionistohave(multiple)work- tiveoftheapplicationsinsurveillance.Thisisachallenging
erssimplycounteachoftheseimagecomponentsandaggre- dataset,withpeoplelookingverydifferentacrossimages—
gatethecountstogetanestimateforthewholeimage.Since rangingfrompartiallytocompletelyvisible,alignedatdif-
some of these image components may have counts higher ferent angles and sizes, and varying backgrounds. Further-
thanoursetworkerthresholdofd∗,ourmodeltellsusthat more,nopriorsorpartitionsareavailablefortheseimages—
worker answers on the larger components could be inaccu- soweevaluateoursolutionsfromSection3onthisdataset.
rate. So, another option is to use these images as a starting For the rest of this section, we refer to this as the crowd
pointforanexpansiondownthesegmentationtree,andper- dataset.
formaFrontierSeekingsearchsimilartothatinSection3
Theseconddatasetconsistsof20simulatedimagesshow-
by splitting these segments until we reach segments whose
ingbiologicalcells,generatedusingSIMCEP(Lehmussola
countsareallunderd∗.Wecomparethesetwocountingal-
etal.2007).Thenumberofobjectsineachimagewassam-
gorithms(termedAAandAA-Split)furtherinSection5.
pled uniformly from the range 150 to 350. The minimum
numberofobjectsinanimageinour(randomly)generated
5 ExperimentalStudy
datasetwas151,andthemaximumwas328.Thecomputer
We deployed our crowdsourcing solution for counting on vision techniques detailed in Section 4 are applied on each
twoimagedatasetsthatarerepresentativeofthemanyappli- of these images to get prior counts and partitions. For the
cationsofourwork.Weexaminethefollowingquestions: remainder of this section, we refer to this as the biological
• HowdotheJellyBeanalgorithmscomparewiththethe- dataset.
oreticallybestpossible“oracle”algorithmsoncost?
Segmentation Tree Construction. For the crowd dataset,
• How accurate are the JellyBean algorithms relative to the segmentation tree was constructed with fanout b = 2
machinelearningbaselines?
untiladepthof5,foratotalof31nodesperimage.Ateach
• Whatarethemonetarycostsofouralgorithms,andhow
stage, the image was split into two equal halves along the
dotheyscalewiththenumberofobjects?
longer dimension. This ensures that the aspect ratios of all
• How accurate are the JellyBean algorithms relative to
segmentsareclosetotheaspectratiooftheoriginalimage.
directlyaskingworkerstocountontheentireimage?
Given a segment, workers were asked to count the number
• Do we get any additional benefit from counting nodes
of‘majority’heads(asdescribedinSection3.4)—ifahead
below the initially determined terminating frontier
crossedtheimageboundary,itwastobecountedonlyifthe
(whenweworkinconcertwithanMLalgorithm)?
workerfeltthatmajorityofitwasvisible.Toaidtheworker
Later, in Section 5.4, we provide details of experiments
in judging whether a majority of an object lies within the
evaluationg various answer aggregation schemes beyond
image, the surrounding region was also shown demarcated
median. Inourappendix,weexperimentallystudyworker
byclearlines.OnesuchimageisshowninFigure8b.
behavior,includingthefrequencyoferrorsandlevelofcon-
For the biological dataset, bins were generated by our
fidenceinanswers.
ArticulationAvoidance algorithm. As detailed in Sec-
tion 4, we start our search for the terminating frontier
5.1 Datasets
fromthesenodes.Startingfromthesebins,subsequentlay-
Dataset Description. Our first dataset is a collection of 12 ers of the segmentation tree are constructed by applying
images from Flickr. These images, shown in Figure 7, de- ArticulationAvoidance recursively, but with a reduced
Description:Surpassing Humans and Computers with JELLYBEAN: Crowd-Vision-Hybrid Counting Algorithms. Akash Das Sarma. Stanford University.