It's about time: Online Macrotask Sequencing in Expert Crowdsourcing

Heinz Schmitz
Trier University of Applied Sciences, Germany
[email protected]

Ioanna Lykourentzou
Luxembourg Institute of Science and Technology
[email protected]

January 18, 2016
arXiv:1601.04038v1 (15 Jan 2016)

Abstract

We introduce the problem of Task Assignment and Sequencing (TAS), which adds the timeline perspective to expert crowdsourcing optimization. Expert crowdsourcing involves macrotasks, like document writing, product design, or web development. These take more time than typical binary microtasks, require expert skills, assume varying degrees of knowledge over a topic, and require crowd workers to build on each other's contributions. Current works usually assume offline optimization models, which consider worker and task arrivals known and do not take into account the element of time. Realistically, however, time is critical: tasks have deadlines, expert workers are available only at specific time slots, and worker/task arrivals are not known a priori. Our work is the first to address the problem of optimal task sequencing for online, heterogeneous, time-constrained macrotasks. We propose tas-online, an online algorithm that aims to complete as many tasks as possible within budget, required quality and a given timeline, without future input information regarding job release dates or worker availabilities. Results comparing tas-online to four typical benchmarks show that it achieves more completed jobs, lower flow times and higher job quality. This work has practical implications for improving the Quality of Service of current crowdsourcing platforms, allowing them to offer cost, quality and time improvements for expert tasks.

1 Introduction

As the appeal of crowd work increases, there is a growing need to provide support for more complex tasks and workflows. Examples of such tasks include document editing, product design, social innovation and idea development. They are offered through dedicated platforms (upWork¹, crowdSpring², etc.) or incorporated into traditional ones through complex workflows (e.g. the recently launched CrowdFlower Labs³). This type of crowdsourcing is often referred to as expert crowdsourcing, and the tasks that it involves are referred to as macrotasks [15]. Macrotasks differ from the typically crowdsourced microtasks in several ways. They require expert skills, assume varying degrees of knowledge over a topic, may take more worker time and often involve task dependency, i.e. workers building on each other's contributions.

¹ https://www.upwork.com/
² http://www.crowdspring.com/
³ http://www.crowdflower.com/blog/introducing-crowdflower-labs

Together with the demand for complex tasks and their supporting workflows, customers are increasingly interested in performance guarantees, i.e. the optimization of expert crowdsourcing in terms of cost, quality and timeliness. Recent studies that examine expert crowdsourcing optimization [2, 14] typically seek to find worker assignments per task such that the worker contributions add up to a required quality threshold within a given budget. Roughly speaking, the crowdsourced tasks play the roles of multiple knapsacks. These come with additional concepts like domain-specific expertise and wages per worker, different models of acceptance probabilities, or types of quality aggregation.

Unfortunately, current studies do not take into account the aspect of time, for example in terms of task deadlines, worker time constraints, or time-dependent worker/task variability. As such, these studies examine only the assignment part of the worker allocation problem (finding which worker should take which task), but not the sequencing part (identifying when each worker should contribute). Moreover, they usually assume an offline setting, where the algorithms are provided with the complete worker/task input information at once.

In a realistic crowdsourcing setting, however, time is an inherent property. Customers require the tasks to finish by a certain deadline, expert workers are available only at specific time slots, and worker/task arrival or departure information is not known a priori. Optimizing for time is thus crucial, and raises the need not only for worker-to-task assignment but also for sequencing. It also raises the need for online (rather than index-based) algorithms, which can take efficient sequencing decisions with access only to the time-dependent information available up to their decision point.

In this paper we introduce the problem of crowdsourcing Task Assignment and Sequencing (TAS), which adds the timeline perspective to the crowdsourcing allocation optimization model. How can we find task assignments that can be rolled out in a realistic timeline, featuring unknown task release dates and worker availabilities, as is the case for real platforms?

To the best of our knowledge, this is the first work that addresses the problem of assignment and sequencing optimization for expert crowdsourcing tasks. Overall, our three main contributions with this paper are:

• We explicitly add the timeline perspective to task assignment modeling in expert crowdsourcing. That is, our models include not only the worker-to-task assignments, but also the rolling out of these assignments along a timeline under reasonable constraints. We call this problem model TAS and prove its strong NP-hardness.
• We propose an online algorithm, tas-online, which seeks to complete as many jobs as possible within budget, required quality and given timeline. It does so by computing worker sequence-to-task matchings, without any future input information regarding job release dates or worker availabilities.
• We illustrate, through simulated and real-world experiments, that tas-online can achieve more completed jobs, lower flow times and higher quality compared to four typical benchmarks.

The rest of this paper is organized as follows.

In section 2 we recapitulate the related literature on crowdsourcing optimization, starting from earlier works that focus on microtasks and reaching the latest research efforts on knowledge-intensive macrotasks.

In section 3 we describe the characteristics of the expert crowdsourcing setting to which this work applies. We then illustrate, through an example, why taking time into account matters in this particular setting. In this section we also formally model the TAS problem and prove its strong NP-hardness.

Next, in section 4 we describe the proposed online algorithm (tas-online) for the solution of the TAS problem.

In section 5 we present and discuss the experimental results, obtained on both simulated and real-world data. These results compare tas-online with four benchmarks found in the literature. They show that tas-online achieves:
• higher numbers of completed jobs, both in terms of absolute value and as a percentage of the solution's upper bound;
• lower flow times, where flow time is the time between a job's release date and the latest assignment on that job;
• better budget utilization;
• higher levels of quality.
For certain of these measures, tas-online is comparable only to the respective tas-offline version.

Finally, we discuss possible extensions of the TAS model and algorithm (section 6) and end with the paper's main findings and conclusions (section 7).

2 Related Work

2.1 Crowdsourcing Optimization

Crowdsourcing optimization is a term used in various problem settings. These include:
• optimizing the selection of worker labor channels to improve performance [22];
• discovering the optimal worker wage [17];
• determining the optimal number of workers to undertake each task so as to maximize quality and minimize cost [34], a method referred to as "plurality optimization" and applicable to n-ary tasks with an objective "true value";
• identifying the optimal set of tasks to forward to the crowd [12], for systems such as database query execution systems, which are based partially on crowdsourcing and partially on automated methods.

The family of optimization problems that our work falls into is allocation optimization. This is the identification of which worker should work on which task and when, in order to optimize one or more global performance metrics, which usually include cost, quality and number of acceptable tasks. This family of problems consists of two distinct optimization problems:
• Task assignment examines which worker should be given which task.
• Task sequencing adds the element of time, examining when the worker will be given the task.

Most existing works focus on the first problem, i.e. task assignment, either for microtasks or for macrotasks. Microtasks are tasks that require a small amount of worker time and accept binary (true/false) or n-ary (multiple choice) worker inputs. Their quality is determined through methods such as majority voting (assigning multiple workers per microtask). Macrotasks [15] require more worker time and accept open-ended continuous (rather than binary or n-ary) worker inputs. Their quality is determined through external subjective evaluation (for example peer review).

2.2 Optimizing Task Assignments

Regarding microtask assignment optimization, Karger et al. [23] work with homogeneous microtasks (which all have the same level of difficulty and do not distinguish among task "topics"). They propose a matching algorithm inspired by the standard belief propagation algorithm for approximating max marginals, which is order-optimal and minimizes cost. This study is among the first to show that the problem of task matching in crowdsourcing can be transformed into a bipartite graph design problem. In this graph, workers form one part, tasks form the other, and the edges represent assignments of workers to tasks.

Ho et al. [16] work with heterogeneous microtasks of n-ary classification quality on a model where worker skills per microtask are unknown a priori. They propose a two-phase exploration-exploitation assignment algorithm that seeks to maximize the total benefit of the requester and is competitive with respect to its counterpart with known worker skills. Yuen et al. [43, 44, 42] propose a matrix factorization approach that uses the workers' task performance and search history to derive their preferences and perform an improved task-to-worker matching.

Regarding macrotask assignment optimization, Goel et al. [14] and Roy et al. [2, 39] both propose task-to-worker assignment optimizations. The first uses a mechanism-design-based approach and the second an index-based approach. Both work on models that consider heterogeneous macrotasks, where the optimization goal is to maximize the utility of the requester while ensuring budget feasibility. Yue et al. [41] add to this model the element of team assignments instead of individual worker assignments. They propose a heuristic genetic algorithm that optimizes for task budget and quality, taking into account worker pay expectations, skills and availability.

Jabbari et al. [20] add another interesting facet to the heterogeneous task assignment model by extending it to cover the online aspect of the problem (workers arrive online, with no prior knowledge of arrivals). They also impose certain constraints that must be respected, such as declaring the feasible tasks that workers can handle and the payment they require.

The difference between the above works and ours is that their models do not consider the element of time. That is, they focus only on the task assignment aspect of the problem and not on task sequencing.

2.3 Optimizing Task Timing

Finally, a few studies focus precisely on time-sensitive optimization. Regarding time-sensitive microtasks, Yu et al. [40] optimize the number of tasks to recommend to each worker per time unit. Their objective is to maximize the time-average number of successful (i.e. of acceptable quality) jobs for a given time period.

Their model assumes binary task quality and a pull-and-filter task selection model: workers select which tasks they want to work on, and the system filters these selections according to its optimization objective. Task allocation is performed on the basis of worker accuracy, measured on a [0,1] scale, using a heuristic algorithm of linearithmic complexity.

Although this study does incorporate the element of time, it is different from ours in that its model assumes binary, homogeneous tasks rather than heterogeneous tasks of continuous quality. The use of homogeneous tasks (all tasks have the same difficulty, with no distinction of task topics) means that optimization is performed in terms of the number of jobs per worker. It is not performed in terms of allocating specific workers to specific jobs according to their skills.

Bernstein et al. [4, 3] also work with homogeneous microtasks. They propose a retainer model for pre-recruiting (reserving) the optimal number of workers, so as to minimize task completion latency. This work, however, does not take into account worker skills and consequently does not seek to maximize task quality. On heterogeneous microtasks, Faridani et al. [13] and Minder et al. [33] add the element of pricing to the problem model. They propose task pricing algorithms that aim to maximize the number of tasks finishing on time (the first study), or to do so while respecting budget and quality constraints (the second study).

Mechanism design [35, 36] and multi-armed bandit mechanism design [5] have also been employed to manipulate the time behavior of the crowd towards an efficient execution of time-critical tasks. These works differ from ours in that they do not explicitly sequence the tasks to the workers. Instead, they seek to incentivize the crowd's timely responses in order to achieve the time-related task objective.

Regarding time-sensitive macrotasks, Khazankin et al. [26] propose a mathematical optimization approach that learns the task selection behavior of workers. It then executes tasks in a manner that optimizes for cost and considers deadlines. This approach does not consider task quality, yet it is one of the first attempts to sequence time-sensitive macrotasks.

Finally, Boutsis and Kalogeraki [7] propose a multi-objective optimization approach that searches for Pareto-optimal solutions. It seeks to identify the group of workers (among multiple candidate groups) with the highest probability of finishing the task on time. This approach differs from ours in that it does not apply sequencing along a timeline. Instead, it makes one-shot assignments based on the workers' probabilities of meeting the deadline.

Overall, crowdsourcing optimization studies have so far mainly examined the assignment aspect of the problem, but not the sequencing aspect. The works that optimize for time-sensitive task characteristics are few. They either focus on binary/n-ary microtasks, which differ significantly from the expert macrotasks that we are interested in, or they do not sequence the worker-to-task assignments along a timeline.

In our work we address the problem of optimal task sequencing for online, heterogeneous, time-constrained macrotasks. As we will see in the following section, expert crowdsourcing consists of this type of task. Optimizing it therefore has significant impact on many recent platforms and applications.

3 Task Assignment and Sequencing (TAS)

This section proceeds in four steps:
1. We describe, from a high-level viewpoint, the expert crowdsourcing task model that this work targets.
2. We provide an example to illustrate the importance of adding the timeline sequencing element to this setting.
3. We define the TAS problem model in terms of the input data, feasible solutions, constraints and optimization goal.
4. We analyze this problem model in terms of complexity.

3.1 Expert Crowdsourcing Setting

The expert crowdsourcing problem setting that this work targets has some very particular characteristics. These make it unique compared to other crowdsourcing problem settings:

1. Heterogeneous rather than homogeneous tasks. We work with crowdsourcing tasks that require different skills and skill levels from the workers, and that belong to multiple "topics" (rather than a single one). Workers in this setting possess a set of skills. They are therefore less replaceable and less abundant than in crowdsourcing settings that consider homogeneous tasks and skills, where everyone can do every task.
2. Macro- rather than microtasks. Microtasks accept binary or n-ary (multiple choice) worker input, and their quality is defined by assigning multiple simultaneous workers to the task and then performing majority voting. Macrotasks, in contrast, feature open-ended worker input (e.g. write a product description). Their quality is built by one worker contribution after the other (sequentially rather than simultaneously).
3. Online rather than offline. We do not work on a simplified offline setting, where the pool of workers and/or tasks is known a priori. Instead we consider an online setting, where workers show a dynamic flow of arrivals and departures and tasks arrive in an unpredictable manner. Any sequencing decision must be made based on the task/worker information available up to the specific point in time.
4. Time-constrained rather than only cost/quality-constrained tasks. The online macrotasks of this setting must achieve a certain utility metric (e.g. quality or number of acceptable tasks) and keep costs within budget. In addition, they have a deadline, i.e., they must also finish by a specific time point.

Application areas. Many tasks and platforms, especially recent ones, fit the above expert crowdsourcing setting and could benefit from its optimization.
• Freelance platforms such as upWork⁴ work with experts on creative tasks such as web design and development, document writing, or coding.
• Social innovation platforms such as OpenIDEO⁵ and creative product design platforms like Quirky⁶, where users build on one another's ideas, could also directly benefit from the optimization of this setting.
• Collaborative document editing applications, such as corporate wikis [29], allow worker contributions to be sequenced along a timeline. They could benefit significantly from our approach.

⁴ https://www.upwork.com/
⁵ https://openideo.com/
⁶ https://www.quirky.com/

3.2 The importance of the time element

Before giving the formal definition of the TAS optimization problem, we use an example to illustrate the importance of adding the perspective of time. The example also shows how this addition has a significant impact on performance in expert crowdsourcing.

Example. Suppose there are only two jobs, both from the same knowledge domain. Each job $j = (Q, C)$ has a quality threshold $Q$ that must be reached and a cost threshold $C$ that must not be exceeded. For this example suppose $j_0 = (5, 5)$ and $j_1 = (4, 4)$.

Each worker $i = (e, w)$ has an expertise $e$, which increases the quality of a job, and a wage $w$, which consumes that job's budget. Let us assume that three workers are given: $i_0 = (2, 3)$, $i_1 = (3, 2)$ and $i_2 = (2, 1)$. Then each job has two possible assignments within budget and with sufficient quality:

$j_0$: $\{i_0, i_1\}$ or $\{i_1, i_2\}$
$j_1$: $\{i_0, i_2\}$ or $\{i_1, i_2\}$

For both jobs the latter assignment seems preferable, since workers $\{i_1, i_2\}$ provide more quality for less cost. Now we look at a sequencing period of three timeslots with the following limited worker availability (both jobs are released immediately):

$$\begin{array}{c|ccc} \text{timeslot} & 0 & 1 & 2 \\ \hline i_0 & & & \times \\ i_1 & & \times & \\ i_2 & \times & & \times \end{array}$$

Worker $i_1$ is available only at a single timeslot, so at most one job can realize the preferable assignment mentioned above.

Suppose for the moment that we modestly choose $j_0 \leftarrow \{i_0, i_1\}$. This gives us the following partial schedule for the workers:

$$\begin{array}{c|ccc} \text{timeslot} & 0 & 1 & 2 \\ \hline i_0 & & & j_0 \\ i_1 & & j_0 & \\ i_2 & \times & & \times \end{array}$$

Now neither of the two feasible assignments for $j_1$ can be realized, since only worker $i_2$ remains available. The assignment $j_1 \leftarrow \{i_2\}$ is within budget, but it does not reach the needed quality. So $j_1$ remains incomplete in this case.

So let us choose the alternative assignment $j_0 \leftarrow \{i_1, i_2\}$ and set $i_2$ on $j_0$ at timeslot 0:

$$\begin{array}{c|ccc} \text{timeslot} & 0 & 1 & 2 \\ \hline i_0 & & & \times \\ i_1 & & j_0 & \\ i_2 & j_0 & & \times \end{array}$$

Now $j_1$ cannot be completed without violating the sequential working assumption w.r.t. this job: both $i_0$ and $i_2$ would have to work on $j_1$ at timeslot 2.

If instead we set $i_2$ on $j_0$ at timeslot 2, we can complete both jobs, without violating any constraints, with the schedule

$$\begin{array}{c|ccc} \text{timeslot} & 0 & 1 & 2 \\ \hline i_0 & & & j_1 \\ i_1 & & j_0 & \\ i_2 & j_1 & & j_0 \end{array}$$

To complete the discussion, suppose we choose $j_1 \leftarrow \{i_1, i_2\}$ in the beginning. Then $j_0$ cannot be completed no matter which timeslot is used for $i_2$, since both feasible assignments for $j_0$ contain $i_1$, whose only timeslot is then taken (end of example).

The example shows two things. First, choosing an optimal worker-task assignment without considering time may be misleading. Second, the specific selection of timeslots is important.

3.3 The TAS Problem Model

With the following definition we want to capture the interplay between task assignment and timeline sequencing within the same model, and to add appropriate constraints. We refer to our problem description as task assignment and sequencing (TAS) in expert crowdsourcing.

3.3.1 Input Data

Scheduling period. Suppose we look at $t$ timeslots $[t] = \{0, 1, \ldots, t-1\}$. Each timeslot $d \in [t]$ is also called a day, but it can be any fixed period of time.

Knowledge domains. A finite set $K$ of knowledge domains. Each $k \in K$ represents an area of knowledge or a knowledge topic.

Workers. A finite set $U$ of users, hereafter referred to as workers, participating in the crowdsourcing platform. Each worker $i \in U$ has the following characteristics⁷:

• Expertise. An expertise vector $e_i$ of dimension $|K|$. The expertise $e_{ik}$ of a worker denotes the added quality that the worker can bring to a job belonging to domain $k$.
• Wage. A cost vector $w_i$ of dimension $|K|$. The amount $w_{ik}$ is the monetary remuneration that the worker demands in order to perform a job belonging to domain $k$.
• Availability. An availability vector $a_i$ of dimension $t$, with entries $a_{id} = 1$ if worker $i$ is available on day $d$, and $a_{id} = 0$ otherwise.

⁷ In the context of this work, the quantification of worker expertise, wage or speed is considered orthogonal to the studied assignment and sequencing problem. The interested reader is referred to [29, 8, 19] for available worker profile quantification techniques based on machine learning, implicit evaluation or information theory.

Jobs. A finite set $J$ of knowledge-intensive jobs that are crowdsourced. A job $j$ is assumed to have the following characteristics:

• Domain. Each job belongs to exactly one domain $k_j \in K$.
• Quality threshold. The amount $Q_j$ is the minimum quality that the job needs to achieve.
• Cost threshold. The budget $C_j$ is the maximum total amount of money that can be paid for job $j$.
• Release date. Each job has a release date $r_j \in [t]$: job $j$ enters the crowd system at timeslot $r_j$ (and never leaves the system).

Sequentiality. Finally, our model assumes a sequential work mode along the timeline, in which workers build on one another's contributions. At most one worker can be assigned to a task at a time, and each worker contributes to at most a single task at a time. Sequentiality is chosen for three reasons:
1. It is often imposed by the nature of expert crowdsourcing macrotasks. These are not easily decomposable to microtask level and as such do not allow multiple simultaneous worker contributions (e.g. writing a document cannot be done by decomposing it to sentence level).
2. Sequentiality allows building on the task's quality without requiring worker concurrency. Concurrency is in practice more difficult to achieve when specific worker skills (i.e. experts on a topic) are required.
3. Sequentiality allows a more realistic coupling of our approach with worker skill evaluation mechanisms. It makes it easier to accurately evaluate the quality that each worker has brought once she has finished working on a task.

Nevertheless, as also discussed in section 6, an extension of our model to include worker concurrency is feasible, and we aim to examine it as part of our future work.

3.3.2 Feasible Solutions, Constraints and Optimization Goal

A schedule needs to carry information about the resource allocation for each job, both in terms of workers and in terms of time: when does which worker contribute to which job?

Solutions. In a solution for input data $x = (t, K, U, J)$ we have, for each job $j \in J$, a vector $U_j$ of dimension $t$ with entries from $U \cup \{\text{none}\}$.
• If $U_{jd} = i$, then worker $i$ is assigned to job $j$ and scheduled on day $d$.
• If $U_{jd} = \text{none}$, then there is no worker assignment for job $j$ on day $d$.

Note that we represent solutions here as job/timeslot schedules with worker entries. In the previous example we used an equivalent worker/timeslot representation with job entries. In the present notation, the successful schedule from the example is

$U_0 = (\text{none}, i_1, i_2)$
$U_1 = (i_2, \text{none}, i_0)$

Constraints. A solution is called feasible if and only if the following hold:

(a) No worker is assigned to more than one job at a time, i.e., for all $d \in [t]$ and distinct $j, j' \in J$ it holds that $U_{jd} \neq U_{j'd}$ (unless both values are none).
(b) No job is assigned to more than one worker at a time, i.e., for all $j \in J$ and $d \in [t]$ there is at most one worker stored in $U_{jd}$. This is ensured by the representation of $U_j$.
(c) No worker is assigned more than once to the same job, i.e., for all $j \in J$ and distinct $d, d' \in [t]$ it holds that $U_{jd} \neq U_{jd'}$ (unless both values are none).
(d) No worker is scheduled on a day on which she is not available, i.e., for all $d \in [t]$ and $j \in J$, if $U_{jd} = i$ then $a_{id} = 1$.
(e) No job is worked on before its release date, i.e., for all $j \in J$ and $d < r_j$ it holds that $U_{jd} = \text{none}$.
(f) No job exceeds its budget, i.e., for all $j \in J$ it holds that $c_j \leq C_j$. Here $c_j$ is the cost of job $j$, defined as $c_j = \sum_{i \in U_j} w_{ik}$ if $j$ has domain $k$; the sum ranges over the workers that occur in $U_j$.

Note that there always exists a trivial feasible solution with $U_{jd} = \text{none}$ for all $j, d$.

Objective. To assess the quality of a feasible solution $y = \{j \mapsto U_j \mid j \in J\}$, we determine for each job $j$ with domain $k$ its quality w.r.t. this solution as $q_j = \sum_{i \in U_j} e_{ik}$.

In the context of this work, task quality is thus the sum of the expertises of the workers that participate in the task. This is the additive skill aggregation model, which is often used for expert sequential macrotasks such as document editing [1, 31]. Other aggregation functions, including minimum, maximum or product [1], could also be used to compute a task's quality. Their full examination, however, is out of the scope of this work.

For input $x = (t, K, U, J)$ and solution $y$ we define the measure

$$m(x, y) = |\{ j \in J \mid q_j \geq Q_j \}|,$$

which we want to maximize. We thus count the number of jobs that reach their quality threshold within budget and that can be scheduled in a feasible way w.r.t. constraints (a) to (f). We call such jobs completed.

3.4 TAS: An allocation problem with two aspects

The TAS optimization problem combines aspects of two well-studied problems of a different nature. One reflects the allocation of workers as resources; the other reflects the allocation of timeslots.

3.4.1 Allocation of Workers: The Multiple Knapsack Perspective

If we look only at worker allocation in our model, each job $j$ of domain $k$ with budget $C_j$ acts as a knapsack of size $C_j$. We fill it with workers, where worker $i$ has weight $w_{ik}$ and value (expertise) $e_{ik}$. The worker availabilities restrict the number of times a single worker can be packed, so we have a bounded version of the multiple knapsack problem [25].

The difference from this classical problem is the optimization goal. In TAS we want to maximize the number of completed jobs with respect to their individual quality thresholds $Q_j$. In multiple knapsack, the goal is to maximize the sum of all packed expertises, no matter how these are spread over the different knapsacks.

3.4.2 Allocation of Timeslots: The Open-Shop Perspective

Now suppose instead that a worker-task assignment is already fixed such that all jobs reach their quality thresholds. We then need to schedule the selected workers along the timeline with respect to job releases and worker availabilities. This partial problem is a machine-scheduling problem, in which workers play the role of machines and jobs need to be processed on these machines.

Three features of our model matter here:
• the order of processing is immaterial;
• we demand sequentiality;
• the processing time of a job on a certain machine is either 0 or 1 timeslot, depending on whether the respective worker is assigned to this job or not.

So this aspect of TAS is a unit-time open-shop problem with limited machine availability and job release dates [27]. Note that adopting this model also implies non-preemption, i.e. a worker cannot be interrupted once he/she has started working on a task. The goal of maximizing the number of completed jobs translates to minimizing the number of late jobs if we set $t$ as the global deadline.

The sequencing of an already fixed worker-task assignment can also be reduced to the bipartite list edge-coloring problem [11]:
• Jobs and workers form a bipartite graph with the worker-task assignments as its edges, and timeslots are represented by colors.
• Each edge $(j, i)$ receives a list of colors: the timeslots on which worker $i$ is available and job $j$ is already released.
• A proper coloring of all edges can be found if and only if the previously fixed worker-task assignment can be sequenced on the timeline.

3.4.3 TAS Complexity

Both aspects of TAS pointed out above are NP-hard on their own, and so is TAS, as we show below.

For an upper complexity bound, note that the length of TAS solutions is polynomially bounded in the input length. Moreover, the constraints can be checked in polynomial time once a solution is given. So TAS is an NP-optimization problem.

We also observe that TAS is a large-number problem, since it has knapsack as a subproblem (when there is only a single job and each worker is available on a different single day). So it is reasonable to consider strong NP-hardness.

Theorem 1. TAS is a strongly NP-hard optimization problem.

Proof. We show NP-hardness with a polynomial-time many-one reduction from the NP-complete problem 3-Dimensional Matching [24].

For finite, disjoint sets $X$, $Y$ and $Z$, we say that $M \subseteq X \times Y \times Z$ is a 3-dimensional matching if, for all distinct triples $(x_1, y_1, z_1), (x_2, y_2, z_2) \in M$, it holds that $x_1 \neq x_2$, $y_1 \neq y_2$ and $z_1 \neq z_2$. It is known that 3-Dimensional Matching is NP-complete even in the special case when $|X| = |Y| = |Z| = u$ and $M$ has to be a perfect matching with $|M| = u$.

3-Dimensional Matching (3-DM)
Input: Finite and disjoint sets $X, Y, Z$ with $|X| = |Y| = |Z|$ and a subset $W \subseteq X \times Y \times Z$.
Question: Is there a perfect 3-dimensional matching $M \subseteq W$?

Suppose an instance of 3-DM is given, with $X = \{x_i \mid i \in [u]\}$, $Y = \{y_i \mid i \in [u]\}$, $Z = \{z_i \mid i \in [u]\}$ for some $u \geq 1$, and $W \subseteq X \times Y \times Z$. The idea is to use constraint (a) (no worker is assigned to more than one job at a time) to achieve the needed matching condition. We take the elements of $X$ ($Y$, $Z$) as workers available on day 0 (1, 2, resp.) and use domains to fix the given triples from $W$. More precisely, we define a corresponding TAS instance $(t, K, U, J)$ as follows:

• The scheduling period has $t = 3$ timeslots.
• There are $|W|$ different domains in $K$.
• Each triple $w \in W$ is encoded as a job $j_w$, and all jobs have pairwise different domains. For every job $j_w$ we set the quality and cost thresholds to $Q_{j_w} = C_{j_w} = 3$ and the release date to $r_{j_w} = 0$.
• Workers are defined as $U = X \cup Y \cup Z$. For $x \in X$, $y \in Y$ and $z \in Z$ we set the availabilities to $a_x = (1,0,0)$, $a_y = (0,1,0)$ and $a_z = (0,0,1)$, respectively. To fix expertise and wage, we consider each triple $w = (x, y, z) \in W$ and the corresponding job $j_w$. If $j_w$ has domain $k$, then we define $e_{xk} = e_{yk} = e_{zk} = 1$ and $w_{xk} = w_{yk} = w_{zk} = 1$. All entries of the expertise and wage vectors not addressed this way are set to 0.

First observe that a job $j_w$ with $w = (x, y, z)$ reaches its quality threshold if and only if we assign workers $\{x, y, z\}$ to this job, since exactly these workers contribute to the job's domain. We now argue that the given 3-DM instance has a perfect matching if and only if the constructed TAS instance has a feasible solution with $u$ completed jobs.

(⇒) Suppose $M \subseteq W$ is a perfect 3-dimensional matching. Consider the TAS solution with $U_{j_w} = (x, y, z)$ for all $w = (x, y, z) \in M$, and $U_{j_w} = (\text{none}, \text{none}, \text{none})$ for all other jobs. Since $M$ is a matching, any two distinct non-empty solution vectors differ in all components, so constraint (a) is satisfied.

All other constraints are easy to check. Note that each worker is available only on a single day, that all jobs are released immediately, and that no job can exceed its budget. By our previous observation, all $u$ jobs $j_w$ with $w \in M$ are completed.

(⇐) Conversely, suppose there is a feasible TAS solution with $u$ completed jobs. For each completed job $j_w$ with $w = (x, y, z)$ it must be that $U_{j_w} = (x, y, z)$, since $x$, $y$ and $z$ are available only on days 0, 1 and 2, respectively. Because constraint (a) holds, the solution vectors of any two distinct completed jobs differ in all components. So $M = \{(x, y, z) \mid U_{j_w} = (x, y, z) \text{ and } j_w \text{ completed}\}$ is a 3-dimensional matching with $|M| = u$.

The reduction function maps only to TAS instances in which all integer values are polynomially bounded in the input length, so strong NP-hardness follows.

This rules out pseudo-polynomial algorithms and fully polynomial-time approximation schemes for TAS unless P equals NP. Furthermore, note that the reduction emphasizes the aspect of timeline sequencing. The worker-task assignments in the constructed TAS instance are trivial: for each job there is exactly one feasible worker assignment that reaches its quality threshold.

4 An Online Algorithm for TAS

Due to the dynamic nature of crowdsourcing systems, it does not seem realistic to consider TAS as an offline problem, where algorithms are provided with the complete input at once. In fact, worker availabilities are hardly predictable, and it is usually not known in advance which jobs will enter the system at what time. The problem of task assignment and sequencing is therefore inherently online, and sequencing decisions have to be taken without complete information about the input data.

We say that an algorithm for TAS has the online property if it processes the input serially w.r.t. the timeline $d = 0, 1, \ldots$. In each step $d$, the algorithm takes its assignment decisions with access only to the time-dependent information of the input for timeslots $\leq d$, namely the worker availabilities and the jobs released up to day $d$. For more background on the general concept of online algorithms we refer to [6].

To design such an algorithm we start with the following observation: Suppose a feasible solution y for TAS is given.

So conversely, if we proceed day by day with our online algorithm, we can try to compute a matching between the active (= released but incomplete) jobs $J'$ in the system on that day and the available workers $U'$ for that day. This choice of $J'$ and $U'$ immediately satisfies constraints (d) and (e) as well.

It remains to consider constraints (c) (no worker is assigned to the same job twice) and (f) (no job exceeds its budget). Both can be handled when we construct the edges of possible assignments in the bipartite graph between $J'$ and $U'$:
• If the remaining budget of a job is smaller than the wage of a worker in this domain, the edge is omitted.
• If the worker has already been assigned to this job in the past, the edge is also omitted.

Both conditions can be checked by looking at the partial solution for timeslots $< d$. Together, this online procedure results in a series of matchings $M_d$ for $d = 0, 1, \ldots, t-1$ that form a feasible solution $y$ for TAS.

Beyond feasibility, we want to choose a sequence of matchings that yields a large number of completed jobs. Among all possible matchings for a day $d$, which is the right one? We propose a greedy approach: in each step we compute a matching that maximizes the sum of profits obtained from that day's assignments.

More precisely, we construct for each day a weighted bipartite graph in which each possible assignment (edge) claims a certain profit. In our algorithm, the profit is simply the amount of expertise per wage unit (efficiency). The maximum weighted bipartite matching problem can be solved to optimality in polynomial time by known algorithms. For example, with the Hungarian Method this step has a running time proportional to $O((|J'| + |U'|)^2 |E|)$ [28]. So we obtain the following online Algorithm 1 for TAS, with polynomial running time $O(t|J|^3|U|^3)$.
If we look at a single day d in this solution then the assignments of workers to jobs for that day form a bipartite matching between the (un- completed) jobs with (remaining) quality needs and This algorithm can be viewed as an online schema budget on one hand, and the set of available workers that allows multiple extension, which we discuss in for that day on the other hand. Constraints (a) and the last section after some experimental evaluation (b) form exactly this bipartite matching condition. using the present basic version. 10
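The reduction of sequencing a fixed worker-task-assignment to bipartite list edge-coloring can be made concrete with a small search. The following Python sketch is not the method of [11]: it is a plain backtracking search, exponential in the worst case and only meant for tiny instances, and the function name `sequence_fixed_assignment` as well as its input encoding (availability as sets of days, release dates as integers, days numbered from 0) are hypothetical. Each edge $(j,i)$ receives as its color list the days on which worker $i$ is available and job $j$ is released, and a coloring is proper when no two edges at the same job or worker share a day.

```python
from collections import defaultdict


def sequence_fixed_assignment(edges, available_days, release, t):
    """Backtracking list edge-coloring: give each (job, worker) edge a day from its list.

    edges: fixed worker-task-assignment as (job, worker) pairs, each pair at most once.
    available_days: worker -> set of days on which the worker is available.
    release: job -> release date r_j.
    t: number of timeslots (days 0..t-1).
    Returns {edge: day} for a proper coloring, or None if none exists.
    """
    # Color list of edge (j, i): days on which worker i is available and job j is released.
    lists = {(j, i): [d for d in range(t) if d in available_days[i] and d >= release[j]]
             for j, i in edges}
    order = sorted(edges, key=lambda e: len(lists[e]))  # most constrained edges first
    color = {}
    job_days, worker_days = defaultdict(set), defaultdict(set)

    def place(k):
        if k == len(order):
            return True
        j, i = order[k]
        for d in lists[(j, i)]:
            # Proper coloring: edges sharing a job or a worker must get different days.
            if d in job_days[j] or d in worker_days[i]:
                continue
            color[(j, i)] = d
            job_days[j].add(d)
            worker_days[i].add(d)
            if place(k + 1):
                return True
            job_days[j].discard(d)
            worker_days[i].discard(d)
            del color[(j, i)]
        return False

    return dict(color) if place(0) else None
```

For instance, with edges `("j1", "a")`, `("j1", "b")`, `("j2", "a")`, worker `b` available only on day 0 and job `j2` released on day 1, two days are not enough: `j1` must get `b` on day 0 and hence `a` on day 1, which collides with `j2` also needing `a` on day 1. Once worker `a` is available on a third day, a proper coloring exists.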
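To make the construction in the proof of Theorem 1 concrete, the sketch below builds the TAS-instance from a 3-DM instance. It is illustrative only and rests on assumptions: the `TASInstance` container, the dictionary-based encoding of jobs and workers, and the helper `job_quality` are hypothetical and not taken from the paper, and worker names in $X$, $Y$, $Z$ are assumed distinct (the sets are disjoint). `job_quality` checks the key observation of the proof, namely that job $j_w$ reaches $Q_{j_w} = 3$ only with exactly the workers of its triple.

```python
from dataclasses import dataclass


@dataclass
class TASInstance:
    t: int               # number of timeslots
    num_domains: int     # |K|
    jobs: list           # one dict per job j_w: triple, domain, Q, C, release
    availability: dict   # worker -> availability vector a_i
    expertise: dict      # worker -> expertise vector (e_ik over domains k)
    wage: dict           # worker -> wage vector (w_ik over domains k)


def reduce_3dm_to_tas(X, Y, Z, W):
    """Build the TAS-instance of the proof of Theorem 1 from a 3-DM instance (X, Y, Z, W)."""
    num_domains = len(W)  # one domain per triple, so jobs have pairwise different domains
    jobs = [{"triple": w, "domain": k, "Q": 3, "C": 3, "release": 0} for k, w in enumerate(W)]
    availability, expertise, wage = {}, {}, {}
    for workers, day in ((X, 0), (Y, 1), (Z, 2)):  # X on day 0, Y on day 1, Z on day 2
        for v in workers:
            availability[v] = tuple(1 if d == day else 0 for d in range(3))
            expertise[v] = [0] * num_domains  # entries not set below stay 0
            wage[v] = [0] * num_domains
    for k, (x, y, z) in enumerate(W):
        for v in (x, y, z):
            expertise[v][k] = 1
            wage[v][k] = 1
    return TASInstance(3, num_domains, jobs, availability, expertise, wage)


def job_quality(inst, j, assigned):
    """Total expertise that the assigned workers contribute to job j's domain."""
    k = inst.jobs[j]["domain"]
    return sum(inst.expertise[v][k] for v in assigned)
```

All numbers in the constructed instance are 0, 1 or 3, which is why the reduction yields strong rather than merely weak NP-hardness.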
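The day-by-day procedure of the online algorithm can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the paper's Algorithm 1: wages are assumed positive, pairs with zero expertise are treated as non-edges, and the classes `Job` and `Worker`, the functions `hungarian_max` and `tas_online`, and all field names are hypothetical. For each day it builds the edges between the active jobs $J'$ and the available workers $U'$, omits edges that would violate constraint (c) or (f), weights each remaining edge by expertise per wage unit, and solves the maximum weighted bipartite matching with the textbook $O(n^3)$ potential-based Hungarian method (a different variant than the one whose bound is cited above).

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    domain: int          # domain k of the job
    quality: float       # quality threshold Q_j
    budget: float        # cost threshold C_j
    release: int         # release date r_j
    gained: float = 0.0  # expertise collected so far in domain k
    spent: float = 0.0   # wages paid so far
    workers: set = field(default_factory=set)  # workers already assigned, for constraint (c)


@dataclass
class Worker:
    availability: list   # a_i[d] in {0, 1}
    expertise: list      # e_ik per domain k
    wage: list           # w_ik per domain k


def hungarian_max(weight):
    """Maximum-weight assignment on a square matrix via the O(n^3) Hungarian method.

    Runs the classic minimization version on negated weights; returns {row: col}.
    """
    n = len(weight)
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)   # dual potentials (1-based)
    p, way = [0] * (n + 1), [0] * (n + 1)     # p[col] = row matched to col
    for row in range(1, n + 1):
        p[0], j0 = row, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = -weight[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path found above
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, n + 1)}


def tas_online(jobs, workers, t):
    """Greedy online TAS sketch: one max-profit matching per day, using only data up to day d."""
    schedule = []
    for d in range(t):
        # Active jobs J' (released, not yet complete) and available workers U': constraints (d), (e).
        active = [j for j, job in enumerate(jobs) if job.release <= d and job.gained < job.quality]
        avail = [i for i, wk in enumerate(workers) if wk.availability[d]]
        n = max(len(active), len(avail))
        weight = [[0.0] * n for _ in range(n)]  # padding and non-edges keep weight 0
        edges = set()
        for r, j in enumerate(active):
            job = jobs[j]
            for c, i in enumerate(avail):
                e, w = workers[i].expertise[job.domain], workers[i].wage[job.domain]
                # Omit edges violating (c) or (f); zero expertise or wage also omitted (assumption).
                if e <= 0 or w <= 0 or i in job.workers or w > job.budget - job.spent:
                    continue
                weight[r][c] = e / w  # profit = expertise per wage unit
                edges.add((r, c))
        today = []
        for r, c in hungarian_max(weight).items():
            if (r, c) in edges:  # drop pairs that fall on padding or non-edges
                j, i = active[r], avail[c]
                jobs[j].gained += workers[i].expertise[jobs[j].domain]
                jobs[j].spent += workers[i].wage[jobs[j].domain]
                jobs[j].workers.add(i)
                today.append((j, i))
        schedule.append(sorted(today))
    return schedule
```

Padding the matrix with zero weights turns each day's problem into a square assignment problem. Since all real edges have positive profit, dropping the pairs that land on padding or non-edges leaves a maximum-weight matching, so each day's matching $M_d$ respects constraints (a) to (f) by construction.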