ebook img

Data Conflict Resolution Using Trust Mappings PDF

12 Pages·2010·1.4 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Data Conflict Resolution Using Trust Mappings

Data Conflict Resolution Using Trust Mappings Wolfgang Gatterbauer Dan Suciu UniversityofWashington,Seattle UniversityofWashington,Seattle [email protected] [email protected] ABSTRACT glyph origin glyph origin Inmassivelycollaborativeprojectssuchasscientificorcom- b1 ship hull Alice ship hull munity databases, users often need to agree or disagree on b2 cow Bob fish the content of individual data items. On the other hand, b3 jar Charlie arrow trust relationships often exist between users, allowing them b fish Bob 4 to accept or reject other users’ beliefs by default. As those b knot Charlie 5 trustrelationshipsbecomecomplex,however,itbecomesdif- b arrow Bob, Charlie ficulttodefineandcomputeaconsistentsnapshotofthecon- 6 flictinginformation. Previoussolutionstoarelatedproblem, (a) (b) the update reconciliation problem, are dependent on the or- Figure 1: Origins of three Indus glyphs as asserted der in which the updates are processed and, therefore, do by archeologists Alice, Bob and Charlie (a). Alice’s not guarantee a globally consistent snapshot. beliefsafterapplyingtrustmappingsfromFig.2(b). Thispaperproposesthefirstprincipledsolutiontotheau- tomaticconflictresolutionprobleminacommunitydatabase. Our semantics is based on the certain tuples of all stable highlydisputedandstillchangingbeliefsisthecurrentstate models of a logic program. While evaluating stable models of knowledge on the Indus script [17]. ingeneraliswellknowntobehard,evenforverysimplelogic programs, we show that the conflict resolution problem ad- Example 1.1 (Indus script). The Indus civilization mitsaPTIMEsolution. Tothebestofourknowledge,ours flourished about 2600 to 1900 B.C. in what is now eastern is the first PTIME algorithm that allows conflict resolution Pakistan and northwestern India. No historical informa- in a principled way. We further discuss extensions to nega- tion exists about the civilization, but archaeologists have un- tivebeliefsandprovethatsomeoftheseextensionsarehard. covered samples of their writing on stamp seals, sealings, ThisworkisdoneinthecontextoftheBeliefDB projectat amulets, and small tablets. To this day, the script on these theUniversityofWashington,whichfocusesontheefficient objects remains undeciphered and various claims of deci- management of conflicts in community databases. pherments have been put forward. One important step in Categories and Subject Descriptors: H.2.5: Heteroge- such interpretations is determining the“origin”of the glyph neous Databases; H.3.3: Information Search and Retrieval which is the actual motif that was stylized into the glyph. For example, the origin of the glyph has been attributed General Terms: Algorithms, Performance by different archeologists1 to be either a ship hull, a cow, or a jar. Similarly, the glyph has been interpreted to 1. INTRODUCTION originate from a fish or a knot, whereas the glyph is In many scientific projects today, a community of users widely agreed to represent an arrow [15]. The current state isworkingtogethertoassemble,revise,andcurateashared ofknowledgecanberepresentedinarelationaltable(Fig.2a) datarepository. Sincethetruestateoftheworldisgenerally where conflicting beliefs of researchers (here simplified with notknowninmanyscientificdisciplines,userswilloftenhave Alice, Bob and Charlie) are represented by tuples (here tu- not just overlapping but also conflicting beliefs about the plesb tob )withpartialkeyviolations,highlightingthelack 1 6 world. Hence, as the community accumulates knowledge of agreement among archeologists. andthedatabasecontentevolvesovertime,itwillinevitably contain conflicting information. An example domain with Recent work has proposed trust mappings between users in order to support collaboration and data sharing [19]. A trustmappingisastatementthatauseriswillingtoaccept anotheruser’sdatavalue. Prioritiesarefurtherusedtospec- Permissiontomakedigitalorhardcopiesofallorpartofthisworkfor ifyhowtoresolveconflictsbetweendatavaluescomingfrom personalorclassroomuseisgrantedwithoutfeeprovidedthatcopiesare differenttrustedusers. Figure2illustratesthreetrustmap- notmadeordistributedforprofitorcommercialadvantageandthatcopies pings, two defined by Alice (m and m ) and one defined bearthisnoticeandthefullcitationonthefirstpage.Tocopyotherwise,to 1 2 republish,topostonserversortoredistributetolists,requirespriorspecific by Bob (m3). Figure 2b shows Alice’s version of the data permissionand/orafee. after applying these mappings to Example 1.1: The second SIGMOD’10,June6–11,2010,Indianapolis,Indiana,USA. 1Parpola, Mahadevan, and Knorozov [15] Copyright2010ACM978-1-4503-0032-2/10/06...$10.00. and third tuples result from her trust of Bob and Charlie. Where they disagree (on ), Alice sees Bob’s value (fish) insteadofCharlie’svalue(knot)becausesheassignedtoBob 100 a higher priority (100) than to Charlie (50)2. Several systems have adopted some form of conflict han- Bob dling or trust mapping in order to facilitate data sharing 80 amongusers[7,9,11,16,19](seeFig.3foracomparisonof their features). However, providing a consistent semantics Alice 50 Charlie to a set of trust mappings is a challenging problem. The currentstateoftheartistheapproachtakenbytheOrches- tra system in the context of update exchange [9]. Updates m : Alice(k,v)← Bob(k,v) (prio 100) 1 are treated one at a time, in a First-In First-Out manner. m : Alice(k,v)← Charlie(k,v) (prio 50) 2 When an update is published by a user, the new value is m : Bob(k,v)← Alice(k,v) (prio 80) 3 propagated according to the trust mappings to other users, and conflicts are resolved based on the priorities. At each Figure 2: Example trust mappings between arche- moment, a user is provided with a snapshot of the current ologists Alice, Bob and Charlie. database,accordingtootherusers’beliefsandherowntrust relationships. However, this snapshot may be inconsistent because(i)itdependsontheorderinwhichtheupdatesare BothAliceandBobimportjarfromCharlie,andbothshould processed,and(ii)itmaybecomeinconsistentasaresultof update their belief when Charlie updates his to cow. How- other updates. ever, this is difficult to achieve even if one keeps the lin- eage of each value. The problem is that Alice and Bob trust Example 1.2 (Indus script continued).Considerthe each other with highest priority: Alice keeps her jar value followingsequenceofupdatesstartingwithanemptydatabase: because of Bob’ value jar; Bob keeps his value because of Alice’s value. Time User Update for Beliefs for In this paper we propose an efficient and principled so- Alice Bob Charlie lution to the conflict resolution problem: given a network 0 - - - ofprioritytrustmappings,findthevaluesthatarebelieved 1 Charlie insert jar - - jar (and trusted) by each user. Our first contribution is a def- 2 jar - jar inition of conflict resolution based on a stable solution. A 3 jar jar jar stable solution is simply a global assignment of values to 4 Bob insert cow ? cow jar users such that all trust mappings are satisfied, and that eachvaluehasalineagetoavaluethatwasinsertedexplic- Charlie is the first user to insert the origin of as jar, itlybysomeuser. Weshowthatstablesolutionscorrespond which propagates to Alice and Bob. When Bob inserts cow, precisely to the stable models of logic programs with nega- thesystemfailstopropagatethistoAlicebecauseshealready tion. However,computingstablemodelsisnotoriouslyhard: hasadatavalueacquiredatanearliertimestamp. Thus,Al- evenifeachruleoftheprogramhasasinglepredicatewhich ice continues to see jar. This is inconsistent, because Alice isnegated,decidingifastablemodelexistsforagivenlogic trusts Bob more than Charlie. Had the updates been per- programisNP-hard[4,pp.396]. Weshowthatoff-the-shelf formed in reverse order (first Bob inserts cow then Charlie logic program solvers scale exponentially when applied to inserts jar) then Alice would end up with cow. Thus, the conflict resolution. Instead, we describe a new, efficient al- snapshot for Alice depends on the order in which the up- gorithm that runs in quadratic time in the number of trust datesareperformed,whichisundesirable. Onemayattempt mappings,andcomputesforeachuserboththepossibleval- toaddressthatbystoringthelineageofeachdatavalue: the ues and the certain values. To the best of our knowledge, system could then update Alice’s value because it knows it this is the first solution to the conflict resolution problem came from Charlie. However, maintaining the lineage of all that is both principled and efficient. We describe further valuesinalargesystemisdifficult,becauseanupdatetoone extensionstoanswerqueriesabouttheconflictsthemselves, value may affect the lineage of many other data values. such as finding pairs of users that disagree in at least one Asecondproblemisthatupdatepropagationcannothandle stable solution, or finding the lineage of a belief. updatesor revocationofdatavalues. Considerthefollowing Secondly, we study the impact of constraints on the con- sequence, again starting from an empty database: flictresolutionproblem. Attribute-valueconstraintsarede- fined by users, and can be, for example, a range-constraint Time User Update for Belief for for a numerical attribute, or an inclusion constraint (check- Alice Bob Charlie ing that the value appears in a reference database) for a 0 - - - categorical attribute. Once a user defines a constraint, the 1 Charlie insert jar - - jar effect is that she rejects all values that do not satisfy the 2 jar - jar constraint, hence we also refer to a constraint as a set of 3 jar jar jar negative beliefs. 4 Charlie update jar → cow ? ? cow Thesimplestwaytohandleconstraintsistorestricttheir usage locally: the constraint is only used to decide whether 2Notethatprioritiesareassignedbyusersandonlyserveto imposeatotalpreorderontheirtrustedparties. Prioritiesof toacceptorrejectavaluecomingfromanothertrusteduser, mappings assigned by different users, e.g. m with priority and the constraint itself is not further propagated to other 2 50 and m with priority 80, cannot be compared. users. Onlythedatavaluesarepropagated. Wecallthisthe 3 d. es hasasingleattribute. Wewrite(k,v)whentheobjectwith ng pen ueri identifierk hasattributevaluev. Givenanobject,different pi de q usersmayhaveconflictingbeliefsaboutthecorrectvalueof conflicttsrustmpariporitieuspdaterienvokescyclesconsensus ivntaosltueaetDstrvitb1h,uevt2se,e..t.To.fhapucocss,osrtidbhilenegadtattotraidbviuffatleeureemsn,taayunsdtearUks’etbhoeenlieseefots.foWfseuveseedrreas-.l Thus, user x ∈ U may believe that (k,v) is correct, user y Orchestra [9, 19] x x x x may believe (k,w) is correct, where v,w ∈ D, while user z FICSR [16] x may have no opinion about the value of the object k. BeliefDB [7] x x x x Users’ beliefs may differ from object to object. In this Youtopia [11] x x x and the following section, we treat each object separately Figure 3: Recently proposed systems that model and therefore omit mentioning k altogether. In Sect. 4, we conflicts or data sharing for a community of users. discuss how to efficiently handle multiple objects. Thus, a user either has an explicit belief about the value of the object, or can derive her belief implicitly through agnostic paradigm. Anotherreasonableapproachistotreat priority trust mappings. We define these notions formally. constraints and data values equally and to propagate them together, based on the prioritized trust mappings. Thus, if Definition 2.1 (Explicit Belief b ). An explicit be- 0 AlicetrustsBob,thenshealsotrustsBob’sconstraints,and liefisapartialfunctionfromuserstovalueswithb (x)being 0 therefore may reject a value from Charlie if that value vio- the value believed by user x. lates Bob’s constraint. We call this paradigm eclectic. Sur- prisingly, we show that both the agnostic and the eclectic Definition 2.2 (Priority Trust Mapping m).Apri- paradigms are computationally hard: computing the pos- ority trust mapping (or trust mapping, in short) is a triple sible values is NP-hard, and computing the certain values m=(z,p,x), where z,x are users and p is an integer. The is co-NP hard. Therefore, we propose a third paradigm of meaning is that user x trusts the value from user z with dealing with constraints, called skeptic. Here constraints priority p. We call z a parent of x, and x a child of z. continuetobepropagated,but,inaddition,eachdatavalue v is augmented with a constraint that rejects all other data Definition 2.3 (Priority Trust Network TN). A values except v. When v is accepted, the constraint is re- Priority Trust Network (or trust network, in short) is a la- dundant, but its role becomes important later downstream, beled graph TN=(U,E,b0), where U is a set of users, E is ifvisrejected(e.g. becauseofsomeotheruser’sconstraint). a set of priority trust mappings, and b0 is an explicit belief. Weshowthattheskepticparadigmtoconflictresolutioncan becomputedefficiently(inquadratictimeinthenumberof Given a TN, each user x computes her belief b(x) as fol- trustmappings). Thus,weproposetheskepticparadigmas lows: If x has an explicit belief b0(x) then this is also her oursolutiontohandleconstraintsduringconflictresolution. belief. Otherwise,xcomputesherbeliefimplicitly bychoos- Finally, we study bulk conflict resolution, when similar ingatrustmapping(z,p,x)withhighestpriority(i.e.largest trust relationships are applied to a large number of data p with ties broken arbitrarily), such that the parent z has objects. Here,thecomplexityisdominatedbyalargenum- somebelief,anddefinesb(x)=b(z). Thus,theexplicitbeliefs ber of objects, while the number of users is assumed to be statedbysomeoftheuserspropagatethroughthegraphto small. We show that by simple changes to our two algo- other users. If a user has two trusted parents with different rithms (without and with constraints) we can translate the beliefs,thentheconflictisresolvedaccordingtothepriority, trust mappings into SQL and execute them in a standard with ties broken arbitrarily. relational database. Definition 2.4 (Stable solution b). Astablesolution In summary, our contributions with this paper are: ofatrustnetwork(U,E,b )isapartialfunctionbfromusers • We define a principled solution to the conflict resolu- 0 tovalues,sothatforeveryuserxwithbdefined,thereexists tion problem, called stable solution, and give an algo- a path x →x →...→x =x, s.t.: rithm that runs in quadratic worst time (Sect. 2). 0 1 n (1) x has an explicit belief b (x ), • We describe three paradigms for handling constraints 0 0 0 (2) all nodes along the path have the same implicit belief in conflict resolution and prove that two are hard. b(x )=b (x ), and For the third one, we give an algorithm that runs in i 0 0 (3) at no node x there is a trust mapping (x(cid:48) ,p(cid:48),x ) quadratic worst time (Sect. 3). i i−1 i with higher priority p(cid:48) > p than (x ,p,x ) and with • We discuss conflict resolution in bulk and describe a i−1 i a conflicting belief b(x(cid:48) )(cid:54)=b(x ) at parent x(cid:48) . solution based on a translation into SQL (Sect. 4). i−1 i i−1 Inthiscase,wecallthesequencex ,...,x thelineageofthe • Weconductexperimentsthatshowthat,whilequadratic 0 n belief b(x). In other words, every belief b(x) can be traced to in the the worst case, our proposed solutions actually some explicit belief along paths that are not dominated with scale linearly in most cases (Sect. 5). conflicting values. b(x) remains undefined only if no parent ofxhasanimplicitbeliefandxhasnoexplicitbeliefeither. 2. BASICCONFLICTRESOLUTION Inthissection,wedefineourmodelforhandlingconflicts Resolving a trust network can be thought of as a con- through priority trust mappings. We do not consider con- flictresolution process: usersthathaveexplicitanddistinct straints here, but discuss those later in Sect. 3. beliefs conflict with each other, and users without explicit The database consists of a collection of objects. We as- beliefsdecidewhomtotrustbasedontheprioritytrustmap- sume w.l.o.g. that each object is identified by a key k and pings. We illustrate with two examples: b (x )=v b (x )=w b (x )=v b (x )=w • Consensusvalue: Giventwousersx,y,findallvaluesv 0 2 0 3 0 3 0 4 suchthatineverystablesolutionb,b(x)=viffb(y)=v. • Lineage computation: Given a user x and possible 100 5500 50 40 value v for x, compute a lineage for v. 100 2.2 BinaryTrustNetworks x x 80 x 1 1 2 ABinaryTrustNetwork(BTN)isaTNwhereeverynode (a) Simple TN (b) Oscillator TN x has at most two incoming edges and the explicit beliefs b (x) are defined only for root nodes x (nodes without par- 0 Figure 4: Two example trust networks with either entsbutwithexplicitbeliefs). Wefurtherassumethatevery one (a) or two (b) stable solutions. nodeinaBTNisreachablebyapathfromsomerootnode. If x is not reachable, then b(x) is undefined in any stable solution and may be safely removed from the BTN. The Example 2.5 (Simple TN). Fig.5ashowsasimplenet- networks in Fig. 2 and Fig. 4 are binary. workthathasasinglestablesolution: b(x )=v (andb(x )= 1 2 If x has a single parent z , or two parents z ,z with pri- v, b(x )=w). Consider now Fig. 2 and assume the sin- 1 1 2 3 oritiesp >p ,thenz iscalledpreferredparent. Otherwise, gle explicit belief b (Charlie)=jar. Then the unique so- 1 2 1 0 itiscalledanon-preferredparent. Notethatiftherearetwo lution is b(Alice)=b(Bob)=jar. On the other hand, if parentswiththesamepriority,theyarebothnon-preferred. bothCharlieandBobhaveexplicitbeliefsb (Charlie)=jar, 0 b0(Bob)=cow, then the unique solution is b(Alice)=cow. Proposition 2.8. (Binary Trust Network Equiva- Example 2.6 (Oscillator TN).Nowconsiderthetrust lence) Every trust network TN is equivalent to a binary trust network BTN of maximum double size, where size is network in Fig. 5b. It has two stable solutions, one with the number of trust mappings. b(x )=b(x )=v, the other with b(x )=b(x )=w. To see 1 2 1 2 that the first is a stable solution, note that x and x de- 1 2 We prove Proposition 2.8 in the appendix and from now rive their beliefs from x and x respectively (cyclic), and 2 1 on, and w.l.o.g., consider only binary trust networks. bothhavealineagethatcanbetracedbacktob (x ). Onthe 0 3 other hand, setting b(x )=b(x )=u, where u is an arbi- 2.3 LogicProgrammingandStableModels 1 2 trary value, is not a stable solution, because u doesn’t have One possible approach to solve a trust network is to use a lineage to an explicit belief. LogicProgramming(LP).TechniquesforLPhavebeenstud- Thelastexampleillustratesanimportantissueinsolving ied extensively in the literature [4] and there are tools for a trust network: if the network has cycles, then there can LP solving. In this section, we explore the applicability of be more than one stable solution, even if for each node, the LP to solving a trust network. We associate the following trust mappings impose a total order on its parents. Cycles LP to a BTN. There is a unary predicate Ux for each user in trust mappings occur naturally since mappings are de- x, and a unary predicate Cx,z for every non-preferred edge fined by individual users, and those often form groups that z → x. If x has an explicit belief v, then Ux is an EDB mutually trust each other more than others. Therefore, a predicate,satisfyingthesinglepredicateUx(v);otherwiseit conflictresolutionsystemmustcopewithcycles,andthisis isanIDBpredicate. AllCx,z predicatesareIDBpredicates. a difficult problem as we have seen in Example 1.2. The rules in LP are the following. For every preferred edge y → x we have rule (1) below, and for every non-preferred 2.1 ProblemStatements edge z→x we have rules (2a), (2b) below: We define a principled approach to conflict resolutions in U (r)←U (r) (1) x y terms of certain beliefs. For technical purposes, we define C (q)←U (q),U (r),r(cid:54)=q (2a) them together with their duals, the possible beliefs: x,z z x U (q)←U (q),¬C (q) (2b) x z x,z Definition 2.7 (Certain / Possible Belief). Let x be a user and v be a value. We say that v is a certainbelief Thefirstrulesaysthatxshouldbelieveeverythingthatits if for every stable solution b, b(x)=v. We say that v is a preferred parent y does. The second and third rules state possiblebelief ifthere existsastable solution b s.t. b(x)=v. that x should believe z, except when it conflicts with its ownbelief(presumablyobtainedfromthepreferredparent, Themainproblemthatwestudyinthispaperisfor each or from the other non-preferred parent). The intermediate user x, find their certain beliefs, denoted cert(x). We call predicate C is introduced for technical reasons, to guar- x,z it resolving a trust network. For a simple illustration, the antee safety of the resulting LP. certainbeliefsinExample2.6are: cert(x )=u,cert(x )=v, ThesemanticsofaLPisgivenintermsofastable model. 1 2 cert(x )=cert(x )=∅. Each user’s certain belief represent We briefly review the definition and refer to [4] for details. 3 4 a snapshot of the conflicting information in the network. If ConsiderthegroundedlogicprogramP,obtainedbyground- thenetworkhasauniquestablesolutionb,theneachuserx ingeachrulewithallvaluesintheactivedomain: eachrule withatleastoneparentwithabelief,alsohasawelldefined in the grounded LP has some positive and some negative certain belief, and cert(x)=b(x); if there is more than one predicates (In the case of BTN’s, there is at most one neg- solution, then for some of those users cert(x)=∅. ative predicate). Consider a model M. The reduct of P by Inaddition,weshowthatourtechniquescanbeextended M is the logic program PM obtained by (a) removing all to solve several other related problems, such as: rulesP wheresomenegativepredicateisfalseinM,and(b) • Agreementchecking: Findpairsofusersx,ywhoagree removing all negative predicates from the remaining rules. in all stable solutions b: b(x)=b(y). Note that the reduct has no negations. The model M is Fig_DLVexponen-al   11-5-2009 10,000   Algorithm 1: Resolution Algorithm 1,000   Input: BTN=(U,E,b0) 100   Output: poss(x)foreachnodex∈U me  [sec]   10   I cfolorseeadc←hn∅ode x with b0(x) defined do Ti 1   poss(x)←{b0(x)} addxtoclosed 0.01     DLV   open←U−closed 0.001     M whileopen(cid:54)=∅do 0   50   100   150   200   S1 if ∃ preferred edge z→x with z∈closed∧x∈openthen Size  of  the  network  [x=|U|+|E|]   poss(x)←poss(z) movexfromopentoclosed Figure5: Resolvinga trust networkwith LPsolvers S2 else is exponential in the size of the network. LetSCC(open)betheSCCgraphconstructedfromthe opennodes. LetS beaminimalSCC.LetpossS←∅. foreachedge z→x with z∈closed∧x∈S do called a stable model if it is the minimal fixpoint model of addallvaluesfromposs(z)topossS the reduct PM. A ground fact is called certain if it be- foreachx∈S do longstoallstablemodels;itiscalledpossibleifitbelongsto po8ss7(  x)←possS some stable model. We prove the following in the technical moveallnodesofS fromopentoclosed report [8]. Theorem 2.9 (Logic Program Equivalence). Every stable solution to a BTN is a stable model to the associated open contains all other nodes. During the main loop (M), LP, and vice versa. the algorithm performs two steps: Step 1 (S1) propagates greedily poss(x) along preferred edges, closing the destina- Example 2.10 (Oscillator). TheLPcorrespondingto tion nodes. Step 2 (S2) handles the case when no more the oscillator from Example 2.6 is: preferred edges can be traversed, and we describe it next: U ((cid:48)v(cid:48))← U ((cid:48)w(cid:48))← Astrongly connected component(SCC)inagraphisaset 3 4 Sofnodessothatthereexistsapathfromxtoyandapath U (r)←U (r) U (r)←U (r) 1 2 2 1 fromytox,foreverytwonodesx,y∈S. TheSCCgraph is C (q)←U (q),U (r),r(cid:54)=q C (q)←U (q),U (r),r(cid:54)=q formedbycontractingtheverticesofeachSCC.Itisknown 1,3 3 1 2,4 4 2 U (q)←U (q),¬C (q) U (q)←U (q),¬C (q) that this graph is acyclic and can be calculated in O(n) 1 3 1,3 2 4 2,4 timeusingTarjan’salgorithm[18]. Inthesecondstep(S2), It has two stable models, M = {U ((cid:48)v(cid:48)),U ((cid:48)v(cid:48)),U ((cid:48)v(cid:48)), the algorithm computes the SCC graph of open nodes, and 1 1 2 3 U ((cid:48)w(cid:48))}, and M ={U ((cid:48)w(cid:48)),U ((cid:48)w(cid:48)),U ((cid:48)v(cid:48)),U ((cid:48)w(cid:48))}. chooses a component S that is minimal in the SCC graph. 4 2 1 2 3 4 In other words, S is an SCC that has no incoming edges In summary, one can resolve a trust network as follows. from other SCCs. S may still have incoming edges, but RewritetheBTNintoalogicprogram,thenuseaLPsolver they are all coming from closed, and moreover, they are all to compute the certain tuples. Unfortunately, solving logic non-preferred edges. The algorithm defines poss(x), for all programs is hard, even in the most restricted settings. De- nodes in S as the union of all possible values of all parents cidingwhetheralogicprogramhasatleastonestablemodel of nodes in S. It then closes all nodes in S. iscoNP-hardevenifallruleshaveasingleatominthebody and that atom is negated; deciding if a ground tuple is a Example 2.11 (Oscillator). We illustrate the algo- certain tuple is coNP-hard [13]. These theoretical results rithm for Example 2.6. Initially, closed = {x ,x }, and 3 4 translate into running times that increase exponentially in poss(x ) = {v}, poss(x ) = {w}. There are no preferred 3 4 thesizeoftheBTN.Figure5showstherunningtimeofthe edges that can be traversed, so the algorithm proceeds with state-of-the-art LP-solver DLV [12] on increasingly larger thesecondstep,computingtheconnectedcomponentsofopen. BTNscomposedofseveraloscillatorsfromFig.5b. Onecan Thereisasinglecomponent,{x ,x },andtheirpossibleval- 1 2 see the clear exponential trend: the LP solver becomes im- ues are set to poss(x ) ∪ poss(x ) = {v,w}. The output 3 4 practicalongraphslargerthan150nodes. (Wereportmore of the algorithm is: cert(x ) = {v}, cert(x ) = {w}, and 3 4 extensiveexperimentsinSect.5.) Thus,relyingonageneral cert(x )=cert(x )=∅. 1 4 purpose LP solver is not a practical solution. Instead, we describe in the next section a new algorithm for resolving a Theorem 2.12 (Resolution Algorithm).Letnbethe trust network which runs in quadratic time of its size. numberofnodesinaBTN.Algorithm1runsintimeO(n2) andcorrectlycomputesthesetofpossibletuplesforallnodes 2.4 AnAlgorithmforConflictResolution x in BTN. Algorithm1takesasinputaBTNandcomputesthesetof possiblevaluesposs(x)foreverynodex. Thesetsofcertain The running time follows from the fact that the SCC valuescert(x)canthenbederivedusingthefollowingrules: graphcanbecomputedintimeO(n). Notethattherunnig cert(x)={a} if poss(x)={a}, and cert(x)=∅ otherwise. timeisnotO(n),astheSCC’smayneedtobere-calculated The algorithm maintains a set closed of all nodes x for ateachiterationasthesetopenchangesateachiteration: we which poss(x) is already computed. This set is initialized showinthetechnicalreportthatthealgorithmtakesΩ(n2) with all root nodes, i.e. all nodes with explicit beliefs (I). in the worst case. We prove correctness in the appendix. 2.5 DiscussionandExtensions 3. CONFLICT RESOLUTION WITH CON- Let’s step back and examine what we have achieved. We STRAINTS have fixed an object k, and considered a priority trust net- In this section, we extend our approach to constraints work. Our goal is to give each user a snapshot of the data which we model as negative beliefs. So far, we have only thatisconsistentwiththeentirenetwork. Definition2.4de- considered positive beliefs, i.e. a user either believes that fines a stable solution for this network, but in general there the value of an object is v or has no opinion at all. A neg- may be several stable solutions. We have proposed the cer- ative belief, in contrast, states that the value of the object tain values as the snapshot to be shown to the user, and isnot v. Wedenotewith(k,v)+apositivebelief,andwith described Algorithm 1, which computes the certain values (k,v)− a negative belief. (and the possible values too) in time quadratic in the num- Constraints occur naturally in collaborative systems and ber of users. The algorithm may need to be run separately enable users to filter the data values they accept. For ex- for each object k, an issue that we will address in Sect. 4. ample, one users may define the constraint the value of the The important property of our approach is that both the ‘carbon-date’attributeisbetween1,200and40,000: thiscor- definition and the algorithm are order-invariant: they do responds to a negative belief for every value v outside the not rely on any order in which conflicts are to be resolved. range. Or, another user may rely on a reference database The result is a consistent snapshot of the conflicting infor- before accepting a value, e.g. the value of the ‘translation mation. By contrast, as we have seen, prior approaches to attribute’ must be in the ‘list-of-known-words’. These con- conflictresolutionprocesstheexplicitbeliefsinafixedorder straints are used to refuse a value from a trusted user and (e.g. in the order of their transaction time), and the result therefore affect the global conflict reconciliation. In addi- dependsonthisorder. Asaconsequence,ifanyexplicitbe- tion, a user may state a negative belief explicitly in order liefisupdated, e.g. somebeliefisrevoked, theremaybeno to refute another user’s statement. For example, user Alice way to re-compute a consistent snapshot. In our approach, may state that the origin of is cow, written (k ,cow)+. 1 if an explicit belief is updated, we will simply re-run the User Bob may disagree. He does not know what the origin algorithm and obtain another consistent snapshot. of the glyph is, but believes it cannot be a cow. His belief As a further benefit of a principled approach, we men- is thus (k ,cow)−. Bob may accept other values, such as 1 tion here two extensions of our algorithm that allow the horse or jar, from users he trusts, but not cow. systemtoanswermorecomplexqueries,asthosementioned As in the previous section, our discussion focuses on a in Sect. 2.1. single, fixed object k and we will not mention k anymore. Wewriteapositivebeliefasv+andanegativebeliefasv−, Retrieving lineage. Weshowhowtoextendthealgorithm where v is a data value. An explicit belief can be positive to compute the lineage of each possible value. Whenever v+,meaningthattheuserknowsthatthevalueisv,orcan we insert a value v into poss(x), store a pointer back to be a set of negative beliefs v−,w−,.... We allow these sets thevaluev∈poss(z)thatproducedthispossiblevalueat tobeinfiniteaslongastheycanbefinitelyrepresented,for x: for Step 1 this is a value in the preferred parent, for example by a range predicate. Step2therecanbeseveral(user,value)pairsfromoutside the set S: store pointers to all of them. Thus, from each Definition 3.1 (consistency). Two beliefs b ,b are valuev∈poss(x)wecantracebackseverallineages. Note 1 2 conflicting(b ↔(cid:54) b )iftheyareeitherdistinctpositivebeliefs thatthismethodisnotcomplete: Step1missessomelin- 1 2 v+,w+,oroneisv+andtheotherisv−. Otherwise,b ,b eagesthatcometoxvianon-preferrededges. However,it 1 2 areconsistent(b ↔b ). AsetofbeliefsBiscalledconsistent hasthepropertythateachpossiblevaluehasatleastone 1 2 if any two beliefs b ,b ∈B are consistent. lineage that the system can return to the user. 1 2 Definition 3.2 (preferred union). Giventwoconsis- Pairs of possible values. For any two users x,y in a bi- tent sets of beliefs B ,B , their preferred union is: nary trust network, denote: 1 2 B ∪(cid:126)B =B ∪{b |b ∈B .`∀b ∈B .b ↔b ´} 1 2 1 2 2 2 1 1 1 2 poss(x,y)={(v,w)|∃ stable solution b:b(x)=v,b(y)=w} Asintheprevioussection,ourgoalistodefine,thencom- Thus, poss(x,y) denotes the set of pairs of values that x puteallimplicitbeliefsbasedontheprioritytrustmappings. and y can take together. Note that if (v,w) ∈ poss(x,y) Aswewillseenext,thisraisesbothconceptualandcompu- then x∈poss(x) and y∈poss(y), but the converse is not tational challenges. true. For example in Fig. 5b, poss(x ,x ) contains the 1 2 3.1 ThreeParadigms pairs (v,v) and (w,w), but not (v,w) or (w,v). Consider the binary trust network in Fig. 7a. User x 1 Proposition 2.13 (Possible Pairs). Algorithm 1 defines a constraint resulting in a negative belief b−. Let’s canbeextendedtocomputeposs(x,y)forallpairsofusers examineuserx3: sheobviouslyadoptstheexplicitbeliefa+ x,y. The modified algorithm runs in time O(n4) where n from her preferred parent x2. The question is: what should is the number of users. she do with the negative belief b−? Should her belief be {a+,b−},orjust{a+}? Clearly,onceshebelievesthevalue a+, she has no more use for the constraint b−, since the Thesetsposs(x,y)allowustogobeyondthesnapshotcon- purpose of the constraint was only to rule out b+ which is sistingofcertaintuples,andanswermorecomplexqueries not under consideration at all here. This argument shows about the conflicts and the reconciliation. For example, that she may well restrict her belief to {a+}. On the other the agreement checking query mentioned in Sect. 2.1 can hand, her decision may affect the users who trust her. Her be answered as {(x,y)|∀(v,w)∈poss(x,y)⇒v=w}. immediate successor x willrejecta+, butthe next user x 5 7 Fig_Seman?csForOWFi_g_aS   eman?csForOWF_igd_  Seman?csForOWF_icg  _Seman?csForOW_b   11-­‐4-­‐2009   11-­‐4-­‐2009   11-­‐4-­‐2009   11-­‐4-­‐2009   {a+} {b−} {a+} {b−} {a+} {b−} {a+} {b−} x x x x x x x x {a−}2Preferred 1 {a−}2Preferred {a+}1 {a−}2Preferred {a+,1b−} {a−}2Preferred {a+}1 x4Preferred x3{b+} x4Preferred x3{b+} x4Preferred x3{b+} x4Preferred x3{b+} x5Preferred x6{c+} {a−x5}Preferred x6{c+} {a−,b−x5}Preferred x6{c+} ⊥x5Preferred x6{c+} x x x x x x x x 7Preferred 8 {b+}7Preferred 8 {a−,b−7}Preferred 8 ⊥7Preferred 8 x9 x9{b+} x9{c+,a−,b−} x9⊥ (a) Explicit beliefs (b)Agnosticparadigm (c) Eclectic paradigm (d) Skeptic paradigm Figure 6: (a): An example binary trust network with explicit positive and negative beliefs. The edge from the prefEexrprliecditbpealireefsnt is labeled as such. (b-d): The three alternative paradigms lead to different entailed implicit Skeptic Agnostic Eclectic beliefs at various nodes for the unique stable solution of the trust network. 83   86   85   84   has the option of adopting b+ or not. The decision made “A skeptic is a person inclined to question or doubt all by user x affects whether x can learn or not about the accepted opinions.” 3 7 constraint b− defined upstream. As this example shows, The paradigm is chosen by the system administrator and thereareseveralchoicesindefiningconflict-resolutioninthe appliedtoallusers. Thestablesolutionstoatrustnetwork presence of negative beliefs, even for graphs without cycles. depend on the paradigm chosen. Before we can define the We propose here three paradigms for trust network res- stable solutions, we need some technical definitions. Let B olution in the presence of negative beliefs. A paradigm is be a consistent set of positive and/or negative beliefs. For formallydefinedasasetofconsistentsetsofbeliefsthatare each paradigm σ ∈ {Agnostic,Eclectic,Skeptic} (abbre- considered valid or in normal form. We denote ⊥={v− | viated by {A,E,S}), the normal form Norm (B) is: v∈D} the set of all negative beliefs. Equivalently, ⊥ is an σ  inconsistent constraint that rejects any value. {v+} if ∃v+∈B Norm (B)= A B otherwise Agnostic. The only valid belief sets in this paradigm are singletonpositives{v+}andsetsofnegatives{v−,w−,...}. Norm (B)=B E Onceauserknowsthevalueofanobject,theydonotwant  {v+}∪(⊥−{v−}) if ∃v+∈B to know any constraints, even if they are consistent with NormS(B)= B otherwise thisvalue. Intheagnosticsolution,thenegativebeliefb− is blocked by x who believes only a+ (Fig. 7b). The preferred union specialized to the paradigm σ is: 3 “Anagnosticisapersonwhobelievesthatnothingisknown B ∪(cid:126) B =Norm `Norm (B )∪(cid:126)Norm (B )´ (1) 1 σ 2 σ σ 1 σ 2 or can be known (...) beyond material phenomena.” For example: Eclectic. Any consistent set of beliefs is valid. In this paradigm, a user adopts all constraints that are consis- {a−}∪(cid:126)A{b+}={b+} tent with a given value. Thus, {a+,b−,c−} is a a valid {a−}∪(cid:126) {b+}={b+,a−} E set of beliefs. The eclectic solution is shown in Fig. 7c. {a−}∪(cid:126) {b+}={b+,a−,c−,d−,...} Herex acceptstheconstraintb−inadditiontoa+. Asa S 3 consequence, this constraint is communicated all the way {b−}∪(cid:126) {b+}=⊥ S to x , who now rejects b+. 7 Wedefinenextastablesolutionforabinarytrustnetwork “An eclectic is a person who derives ideas, style, or taste with constraints. We make the restriction that edges enter- from a broad and diverse range of sources.” ing the same node have distinct priorities, thus we disallow Skeptic. Thevalidsetsofbeliefsarethefollowing: allsets ties. We discuss ties in the technical report [8]. withonlynegativebeliefs,andallsetsthatcontainexactly Definition 3.3 (Stable solution w/ constraints). onepositivebeliefandallnegativebeliefsconsistentwith Letσ∈{A,E,S},andletBTN=(U,E,B )beabinarytrust it, i.e. they are of the form {v+}∪(⊥−{v−}). Thus, 0 network, where for all x, B (x) is either a positive belief, or when a user accepts a positive belief v+, she also adopts 0 a set of negative beliefs, or the empty set. A stable solution a constraint that rules out all other values. The skeptic is a function B from users to sets of beliefs such that: solution is shown in Fig. 7d. In this paradigm the belief (1) Ifxhasapreferredparentyandanon-preferredparent a+“means”theset{a+,b−,c−,d−,...}. Whenx5rejects z, then B(x)=B (x)∪(cid:126) `B(y)∪(cid:126) B(z)´. If x has only a+, his belief becomes ⊥. This propagates to x , who 0 σ σ 7 one parent y, then B(x)=B (x)∪(cid:126) B(y). If x has no rejectb+,similarlytotheeclecticparadigm. However,at ` 0 ´σ parent, then B(x)=Norm B (x) . the next step x rejects c+ too, hence x does not belief σ 0 9 9 (2) For every belief b ∈ B(x) there exists a path x → any positive value (he believes ⊥). This differs from the ` ´0 x →...→x =x such that b∈Norm B (x ) and eclectic paradigm, where x believes c+. 1 n σ 0 0 9 b∈B(x ) for all i=0,...,n. i Fig_ComplexityFNigo_t  ComplexityOr   Fig_ComplexityExample   3-­‐17-­‐2010   3-­‐17-­‐2010   3-­‐17-­‐2010   {a−} {a+/b+} {c−} {c+/d+}{c−} {c+/d+} {c−} {c+/d+} {a+} {b+} {a+} {b+} {a+} {b+} X X X X 1 2 3 Preferred {d+} Preferred Preferred Preferred level1 Preferred X1 Preferred X2 Preferred X3 Preferred Preferred Preferred {b−} Preferred Preferred level2 P N P P Preferred {c+} Preferred {e+} level3 OR OR Preferred Y Preferred Y level4 ANDZ {d+(a−,b−)/c+(a−,b−)} {e+(c−)/d+(c−)} {e+(a−,b−,c−,d−)/f+(a−,b−,c−,d−)} (a) Y =NOT(X) (b) Y =OR(X ,X ,X ) (c) SAT encoding of (X ∨¬X )∧(X ∨X ) 1 2 3 1 2 2 3 Figure 7: (a,b): Representation of NOT and OR gates for Agnostic and Eclectic paradigms. A PASS- THROUGH gate is similar to a NOT, an AND gate to an OR. (c): A full example of CNF. Notice that 0/1 are encoded differently at different levels: the inputs at the top are a/b for 0/1, the output at Z is e/f for 0/1. 122   125   126   Consider a node x and a positive belief v+. We say that that all values at the second level have the same encoding v+ is possible if there exists a stable solution B s.t. v+ ∈ c+/d+. Thereductioniscompletedbytheobservationthat B(x). We say that v+ is certain, if v+∈b(x) for all stable the CNF formula is satisfiable iff f+ ∈ poss(Z), where Z solutions B. Our goal is to compute the possible and the is the output node. This proves the first claim of the theo- certain positive beliefs under each of the three paradigms. rem. The second claim follows from the fact that checking non-satisfiability of CNF formulas is coNP-hard, and that 3.2 TheComplexities the formula is unsatisfiable iff e+∈cert(Z). WeshownextthattheAgnosticandEclecticparadigms are hard, whereas Skeptic is in PTIME. Given this negative result, it is somewhat surprising that the third paradigm Skeptic can be computed in PTIME. Theorem 3.4 (Agnostic / Eclectic Complexities). Algorithm 2 runs in O(n2) and computes a set repPoss(x) Let σ be Agnostic or Eclectic. Then the following hold: for each node x, which is a representation of the set of pos- (a): Checking if a positive belief is possible at a given node sible values poss(x) at x as follows: If repPoss(x) contains x in a binary trust network is NP-complete. (b): Checking a positive value v+, then all other negative values are also if a positive belief is certain at a given node x in a binary possible. if repPoss(x) contains ⊥, then all negative values trust network is coNP-complete. are also possible. In summary, the set of possible values poss(x) represented by P=repPoss(x) is: Proof sketch. The proof is a quite simple reduction from the CNF SAT problem, and we include it here be- poss(x)={v−|v−∈P}∪{w−|⊥∈P,w∈D} cause it highlights the inherent difficulties of coping with ∪{v+|v+∈P}∪{w−|v+∈P,w∈D,w(cid:54)=v} constraints in conflict resolution. The key technical step is thatBooleangatesNOT,OR,andAND canbeencodedas IfrepPoss(x)doesnotcontainapositivevalueor⊥,wecall Trust Networks. Consider the NOT gate in Fig. 8a: The it of Type 1. Otherwise, it is of Type 2. inputs 0/1 are encoded as beliefs a+/b+, the outputs are We describe now the algorithm, which is a natural ex- encodedasc+/d+. WhentheinputX isa+for0,thenthe tension of Algorithm 1. The preprocessing phase (P) finds output Y is d+ for 1. This is because a+ is blocked at the all nodes who either have explicit negative beliefs or whose nextlevel,henced+advances(theparenthesesshowthead- preferred ancestors have explicit negative beliefs. For that ditionalnegativebeliefsthatarepropagatedintheEclectic itstartsfromnodesxwheretheexplicitbeliefhasonlyneg- paradigm). Similarly, one can check that if X is b+ for 1, ative values, and continues with all nodes whose preferred the output Y is c+ for 0. Thus, the network maps a+/b+ parent has only negative beliefs. These nodes correspond to d+/c+, hence it is a NOT. If we modify the network to the empty nodes in the earlier Algorithm 1, but unlike by switching c with d, we obtain a PASS-THROUGH gate there, we cannot discardthese nodes yet. The initialization whichmapsa+/b+toc+/d+. NextconsidertheternaryOR phase (I) again creates the sets of closed and open nodes, gate in Fig. 8b: If at least one of the three inputs is d+ for and closes nodes with explicit positive beliefs. Step 1 (S1) 1,d+willpropagatetotheoutput. Ifallinputsarec+then of the main loop (M) traverses all preferred edges z → x, all positive beliefs are blocked, and the output is e+ for 0. and updates repPoss(x) accordingly. Step 2 (S2) is slightly Thus,theencodingoftheoutputise+/d+for0/1. AnAND more involved and described next: gateisobtainedsimilarly. Bycombiningthesegates,making Herethealgorithmfindsaminimalconnectedcomponent sure that each level uses the same encoding of Boolean val- S of open nodes (as before), but it cannot simply flood it ues as positive beliefs, we can represent an CNF expression withallpositivevalueswaitingtoenterS. Thereasonisthat where the last level encodes 0/1 as e+/f+. We further use some nodes in S can be reached from closed nodes through basicoscillators,asinFig.5btogenerateinputs0/1. Fig.8c preferred edges. These are precisely the edges we skipped illustrates the entire encoding of (X ∨¬X )∧(X ∨X ). in Step 1. Thus, some nodes in the component S can be 1 2 2 3 Notice the role of the PASS-THROUGH gates in ensuring forced to accept negative beliefs, and these prevent a value v+waitingtoenterS toreachtheentiresetS. Instead,we Algorithm 2: Skeptic Resolution Algorithm simply compute which subset of S the value can reach, and addv+tothesetofpossiblevaluesforthosenodes. Forall Input: BTN=(U,E,b0) Output: repPoss(x)foreachnodex∈U unreachable nodes, we add ⊥ to the set of possible values, foreachx∈U doprefneg(x)←∅andrepPoss(x)←∅ because in the Skeptic paradigm: {v−}∪(cid:126)S{v+}=⊥. P foreachnode x with ∃v−∈b0(x)do Theorem 3.5. (Skeptic resolution algorithm) Al- prefNeg(x)←b0(x) gorithm 2 runs in time O(n2) and computes the set of pos- while∃ preferred edge z→x with v−∈prefNeg(z), sible values, for the Skeptic paradigm. v+(cid:54)∈b0(x)do prefNeg(x)←prefNeg(x)∪prefNeg(z) 3.3 Discussion I closed←∅ This section described how to handle constraints during foreachnode x with v+∈b0(x)do repPoss(x)←b0(x) conflictresolution. Werepresentconstraintsasnegativebe- addxtoclosed liefsandarguethattheyareanimportantfeatureincollab- open←U−closed orative data sharing. The question we have studied is how M whileopen(cid:54)=∅do trustmappingsshouldhandleconstraints. Perhapsthemost S1 if ∃ preferred edge z→x with z∈closed, x∈open and natural approach is to use the constraints only as filters for repPoss(x)=∅then datavaluesacceptedfromotherusers,butotherwiseignore repPoss(x)←repPoss(z) movexfromopentoclosed them during reconciliation. This is what we called the Ag- S2 else nosticparadigm. Thesecondnaturalapproachistosimply LetSCC(open)betheSCCgraphconstructedfromthe propagateconstraintstogetherwithdatavalues,inwhatwe opennodes. LetS={x1,...,xn}beaminimalSCC.Let called the Eclectic paradigm. However, we have shown {z1,...,zk}beallnodesinclosedthathaveedgesintoS. thatcomputingthepossiblevaluesunderbothparadigmsis foralli∈{1,...,n}, j∈{1,...,k}do NP-hard(andcomputingthecertainvaluesisco-NPhard), foreachv+∈repPoss(zj)do so we do not advocate their use in data reconciliation. Our LetS(cid:48)=S−{x|v−∈prefNeg(x)} third paradigm is, we believe, natural too: propagate con- if ∃ path zj →xi in S(cid:48) then straints, but in addition associate to a data value a maxi- repPoss(xi)←repPoss(xi)∪{v+} else mal constraint, which rules out any other data value. This repPoss(xi)←repPoss(xi)∪{⊥} paradigm,wehaveshown,isinPTIME,andthealgorithmis a natural, yet somewhat detailed extension of Algorithm 1. foreachv−∈repPoss(zj)do We propose the Skeptic paradigm as the basis for conflict repPoss(xi)←repPoss(xi)∪{v−} resolution in cooperative data sharing systems. moveallnodesofS fromopentoclosed We note that, if no constraints exist in the system, then allthreeparadigmscollapsetothesimplesemanticswedis- cussed in Sect. 2. The hardness of the Agnostic and Eclectic paradigms needs to be run separately for each object. In this section, holds under the assumption that the network is cyclic: the we show, that under certain conditions, the set of possi- proofusedoscillators. IfthenetworkisaDAG,thenallthree ble/certain values can be computed in bulk, for an entire paradigmscanbecomputedinPTIME,bysimplyapplying set of objects k ,k ,...,k . We sketch here the approach repeatedly the definition of preferred union (Eq. 1). 1 2 n and provide the details in the technical report. Proposition 3.6 (Acyclic BTNs). Let BTN be a bi- LetTN1,...,TNn bethetrustnetworksforeachofthen nary trust network that is acyclic. Then for any of the objects. We make the following two assumptions: three paradigms (Agnostic, Eclectic, Skeptic) the follow- (i) The set of trust mappings is the same for each object inghold: (a)thereexistsexactlyonestablesolution,and(b) ki,i.e.auserxtrustsauserz globally,forallobjects. that solution can be computed in PTIME. (ii) Ifauserhasanexplicitbeliefforanobjectki,thenit has an explicit belief for each of the objects. A puzzling question is why is the Skeptic paradigm in Then it is possible to simply adapt both, Algorithm 1 PTIME, while the other two are hard. It is easy to see and Algorithm 2 to bulk-compute the set of possible tuples thattheBooleangatesinFig.7nolongerworkunderSkep- throughSQLqueries. LetPOSS(X,K,V)denotethearelation tic, but we do not consider this a satisfactory explanation. representing the possible values: an entry (x,k,v) means While we cannot give an ultimate cause, we point out one that v is a possible value for user x and object k. Then, in interestingdifference. ThepreferredunionforSkepticisas- step1ofthemodifiedalgorithm,whentraversingapreferred sociative, while it is not associative for either Agnostic nor edge z→x, we perform the following bulk insertion: Eclectic. For example, consider the two expressions B = 1 {a−}∪(cid:126) `{a+}∪(cid:126) {b+}´, B =`{a−}∪(cid:126) {a+}´∪(cid:126) {b+}. For σ σ 2 σ σ Agnostic, we have B ={b+}, for Eclectic B ={a−,b+}, insert into POSS 2 2 whileforbothB ={a−}. Bycontrast,onecanshowthat∪(cid:126) select ’x’ AS X, t.K, t.V 1 S is associative. Associativity as a desirable property during from POSS t data merging was pointed out in [14]. where t.X = ’z’ 4. EXTENSIONTOBULKPROCESSING In step 2, when ’flooding’ a strongly connected component Sofarwehavetreatedoneobjectatatime. Ifseveralob- SCC with the beliefs coming from the users z ,...,z we 1 k jects need to be updated, then the reconciliation algorithm perform the following bulk insertions for each x ∈SCC: i Fig_ExperimentsOscillatorLoop   Fig_ExperimentsWebData   11-4-2009 Fig_ExperimentsBulk   11-4-2009 11-4-2009 100   100   100   10   10   10   me  [sec]   1   me  [sec]   1   me  [sec]   1   Ti Ti Ti 0.10     DLV   0.10     DLV   0.01     DLV   RA   RA   RA   y  =  1e-­‐5  x   y  =  1e-­‐5  x   y  =  19e-­‐5  x   0.010     0.010     0.001     10   100   1,000   10,000   100,000   1,000,000   10   100   1,000   10,000   100,000   1,000,000   10   100   1,000   10,000   100,000   1,000,000   Size  of  the  network  [x=|U|+|E|];  one  object   Size  of  the  network  [x=|U|+|E|];  one  object   Number  of  objects  in  the  database;  fixed  network  of  7  users  and  12  mappings   (a) Network with many cycles (b) Network of sampled web links (c) Bulk inserts Figure 8: Experiments for various setups show quasi-linear scalability of our Resolution Algorithm (RA) in contrast to exponential scalability for solving the corresponding Logic Program with DLV. insert into POSS memorymanagement,L2cachebehavior). Figure9bshows 82   83   84   select distinct ’xi’ AS X, t.K, t.V ourresultsontherealdataset. DLVperformsbetteronthis from POSS t dataset,sinceitcontainsfewercycles,onaverage. However, where t.X = ’z1’ or ... t.X = ’zk’ our algorithm is still faster by several orders of magnitude and exhibits the same robust behavior as before. Overall, For Algorithm 2 we need to modify some of the insert RA had an average running time of around 0.01 msec per statements to insert the appropriate representation of ⊥ user and trust relation on both data sets. rather than poss(x). Wenotethattherunningtimeofouralgorithmisquadratic intheworstcase(we prove thisinthetechnical report[8]), 5. EXPERIMENTS but the types of graphs that lead to quadratic behavior are We have conducted an experimental evaluation of our complex and highly regular with nested cycles. We tested approach addressing the following questions: What is the ouralgorithmonsuchkindsofgraphsandmeasuredrunning observed runtime performance, both of the quadratic con- times of about 4 sec for a network size of 10,000. flictresolutionalgorithm(RA)andofthebulk-updatealgo- Finally, Fig. 9c shows our result for testing the bulk con- rithm? And how does our approach compare to computing flictresolutionalgorithmfromSect.4. Weusedatrustnet- the stable models using an advanced liner program solver? workwith7usersand12mappings,andassumedtwousers For the latter we used the programming system DLV [12], with explicit beliefs. Then we varied the number of objects which is widely considered the state-of-the-art implemen- in the database. For each object, we randomly chose these tation of logic programming with negation, and execute it beliefs of the two users to be in conflict, or in agreement. underthebravequerysemantics3. Allalgorithmsareimple- Then we reconciled all users, on all objects. The database mentedinJavaandwererunona2.2GHzdualcoremachine systemwasMicrosoftSQLServer2005,andweusedJDBC with4GBofmainmemory. ExecutiontimesforRAareav- for calls to the database. The data was stored in a rela- eraged over 20 trials. tion POSS(X,K,V), as described in Sect. 4. Figure 9c shows We used two data sets in our experiments. The first is a linear data scalability of our algorithm4. The running time syntheticdataset,consistingofanetworkofmanyindepen- of DLV is exponential here too, because of the conflicts in dent cycles. This network consists of several, disconnected half of the objects between the two users. In contrast, the 4-node clusters of the form from Example 2.6. In this net- running time of the bulk conflict resolution algorithm is in- work, one out of two users has an explicit belief. In our dependent of the number of conflicts. second data set, we modeled a trust network with as much expected real-life characteristics as possible. Real-world so- 6. RELATEDWORK cial structures are known to exhibit a power-law behavior, often referred to as scale-free. We used a large crawl of a Several systems have been recently described that adopt topleveldomainoftheWorldWideWebandthelinkstruc- some kind of conflict resolution or trust mapping to model turebetweensiteswhichexhibitssuchbehavior,witharound datasharinginacommunityofusers[7,9,11,16,19]. These 270k domains with 5.4M links. We identified domains with systemsdiscusstechniquesinthepresenceofvarioussubsets users and links with trust mappings, and assigned random of the features needed in conflict resolution (Fig. 3). Con- priorities. We varied the size of this database by randomly flictsoccurwhenthesystemallowskeyconstraints;aconflict sampling a fraction of all edges and include both start and is simply a violation of the key constraint. Orchestra is the end point in our sampled graph. onlysystemthatconsidersbothconflictsandpriorities,and TheresultsofourexperimentsarereportedinFig.8. For is closest in spirit to our setting. We share with Youtopia the synthetic data set, as shown before, DLV runs in expo- theuseofcertain/possiblevalues,aconceptoriginallyintro- nentialtime(Fig.9a). Incontrast,RArunsinalmostlinear duced for incomplete databases [10]. time: thus, just increasing the number of cycles does not Our solution has relevance to the research areas of data leadtoaquadraticrunningtimeofouralgorithm. Weomit integration anddata exchange. Similarto[19],weassumea runningtimesonverysmalldatasets,becausetheyshowed fixed schema in this paper. In contrast, work on peer data highvarianceduetofactorsunrelatedtothealgorithm(Java 4Data points below 100 msec seem to be dominated by the 3command:“dlv.bin-braveinput.txtquery.txt” overhead of the Java-SQL server connection

Description:
Data Conflict Resolution Using Trust Mappings. Wolfgang neous Databases; H.3.3: Information Search and Retrieval amulets, and small tablets.
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.