Watermarking Multi-Content Aggregates

Radu Sion ([email protected])
http://www.cs.purdue.edu/homes/sion
Computer Sciences and CERIAS, Purdue University
Keywords: Digital Watermarking, Steganography, Security, Copyright Protection, Databases
ver. 2.12, April 02, 2002

Watermarking content aggregates
Sion, Atallah, Prabhakar

overview
• general introduction (approx. 10 slides)
• folklore: traditional watermarking (approx. 10 slides)
• our research (approx. 30 slides)

introduction
• what is the problem
• what is a solution
• who cares ... today

solution: information hiding
Watermarking deploys information hiding techniques with the aim of becoming a solution to the previously outlined issues, i.e. hiding a certain mark (e.g. "radu is the author of this novel") in the object itself (e.g. the novel text) is hoped to hold up in court as evidence for copyright purposes in a later dispute.
Important issue: "attack survivability"

information hiding
Information Hiding
  • Covert Channels
  • Steganography
    • Linguistic
    • Technical
  • Anonymity
  • Copyright Marking
    • Robust
      • Fingerprints
      • Watermarks
        • Imperceptible
        • Perceptible
    • Fragile
Classification of Information Hiding (according to Petitcolas et al.)
Fundamental difference: Watermarking vs.
Steganography

issues
• affirm creation rights: resiliently embed information within the object (and its copies!), allowing identification of the actual copyright owner in a Court of Law
• inline annotation: encode (not necessarily hide) information in the object
• identify agreement violators ("bad people"): hide and persist information in each sold copy of the object, allowing identification of the initial buyer of that particular copy ("fingerprinting")
• unobtrusive communication: use an 'innocent looking' message to hide a secret (covert channel)
• etc.

market: got money?
Buyers of watermarking technology include any party that produces and/or sells valuable content and then distributes it through untrusted channels, especially when the content allows for valuable derivatives, in which case the watermarking technology has to protect the derivatives as well.

[figure: visible watermark example]

watermark embedding
[diagram: Stego Object, Watermark and Key feed Watermarking, which outputs the Marked Object]

watermark detection
[diagram: Marked Object, Key, Watermark and Original Stego Object feed Watermark Extraction, which outputs Yes/No (confidence level)]

attacks
• detect and remove ("subtractive")
  • know how & Key
  • statistics & Key
• perturb (transform, segment etc.)
  • know approximately how
  • statistics
• add new watermark ("additive")
  • claim ownership based on new watermark
• combine stego object copies ("collusion")
  • used to avoid fingerprints

folklore: images
• visible
• LS (least significant) bits
• LS with secret Key
• LS with secret Key and pixel suitability test (e.g.
luminosity variance)
• adding redundancy
• embedding according to the compression scheme if known (GIF: palette games)
• embed in the frequency domain (JPEG) by altering the DCT coefficients
• masking of the human eye

folklore: video
• LS (least significant) bits of samples
• LS with secret Key
• LS with secret Key and sample suitability test (e.g. noise ratio variance)
• adding redundancy
• per frame, apply image watermarking
• human visual-temporal perception limitations (30fps-24fps)
• encoding scheme dependent watermarking (MPEG: I-frames, B-frames)
• captioning (annotation vs. watermark)

folklore: digital watermark types
• multimedia:
  • images
  • audio
  • video
• non-media:
  • text
  • software/runnable code
  • numeric sets
  • structures

folklore: audio
• LS (least significant) bits of samples
• LS with secret Key
• LS with secret Key and sample suitability test (e.g. noise ratio variance)
• adding redundancy
• masking of the human auditory system (sound interference: low level/strong level, close frequencies)
• "echo hiding" schemes
• statistical embedding (relies on large-sets theory, e.g. 1 bit in every 1.2 s timeslice [1]; change the pdf of subsets selected using Key)

folklore: code/software
• code: register allocation/use
• code: order of push/pop of registers
• code: hidden values in low/high order bytes
• algorithms: runtime structures (number -> graph -> structure at runtime)
• code: obfuscation/runtime tamperproofing
• code/algorithm: inherent part of behavior (e.g. "easter egg": code activated after unusual input)
• code: "guarding"

our research: non-media watermarking
• Generalize:
  • Define/formalize a more general model (no FFT!
;)
  • Develop generic techniques for watermarking
  • Define assessment metrics for the model elements
• Non-media:
  • Develop variations of the generic model (above) for watermarking structured content
  • Amplify the power of domain-specific marking methods
  • Structures (e.g. numbers, documents, MLs, text)

folklore: text/language
• "text" vs. "language"
• synonyms, rearranging text (vs. canonical form), distances between key words, variation of distributions of letters between key words, number of words per class (e.g. verbs, substantives)
• syntax/semantic tree surgeries
• semantic watermarking
• "stego Turing test": "can a computer watermark NL automatically?"

folklore: media watermarking specifics
Bandwidth comes from exploiting limitations of the Human Sensorial System and associated media noise channels.
What about future attackers in A.D. 3000++? (need fundamental theoretic encoding power guarantees)

buzz: XML
[diagram labels: XML Description; Content; DTD; "how"; "what"; Application Design & Implementation; Business Model; Interoperability; examples: Web Page, Stock Market Trends Data, etc. (any data with structure)]

issues: usability
Idea: the same object put to different uses ("usability domains") has a different value for each of the uses ("usability") and associated permissible distortion bounds ("allowable change in usability"). (e.g. the same picture containing different objects of differing interest for different people)
[figure: usability domain D with maximum usability umax; object O, the usability vicinity of O, and O']

why?
• Outsourcing of commercial data
• (X/HT)ML: SOAP, web content
• Software meta-descriptions
• B2B interactions
• Stock data sharing
• Customer data buying patterns
• Financial analysis data
• It's fun!
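
The usability idea above (one object, several usability domains, each with a permissible change in usability) can be made concrete with a minimal sketch. Everything named here is an illustrative assumption rather than the authors' formalization: each usability domain is modeled as a hypothetical scoring function plus a bound on how far its score may move, and `within_vicinity` checks a marked object O' against the original O.

```python
from statistics import mean

def usability_changes(original, marked, domains):
    """Change in usability per domain: |u(O') - u(O)| (assumed metric)."""
    return {
        name: abs(score(marked) - score(original))
        for name, (score, _bound) in domains.items()
    }

def within_vicinity(original, marked, domains):
    """True if O' stays inside the usability vicinity of O in every domain."""
    changes = usability_changes(original, marked, domains)
    return all(changes[name] <= bound for name, (_score, bound) in domains.items())

# Hypothetical usability domains for a small price series:
# analysts care about the average, traders about the exact peak.
domains = {
    "average": (mean, 0.05),  # average may move by at most 0.05
    "peak": (max, 0.0),       # peak must not move at all
}

prices = [101.0, 99.5, 102.3, 100.2]
lightly_marked = [101.1, 99.4, 102.3, 100.3]  # small perturbation, peak untouched
heavily_marked = [101.0, 99.5, 103.0, 100.2]  # peak altered

print(within_vicinity(prices, lightly_marked, domains))  # True
print(within_vicinity(prices, heavily_marked, domains))  # False
```

A marking algorithm in this picture is only acceptable if its output passes the check for every domain the data owner cares about, which is why the bounds are supplied per domain rather than as one global tolerance.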
issues: model
• Usability
  • Domains
  • Change in usability
  • Vicinities
• Watermark
  • Algorithm
  • Attack
  • Power
• Domain Desiderata
• Information Theory of Structures

issues: generic challenge
The main challenge in watermarking lies in keeping the watermarked object within close vicinity of the original object in all considered usability domains while maximizing the power metric level of the application (i.e. given a set of usability domains and associated permissible changes in usability).

issues: key pre-commitment
[diagram labels: k, O, O', wm, det, w]
Given a data domain D, an object O in D and a watermarking algorithm wm, is there any way to find a key k that will yield a desired mark w in the unmarked O? In other words, for the given domain and algorithm class: "can we torture the data until it confesses?"

hypertext
[figure: sample web page source as an example of structured content]

aggregates: challenges
• Lack of inherent structural noise
• New transform domain (content and structural)
• New data types
• Many different data types
• Structured vs.
non-structured
• Isolate the general model from data domain specifics

aggregates: attacks
• A1: node elimination (subtractive)
• A2: inter-node relation elimination
• A3: value preserving partitioning
• A4: node content altering
• A5: addition of fake nodes
• etc.

aggregates: initial ideas
• Structure -> what about "any" structure
• Value in structure and content
• Node/items labeling (TCL)
• Attacks -> tolerant labeling
• Resilience -> partitioning
• Semantic partitioning
• Primitive watermark: noise injection
• Resilience -> hierarchical watermarking

aggregates: challenging properties
Structured Data
• Content: variable change tolerance
• Structure: low change tolerance
• Content Watermarking: low bandwidth, high fragility
• Structure Watermarking: variable bandwidth, higher resilience
Watermarked Structured Data: Modified Content, Modified Structure

aggregates: "angry/content hashes"
angry hash(content) = function of content (specific to it) that tolerates "minor" (in terms of usability) changes to content.
(e.g. the largest number of most significant bits for a set of integers s.t.
resulting hash values are maximally distinct)

aggregates: mark amplification
By applying a weak mark to secret subsets of the original collection, the overall power of the marking scheme is effectively amplified.
[diagram labels: collection, subset, weak mark]

aggregates: primitive labeling
Labels are location and content aware: they depend on both the topology and the content of a node ("angry hashes").

aggregates: tolerant canonical labeling (TCL)
Composite labels of collection items are formed of sets (or confidence intervals) of individual labels resulting from successive training (e.g. original graph surgery) and labeling sessions. Each labeling session is self-adjusting according to history.
[diagram labels: collection data (C); alteration constraints; training scenarios; training/surgery; C', C'', C'''; primitive labeling; L', L'', L'''; Composite Label; collection labeling; watermarking algorithm]

numeric sets
Problem: Given a set of numbers N, a set T of local and global allowable distortion bounds, and a set of keys K, determine the watermarked version N' of N such that all bounds in T are satisfied and N' features enough watermark power.
Question: how much is "enough"?
Idea: use/alter global numeric properties (within distortion limits T) as bandwidth channels (e.g. confidence intervals), together with secret subset selection.

DBMS: challenges
• New transforms
• Views and data mining
• Preservation of relational model
• Preservation of consistency
• Numeric vs. alphanumeric vs.
binary
• Attribute semantics awareness

aggregates: hierarchical watermarking
[figure: hierarchy of labeled nodes]

numeric sets: ideas
• Numeric Set
  • Semantics
  • Structure
• Labeling
  • Normalized distance from mean
  • Most important bits of items
• Weak mark
  • Confidence interval violators
  • Normalized distance from mean
• Amplification: keyed subset selection
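
The ingredients listed for numeric sets (labeling by most important bits, a weak mark, keyed subset selection for amplification) can be illustrated with a minimal sketch. This is not the authors' scheme: it swaps in plain least-significant-bit embedding as the weak mark, uses HMAC-SHA256 as an assumed keyed selector, and the function names (`label_of`, `embed`, `detect`) and parameters are hypothetical. The local distortion bound is assumed to be at most 1 per item.

```python
import hashlib
import hmac

def _keyed_hash(key: bytes, label: int) -> int:
    """Keyed pseudo-random integer derived from an item's label."""
    digest = hmac.new(key, str(label).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")

def label_of(value: int, low_bits: int = 4) -> int:
    """Label an item by its most significant bits, ignoring the low bits."""
    return value >> low_bits

def embed(values, key, mark_bits, select_every=2, low_bits=4):
    """Set the LSB of keyed-selected items to one bit of the mark."""
    marked = []
    for v in values:
        h = _keyed_hash(key, label_of(v, low_bits))
        if h % select_every == 0:                      # secret subset selection
            bit = mark_bits[(h // select_every) % len(mark_bits)]
            v = (v & ~1) | bit                         # weak mark: distortion <= 1
        marked.append(v)
    return marked

def detect(values, key, n_bits, select_every=2, low_bits=4):
    """Recover each mark bit by majority vote over the items that carry it."""
    votes = [[0, 0] for _ in range(n_bits)]
    for v in values:
        h = _keyed_hash(key, label_of(v, low_bits))
        if h % select_every == 0:
            votes[(h // select_every) % n_bits][v & 1] += 1
    return [1 if ones > zeros else 0 for zeros, ones in votes]

values = list(range(1000, 5000, 7))   # stand-in numeric set of positive integers
key = b"secret key"
mark = [1, 0, 1, 1]

marked = embed(values, key, mark)
recovered = detect(marked, key, len(mark))
print(recovered == mark)
print(max(abs(a - b) for a, b in zip(values, marked)))
```

Because the label depends only on the high bits, marking an item does not change which subset it falls into, and each mark bit is carried by many items, so an attacker has to alter a majority of the carriers of a bit to flip it. That redundancy is the amplification effect in miniature.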