
Spread-spectrum watermarking of audio signals



IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 51, NO. 4, APRIL 2003

Spread-Spectrum Watermarking of Audio Signals

Darko Kirovski and Henrique S. Malvar, Fellow, IEEE

Abstract: Watermarking has become a technology of choice for a broad range of multimedia copyright protection applications. Watermarks have also been used to embed format-independent metadata in audio/video signals in a way that is robust to common editing. In this paper, we present several novel mechanisms for effective encoding and detection of direct-sequence spread-spectrum watermarks in audio signals. The developed techniques aim at i) improving detection convergence and robustness, ii) improving watermark imperceptiveness, iii) preventing desynchronization attacks, iv) alleviating estimation/removal attacks, and finally, v) establishing covert communication over a public audio channel. We explore the security implications of the developed mechanisms and review watermark robustness on a benchmark suite that includes a combination of audio processing primitives including: time- and frequency-scaling with wow-and-flutter, additive and multiplicative noise, resampling, requantization, noise reduction, and filtering.

Index Terms: Audio signals, covert communication, desynchronization, estimation attacks, spread-spectrum, watermarking.

Manuscript received February 4, 2002; revised December 10, 2002. The associate editor coordinating the review of this paper and approving it for publication was Dr. Ahmed Tewfik. The authors are with the Microsoft Research, Redmond, WA 98052 USA (e-mail: [email protected]; [email protected]). Digital Object Identifier 10.1109/TSP.2003.809384. 1053-587X/03$17.00 © 2003 IEEE.

I. INTRODUCTION

With the growth of the Internet, unauthorized copying and distribution of digital media has never been easier. As a result, the music industry claims a multibillion dollar annual revenue loss due to piracy [1], which is likely to increase due to peer-to-peer file sharing Web communities. One source of hope for copyrighted content distribution on the Internet lies in technological advances that would provide ways of enforcing copyright in client-server scenarios. Traditional data protection methods such as scrambling or encryption cannot be used since the content must be played back in the original form, at which point, it can always be rerecorded and then freely distributed. A promising solution to this problem is marking the media signal with a secret, robust, and imperceptible watermark (WM). The media player at the client side can detect this mark and consequently enforce a corresponding e-commerce policy.

Recent introduction of a content screening system that uses asymmetric direct sequence spread-spectrum (SS) WMs has significantly increased the value of WMs because a single compromised detector (client player) in that system does not affect the security of the content [2]. In order to compromise the security of such a system without any traces, an adversary needs to break in the excess of 100 000 players for a two-hour high-definition video.

A. Watermarking Technologies

Audio watermarking schemes rely on the imperfections of the human auditory system (HAS) [3]. Numerous data hiding techniques explore the fact that the HAS is insensitive to small amplitude changes, either in the time [4] or frequency [5]–[7] domains, as well as insertion of low-amplitude time-domain echoes [8]. Information modulation is usually carried out using: SS [9] or quantization index modulation (QIM) [10]. The main advantage of both SS and QIM is that WM detection does not require the original recording and that it is difficult to extract the hidden data using optimal statistical analysis under certain conditions [11].

However, it is important to review the disadvantages that both technologies exhibit. First, the marked signal and the WM have to be perfectly synchronized at WM detection. Next, to achieve a sufficiently small error probability, WM length may need to be quite large, increasing detection complexity and delay. Finally, the most significant deficiency of both schemes is that by breaking a single player (debugging, reverse engineering, or the sensitivity attack [12]), one can extract the secret information (the SS sequence or the hidden quantizers in QIM) and recreate the original (in the case of SS) or create a new copy that induces the QIM detector to identify the attacked content as unmarked. While an effective mechanism for enabling asymmetric SS watermarking has been developed [2], an equivalent system for QIM does not exist to date.

B. Techniques for SS Watermarking of Audio

In this paper, we restrict our attention to direct-sequence SS WMs and develop a set of technologies to improve the effectiveness of their embedding and detecting in audio. WM robustness is enabled using i) block repetition coding for prevention against de-synchronization attacks [13] and ii) psycho-acoustic frequency masking (PAFM). We show that PAFM creates an imbalance in the number of positive and negative WM chips in the part of the SS sequence that is used for WM correlation detection and that corresponds to the audible part of the frequency spectrum. To compensate for this anomaly, we propose a iii) modified covariance test. In addition, to improve reliability of WM detection, we propose two techniques for reducing the variance of the correlation test: iv) cepstrum filtering and v) chess WMs. Since we embed SS WMs in the frequency domain, the energy of a WM is distributed throughout the entire synthesis block, making SS WMs audible in blocks that contain quiet periods. We solve this problem using vi) a procedure that identifies blocks where SS WM may be audible to decide whether to use a particular block in the WM embedding/detection process. Finally, we propose vii) a technique that enables reliable covert communication over a public audio channel.

In order to investigate the security of SS WMs, we explore the robustness of such a technology with respect to watermark estimation attacks [2]. To launch that attack, an adversary is assumed to know all the details of the WM codec, except the hidden secret.
We present a modification to the traditional SS WMdetectorthatviii)undoestheattackand,hence,forcesthe adversarytoaddanamountofnoiseproportionalinamplitudeto therecordedsignalinordertosuccessfullyremoveanSSWM. Wehaveincorporatedthesetechniquesi)-viii)intoasystem capable of reliably detecting a WM in an audio clip that has beenmodifiedusinga compositionofattacksthatdegradethe original audio characteristics beyond the limit of acceptable quality.Suchattacksincludefluctuatingscalinginthetimeand frequencydomain,compression,additionandmultiplicationof noise,resampling,requantization,normalization,filtering,and randomcuttingandpastingofsignalsamples. InSectionII,wereviewthebasicaspectsofSSwatermarking, andinSectionIII,wedescribethespecificsforaudioWM.We consider the overal security aspects in Section IV and present finalremarksinSectionV. Fig. 1. Process of WM embedding: conversion of a block of time-domain samplesintotheMCLTdomain,SSWMaddition,andconversionbacktothe II. BASICSOFSPREAD-SPECTRUMWATERMARKING time-domain. The media signal to be watermarked can be mod- tector is optimal [14]. The probability of a false positive eledasarandomvector,wheretheelements areindependent detection(falsealarm)is identically distributed (i.i.d.) Gaussian random variables, with standarddeviation ,i.e., .1 Because actually representsacollectionofblocksofsamplesfromanappropriate erfc (2) invertible transformation on the original audio signal [5], [7], [9],suchmodelingisarguableandisfurtherdiscussedinSec- and the probability of a false negative detection (misde- tionV.AwatermarkisdefinedasadirectSSsequence ,which tection)is is a vector pseudo-randomly generated in . 
Each element isusuallycalleda“chip.”WMchipsaregenerated suchthattheyaremutuallyindependentwithrespecttotheorig- inalrecording .Themarkedsignal iscreatedby , erfc (3) where istheWMamplitude.Thesignalvariance directly impactsthesecurityofthescheme:thehigherthevariance,the Straightforwardapplicationoftheprinciplesaboveprovides more securely information can be hidden in the signal. Simi- neitherreliabilitynorrobustness.Inthefollowingsubsections, larly,higher yieldsmorereliabledetection,lesssecurity,and weoutlinethe deficienciesofthe basicSSWMparadigmand potentialWMaudibility. providesolutionsforimprovedWMrobustness,detectionreli- Let denotethenormalizedinnerproductofvectors and ability,andresiliencetocertainpowerfulattacks. ,i.e., with .Forexample,for asdefinedabove,wehave .AWM isdetectedby III. HIDING SPREAD-SPECTRUM SEQUENCES correlating(ormatchedfiltering)agivensignalvector with : INAUDIOSIGNALS In our watermarking system, the vector is composed of (1) magnitudes of several frames of a modulated complex lapped transform(MCLT)[15]inadecibel(dB)scale.TheMCLTisa Underno malicious attacksorother signal modifications,if 2 -oversampledfilterbankthatprovidesperfectreconstruction. thesignal hasbeenmarked,then ,else TheMCLTissimilartoaDFTfilterbank,butithasproperties . The detector decides that a WM is present if , that makes it attractive for audio processing, especially when where is a detection threshold that controls the tradeoff be- integrating with compression systems, because signals can tween the probabilities of false positive and false negative de- easily be reconstructed from just the real part of the MCLT cisions. We recall from modulation and detection theory that [15]. After addition of the WM, we generate the time-domain under the condition that and are i.i.d. 
signals, such a de- markedaudiosignalbycombiningthevector with the original phase of and passing these modified frames to theinverseMCLT.Fig.1illustratesthisprocessonanexample time-domainframe.Typically,WMamplitude issettoafixed 1N(a;b)denotesaGaussianwithmeanaandvarianceb . value in the range 0.5–2.5 dB. For example, for dB, 1022 IEEETRANSACTIONSONSIGNALPROCESSING,VOL.51,NO.4,APRIL2003 Theexpectationfortherelativedifference inthenumberof positiveandnegativechipsinthecorrelatedaudiblepartofthe SSsequenceequals (5) where ifcorresponding isaudibleand if isinaudible. Fig. 2. PAFM: (Left) Example MCLT frequency block with an identified Asymmetricdistributionofpositiveandnegativechipsinthe maskingfunctionand(right)anexampleofhowWMadditionincreasesthe maskedSSsequencecandrasticallyinfluencetheconvergence numberofpositivechipsthatcorrespondtotheaudiblepartoftheMCLTblock. of the correlation test in (1). The convergence is affected be- cause the expected value of the correlation test has trained ears cannot statistically pass a distinction test between anadditionalcomponentproportionalto .Forourbenchmark watermarked and original content for a benchmark suite con- suite, averaged0.057at dB,withpeakvaluesreaching sisting ofpop,rock,jazz, classical,instrumentsolo,andvocal forrecordingswithlowharmoniccontent.Thus,when- musical pieces. For the typical 44.1 kHz sampling, we use a everPAFMisused,thenormalizedcorrelationtest(1)mustbe length-2048MCLT.Onlythecoefficientswithin200Hz–2kHz replaced with a covariance test that compensates for using a are marked, and only the audible magnitudes in the same nonzero-mean SS sequence. Assuming , and , are sub-band are considered during detection. Sub-band selection the mean and variance of the audible portion of selected by aimsatminimizingcarriernoiseeffectsaswellassensitivityto positiveandnegativeSSchips,respectively,andsignal iswa- downsamplingandcompression. termarked,thecorrelationtestin(1)canberewrittenas A. 
Psycho-AcousticFrequencyMasking:Consequencesand Remedies TheWMdetectorshouldcorrelateonlytheaudiblefrequency magnitudeswiththeWM[7]becausetheinaudibleportionsof thefrequencyspectrumaresignificantlymoresusceptibletoat- (6) tacknoise.Thatreducestheeffectivewatermarklengthbecause the inaudible portion often dominates the frequency spectrum ofanaudiosignal[6]. wherethenoisecomponent ofthedetectiontesthas In order to quantify the audibility of a particular frequency a mean and variance component,weuseasimplePAFMmodel[16].ForeachMCLT .Themeanvalue ofthepartofthe magnitudecoefficient,thelikelihoodthatitisaudibleaverages originalsignal thatcorrespondstotheaudiblepartof canbe 0.6 in the crucial 200 Hz–2 kHz subband in our audio bench- expressedas ,whereasthemean marksuite.Fig.2illustratesthefrequencyspectrumofanMCLT value of the audible partof equals , where blockaswellasthePAFMboundary.PAFMfilteringintroduces ifsignal iswatermarkedand inthealternate the problem of SS sequence imbalance: a problem also illus- case.Thus,byusingatraditionalcovariancetest trated in Fig. 2. When embedding a positive chip ( ), aninaudiblefrequencymagnitude becomesaudibleif (7) ,where returnsthelevelofaudibilityforthear- gument magnitude for a given MCLT block. Similarly, when thedetectorwouldinduceameanabsoluteerrorof embedding a negative chip ( ), an audible magnitude to the covariance test because of the mutual dependency of becomesinaudibleif .Wedefine , ,and and .Considerthefollowingtest: as the ratios of frequency magnitudes that fall within the correspondingranges (8) whichresultsinanoisecomponent forthistestequal (4) to and . KIROVSKIANDMALVAR:SPREAD-SPECTRUMWATERMARKINGOFAUDIOSIGNALS 1023 Computationof from canbe maderelatively accurateasfollows.First, and arecomputedasmeansof theaudiblepartofthesignal selectedbypositiveandnegative chips respectively. 
Then, if , we conclude thatthesignalhasbeenwatermarkedandcompensatethetestin (8)for ;inthealternatecase,wecompensatefor .Parameter isaconstantequalto ,whichensures lowlikelihoodofafalsealarmormisdetectionthroughselection of (2),(3). An error of 2 in the covariance test occurs if the original signal is bipartitioned with the SS chips such that .ThiscasecanbedetectedatWMencodingtime.Then, Fig. 3. Example of block repetition coding along the time and frequency theencodercouldsignalanaudiosignalblockashard-to-mark, domainofanaudioclip.Eachblockisencodedwiththesamebit,whereasthe detectorintegratesonlythecenterlocationsofeachregion. oritcouldextendthelengthoftheWM.Suchcasesareexcep- tionallyrareforrelativelylongSSsequencesandtypicalmusic contentrichinsoundevents.Notethattheexactcomputationof of the encoding regions , are computed using a and wouldalsoresolvetheerrorproblemincurredinthe geometricprogression originalcovariancetestin(6)throughexactcomputationof . Thus, the two tests in (6) and (8) are comparable and involve computation of similar complexity. On super-pipelined archi- tectures,weexpectthetestin(8)tohavebetterperformancevia loopunfolding,asitdoesnotusebranchtesting. (9) B. PreventingtheDesynchronizationAttack The correlation metrics from (1)–(3) are reliable only if where isthewidthofthedecodingregion(centraltotheen- the majority of detection chips are aligned with those used codingregion)alongthefrequency.Similarly,thelengthofthe in marking. Thus, an adversary can attempt to desynchronize WM ingroupsofconstant , MCLT the correlation by fluctuating time- or frequency-axis scaling blocks watermarked with the same SS chip block is delimited withinthelooseboundsofacceptablesoundquality.Toprevent by ,where isthewidthofthedecoding such attacks, we use a multitest methodology that relies on region along the time-axis. Lower bound on the replication in blockrepetitioncodingofchipsoftheWMpattern. 
thetimedomain issetto100msforrobustnessagainstcrop- Itisimportanttodefinethedegreesoffreedomfortime-and pingorinsertion. frequency-scaling that preserves the relative fidelity of the at- If a WM length of MCLT blocks does not produce tackedrecordingwithrespecttotheoriginal.TheHASismuch satisfactory correlation convergence, additional MCLT blocks more tolerable to constant scaling rather than wow-and-flutter ( ) are integrated into the WM. Time-axis replica- (variationsinscalingovertime).Hence,weadoptthefollowing tion , foreachgroupoftheseblocksisrecursively tolerancelevels,whichareappropriateinpractice: for computedusingthegeometricprogression(10).Withinaregion constant time-scaling and for constant frequency- of sampleswatermarkedwiththesamechip ,onlythe scaling and scaling variance along both time and center samplesareintegratedin(1).Itisstraightforward frequency. toprovethatsuchgenerationofencodinganddecodingregions 1) Block Repetition Coding: In the first step, we provide guaranteesthatregardlessofinducedwow-and-flutterlimitedto resilience against fluctuations in playtime and pitch bending ,thecorrelationtestisperformedinperfectsynchronization. (wow-and-flutter) of up to a fixed parameter , which de- Typicalredundancyparametersarei)constantreplicationalong limitsthemaximumfluctuationmagnitudeindependentlyalong timeaxis5–10MCLTblocksandii)geometricallyprogressed any of these two dimensions. As common standard values for replicationalongthefrequencyaxissuchthattypically50–120 wow-and-flutter for modern turntables are significantly below chipsareembeddedwithinthetargetsub-band200–2kHz. 0.01,weadoptthisvalueasourrobustnesslimit. 2) Multiple Correlation Tests: The adversary can combine wow-and-flutter with a stronger constant scaling in time and We represent an SS sequence as a matrix of chips frequency. 
Constant scaling of up to along the time , , and , where is the number axisand alongthefrequencyaxiscanbeperformed of chips per MCLT block, and is the number of blocks of on an audio clip with good fidelity with respect to the orig- chipsperWM.WithinasingleMCLTblock,eachchip inalrecording.Resiliencetostatictime-andpitch-scalingisob- isspreadoverasub-bandof consecutiveMCLTcoefficients. tainedbyperformingmultiplecorrelationtestsasfollows: Chips embedded in a single MCLT block are then replicated alongthetimeaxiswithinconsecutive MCLTblocks.Anex- ampleofhowredundanciesaregeneratedisillustratedinFig.3 1) pointer 0; progress ; ( de- (withfixedparameters , forall and ).Widths notes WM length in MCLT blocks. 1024 IEEETRANSACTIONSONSIGNALPROCESSING,VOL.51,NO.4,APRIL2003 Fig. 4. Example of how a WM is detected during the search process. The correlationtestthatcorrespondstooneparticulartime-andfrequency-scaling hassynchronizedtheWMwiththeMCLTblockindexed671. Fig.5. DemonstrationofanoriginalMCLTblockanditscepstrumfiltering. ThedashedlinerepresentstheCF-envelopesubtractedfromtheoriginalMCLT block. 2) load buffer with MCLT co-efficients from progress consecutive MCLT blocks concludewhetherthereisaWMornotintheaudioclipbased starting from the MCLT block indexed with on the SS statistics from (1) and regardless of the presence of pointer. the attack. 3) for time.scaling to step and for frequency.scaling to C. CepstrumFiltering step , correlate buffer with WM scaled according to time.scaling and fre- Thevariance oftheoriginalsignaldirectlyaffectsthecar- quency.scaling. rier noise in (1). Audio clips with large energy fluctuations or 4) if (WM found in buffer with withstrongharmonicsareespeciallyboundtoproducelarge . time.scaling ) then progress Thus, we propose here a nonlinear processing step to reduce else progress . thecarriernoise.Oneapproachistosubtractamovingaverage 5) pointer progress; goto 2). fromthefrequencyspectrumrightbeforecorrelation:asortof whitening step. 
Unfortunately, as bits of the SS sequence are spreadoverfrequencyranges,thistechniqueinducespartialre- The search algorithm initially loads a buffer of MCLT movaloftheWMchips.Wehavedevelopedacepstrumfiltering coefficientsfrom consecutiveMCLTblocks.Then, (CF) technique that produces significantly better results than the loaded contents are correlated with different scalings of just spectral whitening. With CF, we reduce in (1) through the searched WM; the scalings are such that they create a thefollowingsteps: grid over with minimal distance between points (tests). Due to block redundancy coding, each 1) DCT —compute the cepstrum of the test candetectaWMiftheactualscalingoftheclipis dB magnitude MCLT vector under test via withinthe region.Thetest the discrete cosine transform. yielding the greatest correlation 2) , —filter out the first is compared with the detection threshold to determine WM (typically ) cepstrum coeffi- presence. If WM is found, the entire buffer is reloaded with cients. newMCLTcoefficients.Otherwise,thecontentofthebufferis 3) IDCT —reconstruct the frequency shiftedfor MCLTblocks,andanewsetoftestsisperformed. spectrum via an inverse DCT. The filtered Inatypicalimplementation,for ,inordertocover frequency spectrum replaces in the cor- and ,theWMdetectorcomputes105dif- relation detector (1). ferentcorrelationtests.Thesearchstepalongthetimeaxisde- notedas typicallyequalsbetweenoneandfourMCLTblocks. An example is shown in Fig. 4. 
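The three cepstrum-filtering steps listed above (DCT of the dB-magnitude vector, removal of the first few cepstrum coefficients, inverse DCT) can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' implementation: the function names and the cutoff parameter `k_cut` are hypothetical (the typical cutoff value is missing from the extracted text), and an orthonormal DCT-II is assumed as the transform.

```python
import math


def dct2(x):
    """Orthonormal DCT-II of a real sequence (assumed transform)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct2(c):
    """Inverse of dct2 (the transpose of the orthonormal DCT-II matrix)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out


def cepstrum_filter(mag_db, k_cut):
    """Remove the slowly varying spectral envelope from a dB-magnitude vector.

    k_cut is a hypothetical name for the number of leading cepstrum
    coefficients that are zeroed out.
    """
    cep = dct2(mag_db)                      # step 1: cepstrum via DCT
    for k in range(min(k_cut, len(cep))):   # step 2: drop the first k_cut coefficients
        cep[k] = 0.0
    return idct2(cep)                       # step 3: back to the frequency domain


if __name__ == "__main__":
    n = 32
    # Smooth envelope plus an alternating +/-1 dB pattern standing in for WM chips.
    envelope = [20.0 + 6.0 * math.cos(math.pi * (i + 0.5) / n) for i in range(n)]
    chips = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]
    filtered = cepstrum_filter([e + c for e, c in zip(envelope, chips)], k_cut=2)
    print([round(v, 2) for v in filtered[:6]])
```

Because the alternating chip pattern lives in the high cepstrum coefficients, zeroing only the leading ones removes the envelope while leaving the chips largely intact, which is the intuition the text gives for preferring this over a moving-average whitening step.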
Note that the main incentive TherationalebehindCFisthatlargevariationsin canonly for providing such a mechanism to enable synchronization is comefromlarge variationsin since islimitedtoasmall the fact that, within the length of the WM, the adversary re- value .Thus,byfilteringoutlargevariationsin ,wecan allycannotmoveawayfromtheselectedconstanttimeandfre- reducethecarriernoisesignificantly,withoutaffectingmuchthe quency scaling more than ; such a change would induce expectedvalue .ThatisparticularlyefficientiftheWM intolerablesoundquality.Iftheattackeriswithintheassumed sequence hasanonwhitespectrumcontainingmorenoiseat attackbounds,thedescribedmechanismenablesthedetectorto higher frequencies, as discussed in the next subsection. Fig. 5 KIROVSKIANDMALVAR:SPREAD-SPECTRUMWATERMARKINGOFAUDIOSIGNALS 1025 theremainderoftheblockisrichinaudioenergy.SincetheSS sequencespreadsovertheentireMCLTblock,itcancauseau- diblenoiseinthequietportionoftheMCLTblock(seeFig.7). To alleviate that problem, we detect MCLT blocks with dy- namiccontentwhereanSSWMmaybeaudibleifadded.The blocks are identified according to an energy criterium, for ex- ample,asdescriedbelow.WMsarenotembeddednordetected in such blocks. Fortunately, such blocks do not occur often in audiocontent;inourbenchmarkset,weidentifiedupto of MCLT blocks per WM as potential hazard for audibility. By not marking these blocks, the corresponding correlation is bound to a lower expected value , which causes only a minor effect on detector’s decision. The detec- tionofhazardousblocksisperformedoneachlength- MCLT blockusingthefollowingalgorithm. Fig. 6. (a) Convergence of the normalized correlation C(y;w) with WM lengthforanonwatermarkedsignal.Topthreeplots:90%percentilelimitsof C(y;w)(90%ofthecorrelationvaluesareundereachcurve),foratraditional 1) Compute the interval energy level purely random SS sequence, a perfect WM (PW), and a chess WM (CW). 
, Bottomthreeplots:CorrespondingstandarddeviationsofC(y;w)inthesame order.(b)SimplestatemachinethatproducesachessWM(p>0:5). for each of the interleaved subin- tervals of the tested signal in the time-domain (commonly ). Block illustratestheimpactofCFonthesignalvariance,whichistyp- subintervals are illustrated in Fig. 7. icallyreducedbyafactorofalmostfour.Thus,inordertoattain 2) if ( ) then WM the performance of CF detector, a non-CF detector must inte- is audible in the block. Parameter is gratealmostfourtimesmoremagnitudepoints. empirically determined. D. ChessWatermarks Because of the relatively short MCLT frames (30 ms), we F. CovertCommunicationOverAudioChannels assume that the audio signal has a slowly varying magnitude SS provides only means of embedding (hiding) spectrum.Thus,forshortWMs,apossiblesequenceintimeof pseudo-random bit sequences into a given signal carrier severalconsecutivepositiveWMchipscanposefalsealarmsif (audio clip). One trivial way to embed an arbitrary message correlatedwithlargepositive values.Inpractice,thatproblem intoaSSsequenceistouseapoolofWMssuchthateachWM occurs frequently for quiet clips with strong harmonics (e.g., representsasymbolfromanalphabetusedtocreatethecovert piano or sax solo). To alleviate the problem, it is important to message. Depending on the symbol to be sent, the encoder attenuate the DC component of the WM chips along the time selects one of the WMs from the pool and marks the next direction. consecutive part of audio with this WM. The detector tries all We define a perfect WM (PW) as a sequence of alternating WMs from the pool, and if any of the correlation tests yields positiveandnegativechips,alongboththetimeandfrequency a positive test, it concludes that the word that corresponds to axis. Correlation with PW results in highly improved correla- the detected WM has been sent. 
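The chess-WM construction of Section III-D (a first-order state machine that toggles the chip with probability p > 0.5) can be illustrated with a short sketch. The code below is an assumption-laden illustration rather than the authors' generator: it assumes the toggle probability is the same in both states, it covers only the time axis, and the names `chess_chips`, `toggle_rate`, and `longest_run` are hypothetical. The specific value of p used in the paper is not recoverable from the extracted text, so it is left as a parameter.

```python
import random


def chess_chips(n, p, rng):
    """Generate n chips in {-1, +1}; each chip flips sign relative to the
    previous one with probability p (p = 0.5 gives an ordinary random
    sequence, p = 1.0 gives a perfectly alternating 'perfect WM')."""
    chip = rng.choice((-1, 1))
    chips = [chip]
    for _ in range(n - 1):
        if rng.random() < p:
            chip = -chip
        chips.append(chip)
    return chips


def toggle_rate(chips):
    """Fraction of adjacent chip pairs that differ in sign."""
    flips = sum(1 for a, b in zip(chips, chips[1:]) if a != b)
    return flips / (len(chips) - 1)


def longest_run(chips):
    """Length of the longest run of identical consecutive chips."""
    best = run = 1
    for a, b in zip(chips, chips[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best


if __name__ == "__main__":
    rng = random.Random(2003)
    for p in (0.5, 0.7, 0.9):
        seq = chess_chips(5000, p, rng)
        print(p, round(toggle_rate(seq), 3), longest_run(seq))
```

Raising p shortens the runs of same-sign chips, which is what attenuates the DC component of the chip sequence along the time direction; the trade-off noted in the text is reduced randomness of the sequence.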
Since a typical WM length tion convergence for a nonwatermarked signal, as illustrated in our implementation ranges from 11 to 22 s, to achieve a in Fig. 6. To leverage the convergence efficacy of PW with covert channel capacity of just 1 b/s, the detector is expected the security of pseudo-random SS sequences, we introduce a to perform between 210 and 221 different WM tests. Besides chess-WM(CW).WedefineaCWasastochasticapproximation beingcomputationallyexpensive,thistechniquealsoraisesthe toaPWbyusingthesimplefirst-orderstatemachinedepicted likelihoodofafalsealarmormisdetectionbyseveralordersof inFig.6.Whereastheprobability ofswitchingfromthe“0” magnitude. state to the “1” state for traditional SS sequences is desired to Therefore,itisclearthatacovertchannelcannotrelysolely beone-half, webuiltCWstoenforcefrequenttogglingofbits on WM multiplicity, and thus, some form of WM modulation alongthetimeaxisor,equivalently,toemphasizehighfrequen- must be considered. A basic concept for the design of a mod- ciesintheWMsequence.Wetypicallyselect .Fora ulation scheme is the observation that if we multiply all WM sufficientlylarge ,therandomnessreductioninthesequence chips by 1, the normalized correlation changes sign but not domaindoesnotposeasecuritythreat,whileresultingincorre- magnitude. Therefore, the correlation test can detect the WM lationconvergencesimilartoPW(typically ). bythemagnitudeofthecorrelationandthesigncarriesonebit ofinformation. E. ImprovingtheInaudibilityofSpread-SpectrumWatermarks The covert communication channel that we have designed inAudio usestwoadditionalideas.First,toadd messagebits,the SS SS WMs can be audible when embedded in the MCLT do- sequenceispartitionedalongthetime-axisinto equal-length main,evenatlowmagnitudes(e.g., dB).Thiscanhappen subsets , ,whereeach consistsofallWMchips in blocks where certain parts (up to10 ms)are quiet, whereas such that . Thus, there are 1026 IEEETRANSACTIONSONSIGNALPROCESSING,VOL.51,NO.4,APRIL2003 Fig.7. 
ExampleofaudibilityofaSSWMwhenembeddedinthefrequencydomain.TheblackplotdenotesasingleMCLTblockoftimedomainsampleofthe originalrecording,whereasthegreylinedenotesthecorrespondingmarkedrecordingwithaudiblenoisepriortothesignalpeak. falsealarm (2)canbecomputedusingtheuppertailofthe chi-squaredpdfwith degreesoffreedom: (11) where istheGammafunction.Thelowerboundonthelike- lihood of a WM misdetection is computed according to (3) as the third component in (10) can be neglected for marked sig- nals because it is always positive. Bits of the covert message are recovered at detection time as the sign of partial correla- tions sign .Thelikelihoodofabitmisdetec- tion onceaWMisdetectedequals Fig. 8. Embedding a permuted covert communication channel over the temporalandspectraldomain. chipblocksof chipspereach .Eachbit ofamessage isusedtomultiplythechipsofthecorresponding whilecreatingthemarkedcontent ,where erfc (12) and arecontentblocksthatcorrespondto .A typical exampleisshowninFig.8. Atdetectiontime,thesquaredvalueofeachpartialcovariance Finally,inordertoimprovetherobustnessofeachbitofthe test —computedusing(1)—isaccumulatedtocreate encodedcovertmessage,weperformasecretpermutation thefinaltestvalueasfollows: of the message bits for each MCLT subband . Thus, a per- mutedbit iscombinedwithchipblocksalongacertain subband , (eachblockhas chips)andthen embedded in the original content as . This procedure aims at i) spreading each bit of the encoded covertmessagethroughouttheentireWMforsecurityreasons (anattackercannotfocusonlyonashortpartofthecliphoping toremovethemessagebit)andii)increasingtherobustnessof thedetectionalgorithmbecauseofspreadinglocalizedvariances of noise over the entire length of a WM. The process of per- mutingbitsofthemessageisillustratedinFig.8. (10) G. 
SummarizingDiscussion Therefore, in this case has three components: i) a We have deployed the techniques described in the previous mean and ii) a zero-mean Gaussian random variable (both of subsectionstocreateanaudiowatermarkingsystemwithstrong themequaltozeroifthecontentisnotmarked)andiii)asumof robustnesswithrespecttocommonaudioeditingprocedures.A squaresofGaussianrandomvariables.Thus,thelikelihoodofa block diagram that illustrates how the developed technologies KIROVSKIANDMALVAR:SPREAD-SPECTRUMWATERMARKINGOFAUDIOSIGNALS 1027 Fig.9. BlockdiagramoftheWM(left)embeddingand(right)detectionprocedures. arelinkedintoacohesivesystemforaudiomarkingispresented catenated into a single sound clip on these diagrams): (a) and in Fig.9. (b) versus (c) and (d) demonstrates strong gain in Areferenceimplementationofourdatahidingtechnologyon variance due to cepstrum filtering, and (e) and (f) versus (g) an x86 platform requires 32 Kbytes of memory for code and and (h) showcases slightly reduced detection reliability due 100Kbytesforthedatabuffer.Thedatabufferstoresaveraged to the permuted covert communication (PCC) channel. Peaks MCLT blocks of 12.1 s of audio (for a WM length of 11 s). in the correlation test clearly indicate detection and location WMs are searched with , which requires 40 of each WM. Note that the peak values for both detectors are testspersearchpoint.Real-timeWMdetectionunderthesecir- virtuallythesame;however,thenegativedetectionforthePCC cumstancesrequiresabout15MIPS,which isasmallrequire- decoderyieldsslightlyhighervariance(inourexperiments,we mentfortoday’sDSPprocessors.WMencodingisanorderof recordeddifferencesupto5%). magnitudefaster,withsmallermemoryfootprints.Theachieved Finally, in order to quantify the robustness of the wa- covertchannelbitratevariesintherangeof0.5–1b/sfor termarking technology with respect to a publicly available andapoolof16differentWMs. 
benchmark, we show the watermark detection results against Wehavetestedourproposedwatermarkingtechnologyusing the attacks in Stirmark Audio [18]. For that experiment, we acompositionofcommonsoundeditingtoolsandmaliciousat- have selected an audio clip rich in music events (a rhythmic tacks, including all tests defined by the Secure Digital Music latin jazz clip with trombone, piano, and alto-sax solos), Initiative(SDMI)industrycommittee[17].Suchtestsincluded watermarked it, and then detected watermarks in the original, doubleD/A-A/Dconversion,noiseadditionatthe 36dBlevel, themarkedcopy,andall46clipscreatedbytheStirmarkAudio bandpassfiltering,MP3encodingat64and32kbps,time-scale suiteof attacks.Thedetection results arepresentedin TableI. changingofupto 4 ,wowandflutterat0.5%,andechoin- For watermarked clips, we report the minimal correlation sertionofupto100ms.Weusedadatasetof8015-saudioclips, achievedforeachofthetenwatermarksembeddedintheaudio whichincludedjazz,classical,voice,pop,instrumentsolos(ac- clip. For the original clip, we report the maximal correlation cordion,piano,guitar,sax,etc.)androck.Inthatdataset,there value throughout the search for any of the ten watermarks. were no errors and from measured noise levels in the correla- The corresponding correlation value is marked as in tionmetric,weestimatedtheerrorprobabilitytobewellbelow Table I. The detection threshold is set to , which 10 . Error probabilities decrease exponentially fast with the results in an estimated probability of a false positive smaller increaseofWMlength;therefore,itisrelativelyeasytodesign than10 foravarietyofaudioclips.FromTableI,weobserve asystemforerrorprobabilitiesbelow10 ,forexample.Anal- thatallbutoneattackhadonlyminimaleffectonthecorrelation ysisofthesecurityofembeddedWMsispresentedinthenext value.Theonlyattackthatreducedsignificantlythecorrelation section. value (copysample) had a strong impact on the fidelity of the Fig. 
10 shows the performance improvements, with the recordingsothattheattackedclipalmostdidnotresemblethe modifications described above, on our benchmark set (con- original.TheparametersoftheStirmarkAudioattackwerethe 1028 IEEETRANSACTIONSONSIGNALPROCESSING,VOL.51,NO.4,APRIL2003 Fig.10. Detectioncomparisonforfourdifferentdetectionsystems(a),(b)withoutand(c),(d)withcepstrumfilteringand(e),(f)withoutand(g),(h)with apermutedcovertcommunicationchannel.Foreachdiagram,thex-axisdepictsthetimelineinMCLTblocks,whereasthey-axisquantifiesthenormalized correlation. same as the ones included in the version of the tool available [19]. Thus, we need to quantify the efficiency of such attacks ontheWeb[18]. anddevisenewmechanismstoprotectagainstthem. Inordertosimplifytheformaldescriptionofblockrepetition codesinouraudioWMcodec,wenowmodifyslightlyourno- IV. SECURITYANALYSIS tation.Themarkedsignal iscreatedbyaddingtheWMwith certainmagnitude totheoriginal We now evaluate the security of our watermarking mech- anisms with respect to the watermark estimation attack. (13) As discussed in the previous section, we introduced block repetition codes and multiple correlation tests to enforce Vectors and have samples, whereas has synchronization for attacks with limited variable scaling. chips,eachofthemreplicatedsuccessively times.TheWM Therefore,inimprovingrobustnessagainstsignaldeformation detector correlates the averages of the central elements of attacks, we introduced a certain amount of redundancy in eachregionmarkedwiththesamechip,wherecommonly, the watermarking pattern. That improves the chances that an ,and .Suchadetectorcantoleratefluctuationin attacker can estimate the WM chips from the marked signal contentscalingupto signalcoefficients. 
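The block repetition scheme just described (each chip replicated over a run of consecutive samples, with the detector averaging only the central samples of each run) can be sketched in one dimension. This is a simplified illustration under stated assumptions, not the paper's codec: the signal is synthetic Gaussian noise rather than MCLT magnitudes, and the names `m` (replication length), `m0` (central samples used by the detector), `embed`, and `detect` are hypothetical labels, since the corresponding symbols are missing from the extracted text.

```python
import random


def embed(x, chips, m, delta):
    """Add the watermark: every chip is repeated over m consecutive samples."""
    return [xi + delta * chips[i // m] for i, xi in enumerate(x)]


def detect(y, chips, m, m0):
    """Normalized correlation that uses only the mean of the central m0
    samples of each m-sample region marked with the same chip."""
    lo = (m - m0) // 2
    total = 0.0
    for j, c in enumerate(chips):
        region = y[j * m + lo: j * m + lo + m0]
        total += c * sum(region) / m0
    return total / len(chips)


if __name__ == "__main__":
    rng = random.Random(1)
    n_chips, m, m0, delta, sigma = 2000, 9, 3, 1.0, 2.0
    chips = [rng.choice((-1, 1)) for _ in range(n_chips)]
    x = [rng.gauss(0.0, sigma) for _ in range(n_chips * m)]
    y = embed(x, chips, m, delta)

    shift = (m - m0) // 2                 # largest misalignment the window absorbs
    y_shifted = y[shift:] + [0.0] * shift

    print("unmarked:", round(detect(x, chips, m, m0), 3))
    print("marked:  ", round(detect(y, chips, m, m0), 3))
    print("shifted: ", round(detect(y_shifted, chips, m, m0), 3))
```

In this sketch a misalignment of up to `(m - m0) // 2` samples keeps every detection window inside its own region, so the correlation stays near `delta`; the price is that the detector integrates fewer samples per chip.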
TABLE I. Watermark detection results on audio clips attacked with the Stirmark Audio benchmark. Parameters of the attacks are inherited from the version of the tool available online.

The involved block repetition code improves the detection, but it also improves the efficacy of the estimation attack. If all details of the embedder are known (except $w$), the adversary can compute the WM estimate, amplify it with a factor $\alpha$, and then subtract the amplified attack vector from the marked content [2].

Theorem 1: Given a set of $m$ samples of $y$ marked with the same chip $w$ such that

$$y_i = x_i + \delta w, \quad i = 1, \dots, m \tag{14}$$

the optimal estimate $v$ of the hidden WM chip $w$ is given by

$$v = \operatorname{sign}\left(\sum_{i=1}^{m} y_i\right). \tag{15}$$

See [2, Lemma 1] for proof. Note that […].

Theorem 2: The optimal WM estimation, as presented in Theorem 1, yields the following probability of estimation error per WM chip:

$$\varepsilon = \Pr[v \neq w] = \frac{1}{2}\operatorname{erfc}\left(\frac{\delta\sqrt{m}}{\sigma_x\sqrt{2}}\right). \tag{16}$$

See [2, Coroll. 1] for proof.

The estimation attack is performed by subtracting an amplified WM estimate $\alpha v$ from the marked content $y$:

$$z = y - \alpha v. \tag{17}$$

The maximum value of the amplification factor $\alpha$ depends solely on the desired level of audibility for the attack. In practice, $\alpha$ can be much greater than $\delta$ because the content marking entity is subject to much more stringent content fidelity constraints than an attacker.

Corollary 1: The variance of the attacked signal $z$ depends on $\alpha$ as presented:

$$[\ldots]\ \operatorname{erf}\,[\ldots] \tag{18}$$

Proof (sketch): By replacing ([…]) in (18) with ([…] sign […]), we obtain

$$[\ldots] \tag{19}$$

which proves (18) to be correct.

Corollary 2: After the attack, the expected correlation value computed by the WM detector equals

$$E[z \cdot w] = \delta - \alpha\,\operatorname{erf}\left(\frac{\delta\sqrt{m}}{\sigma_x\sqrt{2}}\right) \tag{20}$$

with […].

Fig. 11 demonstrates how […] and […] change as […] increases under fixed […], with […] varying from 2.5 to 6.5. From (20), we compute that in order to draw the expected correlation value to a target level $\theta$, the attacker has to induce $\alpha$ equal to

$$\alpha = \frac{\delta - \theta}{\operatorname{erf}\left(\dfrac{\delta\sqrt{m}}{\sigma_x\sqrt{2}}\right)}. \tag{21}$$

If […] or […], the estimation attack adds noise to the marked signal. Part of this noise is an accurate estimate of the WM, and it actually reverses the effect of the watermarking process. The remainder of the attack vector is applied in addition to the existing marked data.
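The estimation attack described above can be checked numerically with a small simulation. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: the host signal is synthetic i.i.d. Gaussian noise, every chip is repeated over `m` consecutive samples, the detector correlates over all samples, and the function names are hypothetical. It estimates each chip as the sign of the sum of its samples, compares the empirical chip error rate with the erfc expression for an i.i.d. Gaussian host, and subtracts the amplified estimate from the marked signal.

```python
import math
import random


def correlate(sig, chips, m):
    """Normalized correlation of a signal with the m-fold repeated chips."""
    return sum(s * chips[i // m] for i, s in enumerate(sig)) / len(sig)


def estimate_chips(y, m):
    """Sign-of-sum estimate of each chip from its m marked samples."""
    return [1 if sum(y[j * m:(j + 1) * m]) >= 0 else -1
            for j in range(len(y) // m)]


def attack(y, v, m, alpha):
    """Subtract the amplified chip estimate from the marked signal."""
    return [yi - alpha * v[i // m] for i, yi in enumerate(y)]


def chip_error_probability(delta, sigma_x, m):
    """Predicted per-chip estimation error for an i.i.d. Gaussian host."""
    return 0.5 * math.erfc(delta * math.sqrt(m) / (sigma_x * math.sqrt(2.0)))


if __name__ == "__main__":
    rng = random.Random(3)
    n_chips, m, delta, sigma = 4000, 8, 1.0, 3.0
    w = [rng.choice((-1, 1)) for _ in range(n_chips)]
    x = [rng.gauss(0.0, sigma) for _ in range(n_chips * m)]
    y = [xi + delta * w[i // m] for i, xi in enumerate(x)]

    v = estimate_chips(y, m)
    errors = sum(1 for a, b in zip(v, w) if a != b) / n_chips
    eps = chip_error_probability(delta, sigma, m)
    alpha = delta / (1.0 - 2.0 * eps)     # drives the expected correlation to zero
    z = attack(y, v, m, alpha)

    print("chip error rate (empirical, predicted):", round(errors, 3), round(eps, 3))
    print("correlation before and after attack:",
          round(correlate(y, w, m), 3), round(correlate(z, w, m), 3))
```

Note that `1 - 2 * eps` equals the erf term, so the chosen `alpha` is the amplification that cancels the watermark on average; it exceeds `delta`, and the surplus is added noise rather than watermark removal.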
