Selected Papers from the ACM Multimedia Conference 2003

This special issue comprises some of the outstanding work originally presented at the ACM Multimedia Conference 2003 (ACM MM 2003). The conference received 255 submissions, of which 43 high-quality papers were accepted for presentation. Of these papers, the Technical Program Chairs invited a dozen authors to submit enhanced versions of their papers to this special issue. These papers went through a rigorous review process, and we are happy to present four truly outstanding papers in this special issue. Due to the highly competitive evaluation process and limited space, many excellent papers could not be accepted for this special issue. However, some of them are being forwarded for consideration as future regular papers in this journal.

The four featured papers in this special issue span research related to (1) multimedia analysis, processing, and retrieval, (2) multimedia networking and systems support, and (3) multimedia tools, end-systems, and applications. The papers are given as follows:

• "Real-Time Multidepth Stream Compression" authored by Sang-Uok Kum and Ketan Mayer-Patel,
• "Panoptes: Scalable Low-Power Video Sensor Networking Technologies" authored by Wu-chi Feng, Brian Code, Ed Kaiser, Wu-chang Feng, and Mickael Le Baillif,
• "Semantics and Feature Discovery via Confidence-based Ensemble" authored by Kingshy Goh, Beitao Li, and Edward Y. Chang, and
• "Understanding Performance in Coliseum, an Immersive Videoconferencing System" authored by H. Harlyn Baker, Nina Bhatti, Donald Tanguay, Irwin Sobel, Dan Gelb, Michael E. Goss, W. Bruce Culbertson, and Thomas Malzbender.

We hope the readers of this special issue find these papers truly interesting and representative of some of the best work in the field of multimedia in 2003!

The Guest Editors would like to thank the many authors for their hard work in submitting and preparing the papers for this special issue. We would also like to thank the many reviewers for their important feedback and help in selecting outstanding papers in the field of multimedia from 2003. Lastly, we would like to thank Larry Rowe, Chair of ACM Multimedia 2003, and Ramesh Jain, Chair of SIGMM, for their support and guidance in preparing this special issue.

THOMAS PLAGEMANN
PRASHANT SHENOY
JOHN R. SMITH
Guest Editors to the Special Issue and
ACM Multimedia 2003 Program Chairs

Real-Time Multidepth Stream Compression

SANG-UOK KUM and KETAN MAYER-PATEL
University of North Carolina

The goal of tele-immersion has long been to enable people at remote locations to share a sense of presence. A tele-immersion system acquires the 3D representation of a collaborator's environment remotely and sends it over the network, where it is rendered in the user's environment. Acquisition, reconstruction, transmission, and rendering all have to be done in real-time to create a sense of presence. With added commodity hardware resources, parallelism can increase the acquisition volume and reconstruction data quality while maintaining real-time performance. However, this is not as easy for rendering, since all of the data need to be combined into a single display.

In this article, we present an algorithm to compress data from such 3D environments in real-time to address this imbalance. We present a compression algorithm which scales comparably to the acquisition and reconstruction, reduces network transmission bandwidth, and reduces the rendering requirement for real-time performance. This is achieved by exploiting the coherence in the 3D environment data and removing redundant points in real-time. We have tested the algorithm using a static office data set as well as a dynamic scene, the results of which are presented in the article.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Clustering; I.3.2 [Computer Graphics]: Graphics Systems—Distributed/network graphics; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Virtual reality; I.3.7 [Computer Graphics]: Applications

General Terms: Algorithms

Additional Key Words and Phrases: Real-time compression, tele-immersion, tele-presence, augmented reality, virtual reality, k-means algorithm, k-means initialization

1. INTRODUCTION

Recently, there has been increasing interest in tele-immersion systems that create a sense of presence with distant individuals and situations by providing an interactive 3D rendering of remote environments [Kauff and Schreer 2002; Towles et al. 2002; Baker et al. 2002; Gross et al. 2003]. The 3D tele-immersion research group at the University of North Carolina, Chapel Hill [Office of the Future Project], together with collaborators at the University of Pennsylvania [University of Pennsylvania GRASP Lab], the Pittsburgh Supercomputing Center [Pittsburgh Supercomputing Center], and Advanced Network and Services, Inc. [Advanced Network and Services, Inc.], has been actively developing tele-immersion systems for several years.

The four major components of a tele-immersion system are scene acquisition, 3D reconstruction, transmission, and rendering. Figure 1 shows a block diagram relating these components to each other and the overall system. For effective, interactive operation, these four components must accomplish their tasks in real-time.

This work was supported in part by the Link Fellowship and the National Science Foundation (ANI-0219780, IIS-0121293).
Authors' address: University of North Carolina at Chapel Hill, CB #3175, Sitterson Hall, Chapel Hill, NC 27599-3175; email: {kumsu,kmp}@cs.unc.edu.

Fig. 1. Tele-immersion system.

Scene acquisition is done using multiple digital cameras and computers. Multiple digital cameras are placed around the scene to be reconstructed. The cameras are calibrated and registered to a single coordinate system called the world coordinate system. The computers are used to control the cameras for synchronized capture and to control 2D image stream transfer to the 3D reconstruction system. Using current commodity hardware, we are able to capture images with a resolution of 640×480 at 15 frames/sec. The 15 frames/sec limit is a result of the gen-lock synchronization mechanism employed by the particular cameras we have, and faster capture performance may be achievable using other products.

The 3D reconstruction system receives the captured 2D image streams from the acquisition system and creates a 3D representation of the scene in real-time. The reconstructed 3D scene is represented by depth streams. A depth stream is a video stream augmented with per-pixel depth information in the world coordinate system. Multiple input images are used to create a depth stream. The images are rectified and correspondences between the images are found. Using this correspondence information, disparities at each pixel are computed. The computed disparities and the calibration matrices of the cameras are used to compute the world coordinates of each 3D point. The major bottleneck of the reconstruction is the correspondence search between images, which is computationally expensive. Fortunately, this process can be parallelized to achieve real-time performance, since each depth stream computation is independent of the others.
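To make the final reprojection step concrete, the following is a minimal sketch (not the authors' implementation) of how a rectified pixel with a computed disparity could be lifted to world coordinates. The focal length f (in pixels), baseline b, principal point (cx, cy), and the 4×4 camera-to-world transform are assumed to come from the calibration and registration step described above; all names are illustrative.

```python
import numpy as np

def pixel_to_world(u, v, disparity, f, b, cx, cy, cam_to_world):
    """Back-project one rectified pixel to world space.

    Assumes a standard rectified stereo pair, so depth is Z = f * b / d,
    with f the focal length in pixels and b the baseline in meters.
    cam_to_world is the 4x4 rigid transform from camera registration.
    """
    if disparity <= 0:                 # no correspondence found for this pixel
        return None
    z = f * b / disparity              # depth along the optical axis
    x = (u - cx) * z / f               # back-project into camera coordinates
    y = (v - cy) * z / f
    p_cam = np.array([x, y, z, 1.0])   # homogeneous camera-space point
    return (cam_to_world @ p_cam)[:3]  # point in the world coordinate system
```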
The acquired remote scene must be transmitted to the rendering system. At 640×480 resolution, each uncompressed depth stream running at 15 frames/sec needs—assuming 3 bytes for color and 2 bytes for depth—about 184 Mbits/sec of network bandwidth. For 10 depth streams, without data compression, the total bandwidth required would be 1.84 Gbits/sec.

Finally, the transmitted depth streams are rendered and displayed in head-tracked passive stereo by the rendering system [Chen et al. 2000]. Since the depth streams are in world coordinates, and thus view-independent, they can be rendered from any new viewpoint. The user's head is tracked to render the depth streams from precisely the user's current viewpoint to provide a sense of presence. At a resolution of 640×480, each frame of each depth stream is comprised of approximately 300K 3D points. A system with 10 depth streams would require 90 Mpts/sec rendering performance to achieve 30 frames/sec view-dependent rendering, which is difficult with currently available commodity hardware. Also, rendering is not as easily parallelized as 3D reconstruction, since all of the depth streams must be rendered into a single view.
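As a sanity check on these figures, the arithmetic follows directly from the stream parameters quoted above:

```python
W, H, FPS = 640, 480, 15
BYTES_PER_POINT = 3 + 2                      # 3 bytes color + 2 bytes depth

stream_bits = W * H * BYTES_PER_POINT * 8 * FPS
print(stream_bits / 1e6)                     # ~184.3 Mbits/sec per depth stream
print(10 * stream_bits / 1e9)                # ~1.84 Gbits/sec for 10 streams

points = W * H                               # 307,200 points/frame (~300K)
print(10 * points * 30 / 1e6)                # ~92 Mpts/sec for 10 streams at 30 fps
```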
While the scene acquisition and 3D reconstruction processes can be parallelized by adding additional hardware resources, experience with our initial prototypes indicates that rendering performance and transmission bandwidth are likely to remain a bottleneck. Our work concentrates on this possible bottleneck between the reconstruction process and the rendering process. As such, we are not concerned with the 2D image streams captured during acquisition. Instead, we are concerned with the reconstructed 3D depth streams. Each of these depth streams is constructed from a particular viewpoint as if captured by a 3D camera, although no actual 3D camera exists. In the rest of this article, the terms image, stream, camera, and viewpoint all refer to the 3D information produced by the reconstruction process, which includes both color and depth on a per-pixel basis.

One way to alleviate the network and rendering bottleneck is to exploit coherence between the reconstructed depth streams and remove redundant points. Since multiple cameras acquire a common scene, redundant points exist between the reconstructed depth streams. By identifying and removing these redundant points, the total number of points transmitted to the rendering system is reduced, which reduces network bandwidth and rendering demand while maintaining the quality of the reconstruction.

Since the reconstruction process needs to be distributed over many computers in order to achieve real-time performance, each depth stream is created at a different computer. In order to remove redundant points between two depth streams, at least one of the streams must be transmitted to the computer where the other stream resides. Because of this, we must be careful to distinguish between two different network resources that must be managed. The first is internal network bandwidth. This refers to the bandwidth between computers involved in the reconstruction process. We expect these computers to be locally connected, and thus this bandwidth resource is expected to be fairly plentiful (i.e., on the order of 100 Mb/s to 1 Gb/s) but still finite and limited. In managing this resource, we must be careful about how many of the depth streams need to be locally transmitted in order to remove redundant points.

The second network resource is external network bandwidth, which refers to the bandwidth available between the reconstruction process and the rendering process. These two processes will not generally be locally connected and will probably traverse the Internet or Internet2. In this case, bandwidth is expected to be more limited, and the concern is removing as many redundant points as possible in order to reduce the amount of data transmitted to the renderer.

This article presents a modified technique based on our earlier work [Kum et al. 2003] for exploiting coherence between depth streams in order to find and eliminate redundant points. Our contributions include:

—A real-time depth stream compression technique. The Group-Based Real-Time Compression algorithm presented in this article finds and eliminates redundant points between two or more depth streams.
—A depth stream coherence metric. In order to efficiently employ Group-Based Real-Time Compression, we must be able to compute which depth streams are most likely to exhibit strong coherence. We present an efficient algorithm for partitioning depth streams into coherent groups.
—An evaluation of our methods, which shows that we can remove a large majority of redundant points and thereby reduce external bandwidth and rendering requirements, while at the same time limiting the amount of internal bandwidth required to match what is locally available. Furthermore, since each depth stream is compared against at most one other depth stream, real-time performance is achievable.

This article is organized as follows: Section 2 describes background and related work. Section 3 provides an overview of our approach and a comparison with other possible approaches. In Section 4, we present the compression algorithm in detail. Section 5 explains how streams are partitioned into coherent groups. The results are presented in Section 6, and conclusions and future work are in Section 7.

2. BACKGROUND AND RELATED WORK

There have been multiple tele-immersion systems built recently. The VIRTUE system [Kauff and Schreer 2002] uses stereo-based reconstruction for modeling, and the user is tracked for view-dependent rendering. However, the display is not in stereo, which reduces the effect of immersion. The Coliseum [Baker et al. 2002] uses an Image-Based Visual Hulls [Matusik et al. 2000] method for reconstruction and is designed to support a large number of users. However, it uses one server for each participant to handle the rendering for all clients, which increases latency as the number of users increases. As with the VIRTUE system, it is also not displayed in stereo. The blue-c system [Gross et al. 2003] uses a CAVE [Cruz-Neira et al. 1993] environment for rendering and display to create an impression of total immersion. The reconstruction is done using a shape-from-silhouette technique that creates a point-based model.

McMillan and Bishop [1995] proposed using a depth image (i.e., an image with color and depth information) to render a scene from new viewpoints by warping the depth image. One major problem with this method is disocclusion artifacts, caused when a portion of the scene not visible in the depth image is visible from the new viewpoint. Using multiple depth images from multiple viewpoints can reduce these disocclusion artifacts.
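For illustration, a naive sketch of such a 3D warp, under assumed pinhole intrinsics K and 4×4 rigid transforms (a practical implementation would splat points rather than copy single pixels). Output pixels that no source point lands on stay empty, which is exactly the disocclusion problem described above:

```python
import numpy as np

def warp_depth_image(depth, color, K, src_to_world, world_to_dst):
    """Warp a depth image to a new viewpoint, one pixel per point.

    depth: HxW array of camera-space depths; color: HxWx3; K: 3x3 intrinsics;
    the two 4x4 matrices move points from the source camera to the new one.
    Pixels never hit by a source point remain black: disocclusion holes.
    """
    h, w = depth.shape
    out = np.zeros_like(color)
    zbuf = np.full((h, w), np.inf)
    K_inv = np.linalg.inv(K)
    M = world_to_dst @ src_to_world                  # source camera -> new camera
    for v in range(h):
        for u in range(w):
            z = depth[v, u]
            if z <= 0:                               # no reconstructed surface here
                continue
            p = z * (K_inv @ np.array([u, v, 1.0]))  # source camera space
            q = (M @ np.append(p, 1.0))[:3]          # new camera space
            if q[2] <= 0:                            # behind the new viewpoint
                continue
            uvw = K @ q
            u2, v2 = int(uvw[0] / uvw[2]), int(uvw[1] / uvw[2])
            if 0 <= u2 < w and 0 <= v2 < h and q[2] < zbuf[v2, u2]:
                zbuf[v2, u2] = q[2]                  # keep the nearest surface
                out[v2, u2] = color[v, u]
    return out
```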
Layered Depth Images (LDI) merge multiple depth images into a single depth image by keeping multiple depth values per pixel [Shade et al. 1998]. However, the fixed resolution of an LDI imposes limits on sampling multiple depth images. An LDI tree, an octree with a single LDI in each node, can be used to overcome this limitation [Chang et al. 1999].

Grossman and Dally [1998] create multiple depth images to model an arbitrary synthetic object. The depth images are divided into 8 × 8 blocks and redundant blocks are removed. QSplat [Rusinkiewicz and Levoy 2000] uses a bounding sphere hierarchy to group 3D scanned points for real-time progressive rendering of large models. Surfels [Pfister et al. 2000] represent objects using a tree of three orthogonal LDIs called a Layered Depth Cube (LDC) tree. All of these approaches handle only static data, for which compression is done only once as a preprocessing step. Therefore, these techniques are not suitable for real-time dynamic environments in which the compression has to be done for every frame. The video fragments used in the blue-c system [Würmlin et al. 2004] are a point-based representation for dynamic scenes. It exploits spatio-temporal coherence by identifying differential fragments in 2D image space and updating the 3D point representation of the scene.

There have also been efforts to develop special scalable hardware for compositing images with depth information [Molnar et al. 1992; Stoll et al. 2001]. The rendering system can be parallelized using this special hardware by connecting each 3D camera to a rendering PC and then compositing all of the rendered images. Unfortunately, these systems are not commonly available and are expensive to build.

3. OVERVIEW AND DESIGN GOALS

This section outlines our design goals for the compression algorithm, examines several possible approaches to the problem, and gives an overview of the modified Group-Based Real-Time Compression algorithm from Kum et al. [2003].

3.1 Design Goals

To ensure a high-quality rendering, we will require that the depth stream that most closely matches the user's viewpoint at any given time is not compressed. We will call this depth stream the main stream. All points of the main stream are transmitted to the rendering process. Furthermore, a subset of the depth streams is identified as the set of reference streams. The reference streams form a predictive base for detecting and eliminating redundant points and are distributed among the depth streams. Every stream except for the main stream is compared to one or more of the reference streams and redundant points are eliminated. The result is called a differential stream. These differential streams and the main stream are sent to the rendering system.

Fig. 2. Examples of different compression algorithms and their reference stream transfers. The main stream is in bold and the arrows show the direction of reference stream movement.

Our design goals for the compression algorithm include:

—Real-Time Performance. The compression algorithm needs to be at least as fast as the 3D reconstruction so there is no delay in processing the streams.
—Scalability. The algorithm needs to scale with the number of depth streams, so that as the number of depth streams increases the number of data points does not overwhelm the rendering system.
—Data Reduction. In order to alleviate the rendering bottleneck, the algorithm needs to reduce the number of data points by eliminating as many redundant points as possible.
—Tunable Network Bandwidth. Distributing reference streams to the reconstruction processes will require additional network bandwidth. The algorithm should be tunable to limit the network bandwidth used even as the total number of depth streams increases.
3.2 General Approaches

Given the restrictions and design goals outlined above, there are a number of general approaches that may be incorporated into our solution.

3.2.1 Stream Independent Temporal Compression. One possible approach is to compress each stream independently using temporal coherence. With such an approach, each stream acts as its own reference stream. Exploiting temporal coherence for traditional video types is known to result in good compression for real-time applications. This compression scheme scales well and requires no additional network bandwidth, since there is no need to communicate reference streams among the reconstruction processes. However, this compression scheme does not reduce the number of data points that the renderer must render each frame. The renderer must render all redundant points from the previous frame along with the nonredundant points of the current frame.

3.2.2 Best-Interstream Compression. The best possible interstream compression would be to remove all redundant points from all streams by using every stream as a possible reference stream. This could be accomplished in the following way. The first stream sends all of its data points to the rendering system and to all other reconstruction processes as a reference stream. The second stream uses the first stream as a reference stream, creating a differential stream which it also distributes to the other reconstruction processes as a reference stream. The third stream receives the first two streams as reference streams in order to create its differential stream, and so on, continuing until the last stream uses all other streams as reference streams (Figure 2(a)). This is the best possible interstream compression since it leaves no redundant points. The drawbacks to this approach, however, are severe. Most streams in this approach require multiple reference streams, with at least one stream using all other streams as references. This dramatically increases computation requirements and makes realizing a real-time implementation very difficult. Also, the number of reference streams broadcast is dependent on the number of streams. Thus, the network bandwidth required will increase as the number of streams increases, limiting the scalability of the 3D cameras.

3.2.3 Single Reference Stream Compression. Another approach is to use the main stream as the reference stream for all other streams (Figure 2(b)). This does not require additional network bandwidth as more streams are added, since there is always only one reference stream. Real-time operation is feasible since all other streams are compared against only one reference stream. A main disadvantage of this approach is possibly poor data compression. The coherence between the main stream and the depth streams that use it as a reference stream will diminish as the viewpoints of the streams diverge. Furthermore, the depth streams from two nearby viewpoints may contain redundant points which are not removed by using the main stream as the only reference.

3.2.4 Nearest Neighbors as Reference Stream Compression. Another approach is for each depth stream to select the closest neighboring depth stream as the reference stream to achieve better compression. The streams can be linearly sorted such that neighboring streams in the list have viewpoints that are close to each other. From this sorted list of streams, the streams left of the main stream use the right neighboring stream as their reference stream, and the streams right of the main stream use the left neighboring stream as their reference stream (Figure 2(c)). With this scheme, every stream has one reference stream regardless of the total number of streams. The compression rate depends on the number of points that appear in nonneighboring streams but not in neighboring streams, since these points will be redundant in the final result. Because the streams are sorted by viewpoint, the number of redundant points in nonneighboring streams but not in neighboring streams should be small, which makes the compression comparable to the previously mentioned Best-Interstream Compression method. However, the network bandwidth demand for this compression scheme is high. For n streams there are generally n−2 reference streams to distribute, again limiting the scalability of 3D cameras.
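A minimal sketch of this neighbor assignment (the function name is illustrative; stream indices are assumed to be already sorted by viewpoint, as described above):

```python
def nearest_neighbor_references(n_streams, main):
    """Reference assignment of Figure 2(c): with streams sorted by viewpoint,
    streams left of the main stream point right, streams right of it point
    left. Returns {stream: its reference stream}; the main stream has none."""
    refs = {}
    for i in range(n_streams):
        if i < main:
            refs[i] = i + 1                  # use the right-hand neighbor
        elif i > main:
            refs[i] = i - 1                  # use the left-hand neighbor
    return refs

print(nearest_neighbor_references(5, 2))     # {0: 1, 1: 2, 3: 2, 4: 3}
```

In the example, every stream except the two at the ends of the sorted list serves as a reference, which is the n−2 distribution cost noted above.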
3.3 Overview of Group-Based Real-Time Compression

Group-Based Real-Time Compression tries to balance compression efficiency and network bandwidth requirements by limiting the number of reference streams to a configurable limit and grouping streams together based on which of these streams serves as the best reference stream to use. All streams are divided into groups such that each stream is part of only one group. Each group has a center stream that is a representative of the group, and sub streams (i.e., all other streams in the group). Stream partitioning and center stream selection are done as a preprocessing step, since the acquisition cameras do not move. The main stream and the center streams comprise the set of possible reference streams. Thus the number of reference streams distributed equals the number of groups created plus one—the main stream. A differential stream is created for each stream using the reference stream that will most likely yield the best compression. Since the number of reference streams is limited to the number of groups, new streams can be added without increasing the reference stream network traffic as long as the number of groups remains the same. Because the number of groups is a configurable system parameter, the amount of network traffic generated by distributing reference streams can be engineered to match the available network bandwidth. Also, each stream only uses one reference stream to create its differential frame, which makes real-time operation feasible. The difference between the algorithm presented in this article and the one from our earlier work [Kum et al. 2003] is that the center streams use the closest neighboring center stream as the reference stream, not the center stream of the main stream's group. This results in better compression for the center streams since, most of the time, closer streams have more redundant points. The compression algorithm is described in more detail in Section 4. Section 5 details how streams are partitioned into groups and how the center stream for each group is selected.

4. STREAM COMPRESSION

This section details how depth streams are compressed in real-time. First, we detail how reference streams are selected for each stream, and then discuss how these streams are compressed using the selected reference stream.

4.1 Reference Stream Selection

In Group-Based Real-Time Compression, all depth streams are partitioned into disjoint groups. The number of groups created is determined by the network bandwidth. Each group has a center stream, which best represents the group, and sub streams—depth streams in a group that are not the center stream.

Furthermore, one stream is selected as the main stream, for which no compression is done. The depth stream with the viewpoint at the shortest Euclidean distance from the user is chosen as the main stream, since it best represents the user's viewpoint. The group containing the main stream is called the main group, and all other groups are referred to as sub groups.
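Main stream selection is then a nearest-viewpoint query. A minimal sketch, assuming the stream viewpoints and the tracked user position are 3D coordinates in the world coordinate system:

```python
import numpy as np

def select_main_stream(viewpoints, user_position):
    """Return the index of the depth stream whose acquisition viewpoint is
    closest (Euclidean distance) to the tracked user position."""
    d = np.linalg.norm(np.asarray(viewpoints) - np.asarray(user_position), axis=1)
    return int(np.argmin(d))
```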
Once the main stream has been selected, the reference stream for each stream is selected as follows:

—For the main stream, no reference stream is needed.
—For the center stream of the main group, the main stream is used as the reference stream.
—For the center streams of the sub groups, the nearest center stream is used as the reference stream. The center streams can be linearly sorted such that neighboring center streams in the list have viewpoints that are close to each other. From this sorted list of center streams, the center streams left of the main stream use the right neighboring center stream as their reference stream, and the center streams right of the main stream use the left neighboring center stream as their reference stream. This differs from the algorithm given in Kum et al. [2003], which uses the center stream of the main group as the reference stream. It also compresses better than Kum et al. [2003] because two neighboring streams usually have more redundant points than two non-neighboring streams.
—For any other sub stream, the center stream of its group is used as the reference stream.

Figure 3 shows an example with 12 streams and 4 groups. Stream 5 is the main stream, which makes Group 2 the main group. Streams 1, 4, 7, and 10 are the center streams of their groups, and are numbered in sequential order. Since Stream 5 is the main stream, it does not have a reference stream. Stream 4 is the center stream of the main group and uses the main stream (Stream 5) as its reference stream. The center streams of the sub groups—Streams 1, 7, and 10—use the nearest center stream as the reference stream: Streams 1 and 7 use Stream 4, and Stream 10 uses Stream 7. All sub streams use their group's center stream as their reference stream—Streams 2 and 3 of Group 1 use Stream 1, Stream 6 of Group 2 uses Stream 4, Streams 8 and 9 of Group 3 use Stream 7, and Streams 11 and 12 of Group 4 use Stream 10. The arrows show the direction of the reference stream distribution.

Fig. 3. An example of reference stream distribution for Group-Based Real-Time Compression.

4.2 Differential Stream Construction

To construct a differential stream, the data points of a depth stream are compared to the data points within the reference stream. Points that are within some given distance threshold are removed from the depth stream.

The format of the differential stream is different from the original stream format. The original stream has five bytes for each data point: three bytes for color and two bytes for depth. The differential stream has five bytes for only the non-redundant points (i.e., points not removed) and a bitmask to indicate which points have been retained and which points have been eliminated. If the bit value is '0', then the data point represented by the bit is a redundant point and is removed. If the bit value is '1', the corresponding point is included. The order of data for non-redundant points is the same as the order in which they appear in the bitmask. This format reduces the size of a frame in the differential stream by 39 bits (five bytes minus one bit) for each redundant point and adds one bit for each non-redundant point. So for a depth stream of 640×480 resolution with a 5-to-1 redundancy ratio (i.e., 80% of data points are deemed redundant), the size of a frame for the stream is reduced from 1.536 MB to 346 KB—approximately 5 to 1.
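This frame layout is straightforward to pack. The sketch below (illustrative, not the authors' code) writes the bitmask followed by the retained 5-byte records, and reproduces the size arithmetic above:

```python
def encode_differential_frame(points, redundant):
    """Pack one differential frame: a 1-bit-per-point mask ('1' = retained)
    followed by the 5-byte records (3 color + 2 depth) of retained points.
    `points` is a list of 5-byte records; `redundant` parallel booleans."""
    mask = bytearray((len(points) + 7) // 8)
    payload = bytearray()
    for i, (record, removed) in enumerate(zip(points, redundant)):
        if not removed:
            mask[i // 8] |= 0x80 >> (i % 8)  # mark the point as retained
            payload += record
    return bytes(mask) + bytes(payload)

# Size check for a 640x480 frame at the 5-to-1 redundancy ratio above:
n = 640 * 480                                # 307,200 points
kept = n // 5                                # 20% of points retained
print(n // 8 + kept * 5)                     # 345,600 bytes (~346 KB) vs 1,536,000 raw
```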
5. STREAM PARTITION

In this section, we present an algorithm for stream partitioning and center stream selection. As discussed in Section 4, the streams need to be partitioned into groups, and the center stream of each group selected, before runtime. Since reference stream selection depends on this stream partitioning process, the partitioning also affects stream compression efficiency. Therefore, streams should be partitioned into groups such that the most redundant points are removed. In Section 5.1, we present effective criteria to partition n streams into k groups and to find the appropriate center stream in each group. We show how these metrics can be used to partition the streams and to select center streams in Section 5.2. In Section 5.3, the metrics are used to develop an efficient approximate algorithm for stream partitioning and center stream selection when n is too large for an exhaustive approach.

5.1 Coherence Metrics

Stream partitioning and selection of center streams have a direct impact on compression, since all sub streams of a group use the center stream of the group as the reference stream. Therefore, the partitioning should ensure that each stream belongs to a group where the volume overlap between the stream and the group center stream is maximized.

However, exact calculation of the volume overlap between two streams is expensive. Thus, in this article, we use the angle between the view directions of two depth streams as an approximation of the overlapped volume. Empirically, the view directions of two streams are a good estimate of how much the two stream volumes overlap: the smaller the angle, the bigger the overlap. This is shown in Figure 4.

Fig. 4. Percentage of redundant points of a stream in the reference stream vs. the angle between the two streams. The streams are from the 3D camera configurations of Figure 9.

The local squared angle sum (LSAS) is defined for stream S_i as the sum of the squared angles between stream S_i and all other streams in its group (Eq. (1)). This is used as the center stream selection criterion: the stream with the lowest LSAS in the group is chosen to be the center stream.

    LSAS_i = \sum_{j=1}^{n_k} [\mathrm{angleof}(S_i, S_j)]^2    (1)

where streams S_i and S_j are in group k, and n_k is the number of streams in group k.

The group squared angle sum (GSAS), defined for a given group, is the sum of the squared angles between the group's center stream and every sub stream in the group (Eq. (2)). This is used as the partitioning criterion for partitioning n streams into k groups. The sum of all GSASs for a particular partition (Eq. (3)) is defined as the total squared angle sum (TSAS). We are seeking the partition that minimizes TSAS.

    GSAS_j = \sum_{i=1}^{n_j} [\mathrm{angleof}(C_j, S_{ji})]^2    (2)

where C_j is the center stream in group j, S_{ji} is a sub stream in group j, and n_j is the number of sub streams in group j.

    TSAS = \sum_{i=1}^{k} GSAS_i    (3)

where k is the number of groups.

Finally, the central squared angle sum (CSAS) is defined as the sum of the squared angles between all center streams (Eq. (4)). The streams should be partitioned such that CSAS is also minimal, since all center streams use each other as references. However, it should be noted that minimizing TSAS is