Selected Papers from the ACM Multimedia Conference 2003
This special issue comprises some of the outstanding work originally presented at the ACM Multimedia Conference 2003 (ACM MM 2003). The conference received 255 submissions, of which 43 high-quality papers were accepted for presentation. Of these papers, the Technical Program Chairs invited a dozen authors to submit enhanced versions of their papers to this special issue. These papers went through a rigorous review process, and we are happy to present four truly outstanding papers in this special issue. Due to the highly competitive evaluation process and limited space, many excellent papers could not be accepted for this special issue. However, some of them are being forwarded for consideration as future regular papers in this journal.
The four featured papers in this special issue span research related to (1) multimedia analysis, processing, and retrieval, (2) multimedia networking and systems support, and (3) multimedia tools, end-systems, and applications. The papers are as follows:
• “Real-Time Multidepth Stream Compression” authored by Sang-Uok Kum and Ketan Mayer-Patel,
• “Panoptes: Scalable Low-Power Video Sensor Networking Technologies” authored by Wu-chi Feng, Brian Code, Ed Kaiser, Wu-chang Feng, and Mickael Le Baillif,
• “Semantics and Feature Discovery via Confidence-based Ensemble” authored by Kingshy Goh, Beitao Li, and Edward Y. Chang, and
• “Understanding Performance in Coliseum, an Immersive Videoconferencing System” authored by H. Harlyn Baker, Nina Bhatti, Donald Tanguay, Irwin Sobel, Dan Gelb, Michael E. Goss, W. Bruce Culbertson, and Thomas Malzbender.
We hope the readers of this special issue find these papers truly interesting and representative of some of the best work in the field of multimedia in 2003!
The Guest Editors would like to thank the many authors for their hard work in submitting and preparing the papers for this special issue. We would also like to thank the many reviewers for their important feedback and help in selecting outstanding papers in the field of multimedia from 2003. Lastly, we would like to thank Larry Rowe, Chair of ACM Multimedia 2003, and Ramesh Jain, Chair of SIGMM, for their support and guidance in preparing this special issue.
THOMAS PLAGEMANN
PRASHANT SHENOY
JOHN R. SMITH
Guest Editors to the Special Issue and
ACM Multimedia 2003 Program Chairs
Real-Time Multidepth Stream Compression
SANG-UOK KUM and KETAN MAYER-PATEL
University of North Carolina
The goal of tele-immersion has long been to enable people at remote locations to share a sense of presence. A tele-immersion system acquires the 3D representation of a collaborator's environment remotely and sends it over the network where it is rendered in the user's environment. Acquisition, reconstruction, transmission, and rendering all have to be done in real-time to create a sense of presence. With added commodity hardware resources, parallelism can increase the acquisition volume and reconstruction data quality while maintaining real-time performance. However, this is not as easy for rendering since all of the data need to be combined into a single display.
In this article, we present an algorithm to compress data from such 3D environments in real-time to solve this imbalance. We present a compression algorithm which scales comparably to the acquisition and reconstruction, reduces network transmission bandwidth, and reduces the rendering requirement for real-time performance. This is achieved by exploiting the coherence in the 3D environment data and removing redundant points in real-time. We have tested the algorithm using a static office dataset as well as a dynamic scene, the results of which are presented in the article.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—Clustering; I.3.2 [Computer Graphics]: Graphics Systems—Distributed/network graphics; I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism—Virtual reality; I.3.7 [Computer Graphics]: Applications
General Terms: Algorithms
Additional Key Words and Phrases: Real-time compression, tele-immersion, tele-presence, augmented reality, virtual reality, k-means algorithm, k-means initialization
1. INTRODUCTION
Recently, there has been increasing interest in tele-immersion systems that create a sense of presence with distant individuals and situations by providing an interactive 3D rendering of remote environments [Kauff and Schreer 2002; Towles et al. 2002; Baker et al. 2002; Gross et al. 2003]. The 3D tele-immersion research group at the University of North Carolina, Chapel Hill [Office of the Future Project], together with collaborators at the University of Pennsylvania [University of Pennsylvania GRASP Lab], the Pittsburgh Supercomputing Center [Pittsburgh Supercomputing Center], and Advanced Network and Services, Inc. [Advanced Network and Services, Inc.], have been actively developing tele-immersion systems for several years.
The four major components of a tele-immersion system are scene acquisition, 3D reconstruction, transmission, and rendering. Figure 1 shows a block diagram relating these components to each other and the overall system. For effective, interactive operation, these four components must accomplish their tasks in real-time.
This work was supported in part by the Link Fellowship, and the National Science Foundation (ANI-0219780, IIS-0121293).
Authors' address: University of North Carolina at Chapel Hill, CB#3175, Sitterson Hall, Chapel Hill, NC 27599-3175; email: {kumsu,kmp}@cs.unc.edu.
Fig. 1. Tele-immersion system.
Scene acquisition is done using multiple digital cameras and computers. Multiple digital cameras are placed around the scene to be reconstructed. The cameras are calibrated and registered to a single coordinate system called the world coordinate system. The computers are used to control the cameras for synchronized capture and to control 2D image stream transfer to the 3D reconstruction system. Using current commodity hardware, we are able to capture images with a resolution of 640×480 at 15 frames/sec. The 15 frames/sec limit is a result of the gen-lock synchronization mechanism employed by the particular cameras we have, and faster capture performance may be achievable using other products.
The 3D reconstruction system receives the captured 2D image streams from the acquisition system and creates a 3D representation of the scene in real-time. The reconstructed 3D scene is represented by depth streams. A depth stream is a video stream augmented with per-pixel depth information from the world coordinate system. Multiple input images are used to create a depth stream. The images are rectified and correspondences between the images are found. Using this correspondence information, disparities at each pixel are computed. The computed disparities and the calibration matrices of the cameras are used to compute the world coordinates of each 3D point. The major bottleneck of the reconstruction is the correspondence search between images, which is computationally expensive. Fortunately, this process can be parallelized to achieve real-time performance, since each depth stream computation is independent of the others.
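To make the last step concrete, the following is a minimal sketch of back-projecting a single pixel; it is our illustration rather than the authors' implementation, and it assumes rectified pinhole stereo with hypothetical intrinsics (f, cx, cy), baseline b, and a 4×4 camera-to-world transform:

```python
import numpy as np

def disparity_to_world(u, v, disparity, f, cx, cy, b, cam_to_world):
    """Back-project rectified pixel (u, v) with a known disparity into world
    coordinates. f, cx, cy, b are hypothetical calibration values."""
    z = f * b / disparity              # depth from rectified stereo geometry
    x = (u - cx) * z / f               # back-project into camera space
    y = (v - cy) * z / f
    p_cam = np.array([x, y, z, 1.0])
    return (cam_to_world @ p_cam)[:3]  # into the shared world coordinate system
```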
The acquired remote scene must be transmitted to the rendering system. At 640×480 resolution, each uncompressed depth stream running at 15 frames/sec needs (assuming 3 bytes for color and 2 bytes for depth) about 184 Mbits/sec of network bandwidth. For 10 depth streams, without data compression, the total bandwidth required would be 1.84 Gbits/sec.
Finally, the transmitted depth streams are rendered and displayed in head-tracked passive stereo by the rendering system [Chen et al. 2000]. Since the depth streams are in world coordinates, and thus view-independent, they can be rendered from any new viewpoint. The user's head is tracked to render the depth streams from precisely the user's current viewpoint to provide a sense of presence. At a resolution of 640×480, each frame of each depth stream is comprised of approximately 300K 3D points. A system with 10 depth streams would require 90 Mpts/sec rendering performance to achieve 30 frames/sec view-dependent rendering, which is difficult with currently available commodity hardware. Also, rendering is not as easily parallelized as 3D reconstruction since all of the depth streams must be rendered into a single view.
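The arithmetic behind these bandwidth and rendering figures is straightforward to verify; a quick back-of-envelope sketch (the text rounds 307,200 points per frame to 300K, giving the quoted 90 Mpts/sec):

```python
# Back-of-envelope check of the figures quoted above.
width, height, fps = 640, 480, 15
bytes_per_point = 3 + 2                  # 3 bytes color + 2 bytes depth

stream_bits = width * height * bytes_per_point * 8 * fps
print(stream_bits / 1e6)                 # ~184 Mbits/sec per uncompressed stream
print(10 * stream_bits / 1e9)            # ~1.84 Gbits/sec for 10 streams

points_per_frame = width * height        # ~307K points, rounded to 300K in the text
print(10 * points_per_frame * 30 / 1e6)  # ~92 Mpts/sec for 10 streams at 30 fps
```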
While the scene acquisition and 3D reconstruction processes can be parallelized by adding additional hardware resources, experience with our initial prototypes indicates that rendering performance and
transmission bandwidth are likely to remain a bottleneck. Our work concentrates on this possible bottleneck between the reconstruction process and the rendering process. As such, we are not concerned with the 2D image streams captured during acquisition. Instead, we are concerned with the reconstructed 3D depth streams. Each of these depth streams is constructed from a particular viewpoint as if captured by a 3D camera, although no actual 3D camera exists. In the rest of this article, the terms image, stream, camera, and viewpoint all refer to the 3D information produced by the reconstruction process, which includes both color and depth on a per-pixel basis.
One way to alleviate the network and rendering bottleneck is to exploit coherence between the reconstructed depth streams and remove redundant points. Since multiple cameras acquire a common scene, redundant points exist between the reconstructed depth streams. By identifying and removing these redundant points, the total number of points transmitted to the rendering system is reduced, which reduces network bandwidth and rendering demand while maintaining the quality of the reconstruction.
Since the reconstruction process needs to be distributed over many computers in order to achieve real-time performance, each depth stream is created at a different computer. In order to remove redundant points between two depth streams, at least one of the streams must be transmitted to the computer where the other stream resides. Because of this, we must be careful to distinguish between two different network resources that must be managed. The first is internal network bandwidth. This refers to the bandwidth between computers involved in the reconstruction process. We expect these computers to be locally connected and thus this bandwidth resource is expected to be fairly plentiful (i.e., on the order of 100 Mb/s to 1 Gb/s) but still finite and limited. In managing this resource, we must be careful about how many of the depth streams need to be locally transmitted in order to remove redundant points.
The second network resource is external network bandwidth, which refers to the bandwidth available between the reconstruction process and the rendering process. These two processes will not generally be locally connected and will probably traverse the Internet or Internet-2. In this case, bandwidth is expected to be more limited and the concern is removing as many redundant points as possible in order to reduce the amount of data transmitted to the renderer.
This article presents a modified technique based on our earlier work [Kum et al. 2003] for exploiting coherence between depth streams in order to find and eliminate redundant points. Our contributions include:
—A real-time depth stream compression technique. The Group-Based Real-Time Compression algorithm presented in this article finds and eliminates redundant points between two or more depth streams.
—A depth stream coherence metric. In order to efficiently employ Group-Based Real-Time Compression, we must be able to compute which depth streams are most likely to exhibit strong coherence. We present an efficient algorithm for partitioning depth streams into coherent groups.
—An evaluation of our methods, which shows that we can remove a large majority of redundant points and thereby reduce external bandwidth and rendering requirements while at the same time limiting the amount of internal bandwidth required to match what is locally available. Furthermore, since each depth stream is compared against at most only one other depth stream, real-time performance is achievable.
This article is organized as follows: Section 2 describes background and related work. Section 3 provides an overview of our approach and a comparison with other possible approaches. In Section 4, we present the compression algorithm in detail. Section 5 explains how streams are partitioned into coherent groups. The results are presented in Section 6, and conclusions and future work are in Section 7.
2. BACKGROUND AND RELATED WORK
There have been multiple tele-immersion systems built recently. The VIRTUE system [Kauff and Schreer 2002] uses stereo-based reconstruction for modeling, and the user is tracked for view-dependent rendering. However, the display is not in stereo, which reduces the effect of immersion. The Coliseum [Baker et al. 2002] uses an Image-Based Visual Hulls [Matusik et al. 2000] method for reconstruction and is designed to support a large number of users. However, it uses one server for each participant to handle the rendering for all clients, which increases latency as the number of users increases. As with the VIRTUE system, it is also not displayed in stereo. The blue-c system [Gross et al. 2003] uses a CAVE [Cruz-Neira et al. 1993] environment for rendering and display to create an impression of total immersion. The reconstruction is done using a shape-from-silhouette technique that creates a point-based model.
McMillan and Bishop [1995] proposed using a depth image (i.e., an image with color and depth information) to render a scene from new viewpoints by warping the depth image. One major problem with this method is disocclusion artifacts, caused when a portion of the scene not visible in the depth image is visible from the new viewpoint. Using multiple depth images from multiple viewpoints can reduce these disocclusion artifacts. Layered Depth Images (LDI) merge multiple depth images into a single depth image by keeping multiple depth values per pixel [Shade et al. 1998]. However, the fixed resolution of an LDI imposes limits on sampling multiple depth images. An LDI tree, an octree with a single LDI in each node, can be used to overcome this limitation [Chang et al. 1999].
Grossman and Dally [1998] create multiple depth images to model an arbitrary synthetic object. The depth images are divided into 8 × 8 blocks and redundant blocks are removed. QSplat [Rusinkiewicz and Levoy 2000] uses a bounding sphere hierarchy to group 3D scanned points for real-time progressive rendering of large models. Surfels [Pfister et al. 2000] represent objects using a tree of three orthogonal LDIs called a Layered Depth Cube (LDC) tree. All of these approaches only handle static data in which compression was done only once as a preprocessing step. Therefore, these techniques are not suitable for real-time dynamic environments in which the compression has to be done for every frame. The video fragments used in the blue-c system [Würmlin et al. 2004] are a point-based representation for dynamic scenes. They exploit spatio-temporal coherence by identifying differential fragments in 2D image space and updating the 3D point representation of the scene.
There have also been efforts to develop special scalable hardware for compositing images with depth information [Molnar et al. 1992; Stoll et al. 2001]. The rendering system can be parallelized using this special hardware by connecting each 3D camera to a rendering PC and then compositing all of the rendered images. Unfortunately, these systems are not commonly available and are expensive to build.
3. OVERVIEW AND DESIGN GOALS
This section outlines our design goals for the compression algorithm, examines several possible approaches to the problem, and gives an overview of the modified Group-Based Real-Time Compression Algorithm from Kum et al. [2003].
3.1 Design Goals
To ensure a high-quality rendering, we will require that the depth stream that most closely matches the user's viewpoint at any given time is not compressed. We will call this depth stream the main stream. All points of the main stream are transmitted to the rendering process. Furthermore, a subset of the depth streams is identified as the set of reference streams. The reference streams form a predictive base for detecting and eliminating redundant points and are distributed among the depth streams. Every stream except for the main stream is compared to one or more of the reference streams and redundant points are eliminated. The result is called a differential stream. These differential streams and the main stream are sent to the rendering system.
Fig. 2. Examples of different compression algorithms and their reference stream transfer. The main stream is in bold and the arrows show the direction of reference stream movement.
Our design goals for the compression algorithm include:
—Real-Time Performance. The compression algorithm needs to be at least as fast as the 3D reconstruction so there is no delay in processing the streams.
—Scalability. The algorithm needs to scale with the number of depth streams, so that as the number of depth streams increases the number of data points does not overwhelm the rendering system.
—Data Reduction. In order to alleviate the rendering bottleneck, the algorithm needs to reduce the number of data points by eliminating as many redundant points as possible.
—Tunable Network Bandwidth. Distributing reference streams to the reconstruction processes will require additional network bandwidth. The algorithm should be tunable to limit the network bandwidth used even as the total number of depth streams increases.
3.2 General Approaches
Given the restrictions and design goals outlined above, there are a number of general approaches that may be incorporated into our solution.
3.2.1 Stream Independent Temporal Compression. One possible approach is to compress each stream independently using temporal coherence. With such an approach, each stream acts as its own reference stream. Exploiting temporal coherence for traditional video types is known to result in good compression for real-time applications. This compression scheme scales well, and requires no additional network bandwidth since there is no need to communicate reference streams among the reconstruction processes. However, this compression scheme does not reduce the number of data points that the renderer must render each frame. The renderer must render all redundant points from the previous frame with the nonredundant points of the current frame.
3.2.2 Best-Interstream Compression. The best possible interstream compression would be to remove all redundant points from all streams by using every stream as a possible reference stream. This could be accomplished in the following way. The first stream sends all of its data points to the rendering system and to all other reconstruction processes as a reference stream. The second stream uses the first stream as a reference stream, creating a differential stream which it also distributes to the other reconstruction processes as a reference stream. The third stream receives the first two streams as reference streams in order to create its differential stream, and so on, continuing until the last stream uses all other streams as reference streams (Figure 2(a)). This is the best possible interstream compression since it has no redundant points. The drawbacks to this approach, however, are severe. Most streams in this approach
require multiple reference streams, with at least one stream using all other streams as references. This dramatically increases computation requirements and makes realizing a real-time implementation very difficult. Also, the number of reference streams broadcast is dependent on the number of streams. Thus, the network bandwidth required will increase as the number of streams increases, limiting scalability of the 3D cameras.
3.2.3 Single Reference Stream Compression. Another approach is to use the main stream as the reference stream for all other streams (Figure 2(b)). This does not require additional network bandwidth as more streams are added since there is always only one reference stream. Real-time operation is feasible since all other streams are compared against only one reference stream. A main disadvantage of this approach is possibly poor data compression. The coherence between the main stream and the depth streams that use it as a reference stream will diminish as the viewpoints of the streams diverge. Furthermore, the depth streams from two nearby viewpoints may contain redundant points which are not removed by using the main stream as the only reference.
3.2.4 Nearest Neighbors as Reference Stream Compression. Another approach is for each depth stream to select the closest neighboring depth stream as the reference stream to achieve better compression. The streams can be linearly sorted such that neighboring streams in the list have viewpoints that are close to each other. From this sorted list of streams, the streams left of the main stream use the right neighboring stream as their reference stream, and the streams right of the main stream use the left neighboring stream as their reference stream (Figure 2(c)). With this scheme, every stream has one reference stream regardless of the total number of streams. The compression rate depends on the number of points that appear in nonneighboring streams but not in neighboring streams, since these points will be redundant in the final result. Since the streams are sorted by viewpoint, the number of redundant points in nonneighboring streams but not in neighboring streams should be small, which makes the compression comparable to the previously mentioned Best-Interstream Compression method. However, the network bandwidth demand for this compression scheme is high. For n streams there are generally n−2 reference streams to distribute, again limiting scalability of the 3D cameras.
3.3 Overview of Group-Based Real-Time Compression
Group-Based Real-Time Compression tries to balance compression efficiency and network bandwidth requirements by limiting the number of reference streams to a configurable limit and grouping streams together based on which of these streams serves as the best reference stream to use. All streams are divided into groups such that each stream is part of only one group. Each group has a center stream that is a representative of the group and substreams (i.e., all other streams in the group). Stream partitioning and center stream selection is done as a preprocessing step since the acquisition cameras do not move. The main stream and the center streams comprise the set of possible reference streams. Thus the number of reference streams distributed equals the number of groups created plus one—the main stream. A differential stream is created for each stream using the reference stream that will most likely yield the best compression. Since the number of reference streams is limited to the number of groups, new streams can be added without increasing the reference stream network traffic as long as the number of groups remains the same. Because the number of groups is a configurable system parameter, the amount of network traffic generated by distributing reference streams can be engineered to match the available network bandwidth. Also, each stream only uses one reference stream to create its differential frame, which makes real-time operation feasible. The difference between the algorithm presented in this article and the one from our earlier work [Kum et al. 2003] is that the center streams use the closest neighboring center stream as the reference stream, not the center stream of the main stream's group. This results in better compression for the center streams since, most of the time, closer streams have
more redundant points. The compression algorithm is described in more detail in Section 4. Section 5 details how streams are partitioned into groups and how the center stream for each group is selected.
4. STREAM COMPRESSION
This section details how depth streams are compressed in real-time. First, we detail how reference streams are selected for each stream, and then discuss how these streams are compressed using the selected reference stream.
4.1 Reference Stream Selection
In Group-Based Real-Time Compression, all depth streams are partitioned into disjoint groups. The number of groups created is determined by the network bandwidth. Each group has a center stream, which best represents the group, and substreams—depth streams in a group that are not the center stream.
Furthermore, one stream is selected as the main stream, for which no compression is done. The depth stream viewpoint with the shortest Euclidean distance to the user is chosen as the main stream since it best represents the user's viewpoint.
The group containing the main stream is called the main group and all other groups are referred to as subgroups. Once the main stream has been selected, the reference stream for each stream is selected as follows:
—For the main stream, no reference stream is needed.
—For the center stream of the main group, the main stream is used as the reference stream.
—For the center streams of the subgroups, the nearest center stream is used as the reference stream. The center streams can be linearly sorted such that neighboring center streams in the list have viewpoints that are close to each other. From this sorted list of center streams, the center streams left of the main stream use the right neighboring center stream as their reference stream, and the center streams right of the main stream use the left neighboring center stream as their reference stream. This differs from the algorithm given in Kum et al. [2003] since it does not use the center stream of the main group as the reference stream. It also compresses better than Kum et al. [2003] because two neighboring streams usually have more redundant points than two non-neighboring streams.
—For any other substream, the center stream of its group is used as the reference stream.
Figure 3 shows an example with 12 streams and 4 groups. Stream 5 is the main stream, which makes Group 2 the main group. Streams 1, 4, 7, and 10 are the center streams of their groups, and are numbered in sequential order. Since Stream 5 is the main stream, it does not have any reference streams. Stream 4 is the center stream of the main group and uses the main stream (Stream 5) as its reference stream. The center streams of the subgroups—Streams 1, 7, and 10—use the nearest center stream as the reference stream: Streams 1 and 7 use Stream 4, and Stream 10 uses Stream 7. All substreams use their group's center stream as their reference stream: Streams 2 and 3 of Group 1 use Stream 1, Stream 6 of Group 2 uses Stream 4, Streams 8 and 9 of Group 3 use Stream 7, and Streams 11 and 12 of Group 4 use Stream 10. The arrows show the direction of the reference stream distribution.
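The selection rules above can be summarized in a short sketch (our illustration, not the authors' code; groups and center streams are assumed to be given, sorted by viewpoint as described):

```python
def select_reference_streams(groups, centers, main):
    """Map each stream id to its reference stream id (None for the main stream).
    groups:  list of lists of stream ids, sorted by viewpoint
    centers: center stream id of each group, also sorted by viewpoint
    main:    id of the main stream
    """
    ref = {main: None}
    main_group = next(g for g, members in enumerate(groups) if main in members)
    for g, members in enumerate(groups):
        if g == main_group:
            ref[centers[g]] = main            # main group's center uses the main stream
        elif g < main_group:
            ref[centers[g]] = centers[g + 1]  # centers left of the main stream
        else:
            ref[centers[g]] = centers[g - 1]  # centers right of the main stream
        for s in members:                     # substreams use their group's center
            if s != centers[g] and s != main:
                ref[s] = centers[g]
    return ref

# Reproduces the Figure 3 example: 12 streams, 4 groups, Stream 5 as the main stream.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(select_reference_streams(groups, centers=[1, 4, 7, 10], main=5))
```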
4.2 Differential Stream Construction
To construct a differential stream, the data points of a depth stream are compared to the data points within the reference stream. Points that are within some given distance threshold are removed from the depth stream.
The format of the differential stream is different from the original stream format. The original stream has five bytes, three bytes for color and two bytes for depth, for each data point. The differential stream has five bytes for only the non-redundant points (i.e., points not removed) and a bitmask to indicate which points have been retained and which points have been eliminated. If the bit value is '0', then the data point represented by the bit is a redundant point and is removed. If the bit value is '1', the corresponding point is included. The order of data for nonredundant points is the same as the order in which they appear in the bitmask. This format reduces the size of a frame in the differential stream by 39 bits (five bytes minus one bit) for each redundant point and adds one bit for each non-redundant point. So for a depth stream of 640×480 resolution with a 5 to 1 redundancy ratio (i.e., 80% of data points are deemed redundant), the size of a frame for the stream is reduced from 1.536 MB to 346 KB, a compression ratio of approximately 4.4 to 1.
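A minimal sketch of this frame format (our illustration of the described layout, not the authors' wire format; the redundancy decisions are assumed to be already computed):

```python
def encode_differential_frame(points, redundant):
    """Pack a frame: a bitmask (1 bit per point, '1' = retained) followed by
    the 5-byte records of the retained points, in bitmask order."""
    bitmask = bytearray((len(points) + 7) // 8)
    payload = bytearray()
    for i, point in enumerate(points):        # point: 5 bytes (3 color + 2 depth)
        if not redundant[i]:
            bitmask[i // 8] |= 0x80 >> (i % 8)
            payload += point
    return bytes(bitmask) + bytes(payload)

# 640x480 frame with 80% redundancy: 1,536,000 bytes -> 345,600 bytes (~346 KB).
points = [b"\x00" * 5] * (640 * 480)
redundant = [i % 5 != 0 for i in range(len(points))]  # keep every 5th point
frame = encode_differential_frame(points, redundant)
print(len(frame))                                     # 38,400 + 307,200 = 345,600
```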
5. STREAM PARTITION
In this section, we present an algorithm for stream partitioning and center stream selection. As discussed in Section 4, the streams need to be partitioned into groups and the center stream of each group selected before runtime. Since reference stream selection is dependent on this stream partitioning process, it also affects stream compression efficiency. Therefore, streams should be partitioned into groups such that the most redundant points are removed. In Section 5.1, we present effective criteria to partition n streams into k groups and to find the appropriate center stream in each group. We show how these metrics can be used to partition the streams and to select center streams in Section 5.2. In Section 5.3, the metrics are used to develop an efficient approximate algorithm for stream partitioning and center stream selection when n is too large for an exhaustive approach.
5.1 Coherence Metrics
Stream partitioning and selection of center streams has a direct impact on compression since all substreams of a group use the center stream of the group as the reference stream. Therefore, the partitioning should ensure that each stream belongs to a group where the volume overlap between the stream and the group center stream is maximized.
However, exact calculation of the volume overlap between two streams is expensive. Thus, in this article, we use the angle between the view directions of two depth streams as an approximation of the overlapped volume. Empirically, the view directions of two streams are a good estimate for how much the two stream volumes overlap. The smaller the angle, the bigger the overlap. This is shown in Figure 4.
Fig. 4. Percentage of redundant points of a stream in the reference stream vs. the angle between the two streams. The streams are from the 3D camera configurations of Figure 9.
The local squared angle sum (LSAS) is defined for stream $S_i$ as the sum of the squared angles between stream $S_i$ and all other streams in its group (Eq. (1)). This is used as the center stream selection criterion. The stream with the lowest LSAS of the group is chosen to be the center stream.

$$\mathrm{LSAS}_i = \sum_{j=1}^{n_k} \big[\mathrm{angleof}(S_i, S_j)\big]^2 \tag{1}$$

where streams $S_i$ and $S_j$ are in group $k$, and $n_k$ is the number of streams in group $k$.

The group squared angle sum (GSAS), defined for a given group, is the sum of the squared angles between the group's center stream and every substream in the group (Eq. (2)). This is used as the partitioning criterion for partitioning $n$ streams into $k$ groups. The sum of all GSASs for a particular partition (Eq. (3)) is defined as the total squared angle sum (TSAS). We are seeking the partition that minimizes TSAS.

$$\mathrm{GSAS}_j = \sum_{i=1}^{n_j} \big[\mathrm{angleof}(C_j, S_{ji})\big]^2 \tag{2}$$

where $C_j$ is the center stream in group $j$, $S_{ji}$ is a substream in group $j$, and $n_j$ is the number of substreams in group $j$.

$$\mathrm{TSAS} = \sum_{i=1}^{k} \mathrm{GSAS}_i \tag{3}$$

where $k$ is the number of groups.
Finally, the central squared angle sum (CSAS) is defined as the sum of the squared angles between all center streams (Eq. (4)). The streams should be partitioned such that CSAS is also minimal, since all center streams use each other as references. However, it should be noted that minimizing TSAS is