
Advanced Video Coding: Principles and Techniques


ADVANCES IN IMAGE COMMUNICATION 7

Series Editor: J. Biemond, Delft University of Technology, The Netherlands

Volume 1: Three-Dimensional Object Recognition Systems (edited by A.K. Jain and P.J. Flynn)
Volume 2: VLSI Implementations for Image Communications (edited by P. Pirsch)
Volume 3: Digital Moving Pictures - Coding and Transmission on ATM Networks (J.-P. Leduc)
Volume 4: Motion Analysis for Image Sequence Coding (G. Tziritas and C. Labit)
Volume 5: Wavelets in Image Communication (edited by M. Barlaud)
Volume 6: Subband Compression of Images: Principles and Examples (T.A. Ramstad, S.O. Aase and J.H. Husøy)
Volume 7: Advanced Video Coding: Principles and Techniques (K.N. Ngan, T. Meier and D. Chai)

Advanced Video Coding: Principles and Techniques
King N. Ngan, Thomas Meier and Douglas Chai
University of Western Australia, Dept. of Electrical and Electronic Engineering, Visual Communications Research Group, Nedlands, Western Australia 6907

1999
Elsevier
Amsterdam - Lausanne - New York - Oxford - Shannon - Singapore - Tokyo

ELSEVIER SCIENCE B.V.
Sara Burgerhartstraat 25
P.O. Box 211, 1000 AE Amsterdam, The Netherlands

© 1999 Elsevier Science B.V. All rights reserved.

This work is protected under copyright by Elsevier Science, and the following terms and conditions apply to its use:

Photocopying
Single photocopies of single chapters may be made for personal use as allowed by national copyright laws. Permission of the Publisher and payment of a fee is required for all other photocopying, including multiple or systematic copying, copying for advertising or promotional purposes, resale, and all forms of document delivery. Special rates are available for educational institutions that wish to make photocopies for non-profit educational classroom use. Permissions may be sought directly from Elsevier Science Rights & Permissions Department, PO Box 800, Oxford OX5 1DX, UK; phone: (+44) 1865 843830, fax: (+44) 1865 853333, e-mail: [email protected]. You may also contact Rights & Permissions directly through Elsevier's home page (http://www.elsevier.nl), selecting first 'Customer Support', then 'General Information', then 'Permissions Query Form'. In the USA, users may clear permissions and make payments through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA; phone: (978) 7508400, fax: (978) 7504744, and in the UK through the Copyright Licensing Agency Rapid Clearance Service (CLARCS), 90 Tottenham Court Road, London W1P 0LP, UK; phone: (+44) 171 631 5555; fax: (+44) 171 631 5500. Other countries may have a local reprographic rights agency for payments.

Derivative Works
Tables of contents may be reproduced for internal circulation, but permission of Elsevier Science is required for external resale or distribution of such material. Permission of the Publisher is required for all other derivative works, including compilations and translations.

Electronic Storage or Usage
Permission of the Publisher is required to store or use electronically any material contained in this work, including any chapter or part of a chapter. Except as outlined above, no part of this work may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the Publisher. Address permissions requests to: Elsevier Science Rights & Permissions Department, at the mail, fax and e-mail addresses noted above.
Notice
No responsibility is assumed by the Publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein. Because of rapid advances in the medical sciences, in particular, independent verification of diagnoses and drug dosages should be made.

First edition 1999

Library of Congress Cataloging in Publication Data
A catalog record from the Library of Congress has been applied for.

ISBN: 0444 82667 X

The paper used in this publication meets the requirements of ANSI/NISO Z39.48-1992 (Permanence of Paper).

Printed in The Netherlands.

To Nerissa, Xixiang, Simin, Siqi
To Elena
To June

Preface

The rapid advancement in computer and telecommunication technologies is affecting every aspect of our daily lives. It is changing the way we interact with each other and the way we conduct business, and it has a profound impact on the environment in which we live. Increasingly, we see the boundaries between computer, telecommunication and entertainment blurring as the three industries become more integrated with each other. Nowadays, one no longer uses the computer solely as a computing tool, but often as a console for video games and movies, and increasingly as a telecommunication terminal for fax, voice or videoconferencing. Similarly, the traditional telephone network now supports a diverse range of applications such as video-on-demand, videoconferencing, the Internet, etc.

One of the main driving forces behind the explosion in information traffic across the globe is the ability to move large chunks of data over the existing telecommunication infrastructure. This is made possible largely due to the tremendous progress achieved by researchers around the world in data compression technology, in particular for video data. This means that for the first time in human history, moving images can be transmitted over long distances in real-time, i.e., at the same time as the event unfolds at the sender's end.

Since the invention of image and video compression using DPCM (differential pulse-code modulation), followed by transform coding, vector quantization, subband/wavelet coding, fractal coding, object-oriented coding and model-based coding, the technology has matured to a stage where various coding standards have been promulgated to enable interoperability of equipment from different manufacturers implementing the standards. This promotes the adoption of the standards by the equipment manufacturers and popularizes the use of the standards in consumer products.

JPEG is an image coding standard for compressing still images according to a compression/quality trade-off. It is a popular standard for image exchange over the Internet. For video, MPEG-1 caters for storage media up to a bit rate of 1.5 Mbits/s; MPEG-2 is aimed at video transmission of typically 4-10 Mbits/s, but it can also go beyond that range to include HDTV (high-definition TV) images. At the lower end of the bit-rate spectrum, there is H.261 for videoconferencing applications at p × 64 Kbits/s, where p = 1, 2, ..., 30, and H.263, which can transmit at bit rates of less than 64 Kbits/s, clearly aiming at the videophony market.
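As a small worked illustration of the p × 64 Kbits/s structure just mentioned, the sketch below lists a few points on the H.261 rate ladder. The function name and the annotated channel descriptions are purely illustrative; only the p × 64 rule itself comes from the standard as described above.

```python
# Minimal sketch: H.261 target bit rates are multiples of 64 kbit/s,
# with the multiplier p ranging from 1 to 30.
def h261_rate_kbits(p: int) -> int:
    """Return the H.261 channel rate in kbit/s for a given multiplier p."""
    if not 1 <= p <= 30:
        raise ValueError("H.261 defines p in the range 1..30")
    return p * 64

for p in (1, 6, 30):
    print(f"p = {p:2d} -> {h261_rate_kbits(p)} kbit/s")
# p =  1 -> 64 kbit/s    (a single 64 kbit/s channel, the videophony end)
# p =  6 -> 384 kbit/s   (a typical videoconferencing rate)
# p = 30 -> 1920 kbit/s  (the upper end of the H.261 range)
```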
The standards above have a number of commonalities: firstly, they are based on a predictive/transform coder architecture, and secondly, they process video images as rectangular frames.

These commonalities place severe constraints as the demand for greater variety of, and access to, video content increases. Much of the information content encountered in daily life is multimedia, comprising sound, video, graphics, text and animation. Standards have to evolve to integrate and code such multimedia content. The concept of video as a sequence of rectangular frames displayed in time is outdated, since video nowadays can be captured in different locations and composed into a composite scene. Furthermore, video can be mixed with graphics and animation to form a new video, and so on. The new paradigm is to view video content as an audiovisual object which, as an entity, can be coded, manipulated and composed in whatever way an application requires.

MPEG-4 is the emerging standard for the coding of multimedia content. It defines a syntax for a set of content-based functionalities, namely, content-based interactivity, compression and universal access. However, it does not specify how the video content is to be generated. The process of video generation is difficult and under active research. One simple way is to capture the visual objects separately, as is done in TV weather reports, where the weather map is captured separately and then composed together with the reporter standing in front of it. The problem is that this is not always possible, as in the case of outdoor live broadcasts. Therefore, automatic segmentation has to be employed to generate the visual content in real-time for encoding. The visual content is segmented into semantically meaningful objects known as video object planes. A video object plane is then tracked, making use of the temporal correlation between frames, so that its location is known in subsequent frames. Encoding can then be carried out using MPEG-4.

This book addresses the more advanced topics in video coding not included in most of the video coding books on the market. The focus of the book is on the coding of arbitrarily shaped visual objects and its associated topics. It is organized into six chapters: Image and Video Segmentation (Chapter 1), Face Segmentation (Chapter 2), Foreground/Background Coding (Chapter 3), Model-based Coding (Chapter 4), Video Object Plane Extraction and Tracking (Chapter 5), and MPEG-4 Video Coding Standard (Chapter 6).

Chapter 1 deals with image and video segmentation. It begins with a review of Bayesian inference and Markov random fields, which are used in the various techniques discussed throughout the chapter. An important component of many segmentation algorithms is edge detection. Hence, an overview of some edge detection techniques is given. The next section deals with low-level image segmentation involving morphological operations and Bayesian approaches. Motion is one of the key parameters used in video segmentation, and its representation is introduced in Section 1.4. Motion estimation and some of its associated problems, such as occlusion, are dealt with in the following section. In the last section, video segmentation based on motion information is discussed in detail.

Chapter 2 focuses on the specific problem of face segmentation and its applications in videoconferencing. The chapter begins by defining the face segmentation problem, followed by a discussion of the various approaches along with a literature review. The next section discusses a particular face segmentation algorithm based on a skin color map. Results show that this approach is capable of segmenting facial images regardless of facial color, and that it provides a fast and reliable method for face segmentation suitable for real-time applications. The face segmentation information is exploited in a video coding scheme, described in the next chapter, where the facial region is coded with a higher image quality than the background region.
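As a rough illustration of the skin-color idea, the following minimal sketch classifies pixels by thresholding their chrominance. It is not the algorithm derived in Chapter 2: the Cb/Cr ranges, the function name and the QCIF-sized test frame are illustrative assumptions only.

```python
import numpy as np

def skin_mask(ycbcr: np.ndarray,
              cb_range=(77, 127), cr_range=(133, 173)) -> np.ndarray:
    """Return a boolean mask of skin-colored pixels.

    `ycbcr` is an (H, W, 3) array in Y/Cb/Cr order. The Cb/Cr ranges here are
    illustrative placeholders; Chapter 2 derives its skin color map from data.
    """
    cb = ycbcr[..., 1]
    cr = ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))

# Toy usage on a random QCIF-sized frame; a real system would follow this with
# regularization (e.g. density checks and connected-component analysis) to
# reject skin-colored background regions.
frame = np.random.randint(0, 256, size=(144, 176, 3), dtype=np.uint8)
mask = skin_mask(frame)
print("skin-colored pixels:", int(mask.sum()))
```

Because the classification uses only the chrominance components and ignores luminance, a map of this kind is largely insensitive to differences in facial color, which is the property the preface alludes to above.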
Chapter 3 describes the foreground/background (F/B) coding scheme, where the facial region (the foreground) is coded with more bits than the background region. The objective is to achieve an improvement in the perceptual quality of the region of interest, i.e., the face, in the encoded image. The F/B coding algorithm is integrated into the H.261 coder with full compatibility, and into the H.263 coder with slight modifications of its syntax. Rate control in the foreground and background regions is also investigated using the concept of joint bit assignment. Lastly, the MPEG-4 coding standard is studied in the context of the foreground/background coding scheme.

As mentioned above, multimedia content can contain synthetic objects, or objects which can be represented by synthetic models. One such model is the 3-D wire-frame model (WFM) consisting of 500 triangles, commonly used to model the human head and body. Model-based coding is the technique used to code such synthetic wire-frame models. Chapter 4 describes the procedure involved in model-based coding of a human head. In model-based coding, the most difficult problem is the automatic location of the object in the image. The object location is crucial for accurate fitting of the 3-D WFM onto the physical object to be coded. The techniques employed for automatic extraction of facial feature contours are active contours (or snakes) for face profile and eyebrow extraction, and deformable templates for eye and mouth extraction. For synthesis of the facial image sequence, head motion parameters and facial expression parameters need to be estimated. At the decoder, the facial image sequence is synthesized using the facial structure deformation method, which deforms the structure of the 3-D WFM to simulate facial expressions. Facial expressions can be represented by 44 action units, and the deformation of the WFM is carried out through the movement of vertices according to the deformation rules defined by the action units. Facial texture is then updated to improve the quality of the synthesized images.

Chapter 5 addresses the extraction of video object planes (VOPs) and their subsequent tracking. An intrinsic problem of video object plane extraction is that objects of interest are not homogeneous with respect to low-level features such as color, intensity, or optical flow. Hence, conventional segmentation techniques will fail to obtain semantically meaningful partitions. The most important cue exploited by most VOP extraction algorithms is motion. This chapter describes in detail an algorithm which makes use of motion information in successive frames to separate foreground objects from the background and to track them subsequently. The main hypothesis underlying this approach is the existence of a dominant global motion that can be assigned to the background. Areas in the frame that do not follow this background motion indicate the presence of independently moving physical objects, which can be characterized by a motion that is different from the dominant global motion. The algorithm consists of the following stages: global motion estimation, object motion detection, model initialization, object tracking, model update and VOP extraction. Two versions of the algorithm are presented, the main difference lying in the object motion detection stage. Version I uses morphological motion filtering, whilst Version II employs change detection masks to detect the object motion. Results are shown to illustrate the effectiveness of the algorithm.
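As a rough sketch of the change-detection idea behind Version II, the snippet below marks pixels whose intensity changes significantly between two frames. It is a simplification, not the method of Chapter 5: the fixed threshold, the toy frames, and the omission of global motion compensation and mask cleanup are all assumptions made for illustration.

```python
import numpy as np

def change_detection_mask(prev_frame: np.ndarray,
                          curr_frame: np.ndarray,
                          threshold: float = 20.0) -> np.ndarray:
    """Mark pixels whose intensity change between two frames exceeds a threshold.

    In the scheme described in Chapter 5, the previous frame would first be
    warped by the estimated global (background) motion, so that only
    independently moving objects produce large frame differences; that step,
    and the subsequent relaxation/cleanup of the mask, are omitted here.
    """
    diff = np.abs(curr_frame.astype(np.float64) - prev_frame.astype(np.float64))
    return diff > threshold

# Toy usage: a bright square "object" moves two pixels to the right.
prev = np.zeros((64, 64)); prev[20:30, 20:30] = 200
curr = np.zeros((64, 64)); curr[20:30, 22:32] = 200
mask = change_detection_mask(prev, curr)
print("changed pixels:", int(mask.sum()))  # covers the uncovered and newly covered areas
```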
The last chapter of the book, Chapter 6, contains a description of the MPEG-4 standard. It begins with an explanation of the MPEG-4 development process, followed by a brief description of the salient features of MPEG-4 and an outline of the technical description. Coding of audio objects, including natural sound and synthesized sound coding, is detailed in Section 6.5. The next section, Coding of Natural Textures, Images and Video, contains the main part of the chapter and is extracted from the MPEG-4 Video Verification Model 11. This section gives a succinct explanation of the various techniques employed in the coding of natural images and video, including shape coding, motion estimation and compensation, prediction, texture coding, scalable coding, sprite coding and still image coding. The following section gives an overview of the coding of synthetic objects; the approach adopted there is similar to that described in Chapter 4. In order to handle video transmission in error-prone environments such as mobile channels, MPEG-4 has incorporated error resilience functionality into the standard. The last section of the chapter describes the error-resilient techniques used in MPEG-4 for video transmission over mobile communication networks.

King N. Ngan
Thomas Meier
Douglas Chai
June 1999

Acknowledgments

The authors would like to thank Professor K. Aizawa of the University of Tokyo, Japan, for the use of the "Makeface" 3-D wireframe synthesis software package, from which some of the images in Chapter 4 are obtained.
