METADATA TOOLS FOR DIGITAL MOTION PICTURE ARCHIVES

Ilkka Juuso

Juuso, I. (2009) Metadata Tools for Digital Motion Picture Archives. University of Oulu, Department of Electrical and Information Engineering, 112 p.

ABSTRACT

Most video information retrieval systems today rely on some set of computationally extracted video and/or audio features, which may be complemented with manually created annotation that is usually either arduous to create or insufficient for capturing the content. This thesis looks at the specific domain of motion pictures to identify the computational features relevant to films and, moreover, to investigate the use of actual motion picture planning documentation as a source of high-quality annotation and metadata on films. The goal is to enable more advanced content-based retrieval of films without proportionately increasing the amount of manual annotation work.

The underlying research includes a study of the concepts involved in film grammar and narrative structure, and a series of empirical viewer tests through which a key set of metadata describing the content and structure of motion picture material is identified. Through these activities the requirements for an end-to-end framework for harvesting and utilizing motion picture metadata are developed. The framework includes a range of tools for processing the motion picture documentation into metadata descriptions, a novel browser through which films and their metadata can be explored, and the metadata models used by these tools.

Keywords: motion pictures, content-based information retrieval, metadata, script, storyboards

Juuso, I. (2009) Metadatatyökaluja digitaalisiin elokuva-arkistoihin. University of Oulu, Department of Electrical and Information Engineering, 112 p.

TIIVISTELMÄ

Most current video retrieval systems are based on various types of features computed automatically from video and/or audio data, and the information these features provide can be supplemented with a limited amount of manually created annotation. This Master's thesis examines motion pictures as retrieval material and seeks to identify the features that are meaningful in the context of films and to study the wider use of the documents arising from the film production process as a source of high-quality metadata. The goal is to enable the development of more advanced content-based film retrieval systems without significantly increasing the need for manual annotation.

The work includes a review of the concepts related to cinematic expression and film structure, and a series of viewer tests through which the key metadata elements describing the content and structure of a film are identified. On the basis of these stages, requirements are defined for a process covering the entire chain of metadata collection and utilization, and for the tools it requires. The toolkit developed contains tools both for processing the documents of the film production process and refining them into metadata, and for browsing the resulting metadata and the film itself. In addition, the toolkit contains descriptions of the data models used by these tools.

Keywords: motion pictures, content-based retrieval, metadata, script, storyboard

TABLE OF CONTENTS

ABSTRACT
TIIVISTELMÄ
TABLE OF CONTENTS
PREFACE
LIST OF ABBREVIATIONS
1 INTRODUCTION
2 DIGITAL MEDIA ASSET MANAGEMENT
  2.1 MULTIPLE STANDPOINTS AND SOLUTIONS
    2.1.1 Libraries
    2.1.2 Media Production Companies
    2.1.3 Media Consumption Portals
  2.2 METADATA DESCRIPTION LANGUAGES
    2.2.1 The Extensible Markup Language (XML)
    2.2.2 MPEG-7
    2.2.3 Dublin Core
    2.2.4 Conclusions
  2.3 DATABASE TECHNOLOGIES
  2.4 DATA PROTECTION AND DISTRIBUTION
3 MOTION PICTURE CONTENT CREATION
  3.1 FILM GRAMMAR
  3.2 THE PRODUCTION STAGES
  3.3 THE SOURCE DOCUMENTS FOR FILM PRODUCTION
    3.3.1 The Script
    3.3.2 Storyboards
  3.4 TOOLS FOR COMPUTER-AIDED CONTENT PRODUCTION
    3.4.1 Script Tools
    3.4.2 Storyboard Tools
4 CONTENT-BASED ANALYSIS OF MOTION PICTURES
  4.1 BASIC METADATA
  4.2 TRADITIONAL AUDIO AND VIDEO ANALYSIS
  4.3 FILM GRAMMAR
  4.4 NARRATIVE STRUCTURE ON THE SCENE-LEVEL
  4.5 NARRATIVE STRUCTURE ABOVE THE SCENE-LEVEL
5 REQUIREMENTS ANALYSIS
  5.1 WHAT THE VIEWER CONVEYS OF THE MOVIE
    5.1.1 Test I
    5.1.2 Test II
    5.1.3 Results
  5.2 SYSTEM REQUIREMENTS
    5.2.1 Metadata Model
    5.2.2 Metadata Creation Suite
    5.2.3 MetadataBrowser
6 DESIGN OF THE SYSTEM
  6.1 GENERAL OVERVIEW
  6.2 METADATA MODEL
    6.2.1 Content
    6.2.2 Hierarchy
    6.2.3 Analysis
  6.3 METADATA CREATION SUITE
    6.3.1 Material Preparation
    6.3.2 Material Synchronization
    6.3.3 Metadata Harvesting
  6.4 METADATABROWSER
7 TESTING OF THE SYSTEM
  7.1 METADATA CREATION SUITE
    7.1.1 ScriptTagger
    7.1.2 StoryLinker
    7.1.3 Scene Hierarchy Editor
    7.1.4 Database Manager
  7.2 METADATABROWSER
  7.3 DISCUSSION
8 DISCUSSION
  8.1 NOVELTY AND SIGNIFICANCE OF THE WORK
  8.2 POINTS OF CRITICISM
  8.3 FURTHER DEVELOPMENT
9 SUMMARY
10 REFERENCES
11 APPENDICES

PREFACE

This Master's Thesis is the result of work conducted at the MediaTeam Oulu research group of the Department of Electrical and Information Engineering, at the University of Oulu. The results of this study were utilized in the Vikings project, and are currently set to be adapted for the LICHEN project. I would like to take this opportunity to thank the large number of people who have contributed to this research by offering comments, encouragement and in some cases even software components or data.

First and foremost, I want to thank Professor Tapio Seppänen, whose long-term commitment and openness to new ideas have been instrumental for the successful completion of this study. I would also like to thank Professor Timo Ojala, who despite his busy schedule found time to be my second reviewer.

Secondly, I gratefully acknowledge the collaboration of Juha Lilja, Terttu Kortelainen and Vesa Suominen from the Department of Information Studies, Pertti Väyrynen and Elokuvateatteri Star in designing and implementing the viewer tests preceding the actual tools development. Likewise, I am greatly indebted to Timo Koskela, Tiia Sutinen, Kai Noponen, Eero Väyrynen, Mika Rautiainen, Jialin Liu and Timo Mäkinen for their insight and collaboration in developing some of the tools discussed in this thesis. Furthermore, I wish to extend my gratitude to my colleagues at the Faculty of Humanities, most of all to Dr Lisa Lena Opas-Hänninen, whose enthusiasm for meshing together humanities and engineering is not only inspiring but also multidisciplinarity in its truest form.

Thirdly, I would like to thank my friends and relatives for support, reprieve and generally making the world a nicer place.

Finally, I want to thank my parents, my sister Anna-Maija and my wife Terhi for giving it all meaning. I humbly bow to you all, and thank the Academy.

In Oulu, on the 20th of November 2009,

Ilkka Juuso

LIST OF ABBREVIATIONS

AACS    Advanced Access Content System
AAF     Advanced Authoring Format
AES     Advanced Encryption Standard
ANSI    American National Standards Institute
ASR     Automatic Speech Recognition techniques
B2B     Business-to-Business
BBC     British Broadcasting Corporation
CGI     Computer Generated Imagery
CNN     Cable News Network
CSS     Content Scrambling System
DCMI    Dublin Core Metadata Initiative
DDL     Description Definition Language
DRM     Digital Rights Management
DR-TTD  Double-Ring Take-Transition-Diagram
DTV     Digital Television
DVB     Digital Video Broadcasting
DVD     Digital Versatile Disc or Digital Video Disc
EBU     European Broadcasting Union
EPG     Electronic Program Guide
ETSI    European Telecommunications Standards Institute
EXT     Exterior
FIAF    International Federation of Film Archives
HD-DVD  High Definition Digital Video Disc
HH      Household
HTML    Hyper Text Markup Language
IEC     International Electrotechnical Commission
IFLA    International Federation of Library Associations and Institutions
IMCE    Integrated Media Creation Environment
IMDb    Internet Movie Database
INT     Interior
ISAN    International Standard Audio-visual Number
ISO     International Organization for Standardization
ISP     Internet Service Providers
LICHEN  The Linguistic and Cultural Heritage Electronic Network
M2M     Machine-to-Machine
MARC    Machine-Readable Cataloging Record
MCDI    Multimedia Content Description Interface
MHP     Multimedia Home Platform
MPA     Motion Picture Association
MPAA    Motion Picture Association of America
MXF     Material eXchange Format
NIST    National Institute of Standards and Technology
NVOD    Near-Video-on-Demand
OMA     Open Mobile Alliance
OWL     OWL Web Ontology Language
P2P     Peer-to-Peer
PVR     Personal Video Recorder
QoS     Quality-of-Service
RDA     Resource Description and Access
RDF     Resource Description Framework
RTF     Rich Text Format
SGML    Standard Generalized Markup Language
SI      Service Information
SMEF    Standard Media Exchange Framework
SMPTE   Society of Motion Picture and Television Engineers
TXT     Text Format
UI      User Interface
VHS     Video Home System
V-ISAN  Version identifier for audio-visual works
VOD     Video-on-Demand
VTT     Valtion Teknillinen Tutkimuskeskus
W3C     World Wide Web Consortium
XML     eXtensible Markup Language

1 INTRODUCTION

Bridging the semantic gap, i.e. the void between low-level computational features and the high-level information needs of the user, has been the stated objective of the information retrieval community since the latter part of the 1990s. Despite this, and despite significant progress in academic research in fields such as video information retrieval, little impact of that research is visible on a large scale in everyday life [1][2]. At the same time, the digitization of media creation and use, in both professional and home environments, has meant that not only is there a wealth of media content available but it is also produced and consumed in a myriad of new contexts.

In the face of growing media repositories, the likelihood of any particular media item floating to the top of the search hits list and reaching its consumer will increasingly depend on the quality of its metadata (i.e. data about the data). This promotion of metadata to a level comparable with advertising calls for the evaluation and further development of current information retrieval and management methods, particularly if content repurposing, i.e. consumers paying to reuse commercial material in their own works, takes off the way some experts are anticipating [2][3][4].

Most of the academic and enterprise-level commercial video information retrieval systems today rely on some set of computationally extracted video and/or audio features, which may be complemented with manually created annotation that is usually either arduous to create or insufficient for capturing the content.
The vast amounts of content often at hand, the experimental stage of many automatic feature extraction methods and the prohibitive cost of extensive manual annotation have hindered the large-scale deployment of such sophisticated video information retrieval and management systems. Faced with such challenges, many consumer-level commercial video and multimedia retrieval services have looked for ways to circumvent raw video and audio analysis by means of utilizing keywords or other human-assigned textual descriptions.

Many consumer-level Video-on-Demand (VOD) services, particularly those offered through digital satellite television such as Dish¹, but also services available on the internet, such as The Internet Archive² or SF Anytime³, offer very little search capability at all, but rather provide browsing interfaces with lists and short descriptions of the content they offer. These services, with the exception of the Internet Archive, which is a digital library archiving project, are continuations of pay-per-view or traditional video rental models and therefore offer much the same type of television and motion picture content as their predecessors. However, a number of other VOD services, such as youtube.com and Soapbox, have also sprung up to offer consumers the chance to share their videos on the internet, and these services often match the search capabilities of their more traditional counterparts.⁴

¹ http://www.dishnetwork.com
² http://www.archive.org
³ http://www.sf-anytime.com
⁴ Interestingly, the emergence of such services where users themselves comment on, classify and annotate content has raised the idea of "folksonomy", i.e. Wikipedia-style collaboratively created labeling of content, as a possible solution to the organization of at least some portion of the large volume of data being added on the internet daily. In the future folksonomy could provide valuable insight into both user-originated and repurposed commercial content.

It is noteworthy that even the best-known and most widely used web search engines, such as Google and Yahoo!, currently follow the above-mentioned approach and use only text to index video and images [1]. Other systems leverage knowledge of their application context and of media-type-specific structures or conventions to gain greater access into, for example, news and sports material [5][6]. These systems are a good step towards an intuitive and truly content-based retrieval tool that facilitates textual querying of speech content and enables event-level access to audio/video cued events, but in terms of dramatic content and narrative structure they are still in their infancy.

This thesis studies the possibility of using the previously untapped resource of television and motion picture planning documentation, such as the script and the storyboards, as sources of metadata in order to gain further insight into dramatic content. This planning documentation has the potential of offering detailed, yet no-nonsense information on not only what the film or show is about, but also how it is composed and how it conveys the story. In effect, the documentation could serve as a read-through of the entire program. It is also reasonable to expect that this documentation is usually of high quality, particularly in the case of motion picture productions, which often have budgets of tens or hundreds of millions of dollars and therefore have to be scripted extremely carefully to ensure that everyone involved in the production knows what they are working towards and what is expected of them.

The underlying research includes a study of the concepts involved in narrative structure and the instruments used by filmmakers to package and convey information to their viewers, and a series of empirical tests designed to reveal the kinds of properties viewers notice in films and later use to describe them.
The purpose of this research was to arrive at a key set of metadata which models films with the same concepts that viewers use in describing them, but which is at the same time feasible to harvest from the planning documentation and from the finished film created by the filmmakers. Based on the model, a suite of metadata creation tools and a novel graphical browsing tool for motion picture material were developed. The author of this thesis was the co-designer of the empirical tests and the lead designer and implementer of the metadata creation tools, the browser and the data models used by them.

The structure of the remainder of this thesis is as follows. Section 2 introduces the concept of digital media asset management. Section 3 gives a brief overview of film grammar and the production of motion pictures. Section 4 looks at the approaches taken in the automatic and semi-automatic content analysis of motion pictures. Section 5 describes the empirical viewer tests and the requirements set for the metadata model, the tools and the browser. Section 6 presents the design of the metadata model, the tools and the browser. Section 7 provides an overview of the testing of the system. The study is brought to a conclusion in Sections 8 and 9, with Section 8 providing an evaluation of the work conducted and Section 9 providing a summary of the study.