Integration of Natural Language and Vision Processing
Computational Models and Systems

Edited by Paul Mc Kevitt
Dept. of Computer Science, University of Sheffield, U.K.

Reprinted from Artificial Intelligence Review Volume 8, Nos. 2-3 and 5-6, 1994-1995

Table of Contents

About the Authors
Preface 7
RYUICHI OKA / The Real World Computing Program 13
SEAN Ó NUALLAIN and ARNOLD G. SMITH / An Investigation into the Common Semantics of Language and Vision 21
RYUICHI OKA / Hierarchical Labelling for Integrating Images and Words 31
PATRICK OLIVIER and JUN-ICHI TSUJII / Quantitative Perceptual Representation of Prepositional Semantics 55
WOLFGANG MAAß / From Vision to Multimodal Communication: Incremental Route Descriptions 67
GERD HERZOG and PETER WAZINSKI / VIsual TRAnslator: Linking Perceptions and Natural Language Descriptions 83
HANS-HELLMUT NAGEL / A Vision of 'Vision and Language' Comprises Action: An Example from Road Traffic 97
YURI A. TIJERINO, SHINJI ABE, TSUTOMU MIYASATO, and FUMIO KISHINO / What You Say is What You See - Interactive Generation, Manipulation and Modification of 3-D Shapes Based on Verbal Descriptions 123
BRIGITTE DORNER and ELI HAGEN / Towards an American Sign Language Interface 143
SUZANNE LIEBOWITZ TAYLOR, DEBORAH A. DAHL, MARK LIPSHUTZ, CARL WEIR, LEWIS M. NORTON, ROSLYN WEIDNER NILSON, and MARCIA C. LINEBARGER / Integrating Natural Language Understanding with Document Structure Analysis 163
ROHINI K. SRIHARI / Computational Models for Integrating Linguistic and Visual Information: A Survey 185
JEFFREY MARK SISKIND / Grounding Language in Perception 207
NIALL GRIFFITH / Connectionist Visualisation of Tonal Structure 229
ROHINI K. SRIHARI / Use of Captions and Other Collateral Text in Understanding Photographs 245
R. M. C. AHN, R. J. BEUN, T. BORGHUIS, H. C. BUNT and C. W. A. M.
VAN OVERVELD / The DenK-architecture: A Fundamental Approach to User-Interfaces 267
ANDREW CSINGER, KELLOGG S. BOOTH and DAVID POOLE / AI Meets Authoring: User Models for Intelligent Multimedia 283

Book Reviews:
Sean Ó Nuallain, The Search for Mind: A New Foundation for Cognitive Science (NADINE LUCAS and JEAN-BAPTISTE BERTHELIN) 305
Derek Partridge and John Rowe, Computers and Creativity (STUART A. JACKSON) 307
Nigel Ward, A Connectionist Language Generator (AMANDA J. C. SHARKEY) 308

About the Authors

Shinji Abe obtained both his B.E. and M.E. in Nuclear Engineering from the Hokkaido University in Japan in 1984 and 1986, respectively. In 1986, he joined the Human Interface Laboratories of Nippon Telegraph and Telephone Corporation, where he was involved in research and development of video applications. Currently he is a senior researcher at the Artificial Intelligence Department of ATR Communication Systems Research Laboratories, which he joined in 1993. His primary research interests are in Qualitative Reasoning about Physical Systems including Mathematical Model Generation and the System Identification Problem.

Rene Ahn obtained a Master's degree in theoretical physics from the Eindhoven University of Technology. He worked for six years as a software scientist at Philips Research Laboratories and is currently employed at the Institute for Language Technology and Artificial Intelligence (ITK). His main interests are in automatic reasoning and cognitive modelling.

Robbert-Jan Beun is working as a researcher at the Institute of Perception Research (IPO) in Eindhoven. He received his Master's degree in electrical engineering from the Eindhoven University of Technology (1983) and his Ph.D. in linguistic pragmatics at the University of Tilburg (1989). His main interests are in dialogue modelling, discourse analysis and formal pragmatics.

Kellogg S. Booth is Professor of Computer Science and Director of the Media and Graphics Interdisciplinary Centre at the University of British Columbia. Prior to that, he was Professor of Computer Science and Director of the Institute for Computer Research at the University of Waterloo, where he was on the faculty from 1977 to 1990; he was also a member of the research staff in the Computation Department of the Lawrence Livermore National Laboratory, where he worked in the computer graphics group from 1968 to 1976. His research interests include high performance graphics workstations, scientific visualization, computer animation, user interface design and analysis of algorithms. He received his B.S. in mathematics from Caltech in 1968 and his M.A. and Ph.D. in computer science from UC Berkeley in 1970 and 1975. He is a member of the Canadian Human-Computer Communications Society, the IEEE Computer Society, and the ACM. He served as chair of ACM SIGGRAPH, the Association for Computing Machinery's Special Interest Group on Computer Graphics, from 1985 to 1989 and then as past chair from 1989 to 1993. Dr Booth is a consultant to government and industry in the US and Canada on computer graphics and related areas of computer science. He has written over fifty technical papers and journal articles and serves as a referee for a number of journals, conferences and granting agencies.

Tijn Borghuis received his Master's degree in philosophy from the University of Amsterdam in 1989. Since 1990 he has been working on his Ph.D. project on the interpretation of modalities in typed lambda calculus, at the Faculty of Mathematics and Computing Science of the Eindhoven University of Technology. His current interests (besides completing his thesis) are type theory and intensional logic.

Harry Bunt graduated in physics (Utrecht) in 1969 and obtained a doctorate in Linguistics in 1981 with a dissertation on formal semantics (Amsterdam). He conducted research in Artificial Intelligence at Philips Research Laboratories from 1970 to 1983. Since 1983 he has been Professor of Computational Linguistics at Tilburg University. Since its foundation in 1988 he has been Director of the Institute for Language Technology and Artificial Intelligence (ITK). He initiated and directs the DenK project.

Andrew Csinger is a Ph.D. candidate at the Department of Computer Science of the University of British Columbia. He received his Bachelor's degree in Electrical Engineering from McGill University in 1985, and his M.Sc. in Computer Science from UBC in 1990. Several years in industry, first as software designer and then as consultant, convinced him that a retreat to academia was necessary. His research interests include the application of artificial intelligence techniques to problems in the real world. Andrew believes that computer systems must begin to shoulder some of the responsibility for effective interaction, and that user-modelling in intelligent multimedia interfaces is a promising direction.

Deborah A. Dahl received her Ph.D. in linguistics from the University of Minnesota in 1984. After completing a post-doctoral fellowship in cognitive science at the University of Pennsylvania, Dr. Dahl joined the Natural Language Processing Group at Unisys Corporation (then Burroughs) and has been actively involved in the design and implementation of natural language processing software since that time. Dr. Dahl began working on extending written language processing work to spoken language in 1989 and since then has led several spoken language application development efforts at Unisys. Dr. Dahl is actively involved in spoken language standards efforts, and is currently the chair of the government-sponsored planning committee defining formal spoken language evaluation metrics.

Brigitte Dorner was born in Erlangen, Germany. She studied technical mathematics at the Technical University, Graz, Austria and is currently finishing her Master's degree in Computing Science at Simon Fraser University, Canada. Her research interests include computer vision, digital image processing, and visualization.

Niall Griffith is 44 and completed his Ph.D. in Computer Science at the University of Exeter in 1993. The topic of his dissertation was the use of neural networks in modelling the tonal structure of music. He has a Master's degree in Computer Science awarded by the University of Essex in 1988, and a Bachelor's degree in Archaeology and Anthropology awarded by the University of Cambridge in 1972. He is currently working within a research group at the University of Exeter that is investigating the use of artificial neural networks in software engineering. His principal research interests are the modelling of cognitive structure in music perception, the development of modular neural nets, and the abstraction of sequential structure and categorical attention. More generally he is interested in the implications of the computational metaphor of mind, and the relationship between representation and process in computational models.

Eli Hagen completed her Master of Science degree in the School of Computing Science, Simon Fraser University, Canada in 1993 and her Bachelor of Science degree in the Department of Computing and Information Science, Queen's University, Canada in 1991. Her M.Sc. thesis work consisted of designing a computational model of American Sign Language, which is aimed at being used in a natural language interface to a deductive database. Currently she is working as a research assistant in the School of Computing Science, Simon Fraser University.

Gerd Herzog was born in 1963 in Bad Kreuznach, Germany. He is a full-time researcher in the Special Collaborative Program on Artificial Intelligence and Knowledge-based Systems (SFB 314) of the German Science Foundation (DFG), project N2: VITRA, at the Universität des Saarlandes, Saarbrücken, Germany. He joined the VITRA group as a research assistant in 1985, when the project was started. He began to carry out full-time research in VITRA after completing his Master's degree in Computer Science, which was obtained from the Universität des Saarlandes in 1988. He has been co-supervisor of the VITRA project since 1991. His primary research interests are in high-level scene analysis for the integration of natural language and vision processing. He is also interested in Multimedia and the general area of Intelligent Interfaces.

Fumio Kishino is the head of the Artificial Intelligence Department of ATR Communication Systems Research Laboratories. He received the B.E. and M.E. degrees from Nagoya Institute of Technology, Nagoya, Japan, in 1969 and 1971, respectively. In 1971, he joined the Electrical Communication Laboratories, Nippon Telegraph and Telephone Corporation, where he was involved in work on research and development of image processing and visual communication systems. In mid-1989, he joined ATR Communication Systems Research Laboratories. His research interests include 3D visual communication and image processing.

Marcia C. Linebarger received her Ph.D. in linguistics from MIT in 1980. Her experience includes college teaching (at Swarthmore and Hampshire Colleges), a post-doctoral fellowship in cognitive science at the University of Pennsylvania, and NIH-supported research on language processing in aphasia, a language disorder resulting from brain damage. Dr. Linebarger joined the Natural Language Processing Group at Unisys (then System Development Corporation) in 1985: she has been involved primarily in the development of the syntactic and semantic modules of the spoken language processing system.

Mark Lipshutz has worked since receiving his M.S.E. degree in Computer and Information Science from the University of Pennsylvania in 1982 in the computer science field as a researcher, knowledge engineer, systems analyst, consultant and instructor. Prior to that he was a teacher of secondary mathematics, during which time he served for a year as a Fulbright Exchange Teacher in London, England. His investigations have ranged from applied AI (automated configuration, item management for logistics) to his current focus on knowledge representation and reasoning for document understanding. The latter, ongoing for three years, includes the functional and logical analysis of documents provided in hardcopy form and the automatic generation of hypertext from legacy documents. Mr. Lipshutz is also interested in image processing and multi-disciplinary approaches to real-world problems.

Wolfgang Maaß is 29 and comes from Zülpich, Germany. Since 1992, he has cooperated in the project VITRA (Visual Translator) of the research program "Künstliche Intelligenz und wissensbasierte Systeme, Sonderforschungsbereich 314" in Saarbrücken, Germany, which is supported by the German Research Community (DFG). Since 1993, he has also participated in the Cognitive Science Program at the University of Saarbrücken. From 1985 till 1990, he studied Computer Science at the University of Aachen, Germany, where he obtained his Bachelor's degree. From 1990 till 1992, he studied at the University of Saarbrücken where he gained his Master's degree in Computer Science. His primary research interests are in visuo-spatial knowledge processing and Multimodal Communication. He is also interested in related research areas such as Geography, Environmental Psychology, Cognitive Psychology, and the Philosophy of Mind.

Paul Mc Kevitt is 31 and comes from Dún Na nGall (Donegal), Ireland on the Northwest of the EU. He is a British EPSRC (Engineering and Physical Sciences Research Council) Advanced Fellow in the Department of Computer Science at the University of Sheffield in England, EU. The Fellowship, commencing in 1994, releases him from his tenured Lecturership (Associate Professorship) for 5 years to conduct full-time research on the integration of natural language, speech and vision processing. He is currently pursuing a Master's degree in Education at the University of Sheffield. He completed his Ph.D. in Computer Science at the University of Exeter, England, EU in 1991. His Master's degree in Computer Science was obtained from New Mexico State University, New Mexico, USA in 1988 and his Bachelor's degree in Computer Science from University College Dublin, Ireland, EU in 1985. His primary research interests are in Natural Language Processing including the processing of pragmatics, beliefs and intentions in dialogue. He is also interested in Philosophy, Multimedia and the general area of Artificial Intelligence.

Tsutomu Miyasato received the B.E. degree in Electronic Engineering from the University of Electro-Communications, Tokyo, Japan in 1976, and the M.E. degree in Electronic Systems from Tokyo Institute of Technology, Tokyo, Japan, in 1978. He received the Ph.D. degree from Tokyo Institute of Technology in 1991. From 1978, he was with the Research and Development Laboratories of the Kokusai Denshin Denwa (KDD) Co., Ltd., Tokyo, Japan, and worked in the field of high efficiency coding of handwritten signals and image processing in videotex. In 1993, he joined ATR Communication Systems Research Laboratories where he is currently engaged in research on a teleconference system based on virtual reality technology.

Hans-Hellmut Nagel received the Diplom degree in Physics from the Universität Heidelberg in 1960 and the Doctor's degree in Physics from the Universität Bonn in 1964. After 18 months as a Visiting Scientist at M.I.T., he worked on the automatic analysis of bubble chamber film at the Deutsche Elektronen-Synchrotron at Hamburg as well as at the Physikalische Institut der Universität Bonn from 1966 through 1971. In Fall 1971 he became Professor für Informatik (computer science) at the Universität Hamburg. Since 1983 he has been Director of the Fraunhofer Institut für Informations- und Datenverarbeitung at Karlsruhe, in a joint appointment as Full Professor at the Fakultät für Informatik der Universität Karlsruhe. In addition to his primary interest in the evaluation of image sequences and associated questions in computer vision, AI and pattern recognition, his interests include the implementation and use of higher level programming languages for the realization of image analysis systems. Dr. Nagel is a member of editorial boards of various international journals in the field of computer vision, AI, and pattern recognition.

Roslyn Weidner Nilson received her M.S. in Computer Science from Villanova University in 1990 and her B.S. in Electrical Engineering from Lehigh University in 1982. Roslyn Nilson has thirteen years of experience ranging from systems engineer, software analyst and designer, software implementer, and systems test. She was a programmer for RCA (1980 and 1981), a systems software analyst for Singer-Kearfott (1982-1984), and has worked on various projects and tasks since joining Unisys (then Burroughs) in 1984. Since joining the research staff she has worked as system integrator and interface designer for the IDUS system.

Lewis M. Norton holds a Ph.D. in Mathematics from the Massachusetts Institute of Technology. He has been a member of the research staff at MITRE Corporation (1966-1969) and the Division of Computer Research and Technology of the National Institutes of Health (1969-1983). He joined Unisys (then System Development Corporation) in 1983, and has been a member of Unisys' Spoken Language Systems Group since 1989. Dr. Norton's research has been in the areas of computational linguistics, automated reasoning, knowledge representation, and expert systems.

Sean Ó Nuallain holds an M.Sc. in Psychology and a Ph.D. in Computer Science from Trinity College, Dublin. He is currently on sabbatical leave at the National Research Council of Canada from his lecturing post at Dublin City University, where he initiated and directed the B.Sc. in Applied Computational Linguistics. He is the author of a book on the foundations of Cognitive Science: In Search of Mind (in press).

Ryuichi Oka is a manager of Theory and Novel Functions Department at Tsukuba Research Center of the Real World Computing Partnership (RWC Japan). He is also a chief of the Information Interactive Integration System Project of RWC. His research interests include motion image understanding, spontaneous speech understanding, self-organisation information base, movable robot, integration of symbol and pattern, and super parallel computation. Oka received his Ph.D. degree in Engineering from the University of Tokyo.

Patrick Olivier is a lecturer in Computer Science at the Centre for Intelligent Systems at the University of Wales, Aberystwyth. After obtaining a Bachelor's degree in Physics from King's College, Cambridge, and a Master's degree in Artificial Intelligence from the University of Wales, he worked for two years as a researcher at the Centre for Computational Linguistics, in the Department of Language and Linguistics, UMIST (UK), where he is also currently a doctoral candidate. His current research interests include the mediation of information from the verbal to the visual domain, qualitative spatial reasoning and representation, and functional reasoning and representation in engineering domains.

Cornelius W.A.M. van Overveld has been working in computer graphics since spring 1985 as a staff teacher and researcher in the Dept. of Mathematics and Computing Science of Eindhoven University of Technology (EUT). He has an M.Sc. in physics and a Ph.D. in nuclear physics, also at EUT. In graphics, his main interests are in fundamental aspects of raster algorithms (discretisation, rendering), 3-D modelling, computer animation, dynamical simulation, and direct manipulation techniques for user interfaces.

David Poole is an Associate Professor in the Department of Computer Science at the University of British Columbia, and a Scholar of the Canadian Institute for Advanced Research. He obtained his Ph.D. from the Australian National University in 1984. He was part of the Logic Programming and AI Group at the University of Waterloo from 1984 to 1988, and has been at the University of British Columbia since 1988. His main research interests are automated logical and probabilistic reasoning for diagnosis, common sense reasoning and decision making. He pioneered assumption-based logical reasoning, developed the system 'Theorist' which has been used for both default and abductive reasoning, and has more recently worked on representations and algorithms combining logic and probability.

Jeffrey Siskind received a B.A. in Computer Science from the Technion and an S.M. and Ph.D. in Computer Science from MIT. He did a postdoctoral fellowship at the University of Pennsylvania Institute for Research in Cognitive Science and is currently a visiting assistant professor at the University of Toronto Department of Computer Science. He pursues research on visual event perception, computational models of child language acquisition, and advanced compilation techniques.

Arnold Smith is coordinator of the Intelligent Human-Computer Interface Program at the Institute for Information Technology, National Research Council of Canada, where he recently moved from SRI International's Cambridge Research Centre in England. He was educated at Harvard and Sussex Universities, and his current research interests are in natural language processing and visualization.

Rohini Srihari received her B.Math in Computer Science from the University of Waterloo, Canada. She received her Ph.D. in Computer Science from the State University of New York at Buffalo in 1992. She was Assistant Professor of Computer Science, Canisius College, Buffalo, during 1985-1989. At present she is a research scientist at the Center of Excellence for Document Analysis and Recognition (CEDAR) and a Research Assistant Professor of Computer Science at SUNY at Buffalo. Dr. Srihari's Ph.D. dissertation was on using collateral text in interpreting photographs. Her current research centers on using linguistic information in interpreting spatial (visual) data. She is Principal Investigator on two projects: "Language Models for Recognizing Handwritten Text", an NSF/ARPA funded project on Human Language Technology, and "Use of Collateral Text in Understanding Photos in Documents", a DOD/ARPA funded project.

Suzanne Liebowitz Taylor received her Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University in 1989 before joining the research group at Unisys. Dr. Taylor is researching techniques to analyze document images using image understanding, optical character recognition, and text interpretation. This work has resulted in the development of the Intelligent Document Understanding System (IDUS) which manipulates document images for input to either a

