
Semantics-Oriented Natural Language Processing. Mathematical Models and Algorithms PDF

340 Pages·2010·2.444 MB·

Preview Semantics-Oriented Natural Language Processing. Mathematical Models and Algorithms

Semantics-Oriented Natural Language Processing

Volume 27
International Federation for Systems Research
International Series on Systems Science and Engineering

Series Editor: George J. Klir, Binghamton State University

Editorial Board
Gerrit Broekstra, Erasmus University, Rotterdam, The Netherlands
Ivan M. Havel, Charles University, Prague, Czech Republic
John L. Casti, Santa Fe Institute, New Mexico
Klaus Kornwachs, Technical University of Cottbus, Germany
Brian Gaines, University of Calgary, Canada
Franz Pichler, University of Linz, Austria

Volume 22 ORGANIZATION STRUCTURE: Cybernetic Systems Foundation. Yasuhiko Takahara and Mihajlo Mesarovic
Volume 23 CONSTRAINT THEORY: Multidimensional Mathematical Model Management. George J. Friedman
Volume 24 FOUNDATIONS AND APPLICATIONS OF MIS: Model Theory Approach. Yasuhiko Takahara and Yongmei Liu
Volume 25 GENERALIZED MEASURE THEORY. Zhenyuan Wang and George J. Klir
Volume 26 A MISSING LINK IN CYBERNETICS: Logic and Continuity. Alex M. Andrew
Volume 27 SEMANTICS-ORIENTED NATURAL LANGUAGE PROCESSING: Mathematical Models and Algorithms. Vladimir A. Fomichov

IFSR was established “to stimulate all activities associated with the scientific study of systems and to coordinate such activities at international level.” The aim of this series is to stimulate publication of high-quality monographs and textbooks on various topics of systems science and engineering. This series complements the Federation’s other publications.

A Continuation Order Plan is available for this series. A continuation order will bring delivery of each new volume immediately upon publication. Volumes are billed only upon actual shipment. For further information please contact the publisher.

Volumes 1–6 were published by Pergamon Press.

Vladimir A. Fomichov
Semantics-Oriented Natural Language Processing
Mathematical Models and Algorithms

Vladimir A. Fomichov
Faculty of Business Informatics
Department of Innovations and Business in the Sphere of Informational Technologies
State University – Higher School of Economics
Kirpichnaya Street 33, 105679 Moscow, Russia
[email protected]

Series Editor:
George J. Klir
Thomas J. Watson School of Engineering and Applied Sciences
Department of Systems Science and Industrial Engineering
Binghamton University
Binghamton, NY 13902
U.S.A

ISBN 978-0-387-72924-4
e-ISBN 978-0-387-72926-8
DOI 10.1007/978-0-387-72926-8
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009937251
Mathematics Subjects Classification (2000): 03B65, 68-XX, 93A30

© Springer Science+Business Media, LLC 2010
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To my wife, Olga Svyatoslavovna Fomichova

Preface

Gluecklich, die wissen, dass hinter allen Sprachen das Unsaegliche steht.
Those are happy who know that behind all languages there is something unsaid.
Rainer Maria Rilke

This book shows in a new way that a solution to a fundamental problem from one scientific field can help to find the solutions to important problems that have emerged in several other fields of science and technology.
In modern science, the term “Natural Language” denotes the collection of all languages, each of which is used as a primary means of communication by the people of some country or region. So Natural Language (NL) includes, in particular, the English, Russian, and German languages.

Applied computer systems that process printed or written natural language texts (NL-texts) or oral speech, taking into account that words are associated with meanings, are called semantics-oriented natural language processing systems (NLPSs).

On one hand, this book is a snapshot of the current stage of a research program started many years ago and called Integral Formal Semantics (IFS) of NL. The goal of this program has been to develop formal models and methods that help to overcome the difficulties of logical character associated with the engineering of semantics-oriented NLPSs. The designers of such systems, of arbitrary kinds, will find in this book formal means and algorithms that are of great help in their work.

On the other hand, this book can become a source of new powerful formal tools for specialists from several different communities interested in developing semantic informational technologies (or, shorter, semantic technologies), in particular, for the researchers developing
• the knowledge representation languages for the ontologies in the Semantic Web project and other fields;
• the formal languages and computer programs for building and analyzing the semantic annotations of Web sources and Web services;
• the formal means for semantic data integration in e-science and e-health;
• the advanced content representation languages in the field of multiagent systems;
• the general-purpose formal languages for electronic business communication allowing, in particular, for representing the content of negotiations conducted by computer intelligent agents (CIAs) in the field of e-commerce and for forming the contracts concluded by CIAs as the result of such negotiations.
During the last 20 years, semantics-oriented NLPSs have become one of the main subclasses of applied intelligent systems (or, in other terms, of the computer systems with the elements of artificial intelligence).

Due to the stormy progress of the Internet, the end users in numerous countries have received technical access to NL-texts stored far away from their terminals. This has posed new demands to the designers of NLPSs. In this connection it should be underlined that several acute scientific–technical problems require the construction of computer systems able to “understand” the meanings of arbitrary NL-texts pertaining to some fields of humans’ professional activity. The collection of these problems, in particular, includes
• the extraction of information from textual sources for forming and updating knowledge bases of applied intelligent systems and the creation of a Semantic Web;
• the summarization of NL-texts stored on a certain Web site or selected in accordance with certain criteria;
• conceptual information retrieval in textual databases on NL-requests of the end users;
• question answering based on the semantic-syntactic analysis of NL-texts being components of Web documents.

Semantics-oriented NLPSs are complex technical systems; their design is associated not only with programming but also with solving numerous questions of logical character. That is why this field of engineering, like the other fields of constructing complex technical systems, needs effective formal tools, first of all, formal means convenient both for describing the semantic structure of arbitrary NL-texts pertaining to various fields of humans’ professional activity and for representing knowledge about the world.

Systems Science has proposed a huge number of mathematical models and methods that are useful for a broad spectrum of technical and social applications: from the design and control of airplanes, rockets, and ships to modeling chemical processes and the production and sales activity of firms.
The principal purpose of this monograph is to open for Systems Science a new field of studies: the development of formal models and methods intended for helping the designers of semantics-oriented NLPSs to overcome numerous problems of logical character associated with the engineering of such systems.

This new field of studies can be called Mathematical Linguocybernetics (this term was introduced by the author in [66]).

Let’s consider the informal definitions of several notions used below for describing the principal aspects of the scientific novelty of this book.

The term “semantics of Natural Language” will denote the collection of the regularities of conveying information by means of NL. Discourses (or narrative texts) are the finite sequences of the sentences in NL with the interrelated meanings.

If T is an expression in NL (a short word combination, a sentence, or a discourse), a structured meaning of the expression T is an informational structure being constructed by the brain of a person having command of the considered sublanguage of NL (Russian, English, or any other), and the construction of this structure is independent of the context of the expression T, that is, this informational structure is built on the basis of knowledge about only elementary meaningful lexical units and the rules of combining such units in the considered sublanguage of NL.

Let’s agree that a semantic representation (SR) of an NL-expression T is a formal structure being either an image of a structured meaning of the considered NL-expression or being a reflection of the meaning (or content) of the given expression in a definite context: in a concrete situation of a dialogue, in the context of knowledge about the world, or in the context of the preceding part of the discourse.

Thus, an SR of an NL-expression T is such formal structure that its basic components are, in particular, the designations of the notions, concrete things, the sets of things, events, functions and relations, logical connectives, numbers and colors, and also the designations of the conceptual relationships between the meanings of the fragments of NL-texts or between the entities of the considered application domain.
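The definition above can be made concrete with a small sketch. It assumes a labelled directed graph as the carrier of the SR, which is a common choice but not the formal notation developed in the book. The class names, the relation labels (`Agent`, `Object`, `Quantity`), and the example sentence are all hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A designation of a notion, a concrete thing, an event, a number, etc."""
    label: str
    kind: str  # e.g. "event", "thing", "set of things", "number"


@dataclass
class SemanticGraph:
    """A toy semantic representation: labelled, directed edges between nodes."""
    edges: list = field(default_factory=list)  # (source, relation, target)

    def relate(self, source: Node, relation: str, target: Node) -> None:
        """Record a conceptual relationship between two designations."""
        self.edges.append((source, relation, target))

    def targets(self, source: Node, relation: str) -> list:
        """Return every node reached from `source` by `relation`."""
        return [t for s, r, t in self.edges if s == source and r == relation]

    def to_string(self) -> str:
        """Linearise the graph as 'relation(source, target)' items."""
        return "; ".join(f"{r}({s.label}, {t.label})" for s, r, t in self.edges)


# Hypothetical SR for the sentence "The firm delivered 12 containers."
delivery = Node("delivery-1", "event")
firm = Node("firm-1", "thing")
containers = Node("containers-1", "set of things")
twelve = Node("12", "number")

sr = SemanticGraph()
sr.relate(delivery, "Agent", firm)
sr.relate(delivery, "Object", containers)
sr.relate(containers, "Quantity", twelve)

print(sr.to_string())
# Agent(delivery-1, firm-1); Object(delivery-1, containers-1); Quantity(containers-1, 12)
```

The same structure is available both as a graph (through `targets`) and as a string (through `to_string`), so one toy SR can be queried or printed without changing its content.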
Semantic representations of NL-texts may be, for instance, strings and marked oriented graphs (semantic nets).

An algorithm of semantic-syntactic analysis builds an SR of an NL-expression, proceeding from the knowledge about the morphology and syntax of the considered sublanguage of NL (English, Russian, etc.), from the information about the associations of lexical units with the units of conceptual level (or semantic level), and taking into account the knowledge about application domains. An SR of the text constructed by such an algorithm is interpreted by an applied computer system in accordance with its specialization, for instance, as a request to search for an answer to a question, or a command to carry out an action by an autonomous intelligent robot, or as a piece of knowledge to be inscribed into the knowledge base, etc.

The scientific results stated in this monograph have been obtained by the author while fulfilling a research program started over 20 years ago. The choice of the direction of the studies was a reaction to an almost complete lack at that time of mathematical means and methods that were convenient for designing semantics-oriented NLPSs.

The results of this monograph not only contribute to a movement forward but also mean a qualitative leap in the field of elaborating the formal means and methods of developing the algorithms of semantic-syntactic analysis of NL-texts.
This qualitative leap is conditioned by the following main factors:
• The designers of NLPSs have received a system of rules for constructing well-formed formulas (moreover, a compact system: it consists of only ten main rules) allowing for (according to the hypothesis of the author) building semantic representations of arbitrary texts pertaining to numerous fields of humans’ professional activity, i.e., SRs of the NL-texts on economy, medicine, law, technology, politics, etc. This means that the effective procedures of constructing SRs of NL-texts and effective algorithms of processing SRs of NL-texts (with respect to the context of a dialogue or of a preceding part of discourse, taking into account the knowledge about application domains) can be used in various thematic domains, and it will be possible to expand the possibilities of these procedures in case of emerging new problems.
• A mathematical model of a broadly applicable linguistic database is constructed, i.e., a model of a database containing such information about the lexical units and their interrelations with the units of conceptual level that this information is sufficient for semantic-syntactic analysis of the sublanguages of natural language that are of interest for a number of applications.
• A complex and useful, strongly structured algorithm of semantic-syntactic analysis of NL-texts is elaborated that is described not by means of any programming system but completely with the help of a proposed system of formal notions; this makes the algorithm independent of program implementation and application domain.
• A possible structure of several mathematical models of the new kinds is proposed with the aim of opening for Systems Science a new field of studies of high significance for Computer Science.
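The second factor, a linguistic database that links lexical units to units of the conceptual level, can be pictured with a deliberately tiny sketch. It is not the mathematical model constructed in the book: the `LexicalEntry` fields, the `LEXICON` contents, and the concept identifiers are assumptions chosen only to show the kind of lookup such a database supports.

```python
from typing import NamedTuple, Optional


class LexicalEntry(NamedTuple):
    lemma: str           # base form of the lexical unit
    part_of_speech: str  # morphological class of this reading
    concept: str         # hypothetical identifier of a conceptual-level unit


# Hypothetical miniature linguistic database: word form -> candidate entries.
# An ambiguous word form maps to more than one entry.
LEXICON = {
    "delivered": [LexicalEntry("deliver", "verb", "delivery")],
    "firm": [
        LexicalEntry("firm", "noun", "organization"),
        LexicalEntry("firm", "adjective", "solidity"),
    ],
    "containers": [LexicalEntry("container", "noun", "container")],
}


def concepts_for(word: str, part_of_speech: Optional[str] = None) -> list:
    """Return the conceptual units linked to a word form.

    When `part_of_speech` is given, only readings of that class are kept,
    which is one simple way syntactic information can narrow the meaning.
    """
    entries = LEXICON.get(word.lower(), [])
    return [
        e.concept
        for e in entries
        if part_of_speech is None or e.part_of_speech == part_of_speech
    ]


print(concepts_for("firm"))          # ['organization', 'solidity']
print(concepts_for("Firm", "noun"))  # ['organization']
print(concepts_for("robot"))         # []
```

Keeping the word-to-concept associations in data rather than in code mirrors the stated goal of a database that is broadly applicable: a new sublanguage or domain would be covered by adding entries, not by rewriting the lookup.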
Informational technologies implemented in semantics-oriented NLPSs belong to the class of Semantic Informational Technologies (or, shorter, Semantic Technologies). This term was born only several years ago as a consequence of the emergence of the Semantic Web project, the use of ontologies in this project and many other projects, the elaboration of Content Representation Languages as the components of Agent Communication Languages in the field of Multiagent Systems, and of the studies on formal means for representing the records of negotiations and the contracts in the field of Electronic Commerce (E-commerce).

One of the precious features of this monograph is that the elaborated powerful formal means of describing structured meanings of NL-texts provide a broadly applicable and flexible formal framework for the development of Semantic Technologies as a whole.

Content of the Book

The monograph contains two parts. Part 1, consisting of Chaps. 1, 2, 3, 4, 5, and 6, will be of interest to a broad circle of the designers of Semantic Informational Technologies. Part 2 (Chaps. 7, 8, 9, 10, and 11) is intended for the designers of Semantics-Oriented Natural Language Processing Systems.

Chapter 1 grounds the necessity of enriching the inventory of formal means, models, and methods intended for designing semantics-oriented NLPSs. Special attention is paid to showing the necessity of creating the formal means being convenient for describing structured meanings of arbitrary sentences and discourses pertaining to various fields of humans’ professional activity. The context of Cognitive
