Semantics-Oriented Natural Language Processing
Volume 27
International Federation for Systems Research International
Series on Systems Science and Engineering
Series Editor: George J. Klir
Binghamton State University
Editorial Board
Gerrit Broekstra, Erasmus University, Rotterdam, The Netherlands
Ivan M. Havel, Charles University, Prague, Czech Republic
John L. Casti, Santa Fe Institute, New Mexico
Klaus Kornwachs, Technical University of Cottbus, Germany
Brian Gaines, University of Calgary, Canada
Franz Pichler, University of Linz, Austria
Volume 22 ORGANIZATION STRUCTURE: Cybernetic Systems Foundation
Yasuhiko Takahara and Mihajlo Mesarovic
Volume 23 CONSTRAINT THEORY: Multidimensional Mathematical Model Management
George J. Friedman
Volume 24 FOUNDATIONS AND APPLICATIONS OF MIS: Model Theory Approach
Yasuhiko Takahara and Yongmei Liu
Volume 25 GENERALIZED MEASURE THEORY
Zhenyuan Wang and George J. Klir
Volume 26 A MISSING LINK IN CYBERNETICS: Logic and Continuity
Alex M. Andrew
Volume 27 SEMANTICS-ORIENTED NATURAL LANGUAGE PROCESSING: Mathematical Models and Algorithms
Vladimir A. Fomichov
IFSR was established “to stimulate all activities associated with the scientific
study of systems and to coordinate such activities at international level.” The
aim of this series is to stimulate publication of high-quality monographs and
textbooks on various topics of systems science and engineering. This series
complements the Federation’s other publications.
A Continuation Order Plan is available for this series. A continuation order will
bring delivery of each new volume immediately upon publication. Volumes are
billed only upon actual shipment. For further information please contact the
publisher.
Volumes 1–6 were published by Pergamon Press.
Vladimir A. Fomichov
Semantics-Oriented Natural
Language Processing
Mathematical Models and Algorithms
Vladimir A. Fomichov
Faculty of Business Informatics
Department of Innovations and Business
in the Sphere of Informational Technologies
State University – Higher School of Economics
Kirpichnaya Street 33, 105679 Moscow, Russia
vfomichov@hse.ru
vdrfom@aha.ru
v.fomichov@snhu.edu
Series Editor:
George J. Klir
Thomas J. Watson School of Engineering and Applied Sciences
Department of Systems Science and Industrial Engineering
Binghamton University
Binghamton, NY 13902
U.S.A
ISBN 978-0-387-72924-4 e-ISBN 978-0-387-72926-8
DOI 10.1007/978-0-387-72926-8
Springer New York Dordrecht Heidelberg London
Library of Congress Control Number: 2009937251
Mathematics Subjects Classification (2000): 03B65, 68-XX, 93A30
© Springer Science+Business Media, LLC 2010
All rights reserved. This work may not be translated or copied in whole or in part
without the written permission of the publisher (Springer Science+Business Media,
LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in
connection with reviews or scholarly analysis. Use in connection with any form of
information storage and retrieval, electronic adaptation, computer software, or by
similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar
terms, even if they are not identified as such, is not to be taken as an expression
of opinion as to whether or not they are subject to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)
To my wife, Olga Svyatoslavovna Fomichova
Preface
Gluecklich, die wissen, dass hinter allen
Sprachen das Unsaegliche steht.
Those are happy who know that behind
all languages there is something unsaid
Rainer Maria Rilke
This book shows in a new way that a solution to a fundamental problem from
one scientific field can help to find solutions to important problems that have
emerged in several other fields of science and technology.
In modern science, the term “Natural Language” denotes the collection of all
languages, each of which is used as a primary means of communication by the
people of some country or region. Thus Natural Language (NL) includes, in
particular, the English, Russian, and German languages.
Applied computer systems that process printed or written natural language texts
(NL-texts) or oral speech, taking into account that words are associated with
meanings, are called semantics-oriented natural language processing systems
(NLPSs).
On the one hand, this book is a snapshot of the current stage of a research
program started many years ago and called Integral Formal Semantics (IFS) of NL.
The goal of this program has been to develop formal models and methods that help
to overcome the difficulties of a logical character associated with the
engineering of semantics-oriented NLPSs. Designers of such systems of arbitrary
kinds will find in this book formal means and algorithms that are of great help
in their work.
On the other hand, this book can become a source of new powerful formal tools
for specialists from several different communities interested in developing
semantic informational technologies (or, shorter, semantic technologies), in
particular, for the researchers developing
• the knowledge representation languages for the ontologies in the Semantic Web
project and other fields;
• the formal languages and computer programs for building and analyzing the
semantic annotations of Web sources and Web services;
• the formal means for semantic data integration in e-science and e-health;
• the advanced content representation languages in the field of multiagent systems;
• the general-purpose formal languages for electronic business communication
allowing, in particular, for representing the content of negotiations conducted by
computer intelligent agents (CIAs) in the field of e-commerce and for forming
the contracts concluded by CIAs as the result of such negotiations.
During the last 20 years, semantics-oriented NLPSs have become one of the main
subclasses of applied intelligent systems (or, in other terms, of computer
systems with elements of artificial intelligence).
Due to the stormy progress of the Internet, end users in numerous countries
have received technical access to NL-texts stored far away from their terminals.
This has posed new demands on the designers of NLPSs. In this connection it
should be underlined that several acute scientific–technical problems require
the construction of computer systems able to “understand” the meanings of
arbitrary NL-texts pertaining to some fields of humans’ professional activity.
The collection of these problems, in particular, includes
• the extraction of information from textual sources for forming and updating
knowledge bases of applied intelligent systems and the creation of a Semantic
Web;
• the summarization of NL-texts stored on a certain Web site or selected in
accordance with certain criteria;
• conceptual information retrieval in textual databases on NL-requests of the end
users;
• question answering based on the semantic-syntactic analysis of NL-texts being
components of Web documents.
Semantics-oriented NLPSs are complex technical systems; their design is
associated not only with programming but also with solving numerous questions of
a logical character. That is why this field of engineering, like the other
fields of constructing complex technical systems, needs effective formal tools:
first of all, formal means that are convenient both for describing the semantic
structure of arbitrary NL-texts pertaining to various fields of humans’
professional activity and for representing knowledge about the world.
Systems Science has proposed a huge number of mathematical models and methods
that are useful for a broad spectrum of technical and social applications: from
the design and control of airplanes, rockets, and ships to modeling chemical
processes and the production and sales activity of firms.
The principal purpose of this monograph is to open for Systems Science a new
field of studies – the development of formal models and methods intended for
helping the designers of semantics-oriented NLPSs to overcome numerous problems
of a logical character associated with the engineering of such systems.
This new field of studies can be called Mathematical Linguocybernetics (this
term was introduced by the author in [66]).
Let’s consider the informal definitions of several notions used below for
describing the principal aspects of the scientific novelty of this book.
The term “semantics of Natural Language” will denote the collection of the
regularities of conveying information by means of NL. Discourses (or narrative
texts) are finite sequences of sentences in NL with interrelated meanings.
If T is an expression in NL (a short word combination, a sentence, or a
discourse), a structured meaning of the expression T is an informational
structure constructed by the brain of a person having command of the considered
sublanguage of NL (Russian, English, or any other). The construction of this
structure is independent of the context of the expression T; that is, this
informational structure is built on the basis of knowledge about only the
elementary meaningful lexical units and the rules of combining such units in the
considered sublanguage of NL.
Let’s agree that a semantic representation (SR) of an NL-expression T is a
formal structure that is either an image of a structured meaning of the
considered NL-expression or a reflection of the meaning (or content) of the
given expression in a definite context – in a concrete situation of a dialogue,
in the context of knowledge about the world, or in the context of the preceding
part of the discourse.
Thus, an SR of an NL-expression T is a formal structure whose basic components
are, in particular, the designations of notions, concrete things, sets of
things, events, functions and relations, logical connectives, numbers and
colors, and also the designations of the conceptual relationships between the
meanings of the fragments of NL-texts or between the entities of the considered
application domain.
Semantic representations of NL-texts may be, for instance, strings or marked
oriented graphs (semantic nets).
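To make the two forms mentioned above concrete, here is a minimal sketch of an SR held as a marked oriented graph and then serialised as a string. It is an illustration only and not the formalism developed in this book: the class `SemanticNet`, the relation names (`Agent`, `Object`, `Quantity`, `Destination`), the node labels, and the sample sentence are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SemanticNet:
    """A labeled directed graph: nodes designate entities, edges are named relations."""
    nodes: set[str] = field(default_factory=set)
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, source: str, relation: str, target: str) -> None:
        """Add one marked arc source --relation--> target."""
        self.nodes.update((source, target))
        self.edges.add((source, relation, target))

    def to_string(self) -> str:
        """Serialise the graph as relation(source, target) atoms joined by a connective."""
        atoms = [f"{rel}({src}, {dst})" for src, rel, dst in sorted(self.edges)]
        return " AND ".join(atoms)


# Hypothetical SR of "The firm delivered three containers to Moscow."
sr = SemanticNet()
sr.add("delivery-1", "Agent", "firm-1")
sr.add("delivery-1", "Object", "container-set-1")
sr.add("container-set-1", "Quantity", "3")
sr.add("delivery-1", "Destination", "Moscow")
print(sr.to_string())
```

The same information is available in both forms: the graph is convenient for traversal, while the string is convenient for storage and exchange.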
An algorithm of semantic-syntactic analysis builds an SR of an NL-expression,
proceeding from knowledge about the morphology and syntax of the considered
sublanguage of NL (English, Russian, etc.) and from information about the
associations of lexical units with the units of the conceptual level (or
semantic level), and taking into account knowledge about application domains.
An SR of the text constructed by such an algorithm is interpreted by an applied
computer system in accordance with its specialization, for instance, as a
request to search for an answer to a question, or a command for an autonomous
intelligent robot to carry out an action, or a piece of knowledge to be
inscribed into the knowledge base, etc.
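The division of labor described above (build an SR from lexical and syntactic knowledge, then let the application interpret it) can be sketched in a few lines. This is a toy under stated assumptions, not the algorithm elaborated in this book: the lexicon `LEXICON`, the two sentence patterns, and the functions `build_sr` and `interpret` are hypothetical, and morphology and domain knowledge are omitted entirely.

```python
from __future__ import annotations

# Hypothetical lexicon: lexical unit -> unit of the conceptual level.
LEXICON = {"move": "moving", "bring": "transfer",
           "box": "box", "table": "table", "door": "door"}
STOP_WORDS = {"the", "a", "an", "please"}


def build_sr(text: str) -> dict[str, str]:
    """Toy analyzer for 'VERB NOUN to NOUN' commands and 'where is NOUN' questions.

    Words missing from LEXICON raise KeyError; other patterns raise ValueError.
    """
    words = [w for w in text.lower().strip("?.! ").split() if w not in STOP_WORDS]
    if words[:2] == ["where", "is"] and len(words) == 3:
        return {"kind": "question", "asked": "location", "object": LEXICON[words[2]]}
    if len(words) == 4 and words[2] == "to":
        return {"kind": "command", "action": LEXICON[words[0]],
                "object": LEXICON[words[1]], "destination": LEXICON[words[3]]}
    raise ValueError(f"outside the toy sublanguage: {text!r}")


def interpret(sr: dict[str, str]) -> str:
    """Dispatch on the SR the way a specialized application might."""
    if sr["kind"] == "question":
        return f"SEARCH {sr['asked']} OF {sr['object']}"
    return f"EXECUTE {sr['action']}({sr['object']} -> {sr['destination']})"


print(interpret(build_sr("Please move the box to the table.")))
print(interpret(build_sr("Where is the door?")))
```

Keeping `build_sr` separate from `interpret` mirrors the point of the paragraph: the same SR could be handed to a question-answering system, a robot controller, or a knowledge base without changing the analyzer.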
The scientific results stated in this monograph have been obtained by the author
while fulfilling a research program started over 20 years ago. The choice of the
direction of the studies was a reaction to the almost complete lack at that time
of mathematical means and methods that were convenient for designing
semantics-oriented NLPSs.
The results of this monograph not only contribute to a movement forward but
also mean a qualitative leap in the field of elaborating the formal means and
methods of developing the algorithms of semantic-syntactic analysis of NL-texts.
This qualitative leap is conditioned by the following main factors:
• The designers of NLPSs have received a system of rules for constructing
well-formed formulas (moreover, a compact system: it consists of only ten main
rules) allowing for (according to the hypothesis of the author) building
semantic representations of arbitrary texts pertaining to numerous fields of
humans’ professional activity, i.e., SRs of NL-texts on economy, medicine, law,
technology, politics, etc. This means that effective procedures of constructing
SRs of NL-texts and effective algorithms of processing SRs of NL-texts (with
respect to the context of a dialogue or of a preceding part of discourse, taking
into account the knowledge about application domains) can be used in various
thematic domains, and it will be possible to expand the possibilities of these
procedures when new problems emerge.
• A mathematical model of a broadly applicable linguistic database is
constructed, i.e., a model of a database containing such information about the
lexical units and their interrelations with the units of the conceptual level
that this information is sufficient for semantic-syntactic analysis of the
sublanguages of natural language that are of interest for a number of
applications.
• A complex and useful, strongly structured algorithm of semantic-syntactic
analysis of NL-texts is elaborated; it is described not by means of any
programming system but entirely with the help of a proposed system of formal
notions, which makes the algorithm independent of program implementation and
application domain.
• A possible structure of several mathematical models of new kinds is proposed
with the aim of opening for Systems Science a new field of studies of high
significance for Computer Science.
Informational technologies implemented in semantics-oriented NLPSs belong to
the class of Semantic Informational Technologies (or, shorter, Semantic
Technologies). This term was born only several years ago as a consequence of
the emergence of the Semantic Web project, the use of ontologies in this project
and many other projects, the elaboration of Content Representation Languages as
components of Agent Communication Languages in the field of Multiagent Systems,
and the studies on formal means for representing the records of negotiations and
the contracts in the field of Electronic Commerce (E-commerce).
One of the valuable features of this monograph is that the elaborated powerful
formal means of describing structured meanings of NL-texts provide a broadly
applicable and flexible formal framework for the development of Semantic
Technologies as a whole.
Content of the Book
The monograph contains two parts. Part 1, consisting of Chaps. 1, 2, 3, 4, 5, and
6, will be of interest to a broad circle of the designers of Semantic Informational
Technologies. Part 2 (Chaps. 7, 8, 9, 10, and 11) is intended for the designers of
Semantics-Oriented Natural Language Processing Systems.
Chapter 1 argues for the necessity of enriching the inventory of formal means,
models, and methods intended for designing semantics-oriented NLPSs. Special
attention is paid to showing the necessity of creating formal means that are
convenient for describing structured meanings of arbitrary sentences and
discourses pertaining to various fields of humans’ professional activity. The
context of Cognitive
Description: This book examines key issues in designing semantics-oriented natural language (NL) processing systems. One of the key features is an original strategy for transforming the existing World Wide Web into a new generation Semantic Web (SW-2) and the basic formal tools for its realization, which are pro