
Current Challenges in Patent Information Retrieval PDF

414 Pages·2011·8.818 MB·English

Preview Current Challenges in Patent Information Retrieval

The Information Retrieval Series, Volume 29

Series Editor: W. Bruce Croft
Editorial Board: ChengXiang Zhai, Maarten de Rijke, Nicholas J. Belkin, Charles Clarke

Mihai Lupu, Katja Mayer, John Tait, Anthony J. Trippe (Editors)

Current Challenges in Patent Information Retrieval

Editors

Mihai Lupu, Information Retrieval Facility, Donau-City Straße 1, Vienna 1220, Austria, [email protected]
John Tait, Information Retrieval Facility, Donau-City Straße 1, Vienna 1220, Austria, [email protected]
Katja Mayer, Information Retrieval Facility, Donau-City Straße 1, Vienna 1220, Austria, [email protected]
Anthony J. Trippe, 3LP Advisors, Post Rd. 7003 Suite 409, 43016 Dublin, OH, USA, [email protected]

ISSN 1387-5264
ISBN 978-3-642-19230-2
e-ISBN 978-3-642-19231-9
DOI 10.1007/978-3-642-19231-9
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2011926006
ACM Computing Classification (1998): H.3, I.7, J.1

© Springer-Verlag Berlin Heidelberg 2011

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: deblik
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

Preface

Patent Information Retrieval is an economically important activity.
Today’s economy is becoming increasingly knowledge-based and intellectual property in the form of patents plays a vital role in this growth. Between 1998 and 2008, the number of patent applications filed worldwide grew by more than 50 percent. The number of granted patents worldwide continues to increase, albeit at a slower rate than at its peak in 2006 (18%), when some 727,000 patents were granted. The substantial increase in patents granted is due, in part, to efforts by patent offices to reduce backlogs as well as the significant growth in the number of patents granted by China and, to a lesser extent in the more recent years, by the Republic of Korea. According to these statistics, the total number of patents in force worldwide at the end of 2008 was approximately 6.7 million (WIPO report 2010). A prior art search might have to cover as many as 70 million patents. By combining data from Ocean Tomo’s Intangible Asset Market Value Survey, and Standard and Poor’s 1200 Index we can estimate that the global value of patents exceeds US$10 trillion in 2009.

A patent is a bargain between the inventor and the state. The inventor must teach the community how to make the product, and use the techniques he/she has invented in return for a limited monopoly which gives him a set time to exploit his invention and realise its value. Patents are used for many reasons, e.g. to protect inventions, to create value and to monitor competitive activities in a field. Much knowledge is distilled through patents, which is never published elsewhere. Thus patents form an important knowledge resource (e.g. much technical information represented in patents is not represented in scientific literature) and are at the same time important legal documents.

Despite the overall increase in patent applications and grants, a situation of economic downturn, such as the one the world has experienced in 2008, leads to a reduction in patent applications and grants (as indicated by preliminary figures published by WIPO for 2009).
This is, to some extent, explained by the high costs involved in applying for a patent, particularly for small enterprises. The costs of the pre-application process, the long duration of the application process and the corresponding uncertainty in the long-term economy in such periods of economic downturn need to be addressed by changing the way we search the patent and non-patent literature. Both the Intellectual Property (IP) professionals and the Information Retrieval (IR) scientists can see this book as a challenge: for the former, in terms of adapting to new tools; for the latter, in terms of creating better tools for an obviously difficult task; for both, in terms of engaging in exchange and cooperation.

In the past 10 or 15 years, general information retrieval and Web search engines have made tremendous advances. And still, we see a huge gap between the technologies which, on the one hand, were emerging from research labs and in use by major internet search engines, in e-commerce, and in enterprise search systems, and, on the other, the systems in day-to-day use by the patent search communities.

It has been estimated that since 1991, when the US Federal National Institute of Standards and Technology (NIST) began its Text Retrieval Conference (TREC) evaluation campaign, the available information retrieval and search systems have improved 40% or more in their ability to find relevant documents. And yet the technologies underlying the patent search system were largely unaffected by these changes. Patent searchers generally use the same technology as in the 1980s. Boolean specification of searches and set-based retrieval are the norm rather than the ranked retrieval systems used by Google and the like. Tools in some areas have moved on significantly: some providers have semantic analysis tools, others effective visualisation mechanisms for patent documents. And yet there has not been the kind of revolution in patent search which Google had represented for Web search.

In the past few years, the Information Retrieval Facility (a not-for-profit research institution based in Vienna, Austria) has organised a series of events to bring together leading researchers in IR with those who practice and use patent search, to establish the interdisciplinary dialogue between the IR and the IP communities and to create a discursive as well as empirical space for sustainable discussion and innovation. In the first Information Retrieval Facility Symposium in Vienna in 2007 (www.irfs.at), a distinguished audience of information retrieval scientists and patent search specialists started to explore the reasons for the knowledge gap. It turned out that academic researchers were often unaware of the specialised needs of the patent searchers: for example, they needed a degree of transparency quite unlike the casual Web searchers, upon which the academics mainly focussed. The patent searchers were often unaware of the advances made in other areas, and how they had been achieved. There were difficulties in finding (and using) a common, comprehensible vocabulary. In the course of that first Symposium, and through subsequent IRF symposia and other joint activities, such as the CLEF-IP and TREC-CHEM tracks, the PaIR and Aspire workshops, major progress has been made in developing a common understanding, and even an agenda between search researchers and technologists and the patent search community.

This book is part of the development of that joint understanding. Its origins lie in the idea of producing post-proceedings for the first IRF Symposium. That idea was not fully followed up, in part because of pressure to produce more practical, action-oriented work, and in part because many of the participants felt their approaches were at too early a stage for formal publication. In the course of the following years it became apparent there really was a demand to produce a volume which was accessible to both the patent search community and to the information retrieval research community; to provide a collected and organized introduction to the work and views of the two sides of the emerging patent search research and innovation community; and to provide a coherent and organised view of what has been achieved and, perhaps even more significantly, of what remains to be achieved.

We have already noted the need for transparency (or at least defensibility) of search processes from the patent search community. We hope this book will allow the IR researchers to better understand why such transparency is needed, and what it means in practice. Furthermore, it is our hope that this book will also be a valuable resource for IP professionals in learning about current approaches of IR in the patent domain. It has often been difficult to reconcile the focus on useful technological innovation from the IP community, with the demands for scientific rigour and to proceed on the basis of sound empirical evidence, which is such an important feature of IR (in contrast to some other areas of computer science).

Moreover, patent search is an inherently multilingual and multinational topic: the novelty of a patent may be dismissed by finding a document describing the same idea in any language anywhere in the world. Patents are complex legal documents, even less accessible than the scientific literature. These are just some of the characteristics of the patent system, which make it an important challenge for the search, information retrieval and information access communities.

The book has had a lengthy and difficult gestation: the list of authors has been revised many times as a result of changes in institutional, occupational and private circumstances. Although we, the editors, do feel we have succeeded in producing a volume which will provide important perspectives of the issues affecting patent search research and innovation at the time of writing, as well as a useful, brief introduction to the outlook and literature of the community accessible to its members, regardless of their background, we would have liked to cover several topics not represented here.

In particular it was disappointing we could not include a chapter on NTCIR, the first of the evaluation campaigns to focus seriously on patents. Also, a chapter on the use of Latent Semantic Indexing for the patent domain had been planned, which ultimately could not appear in this book.

Several of the chapters have been written jointly by intellectual property and information retrieval experts. Members of both communities with a background opposite to the primary author have reviewed all the chapters. It has not always been easy to reconcile their differing viewpoints: we must thank them for taking the time to resolve their differences and for taking the opportunity to exchange their knowledge across fields and disciplinary mind-sets and to engage in a mutual discourse that will hopefully foster the understanding in the future.

Finally, we would like to thank the IRF for making this publication possible, the publisher, Springer; and in particular Ralf Gerstner, for the patience with which he accepted the numerous delays, as well as the external reviewers who read each chapter and provided the authors with valuable advice.

The editors are very grateful to the following persons, who agreed to review the manuscripts: Stephen Adams, Linda Andersson, Geetha Basappa, John M. Barnard, Shariq Bashir, Helmut Berger, Katrien Beuls, Ted Briscoe, Ben Carterette, Paul Clough, Bruce Croft, Szabolcs Csepregi, Barrou Diallo, Karl A. Froeschl, Norbert Fuhr, Eric Gaussier, Julio Gonzalo, Allan Hanbury, Christopher G. Harris, Ilkka Havukkala, Bruce Hedin, Cornelis H.A.
Koster, Mounia Lalmas, Patrice Lopez, Teresa Loughbrough, Marie-Francine Moens, Henning Müller, Iadh Ounis, Florina Piroi, Keith van Rijsbergen, Patrick Ruch, Philip Tetlow, Henk Thomas, Ingo Thon, Steve Tomlinson, Anthony Trippe, Suzan Verberne, Ellen M. Voorhees, Peter Willett, Christa Womser-Hacker.

Mihai Lupu
Katja Mayer
John Tait
Anthony Trippe

Contents

Part I Introduction to Patent Searching

1 Introduction to Patent Searching
Doreen Alberts, Cynthia Barcelon Yang, Denise Fobare-DePonio, Ken Koubek, Suzanne Robins, Matthew Rodgers, Edlyn Simmons, and Dominic DeMarco

2 An Introduction to Contemporary Search Technology
Veronika Stefanov and John I. Tait

Part II Evaluating Patent Retrieval

3 Overview of Information Retrieval Evaluation
Ben Carterette and Ellen M. Voorhees

4 Evaluating Information Retrieval in the Intellectual Property Domain: The CLEF-IP Campaign
Florina Piroi and Veronika Zenz

5 Evaluation of Chemical Information Retrieval Tools
Mihai Lupu, Jimmy Huang, and Jianhan Zhu

6 Evaluating Real Patent Retrieval Effectiveness
Anthony Trippe and Ian Ruthven

Part III High Recall Search

7 Measuring and Improving Access to the Corpus
Richard Bache

8 Measuring Effectiveness in the TREC Legal Track
Stephen Tomlinson and Bruce Hedin

9 Large-Scale Logical Retrieval: Technology for Semantic Modelling of Patent Search
Hany Azzam, Iraklis A. Klampanos, and Thomas Roelleke

10 Patent Claim Decomposition for Improved Information Extraction
Peter Parapatics and Michael Dittenbach

11 From Static Textual Display of Patents to Graphical Interactions
Steffen Koch and Harald Bosch

Part IV Classification

12 Automated Patent Classification
Karim Benzineb and Jacques Guyot

13 Phrase-based Document Categorization
Cornelis H.A. Koster, Jean G. Beney, Suzan Verberne, and Merijn Vogel

14 Using Classification Code Hierarchies for Patent Prior Art Searches
Christopher G. Harris, Robert Arens, and Padmini Srinivasan

Part V Semantic Search

15 Information Extraction and Semantic Annotation for Multi-Paradigm Information Management
Hamish Cunningham, Valentin Tablan, Ian Roberts, Mark A. Greenwood, and Niraj Aswani

16 Intelligent Information Access from Scientific Papers
Ted Briscoe, Karl Harrison, Andrew Naish, Andy Parker, Marek Rei, Advaith Siddharthan, David Sinclair, Mark Slater, and Rebecca Watson

17 Representation and Searching of Chemical-Structure Information in Patents
John D. Holliday and Peter Willett

18 Offering New Insights by Harmonizing Patents, Taxonomies and Linked Data
Andreas Pesenhofer, Helmut Berger, and Michael Dittenbach

19 Automatic Translation of Scholarly Terms into Patent Terms
Hidetsugu Nanba, Hideaki Kamaya, Toshiyuki Takezawa, Manabu Okumura, Akihiro Shinmori, and Hidekazu Tanigawa

20 Future Patent Search
John I. Tait and Barrou Diallo

Index

Contributors

Doreen Alberts, Theravance Inc., 901 Gateway Blvd., South San Francisco, CA, USA
Robert Arens, Nuance Communications, Burlington, MA, USA, [email protected]
Niraj Aswani, Department of Computer Science, University of Sheffield, Sheffield, UK, [email protected]
Hany Azzam, Queen Mary University of London, London, UK, [email protected]
Richard Bache, Department of Computer and Information Sciences, University of Strathclyde, Glasgow G4 1XH, Scotland, UK, [email protected]
Jean G. Beney, Dept. Informatique, LCI, INSA de Lyon, Lyon, France, [email protected]
Karim Benzineb, SIMPLE SHIFT, Ruelle du P’tit-Gris 1, 1228 Plan-les-Ouates, Switzerland, [email protected]
Helmut Berger, max.recall information systems, Vienna, Austria, [email protected]
Harald Bosch, Institute for Interactive Systems and Visualization, Universität Stuttgart, Stuttgart, Germany
Ted Briscoe, University of Cambridge, Cambridge, UK, [email protected]; iLexIR Ltd, Cambridge, UK
Ben Carterette, University of Delaware, Newark, DE 19716, USA, [email protected]
Hamish Cunningham, Department of Computer Science, University of Sheffield, Sheffield, UK, [email protected]
