Semantic Control for the Cybersecurity Domain: Investigation on the Representativeness of a Domain-Specific Terminology Referring to Lexical Variation

193 Pages·2022·5.543 MB·English
This book presents the creation of a bilingual thesaurus (Italian and English) for term management in the Cybersecurity field of knowledge, and its conversion into an ontology system. It also identifies a method, replicable in other specialized areas of study, that uses computational linguistics procedures to measure, statistically and qualitatively, the terminological coverage threshold that a controlled vocabulary can guarantee with respect to the semantic richness of the domain under investigation.

The volume enables readers to compile and study significant corpora of documents to support text mining tasks and to evaluate the representativeness of the information retrieved. By describing several techniques from linguistics and knowledge engineering, this monograph gives a methodological account of how to enhance and update semantic monitoring tools that reflect a specialized lexicon such as that of Cybersecurity, so as to provide a reference semantic structure for domain-specific text classification tasks.

This volume is a valuable reference for scholars of corpus-based studies, terminology, ICT, documentation and librarianship studies, text processing research, and distributional semantics, as well as for professionals working in Cybersecurity organizations.
Semantic Control for the Cybersecurity Domain
Investigation on the Representativeness of a Domain-Specific Terminology Referring to Lexical Variation

Claudia Lanza

First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2023 Claudia Lanza

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-25080-9 (hbk)
ISBN: 978-1-032-25081-6 (pbk)
ISBN: 978-1-003-28145-0 (ebk)

DOI: 10.1201/9781003281450

Typeset in CMR10
by KnowledgeWorks Global Ltd.
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.

Contents

List of Figures
List of Tables
Foreword
Preface

1 Background
1.1 Corpus linguistics
1.1.1 Corpus representativeness
1.1.2 Thesauri term representativeness
1.1.3 Corpus design
1.2 Text mining
1.2.1 Data pre-processing
1.3 Terminological knowledge bases (TBK)
1.3.1 Automatic term extraction (ATE)
1.3.2 Gold standards reference
1.3.3 Domain-dependency
1.3.4 Terms population

2 Case study: Cybersecurity domain
2.1 OCS project
2.2 Specialized languages reflecting specialized domains
2.3 Cybersecurity field of knowledge
2.4 Corpus design for Cybersecurity domain in Italian language
2.5 Existing resources for Cybersecurity domain
2.5.1 Italian resources
2.5.2 English resources
2.6 Group of experts supervision

3 Related works
3.1 KOS
3.1.1 Thesauri
3.1.2 Ontologies
3.2 Semantic conversions – from thesauri to ontologies
3.3 SKOSs systems
3.4 Research approaches
3.4.1 Clustering approaches
3.4.2 Keyphrases extraction
3.4.3 NLP approaches for building semantic structures
3.4.3.1 Distributional similarity

4 Research methodology
4.1 Corpus construction
4.1.1 Documents selection
4.1.2 Software-aided terminological extractions – First selection
4.1.2.1 T2K – Linguistic-oriented tool
4.1.2.2 TermSuite
4.1.2.3 Pke - keyphrases detection
4.1.2.4 Results
4.2 Candidate terms selection
4.2.1 Frequency criterion
4.2.2 Mapping with gold standards
4.2.3 Expert support system
4.3 Automatization of thesaurus construction
4.3.1 Approaches
4.3.2 Variants
4.3.3 Patterns-based
4.3.3.1 Causative relations
4.3.3.2 Hierarchy
4.3.3.3 Synonyms
4.3.4 Word embeddings detection

5 Semantic tools for Cybersecurity
5.1 Construction of Italian Cybersecurity thesaurus
5.1.1 Semantic relationships
5.1.1.1 Hierarchy
5.1.1.2 Equivalence
5.1.1.3 Association
5.1.1.4 Scope notes
5.2 Ontology conversion
5.2.1 Structure
5.3 Discussion

6 Semantic enhancement and new perspectives
6.1 Multilingual alignment
6.2 Monitoring of terminological representativeness

7 Conclusion

References
Index

List of Figures

1 PhD research phases
1.1 Approaches for corpus representativeness
2.1 Statistics OCS website
2.2 MeSH tree view for virus
2.3 Hierarchy of laws
2.4 Cybersecurity National structure as reported in the White Book on Cybersecurity (2018: 18) [147]
3.1 KOS diagram classification [127]
4.1 Magazines from Ulrich’s web and BNCF for the Cybersecurity domain
4.2 Results of termhood measures
4.3 Synthesis of the main first candidate terms with reference to the shared knowledge contained in the evaluation lists
4.4 New terms derived by the collaboration with the group of experts
4.5 Comparison of RT in thesaurus and patterns paths outputs
5.1 Subject category and broader term
5.2 Cybersecurity hierarchical structure
5.3 Equivalent terms
5.4 Related terms
5.5 Scope Notes for terms in the thesaurus
5.6 Ontology graph for the Italian Cybersecurity thesaurus conversion
5.7 Thesaurus representation of the semantic relationship that describes opposition
