Git4Voc: Git-based Versioning for Collaborative Vocabulary Development

Lavdim Halilaj, Irlán Grangel-González, Gökhan Coskun, Sören Auer
EIS Department, University of Bonn, Bonn, Germany

arXiv:1601.02433v1 [cs.AI] 11 Jan 2016

Abstract: Collaborative vocabulary development in the context of data integration is the process of finding consensus between the experts of the different systems and domains. The complexity of this process increases with the number of people involved, the variety of the systems to be integrated and the dynamics of their domain. In this paper we advocate that the realization of a powerful version control system is the heart of the problem. Driven by this idea and the success of Git in the context of software development, we investigate the applicability of Git for collaborative vocabulary development. Even though vocabulary development and software development have many more similarities than differences, important differences remain. These need to be considered in the development of a successful versioning and collaboration system for vocabulary development. Therefore, this paper starts by presenting the challenges we faced during the collaborative creation of vocabularies and discusses how it differs from software development. Based on these insights we propose Git4Voc, which comprises guidelines on how Git can be adopted for vocabulary development. Finally, we demonstrate how Git hooks can be implemented to go beyond the plain functionality of Git by realizing vocabulary-specific features like syntactic validation and semantic diffs.

Index Terms: version control system; collaborative vocabulary development; git

I. INTRODUCTION

One of the key obstacles for the wider deployment of semantic technologies is the lack of comprehensive vocabularies. This is because vocabulary development requires a significant investment, which is difficult for a single person or organisation to make. If we look at current vocabularies (e.g. LOV, http://lov.okfn.org), we observe that they are rather simplistic. For a total of 457 vocabularies listed in LOV, a straightforward query against the LOV SPARQL endpoint tells us that the average number of classes per vocabulary is 42, whereas the average number of properties is 59. Omitting the four vocabularies with the highest number of classes and properties, these figures decrease to 31 classes and 37 properties on average. We also observe that a large number of crucial domains are not, or only superficially, covered by existing vocabularies. One of the main reasons for the lack of vocabularies is the lack of adequate methodological and tool support.

At the same time, the problem of integrating data from different systems receives ever-increasing attention. Identifying the main terms across heterogeneous data sources by finding a consensus between the developers and defining a shared vocabulary is an effective approach to tackle this problem. However, this process, which we refer to as collaborative vocabulary development, is itself a complex problem. In fact, the main challenge for the vocabulary engineers is to work collaboratively on a shared objective in a harmonious and efficient way while avoiding misunderstandings, uncertainty and ambiguity. The quality of the produced vocabularies is another challenge that should be tackled as well. In [1], we identified and elaborated important aspects of vocabulary development such as reuse, vocabulary structure, naming conventions, multilinguality, documentation, validation and authoring. These aspects are relevant from a collaborative point of view as well. Taking them into consideration will impact the quality of the vocabulary itself.

Finding a suitable collaboration methodology is further complicated by the number and diversity of the involved stakeholders as well as the complexity of the domains. Due to the open, distributed and participatory nature of the Web, such a solution is of paramount interest for the Semantic Web community.

Our approach to the mentioned problem is to focus on supporting collaborative vocabulary development with a well-known method for distributed version control in a domain-agnostic way. In this regard, we have chosen Git for the following two reasons. On the one hand, Git is a mature version control system supported by sophisticated tools and broadly used in software development projects. More than 10 million repositories (https://github.com/blog/1724-10-million-repositories) are hosted on GitHub for open source and commercial projects [2]. On the other hand, existing popular vocabularies like schema.org (https://github.com/schemaorg/schemaorg), Description of a Project (DOAP, https://github.com/edumbill/doap) and the music ontology (https://github.com/motools/musicontology) publish their efforts on GitHub to leverage the contribution of the community. This indicates that the vocabulary development community is already familiar with Git.

The remainder of this paper is structured as follows: In section II we present a comprehensive list of requirements aggregated from the current state of the art and our ongoing work on MobiVoc (https://github.com/vocol/mobivoc) and SCORVoc (http://purl.org/eis/vocab/scor). In section III we present Git4Voc, which comprises guidelines on how Git can be used for collaborative vocabulary development. With Git4Voc we propose to utilize Git's hooks mechanism to realize vocabulary-specific features. In section IV we demonstrate concrete examples of hook implementations. We provide an overview of related work in section V. The conclusion and an outlook on future work are presented in section VI.

II. REQUIREMENTS OF COLLABORATIVE VOCABULARY DEVELOPMENT

Collaborative vocabulary development is considered to be closely related to the broad field of software development. In fact, most proposals for supporting the former are inspired by experiences in the latter. However, a vocabulary is not the same as software code. The development of vocabularies raises challenges that are new, or at least do not arise to the same extent, in software development. In this section we focus on requirements which are more critical for vocabularies. We gathered these requirements by aggregating insights from the current state of the art and our own experiences during the development of MobiVoc and SCORVoc. In the following, these requirements are presented in detail.

Communication support (R1). Collaborative vocabulary development is about finding consensus between members of a team. In order to share ideas and find agreements, communication among the contributors is essential [3]. During the whole life cycle, especially in agile development, supporting and recording discussions, changes and their reasons is crucial [4]. This is especially important in the case of heterogeneous teams with experts from different domains. Some critical examples to be communicated within a team are introducing new elements, extending or modifying the subsumption hierarchy, integrating external resources and changing the underlying semantic expressivity [5]. Effective communication has a significant impact on the quality of the collaboration and its outcome.

Provenance of information (R2). In collaborative development the capability to track the changes made by contributors is an important feature [4]. This is due to the fact that each change in the vocabulary reflects the authors' understanding of the domain. In case of disagreements, it is necessary to know which change was made by whom, at which time and for what reason.

Different roles (R3). Creating vocabularies with the purpose of realizing data integration across heterogeneous independent systems involves domain experts from various fields with different levels of expertise. For instance, in large projects like the Gene Ontology (GO, http://www.geneontology.org/) many participants and curators take part in the development process. Most participants can only add comments and discuss terms. A core team is allowed to edit the main components of the vocabulary by adding modules, classes and properties, removing terms and performing refactoring. For that reason, there is a need for the definition of roles along with their permissions [5], [4], [6], [7].

Workflow independence (R4). The overall field of methodologies and workflows for collaborative vocabulary development is changing continuously [4]. To the best of our knowledge, there are no established methodologies or workflows which are broadly applied. Tools supporting collaboration should be generic and able to adapt to a highly dynamic context. Therefore, it is important that a system is flexible enough to be used within different methodologies and workflows.

Quality assurance (R5). Developing vocabularies includes many quality assurance requirements. Syntactic and semantic correctness as well as the application of best practices for designing vocabularies are some of the quality aspects. Therefore, tool support that prevents contributors from making errors is a significant feature. Later correction phases might waste resources in terms of time and money.

Documentation generation (R6). As mentioned before, a team for vocabulary development comprises domain experts with less technical expertise in knowledge representation and engineering tools. In order to enable them to contribute to the development process, providing a user-friendly view of the current state of the vocabulary is vital. Therefore, an automatic documentation generation feature is necessary.

Deltas among versions (R7). Collaborative development of vocabularies should respond to the evolution of the knowledge domain [7]. It should also respect the evolution of connected vocabularies within the Linked Data Cloud, in order to avoid semantic inconsistencies. Therefore, support for detecting and documenting the semantic difference between versions is needed to enable developers to understand these evolutions. This includes the modification and addition of elements (i.e. classes, properties) as well as the removal of existing terms. Authors of well-known vocabularies such as SKOS (http://www.w3.org/2004/02/skos/history) and schema.org (http://schema.org/docs/releases.html) publish release notes describing what has changed between versions.

Editor agnostic (R8). In contrast to software code, vocabularies are abstract artefacts which can be serialized with different techniques. Since contributors can use different editors which style the syntax in different ways, the collaboration support must be editor agnostic and syntax independent.

Modularity (R9). Modularization is recognized as an important step in collaborative vocabulary building [8]. Reusability, decreased complexity, ownership and customization are some of the benefits of vocabulary modularization. Some studies report that there is no universal way to perform this process and that the choice of a particular technique should be guided by application-specific requirements [9]. In contrast, other reports show that a module in a mid-sized vocabulary should contain between 200 and 300 lines of code [10]. Especially in an agile development process with large vocabularies and many contributors, it is of paramount importance that the system provides means to support the modularization activity.

TABLE I: Different roles and their primary activities.
Roles | Basic Activities | Semantic Issues | Structural Issues
Vocabulary Eng. | + | + | +
Domain Expert | + | - | -
Users | - | - | -
Translators | + | - | -

Multilinguality (R10). In order to be applicable to a wide range of cultures and communities, vocabulary terms must be translated into various languages [11]. The localization (and internationalization) process of vocabularies should be supported by the system.

Labeling versions (R11). Release versions of vocabularies should be labeled appropriately. This ensures that users, whether humans or machines, always have the possibility to use a specific version, not only the latest one.

III. GIT4VOC

In this section we present Git4Voc. On the one hand, we propose guidelines on how Git can be used for a collaborative vocabulary development project. On the other hand, we present how the requirements from section II can be technically implemented by Git hooks. Additionally, in terms of guidelines, we analyzed best practices from collaborative software development and identified the following aspects as critical for the quality of the vocabulary: (1) management of generated information; (2) rights management; (3) branching and merging; (4) automating development and deployment tasks by hooks; (5) tool independence; (6) vocabulary organization structure; and (7) labeling of release versions. In the next subsections we show in detail how our approach responds to the above mentioned requirements.

A. Management of Generated Information

During the development process a large amount of information is generated by the contributors. The capability to manage this information within the entire project life cycle is essential. In fact, value-added services like GitHub, GitLab or BitBucket enrich Git functionality with powerful information management features. For instance, issues are a convenient way of tracking communications, reporting problems and bug fixes, and announcing version releases. Communities like schema.org manage their discussions using GitHub. The above mentioned means support requirement (R1). Based on this fact, we propose that the activities gathered in Table II should be documented. If possible, the names of issues should correspond to the names of the activities.

Another important requirement in collaborative vocabulary development is the ability to view the history of the changes (called traceability in software engineering). This addresses requirement (R2). Using the commands git log and git diff, a user can explore the history of the commits and the differences between them. Each commit should follow the Best Commit Practices (http://www.git-tower.com/learn/git/ebook/command-line/appendix/best-practices). In vocabulary development the atomicity of commits is of paramount importance.

B. Rights Management

Standalone solutions such as GitLab (https://gitlab.com/gitlab-org/gitlab-ce/blob/master/doc/permissions/permissions.md) and Gitolite (https://github.com/sitaramc/gitolite) as well as third-party services like Bitbucket (https://confluence.atlassian.com/display/BITBUCKET/Add+Users,+Set+Permissions,+and+Review+Account+Plans) and GitHub (https://help.github.com/articles/permission-levels-for-an-organization-repository) offer basic options for user rights management, like reading, writing, posting, adding new team members and adding tags. However, even with these solutions a fine-grained level of user management, i.e. restricting editing to a specified number or type of classes, properties or instances, cannot be achieved with Git. In order to address requirement (R3), we explore a combination of branching and hooks.

By combining branching and hooks with role definitions for users, fine-grained access management can be achieved. Concretely, rights management on top of user roles can be realized with server-side hooks. For instance, an implementation of a pre-push hook can check the user's role and permissions and deny the operation if the necessary rights for the activity and branch are not set.

Table I shows common roles and their permissions with respect to the defined categories of activities. In a trusted environment, rights management can also be realized with client-side hooks. An example of this is depicted in Listing 3, where the user is prevented from committing to the master branch.

C. Branching and Merging

Git is a very flexible tool, which addresses requirement (R4). Using Git, teams are able to organize their work in different types of workflows (https://www.atlassian.com/git/tutorials/comparing-workflows/forking-workflow). Branching strategies affect the quality in collaborative software development [12], [13]. Vocabulary development is mostly accepted to be a specific type of software development. Therefore, the branching strategy is considered to affect the quality of the vocabularies as well. Well-known projects such as schema.org use branches to organize their work. In order to design a branching model, it is important to understand the possible activities that a team can perform. In this regard, we collected common activities of collaborative vocabulary development, which are listed in Table II. Aiming at producing a vocabulary of good quality, the entire team should be aware of these activities and how to handle them in the development process. Due to their impact on the overall vocabulary, we have classified these activities into three categories: (1) basic activities (ACT1, ACT7, ACT9), (2) semantic issues (ACT2, ACT3, ACT4, ACT5, ACT6, ACT8) and (3) structural issues (ACT10, ACT11).

TABLE II: Common Activities in Collaborative Vocabulary Engineering

Activity | Name | Description | Example
ACT1 | Simple Addition/Deletion | Adding new or deleting existing elements like classes and properties | Adding a class in the last level of the taxonomy
ACT2 | Complex Addition/Deletion | Adding new elements to be interconnected within the existing class or property taxonomy | Adding an object property as a super property of two existing properties
ACT3 | Modification | Modifying existing elements | Modifying the domain and range of an existing object property
ACT4 | Reusing | Reusing elements of the Linked Data Cloud | Defining new local concepts by using external resources
ACT5 | Alignment | Alignment of existing elements with equivalents in the cloud | Alignment of classes and instances with owl:equivalentClass and owl:sameAs
ACT6 | Refactoring | Changing the name and metadata of a specific element and its connections | Renaming a class which is connected in many domain and range relations of properties and needs to be renamed everywhere
ACT7 | Common Metadata | Adding/Removing/Modifying predefined RDFS metadata | Adding metadata to a class with rdfs:label, rdfs:comment
ACT8 | External Metadata | Adding/Removing/Modifying external metadata | Adding metadata to a class with skos:prefLabel, dc:title
ACT9 | Translating | Adding/Removing/Modifying translations for the terms | Using rdfs:label to translate elements into different languages
ACT10 | Modularization | Adding new modules to the existing vocabulary | Creating and integrating new modules due to new requirements
ACT11 | Partitioning | Partitioning into different modules with existing elements | The vocabulary has grown in size and semantic complexity. Partitioning the existing vocabulary into different modules

Fig. 1: Branching model for vocabulary development. (Diagram: a Tags row with v1.0.0, v2.1.6, v2.9.8, v3.5.1 and v4.2.1 above the branches Master, Semantic Issues, Develop and Structural Issues, drawn along a time axis.)

This led us to the branching model that is depicted in Figure 1. We designed different branches to handle the mentioned categories. Basic activities have to be performed in the Develop branch. For the second category we propose a dedicated branch called Semantic Issues. For the third category a branch named Structural Issues has to be used. It is important to bear in mind that we are not restricting the flexibility of Git regarding branches. On the contrary, other branches can be used to complement this model. Nevertheless, our branching model will help developers because those branches are connected to specific activities in collaborative vocabulary development. Our solution is built on top of the best practices for branching in software development (http://nvie.com/posts/a-successful-git-branching-model).

D. Automate Development and Deployment Tasks by Hooks

Although Git has many built-in features, it allows extending its functionality by using so-called hooks. This is a mechanism that allows running scripts before or after specific Git events. Based on the place of execution, two types of hooks are distinguished: client-side and server-side hooks. Due to space limitations, the examples in this article showcase only client-side hooks. In order to address requirements (R5) and (R6), we implemented the following three important tasks for collaborative vocabulary development: (1) syntax checking, (2) assessing vocabularies against best design practices and (3) documentation generation. Figure 2 illustrates how these examples are integrated into the commit process.

Fig. 2: Client-side Hooks Workflow. (Diagram: Commit Start; Pre-Commit Hook with local syntax validation by Rapper and best design practices checking by the OOPS Web Service, each with a Yes/No outcome; Commit Done; Post-Commit Hook with local documentation generation by Parrot.)

After modifying the local vocabulary and staging the changes, the next step is to commit the current state to the local repository. Initiating the commit triggers a hook named pre-commit. Listing 3 shows our implementation of this hook, which realizes the tasks of syntax checking and best practice assessment. First, it retrieves all modified files with extensions such as rdf, owl, ttl and checks for syntax errors by using Rapper (http://librdf.org/raptor/). If a vocabulary fails to pass the validation process, the commit is canceled. The user is notified with a message which shows a detailed description of the error, comprising the file name, the line number and the error type. If syntax validation is passed successfully, the modified files are posted to the OOPS Web Service (http://oops-ws.oeg-upm.net/) through curl, a command-line HTTP client. This service assesses vocabulary files against certain quality metrics. The result is a descriptive message that contains recommendations of best practices for vocabulary development. If no errors exist, the pre-commit hook finishes and the commit is accepted. Afterwards a post-commit hook is called. Listing 4 demonstrates our implementation of a post-commit hook for documentation generation in a human-friendly format. This script uses Parrot (https://bitbucket.org/fundacionctic/parrot/) as an external tool.

For security reasons, Git repository services do not allow predefined hooks to be distributed automatically when cloning. In order to accomplish this task, the repository itself should have a dedicated folder that contains the implemented hooks. After the first clone, these hooks need to be copied to the .git/hooks directory. For that purpose, we implemented a script which needs to be executed after cloning the repository. Once this process is finished, the predefined hooks will be executed automatically on each commit. However, when the hooks have been changed, e.g. to use different validation or documentation generation tools, this script has to be executed again. Apart from installing the hooks, this script can also be used to download and install tools like Rapper, which are necessary for the hooks. If these tools are to be placed within the local repository, the file .gitignore should be used to prevent them from being pushed to the remote repository.

Git does not show semantic diffs between versions of a vocabulary. Owl2VCS [14] shows deltas among different versions. By using such a tool together with hooks, the generated deltas can be published in a human-friendly format as well. This corresponds to requirement (R7).

E. Tool Independence

Collaborative work with Git can be facilitated by using vocabulary editors like Protégé (http://protege.stanford.edu/), TopBraid Composer (http://www.topquadrant.com/tools/IDE-topbraid-composer-maestro-edition/) and Neon Toolkit (http://neon-toolkit.org/wiki/Main_Page.html). As each of them has different algorithms for writing files, consistency problems might arise if contributors are not using the same editor. For instance, one contributor uses Protégé, whereas another one uses Neon Toolkit, and they are editing the same file simultaneously. After saving it, different representations of that file will be created. As a consequence, Git recognizes a lot of changes and asks for conflict resolution. This is due to the fact that Git is a version control system based on changes to lines of text. It detects when a line has been changed from the previous version. In such a case using the merge tool is necessary, which is a time-consuming and error-prone task that could lead to information loss.

In order to avoid the above mentioned problems, we propose the use of the Turtle format. This addresses requirement (R8). A similar approach describes a pattern to express data on GitHub by storing it in CSV files (http://blog.okfn.org/2013/07/02/git-and-github-for-data/). Listing 1 presents our proposal to write one triple per line.

Listing 1: One triple per line

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl: <http://www.w3.org/2002/07/owl#> .
    @prefix scor: <http://purl.org/eis/vocab/scor#> .
    @prefix vs: <http://www.w3.org/2003/06/sw-vocab-status/ns#> .

    scor:Process rdf:type owl:Class ;
        rdfs:comment "A process is a unique activity..."@en ;
        rdfs:label "Process"@en ;
        rdfs:isDefinedBy scor: ;
        vs:term_status "testing" .

    scor:Enable rdfs:subClassOf scor:Process ;
        rdfs:comment "Enable describes the ..." ;
        rdfs:label "Enable" .

F. Vocabulary Organization Structure

Git's basic functionality does not support modularizing code or vocabularies. Therefore, in order to address requirement (R9), we propose some guidelines for organizing the vocabulary in files where each file represents a module. Considering that each line should represent a triple, and based on the insights of [10], we propose that files should not contain more than 300 triples. We highlight three possible forms of organizing the files. All of these cases use a single Git repository to store the files.

1. The complete vocabulary is contained in one single file. When the vocabulary is small (e.g. contains fewer than 300 lines of code) and represents a domain which cannot be divided into sub domains, it should be saved within one single file. If the number of contributors is relatively small and the domain of the vocabulary is very focused, organizing it into one single file might be possible even if it exceeds 300 lines of code. However, if comprehensibility suffers, splitting it into different files should be considered.

2. The vocabulary is split into multiple files. If the vocabulary contains more than 300 lines of code or covers a complex domain, it should be organized into different sub domains or modules. In this regard, we mapped sub domains to modules. When the sub domains themselves are small enough, they should be represented by different files within the parent folder. There exist patterns for vocabulary modularization [15]. We developed MobiVoc based on the pattern "n modules importing 1 module". In this case, the 1 module was the vocabulary itself. The n modules, such as Aircraft and Fuel, were saved in separate files. Each file represents a specific sub domain. By following this approach, domain experts can contribute independently to vocabulary development according to their field of expertise. Figure 3 depicts the structure of MobiVoc and its modules.

Fig. 3: The structure of MobiVoc. (Diagram: the vocabulary MobiVoc with its modules Aircraft, Fuel, Charging Points, Parking, Energy, Motor Vehicle, Filling Stations, Bike Sharing, Low Emission Zone, Means of Transport and Related Vehicle.)

3. Vocabulary modules are stored in files and folders. For huge vocabularies that comprise complex domains, splitting them into files is not sufficient. This would lead to a large number of files within a single folder. Therefore, if the sub domains are large enough to be split into files, they should be represented by folders. Each folder contains files which represent modules. In this case, the folder and file structure should reflect the complex hierarchy of the overall domain.

By splitting the vocabulary into files for specific purposes, requirement (R10) is addressed as well. This can be achieved by creating dedicated files for translations. In these files, users with the role Translators can contribute by translating the terms into the required language.

G. Labeling of Release Versions

Based on requirement (R11), proper labeling of release versions is vital, as it facilitates reusability. One common way to realize this is to deploy each release version in a different file. However, this could lead to the following problems, as identified in [16]: (1) the number of files could increase rapidly, (2) choosing versions creates confusion, (3) maintenance needs additional resources and (4) synchronizing dependent applications with the latest version requires additional effort. To avoid these problems, we keep all versions of a vocabulary in the same file. The versions are separated by Git's tagging functionality and saved in the master branch, which is part of the branching model illustrated in Figure 1. It is possible to create and filter tags at any time. Moreover, users can obtain a specific version of the vocabulary just by giving the tag name. Therefore, each released version of a vocabulary must have a version number. Based on the scheme from [17] and the categories of activities derived from Table II, we propose tagging different versions according to the following pattern: v[StI.SeI.BA], where StI stands for Structural Issues, SeI for Semantic Issues and BA for Basic Activities. Each category is associated with a number in the respective position. Changes in the vocabulary belonging to one of the categories are commonly reflected by increasing the corresponding number. For instance, the difference between releases v[1.0.0] and v[2.0.0] indicates structural issue changes (StI).

IV. IMPLEMENTATION

We have developed Git4Voc (https://github.com/vocol/vocol/tree/master/Git4Voc), which is an environment for collaborative vocabulary development. Table III provides an overview of which of the previously described requirements are fulfilled by Git4Voc. This solution combines Git4Voc with a set of state-of-the-art tools like Rapper, the OOPS Service and Parrot. Each tool is exchangeable and can easily be replaced by alternatives. They provide services which are called through the hooks mechanism. In the following, these hooks are presented in detail.

Listing 2 shows an example of how predefined hooks are copied into the .git/hooks folder after cloning the repository. In addition, it shows the installation of the tools Raptor and Parrot and their necessary libraries in case they do not exist.

Listing 2: Install Hooks and Tools

    #!/bin/sh
    # Copy the predefined hooks into the local repository and make them executable
    for hook in hooks/*; do
        cp "$hook" ".git/hooks/$(basename "$hook")"
        chmod +x ".git/hooks/$(basename "$hook")"
    done

    # Create the directory for the necessary tools if it does not exist yet
    if [ ! -d tools ]; then
        mkdir -p tools
        # ...
    fi
    cd tools || exit 1

    # Install Raptor (which provides the rapper command) if it is missing
    if ! command -v rapper >/dev/null 2>&1; then
        curl -O http://download.librdf.org/source/raptor2-2.0.15.tar.gz
        # ...
        sudo apt-get install libxml2-dev libxslt1-dev python-dev
        sudo apt-get -y install raptor2-utils
        # ...
    fi

    # Install Parrot if it is missing (-L follows redirects)
    if [ ! -e parrot-jar-with-dependencies.jar ]; then
        curl -L -O https://github.com/vocol/vocol/raw/master/Hooks/tools/parrot-jar-with-dependencies.jar
    fi
    # ...

The pre-commit hook is adapted to prevent users from committing to the master branch, as shown in Listing 3. This example can be further customized to restrict committing to other branches as well. By doing so, a low level of rights management is achieved on the local repository, before the changes are pushed to the remote repository. Furthermore, to reduce the effort needed for subsequent corrections, we integrated tools for (1) syntax validation and (2) checking for bad modeling practices. For the first, the Rapper tool is used, which validates each Turtle file for syntactic errors. For the second, we used the OOPS Web Service to scan vocabulary files for bad modeling practices.

TABLE III: Collaborative Requirements from Git4Voc perspective.
No. | Requirement | Supported by plain Git | Means
R1 | Communication support | + | Issue tracking system offered by the hosting platforms GitHub, GitLab, BitBucket
R2 | Provenance of information | + | git log and git diff functionality
R3 | Different roles | + | Further extension by a combination of branching and hooks
R4 | Workflow independence | + | Git, branching and merging strategies
R5 | Quality assurance | - | Extended by a combination of hooks and tools like Rapper, Jena Riot
R6 | Documentation generation | - | Extended by a combination of hooks and documentation generation tools like SchemaOrg, Widoco
R7 | Deltas among releases | - | Extended by a combination of Owl2VCS as an external tool and hooks
R8 | Editor agnostic | + | Git base functionality
R9 | Modularity | + | Hierarchical organization of vocabularies where modules are represented by files
R10 | Multilinguality | + | Hierarchical organization of vocabularies where dedicated files are used for multilinguality
R11 | Labeling versions | + | Git tag functionality

Listing 3: Pre-Commit Hook: Syntax validation and checking for bad modeling practices

    #!/bin/bash
    # Note: error and succeed are boolean flags maintained in the elided parts.
    currentBranch=$(git symbolic-ref HEAD)

    if [ "$currentBranch" = "refs/heads/master" ]; then
        echo "Not allowed to commit to master branch!"
        exit 1
    else
        # Get only the added, copied or modified files with the ttl extension
        files=$(git diff --cached --name-only --diff-filter=ACM | grep '\.ttl$')
        # ...
        for file in ${files}; do
            # Validate each file using Rapper (-c only counts the triples)
            res=$(rapper -i turtle -c "${file}" 2>&1)
            # ...
        done

        if ! $error; then
            for file in ${files}; do
                # Send the content of each file to the OOPS Web Service
                fileContent=$(cat "${file}")
                request="<?xml ...>"
                res=$(curl -X POST -d "$request" -H "Content-Type: application/xml" http://oops-ws.oeg-upm.net/rest)
                # ...
            done
        fi

        if ${succeed}; then
            echo "COMMIT SUCCEEDED"
        else
            echo "COMMIT FAILED"
            exit 1
        fi
    fi

Listing 4 shows the post-commit hook, which uses Parrot to create a human-friendly representation of the developed vocabulary. The generated content is saved as a single HTML file which contains all vocabulary elements. The user is able to navigate through the entire vocabulary by merely selecting the element name. Moreover, in order to create different representation styles of the vocabulary, alternative tools such as Widoco (https://github.com/dgarijo/Widoco), Specgen (https://github.com/specgen/specgen), Dowl (https://github.com/ldodds/dowl), etc. (cf. the commented part of the code) can be used as well.

Listing 4: Post-Commit Hook: Documentation Generation

    #!/bin/sh
    # After the commit the index equals HEAD, so the files are taken
    # from the commit that has just been created.
    files=$(git diff-tree --no-commit-id --name-only -r --diff-filter=ACM HEAD | grep '\.ttl$')

    if [ "$files" = "" ]; then
        exit 0
    fi

    for file in ${files}; do
        # Generate documentation using Parrot
        java -jar @path/parrot-jar-with-dependencies.jar -i "${file}" -o "${file}".html
        # Generate documentation using Widoco
        # java -jar @path/widoco-0.0.1-jar-with-dependencies.jar -ontFile "${file}" -outFolder /home/ ...
    done

    printf 'Documentation Generation is completed.\n'
    exit 0

V. RELATED WORK

Collaborative vocabulary development is an active research area in the Semantic Web community [19]. Among existing approaches, WebProtégé [20] provides a collaborative web frontend for a subset of the functionality of the Protégé OWL editor. The aim of WebProtégé is to lower the threshold for collaborative ontology development. Neologism [21] is a vocabulary publishing platform with a focus on ease of use and compatibility with Linked Data principles. Neologism focuses more on vocabulary publishing and less on collaboration. VocBench [22] is an open source web application for editing thesauri complying with the SKOS and SKOS-XL standards. VocBench has a focus on collaboration, supported by workflow management for content validation and publication.

The main limitation of the aforementioned tools is the lack of version control. Therefore, we only consider approaches focused on using version control systems for collaborative vocabulary development.

SVoNt [23] extends the functionality of Apache Subversion (SVN) by providing a possibility for versioning ontologies that conform to a lightweight OWL description logic. SVN operates only on deltas of files; therefore SVoNt uses a separate server to create conceptual changes between versions of ontologies. These changes are generated as a result of a diff operation between the modified ontology and the base ontology. ContentCVS [24] is a Protégé plugin. It adapts concepts from concurrent versioning to enable developers to work in parallel. Moreover, it has features for conflict detection and resolution by checking the structure and semantics of the ontology versions. In [17] it is described how the developers of the RDA Vocabularies adopt rules from SemVer to realize meaningful versioning using Git. Additionally, it provides general notes for organizing the vocabulary development in branches. [25] describes Owl2VCS, a toolset designed to facilitate version control of OWL 2 ontologies using version control systems. It can be integrated as an external tool with Git, Mercurial and Subversion and provides algorithms for structural diffs [14]. However, none of the above mentioned approaches covers all the identified requirements (cf. section II) for collaborative vocabulary development.

[2] E. Kalliamvakou, D. Damian, K. Blincoe, L. Singer, and D. German, "Open source-style collaborative development practices in commercial projects using GitHub," in 37th International Conference on Software Engineering (ICSE 15), 2015.
[3] N. F. Noy, A. Chugh, W. Liu, and M. A. Musen, "A framework for ontology evolution in collaborative environments," in ISWC. Springer, 2006, pp. 544–558.
[4] N. F. Noy and T. Tudorache, "Collaborative ontology development on the (semantic) web," in AAAI Spring Symposium: Symbiotic Relationships between Semantic Web and Knowledge Engineering, 2008, pp. 63–68.
[5] J. Seidenberg and A. L. Rector, "The state of multi-user ontology engineering," in WoMO. Citeseer, 2007.
On [6] T.Tudorache,N.F.Noy,S.M.Falconer,andM.A.Musen,“Aknowl- the contrary, our work analyze and address each one of them edgebasedrivenuserinterfaceforcollaborativeontologydevelopment,” in16thInternationalConferenceonIntelligentUserInterfaces. ACM, by using Git and Git4Voc as an extension. 2011,pp.411–414. [7] E.SimperlandM.Luczak-Ro¨sch,“Collaborativeontologyengineering: VI. CONCLUSIONANDFUTUREWORK asurvey,”TheKnowledgeEngineeringReview,vol.29,no.01,pp.101– 131,2014. Inthispaper,weinvestigatedtheapplicabilityofGitforcol- [8] M. C. Suarez-Figueroa, A. Go´mez-Pe´rez, and M. Fernandez-Lopez, laborative vocabulary development. We defined collaborative “The neon methodology for ontology engineering,” in Ontology engi- vocabularydevelopmentastheprocessofidentifyingthemain neeringinanetworkedworld. Springer,2012,pp.9–34. [9] M.d’Aquin,A.Schlicht,H.Stuckenschmidt,andM.Sabou,“Ontology terms across heterogeneous data sources by finding a consen- modularizationforknowledgeselection:Experimentsandevaluations,” susbetweenthedevelopers.Themainchallengeinthisregard in Database and Expert Systems Applications. Springer, 2007, pp. istherealizationofapowerfulcollaborativeenvironment.Dis- 874–883. [10] A. Schlicht and H. Stuckenschmidt, “Towards structural criteria for tributed version control systems enable developers around the ontology modularization,” in ISWC 2006 Workshop on Modular On- world to work collaboratively on complex software systems. tologies. Citeseer,2006. Sincesoftwareandvocabulariesarenotthesame,weanalyzed [11] J. Gracia, E. Montiel-Ponsoda, P. Cimiano, A. Go´mez-Pe´rez, P. Buite- laar, and J. McCrae, “Challenges for the multilingual web of data,” their differences in detail by identifying requirements for a JournalofWebSemantics,vol.11,pp.63–71,2012. version control system that supports collaborative vocabulary [12] S. Phillips, J. Sillito, and R. Walker, “Branching and merging: an development. 
Our approach extends plain Git functionality by investigationintocurrentversioncontrolpractices,”in4thInternational WorkshoponCooperativeandHumanAspectsofSoftwareEngineering. utilizing the hooks mechanism in combination with external ACM,2011,pp.9–15. tools to address these requirements. The presented approach [13] E. Shihab, C. Bird, and T. Zimmermann, “The effect of branching is easily extensible and can accommodate additional external strategiesonsoftwarequality,”inACM-IEEEInternationalSymposium onEmpiricalSoftwareEngineeringandMeasurement(ESEM). IEEE, tools. 2012,pp.301–310. Regarding the future work, we are going to extend our [14] R.S.Gonc¸alves,B.Parsia,andU.Sattler,“Ecco:Ahybriddifftoolfor approachwiththefullimplementationofserversidehooks.By owl2ontologies.”inOWLED,2012. [15] S.B.Abbes,A.Scheuermann,T.Meilender,andM.d’Aquin,“Charac- doing so, tasks like: deploying specific versions of vocabular- terizingmodularontologies,”in7thInternationalConferenceonFormal iestoadedicatedserver,generatingdeferencableURI’s,ontol- OntologiesinInformationSystems(FOIS),2012,pp.13–25. ogypartitioningandmodularizationtaskscanbeperformedin [16] P. Bedi and S. Marwaha, “Versioning owl ontologies using temporal tags,”InternationalJournalofComputer,Control,QuantumandInfor- a fully automated way. We also plan to develop and integrate mationEngineering,2007. a tool that validates vocabularies against conventions [1] and [17] J.P.DianeI.Hillmann,GordonDunsire,“Versioningvocabulariesina provides recommendations for solving possible issues. This linkeddataworld,”IFLALion,2014. [18] VoCol: An Agile Methodology and Environment for Collaborative will lead to a convenience and less error prone collaborative Vocabulary Development. Zenodo, Feb. 2015. [Online]. Available: vocabulary development environment. As a result, all gener- http://dx.doi.org/10.5281/zenodo.15023 ated artefacts will be publicly accessible from all interested [19] R. Palma, O. Corcho, A. Go´mez-Pe´rez, and P. 
Haase, “A holistic approachtocollaborativeontologydevelopmentbasedonchangeman- parts. agement,”JournalofWebSemantics,vol.9,no.3,pp.299–314,2011. [20] T. Tudorache, C. Nyulas, N. F. Noy, and M. A. Musen, “Webprote´ge´: ACKNOWLEDGMENTS Acollaborativeontologyeditorandknowledgeacquisitiontoolforthe web,”SemanticWeb,vol.4,no.1,2013. This work is supported by the German Ministry for Ed- [21] C. Basca, S. Corlosquet, R. Cyganiak, S. Ferna´ndez, and T. Schandl, ucation and Research funded project LUCID and European “Neologism:Easyvocabularypublishing,”2008. [22] A. Stellato, S. Rajbhandari, A. Turbati, M. Fiorelli, C. Caracciolo, Commission under the Seventh Framework Program FP7 for T. Lorenzetti, J. Keizer, and M. T. Pazienza, “Vocbench: A web grant 601043 (http://diachron-fp7.eu). application for collaborative development of multilingual thesauri,” in TheSemanticWeb.LatestAdvancesandNewDomains. Springer,2015, REFERENCES pp.38–53. [23] M.Luczak-Ro¨sch,G.Coskun,A.Paschke,M.Rothe,andR.Tolksdorf, [1] I.Grangel-Gonza´lez,L.Halilaj,G.CoskunandS.Auer,“Towardsvo- “Svont-version control of owl ontologies on the concept level.” GI cabularydevelopmentbyconvention,”in7thInternationalJointConfer- Jahrestagung(2),vol.176,pp.79–84,2010. enceonKnowledgeDiscovery,KnowledgeEngineeringandKnowledge [24] E. Jime´nez-Ruiz, B. C. Grau, I. Horrocks, and R. B. Llavori, “Con- Management,Vol2,2015,pp.334–343. tentcvs: A cvs-based collaborative ontology engineering tool.” in SWAT4LS. Citeseer,2009. 31http://www.rdaregistry.info [25] I. Zaikin and A. Tuzovsky, “Owl2vcs: Tools for distributed ontology 32http://www.semver.org development.”inOWLED. Citeseer,2013.
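
The customization mentioned for the pre-commit hook of Listing 3, namely restricting commits on branches other than master, might be sketched as follows. This is a minimal illustration rather than the implementation used in Git4Voc: the function names is_protected_ref and guard_commit, as well as the list of branch names, are assumptions introduced here.

```shell
#!/bin/sh
# Sketch: reject commits on every branch listed in PROTECTED_BRANCHES.
# The branch names are hypothetical examples, not taken from the paper.
PROTECTED_BRANCHES="master develop"

# is_protected_ref REF
# Succeeds (status 0) if REF, e.g. "refs/heads/master", names a protected branch.
is_protected_ref() {
    for protected in ${PROTECTED_BRANCHES}; do
        if [ "$1" = "refs/heads/${protected}" ]; then
            return 0
        fi
    done
    return 1
}

# guard_commit REF
# Prints a message and fails (status 1) if committing to REF is not allowed.
guard_commit() {
    if is_protected_ref "$1"; then
        # Strip the "refs/heads/" prefix to show the plain branch name.
        echo "Not allowed to commit to ${1#refs/heads/} branch!"
        return 1
    fi
    return 0
}
```

Inside a pre-commit hook, the check could run before the validation steps by calling guard_commit "$(git symbolic-ref HEAD)" || exit 1. Keeping the branch test in a function that takes the ref as an argument means it can be tried out without a Git repository.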