This is a repository copy of Improving Archaeologists' Online Archive Experiences Through User-Centred Design. White Rose Research Online URL for this paper: https://eprints.whiterose.ac.uk/109266/ Version: Published Version Article: Power, Christopher Douglas orcid.org/0000-0001-9486-8043, Lewis, Andrew, Petrie, Helen orcid.org/0000-0002-0100-9846 et al. (7 more authors) (2017) Improving Archaeologists' Online Archive Experiences Through User-Centred Design. ACM Journal on Computing and Cultural Heritage. ISSN 1556-4711 https://doi.org/10.1145/2983917 Reuse Items deposited in White Rose Research Online are protected by copyright, with all rights reserved unless indicated otherwise. They may be downloaded and/or printed for private study, or other acts as permitted by national copyright laws. The publisher or other rights holders may allow further reproduction and re-use of the full text version. This is indicated by the licence information on the White Rose Research Online record for the item. Takedown If you consider content in White Rose Research Online to be in breach of UK law, please notify us by emailing [email protected] including the URL of the record and the reason for the withdrawal request. [email protected] https://eprints.whiterose.ac.uk/ Improving Archaeologists’ Online Archive Experiences Through User-Centred Design CHRISTOPHERPOWER,ANDREWLEWIS,HELENPETRIE,KATIEGREEN,andJULIANRICHARDS, 3 UniversityofYork MARKERAMIAN,BRITTANYCHAN,andEKTAWALIA,UniversityofSaskatchewan ISAACSIJARANAMUALandMAARTENDERIJKE,UniversityofAmsterdam Traditionally,thepreservationofarchaeologicaldatahasbeenlimitedbythecostofmaterialsandthephysicalspacerequiredto storethem,butforthelast20years,increasingamountsofdigitaldatahavebeengeneratedandstoredonline.Newtechniques indigitalphotographyanddocumentscanninghavedramaticallyincreasedtheamountofdatathatcanberetainedindigital format, while at the same time reducing the physical cost of production and storage. Vast numbers of hand written notes, greyliteraturedocuments,imagesofassemblages,contexts,andartefactshavebeenmadeavailableonline.However,accessing theserepositoriesisnotalwaysstraightforward.Superficialinteractiondesign,sparselypopulatedmetadata,andheterogeneous schemasmaypreventusersfromworkingthedatathattheyneedwithinarchaeologicalarchives. Inthisarticle,wepresenttheworkoftheDiggingintoArchaeologicalDataandImageSearchMetadataproject(DADAISM), amultidisciplinaryprojectthatdrawstogethertheworkofresearchersfromthefieldsofarchaeology,interactiondesign,image processingandtextminingtocreateaninteractivesystemthatsupportsarchaeologistsintheirtasksinonlinearchives.By adopting a user-centred approach with techniques grounded in contextual design, we identified the phases of archaeologists workinonlinearchives,whicharedistinctivetothisusergroup.Theinsightsfromthisworkdrovethedesignandevaluation ofaninteractivesystemthatsuccessfullyintegratescontent-basedimagebasedretrievalandimprovedmetadatasearchingto deliverapositiveuserexperiencewhenworkingwithonlinearchives. (cid:2) (cid:2) CCS Concepts: Human-centered computing→Empirical studies in interaction design; Applied computing→ Artsandhumanities; AdditionalKeyWordsandPhrases:Archivalsearch,archaeologicalinformation,exploratorysearchanddiscovery,content-based imageretrieval,texturedescription,featurefusion,instrumentalinteraction,user-centreddesign ThisworkwasfundedbytheDiggingintoDataChallengewithfundingfromJISC,AHRC/ESRC,NWO,andSSHRC. Authors’addresses:C.Power,A.Lewis,andH.Petrie,DepartmentofComputerScience,UniversityofYork,Heslington,York, YO105GH;emails:{christopher.power,andrew.lewis,helen.petrie}@york.ac.uk;K.GreenandJ.Richards,DepartmentofAr- chaeology,UniversityofYork,TheKing’sManor,York,YO17EP;emails:{katie.green,julian.richards}@york.ac.uk;M.Eramian, B.Chan,andE.Walia,176ThorvaldsonBldg.,110SciencePlace,UniversityofSaskatchewan,Saskatoon,SK,S7N5C9,Canada; emails:[email protected],[email protected],[email protected];I.SijaranamualandM.deRijke,UniversityofAm- sterdam,InformaticsInstitute,SciencePark904,1098XHAmsterdam,TheNetherlands;emails:{i.b.sijaranamual,derijke}@ uva.nl. Permissiontomakedigitalorhardcopiesofallorpartofthisworkforpersonalorclassroomuseisgrantedwithoutfeeprovided thatcopiesarenotmadeordistributedforprofitorcommercialadvantageandthatcopiesbearthisnoticeandthefullcitation onthefirstpage.Copyrightsforcomponentsofthisworkownedbyothersthantheauthor(s)mustbehonored.Abstractingwith creditispermitted.Tocopyotherwise,orrepublish,topostonserversortoredistributetolists,requirespriorspecificpermission and/[email protected]. 2017Copyrightisheldbytheowner/author(s).PublicationrightslicensedtoACM. ACM1556-4673/2017/01-ART3$15.00 DOI:http://dx.doi.org/10.1145/2983917 ACMJournalonComputingandCulturalHeritage,Vol.10,No.1,Article3,Publicationdate:January2017. 3:2 • C.Poweretal. ACMReferenceFormat: Christopher Power, Andrew Lewis, Helen Petrie, Katie Green, Julian Richards, Mark Eramian, Brittany Chan, Ekta Walia, Isaac Sijaranamual, and Maarten de Rijke. 2017. Improving archaeologists’ online archive experiences through user-centred design.J.Comput.Cult.Herit.10,1,Article3(January2017),20pages. DOI:http://dx.doi.org/10.1145/2983917 1. INTRODUCTION Anever-increasingamountofdigitalarchaeologicaldataisbeingmadeavailableonlinethroughavari- etyofdifferentweb-basedrepositories.Eachrepositorymaycontainthousandsofresources,including everything from scans of handwritten notes, grey literature documents, photographs or sketches of sites,assemblages,andcontexts,toimagesofindividualartefacts. Yet, many of these resources remain underused for a variety of reasons. In some cases, the content of documents is simply unavailable, needing either manual or semi-automatic transcription. In other cases,thecontentisavailable,butsearchingisnotpossibleduetometadatabeingsparselypopulated. Theseproblemsarecompoundedwitharchivecollectionsoftenhavingnon-standardorheterogeneous metadataschemasthatmakeitdifficultforuserstoguesswhatsearchtermsmayberelevantfortheir queries.Allofthesefactorscontributetousersstrugglingtofindthedatathattheyneedwithinlarge archaeologicalarchives. While addressing all of these issues is important, it does not necessarily follow that just improving themetadatawouldresultinalargeincreaseindatausageinthesearchives. Inthelastdecade,therehasbeenanincreasingemphasisonproducingsystemsthatprovidepeople with a good user experience. As users have increased their exposure to a range of technology options, through apps they can buy cheaply, or via online services and where they have the opportunity to exploreforfreebeforebuying,theyaredevelopingasophisticatedunderstandingofwhatis“good”for them in technology. Technology will ultimately go unused if it is either unusable, where users cannot accomplishtheirgoals,orwhereitdeliversanoverallnegativeexperience.Onlinearchives,similarly, need to be considering user experience as a primary outcome when delivering a new system, driven fromtheneedsofusers,ortheyriskanyinvestmentinimproveddataandmetadatagoingtowaste. Interestingly,potentiallyduetothefocusonimprovingsearch,thereisatendencytodesignsystems witharesourcecentricapproach,whichissimilartothematerials-basedapproachdiscussedbyElena et al. [2010]. That is, the design of the systems and their interfaces, such as those developed at the ArchaeologyDataService(ADS),aredrivenfromthestructureofthedata[Jeffreyetal.2007;Charno et al. 2012]. When we look across the literature, we find many projects that deliver systems through thisresourcecentricapproach.Forexample,theGazetteercreatedfortheprojectrelatingtoLucanian heritage[Duplouyetal.2014]wascreatedformanagingresources.Thisprojecthasanexcellentanal- ysisoftechnicalconcerns andtherequirements relatedtothematerialsthatthesystemwillworkon and the resources needed to implement it. However, within the work there was no engagement with potentialuserstoidentifywhataretheirkeyneedsfromthesystem.AnotherexampleistheArkeoGIS system, which has concrete analyses about the types of information users will deposit into their sys- tem[Bernardetal.2014],butnoindicationofwhattheprocesseswerebywhicharchaeologistswould engagewiththesystem. Whilewecritiqueeachoftheabove,usergoalswerenotatargetedoutcomeoftheprojects.Eachof thesesystemsareusefulinofthemselvesandprovidesupporttospecificusers.However,theresource centric approach leads to a lack of generalisation on how to build systems that support the broader tasksofarchaeologists.Asthesystemsaresocloselytiedtotheirdata,itisdifficulttoextractlessons andinteractiondesignpatternsthatcouldbeappliedtonewarchaeologysystems. ACMJournalonComputingandCulturalHeritage,Vol.10,No.1,Article3,Publicationdate:January2017. ImprovingArchaeologists’OnlineArchiveExperiencesThroughUser-CentredDesign • 3:3 We propose that in order to increase use of online archaeological archives and to make future sys- tems easier to design, we need to build on the very broad literature about archive users [Marchionini 2006;Huvila2007;Chapman2010;Elenaetal.2010;Sinn2012],andinvestigatehowcurrentsystems arenotmeetingtheneedsofthearchaeologists inresearch andprofessionalpractice.Withthisinfor- mation,wecantheninvolvearchaeologistsintheprocesstoproducedesignsthataremoresupportive of the workflows found across a variety of different systems and practices. We then need to evaluate systems rigorously with real users, and use the lessons learned from all of this work to generalise designpatternsthatcanbesharedacrossthefield. In this article, we present the work of the Digging into Archaeological Data and Image Search Metadata project (DADAISM). The DADAISM project brought together researchers from the fields ofarchaeology,interactiondesign,imageprocessing,andtextminingtocreateauser-centredinterac- tive system to support archaeologists in their tasks when working with digital image and document archives. 2. CONTEXTUALDESIGN There is a limited amount of work exploring the overall workflows that archaeologists undertake in working with online materials. As a result, there was a need to work with archaeologists to elicit the typesoftaskstheyundertakewhenworkingwithdatainonlinearchives. We chose to use a contextual enquiry [Holtzblatt and Jones 1993; Holtzblatt 2009] approach to examine the archaeologist’s work and draw out details in context. In this approach, the researcher encouragesparticipantstoundertaketheirnormaltasksofworkforthepurposesofobservingandun- derstandinghowandwhytheparticipantundertakesspecificactivities.Unlikeatraditionalinterview setting, where the researcher drives the selection of topics, it is instead participants who are guiding theresearcherthroughtheirroutines.Duringtheobservation,theresearcherwillaskquestionsabout the purpose behind particular actions and how it relates to the overall task. At times the researcher will also explain their interpretation of what is happening in order to confirm with participants that theyareunderstandingthedifferentactivitiesappropriately.Allinterviewsarethenanalysedforkey commonworkflowsandscenariostofeedintointeractiondesignsessions. 2.1 MaterialsandContext For contextual enquiry to be effective, it is often best to document activity in the place of work of the individual. In most cases, this was possible, with interviews being conducted in the offices of the participantontheirownpersonalcomputers. Infourcases,participantsreportedthattheydotheirworkinavarietyofspaces,includingoffices,in thefield,andinlibraries.Fortheenquirysessions,threeofthoseparticipantsundertooktheirtasksin quietofficesattheUniversityofYork,andoneinaquietspaceintheuniversitylibrary.Ineachcase, participants brought a variety of paper notes and notebooks with them, and accessed data remotely throughshareddrivesordocumentsincloudstorage. Participants used a variety of archaeology archives and web resources in their searches, including: theArchaeologicalDataServicearchive,theBritishMuseumwebsite,thePortableAntiquityScheme, andcustomdatabasesthatwerecreatedfromarchaeologicalexcavations. In addition to notes taken with the interview schedule, all sessions were video recorded for later reviewandanalysis. 2.2 Participants Eightparticipants,fivemaleandthreefemale,wererecruitedasanopportunitysamplethroughmail- inglistsandcontactsoftheArchaeologicalDataService. ACMJournalonComputingandCulturalHeritage,Vol.10,No.1,Article3,Publicationdate:January2017. 3:4 • C.Poweretal. Participants were selected from across different groups of archaeologists including: six research ar- chaeologistsworkinginacademicenvironments,oneprofessionalarchaeologistwhoworksinthefield forprivateorpublicorganisations,andonearchaeologystudent. The ages of the participants ranged from 25 to 41, with researchers and professionals having be- tween3to10years working inthefield ofarchaeology. Participants were self-described specialistsin avarietyofdifferenttopicsincluding:smallartefactsfromRomanfrontiers,lithicageantlerandbone tools, flint and glass tools, Greek pottery, Middle Eastern pottery, landscape archaeology, and ancient colonisation. 2.3 ContextualEnquiryProcedure Theenquirybeganwithparticipantsreadinganinformationsheetandwereaskedtosignaninformed consentform.Theresearcherthenaskedasetofwarm-upquestionsexploringtheparticipants’history inthearchaeologyfield,thetypeofworktheyspecialisein,andthetypesarchivesthattheytypically use. This was followed by a set of questions exploring the broad types of tasks participants usually undertookwhenusingonlinearchives,andsomeexploratoryquestionsabouttheirfavouriteandleast favouritethingsaboutthearchivesthemselves. Thefocusoftheenquirysessionswasonunderstandinghowarchaeologyusersfromacrossthecom- munityworkwithimageanddocumentarchivestoachieveavarietyoftasks.Normally,incontextual enquiry,individualsundertaketheirworkinaonetotwo-hoursession.However,thetypeofworkthat archaeologyusersdowiththearchivesoftenextendsoutoveraperiodofweeksormonths.Asaresult, thecontextualenquiryprotocolwasmodifiedslightlyandparticipantswereaskedtoexplore,withthe researcher, a recent project that they had undertaken. The participants were asked to either extend thatworkifitwaspossible,ortoretrospectivelyshowhowtheybegantheirworkontheproject.Par- ticipantswereaskedtoexplorekeyorunusualpointsintheirprocess,inanadaptationofthecritical incidenttechniqueusedinothermethodologies[HartsonandCastillo1998;SerenkoandTurel2010]. Participantswereaskedtodescribetheprojectthattheywereabouttoshowandexplainwhatthey were trying to accomplish. Then, the participants undertook their work in the archives, unguided by theinterviewer,andwerepromptedforclarificationandfurtherdiscussionofkeypoints.Eachsession ranapproximatelyonehour. 2.4 Analysis Inordertocontextualiseandunderstandthedifferenttypesofactivitiesundertakenbythearchaeol- ogists, we undertook a scenario-based design methodology [Rosson and Carroll 2009]. We composed a seriesofscenariosthatdescribeactivitiesseeninthevideorecordingswhichwerethenrewrittenasa smaller set of scenarios that captured the common activities of users. These scenarios were reviewed by archaeologists at the ADS in regards to how representative they were to their own experiences or the experiences of the ADS user base. After several rounds of review, a set of five master scenarios wereusedforideationthroughtherestofthedesignprocess. All participants were trying to find information about artefacts that related to the ones they were alreadyworkingwith.However,thefinalusesoftheimageswerevaried.Manyoftheparticipantswere trying to collect together sets of similar artefacts in order to identify an artefact they had in hand, or in some cases, to pass off that identification task to another specialist team member. In these cases, therewereoftenseveraltypesofanalysisthatwouldbeconductedbyotherteammembersinparallel, with individuals contributing analyses to a central repository (e.g., shared spreadsheet or document). Other researchers were working on presentations and displays for public engagement. Several were workingtofindimagesforinclusioninpublicationsorlectures. ACMJournalonComputingandCulturalHeritage,Vol.10,No.1,Article3,Publicationdate:January2017. ImprovingArchaeologists’OnlineArchiveExperiencesThroughUser-CentredDesign • 3:5 Fig.1. Sequencemodelforarchaeologists’workwithonlinearchives. 2.5 DesignInsights The analysis of the participants’ sessions point toward several commonalities in workflow that can informthedesignofourinteractivesystem.Asequencemodel[Holtzblatt2009]wasconstructedasan outputofthescenario-basedapproach.Themodelrepresentsthegeneralprogressionofactivitiesseen duringthesessions.ThemodelisshowninFigure1. The sequence model aligns well with the pattern of work for information described in other information-seeking models, such as Ellis and Haugan [1997], Marchionini [2006], and Bron et al. [2012], where the sequence is often represented as a cycle with a person moving from formulating a problem,toselectingsourcesandrunningqueries,toresults,andthenextractinginformationforanal- ysisandinterpretation,withopportunitiesforfeedbackintoearlystagesofthecycletoreformulateor change direction. See Bron et al. [2016] for a recent comparison of information-seeking models of the researchcycleacrossanumberofdisciplines. Whileatafundamentallevelthejourneyisthesameforarchaeologistsworkingwithonlinearchives, theemphasisonwhenthearchaeologiststransitionbetweenthedifferentstatesininformationseeking isdistinctive.Inalmostallcases,ourarchaeologistparticipantsexperiencetwodistinctphasesofwork: aphaseofretrievingresourcesandaphaseofsensemakingacrossthoseresources. 2.5.1 RetrievalPhase. In the first phase after query formulation, the participants collect together resourcesfromacrossalargenumberofqueriesandresultsets. Participants would often formulate a set of keyword queries based around their knowledge of the artefact with which they were working, such as “flint blade” or “roman amphora,” in order to prime theirsearchwithsomeresults. ACMJournalonComputingandCulturalHeritage,Vol.10,No.1,Article3,Publicationdate:January2017. 3:6 • C.Poweretal. Afterexaminingtheresultsofthosequeries,theybegintocollectmetadataitemstouseinthenext set of reformulated queries, often on paper outside of the search system. Sometimes, this resulted in successful follow-up searches, but there was a tendency for participants to append the keywords onto previous queries. This often resulted in overly specific queries that returned very small sets of questionableresultstotheparticipants. The participants also often extracted metadata terms from the results of one archive to apply to otherarchives.Thisisinteresting,becauseeventhoughthemetadataschemasaremostlikelydifferent betweenthedifferentarchives,thereseemstobeanimplicittrustthatsearchsystemswillbeableto findappropriateinformationbasedonthetermsthemselves. When these attempts to find artefacts via metadata fail, participants often turned to the physical characteristics of objects they were searching for, such as trying a query for “green glaze” to try to bring up pottery objects from the archive. Two of the participants attempted to use Google image search to bring back images from across the web, potentially giving new avenues of exploration or newunknownarchives.However,bothoftheseparticipantslamentedthatthesesearcheswereseldom successfulbecausethefeaturesthatareusedtoidentifyandcategoriseartefactsgenerallydonotuse very general search algorithms. This issue is further complicated by many artefacts being shards or fragmentsoflargerartefacts,meaningthattheimagesearchcannotaccuratelymatchartefacts. These observations have several consequences for design. First, they demonstrate the importance of having complete metadata, and validate the idea that effort contributed to improving metadata availability will have an impact on overall success in archives. Second, any design solution should providemeansforuserstospeedupthatreuseofmetadatainqueriesacrossmultiplearchives.Finally, itwouldbeverybeneficialforasystemtoallowuserstoperformsearchesonvisualcharacteristicsof artefacts, many of which are not recorded in typical metadata schema. This could be in the form of morespecialisedinformationbeyondlengthandcolour,orviaanimagesearchmechanism. 2.5.2 SensemakingPhase. Aftercollectingtogetherlargeamountsofinformation,theparticipants move into a sensemaking phase where they begin to process the items collected in different ways in ordertoproducenewinsights[Kleinetal.2006a,2006b]. There was less commonality across the participants in the method and order in which this phase was undertaken, often with participants doing many activities in parallel, before transitioning back intoanotherretrievalphase.ThefollowingsectionsdiscussthecommonactivitiestotheSensemaking Phase. 2.5.3 PersonalCollectionCuration. Perhaps one of the most surprising aspects of the participants’ sessionswastheprevalenceandsizeofpersonalcollectionsofartefactimagesanddocumentsthatwere stored in relation to any one project. During the search processes, participants often took resources from the web and stored them either in documents with research notes, or more commonly had hard drives or USB sticks with many hundreds or even thousands of pictures and documents. In many cases, a separate document or spreadsheet was kept with the web links to the items as well as key informationlikeaccessionorcollectionnumbers. These images and documents are often stored in relatively simple structures to begin with (e.g., a folder labelled “Flint images”), with resources being added when there was even a slight possibility of them being useful. After the initial collection stage, there were phases of curating the images for duplicatesorobviouslyirrelevantitems,andthenphasesoforganisingandcategorisingtheitemsinto moreelaborateschemas,usuallythroughfilefolders. Inparallelwiththeseactivities,participantstypicallykeptanongoingsetofresearchnotesregard- ingwhyimageswerebeingkept,orparticularfeatures.Thesenotesincluderelationshipsbetweenthe different resources as well as summarisation about why particular resources were being included in ACMJournalonComputingandCulturalHeritage,Vol.10,No.1,Article3,Publicationdate:January2017. ImprovingArchaeologists’OnlineArchiveExperiencesThroughUser-CentredDesign • 3:7 the set. Where spreadsheets were used, those notes would be integrated with the information about theprovenanceoftheresources. 2.5.4 SharingandCollaboration. Mostoftheparticipantsdiscussedsharingoftheirpersonalcollec- tionsandtheirresearchnotesforpurposesofcollaborationwithcolleagues.Therewasalargevariety of different ways of sharing resources, with no clear style or theme emerging. Many people discussed sending research notes and archive (i.e., zip) files to colleagues through email, sending USB drives through the post, or sharing through cloud resources to request contributions from others. This, of course, leads to potential duplication or data loss problems, which are common in this type of asyn- chronoussharing. Perhaps the most sophisticated structure that was seen during the sessions for collaboration was a spreadsheetforasingleexcavationthatconsistedofalloftheidentifyinginformationforeachresource fromthesite,includingaweblinkwhereitcouldbefound,andthenindividualprotectedcolumnsfor each collaborator toincludetheirfindings fromdifferentanalysismethodologies. Inthisway,asingle team was able to contribute all of their results to a single spreadsheet for each object of interest. It should be noted that this approach was one adopted for a specific excavation that took place as it progressed,notgeneratedfromasetofarchivesearches.However,itisaninterestingmodelofsharing thatmightbeintegratedintodesigns. 3. INTERACTIVESYSTEMPROTOTYPE Fromtheresultsofthecontextualinquirysessions,weproposedaninteractiveapplicationthatwould supportarchaeologistsacrossthetwophases.FortheRetrievalPhase,welookedatmeansofimprov- ing the initial searches, trying to take advantage of the visual characteristics of artefacts as well as proposingnewwaystohelpusersidentifycommonalitiesbetweentheartefactstheyareworkingwith, and moving them to their next search tasks. In order to achieve this, we utilised image processing across specialist data sets, to allow matching of images in the archive to a query image provided by thearchaeologist.Further,weemployedtextminingtoextractcontentlevelmetadatafromdocuments thatwerepreviouslyinaccessibletoresearchers. For the Sensemaking Phase, we focussed on the creation and curation of personal collections that couldbesharedwithcolleagues. 3.1 DataSources Forpurposesofcreatingtheinteractivesystemprototype,theprojectteamidentifiedseveralpotential datasets from the ADS, which were good candidates for exploring the novel aspects of the interactive system. We required one set of data that had a large collection of images of similar artefacts with good, re- liable resource metadata, which could be used for ground truth against any image processing work undertaken in the project. We identified a robust flint biface artefact data set that met these require- ments.Further,wehadseveraldocumentsetsaroundflintbifacesthatwereunderusedduetothelack ofcontentlevelmetadata.ThefinaldatasetsfromtheADSthatwereusedintheprototypewere: —J.J. Wymer Archive: A collection of transcribed notes from one of the foremost specialists in Palae- olithicarchaeology —Grey Literature Archive: A library of unpublished fieldwork reports with large amounts relating to avarietyofdifferentfindtypes,includinglithics. —LowerPalaeolithicTechnology,RawMaterialandPopulationEcologyArchive:Adatabaseof10,668 digitisedimagesof3,556bifaces,aswellasinformationonprovenience,rawmaterial,andstandard ACMJournalonComputingandCulturalHeritage,Vol.10,No.1,Article3,Publicationdate:January2017. 3:8 • C.Poweretal. TableI.MethodsEvaluatedforImage-basedRetrievalofFlintBifaceImages Methods References 1 Uniformlocalbinarypatterns [Topietal.2000] 2 Orthogonalcombinationoflinearbinarypatterns [Zhuetal.2013] 3 Segmentation-basedfractaltextureanalysis [Costaetal.2012] 4 Globalphasecongruencyhistogram [Kovesi2000] 5 Angularradialphasecongruencyhistogram(ARPCH) [Chalechaleetal.2004] 6 Orientation-basedphasecongruencyhistogram OurnovelvariationofKovesi[2000]and Chalechaleetal.[2004] 7 Gaborwaveletfeatures [ManjunathandMa1996] 8 Log-Gaborwaveletfeatures(LGWF) [Arro´spideandSalgado2013] 9 Binarytextonfeatures(BTF) [Guoetal.2014] 10 FusionofLGWFandARPCH 11 FusionofLGWFandBTF 12 FusionofLGWFandBTF 13 FusionofLWF,ARPCH,andBTF measurements.Entrieshaveatimerangefrom1.5Myrto300KyrandincludematerialfromAfrica, Europe,andtheNearEast. 3.2 Image-BasedRetrieval Giventheexamplesofindividualstryingtomatchvisualcharacteristicsofartefacts,weimplemented a set of image-based retrieval algorithms that accept a query image as input, search a database of images for those that are most similar to the query image, according to some domain-specific crite- ria, and return a set of the most similar images. Many of our participants of the contextual enquiry sessionshadexperienceeitherworkingwithorteachingaboutbifaceartefactsofdifferenttypes.Wein- terviewedourparticipantsregardingwhatcharacteristicstheyusetoidentifydifferenttypesofbiface artefacts. From this, we hand-crafted numerical image descriptors that measure the similarity of images by extractingarchaeologicallyrelevantcharacteristicsofthebifacesintheimages.Weevaluated13differ- ent methods for extracting texture information and compared their performance of pulling out shape andtextureinformationtocharacterisesimilaritywithrespecttothequeryimage.Shapeinformation is extracted from images by segmenting the artefacts and extracting features like scale-normalised length and width of the object’s bounding box, scale-normalised area, and scale normalised breadths of the artefact at 20% and 80% along its length. These features are correlated well with “tool type” metadata from images in the database. Thirteen different texture descriptors were evaluated and all performed generally well as predictors of the “raw material type” metadata from images in the database. Table I lists the methods from the computer vision literature used to build our 13 texture descriptors. Theseshapeandtexturedescriptorsarefusedintoasinglenumericaldescriptorforthequeryimage, which is then compared with the corresponding descriptors of images in the archive. We retrieve the top100imageswiththesmallestdifferencesfromthequerydescriptor. Thebest-performing methods forextracting shapeandtexture representation wereabletoretrieve setsofimageswithtotalmetadatasimilaritythatwas,onaverage,90%ashighasthetotalmetadata similarityofthe100mostsimilarimageswhenusingaleave-one-outcross-validationstrategyagainst thegroundtruthmetadataforthebifaceimages.Thismeansthatourmethodmostlyretrievesresults thathavehigh-likelihoodofbeinghighlyrelevanttousers. ACMJournalonComputingandCulturalHeritage,Vol.10,No.1,Article3,Publicationdate:January2017. ImprovingArchaeologists’OnlineArchiveExperiencesThroughUser-CentredDesign • 3:9 TableII.Token-LevelFeaturesforAutomaticMetadataExtraction Feature Description Word Theoriginalwordform lower Thelower-casedformoftheword Length Thelengthofthewordincharacters Hascap Doesthiswordcontainanuppercaseletter Initcap Doesthiswordstartwithanuppercaseletter Allcap Doesthiswordconsistsolelyofuppercaseletters Hasdigit Doesthiswordcontainadigit Alldigit Doesthiswordconsistsolelyofdigits Haspunct Doesthiswordcontainpunctuationcharacters(dashes,periods,hyphens) Allpunct Doesthiswordconsistsolelyofpunctuationcharacters alnum Doesthiswordconsistssolelyoflettersandnumbers prefix Allprefixesofthiswordoflengthoneuptofour suffix Allsuffixesofthiswordoflengthoneuptofour shape Theshapeoftheword:mapuppercaseto“A,”lowercaseto“a,”digitsto“0,” andpunctuationto“.” Summarisedshape Sameasshape,butcollapserunsofidenticalconsecutivecharacters:map“A aa-AAA00.”to“Aa-A0.” 3.3 TextMiningofResearchandGreyLiterature The textual datasets in the archives have the same problems as the visual datasets with regards to the available metadata. To overcome the lack of descriptive metadata fields, a text-mining pipeline was developed to automatically extract salient phrases. Our approach builds upon the earlier work undertaken in the Archaeotools [Jeffrey et al. 2009] project and on work on the Contextualizing Media Research Data (CoMeRDa) toolkit [Bron et al. 2013]. The aim in both cases is to automati- cally extract the metadata fields to supplement the current manually added fields. The overarching guideline to decide what is extracted is the what, where, when, and who questions. Specifically, in this context, we are interested in what was found, where it was found, from what time period it originated, and who found it. This information is reflected in the existing item metadata fields, but due to the current fields being the result of a manual process, the coverage of existing metadata is sparse. Theautomaticmetadataextractioniscastasaninformationextractiontask,andonethatisstrongly related to the more common named entity recognition (NER) task [Nadeau and Sekine 2007]. In the NERtask,oneisinterestedinextractingallproper nouns,denotingpersons,locations,andorganisa- tions.Thecomplicatingfactorforthemetadataextractionisthatthefieldsaremostlynominalentities, with the exception of the who and where fields, and as such, have a higher degree of variability (e.g., “workedflint,”“charredchafffragments,”“mid-lateironage”) This information extraction task has a typical supervised machine learning setup. Using the man- ually annotated documents from the Archaeotools project, we trained a first order linear chain condi- tionalrandomfield(CRF)[SuttonandMcCallum2006]withthetokenlevelfeaturesfromTableII.To enable the CRF to also take the context of the current word into account, we enrich its feature set by includingthetokenlevelfeaturesofthetwoprecedingandtwofollowingtokens. Theautomaticallyextractedfieldswerethenaddedtotheexistingmetadatafieldsoftheitemsinthe datasets, if they existed, or were added as new fields where they did not exist before. This allowed us tohaveamoreconsistentsetofmetadataforeachindividualcollection,aswellasacrossthedifferent collections,increasingtheusabilityofthesearchinterface. ACMJournalonComputingandCulturalHeritage,Vol.10,No.1,Article3,Publicationdate:January2017.
Description: