InformaticsinEducation,2011,Vol.10,No.1,1–12 1 ©2011VilniusUniversity Development of an Information System for Diploma Works Management TsvetankaGEORGIEVA-TRIFONOVA DepartmentofMathematicsandInformatics St.CyrilandSt.MethodiusUniversityofVelikoTarnovo 3ArchitectGeorgiKozarovstr.,VelikoTarnovo,Bulgaria e-mail:[email protected] Received:January2011 Abstract.Inthispaper,aclient/serverinformationsystemforthemanagementofdataanditsex- traction from a database containing information for diploma works of students is proposed. The developedsystemprovidesusersthepossibilityofaccessinginformationaboutdifferentcharacter- isticsofthediplomaworks,accordingtotheirspecificinterests.Theclient/serverarchitectureof thesystemisdescribedaswellastheservicesoffered.Theauthorpresentsthestructureofthecre- ateddatabasethatstoresthenecessaryinformation.AclientapplicationADP(accessdataproject) isimplemented,providingthepossibilityforinsertion,updatingandsearching,aswellasaclient applicationisfulfilledwithJavafordiscoveringtheconstraint-basedassociationrules. Keywords:database;client/serversystem;ADP(accessdataproject)application;diplomaworks; associationrules. 1. Introduction Inthelastyears,thecreation,thedistributionandtheusageofthetrainingandscientific literatureisperformedlargelybymeansofitsdigitalizedform.Inthismannertheworkof teachers,researchers,professionalsinparticularareasarefacilitated,aswellasthework ofthestudentsandmainlyofthestudentspreparingtheirdiplomaworks.Providingfast access to the electronic variant of the developed diploma works and related with them materialscansufficientlysupportthestudentsintheirworkandincreaseitsquality. Theaimofthepresentedpaperistorepresentaclient/serversystemforkeepingin- formationondiplomaworksofstudents.Theimplementedclient/serversystemprovides usersthepossibilityofextractinginformationaboutthedevelopeddiplomaworks.Ital- lows students and teachers quick access by means of a convenient interface to the data andthefiles,relatedtothediplomaworksofstudents,graduatedthebachelororthemas- ter degreeof some of thespecialitiesinDepartmentof Mathematicsand Informaticsin St.CyrilandSt.MethodiusUniversityofVelikoTarnovoaftertheyear2002. Thestorageofthedataaboutthediplomaworksofthestudentsinadatabasemakes suitableconditionsfordatamining(Barsegyanetal.,2008;Kantardzic,2003),i.e.,ana- lyzingtheaccumulateddatawiththepurposeofextractingthepreviouslyunknownand 2 T.Georgieva-Trifonova potentiallyusefulinformation.Thatmotivatestheutilizationofaprogramfordiscovering theconstraint-basedassociationrules.Associationruleminingisaformofdatamining todiscoverinterestingcorrelationrelationshipsamongtheattributesoftheanalyzeddata. Anassociationrulerevealsthefrequentoccurringpatternsofthegivendataitemsinthe database. Therestofthepaperisorganizedasfollows.InSection2,thefeaturesoftheinfor- mationsystemsfordiplomaworksarereviewed.Section3containsadescriptionofthe client/serverarchitectureandtheinterfaceofthesystem.InSection4,werepresentthe entity-relationship model (ER model) of the database that stores information about the diploma works of the students. We also produce the relational tables obtained after the transformationofthecreatedERmodelintorelational.Theserelationsareimplemented byusingthedatabasemanagementsystemMicrosoftSQLServer.Section5consistsof thedescriptionoftheclientapplicationsforupdatingthedataanddatamining. 2. ReviewoftheFeaturesoftheInformationSystemsforDiplomaWorks The main features of the existing storage and retrieval systems providing access to the electronicversionofthediplomaworksofthestudents(HKUSTLibrary;UMGraduate School&FoglerLibrary;VirginiaTech;WSULibraries)are: 1. Insertionofthedataaboutagivenstudent. 2. Insertionofthedataaboutadiplomawork–topic,yearofitsdefence,filewiththe electronicversionofthediplomawork. 3. Searchingdiplomaworksbytopic,author,year. Theneedofasystemprovidingthelistedfeaturesforourstudentsisthebasicmoti- vationforthedevelopmentofthesystemrepresentedinthepresentedpaper.Moreover, asaresultofourresearch,wehaveestablishedthataninformationsystemfromthecon- sideredkindcouldproposeadditionalfeatures,whichmakeitmoreusefulfortheusers. 4. Insertionofthefilesrelatedtothediplomaworks. Besidesthefileswiththecontentofthediplomaworks,thecorrespondingpresen- tationsfromtheirdefences,themultimediafiles,theprograms,thedatabases,the programssourcecodeandtheotherscanbeadded.Ononehandthissupplement ishelpfulfortheusers.Ontheotherhand,asubstantialadvantageisthatprovid- ing the electronic sources permits the students better and more complete ways to representtheirdevelopments. 5. Applyingalgorithmsfordatamining. Applying different algorithms for data mining on the data collected as a result of the usage of the system, could lead to extracting useful information about the diplomaworks,theirtopicsduringthedifferentyears,theobtainedmarksfromthe defences,etc. The basic features (1)–(3) are included in the developed information system and a variantfortheimplementationoftheadditional(4)–(5)isproposed. In the presented paper, the client/server architecture of the developed system is de- scribedandtheservices,includedinitsrealization.Thestructureofthecreateddatabase DevelopmentofanInformationSystemforDiplomaWorksManagement 3 is represented, keeping the necessary information. A client application ADP (Pearson, 2004) is proposed, designed for insertion, editing and searching the data. Besides this aJavaprogramisapplied(Eck,2006;Eckel,2001),providingthepossibilityfordiscov- eringtheconstraint-basedassociationrules. 3. OverviewoftheClient/ServerArchitectureoftheSystem Theclient/serverarchitectureofthedevelopedsystemiscreatedonthebaseofthetwo- layerinformationmodel(Fig.1). Thelayerfordataprocessingisimplementedbyusingthedatabasemanagementsys- tem.ForthepresentsystemweuseMicrosoftSQLServer,whichallowsefficientstorage oflargedatabasesandprovidesfunctionalityforaccessingthedata(Bienieketal.,2006; Kroenke,2003;MicrosoftCorporation). The client part consists of an ADP application, providing a convenient interface for insertion, updating and searching the data, as well as a Java program for mining the constraint-basedassociationrules. 3.1. SQLServerDatabaseforDataStorage The DiplomaWorksDB database serves for storage and processing the data for the diplomaworks,thestudentsandtheiradvisors.Informationonthestudent’sfacultynum- ber,student’snames,thescientificand/oreducationaldegree,thespeciality,theformof training,thetopicandtheannotationofthediplomawork,thestudent’sadvisor,there- viewers,the date of the defence of the diploma work, the mark obtainedby the student forthediplomaworkismaintained. Foreachdiplomaworkapossibilityforinsertionofanadditionalinformationispro- vided–files(.doc,.pdf)withitscontent;presentation(.ppt,.pdf)ofthestudentforthe defenceofthediplomawork;applicationimplementedbythestudent(suchasadatabase, aprogramofC,Java,etc.)andothers. Fig.1.Thearchitectureofthesystem. 4 T.Georgieva-Trifonova Thebasicfunctionsofthedatabaseinclude: • additionofanewstudentinthedatabase; • editionofthedataaboutthestudents; • deletionofstudentsfromthedatabase; • browsingthedataforthestudents; • searchingbydifferentcriteria. 3.2. ClientApplicationADPforInsertionandSearchingtheData MicrosoftAccessallowsestablishingaconnectionbetweenthecurrentdatabaseandta- blesfromotherdatabasesofMicrosoftSQLServerandotherdatasources.ADPiscon- nected with a database of SQL Server and provides an access to the objects created in thatdatabase(suchastables,views,storedprocedures,triggers,etc.).Thedataarestored in thedatabaseof SQLServer.ADPdoesnotcontainanydataandtables,butit canbe usedforeasycreationofforms,reports,macros.Asaresultofthat,theenduserfeatures opportunity for insertion, editing, and deletion of the data by means of a comfortable interface. 3.3. ClientApplicationforMiningtheConstraint-BasedAssociationRules Thegoalofassociationrulesmining(Agrawaletal.,1993)istofindinterestingassoci- ationsorcorrelationrelationshipsamongadataset,i.e.,toidentifythesetsofattributes- items, which frequently occur together and then to formulate the rules characterizing these relationships. The constraint-based association rule mining (Fu and Han, 1995) aimstofindallrulesfromgivendataset,whichsatisfytheconstraintsrequiredfromthe users.AnapplicationisimplementedwithJavafordiscoveringtheconstraint-basedas- sociationrules,whichin(TrifonovandGeorgieva,2009)isutilizedforthedata,obtained after applying the methods of digital processing of signals for analysis of the sounds oftheuniqueBulgarianbells.ThisclientapplicationisconnectedwiththeDiploma- WorksDB database with the purpose of performing the association analysis of the data aboutthediplomaworksofthestudents. 4. ModelingoftheData ThemodeloftheDiplomaWorksDBdatabase,inkeepingwiththeentity-relationship model(ERmodel),introducedinChen(1976),isshowninFig.2.Theentitysetsofthe ERmodelaredepictedasrectangles,theirattributesasellipsis,andtherelationshipsas rhombs(Garcia-Molinaetal.,2002). ThedatabaseisimplementedbymeansofthedatabasemanagementsystemMicrosoft SQLServer.TherelevantrelationaltablesareshowninFig.3. The structure of the database is defined to provide the best efficiency of the most frequentlyusedoperations–insertion,updating,datasearching. DevelopmentofanInformationSystemforDiplomaWorksManagement 5 Fig.2.ERdiagramoftheDiplomaWorksDBdatabase. Fig.3.RelationalmodeloftheDiplomaWorksDBdatabase. 6 T.Georgieva-Trifonova TheDiplomaWorksDBdatabaseofSQLServercontainsthecreatedviewsforex- tractingthedatafromseveralrelatedtables,aswellasthestoredproceduresforobtaining the information on the students, defended their diploma works during a specific month andyear;thestudents,whosediplomaworks’topicscompriseaspecificstring.Thestored procedures provide a better performance of the client/server system because they make decreasingtheexchangetodatabetweentheclientandtheserver.Besidesthestoredpro- cedures can accept parameters and therefore they can be executed from multiple client applicationsbyapplyingdifferentinputdata. 5. ClientApplicationsforUpdating,SearchingandMiningtheData AclientapplicationADPisdevelopedforupdatingandsearchingthedataaboutdiploma worksofstudents,aswellasaclientapplicationimplementedwithJavaforminingthe constraint-basedassociationrules. 5.1. ADPClientApplicationforInsertion,EditionandSearchingtheData Formsforinsertionandupdatingthedataareimplemented.Theirpurposeistofacilitate actualizationoftheinformation.Theformforinsertionofthedataaboutthestudentsand theirdiplomaworksisshowninFig.4. Besidesthis,theapplicationallowstheexecutionofdifferentqueries,whichperform findingthespecificinformation,correspondingtothegivensearchingcriteria.Eachuser canfulfilsearchbyfillingintextboxesand/orlistboxeswhichcorrespondtothechosen Fig.4.Formforinsertionofthedataaboutstudents. DevelopmentofanInformationSystemforDiplomaWorksManagement 7 characteristics of the diploma works of the students, stored in the database. The results fromeachqueryarepresentedinaformatconvenientfortheenduser.Theformsandthe reportsareimplementedwiththerecordsources–viewsandstoredproceduresdesigned forextractingthedataabout: • studentsfromachosenspeciality; • studentswithachosendiplomawork’sadvisor; • students,defendedtheirdiplomaworksduringachosenmonthandyear; • students,whosediplomaworkscontainintheirtopicsagivenstring. 5.2. ClientApplicationforDiscoveringtheConstraint-BasedAssociationRules The association rules are introduced in Agrawal et al. (1993) and they are utilized for specifyingthecorrelationrelationshipsamongitemsetsinthedatabase. LetI ={I ,I ,...,I }beasetofndifferentvaluesofattributes.LetRbearelation, 1 2 n whereeachtuplethasauniqueidentifierandcontainsasetofitems,suchthatt⊆I.An associationruleisanimplicationoftheformX →Y,whereX,Y ⊂I aresetsofitems withX ∩Y =∅.ThesetX iscalledanantecedent,andY –aconsequent. Therearetwoparametersassociatedwitharule–supportandconfidence.Thesupport softheassociationruleX → Y istheproportion(inpercentages)ofthenumberofthe tuplesinR,whichcontainX ∪Y tothetotalnumberofthetuplesintherelation.The confidence c of the association rule X → Y is the proportion (in percentages) of the number of the tuples in R, which contain X ∪ Y to the number of the tuples, which containX. The task of association rules mining is to generate all association rules which have valuesoftheparameterssupportandconfidence,exceedingthepreviouslygivenrespec- tivelyminimalsupportmin_suppandminimalconfidencemin_conf. To increase the efficiency of existing algorithms for data mining, during the mining process constraints are applied with the goal for these association rules, of which only thoseinterestingtotheuseraregenerated,insteadofallassociationrules.Inthiswaya largepartofthecalculationsforminingthoserulesthatareremovedasunnecessary,can be avoided. Usually the constraints, provided by the users, are constraints for the data, constraintsfortheattributes,constraintsforformationoftherule. We have developed an application for discovering the constraint-based association rules,whichneedstomeetthefollowingpreliminaryrequirements: • Theapplicationmustgivetheopportunityfortheusertoselecttheattributesinthe antecedentandtheconsequencesofthesearchedrules. Usually the user is interested in a specified subset of attributes and wants to ex- pressinterestingcommonconnectionsbetweentheselectedattributes.Thereforea facilitywithafriendlyinterfaceshouldbeprovidedtospecifythesetofattributes tobeminedandexcludethesetofirrelevantattributesfromtheexamination. • The user has to be able to define various values of the minimal support and the minimalconfidence,whentheitemsaremined. Thesupportreflectstheutilityongivenrule.Theminimalsupportmin_supp,which anassociationrulehastosatisfy,meansthateachvalue,includedinthestudy,has 8 T.Georgieva-Trifonova to appear a significant number of times in corresponding attribute of the initial relation. Theconfidencemeasuresthereliabilityoftheinferencemadebyarule. • Reducingthenumberofthegeneratedassociationrulesmustbepossiblebydeter- miningthecriteriathatthevaluesofselectedattributeshavetosatisfy. Innumerouscasesthealgorithmsgeneratealargenumberofassociationrules.Itis almostimpossiblefortheenduserstoencompassorvalidatesuchalargenumber of association rules, limiting the results of the data mining is therefore helpful. Besides, often the user is interested in definite values of the attributes, which are includedinassociationrulesmining. • Visualizingtheobtainedresultsmustberepresentedinatabularviewwithprovid- ingtheopportunityfortheusertoorderthefoundrulesby: ◦ alphabeticorderoftheattributes,whichparticipateintheantecedentandthe consequenceoftheassociationrules; ◦ thesupportoftheassociationrulesinascendingordescendingorder; ◦ theconfidenceoftheassociationrulesinascendingordescendingorder. Intabularviewofassociationrules,alldiscoveredrulesarerepresentedinatabular tablewitheachrowcorrespondingtoaruleandrepresentsinformationaboutrule support and confidence. By this way the user has a clearer and complete view of the rules and can locate a specific rule more easily. The tabular view facilitates understandingthelargenumberofrules. Anapplicationisdeveloped,whichallowstheusertosetconstraintsforsearchedrules and finds constraint-based association rules. The application is used for performing the associationanalysisonthedifferentcharacteristicsofthediplomaworks,theinformation forwhichiskeptinaninformationsystemproducedforthegoal. Totheuserthatstartstheapplication,thefollowingpossibilities,areprovided(Fig.5): • settingtheattributes,beingsubjecttoanalysis; • setting the minimal value of the support min_supp and the minimal value of the confidencemin_conf; • settingtheconditions(Booleanexpression)forthevaluesoftheattributes,which canparticipateornotintheantecedentandtheconsequenceofthesearchedrules; • all the rules can be displayed in different order – by alphabetic order of the at- tributesparticipatingintheantecedentandtheconsequence;bythesupportorcon- fidenceinascendingordescendingorder. Theapplicationisutilizedfordiscoveringtheconstraint-basedassociationrulesinthe database containing information about the diploma works of about 1000 students grad- uated the bachelor or the master degree of the specialities “Mathematics and informat- ics”,“Informatics”,“Computerscience”,“Informationsystems”,“Informationsecurity”, “Computermultimedia”inDepartmentofMathematicsandInformaticsinSt.Cyriland St.MethodiusUniversityofVelikoTarnovoaftertheyear2002.Theattributes,whichcan be included in analyzing, are: student’s faculty number; student’s names; the scientific and/oreducationaldegree;the speciality;the form of training; the topic; the annotation DevelopmentofanInformationSystemforDiplomaWorksManagement 9 Fig.5.Discoveringtheconstraint-basedassociationrules. ofthediplomawork;theyearofthedefenceofthediplomawork,themarkobtainedby thestudentforthediplomawork;thestudent’sadvisor. Figure 5 shows an example result from the execution of the implemented program with given values of the minimal support, minimal confidence and conditions for the valuesoftheattributes.Forinstance,letthefollowingrulebegeneratedfromthedatabase withthediplomaworks: Speciality(“Informatics”)→Mark(“6”) withvaluesofthesupports=0.11164andtheconfidencec=0.61702.Thisrulemeans, thatforthestudents,graduatedinthespeciality“Informatics”,whosediplomaworksare intheareaofthedatabasesandtheinformationsystems,oneofthemostfrequentmarks fromthedefenceoftheirdiplomaworksis6(with61.702%confidence)andthestudents in Informatics with diploma works in databases and information systems and marks 6 represent11.164%fromallstudents,includedinthestudy. Someotherexamplesofassociationrules,whichcanberetrievedfromtheexecution oftheprogram,are: • Advisor(A),Speciality(S) → Mark(M)withthevaluesoftheparameterssup- port s = 0.19477 and confidence c = 0.64634, where the condition YearOfDe- fence=Y isgiven; 10 T.Georgieva-Trifonova Bymeansoftherulesofthiskindwecanestablishthatthestudentswiththeadvisor A,thespecialitySandgraduatedinagivenyearY,oneofthemostfrequentmarks fromthedefenceoftheirdiplomaworksisM(with64.634%confidence)andthey represent19.477%fromallstudents. • Mark(M) → Speciality(S) with the values of the parameters support s = 0.49644andconfidencec=0.41627. Such rules allow extracting information about the percentage (49.644%) of the studentsfromagivenspecialityS,whosediplomaworksareevaluatedwithcertain mark M. Besides, we can conclude that the students, which are evaluated with markM,arefromthespecialityS with41.627%confidence. Theusercanestablishtherelationshipbetweenotherattributesincludedinthestudyby meansofthesimilarrules. 6. Conclusion In this paper, the automated system is proposed. It explores a client/server based ap- proach to managing the information on the diploma works of the students. The created database contains information about different characteristics of the diploma works and it is implemented on Microsoft SQL Server. The interface is developed by means that allow establishing a connection with the database of the ADP project. This gives users the possibility of easily accessing detailed information about the diploma works of the students. In addition, an application is represented which provides the possibility for finding theconstraint-basedassociationrulesofthedataaboutthediplomaworks. Thebasicadvantagesfromtheusageoftheimplementedsystemcanbesummarized bythefollowingway: • Thesystemprovidesfastandeasyaccesstothedevelopeddiplomaworksandthe related materials. The user can receive an extract about the existing works of the graduated students on a given topic. Besides, browsing concrete diploma works and their reviews allows acquiring a clearer notion about the requirements to the finalviewofadiplomawork,fortheeventualnotesandrecommendations. • Anadditionalmotivationofthestudentsisprovidedforbetterandfullerrepresen- tationoftheirdevelopments. • Analyzingtheaccumulateddatawiththeconstraint-basedassociationrulesreveals aninterestinginformationaboutthecharacteristicsofthestudent’sdiplomaworks. Our future work includes development of an application for mining the constraint- basedassociationrulesinthetextofthediplomaworks(textmining),whichallowsper- formingtheassociationanalysisofthedifferentwordsfromtheircontents. References Agrawal,R.,Imielinski,T.,Swami,A.(1993).Miningassociationrulesbetweensetsofitemsinlargedatabases. In:ProceedingsoftheACMSIGMODInternationalConferenceonManagementofData,Washington,207– 216.