ebook img

SQL Server 2005 Data Mining PDF

10 Pages·2005·0.233 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview SQL Server 2005 Data Mining

SQL Server 2005 Data Mining Page 1 asp:feature LANGUAGES:VB.NET SubscriptionServices ASP.NETVERSIONS: ALL Subscribe RenewSubscription asp.netNOWNewsletter SQL ChangeofAddress PayAnInvoice Server SubscriptionPackages asp.netPRO Articles 2005 AffiliateSpotlight 411asp.netDirectory Data NewProducts BookReviews ProductReviews Mining Opinion BackIssues CreateaWebCross- Reprints/E-prints Search sellApplication Downloads PremiumDownloads Informant ByRamanIyerandJesperLind ContactUs AdvertisewithUs Theconceptofcross-sellis WriteForUs familiartomostofus.What yourfriendlyneighborhood McDonald’ssalespersondoes LatestFeatures whenyouorderacheeseburger •NUnitASP isexactlywhatAmazon.comor •UndertheHood buy.comaredoingonlinewhen youadditemstoyourshopping •DynamicDropdowns cartandyougetalistofother •ValidateUser-entered itemsyoumightalsolike. Data Youcanaddthisfunctionalityto •TotalRecall yourASP.NETpageby ArticleRating employingthepowerofdata mining,usingsimpleSQL-like Ratethisarticleona queriestoproducehigh-quality scalefrom0to5 recommendations.Microsoft nmlkj 5Best SQLServer2005,currently nmlkj 4 availabletoover200,000 MSDNsubscribersinBeta2, nmlkj 3 includesadvanceddatamining nmlkj 2 capabilitiesthatareavailable nmlkj 1 programmaticallyviastandard interfaceslikeADO.NET.This nmlkj 0Worst willallowyoutointegratecross- sellintoyourWebstore Submit applicationwithminimaleffort. Email BeforedevelopingaWeb cross-sellapplication,weneed tobuildtheserver-side intelligencethatwillenablethe Tellafriend applicationtocomeupwith aboutthisarticle! smartproduct recommendations.This processinvolves: Preparingthedatayoualreadyhaveaboutpastcustomers formining; Designingaminingmodelforthepurposeofmaking recommendationstonewcustomers; DeployingthemodeltoAnalysisServerandtrainingitwith thedatapreparedearlier;and SettingupsecuritytoallowASP.NETtoquerythetrained model. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 2 Thefirstpartofthisarticlewillexplainkeyconceptsand walkyouthroughtheaboveprocessusingtheSQL Server2005DataMiningtools.Inthelatterhalfwe’ll demonstratetheeasewithwhichyoucanintegratethe predictivepoweroftheminingmodelintoyourASP.NET Webapplicationusingstraightforwarddatabaseaccess code. BeforeYouStart InstallMicrosoftSQLServer2005AnalysisServicesBeta 2.Thiswillsetuptheservercomponentsaswellasthe designandmanagementtoolsthatwe’lluseinthisarticle: BusinessIntelligenceStudioandSQLServer ManagementStudio,respectively.Wealsorecommend yougooverthedataminingtutorialincludedwithBeta2. Note:We’llrefertoyourrunninginstanceofSQLServer 2005AnalysisServicesas“AnalysisServer”intherestof thearticle.Theserviceshowsupas“AnalysisServices (MSSQLSERVER)”underServicesinAdministrative Tools. MiningYourCustomerPurchaseData Hereweoutlinetheprocessofbuildingtheback-end frameworkthatminesyourhistoricalcustomermovie purchasetransactionsandextractstheknowledgeneeded tomakecross-sellrecommendationstonewcustomers. Thisknowledgeisembeddedintheminingmodelthat we’lldesign. Datapreparationisanimportantaspectofanydata miningprocess.WithSQLServerDataMining,however, itisalsopossibletominethetransactiondatainyour relationaldatabasedirectly.Forsimplicity,we’lltakethis approachandassumethatasingletransactiontable containsyourcustomerpurchaseinformation,asshown inFigure1. Figure1:Moviepurchasedatausedbysampleapplication. TheAnalysisServicesprojectinSQLServer2005 BusinessIntelligenceDevelopmentStudioprovidesthe frameworkformodelingdataandbuildingaminingmodel thatlearnscustomerbuyingpatternsfromexistingdata gatheredfrompriorpurchases.Wethenusethetrained modeltogeneraterecommendationsfornewcustomers. Thefirststepistoidentifytheentitywhosebehaviorwe areinterestedinanalyzingforthepurposeofourcross- sellapplication.Acaserepresentsallinformation(also referredtoas“attributes”)knownaboutthisentity.Inthis scenario,eachdistinctcustomerintheCustomerMovies tableandthesetofmoviestheypurchasedformsacase. SQL2005DataMiningusestheconceptofanested tabletorepresentavariable-lengthcollectionofattributes ofthesamekindassociatedwithacase.Foreach customerthereisasetofrowscontainingthelistof moviespurchased,whichcanberepresentedasanested table(asshowninFigure2). Figure2:Theminingmodel’sviewofthecustomermoviepurchasedata. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 3 Thedefinitionofacaseanditsassociatedattributesis knownasaminingstructureinSQL2005DataMining. Thenextstepistobuildaminingmodelbyselectinga miningalgorithmandspecifyinghowthecolumnsinthe miningstructurewillbeusedbythealgorithmtoprocess theinputdataandextractusefulknowledgefromit.The AssociationRulesalgorithmisagoodfitforourscenario. Itlearnswhichitemsarelikelytobeboughttogetherand utilizesthatinformationtopredictotheritemsgiventhe itemsthecustomerhasselected.MarkingtheMovies nestedtableasbothInputandPredictableenablesthe modeltomakepredictionsusingthemoviesprovidedas input.NotethattheminingmodelinSQLServer2005 DataMiningisadatabaseobjectthatholdsboththe definitionoftheinputtotheknowledgeextractionprocess andtheoutput,whichconsistsofpatternsorrules learnedbyapplyingtheselecteddataminingalgorithmto processtheinputdata. Aswe’llsee,theMiningModelWizardinSQLServer 2005BusinessIntelligenceDevelopmentStudiobuildsa miningstructureandamodelusingthealgorithmselected onthefirstpage. Next,theminingmodeldefinitionissentaspartofa deploymentpackagetotheserverwhereitistrained. Trainingcasesconsistofinformationwehavecollected frompastpurchases.Thedeploymentpackageis generatedfromtheAnalysisServicesprojectthatwe designthemodelin,anditincludesbindingstothedata sourcethatAnalysisServerusesforobtainingthetraining cases. Toimprovethequalityofrecommendationsmadefornew customersthemodelcanbeperiodicallyre-trainedas morecustomerdataisaddedtothetransactiondatabase. Forlargedatasetsthiswilltypicallybecarriedoutduring off-peakhoursoragainstareplicaofthetransaction database.SQLServerDataTransformationServices (DTS)canbeusedtosetupapackagetoperformsuch periodicupdates. Finally,wemustsetuppermissionssoourapplication canquerythetrainedmodel. BuildingaCross-sellMiningModelinSQLServer 2005 Seetheendofthisarticleforinformationabout downloadingthecompleteAnalysisServerprojectbuilt usingthestepsoutlinedhereforproducingthemining modelwe’llutilizeintheASP.NETcodesamplelaterin thisarticle: 1) CreateanewAnalysisServicesprojectnamed MovieRecommendationsinBusinessIntelligence DevelopmentStudio. 2) AddanewDataSourcepointingto MovieData.mdb,theAccessdatabaseincludedwiththe sampleproject(availablefordownload;seeendofarticle fordetails). 3) AddaDataSourceViewbasedonthe DataSource.Selecttheonlytable,CustomerMovies, presentintheDataSource. 4) Right-clickontheMiningModelscollectionand selectNewMiningModeltolaunchtheMiningModel Wizard. 5) PickFromexistingrelationaldatabaseordata warehouseontheSelectDefinitionMethodpageandclick Next. 6) PickMicrosoftAssociationRulesasthedata miningtechniquetouseonthenextpage. 7) SelecttheDataSourceViewcreatedinstep3. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 4 8) Onthenextpage,thesingleCustomerMovies tablepresentinourDataSourceViewisshown.Markitas bothCaseandNested. 9) ClickNexttogototheTrainingDatapagewhere weneedtospecifythecolumnsweareinterestedin includingfromeachselectedtable(seeFigure3).Inour scenario,theCustomerMoviestableservesasthesource forourcasesaswellasthenestedtabledataassociated witheachcase(thisiswhyitislistedtwice).Fromthe firstinstancepickCustomerIdasthekeysinceituniquely identifieseachcustomer.Fromthesecondinstancepick MovieasKey,Input,andPredictable. 10) ClickFinishtocompletethewizardandbuildthe MovieRecommendationsminingstructureandmodel(see Figure4). Figure3:Selectingcolumnsweareinterestedinmodelingforcross-sellintheData MiningWizard. Figure4:Thecross-sellminingmodelinBusinessIntelligenceDevelopmentStudio. Theabovestepscreateadefinitionofourcross-sell miningmodelandassociatedobjectsinthedevelopment environment.Therearetwostepstodeploythemining modeltotheAnalysisServerandtrainit: 1) Right-clickontheMovieRecommendationsproject intheSolutionExplorerandselectProperties.Verifythat theServerpropertyintheDeploymentsectionof ConfigurationPropertiespointstotheserverhostingyour AnalysisServicesinstance.Closethedialogbox. 2) Right-clickagainontheMovieRecommendations projectintheSolutionExplorerandselectDeploy.This sendstheclient-sidedefinitionstotheserverandinitiates trainingoftheminingmodel. WemustsetupaccesspermissionsinAnalysisServer forIISusingtheSQLmanagementtoolforourASP.NET applicationtousethetrainedminingmodel: 1) OpenSQLServerManagementStudio. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 5 2) ClickConnectintheObjectExplorer,select AnalysisServer,andconnecttotheAnalysisServer hostingyourmodel. 3) LocatetheMovieRecommendationsdatabasein theDatabasescollection,rightclickonRoles,andselect NewRole.ThisbringsuptheCreateRoledialogbox. 4) OntheGeneralpageenterInternet_Userasthe rolename.ChecktheReadDefinitioncheckboxunder Setthedatabasepermissionsforthisrole. 5) ClickMembershipintheleftpaneandselecta page.AddyourIISuser(thedefaultis IUSR_machinename)totherolebyclickingAdd. 6) NowclickMiningStructureintheleftpane.This showstheCustomerMoviesminingstructureandthe CustomerMoviesminingmodelownedbythemining structure.DropdownthepermissionlistunderAccess andselectReadforbothobjects.AlsochecktheRead Definitioncheckboxforboth. 7) ClickOKtoaddtheRolewiththeabove permissionset. RecommendingProductsBasedontheUser’s ShoppingBasket Nowwe’rereadytoproducemovierecommendationsin ourWebapplicationbyrunningaSQL-likequeryagainst theAnalysisServerthatholdsourtrainedminingmodel. We’veputtogetheraminimalapplication(showninFigure 5)thatdemonstratestheideasbehindarealdeployment, focusingonthegenerationofthepredictionqueryfor gettingrecommendations.TheWebcustomeris assumed tohaveoneormoreitemsintheshoppingbasket,andfor simplicitywehaveatextboxwhereitemscanbeentered manually(separatedbysemicolons).ClickingAddItems toCartdisplaystheitemsintheshoppingbasketandalso showsalistofrecommendations. Figure5:Asimpleshoppingbasketapplication. ThecodebehindthebuttonclickisshowninFigure6; youcanseethatit’squitesimple. PrivateSubButton1_Click(ByValsenderAs Object, _ ByVal eAsSystem.EventArgs) 'Handles Me.Button1.Click 'ParsetheinputintoanArrayListofstrings. DimalInputItemsAsNewArrayList() DimsplitcharAsChar() ={";"c} DimszInputItemsAsString() = Me.TextBox1.Text.Split(splitchar,20) DimiAsInteger Fori = 0ToszInputItems.Length - 1 alInputItems.Add(szInputItems(i).Trim()) Next i 'Additemstotheshoppingbasket. dgShoppingBasket.DataSource =alInputItems dgShoppingBasket.DataBind() 'Gettop5recommendations. DimalRecommendedItemsAsNewArrayList(5) http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 6 GetRecommendations(alInputItems, alRecommendedItems) 'Displayrecommendations. dgRecommendations.DataSource = alRecommendedItems dgRecommendations.DataBind() EndSub'Button1_Click Figure6:Populateshoppingbasketandrecommendations. ThemethodbuildsanArrayListofstringsfromtheitems intheshoppingbasketandpassesittothe GetRecommendationssubroutine,requestingthetopfive recommendationsbasedontheinputitems.Weusetwo DataGridobjects:dgShoppingBasket,toholdtheitemsin theuser’sshoppingbasket;anddgRecommendations,to displaythegeneratedrecommendations.Thereal workhorseistheGetRecommendationssubroutine. ThecoreoftheGetRecommendationssubroutineisthe constructionofthepredictionjoinquery(seeFigure7) thatgetssenttoAnalysisServerandreturnsthelistof fiverecommendations. SELECTFLATTENED TopCount(Predict([CustomerMovies], INCLUDE_STATISTICS), $AdjustedProbability, 5) FROM[MovieRecommendations] NATURALPREDICTIONJOIN (SELECT (SELECT'StarWars' AS[Movie] UNION SELECT'TheMatrix'AS [Movie] ) AS[CustomerMovies] )AS t Figure7:ObtainrecommendationsusingDMXpredictionjoinagainstminingmodel. TheSQL-likequerylanguagesupportedbyAnalysis ServerforqueryingminingmodelsiscalledDMX.The DMXqueryinFigure7performsapredictionjointhat takesthetwomoviesfromtheuser’sshoppingbasket, formsacase,andjoinsitwiththeminingmodelto produceanoutputrowsetcontainingalistofpredicted recommendations. TheGetRecommendationssubroutine,showninFigure8, storesmostofthequeryincompile-timestringtemplates. Whatneedstobefilledinistheinputdata. PrivateSharedSubGetRecommendations( _ ByValvInputItemsAsArrayList, _ ByRefvRecommendedItemsAsArrayList) 'TemplatesforgeneratingDMXpredictionjoin statement. DimstrDMX1AsString = _ "SELECTFLATTENEDTopCount(" + _ "Predict([CustomerMovies], INCLUDE_STATISTICS)," + _ "$AdjustedProbability,5)From[Movie Recommendations] " + _ "NATURALPREDICTIONJOIN(SELECT(" DimstrDMX2AsString =")AS[Customer Movies])ASt" 'Iterateshoppingbasketandproduceinput case. DimcItemsAsInteger=vInputItems.Count DimstrDMXAsString ="" DimiAsInteger Fori = 0TocItems - 1 DimitemAsString =vInputItems(i).ToString() item=item.Replace("'","''") strDMX+="SELECT " +"'"+item+"'AS"+ "[Movie]" Ifi <cItems - 1Then strDMX+= "UNION " EndIf Next i http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 7 'PuttogetherDMXpredictionqueryto 'get5recommendations. strDMX =strDMX1 +strDMX +strDMX2 'ConnecttoAnalysisServerandexecutequery. DimasSessionAsNewAnalysisServerSession asSession.Connect() IfFalse = asSession.ExecuteAndFetchResult(strDMX)Then Return EndIf 'Readpredictionresultsandbuildlist 'ofrecommendations. vRecommendedItems.Clear() WhileasSession.asDataReader.Read() DimtypeAsString = asSession.asDataReader.GetDataTypeName(0) Iftype="DBTYPE_WVARCHAR"Ortype="String" Then Try DimvalAsString = asSession.asDataReader.GetString(0) vRecommendedItems.Add(val) CatcheAsException Console.WriteLine(e.Message) EndTry EndIf EndWhile 'DisconnectfromAnalysisServer. asSession.DisConnect() EndSub 'GetRecommendations. Figure8:Convertshoppingbaskettoinputcaseforminingmodelpredictionandget recommendations. Asexplainedbrieflyduringtheminingmodeldesign phase,dataispassedtominingmodelsintheformof cases.Eachcaserepresentsalltheinformationwehave aboutasinglecustomer.Whenwemakeapredictionfor anewcustomer,theknowninformationaboutthe customerispassedasaninputcasetotheminingmodel. Inthisscenariotheinputcaseisthelistofmoviesinthe customer’sshoppingbasket. Notethatthelistofmovieswasrepresentedasanested tableintheminingmodeldefinition;therefore,weneedto supplytheinputcaseinthesameform.Thisis accomplishedbythenestedsub-selectafterthe NATURALPREDICTIONJOINclauseinFigure7.The UNIONoperationisusedtoaddeachmovieasa separaterowinthenestedtable.Whenwesupplycases inlinelikethis,wedon’tneedtoprovideavalueforthe casekey,CustomerId,sincethatisnotusedforthe predictionoperation. Withtheaboveinputcasedataformatinmindit’snow easytowalkthroughtheGetRecommendationscodeand understandwhatit’sdoing. Westartthequerygenerationprocessbydeclaring templatesfortheDMXstatement.Wetheniterateover theinputitems,thecontentsoftheuser’sshopping basket,toproduceaninputcasewithanestedtable formatasdescribedabove.Theinputcaseisappended to therestofthetemplatetoproducethequerythatwillbe senttotheserver. Next,aninstanceoftheAnalysisServerSessionclassis instantiatedtoconnecttoAnalysisServerandexecutethe queryagainstthetrainedminingmodel.Thecodeforthis classisincludedinthedownload.The AnalysisServerSessionclassusesADOMD.NET,the managedproviderforAnalysisServer,toconnecttothe server.ADOMD.NETimplementsthesameconnection, command,anddatareaderinterfacesasstandard ADO.NET(System.Data.Oledb),sotherestofthecode http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 8 looksexactlylikeADO.NETcode—butoffersbetter performance.(TouseADOMD.NETyou’llneedtoadda referencetoMicrosoft.AnalysisServices.AdomdClient.dll, installedbytheSQLServer2005Beta2Client ComponentsSetup.Ifitisnotlistedunder.NETinthe AddReferencedialogboxinVisualStudio,browseto ProgramFiles\Microsoft.NET\ADOMD.NET\90andaddit fromthatlocation.) Thequeryresultsarefetched,againusingastandard datareaderinterfaceexposedbyADOMD.NETandthe arraylistofoutputitemspopulatedwiththe recommendationsreturnedbythepredictionjoin.Note thattheDMXquerywegenerateusestheFLATTENED keywordtoavoidhavingtoperformhierarchicalresult navigationtofetchtheresults. UndertheHood Inthissectionweexplainwhathappensontheserver whenitreceivesapredictionqueryandhowthe knowledgeacquiredbytheminingmodelmaybeexplored further. Duringthetrainingprocess,theAssociationRulesmodel learnsasetofrulesthatareusedtogenerateproduct recommendations.Ifarulesuchas“[Camera,Film]-> Batteries”wasdiscoveredandthecustomer’sshopping basketcontainsCameraandFilm,thenthisrulefires.Of course,theremaybeotherrulesthatpredictBatteriesas well,inwhichcasetherulewiththehighestscoreis used.Thescoreassignedtoarule,alsoknownasits Importance,takesintoaccountandcompensatesforthe factthattheprobabilityforarulemaybehighjust becausethetargetitemispopularinthedataset.Assume that10%ofcustomersbuy StarWars irrespectiveof otherpurchases,and BladeRunner isboughtbyjust3%. Iftworulespredict StarWars andBladeRunner withthe sameprobability,thescorefor StarWars willbelower sinceit’ssopopular.However,theadvancedusercan finetunethescoreusingalgorithmparameters. Finally,thetop nhighestscoringrulesareusedto generatetherecommendations.Thelastparameterinthe TopCountfunction(again,seeFigure7)setsanupper limitonthenumberofitemsreturnedbytheprediction algorithm. Therulesareorganizedbytheitemspredictedandsorted indescendingorderbasedonthescore.Theprediction algorithmcanavoidlookingatverylargesetsofrulesto achievegoodpredictionperformance. Additionalinsightintoacustomer’spurchasingbehavior maybegainedbyusingtheviewerssuppliedaspartof theBusinessIntelligenceDevelopmentStudio.These includetheAssociationRulesViewer(showninFigure9) forbrowsingtherulesandtheDependencyNetViewer thatgraphicallyshowstherelationshipbetweenitems. Thegraphlayoutalgorithmillustrateshowthestrongest correlatedproductsareclustered. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 9 Figure9:Viewingmodelcontent. TheAssociationRulesviewerdisplaystheimportancefor eachruleandrankstherulesbythisscore.Asexplained earlier,importanceservesasabettermeasurethan probabilityforfindinginterestingrulesinyourdata. Conclusion Thisarticlemerelyscratchesthesurfaceofthe possibilitiesthatSQLServer2005DataMiningpresents foryourASP.NETapplications.Oneideaforextending thissamplewouldbetoaddothercustomerattributes suchasdemographicsasinputstopotentiallyimprovethe qualityoftherecommendations.OtherpossibleWeb applicationsinclude: Targetedadsbasedonanalysisofbrowsingbehaviorusing theSequence_Clusteringalgorithm. AnonlineHelpdeskthatfindsthemostappropriateanswer forauser’squerybyusingtheNeural_Networkalgorithmin conjunctionwithtextminingtechnologiesavailableinSQL Server2005DataTransformationServices. ThedataminingtutorialforSQLServer2005Analysis ServicesisavailableontheReadinessKitCDincluded withtheBeta2package.VisittheMicrosoftbeta newsgroupat microsoft.beta.yukon.analysisservices.dataminingorthe DataMiningforumat http://www.sqljunkies.com/Forums/ ShowForum.aspx?ForumID=38ifyouhavequestionsabout SQLServer2005DataMining.Thisarticleoffersan outlineofthesimplifiedminingprocessusedforthis specificscenario.Forageneralizedversionofthis methodology,refertoCRISP-DMat http://www.crisp- dm.org. Thesamplecodeinthisarticleisavailablefor download. RamanIyer isaSoftwareDesignEngineeratMicrosoft Corp.andafoundingmemberoftheSQLServerData Miningdevelopmentteamthere.Hecanbereachedat mailto:[email protected]. JesperLind isaResearchSoftwareDesignEngineerat MicrosoftResearchandamemberoftheMachine LearningandStatisticsteam.Hecanbereachedat mailto:[email protected]. http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27 SQL Server 2005 Data Mining Page 10 InformantCommunicationsGroup, Inc. 5105FlorinPerkinsRoad Sacramento,CA95826 Phone:(916)379-0609•Fax: (916)379-0610 Copyright©2005InformantCommunicationsGroup.AllRightsReserved.• SiteUseAgreement •Sendfeedback tothe Webmaster •Important informationabout privacy http://www.aspnetpro.com/newsletterarticle/2004/10/asp200410ri_l/asp200410ri_l.asp 09/03/2005 17:34:27

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.