DATA MINING TECHNIQUES AND MATHEMATICAL MODELS FOR THE OPTIMAL SCHOLARSHIP ALLOCATION PROBLEM FOR A STATE UNIVERSITY

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by

SHUAI WANG
B.S., Management, Dalian Jiaotong University, 2011

2017
Wright State University

Wright State University
GRADUATE SCHOOL

December 14, 2017

I HEREBY RECOMMEND THAT THE DISSERTATION PREPARED UNDER MY SUPERVISION BY Shuai Wang ENTITLED Data Mining Techniques and Mathematical Models for the Optimal Scholarship Allocation Problem for a State University BE ACCEPTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy.

Xinhui Zhang, Ph.D.
Dissertation Director

Frank W. Ciarallo, Ph.D.
Director, Ph.D. in Engineering Program

Barry Milligan, Ph.D.
Interim Dean of the Graduate School

Committee on Final Examination
Xinhui Zhang, Ph.D.
Pratik Parikh, Ph.D.
Caroline Cao, Ph.D.
Subhashini Ganapathy, Ph.D.
Nan Kong, Ph.D.

ABSTRACT

Wang, Shuai. Ph.D. in Engineering Program, Department of Biomedical, Industrial and Human Factors Engineering, Wright State University, 2017. Data Mining Techniques and Mathematical Models for the Optimal Scholarship Allocation Problem for a State University.

Enrollment Management and Financial Aid. Enrollment management is the term that is often used to describe the synergistic approaches to influence the enrollment of higher education institutions, and consists of activities such as student college choice, transition to college, retention, and graduation. Of all the factors, financial aid, institution rank, and tuition are the three most important ones that affect students' choice processes and matriculation decisions; as such, with the continuous increase of tuition over the years, financial aid serves as a marketing tool and plays an important role in attracting students.
In the United States, in the 2012-2013 academic year, there were a total of 20.4 million students enrolled in degree-granting institutions and more than eighty percent of them received financial aid.

The Optimal Scholarship Allocation Problem: The widespread use of financial aid leads to an important problem yet to be solved in the literature, i.e., how to optimally allocate the limited financial aid to students with various social and economic backgrounds so as to achieve enrollment goals. Though financial aid can be of various forms, merit-based scholarships are the primary part of the allocation process. This problem, referred to as the optimal scholarship allocation problem, has puzzled the enrollment management teams at many higher education institutions and is the focus of this thesis.

Solution Approach: This thesis proposes a series of predictive and optimization models to solve the optimal financial aid allocation problem. The methodology consists of three sequential phases: 1) predictive models to find the responses (enrollment and graduation probabilities and years of study) to various levels of scholarship for students with various socioeconomic backgrounds; 2) optimization models to find the maximum revenue for a given budget based on the responses discovered to the various levels of scholarships; and 3) data mining models to discover patterns and transform results from the optimization model to simple and effective policies.

Phase I: Predictive Models. A series of predictive models have been investigated to estimate the responses from students to various levels of scholarship awards. These responses can be classified into two categories: the first category includes enrollment and graduation decisions and the second one is the number of years of study once a student enrolls in the institution.
In the first category, because of the binary nature of the responses (enroll or not enroll), logistic regression based models have been adopted to predict the probability of enrollment and the probability of graduation given that the student enrolls. In the second category, regression analysis is adopted.

Phase II: Optimization Models. An optimization model is designed to allocate financial aid to applicants with an objective to maximize the revenue, which is composed of net tuition, i.e., tuition minus scholarship, over the years of study, plus the state share of instruction once the student graduates. The constraints to be observed include the total budget limitation and a fairness constraint. For a merit-based scholarship, the fairness constraint stipulates that a student with better academic performance must be assigned an equal or higher level of scholarship than a student with lower academic performance. The inclusion of the fairness constraint has dramatically increased the size of the model, and to reduce the computational burden, the concept of a minimum dominance set is developed. This has reduced the size of the model by orders of magnitude and enabled the efficient solution of the resulting mathematical model.

Phase III: Policies Analysis Models. Regression analysis is developed to discover patterns in the optimization results, in the form of the amount of scholarship awarded for each student, and translate them into simple and effective scholarship award policies for implementation. Several techniques such as decision trees and piecewise regression have been explored. For the institution under study, the results suggested that a composite score based on the student's GPA and ACT scores can be used as the basis for the award of scholarships; and a simple yet effective scholarship award policy derived from piecewise regression has been discovered.
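The minimum dominance set mentioned under Phase II can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes, for illustration only, that student i dominates student j when i is at least as good on both GPA and ACT and strictly better on one, and it keeps only the dominance pairs that are not implied by transitivity. The function names and the sample profiles are hypothetical.

```python
from itertools import permutations

def dominates(a, b):
    """Assumed rule: a is at least as good as b on every measure, better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def full_dominance(profiles):
    """All ordered pairs (i, j) such that student i dominates student j."""
    return {(i, j) for i, j in permutations(range(len(profiles)), 2)
            if dominates(profiles[i], profiles[j])}

def minimum_dominance(profiles):
    """Drop pair (i, j) when some student k satisfies i > k > j in the order,
    because the pairs (i, k) and (k, j) already imply it."""
    full = full_dominance(profiles)
    n = len(profiles)
    return {(i, j) for (i, j) in full
            if not any((i, k) in full and (k, j) in full for k in range(n))}

# Hypothetical (GPA, ACT) profiles, made up for illustration.
profiles = [(3.9, 30), (3.5, 27), (3.5, 24), (3.0, 24), (3.2, 29)]
full = full_dominance(profiles)
reduced = minimum_dominance(profiles)
print(len(full), len(reduced))  # 8 pairwise constraints shrink to 5
```

Under these assumptions, a fairness constraint of the form "scholarship of i is at least scholarship of j" would be written only for the pairs in `reduced`; the remaining pairs follow by chaining those inequalities. Students with identical profiles are not ordered by this rule and would need separate handling.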
Implementation: The analysis based on the above framework was adopted by the institution under study and has been used in an overhaul of the scholarship redesign. The piecewise-regression-derived, composite-score-based scholarship award policy proves to be effective, and together with a proactive marketing strategy it has yielded an 11% increase in directly admitted students under a similar budget. This translates into millions of dollars of revenue and significantly improves the university's bottom line.

Contents

1 Introduction
  1.1 Enrollment Management and Financial Aid
  1.2 The Scholarship Allocation Problem
  1.3 A Three Phase Solution Approach
  1.4 Implementation and Financial Results
  1.5 Contribution and limitation
2 Literature Review
  2.1 Macro-Level Student Demand Models
    2.1.1 Student Demand Theory on Tuition
    2.1.2 Student Demand Theory on Financial Aid
    2.1.3 Target Effect on Financial Aid
    2.1.4 Student Demand Study for Policy Analysis
  2.2 Enrollment Prediction at Micro-Level
    2.2.1 College Choice Process and Models
    2.2.2 Micro-level Response to Financial Aid and Its Optimization
  2.3 Methodology Reviews
    2.3.1 Regression Models in Student Demand Studies
    2.3.2 Logistic Regression Models in Student Choice Response Studies
3 Predictive Models for Probabilities of Enrollment and of Graduation
  3.1 Data Exploration and Visualization
  3.2 Logistic Regression For Enrollment & Graduation
    3.2.1 Logistic Regression Methodology
    3.2.2 Collinearity and Variable Selection
    3.2.3 Logistic Regression Models on Training Data
    3.2.4 Logistic Regression Tree Models on Training Data
    3.2.5 Prediction Accuracy on Test Data
  3.3 Answer to Enrollment & Graduation Probabilities
4 Prediction Models on the Number of Years of Study
  4.1 Difference From a Retention Study
  4.2 Methods and Results
    4.2.1 Training and Testing Data
    4.2.2 Prediction Models
    4.2.3 Experiment Results
5 Mathematical Models for The Optimal Financial Aid Allocation Problem
  5.1 The Financial Aid Optimization Model
  5.2 Model Size Reduction and Dominance Matrix
    5.2.1 The Size of Pair-wise Dominance Constraints
    5.2.2 Full Dominance Matrix
    5.2.3 Redundant Dominance Matrix
    5.2.4 Minimum Cardinality Dominance Matrix
  5.3 Model Comparison and Results
    5.3.1 Results Under Different SSI
6 Derivation of Scholarship Award Policies & Implementation
  6.1 Derivation of Scholarship Award Policies
    6.1.1 Scholarship Award Policy Based on Decision Tree
    6.1.2 Scholarship Award Policy on Stepwise Regression
    6.1.3 Insights on Change Of Budget
  6.2 Implementation and Results
7 Conclusion
Bibliography

List of Figures

3.1 Histogram for ACT
3.2 Histogram for High School GPA
5.1 Full dominance relationships in graph form
5.2 Redundant dominance relationships in graph form
5.3 Minimum dominance in graph form
5.4 Optimization results for SSI = 10,000
5.5 Optimization results for SSI = 12,000
5.6 Optimization results for SSI = 14,000
6.1 Financial aid policy based on decision tree
6.2 (a) (b) (c) Scholarship vs ACT for various budgets and SSI. (d) (e) (f) Scholarship vs GPA for various budgets and SSI.
6.3 Scholarship vs Composite score for SSI = 10,000
6.4 Scholarship vs Composite score for SSI = 12,000
6.5 Scholarship vs Composite score for SSI = 14,000

List of Tables

1.1 Comparison of enrollment between 2012-2013 and 2013-2014 Years
3.1 The number of applications from 2007 to 2013
3.2 Statistics of selected continuous variables related to applications
3.3 Statistics of selected continuous variables related to matriculated students
3.4 Number of applicants vs GPA/ACT in 2012-2013
3.5 Pearson correlation matrix of all numeric variables
3.6 Summary statistics of logistic regression for enrollment model
3.7 Summary statistics of logistic regression for graduation model
3.8 Variables used in the logistic regression tree models
3.9 Enrollment prediction from logistic regression and logistic regression tree
3.10 Graduation prediction from logistic regression and logistic regression tree
3.11 Accuracy of enrollment prediction from support vector machine and neural networks