JournalofIntelligent&FuzzySystems35(2018)3733–3745 3733 DOI:10.3233/JIFS-18547 IOSPress A comprehensive mechanism for hotel recommendation to achieve personalized search engine Y ∗ YingHuang,Hong-YuZhang andJian-QiangWang P SchoolofBusiness,CentralSouthUniversity,Changsha,China O C Abstract.Searchenginesareasimportantasrecommendersystemsforhotelselections.However,therecommendedlists of search engines are usually non-personalized and low accuracy. In order to deal with these issues in search engines, a comprehensive mechanism for hotel recommendation is prop osed. In this mechanism, we consider users’ personalized preferences by identifying users’ attributes about interest, trusRt and consumption capacity. Meanwhile, the quantification methodforeachattributeispresentedbyusingfuzzytheory.Moreover,thispaperimprovesthemethodtoevaluatethehotel, whichrespectstothecriteriaprice,rating,andonlinereviewbyusingfuzzytheory.Inaddition,thisproposedapproachuses TOPSIS,aclassicalmulti-criteriadecisionmakingmethod,Otoimprovetheaccuracyfurther.Finally,acasestudyisconducted basedonTripadvisor.comtoillustratethevalidityoftheproposedmethodforhotelrecommendationinsearchengines.The resultsofthecasestudyindicatethatitnotonlysolvestheproblemofnon-personalization,butalsoimprovestheaccuracy insearchengine. H Keywords:Hotelrecommendation,TOPSISmethod,searchengine,onlinereviews,price T U 1. Introduction menditemsone-commerceplatformsforconsumers. A Both recommender systems and search engines can The development of electronic-commerce (e- augmentthepossibilityofonlinepurchasing. commerce) and social media has changed users’ At present, most of the researches focus on rec- styles for booking hotels. Tourists are accustomed ommender systems, few of them study how to to booking hotels in advance on e-commerce plat- improve search engines. However, there are some forms,suchasHolidayInn.com,ctrip.comandsoon. obvious issues in search engines. Usually, search However,thelargequantity,variedpricesandragged engines cannot provide personalized recommended qualityofhotelsmakeitisdifficultforuserstofind lists. Nevertheless, the item recommendations by satisfactoryhotels.Recommendersystemscanfilter search engines are mainly based on the ratings. thevastamountofinformationinnetworkstoassist Besides,evenifconsumersarenotsatisfiedwiththe consumerinmakingthebestchoices[1–6].Besides items,theystilltendtogivehighscoresduetothecon- the recommender systems, search engines are the sumers’ rating habits or in order to avoid malicious othercommontoolstofilterinformationandrecom- harassment caused by giving low scores. It causes mosttheratingsofitemsareconcentratedin4or5(on the 1–5 scale). Traditional recommendation meth- ∗Correspondingauthor.Hong-YuZhang,SchoolofBusiness, ods usually use the average ratings to rank hotels, CentralSouthUniversity,Changsha410083,China.Tel.:+86731 88830594;Fax:+8673188710006;E-mail:[email protected]. and the difference of average ratings among hotels 1064-1246/18/$35.00©2018–IOSPressandtheauthors.Allrightsreserved 3734 Y.Huangetal./Acomprehensivemechanismforhotelrecommendationtoachievepersonalizedsearchengine isoftenslight.Inthecircumstances,itisdifficultto neutrosophic numbers instead. In this study, on the distinguish the quality of hotels [7] on the basis of basis of research in [25], we will use the sentiment average ratings, which in turn causes low accuracy analysistechnologytotransformonlinereviewsinto of the recommendations for search engines. In this singlevaluedneutrosophicnumbers(SVNNs). study, we will propose a new method to deal with Apartfromratingsandonlinereviews,ithasbeen theratingsandacomprehensivehotelrecommenda- proved that price is a pivotal factor that influences tion method to improve the performance of search consumers’ decision-making [26–30]. Few studies engines. have applied the price in product or hotel recom- The basic framework of consumer’s purchasing mendation.Largelybecauseofregionaleconomyand decision-making in the context of online shopping personalincomes,differentconsumersmayhavedif- is same with that of off-line way. The decision- ferentopinionsonthequestionwhetherthepriceof Y makingprocessofitemsselectionisthemostcritical ahotelisexpensiveornot.Theresultisthatitcan- stage in the process of consumer’s online shopping notdirectlyrelyonthevalueofthepricetoevaluate [8].Atthisstage,consumersneedtocompare,ana- or recommPend hotel. Besides, due to the impact of lyze and evaluate the selected items according to various factors, such as social position, vanity and the purchase criteria, so that they can make a judg- income, different consumers have different prefer- O ment of the quality of the items, to generate the ences for products’ prices. One point to be sure is final evaluation. The researches of the current rec- that,forthesameconsumer,hisconsumptioncapac- ommendation methods often focus on the demand itywillbeconsistentforalongtime.Thehigherthe C evokingpart,whichignoretheprocessofconsumer similarity between the hotel’s price and customer’s decision.Withrespecttosearchengines,consumers consumptioncapacity,thegreaterthelikelihoodfor providetheirdemandsdirectlybyenteringkeywords. thehoteltobebookedbythecustomer. Therefore, there is a relatively low requirement on RBased on the researches outlined above, in the identifying the demands of consumers. In addition, decision-making process of item selection, we will decision-making process of items selection is more combinetheimpactofrating,onlinereviews,aswell O important in the methods for search engines than aspricetoimprovetheaccuracyofsearchengine. in the traditional recommendation methods. In this In the hotel recommendation field, some studies study,wewillamelioratesearchenginesbyresearch- are devoted to the excavation of hotel evaluation H ingdecision-makingprocessofitemsselection. rules. Yu and Chang [31] designed a hotel evalua- Thedecision-makingprocessofitemsselectionis tionrulewithfivecriteriaincludingdistance,traveler influenced by many factors for consumers T[9–11]. preference,roomrate,facilities,andratingthesefive Theratingtowarditemisthemostusedcriterionfor criteria.Levietal.[32]minedhotelreviewstodeter- item selection [12, 13]. However, most studies use minetheimportanceofeachcriterionofhotels.Some U onlyacomprehensivescore,whichresultsinaseri- literaturesdesignedpersonalizedhotelrecommenda- ouslossofinformation[14,15].Inrecentyears,with tionmethodsbygivingdifferentweightsfordifferent thedevelopmentoftextanalysistechnAology,several criteriabasedonthetargetuser’spersonalizedprefer- researcheshavecombinedratingsandonlinereviews ence[33,34].Nevertheless,mostexistingresearches toselectitems[16,17]. givethesameweightsforalluserswhohavecomment As online reviews are in the form of text, which on the hotels. Actually, for a target user, the ratings contain more information than ratings, more and andonlinereviewsofsimilarusershavemoreinflu- more scholars have studied the impact of online encethanthoseofdissimilarusers.Therefore,inthis reviewsandappliedthemtoimprovetheperformance study,weneedtomineusers’preferences,toidentify of recommender systems [18–21]. Most of them similar group, to adjust the weights of groups with utilizeonlinereviewstoidentifyusers’oritems’pref- differentsimilaritiestotargetuserintermsofratings erences by identifying keywords in online reviews, andonlinereviews,andtodealwiththeproblemof then use these keywords as the properties of items non-personalizationinsearchengine. or users [22, 23]. Some other researches use online Fuzzytoolshavebeenusedtomodeluncertainand reviewstoevaluateitems[24].Zhangetal.[25]pro- vaguepreferencesinrecommendersystems[34,35]. posedamethodstoquantifyonlinereviewsbyusing Usually, the fuzzy sets used in different studies are neutrosophictheory.Intheirstudy,theydidnottrans- differentbecauseofthedifferentdataformsandrec- late online reviews into the neutrosophic numbers ommendation strategies. Combining the features of actually, but translated ratings into interval-valued the collected data and the feature of each fuzzy set, Y.Huangetal./Acomprehensivemechanismforhotelrecommendationtoachievepersonalizedsearchengine 3735 thispaperwillusetheappropriatefuzzysettoquan- 1) A+B={(cid:3)x, μ (x)+μ (x)−μ (x)μ (x), A B A B tifyprices,ratings,andonlinereviews.Theformof v (x)v (x)(cid:4)|x∈X}, A B informationabouteachcriteriontoevaluatehotelsis 2) A·B={(cid:3)x,v (x)+v (x)−v (x)v (x),μ (x) A B A B A different,sothemethodofquantificationvariescon- μB(x)(cid:4)|(cid:2)x(cid:3)∈X}. siderably with criteria. For this reason, aggregating 3) λ·A= x, 1−(1−μ(x))λ, 1−(1−v (cid:4) (cid:5) A thecriterionevaluationvaluesdirectlytosorthotels (x))(cid:2)λ(cid:3) |x∈X , (cid:4)(cid:5) isnotefficient.Wecouldrankhotelsandrecommend 4) Aλ = x, μ λ(x), v λ(x)|x∈X . A A thembyusingtheTOPSIS(TechniqueforOrderPref- erencebySimilaritytoSolution)method,tosolvethe multi-criteriadecisionmakingproblems[36–39]. Definition 3. [41] Let A={(cid:3)x, μ (x), v A A Theremainderofthispaperisorganizedasfollows. (x)(cid:4)|x∈X}, B={(cid:3)x, μ (x), v (x)(cid:4)|x∈X} B B Y InSection2,webrieflyreviewsomebasicconcepts are two IFSs on X, then the normalized Euclidean relatedtothisstudy.Section3proposesacomprehen- distancebetweenAandBcanbedefinedas: P (cid:6) (cid:7) (cid:7) (cid:9)n (cid:10) (cid:11) d (A, B)=(cid:8) 1 (μ (x)−μ (x))2+(v O(x)−v (x))2+(π (x)−π (x))2 (1) IFs A B A B A B 2n i=1 C Definition4.[42]LetXbeauniverseofdiscourse, sivemechanismforhotelrecommendation.Toverify thensinglevaluedneutrosophicsetainXisdefined the feasibility of the method, Section 4 conducts a Ras: a={(cid:3)x, Ta(x), Ia(x), Fa(x)(cid:4)|x∈X}. Ta(x), casestudyusingdatafromTripAdvisor.com.Finally, I (x), F (x) are the degrees of truth-membership, a a Section5concludesthestudyandsuggestsdirections indeterminacy-membershipandfalsity-membership, forfutureresearch. O respectively.ThevaluesofT (x),I (x)andF (x)lie a a a in[0,1],forallx∈X.Theelementinsetaiscalled singlevaluedneutrosophicnumber(SVNN). 2. Basicconcepts H Definition 5. [43] Let A and B be two SVNNs, for In this section, we briefly review the defini- anyx∈X,therearesomeoperations,anddefinedas: T tions of the intuitionistic fuzzy set (IFS), the single valued neutrosophic set (SVNS), and the hesitant 1) A+B=(cid:3)TA+TB−TATB, IA+IB− fuzzy linguistic term set (HFLS), as weUll as their IAIB, FA+FB −FAFB(cid:4), operations and distance methods. These definitions 2) A·B=(cid:3)(cid:3)TATB, IAIB, FAFB(cid:4), will be used in the proposed hotel recommendation 3) λ·A= 1(cid:4)−(1−TA)λ, 1−(1−IA)λ, 1− A method. (1−F(cid:3)A)λ ,and (cid:4) 4) Aλ = T λ, I λ, F λ . A A A Definition 1. [40] Let X be a nonempty clas- sical set, X={x , x ,..., x }, the intuitionistic 1 2 n fuzzy set defined on X is represented as: A= Definition6.[44]LetAandBbetwoSVNNs,then {(cid:3)x, μ (x), v (x)(cid:4)|x∈X}, where μ (x):X→ the Euclidean distance between A and B is defined A A A [0,1] and v (x):X→[0,1] are the membership as: A degree and non-membership degree of A, and 0≤ d (A, B) μA(x)+vA(x)≤1. H (cid:12) ForeveryIFSonX,π (x)=1−μ (x)−v (x) 1(cid:10) (cid:11) A A A = (T −T )2+(I −I )2+(F −F )2 . istheintuitionisticindexoftheelementxinA,and 3 A B A B A B 0≤π (x)≤1, x∈X. A (2) Definition 2. [40] Let A={(cid:3)x, μ (x),v A A (x)(cid:4)|x∈X},andB={(cid:3)x, μ (x), v (x)(cid:4)|x∈X} Definition7.[45]introducedasubscript-symmetric B B be two IFSs on X, the operations of A and B are linguistic evaluation scale, which can be defined as definedas: follows:S ={s−τ,...,s−1, s0, s1,...sτ}. 3736 Y.Huangetal./Acomprehensivemechanismforhotelrecommendationtoachievepersonalizedsearchengine Definition 8. [45] Let s and s be two linguistic a b terms, then the distance measure between s and s a b is: |a−b| d(s , s )= . (3) a b 2τ+1 Definition 9. [46] S = {s−τ, ..., s−1, s0, s1, ... s }isalinguisticscalesetincluding2τ+1linguis- τ tic terms, then a set of finite ordered elements in the set S is called a hesitant fuzzy linguistic term set (HFLS), denoted as H : Hs={s |s ∈S, i= Y S βi βi 1, 2,..., n}, s is a linguistic term, an element in βi H . βi is the subscript of s , and −τ ≤βi≤τ. n S βi P representsthenumberofelementsinH . S Definition 10. [47] Let H1 and H2 be two HFLSs, O s s theHammingdistancebetweenH1andH2isdefined s s as: Fig.1. Theframeworkoftheproposedmethod. d(cid:13)H1,H2(cid:14)= 1 (cid:9)n (cid:15)(cid:15)αj −βj(cid:15)(cid:15), (4) C s s n 2τ+1 isoneoftheindispensablestepstoprovideuserswith j=1 accurate recommendations. This study will extract whenthenumbersofelementsintwosetsarediffer- Ruserinterestattributefromuser’spurchasingrecords ent,complementthesetwithfewerelements. andonlinereviews. Thestepsofinterestrecognitionareasfollows. O (i) For user u, gather all titles of hotels user u 3. Acomprehensivemechanismforhotel stayedanduser’sonlinereviewstoformalong recommendation H text,denotedasT . u (ii) For each long text, eliminate irrelevant words Inthissection,weproposeacomprehensivehotel such as stop words and so on, and carry out recommendationmethodtoimprovetheperforTmance wordsegmentationandwordfrequencystatis- of search engine. The framework of the proposed tics. Then, classify the high-frequency words. methodisshownasFig.1. U Because the same interest can be expressed Thedetailsoftheproposedmethodwillbeexpati- in different ways, it is necessary to sort out atedintherestofthissection. thehigh-frequencywordsfurther:classifysyn- A onymsintoonecategory,anduseeachcategory 3.1. Thequantificationofuserattribute vocabularyasafeatureofuseru’sinterest,the featureofuser’sinterestdenotingasf . u In a socialized business environment, users’ data (iii) For each T , count the frequency of each u have increased exponentially. User’s purchasing feature f appearing in the text T , and u u records, browsing records, ratings, online reviews, denote as freq . Thus, the interest attribute u tags and other data, to some extent, reflect user’s for user u can be represented by the collec- interest, trust, consumption capacity and other tionofallthefeatures,representedasF(u)= personalized information. How to utilize massive {f , f ,...f ..., f }, where f is the u1 u2 ui un ui unstructured information, to mine user’s attribute ith feature of interest for user u, freq(u)= information accurately and effectively is one of the {freq , freq ,...freq ..., freq } is a u1 u2 ui un key issues in this study. In this part, we will intro- set of the frequency corresponding to the fea- ducethequantificationmethodsforuser’sattributes tureF(u). indetail. User interest. User interest directly reflects con- User trust. At present, the number of users on e- sumer’spurchasepreference.Miningusers’interest commerce websites is huge. Some users are false Y.Huangetal./Acomprehensivemechanismforhotelrecommendationtoachievepersonalizedsearchengine 3737 golfer326 Since Nov 2014 Jordan 50-64 year old female the data reflects how many items user “golfer 326” has commented. the data reflects how many users Y think the reviews are useful P Fig.2. Themechanismtoevaluateuser’strustonTripadvisor.com. O users,whosepurchasingandcommentbehaviorsare irregularanduntruthfulness.Theonlinereviewsand ratings of these users are unconvincing. Their trust CFig.3. Keywordsrepresenteduser’sinterests. degreesarelow,andtherecommendationsbasedon thehistoricrecordsofthemareinaccurate.Thelower s−4=“Incomparable Cheap”, s−3=“Significantly thetrustofuser,thelowerthereliabilityoftheuser, Cheap”, s−2=“Cheap”, s−1=“Somewhat Cheap”, and the lower the accuracy of the recommendation Rs =“Medially”, s =“Somewhat Expensive”, 0 1 basedonhishistoricrecords.Therefore,thetrustof s =“Expensive”,s =“SignificantlyExpensive”and 2 3 userisalsoimportanttotheidentificationofsimilarOs4=“CertainlyExpensive”. group,andtheusersinsimilargrouparewithahigh Secondly,collectthepricedistributionsofdiffer- degree of trust. Many e-commerce platforms have ent cities, and divide the prices of hotels in each the mechanisms to evaluate user’s trust, for exaHm- city into 9 levels corresponding to linguistic terms ple, Fig. 2 shows the mechanism to evaluate user’s S. For the same city, the higher the price of hotel trust on Tripadvisor.com. The more the number of user booked, the stronger the consumption capacity “helpful votes” a single review gets, the mTore the ofuser.Finally,foreachuser,consumptioncapacity trustworthytheuserpossesses.Bycomparison,some isdenotedasH intheformoflinguistictermsets. u e-commerce platforms do not have the meUchanisms toevaluateuser’strust. 3.2. Thedefinitionofsimilargroup Fortheformer,thereisthestandardizationofuser’s A trust: Inthisstudy,weidentifyconsumerswithhighsim- t ilarity in terms of interest, trust, and consumption trust = u , (5) u max(t) capacity as similar group, and they tend to have a similarpurchasingpreference. wheretuisthetrustofuseruonthee-commerceplat- (1)Generally,themoresimilarthecharacteristics form,max(t)isthemaximumvalueinallusers’trust. ofhotelsandtheaspectswhencommentingonhotels, Forthelatter,thetrustquantitativemethodneedsto the higher the preference similarity between users. befurtherstudiedinthefuture. Theinterestpreferencesimilaritybetweentargetuser uandtheuservcanbecomputedasfollow: User consumption capacity. Similar users have similarconsumptioncapacities.User’sconsumption Sim int(u, v) ⎧ chaapsasctaityyedca.Innbtheirsesfltuedctye,dwbeywtihlleuptirliiczeeslionfghuoistetilcstuesrmer ⎪⎪⎨(cid:20)m (1−|frequj−freqvj|) setFsitrostelyx,praessssuumseerst’hcaotntshuemreptiiosnacaspeatciLtieso.f nine =⎪⎪⎩j=1 max(N0u,,Nv) ,NNu =/=00aonrdNNv==/00, linguistic terms S to depict the price, where L= u v {s−4, s−3, s−2, s−1, s0, s1, s2, s3, s4}, where (6) 3738 Y.Huangetal./Acomprehensivemechanismforhotelrecommendationtoachievepersonalizedsearchengine whereN , N arethenumbersofinterestfeaturesfor G ={v|sim all(u, v)<θ }. (12) u v 3 2 useruanduserv;jdenotesthejthcommoninterest In this study, for G , G , G , the weight of each featureforuseruanduserv;misthetotalnumberof 1 2 3 groupisdefinedasfollow: commoninterestfeaturesforuseruanduserv,freq uj and freq are the frequencies of the jth common vj sim average(u, G) interestfeatureinuseruanduservinterestfeatures, wi = (cid:20)3 i , (13) respectively. sim average(u, G) i (2)Thegreaterthevalueofuser’strust,themore l=1 the target user trusts another user, which means the whereuseruisthetargetuser,sim average(u, G) i higherthetrustsimilaritydegreebetweentargetuser istheaverageofthesimilaritiesbetweenmembersin and the other user. In addition, the greater the trust groupG andthetargetuser. i Y valueofthetargetuserhimself,thehighertherequire- mentintrustfortheusersinthesimilargroup.The 3.3. Thequantificationofhotelevaluation methodofcalculatingthetrustsimilarityisshownas: P criteria Sim trust(u, v) (1+min(trust , trust ))×trust In oOrder to improve the accuracy of recommen- = u v v, (7) dations, this study comprehensively considers three 1+trust(u) factors,whicharerating,onlinereviewsandpriceas where trustu, and trustv denote the trust degrees of hotCelevaluationcriteria. useruanduserv,min(trustu, trustv)istheminimum Rating.Theratingreflectsthesatisfactiondegree valueintrustu,trustv. of a user towards a hotel. For a hotel, some users (3)Foruseruanduserv,theirconsumptioncapac- rateitwithhighvalue,whileothersgivelowratings. R itiesaredenotedasHuandHv,thentheconsumption Simply seeking the average rating of all users just capacitysimilaritybetweenthemisdefinedas reflectslittleinformation.Inordertoretaintherating Opreferences of different users for the hotel as much sim c(H , H )=1−d(H , H ). (8) u v u v aspossible,foreachgroup,weuseanintuitivefuzzy Then, for user u and user v, the comprehensive numbertoexpressthesatisfactionofgrouptowards similarityiscalculatedasfollow: H hotel. Sim all(uv) IF(G)={(cid:3)μ(G), v(G)(cid:4)}. (14) i i i Sim int(u, v)+Sim trust(u, v)+Sim c(HT, H ) = u v . Inmostoftheelectroniccommerceplatforms,the 3 rating scale ranges from 1 to 5. A score of 4 or 5 U (9) represents that the user likes the hotel, and a score of 1 or 2 represents that the user does not like the According to the comprehensiveAsimilarity of hotel. Therefore, when the scale is in the range of users,wedividetheusersintothreegroups: 1 to 5, the ratios of ratings above 3 and less than 3 inthegroupindicatethatthedegreeofmembership • Similar group (G ): the user whose user simi- 1 andthedegreeofnon-membershiptowardshotel.G 1 larity is higher than θ is a member of similar 1 represents similar groups. G means weak similar 2 group. groupandG meansdissimilargroup. • Weak similar group (G ): the user whose user 3 2 Comparedtouserswithlowsimilarity,userswith similarityislessthanθ andhigherthanθ isa 1 2 highsimilarityhaveagreaterimpactonthepurchas- memberofweaksimilargroup. ing decision of target user. The ratings and online • Dissimilargroup(G ):theuserwhoseusersim- 3 reviews of similar group are more important than ilarityislessthanθ isamemberofdissimilar 2 thoseofweaksimilargroupordissimilargroupare. group. Foreachgroup(G ,G ,G ),wecancalculatetheir 1 2 3 Theequationtoidentifyeachgroupisasfollows: weightsbasedonEquation(13).Bygivingdifferent groupsdifferentweights,itnotonlymakestheresult G1 ={v|sim all(u, v)>θ1}, (10) moreaccuratethandirectlycalculatingtheaverageof allusers’ratingsandonlinereviews,butalsomakes G ={v|θ <sim all(u, v)<θ }, (11) therecommendationresultpersonalized. 2 2 1 Y.Huangetal./Acomprehensivemechanismforhotelrecommendationtoachievepersonalizedsearchengine 3739 Thecomprehensiveratingforalluserstowardsthe is used to express the target user’s consumption hotelisIF(G): capacity. (cid:9)3 IF(G)= wIF(G). (15) 3.4. Themodelofrecommendationbasedon i i TOPSIS i=1 Becausetheexpressionsofthreecriteriatoevaluate Online reviews. Because one online review may hotelsaredifferent,criterionvaluescannotbeaggre- havepositive,neutralandnegativeevaluationsatthe gateddirectlybyaggregationoperators.TheTOPSIS same time. In order to depict online reviews appro- methodaggregatesallcriteriabasedondistancemea- priately and reduce the loss of information, we use sure,andsortsthealternativesaccordingtothedegree singlevaluedneutrosophicnumberstoexpressonline Y ofcloseness.Thereisnospecialrequirementforthe reviews. According to the definition of SVNNs, the data form of each criterion value. In this part, we online review of user u on the hotel i is expressed as R˜ =(cid:3)T , I , F (cid:4). T indicates the positive will sort hPotels by TOPSIS method and form final ui ui ui ui recommendedlist.Thefollowingarethesteps: degreeofthepositiveonlinereview,andI indicates (1)Obtainthepositiveidealsolutionandtheneg- theneutraldegreeoftheneutralonlinereviewandF O ativeidealsolution. indicates the negative degree of the negative online The positive ideal solution consists of the posi- review. Further, we regard all users in the same group as tiveCideal value of each criterion, and the negative idealsolutionconsistsofthenegativeidealvalueof onevirtualuser.Thentheonlinereviewsofusersin each criterion. For the criterion of rating, the pos- the same group are regarded as one online review. itive ideal value is (cid:3)1,0(cid:4), which means all users TheonlinereviewforgroupG isdenotedas: i Rhave high ratings on hotel (4 or 5, in a scale of R˜ (Gi)=(cid:3)T (Gi), I(Gi), F(Gi)(cid:4). (16) 1–5). For the criterion of online review, the posi- tive ideal value is (cid:3)1,0,0(cid:4), which means all online Thenallusers’onlinereviewsarecalculatedas: O reviews contains only positive comments, in which (cid:9)3 thepositive-membershipdegreeoftheonlinereview R˜ (G)= wiR˜ (Gi). (17) is equal to 1. For the criterion of price, the positive H i=1 idealvalueis1,whichmeansthesimilaritybetween the price of hotel and the consumption capacity of Price.ThequantitationmethodofpriceisTsimilar useris1.Thenegativeidealvalueiscompletelyoppo- to that of consumption capacity in Section 3.1. sitetothepositiveidealvalueundereachcriterion. First,definethesetoflinguistictermsthatUrepresent Then,thepositiveidealsolutionA+andthenega- the price. Assume that there is a set S of nine tiveidealsolutionA−aredescriptedasfollows: linguistic terms to depict the price, where S = {s−4, s−3, s−2, s−1, s0, s1, s2, s3,As4}, s−4= A+ ={(cid:3)1,0(cid:4),(cid:3)1,0,0(cid:4),1}, (19) “Incomparable Cheap”, s−3=“Significantly Cheap”, s−2=“Cheap”, s−1=“Somewhat Cheap”, A− ={(cid:3)0,1(cid:4),(cid:3)0,1,1(cid:4),0}. (20) s =“Medially ”, s =“Somewhat Expensive”, 0 1 (2) Calculate the distance. For each alternative s =“Expensive”, s =“Significantly Expensive” 2 3 hotel,calculatethedistancebetweeneachalternative and s =“Certainly Expensive”. Then, collect the 4 + andthepositiveidealsolutionD ,aswellasthedis- price distribution of hotels in each city, and divide i tancesbetweeneachalternativeandthenegativeideal the price into 9 levels corresponding to the price − solution D . For the criteria rating, online reviews linguistic term of each city. The price of each hotel i and price, the distances can be calculated based on canbeexpressedasalinguisticterm,anddenotedas Equations(1,2and4). s .Lettheevaluationofpriceforhotelibedenoted β (3)CalculatetheCloseIndexC. asP: i i (cid:15) (cid:15) Pi =1− (cid:15)sα−9 sβ(cid:15), (18) Ci = Di+D+i−Di−. (21) where s is a linguistic term, α is the average Rank hotels based on Close Index. The greater α value of subscripts in the linguistic term set which the value of hotel’s Close Index, the higher the 3740 Y.Huangetal./Acomprehensivemechanismforhotelrecommendationtoachievepersonalizedsearchengine recommendedrankingofhotel.Thenrecommendthe Table1 hotelstothetargetuserubasedonthenewranking Alternativehotelsforrecommendation list. Hotel R1 R2 price Pi rating Hotel41 1 1 4274 s2 5 3.5. Theprocessofrecommendation MilestoneHotel 2 3 6978 s4 5 MontcalmHotel 3 8 2891 s2 5 HaymarketHotel 4 10 14937 s4 5 In this part, we will introduce the process of FourSeasonsHotel 5 15 7408 s4 5 mechanismforhotelrecommendationtoachieveper- TheDorchester 6 37 5783 s3 5 sonalizedsearchenginebasedontheresearcheswe TheCapitalHotel 7 42 3070 s1 5 haddiscussed.Theproposedapproachiscomprised AmpersandHotel 8 47 2795 s0 4.5 VentureHostel 9 690 241 s−4 3.5 ofthefollowingsteps: EuroQueensHotel 10 922 430 s−4 2.5 Y Step 1: Obtain the alternative hotels. Using search engine on e-commerce platform to filter the hotels P Table2 basedonthekeywordsenteredbythetargetuser. Asampleofusers’information User number vote tu trustu Step 2: Identify similar group for the target user. O brooket986 3 1 0.3333 0.0238 According to the user’s historical records and the chickenlips 2 2 1 0.0714 similargroupidentificationmethodproposedin3.2. ZenaB 13 12 0.9231 0.0659 ...C ... ... Foreachpairofusers,calculatethesimilarityusing RonaeF 1 0 0 0 Equations(6–9),anddivideusersintothreegroups. Step3:Calculatetheweightofeachgroupaccording Rsearch engine as alternative hotels and collect basic toEquation(13). informationabout10hotels.Foreachhotel,wecol- lect20users’ratingsandonlinereviews.Weusethese Step4:CalculatehotelevaluationvaluesunderthreeOdataasthetestset.Besides,wecollectuser’sinfor- criteria: price, rating, online review based on the mationandhistoricalcommentrecordsabouthotels methodsproposedinSection3.3. foreachuser,whichareusedasthetrainingset. H Table 1 displays the 10 hotels’ information. The Step5:UsetheTOPSISmethodtorankthealterna- first column is the name of hotel, the second one is tivesaccordingtoSection3.4.BiggervaluesofClose hotel’srankingin10hotels,thethirdcolumnishotel’s Indexareassociatedwithhigherranking. T rankinginallLondon’hotelsbasedonsearchengine, Step6:Recommendthehotelsbasedontheranking the fourth one is hotel’s price, and the sixth one is U listinstep5tothetargetuser. hotel’soverallrating.Table2showstheuser’infor- mation,inwhichthefirstcolumnisuser’sname,the A secondcolumnisthenumberofhotelsthattheuser 4. AcasestudybasedonTripadvisor.com comments, the third column is the number of other users that consider the user’s comments is helpful, Acasestudyisconductedinordertovalidatethe thefourthandfifthcolumnsaretheoriginalandstan- efficiency and applicability of the proposed method dardized trusts of user on Tripadvisor.com. Table 3 one-commercewebsiterecommenderproblems. isasampleaboutthedetailedrecordsofusers’com- mentsonhotels.Sincethereistoomuchtextinonline 4.1. Dataset reviewsandhotelsdescriptions,itisnotshownhere. In Table 3, the first column is user’s ID, the second Inthissection,weusethedatacollectedfromTri- columnisthehoteltheusercommentedon,thethird padvisor.com, which is a travel-sharing site where columnshowsuser’sratingtothehotelandthefourth users can comment on hotels they have lived in, andfifthcolumnsshowthepriceandcityofhotel. attractions they have visited and foods they have eaten. In this case study, we assume that the tar- 4.2. Result get user’s demand is to order a hotel in London by searchingforthekeyword“Londonhotel”.Thenwe Firstly, we identify users’ attributes of interest, randomlyselect10hotelsintherecommendedlistof trust and consumption capacity, respectively. The Y.Huangetal./Acomprehensivemechanismforhotelrecommendationtoachievepersonalizedsearchengine 3741 Table3 Table4 Asampleaboutthedetailrecordsofuserscommentedonhotels Apartofresultsabouttheusersimilarity UserID hotel rating price Cityofhotel UserID S1 S2 S3 S4 3 OakleyHallHotel 4 1393 Hampshire 2 0 0.0714 0 0.02381 3 TheSavoy 5 3579 London 3 0 0.0659 0 0.021978 3 FlemingsMayfair 5 2100 London 4 0 0.1428 0 0.047619 5 KahalaHotel 5 3876 Honolulu 5 0 0.0396 0 0.013228 5 MilestoneHotel 5 3093 London ... ... ... ... ... 198 DisneylandHotel 5 1585 HongKong Table5 198 HotelBoss 3 542 Singapore Theresultsaboutsimilaritygroups UserID G1 G2 G3 Y 1 2,3,9,13,14,19 5,8,11,12,16,18 4,6,7,10,15,17,20 2 1,3,9,13,14,19 5,8,11,12,16,18 4,6,7,10,15,17,20 3 2,3,5,7,14,20 6,8,9,12,13,19 4,10,11,15,16,17,18 P O Table6 Therankingofeachhotelforusers (cid:2)C(cid:2)Use(cid:2)rID(cid:2)(cid:2)(cid:2)Ho(cid:2)te(cid:2)l 1 2 3 4 5 6 7 8 9 10 1 4 7 1 8 6 5 3 2 9 10 2 4 7 1 8 6 5 3 2 9 10 3 4 8 2 7 6 5 1 3 9 10 R 4 4 7 1 8 6 5 3 2 9 10 5 1 5 4 6 8 2 3 7 9 10 28 5 2 7 3 4 1 6 8 9 10 O43 4 7 1.5 6 8 5 3 1.5 9 10 65 4 7 3 8 6 5 1 2 9 10 Fig.4. Thehotel’spricedistribution. 97 3 4 7 2 5 1 6 8 9 10 H 101 1 5 8 6 4 2 3 7 9 10 trustattributehasbeenstandardizedaccordingtothe 121 4 6 1 7 8 5 3 2 9 10 third column in Table 2 and the standardized trust 141 4 8 1 7 6 5 3 2 9 10 161 6 8 1 4 10 7 5 3 2 9 for user is shown in the fourth column in TTable 2. 181 5 10 3 8 9 7 6 4 2 1 Basedontheonlinereviewsandhotelsdescriptions, thekeywordsusedtorepresenttheusers’interestsare U displayedinFig.3,thelargerthesizeofaword,the Then, for a target user, for each item, the users higherthefrequency. who have evaluated the hotel can be divided into The consumption capacity of theAuser can be three groups. A part of the results is displayed in obtainedfromthedatainthefourthcolumninTable3. Table 5. Similarly, the first column is user ID, the We compare the hotel’s price distribution in several second to the fourth column are the users belongs cities, as Fig. 4. It is obvious that the price distri- to similar group, weak similar group and dissimilar butions in different cities are evidently distinct, the group,respectively. linguistictermscorrespondingtothesamepricemay ThenbasedonthemethodproposedinSection3.4, bedifferent. eachusercanobtainapersonalizedrecommendedlist Thenforeachuser,wecalculatethesimilaritywith forhotel.ThedetailsaredisplayinTable6.InTable6, otherusersbasedonEquations(6–9)proposedinSec- rows 2 to 6 are 10 hotels’ rankings for users 1 to 5 tion3.3.ApartoftheresultsisshowninTable4.The whobookedthe41hotel.Foreachoftheremaining first column is the users’ ID, the second to fourth ninehotels,weselectoneuserwhohavecommented columns are these users’ similarities with user 1 in on the hotel, and display the 10 hotels’ ranking for interests, trust, and consumption capacity. The fifth these users as rows 7 to 15. In search engine, the column is the overall similarity. Because user 1 has rankingofthese10hotelsis{1,2,3,4,5,6,7,8,9 nohistoricalrecords,thesimilaritiesbetweenhimand 10}forallusers.TheresultinTable6indicatesthat theothersininterestsandconsumptioncapacityare therankinglistsofthese10hotelsformostusersare zero. different. 3742 Y.Huangetal./Acomprehensivemechanismforhotelrecommendationtoachievepersonalizedsearchengine n o ecisi Pr 1 2 3 4 5 6 7 n model 1 model 2 Y Fig.5. ThePrecisionwiththechangeofn. Fig.6. Theprecisionaftereliminatingtheuserswhoorderedthe 4.3. Discussion top-(cid:2)hotels.P From Tables 4 to 6, for all users, the proposed O method has the ability to calculate their similarities The hotel’s ranking in search engine will remain withotherusers,toidentifytheirsimilargroupsand thesameforalongperiod.Fortheuserwecollected, to provide them a personalized recommended list. theCaccuracy of model 1 for users who booked top The feasibility of proposed hotel recommendation hotels is always higher than that for the user who methodhasbeenillustrated.What’smore,theresults booked bottom hotels. For instance, for users who inTable6illustratethattheproposedmechanismused bookedhotel1,thePrecisionofmodel2isequalto1, for search engine can provide personalized list for RwhilethePrecisionofmodel2foruserswhobooked users. hotel 10 is equal to 0. The Precision is extremely In the next step, we will discuss the accuracy of unstableforuserswithdifferentpreferencesonbook- O theproposedmethod.Inthisstudy,weusetheindex ing hotels. Therefore, we analyze the Precision for of Precision to measure the accuracy. Precision is different users with different preferences. Figure 6 the ratio of the items the user liked to all the rHec- showsthePrecisionaftereliminatingtheuserswho ommended items in the recommended items [48]. orderedthetop-(cid:2)hotels(where(cid:2)isathreshold).For Because in the test set, only one hotel per user has example,when(cid:2)=1,itmeansthatwejustcalculate beenordered,thereisafine-turningforthedeTfinition thePrecisionsforuserswhobookedhotels2tohotel of Precision. In this study, we consider the recom- 10. The results of Model 1(a) and model 2(a) show mendation is successful when the rankinUg of hotel thePrecisionsformodel1andmodel2respectively userorderedistop-n(wherenisthethreshold)among when the value ofn is 5. The results of Model 1(b) allhotels.Therefore,thePrecisionisthepercentage andmodel2(b)showthePrecisionsformodel1and ofprovidingsuccessfulrecommendatioAntoallusers. model 2 respectively when the value of n is 3. The ThecalculationofPrecisionisshownas: resultsofmodel1(c)andmodel2(c)showthePreci- sionsformodel1andmodel2respectivelywhenthe N(s) Precision= , (22) valueofnis1.ConsistentwiththeresultsinFig.5, N the Precisions of the proposed method are always whereN(s)isthenumberofprovidingsuccessfulrec- greaterthanthoseofsearchengineusedinTripadvi- ommendation,andNisthetotalnumberofproviding sor.comwhateverthevalueofnor(cid:2)is.InFig.6,we recommendation. canobservethattheprecisionsarezerofortheusers WecalculatethePrecisionoftheproposedmethod whopreferthebottom-5hotelwhateverthevalueofn (model1)andsearchenginecurrentlyusedinTripad- is.However,theproposedmethodhasstableperfor- visor.com(model2).Figure5exhibitsthePrecisions mance.Theresultfurtherprovesthedeficiencyofthe withthechangeofnfortwomethods. hotelrecommendedlistgeneratedbysearchengine, From Fig. 5, it is obvious that the Precisions of andindicatesthattheproposedmethodcanprovide theproposedmethodarealwaysgreaterthanthatof usersaccurateandpersonalizedhotelrecommended model2.Thatistosay,theaccuracyoftheproposed list. methodisimprovedcomparingtothesearchengine To go a step further, we discuss the accuracy for currentlyusedinTripadvisor.com. cold start users. The Precisions are presented as (in