Dialectic: Enhancing Text Input Fields with Automatic Feedback to Improve Social Content Writing Quality

Hamed Nilforoshan, James Sands, Kevin Lin, Rahul Khanna, Eugene Wu
Columbia University
{hn2284, js4597, kl2806, rk2658}@columbia.edu, [email protected]

Abstract

Modern social media relies on high quality user generated writing such as reviews, explanations, and answers. In contrast to standard validation to provide feedback for structured inputs (e.g., dates, email addresses), it is difficult to provide timely, high quality, customized feedback for free-form text input. While existing solutions based on crowdsourced feedback (e.g., upvotes and comments) can eventually produce high quality feedback, they suffer from high latency and costs, whereas fully automated approaches are limited to syntactic feedback that does not address text content. We introduce Dialectic, an end-to-end extensible system that simplifies the process of creating, customizing, and deploying content-specific feedback for free-text inputs. Our main observation is that many services already have a corpus of crowdsourced feedback that can be used to bootstrap a feedback system. Dialectic initializes with a corpus of annotated free-form text, automatically segments input text, and helps developers rapidly add domain-specific document quality features as well as content-specific feedback generation functions to provide targeted feedback to user inputs. Our user study shows that Dialectic can be used to create a feedback interface that produces an average of 14.4% quality improvement of product review text, over 3x better than a state-of-the-art feedback system.

Introduction

Modern social media relies on user generated text: product websites (e.g., Amazon) rely on user submitted product reviews to help users make purchasing decisions; Q&A services (e.g., StackOverflow, reddit, Quora) rely on the availability of high quality user generated questions and answers. Their success is directly related to the quality of the content shown to users, and a key challenge is to manage and improve the quality of the user generated text content that they serve.

The primary quality control methods are filtering to remove/hide low quality or spam content (Spirin and Han 2012) and ranking to prioritize the user-generated content to serve by using user-based ratings (Tang, Hu, and Liu 2013; Guy 2015) or automatic rankings (Agichtein et al. 2008; Wang et al. 2013; Yang and Amatriain 2016; Siersdorfer et al. 2010). These approaches fundamentally assume that the corpus of user contributions is sufficiently large; however, the long tail of entities (e.g., less popular products, esoteric questions) will not always have enough user generated content to contain high quality documents to serve. In these cases, it is more important to improve the quality of every user contribution.

To this end, incentives such as badges and enhanced user status (Wang et al. 2013; Bosu et al. 2013) encourage users to provide higher quality contributions, and seek to attract and retain high quality contributors. This can even include material goods: Yelp Elite offers top contributors access to exclusive events with free food and drink (Yelp 2016a). However, incentives are general to the overall service rather than to any specific user input (e.g., a product review or answer), and may still fail to address the quality of long tail content.

Rapid feedback during the writing process improves content quality (Kulik and Kulik 1988). Recent systems use crowdsourcing that trains peers to generate customized feedback for student writing assignments (Kulkarni, Bernstein, and Klemmer 2015); however, the feedback latency is nearly 20 minutes, too long for many online contexts. Fully automated systems can be instantaneous, but are limited to grammar, misspellings, and other syntactic errors (Madnani and Cahill 2014; Microsoft 2016; Google 2016).

Learning-based methods are a promising middle ground (Krause 2015b; Biran and McKeown 2014; Krause, Perer, and Ng 2016) that identify feature values indicative of low quality text and translate them into feedback to help improve the content. However, two factors limit the feedback quality of existing approaches: feedback addresses
features in isolation without taking interactions between different features into account, and is often limited to broad suggestions at the full document granularity (Krause 2015b; Boomerang 2016; FoxType 2016). In contrast, text-quality prediction work highlights the value of accounting for multi-feature interactions (Weimer, Gurevych, and Mühlhäuser 2007; Ghose and Ipeirotis 2011), and feedback psychology studies show that localized, targeted feedback is crucial (Nelson and Schunn 2007), but these benefits have yet to be integrated into a feedback generation system. Finally, despite research in new predictive features and feedback generation, it is simply difficult to build, customize, and deploy an end-to-end feedback system to test new ideas, and to sustain such systems in light of new findings in these areas.

Figure 1: Example of Dialectic feedback interface. (a) Non-annotated review text. (b) Document feedback is shown below the text. (c) Hovering over highlights shows segment specific feedback.

To this end, Dialectic is a system to easily build and deploy new writing feedback systems. It automates common tasks such as text segmentation, training, and deployment, so developers can focus on domain-specific tasks such as creating features and generating feedback explanation text. Although many components are well known, their combination into a user-facing extensible system (Figure 1) does not exist. In addition, our user study shows that combining segmentation and our novel perturbation-based feedback generates more effective writing feedback, compared to applying either technique separately in alternative state-of-the-art systems.

Developers first provide a corpus of labeled user generated text (e.g., helpfulness votes for product reviews, or up/down votes for comments). Offline, Dialectic automatically segments the text documents based on a configurable segmentation method (by default, we segment by topic), and trains document and segment-level quality prediction models. To generate feedback for a text input, Dialectic uses the models to find low-quality text, and uses a novel perturbation-based technique to identify combinations of features that affect the writing quality. Developer-provided explanation functions map these features into feedback text. As new features are identified in the literature, they can easily be incorporated. In our evaluation, Dialectic generated feedback for multi-paragraph text inputs (reviews ranging from 1-5 paragraphs) in <1 second.

In summary, our main contributions are:

• The design and implementation of Dialectic, an extensible text feedback system that combines text segmentation, classification, and explanation to automate both broad document-level and granular segment-level feedback on user generated writing. The system can be installed using the python package manager pip (anonymized for submission).

• A novel perturbation-based technique that identifies combinations of features that, if changed, will most improve the predicted quality of input text. We formalize this problem, and propose an efficient heuristic to search the exponential solution space that empirically produces helpful feedback.

• An end-to-end evaluation that trains Dialectic on Amazon product reviews using existing features from the literature, and a crowd-sourced user study that examines how automated feedback improves product review writing. Combining our perturbation technique with segment-specific feedback improves average review quality by 14.4%, over 3x more than a baseline based on state-of-the-art feedback generation.
Related Work

Our research focuses on a system for providing targeted writing feedback to improve user text contributions on websites that rely on user-generated text content. The success of such websites partially depends on presenting high quality content and user contributions (Archak, Ghose, and Ipeirotis 2011; Li, Ghose, and Ipeirotis 2011; Ghose, Ipeirotis, and Sundararajan 2007; Ghose and Ipeirotis 2009). As such, the related work falls into several categories: post-hoc methods that maximize the quality of surfaced content through filtering and ranking, indirect mechanisms to improve user generated writing quality, and direct feedback-oriented mechanisms to improve writing quality.

Post-hoc Approaches: Post-hoc approaches filter poor content (Spirin and Han 2012) such as spam; sort and surface higher quality content (Tang, Hu, and Liu 2013; Guy 2015; Agichtein et al. 2008) such as product reviews (Mudambi and Schuff 2010), answers to user comments (Wang et al. 2013; Yang and Amatriain 2016), or forum comments (Siersdorfer et al. 2010); or edit user reviews for clarification or grammatical purposes (Ipeirotis 2016). These approaches assume a large corpus that contains high quality content for every topic (e.g., product or question). In reality, there is often a long tail of topics without sufficient content for such approaches to be effective (Saito 2016; McAuley, Pandey, and Leskovec 2015). For such cases, improving quality upstream, during the user input process, may be more effective.

Indirect Mechanisms: Indirect methods such as community standards and guidelines (Nov 2007; Bakshy, Karrer, and Adamic 2009; Amazon 2016) help clarify quality standards, while up-votes and ratings provide social incentives (Muchnik, Aral, and Taylor 2013; Bosu et al. 2013). Incentive mechanisms such as badges, scores (Ghosh 2012; Deterding et al. 2011), status (Zappos 2016), or even money (Kim 2012; Ipeirotis 2016) have also been used to retain good contributors. These methods focus more on finding good contributors and lack content-specific feedback (e.g., discuss camera quality for a phone).

Direct Writing Feedback: We focus on feedback interfaces to improve text quality during the writing process. Crowd-based feedback has been shown to improve writing quality but can take 20 minutes to generate feedback (Kulkarni, Bernstein, and Klemmer 2015)—existing research emphasizes the importance and benefits of immediate writing feedback (Kulik and Kulik 1988) and suggests the value of automated approaches. Although deep semantic feedback is automated in code development environments (RoyChoudhury, Yin, and Fox 2016; Singh, Gulwani, and Solar-Lezama 2012; Rivers and Koedinger 2014), the same techniques cannot be generalized from highly constrained code grammars to free-form human text.
System Design

A Segmentation Oriented Approach: Dialectic is an automated feedback generation system that augments traditional document-level feedback with localized segment-level feedback. Existing research on writing feedback shows that localized feedback is more helpful because it increases the likelihood that feedback recipients will identify and improve the specific issue (Nelson and Schunn 2007). In online writing settings, feedback in the form of comments is more effective at motivating revision than a standardized rubric (Kulkarni, Bernstein, and Klemmer 2015). Thus, in order to provide targeted and free-form feedback to users, Dialectic first segments the input document and then suggests changes at a segment level, in addition to providing general document-level feedback. Figure 2 summarizes the overall data flow to generate segment-level feedback. Note that classification and feedback occur for both low quality segments and the entire document; however, our discussion will focus on the benefits and implementation of segment-level feedback.

Figure 2: Dialectic splits user input into coherent segments; estimates the quality of each segment and the text as a whole; and generates and shows suggested improvements to the user. (Pipeline: 1. user input, 2. segment by topic, 3. estimate quality, 4. targeted feedback.)

Dialectic uses a two-step automated approach to generate feedback for the entire document's text as well as individual document segments. It first uses models to identify low quality text, and then generates targeted feedback for that text. The first step is motivated by automatic essay graders, which use predictive models to assign overall quality scores to written content (Valenti, Neri, and Cucchiarelli 2003; Farra, Somasundaran, and Burstein 2015; Attali and Burstein 2004; Madnani and Cahill 2014). Numerous predictive features have been studied across text domains such as code reviews (Krause 2015a), YouTube comments (Siersdorfer et al. 2010), Amazon product reviews (Ghose and Ipeirotis 2011; Kim et al. 2006), movie reviews (Liu et al. 2008), and reddit comments (Tan et al. 2016). Rather than innovate on features, we survey and include a broad range of default features from prior work in Dialectic, which we describe in the Predicting Low-quality Text subsection of the Implementation section.

The second step is generating feedback that suggests changes to most improve the predicted text quality. A common method is to identify individual features that are outliers from "typical" feature values in high quality documents, and map these outlier features to pre-written feedback text (Krause, Perer, and Ng 2016; Boomerang 2016; Biran and McKeown 2014). However, existing techniques use linear models that both ignore multi-feature interactions, and are often less predictive of quality than Random Forests (Weimer, Gurevych, and Mühlhäuser 2007; Ghose and Ipeirotis 2011), which are able to account for multi-feature interactions. Consider a product review containing a long, angry diatribe about customer service. By studying sentiment and length features in isolation, existing systems may suggest reducing length and decreasing emotion. However, such systems would not recognize that the review can be most improved by simultaneously reducing the emotion in the text and increasing the length by including more details about the product.

There is related work that seeks to explain a model's predictions by, e.g., generating a sparse, more understandable model (Ribeiro, Singh, and Guestrin 2016) or identifying subsets of the input sufficient to maintain a model's prediction (Lei, Barzilay, and Jaakkola 2016). These explanation approaches are complementary to our goal of suggesting improvements to the text quality. Our work focuses on white-box random forest models and explores novel ways to change the model's prediction by perturbing the input feature vector. To the best of our knowledge, we present the first attempt to generate suggestions on how to improve the utility of a specific document by examining the global space of possible alternative classifications in a random forest model.

Architecture Overview: Figure 3 depicts the Dialectic system architecture, which consists of offline and online components. The offline components (blue arrows) take as input a corpus of training data in the form of user generated text documents and their labels–for instance, Amazon product reviews may be labelled by the ratio of "helpful" and "unhelpful" votes. The Segmenter first splits each document into segments. The Model Generator then trains a collection of classification models to predict the quality of a user's overall text submission as well as its constituent segments; these are cached in the Model Store. The online components (green arrows) send the contents of a text input widget, along with an optional corpus name, to the web server. Dialectic uses the models in the Model Store to identify whether the entire document and/or the segments generated by the Segmenter are low quality. The Feedback Generator then constructs feedback explanations for the low quality text, which are returned and displayed in the text widget.

Figure 3: Dialectic online and offline architecture. Blue arrows depict the offline training and storage process. Green arrows depict the online execution flow when a user submits.
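For concreteness, the following minimal Python sketch mirrors the online flow of Figure 3. All names here (Segmenter, model_store, predict_quality, explain, the 0.5 threshold) are illustrative placeholders under our reading of the architecture, not Dialectic's actual API.

```python
# Illustrative sketch of the online request flow in Figure 3; class and
# method names are hypothetical stand-ins, not Dialectic's real interface.
from dataclasses import dataclass

@dataclass
class Feedback:
    span: tuple  # (start, end) character offsets of the flagged text
    text: str    # explanation text shown to the writer

def generate_feedback(document, segmenter, model_store, explainer, threshold=0.5):
    """Score the whole document and each segment; explain low-quality text."""
    doc_model = model_store.load("document")
    seg_model = model_store.load("segment")
    feedback = []

    # Document-level pass: one prediction over the full text.
    if doc_model.predict_quality(document) < threshold:
        feedback.append(Feedback((0, len(document)),
                                 explainer.explain(doc_model, document)))

    # Segment-level pass: flag and explain each low-quality segment.
    for start, end in segmenter.segment(document):
        segment = document[start:end]
        if seg_model.predict_quality(segment) < threshold:
            feedback.append(Feedback((start, end),
                                     explainer.explain(seg_model, segment)))
    return feedback
```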
Implementation

The input to Dialectic is a labeled corpus of user-generated text documents (e.g., product reviews and their helpfulness ratings), which the system uses to train document and segment-level prediction models. The models are used to generate feedback for new user input text. To generate feedback, we assume a utility function U : N → R over the classification labels and seek to identify changes to the user input that will maximize the expected utility function output. For instance, our evaluations use binary helpful/unhelpful labels with utilities 1 and 0, respectively. We also assume that the user has provided E, a set of domain-specific explanation functions e : F × R^n → text, which take as input a set of features and their values for a given text input, and return text feedback (see the Explanation Functions subsection of the Feedback Generator section). In short, Dialectic identifies features of the input document (or its segments) that, if changed, will most improve its utility, and uses the explanation functions to transform these features into feedback text.

Segmenter

The segmenter supports any segmentation algorithm that splits multi-paragraph text into an array of text segments. Contributor rubrics across many social media services are structured around topics (Yelp 2016b; Amazon 2016; Wikipedia 2016), and psychology research suggests that mentally processing the topical hierarchy of text is fundamental to the reading process (Hyönä, Lorch Jr, and Kaakinen 2002). Thus, by default we segment and critique documents at topic-level units.

To this end, we use a technique called TopicTiling (Riedl and Biemann 2012), an extension of TextTiling (Hearst 1997), that uses a sliding window approach to compute the LDA topic distribution within each window and creates a new segment when the distribution changes beyond a threshold. TopicTiling outperformed other topic segmenters (Misra et al. 2011; Ji-Wei Wu 2011) in terms of their WindowDiff score (Pevzner and Hearst 2002) on our hand-segmented test corpora.

Developers can easily add custom segmentation algorithms and provide a small testing corpus of pre-segmented documents. Dialectic then benchmarks its library of segmentation algorithms and recommends the one with the highest WindowDiff score.
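To illustrate the sliding-window idea (a simplified sketch, not TopicTiling itself), the code below opens a new segment wherever the topic distributions of adjacent windows diverge. Here topic_dist is an assumed callable (e.g., a wrapper around a trained LDA model) that maps a list of sentences to a topic-probability vector, and the window size and threshold values are illustrative.

```python
# Simplified TopicTiling-style segmentation sketch; `topic_dist` is an
# assumed wrapper around a trained LDA model, not part of Dialectic's API.
import numpy as np

def segment_by_topic(sentences, topic_dist, window=3, threshold=0.4):
    """Start a new segment where the cosine distance between the topic
    distributions of adjacent sliding windows exceeds `threshold`."""
    boundaries = [0]
    for i in range(window, len(sentences) - window + 1):
        left = topic_dist(sentences[i - window:i])
        right = topic_dist(sentences[i:i + window])
        cos = np.dot(left, right) / (
            np.linalg.norm(left) * np.linalg.norm(right) + 1e-9)
        # Require a minimum gap so one topic shift yields one boundary.
        if 1.0 - cos > threshold and i - boundaries[-1] >= window:
            boundaries.append(i)
    boundaries.append(len(sentences))
    return [sentences[s:e] for s, e in zip(boundaries, boundaries[1:])]
```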
Predicting Low-quality Text

Dialectic reduces the task of supporting new domains to adding new domain-specific feature extraction methods[1] and mapping features predictive of low quality text to explanation text (next subsection). Dialectic implements a library of state-of-the-art features to help bootstrap new deployments. From a survey of existing literature, we identify and implement a diverse library of 47 features summarized in Table 1 (Krause 2015a; Liu et al. 2007; Tan et al. 2016; Siersdorfer et al. 2010; Ghose and Ipeirotis 2011). The primary features that we do not include are those that rely on application metadata, such as the author's history, which are predictive of quality but not related to the writing content.

[1] Functions that transform text to a fixed-size numeric vector.

Table 1: Summary of the default feature library (full list will be available in a technical report).

Category (# features): Description
Informativeness (8): mined jargon word and named entity stats (Minqing Hu 2004); length measures (word, sentence, etc. counts)
Topic (5): LDA topic distribution and top topics (Blei, Ng, and Jordan 2003); entropy across the topic distribution
Subjectivity (15): opinion sentence distribution stats (Minqing Hu 2004); valence, polarity, and subjectivity scores and their distribution across sentences (Ghose and Ipeirotis 2011; Gilbert 2014; Loria 2014); % uppercase characters; first person usage; adjectives
Readability (15): spelling errors (Kelly 2011); ARI, Gunning index, Coleman-Liau index, Flesch Reading tests, SMOG; punctuation; parts of speech distribution; lexical diversity measures
Similarity (4): various TF-IDF and top parts-of-speech comparisons with samples of low and high utility documents

Dialectic uses the feature library to build two models: one to predict the quality of the individual segments generated by the Segmenter, and one for the full document. Following Ghose et al. (Ghose and Ipeirotis 2011), we use a random forest classifier, and we exploit its structure for our perturbation-based feedback explanation technique. Given the labeled corpus of documents, Dialectic trains both the segment and document-level classifiers. For each model, we run recursive feature elimination to select the optimal subset of features for the segment and document models (Guyon et al. 2002).

Our model performs competitively with prior work, which predicts the quality of Amazon DVD, AV player, and Camera reviews with 83% accuracy when not using metadata features (Ghose and Ipeirotis 2011). Dialectic's default model on the same setup predicts at 85% accuracy—the slight improvement is due to the additional features in the topic and similarity categories from other literature (Table 1). We validated the model on reddit comments from the AskScience subreddit[2] and predicted comment helpfulness on an evenly balanced sample with 80% accuracy[3].

[2] https://www.reddit.com/r/askscience/
[3] We define >1 net up-votes as helpful and ≤1 as unhelpful.

One challenge is training the segment-level classifier because most corpora are labeled at the document level. In the Dialectic Setup section of the evaluation, we show that an effective default is to train the segment classifier using document labels as a proxy (i.e., a segment from an 80% helpful document is assigned an 80% helpful training label).
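The sketch below shows what this training step might look like with scikit-learn, under stated assumptions: featurize stands in for the Table 1 feature extractors, labels follow the helpful-ratio cutoff used in our evaluation, and RFECV performs the recursive feature elimination. It is a sketch, not Dialectic's exact training code; the same routine can train the segment-level model by passing segments paired with their parent document's label.

```python
# Sketch of the quality-model training step; `featurize` is a placeholder
# for the Table 1 feature extractors and is supplied by the caller.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

def train_quality_model(texts, helpful_ratios, featurize, cutoff=0.6):
    X = np.array([featurize(t) for t in texts])           # 47-dim vectors
    y = (np.array(helpful_ratios) >= cutoff).astype(int)  # 1 = high quality

    # Recursive feature elimination (Guyon et al. 2002) wrapped around a
    # random forest, which exposes feature_importances_ for elimination.
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    selector = RFECV(forest, step=1, cv=5)
    selector.fit(X, y)
    return selector  # predicts using the selected feature subset
```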
Feedback Generator

Once the models identify low-quality text (a segment or the whole document), they must generate specific feedback to help users improve their writing. Dialectic first uses a novel perturbation-based analysis inspired by (Krause, Perer, and Ng 2016) to estimate the amount that the document's feature vector needs to be perturbed in order to be classified as high quality. This is used to assign each feature an "impact score" that represents the amount that it can improve the quality score when perturbed simultaneously with other features. These scores are then used to select the explanation functions that generate document/segment-specific feedback text.

Perturbation-based Feature Impact: Our perturbation-based analysis extends prior work by Krause et al. (Krause, Perer, and Ng 2016), which perturbs each feature in isolation to estimate the prediction's sensitivity to each feature. However, this ignores complex patterns in writing quality that emerge from multi-feature interactions. For this reason, we use non-linear random forest models that are capable of modeling these complex feature interactions (Weimer, Gurevych, and Mühlhäuser 2007; Ghose and Ipeirotis 2011). Dialectic estimates the effect that perturbing sets of features has on model predictions for a document's feature vector, which can be done efficiently by exploiting the structure of random forest models. Below, we formalize the model-independent problem of feedback generation, and describe a heuristic solution based on random forests. The Evaluation section compares this approach to existing techniques that examine features in isolation (Krause 2015b).

Problem Setup: Let F be the set of n model features, and f_i denote the ith feature. Let d ∈ R^n be a data point (text document or segment) represented as a feature vector, where d_i corresponds to the value of f_i. For instance, F may be the text features described above, and a data point corresponds to the extracted text feature vector. A model M : R^n → N classifies data points as M(d) ∈ N, and a utility function U : N → R maps a label to a utility score. A perturbation p ∈ R^n is a vector that modifies a data point, for a set of features: p_i ∈ R \ {0} if f_i is a perturbed feature in the set, otherwise p_i = 0.

Our goal is to identify feature subsets of the test data point d that, if perturbed, will most improve d's utility[4]. To do so, we first define the impact I(d, p) for an individual perturbation p as the amount that it improves the utility function, discounted by the amount of the perturbation ∆(p) and the model's prediction confidence C(d+p) ∈ [0, 1]. S_i^d computes the overall score for feature f_i based on the impact of all perturbations involving f_i:

  I(d, p) = [U(M(d+p)) − U(M(d))] / ∆(p) × C(d+p)
  ∆(p) = Σ_{i ∈ [0,n]} dist(p_i)
  S_i^d = Σ_{p ∈ R^n, p_i ≠ 0} I(d, p)

By default, Dialectic uses the hamming distance as dist() in the ∆ function, and computes C as the percentage of trees that vote for the majority label.

[4] No feedback is needed if the data point already has high utility.

Problem 1. Given data point d, compute S_i^d for all features f_i ∈ F.

Heuristic Solution: The space of solutions for Problem 1 is exponential in the number of features used by the model, because the cardinality of the power set |P(F)| = 2^|F|, meaning that for n features there are 2^n possible sets of perturbations to naively explore. We instead present a heuristic solution whose complexity is linear in the number of paths in the random forest model. The main idea is to scan each tree in the random forest and compute perturbations and scores local to the tree.

Let D = {d_1, ..., d_m} be the training dataset and Y = {y_1, ..., y_m} be their labels. The random forest model M = {T_1, ..., T_t} is composed of a set of trees. A tree T_i is composed of a set of k_i decision paths q_i^1, ..., q_i^{k_i}; each path q_i^j matches a subset of the training dataset D_i^j ⊆ D, and its vote v_i^j is the majority label in D_i^j. Thus, the output of T_i(d) is the vote of the path that matches d, and the output of the random forest M(d) is the majority vote of its trees.

Let minp(d, q_i^j) return the minimum perturbation p (based on its L2 norm) such that d matches path q_i^j:

  minp(d, q_i^j) = argmin_{p ∈ R^n} |p|_2  s.t.  q_i^j matches d + p

Rather than examining all possible perturbations, our heuristic to compute S_i^d restricts the set of perturbations to those induced by the decision paths in the trees that increase d's utility. The impact function I() is identical, however it takes a path q_i^j as input and internally computes the minimum perturbation minp(d, q_i^j). Finally, we compute the confidence C(d) as the fraction of samples in D_i^j whose labels y_k match the path's prediction v_i^j:

  S_i^d = Σ_{T_i ∈ M} Σ_{q_i^j ∈ T_i : U(v_i^j) > U(M(d))} I(d, q_i^j)
  I(d, q_i^j) = [U(v_i^j) − U(M(d))] / ∆(minp(d, q_i^j)) × C(d + minp(d, q_i^j))
  C(d) = |{d_k ∈ D_i^j | y_k = v_i^j}| / |D_i^j|

Our implementation indexes all paths in the random forest by their utility. Given d and its predicted utility U(M(d)), we retrieve and scan the paths with higher utility. For each scanned path q, we compute the change in the utility function, and discount its value by the minimum perturbation p as well as the path's confidence. Finally, we add this value to the score of all features perturbed in p. The final scores are used to select from the library of explanation functions. Performing the indexing process during the offline phase allows us to compute these scores on the order of 1/10 of a second.
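A compact sketch of this path-scanning heuristic over a scikit-learn random forest follows. It assumes a fitted binary RandomForestClassifier, a dict U mapping labels to utility scores, and the default hamming-style ∆ (the number of perturbed features); it is a simplified illustration of the idea, not Dialectic's implementation.

```python
# Sketch of the path-scanning heuristic; assumes a fitted binary
# sklearn RandomForestClassifier `forest` and U = {label: utility}.
import numpy as np

def leaf_paths(tree):
    """Yield (leaf_id, constraints) pairs, where constraints maps each
    feature on the root-to-leaf path to its feasible (low, high] interval."""
    stack = [(0, {})]
    while stack:
        node, cons = stack.pop()
        if tree.children_left[node] == -1:  # leaf node
            yield node, cons
            continue
        f, t = tree.feature[node], tree.threshold[node]
        lo, hi = cons.get(f, (-np.inf, np.inf))
        stack.append((tree.children_left[node], {**cons, f: (lo, min(hi, t))}))
        stack.append((tree.children_right[node], {**cons, f: (max(lo, t), hi)}))

def feature_impact_scores(forest, d, U, eps=1e-6):
    """Compute S_i^d by scanning every decision path whose vote has higher
    utility than the current prediction U(M(d))."""
    scores = np.zeros(len(d))
    base = U[int(forest.predict([d])[0])]
    for estimator in forest.estimators_:
        tree = estimator.tree_
        for leaf, cons in leaf_paths(tree):
            counts = tree.value[leaf][0]   # training-sample mass per label
            vote = int(np.argmax(counts))
            if U[vote] <= base:            # keep only quality-raising paths
                continue
            confidence = counts[vote] / counts.sum()
            # Minimal per-feature moves that make d satisfy the path; under
            # the hamming default only the *set* of perturbed features matters.
            perturbed = {f: (lo + eps if d[f] <= lo else hi)
                         for f, (lo, hi) in cons.items()
                         if not (lo < d[f] <= hi)}
            if not perturbed:              # d already matches this path
                continue
            impact = (U[vote] - base) / len(perturbed) * confidence
            for f in perturbed:            # credit every perturbed feature
                scores[f] += impact
    return scores
```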
Explanation Functions: We assume that the developer has implemented explanation functions e : F × R^n → text. An explanation function has two parts: the text-generation method and the mapped features. The text-generation method takes as input the feature vector, the document text, its segments, and the segment id if appropriate, and returns the feedback text. The mapped features (⊆ F) are a list of features for which a high average perturbation impact score indicates that the explanation should be executed. For example, features relating to readability (i.e., ARI, Flesch Reading tests, misspellings) could be mapped to an explanation function that asks writers to revise their text to be clearer. Developers can then use prior literature to manually map features to explanations (we demonstrate this process in the Evaluation section). In future work, we hope to learn mappings, provided a training set of low-quality documents labeled with relevant explanation functions. In practice, developers extend an Explainer class, implement a __call__() method that performs the text generation, and implement a features() method to return the mapped features. The following abbreviated snippet sketches the Not Enough Detail function in our evaluation, which recommends product features to include in the review based on the text's topic distribution and the number of product features detected.
class NotEnoughDetail(Explainer):
    def features(self):
        # Features whose high impact scores trigger this explanation.
        return ['topics', 'featureCnt', 'length']

    def __call__(self, feats, text, segs, seg_id):
        # Too few product features mentioned: suggest relevant, unmentioned
        # product features based on the text's topic distribution.
        if feats['featureCnt'] < 10:
            return suggest_new_prod_feats(feats['topics'], text)

Selecting Explanation Functions: We first adjust feature impact scores to reduce bias: features closer to the root will happen to occur in more feature sets and thus have artificially higher scores. We adjust each feature's impact score S_i^d by computing a sample mean u_i and standard deviation σ_i for that score from a sample of low-utility documents, and then compute the normalized score Snorm_i^d = (S_i^d − u_i) / σ_i. We then score explanation functions by the average impact scores of their mapped features. This can be represented as a series of fast matrix operations: let s ∈ R^n where s_i = Snorm_i^d, and construct a matrix A ∈ R^{m×n} for the m explanation functions (A_ji = 1 if explanation j is mapped to feature i, otherwise 0). Compute e = As ∈ R^m; then e_j / Σ_{i=1}^n A_ji is the average impact score of all features mapped to the jth explanation function. We select and return the concatenated text outputs of the explanation functions with score > V (a threshold). We manually set V in our evaluation, though it may be estimated by sampling the distribution of explanation scores over low-utility documents and using a percentile (i.e., V = the 80th percentile of scores).
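As a concrete (assumed) rendering of these matrix operations in numpy, with illustrative argument names:

```python
# Sketch of explanation-function scoring; `mean`/`std` are the per-feature
# statistics computed from a sample of low-utility documents.
import numpy as np

def select_explanations(S, mean, std, A, explainers, V):
    """S: raw impact scores (n,); A: (m, n) 0/1 matrix with A[j, i] = 1 if
    explanation j is mapped to feature i. Returns the explanations whose
    average normalized impact exceeds the threshold V."""
    s_norm = (S - mean) / std         # de-bias features near the root
    e = (A @ s_norm) / A.sum(axis=1)  # average score per explanation
    return [explainers[j] for j in np.where(e > V)[0]]
```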
Feedback Interface and Usage

Our default interface can be seen in Figure 1. Research shows the efficacy of highlighting to guide users (Antwarg, Lavie, and Rokach 2012) and the importance of providing feedback immediately after writing (Anderson 2004); we therefore highlight low-quality segments, and only show feedback after the user finishes writing and pushes Get Feedback.

Dialectic is designed to require minimal front-end changes. Dialectic provides a javascript library that automatically augments text input elements with feedback support (Figure 1). After running the pipeline on their corpus, users simply include the Dialectic javascript library and annotate the textbox and form submission HTML elements with special attributes prefixed by dialectic-. The library also provides developers with a Javascript API to customize the design and interactions of the feedback interfaces. Dialectic is available as a pip package (anonymized for submission).

Evaluation

We evaluate Dialectic on an existing corpus of Amazon product reviews (McAuley, Pandey, and Leskovec 2015) through a crowdsourced Mechanical Turk study. We compare against a state-of-the-art feedback system (Krause 2015b) along two dimensions—granularity and explanation selection (see Experiment Design).

Dialectic Setup

We first describe how the Dialectic models and explanation functions were configured to run these experiments. The main challenge is that reviews only contain document-level training labels, and we must also train a segment-level classifier.

Model Training: We trained our models using our default library of 47 features on the Amazon review corpus, and used a cut-off of ≥60% "helpful" votes for high quality reviews (positive labels) and the rest as low quality (negative labels), based on (Archak, Ghose, and Ipeirotis 2011). The document-level classifier (acc=85%, prec=81.2%, recall=87.9%) was competitive with existing literature (Ghose and Ipeirotis 2011; Archak, Ghose, and Ipeirotis 2011) on a balanced sample of 500 reviews.

For segment-level classifiers, we hypothesized that document quality is sufficiently correlated with segment quality that a document's label can be used as the label of its segments for training purposes. To validate this, we first trained a segment classifier using this assumption. We then ran a crowdsourced study to label 500 balanced segments (250 each from helpful/unhelpful reviews)[5]. Despite the shorter input text, the classifier performed reasonably well at predicting the manual segment labels (acc=73.6%, prec=76.3%, recall=69.7%). These results suggest the efficacy of our simple segment-level classifier, though more studies are needed to fully evaluate this hypothesis across other text domains.

[5] For space constraints, we simply report the results of this study.

Explanation Functions: We set up Dialectic to generate explanations for a broad range of reasons why a review may be unhelpful. Prior work found that 75% of reasons for unhelpful reviews were covered by (in priority order) overly emotional/biased opinions, lack of information/not enough detail, irrelevant comments, and poor writing style (Connors, Mudambi, and Schuff 2011). These reasons naturally map to 4 of the 5 Dialectic feature categories: Subjectivity, Informativeness, Topic, and Readability, respectively (Table 1). We created one explanation function for each category (Table 2) and mapped each function manually to the model features belonging to that category. For instance, a high perturbation-based ranking for a topic-related feature would be mapped to the "off-topic" explanation function. In future work, we hope to learn feature-to-explanation-function mappings from developer-provided examples.

Table 2: Summary of the 4 product review explanation functions. Reasons are from (Connors, Mudambi, and Schuff 2011).

Feature category (#): Unhelpfulness reasons; Explanation function
Informativeness (8): Lack of Information, Not Enough Detail, Too Short; Infer input topics and suggest relevant, unmentioned product features (Minqing Hu 2004) (uses a cached list of mined product features and associated topics from training)
Topic (5): Irrelevant Comments; Infer topics of the input text, and suggest several topics that have a low proportion within this distribution and correlate highly with helpful reviews
Subjectivity (15): Overly Emotional/Biased; Identify words that most contribute to the input text's sentiment score (Gilbert 2014), and recommend revising to be more balanced
Readability (15): Poor Writing Style; Ask the author to revise the writing of the segment or overall document to be clearer

This process illustrates how Dialectic can be extended to new domains—in many cases, there is substantial literature that 1) identifies reasons for low document quality, 2) constructs features based on these reasons, and 3) suggests ways to structure and provide feedback to the writer. One of our contributions is to provide an end-to-end system so that the above types of literature can be easily added in the form of features and explanation functions. Dialectic then uses the features to train models to detect low quality text, and the perturbation-based technique to select and generate the explanations that are suitable for each low quality document, taking into account multi-feature interactions. Our subsequent field experiment demonstrates the improvements in user-generated text that feedback from Dialectic achieves.

Experiment Design

We evaluated Dialectic through a crowdsourced study on Mechanical Turk. We compared four feedback systems that varied along two dimensions—granularity varies the feedback to be at the document level (Doc), or at the document and segment level (Seg), while explanation selection compares the technique from (Krause 2015b) (Krause) with Dialectic's perturbation-based explanation selection (Perturb). This results in a 2x2 between-subjects design. Dialectic denotes the segment-level perturbation-based system.

Krause is based on (Krause 2015b), which was shown to outperform showing writers static explanations of important components of a helpful review (similar to a rubric), in the context of university students writing code reviews for their peers. Krause models the primary qualities of a helpful review (specificity, subjectivity, etc.) using a set of explainable features (i.e., document length, emotion). It computes the mean and standard deviation of these features across a corpus of high quality documents. If a user-generated document's feature value is more than 1.5 SD from the mean, its explanation function is executed. Features are mapped to explanation functions that return static text about how to improve that feature. Our Krause condition implemented the features from Krause as well as domain-specific features that were assigned a high feature weight by Dialectic's random forest model (# of product features/jargon and Coleman-Liau index, to predict specificity and readability respectively). We supplemented the features of Krause because the purpose of these experiments was not to show that our specific cocktail of features outperforms prior work, but to show the efficacy of generating segment-specific feedback and the value of our perturbation-based explanation selection.

To summarize, each participant was randomly assigned to one of four conditions: Doc+Krause, Seg+Krause, Doc+Perturb, and Dialectic (Seg+Perturb).
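For reference, the outlier rule in the Krause condition amounts to the following sketch (argument names are illustrative, not taken from that system):

```python
# Sketch of the Krause-style baseline: a feature triggers its static
# explanation when it lies more than k=1.5 SD from the mean of a
# high-quality corpus.
import numpy as np

def krause_feedback(x, high_quality_X, static_text, k=1.5):
    """x: feature vector of the user's document; high_quality_X: (N, n)
    matrix of high-quality documents; static_text[i]: canned advice."""
    mean = high_quality_X.mean(axis=0)
    std = high_quality_X.std(axis=0)
    outliers = np.abs(x - mean) > k * std  # per-feature 1.5 SD rule
    return [static_text[i] for i in np.where(outliers)[0]]
```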
Participants: We recruited 85 workers on Amazon's Mechanical Turk (61.2% male, 38.8% female, ages 20-65, µ_age=32, σ_age=8.5). Participants were randomly assigned to one condition group; all conditions had 21 subjects except the Dialectic condition, which had 22. No participant had used Dialectic before. 71.3% had written a prior product review; all had read a product review in the past. All participants were US residents with >90% HIT accept rates. The average task completion time was 14 minutes, and payment was $2.5 (∼$10/hr).

Procedure: Participants were asked to write and then revise a review of their most recently owned laptop computer. We used a qualification task to ensure participants had ever owned a laptop. We explained the feedback interface and did not offer a rubric, then asked participants to write their review "as if they are trying to help someone else decide to buy that laptop or not... as they would on a review website like the Amazon store". The I'm Done Writing button displayed our document-level feedback under the text field; for users in the segmentation condition, low quality segments were highlighted red and the related feedback was displayed when users hovered over the segment. We then gave participants the opportunity to revise their review; to avoid bias, we noted that they were not obligated to. At this point, users could click the Recompute Text Feedback button (median 1 click/participant), or press Submit to submit and finish the task. We used a post-study survey to collect demographic information as well as their subjective experience.

The interface was the same for all conditions—only the feedback content changed. The final submission was considered the post-feedback review, and the initial submission upon pressing I'm Done Writing was the pre-feedback review. The experiment was IRB approved.

Results

Review Evaluation: 81 of the 85 participants completed the review writing task. Three independent evaluators coded the pre and post-feedback reviews using a rubric based on prior work on review quality (Connors, Mudambi, and Schuff 2011; Mudambi and Schuff 2010; Liu et al. 2007). The rubric rated reviews on a 1-7 Likert scale using three specific aspects—informativity, subjectivity, readability—as well as a holistic overall score. The change in these measures between pre and post-feedback suggests the utility of the feedback.

Finally, we asked reviewers to subjectively rate their agreement from 1-7 with the statement "The post-feedback revisions improved on the pre-feedback review.", or 0 if the review did not change. Each measure is the average of the ratings from two coders—if they differed by >3, a third expert coder was used as the tie breaker and decided the final value. The third coder was trained by being shown the Amazon review corpus, examples across the quality spectrum, and the ratings of the other two coders. The coders labeled reviews in random order and did not have access to any other information about the text.
Rubric Description: The rubric asks coders to score reviews on helpfulness to laptop shoppers. It defines the three main measures, and provides examples from the Amazon corpus that contribute positively and negatively to each criterion. Informativity is the extent to which the review provides detailed information about the product, where 7 means that the review elaborates on all or almost all of the specifications of a product while 1 means that it states an opinion but fails to provide factual details (e.g., laptop specifications). Subjectivity is the extent to which the review is fair and balanced but with enough helpful opinions for the buyer to make an informed decision: 1 means the review is an angry rant or lacks any opinions while 7 means it is a fair and balanced opinion. Readability is the extent to which the review facilitates or obfuscates the writer's meaning. For instance, a review that consists of many ambiguous phrases like "I have never done anything crazy with it and it still works." is assigned 1 as it might require multiple readings to understand. Overall Quality is the holistic helpfulness of the review for prospective buyers.

Analysis

Figure 4: Improvement in informativity, subjectivity, readability, and overall quality Likert scores across all four conditions.

Figure 5: Subjective agreement to: "The post-feedback revisions improved on the pre-feedback review."

Figure 4 plots the mean change and 95% bootstrap confidence interval for the four rubric scores. Figure 5 shows a similar chart for the coders' subjective opinion of the improvement. These plots show the effect sizes across all measures, and that the largest improvements were due to the combination of segmentation and perturbation-based explanation. We now focus our analysis on the change in the overall quality metric (right-most facet of Figure 4). Analogous statistical tests on the data in Figure 5 produced the same conclusions.

A one-way ANOVA first showed that the four conditions had a significant influence on the overall change in quality (F(3,77)=7.11, p<1e−4). Using Tukey's HSD post-hoc test to compare the individual conditions, we found that the pairwise comparisons between the Dialectic condition (µ=0.55, σ=0.51) and the other three conditions Doc+Perturb (µ=0.23, σ=0.34), Seg+Krause (µ=0.025, σ=0.26), and Doc+Krause (µ=0.14, σ=0.51) were significant (p<.05). However, the pairwise comparisons among the latter three conditions did not show significance.

We then performed a two-way ANOVA using the overall quality increase as the dependent variable, and perturbation and segmentation as the independent variables. The ANOVA found a significant effect for perturbation (F(1,80)=9.66, p=0.0026<.005). We can see this in Figure 4: Dialectic outperformed Seg+Krause, and Doc+Perturb outperformed Doc+Krause. On the other hand, segmentation alone did not have a significant effect (F(1,80)=2.21, p=0.14).
Finally, interaction effects between segmentation and perturbation were significant (F(1,80)=5.75, p=0.019<.05), which can be seen from the Tukey HSD test, where adding segmentation to Doc+Perturb greatly improved overall review quality while adding segmentation to Doc+Krause had a negative effect.

In summary, combining segmentation and perturbation-based explanation outperformed all other conditions by a statistically significant margin. Controlling for the other variable, perturbation-analysis was shown to cause a statistically significant improvement in review quality, while segmentation did not. However, the effect sizes of both variables in isolation were relatively small compared to the effect of combining them: Dialectic, which combines segmentation and perturbation-analysis, improved the overall measure (right-most facet) by nearly 3.9× over the baseline (0.55 vs. 0.14 increase), and by 2.4× over the next-best Doc+Perturb condition. The significant interaction effects suggest this is due to a co-dependency between the segmentation and perturbation methods: granular feedback is not useful when a model does not provide complex insights into each segment, and the perturbation-analysis insights are less useful when constrained to the document level. All feedback was generated within one second.

Participant Feedback: The post-trial questionnaire asked participants to indicate their level of agreement with a few statements regarding their experience on a 7-point Likert scale (1 - strongly disagree, 7 - strongly agree). Participants agreed with the statement "The interface was easy to use" for both the baseline Doc+Krause (µ=5.81, σ=1.33) and Dialectic (µ=5.82, σ=1.18) conditions. This suggests that the interface design facilitated the participants' writing.
However, in response to "Using the interface improved my review," there was a statistically significant (t(39)=2.998, p=0.027<.05) difference between Doc+Krause (µ=3.81, σ=1.86) and Dialectic (µ=5.00, σ=1.41), suggesting that Dialectic's segment-oriented feedback contributed to improved writing. Regarding the targeted and segment-level feedback of Dialectic, users praised it for making them reconsider specific aspects of their writing; e.g.: "the feedback I got was over parts where I wrote [sic] a lot... it seemed like unnecessary fluff after giving it a second read when it was highlighted in the feedback, and I ended up taking some parts of it out." Conversely, when asked to describe the suggestions from Doc+Krause, one user wrote "It was general feedback nothing specific," and another said they were "not sure of what needed to change." Overall, these comments suggest that there is value in providing segment-oriented feedback, and that explanation functions helped produce more actionable feedback by taking into account multiple features as well as the input text.

The most widespread criticism regarding Dialectic arose after users had edited their reviews following the first round of feedback and then pressed Recompute Text Feedback a second time. If the edits did not address the predicted issues, Dialectic would generate the same feedback, which was not productive. For example, one user wrote: "Initially it seemed interesting but even after editing... nothing changed when I resubmitted." This was understandable given that we only used four simple explanation functions. Extensions that take into account prior feedback may help ameliorate these complaints.

Finally, we found that a few (N=3) users were reluctant to trust computer-generated feedback; one stated that they didn't make changes because "It seemed like the feedback was just an automated response." Creating more customized and human-like feedback is a promising area of future work.

Conclusion and Future Work

This paper presented the design, implementation, and evaluation of an extensible feedback system for user text inputs. We demonstrate that by combining segmentation, classification, and explanation techniques, it is possible to create automated interfaces that improve the quality of user generated social content. Moreover, we demonstrate that the creation of such interfaces can be streamlined to require minimal effort on the part of developers. Through a crowd study, we find that Dialectic is able to improve the total writing quality score of laptop reviews by 14.4%, over 3× more than a state-of-the-art system.

Though Dialectic demonstrates the feasibility of such automated interfaces, it also reveals several areas of improvement. Due to the small number (4) of explanation functions, study participants found that repeatedly using the system began to provide redundant feedback; simplifying the development of more explanation functions may help the system produce more nuanced feedback. We also used document-quality labels to train the segment classifier. We showed this to be sufficient by testing on crowd-sourced labels; however, more sophisticated techniques to classify segments could improve feedback. Finally, we hope to explore applications of Dialectic to different social media domains and different user contexts.

In the long term, we envision Dialectic as an example of automatically applying input-side optimizations (i.e., writing feedback) based on downstream application needs (i.e., quality reviews). In many cases, input-side optimizations can improve the quality of the application or drastically reduce the costs of downstream processes. For instance, entity resolution (Getoor and Machanavajjhala 2012) is often performed by social media services at a cost quadratic in the dataset size. If data is de-duplicated during input, we can eliminate such expensive procedures. We hope to explore other methods of augmenting the input side upstream in future work.

References

Agichtein, E.; Castillo, C.; Donato, D.; Gionis, A.; and Mishne, G. 2008. Finding high-quality content in social media. In WSDM.
Amazon. 2016. Amazon: Community guidelines. https://www.amazon.com/gp/help/customer/display.html?nodeId=201929730.
Anderson, T. 2004. Teaching in an online learning context. Volume 273.
Antwarg, L.; Lavie, T.; and Rokach, L. 2012. Highlighting items as means of adaptive assistance. In Behavior and Information Technology.
Archak, N.; Ghose, A.; and Ipeirotis, P. G. 2011. Deriving the pricing power of product features by mining consumer reviews. In Management Science. INFORMS.
Attali, Y., and Burstein, J. 2004. Automated essay scoring with e-rater v. 2.0. In ETS Research Report Series. Wiley Online Library.
Bakshy, E.; Karrer, B.; and Adamic, L. A. 2009. Social influence and the diffusion of user-created content. In EC.
Biran, O., and McKeown, K. 2014. Justification narratives for individual classifications. In AutoML.
Blei, D. M.; Ng, A. Y.; and Jordan, M. I. 2003. Latent dirichlet allocation. In JMLR.
Boomerang. 2016. Respondable: Personal AI assistant for writing better emails. http://www.boomeranggmail.com/respondable/.
Bosu, A.; Corley, C. S.; Heaton, D.; Chatterji, D.; Carver, J. C.; and Kraft, N. A. 2013. Building reputation in StackOverflow: An empirical investigation. In MSR.
Connors, L.; Mudambi, S. M.; and Schuff, D. 2011. Is it the review or the reviewer? A multi-method approach to determine the antecedents of online review helpfulness. In System Sciences (HICSS), 2011 44th Hawaii International Conference on, 1–10. IEEE.
Deterding, S.; Dixon, D.; Khaled, R.; and Nacke, L. 2011. From game design elements to gamefulness: Defining gamification. In MindTrek.
Farra, N.; Somasundaran, S.; and Burstein, J. 2015. Scoring persuasive essays using opinions and their targets. In NAACL.
FoxType. 2016. Write smarter emails. foxtype.com/.
Getoor, L., and Machanavajjhala, A. 2012. Entity resolution: Theory, practice & open challenges. VLDB.
Ghose, A., and Ipeirotis, P. 2009. The EconoMining project at NYU: Studying the economic value of user-generated content on the internet. In Journal of Revenue & Pricing Management. Nature Publishing Group.
Ghose, A., and Ipeirotis, P. G. 2011. Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. In TKDE. IEEE.
Ghose, A.; Ipeirotis, P. G.; and Sundararajan, A. 2007. Opinion mining using econometrics: A case study on reputation systems. In ACL.
Ghosh, A. 2012. Social computing and user-generated content: A game-theoretic approach. In ACM SIGecom Exchanges. ACM.
Gilbert, C. H. E. 2014. VADER: A parsimonious rule-based model for sentiment analysis of social media text.
Google. 2016. Check spelling and grammar in google docs. support.google.com/docs/answer/57859.
Guy, I. 2015. Social recommender systems. In Recommender Systems Handbook. Springer.
Guyon, I.; Weston, J.; Barnhill, S.; and Vapnik, V. 2002. Gene selection for cancer classification using support vector machines. In Machine Learning. Springer.
Hearst, M. A. 1997. TextTiling: Segmenting text into multi-paragraph subtopic passages. In Computational Linguistics. MIT Press.
Hyönä, J.; Lorch Jr, R. F.; and Kaakinen, J. K. 2002. Individual differences in reading to summarize expository text: Evidence from eye fixation patterns. Volume 94, 44. American Psychological Association.
Ipeirotis, P. 2016. Fix reviews' grammar, improve sales. behind-the-enemy-lines.com/2011/04/want-to-improve-sales-fix-grammar-and.html.
Ji-Wei Wu, J. C. T. 2011. An efficient linear text segmentation algorithm using hierarchical agglomerative clustering. In CIS.
Kelly, R. 2011. rfk/pyenchant.
Kim, S.-M.; Pantel, P.; Chklovski, T.; and Pennacchiotti, M. 2006. Automatically assessing review helpfulness. In ACL.
Kim, J. 2012. The institutionalization of YouTube: From user-generated content to professionally generated content. In Media, Culture & Society. Sage Publications.
Krause, J.; Perer, A.; and Ng, K. 2016. Interacting with predictions: Visual inspection of black-box machine learning models. In HCI.
Krause, M. 2015a. Bull-o-meter: Predicting the quality of natural language responses. In HCOMP.
Krause, M. 2015b. A method to automatically choose suggestions to improve perceived quality of peer reviews based on linguistic features.
Kulik, J. A., and Kulik, C.-L. C. 1988. Timing of feedback and verbal learning.
Kulkarni, C. E.; Bernstein, M. S.; and Klemmer, S. R. 2015. PeerStudio: Rapid peer feedback emphasizes revision and improves performance. In ACM.
Lei, T.; Barzilay, R.; and Jaakkola, T. S. 2016. Rationalizing neural predictions. In EMNLP.
Li, B.; Ghose, A.; and Ipeirotis, P. G. 2011. Towards a theory model for product search. In WWW.
Liu, J.; Cao, Y.; Lin, C.-Y.; Huang, Y.; and Zhou, M. 2007. Low-quality product review detection in opinion summarization. In EMNLP-CoNLL.
Liu, Y.; Huang, X.; An, A.; and Yu, X. 2008. Modeling and predicting the helpfulness of online reviews. In IEEE Computer Society.
Loria, S. 2014. TextBlob: Simplified text processing.
Madnani, N., and Cahill, A. 2014. An explicit feedback system for preposition errors based on wikipedia revisions. In NAACL.
McAuley, J.; Pandey, R.; and Leskovec, J. 2015. Inferring networks of substitutable and complementary products. In KDD.
Microsoft. 2016. Check spelling and grammar in office 2010 and later. support.office.com.
Minqing Hu, B. L. 2004. Mining opinion features in customer reviews. In AAAI.
Misra, H.; Yvon, F.; Cappé, O.; and Jose, J. 2011. Text segmentation: A topic modeling perspective. In Information Processing & Management. Elsevier.
Muchnik, L.; Aral, S.; and Taylor, S. J. 2013. Social influence bias: A randomized experiment. In Science. American Association for the Advancement of Science.
Mudambi, S. M., and Schuff, D. 2010. What makes a helpful review? A study of customer reviews on amazon.com. In MIS Quarterly.