Table Of ContentResponding to Retrieval: A Proposal to Use Retrieval
Information for Better Presentation of Website Content
CRavindranathChowdary AnilKumarSingh AnilNelakanti
IIT(BHU) IIT(BHU) IIT (BHU)
Varanasi,India221005 Varanasi,India221005 Varanasi,India221005
rchowdary.cse@iitbhu.ac.in nlprnd@gmail.com anil.nelakanti@gmail.com
5
1 ABSTRACT of load that was met by computationally powerful servers.
0 Retrieval and content management are assumed to be mu- Thebiggerchallengewastoorganizeandmakeavailablethe
2 tually exclusive. In this paper we suggest that they need hugeamountofinformationinareadilyconsumablemanner.
not beso. In theusual information retrieval scenario, some This required the third entity of retrieval systems. What
n
informationaboutqueriesleadingtoawebsite(dueto‘hits’ essentially wasatwo-way transaction betweenthehost and
a
or‘visits’)isavailabletotheserveradministratorofthecon- the client has become three-way with an IR system in the
J
cernedwebsite. Thisinformationcanusedtobetterpresent middle.
9
the content on the website. Further, we suggest that some
1
moreinformationcanbesharedbytheretrievalsystemwith Clients are served byhosts, a relation facilitated by IRsys-
the content provider. This will enable the content provider tems. However, current day IR systems are more than just
]
R (any website) to have a more dynamic presentation of the organizer ofweblinks. Theymodeluserchoicesandprefer-
content that is in tune with the query trends, without vio- ences to serve them better. We argue that the three-entity
I
. lating the privacy of the querying user. The result will be unitoftheclient,IRsystemandthehostisgreaterthanthe
s
c abettersynchronizationbetweenretrievalsystemsandcon- sumofitsparts. Therelationbetweenthesethreeentitiesis
[ tentproviders,withthepurposeofimprovingtheuser’sweb ignoredbythecurrentweb-servicearchitecture. Wepresent
search experience. This will also give the content provider hereaproposalwhichwillexploitthisrelationshiptobetter
1 a say in this process, given that the content provider is the deliver some aspects for web service usage.
v
one who knows much more about the content than the re-
9
trieval system. It also means that the content presentation Web designers write content on the pages based on the in-
0
maychangeinresponsetoaquery. Intheend,theuserwill formation provided by the owner of the site. Content in a
5
beabletofindtherelevantcontentmoreeasilyandquickly. website is primarily organized based on the categorization
4
of the information and arranged appropriately by the de-
0
General Terms signer. Inthisentireprocessofthecurrentdesignparadigm,
.
1 thequery hasnorole toplayduringthedesign orpresenta-
ContentManagementSystems,PersonalizedContent,Infor-
0 tion phases of a website. But, when a search query is given
mation sharing
5 to an IR system, it retrieves links of pages that are pre-
1 pared without taking query into consideration on the host
1. INTRODUCTION
v: side. Thisis becauseretrieval and contentmanagement are
Information retrieval (IR) systems have become integral to
i considered mutuallyexclusive, that is, thecontentmanage-
X daily activities of millions and will retain their prominence ment system does not knowabout the retrieval system and
in years to come. One of the reasons for such importance
r theretrievalsystemdoesnotknowhowthecontentprovider
a of a good IR system is the amount of data that is available may respond to the query. Due to this shortcoming, both
on the web and the pace at which it is increasing. The
the content provider and IR system are under performing.
numberofwebsitesreportedlyincreasedfromonein1991to
In this paper, we try to address this issue by proposing an
1
morethanonebillion inSeptember2014 . Simultaneously,
architecture that enables the server hosting the website to
therewas anincreasing numberofusersavailing thehosted
presentcontentthatisbasedonthequeryposedbytheuser.
services. This increase in web usage is more than an issue
1http://www.internetlivestats.com/total-number-of- 2. THE PROPOSEDARCHITECTURE
websites/ on 27/10/2014
The outline of the architecture we propose is presented in
Figure 1. The scenario is that the user starts a retrieval
systemandgivesaquery. Theretrievalsystempresentsthe
search results to the user. Out of them, the users selects
one and is taken to thedestination website. Whenthe user
is taken to that website, the retrieval system also shares
some information about the user and the query (subject to
privacyrequirements: seeSection5)withtheserverhosting
thewebsite. Thehostserverusesthisinformationtopresent
the content such that the user might have a better search
Feedback-based
User's IR Results
selection Anonymized Content
user attributes request
Content
Query Host
User IR System Management
User stay Server
System
based
feedback
Content
request
Query Dependent
User stay
Content
Host response
Figure 1: The proposed architecture for more responsive IR and CMS systems
experience (see Section 3). This presentation might, for ex- servermaypresentalistofoldactionmovies(couldbefrom
ample,makeiteasierfortheusertofindcertainthings. The otherpagesofthehostserver)toAandalist ofnewaction
host server will then provide feedback to the retrieval sys- movies to B, in both case in addition to the content at the
tem (again subject to privacy requirements) based on the Link1.
user’s stay in thewebsite and theuser’s activity duringthe
stay. Since the information shared with the host server is Forgettingmaximumbenefitfromthiskindofarchitecture,
anonymized, so will be the feedback given to the retrieval thecurrentContentManagement Systems(CMS) likeDru-
system. The retrieval system will now use this feedback to pal,Joomla, Djangoetc. mayhavetoberedesignedtotake
givebetterresultsinthefuture(seeSection4). Theoverall user queries into account for presenting the final web page
result will be better synchronization between the retrieval tobeshowntotheuser. Thiswillallowthehostserver(and
system and the host server for the purpose of presenting theCMS)toplayanactiveroleintheprocessofcontentre-
better results to the user. Anonymization, opt-out option trieval. Sincethecontentproviderknowsmuchmore about
andcustomizationwillbethecentralrequirements,enforced the content than the retrieval system, all that knowledge
throughaprotocol(seeSection6),topreventanyabusethat could be used to present dynamic query-aware content to
can result from sharing theinformation. theuser.
4. FEEDBACK-AWARE RETRIEVAL
3. QUERY-AWARECONTENTPRESENTA-
Aclassicalorbare-bonedretrievalsystem[2]onlytakesinto
TION
account thequery for retrieval. Some modern retrieval sys-
Current state of the art Web servers do not take the query tem go further and use the information available about the
into consideration while presenting the content to the user. user for personalizing the results. However, they do not
Lotofworkhasbeenreportedonimprovingthearchitecture take into account the user’s activity once the user has se-
ofWebserversforvariousapplications[1,6,4]. Manymod- lected and visited one of the web pages. In the proposed
els are available to compare thearchitectures of the servers architecture, anonymized information about the user’s ac-
[3]. [7] discusses improving the performance of websites by tivitywillbemadeavailabletotheretrievalsystem. Itwill,
using edge servers in Fog Computing Architecture. To the thus,bepossibletodesignalgorithmsthattakethisactivity
best of our knowledge there is no attempt to use the query into account. Some work in this direction was proposed by
by thehost serverto present thecontent totheuser. [5].
Ifahostservercanpresentthecontenttotheuserbasedon The details about this activity might include information
thequery,thenitwillbebeneficialtoboththeuserandthe such as the other links on the website that the user clicked
hostorganization. SupposethatuserAgivesquery“popular on and the total time that the user spent on the website
movies in action genre + old” and that B gives “popular andonvariouspages. Aretrievalsystemmadeawareofthe
movies in action genre + latest”. Let us assume that both feedback from the host server should, intuitively, perform
the users get Link1 as their first link. We propose that better.
thehostserverofLink1shouldpresentdifferentcontentsto
each of them based on their query. In this case, the host Somemodernretrievalsystemsalsoprovideadditionallinks
aspartofthesummary‘snippet’whilepresentingtheresults Asthisproposalisworkedoutinmoredetailinfuturework,
ofretrievaltotheuser. Suchsnippetscanbebetterprepared more such requirements might be identified and will also
with thesuggested feedback from thehost server. haveto beaddressed.
Additionally, and importantly, the host server can provide Evenafteraddressingtheseissues,oneconcernstillremains
extrainformation aboutitscontentaspartof thefeedback. regarding the proposed architecture. Even if the shared in-
This extra information will be based on the query and the formation isanonymizedandthehostserverdoesnotknow
knowledgeofthecontentthatisavailabletothehostserver. theidentityof theuser, theretrieval system may still know
This will allow the content provider to have a say in the theidentityandbeabletoconnecttheactivityoftheuseron
presentation of the snippet for the concerned website. The the visited website with the user’s identity. This raises the
retrieval system may or may not use this information, de- question whether the retrieval system will come to acquire
pendingontheretrievalandsnippetpreparationalgorithm. moreknowledgeabouttheuserthaniswarranted. Thismay
be a problematic ethical issue and requires further investi-
gation.
5. PRIVACYAND CUSTOMIZATION
Ourproposalrequirestheretrievalsystemtosharesomein-
6. RETRIEVAL RESPONSE PROTOCOL
formationabouttheuserandthequerywiththehostserver.
There are many different kinds of retrieval systems. Sim-
Italsorequiresthehostservertoprovidefeedbacktothere-
ilarly, there are many different kinds of host servers and
trieval system based on the user’s stay in the website after
content management systems. If there is to be a flow of in-
theuserselectedthewebsitefromtheretrievalresults. This
formation between them as suggested in the preceding sec-
extra sharing of information immediately raises the ques-
tions, then it will have to be precisely regulated so that it
tionsofprivacy. Ifourproposalisimplemented,itsdetailed
ispossible toimplementsystemswithoutanyconflict. This
version will need to include stringent requirements to ad-
will require a well-defined and well-designed protocol. We
dress all the possible privacy concerns. We list below some
call thistheRetrieval ResponseProtocol (RRP).
of theserequirements:
TheRetrievalResponseProtocolwillregulatetheflowofin-
• The first such requirement is that the user’s identity, formation betweentheretrievalsystemandthehostserver.
The protocol will be used to initiate, maintain and close a
even if known to the retrieval system, will not be re-
retrieval session. Assoon as theuser select one result from
vealed to the host server. Whatever information is
the results provided by the retrieval system in response to
shared with the host server will have to be strictly
theuserquery,aretrievalsessionwillbeinitiated. Theend-
anonymized so as not toreveal theuser’s identity.
ing of the session will perhaps have to be timeout based as
• The second requirement is that only the relevant in- there is no other way to know when the user has left the
website.
formation will be shared. If we view this information
asalistofattribute-valuepairs,thenonlythatsubset
Duringthetimethesessionisalive,theretrievalsystemwill
of attribute-value pairs will be shared with the host
firstsharetheinformationabouttheuserandthequerywith
server that the host server needs to know in order to
thehostserver. Afterthat,basedontheuser’sactivity,the
betterpresent its content.
hostserverwillprovidethefeedbacktotheretrievalsystem.
• The third requirement is that an opt-out option will All the activity during this session will be subject to the
privacy and customization requirements and the protocol
beavailabletoboththeuserandthehostserver. The
design will havetotake thisinto account.
user will be made aware of the sharing of information
and the user will decide whether this sharing is to be
The protocol will have to be designed to regulate this re-
allowed or not. The information will be shared only
trievalsession. Weleavethedesignofthisprotocolforfuture
if the user explicitly agrees to it. In the default case,
work.
there will be no sharing. Similarly, the host server
willdecidewhethertoprovidefeedbacktotheretrieval
7. CONCLUSION
system or not and thedefault will bethelatter.
Inthecurrentinformationretrievalparadigm,thehostdoes
• The fourth requirement is that both the user and the notusethequeryinformationforcontentpresentation. The
host server must be able to customize sharing of in- retrievalsystemdoesnotknowwhathappensaftertheuser
formation. If they decide to share information, they selects a retrieval result. And the host also does not have
will further be given the option to select the specific access to the information which is available to the retrieval
attributesthattheyarewillingtoshare. Forexample, system. We presented the outline of an architecture that
iftheretrievalsystemknowsabouttheuser’slocation, addressestheseissues. Theaimistoprovideabettersearch
age,genderandlanguage,thentheusermaydecideto experience to the user through better presentation of the
share only location and language. contentbasedonthequeryandbetterretrievalresultsbased
onthefeedbacktotheretrievalsystemfromthehostserver.
• Theuserwillhavetobeinformedthattheactivityon The retrieval system will share some information with the
thevisitedwebsitemaybeusedforprovidingfeedback host serverand thehostserverinturnwill providerelevant
to the retrieval system. And theuser will then decide feedback to the retrieval system based on the user’s stay in
whether and what part of the activity on the website thewebsite. Thehostusesallthequeryrelatedinformation
canbeusedtoprovidefeedbacktotheretrievalsystem. for dynamic content presentation. This revised paradigm
forinformationretrievalalsointroducestheissuesofprivacy
which will have to be addressed stringently. It also needs a
newprotocolforcontentretrievalresponse,whichwebriefly
described. Thisprotocolwillregulatetheflowofinformation
between the retrieval system and the host server subject to
theprivacy and customization requirements.
8. REFERENCES
[1] AliBegen,TankutAkgul,andMarkBaugher.Watching
videoovertheweb: Part 1: Streamingprotocols. IEEE
Internet Computing, 15(2):54–63, March 2011.
[2] Sergey Brin and Lawrence Page. Theanatomy of a
large-scale hypertextualweb search engine. In
Proceedings of the Seventh International Conference on
World Wide Web 7, WWW7, pages 107–117,
Amsterdam,The Netherlands,The Netherlands,1998.
Elsevier Science PublishersB. V.
[3] AshifS. Harji, Peter A.Buhr, and Tim Brecht.
Comparing high-performance multi-coreweb-server
architectures. InProceedings of the 5th Annual
International Systems and Storage Conference,
SYSTOR’12, pages 1:1–1:12, New York,NY,USA,
2012. ACM.
[4] Raoufehsadat Hashemian, Diwakar Krishnamurthy,
Martin Arlitt,and Niklas Carlsson. Improvingthe
scalability of a multi-core web server.In Proceedings of
the 4th ACM/SPEC International Conference on
Performance Engineering, ICPE ’13, pages 161–172,
NewYork,NY,USA,2013. ACM.
[5] XuehuaShen,Bin Tan, and ChengXiang Zhai.
Context-sensitiveinformation retrieval using implicit
feedback.In Proceedings of the 28th Annual
International ACM SIGIR Conference on Research and
Development in Information Retrieval, SIGIR’05,
pages 43–50, New York,NY,USA,2005. ACM.
[6] Bryan Vealand AnnieFoong. Performance scalability
of a multi-coreweb server. In Proceedings of the 3rd
ACM/IEEE Symposium on Architecture for Networking
and Communications Systems, ANCS ’07, pages 57–66,
NewYork,NY,USA,2007. ACM.
[7] Jiang Zhu,D.S.Chan, M.S. Prabhu,P. Natarajan, Hao
Hu,and F. Bonomi. Improvingweb sites performance
usingedge servers in fog computingarchitecture. In
IEEE 7th International Symposium on Service Oriented
System Engineering (SOSE), pages 320–323, March
2013.