International Journal of Engineering Research and Development (IJERD)

International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 6, Issue 11 (April 2013), PP. 15-22

A Novel Hybrid Policy for Web Caching

Sirshendu Sekhar Ghosh1, Vinay Kumar2, Aruna Jain3
1Research Scholar, Dept. of Information Technology, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India
2M.Tech Student, Dept. of Information Technology, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India
3Associate Professor, Dept. of Information Technology, Birla Institute of Technology, Mesra, Ranchi, Jharkhand, India

Abstract:- Web caching is one of the most successful solutions for improving the performance of Web-based systems by keeping Web objects that are likely to be used in the near future close to the clients. The core of a caching system is its page replacement policy, which has to make an efficient replacement decision when the cache is full and a new document needs to be stored. Many researchers have proposed replacement policies based on recency, frequency, size, cost of fetching the object and access latency to improve the performance of Web caching. However, it is difficult to have an omnipotent policy that performs best in all environments, as each policy has a different design rationale and optimizes different resources. In this paper, we propose a Novel Hybrid Replacement Policy for Web Caching, which is an efficient combination of two traditional well-known policies, Least Recently Used (LRU) and Least Frequently Used (LFU). We have developed experimental code, run a trace-driven simulation and analyzed the results. The analysis shows that our proposed Novel Hybrid Policy gives better performance than the two policies used individually in terms of Hit ratio and Latency saving ratio.

Keywords:- Web Caching, Replacement policy, LRU, LFU, Hybrid replacement policy, Hit ratio, Latency saving ratio.

I. INTRODUCTION
Web caching is a technique that aims to reduce WWW network traffic and improve response time for end users. It stores popular Web objects already requested by users in a pool close to the client side, to avoid repeated requests for such objects to the original Web servers. Web caching mechanisms are implemented at three levels: client level, proxy level and original server level, as shown in Fig. 1.

Fig.1: Possible Web cache locations

Amongst these, proxy servers play a key role between users and Web servers in reducing response time and thus saving network bandwidth. Users make their connections to proxy applications running on their hosts. The proxy connects to the server and relays data between the user and the server. At each request, the proxy server is contacted first to find whether it has a valid copy of the requested object. If the proxy has the requested object, this is considered a cache hit; otherwise a cache miss occurs and the proxy must forward the request on behalf of the user to retrieve the document from another location, such as a peer or parent proxy cache or the origin server. Once the copy of the document has been retrieved, the proxy can complete its response to the client. If the document is cacheable (based on information provided by the origin server or determined from the URL), the proxy may decide to add a copy of the document to its cache. Proxy caches are often located near
network gateways to reduce the bandwidth required over expensive dedicated Internet connections. When shared with other users, the proxies serve many clients with cached objects from many servers.

Therefore, to achieve better response time, an efficient caching approach should be built into the proxy server, as it can then effectively serve more user requests. It takes advantage of the temporal locality of Web objects to reduce user-perceived latency, bandwidth consumption, network congestion and traffic. Furthermore, as it delivers cached objects from proxy servers, it reduces external latency and improves reliability, since a user can obtain a cached copy even if the remote server is unavailable. Because a large number of users are connected to the same proxy server, object characteristics (popularity, spatial and temporal locality) can be better exploited, increasing the cache hit ratio and improving Web performance.

The performance of a Web proxy caching scheme depends mainly on the cache replacement policy (which identifies the objects to be replaced in the cache when a request arrives) implemented by the underlying proxy server. If the cache is full and an object needs to be stored, the policy determines which object is evicted to make room for the new object. Replacement policies play a key role in picking out which documents to sacrifice. In practical implementations, replacement usually takes place before the cache is completely full. The goal of the replacement policy is to make the best use of available resources, including disk space, processing speed, server load and network bandwidth. Therefore, we must resort to an approach that predicts future user requests and retains the most valuable objects in the cache, thus reducing Web latency.

Cache size is also an important factor influencing cache performance. The larger the cache, the more documents it can hold and the higher the cache hit ratio.
However, cache space is expensive, and even though cache size is less of a constraint in today's computing world, a large cache brings significant problems of its own: updating it is complex, which increases response time. Therefore, an optimal cache size involves a tradeoff between cache cost and cache performance. Document size is also associated with cache performance. Given a certain cache size, the cache can store more small documents or fewer large documents. The maximum cacheable document size is a user-defined factor that places a ceiling on the size of documents allowed to be stored in the cache. Furthermore, there are two other factors: cooperation and consistency. Cooperation refers to the coordination of user requests among many proxy caches in a hierarchical proxy cache environment. Cache consistency refers to ensuring that the copies of documents held in the cache are not outdated. There are also factors that affect proxy cache performance indirectly, such as copyright protection, which increases the complexity of proxy cache design.

Motivated by the wealth of research on replacement policies for Web caching [1,2,3,4], in this paper we propose a Novel Hybrid Cache Replacement Policy combining two traditional policies, LRU and LFU. These policies are particularly popular because of their simplicity and fairly good performance in many situations. LRU evicts the least recently referenced object first. It is designed on the assumption that a recently referenced document will be referenced again in the near future. It has the advantage of being a very simple policy that can be implemented to work very efficiently, but it has the handicap that it does not take into account the frequency and size of documents. On the other hand, LFU is a simple policy that evicts the least frequently referenced object first.
But it has the disadvantage that documents that were referenced very often in the past but are no longer popular are kept in the cache and not evicted. This effect is called "cache pollution". LFU does not take document size into account either.

In this paper we propose a novel hybrid combination of the above two traditional policies, which integrates them effectively and gives better performance in terms of Hit ratio and Latency saving ratio. We also propose a system architecture deploying the Hybrid policy, which can be easily incorporated into the present Web infrastructure.

The rest of the paper is organized as follows. Section 2 describes the background and related work in this area. Section 3 describes the proposed Novel Hybrid Cache Replacement Policy. Section 4 describes the proposed System Model and its components deploying the Hybrid Replacement Policy. Section 5 presents the results and discussion. Section 6 concludes the paper and outlines the future scope of this research direction.

II. RELATED WORK
The size and complexity of Web pages are increasing drastically day by day due to increasingly dynamic and flashy content. Hence it is imperative to have a combined and intelligent policy for identifying the documents to replace in order to improve cache performance. Web caching has been studied from many different angles by the research community over the years. The proposed replacement policies are mainly based on the following attributes: recency, frequency, size, cost of fetching the object, access latency of the object, modification time, expiration time etc.

Balamash et al. [3] give an overview of various replacement algorithms. They classify these strategies into different categories, namely recency, frequency, recency/frequency and function based, and also conclude
that GDSF outperforms the other policies when the cache size is small. However, their survey did not provide insight into which of these strategies performs best, or how these strategies compare to each other.

Podlipnig et al. [4] conducted an extensive survey of Web cache replacement strategies and classified them as (a) recency-based, (b) frequency-based, (c) recency/frequency-based, (d) function-based and (e) randomized. Recency-based strategies are based on "least recently used" (LRU), which uses the concept of temporal locality. Frequency-based strategies are modeled after "least frequently used" (LFU) and are best suited for static environments. LFU-based strategies tend to suffer from cache pollution if an appropriate aging factor is not used. Recency/frequency-based strategies consider both recency and frequency in making their decisions. Function-based strategies apply a function to candidate items and select the item for replacement based on its value. Randomized strategies are nondeterministic, and can be fully random.

Ghosh et al. [5] presented a very detailed survey of different Web caching and prefetching policies aimed at Web latency reduction. They also studied various Web cache replacement policies influenced by the factors recency, frequency, size, cost of fetching the object and access latency of the object. They concluded that Hit Ratio (HR), Byte Hit Ratio (BHR) and Latency Saving Ratio (LSR) are the most widely used metrics for evaluating the performance of Web caching.

Martin et al. [6,7] used trace-driven simulations to assess the performance of different cache replacement policies for proxy caches, and used a trace of client requests to a Web proxy in an ISP environment to assess the performance of several existing replacement policies. Their results are based on the most extensive Web proxy workload characterization reported in the literature at the time.

Williams et al.
[8] reported that SIZE outperforms LFU, LRU and several LRU variations in terms of two performance measures, cache hit ratio and byte hit ratio. Their experiments do not consider object frequency in the decision-making process.

Rachid et al. [9] proposed a strategy called class-based LRU (C-LRU). C-LRU is recency-based as well as size-based, aiming to obtain a well-balanced mixture of large and small documents in the cache and hence good performance for requests for both small and large objects. Class-based LRU is a modification of standard LRU.

Triantafillou et al. [10] employ the CSP (Cost Size Popularity) cache replacement algorithm, which uses the communication cost of fetching Web objects, the objects' sizes, their popularities, an auxiliary cache and a cache admission control algorithm. They conclude that LRU is preferable to CSP for important parameter values, that accounting for object sizes does not improve latency and/or bandwidth requirements, and that the collaboration of nearby proxies is not very beneficial.

Kumar et al. [11] proposed a new proxy-level Web caching mechanism that takes into account aggregate patterns observed in user object requests. The proposed integrated caching mechanism consists of a quasi-static portion that exploits historical request patterns, as well as a dynamic portion that handles deviations from normal usage patterns.

Rassul et al. [12] present two modified LRU algorithms and compare their performance with LRU. Their results indicate that the performance of the LRU algorithm can be improved substantially with very simple modifications.

Kaya et al. [13] evaluated an admission-control (screening) policy for proxy server caching that uses the LRU (Least Recently Used) algorithm. The results obtained are useful for operating a proxy server deployed by an Internet Service Provider, or an enterprise (forward) proxy server through which employees browse the Internet.
The admission-control policy classifies documents as cacheable or non-cacheable based on loading times and then uses LRU to operate the cache. The mathematical analysis of the admission-control approach is particularly challenging because it considers the dynamics of the caching policy (LRU) operating at the proxy.

Luo et al. [14] focused on making proxy caching work for database-backed Web sites. Several decisions must be made when designing a Web proxy cache, such as cache placement and replacement, and many proposals regarding these decision processes can be found in the literature. Houtzager et al. [15] proposed an evolutionary approach to find an optimal solution to the Web proxy cache placement problem, while Aguilar and Leis [16] addressed the replacement problem together with the cache coherency of proxy caching.

Radhika et al. [17] proposed an efficient Web cache replacement policy that uses the frequency and recency of references to an object to calculate a benefit value (bValue) associated with the object; the object with the lowest bValue is chosen for eviction. The policy is simpler to implement than other policies and does not require any parameter tuning. However, their results do not show much improvement over the Least Recently Used (LRU) and Least Unified Value (LUV) algorithms in terms of HR, BHR and DSR. Moreover, their proposed replacement policy does not consider the frequency of access of Web objects.

Jiang et al. [18] proposed a novel replacement algorithm, Low Inter-reference Recency Set (LIRS), which effectively addresses the limitations of LRU by using recency to evaluate the Inter-Reference Recency (IRR) of accessed blocks when making a replacement decision.

Later, Zhansheng et al. [19] proposed a novel replacement policy that switches between LRU and LFU at runtime. The implementation of this scheme is complex, which negatively affects performance.

Donghee Lee et al. [20] proposed a replacement policy called Least Recently/Frequently Used (LRFU), which subsumes both LRU and LFU and provides a spectrum of block replacement policies between them, according to how much more weight is given to recent history than to older history.

Adwan et al. [21] proposed a combined LRU+5LFU approach in which the weight of an object plays an important role, but calculating the weight of an object is a complex task.

As can be observed, many Web cache replacement policies have been proposed for improving the performance of Web caching. However, it is difficult to have an omnipotent policy that performs well in all environments or at all times, because each policy has a different design rationale and optimizes different resources. Moreover, combining the factors that influence the replacement process to reach a wise replacement decision is not an easy task, because a factor that is important in one situation or environment may be less important in others. Hence, there is a need for a combined approach that manages the Web cache efficiently and satisfies the objectives of Web caching. This is the motivation for adopting our Novel Hybrid Cache Replacement Policy to solve the Web caching problem.

III. PROPOSED NOVEL HYBRID CACHE REPLACEMENT POLICY

Fig.2: Cache division of the Proposed Hybrid Replacement Policy
C1 = first logical section of particular size of the cache
C2 = second logical section of particular size of the cache

Begin
    If (PageRequest is private)
        Send PageRequest to original server
    Else
        While (hash_value of cache != NULL)
        Do
            If (hash_value == URL_hash)
                Send page to client browser and place it at the top of the cache
        If (no match found)
            Send request to original server
            If (C1 != FULL)
                Shift every item and place the recent page on top of the cache
            Else
                If (frequency of lowest page in C1 > smallest frequency in C2)
                    Free the page in C2 having the lowest frequency and store the page from C1 in that space
                Else
                    Free the page that would move from C1 to C2
End

We divide the cache memory, as shown in Fig. 2, into two logical halves. The first half (C1) uses LRU and the second half (C2) uses LFU as its replacement policy, because recently used pages are always accessed first, and only afterwards are their numbers of occurrences counted. So, the LRU policy will be used in the first half (C1) and the second
half (C2) will use LFU. The criterion for a Web page to migrate from the first half (C1) to the second (C2) is that the frequency of the last element in the first half is greater than that of the page with the lowest frequency in the second half; otherwise the page from the first half is removed from the cache.

IV. PROPOSED SYSTEM MODEL IMPLEMENTING THE NOVEL HYBRID REPLACEMENT POLICY

Fig.3: Proposed Hybrid System Model

The proposed Caching System Model implementing the Novel Hybrid Replacement Policy is shown in Fig. 3, where we track both recently accessed and frequently accessed Web pages. Our first task is to secure the URLs in the cache from other users, so that no one can learn which pages users are requesting. For this we use the MD-5 hashing algorithm, so that no user can read the URLs stored in the cache. MD-5 is a widely used cryptographic hash function that produces a 128-bit (16-byte) hash value. An MD-5 hash is typically expressed as a hexadecimal number, 32 digits long. When the user makes a request, the URL is passed through the MD-5 hashing algorithm. Once the 32-digit hash value has been obtained, a cache lookup is performed; if a match occurs, it is counted as a cache hit and the page is sent to the requesting client. If no match is found, a miss occurs and the request is sent to the original server. After the response from the original server is received, the cache needs to store the page, so the URL hash value and the content of the requested page are stored in the cache. We have also removed redundancy from the cache so as to protect it from flooding attacks.

V. EXPERIMENTAL RESULTS AND DISCUSSIONS
This section presents the experimental results on the performance of the proposed Novel Hybrid Cache replacement policy. The dataset for testing our proposed policy was obtained from an Internet café with 10 computers within the BIT campus, which is extremely popular among students, faculty members and staff.
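
The hybrid policy of Section III, together with the MD-5 keyed lookup of Section IV, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' C implementation: the class name `HybridCache`, the `fetch` callback and the `private` flag are hypothetical, capacities count pages rather than bytes, a hit in C2 only increments the frequency count, and a page leaving C1 moves into C2 whenever C2 still has free space (the pseudocode is silent on these last two cases).

```python
import hashlib
from collections import OrderedDict


class HybridCache:
    """Illustrative two-section cache: C1 is kept in LRU order, C2 evicts by lowest frequency.

    Capacities count pages, not bytes (an assumption of this sketch).
    """

    def __init__(self, c1_size, c2_size):
        if c1_size < 1 or c2_size < 1:
            raise ValueError("both sections need room for at least one page")
        self.c1_size = c1_size
        self.c2_size = c2_size
        self.c1 = OrderedDict()  # key -> (page, frequency); last entry is the most recent
        self.c2 = {}             # key -> (page, frequency)

    @staticmethod
    def url_key(url):
        # 32-hex-digit MD5 digest used as the cache key instead of the plain URL.
        return hashlib.md5(url.encode("utf-8")).hexdigest()

    def get(self, url, fetch, private=False):
        """Return (page, hit). `fetch` is a hypothetical callback that contacts the origin server."""
        if private:
            # Private requests bypass the cache entirely.
            return fetch(url), False
        key = self.url_key(url)
        if key in self.c1:
            page, freq = self.c1.pop(key)
            self.c1[key] = (page, freq + 1)  # re-insert at the top of C1
            return page, True
        if key in self.c2:
            page, freq = self.c2[key]
            self.c2[key] = (page, freq + 1)  # assumption: the page stays in C2
            return page, True
        page = fetch(url)                    # cache miss: go to the origin server
        if len(self.c1) >= self.c1_size:
            self._demote_c1_tail()
        self.c1[key] = (page, 1)
        return page, False

    def _demote_c1_tail(self):
        # Take the least recently used page out of C1.
        old_key, (old_page, old_freq) = self.c1.popitem(last=False)
        if len(self.c2) < self.c2_size:
            # Assumption: free space in C2 is used before any comparison is made.
            self.c2[old_key] = (old_page, old_freq)
            return
        victim = min(self.c2, key=lambda k: self.c2[k][1])
        if old_freq > self.c2[victim][1]:
            # The C1 page is more popular: it replaces C2's least frequent page.
            del self.c2[victim]
            self.c2[old_key] = (old_page, old_freq)
        # Otherwise the C1 page is simply dropped from the cache.
```

A caller would create, say, `HybridCache(2, 1)` and call `cache.get(url, fetch)` for each request in a trace; counting the `True` results and dividing by the number of requests gives the fraction of requests served from the cache.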
We executed our program, written in the C programming language, on the obtained dataset. We constructed a trace-driven simulation to study our proposed Hybrid policy using a set of Internet users' traces. In our experiment the traces cover the period from 01/March/2013:09:20:04 to 01/March/2013:21:05:02, Monday of one busy working day. The trace is composed of 1,165,845 Web requests from a total of 72 users on that day. The simulations were performed at different network loads.

In order to evaluate our proposed Hybrid policy, we use two performance metrics, Hit Ratio (HR) and Latency Saving Ratio (LSR), which are the most widely used metrics in evaluating the performance of
Web caching. HR is defined as the fraction of requests that can be satisfied by the cache. LSR is defined as the ratio of the sum of the download times of objects satisfied by the cache to the sum of all download times. Let $N$ be the total number of requests (objects), with $\delta_i = 1$ if request $i$ is in the cache and $\delta_i = 0$ otherwise. Mathematically, this can be expressed as follows:

$$HR = \frac{\sum_{i=1}^{N} \delta_i}{N} \qquad \text{(i)}$$

$$LSR = \frac{\sum_{i=1}^{N} t_i \delta_i}{\sum_{i=1}^{N} t_i} \qquad \text{(ii)}$$

where $t_i$ is the time to download the $i$th referenced object from the server to the cache.

A higher HR indicates greater user satisfaction and improved servicing of users. A higher LSR, on the other hand, improves network performance and reduces user-perceived latency (i.e. bandwidth savings, low congestion etc.).

We keep track of the HTTP requests made by each computer and find the total number of hits for all three policies (LRU, LFU, combined LRU and LFU) with 512 MB of proxy cache. To evaluate the effectiveness of the proposed policy, we executed our program on a system equipped with a 32-bit Intel Core 2 Duo 3 GHz processor, 6M cache, 4 cores and 4 GB of RAM, using the proxy server dataset obtained from the cyber café. The results obtained are better than those of LRU and LFU used individually in terms of Hit ratio and Latency saving ratio. The proposed policy was the closest to the theoretical optimum.

Fig.4: Hit ratio (HR) calculated with all three policies (LRU, LFU, LRU+LFU) per hourly interval from 9-10 am to 06-07 pm; vertical axis from 0 to 0.5

Fig.5: Latency saving ratio (LSR) calculated with all three policies (LRU, LFU, LRU+LFU) per hourly interval from 9-10 am to 06-07 pm; vertical axis from 0 to 0.6

Fig. 4 and 5 show the results of our Hybrid policy compared to the two other policies, with Hit ratio and Latency saving ratio as the performance evaluation parameters. As time goes by on a busy working day, our proposed Hybrid policy outperforms each of the other two policies, in the case of both same-size and larger caches.

To elaborate further with respect to HR, we observed that over the course of the busy day the average Hit ratio of LRU is 0.32, that of LFU is 0.29 and that of our combined approach is 0.37. Concerning the LSR,
the average Latency saving ratio is 0.34 for LRU, 0.32 for LFU and 0.42 for our combined approach. Looking at the data, we find that, relative to the other two policies, our policy increases the Hit ratio by 19% on average and the Latency saving ratio by 27%.

VI. CONCLUSION AND FUTURE SCOPE
In recent years, Web caching has been an active area of research because of its several advantages. As the number of Internet users has increased exponentially, the problems of Web traffic, increasing bandwidth demand and server overloading have become more and more serious. Many efforts have been made to improve Web performance, such as reducing Web latency and alleviating server bottlenecks, but the existing solutions are not adequate to tackle this problem. Motivated by this, we have proposed a Novel Hybrid policy for Web proxy caching that combines LRU and LFU in an efficient manner to reduce the network latency in accessing particular Web services. As the performance of the cache is improved, overall network traffic, user-perceived latency and network congestion will be reduced.

Increasing the size of the cache can improve the hit ratio, but beyond a certain value the improvement is small. Increasing the cache size beyond the optimal value to accommodate more Web documents may increase the latency of searching for a Web document in the cache. Hence an average cache size is more desirable. Also, highly dynamic documents need not be kept in the cache, because those documents are modified frequently. This work may be extended in future by considering the dynamism of Web content.

Today the available caching techniques are usually designed with specific types of Web site in mind, so the need for a standard benchmark cannot be ruled out. In all caching schemes, content replication, update propagation and consistency management are always a concern to the management.
Further, there is a need to develop a feedback mechanism that can be used to tune the page pre-generation process to match the current system load. Designing a set of benchmarks based on these metrics will help designers decide which caching technique is most suitable for them. Most often a combination of one or more caching solutions is used for a site. Dividing the Web site into components and using a suitable caching technique for each component should be an ideal solution. The optimal replacement policy aims to make the best use of available cache space, to improve cache hit ratios and to reduce the load on the origin server. Further, we must resort to an approach that can predict future user requests and retain the most valuable objects in the cache. The performance of Web caching will be significantly improved by integrating Web prefetching, which refers to the process of deducing clients' future requests for Web objects and fetching those objects into the cache, in the background, before an explicit request is made for them. Our future research work is in progress in this direction. The research frontier in Web performance improvement lies in developing an efficient, scalable, robust, adaptive and stable Web caching scheme that can be easily deployed in current and future Web infrastructure.

ACKNOWLEDGMENTS
Our thanks to Biresh and Robin of the cyber café named "Technosoft" of BIT, who helped by providing the proxy server logs of a busy working day.

REFERENCES
[1]. Sarina Sulaiman, S.M. Shamsuddin, A. Abraham, S. Sulaiman, "Web Caching and Prefetching: What, Why, and How?", IEEE, 2008.
[2]. A.K.Y. Wong, "Web Cache Replacement Policies: A Pragmatic Approach", IEEE Network magazine, 20(1), (2006), pp. 28-34.
[3]. A. Balamash, M. Krunz, "An Overview of Web Caching Replacement Algorithms", IEEE Communications Surveys & Tutorials, Second Quarter, 2004.
[4]. S. Podlipnig and L. Böszörmenyi, "A Survey of Web Cache Replacement Strategies", ACM Computing Surveys, Vol. 35, No. 4, pp. 374-398, 2003.
[5]. Sirshendu Sekhar Ghosh and Dr. Aruna Jain, "Web Latency Reduction Techniques: A Review Paper", IJITNA, vol. 1, no. 2, pp. 20-26, September 2011.
[6]. Martin F. Arlitt and Carey L. Williamson, "Trace-Driven Simulation of Document Caching Strategies for Internet Web Servers", Simulation, vol. 68, Jan. 1997, pp. 23-33.
[7]. Martin F. Arlitt, Ludmila Cherkasova, John Dilley, Rich Friedrich, Tai Jin, "Evaluating content management techniques for Web proxy caches", SIGMETRICS Performance Evaluation Review 27(4): 3-11 (2000).
[8]. S. Williams, M. Abrams, C.R. Standridge, G. Abdulla and E.A. Fox, "Removal Policies in Network Caches for World-Wide Web Documents", Proceedings of the ACM Sigcomm96, August 1996, Stanford University.