hitherto input. It is, again, surprising to discover that online cost minimization for deferrable upload under percentile charging, even when defined over a single link from one source to one receiver only, is still highly non-trivial, exhibiting a rich combinatorial structure, yet has never been studied before in the literature of either computer networking or theoretical computer science (with the one exception below) [5].

The only study of the online cost minimization problem under percentile charges that we are aware of is a recent work of Golubchik et al. [5], which focuses exclusively on the single point-to-point link case. The online algorithm they present, referred to as Simple Smoothing here, is extremely simple, and involves evenly smoothing every input across its window of tolerable delay for upload. Nonetheless, this seemingly straightforward algorithm is proven to approach the offline optimum within a small constant under the MAX model.

In this work, we first design our own online algorithm for a single link, also adopting the MAX model, in preparation for the MapReduce data processing case. Based on the insight that Simple Smoothing ignores valuable information, including the maximum volume recorded so far and the current amount of backlogged data and their deadlines, we tailor a more sophisticated solution, which incorporates a few heuristic smoothing ideas and is hence referred to as Heuristic Smoothing. We prove that Heuristic Smoothing always guarantees a competitive ratio no worse than that of Simple Smoothing, under any possible data arrival pattern. Theoretical analysis shows that Heuristic Smoothing can achieve a worst-case competitive ratio between D+ and ( e ), where D is the tolerable delay.

We further extend the single link case to a cloud scenario where multiple ISPs are employed to transfer big data dynamically for processing using a MapReduce-like framework. Data are routed from the cloud user to mappers and then reducers, both residing in potentially different data centres of the cloud [6]. We apply Heuristic Smoothing as a plug-in module for designing a distributed and randomized online algorithm with very low computational complexity.
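As a rough illustration of the Simple Smoothing rule described earlier in this section (spreading every input evenly across its window of tolerable delay), the sketch below may help. It is not taken from [5]: it assumes the window of a workload arriving in slot t consists of the D + 1 slots t, t+1, ..., t+D, and the function names `simple_smoothing` and `max_charge_volume` are hypothetical.

```python
from typing import List


def simple_smoothing(workloads: List[float], delay: int) -> List[float]:
    """Per-slot upload volume when each workload is split evenly.

    Assumption (not stated in the source): a workload arriving in slot t
    is spread over the delay + 1 slots t, t+1, ..., t+delay.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    traffic = [0.0] * (len(workloads) + delay)
    for t, volume in enumerate(workloads):
        share = volume / (delay + 1)
        for d in range(delay + 1):
            traffic[t + d] += share
    return traffic


def max_charge_volume(traffic: List[float]) -> float:
    """Charge volume under the MAX model: the largest per-slot volume."""
    return max(traffic, default=0.0)


# Illustrative input only: two bursts, tolerable delay of two slots.
workloads = [6.0, 0.0, 3.0, 0.0]
smoothed = simple_smoothing(workloads, delay=2)
print("peak without smoothing:", max_charge_volume(workloads))
print("peak with smoothing:   ", max_charge_volume(smoothed))
```

In this toy input the two bursts overlap in one slot, so the smoothed peak is the sum of their per-slot shares rather than the size of the largest burst.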

The competitive ratio guaranteed by the randomized online algorithm increases from that of Heuristic Smoothing by a small constant factor.

Extensive evaluations are conducted to investigate the performance of the proposed online algorithms. The results show that Heuristic Smoothing performs much better than Immediate Transfer (IT), a straightforward algorithm that ignores temporal smoothing. Meanwhile, Heuristic Smoothing also achieves smaller competitive ratios than Simple Smoothing does. In most cases tested, the observed competitive ratio of Heuristic Smoothing is smaller than.5, better than the theoretical upper bound, and relatively close to the offline optimum. Such superior performance is attributed to less abrupt responses to highly volatile traffic demand. Empirical studies for the cloud scenario further verify the efficacy of the randomized cost reduction algorithm, in terms of both scalability and competitive ratio.

In the rest of this paper, we discuss related work in Sec. II, and introduce the system model in Sec. III. Heuristic Smoothing and the randomized algorithm for the cloud scenario are designed and analyzed in Sec. IV and Sec. V, respectively. Evaluation results are in Sec. VI. Sec. VII concludes the paper.

II. RELATED WORK

Similar to deferring data upload to minimize the peak bandwidth demand, there have been studies on scheduling CPU tasks to minimize the maximum CPU speed, which is closely related to the power consumption. Yao et al. [] initially provide an optimal offline algorithm, the YDS algorithm, which minimizes power consumption by scaling CPU speed under the assumption that the former is a convex function of the latter. Bansal et al. [] further propose the BKP algorithm, with a competitive ratio of e, for minimizing the maximum speed when facing arbitrary inputs with different delay requirements and arbitrary workload patterns. Towards new challenges brought by the proliferation of multi-core processors, Albers et al. [3] design an online algorithm for multi-processor job scheduling without interprocess job migration. Bingham et al. [4] and Angel et al. [5] further propose polynomial-time offline optimal algorithms, with migration of jobs considered.
Greiner et al. [6] generalize a c-competitive online algorithm for a single processor into a randomized $cB_\alpha$-competitive online algorithm for multiple processors, where $B_\alpha$ is the α-th Bell number. Different from the MAX traffic charge model in this work, they focus on total-volume-based energy charges computed by integrating instantaneous power consumption over time.

In recent years, data centre workload scheduling with deadline constraints has been extensively studied in the cloud computing literature. Gupta et al. [7] analyze the energy minimization problem in a data center when available deadline information of the workload may be used to defer job execution for reduced energy consumption. Yao et al. [8] tackle the power reduction problem with deferrable workloads in data centers using the Lyapunov optimization approach, for approximate time-averaged optimization.

A few studies exist on the transfer of big data to the cloud. Cho et al. [9] design a static cost-aware planning system for transferring large amounts of data to the cloud provider via both the Internet and courier services. Considering a dynamic transfer scheme where data is produced dynamically, Zhang et al. [6] propose two online algorithms to minimize the total transfer cost. Different from this work, they assume mandatory immediate data upload, and adopt a total-volume-based charge model instead of the percentile charge model.

Goldenberg et al. [9] study the multihoming problem under 95-percentile traffic charges. Grothey et al. [] investigate a similar problem through a stochastic dynamic programming approach. They both leverage ISP subscription switching for traffic engineering, so that the charge volume is minimized. However, data traffic in their studies cannot be deferred. Adler et al. [] focus on careful routing of data traffic between two types of ISPs (Average contract, Maximum contract) to pursue the optimal online solution, leading to an online optimization problem similar to the classic ski-rental problem. Golubchik
et al. [5] study the minimization of transmission cost by exploiting a small tolerable delay when ISPs adopt a 95-percentile or MAX charge model, focusing on a single link only, and proposing the Simple Smoothing algorithm.

III. SYSTEM MODEL

We consider a cloud user who generates large amounts of data dynamically over time, required for transfer into a cloud or a federation of clouds for processing using a MapReduce-like framework. The mappers and reducers may reside in geographically dispersed data centres. The big data in question can tolerate bounded upload delays specified in their SLA.

A. The MapReduce Framework

MapReduce, initially unveiled by Dean and Ghemawat [], is a programming model targeted at efficiently processing large datasets in parallel. A typical MapReduce application includes two functions, map and reduce, both written by the users. Map processes input key/value pairs, and produces a set of intermediate key/value pairs. The MapReduce library combines all intermediate values associated with the same intermediate key I and then passes them to the reduce function. Reduce then merges these values associated with the intermediate key I to produce smaller sets of values.

There are four stages in the MapReduce framework: pushing, mapping, shuffling, and reducing. The user transfers workloads to the mappers during the pushing stage. The mappers process them during the mapping stage, and deliver the processed data to the reducers during the shuffling stage. Finally, the reducers produce the results in the reducing stage. In a distributed system, the mapping and reducing stages can happen at different locations. The system will deliver all intermediate data from mappers to reducers during the shuffling stage, and the cloud providers may charge for inter-datacentre traffic during the shuffling stage.

Recent studies [], [3] suggest that the relation between intermediate data size and original data size depends closely on the specific application. For applications such as n-gram models, intermediate data size is much bigger, and the bandwidth cost charged by the cloud provider cannot be neglected. We use β to denote the ratio of original data size to intermediate data size.

B. Cost Minimization for MapReduce Applications

We model a cloud user producing a large volume of data every hour, as exemplified by astronomical observatories [6]. As shown in Fig., the data location is multi-homed with multiple ISPs, for communicating with data centers. Through the infrastructure provided by ISP m, data can be uploaded to a corresponding data centre DC m. Each ISP has its own traffic charge model and pricing function. After arrival at the data centers, the uploaded data will be processed using a MapReduce-like framework. Intermediate data need to be transferred among data centers in the shuffling stage. Towards a general model, we again assume that multiple ISPs are employed by the cloud to communicate among its distributed data centers, e.g., ISP A for communicating between DC and DC, and ISP B for communicating between DC and DC3. If two inter-DC connections are covered by the same ISP, it can be equivalently viewed as two ISPs with identical traffic charge models.

Fig. An illustration of the network for deferrable data upload under the MapReduce framework. (Labels in the figure: User Location, Data Sources, Mappers, Reducers.)

The system runs in a time-slotted fashion. Each time slot is 5 minutes. The charge period is a month (30 days). M and R denote the set of mappers and the set of reducers, respectively. Since each mapper is associated with a unique ISP in the first stage, we employ $m \in M$ to represent the ISP used to connect the user to mapper $m$. All mappers use the same hash function to map the intermediate keys to reducers [3]. The upload delay is defined as the duration between when data are generated and when they are transmitted to the mappers. We focus on uniform delays, i.e., all jobs have the same maximum tolerable delay D, which is reasonable assuming data generated at the same user location are of similar nature and importance. We use $W_t$ to represent each workload released at the user location in time slot t.
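
To make the time-slotted charging setup concrete, the following sketch computes a charge volume from per-slot traffic under a percentile model, with the MAX model as the 100th-percentile special case. It follows one common billing convention (sort the slot volumes and take the value at rank ceil(q·n/100)), which is an assumption here rather than a definition given in this paper. The name `charge_volume` is hypothetical, and the slot count assumes 5-minute slots over a 30-day charge period.

```python
import math
from typing import Sequence

# Assumed from the setup above: 5-minute slots over a 30-day charge period.
SLOTS_PER_PERIOD = 30 * 24 * 60 // 5


def charge_volume(slot_volumes: Sequence[float], percentile: float = 95.0) -> float:
    """Charge volume of one charge period under a percentile model.

    Convention assumed here: sort the per-slot volumes in ascending order
    and return the value at 1-indexed rank ceil(percentile * n / 100).
    percentile = 100 gives the MAX model.
    """
    if not slot_volumes:
        raise ValueError("need at least one time slot")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(slot_volumes)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


# Illustrative data only: 20 slots carrying volumes 1, 2, ..., 20.
volumes = list(range(1, 21))
print("slots per charge period:", SLOTS_PER_PERIOD)
print("95-percentile charge volume:", charge_volume(volumes, 95))
print("MAX charge volume:", charge_volume(volumes, 100))
```

Under this convention the 95-percentile model ignores the busiest 5% of slots, whereas the MAX model charges for the single busiest slot, which is why smoothing the peak matters most under MAX.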

Let $x^m_{d,t}$ be a decision variable indicating the portion of $W_t$ assigned to mapper $m$ at time slot $t+d$. The cost of ISP $m$ is indicated by $f_m(V_m)$, where $V_m$ is the maximum per-slot traffic that goes through ISP $m$. To ensure all workload is uploaded into the cloud, we have:

$$0 \le x^m_{d,t} \le 1, \quad \forall m \in M. \tag{1}$$

$$\sum_{m \in M} \sum_{d=0}^{D} x^m_{d,t} = 1, \quad \forall t. \tag{2}$$

Given the maximum tolerable uploading delay $D$, the traffic $V^t_m$ between the user and mapper $m$ in time slot $t$ is:

$$V^t_m = \sum_{d=0}^{D} W_{t-d}\, x^m_{d,t-d}, \quad \forall m \in M. \tag{3}$$

Let $V_m$ be the maximum traffic volume of ISP $m$, which will be used in the calculation of bandwidth cost. $V_m$ satisfies:

$$V_m \ge V^t_m, \quad \forall t. \tag{4}$$

We assume that ISPs in the first stage, connecting the user to mappers, employ the same charging function $f_m$, and ISPs in the second stage, from mappers to reducers, use the same charging function $f_{m,r}$. Both charging functions $f_m$ and $f_{m,r}$ are non-decreasing and convex. We further assume that the first stage is non-splittable, i.e., each workload is uploaded
