you per 100 100 of the but thank you so 1st I like to thank the organizer for inviting me to this great workshops and this will be about a statistical model called the Newtown bending model in which an agency interacts with a set of probability distribution called on by sequentially something from them this month and is often used the within the reinforcement running framework In this sense as the sample collected of you and everyone else that the agents wants to maximize or equivalently we say that you want to minimize the quantity ,comma regrets and why it is request minimization problem is can be consider as solvent because of some of lower bond fund a matching there a lot of Long Island talk today about so much less than 2 problems that of best and notification in what the goal is to identify as quickly and accurately as possible the arm with the highest a within the mud and insist joint work riesling and you will present a new albums for these problems together with lender reasons that s into declaring that she is also bond and that is also very efficient in practice so 1st I will start by reminding you always explaining you when the military the bandit model years so it is simply a collection of capable of meeting traditions that we called on and then the Canadian sequentially interacts with this arm by choosing at times and number 8 seed that he wants to draw on many observers sample from The Associated Breweries year distribution so of course he is something strategy is going to be sequentially in the sense that the armed chosen at time to press 1 now depends only in some arbitrary way of the past chosen on a 1 to 18 and the best that absorb some X 1 of 2 x 2 and this is the way sample the arm will be a directed to wild good related to learning which arms of the best in the modern on our criterion for best will be at the Army's highest me especially ones were intensified on a stock that much amazing and wounding O'Brien new started the mean of this and this learning process can US-Iraq constraints and actually the 1st constraint that has been considered in the literature in the 1

02:51

regrets minimization in which the samples collected are viewed as some of our reliance on which the goal is to adjust the something strategy so that no 1 maximizing is expected some of the rewards accumulating during interaction the is equivalent to minimize as a quantity that I defined here as the regrets which is expecting difference between accumulated through what 1 could obtain if we played always the arm new stock that we of course don't know that during the the real situation minus the sum of accumulating that we want to obtain which are actually a strategy to minimizing the regret forces to realize a trade-off between the exploring environments trying a little bit all the arm to get an estimate of the Indian payout on trying to play the onset the best so far because we have this constraint of maximizing the rewards 1 leader so regionally this model arose from a simple monitoring of clinical trials of these things back to the 1970 is the work of love Thompson and as for example we can and we have a bunch of many get treatment so is associated with a boundary random variables that models the viability of the treatment present patients and that gives 1 if the treatment was successful all of the euro stationed there and the goal of course would be in the Dominican traders to maximize the number of patients still alive at the end of the of the trade but actually if you will so this would be a rephrasing of match expected some of but they should discuss with people running real clinical trial because they actually in the early stages are naturally concerned with securing patient while the Rams efficiency of the treatment and they would be it more interesting by a licensee of objective that is quickly which treatment is the best among the pool of candidates treatment because it later in the next phase is this treatment will be given to a much larger size of page so they would be more interested by this this town instigation problem that they introduced here and at a meeting more mathematically the goodies to notify quickly and is on a stock so that only the highest mean but this time without the incentive to grow arms at the higher mean so we are only looking for a strategy that obscene many explorers as the environments so the strategy as losing an important part in the strategy was standing in the way that's we choose the we want to grow at the current stage based on the previous history but it would also mean that some of stuff some random stopping time telling us when we can stop the experiments when we are convinced that we can identify the best arm after which we make against soliciting recommendation work so I guess the Eyharts for based on a study and so several doors have been considered in the literature of Slate gift to here so I 0 we're gonna fix a budgets so we know that we can only samples can be touchy times a the arms and then we want to make Europe a recommendation that is as accurate as possible so that minimizes the probability that we make a mistake or are we are we think some risk parameters that up and we want to guarantee that the recommendation we making is wrong with probability at most Delta and to which this recommendation uh we want we want need ask you as you sample from the army explicit lessening remains which recorded here that the sum but completes disease framework users can be a Gambardella clinic at trial as they said but would be making more reverence some of the market's search application in which for example of company has to decide which products and want to commercialize anti-EU goes we want students face the best product with hybrid tea the use of the company is looking to elucidate a bit of money during the grounding in older than to make it much more much more money so it is that we will focus on the fixed confidence setting and more precisely given the class of militants set efforts of placebo lamented Monday for example all the arm down the distributions we want to build a strategy that are better back on this class that is we can guarantee is that for any of militants this classy two-wheeler outputs of correct as the best on whispery the larger than the 1 and we were among back-to-back strategies that we want to minimize the sample complexity of In stock I want to tell you what's in the meaning and the expected number of samples that we need to work for a balance and reason and the answer will consist In 1st novel belongs on those it's so simple complexity and then in exhibiting and a debt that back strategy for as expected somebody complexity match so here it will be a distribution dependent global so depending on the classification of management as if we consider and we solve this problem for a particular type of 1 parameter of Monday Monday so that we call the explanation for me and then wouldn't know it from sinking 1 2 things that are always X and then OK so we

08:54

steady a class of management and which the distribution of all arms the belonged to some sets of probability distributions that are parametrized based on a real parliament the tackle the natural perimeter and whose density of the following form and so this class of distribution and if we particularities of the function here we can recover tolerant of well-known distribution like it's Melanie distribution question distribution Gaussian with and so forth and a good teacher and the user class of 1 parameter mother is that actually the distribution you think back and be also retirement tried by its because there is a one-to-one mapping between is a natural part of the debt and the means and as our promotor of interest in the budget problem is the minimum we will denote by news of so you need distribution in the class speed and that doesn't mean that you and I introduced here an important important notion to characterize the complexity of my problem in terms of the information parity quantity is here and the kid back label divergent Smith's Weiner so to apologize by demeanor the distributions so we introduced here the of new new prime as a came back later evidence between the Yining distribution of me and the union distribution of millions of new products so we can give across form for the particularity example that I mentioned and here I did it for them the dominant and so the the class of abundant model that we consider we business-class so too is invitation and I'm going to consider at Bundy program of the following farm where so each Of the our dependence on parameter I will intensify this is a set of distribution with the victim of of their means and so I would consider all the money problem for which there exists 1 arm that is strictly larger whose ministry louder than says a set of 4 management and that have you a unique opportunity OK so before

11:10

presenting the other abound on the matching algorithm I'm going to review quickly but why I said in the introduction that the regrets minimization objective is solved and we will find some inspiration about what we want to prove for the best on instigation probe so it regrets mediation be well and unimportant results that the inspectors 19 eighties is our bond that base being given by the name remains on the on the regrets and that for forms the basis of a rewriting of the regret as a function of the number of times each hour as being up to 20 so more precisely the regret is the son of a the on all so here we have this opportunity to get to the gap between the need of the best on the transmission of on a multiplied by is expected number of times that aim has been drawn up to a two-time and so have another reason with hero regrets we needed and reason for which expected number of droves of any anti-Semitism on is small and so the allowed the flying reviews give us and limits on the number of times we .period was a sum of 2 not out and that is that we need to draw an infinitely many and specific unique that's expected number of droves of visible tonight on any of the 20 U.S. entity can you know upon the availability divided by D of New Way New Stossel so here's the user can better brother virgins between the distribution of new way under this provision of the new stock at years of course we understand that this more these quantities of the larger of the 2 the honest to be drawn because we troubles to discriminate between Zia's armed groups tonight and so the flow of arms dummies to define motion of a seek to hasn't another that matches this slower Bonsall for which the expected number of rules of any 79 on a is as as security of problems and they look to embrace the right information theory sequences so could the

13:34

show that there exists such innocent particularly of the militarism that is based on the so-called UCD principle so it is a very simple algorithm that computes 1 index that homage to these arms with ideas index and the index will be some of our confidence on the meaning the Nunavut found and action in order to get this was a several UCB diversified being proposed that in order to get the asymptotic of humanity prepared 1 has to be careful in the way we knew the confidence interval and here you see we have a non-explicit and of our confidence bond that is computed using the function the that is sensitive matters of evidence in the exponential family that we consider and so these types of confidence in Thailand followed by outplaying the chairman of MFS and reliance on specific property of the expansion of but just practical to compute the index we just need to work there have so sold the function of the of the museum 3 Kenyan acts as a function of X and then threshold at some level of stimulated by the number of jobless and these new vessels or the so-called carry on using the index and so the scale agrees I'm being shown to be a reference to ticket to to a finite time analysis so we with at bonds so it was a problem in the newspaper as expected number of growth of the world to map out a is a problem in violent generated by the new a new stock plus some 2nd order term that is more than that so this problem that following we

15:22

showed that so that in the mom order consistent and reason of the units of the regretted the Mighty the is exactly equal to the sum of as the I'm off to the gap between the mean as the best of times in uniform and dividing by the humanitarian divergent between you and and so this results dates back to to 1985 because additional was allowable compilation remains actually proposed and the algorithm hasn't ability might have arisen however as algorithm was naturally explicit operatic and sold more more efficient and where proposal afterward once you have some idea is unlikely to be that are simple to implement Unocal eccentricity of 2 so far the best-known notification problems as we are looking at today I'm going to try to provide the is the 1st 2 steps and also we would see that's centering on which still be efficient to implement incorrect so I quickly remind you about the best and instigation problem so whenever I will work with somebody .period that modern parametrized venue I will assume for simplicity is that the on all there in decreasing way and there is a gap between you and and a new tool and as I told you all the other reason is made of 3 thinks there's something wrong the stopping work under the recommendation and we have to guarantee that for any millennial several energy that's uh 0 recommendation ruled out sports facility my answer is louder than minus time and we want to have a small sample complexities was more expected number of withdrawal of these so in churches we can find a lot of that that agrees for which bonds on December the complexity and events either in expectation always hyperbole to amend the existing bonds as scalar exist so that all the orders of magnitude of the symbol completed the user on 1 of their top where their ties all risk parameters multiplied by quantities that depends on the on the bonded with them and that takes the form of a sum of the arm of so that here we have the son of a on the set arm Of the square distance between the best this town and here we have the distance between the here almonds on the second-best so this looks like someone of the square gap between the news in the Delaney gave at least look like it's some Gaussian approximation of this said the function of the user collected remembered so somehow we miss quantities of information theory turns found that sets indentified and press here in the videos are allowed to offer sometimes even non-explicit consistent that are hiding and so my goal is really need to have it all belong on a problem that match up 2 constant and involves some information theoretic did so in this sense we can say is that the book to some complexity is that certain items so let's

18:49

try to lessen the neural bonds fall from these problems so we 1st introduced you some useful tools to the right along and we directly as a at the lower 1 for for this program solo albums for either regret minimization of at all based on notification realize on the so-called change of measure arguments so the idea that if we want to bombs a number of samples immediately on some specific abundant Mandela knew that we would need to find another mandate instance lined up under which the behavior of their must be In a quite different and under which the on the on needs to be To be drawn little bits and search engine distribution can be quite technical but here I propose a use for and I we derived recently to get them all in getting here that's relates in a quite explicit is risk bomb at a better time to the expected number of the of the through that connected the emergence of a economic aid under a that you are and their mother Yolanda and this inequality also for every Monday more than that that's as a different but tonight I'm compelled to the to the origin and abundant so the results that their stands and some of the arm of the expected number of Jerome misapplied information tone is no abandoned by so this quantity is a binary related entropy between data 1 Maine is that that that is roughly of father of 1 of the positions of order of 1 of of the things that we saw in the state of the art by sunlight tool explain you how to use this this result to the arrival so we will 1st try to so as we needed all brand and the number of sunburn mediated the 1st idea would be to separately bonds a number of times each on should be wrong and for example we fix here economic aid but belongs to the 2 OK so 79 on a and we want to lower bonds expected number of times this arm has been drawn to the idea here is that if we choose about amended Milan that in which only a few of these terms are non-zero will directly other along the on the expected number of draw any and so what we choose here is abundant Milan London in which sold for on a difference of a London ITV acquired you I for the corresponding information down is 0 on farm aid we moved the minuet slightly above the median of the tumor unsolicited to new ones so unknown disbanded mother along that sold tonight on is no on a whereas in the 1st half of the original but look my arm was on 1 so this condition is satisfied and many free ride this inequality for this party cannot choice of film that we get and the expected number of drove a misapplied by the of new and new I new 1 subsidiary is no longer the brighter OK and it's 1 minus their top leaders give us the following lower bond funds expected number of draws a farm and uh that's we when we take a it's even with goes to to the year so as we showing these arguments

22:39

is a failing so we have to repeat their demand for also would run out and then the losses something for the for the ultimate so we had that flying that ever get is simple complexity is no Monty Beisel positions roughly 1 of of the top as latest quantity misapplied basis complexity terms and so here we have something like looks like an equivalent of the Elaine Rabin's so long because here we would have sold the distance between new on the Tonight on here the distance between the 2 men armed and unarmed so for sometimes believe that this was the right lower development and that we could find in gruesome matching it that found that that this settlement is not tight enough and actually and we can derive

23:33

noting that no along actually it would be a streamlined proved so the idea is if we start with the very same near Mosul the various same chain of distribution actually in the previous proof which chose a specific values of London on roads the statement for value but we can in principle as the source for every hour of every minute model that has a different ultimate armed with guns and the difference is that the odds of from you as all blended might have known that that there were different to on and then it all and that the inhuman overrun that in this set of bonded mother of so this sun is smaller than the lower end of day that and so as we want to lower bonds assembled complexity we just artificially introduced it so we multiply and divide by the expectation of of and so at this stage run that's completely happy because this quantity the bans on the algorithms for expected number of rules of the island and so the ideas as to simply a problem by something that does not depend on the agrees and if we notes that this quantity uh sums to 1 they form probability victim we got problems basis supremo over on W in the simplex of space Casals doesn't sums to walk to 1 of these quantity and so we have prove in Sri Lanka as a very simple yet non-exclusive no longer on the complexity of the somebody complicity civilians that's the but complexity is lower bounded by the star of meal so some characteristic of a number of samples of the problem mich applied by 1 of the staff where soldiers far as is known explicit for and this actually reminds us from some None explicitly elements that do exist in the abundant literature so the first one given by GM in lie in 1997 was a regrets that fall very complex modern whispers about a correlation between arms it was somehow natural to have something less expensive than the line remains long while saying on results he's a recent paper from last year that studies the best amenities mediation problem but for a different class of bending so whereas we studied all that ended Monday with a single team on their where concerned with Monday in which only 1 arm is different from the others so you have 1 of the most ominous set of farms at all at the same values and so far specific class as that is different from ours they also derived some non-explicit too so even going back to offsetting what is actually very interesting with his bond is that we understood from the proof that these W any where substitute for as a proportion of droll so there was ratio Of the expected number of draw of a debate in those expecting them the dilemma of some bosses they can be viewed as a proportion of drove of on any and some of the messages so if we take that the new staff knew that you realize is that unlike here so we expect this to contains 0 Overton proportion of droves of the so at this stage you could ask me a question so can I really defined the I I never told you that these on my is unique and selects I needed to justify this definition is to show you that this is a unique and so well defined and actually has a by-products we will come up with some efficient algorithm to compute the of disease non-exclusive value so we can start by being a little bit less emissions and just fixing some of the men star in the southern Max and actually held because while working with this exponential familiar about model but we can we can try to give a bit more explicit formulation of the 2nd optimization problem here in the land that can actually an explicit calculation yelled that this is the way who would the minimum for somebody arm and saw a different of 1 of these quantity which is awaited some of you might divergent and actually if you've if we fucked W 1 in this expression and we can see that this is not simply a function of the ratio between the W a and W 1 through a function the G 8 since we defined here under the bed because tougher the revisions show you that this function GA and it is a one-to-one mapping between than a summons Nevada 0 at the new Wembley New aid between be useful in the industry quelled and to refrain is any 20 tomorrow more explicitly we are going to introduce the following the patient so we will work with excess stocks the ratio between and the value a stock and W Weinstock and so is this notation using that as the sum of the value a equals 1 and we have that to W 1 New Media 1 divided by the the sum of of the X 8 and so finally we will if we want to find those ex-staff 1st we are looking at the X 2 1 X K case Tom we see in this so these comments and so the next step is to realize that as here we have a minimum came 1 function it is easier to check that adds a lot to learn all of this came 1 function I have to take the the same values and so there exists some of the real value of the painting by away stock for which the ex-mayor of X and the attainment of peace went to work to these value and so finally had the ability innovation problem that we are solving a can completely be reduced to 0 one-dimensional optimization of problems so we just knew this defining the X fat ex-CIA as the universe function of his previously introduced function here but we have to find out why Starr that maximizes this function over doing about as you roll at the new when mute and actually it is possible to compute the theory that he was functional and solidly the equation so derivative Equus 0 we can prove that

30:54

and so on we can prove that

30:57

there is a unique points relating the on next which show us that there is a unique opportunity awaits number you stop and so on

31:06

we true love recently we have the following year the the paper said that characterize the value of their values staff new intimacies dysfunction explains introduced and so showing that since the computation of why stock a reduced to solving a real equation so F F of widely in the summer of increasing function so we can just solved this by I don't know the couldn't accept and then the evaluation of the function will be also reduced from solving it came in a smart 1 music which so in the end we have an efficient way to compute this important victory victories star of of and sold the idea Of that the reason that we propose to attain the the along will be to try to match the user disproportion and indeed that is what aura tracking that's something more easily is doing so the idea that's still a given stage of formal and the reason we have farms evict or a new hats of the on the beginning of all the answer based on the rule the rules that we have so far and so the trucking something more 1st check where is there there is an arms that have been drawn less than a square root of 50 times at 20 Is this the the case then we're going to draw such arms so this is going to foster exploration face and if all of the AMA had been drawn more than Rudy at times while going to choose the arms that maximizes the sea of W a star but course all on-duty gala of victory of empirical news minus energy and so this something is being in orders that's whispering to 1 that is a fraction of of the outbreak converge is to use the targets of opportunity you start a new and so we can see here is that genderism requires To constitutes a victim of the various stock for our current Ontario can mean so at each step of his at the end reason we're going to solve the previously described optimization problem in order to compute the way and so now

33:41

we have to do so and that the future of future actually have the best and instigation problem here that the starting work so we we should as soon as possible in order to arrive at a low simple and complex so the idea saying so the stubborn rules that we propose can simply be motivated by some of the statistical testing problem so if we introduce this year the at a summary of the likelihood ratio some here we have the maximum like he went under the constraint that isn't enough Armatrading Dodgers and the meaning of and here we have the maximum let you on those the opposite constraint and it is easy to understand that the high value of the statistics tend to reject the petition is that a new way is more than newbie and so are we will start when there is 1 arm that can be shown to be significantly larger than all the the other in the sense of the user-generated likelihood ratio the ratio of tests so all stopping time at sea that they indexed by the debt that because of course there was a moment we step depends on the risk parameter of data that will give can be rephrased the in the following way so we stopped when there exists a suggests for on the floor or all the arms as a steady seekers the ABT is larger than from threshold that they plan to that and needed is stopping rule can be traced back to work some old work by China the in 59 who was working on sequential and that's the way I put this testing but he was concerned with the finite I put this year's whereas here all I put this is continues because we want to check with some new ways resident older than the other but he order the newspaper gave us a lot of intuition on how to devise a good would strategy OK so of

35:58

course again even under are expansion familiar assumption began given explicit formulation for all this and generalized lately with the statistic showing especially that if a new ads says some Gagnon of our company is larger than the country can mean of army we have the following expression that future here so this quantity it is important so it is the weighted sum of of and you can mean so that it's like in the we have here a weighted sum of of information theory to give them and so I did and defined the stopping with user-generated generalized like with the racial tests and I've used but actually there are several possible interpretation 1 of which is related to the lower bond between give and so disturbing statistic can actually be shown to be acquired to see multiplied by the end of the month foreign under that belonged to a so ads of new had Cecil or demanded Monday in which the book our indifference from all current and guests the sum of this quantity and so we understand that if we use of a sampling strategy and sends that's here is the is the fraction of drills of on a conversion to the biggest stock solution of 4 optimization problem here we will mark and we we recover complexity attempt to stop from and so if we have a good something will and we believe this is true then we should stop when this quantity exceeded that of 1 of the best that because we wouldn't exactly where of and the complexity of the start new applied by the whenever debt but we will see that we wish only that a slightly larger than the threshold .period and I will give you hear some some

38:16

pointers on how to ensure that there is something we ought

38:22

to be far said that maybe another interesting interpretation for the starting role in some of the information theory in some of the minimum number of bids you need to call the sequence of the 0 1 1 produced by on a timely so again the origin rise lately the statistic Kaneria return using the Shannon entropy of the the distribution and in the following form where he actually and we have the a number of times we have drawn up a and B. misapplied by officers at Shannon entropy of new entity so this represents the average number of minutes and we would need to call the old the sick although the produced by both arms and here we have the number of bits mediated if we separately quoted the samples obtained from any and the samples obtained from the from the so another interpretation is that we when this quantity is significantly larger than this 1 meaning that it is really more costly to encode a and B together so that a human being should really be the so this is another interpretation and I think I once had the time to go into a into details of the probe led this there will be some useful information .period the arguments and so West we

39:57

proving that if we show at all and later on as threshold slightly larger than life 1 of the top 2 more precise ideologue of something which applied by TD Bank made then we have the data back here and the reason and the existence of this tries coming from the following a result that would prove that betting that if we have them enough of a that is smaller than the meaning of family then thought any something rule ruled the probability that there exists the surge that is the key exceeded to Cheney recommended that is that is smaller than and then there true inside shrine I have time for this but there were just try so the way we proved so usually to prove this Tennessee we use some of concentration in equation and here I I also use some changes to the measure of ideas and so the improve this uh if we introduce the CAB as the 1st time at which the disease the exceeded the threshold of lock to generated by the time the whole or we have to prove is that the probability that the is finite he's more than than that and to do so we're just using the definition of the agency as a lackey of the racial statistics so that you want to see a movie equality I mean that this likely the ratio will exceed the 2 would seem be made it and then there's the probability that the TAB is finite will be the summer for old of the expectations of the indicator of that event and then the idea is just to at alone 1 by z is divided into Baghdad that debated by Tutsis would be it's like the trade used to prove a mark of antiquities and so your your left with this quantity and then using that's new and is smaller than that you'd be here you can lower bonds and eliminate or by the value of these falls apart particular instances of new area new we

42:30

reconsider so we have the following year a quantity and then which is what will be a useful is that when we expressed here that the meeting ground so I deleted hearings about the case so there will be a son of on the placebo an old sound books sell real to a 2002 as this is going to cancel out with the likelihood that town and here we would be very happy if had here a probability density which is not the case because of the maximum here and then so information theoretic an idea that we used in the program is an existing uniform bombs over the directly would offer Of down random variable In terms of this quantity corridor Katie affixed and so it consists it is partially integrated the likely will hear clue you have X is the likelihood of the sequence of acceleration we have on the Alabama near me in you and then we integrates you on their own some simple reason distribution here true and so at all bonding as this quantity of years he said he knew area of the samples gathered from around a bag

43:52

these quantities give us something that is a prominent in distribution over the sequence of 0 1 of served and then there was a change of measure idea that comes from the fact that here we can interpret this summer as an expectation some ultimately proving its space of the indicator of these events and this at a loss as to be lesson the problem by something which is preventing the team from which he therefore there's more than 1 so this is a new idea to go to the arises is kind of proof compared to work to use to wear usual concentration iniquities

44:36

attack that I use it and you know they're in Azle and that could also be used to obtain a less precise reason OK so To summarize what's what and tourism did we put the buoying the meat and whether the guarantees a approved so there is no Michael at the track and stuff strategies to reduce the is the trappings the tracking something ruled that I presented to the idea to match the size a victim of the believes star of all humanity an tune proportion of jewels and then we use of the chairman of stepping morsel based on this generation like you the racial statistics we this this threshold and the recommendation work when we step we are going to choose that has the highest and Eureka Canyon and so we proved that the centrism and that is that back and satisfy abandoning its when Delta boasts was 0 of the expected number of samples need divided by a loved 1 of deft touch is equalized to at the start of new which was or that the characters time that appeared in the inaugural true so it is pretty easy to prove that existed when each year for almost release was the technical part to and the expectation and so we I was kinda proved that he thinks 1

46:11

sites which 1 they want to look in the papers also so it's a very short so I want to spend a few minutes on the practicality of the implication of this this Wetzel how does are on track and stubs strategy compares to the state of the art and the reason for all this problem so in order to do so I have to quickly introduced a few algorithm that can be used as for this problems and usually have 2 kinds of things as they are using an apparent lower confidence bounds so that counterparts both for using the types of algorithm for the best and education problem or if they are not using some elimination principles so you start with all the arm and then you successfully fully eliminates the arm for which were convinced that they are not specific to so the 1st underwritten by presumptive of the 1st kind it's going to carry on and you should be sweetie sibutramine incentives Eusebio agrees not to extend that there is going to be going to use the enough stock of England's gone so based on the the function again and and also a lower confidence upon under here whereas in the case of UCB we had locked him here we have something that depends on the sea and that which is the equivalent of the trick the threshold seen before so the reason samples actually to arms at neutron so examples the empirical best the on that much a season the new heights and 80 and it also some part the arm among the some of the men arm that has the largest at the confidence so here in the Army in involves using a sample and it stops whenever the on of the men is lodges and that the problems of of all the the use of so somehow win the confidence interval separated and then of course we decide for all the computers that so the other are enduring and that there according to the erasing use of the 2nd type so he maintains that sets are of the remaining arms so all the young teens erased and preceding rounds at it from we 1st drew on the arm that are in the center for the remaining comes a week on June the country continues to fall each arms the race we have we have drawn it's part-time soldiers untreated they mean is based on involves then we can compute the country can best answer when we highest country can mean a new R and also the unduly CanWest and if the best arm in significantly larger than the West Ham were going to discuss the the worst and are at all criterion to perform the elimination is the assuming to workers at KL and see reason is based on confidence and value and more precisely which eliminates the West's on is Orella confidence bounds of the undercarriage best these larger about to announce bones of the them to return worst and then we mean really remove W are from the set up and so will we stop where there is a single Illinois and in this set of remaining arms that without food additive all guests forwarded the built-in outpaced are OK so this is kind of a generate categories where you could replace the Nation step by some bozo criterion and so on .period getting we also tried to improve this procedure Europe by replacing the elimination step by it so we stopped even the likelihood statistic of the best under get best masses and antique wicker West exceed the sum of 3 shells and so this in a more explicit form this amounts to alienates the when the disease this also true for any quarter the untreated best and be equal and that the country can work through legal and this is the chairman of racing and reason because it is erasing type of algorithm that that's through these chairman of stopping so I presented here as numerical results on to download and Monday so 1 reason for arms and there a reason firearms so I leave here and there the exhibition value of the arms together with his the old team proportion of draws in each of these tumor that I also mentioned that in practice this threshold function of the expiration rate in the convenience and a lot out all sense to worry about the longer the that whereas it was proved that demand for instant hero times the cells are beat smaller than what is handled by theory but there are still a very conservative and so I would go with this choice away I implemented the truck and instead strategy as it to state of the art underwritten loss and also are improvements for the racing type of algorithm based on the chance of stopping work and what we see that compared to the state of the art there's something complexity are divided by Bates who free so there is a huge practical improvements so this is wrong for a specific value that I 0 1 but of course the transit the same principle if we take smaller venue of of their attack and an interesting phenomenon that weekend highlights is that on the 1st minutes of mother shown of racing from kind of similarly than Perkinston whereas in the sentence it death on worsened their carriers and using the switches monitors less progressed to a value so many problem and the

52:36

reason for this in in the way we believe that the reason racing type of algorithm is going to have to arms that have been drawn the same a number of times because we stopped when we stubbed out student to arm that have been growing the center of time and somehow so if we looked at the proportion of gross we would be here at 2 the equal valuing proportion whereas so is program tools as proportion of drove off 1 and 2 our Morrow separating them in the 1st program and then it's skein of normalize its racing take ferries and Solomon is going adjustment so to conclude that we approved the following fall based on instigation to become should see that as the value of the in over back and there is some of the and the needs of the ratio of the sun convicts divided by another 1 of dead so we prove we proposal attended more explicit formulation in the paper and more abundantly and that the matter as area characterization of the old tonight proportion of the values start that relies on Max here and that Thomas as to right and efficient strategy and matching the the ball so that there is plenty of future work because here than any since we propose is really isn't so we would once and a finite time analysis just like with exists for regret minimization and also we can imagine all the ways to use this knowledge of the awaits and so were we would want to combine it with all the success for a year is taking the mandatory treacherous like the user for using these around computers there's no on the use and thank you thank you thank you in the form of 2 he wrote to if In this we find that this form of the origin from the 1st online jury wrote :colon forces due the producers supplying refined performances around the leader of its support for the use of the and book so it would be summoned to manage sink so this the for digital have been proposed called explore the next Anthony would you propose to dissociate the exploration phase and the expectation face and I seem because the the 2 I mean if you want to use and renowned for best omelets invitation in the 1st phase and then to the way the best under best the end I stink you would have something in your about bonding that 1 you could device for the algorithm you would have this at the start of a new appear somewhere and I don't think that's true you would end up with the the complexity of the regret minimization problem which is much more simple and inexpensive and because of all move uh yeah I think it would be a constant factor in front of the but it could be at his 2 twice a day and actually tried and regalia with nuts results militarism between you and your wrist accounted for a great musician insisted that up to a balance exploration and exploitation as we go on way to the association you can see the humor successes so wholeheartedly to the rest of the year in the yes and this is Collender dead said this and I regret tho the observation there are somehow school your criterion is to minimize the new Star minus and knew of the on you get so meaning that yeah so I have not answer so some work in the region have considered this measure which is I agreed the different from just the privilege of Iraq better for example I don't know or lower bond forms of those simple regret it would be extremely in an interesting future workers the French if you have the floor fugitives from the heart of the the city reason why we have from the press congress several of freedom yes this is a natural way has led to elect the problem so you fix some agents the haunted stays and you're going to output an arm whose many is larger than New Star minus the so this is also a relaxation that are been considered in the literature and you can and that's the reason twin that case that I gained for the dollar bonds I wouldn't be able to derive bond that would feature that would incorporate that that only for the moment good to but yes we got this algorithm widths of problems that involve a lighter than the maximum between it's redundant and the minimum gap between the army forces and so the Agency for example refuses erasing the table from Greece and simply be well if afterward divided by its unions querulous 1 of others that I used to not stop then you stop and output whatever answer this would work as a heuristic for this this problem the company

Inhaltliche Metadaten

This talk proposes a complete characterization of the complexity of best-arm identification in one-parameter bandit models. We first give a new, tight lower bound on the sample complexity, that is the total number of draws of the arms needed in order to identify the arm with highest mean with a prescribed accuracy. This lower bound does not take an explicit form, but reveals the existence of a vector of optimal proportions of draws of the arms, that can be computed efficiently. We then propose a 'Track-and-Stop' strategy, whose sample complexity is proved to asymptotically match the lower bound. It consists in a new sampling rule, which tracks the optimal proportions of arm draws, and a stopping rule for which we propose several interpretations and that can be traced back to Chernoff (1959).