Machine Learning (Theory)http://hunch.net
Machine learning and learning theory researchFri, 24 May 2019 13:14:32 +0000en-UShourly1http://wordpress.org/?v=4.3Code submission should be encouraged but not compulsoryhttp://hunch.net/?p=11377237
http://hunch.net/?p=11377237#commentsTue, 26 Feb 2019 16:27:14 +0000http://hunch.net/?p=11377237ICML, ICLR, and NeurIPS are all considering or experimenting with code and data submission as a part of the reviewer or publication process with the hypothesis that it aids reproducibility of results. Reproducibility has been a rising concern with discussions in paper, workshop, and invited talk.

The fundamental driver is of course lack of reproducibility. Lack of reproducibility is an inherently serious and valid concern for any kind of publishing process where people rely on prior work to compare with and do new things. Lack of reproducibility (due to random initialization for example) was one of the things leading to a period of unpopularity for neural networks when I was a graduate student. That has proved nonviable (Surprise! Learning circuits is important!), but the reproducibility issue remains. Furthermore, there is always an opportunity and latent suspicion that authors ‘cheat’ in reporting results which could be allayed using a reproducible approach.

With the above said, I think the reproducibility proponents should understand that reproducibility is a value but not an absolute value. As an example here, I believe it’s quite worthwhile for the community to see AlphaGoZero published even if the results are not necessarily easily reproduced. There is real value for the community in showing what is possible irrespective of whether or not another game with same master of Go is possible, and there is real value in having an algorithm like this be public even if the code is not. Treating reproducibility as an absolute value could exclude results like this.

An essential understanding here is that machine learning is (at least) 3 different kinds of research.

Algorithms: The goal is coming up with a better algorithm for solving some category of learning problems. This is the most typical viewpoint at these conferences.

Theory: The goal is generally understanding what is possible or not possible for learning algorithms. Although these papers may have algorithms, they are often not the point and demanding an implementation of them is a waste of time for author, reviewer, and reader.

Applications: The goal is solving some particular task. AlphaGoZero is a reasonable example of this—it was about beating the world champion in Go with algorithmic development in service of that. For this kind of research perfect programmatic reproducibility may be infeasible because the computation is to extreme, the data is proprietary, etc…

Using a one-size-fits-all approach where you demand that every paper “is” a programmatically reproducible implementation is a mistake that would create a division that reduces our community. Keeping this three-fold focus fundamentally enriches the community both literally and ontologically.

Another view here is provided by considering the argument at a wider scope. Would you prefer that health regulations/treatments be based on all scientific studies including those where data is not fully released to the public (i.e almost all of them for privacy reasons)? Or would you prefer that health regulations/treatments be based only on data fully released to the public? Preferring the latter is equivalent to ignoring most scientific studies in making decisions.

The alternative to a compulsory approach is to take an additive view. The additive approach has a good track record amongst reviewing process changes.

When I was a graduate student, papers were not double blind. The community switched to double blind because it adds an opportunity for reviewers to review fairly and it gives authors a chance to have their work reviewed fairly whether they are junior or senior. As a community we also do not restrict posting on arxiv or talks about a paper before publication, because that would subtract from what authors can do. Double blind reviewing could be divisive, but it is not when used in this fashion.

When I was a graduate student, there was also a hard limit on the number of pages in submissions. For theory papers this meant that proofs were not included. We changed the review process to allow (but not require) submission of an appendix which could optionally be used by reviewers. This again adds to the options available to authors/reviewers and is generally viewed as positive by everyone involved.

What can we add to the community in terms reproducibility?

Can reviewers do a better job of reviewing if they have access to the underlying code or data?

Can authors benefit from releasing code?

Can readers of a paper benefit from an accompanying code release?

The answer to each of these question is a clear ‘yes’ if done right.

For reviewers, it’s important to not overburden them. They may lack the computational resources, platform, or personal time to do a full reproduction of results even if that is possible. Hence, we should view code (and data) submission in the same way as an appendix which reviewers may delve into and use if they so desire.

For authors, code release has two benefits—it provides an additional avenue for convincing reviewers who default to skeptical and it makes followup work significantly more likely. My most cited paper was Isomap which did indeed come with a code release. Of course, this is not possible or beneficial for authors in many cases. Maybe it’s a theory paper where the algorithm isn’t the point? Maybe either data or code can’t be fully released since it’s proprietary? There are a variety of reasons. From this viewpoint we see that releasing code should be supported and encouraged but optional.

For readers, having code (and data) available obviously adds to the depth of value that a paper has. Not every reader will take advantage of that but some will and it enormously reduces the barrier to using a paper in many cases.

Is there a need for go further towards compulsory code submission? I don’t yet see evidence that default skeptical reviewers aren’t capable of weighing the value of reproducibility against other values in considering whether a paper should be published.

Should we do less than the additive and enabling things? I don’t see why—the additive approach provides pure improvements to the author/review/publish process. Not everyone is able to take advantage of this, but that seems like a poor reason to restrict others from taking advantage when they can.

One last thing to note is that this year’s code submission process is an experiment. We should all want program chairs to be able to experiment, because that is how improvements happen. We should do our best to work with such experiments, try to make a real assessment of success/failure, and expect adjustments for next year.

]]>http://hunch.net/?feed=rss2&p=113772372FAQ on ICML 2019 Code Submission Policyhttp://hunch.net/?p=11007870
http://hunch.net/?p=11007870#commentsWed, 19 Dec 2018 21:52:12 +0000http://hunch.net/?p=11007870ICML 2019 has an option for supplementary code submission that the authors can use to provide additional evidence to bolster their experimental results. Since we have been getting a lot of questions about it, here is a Frequently Asked Questions for authors.

1. Is code submission mandatory?

No. Code submission is completely optional, and we anticipate that high quality papers whose results are judged by our reviewers to be credible will be accepted to ICML, even if code is not submitted.

2. Does submitted code need to be anonymized?

ICML is a double blind conference, and we expect authors to put in reasonable effort to anonymize the submitted code and institution. This means that author names and licenses that reveal the organization of the authors should be removed.

Please note that submitted code will not be made public — eg, only the reviewers, Area Chair and Senior Area Chair in charge will have access to it during the review period. If the paper gets accepted, we expect the authors to replace the submitted code by a non-anonymized version or link to a public github repository.

3. Are anonymous github links allowed?

Yes. However, they have to be on a branch that will not be modified after the submission deadline. Please enter the github link in a standalone text file in a submitted zip file.

4. How will the submitted code be used for decision-making?

The submitted code will be used as additional evidence provided by the authors to add more credibility to their results. We anticipate that high quality papers whose results are judged by our reviewers to be credible will be accepted to ICML, even if code is not submitted. However, if something is unclear in the paper, then code, if submitted, will provide an extra chance to the authors to clarify the details. To encourage code submission, we will also provide increased visibility to papers that submit code.

5. If code is submitted, do you expect it to be published with the rest of the supplementary? Or, could it be withdrawn later?

We expect submitted code to be published with the rest of the supplementary. However, if the paper gets accepted, then the authors will get a chance to update the code before it is published by adding author names, licenses, etc.

6. Do you expect the code to be standalone? For example, what if it is part of a much bigger codebase?

We expect your code to be readable and helpful to reviewers in verifying the credibility of your results. It is possible to do this through code that is not standalone — for example, with proper documentation.

7. What about pseudocode instead of code? Does that count as code submission?

Yes, we will count detailed pseudocode as code submission as it is helpful to reviewers in validating your results.

8. Do you expect authors to submit data?

We understand that many of our authors work with highly sensitive datasets, and are not asking for private data submission. If the dataset used is publicly available, there is no need to provide it. If the dataset is private, then the authors can submit a toy or simulated dataset to illustrate how the code works.

9. Who has access to my code?

Only the reviewers, Area Chair and Senior Area Chair assigned to your paper will have access to your code. We will instruct reviewers, Area Chair and Senior Area Chair to keep the code submissions confidential (just like the paper submissions), and delete all code submissions from their machine at the end of the review cycle. Please note that code submission is also completely optional.

10. I would like to revise my code/add code during author feedback. Is this permitted?

The detailed FAQ as well other Author and Style instructions are available here.

Kamalika Chaudhuri and Ruslan Salakhutdinov
ICML 2019 Program Chairs

]]>http://hunch.net/?feed=rss2&p=110078701ICML 2019: Some Changes and Call for Papershttp://hunch.net/?p=10858918
http://hunch.net/?p=10858918#commentsWed, 28 Nov 2018 03:52:56 +0000http://hunch.net/?p=10858918The ICML 2019 Conference will be held from June 10-15 in Long Beach, CA — about a month earlier than last year. To encourage reproducibility as well as high quality submissions, this year we have three major changes in place.

There is an abstract submission deadline on Jan 18, 2019. Only submissions with proper abstracts will be allowed to submit a full paper, and placeholder abstracts will be removed. The full paper submission deadline is Jan 23, 2019.

This year, the author list at the paper submission deadline (Jan 23) is final. No changes will be permitted after this date for accepted papers.

Finally, to foster reproducibility, we highly encourage code submission with papers. Our submission form will have space for two optional supplementary files — a regular supplementary manuscript, and code. Reproducibility of results and easy accessibility of code will be taken into account in the decision-making process.

]]>http://hunch.net/?feed=rss2&p=108589181Please votehttp://hunch.net/?p=10648987
http://hunch.net/?p=10648987#commentsMon, 29 Oct 2018 20:45:49 +0000http://hunch.net/?p=10648987This is not at all related to Machine Learning.

I lived in Squirrel Hill as a graduate student at Carnegie Mellon so the massacre there is feeling particularly immediate. While the person who did it is obviously culpable, the pattern of events makes it clear that others bear responsibility as well. This pattern includes an attempted bomber of Democrats and Trump critics by a Trump fanboy. It also includes a more general cross section of Republicans and their leaders pushing anti-semitism and more general xenophobia about migrants.

Alliances in a two-party system tend to be fragile since winning with a smaller constituency enables better serving that constituency. Losing the populist angle leaves a double-down on the remaining agenda as the most plausible choice. Xenophobia is much older than democracy and psychologically potent so it has obvious value. It’s historically used by leaders who pick some characteristic to divide people and position themselves to thrive on the conflict or distraction that creates. Almost anything will do—if you take away religion, birthplace, skin color, and ethnicity, it would just change to hair color, nose size, or left-handedness. In a democracy, the goal with this approach is simply convincing people to vote according to their activated xenophobia.

For people embracing xenophobia to retain power, stochastic terrorism is just an unfortunate side effect. In this sense, inciting xenophobia about a caravan of refugee Guatemalans at the other end of Mexico is rather clever since most of them won’t even make it to the US border months after the election plausibly leaving only electoral consequences. Yet xenophobia is known to be hard to control. Given this, it’s difficult to imagine stochastic terrorism as anything other than deliberately accepted by the Republican party leadership as an observed consequence of this behavior. The Squirrel Hill massacre and the attempted bombing campaigns are precisely the sort of thing that can happen when you dial up the rhetoric just before an election.

This is part of a pattern of moral collapse across the Republican party. By any reasonable measure Donald Trump is a serial liar with Republican politicians now mimicking this behavior. A remarkable set of people around the Trump campaign are confessed or convicted criminals with members of the Republican party variously tolerating, condoning, and perhaps mimicking.

In this context, the upcoming midterm election seems particularly important. If politicians in aggregate behave as if they will do anything to get reelected, then voters must vote for the behavior they want at the ballot box rather than relying on or appealing to it at a later date. In most situations, this is about picking and choosing the better candidate. I’ve been registered as an independent for this reason—I want to decide for myself.

This is not most situations. Do voters rebuke the Republican party or not? If the answer is not (a 37% chance according to bettors at present) then the slide into corruption likely accelerates as confirmed control of the government erodes the remaining institutional checks on corruption. We are several steps away from a state of deep corruption and it takes time for the consequences of corruption to really seep into society. But every step on the path makes the situation worse and we are on the wrong path now as evidenced by bombing attempts, a xenophobic massacre, and the wider context creating them.

I want to particularly encourage those who are eligible to vote in the United States midterms November 6th.

]]>http://hunch.net/?feed=rss2&p=106489871A Real World Reinforcement Learning Research Programhttp://hunch.net/?p=9828091
http://hunch.net/?p=9828091#commentsFri, 06 Jul 2018 15:10:03 +0000http://hunch.net/?p=9828091We are hiring for reinforcement learning related research at all levels and all MSR labs. If you are interested, apply, talk to me at COLT or ICML, or email me.

More generally though, I wanted to lay out a philosophy of research which differs from (and plausibly improves on) the current prevailing mode.

Deepmind and OpenAI have popularized an empirical approach where researchers modify algorithms and test them against simulated environments, including in self-play. They’ve achieved significant success in these simulated environments, greatly expanding the reportoire of ‘games solved by reinforcement learning’ which consisted of the singleton backgammon when I was a graduate student. Given the ambitious goals of these organizations, the more general plan seems to be “first solve games, then solve real problems”. There are some weaknesses to this approach, which I want to lay out next.

Broken API One issue with this is that multi-step reinforcement learning is a broken API in the sense that it creates an interface for problem definitions that is unsolvable via currently popular algorithm families. In particular, you can create problems which are either ‘antishaped’ so local rewards mislead w.r.t. long term rewards or keylock problems, as are common in Markov Decision Process lower bounds. I coded up simple versions of these problems a couple years ago and stuck them on github now to be extra crisp. If you try to apply policy gradient or Q-learning style algorithms on these problems they commonly run into exponential (in the number of states) sample complexity. As a general principle, APIs which create exponential sample complexity are bad—they imply that individual applications require taking advantage of special structure in order to succeed.

Transference Another significant issue is the degree of transference between solutions in simulation and the real world. “Transference” here potentially happens at several levels.

Do the algorithms carry over? One of the persistent issues with simulation-based approaches is that you don’t care about sample complexity that much—optimal performance at acceptable computational complexities is the typical goal. In real world applications, this is somewhat absurd—you really care about immediately doing something reasonable and optimizing from there.

Do the simulators carry over? For every simulator, there is a fidelity question which comes into play when you want to transfer a policy learned in the simulator into action in the real world. Real-time ray tracing and simulator quality more generally are advancing, but I’m not ready yet to trust a self-driving care trained in a simulated reality. An accurate simulation of the physics is unclear—friction for example is known-difficult, and more generally the representative variety of exogenous events in an open world seems quite difficult to implement.

Solution generality When you test and discover that an algorithm works in a simulated world, you know that it works in the simulated world. If you try it in 30 simulated worlds and it works in all of them, it can still easily be the case that an algorithm fails on the 31st simulated world. How can you achieve confidence beyond the number of simulated worlds that you try and succeed on? There is some sense by which you can imagine generalization over an underlying process generating problems, but this seems like a shaky justification in practice, since the nature of the problems encountered seems to be a nonstationary development of an unknown future.

Value creation Solutions of a ‘first A, then B’ flavor naturally take time to get to the end state where most of the real value is set to be realized. In the years before reaching applications in the real world, does the funding run out? We certainly hope not for the field of research but a danger does exist. Some discussion here including the comments is relevant.

What’s an alternative?

Each of the issues above is addressable.

Build fundamental theories of what are statistically and computationally tractable sub-problems of Reinforcement Learning. These tractable sub-problems form the ‘APIs’ of systems for solving these problems. Examples of this include simpler (Contextual Bandits), intermediate (learning to search, and move advanced (Contextual Decision Process).

Work on real-world problems. The obvious antidote to simulation is reality, driving both the need to create systems that work in reality as well as a research agenda around reality-centered issues like performance at low sample complexity. There are some significant difficulties with this—reinforcement style algorithms require interactive access to learn which often drives research towards companies with an infrastructure. Nevertheless, offline evaluation on real-world data does exist and the choice of emphasis in research directions is universal.

The combination of fundamental theories and a platform which distills learnings so they are not forgotten and always improved upon provides a stronger basis for expectation of generalization into the next problem.

The shortest path to creating valuable applications in the real world is to simply work on creating valuable applications in the real world. Doing this in a manner guided by other elements of the research program is just good sense.

The above must be applied in moderation—some emphasis on theory, some emphasis on real world applications, some emphasis on platforms, and some emphasis on empirics. This has been my research approach for a little over 10 years, ever since I started working on contextual bandits.

Let’s call the first research program ’empirical simulation’ and the second research program ‘real fundamentals’. The empirical simulation approach has a clear strong advantage in that it creates impressive demos, which creates funding, which creates more research. The threshold for contribution to the empirical simulation approach may also be lower simply because it requires mastery of fewer elements, implying people can more easily participate in it. At the same time, the real fundamentals approach has clear advantages in addressing the weaknesses of the empirical simulation approach. At a concrete level, this means we have managed to define and create fundamentals through research while creating real-world applications and value radically more efficiently than the empirical simulation approach has achieved.

The ‘real fundamentals’ concept is behind the open positions above. These positions have been designed to come with both the colleagues and mandate to address the most difficult research problems along with the organizational leverage to change the world. For people interested in fundamentals and making things happen in the real world these are prime positions—please consider joining us.

There is significant evidence that the process of reviewing papers in machine learning is creaking under several years of exponentiating growth.

Public figures often overclaim the state of AI.

Money rains from the sky on ambitious startups with a good story.

Apparently, we now even have a fake conference website (https://nips.cc/ is the real one for NIPS).

We are clearly not in a steady-state situation. Is this a bubble or a revolution? The answer surely includes a bit of revolution—the fields of vision and speech recognition have been turned over by great empirical successes created by deep neural architectures and more generally machine learning has found plentiful real-world uses.

At the same time, I find it hard to believe that we aren’t living in a bubble. There was an AI bubble in the 1980s (before my time), a techbubble around 2000, and we seem to have a combined AI/tech bubble going on right now. This is great in some ways—many companies are handing out professional sports scale signing bonuses to researchers. It’s a little worrisome in other ways—can the field effectively handle the stress of the influx?

It’s always hard to say when and how a bubble bursts. It might happen today or in several years and it may be a coordinated failure or a series of uncoordinated failures.

As a field, we should consider the coordinated failure case a little bit. What fraction of the field is currently at companies or in units at companies which are very expensive without yet justifying that expense? It’s no longer a small fraction so there is a chance for something traumatic for both the people and field when/where there is a sudden cut-off. My experience is that cuts typically happen quite quickly when they come.

As an individual researcher, consider this an invitation to awareness and a small amount of caution. I’d like everyone to be fully aware that we are in a bit of a bubble right now and consider it in their decisions. Caution should not be overdone—I’d gladly repeat the experience of going to Yahoo! Research even knowing how it ended. There are two natural elements here:

Where do you work as a researcher? The best place to be when a bubble bursts is on the sidelines.

Is it in the middle of a costly venture? Companies are not good places for this in the long term whether a startup or a business unit. Being a researcher at a place desperately trying to figure out how to make research valuable doesn’t sound pleasant.

Is it in the middle of a clearly valuable venture? That could be a good place. If you are interested we are hiring.

Is it in academia? Academia has a real claim to stability over time, but at the same time opportunity may be lost. I’ve greatly enjoyed and benefited from the opportunity to work with highly capable colleagues on the most difficult problems. Assembling the capability to do that in an academic setting seems difficult since the typical maximum scale of research in academia is a professor+students.

What do you work on as a researcher? Some approaches are more “bubbly” than others—they might look good, but do they really provide value?

Are you working on intelligence imitation or intelligence creation? Intelligence creation ends up being more valuable in the long term.

Are you solving synthetic or real-world problems? If you are solving real-world problems, you are almost certainly creating value. Synthetic problems can lead to real-world solutions, but the path is often fraught with unforeseen difficulties.

Are you working on a solution to one problem or many problems? A wide applicability for foundational solutions clearly helps when a bubble bursts.

Researchers have a great ability to survive a bubble bursting—a built up public record of their accomplishments. If you are in a good environment doing valuable things and that environment happens to implode one day the strength of your publications is an immense aid in landing on your feet.

]]>http://hunch.net/?feed=rss2&p=960432816Reinforcement Learning Platformshttp://hunch.net/?p=9262767
http://hunch.net/?p=9262767#commentsMon, 16 Apr 2018 20:17:22 +0000http://hunch.net/?p=9262767If you are interested in building an industrial Reinforcement Learning platform, we are hiring a data scientistandmultipledevelopers as a followup to last year’s hiring. Please apply if interested as this is a real chance to be a part of building the future
]]>http://hunch.net/?feed=rss2&p=92627673ICML Board and Reviewer profileshttp://hunch.net/?p=8962378
http://hunch.net/?p=8962378#commentsMon, 05 Mar 2018 21:34:16 +0000http://hunch.net/?p=8962378The outcome of the election for the IMLS (which runs ICML) adds Emma Brunskill and Hugo Larochelle to the board. The current members of the board (and the reason for board membership) are:

President Elect is a 2-year position with little responsibility, but I decided to look into two things. One is the website which seems relatively difficult to navigate. Ideas for how to improve are welcome.

The other is creating a longitudinal reviewer profile. I keenly remember the day after reviews were due when I was program chair (in 2012) which left a panic-inducing number of unfinished reviews. To help with this, I’m planning to create a profile of reviewers which program chairs can refer to in making decisions about who to ask to review. There are a number of ways to do this wrong which I’m avoiding with the following procedure:

After reviews are assigned, capture the reviewer/paper assignment. Call this set A.

Strip the paper ids from B (completed reviews) turning it into a multiset D of reviewers completed reviews.

Compute C-A (as a set difference) then turn it into a multiset E of reviewers incomplete reviews.

Store D & E for long term reference.

This approach:

Is objectively defined. Approaches based on subjective measurements seem both fraught with judgment issues and inconsistent. Consider for example the impressive variation we all see in review quality.

Does not record a review as late for reviewers who are assigned a paper late in the process via step (1) and (4). We want to encourage reviewers to take on the unusual but important late tasks that arrive.

Does not record a review as late for reviewers who discover they are inappropriate after assignment and ask for reassignment. We want to encourage reviewers to look at their papers early and, if necessary, ask for a paper to be reassigned early.

Preserves anonymity of paper/reviewer assignments for authors who later become program chairs. The conversion into a multiset removes the paper id entirely.

Overall, my hope is that several years of this will provide a good and useful tool enabling program chairs and good (or at least not-bad) reviewers to recognize each other.

]]>http://hunch.net/?feed=rss2&p=89623781Pervasive Simulator Misuse with Reinforcement Learninghttp://hunch.net/?p=8825714
http://hunch.net/?p=8825714#commentsWed, 14 Feb 2018 21:25:14 +0000http://hunch.net/?p=8825714The surge of interest in reinforcement learning is great fun, but I often see confused choices in applying RL algorithms to solve problems. There are two purposes for which you might use a world simulator in reinforcement learning:

Reinforcement Learning Research: You might be interested in creating reinforcement learning algorithms for the real world and use the simulator as a cheap alternative to actual real-world application.

Problem Solving: You want to find a good policy solving a problem for which you have a good simulator.

In the first instance I have no problem, but in the second instance, I’m seeing many head-scratcher choices.

A reinforcement learning algorithm engaging in policy improvement from a continuous stream of experience needs to solve an opportunity-cost problem. (The RL lingo for opportunity-cost is “advantage”.) Thinking about this in the context of a 2-person game, at a given state, with your existing rollout policy, is taking the first action leading to a win 1/2 the time good or bad? It could be good since the player is well behind and every other action is worse. Or it could be bad since the player is well ahead and every other action is better. Understanding one action’s long term value relative to another’s is the essence of the opportunity cost trade-off at the core of many reinforcement learning algorithms.

If you have a choice between an algorithm that estimates the opportunity cost and one which observes the opportunity cost, which works better? Using observed opportunity-cost is an almost pure winner because it cuts out the effect of estimation error. In the real world you can’t observe the opportunity cost directly Groundhog day style. How many times have you left a conversation and thought to yourself: I wish I had said something else? A simulator is different though—you can reset a simulator. And when you do reset a simulator, you can directly observe the opportunity-cost of an action which can then directly drive learning updates.

If you are coming from viewpoint 1, using a “reset cheat” is unappealing since it doesn’t work in the real world and the goal is making algorithms which work in the real world. On the other hand, if you are operating from viewpoint 2, the “reset cheat” is a gigantic opportunity to dramatically improve learning algorithms. So, why are many people with goal 2 using goal 1 designed algorithms? I don’t know, but here are some hypotheses.

Maybe people just aren’t aware that goal 2 style algorithms exist? They are out there. The most prominent examples of goal 2 style algorithms are from Learning to search and AlphaGo Zero.

Maybe people are worried about the additional sample complexity of doing multiple rollouts from reset points? But these algorithm typically require little additional sample complexity in the worst case and can provide gigantic wins. People commonly use a discount factor d values future rewards t timesteps ahead with a discount of dt. Alternatively, you can terminate rollouts with probability 1 – d and value future rewards with no discount while preserving the expected value. Using this approach a rollout terminates after an expected 1/(1-d) timesteps bounding the cost of a reset and rollout. Since it is common to use very heavy discounting (e.g. d=0.9), the worst case additional sample complexity is only a small factor larger. On the upside, eliminating estimation error is can radically reduce sample complexity in theory and practice.

Maybe the implementation overhead for a second family of algorithms is to difficult? But the choice of whether or not you use resets is far more important than “oh, we’ll just run things for 10x longer”. It can easily make or break the outcome.

Maybe there is some other reason? As I said above, this is head-scratcher that I find myself trying to address regularly.

]]>http://hunch.net/?feed=rss2&p=88257145Vowpal Wabbit 8.5.0 & NIPS tutorialhttp://hunch.net/?p=8305306
http://hunch.net/?p=8305306#commentsSun, 03 Dec 2017 16:45:10 +0000http://hunch.net/?p=8305306Yesterday, I tagged VW version 8.5.0 which has many interactive learning improvements (both contextual bandit and active learning), better support for sparse models, and a new baseline reduction which I’m considering making a part of the default update rule.