Alex Slivkins: publications by topic

Exploration, Exploitation and Incentives.
We study various scenarios in which the algorithmic challenges of online learning, and particularly the exploration-exploitation tradeoff, are
intertwined with the mechanism design challenges of interacting with self-interested agents.

Characterizing Truthful Multi-Armed Bandit Mechanisms (rev. June'13)
Moshe Babaioff, Yogeshwer Sharma and Aleksandrs Slivkins
EC 2009: ACM Symp. on Electronic Commerce
SICOMP: SIAM J. on Computing, Vol. 43, No. 1, pp. 194-230, 2014
We consider a natural strategic version of the MAB problem motivated by pay-per-click auctions. We show that requiring an MAB algorithm to be incentive-compatible has striking consequences both for structure and regret.

The latest revision reflects some minor bug fixes in the proof of Lemma 7.10.

We show that payment computation essentially does not present any obstacle in designing truthful mechanisms for single-parameter domains, even when we can only call the allocation rule once. Applying this to multi-armed bandits (MAB), we design truthful MAB mechanisms for stochastic payoffs. More generally, we open up a problem of designing monotone MAB allocation rules.

Multi-parameter Mechanisms with Implicit Payment Computation
Moshe Babaioff, Robert Kleinberg and Aleksandrs Slivkins
EC 2013: ACM Symp. on Electronic Commerce
We show that payment computation essentially does not present any obstacle in designing truthful mechanisms, even for multi-parameter domains, and even when we can only call the allocation rule once. Then we study a prominent example for a multi-parameter setting in which an allocation rule can only be called once, which arises in sponsored search auctions.

Bayesian Incentive-Compatible Bandit Exploration (rev. 2017)
(slides)
Yishay Mansour, Aleksandrs Slivkins and Vasilis Syrgkanis
EC 2015: ACM Symp. on Economics and Computation
Working paper (2016)
We design bandit algorithms that recommend actions to self-interested agents (who then decide which actions to take). By means of carefully designed information disclosure, we incentivize the agents to balance exploration and exploitation so as to maximize social welfare.

Bayesian Exploration: Incentivizing Exploration in Bayesian Games (rev. Nov'16)
(slides)
Yishay Mansour, Aleksandrs Slivkins, Vasilis Syrgkanis and Steven Wu
EC 2016: ACM Symp. on Economics and Computation
Working paper (2016)
At each time step, multiple agents arrive, play a fixed Bayesian game, and leave forever.
Agents' decisions reveal info that can help future agents,
creating a tradeoff between exploration, exploitation, and agents' incentives.
We design a social planner which learns over time and coordinates the agents towards
socially desirable outcomes.

Competing Bandits: Learning under Competition (2017)
Yishay Mansour, Aleksandrs Slivkins, and Zhiwei Steven Wu
Most modern systems strive to learn from interactions with users, and many
engage in exploration: making potentially suboptimal choices for the
sake of acquiring new information. We initiate a study of the interplay between
exploration and competition---how such systems balance the exploration
for learning and the competition for users.

Online Decision Making in Crowdsourcing Markets: Theoretical Challenges
Aleksandrs Slivkins and Jennifer Wortman Vaughan
SIGecom Exchanges, Dec 2013.
In crowdsourcing markets, task requesters and the platform itself make repeated decisions about prices to set, workers to filter out, problems to assign to specific workers, etc. Designing algorithms for making these repeated decisions is a rich, emerging problem space. We survey this problem space, point out significant modeling difficulties, and identify directions to make progress.

We propose a simple model for adaptive quality control in crowdsourced multiple-choice tasks which we call the bandit survey problem. This model is related to, but technically different from the well-known multi-armed bandit problem. We present several algorithms for this problem, and support them with analysis and simulations.

Adaptive Contract Design for Crowdsourcing Markets:
Bandit Algorithms for Repeated Principal-Agent Problems
(rev. Sep'15)
Chien-Ju Ho, Aleksandrs Slivkins and Jennifer Wortman Vaughan.
EC 2014: ACM Symp. on Economics and Computation
JAIR: J. of Artificial Intelligence Research, Vol. 54, 2015.
(Special Track on Human Computation)
We consider a repeated version of the principal-agent model in which the principal can revise the contract over time, and the agent can strategically choose the (unobservable) effort level. We treat this as a multi-armed bandit problem, and design an algorithm that adaptively refines the partition of the action space without relying on Lipschitz assumptions.

Bandits and Experts in Metric Spaces (rev. 2015)
Robert Kleinberg, Aleksandrs Slivkins and Eli Upfal.
A merged and heavily revised version of papers in STOC'08 and SODA'10.
To appear in J. of the ACM, upon a revision.
We introduce the 'Lipschitz MAB problem': a stochastic MAB problem, possibly with a very large set of arms, such that the expected payoffs obey a Lipschitz condition with respect to a given metric space. The goal is to minimize regret as a function of time, both in the worst case and for 'nice' problem instances.
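For reference, the Lipschitz condition mentioned above can be written as follows. The notation is an assumption made here for illustration: X is the set of arms, d is the given metric on X, and μ(x) is the expected payoff of arm x.

```latex
\[
  |\mu(x) - \mu(y)| \;\le\; d(x, y)
  \qquad \text{for all arms } x, y \in X .
\]
```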

Contextual bandits with similarity information (rev. May'14)
COLT 2011: Conf. on Learning Theory.
JMLR: J. of Machine Learning Research, 15(Jul):2533-2568, 2014.
In each round nature reveals a 'context' x, algorithm chooses an 'arm' y, and the expected payoff is μ(x,y). Similarity info is given: a metric space over the (x,y) pairs such that μ is a Lipschitz function. Interpreting the current time as a part of the 'context', we obtain a very general bandit framework that includes slowly changing payoffs and variable sets of arms. The main algorithmic idea is to adapt the partitions of the metric space to frequent context arrivals and high-payoff regions.

Multi-armed bandits on implicit metric spaces
NIPS 2011: Conf. on Neural Information Processing Systems.
Suppose an MAB algorithm is given a tree-based classification of arms. This tree implicitly defines a "similarity distance" between arms, but the numeric distances are not revealed to the algorithm. Our algorithm (almost) matches the best known guarantees for the setting (Lipschitz MAB) in which the distances are revealed.

Adapting to a Changing Environment: the Brownian Restless Bandits
Aleksandrs Slivkins and Eli Upfal.
COLT 2008: Conf. on Learning Theory.
We study a version of the stochastic MAB problem in which the expected reward of each arm evolves stochastically and gradually in time, following an independent Brownian motion or a similar process. Our benchmark is a hypothetical policy that chooses the best arm in each round.

Adapting to the Shifting Intent of Search Queries
Umar Syed, Aleksandrs Slivkins and Nina Mishra
NIPS'09: Annual Conf. on Neural Information Processing Systems
Query intent may shift over time. A classifier can use the available signals to predict a shift in intent. Then a bandit algorithm can be used to find the new relevant results. We present a meta-algorithm that combines such a
classifier with a bandit algorithm in a feedback loop, with favorable regret guarantees.

The best of both worlds: stochastic and adversarial bandits.
Sébastien Bubeck and Aleksandrs Slivkins
COLT 2012: Conf. on Learning Theory.
We present a new bandit algorithm whose regret is optimal both for adversarial rewards and for stochastic rewards, achieving, resp., square-root regret and polylog regret. Adversarial rewards and stochastic rewards are the two main settings for (non-Bayesian) multi-armed bandits; prior work treats them separately, and does not attempt to jointly optimize for both.

One Practical Algorithm for Both Stochastic and Adversarial Bandits
Yevgeny Seldin and Aleksandrs Slivkins
ICML 2014: Intl. Conf. on Machine Learning.
We present a bandit algorithm that achieves near-optimal performance in both stochastic and adversarial regimes without prior knowledge about the environment. Our algorithm is both rigorous and practical; it is based on a new control lever that we reveal in the EXP3 algorithm.
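As background for the entry above, here is a minimal sketch of the textbook EXP3 algorithm (exponential weights over arms, mixed with uniform exploration). It is not the algorithm from the paper and does not show the control lever the abstract refers to. The function name `exp3`, the callback `reward_fn`, and the default `gamma` are illustrative assumptions.

```python
import math
import random


def exp3(reward_fn, n_arms, n_rounds, gamma=0.1, rng=None):
    """Textbook EXP3 (illustrative sketch, not the paper's algorithm).

    reward_fn(t, arm) must return a reward in [0, 1].
    Returns the number of times each arm was pulled.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are reproducible
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for t in range(n_rounds):
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        # Importance-weighted estimate: unbiased for the chosen arm's reward.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale to avoid overflow; the probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical environment: arm 2 always pays 1, the others pay 0.
    print(exp3(lambda t, arm: 1.0 if arm == 2 else 0.0, n_arms=3, n_rounds=2000))
```

In this toy environment the pull counts should concentrate on the paying arm, while every arm keeps being sampled with probability at least gamma / n_arms.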

Making Contextual Decisions with Low Technical Debt (2017)
Alekh Agarwal, Sarah Bird, Markus Cozowicz, Luong Hoang, John Langford, Stephen Lee, Jiaji Li, Dan Melamed, Gal Oshri, Oswaldo Ribas, Siddhartha Sen, and Aleksandrs Slivkins
Applications and systems are constantly faced with decisions that require picking from a set of actions based on contextual information. Contextual bandit algorithms can be very effective in these settings, but applying them in practice is fraught with technical debt. We create the first general system for contextual bandit learning, called the Decision Service.

Multi-World Testing: A System for Experimentation, Learning, And Decision-Making (rev. Jul'16)
Alekh Agarwal, Sarah Bird, Markus Cozowicz, Miro Dudik, John Langford, Lihong Li, Luong Hoang, Dan Melamed, Sid Sen, Robert Schapire, Alex Slivkins.
(The MWT project)
Multi-World Testing (MWT) is a methodology for principled and efficient experimentation, learning, and decision-making. It is plausibly applicable to most services that interact with customers; in many scenarios, it is exponentially more efficient than traditional A/B testing. The underlying research area is known as "contextual bandits" and "counterfactual evaluation".

Contextual Dueling Bandits
Miroslav Dudík, Katja Hofmann, Robert E. Schapire, Aleksandrs Slivkins and Masrour Zoghi
COLT 2015: Conf. on Learning Theory.
We extend "dueling bandits" (where feedback is limited to pairwise comparisons between arms) to incorporate contexts (as in "contextual bandits"). We propose a natural new solution concept, rooted in game theory, and present algorithms for approximately learning this concept.

It is commonly assumed that individuals tend to be more similar to their friends than to strangers. Thus, we can view an observed social network as a noisy signal about the latent underlying "social space": the way in which individuals are (dis)similar. We present near-linear time algorithms which - under reasonably standard models of social network generation - can infer the similarities from the observed network with provable guarantees.

Selection and Influence in Cultural Dynamics (rev. Oct'15)
David Kempe, Jon Kleinberg, Sigal Oren and Aleksandrs Slivkins
EC 2013: ACM Symp. on Electronic Commerce
Network Science, vol. 4(1), 2016.
One of the fundamental principles driving diversity or homogeneity in a social network is the tension between two forces: influence (tendency to become similar to one's friends) and selection (tendency to interact with similar people). Influence tends to promote homogeneity within a society, while selection frequently causes fragmentation. We analyze which societal outcomes should be expected when both forces are in effect. We consider a natural class of models built upon active lines of work in political opinion formation, cultural diversity, and language evolution.

We consider metric embeddings and triangulation-based distance estimation
in a distributed framework with low load on the participating nodes.
Our results provide theoretical insight into the empirical success of several recent
Internet-related projects.

Given any x, any metric admits a low-dim embedding
into Lp, p>=1 with distortion D(x) = O(log 1/x)
on all but an x-fraction of edges.
Moreover, any decomposable metric (e.g. any doubling metric)
admits a low-dim embedding such that
D(x) = O(log 1/x)^{1/p}
for all x.

Oscillations with TCP-like Flow Control in Networks of Queues
Matthew Andrews and Aleksandrs Slivkins
INFOCOM 2006: IEEE Conf. on Computer Communications
For a wide range of TCP-like fluid-based congestion control models,
we construct a network of sessions and (almost) FIFO routers such that
starting from a certain initial state, the system returns to the same
state eventually. In contrast to prior work, in our example the total
sending rate of all sessions that come through any given router never
exceeds its capacity.

Network Failure Detection and Graph Connectivity
Jon Kleinberg, Mark Sandler and Aleksandrs Slivkins.
SIAM J. on Computing, 38(4): 1330-1346, Aug 2008.
SODA 2004: The ACM-SIAM Symp. on Discrete Algorithms
[slides]
We detect network partitions -- with strong provable guarantees -- using
a small set of 'agents' placed randomly on nodes of the network.
We parameterize our guarantees by edge- and
node-connectivity of the underlying graph.

Approximate Matching for Peer-to-Peer Overlays with Cubit (2009)
Bernard Wong, Aleksandrs Slivkins and Emin G. Sirer.
Cubit is a system that provides fully decentralized approximate keyword search capabilities to a peer-to-peer network. You can use Cubit to find a movie, song or artist even if you misspell the title or the name.