Medoids in Almost-Linear Time via Multi-Armed Bandits

Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics, PMLR 84:500-509, 2018.

Abstract

Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.

Related Material

@InProceedings{pmlr-v84-bagaria18a,
title = {Medoids in Almost-Linear Time via Multi-Armed Bandits},
author = {Vivek Bagaria and Govinda Kamath and Vasilis Ntranos and Martin Zhang and David Tse},
booktitle = {Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics},
pages = {500--509},
year = {2018},
editor = {Amos Storkey and Fernando Perez-Cruz},
volume = {84},
series = {Proceedings of Machine Learning Research},
address = {Playa Blanca, Lanzarote, Canary Islands},
month = {09--11 Apr},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v84/bagaria18a/bagaria18a.pdf},
url = {http://proceedings.mlr.press/v84/bagaria18a.html},
abstract = {Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.}
}

%0 Conference Paper
%T Medoids in Almost-Linear Time via Multi-Armed Bandits
%A Vivek Bagaria
%A Govinda Kamath
%A Vasilis Ntranos
%A Martin Zhang
%A David Tse
%B Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2018
%E Amos Storkey
%E Fernando Perez-Cruz
%F pmlr-v84-bagaria18a
%I PMLR
%J Proceedings of Machine Learning Research
%P 500--509
%U http://proceedings.mlr.press
%V 84
%W PMLR
%X Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit to compute the medoid with high probability, which uses $O(n\log n)$ distance evaluations. Med-dit is based on a connection with the Multi-Armed Bandit problem. We evaluate the performance of Med-dit empirically on the Netflix-prize and single-cell RNA-seq datasets, containing hundreds of thousands of points living in tens of thousands of dimensions, and observe a $5$-$10$x improvement in performance over the current state of the art. We have released the code of Med-dit and our empirical results at https://github.com/bagavi/Meddit.