For your convenience, we have compiled a list of currently implemented APIs and methods
available in Modin. This documentation is updated as new methods and APIs are merged
into the master branch, so it may not match the most recent release. To install the
latest version of Modin, follow the directions on the installation page.

Currently, we support ~71% of the pandas API. The exact methods we have implemented are
listed below.

We have taken a community-driven approach to implementing new methods. We studied
pandas usage to learn which APIs are used most. Based on that study of usage (rather
than the raw method count above), Modin currently supports 93% of the pandas API, and
we are actively expanding the API.

The remaining unimplemented methods default to pandas. This allows users to continue
using Modin even though their workloads contain functions not yet implemented in Modin.
Here is a diagram of how we convert to pandas and perform the operation:

We first convert to a pandas DataFrame, then perform the operation. There is a
performance penalty for going from a partitioned Modin DataFrame to pandas because of
the communication cost and single-threaded nature of pandas. Once the pandas operation
has completed, we convert the DataFrame back into a partitioned Modin DataFrame. This
way, operations performed after an operation that defaults to pandas are again
optimized by Modin.
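
The round trip described above can be sketched with plain pandas. This is a toy model only, not Modin's implementation: the helpers split_rows and default_to_pandas are hypothetical names introduced for illustration, and a list of row blocks stands in for a partitioned Modin DataFrame.

```python
import pandas as pd


def split_rows(df, n_parts):
    """Split a pandas DataFrame into contiguous row blocks (toy partitions)."""
    size = max(1, -(-len(df) // n_parts))  # ceiling division
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]


def default_to_pandas(partitions, op, n_parts):
    """Hypothetical helper: gather, run the op in pandas, re-partition."""
    # 1. Gather every partition into one pandas DataFrame.
    #    In a real distributed setting this is where the communication cost is paid.
    full = pd.concat(partitions)
    # 2. Run the operation with ordinary pandas.
    result = op(full)
    # 3. Split the result again so later operations can work per partition.
    return split_rows(result, n_parts)


df = pd.DataFrame({"a": [3, 1, 2, 1], "b": [10, 20, 30, 20]})
parts = split_rows(df, 2)

# drop_duplicates is listed below as defaulting to pandas.
new_parts = default_to_pandas(parts, lambda d: d.drop_duplicates(), 2)
roundtrip = pd.concat(new_parts)
```

The result is partitioned again after the call, which is why only the operation that defaulted to pandas pays the gather cost.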

The following table lists both implemented and not implemented methods. If you need
an operation that is listed as not implemented, feel free to open an issue on the
GitHub repository. Contributions are also welcome!

| DataFrame method | Implemented? | Limitations/Notes for Current implementation |
|---|---|---|
| `T` | Y | |
| `__abs__` | Y | |
| `__add__` | Y | |
| `__and__` | Y | |
| `__array__` | Y | Will not result in a distributed object |
| `__array_wrap__` | Y | Will not result in a distributed object |
| `__bool__` | Y | |
| `__contains__` | Y | |
| `__copy__` | Y | Copy will always make a shallow copy |
| `__deepcopy__` | Y | Copy will always make a shallow copy |
| `__delitem__` | Y | |
| `__div__` | Y | Requires shuffle when operating on two DataFrames |
| `__eq__` | Y | Requires shuffle when operating on two DataFrames |
| `__finalize__` | N | Defaults to pandas |
| `__floordiv__` | Y | Requires shuffle when operating on two DataFrames |
| `__ge__` | Y | Requires shuffle when operating on two DataFrames |
| `__getitem__` | Y | Returns a pandas Series (see Series section below); `key` parameter as type DataFrame not yet supported; MultiIndex columns defaults to pandas |
| `__getstate__` | N | Defaults to pandas |
| `__gt__` | Y | Requires shuffle when operating on two DataFrames |
| `__hash__` | N | Defaults to pandas |
| `__iadd__` | Y | See `__add__` |
| `__ifloordiv__` | Y | See `__floordiv__` |
| `__imod__` | Y | See `__mod__` |
| `__imul__` | Y | See `__mul__` |
| `__invert__` | N | Defaults to pandas |
| `__ipow__` | Y | See `__pow__` |
| `__isub__` | Y | See `__sub__` |
| `__iter__` | Y | |
| `__itruediv__` | Y | See `__truediv__` |
| `__le__` | Y | Requires shuffle when operating on two DataFrames |
| `__len__` | Y | |
| `__lt__` | Y | Requires shuffle when operating on two DataFrames |
| `__mod__` | Y | Requires shuffle when operating on two DataFrames |
| `__mul__` | Y | Requires shuffle when operating on two DataFrames |
| `__ne__` | Y | Requires shuffle when operating on two DataFrames |
| `__neg__` | Y | |
| `__nonzero__` | Y | |
| `__or__` | Y | |
| `__pow__` | Y | Requires shuffle when operating on two DataFrames |
| `__radd__` | Y | See `__add__` |
| `__rdiv__` | Y | See `__div__` |
| `__repr__` | Y | Blocking call: Must retrieve data from remote |
| `__rfloordiv__` | Y | See `__floordiv__` |
| `__rmod__` | Y | See `__mod__` |
| `__rmul__` | Y | See `__mul__` |
| `__round__` | N | Defaults to pandas |
| `__rpow__` | Y | See `__pow__` |
| `__rsub__` | Y | See `__sub__` |
| `__rtruediv__` | Y | See `__truediv__` |
| `__setitem__` | Y | Can only set if `key` parameter is type str |
| `__setstate__` | N | Defaults to pandas |
| `__sizeof__` | N | Defaults to pandas |
| `__str__` | Y | Blocking call: Must retrieve data from remote |
| `__sub__` | Y | Requires shuffle when operating on two DataFrames |
| `__truediv__` | Y | Requires shuffle when operating on two DataFrames |
| `__unicode__` | N | Defaults to pandas |
| `__xor__` | Y | |
| `abs` | Y | |
| `add` | Y | See `__add__` |
| `add_prefix` | Y | |
| `add_suffix` | Y | |
| `agg` | Y | Not yet optimized: Can return DataFrame or Series; passing a dictionary for the `func` parameter not yet supported; passing the string name of a numpy operation for the `func` parameter defaults to pandas |
| `aggregate` | Y | See `agg` |
| `align` | N | Defaults to pandas |
| `all` | Y | |
| `any` | Y | |
| `append` | Y | Can be further optimized to be non-blocking |
| `apply` | Y | See `agg` |
| `applymap` | Y | |
| `as_blocks` | N | Defaults to pandas |
| `as_matrix` | Y | Will not result in a distributed object |
| `asfreq` | N | Defaults to pandas |
| `asof` | N | Defaults to pandas |
| `assign` | N | Defaults to pandas |
| `astype` | Y | |
| `at` | N | Defaults to pandas |
| `at_time` | N | Defaults to pandas |
| `axes` | Y | |
| `between_time` | N | Defaults to pandas |
| `bfill` | Y | |
| `blocks` | N | Defaults to pandas |
| `bool` | Y | |
| `boxplot` | Y | |
| `clip` | Y | |
| `clip_lower` | Y | |
| `clip_upper` | Y | |
| `columns` | Y | |
| `combine` | N | Defaults to pandas |
| `combine_first` | N | Defaults to pandas |
| `compound` | N | Defaults to pandas |
| `consolidate` | N | Defaults to pandas |
| `convert_objects` | N | Defaults to pandas |
| `copy` | Y | Copy will always make a shallow copy |
| `corr` | N | Defaults to pandas |
| `corrwith` | N | Defaults to pandas |
| `count` | Y | |
| `cov` | N | Defaults to pandas |
| `cummax` | Y | |
| `cummin` | Y | |
| `cumprod` | Y | |
| `cumsum` | Y | |
| `describe` | Y | |
| `diff` | Y | |
| `div` | Y | See `__div__` |
| `divide` | Y | See `__div__` |
| `dot` | N | Defaults to pandas |
| `drop` | Y | |
| `drop_duplicates` | N | Defaults to pandas |
| `dropna` | Y | |
| `dtypes` | Y | |
| `duplicated` | N | Defaults to pandas |
| `empty` | Y | |
| `eq` | Y | See `__eq__` |
| `equals` | Y | Requires shuffle, can be further optimized |
| `eval` | Y | |
| `ewm` | N | Defaults to pandas |
| `expanding` | N | Defaults to pandas |
| `ffill` | Y | |
| `fillna` | Y | `value` parameter of type DataFrame defaults to pandas |
| `filter` | Y | |
| `first` | N | Defaults to pandas |
| `first_valid_index` | Y | |
| `floordiv` | Y | See `__floordiv__` |
| `from_csv` | Y | |
| `from_dict` | Y | |
| `from_items` | Y | |
| `from_records` | Y | |
| `ftypes` | Y | |
| `ge` | Y | See `__ge__` |
| `get` | Y | |
| `get_dtype_counts` | Y | |
| `get_ftype_counts` | Y | |
| `get_value` | N | Defaults to pandas |
| `get_values` | N | Defaults to pandas |
| `groupby` | Y | Not yet optimized, will require Distributed Series; `by` with a list of columns defaults to pandas |
| `gt` | Y | See `__gt__` |
| `head` | Y | |
| `hist` | N | Defaults to pandas |
| `iat` | N | Defaults to pandas |
| `idxmax` | Y | |
| `idxmin` | Y | |
| `iloc` | Y | |
| `index` | Y | |
| `infer_objects` | N | Defaults to pandas |
| `info` | Y | |
| `insert` | Y | |
| `interpolate` | N | Defaults to pandas |
| `is_copy` | N | Defaults to pandas |
| `isin` | Y | |
| `isna` | Y | |
| `isnull` | Y | |
| `items` | Y | |
| `iteritems` | Y | |
| `iterrows` | Y | |
| `itertuples` | Y | |
| `ix` | N | Defaults to pandas |
| `join` | Y | |
| `keys` | Y | |
| `kurt` | N | Defaults to pandas |
| `kurtosis` | N | Defaults to pandas |
| `last` | N | Defaults to pandas |
| `last_valid_index` | Y | |
| `le` | Y | See `__le__` |
| `loc` | Y | |
| `lookup` | N | Defaults to pandas |
| `lt` | Y | See `__lt__` |
| `mad` | N | Defaults to pandas |
| `mask` | N | Defaults to pandas |
| `max` | Y | |
| `mean` | Y | |
| `median` | Y | |
| `melt` | N | Defaults to pandas |
| `memory_usage` | Y | |
| `merge` | Y | Only implemented for `left_index=True` and `right_index=True`, defaults to pandas otherwise |

Currently, whenever a Series is used or returned, we use a pandas Series. In the future,
we’re going to implement a distributed Series, but until then there will be some
performance bottlenecks. The pandas Series is completely compatible with all operations
that both require and return one in Modin.

A number of IO methods default to pandas. We have parallelized read_csv and
read_parquet, and many of the remaining methods can be parallelized relatively easily.
An IO method that defaults to the pandas implementation reads the data serially into a
single, non-distributed DataFrame and then distributes it, which costs performance.
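
As a rough illustration of the "read serially, then distribute" path, the sketch below uses plain pandas only. It is a toy model under stated assumptions, not Modin's IO code: read_serial_then_distribute is a hypothetical helper, the CSV content is made up for the example, and a list of row blocks stands in for Modin's partitions.

```python
import io

import pandas as pd

# Made-up sample data for the example.
CSV_TEXT = "x,y\n1,a\n2,b\n3,c\n4,d\n5,e\n"


def read_serial_then_distribute(buffer, n_parts):
    """Hypothetical helper: one serial pandas read, then a split into row blocks."""
    # The whole file is parsed by a single pandas call (no parallelism here).
    df = pd.read_csv(buffer)
    # Only after the full read is the data split into partitions.
    size = max(1, -(-len(df) // n_parts))  # ceiling division
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]


parts = read_serial_then_distribute(io.StringIO(CSV_TEXT), 2)
part_sizes = [len(p) for p in parts]
```

A parallelized reader avoids the first step by having each worker parse its own portion of the input, which is where the performance difference comes from.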

If you import modin.pandas as pd, the following operations are available from
pd.<op>, e.g. pd.concat. If you do not see an operation that pandas enables and
would like to request it, feel free to open an issue. Make sure you tell us your
primary use-case so we can make it happen faster!