Hi everyone, we're considering moving from Sphinx to Elasticsearch, but I
want to make sure it is a good fit before rewriting our infrastructure.

Currently we have 20 dual octo-core 2690 machines with 32GB of RAM. We
handle about 2,000 queries per second with the existing setup, but we have
some pain points that I believe ES can help with.

Index rotation. Our index is about 30GB in size, but the nature of
Sphinx means that each week when our dataset is updated, we have to reindex
the entire dataset. Further, we have to do this on all 20 machines. We
currently reindex on one central machine, then rsync the data to the 20
main servers, then perform rolling restarts. Rsyncing 30GB of data to 20
machines takes too long, and it is only going to get worse as the number of
servers increases.

MySQL... Since Sphinx uses MySQL as a datastore, we are going to reach a
point where our database becomes a bottleneck. Our MySQL servers have no
problem handling the current load of 5,000 or so queries per second, but DB
servers are expensive to scale, and I would rather store the data in ES and
skip SQL completely.

My understanding is that ES shards the index across machines as they are
added. We have worked with Cassandra in the past. The concept seems very
similar?

I know it's difficult to predict, but how is query performance with ES? Is
CPU or memory/IO the main bottleneck? We're moving to a new datacenter
where we will have 30 or so dual hex-core machines but with 72GB of memory
each instead of the 32 we have in the current machines. Since ES shards the
index across machines, we should have no problem storing everything in
memory, so I'm guessing we would still be CPU bound (as we are with Sphinx.
It isn't a problem, just wondering if I can expect the same).

Has anyone here moved a large-scale Sphinx cluster over to ES? Any gotchas
that I'm overlooking? Any super easy migration plan you found that will cut
our development time from months down to days/weeks lol? I know, wishful
thinking.

We helped a client successfully migrate from Sphinx to Solr last year.
Migration to ES should be similar to what we did for them for Solr.

Inline...

On Friday, January 3, 2014 12:08:55 PM UTC-5, Brian Lovett wrote:

Hi everyone, we're considering moving from Sphinx to Elasticsearch, but I
want to make sure it is a good fit before rewriting our infrastructure.

Currently we have 20 dual octo-core 2690 machines with 32GB of RAM. We
handle about 2,000 queries per second with the existing setup, but we have
some pain points that I believe ES can help with.

Index rotation. Our index is about 30GB in size, but the nature of
Sphinx means that each week when our dataset is updated, we have to reindex
the entire dataset. Further, we have to do this on all 20 machines. We
currently reindex on one central machine, then rsync the data to the 20
main servers, then perform rolling restarts. Rsyncing 30GB of data to 20
machines takes too long, and it is only going to get worse as the number of
servers increases.

Right, this sounds a bit old-style. With ES you'll be able to do
incremental indexing and you'll be able to forget about rsync.
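To make "incremental indexing" a bit more concrete, here is a rough sketch of
building a bulk request body that contains only the documents that changed
since the last update. The index name "products", the document IDs, and the
fields are made up for illustration, and the exact metadata fields the bulk
API expects depend on your ES version, so treat this as an assumption to
check against the docs rather than a finished client.

```python
import json

def build_bulk_payload(index_name, changed_docs):
    """Build a newline-delimited bulk body for the changed documents only.

    Each document becomes two lines: an action line, then the source line.
    The index name and document shape here are hypothetical.
    """
    lines = []
    for doc_id, source in changed_docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk format is newline-delimited and ends with a trailing newline.
    return "\n".join(lines) + "\n"

# Hypothetical weekly delta: one updated document and one new one.
changed = {"42": {"title": "updated listing"}, "43": {"title": "new listing"}}
payload = build_bulk_payload("products", changed)
print(payload)
```

You would send a body like this to the cluster once and let ES distribute
it to the right shards and replicas, instead of copying a full index to
every machine.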

MySQL... Since Sphinx uses MySQL as a datastore, we are going to reach
a point where our database becomes a bottleneck. Our MySQL servers have no
problem handling the current load of 5,000 or so queries per second, but DB
servers are expensive to scale, and I would rather store the data in ES and
skip SQL completely.

Right. Software like ES was built with sharding and replication in mind.
You still have to think about how to shard, replicate, query, etc. but it
will feel much more natural.

My understanding is that ES shards the index across machines as they are
added. We have worked with Cassandra in the past. The concept seems very
similar?

Similar, but not quite the same. You tell ES how many shards you want to
split your index into when you create the index. ES currently cannot split
an existing shard, and you currently can't change the number of shards
later on, but there are ways around that if and when you end up needing it.
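As a rough illustration of picking the shard count up front, here is a
sketch of an index-creation body using the number_of_shards and
number_of_replicas settings. The values 10 and 2 are placeholders I made up,
not a recommendation for your cluster.

```python
import json

def index_settings(num_shards, num_replicas):
    """Settings body for index creation; the shard count is fixed at this point."""
    return {
        "settings": {
            "number_of_shards": num_shards,
            "number_of_replicas": num_replicas,
        }
    }

def total_shard_copies(num_shards, num_replicas):
    """Primaries plus replica copies that the cluster has to place on nodes."""
    return num_shards * (1 + num_replicas)

# Hypothetical values: 10 primary shards, 2 replicas of each.
body = index_settings(10, 2)
print(json.dumps(body))
print(total_shard_copies(10, 2))  # 30
```

The replica count can be changed on a live index; it is the number of
primary shards that you have to decide when the index is created.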

I know it's difficult to predict, but how is query performance with ES? Is
CPU or memory/IO the main bottleneck? We're moving to a new datacenter
where we will have 30 or so dual hex-core machines but with 72GB of memory
each instead of the 32 we have in the current machines. Since ES shards the
index across machines, we should have no problem storing everything in
memory, so I'm guessing we would still be CPU bound (as we are with Sphinx.
It isn't a problem, just wondering if I can expect the same).

Right, it's impossible to tell without a lot more details: query types,
complexity, cache utilization, etc. But with 30 dual hex-core servers and
72 GB of RAM each, everything should fit in memory if your index ends up
being more or less the same size as in Sphinx.
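A back-of-the-envelope check of the memory claim, assuming the index stays
around 30 GB and is spread evenly over 30 nodes. The replica count of 2 is a
made-up example, and the calculation ignores JVM heap and OS overhead, so it
is only a sanity check.

```python
def data_per_node_gb(index_gb, num_replicas, num_nodes):
    """Average on-disk data per node if shards are spread evenly."""
    return index_gb * (1 + num_replicas) / num_nodes

# 30 GB index, 2 replicas (hypothetical), 30 nodes with 72 GB RAM each.
per_node = data_per_node_gb(30, 2, 30)
print(per_node)        # 3.0
print(per_node < 72)   # True
```

Even the worst case of a full copy on every node (30 GB per node, as with
the current Sphinx setup) is well under 72 GB.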

Has anyone here moved a large-scale Sphinx cluster over to ES? Any
gotchas that I'm overlooking? Any super easy migration plan you found that
will cut our development time from months down to days/weeks lol? I know,
wishful thinking.

In short: good decision. It shouldn't take months as far as ES work is
concerned.