Now it’s time to get serious and look at writing some simple code that can query a running Sphinx index and take advantage of its growing number of advanced query features. The Sphinx Documentation is obviously the definitive reference, but I hope to show just enough sample code that you realize how easy it is to start talking to a Sphinx server.

API Basics

There are Sphinx clients available in most popular languages. If you look in the api subdirectory of the source tree, you’ll find Ruby, Java, PHP, Python, and C (libsphinxclient). There’s a Perl module available (Sphinx-Search on CPAN) too. And if none of those are sufficient, the latest versions of Sphinx even support SQL-like queries (SphinxQL) issued via the MySQL protocol. Note that searchd listens for these on whichever port you configure for that protocol (9306 by convention), not on MySQL’s own port 3306. Talk about an easy migration path from MySQL full-text!

Obviously the syntax differs between languages, but the general approach for querying Sphinx is similar in all of them.

1. Create a Sphinx client object
2. Set query options
3. Set query
4. Connect to Sphinx server (if not connected)
5. Send query
6. Receive results
7. Close connection

Note that it’s possible to batch queries and send several at once. Doing so lets Sphinx work more efficiently when the queries share work that can be done only once. That’s rarely needed in traditional web deployments, but it can be useful in offline processing. Also, newer Sphinx releases support persistent connections. Not only do they reduce the fork() overhead (which can be substantial!) on the server side, they also reduce the TCP overhead and allow for higher throughput in high-volume situations. As a result, steps 4 and 7 (connecting and closing the connection) may not always apply.

Let’s look at a simple PHP code example that connects to the Sphinx server running on localhost, searches for all documents that contain the phrase “hello world”, and sorts them by size.
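
A minimal sketch of such a script follows. It assumes sphinxapi.php is on the include path and that searchd is listening on its native API port (9312 by default in recent releases); the index name "documents" is hypothetical, and "size" is assumed to be an attribute of that index.

```php
<?php
// Minimal sketch, not a definitive implementation.
// Assumptions: sphinxapi.php is reachable, searchd runs on localhost:9312,
// and an index named "documents" (hypothetical) has a "size" attribute.
require 'sphinxapi.php';

$host  = 'localhost';
$port  = 9312;
$index = 'documents';
$query = 'hello world';

$cl = new SphinxClient();
$cl->SetServer($host, $port);
$cl->SetMatchMode(SPH_MATCH_PHRASE);           // match the words as a phrase
$cl->SetSortMode(SPH_SORT_ATTR_DESC, 'size');  // largest documents first

$result = $cl->Query($query, $index);

if ($result === false) {
    // Query() returns false on failure; the client keeps the reason.
    echo 'Query failed: ' . $cl->GetLastError() . "\n";
    exit(1);
}

if (empty($result['matches'])) {
    echo "No matches\n";
} else {
    // By default the matches are keyed by document id.
    foreach ($result['matches'] as $docId => $match) {
        echo "document $docId (weight {$match['weight']})\n";
    }
}
```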

That code makes use of the single file sphinxapi.php, which is the PHP client API shipped as part of every Sphinx release. In fact, the test suite used to validate new releases uses the PHP API heavily, so you can probably find example code to do just about anything you’d need.

As you can see, it follows the process outlined above. After a few variables are defined, we create a new Sphinx client object ($cl), set a few options, and then fire off the query. Iterating over the results is also very straightforward. The example above is intentionally short; it’s also possible to retrieve some metadata (namely, the attributes) for each of the matched documents in the result set.
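
To illustrate the per-document metadata, here is a small sketch that walks a result array and prints each match with its attributes. The helper name describeMatches() and the sample data are made up for illustration; the array shape (matches keyed by document id, each with "weight" and "attrs") is what the PHP client returns by default.

```php
<?php
// Sketch only: formats the array returned by SphinxClient::Query().
// describeMatches() is a hypothetical helper, not part of sphinxapi.php.
function describeMatches(array $result): array
{
    $lines = [];
    foreach ($result['matches'] ?? [] as $docId => $match) {
        $attrs = [];
        foreach ($match['attrs'] ?? [] as $name => $value) {
            // Multi-value attributes arrive as arrays; join them for display.
            $attrs[] = $name . '=' . (is_array($value) ? implode(',', $value) : $value);
        }
        $lines[] = "doc $docId weight {$match['weight']} " . implode(' ', $attrs);
    }
    return $lines;
}

// Hand-built sample in the same shape, so the helper can be tried without a server.
$sample = [
    'total_found' => 2,
    'matches' => [
        17 => ['weight' => 2, 'attrs' => ['size' => 4096]],
        42 => ['weight' => 1, 'attrs' => ['size' => 512]],
    ],
];
echo implode("\n", describeMatches($sample)) . "\n";
```

With a live server you would pass the return value of $cl->Query() instead of $sample.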

Building on that simple foundation, there’s a lot more we can do.
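
For instance, the batching and persistent connections mentioned earlier might be combined as in the sketch below. It uses the client's Open(), AddQuery(), RunQueries(), and Close() calls; the helper name runBatch() and the index name in the usage comment are assumptions.

```php
<?php
// Sketch only: $cl is expected to be a SphinxClient instance from
// sphinxapi.php. runBatch() is a hypothetical helper.
function runBatch($cl, array $queries, string $index): array
{
    // Open a persistent connection; later requests reuse it until Close().
    if (!$cl->Open()) {
        return [];
    }
    foreach ($queries as $query) {
        $cl->AddQuery($query, $index);   // queued locally, nothing is sent yet
    }
    $results = $cl->RunQueries();        // the whole batch goes out in one request
    $cl->Close();

    // RunQueries() returns false on a general failure; otherwise one
    // result set per queued query, in the order they were added.
    return $results === false ? [] : $results;
}

// Usage against a real server (needs sphinxapi.php and a running searchd):
//   require 'sphinxapi.php';
//   $cl = new SphinxClient();
//   $cl->SetServer('localhost', 9312);
//   $results = runBatch($cl, ['hello', 'world'], 'documents');
```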

Matching Modes

In the example code we used a call to SetMatchMode(), passing SPH_MATCH_PHRASE. That told Sphinx we wanted a phrase match; that is, find “hello” and “world” used together. There are several other matching modes available.

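SPH_MATCH_ALL: match documents that contain all of the query words (the default)

SPH_MATCH_ANY: match documents that contain any of the query words

SPH_MATCH_BOOLEAN: treat the query as a Boolean expression

SPH_MATCH_EXTENDED2: support queries using Sphinx’s more complex query language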

SPH_MATCH_FULLSCAN: search all documents, applying any specified filters and grouping

Between Boolean and extended2 (which replaces the original “extended” mode), you can construct queries complex enough for just about any circumstance.
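
As a rough illustration of the extended syntax, the sketch below lists a few query strings and a helper that runs one of them in SPH_MATCH_EXTENDED2 mode. The field names (title, body), the helper names, and the index name are hypothetical.

```php
<?php
// Sketch only: $cl is expected to be a SphinxClient from sphinxapi.php.
// Field names "title" and "body" are assumptions about the index.
function extendedQueryExamples(): array
{
    return [
        'field-limited' => '@title hello @body world',  // each word in a given field
        'boolean OR'    => 'hello | world',             // either word
        'exclusion'     => 'hello -world',              // first word but not the second
        'phrase'        => '"hello world"',             // exact phrase
        'proximity'     => '"hello world"~10',          // both words within 10 words
    ];
}

function runExtended($cl, string $query, string $index)
{
    $cl->SetMatchMode(SPH_MATCH_EXTENDED2);
    return $cl->Query($query, $index);   // false on failure, result array otherwise
}

// Usage (with sphinxapi.php loaded and $cl connected to a server):
//   $result = runExtended($cl, extendedQueryExamples()['proximity'], 'documents');
```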

Sorting Modes

Sphinx allows you to choose from several sorting modes that affect the order in which results are returned but not which documents match the query.

SPH_SORT_RELEVANCE: Sphinx default, sort from most relevant to least based on word frequency

SPH_SORT_ATTR_ASC: sort in ascending order based on an attribute

SPH_SORT_ATTR_DESC: sort in descending order based on an attribute

SPH_SORT_TIME_SEGMENTS: group by “time segment”, then sort by relevance within the groups

SPH_SORT_EXTENDED: configure sorting based on multiple attributes, each of which can be in ascending or descending order

SPH_SORT_EXPR: sort based on an arbitrary mathematical expression

To make this more concrete, consider this call:

$cl->SetSortMode(SPH_SORT_ATTR_DESC, "size");

That asks Sphinx to sort the documents from largest to smallest (based on the size attribute included in the earlier index definition).

In “extended” mode, you can use the attributes defined for your index as well as some of Sphinx’s internal attributes.

$cl->SetSortMode(SPH_SORT_EXTENDED, "size ASC, @id DESC");

That tells Sphinx to sort in ascending order by size and then in descending order by document id in the case of a tie. Extended mode is very powerful, especially if you have numerous attributes on which to sort (price, weight, date added, etc.).
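
The two remaining modes from the list, SPH_SORT_EXPR and SPH_SORT_TIME_SEGMENTS, take the same kind of call. The sketch below wraps them in a hypothetical helper; the strategy labels and the date_added timestamp attribute are assumptions, and the expression is only one example of what could be written.

```php
<?php
// Sketch only: $cl is expected to be a SphinxClient from sphinxapi.php.
// "size" and "date_added" are assumed attributes; the strategy names are made up.
function applySortStrategy($cl, string $strategy): void
{
    switch ($strategy) {
        case 'blend':
            // Arithmetic expression combining the relevance weight with size.
            $cl->SetSortMode(SPH_SORT_EXPR, '@weight + size / 1000');
            break;
        case 'recent':
            // Bucket by age using a timestamp attribute, then by relevance.
            $cl->SetSortMode(SPH_SORT_TIME_SEGMENTS, 'date_added');
            break;
        default:
            // Fall back to plain relevance ordering.
            $cl->SetSortMode(SPH_SORT_RELEVANCE);
    }
}
```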

Filtering

In addition to full-text search capabilities, Sphinx lets you use numeric attributes to refine a search. For example, in building a product search, you may want to find all products whose price is less than $500. Or maybe find all those that fall between $50 and $75. To do this, you’ll want to call SetFilter(), SetFilterRange(), or SetFilterFloatRange(). All three filtering functions take an optional exclude flag, so you can specify either an inclusive or exclusive filter.

Using SetFilter(), you can find documents whose attributes match one or more values, or exclude those documents that match one or more values.
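
A short sketch of the three calls together follows. All attribute names and values are hypothetical; multiple filters set on the same client are combined, so a document must pass every one of them.

```php
<?php
// Sketch only: $cl is expected to be a SphinxClient from sphinxapi.php.
// Attribute names (price, category_id, discontinued, rating) are made up.
function applyProductFilters($cl): void
{
    // Integer range, inclusive at both ends: 50 <= price <= 75.
    $cl->SetFilterRange('price', 50, 75);

    // Value list: keep documents whose category_id is 3, 7, or 12.
    $cl->SetFilter('category_id', [3, 7, 12]);

    // Same call with the exclude flag set: drop documents where discontinued is 1.
    $cl->SetFilter('discontinued', [1], true);

    // Float attributes use their own range call.
    $cl->SetFilterFloatRange('rating', 4.0, 5.0);
}
```

Calling $cl->ResetFilters() clears them again before the next query.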

Between filters on attributes and the extended query language, you can handle a surprising array of query types without having to write a lot of custom code.

Geography

A special case of filtering and sorting based on attributes is geo-distance. If you have geocoded data, such as houses for sale or the locations of restaurants, you can add latitude and longitude attributes to your index and take advantage of Sphinx’s built-in support. In the SPH_SORT_EXPR sorting mode, you can use the built-in GEODIST() function to compute the distance between two points of latitude and longitude. But it’s easier to use the SetGeoAnchor() call to tell Sphinx what the latitude and longitude attributes are called in your index and specify an “anchor” point from which distances will be computed.

$cl->SetGeoAnchor("lat", "lon", $latitude, $longitude);

Once that is done, you can use the magic attribute @geodist in both filters and sorting. That would allow you to, say, find all pizza places within a 5-mile radius of a given point and then sort the result set based on that distance.
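
The pizza example might be sketched as follows. Two details are easy to miss: SetGeoAnchor() expects the anchor (and the stored attributes) in radians, and @geodist is expressed in meters. The helper names are hypothetical, and the lat and lon attributes are assumed to be float attributes stored in radians.

```php
<?php
// Sketch only: $cl is expected to be a SphinxClient from sphinxapi.php.
// "lat" and "lon" are assumed to be float attributes stored in radians.

function milesToMeters(float $miles): float
{
    return $miles * 1609.344;   // meters in one international mile
}

function nearestWithin($cl, float $latDeg, float $lonDeg, float $radiusMiles): void
{
    // The anchor point must be given in radians, not degrees.
    $cl->SetGeoAnchor('lat', 'lon', deg2rad($latDeg), deg2rad($lonDeg));

    // @geodist is in meters, so convert the radius before filtering.
    $cl->SetFilterFloatRange('@geodist', 0.0, milesToMeters($radiusMiles));

    // Closest results first.
    $cl->SetSortMode(SPH_SORT_EXTENDED, '@geodist ASC');
}

// Usage (with sphinxapi.php loaded and $cl connected to a server):
//   nearestWithin($cl, 37.7749, -122.4194, 5.0);
//   $result = $cl->Query('pizza', 'restaurants');   // index name is hypothetical
```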

Conclusion

Hopefully this has provided you with some ideas for the types of tweaking you can do behind the scenes to make Sphinx search just the way you expect (and need) it to. In addition to everything we’ve seen so far, Sphinx can also perform more complex grouping of results and it can also build “excerpts” of matched documents on the fly to show context (much like Google does). As always, it’s best to check the documentation for complete descriptions of the options as well as any gotchas or hints.

Happy searching!

Comments on "Sphinx: Queries and APIs"

The port you’re using for connecting (3306) is the default for mysql, not sphinx.
