Activity

Yonik Seeley added a comment - 17/Sep/08 22:42

I haven't yet looked at the contributions (but I did read the whitepaper).

It seems like we want lat+lon in the same field value. That will remove the need for any other mechanism to correlate the two (what lat goes with what lon) and will allow future indexing mechanisms that operate on both values at once.

Do we need a new basic output type (in addition to str, int, long, etc)? For now perhaps we should just use a string representation?
<str name="my_house">12.345,-67.89</str>
or in JSON
"my_house":"12.345,-67.89"

So breaking things down, it seems like we basically need to be able to:
1) filter by a bounding box
2) filter by a geo radius (impl could first get the bounding box and narrow within that...)
3) sort by distance
4) return the distance

It also seems like there could be an opportunity to make much/most of this generic (not specific to geosearch).

Ryan McKinley added a comment - 18/Sep/08 00:00

LocalLucene/Solr are currently designed to do exactly points 1-4.

As for storing lat/lon in a single field... that sounds really interesting. Currently the local lucene stuff uses two fields and NumberUtils.java to index/store the distance. It does a lot of good work to break various bounding box levels into tokens and only performs math on the minimum result set.

We should consider a geohash field type: http://en.wikipedia.org/wiki/Geohash to store lat/lon in a single string. This has some really interesting features that are ideal for lucene. In particular, checking if a point is within a bounding box is simply a lexicographic range query.

Here is a public domain python geohash implementation: http://mappinghacks.com/code/geohash.py.txt
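For reference, the encoding behind that link can be sketched in a few lines of Java (a hypothetical `GeoHash` class, not code from the patch): lat/lon are bisected bit by bit, longitude first, and every 5 bits become one base-32 character. Because nearby points share leading bits, the bounding-box check mentioned above reduces to a lexicographic prefix/range match.

```java
public final class GeoHash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    public static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // even bits encode longitude, odd bits latitude
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            if (evenBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch = ch << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // every 5 bits becomes one base-32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Canonical example from the Wikipedia article linked above.
        System.out.println(GeoHash.encode(57.64911, 10.40744, 11)); // u4pruydqqvj
    }
}
```

Note that truncating the hash just widens the cell: `encode(lat, lon, 5)` is by construction a prefix of `encode(lat, lon, 11)`, which is what makes prefix queries equivalent to coarse bounding cells.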

Ryan McKinley added a comment - 18/Sep/08 02:58

We should also consider the OGC standard "Well Known Text":
http://en.wikipedia.org/wiki/Well-known_text
This is what MySQL and PostGIS use to enter GIS data.

patrick o'leary added a comment - 18/Sep/08 16:07

Hey guys

Placing both lat and long in the same field is good when used internally, but the majority of users of localsolr have separate fields representing lat and long, so make sure the representation does not affect the original document.

WKT uses "point" as the naming convention for single items, and I'd suggest that rather than just str, it would also be nice to have a KML wt format as well. I've done some stuff integrating mapping components, and KML goes down real well.

However, be aware that as soon as you start supporting WKT, you will be asked for ESRI support, polygon support, ray tracing, collision detection, and a lot more fun things.

P

Ryan McKinley added a comment - 18/Sep/08 17:38

I'm looking over the code grant now... (thanks again!)

There are two implementations of LocalUpdateProcessorFactory:
com.mapquest
com.pjaol

They do slightly different things...

The pjaol version creates a CartesianTierPlotter and then builds a bunch of fields for each level: _localTierN

The mapquest version puts a bunch of spatial tokens (sid/SpatialIndex) into a single field.

Any pointers on why one approach over the other? Do they solve the same problem?

The mapquest version seems like it could be easily replaced with an Analyzer... perhaps one that takes a single lat/lon string:
<str name="location">12.345 -67.89</str>
and then generates tokens for it. All the plumbing to encode the data in an updateProcessor and then decode it in a FieldType seems a bit awkward.

patrick o'leary added a comment - 18/Sep/08 19:28

I believe you guys are using a branch of the code, as we were looking at using the mapquest sids. Both versions are solving the same basic problem: creating a pseudo quad tree implementation. com.pjaol was the initial API I built; com.mapquest was donated to us by MapQuest.

Both versions work by flattening out the earth onto a series of grids; the grids get progressively smaller with each _localTierN, and in the MapQuest version there is a notion of zooming. Some quick info graphics here:
http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html

The differences are: com.pjaol uses pretty exact measurements. The flattening method is based on something called a sinusoidal projection, where I translate lat/longs to x,y coordinates, which provides an equally spaced projection on a flat surface. Then I use GeoTools for the actual precise distance calculation. It comes at a slight performance cost to be that exact, but users have a need for it.

The com.mapquest code does a direct conversion to cartesian x,y coordinates from lat/long, encodes and generates sids, and uses a standard great circle equation for distance calculation. So it is not as convoluted. It does come at a slight accuracy cost, but only in a few places: Greenland, New Zealand, some places around the poles and equator. So it's perfect for web based applications, as the +/- error differential is small enough to be acceptable for most users.

There is, however, a good audience for local lucene who use it for more exact calculation, even down to the meter range. It's also used by some research groups for non-land based activities, hence the desire to retain the exactness.
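The two strategies described here can be sketched as standalone math (illustrative only, not the actual local-lucene classes; the earth radius constant is an assumption):

```java
public final class DistanceSketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    // com.pjaol-style flattening: a sinusoidal (equal-area) projection maps
    // lat/lon onto x,y so that grid cells have comparable spacing.
    public static double[] sinusoidalProject(double lat, double lon) {
        double latRad = Math.toRadians(lat);
        double lonRad = Math.toRadians(lon);
        return new double[] { lonRad * Math.cos(latRad), latRad };
    }

    // com.mapquest-style distance: the standard great-circle (haversine) formula.
    public static double greatCircleKm(double lat1, double lon1,
                                       double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // One degree of longitude at the equator is roughly 111 km.
        System.out.println(greatCircleKm(0, 0, 0, 1));
    }
}
```

The accuracy difference mentioned above comes from the spherical-earth assumption in the haversine formula versus the more precise (GeoTools-backed) geodesic calculation.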

Ryan McKinley added a comment - 18/Sep/08 20:41

Thanks for the clarification...

> Then I use GeoTools for the actual precise distance calculation.

FYI, in the initial apache check-in, I'm removing the GeoTools dependency (it is LGPL). I'd like to make the distanceHandler logic for distance calculations pluggable so it's easy to link to GeoTools when necessary.

Ryan McKinley added a comment - 01/Jan/09 19:53

Thanks patrick!

Two things stick out to me:
1. LocalSolrQueryComponent duplicates most of the code from SolrQueryComponent. Perhaps a better solution would be to have a custom QParser that builds the query, and then add a SearchComponent to the chain to augment the results with the calculated distance.
2. (related) If the query is implemented as a QParser, we would just need to implement:
public SortSpec getSort( boolean useGlobalParams ) throws ParseException
rather than use the LocalSolrSortParser.

Ryan McKinley added a comment - 01/Jan/09 19:58

Not a big deal, but it looks like the List<CartesianTierPlotter> plotters could be initialized once for the Factory and then reused, rather than initializing it for each request.

Ryan McKinley added a comment - 02/Jan/09 01:15

Here is a (totally untested) patch that uses QParser. This requires some small tweaks to the QParser class to make the sort parsing extensible. Take a look and see what you think...

Ryan McKinley added a comment - 02/Jan/09 04:51

This version runs, but still no tests. I added spatial stuff to the example configs, but I'm not sure I like that long term. The examples are getting a bit cluttered.
http://localhost:8983/solr/select?q=*:*&qt=geo&lat=40&long=-75&radius=99

patrick o'leary added a comment - 02/Jan/09 20:50

Lucene uses a static sort comparator, getCachedComparator, in lucene's FieldSortedHitQueue.java. The assumption being that the sort comparator would never have any data in it, I guess.

As the distances in the geo sort are a hashmap produced by the distance query, the ScoreDocComparator creates a memory leak unless the scope of the distance query is within the process block. It's messy, but it's the only workaround I could find.

Putting the distance query in the response builder could make this leak again.
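The leak pattern described here can be illustrated generically (a standalone toy, not the actual FieldSortedHitQueue code): a static cache of comparators pins every comparator it has ever seen, and a comparator that closes over per-request data, like the distance hashmap, keeps that data reachable forever.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class ComparatorCacheLeak {
    // Stand-in for Lucene's static comparator cache.
    static final List<Object> CACHE = new ArrayList<>();

    static Object makeDistanceComparator(final Map<Integer, Double> distances) {
        // The comparator closes over the per-request distances map.
        java.util.Comparator<Integer> cmp =
            (a, b) -> Double.compare(distances.get(a), distances.get(b));
        CACHE.add(cmp); // the static cache now pins the map in memory
        return cmp;
    }

    public static void main(String[] args) {
        for (int request = 0; request < 3; request++) {
            Map<Integer, Double> perRequest = new HashMap<>();
            perRequest.put(1, 42.0 + request);
            makeDistanceComparator(perRequest);
            // perRequest goes out of scope here, but the cached comparator
            // still references it, so it can never be garbage collected.
        }
        System.out.println(CACHE.size()); // grows with every request
    }
}
```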

Ryan McKinley added a comment - 02/Jan/09 21:15

Hmmm, I don't follow. Is the problem that the HashMap stays in static memory for each request? If so, could we put the map in the request context?

Is this an issue with the lucene Sort Comparator interface, or with how the solr implementation passes the results around?

patrick o'leary added a comment - 02/Jan/09 21:29

It's because of the FieldSortedHitQueue in lucene: even though sorts are generally created as new objects, the FieldSortedHitQueue maintains a static cache of them.

Somebody actually had another workaround:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/200806.mbox/%3C571296.22735.qm@web50301.mail.re2.yahoo.com%3E
I haven't tried it, but it might be an option.

patrick o'leary added a comment - 06/Jan/09 16:38

There is a patch, LUCENE-1304, for SortComparatorSourceUncacheable, which hasn't had any TLC in a while. It's been associated with LUCENE-1483, which looks like a major change that could take a while to get in.

I'd like to see if we can get movement on LUCENE-1304, as it would help with some of the scope madness I've had to deal with, and resolve the issue once and for all.

Ryan McKinley added a comment - 13/Apr/09 19:08

dooh - here is a patch that includes SpatialParams.

I just ran 'svn up' and 'ant test', and a bunch of solrj things fail. I can't look into it just now, but I'll post anyway.

- - - -

Note, this patch has a bunch of weirdness to try to avoid a memory error with custom sorting in lucene. The new field options in LUCENE-1483 should avoid this problem, but LocalLucene must be refactored to use the new sorting classes first.

patrick o'leary added a comment - 13/Apr/09 19:51

Thanks Ryan. I've also updated local/spatial lucene to use the new FieldComparatorSource with LUCENE-1588, but haven't had a chance to test it in Solr yet.

Grant Ingersoll added a comment - 15/Apr/09 20:11

I started documentation at: http://wiki.apache.org/solr/LocalSolr

I've also at least taken care of PJ's comment on incorporating FieldCompSource from a compilation standpoint. I'm in the process of setting up some unit tests as well.

Grant Ingersoll added a comment - 15/Apr/09 20:28

We should be able to incorporate the GeoHash stuff in Lucene now, right? I'm no spatial expert, but this means we could have an update processor that only uses one field, right?

patrick o'leary added a comment - 15/Apr/09 21:11

GeoHash can be incorporated to reduce memory, but it should be optional, as there's still overhead in decoding the field for distance calculations. Again, I haven't been able to put a benchmark together for it, but I did notice it was slower.
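The decoding overhead mentioned here comes from walking the hash back to a lat/lon on every distance calculation. A sketch of the usual decode (hypothetical class, mirror of the standard encoding):

```java
public final class GeoHashDecode {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Returns {lat, lon}: the centre of the cell the hash describes. */
    public static double[] decode(String hash) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        boolean evenBit = true; // even bits encode longitude, odd bits latitude
        for (int i = 0; i < hash.length(); i++) {
            int ch = BASE32.indexOf(hash.charAt(i));
            for (int bit = 4; bit >= 0; bit--) {
                int b = (ch >> bit) & 1;
                if (evenBit) {
                    double mid = (lonMin + lonMax) / 2;
                    if (b == 1) lonMin = mid; else lonMax = mid;
                } else {
                    double mid = (latMin + latMax) / 2;
                    if (b == 1) latMin = mid; else latMax = mid;
                }
                evenBit = !evenBit;
            }
        }
        return new double[] { (latMin + latMax) / 2, (lonMin + lonMax) / 2 };
    }

    public static void main(String[] args) {
        double[] p = decode("u4pruydqqvj");
        System.out.println(p[0] + "," + p[1]); // approximately 57.64911, 10.40744
    }
}
```

Every stored point pays this bit-unpacking cost per comparison, which is consistent with the slowdown observed above.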

Grant Ingersoll added a comment - 15/Apr/09 21:19

Here's a patch that compiles, and the example works. The Lucene gzip contains the Lucene libs that I used (basically trunk from two nights ago), including the spatial contrib. It incorporates LUCENE-1588 for sorting.

Still needs tests and some more example data.

Grant Ingersoll added a comment - 15/Apr/09 21:34

OK, so color me a total geo newbie, but... if I index the spatial.xml in the patch I just submitted and execute:
http://localhost:8983/solr/select?q=name:five
I get one result, which is expected.

If I then do a geo search:
http://localhost:8983/solr/select?q=name:five&qt=geo&long=-74.0093994140625&lat=40.75141843299745&radius=100&debugQuery=true
I get two results. The second result is the other theater in the spatial.xml file. Yet it does not contain the value "five" in the name field, even though it meets the spatial search criteria.

Shouldn't there just be one result? What am I not understanding?

Grant Ingersoll added a comment - 15/Apr/09 22:38

OK, I think I understand why it does this, but it seems a little odd to me. The reason is that the geo handler uses the geo QParser, which ignores the query parameter and produces a query based solely on the lat/lon information.

Like I said, I'm a newbie to geo search, but it seems like the QParser should delegate the parsing of the q param to some other parser, and then it would only do distance calculations on the docset returned from the QueryComponent. Of course, I guess one could ask what the semantics are of combining a text query with a spatial query, but I would suppose we could combine them with either AND or OR, right? Such that if I OR'd them together, I would get all docs matching the query term OR'd with all docs in the bounding box. Similarly, AND would yield all docs with the term in the bounding box, right?

Again, I am likely missing something, so bear with me.

patrick o'leary added a comment - 16/Apr/09 21:45

Looking at it, there's no actual query parsing going on. You could call LuceneQParser, but it just doesn't seem like the right place for it. The original LocalSolr code created a filter to perform the geo-distance stuff, but it did have to duplicate a lot of the SearchComponent code.

patrick o'leary added a comment - 17/Apr/09 19:38

This fixes the query parsing issue. It defaults to using the default QParserPlugin, and allows you to specify an optional basedOn argument to use a different QParserPlugin:
<queryParser name="spatial_tier" class="org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin" basedOn="dismax" />

There are a couple of things to note:
1) Latest distance facet code not included
2) Faster distance filter using query intersect isn't working (spatial lucene fix)
3) fsv for shard sorting not present

I feel fsv should be extracted to a separate component to reduce the duplication of effort across other search components. But this will give the basics for the moment.

Yonik Seeley added a comment - 25/Apr/09 16:04

It seems like quite a lot of work has gone into working around some of Solr's current limitations... perhaps we should fix them instead? It seems like we should be able to avoid custom request handlers, query components, or update processors and simply use generic mechanisms.

From the user interface point of view, what's needed is:

1) A way to filter by a bounding box. This could simply be a custom QParser:
fq={!gbox p=101.2,234.5 f=position d=1.5}
// a bounding box, centered on 101.2,234.5 including everything within 1.5 miles

2) A function query that calculates distances:
gdist(position,101.2,234.3)

3) A way to sort by a function query... this is generic desired functionality anyway!

4) A way to return the value of a function query for documents - also generic desired functionality. Perhaps use meta as proposed in SOLR-705?

If we had that, then geo becomes very generic - no need for special distributed search support, and one could do things like boosting the relevancy score by a function of the distance (not even necessarily a linear boost, because of the flexibility of function query). If/when we get faceting on a function query, it will also automatically work with distances.

It seems like points should be stored and represented in a single field; that way there can be multiple points per document (otherwise how would one correlate which latitude went with which longitude). How it's indexed (multiple fields, etc) is more of an implementation detail. There is an issue with how to allow a single field to index to multiple fields - another Solr limitation we should figure out how to fix (an earlier version of TrieRangeQuery needed this too).
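A minimal sketch of what a gbox-style parser might compute from a centre point and a distance in miles (the class, the 69-miles-per-degree figure, and the parameter handling are all illustrative assumptions; a real implementation would also handle the poles and the date line):

```java
public final class GBoxSketch {
    static final double MILES_PER_DEGREE_LAT = 69.0;

    /** Returns {latMin, latMax, lonMin, lonMax}. */
    public static double[] boundingBox(double lat, double lon, double miles) {
        double dLat = miles / MILES_PER_DEGREE_LAT;
        // Longitude degrees shrink with cos(latitude).
        double dLon = miles / (MILES_PER_DEGREE_LAT * Math.cos(Math.toRadians(lat)));
        return new double[] { lat - dLat, lat + dLat, lon - dLon, lon + dLon };
    }

    public static boolean contains(double[] box, double lat, double lon) {
        return lat >= box[0] && lat <= box[1] && lon >= box[2] && lon <= box[3];
    }

    public static void main(String[] args) {
        double[] box = boundingBox(40.75, -74.0, 100);
        System.out.println(contains(box, 40.76, -74.01)); // true: well inside
    }
}
```

The cheap box test is the prefilter; an exact radius filter would then apply the (more expensive) distance function only to points that survive it.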

Ryan McKinley added a comment - 27/Apr/09 19:12

Yonik - I like all 4 proposals.

I am not familiar with the function query internals - would it get called for things that do not match the filter? Distance calculations are typically the most expensive part of the query.

re "It seems like points should be stored and represented in a single field..." - I agree that the schema and URL API should point to a single field to represent the geometry field. In practice, the indexing will probably need multiple fields to get the job done (efficiently).

It would be great if the schema field type could define everything needed to index and search. There are (at least) three approaches to indexing points that each have their advantages (and disadvantages) - we should be able to support any of these options:

GeoPointField (abstract? the standard stuff about dealing with points)
GeoPointFieldHash (represented as a GeoHash, fast bounds query (with limited accuracy))
GeoPointFieldTiers (highly scalable, fast, complex)
GeoPointFieldTrie (...)
GeoLineField...
GeoPolygonField...

I think it makes sense to try to follow the georss format to represent geometry:
<georss:point>45.256 -71.92</georss:point>
<georss:line>45.256 -110.45 46.46 -109.48 43.84 -109.86</georss:line>
<georss:polygon>45.256 -110.45 46.46 -109.48 43.84 -109.86 45.256 -110.45</georss:polygon>
<georss:box>42.943 -71.032 43.039 -69.856</georss:box>
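All of the georss shapes above share one textual shape: whitespace-separated "lat lon" pairs (a polygon simply repeats its first point at the end). A sketch of the parsing a field type would need (hypothetical class, not from the patch):

```java
import java.util.ArrayList;
import java.util.List;

public final class GeoRssParse {
    /** Parses "45.256 -110.45 46.46 -109.48 ..." into {lat, lon} pairs. */
    public static List<double[]> parse(String text) {
        String[] tokens = text.trim().split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("odd number of coordinates");
        }
        List<double[]> points = new ArrayList<>();
        for (int i = 0; i < tokens.length; i += 2) {
            points.add(new double[] {
                Double.parseDouble(tokens[i]),     // latitude first, per georss
                Double.parseDouble(tokens[i + 1])  // then longitude
            });
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(parse("45.256 -71.92").size());                          // 1
        System.out.println(parse("45.256 -110.45 46.46 -109.48 43.84 -109.86").size()); // 3
    }
}
```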

Yonik Seeley added a comment - 27/Apr/09 20:03

> I am not familiar with the function query internals - would it get called for things that do not match the filter? Distance calculations are typically the most expensive part of the query.

Right... function query currently does get called before filters are checked - but that's because current function queries are part of the main relevancy query. A function query that was only used to sort would only be called for docs that match the main relevancy query and all filters, though.

The performance issue would be the "boost" scenario - when the distance calculation is part of the main query. That's another generic Solr issue we should tackle at some point... filter efficiency. Related to LUCENE-1536, I think (but we could already do this relatively easily for BitDocSet... just not HashDocSet).

Chris A. Mattmann added a comment - 11/May/09 14:30

Hi Guys,

I'm interested in using LocalSOLR, and the 4 proposals described by Yonik above are exactly what I need for an app to stand up oceans data search here at work. We need the ability to do bounding box queries and spatial queries of the following form:

1. Lat, Lon, Radius Template Element
"&lat={geo:lat?}&lon={geo:lon?}&r={geo:radius?}"
With latitude and longitude in decimal degrees in EPSG:4326 format. The radius parameter is in meters along the surface.

2. Box Template Element
"&bbox={geo:box?}"
Bounding box coordinates in EPSG:4326 format in decimal degrees. Ordering is "west, south, east, north".

3. Polygon
"&p={geo:polygon?}"
Replaced with the latitude/longitude pairs describing a bounding area to perform a search within. The polygon is defined in latitude, longitude pairs, in clockwise order around the polygon.

I realize that #3 above is probably a ways off, but how close are we to #1 and #2? I'm trying to push to use SOLR here rather than leverage a custom or COTS solution, but will need at least support for #1 and #2 to make any headway. I'm willing to contribute and help out towards this - I just want to find out where we are.

Thanks!
Cheers,
Chris
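For the "&bbox=west,south,east,north" template element described above, the request-side handling is small. A sketch (the class and method names are illustrative, not existing Solr APIs; the date-line case where west > east is ignored for brevity):

```java
public final class BBoxParam {
    public final double west, south, east, north;

    /** Parses a "west,south,east,north" string in EPSG:4326 decimal degrees. */
    public BBoxParam(String value) {
        String[] parts = value.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected west,south,east,north");
        }
        west  = Double.parseDouble(parts[0].trim());
        south = Double.parseDouble(parts[1].trim());
        east  = Double.parseDouble(parts[2].trim());
        north = Double.parseDouble(parts[3].trim());
    }

    public boolean contains(double lat, double lon) {
        return lat >= south && lat <= north && lon >= west && lon <= east;
    }

    public static void main(String[] args) {
        BBoxParam box = new BBoxParam("-71.032,42.943,-69.856,43.039");
        System.out.println(box.contains(43.0, -70.5)); // true
        System.out.println(box.contains(40.0, -70.5)); // false
    }
}
```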

Grant Ingersoll added a comment - 12/May/09 18:57 - edited

I think, and correct me if I'm wrong, that one of the things that often happens with geo stuff is that there are a lot of unique values. This often has memory ramifications when used with FunctionQueries, since most ValueSources uninvert the field.

Otherwise, I like the sound of Yonik's proposal as well.

Yonik Seeley added a comment - 12/May/09 19:30

FunctionQuery would just be the interface to the underlying geo distance function... it doesn't seem like it should affect the memory requirements of that underlying function (however it's currently implemented in local solr).

Use of TrieRange could just be another implementation detail of how to quickly implement a bounding box function... it doesn't sound like it's necessarily needed with the cartesian tier strategy.

Uwe Schindler
added a comment - 12/May/09 19:49 - edited Also, how does the TrieRange stuff factor into this?
LocalLucene does something similar to TrieRange, but in two dimensions. It stores the latitude and longitude in one field as the number of a small rectangle (Cartesian tier), and the lower precisions are simply bigger rectangles (I think they are squares). The effect is that you only need one field name for the search, but you have the problem of limited precision.
TrieRange, on the other hand, is more universal, working for any numeric search, and is not limited to geo. The bounding box search in Solr as proposed in this issue can also simply be done with TrieRangeQueries on two int fields (e.g. by scaling the lat/lon by a factor like 1000000 for 6 digits after the decimal point) or two float fields. A comparison of speed and index size between LocalLucene and TrieRange would be interesting; both can easily be done with Solr, but I had no time for it.
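The int-scaling idea above can be sketched outside of Solr. The following is an illustrative Python sketch under the stated assumption (scale factor 1E6 for 6 decimal digits); the helper names are hypothetical, not Solr or Lucene API:

```python
SCALE = 1_000_000  # 6 digits after the decimal point, as suggested above

def to_scaled_int(degrees):
    """Scale a lat/lon value in degrees to an int; 180 * 1e6 fits in 32 bits."""
    return round(degrees * SCALE)

def to_degrees(scaled):
    """Recover the degree value from a scaled int."""
    return scaled / SCALE

def in_box(lat, lon, min_lat, max_lat, min_lon, max_lon):
    """A bounding-box filter reduces to two integer range checks."""
    return (to_scaled_int(min_lat) <= to_scaled_int(lat) <= to_scaled_int(max_lat)
            and to_scaled_int(min_lon) <= to_scaled_int(lon) <= to_scaled_int(max_lon))
```

In an index, the two scaled ints would simply be two trie-encoded numeric fields, and the box becomes two numeric range queries.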
For our case (PANGAEA) we have another problem that is only solvable with TrieRange, not LocalLucene: our datasets are themselves bounding boxes, and if the user enters a bounding box, a document is a hit if the two boxes intersect. This can easily be done with four half-open ranges. There is a small speed impact because the half-open ranges may hit very many TermDocs for the lower precisions, but maybe I will create a special combined filter that collects TermDocs into one BitSet, so you can combine these ranges easily (though I have no idea how to make a sensible API for that).
Another idea for using TrieRange for geo search is to lay a Hilbert curve over the earth and just do a range around the position on this curve (look at the picture on http://en.wikipedia.org/wiki/Hilbert_curve and the idea becomes clear). As far as I know, geohash works with such a space-filling curve (the hash is the position on the curve), so if you index the binary geohash as a long with TrieRange, you could do this range very simply (correct me if I am wrong!). The drawback is that you will only find square areas (so the use case is: find all phone cells around (lat,lon)).
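As an aside, geohash actually interleaves longitude and latitude bits, which corresponds to a Z-order (Morton) curve rather than a Hilbert curve, but the "index the curve position as one long" idea works the same way for either curve. A hypothetical sketch of such an interleave:

```python
def morton_code(x, y, bits=31):
    """Interleave the low `bits` bits of x and y into one integer
    (Z-order / Morton code); x occupies the even bit positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code
```

Nearby points share the high-order bits of the code, so a single numeric range over the code covers one square cell, which is exactly why only square areas fall out of this approach, as noted above.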
In my opinion, I would recommend the following:
if you need standard queries like finding all phone cells around a position, use LocalLucene. If you need full flexibility, just treat lat/lon, or whatever CRS (Gauss-Krüger etc.), as two numeric values, on which you can do SQL-like "between", ">", "<", ">=" and "<=" searches very fast.

patrick o'leary
added a comment - 12/May/09 21:27 Sorry for not getting into this sooner-
Let's take a step back for a second and ask a couple of questions; my thoughts are provided.
1) What is the goal we want to achieve?
Provide a first iteration of a geographical search entity for Solr.
Bring a popular external plugin in out of the cold into the ASF and Solr; this helps Solr users out and increases the developers from one to many.
2) What is the level of commitment, and the road map, for spatial solutions in Lucene and Solr?
The primary goal of Solr is as a text search engine, not GIS search; there are other and better ways to do that
without reinventing the wheel and shoehorning it into Lucene
(e.g. persistent doc id mappings that can be referenced outside of Lucene, so that tools like PostGIS can be used).
We can never fully solve everyone's needs at once; let's start with what we have and iterate upon it.
I'm happy for any improvements as long as they keep to two goals: A) don't make it stupid, B) don't make it complex.
3) Raw math through trie data structures, spatial ids (geohash), or tier ids (Cartesian tiers): which one?
Why not all? Again, we can't solve everyone's needs, so why not let people have the tools to help themselves.
As for benchmarking, I have performed some recently using tdouble precision 0:
~1 Million docs covering the state of NY
Top density was ~300,000 between Manhattan & Brooklyn area.
Returning all results, avg of 100 hits:
Trie Double: 108ms
Cartesian Tier: 12ms
The reason for the difference is that with trie ranges you are doing 2 sets of range filters/queries, while with
Cartesian tiers you are doing 1 iteration over maybe 4 to 16 fielded ids.
And maybe switching the _localTier fields from sdouble to tdouble might improve that; I haven't tried, and 12ms is something I can live with.
However, the distance calculation is the killer, 300,000 took about 1.8 seconds in a single thread on a 3.2GHz machine.
I was working on some additional features in LocalLucene, such as polylines and convex hulls, which, using the Cartesian tier ids,
can give some basic quick features such as intersects and contains; and a nifty consequence of having sorted ids is nearby results.
Also, faceting on tier ids can give you hot-spot results.
One final feature: the projection method is an implementation of IProjector, which allows you to create your own projection.
Currently I'm using Sinusoidal, but you can do your own, such as say:
Google Mercator (I use a similar quad grid concept, just different projection method)
Open Map
etc..
There's a lot that can be done, but we should stay focused on primary goals, and iterate, iterate iterate.
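To make the tier-id idea concrete, here is a hypothetical sketch of how a Cartesian-tier style cell id could be computed: project the point (a sinusoidal projection, as mentioned above) and snap it to a grid whose resolution doubles per tier. This is an illustration of the concept only, not LocalLucene's actual code:

```python
import math

def tier_box_id(lat, lon, tier):
    """Cell id of (lat, lon) on a 2^tier x 2^tier grid laid over a
    sinusoidal projection of the earth (illustrative only)."""
    x = lon * math.cos(math.radians(lat))  # sinusoidal projection
    y = lat
    cells = 2 ** tier
    col = int((x + 180.0) / 360.0 * cells)  # grid column
    row = int((y + 90.0) / 180.0 * cells)   # grid row
    return row * cells + col
```

A radius query then only needs to match the handful of cell ids whose boxes overlap the search area, which is consistent with the "1 iteration for maybe 4 to 16 fielded ids" observation above.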

Uwe Schindler
added a comment - 12/May/09 22:24 Hi Patrick,
thanks for doing the comparison!
As for bench marking, I have performed some recently using tdouble precision 0,
~1 Million docs covering the state of NY
Top density was ~300,000 between Manhattan & Brooklyn area.
I wonder what you mean by precision 0; what was the precision step, 2, 4 or 8? precisionStep=0 should throw an IAE, and 64 would do a classical RangeQuery (enumerating all terms).
And maybe switching the _localTier fields from sdouble to tdouble might improve that, I haven't tried, 12ms is something I can live with.
I think much faster will not be possible; even with TrieRange you always have to visit TermDocs. And another thing: as you only return 100 docs, the number of terms visited may not be so big. The speed improvement of TrieRange is more visible the more distinct values there are in the range.
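For readers unfamiliar with precisionStep: TrieRange indexes each value several times at decreasing precision by shifting bits away, so a range query can be covered with few terms. A rough sketch of the variants generated for one value (hypothetical helper, not the actual TrieUtils code):

```python
def trie_shifts(value, precision_step=4, bits=32):
    """The lower-precision variants a TrieRange-style index would store for
    one value: the value right-shifted by multiples of precision_step."""
    if precision_step < 1:
        raise ValueError("precisionStep must be >= 1")  # cf. the IAE mentioned above
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]
```

With precision_step equal to the full bit width, only the shift-0 term remains, which degenerates to a classical term-enumerating RangeQuery, matching the remark about 64 above.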

Grant Ingersoll
added a comment - 13/May/09 00:10 - edited
1) What is the goal we want to achieve?
Provide a first iteration of a geographical search entity to SOLR
Bring an external popular plugin, in out of the cold into ASF and SOLR, helps solr users out, increases developers from 1 to many.
Agreed on the first, not 100% certain on the second. On the second, this issue is the gate keeper. If the people reviewing the patch feel there are better ways to do things, then we should work through them before committing. What you are effectively seeing is an increase in the developers working on it from one to many; it's just not on committed code.
2) What is the level of commitment, and road map of spatial solutions in lucene and solr?
The primary goal of SOLR is as a text search engine, not GIS search, there are other and better ways to do that
without reinventing the wheel and shoe horn-ing it into lucene.
(e.g. persistent doc id mappings that can be referenced outside of lucene, so things like postGis and other tools can be used)
We can never fully solve everyone's needs at once, lets start with what we have, and iterate upon it.
I'm happy for any improvements as long as they keep to two goals A. don't make it stupid B. don't make it complex.
On the first point, I don't follow. Aren't LocalLucene and LocalSolr exactly a GIS search capability for Lucene/Solr? I'm not sure I would categorize it as shoehorning. There are many things that Lucene/Solr can power, and GIS search with text is one of them. By committing this patch (or some variation), we are saying Solr is going to support it. Of course there are other ways to do it, but that doesn't preclude it from L/S. The combination of text search plus GIS search is very powerful, as you know.
Still, I think Yonik's main point is: why reinvent the wheel when it comes to things like distributed search and the need for custom indexing code, etc., when they can likely be handled through function queries and field types, so that all of Solr's current functionality would just work? The other capabilities (like sorting by a FunctionQuery) are icing on the cake that help solve other problems as well.
Totally agree on the other points. Also very cool to see the benchmarking info.

patrick o'leary
added a comment - 13/May/09 00:42 On 1.2: LocalSolr has suffered from not being in the trunk of Solr; it is popular and has successfully driven a lot of projects. But I have to put my hand up and say that I am its biggest bottleneck in keeping it up to date with Solr changes, and I think it would gain a lot just from being current. Most of the changes that caused problems have been minor signature changes where any developer can resolve the issue, so the one-to-many element really wins.
Certainly improvements are always good, and there are plenty of ways to improve LocalSolr, but even at this stage I've had to move the trunk of LocalSolr on SourceForge forward to meet other needs. It would be good to centralize the development, even in a contrib manner: working, but open for improvement.
On 2: GIS search can be defined in more ways than I can think of; the OpenGIS consortium has a fairly large list of standards:
http://www.opengeospatial.org/standards/is
LocalSolr supports only one set of those items, which is why I do not define LocalSolr as a full GIS solution. It has a framework that can grow to be more.

Uwe Schindler
added a comment - 13/May/09 09:35
Agreed on the first, not 100% certain on the second. On the second, this issue is the gate keeper. If people reviewing the patch feel there are better ways to do things, then we should work through them before committing. What you are effectively seeing is an increase in the developers working on from 1 to many, it's just not on committed code.
I agree with iterating on the patch and also on LocalLucene (not only LocalSolr).
On the first point, I don't follow. Isn't LocalLucene and LocalSolr, just exactly a GIS search capability for Lucene/Solr? I'm not sure if I would categorize it as shoe-horning. There are many things that Lucene/Solr can power, GIS search with text is one of them. By committing this patch (or some variation), we are saying Solr is going to support it. Of course, there are other ways to do it, but that doesn't preclude it from L/S. The combination of text search plus GIS search is very powerful, as you know.
Yes, and in the past we tried solutions that use unique doc ids to join between an RDBMS used for the geo search and Lucene used for the full-text part. The biggest problem is that these join operations are very inefficient if many documents are affected. Lucene as a full-text engine has the great advantage of displaying results very fast without retrieving all the hits (you normally display only the best-ranking ones). If you combine it with databases, you have to intersect the results in a HitCollector while filling the PriorityQueue. RDBMSs have the problem of always putting "transactions" around select statements and will only deliver the results when the query is completely done, which adds a time lag. Doing the geo query completely in Lucene is, for our search in PANGAEA, about a hundred times faster in most cases (with TrieRange).
Still, I think Yonik's main point is why reinvent the wheel when it comes to things like distributed search and the need for custom code for indexing, etc. when they likely can be handled through function queries and field types and therefore all of Solr's current functionality would just work. The other capabilities (like sorting by a FunctionQuery) is icing on the cake that helps solve other problems as well.
I also agree about thinking before reimplementing specific parts of the code that may easily be done with "standard" Lucene/Solr tools (I would count TrieRange among those, even though it is not "standard" today; it is generic, not bound to geo, and will hopefully move to Lucene core as NumericRangeQuery and utils).
In my opinion, LocalLucene should be as generic as possible and should not add too many custom datatypes, specific index structures, fixed field names, etc. A problem with most GIS solutions for relational databases is that you are tied to specific database schemas. E.g. for our search at PANGAEA, we want to display the results of the Lucene query also on a map, but for that you cannot use a common GIS solution, because it does not know how to extract the data from Lucene.
Soon I will start a small project to add a plugin to GeoServer's feature store that does not use an RDBMS or shape files or whatever for the features, but instead uses Lucene. Using that, it may also be possible to retrieve the geo objects (in our case, datasets with lat/lon) and display them in a WMS using OpenLayers, stream them to Google Earth using the GeoServer KML streaming API (using TrieRange to support the bounding box filter), and so on.
About your benchmarks:
I suspect that you warmed up the readers, but I think you should get faster performance out of TrieRange. In my opinion, you should not use doubles for lat/lon; just use ints and scale the float lat/lon by multiplying by 1E7 to get 7 decimal digits (which is surely enough for geo, and 180*1E7 is still < Integer.MAX_VALUE).
In general, the biggest speed improvement of TrieRange over other range queries can be seen when the range contains a lot of distinct values and so hits many documents. E.g. you will also get 100 ms if you do a search around the African continent which contains thousands of hits, each having a different lat/lon pair! How does LocalLucene behave with that?
Because of this, I would implement the tiers using tint or tfloat or whatever.

Norman Leutner
added a comment - 14/May/09 16:51 Hi,
just a comment on the distance function.
So breaking things down, it seems like we basically need to be able to:
1) filter by a bounding box
2) filter by a geo radius (impl could first get the bounding box and narrow within that...)
3) sort by distance
4) return the distance
Since the surface distance at 0° latitude is about 111.32 km per 1° of longitude,
while at 90° latitude it is 0 km per 1° of longitude, using a rectangle that does not include any sphere information
would be very inaccurate.
Instead (if not too computationally intensive), proper mathematical functions should be used here.
For example, you can calculate the distance between a given latitude/longitude and another position by calculating
the radian measure between the two points, i.e. the central angle at the earth's center.
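The central-angle suggestion above is the classic great-circle distance; here is a sketch using the standard haversine formulation, shown in Python purely for illustration:

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Distance between two lat/lon points from the central angle
    (haversine form, numerically stable for small angles)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```

At the equator this gives roughly 111.2 km per degree of longitude and essentially 0 km near the poles, matching the figures above.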

Yonik Seeley
added a comment - 14/May/09 17:06 if you use a rectangle that does not include any sphere information, this would be very inaccurate.
I've been really just commenting on what seemed to be the best way to hook into Solr... the interface, not the implementation.
The bounding box filter would simply guarantee to contain all of the points of interest in an efficient manner (but could have some outside the specified radius as well, to increase efficiency).
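A box that is "guaranteed to contain the radius" can be sketched as follows; the longitude span must widen with latitude, per the sphere-distance point above. This is a simplified sketch that ignores pole and date-line wraparound, and the helper name is hypothetical, not Solr API:

```python
import math

def bounding_box(lat, lon, radius_km, earth_km=6371.0):
    """Return (min_lat, max_lat, min_lon, max_lon) of a box guaranteed to
    contain every point within radius_km of (lat, lon). Points near the box
    corners may lie outside the radius; a second, exact distance check
    narrows those out."""
    dlat = math.degrees(radius_km / earth_km)
    dlon = math.degrees(radius_km / (earth_km * math.cos(math.radians(lat))))
    return (lat - dlat, lat + dlat, lon - dlon, lon + dlon)
```

This is exactly the two-phase pattern described earlier in the thread: a cheap box filter first, then the (expensive) exact distance calculation only within the box.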

Chris A. Mattmann
added a comment - 14/May/09 18:46 From the user interface point of view, what's needed is:
1) A way to filter by a bounding box. This could simply be a custom QParser:
fq={!gbox p=101.2,234.5 f=position d=1.5}
(a bounding box centered on 101.2,234.5, including everything within 1.5 miles)
2) A function query that calculates distances: gdist(position,101.2,234.3)
3) A way to sort by a function query... this is generic desired functionality anyway!
4) A way to return the value of a function query for documents - also generic desired functionality. Perhaps use meta as proposed in SOLR-705?
Yonik: clear description of what we need to do here; thanks for that. Myself and three collaborators at JPL (Paul Ramirez, Sean McCleese and Sean Hardman) are going to spend time over the next few months this summer to get some patches together that implement this architecture, in order to generically support GIS search in Solr. We have a large corpus of ocean data and lunar data over here at JPL that we'd like to get this working for.
Thanks and more to come – soon
Cheers,
Chris

Ryan McKinley
added a comment - 14/May/09 21:48 My apologies for being out of the loop... Going back to some of Patrick's high-level points: I agree with Grant (and by extension with most of Patrick's points). Our main issue now is how to move forward.
We have a few options:
1. Get whatever we can working and integrated ASAP and iterate from there.
2. Make some core structural changes to Solr that will make integrating spatial stuff easier/cleaner. With this in place, we can then integrate.
3. A hybrid of 1 and 2: get a spatial contrib working ASAP with the knowledge that most of it needs to be replaced/reworked as the Solr core evolves to better support it. We would probably want to keep the spatial contrib out of the 1.4 release, or mark it as "experimental" and subject to change without notice, etc.
I am partial to #3 so that we can point to concrete issues and have something to patch against.
---------
The primary goal of SOLR is as a text search engine, not GIS search, there are other and better ways to do that without reinventing the wheel and shoe horn-ing it into lucene. (e.g. persistent doc id mappings that can be referenced outside of lucene, so things like postGis and other tools can be used)
of course, but solr should make this kind of integration easy. The beauty of open source is that we need to get a good foundation and the various implementation extensions can be contributed down the road.

patrick o'leary
added a comment - 18/Jun/09 01:02 That comes from this patch-
This was an older port of localsolr to solr that's fallen behind and hasn't been maintained.
I'll take a look at it and see about getting it working.

Bill Bell
added a comment - 23/Jun/09 09:07 It seems to me, as an outsider, that this project is not being incorporated and just languishing. Patrick has some really cool code that is very useful. Why don't we just incorporate it as part of the build first? Get it in there "as is" into the 1.4 build?
As ideas come up, we can track them separately in JIRA, and people can volunteer to fix them.
A lot of people use this today, and it is being left in the dust.
Bill

patrick o'leary
added a comment - 23/Jun/09 15:30 This is a dash-and-run comment, as I'm heading to the airport and am out of reach for a couple of weeks, but:
Solr is in the run-up to a 1.4 launch; I don't expect local/spatial solr to get into 1.4 at this stage.
This patch is out of date; it's a patch, these things happen.
LocalSolr will continue in some format on SourceForge.
It's been there for over a year now playing catch-up with both Lucene and Solr releases, and while I can't guarantee it will always have support, I've done all that I can to bring in other engineers to help keep it going.
It would be great to get local/spatial solr into Solr, but I have no idea where in the roadmap the function query enhancements (to provide LocalSolr's current features in a different format) are for Solr, or what priority they will be given.
For those reasons I cannot reasonably commit time to something that may or may not happen for who knows how long.
But there are a multitude of components that folks are asking for on the LocalSolr side of things, and once I'm back I'll be posting a wish list to the LocalSolr community asking for features that folks would like.
Again, it will move things further out of date with Solr, but there isn't much I can do about that.

Chris Male
added a comment - 03/Jul/09 16:26 - edited I have just added a patch which adds support to Solr for the multi-threaded spatial search I've added in LUCENE-1732 (note: I have attached the jar built using the code in the Lucene issue). The multi-threaded search reduces the time taken to filter 1.2 million documents from 3s to between 500-800ms.
In addition to the support for the improved spatial search, I have changed the query syntax supported by Solr for spatial searches. The syntax now uses local params which contain any information specific to a spatial search. An example of a search using the new syntax is:
q={!spatial_tier lat=50.0 long=4.0 radius=10}:
Also as part of the patch, I have removed the need for a specific DistanceCalcuatingComponent by changing the query produced by the SpatialTierQueryParserPlugin to a FilteredQuery, and by introducing the notion of a FieldValueSource.
FieldValueSources, which can be registered with the new FieldValueSourceRegistry, are used to add arbitrary information to documents as they are being written by ResponseWriters. Hence a DistanceFieldValueSource is created and registered by the SpatialTierQueryParserPlugin so that the distances calculated during the spatial search can be added to the resulting documents. This removes the need to add the distances in a special component. A useful feature of FieldValueSources is that they can be controlled through the fl request parameter. This means that for spatial search, the calculated distances do not necessarily have to be included in the response.
The final contribution of the patch: since the new spatial search uses multiple threads through an ExecutorService, it is necessary for Solr to have an ExecutorService that can be configured and managed. Consequently the patch includes support for defining an ExecutorService in solrconfig.xml. The ExecutorService is then cleaned up when the SolrCore it belongs to is closed.
I intend to create an example configuration over the next few days, which will also include some example data.
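The configure-then-clean-up lifecycle described above (a pool defined in configuration, shut down when its owning SolrCore closes) can be sketched generically. This is not Solr's actual API — just an illustration of the pattern using plain java.util.concurrent, with hypothetical names:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Generic sketch (not Solr's actual API) of the lifecycle described above:
// a pool sized from configuration, shut down when its owner closes.
final class ManagedExecutor implements AutoCloseable {
    private final ExecutorService pool;

    ManagedExecutor(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    ExecutorService service() {
        return pool;
    }

    @Override
    public void close() {
        pool.shutdown(); // stop accepting new tasks
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // force-cancel stragglers
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

The key point of the debate below is who owns such an object: tying it to the core's close() (as here) is what prevents thread leaks across core reloads.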

Noble Paul
added a comment - 06/Jul/09 05:38 I am not going to comment on the "spatial search" part of this. Let us not keep the ExecutorService in the SolrConfig. SolrConfig is just a place where configurations are parsed. SolrCore can create and keep the ExecutorService.
There is already another threadpool Executor maintained for distributed search. That one does not require any configuration and it uses some defaults (but it would be useful to have some configurability there). It makes sense to maintain one global threadpool at the core level which every component should use.

Uri Boness
added a comment - 06/Jul/09 08:37 I guess it is possible to configure the executor service via the configuration of the query parser. That said, having a way to configure executor services in solr config will eliminate some code duplication. I don't think it's good practice to have one executor service for all components to use - the last thing you want is to have components depend on each other in terms of "race conditions" over threads. I think it is better to fine-tune each component with a thread pool of its own.

Noble Paul
added a comment - 06/Jul/09 09:19 the last thing you want is to have components depend on each other in terms of "race conditions" over threads.
Each thread is going to compete for the same CPU resources, but I guess that should not be a problem. If necessary we can take this discussion to a separate issue. If we discuss it here, it may take away the focus from this one.

Chris Male
added a comment - 06/Jul/09 16:34 I have now attached an example configuration for the spatial search patch I added. It contains some sample documents and the lucene spatial search jar that my patch is designed to integrate with.

Ryan McKinley
added a comment - 22/Jul/09 00:28 Wow, this issue just keeps growing! We need to figure out the best way to move forward that will keep things clean and have the flexibility to enable the wide range of spatial features we all want. As noted earlier, I hope we can come up with a simple interface that could support various strategies, including: trie, cartesian tier, geohash, rtree, jts/geotools, etc.
To get things going, it seems the biggest hurdle is getting the solr framework to support some basic wiring to make these things possible. As Yonik pointed out before, this comes down to a few core features:
SOLR-1131 – FieldType should be able to write multiple fields (consider WKT -> many fields)
SOLR-1298 – Add function query calculation to result
SOLR-705 – Attach arbitrary metadata (distance) to the results
A way to sort by a function query
With this, geosearch could be implemented with:
Custom QParser like: fq={!gbox p=101.2,234.5 f=position d=1.5} // a bounding box, centered on 101.2,234.5, including everything within 1.5 miles
A function query that calculates distances, gdist(position,101.2,234.3) (may need to share data with the Query/Filter)
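For illustration, the gbox filter sketched above amounts to computing a lat/lon bounding box around a center point and radius. A minimal sketch follows — class and method names are hypothetical, and the math is deliberately naive near the poles and the antimeridian:

```java
// Hypothetical sketch: compute the lat/lon bounding box that encloses a
// circle of the given radius (miles) around a center point.
// Naive near the poles and the antimeridian (no clamping or wraparound).
final class GeoBox {
    private static final double EARTH_RADIUS_MI = 3958.8;

    static double[] boundingBox(double lat, double lon, double radiusMi) {
        double latDelta = Math.toDegrees(radiusMi / EARTH_RADIUS_MI);
        // A degree of longitude shrinks with cos(latitude).
        double lonDelta = Math.toDegrees(radiusMi / EARTH_RADIUS_MI)
                        / Math.cos(Math.toRadians(lat));
        return new double[] { lat - latDelta, lon - lonDelta,   // SW corner
                              lat + latDelta, lon + lonDelta }; // NE corner
    }
}
```

Everything inside the box is a candidate; the precise radius check (the gdist function query) then only runs on those candidates.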

Uri Boness
added a comment - 22/Jul/09 11:04 Ryan, you should really have a look at the patch Chris added, as it already tackles a few of the requirements you listed:
The FieldValueSource is an abstraction that can be used to add "dynamic" fields to the returned docs. I think this approach is the most flexible and can be used as a starting point. I'm still not sure whether it should add fields to the docs or some sort of metadata, but for both approaches the mechanism can stay the same (if the metadata approach is chosen then I guess it can be renamed to MetaDataSource instead).
A distance calculation abstraction was already added in the form of the GeoDistanceCalculator interface (there are currently two implementations, but a third one can easily be added based on JTS). I agree there might be other abstractions that one would want to use.
The query parser is already there. The only thing is that right now it differs a bit from the syntax you suggested... it's more in the form q={!spatial lat=XXX lng=YYY radius=10 calc=arc unit=km}.

Grant Ingersoll
added a comment - 22/Jul/09 16:50 - edited The thing I keep coming back to is Yonik and Ryan's comments that most of this stuff need not require any custom work at all, other than fixing things in Solr that prevent it from using existing capabilities. I'd much rather see work done there than more work done customizing "spatial" code. At a minimum: implementing a FunctionQuery for great-circle distance (and others), adding sort by function and pseudo-fields, those kinds of things, and then maybe working on Ryan's FieldType ideas. It seems like none of those, other than FieldTypes, require custom components, right?

Chris Male
added a comment - 10/Aug/09 11:34 From my experience, more than just a FunctionQuery is required for LocalSolr to be efficient. Without the Cartesian tier information that is added by the UpdateProcessor, you will have to calculate the distance for every single document in the index. The great-circle distance calculations are actually quite expensive, and when multiplied by say 1 million documents, the query time becomes around 2 or 3 seconds. If you then repeat the calculation again for sorting on distance, the time will be even worse. Therefore it seems necessary to include some way to reduce the number of distance calculations that are done.
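For context, the great-circle calculation being discussed is typically the haversine formula. A sketch of why it is costly per document — several trig calls and a square root per evaluation — follows; the class and method names here are illustrative, not from any patch on this issue:

```java
// Illustrative haversine great-circle distance. The trig calls below are
// what makes per-document evaluation expensive at index scale.
final class GreatCircle {
    private static final double EARTH_RADIUS_KM = 6371.0;

    static double haversineKm(double lat1, double lon1,
                              double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        // a = sin^2(dLat/2) + cos(lat1) * cos(lat2) * sin^2(dLon/2)
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```

Running this once per document over millions of documents is exactly the cost the Cartesian tier (or any coarse prefilter) is meant to avoid.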

Brad Giaccio
added a comment - 13/Aug/09 00:08 I'm going to have to disagree with Chris's assertion that more than a FunctionQuery is needed. I have a FunctionQuery that simply starts by getting a TermEnum at the minimum latitude that can possibly match your spatial extent, and exits when it gets to the max lat. This way I take advantage of the lexical ordering of the strings, and then only have to compute distances for things that are in the box.
This code runs at sub-second speed on a shard of 12 million documents; actually it's sub-second hitting 8 shards of 12M each.
Just a thought? If interested, I have a SearchComponent that makes use of this filter I can attach.
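Brad's lexical-range idea can be illustrated without Lucene: if latitudes are indexed as zero-padded strings (as the NumberUtils-style encoding mentioned earlier does), a sorted-map range scan stands in for the TermEnum walk over the term dictionary. Everything below is a hypothetical stand-in, not the actual component:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in for the term dictionary: a sorted map of zero-padded latitude
// strings to doc ids. The trick is that a lexical range scan over such
// terms visits only candidates inside the latitude band of the search box,
// so the expensive distance math runs on far fewer documents.
final class LatRangeScan {
    // Pad so lexical order matches numeric order. Non-negative latitudes
    // only here; a real encoding (like Lucene's NumberUtils) handles sign.
    static String encode(double lat) {
        return String.format("%08.4f", lat);
    }

    static List<Integer> candidates(TreeMap<String, Integer> terms,
                                    double minLat, double maxLat) {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e :
                terms.subMap(encode(minLat), true, encode(maxLat), true).entrySet()) {
            out.add(e.getValue()); // only these need a distance calculation
        }
        return out;
    }
}
```

In Lucene terms, the subMap walk corresponds to seeking the TermEnum to encode(minLat) and stopping once a term exceeds encode(maxLat).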

Grant Ingersoll
added a comment - 13/Aug/09 15:57 I think the take-away here, to Ryan's point, is that there are multiple ways to do this. Cartesian tier is useful, as are other approaches; let's just not re-invent the wheel if we don't have to.

Padraic Hannon
added a comment - 25/Aug/09 16:57 I realize this is marked for inclusion for 1.5, however, does the group feel that the patches here are ready to be used on 1.4 or should one stick with the LocalSolr project as found on Sourceforge? And if so should one then use 1.3 instead of 1.4?
Any input would be greatly appreciated, and if this is the wrong forum to ask such a question please remove the comment
aloha
Padraic Hannon

Chris Male
added a comment - 25/Aug/09 17:10 Hi Padraic,
Most of these patches, particularly the latest ones, are built against Solr 1.4, therefore I recommend you use this version instead of 1.3. I wouldn't recommend you use LocalSolr from SourceForge, as it does not seem as though it has been updated recently.

patrick o'leary
added a comment - 25/Aug/09 17:23 Chris / Padraic
I have to disagree -
A patch is not an adequate way to maintain software for a company.
If you have something small, and you don't mind the bleeding edge software, then go ahead and use this.
But if you need stability, then use a completed piece of software such as localsolr.


Vincent Yeung
added a comment - 31/Aug/09 09:29 I noticed that the current implementation only stores individual points per document; are there any plans to store a bounding box per document? This would be useful where complex geometries can be implemented by allowing Lucene/Solr to do the heavy lifting, filtering by the bounding box, and then using JTS to complete the more complicated spatial comparisons. (JTS will handle all your WKTs with ease too.)
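The two-phase approach Vincent describes — cheap bounding-box rejection in Lucene/Solr, precise geometry tests in JTS — hinges on a coarse test like the following. This is a hedged sketch with hypothetical names; the JTS step is only indicated in a comment, since it depends on that library:

```java
// Coarse step of the approach described above: filter on a stored bounding
// box per document. Only survivors would go on to a precise geometry test
// (e.g. a JTS Geometry.intersects() call on the full WKT shape).
final class BBoxFilter {
    // box layout assumed here: {minLat, minLon, maxLat, maxLon}
    static boolean intersects(double[] a, double[] b) {
        return a[0] <= b[2] && b[0] <= a[2]   // latitude ranges overlap
            && a[1] <= b[3] && b[1] <= a[3];  // longitude ranges overlap
    }
}
```

Because two axis-aligned boxes intersect exactly when both axis ranges overlap, this test is four comparisons per document — cheap enough to run over the whole candidate set before any JTS work.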

Sean McCleese
added a comment - 01/Sep/09 07:52 Chris Mattmann (commented above), Faranak Davoodi, some others at JPL and I are currently working on integrating JTS for just this purpose. We're looking at closely tying it into Chris Male's patches as posted above, and we've been communicating with him and Ryan McKinley about this process.
Right now we're focusing on how to tie JTS into the process as Vincent mentions, without requiring it to do all the filtering, as the speed hit there would be pretty intense. Right now I'm thinking of basically co-opting the local lucene calls in Chris Male's approach and siphoning off the gbox-related ones to JTS. This might also allow for more complex geodetic functions (like swath data and such) down the line.

Bill Bell
added a comment - 07/Sep/09 00:16 OK, I need some sort of distance from lat long being returned in the results. I also need a sort=distance...
Patrick: I cannot get your current localsolr to work with locallucene trunk. Do you have a copy that works?
Thanks.

Bill Bell
added a comment - 07/Sep/09 02:09 So right now I cannot get a clean build with LOCALSOLR and SOLR trunk. If I take Patrick's latest and copy the lucene lib and his localsolr.jar I get:
INFO: [core0] webapp=/solr path=/admin/ping params={} status=0 QTime=54
Sep 6, 2009 8:05:27 PM org.apache.lucene.spatial.tier.CartesianPolyFilterBuilder getBoxShape
INFO: Best Fit is : 10
Sep 6, 2009 8:05:29 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.AbstractMethodError
at org.apache.lucene.search.TopFieldCollector$MultiComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:437)
at org.apache.solr.search.DocSetDelegateCollector.setNextReader(DocSetHitCollector.java:140)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:251)
at org.apache.lucene.search.Searcher.search(Searcher.java:173)
at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1101)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1302)
at com.pjaol.search.solr.component.LocalSolrQueryComponent.process(LocalSolrQueryComponent.java:300)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857)
at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509)
at java.lang.Thread.run(Thread.java:619)
Any ideas on this one? Maybe bring in something that is missing from Lucene?
Maybe we should have 2 issues:
1) Working with Patrick's code and Solr.
2) Getting new code to work with distance and sorting.

patrick o'leary
added a comment - 07/Sep/09 02:32 I'm holding off updating localsolr on SF until SOLR 1.4 comes out.
There's a lot of flux right now, and I'm not maintaining a patch.
There was a June version of solr-1.4-dev version I made available on http://www.nsshutdown.com/solr-example.tgz
Once 1.4 comes out with a stabilized interface I'll adopt it, and re-release

Noble Paul
added a comment - 19/Sep/09 18:56 Hi, for everyone who has not followed everything on the list:
As I see it, we have a workable solution now (correct me if I am wrong). What is preventing us from committing this (after 1.4, of course)?

Bill Bell
added a comment - 21/Sep/09 05:45 - edited Chris: Can you add the distance injection from lat long or the sort=distance? The sort=distance appears more difficult. I could probably just loop through the results and get the distance by doing a simple geo spatial calculation, but the sorting needs to be in your patch.
Noble: Ideas on the best way to add the sorting? Local Lucene has functions for sorting.... Not sure how to expose them.
Thanks!

Bill Bell
added a comment - 25/Sep/09 08:16 Brad,
Were you able to complete your patch?
You commented:
Brad Giaccio added a comment - 12/Aug/09 04:08 PM
I'm going to have to disagree with Chris's assertion that more than a FunctionQuery is needed, I have a FunctionQuery that simply starts by getting a TermEnum that starts with the minimum latitude that can possibly match your spatial extent, and exits when it gets to the max lat. This way I take advantage of the lexical searching of the strings, and then only have to compute distances for things that are in the box.
This code runs at sub-second speed on a shard of 12 million documents; actually it's sub-second hitting 8 shards of 12M each.
Just a thought? If interested I have a searchComponent that makes use of this filter I can attach

Basically this code adds a new search component, that uses a field identified in solrconfig.xml for searching. It does a circular search based on min/max radius. The component also handles searching across shards, provided

In the next few weeks I've been tasked to also do bounding box searches (i.e. find all documents that fall inside of a box defined by nw and se corners).

Hope this helps someone. Let me know if you have questions or can't get it to build.

Brad Giaccio
added a comment - 03/Nov/09 18:21 Sorry for the long delay on this, I had to get approval to submit it.
Basically this code adds a new search component, that uses a field identified in solrconfig.xml for searching. It does a circular search based on min/max radius. The component also handles searching across shards, provided
In the next few weeks I've been tasked to also do bounding box searches (i.e. find all documents that fall inside of a box defined by nw and se corners).
Hope this helps someone. Let me know if you have questions or can't get it to build.
Brad

I've written a Solr plugin which uses a field with the computed hilbert space filling curve to cluster resulting documents so they can be efficiently placed on a google map control. Basically, given a precision and a southwest lat/lng and northeast lat/lng bounding box, it returns a group of clusters with an exact lat/lng location, a bounding box for all the documents in the cluster and the count of the number of documents in that cluster. Depending on settings given to the application (number of results in docset and/or size of the requested bounding box) it will instead return the list of documents so that when you're zoomed in far enough the clusters transform into actual distinct documents.

My implementation is very specific to our website and is not generally applicable:

The calculation of the hilbert space filling curve value is done by our index-script

Several field names are hardcoded

It uses a hardcoded precision for the hilbert value (30 bits)

It still uses highly inefficient methods for some actions (it stores the value in a sint field instead of a trie int as I was waiting for Solr 1.4 to be released before continuing working on the plugin, but now I'll have to find/make the time)

I think LocalSolr would really benefit from something like this as I think when you're storing geographic data displaying it on a map (whether it be google maps, bing maps, open streetview or whatever) is something a lot of people will want to do (and I love full faceted browsing on a map).

Gijs Kunze
added a comment - 13/Nov/09 16:32 I've written a Solr plugin which uses a field with the computed hilbert space filling curve to cluster resulting documents so they can be efficiently placed on a google map control. Basically, given a precision and a southwest lat/lng and northeast lat/lng bounding box, it returns a group of clusters with an exact lat/lng location, a bounding box for all the documents in the cluster and the count of the number of documents in that cluster. Depending on settings given to the application (number of results in docset and/or size of the requested bounding box) it will instead return the list of documents so that when you're zoomed in far enough the clusters transform into actual distinct documents.
My implementation is very specific to our website and is not generally applicable:
The calculation of the hilbert space filling curve value is done by our index-script
Several field names are hardcoded
It uses a hardcoded precision for the hilbert value (30 bits)
It still uses highly inefficient methods for some actions (it stores the value in a sint field instead of a trie int as I was waiting for Solr 1.4 to be released before continuing working on the plugin, but now I'll have to find/make the time)
I think LocalSolr would really benefit from something like this as I think when you're storing geographic data displaying it on a map (whether it be google maps, bing maps, open streetview or whatever) is something a lot of people will want to do (and I love full faceted browsing on a map).
My implementation can be seen running on: http://www.mysecondhome.co.uk/search.html?view=map (It's not perfect, there are small bugs but in general it works fast enough on our dataset)
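
For readers unfamiliar with the technique Gijs describes, here is a minimal, hypothetical sketch (not his plugin code) of the standard xy-to-Hilbert-distance mapping. Nearby grid cells tend to get nearby curve values, which is what makes range queries over the single curve field useful for clustering:

```java
// Hypothetical sketch: map a cell (x, y) on an n*n grid (n a power of two)
// to its distance d along the Hilbert space filling curve.
public class HilbertSketch {
    static long xy2d(int n, int x, int y) {
        long d = 0;
        for (int s = n / 2; s > 0; s /= 2) {
            int rx = (x & s) > 0 ? 1 : 0;
            int ry = (y & s) > 0 ? 1 : 0;
            d += (long) s * s * ((3 * rx) ^ ry);
            // Rotate the quadrant so the sub-curve lines up with its parent.
            if (ry == 0) {
                if (rx == 1) { x = s - 1 - x; y = s - 1 - y; }
                int t = x; x = y; y = t;
            }
        }
        return d;
    }
}
```

On the 2x2 grid this orientation visits (0,0), (0,1), (1,1), (1,0) in order; a real index would quantize lat/lng into grid cells first, at whatever bit precision the field uses.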

Other than that, I think the pieces that make up what is needed for spatial search are now being tracked through the various dependent JIRA issues listed above. I am going to keep this issue open as a way of tracking all the bits and pieces that go into making Solr do spatial work. Once I feel they are ready, then I will update here.

Grant Ingersoll
added a comment - 16/Nov/09 20:36 Hi Bill (and everyone else),
I'm working on bits and pieces of this, as are others. I don't think there will be one monolithic patch called "Local Solr" at this point as the donated LocalSolr solves one particular spatial problem in one particular way. I already added in distance function queries (see SOLR-1302) and am now working on a QParserPlugin that will produce CartesianTier filters, possibly reusing what is in contrib/spatial from Lucene, although I am not totally sold on what is in there just yet either, implementation-wise. It may require some cleanup as well to be more generic and use newer Lucene capabilities. Basically, I am executing on what Yonik, Ryan and I laid out around https://issues.apache.org/jira/browse/SOLR-773?focusedCommentId=12733900&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12733900 , https://issues.apache.org/jira/browse/SOLR-773?focusedCommentId=12703259&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12703259 and https://issues.apache.org/jira/browse/SOLR-773?focusedCommentId=12631963&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12631963 . The result should be a much better Solr system overall, with the side effect being we can now support spatial search.
As it stands now, I've been doing searches w/ filtering using the dist() and hsin() methods in conjunction with Solr's frange functionality (see http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/ ) and that seems to be working quite well.
Other than that, I think the pieces that make up what is needed for spatial search are now being tracked through the various dependent JIRA issues listed above. I am going to keep this issue open as a way of tracking all the bits and pieces that go into making Solr do spatial work. Once I feel they are ready, then I will update here.

patrick o'leary
added a comment - 22/Nov/09 05:26 11/21/09 21:00 PDT
patrick o'leary
to locallucene-users, locallucene-developers
Folks
I've updated localsolr to work with the solr-1.4 release; it also works with the solr-1.5?? nightly as of 11/21/09
There are a couple of changes needed to upgrade to this version.
1) schema.xml has to be updated
lat / long fields and dynamic field _localTier* has to be updated to type="tdouble"
2) your index has to be rebuilt from scratch.
This is not ideal, but unfortunately numeric util updates in lucene force us down this path.
As always I've put a batteries included demo on http://www.nsshutdown.com/solr-example.tgz
Thanks
Patrick
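
As an illustration of change 1), the schema.xml edits Patrick describes might look roughly like this (the field names here are assumptions; use whatever your localsolr configuration actually declares):

```xml
<!-- Trie-based double type, needed after the Lucene numeric util changes -->
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true"/>

<!-- lat / long fields and the _localTier* dynamic field switch to tdouble -->
<field name="lat" type="tdouble" indexed="true" stored="true"/>
<field name="lng" type="tdouble" indexed="true" stored="true"/>
<dynamicField name="_localTier*" type="tdouble" indexed="true" stored="true"/>
```

After a schema change like this the index has to be rebuilt from scratch, as noted above.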

thanks a lot for your work on this issue; it is very useful for the project I'm working on (geo-localisation of cultural events in the French part of Switzerland).

special thanks to patrick o'leary for your last post! I had been waiting for this update for several weeks, as I'm updating solr to version 1.4 and the version I was using no longer worked.

I made modifications to your code to allow configuration of tierPrefix and distanceField values in the solrconfig (like what has been added to the spatial-solr project). Do you want to have my modifications? May I commit them?

Benoît Terradillos
added a comment - 22/Nov/09 15:26 Hello folks,
thanks a lot for your work on this issue; it is very useful for the project I'm working on (geo-localisation of cultural events in the French part of Switzerland).
special thanks to patrick o'leary for your last post! I had been waiting for this update for several weeks, as I'm updating solr to version 1.4 and the version I was using no longer worked.
I made modifications to your code to allow configuration of tierPrefix and distanceField values in the solrconfig (like what has been added to the spatial-solr project). Do you want to have my modifications? May I commit them?

It's important to realize that localsolr is just a stopgap until its functionality / feature set is included in solr
Grant is doing some fantastic work here, and I'm looking forward to seeing the outcome.

patrick o'leary
added a comment - 22/Nov/09 15:46 It's important to realize that localsolr is just a stopgap until its functionality / feature set is included in solr
Grant is doing some fantastic work here, and I'm looking forward to seeing the outcome.
Please feel free to join locallucene-users listserv on sourceforge http://sourceforge.net/mail/?group_id=208194
and send patches there, and I'll do my best to include them.

Grant is doing some fantastic work here, and I'm looking forward to seeing the outcome

Grant would definitely welcome help! This is way too big for me. People wanting to help should take a look at all of the linked items on this issue and see where they can contribute. If in doubt, please ask. I'm good at telling people what to do

Grant Ingersoll
added a comment - 22/Nov/09 15:55 Grant is doing some fantastic work here, and I'm looking forward to seeing the outcome
Grant would definitely welcome help! This is way too big for me. People wanting to help, should take a look at all of the linked items on this issue and see where they can contribute. If in doubt, please ask. I'm good at telling people what to do

patrick o'leary
added a comment - 22/Nov/09 16:23 Yeah, this has become a big re-arch of solr.
The implementation of the spatial search is almost secondary to the key features
1) A method to add meta data to a document from a query life cycle
Should be common between lucene and solr (end goal)
2) Meta data can be used to perform sorting and boosting
I think once those two things are completed then spatial search will just fit right in

Not so much a re-arch, but an extension of some pieces to handle some new ideas. I think we all agree that Solr does a pretty good job of hiding some of the complexity of Lucene. So, by being able to simply declare a new field that is a CartesianTier field type, then the user need not worry at all about managing the tier prefix stuff that contrib/spatial requires.

Grant Ingersoll
added a comment - 22/Nov/09 16:43 Not so much a re-arch, but an extension of some pieces to handle some new ideas. I think we all agree that Solr does a pretty good job of hiding some of the complexity of Lucene. So, by being able to simply declare a new field that is a CartesianTier field type, then the user need not worry at all about managing the tier prefix stuff that contrib/spatial requires.

Bill Bell
added a comment - 23/Nov/09 05:52 Patrick:
In your http://www.nsshutdown.com/solr-example.tgz the localsolr.jar and the other jar are missing... Can you please add them?
What repo are we supposed to build from now? (kinda confusing).
Bill

simply run ant in that directory to create dist/polySpatial.war
Load it up in a web container like tomcat on port 8080 and hit http://localhost:8080/polySpatial/
Click on the map to start seeing results around the generated polygon.

patrick o'leary
added a comment - 03/Dec/09 04:11 I will be making some updates that fix a few bugs, and start working on polygon searching.
Right now there's a very very basic example in svn
https://locallucene.svn.sourceforge.net/svnroot/locallucene/trunk/contrib/polySpatial
simply run ant in that directory to create dist/polySpatial.war
Load it up in a web container like tomcat on port 8080 and hit http://localhost:8080/polySpatial/
Click on the map to start seeing results around the generated polygon.
Let me know your thoughts
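
As background for the polygon searching Patrick mentions, the core per-document step is usually a point-in-polygon test. A minimal, hypothetical ray-casting sketch (this is not the polySpatial code):

```java
// Hypothetical ray-casting point-in-polygon test (the classic PNPOLY approach);
// xs/ys hold the polygon's vertex coordinates in order.
public class PolyContains {
    static boolean contains(double[] xs, double[] ys, double px, double py) {
        boolean inside = false;
        for (int i = 0, j = xs.length - 1; i < xs.length; j = i++) {
            // Count crossings of a horizontal ray extending right from (px, py);
            // an odd number of crossings means the point is inside.
            if ((ys[i] > py) != (ys[j] > py)
                    && px < (xs[j] - xs[i]) * (py - ys[i]) / (ys[j] - ys[i]) + xs[i]) {
                inside = !inside;
            }
        }
        return inside;
    }
}
```

In practice a filter would first narrow candidates with the polygon's bounding box (as with the circular searches above) and only run this test on the survivors.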

Patrick, I tried out your "Batteries Included" example, and it worked great. One of the questions I have is that it seems like the scoring process doesn't take into account the distance from a central point. In other words, if I specify a 10 mile radius, and there is a really high scoring match more than 10 miles out, it doesn't get returned. The radius functions as a strict filter of what gets returned. However, I think what we are really trying to do is to find the best search results, and have distance factored in as well.

I was thinking that I could sort of do this "fuzzy" boundary by making a query with radius x, and then doing the same query with radius x * 2. Then, if any of the documents in x * 2 are much better than in radius x, include them. Obviously this would be somewhat clunky to do from the client side!

A use case I can think of is searching for gas stations within 5 miles of me, but if a gas station has really cheap gas and is 6 miles away, then include that. But if it's just a penny cheaper, ignore it.

Eric Pugh
added a comment - 08/Dec/09 18:57 - edited Patrick, I tried out your "Batteries Included" example, and it worked great. One of the questions I have is that it seems like the scoring process doesn't take into account the distance from a central point. In other words, if I specify a 10 mile radius, and there is a really high scoring match more than 10 miles out, it doesn't get returned. The radius functions as a strict filter of what gets returned. However, I think what we are really trying to do is to find the best search results, and have distance factored in as well.
I was thinking that I could sort of do this "fuzzy" boundary by making a query with radius x, and then doing the same query with radius x * 2. Then, if any of the documents in x * 2 are much better than in radius x, include them. Obviously this would be somewhat clunky to do from the client side!
A use case I can think of is searching for gas stations within 5 miles of me, but if a gas station has really cheap gas and is 6 miles away, then include that. But if it's just a penny cheaper, ignore it.
I added as a "screenshot" a drawing of what I was sort of thinking.
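
Eric's two-radius idea can be sketched client-side like this (all names are hypothetical; the two score maps would come from two separate Solr queries, one at radius x and one at x * 2):

```java
// Hypothetical client-side merge for the "fuzzy boundary" idea: keep all
// inner-radius hits, and admit outer-ring hits only when they beat the best
// inner score by a margin.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class FuzzyRadiusMerge {
    static List<String> merge(Map<String, Double> innerScores,
                              Map<String, Double> outerScores,
                              double margin) {
        double bestInner = innerScores.values().stream()
                .mapToDouble(Double::doubleValue).max().orElse(0.0);
        List<String> result = new ArrayList<>(innerScores.keySet());
        for (Map.Entry<String, Double> e : outerScores.entrySet()) {
            // Outer-ring docs must clearly outscore the best inner doc.
            if (!innerScores.containsKey(e.getKey())
                    && e.getValue() > bestInner + margin) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

As Patrick notes below, this is fine for small result sets, but at millions of documents you want the engine (not the client) to bound how many distances get computed.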

patrick o'leary
added a comment - 08/Dec/09 19:06 You can certainly implement a fuzzy scoring method, but you really want to avoid having to calculate distances for all your results, so
some sort of restriction is good.
If your data set is small ~100K docs, you might get away with using a value scorer and boost on distances.
But if your data set is in the order of millions, that's not going to be a good idea.

SOLR-1297: sort by function query just needs review and then can be committed.

After that, we can add in the Cartesian Tier indexing and the Cartesian Tier QParserPlugin (after a little re-write). Then we need pseudo-fields and we likely want to hook in a per request function cache (maybe)

Grant Ingersoll
added a comment - 12/Dec/09 13:55 Just an update:
SOLR-1131 : aka poly fields is almost ready to go. Please review.
SOLR-1297 : sort by function query just needs review and then can be committed.
After that, we can add in the Cartesian Tier indexing and the Cartesian Tier QParserPlugin (after a little re-write). Then we need pseudo-fields and we likely want to hook in a per request function cache (maybe)

Dave Craft
added a comment - 15/Dec/09 00:16 Hi,
I've created a blog post on installing LocalSolr onto Solr 1.4.. Which takes all the comments and breaks it down into step by step instructions.
Hope it helps
http://craftyfella.blogspot.com/2009/12/installing-localsolr-onto-solr-14.html

Grant Ingersoll
added a comment - 21/Dec/09 19:20 - edited There is already a spot for Spatial at: http://wiki.apache.org/solr/SpatialSearch
It probably would be useful to see if the LocalSolr project can make use of it, since Solr itself is not going to require any custom install stuff.

SOLR-1586 is committed for GeohashField and SpatialTileField. We likely will add one more FieldType that combines both a 2D PointType and the tiling capabilities into a single FieldType, mostly as a convenience mechanism.

Grant Ingersoll
added a comment - 30/Dec/09 14:36 SOLR-1586 is committed for GeohashField and SpatialTileField. We likely will add one more FieldType that combines both a 2D PointType and the tiling capabilities into a single FieldType, mostly as a convenience mechanism.
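
For context on what a GeohashField stores, here is a minimal, hypothetical sketch of standard geohash encoding (not Solr's implementation): alternately halve the longitude and latitude ranges, record one bit per halving, and emit a base-32 character for every 5 bits:

```java
// Hypothetical sketch of standard geohash encoding. Shared prefixes imply
// shared bounding boxes, so prefix queries select roughly rectangular regions.
public class GeohashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        double[] latRange = {-90.0, 90.0};
        double[] lonRange = {-180.0, 180.0};
        StringBuilder sb = new StringBuilder();
        boolean isLon = true; // geohash starts with a longitude bit
        int bit = 0, ch = 0;
        while (sb.length() < precision) {
            double[] range = isLon ? lonRange : latRange;
            double v = isLon ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            ch <<= 1;
            if (v >= mid) { ch |= 1; range[0] = mid; } else { range[1] = mid; }
            isLon = !isLon;
            if (++bit == 5) { sb.append(BASE32.charAt(ch)); bit = 0; ch = 0; }
        }
        return sb.toString();
    }
}
```

The longer the hash, the smaller the cell; truncating a hash widens the search area, which is what makes the encoding convenient for coarse-to-fine filtering.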

Issue #1:
If we specify fl=id (or fl= anything) and wt=json it seems that the fl parameter is ignored (thus we get a lot more detail in our results than we'd like).

If we specify fl=id and leave out wt=json (which defaults to returning xml results), we get the expected fields back. We'd really prefer to use wt=json because the results are easier for us to deal with (also, the same issue also arises with wt=python and wt=ruby).

Issue #2:
It looks like the defType parameter isn't properly passed through for geo queries, making it really hard to use things like dismax + geo. I've been playing with the code a bit and have a "working" patch for it. However, as I'm very new to the solr/localsolr source, I'd be uncomfortable submitting it without additional testing.

Brian Westphal
added a comment - 13/Jan/10 01:35 We've got Localsolr (2.9.1 lucene-spatial library) running on Solr 1.4 with Tomcat 1.6. Everything's looking good, except for a couple little issues.
Issue #1:
If we specify fl=id (or fl= anything) and wt=json it seems that the fl parameter is ignored (thus we get a lot more detail in our results than we'd like).
If we specify fl=id and leave out wt=json (which defaults to returning xml results), we get the expected fields back. We'd really prefer to use wt=json because the results are easier for us to deal with (also, the same issue also arises with wt=python and wt=ruby).
Issue #2:
It looks like the defType parameter isn't properly passed through for geo queries, making it really hard to use things like dismax + geo. I've been playing with the code a bit and have a "working" patch for it. However, as I'm very new to the solr/localsolr source, I'd be uncomfortable submitting it without additional testing.
---------
If anyone knows any workarounds for these issues, please let me know.

Trying the solr trunk right now, but I'm getting an exception: rsp java.lang.NoSuchFieldError: rsp at com.pjaol.search.solr.component.LocalSolrQueryComponent.prepare(LocalSolrQueryComponent.java:119)

Still working on trying to figure out what the issue could be – it could very well be my fault – but thought I'd mention it in case it rang any bells. I'm very new to looking at the solr and localsolr code so it might take me a bit of time to figure out – I looked at the code for ResponseBuilder and it seems like it has an rsp field.

Brian Westphal
added a comment - 13/Jan/10 21:50 Hi Grant,
Trying the solr trunk right now, but I'm getting an exception: rsp java.lang.NoSuchFieldError: rsp at com.pjaol.search.solr.component.LocalSolrQueryComponent.prepare(LocalSolrQueryComponent.java:119)
Still working on trying to figure out what the issue could be – it could very well be my fault – but thought I'd mention it in case it rang any bells. I'm very new to looking at the solr and localsolr code so it might take me a bit of time to figure out – I looked at the code for ResponseBuilder and it seems like it has an rsp field.
Thanks

Grant Ingersoll
added a comment - 13/Jan/10 22:00 Sorry, meant w/o LocalSolr. Most of LocalSolr has been incorporated into Solr at this point, with the exception of the Tier filtering. Docs are under way at http://wiki.apache.org/solr/SpatialSearch

I'm gonna work on getting stuff tested with solr 1.5. I wanted to ask about another issue in the meantime however.

I've noticed that I get an "Illegal Latitude Value" exception sometimes when working with points near the poles or just when working with very large radii. I would personally rather the system just cut off at -90 and 90 artificially than throw an error. I'm not worried about finding things near the poles as much as I'd like to be able to use bigger search radii, but I don't care if it wraps around the earth correctly latitudinally speaking. (if avoiding this issue were doable by a flag or something, that'd be great too)

Here's the more precise error if it helps:
-----------------
HTTP Status 500 - Illegal latitude value 113.29902312168412
java.lang.IllegalArgumentException: Illegal latitude value 113.29902312168412
    at org.apache.lucene.spatial.geometry.FloatLatLng.<init>(FloatLatLng.java:31)
    at org.apache.lucene.spatial.geometry.shape.LLRect.createBox(LLRect.java:85)
    at org.apache.lucene.spatial.tier.DistanceUtils.getBoundary(DistanceUtils.java:54)
    at org.apache.lucene.spatial.tier.CartesianPolyFilterBuilder.getBoxShape(CartesianPolyFilterBuilder.java:59)
    at org.apache.lucene.spatial.tier.CartesianPolyFilterBuilder.getBoundingArea(CartesianPolyFilterBuilder.java:121)
    at org.apache.lucene.spatial.tier.DistanceQueryBuilder.<init>(DistanceQueryBuilder.java:59)
    at com.pjaol.search.solr.component.LocalSolrQueryComponent.prepare(LocalSolrQueryComponent.java:151)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
    at java.lang.Thread.run(Thread.java:637)
-------------------

Brian Westphal
added a comment - 14/Jan/10 02:33 I'm gonna work on getting stuff tested with solr 1.5. I wanted to ask about another issue in the meantime however.
I've noticed that I get an "Illegal Latitude Value" exception sometimes when working with points near the poles or just when working with very large radii. I would personally rather the system just cut off at -90 and 90 artificially than throw an error. I'm not worried about finding things near the poles as much as I'd like to be able to use bigger search radii, but I don't care if it wraps around the earth correctly latitudinally speaking. (if avoiding this issue were doable by a flag or something, that'd be great too)
Here's the more precise error if it helps:
-----------------
HTTP Status 500 - Illegal latitude value 113.29902312168412
java.lang.IllegalArgumentException: Illegal latitude value 113.29902312168412
    at org.apache.lucene.spatial.geometry.FloatLatLng.<init>(FloatLatLng.java:31)
    at org.apache.lucene.spatial.geometry.shape.LLRect.createBox(LLRect.java:85)
    at org.apache.lucene.spatial.tier.DistanceUtils.getBoundary(DistanceUtils.java:54)
    at org.apache.lucene.spatial.tier.CartesianPolyFilterBuilder.getBoxShape(CartesianPolyFilterBuilder.java:59)
    at org.apache.lucene.spatial.tier.CartesianPolyFilterBuilder.getBoundingArea(CartesianPolyFilterBuilder.java:121)
    at org.apache.lucene.spatial.tier.DistanceQueryBuilder.<init>(DistanceQueryBuilder.java:59)
    at com.pjaol.search.solr.component.LocalSolrQueryComponent.prepare(LocalSolrQueryComponent.java:151)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
    at java.lang.Thread.run(Thread.java:637)
-------------------
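
A minimal, hypothetical client-side guard for the cutoff behavior Brian asks for (this is not a patch to Lucene's FloatLatLng): clamp the computed bounding-box latitudes to the legal range before issuing the query, rather than letting the filter throw:

```java
// Hypothetical workaround: artificially cut off latitudes at -90/90 so a very
// large radius can't produce an out-of-range bounding box.
public class LatClamp {
    static double clampLat(double lat) {
        return Math.max(-90.0, Math.min(90.0, lat));
    }
}
```

This sacrifices correct wrap-around behavior at the poles, which matches the trade-off Brian says he is happy to make.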

I should have a patch for SOLR-1568 up pretty soon. I'm also going to add a new FieldType specifically for Lat/Lon that extends PointType and is fixed to two dimensions and can be a bit more intelligent about that specific use case.

Could use some help on SOLR-1298 so that we could get pseudo-fields in sooner rather than later.

Grant Ingersoll
added a comment - 10/Feb/10 14:43 Just an update here on other work related to this issue.
I should have a patch for SOLR-1568 up pretty soon. I'm also going to add a new FieldType specifically for Lat/Lon that extends PointType and is fixed to two dimensions and can be a bit more intelligent about that specific use case.
Could use some help on SOLR-1298 so that we could get pseudo-fields in sooner rather than later.

First of all I believe
<fieldType name="location" class="solr.PointType" dimension="2" subFieldTypes="double"/>
should be changed to
<fieldType name="location" class="solr.PointType" dimension="2" subFieldType="double"/>

But more importantly I'm having trouble getting this to work.
I was able to index my data using the Geohash type and can see it in my store field when not doing spatial queries. However, when doing the following query:
...?q=val:"recip(dist(2, store, vector(34.0232,-81.0664)),1,1,0)"&fl=*,score
I get error message:
Illegal number of sources. There must be an even number of sources

I also tried ...?q={!sfilt fl=location}&pt=49.32,-79.0&dist=20 and get message unknown query type 'sfilt'.

Is there something I'm missing or is code just not committed to trunk yet?
Thanks

Dan Bentson
added a comment - 16/Mar/10 22:02 I'm trying to test spatial search out in 1.5 going by the docs on http://wiki.apache.org/solr/SpatialSearch
First of all I believe
<fieldType name="location" class="solr.PointType" dimension="2" subFieldTypes="double"/>
should be changed to
<fieldType name="location" class="solr.PointType" dimension="2" subFieldType="double"/>
But more importantly I'm having trouble getting this to work.
I was able to index my data using the Geohash type and can see it in my store field when not doing spatial queries. However, when doing the following query:
...?q=val:"recip(dist(2, store, vector(34.0232,-81.0664)),1,1,0)"&fl=*,score
I get error message:
Illegal number of sources. There must be an even number of sources
I also tried ...?q={!sfilt fl=location}&pt=49.32,-79.0&dist=20 and get message unknown query type 'sfilt'.
Is there something I'm missing or is code just not committed to trunk yet?
Thanks

Dan Bentson
added a comment - 22/Mar/10 21:25 Update to my above comment. I was able to get both types of searches working.
Using the Spatial Filter QParser I'm getting the results I want (example query: q=pizza {!sfilt fl=location}&pt=49.32,-79.0&dist=20). I have a couple questions though:
First of all what is the distance unit of measurement? Miles? Meters?
Also, using Patrick's plug-in it returned the distance as a result field. Is there any way to do that in SOLR 1.5?
Any help with this would be greatly appreciated!!!
Thanks

SOLR-1568, which is the last big piece, I think, is almost done. I added a new LatLonType which should make it super easy to do pure LatLon stuff (Point is more for a rectangular coordinate system. I guess maybe we should rename it?) and it should be easy to extend to use different distance methods. I will try to document some more on the wiki.

There are some minor bugs related to sorting by function right now, but it should be usable for people just doing spatial stuff (SOLR-1297). Probably the next most important piece to get in place is SOLR-1298 and its related item SOLR-705. Help on those pieces would be most appreciated.

Grant Ingersoll
added a comment - 02/Apr/10 14:15 Status update:
SOLR-1568, which is the last big piece, I think, is almost done. I added a new LatLonType which should make it super easy to do pure LatLon stuff (Point is more for a rectangular coordinate system. I guess maybe we should rename it?) and it should be easy to extend to use different distance methods. I will try to document some more on the wiki.
There are some minor bugs related to sorting by function right now, but it should be usable for people just doing spatial stuff (SOLR-1297). Probably the next most important piece to get in place is SOLR-1298 and its related item SOLR-705. Help on those pieces would be most appreciated.
As always, people kicking the tires on the trunk is appreciated too.

Grant, I started looking at SOLR-1298 yesterday. The idea is to somehow merge all the related issues (there are currently two open issues for the same purpose with two different patches). But this should be done in a somewhat collaborative manner so everybody will be on the same page here.... also regarding the discussion about the different approaches (inline the pseudo fields or have them nested in a separate "meta" element). Is there some way to merge the issues? Or perhaps mark one of them as a duplicate, so the discussion will be centralized.

Uri Boness
added a comment - 02/Apr/10 14:21 - edited Grant, I started looking at SOLR-1298 yesterday. The idea is to somehow merge all the related issues (there are currently two open issues for the same purpose with two different patches). But this should be done in a somewhat collaborative manner so everybody will be on the same page here.... also regarding the discussion about the different approaches (inline the pseudo fields or have them nested in a separate "meta" element). Is there some way to merge the issues? Or perhaps mark one of them as a duplicate, so the discussion will be centralized.
btw, the other "duplicate" issue is SOLR-1566

Hoss Man
added a comment - 27/May/10 23:09 Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...
http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E
Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.
A unique token for finding these 240 issues in the future: hossversioncleanup20100527

P B
added a comment - 13/Jul/10 09:05 I am newbie and I have a question:
Is it now possible to use a Solr nightly build to solve this problem:
find all points (using the index) within a radius of R km on earth?
Is there a ready-to-use query sample?

I'm extremely interested in the Polygon search here, is there any activity on that front or am I the only one interested in it? I'd be willing to try and contribute to this effort, but I imagine a lot of it will depend on the work being done on point searching.

Oliver Beattie
added a comment - 27/Jul/10 12:15 I'm extremely interested in the Polygon search here, is there any activity on that front or am I the only one interested in it? I'd be willing to try and contribute to this effort, but I imagine a lot of it will depend on the work being done on point searching.

Simon Rijnders
added a comment - 27/Sep/10 09:31 Same question as P B:
Is it now possible to use a Solr nightly build to solve this problem: find all points (using the index) within a radius of R km on earth?
I'm willing to act as an (ignorant) test subject, so if the above is possible, please let me know, and I'll see what I can do....

I'm going to mark this issue as resolved at this point. For a long time, this issue has served to track a bunch of different issues related to Solr, but I think we have incorporated almost all of the major features of local solr (and some others, too) such that it makes sense to just track things individually at this point.

Grant Ingersoll
added a comment - 06/Dec/10 22:02 I'm going to mark this issue as resolved at this point. For a long time, this issue has served to track a bunch of different issues related to Solr, but I think we have incorporated almost all of the major features of local solr (and some others, too) such that it makes sense to just track things individually at this point.