I'm attempting to do a spatial join much like the example here: Is there a python option to "join attributes by location"?. However, that approach seems really inefficient / slow. Even running this with a modest 250 points takes almost 2 minutes and it fails entirely on shapefiles with > 1,000 points. Is there a better approach? I'd like to do this entirely in Python without using ArcGIS, QGIS, etc.

I'd also be interested to know if it's possible to SUM attributes (i.e. population) of all the points that fall within a polygon and join that quantity to the polygon shapefile.

I think you should edit your second question out of here so that this one remains focussed on what I assume is the more important question to you. The other can be researched/asked separately.
– PolyGeo♦Jun 24 '14 at 19:40

# Create the R-tree index and store the features in it (bounding box)
from rtree import index
idx = index.Index()
for pos, poly in enumerate(polygons):
idx.insert(pos, shape(poly['geometry']).bounds)
#iterate through points
for i,pt in enumerate(points):
point = shape(pt['geometry'])
# iterate through spatial index
for j in idx.intersection(point.coords[0]):
if point.within(shape(multi[j]['geometry'])):
polygons[j]['properties']['score'] = polygons[j]['properties']['score'] + points[i]['properties']['score']

Result with the two solutions:

for poly in polygons:
print poly['properties']
OrderedDict([(u'score', 2)]) # 2 points in the polygon
OrderedDict([(u'score', 1)]) # 1 point in the polygon
OrderedDict([(u'score', 1)]) # 1 point in the polygon

What is the difference ?

Without the index, you must iterate through all the geometries (polygons and points).

With a bounding spatial index (Spatial Index RTree), you iterate only through the geometries which have a chance to intersect with your current geometry ('filter' which can save a considerable amount of calculations and time...)

but a Spatial Index is not a magic wand. When a very large part of the dataset has to be retrieved, a Spatial Index cannot give any speed benefit.

I was under the impression rtree was optional. Doesn't that mean you need to install rtree as well as the libspatialindex C-library?
– kuanbFeb 10 '17 at 1:54

it's been a while but I think when I did this installing geopandas from github automatically added rtree when I had first installed libspatialindex... they've done a fairly major release since so I'm sure things have changed a bit
– claytonrshFeb 27 '17 at 15:16

Use Rtree as an index to perform the much faster joins, then Shapely to do the spatial predicates to determine if a point is actually within a polygon. If done properly, this can be faster than most other GISes.

Thank you @Mike T ... using the dict object or PostGIS are great suggestions. I'm still a little confused where I can incorporate Rtree in my code, however (included code above).
– jburrfischerJun 24 '14 at 2:45

Thanks @klewis ... any chance you can help with the second part? To sum the point attributes (e.g. population) that fall within the polygons I tried something similar to code below but it threw an error. if shape(school['geometry']).within(shape(neighborhood['geometry'])): neighborhood['properties']['population'] += schools['properties']['population']
– jburrfischerJun 23 '14 at 20:10

If you open neighborhood in 'r' mode, it might be read-only. Do both shapefiles have field population? What line # is throwing the error? Good luck.
– klewisJun 23 '14 at 23:22

Thank you again @klewis ... I've added my code above and explained the error. Also, I've been playing around with rtree and I'm still little confused where I would add this into the code above. Sorry to be a such a bother.
– jburrfischerJun 24 '14 at 2:35