1# Copyright 2004 by Thomas Hamelryck. 2# All rights reserved. 3# This code is part of the Biopython distribution and governed by its 4# license. Please see the LICENSE file that should have been included 5# as part of this package. 6"""KD tree data structure for searching N-dimensional vectors. 7 8The KD tree data structure can be used for all kinds of searches that 9involve N-dimensional vectors, e.g. neighbor searches (find all points 10within a radius of a given point) or finding all point pairs in a set 11that are within a certain radius of each other. See "Computational Geometry: 12Algorithms and Applications" (Mark de Berg, Marc van Kreveld, Mark Overmars, 13Otfried Schwarzkopf). Author: Thomas Hamelryck. 14""" 15 16from__future__importprint_function 17 18fromnumpyimportsum,sqrt,array 19fromnumpyimportrandom 20 21fromBio.KDTreeimport_CKDTree 22 23__docformat__="restructuredtext en" 24 25

109"""KD tree implementation (C++, SWIG python wrapper)110111 The KD tree data structure can be used for all kinds of searches that112 involve N-dimensional vectors, e.g. neighbor searches (find all points113 within a radius of a given point) or finding all point pairs in a set114 that are within a certain radius of each other.115116 Reference:117118 Computational Geometry: Algorithms and Applications119 Second Edition120 Mark de Berg, Marc van Kreveld, Mark Overmars, Otfried Schwarzkopf121 published by Springer-Verlag122 2nd rev. ed. 2000.123 ISBN: 3-540-65620-0124125 The KD tree data structure is described in chapter 5, pg. 99.126127 The following article made clear to me that the nodes should128 contain more than one point (this leads to dramatic speed129 improvements for the "all fixed radius neighbor search", see130 below):131132 JL Bentley, "Kd trees for semidynamic point sets," in Sixth Annual ACM133 Symposium on Computational Geometry, vol. 91. San Francisco, 1990134135 This KD implementation also performs a "all fixed radius neighbor search",136 i.e. it can find all point pairs in a set that are within a certain radius137 of each other. As far as I know the algorithm has not been published.138 """139

148"""Add the coordinates of the points.149150 Arguments:151 - coords: two dimensional NumPy array. E.g. if the points152 have dimensionality D and there are N points, the coords153 array should be NxD dimensional.154 """155ifcoords.min()<=-1e6orcoords.max()>=1e6:156raiseException("Points should lie between -1e6 and 1e6")157iflen(coords.shape)!=2orcoords.shape[1]!=self.dim:158raiseException("Expected a Nx%i NumPy array"%self.dim)159self.kdt.set_data(coords)160self.built=1

191"""Return the list of indices.192193 Return the list of indices after a neighbor search.194 The indices refer to the original coords NumPy array. The195 coordinates with these indices were within radius of center.196197 For an index pair, the first index<second index.198 """199a=self.kdt.get_indices()200ifaisNone:201return[]202returna

207"""All fixed neighbor search.208209 Search all point pairs that are within radius.210211 Arguments:212 - radius: float (>0)213 """214ifnotself.built:215raiseException("No point set specified")216self.neighbors=self.kdt.neighbor_search(radius)

219"""Return All Fixed Neighbor Search results.220221 Return a Nx2 dim NumPy array containing222 the indices of the point pairs, where N223 is the number of neighbor pairs.224 """225a=array([[neighbor.index1,neighbor.index2]forneighborinself.neighbors])226returna

229"""Return All Fixed Neighbor Search results.230231 Return an N-dim array containing the distances232 of all the point pairs, where N is the number233 of neighbor pairs..234 """235return[neighbor.radiusforneighborinself.neighbors]

236237if__name__=="__main__":238239nr_points=100000240dim=3241bucket_size=10242query_radius=10243244coords=200*random.random((nr_points,dim))245246kdtree=KDTree(dim,bucket_size)247248# enter coords249kdtree.set_coords(coords)250251# Find all point pairs within radius252253kdtree.all_search(query_radius)254255# get indices & radii of points256257# indices is a list of tuples. Each tuple contains the258# two indices of a point pair within query_radius of259# each other.260indices=kdtree.all_get_indices()261radii=kdtree.all_get_radii()262263print("Found %i point pairs within radius %f."%(len(indices),query_radius))264265# Do 10 individual queries266267foriinrange(0,10):268# pick a random center269center=random.random(dim)270271# search neighbors272kdtree.search(center,query_radius)273274# get indices & radii of points275indices=kdtree.get_indices()276radii=kdtree.get_radii()277278x,y,z=center279print("Found %i points in radius %f around center (%.2f, %.2f, %.2f)."%(len(indices),query_radius,x,y,z))280