-
Notifications
You must be signed in to change notification settings - Fork 32
Tutorial: Radial Search
This tutorial covers the RadialSearch
method. Getting the number of points within a specified radius of a target point occurs in some machine learning algorithms such as the DBSCAN algorithm. This also has applications to spatial data-set queries. Such as the number of coffee shops within 25 miles of your current location. It is not hard to imagine similar situations requiring a radial search.
To construct a KDTree follow the steps in the wiki page Tutorial: Nearest Neighbor Search or follow the code in the wiki's Full Example Program
Similar to the nearest neighbor tutorial we generate 1,000,000 points and 10,000 test points to perform radial searches against.
var treeData = GenerateData(1000000, 1000);
var treeNodes = treeData.Select(p => string.Empty).ToArray();
var testData = GenerateData(10000, 1000);
var tree = new KDTree<double, string>(2, treeData, treeNodes, L2Norm);
// Measure the time to search
var stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < testData.Length; i++)
{
var test = tree.RadialSearch(center: testData[i], radius: 100, neighboors: 3);
}
stopwatch.Stop();
Console.WriteLine("Milliseconds: " + stopwatch.ElapsedMilliseconds);
Console.Read();
On my machine I get the following results:
Millisecods: 119
This is almost exactly the same speed as the result for our nearest neighbor search for the top 3 neighbors. This is to be expected as the nearest neighbor search is a radial search with an infinite radius and specified number of neighbors.
If we do not specify the number of neighbors the radial search will return all neighbors in the specified radius. Let run this again search for all neighbors in the given radius. Also, lets keep track of the average number of points returned.
modifying the code to be:
// Generate some data for the tree and testing
var treeData = GenerateData(1000000, 1000);
var treeNodes = treeData.Select(p => string.Empty).ToArray();
var testData = GenerateData(10000, 1000);
var tree = new KDTree<double, string>(2, treeData, treeNodes, L2Norm);
var pointsReturned = new List<int>();
// Measure the time to search
var stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < testData.Length; i++)
{
var test = tree.RadialSearch(center: testData[i], radius: 100);
pointsReturned.Add(test.Length);
}
stopwatch.Stop();
Console.WriteLine("Milliseconds: " + stopwatch.ElapsedMilliseconds);
Console.WriteLine("Average Number of Points: " + pointsReturned.Average());
Console.Read();
We get the following.
Milliseconds: 2247
Average Number of Points: 311.5181
This query is much "slower" as there are now many more branches in the KDTree to traverse.
##Discussion
There is actually much going on in this method. Lets look at a few things.
Specifying the number of neighbors returned in a radial search completely changes the intent of the query. The query tree.RadialSearch(center: testData[i], radius: 100, neighboors: 3)
answers the question "What are the 3 closest coffee shops from my location within 100 miles" where the query tree.RadialSearch(center: testData[i], radius: 100)
answers the questions what are all the coffee shows from my location within 100 miles?
A radial search without the specified number of neighbors will always be more expensive that a nearest neighbors search as more branches have to be traversed during the search process. However, a single radial search is still extremely quick. Keep in mind, the total time for all searches combined was 2247 milliseconds. The average time in milliseconds for a single search was only 0.0007 milliseconds.
When you specify the radius for your radial search it the radius as measured using your metric function. Take for example the euclidean distance without a square root as a metric function.
Func<double[], double[], double> L2Norm = (x, y) =>
{
double dist = 0;
for (int i = 0; i < x.Length; i++)
{
dist += (x[i] - y[i]) * (x[i] - y[i]);
}
return dist;
};
When calling tree.RadialSearch(center: testData[i], radius: 100)
on a KDTree using this metric function, we are looking for all points from the center testData[i] whose squared distance is less than or equal to 100.