In a past New York Crime blog, we covered querying geospatial data with GridDB’s Geometry features.
While the Geometry and WKT (Well-known text representation of geometry) querying works well, there are two downsides: it is more computationally intensive than numerical comparison, and it is only available in C and Java APIs. If you wish to use Python or JDBC, you need to use a different method. In this blog, we’ll showcase querying the same dataset with simple bounding boxes and also demonstrate how you can partition geospatial data into multiple containers.
Partition Data
To make use of GridDB’s Key-Container architecture, we’ll partition data into different containers. This allows the binary search trees to be very short while distributing data evenly.
There are several ways we could do this and in this post we’ll showcase the simplest method, by rounding latitude and longitude.
With Java this is accomplished with DecimalFormat while Python uses math.ceil() and if
DecimalFormat df = new DecimalFormat("#");
df.setRoundingMode(RoundingMode.CEILING);
df.setNegativePrefix("n");
String cName = "geo_"+df.format(c.Latitude*100)+"_"+df.format(c.Longitude*100);
cname = "geo_"+re.sub('-','n', str(math.ceil(lat*100)))+"_"+re.sub('-','n',str(math.ceil(lon*100)))))
We modified Ingest.java from our first blog in which we used Geometry data types to query the data.
DecimalFormat df = new DecimalFormat("#");
df.setRoundingMode(RoundingMode.CEILING);
df.setNegativePrefix("n");
for (CSVRecord record : records) {
try {
if (i++ % 1000 == 0) System.out.println("Ingesting "+ i);
Complaint c = parseCsvRecord(record);
if(c != null) {
String cName = "geo_"+df.format(c.Latitude*100)+"_"+df.format(c.Longitude*100);
System.out.println("adding "+c+" to "+cName);
Collection<String, Complaint> col = store.putCollection(cName, Complaint.class);
col.setAutoCommit(false);
col.put(c);
col.commit();
}
} catch(Exception e) {
System.out.println("Failed to ingest "+i);
System.out.println(e);
}
}
Which produces the following output:
adding 927207428: Fri Jan 03 13:30:00 GMT 2014 @ (40.745241809, -73.894253382) to geo_4075_n7389
adding 492142357: Wed Apr 13 00:00:00 BST 2016 @ (40.810351863, -73.924942326) to geo_4082_n7392
adding 572616350: Mon Aug 18 14:30:00 BST 2014 @ (40.683659778, -73.85154207) to geo_4069_n7385
... snip ...
Bounding Box
Now that we have data stored in GridDB we can query it. The first step is to find the boundaries, with Python we can use the geopy distance module.
import geopy
from geopy import distance
d = geopy.distance.geodesic(meters=radius)
start = geopy.Point(lat, lon)
north = d.destination(point=start, bearing=0)[0]
south = d.destination(point=start, bearing=180)[0]
east = d.destination(point=start, bearing=90)[1]
west = d.destination(point=start, bearing=270)[1]
Since the data is partitioned the next step is to generate a possible list of containers that may need to be queried.
If your query area distance is always smaller than the partition size, you can simply define the containers as a set:
containers = { re.sub('-', 'n', re.sub('\.', '', "geo_%d_%d" %(math.ceil(north*100), math.ceil(east*100)))),
re.sub('-', 'n', re.sub('\.', '', "geo_%d_%d" %(math.ceil(north*100), math.ceil(west*100)))),
re.sub('-', 'n', re.sub('\.', '', "geo_%d_%d" %(math.ceil(south*100), math.ceil(east*100)))),
re.sub('-', 'n', re.sub('\.', '', "geo_%d_%d" %(math.ceil(south*100), math.ceil(west*100))))};
But if your query area is larger than the partition size, you will need to iterate through all possible containers:
containers=[]
clat = math.ceil(south*100)
while clat <= math.ceil(north*100):
clon = math.ceil(west*100)
while clon <= math.ceil(east*100):
containers.append("geo_"+re.sub('-','n', str(clat))+"_"+re.sub('-','n',str(clon)))
clon+=1
clat+=1
With a list of containers set, we can now query them with a simple TQL statement that searches for points within the bounding box:
tql = "select * where Latitude > %f and Latitude < %f and Longitude < %f and Longitude > %f" %(south, north, east, west)
for container in containers:
ts = gridstore.get_container(container)
try:
query = ts.query(tql)
rs = query.fetch(False)
while rs.has_next():
complaints.append(rs.next()
except:
pass
It is important to catch exception within the loop while performing this kind of search as it is quite likely first containers do not have any results and will throw an exception while there are results in containers queried later.
Conclusion
Using bounding boxes is a simple method of finding data near a given location. The source code for this blog is — as always — available in the GridDB.net GitHub repository.
If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.