GridDB Community Edition 4.1 and newer feature a Geometry data type that allows developers to combine both Time Series and Geospatial analysis in their applications. We’ve covered GridDB’s Geometry features before in our posts on Using Geometry Values with GridDB and Geometry Data Applications.
This blog post delves into practical use of the Geometry data type by looking at New York City’s historical crime complaint data. This data provides Latitude and Longitude of reported crimes so we will be using GridDB to see in what area a complaint occurred in. After demonstrating how to ingest the crime data from NYC Open Data, we’ll look at seeing how the amount of crime differs month to month in Central Park. We will also load external polygon data by seeing complaint counts in individual precincts.
The primary purpose of use of GridDB’s Geometry data type with Geospatial data points is being able to search where points, polylines (paths), or polygons (areas) intersect. Points, polylines, and polygons are defined using Well-known-text (WKT) which is a markup language that defines vector geometry objects on a map.
Ingest
We fetched the historic crime data from NYC Open Data. The following table shows both the CSV data fields and the GridDB schema.
CSV Values | GridDB Schema |
---|---|
|
public class Complaint { int CMPLNT_NUM; Date CMPLNT_FR_DT; Date CMPLNT_TO_DT; int ADDR_PCT_CD; Date RPT_DT; int KY_CD; String OFNS_DESC; int PD_CD; String PD_DESC; String CRM_ATPT_CPTD_CD; String LAW_CAT_CD; String BORO_NM; String LOC_OF_OCCUR_DESC; String PREM_TYP_DESC; String JURIS_DESC; int JURISDICTION_CODE; String PARKS_NM; String HADEVELOPT; String HOUSING_PSA; int X_COORD_CD; int Y_COORD_CD; String SUSP_AGE_GROUP; String SUSP_RACE; String SUSP_SEX; int TRANSIT_DISTRICT; float Latitude; float Longitude; Geometry Lat_Lon; String PATROL_BORO; String STATION_NAME; String VIC_AGE_GROUP; String VIC_RACE; String VIC_SEX; } |
For simplicity sake, we’re only using one container instead of splitting data into multiple containers.
Parsing the CSV easy with the CSVParser Library:
Iterablerecords = CSVFormat.RFC4180.withFirstRecordAsHeader().parse(in); for (CSVRecord record : records) { Complaint c = parseCsvRecord(record); if(c != null) col.put(c); } col.commit();
There are a few changes to the data that need to be made within the parseCsvRecord function. First, the complaint time is a non-standard but easily parse-able format of MM/DD/YYYY and HH:MM:SS.
String dt[] = r.get("CMPLNT_FR_DT").split("/"); String tm[] = r.get("CMPLNT_FR_TM").split(":"); c.CMPLNT_FR_DT = new Date(Integer.parseInt(dt[2])-1900, Integer.parseInt(dt[0])-1, Integer.parseInt(dt[1]), Integer.parseInt(tm[0]), Integer.parseInt(tm[1]), Integer.parseInt(tm[2]));
While the raw CSV contains WKT text for the point where the crime occurred in “lat lon” but the accepted WKT format is POINT(x y”) and Latitude is denotes Y-axis and Longitude the X-axis so we flip them.
c.Lat_Lon = Geometry.valueOf("POINT("+c.Longitude+" "+c.Latitude+")");
Crime by Precinct
NYC Open Data also provides WKT polygons of individual police precincts available here https://data.cityofnewyork.us/Public-Safety/Police-Precincts/78dh-3ptz
. Like the complaint data, it’s easily loaded with CSVParser but since each precinct may consist of multiple polygons and uses the WKT MULTIPOLYGON type some further processing is required to split the MULTIPOLYGON into simple POLYGONs.
String polys[] = record.get("the_geom").split("\\),"); int count=0; for(int i=0; i < polys.length; i++) { String subpoly = polys[i].replace("MULTIPOLYGON (", "").replace(")))", ")"); query = col.query("select * where ST_MBRIntersects(Lat_Lon, ST_GeomFromText('POLYGON"+subpoly+")') )"); rs = query.fetch(false); count =+ rs.size(); }
The results are as follows:
Precinct 1: 243 Precinct 52: 888 Precinct 5: 177 Precinct 60: 185 Precinct 6: 216 Precinct 61: n/a Precinct 71: 227 Precinct 62: 210 Precinct 72: 262 Precinct 63: 324 Precinct 7: 132 Precinct 66: 261 Precinct 9: 233 Precinct 68: 233 Precinct 22: 345 Precinct 69: 220 Precinct 10: 203 Precinct 70: 369 Precinct 13: 400 Precinct 76: 135 Precinct 14: 428 Precinct 77: 334 Precinct 17: 174 Precinct 78: 211 Precinct 20: 132 Precinct 81: 12 Precinct 18: 379 Precinct 83: 498 Precinct 19: 225 Precinct 84: 175 Precinct 23: 225 Precinct 88: 174 Precinct 24: 147 Precinct 90: 290 Precinct 25: 336 Precinct 94: 102 Precinct 79: 266 Precinct 100: 8 Precinct 26: 217 Precinct 101: 0 Precinct 28: 213 Precinct 102: 283 Precinct 30: 206 Precinct 103: 367 Precinct 32: 361 Precinct 104: 511 Precinct 73: 410 Precinct 105: 481 Precinct 33: 152 Precinct 106: 227 Precinct 34: 224 Precinct 107: 294 Precinct 75: 529 Precinct 108: 262 Precinct 40: 444 Precinct 109: 299 Precinct 41: 304 Precinct 110: 431 Precinct 42: 487 Precinct 111: 138 Precinct 43: 408 Precinct 112: 178 Precinct 48: 495 Precinct 113: n/a Precinct 44: 559 Precinct 114: 28 Precinct 45: 323 Precinct 115: 246 Precinct 46: 400 Precinct 120: 228 Precinct 47: 441 Precinct 121: 194 Precinct 49: 267 Precinct 122: 217 Precinct 50: 219 Precinct 123: 83 Precinct 67: 504
Central Park
Our first geospatial analysis is looking at crime in Central Park month to month too see if crime complaints increase or decrease with the temperature. As Central Park is not aligned to the North-South axis, you can't simply use a bounding box (where lat > min && lat < max && lon > min && lon < max) as you would rudimentary geospatial analysis queries. Instead we can build a polygon of each corner of Central Park and query crime complaints that intersect with that polygon. To find the points of Central Park and build the WKT object a useful tool is Wicket.
With that, we can build the TQL query
String CentralParkWKT = "POLYGON((-73.97308900174315 40.764422448981996,-73.98192956265623 40.76812781417226,-73.9584064734938 40.80087951931638,-73.94982340464614 40.797240957024385,-73.97308900174315 40.764422448981996))"; for(int month=0; month <= 11; month++) { int count=0; for (int year=108; year <= 118; year++) { Date start = new Date(year, month, 1); Date end = new Date(year, month+1, 1); Queryquery = col.query("select * where ST_MBRIntersects(Lat_Lon, ST_GeomFromText('"+CentralParkWKT+"')) and CMPLNT_FR_DT >= TO_TIMESTAMP_MS("+start.getTime()+") and CMPLNT_FR_DT < TO_TIMESTAMP_MS("+ end.getTime()+") "); RowSet rs = query.fetch(false); count += rs.size(); } System.out.println(month+": "+count); }
Now to answer our initial question, does the amount of crime change as the temperature changes?
January: 30 February: 23 March: 36 April: 33 May: 29 June: 19 July: 26 August: 49 September: 25 October: 23 November: 26 December: 18
The December and August totals would lend support to the argument, but January and June disagree thus the results are inconclusive.
If you're interested in exploring the data further or having a look at the complete code, it is available here.
If you have any questions about the blog, please create a Stack Overflow post here https://stackoverflow.com/questions/ask?tags=griddb .
Make sure that you use the “griddb” tag so our engineers can quickly reply to your questions.
[…] Read the complete article at: griddb.net […]