Spatial operations: R vs. spatialite

R and sqlite / spatialite have very different performance for the same spatial operation – in this example, an intersection. The example here is overlaying a point on a set of polygons and returning the data (census tract number).

A quick R function that solved the problem (using US census tract data from TIGER):

library(maps)
library(maptools)
library(mapdata) # this is optional but lets you use the 'state' map outline.
tractLookup <- function(x, y, state) {
  pt <- SpatialPoints(data.frame(x = x, y = y)) overlay.
  pt <- overlay(pt, state) # what index number does pt fall inside
  return(census$TRACT[overlay.pt]) # give the Tract number from the census layer
}
census <- readShapePoly("tr06_d00.shp")
tractLookup(-123.123, 40.789, census)
# Look at the map
plot(census)
map('state', c('California'), lwd = 2, col = 'green', add = F) # optional
points(SpatialPoints(data.frame(x = -123, y = 40)), col = 'red', lwd = 3)

system.time() gives me a calculation time of 5 seconds.

To do the same sort of thing in Spatialite (an SQLite spatially-extended database), add the California shapefile to spatialite and then use a WKT point to query:

SELECT TRACT FROM tr06_d00 WHERE MBRIntersects(GeomFromText('POINT(-123.123 40.789)'), Geometry);

This seems to be instantaneous! What about with a whole set of points? Import a set of ten random points generated in QGIS and try the same operation on both tables:

SELECT TRACT FROM tr06_d00, pts_cal WHERE intersects(pts_cal.Geometry, tr06_d00.Geometry);

Less than three seconds for ten points … spatialite is apparently 17 times faster for this operation.