Wednesday, 14 March 2018

Performance of joining tables on a geometry column with SQL Server


I had a shapefile that I needed to integrate into my SQL Server DB. To do so, I used the very useful Shape2Sql.exe with which I chose 'Set SRID 4326' in the settings.


I now have a table that contains a geometry column while another table is showing to me the latitude and longitude of different addresses along with other useful information. This can be summed up as follows:


CREATE TABLE dbo.myTable1 (
FULL_ADDRESS varchar(120) NULL,
LONGITUDE varchar(50) NULL,
LATITUDE varchar(50) NULL
)


CREATE TABLE dbo.myTable2 (
POLYGON_NAME nvarchar(255) NULL,
geom geometry NULL
)

What I would like is to join the two tables based on the sets of coordinates that match a certain polygon. I took some inspiration from here and I basically did this:


SELECT DISTINCT
mt1.FULL_ADDRESS
, mt1.LONGITUDE
, mt1.LATITUDE

, mt2.POLYGON_NAME
INTO #TEMP_TABLE
FROM myTable1 mt1
LEFT JOIN myTable2 mt2
ON geom.STIntersects(geometry::STGeomFromWKB((geography::Point(mt2.LATITUDE, mt2.LONGITUDE, 4326)).STAsBinary(), 4326)) = 1

(the DISTINCT is useful because, for some reason that I can't explain yet, I get several matches for a single set)


The results thereof are exactly what I expect but, for 10,000 sets of coordinates, it almost took 3 hours. Problem is, I'm supposed to find the polygons of 25,000,000 sets (which means 300 days of computation...). I therefore need to find a trick to improve that query performance by several orders of magnitude.


BONUS QUESTION:


When I don't store the results of my query inside a temp table (i.e. when I want to see the results), I get the following error message:




A .NET Framework error occurred during execution of user-defined routine or aggregate "No extended information available.": .



But everything is fine when using a SELECT... INTO #TEMP_TABLE instead of a plain and simple SELECT. I couldn't find any piece of information about this message. Any idea to solve it?



Answer



The best way to increase spatial join is to create spatial index.


CREATE INDEX table_gist ON table USING GIST(geom);

Since your first table has coordinates but no geometry you can't create a spatial index. That's why it takes so much time.


You should consider adding a geometry column on your first table based on your coordinate.



ALTER TABLE mytable1
ADD COLUMN geom geometry;

UPDATE mytable1
SET geom=ST_MakePoint(longitude, latitude);

Then you can create an index on mytable1. Your ST_Intersects between your two geometries will be much faster.


PS : I haven't talk about geography/geometry and srid but you should handle it :)


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...