qgis - Using nearest neighbor query in SpatiaLite?

Tuesday, 3 November 2015

qgis - Using nearest neighbor query in SpatiaLite?

I am trying to find the nearest line to each point in a layer, and I have over 1 million points. I want to be able to associated the closest unique line ID with the point unique IDs.

I have tried to use the solution given in this post Finding nearest neighbors between two tables with point locations in SpatiaLite? on a small subset, and I do get a result table but the distance value is null, and the IDs from the lines that are associated with each point are not the IDs for the closest lines.

I am new with SpatialLite and running spatial queries in SQL. I don't really understand the purpose of this statement: WHERE g1.OGC_FID <> g2.OGC_FID.

I have also played around with this line: AND ST_Contains(ST_Expand(g1.geometry,50),g2.geometry) as well as removed it and still get no distance values, even though I am getting an ID.

I was wondering if it might be a coordinate system issue but both the points and the lines are in the same system. I am using the QSpatiaLite plugin in QGIS. Can anyone provide some guidance?

Using the answer below I was able to get the query working on some test data. However I am having trouble applying the spatial index. I've tried the query below, but it runs slower than the query statement without the spatial index. What am I doing wrong?

UPDATE UgandaBikeTaxis_test SET dist=(SELECT 
    ST_Distance(p.geometry, l.geometry) AND
p.ROWID IN
(SELECT ROWID FROM SpatialIndex WHERE f_table_name= "ugandabiketaxis_test"  AND search_frame=l.Geometry)
    FROM osm_filtered_prj_test AS l, ugandabiketaxis_test AS p
    WHERE p.uniqueID = ugandabiketaxis_test.uniqueID
    ORDER BY ST_Distance(p.geometry, l.geometry) LIMIT 1);

Answer

You can do this with two UPDATE statements, one for the distance, and the second for the line ID, with a subquery to get the values from the line table. And use the

ORDER BY ST_Distance(...) LIMIT 1

construct to get only the closest line.

I have a cities point layer, and a hiways line layer. Each has a primary key column 'pk'. I added to the cities two columns: 'dist' and 'line_id' to hold the values for the closest line. Here are the queries:

UPDATE cities_il SET line_id=(SELECT l.pk
    FROM hiways_il AS l, cities_il AS p
    WHERE p.pk=cities_il.pk
    ORDER BY ST_Distance(p.geometry, l.geometry) LIMIT 1);
UPDATE cities_il SET dist=(SELECT 
    ST_Distance(p.geometry, l.geometry)
    FROM hiways_il AS l, cities_il AS p
    WHERE p.pk=cities_il.pk

    ORDER BY ST_Distance(p.geometry, l.geometry) LIMIT 1);

Note the form of the WHERE clause in the subquery. Here you set the primary key in the subquery equal to the primary key in the outer update statement, to be sure to update each row separately.

In my example I had only 100 or so cities and 10,000 hiways. And each UPDATE took a minute or so. In your case, with a million points, you will definitely need to add a spatial index to the lines (maybe also the points), and use it in your query. Otherwise it will probably take hours. You'll find plenty of examples on spatial indexing with spatialite, such as: https://www.gaia-gis.it/fossil/libspatialite/wiki?name=SpatialIndex

Blog

Tuesday, 3 November 2015

qgis - Using nearest neighbor query in SpatiaLite?

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?