Saturday, 11 May 2019

arcpy - postgis recursive query to spatially select stream network connectivity


I have a python class that takes a polyline streams layer, selects 1 record and tries to find the network connectivity by using attribute and spatial methods. For this question I will only focus on the spatial class method. I want to convert the spatial class method into a postgis query.


Here is the python class for reference:


import arcpy
arcpy.env.workspace=r'path_gdb'
arcpy.env.overwriteOutput=True


class Stream_Connection():
def __init__(self,stream_layer,stream_name,query_name,where_name):
self.stream_layer=stream_layer #polyline streams layer
self.stream_name=stream_name #stream name to call FC
self.query_name=query_name #isolate stream record to start
self.where_name=where_name #query for all records that contain part of the name
self.stream_network=None
def attribute(self):
'''this class method finds all the records that have part of the stream/river name

in it and insert it into the self.stream_network layer'''
query="SW_NAME = '{}'".format(self.query_name)
where_clause_qry="SW_NAME LIKE '%{}%' ".format(self.where_name)
self.stream_network=arcpy.Select_analysis(self.stream_layer,self.stream_name,query)
with arcpy.da.InsertCursor(self.stream_network,("SW_NAME","SHAPE@")) as insert:
with arcpy.da.SearchCursor(self.stream_layer,("SW_NAME","SHAPE@"),where_clause=where_clause_qry) as cur:
for row in cur:
insert.insertRow(row)
def spatial(self):
'''this class method explodes the full polyline streams layer into singleparts

and runs a while loop that finds all the streams that intersect the self.stream_network layer.
those intersecting lines get added to the self.stream_network
and continues until the number of records plateau and remain the same(no more features that intersect)'''
streams_explode=arcpy.MultipartToSinglepart_management(self.stream_layer,r"in_memory\streams_boom")
streams_explode_fl=arcpy.MakeFeatureLayer_management(streams_explode,'explode')
previous=0
while int(arcpy.GetCount_management(arcpy.MakeFeatureLayer_management(self.stream_network,'sn')).getOutput(0)) >0:
nums= int(arcpy.GetCount_management(arcpy.MakeFeatureLayer_management(self.stream_network,'sn')).getOutput(0))
streams_fl=arcpy.MakeFeatureLayer_management(self.stream_network,'sn')
arcpy.SelectLayerByLocation_management(streams_explode_fl,"INTERSECT",streams_fl)

#arcpy.SelectLayerByLocation_management(streams_explode_fl,"WITHIN_A_DISTANCE",streams_fl,"2 Feet","NEW_SELECTION")
arcpy.Append_management(streams_explode_fl,streams_fl,schema_type="NO_TEST")
arcpy.Select_analysis(streams_fl,'interm')
arcpy.Dissolve_management('interm',self.stream_name,['SW_NAME'])
print nums
if previous == nums: break
previous=nums

streams=Stream_Connection('streams','Musconetcong_Network','Musconetcong River','Musconetcong')
streams.attribute()

streams.spatial()

What the spatial class method does:


1.  Explodes the polyline streams layer into singleparts 
2. Takes the record or records of streams selected in the attribute method(self.stream_network)
3. Finds the singlepart streams that intersect the self.stream_network and append those records to it. This keeps running until there are no more new records getting added to the self.stream_network

Trying it out in postgis:


drop table if exists stream_connectivity;
create table stream_connectivity as

with RECURSIVE stream_connection as (
select shape, sw_name
from streams where sw_name = 'Musconetcong River'
union all
select b.shape, b.sw_name
from stream_connection a join streams_boom b
on st_intersects(a.shape,b.shape)
)
select * from stream_connection


so from this tutorial http://www.postgresqltutorial.com/postgresql-recursive-query/ I have formulated this query. and from what I gather the first part of the CTE query selects 'Musconetcong River' record, then the second part in the CTE calls the first part and spatially joins the streams_boom layer. then those results are added to the 1st part.. and on and on and at some point during this recursive process if the results return zero then the query ends? This query has been running for 2+ days. The python script takes just 2 min. so I am just curious if I am formulating this correctly


UPDATE from comments and docs:


drop table if exists stream_connectivity;
create table stream_connectivity as
with RECURSIVE stream_connection(sw_name,shape,objectid)
as (
select shape, sw_name,objectid,
ARRAY[objectid] AS idlist
from streams_boom where sw_name = 'Musconetcong River'
union all

select b.shape, b.sw_name,b.objectid,
array_append(a.idlist,b.objectid) AS idlist
from stream_connection a,streams_boom b
where st_intersects(a.shape,b.shape)
AND NOT a.idlist @> ARRAY[b.objectid]
)
select objectid,sw_name,shape from stream_connection

FINAL QUERY THAT WORKED


I have correctly converted the entire python Stream_Connection class into a postgis query with the guidance of @ThingumaBob



*streamsx instead of streams_boom *query took 2 hours instead of 2 min :( so if anyone has any optimization recommendations please let me know


drop table if exists scrap.stream_connectivity;
create table scrap.stream_connectivity as
WITH
segment AS (
SELECT geom
FROM scrap.streamsx
WHERE sw_name = 'Musconetcong River'
),
network AS (

SELECT ST_Union(a.geom) AS geom,array_agg(id) ids
FROM (
SELECT ST_ClusterDBSCAN(geom, 0, 1) OVER() AS cluster_id,
geom,id
FROM scrap.streamsx
) AS a
GROUP BY cluster_id
),
sections as(SELECT (st_dump(a.geom)).geom geom,unnest(ids) id
FROM network AS a

JOIN segment AS b
ON ST_dwithin(a.geom, b.geom,.01)
),
spatial_finished as(select a.geom,a.id,b.sw_name
from sections a join scrap.streamsx b
on a.id=b.id
),
attribute as(select geom,id,sw_name
from scrap.streamsx
where sw_name like '%Musconetcong%'

),
distinct_ as(select distinct on(geom) geom,id,sw_name
from spatial_finished
union
select distinct on(geom) geom,id,sw_name from attribute
)
select st_union(geom) geom,sw_name
from distinct_ group by sw_name

Answer



Does it have to be a recursive query? Those are a) rather complicated if not familiar with and b) rather inefficient for this specific task, at least in comparison to ST_ClusterDBSCAN.

Assuming streams_boom are your streams exploded into LINESTRINGs as geom with their name as name, running



SELECT name,
ST_ClusterDBSCAN(geom, 0, 1) OVER(PARTITION BY name) AS conn_netw_id,
geom
FROM streams_boom

would return your geometries with a conn_netw_id assigned to each connected sub-network grouped by name (note: there will be conn_netw_id's ranging from 0 - n for each name)

Update:

To get only that network connected to a pre-selected segment using ST_ClusterDBSCAN, you could do sth. like (not optimized):


WITH
segment AS (
SELECT geom
FROM streams_boom

WHERE =
),
network AS (
SELECT ST_Union(a.geom) AS geom
FROM (
SELECT ST_ClusterDBSCAN(geom, 0, 1) OVER() AS cluster_id,
geom
FROM streams_boom
WHERE =
) AS a

GROUP BY cluster_id
)

SELECT a.*
FROM network AS a
JOIN segment AS b
ON ST_Intersects(a.geom, b.geom)

with being the uid of the segment of interest and being e.g. the name of the larger network (i.e. if you were looking for a sub-network connected to the segment with uid = 1 in the Amazonas, 'Amazonas' would be the ; I used this to limit the clustering on the river system you are looking at)


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...