arcpy - postgis recursive query to spatially select stream network connectivity

Saturday, 11 May 2019

arcpy - postgis recursive query to spatially select stream network connectivity

I have a python class that takes a polyline streams layer, selects 1 record and tries to find the network connectivity by using attribute and spatial methods. For this question I will only focus on the spatial class method. I want to convert the spatial class method into a postgis query.

Here is the python class for reference:

import arcpy
arcpy.env.workspace=r'path_gdb'
arcpy.env.overwriteOutput=True


class Stream_Connection():
    def __init__(self,stream_layer,stream_name,query_name,where_name):
        self.stream_layer=stream_layer #polyline streams layer
        self.stream_name=stream_name #stream name to call FC
        self.query_name=query_name   #isolate stream record to start
        self.where_name=where_name   #query for all records that contain part of the name
        self.stream_network=None
    def attribute(self):
        '''this class method finds all the records that have part of the stream/river name

            in it and insert it into the self.stream_network layer'''
        query="SW_NAME = '{}'".format(self.query_name)
        where_clause_qry="SW_NAME LIKE '%{}%' ".format(self.where_name)
        self.stream_network=arcpy.Select_analysis(self.stream_layer,self.stream_name,query)
        with arcpy.da.InsertCursor(self.stream_network,("SW_NAME","SHAPE@")) as insert:
            with arcpy.da.SearchCursor(self.stream_layer,("SW_NAME","SHAPE@"),where_clause=where_clause_qry) as cur:
                for row in cur:
                    insert.insertRow(row)
    def spatial(self):
        '''this class method explodes the full polyline streams layer into singleparts

            and runs a while loop that finds all the streams that intersect the self.stream_network layer.
            those intersecting lines get added to the self.stream_network
            and continues until the number of records plateau and remain the same(no more features that intersect)'''
        streams_explode=arcpy.MultipartToSinglepart_management(self.stream_layer,r"in_memory\streams_boom")
        streams_explode_fl=arcpy.MakeFeatureLayer_management(streams_explode,'explode')
        previous=0
        while int(arcpy.GetCount_management(arcpy.MakeFeatureLayer_management(self.stream_network,'sn')).getOutput(0)) >0:
            nums= int(arcpy.GetCount_management(arcpy.MakeFeatureLayer_management(self.stream_network,'sn')).getOutput(0))
            streams_fl=arcpy.MakeFeatureLayer_management(self.stream_network,'sn')
            arcpy.SelectLayerByLocation_management(streams_explode_fl,"INTERSECT",streams_fl)

            #arcpy.SelectLayerByLocation_management(streams_explode_fl,"WITHIN_A_DISTANCE",streams_fl,"2 Feet","NEW_SELECTION")
            arcpy.Append_management(streams_explode_fl,streams_fl,schema_type="NO_TEST")
            arcpy.Select_analysis(streams_fl,'interm')
            arcpy.Dissolve_management('interm',self.stream_name,['SW_NAME'])
            print nums
            if previous == nums: break
            previous=nums

streams=Stream_Connection('streams','Musconetcong_Network','Musconetcong River','Musconetcong')
streams.attribute()

streams.spatial()

What the spatial class method does:

1.  Explodes the polyline streams layer into singleparts 
2.  Takes the record or records of streams selected in the attribute method(self.stream_network)
3.  Finds the singlepart streams that intersect the self.stream_network and append those records to it. This keeps running until there are no more new records getting added to the self.stream_network

Trying it out in postgis:

drop table if exists stream_connectivity;
create table stream_connectivity as

with RECURSIVE stream_connection as (
    select shape, sw_name 
        from streams where sw_name = 'Musconetcong River'
    union all
    select b.shape, b.sw_name
        from stream_connection a join streams_boom b 
    on st_intersects(a.shape,b.shape)
    )
select * from stream_connection

so from this tutorial http://www.postgresqltutorial.com/postgresql-recursive-query/ I have formulated this query. and from what I gather the first part of the CTE query selects 'Musconetcong River' record, then the second part in the CTE calls the first part and spatially joins the streams_boom layer. then those results are added to the 1st part.. and on and on and at some point during this recursive process if the results return zero then the query ends? This query has been running for 2+ days. The python script takes just 2 min. so I am just curious if I am formulating this correctly

UPDATE from comments and docs:

drop table if exists stream_connectivity;
create table stream_connectivity as
with RECURSIVE stream_connection(sw_name,shape,objectid) 
    as (
        select shape, sw_name,objectid,
           ARRAY[objectid] AS idlist
           from streams_boom where sw_name = 'Musconetcong River'
        union all

        select b.shape, b.sw_name,b.objectid,
           array_append(a.idlist,b.objectid) AS idlist
           from stream_connection a,streams_boom b 
           where st_intersects(a.shape,b.shape) 
           AND NOT a.idlist @> ARRAY[b.objectid]
    )
select objectid,sw_name,shape from stream_connection

FINAL QUERY THAT WORKED

I have correctly converted the entire python Stream_Connection class into a postgis query with the guidance of @ThingumaBob

*streamsx instead of streams_boom *query took 2 hours instead of 2 min :( so if anyone has any optimization recommendations please let me know

drop table if exists scrap.stream_connectivity;
create table scrap.stream_connectivity as
WITH
    segment AS (
        SELECT geom
        FROM scrap.streamsx
        WHERE sw_name = 'Musconetcong River'
    ),
    network AS (

        SELECT ST_Union(a.geom) AS geom,array_agg(id) ids
        FROM (
            SELECT ST_ClusterDBSCAN(geom, 0, 1) OVER() AS cluster_id,
                   geom,id
            FROM scrap.streamsx
        ) AS a
        GROUP BY cluster_id
    ),
sections as(SELECT (st_dump(a.geom)).geom geom,unnest(ids) id
        FROM network AS a

        JOIN segment AS b
            ON ST_dwithin(a.geom, b.geom,.01)
        ),
spatial_finished as(select a.geom,a.id,b.sw_name 
            from sections a join scrap.streamsx b
            on a.id=b.id
        ),
attribute as(select geom,id,sw_name 
        from scrap.streamsx 
        where sw_name like '%Musconetcong%'

        ),
distinct_ as(select distinct on(geom) geom,id,sw_name
        from spatial_finished
        union
         select distinct on(geom) geom,id,sw_name from attribute
         )
select st_union(geom) geom,sw_name
    from distinct_ group by sw_name

Answer

Does it have to be a recursive query? Those are a) rather complicated if not familiar with and b) rather inefficient for this specific task, at least in comparison to ST_ClusterDBSCAN.

Assuming streams_boom are your streams exploded into LINESTRINGs as geom with their name as name, running

SELECT name,
       ST_ClusterDBSCAN(geom, 0, 1) OVER(PARTITION BY name) AS conn_netw_id,
       geom
FROM streams_boom

would return your geometries with a conn_netw_id assigned to each connected sub-network grouped by name (note: there will be conn_netw_id's ranging from 0 - n for each name)

Update:

To get only that network connected to a pre-selected segment using ST_ClusterDBSCAN, you could do sth. like (not optimized):

WITH
    segment AS (
        SELECT geom
        FROM streams_boom

        WHERE  = 
    ),
    network AS (
        SELECT ST_Union(a.geom) AS geom
        FROM (
            SELECT ST_ClusterDBSCAN(geom, 0, 1) OVER() AS cluster_id,
                   geom
            FROM streams_boom
            WHERE  = 
        ) AS a

        GROUP BY cluster_id
    )

SELECT a.*
FROM network AS a
JOIN segment AS b
  ON ST_Intersects(a.geom, b.geom)

with being the uid of the segment of interest and being e.g. the name of the larger network (i.e. if you were looking for a sub-network connected to the segment with uid = 1 in the Amazonas, 'Amazonas' would be the ; I used this to limit the clustering on the river system you are looking at)

Blog

Saturday, 11 May 2019

arcpy - postgis recursive query to spatially select stream network connectivity

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?