I have a python class that takes a polyline streams layer, selects 1 record and tries to find the network connectivity by using attribute and spatial methods. For this question I will only focus on the spatial class method. I want to convert the spatial class method into a postgis query.
Here is the python class for reference:
import arcpy
arcpy.env.workspace=r'path_gdb'
arcpy.env.overwriteOutput=True
class Stream_Connection():
def __init__(self,stream_layer,stream_name,query_name,where_name):
self.stream_layer=stream_layer #polyline streams layer
self.stream_name=stream_name #stream name to call FC
self.query_name=query_name #isolate stream record to start
self.where_name=where_name #query for all records that contain part of the name
self.stream_network=None
def attribute(self):
'''this class method finds all the records that have part of the stream/river name
in it and insert it into the self.stream_network layer'''
query="SW_NAME = '{}'".format(self.query_name)
where_clause_qry="SW_NAME LIKE '%{}%' ".format(self.where_name)
self.stream_network=arcpy.Select_analysis(self.stream_layer,self.stream_name,query)
with arcpy.da.InsertCursor(self.stream_network,("SW_NAME","SHAPE@")) as insert:
with arcpy.da.SearchCursor(self.stream_layer,("SW_NAME","SHAPE@"),where_clause=where_clause_qry) as cur:
for row in cur:
insert.insertRow(row)
def spatial(self):
'''this class method explodes the full polyline streams layer into singleparts
and runs a while loop that finds all the streams that intersect the self.stream_network layer.
those intersecting lines get added to the self.stream_network
and continues until the number of records plateau and remain the same(no more features that intersect)'''
streams_explode=arcpy.MultipartToSinglepart_management(self.stream_layer,r"in_memory\streams_boom")
streams_explode_fl=arcpy.MakeFeatureLayer_management(streams_explode,'explode')
previous=0
while int(arcpy.GetCount_management(arcpy.MakeFeatureLayer_management(self.stream_network,'sn')).getOutput(0)) >0:
nums= int(arcpy.GetCount_management(arcpy.MakeFeatureLayer_management(self.stream_network,'sn')).getOutput(0))
streams_fl=arcpy.MakeFeatureLayer_management(self.stream_network,'sn')
arcpy.SelectLayerByLocation_management(streams_explode_fl,"INTERSECT",streams_fl)
#arcpy.SelectLayerByLocation_management(streams_explode_fl,"WITHIN_A_DISTANCE",streams_fl,"2 Feet","NEW_SELECTION")
arcpy.Append_management(streams_explode_fl,streams_fl,schema_type="NO_TEST")
arcpy.Select_analysis(streams_fl,'interm')
arcpy.Dissolve_management('interm',self.stream_name,['SW_NAME'])
print nums
if previous == nums: break
previous=nums
streams=Stream_Connection('streams','Musconetcong_Network','Musconetcong River','Musconetcong')
streams.attribute()
streams.spatial()
What the spatial class method does:
1. Explodes the polyline streams layer into singleparts
2. Takes the record or records of streams selected in the attribute method(self.stream_network)
3. Finds the singlepart streams that intersect the self.stream_network and append those records to it. This keeps running until there are no more new records getting added to the self.stream_network
Trying it out in postgis:
drop table if exists stream_connectivity;
create table stream_connectivity as
with RECURSIVE stream_connection as (
select shape, sw_name
from streams where sw_name = 'Musconetcong River'
union all
select b.shape, b.sw_name
from stream_connection a join streams_boom b
on st_intersects(a.shape,b.shape)
)
select * from stream_connection
so from this tutorial http://www.postgresqltutorial.com/postgresql-recursive-query/ I have formulated this query. and from what I gather the first part of the CTE query selects 'Musconetcong River' record, then the second part in the CTE calls the first part and spatially joins the streams_boom layer. then those results are added to the 1st part.. and on and on and at some point during this recursive process if the results return zero then the query ends? This query has been running for 2+ days. The python script takes just 2 min. so I am just curious if I am formulating this correctly
UPDATE from comments and docs:
drop table if exists stream_connectivity;
create table stream_connectivity as
with RECURSIVE stream_connection(sw_name,shape,objectid)
as (
select shape, sw_name,objectid,
ARRAY[objectid] AS idlist
from streams_boom where sw_name = 'Musconetcong River'
union all
select b.shape, b.sw_name,b.objectid,
array_append(a.idlist,b.objectid) AS idlist
from stream_connection a,streams_boom b
where st_intersects(a.shape,b.shape)
AND NOT a.idlist @> ARRAY[b.objectid]
)
select objectid,sw_name,shape from stream_connection
FINAL QUERY THAT WORKED
I have correctly converted the entire python Stream_Connection class into a postgis query with the guidance of @ThingumaBob
*streamsx instead of streams_boom *query took 2 hours instead of 2 min :( so if anyone has any optimization recommendations please let me know
drop table if exists scrap.stream_connectivity;
create table scrap.stream_connectivity as
WITH
segment AS (
SELECT geom
FROM scrap.streamsx
WHERE sw_name = 'Musconetcong River'
),
network AS (
SELECT ST_Union(a.geom) AS geom,array_agg(id) ids
FROM (
SELECT ST_ClusterDBSCAN(geom, 0, 1) OVER() AS cluster_id,
geom,id
FROM scrap.streamsx
) AS a
GROUP BY cluster_id
),
sections as(SELECT (st_dump(a.geom)).geom geom,unnest(ids) id
FROM network AS a
JOIN segment AS b
ON ST_dwithin(a.geom, b.geom,.01)
),
spatial_finished as(select a.geom,a.id,b.sw_name
from sections a join scrap.streamsx b
on a.id=b.id
),
attribute as(select geom,id,sw_name
from scrap.streamsx
where sw_name like '%Musconetcong%'
),
distinct_ as(select distinct on(geom) geom,id,sw_name
from spatial_finished
union
select distinct on(geom) geom,id,sw_name from attribute
)
select st_union(geom) geom,sw_name
from distinct_ group by sw_name
Answer
Does it have to be a recursive query? Those are a) rather complicated if not familiar with and b) rather inefficient for this specific task, at least in comparison to ST_ClusterDBSCAN
.
Assuming streams_boom
are your streams exploded into LINESTRINGs as geom
with their name as name
, running
SELECT name,
ST_ClusterDBSCAN(geom, 0, 1) OVER(PARTITION BY name) AS conn_netw_id,
geom
FROM streams_boom
would return your geometries with a conn_netw_id
assigned to each connected sub-network grouped by name
(note: there will be conn_netw_id
's ranging from 0 - n for each name
)
Update:
To get only that network connected to a pre-selected segment using ST_ClusterDBSCAN
, you could do sth. like (not optimized):
WITH
segment AS (
SELECT geom
FROM streams_boom
WHERE =
),
network AS (
SELECT ST_Union(a.geom) AS geom
FROM (
SELECT ST_ClusterDBSCAN(geom, 0, 1) OVER() AS cluster_id,
geom
FROM streams_boom
WHERE =
) AS a
GROUP BY cluster_id
)
SELECT a.*
FROM network AS a
JOIN segment AS b
ON ST_Intersects(a.geom, b.geom)
with
being the uid of the segment of interest and
being e.g. the name of the larger network (i.e. if you were looking for a sub-network connected to the segment with uid = 1 in the Amazonas, 'Amazonas' would be the
; I used this to limit the clustering on the river system you are looking at)
No comments:
Post a Comment