Wednesday, 30 May 2018

Inaccurate output (missing features) while reading a shapefile into networkx


I am doing some work with networkx, which involves the conversion of a point and polyline shapefile into a graph with nodes and links. The documentation about the read_shp() method states that it:




Generates a networkx DiGraph from shapefiles. Point geometries are translated into nodes, lines into edges. Coordinate tuples are used as keys. Attributes are preserved, line geometries are simplified into start and end coordinates. Accepts a single shapefile or directory of many shapefiles.



My first observation is that the method does not read all the features in the shapefiles e.g. there are over 31000 points in my nodes shapefile but the length of nodelist returns only 4991 as against 31760 nodes. I observed the same behaviour for the lines shapefile. I have a small code snippet below as an illustration.


import networkx as nx
G = nx.read_shp("Sample_nodes.shp", simplify=False)
print len(G.nodes())

#output: 4991 instead of 31760

Is there something I am missing?




Answer



It is not complicated.




  • nx.read_shp uses ogr to read a shapefile , look at nx_shp.py line 78-88




  • then the script use a dictionary to add nodes to the Graph (lines 87-88, net.add_node((g.GetPoint_2D(0)), attributes))





First part of the script, ogr only


    shp  = ogr.Open("network_pts.shp")
for lyr in shp:
fields = [x.GetName() for x in lyr.schema]
for f in lyr:
flddata = [f.GetField(f.GetFieldIndex(x)) for x in fields]
g = f.geometry()
attributes = dict(zip(fields, flddata))
attributes["ShpName"] = lyr.GetName()
# Note: Using layer level geometry type

print g.GetPoint_2D(0), attributes

(204097.29746070135, 89662.23095525998) {'ShpName': 'network_pts', 'type': 'one'}
(204168.65175332528, 89745.26602176542) {'ShpName': 'network_pts', 'type': 'two'}
(204110.75574365177, 89765.58041112455) {'ShpName': 'network_pts', 'type': 'three'}
(204220.19951632406, 89794.7823458283) {'ShpName': 'network_pts', 'type': 'three'}
(204097.29746070135, 89662.23095525998) {'ShpName': 'network_pts', 'type': 'one-bis'}

The shapefile contains 5 features with "one" and "one-bis" with the same coordinates and two others with the same type.


2) with read_shp



 G = nx.read_shp("network_pts.shp")
G.number_of_nodes()
4

Why


print G.node
{(204097.29746070135, 89662.23095525998): {'ShpName': 'network_pts', 'type': 'one-bis'}, (204110.75574365177, 89765.58041112455): {'ShpName': 'network_pts', 'type': 'three'}, (204220.19951632406, 89794.7823458283): {'ShpName': 'network_pts', 'type': 'three'}, (204168.65175332528, 89745.26602176542): {'ShpName': 'network_pts', 'type': 'two'}}

And for the points with same coordinates


print G.node[(204097.29746070135, 89662.23095525998)]

{'ShpName': 'network_pts', 'type': 'one-bis'}

Only one point was retained, the last one (insertions in a dictionary with same key)


net = nx.DiGraph()
net.node[(204097.29746070135, 89662.23095525998)] = {'ShpName': 'network_pts', 'type': 'one'}
net.node[(204097.29746070135, 89662.23095525998)] = {'ShpName': 'network_pts', 'type': 'one-bis'}
print net.node
{(204097.29746070135, 89662.23095525998): {'ShpName': 'network_pts', 'type': 'one-bis'}}

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...