Inaccurate output (missing features) while reading a shapefile into networkx

Wednesday, 30 May 2018

Inaccurate output (missing features) while reading a shapefile into networkx

I am doing some work with networkx, which involves the conversion of a point and polyline shapefile into a graph with nodes and links. The documentation about the read_shp() method states that it:

Generates a networkx DiGraph from shapefiles. Point geometries are translated into nodes, lines into edges. Coordinate tuples are used as keys. Attributes are preserved, line geometries are simplified into start and end coordinates. Accepts a single shapefile or directory of many shapefiles.

My first observation is that the method does not read all the features in the shapefiles e.g. there are over 31000 points in my nodes shapefile but the length of nodelist returns only 4991 as against 31760 nodes. I observed the same behaviour for the lines shapefile. I have a small code snippet below as an illustration.

import networkx as nx
G = nx.read_shp("Sample_nodes.shp", simplify=False)
print len(G.nodes()) 

#output: 4991 instead of 31760

Is there something I am missing?

Answer

It is not complicated.

nx.read_shp uses ogr to read a shapefile , look at nx_shp.py line 78-88

then the script use a dictionary to add nodes to the Graph (lines 87-88, net.add_node((g.GetPoint_2D(0)), attributes))

First part of the script, ogr only

    shp  = ogr.Open("network_pts.shp")
    for lyr in shp:
    fields = [x.GetName() for x in lyr.schema]
    for f in lyr:
        flddata = [f.GetField(f.GetFieldIndex(x)) for x in fields]
        g = f.geometry()
        attributes = dict(zip(fields, flddata))
        attributes["ShpName"] = lyr.GetName()
        # Note:  Using layer level geometry type

        print g.GetPoint_2D(0), attributes

    (204097.29746070135, 89662.23095525998) {'ShpName': 'network_pts', 'type': 'one'}
    (204168.65175332528, 89745.26602176542) {'ShpName': 'network_pts', 'type': 'two'}
    (204110.75574365177, 89765.58041112455) {'ShpName': 'network_pts', 'type': 'three'}
    (204220.19951632406, 89794.7823458283) {'ShpName': 'network_pts', 'type': 'three'}
    (204097.29746070135, 89662.23095525998) {'ShpName': 'network_pts', 'type': 'one-bis'}

The shapefile contains 5 features with "one" and "one-bis" with the same coordinates and two others with the same type.

2) with read_shp

 G = nx.read_shp("network_pts.shp")
 G.number_of_nodes()
 4

Why

print G.node
{(204097.29746070135, 89662.23095525998): {'ShpName': 'network_pts', 'type': 'one-bis'}, (204110.75574365177, 89765.58041112455): {'ShpName': 'network_pts', 'type': 'three'}, (204220.19951632406, 89794.7823458283): {'ShpName': 'network_pts', 'type': 'three'}, (204168.65175332528, 89745.26602176542): {'ShpName': 'network_pts', 'type': 'two'}}

And for the points with same coordinates

print G.node[(204097.29746070135, 89662.23095525998)]

{'ShpName': 'network_pts', 'type': 'one-bis'}

Only one point was retained, the last one (insertions in a dictionary with same key)

net = nx.DiGraph()
net.node[(204097.29746070135, 89662.23095525998)] = {'ShpName': 'network_pts', 'type': 'one'}
net.node[(204097.29746070135, 89662.23095525998)] = {'ShpName': 'network_pts', 'type': 'one-bis'}
print net.node
{(204097.29746070135, 89662.23095525998): {'ShpName': 'network_pts', 'type': 'one-bis'}}

Blog

Wednesday, 30 May 2018

Inaccurate output (missing features) while reading a shapefile into networkx

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?