Saturday, 13 October 2018

python - Records of shapefile do not contain every field


I have taken this shapefile (warning: > 1 GB) and am making my first steps towards decoding it in Python.


I had thought that for each field there would be a corresponding item in each record. Is that correct?


I have 21 fields, but only 20 items in each record(!). They almost, but not quite, align. What am I doing wrong?


Fields



result = {list} : [('DeletionFlag', 'C', 1, 0), ['ID', 'N', 10, 0], ['SOURCE_ORG', 'C', 25, 0], ['SOURCE', 'C', 30, 0], ['EL_SURFACE', 'C', 15, 0], ['NORTH', 'N', 6, 0], ['SOUTH', 'N', 6, 0], ['WEST', 'N', 7, 0], ['EAST', 'N', 7, 0], ['X_SRCE_RES', 'N', 19, 
__len__ = {int} 21
00 = {tuple} : ('DeletionFlag', 'C', 1, 0)
01 = {list} : ['ID', 'N', 10, 0]
02 = {list} : ['SOURCE_ORG', 'C', 25, 0]
03 = {list} : ['SOURCE', 'C', 30, 0]
04 = {list} : ['EL_SURFACE', 'C', 15, 0]
05 = {list} : ['NORTH', 'N', 6, 0]
06 = {list} : ['SOUTH', 'N', 6, 0]
07 = {list} : ['WEST', 'N', 7, 0]
08 = {list} : ['EAST', 'N', 7, 0]
09 = {list} : ['X_SRCE_RES', 'N', 19, 4]
10 = {list} : ['Y_SRCE_RES', 'N', 19, 4]
11 = {list} : ['HORZ_UNIT', 'C', 15, 0]
12 = {list} : ['COORD_SYS', 'C', 25, 0]
13 = {list} : ['HORZ_DATUM', 'C', 25, 0]
14 = {list} : ['VERT_DATUM', 'C', 15, 0]
15 = {list} : ['VERT_UNIT', 'C', 15, 0]
16 = {list} : ['MIN_ELEV', 'N', 10, 0]
17 = {list} : ['MAX_ELEV', 'N', 10, 0]
18 = {list} : ['MEAN_ELEV', 'N', 19, 3]
19 = {list} : ['SDEV_ELEV', 'N', 19, 3]
20 = {list} : ['PROD_DATE', 'C', 10, 0]

First record


result = {tuple} : (b' ', b'       268', b'Univ of Bristol          ', b'Ant Radar and Laser Alt DEM   ', b'Reflective     ', b'   -55', b'   -90', b'   -180', b'    180', b'          1000.0000', b'          1000.0000', b'Meter          ', b'Polar Sterograph
__len__ = {int} 20
00 = {bytes} b' '
01 = {bytes} b' 268'
02 = {bytes} b'Univ of Bristol '
03 = {bytes} b'Ant Radar and Laser Alt DEM '
04 = {bytes} b'Reflective '
05 = {bytes} b' -55'
06 = {bytes} b' -90'
07 = {bytes} b' -180'
08 = {bytes} b' 180'
09 = {bytes} b' 1000.0000'
10 = {bytes} b' 1000.0000'
11 = {bytes} b'Meter '
12 = {bytes} b'Polar Sterographic '
13 = {bytes} b'WGS 84 '
14 = {bytes} b'WGS 84 '
15 = {bytes} b'Meter '
16 = {bytes} b' -82'
17 = {bytes} b' 4211'
18 = {bytes} b' 2152.694'
19 = {bytes} b' 1127.631'

You probably don't need the code, which is simple enough anyway, but here it is:


data = shapefile.Reader(arguments.data_path + '\\' + fileName)

fields = data.fields
records = data.records()

Btw, is there any advantage to using data.shapeRecords()?



Answer



The pyshp docs are not quite the worst I've seen, but they're in the wrong order and have some errors, too.


pyshp is a pretty low-level module, and assumes you know quite a bit about the internals of shapefiles and dBASE files (yes, those old things). The reason you have 21 fields yet only 20 items is that DBFs have a hidden DeletionFlag field to mark whether a row has been deleted or not. pyshp lists it as the first entry of fields, but the records carry no item for it, so skip fields[0] when pairing field names with record values. This module leaves interpretation of this field up to the user, where higher-level modules would silently skip those rows.
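To make the off-by-one concrete, here is a minimal sketch of that pairing in plain Python, with no pyshp import needed. The helper name record_as_dict is hypothetical (it is not part of pyshp), the field descriptors are the first few from the question, and the values are the question's first record shown already stripped and typed, which is an assumption about what pyshp hands back.

```python
def record_as_dict(fields, record):
    """Pair field names with record values, skipping the DeletionFlag entry."""
    # fields[0] is the DeletionFlag descriptor; it has no value in the record.
    names = [f[0] for f in fields[1:]]
    if len(names) != len(record):
        raise ValueError("expected %d values, got %d" % (len(names), len(record)))
    return dict(zip(names, record))


# First few field descriptors from the question: (name, type, length, decimals).
fields = [
    ("DeletionFlag", "C", 1, 0),
    ["ID", "N", 10, 0],
    ["SOURCE_ORG", "C", 25, 0],
    ["SOURCE", "C", 30, 0],
]
# Matching values from the question's first record (assumed already typed).
record = [268, "Univ of Bristol", "Ant Radar and Laser Alt DEM"]

row = record_as_dict(fields, record)
print(row["SOURCE_ORG"])
```

The length check is there only to surface a mismatch early; with a real file you would pass sf.fields and one record from the reader.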


You're working with a large file here, so the recommendation to use the iterShapeRecords() method to iterate through large files is a good one. Here's a demo that trundles through a shapefile specified on the command line, printing some stuff about every row:


#!/usr/bin/env python
# pyshp test - show something about every row in a shapefile
# specified on the command line - scruss - 2016-09

import shapefile
import sys
from pprint import pprint

sf = shapefile.Reader(sys.argv[1])

# get field names, skipping the DeletionFlag entry
field_names = [f[0] for f in sf.fields[1:]]

for count, sr in enumerate(sf.iterShapeRecords()):
    geom = sr.shape   # get geo bit
    rec = sr.record   # get db fields
    fields_of = dict(zip(field_names, rec))
    print('### Record #', count, ':')
    pprint(fields_of)
    print(' Shape #', count, ' bounding box: ', geom.bbox)
    print()

Running it on a file of local city ward data:


### Record # 0 :
{'CREATE_ID': 63519,
'GEO_ID': 14630026,
'LCODE_NAME': 'EA41',
'NAME': 'Scarborough-Rouge River (41)',
'OBJECTID': 1,
'SCODE_NAME': '41',
'SHAPE_AREA': '0.00000000000e+000',
'SHAPE_LEN': '0.00000000000e+000',
'TYPE_CODE': 'CITW',
'TYPE_DESC': 'Ward'}
Shape # 0 bounding box: [320686.25, 4848286.583, 325808.25, 4854989.0]

### Record # 1 :
{'CREATE_ID': 63519,
'GEO_ID': 14630028,
'LCODE_NAME': 'EA44',
'NAME': 'Scarborough East (44)',
'OBJECTID': 2,
'SCODE_NAME': '44',
'SHAPE_AREA': '0.00000000000e+000',
'SHAPE_LEN': '0.00000000000e+000',
'TYPE_CODE': 'CITW',
'TYPE_DESC': 'Ward'}
Shape # 1 bounding box: [329138.699, 4846015.756, 335747.204, 4852573.609]

....
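The record in the question shows raw, space-padded bytes, whereas the demo output above shows values that are already typed. If you do end up holding raw bytes like that, the field descriptor (name, type, length, decimals) tells you how to read each one. The following is only a rough sketch: decode_value is a hypothetical helper, the latin-1 encoding is an assumption, and it handles only the 'N' and 'C' types that appear in the question, not dates or logicals.

```python
def decode_value(field, raw, encoding="latin-1"):
    """Turn one raw DBF value into a Python value using its field descriptor."""
    name, ftype, length, decimals = field
    text = raw.decode(encoding).strip()  # encoding is an assumption
    if ftype == "N":
        if text == "":
            return None  # treat an empty numeric field as missing
        # A non-zero decimal count marks a fractional number.
        return float(text) if decimals else int(text)
    # Anything else is treated as character data here.
    return text


# Descriptors and raw values copied from the question's dump.
print(decode_value(["ID", "N", 10, 0], b"       268"))
print(decode_value(["X_SRCE_RES", "N", 19, 4], b"          1000.0000"))
print(decode_value(["SOURCE_ORG", "C", 25, 0], b"Univ of Bristol          "))
```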
