Thursday, 4 April 2019

PyQGIS memory issues



I have some big problems with the memory usage of my standalone PyQGIS script, which is based on QGIS 2.18 and runs on Ubuntu Linux (the same happens on Windows 10). I will document the problem in detail using the memory_profiler module.



It basically seems that QgsGeometry and other QGIS objects aren't deleted properly, or that in some cases memory is allocated for no obvious reason (background tasks?).


Do I need to call the destructor explicitly on Qgs objects such as QgsGeometry?


If so, how do I do it exactly?


Importing and starting QgsApplication


import sys
import qgis
from qgis.core import *
from PyQt4 import *
from PyQt4.QtCore import *
from PyQt4.QtGui import *

qgs = QgsApplication(sys.argv, False)
qgs.initQgis()

memory_profiler output


Line #    Mem usage    Increment   Line Contents
================================================
    39   89.312 MiB   89.312 MiB   @profile
    40                             def main():
    41   89.312 MiB    0.000 MiB       f = '/media/sven/Dual_data/ma/site_search_program/gis_data/basis_dlm/gew01_f.shp'
    42
    43   93.023 MiB    3.711 MiB       layer_l = QgsVectorLayer(f, 'file', 'ogr')
    44  168.070 MiB   75.047 MiB       geom_dict = {feat.id(): QgsGeometry(feat.geometry()) for feat in layer_l.getFeatures()}
    45  168.078 MiB    0.008 MiB       del layer_l
    46  168.078 MiB    0.000 MiB       size = 0.0
    47  168.078 MiB    0.000 MiB       for id, geom in geom_dict.iteritems():
    48  168.078 MiB    0.000 MiB           size += sys.getsizeof(geom)
    49  168.078 MiB    0.000 MiB       print 'size geom dict', size/1024/1024
    50  168.078 MiB    0.000 MiB       print 'input count', len(geom_dict)
#print outputs:
#size geom dict 2.96440124512 [MB]
#input count 20450


Why does sys.getsizeof() report only about 3 MB for the geometries when 75 MB is used? Maybe deleting the layer will help? No, it actually costs memory (not a lot, but still).
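A note on the measurement itself: sys.getsizeof() reports only the size of the Python object it is given, not memory that object refers to or that lives outside Python. Both printed sizes in this post work out to exactly 152 bytes per geometry (2.9644 MB for 20450 objects, 0.0145 MB for 100), which suggests that only a fixed-size wrapper is being measured. The sketch below illustrates the pitfall with a hypothetical `Wrapper` class and a `bytearray` payload standing in for a wrapped object; it is an assumption-laden illustration, not a description of how QgsGeometry is implemented.

```python
from __future__ import print_function
import sys


class Wrapper(object):
    """Hypothetical stand-in for a thin Python wrapper around native data."""

    def __init__(self, n_bytes):
        # The payload is a separate object; getsizeof(wrapper) does not count it.
        self.payload = bytearray(n_bytes)


small = Wrapper(16)
large = Wrapper(10 * 1024 * 1024)

# The shallow sizes are identical although the payloads differ by about 10 MiB.
shallow_small = sys.getsizeof(small)
shallow_large = sys.getsizeof(large)
payload_large = sys.getsizeof(large.payload)

print('wrapper sizes:', shallow_small, shallow_large)
print('payload size :', payload_large)
```

Under this reading, the sys.getsizeof() totals say little about how much memory the geometries really occupy.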


    51  168.078 MiB   -3.828 MiB       geom_dict = {id: geom for id, geom in geom_dict.iteritems() if id < 100}
    52  164.250 MiB   -3.828 MiB       size = 0.0
    53  164.102 MiB   -0.148 MiB       for id, geom in geom_dict.iteritems():
    54  164.102 MiB    0.000 MiB           size += sys.getsizeof(geom)
    55  164.102 MiB    0.000 MiB       print 'size shrunk geom dict', size/1024/1024
    56  164.102 MiB    0.000 MiB       print 'input count', len(geom_dict)
#print outputs:
#size shrunk geom dict 0.0144958496094
#input count 100


Let's delete most of the geometries and look at the memory usage. This frees some memory, but far less than the 75 MB that should be expected, although still more than the sizes reported by the sys module would suggest.


    57  164.855 MiB    0.754 MiB       buffered_geoms = {id: geom.buffer(100, 5) for id, geom in geom_dict.iteritems()}
    58  164.855 MiB    0.000 MiB       del geom_dict
    59  164.855 MiB    0.000 MiB       geom_dict = buffered_geoms
    60  164.855 MiB    0.000 MiB       size = 0.0
    61  164.855 MiB    0.000 MiB       for id, geom in geom_dict.iteritems():
    62  164.855 MiB    0.000 MiB           size += sys.getsizeof(geom)
    63  164.855 MiB    0.000 MiB       print 'buffered geoms size', size/1024/1024
#print outputs:
#buffered geoms size 0.0144958496094

Deleting the dict of geometries should free the memory it uses, shouldn't it? Apparently it doesn't.
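One way to check whether objects are really destroyed once their container is deleted is to keep a weak reference as a probe. The following is a minimal sketch with a hypothetical `Payload` class in place of a geometry; whether the SIP-wrapped QGIS classes accept weak references is an assumption that is not verified here. Also note that even when objects are destroyed, the process memory shown by memory_profiler need not shrink, because the allocator may keep freed memory for reuse.

```python
from __future__ import print_function
import gc
import weakref


class Payload(object):
    """Hypothetical stand-in for a geometry object."""
    pass


geom_dict = {i: Payload() for i in range(5)}

# A weak reference does not keep the object alive; it only lets us observe it.
probe = weakref.ref(geom_dict[0])
alive_before = probe() is not None

del geom_dict   # drop the only strong references to the payloads
gc.collect()    # also collect reference cycles, if there are any

destroyed = probe() is None
print('alive before del:', alive_before)
print('destroyed after del:', destroyed)
```

If such a probe still returns an object after `del`, some other name (for example a leftover loop variable such as `geom`) is still holding a reference.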


Note: this probably isn't necessary anymore and the code gets a bit messy, but while I am at it, this is my real problem. Afterwards I want to union the buffered geometries so that I can calculate the area without counting overlapping parts twice. I hope the code is still understandable. I guess it is the same problem as above: with big input layers this uses up all my memory and freezes my computer. As far as I understand, I am not creating any permanent new objects and am even freeing used ones, yet memory usage keeps rising. (This is not visible in the memory_profiler output, because the input is small and only the memory usage of the last iteration is recorded; you can still see memory usage rising, though.)
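To make the goal concrete, here is a one-dimensional analogue: intervals stand in for buffered polygons and length stands in for area. It uses a simple sort-and-merge approach, which is not the spatial-index loop used in this post, and the interval values are made up for illustration.

```python
from __future__ import print_function


def union_length(intervals):
    """Total length covered by (start, end) intervals, overlaps counted once."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: close it and start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


buffers = [(0.0, 4.0), (3.0, 6.0), (10.0, 12.0)]
naive = sum(end - start for start, end in buffers)  # counts the overlap twice
merged = union_length(buffers)                      # counts it once
print('naive sum:', naive, 'union:', merged)
```

The naive sum counts the stretch from 3.0 to 4.0 twice, which is exactly what the union step is meant to avoid for polygon areas.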


Calculating area of buffered geoms after uniting them


    65                             #calculating area of buffered geoms
    66                             #create spatial Index
    67  164.855 MiB    0.000 MiB       spaInd = QgsSpatialIndex()
    68  164.855 MiB    0.000 MiB       feat = QgsFeature()
    69  164.855 MiB    0.000 MiB       for nr, geom in geom_dict.iteritems():
    70  164.855 MiB    0.000 MiB           feat.setFeatureId(nr)
    71  164.855 MiB    0.000 MiB           feat.setGeometry(geom)
    72  164.855 MiB    0.000 MiB           feat.setValid(True)
    73  164.855 MiB    0.000 MiB           spaInd.insertFeature(feat)
    74  164.855 MiB    0.000 MiB       print 'spaInd size', sys.getsizeof(spaInd)/1024/1024
    75  164.855 MiB    0.000 MiB       size = 0
    76  164.855 MiB    0.000 MiB       ges_area = 0
    77                                 #unit intersecting polys to calculate area
    78                                 #should be no permanent objects - resulting geometries aren't saved
    79                                 #it moreover should free up memory with deleting input geometries - it doesn't though - it runs into memory issues
    80                                 #very fast with big layers until freezing of pc
    81  165.301 MiB    0.000 MiB       for geom_id in geom_dict.keys():
    82  165.301 MiB    0.000 MiB           try:
    83  165.301 MiB    0.000 MiB               geom = geom_dict[geom_id]
    84  165.301 MiB    0.000 MiB               skip = {geom_id}
    85  165.301 MiB    0.445 MiB               abs_geom = geom.geometry()
    86  165.301 MiB    0.000 MiB               bboxes = [abs_geom.boundingBox()]
    87  165.301 MiB    0.000 MiB               united = False
    88  165.301 MiB    0.000 MiB               unit = True
    89                                         #because of the enlarged polygon we need to do this repeatedly until there are no geoms intersecting
    90  165.301 MiB    0.000 MiB               while unit:
    91  165.301 MiB    0.000 MiB                   unit = False
    92  165.301 MiB    0.000 MiB                   new_bboxes = []
    93                                             # intersection tests done with engine for performance
    94  165.301 MiB    0.000 MiB                   engine = QgsGeometry.createGeometryEngine(abs_geom)
    95  165.301 MiB    0.000 MiB                   engine.prepareGeometry()
    96  165.301 MiB    0.000 MiB                   intersecting_geoms = [abs_geom]
    97  165.301 MiB    0.000 MiB                   for bbox in bboxes:
    98  165.301 MiB    0.000 MiB                       ids = spaInd.intersects(bbox)
    99                                                 #geometries aren't replaced so already combined geoms are saved in skip and not done again
   100                                                 #deleted geometries are also not used
   101  165.301 MiB    0.000 MiB                       ids = [i for i in ids if i not in skip and i in geom_dict]
   102  165.301 MiB    0.000 MiB                       if len(ids) >= 100:
   103                                                     print 'uniting still working len ids', geom_id
   104  165.301 MiB    0.000 MiB                       for id in ids:
   105  165.301 MiB    0.000 MiB                           abs_geom_in = geom_dict[id].geometry()
   106  165.301 MiB    0.000 MiB                           if engine.intersects(abs_geom_in):
   107  165.301 MiB    0.000 MiB                               unit = True
   108                                                         #use bounding boxes of intersecting geoms for new spaInd requests to not use extremely large
   109                                                         #bbox of resulting geometry if e.g. a river network is combined
   110  165.301 MiB    0.000 MiB                               new_bboxes.append(abs_geom_in.boundingBox())
   111  165.301 MiB    0.000 MiB                               intersecting_geoms.append(abs_geom_in)
   112  165.301 MiB    0.000 MiB                               skip.add(id)
   113  165.301 MiB    0.000 MiB                   if unit:
   114  165.301 MiB    0.000 MiB                       bboxes = new_bboxes
   115  165.301 MiB    0.000 MiB                       united = True
   116  165.301 MiB    0.000 MiB                       abs_geom = engine.combine(intersecting_geoms)
   117
   118  165.301 MiB    0.000 MiB               if not united:
   119                                             #make an independent deep copy of QgsAbstractGeometryV2 Object if it didn't get created by engine
   120                                             #otherwise SegFaults will happen - also note that making a deep copy of the QgsGeometry object
   121                                             #will still not prevent SegFaults
   122  165.301 MiB    0.000 MiB                   abs_geom = type(abs_geom)(abs_geom)
   123  165.301 MiB    0.000 MiB               geom = QgsGeometry(abs_geom)
   124  165.301 MiB    0.000 MiB               ges_area += geom.area()
   125                                         # united_geom_dict[geom_id] = geom
   126                                         # if not len(united_geom_dict) % 1000:
   127                                         #     print 'uniting still working len geom dict'
   128  165.301 MiB    0.000 MiB               for id2 in skip:
   129                                             # free memory
   130  165.301 MiB    0.000 MiB                   del geom_dict[id2]
   131  165.301 MiB    0.000 MiB           except KeyError:
   132  165.301 MiB    0.000 MiB               pass
   133  165.301 MiB    0.000 MiB       print 'ges_area', ges_area
   134  165.301 MiB    0.000 MiB       print 'empty geom dict', geom_dict
#print outputs:
#ges_area 4710204.56653
#empty geom dict {}
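A side note on the loop pattern above: it deletes entries from geom_dict while iterating over geom_dict.keys(). On Python 2, as used here, keys() returns a list, so the loop runs over a snapshot. On Python 3, keys() is a view, and deleting during iteration raises a RuntimeError unless a snapshot is taken explicitly. The sketch below shows the pattern with hypothetical toy data (`merge_with` is made up) and uses a membership test in place of the `except KeyError` above.

```python
from __future__ import print_function

# Hypothetical toy data: each key lists the keys it would be merged with.
merge_with = {1: {1, 2}, 2: {1, 2}, 3: {3}, 4: {4, 5}, 5: {4, 5}}
remaining = dict(merge_with)

processed = []
# sorted(...) returns a new list, i.e. a snapshot of the keys, so deleting
# entries inside the loop is safe on both Python 2 and Python 3.
for key in sorted(remaining):
    if key not in remaining:
        # Already consumed by an earlier merge.
        continue
    processed.append(key)
    for member in remaining[key]:
        # Drop every member of the merged group, including the key itself.
        remaining.pop(member, None)

print('processed:', processed)
print('remaining:', remaining)
```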

Pretty long, so here is a short tl;dr:



QgsGeometry and other Qgs objects use a lot more memory than measuring their size with sys.getsizeof() would suggest. They also don't seem to be destroyed correctly. Do I need to call the destructor explicitly and, if so, how do I do that? I thought that deleting with del (when it is the only reference) should do that.


Also, can someone explain the destructor usage to me? According to https://qgis.org/api/2.18/classQgsGeometry.html#aacaf2856a136d270dcf274649439adf7 it should be geom.~QgsGeometry(), but this doesn't work.
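For context, `~QgsGeometry()` is C++ destructor syntax and is not valid Python. In Python, `del name` only removes one reference, and CPython finalizes an object when its last reference disappears. The sketch below demonstrates this with a hypothetical `Tracked` class; it does not show how the PyQGIS bindings manage the underlying C++ objects. The `sip` module behind these bindings documents a `sip.delete()` function for wrapped C++ instances, which may be worth looking into but is not used or tested here.

```python
from __future__ import print_function

finalized = []


class Tracked(object):
    """Hypothetical class that records when it is finalized."""

    def __init__(self, name):
        self.name = name

    def __del__(self):
        finalized.append(self.name)


geom = Tracked('geom')
alias = geom            # second reference, e.g. a dict entry or a loop variable

del geom                # removes one name only; the object is still alive
after_first_del = list(finalized)

del alias               # last reference gone: CPython finalizes the object now
after_second_del = list(finalized)

print(after_first_del, after_second_del)
```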



