Saturday, 22 August 2015

qgis - How to avoid creating corrupt Shapefiles during Editing?



I have one of my GIS technicians digitizing some lines in QGIS in shapefile format. I don't know how he did it (and neither does he), but somehow the shapefile became corrupt. It was creating random lines, or some of the lines he created would just disappear. I went into ArcCatalog to see how it looked in ArcGIS and this is what I saw:




Notice the question mark icon where I should see a shapefile 'line' icon. Obviously ArcCatalog cannot read this file. Also, a second .dbf file seems to have been created with '_packed' appended to its name. When I look at the shapefile in Windows Explorer, I see that there already is a .dbf for the shapefile, 'M3_PRE_SMU_lines_10Apr13_SMC.dbf', so I don't know where this _packed file came from and I can't seem to find anything online that speaks to it.


I tried to add this file into ArcMap and I received the following error:




The error is pretty self-explanatory: the number of shapes does not match the number of records. I just don't know why that is occurring. There doesn't seem to be anything online that explains how this happens in QGIS, but I do see a couple of repair tools. I actually repaired this myself by opening QGIS, adding the layer, and then right-clicking on the layer and choosing 'Save as' to write another shapefile. So I've figured out a workaround, but I'm hoping to find a solution that will stop this from occurring in the first place. Thanks, Mike
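The mismatch reported by ArcMap can be checked without any GIS software by reading the two file headers directly. The following is a minimal sketch, assuming the standard shapefile and dBASE header layouts; the function names are made up for illustration. Note that the .dbf header count still includes records that are only marked as deleted.

```python
import struct


def shx_record_count(shx_path):
    """Number of geometry records, derived from the .shx index header."""
    with open(shx_path, "rb") as f:
        header = f.read(100)
    # Bytes 24-27 hold the file length in 16-bit words (big-endian).
    (length_words,) = struct.unpack(">i", header[24:28])
    # The index is a 100-byte header followed by one 8-byte entry per shape.
    return (length_words * 2 - 100) // 8


def dbf_record_count(dbf_path):
    """Number of attribute records as stated in the .dbf header."""
    with open(dbf_path, "rb") as f:
        header = f.read(8)
    # Bytes 4-7 hold the record count (little-endian, unsigned).
    (count,) = struct.unpack("<I", header[4:8])
    return count


def counts_match(base_path):
    """base_path is the shapefile path without its extension."""
    return shx_record_count(base_path + ".shx") == dbf_record_count(base_path + ".dbf")
```

Calling `counts_match` with the path of the shapefile minus its extension returns `False` for a file pair in the state described above.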



Answer




OGR (part of GDAL) is the library QGIS uses to access shapefiles. When OGR deletes features, it does not remove them immediately; it only marks them as deleted. Once in a while, a command called repack is executed. It creates a new file with the suffix _packed and copies all features that are not marked as deleted into this new file. Once it finishes, the original .dbf is replaced with the _packed.dbf. It then does the same thing to the geometry file: create a new one (_packed.shp), copy all non-deleted features, and finally replace the original .shp.
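Whether a .dbf still contains records that are marked as deleted can be inspected directly. In the dBASE format, each record starts with a one-byte flag: a space for a live record and `*` for a deleted one. The sketch below counts those flags; the function name is hypothetical and the code assumes a well-formed header.

```python
import struct


def dbf_deleted_count(dbf_path):
    """Return (total_records, records_marked_deleted) for a .dbf file."""
    with open(dbf_path, "rb") as f:
        header = f.read(32)
        # Bytes 4-7: record count, 8-9: header length, 10-11: record length.
        n_records, header_len, record_len = struct.unpack("<IHH", header[4:12])
        deleted = 0
        for i in range(n_records):
            # The first byte of every record is the deletion flag.
            f.seek(header_len + i * record_len)
            if f.read(1) == b"*":
                deleted += 1
    return n_records, deleted
```

A non-zero second value means the file has not been repacked since the last deletion.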


It seems somewhere in this process, something failed (maybe a crash?).



During this process the feature ids change, so my guess is that the .shp (geometry) and the .dbf (attribute table) you have now use different feature ids for the same features, which leads to the strange behavior you are seeing. It seems that one of the two files still contains (part of) the deleted features while the other one does not.
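A toy model makes the id shift concrete. This is not OGR code, just plain Python lists standing in for the two files, with the attribute table repacked and the geometry file left untouched (the interrupted state guessed at above). Geometry and attributes are matched purely by position.

```python
def repack(records):
    """Drop records flagged as deleted; survivors move up to fill the gaps."""
    return [r for r in records if not r["deleted"]]


def pair_by_position(geometries, attributes):
    """Match geometry i with attribute record i, as a shapefile reader does."""
    return [(g["geom"], a["name"]) for g, a in zip(geometries, attributes)]


# Three digitized lines; the second one was deleted in an edit session.
geometries = [
    {"geom": "line A", "deleted": False},
    {"geom": "line B", "deleted": True},
    {"geom": "line C", "deleted": False},
]
attributes = [
    {"name": "A", "deleted": False},
    {"name": "B", "deleted": True},
    {"name": "C", "deleted": False},
]

# Hypothetical interruption: only the attribute table was repacked.
packed_attributes = repack(attributes)
pairs = pair_by_position(geometries, packed_attributes)
```

Here `pairs` couples the deleted "line B" with the attributes of C, and "line C" has no attribute record at all, which is the kind of shape/record mismatch ArcMap complains about.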



Update, Nov. 2016: GDAL 2.2 ships with built-in functionality to call repack automatically whenever the file is written to disk. So before doing anything else, check the GDAL version in the QGIS About dialog and update GDAL (often shipped as part of QGIS) to a recent release.


There is probably not a lot you can do about it apart from making regular backups in order not to lose more data than you can handle (you are doing that anyway, right? 😉 ). And if you find a way to reproduce this (best with a sample dataset) create a bug report.


If you experience this problem again, you can also try to create a spatial index on the shapefile. In this process, QGIS will call repack again on the shapefile, and might "repair" the shp/dbf. But this is just an unverified guess.
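Repack can also be requested explicitly instead of relying on the spatial index side effect. The OGR shapefile driver accepts a `REPACK <layername>` statement through `ExecuteSQL`. The sketch below assumes the GDAL Python bindings (`osgeo`) are installed and that the layer name equals the file's base name, which is the usual case for shapefiles; the function names are invented for this example. Like the spatial index suggestion, it is not guaranteed to repair an already inconsistent file, so work on a copy.

```python
import os


def repack_statement(shp_path):
    """Build the OGR SQL statement; a shapefile's layer name is its base name."""
    layer_name = os.path.splitext(os.path.basename(shp_path))[0]
    return "REPACK " + layer_name


def repack_shapefile(shp_path):
    """Ask the OGR shapefile driver to repack the layer in place."""
    # Imported here so the helper above works without GDAL installed.
    from osgeo import ogr

    dataset = ogr.Open(shp_path, 1)  # 1 = open in update mode
    if dataset is None:
        raise RuntimeError("could not open " + shp_path + " for update")
    dataset.ExecuteSQL(repack_statement(shp_path))
    dataset = None  # dropping the reference closes the files
```

The same statement can be passed to the `ogrinfo` command-line tool with its `-sql` option.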


As mentioned by @rhm and in the comments, it may also help to rename the {xyz}_packed.{ext} file to {xyz}.{ext}. If the packed file has already been written completely and only the rename failed, it is perfectly valid to do this step manually. However, if the _packed file has not been written completely, you may lose information from some of your features. So before you try this, make backup copies of all the files involved.
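The manual rename, including the backup step, could be scripted roughly as follows. This is a sketch only: `promote_packed` and its parameters are hypothetical names, and it cannot tell whether a _packed file was written completely, so check the result in QGIS afterwards.

```python
import os
import shutil


def promote_packed(base_path, backup_dir, extensions=(".shp", ".shx", ".dbf")):
    """Back up all involved files, then rename {base}_packed.{ext} to {base}.{ext}.

    base_path is the shapefile path without extension. Returns the list of
    files that were replaced by their _packed counterpart.
    """
    os.makedirs(backup_dir, exist_ok=True)

    # Copy originals and _packed files first so nothing is lost.
    for ext in extensions:
        for path in (base_path + ext, base_path + "_packed" + ext):
            if os.path.exists(path):
                shutil.copy2(path, backup_dir)

    # Only now overwrite the originals with the packed versions.
    promoted = []
    for ext in extensions:
        packed = base_path + "_packed" + ext
        if os.path.exists(packed):
            os.replace(packed, base_path + ext)
            promoted.append(base_path + ext)
    return promoted
```

For the file in the question, `base_path` would be the path to `M3_PRE_SMU_lines_10Apr13_SMC` without the extension.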



Between QGIS 2.0 and 2.8, repack was called whenever the layer was unloaded (exiting QGIS, loading a different project, and so on). Until then, if a feature had been deleted or a geometry changed, the .shp and .dbf files on disk still contained records marked as deleted.


Starting with QGIS 2.10 repack is called whenever the layer is saved after an operation that has the potential to add the deleted flag to records. Therefore the files should now always be in a sane state to be processed by other applications.

