Monday, 3 April 2017

qgis - How to easily edit attribute data using Regular Expressions?


I am writing a tutorial for MSc students that uses QGIS. One of the tasks requires the attribute data to be modified (see pdf draft of tutorial here if interested ).


The question is how to get them to edit the attribute data. I know this question has been asked before, and the answer used to be simple: just use LibreOffice calc to modify it and then save it, ensuring the row order and column titles are kept the same. I have used this approach before to great effect.


Unfortunately, the solution no longer works: saving the .dbf file, in LibreOffice (5.4.2) or OpenOffice (3.4.1) Calc scrambles the data from the perspective of QGIS (only random integer numbers appear in the attribute table), and creates an additional .dbt file instead.


So to get to the question: how do I tell a class of MSc students to do this next week? I've considered the following options:




  1. Use R instead (great for me, but scary for students)

  2. Edit the file as a .csv, and then join based on timestamp (fine, but seems overly complex, and could introduce more error in the 2 hour practical)

  3. Use an alternative program: I've tried Gnumeric and Excel, neither of which generated good results quickly.


To help answer this question, I've created a replicable example. Open the file "points.shp" in QGIS, observer the attribute table then close. Edit a couple of cells in the attribute table "points.dbf" in LibreOffice - save. Re-open "points.shp" in QGIS and observe the scrambled attribute table. The other shapefiles in the .zip folder have already been corrupted by LibreOffice and OpenOffice to illustrate what the problem looks like for me.


In short, how do I best solve this problem?



Answer



Simplest answer I've found, based on Ryan Garnett's advice to do it within QGIS:


Use regexp_replace



This capability was added to field calculator 1 year ago by Jürgen Fischer (as illustrated by this bug report). I tried for a while to find out how to do this, but failed searching for "qgis regex" and other vague terms.


It's functionality is probably best illustrated by the need to standardise all values in the "name" column. For example, if we need to replace all cells that contain the pattern "Tesc" with 'Tesco', removing superfluous information like "Tesco, Infirmary Road" (that OSM contributors frequently add), the following function can be used:


regexp_replace("name", 'Tesc.*','Tesco')


enter image description here


I guess this is Pythonic regular expression functionality, as it's a little different from grep, which I've used before.


I think more documentation (detail on the QGIS website is limited to a few words) is needed on how this works, so I've bookmarked the following pages for future reference:



This solution has a couple of advantages:



  1. Simple: no extra programs or add-ons needed


  2. Introduces students to the power of regular expressions, using (Pythonic?) functions (the "'" representing text is interesting. Warning: " or no symbol fails) and command-line functions.


Apologies for answering my own question, and many thanks to others for guiding me toward the light! I hope this question thread will be of educational use to others facing similar problems.


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...