Tuesday, 18 June 2019

raster - Perform GDAL Tile Index Loop on list of PDF file paths



I am close to creating a vector extent/bounding box for each PDF exported from QGIS using PyQGIS. I'm using the 'Tile Index' algorithm and want to input a list of PDF file paths into it. The code below finds all the relevant PDFs that I want to pass to the algorithm.


Credit goes to J. Monticolo who helped get this going.


import os
import pathlib

my_path = "V:/GIS - Files/1. Client Projects/"
pdf_parent_folder = "!pdf's"
pdf_paths = []

for path, sub, files in os.walk(my_path):
    if pdf_parent_folder in sub:
        for path, sub, files in os.walk(os.path.join(path, pdf_parent_folder)):
            for name in files:
                if os.path.splitext(name)[1] == ".pdf":
                    pdf_paths.append(str(pathlib.PurePath(path, name)))

print(pdf_paths)

The above successfully lists all the relevant PDF file paths I want to input into the algorithm.


With help from @BenW, the following code loops over pdf_paths:



out_file = 'C:\\Users\\Username\\Desktop\\Example_Folder\\Output_file.gpkg'  # change to your own file path
for pdf_path in pdf_paths:
    processing.run('gdal:tileindex',
                   {'ABSOLUTE_PATH': True,
                    'LAYERS': pdf_path,
                    'OUTPUT': out_file})

I have approximately 600 PDF files that I need to generate a tile index for. Once I have run the script on a given day (dd/mm/yyyy), I don't want later runs to regenerate tile indexes for PDF files already stored in the .gpkg. Is there a snippet of code I can add so that only the PDF files not already in the .gpkg are used as input?



Answer




Assuming your pdf_paths is a list containing all your file path strings, it should be as simple as either of the approaches below:


Firstly, there is no need to import gdal in the script. You are working with the GDAL provider in QGIS Processing, so you can call GDAL algorithms in the Python console like any other QGIS Processing algorithm.


Store the path to your output file in a variable.


If you want to loop over each file path in your list, do it like this:


out_file = 'C:\\Users\\Username\\Desktop\\Example_Folder\\Output_file.gpkg'  # change to your own file path
for pdf_path in pdf_paths:
    processing.run('gdal:tileindex',
                   {'ABSOLUTE_PATH': True,
                    'LAYERS': pdf_path,
                    'OUTPUT': out_file})


Note the code indentation: the processing.run() call is inside the loop. Make sure you pass the parameters as a Python dictionary. In Python, a dictionary is an associative array enclosed in curly braces. It contains key/value pairs, with each key separated from its value by a colon and the pairs separated by commas.


e.g. {Key_1: value_1, Key_2: value_2}
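As a small illustration of the dictionary syntax, the parameters can be built in a variable first and inspected before they are handed to the algorithm. This is only a sketch: the two paths are made-up placeholders, and the processing.run() call is left as a comment because it only works inside QGIS.

```python
# Placeholder paths for illustration only (not real files)
pdf_path = 'C:/Example_Folder/map_01.pdf'
out_file = 'C:/Example_Folder/Output_file.gpkg'

# Keys are the algorithm's parameter names, values are what you pass in
params = {
    'ABSOLUTE_PATH': True,
    'LAYERS': pdf_path,
    'OUTPUT': out_file,
}

# Values are looked up by key
print(params['LAYERS'])

# Inside QGIS the dictionary is passed as the second argument:
# processing.run('gdal:tileindex', params)
```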


Edit: I note that in your code snippet above you are just missing an opening brace and a closing parenthesis, which is probably the main problem.


For the LAYERS parameter we are passing the pdf_path object from our loop which will be a different path on each iteration.


However, even simpler is not to use a loop at all and just call the algorithm, passing your pdf_paths list object as the LAYERS parameter, since Tile Index allows multiple file inputs:


out_file = 'C:\\Users\\Username\\Desktop\\Example_Folder\\Output_file.gpkg'  # change to your own file path
processing.run('gdal:tileindex',
               {'ABSOLUTE_PATH': True,
                'LAYERS': pdf_paths,
                'OUTPUT': out_file})

For the OUTPUT in both methods we are just passing the path to the output GeoPackage, so that all the outputs are added to the same file. You can add other parameters to the dictionary if you like. I tested both of the above code snippets in QGIS 3.4 on Windows 8.1, using a list containing a couple of file paths to test PDF files. With both methods I was able to create vector files of the PDF extents and add them to a .gpkg file.
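For the part of the question about skipping PDFs that are already in the .gpkg, one possible approach is to compare pdf_paths against the paths already stored in the index and pass only the new ones to the algorithm. The sketch below is untested in QGIS and rests on assumptions: the helper names are hypothetical, and it assumes the tile index stores each file path in a field called 'location' (the usual default for the Tile Index algorithm, but check your own output). I have not checked whether re-running the algorithm against an existing .gpkg appends to it or overwrites it, so verify that before relying on this.

```python
import os


def _norm(path):
    # Normalise separators and (on Windows) letter case so that
    # equivalent spellings of the same path compare equal
    return os.path.normcase(os.path.normpath(path))


def paths_not_indexed(pdf_paths, indexed_locations):
    """Return the entries of pdf_paths that are absent from indexed_locations."""
    already = {_norm(p) for p in indexed_locations}
    return [p for p in pdf_paths if _norm(p) not in already]


# Possible use inside QGIS (assumes the 'location' field name):
# layer = QgsVectorLayer(out_file, 'tile_index', 'ogr')
# indexed = [f['location'] for f in layer.getFeatures()] if layer.isValid() else []
# new_paths = paths_not_indexed(pdf_paths, indexed)
# if new_paths:
#     processing.run('gdal:tileindex',
#                    {'ABSOLUTE_PATH': True, 'LAYERS': new_paths, 'OUTPUT': out_file})
```

Keeping the comparison in a plain function means it can be checked outside QGIS with ordinary lists of strings before it is wired into the script.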

