Wednesday 20 May 2015

Extracting Vector Extents of PDFs derived from QGIS 3 with PyQGIS


I'm interested to create a vector point layer representing the centroid of some 500-1000 PDF files exported from QGIS to find PDFs geographically rather than using file explorer. I would like to automate this using PyQGIS.


I tested with the algorithm 'Tile Index' to manually extract the extents of 3 PDFs into a single GeoPackage through a batch process. Setting the output to the same .gpkg successfully appends each tile index into one GeoPackage. Writing the PyQGIS to take the centroid would then be straight forward. It is the first part which is convoluted. Below is the end result of my test.


example1



example2


I have a filing system where all 500-1000 PDFs are stored in client folders structured as below. The PDFs outlined green below are always stored in a folder named "!pdf's". An example of the file path is below.


example3


"V:/GIS - Files/1. Client Projects/A/Angus, A/ALN511253 - Lo/!pdf's/ALN511220-01.pdf"

I'd like to get the tile index of all 500-1000 PDFs in one swoop. Is it possible to target the "!pdf's" folder across all client folders using PyQGIS?


There are generic PDFs in the folder named 'Other' so I cannot search for all PDFs from the directory 'Client Projects'.



Answer



In pure Python3, here the solution :


import os

import pathlib

my_path = "V:/GIS - Files/1. Client Projects/"
pdf_parent_folder = "!pdf's"
pdf_paths = []

for path, sub, files in os.walk(my_path):
if pdf_parent_folder in sub:
for path, sub, files in os.walk(os.path.join(path, pdf_parent_folder)):
for name in files:

if os.path.splitext(name)[1] == ".pdf":
pdf_paths.append(str(pathlib.PurePath(path, name)))


  • os.walk : get all files in the directory and subdirectories,

  • files : contains list of all files,

  • os.path.splitext : split the file name and the extension,

  • pdf_paths : Python list which contains all .pdf file complete path.


EDIT for explain how to use



Once you have copied the script above, you can use it as :


# copy the entire code above

for pdf_file_path in pdf_paths:
python_pdf_object = method_that_open_and_read_the_pdf(pdf_file_path)
# script here for do something with the python_pdf_object
# ...
# end of the script

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...