Monday 18 March 2019

arcpy - Compiling Python scripts (to .exe) that use ArcGIS Geoprocessing Tools?


I've been coding with Python for several months now and have developed some reasonably complex scripts for primarily geoprocessing tasks. That being said, I'm still learning a lot as I'm coming from a SQL/VBA/VBScript background.


I know that compiled code typically runs faster than code that must be processed by a language interpreter, so I'm interested in the possibility of compiling a geoprocessing Python script to a .EXE file for working with big data.


Is this even possible? If it is, what is the best way to compile a Python (.py) script that is importing the arcgisscripting or arcpy modules?


I spent a few minutes searching for a way to do this, and the search returned this article among others: http://www.ehow.com/how_2091641_compile-python-code.html


The compiler seemed to work, but upon executing the resulting .EXE file, it gave a cryptic error saying some files were unavailable.


The Python script seems to run reasonably well from the command line, but I'm wondering if I could see some slight improvement if I were able to compile the .py file. Again, I'm working with some big datasets that are taking 20+ hours to process (delineating watersheds from input water-quality sample sites). I'll take anything I can get in the way of improvements.


The script ran 10% quicker outside of ArcGIS from the command line using a test set of sites versus setting the script up as a script tool in a new toolbox in ArcCatalog. I've been running the script from the command line w/o any instance of ArcGIS open on a dedicated machine.


So, is it possible to compile Python scripts that import the arcgisscripting module and that call ArcToolBox tools?



EDIT


Thanks for the input, this is helpful for me. The script is largely a way to coordinate a number of ArcGIS tools and to output in desired formats/locations/with appropriate attribution. I've already trimmed some fat I think by writing to a scratch folder instead of a scratch personal geodatabase for some interim raster files so they can be stored in the ESRI GRID format vs. the IMG format. I'll check out the profiler suggestions though.


There are some in my office that question Python saying "that compiled code is so much quicker than code running through an interpreter" mainly in comparison to, say, a compiled Visual Basic program or VB.NET program, but that is a good point that the tools are going to take time either way. And, it seems like with present day computing machines that interpreted code may not be that much slower than compiled code to warrant going that extra mile.


EDIT - update on optimization of the program with raster formats.


Wanted to follow up on my "optimization" of this Python program: I was able to shave 2 hours off the processing time by writing interim rasters to GRID format instead of to a personal geodatabase. Not only that, there was a SIGNIFICANT reduction in disk space consumption. The original run, which wrote all rasters to the personal geodatabase (and they were only point features converted to rasters, and then watershed rasters), resulted in 37.1 GB of data just for those files. Writing the latter two data outputs to a folder in GRID format reduced that to 667 MB.


I'd be curious to see how a file GDB would handle these data, mainly in terms of data size. But cutting my processing time down from 9.5 hours to 7.5 hours is certainly enough to advocate for dealing with rasters outside of geodatabases in the GRID format.



Answer



First question: how much of this are you doing in Python? Are you just calling out to Geoprocessing tools or are you doing a significant amount of numeric analysis in Python? If the former, the bottlenecks likely live in the tools and using native code in your script won't buy you as much as some other clever workarounds. If the latter, then you may want to find what's slow and make it faster with better algorithms, or possibly numpy, or some other option as discussed below.
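One rough way to answer that first question is to time the tool calls separately from the Python glue around them. The sketch below assumes Python 3 and uses two made-up stand-in functions (`fake_geoprocessing_tool` and `python_bookkeeping`) in place of real geoprocessing calls and real script logic; the `timed` helper is also hypothetical, not part of arcpy.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per label.
timings = defaultdict(float)

@contextmanager
def timed(label):
    """Add the elapsed wall-clock time of the enclosed block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start

def fake_geoprocessing_tool():
    # Hypothetical stand-in for a tool call such as a watershed delineation.
    time.sleep(0.05)

def python_bookkeeping():
    # Hypothetical stand-in for the pure-Python work between tool calls.
    return sum(i * i for i in range(10000))

for _ in range(3):
    with timed("tools"):
        fake_geoprocessing_tool()
    with timed("python"):
        python_bookkeeping()

# Largest bucket first.
for label, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print("%-8s %.3f s" % (label, seconds))
```

If nearly all of the time lands in the "tools" bucket, compiling the script itself has very little left to speed up.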


py2exe does not actually compile your code to native x86/x64; it just provides an executable that embeds your script as bytecode and provides a mostly portable way of distributing it to users without Python on their systems. It failed when attempting to bundle arcgisscripting, which is why it did not work. Even getting py2exe working won't do anything performance-wise.
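To see what "bytecode" means here, the standard library can compile a snippet and disassemble it. This is only an illustration of the interpreter's own compile step, assuming Python 3; it does not show py2exe's internals, and the `area` function is a made-up example.

```python
import dis

source = "def area(width, height):\n    return width * height\n"

# compile() produces a code object holding bytecode, the same kind of
# content that Python caches in .pyc files.
module_code = compile(source, "<example>", "exec")

namespace = {}
exec(module_code, namespace)
area = namespace["area"]

# The function body is stored as interpreter bytecode, not machine code.
print(type(area.__code__.co_code))

# Human-readable listing of the bytecode instructions.
dis.dis(area)

result = area(3, 4)
print(result)
```

The interpreter still has to execute those instructions at run time, whether they come from a .py file, a .pyc file, or a bundled executable.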


I very strongly recommend you first use a profiler to identify the slow bits and optimize from there. There is a very good set of profiling tools built in to Python; use cProfile on a long run to find potential places to make it faster. From there you can optimize away sections into custom C or possibly experiment with small portions as Cython .pyx modules.
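A minimal sketch of a cProfile run, assuming Python 3. The functions `slow_step`, `fast_step`, and `main` are hypothetical placeholders for your own script's functions, not anything from arcpy.

```python
import cProfile
import io
import pstats
import time

def slow_step():
    # Hypothetical stand-in for an expensive geoprocessing call.
    time.sleep(0.05)

def fast_step():
    # Hypothetical stand-in for cheap pure-Python work.
    return sum(range(1000))

def main():
    for _ in range(3):
        slow_step()
        fast_step()

# Profile only the call to main().
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

An existing script can also be profiled without editing it by running `python -m cProfile -s cumulative your_script.py`, where `your_script.py` is a placeholder name.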



You can look into Cython for possibly building the whole Python script as a native code extension module, but Psyco may also give you a performance boost with a lower barrier to entry.

