Monday 16 September 2019

Performance of ArcGIS Engine using multiple file geodatabases as opposed to one?



I'm trying to decide the best way to organize my data for an ArcGIS Engine application. I am particularly interested in map display and query speed. Currently my data is split into separate file geodatabases by theme: Transportation.gdb, Utilities.gdb, and so on. The data doesn't necessarily need to be organized by theme, and I'm considering putting it all in one file geodatabase.


I will be doing my own testing, but I wanted to throw the question out to the community.


In general, is using a single file geodatabase faster than using multiple (roughly 7) smaller ones? I'm interested in any other pros/cons as well.


NOTE: the software and all data will be on the customer's local machine. No data served on the web or over a network, and the amount of data is fairly small (roughly 100,000 features).



Answer



I am going to go the other way and say that no, separating the geodatabases is not a good performance improvement for the use case you described.


You have to remember that there is a cost associated with a connection to a DB. In the case of the geodatabase, that cost is loading all the related metadata tables. So whenever you separate your data into multiple GDBs, you are just increasing that cost, because now you have to open multiple sets of these tables (one for each DB). Multiplexing queries across the different DBs may also mean extra I/O as cached data gets invalidated.
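Since the question mentions doing your own testing, here is a minimal sketch of a timing harness that makes the per-geodatabase open cost visible. It is not tied to any ArcGIS API: `open_workspace` and `query_layer` are hypothetical callables that you would implement with whatever you actually use to open a geodatabase and read a layer, and `layout` is an assumed dictionary mapping each geodatabase path to the layers it holds.

```python
import time
from statistics import median


def median_seconds(fn, repeats=5):
    """Return the median wall-clock time of fn() over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def time_layout(open_workspace, query_layer, layout, repeats=5):
    """Time one full pass over a data layout.

    layout maps a geodatabase path to the list of layer names stored in it.
    open_workspace(path) and query_layer(workspace, name) are hypothetical
    callables supplied by the caller.
    """
    def one_pass():
        for gdb_path, layers in layout.items():
            workspace = open_workspace(gdb_path)  # paid once per geodatabase
            for name in layers:                   # layers are read one after another
                query_layer(workspace, name)

    return median_seconds(one_pass, repeats)
```

Running `time_layout` once with a single-entry `layout` (everything in one geodatabase) and once with seven entries gives two numbers you can compare directly. The operating system's file cache will affect repeated runs, so treat the first run after a reboot separately from warm runs.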


Nevertheless, there are a few cases where having multiple DBs may work better. For example, consider a personal GDB (not a file GDB) that is 700MB versus two that are 350MB apiece. The MS Jet driver (which is used to interact with .mdb files) will memory map files smaller than 500MB, so if the machine has enough memory, you will be working with the two smaller DBs fully in memory rather than through disk I/O, which is much faster. The 700MB file will not be memory mapped.


Leaving that case aside, it doesn't make sense to use separate DBs. ArcMap loops through the layers and queries each one sequentially, so splitting the data gives you no parallelism.


You are better off rebuilding your FileGDB Indexes instead.
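As a rough illustration of the index rebuild, here is a sketch using arcpy. It assumes an ArcGIS installation that provides arcpy, covers spatial indexes only, and handles only feature classes at the root of the geodatabase (not those inside feature datasets). The function name and the `arcpy_module` parameter are my own; the parameter exists only so the function can be exercised without ArcGIS installed. As I understand the Add Spatial Index tool, running it on a feature class that already has a spatial index replaces that index, but check the tool documentation for your version.

```python
def rebuild_spatial_indexes(gdb_path, arcpy_module=None):
    """Re-create the spatial index of each root-level feature class.

    gdb_path is the path to a file geodatabase. Returns the names of the
    feature classes that were processed.
    """
    if arcpy_module is None:
        import arcpy as arcpy_module  # requires an ArcGIS installation

    arcpy_module.env.workspace = gdb_path
    processed = []
    # ListFeatureClasses can return None when the workspace is invalid
    for fc in arcpy_module.ListFeatureClasses() or []:
        # With no grid sizes given, the tool picks its own defaults
        arcpy_module.AddSpatialIndex_management(fc)
        processed.append(fc)
    return processed
```

Calling `rebuild_spatial_indexes` with the path to your combined geodatabase after loading all of the data would be the natural place to use it, since the index is then built against the final set of features.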



And yes, an SSD would definitely help.

