First, what am I trying to do?
I am building a Python script that should be called from an ArcMap 10.0 toolbox. This script uses (among other inputs) data contained in huge gdb tables. It processes the data in many different ways, so many iterations over the tables are inevitable.
Why would I want to convert the data?
The reason why I am converting the tables to dictionaries is that it is extremely slow to iterate through a table with a SearchCursor. I have many operations to perform on my data, and it is much simpler and faster to work with existing Python data structures.
How am I doing it?
The usual way to use gdb tables in Python (and the one to which I have compared my method) is something along the lines of:
import arcpy

# Create a search cursor and loop over every row in the table
rows = arcpy.SearchCursor(table)
for row in rows:
    (...)  # read values with row.getValue(field_name) and process them
What I am doing instead is converting the table to DBF using TableToTable_conversion (throwing away some useless columns/rows along the way). I then convert this DBF table to a dictionary (I was inspired by some code written by Tyrtamos; if you google "Lecture d'un fichier dbase III" you should find it). In the end, I have a dictionary containing lists of data indexed by the column names, and I'm very happy because I can do many things with it relatively quickly.
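To give an idea of the approach, here is a minimal sketch. The third-party dbfread package is only a modern stand-in for the Tyrtamos-inspired dBase III reader, and the function name, paths, field names and where clause are my own placeholders, not the actual code:

import os
import arcpy
from dbfread import DBF  # stand-in for the Tyrtamos-inspired DBF reader

def gdb_table_to_dict(gdb_table, out_folder, out_name="export.dbf", where_clause=""):
    """Export a gdb table to DBF, then load it as {column_name: [row0, row1, ...]}."""
    # Export the table (optionally filtering rows) in one geoprocessing call
    arcpy.TableToTable_conversion(gdb_table, out_folder, out_name, where_clause)

    dbf_path = os.path.join(out_folder, out_name)
    table = DBF(dbf_path)

    # One list of values per column, indexed by column name
    data = dict((name, []) for name in table.field_names)
    for record in table:
        for name in table.field_names:
            data[name].append(record[name])
    return data

# Example use: every later pass over the data works on plain Python lists
# data = gdb_table_to_dict(r"C:\data\mydata.gdb\mytable", r"C:\temp")
# total_area = sum(data["AREA"])  # hypothetical column name

The point of the structure is that the gdb table is read exactly once; all subsequent iterations happen on in-memory lists.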
Finally,...
This method seems to be much faster. However, I am new to this and I'm afraid I'm missing something, because I don't understand everything that is going on. Is it a bad idea to do this? Are there reasons not to, other than potential memory overflow?
Please tell me if something is unclear. Thanks!
EDIT
After seeing your answers, I realize that one very important piece of information was missing: I'm using ArcMap 10.0.
I have run performance tests using nmpeterson's solution, modifying a few lines to make the code compatible with ArcGIS 10.0:
# cursor_fields is the list of field names to read
cursor = arcpy.SearchCursor(fc, cursor_fields)
for row in cursor:
    rowList = []
    for field in cursor_fields:
        rowList.append(row.getValue(field))
    # Key each row's attribute dictionary by the value of the first field
    attr_dict[rowList[0]] = dict(zip(cursor_fields, rowList))
On a small table containing ~15000 rows, I measured the average elapsed time and got:
make_attribute_dict -- 10.7378981873
gdb_to_dbf_to_dict -- 2.56576526461
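For reference, a comparison like this can be set up with a small timing harness along the following lines; the harness itself, the number of repetitions and the zero-argument wrappers are my own assumptions, with the labels simply matching the two approaches above:

import time

def average_time(func, repetitions=5):
    """Run func() several times and return the mean elapsed wall-clock time."""
    elapsed = []
    for _ in range(repetitions):
        start = time.time()
        func()
        elapsed.append(time.time() - start)
    return sum(elapsed) / len(elapsed)

# Hypothetical wrappers around the two approaches being compared:
# print("make_attribute_dict -- %f" % average_time(lambda: make_attribute_dict(fc, cursor_fields)))
# print("gdb_to_dbf_to_dict -- %f" % average_time(lambda: gdb_to_dbf_to_dict(table, out_folder)))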