I'm using ArcGIS 10.0 on Windows 7 64-bit with 4GB of RAM.
I have some very large tables in CSV format to import into ArcGIS. They all have about 30 fields and upwards of 5 million records per table (a few have double that or more), with file sizes up to about 5 GB. I am trying to import each of them into a file geodatabase as a separate table so I can, ultimately, link them to a feature class and analyze the results in the tables according to their location.
The problem is that ArcGIS seems to just quit importing records at a certain point. I'm using the "Table to Table" tool under Conversion > To Geodatabase, but the "Copy Rows" tool has the same problem. Even if I just add the CSV file directly to ArcGIS without trying to convert it to an FGDB table first, the problem is the same. One of my tables has about 11 million records, and ArcGIS only imports about 10 million of them. ArcGIS doesn't tell me that any error has occurred; the tool just finishes as if nothing is wrong.
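For reference, this is essentially the operation I'm running, written as a minimal arcpy sketch of the same "Table to Table" conversion with a row-count check afterwards. The paths and table name are hypothetical placeholders:

    import os
    import arcpy

    csv_file = r"C:\data\big_table.csv"    # hypothetical source CSV
    out_gdb = r"C:\data\analysis.gdb"      # hypothetical file geodatabase
    out_name = "big_table"

    # Same operation as Conversion Tools > To Geodatabase > Table to Table
    arcpy.TableToTable_conversion(csv_file, out_gdb, out_name)

    # Sanity check: compare the imported row count against the source CSV
    result = arcpy.GetCount_management(os.path.join(out_gdb, out_name))
    print(result.getOutput(0))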
I've tried it a few times now, and the number of records that make it into the FGDB table is always the same. That count doesn't correspond to any file size limit I've ever heard of (it's not a power of 2 or 16). ArcGIS was able to import another CSV with about 6 million records and all the records came through (though with the problems I'm having with the larger table, the smaller one is kind of suspect now too). ESRI's website lists the following size limits in a file geodatabase, and I'm far from hitting any of them:
- File geodatabase size: No limit
- Table or feature class size: 1 TB (default), 4 GB or 256 TB with keyword
- Number of feature classes and tables: 2,147,483,647
- Number of fields in a feature class or table: 65,534
- Number of rows in a feature class or table: 2,147,483,647
- Geodatabase name length: Number of characters the operating system allows in a folder name
- Feature class or table name length: 160 characters
- Field name length: 64 characters
- Text field width: 2,147,483,647
All I really need to do to these tables is add a couple of fields, delete a couple of others, and generate values for the new fields (sums of a few of the existing fields); a sketch of those steps follows below. I'm using ArcGIS for this because I'm familiar with the field calculator and I know (or knew, until now) that it can handle tables consisting of millions of records, whereas most other desktop software I have handy (MS Access/Excel) chokes on that many records. So I'm open to using some other piece of software to manipulate the original table and then exporting the (much smaller) resulting table to ArcGIS. Really, the fact that I'm having this problem, and that ArcGIS gives me no errors or warnings that the problem is even occurring, makes me want to handle this data outside ArcGIS as much as possible.
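The field work itself is simple. Here is a minimal arcpy sketch of the add/calculate/delete steps; the table path and field names are made up for illustration:

    import arcpy

    table = r"C:\data\analysis.gdb\big_table"   # hypothetical FGDB table

    # Add a field to hold the sum of a few existing fields
    arcpy.AddField_management(table, "TOTAL", "DOUBLE")

    # Same calculation I'd otherwise run in the field calculator
    arcpy.CalculateField_management(
        table, "TOTAL", "!FIELD_A! + !FIELD_B! + !FIELD_C!", "PYTHON")

    # Drop the fields that are no longer needed
    arcpy.DeleteField_management(table, ["OLD_FIELD_1", "OLD_FIELD_2"])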
Answer
I did call ESRI support about this, and their answer wasn't encouraging, but it did explain the problem. Paraphrasing ESRI: ArcGIS Desktop is 32-bit software and is therefore limited to using at most 4GB of RAM. The text file has to be processed in RAM before being stored as a table, so at some point during processing ArcGIS was hitting the RAM limit and simply stopping there. The file I was importing was around 6GB in size. Apparently the fact that it failed without giving an error message is unique to me: I tried having other people in my office do it, and the import still failed, but it gave an error message (an unhelpful one, but at least something that let the user know something went wrong). The ESRI rep said that it should give an error.
My solution was to split the file into two smaller CSVs using a text editor (I used EditPad Pro), import each of them into an FGDB as a separate table, then merge the two FGDB tables; a scripted version of the same split-and-merge is sketched below. For some reason this failed the first time I tried it but worked later on. I may get around to testing this a little more fully, since I'm going to be dealing with files this size on an ongoing basis.
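For anyone who wants to script the workaround instead of splitting by hand, here is a rough sketch in Python 2 (the version ArcGIS 10.x ships with). All paths, table names, and the split point are hypothetical:

    import csv
    import arcpy

    src = r"C:\data\big_table.csv"     # hypothetical source CSV
    parts = [r"C:\data\part1.csv", r"C:\data\part2.csv"]
    split_row = 5500000                # roughly half of ~11 million rows

    # Split the CSV in two, repeating the header row in each part
    with open(src, "rb") as f:         # "rb" for the csv module on Python 2
        reader = csv.reader(f)
        header = next(reader)
        out1 = open(parts[0], "wb")
        out2 = open(parts[1], "wb")
        w1, w2 = csv.writer(out1), csv.writer(out2)
        w1.writerow(header)
        w2.writerow(header)
        for i, row in enumerate(reader):
            (w1 if i < split_row else w2).writerow(row)
        out1.close()
        out2.close()

    # Import each half, then merge the two FGDB tables into one
    gdb = r"C:\data\analysis.gdb"
    arcpy.TableToTable_conversion(parts[0], gdb, "part1")
    arcpy.TableToTable_conversion(parts[1], gdb, "part2")
    arcpy.Merge_management([gdb + "\\part1", gdb + "\\part2"],
                           gdb + "\\merged")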
I'm using ArcGIS 10.0, but ArcGIS 10.1 Service Pack 1 was just released and adds the ability to use a 64-bit background geoprocessor, which lets the geoprocessor use more than 4GB of RAM. That may fix this problem, but I can't test it.
UPDATE: I am now using ArcGIS 10.1 SP1 (with the 64-bit background geoprocessing addon) and it does successfully import these giant .CSVs, at least the ones I've dealt with so far. On a machine with 14GB of RAM (yes, 14), a 6GB .CSV with about 10.5 million rows successfully imports to an FGDB table.