I'm attempting to clip a large polygon dataset (~5 GB, thousands of features) by a much smaller polygon dataset (~40 features). Is there a best practice or most efficient route for performing this task?
The standard geoprocessor clip runs indefinitely on a dataset of this size. Would some form of spatial selection and export be more efficient?
EDIT: Some great answers below. I selected what I view to be the most thorough response, but each answer provides unique insight into the issue. Thanks!
Answer
As always when dealing with scalability problems, it's best to start small and simple and steadily work your way up to big and complex.
The Clip tool should be able to handle big datasets, because it tiles them internally. Since it isn't working, try running Clip with far fewer features in both the input dataset (the data to be clipped) and the clip dataset (the data that does the clipping): for example, a single clip feature and only the part of the input dataset around it (use definition queries to shrink both). Confirm that this runs correctly, then steadily increase the scope of the geoprocessing operation until performance degrades.
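The "start small and scale up" approach can be sketched as a small timing harness. This is only an illustration in plain Python: `scaling_sizes` and `profile_scaling` are hypothetical helper names, and `operation` stands in for whatever actually runs the clip on a subset (for example, a function that applies a definition query and calls the Clip tool).

```python
import time


def scaling_sizes(total, start=1, factor=4):
    """Yield subset sizes start, start*factor, ... and finally total itself."""
    size = start
    while size < total:
        yield size
        size *= factor
    yield total


def profile_scaling(features, operation, start=1, factor=4):
    """Run operation on growing prefixes of features.

    Returns a list of (subset_size, elapsed_seconds) pairs so you can see
    at which size the run time stops growing gracefully.
    """
    timings = []
    for size in scaling_sizes(len(features), start, factor):
        subset = features[:size]
        t0 = time.perf_counter()
        operation(subset)  # hypothetical stand-in for the real clip call
        timings.append((size, time.perf_counter() - t0))
    return timings
```

If the elapsed time jumps sharply between two consecutive sizes, the features added in that step are a reasonable place to look for problem geometry or a size limit.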
A few specific ideas:
Dissolve the clip features into a single multipart feature.
Reduce the file size of the input features using Simplify Polygon. A 5 GB vector dataset is enormous; even a shapefile of all 250,000 US Census block groups is only about 1 GB.
Split the input features into parts. Theoretically the internal tiling routines within the geoprocessing tool should be doing this already, but you never know. There may be some 32-bit file size limitation issue where you can't have a shapefile bigger than 2^32 bytes ≈ 4.29 GB.
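One way to split the input into parts is to group input features by the clip feature whose envelope they intersect, which also discards everything that cannot possibly be clipped. The following is a minimal sketch in plain Python, not the tool's internal tiling: the function names are hypothetical, and envelopes are assumed to be `(xmin, ymin, xmax, ymax)` tuples in the same coordinate system.

```python
def bboxes_intersect(a, b):
    """True if two (xmin, ymin, xmax, ymax) envelopes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def split_by_clip_envelope(input_bboxes, clip_bboxes):
    """Map each clip feature id to the input feature ids whose envelope intersects it.

    Input features that intersect no clip envelope appear in no part,
    so they never reach the expensive clip step.
    """
    parts = {clip_id: [] for clip_id in clip_bboxes}
    for in_id, in_box in input_bboxes.items():
        for clip_id, clip_box in clip_bboxes.items():
            if bboxes_intersect(in_box, clip_box):
                parts[clip_id].append(in_id)
    return parts


# Made-up envelopes for illustration only.
inputs = {1: (0, 0, 1, 1), 2: (5, 5, 6, 6), 3: (0.5, 0.5, 2, 2), 4: (100, 100, 101, 101)}
clips = {"a": (0, 0, 1.5, 1.5), "b": (5.5, 5.5, 7, 7)}
parts = split_by_clip_envelope(inputs, clips)
```

Each part can then be clipped on its own and the results merged. An envelope test is only a coarse filter, so the real clip still has to run on each part.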
Some other, more general geoprocessing performance tips:
Make sure both datasets have the same coordinate system. If possible, it's faster to have both in a geographic coordinate system with no projection.
Make sure you are not running off of a network drive. Use the fastest local hard drive or, if possible, an SSD.
Load the clip dataset into memory.
Delete unnecessary attribute fields (and rejoin them later if needed).
Other geoprocessing performance tips.