Thursday, 16 November 2017

convert - Alternatives to ogr2ogr for loading large GeoJson file(s) to PostGIS


I have a 7 GB GeoJSON file that I would like to load into a PostGIS database. I tried ogr2ogr, but it fails because it tries to load the whole file into memory before processing it, and the file is too big for that.


Are there any alternatives for loading this GeoJSON file into PostGIS?



The ogr2ogr error I get is:



ERROR 2: CPLMalloc(): Out of memory allocating -611145182 bytes. This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.




Answer



The sample that you sent appears to have one feature per line, so it should be possible to split the file manually into smaller chunks using an editor like Notepad++.


1) For each chunk, create a header:


{"type":"FeatureCollection","features":[

2) After the header, place many features, one per line:



{"geometry": {"type": "Point", "coordinates": [-103.422819, 20.686477]}, "type": "Feature", "id": "SG_3TspYXmaZcMIB8GxzXcayF_20.686477_-103.422819@1308163237", "properties": {"website": "http://www.buongiorno.com", "city": "M\u00e9xico D.F. ", "name": "Buongiorno", "tags": ["mobile", "vas", "community", "social-networking", "connected-devices", "android", "tablets", "smartphones"], "country": "MX", "classifiers": [{"category": "Professional", "type": "Services", "subcategory": "Computer Services"}], "href": "http://api.simplegeo.com/1.0/features/SG_3TspYXmaZcMIB8GxzXcayF_20.686477_-103.422819@1308163237.json", "address": "Le\u00f3n Tolstoi #18 PH Col. Anzures", "owner": "simplegeo", "postcode": "11590"}},

3) Remove the trailing comma from the last feature so the JSON stays valid, then finish the chunk with:


]}

EDIT - Here is Python code that will split the file into pieces of a defined size (in number of features):


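class JsonFile(object):
    """Split a one-feature-per-line GeoJSON FeatureCollection into chunks."""

    def __init__(self, path):
        self.file = open(path, 'r')

    def split(self, csize):
        # Assumes line 1 is the header, every following line holds one
        # feature, and the last line is the closing "]}".
        header = self.file.readline()
        number = 0
        feature = self.file.readline()
        # readline() keeps the newline, so strip before comparing.
        while feature and feature.strip() != ']}':
            chunk = []
            while len(chunk) < csize and feature and feature.strip() != ']}':
                # Drop the trailing comma; features are re-joined below so
                # the last feature of a chunk is not followed by a comma.
                line = feature.strip().rstrip(',')
                if line:
                    chunk.append(line)
                feature = self.file.readline()
            output = open("chunk_%s.geojson" % number, 'w')
            output.write(header)
            output.write(",\n".join(chunk))
            output.write("\n]}\n")
            output.close()
            number += 1
        self.file.close()

if __name__ == "__main__":
    myfile = JsonFile('places_mx.geojson')
    myfile.split(2000)  # number of features per chunk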
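
Once the chunks exist, each one should be small enough for ogr2ogr to handle on its own. The following is only a sketch of how the chunks might then be loaded one at a time. The connection string "PG:dbname=gis", the table name "places_mx" and the helper names are assumptions for illustration, not part of the original question. It relies on the standard ogr2ogr options -f, -nln and -append, and assumes all chunks share the same attribute fields.

```python
import glob
import subprocess


def build_ogr2ogr_command(chunk_path, pg_conn, table):
    """Return the ogr2ogr argument list that appends one chunk to a table."""
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",  # output driver
        pg_conn,             # destination, e.g. "PG:dbname=gis" (hypothetical)
        chunk_path,          # source GeoJSON chunk
        "-nln", table,       # send every chunk to the same layer/table
        "-append",           # add to the table instead of recreating it
    ]


def load_chunks(pattern, pg_conn, table, run=subprocess.check_call):
    """Load every file matching pattern, in sorted name order.

    `run` is injectable so the commands can be inspected without
    actually calling ogr2ogr.
    """
    paths = sorted(glob.glob(pattern))
    for path in paths:
        run(build_ogr2ogr_command(path, pg_conn, table))
    return paths
```

For example, load_chunks("chunk*.geojson", "PG:dbname=gis", "places_mx") would run ogr2ogr once per chunk file in the current directory and stop at the first chunk that fails.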
