Wednesday, 25 May 2016

pgrouting - pgr_createTopology with large datasets


I have a road network with 8.5 million edges but when I run pgr_createTopology it processes edges at a rate of about 1000 edges per second up until 360,000 edges where it slows down and eventually fails with the following error.


server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I am running postgres 10.1 postgis 2.4 pgrouting 2.5. What would like cause it to fail like this? Or is there any way that I could process the network in 100,000 edge chunks?



Answer



The following is what I am using. Some of it is specific to our deployment environment since we are using docker and some bash scripts to deploy and set up the server. You could easily get rid of all the argeparse/os.getenv and hardcode the connection if you wanted.



import argparse
from os import getenv
import psycopg2

parser = argparse.ArgumentParser()
parser.add_argument("-H", "--host", help="host location of postgres database", type=str)
parser.add_argument("-U", "--user", help="username to connect to the database", type=str)
parser.add_argument("-d", "--dbname", help="database name", type=str)
parser.add_argument("-p", "--port", help="port to connect to postgres", type=str)
args = parser.parse_args()

password = getenv('POSTGRES_PASSWORD')

conn = psycopg2.connect(
f"dbname={args.dbname} user={args.user} host={args.host} port={args.port} password={password}"
)
cur = conn.cursor()
print("connected to database")

cur.execute("SELECT MIN(id), MAX(id) FROM ways;")
min_id, max_id = cur.fetchone()

print(f"there are {max_id - min_id + 1} edges to be processed")
cur.close()

interval = 200000
for x in range(min_id, max_id+1, interval):
cur = conn.cursor()
cur.execute(
f"select pgr_createTopology('ways', 0.000001, 'the_geom', 'gid', rows_where:='id>={x} and id<{x+interval}');"
)
conn.commit()

x_max = x + interval - 1
if x_max > max_id:
x_max = max_id
print(f"edges {x} - {x_max} have be processed")

cur = conn.cursor()
cur.execute("""ALTER TABLE ways_vertices_pgr
ADD COLUMN IF NOT EXISTS lat float8,
ADD COLUMN IF NOT EXISTS lon float8;""")


cur.execute("""UPDATE ways_vertices_pgr
SET lat = ST_Y(the_geom),
lon = ST_X(the_geom);""")

conn.commit()

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...