Saturday, 29 July 2017

shapefile - Fastest spatial storage format?



I am wondering which storage method gives the fastest reading of map vectors for rendering: shapefile (SHP), Postgres, or SQLite? (The data does not change often, and I do not need spatial functions for these vectors.)



Answer




There are some speed tests of shapefiles versus a database (PostGIS) for MapServer in this presentation (from 2007).


In summary:



  • For a dataset of 3 million features, with requests for 30 features run one after another, PostGIS was faster than the shapefile (although this may have changed since then, following a fix to how the shapefile index is read).

  • For a dataset of 10,000 features, the shapefile was slightly faster.

  • For concurrent requests, the shapefile was faster.



Here are the times in detail, which can also help you decide whether the storage format is an important factor:


                        PostGIS   Shapefile
Start mapserv process   15ms      15ms
Load mapfile            3ms       3ms
Connect to DB           14ms      n/a
Query                   20ms      n/a
Fetch                   7ms       n/a
Draw                    11ms      28ms
Write Image             8ms       8ms
Network Delay           3ms       3ms
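
To see how much of each request is actually spent on storage, the figures in the table can be added up. The short Python sketch below does that arithmetic. It assumes the steps run one after another so that their times simply sum, which the presentation summary does not state explicitly, and the variable names are only illustrative.

```python
# Timings in milliseconds, copied from the table above; None marks "n/a".
timings = {
    "Start mapserv process": (15, 15),
    "Load mapfile": (3, 3),
    "Connect to DB": (14, None),
    "Query": (20, None),
    "Fetch": (7, None),
    "Draw": (11, 28),
    "Write Image": (8, 8),
    "Network Delay": (3, 3),
}

# Assumption: the steps are sequential, so a per-request total is a plain sum.
postgis_total = sum(p for p, _ in timings.values() if p is not None)
shapefile_total = sum(s for _, s in timings.values() if s is not None)

# Steps that only exist when a database is involved.
db_only = sum(p for p, s in timings.values() if s is None)

print(postgis_total, shapefile_total, db_only)  # 81 57 41
```

Under that assumption the database-only steps (connect, query, fetch) account for 41ms of the 81ms PostGIS request, while the shapefile spends 17ms longer in the draw step, so the overall gap is 24ms per request.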


Always use FastCGI in MapServer if using a database, so that database connections can be reused; otherwise a new connection must be created on every request.



The speed of reading a shapefile (and data from a database) depends on the specific coding implementation.


The source code for MapServer opening a shapefile can be seen here. Following the comments, you can see how important it is to have an index. Shapefile records vary in length, so without an index a reader has to scan the file sequentially from the start to reach a given record; with an index it can jump straight to that record's offset.


/*    Read the .shx file to get the offsets to each record in             */
/*    the .shp file.                                                      */

Another example of opening a shapefile can be seen in the Python source for PyShp. Again you can see how an index is used to find specific shapes directly.
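
The idea behind the index can be sketched in a few lines of plain Python. This is not the MapServer or PyShp code, only a minimal illustration of the .shx layout: a 100-byte header followed by 8-byte entries, each holding a record's offset and content length as big-endian 32-bit integers counted in 16-bit words. The function names and the in-memory demo file are hypothetical.

```python
import io
import struct

SHX_HEADER_BYTES = 100  # fixed-size header at the start of the .shx file
SHX_ENTRY_BYTES = 8     # each index entry: offset + content length


def read_shx_entry(shx, record_index):
    """Return (byte_offset, content_bytes) of one .shp record.

    `shx` is a binary file-like object; `record_index` is zero-based.
    """
    if record_index < 0:
        raise IndexError("record index out of range")
    # One seek is enough: entries are fixed size, so no scanning is needed.
    shx.seek(SHX_HEADER_BYTES + record_index * SHX_ENTRY_BYTES)
    raw = shx.read(SHX_ENTRY_BYTES)
    if len(raw) != SHX_ENTRY_BYTES:
        raise IndexError("record index out of range")
    # Both values are stored in 16-bit words, so double them to get bytes.
    offset_words, length_words = struct.unpack(">ii", raw)
    return offset_words * 2, length_words * 2


def build_fake_shx(entries):
    """Build an in-memory .shx with a blank header (demonstration only)."""
    body = b"".join(
        struct.pack(">ii", offset // 2, length // 2) for offset, length in entries
    )
    return io.BytesIO(b"\x00" * SHX_HEADER_BYTES + body)


# Three made-up records; the first starts right after the 100-byte .shp header.
demo = build_fake_shx([(100, 20), (128, 48), (184, 20)])
print(read_shx_entry(demo, 2))  # (184, 20)
```

With the offset in hand, a reader can seek to that position in the .shp file and read only the record it needs, which is why a request for a handful of features does not have to touch the rest of a large file.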




The limitations of the DBF format (limits on field size, no null support, limits on text storage) should also be taken into consideration when deciding whether to use a database.


A database also offers means of securing data, easier joining and creation of views, logging and many other features you won't get with a standalone file.

