Friday, 11 December 2015

vector - Are there any attempts to replace the shapefile?



Recently I've been spending a lot of time converting perfectly good field names like "Percent of citizens age 25 and over with a bachelor's degree or higher" into things like "edbchogtr" to meet the DBF's 10 character field name limit.
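The renaming chore described above can be partly automated. Here is a minimal sketch in Python, assuming that stripping non-alphanumeric characters, truncating to 10 characters, and adding a numeric suffix on collisions is acceptable; the function name `shorten_field_names` is hypothetical and not part of any GIS library, and the result is far less readable than a hand-picked abbreviation.

```python
import re

DBF_FIELD_NAME_LIMIT = 10  # DBF field names hold at most 10 characters


def shorten_field_names(names, limit=DBF_FIELD_NAME_LIMIT):
    """Map long field names to unique names no longer than `limit`.

    Hypothetical helper for illustration only.
    """
    mapping = {}
    used = set()
    for name in names:
        # Keep only letters, digits and underscores, then truncate.
        base = re.sub(r"[^A-Za-z0-9_]", "", name)[:limit] or "field"
        candidate = base
        suffix = 1
        # On a collision, overwrite the tail with a numeric suffix.
        while candidate.lower() in used:
            tail = str(suffix)
            candidate = base[: limit - len(tail)] + tail
            suffix += 1
        used.add(candidate.lower())
        mapping[name] = candidate
    return mapping


long_names = [
    "Percent of citizens age 25 and over with a bachelor's degree or higher",
    "Percent of citizens age 65 and over",
]
for original, short in shorten_field_names(long_names).items():
    print(short, "<-", original)
```

Keeping the returned mapping alongside the data (for example in a sidecar text file) at least preserves the original names for whoever receives the shapefile.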


In another thread (“Oddities” in the Shapefile technical specification), geospatialpython commented that "Despite the shapefile format's flaws, oddities, and limitations it persists stubbornly in and around the field of GIS. Every other attempt to replace it has been too bloated for simple vector storage or too proprietary."


This activity coupled with Mr. Lawhead's comment has me wondering:




  • Have any explicit attempts ever been made to replace the shapefile as GIS's ubiquitous data storage and interchange format?

  • Are there any contenders?

  • If there have been competing formats, why have they failed?

  • Has Esri refused to support them, or is the story simply one of technological inertia?

  • If there haven't been attempts... why not?


It seems like we could do a little better for ourselves, both as GIS developers and users.



Answer



This is a topic that always comes up. I may not have the right answer, but I can give you my personal opinion.


The reason shapefiles are still supported can be attributed to several of their characteristics, so let me mention a few.





  • First, there is a spec. I mean, I am in my early thirties and this thing has existed since I was a teenager. So it is safe to say that this spec has been around for some time. Of course, there are several other formats that are also published, but the difference with this one is that...




  • It is relatively simple! It is built on top of the DBF format, which at the time already existed and was widely supported on several platforms/OSs. There were already parsers that could read half of this format (the DBF part), so supporting the extra addition was easier. You have a geometry? Sure, just serialize it and write it. You are done. Contrast this with a coverage! Try to explain to somebody in simple terms what a topology clean does. It is not trivial to write a topologically clean coverage.




  • Most importantly, I think the #1 reason for shapefiles to still be popular is that they are supported in both Open Source and Proprietary systems alike. What GIS do you know that doesn't support shapefiles?!? Unheard of.
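To give a feel for how little is needed to get started, here is a minimal sketch in Python that builds and parses the fixed 100-byte header of a `.shp` main file with nothing but the standard `struct` module. The layout (big-endian file code 9994 and file length in 16-bit words, then little-endian version 1000, shape type, and bounding box) follows the published Esri Shapefile Technical Description; the function names `build_shp_header` and `parse_shp_header` are hypothetical, and a real reader would go on to parse the records and the `.shx`/`.dbf` files.

```python
import struct


def build_shp_header(shape_type, bbox, file_length_words=50):
    """Build a 100-byte .shp header (hypothetical helper, header only)."""
    # Bytes 0-27: seven big-endian int32 values:
    # file code 9994, five unused zeros, file length in 16-bit words.
    head = struct.pack(">7i", 9994, 0, 0, 0, 0, 0, file_length_words)
    # Bytes 28-35: version (1000) and shape type, little-endian int32.
    head += struct.pack("<2i", 1000, shape_type)
    # Bytes 36-99: eight little-endian doubles:
    # xmin, ymin, xmax, ymax, then the Z and M ranges (zeroed here).
    head += struct.pack("<8d", *bbox, 0.0, 0.0, 0.0, 0.0)
    return head


def parse_shp_header(data):
    """Parse the first 100 bytes of a .shp file into a dict."""
    if len(data) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    file_code, *_unused, length_words = struct.unpack(">7i", data[:28])
    if file_code != 9994:
        raise ValueError("not a shapefile: bad file code")
    version, shape_type = struct.unpack("<2i", data[28:36])
    bbox = struct.unpack("<4d", data[36:68])
    return {
        "version": version,
        "shape_type": shape_type,  # e.g. 1 = Point, 3 = PolyLine, 5 = Polygon
        "file_length_bytes": length_words * 2,
        "bbox": bbox,  # (xmin, ymin, xmax, ymax)
    }


header = build_shp_header(5, (-180.0, -90.0, 180.0, 90.0))
print(parse_shp_header(header))
```

To inspect a real file, the same parser can be fed `open(path, "rb").read(100)`, where `path` is whatever `.shp` file you have at hand.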





As a replacement, we hear of File GeoDatabases and Spatialite. Both formats are vastly superior to shapefiles in terms of functionality, flexibility, speed, etc. Each has certain things that make it better than the other in different areas, but a comparison of Spatialite and FileGDB is certainly out of the scope of this question.


Do I think that either of these formats will replace shapefiles? Not in their current incarnations.


Why?


Not because of a technological argument (I did say they were superior in that aspect after all), but because of something else: licensing.


So what are their problems?


FileGDB:


FileGDB provides interoperability through the new FileGDB API. Nevertheless, this API is provided in binary format by ESRI. This is not a specification. Having worked in the GeoDatabase team in the past, I can tell you, contrary to all the tin-foil-hat-wearing conspiracy theorists, this is not malicious at all. It is because the internals of the GeoDatabase change on every release. Publishing a full spec would entail basically giving all the details of how everything is supposed to be maintained and then carefully documenting the changes to the format with every yearly release. It doesn't make sense. So the FileGDB API, even though it is not a spec, abstracts away all those little changes. And now it can be used cross-platform! Mind you, this is a huge step forward! Considering the conservative nature of ESRI, this is definitely a step in the right direction.


And yet, binary-only support doesn't make anybody in the Open Source world too happy. How do you port your code to, say, some other flavor of Linux if ESRI doesn't support it? You can't. Being able to do that is what makes Open Source powerful, and with a binary-only API you cannot take advantage of it. If ESRI decides to stop supporting Debian, that's it. You are done. And there is nothing you can do to change it.


Spatialite:



Spatialite is awesome because it gets all the free functionality from SQLite. SQLite is used everywhere. It is on your Android Phone, on your iPhone/iPad, on Firefox, on Google Chrome, on several commercial embedded devices - can go on forever. To truly make it into a Geoformat (and not just do dumb bounding box operations), it needs to leverage the same geometry library that PostGIS uses: GEOS. Sadly, GEOS is based on another even more awesome geometry library known as JTS. All the algorithms in JTS are extremely powerful, so what is the problem?


Well, JTS is licensed as Open Source LGPL, and LGPL is a viral license. JTS being LGPL means GEOS is LGPL, which means Spatialite linked statically with GEOS is LGPL. This sucks. Why? Without explaining open source licenses too much, I can tell you that, for example, I cannot use Spatialite in, say, an iPhone app because that would make my entire app automatically open source (iOS only allows static linking). Any type of GPL license (reasonably) scares the crap out of ESRI, and so they will not touch it with a 10 foot pole. Hence, ArcGIS, the most popular GIS system in the world, does not (and probably never will) support Spatialite natively. This automatically kills it as a viable format.


And thus we go back to crappy shapefiles that are supported everywhere.


Update:


Apparently my answer was controversial enough that someone decided it was OK to freely edit it and change its entire meaning to reflect their own point of view. Please don't do that. If you disagree with me, that is completely fine; just post your opinion in a different answer and let the community decide. I rolled back the edits to my answer to restore the original meaning. I am adding this update in case you read the edited answer, which claimed that SQLite was a viable format.

