Tuesday 16 June 2015

What is the difference between Coverage, Shapefiles and Geodatabases in ArcGIS?


I was wondering about the difference in spatial data storage methodology used by Coverages, Shapefiles and Geodatabases in ArcGIS. Coverages were the original format, followed by Shapefiles and now Geodatabases. So what has improved in the newer Shapefile and Geodatabase formats?


It would be great if someone could please explain it with examples.



Answer



This is such a great question. Coverages, Shapefiles and Geodatabases are fundamentally different geospatial data stores, both from an implementation standpoint and from a philosophical one. I'll try to summarize without going too deep into it.


1. Coverages:


Coverages are interesting geospatial data structures because they concentrate on storing topology. The emphasis is on storing the geometry elements first, that is, the nodes and edges that make up all the geometries. A separate set of tables then relates those geometries to the attributes (and hence they "become" features).
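To make the arc-node idea concrete, here is a minimal Python sketch. It is not the real coverage file layout, which spreads this information across several binary files and tables. The class and field names (`Arc`, `from_node`, `left_poly`, `neighbors`) and the owner values are hypothetical. What the sketch shows is that shared edges carry the topology, while attributes sit in a separate table keyed by polygon ID.

```python
from dataclasses import dataclass

# Hypothetical, simplified arc-node model (not the real coverage file layout).
@dataclass(frozen=True)
class Arc:
    arc_id: int
    from_node: int   # node where the arc starts
    to_node: int     # node where the arc ends
    left_poly: int   # polygon on the left, walking from_node -> to_node
    right_poly: int  # polygon on the right (0 = the outside "universe" polygon)

# Geometry: each node's coordinates are stored exactly once.
nodes = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (1.0, 1.0), 4: (0.0, 1.0),
         5: (2.0, 0.0), 6: (2.0, 1.0)}

# Topology: two unit squares, polygon 1 (west) and polygon 2 (east).
# Arc 2 is their shared boundary and is stored only once.
arcs = [
    Arc(1, 1, 2, left_poly=1, right_poly=0),  # south edge of polygon 1
    Arc(2, 2, 3, left_poly=1, right_poly=2),  # shared edge, walking north
    Arc(3, 3, 4, left_poly=1, right_poly=0),
    Arc(4, 4, 1, left_poly=1, right_poly=0),
    Arc(5, 2, 5, left_poly=2, right_poly=0),
    Arc(6, 5, 6, left_poly=2, right_poly=0),
    Arc(7, 6, 3, left_poly=2, right_poly=0),
]

# Attributes live in a separate table keyed by polygon ID (made-up sample values).
attributes = {1: {"owner": "Smith"}, 2: {"owner": "Jones"}}

def neighbors(poly_id, arcs):
    """Polygons sharing an arc with poly_id: a table lookup, no geometry math."""
    result = set()
    for arc in arcs:
        if arc.left_poly == poly_id:
            result.add(arc.right_poly)
        elif arc.right_poly == poly_id:
            result.add(arc.left_poly)
    result.discard(0)  # ignore the outside polygon
    return result

print([attributes[p]["owner"] for p in sorted(neighbors(1, arcs))])  # ['Jones']
```

Adjacency falls out of the left/right polygon IDs on each arc. A format that stores each polygon independently would have to compare coordinates to answer the same question.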


From the ESRI help


A "clean" coverage guarantees certain rules: for example, that there is a node at every edge intersection, that no two (or more) nodes sit on top of each other (or even within a fuzzy tolerance distance of each other), that no two edges lie on top of each other, etc. Edges also have a direction (from->to), so the coverage can distinguish what lies on the left side of each edge from what lies on its right side.



Clean coverage from ESRI help
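The "no two nodes within the fuzzy tolerance" rule can be sketched as a simple check-and-snap pass. This is a brute-force O(n²) illustration built on our own assumptions: 2D Euclidean distance, hypothetical helpers `find_close_nodes` and `snap_nodes`, and greedy snapping to the first kept node. It is not how ArcInfo's own clean/build processing is implemented, which is considerably more involved.

```python
import math

def find_close_nodes(nodes, tolerance):
    """Return pairs of node IDs whose distance is within `tolerance` (brute force)."""
    ids = sorted(nodes)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if math.dist(nodes[a], nodes[b]) <= tolerance:
                pairs.append((a, b))
    return pairs

def snap_nodes(nodes, tolerance):
    """Map each node ID to a representative ID, merging nodes within tolerance.

    Greedy: a node snaps to the first already-kept node it is close to.
    """
    kept = []
    mapping = {}
    for nid in sorted(nodes):
        for k in kept:
            if math.dist(nodes[nid], nodes[k]) <= tolerance:
                mapping[nid] = k
                break
        else:
            kept.append(nid)
            mapping[nid] = nid
    return mapping

# Made-up sample: node 3 is a near-duplicate of node 2.
sample_nodes = {1: (0.0, 0.0), 2: (10.0, 0.0), 3: (10.0005, 0.0003)}
print(find_close_nodes(sample_nodes, 0.001))  # [(2, 3)]
print(snap_nodes(sample_nodes, 0.001))        # {1: 1, 2: 2, 3: 2}
```

Arcs that referenced node 3 would be rewritten to reference node 2. After that pass, the coverage again satisfies this particular rule.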


Coverages work really well for edits that require awareness of topological relationships (imagine editing a parcel boundary shared by two parcels). In addition, coverages compress very well, since by design they remove geometric redundancy: a shared boundary is stored only once. In fact, modern formats like TopoJSON have adopted the same techniques that we learned from coverages decades ago.
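One rough way to see the redundancy argument is to store a grid of adjacent parcels as independent rings, the way a non-topological format would. Then count how many boundary segments end up stored twice. This is a simplified sketch that assumes straight segments and integer coordinates, and the helper names are hypothetical. A coverage or TopoJSON file stores each shared arc once and lets polygons reference it, which roughly corresponds to the gap between the two counts.

```python
def ring_segments(ring):
    """Segments of a closed ring whose first vertex is repeated at the end."""
    return [(ring[i], ring[i + 1]) for i in range(len(ring) - 1)]

def undirected(seg):
    """Normalize a segment so A->B and B->A compare equal."""
    a, b = seg
    return (a, b) if a <= b else (b, a)

def segment_counts(rings):
    """Return (segments stored as independent rings, distinct segments)."""
    all_segs = [undirected(s) for ring in rings for s in ring_segments(ring)]
    return len(all_segs), len(set(all_segs))

def grid_parcels(n):
    """n x n grid of unit-square parcels, each stored as its own closed ring."""
    return [[(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1), (x, y)]
            for x in range(n) for y in range(n)]

print(segment_counts(grid_parcels(10)))  # (400, 220)
```

In the 10 x 10 grid, 180 of the 400 segments are duplicates, because every interior boundary appears in two rings.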


Coverages can be a bit more complicated to work with when you are dealing with 3D data (for example, modeling a bridge with an upper level and a lower level right below it), because the algorithms we used to process them were designed for 2D planar graph math.
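Here is a small sketch of the planar problem. Take two segments at different heights, such as the upper and lower levels of the bridge. A standard 2D orientation test says they cross once projected onto the XY plane, so a planar "node at every intersection" rule would insert a node there. The z values show the two segments never touch. The coordinates and the function names (`crosses_2d`, `z_gap_at_crossing`) are made up for illustration.

```python
def orientation(p, q, r):
    """Sign of the 2D cross product (q - p) x (r - p); z is ignored."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)

def crosses_2d(a1, a2, b1, b2):
    """True if segments a1-a2 and b1-b2 properly cross in the XY plane."""
    return (orientation(a1, a2, b1) * orientation(a1, a2, b2) < 0 and
            orientation(b1, b2, a1) * orientation(b1, b2, a2) < 0)

def z_gap_at_crossing(a1, a2, b1, b2):
    """Vertical separation of the two segments at their 2D crossing point."""
    # Solve a1 + t*(a2 - a1) = b1 + u*(b2 - b1) in XY using 2D cross products.
    dxa, dya = a2[0] - a1[0], a2[1] - a1[1]
    dxb, dyb = b2[0] - b1[0], b2[1] - b1[1]
    wx, wy = b1[0] - a1[0], b1[1] - a1[1]
    denom = dxa * dyb - dya * dxb
    t = (wx * dyb - wy * dxb) / denom
    u = (wx * dya - wy * dxa) / denom
    za = a1[2] + t * (a2[2] - a1[2])
    zb = b1[2] + u * (b2[2] - b1[2])
    return abs(za - zb)

lower = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))   # lower level at z = 0
upper = ((5.0, -5.0, 8.0), (5.0, 5.0, 8.0))   # upper level at z = 8

print(crosses_2d(*lower, *upper))         # True: planar math sees an intersection
print(z_gap_at_crossing(*lower, *upper))  # 8.0: but they are 8 units apart
```

The 2D test is the kind of check that planar topology rules rely on. Handling the bridge correctly requires treating it as an exception or moving to a model that is not a planar graph.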


So why did we move away from coverages? That would take a longer answer, but perhaps I should first explain a bit more about what made ESRI Shapefiles popular.


2. ESRI Shapefiles:


Along came the Shapefile. Probably its most important characteristic was that it was (and is) an open specification that was comparatively simple to implement. The attributes leveraged DBF files, so many existing libraries already implemented a big part of the spec. There was no concept of "clean", which meant that each individual geometry only had to represent itself, without taking into consideration the geometries around it or the ones it intersected. This meant that we did not have to do any complicated math to make sure that a shapefile was correct (unlike its coverage counterpart).
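To show how little code the open spec demands, here is a sketch that parses the fixed 100-byte header of a .shp file using only Python's `struct` module. The layout follows the ESRI Shapefile Technical Description, and the .shx index uses the same header layout. The function name `parse_shp_header` and the synthetic header in the usage lines are our own. Z and M ranges (bytes 68 to 99) are skipped for brevity.

```python
import struct

SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
               11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
               21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
               31: "MultiPatch"}

def parse_shp_header(data: bytes) -> dict:
    """Parse the fixed 100-byte .shp/.shx header (mixed byte order, per the spec)."""
    if len(data) < 100:
        raise ValueError("header must be at least 100 bytes")
    (file_code,) = struct.unpack(">i", data[0:4])            # big-endian, always 9994
    if file_code != 9994:
        raise ValueError("not a shapefile (file code %d)" % file_code)
    (length_words,) = struct.unpack(">i", data[24:28])       # big-endian, 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {
        "file_length_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, "Unknown(%d)" % shape_type),
        "bbox": (xmin, ymin, xmax, ymax),
    }

# Synthetic header for a header-only Point file (50 words = 100 bytes).
header = (struct.pack(">i", 9994) + b"\x00" * 20 + struct.pack(">i", 50)
          + struct.pack("<ii", 1000, 1) + struct.pack("<4d", 0.0, 0.0, 10.0, 5.0)
          + b"\x00" * 32)
print(parse_shp_header(header))
# For a real file (hypothetical name): open("roads.shp", "rb").read(100)
```

The whole header is a handful of fixed-offset fields, with no topology and no cross-references. That simplicity helped so many independent implementations appear.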


Have multiple geometries that cross each other? Sure, why not? Two points on top of each other? Be my guest.
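Here is a sketch of the "be my guest" attitude in practice. It encodes Point records following the record layout in the ESRI Shapefile Technical Description: a big-endian record number and content length, then a little-endian shape type and X/Y doubles. Nothing checks whether two points coincide, because each record only describes itself. The helper names are hypothetical. The sketch produces record bytes only; a complete file would also need the 100-byte header, the .shx index and the .dbf attributes.

```python
import struct

def point_record(record_number: int, x: float, y: float) -> bytes:
    """Encode one Point record: 8-byte big-endian header + 20-byte content."""
    content = struct.pack("<idd", 1, x, y)  # shape type 1 = Point, then X, Y
    header = struct.pack(">ii", record_number, len(content) // 2)  # length in words
    return header + content

def read_point_records(blob: bytes):
    """Decode consecutive Point records back into (record_number, x, y) tuples."""
    out, offset = [], 0
    while offset < len(blob):
        rec_no, length_words = struct.unpack(">ii", blob[offset:offset + 8])
        _shape_type, x, y = struct.unpack("<idd", blob[offset + 8:offset + 28])
        out.append((rec_no, x, y))
        offset += 8 + length_words * 2
    return out

points = [(3.0, 4.0), (3.0, 4.0), (7.5, 1.0)]  # the first two are identical
records = b"".join(point_record(i + 1, x, y) for i, (x, y) in enumerate(points))
print(read_point_records(records))
# [(1, 3.0, 4.0), (2, 3.0, 4.0), (3, 7.5, 1.0)]
```

Both duplicate points are written and read back without complaint. A clean coverage would have merged them into a single node.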


Sometimes, the (arguably) "best" format is not the one that wins, but the one that gets adopted. If a format is easy to implement, it has better chances to be adopted than a complicated one. That was the Shapefile.


All of a sudden you had several libraries (open source and proprietary) and software vendors that supported it. So all was great.


The obvious question is then - why Geodatabases?



3. Geodatabases:


I believe Geodatabases are one of the most misunderstood geospatial data stores. People usually think of them as just "a geospatial format". A couple of years ago, somebody asked "What are ESRI Geodatabases?". Instead of repeating my answer from then, I invite you to read that first. I'll wait :)


Now that you have read that answer and know what a Geodatabase is, I can expand on it a bit. At the time, there was a lot of research into optimizing SQL and writing query optimizers that leveraged indexes, column stores, etc. (there still is). By building the Geodatabase on top of a SQL datastore, we can leverage all that research for free. We only need to concentrate on the geospatial concepts, and as SQL data stores get better, the Geodatabase gets better too, for free. Not a bad proposition, huh?
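Here is a rough illustration of letting the database do the work, using Python's built-in `sqlite3`. The table, columns, index and `parcels_in_window` helper are hypothetical and do not reflect the actual Geodatabase schema. Once bounding boxes live in ordinary columns, a spatial pre-filter becomes a plain SQL query, and the engine's query planner decides how to use the available index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE parcels (
        id INTEGER PRIMARY KEY,
        owner TEXT,
        minx REAL, miny REAL, maxx REAL, maxy REAL
    )""")
# A plain B-tree index; whether to use it is left to SQLite's query planner.
conn.execute("CREATE INDEX parcels_minx ON parcels (minx)")
conn.executemany(
    "INSERT INTO parcels (owner, minx, miny, maxx, maxy) VALUES (?, ?, ?, ?, ?)",
    [("Smith", 0, 0, 1, 1), ("Jones", 1, 0, 2, 1), ("Lee", 5, 5, 6, 6)])  # sample rows

def parcels_in_window(conn, xmin, ymin, xmax, ymax):
    """Owners whose bounding box intersects the query window."""
    rows = conn.execute(
        """SELECT owner FROM parcels
           WHERE minx <= ? AND maxx >= ? AND miny <= ? AND maxy >= ?
           ORDER BY id""",
        (xmax, xmin, ymax, ymin))
    return [owner for (owner,) in rows]

print(parcels_in_window(conn, 0.5, 0.5, 1.5, 1.5))  # ['Smith', 'Jones']
```

The geospatial logic here is only the four bounding-box comparisons. Storage, indexing and query planning all come from the database, so improvements to the engine benefit this query without changes to the code.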


Nowadays, new specifications for geospatial data keep coming out, and the jury is still out on what is going to replace these technologies (if anything). Nevertheless, if you are interested in this topic, I recommend reading the answers to a question asked here on GIS.SE some years back: "Are there any attempts to replace the shape file"


I hope this helps!

