Tuesday 19 June 2018

labeling - Measuring straightness of a curve segment (represented as a polyline)


I'm working on an automatic elevation contour labeling algorithm and one of the factors that I want to take into account when deciding the positions of labels is how "straight" a particular segment of a contour is. The more straight it is, the more likely it is to be used to place the label on that segment.


Each contour is represented by a polyline (but with points close together as to look like a curve to a naked eye). I then have a fixed length (width of a label), say, 100 pixels. If I randomly (or otherwise) choose a contour segment with the width of 100 pixels, I want to be able to get a numerical quantitative value of its straightness (say zero for a totally straight contour segment, some value larger than zero for a not so straight segment, and this value increasing as the crookedness increases).


I've searched around for answers but I couldn't find anything really useful. I'd be grateful for any pointers.



Answer



The answer depends on context: if you will be investigating only a small (bounded) number of segments, you might be able to afford a computationally expensive solution. However, it seems likely that you will want to incorporate this calculation within some kind of search for good label points. If so, it is of great advantage to have a solution that either is computationally fast or allows for rapid updating of a solution when the candidate line segment is varied slightly.


For instance, suppose you intend to conduct a systematic search across an entire connected component of a contour, represented as a sequence of points P(0), P(1), ..., P(n). This would be done by initializing one pointer (index into the sequence) s = 0 ("s" for "start") and another pointer f (for "finish") to be the smallest index for which distance(P(f), P(s)) >= 100, and then advancing s for as long as distance(P(f), P(s+1)) >= 100. This produces a candidate polyline (P(s), P(s+1) ..., P(f-1), P(f)) for evaluation. Having evaluated its "fitness" to support a label, you would then increment s by 1 (s = s+1) and proceed to increase f to (say) f' and s to s' until once more a candidate polyline exceeding the minimum span of 100 is produced, represented as (P(s'), ... P(f), P(f+1), ..., P(f')). In so doing, vertices P(s)...P(s'-1) are dropped from the preceding candidate and vertices P(f+1), ..., P(f') are added to it. It is highly desirable that the fitness could be rapidly updated from knowledge of only the dropped and added vertices. (This scanning procedure would be continued until s = n; as usual, f must be allowed to "wrap around" from n back to 0 in the process.)


This consideration rules out many possible measures of fitness (sinuosity, tortuosity, etc.) that otherwise might be attractive. It leads us to favor L2-based measures, because they typically can be updated quickly when the underlying data change slightly. Taking an analogy with Principal Components Analysis suggests we entertain the following measure (where small is better, as requested): use the smaller of the two eigenvalues of the covariance matrix of the point coordinates. Geometrically, this is one measure of the "typical" side-to-side deviation of the vertices within the candidate section of the polyline. (One interpretation is that its square root is the smaller semi-axis of the ellipse representing the second moments of inertia of the polyline's vertices.) It will equal zero only for sets of collinear vertices; otherwise, it exceeds zero. It measures an average side-to-side deviation relative to the 100 pixel baseline created by the start and end of a polyline, and thereby has a simple interpretation.


Because the covariance matrix is only 2 by 2, the eigenvalues are quickly found by solving a single quadratic equation. Moreover, the covariance matrix is a sum of contributions from each of the vertices in a polyline. Thus, it is rapidly updated when points are dropped out or added, leading to a O(n) algorithm for an n-point contour: this will scale well to the highly detailed contours envisioned in the application.


Here is an example of the result of this algorithm. The black dots are vertices of a contour. The solid red line is the best candidate polyline segment of end-to-end length greater than 100 within that contour. (The visually obvious candidate in the upper right is not quite long enough.)



Figure


No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...