Thursday, 7 April 2016

python - Explaining pyproj to_epsg min_confidence parameter?


Following on from this question How can I get epsg code in pyproj with version==2.1.3, I discovered that sometimes when intialising a pyproj Proj instance with a string ESPG code eventually results in an ambiguity when trying to determine the projection's EPSG code.


from pyproj import Proj
crs = Proj(init='epsg:4544').crs
print(crs.to_epsg())

# prints None

If you adjust themin_confidence parameter, then you can get it out:


from pyproj import Proj
crs = Proj(init='epsg:4544').crs
print(crs.to_epsg(min_confidence=25) # <-- !!
# prints 4544

I feel like this is something I should understand, but I don't. What is the ambiguity here; why does the to_epsg method require this min_confidence parameter to return a valid EPSG code in this case?


Reading the source code is not enlightening regarding the source of the ambiguity:



Parameters
----------
min_confidence: int, optional
A value between 0-100 where 100 is the most confident. Default is 70.

Returns
-------
int or None: The best matching EPSG code matching the confidence level.

Reading that function, as well as to_authority, I surmise that the ambiguity arises when there are multiple matches, and the confidence score allows pyproj to return the best candidate. I don't particularly understand that either, but it does make sense. What really doesn't make sense is how None can be a better candidate than 4544 in this case, especially when the Proj instance was initialised without an exception or warning.



Notably, using a CRS instance instead, but with the same EPSG string input, seems to be unambiguous:


from pyproj import CRS
crs = CRS('epsg:4544')
print(crs.to_epsg())
# prints "4544"

>>> import pyproj
>>> pyproj.__version__
'2.2.0'

Answer




This is a great question and I will do my best to answer.


To begin, the init style syntax is deprecated (https://pyproj4.github.io/pyproj/stable/gotchas.html#init-auth-auth-code-should-be-replaced-with-auth-auth-code). So, instead of CRS(init="epsg:4544"), you should use CRS("epsg:4544").



I discovered that sometimes when intialising a pyproj Proj instance with a string ESPG code eventually results in an ambiguity when trying to determine the projection's EPSG code.



The reason that the EPSG code does not appear with the CRS initialized with the init= syntax is that the CRS are different.


>>> from pyproj import CRS
>>> crs_deprecated = CRS(init="epsg:4544")
>>> crs = CRS("epsg:4544")
>>> crs == crs_deprecated

False

Upon further inspection, you can see that the difference is in the axis order (although it is not immediately apparent to the eye due to the content being the same).


>>> crs_deprecated

Name: CGCS2000 / 3-degree Gauss-Kruger CM 105E
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:

- name: China - 103.5°E to 106.5°E
- bounds: (103.5, 22.5, 106.5, 42.21)
Coordinate Operation:
- name: Gauss-Kruger CM 105E
- method: Transverse Mercator
Datum: China 2000
- Ellipsoid: CGCS2000
- Prime Meridian: Greenwich

>>> crs


Name: CGCS2000 / 3-degree Gauss-Kruger CM 105E
Axis Info [cartesian]:
- X[north]: Northing (metre)
- Y[east]: Easting (metre)
Area of Use:
- name: China - 103.5°E to 106.5°E
- bounds: (103.5, 22.5, 106.5, 42.21)
Coordinate Operation:
- name: Gauss-Kruger CM 105E

- method: Transverse Mercator
Datum: China 2000
- Ellipsoid: CGCS2000
- Prime Meridian: Greenwich



why does the to_epsg method require this min_confidence parameter to return a valid EPSG code in this case?



The reason the min_confidence parameter exists is because of the example above. If you have a WKT/PROJ string and you use it to create the CRS instance, in most cases you want to be sure that the EPSG code given by to_epsg will give you a CRS instance similar to the one created by the WKT/PROJ string. However, if an EPSG code does not exist that matches you WKT/PROJ string with a min_confidence you don't want to get that EPSG code back as it will make you think that the WKT/PROJ string and the EPSG code are one and the same when they are not.



However, if you are only wanting to get the EPSG code that is closest to the PROJ/WKT string, then you can reduce your min_confidence to a threshold you are comfortable with.


Here is an example of that:


>>> crs_deprecated = CRS("+init=epsg:4326")
>>> crs_deprecated.to_epsg(100)
>>> crs_deprecated.to_epsg(70)
>>> crs_deprecated.to_epsg(20)
4326
>>> crs_latlon = CRS("+proj=latlon")
>>> crs_latlon.to_epsg(100)
>>> crs_latlon.to_epsg(70)

4326
>>> crs_epsg = CRS.from_epsg(4326)
>>> crs_epsg.to_epsg(100)
4326
>>> crs_wkt = CRS(crs_epsg.to_wkt())
>>> crs_wkt.to_epsg(100)
4326
>>> crs_wkt == crs_epsg
True
>>> crs_epsg == crs_latlon

False
>>> crs_epsg == crs_deprecated
False

Hopefully this helps resolve some of the confusion.


For reference of the pyproj related versions I used in this example:


$ python -c "import pyproj; pyproj.show_versions()"

System:
python: 3.7.3 | packaged by conda-forge | (default, Mar 27 2019, 23:01:00) [GCC 7.3.0]

executable: ~/miniconda3/envs/pyproj/bin/python
machine: Linux-4.15.0-51-generic-x86_64-with-debian-buster-sid

PROJ:
PROJ: 6.1.1
data dir: ~/scripts/pyproj/pyproj/proj_dir/share/proj

Python deps:
pyproj: 2.2.1
pip: 18.0

setuptools: 41.0.1
Cython: 0.29.10
aenum: (2, 1, 2)

No comments:

Post a Comment

arcpy - Changing output name when exporting data driven pages to JPG?

Is there a way to save the output JPG, changing the output file name to the page name, instead of page number? I mean changing the script fo...