Analysis and handling of georeferenced rasters and vectors

GlacioHack, updated πŸ•₯ 2023-03-30 18:02:49

geoutils

Set of tools to handle raster and vector data sets in Python.

build Conda Version Conda Platforms Conda Downloads PyPI version Coverage Status

This package offers Python classes and functions as well as command line tools to work with both geospatial raster and vector datasets. It is built upon rasterio and GeoPandas. In a single command it can import any geo-referenced dataset that is understood by these libraries, complete with all geo-referencing information, various helper functions and interface between vector/raster data.

Installation

With conda (recommended)

bash conda install --channel conda-forge --strict-channel-priority geoutils The --strict-channel-priority flag seems essential for Windows installs to function correctly, and is recommended for UNIX-based systems as well.

With pip

From PyPI: bash pip install geoutils

Or from the repository tarball: make sure GDAL and PROJ are properly installed, then: bash pip install https://github.com/GlacioHack/geoutils/tarball/main

Documentation

See the full documentation at https://geoutils.readthedocs.io.

Structure

GeoUtils is composed of three libraries: - georaster.py to handle raster data set. In particular, a Raster class to load a raster file along with metadata. - geovector.py to handle vector data set. In particular, a Vector class to load a raster file along with metadata. - projtools.py with various tools around projections.

How to contribute

You can find ways to improve the libraries in the issues section. All contributions are welcome.

  1. Fork the repository to your personal GitHub account, clone to your computer.
  2. (Optional but preferred:) Make a feature branch.
  3. Push to your feature branch.
  4. When ready, submit a Pull Request from your feature branch to GlacioHack/geoutils:master.
  5. The PR will be reviewed by at least one other person. Usually your PR will be merged via 'squash and merge'.

Direct pushing to the GlacioHack repository is not permitted.

A more detailed contribution instruction can be found here.

Issues

Implicit uncertainty propagation in the future?

opened on 2023-03-30 23:28:18 by rhugonnet

Ideas based on Astropy and their amazing NDData class to work in a World CRS with unit/uncertainty (we've been working quite a lot on the error subject also! :stuck_out_tongue:): https://github.com/astropy/astropy/blob/main/astropy/nddata/nddata.py https://github.com/astropy/astropy/blob/main/astropy/nddata/nduncertainty.py

We could add something similar to GeoUtils: the unit could be derived from the CRS, the uncertainty would be optional, but propagate naturally if present! We could have the same structure of uncertainty and same propagation algorithms (simplified by being in 2D)! :smile:

Update `Raster` array interface to work with masked arrays once available in NumPy

opened on 2023-03-30 22:44:46 by rhugonnet

Work in progress in NumPy to add __array_ufunc__ and __array_function__ to np.ma, after which we'll be able to integrate that much more easily in GeoUtils: https://github.com/numpy/numpy/pull/22914 https://github.com/numpy/numpy/pull/22913

Floating precision of `rio.transform` in `Raster.reproject()` when passing both `dst_bounds` and `dst_res` (or `dst_size`)

opened on 2023-03-03 02:02:10 by rhugonnet

See #356 This specific case is not fixed after #354.

Shift in dem.value_at_coord or dem.show

opened on 2023-03-02 14:18:16 by LucBeraud

Describe the bug There seems to be a diagonal ~ half-pixel shift between the DEM.show() displays and the DEM.value_at_coord() output.

To Reproduce import xdem img = xdem.DEM(gu.examples.get_path('everest_landsat_b4')) target_x, target_y = 486811, 3097050 # example of coordinate fig,ax=plt.subplots(1,1) img.show(vmin=50, vmax=200, ax=ax) ax.set_xlim(target_x-60,target_x+60); ax.set_ylim(target_y-60,target_y+60) # zoom on the area ax.scatter(target_x, target_y,c='r') # plot the desired coordinate on the image plot ax.set_title('at ({}, {}), dem.value_at_coords={}'.format(target_x, target_y, np.round(img.value_at_coords(target_x, target_y), 2))) # prints as figure title the value returned by value_at_coords() ax.plot([486805, 486805, 486805+30, 486805+30, 486805],[3097056,3097056-30, 3097056-30, 3097056, 3097056]) # if coord falls in this blue polygon ax.plot([486820, 486820, 486820+30, 486820+30, 486820],[3097040,3097040-30, 3097040-30, 3097040, 3097040]) # it returns the value of this orange one

Expected behavior The value_at_coord() value does not return same value as the plot most of the time, I expect that the value_at_coords() (169 in the example, value of the orange box pixel) matches the show() pixels (131 in the example, under the red point).

System (please complete the following information): - OS: Ubuntu 22 - Environment : xdem (0.0.7, 823fff239daa864748677e4d6bcb6bd153c15969 commit), geoutils (0.0.10) Capture d’écran du 2023-03-02 15-15-19

Functionalities required to finalize documentation and generate API

opened on 2023-02-14 00:34:18 by rhugonnet

It looks like we are missing these functions: - [x] Vector.save() - [x] Vector.show() - [x] Vector.__str__() - [x] Vector.__repr__()

And these properties are not defined as such for the API: - [x] Vector.ds, - [x] All of additional SatelliteImage attributes, - [x] Raster.crs and Raster.transform.

Also, looking at our geospatial tools, looks like we could add: - [x] Vector.simplify() to simplify/degrade resolution of a Vector. - [x] Make indexing [] possible with boolean ndarrays. - [x] Allow to pass a Mask in Raster.set_mask(), - [x] Make "in_value" and "target_value" attribute name consistent between polygonize and proximity. - [x] Make equal_raster and georeferenced_grid_equal naming more consistent (start with equal for both?) - [x] Automatically load data when calling arithmetic, logical or NumPy operation? (remove some load_data=True in the docs due to this) - [x] Make show() plot on top of same ax by default. - [x] Add clip option to Vector.crop(). - [x] A subsample wrapper function directly in the Raster class (allow to return coordinates). - [x] Ensure support of Ufuncs through np.ufunc.reduce and other similar operations, - [x] Rename cropGeom into crop_geom for consistency with other argument syntax, - [x] Streamline main description of Raster and Vector methods and attributes, - [x] Add tests for Mask logical operations and overloaded methods, - [x] Change nodata from list to tuple (needs to be immutable as a property), - [x] Add tests for untested functions (Raster.shift(), Raster.to_xarray(), etc), - [x] Improve general test coverage, in particular checking warnings raised directly by GeoUtils. - [x] Added as_array option to create_mask, as it nows returns a Mask by default. The shape is also always 2D to avoid the squeeze previously used everywhere. - [x] Fixed loading multi-band raster explicitly after instantiation with load(), failed because of how out_shape was defined.

Bug found: - [x] Maximum recursion depth calling np.ma.mean on a Raster. - [x] In doc, array interface: isclose is returning a masked_array instead of Mask? - [x] Cast from Mask to Raster does not work in Mask.proximity, need to modify Raster.from_array? - [x] Fixed coords(grid=True): add more tests

Changed (need to add tests in the test suite): - [x] Defined Raster.__repr__, modified __str__ to be consistent with NumPy, - [x] Set load_data=False as default, - [x] Loading data by default if stats=True is passed to Raster.info(), - [x] Cast array to float automatically in get_nanarray(), otherwise was raising error, - [x] Added overload for proximity and polygonize for Mask, - [x] Changed default parameter of polygonize to "all".

To finish or move to another issue: - [ ] Polygonizing all values on the Landsat_B4 example creates some LineString inside polygon? - [ ] Raster.resample to change only .res, - [ ] Homogenize array shape input of interp_points and value_at_coords, and variable names. - [ ] Make output shape consistent for interp_points and others: if singe-coordinate float is passed, return the same. Otherwise, if ArrayLike is passed, return ArrayLike. - [ ] A Vector random point generation in an outline, can pass a Raster object as match-reference to get a random point in its extent. - [ ] Simply input of function dealing with points: accept a Vector of type point, accept any shape of input list. Write a reshaping function that does checks and reshaping smoothly somewhere, and use it everywhere? - [ ] Nodata plotting for GeoDataFrame (see regular grid interpolation example). - [ ] Change name of value_at_coords to value_at_window? Add mask options for window (disk?) - [ ] Overload Mask.show to only a discontinuous, 2-color colorbar, - [ ] Looks like Raster.to_points is erroneous, fix for Warp gallery example (to show the grid on first plot) - [ ] Allow consistently a single shape or res argument for Vector rasterizing methods (rasterize, create_mask, proximity).

Write documentation and streamline general behaviour

opened on 2023-02-10 22:25:29 by rhugonnet

Summary

This PR writes the documentation for GeoUtils! :partying_face:

It also: - Streamlines several features and behaviours to better fit the objective of the package, - Re-organizes the structure of several modules, - Homogenizes the syntax of the docstrings, - Interfaces GeoPandas/Shapely functionalities in Vector, - Adds many tests, solving several bugs and bringing test coverage to more than 90% (was 75% previously).

This PR is branched from #350 on the Mask addition (now closed and everything is in this PR, as some Mask fixes started to depend on new behaviours only available here).

Documentation

The hidden branch with the rendered documentation is here: https://geoutils-rhugonnet.readthedocs.io/en/docs/index.html

Regarding theme and rendering, this PR: - Implements the sphinx-book-theme (same as OGGM and many other packages, it's a sub-theme of the PyData theme, used by NumPy, Pandas, SciPy, etc). Good news also: the dark theme PR has been merged in their repo, we'll have it available at their next version release (EDIT: It's released! I just need to update the logo files to support the Dark theme :D; SECOND EDIT: Done!), - Switches to MyST-parser which is an extension of Markdown: partially to avoid re-learning another language with rST, but mostly because it is getting popular thanks to its many functionalities. See next point. - Switches to MyST-NB that allows to add Jupyter code-cells to Markdown to render code in the documentation, and easily switch to interactive mode with a Binder. It allows to write Python code in Markdown, or to write Markdown in Jupyter notebook. Both will be rendered the same! Older packages (and even more recent ones like Xarray) had to rely on the IPython sphinx extension, which was really a hassle (multi-line is hard, few rendering options, etc, for example here: https://docs.xarray.dev/en/stable/user-guide/indexing.html#positional-indexing). - Removes the /doc/code folder and the literalinclude:: code/coregistration.py :lines: 5-21 we were using in xDEM, which was a hassle to maintain, thanks to the previous point. Proper errors should be raised by Sphinx or can be tested in the CI directly (we can run the .md files as a Jupyter notebook). - Switches from pip to mambaforge for the Readthedocs in .readthedocs.yaml, which was failing and slow. Also adds pip -e . to dev-environment.yml to simplify dev installation of the package. - Adds sphinx-design for rendering nice tables in the documentation. - Soon, the sphinx inheritance diagrams will support intersphinx, so the diagram I created that shows inheritance from GeoUtils to xDEM will finally work (right now it only renders locally). See https://github.com/sphinx-doc/sphinx/pull/10614. - Adds the sphinx_gallery_conf option remove_config_comments = True to remove config comments in gallery examples, such as for choosing thumbnails: # sphinx_gallery_thumbnail_number = 2, - Adds a post-build sphinx routine to remove myraster.tif and myvector.gpkg created during the "Open/Save" examples by Sphinx-Gallery, as advised here: https://github.com/sphinx-gallery/sphinx-gallery/issues/361.

Regarding the contents of the docs, I let you have a look there directly! In short: - Structure recommended for writing docs (Getting started/Features/Examples/References), - An inviting landing page, with a logo :stuck_out_tongue: and grid-cards shortcuts, - The classic "About/Install/Quick start" section, the first one important to make clear our place in the landscape of geospatial packages, - An explanation of the core concepts in "Fundamentals", key to understand the behaviour of the package (and its added value), with small embedded examples, - A descriptive documentation in "Rasters", "Vectors", with small embedded examples as well, - A full gallery of examples separated in "I/O", "Handling" and "Analysis", with 20 short gallery examples. - An API declined by section to make it more user-friendly, so semi-automatic (how it's done in NumPy, SciPy, Xarray, etc) where we need to specify any new function we add.

Architecture changes

This PR: - Moves satimg to georaster/, - Renames georaster.py and geovector.py into raster.py and vector.py (the "geo" already being in geoutils.raster, as previously discussed), - Re-organizes spatial_tools into raster/array.py, raster/multiraster.py and raster/sampling.py, - Moves geoviewer.py to a new bin/ folder.

Syntax changes

This PR: - Homogenize argument names across functions that use similar arguments, - Removes camel-case (e.g., cropGeom renamed into crop_geom), - Streamlines docstring names: a "raster" or "vector" is used to designate a Raster or Vector object (like an "array" is used for a np.array), all docstrings start without article (Raster to do that), - Adds many missing docstrings for class properties.

Changes linked to documentation

This PR: - Adds crs and transform as properties to Raster, with a data.setter, to ensure those are consistent and listed as properties in the API. - Adds ds, crs and bounds as properties to Vector, also for consistency and API, - Adds datetime, etc... as properties to SatelliteImage, also for consistency and API, - Adds "Is loaded?" into Raster.info(), - Adds __repr__ and modifies __str__ for Raster and Vector to be consistent with what is done in NumPy and GeoPandas, those now only prints "not_loaded" when the data is unloaded, - Adds _repr_html_ in Raster and Vector for representing rasters and vectors more stylishly in HTML, - Adds a copy_doc decorator in misc.py to automaticallt generated the docstring of wrapped GeoPandas function (their __doc__) by replacing geopandas.GeoSeries or geopandas.GeoDataFrame by Vector, and adding an intersphinx link to the GeoPandas API in our doc, - Adds Vector.save as a wrapper around GeoDataFrame.to_file for consistency with Raster, - Adds Vector.show as a wrapper around GeoDataFrame.plot for consistency with Raster. - Makes Raster.show and Vector.show recognize and respect the first axis, to be able to plot everything on top by simply doing raster.show(); vector.show(). Axes can still be passed explicitly through ax=. To plot on a new axis, one can add a plt.figure() or simply do rst.show(ax="new").

Mask class

See #350, and the documentation. This PR adds tests, fixes behaviour of logical functions and adds overridden reproject, crop, polygonize and proximity to Mask. Notable changes for Raster include: - Renames __eq__ to raster_equal for consistency with NumPy, __eq__ is now used by logical bitwise operations (see documentation and #350) and casts a Mask, - Renames equal_georeferenced_grid to georeferenced_grid_equal for naming consistency.

GeoPandas functionalities in Vector

This PR: - Wraps all geometric functionalities of the geopandas.base.GeoPandasBase non-public class (inherited by both GeoSeries and GeoDataFrame for geometric functionalities) and the geopandas.GeoDataFrame class to Vector. Everything can be called using Vector, for example Vector.unary_union. For the output, three cases: 1/ returns a Vector when it is a geometric output, 2/ proposes to append a non-geometric per-feature output (pandas.Series) to Vector.ds, and 3/ when it's another type of output, it returns that output directly. Why is this useful? Practicality and consistency: it can be a hassle to work with GeoPandas sometimes as a "geometric" operation on a GeoDataFrame can return a GeoDataFrame, a GeoSeries or a shapely.Geometry. And "non-geometric per-feature" functions like .area or .length return a pandas.Series of the same length. While GeoPandas' team is still improving a couple things upstream, many behaviours are actually intentional and fixed to maintain a lower-level API for all types of user. End-users focusing on analysis like us don't need this, and it slows us down, so it's quite practical to have everything logically and automatically re-cast to a GeoDataFrame in the Vector, or appended to it! :smile: - Adds many tests to check all functionalities behave as intended, that all GeoPandas functions are covered (with a couple exceptions), that all arguments are still up-to-date, as well as their default values. Inheritance is out of the question as it would prevent us from re-encapsulating the output (so we'd have to override everything the same, anyway). This is how GeoPandas wrap Shapely on their side, also. Right now, we don't wrap the Pandas functionalities, those have to be called through Vector.ds directly. - Automatically fetches the documentation from GeoPandas with a renaming for Vector for the basic description, and points to GeoPandas's API for details, to ensure a minimal maintenance effort.

Feature changes

This PR: - Changes the default load_data of Raster.__init__ to False, and adds a condition to run .load() when .data is called (if data is not already loaded). This is, in short, implicit loading! :smile: Adds tests to check this functionality, and modifies a couple things that were breaking this behaviour in .load() and __init__. Now implicit loading also works properly with the downsample and indexes arguments! Also fixes the behaviour for multi-band rasters, and adds tests, - Changes default in_valiue of polygonize to "all", as Mask.polygonize() now automatically overrides Raster.polygonize for a boolean input, - Makes indexing __getitem__ (i.e., []) possible with boolean np.ndarray and not only Mask, - Adds index assignment __setitem (i.e., [] = x) to Raster, - Adds a clip argument to Vector.crop, to optionally clip the geometries to the extent, - Adds a Raster.subsample function to perform subsampling directly from the raster, - Adds support for universal function methods through the array interface of Raster (e.g., reduce as in np.logical_or.reduce()), - Adds Vector.from_bounds_projected to automatically create a polygon vector in any CRS from the bounds of Raster or Vector, - Adds as_array argument to Vector.create_mask to return a boolean np.ndarray (returns a Mask by default).

Bug fixes and tests

This PR: - Fixes value_at_coords on several bands and adds tests, - Fixes coords that returned a wrong output with its default grid=True argument and adds tests, - Fixes a bug in get_nanarray to run filled(np.nan) when original data was not of float type and adds tests, - Fixes a maximum recursion depth error when trying to run np.ma functions on Raster due to __array__ (not supported by NumPy, they don't have an array interface for masked-array functions), - Fixes bugs to geoviewer.py and adds tests, - Fixes mutability of Raster.nodata by defining it as tuple instead of list, - Fixes many untested functions and adds tests: Raster.to_xarray, Raster.shift, misc.diff_yml, etc...

Issues resolved

This PR: - Resolves #247, - Resolves #225, - Resolves #143, - Resolves #311, - Resolves #260, - Resolves #122, - Resolves #81, - Resolves #241, - Resolves #355, - Resolves #352, - Resolves #101, - Resolves #338, - Resolves #337.

Releases

v0.0.10 2023-02-01 05:47:21

Added proximity class method for Raster and Vector matching GDAL's. Added subsetting (crop) of Raster and Vector through [] (__getitem__) and homogenized crop functionalities. Added buffer_metric method in Vector class. Optimized CI runs with caching. Several bug fixes.

What's Changed

  • Fix deprecate version check by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/320
  • Add caching to CI environment setup by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/321
  • Upgrade pre-commit by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/325
  • Fix small bug in spatial_tools by @adehecq in https://github.com/GlacioHack/geoutils/pull/326
  • Fix pip dependency listing in CI by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/328
  • Fix bug in Raster.split_bands by @adehecq in https://github.com/GlacioHack/geoutils/pull/329
  • Update SETSM product filename metadata parsing in SatelliteImage class by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/334
  • Subscript-crop Raster or Vector by bracket call [], consistency and bug fixes by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/335
  • Add proximity and buffer_metric functionalities with metric option based on finding local UTM by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/336

Full Changelog: https://github.com/GlacioHack/geoutils/compare/v0.0.9...v0.0.10

v0.0.9 2022-09-26 08:42:57

What's Changed

In summary, the major changes in this release are:

  • The removal of the Raster MemoryFile and fixes on all resulting issues,
  • The improvement of the NumPy array interface to cast NumPy functions on masked arrays instead of the unmasked NumPy array of the Raster, including arithmetic operations,
  • Improves the consistency of the data.setter and nodata.setter of Raster with more intuitive behaviour,
  • Add a lot of tests and CI improvements!

Full list:

  • Remove the memoryfile by @erikmannerfelt in https://github.com/GlacioHack/geoutils/pull/265
  • Clearly define default rio attributes by @adehecq in https://github.com/GlacioHack/geoutils/pull/281
  • Update links for repository renaming into lower-case geoutils by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/283
  • Move examples to geoutils-data including an additional float32 raster with NaNs by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/284
  • Add version increment step in HOW_TO_RELEASE by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/288
  • Fix several issues that arose since the MemoryFile PR #265 by @adehecq in https://github.com/GlacioHack/geoutils/pull/289
  • Add test coverage from Coveralls in CI by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/295
  • Check that environment.yml can import geoutils, and use diff with dev-environment.yml during CI by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/297
  • Remove yaml method for diff calculation of environment.yml and dev-env.yml by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/301
  • Improve data.setter of Raster for consistent array output by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/300
  • Add instructions on how to update the conda-forge feedstock by @erikmannerfelt in https://github.com/GlacioHack/geoutils/pull/303
  • Improve array interface of Raster for consistency with np.ma.masked_array operations by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/302
  • Add get_nanarray method by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/308
  • Improve loading different rasters and handling different CRS in Raster by @adehecq in https://github.com/GlacioHack/geoutils/pull/307
  • Modify set_nodata behaviour, improve related warnings and tests by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/309
  • Fix __eq__, make arithmetic operations fully consistent with masked arrays, deprecate gu.misc.array_equal and bug fixes by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/313
  • Add test for deprecator function by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/314
  • Fix version check in misc.test_deprecate by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/315
  • Elaborate in __eq__ docstring by @rhugonnet in https://github.com/GlacioHack/geoutils/pull/317

Full Changelog: https://github.com/GlacioHack/geoutils/compare/v0.0.8...v0.0.9

v0.0.8 2022-08-25 08:50:50

Exactly the same as v0.0.7. Only created a new release so that it can be published on PyPI.

v0.0.7 2022-08-24 11:49:27

What's Changed

For Raster class: * Add a masking functionality set_mask by @adehecq in https://github.com/GlacioHack/GeoUtils/pull/271 * Make reproject work even with data not loaded by @adehecq in https://github.com/GlacioHack/GeoUtils/pull/274 * Set default resampling to bilinear instead of nearest in reproject by @adehecq in https://github.com/GlacioHack/GeoUtils/pull/275

Full Changelog: https://github.com/GlacioHack/GeoUtils/compare/v0.0.6...v0.0.7

v0.0.6 2022-02-24 10:32:31

Minor bugfixes

v0.0.5 2021-10-22 14:26:39

What's Changed

Major changes:

  • Complete the suite of overloading of arithmetic functions by @adehecq in https://github.com/GlacioHack/GeoUtils/pull/257
  • Create georaster submodule by @atedstone in https://github.com/GlacioHack/GeoUtils/pull/251
  • Improve the documentation backend by @erikmannerfelt in https://github.com/GlacioHack/GeoUtils/pull/252
  • Move xdem spatial_tools functionality to geoutils by @atedstone in https://github.com/GlacioHack/GeoUtils/pull/253

Minor changes

  • Fix development contribution instructions by @atedstone in https://github.com/GlacioHack/GeoUtils/pull/248
  • Correct the width/height calculation in geovector.rasterize by @atedstone in https://github.com/GlacioHack/GeoUtils/pull/254

Full Changelog: https://github.com/GlacioHack/GeoUtils/compare/v0.0.4...v0.0.5

geopandas geospatial-processing gis python raster-data rasterio vector-data masking polygonization rasterization reprojection